Neural Question Answering Systems: The Roles of Attention and Recurrent Neural Networks

Shen, Yuanyuan

Date

2022

Authors

Shen, Yuanyuan

Supervisor

Lai, Edmund M-K

Mohaghegh, Mahsa

Item type

Thesis

Degree name

Doctor of Philosophy

Publisher

Auckland University of Technology

Abstract

The roles of attention and recurrent neural networks (RNNs) in RNN-based neural question answering (QA) systems are investigated. As an important component of neural QA systems, attention identifies the words in the passage that are most relevant to the question, so that a subsequent module can use this information to infer the answer. Attention involves two main steps. The first computes similarity scores between the words in the question and those in the passage. The second generates the question-relevant information that is passed to subsequent layers of the QA model. Various neural QA systems use many different similarity functions and relevant information generation methods, so it is important to understand the characteristics of those that perform well.
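
The two attention steps can be sketched as follows. This is a minimal illustration only, assuming plain Python lists as word vectors and a dot product as the similarity function; `attend`, `dot`, and `softmax` are hypothetical helpers, not code from the thesis. Step 1 fills a passage-by-question similarity matrix; step 2 turns each row into softmax weights and returns, for every passage word, a weighted sum of the question vectors, which is one common form of question-relevant information.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot-product similarity (an assumed choice; other functions plug in here)."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(passage: List[Vector], question: List[Vector]) -> Tuple[List[List[float]], List[Vector]]:
    # Step 1: score every passage word against every question word.
    scores = [[dot(p, q) for q in question] for p in passage]
    # Step 2: for each passage word, weight the question vectors by a softmax
    # over its row of scores and sum them.
    dim = len(question[0])
    attended = []
    for row in scores:
        weights = softmax(row)
        attended.append([sum(w * q[k] for w, q in zip(weights, question)) for k in range(dim)])
    return scores, attended


passage = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy passage word vectors
question = [[1.0, 0.0], [0.0, 2.0]]              # toy question word vectors
scores, attended = attend(passage, question)
print(scores)  # [[1.0, 0.0], [0.0, 2.0], [1.0, 2.0]]
```

Because the weights in each row sum to one, every attended vector is a convex combination of the question vectors.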

To make fair comparisons among similarity functions and among relevant information generation methods, a novel baseline QA model is designed. It captures the major characteristics common to the leading RNN-based neural QA models and consists of four parts: the embedding layer, the context encoder, the attention mechanism, and the answer predictor. This design allows the various similarity functions and relevant information generation methods to be plugged in easily.
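
A toy sketch of how a four-part baseline can expose the attention mechanism as interchangeable parts. It is a hypothetical illustration, not the thesis's model: the `BaselineQA` class and its parameters are invented for this example, the context encoder is an identity stand-in for an RNN, and the answer predictor simply picks the highest-scoring passage word instead of predicting an answer span with trained layers. The point is that `similarity` and `generate` are passed in as arguments, so comparing methods only requires swapping them.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class BaselineQA:
    """Hypothetical four-part pipeline with a pluggable attention mechanism."""

    def __init__(self, table: Dict[str, Vector],
                 similarity: Callable[[Vector, Vector], float],
                 generate: Callable[[Vector, Vector], Vector],
                 out_weights: Vector):
        self.table = table              # embedding layer: token -> vector
        self.similarity = similarity    # plug-in 1: similarity function
        self.generate = generate        # plug-in 2: relevant information generation
        self.out_weights = out_weights  # answer predictor: fixed scoring vector

    def embed(self, tokens: List[str]) -> List[Vector]:
        return [self.table[t] for t in tokens]

    def encode(self, vectors: List[Vector]) -> List[Vector]:
        # Stand-in for the RNN context encoder; identity keeps the sketch short.
        return vectors

    def attend(self, p: List[Vector], q: List[Vector]) -> List[Vector]:
        out = []
        for pi in p:
            weights = softmax([self.similarity(pi, qj) for qj in q])
            attended = [sum(w * qj[k] for w, qj in zip(weights, q)) for k in range(len(q[0]))]
            out.append(self.generate(pi, attended))
        return out

    def predict(self, passage: List[str], question: List[str]) -> int:
        p = self.encode(self.embed(passage))
        q = self.encode(self.embed(question))
        g = self.attend(p, q)
        scores = [dot(self.out_weights, gi) for gi in g]
        # Answer predictor stand-in: index of the highest-scoring passage word.
        return max(range(len(scores)), key=scores.__getitem__)


def with_product(p: Vector, a: Vector) -> Vector:
    # Example generation plug-in: [p; a; p * a].
    return p + a + [x * y for x, y in zip(p, a)]


table = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "sat": [0.5, 0.5]}
model = BaselineQA(table, dot, with_product, [0.0, 0.0, 0.0, 0.0, 1.0, 1.0])
print(model.predict(["dog", "sat", "cat"], ["cat"]))  # 2
```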

Using this baseline model, eleven existing similarity functions are compared. Experimental results show that, as a group, the additive functions perform better than the multiplicative functions. Based on this insight, a new similarity function, called the T-trilinear function, is proposed. It combines the strengths of both the additive and multiplicative functions and generally outperforms all of the existing functions.
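
For orientation, well-known members of the two families can be written as follows: dot-product, scaled dot-product, and bilinear (p^T W q) scores are multiplicative, while the Bahdanau-style score v^T tanh(W1 p + W2 q) is additive. The trilinear function used in BiDAF, w^T [p; q; p ∘ q] with ∘ the element-wise product, is also shown; it expands into linear terms in p and q plus an element-wise product term. This is a sketch with hand-set weights (in a trained model W, W1, W2, v, and w are learned), and it does not reproduce the thesis's eleven functions or its T-trilinear function, which this abstract does not define.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(M: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in M]


# Multiplicative family: scores built from products of p and q.
def dot_sim(p: Vector, q: Vector) -> float:
    return dot(p, q)


def scaled_dot_sim(p: Vector, q: Vector) -> float:
    return dot(p, q) / math.sqrt(len(p))


def bilinear_sim(p: Vector, q: Vector, W: Matrix) -> float:
    return dot(p, matvec(W, q))          # p^T W q


# Additive family: projections of p and q are added before a nonlinearity.
def additive_sim(p: Vector, q: Vector, W1: Matrix, W2: Matrix, v: Vector) -> float:
    hidden = [math.tanh(a + b) for a, b in zip(matvec(W1, p), matvec(W2, q))]
    return dot(v, hidden)                # v^T tanh(W1 p + W2 q)


# BiDAF-style trilinear: w^T [p; q; p * q].
def trilinear_sim(p: Vector, q: Vector, w: Vector) -> float:
    return dot(w, p + q + [a * b for a, b in zip(p, q)])


p, q = [1.0, 2.0], [3.0, -1.0]
I = [[1.0, 0.0], [0.0, 1.0]]
print(dot_sim(p, q))                     # 1.0
print(bilinear_sim(p, q, I))             # 1.0
print(trilinear_sim(p, q, [1.0] * 6))    # 6.0
```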

For relevant information generation, five existing methods are compared. Experimental results show that incorporating element-wise products into the concatenated information helps to achieve better results. A new method is proposed that produces better results than these five methods. Further investigation reveals that applying a feedforward neural network (FNN) to the concatenation can improve performance further. This finding leads to a second new method, which achieves better performance than the other methods.
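
The general pattern described here, concatenating a passage word vector p with its attended vector a, optionally appending their element-wise product, and then applying a feedforward layer, can be sketched as below. The function names, the single ReLU layer, and the hand-set weights are assumptions for illustration; the abstract does not give the exact form of the five existing methods or of the two new ones.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def concat(p: Vector, a: Vector) -> Vector:
    # Plain concatenation [p; a].
    return p + a


def concat_with_product(p: Vector, a: Vector) -> Vector:
    # Concatenation with the element-wise product appended: [p; a; p * a].
    return p + a + [x * y for x, y in zip(p, a)]


def relu_layer(x: Vector, W: Matrix, b: Vector) -> Vector:
    # One feedforward layer: ReLU(W x + b). W and b would be learned in practice.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]


def concat_then_ffn(p: Vector, a: Vector, W: Matrix, b: Vector) -> Vector:
    # Hypothetical variant: an FNN applied over the concatenation.
    return relu_layer(concat_with_product(p, a), W, b)


p, a = [1.0, 2.0], [0.5, -1.0]
print(concat_with_product(p, a))    # [1.0, 2.0, 0.5, -1.0, 0.5, -2.0]
W = [[1.0, 0.0, 0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]]
b = [0.0, 0.0]
print(concat_then_ffn(p, a, W, b))  # [1.5, 0.0]
```

The product term lets a downstream linear layer respond directly to how strongly p and a agree in each dimension, which plain concatenation cannot express linearly.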

The role of RNNs in neural QA systems is investigated using DMN+, a representative of such systems. Although DMN+ performs well on most of the 20 tasks in the bAbI dataset, it cannot effectively handle the tasks that involve multi-step inductive reasoning. The results show that the RNNs in its attention mechanism memorize the order of facts in the training data. As a result, the trained model does not generalize well to test samples in which the facts appear in a different order. This problem is overcome by a new QA model, MoDMN+, which has an RNN-free attention mechanism. Experimental results demonstrate that MoDMN+ generalizes better than DMN+. Considering the adverse effect of the RNNs in the attention mechanism on the multi-step induction tasks, a further model called ff-DMN is proposed, which also discards the RNNs from the input module of MoDMN+. Experiments show that ff-DMN solves the inductive reasoning tasks with significantly higher predictive accuracy than DMN+ and the other existing RNN-based QA models. Furthermore, an ensemble model is proposed that can tackle all 20 reasoning tasks in the bAbI dataset.
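
Why a recurrent component in the attention mechanism can tie a model to fact order, while a plain weighted sum cannot, can be illustrated with a small experiment. `soft_attention` combines facts as a gate-weighted sum, which gives the same result in any order. `gated_recurrence` is a hypothetical, simplified recurrent update with fixed weights, loosely modeled on the attention-based GRU used in DMN+; it processes facts one at a time, so reordering them changes its output. Neither function is the thesis's implementation, MoDMN+'s actual RNN-free attention mechanism is not specified in this abstract, and the gate values are given directly rather than computed from the question and memory.

```python
import math
from typing import List

Vector = List[float]


def soft_attention(facts: List[Vector], gates: List[float]) -> Vector:
    # Order-invariant: a gate-weighted sum does not depend on fact order.
    dim = len(facts[0])
    return [sum(g * f[k] for g, f in zip(gates, facts)) for k in range(dim)]


def gated_recurrence(facts: List[Vector], gates: List[float]) -> Vector:
    # Hypothetical simplified attention-gated recurrence (no learned weights,
    # no reset gate): h_i = g_i * tanh(f_i + 0.5 * h_{i-1}) + (1 - g_i) * h_{i-1}
    h = [0.0] * len(facts[0])
    for f, g in zip(facts, gates):
        candidate = [math.tanh(fk + 0.5 * hk) for fk, hk in zip(f, h)]
        h = [g * ck + (1.0 - g) * hk for ck, hk in zip(candidate, h)]
    return h


facts, gates = [[1.0], [-1.0]], [0.5, 0.5]
rev_facts, rev_gates = facts[::-1], gates[::-1]
# Same vector for both orders:
print(soft_attention(facts, gates), soft_attention(rev_facts, rev_gates))
# Different vectors for the two orders:
print(gated_recurrence(facts, gates), gated_recurrence(rev_facts, rev_gates))
```

A model whose fact aggregation behaves like `gated_recurrence` can learn to depend on the positions facts happened to occupy in training, which is the kind of order dependence the abstract reports for DMN+.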

Keywords

Question answering, Attention mechanism, Deep learning, Neural networks, Recurrent neural networks, Natural language processing, Artificial intelligence

Permanent link

https://hdl.handle.net/10292/14998

Collections

Doctoral Theses
