Objective and Contribution

We show that a simple bidirectional attention-based seq2seq model with a copy mechanism, trained with exponential moving average (EMA), can achieve SOTA results in both table-to-text generation and neural question generation (NQG). Rather than continuously increasing the complexity of the neural network, a carefully tuned simpler model can also achieve SOTA results, which encourages us to explore simpler models thoroughly before introducing complex ones.

What is the table-to-text generation task?

The goal is to generate a description of a table. Specifically, in this paper, we explored generating biographies based on Wikipedia infoboxes as shown below.

What is the neural question generation (NQG) task?

The goal is to generate a correct, meaningful question from a source document that contains the target answer. In this paper, we used the SQUAD dataset as shown below.

Bidirectional Seq2Seq with Attention and Copy Mechanism

There are 3 main components of the model architecture:

  1. Encoder. The encoder is a biLSTM that takes in the concatenation of word embedding and additional task-specific features. For table-to-text generation, the additional features are the field name and the position information. For NQG, the additional feature is a single bit indicating whether the word belongs to the target answer.

  2. Attention-based Decoder. Our decoder uses the standard attention mechanism together with a copy mechanism.

  3. Exponential Moving Average (EMA). This is the key driver of the model’s performance. EMA is also known as temporal averaging. Here, we keep two sets of parameters: a) the training parameters, which are updated by the optimiser as usual, and b) the evaluation parameters, which are computed as an exponentially weighted moving average of the training parameters, controlled by the decay rate.
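The EMA bookkeeping described in item 3 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `ema_update`, the dictionary layout of the parameters, the decay value of 0.9, and the initialisation of the evaluation parameters as a copy of the training parameters are all assumptions made for the example.

```python
def ema_update(eval_params, train_params, decay):
    """Return new evaluation parameters after one training step.

    Each evaluation parameter moves a fraction (1 - decay) of the way
    towards the current training parameter.
    """
    return {
        name: decay * eval_params[name] + (1.0 - decay) * train_params[name]
        for name in train_params
    }


# Toy setup: a single scalar "weight" stands in for the whole model.
train_params = {"w": 0.0}
eval_params = dict(train_params)  # assumed initialisation: copy of training params
decay = 0.9                       # assumed value, for illustration only

for step in range(1, 4):
    # Stand-in for an optimiser step on the training parameters.
    train_params["w"] = float(step)
    # The evaluation parameters track a smoothed version of the trajectory.
    eval_params = ema_update(eval_params, train_params, decay)

print(train_params["w"], eval_params["w"])
```

In this sketch only the training parameters receive gradient updates; the evaluation parameters are never trained directly and would be the ones used at test time. A decay rate closer to 1 averages over a longer window of past training steps.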

Experiments and Results

We use the WIKIBIO dataset for table-to-text generation and the SQUAD dataset for NQG. The WIKIBIO dataset has over 720,000 Wikipedia articles and uses the first sentence of each article as the ground-truth description of the infobox. The SQUAD dataset has 536 Wikipedia articles and over 100,000 question-answer pairs. For evaluation metrics, we use BLEU-4, METEOR, ROUGE-4, and ROUGE-L.


The results for both table-to-text generation and NQG are shown in the tables below. Overall, our model (without EMA) performed competitively with previous work across all the metrics. With the addition of the EMA technique, our model achieved SOTA results in all metrics except BLEU-4 on SQUAD, where it is still competitive. This underlines that a complex architecture is not always the best approach, and that we should devote more time to exploring and improving basic models before moving on to more complex ones.

Conclusion and Future Work

Potential future work includes investigating the use of the EMA technique on transformer models, as well as conducting similar studies to examine the need for complex architectures in other NLP tasks.


