A basic seq2seq model, as shown below, has two components: an encoder (orange) and a decoder (green). The encoder reads and processes each item in the source article and compiles the information it has captured into a context vector. Once the entire input sequence has been processed, the context vector is sent to the decoder, which uses it to produce the output summary item by item. Encoders and decoders can be feed-forward neural networks, convolutional neural networks (CNNs) or recurrent neural networks (RNNs). RNN encoders and decoders have been the most widely adopted for seq2seq models. An RNN takes two inputs at each time step, an input item and the previous hidden state, and produces a new hidden state that is passed on to the next time step. The last hidden state of the encoder is the context vector, which is passed to the decoder for use at each decoding step.
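The encoder-decoder flow described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not a trained or production model: the weights are random, the sizes `EMB`, `HID` and `VOCAB` are made up, and the helper names (`rnn_step`, `encode`, `decode`) are hypothetical. The decoder here both starts from the context vector and receives it again at every step, which is one possible way to wire it.

```python
import math
import random

random.seed(0)  # fixed seed so the toy example is reproducible

EMB, HID, VOCAB = 4, 3, 5  # hypothetical embedding, hidden and vocabulary sizes


def rand_matrix(rows, cols):
    """Small random weights; a real model would learn these during training."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(W_x, W_h, x, h):
    """One vanilla RNN step: new_h = tanh(W_x @ x + W_h @ h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(W_x, x), matvec(W_h, h))]


embed = rand_matrix(VOCAB, EMB)  # one embedding row per token id
enc_W_x, enc_W_h = rand_matrix(HID, EMB), rand_matrix(HID, HID)
# The decoder input is the token embedding concatenated with the context vector.
dec_W_x, dec_W_h = rand_matrix(HID, EMB + HID), rand_matrix(HID, HID)
W_out = rand_matrix(VOCAB, HID)  # projects a hidden state to vocabulary scores


def encode(token_ids):
    """Read the source item by item; the last hidden state is the context vector."""
    h = [0.0] * HID
    for t in token_ids:
        h = rnn_step(enc_W_x, enc_W_h, embed[t], h)
    return h


def decode(context, start_id=0, max_len=4):
    """Greedily emit one token id per step, using the context at every step."""
    h, token, output = context, start_id, []
    for _ in range(max_len):
        h = rnn_step(dec_W_x, dec_W_h, embed[token] + context, h)
        scores = matvec(W_out, h)
        token = max(range(VOCAB), key=scores.__getitem__)
        output.append(token)
    return output


context = encode([1, 3, 2, 4])  # toy "source article" as token ids
summary_ids = decode(context)   # toy "summary" as token ids
```

Note that however long the source sequence is, everything the decoder knows about it must fit in the `HID` numbers of `context`.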
RNN-based seq2seq has been the foundation of many abstractive text summarisation models. However, this architecture has several problems. Shi et al. (2018) identified three shortcomings of RNN-based seq2seq:
- The model cannot accurately reproduce salient information from the source article
- The model performs poorly when the input contains many out-of-vocabulary (OOV) words
- The model tends to repeat words and sentences, which hurts the readability of the generated summaries
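
Two of these shortcomings are easy to see on toy data. The sketch below is illustrative only and is not taken from Shi et al. (2018): the vocabulary, the example sentences and the helper names `to_vocab` and `repeated_ngram_ratio` are all hypothetical. It assumes a fixed vocabulary in which every unseen word collapses to a single `<unk>` token, and it uses the share of repeated bigrams as a rough indicator of repetitiveness.

```python
UNK = "<unk>"


def to_vocab(tokens, vocab):
    """Replace every token outside the fixed vocabulary with UNK."""
    return [t if t in vocab else UNK for t in tokens]


def repeated_ngram_ratio(tokens, n=2):
    """Fraction of n-grams that duplicate an earlier n-gram in the sequence."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return (len(ngrams) - len(set(ngrams))) / len(ngrams)


vocab = {"the", "bank", "raised", "rates"}  # toy vocabulary
source = "the bank raised rates in reykjavik".split()
print(to_vocab(source, vocab))
# ['the', 'bank', 'raised', 'rates', '<unk>', '<unk>']

summary = "the bank raised the bank raised rates".split()
print(repeated_ngram_ratio(summary))  # 2 of the 6 bigrams are repeats
```

Once "reykjavik" has been mapped to `<unk>`, a decoder that can only emit vocabulary words has no way to reproduce it in the summary.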