Feature Extraction

Recurrent Neural Networks

  • RNNs are well suited to sequential information. The output at each time step depends on the previous hidden state and the current input

  • Two popular types of RNNs are Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs)

  • In MRC, both the preceding and following words matter, so it is common to use bidirectional RNNs to encode the context and question embeddings and extract sequential information (see the sketch after this list)
    • This can be word-level or sentence-level
      • Researchers tend to use word-level feature extraction to encode the context as it is usually a very long sequence

  • Training RNNs is slow due to the sequential nature of the architecture: each time step depends on the previous one, so they can’t be trained in parallel
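Below is a minimal sketch of word-level bidirectional encoding with a PyTorch LSTM, assuming pre-computed embeddings of shape (batch, seq_len, embed_dim). The module name BiLSTMEncoder and all sizes are illustrative assumptions, not taken from a specific model.

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    def __init__(self, embed_dim: int = 300, hidden_dim: int = 128):
        super().__init__()
        # bidirectional=True concatenates forward and backward hidden states,
        # so each time step is encoded with both preceding and following context
        self.rnn = nn.LSTM(embed_dim, hidden_dim,
                           batch_first=True, bidirectional=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim) -> (batch, seq_len, 2 * hidden_dim)
        outputs, _ = self.rnn(x)
        return outputs

# Encode a toy context (long passage) and question (short sequence)
encoder = BiLSTMEncoder()
context = torch.randn(2, 400, 300)
question = torch.randn(2, 30, 300)
context_enc = encoder(context)    # (2, 400, 256)
question_enc = encoder(question)  # (2, 30, 256)
```

Because the LSTM processes one time step at a time, the loop over the sequence cannot be parallelised, which is the training bottleneck noted above.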

Convolutional Neural Networks

  • Convolutional layers extract local features with various kernel sizes. The outputs are fed into pooling layers to reduce dimensionality while keeping the most important features (a sketch follows this list)

  • CNNs extract features and train efficiently thanks to parallelism, so they scale well with vocabulary size. Faster than RNNs!

  • However, their fixed kernel sizes only capture local patterns, so they struggle to model dependencies across long sequences
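A minimal sketch of this idea, assuming word embeddings of shape (batch, seq_len, embed_dim): one Conv1d per kernel size extracts local features, and max pooling over time keeps the strongest ones. The kernel sizes, filter counts, and the name CNNEncoder are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    def __init__(self, embed_dim: int = 300, num_filters: int = 100,
                 kernel_sizes=(2, 3, 4)):
        super().__init__()
        # One convolution per kernel size captures n-gram-like local features
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k, padding=k // 2)
            for k in kernel_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Conv1d expects (batch, channels, seq_len)
        x = x.transpose(1, 2)
        pooled = []
        for conv in self.convs:
            features = torch.relu(conv(x))
            # Max-pool over time: keep the strongest activation per filter
            pooled.append(features.max(dim=2).values)
        return torch.cat(pooled, dim=1)  # (batch, len(kernel_sizes) * num_filters)

encoder = CNNEncoder()
out = encoder(torch.randn(2, 50, 300))  # (2, 300)
```

Each convolution only sees a window of a few words at a time, which is why stacking many layers (or another mechanism) is needed to relate distant positions.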

Transformers

  • QANet by Yu et al. (2018) is an MRC model that uses the transformer architecture
    • The encoder block combines multi-head self-attention with convolutions (see the sketch after this list)

    • QANet was able to achieve the same accuracy on SQuAD as RNN models with much faster training and inference speed
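A simplified sketch of a QANet-style encoder block: depthwise separable convolutions followed by multi-head self-attention, each wrapped in layer normalisation and a residual connection. The layer counts and dimensions are illustrative assumptions, not the exact QANet configuration.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model: int = 128, num_convs: int = 2,
                 kernel_size: int = 7, num_heads: int = 8):
        super().__init__()
        self.conv_norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(num_convs))
        # Depthwise (groups=d_model) + pointwise convolutions capture local structure
        self.convs = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(d_model, d_model, kernel_size,
                          padding=kernel_size // 2, groups=d_model),  # depthwise
                nn.Conv1d(d_model, d_model, 1),                       # pointwise
                nn.ReLU(),
            )
            for _ in range(num_convs)
        )
        self.attn_norm = nn.LayerNorm(d_model)
        # Self-attention relates every position to every other in one step
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        for norm, conv in zip(self.conv_norms, self.convs):
            residual = x
            x = norm(x).transpose(1, 2)   # Conv1d expects (batch, channels, seq_len)
            x = conv(x).transpose(1, 2)
            x = x + residual              # residual connection
        residual = x
        x = self.attn_norm(x)
        x, _ = self.attn(x, x, x)         # multi-head self-attention
        return x + residual

block = EncoderBlock()
out = block(torch.randn(2, 50, 128))      # (2, 50, 128)
```

Since neither the convolutions nor the attention depend on a previous time step, the whole block can be computed in parallel across the sequence, which is where the speed-up over RNN encoders comes from.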

Ryan

Data Scientist
