Answer Prediction

The design of the answer prediction module is highly dependent on the specific MRC task. There are four different answer prediction methods:

  1. Word Predictor
  2. Option Selector
  3. Span Extractor
  4. Answer Generator

Word Predictor

  • Cloze tests require filling in blanks using words or entities in the context

  • For Attentive Reader, the combination of the query-aware context and the question is projected into the vocabulary space, which is then searched for the correct answer
    • This method cannot guarantee that the predicted answer appears in the context, which violates the requirement of cloze tests. For example, if the plausible answers are “January” and “March”, the attentive context representation in the vocabulary space might yield a word that is similar to both but absent from the context, such as “February”

  • To overcome this problem, the Attention Sum Reader (AS Reader) borrows the mechanism of pointer networks and directly uses the attention weights to predict the answer. The attention weights of all occurrences of the same word are summed, and the word with the largest total is selected as the answer
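
The attention-sum step can be illustrated with a minimal sketch. It assumes the attention weights over context tokens have already been computed by some encoder; the function name `attention_sum_predict` and the toy weights are hypothetical, not taken from the AS Reader code.

```python
from collections import defaultdict


def attention_sum_predict(context_tokens, attention_weights):
    """Sum the attention over every occurrence of each token and
    return the token with the largest total, plus all totals."""
    if len(context_tokens) != len(attention_weights):
        raise ValueError("one attention weight per context token is required")
    totals = defaultdict(float)
    for token, weight in zip(context_tokens, attention_weights):
        totals[token] += weight
    best = max(totals, key=totals.get)
    return best, dict(totals)


# Toy example: the weights are made up for illustration.
context = ["Mary", "met", "John", "and", "Mary", "left"]
weights = [0.25, 0.05, 0.40, 0.05, 0.20, 0.05]
answer, totals = attention_sum_predict(context, weights)
print(answer, totals)
```

Here "John" has the single largest weight, but "Mary" wins once its two occurrences are summed. Because only context tokens receive weight, the prediction is always a word from the context.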

Option Selector

  • The common method is to compute the similarity between attentive context representations and candidate answer representations. The candidate with the highest similarity is selected as the answer
    • This approach was used by Chaturvedi et al.: CNNs encode question-option tuples and relevant context sentences, then cosine similarity is computed and the most relevant option is chosen

  • Zhu et al. use a bilinear function to score each option against the attentive information; the option with the highest score is the predicted answer

  • Chen et al. calculate similarities among question-aware candidate, context-aware, and self-attended question representations using the dot product, in order to capture correlations among context, question, and options. The similarities are concatenated, fed into CNNs for feature extraction, and then into fully connected layers that calculate the score for each candidate
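
The two scoring functions mentioned above, cosine similarity and a bilinear form, can be sketched as follows. This is an illustration under stated assumptions, not the implementation from any of the cited papers: the vectors stand in for encoder outputs, and the matrix `W` is a hypothetical placeholder for a learned parameter (set to the identity here).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bilinear_score(u, W, v):
    """Bilinear form u^T W v, with W given as a list of rows."""
    Wv = [sum(w * x for w, x in zip(row, v)) for row in W]
    return sum(a * b for a, b in zip(u, Wv))


def select_option(context_vec, option_vecs, score_fn):
    """Score every option against the context; return the best index."""
    scores = [score_fn(context_vec, opt) for opt in option_vecs]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Toy vectors standing in for learned representations.
context_vec = [1.0, 0.0, 1.0]
options = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.9], [-1.0, 0.0, -1.0]]

idx_cos, cos_scores = select_option(context_vec, options, cosine_similarity)

# Hypothetical "learned" matrix; the identity reduces this to a dot product.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
idx_bil, bil_scores = select_option(
    context_vec, options, lambda c, o: bilinear_score(c, W, o)
)
print(idx_cos, idx_bil)
```

In a trained model, `W` would let the scorer weight interactions between dimensions differently, which cosine similarity cannot do.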

Span Extractor

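  • Span extraction can be seen as an extension of cloze tests: the answer is a span of words from the context rather than a single word or entity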

  • Wang and Jiang propose two models
    • Sequence model
      • This model outputs the positions where answer tokens appear in the context. The answer is built by successively selecting the token with the highest probability until the model stops producing answer tokens

      • This means the answer might not be a consecutive span, and might not even be a subsequence of the context

    • Boundary model
      • This model only predicts the start and end positions of the answer

      • It is simpler and performs better on the SQuAD dataset

  • There might be more than one plausible answer span in context
    • Xiong et al. proposed a dynamic pointing decoder that selects an answer span over multiple iterations. It uses an LSTM to estimate the start and end positions based on the answer prediction from the previous state
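
To make the boundary model concrete, the sketch below picks the span that maximizes the product of a start probability and an end probability, subject to the start not coming after the end. The constrained search and the optional `max_len` limit are common decoding practice assumed here for illustration; they are not described in the notes above, and the probabilities are made up.

```python
def best_span(start_probs, end_probs, max_len=None):
    """Return ((start, end), score) maximizing start_probs[i] * end_probs[j]
    over all pairs with i <= j and, if max_len is set, j - i < max_len."""
    n = len(start_probs)
    if n == 0 or n != len(end_probs):
        raise ValueError("start and end distributions must be non-empty and equal length")
    best, best_score = (0, 0), -1.0
    for i in range(n):
        last = n if max_len is None else min(n, i + max_len)
        for j in range(i, last):
            score = start_probs[i] * end_probs[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score


tokens = ["The", "cat", "sat", "on", "the", "mat"]
start = [0.05, 0.10, 0.05, 0.10, 0.60, 0.10]
end = [0.05, 0.05, 0.50, 0.05, 0.05, 0.30]

span, score = best_span(start, end)
answer = " ".join(tokens[span[0]:span[1] + 1])
print(span, answer)
```

In this toy input the most likely end position (index 2) comes before the most likely start position (index 4), so taking the two argmaxes independently would give an invalid span. The joint search returns a valid one instead.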

Answer Generator

  • Answers are free-form: they may not match any sequence in the context, or may need to be synthesized from multiple pieces of evidence in different paragraphs or contexts

  • S-NET (Tan et al.)
    • There are two stages: extraction and generation

    • The extraction module is a variant of R-NET

    • The generation module is a seq2seq architecture
      • The encoder produces the context and question representations
        • Start and end positions of evidence snippets are added to the context representations

      • For the decoder, the state of the GRUs is updated by previous context word representations and attentive intermediate information
        • We generate the answer by applying the softmax function to the output of the decoder

  • The answers generated by existing approaches may suffer from syntax errors and logical errors. Therefore, generation and extraction methods are usually used together
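
The final generation step, applying softmax to the decoder output, can be sketched as below. This is a simplified illustration, not S-NET itself: the per-step logits are hard-coded instead of coming from a GRU decoder, the vocabulary and `<eos>` token are hypothetical, and greedy selection of the most probable word is an assumption made to keep the example short.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(step_logits, vocab, eos="<eos>"):
    """At each step, pick the most probable word; stop at the end token."""
    words = []
    for logits in step_logits:
        probs = softmax(logits)
        word = vocab[max(range(len(probs)), key=probs.__getitem__)]
        if word == eos:
            break
        words.append(word)
    return words


# Hypothetical vocabulary and decoder outputs, one row of logits per step.
vocab = ["<eos>", "the", "mat", "cat"]
step_logits = [
    [0.1, 2.0, 0.3, 0.2],
    [0.0, 0.1, 3.0, 0.5],
    [4.0, 0.2, 0.1, 0.3],
]
answer_words = greedy_decode(step_logits, vocab)
print(" ".join(answer_words))
```

Unlike the span extractor, nothing here forces the output words to come from the context, which is why generated answers can be more flexible but also less reliable.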


