New Trends in MRC

Knowledge-based MRC (KBMRC)

  • In human reading comprehension, we might use common sense when we can’t answer a question with the given context

  • External knowledge is crucial, and the ability to use it is one of the biggest gaps between MRC and human reading comprehension

  • The difference between KBMRC and MRC lies in the inputs: in addition to the context and question, KBMRC also takes related knowledge extracted from knowledge bases

  • MCScript is an example dataset for KBMRC. It covers human daily activities, and answering some of its questions requires common-sense knowledge

  • Key challenges:
    • Relevant External Knowledge Retrieval
      • There are many different types of knowledge in the KB, and entities may be ambiguous due to polysemy (“apple” could mean the fruit or the organisation)

      • Extracting only the relevant knowledge heavily influences the performance of answer prediction (a minimal retrieval sketch follows this list of challenges)

    • External Knowledge Integration
      • External knowledge has a different structure from the text in the context and question. How to effectively encode such knowledge remains an ongoing challenge
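To make the retrieval challenge concrete, here is a minimal sketch, assuming WordNet (via NLTK) as the knowledge base; the function name, the sense limit, and the relation types used are illustrative choices, not taken from the cited work.

```python
# Minimal sketch: gather WordNet relations for a word (requires nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def related_words(word, max_senses=3):
    """Collect words related to `word` through direct WordNet relations."""
    related = set()
    for synset in wn.synsets(word)[:max_senses]:             # polysemy: a word has several senses
        related.update(l.name() for l in synset.lemmas())    # synonyms
        for hyper in synset.hypernyms():                      # more general concepts
            related.update(l.name() for l in hyper.lemmas())
        for hypo in synset.hyponyms():                        # more specific concepts
            related.update(l.name() for l in hypo.lemmas())
    related.discard(word)
    return related

# "apple" mixes the fruit and tree senses -- the retrieval step still has to
# decide which sense is actually relevant to the given context
print(sorted(related_words("apple"))[:10])
```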

  • Research work
    • Long et al. propose the rare entity prediction task, which involves predicting missing named entities. The context alone is not sufficient for this, which forces the model to use external knowledge from a KB

    • Yang and Mitchell use an attention mechanism to determine how relevant the knowledge is to the context

    • Mihaylov and Frank and Sun et al. use key-value memory networks. All related knowledge is first selected from the KB and stored in memory slots as key-value pairs. The keys are then matched against the query, and the corresponding values are weighted and summed to compute the final relevant knowledge representation (see the sketch after this list)

    • Wang and Jiang propose a data enrichment method using semantic relations in WordNet. For each word in the context and question, they look for words that have a direct or indirect semantic relation to it. The positions of these related words are then fed to the MRC model to assist answer prediction
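The key-value reading step described above can be sketched in a few lines. The dot-product matching, shapes, and names below are illustrative assumptions rather than the exact formulation of the cited papers.

```python
# Minimal sketch of key-value memory attention over pre-encoded knowledge.
import torch
import torch.nn.functional as F

def read_memory(query, keys, values):
    """query: (d,); keys, values: (num_slots, d).
    Keys are matched against the query, and the matching weights are used
    to sum the values into a single knowledge representation."""
    scores = keys @ query                  # similarity of each key to the query, (num_slots,)
    weights = F.softmax(scores, dim=0)     # normalise into attention weights
    return weights @ values                # weighted sum of values, (d,)

# toy usage: 4 memory slots, hidden size 8
query = torch.randn(8)
keys, values = torch.randn(4, 8), torch.randn(4, 8)
print(read_memory(query, keys, values).shape)   # torch.Size([8])
```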

MRC with Unanswerable Questions

  • Where questions have no answers given the context

  • An MRC model should be able to identify such unanswerable questions. The process consists of two subtasks: answerability detection and reading comprehension

  • Two challenges:
    • Unanswerable Question Detection
      • The model should be able to judge which questions are impossible to answer based on the given context

    • Plausible Answer Discrimination
      • The MRC model must be able to verify predicted answers and distinguish plausible but incorrect answers from correct ones

  • Methods to tackle the two challenges
    • No-answer cases
      • Employ a shared-normalisation operation between a no-answer score and an answer span score

      • Levy et al. add an extra trainable bias to the confidence scores of the start and end positions and obtain a new probability of no answer. If this probability is higher than the probability of the best span, the question is unanswerable. Alternatively, a global confidence threshold can be set: if the predicted answer's confidence falls below the threshold, the model labels the question as unanswerable. This method does not assess whether predicted answers are correct (a minimal sketch of shared normalisation with a no-answer score follows this list of methods)

    • No-answer option by padding
      • Tan et al. add a padding position to the original passage to determine whether the question is answerable. When the model predicts that position, it refuses to give an answer

    • Unanswerable question detection
      • Hu et al. propose two types of auxiliary loss:
        • Independent span loss to predict plausible answers regardless of the answerability of the question

        • Independent no-answer loss, to alleviate the conflict between answer extraction and no-answer detection tasks

      • Answer verification
        • A sequential architecture treats the question, the answer, and the context sentence containing the answer as one sequence and feeds it to a transformer to predict the no-answer probability

        • An interactive architecture calculates the correlation between the question and the answer sentence in the context to classify whether the question is answerable

        • Integrating the two architectures above by concatenating their outputs as joint representations yields better performance
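As referenced above, here is a minimal sketch of shared normalisation with a trainable no-answer score: one softmax is applied over all candidate span scores plus the no-answer score. All names, shapes, and the span enumeration are illustrative assumptions, not the exact formulation of the cited work.

```python
import torch
import torch.nn.functional as F

def no_answer_decision(start_logits, end_logits, z_bias):
    """start_logits, end_logits: (seq_len,) span boundary scores;
    z_bias: trainable scalar acting as the no-answer score."""
    # score of each candidate span is the sum of its start and end logits
    span_scores = start_logits[:, None] + end_logits[None, :]
    valid = torch.triu(torch.ones_like(span_scores)).bool()    # keep only spans with start <= end
    # shared normalisation: one softmax over all span scores plus the no-answer score
    probs = F.softmax(torch.cat([span_scores[valid], z_bias.view(1)]), dim=0)
    return probs[:-1].max(), probs[-1]                          # best-span prob, no-answer prob

start_logits, end_logits = torch.randn(20), torch.randn(20)
z_bias = torch.nn.Parameter(torch.zeros(1))
p_span, p_na = no_answer_decision(start_logits, end_logits, z_bias)
# predict "unanswerable" when the no-answer probability beats the best span
print("unanswerable" if p_na > p_span else "answerable")
```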

  • Sun et al. use multi-task learning to jointly train answer prediction, no-answer detection, and answer validation
    • They use a universal node to encode passage and question information, which is then integrated with the question representations and the answer-position-aware passage representations

    • The fused representations are passed through a linear classification layer to determine whether the question is answerable (a rough multi-task sketch follows)
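As a rough illustration of this multi-task setup, the sketch below models the universal node as a single learnable embedding prepended to the encoded sequence and attaches three task heads. The module names, the GRU encoder, and the plain summed loss are illustrative assumptions, not the actual architecture of Sun et al.

```python
import torch
import torch.nn as nn

class JointNoAnswerModel(nn.Module):
    """Shared encoder with three heads: answer span prediction, no-answer
    detection, and answer verification (assumed, simplified design)."""
    def __init__(self, vocab_size=30000, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.universal = nn.Parameter(torch.randn(1, 1, hidden))  # learnable "universal node"
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.span_head = nn.Linear(hidden, 2)     # start/end logits per token (answer prediction)
        self.na_head = nn.Linear(hidden, 1)       # no-answer detection from the universal node
        self.verify_head = nn.Linear(hidden, 1)   # answer verification from the universal node

    def forward(self, token_ids):
        x = self.embed(token_ids)                                    # (B, L, H)
        u = self.universal.expand(x.size(0), -1, -1)                 # prepend the shared node
        enc, _ = self.encoder(torch.cat([u, x], dim=1))              # (B, L+1, H)
        node, tokens = enc[:, 0], enc[:, 1:]
        start_logits, end_logits = self.span_head(tokens).unbind(-1) # (B, L) each
        return start_logits, end_logits, self.na_head(node), self.verify_head(node)

model = JointNoAnswerModel()
ids = torch.randint(0, 30000, (2, 16))                               # toy batch of token ids
starts, ends, no_answer_logit, verify_logit = model(ids)
# joint training would simply sum the three task losses (weighting is a design choice):
# loss = span_loss + no_answer_loss + verification_loss
```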
