22.2 Coreference Tasks and Datasets
Describe the coreference resolution task.
Given raw text as input, the model must identify all the entities and the coreference links between them; the generated links are then evaluated against ground-truth links. Datasets differ in how mentions and links are annotated: some datasets exclude singletons, while others provide gold mentions, in which case the model only needs to cluster the given mentions.
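One way to picture link-based evaluation is to represent each coreference chain as a cluster of mention spans, expand clusters into pairwise links, and score predicted links against gold links. The sketch below is illustrative only (the function names and span representation are assumptions, and real scorers use metrics such as MUC, B-cubed, and CEAF rather than raw link precision/recall):

```python
# Illustrative sketch: scoring predicted coreference links against gold links.
# Clusters are sets of mention spans (start, end); a link is any pair of
# mentions in the same cluster.
from itertools import combinations

def cluster_links(clusters):
    """Expand each cluster into its set of pairwise coreference links."""
    links = set()
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            links.add((a, b))
    return links

def link_prf(pred_clusters, gold_clusters):
    """Precision, recall, and F1 over pairwise links."""
    pred = cluster_links(pred_clusters)
    gold = cluster_links(gold_clusters)
    correct = len(pred & gold)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For example, if the gold chain contains three mentions but the model links only two of them, link precision is perfect while link recall is one third.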
Describe the OntoNotes coreference dataset.
OntoNotes is a hand-annotated Chinese and English coreference dataset of roughly one million words, covering document types such as newswire, magazine articles, broadcast news and conversations, and web data. OntoNotes does not include singletons; noun phrases that are coreferent are marked as mentions. Appositive clauses are included within a mention but are not marked as separate mentions.
22.3 Mention Detection
What does mention detection entail?
It involves identifying all the mentions in a given document, i.e., finding every span of text that constitutes a mention. Many systems follow a recall-first, filter-later approach. A simple approach is to use named entity taggers to extract all spans that are noun phrases, possessive pronouns, or named entities. The filtering step can be rule-based; earlier systems used regular expressions to filter out expletive pronouns.
It's common for mention-detection rules to be tailored to a specific dataset. For example, mention detection algorithms for OntoNotes commonly begin with a first pass of:
Take all NPs, possessive pronouns, and named entities
Remove numeric values, mentions embedded in larger mentions, adjectives, and stop words
Remove pleonastic “it” using regular expressions
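The first pass above can be sketched as a simple filter over candidate spans. This is a toy illustration under stated assumptions: candidate spans are assumed to come from an upstream parser/NER tagger, the stop-word list and the pleonastic-"it" regular expression are made up for the example, and the adjective rule (which needs part-of-speech tags) is omitted:

```python
import re

# Illustrative stop-word list and pleonastic-"it" pattern (both assumptions,
# not the actual OntoNotes rules).
STOP_WORDS = {"there", "etc", "hmm", "'s"}
PLEONASTIC_IT = re.compile(
    r"\bit\b\s+(is|was|seems|appears)\b.*\b(that|to)\b", re.IGNORECASE
)

def filter_mentions(candidates, sentence):
    """Apply the removal rules to a list of (start, end, text) token spans."""
    kept = []
    spans = [(s, e) for s, e, _ in candidates]
    for start, end, text in candidates:
        if re.fullmatch(r"[\d.,%]+", text):        # remove numeric values
            continue
        if text.lower() in STOP_WORDS:             # remove stop words
            continue
        if any(s <= start and end <= e and (s, e) != (start, end)
               for s, e in spans):                 # embedded in a larger mention
            continue
        if text.lower() == "it" and PLEONASTIC_IT.search(sentence):
            continue                               # remove pleonastic "it"
        kept.append((start, end, text))
    return kept
```

On a sentence like "It seems that the old man won 40%", this pass would drop the pleonastic "It", the numeric "40%", and any span nested inside "the old man".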
What’s an anaphoricity classifier and a discourse-new classifier?
An anaphoricity classifier detects whether a noun phrase is an anaphor. A discourse-new classifier detects whether a mention is discourse-new and thus a potential antecedent for a later anaphor. These classifiers can be used as filters; however, research has found that this kind of filtering tends to lead to poor performance. If the anaphoricity classifier's threshold is too high, too many mentions are filtered out, but if it's too low, many pleonastic mentions survive (a precision-versus-recall tradeoff).
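The precision/recall tradeoff can be made concrete with a toy thresholded filter. The mentions, probabilities, and function name below are all made up for illustration; any probabilistic classifier could supply the scores:

```python
# Illustrative sketch: using an anaphoricity classifier as a hard filter.
# The threshold trades precision against recall, which is the failure mode
# noted above.

def filter_by_anaphoricity(mentions, scores, threshold):
    """Keep mentions whose (assumed) anaphoricity probability clears the threshold."""
    return [m for m, p in zip(mentions, scores) if p >= threshold]

mentions = ["it", "John", "the car", "it"]
scores = [0.1, 0.9, 0.7, 0.4]  # made-up classifier probabilities

# A high threshold (e.g. 0.5) drops the second "it" even if it were a true
# anaphor (recall loss); a very low threshold (e.g. 0.05) keeps the
# pleonastic "it" (precision loss).
```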
What's the modern approach to mention detection?
A single end-to-end neural network that performs mention detection, anaphoricity determination, and coreference resolution jointly. The network computes a referential score for each candidate mention and a coreference score between pairs of mentions, and combines these scores to derive the final decision. However, accurately detecting referential mentions is still an unsolved problem.
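The score combination can be sketched in the style of end-to-end span-ranking models (e.g. Lee et al., 2017), where span j's score for taking span i as its antecedent is the sum of the two referential scores and the pairwise coreference score, and a dummy antecedent scored at zero lets non-referential spans link to nothing. The function below is a simplified sketch of that decision rule, not the full model:

```python
# Illustrative sketch of combining referential and coreference scores.
# s_m[i] is the referential (mention) score of span i; s_a[i][j] is the
# pairwise coreference score between spans i and j.

def best_antecedents(s_m, s_a):
    """For each span j, pick the antecedent i < j maximizing
    s_m[i] + s_m[j] + s_a[i][j], or the dummy (None) if nothing beats 0."""
    decisions = []
    for j in range(len(s_m)):
        best, best_score = None, 0.0  # dummy antecedent scored at 0
        for i in range(j):
            score = s_m[i] + s_m[j] + s_a[i][j]
            if score > best_score:
                best, best_score = i, score
        decisions.append(best)
    return decisions
```

Because the mention scores enter every pairwise score, spans the network judges non-referential are pushed toward the dummy antecedent, which is how detection and linking are decided jointly rather than by a separate filter.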