22.4 Architectures for Coreference Algorithms
What is the mention-pair architecture?
The mention-pair architecture is the simplest. It is built around a classifier that takes a pair of mentions, a candidate anaphor and a candidate antecedent, and decides whether they corefer. For each prior mention, the classifier computes the probability that it is the antecedent of the current mention.
A common method for building the training set for the mention-pair architecture is to take, for each anaphor, the closest antecedent as a positive example and pair the anaphor with every mention in between as negative examples. During inference, the classifier is run on each mention against its prior mentions, and the pairwise decisions are turned into clusters in one of two main ways: closest-first or best-first. In closest-first clustering, the prior mentions are examined from right to left (nearest first), and the first one whose probability exceeds a given threshold is linked to the current mention. In best-first clustering, the classifier is run on all prior mentions and the one with the highest probability is selected.
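The training-set construction and the two clustering strategies can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: mentions are represented by their index in document order, `score` is a hypothetical stand-in for the trained pairwise classifier, and applying the threshold in `best_first` as well is an assumption.

```python
from typing import Callable, List, Optional, Tuple

def build_training_pairs(closest_gold: List[Optional[int]]) -> List[Tuple[int, int, int]]:
    """closest_gold[i] is the index of mention i's closest gold antecedent, or None.
    Returns (candidate_antecedent, anaphor, label) triples."""
    pairs = []
    for i, a in enumerate(closest_gold):
        if a is None:
            continue  # non-anaphoric mention: no training pairs
        pairs.append((a, i, 1))          # closest antecedent is the positive example
        for j in range(a + 1, i):
            pairs.append((j, i, 0))      # mentions in between are negatives
    return pairs

def closest_first(i: int, score: Callable[[int, int], float],
                  threshold: float = 0.5) -> Optional[int]:
    """Scan prior mentions right to left; link to the first one above the threshold."""
    for j in range(i - 1, -1, -1):
        if score(j, i) > threshold:
            return j
    return None

def best_first(i: int, score: Callable[[int, int], float],
               threshold: float = 0.5) -> Optional[int]:
    """Score every prior mention; link to the most probable one above the threshold."""
    best, best_p = None, threshold
    for j in range(i):
        p = score(j, i)
        if p > best_p:
            best, best_p = j, p
    return best

# Toy scores (made up for illustration) for anaphor 3 against mentions 0..2.
toy = {(0, 3): 0.9, (1, 3): 0.2, (2, 3): 0.6}
toy_score = lambda j, i: toy.get((j, i), 0.0)
print(closest_first(3, toy_score), best_first(3, toy_score))
```

With these toy scores the two strategies disagree: closest-first stops at the nearest mention that clears the threshold, while best-first keeps looking for the highest-scoring one.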
What are the problems with the mention-pair architecture?
It doesn’t directly compare candidate antecedents with each other, so it is never trained to decide which of two likely antecedents is the better one
It ignores the discourse model, since it looks only at mentions and not at entities
What is the mention-rank architecture?
The mention-ranking model directly compares the candidate antecedents with each other and chooses the highest-scoring antecedent for each anaphor. Research has found that a model tends to perform better if it learns anaphoricity detection and coreference jointly, with a single loss.
During inference, given a mention, the model computes a softmax probability over all the candidate antecedents. Training a mention-ranking model is harder because, for each anaphor, we don’t know which of its possible gold antecedents should be used for training. Early work used rules to select the gold antecedent. Mention-ranking models can be implemented with hand-built features or with neural learning.
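The inference step can be illustrated with a short sketch. The scores below are invented numbers standing in for the output of a trained scorer, and the `"NONE"` candidate (a dummy antecedent that lets the same softmax also decide that a mention is not anaphoric) is an assumption added for illustration.

```python
import math
from typing import List, Tuple

def antecedent_distribution(scores: List[float]) -> List[float]:
    """Softmax over the raw scores of all candidate antecedents."""
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pick_antecedent(candidates: List[str], scores: List[float]) -> Tuple[str, float]:
    """Return the highest-probability candidate and its probability."""
    probs = antecedent_distribution(scores)
    best = max(range(len(candidates)), key=lambda k: probs[k])
    return candidates[best], probs[best]

# Hypothetical candidates and scores for one anaphor.
candidates = ["NONE", "Victoria Chen", "the company"]
scores = [0.0, 2.0, 1.0]
print(pick_antecedent(candidates, scores))
```

Because all candidates share one normalisation, raising the score of one antecedent lowers the probability of the others, which is the direct comparison that the mention-pair classifier lacks.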
What are entity-based models?
Entity-based models link each mention to a previous discourse entity (a cluster of mentions) rather than to a previous mention. You can convert a mention-ranking model into an entity-ranking model by training the classifier to make decisions over clusters of mentions rather than individual mentions. Neural methods can use an RNN to encode the mentions within a cluster and build a cluster representation. In practice, cluster-level information has not led to large gains over mention-ranking models, so entity-based models remain less common.
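To make the idea of ranking entities rather than mentions concrete, here is a deliberately simplified sketch. It swaps the RNN encoder for plain mean-pooling of mention vectors and scores with a dot product; both choices, along with the toy vectors and entity names, are assumptions for illustration only.

```python
from typing import Dict, List

Vector = List[float]

def cluster_representation(mention_vectors: List[Vector]) -> Vector:
    """Mean-pool the mention vectors of one entity (stand-in for an RNN encoder)."""
    n = len(mention_vectors)
    dim = len(mention_vectors[0])
    return [sum(v[d] for v in mention_vectors) / n for d in range(dim)]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def rank_entities(mention: Vector, clusters: Dict[str, List[Vector]]) -> str:
    """Return the entity whose cluster representation scores highest for the mention."""
    scores = {name: dot(mention, cluster_representation(vs))
              for name, vs in clusters.items()}
    return max(scores, key=scores.get)

# Two hypothetical entities with made-up 2-dimensional mention vectors.
clusters = {"e1": [[1.0, 0.0], [0.8, 0.2]], "e2": [[0.0, 1.0]]}
print(rank_entities([1.0, 0.1], clusters))
```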
22.5 Classifiers using hand-built features
What are the three types of coreference resolution features?
Features of the anaphor
Features of the candidate antecedent
Features of the relationship between the pair
Note that entity-based models have two additional feature classes:
Features of all mentions from the antecedent’s entity cluster
Features of the relation between the anaphor and the mentions in the antecedent entity cluster
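As a rough illustration of the three mention-level feature classes, the sketch below builds one feature dictionary for an antecedent/anaphor pair. The `Mention` fields and the specific features chosen are hypothetical examples, not a fixed inventory.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class Mention:
    text: str
    head: str            # head word of the mention
    sentence: int        # index of the sentence containing the mention
    is_pronoun: bool = False

def extract_features(antecedent: Mention, anaphor: Mention) -> Dict[str, Union[bool, int]]:
    return {
        # Features of the anaphor
        "ana_is_pronoun": anaphor.is_pronoun,
        "ana_num_words": len(anaphor.text.split()),
        # Features of the candidate antecedent
        "ant_is_pronoun": antecedent.is_pronoun,
        "ant_num_words": len(antecedent.text.split()),
        # Features of the relationship between the pair
        "pair_exact_match": antecedent.text.lower() == anaphor.text.lower(),
        "pair_head_match": antecedent.head.lower() == anaphor.head.lower(),
        "pair_sentence_distance": anaphor.sentence - antecedent.sentence,
    }

ant = Mention("Victoria Chen", "Chen", sentence=0)
ana = Mention("she", "she", sentence=1, is_pronoun=True)
print(extract_features(ant, ana))
```

An entity-based model would extend this with features computed over every mention in the antecedent's cluster, for example whether any mention in the cluster has a matching head word.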
The figure below shows commonly used features for each of these feature types:
Which of the features listed above are the most useful?
Prior work has found the following features to be very useful:
Exact string match
Entity headword agreement
Exact attribute match and i-within-i (for pronouns)
Word inclusion and cosine (for nominals and proper names)
It is also important to use conjunctions of features: results show that classifiers built on conjunctions of multiple features perform better than those built on the individual features alone.
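One simple way to form feature conjunctions is to add an indicator for every pair of feature values, as in the sketch below. The feature names are hypothetical, and pairing every feature with every other is just one possible scheme; real systems typically pick conjunctions more selectively.

```python
from itertools import combinations
from typing import Dict

def add_conjunctions(features: Dict[str, object]) -> Dict[str, object]:
    """Return the original features plus one indicator per pair of feature values."""
    out = dict(features)
    # Sort by name so the conjunction keys are deterministic.
    for (n1, v1), (n2, v2) in combinations(sorted(features.items()), 2):
        out[f"{n1}={v1}&{n2}={v2}"] = 1
    return out

feats = {"ana_type": "pronoun", "head_match": False, "same_sentence": True}
for name in add_conjunctions(feats):
    print(name)
```

A conjunction such as "the anaphor is a pronoun and the pair is in the same sentence" lets a linear classifier weight the combination differently from either feature on its own.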