What are coreference resolution, mention, referent, and corefer?

Coreference resolution is a subarea of NLP that aims to determine whether two mentions corefer. For example, in “Ryan is happy because he’s learning about NLP”, both “Ryan” and “he” are mentions: expressions in the text that refer to the discourse entity Ryan. That entity is the referent. Two or more mentions corefer when they refer to the same referent. Coreference resolution is an important component of language understanding.

What is a discourse model?

A discourse model is a mental model that the system continuously builds and updates as it interprets a text. An entity is “evoked” when it is first mentioned in a discourse. Any later mention of the same referent “accesses” that representation. See the figure below for a clearer understanding of the discourse model.
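The evoke/access distinction can be sketched in a few lines of Python. This is only a toy illustration: the names `Entity`, `DiscourseModel`, `evoke`, and `access` are hypothetical, and a real system would have to decide for itself which entity a later mention accesses.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A discourse entity and the mentions that have referred to it so far."""
    entity_id: int
    mentions: list = field(default_factory=list)


class DiscourseModel:
    """Toy discourse model: a growing collection of entity representations."""

    def __init__(self):
        self.entities = {}

    def evoke(self, mention):
        """First mention of an entity: create a new representation for it."""
        entity_id = len(self.entities)
        self.entities[entity_id] = Entity(entity_id, [mention])
        return entity_id

    def access(self, entity_id, mention):
        """Later mention of a known entity: update the existing representation."""
        self.entities[entity_id].mentions.append(mention)


model = DiscourseModel()
ryan = model.evoke("Ryan")   # "Ryan" introduces the entity
model.access(ryan, "he")     # "he" refers back to it
print(model.entities[ryan].mentions)  # ['Ryan', 'he']
```

Here the caller passes the entity id to `access` by hand; working out that id automatically is exactly the coreference resolution problem.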

What are anaphora, anaphor, antecedent, singleton, and coreference chain?

Anaphora is reference to an entity that has previously been introduced into the discourse. The referring mention is the anaphor, and the prior mention it corefers with is its antecedent. A singleton is an entity that has only a single mention in a text. A coreference chain (or cluster) is the set of coreferring mentions of the same entity. Note that mentions can be nested, and nested mentions can belong to different coreference chains.
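One common way to picture these terms is to store each mention as a token span and each coreference chain as a list of spans. The sketch below uses hand-written annotations for the earlier example sentence (an assumption for illustration, not the output of any model), and the helper names `span_text`, `antecedent`, and `singletons` are hypothetical.

```python
# Each mention is a (start, end) token span, end exclusive.
tokens = ["Ryan", "is", "happy", "because", "he", "'s", "learning", "about", "NLP"]

# Hand-annotated coreference chains (illustrative assumption).
chains = [
    [(0, 1), (4, 5)],  # "Ryan", "he"
    [(8, 9)],          # "NLP"
]


def span_text(span):
    """Return the surface text of a mention span."""
    start, end = span
    return " ".join(tokens[start:end])


def antecedent(chain, anaphor):
    """Return the closest mention in the chain that starts before the anaphor."""
    prior = [m for m in chain if m[0] < anaphor[0]]
    return max(prior, default=None)


def singletons(all_chains):
    """Return the mentions whose chain contains no other mention."""
    return [chain[0] for chain in all_chains if len(chain) == 1]


print(span_text(antecedent(chains[0], (4, 5))))    # Ryan
print([span_text(m) for m in singletons(chains)])  # ['NLP']
```

The first mention of a chain has no antecedent, so `antecedent` returns `None` for it.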

What are the two tasks of coreference resolution?
  1. Identify the mentions

  2. Cluster them into coreference chains

Two mentions corefer if they refer to the same discourse entity. However, a mention can be ambiguous (“apple” could refer to the company or the fruit), so entity linking is required to map a discourse entity to a real-world entity.
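The two tasks listed above can be sketched as a tiny rule-based pipeline. Everything here is a deliberately naive assumption made for illustration: mentions are single tokens that are either pronouns or capitalized words, a pronoun joins the chain of the most recent mention, and a name joins an earlier chain that contains the same string. Real systems use far richer features, and the function names are hypothetical.

```python
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}


def find_mentions(tokens):
    """Task 1: return indices of tokens treated as mentions."""
    return [i for i, tok in enumerate(tokens)
            if tok.lower() in PRONOUNS or tok[0].isupper()]


def cluster_mentions(tokens, mentions):
    """Task 2: group mention indices into coreference chains."""
    chains = []
    for i in mentions:
        tok = tokens[i]
        target = None
        if tok.lower() in PRONOUNS:
            # Naive rule: a pronoun joins the chain of the most recent mention.
            if chains:
                target = max(chains, key=lambda chain: chain[-1])
        else:
            # Naive rule: a name joins an earlier chain with the same string.
            for chain in chains:
                if any(tokens[j] == tok for j in chain):
                    target = chain
                    break
        if target is None:
            chains.append([i])  # start a new chain for a new entity
        else:
            target.append(i)
    return chains


tokens = "Ryan is happy because he is learning about NLP".split()
mentions = find_mentions(tokens)
print(mentions)                            # [0, 4, 8]
print(cluster_mentions(tokens, mentions))  # [[0, 4], [8]]
```

The capitalization rule would also pick up any capitalized sentence-initial word, which is one reason mention detection is treated as a task in its own right.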

What are other coreferences?

Event coreference is the task of determining whether two event mentions refer to the same event. Event mentions are much harder to detect than entity mentions because they can be verbal as well as nominal. Another type is discourse deixis, where an anaphor refers back to a discourse segment.

Ryan


Data Scientist
