22.1 Coreference Phenomena: Linguistic Background

What are the four types of mentions / referring expressions?
  1. Indefinite Noun Phrases

  2. Definite Noun Phrases

  3. Pronouns

  4. Names

Indefinite noun phrases introduce entities that are new to the hearer into the discourse. They are generally marked with determiners such as “a” / “an” / “this” and quantifiers such as “some”. Definite noun phrases refer to entities that the hearer can identify, either because they have been mentioned previously or because they are part of the hearer’s beliefs about the world. They are usually marked by “the”.

Pronouns are used for entities that are highly salient in the discourse. Pronouns can be used in cataphora, which is when a mention appears before its referent is introduced. Pronouns can also be bound in quantified contexts, as in “Every student handed in her essay”, where “her” refers to no single student. “This” and “that” are known as demonstrative pronouns, and they can appear either alone or as determiners. Some languages allow a zero anaphor (zero pronoun), where the pronoun is omitted entirely.

Names can be used to refer to both new and old entities in the discourse.

What are the three kinds of entities?
  1. New Noun Phrases (NPs)

  2. Old Noun Phrases (NPs)

  3. Inferrables

New NPs are split into brand new NPs and unused NPs. Brand new NPs are entities that are both discourse-new and hearer-new, whereas unused NPs are discourse-new but hearer-old (for example, a well-known country or person). Old NPs, also known as evoked NPs, refer to entities that are already in the discourse model. Lastly, inferrables refer to entities that are neither hearer-old nor discourse-old, but which the hearer can infer by reasoning from other entities in the discourse.
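The taxonomy above can be read as a small decision table over two flags (discourse-old, hearer-old) plus inferability. The following is a minimal sketch of that reading; the names `InfoStatus` and `information_status` and the boolean inputs are assumptions made for illustration, not part of any library or of the original text.

```python
from enum import Enum


class InfoStatus(Enum):
    BRAND_NEW = "brand new"      # discourse-new and hearer-new
    UNUSED = "unused"            # discourse-new but hearer-old
    OLD = "old (evoked)"         # already in the discourse model
    INFERRABLE = "inferrable"    # new, but derivable from other entities


def information_status(discourse_old: bool, hearer_old: bool,
                       inferable: bool = False) -> InfoStatus:
    """Map the flags described in the text to an information status.

    Hypothetical helper: in practice these flags would have to come
    from an annotator or an upstream model.
    """
    if discourse_old:
        return InfoStatus.OLD
    if hearer_old:
        return InfoStatus.UNUSED
    if inferable:
        return InfoStatus.INFERRABLE
    return InfoStatus.BRAND_NEW


# A famous country mentioned for the first time: discourse-new, hearer-old.
print(information_status(discourse_old=False, hearer_old=True).value)
```

The order of the checks matters: an entity already in the discourse model is treated as old regardless of the other flags.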

In what circumstances can you refer to a referent with little linguistic material?

You can refer to a referent with little linguistic material when the referent is very accessible. The more salient the referent, the less linguistic material is needed (on average) to refer to the discourse entity.

What are the four types of structures that are not counted as mentions in coreference tasks, making the task more complicated?
  1. Appositives

  2. Predicative and Prenominal NPs

  3. Expletives

  4. Generics

Appositives are NPs that appear next to a head noun phrase and are not referring expressions. They are a supplementary description of the head NP. For example, in “Ryan Ong, Data Scientist at X company,….”, “Ryan Ong” is the head NP and is a mention, whereas “Data Scientist at X company” is an appositional NP and is not a mention. Predicative NPs describe the properties of the head noun rather than referring to a distinct entity. Expletives are pronouns that do not refer to any entity, such as “it” in “It is raining”. Generics refer to a general category of objects rather than to a specific entity.

What are the linguistic properties that govern the coreference relation?
  1. Number agreement, which means that mentions and their referents must agree in number. If the antecedent is plural, then the anaphor can’t be singular. However, algorithms cannot enforce number agreement too strictly, because of cases such as singular “they”.

  2. Person agreement, which means a pronoun’s antecedent must agree with the pronoun in person. Therefore, a third person pronoun must have a third person antecedent, although quotations can cause exceptions.

  3. Gender / noun class agreement, where pronouns have to agree with the grammatical gender of the nouns they refer to. For example, in English a male referent is referred to with “he, him, his”. This agreement is complex and may require world knowledge about the entity.

  4. Binding theory constraints. These are syntactic constraints on the relation between a mention and its antecedent. Reflexive pronouns like “himself” corefer with the subject of the most immediate clause that contains them.

  5. Recency, where entities introduced in recent utterances are more salient than those introduced in earlier utterances.

  6. Grammatical role, where entities mentioned in the subject position are more salient than entities in the object position, which are in turn more salient than entities in oblique positions. For example, “Ryan went to the bar with John. He called for a glass of rum.” [he = Ryan]

  7. Verb semantics, where some verbs emphasise one of their arguments (entities), biasing the interpretation of subsequent pronouns. This is the link between implicit causality and saliency.

  8. Selectional restrictions, where semantic knowledge plays a role in referent preference. For example, “I ate the noodles in my new bowl after cooking it for 30 minutes.” Here, “it” could refer to “noodles” or “new bowl”, but because of “ate”, which requires an edible object, we can eliminate “new bowl” as an option.
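As an illustration of how some of these properties interact, here is a minimal sketch that filters candidate antecedents by number, person and gender agreement, then ranks the survivors by recency and grammatical role. The `Mention` class, its hand-supplied feature values and the `rank_antecedents` function are all assumptions made for this example; a real system would derive such features from a parser and would typically learn the weighting rather than hard-code it.

```python
from dataclasses import dataclass
from typing import List, Optional

# Subject > object > oblique, as in the grammatical role preference.
ROLE_RANK = {"subject": 0, "object": 1, "oblique": 2}


@dataclass
class Mention:
    text: str
    sentence: int                 # index of the sentence containing the mention
    role: str                     # "subject", "object" or "oblique"
    number: Optional[str] = None  # "sg" / "pl"; None means unknown
    person: Optional[int] = None  # 1, 2 or 3; None means unknown
    gender: Optional[str] = None  # "masc" / "fem" / "neut"; None means unknown


def compatible(a, b) -> bool:
    # Unknown features never block a match, so agreement is not
    # enforced more strictly than the available evidence allows.
    return a is None or b is None or a == b


def agrees(pronoun: Mention, cand: Mention) -> bool:
    return (compatible(pronoun.number, cand.number)
            and compatible(pronoun.person, cand.person)
            and compatible(pronoun.gender, cand.gender))


def rank_antecedents(pronoun: Mention, candidates: List[Mention]) -> List[Mention]:
    """Return agreeing candidates, most preferred first.

    Candidates are assumed to precede the pronoun in the text.
    """
    viable = [c for c in candidates
              if c.sentence <= pronoun.sentence and agrees(pronoun, c)]
    return sorted(
        viable,
        key=lambda c: (pronoun.sentence - c.sentence,            # recency
                       ROLE_RANK.get(c.role, len(ROLE_RANK))),   # role
    )


# "Ryan went to the bar with John. He called for a glass of rum."
ryan = Mention("Ryan", 0, "subject", "sg", 3, "masc")
bar = Mention("the bar", 0, "oblique", "sg", 3, "neut")
john = Mention("John", 0, "oblique", "sg", 3, "masc")
he = Mention("He", 1, "subject", "sg", 3, "masc")

ranked = rank_antecedents(he, [john, bar, ryan])
print([m.text for m in ranked])  # ['Ryan', 'John']
```

Here “the bar” is removed by gender agreement, and “Ryan” outranks “John” because both are equally recent but “Ryan” is in subject position. Binding constraints, verb semantics and selectional restrictions are left out of this sketch because they need syntactic and semantic information that simple feature matching does not capture.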


