Entity linking is usually more accurate when it is performed jointly across a document. A general approach to collective entity linking is to define a compatibility score between candidate entities and optimise the following global objective:
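A common way to write such a global objective is shown below (a sketch: the symbols ψ_local and ψ_c are generic placeholder score functions, assumed here rather than taken from the original, where ψ_local scores each mention against a candidate entity and ψ_c is the pairwise compatibility score):

```latex
\hat{y} = \arg\max_{y_1, \ldots, y_N} \; \sum_{i=1}^{N} \psi_{\text{local}}(y_i, x_i) \; + \; \sum_{i < j} \psi_c(y_i, y_j)
```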

The compatibility score can be computed in the following ways:

  • Wikipedia assigns high-level categories to entities; entity pairs that share many categories receive a high compatibility score

  • Compatibility can be measured by the number of incoming hyperlinks shared by the Wikipedia pages for the two entities

  • The inner product of the two entities’ embeddings can serve as the compatibility score

  • Use a probabilistic topic model to compute a non-pairwise compatibility score. Each latent topic is a probability distribution over entities, and each document has a probability distribution over topics. Each entity helps determine the document’s distribution over topics, which means the topics can help resolve ambiguity in entity mentions. Inference can be performed with various sampling techniques
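The first three pairwise scores above can be sketched in a few lines. This is a minimal illustration; the function names and input formats are assumptions for the example, not part of the original:

```python
import numpy as np

def category_compatibility(cats_a, cats_b):
    """Compatibility from shared Wikipedia categories: here simply
    the number of categories the two entities have in common."""
    return len(set(cats_a) & set(cats_b))

def inlink_compatibility(inlinks_a, inlinks_b):
    """Compatibility from incoming hyperlinks shared by the two
    entities' Wikipedia pages."""
    return len(set(inlinks_a) & set(inlinks_b))

def embedding_compatibility(vec_a, vec_b):
    """Compatibility as the inner product of two entity embeddings."""
    return float(np.dot(vec_a, vec_b))
```

In practice these raw counts are usually normalised (e.g. Jaccard similarity for the set-based scores) so that scores for different entity pairs are comparable.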

Collective entity linking is NP-hard, so exact optimisation is intractable. This means we must turn to approximate inference techniques such as integer linear programming, Gibbs sampling, and/or graph-based algorithms.
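As one illustration of approximate inference, a simple coordinate-ascent scheme (in the spirit of iterated conditional modes, not a specific published algorithm) repeatedly re-links each mention to its best candidate given the current assignment of all the other mentions. The helper names and score functions here are hypothetical:

```python
def collective_link(candidates, local_score, compat):
    """Approximately optimise the global objective by coordinate ascent.

    candidates: one list of candidate entities per mention.
    local_score(i, e): score of linking mention i to entity e.
    compat(e1, e2): pairwise compatibility of two entities.
    """
    # Initialise each mention with its best purely local candidate.
    assign = [max(cands, key=lambda e: local_score(i, e))
              for i, cands in enumerate(candidates)]
    changed = True
    while changed:
        changed = False
        for i, cands in enumerate(candidates):
            # Score each candidate against all other current assignments.
            def total(e):
                return local_score(i, e) + sum(
                    compat(e, assign[j])
                    for j in range(len(assign)) if j != i)
            best = max(cands, key=total)
            if best != assign[i]:
                assign[i] = best
                changed = True
    return assign
```

Each pass can only increase the objective, so the loop terminates at a local optimum; like Gibbs sampling, it offers no guarantee of finding the global one.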


