What is Link Prediction?

Link prediction is an active research area that uses the existing facts in a knowledge graph (or knowledge base) to predict new or missing information. More formally, the link prediction task is to learn a scoring function that takes a fact triplet as input and outputs a score indicating whether the fact is true in the real world. This addresses the problem of graph incompleteness. There are generally two types of link prediction models:

  1. Observed Feature Models

  2. Latent Feature Models

Observed features

Assuming conditional independence between features, most observed feature models look at entity similarity or paths between nodes: essentially, things that can be observed and measured directly in the graph. Common algorithms include rule mining and the Path Ranking Algorithm. The downside of observed features is that they cannot capture uncommon or unseen patterns. This is where latent features come in.
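To make "entity similarity" concrete, here is a minimal sketch of one observed feature: the Jaccard overlap between the neighbours of two entities. The toy triples, the entity names and the helper functions are all made up for illustration, and this is only one of many possible observed features, not the method of any particular paper.

```python
from collections import defaultdict

# Toy (subject, relation, object) triples; every name here is hypothetical.
triples = [
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("alice", "lives_in", "paris"),
    ("bob", "lives_in", "paris"),
    ("carol", "lives_in", "berlin"),
]

def neighbours(triples):
    """Map each entity to the set of entities it is directly linked to."""
    adj = defaultdict(set)
    for s, _, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    return adj

def jaccard(adj, a, b):
    """Fraction of neighbours that a and b share (0.0 if both have none)."""
    union = adj[a] | adj[b]
    return len(adj[a] & adj[b]) / len(union) if union else 0.0

adj = neighbours(triples)
sim_ab = jaccard(adj, "alice", "bob")    # identical neighbourhoods
sim_ac = jaccard(adj, "alice", "carol")  # no shared neighbours
```

A high overlap between "alice" and "bob" hints that a link holding for one may also hold for the other. The score is computed only from what is already in the graph, which is why patterns that never appear in it stay invisible.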

Latent features

Latent feature models are the most common approach to the link prediction task, and they currently drive state-of-the-art (SOTA) performance. Most latent feature models are evaluated on benchmark datasets derived from public knowledge bases, where the datasets contain facts in the form of Subject, Relation, Object (SRO) triples. However, the facts in most large knowledge bases are not limited to plain SRO triples: they also carry extensions such as properties (for dynamic graphs). This means that latent feature models trained on SRO triples alone cannot fully utilise the available information. Currently, no framework exists that supports both standard and dynamic models.
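As a rough illustration of what a latent feature model scores, the sketch below assigns each entity and relation a vector and combines them with a DistMult-style product. The choice of DistMult is an assumption made for the example, as this post does not single out a specific model. The embeddings are random rather than learned, so the scores carry no meaning. Only the shape of the computation matters here.

```python
import random

random.seed(0)
DIM = 4  # embedding size, picked arbitrarily for this sketch

def random_vector(dim=DIM):
    """Stand-in for a learned embedding (a real model would train these)."""
    return [random.uniform(-1.0, 1.0) for _ in range(dim)]

# Hypothetical entities and relation, one latent vector each.
entities = {name: random_vector() for name in ("alice", "acme", "paris")}
relations = {name: random_vector() for name in ("works_at",)}

def score(subject, relation, obj):
    """DistMult-style score: sum over dimensions of e_s * w_r * e_o."""
    e_s, w_r, e_o = entities[subject], relations[relation], entities[obj]
    return sum(s * r * o for s, r, o in zip(e_s, w_r, e_o))

# Rank candidate objects for the query (alice, works_at, ?).
candidates = ["acme", "paris"]
ranked = sorted(candidates,
                key=lambda o: score("alice", "works_at", o),
                reverse=True)
```

Note that the input is strictly an SRO triple: there is no slot for extra properties such as timestamps, which is the limitation described above.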

Evaluation Metrics

Link prediction is evaluated using ranking metrics. There are three ranking metrics used:

  1. Mean Rank

  2. Mean Reciprocal Rank

  3. Hits at K
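The three metrics can be sketched as below, using their usual definitions. The sketch assumes each test triple has already been turned into a 1-based rank: the position of the true entity among all scored candidates. The example ranks are invented, and details such as filtering out other known true triples before ranking are left out.

```python
def mean_rank(ranks):
    """Average rank of the true entity (lower is better)."""
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank (higher is better, at most 1.0)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of test triples whose true entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Made-up ranks for four test triples.
ranks = [1, 2, 4, 10]

mr = mean_rank(ranks)              # (1 + 2 + 4 + 10) / 4 = 4.25
mrr = mean_reciprocal_rank(ranks)  # (1 + 0.5 + 0.25 + 0.1) / 4 = 0.4625
h3 = hits_at_k(ranks, 3)           # 2 of 4 ranks are <= 3, so 0.5
```

Mean Rank is sensitive to a few very bad ranks, while Mean Reciprocal Rank and Hits at K are dominated by how often the true entity lands near the top.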

[Need to understand more of the calculation procedure – refer to the original paper for example]


