Knowledge Graph Embeddings

A knowledge graph stores real-world facts as RDF-style triplets. Knowledge graph embedding converts these facts into low-dimensional features that support inference tasks such as link prediction and triple classification, as well as many downstream applications. We can differentiate embedding algorithms along three aspects:

  1. How they represent entities and relations

  2. Their scoring function

  3. How they optimise the ranking criterion that maximises the global plausibility of existing triplets
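To illustrate the third aspect: many embedding models optimise a margin-based ranking loss that pushes observed triplets to score better than corrupted negatives. A minimal NumPy sketch (the function name and toy scores are my own; lower score = more plausible, as in distance-based models):

```python
import numpy as np

def margin_ranking_loss(pos_scores, neg_scores, margin=1.0):
    # Hinge loss: each positive triplet's score must beat its corrupted
    # negative by at least `margin` (lower score = more plausible).
    return np.maximum(0.0, margin + pos_scores - neg_scores).sum()

pos = np.array([0.2, 0.5])   # scores of observed (true) triplets
neg = np.array([1.5, 0.4])   # scores of corrupted (negative) triplets
loss = margin_ranking_loss(pos, neg)
```

The first pair already satisfies the margin and contributes zero loss; the second pair violates it and drives a gradient update in a real training loop.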

The paper has broadly classified existing methods into two categories:

  1. Triplet fact-based representation

  2. Description-based representation

For triplet fact-based embedding models, there are three further subgroups: translation-based, tensor factorisation-based, and neural network-based.
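To make the translation-based idea concrete, TransE scores a triplet (h, r, t) by the distance between h + r and t, so plausible triplets get low scores. A minimal sketch with toy, untrained embeddings:

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    # TransE treats a relation as a translation: for a true triplet,
    # h + r should land close to t, so lower distance = more plausible.
    return np.linalg.norm(h + r - t, ord=norm)

# Toy 3-d embeddings (illustrative values, not trained)
h = np.array([0.1, 0.2, 0.3])
r = np.array([0.4, 0.1, -0.1])
t = np.array([0.5, 0.3, 0.2])
score = transe_score(h, r, t)  # ≈ 0, i.e. a highly plausible triplet
```

Tensor factorisation models (e.g. RESCAL, DistMult) and neural network models replace this simple additive score with bilinear products or learned networks, but the plausibility-scoring role is the same.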

Evaluation process and metrics

There are two tasks that are commonly used to evaluate the performance of embedding models:

  1. Link prediction. Predict a missing entity given an existing entity and a relation

  2. Triple classification. Binary classification task that focuses on estimating the plausibility of a triplet
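In the standard link prediction protocol, the tail (or head) of a test triplet is replaced by every candidate entity, all candidates are scored, and the rank of the true entity is recorded. A sketch using a TransE-style L1 distance (names and embedding sizes are illustrative; real pipelines also use a "filtered" setting that removes other known true triplets before ranking):

```python
import numpy as np

def tail_rank(h, r, true_tail_idx, entity_emb):
    # Score every candidate tail for (h, r, ?) and return the rank of
    # the true tail (1 = best). Lower distance = more plausible.
    scores = np.linalg.norm(h + r - entity_emb, ord=1, axis=1)
    order = np.argsort(scores)            # best-scoring candidates first
    return int(np.where(order == true_tail_idx)[0][0]) + 1

rng = np.random.default_rng(0)
entity_emb = rng.normal(size=(5, 4))      # 5 toy entity embeddings
h, r = entity_emb[0], rng.normal(size=4)
rank = tail_rank(h, r, true_tail_idx=2, entity_emb=entity_emb)  # in 1..5
```

Triple classification instead thresholds the same score: a triplet is predicted true when its score is better than a relation-specific threshold tuned on validation data.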

The two most common benchmark datasets for link prediction tasks are FB15K (Freebase) and WN18 (WordNet). The three most common metrics are Mean Rank, MRR (Mean Reciprocal Rank), and Hits@K.
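Given the rank of the correct entity for each test triplet, all three metrics fall out directly (a minimal sketch; the function name is my own):

```python
import numpy as np

def ranking_metrics(ranks, k=10):
    # ranks[i] is the rank of the correct entity for test triplet i (1 = best).
    ranks = np.asarray(ranks, dtype=float)
    mean_rank = ranks.mean()           # Mean Rank: lower is better
    mrr = (1.0 / ranks).mean()         # Mean Reciprocal Rank: higher is better
    hits_at_k = (ranks <= k).mean()    # fraction ranked in the top k
    return mean_rank, mrr, hits_at_k

mr, mrr, hits = ranking_metrics([1, 2, 10, 100], k=10)
# mean rank 28.25, MRR ≈ 0.40, Hits@10 = 0.75
```

Note how one badly ranked triplet (rank 100) dominates Mean Rank but barely moves MRR, which is why MRR and Hits@K are usually preferred.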

There are two main limitations of existing KGs:

  1. Computational efficiency. Deriving semantic relations between entities requires specialised graph algorithms, which usually have high computational complexity and do not scale

  2. Data sparsity. A common problem with any large-scale data; rarely observed entities and relations make the learned representations inaccurate

Future Work

There are three potential research directions in KGEs:

  1. Better utilise additional types of information such as hierarchical descriptions, unstructured online text, and information extracted from other KGs. We still have plenty of information available to refine and improve knowledge graph representations

  2. Enhance the connectivity between head and tail entities. Existing models cannot yet effectively handle long or complex relation paths between entities

  3. Investigate transfer learning that allows KG representation learning to be applied to new fields with minimal fine-tuning, expanding coverage to more domains

Explanation in detail

If you are new to knowledge graph embeddings and translation models, I have made a video summarising the TransE paper and the training and evaluation process of translation models. It gives a good foundational background on the overall process without overcomplicating the translation model itself 🙂 Check it out here!


