What are the two types of methods for developing a knowledge graph?
  1. Top-down – Focus on developing knowledge schema such as domain ontologies

  2. Bottom-up – Focus on knowledge instances such as Linked Open Data (LOD) datasets

This means that in the top-down approach, we first define the domain ontologies and schema and then add knowledge instances to the knowledge base. The bottom-up approach extracts knowledge instances from knowledge resources and, after knowledge fusion, builds the top-level ontologies from those instances to create the whole KG. The figure below showcases the bottom-up approach.

What are the three general phases of KG development?
  1. Knowledge extraction – Use NLP and ML for text mining and analytics

  2. Knowledge construction – LOD is commonly used as the basic knowledge model

  3. Knowledge management

3 Procedures of Bottom-up Approach of Knowledge Graphs

What are the four stages of KG construction?
  1. Knowledge acquisition – Structured, Semi-structured, Unstructured

  2. Knowledge extraction – Entity, relation, and attribute extraction

  3. Knowledge fusion – Iterative process of entity alignment and ontology construction

  4. Knowledge storage – Commonly stored in NoSQL databases

Describe knowledge extraction in terms of data sources (acquisition), types, approaches, and tools.

In terms of data sources, there are three kinds of data: structured, semi-structured, and unstructured. Different data sources require different methods of knowledge extraction. Today’s knowledge resources are heterogeneous and cross-domain, which poses a great challenge for knowledge extraction.

In terms of types of knowledge extraction, we have three types:

  1. Entity. Use named entity recognition (NER) to discover entities in different knowledge sources and classify them into pre-defined categories such as person, location, etc. The quality of entity extraction greatly influences the subsequent steps of knowledge acquisition.

  2. Relation. Extract the relationships between the entities, obtaining the semantic information needed to construct the knowledge graph.

  3. Attribute. Attribute extraction is a special type of relation extraction. It defines the intensional semantics of entities.
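The entity and relation steps above can be sketched with the dictionary-based and rule-based style of extraction. This is only a minimal illustration, not a real NER system: the `GAZETTEER` dictionary, the single `born_in` pattern, and the sample sentence are all assumptions made up for the example. An attribute could be handled the same way, with a pattern whose tail is a literal value rather than another entity.

```python
import re

# Hypothetical gazetteer: surface form -> pre-defined entity category.
GAZETTEER = {
    "Marie Curie": "person",
    "Warsaw": "location",
    "Paris": "location",
}

# Hypothetical rule: "<entity> was born in <entity>" yields a born_in relation.
RELATION_PATTERNS = [
    (re.compile(r"(?P<head>.+?) was born in (?P<tail>.+?)[.,]"), "born_in"),
]


def extract_entities(text):
    """Return (surface form, category) pairs found by dictionary lookup."""
    return [(name, category) for name, category in GAZETTEER.items() if name in text]


def extract_relations(text):
    """Return (head, relation, tail) triples whose two ends are known entities."""
    triples = []
    for pattern, relation in RELATION_PATTERNS:
        for match in pattern.finditer(text):
            head = match.group("head").strip()
            tail = match.group("tail").strip()
            if head in GAZETTEER and tail in GAZETTEER:
                triples.append((head, relation, tail))
    return triples


text = "Marie Curie was born in Warsaw. She later moved to Paris."
entities = extract_entities(text)
relations = extract_relations(text)
print(entities)
print(relations)
```

A dictionary lookup like this cannot find entities it has never seen, which is the generalisation problem that motivates the learning-based approaches.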

Knowledge extraction approaches draw heavily on NLP, text mining, and machine learning. Early systems used rule-based and dictionary-based methods. Supervised learning was also used, but it relies on manually annotated data and fails to generalise to new named entities. More recently, semi-supervised and unsupervised algorithms have been proposed, and many different classifiers have been applied to knowledge extraction, such as Hidden Markov Models (HMM), Conditional Random Fields (CRF), KNN, and SVM.

The evaluation metrics are precision, recall, and F-measure (harmonic mean of precision and recall).
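As a small worked example of these metrics, the sketch below compares a set of predicted entity annotations against a gold standard. The function name `precision_recall_f1` and the sample annotations are assumptions for illustration; exact-match scoring over sets is one common convention, not the only one.

```python
def precision_recall_f1(predicted, gold):
    """Score predictions against gold annotations using exact set matching."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of predictions that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold annotations that were found.
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold standard and system output.
gold = {
    ("Marie Curie", "person"),
    ("Warsaw", "location"),
    ("Paris", "location"),
    ("Sorbonne", "organisation"),
}
predicted = {
    ("Marie Curie", "person"),
    ("Warsaw", "location"),
    ("Paris", "person"),  # wrong category, so it counts as an error
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here two of three predictions are correct (precision 2/3) and two of four gold items are found (recall 1/2), so the F-measure is 4/7.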

Lastly, in terms of tools for knowledge extraction, see figure below.

Describe the knowledge fusion purpose and prospect.

Knowledge fusion has two purposes:

  1. Entity alignment

  2. Ontology construction (iterative process until quality evaluation reaches the requirements)

Entity alignment (also known as entity matching) is the process of grouping different entities that refer to the same real-world object. The figure below showcases the process of entity alignment:

  1. Data preprocessing – Standardise the data, dealing with multi-source data, inconsistent data definitions, and diverse data representations

  2. Pairwise alignment – A textual similarity function is used to match and compare the attributes

  3. Collective entity alignment – A structural similarity function is used to match and compare the relationships
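The three steps above can be sketched as follows. This is a heavily simplified illustration under stated assumptions: `normalise`, `attribute_similarity`, and `structural_similarity` are hypothetical helper names, textual similarity is taken from Python's standard `difflib.SequenceMatcher`, and structural similarity is reduced to a Jaccard overlap of neighbour sets, which is only a stand-in for real collective alignment methods.

```python
from difflib import SequenceMatcher


def normalise(value):
    """Preprocessing: lowercase and collapse whitespace."""
    return " ".join(str(value).lower().split())


def textual_similarity(a, b):
    """Similarity in [0, 1] between two normalised strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def attribute_similarity(record_a, record_b):
    """Pairwise alignment: average textual similarity over shared attributes."""
    shared = set(record_a) & set(record_b)
    if not shared:
        return 0.0
    total = sum(textual_similarity(record_a[k], record_b[k]) for k in shared)
    return total / len(shared)


def structural_similarity(neighbours_a, neighbours_b):
    """Collective step (simplified): Jaccard overlap of related entities."""
    a, b = set(neighbours_a), set(neighbours_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical records for the same city from two sources.
record_a = {"name": "New York City", "country": "USA"}
record_b = {"name": "new york  city ", "country": "usa"}
attr_score = attribute_similarity(record_a, record_b)

struct_score = structural_similarity(
    {"USA", "Hudson River", "Manhattan"},
    {"USA", "Manhattan", "Brooklyn"},
)
print(attr_score, struct_score)
```

In practice the two scores would be combined and compared against a threshold to decide whether the records refer to the same entity; how to weight them is a design choice this sketch leaves open.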

In recent years, knowledge inference has been proposed for entity alignment, where a third knowledge graph is used to identify and align entities.

The ultimate goal of knowledge fusion is to create the ontology and the knowledge graph. This requires constructing a taxonomy and hierarchical structure, and adding metadata and other data sources. To assess the quality of the knowledge graph, we can use a general ontology such as FOAF and general metadata (from schema.org). The construction and fusion process continues until the quality of the ontology and knowledge graph meets the requirements.

How is a knowledge graph stored?

There are two main types of storage:

  1. Resource Description Framework (RDF)

  2. Graph databases

RDF uses triples (subject, predicate, object) to describe the graph structure. There are many storage systems, such as Jena2, 3store, and Virtuoso. The advantage of RDF-based storage is efficient querying and merge-joins of triple patterns.
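To make the triple model concrete, here is a toy in-memory triple store in plain Python. It is a sketch, not how Jena2, 3store, or Virtuoso work internally: the `ex:` names, the `match` helper, and the `born_in_cities` join are assumptions for illustration, and the join is a naive lookup rather than a real merge-join.

```python
# Each fact is a (subject, predicate, object) triple.
triples = {
    ("ex:MarieCurie", "rdf:type", "ex:Person"),
    ("ex:MarieCurie", "ex:bornIn", "ex:Warsaw"),
    ("ex:Warsaw", "rdf:type", "ex:City"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching a pattern, where None acts as a wildcard."""
    return sorted(
        t
        for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def born_in_cities(store):
    """Join two triple patterns: ?p ex:bornIn ?c and ?c rdf:type ex:City."""
    cities = {s for s, _, _ in match(store, predicate="rdf:type", obj="ex:City")}
    return [(s, o) for s, _, o in match(store, predicate="ex:bornIn") if o in cities]


print(match(triples, subject="ex:MarieCurie"))
print(born_in_cities(triples))
```

The join shares the variable `?c` between the two patterns, which is the kind of operation a real RDF store optimises with indexes over subject, predicate, and object.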

Graph databases store KGs in terms of nodes, edges, and properties. The advantage is that graph databases provide their own graph query languages and support a variety of graph mining algorithms. An example of a graph database is Neo4j.
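For contrast with the triple model, the sketch below represents a small property graph in plain Python and runs one simple graph algorithm (breadth-first shortest path) over it. It does not use Neo4j or any graph database API; the node ids, labels, edge types, and helper names are all assumptions for illustration.

```python
from collections import deque

# Nodes and edges both carry key-value properties.
nodes = {
    "n1": {"label": "Person", "name": "Marie Curie"},
    "n2": {"label": "City", "name": "Warsaw"},
    "n3": {"label": "Country", "name": "Poland"},
}
edges = [
    # (source, relationship type, target, properties)
    ("n1", "BORN_IN", "n2", {"year": 1867}),
    ("n2", "LOCATED_IN", "n3", {}),
]


def neighbours(node_id):
    """Targets reachable from node_id by following one outgoing edge."""
    return [target for source, _, target, _ in edges if source == node_id]


def shortest_path(start, goal):
    """Breadth-first search over directed edges; returns node ids or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


path = shortest_path("n1", "n3")
print([nodes[n]["name"] for n in path])
```

Unlike a triple, an edge here can hold its own properties (the `year` on `BORN_IN`), which is one practical difference between the two storage models.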

What is the main query language of KGs and why is visualisation so important in KGs?

SPARQL is the query language for almost all large KGs and it provides different output formats such as JSON, CSV, RDF, HTML, etc. However, the output is machine-readable rather than human-readable, which is why visualisation is so important in KGs. Browser-based visualisation is the most common approach, and tools include:

  1. IsaViz

  2. RDF Gravity

  3. DBpedia

  4. Fenfire
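To illustrate the gap between machine-readable and human-readable output, the sketch below takes a result in the SPARQL JSON results layout (`head.vars` plus `results.bindings`) and prints it as a plain-text table. The sample result, the `example.org` URI, and the helper names `to_rows` and `format_table` are assumptions for illustration; this is far simpler than the visualisation tools listed above.

```python
import json

# Hypothetical SPARQL SELECT result in the JSON results format.
raw = """
{
  "head": {"vars": ["person", "birthPlace"]},
  "results": {"bindings": [
    {"person": {"type": "uri", "value": "http://example.org/MarieCurie"},
     "birthPlace": {"type": "literal", "value": "Warsaw"}}
  ]}
}
"""


def to_rows(sparql_json):
    """Flatten the bindings into one list of values per result row."""
    data = json.loads(sparql_json)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # An unbound variable is simply absent from the binding.
        rows.append([binding.get(v, {}).get("value", "") for v in variables])
    return variables, rows


def format_table(variables, rows):
    """Render header and rows as left-aligned, pipe-separated columns."""
    widths = [max(len(str(cell)) for cell in col) for col in zip(variables, *rows)]
    lines = [
        " | ".join(str(cell).ljust(w) for cell, w in zip(row, widths))
        for row in [variables] + rows
    ]
    return "\n".join(lines)


variables, rows = to_rows(raw)
table = format_table(variables, rows)
print(table)
```

Even a flat table loses the graph structure, which is why dedicated graph visualisation is preferred for exploring KGs.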

What is the knowledge retrieval procedure?

Knowledge retrieval is also semantic retrieval, in that it uses logic rules under a semantic model and an inference model to retrieve information. This means that knowledge retrieval has the ability to reason. KGs have been widely used in smart search, Q&A systems, and recommendation systems.
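A minimal sketch of retrieval with reasoning, assuming a single hypothetical logic rule (the `located_in` relation is transitive): facts that are never stated explicitly become retrievable once the rule has been applied. The facts and the function names `infer_transitive` and `retrieve` are made up for the example.

```python
def infer_transitive(facts, relation):
    """Add (a, relation, c) whenever (a, relation, b) and (b, relation, c) hold."""
    closed = set(facts)
    changed = True
    while changed:  # repeat until no new fact can be derived
        changed = False
        for a, r1, b in list(closed):
            for b2, r2, c in list(closed):
                if r1 == r2 == relation and b == b2 and (a, relation, c) not in closed:
                    closed.add((a, relation, c))
                    changed = True
    return closed


def retrieve(store, relation, obj):
    """Return all subjects linked to obj by relation."""
    return sorted(s for s, r, o in store if r == relation and o == obj)


facts = {
    ("Warsaw", "located_in", "Poland"),
    ("Poland", "located_in", "Europe"),
    ("Marie Curie", "born_in", "Warsaw"),
}

without_inference = retrieve(facts, "located_in", "Europe")
with_inference = retrieve(infer_transitive(facts, "located_in"), "located_in", "Europe")
print(without_inference)
print(with_inference)
```

Without inference only Poland is returned; with the rule applied, Warsaw is returned as well, even though no stated fact links it to Europe directly.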

Ryan


Data Scientist
