What is data integration?

Data integration is the problem of combining data from different sources and giving the user a unified view of that data. It usually involves three elements: the source databases, a global schema (or ontology), and mappings that connect the global schema to the source databases.
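As a rough illustration of those three elements, here is a minimal Python sketch. The source names (`crm`, `billing`), their column names, and the `unified_view` helper are all made up for the example; a real system would hold the mappings in a dedicated mapping language rather than a dictionary.

```python
# Two hypothetical sources that describe customers with different column names.
crm_rows = [{"cust_id": 1, "full_name": "Ada Lovelace"}]
billing_rows = [{"id": 2, "name": "Alan Turing"}]

# The global schema: the attribute names the user sees (assumed for this sketch).
GLOBAL_SCHEMA = ("customer_id", "customer_name")

# Mappings: for each source, which local column feeds each global attribute.
MAPPINGS = {
    "crm": {"customer_id": "cust_id", "customer_name": "full_name"},
    "billing": {"customer_id": "id", "customer_name": "name"},
}


def unified_view(sources):
    """Yield every source row re-expressed in the global schema."""
    for source_name, rows in sources.items():
        mapping = MAPPINGS[source_name]
        for row in rows:
            yield {attr: row[mapping[attr]] for attr in GLOBAL_SCHEMA}


customers = list(unified_view({"crm": crm_rows, "billing": billing_rows}))
```

The user only ever works with `customer_id` and `customer_name`; the mappings hide the fact that each database names those columns differently.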

What are the challenges of building a knowledge graph (KG) from enterprise relational databases?
  1. Too many tables and attributes

  2. Complex relationships

  3. Data experts unavailable

  4. Master databases are off limits

  5. Impossible to understand naming

  6. Data is application centric

  7. Documentation is non-existent

  8. Data quality unknown

Describe the Pay-as-you-go methodology to design and build enterprise knowledge graphs from relational databases.
  1. Knowledge Capture – Analyse the as-is process, collect documentation, and develop a knowledge report

  2. Knowledge Implementation – Create or extend the ontology, implement the mappings, create extract queries, and validate the data

  3. Self Service Analytics – Build reports, answer business questions, and move to production

What are the two ways to create data / execute queries?
  1. Materialisation (ETL)

  2. Virtualisation (SPARQL)

In materialisation, you map the relational databases to the predefined ontology and load the resulting data through an ETL process. In virtualisation, you write SPARQL queries and, under the hood, the system translates them into the SQL queries needed to query the databases. These mappings can be created using R2RML.
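The difference between the two approaches can be sketched in Python with the standard-library `sqlite3` module. Everything here is an assumption made for illustration: the `emp` table, the `http://example.com/` IRIs, and the `MAPPING` dictionary, which only stands in for the role an R2RML mapping would play. The `virtual_query` function handles a single triple pattern and is not a SPARQL engine.

```python
import sqlite3

# A tiny relational source (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(7369, "SMITH"), (7499, "ALLEN")])

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Stand-in for a mapping: table -> class, column -> property.
MAPPING = {
    "table": "emp",
    "key": "empno",
    "subject_template": "http://example.com/employee/{}",
    "class": "http://example.com/ns#Employee",
    "properties": {"http://example.com/ns#name": "ename"},
}


def materialise(conn, m):
    """ETL style: read every row up front and emit all triples."""
    columns = [m["key"], *m["properties"].values()]
    sql = f"SELECT {', '.join(columns)} FROM {m['table']} ORDER BY {m['key']}"
    triples = []
    for row in conn.execute(sql):
        subject = m["subject_template"].format(row[0])
        triples.append((subject, RDF_TYPE, m["class"]))
        for prop, value in zip(m["properties"], row[1:]):
            triples.append((subject, prop, value))
    return triples


def virtual_query(conn, m, prop):
    """Virtualisation style: answer the pattern `?s <prop> ?o` by
    rewriting it to SQL at query time; no triples are stored."""
    column = m["properties"][prop]
    sql = f"SELECT {m['key']}, {column} FROM {m['table']} ORDER BY {m['key']}"
    return [(m["subject_template"].format(key), value) for key, value in conn.execute(sql)]


graph = materialise(conn, MAPPING)
names = virtual_query(conn, MAPPING, "http://example.com/ns#name")
```

With materialisation the triples in `graph` are a copy that must be refreshed when the database changes, whereas `virtual_query` always reads the live tables at the cost of running SQL on every request.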

What tools can you use to design knowledge graphs?

Grafo (https://gra.fo/).

What tools can be used to programmatically build training data?

Snorkel (https://www.snorkel.org/).
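The idea behind this kind of tooling is to write small heuristic labeling functions and combine their noisy votes into training labels. The sketch below shows that idea in plain Python only; it does not use Snorkel's API, the three heuristics and the spam/ham task are invented for the example, and the simple majority vote is an assumption standing in for the more sophisticated label model such tools provide.

```python
from collections import Counter

ABSTAIN, HAM, SPAM = -1, 0, 1


# Hypothetical labeling functions: each returns a label or abstains.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    return HAM if len(text.split()) <= 3 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]


def weak_label(text):
    """Combine the labeling functions by majority vote; abstain on ties."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


docs = [
    "Claim your prize at http://example.com now",
    "See you soon",
    "The quarterly report is attached for review",
]
labels = [weak_label(d) for d in docs]
```

Documents on which every function abstains, like the third one here, stay unlabeled and would be left out of the training set.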

Ryan

Data Scientist