Extending a Knowledge Graph from Wikidata Notes


  • Wikidata is a sister project of Wikipedia, launched in 2012. In 2020 it had more than 74 million items! You can access it at wikidata.org

  • Wikidata can be read by both humans AND machines!

  • The Semantic Web is an extension of the web that aims to facilitate data exchange. It describes different entities and the relationships between them!

  • Wikidata can be queried with SPARQL, the query language for the RDF data format! Wikidata also offers entity search, so you don’t necessarily need to know an item’s identifier in advance
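The following is a minimal sketch of sending a SPARQL query to the public Wikidata Query Service endpoint (https://query.wikidata.org/sparql) from Python's standard library and flattening the standard SPARQL JSON results. The example query lists items that are instances of (P31) house cat (Q146). The function names and the User-Agent string are illustrative placeholders. Only `run_query` needs network access.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Items that are an instance of (P31) house cat (Q146), with English labels.
CATS_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""


def build_request(query: str) -> urllib.request.Request:
    """Build a GET request for the Wikidata Query Service."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={
            "Accept": "application/sparql-results+json",
            # Wikimedia asks clients to identify themselves; placeholder value.
            "User-Agent": "kg-extension-notes-example/0.1",
        },
    )


def parse_bindings(payload: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into {variable: value} rows."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


def run_query(query: str) -> list[dict[str, str]]:
    """Send the query (requires network access) and return flattened rows."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        return parse_bindings(json.load(response))
```

Calling `run_query(CATS_QUERY)` returns rows such as `{"item": "http://www.wikidata.org/entity/Q…", "itemLabel": "…"}`. The `SERVICE wikibase:label` clause is what fills in the `?itemLabel` variable.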

If you have already built a knowledge graph, you can use Wikidata to extend it. Search Wikidata for the entities in your knowledge graph, then look for related information that is also of interest to you!
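Matching your graph's entities to Wikidata items is usually the first step. Below is a sketch that uses the MediaWiki `wbsearchentities` action, which searches item labels and aliases, to turn a name from your graph into candidate Wikidata IDs (QIDs). The helper names and User-Agent are hypothetical. Picking the right candidate is left to you, because a name can match several items.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_url(name: str, language: str = "en", limit: int = 5) -> str:
    """URL for the wbsearchentities action (searches labels and aliases)."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urllib.parse.urlencode(params)}"


def candidates(payload: dict) -> list[tuple[str, str, str]]:
    """(QID, label, description) for each search hit, in API order."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in payload.get("search", [])
    ]


def lookup(name: str) -> list[tuple[str, str, str]]:
    """Query the live API (requires network access)."""
    request = urllib.request.Request(
        search_url(name),
        # Placeholder User-Agent; Wikimedia asks clients to identify themselves.
        headers={"User-Agent": "kg-extension-notes-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return candidates(json.load(response))
```

For example, `lookup("Douglas Adams")` should include `Q42` among its candidates. You can store the chosen QID on the matching node (for instance as a `wikidataId` property) and use it in later SPARQL queries.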

  • You can also do this directly from Neo4j, using APOC to send SPARQL queries to the Wikidata query API!
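One way to do this from Neo4j is to build the SPARQL query in Cypher and fetch the results with APOC's `apoc.load.jsonParams`, URL-encoding the query with `apoc.text.urlencode`. The sketch below wraps such a Cypher statement in Python so it can be run from a notebook through a py2neo `Graph`. Several parts are assumptions for illustration: the `Person`/`Occupation` labels, the `wikidataId` property, the `HAS_OCCUPATION` relationship, and the choice of P106 (occupation) as the property to import. It also assumes APOC is installed on the database.

```python
# Hypothetical model: Person nodes carry a `wikidataId` such as "Q42".
# For each one, fetch its occupations (Wikidata property P106) and link them.
EXTEND_WITH_OCCUPATIONS = """
MATCH (p:Person) WHERE p.wikidataId IS NOT NULL
WITH p,
     'SELECT ?occ ?occLabel WHERE { wd:' + p.wikidataId + ' wdt:P106 ?occ . ' +
     'SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } }' AS sparql
CALL apoc.load.jsonParams(
  'https://query.wikidata.org/sparql?format=json&query=' + apoc.text.urlencode(sparql),
  {Accept: 'application/sparql-results+json'},
  null
) YIELD value
UNWIND value.results.bindings AS row
// Keep only the QID at the end of the entity URI.
MERGE (o:Occupation {wikidataId: split(row.occ.value, '/')[-1]})
SET o.name = row.occLabel.value
MERGE (p)-[:HAS_OCCUPATION]->(o)
"""


def extend_with_occupations(graph) -> None:
    """Run the enrichment; `graph` is expected to expose run(), e.g. a py2neo Graph."""
    graph.run(EXTEND_WITH_OCCUPATIONS)
```

In a notebook this could be called as `extend_with_occupations(Graph("bolt://localhost:7687", auth=("neo4j", "password")))` after `from py2neo import Graph`. The URI and credentials are placeholders. The statement sends one HTTP request per `Person` node, so batching or rate limiting may be needed for large graphs.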

These extensions can improve categorisation, recommendation engines, and search engines!

Recommended Book – Graph Analytics with Neo4j

Network Like an Egghead: Analytics and Visualisation on LinkedIn Notes

Tech Stack

  • Neo4j

  • Gephi – visualisation

  • py2neo

The Workflow

  1. Data Collection

  2. Data Ingestion & Graph Creation

  3. Gephi Import and Visualisation

  4. Influence Analysis

  5. Community Detection & Homophily

Data Collection

Download the data from LinkedIn, which allows you to export your own personal data. We needed the following data:

  • Nodes – LinkedIn “connections”

  • Edges – mutual connections between your connections; the safest method is manual collection

The challenge here is that social network data is dynamic, and LinkedIn prohibits bots and automated collection.
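Here is a sketch of turning the collected files into node and edge lists. It assumes the LinkedIn connections export is a CSV with `First Name`, `Last Name`, `Company` and `Position` columns, possibly preceded by free-text note lines. It also assumes the manually collected mutual connections are kept in a hypothetical two-column CSV with a `person,mutual` header. Check both assumptions against your own files.

```python
import csv
import io


def read_connections(text: str) -> list[dict[str, str]]:
    """Parse a LinkedIn connections export (column names are assumed)."""
    lines = text.splitlines()
    # Some exports start with note lines; skip ahead to the header row.
    start = next((i for i, line in enumerate(lines) if line.startswith("First Name")), None)
    if start is None:
        raise ValueError("no 'First Name' header row found")
    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])))
    return [
        {
            "name": f"{row['First Name']} {row['Last Name']}".strip(),
            "company": row.get("Company") or "",
            "position": row.get("Position") or "",
        }
        for row in reader
    ]


def read_mutuals(text: str) -> list[tuple[str, str]]:
    """Parse a hand-collected mutual-connections CSV (hypothetical 'person,mutual' layout)."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["person"].strip(), row["mutual"].strip()) for row in reader]
```

Keeping the manual mutual-connection list as a plain CSV makes it easy to update by hand as the network changes.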

Data Ingestion & Graph Creation

This step uses Neo4j Desktop and a Jupyter notebook. Development is done in Python and Cypher, specifically with py2neo. Importing the data into Gephi and configuring it there has to be done manually! The workflow is as follows:

  1. Import serialised data

  2. Connect the notebook to Neo4j

  3. Create nodes and edges

  4. Deduplicate edges
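Steps 3 and 4 can be sketched as below. Mutual connections collected from both sides produce both (A, B) and (B, A), so the edges are normalised in Python. Cypher's undirected `MERGE` adds a second guard against duplicate relationships. The `Person` label and `CONNECTED_TO` relationship type are hypothetical choices. `graph` is expected to be something with a `run(query, parameters)` method, such as a py2neo `Graph`.

```python
# Hypothetical labels/relationship type; adapt to your own model.
CREATE_PEOPLE = """
UNWIND $people AS person
MERGE (p:Person {name: person.name})
SET p.company = person.company, p.position = person.position
"""

# Undirected MERGE only creates a relationship if none exists in either direction.
CREATE_EDGES = """
UNWIND $pairs AS pair
MATCH (a:Person {name: pair[0]}), (b:Person {name: pair[1]})
MERGE (a)-[:CONNECTED_TO]-(b)
"""


def dedupe_edges(pairs):
    """Treat (A, B) and (B, A) as one undirected edge and drop self-loops."""
    return sorted({tuple(sorted(pair)) for pair in pairs if pair[0] != pair[1]})


def load_graph(graph, people, pairs) -> None:
    """Write nodes, then deduplicated edges, using parameterised Cypher."""
    graph.run(CREATE_PEOPLE, {"people": people})
    graph.run(CREATE_EDGES, {"pairs": [list(pair) for pair in dedupe_edges(pairs)]})
```

Passing the data as query parameters with `UNWIND` sends each list in a single statement rather than one round trip per row.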

Influence Analysis

To detect the influential nodes, you can measure:

  1. Centrality – Degree, Betweenness, and PageRank

  2. Network diameter & Shortest paths
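Tools such as Gephi compute these measures for you. The standard-library sketch below only makes the definitions concrete for a small undirected graph stored as an adjacency dict. It covers normalised degree centrality, BFS shortest-path distances, the diameter (assuming a connected graph), power-iteration PageRank, and Brandes' algorithm for unnormalised betweenness. Tool results may differ in normalisation.

```python
from collections import deque


def degree_centrality(adj):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: (len(nbrs) / (n - 1) if n > 1 else 0.0) for v, nbrs in adj.items()}


def bfs_distances(adj, source):
    """Shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter(adj):
    """Longest shortest path; assumes the graph is connected."""
    return max(max(bfs_distances(adj, v).values()) for v in adj)


def pagerank(adj, damping=0.85, iterations=100):
    """Power iteration; nodes with no neighbours spread their rank uniformly."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in adj if not adj[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in adj}
        for v, nbrs in adj.items():
            for w in nbrs:
                new[w] += damping * rank[v] / len(nbrs)
        rank = new
    return rank


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalised)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}
```

On a star graph, the hub has degree centrality 1.0 and betweenness 3.0 (it lies on the shortest path of every pair of its three leaves), and the diameter is 2.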

Community Detection & Homophily

Community detection identifies groups of nodes that are more densely connected to each other than to the rest of the network. Homophily is the tendency of people to seek out others who are similar to themselves!
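Both ideas can be quantified on an edge list. The sketch below scores a given partition with modularity, Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c is the number of edges inside group c, and d_c is the total degree of group c. It does not search for communities. Gephi's Modularity statistic, for instance, uses the Louvain method for that search. For homophily, it compares the observed share of edges joining people with the same attribute (for example, the same industry) against the share expected under random mixing. It assumes the edge list is already deduplicated.

```python
from collections import Counter


def modularity(edges, group_of):
    """Modularity of a partition of an undirected, deduplicated edge list."""
    m = len(edges)
    internal = Counter()  # L_c: edges with both ends in group c
    degree = Counter()    # d_c: total degree of nodes in group c
    for u, v in edges:
        degree[group_of[u]] += 1
        degree[group_of[v]] += 1
        if group_of[u] == group_of[v]:
            internal[group_of[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def homophily(edges, attr):
    """(observed, expected) share of edges joining nodes with the same attribute."""
    m = len(edges)
    observed = sum(attr[u] == attr[v] for u, v in edges) / m
    ends = Counter()  # edge endpoints per attribute value
    for u, v in edges:
        ends[attr[u]] += 1
        ends[attr[v]] += 1
    expected = sum((count / (2 * m)) ** 2 for count in ends.values())
    return observed, expected
```

For two triangles joined by a single edge, split into their natural two groups, Q = 5/14 ≈ 0.36, and 6 of the 7 edges stay within a group against an expected 0.5. When the observed share clearly exceeds the expected share, that suggests homophily on that attribute.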

Developing a Knowledge Graph of Your Knowledge, Skills, Abilities, Tasks and Training (KSATT) Notes

The Questions

  • I want to change careers. What should I learn?

  • Can I identify adjacent skills?
    • Cypher → SQL

    • Python → R

Information Technology Competency Model

  • Personal effectiveness

  • Academic competencies

  • Industry-technical competencies

  • Industry-managerial competencies

Define your own graph model!
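As one possible model, you could use people with skills, roles that require skills, and skills linked to adjacent skills. The in-memory sketch below answers both questions above: the skill gap for a target role, and which of those gaps are adjacent to skills you already have. The class, its field names, and the choice to treat adjacency as symmetric are hypothetical. In Neo4j the same shape would map to node labels and relationships of your own choosing.

```python
from dataclasses import dataclass, field


@dataclass
class KsattGraph:
    """Hypothetical KSATT model: person->skills, role->skills, skill<->skill."""
    has_skill: dict[str, set[str]] = field(default_factory=dict)
    requires: dict[str, set[str]] = field(default_factory=dict)
    adjacent: dict[str, set[str]] = field(default_factory=dict)

    def add_adjacency(self, a: str, b: str) -> None:
        """Record two skills as adjacent (treated as symmetric here)."""
        self.adjacent.setdefault(a, set()).add(b)
        self.adjacent.setdefault(b, set()).add(a)

    def skill_gap(self, person: str, role: str) -> set[str]:
        """Skills the role requires that the person does not yet have."""
        return self.requires.get(role, set()) - self.has_skill.get(person, set())

    def adjacent_skills(self, person: str) -> set[str]:
        """Skills one hop away from what the person already knows."""
        held = self.has_skill.get(person, set())
        return {n for s in held for n in self.adjacent.get(s, set())} - held

    def learning_plan(self, person: str, role: str) -> tuple[list[str], list[str]]:
        """Split the gap into (adjacent, so likely easier) and (other) skills."""
        gap = self.skill_gap(person, role)
        near = gap & self.adjacent_skills(person)
        return sorted(near), sorted(gap - near)
```

With `Cypher` adjacent to `SQL` and `Python` adjacent to `R`, a person who knows Cypher and Python, targeting a role that needs SQL, R and statistics, gets `(["R", "SQL"], ["Statistics"])`.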



Data Scientist
