Executive Summary

Documents to Graph

  1. Whiteboard the domain (i.e., sketch an ontology)
    • Determine entities and their relationships

    • Determine potential entity and relationship properties

    • Determine sources for those entities and their properties

  2. Build analysers, rules, parsers, and named entity recognition (NER) for the documents

  3. Parse and store the document metadata, the documents themselves, and the entity relationships

  4. Infer entity relationships

  5. Compute similarities, transitive cover, and triangles

  6. Analyse data using graph queries and visualisations
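As a rough illustration of step 5, the Python sketch below works on a tiny, made-up graph. It assumes documents are linked to the entities they mention, measures document similarity as the Jaccard overlap of those entity sets, reads "transitive cover" as the transitive closure of a directed relation (an assumption), and counts triangles in an entity co-mention graph. All data and function names are hypothetical; this is a sketch, not a reference implementation.

```python
from collections import deque
from itertools import combinations

# Hypothetical document -> entity "MENTIONS" edges, as produced by steps 2 and 3.
mentions = {
    "doc1": {"Acme Corp", "Jane Doe", "London"},
    "doc2": {"Acme Corp", "Jane Doe", "Paris"},
    "doc3": {"Globex", "Paris"},
}

# Hypothetical directed PART_OF relation used to illustrate transitive closure.
part_of = {"London": {"England"}, "England": {"UK"}, "UK": {"Europe"}}


def jaccard(a: set, b: set) -> float:
    """Document similarity as the share of mentioned entities the two documents have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transitive_closure(edges: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each source node to every node reachable from it (breadth-first search)."""
    closure = {}
    for start in edges:
        seen = set()
        queue = deque(edges[start])
        while queue:
            node = queue.popleft()
            if node not in seen:
                seen.add(node)
                queue.extend(edges.get(node, ()))
        closure[start] = seen
    return closure


def co_mention_graph(mentions: dict[str, set[str]]) -> dict[str, set[str]]:
    """Undirected entity graph: two entities are linked if some document mentions both."""
    adj = {}
    for entities in mentions.values():
        for a, b in combinations(sorted(entities), 2):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def count_triangles(adj: dict[str, set[str]]) -> int:
    """Count each triangle once by requiring its nodes to be ordered u < v < w."""
    count = 0
    for u, neighbours in adj.items():
        for v in neighbours:
            if u < v:
                count += sum(1 for w in neighbours & adj[v] if w > v)
    return count


print(jaccard(mentions["doc1"], mentions["doc2"]))  # 0.5
print(count_triangles(co_mention_graph(mentions)))  # 2
print(sorted(transitive_closure(part_of)["London"]))  # ['England', 'Europe', 'UK']
```

These brute-force passes suit small graphs only; for larger data, the graph database or graph library used for storage will usually provide its own similarity, path, and triangle-counting routines.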

Each entity has both direct and indirect metadata. Indirect (inferred) metadata can be derived from the content of the documents using NLP techniques such as named entity recognition (NER). The first step in building a knowledge graph is to extract all the named entities, together with their metadata, from the documents; these entities become the nodes of the graph. Relationships can likewise be extracted directly from a document or inferred from it, but the inference logic requires domain knowledge.
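A minimal sketch of extraction and inference, assuming a simple dictionary-based (gazetteer) matcher in place of a trained NER model such as spaCy's. The gazetteer entries, the `Document` fields, and the `WORKS_FOR` rule are all hypothetical; the rule stands in for the domain knowledge that real inference logic would encode.

```python
import re
from dataclasses import dataclass, field

# Hypothetical gazetteer standing in for a trained NER model: surface form -> entity type.
GAZETTEER = {"Acme Corp": "ORG", "Jane Doe": "PERSON", "London": "LOCATION"}


@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)  # direct metadata supplied with the document


def extract_entities(text: str) -> list[tuple[str, str]]:
    """Return (name, type) pairs for gazetteer entries that occur as whole words."""
    return [
        (name, etype)
        for name, etype in GAZETTEER.items()
        if re.search(r"\b" + re.escape(name) + r"\b", text)
    ]


def infer_relations(text: str) -> set[tuple[str, str, str]]:
    """Hypothetical domain rule: PERSON + ORG + 'joined'/'works at|for' in one sentence -> WORKS_FOR."""
    triples = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not re.search(r"\b(joined|works (at|for))\b", sentence):
            continue
        ents = extract_entities(sentence)
        people = [n for n, t in ents if t == "PERSON"]
        orgs = [n for n, t in ents if t == "ORG"]
        triples |= {(p, "WORKS_FOR", o) for p in people for o in orgs}
    return triples


def build_graph(docs: list[Document]):
    """Documents and entities become nodes; relationships are (source, relation, target) triples."""
    nodes, edges = {}, set()
    for doc in docs:
        nodes[doc.doc_id] = {"type": "DOCUMENT", **doc.metadata}
        for name, etype in extract_entities(doc.text):
            nodes[name] = {"type": etype}
            edges.add((doc.doc_id, "MENTIONS", name))  # extracted directly
        edges |= infer_relations(doc.text)  # inferred via the domain rule
    return nodes, edges


sample = Document("doc1", "Jane Doe joined Acme Corp in 2019. The company is based in London.",
                  {"source": "example.txt"})
nodes, edges = build_graph([sample])
print(sorted(edges))
```

Here the `MENTIONS` edges come straight from the text, while the `WORKS_FOR` edge exists only because of the hand-written rule, which is where domain knowledge enters the pipeline.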

Note that the richness of your ontology determines how much information you can capture. Graph data is flexible: adding a new entity type or relationship can reveal patterns and connections that you could not trace before.
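To make the flexibility point concrete, here is a small sketch over a hypothetical set of (source, relation, target) triples. It shows that adding one new relationship type, here a made-up `LOCATED_IN`, can connect two entities that were previously unreachable from each other. Relations are treated as undirected for path finding, which is a simplifying assumption.

```python
from collections import deque


def find_path(triples, start, goal):
    """Breadth-first search over relations treated as undirected; returns a node path or None."""
    adj = {}
    for s, _, t in triples:
        adj.setdefault(s, set()).add(t)
        adj.setdefault(t, set()).add(s)
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start via recorded parents
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


triples = {
    ("Jane Doe", "WORKS_FOR", "Acme Corp"),
    ("John Roe", "WORKS_FOR", "Globex"),
}
print(find_path(triples, "Jane Doe", "John Roe"))  # None

# Extend the ontology with a new (hypothetical) relationship type.
triples |= {
    ("Acme Corp", "LOCATED_IN", "London"),
    ("Globex", "LOCATED_IN", "London"),
}
print(find_path(triples, "Jane Doe", "John Roe"))
# ['Jane Doe', 'Acme Corp', 'London', 'Globex', 'John Roe']
```

No existing node or edge had to change; the new pattern appears simply because the new relationship was added.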



Data Scientist
