Conversational QA systems have become very popular in recent years, and many techniques have been explored to make them reliable. Most QA systems let users query either a knowledge base (KB-QA) or a collection of documents (text-QA). We will first review the KB-QA space, covering both symbolic and neural methods, and then the text-QA space.
Within the KB-QA space, this post covers the symbolic methods and tomorrow’s post covers the neural ones. We will first review symbolic approaches to KB-QA and some of their limitations, including scalability. We will then review the neural methods that have been proposed to address these issues. We will also discuss multi-turn and conversational KB-QA agents, which are not as well studied.
What is a Knowledge Base?
A standard KB is a structured database that stores entities and the relationships between them as subject-predicate-object triples (s, r, t). Here, s and t are entities and r is the relation that holds between them. A KB can be represented as a knowledge graph, as shown below, where entities are the nodes and relations are the directed edges.
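To make the triple and graph views concrete, here is a minimal sketch in Python. The entities, relation names, and helper functions (build_graph, objects) are hypothetical and chosen only for illustration; they do not come from any real KB or library.

```python
from collections import defaultdict

# Hypothetical toy triples (s, r, t): subject, relation, object.
triples = [
    ("Alice", "works_at", "Acme"),
    ("Acme", "located_in", "Berlin"),
    ("Alice", "knows", "Bob"),
]


def build_graph(triples):
    """Turn (s, r, t) triples into a directed graph: s -> [(r, t), ...]."""
    graph = defaultdict(list)
    for s, r, t in triples:
        graph[s].append((r, t))
    return dict(graph)


def objects(graph, s, r):
    """Return every t such that the triple (s, r, t) is in the KB."""
    return [t for rel, t in graph.get(s, []) if rel == r]


graph = build_graph(triples)
print(objects(graph, "Alice", "works_at"))
```

Each subject becomes a node whose outgoing edges are labeled with a relation, so answering a one-hop question is a single edge lookup.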
Semantic Parsing for KB-QA
The figure above (on the right) illustrates semantic parsing for KB-QA, which is at the core of many state-of-the-art symbolic KB-QA systems. A question is mapped to its formal meaning representation (logical form) and the equivalent graph representation (query graph), which is then executed against the KB. Reasoning then happens inside the KB: the system finds the paths that match the query graph and retrieves their end nodes as the answer.
In the figure above, the two rounded rectangles represent the entities. The shaded circle node x represents the answer node. The circle node y indicates that there must be an entity describing the queried casting relation, tying together the character, the actor, and the time. The diamond node argmin adds a constraint on the answer. Without the constraint, this query graph would return both “Lacey Chabert” and “Mila Kunis”. With the constraint, only “Lacey Chabert” is returned, as she started the role first.
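The role of the argmin constraint can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular KB-QA system executes a query graph: the "SHOW" and "CHARACTER" entities, the third cast entry, and the numeric start values are placeholders (only the relative order of the two actors is taken from the example above), and match_query and answer are hypothetical helper names.

```python
# Each cast entry plays the role of the intermediate node y: it ties
# together a show, a character, an actor, and a start time.
# "start" holds placeholder ordinals, not real dates.
cast_entries = [
    {"show": "SHOW", "character": "CHARACTER", "actor": "Lacey Chabert", "start": 1},
    {"show": "SHOW", "character": "CHARACTER", "actor": "Mila Kunis", "start": 2},
    {"show": "SHOW", "character": "OTHER", "actor": "Someone Else", "start": 1},
]


def match_query(entries, show, character):
    """Find every y node connected to both entities in the question."""
    return [e for e in entries
            if e["show"] == show and e["character"] == character]


def answer(entries, show, character, use_argmin=True):
    """Return the answer node(s) x, optionally applying the argmin constraint."""
    matches = match_query(entries, show, character)
    if not use_argmin:
        return [e["actor"] for e in matches]
    if not matches:
        return []
    earliest = min(matches, key=lambda e: e["start"])  # argmin over start time
    return [earliest["actor"]]


print(answer(cast_entries, "SHOW", "CHARACTER", use_argmin=False))
print(answer(cast_entries, "SHOW", "CHARACTER"))
```

Without the constraint the query yields both actors; with it, only the one with the smallest start value survives.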
There are two main challenges of symbolic KB-QA systems:
Paraphrasing in language. Semantic parsing is vulnerable to paraphrasing in natural language: semantically equivalent questions can be parsed into different query graphs and therefore produce different, inaccurate answers.
Search complexity. The number of candidate paths grows exponentially with path length.
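To see the search-complexity problem in numbers, the sketch below counts the paths of a given length that leave one node of a toy graph. It assumes every node has the same out-degree and that a path may revisit nodes; uniform_graph and count_paths are hypothetical helpers, and real knowledge graphs have far less regular (and often much larger) branching.

```python
def uniform_graph(num_nodes, out_degree):
    """Toy graph where node i points to the next `out_degree` nodes (mod n)."""
    return {
        i: [("r", (i + k) % num_nodes) for k in range(1, out_degree + 1)]
        for i in range(num_nodes)
    }


def count_paths(graph, start, length):
    """Count relation paths of exactly `length` hops starting at `start`."""
    if length == 0:
        return 1
    return sum(count_paths(graph, t, length - 1)
               for _, t in graph.get(start, []))


graph = uniform_graph(num_nodes=10, out_degree=3)
counts = [count_paths(graph, 0, length) for length in range(1, 6)]
print(counts)  # [3, 9, 27, 81, 243]
```

With an out-degree of 3, each extra hop triples the number of candidate paths, so a system that enumerates paths exhaustively quickly becomes impractical as questions require longer reasoning chains.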