Throughout this long survey paper, we have explored three types of conversational AI in detail: KB-QA systems, task-completion systems, and social bots. But how are they used in industry? In this post, we will look at the conversational AI systems currently used in industry for each of the three types.

Question Answering Systems

Bing QA

Bing QA is an extension of the Bing search engine. Instead of returning links, it generates a direct answer to the user query by analysing the source documents retrieved by Bing search, which makes it a text-QA system. Bing QA tackles the Web QA task, which is more challenging than traditional machine reading comprehension (MRC) tasks for three reasons:

  1. Large, noisy text collection. The source documents come from many different websites, and the answer span is hidden within noisy passages.

  2. Runtime latency. Bing QA’s MRC component must add no more than 10 milliseconds to the entire serving stack. In academic settings, by contrast, runtime is rarely the main priority.

  3. User experience. Bing QA also focuses on how answers should be displayed on different devices. The figure below shows a typical search query with its highlighted answer span and supporting evidence, along with the Bing QA architecture.

Given the user query, a set of candidate documents is first retrieved. These candidate documents are then fed into the Document Ranking module, which assigns them relevance scores. The top-ranked documents are presented on the results page with captions generated by the Query-focused Captioning module. The Passage Chunking module splits the top-ranked documents into a set of candidate passages, which are then ranked by the Passage Ranking module. Finally, the MRC module identifies the answer span within the top-ranked passages.

Bing QA has also demonstrated the ability to handle conversational queries through its Conversational Query Understanding module, which works in two steps: 1) determine whether the query depends on context from the same session, and if so, 2) rewrite the query to include the necessary context, as shown below.
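The pipeline can be illustrated with a toy sketch. Everything in it is an assumption for illustration: lexical term overlap stands in for the neural rankers and the MRC model, the three-document corpus is made up, and the conversational rewrite uses a simple pronoun heuristic with a hypothetical `context_entity` parameter rather than Bing QA’s actual Conversational Query Understanding model. The function names (`answer`, `rewrite_query`, `chunk`, `read`) are likewise hypothetical.

```python
import re

# Hypothetical toy corpus; a real system would query the Bing index.
CORPUS = [
    "The Eiffel Tower is in Paris. The Eiffel Tower is 330 metres tall. It opened in 1889.",
    "Paris is the capital of France. It is known for its cafes.",
    "Mount Everest is the highest mountain on Earth.",
]
PRONOUNS = {"it", "its", "he", "she", "they", "his", "her", "their"}
SENTENCE_SPLIT = r"(?<=[.!?])\s+"


def tokens(text):
    # Lowercased alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, text):
    # Toy relevance score: fraction of query terms that also appear in the text.
    q = set(tokens(query))
    return len(q & set(tokens(text))) / max(len(q), 1)


def rewrite_query(query, context_entity=None):
    # Step 1: does the query depend on session context? (Heuristic: it has a pronoun.)
    if context_entity is None or not PRONOUNS & set(tokens(query)):
        return query
    # Step 2: rewrite the query by substituting the context entity for the pronoun.
    return " ".join(context_entity if w.lower() in PRONOUNS else w for w in query.split())


def chunk(doc, size=2):
    # Passage Chunking: group consecutive sentences into passages of `size` sentences.
    sents = [s for s in re.split(SENTENCE_SPLIT, doc.strip()) if s]
    return [" ".join(sents[i:i + size]) for i in range(0, len(sents), size)]


def read(query, passage):
    # MRC stand-in: return the best-matching sentence as the "answer span".
    return max(re.split(SENTENCE_SPLIT, passage), key=lambda s: score(query, s))


def answer(query, corpus, top_docs=2):
    # Retrieval: candidate documents sharing at least one query term.
    candidates = [d for d in corpus if score(query, d) > 0]
    # Document Ranking: keep the highest-scoring documents.
    docs = sorted(candidates, key=lambda d: score(query, d), reverse=True)[:top_docs]
    # Passage Chunking over the top documents.
    passages = [p for d in docs for p in chunk(d)]
    if not passages:
        return None
    # Passage Ranking, then MRC on the top-ranked passage.
    best = max(passages, key=lambda p: score(query, p))
    return read(query, best)


print(answer("how tall is the Eiffel Tower", CORPUS))
# → The Eiffel Tower is 330 metres tall.
print(rewrite_query("when did it open", "the Eiffel Tower"))
# → when did the Eiffel Tower open
```

Keeping each stage a separate function mirrors the modular architecture: any one stage, such as the passage ranker, could be replaced by a learned model without touching the others.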

Satori QA

Satori QA is a KB-QA system that uses both neural and symbolic methods to generate answers to factual questions. It also tackles the Web QA task and faces problems similar to Bing QA’s. One strategy to improve runtime efficiency is to split a complex question into multiple simple questions, each of which is easier for the KB-QA system to answer. The final answer is then computed by combining the sequence of answers, as shown below.
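A minimal sketch of the decomposition idea, assuming the complex question has already been split into a starting entity and a chain of relations. Satori QA would have to perform that split itself; here it is done by hand. The toy knowledge base, the relation names, and the functions `answer_simple` and `answer_complex` are hypothetical.

```python
# Hypothetical toy knowledge base: (entity, relation) -> set of answer entities.
KB = {
    ("France", "capital"): {"Paris"},
    ("Paris", "river"): {"Seine"},
    ("Germany", "capital"): {"Berlin"},
    ("Berlin", "river"): {"Spree", "Havel"},
}


def answer_simple(entity, relation):
    # One simple question = one single-hop KB lookup.
    return KB.get((entity, relation), set())


def answer_complex(start, relations):
    # Answer the simple questions in sequence, feeding each answer set into the
    # next hop; the last set is the final answer. `trace` records every hop.
    current = {start}
    trace = []
    for relation in relations:
        current = set().union(*(answer_simple(e, relation) for e in current))
        trace.append((relation, sorted(current)))
        if not current:
            break
    return current, trace


# "Which rivers flow through the capital of Germany?" decomposed by hand into
# "What is the capital of Germany?" then "Which rivers flow through <that city>?"
final, steps = answer_complex("Germany", ["capital", "river"])
print(sorted(final))  # → ['Havel', 'Spree']
print(steps)          # → [('capital', ['Berlin']), ('river', ['Havel', 'Spree'])]
```

Each hop is a cheap lookup, which is why splitting a question this way can be faster than answering the complex question in one shot.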

Customer Support Agents

These are essentially multi-turn KB-QA agents. Given a user’s query and problem description, the agent needs to either recommend a pre-defined solution or, for complex issues, connect the user to a human agent. These dialogues often span multiple turns, as the agent asks clarifying questions to understand the problem while navigating the knowledge base for the most relevant solution.
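The clarify-then-recommend loop can be sketched as follows. This is a hypothetical illustration, not any vendor’s agent: the three-entry solution base, the `product` and `symptom` attributes, and the turn limit are all assumptions. At each turn the agent filters the knowledge base by what it knows so far, then recommends a solution if exactly one remains, asks about an attribute that still separates the remaining candidates, or hands off to a human agent.

```python
# Hypothetical solution base: each entry is keyed by the attributes that identify it.
SOLUTIONS = [
    {"product": "laptop", "symptom": "no power",
     "solution": "Check the charger, then hold the power button for 30 seconds."},
    {"product": "laptop", "symptom": "no wifi",
     "solution": "Turn airplane mode off and restart the router."},
    {"product": "printer", "symptom": "paper jam",
     "solution": "Open the rear tray and remove the jammed sheet."},
]
SLOTS = ("product", "symptom")


def next_action(known, turn, max_turns=3):
    # Keep only solutions consistent with everything the user has told us.
    matches = [s for s in SOLUTIONS if all(s[k] == v for k, v in known.items())]
    if not matches:
        return "escalate", "no matching solution"
    if len(matches) == 1:
        return "recommend", matches[0]["solution"]
    if turn >= max_turns:
        return "escalate", "too many turns"
    # Ask about an unknown attribute that still distinguishes the candidates.
    for slot in SLOTS:
        if slot not in known and len({m[slot] for m in matches}) > 1:
            return "ask", slot
    return "escalate", "ambiguous issue"


def run_dialogue(user_replies, max_turns=3):
    # `user_replies` simulates the user's answer to each clarifying question.
    known, turn = {}, 0
    while True:
        action, payload = next_action(known, turn, max_turns)
        if action != "ask":
            return action, payload
        reply = user_replies.get(payload)
        if reply is None:
            return "escalate", "user could not answer"
        known[payload] = reply
        turn += 1


print(run_dialogue({"product": "laptop", "symptom": "no wifi"}))
# → ('recommend', 'Turn airplane mode off and restart the router.')
print(run_dialogue({"product": "phone"}))
# → ('escalate', 'no matching solution')
```

Every "ask" fills a previously unknown attribute, so the loop always terminates with either a recommendation or a hand-off.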

Task-oriented Dialogue Systems

These systems are also known as virtual assistants, and they often reside in phones and smart speakers. They are designed to provide common information and services. Examples of virtual assistants include Apple’s Siri and Amazon Alexa. Several platforms provide development toolkits that make it easier to build virtual assistants. For example:

  1. Microsoft’s Task Completion Platform (TCP). Used to develop multi-domain dialogue systems, as shown in the figure below. TCP lets developers define individual tasks and can be used to power multi-turn dialogues.

  2. Azure Cognitive Services’ LUIS. A built-in natural language understanding (NLU) API that identifies users’ intents and domains. It is a black-box NLU service that lets developers incorporate machine learning into their virtual assistants.

  3. Google’s Dialogflow. Google’s development suite for building virtual assistants for websites and IoT devices.
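To make these pieces concrete without relying on any of these platforms’ actual APIs, here is a framework-agnostic sketch of what such toolkits help you build: task definitions with required slots, an intent classifier standing in for an NLU service like LUIS, slot filling, and a simple policy that asks for missing slots across turns. All task names, keywords, regular expressions, and prompts are hypothetical, and keyword matching is a crude stand-in for a trained NLU model.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

TIME = r"(\d{1,2}(?::\d{2})?\s?(?:am|pm))"


@dataclass
class Task:
    # Hypothetical task definition: trigger keywords, slot extractors, slot prompts.
    name: str
    keywords: Set[str]
    slots: Dict[str, str]    # slot name -> regex whose group 1 is the value
    prompts: Dict[str, str]  # slot name -> question asked when the slot is missing


TASKS = [
    Task("set_alarm", {"alarm", "wake"},
         {"time": r"\b" + TIME + r"\b"},
         {"time": "What time should I set the alarm for?"}),
    Task("book_table", {"book", "table", "restaurant"},
         {"people": r"\bfor (\d+)\b", "time": r"\bat " + TIME},
         {"people": "How many people?", "time": "What time?"}),
]


@dataclass
class DialogueState:
    task: Optional[Task] = None
    filled: Dict[str, str] = field(default_factory=dict)
    pending: Optional[str] = None  # slot we asked about on the previous turn


def classify(utterance):
    # Stand-in for an NLU service: pick the task with the most keyword hits.
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    best = max(TASKS, key=lambda t: len(t.keywords & words))
    return best if best.keywords & words else None


def respond(state, utterance):
    text = utterance.lower()
    if state.task is None:
        state.task = classify(text)
        if state.task is None:
            return "Sorry, I can't help with that yet."
    # Slot filling: extract any slot values mentioned in this utterance.
    for slot, pattern in state.task.slots.items():
        match = re.search(pattern, text)
        if match and slot not in state.filled:
            state.filled[slot] = match.group(1)
    # If we just asked for a slot and no pattern matched, take the whole reply.
    if state.pending and state.pending not in state.filled:
        state.filled[state.pending] = text.strip()
    # Dialogue policy: ask for the first missing slot, otherwise complete the task.
    for slot in state.task.slots:
        if slot not in state.filled:
            state.pending = slot
            return state.task.prompts[slot]
    state.pending = None
    return f"Done: {state.task.name} {state.filled}"


state = DialogueState()
print(respond(state, "Book a table for 4"))  # → What time?
print(respond(state, "7pm"))  # → Done: book_table {'people': '4', 'time': '7pm'}
```

The `pending` field is what makes the dialogue multi-turn: a bare reply such as "7pm" is interpreted as the answer to the question asked on the previous turn.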


Social Bots

Many virtual assistants today have built-in chitchat functionality. However, some agents are built solely to handle chitchat. One of the most popular such systems is XiaoIce, which was designed to satisfy the human need for emotional connection. The figure below shows the architecture of XiaoIce, which consists of three layers:

  1. User experience layer

  2. Conversation engine layer

  3. Data layer

The user experience layer connects XiaoIce to popular chat platforms and has two communication modes:

  1. Full-duplex. Handles voice-stream-based conversations in which the user and XiaoIce can talk to each other simultaneously.

  2. Message-based. Handles turn-based conversations in which the user and XiaoIce take turns to talk.

The conversation engine layer tracks the dialogue state at each turn and, based on that state, selects either Core Chat or a dialogue skill to generate a response. One distinctive component of XiaoIce is the Empathetic module, which lets XiaoIce understand the emotional aspects of a conversation and, in turn, generate empathetic responses. The Core Chat module combines neural generation and retrieval-based methods, and the figure below shows XiaoIce’s ability to generate human-like responses. The last layer is the data layer, which stores all the information collected about users during conversations.
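The routing logic of the conversation engine can be sketched as below. Every component is a hypothetical stand-in: a keyword detector replaces the Empathetic module, a three-entry index replaces XiaoIce’s retrieval corpus, a fixed template replaces the neural generator, and a simple overlap threshold replaces the learned ranker that chooses among Core Chat candidates. The `history` list plays the role of the data layer’s per-user record.

```python
import re
from dataclasses import dataclass, field
from typing import List

NEGATIVE_WORDS = {"sad", "tired", "lonely", "upset"}

# Hypothetical retrieval index of (past query, response) pairs.
INDEX = [
    ("i love music", "Me too! What kind of music do you like?"),
    ("i feel tired today", "That sounds exhausting. Did you get enough sleep?"),
    ("what is your name", "I'm a chatbot. Nice to meet you!"),
]

# Hypothetical dialogue skills, each triggered by a keyword.
SKILLS = {
    "weather": lambda utterance: "Let me check the weather for you.",
    "song": lambda utterance: "Playing a song you might like.",
}


@dataclass
class UserState:
    history: List[str] = field(default_factory=list)  # data layer stand-in
    emotion: str = "neutral"


def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def track(state, utterance):
    # Dialogue state tracking plus a keyword stand-in for the Empathetic module.
    state.history.append(utterance)
    state.emotion = "negative" if words(utterance) & NEGATIVE_WORDS else "neutral"


def core_chat(state, utterance):
    # Retrieval-based candidate: the stored response whose query overlaps most.
    query, retrieved = max(INDEX, key=lambda pair: len(words(pair[0]) & words(utterance)))
    # Generation-based candidate: a template standing in for a neural generator.
    if state.emotion == "negative":
        generated = "I'm sorry to hear that. Do you want to talk about it?"
    else:
        generated = "Tell me more about that."
    # Toy ranker: keep the retrieved reply only if it matches at least two words.
    return retrieved if len(words(query) & words(utterance)) >= 2 else generated


def respond(state, utterance):
    track(state, utterance)
    # Conversation engine: route to a triggered skill, otherwise fall back to Core Chat.
    for keyword, skill in SKILLS.items():
        if keyword in words(utterance):
            return skill(utterance)
    return core_chat(state, utterance)


state = UserState()
print(respond(state, "What's the weather like?"))  # → Let me check the weather for you.
print(respond(state, "I feel so tired today"))     # → That sounds exhausting. Did you get enough sleep?
print(respond(state, "I am sad"))                  # → I'm sorry to hear that. Do you want to talk about it?
```

The third reply shows how the detected emotion changes Core Chat’s output even when no retrieved response matches well.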

Other purely chitchat bots, such as Replika and the Alexa Prize systems, are worth exploring further if you are interested.


