After describing the role of humanities researchers in populating and analyzing the knowledge graphs that support the design of Embodied Conversational Agents, let’s see how to use this knowledge in technological applications that exhibit a degree of intentionality, as opposed to approaches based on machine learning alone.
Some Context
In the last few years, industry-grade game engines such as Unity and Unreal Engine have been made available to the general public. Unlike earlier versions, which required strong programming skills to use effectively, the newest versions of these tools pay close attention to entry-level users and offer user-friendly interfaces through which developers can access their most powerful technologies. Game engine developers now aim to make features such as environment navigation and animation logic more accessible and easier to integrate into complex workflows.
While the industrial effort concentrates on interface quality, development speed, and technological integration, researchers in the AI field are mainly interested in the models underlying interactive processes. Industry explores these aspects less, which makes the work of AI and HMI researchers complementary to it.
Integrating the two efforts has the potential to benefit both communities. Specifically, building on freely available, continuously evolving technology lets researchers concentrate on interaction management models rather than on technicalities, whether these arise from the need to match the design quality standards set by the entertainment industry or from the need to integrate a new control device.
Framework for Advanced Natural Tools and Applications With Social Interactive Agents
The Framework for Advanced Natural Tools and Applications with Social Interactive Agents (FANTASIA¹) is a plugin for the Unreal Engine that supports the development of RTI3D applications based on natural interaction and social processes. Integrating game engines with the tools used in studies on natural interaction helps AI and HMI research. To this end, FANTASIA integrates access to Neo4j from inside Unreal, building a coherent development experience through the Blueprint scripting language. The foundations of this effort can be summarized as:
- Build upon the interest of the gaming industries in providing advanced environments to develop interactive experiences
- Let researchers concentrate on Artificial Intelligence models without falling behind the design standards set by the industry
The prototype version of FANTASIA was presented in 2019, while the re-engineered version was presented at the NODES conference in 2021 and at the ACM Multimedia Conference in 2022. You can find out more about it on the GitHub repository and on the YouTube channel. The following Figure shows the FANTASIA architecture.
Neo4j for Embodied Conversational Agents
Results obtained through LORIEN can be used to implement ECAs using Neo4j and the Unreal Engine through FANTASIA. Our attempt to converge toward a generic dialogue management model with argumentation capabilities involves the harmonious use of different AI tools. Our ongoing work aims at formalizing methodological procedures to transfer LORIEN findings into a computational model we call Artificial Neural and Graphical Models for Argumentation Research (ANGMAR).
In ANGMAR, neural approaches are mainly used to manage tasks like Automatic Speech Recognition, Intent Recognition, Entity Recognition, and Speech Synthesis. These tasks are the closest, in the perception-action cycle, to the physical level. Deeper layers, dedicated to reasoning and decision-making, are instead implemented using graphical models like graph databases and Bayesian Networks. The connection between the different levels happens in Neo4j, where utterances, recognized intents, and entities are represented as a graph and linked to the CCG in a way very similar to the one used in LORIEN to represent dialogue corpora.
In ANGMAR, the graph database is aligned with the intent/entity recognizer so that utterances can be stored directly alongside their interpretation. When intents need specific parameters (slots) to be filled, these are also represented as nodes and linked to the named entities recognized in the utterance. To support reasoning, intents may also imply beliefs that the speaker intends for the counterpart to accept. The following figure summarizes the steps taken in ANGMAR to update the dialog history and the consequent belief graph.
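As a rough illustration of the structure described above, the following Python sketch builds a tiny in-memory property graph for a single utterance. The node labels (Utterance, Intent, Slot, Entity, Belief), the relationship names, the example utterance, and the `add_utterance` helper are all hypothetical and chosen for illustration only. They are not ANGMAR's actual schema, and a real system would store this data in Neo4j rather than in Python structures.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Graph:
    """Minimal in-memory stand-in for a property graph (not Neo4j itself)."""
    nodes: dict = field(default_factory=dict)   # node id -> (label, properties)
    edges: list = field(default_factory=list)   # (source id, type, target id)
    _ids: count = field(default_factory=count, repr=False)

    def add_node(self, label, **props):
        node_id = next(self._ids)
        self.nodes[node_id] = (label, props)
        return node_id

    def add_edge(self, source, rel_type, target):
        self.edges.append((source, rel_type, target))

    def neighbours(self, source, rel_type):
        return [t for s, r, t in self.edges if s == source and r == rel_type]


def add_utterance(graph, text, intent, slots, implied_beliefs=()):
    """Store an utterance with its intent, filled slots, and implied beliefs.

    `slots` maps a slot name to the entity text recognized for it.
    `implied_beliefs` is an iterable of (subject, predicate) pairs.
    """
    utterance_id = graph.add_node("Utterance", text=text)
    intent_id = graph.add_node("Intent", name=intent)
    graph.add_edge(utterance_id, "HAS_INTENT", intent_id)
    for slot_name, entity_text in slots.items():
        slot_id = graph.add_node("Slot", name=slot_name)
        entity_id = graph.add_node("Entity", text=entity_text)
        graph.add_edge(intent_id, "HAS_SLOT", slot_id)
        graph.add_edge(utterance_id, "MENTIONS", entity_id)
        graph.add_edge(slot_id, "FILLED_BY", entity_id)  # slot linked to entity
    for subject, predicate in implied_beliefs:
        belief_id = graph.add_node("Belief", subject=subject, predicate=predicate)
        graph.add_edge(intent_id, "IMPLIES", belief_id)
    return utterance_id


graph = Graph()
utterance_id = add_utterance(
    graph,
    text="I loved Alien",                      # hypothetical user utterance
    intent="express_preference",               # hypothetical intent name
    slots={"movie": "Alien"},
    implied_beliefs=[("user", "likes Alien")],
)
```

Keeping slots and beliefs as nodes of their own, rather than as properties of the intent, is what lets later pattern checks traverse from an utterance to what it implies.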
Intentionality in Graphs
Through graph structures, ANGMAR represents dialogue dynamics and can analyze these to decide what to say next. This process relies on communication models coming from the linguistics field to take into account what could happen, from communication issues to intentional moves.
After each user move, the system evaluates what to do depending on the context, intended as the graph configuration, dialog management priorities, and final objectives. System moves are intended to alter the graph and take it towards a desired configuration, which is a graph structure containing a goal pattern. A desirable graph is a graph that exhibits this pattern, and the system will not attempt to modify it: for the case of the movie recommendation task, a desirable pattern may represent a user accepting a recommended item.
In general, a linguistic description of dialog management procedures assigns priorities to the moves that can be performed during the communication process. For example, solving communication problems takes priority over answering open questions, and answering open questions takes priority over performing intentional moves (in this case, recommending a movie or asking a profiling question).
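To make the idea of a goal pattern more concrete, here is a minimal Python sketch for the movie recommendation case. It assumes the graph is reduced to a set of (source, relationship, target) triples. The relationship names RECOMMENDED and ACCEPTED, the move name, and both functions are hypothetical, and in Neo4j the same check would be a pattern match written in Cypher.

```python
def is_desirable(edges):
    """Return True if some item was recommended by the system and accepted by the user.

    `edges` is a set of (source, relationship, target) triples standing in
    for the graph; the relationship names are illustrative assumptions.
    """
    recommended = {t for s, r, t in edges if s == "system" and r == "RECOMMENDED"}
    accepted = {t for s, r, t in edges if s == "user" and r == "ACCEPTED"}
    return bool(recommended & accepted)


def next_system_move(edges):
    """Choose a move only while the goal pattern is still missing."""
    if is_desirable(edges):
        return None  # goal pattern present: leave the graph unchanged
    return "recommend_or_profile"  # placeholder for an intentional move


before = {("system", "RECOMMENDED", "Alien")}
after = before | {("user", "ACCEPTED", "Alien")}
```

With these two sets, `next_system_move(before)` returns the placeholder move, while `next_system_move(after)` returns `None` because the goal pattern is already present.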
Task Prioritization With Behaviour Trees
In ANGMAR, dialog strategies are organized in a Behavior Tree (BT) using the Unreal Engine AI editor to represent these priorities. Using FANTASIA, it is possible to add to the BT a node designed to submit queries to Neo4j, so that graph data can be used to make decisions. During dialogue management, a number of communication issues may arise and have to be solved as soon as they are detected. Among these issues, a further hierarchy sets their priority so that appropriate clarification requests can be generated depending on the detected problem³.
For each kind of problem described in the linguistics background, a specific graph pattern can be searched to verify if the problem is present. If this is the case, the corresponding recovery strategy can be adopted. The following figure shows the subtree dedicated to detecting and solving acoustic problems.
BTs give priority to tasks on the leftmost branches: if one of these succeeds, the tasks to its right are not evaluated. In this case, the most important problem to solve is Acoustic Confidence.
The ExecuteNeo4jQuery task allows BTs to run a Cypher query and save its result in a dedicated data structure provided by FANTASIA for Neo4j data. The task that follows it checks whether the error pattern was found and, if so, generates the appropriate reaction and skips all the other possible moves.
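The left-to-right priority logic can be sketched in plain Python as follows. This is not the Unreal Engine Behavior Tree API, nor FANTASIA's ExecuteNeo4jQuery task: the classes `Selector`, `Sequence`, `RunQuery`, `ReactIfFound`, and `Act`, the blackboard keys, and the move names are hypothetical stand-ins, and the query is replaced by a callable that returns rows.

```python
SUCCESS, FAILURE = "success", "failure"


class Selector:
    """Tick children left to right; stop at the first one that succeeds."""
    def __init__(self, *children):
        self.children = children

    def tick(self, blackboard):
        for child in self.children:
            if child.tick(blackboard) == SUCCESS:
                return SUCCESS
        return FAILURE


class Sequence:
    """Tick children left to right; stop at the first one that fails."""
    def __init__(self, *children):
        self.children = children

    def tick(self, blackboard):
        for child in self.children:
            if child.tick(blackboard) == FAILURE:
                return FAILURE
        return SUCCESS


class RunQuery:
    """Stand-in for a query task: call `run_query` and store its rows."""
    def __init__(self, run_query, key):
        self.run_query, self.key = run_query, key

    def tick(self, blackboard):
        blackboard[self.key] = self.run_query()
        return SUCCESS


class ReactIfFound:
    """Succeed and record a repair move only if the query returned rows."""
    def __init__(self, key, move):
        self.key, self.move = key, move

    def tick(self, blackboard):
        if not blackboard.get(self.key):
            return FAILURE
        blackboard["move"] = self.move
        return SUCCESS


class Act:
    """Lower-priority move, reached only if every branch to its left failed."""
    def __init__(self, move):
        self.move = move

    def tick(self, blackboard):
        blackboard["move"] = self.move
        return SUCCESS


def build_tree(low_confidence_rows):
    """Acoustic repair on the left, an intentional move on the right."""
    acoustic_repair = Sequence(
        RunQuery(lambda: low_confidence_rows, "acoustic_rows"),
        ReactIfFound("acoustic_rows", "ask_to_repeat"),
    )
    return Selector(acoustic_repair, Act("recommend_movie"))
```

When the query stand-in returns at least one row, the left branch succeeds and the selector never reaches the recommendation; when it returns no rows, the sequence fails and the selector falls through to the right branch.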
In the current implementation of ANGMAR, not all problems are actually managed, but from a theory representation point of view, they all have a clear position so that, as research proceeds, we know where to position further pattern checks and specific dialog repair strategies. The flexibility offered by BTs lets us update the computational model iteratively as the theoretical model evolves.
Handling Logical Conflicts
A particularly interesting communication problem concerns conflicts between previous beliefs from the system and incoming evidence from the user.
Linguistics research highlights that specific question forms must be used to communicate the problem efficiently. The Neo4j transaction system, made available by FANTASIA, supports a reasoning mechanism that verifies that the beliefs implied by the user do not conflict with existing beliefs. Specifically, if no higher-priority problem is detected, the system can temporarily accept the beliefs implied by the last user utterance by opening a transaction and updating the belief graph without committing the changes. Then, conflicting patterns can be searched for in the temporary graph: this includes checking that a belief and its negation, concerning the same subject and predicate, do not exist in the graph at the same time. If a conflicting pattern is found, the corresponding clarification request is generated, and the transaction is rolled back. This effectively implements a hypothesizing mechanism that allows the system to reason about what would happen if it were to accept the belief implied by the user.
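The hypothesizing mechanism can be sketched in Python with an in-memory store that mimics begin, commit, and rollback. This is only an analogy for the Neo4j transaction described above: the `BeliefStore` class, the (subject, predicate, polarity) encoding of beliefs, and the choice to commit when no conflict is found are assumptions made for illustration.

```python
class BeliefStore:
    """In-memory stand-in for a belief graph with transaction-like semantics."""

    def __init__(self):
        self.committed = set()  # beliefs as (subject, predicate, polarity)
        self.pending = None     # uncommitted additions, None if no transaction

    def begin(self):
        self.pending = set()

    def add(self, subject, predicate, polarity=True):
        self.pending.add((subject, predicate, polarity))

    def visible(self):
        """Beliefs as seen from inside the open transaction."""
        return self.committed | (self.pending or set())

    def commit(self):
        self.committed |= self.pending
        self.pending = None

    def rollback(self):
        self.pending = None


def find_conflicts(beliefs):
    """Return (subject, predicate) pairs asserted both positively and negatively."""
    return sorted(
        (s, p) for s, p, polarity in beliefs
        if polarity and (s, p, False) in beliefs
    )


def hypothesize(store, implied_beliefs):
    """Tentatively accept the implied beliefs and keep them only if coherent."""
    store.begin()
    for subject, predicate, polarity in implied_beliefs:
        store.add(subject, predicate, polarity)
    conflicts = find_conflicts(store.visible())
    if conflicts:
        store.rollback()  # discard the hypothesis, ask for clarification
        return "clarification_request", conflicts
    store.commit()        # assumption: coherent beliefs are kept
    return "accepted", []


store = BeliefStore()
first = hypothesize(store, [("Alien", "is_horror", True)])
second = hypothesize(store, [("Alien", "is_horror", False)])
```

Here the first call is accepted, while the second contradicts a stored belief, so it produces a clarification request and leaves the committed beliefs untouched.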
A graph, representing the PCG, that exhibits this kind of pattern is defined as incoherent. The following Figure shows the subtree dedicated to the management of this situation.
You can find out more about conflict management with clarification requests using FANTASIA in the YouTube interview given by Dr. Maria Di Maro, from our lab.
In the last part of this series, I will concentrate on decision-making based on Bayesian Networks, dynamically assembled by extracting sub-graph structures from Neo4j.
Conversational Artificial Intelligence With Neo4j and Unreal Engine — Part 2 was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.