I’m in Singapore right now to give a talk and decided to visit the Institute for Media Innovation (IMI) at Nanyang Technological University. The institute focuses on building virtual agents and has recently been in the news for a social robot called Nadine. The visit gave me some insight into the synergy that comes from building virtual agents and robots that are both designed to interact with humans.
Visiting an Uncanny Valley
I had the opportunity to interact with Nadine today. We ran into a few glitches, but having this kind of in-person interaction with a social robot is an uncanny experience. You really do get the sense that you’re peeking into the future. You never forget that you’re dealing with a robot, mind you, but every once in a while the nuance of some of her body language catches you off guard.
Nadine answered some questions for me: her favorite color is blue, she looks thirty but is only three years old, and her favorite movie is Avengers: Age of Ultron. Even though you know she doesn’t really have a favorite movie, let alone one about a psychopathic AI taking over the world, there was something about the way she turned her head and looked me in the eye as she gave that particular answer that sent a little chill down my spine.
A Shared Platform for Social Robots and Virtual Agents
It turns out that there’s a fair amount of synergy between building social robots and social virtual agents. You might call it a common platform for human interaction.
One of the first requirements for a virtual or robotic companion is that it be able to visually track your behavior. Both Nadine, the social robot, and IMI’s digital agents use the motion and proximity detection capabilities of a Microsoft Kinect motion controller, as well as software that uses audio signals to help identify people’s spatial location. This is what gives screen-based agents and physical robots their ability to turn their focus towards a person in response to that person’s movements or voice. It’s what IMI calls its “gazing technology.”
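To make the gazing idea concrete, here is a minimal sketch of the geometry involved: turning a tracked person’s position into the two angles a head (or an on-screen face) would need to rotate by. This is not IMI’s code. The function name `gaze_angles` and the coordinate convention (meters, x to the right, y up, z straight ahead of the sensor) are my own assumptions for illustration.

```python
import math


def gaze_angles(x: float, y: float, z: float) -> tuple[float, float]:
    """Return (yaw, pitch) in degrees needed to look at a point.

    Assumed convention (hypothetical, not taken from any sensor SDK):
    the agent's head sits at the origin, x points right, y points up,
    z points straight ahead, and all distances are in meters.
    """
    # Yaw: left/right rotation, measured from the straight-ahead axis.
    yaw = math.degrees(math.atan2(x, z))
    # Pitch: up/down rotation, measured from the horizontal plane.
    pitch = math.degrees(math.atan2(y, math.hypot(x, z)))
    return yaw, pitch


# A person standing one meter ahead and one meter to the right.
yaw, pitch = gaze_angles(1.0, 0.0, 1.0)
print(f"turn {yaw:.1f} degrees, tilt {pitch:.1f} degrees")
```

A real system would feed this with positions from the depth sensor or the audio localization software and smooth the result so the head does not jitter.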
After gazing, the next step is recognizing the target, which is done with a combination of facial-detection and voice-pattern recognition software. This was the part of the system that was having some difficulties when I visited, but this layer of the architecture is critical to these artificial social companions’ ability to adjust their behavior to the particular human they are interacting with.
From gazing and recognizing, the focus shifts to remembering various attributes of the human with whom the system is interacting. This happens through a kind of virtual episodic memory. The team felt that this memory system will eventually rely on some sort of neural network capable of gracefully allowing memories to fade over time as they’re no longer needed.
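The idea of memories fading when they are no longer needed can be illustrated without a neural network. The sketch below swaps in a much simpler technique, exponential decay with a half-life, purely to show the behavior; it is not the team’s design. The class name `FadingMemory` and its parameters are hypothetical.

```python
class FadingMemory:
    """Toy episodic memory: facts weaken over time unless recalled.

    Assumption: strength halves every `half_life_s` seconds since the
    fact was last stored or recalled, and facts weaker than
    `forget_below` are forgotten. This stands in for the neural
    approach mentioned above; it is not a neural network.
    """

    def __init__(self, half_life_s: float = 3600.0, forget_below: float = 0.1):
        self.half_life_s = half_life_s
        self.forget_below = forget_below
        self._items = {}  # key -> (value, time last used)

    def remember(self, key, value, now: float) -> None:
        self._items[key] = (value, now)

    def strength(self, key, now: float) -> float:
        if key not in self._items:
            return 0.0
        _, last_used = self._items[key]
        age = max(0.0, now - last_used)
        return 0.5 ** (age / self.half_life_s)

    def recall(self, key, now: float):
        # Faded memories are dropped; recalled ones are refreshed.
        if self.strength(key, now) < self.forget_below:
            self._items.pop(key, None)
            return None
        value, _ = self._items[key]
        self._items[key] = (value, now)
        return value


memory = FadingMemory(half_life_s=10.0)
memory.remember("visitor_favorite_color", "blue", now=0.0)
print(memory.recall("visitor_favorite_color", now=10.0))
```

Refreshing a fact on recall means the details that keep coming up in conversation stick around, while one-off details quietly disappear.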
With gazing, recognition and episodic memory in place, the focus shifts to actually interacting with humans. These interactions can be the whole point of the system in the first place, but it’s also through these interactions that the systems build up their understanding and memory of their human counterparts. Here, conversational capabilities are key: speech synthesis, speech recognition and the ability to generate dialogue. The team uses Google for speech recognition and a chat bot to enable both Nadine and the screen-based agents to carry on conversations. As is often the case, sometimes these systems work eerily well, and sometimes they are laughably bad.
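One way to picture how the conversational pieces fit together is as a single turn that chains three interchangeable services: speech recognition, dialogue generation and speech synthesis. The sketch below is my own illustration, not IMI’s architecture. The function `conversation_turn` and the three stand-in services are hypothetical; a real system would plug in an actual recognizer, chat bot and synthesizer.

```python
from typing import Callable


def conversation_turn(
    audio: bytes,
    recognize: Callable[[bytes], str],
    reply: Callable[[str], str],
    speak: Callable[[str], bytes],
) -> bytes:
    """Run one turn: heard audio in, spoken audio out."""
    heard = recognize(audio)
    if not heard.strip():
        # Recognition failed or the person said nothing intelligible.
        answer = "Sorry, I didn't catch that."
    else:
        answer = reply(heard)
    return speak(answer)


# Stand-in services so the sketch runs on its own (all hypothetical).
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_reply(text: str) -> str:
    return "My favorite color is blue." if "color" in text.lower() else "Tell me more."


def fake_speak(text: str) -> bytes:
    return text.encode("utf-8")


out = conversation_turn(b"What is your favorite color?", fake_recognize, fake_reply, fake_speak)
print(out.decode("utf-8"))
```

Because the three services are passed in as parameters, the same turn logic could drive either a robot or a screen-based agent, which is the kind of sharing the shared-platform idea points to.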
A Personal Companion Ecosystem
My main takeaway is that we are likely to see an ecosystem of services coming together in various combinations as the market for social robots and virtual agents develops further.
In the case of Nadine, the underlying robotics hardware is Japanese, though it’s been customized to look like the institute’s director. The team then connected this hardware to much of the underlying software they had already developed for their screen-based digital agents. Building virtual agents is much less expensive than building robots, so this is a practical path for startups to ramp up in the social robotics market.
The market for social robots, like most technology markets, will involve a vast array of subsystems such as voice recognition, chat bots, robotic chassis, robotic skins and numerous other components. The visit to IMI showed me what some of these social systems are likely to be and how much synergy these social layers create between virtual agents and physical robots.