Six Building Blocks for a Virtual Personal Assistant

The Virtual Personal Assistant represents the biggest change in software in decades. Here are six building-block technologies that will make it possible.

The race for the Virtual Personal Assistant market is on, as some of the world’s best-known companies make huge investments that will significantly change our lives in the years ahead.

The Virtual Personal Assistant will act as a de facto operating system, not unlike today’s search engines – but much more powerful. It will connect us to all manner of services, securing vast troves of personal data in the process, while building a powerful position as the intermediary to most online commercial transactions. Like the search and social network markets before it, the Virtual Personal Assistant market is likely to exhibit winner-take-all dynamics, which makes the stakes enormous.

In this article, I outline six building block technologies that I believe will come together to define something quite astounding, something that will very soon dazzle us with its ability to serve humanity. The Virtual Personal Assistant is also important because I believe it is our most likely path to human-like artificial intelligence.

What is a Virtual Personal Assistant?

First, let’s clear up some potential confusion around terminology. There are lots of different terms for what I’m referring to here as a “Virtual Personal Assistant.” Facebook is calling it a “Personal Digital Assistant” and Wikipedia refers to it as an “Intelligent Personal Assistant.” I resist Facebook’s term because “PDA” also connotes an older generation of handheld devices like Apple’s Newton and Palm PDAs. An “Intelligent Personal Assistant” seems wrong because it could include smart human assistants, especially in a world where online “crowdsourcing” services like Upwork (the rebranded merger of Elance and oDesk) connect us with remote human assistants from all around the world.

So what is a Virtual Personal Assistant? A Virtual Personal Assistant, or VPA, is software that’s designed to carry out tasks on our behalf. It acts like a choreographer of other services, bringing them together in ways that most effectively meet our needs. These programs are also sometimes referred to as “agents” because they are authorized to act on our behalf, just like a human agent. In addition to coordinating services, VPAs develop an intimate understanding of their human end users, which is why I believe the VPA is our most likely path to Artificial General Intelligence – but more on that later.

Many of us already work with rudimentary versions of Virtual Personal Assistants today, when we use Google Now, Apple’s Siri, Amazon’s Alexa, Microsoft’s Cortana, and now Baidu’s “DuEr.” Soon Facebook will join the fray with its upcoming “M.” There are many smaller firms developing VPA solutions as well (the Hound app is quite good, for example), but I believe that success in the VPA market will require the kind of large-scale user feedback loops that only the giants can truly generate.

Streamlining Your Life

Within a few years, these VPA services will take a surprising amount of the unpleasant noise of modern life off of our plates. The instant you decide to fly out of town for three days starting next Tuesday, your VPA will tell your local newspaper to shut off delivery, reschedule your existing appointments, book the best flight and lodging based on your historical preferences, and on the day of, automatically adjust your home thermostat to “away” mode and call a car to drive you to the airport.

Let’s face it: these are mind-numbingly annoying tasks that take time and don’t require much intelligence or creativity. They need to get done though, and turning them over to artificial intelligence will be a huge relief. Before long, doing these kinds of tasks by hand will be like washing the dishes by hand. Some of us will do it all the time, but most of us will only do it on occasion. Most of the time, we’ll just throw things in the dishwasher.

Six Building Block Technologies

Virtual Personal Assistants have two basic jobs: they understand us and our surrounding context and they use that understanding to choreograph other software to get stuff done on our behalf. VPAs act as intermediaries, or agents, and to be good at that they will eventually have to know us quite intimately.

I believe we are already starting to see the early outlines of the major investments now being made in order to deliver some very useful, if not downright magical, Virtual Personal Assistant experiences within the next five years. To understand what this may look and feel like, it’s useful to think in terms of six building block technologies:

1) Natural Language Processing

The user-facing “front end” of a VPA must be able to understand human language. This is what is known as Natural Language Processing (NLP), and machine learning has made impressive inroads into this challenge in recent years. Apple, Google, Amazon, Microsoft, IBM, Baidu and Facebook all have working solutions that just keep getting better every year thanks to a constant stream of feedback from millions and millions of end users putting their systems to the test.

2) Interest Graph

One way to think about a “graph” is as a map of relationships between things. In the case of an “interest graph,” what’s being mapped is our relationships with the things in which we have an interest. An interest graph maps our interests.

Virtual Personal Assistants will become the primary way we manage our interest graphs. Today, the way we do that is by interacting with hundreds of websites and services, each of which stores a separate, proprietary profile of the interests we express through using that particular service. The digital medium leaves our fingerprints on most everything we touch, especially as user interface designers give us quick and easy ways to seamlessly express our interests as we move through services. Our distributed online profiles take many forms, including the patterns we express through our liking on Facebook, our searching on Google, and our buying on Amazon.

As the Virtual Personal Assistant mediates more and more of our interactions with various websites and services, more and more of our interest graph consolidates within the VPA. Doc Searls’ quite prescient “Vendor Relationship Management” project articulates a compelling version of this idea, where people retain full control over their data.
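
To make the idea of consolidation concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each service exposes a profile as a simple map from topic to a numeric weight; the service names, topics, weights and the function name consolidate_interest_graph are all hypothetical and do not describe how any real VPA stores its data.

```python
from collections import defaultdict


def consolidate_interest_graph(service_profiles):
    """Merge per-service interest signals into one weighted map for a user.

    service_profiles maps a (hypothetical) service name to a dict of
    topic -> weight. Weights for the same topic are simply summed.
    """
    graph = defaultdict(float)
    for signals in service_profiles.values():
        for topic, weight in signals.items():
            graph[topic] += weight
    return dict(graph)


# Hypothetical, hand-made profiles standing in for separate services.
profiles = {
    "social_likes": {"jazz": 2.0, "cycling": 1.0},
    "search_queries": {"cycling": 3.0, "sourdough": 1.0},
    "purchases": {"cycling": 2.0, "jazz": 1.0},
}

interest_graph = consolidate_interest_graph(profiles)

# Rank topics by combined weight, strongest interest first.
top_interests = sorted(interest_graph, key=interest_graph.get, reverse=True)
print(top_interests)
```

Summing weights is the simplest possible merge rule. A real system would presumably also weigh how recent and how reliable each signal is, but the point here is only that scattered profiles collapse into a single map once one intermediary sees all of them.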

3) User Empathy (aka Emotional Intelligence)

We fool ourselves into believing we are purely rational beings, but the truth is that our emotions powerfully filter our experience of the world. An emotionally dumb VPA will never serve us as well as an emotionally intelligent one.

The challenge here is accurately reading human emotions, and there are two approaches for doing that today. The first is textual analysis through tools like Linguistic Inquiry and Word Count (LIWC), which Facebook rather infamously used in an experiment on some 700,000 of its users to identify posts with positive or negative sentiments and selectively suppress them. LIWC only works with large passages of text (over 400 words), however, which is likely to limit its applicability in most of our interactions with a VPA.

The more likely path to VPA emotional intelligence is detecting human emotion through speech analytics and video-based analysis of facial expressions. The companies operating in this space, like Affectiva, Emotient and NICE, are smaller firms, focusing on call center applications and solutions for assessing emotional responses to advertising. Of these, Affectiva is particularly interesting. It grew out of the MIT Media Lab and is developing mobile solutions for providing on-device processing of emotional signals. I would be surprised if they aren’t soon snapped up by one of the major VPA players.

4) Sensory Integration

Sensors now enable us to measure location, ambient light, air pressure, humidity, temperature and a range of other sensory data right from our phones. You could say that our sensory perception is getting a technological upgrade. Virtual Personal Assistants won’t just need to understand our speech, our interests and our emotions; given a direct feed from our augmented sensory stream, they’ll also help us better understand what’s happening around us. Are we within walking distance of an interesting restaurant around lunch time? Is the next bus at this stop running on time?

“Contextual computing” is the term used to describe how the coming proliferation of sensors will help us better navigate our world. In an important sense, through its sensory integration, the VPA will play a vital role in bridging our physical and virtual realities.
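
The restaurant question above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 800-meter walking radius, the 11:00 to 14:00 lunch window, the coordinates and the function names are all hypothetical choices of mine, and the distance is computed with the standard haversine great-circle formula rather than any actual walking route.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def should_suggest_lunch(user_pos, restaurant_pos, hour, max_walk_m=800):
    """Combine two context signals: time of day and distance.

    user_pos and restaurant_pos are (latitude, longitude) tuples;
    hour is the local hour of day (0-23). Thresholds are assumptions.
    """
    is_lunchtime = 11 <= hour < 14
    distance = haversine_m(*user_pos, *restaurant_pos)
    return is_lunchtime and distance <= max_walk_m


# Hypothetical coordinates roughly 550 meters apart.
me = (37.7749, -122.4194)
restaurant = (37.7799, -122.4194)
print(should_suggest_lunch(me, restaurant, hour=12))
```

Neither signal means much alone. It is the combination of location, time and, eventually, interests that turns raw sensor readings into context.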

5) Social Graph

Much has been made over the last decade of Facebook’s rise to prominence and the growing importance of the “social graph” – the map of our connections with other people.

Within the context of a Virtual Personal Assistant, a social graph is analogous to sensorial context; it’s just that rather than mapping our physical context, it maps our “social context” – and that’s something that is absolutely essential for the social animal that is the human being. Much of what we will do through our VPA, we will do together with other people, whether that’s sharing a ride, grabbing a coffee, or attending a conference. The social graph tells the VPA who matters to us – and by extension guides its coordination with other people’s VPAs.

6) Integration Through Schema

Once the VPA understands what we need, the next step is communicating those needs to third-party web services. The VPA’s ability to understand human communications will be quite sophisticated, but it can’t assume that same level of technological expertise exists with its service partners. To ensure clarity of communication, both parties need to agree on some form of standardized language.

In information science, that kind of standardized language is called an “ontology.” In software development, it’s called a schema. Schemas are going to play a critical role in enabling third-party service developers to send and receive communications to and from the VPA. The most noteworthy of these schemas right now is Schema.org, a collaboration between Google, Microsoft, Yahoo and Yandex (which controls 60% of Russia’s search market). Facebook has chosen to implement its own schema, known as Open Graph.

Simply looking at what’s in the Schema.org and Open Graph schemas will give you some hints at likely near-term capabilities of VPAs. Facebook is focused on games, media, restaurants and fitness, while Schema.org has a much broader coverage with interesting depth in events, creative works, healthcare, organizational types, locations, reservations and product descriptions.
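
To give a feel for what such a shared language looks like in practice, here is a small sketch in Python that builds a flight reservation as a JSON-LD document. The type and property names (FlightReservation, reservationId, reservationFor, Flight, Airport, iataCode and so on) come from the public Schema.org vocabulary as I understand it. The helper function and every value passed to it are hypothetical, and nothing here is meant to describe how any particular VPA actually exchanges messages.

```python
import json


def build_flight_reservation(reservation_id, passenger, flight_number,
                             origin, destination, departure_time):
    """Return a Schema.org-style FlightReservation as a plain dict.

    All arguments are illustrative; airports are given as IATA codes and
    departure_time as an ISO 8601 string.
    """
    return {
        "@context": "https://schema.org",
        "@type": "FlightReservation",
        "reservationId": reservation_id,
        "reservationStatus": "https://schema.org/ReservationConfirmed",
        "underName": {"@type": "Person", "name": passenger},
        "reservationFor": {
            "@type": "Flight",
            "flightNumber": flight_number,
            "departureAirport": {"@type": "Airport", "iataCode": origin},
            "arrivalAirport": {"@type": "Airport", "iataCode": destination},
            "departureTime": departure_time,
        },
    }


# Made-up example values.
reservation = build_flight_reservation(
    reservation_id="ABC123",
    passenger="Jane Example",
    flight_number="XX110",
    origin="SFO",
    destination="JFK",
    departure_time="2030-03-04T08:30:00-08:00",
)

# Serialize to JSON so it could be handed to a third-party service.
print(json.dumps(reservation, indent=2))
```

Because both sides agree in advance on what “reservationFor” or “departureAirport” means, the receiving service does not need any of the VPA’s language understanding. It only needs to read the agreed-upon fields.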

Magic Becomes the New Normal

I remember the first time I sensed that magical feeling of “wow” from using an early VPA. I was waiting in the San Francisco airport after my flight had been delayed, when suddenly my phone received a notification from Google Now with my updated departure time – before I’d even seen it show up on the airport’s own flight information monitor. That felt like magic to me.

Over the next few years, we’re all going to experience this kind of head-scratching magic as more and more services get tied into the intelligence of various Virtual Personal Assistants. It will only last a while of course. Soon, we’ll get used to it and having our own personal servant will be the new normal.

What I Wonder About

I wonder about a few things when it comes to our VPA future. For one, I wonder about whether and how we might expand beyond the “servant metaphor.” It’s quite clear to me that where the market wants to drive these services is towards an intelligent concierge – you know, someone who helps us buy stuff. It will need to be broader than that, of course, but the money will be in brokering goods and services, and so that’s where these software development efforts will really concentrate.

Are there other models though, other metaphors for what these services might be? How about a friend or counselor to help us make sense of what we’re feeling and how we engage with the world around us? Or how about a teacher, constantly working with us to help us grow to our fullest potential? How about a coach or a guru? The sky is the limit and we shouldn’t limit ourselves to just what our existing commercial models most easily support.

I also wonder about what these tools will evolve into over time. I actually believe the VPA is currently our most likely path to true human-like intelligence, or what experts call Artificial General Intelligence (AGI).

There are many forms of intelligence on this planet. Bat intelligence is finely tuned to a bat experience, as ant intelligence is to an ant’s experience. Human intelligence is finely tuned to the human experience, of course, and while we will create many forms of artificial intelligence that will be plenty powerful, they won’t necessarily be human-like. The ones that are will seem more intelligent to us, because they will be tuned to the human experience.

Virtual Personal Assistants will succeed primarily through their ability to mediate the human experience. They will be built to reflect human intelligence. More than that though, they will be taught through countless interactions each day by hundreds of millions, if not billions, of people. They will have a constant stream of intimate one-on-one tutoring sessions with each of us. Through all that teaching, they will learn to be human-like, which is why I believe the Virtual Personal Assistant is likely to be the first form of artificial intelligence on which we bestow the title of Artificial General Intelligence.