Emotion-Based Reinforcement Learning

New Research Hints at How Your Smiles Could One Day Teach Artificial Intelligence

We live in an era when humans are busy training a new intelligence on this planet. Every once in a while, researchers come up with a novel way to speed up that teaching process.

Simulating a Drive to Learn

That’s what happened at Microsoft Research, where computer scientists recently developed a new approach that uses human emotion to teach machines how to learn.[i] The research used virtual agents to learn various tasks in a simulated environment. What is most significant about this research is that it trained those agents by exposing them to the smiles of human subjects as they interacted with the system.

To make sense of this research, it helps to understand a bit about Reinforcement Learning. This machine learning technique is ideal for teaching systems to perform sequential tasks, which in this case centered on computer vision. Reinforcement Learning works by having a virtual agent carry out tasks within a simulated environment, over and over, and then distilling the resulting experience into a kind of model playbook (or “policy”). This approach was famously used a few years ago to train a system to play Atari games and later, in a more advanced version, to beat world champion Lee Sedol at Go.
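
The loop described above can be sketched in a few lines. What follows is a minimal, illustrative example of tabular Q-learning: an agent acts in a toy environment over and over, and its experience is distilled into a “policy.” The five-state corridor environment, the reward values, and all hyperparameters are assumptions chosen for the sketch, not anything from the Microsoft paper.

```python
import random

# Minimal tabular Q-learning: trial-and-error experience is distilled
# into a table of action values, from which a policy is read off.
N_STATES = 5          # states 0..4; state 4 is the goal and ends an episode
ACTIONS = [-1, +1]    # step left or step right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Move along the corridor; small cost per step, reward 1.0 at the goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else -0.01), done

random.seed(0)
for _ in range(200):                        # 200 episodes of trial and error
    s, done = 0, False
    while not done:
        if random.random() < EPS:           # occasional random exploration
            a = random.choice(ACTIONS)
        else:                               # otherwise act greedily
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        s2, r, done = step(s, a)
        # Q-learning update: nudge the value toward reward + discounted future
        q[(s, a)] += ALPHA * (r + GAMMA * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)])
        s = s2

# The distilled "policy": the best-valued action in each state (here: go right).
policy = {s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES)}
```

Systems like the Atari and Go players use deep neural networks rather than a lookup table, but the underlying loop of act, observe reward, and update the policy is the same.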

The key thing about Reinforcement Learning is that it relies heavily on the data the agent generates from exploring its environment, and that is the heart of this research. The Microsoft researchers wanted a generic approach to building a highly explorative agent. To get there, they drew on another technique, Imitation Learning, which trains systems by exposing them to actual people interacting with simulated environments. Their hypothesis was that the best way to distill an artificial drive to explore was to capture the experience of real humans exploring with the system. Based on earlier research linking positive emotions to both curiosity and learning, they built a system that looked for those feelings.

The result was a system that tracked people’s smiles as they drove a computer-generated vehicle through a simulated maze. After running a number of tests on the system, the researchers found that this ‘emotion-based Reinforcement Learning’ led to agents that explored for 51% longer, covered 46% more of the maze, and suffered 29% fewer collisions.

Emotion-based Reinforcement Learning

The significance of using emotional responses like smiles is that they represent intrinsic rewards that can be applied well beyond the tasks demonstrated in this study. Another important thing to understand about Reinforcement Learning is that an agent’s interactions with its simulated environment are heavily shaped by the specific rewards selected by the system’s designers. Want to teach a system to win at the Atari game Breakout? Then reward it when it scores points. The problem is that life doesn’t always provide easily quantifiable scores like that. Finding clear rewards is one of the central challenges of Reinforcement Learning.
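
The design problem can be made concrete with two hypothetical reward functions, one for a scored game and one for a task with no obvious score. Both functions, their names, and their signatures are illustrative assumptions, not anything from the research.

```python
# Two hypothetical reward functions illustrating the point above.
# Games hand the designer a clean, quantifiable reward; most real-world
# tasks do not, and any proxy the designer picks is a baked-in assumption.

def breakout_reward(prev_score: int, score: int) -> float:
    """For Breakout, the reward is simply the points scored this step."""
    return float(score - prev_score)

def customer_service_reward(interaction: dict) -> float:
    """For 'serve this customer well,' no score exists. Session length?
    Clicks? A survey? Every candidate proxy is a design choice."""
    raise NotImplementedError("no obvious quantifiable score to optimize")
```

The first function writes itself; the second is where Reinforcement Learning projects tend to stall, and it is exactly the gap that an emotional signal could fill.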

By measuring smiles as a proxy for positive emotions, the researchers found a generalizable feedback mechanism for training intelligent systems that is relatively easy to obtain. Emotion-based Reinforcement Learning essentially piggybacks on top of the hundreds of millions of years of biological intelligence that allow us to quickly and effortlessly assign value to various experiences. With the right tuning, that signal can be a powerful—and highly extensible—feedback mechanism for teaching machines how to learn.
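
One simple way to picture that tuning is a reward that blends a task’s extrinsic signal with a smile-based intrinsic bonus. The additive form, the weight, and the function name below are assumptions made for illustration; `smile_prob` stands in for the output of a smile detector, and the paper’s exact formulation may differ.

```python
# A hedged sketch: blending an extrinsic task reward with an affect-based
# intrinsic bonus. `smile_prob` stands in for a smile detector's output
# (0.0 to 1.0); the additive combination and the weight are illustrative.

def combined_reward(extrinsic: float, smile_prob: float,
                    weight: float = 0.5) -> float:
    """Return the task reward plus a weighted smile bonus."""
    return extrinsic + weight * smile_prob
```

Even when the task itself pays out nothing, a smiling user still produces a signal the agent can learn from, which is what makes the mechanism so extensible.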

Emotion AI in the Marketplace


While these researchers used an open-source smile-detection tool, there are also plenty of commercial players, such as Affectiva and Emotient (which was acquired by Apple in 2016). This market for emotion-tracking tools is sometimes referred to as “Emotion AI.” The research at Microsoft suggests a powerful new application for these tools: serving as a data source for emotion-based Reinforcement Learning.

Machines learn from humans in many contexts, but none more significant than in the service economy, where companies like Google, Netflix, and Amazon are automating our ability to serve ourselves. Our engagement with these powerful technology platforms generates enormous quantities of data that can be used to train machine learning systems for making these platforms smarter and more powerful over time.

Emotion-based Reinforcement Learning will be extremely useful for these platforms. Reinforcement Learning itself is particularly good for learning how to optimize sequential tasks. Sequential tasks are the very essence of business processes, and business processes are the heart of how companies create value for customers. The challenge in using Reinforcement Learning to master the tasks in business processes lies in securing a reliable source of clear rewards.

Smiles and other emotional expressions offer a plentiful source of just such rewards, thanks to the ubiquity of inexpensive cameras on the in-store kiosks, websites, and apps that serve as our interfaces to automated self-service. And as the researchers note, there is an opportunity for “extension to other physiological signals,” which means that voice data from smart speakers made by players like Amazon and Google could play a similar training role. Just as today, “calls may be recorded for training and quality purposes.” It’s just that the entity being trained will no longer be an employee, but an intelligent software agent.

Emotions Will Connect Machines to Us

What is it that these systems are learning from people’s engagement with them? If engagement is building relationships and putting those relationships to work, then machines that learn by engaging us learn how to relate to us and how to work for us.

Emotion AI helps machines better relate to us. Today, these technologies function as a kind of sensor for detecting emotional state, which can be used for marketing as well as for improving the functionality of products like automobiles. Over time, these technologies will expand their focus to encompass a form of rapport building with end users. It’s no real stretch of the imagination to see that one day we will interact with products and services through simulated personas. Today’s chatbots and agents like Siri and Alexa are just early examples. One day, you will have a relationship with your running shoes and your toothbrush.

What is interesting and new about this Microsoft research is that it foretells using Emotion AI in a way that resembles how our brains harness emotional signals. In psychology, valence describes the attractiveness or aversiveness of our emotional response to some experience. When a child touches a hot stove, the resulting strong negative valence leaves a powerful learning signal in her brain. In much the same way, emotion-based Reinforcement Learning could harness these very same emotional valences to fuel machine learning.

In the very big picture, what is most interesting about this research is the way that the planet’s ancient biological intelligence is now acting as a seedbed for a new machine intelligence. Emotions are powerful teachers and this new research points the way to what could be a powerful new mechanism for harnessing this ancient wisdom embedded within us.


[i] Affect-based Intrinsic Rewards for Learning General Representations

4 thoughts on “New Research Hints at How Your Smiles Could One Day Teach Artificial Intelligence”

  1. Very interesting, as usual. In your concluding paragraph, you say “In the very big picture, what is most interesting about this research is the way that the planet’s ancient biological intelligence is now acting as a seedbed for a new machine intelligence.” It is nice to see that human beings still have something to do with creating machine intelligence. But for how long?

    1. That’s a really good question, Bill. I’m working on a piece that I’ll publish in a few days that digs more into that question and what are the attributes of humanity that are likely to remain inaccessible to machine intelligence for the foreseeable future.

      Thanks for stopping by.

  2. Related:

    An algorithm that learns through rewards may show how our brain does too

    DeepMind’s new paper builds on the tight connection between these natural and artificial learning mechanisms. In 2017, its researchers introduced an improved reinforcement-learning algorithm that has since unlocked increasingly impressive performance on various tasks. They now believe this new method could offer an even more precise explanation of how dopamine neurons work in the brain.

    Specifically, the improved algorithm changes the way it predicts rewards. Whereas the old approach estimated rewards as a single number—meant to equal the average expected outcome—the new approach represents them more accurately as a distribution. (Think for a moment about a slot machine: you can either win or lose following some distribution. But in no instance would you ever receive the average expected outcome.)

    The modification lends itself to a new hypothesis: Do dopamine neurons also predict rewards in the same distributional way?
