Passing on Human Values
This piece focuses on two different methods for passing on human values to artificial intelligence. The first is training, whether through intensive work with human trainers or through exposure to a wide range of human stories drawn from books, television and film. The second is embedding a kind of synthesized emotion, such as guilt.
On the teaching part, my first question was, “well, what about all those dark stories out there?”
There’s a certain poetic symmetry to the solution: from the Golem to Frankenstein’s monster and beyond, humans have always turned to stories when imagining the monstrous impact of their creations. And just as those stories tend toward gloomy conclusions, there is a worry here too: feed the AI only dark plotlines and you could end up training it to be evil. “The only way to corrupt the AI would be to limit the stories in which typical behaviour happens somehow,” says Riedl. “I could cherry-pick stories of antiheroes or ones in which bad guys all win all the time. But if the agent is forced to read all stories, it becomes very, very hard for any one individual to corrupt the AI.”