Looking Inside the Image Recognition of Artificial Intelligence
Software is getting harder and harder for humans to decipher. Even the software developers who design a particular Deep Learning approach don’t really know exactly how their algorithms work:
“One of the challenges of neural networks is understanding what exactly goes on at each layer. We know that after training, each layer progressively extracts higher and higher-level features of the image, until the final layer essentially makes a decision on what the image shows. For example, the first layer maybe looks for edges or corners. Intermediate layers interpret the basic features to look for overall shapes or components, like a door or a leaf. The final few layers assemble those into complete interpretations—these neurons activate in response to very complex things such as entire buildings or trees.”
This is the standard way of running a Deep Learning approach to image recognition. Recently, MIT Technology Review featured a piece on some work in Japan that reversed this process, perhaps exaggerating a bit in describing the resulting images as a kind of “computational imagination”:
Now Google researchers Alexander Mordvintsev, Christopher Olah, and Mike Tyka have published some of their own, similar results:
Here’s how they describe what they’ve done:
“One way to visualize what goes on is to turn the network upside down and ask it to enhance an input image in such a way as to elicit a particular interpretation. Say you want to know what sort of image would result in “Banana.” Start with an image full of random noise, then gradually tweak the image towards what the neural net considers a banana… By itself, that doesn’t work very well, but it does if we impose a prior constraint that the image should have similar statistics to natural images, such as neighboring pixels needing to be correlated.”
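The idea in that quote can be sketched in a few lines: start from random noise, repeatedly nudge the image in the direction that raises a class score, and add a smoothness prior so neighboring pixels stay correlated. The real work used a trained deep network; here a fixed random linear scorer is a hypothetical stand-in so the sketch stays self-contained. It illustrates the mechanics, not the researchers' actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 16
w = rng.normal(size=(H, W))  # stand-in for d(class score)/d(image)

def class_score(img):
    # Toy linear "network": its gradient w.r.t. the image is just `w`.
    return float(np.sum(w * img))

def smoothness_grad(img):
    # Gradient of a penalty on squared differences between neighboring
    # pixels; descending it makes adjacent pixels more correlated.
    g = np.zeros_like(img)
    g[1:, :]  += img[1:, :] - img[:-1, :]
    g[:-1, :] += img[:-1, :] - img[1:, :]
    g[:, 1:]  += img[:, 1:] - img[:, :-1]
    g[:, :-1] += img[:, :-1] - img[:, 1:]
    return g

img = rng.normal(size=(H, W))        # start with an image of random noise
start_score = class_score(img)
lr, lam = 0.1, 0.05
for _ in range(200):
    # Ascend the class score while the prior keeps the image "natural"-ish.
    img += lr * (w - lam * smoothness_grad(img))
```

After the loop, `class_score(img)` is far above `start_score`: the noise has been tweaked toward what this (toy) scorer "considers a banana," while the prior keeps it smooth.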
If you care about image recognition, or are even just curious about how neural networks work, this relatively quick post from them is worth reading. It explains how, by focusing on different layers of the Deep Learning network, they can either isolate very rudimentary shapes that change an image in ways that resemble a Photoshop filter effect, or, more interestingly, create images embedded with the network’s interpretation of things like dogs, fish, insects, and temples.
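That layer-focused trick amounts to gradient ascent on a chosen layer's own activations: whatever the layer already detects in the image gets exaggerated. Below is a minimal sketch of that idea, with a random linear layer as a hypothetical stand-in for a real network layer.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels, n_units = 64, 8
layer_W = rng.normal(size=(n_units, n_pixels))  # stand-in "layer"

def activations(img):
    return layer_W @ img

def layer_energy(img):
    # Objective: half the squared L2 norm of the layer's activations.
    a = activations(img)
    return 0.5 * float(a @ a)

img = rng.normal(size=n_pixels)  # start from an existing "image"
start_energy = layer_energy(img)
for _ in range(50):
    # d(energy)/d(img) = W^T a : pushes the image toward whatever
    # the layer already responds to, amplifying those features.
    img += 0.01 * layer_W.T @ activations(img)
```

Choosing a low layer amplifies simple texture-like features (the "filter effect" look), while a high layer amplifies whole-object interpretations; the ascent loop is the same either way.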
Because the network’s ‘understanding’ of what these objects ‘look like’ is learned from lots and lots of source images, any systematic bias in those images shows up in the network’s understanding too. As an interesting example, the network’s understanding of a dumbbell included some organic, arm-like portions, likely because many of the source images of dumbbells had human arms connected to them.
This is fascinating stuff. As the researchers note, it’s not hard to imagine artists using these techniques to generate really interesting new approaches to artwork. Might these same techniques also help us to understand how we humans are able to generate images from scratch in our minds’ eyes? That’s certainly speculative, but I would be surprised if there aren’t some neuroscientists out there already digging into that one.
Thanks to Jeff Dean for highlighting this research in one of his recent posts here:
Also – make sure to check out the researchers’ collection of network-generated images (which is where I found this video). It’s mind blowing:
#ai #deeplearning #image #imagination