I often find explanations of machine learning either too complex or overly simplistic. I’ve recently had some luck using a simple frame for explaining it to people in person. Let’s see if I can quickly capture it in this post.
The Jet of Machine Learning
Scratch the surface, and you see that machine learning is basically a kind of ‘statistical thinking.’ We’ve long had tools for doing statistical analysis on data. Machine learning just automates that analysis so we can do it at much larger scale. The basic techniques have been around for decades, but machine learning didn’t really explode in popularity until just a few years ago with the advent of powerful new processors (Graphics Processing Units and later Tensor Processing Units) and large-scale data sets from Internet services like Google Search, Amazon and Facebook.
Andrew Ng makes the analogy that compute power is the jet engine and data is the jet fuel of machine learning. Rather than fly you to Chicago though, this jet builds statistical models. Those models draw on their underlying data to simulate reality, somewhat the way we simulate reality with our own brains. The difference is that these algorithmic models extend that biological brain of ours to do something it’s not really built for: thinking statistically.
Big Data and Models
Before this powerful new jet showed up a few years ago, we still used machine learning to help automate the way we built statistical models. It saved a lot of time and energy over the more labor-intensive statistical techniques we used to use, and that opened up interesting new capabilities, such as analyzing inventory levels in a warehouse, estimating the threat of overfishing from commercial boats, and predicting stock prices.
These kinds of applications are what is often described as “Big Data,” or data analytics. In this work’s early phases, the models were typically static, a kind of snapshot analysis of the underlying data. Despite that limitation, the techniques proved extremely valuable in making sense of large datasets, which made them extremely popular in large corporations and resulted in a thriving ecosystem of data analytics companies.
Deepening the Automation
It’s worth calling out one of the specific tricks that we now use to automate the way we build these statistical models. It’s called Deep Learning, and it is a technique that has taken the machine learning world by storm. The reason Deep Learning is so popular is that it allows developers to automatically build models through exposure to large datasets. It does this with neural networks that have multiple layers, much the way that animal brains do. The lower layers of these networks focus on identifying the most basic and specific features of the data, handing off their results to subsequent layers, which in turn handle progressively more complex and holistic interpretations of it. The graphic below from Nvidia illustrates an example of layers in a deep neural network for identifying cars, starting with rudimentary lines, moving to wheel wells, doors, and other car parts, and finally on to full cars.
Where developers once needed to painstakingly identify these kinds of attributes (called “features”) in advance, now they simply bubble up from repeated exposure to large datasets. A lot of work still goes into designing the right architecture and preparing the training data, of course, but through this automatic generation of features, Deep Learning has revolutionized the way we build simulated models of our world.
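To make the layering idea a bit more concrete, here is a toy sketch in Python. It is not how a real deep network is built: the weights below are hand-picked by me for illustration, whereas the whole point of Deep Learning is that they would be learned from data. The names (dense_layer, detect_plus, the 3x3 images) are all my own assumptions. It only shows how a lower layer can spot simple lines and hand its results to a higher layer that recognizes a whole shape.

```python
# Toy illustration of layered feature detection.
# Assumption: the weights are hand-picked here; in real Deep Learning
# they would be learned automatically from large datasets.

def relu(x):
    """Standard rectifier: pass positive values through, clamp the rest to 0."""
    return max(0.0, x)

def dense_layer(inputs, weights, biases):
    """One fully connected layer followed by ReLU."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# 3x3 black-and-white images, flattened row by row (1 = ink, 0 = blank).
PLUS_SIGN = [0, 1, 0,
             1, 1, 1,
             0, 1, 0]
HORIZONTAL_BAR = [0, 0, 0,
                  1, 1, 1,
                  0, 0, 0]

# Layer 1: two low-level detectors. Each fires only when all three of
# its pixels are on (3 - 2 = 1); fewer pixels leave it at 0 after ReLU.
LAYER1_WEIGHTS = [
    [0, 0, 0, 1, 1, 1, 0, 0, 0],  # middle row: "horizontal line"
    [0, 1, 0, 0, 1, 0, 0, 1, 0],  # middle column: "vertical line"
]
LAYER1_BIASES = [-2.0, -2.0]

# Layer 2: one higher-level detector that fires only when both line
# detectors fire, i.e. when the image contains a plus sign.
LAYER2_WEIGHTS = [[1, 1]]
LAYER2_BIASES = [-1.0]

def detect_plus(image):
    """Run the image through both layers and return the plus-sign score."""
    lines = dense_layer(image, LAYER1_WEIGHTS, LAYER1_BIASES)
    return dense_layer(lines, LAYER2_WEIGHTS, LAYER2_BIASES)[0]

print(detect_plus(PLUS_SIGN))       # positive score: both lines found
print(detect_plus(HORIZONTAL_BAR))  # zero: only one line found
```

Swap the lines and plus sign for wheel wells and cars, add many more layers, and let the weights be learned instead of typed in by hand, and you have the rough shape of the car example.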
Actuators and Inference
But wait, you say, I thought machine learning involved things like Facebook recognizing pictures of my friends or Tesla’s autopilot. Yes, those are more obvious examples, and that’s because, in these cases, we get to interact directly with the machine learning models themselves. What most of us think of as machine learning is thus actually a machine-learning model, hooked up to some form of automation. We run the model and it helps us make sense of new data, like pictures of friends, recommendations on which rice cooker to buy, or how to get your car to automatically screech to a halt when a mother raccoon suddenly sprints out into the road with her adorable little babies. For example.
I owe this insight to two people. The first is Yonatan Zunger, who recently described artificial intelligence as a triad made up of 1) sensors for collecting data; 2) a model for analyzing and interpreting the data; and 3) an actuator for turning the model’s results into some action.
The second person is Michael Copeland, who outlines two types of hardware chips: 1) training chips optimized for building models; and 2) inference chips optimized for using that trained model to analyze new data. Training new models by exposing them to millions of pictures of cats, for example, is both processing- and data-intensive. Once that model is trained, however, it can be optimized for greater performance and then deployed as a dedicated “cat recognizer” chip in the field.
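The split between training and inference shows up even in the smallest statistical model. Here is a minimal sketch, assuming a plain least-squares line fit stands in for the cat recognizer. The function names train and infer and the four data points are mine, not Copeland's. Training has to walk through all of the data, while inference is a single cheap calculation on one new input.

```python
# Minimal sketch of training vs. inference.
# Assumption: a simple least-squares line fit stands in for a real model.

def train(examples):
    """Training: look at every (x, y) example to fit y = slope * x + intercept."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    variance = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = covariance / variance
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def infer(model, x):
    """Inference: apply the already-trained model to one new input."""
    return model["slope"] * x + model["intercept"]

# Hypothetical training data that happens to follow y = 2x + 1.
TRAINING_DATA = [(0, 1), (1, 3), (2, 5), (3, 7)]

model = train(TRAINING_DATA)    # done once, touches all the data
prediction = infer(model, 10)   # done many times, touches none of it
print(model, prediction)
```

Notice that the trained model is just two numbers. That is the part you can optimize and ship, which is the software version of moving from a training chip to an inference chip.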
In short, you can think of machine learning as a jet engine, fueled by lots of data. Once you’ve used that jet to build a statistical model, you can then “actuate” it, which is to say, put it to work by allowing it to interact with and infer meaning from new data.
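Here is the sensor, model, actuator triad as a rough Python sketch. Everything in it is hypothetical: the distance readings, the fixed speed, and the one-second braking threshold are made up, and the "model" is a one-line formula standing in for something that would really be trained on data.

```python
# Rough sketch of the sensor -> model -> actuator triad.
# Assumption: all names and numbers are invented for illustration.

def sensor():
    """Hypothetical sensor: distance to the nearest obstacle, in meters."""
    yield from [40.0, 25.0, 8.0]

def model(distance_m, speed_mps=15.0):
    """Stand-in model: estimate seconds until impact at the current speed.

    A real system would use a trained model here, not a fixed formula.
    """
    return distance_m / speed_mps

def actuator(seconds_to_impact, threshold_s=1.0):
    """Turn the model's result into an action."""
    return "BRAKE" if seconds_to_impact < threshold_s else "cruise"

actions = [actuator(model(reading)) for reading in sensor()]
print(actions)
```

The model on its own only produces a number. It is the actuator wrapped around it that makes the car stop for the raccoons.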
The most powerful examples of doing that tend to include various forms of automation that make things simpler for us. The ones we seem to love most are those that provide us with some sort of user interface that allows us to interact with the model. That might mean making it easier for us to find new music on Spotify, find every picture we’ve ever taken of stained glass on Google Photos, or even beat a world champion Go player.