Explanations of machine learning are often either too complex or overly simplistic. I’ve had some luck explaining it to people in person with some simple analogies:
The Jet of Machine Learning
Scratch the surface, and you see that machine learning is basically a kind of ‘statistical thinking.’ We’ve long had tools for doing statistical analysis on data. Machine learning just automates that analysis so we can do it at much larger scale. The basic techniques have been around for decades, but machine learning didn’t really explode in popularity until just a few years ago with the advent of powerful new processors (Graphics Processing Units and later Tensor Processing Units) and large-scale data sets from Internet services like Google Search, Amazon and Facebook.
Andrew Ng makes the analogy that compute power is the jet engine and data is the jet fuel of machine learning. Rather than fly you to Chicago, this jet builds statistical models that draw on their underlying data to simulate reality, somewhat analogously to the way we simulate reality in our own brains. These algorithmic models extend our biological brains to help them do something they’re not really built for: thinking statistically.
Big Data and Models
Before this powerful new jet showed up, we were already using machine learning to automate the building of statistical models. It saved a lot of time and energy over the labor-intensive statistical techniques that came before, and that opened up interesting new applications, such as analyzing inventory levels in a warehouse, estimating the threat of over-fishing from commercial boats, and predicting stock prices.
These kinds of applications are often described as “Big Data,” or data analytics. In this work’s early phases, the models were typically static, a kind of snapshot analysis of the underlying data. Despite this limitation, these techniques proved valuable for analyzing large datasets. That made them very popular in large corporations and resulted in a thriving ecosystem of data analytics companies.
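To make the idea of a static, snapshot model concrete, here is a minimal sketch in Python. It fits a straight line to a handful of warehouse inventory readings and uses it to project a later day. The numbers and the function names (`fit_line`, `predict`) are made up for illustration, and ordinary least squares stands in for whatever technique a real analytics system would use.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(line, x):
    slope, intercept = line
    return intercept + slope * x


# Hypothetical snapshot: units in stock at the end of days 0 through 4.
days = [0, 1, 2, 3, 4]
units_in_stock = [100, 90, 80, 70, 60]

line = fit_line(days, units_in_stock)  # the "model" is just two numbers
print(predict(line, 7))                # projected stock on day 7
```

The model is frozen at the moment it is fitted. If the warehouse starts shipping faster next week, the line knows nothing about it until someone refits it on fresh data, which is the "snapshot" limitation described above.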
Deepening the Automation
It’s worth calling out one of the specific tricks that we now use to automate building these statistical models. It’s called Deep Learning, and it is a technique that has taken the machine learning world by storm. The reason Deep Learning is so popular is that it allows developers to build models automatically through exposure to large datasets. It does this with neural networks that have multiple layers, much the way animal brains do. The lower layers of these networks focus on identifying the simplest and most concrete features of the data, handing off their results to subsequent layers, which work on progressively more complex and holistic interpretations. The graphic below from Nvidia illustrates an example of layers in a deep neural network for identifying cars, starting with rudimentary lines, moving to wheel wells, doors, and other car parts, and finally on to full cars.
Where developers once needed to painstakingly identify these attributes of data (called “features”) in advance, now they simply ‘bubble up’ through repeated exposure to large datasets. A lot of work still goes into designing the right architecture and preparing the training data, of course, but through this automatic generation of features, Deep Learning revolutionizes the way we build simulated models of our world.
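The way one layer hands its results to the next can be sketched in a few lines of Python. This is only an illustration of the layering, not of the learning: the weights below are hand-picked (an assumption made for the sake of a tiny, checkable example), whereas in real Deep Learning they would be adjusted automatically from training data. The names `dense`, `relu`, and `forward` are likewise just illustrative.

```python
def relu(values):
    # Simple non-linearity: negative signals are switched off.
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    # One fully connected layer: each output is a weighted sum of all inputs plus a bias.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Hand-picked weights, for illustration only (a trained network would learn these).
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]


def forward(x):
    hidden = relu(dense(x, HIDDEN_W, HIDDEN_B))  # lower layer: two simple features
    output = dense(hidden, OUTPUT_W, OUTPUT_B)   # upper layer: combines those features
    return hidden, output[0]


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    hidden, y = forward(x)
    print(x, "->", hidden, "->", y)
```

The lower layer computes two simple features (roughly "how many inputs are on" and "are both on"), and the upper layer combines them into an answer neither feature gives alone: whether exactly one input is on. A car recognizer works on the same principle, with far more layers and with the features discovered from data rather than typed in.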
Actuators and Inference
But wait, you say, I thought machine learning involved things like Facebook recognizing pictures of my friends or Tesla’s autopilot. Yes, those are more obvious examples, and that’s because, in these cases, we get to interact directly with the machine learning models themselves. What most of us think of as machine learning is thus actually a machine-learning model that has been hooked up to some form of automation. We run the model and it helps us make sense of new data, whether that means recognizing friends in pictures, recommending products, or getting a car to screech to a halt automatically as a mother raccoon steps onto the road.
I owe this insight to two people. The first is Yonatan Zunger, who recently described artificial intelligence as a triad made up of 1) sensors for collecting data; 2) a model for analyzing and interpreting the data; and 3) an actuator for turning the model’s results into some action.
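The triad of sensor, model, and actuator can be sketched as three small Python functions wired together. Everything here is hypothetical and chosen only to echo the braking example: the distance readings, the fixed speed, the one-second threshold, and the function names. The "model" is a one-line stand-in, where a real system would use a trained one.

```python
def sensor():
    # Hypothetical sensor: distance to the nearest obstacle, in meters.
    yield from [40.0, 22.0, 9.0, 3.5]


def model(distance_m, speed_mps=15.0):
    # Stand-in model: interpret the raw reading as seconds until impact.
    return distance_m / speed_mps


def actuator(seconds_to_impact, threshold_s=1.0):
    # Turn the model's result into an action.
    return "brake" if seconds_to_impact < threshold_s else "cruise"


actions = [actuator(model(reading)) for reading in sensor()]
print(actions)  # ['cruise', 'cruise', 'brake', 'brake']
```

Each piece can be swapped without touching the others: a better camera, a smarter model, or a different response all plug into the same three-part shape.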
The second person is Michael Copeland, who outlines two types of hardware chips: 1) training chips optimized for building models; and 2) inference chips optimized for using a trained model to analyze new data. Training new models by exposing them to millions of pictures of cats, for example, is processing- and data-intensive. Once that model is trained, however, it can be optimized for greater performance and then deployed as a dedicated “cat recognizer,” an inference system in the field.
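The split between training and inference shows up even in a toy example. The Python sketch below uses a nearest-centroid classifier, a deliberately simple technique swapped in for a real image model, and four made-up data points in place of millions of cat pictures. The feature values and the names `train` and `infer` are assumptions for illustration.

```python
from math import dist


def train(examples):
    """Training: one pass over all labeled data to compute an average point per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def infer(model, features):
    """Inference: compare one new input against the frozen averages."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical two-number summaries of pictures, with labels.
training_data = [
    ([0.9, 0.8], "cat"), ([0.8, 0.9], "cat"),
    ([0.1, 0.2], "not cat"), ([0.2, 0.1], "not cat"),
]

model = train(training_data)      # done once, touches every example
print(infer(model, [0.7, 0.95]))  # done per new picture, touches only the model
```

`train` has to read every example, so its cost grows with the dataset. `infer` only looks at the small finished model, which is why it can be tuned and shipped separately as the "cat recognizer" in the field.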
In short, you can think of machine learning as a jet engine, fueled by lots of data. Once you’ve used that jet to build a statistical model, you can then “actuate” it, which is to say, put it to work by allowing it to interact with and infer meaning from new data.
The most powerful examples of doing that tend to include various forms of automation that make things simpler for us. The ones we seem to love most are those that provide us with some sort of user interface that allows us to interact with the model. That might mean making it easier for us to find new music on Spotify, find every picture we’ve ever taken of stained glass on Google Photos, or even beat a world champion Go player.