A new industry of crowdsourced data labeling

Crowdsourcing Data Labeling for Artificial Intelligence in China

By now, most of us know that machine learning requires huge pools of data. We also know that the operating scale of companies like Facebook, Google, and Amazon gives them a powerful advantage in getting that data. What many of us don’t realize is that there is a growing industry that specializes in the labor-intensive work of labeling all that data.

Data labeling tells a machine learning system that a particular pattern is a “cat” or a “dog.” Machines excel at pattern recognition, but they still rely on humans to make meaning from those patterns. One of the main techniques for connecting human meaning-making to the pattern-recognition capabilities of machines is for people to label the data used to train machine learning algorithms.

A recent article in The Economist highlights the role of China’s data labeling infrastructure in the success of its growing AI sector. In particular, it highlights the work of one supplier of labeled data, called MBH:

Mr Liu claims that MBH’s trick is not just numbers, but the methods the firm uses to distribute labeling work efficiently to its workers. This is done using the same kind of machine-learning systems that Amazon, an American e-commerce giant, uses to recommend products to its customers. Instead of suggesting stuff to shoppers, MBH assigns labeling tasks to workers. First, it gathers data from its workers as they carry out labeling jobs. Mr Liu says the company records its workers’ gaze, mouse movements and keyboard strokes. It also takes note of what sort of data-labeling task the worker is performing, from medical-imagery labeling to text translation. By measuring performance according to the type of task, he says, he is able to find workers who are better at some tasks than others, and steer those tasks to those workers.
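To make the routing idea in that passage a little more concrete, here is a minimal sketch in Python. It is not MBH's system: the article only says the firm uses recommender-style machine learning, so this stand-in simply tracks how often each worker gets each type of task right and sends new tasks to the worker with the best track record. The class name `TaskRouter`, its methods, and the worker and task names are all hypothetical.

```python
from collections import defaultdict


class TaskRouter:
    """Toy task router (hypothetical; not MBH's actual method).

    Tracks each worker's accuracy per task type and assigns new tasks
    to the worker with the highest smoothed accuracy for that type.
    """

    def __init__(self):
        # (worker, task_type) -> [correct_count, total_count]
        self._stats = defaultdict(lambda: [0, 0])

    def record(self, worker, task_type, correct):
        """Log the outcome of one completed labeling job."""
        stats = self._stats[(worker, task_type)]
        stats[0] += int(correct)
        stats[1] += 1

    def score(self, worker, task_type):
        """Smoothed accuracy; a worker with no history scores 0.5."""
        correct, total = self._stats.get((worker, task_type), (0, 0))
        return (correct + 1) / (total + 2)

    def assign(self, task_type, workers):
        """Pick the candidate worker with the best score for this task type."""
        return max(workers, key=lambda w: self.score(w, task_type))


router = TaskRouter()
for _ in range(3):
    router.record("worker_a", "medical_imagery", True)
router.record("worker_b", "medical_imagery", False)
router.record("worker_b", "translation", True)

print(router.assign("medical_imagery", ["worker_a", "worker_b"]))
print(router.assign("translation", ["worker_a", "worker_b"]))
```

A real system would presumably draw on far richer signals (the article mentions gaze, mouse movements, and keystrokes) and a learned model rather than simple counts, but the basic loop is the same: measure performance by task type, then steer tasks accordingly.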

China’s success at AI has relied on good data

What is particularly interesting about the way that MBH handles its data-labeling is that it relies on the same kind of outsourcing management techniques used by a company like Uber. In short, MBH crowdsources work to huge pools of contract contributors, most of whom live in China’s poorer rural areas.

MBH, in turn, acts as a supplier to many of China’s largest machine learning companies in areas like facial recognition. So, what we are seeing here is a whole data supply chain, as companies increasingly open themselves to external contributions of work and learning. In some cases, as with Facebook, Google, and other tech giants, the supply of data comes from end users. In other cases, it now comes from professional suppliers like MBH. But even these companies aren’t sourcing their data the traditional way by staffing up. Instead, they are relying on the gig economy and drawing from huge pools of contracted workers.

2 thoughts on “Crowdsourcing Data Labeling for Artificial Intelligence in China”

  1. I wonder how they exercise quality control while using vast numbers of contract workers, as well as how reliable the data those workers generate is. And how do they locate and pay those contract workers? Pretty complicated stuff. You always give me some real head scratchers!

    1. Gideon Rosenblatt – Gideon Rosenblatt writes about the relationship between technology and humans at the Vital Edge (http://www.the-vital-edge.com/). His mission these days is to help his readers see business as the code behind the code of the planet’s next advance in intelligence. He thinks and writes a lot about purpose, value, and equity. Gideon ran a social enterprise called Groundwire for ten years, providing technology and engagement consulting to environmental organizations. Before that, he worked in various stints at Microsoft for ten years, including marketing, product development, as a product unit manager, and as the founder of CarPoint, one of the world's first large-scale e-commerce websites. Fresh out of college, he consulted for US companies in China for four years, and yes, his Chinese is now very rusty. Gideon received an MBA with a focus in marketing from Wharton. He now lives in Seattle with his wife and two boys, and is active on Google+ (https://plus.google.com/u/1/105103058358743760661/) and Twitter (https://twitter.com/gideonro).

      Good questions, Bill. The Economist hints at some of that. I think it’s like the kind of systems that Uber uses, or like Amazon and its Mechanical Turk. There are ways to build quality control into the process.

      Payment in China is actually way more advanced than here, thanks to mobile payment apps.

      Thanks for dropping by. It’s interesting stuff, huh?
