Testing the Wisdom of Crowds

Sometimes we get carried away with our own curiosity. Say, for example, you’re engrossed in a book and read about an experiment and wonder you could replicate its results. Well, I’ve finally got around to reading James Surowiecki’s The Wisdom of Crowds, and I’m really liking it. So much so, in fact, that I did get a little carried away, and decide to try to validate what he was saying by running a test on the wisdom of crowds up on Google+. It was a lot of work, and I finally have the results and want to share them with you here in this post.

For those of you unfamiliar with the key concepts behind the idea, I’d recommend looking at the Wisdom of Crowds Wikipedia entry. If you’re interested, then go ahead and read the book. I highly recommend it. Here’s the basic idea though: when we make it possible for people to aggregate their wisdom in independent, diverse and decentralized ways, the resulting wisdom of the crowd can be uncannily accurate.

A Runaway Experiment

The inspiration for my test came from an example Surowiecki uses in the introduction of his book: a contest to guess the weight of an ox at a country fair back in 1906. Instead of an ox, I took some stale old cereal that our kids wouldn’t eat, and poured it into a weird-looking glass vase. I then posted an image of this sampling on Google+ with instructions to guess the total pieces of cereal in the post’s comments and then share it with others.

This little experiment took off like wild fire; the response exceeded all my expectations many-fold. Within an hour, the post exceeded the 500-comment limit on Google+ posts. Luckily, my son was there and suggested I just post it again. So I did. Then that one hit the limit. And then another…and another. All-told, I received 2,238 valid guesses. Six hundred and forty-eight people shared the experiment with others on Google+.

It was so crazy, and I even had two or three people accuse me of intentionally architecting the experiment as a way to troll for comments and sharing of my post. I also had a number of people disagreeing that this wisdom of crowds approach was a valid form of collective intelligence.

After the first day, I grew concerned that a number of people might be misinterpreting the image I was using and that people might be influencing one another’s answers (a no-no for the wisdom of crowds to blossom). So, I did one more push for data with a clearer image, and this time, rather than asking people to provide their guess in the post’s comments, I provided a link to a form that automatically collected the data for me in a Google Docs spreadsheet. That second push collected an additional 436 valid guesses.

Analyzing the Results

A week has now passed, the dust has settled and I’ve had a chance to analyze the results. It’s been a ton of work; hours and hours of inputting and then analyzing data.

Here is histogram of the results for the initial experiment:

In other words, the collective guess in the original post was off by 65 pieces of cereal.

Now, take a look at the results for the second push of data collection with the smaller set of 436 guesses. With this revised image, the collective wisdom comes much, much closer. It is now just 17 pieces of cereal away from the actual number. It makes me wonder what might have happened had I gotten as many guesses with this revised experiment as I had with the original. Would we have gotten even closer?

Either way, it’s not perfect, but certainly pretty darn good. Good enough to make me believe there is something to this whole idea of the “wisdom of crowds.”

Super Guessers

One thing I want to call out is that out of the more than 2,600 participants, two people actually guessed “467” – the exact correct number of pieces of cereal: Ваня Колесник and Lauri Novak.

There were another two other people who were off by just one piece of cereal: Roberto Olayo and Sarah Stiles.

Calling All Statisticians

Here’s the thing – I took statistics in business school, but I’m no statistician. So, I’m posting the raw data in a Google Doc so that others can use it as they see fit. You have my permission to copy it for your own use, as long as you share the results.

I would really like for someone with a good statistical background to weigh in on a few questions that have arisen for me after doing this experiment:

How good a guess is “450” in the first place, statistically speaking?
Was my second implementation of the experiment statistically relevant with just the 436 responses?
To what extent were people actually influencing one another with their answers? When you look at the way that the previous couple of comments are visible when you post a comment on the experiment’s post, you see how easy it would be to be influenced by the previous person’s guess. After inputting all those estimates by hand (with some help from my sons calling them out to me as I input them), I can tell you that there is no question that people influenced one another to some degree. You see little patterns of similar, consecutive guesses sprinkled throughout the data. I just don’t know how to measure that influence in a statistically meaningful manner. It’s all there in the data, however, and could be a very interesting question for someone to explore. It’s a big enough data set to pull some interesting conclusions about how we influence one another on social networks. Anyone?

My Conclusions

“The wisdom of crowds” outlines three conditions for a group to be collectively intelligent: diversity, independence, and decentralization.

The design of this experiment was relatively successful on all counts, but far from perfect. The Google+ community that participated in it were probably diverse enough in terms of the way they think. My first implementation of the experiment, using comments, allowed for some degree of independence, since Google+ shields all but the most recent two responses unless you intentionally expand them, and I explicitly asked people not to look at others’ responses. But, people are naturally curious and social, and so just seeing those last two comments undoubtedly did compromise the independence variable. I corrected this in the second implementation by using a form to collect the data, but again, the data set was not nearly as large. Finally, the experiment was not as decentralized as I’d planned because, within an hour or so, it showed up on Google+ “What’s Hot“, which exposed a lot of people to my post, rather than the hundreds of other people’s versions of it – which is where I originally assumed most of the guessing would occur.

I think it’s also important to note that the accuracy of people’s guesses is probably constrained in some important ways by the fidelity of the signals they’re getting. In this case, all I gave them was an image to work with, and my initial version of that image may have caused confusion. Even putting that aside, just taking the picture at a funny angle or robbing it of the kind of context that could help hint at the vase’s size, are ways in which the sensory input one provides to the crowd would could affect its collective wisdom. In the case of the ox in the 1906 country fair, people were seeing it in person, and that’s a lot more context than seeing a picture of something on Google+. That just has to be a factor, don’t you think?

I came away from this experiment concluding that I would make a lousy scientist. I’m frustrated with these results. Honestly, there was a part of me that was expecting that when I finished doing all that processing of the numbers, that magic “467” would be there just waiting for me, like a shiny little nugget of data gold. But it wasn’t. Instead, I got something that was seventeen off. That actually seems really close to me, but not uncannily close – which was what I was secretly hoping for. So, I walk away believing a bit more in this whole “wisdom of crowds” thing, but also knowing that it’s probably very sensitive to the context in which it is used, and that’s an important practical point.

With that all said, the whole reason I was curious about this idea in the first place is that I’m looking for new systems and new approaches to helping organizations and people get much better at collaborating, coordinating and pooling wisdom. It still seems to me that there is potential to use social networking platforms, especially one like Google+ to help do that.

At the end of the day, this experiment did not prove that potential, but it does suggest there’s something worth exploring.

Cheerio image by Sean Ryan.

Gideon Rosenblatt – Gideon Rosenblatt writes about the relationship between technology and humans at <a href="http://www.the-vital-edge.com/" rel="author">the Vital Edge</a>. His mission these days is to help his readers see business as the code behind the code of the planet’s next advance in intelligence. He thinks and writes a lot about purpose, value, and equity. Gideon ran a social enterprise called Groundwire for ten years, providing technology and engagement consulting to environmental organizations. Before that, he worked in various stints at Microsoft for ten years, including marketing, product development, as a product unit manager, and as the founder of CarPoint, one of the world's first large-scale e-commerce websites. Fresh out of college, he consulted for US companies in China for four years, and yes, his Chinese is now very rusty. Gideon received an MBA with a focus in marketing from Wharton. He now lives in Seattle with his wife and two boys, and is active on <a href="https://plus.google.com/u/1/105103058358743760661/" rel="author">Google+</a> and <a href="https://twitter.com/gideonro" rel="author">Twitter</a>.

Gideon Rosenblatt

November 1, 2019 at 1:29 pm

Yes, Annabelle. I just gave you permission. Also, I just ran into an interesting study that tests the Wisdom of Crowds across a very large set of inputs and applications:

Studying the “Wisdom of Crowds” at Scale

In a variety of problem domains, it has been observed that the aggregate opinions of groups are often more accurate than those of the constituent individuals, a phenomenon that has been dubbed the “wisdom of the crowd”. However, due to the varying contexts, sample sizes, methodologies, and scope of previous studies, it has been difficult to gauge the extent to which conclusions generalize. To investigate this question, we carried out a large online experiment to systematically evaluate crowd performance on 1,000 questions across 50 topical domains. We further tested the effect of different types of social influence on crowd performance. For example, in one condition, participants could see the cumulative crowd answer before providing their own. In total, we collected more than 500,000 responses from nearly 2,000 participants. We have three main results. First, averaged across all questions, we find that the crowd indeed performs better than the average individual in the crowd—but we also find substantial heterogeneity in performance across questions. Second, we find that crowd performance is generally more consistent than that of individuals; as a result, the crowd does considerably better than individuals when performance is computed on a full set of questions within a domain. Finally, we find that social influence can, in some instances, lead to herding, decreasing crowd performance. Our findings illustrate some of the subtleties of the wisdom-of-crowds phenomenon, and provide insights for the design of social recommendation platforms.

Annabelle

November 1, 2019 at 1:09 pm

Hi, could I have permission to access this data (Google Docs isn’t letting me). I’d love to see them!

Gideon Rosenblatt – Gideon Rosenblatt writes about the relationship between technology and humans at <a href="http://www.the-vital-edge.com/" rel="author">the Vital Edge</a>. His mission these days is to help his readers see business as the code behind the code of the planet’s next advance in intelligence. He thinks and writes a lot about purpose, value, and equity. Gideon ran a social enterprise called Groundwire for ten years, providing technology and engagement consulting to environmental organizations. Before that, he worked in various stints at Microsoft for ten years, including marketing, product development, as a product unit manager, and as the founder of CarPoint, one of the world's first large-scale e-commerce websites. Fresh out of college, he consulted for US companies in China for four years, and yes, his Chinese is now very rusty. Gideon received an MBA with a focus in marketing from Wharton. He now lives in Seattle with his wife and two boys, and is active on <a href="https://plus.google.com/u/1/105103058358743760661/" rel="author">Google+</a> and <a href="https://twitter.com/gideonro" rel="author">Twitter</a>.

Gideon Rosenblatt
November 1, 2019 at 1:29 pm

Yes, Annabelle. I just gave you permission. Also, I just ran into an interesting study that tests the Wisdom of Crowds across a very large set of inputs and applications:

Studying the “Wisdom of Crowds” at Scale

In a variety of problem domains, it has been observed that the aggregate opinions of groups are often more accurate than those of the constituent individuals, a phenomenon that has been dubbed the “wisdom of the crowd”. However, due to the varying contexts, sample sizes, methodologies, and scope of previous studies, it has been difficult to gauge the extent to which conclusions generalize. To investigate this question, we carried out a large online experiment to systematically evaluate crowd performance on 1,000 questions across 50 topical domains. We further tested the effect of different types of social influence on crowd performance. For example, in one condition, participants could see the cumulative crowd answer before providing their own. In total, we collected more than 500,000 responses from nearly 2,000 participants. We have three main results. First, averaged across all questions, we find that the crowd indeed performs better than the average individual in the crowd—but we also find substantial heterogeneity in performance across questions. Second, we find that crowd performance is generally more consistent than that of individuals; as a result, the crowd does considerably better than individuals when performance is computed on a full set of questions within a domain. Finally, we find that social influence can, in some instances, lead to herding, decreasing crowd performance. Our findings illustrate some of the subtleties of the wisdom-of-crowds phenomenon, and provide insights for the design of social recommendation platforms.

Testing the Wisdom of Crowds

A Runaway Experiment

Analyzing the Results

Super Guessers

Calling All Statisticians

My Conclusions

2 thoughts on “Testing the Wisdom of Crowds”

Your comments are welcome here:Cancel reply

A Runaway Experiment

Analyzing the Results

Super Guessers

Calling All Statisticians

My Conclusions

Share this:

2 thoughts on “Testing the Wisdom of Crowds”

Your comments are welcome here:Cancel reply