Monday, March 28, 2011

Issues with Crowdsourced Data Part 2

A recent guest Beneblog post explained why we believe that a correlation researchers found between SMS text messages and building damage was not useful. Some of the questions we received made us realize we need to be clearer about why this is important. Why did we bother analyzing this claim? Why does it matter? Thanks to Patrick Ball, Jeff Klingner and Kristian Lum for contributing this material (and making it much clearer).

We’re reacting to the following claim: “Data collected using unbounded crowdsourcing (non-representative sampling) largely in the form of SMS from the disaster affected population in Port-au-Prince can predict, with surprisingly high accuracy and statistical significance, the location and extent of structural damage post-earthquake.”

While this claim is technically correct, it misses the point. If decision makers had simply used a map of building locations, they could have made better decisions more quickly, more accurately, and with less complication than if they had tried to use crowdsourcing. Our concern is that if decision makers come to depend on crowdsourcing in the future, bad decisions are likely to result: decisions that affect lives. So we're speaking up.

In the comments to our last post, Jon from Ushahidi said "If a tool's fitness cannot be absolute, then neither can it's fallibility." He also argued that the correlation they found was useful. Why is this something worth arguing about?

Misunderstanding relationships in data is a problem because it can lead people to choose less effective, more expensive data instead of obvious, more accurate starting points. The correlation found in Haiti is an example of confounding. A correlation was found between building damage and SMS streams, but only because both were correlated with a third factor: the simple existence of buildings. The correlation between the SMS feed and the building damage is therefore an artifact, a spurious correlation. Here are two other examples of confounding effects.

- Children's reading skill is strongly correlated with their shoe size -- because older kids have bigger feet and tend to read better. You wouldn't measure all the shoes in a classroom to evaluate the kids' reading ability.

- Locations with high rates of drowning deaths are correlated with locations with high rates of ice cream sales because people tend to eat ice cream and swim when they're at leisure in hot places with water, like swimming pools and seasides. If we care about preventing drowning deaths, we don't set up a system to monitor ice cream vendors.
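To make the confounding argument concrete, here is a minimal Python sketch on synthetic data; it is not the Haiti data. It assumes a hypothetical model in which each zone's building count drives both damage and SMS volume, while SMS volume has no direct link to damage. The raw SMS/damage correlation comes out high, but once building counts are controlled for (a partial correlation, computed by correlating the residuals of two least-squares fits on building count) it drops to roughly zero, and the building count on its own tracks damage better than SMS does. The function names `pearson`, `residuals` and `simulate`, and all the coefficients, are illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after an ordinary least-squares fit on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def simulate(n_zones=500, seed=1):
    """Hypothetical zones: building count drives both damage and SMS volume."""
    rng = random.Random(seed)
    buildings = [rng.uniform(0, 100) for _ in range(n_zones)]
    # Damage scales with how many buildings there are to be damaged.
    damage = [0.5 * b + rng.gauss(0, 5) for b in buildings]
    # SMS volume scales with buildings (people, phones), with NO direct link to damage.
    sms = [0.3 * b + rng.gauss(0, 3) for b in buildings]
    return buildings, damage, sms


buildings, damage, sms = simulate()
raw = pearson(sms, damage)
partial = pearson(residuals(sms, buildings), residuals(damage, buildings))
map_only = pearson(buildings, damage)

print(f"corr(SMS, damage)             = {raw:.2f}")       # high, but spurious
print(f"corr(SMS, damage | buildings) = {partial:.2f}")   # near zero
print(f"corr(buildings, damage)       = {map_only:.2f}")  # the "just use a map" baseline
```

In this toy model the SMS stream carries no information about damage beyond what a map of buildings already provides, which is the shape of the argument above.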

We're particularly concerned because we think that using an SMS stream to measure a pattern is probably at its best in a disaster situation. When there's a catastrophe, people often pull together and help each other. If an SMS stream was ever going to work as a pattern measure, it was going to be in a context like this, and it didn't work very well. We don't think that SMS was a very good measure of building damage, relative to the obvious alternative of using a map of building locations.

The problems will be much worse if SMS streams are used to try to measure public violence. In these contexts, the perpetrators will be actively trying to suppress reporting, and so the SMS streams will not just measure where the cell phones are, they'll measure where the cell phones that perpetrators can't suppress are. We'll have many more "false negative" zones where there seems to be no violence, but there's simply no SMS traffic. And we'll have dense, highly duplicated reports of visible events where there are many observers and little attempt to suppress texting.
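A back-of-the-envelope way to see the suppression argument, under an assumed and purely hypothetical model: each witness with a phone independently gets a report out with some probability, and suppression scales that probability down. The chance that a zone where violence occurs produces no reports at all is then `(1 - q) ** phones`, where `q` is the per-phone reporting probability after suppression. The zone list, `REPORT_PROB`, and the suppression levels are invented for illustration; the point is the shape of the result, not the numbers.

```python
def p_no_reports(phones, report_prob, suppression):
    """Chance that a zone where violence occurs sends zero SMS reports.

    Assumes each witness reports independently (a simplifying, hypothetical model).
    """
    q = report_prob * (1 - suppression)  # per-phone chance of getting a report out
    return (1 - q) ** phones


def expected_reports(phones, report_prob, suppression):
    """Expected number of reports; many of these will describe the same event."""
    return phones * report_prob * (1 - suppression)


# Hypothetical zones: (description, witnesses with phones, suppression level 0..1)
zones = [
    ("busy market, unsuppressed", 200, 0.0),
    ("small village, unsuppressed", 10, 0.0),
    ("small village, heavily suppressed", 10, 0.9),
    ("neighborhood, heavily suppressed", 50, 0.9),
]
REPORT_PROB = 0.1  # assumed share of witnesses who would text if unimpeded

for name, phones, s in zones:
    print(f"{name:35s} P(no reports)={p_no_reports(phones, REPORT_PROB, s):.2f} "
          f"expected reports={expected_reports(phones, REPORT_PROB, s):.1f}")
```

Under these assumptions, a heavily suppressed village with 10 witnesses goes silent about 90% of the time (a "false negative" zone), while an unsuppressed market with 200 witnesses produces around 20 reports, mostly duplicates of the same visible event.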

In the measurement of building damage in Port-au-Prince, there were several zones with lots of damage but few or no SMS messages ("false negatives"). This happened even though no one was trying to stop people from texting. The data will be far more misleading when the phenomenon being measured is violence.

As we've said in each post, crowdsourcing generally, and SMS traffic in particular, are great for documenting specific requests for help. Our critique is that they are not a good way to generate a valid basis for understanding patterns.

8 comments:

Differance said...

Two points to stress here: 1) crowdsourcing isn't typically methodologically rigorous; 2) people assure information quality. While technology can help, no set of automated rules can give you accuracy. Accuracy is a measure that describes judgments comparing representations to what they describe in the real world, and those judgments are made by humans. People produce, assess, and correct for accuracy.

You're describing a circumstance similar to Australia's initiative to make the primary data for collected research reusable for other research. The methodology (including relevant software and metadata such as definitions and specification) for producing this primary data has to be shared and reported in a way that allows its integrity to be tested; and the data's fitness for purpose needs to be assessed. Data usually isn't collected to be fit for *all* purposes -- and doing so generally requires common understanding of stakeholder requirements, since people will otherwise produce information only in a manner suited to accomplish the function they're most interested in or exposed to. Making this kind of common understanding apply universally, across all enterprises, is much more tricky. If the quality of information in a set were assessed, we'd have metrics for understanding its fitness for particular purposes.

And the participants in crowdsourcing (that I've seen) aren't typically wedded to a practice of managing information as a shared resource in the way members of an established enterprise can be. A practice of assessing and giving feedback could probably improve crowdsourcing despite this gap, but establishing the culture for understanding and meeting downstream stakeholder needs is more problematic in that scenario. Not entirely impossible, but I am unaware of any efforts to apply this perspective to crowdsourcing.

However: 1) as I said, information quality is produced by people; and 2) we should be able to do distributed information capture with a proper understanding of this fact. Most of the troubles with information quality come from a misunderstanding whereby people tend to expect automation to assure quality.

You improve an information production process at the point of capture (as well as through downstream automated processing): as you detect non-quality, you prevent it and "build quality into" the process, rather than redundantly performing scrap and rework by fixing already-captured information produced by a process that keeps producing non-quality.

Finally, I expect the nature of work in the information economy to develop toward this understanding: people as information producers, whose work is valued by the measurable quality characteristics of the information they produce. We will sell quality (accuracy, timeliness, completeness, common understanding, scalability, etc.), not "content."

Jim Fruchterman said...

Thanks, Differance. Great comment, and one I very much agree with.

Differance said...

Here's a link describing the Australian primary data project that I mention in my comment (I included it there, but it isn't visible or accessible in the main blog's rendering of my comment): http://www.theaustralian.com.au/higher-education/project-aims-to-reuse-stored-primary-data/story-e6frgcjx-1226022044954

Iraqi Bootleg said...

Although I agree with your general contention that crowdsourced data is not a valid replacement for good sampling and data analysis, I am a bit troubled by how lightly you gloss over the realities of military and disaster operations. It is all well and good to say "just look at a satellite image" or "just look at a map" to determine where the damage is likely to be... it is really not that simple. I worked in the Army HQ in Iraq for a year, and acquiring satellite imagery of an area, or even up-to-date maps, was next to impossible without some very high-ranking support behind you. I think that crowdsourced data, as you mention, is one piece of a huge jigsaw puzzle, which a coordinator has only a vague understanding of in the first place.

Proper data collection and analysis is a personal project of mine: the year I spent working in Iraq was largely composed of questioning "surveys" conducted by local companies that claimed representative sampling but would never give us specifics on how those samples were actually collected. But I think that at some level there has to be a compromise between an ideal (real-time satellite imagery, a true simple random sample (SRS) of any population, etc.) and the reality...

Jim Fruchterman said...

Thanks for these comments: makes me think there's another blog post here to articulate this a third way. Each of these interchanges allows us to try out different ways of zeroing in on our point and more clearly excluding things that are not our point. More to come!

Jim Fruchterman said...

Our third blog post in this thread is now up.

Jon Gos said...

Jim, as usual, I agree with your points. I'm a data geek, so the point you make about an anecdotal use case versus a more dependable one is spot on.

But again, collecting data via crowdsourcing takes many shapes. Sending surveyors out to the field with pens and pads is what might now be referred to as "bounded crowdsourcing", which would not have to be public, would not have to be chaotic, and would not have to be contributed by unvetted (or untrained) individuals. In such a use case, tools used for "crowdsourcing" simply become tools used for data collection.

As was said above, "information quality is produced by people". That said, platforms like Ushahidi serve as more than just tools for collection; the methodology is often the mode for delivering a different message, one that is hard to quantify. That message is often "You, the observer, the victim, or the person empathetic to the cause, have a voice: you are participating."

This is why we encourage the use of Ushahidi in combination with more qualitative tools like your own, or, for instance, ArcGIS (for more advanced mapping). While crowdsourcing alone *can* be used for decision-based campaigns, it rarely is, and we don't take the stand that it should be. The recent Disaster 2.0 report shows exactly why it wasn't, and the people who are indeed making decisions can see how lowering the bar for local participation offers a different type of strategic leverage than having more accurate data alone. So I'll agree with you by using one of my favorite quotes: "More isn't better. Better is better." Yet better isn't always the sole objective.

Jim Fruchterman said...

Thanks for your comments, Jon. This crowdsourcing-centric view of the world is getting kind of silly, though.

You are stretching the use of the term crowdsourcing way beyond its accepted meaning here. I’m sure that many statisticians will be surprised to find that “running a survey” is now going to be called “bounded crowdsourcing!” From Wikipedia: “Crowdsourcing is the act of outsourcing tasks, traditionally performed by an employee or contractor, to an undefined, large group of people or community (a 'crowd'), through an open call.” Doesn’t sound like running a survey with surveyors to me. I don't think we should keep stretching to try to make well-established approaches sound like a new branch of crowdsourcing that we just figured out!

Tools like pads of paper, PCs, landline telephones and SMS-capable phones aren’t tools for crowdsourcing becoming tools for data collection, as you suggest. They are general purpose tools used frequently for data collection that occasionally get used for crowdsourcing.

I do agree with you on the empathetic value of making SMS texts more widely available, if the objective is to get people to care more, or to hear from people whose messages are often ignored or suppressed through other channels. As long as the personal security of the person isn't threatened by sending a text, I think it's a good thing.

And I do agree with you that more isn't better. The goal of collecting data usually is to make better decisions, in real time and/or at a later date.