Difference between revisions of "Crowd-sourced Data"

<onlyinclude>
Modern ICT has made it possible to collect large amounts of data by outsourcing the task to a 'crowd'. Crowd-sourcing typically involves large numbers of untrained contributors, who provide the needed data points.
Crowdsourced data collection is a participatory method of building a dataset with the help of a large group of people. This page provides a brief overview of crowdsourced data collection in development and highlights points to consider when crowdsourcing data.
</onlyinclude>
== Read First ==
Crowd-sourcing has many applications (e.g. this wiki!); this article focuses on crowd-sourcing data collection for impact evaluations.
*Through crowdsourced data collection, researchers can collect plentiful, valuable, and dispersed data at a cost typically lower than that of traditional data collection methods.
*Crowdsourced data may introduce sampling issues. Consider the trade-offs between sample size and sampling issues before deciding to crowdsource data.  
* Make sure the platform on which you are collecting crowdsourced data is well-tested.
== Overview ==


== Guidelines ==
Crowdsourced data collection allows researchers to cheaply outsource simple tasks or questionnaires, gather data in real time, and, given its relatively low cost, obtain far more numerous and widespread observations than traditional data collection. Notably, crowdsourced data collection allows researchers to more easily reach people and places, giving researchers insight into [https://mdp.berkeley.edu/data-crowdsourcing-the-gap-between-ideation-and-implementation/ local markets], [http://www.lse.ac.uk/international-development/conflict-and-civil-society/current-projects/crowdsourcing-conflict-and-peace-events-in-the-syrian-conflict events], or even [https://www.technologyreview.com/s/520151/crowdsourcing-mobile-app-takes-the-globes-economic-pulse/ prices]. Researchers may crowdsource data collection via a number of platforms, including mobile apps or internet marketplaces like [https://www.mturk.com/ Amazon Mechanical Turk].


=== Advantages of Crowd-Sourced Data===
== Considerations when crowdsourcing data ==
* Crowdsourced data is inexpensive, so the number of observations can be far larger than for a traditional survey. This can increase statistical power (though sampling issues need to be carefully considered).
* Ensure a large network of contributors: this is essential to crowdsourcing success. If collecting geographically specific data, keep in mind that the potential for crowdsourcing is limited in rural areas due to technology constraints and low levels of connectivity.  
* Real-time data gathering
* Simple tasks can be cheaply outsourced
* Can be more participatory than traditional data collection
 
=== Pitfalls of Crowd-Sourced Data ===
Lessons DIME has learned from crowdsourcing:
* Recruiting a large network of contributors is essential to crowdsourcing success. The potential for crowdsourcing is limited in rural areas by technology constraints and low levels of social media connectivity. There are examples of successful recruiting through Facebook and similar social media in South Asia and Latin America, but fewer in Africa.
* Follow network growth carefully. Crowdsourcing requires a crowd, not a handful!
* The reliability of crowdsourcing data is often questioned because of the lack of an underlying sampling frame. Crowdsourcing may not be the right tool when rigorous sampling and data structure are required.
* Consider the trade-offs between sample size and sampling issues. The reliability of crowdsourcing data is often questioned because of the lack of an underlying sampling frame. Crowdsourcing may not be the right tool if you need rigorous sampling and data structure.
* Make sure the technology is well-tested: in one case, DIME took the promises of a Silicon Valley partner at face value, but the available version of their technology delivered less than hoped. In practice, it looked rather like traditional enumeration: a few contributors filling out long mobile surveys with no training. This took away the advantage of multiple observations and triangulation we had assumed. It also made the case for choosing this model, in which contributors have very little training, over traditional enumeration much less clear.
* Request simple tasks from contributors. The instruments used in crowdsourced data collection should not look like traditional [[Questionnaire Design | questionnaires]] that include skip codes, relevancies, and constraints. Remember that contributors will not have the training of typical [[Enumerator Training | enumerators]].
* Stick with simple tasks. Instruments should not look like a typical questionnaire with skip codes, relevancies, and constraints. Contributors will not have the training of typical enumerators.
* Ensure that the platform on which you are collecting crowdsourced data is well-tested: in one case, DIME [https://blogs.worldbank.org/impactevaluations/lessons-crowdsourcing-failure took the promises] of a Silicon Valley partner at face value -- but the available version of their technology delivered less than hoped.
* Quantify trade-offs carefully. What are the cost savings compared to traditional enumeration? Will they offset losses in precision or quality?
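
The trade-off between sample size and sampling issues can be made concrete with a rough calculation. The sketch below is illustrative only: the budget, per-observation costs, outcome standard deviation, and bias values are all hypothetical assumptions, not figures from DIME or any study. It compares the two approaches by the root mean squared error of an estimated mean, which combines sampling variance with any systematic bias that comes from lacking a sampling frame.

```python
import math

def affordable_sample(budget, cost_per_obs):
    """Number of observations a fixed budget can pay for."""
    return int(budget // cost_per_obs)

def rmse_of_mean(sd, n, bias=0.0):
    """Root mean squared error of a sample mean.

    Combines sampling variance (sd^2 / n) with squared bias, so a large
    sample cannot compensate for a systematic error in who responds.
    """
    return math.sqrt(sd ** 2 / n + bias ** 2)

# Hypothetical inputs, for illustration only.
BUDGET = 10_000.0
OUTCOME_SD = 5.0
OPTIONS = {
    "traditional": {"cost_per_obs": 20.0, "bias": 0.0},
    "crowdsourced": {"cost_per_obs": 1.0, "bias": 0.5},
}

def compare(budget, sd, options):
    """Return sample size and RMSE for each data collection option."""
    results = {}
    for name, opt in options.items():
        n = affordable_sample(budget, opt["cost_per_obs"])
        results[name] = {"n": n, "rmse": rmse_of_mean(sd, n, opt["bias"])}
    return results

if __name__ == "__main__":
    for name, r in compare(BUDGET, OUTCOME_SD, OPTIONS).items():
        print(f"{name}: n={r['n']}, rmse={r['rmse']:.3f}")
```

Under these assumed numbers, the crowdsourced option buys twenty times as many observations but ends up with a larger total error, because the bias term does not shrink as the sample grows. Setting the assumed bias to zero reverses the ranking, which is why the size of any sampling problem matters as much as the cost savings.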


== Back to Parent ==
This article is part of the topic [[Secondary Data Sources]]


== Additional Resources ==
* A DIME blogpost on [https://blogs.worldbank.org/impactevaluations/lessons-crowdsourcing-failure learning from a crowdsourcing failure]
*Hunt and Specht’s [https://jhumanitarianaction.springeropen.com/articles/10.1186/s41018-018-0048-1 Crowdsourced Mapping in Crisis Zones: Collaboration, Organisation and Impact]
*Bott, Gigler and Young's [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.474.2448&rep=rep1&type=pdf The Role of Crowdsourcing for Better Governance in Fragile State Contexts]
*Komarov, Reinecke and Gajos’ [https://dash.harvard.edu/bitstream/handle/1/12363924/Crowdsourcing%20Performance%20Evaluations.pdf?sequence=1&isAllowed=y Crowdsourcing Performance Evaluations of User Interfaces] tests whether Amazon Mechanical Turk results differ from traditional questionnaire results
*In a DAI [https://dai-global-digital.com/crowdsourced-data-collection-provides-on-the-ground-insights.html blogpost], Kelsey Stern Buchbinder explains the use of crowdsourced data in development and its role in providing on-the-ground insights


[[Category: Secondary Data Sources ]]

Revision as of 20:49, 12 April 2019

Crowdsourced data collection is a participatory method of building a dataset with the help of a large group of people. This page provides a brief overview of crowdsourced data collection in development and highlights points to consider when crowdsourcing data.

Read First

  • Through crowdsourced data collection, researchers can collect plentiful, valuable, and dispersed data at a cost typically lower than that of traditional data collection methods.
  • Crowdsourced data may introduce sampling issues. Consider the trade-offs between sample size and sampling issues before deciding to crowdsource data.
  • Make sure the platform on which you are collecting crowdsourced data is well-tested.

Overview

Crowdsourced data collection allows researchers to cheaply outsource simple tasks or questionnaires, gather data in real time, and, given its relatively low cost, obtain far more numerous and widespread observations than traditional data collection. Notably, crowdsourced data collection allows researchers to more easily reach people and places, giving researchers insight into local markets, events, or even prices. Researchers may crowdsource data collection via a number of platforms, including mobile apps or internet marketplaces like Amazon Mechanical Turk.

Considerations when crowdsourcing data

  • Ensure a large network of contributors: this is essential to crowdsourcing success. If collecting geographically specific data, keep in mind that the potential for crowdsourcing is limited in rural areas due to technology constraints and low levels of connectivity.
  • Follow network growth carefully. Crowdsourcing requires a crowd, not a handful!
  • Consider the trade-offs between sample size and sampling issues. The reliability of crowdsourcing data is often questioned because of the lack of an underlying sampling frame. Crowdsourcing may not be the right tool if you need rigorous sampling and data structure.
  • Request simple tasks from contributors. The instruments used in crowdsourced data collection should not look like traditional questionnaires that include skip codes, relevancies, and constraints. Remember that contributors will not have the training of typical enumerators.
  • Ensure that the platform on which you are collecting crowdsourced data is well-tested: in one case, DIME took the promises of a Silicon Valley partner at face value -- but the available version of their technology delivered less than hoped.
  • Quantify trade-offs carefully. What are the cost savings compared to traditional enumeration? Will they offset losses in precision or quality?

Back to Parent

This article is part of the topic Secondary Data Sources

Additional Resources