Crowd-sourced Data

<onlyinclude>
'''Crowdsourced data''' collection is a participatory method of building a [[ Master Dataset | dataset]] with the help of a large group of people. This page provides a brief overview of '''crowdsourced''' data collection in development and highlights points to consider when crowdsourcing data. '''Crowdsourced data''' is a form of [[Secondary Data Sources | secondary data]].
 
</onlyinclude>
== Read First ==
* [[Secondary Data Sources | Secondary data]] refers to data collected by any party other than the researcher. '''Secondary data''' provides important context for any investigation into a policy intervention.
* When '''crowdsourcing data''', researchers can collect large amounts of geographically dispersed data at a cost typically lower than that of traditional [[Primary Data Collection|data collection]] methods.
* Consider the trade-offs between sample size and [[Sampling | sampling]] issues before deciding to '''crowdsource data'''.
* Ensuring [[Data Quality Assurance Plan | data quality]] requires making sure the platform on which you are collecting '''crowdsourced data''' is well-tested.


== Overview ==


Because it is relatively inexpensive, '''crowdsourced data''' collection allows researchers to outsource simple tasks or [[Questionnaire Design | questionnaires]], gather data in real time, and obtain far more numerous and geographically widespread observations than traditional [[Primary Data Collection|data collection]]. A larger number of observations can increase statistical power, although [[Sampling | sampling]] issues still need careful consideration.
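The cost argument can be made concrete with a small calculation. The sketch below is a minimal illustration, assuming a fixed budget, hypothetical per-observation costs, a common outcome standard deviation, and independent draws; none of these figures come from DIME projects. It shows how a lower cost per observation buys a larger sample and therefore a smaller standard error of the mean. It deliberately ignores the sampling-frame concerns listed under ''Considerations when crowdsourcing data''.

```python
import math


def sample_size(budget: float, cost_per_obs: float) -> int:
    """Number of observations a fixed budget can buy, rounded down."""
    return math.floor(budget / cost_per_obs)


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean, assuming independent observations."""
    return sd / math.sqrt(n)


# Hypothetical figures for illustration only, not DIME estimates.
BUDGET = 10_000.0   # total data collection budget
SD = 20.0           # assumed standard deviation of the outcome
COST_SURVEY = 25.0  # assumed cost per enumerator-collected observation
COST_CROWD = 1.0    # assumed cost per crowdsourced observation

n_survey = sample_size(BUDGET, COST_SURVEY)
n_crowd = sample_size(BUDGET, COST_CROWD)
print("survey:", n_survey, "observations, SE =", round(standard_error(SD, n_survey), 2))
print("crowd: ", n_crowd, "observations, SE =", round(standard_error(SD, n_crowd), 2))
```

Under these assumed numbers the crowdsourced sample is 25 times larger, which shrinks the standard error by a factor of five (the square root of 25). The calculation says nothing about whether the larger sample is representative.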


'''Crowdsourced data collection''' also makes it easier to reach people and places, giving researchers insight into [https://mdp.berkeley.edu/data-crowdsourcing-the-gap-between-ideation-and-implementation/ local markets], [http://www.lse.ac.uk/international-development/conflict-and-civil-society/current-projects/crowdsourcing-conflict-and-peace-events-in-the-syrian-conflict conflict events], or even [https://www.technologyreview.com/s/520151/crowdsourcing-mobile-app-takes-the-globes-economic-pulse/ prices]. Researchers may '''crowdsource data collection''' through a number of platforms, including mobile apps and online marketplaces such as [https://www.mturk.com/ Amazon Mechanical Turk].
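When the crowdsourced data are prices reported from local markets, a common first step is to aggregate reports by location and check how many distinct contributors stand behind each figure. The sketch below assumes a hypothetical record layout (`contributor_id`, `market`, `price`) and an arbitrary threshold of three distinct contributors. It uses the median so that a single outlying report does not dominate, and it flags markets where too few contributors reported to allow triangulation. It is an illustration, not a description of any specific platform's output.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Report:
    """One crowdsourced observation (hypothetical record layout)."""
    contributor_id: str
    market: str
    price: float


def summarize(reports, min_contributors=3):
    """Return per-market median price, report count, and a triangulation flag.

    min_contributors is an assumed threshold, not an established standard.
    """
    prices = defaultdict(list)
    contributors = defaultdict(set)
    for r in reports:
        prices[r.market].append(r.price)
        contributors[r.market].add(r.contributor_id)

    summary = {}
    for market, values in prices.items():
        n_contributors = len(contributors[market])
        summary[market] = {
            "median_price": median(values),  # robust to a single outlying report
            "n_reports": len(values),
            "n_contributors": n_contributors,
            # Many reports from one or two people cannot be cross-checked.
            "triangulated": n_contributors >= min_contributors,
        }
    return summary


if __name__ == "__main__":
    reports = [
        Report("a", "north", 10.0),
        Report("b", "north", 11.0),
        Report("c", "north", 30.0),  # outlier; the median of the three is still 11.0
        Report("a", "south", 9.0),
        Report("a", "south", 9.5),
    ]
    for market, stats in summarize(reports).items():
        print(market, stats)
```

In this made-up example the ''south'' figure rests on a single contributor, which is the kind of pattern that following network growth is meant to catch.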


== Considerations when crowdsourcing data ==
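* '''Ensure a large network of contributors.''' A large network is essential to crowdsourcing success. If you are collecting geographically specific data, keep in mind that technology constraints and low connectivity limit the potential for crowdsourcing in rural areas. There are examples of successful recruiting through Facebook and similar social media in South Asia and Latin America, but fewer in Africa.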
* '''Follow network growth carefully.''' Crowdsourcing requires a [[Sampling | crowd]], not a handful!
* '''Consider the trade-offs between sample size and [[Sampling | sampling issues]].''' The reliability of crowdsourced data is often questioned because of the lack of an underlying [[Sampling#Establish the Sampling Frame and Master Dataset|sampling frame]]: contributors select themselves, so they may not represent the population of interest, and a larger sample does not correct this. Crowdsourcing may not be the right tool if you need rigorous '''sampling''' and data structure.
* '''Request simple tasks from contributors.''' The instruments used in crowdsourced data collection should not look like traditional [[Questionnaire Design | questionnaires]] that include skip codes, relevancies, and constraints. Remember that contributors will not have the training of typical [[Enumerator Training | enumerators]].
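* '''Ensure that the platform on which you are collecting crowdsourced data is well-tested.''' In [https://blogs.worldbank.org/impactevaluations/lessons-crowdsourcing-failure one DIME project], the team took the promises of a Silicon Valley partner at face value, but the available version of the partner's technology delivered less than hoped. In practice it resembled traditional enumeration, with a few untrained contributors filling out long mobile surveys, which removed the expected advantages of many observations and triangulation.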
* '''Quantify trade-offs carefully.''' What are the cost savings compared to traditional enumeration? Will they offset losses in precision or quality?
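The trade-off between a larger sample and the lack of a sampling frame can be quantified with the mean squared error (MSE) of an estimate, which adds squared bias to sampling variance. The sketch below is a minimal illustration, assuming a simple sample mean, a fixed selection bias in the crowdsourced sample (for example because contributors select themselves), no bias in the traditional survey, and hypothetical sample sizes. Because bias does not shrink as the sample grows, a larger crowdsourced sample only comes out ahead if its bias stays below a break-even value.

```python
import math


def mse(bias: float, sd: float, n: int) -> float:
    """Mean squared error of a sample mean: squared bias plus variance sd^2 / n."""
    return bias ** 2 + sd ** 2 / n


def break_even_bias(sd: float, n_survey: int, n_crowd: int) -> float:
    """Largest crowdsourcing bias at which its MSE still matches an unbiased survey.

    Solves bias^2 + sd^2 / n_crowd = sd^2 / n_survey for bias.
    """
    gap = sd ** 2 / n_survey - sd ** 2 / n_crowd
    # If the crowd sample is not larger, no amount of bias can be offset.
    return math.sqrt(gap) if gap > 0 else 0.0


# Hypothetical inputs for illustration only.
SD = 20.0
N_SURVEY, N_CROWD = 400, 10_000

print("survey MSE:", mse(0.0, SD, N_SURVEY))
print("crowd MSE with bias 2.0:", mse(2.0, SD, N_CROWD))
print("break-even bias:", round(break_even_bias(SD, N_SURVEY, N_CROWD), 3))
```

With these assumed numbers the survey has an MSE of 1.0, while a crowdsourced estimate biased by 2 units has an MSE of about 4.04 despite being 25 times larger; crowdsourcing comes out ahead only if its bias is below roughly 0.98. Cost differences can be folded in by comparing designs that spend the same budget.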


== Related Pages ==
[[Special:WhatLinksHere/Crowd_sourced_Data|Click here for pages that link to this topic]].


== Additional Resources ==
* A DIME blog post on [https://blogs.worldbank.org/impactevaluations/lessons-crowdsourcing-failure learning from a crowdsourcing failure]
* Hunt and Specht's [https://jhumanitarianaction.springeropen.com/articles/10.1186/s41018-018-0048-1 Crowdsourced Mapping in Crisis Zones: Collaboration, Organisation and Impact]
* Bott, Gigler and Young's [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.474.2448&rep=rep1&type=pdf The Role of Crowdsourcing for Better Governance in Fragile State Contexts]
* Komarov, Reinecke and Gajos' [https://dash.harvard.edu/bitstream/handle/1/12363924/Crowdsourcing%20Performance%20Evaluations.pdf?sequence=1&isAllowed=y Crowdsourcing Performance Evaluations of User Interfaces], which tests whether results collected on Amazon Mechanical Turk differ from those of traditional lab-based studies
* In a DAI [https://dai-global-digital.com/crowdsourced-data-collection-provides-on-the-ground-insights.html blog post], Kelsey Stern Buchbinder explains the use of crowdsourced data in development and its role in providing on-the-ground insights


[[Category: Secondary Data Sources ]]

Latest revision as of 16:37, 7 August 2023
