Secondary Data Sources

'''Secondary data''' is data collected by any party other than the researcher, including [[Administrative and Monitoring Data|administrative data]] from programs, [[Geo Spatial Data|geodata]] from specialized sources, and census or other population data from governments. '''Secondary data''' provides important context for any investigation, and in some cases (such as administrative program data), it is the only source that covers the full population needed to conduct a research project.


== Read First ==
* [[Impact Evaluation Team|Research teams]] usually rely on two broad categories of data: [[Primary Data Collection|primary data]] and '''secondary data'''.
* '''Impact evaluations''' rely on many different sources of '''secondary data''', such as [[Administrative and Monitoring Data|administrative]], [[Geo Spatial Data|geospatial]], [[Remote Sensing|sensor]], [[Telecom Data|telecom]], and [[Crowd-sourced Data|crowd-sourced]] data.
* '''Research teams''' should decide which data sources to use based on the needs of the project and on which sources are available in the given context.


== Types of Secondary Data ==
=== Administrative and Monitoring Data ===
[[Administrative and Monitoring Data|Administrative data]] includes all data collected through existing government ministries, programs, and projects. It is a potentially rich source of data for an impact evaluation. Some of the key challenges with administrative data include:
* '''Digitization''': in many cases, the data exists only on paper and must be digitized before it can be analyzed.
* '''Restricted access''': access to some data can be difficult to obtain because it contains sensitive information.
* '''Lack of unique ID''': some administrative datasets are missing a numeric [[ID Variable Properties|ID variable]], or lack an identifier shared with other key datasets, which makes it hard to link records across sources.
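As a rough illustration of the ID problem above, the sketch below checks whether a candidate ID column uniquely and completely identifies the records in a dataset, and what share of its IDs can be found in another dataset. It is a minimal Python sketch, not a prescribed workflow: it assumes both datasets are already loaded as lists of dictionaries (for example with `csv.DictReader`) and store the ID under the same column name. The column name `beneficiary_id` and the functions `check_unique_id` and `share_matched` are hypothetical.

```python
from collections import Counter


def _clean_id(value):
    """Normalize an ID value to a stripped string, or None if it is missing."""
    if value is None:
        return None
    text = str(value).strip()
    return text or None


def check_unique_id(records, id_field):
    """Check whether `id_field` uniquely and fully identifies the records.

    Returns (is_unique, n_missing, duplicated_ids).
    """
    ids = [_clean_id(row.get(id_field)) for row in records]
    n_missing = sum(1 for i in ids if i is None)
    counts = Counter(i for i in ids if i is not None)
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    return n_missing == 0 and not duplicated, n_missing, duplicated


def share_matched(left, right, id_field):
    """Share of distinct non-missing IDs in `left` that also appear in `right`."""
    left_ids = {_clean_id(r.get(id_field)) for r in left} - {None}
    right_ids = {_clean_id(r.get(id_field)) for r in right} - {None}
    if not left_ids:
        return 0.0
    return len(left_ids & right_ids) / len(left_ids)


# Hypothetical program records and survey records sharing a "beneficiary_id" column
program = [
    {"beneficiary_id": "A01"},
    {"beneficiary_id": "A02"},
    {"beneficiary_id": "A02"},  # duplicated ID
    {"beneficiary_id": ""},     # missing ID
]
survey = [{"beneficiary_id": "A01"}, {"beneficiary_id": "B07"}]

print(check_unique_id(program, "beneficiary_id"))  # (False, 1, ['A02'])
print(share_matched(program, survey, "beneficiary_id"))  # 0.5
```

A low match rate or a non-unique ID is usually a signal to look for another identifier, or to agree on a linking strategy with the data provider, before relying on the merged data.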


=== National Survey Data ===
 
Existing [[Survey Pilot|survey]] data may be useful, depending on the [[Sampling#Establish the Sampling Frame and Master Dataset|sampling frame]] for the '''impact evaluation''', the representativeness of the existing data, and the availability of disaggregated data. National statistics offices typically collect a wide array of nationally representative data, such as Living Standards Measurement Study (LSMS) surveys and censuses. International '''survey''' efforts such as the [https://dhsprogram.com/ Demographic and Health Surveys] and the [http://www.enterprisesurveys.org Enterprise Surveys] are also good sources.


=== [[Geo Spatial Data]] ===
This includes data from traditional satellites, micro- and nano-satellites, and unmanned aerial vehicles (UAVs), such as drones.


=== [[Remote Sensing]] ===
This includes all data collected by sensors and through the Internet of Things (IoT).


=== [[Telecom Data]] ===
This includes [[Innovative Data Sources#Mobile Big Data|call detail records]], [[Innovative Data Sources#Types of Secondary Data|social media data]], and web scraping.


=== [[Crowd-sourced Data]] ===
This includes all data collected by crowd-sourcing, often through social media or mobile apps.


== Related Pages ==
[[Special:WhatLinksHere/Secondary Data Sources|Click here for pages that link to this topic]].
 


== Additional Resources ==
 
* JPAL, [https://admindatahandbook.mit.edu/ Handbook on Using Administrative Data]
* DIME Analytics (World Bank), [https://osf.io/36nyq Secondary Data Sources]


[[Category: Secondary Data Sources ]]

Latest revision as of 15:42, 16 August 2023
