Innovative Data Sources

In addition to traditional data sources, such as information gathered during surveys, data can be collected from a variety of alternative sources.

Read First

  • Primary data is the main type of information that comes to mind when people talk about collecting data. It is gathered first-hand through surveys, interviews, or experiments.
  • Occasionally, researchers find that data has already been collected, sometimes by the government and sometimes by a third party. Previously collected information that the field team then makes use of is known as secondary data.
  • Any source of data, such as secondary data, that is not collected first-hand is an innovative data source.
  • Examples of secondary data include administrative and monitoring data and Mobile Big Data.

Acquiring Secondary Data

Some types of secondary data, such as satellite imagery, are publicly available and don't require special agreements with government institutions or private companies. However, most secondary data of interest to researchers must be obtained through a data license agreement. A data license agreement formally grants rights to people who do not own the data they will be analyzing. Its key elements are:

  • What data will be received
  • Intended use(s)
  • How long it will be retained
  • Who will have access to it
  • Rights to derivative data, metadata, and other outputs
  • How to cite the data

Types of Secondary Data

Secondary data falls into a variety of categories. Examples include:

  • Satellite Imagery
  • Social Media Data
  • Mobile Phone Data

Satellite imagery

Satellite imagery can offer evidence of economic activity and city expansion (seen from nighttime lights); true color imagery and vegetation (seen in daytime imagery); weather patterns, such as rainfall and temperature; pollution levels of CO2 and NO2; and data on a region's terrain, i.e. whether the area is urban, cropland, forested, home to bodies of water, etc.

Social Media Data

Social media can offer information on poverty and education levels. For example, one way that researchers have measured poverty is by looking at Facebook users whose accounts show an interest in restaurants, luxury goods, travel, etc. Educational attainment is also self-reported on Facebook and other social media platforms, so researchers can sometimes obtain detailed information on an area's education levels.

Mobile Phone Data

Mobile phone data consists of two types: call detail records (CDR), which are records of mobile phone activity mapped to cell towers; and GPS data, which is compiled from pings from applications, such as Google Maps, querying GPS. As an example of the information that can be extracted from GPS mobility data, consider a case where researchers queried the travel time for over 1000 origin and destination pairs every hour using Google and Mapbox. The resulting dataset contained information on peak and off-peak travel hours for the months April-October, as well as average speeds during those hours.
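As a rough illustration of how peak hours and average speeds might be derived from hourly travel-time queries, the following Python sketch averages speeds by hour of day. The record layout, the sample values, and the rule that the peak hour is the one with the lowest mean speed are all assumptions made for this example; they are not taken from the study described above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records (not real data):
# (origin-destination pair, hour of day, distance in km, travel time in minutes)
records = [
    ("A-B", 8, 10.0, 40.0),
    ("A-B", 14, 10.0, 20.0),
    ("C-D", 8, 6.0, 18.0),
    ("C-D", 14, 6.0, 12.0),
]


def average_speed_by_hour(rows):
    """Return {hour: mean speed in km/h} across all origin-destination pairs."""
    speeds = defaultdict(list)
    for _pair, hour, distance_km, minutes in rows:
        # km per minute times 60 gives km/h
        speeds[hour].append(distance_km * 60.0 / minutes)
    return {hour: mean(values) for hour, values in speeds.items()}


def peak_hour(speed_by_hour):
    """Assumption: the peak hour is the hour with the lowest mean speed."""
    return min(speed_by_hour, key=speed_by_hour.get)


speed_by_hour = average_speed_by_hour(records)
print(speed_by_hour)
print("Peak hour:", peak_hour(speed_by_hour))
```

With real data, the same grouping could be repeated per month or per origin-destination pair to compare peak and off-peak conditions.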

For other types of secondary data, see the linked page at the beginning of this section. It contains information on geospatial data, remote sensing, telecom data, and crowd-sourced data.

Administrative Data

Data collected through existing government ministries, programs and projects is called administrative data. It is so called because data collected and maintained by government agencies are used to "administer" programs and provide services to the public.

For example, line ministries, the agencies responsible for a particular economic sector or activity and for delivering government programs to citizens, have access to administrative data in order to carry out their mandate. National statistics offices (NSOs), the agencies responsible for producing and disseminating quantitative and qualitative information on major areas of citizens' lives, possess census and geospatial data, for instance, while regulatory agencies hold tax, price, and trade data.

Mobile Big Data

A type of secondary data, mobile big data (MBD) is anonymized, aggregated data generated from personal mobile devices and mobile network operators (MNOs). There is ongoing research to harness this information to track population trends, augment statistics, and deliver policy insights which can be used to provide targeted services. In response to the COVID-19 pandemic and the push towards sustainable growth, nearly 80% of NSOs have indicated they want to improve their use of MBD. To show how effective this technology can be, consider the estimated impact that adopting MBD would have in Sub-Saharan Africa:

  • 60 million people could have better access to healthcare due to better positioning of healthcare services
  • 120 million people saved because of better-informed measures to limit air pollution
  • Cost effective: a return of $30 for every $1 invested in Integrated National Data Systems

MBD in Policy

The top sources of MBD for policy use are call detail records (CDR) and GPS data. CDR consist of metadata on voice calls, text messages, and other data points collected by MNOs. There are two advantages and one disadvantage to using CDR:

  • More representative of the bottom 40% of the population in low-income countries
  • Event driven (voice/text), medium spatial and temporal resolution mapped to closest cell tower
  • Difficult to access, needs high performance computing and storage, data sharing arrangements with MNOs, and local capacity to analyze
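Because CDR events are mapped to cell towers, one common first step in analysis is to aggregate them by tower and time window. The sketch below counts distinct subscribers per tower per hour. It is only a minimal illustration: the row layout, the subscriber hashes, and the tower identifiers are hypothetical, and real CDR would arrive under a data sharing arrangement with an MNO and at a far larger scale.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, already anonymized CDR rows (not real data):
# (subscriber hash, ISO 8601 timestamp of the call or text, tower id)
cdr_rows = [
    ("u1", "2021-04-01T08:05:00", "T1"),
    ("u1", "2021-04-01T08:40:00", "T1"),
    ("u2", "2021-04-01T08:15:00", "T1"),
    ("u2", "2021-04-01T09:10:00", "T2"),
]


def unique_subscribers_per_tower_hour(rows):
    """Return {(tower_id, hour_start): number of distinct subscribers seen}."""
    seen = defaultdict(set)
    for subscriber, timestamp, tower in rows:
        # Truncate the event time to the start of its hour
        hour_start = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0
        )
        seen[(tower, hour_start)].add(subscriber)
    return {key: len(subscribers) for key, subscribers in seen.items()}


counts = unique_subscribers_per_tower_hour(cdr_rows)
for (tower, hour_start), n in sorted(counts.items()):
    print(tower, hour_start.isoformat(), n)
```

Counting distinct subscribers rather than raw events avoids over-weighting heavy phone users, which matters because CDR are event driven.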

GPS data consists of location coordinates generated from usage of location-enabled smartphone applications. Chipsets on smartphones communicate with global navigation satellite systems (GNSS). There are two advantages and two disadvantages to using GPS data:

  • High spatial and temporal resolution (meter)
  • Readily accessible via third-party aggregators or big tech products (Cuebiq, Veraset)
  • Less representative of the bottom 40% of the population in low-income countries
  • High-performance computing and technical capacity may be needed to process raw data
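To give a sense of what processing raw GPS pings can involve, the sketch below sums the great-circle (haversine) distance between consecutive pings to estimate how far a device travelled. The coordinates are made up for illustration, the Earth is treated as a sphere with a mean radius of 6371 km, and a real pipeline would also need to handle noise, gaps between pings, and privacy safeguards.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def path_length_km(pings):
    """Sum the distances between consecutive (lat, lon) pings, in the order given."""
    return sum(haversine_km(*a, *b) for a, b in zip(pings, pings[1:]))


# Hypothetical pings (lat, lon), already sorted by time
pings = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
total_km = path_length_km(pings)
print(f"Estimated distance travelled: {total_km:.1f} km")
```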

Those are the sources of MBD in policy. But what about the goals? Broadly speaking, there are five areas where MBD is envisioned to play a role:

  1. Dynamic Population Mapping: Population dynamics and characteristics can be used to inform a wide range of policy indicators
  2. Migration Statistics: CDR and GPS data can be used to understand and predict human mobility patterns
  3. Displacement and Disaster: CDR data can be useful for producing statistical information to supplement traditional survey data in disaster contexts
  4. Information Society: Produce internationally agreed information and communication technology (ICT) indicators that are included in the SDG monitoring framework
  5. Tourism: MBD is an alternative source for generating tourism statistics and/or filling gaps in them


As with all evolving technological fields, there are challenges to using MBD. These are:

  • Variation: The maturity of Integrated National Data Systems varies, which adds complexity
  • Tools: Special software and hardware must be provisioned on the MNO network to store and process CDR
  • Safeguards: Good practices are needed for data security, privacy preservation, and legal protections
  • Standards: Guidance is needed for developing measurements and official statistics, as well as standard data sharing agreements
  • Capacity: Low- and middle-income countries (LMICs) often lack the capacity to repurpose MBD into policy data products
  • Funding: To date, funding has been for one-off projects; programmatic funding is needed
  • Access: Median ownership of all types of phones that allow for collection of MBD is lower in emerging economies