Innovative Data Sources
In addition to traditional data sources, such as information gathered during surveys, data can be collected from a variety of alternative sources.
Read First
- Primary data is the main type of information that comes to mind when people talk about collecting data. It consists of data gathered first-hand through surveys, interviews, or experiments.
- Occasionally, researchers find that data has already been collected, sometimes by the government and sometimes by a third party. Previously collected information that the field team then makes use of is known as secondary data.
- Any source of data, such as secondary data, that is not collected first-hand is an innovative data source.
- Examples of secondary data include administrative and monitoring data, geospatial data, and many more discussed below.
Acquiring Secondary Data
Some types of secondary data, such as satellite imagery, are publicly available and don't require special agreements with government institutions or private companies. However, most secondary data of interest to researchers, whatever its kind, must be obtained through a data license agreement. A data license agreement formally grants rights to people who do not own the data they will be analyzing. Its key elements are:
- What data will be received
- Intended use(s)
- How long it will be retained
- Who will have access to it
- Rights to derivative data, metadata, and other outputs
- How to cite the data
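As a rough illustration, the key elements above can be tracked as a simple checklist while an agreement is being drafted. The sketch below is not a legal template; the class name, field names, and example values are all hypothetical and chosen only to mirror the list.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DataLicenseAgreement:
    """Hypothetical record mirroring the key elements listed above."""
    data_received: str            # what data will be received
    intended_uses: List[str]      # intended use(s)
    retention_period: str         # how long it will be retained
    authorized_users: List[str]   # who will have access to it
    derivative_rights: str        # rights to derivative data, metadata, outputs
    citation: str                 # how to cite the data


def missing_elements(agreement: DataLicenseAgreement) -> List[str]:
    """Return the names of key elements that are still empty."""
    return [f.name for f in fields(agreement) if not getattr(agreement, f.name)]


# Illustrative draft with made-up values; three elements are not yet filled in.
draft = DataLicenseAgreement(
    data_received="Hypothetical monthly program enrollment records",
    intended_uses=["impact evaluation"],
    retention_period="",
    authorized_users=["research team"],
    derivative_rights="",
    citation="",
)

print(missing_elements(draft))
```

A check like this only flags elements that have been left blank; whether each term is adequate is still a matter for the parties to the agreement.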
Types of Secondary Data
Secondary data falls into many categories. Examples include:
- Satellite Imagery
- Social Media Data
- Mobile Phone Data
Satellite imagery
Satellite imagery can provide evidence of economic activity and city expansion (from nighttime lights); true color imagery and vegetation cover (from daytime imagery); weather patterns, such as rainfall and temperature; pollution levels, such as CO2 and NO2 concentrations; and a region's land cover, that is, whether an area is urban, cropland, forested, covered by water, and so on.
Social Media Data
Social media can offer information on poverty and education levels. For example, one way that researchers have measured poverty is by looking at Facebook users whose accounts show an interest in restaurants, luxury goods, travel, etc. Educational attainment is also self-reported on Facebook and other social media platforms, so researchers can sometimes obtain detailed information on an area's education levels.
Mobile Phone Data
Mobile phone data consists of two types: call data records (CDR), which are records of mobile phone activity mapped to cell towers; and GPS data, which is compiled from pings sent by applications, such as Google Maps, that query GPS. As an example of the information that can be extracted from GPS mobility data, consider a case where researchers queried the travel time for over 1,000 origin and destination pairs every hour using Google and Mapbox. The resulting dataset contained information on peak and off-peak travel hours for the months of April through October, as well as average speeds during those hours.
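To make the travel-time example concrete, the sketch below shows one way hourly queries could be turned into average speeds and a peak hour. It is a minimal illustration under stated assumptions, not the researchers' method: the records, pair labels, and distances are invented, and the peak hour is simply defined here as the hour with the lowest average speed.

```python
from collections import defaultdict

# Hypothetical query results:
# (origin-destination pair, hour of day, distance in km, travel time in minutes)
records = [
    ("A-B", 8, 10.0, 40.0),
    ("A-B", 14, 10.0, 20.0),
    ("C-D", 8, 6.0, 30.0),
    ("C-D", 14, 6.0, 12.0),
]


def average_speed_by_hour(rows):
    """Mean speed in km/h across origin-destination pairs, for each hour."""
    speeds = defaultdict(list)
    for _pair, hour, km, minutes in rows:
        speeds[hour].append(km * 60.0 / minutes)  # km per minute -> km per hour
    return {hour: sum(values) / len(values) for hour, values in speeds.items()}


speed = average_speed_by_hour(records)

# Assumption: the slowest hour on average is treated as the peak travel hour.
peak_hour = min(speed, key=speed.get)

print(speed)
print(peak_hour)
```

With real data, the same grouping would typically be repeated by month or day of week to compare peak and off-peak periods.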
For other types of secondary data, see the linked page at the beginning of this section. It contains information on geospatial data, remote sensing, telecom data, and crowd-sourced data.
Administrative Data
Data collected through existing government ministries, programs, and projects is called administrative data. The name reflects its purpose: agencies and firms collect and maintain this data to "administer" programs and provide services to the public. For example, national statistics offices (NSOs) hold census and geospatial data, while regulatory agencies hold tax, price, and trade data. Line ministries, the agencies responsible for delivering government programs to citizens, have access to administrative data as well.
Mobile Big Data
A type of secondary data, mobile big data is anonymized, aggregated data generated by personal mobile devices (phones) and mobile network operators. Ongoing research seeks to harness this information to track population trends, augment statistics, and deliver policy insights that can be used to provide targeted services. For example, mobile big data can be used to predict the spread of infectious diseases, which would allow governments to optimize delivery of public health services, or to track migration patterns in response to climate disasters, which could improve government response.
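The sketch below illustrates what "anonymized, aggregated" can mean in practice: device-level events are reduced to counts of distinct devices per cell tower and hour, and small counts are dropped. All records and identifiers are invented, and the suppression threshold is an assumption made for illustration; actual anonymization rules are set by operators and data license agreements.

```python
from collections import defaultdict

# Hypothetical device-level events: (device id, cell tower id, hour of day)
events = [
    ("d1", "T1", 9),
    ("d2", "T1", 9),
    ("d3", "T1", 9),
    ("d1", "T1", 9),  # repeat activity from the same device
    ("d4", "T2", 9),
]

MIN_DEVICES = 3  # assumed suppression threshold, not taken from any standard


def aggregate(rows, min_devices=MIN_DEVICES):
    """Count distinct devices per (tower, hour); drop cells below the threshold."""
    devices = defaultdict(set)
    for device, tower, hour in rows:
        devices[(tower, hour)].add(device)
    return {
        cell: len(ids) for cell, ids in devices.items() if len(ids) >= min_devices
    }


counts = aggregate(events)
print(counts)
```

Only the aggregated counts would leave the operator in a setup like this; the device identifiers stay behind.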