Every time you use your phone to make a call or send a text message, your telecom operator records data on that transaction. While this data does not include the specific contents of the call or text, it provides metadata, including variables such as the phone number of the person making the call, the phone number of the person receiving the call, the length of the call, the mobile phone tower associated with the call on either side (caller and receiver), and the type of mobile phone device used to make the call. This page provides an introduction to telecom data, highlights different research areas in which telecom data has been used, and outlines factors to consider when working with telecom data.
- Telecom data is a powerful tool to use in development analyses on health, mobility, poverty, and other topics.
- Telecom data requires careful ethical handling in order to maintain privacy of individuals.
- Challenges to using telecom data include size and barriers to acquisition.
About Telecom Data
One of the main purposes of telecom data is to collect the information an operator needs to charge customers based on their phone usage, but researchers have started to use this data for a wide array of analyses. Telecom data is always de-identified and anonymized when provided to researchers, meaning that the actual phone number associated with each record is removed and replaced by an anonymous ID. Often, however, the same ID is used to track all calls and texts made by the same mobile phone. This allows researchers to infer a lot of information about individuals’ movements, social networks, and general phone usage.
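The replacement of phone numbers by stable anonymous IDs can be sketched as follows. This is a minimal illustration, assuming a keyed hash (HMAC-SHA256) as the pseudonymization method; the source does not say which method operators actually use. The key, the record layout, and the phone numbers are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the operator and never shared with researchers.
SECRET_KEY = b"operator-held-secret"


def pseudonymize(phone_number: str, key: bytes = SECRET_KEY) -> str:
    """Map a phone number to a stable anonymous ID using a keyed hash."""
    digest = hmac.new(key, phone_number.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated only to keep the example readable


# Illustrative raw records: (caller, receiver, duration in seconds, caller tower).
raw_records = [
    ("+15550001", "+15550002", 120, "tower_A"),
    ("+15550001", "+15550003", 45, "tower_B"),
    ("+15550002", "+15550001", 300, "tower_C"),
]

# The same phone number always maps to the same ID, so activity stays linkable.
anonymized = [
    (pseudonymize(caller), pseudonymize(receiver), duration, tower)
    for caller, receiver, duration, tower in raw_records
]
```

Because the mapping is stable, the first two records share a caller ID even though the phone number itself no longer appears. That linkability is what makes the data useful for research, and it is also the source of the privacy risk.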
Considerations for Use
Telecom data is always anonymized when companies share the data with researchers. However, the level of detail within the data can make de-anonymization possible. For example, using an anonymized dataset in which individuals’ hourly locations were specified at the antenna level, De Montjoye et al. (2013) uniquely identified 95% of individuals with just four spatio-temporal points per individual. This potential to de-anonymize data and track individuals’ movements is of particular concern in sensitive political situations.
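The idea of identifying individuals from a handful of spatio-temporal points can be illustrated with a simple uniqueness measure. The sketch below is not the procedure used by De Montjoye et al.; it is a simplified version that assumes each trace is a set of (antenna, hour) pairs and asks how many users are the only match for a few points drawn from their own trace. The function name and toy data are hypothetical.

```python
import random


def unicity(traces: dict[str, set[tuple[str, int]]], n_points: int, seed: int = 0) -> float:
    """Share of users whose trace is the only one containing n_points drawn from it."""
    rng = random.Random(seed)
    eligible = 0
    unique = 0
    for points in traces.values():
        if len(points) < n_points:
            continue  # not enough points to draw a sample for this user
        eligible += 1
        sample = set(rng.sample(sorted(points), n_points))
        # Count every trace (including the user's own) that contains all sampled points.
        matches = sum(1 for other in traces.values() if sample <= other)
        if matches == 1:
            unique += 1
    return unique / eligible if eligible else 0.0


# Toy traces: anonymous ID -> set of (antenna, hour of day) observations.
traces = {
    "u1": {("A", 8), ("B", 12), ("C", 18)},
    "u2": {("A", 8), ("B", 12), ("D", 18)},
    "u3": {("A", 8), ("E", 12), ("D", 18)},
}

share_with_three_points = unicity(traces, n_points=3)
share_with_one_point = unicity(traces, n_points=1)
```

In this toy data, three points single out every user, while a single point may not, since all three users share the observation ("A", 8).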
As such, both operators and researchers are working to find ways to use this telecom data in a manner compatible with personal privacy and high ethical standards. Consider, for example, Orange’s Data for Development Challenge (D4D). In D4D’s second iteration in 2015, Orange created an External Ethics Panel to review proposals and entries through an ethical lens. The panel denied telecom data access to any proposals that contained questionable ethics and also reviewed ethics in ongoing research. D4D also carefully selected the temporal and spatial granularity of the released data in order to prevent de-anonymization. These types of efforts and careful review are critical to making telecom data available for research while simultaneously maintaining the privacy of telecom users.
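Reducing temporal and spatial granularity can be sketched as mapping antennas to larger regions and timestamps to wider time bins. This is only an illustration of the general idea and does not reproduce the specific choices made for D4D; the antenna-to-region lookup and the six-hour bin width are assumptions.

```python
from datetime import datetime

# Hypothetical lookup from antenna ID to a larger administrative region.
ANTENNA_TO_REGION = {
    "ant_001": "region_north",
    "ant_002": "region_north",
    "ant_003": "region_south",
}


def coarsen(antenna_id: str, timestamp: datetime, hours_per_bin: int = 6) -> tuple[str, datetime]:
    """Replace an antenna with its region and round the time down to a wider bin."""
    region = ANTENNA_TO_REGION[antenna_id]
    binned_hour = (timestamp.hour // hours_per_bin) * hours_per_bin
    binned_time = timestamp.replace(hour=binned_hour, minute=0, second=0, microsecond=0)
    return region, binned_time


# Two distinct fine-grained observations collapse to the same coarse point.
first = coarsen("ant_001", datetime(2015, 1, 5, 14, 37))
second = coarsen("ant_002", datetime(2015, 1, 5, 13, 5))
```

After coarsening, observations that were distinguishable at the antenna and minute level become identical, which makes any single point less identifying at the cost of analytical detail.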
As telecom data is proprietary data owned by telecom operators, obtaining access to it can be extremely difficult. In general, to obtain access to the telecom data, the provider and the requestor must create and sign lengthy agreements that outline the use of the data and set conditions for its use. It can be difficult to convince providers to spend time and effort on these requests, thus limiting the use of telecom data in many contexts.
Some operators have, however, released snippets of their data, such as in the case of D4D. In 2013, Orange released data from Côte d'Ivoire; in 2015, Orange and Sonatel released data from Senegal. Other initiatives work to make telecom data more accessible to researchers. For example, the Open Algorithms Project (OPAL), an initiative launched in 2017, seeks to provide access to statistical information extracted from anonymized, secured, and formatted telecom data. Its intention is for open algorithms, accessed through an API, to run on OPAL servers hosted by partner telecom companies. The data will thus not leave partner companies, and researchers and policymakers will be able to obtain relevant, aggregated information from the telecom data. OPAL is just one example of the ways in which telecom data can be made more accessible to researchers and policymakers.
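The principle of returning only aggregated information, while record-level data stays with the operator, can be sketched as below. This is not the OPAL API: the function name, the record layout, and the minimum-count threshold are hypothetical and serve only to show what an aggregate-only query might look like.

```python
def aggregated_user_counts(records, min_count=10):
    """Return the number of distinct users per region, dropping small cells.

    records: iterable of (anonymous_id, region) pairs held on the operator side.
    min_count: assumed threshold below which a region is suppressed.
    """
    users_by_region = {}
    for user_id, region in records:
        users_by_region.setdefault(region, set()).add(user_id)
    return {
        region: len(users)
        for region, users in users_by_region.items()
        if len(users) >= min_count
    }


# Toy records; a low threshold is used only so the example produces output.
records = [("u1", "north"), ("u2", "north"), ("u1", "north"), ("u3", "south")]
result = aggregated_user_counts(records, min_count=2)
```

Only the per-region counts leave the function, and regions with too few users are withheld, so the caller never sees individual records.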
Size of the Data
As with other types of big data, telecom data can consist of millions or even billions of observations, depending on the dataset provided. If the dataset is limited to millions of observations, one can manipulate and analyze it using a language such as Python together with libraries written for statistical analysis. When data consists of billions of observations, it will often be stored on a Hadoop cluster. There are several software options for analyzing data on such a cluster, including Hive, which uses commands very similar to SQL; Pig; and Spark.
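For datasets in the millions of rows, a common pattern is to stream records and aggregate as you go rather than loading everything into memory. The sketch below uses only the Python standard library; the column names (`caller_id`, `receiver_id`, `duration_s`) are assumptions about the file layout, not a standard format.

```python
import csv
import io
from collections import defaultdict


def total_duration_by_caller(lines):
    """Sum call duration per anonymous caller ID, reading one row at a time."""
    totals = defaultdict(int)
    for row in csv.DictReader(lines):
        totals[row["caller_id"]] += int(row["duration_s"])
    return dict(totals)


# A small in-memory stand-in for a large CSV file of call records.
sample = io.StringIO(
    "caller_id,receiver_id,duration_s\n"
    "a1,b2,120\n"
    "a1,c3,30\n"
    "b2,a1,60\n"
)
totals = total_duration_by_caller(sample)
```

In practice, `lines` would be an open file handle. On a Hadoop cluster the same per-caller sum would typically be expressed as a group-by query in Hive or Spark instead.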
Research using Telecom Data
Research using telecom data has grown tremendously in recent years. The following sections outline key pieces of current literature that use mobile data in analyses related to health, mobility, and poverty.
An array of researchers have used telecom data to study the relationship between population mobility and the spread of disease. Erbach-Schoenberg et al. (2016) study the impact of seasonally varying population numbers on disease incidence estimates, while Wesolowski et al. (2012) quantify the impact of human mobility on malaria. Wesolowski et al. (2015a) look at the impact of human mobility on the emergence of dengue epidemics in Pakistan, and Wesolowski et al. (2015b) quantify seasonal population fluxes driving rubella transmission. Finally, Turner-McGrievy and Tate take a closer look at mobile-based health solutions in a study on remotely delivered weight loss interventions.
Blumenstock (2012) and Wesolowski et al. (2013) use mobile phone data to study patterns of internal migration, while Bengtsson et al. (2011) use it to track post-earthquake population movements in Haiti and inform better responses to disasters.
Back to Parent
This article is part of the topic Secondary Data Sources
- NetMob, the main conference on the analysis of mobile phone datasets
- World Bank’s presentation, “Using Telecom Data to Track Movement at High Spatial and Temporal Frequencies”
- This podcast from the World Bank discusses telecom data and its implications for data analysis and development.
- Scientific Papers:
- Bengtsson, Linus et al. (2011). “Improved response to disasters and outbreaks by tracking population movements with mobile phone network data: a post-earthquake geospatial study in Haiti”. PLoS Med 8.8, e1001083.
- Blumenstock, Joshua E (2012). “Inferring patterns of internal migration from mobile phone call records: Evidence from Rwanda”. Information Technology for Development 18.2, pp. 107–125.
- Blumenstock, Joshua, Gabriel Cadamuro, and Robert On. (2015). "Predicting poverty and wealth from mobile phone metadata." Science 350.6264, pp. 1073-1076.
- Blumenstock, Joshua E., Nathan Eagle, and Marcel Fafchamps. (2016). "Airtime transfers and mobile communications: Evidence in the aftermath of natural disasters." Journal of Development Economics 120, pp. 157-181.
- De Montjoye, Yves-Alexandre, et al. (2013). "Unique in the crowd: The privacy bounds of human mobility." Scientific Reports 3, 1376.
- Erbach-Schoenberg, Elisabeth Zu et al. (2016). “Dynamic denominators: the impact of seasonally varying population numbers on disease incidence estimates”. Population Health Metrics 14.1, 35.
- Le Menach, Arnaud et al. (2011). “Travel risk, malaria importation and malaria transmission in Zanzibar”. Scientific Reports 1.
- Ruktanonchai, Nick W et al. (2016). “Identifying malaria transmission foci for elimination using human mobility data”. PLoS Comput Biol 12.4, e1004846.
- Steele, Jessica E., et al. (2017). "Mapping poverty using mobile phone and satellite data." Journal of The Royal Society Interface 14.127, 20160690.
- Tatem, Andrew J, Youliang Qiu, et al. (2009). “The use of mobile phone data for the estimation of the travel patterns and imported Plasmodium falciparum rates among Zanzibar residents”. Malar J 8, 287.
- Tatem, Andrew J et al. (2014). “Integrating rapid risk mapping and mobile phone call record data for strategic malaria elimination planning”. Malaria Journal 13.1, pp. 52.
- Wesolowski, Amy et al. (2012). “Quantifying the impact of human mobility on malaria”. Science 338.6104, pp. 267–270.
- Wesolowski, Amy et al. (2013). “The use of census migration data to approximate human movement patterns across temporal scales”. PLoS ONE 8.1, e52971.
- Wesolowski, Amy et al. (2015a). “Impact of human mobility on the emergence of dengue epidemics in Pakistan”. Proceedings of the National Academy of Sciences 112.38, pp. 11887–11892.
- Wesolowski, Amy et al. (2015b). “Quantifying seasonal population fluxes driving rubella transmission dynamics using mobile phone data”. Proceedings of the National Academy of Sciences 112.35, pp. 11114–11119.