Telecom Data





Basics

What is Telecom Data?

Every time you use your phone to make a call or send a text message, your telecom operator records data about that transaction. This data does not include the contents of the call or text; instead, it is what is known as metadata. It can include the phone number of the person making the call, the phone number of the person receiving the call, the length of the call, the mobile phone tower associated with the call on either side (caller and receiver), and the type of mobile phone device used to make the call. Operators collect this data mainly so that they can charge customers based on their phone usage, but researchers have started to use it for the wide array of purposes described in the next section. The data is anonymized before it is provided to researchers, meaning that the actual phone number associated with each record is removed and replaced by an anonymous ID. Often, however, the same ID is used for all calls and texts made by the same mobile phone, which allows researchers to infer a great deal about movements, social networks, and general phone usage.
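To make the structure of such a record concrete, here is a minimal Python sketch of a call record and of replacing phone numbers with a stable anonymous ID. The field names, the CallRecord class, and the use of a keyed hash (HMAC) are assumptions made for illustration; operators may use different schemas and anonymization methods.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One metadata record. Field names are illustrative, not an operator schema."""
    caller: str          # phone number, or anonymous ID once anonymized
    receiver: str        # phone number, or anonymous ID once anonymized
    duration_s: int      # length of the call in seconds
    caller_tower: str    # tower associated with the caller's side
    receiver_tower: str  # tower associated with the receiver's side


def pseudonymize(number: str, secret_key: bytes) -> str:
    """Map a phone number to a stable anonymous ID with a keyed hash (assumed method)."""
    digest = hmac.new(secret_key, number.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize(record: CallRecord, secret_key: bytes) -> CallRecord:
    """Replace both phone numbers with anonymous IDs and keep the other fields."""
    return CallRecord(
        caller=pseudonymize(record.caller, secret_key),
        receiver=pseudonymize(record.receiver, secret_key),
        duration_s=record.duration_s,
        caller_tower=record.caller_tower,
        receiver_tower=record.receiver_tower,
    )


# Hypothetical key and made-up records, for illustration only.
KEY = b"example-key-held-by-the-operator"
raw = [
    CallRecord("+15550001", "+15550002", 120, "T1", "T7"),
    CallRecord("+15550001", "+15550003", 45, "T2", "T9"),
]
anonymized = [anonymize(r, KEY) for r in raw]
```

Because the same key is applied to every record, both calls above end up with the same caller ID. That consistency is what lets records from one phone be linked together, and it is also what makes the ethical questions discussed later relevant.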

How is Telecom Data Used?

Research using telecom data has grown rapidly in recent years, and the data has been used in many areas of research. These include:

Health

Studying the relationship between population mobility and spread of disease using mobile phone data (Erbach-Schoenberg et al 2016, Tatem et al 2014, Wesolowski 2012, Wesolowski 2015a, Wesolowski 2015b)

Mobility

Using mobile phone data to study patterns of internal migration (Blumenstock 2012, Wesolowski 2013) and studying mobility to improve disaster response (Bengtsson et al 2011)

Poverty Mapping

What are Things to Consider?

Ethics

Telecom data is always anonymized before companies share it with researchers. Nevertheless, the level of detail provided can make de-anonymization possible. For example, De Montjoye et al 2013 find that with just four spatio-temporal points per individual, they are able to uniquely identify 95% of individuals in an anonymized dataset in which each individual's location is specified hourly at the level of the antenna. This potential for de-anonymization matters because, especially in sensitive political situations, the ability to track the movement of particular individuals is very concerning.

Nevertheless, the research developed with this type of data is valuable, and both operators and researchers are working to find ways of using the data that are compatible with high ethical standards. One example is the Data for Development (D4D) Challenge led by Orange. In the second iteration of the Challenge in 2015, drawing on the experience of the first Challenge in 2013, Orange recognized the importance of ensuring that research complies with ethical standards. For the Challenge, an External Ethics Panel was created to review all entries from an ethical viewpoint. Proposed projects that raised ethical concerns were not granted access to the data, and any research that raised concerns along the way was reviewed. More information on the ethical standards is available in this report. Additionally, the D4D Challenge carefully selected the granularity of the data it released (in terms of both time frequency and spatial granularity) in order to ensure that de-anonymization would not be possible. These types of efforts and careful review are necessary to ensure that telecom data can be used for valuable research without jeopardizing the privacy of telecom users.
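The link between granularity and de-anonymization risk can be illustrated with a small Python sketch. It estimates how many users are singled out when an observer knows p of their (tower, hour) points, and then repeats the estimate after towers are merged into coarser regions. This is a simplified illustration on made-up data, not the procedure of De Montjoye et al; the function names, the tower-to-region mapping, and the toy traces are all assumptions.

```python
import random
from typing import Dict, Set, Tuple

Point = Tuple[str, int]  # (tower or region, hour of day)


def unicity(traces: Dict[str, Set[Point]], p: int, seed: int = 0) -> float:
    """Share of users whose trace is the only one containing p of their own points."""
    rng = random.Random(seed)
    eligible = [u for u, pts in traces.items() if len(pts) >= p]
    if not eligible:
        return 0.0
    unique = 0
    for user in eligible:
        # Points an outside observer is assumed to know about this user.
        known = set(rng.sample(sorted(traces[user]), p))
        # Every trace that is consistent with those known points.
        matches = [u for u, pts in traces.items() if known <= pts]
        if len(matches) == 1:
            unique += 1
    return unique / len(eligible)


def coarsen(traces: Dict[str, Set[Point]], tower_to_region: Dict[str, str]) -> Dict[str, Set[Point]]:
    """Reduce spatial granularity by replacing each tower with its region."""
    return {
        user: {(tower_to_region[tower], hour) for tower, hour in pts}
        for user, pts in traces.items()
    }


# Made-up traces for three users.
traces = {
    "a": {("T1", 8), ("T2", 12), ("T3", 18)},
    "b": {("T1", 8), ("T2", 12), ("T4", 18)},
    "c": {("T5", 8), ("T6", 12), ("T7", 18)},
}
# Hypothetical mapping from towers to larger regions.
regions = {"T1": "R1", "T2": "R1", "T3": "R2", "T4": "R2",
           "T5": "R3", "T6": "R3", "T7": "R3"}

fine = unicity(traces, p=3)                      # every user is unique at tower level
coarse = unicity(coarsen(traces, regions), p=3)  # users a and b become indistinguishable
```

At tower level all three toy users are uniquely identifiable from three points, while at region level users a and b share the same trace and only one of the three users remains unique. Real datasets behave less cleanly, but the direction of the effect is the reason granularity is chosen carefully before release.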

Working with Telecom Operators

One of the toughest aspects of working with telecom data is that it is proprietary data owned by telecom operators, so obtaining access to it can be extremely difficult. Some operators have released subsets of their data through initiatives such as the Data for Development Challenge. In 2013, Orange released data from Côte d'Ivoire, and in 2015 it released data from Senegal in collaboration with Sonatel. In general, obtaining access requires drafting and signing lengthy agreements that set out how the data may be used and under what conditions. Yet it can be difficult to convince providers to spend time and effort on these requests, which has limited the use of this type of data in many contexts.

A project currently under development, the Open Algorithms (OPAL) Project, is being built by a group of partners to provide access to statistical information extracted from anonymized, secured and formatted telecom data. The idea is that open algorithms accessed through an API will run on OPAL servers at partner telecom companies. In this way, the data will not leave partner companies, yet researchers and policy makers will still be able to obtain relevant information from the telecom data in aggregated form. The project is new, but it is one example of how access to this type of data could become much easier in the future and reach a much larger group of researchers and policy makers.
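The general idea of returning only aggregated information can be sketched in Python as follows. This is not OPAL's actual API: the function name, the record layout, and the rule of hiding regions with fewer than a minimum number of users are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def users_per_region(records: Iterable[Tuple[str, str]], min_users: int = 3) -> Dict[str, int]:
    """Count distinct anonymous IDs per region, dropping regions below min_users.

    records holds (anonymous_id, region) pairs. Only the aggregated counts are
    returned; the individual records are never handed back to the caller.
    """
    users = defaultdict(set)
    for anon_id, region in records:
        users[region].add(anon_id)
    return {
        region: len(ids)
        for region, ids in users.items()
        if len(ids) >= min_users  # assumed safeguard against very small groups
    }


# Made-up records, for illustration only.
records = [
    ("u1", "region_a"), ("u2", "region_a"), ("u3", "region_a"),
    ("u1", "region_a"),  # a repeat call by u1 is not counted twice
    ("u4", "region_b"),  # a single user, so this region is suppressed
]
summary = users_per_region(records)
```

In this sketch the caller receives a count for region_a only, because region_b contains too few users to report. The analysis runs where the data lives and only the summary leaves, which mirrors the arrangement described above.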

Size of the Data

As with other types of Big Data, telecom data can consist of millions or even billions of observations, depending on the dataset provided. If the dataset is limited to millions of observations, it can be manipulated and analyzed with a language such as Python and the various libraries that have been written for statistical analysis. When the data consists of billions of observations, it will often be stored on a Hadoop cluster. There are several software options for analyzing data on such a cluster. These include:

Hive, which uses commands very similar to SQL

Pig

Spark
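As an illustration of the SQL-style queries mentioned for Hive, the Python sketch below runs a simple aggregation over a handful of made-up call records. It uses Python's built-in sqlite3 module as a small stand-in rather than an actual Hive cluster, and the table and column names are assumptions. A query of this shape should carry over to Hive with little or no change, though that depends on how the cluster's tables are defined.

```python
import sqlite3

# In-memory database standing in for a much larger table on a cluster.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls ("
    "caller_id TEXT, receiver_id TEXT, duration_s INTEGER, tower_id TEXT)"
)

# Made-up records: (anonymous caller, anonymous receiver, seconds, tower).
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?)",
    [
        ("a", "b", 60, "T1"),
        ("a", "c", 30, "T2"),
        ("b", "a", 10, "T1"),
    ],
)

# Number of calls and total call time per anonymous caller.
rows = conn.execute(
    """
    SELECT caller_id, COUNT(*) AS n_calls, SUM(duration_s) AS total_s
    FROM calls
    GROUP BY caller_id
    ORDER BY caller_id
    """
).fetchall()
conn.close()
```

Here rows holds one tuple per caller, with the number of calls and the total duration in seconds. On a dataset of a few million records the same approach works on a single machine; for billions of records the query would instead be submitted to the cluster.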

Back to Parent

This article is part of the topic Data Sources


Additional Resources

Lots of information can be found on the website of NetMob, the main conference on the analysis of mobile phone datasets.

Scientific Papers

Bengtsson, Linus et al. (2011). “Improved response to disasters and outbreaks by tracking population movements with mobile phone network data: a post-earthquake geospatial study in Haiti”. PLoS Med 8.8, e1001083.

Blumenstock, Joshua E (2012). “Inferring patterns of internal migration from mobile phone call records: Evidence from Rwanda”. Information Technology for Development 18.2, pp. 107–125.

De Montjoye, Yves-Alexandre et al. (2013). “Unique in the crowd: The privacy bounds of human mobility”. Scientific Reports 3, 1376.

Erbach-Schoenberg, Elisabeth zu et al. (2016). “Dynamic denominators: the impact of seasonally varying population numbers on disease incidence estimates”. Population health metrics 14.1, p. 35.

Tatem, Andrew J et al. (2014). “Integrating rapid risk mapping and mobile phone call record data for strategic malaria elimination planning”. Malaria journal 13.1, p. 52.

Wesolowski, Amy et al. (2012). “Quantifying the impact of human mobility on malaria”. Science 338.6104, pp. 267–270.

Wesolowski, Amy et al. (2013). “The use of census migration data to approximate human movement patterns across temporal scales”. PLoS ONE 8.1, e52971.

Wesolowski, Amy et al. (2015a). “Impact of human mobility on the emergence of dengue epidemics in Pakistan”. Proceedings of the National Academy of Sciences 112.38, pp. 11887–11892.

Wesolowski, Amy et al. (2015b). “Quantifying seasonal population fluxes driving rubella transmission dynamics using mobile phone data”. Proceedings of the National Academy of Sciences 112.35, pp. 11114–11119.