Difference between revisions of "Telecom Data"

(13 intermediate revisions by 2 users not shown)
Line 1: Line 1:
''' Telecom Data''' is a type of [[Secondary Data Sources | secondary data]]. This page introduces '''telecom data''', discusses factors to consider when working with it, and outlines different topics in which researchers have used it for analysis.
''' Telecom Data''' is a type of [[Secondary Data Sources | secondary data]]. This page introduces it, discusses factors to consider when working with it, and outlines different topics in which researchers have used it for [[Data Analysis|analysis]].


==Read First==
==Read First==
* [[Secondary Data Sources | Secondary Data]] is data collected by any party other than the researcher that provides important context for any investigation into a particular intervention.  
* [[Secondary Data Sources | Secondary Data]] is data collected by any party other than the researcher that provides important context for any investigation into a particular intervention.  
* '''Uses.''' '''Telecom data''' is a powerful tool to use for analyses of health, mobility, poverty, and other topics in development.  
* '''Telecom data''' is a powerful tool to use for [[Data Analysis|analyses]] of health, mobility, poverty, and other topics in development.  
* '''Ethics.''' Telecom data requires careful ethical handling in order to maintain the privacy of individuals.
* '''Telecom data''' requires careful [[Research Ethics|ethical handling]] in order to maintain the privacy of individuals.
*'''Challenges.''' When using '''telecom data''' you may face issues pertaining to size and barriers to acquisition.
* When using '''telecom data''' you may face issues pertaining to size and barriers to acquisition.


==About Telecom Data==
==About Telecom Data==


Every time you use your phone to make a call or send a text message, data is recorded by your telecom operator on that transaction. While this '''data''' will not include the specific contents of the call or text, it provides metadata, which includes [[Variable Construction | variables]] like the phone number of the person making the call, the phone number of the person receiving the call, the length of the call, the mobile phone tower associated with the call on either side (caller and received), and the type of mobile phone device used to make the call.  
Every time you use your phone to make a call or send a text message, data is recorded by your telecom operator on that transaction. While this data will not include the specific contents of the call or text, it provides metadata, which includes [[Variable Construction | variables]] like the phone numbers of the people making and receiving the call, call length, the mobile phone tower associated with the call on either side (caller and receiver), and the type of mobile phone device used.


While one of the main purposes of '''telecom data''' is to collect the necessary information for an operator to charge customers based on their phone usage, researchers have started to use this '''telecom data''' for a wide array of [[Data Analysis | analyses]]. While telecom data is always [[De-identification | de-identified]] and '''anonymized''' when provided to researchers, meaning that the actual phone number associated with each record is removed and replaced by an anonymous ID, often the same ID is used to track calls and texts made by the same mobile phone. This allows researchers to infer a lot of information about individuals’ movements, social networks, and general phone usage.
While one of the main purposes of '''telecom data''' is to collect the necessary information for an operator to charge customers based on their phone usage, researchers have started to use it for a wide array of [[Data Analysis | analyses]]. While '''telecom data''' is always [[De-identification | de-identified]] and anonymized when provided to researchers, meaning that the actual phone number associated with each record is removed and replaced by an anonymous ID, the same ID is often used to track calls and texts made by the same mobile phone. This allows researchers to infer a lot of information about individuals’ movements, social networks, and general phone usage.


==Considerations for Use==
==Considerations for Use==


===Ethics===
===Ethics===
'''Telecom data''' is always anonymized when companies share the data with researchers. However, the level of detail within the data can lead to the possibility of de-anonymization. For example, using an anonymized [[Master Dataset|dataset]] in which individuals’ hourly locations were specified at the antenna level, [https://www.nature.com/articles/srep01376?ial=1 De Montjoye et al.] uniquely identified 95% of individuals with just four spatio-temporal points per individual. This potential to de-anonymize data and track individuals’ movements raises acute concerns in sensitive political situations. As such, both operators and researchers are working to find ways to use this '''telecom data''' in a manner compatible with [[Personally Identifying Information (PII)|personal privacy]] and [[Research Ethics|high ethical standards]].


'''Telecom data''' is always anonymized when companies share the data with researchers. However, the level of detail within the data can lead to the possibility of [[De-identification | de-anonymization]]. For example, using an '''anonymized dataset''' in which individuals’ hourly locations were specified at the antenna level, [https://www.nature.com/articles/srep01376?ial=1 De Montjoye et al.] uniquely identified 95% of individuals with just four spatio-temporal points per individual. This potential to [[De-identification | de-anonymize data]] and track individuals’ movements raises particularly high concerns in sensitive political situations.
Consider, for example, Orange’s [https://datacollaboratives.org/cases/orange-telecom-data-for-development-challenge-d4d.html Data for Development Challenge] (D4D). In D4D’s second iteration in 2015, Orange created an External Ethics Panel to review proposals and entries through an ethical lens. The panel denied '''telecom data''' access to any proposals that contained questionable ethics and also reviewed ethics in ongoing research. D4D also carefully selected the temporal and spatial granularity of the released data in order to prevent de-anonymization. These types of efforts and careful review are critical to making '''telecom data''' available for research while simultaneously maintaining the privacy of telecom users.
 
As such, both operators and researchers are working to find ways to use this '''telecom data''' in a manner compatible with personal privacy and high ethical standards. Consider, for example, Orange’s [https://datacollaboratives.org/cases/orange-telecom-data-for-development-challenge-d4d.html Data for Development Challenge] (D4D). In D4D’s second iteration in 2015, Orange created an External Ethics Panel to review proposals and entries through an ethical lens. The panel denied '''telecom data''' access to any proposals that contained questionable ethics and also reviewed ethics in ongoing research. '''D4D''' also carefully selected '''the temporal and spatial granularity''' of the released data in order to prevent '''de-anonymization'''. These types of efforts and careful review are critical to making '''telecom data''' available for research while simultaneously maintaining the privacy of telecom users.


===Telecom Operators===
===Telecom Operators===


As '''telecom data''' is proprietary '''data''' owned by telecom operators, obtaining access to it can be extremely difficult. In general, to obtain access to the '''telecom data''', the provider and the requestor must create and sign lengthy agreements that outline the use of the '''data''' and set conditions for its use. It can be difficult to convince providers to spend time and effort on these requests, thus limiting the use of telecom data in many contexts.
As '''telecom data''' is proprietary data owned by telecom operators, obtaining access to it can be extremely difficult. In general, to obtain access to '''telecom data''', the provider and the requestor must create and sign [[Data License Agreement|lengthy agreements]] that outline the use of the data and set conditions for its use. It can be difficult to convince providers to spend time and effort on these requests, thus limiting the use of '''telecom data''' in many contexts.


Some operators have, however, released snippets of their data, such as in the case of '''D4D'''. In 2013, Orange released data from Cote d'Ivore; in 2015, Orange and Sonatel released data from Senegal. Other initiatives work to make telecom data more accessible to researchers. For example, [https://www.opalproject.org/ The Open Algorithms Project] (OPAL), an initiative launched in 2017, seeks to provide access to statistical information extracted from anonymized, secured and formatted telecom data.  
Some operators have, however, released snippets of their data, such as in the case of D4D. In 2013, Orange released data from Cote d'Ivoire and in 2015, Orange and Sonatel released data from Senegal. Other initiatives work to make '''telecom data''' more accessible to researchers. For example, [https://www.opalproject.org/ The Open Algorithms Project] (OPAL), an initiative launched in 2017, seeks to provide access to statistical information extracted from anonymized, secured and formatted '''telecom data'''.  


Its intention is for open algorithms accessed by an '''API''' to run on '''OPAL servers''' of partner telecom companies. The data will thus not leave partner companies, and researchers and policy makers will be able to obtain relevant, aggregated information from the telecom data. OPAL is just one example of the ways in which telecom data can be made more accessible to researchers and policymakers.
Its intention is for open algorithms accessed by an '''API''' to run on OPAL servers at partner telecom companies. The data will thus not leave partner companies, and researchers and policy makers will be able to obtain relevant, aggregated information from the '''telecom data'''. OPAL is just one example of the ways in which '''telecom data''' can be made more accessible to researchers and policymakers.


===Size of the Data===
===Size of the Data===
Line 34: Line 33:


==Research using Telecom Data==
==Research using Telecom Data==
 
Research using '''telecom data''' has grown tremendously in recent years. The following sections outline key pieces of current literature that use [[Innovative Data Sources#Mobile Big Data|mobile data]] in [[Data Analysis | analyses]] related to health, mobility, and poverty.  
Research using telecom data has grown tremendously in recent years. The following sections outline key pieces of current literature that use '''mobile data''' in [[Data Analysis | analyses]] related to health, mobility, and poverty.  


===Health===
===Health===
 
An array of researchers have used '''telecom data''' to study the relationship between population mobility and the spread of disease:
An array of researchers have used telecom data to study the relationship between population mobility and the spread of disease. [https://www.ncbi.nlm.nih.gov/pubmed/27777514 Erbach-Schoenberg et al.] study the impact of seasonally varying population numbers on disease incidence estimates, while [https://science.sciencemag.org/content/338/6104/267 Wesolowski et al.] quantify the impact of human mobility on malaria. [https://www.pnas.org/content/112/38/11887.short Wesolowski et al.] look at the impact of human mobility on the emergence of dengue epidemics in Pakistan and [https://www.pnas.org/content/112/35/11114.short Wesolowski et al.] quantify seasonal population fluxes driving rubella transmission. Finally, [https://www.sciencedirect.com/science/article/pii/S138650561400015X Turner-McGrievy and Tate] take a closer look at mobile-based health solutions in a study on remotely-delivered weight-loss interventions.  
*[https://www.ncbi.nlm.nih.gov/pubmed/27777514 Erbach-Schoenberg et al.] study the impact of seasonally varying population numbers on disease incidence estimates
*[https://science.sciencemag.org/content/338/6104/267 Wesolowski et al.] quantify the impact of human mobility on malaria.  
*[https://www.pnas.org/content/112/38/11887.short Wesolowski et al.] look at the impact of human mobility on the emergence of dengue epidemics in Pakistan
*[https://www.pnas.org/content/112/35/11114.short Wesolowski et al.] quantify seasonal population fluxes driving rubella transmission.
*[https://www.sciencedirect.com/science/article/pii/S138650561400015X Turner-McGrievy and Tate] take a closer look at mobile-based health solutions in a study on remotely-delivered weight-loss interventions.  


===Mobility===
===Mobility===
[https://www.jblumenstock.com/files/papers/jblumenstock_itd2012.pdf Blumenstock] and [https://royalsocietypublishing.org/doi/full/10.1098/rsif.2012.0986 Wesolowski et al.] use mobile phone data to study patterns of internal migration, while [https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1001083 Bengtsson et al.] use it to track post-earthquake population movements in Haiti and sculpt better responses to disaster.  
[https://www.jblumenstock.com/files/papers/jblumenstock_itd2012.pdf Blumenstock] and [https://royalsocietypublishing.org/doi/full/10.1098/rsif.2012.0986 Wesolowski et al.] use [[Innovative Data Sources#Mobile Big Data|mobile phone data]] to study patterns of internal migration while [https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1001083 Bengtsson et al.] use it to track post-earthquake population movements in Haiti and sculpt better responses to disaster.  


===Poverty===
===Poverty===
[https://science.sciencemag.org/content/350/6264/1073 Blumenstock et al.] and 
[https://science.sciencemag.org/content/350/6264/1073 Blumenstock et al.] and 
[https://royalsocietypublishing.org/doi/full/10.1098/rsif.2016.0690 Steele et al.] use mobile phone data to map and predict poverty, considering to what extent certain variables created by telecom data are correlated with measures of poverty.
[https://royalsocietypublishing.org/doi/full/10.1098/rsif.2016.0690 Steele et al.] use [[Innovative Data Sources#Mobile Big Data|mobile data]] to map and predict poverty, considering to what extent certain '''variables''' created by '''telecom data''' are correlated with measures of poverty.


== Related Pages ==
== Related Pages ==
This article is part of the topic [[Secondary Data Sources]]
[[Special:WhatLinksHere/Telecom_Data|Click here to see pages related to this topic.]]


== Additional Resources ==
== Additional Resources ==


*DIME Analytics (World Bank), [https://osf.io/5e473 Acquiring Secondary Data]
*DIME Analytics (World Bank), [https://osf.io/rv4h5 Integrated Data Systems for Impact Evaluation]
* [https://netmob.org NetMob], the main conference on the analysis of mobile phone datasets
* [https://netmob.org NetMob], the main conference on the analysis of mobile phone datasets
* World Bank’s [https://pubdocs.worldbank.org/en/233361500582117345/2-Milusheva-Telecom-Presentation-Cross-Cutting-Session.pdf presentation,] “Using Telecom Data to Track Movement at High Spatial and Temporal Frequencies”
* World Bank, [https://pubdocs.worldbank.org/en/233361500582117345/2-Milusheva-Telecom-Presentation-Cross-Cutting-Session.pdf Using Telecom Data to Track Movement at High Spatial and Temporal Frequencies]
* [https://soundcloud.com/worldbank/between-2-geeks-episode-4-what-can-you-measure-with-cell-phone-metadata?in=worldbank/sets/between-2-geeks This podcast] from the World Bank discusses telecom data and its implications for data analysis and development.
* World Bank, [https://soundcloud.com/worldbank/between-2-geeks-episode-4-what-can-you-measure-with-cell-phone-metadata?in=worldbank/sets/between-2-geeks Podcast on telecom data, and implications for development research]
* Scientific Papers :
* Scientific Papers :
** Bengtsson, Linus et al. (2011). “Improved response to disasters and outbreaks by tracking population movements with mobile phone network data: a post-earthquake geospatial study in Haiti”. PLoS Med 8.8, e1001083.
** Bengtsson, Linus et al. (2011). “Improved response to disasters and outbreaks by tracking population movements with mobile phone network data: a post-earthquake geospatial study in Haiti”. PLoS Med 8.8, e1001083.

Latest revision as of 18:05, 9 August 2023

Telecom Data is a type of secondary data. This page introduces it, discusses factors to consider when working with it, and outlines different topics in which researchers have used it for analysis.

Read First

  • Secondary Data is data collected by any party other than the researcher that provides important context for any investigation into a particular intervention.
  • Telecom data is a powerful tool to use for analyses of health, mobility, poverty, and other topics in development.
  • Telecom data requires careful ethical handling in order to maintain the privacy of individuals.
  • When using telecom data you may face issues pertaining to size and barriers to acquisition.

About Telecom Data

Every time you use your phone to make a call or send a text message, your telecom operator records data on that transaction. This data does not include the specific contents of the call or text, but it provides metadata, which includes variables like the phone numbers of the people making and receiving the call, call length, the mobile phone tower associated with the call on either side (caller and receiver), and the type of mobile phone device used.

Although one of the main purposes of telecom data is to give operators the information they need to charge customers for their phone usage, researchers have started to use it for a wide array of analyses. Telecom data is always de-identified and anonymized when provided to researchers: the actual phone number associated with each record is removed and replaced by an anonymous ID. However, the same ID is often used for all calls and texts made by the same mobile phone, which allows researchers to infer a lot of information about individuals’ movements, social networks, and general phone usage.
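
The replacement of phone numbers by a consistent anonymous ID can be illustrated with a minimal Python sketch. It assumes a keyed hash (HMAC) as the pseudonymization method; the source does not say how operators actually generate IDs, so the key, the function name `pseudonymize`, and the toy records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the operator; never shared with researchers.
SECRET_KEY = b"operator-only-secret"


def pseudonymize(phone_number: str, key: bytes = SECRET_KEY) -> str:
    """Map a phone number to a stable anonymous ID using a keyed hash."""
    digest = hmac.new(key, phone_number.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Toy records (invented): caller, receiver, duration in seconds, tower ID.
raw_records = [
    ("+15550001", "+15550002", 120, "T1"),
    ("+15550001", "+15550003", 45, "T2"),
    ("+15550002", "+15550001", 300, "T1"),
]

# The same phone always maps to the same ID, so its records stay linkable.
anonymized = [
    (pseudonymize(caller), pseudonymize(receiver), duration, tower)
    for caller, receiver, duration, tower in raw_records
]
```

Because the mapping is deterministic, every record from one phone carries the same ID. That is what makes movement and network analysis possible, and it is also why such data can remain re-identifiable.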

Considerations for Use

Ethics

Telecom data is always anonymized when companies share it with researchers. However, the level of detail in the data can make de-anonymization possible. For example, using an anonymized dataset in which individuals’ hourly locations were specified at the antenna level, De Montjoye et al. uniquely identified 95% of individuals with just four spatio-temporal points per individual. This potential to de-anonymize data and track individuals’ movements raises acute concerns in sensitive political situations. As such, both operators and researchers are working to find ways to use telecom data in a manner compatible with personal privacy and high ethical standards.
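
The idea behind such a uniqueness test can be sketched in a few lines of Python. This is an illustrative simplification, not the procedure used by De Montjoye et al.: the function name `fraction_unique`, the trace format (a set of hour and antenna pairs per user), and the toy data are assumptions made for the example.

```python
import random


def fraction_unique(traces, k, seed=0):
    """Share of users whose k randomly drawn points match no other user's trace.

    traces maps an anonymous user ID to a set of (hour, antenna) points.
    """
    rng = random.Random(seed)
    eligible = 0
    unique = 0
    for points in traces.values():
        if len(points) < k:
            continue  # not enough points to draw a sample of size k
        eligible += 1
        sample = set(rng.sample(sorted(points), k))
        # Count every trace containing all sampled points (including the user's own).
        matches = sum(1 for other in traces.values() if sample <= other)
        if matches == 1:
            unique += 1
    return unique / eligible if eligible else 0.0


# Toy traces (invented): users "a" and "b" share two of their three points.
traces = {
    "a": {(8, "T1"), (12, "T2"), (18, "T1")},
    "b": {(8, "T1"), (12, "T2"), (18, "T3")},
    "c": {(8, "T4"), (12, "T5"), (18, "T6")},
}

share_with_three_points = fraction_unique(traces, k=3)
```

In this toy example every user is unique once all three points are known, while a single point may or may not single out users "a" and "b". Coarser temporal or spatial granularity makes points less distinctive, which lowers the share of unique users.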

Consider, for example, Orange’s Data for Development Challenge (D4D). In D4D’s second iteration in 2015, Orange created an External Ethics Panel to review proposals and entries through an ethical lens. The panel denied telecom data access to any proposals that contained questionable ethics and also reviewed ethics in ongoing research. D4D also carefully selected the temporal and spatial granularity of the released data in order to prevent de-anonymization. These types of efforts and careful review are critical to making telecom data available for research while simultaneously maintaining the privacy of telecom users.

Telecom Operators

As telecom data is proprietary data owned by telecom operators, obtaining access to it can be extremely difficult. In general, to obtain access to telecom data, the provider and the requestor must create and sign lengthy agreements that outline the use of the data and set conditions for its use. It can be difficult to convince providers to spend time and effort on these requests, thus limiting the use of telecom data in many contexts.

Some operators have, however, released snippets of their data, as in the case of D4D. In 2013, Orange released data from Côte d'Ivoire, and in 2015, Orange and Sonatel released data from Senegal. Other initiatives work to make telecom data more accessible to researchers. For example, the Open Algorithms Project (OPAL), an initiative launched in 2017, seeks to provide access to statistical information extracted from anonymized, secured, and formatted telecom data.

OPAL intends for open algorithms, accessed through an API, to run on OPAL servers at partner telecom companies. The data will thus not leave the partner companies, and researchers and policymakers will be able to obtain relevant, aggregated information from the telecom data. OPAL is just one example of the ways in which telecom data can be made more accessible to researchers and policymakers.
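
The general pattern of returning only aggregates while the raw records stay with the operator might look like the following Python sketch. It is not OPAL's API or one of its algorithms, which the text above does not describe; the function `users_per_tower`, the record format, and the suppression threshold `MIN_USERS` are hypothetical.

```python
from collections import defaultdict

MIN_USERS = 3  # hypothetical suppression threshold, chosen only for illustration


def users_per_tower(records, min_users=MIN_USERS):
    """Count distinct users per tower, dropping towers with too few users."""
    users = defaultdict(set)
    for user_id, tower in records:
        users[tower].add(user_id)
    return {
        tower: len(ids) for tower, ids in users.items() if len(ids) >= min_users
    }


# Toy records (invented): anonymous user ID and tower ID for each event.
records = [
    ("u1", "T1"), ("u2", "T1"), ("u3", "T1"), ("u1", "T1"),
    ("u4", "T2"), ("u5", "T2"),
]

print(users_per_tower(records))  # prints {'T1': 3}
```

Only the per-tower counts would reach the researcher. Tower T2 is suppressed because it has fewer than three distinct users, one simple way to reduce the risk that a small group is singled out.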

Size of the Data

As with other types of big data, telecom data can consist of millions or billions of observations, depending on the dataset provided. If the dataset is limited to millions of observations, you can manipulate and analyze it using software such as Python together with libraries written for statistical analysis. When the data consists of billions of observations, it will often be stored on a Hadoop cluster. Several software options exist for analyzing data on such a cluster, including Hive, which uses commands very similar to SQL; Pig; and Spark.
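
To give a sense of the SQL-style aggregation that a tool like Hive expresses, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in. SQLite is not Hive and runs on a single machine, so this only illustrates the shape of a query; the table name `calls`, its columns, and the rows are invented for the example.

```python
import sqlite3

# In-memory database standing in for a table of call records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (caller_id TEXT, tower_id TEXT, duration_s INTEGER)"
)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [("u1", "T1", 60), ("u1", "T2", 30), ("u2", "T1", 90), ("u3", "T1", 15)],
)

# Number of calls and total call time per tower.
rows = conn.execute(
    """
    SELECT tower_id, COUNT(*) AS n_calls, SUM(duration_s) AS total_s
    FROM calls
    GROUP BY tower_id
    ORDER BY tower_id
    """
).fetchall()
conn.close()

print(rows)  # prints [('T1', 3, 165), ('T2', 1, 30)]
```

On a cluster, a similar `GROUP BY` query would be distributed across many machines, which is what makes aggregation over billions of records feasible.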

Research using Telecom Data

Research using telecom data has grown tremendously in recent years. The following sections outline key pieces of current literature that use mobile data in analyses related to health, mobility, and poverty.

Health

An array of researchers have used telecom data to study the relationship between population mobility and the spread of disease:

  • Erbach-Schoenberg et al. (2016) study the impact of seasonally varying population numbers on disease incidence estimates.
  • Wesolowski et al. (2012) quantify the impact of human mobility on malaria.
  • Wesolowski et al. (2015a) look at the impact of human mobility on the emergence of dengue epidemics in Pakistan.
  • Wesolowski et al. (2015b) quantify seasonal population fluxes driving rubella transmission.
  • Turner-McGrievy and Tate take a closer look at mobile-based health solutions in a study on remotely delivered weight-loss interventions.

Mobility

Blumenstock and Wesolowski et al. use mobile phone data to study patterns of internal migration, while Bengtsson et al. use it to track post-earthquake population movements in Haiti and to inform better responses to disaster.

Poverty

Blumenstock et al. (2015) and Steele et al. (2017) use mobile data to map and predict poverty, examining the extent to which variables constructed from telecom data correlate with measures of poverty.

Related Pages

Click here to see pages related to this topic.

Additional Resources

  • DIME Analytics (World Bank), Acquiring Secondary Data
  • DIME Analytics (World Bank), Integrated Data Systems for Impact Evaluation
  • NetMob, the main conference on the analysis of mobile phone datasets
  • World Bank, Using Telecom Data to Track Movement at High Spatial and Temporal Frequencies
  • World Bank, Podcast on telecom data, and implications for development research
  • Scientific Papers :
    • Bengtsson, Linus et al. (2011). “Improved response to disasters and outbreaks by tracking population movements with mobile phone network data: a post-earthquake geospatial study in Haiti”. PLoS Med 8.8, e1001083.
    • Blumenstock, Joshua E (2012). “Inferring patterns of internal migration from mobile phone call records: Evidence from Rwanda”. Information Technology for Development 18.2, pp. 107–125.
    • Blumenstock, Joshua, Gabriel Cadamuro, and Robert On. (2015). "Predicting poverty and wealth from mobile phone metadata." Science 350.6264, pp. 1073-1076.
    • Blumenstock, Joshua E., Nathan Eagle, and Marcel Fafchamps. (2016). "Airtime transfers and mobile communications: Evidence in the aftermath of natural disasters." Journal of Development Economics 120, pp. 157-181.
    • De Montjoye, Yves-Alexandre, et al. (2013). "Unique in the crowd: The privacy bounds of human mobility." Scientific reports 3: pp. 1376.
    • Erbach-Schoenberg, Elisabeth Zu et al. (2016). “Dynamic denominators: the impact of seasonally varying population numbers on disease incidence estimates”. Population health Metrics 14.1, pp. 35.
    • Le Menach, Arnaud et al. (2011). “Travel risk, malaria importation and malaria transmission in Zanzibar”. Scientific reports 1.
    • Ruktanonchai, Nick W et al. (2016). “Identifying malaria transmission foci for elimination using human mobility data”. PLoS Comput Biol 12.4, e1004846.
    • Steele, Jessica E., et al. (2017). "Mapping poverty using mobile phone and satellite data." Journal of The Royal Society Interface 14.127: 20160690.
    • Tatem, Andrew J, Youliang Qiu, et al. (2009). “The use of mobile phone data for the estimation of the travel patterns and imported Plasmodium falciparum rates among Zanzibar residents”. Malaria Journal 8, pp. 287.
    • Tatem, Andrew J et al. (2014). “Integrating rapid risk mapping and mobile phone call record data for strategic malaria elimination planning”. Malaria Journal 13.1, pp. 52.
    • Wesolowski, Amy et al. (2012). “Quantifying the impact of human mobility on malaria”. Science 338.6104, pp. 267–270.
    • Wesolowski, Amy et al. (2013). “The use of census migration data to approximate human movement patterns across temporal scales”. PloS one 8.1, e52971.
    • Wesolowski, Amy et al. (2015a). “Impact of human mobility on the emergence of dengue epidemics in Pakistan”. Proceedings of the National Academy of Sciences 112.38, pp. 11887–11892.
    • Wesolowski, Amy et al. (2015b). “Quantifying seasonal population fluxes driving rubella transmission dynamics using mobile phone data”. Proceedings of the National Academy of Sciences 112.35, pp. 11114–11119.