Difference between revisions of "Administrative and Monitoring Data"

(10 intermediate revisions by 2 users not shown)
Line 1: Line 1:
While impact evaluations most commonly rely on [[Primary Data Collection|primary data]], [[Secondary Data Sources|secondary data]] can often provide important context for [[Randomized_Evaluations:_Principles_of_Study_Design|impact evaluation design]] and [[Data Analysis|data analysis]]. In some cases, for example '''administrative data''' from a program conducted in a district, '''secondary data''' is the only source which covers the relevant population for an impact evaluation. Similary, in some cases, '''monitoring data''' can help assess who received the [[Randomized_Evaluations:_Principles_of_Study_Design#Step_2:_Randomization|treatment]], and if this was as per the initial '''impact evaluation design'''.
While '''impact evaluations''' most commonly rely on [[Primary Data Collection|primary data]], [[Secondary Data Sources|secondary data]] can often provide important context for [[Randomized_Evaluations:_Principles_of_Study_Design|impact evaluation design]] and [[Data Analysis|data analysis]]. In some cases, for example administrative data from a program conducted in a district, '''secondary data''' is the only source which covers the relevant population for an '''impact evaluation'''. Similary, in some cases, monitoring data can help assess who received the [[Randomized_Evaluations:_Principles_of_Study_Design#Step_2:_Randomization|treatment]], and if this was as per the initial '''impact evaluation''' design.
== Read First ==
== Read First ==
* Impact Evaluations rely on many different [[Secondary Data Sources|sources of secondary data]] - '''administrative''', [[Geo_Spatial_Data|geospatial]], [[Remote Sensing|sensors]], [[Telecom Data|telecom]], and [[Crowd-sourced Data|crowd-sourcing]].  
* '''Impact evaluations''' rely on many different [[Secondary Data Sources|sources of secondary data]] - administrative, [[Geo_Spatial_Data|geospatial]], [[Remote Sensing|sensors]], [[Telecom Data|telecom]], and [[Crowd-sourced Data|crowd-sourcing]].  
* An important step in designing an impact evaluation is to evaluate which of the available data sources are best suited in a particular context.
* An important step in designing an '''impact evaluation''' is to evaluate which of the available data sources are best suited in a particular context.
* '''Administrative data''' is any data collected by national/local governments, ministries or agencies that are outside the context of an impact evaluation.  
* Administrative data is any data collected by national/local governments, ministries or agencies that are outside the context of an '''impact evaluation'''.  
* '''Monitoring data''' is data that is collected to track the implementation of [[Randomized_Evaluations:_Principles_of_Study_Design#Step_2:_Randomization|treatment]] in a given impact evaluation.
* Monitoring data is data that is collected to track the implementation of [[Randomized_Evaluations:_Principles_of_Study_Design#Step_2:_Randomization|treatment]] in a given '''impact evaluation'''.


== Administrative Data ==
== Administrative Data ==
[[Administrative Data|Administrative data]] is any data collected by national/ local governments, ministries or government agencies that are outside the context of an impact evaluation. '''Administrative data''' can include data from land registries, road networks, infrastructure investments, tax, energy billing, or social transfers.  
Administrative Data is any data collected by national or local governments, ministries or government agencies that are outside the context of an '''impact evaluation'''. Administrative data can include data from land registries, road networks, infrastructure investments, tax, energy billing, or social transfers.  


=== Overview ===
=== Overview ===
Generally, '''administrative data''' is collected to document or track beneficiaries of a government policy and the general population, and not for research purposes. [[Impact Evaluation Team|Research teams]] should aim to use administrative data in addition to other sources of data - [[Primary Data Collection|survey data]], [[Geo_Spatial_Data|geospatial]], [[Remote Sensing|sensors]], [[Telecom Data|telecom]], and [[Crowd-sourced Data|crowd-sourcing]]. This allows '''research teams''' to create sector-specific and country-specific outputs (such as  data sets, maps, and figures) that are relevant to a particular policy context.
Generally, administrative data is collected to [[Data Documentation|document]] or track beneficiaries of a government policy and the general population, and not for research purposes. [[Impact Evaluation Team|Research teams]] should aim to use administrative data in addition to other sources of data - [[Primary Data Collection|survey data]], [[Geo_Spatial_Data|geospatial data]], [[Remote Sensing|remote sensing data]], [[Telecom Data|telecom data]], and [[Crowd-sourced Data|crowd-sourced data]]. This allows '''research teams''' to create sector-specific and country-specific outputs (such as  [[Master Dataset|datasets]], maps, and figures) that are relevant to a particular policy context.


=== Case Study ==  
=== Case Study ===
In this section, we look at an example of a project by [https://www.worldbank.org/en/research/dime DIME] in Kenya where the [[Impact Evaluation Team|research team]] digitized '''administrative data''' to fill gaps in available data on road safety in Kenya.  
In this section, we look at an example of a project by [https://www.worldbank.org/en/research/dime DIME] in Kenya where the [[Impact Evaluation Team|research team]] digitized administrative data to fill in the gaps available data on road safety in Kenya.  


In this impact evaluation, the '''research team''' obtained '''administrative data''' through a [[Data License Agreement|data sharing agreement]] with the National Police Service (NPS) in Kenya, and manually digitized a total of 12,546 crash records for the city of Nairobi over a nine-year period. This data allowed the team to identify the major crash '''hot spots''', that is, regions with the highest number of road crashes. The research team combined this data with [[Crowd-sourced data|crowdsourced data]] to supplement these records. Further, the research team also accessed private sector data on speed road events, weather conditions, and land use by utilizing the [https://www.worldbank.org/en/programs/digital-development-partnership World Bank Development Data Partnership (DDP) initiative]. The administrative data was then combined with [[Primary Data Collection|primary data]] collected from 200 hot spots, which allowed the research team to generate more than 100 new variables that determin high-risk locations.
In this '''impact evaluation''', the '''research team''' obtained administrative data through a [[Data License Agreement|data sharing agreement]] with the National Police Service (NPS) in Kenya, and manually digitized a total of 12,546 crash records for the city of Nairobi over a nine-year period. This data allowed the team to identify the major crash hot spots, that is, regions with the highest number of road crashes. The '''research team''' combined this data with [[Crowd-sourced data|crowdsourced data]] to supplement these records. Further, the '''research team''' also accessed private sector data on speed road events, weather conditions, and land use by utilizing the [https://www.worldbank.org/en/programs/digital-development-partnership the World Bank Development Data Partnership (DDP) initiative]. The administrative data was then combined with [[Primary Data Collection|primary data]] collected from 200 hot spots, which allowed the '''research team''' to generate more than 100 new '''variables''' that determine high-risk locations.


Therefore, in this case study, integrating multiple sources into one data set provided unique insights into the factors that lead to more crashes in specific locations. It also allowed the research team to break down a bigger problem into a more manageable research question. For instance, it is now clear that just 200 of the 1,400 crash sites across the city are responsible for over half of road traffic deaths. This in turn means that the government should target 150 kilometers of the total 6,200-kilometer road network for road-safety interventions.
Therefore, in this case study, combining multiple [[Master Dataset|datasets]] allowed the '''research team''' to break down a big problem into a more manageable research question. For instance, it is now clear that just 200 of the 1,400 crash sites across the city are responsible for over half of road traffic deaths. This in turn means that the government should target 150 kilometers of the total 6,200-kilometer road network for road-safety interventions.


=== Advantages ===
=== Advantages ===
Using '''administrative data''' has various advantages for research teams. Some of these are as follows:  
Using administrative data has various advantages for [[Impact Evaluation Team|research teams]]. Some of these are as follows:  
* '''Quality:''' It is often more accurate, and therefore of better [[Data Quality Assurance|quality]] than self-reported [[Primary Data Collection|survey data]]. For example, a firm is more likely to accurately report profits to their country's official financial auditors than to a research team.
 
* '''Quality:''' It is often more accurate, and therefore of better [[Data Quality Assurance|quality]] than self-reported [[Primary Data Collection|survey data]]. For example, a firm is more likely to accurately report profits to their country's official financial auditors than to a '''research team'''.
* '''Cost:''' It is often less expensive to collect or acquire, since it does not involve the various steps involved in conducting [[Field Surveys|field surveys]]. Note that there might still be some costs involved in obtaining access to the data through a [[Data License Agreement|data licensing agreement (DLA)]].
* '''Cost:''' It is often less expensive to collect or acquire, since it does not involve the various steps involved in conducting [[Field Surveys|field surveys]]. Note that there might still be some costs involved in obtaining access to the data through a [[Data License Agreement|data licensing agreement (DLA)]].
* '''Time:''' Using administrative data also saves time since this data has already been collected for a purpose outside of the context of an impact evaluation. For example, in the Kenya case study in the previous section, road crash data from Kenya's National Police Service (NPS) had already been collected over a nine year period. In this case, the research team only had to wait until the '''data license agreement (DLA)''' was carried out, which was much less than the time it would have taken to conduct a '''field survey''' from scratch.
* '''Time:''' Using administrative data also saves time since this data has already been collected for a purpose outside of the context of an '''impact evaluation'''. For example, in the Kenya case study in the previous section, road crash data from Kenya's National Police Service (NPS) had already been collected over a nine year period. In this case, the '''research team''' only had to wait until the '''DLA''' was carried out, which was much less than the time it would have taken to conduct a '''field survey''' from scratch.
* '''Frequency:''' It is also collected on a regular basis. This allows research teams to evaluate past interventions even if no '''primary data''' was collected.
* '''Frequency:''' It is also collected on a regular basis. This allows '''research teams''' to evaluate past interventions even if no '''primary data''' was collected.
* '''Policy impact:''' Most importantly, as the Kenya case study showed, administrative data can hugely improve the ability of research teams to improve the efficiency of interventions by making them more targeted.
* '''Policy impact:''' Most importantly, as the Kenya case study showed, administrative data can hugely improve the ability of '''research teams''' to improve the efficiency of interventions by making them more targeted.
 
=== Challenges ===
=== Challenges ===
However, it is important to note that '''administrative data''' also has its list of challenges. Some of these include:
However, it is important to note that administrative data also has its list of challenges. Some of these include:
 
* '''Access:''' Accessing administrative data requires strong relationships with national and/or local authorities. In some cases, authorities may not agree to share the information.
* '''Access:''' Accessing administrative data requires strong relationships with national and/or local authorities. In some cases, authorities may not agree to share the information.
* '''Merging:''' After obtaining access, the research team must combine the administrative data with data from other sources. This often involves merging different datasets together, which can be tricky if there are no common [[ID_Variable_Properties#Property_1:_Uniquely_Identifying|unique IDs]].  
* '''Merging:''' After obtaining access, the [[Impact Evaluation Team|research team]] must combine the administrative data with data from other sources. This often involves merging different [[Master Dataset|datasets]] together, which can be tricky if there are no common [[ID_Variable_Properties#Property_1:_Uniquely_Identifying|unique IDs]].  
* '''Quality:'''  Finally, research teams should keep in mind that in some cases, administrative data may be badly reported, incomplete, or not available at all. This is because not all governments have the same capacity to accurately collect this information on a regular basis.
* '''Quality:'''  Finally, research teams should keep in mind that in some cases, administrative data may be badly reported, incomplete, or not available at all. This is because not all governments have the same capacity to accurately collect this information on a regular basis.


== Monitoring Data ==
== Monitoring Data ==


'''Monitoring data''' is collected to understand the implementation of the assigned treatment in the field. Typically, survey round data helps us understand changes in the outcome variables throughout the duration of the project, and monitoring data helps us understand how these changes are related to the intervention of our treatment. For example, monitoring data could be data on who actually received the treatment and if the treatment was implemented according to the research design. Our analysis might be invalid if we do not have this information and base our analysis only on what was meant by the research team to happen. Monitor data helps us understand what is usually referred to as [https://en.wikipedia.org/wiki/Internal_validity internal validity].
Monitoring data is collected to understand the implementation of the '''assigned treatment''' in the field. Typically, [[Survey Pilot|survey]] round data helps us understand changes in the outcome '''variables''' throughout the duration of the project, and monitoring data helps us understand how these changes are related to the intervention of our '''treatment'''. For example, monitoring data could be data on who actually received the '''treatment''' and if it was implemented according to the research design. Our [[Data Analysis|analysis]] might be invalid if we do not have this information and base our '''analysis''' only on what was meant by the [[Impact Evaluation Team|research team]] to happen. Monitoring data helps us understand what is usually referred to as [https://en.wikipedia.org/wiki/Internal_validity internal validity].


== Back to Parent ==
== Related Pages ==
This article is part of the topic [[Secondary Data Sources]]
[[Special:WhatLinksHere/Administrative_and_Monitoring_Data|Click here for pages that link to this topic.]]


== Additional Resources ==
== Additional Resources ==
Please include here links to relevant existing resources outside of the wiki
* Arianna Legovini and Maria Ruth Jones (World Bank), [https://admindatahandbook.mit.edu/book/v1.0-rc6/dime.html Administrative Data in Research at the World Bank: The Case of Development Impact Evaluation (DIME)]
 
* J-PAL, [https://admindatahandbook.mit.edu/book/v1.0-rc6/index.html Handbook on Using Administrative Data for Research and Evidence-based Policy]
[[Category: Secondary Data Sources ]]
[[Category: Secondary Data Sources ]]

Latest revision as of 17:14, 14 August 2023

While impact evaluations most commonly rely on primary data, secondary data can often provide important context for impact evaluation design and data analysis. In some cases, for example administrative data from a program conducted in a district, secondary data is the only source that covers the relevant population for an impact evaluation. Similarly, in some cases, monitoring data can help assess who received the treatment, and whether this followed the initial impact evaluation design.

Read First

  • Impact evaluations rely on many different sources of secondary data: administrative, geospatial, remote sensing, telecom, and crowd-sourced data.
  • An important step in designing an impact evaluation is to assess which of the available data sources is best suited to a particular context.
  • Administrative data is any data collected by national or local governments, ministries, or agencies outside the context of an impact evaluation.
  • Monitoring data is data that is collected to track the implementation of treatment in a given impact evaluation.

Administrative Data

Administrative data is any data collected by national or local governments, ministries, or government agencies outside the context of an impact evaluation. Administrative data can include data from land registries, road networks, infrastructure investments, tax, energy billing, or social transfers.

Overview

Generally, administrative data is collected to document or track beneficiaries of a government policy and the general population, and not for research purposes. Research teams should aim to use administrative data in addition to other sources of data - survey data, geospatial data, remote sensing data, telecom data, and crowd-sourced data. This allows research teams to create sector-specific and country-specific outputs (such as datasets, maps, and figures) that are relevant to a particular policy context.

Case Study

In this section, we look at an example of a project by DIME in Kenya where the research team digitized administrative data to fill gaps in the available data on road safety.

In this impact evaluation, the research team obtained administrative data through a data sharing agreement with the National Police Service (NPS) in Kenya, and manually digitized a total of 12,546 crash records for the city of Nairobi over a nine-year period. This data allowed the team to identify the major crash hot spots, that is, regions with the highest number of road crashes. The research team combined this data with crowdsourced data to supplement these records. Further, the research team also accessed private sector data on speed road events, weather conditions, and land use by utilizing the World Bank Development Data Partnership (DDP) initiative. The administrative data was then combined with primary data collected from 200 hot spots, which allowed the research team to generate more than 100 new variables that determine high-risk locations.

Therefore, in this case study, combining multiple datasets allowed the research team to break down a big problem into a more manageable research question. For instance, it is now clear that just 200 of the 1,400 crash sites across the city are responsible for over half of road traffic deaths. This in turn means that the government should target 150 kilometers of the total 6,200-kilometer road network for road-safety interventions.
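
The concentration figures above can be reproduced with simple arithmetic, and the underlying idea (rank sites by deaths and find the smallest set that accounts for a given share) can be sketched in a few lines of Python. This is only an illustration and not the research team's method: the function name `smallest_set_covering` and the toy death counts are hypothetical, and only the totals (200 of 1,400 sites, 150 of 6,200 kilometers) come from the case study.

```python
def smallest_set_covering(deaths_by_site, share=0.5):
    """Return the fewest sites, taken in descending order of deaths,
    whose combined deaths reach `share` of the total."""
    total = sum(deaths_by_site.values())
    ranked = sorted(deaths_by_site.items(), key=lambda kv: kv[1], reverse=True)
    chosen, running = [], 0
    for site, deaths in ranked:
        if running >= share * total:
            break  # the target share is already covered
        chosen.append(site)
        running += deaths
    return chosen


# Shares implied by the figures quoted in the case study.
site_share = 200 / 1400   # hot spots as a fraction of all crash sites
road_share = 150 / 6200   # targeted kilometers as a fraction of the network

# Hypothetical toy data, for illustration only.
toy_deaths = {"A": 10, "B": 5, "C": 3, "D": 2}
hot_spots = smallest_set_covering(toy_deaths)

print(f"Hot spots are {site_share:.1%} of crash sites")
print(f"Targeted roads are {road_share:.1%} of the network")
print("Toy hot spots:", hot_spots)
```

In other words, roughly one in seven crash sites and a little over 2 percent of the road network account for the bulk of the problem, which is what makes a targeted intervention feasible.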

Advantages

Using administrative data has various advantages for research teams. Some of these are as follows:

  • Quality: It is often more accurate, and therefore of better quality, than self-reported survey data. For example, a firm is more likely to accurately report profits to its country's official financial auditors than to a research team.
  • Cost: It is often less expensive to collect or acquire, since it does not involve the various steps involved in conducting field surveys. Note that there might still be some costs involved in obtaining access to the data through a data licensing agreement (DLA).
  • Time: Using administrative data also saves time since this data has already been collected for a purpose outside of the context of an impact evaluation. For example, in the Kenya case study in the previous section, road crash data from Kenya's National Police Service (NPS) had already been collected over a nine-year period. In this case, the research team only had to wait until the DLA was in place, which took much less time than conducting a field survey from scratch.
  • Frequency: It is also collected on a regular basis. This allows research teams to evaluate past interventions even if no primary data was collected.
  • Policy impact: Most importantly, as the Kenya case study showed, administrative data can greatly strengthen the ability of research teams to improve the efficiency of interventions by making them more targeted.

Challenges

However, administrative data also poses challenges. Some of these include:

  • Access: Accessing administrative data requires strong relationships with national and/or local authorities. In some cases, authorities may not agree to share the information.
  • Merging: After obtaining access, the research team must combine the administrative data with data from other sources. This often involves merging different datasets together, which can be tricky if there are no common unique IDs.
  • Quality: Finally, research teams should keep in mind that in some cases, administrative data may be badly reported, incomplete, or not available at all. This is because not all governments have the same capacity to accurately collect this information on a regular basis.
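
The merging challenge above can be made concrete with a small check that runs before any analysis: confirm that the ID is unique in each dataset, then report which records fail to match. The sketch below uses plain Python dictionaries; the field name `site_id` and the helper names are assumptions made for illustration, not part of any dataset described on this page.

```python
from collections import Counter


def duplicate_ids(records, id_key):
    """Return the sorted IDs that appear more than once in `records`."""
    counts = Counter(rec[id_key] for rec in records)
    return sorted(i for i, n in counts.items() if n > 1)


def merge_on_id(admin_records, survey_records, id_key):
    """One-to-one merge on `id_key`.

    Returns (merged, admin_only_ids, survey_only_ids) and raises
    ValueError if the ID is not unique in either dataset.
    """
    dups = duplicate_ids(admin_records, id_key) + duplicate_ids(survey_records, id_key)
    if dups:
        raise ValueError(f"{id_key} does not uniquely identify records: {dups}")

    survey_by_id = {rec[id_key]: rec for rec in survey_records}
    merged, admin_only = [], []
    for rec in admin_records:
        match = survey_by_id.pop(rec[id_key], None)
        if match is None:
            admin_only.append(rec[id_key])
        else:
            merged.append({**rec, **match})
    # Whatever is left in survey_by_id never matched an admin record.
    return merged, admin_only, sorted(survey_by_id)


# Hypothetical toy records, for illustration only.
admin = [{"site_id": "S1", "crashes": 4}, {"site_id": "S2", "crashes": 1}]
survey = [{"site_id": "S1", "lanes": 2}, {"site_id": "S3", "lanes": 4}]

merged, admin_only, survey_only = merge_on_id(admin, survey, "site_id")
print(merged, admin_only, survey_only)
```

Reporting the unmatched IDs on both sides, rather than silently dropping them, makes it easier to tell whether a poor match rate comes from missing records or from inconsistent ID formats.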

Monitoring Data

Monitoring data is collected to understand the implementation of the assigned treatment in the field. Typically, survey round data helps us understand changes in the outcome variables throughout the duration of the project, and monitoring data helps us understand how these changes are related to the treatment. For example, monitoring data could be data on who actually received the treatment and whether it was implemented according to the research design. Our analysis might be invalid if we do not have this information and base it only on what the research team intended to happen. Monitoring data helps us assess what is usually referred to as internal validity.
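
As a minimal sketch of how monitoring data might be compared against the research design, the Python snippet below contrasts assigned treatment status with treatment actually received. The function name, the dictionary layout, and the unit labels are all hypothetical, and a real project would typically need more detail (for example, dosage or timing of delivery).

```python
def compliance_summary(assigned, received):
    """Compare assigned treatment with what monitoring data says was received.

    assigned: dict mapping unit ID -> True if assigned to treatment.
    received: dict mapping unit ID -> True if monitoring recorded treatment.
    Units missing from `received` are counted as not having received treatment.
    """
    treated = [u for u, a in assigned.items() if a]
    control = [u for u, a in assigned.items() if not a]
    took_up = sum(1 for u in treated if received.get(u, False))
    contaminated = sum(1 for u in control if received.get(u, False))
    return {
        "take_up_rate": took_up / len(treated) if treated else None,
        "control_contamination_rate": contaminated / len(control) if control else None,
        "mismatched_units": sorted(
            u for u, a in assigned.items() if received.get(u, False) != a
        ),
    }


# Hypothetical toy data, for illustration only.
assigned = {"v1": True, "v2": True, "v3": False, "v4": False}
received = {"v1": True, "v2": False, "v3": False, "v4": True}

summary = compliance_summary(assigned, received)
print(summary)
```

A low take-up rate or a high contamination rate signals that an analysis based only on assignment may not reflect what happened in the field.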

Related Pages

Click here for pages that link to this topic.

Additional Resources