Monitoring Data Quality
Ensuring high data quality during primary data collection involves anticipating everything that can go wrong, and preparing a comprehensive data quality assurance plan to handle these issues. Monitoring data quality from the field is an important part of this broader data quality assurance plan, and involves the following: communication and reporting, field monitoring, minimizing attrition, and real-time data quality checks. Each of these steps allows the research team to identify and correct issues by using feedback from multiple rounds of piloting, re-training enumerators accordingly, and reviewing and re-drafting protocols for efficient field management.
Read First
- This page covers the various aspects of monitoring data quality in the field, and is part of the wider data quality assurance plan.
- The research team should discuss steps for monitoring data quality as part of field management.
- The field teams, comprising field coordinators (FCs), supervisors, and field managers, should regularly monitor data quality during field data collection.
- The field teams should communicate clearly with other members of the team, as well as with enumerators, and focus on minimizing attrition.
- Attrition occurs when some study participants leave in the middle of an impact evaluation.
- Run daily data quality checks, including back checks, high frequency checks and identity checks.
- Testing for all different issues that may arise in a survey can be time-consuming. Identify and focus on the issues that are most likely to affect the results.
Communication and Reporting
During data collection, the field teams are responsible for monitoring data quality in the field. It is therefore important to ensure proper communication, both within the team and with the respondents. The structure of the field team is given below (see Figure 1):
- Field coordinators (FCs): The field coordinators (FCs) oversee the overall process of field data collection. They receive feedback from the supervisors and suggest ways to make necessary changes in the process of data collection. They also share survey progress reports and observations with supervisors.
- Survey coordinators: The survey coordinators coordinate with the supervisors and data quality checkers on a regular basis to review the quality of data that is collected.
- Data quality checkers: The data quality checkers carry out regular checks, such as back checks and high frequency checks to identify problems in the data that is shared by enumerators. If there are any issues in the collected data, they share the feedback with the survey coordinators.
- Supervisors: The supervisors monitor the work of the enumerators and communicate with them to resolve any issues that the enumerators might be facing. They share this feedback with the FCs.
- Enumerators: The enumerators are responsible for asking the questions to the respondents through interviews. These interviews can be in the form of computer-assisted personal interviews (CAPI), computer-assisted telephone interviews (CATI), or pen-and-paper personal interviews (PAPI).
Keep these guidelines in mind to ensure proper communication and reporting during data collection:
- Set up a good feedback mechanism. The field teams should meet at the end of every day to share experiences and challenges faced. Use an instant-messaging platform for communication within the team. The field team can store these conversations in a shared folder online, so that everyone can revisit the discussions later.
- Account for connectivity issues. It might not always be possible to share completed survey forms, or receive feedback based on high frequency checks (HFCs). Train supervisors so they can handle most of the common issues without consulting the field coordinator (FC). However, make sure that the gap between two feedback sessions does not exceed 48 hours.
- Communicate effectively with respondents. At the same time, train enumerators to communicate and interact with the respondent. The supervisor should be present in a few interviews to ensure that the enumerator is able to communicate all aspects of the study to the respondent, and is able to resolve all concerns of the respondent.
- Be mindful of translation issues. Incorrect translation can cause respondents to incorrectly interpret questions, which can affect the quality of data. Make use of local implementing partners for help in translating technical terms. Use recordings during the training sessions to help enumerators become comfortable with the translated version of the instrument.
Field Monitoring
As part of monitoring data quality in the field, the field team should also set up a mechanism for monitoring various aspects of the data collection. There should be a clear understanding of the team structure, and every member of the field team should know their role in the team. Field monitoring has the following components:
- Field coordinator
- Supervisors
- Survey progress
- Data quality checkers
- Review completed forms
Monitoring: Field coordinators
The role of the field coordinator (FC) is to carefully oversee the work of the field teams, and monitor them in the following ways:
- Regular feedback: Provide useful feedback once the pre-decided number of interviews for the day is completed, not while the interviews are in progress.
- Clear understanding of questions: From the beginning of the data collection, regularly check in with the field team to see if questions are understood correctly by the respondents. In this case, proper and regular communication with the field team is very important.
- Reporting system: Ensure that proper reporting systems are set up, and that the supervisors and data quality checkers clearly understand the system. Ensure that all members know who their first point-of-contact is in the team.
- Staff motivation: Maintain a good relationship with everyone involved in the data collection, and keep the team motivated.
Monitoring: Supervisors
Supervisors are experienced enumerators who play an important role in monitoring the work of the enumerators. As part of this process, they perform the following tasks:
- Logistics: They organize and finalize various logistics of the data collection process, such as work plan, transport, equipment, and accommodation for the field staff.
- Link with local authorities: By introducing enumerators to local authorities, they act as a link between the enumerators and local authorities. This also improves the quality of data by establishing a comfortable working relationship between different groups involved in the data collection.
- Monitor enumerators: They monitor the work of the enumerators by observing interviews, and helping them improve their interviewing skills. Supervisors can also write their observations about each enumerator on an enumerator observation form, and share it with the field coordinator. Figure 2 below gives an example of such a form.
Monitoring: Survey progress
It is also important to track the progress of the survey. The sign of a good reporting system is that the list of people who have been interviewed matches the list of respondents who were originally selected for the interviews. In some cases, enumerators forget to share survey forms, or do not complete the survey forms, or the field team forgets to replace respondents (in cases where they were not available). Missing data can affect the data quality, which can then affect the conclusions of an impact evaluation. Therefore, the field team should take the following steps:
- Good reporting system: Create a good reporting system that involves regular sharing of completed survey forms. Compare the list of completed survey forms with the information on respondents who were selected for participation in the survey.
- Updated list of respondents: The supervisor should always keep an updated list of respondents.
- Survey log: At the same time, the field coordinator (FC) should keep a survey log which contains a list of completed survey forms. Do not share this log with supervisors.
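The comparison between the list of selected respondents and the survey log can be sketched in a few lines. This is a minimal illustration in Python rather than the team's actual tooling; the function name and the household IDs are hypothetical.

```python
def reconcile_survey_log(selected_ids, completed_ids):
    """Compare the list of selected respondents with the log of completed forms."""
    selected = set(selected_ids)
    completed = set(completed_ids)
    return {
        # Selected for the survey, but no completed form has been shared yet.
        "pending": sorted(selected - completed),
        # A form was shared for an ID that is not on the respondent list,
        # for example an unrecorded replacement or a mistyped ID.
        "unexpected": sorted(completed - selected),
    }


# Hypothetical IDs, for illustration only.
report = reconcile_survey_log(
    selected_ids=["H001", "H002", "H003", "H004"],
    completed_ids=["H001", "H003", "H009"],
)
print(report)
```

An empty "pending" list and an empty "unexpected" list together indicate that the interviewed households match the selected ones.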
Monitoring: Data quality checkers
Data quality checkers perform regular checks, such as back checks, and high frequency checks, on completed survey forms shared by enumerators. It is important to monitor the data quality checkers in the following ways:
- Report concerns: Report any concerns about data quality to the data quality checkers, for instance, errors and inconsistencies found in the data.
- Peer review: Ideally, there should be a separate team of data quality checkers and auditors. This team should edit and review the survey forms before they are submitted. They can also analyze the errors and correction sheets that are shared by the data quality checkers. Use this team to perform spot checks and back checks, and observe interviews.
- Dashboard: Set up a dashboard with results based on data checks. This creates a continuous feedback process for enumerators, and improves their accountability. Further, by making the process more transparent, it boosts motivation of the enumerators.
Monitoring: Review completed forms
Reviewing completed survey forms is another crucial part of improving data quality. The supervisors, enumerators, or even data quality checkers can conduct the form review. Follow these guidelines for this step:
- SurveyCTO forms: If the survey form was programmed in SurveyCTO, the "Go to prompt" option can list all the questions of the form on a tablet.
- Prioritize: Do not review all questions. Identify and prioritize key questions, for example, questions that determine the number of repeat groups, like household size, number of plots, number of crops, and so on.
- Structure: In this step, also review the structure of the form. Often, key questions can be found within nested repeat groups, for example, crop sales for a plot in a particular season of the year. If the form structure is too complex, it becomes hard for supervisors to locate key questions, so flag these concerns during this step.
- iefieldkit: DIME Analytics has also created the Stata package iefieldkit. This package allows members of the field team who do not specialize in code tools to understand and review the various tasks involved in data management and data cleaning. Two commands in iefieldkit are specifically designed to help field teams test for, and resolve, duplicate entries in the dataset: ieduplicates, which identifies duplicate values, and iecompdup, which resolves duplicate values.
Note: Duplicate values can bias data quality checks like back checks and high frequency checks (HFCs). Therefore, resolve duplicate values before the survey forms are sent for these checks.
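To make the logic of a duplicate check concrete, the sketch below flags IDs that appear more than once and lists the fields on which two duplicate submissions disagree. It is a plain Python illustration of the idea, not the iefieldkit implementation; the field names (hhid, enumerator, hh_size) and the sample rows are hypothetical.

```python
from collections import defaultdict


def find_duplicate_ids(submissions, id_field="hhid"):
    """Return {id: [rows]} for every ID that appears in more than one submission."""
    groups = defaultdict(list)
    for row in submissions:
        groups[row[id_field]].append(row)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


def differing_fields(row_a, row_b):
    """List the fields on which two submissions with the same ID disagree."""
    all_fields = row_a.keys() | row_b.keys()
    return sorted(f for f in all_fields if row_a.get(f) != row_b.get(f))


# Hypothetical submissions, for illustration only.
submissions = [
    {"hhid": "H001", "enumerator": "E1", "hh_size": 4},
    {"hhid": "H002", "enumerator": "E2", "hh_size": 6},
    {"hhid": "H001", "enumerator": "E3", "hh_size": 5},
]

duplicates = find_duplicate_ids(submissions)
for hhid, rows in duplicates.items():
    print(hhid, differing_fields(rows[0], rows[1]))
```

Seeing which fields differ helps the field team decide whether a duplicate is a double submission of the same interview or two different households recorded under one ID.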
Minimizing Attrition
Real-Time Data Quality Checks
When data quality checks are run in real time, the enumerator will still remember the interview should any questions arise. Further, she/he can likely return to the respondent if necessary. These checks serve as additional enumerator support mechanisms that allow team members and enumerators to notice data discrepancies as they arise and resolve the issue(s) immediately.
The best time to design and code the back checks and high frequency checks is in parallel with the design and programming of the questionnaire. Data quality checks may omit important tests or be irrelevant if not written in parallel with the questionnaire.
Back Checks
Back checks, also known as survey audits, are a quality control method implemented to verify the quality and legitimacy of key data collected during a survey. Throughout the course of fieldwork, a back-check team returns to a randomly-selected subset of households for which data has been collected. The back-check team re-interviews these respondents with a short subset of survey questions, otherwise known as a back-check survey. For more information on how to collect and analyze data via back checks, see Back Checks.
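The random selection of households for re-interview can be sketched as follows. This is a minimal Python illustration under stated assumptions: the 10 percent share and the seed value are hypothetical choices, not recommendations from this page, and the function name is invented for the example.

```python
import random


def draw_back_check_sample(completed_ids, share=0.10, seed=12345):
    """Randomly select a share of completed interviews for a back-check visit."""
    ids = sorted(set(completed_ids))  # sort first so the draw is reproducible
    if not ids:
        return []
    n = max(1, round(len(ids) * share))
    rng = random.Random(seed)  # a fixed seed makes the selection auditable
    return sorted(rng.sample(ids, n))


# Hypothetical household IDs H001 to H050.
completed = [f"H{i:03d}" for i in range(1, 51)]
sample = draw_back_check_sample(completed, share=0.10)
print(len(sample), sample)
```

Fixing the seed means anyone on the team can re-run the draw and confirm that the households were chosen at random rather than by hand.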
High Frequency Checks
Research teams should run high frequency checks (HFCs) from the office daily. Prepare the HFC code in Stata or R once the questionnaire is finalized but before it goes to the field. You should also prepare instructions for the HFCs in case someone else needs to run them while you are in the field and/or without internet connectivity. During data collection, download the data and run the HFCs daily to report flags. This should be a one-click process. Within the HFCs, include four main types of checks: response quality checks, programming checks, enumerator checks, and duplicate/survey log checks.
Response Quality Checks
Response quality checks monitor the consistency of responses across the survey instrument and the range within which the responses fall.
- Consistency of responses across the survey instrument: most consistency tests can and should be built into the questionnaire programming via logic and constraints. However, some checks may be overly complex to program in the survey instrument, particularly when comparing responses across rosters or dealing with multiple units. For example, imagine we ask about plot and harvest size and allow the respondent to answer in the unit of his/her choice. In order to test if the harvest in terms of kilos per hectare is plausible, we need to convert harvest and plot size to kilos and hectares, which may be challenging to program within the questionnaire itself. As a rule of thumb, program as many checks as possible into the survey instrument. Then include the rest in the HFC do file or script.
- Reasonable ranges of responses: while range checks should always be programmed into the survey instrument, questionnaires typically employ 'soft' constraints (i.e. warnings that tell enumerators the response is unusual but still allow them to continue). Thus, HFC data checks should include checks for extreme values and outliers and confirm whether they make sense in context. Data checks should also check the range for constructed indicators; multiplication or division can create or expose outliers even when the numerator and denominator are reasonable. For example, say a household reported a plot size of 0.05 hectares (the low end of an acceptable range) and produced 1000kg of maize (within an acceptable range): the yield for the plot would be 20,000kg/ha. This is an extreme outlier.
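The unit conversion and the constructed-indicator check described above can be sketched as follows. This is an illustrative Python version, not an HFC template: the plausibility ceiling is a hypothetical value, and any local units would need project-specific conversion factors.

```python
# Conversion factors to kilos and hectares. The metric and acre factors are
# standard; local units (bags, local plot measures) would need factors
# agreed with the field team.
KG_PER_UNIT = {"kg": 1.0, "tonne": 1000.0}
HA_PER_UNIT = {"ha": 1.0, "acre": 0.404686}

# Hypothetical ceiling for a plausible yield; set it per crop and region.
MAX_PLAUSIBLE_YIELD_KG_HA = 10_000.0


def yield_kg_per_ha(harvest, harvest_unit, plot_size, plot_unit):
    """Convert harvest and plot size to kilos and hectares, then divide."""
    kilos = harvest * KG_PER_UNIT[harvest_unit]
    hectares = plot_size * HA_PER_UNIT[plot_unit]
    return kilos / hectares


def is_yield_outlier(harvest, harvest_unit, plot_size, plot_unit):
    """Flag a constructed yield that exceeds the plausibility ceiling."""
    value = yield_kg_per_ha(harvest, harvest_unit, plot_size, plot_unit)
    return value > MAX_PLAUSIBLE_YIELD_KG_HA


# The example from the text: 1000 kg of maize on a 0.05 hectare plot.
maize_yield = yield_kg_per_ha(1000, "kg", 0.05, "ha")
print(round(maize_yield))  # about 20,000 kg/ha
print(is_yield_outlier(1000, "kg", 0.05, "ha"))
```

Both inputs pass their own range checks, yet the ratio is flagged, which is why the constructed indicator needs its own range check.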
Programming Checks
Programming checks help the research team to understand if they have designed and programmed the questionnaire properly. Most programming errors should be caught when testing the questionnaire, but it is impossible to test all possible outcomes before data collection. Including programming checks in the HFC is especially important when the team has made last-minute edits to the survey instrument.
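One common kind of programming check tests whether skip logic behaved as intended: a follow-up question should have an answer exactly when its gating question triggered it. The sketch below illustrates that idea in Python; the variable names (grew_maize, maize_harvest_kg) are hypothetical, and a real HFC would cover each skip pattern in the instrument.

```python
def skip_logic_violations(rows, gate, gate_value, dependent, id_field="hhid"):
    """Return IDs where the dependent question was asked when it should have been
    skipped, or skipped when it should have been asked."""
    violations = []
    for row in rows:
        should_be_asked = row.get(gate) == gate_value
        is_answered = row.get(dependent) is not None
        if should_be_asked != is_answered:
            violations.append(row[id_field])
    return violations


# Hypothetical data: the harvest question should appear only if grew_maize == "yes".
rows = [
    {"hhid": "H001", "grew_maize": "yes", "maize_harvest_kg": 800},
    {"hhid": "H002", "grew_maize": "no", "maize_harvest_kg": None},
    {"hhid": "H003", "grew_maize": "no", "maize_harvest_kg": 300},    # asked in error
    {"hhid": "H004", "grew_maize": "yes", "maize_harvest_kg": None},  # skipped in error
]

flagged = skip_logic_violations(rows, "grew_maize", "yes", "maize_harvest_kg")
print(flagged)
```

A non-empty result after a last-minute edit to the instrument suggests the relevance condition was changed or broken by that edit.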
Enumerator Checks
Enumerator checks help the research team determine if any individual enumerator's data is significantly different from other enumerators' data in the datasets or different from the mean of a given question. These checks should:
- Check the percentage of “don’t know” and refusal responses by enumerator.
- Check the distribution of responses for key questions by enumerator.
- Check the number of surveys per day by enumerator.
- Check the average interview duration by enumerator.
- Check the duration of consent by enumerator.
- Check the duration of other modules by enumerator (anthropometrics, games, etc.).
These statistics can be output into an enumerator dashboard. Keeping track of survey team metrics and frequently discussing them with enumerators and team leaders maintains accountability and transparency, and can boost motivation. For more, see SurveyCTO’s tracking dashboard.
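A few of the statistics listed above can be computed per enumerator as in the sketch below. It is a Python illustration only: the field names, the "don't know" code of -99, and the sample rows are hypothetical, and a production dashboard would cover more indicators.

```python
from collections import defaultdict

DONT_KNOW = -99  # hypothetical code for "don't know"; use the instrument's own codes


def enumerator_summary(rows, key_question="income"):
    """Summarize survey count, share of "don't know" answers, and mean duration."""
    by_enumerator = defaultdict(list)
    for row in rows:
        by_enumerator[row["enumerator"]].append(row)

    summary = {}
    for enum_id, forms in sorted(by_enumerator.items()):
        n = len(forms)
        summary[enum_id] = {
            "n_surveys": n,
            "share_dont_know": sum(f[key_question] == DONT_KNOW for f in forms) / n,
            "mean_duration_min": sum(f["duration_min"] for f in forms) / n,
        }
    return summary


# Hypothetical submissions for one day.
rows = [
    {"enumerator": "E1", "income": 100, "duration_min": 40},
    {"enumerator": "E1", "income": DONT_KNOW, "duration_min": 50},
    {"enumerator": "E2", "income": DONT_KNOW, "duration_min": 12},
]

for enum_id, stats in enumerator_summary(rows).items():
    print(enum_id, stats)
```

An enumerator with very short interviews and a high share of "don't know" answers, like E2 in this made-up data, would be a natural candidate for an observed interview or a back check.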
Duplicates and Survey Log Checks
Duplicate and survey log checks confirm that all the data from the field has reached the server in a sound manner. They should:
- Test that all data from the field is on the server: match survey data logs from the field with survey data logs on the server to confirm that all the data from the field has been transferred to the server.
- Test for target number: since surveys are submitted in daily waves, keep track of the numbers of surveys submitted and the target number of surveys needed for an area to be completed.
- Test for duplicates: since SurveyCTO/ODK data can contain a number of duplicates, check for duplicates using ieduplicates.
Verifying these details as soon as possible is critical: since the enumerator is most likely close by if you run daily checks, it is easy for him/her to re-interview and get missing data if the HFC renders this necessary.
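The first two tests can be sketched as follows: one function lists forms recorded in the field log that never reached the server, and the other tracks submissions against the target for each area. This is a Python illustration with hypothetical IDs, area names, and targets, and it assumes duplicate submissions have already been resolved.

```python
from collections import Counter


def missing_from_server(field_log_ids, server_ids):
    """IDs recorded in the field log that are not yet on the server."""
    return sorted(set(field_log_ids) - set(server_ids))


def progress_by_area(server_rows, targets):
    """Compare the number of submitted surveys with the target for each area."""
    submitted = Counter(row["area"] for row in server_rows)
    return {
        area: {
            "submitted": submitted.get(area, 0),
            "target": target,
            "remaining": max(target - submitted.get(area, 0), 0),
        }
        for area, target in targets.items()
    }


# Hypothetical logs and targets.
field_log = ["H001", "H002", "H003", "H004"]
server_rows = [
    {"hhid": "H001", "area": "North"},
    {"hhid": "H002", "area": "North"},
    {"hhid": "H004", "area": "South"},
]
targets = {"North": 2, "South": 3}

print(missing_from_server(field_log, [r["hhid"] for r in server_rows]))
print(progress_by_area(server_rows, targets))
```

Running both after each daily download shows which forms still need to be uploaded and which areas the team can mark as complete.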
Project Checks
Make sure to look at the broader status and progress of the project itself. These statistics can help the research team see bigger picture trends, including:
- Overall survey progress relative to planned sample
- Summaries of key research variables
- Two-way summaries of survey variables by demographic/geographic characteristics
- Attrition rates by type and treatment status
- Comparisons of variables with known distributions
- Geographical distribution of observations via maps/GIS – are all observations where they’re meant to be?
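As one example from the list above, attrition by treatment status can be computed by comparing the baseline sample with the households found at follow-up. The sketch below is a Python illustration; the field names and the sample data are hypothetical.

```python
from collections import Counter


def attrition_by_arm(baseline_rows, followup_ids):
    """Share of baseline households in each arm that were not found at follow-up."""
    found = set(followup_ids)
    total = Counter()
    lost = Counter()
    for row in baseline_rows:
        total[row["arm"]] += 1
        if row["hhid"] not in found:
            lost[row["arm"]] += 1
    return {arm: lost[arm] / total[arm] for arm in sorted(total)}


# Hypothetical baseline sample of eight households.
baseline = [
    {"hhid": "H001", "arm": "treatment"},
    {"hhid": "H002", "arm": "treatment"},
    {"hhid": "H003", "arm": "treatment"},
    {"hhid": "H004", "arm": "treatment"},
    {"hhid": "H005", "arm": "control"},
    {"hhid": "H006", "arm": "control"},
    {"hhid": "H007", "arm": "control"},
    {"hhid": "H008", "arm": "control"},
]
followup = ["H001", "H002", "H003", "H005", "H006"]

rates = attrition_by_arm(baseline, followup)
print(rates)
```

A large gap between the two arms, as in this made-up data, is worth raising early because differential attrition can bias the impact estimates.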
For the validity of the research, research teams should also monitor that the units assigned to treatment actually receive the treatment, or at least an offer, and that control households do not. Treatment contamination occurs when some of the control group receives the treatment. This should be reduced to as great an extent as possible. Research teams may monitor treatment via physical checks or administrative checks:
- Physical checks can ensure that the treatment was applied as stated in the impact evaluation protocol to the units selected for treatment, in the correct order, according to the script, and using marketing materials as planned.
- Administrative data checks can check attendance and account records, and use pre-/post-tests, to ensure that only units to which treatment was offered participated in the intervention. The exact details of this method depend on both the intervention and the available data. With interventions that involve savings accounts, for example, you may check the details of the accounts opened to ensure that the account opener is related to a treatment household in some way.
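An administrative check of this kind can be sketched by matching participation records against the treatment assignment. The Python below is an illustration only; the unit IDs, arm labels, and function name are hypothetical, and real administrative data would usually need cleaning and ID matching first.

```python
def check_treatment_delivery(assignment, participant_ids):
    """Cross-check administrative participation records against assignment.

    assignment maps each unit ID to "treatment" or "control".
    """
    participants = set(participant_ids)
    return {
        # Control units that appear in the participation records.
        "contaminated_controls": sorted(
            unit for unit, arm in assignment.items()
            if arm == "control" and unit in participants
        ),
        # Treatment units with no record of participation.
        "treatment_not_recorded": sorted(
            unit for unit, arm in assignment.items()
            if arm == "treatment" and unit not in participants
        ),
        # Participants who cannot be matched to any study unit.
        "unknown_participants": sorted(participants - set(assignment)),
    }


# Hypothetical assignment and attendance records.
assignment = {"H001": "treatment", "H002": "treatment", "H003": "control", "H004": "control"}
attendance = ["H001", "H003", "H010"]

result = check_treatment_delivery(assignment, attendance)
print(result)
```

Each list calls for a different follow-up: contaminated controls affect the validity of the comparison, while unmatched participants may simply point to ID entry problems in the administrative data.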
Data Quality for Remote Surveys
In the case of remote surveys, monitoring data quality becomes even more important. Poor quality data in remote surveys can at best reduce the effectiveness of a policy intervention, and at worst require a repeat of the entire process of data collection. Therefore the research team must prepare clear guidelines for the following:
- Type of data checks. Conduct regular back checks and high frequency checks.
- Frequency of data checks. Specify how often the supervisor should conduct data checks.
- Feedback method. Specify the method for communicating feedback to the enumerators after a data check. Decide on this before any data collection starts.
Back to Parent
This article is part of the topic Field Management
Additional Resources
- DIME Analytics’ Data Quality Assurance
- Innovations for Poverty Action's template for high frequency checks
- DIME's Planning for, Preparing & Monitoring Household Surveys
- DIME Analytics’ Real Time Data Quality Checks
- SurveyCTO, Monitoring and Visualization case studies
- IPA-JPAL-SurveyCTO, Collecting High Quality Data
- SurveyCTO, Data quality with SurveyCTO