Back-checks are a quality control method used to verify the quality and legitimacy of data collected during a survey. Throughout fieldwork, a back-check team returns to a randomly selected subset of households that have already been interviewed. The team re-interviews these respondents using a short subset of the survey questions, known as the back-check survey, and the answers are compared with the original survey data to verify the quality and legitimacy of key data. This page explains how to coordinate, sample for, and design questionnaires for back-checks.
- Back-checks are an important tool to detect fraud (e.g. enumerators sitting under a tree and filling out questionnaires themselves).
- Back-checks help researchers to assess the accuracy and quality of the data collected.
- Back-checks can be conducted through in-person visits or phone calls. Random Audio Audits are a complementary approach to in-person back-checks.
- Problems identified through back-checks can be remedied by retraining enumerators or by replacing low-performing or problematic enumerators.
- The total duration of each back-check survey should be around 10-15 minutes.
- Back-checks should be conducted by a dedicated team of experienced, skilled back-check enumerators.
- The back-check team should be independent from the rest of the survey staff. They should be trained separately and have minimal contact with the survey team.
- Administer 20% of back-checks within the first two weeks of fieldwork. This helps the research team to identify early whether the questionnaire is effective, whether enumerators are doing their jobs well, and which changes to make to ensure high quality data collection.
Sampling for Back-Checks
- Aim to back-check 10-20% of the total observations.
- The back-check sample should be stratified across survey teams and enumerators. Every team and every enumerator must be back-checked early in fieldwork and regularly thereafter.
- Include respondents reported as missing in the back-check sample to verify that enumerators are not biasing your sample by failing to track hard-to-find respondents. Also include observations flagged by other quality checks, such as high-frequency checks, and observations collected by enumerators suspected of cheating.
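A minimal Python sketch of these sampling rules, assuming completed interviews are available as a list of records with hypothetical `id` and `enumerator` fields. It draws roughly 15% of each enumerator's interviews (within the 10-20% guideline), guarantees at least one back-check per enumerator, and always adds the ids passed in `always_include`, such as respondents reported as missing or observations flagged by high-frequency checks. The function name, fields, and seed are illustrative and not part of any existing tool.

```python
import random
from collections import defaultdict


def draw_backcheck_sample(surveys, rate=0.15, always_include=(), seed=12345):
    """Draw a back-check sample stratified by enumerator.

    surveys: list of dicts with 'id' and 'enumerator' keys (hypothetical layout).
    rate: share of each enumerator's interviews to back-check.
    always_include: ids that must be back-checked regardless of the random draw.
    """
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    by_enum = defaultdict(list)
    for s in surveys:
        by_enum[s["enumerator"]].append(s["id"])

    selected = set(always_include)
    for enum, ids in sorted(by_enum.items()):
        # At least one back-check per enumerator, so nobody is skipped.
        n = max(1, round(rate * len(ids)))
        # Random draws come on top of the forced ids, never duplicating them.
        pool = [i for i in sorted(ids) if i not in selected]
        selected.update(rng.sample(pool, min(n, len(pool))))
    return sorted(selected)


# 40 hypothetical interviews: HH000-HH019 by E1, HH020-HH039 by E2.
surveys = [{"id": f"HH{i:03d}", "enumerator": "E1" if i < 20 else "E2"}
           for i in range(40)]
sample = draw_backcheck_sample(surveys, rate=0.15, always_include=["HH005"])
```

Here each enumerator gets three random back-checks, plus the flagged interview HH005.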
Designing the Back-Check Survey
Back-check questions are drawn from the original questionnaire. Five types of questions should be included in a back-check to gauge data and enumerator quality:
- Questions to verify respondent identity and interview information
- These questions verify the identity of the respondent and check if, when, and where the original survey took place. This is useful to check for fraud and verify reported completion rates.
- Questions to detect fraud
- Include questions that ask for straightforward information with no expected variation or room for error. These questions should not require particularly skilled enumeration and should not vary over time (specifically, over the period between the main interview and the back-check). Examples include type of dwelling, education level, marital status, occupation, and whether the respondent has children. The specific variables to include will depend on the survey instrument and context. If values differ between the original survey and the back-check survey, this indicates poor-quality data, a serious enumerator problem, and/or potentially falsified work.
- Questions to detect errors in survey execution
- These are questions for which capable enumerators should obtain the true answer, typically questions that involve relatively complex logic or consistency checks. If values for these questions differ between the original survey and the back-check survey, the enumerator may need more training.
- Questions to detect problems with the questionnaire or key outcomes
- These should be a selection of the survey's key outcome variables. The back-check provides an additional accuracy check and helps flag difficulties or inconsistencies in how enumerators interpret the questions. If these values differ between the original survey and the back-check, further enumerator training or, in particular cases, questionnaire modification may be needed.
- Questions that determine repeated sections of the questionnaire
- These should be included to check whether enumerators are falsifying data to reduce the length of interviews. For example, if there is a long series of questions about each household member, verify that the number of household members is correct. If an agricultural survey asks for production information by plot, verify the number of plots is correct.
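Because each question type calls for a different response, it can help to record the type of every back-check question and route discrepancies accordingly. The Python sketch below assumes a hypothetical mapping from question names to the five types listed above; the category keys, follow-up wording, and question names are illustrative. It uses exact equality for simplicity, which may be too strict for numeric outcomes.

```python
# Hypothetical follow-up for a discrepancy in each question type.
FOLLOW_UP = {
    "identity": "confirm whether, when, and where the interview took place",
    "fraud": "investigate the enumerator for possible falsified work",
    "execution": "retrain the enumerator on this part of the questionnaire",
    "key_outcome": "review enumerator interpretation; consider questionnaire changes",
    "repeated_section": "check whether the interview was shortened by misreporting",
}


def review_discrepancies(question_types, survey, backcheck):
    """Return (question, type, follow-up) for every answer that differs.

    question_types: dict mapping question name -> key of FOLLOW_UP.
    survey, backcheck: dicts of answers for the same respondent.
    """
    flags = []
    for question, category in question_types.items():
        # Exact comparison; numeric outcomes may need a tolerance instead.
        if survey.get(question) != backcheck.get(question):
            flags.append((question, category, FOLLOW_UP[category]))
    return flags


question_types = {
    "dwelling_type": "fraud",
    "hh_size": "repeated_section",
    "crop_sold": "key_outcome",
}
survey = {"dwelling_type": "brick", "hh_size": 4, "crop_sold": "yes"}
backcheck = {"dwelling_type": "brick", "hh_size": 6, "crop_sold": "yes"}
flags = review_discrepancies(question_types, survey, backcheck)
```

In this made-up case only the household size differs, which points to a possibly shortened household roster rather than a training issue.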
It is important that enumerators do not know which questions will be audited. To that end, you may consider randomizing which questions are asked or changing the back-check survey regularly during data collection.
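One way to randomize questions is to ask a fixed core of identity questions in every back-check and draw the remaining questions at random from a larger pool. A minimal Python sketch with hypothetical question names; whether a fresh draw per visit is practical depends on how the back-check form is programmed.

```python
import random


def pick_backcheck_questions(required, optional, k, seed=None):
    """Return the question list for one back-check visit.

    required: questions asked in every back-check (e.g. respondent identity).
    optional: pool of other candidate questions; k of them are drawn at random,
        so enumerators cannot predict which answers will be audited.
    seed: optional seed to make a draw reproducible.
    """
    rng = random.Random(seed)
    return list(required) + rng.sample(list(optional), k)


required = ["respondent_name", "interview_date"]
optional = ["dwelling_type", "marital_status", "hh_size", "n_plots", "crop_sold"]
questions = pick_backcheck_questions(required, optional, k=3, seed=7)
```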
After completing a back-check, you can compare the back-check data with the original survey data using the Stata command bcstats, developed by Innovations for Poverty Action. This command produces a dataset that lists the comparisons between the back-check and original survey data. The command also allows research teams to perform enumerator checks and stability checks for variables.
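The comparison step can be illustrated outside Stata. The Python sketch below matches back-check records to survey records on an id and lists, for every matched observation and variable, both values and whether they differ. It only illustrates the idea; it is not the algorithm or output format of bcstats, and the record layout and variable names are assumptions.

```python
def compare_records(survey, backcheck, id_var, variables):
    """List survey vs back-check values for every matched id and variable.

    survey, backcheck: lists of dicts, one per interview (hypothetical layout).
    Returns one row per (id, variable) with both values and a 'differ' flag.
    """
    bc_by_id = {row[id_var]: row for row in backcheck}
    rows = []
    for s in survey:
        bc = bc_by_id.get(s[id_var])
        if bc is None:
            continue  # this observation was not back-checked
        for var in variables:
            rows.append({
                "id": s[id_var],
                "variable": var,
                "survey": s.get(var),
                "backcheck": bc.get(var),
                "differ": s.get(var) != bc.get(var),
            })
    return rows


survey = [
    {"hhid": 1, "dwelling": "brick", "married": 1},
    {"hhid": 2, "dwelling": "mud", "married": 0},
]
backcheck = [{"hhid": 1, "dwelling": "mud", "married": 1}]
rows = compare_records(survey, backcheck, "hhid", ["dwelling", "married"])
```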
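The following syntax is used to perform back-checks using bcstats (filename, varlist, and [options] are placeholders):

```stata
* Install bcstats from SSC (needed only once per machine)
ssc install bcstats

* Compare survey and back-check data, matching observations on the id() variables
bcstats, surveydata(filename) bcdata(filename) id(varlist) [options]
```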
To learn more about the options for bcstats and back-checks, type help bcstats in Stata after installing the command. Two features that are used most commonly with bcstats are described below.
Comparing different variable types
As part of its [options], bcstats allows users to compare three different types of variables.
t1vars(): Specifies the list of type 1 variables. These are variables that are expected to stay constant between the survey data and the back-check. If these variables differ, the research team may take action against the enumerator. For type 1 variables, bcstats displays the variables with high error rates and completes enumerator checks.
t2vars(): Specifies the list of type 2 variables. These are variables that may be difficult for enumerators to work with; for instance, they may involve complicated skip patterns or complex logic. If these variables differ between the survey data and the back-check, this may indicate the need for further training, but it will not result in action against the enumerator. For type 2 variables, bcstats displays their error rates and completes enumerator checks and stability checks.
t3vars(): Specifies the list of type 3 variables. These are variables whose stability between the survey and the back-check is of interest to the research team. Differences in these variables between the survey data and the back-check data will not result in action against the enumerator. For type 3 variables, bcstats displays the error rates of all variables and completes stability checks.
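The error rates mentioned above can be sketched as the share of compared values that differ. The Python example below computes this share by variable and by enumerator for a given list of variables (for example, type 1 variables) and flags enumerators above a threshold. The 10% threshold, the `enumerator` field, and the record layout are hypothetical, and bcstats's own calculations and display may differ.

```python
from collections import defaultdict


def error_rates(pairs, variables, enum_var="enumerator", threshold=0.10):
    """Share of differing values by variable and by enumerator.

    pairs: list of (survey_row, backcheck_row) dicts already matched on id.
    variables: variables to compare, e.g. the type 1 variables.
    threshold: hypothetical error rate above which an enumerator is flagged.
    """
    if not pairs:
        return {}, {}, []
    var_errors = defaultdict(int)
    enum_errors = defaultdict(int)
    enum_checks = defaultdict(int)
    for survey_row, bc_row in pairs:
        enum = survey_row[enum_var]
        for var in variables:
            enum_checks[enum] += 1
            if survey_row.get(var) != bc_row.get(var):
                var_errors[var] += 1
                enum_errors[enum] += 1
    by_var = {var: var_errors[var] / len(pairs) for var in variables}
    by_enum = {e: enum_errors[e] / n for e, n in enum_checks.items()}
    flagged = sorted(e for e, rate in by_enum.items() if rate > threshold)
    return by_var, by_enum, flagged


pairs = [
    ({"enumerator": "E1", "dwelling": "brick", "married": 1},
     {"dwelling": "brick", "married": 1}),
    ({"enumerator": "E1", "dwelling": "mud", "married": 0},
     {"dwelling": "mud", "married": 0}),
    ({"enumerator": "E2", "dwelling": "brick", "married": 1},
     {"dwelling": "mud", "married": 1}),
]
by_var, by_enum, flagged = error_rates(pairs, ["dwelling", "married"])
```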
bcstats also allows users to test for stability by running a paired t-test that compares the sample means of the survey data and the back-check data. Users can specify the confidence level for the t-test using the level() option; by default, it uses a 95% confidence level.
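For intuition, a paired t-test compares the mean of the within-respondent differences d (survey value minus back-check value) with zero: t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom. A minimal standard-library Python sketch using made-up values; it returns the statistic and degrees of freedom rather than a p-value (scipy.stats.ttest_rel computes the p-value if SciPy is available).

```python
import math
from statistics import mean, stdev


def paired_t(survey_values, backcheck_values):
    """Paired t statistic for H0: mean(survey - back-check) == 0.

    Returns (t, degrees_of_freedom). Compare |t| with the critical value of
    the t distribution for the chosen confidence level.
    """
    if len(survey_values) != len(backcheck_values) or len(survey_values) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [s - b for s, b in zip(survey_values, backcheck_values)]
    se = stdev(diffs) / math.sqrt(len(diffs))
    if se == 0:
        raise ValueError("differences are constant; t is undefined")
    return mean(diffs) / se, len(diffs) - 1


# Hypothetical household-size answers for five back-checked respondents.
t, df = paired_t([5, 3, 4, 6, 2], [4, 3, 5, 4, 2])
```

Here t is about 0.78 with 4 degrees of freedom, well below the two-sided 95% critical value of about 2.78, so these made-up values show no evidence that mean household size differs between the survey and the back-check.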
This article is part of the topic Field Management.
Additional Resources
- DIME Analytics’ Real Time Data Quality Checks
- DIME Analytics’ Data Quality Assurance
- World Health Organization's Quality Assurance in Surveys: standards, guidelines, and procedures. This chapter provides, in detail, the approach and methodology on quality control during surveys.
- bcstats, a Stata program written by an IPA staff member for conducting back checks on survey data.