Back checks are a quality control method used to verify data collected during a survey. After the survey data has been collected, certain households are re-interviewed on selected questions to verify the legitimacy of the data collected in the actual survey.
==Read First ==
Back checks are done to monitor the quality of the field work. This gives us valuable information on whether the questionnaire accurately captures the key outcomes of the study or not, and on whether the enumerators are performing their jobs as expected.
== Best Practices during Back Checks ==
Here are some of the best practices to follow while performing back checks:
* Around 10% of the total survey should be back checked, with 20% of the back checks done in the first 2 weeks of field work.
*Every team and every surveyor must be back checked.
* The back check sample must include a proportional number of missing and replacement respondents.
*Households must be selected at random for back checks.
==How to Select Back Check Questions ==
Back check questions should be selected with the performance of both the questionnaire and the surveyor in mind. Using different types of questions during the back check helps in finding the cause of poor data quality, such as questionnaire language, surveyor performance, or CAPI errors. Some of the questions that should be asked during a back check are as follows:
To test for questionnaire language, back checks can be done on questions that can be interpreted differently by different surveyors. Asking such questions in both the survey and the back check tells the survey team whether or not the surveyor is interpreting a question correctly.
Surveyor performance can be tested using questions whose answers cannot change over time. Simple questions, like the age of the respondent or the number of members in the household, should not differ between the survey and the back check.
To test for CAPI errors, question sections with complex skips can be tested.
==A framework for back checks from Innovations for Poverty Action==
;Identifying Respondents and Interview Information
:- Check if we have the right person
:- Check if the interview took place, and when it took place.
;Type 1 Variables
:- Straightforward questions where we expect no variation.
:- For example: education level, marital status, occupation, whether the respondent has children, etc.
;Type 2 Variables
:- Questions where we expect capable enumerators to get the true answer.
;Type 3 Variables
:- Questions that we expect to be difficult. We back check these questions to understand if they were correctly interpreted in the field.
The total duration of the back checks should be around 10-15 minutes.
== Comparing Back Checks to Actual Survey Data ==
After completing a back check, you can compare the data obtained from the back check with your actual survey data. This can be done using the Stata command <code>bcstats</code>, developed by [http://www.poverty-action.org/ '''Innovations for Poverty Action''']. This command compares the back check data and the survey data, and produces a data set of the comparisons between the two. The command also completes enumerator checks and stability checks for variables.
The steps are as follows:
<code>ssc install bcstats</code><br />
<code>bcstats, surveydata(''filename'') bcdata(''filename'') id(''varlist'') [options]</code>
To learn about the options for bcstats, type <code>help bcstats</code> in Stata after installing the command.
== Action to Take after Back Checks ==
The three types of questions asked during the back check help determine whether the problems in the data are due to the surveyor or the questionnaire. The remedial actions after back checks are as follows:
=== Type 1 Data ===
Since type 1 variables should have little to no variation between the main survey and the back check, discrepancies in the data are most likely due to surveyor errors. A breakdown of the discrepancy percentages and the suggested corrective measures is as follows:
* Discrepancy of more than 10%: warn the surveyor.
* Discrepancy of 20-30%: conduct a second back check to correct the errors.
** If the errors are surveyor errors, audit 3 additional surveys completed by that surveyor in the same week. If discrepancies of 20-30% are found in those surveys as well, put the surveyor on probation.
* Discrepancy of more than 40%: conduct a second back check to determine who made the errors, and consider resurveying the household. If the surveyor made the errors, resurvey the household and audit all the surveys done by that surveyor in the batch.
** If one more survey has more than 40% discrepancy, fire the surveyor immediately and redo all surveys with 20% or more discrepancy.
=== Type 2 Data ===
Since we expect qualified surveyors to get the answers to type 2 questions, there shouldn't be much variation in the data from the survey and the data from the back check. High discrepancy could mean that the surveyors are not particularly well trained to ask the question. Some suggested corrective measures are as follows:
* If the discrepancy is more than 10%, consider retraining the surveyor.
* If a particular surveyor is responsible for more than 30% of the errors in a single survey, follow the steps for Type 1.
=== Type 3 Data ===
*If the discrepancy is more than 10%, discuss with your survey team and let your PIs know. They may decide to edit the survey or add additional rounds of surveying.
== Back to Parent ==
This article is part of the topic [[Monitoring Data Quality]]
Back checks are an important tool that allows the research team to verify the quality and validity of survey data. Throughout the duration of the fieldwork, a back check team returns to a randomly selected subsample of households that have already been interviewed by enumerators. The back check team re-interviews these respondents, using a much smaller set of questions from the actual survey instrument (or questionnaire). This is known as a back check survey, and it allows the research team to modify certain aspects of the data collection to improve data quality.
- Back checks are an important tool to detect fraud, for instance, enumerators filling out questionnaires themselves.
- Back checks help researchers to assess the accuracy of data collection, and monitor data quality.
- Back checks can be conducted by in-person visits, or through phone calls. A complementary approach to in-person back checks is conducting random audio audits.
- Back checks allow the research team to resolve issues in data collection by improving enumerator training, or replacing low-performing or problematic enumerators.
- Duration. The total duration of each back check survey should be around 10-15 minutes.
- Specialized enumerators. Hire a team of experienced and skilled enumerators to conduct the back checks.
- Independent team. The back check team should be independent from the rest of the survey staff. Train them separately, and ensure that there is very little or no contact between the back check team and the survey team.
- 20%-First 2 Weeks rule. Administer 20% of back checks within the first two weeks of fieldwork. This helps the research team to identify quickly whether the questionnaire is effective, whether enumerators are doing their jobs well, and what changes to make to ensure high quality data collection.
The following points are important when selecting the sample for the back check survey:
- Sample size. Aim to back check 10-20% of the total observations.
- Stratified sampling. The back check sample should be stratified, that is it must cover all survey teams and enumerators. Back check every team, and every enumerator regularly, and as frequently as possible.
- Include missing respondents. This is to verify that there is no bias in the sample just because enumerators did not track hard-to-find respondents.
- Include other flagged observations. Include observations that were flagged in other quality tests like high frequency checks. Also include respondents who were interviewed by enumerators suspected of cheating.
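The sampling points above can be sketched in code. The following is a minimal Python illustration, not part of any existing tool: the function name, the field names `hh_id` and `enumerator`, and the fixed seed are all assumptions made for the example. It draws a random share of interviews within each enumerator, so that every enumerator is covered, and always adds flagged observations.

```python
import math
import random
from collections import defaultdict


def draw_back_check_sample(interviews, share=0.10, flagged_ids=(), seed=12345):
    """Pick back check households at random within each enumerator.

    interviews: list of dicts with hypothetical keys "hh_id" and "enumerator".
    share: fraction of each enumerator's interviews to back check.
    flagged_ids: households flagged by other quality checks; always included.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_enumerator = defaultdict(list)
    for row in interviews:
        by_enumerator[row["enumerator"]].append(row["hh_id"])

    selected = set(flagged_ids)
    for enumerator, hh_ids in sorted(by_enumerator.items()):
        # Rounding up guarantees at least one back check per enumerator.
        n_pick = math.ceil(share * len(hh_ids))
        selected.update(rng.sample(sorted(hh_ids), n_pick))
    return sorted(selected)


# Illustrative data: 80 interviews spread over 4 enumerators.
interviews = [{"hh_id": hh, "enumerator": f"E{hh % 4}"} for hh in range(1, 81)]
sample = draw_back_check_sample(interviews, share=0.10, flagged_ids=[7, 42])
print(sample)
```

To cover missing respondents as well, the input list would need to contain the households that enumerators reported as not found, not only the completed interviews.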
Designing the Back Check Survey
Questions for the back check survey (or simply back check) are drawn from the actual questionnaire which is used for data collection. There are five types of questions that should be included in a back check to get a clear idea of the data quality, as well as the enumerator's skills:
- Questions to verify respondent and interview information: Verify the identity of the respondent and check if, when, and where the original survey took place. Useful for verifying reported completion rates.
- Questions to detect fraud: Questions that ask for straightforward information which has no expected variation or room for error. They do not require particularly skilled enumerators, and do not vary over time, especially over the period between the actual interview and the back check. For example, questions about type of dwelling, education level, marital status, occupation, etc. The actual questions in this category will depend on the survey instrument and context. If the answers to these questions differ between the actual survey and the back check survey, it is a sign of poor data quality, a serious enumerator problem, and/or potential wrongdoing by the enumerator.
- Questions to detect errors in survey execution: Questions that have complex loops or skip patterns, or check for consistency of recorded answers. For example, if household size is recorded as 4, then the number of repeat groups for household members should not be more than 4. Capable enumerators should get the true answer for these questions. If values for these questions differ between the questionnaire and the back check survey, then the enumerator may need more training.
- Questions to detect problems with the questionnaire or key outcomes: Provide additional checks for accuracy, and flag difficulties and/or inconsistencies in the interpretation of the questions by enumerators. If these values differ between the actual survey and the back check, then the enumerator may need more training. In some cases, the survey instrument may need to be simplified.
- Questions that repeat multiple times: Check whether enumerators are falsifying data to reduce the length of interviews. For example, if there is a long series of questions about each household member, verify that the number of times these questions repeat is equal to the number of household members.
Note that it is important that enumerators do not know which questions will be included in the back check survey. To ensure this, you may consider randomizing questions, or changing the back check survey regularly during data collection.
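The household size example used for repeated questions can be expressed as a simple consistency check. The Python sketch below is only an illustration: the record layout (`hh_id`, `hh_size`, `members`) is hypothetical and would need to be adapted to the actual survey export.

```python
def flag_roster_mismatches(records):
    """Return IDs where the member roster length differs from household size.

    records: list of dicts with hypothetical keys "hh_id", "hh_size", and
    "members" (one entry per repeat of the household member module).
    """
    return [r["hh_id"] for r in records if len(r["members"]) != r["hh_size"]]


records = [
    {"hh_id": 101, "hh_size": 4, "members": ["a", "b", "c", "d"]},
    {"hh_id": 102, "hh_size": 5, "members": ["a", "b"]},  # module cut short
    {"hh_id": 103, "hh_size": 3, "members": ["a", "b", "c"]},
]
print(flag_roster_mismatches(records))  # prints [102]
```

Households returned by such a check are natural candidates for the back check sample, since a shortened roster may indicate that the enumerator skipped repeats to save time.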
After completing a back check, you can compare the back check data with the original survey data. You can do this using the Stata command bcstats, developed by Innovations for Poverty Action (IPA). This command produces a dataset that lists the comparisons between the back check and original survey data. The command also allows research teams to perform enumerator checks and stability checks for variables. The command groups variables into 3 types, which is a slightly broader classification than the one explained above.
The following syntax is used for performing back checks using bcstats:
ssc install bcstats
bcstats, surveydata(filename) bcdata(filename) id(varlist) [options]
To learn more about the options for bcstats and back checks, type help bcstats in Stata after installing the command. Listed below are two options that are used most commonly with bcstats.
Comparing different variable types
As part of the functionalities under [options], bcstats allows users to compare 3 different types of variables.
t1vars(): Specifies the list of type 1 variables. These are variables that are expected to stay constant between the survey data and the back check. If there are differences for these variables, the research team may take action against the enumerator. For these variables, bcstats displays the variables with high error rates and completes enumerator checks. This type includes questions to verify respondent and interview information, questions to detect fraud, and questions that repeat multiple times.
t2vars(): Specifies the list of type 2 variables. These are variables that may be difficult for enumerators to work with. For instance, they may involve complicated skip patterns or complex logic. In this case, differences between the survey data and the back check may indicate the need for further training, but will not result in action against the enumerator. For these variables, bcstats displays the error rates and completes enumerator checks and stability checks. This type includes questions to detect errors in survey execution.
t3vars(): Specifies the list of type 3 variables. These are variables whose stability between the survey and back check is of interest to the research team. Differences for these variables between the survey data and back check data will not result in action against the enumerator. For these variables, bcstats displays the error rates and completes stability checks. This type includes questions to detect problems with the questionnaire or key outcomes.
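To make the idea of an error rate concrete, the following Python sketch computes, for each enumerator, the share of back checked values that differ from the survey. It is an illustration of the concept only and does not reproduce how bcstats is implemented; the function name and the data layout (dicts keyed by household ID, with an `enumerator` field on survey rows) are assumptions.

```python
from collections import defaultdict


def error_rates(survey, back_check, variables):
    """Share of back checked values that differ from the survey, per enumerator.

    survey, back_check: dicts keyed by household ID; each value is a dict of
    answers. Survey rows also carry a hypothetical "enumerator" key.
    variables: names of the variables to compare (for example, type 1).
    """
    compared = defaultdict(int)
    differing = defaultdict(int)
    for hh_id, bc_row in back_check.items():
        survey_row = survey.get(hh_id)
        if survey_row is None:
            continue  # back check with no matching survey record
        enumerator = survey_row["enumerator"]
        for var in variables:
            compared[enumerator] += 1
            if survey_row[var] != bc_row[var]:
                differing[enumerator] += 1
    return {e: differing[e] / compared[e] for e in compared}


survey = {
    1: {"enumerator": "E1", "marital": "married", "education": "primary"},
    2: {"enumerator": "E1", "marital": "single", "education": "secondary"},
    3: {"enumerator": "E2", "marital": "married", "education": "none"},
}
back_check = {
    1: {"marital": "married", "education": "primary"},
    2: {"marital": "single", "education": "primary"},  # education differs
    3: {"marital": "married", "education": "none"},
}
rates = error_rates(survey, back_check, ["marital", "education"])
print(rates)
```

In this toy data, enumerator E1 has one differing value out of four comparisons, and E2 has none. Running the same function separately on the type 1, type 2, and type 3 variable lists keeps the three kinds of discrepancy apart, which matters because they call for different responses.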
bcstats also allows users to test for stability by running a paired t-test to compare the sample means for the survey data and the back check data. It also allows users to specify the confidence level for the t-test using the level() option. By default, it considers a 95% confidence level.
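For readers unfamiliar with the paired t-test, the sketch below shows the statistic it is based on, using only the Python standard library. It illustrates the general technique and is not the bcstats implementation; the function name and the example values are made up. The test works on the difference between the survey value and the back check value for each household.

```python
import math
import statistics


def paired_t_statistic(survey_values, back_check_values):
    """Return the paired t statistic and its degrees of freedom.

    Both lists must be in the same household order and contain at least two
    pairs. The differences must not all be identical, otherwise the standard
    deviation is zero and the statistic is undefined.
    """
    diffs = [s - b for s, b in zip(survey_values, back_check_values)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical values of one numeric variable for five back checked households.
survey_values = [10, 12, 9, 14, 11]
back_check_values = [9, 12, 10, 12, 11]
t_stat, dof = paired_t_statistic(survey_values, back_check_values)
print(round(t_stat, 3), dof)
```

A statistic close to zero suggests that the variable is stable between the survey and the back check. The statistic is compared against the t distribution with the returned degrees of freedom at the chosen confidence level.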
Back to Parent
This article is part of the topic Field Management.