Difference between revisions of "Back Checks"

 
(27 intermediate revisions by 5 users not shown)
Line 1: Line 1:
Back-checks are a [[Monitoring Data Quality | quality]] control method implemented to verify the quality and legitimacy of [[Primary Data Collection | data collected]] during a survey. Throughout the course of fieldwork, a back-check team returns to a randomly-selected subset of households for which data has been collected. The back-check team re-interviews these respondents with a short subset of survey questions, otherwise known as a back-check survey. Back-checks are used to verify the quality and legitimacy of key data collected in the actual survey. This page will provide points on how to coordinate, sample for, and design questionnaires for back-checks.
'''Back checks''' are an important tool that allows the [[Impact Evaluation Team|research team]] to verify the [[Monitoring Data Quality | quality]] and validity of [[Primary Data Collection | survey data]]. Throughout the duration of the [[Field Surveys|fieldwork]], a '''back check team''' returns to a randomly-selected sub-sample of households who have already been interviewed by enumerators. The back check team re-interviews these respondents, using a much smaller set of questions from the actual [[Questionnaire Design|survey instrument]] (or questionnaire). This is known as a '''back check survey''', and allows the '''research team''' to modify certain aspects of the '''data collection''' to [[Data Quality Assurance Plan|improve data quality]].
==Read First ==
* '''Back checks''' are an important tool to detect fraud, for instance, enumerators filling out questionnaires themselves.
* '''Back checks''' help researchers to assess the accuracy of [[Primary Data Collection|data collection]], and [[Monitoring Data Quality|monitor data quality]].
* '''Back checks''' can be conducted by in-person visits, or through phone calls. A complementary approach to in-person back checks is conducting [[Random Audio Audits|random audio audits]].
* '''Back checks''' allow the [[Impact Evaluation Team|research team]] to resolve issues in data collection by improving [[Enumerator Training | enumerator training]], or replacing low-performing or problematic enumerators.
 
==Logistics==
* '''Duration.''' The total duration of each '''back check survey''' should be around 10-15 minutes.
* '''Specialized enumerators.''' Hire a team of experienced and skilled enumerators to conduct the back checks.
* '''Independent team.''' The '''back check team''' should be independent from the rest of the [[Preparing_for_Field_Data_Collection#Team_setup_and_roles|survey staff]]. Train them separately, and ensure that there is very little or no contact between the back check team and the '''survey team'''.
* '''20%-First 2 Weeks rule.''' Administer 20% of back checks within the first two weeks of [[Preparing for Field Data Collection|fieldwork]]. This helps the [[Impact Evaluation Team|research team]] to identify quickly whether the [[Questionnaire Design | questionnaire]] is effective, whether enumerators are doing their jobs well, and what changes to make to ensure [[Data Quality Assurance Plan|high quality data collection]].
 
==Sampling==
The following points are important when selecting the sample for the '''back check survey''':
* '''Sample size.''' Aim to '''back check''' 10-20% of the total observations.
* '''Stratified sampling.''' The back check sample should be [[Stratified Random Sample | stratified]]; that is, it must cover all [[Preparing_for_Field_Data_Collection#Team_setup_and_roles|survey teams]] and enumerators. Back check every team and every enumerator regularly, and as frequently as possible.
* '''Include missing respondents.''' This verifies that the sample is not biased because enumerators failed to track hard-to-find respondents.
* '''Include other flagged observations.''' Include observations that were flagged in other quality tests like [[High Frequency Checks | high frequency checks]]. Also include respondents who were interviewed by enumerators suspected of cheating.


==Read First ==
==Designing the Back Check Survey==
*Back-checks help to evaluate how effective the instrument is and how well the enumerators are collecting quality data.
Questions for the '''back check survey''' (or simply '''back check''') are drawn from the actual [[Questionnaire Design | questionnaire]] which is used for [[Primary Data Collection|data collection]]. There are five types of questions that should be included in a '''back check''' to get a clear idea of the [[Monitoring Data Quality|data quality]], as well as the enumerator's skills:
*Back-checks are an important tool to detect fraud (i.e. enumerators sitting under a tree and filling out questionnaires themselves).
 
*Back-checks help researchers to assess the accuracy of the data collected.  
* '''Questions to verify respondent and interview information:''' Verify the identity of the respondent and check if, when, and where the original survey took place. Useful for verifying reported completion rates.  
*Back-checks can be conducted by in-person visits or phone calls. A complementary approach to in-person back checks is conducting [[Random Audio Audits]].
 
*Problems identified through back checks can be remedied by further [[Enumerator Training | training enumerators]] or replacing low-performing or problematic enumerators.  
* '''Questions to detect fraud:''' Questions that ask for straightforward information with no expected variation or room for error. They do not require particularly skilled enumerators, and the answers should not change over time, especially in the period between the actual interview and the '''back check'''. Examples include questions about type of dwelling, education level, marital status, and occupation. The actual questions in this category will depend on the survey instrument and context. If the answers to these questions differ between the actual survey and the back check survey, it is a sign of poor [[Data Quality Assurance Plan|data quality]], a serious enumerator problem, or potential wrongdoing by the enumerator.
 
* '''Questions to detect errors in survey execution:''' Questions that have complex '''loops''' or '''skip patterns''', or that check for consistency of recorded answers. For example, if household size is recorded as 4, then the number of repeat groups for household members should not be more than 4. Capable enumerators should get the true answer for these questions. If values for these questions differ between the questionnaire and the back check survey, then the enumerator may need more [[Enumerator Training|training]].


==Coordinating Back-Checks==
* '''Questions to detect problems with the questionnaire or key outcomes:''' Questions that provide additional checks for accuracy, and flag difficulties or inconsistencies in how enumerators interpret the questions. If these values differ between the actual survey and the back check, then the enumerator may need more training. In some cases, the survey instrument may need to be simplified.
*The total duration of each back-check survey should be around 10-15 minutes.
*The back-checks should be conducted by a specialized team of a few exclusively back-checking enumerators. The back-check enumerators should be of the highest trust and quality.
*Administer 20% of back-checks within the first two weeks of fieldwork. This helps the research team to identify early whether the [[Questionnaire Design | questionnaire]] is effective, whether enumerators are doing their jobs well, and which changes to make to ensure high quality data collection.


==Sampling for Back-Checks==
* '''Questions that repeat multiple times:''' Check whether enumerators are falsifying data to reduce the length of interviews. For example, if there is a long series of questions about each household member, verify that the number of times these questions repeat is equal to the number of household members.  
*Aim to back-check 10-20% of the total observations.
*The back-check sample should be [[Stratified Random Sample | stratified]] across survey teams/enumerators. Every team and every enumerator must be back-checked as soon as possible and regularly.
*Include missing respondents in the back-check sample to verify that enumerators are not biasing your sample by not tracking hard-to-find respondents. Also include observations flagged in other quality tests like [[Monitoring Data Quality#Guidelines#High Frequency Checks | high frequency checks]] and observations collected by enumerators suspected of cheating.


==Designing the Back-Check Survey==
'''Note''' that it is important that enumerators do not know which questions will be included in the '''back check survey'''. To ensure this, consider randomizing questions, or changing the back check survey regularly during data collection.
Back-check questions are drawn from the original [[Questionnaire Design | questionnaire]]. [http://www.poverty-action.org/ Innovations for Poverty Action] identifies four types of questions that should be included in a back-check to best gauge data and enumerator quality:


*Questions to identify respondent and interview information:
== bcstats ==
: These questions verify the identity of the respondent and check if, when, and where the original survey took place.  
After completing a back check, you can compare the '''back check data''' with the original survey data using the Stata command <code>bcstats</code>, developed by [http://www.poverty-action.org/ Innovations for Poverty Action] (IPA). This command produces a dataset that lists the comparisons between the back check and original survey data. The command also allows [[Impact Evaluation Team|research teams]] to perform [[High_Frequency_Checks#Enumerators_Checks|enumerator checks]] and [[Back Checks#Stability|stability checks]] for variables. <code>bcstats</code> groups variables into 3 types, a slightly broader classification than the five question types explained above.


*Type 1 Variable Questions:  
The following syntax is used for performing '''back checks''' using <code>bcstats</code>:
:These questions ask straightforward information with no expected variation or room for error. They may include questions about education level, marital status, occupation, whether the respondent has children or not, etc. If Type 1 variable values differ between the questionnaire and the backcheck survey, they indicate poor quality data, a serious enumerator problem, and potentially falsified work.


*Type 2 Variable Questions:
<syntaxhighlight lang="Stata" line>ssc install bcstats
:These are questions for which capable enumerators should get the true answer. If the Type 2 response values differ between the questionnaire and the backcheck survey, they indicate that the enumerator may need more training.
* Compare the back check data with the original survey data
bcstats, ///
  surveydata(filename) bcdata(filename) id(varlist) ///
  [options]
</syntaxhighlight>


*Type 3 Variable Questions:
To learn more about the options for <code>bcstats</code>, type <syntaxhighlight lang="Stata" inline>help bcstats</syntaxhighlight> in Stata after installing the command. The most commonly used options are described below.
:These questions are expected to be difficult. They help research teams to understand if the questionnaire is effectively designed and if enumerators are interpreting difficult and/or nuanced questions correctly and uniformly. If Type 3 variable values differ between the questionnaire and the backcheck, they indicate the need for further enumerator training or, in particular cases, questionnaire modification.  


Back-check surveys may also test for translation issues by including questions that could be interpreted differently by different surveyors. Finally, to test whether enumerators are falsifying data to shorten interviews, back-check questions that determine repeated sections of the questionnaire. For example, if there is a long series of questions about household members, verify the correct number of household members. If an agricultural survey asks for production information by plot, verify the number of plots.  
=== Comparing different variable types ===
As part of the functionalities under '''[''options'']''', <code>bcstats</code> allows users to compare 3 different types of variables.


Note that it is important that enumerators do not know what questions will be audited. To that end, you may consider randomly changing the back-check survey regularly during data collection.
* <code>t1vars()</code>''':''' Specifies the list of '''type 1 variables'''. These are variables that are expected to stay constant between the survey data and the back check. If there are differences for these variables, the '''research team''' may take action against the enumerator. This option displays the variables with high error rates, and completes [[High_Frequency_Checks#Enumerators_Checks|enumerator checks]] for them. Type 1 variables include '''Questions to verify respondent and interview information''', '''Questions to detect fraud''' and '''Questions that repeat multiple times'''.


== Analyzing Back-Check Data ==
* <code>t2vars()</code>''':''' Specifies the list of '''type 2 variables'''. These are variables that may be difficult for enumerators to work with. For instance, they may involve complicated [https://www.surveycto.com/best-practices/using-relevance/ skip patterns] or complex logic. If there are differences between the survey data and the back check for these variables, it may indicate the need for further [[Enumerator Training|training]], but will not result in action against the enumerator. This option displays the error rates of all type 2 variables, and completes '''enumerator checks''' and '''stability checks''' for them. Type 2 variables include '''Questions to detect errors in survey execution'''.


After completing a back-check, you can compare the back-check data to the original survey data. This can be done by using the Stata command <code>bcstats</code>, developed by [http://www.poverty-action.org/ Innovations for Poverty Action]. This command produces a dataset of the comparisons between the back-check and original survey data. The command also completes enumerator checks and stability checks for variables.
* <code>t3vars()</code>''':''' Specifies the list of '''type 3 variables'''. These are variables whose '''stability''' between the survey and back check is of interest to the research team. Differences for these variables between the survey data and back check data will not result in action against the enumerator. This option displays the error rates of all type 3 variables, and completes stability checks for them. Type 3 variables include '''Questions to detect problems with the questionnaire or key outcomes'''.


The steps are as follows:
=== Stability ===
  <nowiki>
<code>bcstats</code> also allows users to test for '''stability''' by running a paired '''t-test''' that compares the sample means of the survey data and the back check data. Users can set the confidence level for the t-test with the <code>level()</code> option. The default is a 95% confidence level.
ssc install bcstats </br>
bcstats, surveydata(''filename'') bcdata(''filename'') id(''varlist'') [options]
</nowiki>
To learn about the options for <code>bcstats</code> and back-checks, please type <code> help bcstats </code> on Stata after installing the command.


==Back to Parent==
Line 52: Line 63:


== Additional Resources ==
*DIME Analytics’ [https://github.com/worldbank/DIME-Resources/blob/master/stata1-4-quality.pdf Real Time Data Quality Checks]
*DIME Analytics (World Bank), [https://osf.io/j8t5f Assuring Data Quality]
*DIME Analytics’ [https://github.com/worldbank/DIME-Resources/blob/master/stata2-4-quality.pdf Data Quality Assurance]
*DIME Analytics (World Bank), [https://osf.io/um5gy Field Data Quality Validation]
* World Health Organization's  [http://unstats.un.org/unsd/hhsurveys/pdf/Chapter_10.pdf '''Quality Assurance in Surveys: standards, guidelines, and procedures''']. This chapter provides, in detail,  the approach and methodology on quality control during surveys.
* World Health Organization's  [http://unstats.un.org/unsd/hhsurveys/pdf/Chapter_10.pdf Quality Assurance in Surveys: standards, guidelines, and procedures].  
*[https://ideas.repec.org/c/boc/bocode/s458173.html bcstats], a  Stata program written by an IPA staff member for conducting back checks on survey data.
[[Category: Research Design]]
[[Category: Field Management ]]

Latest revision as of 19:49, 16 August 2023

Back checks are an important tool that allows the research team to verify the quality and validity of survey data. Throughout the duration of the fieldwork, a back check team returns to a randomly-selected sub-sample of households who have already been interviewed by enumerators. The back check team re-interviews these respondents, using a much smaller set of questions from the actual survey instrument (or questionnaire). This is known as a back check survey, and allows the research team to modify certain aspects of the data collection to improve data quality.

Read First

  • Back checks are an important tool to detect fraud, for instance, enumerators filling out questionnaires themselves.
  • Back checks help researchers to assess the accuracy of data collection, and monitor data quality.
  • Back checks can be conducted by in-person visits, or through phone calls. A complementary approach to in-person back checks is conducting random audio audits.
  • Back checks allow the research team to resolve issues in data collection by improving enumerator training, or replacing low-performing or problematic enumerators.

Logistics

  • Duration. The total duration of each back check survey should be around 10-15 minutes.
  • Specialized enumerators. Hire a team of experienced and skilled enumerators to conduct the back checks.
  • Independent team. The back check team should be independent from the rest of the survey staff. Train them separately, and ensure that there is very little or no contact between the back check team and the survey team.
  • 20%-First 2 Weeks rule. Administer 20% of back checks within the first two weeks of fieldwork. This helps the research team to identify quickly whether the questionnaire is effective, whether enumerators are doing their jobs well, and what changes to make to ensure high quality data collection.

Sampling

The following points are important when selecting the sample for the back check survey:

  • Sample size. Aim to back check 10-20% of the total observations.
  • Stratified sampling. The back check sample should be stratified; that is, it must cover all survey teams and enumerators. Back check every team and every enumerator regularly, and as frequently as possible.
  • Include missing respondents. This verifies that the sample is not biased because enumerators failed to track hard-to-find respondents.
  • Include other flagged observations. Include observations that were flagged in other quality tests like high frequency checks. Also include respondents who were interviewed by enumerators suspected of cheating.
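
As a rough illustration of the sampling points above, a stratified back check sample could be drawn in Stata along the following lines. The dataset name survey_data.dta, the variables hhid and enumerator_id, the 15% share, and the seed are all assumptions made for this sketch, not part of any prescribed procedure.

```stata
* Sketch only: draw a 15% back check sample, stratified by enumerator.
* survey_data.dta, hhid and enumerator_id are hypothetical names.
use "survey_data.dta", clear
isid hhid, sort                    // confirm hhid is unique and fix the sort order
set seed 650824                    // arbitrary seed, so the draw is reproducible
generate double rand = runiform()

* Within each enumerator, flag the observations with the lowest random numbers.
* ceil() guarantees at least one back check per enumerator.
bysort enumerator_id (rand): generate byte bc_sample = _n <= ceil(0.15 * _N)

* Inspect how many observations were selected for each enumerator
tabulate enumerator_id bc_sample
```

Missing respondents and observations flagged by other quality checks would then be added to bc_sample on top of this random draw.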

Designing the Back Check Survey

Questions for the back check survey (or simply back check) are drawn from the actual questionnaire which is used for data collection. There are five types of questions that should be included in a back check to get a clear idea of the data quality, as well as the enumerator's skills:

  • Questions to verify respondent and interview information: Verify the identity of the respondent and check if, when, and where the original survey took place. Useful for verifying reported completion rates.
  • Questions to detect fraud: Questions that ask for straightforward information with no expected variation or room for error. They do not require particularly skilled enumerators, and the answers should not change over time, especially in the period between the actual interview and the back check. Examples include questions about type of dwelling, education level, marital status, and occupation. The actual questions in this category will depend on the survey instrument and context. If the answers to these questions differ between the actual survey and the back check survey, it is a sign of poor data quality, a serious enumerator problem, or potential wrongdoing by the enumerator.
  • Questions to detect errors in survey execution: Questions that have complex loops or skip patterns, or that check for consistency of recorded answers. For example, if household size is recorded as 4, then the number of repeat groups for household members should not be more than 4. Capable enumerators should get the true answer for these questions. If values for these questions differ between the questionnaire and the back check survey, then the enumerator may need more training.
  • Questions to detect problems with the questionnaire or key outcomes: Questions that provide additional checks for accuracy, and flag difficulties or inconsistencies in how enumerators interpret the questions. If these values differ between the actual survey and the back check, then the enumerator may need more training. In some cases, the survey instrument may need to be simplified.
  • Questions that repeat multiple times: Check whether enumerators are falsifying data to reduce the length of interviews. For example, if there is a long series of questions about each household member, verify that the number of times these questions repeat is equal to the number of household members.

Note that it is important that enumerators do not know which questions will be included in the back check survey. To ensure this, consider randomizing questions, or changing the back check survey regularly during data collection.

bcstats

After completing a back check, you can compare the back check data with the original survey data using the Stata command bcstats, developed by Innovations for Poverty Action (IPA). This command produces a dataset that lists the comparisons between the back check and original survey data. The command also allows research teams to perform enumerator checks and stability checks for variables. bcstats groups variables into 3 types, a slightly broader classification than the five question types explained above.

The following syntax is used for performing back checks using bcstats:

ssc install bcstats
* Compare the back check data with the original survey data
bcstats, ///
  surveydata(filename) bcdata(filename) id(varlist) ///
  [options]

To learn more about the options for bcstats, type help bcstats in Stata after installing the command. The most commonly used options are described below.

Comparing different variable types

As part of the functionalities under [options], bcstats allows users to compare 3 different types of variables.

  • t1vars(): Specifies the list of type 1 variables. These are variables that are expected to stay constant between the survey data and the back check. If there are differences for these variables, the research team may take action against the enumerator. This option displays the variables with high error rates, and completes enumerator checks for them. Type 1 variables include Questions to verify respondent and interview information, Questions to detect fraud and Questions that repeat multiple times.
  • t2vars(): Specifies the list of type 2 variables. These are variables that may be difficult for enumerators to work with. For instance, they may involve complicated skip patterns or complex logic. If there are differences between the survey data and the back check for these variables, it may indicate the need for further training, but will not result in action against the enumerator. This option displays the error rates of all type 2 variables, and completes enumerator checks and stability checks for them. Type 2 variables include Questions to detect errors in survey execution.
  • t3vars(): Specifies the list of type 3 variables. These are variables whose stability between the survey and back check is of interest to the research team. Differences for these variables between the survey data and back check data will not result in action against the enumerator. This option displays the error rates of all type 3 variables, and completes stability checks for them. Type 3 variables include Questions to detect problems with the questionnaire or key outcomes.
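
To make the three variable types concrete, a call might look like the sketch below. The file names (survey_data, backcheck_data) and all variable names are hypothetical. The enumerator() and replace options are taken from the bcstats help file as best recalled, so check them against help bcstats for the installed version before relying on them.

```stata
* Sketch only: file and variable names are hypothetical.
* Verify option names with -help bcstats- for your installed version.
bcstats, ///
  surveydata(survey_data) bcdata(backcheck_data) id(hhid) ///
  t1vars(dwelling_type education marital_status) /// should never differ
  t2vars(hh_size n_plots)                        /// skip patterns and loops
  t3vars(monthly_income)                         /// key outcome, stability of interest
  enumerator(enumerator_id)                      /// variable used for enumerator checks
  replace                                        //  overwrite an existing comparisons file
```

Here the type 1 list holds the fraud-detection questions, the type 2 list holds the questions that test survey execution, and the type 3 list holds a key outcome whose interpretation may vary.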

Stability

bcstats also allows users to test for stability by running a paired t-test that compares the sample means of the survey data and the back check data. Users can set the confidence level for the t-test with the level() option. The default is a 95% confidence level.
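
A stability check could be requested roughly as follows. This sketch assumes that the variables to test are passed through a ttest() option, as listed in the bcstats help file. The file names and the variable monthly_income are hypothetical, and the 90% level is chosen only to show how level() changes the default.

```stata
* Sketch only: file and variable names are hypothetical.
* ttest() is assumed from the bcstats help file; confirm with -help bcstats-.
bcstats, ///
  surveydata(survey_data) bcdata(backcheck_data) id(hhid) ///
  t3vars(monthly_income)               /// variable whose stability is of interest
  ttest(monthly_income) level(90)      /// paired t-test at a 90% confidence level
  replace
```

A significant difference in means between the survey and the back check suggests that the variable is not being measured consistently, which may point to a problem with the question rather than with any single enumerator.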

Back to Parent

This article is part of the topic Field Management.

Additional Resources