High Frequency Checks

Revision as of 16:03, 4 March 2021

Before starting data collection, the research team should work with the field team to design and code high frequency checks as part of the data quality assurance plan. The best time to design and code these checks is in parallel with questionnaire design and programming. They should be run daily, to provide real-time information to the field team and the research team for all surveys.

Read First

  • IPA (https://www.poverty-action.org/) has created a Stata package called ipacheck (https://github.com/PovertyAction/high-frequency-checks) to carry out high frequency checks on survey data.

General Principles

While survey firms often conduct internal quality checks, it is important for the research team to run their own set of high frequency checks to improve the quality of research outputs. For this purpose, DIME has proposed the following four general principles for conducting quality checks on various data sources:

  1. Completeness.
  2. Consistency.
  3. Anomalous data points.
  4. Real-time.

In the following sections we look at how the research team can follow these principles, along with examples.

Completeness

Consistency

Anomalous data points

Real-time

Types

Response Quality Checks

Response quality checks monitor the consistency of responses across the survey instrument and the range within which responses fall.

  • Consistency of responses across the survey instrument: most consistency tests can and should be built into the questionnaire programming via logic and constraints. However, some checks may be overly complex to program in the survey instrument, particularly when comparing responses across rosters or dealing with multiple units. For example, imagine we ask about plot and harvest size and allow the respondent to answer in the unit of their choice. To test whether the harvest in kilos per hectare is plausible, we need to convert harvest and plot size to kilos and hectares, which may be challenging to program within the questionnaire itself. As a rule of thumb, program as many checks as possible into the survey instrument, and include the rest in the HFC do-file or script.
  • Reasonable ranges of responses: while range checks should always be programmed into the survey instrument, questionnaires typically employ 'soft' constraints (i.e. warning the enumerator that a response is unusual but allowing the interview to continue). HFC data checks should therefore look for extreme values and outliers and confirm whether they make sense in context. They should also check the range of constructed indicators; multiplication or division can create or expose outliers even when the numerator and denominator are both reasonable. For example, say a household reported a plot size of 0.05 hectares (the low end of an acceptable range) and a maize harvest of 1,000 kg (within an acceptable range): the implied yield of 20,000 kg/ha is an extreme outlier.
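
The unit-conversion and yield check described above can be sketched in Python (the article's tooling, ipacheck, is Stata-based, so this is illustrative only; the unit names, conversion factors, and the 10,000 kg/ha cutoff are assumptions, not project values):

```python
# Hypothetical conversion factors; real projects need locally validated ones.
AREA_TO_HA = {"hectare": 1.0, "acre": 0.4047}
WEIGHT_TO_KG = {"kg": 1.0, "ton": 1000.0}

def yield_kg_per_ha(harvest, weight_unit, plot, area_unit):
    """Convert a reported harvest and plot size to a yield in kg/ha."""
    return (harvest * WEIGHT_TO_KG[weight_unit]) / (plot * AREA_TO_HA[area_unit])

def flag_outlier_yields(rows, max_yield=10_000):
    """Return (id, yield) pairs where the computed yield exceeds a plausible
    maximum in kg/ha, even though each input was within its own soft range."""
    flagged = []
    for r in rows:
        y = yield_kg_per_ha(r["harvest"], r["weight_unit"], r["plot"], r["area_unit"])
        if y > max_yield:
            flagged.append((r["id"], round(y)))
    return flagged

rows = [
    {"id": "hh01", "harvest": 1000, "weight_unit": "kg", "plot": 0.05, "area_unit": "hectare"},
    {"id": "hh02", "harvest": 2, "weight_unit": "ton", "plot": 1.0, "area_unit": "hectare"},
]
print(flag_outlier_yields(rows))  # [('hh01', 20000)] -- the 20,000 kg/ha outlier
```

This is exactly the plot-size example from the text: each raw response passes its own range check, but the constructed indicator exposes the outlier.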

Programming Checks

Programming checks help the research team to understand if they have designed and programmed the questionnaire properly. Most programming errors should be caught when testing the questionnaire, but it is impossible to test all possible outcomes before data collection. Including programming checks in the HFC is especially important when the team has made last-minute edits to the survey instrument.
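
A typical programming check is verifying that skip patterns held in the submitted data. A minimal Python sketch (the field names owns_livestock and n_livestock are invented for illustration):

```python
def check_skip_logic(rows):
    """Flag submissions that violate a skip pattern: households reporting
    no livestock (owns_livestock == 0) should have no livestock count."""
    return [r["id"] for r in rows
            if r["owns_livestock"] == 0 and r.get("n_livestock") is not None]

rows = [
    {"id": "hh01", "owns_livestock": 0, "n_livestock": None},  # consistent
    {"id": "hh02", "owns_livestock": 0, "n_livestock": 4},     # violation
    {"id": "hh03", "owns_livestock": 1, "n_livestock": 4},     # consistent
]
print(check_skip_logic(rows))  # ['hh02']
```

A violation like hh02 usually indicates a bug introduced by a last-minute edit to the form, which is why these checks matter most after the instrument has changed.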

Enumerator Checks

Enumerator checks help the research team determine whether any individual enumerator's data differs significantly from other enumerators' data, or from the overall mean for a given question. These checks should:

  • Check the percentage of “don’t know” and refusal responses by enumerator.
  • Check the distribution of responses for key questions by enumerator.
  • Check the number of surveys per day by enumerator.
  • Check the average interview duration by enumerator.
  • Check the duration of the consent module by enumerator.
  • Check the duration of other modules by enumerator (anthropometrics, games, etc.).

These statistics can be output into an enumerator dashboard. Keeping track of survey team metrics and discussing them frequently with enumerators and team leaders maintains accountability and transparency, and can boost motivation. See more on SurveyCTO’s tracking dashboard.
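
The first check in the list above, the share of “don’t know”/refusal responses per enumerator, can be sketched in Python (the codes -99/-98 and the 15% threshold are assumptions; projects should substitute their own missing-value codes and cutoffs):

```python
from collections import defaultdict

def dont_know_rates(submissions, dk_codes=(-99, -98)):
    """Per-enumerator share of answers coded as don't know or refusal."""
    counts = defaultdict(lambda: [0, 0])  # enumerator -> [dk count, total]
    for s in submissions:
        for v in s["answers"]:
            counts[s["enumerator"]][0] += v in dk_codes
            counts[s["enumerator"]][1] += 1
    return {e: dk / total for e, (dk, total) in counts.items()}

def flag_enumerators(rates, threshold=0.15):
    """Flag enumerators whose rate exceeds the chosen threshold."""
    return sorted(e for e, r in rates.items() if r > threshold)

subs = [
    {"enumerator": "enum_a", "answers": [1, 2, -99, 3]},
    {"enumerator": "enum_a", "answers": [2, 2, 1, 4]},
    {"enumerator": "enum_b", "answers": [-99, -98, 1, -99]},
]
print(flag_enumerators(dont_know_rates(subs)))  # ['enum_b'] (3 of 4 answers)
```

The same pattern (aggregate by enumerator, compare against the pool) extends to the duration and surveys-per-day checks.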

Duplicates and Survey Log Checks

Duplicate and survey log checks confirm that all data collected in the field has reached the server intact. They should:

  • Test that all data from the field is on the server: match survey data logs from the field with survey data logs on the server to confirm that all the data from the field has been transferred to the server.
  • Test for target number: since surveys are submitted in daily waves, keep track of the number of surveys submitted against the target number needed for an area to be complete.
  • Test for duplicates: since SurveyCTO/ODK data can contain duplicate submissions, check for duplicates using ieduplicates.

Verifying these details as soon as possible is critical: if you run daily checks, the enumerator is most likely still nearby, making it easy to re-interview the respondent and recover missing data if the HFC shows this is necessary.
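
A minimal Python sketch of the duplicate and target-number tests above (field names and the grouping key are illustrative; in Stata, the ieduplicates command handles the duplicate check):

```python
from collections import Counter

def find_duplicates(submissions, key="survey_id"):
    """Return survey ids submitted more than once."""
    counts = Counter(s[key] for s in submissions)
    return sorted(k for k, n in counts.items() if n > 1)

def completion_status(submissions, targets):
    """Compare submissions per area against target counts: area -> (done, target)."""
    done = Counter(s["area"] for s in submissions)
    return {a: (done.get(a, 0), t) for a, t in targets.items()}

subs = [
    {"survey_id": "s1", "area": "north"},
    {"survey_id": "s2", "area": "north"},
    {"survey_id": "s2", "area": "north"},  # duplicate submission
    {"survey_id": "s3", "area": "south"},
]
print(find_duplicates(subs))                              # ['s2']
print(completion_status(subs, {"north": 5, "south": 2}))  # {'north': (3, 5), 'south': (1, 2)}
```

Note that duplicates inflate the completion counts, which is one reason to resolve them before reporting progress against targets.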

Other Checks

For the validity of the research, research teams should also monitor that the units assigned to treatment actually receive the treatment (or at least an offer) and that control households do not. Treatment contamination occurs when some control units receive the treatment; it should be minimized as much as possible. Research teams may monitor treatment via physical checks or administrative checks:

  • Physical checks can ensure that the treatment was applied to the units selected for treatment as stated in the impact evaluation protocol: in the correct order, according to the script, and using marketing materials as planned.
  • Administrative checks can examine attendance and account records, and use pre-/post-tests, to ensure that only units to which treatment was offered participated in the intervention. The exact details of this method depend on both the intervention and the available data. With interventions that involve savings accounts, for example, you may check the details of the accounts opened to ensure that the account opener is related to a treatment household in some way.
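
An administrative contamination check of this kind reduces to comparing the assignment list against records of who actually received the intervention. A Python sketch (the data structures are illustrative):

```python
def contamination_check(assignments, received):
    """Compare treatment assignment with administrative delivery records.

    assignments: dict mapping unit id -> 'treatment' or 'control'
    received:    set of unit ids recorded as receiving the intervention
    Returns (contaminated control ids, unreached treatment ids)."""
    contaminated = sorted(i for i, arm in assignments.items()
                          if arm == "control" and i in received)
    missed = sorted(i for i, arm in assignments.items()
                    if arm == "treatment" and i not in received)
    return contaminated, missed

assign = {"hh1": "treatment", "hh2": "treatment", "hh3": "control", "hh4": "control"}
got = {"hh1", "hh3"}
print(contamination_check(assign, got))  # (['hh3'], ['hh2'])
```

Both outputs matter: contaminated controls threaten the validity of the comparison, while unreached treatment units dilute the measured effect.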

Related Pages

Additional Resources