Monitoring Data Quality
Read First
- The best time to write the data quality checks is in parallel with the Questionnaire Design and the Questionnaire Programming. If the checks are not written in parallel with the questionnaire, they are often completed too late to be relevant, or they omit important tests.
- Data quality checks should be run daily, while the enumerator still remembers the interview and is likely to still be able to go back to the respondent if we have questions.
Theoretical Framework For Beginners
There are many different best practices on what to test during data quality checks. No single practice is best in every case, but this section presents a general framework that is useful to anyone who is new to data quality checks.
This framework groups data quality checks into three categories: Response Quality Checks, Programming Checks, and Enumerator Checks. Many checks fit into more than one of these categories. The tools presented further down were not developed with this framework in mind, so it is not required to think in terms of these categories, but doing so will be helpful for anyone who does not know where to start.
Response Quality Checks
Programming Checks
These tests help us understand whether we have designed and programmed the questionnaire properly. For example, we test whether some sections are always skipped when they should not be.
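One way to look for sections that are always skipped is to list the variables that are empty in every submission received so far. The following is a minimal sketch in Python, not a definitive implementation. The variable names and the `submissions` data are hypothetical, and it assumes that a skipped question is stored as `None` or an empty string.

```python
# Hypothetical submissions: one dict per interview.
# Assumption: a skipped question is stored as None or "".
submissions = [
    {"hh_id": "101", "owns_land": "yes", "land_size": None, "crop": None},
    {"hh_id": "102", "owns_land": "no", "land_size": None, "crop": None},
    {"hh_id": "103", "owns_land": "yes", "land_size": None, "crop": None},
]


def always_missing(rows):
    """Return the sorted names of variables that are empty in every row."""
    if not rows:
        return []
    # Collect every variable name that appears in at least one submission.
    names = set()
    for row in rows:
        names.update(row)
    # Keep only the variables that no submission has a value for.
    return sorted(
        name for name in names
        if all(row.get(name) in (None, "") for row in rows)
    )


never_answered = always_missing(submissions)
print(never_answered)  # ['crop', 'land_size']
```

In this made-up example two households own land, yet `land_size` and `crop` are never filled in, which would suggest that the skip logic for that section deserves a closer look. A variable on this list is not necessarily a bug, since some questions may legitimately not apply to any respondent interviewed so far.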
Enumerator Checks
Enumerator checks help us find out whether any individual enumerator records data that differ systematically from the data recorded by the other enumerators.
The enumerators are the eyes and the ears of the project in the field, and high data quality is only possible if our enumerators are making their best effort. A project that sees enumerator checks primarily as a policing activity will probably not have enumerators who are dedicated to going the extra mile to collect the best data possible. Instead, see enumerator checks as a tool for identifying enumerators who need extra support. Ultimately there must be consequences, but they should be a last resort and not the primary reason for these types of checks.
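As a rough illustration of comparing one enumerator's data with the others, the sketch below computes each enumerator's average interview duration and flags anyone whose average is far from the median of all enumerators' averages. The enumerator IDs, the durations, and the 30 percent tolerance are all hypothetical choices made for this example. A real project would pick the variables and thresholds that matter for its own survey.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical data: (enumerator id, interview duration in minutes).
interviews = [
    ("enum_a", 62), ("enum_a", 58), ("enum_a", 65),
    ("enum_b", 60), ("enum_b", 57), ("enum_b", 63),
    ("enum_c", 21), ("enum_c", 25), ("enum_c", 23),
]


def flag_enumerators(rows, tolerance=0.3):
    """Return the sorted ids of enumerators whose mean value differs from the
    median of all enumerator means by more than `tolerance` (a share)."""
    values_by_enum = defaultdict(list)
    for enum_id, value in rows:
        values_by_enum[enum_id].append(value)
    means = {enum_id: mean(vals) for enum_id, vals in values_by_enum.items()}
    if not means:
        return []
    # The median is used so that one unusual enumerator does not shift
    # the benchmark that everyone else is compared against.
    typical = median(means.values())
    return sorted(
        enum_id for enum_id, m in means.items()
        if abs(m - typical) > tolerance * abs(typical)
    )


needs_follow_up = flag_enumerators(interviews)
print(needs_follow_up)  # ['enum_c']
```

A flag of this kind only shows that something differs. In the spirit of this section, it is a prompt to talk to the enumerator and find out whether she or he needs extra support, not evidence of wrongdoing.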
Practical Tests Used for Data Quality Checks
Duplicates and Survey Logs
It is very important to run quality checks on the data while the survey is still in the field, because it is difficult to fix a problem or re-collect data if the error is found after the survey has been completed.
- Testing for Duplicates - Since SurveyCTO/ODK data often contains duplicates, the first thing you need to do is check for duplicates and remove them.
- Test that all data from the field is on the server - Survey logs from the field can be matched with the survey data on the server to see whether all the data from the field has been transferred to the server.
Tip: Verifying that the data is complete should be done on the day of the survey or the day after. Since the interviewer is most likely still close by, it is easy to re-interview and collect the missing data if significant chunks of data are missing.
To see how to remove duplicates and check that all the field data is on the server, please see the main article at Duplicates and Survey Logs.
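The two tests in this section can be sketched as follows. This is only an illustration in Python under stated assumptions: the household IDs are made up, `field_log_ids` stands for the interviews that the field team reports as completed, and `server_ids` stands for the ID column of the submissions downloaded from the server.

```python
from collections import Counter

# Hypothetical IDs. Assumption: each interview has one unique household id.
field_log_ids = ["101", "102", "103", "104"]  # reported as done in the field
server_ids = ["101", "102", "102", "104"]     # submissions on the server


def find_duplicates(ids):
    """Return the sorted ids that appear more than once."""
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)


def missing_from_server(field_ids, uploaded_ids):
    """Return the sorted ids that are in the field log but not on the server."""
    return sorted(set(field_ids) - set(uploaded_ids))


duplicate_ids = find_duplicates(server_ids)
missing_ids = missing_from_server(field_log_ids, server_ids)
print(duplicate_ids)  # ['102']
print(missing_ids)    # ['103']
```

The sketch only lists duplicated IDs and does not drop any rows. Deciding which copy of a duplicated submission to keep usually requires looking at the submissions themselves.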
High Frequency Checks
After you have verified that all the data is on the server, the following steps should be undertaken:
- High frequency tests of data quality
  - IPA Template only (the template assumes SurveyCTO)
    - If the questionnaire is not written in SurveyCTO, it is possible to adapt the data to the template, or the template to the data, but it might be easier to write your own tests in Stata
  - IPA Template + additional tests in Stata
  - Tests written in Stata only
    - An option if the data is not collected with SurveyCTO
    - This may also be ideal if you wish to add additional checks not covered by the IPA template or by SurveyCTO
- Follow up using the Data Explorer in SurveyCTO
Back Checks
Back Checks, also known as Survey Audits, are a second visit to the household to confirm the interview was conducted and verify key pieces of information. Best practice is for back checks to be completed by an independent third party.
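Once the back check data has been collected, the key pieces of information can be compared with the original interview. The sketch below is one simple way to do this in Python and is not a definitive implementation. The household IDs, variable names, and values are hypothetical, and it assumes that both data sets are keyed by the same household ID.

```python
# Hypothetical data, keyed by household id.
original = {
    "101": {"hh_size": 5, "owns_land": "yes"},
    "102": {"hh_size": 3, "owns_land": "no"},
}
back_check = {
    "101": {"hh_size": 5, "owns_land": "yes"},
    "102": {"hh_size": 4, "owns_land": "no"},
    "105": {"hh_size": 2, "owns_land": "no"},
}


def compare_back_checks(first_visit, second_visit):
    """Return (mismatches, unmatched): mismatches is a list of
    (hh_id, variable, original value, back check value) tuples, and
    unmatched lists back-checked households with no original interview."""
    mismatches = []
    unmatched = []
    for hh_id in sorted(second_visit):
        if hh_id not in first_visit:
            unmatched.append(hh_id)
            continue
        for var, new_value in second_visit[hh_id].items():
            old_value = first_visit[hh_id].get(var)
            if old_value != new_value:
                mismatches.append((hh_id, var, old_value, new_value))
    return mismatches, unmatched


mismatches, unmatched = compare_back_checks(original, back_check)
print(mismatches)  # [('102', 'hh_size', 3, 4)]
print(unmatched)   # ['105']
```

A mismatch is not automatically an error in the original interview. Some answers can legitimately change between visits, so each difference is a starting point for follow-up.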