Monitoring Data Quality

Read First

  • The best time to write data quality checks is in parallel with Questionnaire Design and Questionnaire Programming. Checks that are not written alongside the questionnaire are often completed too late to be relevant, or omit important tests.
  • Data quality checks should be run daily. The enumerator will still remember the interview if we have questions, and she/he is likely to still be able to go back to the respondent to verify an answer.

Theoretical Framework

There are many different best practices for which checks to run when monitoring data quality. No single practice is best in every case, but this section presents a general framework useful to anyone who is new to data quality checks.

This framework sorts data quality checks into three categories: Response Quality Checks, Programming Checks, and Enumerator Checks. Many checks fit into more than one of these categories. The tools presented further down were not developed with this framework in mind, so it is not required to think in terms of these categories, but doing so will be helpful for anyone who does not know where to start.

Response Quality Checks

Here we test the quality of the responses given by the respondents. Since we do not know the true value to compare each answer against, it is not always obvious how to test this. One approach is to test two answers against each other. For example, a respondent who is recorded as male should not also be recorded as pregnant. Most of these tests can and should be handled in the questionnaire programming by restricting questions to plausible answers. In this example, the pregnancy question should have been shown only to female respondents. However, an in-questionnaire test is not always this straightforward. Suppose we ask about plot size and harvest size and allow the respondent to answer in the unit of his/her choice. To test whether the harvest in kilos per hectare is plausible, we first need to convert harvest and plot size to kilos and hectares. While this is possible in SurveyCTO, it is not obvious that it is worth the effort and the added complexity in the questionnaire form.
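
The two examples above, comparing two answers against each other and converting units before testing plausibility, can be sketched as an ex-post check in Python. This is a minimal illustration, not a tool from this page: the variable names (sex, pregnant, harvest, plot and their units), the set of units, and the 10,000 kg/ha threshold are all assumptions made for the example and would need to be replaced with values appropriate to the actual survey.

```python
# Illustrative conversion tables. The unit names are hypothetical; a real
# survey would list the units actually offered in its questionnaire.
KG_PER_UNIT = {"kg": 1.0, "tonne": 1000.0, "bag_50kg": 50.0}
HA_PER_UNIT = {"hectare": 1.0, "acre": 0.4047, "m2": 0.0001}


def yield_kg_per_ha(harvest, harvest_unit, plot, plot_unit):
    """Convert harvest and plot size to kilos and hectares, then divide."""
    kilos = harvest * KG_PER_UNIT[harvest_unit]
    hectares = plot * HA_PER_UNIT[plot_unit]
    return kilos / hectares


def flag_responses(records, max_yield=10000.0):
    """Return (id, message) pairs for responses that look implausible.

    max_yield is an assumed threshold in kg/ha, chosen only for illustration.
    """
    flags = []
    for r in records:
        # Test two answers against each other.
        if r["sex"] == "male" and r["pregnant"] == "yes":
            flags.append((r["id"], "male respondent recorded as pregnant"))
        # Test plausibility after converting to common units.
        if r["plot"] <= 0:
            flags.append((r["id"], "plot size is not positive"))
            continue
        y = yield_kg_per_ha(r["harvest"], r["harvest_unit"],
                            r["plot"], r["plot_unit"])
        if y > max_yield:
            flags.append((r["id"], f"implausible yield: {y:.0f} kg/ha"))
    return flags


if __name__ == "__main__":
    example = [
        {"id": 1, "sex": "female", "pregnant": "no",
         "harvest": 20, "harvest_unit": "bag_50kg",
         "plot": 1, "plot_unit": "acre"},
        {"id": 2, "sex": "male", "pregnant": "yes",
         "harvest": 500, "harvest_unit": "kg",
         "plot": 1, "plot_unit": "hectare"},
    ]
    for record_id, message in flag_responses(example):
        print(record_id, message)
```

Doing the conversion after data collection keeps the questionnaire form simple, and the flags are only prompts for follow-up with the enumerator: a flagged value may still turn out to be correct.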

Soft in-questionnaire constraints, hard ex-post quality checks. Another reason not to rely too heavily on response quality checks inside the questionnaire is that we might simply force respondents to answer within the range that we as researchers think is plausible, regardless of whether our ex-ante expectation is true. See soft constraints.

Another common method for testing whether an answer is the true value is back checks. This method is described below. It is much more precise than testing answers against each other, but it is also much more expensive.

Programming Checks

These tests help us understand whether we have designed and programmed the questionnaire properly. Most programming errors will be caught when we test the questionnaire, either with mock data at our desk or when piloting. However, it is impossible to test all possible outcomes before data collection, so we need to build in some tests that check for programming errors during data collection. Although we should avoid it, reasons outside our control commonly force us to make last-minute edits to the questionnaire, and we will not have as much time to test those edits thoroughly. Therefore, we should have programming checks planned and written in advance.
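
One kind of programming check that can be written in advance is a skip-logic check: for each question, state when it should have been asked, then verify that the incoming data agrees. The sketch below is only an illustration under assumed names. The rule table RULES, the function check_skip_logic, and the variables sex and pregnant are hypothetical and are not taken from any tool described on this page.

```python
# Each rule maps a variable name to a function that returns True when the
# question should have been asked for that record. Hypothetical example rule:
# the pregnancy question is relevant only for female respondents.
RULES = {
    "pregnant": lambda r: r["sex"] == "female",
}


def check_skip_logic(records, rules):
    """Return (id, variable, message) for every skip-logic violation."""
    problems = []
    for r in records:
        for var, is_relevant in rules.items():
            # Treat a missing key, None, and the empty string as unanswered.
            answered = r.get(var) not in (None, "")
            if is_relevant(r) and not answered:
                problems.append(
                    (r["id"], var, "missing but should have been asked"))
            elif not is_relevant(r) and answered:
                problems.append(
                    (r["id"], var, "answered but should have been skipped"))
    return problems


if __name__ == "__main__":
    example = [
        {"id": 1, "sex": "female", "pregnant": "no"},
        {"id": 2, "sex": "male", "pregnant": "yes"},
    ]
    for problem in check_skip_logic(example, RULES):
        print(problem)
```

Keeping the rules in one table makes it quick to add a rule after a last-minute edit to the questionnaire, which is exactly when thorough testing is least likely to have happened.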