Difference between revisions of "Duplicates and Survey Logs"

Revision as of 15:32, 26 January 2017

Read First

  • Download the data and check it for duplicates every day. Duplicates are much easier to resolve while the field team still remembers the interview, and other data quality checks depend on ID variables that uniquely identify each observation.

Types of Duplicates in SurveyCTO

Before analyzing the outcomes of quality checks, and sometimes even before running real-time quality checks, check the data for duplicates. Duplicates are common in ODK/SurveyCTO and need to be removed before other data quality checks begin. There are three main types of duplicates in SurveyCTO:

  • Double submission of the same observation with the same data: this happens when the first upload from the tablet to the server is interrupted by a bad internet connection.
  • Double submission of the same observation with modified data (rare in SurveyCTO): this happens when an answer is modified after the original survey was submitted and the survey is then resubmitted. This is bad practice; it is more transparent to correct errors in the do-file instead.
  • Incorrectly assigned ID, i.e. two respondents with the same ID: this is caused by a typo made in the field when the respondent ID is entered.
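As a rough illustration of how these three types can be told apart, the sketch below uses Stata's built-in duplicates command on a small made-up dataset. The variable names hhid, key and income and all values are hypothetical and chosen only for this example; it is not the method used by ieduplicates.

```stata
* Toy data: hypothetical variable names and values, not from a real survey
clear
input str6 hhid str12 key float income
"HH001" "uuid:a1" 100
"HH001" "uuid:a1" 100
"HH002" "uuid:b1" 250
"HH002" "uuid:b2" 300
"HH003" "uuid:c1" 80
"HH003" "uuid:d1" 410
"HH004" "uuid:e1" 120
end

* Copies identical on every variable: candidates for the first type
duplicates tag, generate(dup_all)

* Observations that share an ID, whether or not the other data match
duplicates tag hhid, generate(dup_id)

* Same ID but different content: second or third type, to review with the field team
generate byte needs_review = dup_id > 0 & dup_all == 0

list hhid key income dup_all dup_id needs_review, sepby(hhid)
```

Exact copies can usually be dropped safely, while observations flagged in needs_review call for a decision by someone who knows what happened in the field.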


To remove duplicates, you can use DIME's Stata command ieduplicates, which is part of the ietoolkit Stata package.

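* Install the package once from SSC
ssc install ietoolkit
* Recent versions also need a report file and a uniquely identifying variable (placeholders here); check "help ieduplicates" for your installed version
ieduplicates ID_varname using "duplicates_report.xlsx", uniquevars(key)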

This identifies the duplicates in the ID variable and exports them to an Excel file, which is also used to correct the duplicates in Stata. Field supervisors without knowledge of Stata can enter the corrections in the Excel file, and the duplicates will be corrected the next time you run the code.

Comparing Server Data to Field Logs

Comparing server data to field logs verifies that all the data collected during the survey has reached your server. This can be done in a few steps:

  • Gener
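One possible way to make such a comparison in Stata is to merge the server data with a digitized field log on the ID variable. The sketch below is only an illustration under assumptions: the variable names hhid and income, the toy values, and the idea that the field log holds one row per completed interview are all hypothetical and not taken from this article.

```stata
* Hypothetical field log: one row per interview the field team reports as completed
clear
input str6 hhid
"HH001"
"HH002"
"HH003"
end
tempfile fieldlog
save `fieldlog'

* Hypothetical server data, assumed to be free of duplicate IDs already
clear
input str6 hhid float income
"HH001" 100
"HH003" 80
"HH005" 60
end

* Match one-to-one on the ID and keep the merge result for inspection
merge 1:1 hhid using `fieldlog', generate(log_match)

* log_match == 1: on the server but not in the log
* log_match == 2: in the log but missing from the server
list hhid log_match if log_match != 3
```

A 1:1 merge stops with an error if the ID is not unique in either file, which is one reason to resolve duplicates before making this comparison.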

Back to Parent

This article is part of the topic Monitoring Data Quality
