Checklist: Data Cleaning
A printable version and the latest version of this checklist are available online. For more detailed instructions on how to implement the tasks in this checklist, see Data Cleaning.
![]() | |||
---|---|---|---|
Project name: _______________________________________ | |||
Country: ___________________________________________ | |||
District: ____________________________________________ | |||
Year, Month and/or Day: _____________________________ | |||
1. Before data cleaning: Importing the data | |||
Initials | #No | Checklist Item | |
[ __ ] | 1.1 | Check for importing issues such as broken lines when importing .csv files | |
[ __ ] | 1.2 | Make sure you have unique IDs | |
[ __ ] | 1.3 | De-identify all data and save in a new .dta file | |
[ __ ] | 1.4 | Never make any changes to the raw data | |
2. Important steps for data cleaning | |||
Initials | #No | Checklist Item | |
[ __ ] | 2.1 | Label variables; do not use special characters in the labels | |
[ __ ] | 2.2 | Recode and label missing values: for example, your data set should not contain observations with placeholder codes such as -777, -88 or -9 | |
[ __ ] | 2.3 | Encode variables: all categorical variables should be saved as labeled numeric variables, no strings | |
[ __ ] | 2.4 | Do not change variable names from those in the questionnaire, except for nested repeat groups and reshaped roster data | |
[ __ ] | 2.5 | Check sample representativeness of age, gender, urban/rural, region and religion | |
[ __ ] | 2.6 | Check that administrative data such as date, time and interviewer variables are included | |
[ __ ] | 2.7 | Test the consistency of variables | |
[ __ ] | 2.8 | Identify and document outliers | |
[ __ ] | 2.9 | Compress dataset so it is saved in the most efficient format | |
[ __ ] | 2.10 | Save the cleaned data set with an informative name. Avoid saving in the file format of a very recent Stata version | |
3. Optional steps in data cleaning | |||
Initials | #No | Checklist Item | |
[ __ ] | 3.1 | Order variables – unique ID always first, then same order as questionnaire | |
[ __ ] | 3.2 | Drop variables that only make sense for questionnaire review (duration, notes, calculate fields) | |
[ __ ] | 3.3 | Rename roster variables | |
[ __ ] | 3.4 | Categorize answers recorded as “other” | |
[ __ ] | 3.5 | Add metadata as notes: original survey question, relevance, constraints, etc. | |
The checklists are edited through GitHub. This checklist corresponds to the file named chk_datacleaning.js. For a simple step-by-step guide on how to edit the checklist, see this documentation: https://github.com/worldbank/DIMEwiki/tree/master/Topics/Checklists. |
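
As an illustration of items 1.2, 1.4 and 2.2, the following is a minimal sketch in plain Python rather than Stata. The variable names (`hhid`, `age`), the helper functions and the meanings attached to -777, -88 and -9 are all assumptions made for this example; the actual codes and their meanings should come from the questionnaire.

```python
from collections import Counter

# Assumed meanings for the placeholder codes; replace with the ones
# defined in the actual questionnaire.
MISSING_CODES = {-777: "refused", -88: "don't know", -9: "not applicable"}


def duplicate_ids(rows, id_var="hhid"):
    """Return ID values that appear more than once (item 1.2)."""
    counts = Counter(row[id_var] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def recode_missing(rows, variables):
    """Return a cleaned copy in which placeholder codes become None and the
    reason is kept in a companion '<var>_missing' field (item 2.2).
    The input rows are never modified (item 1.4)."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy, so the raw data stays untouched
        for var in variables:
            if new_row[var] in MISSING_CODES:
                new_row[var + "_missing"] = MISSING_CODES[new_row[var]]
                new_row[var] = None
        cleaned.append(new_row)
    return cleaned


# Hypothetical raw data with one duplicated ID and one placeholder code.
raw = [
    {"hhid": 101, "age": 34},
    {"hhid": 102, "age": -88},
    {"hhid": 102, "age": 51},
]

print(duplicate_ids(raw))          # IDs that need to be resolved
clean = recode_missing(raw, ["age"])
print(clean[1])                    # recoded copy of the second row
```

Keeping the reason for missingness in a separate field is only one possible design. In Stata the same information is usually stored as extended missing values (`.a`, `.b`, ...) with value labels.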
This article is part of the topic Check Lists