Checklist: Data Cleaning



<div id="chk_datacleaning"></div>
The checklists are best displayed in Chrome, Firefox, Safari or any other modern browser.


==Back to Parent ==  

Revision as of 19:29, 5 April 2018

To get a printable version, click "Printable version" in the menu on the left. Instructions for editing the checklist are given at the end of this page. The latest version of this checklist can be found at https://dimewiki.worldbank.org/wiki/Checklist:_Data_Cleaning.

For more detailed instructions on how to implement the tasks in this checklist, see the Data Cleaning article.

Project name: _______________________________________
Country: ___________________________________________
District: ____________________________________________
Year, Month and/or Day: _____________________________
1. Before data cleaning: Importing the data
Initials   No.   Checklist item
[ __ ]   1.1   Check for importing issues, such as broken lines when importing .csv files
[ __ ]   1.2   Make sure the ID variable uniquely identifies each observation
[ __ ]   1.3   De-identify all data and save it in a new .dta file
[ __ ]   1.4   Never make any changes to the raw data
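The checklist is written for Stata (.dta files), but the import checks in section 1 can be illustrated in any language. Below is a minimal sketch in Python using only the standard library. It assumes the raw data arrive as CSV text with a header row. The function names, the `hhid` ID column and the `name` PII column are hypothetical.

The sketch does three things:
- It flags records whose field count differs from the header's, a common symptom of broken lines.
- It lists duplicated IDs.
- It builds a de-identified copy without modifying the raw rows.

```python
import csv
import io
from collections import Counter


def check_rows(text):
    """Parse CSV text; return (header, good_rows, bad_line_numbers).

    A record whose field count differs from the header's is a typical sign of
    a broken line (for example, an unquoted line break inside a field).
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    good, bad = [], []
    for row in reader:
        if len(row) != len(header):
            bad.append(reader.line_num)  # physical line where this record ended
        else:
            good.append(row)
    return header, good, bad


def duplicate_ids(header, rows, id_col):
    """Return the sorted ID values that appear more than once."""
    idx = header.index(id_col)
    counts = Counter(row[idx] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def deidentify(header, rows, pii_cols):
    """Return a new header and new rows without the PII columns.

    The input lists are not modified, so the raw data stay untouched.
    """
    keep = [i for i, name in enumerate(header) if name not in pii_cols]
    return [header[i] for i in keep], [[row[i] for i in keep] for row in rows]
```

For example, in the input `"hhid,name,age\n1,Ana,34\n2,Ben\n,41\n1,Cai,29\n"` the record for Ben is split across lines 3 and 4, so `check_rows` reports those two lines, and `duplicate_ids` reports ID `"1"`. In Stata itself, the ID check would typically use `isid` or `duplicates report`. De-identification would typically use `drop` followed by `save` under a new file name.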
2. Important steps for data cleaning
Initials   No.   Checklist item
[ __ ]   2.1   Label variables; don’t use special characters
[ __ ]   2.2   Recode and label missing values: for example, your data set should not contain observations with values such as -777, -88 or -9
[ __ ]   2.3   Encode variables: all categorical variables should be saved as labeled numeric variables, not as strings
[ __ ]   2.4   Don’t change the variable names from the questionnaire, except for nested repeat groups and reshaped roster data
[ __ ]   2.5   Check that the sample is representative in terms of age, gender, urban/rural location, region and religion
[ __ ]   2.6   Check that administrative data, such as date, time and interviewer variables, are included
[ __ ]   2.7   Test the consistency of variables
[ __ ]   2.8   Identify and document outliers
[ __ ]   2.9   Compress the data set so it is saved in the most efficient format
[ __ ]   2.10   Save the cleaned data set with an informative name. Avoid saving in a very recent Stata version, since older versions of Stata may not be able to open the file
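The checklist assumes Stata, but several section 2 items can be illustrated with a short Python sketch using only the standard library. It covers recoding missing-value codes (2.2), encoding categorical strings (2.3), consistency tests (2.7) and outlier identification (2.8). This is an illustration, not DIME's prescribed procedure:
- The missing-value codes are the examples from item 2.2.
- Outliers are flagged with Tukey's 1.5 × IQR rule, which is one common convention among several.
- Records are assumed to be dictionaries with a hypothetical `"id"` key.

```python
import statistics

# Example missing-value codes from item 2.2; replace with your survey's codes.
MISSING_CODES = {-777, -88, -9}


def recode_missing(values, codes=MISSING_CODES):
    """Replace missing-value codes with None."""
    return [None if v in codes else v for v in values]


def encode(values):
    """Map string categories to integer codes 1, 2, ... in sorted order.

    Returns the codes and a {code: label} dictionary, similar in spirit to a
    labeled numeric variable.
    """
    to_code = {label: i for i, label in enumerate(sorted(set(values)), start=1)}
    return [to_code[v] for v in values], {i: label for label, i in to_code.items()}


def failed_consistency(records, rule):
    """Return the IDs of records for which rule(record) is False."""
    return [r["id"] for r in records if not rule(r)]


def iqr_outliers(values, k=1.5):
    """Return positions of values outside [Q1 - k*IQR, Q3 + k*IQR].

    None entries (missing values) are ignored.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [i for i, v in enumerate(values) if v is not None and not low <= v <= high]
```

A consistency rule is passed as a function. For instance, `lambda r: r["years_in_village"] <= r["age"]` uses hypothetical variable names to flag respondents who report living in the village longer than they have been alive. In Stata, coded missing values are usually recoded with `mvdecode` to extended missing values (such as `.a` or `.d`) with value labels. This keeps the reason for missingness, whereas `None` in this sketch discards it.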
3. Optional steps in data cleaning
Initials   No.   Checklist item
[ __ ]   3.1   Order variables: unique ID always first, then the same order as in the questionnaire
[ __ ]   3.2   Drop variables that only make sense for questionnaire review (duration, notes, calculate fields)
[ __ ]   3.3   Rename roster variables
[ __ ]   3.4   Categorize responses listed as “other”
[ __ ]   3.5   Add metadata as notes: original survey question, relevance, constraints, etc.
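Optional steps 3.1 and 3.2 amount to choosing which columns to keep and in what order. Here is a minimal Python sketch, assuming the questionnaire order is available as a list of variable names (for example, exported from the survey form). `column_order` and the example names are hypothetical. Columns not in the questionnaire, such as constructed variables, are kept after the questionnaire variables in their current order.

```python
def column_order(columns, id_col, questionnaire_order, drop=frozenset()):
    """Return column names with the ID first, then the questionnaire order.

    Columns listed in `drop` are removed; columns not in the questionnaire
    (e.g. constructed variables) follow in their current order.
    """
    if id_col not in columns:
        raise ValueError(f"ID column {id_col!r} not found")
    rank = {name: i for i, name in enumerate(questionnaire_order)}
    kept = [c for c in columns if c != id_col and c not in drop]
    from_questionnaire = sorted((c for c in kept if c in rank), key=rank.__getitem__)
    others = [c for c in kept if c not in rank]
    return [id_col] + from_questionnaire + others
```

In Stata, the resulting list would typically be applied with `drop` for the review-only variables and `order` for the rest.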
The checklists are edited through GitHub. This checklist corresponds to the file chk_datacleaning.js. For a simple step-by-step guide on how to edit the checklists, see:
https://github.com/worldbank/DIMEwiki/tree/master/Topics/Checklists.



This article is part of the topic Check Lists