Difference between revisions of "Checklist: Data Cleaning"
Kbjarkefur (talk | contribs) |
Revision as of 19:29, 5 April 2018
Get a printable version by clicking "Printable version" in the menu to the left. Instructions for editing the checklist are given at the bottom of this page. The latest version of this checklist can be found at https://dimewiki.worldbank.org/wiki/Checklist:_Data_Cleaning.
For more detailed instructions on how to implement the different tasks in this checklist, see Data Cleaning.
| | | |
---|---|---|---|
Project name: _______________________________________ | |||
Country: ___________________________________________ | |||
District: ____________________________________________ | |||
Year, Month and/or Day: _____________________________ | |||
1. Before data cleaning: Importing the data | |||
Initials | #No | Checklist Item | |
[ __ ] | 1.1 | Check for importing issues such as broken lines when importing .csv files | |
[ __ ] | 1.2 | Make sure you have unique IDs | |
[ __ ] | 1.3 | De-identify all data and save in a new .dta file | |
[ __ ] | 1.4 | Never make any changes to the raw data | |
2. Important steps for data cleaning | |||
Initials | #No | Checklist Item | |
[ __ ] | 2.1 | Label variables; don’t use special characters | |
[ __ ] | 2.2 | Recode and label missing values: for example, your data set should not contain survey codes such as -777, -88 or -9 as regular values | |
[ __ ] | 2.3 | Encode variables: all categorical variables should be saved as labeled numeric variables, not strings | |
[ __ ] | 2.4 | Don’t change variable names from the questionnaire, except for nested repeat groups and reshaped roster data | |
[ __ ] | 2.5 | Check sample representativeness of age, gender, urban/rural, region and religion | |
[ __ ] | 2.6 | Check that administrative data such as date, time and interviewer variables are included | |
[ __ ] | 2.7 | Test variable consistency | |
[ __ ] | 2.8 | Identify and document outliers | |
[ __ ] | 2.9 | Compress dataset so it is saved in the most efficient format | |
[ __ ] | 2.10 | Save the cleaned data set with an informative name. Avoid saving in the format of a very recent Stata version, which older versions cannot open | |
3. Optional steps in data cleaning | |||
Initials | #No | Checklist Item | |
[ __ ] | 3.1 | Order variables – unique ID always first, then same order as questionnaire | |
[ __ ] | 3.2 | Drop variables that only make sense for questionnaire review (duration, notes, calculates) | |
[ __ ] | 3.3 | Rename roster variables | |
[ __ ] | 3.4 | Categorize variables listed as “others” | |
[ __ ] | 3.5 | Add metadata as notes: original survey question, relevance, constraints, etc. | |
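
As a rough illustration of items 1.2 and 2.2, the sketch below checks for duplicate IDs and recodes survey codes to missing values. The checklist itself targets Stata; this is only a language-neutral sketch in plain Python, not the method used by the checklist's authors. The field names (`hhid`, `age`, `income`), the function names, and the meanings attached to the codes -777, -88 and -9 are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical mapping from survey codes to reasons for missingness. The codes
# come from item 2.2; the meanings attached to them are assumptions.
MISSING_CODES = {-777: "not applicable", -88: "don't know", -9: "refused"}


def duplicate_ids(rows, id_field="hhid"):
    """Return the sorted ID values that appear more than once (item 1.2)."""
    counts = Counter(row[id_field] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def recode_missing(rows, fields, codes=MISSING_CODES):
    """Return copies of rows with survey codes replaced by None (item 2.2).

    The reason for each missing value is stored under '<field>_missing'.
    The input rows are copied, so the raw data is left unchanged (item 1.4).
    """
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for field in fields:
            value = new_row.get(field)
            if value in codes:
                new_row[field] = None
                new_row[f"{field}_missing"] = codes[value]
        cleaned.append(new_row)
    return cleaned


# Made-up example records, not real survey data.
raw = [
    {"hhid": 101, "age": 34, "income": 1200},
    {"hhid": 102, "age": -88, "income": -9},
    {"hhid": 102, "age": 51, "income": 800},
]

print(duplicate_ids(raw))
cleaned = recode_missing(raw, ["age", "income"])
print(cleaned[1])
```

Keeping the reason for missingness in a separate key mirrors the idea of labeled missing values: the numeric field no longer carries a code that could distort a mean or a regression, but the information about why the value is missing is preserved.
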
The checklists are edited through GitHub. This checklist corresponds to the file named chk_datacleaning.js. For a simple step-by-step guide on how to edit the checklist, see this documentation: https://github.com/worldbank/DIMEwiki/tree/master/Topics/Checklists.
The checklists are best displayed in Chrome, Firefox, Safari or any other modern browser.