Data Cleaning

Data cleaning is an essential step between data collection and data analysis. Its aims are to (i) identify data errors, (ii) correct those errors, and (iii) improve the data collection process.


Read First

It is very difficult to put in place a data collection procedure that generates error-free raw data. Any raw data set needs some level of cleaning, whether minor or major. Through the cleaning process, the research team can also learn lessons that feed into the next round of data collection and make the whole process more efficient.

Data cleaning is essential because without it any analytical work loses validity. Models used in research work assume, at the very least, that the data are clean.

Data cleaning is an important aspect of any impact evaluation project. Almost every research team keeps one or more research assistants solely for the purpose of data cleaning, which adds to the cost of the project.

The Goal of Cleaning

There are two main goals when cleaning a data set:

  1. Clean individual data points that invalidate or incorrectly bias the analysis.
  2. Prepare a clean data set that is easy for other researchers to use, both inside and outside your team.

Cleaning individual data points

Prepare a clean data set

Role Division during Data Cleaning

Spend time identifying and documenting irregularities in the data. It is never bad to suggest corrections to irregularities, but a common mistake RAs make is spending too much time trying to fix irregularities, at the expense of having enough time to identify and document as many as possible.

Eventually you and your RA will develop an understanding of which corrections you can decide on your own, but until then, focus your time on identifying and documenting any issues.
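
As an illustration of identifying and documenting irregularities without correcting them, the following is a minimal sketch in Python. It is not a prescribed workflow: the function name find_irregularities, the field names hh_id and age, and the plausible age range of 0 to 120 are all hypothetical and chosen only for the example. The sketch builds an issue log and leaves the raw records untouched, so decisions about corrections can be made later by the team.

```python
from collections import Counter


def find_irregularities(records, id_field, ranges):
    """Return a list of documented issues; the input records are never modified.

    records  : list of dicts, one per observation (hypothetical structure)
    id_field : name of the field expected to identify each observation uniquely
    ranges   : dict mapping a numeric field name to its (low, high) plausible range
    """
    issues = []
    # Count how often each ID occurs so that duplicates can be flagged.
    id_counts = Counter(rec.get(id_field) for rec in records)

    for row, rec in enumerate(records):
        rec_id = rec.get(id_field)
        if rec_id is None or rec_id == "":
            issues.append({"row": row, "field": id_field,
                           "value": rec_id, "issue": "missing id"})
        elif id_counts[rec_id] > 1:
            issues.append({"row": row, "field": id_field,
                           "value": rec_id, "issue": "duplicate id"})

        for field, (low, high) in ranges.items():
            value = rec.get(field)
            if value is None or value == "":
                problem = "missing value"
            elif not isinstance(value, (int, float)):
                problem = "not numeric"
            elif not low <= value <= high:
                problem = "out of range"
            else:
                continue  # nothing to document for this field
            issues.append({"row": row, "field": field,
                           "value": value, "issue": problem})
    return issues


# Hypothetical raw data, used only to demonstrate the issue log.
records = [
    {"hh_id": "A1", "age": 34},
    {"hh_id": "A2", "age": 340},
    {"hh_id": "A2", "age": None},
    {"hh_id": "A3", "age": "thirty"},
]

issues = find_irregularities(records, "hh_id", {"age": (0, 120)})
for issue in issues:
    print(issue)
```

Each entry in the log records the row, the field, the offending value, and a short description. A log of this kind can be shared with the rest of the team, who then decide which irregularities should be corrected and how.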

Import Data

Incorrect Data and Other Irregularities

Missing Values

No Strings

Labels

Additional Resources

  • list here other articles related to this topic, with a brief description and link