Data Documentation


Revision as of 17:41, 12 February 2018

Documenting any aspect of the data work that may affect the analysis is a crucial part of working with data. Impact evaluation projects often take years to complete and are carried out by large teams. If the data work is not documented while it is ongoing, many details are likely to be lost, and considerable time will be spent trying to reconstruct what was done. For example, suppose it became clear during field work that some respondents did not understand a test because they had reading difficulties. If the field coordinator does not document this issue, the research assistant will not know to flag these respondents during data cleaning. And if the research assistant does not document why the observations were flagged and what the flag means, they may not be handled correctly during analysis.
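The flagging step in this example can be illustrated with a minimal sketch. It assumes the field report lists the affected respondent IDs; the IDs, the variable `test_score`, and the flag name `flag_reading` are all hypothetical. The page's own workflow uses Stata do-files, and Python is used here only for illustration. The point is that the flag is stored together with a note on what it means and where it came from.

```python
# Hypothetical example: IDs the field coordinator reported as having
# reading difficulties. All names and values here are illustrative.
reading_difficulty_ids = {"R017", "R042"}

respondents = [
    {"id": "R001", "test_score": 14},
    {"id": "R017", "test_score": 2},
    {"id": "R042", "test_score": 0},
]

# Flag the affected observations instead of silently dropping them.
for r in respondents:
    r["flag_reading"] = 1 if r["id"] in reading_difficulty_ids else 0

# Record what the flag means and where it came from, so whoever runs
# the analysis knows how to treat these observations.
FLAG_NOTES = {
    "flag_reading": (
        "1 = field team reported the respondent could not read the test; "
        "test_score is unreliable for these observations (source: field report)."
    )
}

flagged = [r["id"] for r in respondents if r["flag_reading"] == 1]
print(flagged)  # ['R017', 'R042']
```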

There are different ways to document data work. One widespread practice is to send e-mails reporting issues to the team. Though this is easy to do, it makes answers time-consuming to find later in the project, and it assumes that someone on the team remembers that the e-mail was sent in the first place.

For data cleaning, data analysis, and variable construction, it is best practice to document the data work through comments in the code. However, although this is very helpful for someone reading the code carefully, if these comments are not documented elsewhere, it may take a long time to go through all the do-files to find the answer to a specific question.

It is usually advisable to keep all data work documentation in one file or folder, though how it is structured, and when, how, and by whom it is updated, will vary from project to project. One advantage of submitting code for code review and depositing data in the microdata catalog is that in both cases the data work documentation will be reviewed. This does not guarantee that everything that should be documented actually is, however, as reviewers cannot ask about issues unknown to them.
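One way to keep documentation in a single place is a shared log that every team member appends to. The sketch below assumes a CSV file with a hypothetical set of columns (date, stage, author, issue, decision, code file); the file name and the do-file name `cleaning.do` are illustrative only, and each project will choose its own structure.

```python
import csv
import datetime
import os
import tempfile

# Hypothetical column layout for a central documentation log.
LOG_FIELDS = ["date", "stage", "author", "issue", "decision", "code_file"]


def log_decision(path, stage, author, issue, decision, code_file=""):
    """Append one documentation entry to a shared CSV log, writing the
    header row only when the file is first created."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "date": datetime.date.today().isoformat(),
            "stage": stage,
            "author": author,
            "issue": issue,
            "decision": decision,
            "code_file": code_file,
        })


def read_log(path):
    """Return all log entries as a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Write to a temporary folder so the example is self-contained.
log_path = os.path.join(tempfile.mkdtemp(), "data_documentation_log.csv")
log_decision(log_path, "field work", "Field coordinator",
             "Some respondents could not read the test",
             "Listed affected respondent IDs in the field report")
log_decision(log_path, "cleaning", "Research assistant",
             "Reading difficulties reported by field team",
             "Created flag_reading = 1 for listed IDs", "cleaning.do")
entries = read_log(log_path)
print(len(entries))  # 2
```

Linking each entry to the code file that implements the decision means the comments in the code and the central log point to each other, so a specific question can be answered without reading every do-file.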

Read first

Field Work Documentation

Sampling

  • Sample selection
  • Replacement criteria
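Sample selection is easiest to document and reproduce when the draw uses a recorded random seed and the replacement list is generated in the same step. A minimal sketch, assuming a simple random sample from a list of household IDs; the frame, sample sizes, seed, and function name `select_sample` are hypothetical.

```python
import random


def select_sample(frame_ids, n_main, n_replacements, seed):
    """Draw a main sample and an ordered replacement list from a sampling
    frame. The seed is an input so the draw can be reproduced exactly."""
    rng = random.Random(seed)
    # Sort first so the result does not depend on the order of the input list.
    shuffled = sorted(frame_ids)
    rng.shuffle(shuffled)
    main = shuffled[:n_main]
    replacements = shuffled[n_main:n_main + n_replacements]
    return main, replacements


frame = [f"HH{i:03d}" for i in range(1, 101)]
SEED = 20180212  # hypothetical seed; record the actual value in the documentation
main, reps = select_sample(frame, n_main=20, n_replacements=5, seed=SEED)
print(main[:3], reps)
```

Documenting the seed, the version of the frame, and the rule for using replacements (for example, in listed order when a selected household refuses or cannot be located) lets others reproduce and audit the sample.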

Field work dates

Tracking respondents

  • Total number of respondents listed
  • Total number of respondents visited
  • Refusal rates
  • Total number of respondents in final sample
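The tracking figures above can be computed directly from a tracking sheet. A minimal sketch, assuming one record per listed respondent with a hypothetical `visited` field and `outcome` field. Note that "refusal rate" can be defined with different denominators; this sketch divides by respondents visited, and whichever definition the project uses should be stated in the documentation.

```python
from collections import Counter

# Hypothetical tracking records: one row per listed respondent.
tracking = [
    {"id": "R001", "visited": True, "outcome": "completed"},
    {"id": "R002", "visited": True, "outcome": "refused"},
    {"id": "R003", "visited": False, "outcome": "not found"},
    {"id": "R004", "visited": True, "outcome": "completed"},
]


def tracking_summary(records):
    """Count listed and visited respondents, the refusal rate among those
    visited, and completed interviews (the final sample)."""
    visited = [r for r in records if r["visited"]]
    outcomes = Counter(r["outcome"] for r in visited)
    return {
        "listed": len(records),
        "visited": len(visited),
        # Denominator: visited respondents (an assumption; document yours).
        "refusal_rate": outcomes["refused"] / len(visited) if visited else 0.0,
        "final_sample": outcomes["completed"],
    }


summary = tracking_summary(tracking)
print(summary)
```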

Issues in the field

Report any problems that occurred during the administration of the survey (for example, strikes, inclement weather, or inability to enter parts of the country).

Data Cleaning Documentation

Outliers
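Whatever rule is used to identify outliers, documenting it means recording both the rule and the cut-offs it produced. A minimal sketch using the interquartile range (IQR) rule, which is one common convention and not necessarily the one a given project uses; the variable `plot_size` and the multiplier `k=1.5` are assumptions. Observations are flagged, not dropped, so the decision about how to treat them stays visible.

```python
import statistics


def flag_outliers_iqr(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]. Returns the flags and
    the bounds, which should be recorded alongside the rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # requires Python 3.8+
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flags = [int(v < lower or v > upper) for v in values]
    return flags, (lower, upper)


# Hypothetical plot sizes in hectares; 45.0 is a likely entry error.
plot_size = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 45.0]
flags, bounds = flag_outliers_iqr(plot_size)
print(flags, bounds)
```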

Inconsistencies

Survey codes and missing values
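Survey codes such as "don't know" or "refused" are often stored as special numeric values, and treating them as real numbers will distort the analysis. A minimal sketch that replaces them with missing values and counts each replacement so the number of affected observations can be documented; the codes `-99` and `-88` and the variable `income` are hypothetical, and the actual codes should come from the project's questionnaire.

```python
# Hypothetical survey codes; use the codes defined in the questionnaire.
SURVEY_CODES = {-99: "don't know", -88: "refused to answer"}


def recode_survey_codes(values, codes=SURVEY_CODES):
    """Replace survey codes with None and count each replacement by code
    label, so the counts can be reported in the documentation."""
    counts = {label: 0 for label in codes.values()}
    cleaned = []
    for v in values:
        if v in codes:
            counts[codes[v]] += 1
            cleaned.append(None)
        else:
            cleaned.append(v)
    return cleaned, counts


income = [1200, -99, 850, -88, -99, 400]
cleaned, counts = recode_survey_codes(income)
print(cleaned, counts)
```

Keeping the counts by code, rather than a single "missing" total, preserves the distinction between respondents who did not know and those who refused, which may matter during analysis.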

Variable Construction Documentation

Sampling

Weights and expansion factors
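How weights were computed should be documented alongside the weights themselves. A minimal sketch, assuming stratified simple random sampling, where each stratum's expansion factor is the stratum population divided by the number of completed interviews (the inverse of the selection probability). The strata names and counts are hypothetical. Dividing by completed interviews instead of selected units implicitly adjusts for nonresponse within each stratum, which is a design choice worth recording.

```python
# Hypothetical strata: population counts from the sampling frame and
# number of completed interviews in each stratum.
population = {"urban": 12000, "rural": 30000}
completed = {"urban": 300, "rural": 500}


def expansion_factors(pop, n):
    """Expansion factor per stratum = stratum population / completed
    interviews, under stratified simple random sampling."""
    return {h: pop[h] / n[h] for h in pop}


weights = expansion_factors(population, completed)
# Sanity check worth documenting: weights summed over all interviews
# recover the total population of the frame.
total = sum(weights[h] * completed[h] for h in weights)
print(weights, total)
```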

Outliers

Inconsistencies

Variable definitions
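Variable definitions are easier to maintain when they live in a single codebook that can be checked against the data. A minimal sketch, assuming a hypothetical codebook structure with a label, unit, definition, source questions, and the code file that constructs each variable; the variable names, question IDs, and `construct_income.do` are illustrative only.

```python
# Hypothetical codebook for constructed variables.
CODEBOOK = {
    "hh_income_month": {
        "label": "Total household income in the past month",
        "unit": "local currency",
        "definition": "Sum of wage, business, and transfer income; "
                      "missing if all components are missing",
        "source": ["q12", "q13", "q14"],
        "code_file": "construct_income.do",
    },
}


def undocumented(variables, codebook=CODEBOOK):
    """Return the constructed variables that have no codebook entry."""
    return sorted(v for v in variables if v not in codebook)


missing_docs = undocumented(["hh_income_month", "food_share"])
print(missing_docs)  # ['food_share']
```

Running a check like this before sharing a dataset catches variables that were constructed but never defined.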

References

Datasets Documentation

Dataset creation

Linking datasets
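Before linking two datasets, it helps to check and document whether the ID is unique and how many records match on each side. A minimal sketch, assuming both datasets are lists of records sharing a hypothetical key `hhid`; the datasets `endline` and `baseline` are illustrative.

```python
def link_report(master, using, key):
    """Summarize how two datasets link on `key`: matched IDs, IDs found on
    only one side, and IDs that appear more than once on each side."""
    def ids_and_duplicates(rows):
        seen, dup = set(), set()
        for r in rows:
            k = r[key]
            if k in seen:
                dup.add(k)
            seen.add(k)
        return seen, dup

    m_ids, m_dup = ids_and_duplicates(master)
    u_ids, u_dup = ids_and_duplicates(using)
    return {
        "matched": len(m_ids & u_ids),
        "only_master": sorted(m_ids - u_ids),
        "only_using": sorted(u_ids - m_ids),
        "duplicate_keys_master": sorted(m_dup),
        "duplicate_keys_using": sorted(u_dup),
    }


endline = [{"hhid": "H1"}, {"hhid": "H2"}, {"hhid": "H3"}]
baseline = [{"hhid": "H1"}, {"hhid": "H2"}, {"hhid": "H2"}, {"hhid": "H4"}]
report = link_report(endline, baseline, "hhid")
print(report)
```

Recording the unmatched and duplicated IDs, and what was decided about them, explains later why sample sizes differ between the linked and unlinked datasets.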

Additional Resources