Data Management

Getting started with GitHub

GitHub is a web-based hosting service for managing code and tracking the changes made to it. It is a useful collaborative tool at all stages of research and fieldwork. This page provides resources, and links to further resources, on how to get started with GitHub.

Data Management

Because a typical impact evaluation has a long life span, multiple generations of team members often contribute to the same data work. Clear methods for organizing the data folder, structuring the data sets in that folder, and identifying the observations in those data sets are critical.
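
As an illustration of what identifying the observations can mean in practice, the following is a minimal sketch in Python, not the method prescribed by the linked article. It assumes each observation should carry exactly one non-missing ID value; the function name `duplicate_ids`, the variable name `hh_id`, and the records are hypothetical.

```python
from collections import Counter


def duplicate_ids(rows, id_field):
    """Return the ID values that appear more than once, plus a marker for missing IDs."""
    counts = Counter(row.get(id_field) for row in rows)
    problems = []
    for value, n in counts.items():
        if value in (None, ""):
            # A missing ID cannot identify an observation at all.
            problems.append("<missing>")
        elif n > 1:
            # The same ID on several rows means the ID is not unique.
            problems.append(str(value))
    return sorted(problems)


# Hypothetical household records; "hh_id" is an illustrative ID variable name.
rows = [
    {"hh_id": "1001", "village": "A"},
    {"hh_id": "1002", "village": "A"},
    {"hh_id": "1002", "village": "B"},
]

problems = duplicate_ids(rows, "hh_id")
print(problems)
```

An empty list would indicate that, under this assumption, every observation is uniquely identified by the chosen variable.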

Data Security

This page outlines the steps in a typical research project and lists each topic within data security that a research team should consider at that point. If you follow these best practices, not even the full research team will have access to identifying data, and such access is very rarely needed to do the analysis.

Naming Conventions

Impact evaluation projects should follow a clear file naming convention, as many team members will need to understand and interact with the files over the project lifetime. It is very important to use a naming convention that is clear not only to you but also to someone looking at the files years later.
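
To make the idea concrete, here is a minimal sketch in Python of how a team might check file names against a convention. The convention itself is an assumption made for illustration (lowercase letters and digits, words separated by underscores or hyphens, no spaces, one file extension); it is not taken from the linked article, and the file names are hypothetical.

```python
import re

# Assumed convention: lowercase alphanumeric words joined by "_" or "-",
# followed by a single extension. Adjust the pattern to the team's own rules.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:[_-][a-z0-9]+)*\.[a-z0-9]+")


def follows_convention(filename):
    """Return True if the whole file name matches the assumed convention."""
    return NAME_PATTERN.fullmatch(filename) is not None


# Hypothetical file names.
for name in ["baseline_household_clean.dta", "Final data v2 (new).dta"]:
    print(name, follows_convention(name))
```

A check like this can be run over a project folder so that names which later team members would struggle to interpret are caught early.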

Master Do-files

The master do-file is the main do-file that calls upon and runs all the other do-files of a project. It plays a critical role throughout all stages of the research project and functions as a map to the data folder. This page outlines the components of a well-structured and replicable master do-file.

Encryption

In today's world of research, researchers regularly handle data, send it over the internet, and store it in the cloud. At any point, especially when the internet is involved, the data is exposed to some risk. Keeping data safe and encrypted is hence a key component of IRB requirements and research ethics.

Data Storage

This article discusses different aspects of data storage, such as types of storage, data backup, and data retention. It is important to make sure you have appropriate data storage solutions in place before you start receiving data. You should plan your data storage for the full life cycle of a project and not just for your immediate needs. Changing the data storage solution mid-project can be costly and can break code already written for the project, making earlier research outputs non-reproducible.