Reproducible Research
In most scientific fields, results are validated through replication: different scientists run the same experiment independently on different samples and reach similar conclusions. That standard is not always feasible in development research. More often than not, the phenomena we analyze cannot be artificially re-created. Even in the case of field experiments, different populations can respond differently to a treatment, and the costs involved are high.
Even in such cases, however, we should still require reproducibility: different researchers running the same analysis on the same data should find the same results. That may seem obvious, but unfortunately it is not as widely observed as we would like. The bottom line of research reproducibility is that the path used to get to your results is as much a research output as the results themselves, so the research process should be fully transparent. This means that researchers should make available not only the final findings but also the data, code and documentation, which are of great relevance to the public.
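The definition above can be checked mechanically. The following is a minimal sketch in Python, not a prescribed workflow: the function names `run_analysis` and `fingerprint`, the seed value, and the data values are all assumptions made for illustration. It runs the same analysis twice on the same data with a fixed random seed and compares a hash of the results.

```python
import hashlib
import json
import random
import statistics


def run_analysis(data, seed=12345):
    """Bootstrap the standard error of the mean; the fixed seed makes draws repeatable."""
    rng = random.Random(seed)  # local generator, independent of global random state
    boot_means = []
    for _ in range(200):
        resample = [rng.choice(data) for _ in data]
        boot_means.append(statistics.fmean(resample))
    return {
        "mean": round(statistics.fmean(data), 6),
        "boot_se": round(statistics.stdev(boot_means), 6),
    }


def fingerprint(results):
    """Hash the results so that two runs can be compared at a glance."""
    payload = json.dumps(results, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Illustrative values only, not real data
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8]

first = run_analysis(data)
second = run_analysis(data)

# Same data, same code and same seed should give identical fingerprints
print(fingerprint(first) == fingerprint(second))
```

If the seed were left unset, the two bootstrap runs would generally differ, which is one common reason why results fail to reproduce even when data and code are shared.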
Code replication
- Git is free version-control software. Files are stored in Git repositories, most commonly hosted on GitHub. To learn GitHub, there is an introductory training available through GitHub Services, and multiple tutorials are available through GitHub Guides.
Data publication
Dynamic documents
- R Markdown is a widely adopted tool for creating fully reproducible documents. It allows users to write text and code in the same file, run analyses in different programming languages, and print the results in the final document along with the text. Stata 15 also allows users to create dynamic documents using dyndoc.
- Jupyter Notebook is used to create and share code in different programming languages, including Python, R, Julia, and Scala. It can also create dynamic documents in HTML, LaTeX and other formats.
- LaTeX is another widely used tool in the scientific community. It is a typesetting system that allows users to reference code outputs such as tables and graphs so that they can be easily updated in a text document. Overleaf is a web-based platform for collaboration on TeX documents.
- Open Science Framework is a web-based project management platform that combines registration, data storage (through Dropbox, Box, Google Drive and other platforms), code version control (through GitHub) and document composition (through Overleaf).
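The idea shared by the tools above is that tables and figures are written by code and pulled into the text, never typed by hand. As a rough sketch of that pattern in Python, the snippet below writes a small results table to a .tex file. The helper `to_latex_table`, the file name `results_table.tex`, and the numbers are hypothetical and used only for illustration.

```python
from pathlib import Path
import tempfile


def to_latex_table(rows, header):
    """Render rows as a LaTeX tabular; a hand-rolled helper, not a library API."""
    lines = ["\\begin{tabular}{l" + "r" * (len(header) - 1) + "}", "\\hline"]
    lines.append(" & ".join(header) + " \\\\")
    lines.append("\\hline")
    for label, *values in rows:
        cells = [label] + [f"{v:.3f}" for v in values]
        lines.append(" & ".join(cells) + " \\\\")
    lines += ["\\hline", "\\end{tabular}"]
    return "\n".join(lines) + "\n"


# Placeholder estimates for illustration only, not real results
rows = [("Treatment", 0.125, 0.041), ("Control mean", 1.500, 0.030)]
table = to_latex_table(rows, ["", "Estimate", "Std. err."])

# Write to a temporary folder; a real project would use its outputs folder
out_dir = Path(tempfile.mkdtemp())
out_file = out_dir / "results_table.tex"
out_file.write_text(table, encoding="utf-8")

print(out_file.read_text(encoding="utf-8"))
```

A LaTeX document can then include the file with `\input{results_table.tex}`, so re-running the analysis and recompiling updates the table without any manual copying.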
Additional Resources
From Data Colada:
From the Abdul Latif Jameel Poverty Action Lab (J-PAL)
From Innovations for Poverty Action (IPA)
- Reproducible Research: Best Practices for Data and Code Management
- Guidelines for data publication
- Randomized Control Trials in the Social Science Dataverse
Center for Open Science
- Transparency and Openness Guidelines, summarized in a 1-Page Handout
Berkeley Initiative for Transparency in the Social Sciences
Reproducible Research in R
Reproducible Research in Stata