Reproducible Research
Revision as of 21:35, 6 November 2017
Read First
- In most scientific fields, results are validated through replication: different scientists run the same experiment independently, on different samples, and reach similar conclusions. That standard is not always feasible in development research. More often than not, the phenomena we analyze cannot be artificially re-created. Even in the case of field experiments, different populations can respond differently to a treatment, and the costs involved are high.
- Even in such cases, however, we should still require reproducibility: different researchers, running the same analysis on the same data, should find the same results. That may seem obvious, but unfortunately it is not as widely observed as we would like.
- The bottom line of research reproducibility is that the path used to get to your results is as much a research output as the results themselves: it makes the research process fully transparent. This means that researchers should make available not only their final findings, but also their data, code, and documentation.
Pre-registration
Trial registries offer researchers the chance to upload and timestamp their study designs before the studies are conducted. The aim of these registries is to build research transparency by reducing selective reporting and to provide researchers with an overview of ongoing studies in their field. While trial registration is commonplace in clinical health trials (see, for example, https://clinicaltrials.gov/), its use in development economics is more recent.
Where can I register?
The American Economic Association (AEA) hosts a trial registry specifically for randomized controlled trials (https://www.socialscienceregistry.org/). The International Initiative for Impact Evaluation (3ie) provides a registry for experimental and quasi-experimental research in developing countries (http://www.ridie.org/).
What information should be included?
The information required for registering a trial typically includes the country and title, a brief description of the project, timeline, outcomes, sample size, study design, and ethical approval details. Some of the details can be uploaded and timestamped but hidden from public view until the study is completed. A pre-analysis plan giving a detailed description of how the analysis will be conducted can also be uploaded, but this is typically not mandatory for registration.
When should I register?
While clinical trials in health are expected to be registered before patient enrolment (http://icmje.org/recommendations/browse/publishing-and-editorial-issues/clinical-trial-registration.html), there is currently no formal requirement for development economics trials to be registered by a particular stage of the research. In cases where intervention delivery is uncertain, development economics researchers often wait to register their trials until after baseline and interventions have been completed, but before any follow-up data collection or analysis (http://blogs.worldbank.org/impactevaluations/trying-out-new-trial-registries).
Pre-Analysis Plan
A pre-analysis plan (PAP), written at the design stage of an impact evaluation, lays out how the researcher will analyze the data. The objective of a PAP is to prevent data mining and specification searching. The Development Impact blog provides a checklist of what to include in a PAP (http://blogs.worldbank.org/impactevaluations/a-pre-analysis-plan-checklist).
While most economics journals do not currently require PAPs as a condition for publication, researchers may choose to produce a PAP prior to data analysis to (i) increase the credibility of their findings and (ii) fine-tune their analysis strategy.
While PAPs offer the benefit of potentially reducing the prevalence of spurious results, this comes at the cost of tying researchers' hands more formally to ex ante analysis plans, which may limit the potential for exploratory learning. Benjamin Olken summarizes the costs and benefits of fully pre-specifying the analysis for a development economics RCT (https://www.aeaweb.org/articles?id=10.1257/jep.29.3.61). He notes that "forcing all papers to be fully pre-specified from start to end would likely result in simpler papers, which could potentially lose some of the nuance of current work", but that "in many contexts, pre-specification of one (or a few) key primary outcome variables, statistical specifications, and control variables offers a number of advantages".
Code replication
Data publication
Software for reproducible research
- R Markdown is a widely adopted tool for creating fully reproducible documents. It allows users to write text and code in the same file, run analyses in different programming languages, and print the results in the final document alongside the text.
- Other options for creating reproducible documents for Stata users are MarkDoc and Jupyter Notebook.
- Git is a free version-control system. Files are stored in Git repositories, most commonly hosted on GitHub (https://github.com/). To learn GitHub, there is an introductory training available through GitHub Services (https://services.github.com/on-demand/intro-to-github/), and multiple tutorials are available through GitHub Guides (https://guides.github.com/).
- LaTeX is another tool widely used in the scientific community. It is a typesetting system that allows users to reference code outputs such as tables and graphs so that they can be easily updated in a text document. Overleaf is a web-based platform for collaborating on TeX documents.
- Open Science Framework (OSF) is a web-based project management platform that combines registration, data storage (through Dropbox, Box, Google Drive, and other platforms), code version control (through GitHub), and document composition (through Overleaf).
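To make the R Markdown idea above concrete, here is a minimal, hypothetical sketch of such a document: the file name, variable names, and data set are all illustrative, not from any real project. Prose and an executable code chunk live in one file, and re-rendering the file re-runs the analysis so every reported number is regenerated from the data.

````markdown
---
title: "Treatment Effects on Household Income"
output: pdf_document
---

The results below are produced directly from the raw data each
time this document is rendered, so anyone with the data and this
file can reproduce every figure reported here.

```{r}
# Illustrative names: 'survey.csv' and its columns are placeholders.
data <- read.csv("survey.csv")
summary(lm(income ~ treatment, data = data))
```
````

Rendering (for example with `rmarkdown::render("paper.Rmd")`) produces a PDF in which the regression output appears inline after the text.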
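The LaTeX workflow mentioned above can be sketched as follows; the file paths are hypothetical. The analysis code exports tables and figures to files, and the paper pulls them in by reference, so re-running the analysis updates the document on the next compile with no manual copying of numbers.

```latex
\documentclass{article}
\usepackage{graphicx}
\usepackage{booktabs}

\begin{document}

\section{Results}

% 'results/main_table.tex' would be written by the analysis
% script (e.g. exported from Stata or R), not typed by hand.
\begin{table}[h]
  \centering
  \caption{Treatment effects on primary outcomes}
  \input{results/main_table.tex}
\end{table}

% Figures are likewise regenerated by code and only referenced here.
\begin{figure}[h]
  \centering
  \includegraphics[width=0.8\textwidth]{results/outcome_trends.pdf}
  \caption{Outcome trends by treatment arm}
\end{figure}

\end{document}
```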
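The Git workflow described above can be sketched in a few commands. This is a minimal illustration with hypothetical project and file names; a real project would track all analysis scripts, not just a README.

```shell
# Minimal sketch of putting analysis code under version control.
# 'my-project' and its contents are illustrative placeholders.
mkdir -p my-project
git -C my-project init -q

# Create a file and record it in the project's history.
echo "Analysis code for the income study" > my-project/README.md
git -C my-project add README.md
git -C my-project -c user.name="Analyst" -c user.email="analyst@example.org" \
    commit -q -m "Add project README"

# Show the recorded history, one commit per line.
git -C my-project log --oneline
```

Every commit is a timestamped snapshot of the code, so any past version of the analysis can be recovered and re-run, which is what makes version control valuable for reproducibility.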
Additional Resources
From the Abdul Latif Jameel Poverty Action Lab (J-PAL)
From Innovations for Poverty Action (IPA)
- Reproducible Research: Best Practices for Data and Code Management
- Guidelines for data publication
- Randomized Control Trials in the Social Science Dataverse
Center for Open Science
- Transparency and Openness Guidelines, summarized in a 1-Page Handout
Berkeley Initiative for Transparency in the Social Sciences
Johns Hopkins