Exporting Analysis

This article discusses, with examples, three important concepts related to exporting the results of analysis: formatting of outputs, replicability, and version control.


Read First

  • Outputted research should always be reproducible. See the levels of replicability described below.
  • DIME does not prescribe which formatting rules you follow, as long as the formatting is consistent within a project.

Formatting

Formatting requirements depend heavily on the audience. For example, best practice differs considerably depending on whether we are communicating results to the beneficiaries of the project or to the research community.

Fact sheets for government counterparts and for members of the communities we work with can be a very efficient way to disseminate impact evaluation results. Here is a great example of a fact sheet used for a DIME project. Regression tables formatted according to journal standards would not work well in this context.

For academic contexts, DIME has no formatting requirements other than to follow established guidelines as much as possible. Here are examples of such guidelines:

LaTeX

The best tool for excellent-looking and reproducible formatting of tables is LaTeX. DIME has created a LaTeX Training with multiple stages, targeting the absolute beginner as well as the experienced user.

Levels of Replicability of Exporting Analysis

We all know that our work should be replicable, especially the outputs, but how replicable does something need to be before it can be called replicable? For example, if a report has a table that is outputted by code but formatted manually, is the report replicable? This section walks you through different levels of replicability and the tools for each level. The levels can then be used to describe how replicable a report, or a section of a report, is.

DIME projects should have at least good replicability. A lower level is acceptable only in very special circumstances, and even then the project should fulfill the requirements for basic replicability.

  • No replicability. Some or all of the results are not generated by code that can be run by anyone else, and/or some or all of the outputted results need to be manually copied and pasted from the result window or graph window in, for example, Stata.
  • Basic replicability. All results are produced by code and saved to files on disk. However, some copying and pasting between files is required to create the final tables.
  • Good replicability. All results are produced by code and saved to files on disk, and no copying and pasting of results between files is needed. However, formatting and other minor changes are needed, and/or the final tables need to be copied and pasted into the document where they will be used.
  • Full replicability. All results are generated by code and exported in a format where no changes are needed to finalize them (not even formatting). The results are also imported fully automatically into the documents where they will be used, and are updated there fully automatically if they change.

No replicability

Anything that needs manual copying and pasting from a Stata or R window to a file saved on disk can never be considered replicable. This applies to graphs as well. Graphs should be saved to file by code and not copied and pasted from the window they appear in, for example in Stata or R.

This level of replicability is never acceptable at DIME, no matter how small or unimportant the report is. It could be acceptable to do what is described here during the initial exploration of the data, but as soon as output is produced for someone else - even within the team - it should be done with a higher level of replicability. Since analysis should eventually be shown to someone, we strongly recommend that you aim for a higher level of replicability from the start since it will save you time later.

Basic Replicability

The code generates all graphs and tables and saves them to the project folder. However, some copying and pasting between files is needed to create the final tables, or some very basic math needs to be done in the outputted files. This is not best practice, but it is the minimum acceptable level of replicability. No DIME project should ever be below this level, and we strongly recommend that everyone reach at least good replicability.

Graphs. It is easy to satisfy basic replicability for graphs. In Stata, you can use the saving() option included in Stata's graph commands, or run graph export after the graph has been created. This even satisfies good replicability for many graphs.
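As a minimal sketch of saving a graph to file from code, the example below uses the auto dataset that ships with Stata. The folder name outputs and the file names are assumptions made for illustration only.

```stata
* Load an example dataset that ships with Stata
sysuse auto, clear

* Create the output folder if it does not exist yet (hypothetical folder name)
capture mkdir "outputs"

* Draw the graph and save it in Stata's own .gph format
scatter price mpg, saving("outputs/price_mpg.gph", replace)

* Export the graph that was just drawn to an image file a report can include
graph export "outputs/price_mpg.png", replace
```

Because both files are written by the code, rerunning the do-file re-creates them without anyone touching the graph window.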

Tables. For tables there is no single built-in option for saving output comparable to the options for saving graphs. Common commands for outputting results include outreg and estout. estout is a package of commands that also includes esttab, eststo, estadd and estpost. These commands will be explained in more detail in the good replicability section below.
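One possible way to write regression results to a file with the estout package is sketched below. The regressions, the outputs folder and the file name are illustrative assumptions, and the package has to be installed first since it is user-written.

```stata
* estout is user-written; install it once with:  ssc install estout
sysuse auto, clear
capture mkdir "outputs"

* Run two example regressions and store their results
eststo clear
eststo: regress price mpg
eststo: regress price mpg weight

* Write both stored models to a single LaTeX table file,
* showing standard errors and variable labels
esttab using "outputs/price_regressions.tex", replace se label
```

Putting all models for one table into the same esttab call avoids copying results between files afterwards.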

One way to test that all tables and graphs are exported with basic replicability is to move all table and graph files to a separate folder and run the code again. Make sure that all table and graph files are re-created in the original folder, and that it is possible to carry out the minor manual actions required to generate the final tables and graphs from these files.
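The last step of that test can also be scripted. The sketch below assumes a hypothetical list of expected file names and a hypothetical outputs folder, and only checks that each file exists again after the project code has been rerun. Run it as one do-file, since local macros do not persist across separately executed lines.

```stata
* Hypothetical list of files the project code is expected to create
local expected "price_mpg.png price_regressions.tex"

* Step 1 (by hand): move the old output files to a separate folder
* Step 2: rerun the project code, for example:  do "master.do"  (hypothetical name)

* Step 3: confirm that every expected file exists again
local missing 0
foreach f of local expected {
    capture confirm file "outputs/`f'"
    if _rc {
        display as error "Not re-created: `f'"
        local missing = `missing' + 1
    }
}
display as text "Missing output files: `missing'"
```

A check like this only confirms that the files exist. Whether the remaining manual steps still work has to be verified by hand.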

Good replicability

This is the level that we recommend all DIME projects aim for. While full replicability is objectively better, we understand that it is not practically feasible for all projects.

Graphs. For many graphs, saving the graph to file from code, as described under basic replicability, is enough for good replicability. One exception is when several graphs are supposed to be combined into one. Combining them manually, or simply placing them next to each other in the report, is not good replicability. For good replicability this should be done in the code. In Stata, the graph combine command can be used for this.
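A minimal sketch of combining graphs in code rather than by hand is shown below. The graph names, the outputs folder and the file name are assumptions chosen for illustration.

```stata
sysuse auto, clear
capture mkdir "outputs"

* Create two graphs and keep them in memory under hypothetical names
scatter price mpg,    name(price_mpg, replace)
scatter price weight, name(price_weight, replace)

* Combine both graphs side by side into one figure
graph combine price_mpg price_weight, rows(1)

* Export the combined figure to file
graph export "outputs/price_combined.png", replace
```

The layout is set by options such as rows(), so the combined figure is rebuilt identically each time the code runs.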

Tables.

Full replicability

While new tools are starting to make it possible to achieve this level of replicability using Microsoft Word and Excel, it is more commonly done using LaTeX (https://www.latex-project.org/). At DIME we have prepared resources for getting started with LaTeX and for writing fully replicable documents using LaTeX (https://github.com/worldbank/DIME-LaTeX-Templates).

While the Microsoft Word and Excel tools may improve, there are currently many more resources online for LaTeX than for any of the new tools. If that changes, we might recommend another tool.

Version control

  • GitHub
  • Dated copy - but use main copy

Back to Parent

This article is part of the topic Data Analysis
