Stata Coding Practices
Stata is used in all stages of an impact evaluation: sampling, randomizing, monitoring, cleaning, and analyzing. Good Stata coding practices, packages, and commands are not only a critical component of high quality, reproducible research, but they are also key in saving the research team time, energy, and sanity. This page outlines a number of packages and commands developed by DIME and externally for use in impact evaluations. For additional resources on Stata coding, see Additional Resources.
iefieldkit is a Stata package for primary data collection. It currently supports three major components of that workflow: survey design; survey completion; and data cleaning and survey harmonization.
ietoolkit is a Stata package for data management and analysis in impact evaluations.
- You can install both packages from SSC using ssc install iefieldkit and ssc install ietoolkit.
- DIME Analytics, alongside institutions like Innovations for Poverty Action, offers a wide range of resources on code, packages and commands -- from tutorials to code samples to installable packages and commands. See Additional Resources for more.
Packages for Impact Evaluations
iefieldkit is a Stata package developed by DIME for primary data collection. The package currently supports three major components of that workflow: survey design; survey completion; and data cleaning and survey harmonization.
iefieldkit performs the following three tasks:
- Before data collection, ietestform complements the ODK syntax test on the SurveyCTO server. It runs tests that tell researchers how to use ODK programming language features to ensure high data quality. This command is especially useful if the data that will be imported to Stata has other restrictions in addition to ODK syntax.
- During data collection, ieduplicates and iecompdup (both previously released as a part of the package ietoolkit but now moved to this package) provide a workflow for detecting and resolving duplicate entries in the dataset. These commands ensure that the final survey dataset is a correct record of the survey sample that the researcher can then merge into the master sampling database.
- After data collection, iecodebook provides a workflow for rapidly cleaning, harmonizing, and documenting datasets. iecodebook uses input specified in an Excel sheet, which gives a better structured overview that is easier to follow (especially for non-technical users) than the same operations written directly in a do-file.
To install the package, type ssc install iefieldkit in your Stata command window. Note that some features of the package might require metadata specific to SurveyCTO, but feel free to try these commands on any use case. For more details, see the iefieldkit GitHub page.
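The duplicates step of the workflow above can be sketched on a toy dataset. This is a minimal sketch, not an official example: the data, the variable names (hhid, key, age) and the report file name are hypothetical, and the syntax shown assumes a recent iefieldkit release (earlier versions of ieduplicates took a folder() option instead of using). Check help ieduplicates and help iecompdup for the version you have installed.

```stata
* Install iefieldkit from SSC (needs an internet connection)
ssc install iefieldkit, replace

* Hypothetical toy survey data: household 102 was submitted twice
clear
input hhid str8 key age
101 "uuid-a" 34
102 "uuid-b" 51
102 "uuid-c" 29
103 "uuid-d" 40
end

* Compare the two submissions that share hhid 102 and list what the
* command returns (which variables differ, which are identical)
iecompdup hhid, id(102)
return list

* Write a duplicates report to the working directory. key is assumed to
* identify each submission uniquely; force acknowledges that unresolved
* duplicates are removed from the data in memory
ieduplicates hhid using "`c(pwd)'/duplicates_report.xlsx", uniquevars(key) force

* After data collection, iecodebook calls take this general shape
* (not run here; codebook.xlsx is a hypothetical file name):
*   iecodebook template using "codebook.xlsx"
*   iecodebook apply    using "codebook.xlsx"
```

The Excel report is where the field team records which submission to keep, drop, or assign a new ID; re-running ieduplicates then applies those decisions.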
ietoolkit is a Stata package developed by DIME for data management and analysis in impact evaluations. The list of commands will be extended continuously, and suggestions for new commands are always appreciated.
Its commands for data management currently include:
- iefolder, which sets up project folders and creates master do-files that link to all sub-folders;
- iegitaddmd, which adds a placeholder file to empty folders so that folder structures with empty folders can be shared on GitHub; and
- ieboilstart, which standardizes the boilerplate code at the top of all do-files.
Its commands for data analysis currently include:
- iematch, an algorithm for matching observations in one group to "the most similar" observations in another group;
- iebaltab, which runs balance test regressions and outputs the results in well-formatted balance tables;
- iedropone, which drops observations and verifies that the correct number was dropped;
- ieboilsave, which performs checks before saving a dataset;
- ieddtab, which runs difference-in-differences regressions and outputs the results in well-formatted tables; and
- iegraph, which produces graphs of estimation results in common impact evaluation regression models.
To install the package, type ssc install ietoolkit in your Stata command window. For more details, see the ietoolkit GitHub page.
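As an illustration of two of the commands listed above, the sketch below standardizes the Stata settings with ieboilstart and then uses iedropone to drop observations only if the expected number matches. It is a hedged example rather than a recommended template: the version number and the choice of Stata's built-in auto dataset are assumptions made for illustration, and option names for other ietoolkit commands (for example iebaltab) have changed between releases, so consult each command's help file.

```stata
* Install ietoolkit from SSC (needs an internet connection)
ssc install ietoolkit, replace

* Standardize settings across users. ieboilstart clears memory by default,
* so it runs before any data is loaded; the line after it applies the
* version setting that the command returns
ieboilstart, version(13.1)
`r(version)'

* Built-in example dataset with 74 observations
sysuse auto, clear

* Drop exactly one observation; iedropone stops with an error if the
* condition matches zero or several observations
iedropone if make == "AMC Concord"

* Drop exactly two observations, stating the expected count explicitly
iedropone if inlist(make, "AMC Pacer", "AMC Spirit"), numobs(2)

display "Observations remaining: " _N
```

Compared with a plain drop, this makes a do-file fail loudly when the data changes and the condition no longer matches the number of observations the author intended to remove.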
Programs and Commands
- You can find a broad variety of Stata commands in this World Bank repository, How to Write Programs in Stata, which contains ado-files for commands useful for data management, statistical analysis, and the production of graphics. In many cases, these ado-files reduce the production of routine items from a tedious programming task to a single command line (e.g. data import and cleaning, production of summary statistics tables, and categorical bar charts with confidence intervals).
- You can experiment with and build upon DIME Analytics’ Intro to how to write programs (also called commands or functions) in Stata and Share functions (sub-programs) between command in the same package. Download the files and read the instructions.
- This DIME Analytics Stata IE Visual Library repository hosts Stata graph examples on GitHub; feel free to submit your own example code there.
- Innovations for Poverty Action's Stata modules for data collection and analysis and GitHub page host programs for impact evaluations.
- Innovations for Poverty Action's odkmeta command writes a do-file to import ODK data to Stata, using the metadata from the survey and choices worksheets of the XLSForm.
- Read more on iefolder in DIME Analytics’ presentations here and here.
- Read more on ietoolkit in DIME Analytics’ Real Time Data Quality Checks.
- Check out The World Bank's Stata GitHub.
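The idea mentioned above, that an ado-file can reduce a routine task to a single command line, can be sketched with a small user-written program. This is an illustrative sketch only: the program name meansd_table is hypothetical and is not taken from any of the repositories listed here. It uses only built-in Stata features (program define, syntax, marksample, summarize).

```stata
* Remove any earlier definition so the do-file can be re-run
capture program drop meansd_table

* Hypothetical command: prints mean and SD for each listed numeric variable
program define meansd_table, rclass
    version 13.1
    syntax varlist(numeric) [if] [in]

    * Mark the estimation sample from if/in only; novarlist keeps
    * observations that are missing on some of the listed variables
    marksample touse, novarlist

    local k = 0
    display as text %-14s "variable" %12s "mean" %12s "sd"
    foreach v of local varlist {
        quietly summarize `v' if `touse'
        display as text %-14s "`v'" as result %12.2f r(mean) %12.2f r(sd)
        local ++k
    }

    * Return the number of variables summarized in r(k)
    return scalar k = `k'
end

* Usage: one line replaces the loop above
sysuse auto, clear
meansd_table price mpg weight if foreign == 0
display "Variables summarized: " r(k)
```

Saving the program block as meansd_table.ado in a folder on the adopath would make the command available in every session without redefining it.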
General Coding Resources
- Read DIME Analytics' guide to Stata coding and cleaning.
- Refer to these Stata cheat sheets on GitHub.
- Gentzkow and Shapiro's Code and Data for the Social Sciences is a handbook for best practices.
- Poverty Action Lab's Programming with Stata, Princeton's Getting Started in Data Analysis Using Stata, and Stanford's Basics of Stata provide resources for beginning and intermediate Stata users.