Stata Coding Practices

Researchers use Stata in all stages of an '''impact evaluation''' (or study), such as [[Sampling & Power Calculations |sampling]], [[Randomization in Stata | randomizing]], [[Monitoring Data Quality | monitoring data quality]], [[Data Cleaning | cleaning]], and [[Data Analysis | analysis]]. Good '''Stata coding practices''', packages, and commands are a critical component of high-quality [[Reproducible Research | reproducible research]]. These practices also allow the [[Impact Evaluation Team|impact evaluation team]] (or research team) to save time and energy, and focus on other [[Randomized Evaluations: Principles of Study Design|aspects of study design]].  
==Read First==
* [https://www.worldbank.org/en/research/dime/data-and-analytics DIME Analytics] and institutions like [https://github.com/PovertyAction Innovations for Poverty Action (IPA)] offer a wide range of resources, including tutorials, sample code, and easy-to-install packages and commands.
* <code>[https://github.com/worldbank/iefieldkit/ iefieldkit]</code> is a Stata package that standardizes '''best practices''' (guidelines) for high-quality, [[Reproducible Research | reproducible]] [[Primary Data Collection | primary data collection]].
* <code>[https://worldbank.github.io/ietoolkit/ ietoolkit]</code> is a Stata package that standardizes best practices in [[Data Management|data management]] and [[Data Analysis|data analysis]].  
* As with other Stata packages like [https://www.stata-journal.com/article.html?article=gr0059 <code>coefplot</code>], use <syntaxhighlight lang="Stata" inline>ssc install</syntaxhighlight> to download these packages.
* Other common Stata best practices, for instance, those for specifying file paths, also contribute to successful impact evaluations.


== iefieldkit ==
DIME has developed the <code>[[iefieldkit]]</code> package for Stata to simplify the process of [[Primary Data Collection|primary data collection]]. The package currently supports three major components of this '''workflow''' (process): [[Questionnaire Design|survey design]], [[Iecompdup|survey completion]], and [[Data Cleaning|data cleaning]] and [[Iecodebook#Harmonize| data harmonization]]. <code>[[iefieldkit]]</code> uses four commands to simplify these tasks:
* '''Before data collection.''' The <code>[[ietestform]]</code> command tests the survey form (the questionnaire programmed in SurveyCTO) to make sure it follows '''best practices''' in naming, coding, and labeling. For instance, it checks whether fields are set as required. A required field does not let an '''enumerator''' move to the next field until they enter a response, which ensures that incomplete forms cannot be submitted.  
* '''During data collection.''' The <code>[[ieduplicates]]</code> and <code>[[Iecompdup|iecompdup]]</code> commands allow the [[Impact Evaluation Team|research team]] to '''detect''' (identify) and '''resolve''' (deal with) duplicate entries in the data set. These commands were previously a part of the <code>[[Stata Coding Practices#ietoolkit|ietoolkit]]</code> package, but are now part of the <code>[[iefieldkit]]</code> package.
* '''After data collection.''' The <code>[[iecodebook]]</code> command provides a method for rapidly [[Data Cleaning|cleaning]], [[iecodebook#Harmonize|harmonizing]], and [[Data Documentation|documenting]] data sets.  
 
To install the <code>[[iefieldkit]]</code> package, type <syntaxhighlight lang="Stata" inline>ssc install iefieldkit</syntaxhighlight> in your Stata command window. Note that some features of this package might require '''metadata''' (information) that is specific to '''SurveyCTO''', but users can still try them with data collected using other tools.
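
The sketch below shows how the commands above might fit into a cleaning do-file. It assumes a raw data set with a household ID variable <code>hhid</code> and a folder global <code>${myProject}</code> set in the master do-file, as in the file path examples on this page. The file names, the variable name, and the ID value <code>1234</code> are all hypothetical. It uses Stata's built-in <code>duplicates report</code> to check the ID, <code>iecompdup</code> to compare two submissions that share an ID, and <code>iecodebook</code> to create and then apply a cleaning codebook.

<syntaxhighlight lang="stata" line>* Hypothetical file paths and variable names; ${myProject} is assumed
* to be set in the master do-file.
use "${myProject}/Data/Raw/survey_raw.dta", clear

* During data collection: check whether the ID variable hhid is unique,
* then compare the submissions that share a duplicated ID value.
duplicates report hhid
iecompdup hhid, id(1234)   // compares the entries that share hhid 1234

* After data collection, step 1: write an Excel codebook listing all
* variables in memory. Run this line once, fill in the new names and
* labels in Excel, and then comment the line out.
iecodebook template using "${myProject}/Documentation/codebook.xlsx"

* Step 2: apply the completed codebook to rename, relabel, and recode
* the data in one documented, reproducible step.
iecodebook apply using "${myProject}/Documentation/codebook.xlsx"

save "${myProject}/Data/Clean/survey_clean.dta", replace
</syntaxhighlight>

Keeping the codebook in a separate Excel file means that every renaming and relabeling decision is documented outside the do-file. Other team members can review these decisions without reading the code.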


== ietoolkit ==
DIME has developed the <code>[[Ietoolkit|ietoolkit]]</code> package for Stata to simplify the process of [[Data Management|data management]] and [[Data Analysis|analysis]] in impact evaluations. The commands that are currently part of this package are listed below.  
 
* '''Data management.'''
** <code>[[iefolder]]</code> sets up a '''standardized''' (common) structure for all folders that are shared as part of a project, that is, the '''project folder'''. It creates [[Master Do-files|master do-files]] that link to all '''sub-folders''' (folders within another folder), and updates them when new data collection rounds or units of observation are added with the command, so that the master do-files always match the current project folder. This command helps create [[Reproducible Research|reproducible research]].
** <code>[[iegitaddmd]]</code> allows members of the research team to share a '''template''' (outline) folder structure for a new project on GitHub, even if some of its folders are still empty. Git does not track empty folders, so this command adds a '''placeholder''' file (<code>README.md</code>) to each empty folder. The placeholder can be removed later, once real files are added to that folder. For example, templates often include an output folder where the results of [[Data Analysis|data analysis]] will be stored. This folder remains empty until the data set has been [[Data Cleaning|cleaned]] and analyzed. With this command, all members of the team can still share the folder with each other on GitHub.
** <code>[[ieboilstart]]</code> standardizes the '''version''', '''capacity''' (for example, the maximum number of variables Stata can hold in memory), and other Stata settings for all users in a project. This command should be '''run''' (executed) at the top of all do-files that are shared between members of the [[Impact Evaluation Team|research team]], directly followed by the line <code>`r(version)'</code>, since the version setting cannot be applied from inside the command itself. Such code is called '''boilerplate code''', since the same standard code appears at the beginning of every do-file.  
The following example uses these commands:
 
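<syntaxhighlight lang="stata" line>ieboilstart, versionnumber(14.0) // Standardizes the version and other settings for everyone
`r(version)' // Applies the version setting; this line must be run directly in the do-file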
global folder "C:/Users/username/DropBox/ProjectABC"
iefolder new project, projectfolder("$folder") //Sets up the main structure
iegitaddmd, folder("$folder") // Adds placeholder files so that empty folders can be shared on GitHub </syntaxhighlight>
* '''Data analysis.''' 
** <code>[[iematch]]</code> matches observations in one group to the observations in another group that are closest in terms of a particular characteristic. <br>For example, consider a study designed to evaluate the impact of randomly providing cash transfers to half the workers in a firm. The research team can use <code>[[iematch]]</code> to match women in the '''treatment''' group (which received the cash transfers) with the observations in the '''control''' group (which did not receive the cash transfers) whose wages are closest to theirs, and then compare the matched pairs.
** <code>[[iebaltab]]</code> runs [[Balance tests|balance tests]], and produces '''balance tables''' which show the difference in means for one or more '''treatment''' groups. It can be used to check if there are '''statistically significant''' differences between the '''treatment''' and '''control''' groups. In case there are significant differences in the means, <code>[[iebaltab]]</code> even displays an error message that suggests that results from such data can be wrongly interpreted.
** <code>[[iedropone]]</code> drops a specific number of observations (one by default) and stops with an error if the condition matches a different number of observations, so that no additional observations are dropped by mistake.
** <code>[[ieboilsave]]</code> performs checks to ensure that '''best practices''' are followed before saving a data set.
** <code>[[ieddtab]]</code> runs [[Difference-in-Differences | difference-in-differences]] regressions and displays the results in well-formatted tables.
** <code>[[iegraph]]</code> produces graphs of results from regression models that researchers commonly use during impact evaluations.
To install the <code>ietoolkit</code>, type <syntaxhighlight lang="Stata" inline>ssc install ietoolkit</syntaxhighlight> in your Stata command window.
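
To illustrate two of the analysis commands, the sketch below assumes a cleaned data set with a household ID <code>hhid</code>, a treatment dummy <code>treatment</code>, and an outcome <code>income</code>. These names, the file path, and the ID value are hypothetical. It uses <code>iedropone</code> instead of <code>drop</code>, so that a mistyped condition fails loudly instead of silently dropping the wrong rows.

<syntaxhighlight lang="stata" line>* Hypothetical file path and variable names; ${myProject} is assumed
* to be set in the master do-file.
use "${myProject}/Data/Clean/survey_clean.dta", clear

* Drop one known test interview. iedropone stops with an error if the
* condition matches a different number of observations than expected
* (one by default), so a typo in the ID cannot drop other rows.
iedropone if hhid == 1234

* Estimate the treatment effect, then graph the result with iegraph,
* which is run directly after the regression it summarizes.
regress income treatment
iegraph treatment
</syntaxhighlight>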


== File Paths==
DIME Analytics suggests the following guidelines for specifying '''file paths''' in Stata:
* '''Double quotes (<code>"</code>).''' Always enclose file paths in double quotes (<code>"</code>), so that paths containing spaces are read correctly. For example, <syntaxhighlight lang="Stata" inline>"$maindir"</syntaxhighlight>.
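* '''Forward slashes (<code>/</code>).''' Always use forward slashes (<code>/</code>) to specify folder '''hierarchies''', that is, the exact location of a folder inside another folder, and so on. For example, <code>"C:/Users/username/Documents"</code>. This is important because Mac and Linux computers cannot read file paths with '''backslashes''' (<code>\</code>), while Stata on Windows reads forward slashes correctly. In addition, Stata treats a backslash placed directly before a macro (for example, <code>\$folder</code>) as an escape character, so the macro is not expanded.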
* '''File extension.''' Always include the file extension in the file path, such as <code>.dta</code>, <code>.do</code>, or <code>.csv</code>. This avoids '''ambiguity''' (or doubt) if another file with the same name but a different extension exists.
* '''Absolute.''' File paths must be '''absolute''', that is, all file paths must begin from the '''root folder''' of the computer, for example, <code>C:/</code> on a PC or <code>/Users/</code> on a Mac. This ensures that users always specify the correct file in the correct folder. '''Relative''' (non-absolute) file paths are common in many other programming languages, where they are relative to the location of the file running the code. In Stata, however, relative file paths are resolved from the current working directory, which depends on the last <code>cd</code> command rather than on where the do-file is saved.
* '''Dynamic.''' A file path is said to be '''non-dynamic''' if it relies on <code>cd</code> or hard-codes the full folder location. File paths should always be '''dynamic'''. This means that users must use '''globals''' (global macros) that are defined in the [[Master Do-files|master do-file]] to build file paths. These file paths are called '''dynamic''' because a new user only needs to change the value of the folder '''global''' in the '''master do-file''' to update every file path in the project. Globals also allow users to define more than one folder location, which is not possible with <code>cd</code>.
* Never use <code>cd</code>. If <code>cd</code> still points to another project folder (for example, from work on another project earlier in the same session), a user can accidentally overwrite a file in that folder. It is always better to use '''absolute''' and '''dynamic''' file paths, since there is then no risk of files being saved in the wrong project folder (as long as each global macro has a unique name).
=== Examples ===
* Dynamic and absolute file path (recommended).
<syntaxhighlight lang="stata" line>global myDocs "C:/Users/username/Documents"
global myProject "${myDocs}/MyProject"
use "${myProject}/MyDataset.dta"</syntaxhighlight>
* Non-absolute, non-dynamic file path (not recommended).
<syntaxhighlight lang="stata" line>cd "C:/Users/username/Documents/MyProject"
use MyDataset.dta</syntaxhighlight>
* Absolute, but non-dynamic file path (not recommended).
<syntaxhighlight lang="stata" line>cd "C:/Users/username/Documents/MyProject"
use "C:/Users/username/Documents/MyProject/MyDataset.dta"</syntaxhighlight>
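
A common way to make the root folder global work for every team member is to set it from Stata's built-in <code>c(username)</code> value at the top of the master do-file. This is a minimal sketch; the usernames <code>username1</code> and <code>username2</code> and their folders are hypothetical placeholders, which each team member would replace with their own.

<syntaxhighlight lang="stata" line>* Top of a master do-file. Usernames and folders are hypothetical.
* Each user adds one block for their own computer; all other file
* paths in the project are built from ${myProject}.
if c(username) == "username1" {
    global myDocs "C:/Users/username1/Documents"
}
if c(username) == "username2" {
    global myDocs "/Users/username2/Documents"
}
global myProject "${myDocs}/MyProject"

* Stop with a clear error if the folder global points to the wrong place
confirm file "${myProject}/MyDataset.dta"
use "${myProject}/MyDataset.dta", clear
</syntaxhighlight>

Since only the <code>if</code> blocks differ between computers, a new team member adds a single block and does not need to edit any other file path in the project.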


== Exporting Tables ==
Tables play a crucial role in presenting the results of a study in an easy-to-understand format. However, researchers often copy and paste results from Stata and format them by hand in word-processing software, which makes the results difficult to reproduce. [https://www.worldbank.org/en/research/dime/data-and-analytics DIME Analytics] has therefore created the following resources for exporting tables directly from Stata:
* [[Checklist:_Submit_Table|Checklist for submitting tables in development research]]
* [https://osf.io/78nuc/ Nice and fast tables in Stata for LaTex and Excel]
* [https://github.com/worldbank/stata-tables GitHub - Stata tables] is a repository with do-files and output tables. Use these to practice exporting tables using the <code>esttab</code> command.
* [https://blogs.worldbank.org/impactevaluations/nice-and-fast-tables-stata Blog post on Stata tables]
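
As a minimal sketch of exporting a table with <code>esttab</code>, the example below uses the <code>auto</code> data set that ships with Stata. It assumes the community-contributed <code>estout</code> package is installed. The output folder global <code>outputs</code> is a hypothetical placeholder that would normally be set in the master do-file.

<syntaxhighlight lang="stata" line>* Requires the community-contributed estout package: ssc install estout
* The output folder below is hypothetical; set it in your master do-file.
global outputs "C:/Users/username/Documents/MyProject/Outputs"

sysuse auto, clear   // example data set that ships with Stata

* Store the results of each regression
eststo clear
eststo: regress price mpg
eststo: regress price mpg weight

* Write both models side by side to a LaTeX table, with standard errors
* in parentheses and variable labels as row names. Rerunning the do-file
* recreates the table, so no manual copy-and-paste is needed.
esttab using "${outputs}/price_regressions.tex", se label replace
</syntaxhighlight>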


== Related Pages ==
[[Special:WhatLinksHere/Stata_Coding_Practices|Click here for pages that link to this topic.]]


== Additional Resources ==
===Programs and Commands===
* DIME Analytics (World Bank), [https://gist.github.com/kbjarkefur/1f880b78029eaf78416d12dfd2076985 Writing programs in Stata]. This GitHub repository contains <code>.ado</code> files that reduce tedious programming tasks, such as statistical analysis and the production of graphs, to a single command. You can use this repository to experiment with various commands.
* DIME Analytics (World Bank), [https://gist.github.com/kbjarkefur/16b63c1fc89ab52c3d4cae9c74288452 Sharing sub-functions between different commands]. Download the <code>.ado</code> files and follow the instructions.
* DIME Analytics (World Bank), [https://worldbank.github.io/Stata-IE-Visual-Library/ Stata visual library]
* DIME Analytics (World Bank), [https://github.com/worldbank/DIME-Resources/blob/master/welcome-iefolder.pdf Data Management and <code>iefolder</code>]
* DIME Analytics (World Bank), [https://github.com/worldbank/DIME-Resources/blob/master/stata2-3-data.pdf Data management for reproducible research].
* DIME Analytics (World Bank), [https://github.com/worldbank/DIME-Resources/blob/master/stata1-4-quality.pdf Real-time data quality checks and <code>ietoolkit</code>].
* DIME Analytics (World Bank), [https://github.com/worldbank/DIME-Resources/blob/master/stata1-2-coding.pdf Guide to Stata coding]  
* DIME Analytics (World Bank), [https://github.com/worldbank/DIME-Resources/blob/master/stata1-3-cleaning.pdf Guide to data cleaning in Stata].
* DIME (World Bank), [[Checklist: Submit Table| Checklist on submitting results.]]
 
* Gentzkow and Shapiro (Stanford) [http://web.stanford.edu/~gentzkow/research/CodeAndData.pdf Code and Data for the Social Sciences]
===General Coding Resources===
* The GeoCenter, [http://geocenter.github.io/StataTraining/portfolio/01_resource/  Stata cheat sheets.]
* Innovations for Poverty Action, [http://www.poverty-action.org/researchers/research-resources/stata-programs Stata modules for data collection and analysis]  
* Innovations for Poverty Action, [https://github.com/PovertyAction GitHub repository on impact evaluations]
* Innovations for Poverty Action, [https://github.com/PovertyAction/odkmeta Odkmeta command]. This command writes a do-file to import ODK (Open Data Kit) data to Stata, using metadata from the survey and choices worksheets of the XLSForm.
* J-PAL, [https://www.povertyactionlab.org/sites/default/files/resources/IAPStataWorkshopSlides.pdf Programming with Stata]
* Princeton, [https://www.princeton.edu/~otorres/StataTutorial.pdf Data analysis in Stata for beginners]  
* Stanford, [https://web.stanford.edu/~leinav/teaching/econ257/STATA.pdf Basics of Stata]  
* World Bank, [https://worldbank.github.io/stata/ Stata repository].
[[Category: Coding Practices]]
[[Category: Reproducible Research]]
[[Category: Stata Coding Practices]]
[[Category: Technical Tools]]

Revision as of 16:31, 14 April 2021
