Difference between revisions of "Ietoolkit"

[https://www.worldbank.org/en/research/dime/data-and-analytics DIME Analytics] has developed the <code>ietoolkit</code> package for Stata to simplify [[Data Management|data management]] and [[Data Analysis|analysis]] in impact evaluations. Along with <code>iefieldkit</code>, this package allows [[Impact Evaluation Team|research teams]] to standardize highly repetitive but important tasks in projects that involve [[Primary Data Collection|primary data collection]], with the aim of promoting high-quality [[Reproducible Research|reproducible research]].
==Read First==
* [https://www.worldbank.org/en/research/dime/data-and-analytics DIME Analytics] has conducted a [https://osf.io/csmxz/ bootcamp on reproducible research] which establishes standard best practices in development research.
* [[Stata Coding Practices|Stata coding practices]] lists common best practices for writing reproducible and replicable Stata '''do-files'''.
* You can [https://github.com/worldbank/ietoolkit/blob/master/CONTRIBUTING.md contribute] to future versions of <code>ietoolkit</code> through the [https://github.com/worldbank/ietoolkit GitHub repository] maintained by [https://www.worldbank.org/en/research/dime/data-and-analytics DIME Analytics].
* To install the package, type <syntaxhighlight lang="Stata" inline>ssc install ietoolkit</syntaxhighlight> in the Stata command window.
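As a minimal sketch of the installation step, the lines below install the package from SSC and then confirm that Stata can find one of its commands. The <code>replace</code> option and the <code>which</code> command are standard Stata features rather than part of <code>ietoolkit</code>; they are included here only as a suggested way to check the installation.
<syntaxhighlight lang="Stata">
* Install ietoolkit from SSC; replace overwrites an older installed copy, if any
ssc install ietoolkit, replace

* Show where the installed iefolder file is located, along with its version line
which iefolder
</syntaxhighlight>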
== Data Management ==
The following commands in <code>ietoolkit</code> help [[Impact Evaluation Team|research team members]] with various aspects of [[Data Management|data management]]:
* <code>[[iefolder]]</code>: Sets up a '''standardized''' (common) structure for all folders that are shared as part of a project, that is, the [[DataWork Folder|project folder]]. It also creates [[Master Do-files|master do-files]] that link to all '''sub-folders''', and updates them whenever the command is used to add new folders to the '''project folder''', for example for a new round of data shared by the '''field teams'''.
* <code>[[iegitaddmd]]</code>: Allows members of the research team to share a '''template''' (outline) folder for a new project on GitHub even if it is empty. Git does not track empty folders, so this command creates a '''placeholder''' file in each empty folder, which can be updated later when a file is added to that folder. For example, templates often include an output folder where the results of [[Data Analysis|data analysis]] will be stored. This folder remains empty until the data set is [[Data Cleaning|cleaned]] and analyzed. With the placeholder in place, team members can still share this folder with each other on GitHub.
* <code>[[ieboilstart]]</code>: Standardizes the '''version''', '''capacity''' (in terms of the number of observations Stata can store in memory), and other Stata settings for all users in a project. This command should be run at the top of all do-files that are shared between members of the [[Impact Evaluation Team|research team]]. Such code is called '''boilerplate code''', since the same lines appear at the beginning of every do-file.
An example that uses these commands is given below:
<syntaxhighlight lang="Stata">
* Standardize Stata settings for everyone; the second line applies the version setting
ieboilstart, version(14.0)
`r(version)'

global folder "C:/Users/username/DropBox/ProjectABC"

* Set up the main folder structure for the project
iefolder new project, projectfolder("$folder")

* Add placeholder files so that empty sub-folders can be shared on GitHub
iegitaddmd, folder("$folder")
</syntaxhighlight>
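The same commands can be used again as the project grows. The sketch below assumes that the project folder from the example above already exists, and adds folders for two rounds of data collection. The round names <code>baseline</code> and <code>endline</code> are hypothetical; see the help file for <code>[[iefolder]]</code> for the full list of items that can be added.
<syntaxhighlight lang="Stata">
* Same (hypothetical) project folder as in the example above
global folder "C:/Users/username/DropBox/ProjectABC"

* Add folders for two rounds of data collection to the existing project
iefolder new round baseline, projectfolder("$folder")
iefolder new round endline,  projectfolder("$folder")

* Add placeholder files to the new folders, which are still empty
iegitaddmd, folder("$folder")
</syntaxhighlight>
Running <code>[[iegitaddmd]]</code> again after adding folders means that the new, empty sub-folders can also be shared on GitHub.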
 
== Data Analysis ==
The following commands in <code>ietoolkit</code> help [[Impact Evaluation Team|research team members]] with various aspects of [[Data Analysis|data analysis]]:
* <code>[[iematch]]</code>: Matches observations in one group to the observations in another group that are closest in terms of a particular characteristic. <br>For example, consider a study which is designed to evaluate the impact of randomly providing cash transfers to half the workers in a firm. The research team can use <code>[[iematch]]</code> to match women in the '''treatment''' group (which received the cash transfers) with the observations in the '''control''' group (which did not receive the cash transfers) that have the closest wages, and then compare the matched observations.
* <code>[[iebaltab]]</code>: Runs [[Balance tests|balance tests]], and produces '''balance tables''' which show the difference in means for one or more '''treatment''' groups. It can be used to check if there are '''statistically significant''' differences between the '''treatment''' and '''control''' groups. If there are significant differences in the means, <code>[[iebaltab]]</code> displays an error message which cautions that results from such data can be misinterpreted.
* <code>[[iedropone]]</code>: Drops a specified number of observations (one by default), and returns an error if the condition matches any other number of observations, so that no additional observations are dropped by accident.
* <code>[[ieboilsave]]</code>: Performs checks to ensure that '''best practices''' are followed before saving a data set.
* <code>[[ieddtab]]</code>: Runs [[Difference-in-Differences|difference-in-differences]] regressions and displays the results in well-formatted tables.
* <code>[[iegraph]]</code>: Produces graphs of results from regression models that researchers commonly use during impact evaluations.
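Two of the commands above can be tried out on the <code>auto</code> data set that ships with Stata. This is only an illustrative sketch: the data set has nothing to do with impact evaluations, and the variables <code>foreign</code> and <code>price</code> simply stand in for a treatment dummy and a matching characteristic. The options shown are those in the help files for <code>[[iedropone]]</code> and <code>[[iematch]]</code> as far as we are aware, so check the help file of your installed version before relying on them.
<syntaxhighlight lang="Stata">
* Example data set that ships with Stata (74 observations)
sysuse auto, clear

* Drop exactly one observation; the command returns an error if the
* condition matches zero observations or more than one observation
iedropone if make == "Volvo 260"

* Match each observation with foreign == 1 to the observation with
* foreign == 0 that has the closest value of price
iematch, grpdummy(foreign) matchvar(price)
</syntaxhighlight>
After <code>[[iedropone]]</code> runs, the data set has one observation fewer. <code>[[iematch]]</code> does not drop anything, and instead adds variables that describe the result of the matching.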
'''NOTE:''' To install the above commands in the <code>ietoolkit</code> package, type <syntaxhighlight lang="Stata" inline>ssc install ietoolkit</syntaxhighlight> in your Stata command window.
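Once the package is installed, a [[Difference-in-Differences|difference-in-differences]] table can be produced with a single line. The sketch below uses simulated data, so every variable name and number in it is hypothetical: 200 units are observed at baseline and follow-up, and half of them are treated. The <code>time()</code> and <code>treatment()</code> options are taken from the help file for <code>[[ieddtab]]</code> as we understand it.
<syntaxhighlight lang="Stata">
* Simulated (hypothetical) panel: 200 units, each observed twice
clear
set seed 12345
set obs 400
generate id = ceil(_n / 2)
bysort id: generate post = _n - 1       // 0 = baseline, 1 = follow-up
generate treat = mod(id, 2)             // half of the units are treated
generate wage = 100 + 5*post + 3*treat + 10*post*treat + rnormal(0, 5)

* Difference-in-differences table for the outcome variable wage
ieddtab wage, time(post) treatment(treat)
</syntaxhighlight>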
 
== Related Pages ==
[[Special:WhatLinksHere/Ietoolkit|Click here for pages that link to this topic.]]
 
== Additional Resources ==
* DIME Analytics (World Bank), [https://github.com/worldbank/ietoolkit Ietoolkit - Stata commands for impact evaluations]
[[Category: Reproducible Research]]
[[Category:Stata Coding Practices]]

Latest revision as of 14:24, 13 April 2021
