Iegraph

iegraph is used to graphically visualize regression results for some regression models commonly used in impact evaluations. This article is meant to describe use cases, work flow, and the reasoning used when developing the command. For instructions on how to use the command specifically in Stata and for a complete list of the options available, see the help files by typing help iegraph in Stata. This command is a part of the package ietoolkit. To install all the commands in this package, type ssc install ietoolkit in Stata.

Intended use cases

iegraph generates a graph from regression estimates. The command was developed to work with two specific models common in impact evaluations, but it is possible that it also works for other regression models.

OLS with Treatment Dummies Model

The first regression model, let's call it dummy OLS for short, is a specification where each treatment arm is represented by a dummy. See the equation below.
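Written out, with A as the constant that this article later calls the A coefficient, the specification takes roughly the following form. The coefficient names B and G, the error term e and the subscripts are notational assumptions made here for illustration:

```latex
\[
Y_i = A + B_1\,tmt_{1i} + B_2\,tmt_{2i} + \dots + B_k\,tmt_{ki} + X_i G + e_i
\]
```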

The dummy OLS has one tmt variable for each treatment arm. The omitted category is intended to be the control group. There has to be at least one treatment dummy, and the number of dummies is limited only by how many can be displayed on the graph without it getting too cluttered. The specification may include control variables, fixed effects etc., which are represented by the vector X in the equation.

Difference-in-Differences Model

The second regression model is a difference-in-differences model, let's call it diff-in-diff for short, where treatment is the dummy D and time is the dummy T.
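With D as the treatment dummy, T as the time dummy and A as the constant, the specification takes roughly the following form. The coefficient names B and G, the error term e and the subscripts are notational assumptions made here for illustration:

```latex
\[
Y_{it} = A + B_1 D_i + B_2 T_t + B_3\,(D_i T_t) + X_{it} G + e_{it}
\]
```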

Both the treatment dummy and the time dummy are included in the regression, as well as the interaction term between them (D, T and DT in the equation). The specification may include control variables, fixed effects etc., which are represented by the vector X in the equation.

Intended Work Flow

Simply run the regression using the regress command in Stata, and run iegraph immediately afterwards.
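As a minimal sketch of this work flow for a dummy OLS, the following uses simulated data. All variable names (y, tmt1, tmt2, x1) and all numbers are hypothetical and chosen only for illustration:

```stata
* Simulated data; all variable names and numbers are hypothetical
clear
set seed 12345
set obs 300
* arm: 0 = control, 1 and 2 = the two treatment arms
generate arm  = mod(_n, 3)
generate tmt1 = (arm == 1)
generate tmt2 = (arm == 2)
generate x1   = rnormal()
generate y    = 10 + 2*tmt1 + 3*tmt2 + 0.5*x1 + rnormal()

* Step 1: run the regression, including any control variables
regress y tmt1 tmt2 x1

* Step 2: run iegraph immediately afterwards, listing the treatment dummies
iegraph tmt1 tmt2
```

The control variable x1 is part of the regression but is left out of the iegraph variable list, so only the treatment dummies are shown in the graph.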

Instructions

These instructions are meant to help you understand how to use the command. For technical instructions on how to implement the command in Stata, see the help files by typing help iegraph in Stata.

Values In The Graph

Note that the values in the graph are exactly the same as the regression coefficients (for the treatment dummy or dummies in the dummy OLS, and for the treatment and time dummies in the diff-in-diff) only if no control variables, fixed effects etc. were used. To make the graph easier to interpret for a non-technical audience, while keeping it correct and equally informative for a technical audience, the omitted category (the control group in the dummy OLS and the control group in time = 0 in the diff-in-diff) is represented by the average value of Y in that group and not by the A coefficient. This average is also the starting point for the other values.

If there are no control variables, fixed effects etc., the average of Y for the omitted category is equal to the A coefficient, but that is only true in this very specific case. If we use the A coefficient together with control variables and fixed effects, we risk ending up with values in the graph that might not make sense, for example negative harvest values or a negative number of pre-natal visits. A technical audience would know that the impact of the treatment can still be read from such a graph, but a non-technical audience would be confused. It is likely rare that the A coefficient shifts so much that harvest or visits become negative, but it can shift far enough from the true value that non-technical readers find the absolute values not credible and then do not trust the rest of the analysis. That is why the omitted category is represented by its average value of Y.

The other categories are represented by the average value of Y in the control group plus the value of the coefficient of the corresponding dummy variable in the regression. This way the impact is clearly shown (the difference between this value and the omitted category) but since the starting point is the average of Y for the omitted category, the absolute value in the graph is close to the average value of that category.
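The calculation described above can be reproduced by hand for a dummy OLS. The sketch below uses simulated data with hypothetical variable names and numbers; it illustrates the values described in this section and is not iegraph's internal code:

```stata
* Simulated data; all variable names and numbers are hypothetical
clear
set seed 12345
set obs 300
generate arm  = mod(_n, 3)
generate tmt1 = (arm == 1)
generate tmt2 = (arm == 2)
generate x1   = rnormal()
generate y    = 10 + 2*tmt1 + 3*tmt2 + 0.5*x1 + rnormal()

regress y tmt1 tmt2 x1

* Omitted category: the plain average of y among control observations,
* used instead of the constant _b[_cons]
summarize y if tmt1 == 0 & tmt2 == 0
scalar ctrl_mean = r(mean)

* Other categories: control mean plus the coefficient of the corresponding dummy
scalar bar_tmt1 = ctrl_mean + _b[tmt1]
scalar bar_tmt2 = ctrl_mean + _b[tmt2]

display "control: " ctrl_mean "  tmt1: " bar_tmt1 "  tmt2: " bar_tmt2
```

Comparing ctrl_mean with _b[_cons] after a regression that includes controls shows how far the constant can drift from the control group average.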

List of dummies

When using iegraph, you always have to list the treatment dummy variables (and the time and interaction dummies if you ran a diff-in-diff) as the variable list. For example: iegraph T1 T2 T3, where T1, T2 and T3 are treatment dummies. This is the only way that iegraph knows which coefficients are the treatment dummies and which coefficients are control variables, fixed effects etc. Only the treatment dummy (and the time and interaction dummies in diff-in-diff) will be displayed in the graph.
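For a diff-in-diff, the variable list would contain the treatment, time and interaction dummies. The sketch below uses simulated data; the variable names (treat, post, treat_post, x1) are hypothetical, and the order in which the three dummies are listed is an assumption, so check help iegraph for the expected order:

```stata
* Simulated data; all variable names and numbers are hypothetical
clear
set seed 12345
set obs 400
* treat is the treatment dummy (D), post is the time dummy (T)
generate treat      = mod(_n, 2)
generate post       = (_n > 200)
generate treat_post = treat * post
generate x1         = rnormal()
generate y          = 10 + 1*treat + 2*post + 3*treat_post + 0.5*x1 + rnormal()

* Regression with treatment, time and interaction dummies plus a control
regress y treat post treat_post x1

* List the three dummies, but not the control variable x1
iegraph treat post treat_post
```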

iegraph tests that the dummies fit either of the two models this command has been developed to work with. The command checks whether the dummies satisfy one of the two sets of criteria listed below. If neither set is satisfied, an error is returned (see below the lists for the option that disables this test).

Dummy OLS
  • Some observations have the value 0 for all treatment dummies - control observations
  • No observation has the value 1 in more than one treatment dummy - no observation can be in two treatment arms
  • For all treatment dummies, there are at least some observations that have the value 1 - at least some observations in each treatment arm

Diff-in-Diff
  • Some observations have the value 0 for all dummies - omitted control observations in time = 0
  • Some observations must have the value 1 for only the treatment dummy - treatment observations in time = 0
  • Some observations must have the value 1 for only the time dummy - control observations in time = 1
  • Some observations must have the value 1 in all three of the time, treatment and interaction dummies - treatment observations in time = 1
  • No observation has the value 1 in exactly two dummies or in four or more dummies.
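The dummy OLS criteria can also be checked by hand before running the command. The following is a minimal sketch on simulated data with hypothetical variable names; it mirrors the three dummy OLS criteria and is not the code iegraph itself runs:

```stata
* Simulated data; all variable names are hypothetical
clear
set obs 300
generate arm  = mod(_n, 3)
generate tmt1 = (arm == 1)
generate tmt2 = (arm == 2)

* Number of treatment dummies that equal 1 for each observation
egen n_tmt = rowtotal(tmt1 tmt2)

* Criterion 1: some observations have 0 for all treatment dummies (controls)
count if n_tmt == 0
assert r(N) > 0

* Criterion 2: no observation has 1 in more than one treatment dummy
assert n_tmt <= 1

* Criterion 3: every treatment dummy has at least some observations with 1
foreach var of varlist tmt1 tmt2 {
    count if `var' == 1
    assert r(N) > 0
}
```

Each assert stops with an error if its criterion fails, which makes it easy to see which condition a dataset violates.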

If you want to use this command for something slightly different, you can disable these tests by using the option ignoredummytest. If you have a model other than dummy OLS or diff-in-diff that you think this command is a good fit for, please let us know and we will see if we can add it as a supported model. Contact information is available on our GitHub page.

Formatting options

Many of the formatting options available to Stata's two-way scatter graph can be applied to iegraph by just adding those options to iegraph. Some options that should be applied directly to each bar need to be specified in the baroption() option.

Passing options from one command, like Stata's two-way scatter, to a user-written command is not always straightforward and can have unintended consequences. For the advanced user, there is an option that allows for debugging. This option is norestore, which tells iegraph not to return the original dataset but instead the one that iegraph prepared to produce the graph from (be aware that you will lose any unsaved data when you do this).

Once you have the same dataset that iegraph uses, you can get the line of code that iegraph uses to generate the graph by accessing it from the returned macro r(cmd). If you find any potential improvements or any bugs, please let us know. Contact information is available on our GitHub page.
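A debugging session along these lines might look as follows. This is a sketch on simulated data with hypothetical variable names; it is meant to be run as a do-file, because it saves a copy of the data to a temporary file before norestore replaces the data in memory:

```stata
* Simulated data; all variable names and numbers are hypothetical
clear
set seed 12345
set obs 300
generate arm  = mod(_n, 3)
generate tmt1 = (arm == 1)
generate tmt2 = (arm == 2)
generate y    = 10 + 2*tmt1 + 3*tmt2 + rnormal()

* Keep a copy of the data first, since norestore replaces the data in memory
tempfile original
save `original'

regress y tmt1 tmt2
iegraph tmt1 tmt2, norestore

* Show the graph command that iegraph returned in r(cmd)
display `"`r(cmd)'"'

* Inspect the dataset that iegraph prepared, then reload the original data
describe
use `original', clear
```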

Back to Parent

This article is part of the topic ietoolkit