Difference between revisions of "Iegraph"

Revision as of 01:52, 13 January 2018

iegraph is used to graphically visualize regression results for some regression models commonly used in impact evaluations.

This article is meant to describe use cases, work flow and the reasoning used when developing the command. For instructions on how to use the command in Stata and for a complete list of the available options, see the help file by typing help iegraph in Stata. This command is part of the package ietoolkit. To install all the commands in this package, including this one, type ssc install ietoolkit in Stata.
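The two commands mentioned above can be typed directly in the Stata command window. A minimal sketch:

```stata
* Install ietoolkit (which includes iegraph) from SSC; only needed once per computer
ssc install ietoolkit

* Open the help file with the full syntax and the complete list of options
help iegraph
```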

Intended use cases

iegraph generates a graph from regression estimates. The command is implemented and tested to work with two specific models common in impact evaluations, but it is possible that there are more regression models for which it works.

OLS with Treatment Dummies Model

The first regression model, let's call it dummy OLS for short, is the specification where each treatment arm is represented by a dummy. See the equation below.

The dummy OLS has one tmt variable for each treatment arm. The omitted category is intended to be the control group. There has to be at least one treatment dummy, and the number of dummies is limited only by how many can be displayed in the graph without it getting too cluttered. The specification may include control variables, fixed effects etc., which are represented by the vector X in the equation.
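A sketch of how the dummy OLS specification described above can be written. The subscripted names tmt_1 to tmt_n for the treatment arm dummies are an assumption made for illustration; α is the constant, X the vector of controls and fixed effects, and ε the error term.

```latex
y = \alpha + \beta_1 tmt_1 + \beta_2 tmt_2 + \dots + \beta_n tmt_n + \beta X + \varepsilon
```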

Difference-in-Differences Model

The second regression model is a difference-in-differences model, let's call it diff-in-diff for short, where treatment is the dummy D and time is the dummy T.

<math>y = \alpha + \beta_1 D + \beta_2 T + \beta_3 (D*T) + \beta X + \varepsilon</math>

Both of these dummies are included in the regression, as well as the interaction term between them (D, T and DT in the equation). The specification may include control variables, fixed effects etc. (the vector of Xs in the equation).

Intended Work Flow

Simply run the regression using the regress command in Stata, and immediately afterwards run iegraph.
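As an illustration of this work flow, here is a minimal sketch for each of the two supported models. All variable names (harvest_value, tmt1, tmt2, treat, post, treat_post, hh_size) are hypothetical, and the two examples are assumed to be run on two different data sets.

```stata
* Dummy OLS: two treatment arms (tmt1, tmt2) and one control variable (hh_size)
regress harvest_value tmt1 tmt2 hh_size
iegraph tmt1 tmt2

* Diff-in-diff: treatment dummy, time dummy and their interaction
generate byte treat_post = treat * post
regress harvest_value treat post treat_post hh_size
iegraph treat post treat_post
```

Only the dummies are listed after iegraph. The control variable hh_size is part of the regression but is not passed to the command.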

Instructions

These instructions are meant to help you understand how to use the command. For technical instructions on how to implement the command in Stata, see the help file by typing help iegraph in Stata.

Values In The Graph

One important note: the values used in the graph are exactly the same as the coefficients (for the treatment dummy or dummies in the dummy OLS, and for the treatment and time dummies in the diff-in-diff) only if no control variables, fixed effects etc. were used. To make the graph easier to interpret for a non-technical audience, while keeping it correct and equally informative for a technical audience, the omitted category (the control group in the dummy OLS, and the control group in time = 0 in the diff-in-diff) is represented by the average value of Y in that group and not by the α coefficient (the constant). This is also the starting point for the other values.

If there are no control variables, fixed effects etc., the average of Y for the omitted category is equal to the α coefficient, but that is only true in this very specific case. If we used the α coefficient together with control variables and fixed effects, we would risk ending up with values in the graph that do not make sense, for example a negative harvest value or a negative number of pre-natal visits. A technical audience would know that the impact of the treatment can still be read from such a graph, but a non-technical audience would be confused. It may be rare that the α coefficient shifts so much that harvest or visits become negative, but it will shift away from its true value to a degree that non-technical readers might find the absolute values not credible and then not trust the rest of the analysis. That is why the omitted category is represented by its average value of Y.

The other categories are represented by the average value of Y in the control group plus the value of the coefficient of the corresponding dummy variable in the regression. This way the impact is clearly shown (the difference between this value and the omitted category), but since the starting point is the average of Y for the omitted category, the absolute value in the graph is close to the average value of that category.
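The arithmetic described above can be reproduced by hand for a dummy OLS. This is a sketch, not the code iegraph runs internally: the variable names are hypothetical, and restricting the control mean to the estimation sample with e(sample) is an assumption.

```stata
regress harvest_value tmt1 tmt2 hh_size

* Mean of Y in the omitted category (the control group), within the estimation sample
summarize harvest_value if e(sample) & tmt1 == 0 & tmt2 == 0
scalar ctrl_mean = r(mean)

* Control bar: the group average, not the constant
display "Control bar: " (scalar(ctrl_mean))

* Treatment bars: control average plus the coefficient of each dummy
display "tmt1 bar:    " (scalar(ctrl_mean) + _b[tmt1])
display "tmt2 bar:    " (scalar(ctrl_mean) + _b[tmt2])

* For comparison, the constant, which differs from the control mean once controls are included
display "Constant:    " (_b[_cons])
```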

List of dummies

When using iegraph you always have to list the treatment dummy variables (and the time and interaction dummies if you ran a diff-in-diff) as the varlist, like this: iegraph T1 T2 T3, where T1, T2 and T3 are treatment dummies. This is the only way that iegraph knows which coefficients are the treatment dummies and which coefficients are control variables, fixed effects etc. Only the treatment dummies (and the time and interaction dummies in a diff-in-diff) will be displayed in the graph.

iegraph tests that the dummies fit either of the two models this command has been implemented to work with. The command tests that one of the two sets of criteria in the table below is true for the dummies. Otherwise an error is thrown (see below the table for the option that disables this test).

Dummy OLS
  • Some observations have the value 0 for all treatment dummies - control observations
  • No observation has the value 1 in more than one treatment dummy - no observation can be in two treatment arms
  • For all treatment dummies there are at least some observations that have the value 1 - at least some observations in each treatment arm

Diff-in-Diff
  • Some observations have the value 0 for all dummies - omitted control observations in time = 0
  • Some observations must have the value 1 for only the treatment dummy - treatment observations in time = 0
  • Some observations must have the value 1 for only the time dummy - control observations in time = 1
  • Some observations must have the value 1 in all three of the time, treatment and interaction dummies - treatment observations in time = 1
  • No observation has the value 1 in exactly two dummies or in four or more dummies.
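The criteria in the table can also be checked by hand before running iegraph. The sketch below is an illustration under assumed variable names (tmt1, tmt2, tmt3 for a dummy OLS; treat, post, treat_post for a diff-in-diff), not the test code that iegraph itself uses. Each half would be run on the data set for the corresponding model.

```stata
* Dummy OLS checks
egen byte n_arms = rowtotal(tmt1 tmt2 tmt3)
* Some control observations exist
count if n_arms == 0
assert r(N) > 0
* No observation is in more than one treatment arm
assert n_arms <= 1
* Every treatment arm has at least some observations
foreach arm of varlist tmt1 tmt2 tmt3 {
    count if `arm' == 1
    assert r(N) > 0
}

* Diff-in-diff checks
egen byte n_dd = rowtotal(treat post treat_post)
* Control observations in time = 0
count if n_dd == 0
assert r(N) > 0
* Treatment observations in time = 0
count if treat == 1 & n_dd == 1
assert r(N) > 0
* Control observations in time = 1
count if post == 1 & n_dd == 1
assert r(N) > 0
* Treatment observations in time = 1
count if n_dd == 3
assert r(N) > 0
* No observation has the value 1 in exactly two dummies
assert n_dd != 2
```

If any assert fails, Stata stops with an error, which points to the criterion the dummies do not satisfy.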

If you want to use this command for something slightly different, you can disable these tests by using the option ignoredummytest. If you have a model other than dummy OLS or diff-in-diff that you think this command is a good fit for, please let us know and we will see if we can add that model as a supported model. Contact information is available on our GitHub page.
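A minimal sketch of disabling the dummy tests, with hypothetical variable names:

```stata
regress harvest_value tmt1 tmt2 hh_size

* Skip the checks on the dummies; iegraph no longer verifies that they fit a supported model
iegraph tmt1 tmt2, ignoredummytest
```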

Formatting options

Many of the formatting options available to Stata's two-way scatter graph can be applied to iegraph by just adding those options to iegraph. Some options that should be applied directly to each bar need to be specified in the baroption() option.

Passing options from one command, like Stata's two-way scatter, to a user-written command is not always straightforward and can have unintended consequences. For the advanced user there is an option that allows for debugging. This option is norestore, which tells iegraph not to return the original data set but instead the data set that iegraph prepared to produce the graph from (be aware that you will lose any unsaved data when you do this).

Once you have the same data set that iegraph uses, you can get the line of code that iegraph uses to generate the graph by accessing it from the returned macro r(cmd). If you find any potential improvements or any bugs, please let us know. Contact information is available on our GitHub page.
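As an illustration, two standard two-way graph options, title() and ytitle(), are added after the comma below. The variable names and the title texts are hypothetical, and the help file remains the reference for which options iegraph accepts.

```stata
regress harvest_value tmt1 tmt2 hh_size

* Standard two-way graph options are added after the comma
iegraph tmt1 tmt2, title("Treatment effect on harvest value") ytitle("Harvest value")
```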
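A sketch of this debugging work flow, meant to be run in one go from a do-file. The variable names are hypothetical. The data is saved to a temporary file first because norestore replaces the data in memory, and r(cmd) is copied to a local macro immediately because later commands such as describe overwrite the returned results.

```stata
* Keep a copy of the original data
tempfile original
save `original'

regress harvest_value tmt1 tmt2 hh_size
iegraph tmt1 tmt2, norestore

* Store the graph command right away, before another command replaces r()
local graph_cmd `"`r(cmd)'"'

* The data in memory is now the data set iegraph prepared for the graph
describe

* Show the line of code iegraph used to draw the graph
display `"`graph_cmd'"'

* Bring back the original data
use `original', clear
```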

Back to Parent

This article is part of the topic ietoolkit