Difference between revisions of "Iegraph"

Revision as of 14:15, 12 January 2018

iegraph produces graphs of regression results for some regression models commonly used in impact evaluations.

This article is meant to describe use cases, work flow and the reasoning used when developing the command. For instructions on how to use the command in Stata and for a complete list of the available options, see the help file by typing help iegraph in Stata. This command is part of the package ietoolkit. To install all the commands in this package, including this one, type ssc install ietoolkit in Stata.

Intended use cases

iegraph generates a graph from regression estimates. The command is implemented and tested to work with two specific models common in impact evaluations, but it may also work with other regression models.

The first regression model is the specification where each treatment arm is represented by a dummy (the vector of Ds in the equation). The omitted category is intended to be the control group. There must be at least one treatment dummy, and the number of dummies is limited only by how many can be displayed in the graph without it getting too cluttered. The specification may include control variables, fixed effects etc. (the vector of Xs in the equation).

$Y = \beta_0 + \beta_1 D + \beta_2 X + \mu$

The second regression model is a difference-in-differences model where treatment is the dummy D and time is the dummy T. Both dummies are included in the regression, as is the interaction term between them (D, T and DT in the equation). The specification may include control variables, fixed effects etc. (the vector of Xs in the equation).

$Y = \beta_0 + \beta_1 D + \beta_2 T + \beta_3 DT + \beta_4 X + \mu$

If you are using either of these models you can quickly produce a graph with confidence interval bars by using iegraph.
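As a rough illustration of the second model, the sketch below fits the difference-in-differences specification without control variables using ordinary least squares in Python with NumPy. It is not how iegraph is implemented and it is not Stata code. The function name did_coefficients and the eight data points are made up for this example. With no controls, the coefficient on DT equals the difference-in-differences of the four group means.

```python
import numpy as np

def did_coefficients(y, d, t):
    """Fit Y = const + D + T + D*T by OLS and return the four coefficients.

    Hypothetical helper for illustration only; no control variables (X) are included.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    t = np.asarray(t, dtype=float)
    # Design matrix: constant, treatment dummy, time dummy, interaction term
    X = np.column_stack([np.ones(len(y)), d, t, d * t])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["const", "D", "T", "DT"], coefs))

# Made-up data: two observations in each treatment-by-time cell
d = [0, 0, 0, 0, 1, 1, 1, 1]
t = [0, 0, 1, 1, 0, 0, 1, 1]
y = [4, 6, 7, 9, 5, 7, 12, 14]

coefs = did_coefficients(y, d, t)
for name, value in coefs.items():
    print(name, round(float(value), 6))
```

In this made-up data the group means are 5 (control, time 0), 8 (control, time 1), 6 (treatment, time 0) and 13 (treatment, time 1), so the interaction coefficient is (13 - 6) - (8 - 5) = 4.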

Values In The Graph

One important note is that the values used in the graph are exactly the same as the coefficients for the treatment dummy/dummies in the first model, and for the treatment and time dummies in the second model, only if no control variables, fixed effects etc. were used. To make the graph easier to interpret for a non-technical audience, while keeping it correct and equally informative for a technical audience, the omitted category (the control group in the first model and the control group in time = 0 in the second model) is shown as the average value of Y in that group and not as the A coefficient (the regression constant, or intercept). This average is also the starting point for the other values.

If there are no control variables, fixed effects etc., the average of Y for the omitted category is equal to the A coefficient, but that is only true in this very specific case. If we used the A coefficient together with control variables and fixed effects, we would risk ending up with values in the graph that do not make sense, for example a negative harvest value or a negative number of pre-natal visits. A technical audience would know that the impact of the treatment can still be read from such a graph, but a non-technical audience would be confused. It may be rare for the A coefficient to shift so much that harvest or visits become negative, but it will shift away from the group average enough that readers might find the absolute values not credible and then not trust the rest of the analysis. That is why the omitted category is represented by its average value of Y.

The other categories are represented by the average value of Y in the control group plus the value of the coefficient of the corresponding dummy variable in the regression. This way the impact is clearly shown (the difference between this value and the omitted category), and since the starting point is the average of Y for the omitted category, the absolute value in the graph is close to the average value of that category.
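The calculation described above can be sketched in a few lines of Python with NumPy. This is an illustration of the reasoning under stated assumptions, not iegraph's actual code. The function name bar_values, the single control variable and the data are all made up. The data are constructed so that the regression constant is negative even though every observed value of Y is positive.

```python
import numpy as np

def bar_values(y, treat, controls):
    """Return (control_mean, treated_bar, intercept) for the first model.

    Hypothetical helper for illustration only:
    control_mean is the average of Y in the omitted (control) category,
    treated_bar is control_mean plus the treatment coefficient,
    intercept is the regression constant, which is not used as a bar value.
    """
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat, dtype=float)
    controls = np.asarray(controls, dtype=float).reshape(len(y), -1)
    # Design matrix: constant, treatment dummy, control variable(s)
    X = np.column_stack([np.ones(len(y)), treat, controls])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    intercept, treat_coef = coefs[0], coefs[1]
    control_mean = y[treat == 0].mean()
    return control_mean, control_mean + treat_coef, intercept

# Made-up data generated as y = -4 + 3*treat + 1*x, with no noise
treat = [0, 0, 0, 0, 1, 1, 1, 1]
x = [10, 12, 14, 16, 11, 13, 15, 17]
y = [6, 8, 10, 12, 10, 12, 14, 16]

control_mean, treated_bar, intercept = bar_values(y, treat, x)
print("control bar:", round(float(control_mean), 6))
print("treated bar:", round(float(treated_bar), 6))
print("regression constant:", round(float(intercept), 6))
```

Here the control bar is the control group average (9) and the treated bar is that average plus the treatment coefficient (9 + 3 = 12), which is close to the raw treated average of 13. The regression constant is -4, a value that would make no sense as a bar for an outcome such as harvest value.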

Intended Work Flow

Describe work flow here (remove if obvious from use case)

Instructions

These instructions are meant to help you understand how to use the command. For technical instructions on how to implement the command in Stata, see the help file by typing help iegraph in Stata.

Describe best practices related to this command here.

Reasoning used during development

Describe any non obvious decisions made during development of this command. This can help explain restrictions and requirements

Back to Parent

This article is part of the topic ietoolkit