Difference between revisions of "Iegraph"

(17 intermediate revisions by 2 users not shown)
Line 1: Line 1:
<onlyinclude>
'''iegraph''' is used to graphically visualize regression results for some regression models commonly used in impact evaluations.
'''iegraph''' is used to graphically visualize regression results for some regression models commonly used in impact evaluations.
 
</onlyinclude>
This article is means to describe use cases, work flow and the reasoning used when developing the commands. For instructions on how to use the command specifically in Stata and for a complete list of the options available, see the help files by typing <code>help iegraph</code> in Stata. This command is a part of the package [[Stata_Coding_Practices#ietoolkit|ietoolkit]], to install all the commands in this package including this command, type <code>ssc install ietoolkit</code> in Stata.
This article is meant to describe use cases, work flow and the reasoning used when developing the commands. For instructions on how to use the command specifically in Stata and for a complete list of the options available, see the help files by typing <code>help iegraph</code> in Stata. This command is a part of the package [[Stata_Coding_Practices#ietoolkit|ietoolkit]], to install all the commands in this package including this command, type <code>ssc install ietoolkit</code> in Stata.


== Intended use cases ==
== Intended use cases ==
This generates a graph from regression estimations. This command is implemented and tested to work with two specific models common in impact evaluations, but it is possible that that there are more regression models for which this command works.
This generates a graph from regression estimations. This command is implemented and tested to work with two specific models common in impact evaluations, but it is possible that that there are more regression models for which this command works.


=== OLS with treatment dummies model ===  
=== OLS with Treatment Dummies Model ===  
The first regression model, let's call it ''Dummy OLS'' for short, is a the specification where each treatment arm is represented by a dummy (the vector of Ds in the equation). The omitted category is intended to be the control group. The number of treatment dummy has to be at least one and are only limited to the number of dummies that can be displayed in the graph without getting to cluttered. The specification may include control variables, fixed effects etc. (the vector of Xs in the equation).  
The first regression model, let's call it ''dummy OLS'' for short, is a the specification where each treatment arm is represented by a dummy. See the equation below.
 
<math>y = \alpha + \beta_1 tmt_1 + \beta_2 tmt_2 + \cdots + \beta_n tmt_n + \beta X + \varepsilon </math>


Y = A + BD + BX + mu (update when math extension is installed)
The ''dummy OLS'' has one ''tmt'' variable for each treatment arm. The omitted category is intended to be the control group. The number of treatment dummy has to be at least one and are only limited to the number of dummies that can be displayed in the graph without getting to cluttered. The specification may include control variables, fixed effects etc. which is represented by the vector of X in the equation.


=== Difference-in-Differences model ===  
=== Difference-in-Differences Model ===  
The second regression model is a difference in difference model, let's call it ''diff-in-diff'' for short, where treatment is the dummy D and time is the dummy T. Both these dummies are included in the regression as well as the interaction term between them (D, T and DT in the equation). The specification may include control variables, fixed effects etc. (the vector of Xs in the equation).
The second regression model is a difference in difference model, let's call it ''diff-in-diff'' for short, where treatment is the dummy D and time is the dummy T.  


Y = A + BD + BT + BDT + BX + mu (update when math extension is installed)
<math>y = \alpha + \beta_1 D + \beta_2 T + \beta _3(D*T) + \beta X + \varepsilon</math>


If you are using any of these models you can quickly produce a graph with confidential interval bars by using iegraph.
Both the treatment dummy and the time dummy are included in the regression as well as the interaction term between them (''D'', ''T'' and ''DT'' in the equation). The specification may include control variables, fixed effects etc. which is represented by the vector of X in the equation.


=== Intended Work Flow ===
=== Intended Work Flow ===
Line 25: Line 28:


=== Values In The Graph ===
=== Values In The Graph ===
One '''important note''' is that it is only if no control variables, fixed effects etc. were used that the values used in the graph is exactly the same as the coefficients for the treatment dummy/dummies in the first model and the treatment and time dummies in the second model. To make the graph more easily interpreted by a non-technical audience -- but still correct and equally informative to a technical audience -- the omitted category (the control group in the first model and control group in time = 0 in the second model) is the average value of Y in that group and not the A coefficient. This is also the starting point of the other values.  
One '''important note''' is that it is only if no control variables, fixed effects etc. were used that the values used in the graph is exactly the same as the coefficients for the treatment dummy/dummies in the ''dummy OLS'' and the treatment and time dummies in the ''diff-in-diff''. To make the graph more easily interpreted by a non-technical audience -- but still correct and equally informative to a technical audience -- the omitted category (the control group in the ''dummy OLS'' and control group in time = 0 in the ''diff-in-diff'') is the average value of Y in that group and not the A coefficient. This is also the starting point of the other values.  


If there were no control variables, fixed effects etc. the average of Y for the omitted category is equal to the A coefficient, but that is only true in this very specific case. If we would use the A coefficient together with control variables and fixed effects we would risk ending up with values in the graph that might not make sense, for example negative harvest value or negative number of pre-natal visits. A technical audience would know that the impact of the treatment can still be read from such a graph, but a non-technical audience would be confused. It might be rare that the A coefficient shifts that much that harvest or visits becomes negative, but it will shift away from its true value to the degree that non-technical readers might find the absolute values not credible and then not trust the rest of the analysis. That is why the omitted category is represented by its average value of Y.
If there were no control variables, fixed effects etc. the average of Y for the omitted category is equal to the A coefficient, but that is only true in this very specific case. If we would use the A coefficient together with control variables and fixed effects we would risk ending up with values in the graph that might not make sense, for example negative harvest value or negative number of pre-natal visits. A technical audience would know that the impact of the treatment can still be read from such a graph, but a non-technical audience would be confused. It might be rare that the A coefficient shifts that much that harvest or visits becomes negative, but it will shift away from its true value to the degree that non-technical readers might find the absolute values not credible and then not trust the rest of the analysis. That is why the omitted category is represented by its average value of Y.
Line 32: Line 35:


=== List of dummies ===
=== List of dummies ===
When using iegraph you always have to list the treatment dummy variables (and the time and interaction dummies if you ran a diff-in-diff) as the varlist. This is the only way that iegraph knows which coefficients are the treatment dummies and which coefficients are control variables, fixed effects etc. Only the treatment dummy (and time and interaction dummy) will be displayed in the graph.
When using iegraph you always have to list the treatment dummy variables (and the time and interaction dummies if you ran a ''diff-in-diff'') as the varlist like this: <code>iegraph T1 T2 T3</code> where T1, T2 and T3 are treatment dmmies. This is the only way that iegraph knows which coefficients are the treatment dummies and which coefficients are control variables, fixed effects etc. Only the treatment dummy (and time and interaction dummy in ''diff-in-diff'') will be displayed in the graph.


The command test that one of these two sets of criteria are true in regards to the dummies. Otherwise and error is thrown.
iegraph test that the dummies fits either of the two model this command has been implemented to work with. The command test that one of these two sets of criteria are true in regards to the dummies. Otherwise and error is thrown (see below table for option how to disable this test).


{| class="wikitable"
{| class="wikitable"
|style="text-align:center; width: 50%" | '''OLS with treatment dummies'''
|style="text-align:center; width: 50%" | '''Dummy OLS'''
|style="text-align:center; width: 50%" | '''Diff-in-Diff'''
|style="text-align:center; width: 50%" | '''Diff-in-Diff'''
|-
|-
Line 51: Line 54:
* No observation has the value 1 in exactly two dummies or in four or more dummies.
* No observation has the value 1 in exactly two dummies or in four or more dummies.
|}
|}
If you want to use this command for something slightly different you can disable these tests by using the option ''ignoredummytest''. If you have a model other than ''dummy OLS'' or ''diff-in-diff'' that you think this command is a good fit for, please let us know and we will see if we can add that model as a supported model. Contact information on our [https://github.com/worldbank/ietoolkit GitHub page].


=== Formatting options ===
=== Formatting options ===
Many of the formatting options available to Stata's ''two-way scatter'' graph can be applied to iegraph by just adding those options to iegraph. Some options that should be applied directly to each bar needs to be specified in the ''baroption()'' option.
Allowing options from one command, like Stata's ''two-way scatter'', to a user written command is not always straightforward and can have unintended consequences. For the advanced user there is an option that allows for debugging. This options is ''norestore'' which tells iegraph to not return the original data set but the data set that iegraph prepared to produce the graph from (be aware that you will lose any unsaved data when you do this).


== Reasoning used during development ==
Now when you have the same data set that iegraph uses you can get the line of code that iegraph use to generate the table by accessing that code from the returned macro <code>r(cmd)</code>. If you find any potential improvements or any bugs please let us know. Contact information on our [https://github.com/worldbank/ietoolkit GitHub page].
''Describe any non obvious decisions made during development of this command. This can help explain restrictions and requirements''


== Back to Parent ==
== Back to Parent ==

Revision as of 17:24, 20 November 2018

iegraph is used to graphically visualize regression results for some regression models commonly used in impact evaluations.

This article is meant to describe use cases, workflow and the reasoning used when developing the command. For instructions on how to use the command in Stata and for a complete list of the available options, see the help file by typing help iegraph in Stata. This command is part of the package ietoolkit. To install all the commands in this package, including this one, type ssc install ietoolkit in Stata.

Intended use cases

iegraph generates a graph from regression estimates. The command is implemented and tested to work with two specific models common in impact evaluations, but it may also work with other regression models.

OLS with Treatment Dummies Model

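The first regression model, let's call it dummy OLS for short, is the specification where each treatment arm is represented by a dummy. See the equation below.

<math>y = \alpha + \beta_1 tmt_1 + \beta_2 tmt_2 + \cdots + \beta_n tmt_n + \beta X + \varepsilon</math>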

The dummy OLS has one tmt variable for each treatment arm. The omitted category is intended to be the control group. There has to be at least one treatment dummy, and the number of dummies is limited only by how many can be displayed in the graph without it getting too cluttered. The specification may include control variables, fixed effects etc., which are represented by the vector X in the equation.

Difference-in-Differences Model

The second regression model is a difference-in-differences model, let's call it diff-in-diff for short, where treatment is the dummy D and time is the dummy T.

<math>y = \alpha + \beta_1 D + \beta_2 T + \beta_3 (D \times T) + \beta X + \varepsilon</math>

Both the treatment dummy and the time dummy are included in the regression, as well as the interaction term between them (D, T and their product DT in the equation). The specification may include control variables, fixed effects etc., which are represented by the vector X in the equation.
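As a rough illustration of the diff-in-diff specification, the sketch below builds a small simulated data set and runs the regression before calling iegraph. The variable names (D, T, DT, x, y) and the simulated values are assumptions made for this example only, and ietoolkit is assumed to be installed.

```stata
* Hypothetical simulated data with all four treatment-by-time cells
clear
set seed 12345
set obs 400
generate D  = mod(_n, 2)              // treatment dummy
generate T  = mod(ceil(_n / 2), 2)    // time dummy
generate DT = D * T                   // interaction term
generate double x = rnormal()         // one control variable
generate double y = 5 + 1*D + 2*T + 1.5*DT + 0.5*x + rnormal()

* The diff-in-diff regression: D, T, their interaction, and the control
regress y D T DT x

* List the treatment, time and interaction dummies as the varlist
iegraph D T DT
```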

Intended Work Flow

Simply run the regression using the regress command in Stata, and immediately afterwards run iegraph.
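A minimal sketch of this work flow for the dummy OLS case is shown below. The data set is simulated, the variable names (arm, tmt1, tmt2, y) are hypothetical, and ietoolkit is assumed to be installed.

```stata
* Hypothetical example data: one control group and two treatment arms
clear
set seed 12345
set obs 300
generate arm  = mod(_n, 3)            // 0 = control, 1 and 2 = treatment arms
generate tmt1 = (arm == 1)
generate tmt2 = (arm == 2)
generate double y = 10 + 2*tmt1 + 3*tmt2 + rnormal()

* Step 1: run the regression
regress y tmt1 tmt2

* Step 2: immediately afterwards, graph the treatment dummies
iegraph tmt1 tmt2
```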

Instructions

These instructions are meant to help you understand how to use the command. For technical instructions on how to implement the command in Stata, see the help file by typing help iegraph in Stata.

Values In The Graph

One important note is that it is only if no control variables, fixed effects etc. were used that the values in the graph correspond exactly to the regression coefficients (the treatment dummy or dummies in the dummy OLS, and the treatment and time dummies in the diff-in-diff). To make the graph easier to interpret for a non-technical audience, while keeping it correct and equally informative for a technical audience, the omitted category (the control group in the dummy OLS, and the control group in time = 0 in the diff-in-diff) is shown as the average value of Y in that group and not as the constant α. This average is also the starting point for the other values.

If there are no control variables, fixed effects etc., the average of Y for the omitted category is equal to the constant α, but that is only true in this very specific case. If we used the constant α together with control variables and fixed effects, we would risk ending up with values in the graph that do not make sense, for example a negative harvest value or a negative number of pre-natal visits. A technical audience would know that the impact of the treatment can still be read from such a graph, but a non-technical audience would be confused. It might be rare that the constant shifts so much that harvest or visits become negative, but it will shift away from the group average to a degree that non-technical readers might find the absolute values not credible, and then not trust the rest of the analysis. That is why the omitted category is represented by its average value of Y.

The other categories are represented by the average value of Y in the omitted category plus the value of the coefficient of the corresponding dummy variable in the regression. This way the impact is clearly shown (the difference between this value and the omitted category), and since the starting point is the average of Y for the omitted category, the absolute value in the graph is close to the average value of that category.
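The calculation described above can be reproduced by hand for the dummy OLS case. The sketch below is only an illustration under assumed, simulated data (the names arm, tmt1, tmt2, x, y and the scalar ctrl_mean are hypothetical); it does not show iegraph's internal code.

```stata
* Hypothetical simulated data with one control variable
clear
set seed 12345
set obs 300
generate arm  = mod(_n, 3)
generate tmt1 = (arm == 1)
generate tmt2 = (arm == 2)
generate double x = rnormal()
generate double y = 10 + 2*tmt1 + 3*tmt2 + 4*x + rnormal()

* Average of y in the omitted category (the control group)
summarize y if tmt1 == 0 & tmt2 == 0
scalar ctrl_mean = r(mean)

* With a control variable, the constant need not equal the control mean
regress y tmt1 tmt2 x
display "Constant:       " _b[_cons]
display "Control mean:   " ctrl_mean
display "Value for tmt1: " ctrl_mean + _b[tmt1]
display "Value for tmt2: " ctrl_mean + _b[tmt2]

* Without controls, the constant and the control mean coincide
regress y tmt1 tmt2
assert reldif(_b[_cons], ctrl_mean) < 1e-8
```

The last two lines check the special case mentioned in the text: with only the treatment dummies in the regression, the constant equals the control group average up to numerical precision.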

List of dummies

When using iegraph you always have to list the treatment dummy variables (and the time and interaction dummies if you ran a diff-in-diff) as the varlist, like this: iegraph T1 T2 T3, where T1, T2 and T3 are treatment dummies. This is the only way that iegraph knows which coefficients are the treatment dummies and which coefficients are control variables, fixed effects etc. Only the treatment dummies (and the time and interaction dummies in a diff-in-diff) will be displayed in the graph.

iegraph tests that the dummies fit one of the two models this command has been implemented to work with. That is, the command tests that one of the two sets of criteria below holds for the dummies. Otherwise an error is thrown (see below the lists for the option that disables this test).

Dummy OLS
  • Some observations have the value 0 for all treatment dummies - control observations
  • No observation has the value 1 in more than one treatment dummy - no observation can be in two treatment arms
  • For each treatment dummy, at least some observations have the value 1 - at least some observations in each treatment arm

Diff-in-Diff
  • Some observations have the value 0 for all dummies - omitted control observations in time = 0
  • Some observations have the value 1 for only the treatment dummy - treatment observations in time = 0
  • Some observations have the value 1 for only the time dummy - control observations in time = 1
  • Some observations have the value 1 in all three of the time, treatment and interaction dummies - treatment observations in time = 1
  • No observation has the value 1 in exactly two dummies or in four or more dummies.
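
The criteria above can be checked by hand before running iegraph. The following is a sketch of equivalent checks written with standard Stata commands, not the code iegraph itself uses; the data set and the variable names (tmt1, tmt2, D, T, DT, n_arms, n_ones) are hypothetical.

```stata
* Hypothetical data: treatment arms for dummy OLS, plus D, T and DT for diff-in-diff
clear
set obs 300
generate arm  = mod(_n, 3)
generate tmt1 = (arm == 1)
generate tmt2 = (arm == 2)
generate D    = mod(_n, 2)
generate T    = mod(ceil(_n / 2), 2)
generate DT   = D * T

* --- Dummy OLS criteria ---
egen n_arms = rowtotal(tmt1 tmt2)
count if n_arms == 0
assert r(N) > 0                  // some control observations
assert n_arms <= 1               // no observation in two treatment arms
foreach dummy of varlist tmt1 tmt2 {
    count if `dummy' == 1
    assert r(N) > 0              // some observations in each treatment arm
}

* --- Diff-in-diff criteria ---
egen n_ones = rowtotal(D T DT)
assert inlist(n_ones, 0, 1, 3)   // never exactly two dummies equal to 1
count if n_ones == 0
assert r(N) > 0                  // control observations in time = 0
count if D == 1 & T == 0 & DT == 0
assert r(N) > 0                  // treatment observations in time = 0
count if D == 0 & T == 1 & DT == 0
assert r(N) > 0                  // control observations in time = 1
count if D == 1 & T == 1 & DT == 1
assert r(N) > 0                  // treatment observations in time = 1
```

If any assert fails, Stata stops with an error, which mirrors the behavior described above when the dummies do not fit either model.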

If you want to use this command for something slightly different, you can disable these tests by using the option ignoredummytest. If you have a model other than dummy OLS or diff-in-diff that you think this command is a good fit for, please let us know and we will see if we can add it as a supported model. Contact information is available on our GitHub page (https://github.com/worldbank/ietoolkit).

Formatting options

Many of the formatting options available to Stata's twoway scatter graph can be applied to iegraph by simply adding those options to the iegraph command. Options that should be applied directly to each bar need to be specified in the baroption() option.
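As a small sketch of passing a graph option through to iegraph, the example below adds a title. The data and variable names are hypothetical, and title() is used here on the assumption that it is among the twoway options iegraph accepts; see help iegraph for the options that are actually supported, including the exact syntax of the bar option.

```stata
* Hypothetical example data: one control group and two treatment arms
clear
set seed 12345
set obs 300
generate arm  = mod(_n, 3)
generate tmt1 = (arm == 1)
generate tmt2 = (arm == 2)
generate double y = 10 + 2*tmt1 + 3*tmt2 + rnormal()

regress y tmt1 tmt2

* Add a standard twoway option after the comma
iegraph tmt1 tmt2, title("Treatment effects on y")
```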

Passing options from one command, like Stata's twoway scatter, through to a user-written command is not always straightforward and can have unintended consequences. For the advanced user there is an option that allows for debugging. This option is norestore, which tells iegraph not to restore the original data set but to leave in memory the data set that iegraph prepared to produce the graph from (be aware that you will lose any unsaved data when you do this).

Once you have the same data set that iegraph uses, you can get the line of code that iegraph uses to generate the graph by accessing it from the returned macro r(cmd). If you find any potential improvements or any bugs, please let us know. Contact information is available on our GitHub page (https://github.com/worldbank/ietoolkit).
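
The debugging steps described above might look like the sketch below. The data and variable names are hypothetical, and the original data is saved to a temporary file first because norestore replaces the data in memory.

```stata
* Hypothetical example data: one control group and two treatment arms
clear
set seed 12345
set obs 300
generate arm  = mod(_n, 3)
generate tmt1 = (arm == 1)
generate tmt2 = (arm == 2)
generate double y = 10 + 2*tmt1 + 3*tmt2 + rnormal()

* Save the original data, since norestore leaves iegraph's data set in memory
tempfile original
save `original'

regress y tmt1 tmt2
iegraph tmt1 tmt2, norestore

* Show the graph command stored by iegraph, then inspect the prepared data
display `"`r(cmd)'"'
describe

* Return to the original data
use `original', clear
```

The display line comes before describe because describe is itself an r-class command and would overwrite the results returned by iegraph.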

Back to Parent

This article is part of the topic ietoolkit