Difference between revisions of "Randomization Inference"

 
Line 1: Line 1:
<onlyinclude>Randomization inference is a statistical practice for calculating regression p-values that reflect variation in experimentally assigned data arising from the randomization itself. When the researcher controls the treatment assignment of the entire observed group, variation arises from the treatment assignment rather than from the sampling strategy, and therefore p-values based on the randomization may be more appropriate than "standard" p-values.</onlyinclude>
Randomization inference is a method of calculating regression p-values that take into account any variations in [[Randomized_Control_Trials | RCT]] data that arise from [[Randomization in Stata | randomization]] itself. When the researcher controls the treatment assignment of the entire observed group, variation arises from the treatment assignment rather than from the [[Sampling & Power Calculations | sampling]] strategy. Randomization inference considers what would have occurred under not only the random assignment that happened to be selected for the [[Experimental Methods | experiment]], but rather under all possible random assignments: would the results hold? Randomization inference takes place during [[Data Analysis | data analysis]]. This page will cover the motivation behind randomization inference, explain how to implement it, and discuss its implications.  


==Motivation: Baseline Balance in Experimental Data==
==Read First==
* Randomization inference considers what would have occurred under not only the random assignment that happened to be selected for the [[Experimental Methods | experiment]], but rather under all possible random assignments: would the results hold?
* Although the practice is not yet required by most journals, randomization inference is straightforward to implement with [http://blogs.worldbank.org/impactevaluations/print/finally-way-do-easy-randomization-inference-stata modern statistical software].
* When planning to utilize randomization inference for an experimental analysis, consider the difference in variation source during experimental design. In particular, perform power calculations and actual randomization to account for the randomization-inference method of p-value calculation.


[http://blogs.worldbank.org/impactevaluations/should-we-require-balance-t-tests-baseline-observables-randomized-experiments Recent discussions] have pointed out that "baseline balance" t-tests on datasets where treatment was randomly assigned are conceptually challenging. This is because the p-values from t-tests are properly interpreted as the estimated probability that the observed difference between the sampled groups would have been observed if those samples had been drawn from underlying sampling frames with no true mean difference. However, in a randomization framework, there is no underlying universe of observations from which the samples are drawn: the observed data comprises the full universe of eligible units and therefore the differences are exact, so "testing" them reveals no information in this view.
==Motivation==
===Baseline Balance in Experimental Data===


==Randomization Inference: Is the Treatment Effect Significant?==
[http://blogs.worldbank.org/impactevaluations/should-we-require-balance-t-tests-baseline-observables-randomized-experiments Recent discussions] point out that [[Balance tests | balance tests]] on datasets where treatment was randomly assigned are conceptually challenging. P-values from t-tests are the estimated probability that the observed difference between the sampled groups would have been observed if those samples had been drawn from underlying sampling frames with no true mean difference. However, in the randomization framework, there is no underlying universe of observations from which the samples are drawn: the observed data makes up the full universe of eligible units and therefore the differences are exact. Thus, testing them reveals no information.


[https://jasonkerwin.com/nonparibus/2017/09/25/randomization-inference-vs-bootstrapping-p-values/ The same logic extends to differences in outcome variables] that the researcher wants to investigate for causal response to a randomly assigned treatment. The differences between the treatment and control groups are in general exact because the full universes are observed in data. This means that asymptotically-motivated "sampling variation" cannot be used to calculate whether the difference between the treatment and control groups is statistically significant. Rather than estimating the variation in draws from a hypothesized infinite underlying distribution (the mathematical approach of "standard" p-values), the researcher should instead compute p-values based on the knowable variation in hypothetical ''treatment assignments'', using the randomization process as the source of variation for the estimate.
===Is the Treatment Effect Significant?===


==Calculating Randomization Inference p-Values==
[https://jasonkerwin.com/nonparibus/2017/09/25/randomization-inference-vs-bootstrapping-p-values/ The same logic extends to differences in outcome variables] that the researcher investigates for causal response to a randomly assigned treatment. The differences between the treatment and control groups are, in general, exact because the full universes are observed in data. This means that asymptotically-motivated "sampling variation" cannot be used to calculate whether the difference between the treatment and control groups is statistically significant. Rather than estimating the variation in draws from a hypothesized infinite underlying distribution (the mathematical approach of "standard" p-values), the researcher should instead compute p-values based on the knowable variation in hypothetical ''treatment assignments'', using the randomization process as the source of variation for the estimate.


Although the practice is not yet required by most journals, randomization inference is straightforward to implement with [http://blogs.worldbank.org/impactevaluations/print/finally-way-do-easy-randomization-inference-stata modern statistical software]. The steps are conceptually straightforward in a Monte Carlo framework:
==Calculating P-Values with Randomization Inference==


#Preserve the original treatment assignment
Although randomization inference is not yet required by most journals, it is straightforward to implement with [http://blogs.worldbank.org/impactevaluations/print/finally-way-do-easy-randomization-inference-stata modern statistical software]. The steps are conceptually straightforward in a Monte Carlo framework:
#Generate placebo treatment statuses according to the original assignment method
#Estimate the original regression equation with an additional term for the placebo treatment
#Repeat #1–3
#The randomization inference p-value is ''the proportion of times the placebo treatment effect was larger than the estimated treatment effect''


#Preserve the original treatment assignment.
#Generate placebo treatment statuses according to the original assignment method.
#Estimate the original regression equation with an additional term for the placebo treatment.
#Repeat #1–3.
#The randomization inference p-value is ''the proportion of times the placebo treatment effect was larger than the estimated treatment effect.''


Because the treatment assignment is the source of variation in the experimental design, the p-value is correctly interpretable as "the probability that a similar size treatment effect would have been observed under different hypothetical realizations of the chosen randomization method".
Because the treatment assignment is the source of variation in the experimental design, the p-value is correctly interpretable as "the probability that a similar size treatment effect would have been observed under different hypothetical realizations of the chosen randomization method".
Line 24: Line 29:
==Implications for Experimental Design==
==Implications for Experimental Design==


When planing to utilize randomization inference for an experimental analysis, it is also important to consider the difference in variation source during experimental design. In particular, this means performing power calculations and actual randomization to account for the randomization-inference method of p-value calculation.
When planning to utilize randomization inference for an experimental analysis, consider the difference in variation source during experimental design. In particular, perform power calculations and actual randomization to account for the randomization-inference method of p-value calculation. [https://www.povertyactionlab.org/sites/default/files/publications/athey_imbens_june19.pdf Athey and Imbens (2016)] provide an extensive guide to these considerations. Major takeaways include:
 
[https://www.povertyactionlab.org/sites/default/files/publications/athey_imbens_june19.pdf Athey and Imbens (2016)] provide an extensive guide to these considerations. Major takeaways include:


#Power is maximized by forcing treatment-control balance on relevant baseline observables or outcome levels. This is achieved in theory by maximally partitioning into strata (2 treatment units and 2 control units in each, assuming a balanced design with one treatment arm), with fixed effects for the strata in the final regression.
#Power is maximized by forcing treatment-control balance on relevant baseline observables or outcome levels. This is achieved in theory by maximally partitioning into strata (2 treatment units and 2 control units in each, assuming a balanced design with one treatment arm), with fixed effects for the strata in the final regression.
Line 32: Line 35:
#The "re-randomization" approach to force balance is typically inappropriate.
#The "re-randomization" approach to force balance is typically inappropriate.


==Back to Parent==
This article is part of the topic [[Data Analysis]]
==Additional Resources==


[[Category:Data Analysis]]
[[Category:Data Analysis]]

Latest revision as of 14:26, 5 June 2019

Randomization inference is a method of calculating regression p-values that account for the variation in RCT data that arises from the randomization itself. When the researcher controls the treatment assignment of the entire observed group, variation arises from the treatment assignment rather than from the sampling strategy. Randomization inference asks what would have occurred not only under the random assignment that happened to be selected for the experiment, but under all possible random assignments: would the results hold? Randomization inference takes place during data analysis. This page covers the motivation behind randomization inference, explains how to implement it, and discusses its implications for experimental design.

Read First

  • Randomization inference asks what would have occurred not only under the random assignment that happened to be selected for the experiment, but under all possible random assignments: would the results hold?
  • Although the practice is not yet required by most journals, randomization inference is straightforward to implement with modern statistical software.
  • When planning to use randomization inference for an experimental analysis, consider the different source of variation during experimental design. In particular, perform power calculations and the actual randomization with the randomization-inference method of p-value calculation in mind.

Motivation

Baseline Balance in Experimental Data

Recent discussions point out that balance tests on datasets where treatment was randomly assigned are conceptually challenging. A p-value from a t-test is the estimated probability that a difference at least as large as the one observed between the sampled groups would have arisen if those samples had been drawn from underlying sampling frames with no true mean difference. However, in the randomization framework, there is no underlying universe of observations from which the samples are drawn: the observed data make up the full universe of eligible units, and therefore the differences are exact. In this view, testing them reveals no information.

Is the Treatment Effect Significant?

The same logic extends to differences in outcome variables that the researcher investigates for a causal response to a randomly assigned treatment. The differences between the treatment and control groups are, in general, exact because the full universe is observed in the data. This means that asymptotically motivated "sampling variation" cannot be used to calculate whether the difference between the treatment and control groups is statistically significant. Rather than estimating the variation in draws from a hypothesized infinite underlying distribution (the mathematical approach of "standard" p-values), the researcher should instead compute p-values based on the knowable variation in hypothetical treatment assignments, using the randomization process as the source of variation for the estimate.

Calculating P-Values with Randomization Inference

Although randomization inference is not yet required by most journals, it is straightforward to implement with modern statistical software. Conceptually, the steps follow a Monte Carlo framework:

  1. Preserve the original treatment assignment.
  2. Generate placebo treatment statuses according to the original assignment method.
  3. Estimate the original regression equation with an additional term for the placebo treatment.
  4. Repeat steps 2–3 many times, drawing a new placebo assignment each time.
  5. The randomization inference p-value is the proportion of repetitions in which the placebo treatment effect was at least as large in magnitude as the estimated treatment effect.

Because the treatment assignment is the source of variation in the experimental design, the p-value is correctly interpretable as "the probability that a treatment effect at least as large as the estimated one would have been observed under different hypothetical realizations of the chosen randomization method" if the treatment in fact had no effect.
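
The Monte Carlo steps above can be sketched in Python. This is a minimal illustration rather than the implementation in any particular software package, and it makes several assumptions that are not in the text above: the function name ri_p_value and its parameters are hypothetical, the original assignment is taken to be a complete randomization with a fixed number of treated units, the estimator is a simple difference in means instead of a full regression, and each placebo assignment replaces the actual treatment rather than entering as an additional term. The comparison is two-sided (at least as large in absolute value).

```python
import numpy as np


def ri_p_value(y, treat, n_draws=2000, seed=0):
    """Monte Carlo randomization-inference p-value for a difference in means.

    Assumes (hypothetically) complete randomization: a fixed number of
    treated units, chosen uniformly at random from all units.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat, dtype=bool)

    # Step 1: keep the original assignment and compute the observed effect.
    observed = y[treat].mean() - y[~treat].mean()

    extreme = 0
    for _ in range(n_draws):
        # Step 2: placebo assignment drawn the same way as the original
        # (here: reshuffle the labels, keeping the number treated fixed).
        placebo = rng.permutation(treat)
        # Step 3: re-estimate the effect under the placebo assignment.
        placebo_effect = y[placebo].mean() - y[~placebo].mean()
        # Step 5 (tally): count placebo effects at least as large in magnitude.
        if abs(placebo_effect) >= abs(observed):
            extreme += 1

    # Share of placebo draws at least as extreme as the observed effect.
    return observed, extreme / n_draws
```

If the experiment used a different assignment method (for example, stratified or clustered randomization), the placebo draw in step 2 would need to reproduce that method rather than a simple reshuffle.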

Implications for Experimental Design

When planning to use randomization inference for an experimental analysis, consider the different source of variation during experimental design. In particular, perform power calculations and the actual randomization with the randomization-inference method of p-value calculation in mind. Athey and Imbens (2016) provide an extensive guide to these considerations. Major takeaways include:

  1. Power is maximized by forcing treatment-control balance on relevant baseline observables or outcome levels. This is achieved in theory by maximally partitioning into strata (2 treatment units and 2 control units in each, assuming a balanced design with one treatment arm), with fixed effects for the strata in the final regression.
  2. Pairwise randomization is inappropriate because within-strata variances cannot be computed.
  3. The "re-randomization" approach to force balance is typically inappropriate.
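
The first takeaway above (strata of 2 treatment units and 2 control units) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure from Athey and Imbens (2016): the function name assign_within_strata is hypothetical, strata are formed by sorting on a single baseline variable, and the number of units is assumed to be a multiple of the stratum size.

```python
import numpy as np


def assign_within_strata(baseline, stratum_size=4, seed=0):
    """Form strata of similar units and treat half of each stratum.

    Hypothetical helper: sorts units on one baseline variable, cuts the
    sorted list into consecutive strata, and randomizes within each.
    """
    if stratum_size % 2 != 0:
        raise ValueError("stratum_size must be even for a balanced design")
    rng = np.random.default_rng(seed)
    baseline = np.asarray(baseline, dtype=float)
    n = baseline.size
    if n % stratum_size != 0:
        raise ValueError("number of units must be a multiple of stratum_size")

    # Units with similar baseline values end up in the same stratum.
    order = np.argsort(baseline, kind="stable")
    strata = np.empty(n, dtype=int)
    treat = np.zeros(n, dtype=int)
    for s, start in enumerate(range(0, n, stratum_size)):
        members = order[start:start + stratum_size]
        strata[members] = s
        # Randomly pick half of the stratum (2 of 4 by default) for treatment.
        treated = rng.choice(members, size=stratum_size // 2, replace=False)
        treat[treated] = 1
    return strata, treat
```

The returned stratum identifiers would enter the final regression as fixed effects, and placebo assignments for randomization inference would be drawn by re-running the same within-stratum procedure.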

Back to Parent

This article is part of the topic Data Analysis

Additional Resources