Latest revision as of 14:38, 13 April 2021
A randomized controlled trial (RCT) is a method of impact evaluation in which all eligible units in a sample are randomly assigned to treatment and control groups. The treatment group receives or participates in the program being tested, while the control group does not. Given a sufficiently large number of units, an RCT ensures that the control and treatment groups are equal in both observed and unobserved characteristics, thus ruling out selection bias. The only difference between the treatment and control groups, then, is their participation in the intervention itself, and the difference in their outcomes therefore represents the impact of the intervention or program.
RCTs are considered the gold standard of impact evaluation. This page will explain the context of and reasons for evaluating and randomizing, provide details on randomized assignment, and briefly touch on major steps in the RCT process.
Read First
- Randomized Evaluations: Principles of Study Design
- RCTs provide a reliable estimation of program impact; if the treatment and control groups are identical in all aspects other than their participation in the program, then any differences in outcome must be attributed to the program itself.
- Random assignment equitably ensures that every eligible unit has the same chance of receiving the program, free of subjective criteria, corruption or patronage.
- Treatment may be randomized at the individual or cluster level and may follow a phase-in design.
- RCTs inform evidence-based policy design by allowing research teams to test a program at a small scale, rigorously evaluate it, and scale the program up in the same context if successful.
Context
Why Evaluate?
Randomized controlled trials and impact evaluations in general play a critical role in evidence-based policy making. They provide an objective assessment of planned, ongoing or completed projects, programs or policies and give policymakers insight into what works and what doesn’t in different contexts. When deciding whether to use a randomized controlled trial to measure a program’s impact, consider (a) what portion of the available budget the program requires and (b) how many people or entities the program will affect.
As the portion of budget used and the parties affected increase, so does the demand for and benefits of implementing a randomized controlled trial.
Why Randomize?
Randomization allows researchers to identify a group of program participants (treatment) and a group of non-participants (control) that are statistically equivalent in the absence of a program. In ensuring statistical equivalence between groups, randomization rules out confounding variables that would otherwise bias measurements. Simple before-after comparisons or comparisons between non-randomized groups are likely biased by factors outside of the program itself, and lead to weaker, less robust, and potentially misleading conclusions. RCTs overcome these issues to provide a reliable estimation of program impact; if the two groups are identical in all aspects other than their participation in the program, then any differences in outcome must be attributed to the program itself.
Randomized Assignment
After defining the population of interest (e.g., households within 10 km of a road in region X, or elementary schools in region Y) and identifying the survey budget, randomly assign treatment. Random assignment equitably ensures that every eligible unit has the same chance of receiving the program, free of subjective criteria, corruption or patronage. When there is excess demand for a program, program managers can explain randomized assignment to constituents easily and transparently.
Given a sufficiently large number of units, randomized assignment will produce groups with statistically equivalent averages for all of their characteristics. These averages should also tend towards population averages, ensuring that the study is representative of the broader population. Before the program begins, research teams can use baseline data on the population of interest to verify that there are no systematic differences in observed characteristics between the treatment and control units. Given statistical equivalence at baseline and exposure to the same external environmental factors over the course of the program, any differences in outcomes between the treatment and control groups after the program can be explained only by the program.
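As a minimal sketch of the workflow described above (in Python, with hypothetical household IDs and a simulated baseline income covariate — both are illustrative assumptions, not part of any real study), randomized assignment followed by a rough balance check might look like:

```python
import random
import statistics

def assign_treatment(unit_ids, seed=20210413):
    """Randomly assign half of the eligible units to treatment.

    Seeding the random number generator makes the assignment
    reproducible, which matters for replication.
    """
    rng = random.Random(seed)
    shuffled = unit_ids[:]          # copy so the input order is untouched
    rng.shuffle(shuffled)
    cutoff = len(shuffled) // 2
    treatment = set(shuffled[:cutoff])
    return {uid: ("treatment" if uid in treatment else "control")
            for uid in unit_ids}

def balance_check(assignment, covariate):
    """Compare group means of a baseline covariate.

    A large gap in means suggests the randomization should be inspected;
    in practice, teams run formal t-tests or regression-based balance
    tests rather than eyeballing means.
    """
    t_vals = [covariate[u] for u, g in assignment.items() if g == "treatment"]
    c_vals = [covariate[u] for u, g in assignment.items() if g == "control"]
    return statistics.mean(t_vals), statistics.mean(c_vals)

# Hypothetical example: 1,000 households with a simulated baseline income.
households = list(range(1000))
rng = random.Random(0)
income = {h: rng.gauss(100, 15) for h in households}

groups = assign_treatment(households)
t_mean, c_mean = balance_check(groups, income)
```

With a large enough sample, the two group means should be close; any remaining gap reflects chance rather than systematic selection.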
Level of Randomization
Randomization is typically done at the level at which the program is implemented. For example, assignment to a community health program will be randomized at the community level. Randomizing below this level means changing how the program is implemented. Note that randomization cannot take place at a lower level than that at which the outcome is measured.
Individual vs. Cluster
In individual randomization, individual units are assigned to a treatment or control group. In cluster randomization, clusters of units (e.g., a cohort or a village) rather than the units themselves are randomly assigned to treatment and control groups. Cluster RCTs are the preferred type of RCT when the intervention is by definition applied at the cluster rather than the individual level (e.g., an intervention targeted towards schools or health facilities in a given setting). The statistical power in cluster RCTs is typically lower than that for individually randomized trials, since outcomes within clusters are typically somewhat similar to each other. This means that the number of clusters in a cluster RCT, rather than the number of individuals who participate, is most relevant to the statistical power of the study. Cluster RCTs are often more expensive than individually randomized RCTs. However, cluster RCTs provide administrative convenience, reduce ethical concerns, and avoid treatment group contamination.
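A minimal sketch of cluster randomization (in Python, with made-up village and household names purely for illustration): whole clusters are randomized, and every unit inherits its cluster's assignment.

```python
import random

def cluster_randomize(units_by_cluster, seed=7):
    """Randomly assign whole clusters (e.g. villages) to treatment or
    control; every unit inherits its cluster's assignment."""
    rng = random.Random(seed)
    clusters = sorted(units_by_cluster)   # fixed order, so the seed is reproducible
    rng.shuffle(clusters)
    treated = set(clusters[: len(clusters) // 2])
    return {unit: ("treatment" if cluster in treated else "control")
            for cluster, units in units_by_cluster.items()
            for unit in units}

# Hypothetical example: 6 villages of 3 households each.
villages = {f"village_{i}": [f"v{i}_hh{j}" for j in range(3)]
            for i in range(6)}
assignment = cluster_randomize(villages)
```

Because assignment varies only across clusters, households in the same village always share a status, which is exactly why within-cluster similarity reduces the effective sample size and the study's power.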
Phase-In
In phase-in randomization, the roll-out of the intervention is randomized and every unit or cluster in the population of interest eventually receives the program. Phase-in designs are usually used at the cluster level but may also be applied at the individual level. For example, in an intervention intended to treat 100 villages, 50 villages are randomly selected to receive interventions in year 1 and 50 villages are selected to receive interventions in year 2. The latter group serves as the control group in year 1.
Randomized phase-ins are easily applied to project implementation schedules, as roll-outs typically happen over multiple years. Phase-in designs also reduce concerns of inequity and give control units an incentive to stay in contact with the research team. However, the expectation of future treatment may change control participants’ current behavior. Further, phase-in designs complicate estimating long-run effects, since once the intervention is fully rolled out, no control group remains. Long-run analyses can still examine differences between groups with different lengths of exposure to the program.
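The 100-village example above can be sketched as follows (in Python; the village names and two-wave schedule are illustrative assumptions): the roll-out wave, rather than treatment itself, is what gets randomized.

```python
import random

def phase_in(clusters, n_waves=2, seed=11):
    """Randomize the roll-out order: every cluster is eventually treated,
    but the wave (e.g. year) in which it enters is random. Clusters in
    later waves serve as controls for earlier ones."""
    rng = random.Random(seed)
    order = clusters[:]
    rng.shuffle(order)
    wave_size = len(order) // n_waves
    return {cluster: (i // wave_size) + 1 for i, cluster in enumerate(order)}

# The example from the text: 100 villages, 50 treated in year 1, 50 in year 2.
villages = [f"village_{i}" for i in range(100)]
waves = phase_in(villages)
year1 = [v for v, w in waves.items() if w == 1]
year2 = [v for v, w in waves.items() if w == 2]
```

During year 1, the `year2` villages play the role of the control group; after year 2, no untreated comparison group remains, which is the long-run limitation noted above.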
Threats to Design
Spillovers and crossovers pose threats to design. Spillovers occur when a program changes outcomes for units in the control group; they may be physical, behavioral, informational, market or general equilibrium. Crossovers occur when a control unit directly receives the program, either intentionally or accidentally. In either case, the validity of the control group is compromised: some control units effectively receive treatment, so the comparison group no longer serves as a counterfactual.
Data Collection
RCTs require baseline and endline data and may also include midline data or follow-up data. These datasets can span the course of weeks, months, or years. This data is often acquired via primary data collection. See the Primary Data Collection pages for more details.
Analysis and Application
For more information on handling and analyzing data, see Data Cleaning and Data Analysis. Once results are ready, they can be used to inform policy design. This is the core approach of evidence-based policy design: test a program at a small scale, rigorously evaluate it, and scale the program up in the same context if successful.
Back to Parent
This article is part of the topic Experimental Methods
Additional Resources
- DIME Analytics’ presentations on randomization: part 1 (https://github.com/worldbank/DIME-Resources/blob/master/stata1-5-randomization.pdf) and part 2 (https://github.com/worldbank/DIME-Resources/blob/master/stata2-5-randomization.pdf)
- Gertler et al.’s Impact Evaluation in Practice (http://www.worldbank.org/en/programs/sief-trust-fund/publication/impact-evaluation-in-practice)
- Duflo and Kremer’s Use of Randomization in the Evaluation of Development Effectiveness (https://economics.mit.edu/files/2785)
- Duflo’s Randomized Controlled Trials, Development Economics and Policy Making in Developing Countries (http://pubdocs.worldbank.org/en/394531465569503682/Esther-Duflo-PRESENTATION.pdf) via JPAL and MIT
- JPAL’s Introduction to Evaluations (https://www.povertyactionlab.org/research-resources/introduction-evaluations)
- Glennerster and Takavarasha’s Running Randomized Evaluations: A Practical Guide (http://runningres.com/)
- Glennerster’s How to Randomize (https://ocw.mit.edu/resources/res-14-002-abdul-latif-jameel-poverty-action-lab-executive-training-evaluating-social-programs-2011-spring-2011/lecture-notes/MITRES_14_002S11_lec4.pdf)
- Gerber and Green’s Field Experiments: Design, Analysis, and Interpretation (http://books.wwnorton.com/books/webad.aspx?id=24003)