Difference between revisions of "Randomized Control Trials"

Randomized Control Trials (RCTs) are experiments that randomly allocate participants between treatment and control groups. They are considered the 'gold standard' for impact evaluation.


== Randomization ==
A randomized controlled trial (RCT) is a method of impact evaluation in which all eligible units in a sample are randomly assigned to treatment and control groups. The treatment group receives or participates in the program being tested, while the control group does not. Given a sufficiently large number of units, random assignment makes the control and treatment groups statistically equivalent, on average, in both observed and unobserved characteristics, thus ruling out selection bias. The only systematic difference between the treatment and control groups, then, is their participation in the intervention itself, and the difference in their average outcomes therefore estimates the impact of the intervention or program.


=== Individual-level RCTs ===
RCTs are considered the gold standard of impact evaluation. This page will explain the context of and reasons for evaluating and randomizing, provide details on randomized assignment, and briefly touch on major steps in the RCT process.
Individual-level RCTs are impact evaluation designs in which outcomes are measured on an individual basis. Randomization for individual-level RCTs is also done at the individual (per-participant) level.


=== Clustered RCTs ===
==Read First==
Clustered RCTs are a type of RCT in which randomization is done at the level of a group (e.g. cohorts or villages). This is the preferred type of RCT when the intervention is by definition applied at the cluster level rather than the individual level (for example, an intervention that is targeted towards schools or health facilities in a given setting, rather than the students or patients who might attend these schools or clinics).
*[[Randomized Evaluations: Principles of Study Design]]
*RCTs provide a reliable estimate of program impact: if the treatment and control groups are identical in all aspects other than their participation in the program, then any differences in outcomes must be attributed to the program itself.
* Random assignment equitably ensures that every eligible unit has the same chance of receiving the program, free of subjective criteria, corruption or patronage.
*Treatment may be randomized at the individual- or cluster-level and may follow a phase-in design.
*RCTs inform evidence-based policy design by allowing research teams to test a program at a small scale, rigorously evaluate it, and scale the program up in the same context if successful.


A key consideration in the design and analysis of cluster RCTs is that statistical power in cluster RCTs is typically lower than that for individually randomized trials, since outcomes within clusters are typically somewhat similar to each other. This means that the number of clusters in a cluster RCT, rather than the number of individuals who participate, is most relevant to the statistical power of the study. This also means that cluster RCTs are often more expensive than individually-randomized RCTs. However, an advantage is that cluster RCTs can enable measurement of spillover effects.
==Context==
===Why Evaluate? ===
Randomized controlled trials, and impact evaluations in general, play a critical role in evidence-based policy making. They provide an objective assessment of planned, ongoing or completed projects, programs or policies and give policymakers insight into what works and what doesn’t in different contexts. When deciding whether to use a randomized controlled trial to measure a program’s impact, consider two questions: (a) what portion of the available budget does the program require, and (b) how many people or entities will the program affect?


=== Randomized Phase-In ===
As the portion of the budget used and the number of parties affected increase, so do the demand for and the benefits of implementing a randomized controlled trial.
In a randomized phase-in, the roll-out of the intervention is randomized. This is typically done at the cluster level. For example, an intervention is intended to treat 100 villages: 50 villages are randomly selected to receive the intervention in year 1, and the other 50 villages receive it in year 2 (and therefore serve as a control group in year 1). A primary advantage of the randomized phase-in is that it is easily applied to project implementation schedules, as roll-outs typically happen over multiple years. A primary disadvantage is that once the intervention is fully rolled out, there is no remaining control group, and thus no direct way to measure long-run effects (although long-run analyses can still examine differences between groups with different degrees of exposure).


===Why Randomize? ===
Randomization allows researchers to identify a group of program participants (treatment) and a group of non-participants (control) that are statistically equivalent in the absence of a program. In ensuring statistical equivalence between groups, randomization rules out confounding variables that would otherwise bias measurements. Simple before-after comparisons or comparisons between non-randomized groups are likely biased by factors outside of the program itself, and therefore lead to weaker, less robust, and potentially misleading conclusions. RCTs overcome these issues to provide a reliable estimate of program impact: if the two groups are identical in all aspects other than their participation in the program, then any differences in outcomes must be attributed to the program itself.
==Randomized Assignment==
After defining the population of interest (e.g. households within 10 km of a road in region X, or elementary schools in region Y) and identifying the [[Survey Budget | survey budget]], [[Randomization in Stata | randomly assign treatment]]. Random assignment is equitable: it ensures that every eligible unit has the same chance of receiving the program, free of subjective criteria, corruption or patronage. When there is excess demand for a program, randomized assignment is also a rule that program managers can explain to constituents easily and clearly.
Given a sufficiently large number of units, randomized assignment will produce groups with statistically equivalent averages for all their characteristics. These averages should also tend towards the averages of the population from which the units were drawn, which supports the representativeness of the study. Before the program begins, research teams can use baseline data on the population of interest to [[Balance tests | verify that there are no systematic differences]] in observed characteristics between the treatment and control units. Given statistical equivalence at baseline and exposure to the same external environmental factors over the course of the program, any differences in outcomes between the treatment and control groups after the program can be explained only by the program.
===Level of Randomization===
Randomization is typically done at the level at which the program is implemented. For example, assignment to a community health program will be randomized at the community level. Randomizing below this level means changing how the program is implemented. Note that randomization cannot take place at a lower level than that at which the outcome is measured.
====Individual vs. Cluster====
In individual randomization, individual units are assigned to a treatment or control group. In cluster randomization, clusters of units (e.g. cohorts or villages), rather than the units themselves, are randomly assigned to treatment and control groups. Clustered RCTs are the preferred type of RCT when the intervention is by definition applied at the cluster rather than the individual level (e.g. an intervention targeted towards schools or health facilities in a given setting). The [[Sampling & Power Calculations | statistical power]] of cluster RCTs is typically lower than that of individually randomized trials with the same number of participants, since outcomes within clusters are typically somewhat similar to each other. This means that the number of clusters in a cluster RCT, rather than the number of individuals who participate, is most relevant to the statistical power of the study. As a result, cluster RCTs are often more expensive than individually randomized RCTs. However, cluster RCTs provide administrative convenience, reduce ethical concerns, and help avoid contamination between treatment and control groups.
====Phase-In====
In phase-in randomization, the roll-out of the intervention is randomized and every unit or cluster in the population of interest will get the program eventually. Phase-in designs are usually used at the cluster-level but may also be applied at the individual-level. For example, in an intervention intended to treat 100 villages, 50 villages are randomly selected to receive interventions in year 1 and 50 villages are selected to receive interventions in year 2. The latter group serves as the control group in year 1.
Randomized phase-ins are easily applied to project implementation schedules, as roll-outs typically happen over multiple years. Phase-in designs also reduce concerns of inequity and give control units an incentive to stay in contact with the research team. However, phase-in designs could change the present behavior of control participants by setting expectations of future change. Further, phase-in designs complicate the estimation of long-run effects, since no control group remains once the intervention is fully rolled out. Long-run analyses can still examine differences between groups with different degrees of exposure.
===Threats to Design===
Spillovers and crossovers pose threats to design. Spillovers occur when a program changes outcomes for units in the control group. Spillovers may be physical, behavioral, or informational, or may arise through market or general equilibrium effects. Crossovers occur when a control unit directly receives the program, either intentionally or accidentally. In either situation, some control units are affected by the treatment, so the control group no longer serves as a valid counterfactual.
==Data Collection==
RCTs require baseline and endline data and may also include midline data or follow-up data. These datasets can span the course of weeks, months, or years. This data is often acquired via [[Primary Data Collection | primary data collection]]. See the [https://dimewiki.worldbank.org/wiki/Category:Primary_Data_Collection Primary Data Collection] pages for more details.
==Analysis and Application==
For more information on handling and analyzing data, see [[Data Cleaning]] and [[Data Analysis]]. Once results are ready, they can be used to inform policy design. So goes the approach to evidence-based policy design: test a program at a small scale, rigorously evaluate it, and scale the program up in the same context if successful.
== Back to Parent ==
This article is part of the topic [[Experimental Methods]]


== Additional Resources ==
*DIME Analytics' presentations on randomization [https://github.com/worldbank/DIME-Resources/blob/master/stata1-5-randomization.pdf 1] and [https://github.com/worldbank/DIME-Resources/blob/master/stata2-5-randomization.pdf 2]
*Gertler et al.’s [http://www.worldbank.org/en/programs/sief-trust-fund/publication/impact-evaluation-in-practice Impact Evaluation in Practice]
*Duflo and Kremer’s [https://economics.mit.edu/files/2785 Use of Randomization in the Evaluation of Development Effectiveness]
*Duflo’s [http://pubdocs.worldbank.org/en/394531465569503682/Esther-Duflo-PRESENTATION.pdf Randomized Controlled Trials, Development Economics and Policy Making in Developing Countries] via JPAL and MIT
*JPAL's [https://www.povertyactionlab.org/research-resources/introduction-evaluations Introduction to Evaluations]
*Glennerster and Takavarasha’s [http://runningres.com/ Running Randomized Evaluations: A Practical Guide]
*Glennerster’s [https://ocw.mit.edu/resources/res-14-002-abdul-latif-jameel-poverty-action-lab-executive-training-evaluating-social-programs-2011-spring-2011/lecture-notes/MITRES_14_002S11_lec4.pdf How to Randomize]
*Gerber and Green’s [http://books.wwnorton.com/books/webad.aspx?id=24003 Field Experiments: Design, Analysis, and Interpretation]
*[http://egap.org/ Evidence in Governance and Politics]


[[Category: Experimental Methods]]
[[Category: Research Design]]

Latest revision as of 14:38, 13 April 2021

A randomized controlled trial (RCT) is a method of impact evaluation in which all eligible units in a sample are randomly assigned to treatment and control groups. The treatment group receives or participates in the program being tested, while the control group does not. Given a sufficiently large number of units, random assignment makes the control and treatment groups statistically equivalent, on average, in both observed and unobserved characteristics, thus ruling out selection bias. The only systematic difference between the treatment and control groups, then, is their participation in the intervention itself, and the difference in their average outcomes therefore estimates the impact of the intervention or program.

RCTs are considered the gold standard of impact evaluation. This page will explain the context of and reasons for evaluating and randomizing, provide details on randomized assignment, and briefly touch on major steps in the RCT process.

Read First

  • Randomized Evaluations: Principles of Study Design
  • RCTs provide a reliable estimate of program impact: if the treatment and control groups are identical in all aspects other than their participation in the program, then any differences in outcomes must be attributed to the program itself.
  • Random assignment equitably ensures that every eligible unit has the same chance of receiving the program, free of subjective criteria, corruption or patronage.
  • Treatment may be randomized at the individual- or cluster-level and may follow a phase-in design.
  • RCTs inform evidence-based policy design by allowing research teams to test a program at a small scale, rigorously evaluate it, and scale the program up in the same context if successful.

Context

Why Evaluate?

Randomized controlled trials, and impact evaluations in general, play a critical role in evidence-based policy making. They provide an objective assessment of planned, ongoing or completed projects, programs or policies and give policymakers insight into what works and what doesn’t in different contexts. When deciding whether to use a randomized controlled trial to measure a program’s impact, consider two questions: (a) what portion of the available budget does the program require, and (b) how many people or entities will the program affect?

As the portion of the budget used and the number of parties affected increase, so do the demand for and the benefits of implementing a randomized controlled trial.

Why Randomize?

Randomization allows researchers to identify a group of program participants (treatment) and a group of non-participants (control) that are statistically equivalent in the absence of a program. In ensuring statistical equivalence between groups, randomization rules out confounding variables that would otherwise bias measurements. Simple before-after comparisons or comparisons between non-randomized groups are likely biased by factors outside of the program itself, and therefore lead to weaker, less robust, and potentially misleading conclusions. RCTs overcome these issues to provide a reliable estimate of program impact: if the two groups are identical in all aspects other than their participation in the program, then any differences in outcomes must be attributed to the program itself.
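
The contrast between a self-selected comparison and a randomized one can be illustrated with a small simulation. This is only a sketch on made-up data: the unobserved trait `motivation`, its weight in the outcome, the effect size, and the function name `simulate_comparisons` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_comparisons(n=20000, true_effect=2.0, seed=0):
    """Return (self_selected_estimate, randomized_estimate) on simulated data."""
    rng = random.Random(seed)
    # Hypothetical unobserved trait that also raises the outcome.
    motivation = [rng.gauss(0, 1) for _ in range(n)]
    noise = [rng.gauss(0, 1) for _ in range(n)]

    # Scenario 1: units with above-average motivation opt into the program.
    self_selected = [m > 0 for m in motivation]
    # Scenario 2: a coin flip decides who receives the program.
    randomized = [rng.random() < 0.5 for _ in range(n)]

    def difference_in_means(treated):
        # Outcome = confounder + program effect (if treated) + noise.
        outcomes = [3 * m + true_effect * t + e
                    for m, t, e in zip(motivation, treated, noise)]
        treated_mean = statistics.fmean(
            y for y, t in zip(outcomes, treated) if t)
        control_mean = statistics.fmean(
            y for y, t in zip(outcomes, treated) if not t)
        return treated_mean - control_mean

    return difference_in_means(self_selected), difference_in_means(randomized)


naive, experimental = simulate_comparisons()
```

Under these assumptions the self-selected comparison overstates the effect, because motivated units would have had better outcomes even without the program, while the randomized comparison lands close to `true_effect`.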

Randomized Assignment

After defining the population of interest (e.g. households within 10 km of a road in region X, or elementary schools in region Y) and identifying the survey budget, randomly assign treatment. Random assignment is equitable: it ensures that every eligible unit has the same chance of receiving the program, free of subjective criteria, corruption or patronage. When there is excess demand for a program, randomized assignment is also a rule that program managers can explain to constituents easily and clearly.

Given a sufficiently large number of units, randomized assignment will produce groups with statistically equivalent averages for all their characteristics. These averages should also tend towards the averages of the population from which the units were drawn, which supports the representativeness of the study. Before the program begins, research teams can use baseline data on the population of interest to verify that there are no systematic differences in observed characteristics between the treatment and control units. Given statistical equivalence at baseline and exposure to the same external environmental factors over the course of the program, any differences in outcomes between the treatment and control groups after the program can be explained only by the program.
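
As a rough illustration of random assignment followed by a baseline balance check, the sketch below uses only the Python standard library. The household data, the seed, and the helper names `assign_treatment` and `standardized_difference` are hypothetical. The standardized difference is one common balance summary and is used here as an assumption, not as the method prescribed by this page.

```python
import math
import random
import statistics


def assign_treatment(unit_ids, seed=20210413):
    """Randomly assign half of the units to treatment (True), the rest to control (False)."""
    units = list(unit_ids)
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = units[:]
    rng.shuffle(shuffled)
    treated = set(shuffled[: len(shuffled) // 2])
    return {unit: unit in treated for unit in units}


def standardized_difference(treatment_values, control_values):
    """Difference in means divided by the pooled standard deviation."""
    mean_gap = statistics.fmean(treatment_values) - statistics.fmean(control_values)
    pooled_sd = math.sqrt(
        (statistics.variance(treatment_values)
         + statistics.variance(control_values)) / 2
    )
    return mean_gap / pooled_sd


# Hypothetical baseline data: household id -> household size.
baseline = {hh: 3 + (hh % 5) for hh in range(1, 201)}
assignment = assign_treatment(baseline.keys())
treated_sizes = [baseline[hh] for hh, t in assignment.items() if t]
control_sizes = [baseline[hh] for hh, t in assignment.items() if not t]
balance = standardized_difference(treated_sizes, control_sizes)
```

A value of `balance` near zero suggests the two groups look similar on that baseline characteristic. In practice the same check would be repeated for each observed covariate.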

Level of Randomization

Randomization is typically done at the level at which the program is implemented. For example, assignment to a community health program will be randomized at the community level. Randomizing below this level means changing how the program is implemented. Note that randomization cannot take place at a lower level than that at which the outcome is measured.

Individual vs. Cluster

In individual randomization, individual units are assigned to a treatment or control group. In cluster randomization, clusters of units (e.g. cohorts or villages), rather than the units themselves, are randomly assigned to treatment and control groups. Clustered RCTs are the preferred type of RCT when the intervention is by definition applied at the cluster rather than the individual level (e.g. an intervention targeted towards schools or health facilities in a given setting). The statistical power of cluster RCTs is typically lower than that of individually randomized trials with the same number of participants, since outcomes within clusters are typically somewhat similar to each other. This means that the number of clusters in a cluster RCT, rather than the number of individuals who participate, is most relevant to the statistical power of the study. As a result, cluster RCTs are often more expensive than individually randomized RCTs. However, cluster RCTs provide administrative convenience, reduce ethical concerns, and help avoid contamination between treatment and control groups.
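
The following sketch shows one way cluster-level assignment and the resulting loss of power might be expressed. It assumes equal-sized clusters and uses the standard design-effect approximation, 1 + (m − 1) × ICC, where m is the cluster size and ICC is the intra-cluster correlation. The village data and the function names are hypothetical.

```python
import random


def assign_clusters(unit_to_cluster, seed=1):
    """Randomize at the cluster level; every unit inherits its cluster's status."""
    clusters = sorted(set(unit_to_cluster.values()))
    rng = random.Random(seed)
    rng.shuffle(clusters)
    treated_clusters = set(clusters[: len(clusters) // 2])
    return {unit: cluster in treated_clusters
            for unit, cluster in unit_to_cluster.items()}


def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


# Hypothetical study: 100 villages with 20 households each.
households = {(v, h): v for v in range(100) for h in range(20)}
status = assign_clusters(households)
```

With an assumed intra-cluster correlation of 0.05, for instance, the design effect is 1 + 19 × 0.05 = 1.95, so the 2,000 households carry roughly the information of 1,026 independent ones. Adding villages raises the effective sample size faster than adding households within villages.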

Phase-In

In phase-in randomization, the roll-out of the intervention is randomized and every unit or cluster in the population of interest will get the program eventually. Phase-in designs are usually used at the cluster-level but may also be applied at the individual-level. For example, in an intervention intended to treat 100 villages, 50 villages are randomly selected to receive interventions in year 1 and 50 villages are selected to receive interventions in year 2. The latter group serves as the control group in year 1.

Randomized phase-ins are easily applied to project implementation schedules, as roll-outs typically happen over multiple years. Phase-in designs also reduce concerns of inequity and give control units an incentive to stay in contact with the research team. However, phase-in designs could change the present behavior of control participants by setting expectations of future change. Further, phase-in designs complicate the estimation of long-run effects, since no control group remains once the intervention is fully rolled out. Long-run analyses can still examine differences between groups with different degrees of exposure.
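
The 100-village example can be sketched as a randomized roll-out schedule. The village identifiers, the seed, and the helpers `assign_phase_in` and `is_control` are hypothetical names used for illustration only.

```python
import random


def assign_phase_in(cluster_ids, n_phases=2, seed=1):
    """Randomly split clusters into roll-out phases of (near-)equal size."""
    clusters = list(cluster_ids)
    rng = random.Random(seed)
    rng.shuffle(clusters)
    # Deal the shuffled clusters into phases 1..n_phases in turn.
    return {cluster: (position % n_phases) + 1
            for position, cluster in enumerate(clusters)}


def is_control(phase, current_year):
    """A cluster serves as a control until its own roll-out year arrives."""
    return phase > current_year


villages = [f"village_{i:03d}" for i in range(100)]
rollout = assign_phase_in(villages)
year1_controls = [v for v, phase in rollout.items()
                  if is_control(phase, current_year=1)]
```

In year 1 the 50 villages scheduled for year 2 act as controls. By year 2 `is_control` is false for every village, which is why no control group remains once the roll-out is complete.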

Threats to Design

Spillovers and crossovers pose threats to design. Spillovers occur when a program changes outcomes for units in the control group. Spillovers may be physical, behavioral, or informational, or may arise through market or general equilibrium effects. Crossovers occur when a control unit directly receives the program, either intentionally or accidentally. In either situation, some control units are affected by the treatment, so the control group no longer serves as a valid counterfactual.

Data Collection

RCTs require baseline and endline data and may also include midline data or follow-up data. These datasets can span the course of weeks, months, or years. This data is often acquired via primary data collection. See the Primary Data Collection pages for more details.

Analysis and Application

For more information on handling and analyzing data, see Data Cleaning and Data Analysis. Once results are ready, they can be used to inform policy design. So goes the approach to evidence-based policy design: test a program at a small scale, rigorously evaluate it, and scale the program up in the same context if successful.

Back to Parent

This article is part of the topic Experimental Methods

Additional Resources