# Regression Discontinuity

Regression discontinuity (RD) design is a quasi-experimental impact evaluation design that estimates the causal effect of an intervention by exploiting a threshold (cutoff point) on an observed variable: units on one side of the threshold are assigned to treatment and units on the other side are not. Observations close to either side of the threshold are compared to estimate the average treatment effect at the threshold. RD is used in situations where random assignment to control and treatment groups is not feasible.

Regression discontinuity design is a key method in the toolkit of applied researchers interested in uncovering the causal effects of different kinds of policies (Lee and Lemieux 2010 prefer to see it as a particular data generating process). The method was first used in 1960 by Thistlethwaite and Campbell, who were interested in identifying the causal impact of merit awards, assigned on the basis of observed test scores, on future academic outcomes (Lee and Lemieux 2010).

Applications of the RD design have increased rapidly in recent years. It has been applied in fields such as social protection programs (e.g., conditional cash transfers), educational programs (e.g., school grants), SME policies, and electoral accountability.

The intuition behind the RD design is simple. The main problem for causal inference methods is self-selection, specifically when selection into a given intervention or program is based on individuals’ unobserved characteristics, such as innate ability and motivation. With randomized controlled trials, assignment to the ‘treatment’ (T) and ‘control’ (C) groups is random and hence independent of (orthogonal to) individuals’ willingness to participate in the intervention.

In the RD design, the assignment to the T and C groups is based on a clear-cut threshold (or cutoff) of an observed variable such as age, income, or a score. Causal inference is then made by comparing individuals on both sides of the cutoff.
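The comparison across the cutoff can be illustrated with a small simulation. The sketch below is only one possible way to do it, under stated assumptions: the data are simulated, the cutoff, bandwidth, and true effect are hypothetical values, and the helper names `fit_line` and `sharp_rd_estimate` are invented for this example. It fits a separate straight line on each side of the cutoff within a narrow window and reports the gap between the two fitted values at the cutoff.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sharp_rd_estimate(z, y, cutoff, bandwidth):
    """Jump in the fitted outcome at the cutoff.

    The assignment variable is centered at the cutoff, so each fitted
    intercept is the predicted outcome exactly at the threshold.
    """
    left = [(zi - cutoff, yi) for zi, yi in zip(z, y)
            if cutoff - bandwidth <= zi < cutoff]
    right = [(zi - cutoff, yi) for zi, yi in zip(z, y)
             if cutoff <= zi <= cutoff + bandwidth]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left

# Simulated data; every number here is a hypothetical choice.
rng = random.Random(0)
CUTOFF, TRUE_EFFECT = 50.0, 5.0
z = [rng.uniform(0, 100) for _ in range(5000)]
# Units at or above the cutoff are treated; the outcome also depends on z.
y = [10 + 0.2 * zi + TRUE_EFFECT * (zi >= CUTOFF) + rng.gauss(0, 2) for zi in z]

estimate = sharp_rd_estimate(z, y, CUTOFF, bandwidth=10.0)
print(f"estimated jump at cutoff: {estimate:.2f} (true effect: {TRUE_EFFECT})")
```

The bandwidth controls how "close" to the cutoff observations must be. A narrower window makes the comparison more credible but leaves fewer observations, so the estimate becomes noisier.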

## Assumptions

The application of the method relies on two assumptions.

First, the threshold should not be perfectly manipulable. In other words, the method accommodates some manipulation, as when some individuals try to increase their chances of being included in (or excluded from) an intervention, provided they cannot precisely control which side of the cutoff they fall on. There are different ways of checking the plausibility of this assumption, but perhaps the one most used by applied researchers is the McCrary density test. This test checks for evidence of manipulation of the assignment variable by looking for a discontinuity in its density function at the cutoff point.
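The idea of looking for a jump in the density of the assignment variable can be sketched in a few lines. The example below is not the McCrary test itself; it is a much cruder stand-in that compares the number of observations just below and just above the cutoff and treats the split as Binomial(n, 0.5) under no manipulation. The function name `density_jump_check`, the window width, and the simulated manipulation are all assumptions made for illustration.

```python
import math
import random

def density_jump_check(z, cutoff, window):
    """Compare counts just below and just above the cutoff.

    Returns (n_below, n_above, p_value), where the two-sided p-value comes
    from a normal approximation to a Binomial(n, 0.5) test. This is a
    simplified check, not the McCrary density test.
    """
    n_below = sum(1 for zi in z if cutoff - window <= zi < cutoff)
    n_above = sum(1 for zi in z if cutoff <= zi < cutoff + window)
    n = n_below + n_above
    z_stat = (n_above - n / 2) / math.sqrt(n / 4)
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return n_below, n_above, p_value

# Simulated assignment variable; all values are hypothetical.
rng = random.Random(1)
CUTOFF, WINDOW = 50.0, 2.0
clean = [rng.uniform(0, 100) for _ in range(10000)]
# Hypothetical manipulation: half of the units scoring just below the
# cutoff manage to push their score just above it.
manipulated = [
    zi + WINDOW if CUTOFF - WINDOW <= zi < CUTOFF and rng.random() < 0.5 else zi
    for zi in clean
]

below_c, above_c, p_clean = density_jump_check(clean, CUTOFF, WINDOW)
below_m, above_m, p_manip = density_jump_check(manipulated, CUTOFF, WINDOW)
print(f"clean:       below={below_c}, above={above_c}, p={p_clean:.3f}")
print(f"manipulated: below={below_m}, above={above_m}, p={p_manip:.3g}")
```

A very small p-value in the manipulated sample signals bunching on one side of the cutoff. In applied work the McCrary test, which estimates the density on each side with local linear smoothing, is the more appropriate tool.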

Second, individuals close to the cutoff point should be very similar, on average, in observed and unobserved characteristics. In the RD framework, this means that the distribution of the observed and unobserved variables should be continuous around the threshold. Even though researchers can check similarity between observed covariates, the similarity between unobserved characteristics has to be assumed. This is considered a plausible assumption to make for individuals very close to the cutoff point, that is, for a relatively narrow window.
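For observed covariates, the similarity of units on either side of the cutoff can be checked directly. The following is a minimal sketch under stated assumptions: the data are simulated, the covariate names are hypothetical, and `balance_t_stat` is a helper invented for this example. It compares the mean of a pre-treatment covariate just below and just above the cutoff and reports the difference with a Welch-type t-statistic.

```python
import math
import random
import statistics

def balance_t_stat(z, covariate, cutoff, window):
    """Difference in covariate means across the cutoff within a window.

    Returns (difference, t_statistic), using unequal-variance standard errors.
    """
    below = [c for zi, c in zip(z, covariate) if cutoff - window <= zi < cutoff]
    above = [c for zi, c in zip(z, covariate) if cutoff <= zi < cutoff + window]
    diff = statistics.mean(above) - statistics.mean(below)
    se = math.sqrt(
        statistics.variance(above) / len(above)
        + statistics.variance(below) / len(below)
    )
    return diff, diff / se

# Simulated data; all values are hypothetical.
rng = random.Random(3)
CUTOFF, WINDOW = 50.0, 5.0
z = [rng.uniform(0, 100) for _ in range(5000)]
# A covariate that is unrelated to which side of the cutoff a unit is on.
age = [rng.gauss(30, 5) for _ in z]
# A covariate that jumps at the cutoff, which would cast doubt on the design.
income = [rng.gauss(100, 5) + 4.0 * (zi >= CUTOFF) for zi in z]

diff_age, t_age = balance_t_stat(z, age, CUTOFF, WINDOW)
diff_income, t_income = balance_t_stat(z, income, CUTOFF, WINDOW)
print(f"age:    diff={diff_age:.2f}, t={t_age:.2f}")
print(f"income: diff={diff_income:.2f}, t={t_income:.2f}")
```

A large t-statistic for a covariate that should not be affected by the treatment suggests that units on the two sides of the cutoff are not comparable. A check like this says nothing about unobserved characteristics, whose continuity still has to be assumed.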

Finally, unlike the instrumental variable framework, the RD design does not require the exclusion restriction, as the assignment variable (also called the running variable or forcing variable) can be directly correlated with the outcome variable (see Lee and Lemieux 2010). The identification strategy in the RD framework requires that, conditional on the assignment variable, participation in a program or intervention is exogenous. This is very similar to the conditional independence assumption but, because of the discontinuity, the assumption is required only for individuals lying on either side of the threshold.

In practice, the assignment rule can be deterministic or probabilistic (see Hahn et al. 2001). If deterministic, the design is called Sharp, as the assignment rule determines treatment status with probability 0 or 1. If probabilistic, the design is called Fuzzy, as the assignment rule defines ‘eligibility’ status rather than ‘treatment’ status. What could cause the fuzziness? Imperfect compliance with a law or rule, imperfect implementation that ends up treating some control units, spillover effects, or some manipulation of the forcing variable could each lead to a fuzzy RD design. As a result, estimating the causal effect under the fuzzy design requires more assumptions than under the sharp design, although these assumptions are weaker than those of a general IV approach.

The key assumption of a fuzzy design is that, without the assignment rule, some of those who take up the treatment would not have participated in the program (for similarities between the IV and RDD approaches, see Imbens and Lemieux 2008 and van der Klaauw 2008). The forcing variable acts as a nudge. The subgroup that participates in a program because of the selection rule is called *compliers* (see e.g. Imbens and Angrist 1994, and Angrist, Imbens, and Rubin 1996). Thus, under the fuzzy RDD, treatment effects are estimated only for the group of compliers.
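A common way to estimate the effect in a fuzzy design is a ratio: the jump in the outcome at the cutoff divided by the jump in the probability of treatment at the cutoff. The sketch below illustrates that ratio on simulated data. It is an illustration under stated assumptions, not a full estimation procedure: the take-up probabilities (0.2 below and 0.8 above the cutoff), the constant treatment effect, the bandwidth, and the helper names `fit_line` and `jump_at_cutoff` are all hypothetical choices for this example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def jump_at_cutoff(z, v, cutoff, bandwidth):
    """Gap between the right and left linear fits of v on z at the cutoff."""
    left = [(zi - cutoff, vi) for zi, vi in zip(z, v)
            if cutoff - bandwidth <= zi < cutoff]
    right = [(zi - cutoff, vi) for zi, vi in zip(z, v)
             if cutoff <= zi <= cutoff + bandwidth]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left

# Simulated data; all values are hypothetical.
rng = random.Random(2)
CUTOFF, TRUE_EFFECT, BANDWIDTH = 50.0, 5.0, 10.0
z = [rng.uniform(0, 100) for _ in range(20000)]
# Imperfect compliance: crossing the cutoff raises the probability of
# treatment from 0.2 to 0.8 instead of switching it from 0 to 1.
x = [1 if rng.random() < (0.8 if zi >= CUTOFF else 0.2) else 0 for zi in z]
y = [10 + 0.2 * zi + TRUE_EFFECT * xi + rng.gauss(0, 2) for zi, xi in zip(z, x)]

jump_y = jump_at_cutoff(z, y, CUTOFF, BANDWIDTH)  # jump in the outcome
jump_x = jump_at_cutoff(z, x, CUTOFF, BANDWIDTH)  # jump in treatment take-up
fuzzy_estimate = jump_y / jump_x
print(f"jump in outcome: {jump_y:.2f}, jump in take-up: {jump_x:.2f}")
print(f"fuzzy RD estimate: {fuzzy_estimate:.2f} (true effect: {TRUE_EFFECT})")
```

Dividing by the jump in take-up rescales the outcome gap so that it refers to the units whose treatment status is changed by crossing the cutoff. When the jump in take-up is small, the ratio becomes unstable, which mirrors the weak-instrument problem in the IV setting.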

For the sake of illustration, let $X$ be the treatment variable, $Z$ the assignment variable, and $Y$ the outcome variable. Under the sharp design, the treatment variable $X$ is a deterministic function of $Z$, $X = f(Z)$, where $f$ is discontinuous at some observable value of $Z$, i.e., $Z_0$. Defining the observed outcome model as [equation missing]


## Back to Parent

This article is part of the topic Impact Evaluation Design

## Additional Resources

- An introduction and user guide to regression discontinuity: Lee, David S., and Thomas Lemieux. "Regression Discontinuity Designs in Economics." Journal of Economic Literature 48, no. 2 (2010): 281-355.