# Regression Discontinuity

Regression Discontinuity (RD) design is a quasi-experimental impact evaluation design that estimates the causal effects of an intervention by exploiting a threshold (cutoff point) above and below which treatment is assigned. Observations lying closely on either side of the threshold are compared to estimate the average treatment effect. RD designs are used in situations where actual random assignment of control and treatment groups is not feasible.

RD design is a key method (Lee and Lemieux 2010 prefer to see it as a particular data generating process) in the toolkit of applied researchers interested in unveiling the causal effects of different sorts of policies. The method was first used in 1960 by Thistlethwaite and Campbell, who were interested in identifying the causal impact of merit awards, assigned on the basis of observed test scores, on future academic outcomes (Lee and Lemieux 2010).

Applications of the RD design have increased exponentially in recent years, covering fields as diverse as social protection (e.g. conditional cash transfers), education (e.g. school grants), SME policies, and electoral accountability.

The intuition behind the RD design is very simple. The main problem posed to causal inference methods is self-selection, specifically when selection into a given intervention or program is based on individuals’ unobserved characteristics such as innate ability and motivation. With randomized controlled trials, the assignment to ‘treatment’ (T) and ‘control’ (C) groups is random and hence independent (orthogonal) of individuals’ willingness to participate in the intervention.

In the RD design, the assignment to T and C groups is based on some clear-cut threshold (or cutoff) of an observed variable such as age, income, or a test score. Causal inference is then made by comparing individuals on both sides of the cutoff.

## Assumptions

The application of the method relies on two assumptions.

First, the threshold should not be perfectly manipulable. In other words, the method accommodates some manipulation, for instance when some individuals game the rule to increase their chances of being included in (or excluded from) an intervention, but not perfect sorting around the cutoff. There are different ways of checking the plausibility of this assumption, but perhaps the one most used by applied researchers is the McCrary density test. This test checks whether there is any indication of perfect manipulation of the assignment variable by looking for discontinuities in its density function around the cutoff point.
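The logic of the density test can be illustrated with a deliberately simplified check (the actual McCrary test estimates the density with local linear methods; everything below, including the score variable and cutoff, is a hypothetical simulation): if the density of the assignment variable is continuous at the cutoff, the number of observations just below and just above it should be roughly equal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical running variable: test scores with a cutoff at 60.
scores = rng.normal(60, 10, size=10_000)
cutoff, h = 60.0, 2.0  # window of +/- 2 points around the cutoff

n_below = np.sum((scores >= cutoff - h) & (scores < cutoff))
n_above = np.sum((scores >= cutoff) & (scores < cutoff + h))

# Under a continuous density, the shares on each side should be close.
# Normal approximation to a binomial test of equal shares:
n = n_below + n_above
z = (n_above - n_below) / np.sqrt(n)
print(f"below={n_below} above={n_above} z={z:.2f}")
# |z| far above ~1.96 would hint at bunching on one side of the cutoff.
```

This crude count comparison conveys the idea only; in applied work one would use the full McCrary (or a more recent density) test rather than a single-window count.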

Second, individuals close to the cutoff point should be very similar, on average, in observed and unobserved characteristics. In the RD framework, this means that the distribution of the observed and unobserved variables should be continuous around the threshold. Even though researchers can check similarity between observed covariates, the similarity between unobserved characteristics has to be assumed. This is considered a plausible assumption to make for individuals very close to the cutoff point, that is, for a relatively narrow window.

Finally, unlike the instrumental variable framework, the RD design does not require the exclusion restriction, as the assignment variable (also called the running variable or forcing variable) can be directly correlated with the outcome variable (see Lee and Lemieux 2010). The identification strategy in the RD framework requires that, conditional on the assignment variable, participation in a program or intervention is exogenous. This is very similar to the conditional independence assumption, but because of the discontinuity, the assumption is required only for individuals close to either side of the threshold.

## Sharp and fuzzy RD

In practice, the assignment rule can be deterministic or probabilistic (see Hahn et al. 2001). If deterministic, the design is called sharp, as the assignment rule defines treatment status with probability 0 or 1. If probabilistic, the design is called fuzzy, as the assignment rule defines ‘eligibility’ status rather than ‘treatment’ status. What could cause the fuzziness? Imperfect compliance with some law or rule, imperfect implementation that ends up treating some control units, spillover effects, or some manipulation of the forcing variable could all lead to a fuzzy RD design. Thus, estimating the causal effect under the fuzzy design requires more assumptions than under the sharp design, but these are weaker than those of a generic IV approach.

The key assumption of a fuzzy design is that without the assignment rule some of those who take up the treatment would not participate in the programme (for similarities between the IV and RDD approaches, see Imbens and Lemieux 2008 and van der Klaauw 2008). The forcing variable acts as a nudge. The subgroup that participates in a programme because of the selection rule is called the compliers (see e.g. Imbens and Angrist 1994, and Angrist, Imbens, and Rubin 1996). Thus, under the RDD the treatment effects are estimated only for the group of compliers.

For the sake of illustration, let ${\displaystyle X}$ be the treatment variable, ${\displaystyle Z}$ the assignment variable, and ${\displaystyle Y}$ the outcome variable. Under the sharp design, the treatment variable ${\displaystyle X}$ is a deterministic function of ${\displaystyle Z}$, and ${\displaystyle X=f(Z)}$ is discontinuous at some observable value of ${\displaystyle Z}$, i.e., ${\displaystyle Z_{0}}$. Defining the observed outcome model as ${\displaystyle Y_{i}=\alpha _{i}+X_{i}\beta _{i}}$, and assuming that:

1. The limits ${\displaystyle X^{+}=\lim _{Z\to Z_{0}^{+}}E[X_{i}|Z_{i}=Z]}$ and ${\displaystyle X^{-}=\lim _{Z\to Z_{0}^{-}}E[X_{i}|Z_{i}=Z]}$ exist and ${\displaystyle X^{+}\neq X^{-}}$; and
2. ${\displaystyle E[\alpha _{i}|Z_{i}=Z]}$ is continuous in ${\displaystyle Z}$ at ${\displaystyle Z_{0}}$ such that for an arbitrarily small ${\displaystyle e>0}$, ${\displaystyle E[\alpha _{i}|Z_{i}=Z_{0}+e]\cong E[\alpha _{i}|Z_{i}=Z_{0}-e]}$

Then the (local) treatment effect in a sharp design is given by: ${\displaystyle \beta _{sharp}={\frac {Y^{+}-Y^{-}}{X^{+}-X^{-}}}=Y^{+}-Y^{-}}$, since ${\displaystyle X^{+}=1}$ and ${\displaystyle X^{-}=0}$. ${\displaystyle Y^{+}}$ and ${\displaystyle Y^{-}}$ are defined similarly to ${\displaystyle X^{+}}$ and ${\displaystyle X^{-}}$.

In the fuzzy design, ${\displaystyle X_{i}}$ is a random variable given ${\displaystyle Z_{i}}$ and the conditional probability ${\displaystyle X=f(Z)=Pr[X_{i}=1|Z_{i}=Z]}$ is known to be discontinuous at ${\displaystyle Z_{0}}$. Thus, the only difference between the sharp and fuzzy estimators is that for the latter ${\displaystyle X^{+}\neq 1}$ and ${\displaystyle X^{-}\neq 0}$, i.e., ‘there are additional variables unobserved by the econometrician that determine assignment to the treatment’ (Hahn et al., 2001, p. 202). So, the treatment effect in a fuzzy design is given by: ${\displaystyle \beta _{fuzzy}={\frac {Y^{+}-Y^{-}}{X^{+}-X^{-}}}}$
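As an illustration of these formulas, the sketch below simulates a fuzzy design and computes the Wald-type estimator ${\displaystyle \beta _{fuzzy}}$ by replacing the one-sided limits with sample means inside a narrow window around the cutoff (all numbers, including the true effect of 2.0, are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 50_000
Z = rng.uniform(-1, 1, n)           # assignment (running) variable, cutoff at 0
eligible = (Z >= 0).astype(float)

# Fuzzy design: eligibility raises take-up from 20% to 80% (imperfect compliance).
X = (rng.uniform(size=n) < 0.2 + 0.6 * eligible).astype(float)

beta_true = 2.0
Y = 1.0 + 0.5 * Z + beta_true * X + rng.normal(0, 1, n)

h = 0.05                            # narrow window around the cutoff
left = (Z < 0) & (Z > -h)
right = (Z >= 0) & (Z < h)

# Sample-mean analogues of the one-sided limits Y+, Y-, X+, X-.
Y_plus, Y_minus = Y[right].mean(), Y[left].mean()
X_plus, X_minus = X[right].mean(), X[left].mean()

beta_fuzzy = (Y_plus - Y_minus) / (X_plus - X_minus)
print(f"fuzzy Wald estimate: {beta_fuzzy:.2f} (true effect {beta_true})")
```

With perfect compliance, ${\displaystyle X^{+}-X^{-}}$ would equal 1 and the same code would return the sharp estimator, the simple difference in mean outcomes.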

Although the sharp and fuzzy estimators identify only the local average treatment effect, i.e., the treatment effect for the individuals close to the cut-off, Hahn et al. (2001) note that this method has many advantages compared to other quasi-experimental approaches in that it does not depend on functional form assumptions when estimates can be obtained with narrow bandwidths and does not require identifying instruments or the set of variables that affect the selection rule for a particular programme (or treatment).

That said, the most recent advances in the RDD literature suggest that it is not entirely accurate to interpret a discontinuity design as a local experiment. For the design to be considered ‘as good as a local experiment for the units close enough to the cutoff point’, one would have to use a very narrow bandwidth and drop the assignment variable (or a function of it) from the regression equation. For more details on this point, see Cattaneo et al.

### Empirical Challenges: Bandwidth Size, Structural Form, and Falsification Tests

The estimation of the treatment effects can be performed parametrically as follows:

${\displaystyle y_{i}=\alpha +\delta X_{i}+h(Z_{i})+\varepsilon _{i}}$

where ${\displaystyle y_{i}}$ is the outcome of interest of individual i, ${\displaystyle X_{i}}$ is an indicator function that takes the value of 1 for individuals assigned to the treatment and 0 otherwise, ${\displaystyle Z_{i}}$ is the assignment variable with an observable clear-cut cutoff point, and ${\displaystyle h(Z_{i})}$ is a flexible function of ${\displaystyle Z}$. The identification strategy hinges on the exogeneity of ${\displaystyle Z}$ at the threshold. It is standard to center the assignment variable at the cutoff point; in that case, one would use ${\displaystyle h(Z_{i}-Z_{0})}$ instead, with ${\displaystyle Z_{0}}$ being the cutoff. Under that assumption, the parameter of interest, ${\displaystyle \delta }$, provides the treatment effect estimate. In the case of a sharp design with perfect compliance, the parameter ${\displaystyle \delta }$ identifies the average treatment effect on the treated (ATT). In the case of a fuzzy design, ${\displaystyle \delta }$ corresponds to the intent-to-treat effect – i.e. the effect of eligibility, rather than of the treatment itself, on the outcome of interest. The LATE can be estimated using an IV approach. This could be done as follows:
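A minimal sketch of this parametric regression on simulated data, assuming a sharp design, a hypothetical cutoff of 60, and a piecewise-linear ${\displaystyle h(\cdot )}$ centered at the cutoff (separate slopes on each side):

```python
import numpy as np

rng = np.random.default_rng(2)

n = 20_000
Z = rng.uniform(30, 90, n)          # e.g. a test score; hypothetical cutoff at 60
Z0 = 60.0
X = (Z >= Z0).astype(float)         # sharp design: treatment = eligibility

delta_true = 3.0
Y = 10.0 + 0.1 * (Z - Z0) + delta_true * X + rng.normal(0, 2, n)

# Design matrix: intercept, treatment dummy, centered running variable,
# and its interaction with treatment (h(Z - Z0) linear on each side of Z0).
Zc = Z - Z0
A = np.column_stack([np.ones(n), X, Zc, X * Zc])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
delta_hat = coef[1]
print(f"estimated treatment effect: {delta_hat:.2f} (true {delta_true})")
```

The coefficient on the treatment dummy is the estimated jump in the regression function at ${\displaystyle Z_{0}}$; richer specifications of ${\displaystyle h(\cdot )}$ simply add columns to the design matrix.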

First stage: ${\displaystyle P_{i}=\alpha +\gamma X_{i}+h(Z_{i})+\varepsilon _{i}}$

Second stage: ${\displaystyle y_{i}=\mu +\delta {\hat {P_{i}}}+h(Z_{i})+u_{i}}$,

where ${\displaystyle P_{i}}$ is a dummy variable that identifies actual participation of individual i in the program/intervention. Notice that with a parametric specification the researcher should specify ${\displaystyle h(Z_{i})}$ the same way in both regressions (Imbens and Lemieux 2008).
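The two stages can be sketched by hand on simulated data (all numbers hypothetical; ${\displaystyle h(Z_{i})}$ is kept linear and, as just noted, identical across the two stages):

```python
import numpy as np

rng = np.random.default_rng(3)

n = 40_000
Zc = rng.uniform(-1, 1, n)          # running variable, already centered at the cutoff
X = (Zc >= 0).astype(float)         # eligibility indicator

# Fuzzy take-up: eligibility raises participation from 10% to 70%.
P = (rng.uniform(size=n) < 0.1 + 0.6 * X).astype(float)

delta_true = 1.5
Y = 2.0 + 0.3 * Zc + delta_true * P + rng.normal(0, 1, n)

H = np.column_stack([np.ones(n), Zc])   # h(Z): same specification in both stages

# First stage: regress participation on eligibility and h(Z).
A1 = np.column_stack([H, X])
g, *_ = np.linalg.lstsq(A1, P, rcond=None)
P_hat = A1 @ g

# Second stage: replace P with its first-stage fitted values.
A2 = np.column_stack([H, P_hat])
b, *_ = np.linalg.lstsq(A2, Y, rcond=None)
late = b[2]
print(f"LATE estimate: {late:.2f} (true {delta_true})")
```

In practice one would use a proper 2SLS routine, which also corrects the second-stage standard errors; the manual version above only illustrates the mechanics.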

Despite the natural appeal of parametric methods such as the one just outlined, they have some practical drawbacks. First, the right functional form of ${\displaystyle h(Z_{i})}$ is never known. Researchers are thus encouraged to fit the model with different specifications of ${\displaystyle h(Z_{i})}$ (Lee and Lemieux 2010), particularly when they have to use data farther away from the cutoff point to obtain enough statistical power. (There is also a growing literature on power calculations for RD designs.)

Although some authors test the sensitivity of their results using higher-order polynomials, recent work argues against the use of high-order polynomials on the grounds that they assign too much weight to observations far away from the cutoff point (Gelman and Imbens 2014).

In practice, the size of the window (usually referred to in this literature as the bandwidth) depends on data availability (see discussion below). Ideally, one would like to have a large enough sample to run the regressions using only information very close to the cutoff. The main advantage of a very narrow bandwidth is that the functional form ${\displaystyle h(Z_{i})}$ becomes much less of a worry, and treatment effects can be obtained with a parametric regression using a linear or piecewise-linear specification of the assignment variable (see Lee and Lemieux 2010 on this point).
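This trade-off can be seen in a small simulation (all numbers hypothetical): with a deliberately nonlinear ${\displaystyle h(Z)}$, a linear specification is nearly unbiased inside a narrow bandwidth but visibly biased when the full sample is used.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 100_000
Z = rng.uniform(-1, 1, n)
X = (Z >= 0).astype(float)
# Nonlinear h(Z): a linear specification is misspecified far from the cutoff.
Y = np.sin(2 * Z) + 1.0 * X + rng.normal(0, 0.5, n)

def rd_linear(h):
    """Sharp RD estimate with a piecewise-linear spec inside bandwidth h."""
    m = np.abs(Z) < h
    A = np.column_stack([np.ones(m.sum()), X[m], Z[m], X[m] * Z[m]])
    coef, *_ = np.linalg.lstsq(A, Y[m], rcond=None)
    return coef[1]

estimates = {h: rd_linear(h) for h in (0.05, 0.2, 1.0)}
for h, est in estimates.items():
    print(f"bandwidth {h:4.2f}: estimate {est:.3f} (true effect 1.0)")
```

The narrow-bandwidth estimate trades bias for variance: it sits close to the true effect but uses only a fraction of the sample.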

Falsification (or placebo) tests are very important when using an RD design as the identification strategy. The researcher needs to convince the reader (and referees!) that the discontinuity being exploited to inform the causal impacts of an intervention was very likely caused by the assignment rule of the intervention. In practice, researchers use fake cutoffs or different cohorts to run these tests.
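A minimal placebo exercise on simulated data (numbers are hypothetical): re-estimate the jump in mean outcomes at fake cutoffs, where no discontinuity should appear.

```python
import numpy as np

rng = np.random.default_rng(5)

n = 60_000
Z = rng.uniform(-1, 1, n)
Y = 0.5 * Z + 1.0 * (Z >= 0) + rng.normal(0, 1, n)   # true cutoff at 0

def jump_at(c, h=0.1):
    """Difference in mean outcomes just above vs just below a candidate cutoff c."""
    below = (Z >= c - h) & (Z < c)
    above = (Z >= c) & (Z < c + h)
    return Y[above].mean() - Y[below].mean()

# A sizeable jump should appear at the real cutoff but not at fake ones.
for c in (-0.5, 0.0, 0.5):
    print(f"cutoff {c:+.1f}: jump {jump_at(c):+.3f}")
```

In applied work the placebo estimates would be reported with confidence intervals, and the same exercise can be run on covariates that should not respond to the treatment.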

Another way of estimating treatment effects in an RD design is via non-parametric methods. In fact, the use of non-parametric methods has been growing in the last few years, at least as a check on the robustness of estimates obtained parametrically. This might be partially explained by the increasing number of available Stata commands for RDD, but perhaps more importantly by some attractive properties of the method compared to parametric ones (see e.g. Gelman and Imbens 2014 on this point).

The use of non-parametric methods does not come without costs. There are still many decisions left to the researcher, such as which kernel function to use, which algorithm to use for selecting the optimal bandwidth, and whether to use a local linear or some other specification (Gelman and Imbens 2014 suggest the use of local linear and at most local quadratic polynomials).
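A sketch of one common combination of these choices, a local linear fit with a triangular kernel, on simulated data with a hypothetical bandwidth:

```python
import numpy as np

rng = np.random.default_rng(6)

n = 30_000
Z = rng.uniform(-1, 1, n)
X = (Z >= 0).astype(float)
Y = 0.8 * Z + 2.0 * X + rng.normal(0, 1, n)

h = 0.3
w = np.clip(1 - np.abs(Z) / h, 0, None)   # triangular kernel weights

# Weighted least squares: local linear fit with separate slopes at the cutoff.
# Observations outside the bandwidth get weight 0 and drop out.
A = np.column_stack([np.ones(n), X, Z, X * Z])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(A * sw[:, None], Y * sw, rcond=None)
print(f"local linear (triangular kernel) estimate: {coef[1]:.2f} (true 2.0)")
```

Here the bandwidth is fixed by hand for clarity; in practice it would be chosen by a data-driven selector, which is exactly the kind of decision the paragraph above refers to.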

For those interested in knowing more about the RD design and its recent ramifications, a practical introduction is available, as is a more advanced e-book treatment.