# Matching

Matching is a quasi-experimental method in which the researcher uses statistical techniques to construct an artificial control group by matching each treated unit with a non-treated unit of similar characteristics. Matching is useful for estimating the impact of a program or event for which it is not ethically or logistically feasible to randomize. This page outlines approaches to and limitations of matching methods.

## Read First

• Matching requires extensive datasets with information on treated and non-treated units’ characteristics before the treatment.
• To implement matching in Stata, use the `iematch` command. For more information on matching implementation, see Additional Resources.
• Matching methods rely on the assumption that there are no systematic differences in unobserved characteristics between the treatment units and the matched comparison units.

## Overview

Matching constructs an artificial control group by pairing each treated unit with a non-treated unit of similar characteristics. Consider, for example, a researcher who wants to measure the effect of a water filter installation program on health outcomes. The program has no clear assignment rules or randomization that would explain why participating households enrolled and non-participating households did not.

Using a dataset that contains information on the units that enrolled in the program and the units that didn’t, the researcher can use matching methods to identify the non-participant units most similar to the participant units. The dataset should contain baseline data, and the characteristics on which units are matched should be pre-intervention traits: characteristics measured after the intervention may themselves have been affected by it, which makes matching on them a very risky approach. The researcher can then approximate the characteristics that most influence the units’ decision to enroll and find matches to serve as the control group. These matches make it possible to estimate the counterfactual and the impact of the program.
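As a rough illustration of the idea, the sketch below pairs each participant household with its closest non-participant on standardized baseline covariates and then compares outcomes across the matched pairs. Everything in it is an assumption made for illustration: the covariates (household size and baseline income), the outcome values, the function names, and the choice of nearest-neighbor matching with replacement on Euclidean distance. It is not the procedure used by `iematch`.

```python
import math

def standardize(rows):
    """Rescale each covariate column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def nearest_neighbor_match(treated, untreated):
    """For each treated unit, return the index of the closest untreated unit.

    Matching is done with replacement: one untreated unit may serve as the
    match for several treated units.
    """
    z = standardize(treated + untreated)
    z_t, z_u = z[:len(treated)], z[len(treated):]
    matches = []
    for t in z_t:
        dists = [math.dist(t, u) for u in z_u]
        matches.append(dists.index(min(dists)))
    return matches

def mean_matched_difference(y_treated, y_untreated, matches):
    """Average outcome gap between treated units and their matched comparisons."""
    diffs = [yt - y_untreated[j] for yt, j in zip(y_treated, matches)]
    return sum(diffs) / len(diffs)

# Hypothetical baseline covariates: [household size, baseline monthly income]
treated = [[4, 200.0], [6, 150.0]]
untreated = [[2, 400.0], [4, 210.0], [6, 140.0], [5, 300.0]]
matches = nearest_neighbor_match(treated, untreated)

# Hypothetical endline outcomes (for example, illness episodes per household)
y_treated = [2.0, 3.0]
y_untreated = [5.0, 4.0, 6.0, 3.0]
effect = mean_matched_difference(y_treated, y_untreated, matches)
print(matches, effect)
```

Standardizing first keeps a covariate measured in large units (income) from dominating the distance. The resulting difference is only a credible impact estimate if the matched units do not differ systematically on unobserved characteristics.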

## Approaches and Variations

### Propensity Score Matching

Propensity score matching is a matching method that computes the probability that a unit will enroll in the program, based on its observed characteristics. This probability is called the propensity score and is used to match units in the treatment group with unenrolled units that have similar propensity scores. For more information, see Propensity Score Matching.
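The two steps described above (estimate each unit's probability of enrolling, then match on that probability) can be sketched as follows. This is a minimal illustration under stated assumptions: the single covariate, the enrollment data, and the function names `fit_logit` and `propensity` are hypothetical, the score is estimated with a logistic model fitted by plain gradient ascent, and matching is nearest-neighbor with replacement. A real analysis would use a tested estimation routine and check overlap and balance.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(X, d, steps=5000, lr=0.1):
    """Fit P(d = 1 | x) = sigmoid(b0 + b . x) by gradient ascent on the log-likelihood.

    Returns the coefficient list with the intercept first.
    """
    k = len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for x, di in zip(X, d):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
            grad[0] += di - p
            for j, v in enumerate(x):
                grad[j + 1] += (di - p) * v
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta

def propensity(beta, x):
    """Estimated probability of enrollment for a unit with covariates x."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

# Hypothetical data: one baseline covariate per unit and an enrollment dummy
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
d = [0, 0, 1, 0, 1, 0, 1, 1]

beta = fit_logit(X, d)
scores = [propensity(beta, x) for x in X]

treated_idx = [i for i, di in enumerate(d) if di == 1]
untreated_idx = [i for i, di in enumerate(d) if di == 0]

# Match each enrolled unit to the unenrolled unit with the closest score
matches = {i: min(untreated_idx, key=lambda j: abs(scores[i] - scores[j]))
           for i in treated_idx}
print(matches)
```

Matching on a single score rather than on every covariate keeps the comparison tractable when many characteristics influence enrollment.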

### Matched Difference-in-Differences

Matched difference-in-differences combines matching methods with difference-in-differences to reduce the risk of bias in the estimation. To implement:

• Match treatment units to control units.
• Compute the difference-in-differences.

This method controls for any unobserved, time-invariant differences between the two groups, because such differences cancel out when each unit's outcome is compared with its own baseline value. For more information, see Difference-in-Differences.
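Assuming the matching step has already produced treated-control pairs, the second step can be sketched as below. The pair data and the function name `matched_did` are hypothetical; each tuple holds a treated unit's baseline and endline outcome followed by the same two values for its matched control.

```python
def matched_did(pairs):
    """Average difference-in-differences over matched pairs.

    Each pair is (treated_pre, treated_post, control_pre, control_post).
    The change over time in the control unit is subtracted from the change
    over time in the treated unit, so any fixed level gap between the two
    units drops out.
    """
    diffs = [(t_post - t_pre) - (c_post - c_pre)
             for t_pre, t_post, c_pre, c_post in pairs]
    return sum(diffs) / len(diffs)

# Hypothetical matched pairs of outcomes at baseline and endline
pairs = [
    (5.0, 2.0, 4.0, 3.0),
    (6.0, 3.0, 6.0, 5.0),
    (4.0, 2.0, 5.0, 5.0),
]
estimate = matched_did(pairs)
print(estimate)
```

Note that the first and third pairs start from different baseline levels; the estimate is unaffected by those gaps because only changes over time enter the calculation.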

### Synthetic Control Method

The synthetic control method estimates the impact of an event or intervention (e.g. a political event or natural disaster) experienced by a single unit (e.g. a state or country). The method uses data on the treated unit and the untreated units, weighting the untreated units so that their weighted combination most closely resembles the treated unit before the event. This weighted combination is the synthetic control. The process requires extensive panel data on the characteristics of the treated and untreated units.
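The weighting step can be illustrated with a small sketch. It assumes the common convention that weights are non-negative and sum to one, and it matches only on pre-event outcomes. The data, the function name `synthetic_weights`, and the optimizer (exponentiated-gradient updates, chosen here because they keep the weights on the simplex with very little code) are all assumptions made for illustration; dedicated packages solve this with constrained optimization and also match on covariates.

```python
import math

def synthetic_weights(treated_pre, donors_pre, steps=20000, lr=0.02):
    """Find donor weights (non-negative, summing to 1) whose weighted
    pre-event outcomes track the treated unit's pre-event outcomes.

    Minimizes half the mean squared pre-event gap using
    exponentiated-gradient updates.
    """
    n, T = len(donors_pre), len(treated_pre)
    w = [1.0 / n] * n  # start from equal weights
    for _ in range(steps):
        fitted = [sum(w[j] * donors_pre[j][t] for j in range(n)) for t in range(T)]
        resid = [f - y for f, y in zip(fitted, treated_pre)]
        grad = [sum(r * donors_pre[j][t] for t, r in enumerate(resid)) / T
                for j in range(n)]
        w = [wj * math.exp(-lr * g) for wj, g in zip(w, grad)]
        total = sum(w)
        w = [wj / total for wj in w]  # renormalize so weights sum to 1
    return w

# Hypothetical pre-event outcome paths (four periods)
donors_pre = [
    [1.0, 2.0, 3.0, 4.0],  # donor A
    [3.0, 2.0, 1.0, 2.0],  # donor B
    [5.0, 5.0, 5.0, 5.0],  # donor C
]
treated_pre = [2.0, 2.0, 2.0, 3.0]  # equals the average of donors A and B

w = synthetic_weights(treated_pre, donors_pre)

# Hypothetical post-event outcomes (one period)
donors_post = [5.0, 3.0, 5.0]
treated_post = 6.0
synthetic_post = sum(wj * y for wj, y in zip(w, donors_post))
effect = treated_post - synthetic_post
print([round(wj, 3) for wj in w], round(effect, 3))
```

In this constructed example the treated unit's pre-event path is exactly the average of donors A and B, so the weights should settle close to one half on each of them and close to zero on donor C. The estimated effect is the gap between the treated unit's post-event outcome and that of its synthetic control.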

## Limitations

Matching methods have two main limitations: they require extensive datasets to properly match units, and they rely on broad assumptions that are difficult to prove. First, matching requires extensive datasets, ideally on baseline characteristics. Such data are not always available. Second, the validity of matching methods relies on the assumption that there are no systematic differences in unobserved characteristics between the treatment units and the matched comparison units. It is difficult to prove this assumption correct, making matching methods a less robust approach than, for example, randomized controlled trials (RCTs) or regression discontinuity design (RDD), which do not require this assumption. As mentioned, the matched difference-in-differences method relaxes this assumption in part, since it controls for unobserved characteristics that are time-invariant.

## Back to Parent

This article is part of the topic Quasi-Experimental Methods.