Selection Bias
Latest revision as of 14:05, 9 August 2023
Selection bias occurs when participants in a program (treatment group) are systematically different from non-participants (control group). Selection bias affects the validity of program evaluations whenever selection of treatment and control groups is done non-randomly.
Read First
- Selection bias means that treatment and control groups are not comparable, and therefore the impact evaluation is not internally valid.
- The only foolproof way to avoid selection bias is to do a randomized control trial.
Overview
Selection bias can be positive or negative. For example, suppose an evaluation of an after-school program for at-risk youth compares those who volunteered for the program to those who did not. The volunteers are likely more motivated and eager than the non-volunteers, which may make the program appear more effective than it is (positive bias). On the other hand, comparing outcomes for program participants to those of 'average' students may understate the effect of the program (negative bias), as at-risk youth likely perform worse on average regardless of the program.
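The volunteer example can be illustrated with a small simulation. This is only a sketch under invented assumptions: the variable `motivation`, the coefficient of 3.0, and the true effect of 2.0 are hypothetical and chosen purely to make the mechanism visible.

```python
import random
import statistics

def simulate_naive_estimate(n=20000, true_effect=2.0, seed=0):
    """Compare volunteers to non-volunteers when motivation drives both
    the decision to volunteer and the outcome (hypothetical setup)."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)  # unobserved by the evaluator
        # More motivated students are more likely to volunteer.
        volunteers = motivation + rng.gauss(0, 1) > 0
        # The outcome depends on motivation whether or not the student joins.
        outcome = 3.0 * motivation + rng.gauss(0, 1)
        if volunteers:
            treated.append(outcome + true_effect)
        else:
            untreated.append(outcome)
    # Naive estimate: simple difference in mean outcomes.
    return statistics.mean(treated) - statistics.mean(untreated)

naive = simulate_naive_estimate()
bias = naive - 2.0  # positive here: the program looks better than it is
```

In this setup the naive difference in means exceeds the true effect because volunteers would have done better even without the program.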
There is no reliable way to estimate the size of selection bias.
How to avoid selection bias
The best way to avoid selection bias is to use randomization. Randomizing selection of beneficiaries into treatment and control groups, for example, ensures that the two groups are comparable, in expectation, in terms of both observable and unobservable characteristics.
It is important both to randomize treatment assignment and to draw a random sample of survey respondents. For example, in an evaluation of the impact of tablets in the classroom, treatment is randomly assigned by classroom, and a random sample of students is drawn from each classroom for the survey.
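The two-stage design in the tablet example could be sketched as follows. The function name, classroom labels, roster format, and the sample size of five students are all assumptions made for illustration; a real evaluation would use its own sampling frame and a documented seed.

```python
import random

def assign_and_sample(classrooms, students_per_class=5, seed=42):
    """Randomly assign classrooms to arms, then draw a random
    survey sample of students within every classroom."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    names = sorted(classrooms)
    rng.shuffle(names)
    # First half of the shuffled list becomes the treatment arm.
    treatment = set(names[: len(names) // 2])
    plan = {}
    for name in sorted(classrooms):
        roster = classrooms[name]
        k = min(students_per_class, len(roster))
        plan[name] = {
            "arm": "treatment" if name in treatment else "control",
            "survey_sample": rng.sample(roster, k),  # without replacement
        }
    return plan

# Hypothetical data: 10 classrooms with 30 students each.
classrooms = {
    f"class_{c}": [f"student_{c}_{i}" for i in range(30)] for c in range(10)
}
plan = assign_and_sample(classrooms)
```

Sorting the classroom names before shuffling keeps the result independent of the input ordering, so the same seed always reproduces the same assignment.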
Non-randomized evaluations attempt to avoid selection bias by making the control group as comparable as possible, typically by matching on observables. The more data that are available for matching, the more convincing the comparison is, although matching cannot account for differences in unobservable characteristics.
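As a rough illustration of matching on observables, the sketch below pairs each treated unit with its nearest comparison unit. It is one simple variant (greedy nearest-neighbor matching without replacement), not a recommended procedure; the field names `age` and `baseline_score` are hypothetical, and the covariates are assumed to be on comparable scales already.

```python
def match_nearest(treated, pool, covariates):
    """Greedy one-to-one nearest-neighbor matching on observed covariates."""
    def distance(a, b):
        # Euclidean distance over the chosen covariates.
        return sum((a[c] - b[c]) ** 2 for c in covariates) ** 0.5

    available = list(pool)
    pairs = []
    for unit in treated:
        if not available:
            break  # no comparison units left to match
        best = min(available, key=lambda cand: distance(unit, cand))
        available.remove(best)  # match without replacement
        pairs.append((unit, best))
    return pairs

# Hypothetical units for illustration only.
treated = [
    {"id": "t1", "age": 14, "baseline_score": 50},
    {"id": "t2", "age": 16, "baseline_score": 70},
]
pool = [
    {"id": "c1", "age": 16, "baseline_score": 69},
    {"id": "c2", "age": 14, "baseline_score": 51},
    {"id": "c3", "age": 15, "baseline_score": 90},
]
pairs = match_nearest(treated, pool, ["age", "baseline_score"])
matched_ids = [(t["id"], c["id"]) for t, c in pairs]
```

Here `c3` is left unmatched because no treated unit resembles it. In practice covariates are usually standardized first, since an unscaled variable with a large range would dominate the distance.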
Selection bias in Sampling
Selection bias can be a problem even in randomized control trials. For example:
- High levels of attrition between survey rounds: the respondents for the follow-up survey may be systematically different from the original sample. For example, if wealthier households are more likely to migrate, the sample at follow-up will be systematically poorer than the sample at baseline.
- High item non-response: missing data can create selection bias within a particular question. For example, if half the sample answers 'don't know' to a question on income, those respondents will be excluded from the analysis. If people who have lower levels of numeracy or less systematic income are less likely to know their income, the remaining responses are not representative and the estimate is biased.
- Survey mode: for example, phone surveys limit the set of respondents to those who have access to a mobile phone. If part of the population of interest does not have phone access, this could bias responses: respondents of higher socioeconomic status are more likely to have phones, and impacts may differ by socioeconomic status.
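One way to look for the attrition problem described above is to compare attrition rates across arms and baseline characteristics of respondents who were and were not found at follow-up. The sketch below assumes a hypothetical record layout with the keys `arm`, `found_at_followup`, and a baseline variable such as `baseline_income`; it is a descriptive check, not a formal test.

```python
import statistics

def attrition_summary(records, baseline_var):
    """Summarize attrition by arm and compare baseline means of
    respondents found at follow-up with those who were lost."""
    def mean_or_none(values):
        return statistics.mean(values) if values else None

    stayers = [r[baseline_var] for r in records if r["found_at_followup"]]
    leavers = [r[baseline_var] for r in records if not r["found_at_followup"]]
    rates = {}
    for arm in sorted({r["arm"] for r in records}):
        in_arm = [r for r in records if r["arm"] == arm]
        lost = sum(1 for r in in_arm if not r["found_at_followup"])
        rates[arm] = lost / len(in_arm)  # share of the arm lost to follow-up
    return {
        "attrition_rate_by_arm": rates,
        "baseline_mean_stayers": mean_or_none(stayers),
        "baseline_mean_leavers": mean_or_none(leavers),
    }

# Hypothetical records for illustration only.
records = [
    {"arm": "treatment", "found_at_followup": True, "baseline_income": 100},
    {"arm": "treatment", "found_at_followup": False, "baseline_income": 300},
    {"arm": "control", "found_at_followup": True, "baseline_income": 120},
    {"arm": "control", "found_at_followup": True, "baseline_income": 80},
]
summary = attrition_summary(records, "baseline_income")
```

A large gap between the two baseline means, or very different attrition rates across arms, would suggest that the follow-up sample is no longer comparable to the original one.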