'''Selection bias''' occurs when participants in a program (treatment group) are systematically different from non-participants (control group). Selection bias affects the validity of program evaluations whenever selection of treatment and control groups is done non-randomly.


== Read First ==
* Selection bias means that treatment and control groups are not comparable, and therefore the '''impact evaluation''' is not internally valid.
* The only foolproof way to avoid selection bias is to do a [[Randomized Control Trials | randomized control trial]].
 
== Overview ==
'''Selection bias''' can be positive or negative. For example, an [[Experimental Methods | evaluation]] of an after-school program for at-risk youth compares those who volunteered for the program to those who did not. The volunteers are likely more motivated and eager than those who did not volunteer, which may make the program appear more effective than it is. On the other hand, evaluating the program by comparing outcomes for the participants in the after-school program to 'average' students may understate the effect of the program, as the at-risk youth likely perform worse on average.


There is no reliable way to estimate the size of selection bias.  


==How to avoid selection bias==
The best way to avoid selection bias is to use [[Randomization | randomization]]. [[SurveyCTO Additional Topics#Random Draw of Beneficiaries Example 1|Randomizing selection of beneficiaries]] into treatment and control groups, for example, ensures that the two groups are comparable in terms of observable and unobservable characteristics.
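The mechanics of random assignment can be sketched in a few lines. The following is a minimal illustration only (the function name and the 50/50 split are illustrative, not a prescribed workflow): shuffle the beneficiary list with a fixed seed, then split it into two groups.

```python
import random

def assign_treatment(beneficiary_ids, seed=42):
    """Randomly split beneficiary IDs into treatment and control groups.

    Illustrative sketch: a fixed seed makes the assignment reproducible,
    and shuffling removes any systematic ordering of the list.
    """
    rng = random.Random(seed)
    ids = list(beneficiary_ids)
    rng.shuffle(ids)                 # random order breaks any selection pattern
    half = len(ids) // 2
    return ids[:half], ids[half:]    # (treatment, control)

treatment, control = assign_treatment(range(100))
```

Because assignment is random, observable and unobservable characteristics are balanced across the two groups in expectation, which is the property the paragraph above relies on.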


It is important to '''randomize''' both at the level of treatment and to draw a '''random''' [[Sampling|sample]] of [[Survey Pilot|survey]] respondents. For example, in an evaluation of the impact of tablets in the classroom, treatment is randomly assigned by classroom, and a '''random sample''' of students is drawn per classroom for the '''survey'''.
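This two-level design can be sketched as follows (classroom names, the sample size, and the function name are hypothetical): randomize treatment across classrooms, then draw a fixed number of students per classroom for the survey.

```python
import random

def two_level_design(students_by_classroom, n_sampled, seed=7):
    """Randomize treatment at the classroom level, then draw a random
    survey sample of students within every classroom.

    Illustrative sketch: a fixed seed keeps the design reproducible.
    """
    rng = random.Random(seed)
    rooms = sorted(students_by_classroom)    # deterministic order before shuffling
    rng.shuffle(rooms)
    treated = set(rooms[: len(rooms) // 2])  # half the classrooms receive tablets
    survey = {room: rng.sample(students_by_classroom[room], n_sampled)
              for room in sorted(students_by_classroom)}  # sample in every room
    return treated, survey
```

Note that the survey sample is drawn in both treated and untreated classrooms, so the respondents are comparable across the two arms.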


Non-randomized evaluations attempt to avoid selection bias by making the control group as comparable as possible, typically by [[Matching|matching]] on observables. The more data available for '''matching''', the more convincing this approach is.
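A bare-bones sketch of nearest-neighbor '''matching''' on observables might look like the following (the field names and the Euclidean distance metric are illustrative; real evaluations typically use propensity scores or specialized matching packages):

```python
def nearest_neighbor_match(treated, controls, covariates):
    """Match each treated unit to the closest control unit, measured by
    Euclidean distance over the listed observable covariates.

    Illustrative sketch: units are dicts with an 'id' key plus covariates.
    """
    def dist(a, b):
        return sum((a[c] - b[c]) ** 2 for c in covariates) ** 0.5

    # For every treated unit, pick the control minimizing the distance.
    return {t["id"]: min(controls, key=lambda c: dist(t, c))["id"]
            for t in treated}
```

The key limitation, as the surrounding text notes, is that matching can only balance '''observable''' characteristics; unobservables (such as motivation) may still differ between the groups.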


==Selection bias in Sampling==
Selection bias can be a problem even in [[Randomized Control Trials | randomized control trials]]. For example:
* '''High levels of attrition between [[Survey Protocols | survey]] rounds''': the respondents for the follow-up '''survey''' may be systematically different. For example, this would be the case if wealthier households are more likely to migrate and therefore the [[Sampling|sample]] at follow-up would be systematically poorer.  
* '''High item non-response''': missing data can create worries of selection bias within a particular question. For example, if half the '''sample''' answers 'don't know' to a question on income, those respondents will be excluded from the [[Data Analysis|analysis]]. However, if people who have lower levels of numeracy or less systematic income are less likely to know, this can create bias.  
* '''Survey mode''': for example, phone '''surveys''' limit the set of respondents to those who have access to a mobile phone. If the full population of interest does not, this could bias responses, since respondents of higher socioeconomic status are more likely to have phones, while impacts may differ by socioeconomic status.
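As an illustration of diagnosing the first problem, differential attrition, one can compare baseline characteristics of households that were and were not re-interviewed at follow-up (the field names and function below are hypothetical, not a standard API):

```python
def attrition_gap(baseline, followup_ids):
    """Compare mean baseline wealth of households re-interviewed at
    follow-up against those lost to attrition.

    Illustrative sketch: a large gap between the two means signals that
    the follow-up sample is systematically different from the baseline.
    """
    stayers = [h["wealth"] for h in baseline if h["id"] in followup_ids]
    lost = [h["wealth"] for h in baseline if h["id"] not in followup_ids]
    mean = lambda xs: sum(xs) / len(xs)
    return mean(stayers), mean(lost)
```

If, as in the example above, wealthier households are more likely to migrate, the mean for households lost to attrition will be noticeably higher than for those who remain, and the follow-up sample will be systematically poorer.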


== Related Pages ==
[[Special:WhatLinksHere/Selection_Bias|Click here for pages that link to this topic]].


== Additional Resources ==




[[Category: Impact Evaluation Design ]]
[[Category: Quasi-Experimental Methods]]

Latest revision as of 14:05, 9 August 2023
