Stratified Random Sample

Stratification is an ex-ante statistical technique that ensures that subgroups of the population are represented in the final sample and in the treatment groups. In addition to ensuring representativeness, stratification allows researchers to disaggregate by subgroup during analysis. Stratification takes place when [[Sampling & Power Calculations | defining the sample]] and treatment assignments during research design. This page outlines why and how to stratify.


== Read First ==
*Stratified random sampling ensures that sub-groups of a population are represented in the sample and in treatment groups.
*Stratified random sampling is essential for any evaluation that seeks to compare program impacts between subgroups.
*In Stata, <code>egen</code> with the <code>group()</code> function and the user-written command <code>randtreat</code> are useful for stratification.
==Why Stratify?==
Stratification accomplishes two key goals. First, it ensures that the sample and treatment groups are representative of the broader population. Second, it allows researchers to identify whether, and to what extent, a program affects distinct groups of the population differently. Say, for example, you are evaluating a mobile money program in Region Y. In addition to measuring the overall impact of mobile money accounts on households in Region Y, you would like to measure the specific impact on rural households. However, only 20 percent of households in Region Y are rural. In this case, a simple random sample may contain too few rural households to reliably estimate the rural-specific impact of the program, unless the overall sample is very large. Stratified random sampling addresses this by fixing the number of rural and non-rural households sampled and by balancing treatment and control within each subgroup, which makes comparisons across groups better powered. In general, stratified random sampling is essential for any evaluation that seeks to compare program impacts between subgroups.


==How to Stratify?==
If you intend to disaggregate by subgroup in the analysis, it is best practice to stratify on the same subgroups when sampling.
 
To stratify, first divide the target population into subgroups, or strata. You may stratify on variables that you believe may significantly affect the outcome variable and/or on subgroups that you are particularly interested in evaluating. You may stratify on one or multiple variables; as the number of variables increases, so does the number of strata. For example, if you are stratifying on Variable A (e.g. 3 education groupings), Variable B (e.g. 2 geographic groupings), and Variable C (e.g. 3 age groupings), you will have up to 3 × 2 × 3 = 18 distinct strata. You can generate strata using the Stata command <code>egen strata=group(A B C)</code>. Note that since some combinations of stratification variables may be more common than others, the strata sizes may vary.
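
To make the stratum count concrete, here is a minimal Python sketch (not Stata) of the idea behind <code>egen strata=group(A B C)</code>: each observed combination of the stratification variables receives one integer id. The category labels, the toy data, and the helper name <code>assign_strata</code> are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical category labels for the three stratification variables.
education = ["primary", "secondary", "tertiary"]  # Variable A: 3 groupings
geography = ["rural", "urban"]                    # Variable B: 2 groupings
age = ["young", "middle", "old"]                  # Variable C: 3 groupings

# Every possible combination is a potential stratum: 3 * 2 * 3 = 18.
max_strata = len(list(product(education, geography, age)))


def assign_strata(units):
    """Return one integer stratum id per unit.

    Ids are numbered from 1 over the combinations actually observed
    in the data, in sorted order.
    """
    observed = sorted({(u["A"], u["B"], u["C"]) for u in units})
    ids = {combo: i + 1 for i, combo in enumerate(observed)}
    return [ids[(u["A"], u["B"], u["C"])] for u in units]


# Toy data: two units share a stratum, one unit is alone in another.
units = [
    {"A": "primary", "B": "rural", "C": "young"},
    {"A": "primary", "B": "rural", "C": "young"},
    {"A": "tertiary", "B": "urban", "C": "old"},
]
strata_ids = assign_strata(units)
```

Only combinations that occur in the data receive an id, so a real dataset can contain fewer than 18 strata, and the strata can differ widely in size.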


Then, randomize within each stratum. In Stata, the user-written command <code>randtreat</code> assigns treatment at random within each stratum when the strata variable is passed to its <code>strata()</code> option.
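
The logic of randomizing within strata can be sketched in a few lines of Python. This is an illustration under simplifying assumptions (two arms, equal split), not a description of how <code>randtreat</code> is implemented; the function name <code>stratified_assign</code> and the seed value are hypothetical.

```python
import random
from collections import defaultdict


def stratified_assign(strata_ids, seed=12345):
    """Assign treatment (1) or control (0) separately within each stratum.

    Units in each stratum are shuffled and the first half is treated.
    In an odd-sized stratum the extra unit goes to control in this
    sketch; handling such leftover units is a separate design choice.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible

    # Collect the positions of the units belonging to each stratum.
    by_stratum = defaultdict(list)
    for index, stratum in enumerate(strata_ids):
        by_stratum[stratum].append(index)

    assignment = [None] * len(strata_ids)
    for members in by_stratum.values():
        rng.shuffle(members)
        half = len(members) // 2
        for position, index in enumerate(members):
            assignment[index] = 1 if position < half else 0
    return assignment


# Toy example: stratum 1 has 4 units, stratum 2 has 6 units.
strata_ids = [1] * 4 + [2] * 6
assignment = stratified_assign(strata_ids)
treated_in_1 = sum(a for a, s in zip(assignment, strata_ids) if s == 1)
treated_in_2 = sum(a for a, s in zip(assignment, strata_ids) if s == 2)
```

Because the shuffle happens inside each stratum, treatment and control are balanced within every stratum by construction rather than only on average.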


===Dealing with Misfits===
If the number of units in a stratum isn’t divisible by the number of treatment arms, some units are left over; these are the misfits. [http://blogs.worldbank.org/impactevaluations/tools-of-the-trade-doing-stratified-randomization-with-uneven-numbers-in-some-strata McKenzie and Bruhn] advise randomly allocating the leftovers within each stratum: if X units are left over in a stratum, allocate them so that “a) no treatment or control group gets allocated more than one of these units within the strata, and b) we randomly choose which treatment groups get the extra units.” Stata’s <code>randtreat</code> command provides a number of ways of dealing with misfits via its <code>misfits()</code> option.
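
The quoted rule can be illustrated with a short Python sketch for a single stratum. It is one possible reading of the rule, not the code used by McKenzie and Bruhn or by <code>randtreat</code>; the function name <code>allocate_stratum</code>, the seed, and the example sizes are hypothetical.

```python
import random


def allocate_stratum(n_units, n_arms, rng):
    """Return a randomly ordered list of arm labels (0 .. n_arms - 1).

    Every arm first receives the same base number of units. The
    leftover units (misfits) are then given to distinct, randomly
    chosen arms, so no arm receives more than one extra unit.
    """
    base, leftovers = divmod(n_units, n_arms)
    arms = list(range(n_arms)) * base
    # sample() draws without replacement, so each chosen arm is distinct.
    arms += rng.sample(range(n_arms), leftovers)
    rng.shuffle(arms)  # random order decides which unit gets which label
    return arms


rng = random.Random(2019)
# 11 units and 3 arms: 3 units per arm, plus 2 misfits.
labels = allocate_stratum(n_units=11, n_arms=3, rng=rng)
counts = [labels.count(arm) for arm in range(3)]
```

With 11 units and 3 arms, two arms end up with 4 units and one with 3; which arms receive the extra units is decided at random.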


== Back to Parent ==
This article is part of the topic [[Sampling & Power Calculations]]


== Additional Resources ==
*DIME Analytics' presentations on randomization [https://github.com/worldbank/DIME-Resources/blob/master/stata1-5-randomization.pdf 1] and [https://github.com/worldbank/DIME-Resources/blob/master/stata2-5-randomization.pdf 2], the latter of which covers stratification
*J-PAL's [https://www.povertyactionlab.org/sites/default/files/documents/L4_How%20to%20Randomize.pdf How to Randomize]
*Gertler's [https://siteresources.worldbank.org/EXTHDOFFICE/Resources/5485726-1295455628620/Impact_Evaluation_in_Practice.pdf Impact Evaluation in Practice]
*J-PAL's [http://cega.berkeley.edu/assets/cega_learning_materials/81/Methods_Manual_JPAL_110603.pdf Overview of Methodology for Randomized Experiments]
*The World Bank's [http://web.worldbank.org/archive/website01542/WEB/IMAGES/1_STATA_.PDF Randomization “how to” in Stata (plus other random stuff)]
[[Category: Sampling & Power Calculations]]

Latest revision as of 21:25, 9 June 2019
