

Line 1: 
Line 1: 
−  Creating a statistically valid, representative sample is a crucial step in conducting high-quality [[Randomized Control Trials|randomized control trials]] and applying other [[Experimental Methods|experimental methods]]. The sampling process consists of two parts: sample design and sample implementation. Both should occur early in the evaluation design process in order to facilitate [[Preparing for Field Data Collection|data collection planning]] and other technical processes like preloading questionnaires. The sample size also affects a research project’s [[Survey Budget|budget]], timeline, accuracy, and precision. This page provides guidelines on how to calculate your sample size before launching fieldwork.
 +  #REDIRECT [[Sample Size and Power Calculations]] 
−   
−  == Read First ==
 
−   
−  *A sample that is too small will not permit you to detect a statistically significant effect, while a sample that is too large may waste limited resources. When choosing a sample size, consider the marginal value of added observations, that is, the tradeoff between accuracy and cost.
 
−  *To calculate sample size, you need to know the [[Minimum Detectable Effect|minimum detectable effect size]] (MDE), the standard deviation of the population outcome, and the Type I and Type II error rates. For clustered samples, you also need to know the intracluster correlation and the average cluster size.
 
−   
−  == Factors Related to Sample Size ==
 
−   
−  The sample size formula, below, relies largely on the minimum detectable effect size, the standard deviation of the population outcome, and the Type I and Type II error rates. These variables are represented in the formula by D, σ, α, and β, respectively. Details of each variable follow.
 
−   
−  [[File:Sample Size Formula.png|200px]]
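The formula image is not reproduced here, so the following is only a minimal sketch of one common form of the calculation, not necessarily the exact expression shown in the image. It assumes a two-arm trial with equal allocation, a two-sided test, and the normal approximation, giving n per arm = 2(z<sub>1−α/2</sub> + z<sub>1−β</sub>)²σ²/D². The function name and default values are hypothetical and chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(mde, sigma, alpha=0.05, power=0.80):
    """Approximate observations needed in EACH arm of a two-arm trial.

    Assumptions (not taken from the page's formula image): equal
    allocation, two-sided test, normal approximation, and the same
    outcome standard deviation in both arms.
    mde   -- minimum detectable effect D, in the outcome's units
    sigma -- standard deviation of the outcome
    alpha -- Type I error rate
    power -- 1 minus the Type II error rate (1 - beta)
    """
    if mde <= 0 or sigma <= 0:
        raise ValueError("mde and sigma must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / mde ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Detecting an effect of half a standard deviation, then a quarter.
    print(sample_size_per_arm(mde=0.5, sigma=1.0))
    print(sample_size_per_arm(mde=0.25, sigma=1.0))
```

Under these assumptions, halving D roughly quadruples the required sample, and doubling σ does the same, which matches the directions described in the subsections that follow.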
 
−   
−  ===Minimum Detectable Effect Size===
 
−   
−  The minimum detectable effect size is the effect size below which we cannot reliably distinguish the effect from zero, even if it exists. If a researcher sets MDE to 10%, for example, they may not be able to distinguish a 7% increase in income from a null effect. To be clear, MDE is not the effect size we expect or want. However, to select MDE, it is important to consider the expected effect size. For example, if a program is expected to raise incomes by a minimum of 10%, it may not be necessary to have the option to distinguish program impacts of less than 10% from a null effect.
 
−   
−  As the MDE increases, the necessary sample size decreases.
 
−   
−  === Standard Deviation of Population Outcome ===
 
−   
−  The standard deviation of population outcome measures the variability of the data. As the standard deviation in a population increases, so does the necessary sample size. Consider, for example, a researcher who is studying the effect of an intervention on household income. Within the city of interest, household incomes range from 15,000 USD to 200,000 USD and standard deviation is quite large. Since the standard deviation is large, the researcher needs a larger sample size to detect an effect of the intervention.
 
−   
−  === Statistical Confidence / Precision ===
 
−   
−  Type I errors occur when the null hypothesis is true but is rejected (otherwise known as a false positive). Type II errors occur when the null hypothesis is false but erroneously fails to be rejected (otherwise known as a false negative). To calculate the necessary sample size, power calculations require that you specify the Type I and Type II error rates. For impact evaluations, researchers typically set the Type I error rate (α) to 5% and power (1 − β) to 80%, which corresponds to a Type II error rate (β) of 20%.
 
−   
−  The greater the precision desired, the larger the sample size required.
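To illustrate how the chosen error rates drive sample size, the sketch below computes the multiplier (z<sub>1−α/2</sub> + z<sub>1−β</sub>)² that appears in standard normal-approximation sample size formulas. It assumes a two-sided test and expresses the Type II error rate through power (power = 1 − β, so 80% power means β = 20%). The helper name is hypothetical.

```python
from statistics import NormalDist


def error_rate_multiplier(alpha=0.05, power=0.80):
    """Return (z_{1-alpha/2} + z_{power})^2 for a two-sided test.

    Required sample size is proportional to this multiplier under the
    normal approximation, so comparing multipliers shows the relative
    cost of stricter error rates.
    """
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) ** 2


if __name__ == "__main__":
    base = error_rate_multiplier(alpha=0.05, power=0.80)
    stricter = error_rate_multiplier(alpha=0.05, power=0.90)
    # Ratio of required sample sizes when moving from 80% to 90% power.
    print(round(stricter / base, 2))
```

With these assumptions, raising power from 80% to 90% at α = 5% increases the required sample by roughly a third.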
 
−   
−  == Additional Factors for [[Multistage (Cluster) Sampling|Clustered Sampling]] ==
 
−   
−  When calculating sample sizes for a clustered sample, the sample size formula includes an inflation factor called the “design effect,” outlined in green in the formula below. The design effect relies on two additional variables: the intracluster correlation (ICC) and the average cluster size. These variables are represented by ρ and m, respectively, in the formula below. Details of each variable follow.
 
−   
−  [[File:Sample_Size_Formula_(clustering).png|300px]]
 
−   
−  === Intracluster correlation ===
 
−   
−  Intracluster correlation is the proportion of the total sample variation explained by between-cluster variance. When within-cluster variance is high relative to between-cluster variance, units in the same cluster are not very similar to each other and the intracluster correlation is low. As the intracluster correlation increases, the necessary sample size increases.
 
−   
−  === Average Cluster Size ===
 
−   
−  As the number of units per cluster increases, the total necessary sample size increases, provided the intracluster correlation is greater than zero. The sample size formula above is most accurate when all clusters are of equal size.
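The inflation from clustering can be sketched with the commonly used design effect, 1 + (m − 1)ρ, where ρ is the intracluster correlation and m is the average cluster size. The formula image is not reproduced here, so this assumes the image uses that standard form and that clusters are of equal size. The function names are hypothetical.

```python
import math


def design_effect(m, rho):
    """Standard design effect 1 + (m - 1) * rho for equal-sized clusters."""
    if m < 1:
        raise ValueError("average cluster size m must be at least 1")
    if not 0 <= rho <= 1:
        raise ValueError("intracluster correlation rho must be in [0, 1]")
    return 1 + (m - 1) * rho


def clustered_sample_size(n_unclustered, m, rho):
    """Inflate an unclustered sample size by the design effect.

    Returns (observations, clusters), both rounded up. n_unclustered is
    whatever an unclustered calculation produced (total or per arm); the
    result is on the same basis.
    """
    n = math.ceil(n_unclustered * design_effect(m, rho))
    clusters = math.ceil(n / m)
    return n, clusters


if __name__ == "__main__":
    # 126 observations without clustering, 20 units per cluster, rho = 0.05.
    print(clustered_sample_size(126, m=20, rho=0.05))
```

When ρ is zero the design effect is 1 and clustering adds nothing to the required sample; as ρ or m grows, the required number of observations grows with it.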
 
−   
−  == Indirect Effects on Sample Size ==
 
−  === Takeup ===
 
−   
−  If program takeup is low, the observed average effect in the treatment group is smaller, because a lower proportion of compliers dilutes the effect. For example, if the program takeup rate is 50%, the observed effect in the treatment group is half the size it would be with a 100% takeup rate. Because sample size is inversely proportional to the square of the effect to be detected, halving that effect quadruples the necessary sample size.
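The arithmetic behind this example can be sketched as follows. It assumes that non-takers experience no effect, that no one in the control group receives the program, and that sample size scales with 1/D², as in standard normal-approximation formulas. The function name is hypothetical.

```python
def takeup_inflation(takeup):
    """Factor by which sample size grows when only a share takes up the program.

    Assumes the observed effect shrinks in proportion to takeup and that
    required sample size scales with 1 / effect^2.
    """
    if not 0 < takeup <= 1:
        raise ValueError("takeup must be in (0, 1]")
    return 1 / takeup ** 2


if __name__ == "__main__":
    for rate in (1.0, 0.5, 0.25):
        print(rate, takeup_inflation(rate))
```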
 
− 
 
−  === Data Quality ===
 
−   
−  Poor data quality (e.g. missing observations, measurement error) also increases the necessary sample size. The best way to avoid poor data quality is to act proactively: [[Enumerator Training|train enumerators]] effectively, create and implement a [[Data Quality Assurance Plan|data quality assurance plan]], [[Monitoring Data Quality|monitor data quality]] in the field, and conduct [[Back Checks|back checks]]. Ideally, a field coordinator will be on the ground monitoring data collection.
 
−  == Back to Parent ==
 
−  This article is part of the topic [[Sampling & Power Calculations]]
 
−   
−  == Additional Resources ==
 
−  *JPAL’s [https://www.povertyactionlab.org/sites/default/files/resources/L5_Sampling%20and%20Sample%20Size_0.pdf slides] explain sampling and sample size in detail.
 
−  *DIME Analytics guidelines on survey sampling [https://github.com/worldbank/DIMEResources/blob/master/surveysampling1.pdf 1] and [https://github.com/worldbank/DIMEResources/blob/master/surveysampling2.pdf 2]
 
−   
−  [[Category: Sampling & Power Calculations ]]
 