Creating a statistically valid, representative sample is a crucial step in conducting high-quality randomized controlled trials and other experimental methods. The sampling process consists of two parts: sample design and sample implementation. Both should occur early in the evaluation design process to facilitate data collection planning and other technical processes, such as preloading questionnaires. The sample size also affects a research project’s budget, timeline, accuracy, and precision. This page provides guidelines on how to calculate your sample size before launching fieldwork.
- A sample that is too small may not allow you to detect a statistically significant effect even when one exists, while a sample that is too large may waste limited resources. When choosing a sample size, consider the marginal value of additional observations; in other words, weigh the trade-off between precision and cost.
- To calculate sample size, you need to know the minimum detectable effect size (MDE), the standard deviation of the population outcome, the significance level (the acceptable Type I error rate), and the statistical power (one minus the acceptable Type II error rate). For clustered samples, you also need to know the intracluster correlation and the average cluster size.
Factors Related to Sample Size
The sample size formula relies largely on the minimum detectable effect size, the standard deviation of the population outcome, the significance level (the Type I error rate), and the Type II error rate (equivalently, statistical power). These variables are represented in the formula by D, σ, α, and β, respectively. Details of each variable follow.
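One widely used form of this formula is shown below as a sketch. It assumes a two-sided test, individual randomization, two equal-sized groups (treatment and control), and a normal approximation; n is the number of observations per group and z_q denotes the q-th quantile of the standard normal distribution (these symbols are assumptions introduced here). For a clustered sample, n is multiplied by the design effect, using the ρ and m defined later on this page.

$$
n = \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\,\sigma^{2}}{D^{2}},
\qquad
n_{\text{clustered}} = n \times \left[\,1 + (m - 1)\,\rho\,\right]
$$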
Minimum Detectable Effect Size
The minimum detectable effect size is the smallest true effect that a study is adequately powered to distinguish from zero at the chosen significance level; true effects smaller than the MDE may go undetected even if they exist. If a researcher sets the MDE to 10%, for example, they may not be able to distinguish a 7% increase in income from a null effect. The MDE is not the effect size we expect or want. However, the expected effect size should inform the choice of MDE: if a program is expected to raise incomes by at least 10%, the study may not need to distinguish impacts smaller than 10% from a null effect.
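As the MDE increases, the necessary sample size decreases. Because sample size is proportional to 1/D², doubling the MDE reduces the required sample to roughly a quarter.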
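The relationship between MDE and sample size can be illustrated with a minimal Python sketch. It assumes a two-sided test, two equal-sized arms, individual randomization, and the normal approximation; the function name `sample_size_per_arm` and its default values are hypothetical, not taken from any particular package.

```python
from statistics import NormalDist


def sample_size_per_arm(mde, sd, alpha=0.05, power=0.80):
    """Observations per arm for a two-sided test comparing two equal-sized
    groups under individual randomization (normal approximation)."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value z_{1-alpha/2}
    z_power = z.inv_cdf(power)            # z_{1-beta}, where power = 1 - beta
    return 2 * (z_alpha + z_power) ** 2 * sd ** 2 / mde ** 2


# With the outcome in standard-deviation units (sd = 1), halving the MDE
# from 0.2 to 0.1 multiplies the required sample size by four.
for mde in (0.2, 0.1):
    print(f"MDE {mde}: {sample_size_per_arm(mde, sd=1.0):.0f} per arm")
```

In practice the result would be rounded up to the next whole observation.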
Standard Deviation of Population Outcome
The standard deviation of the population outcome measures how variable the outcome is. As the standard deviation in a population increases, so does the necessary sample size. Consider, for example, a researcher studying the effect of an intervention on household income. Within the city of interest, household incomes range from 15,000 USD to 200,000 USD, so the standard deviation is quite large. Because the standard deviation is large, the researcher needs a larger sample to detect an effect of the intervention.
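Under the same normal-approximation formula, what matters is the ratio of the MDE to the standard deviation (the standardized effect size), so doubling σ at a fixed MDE has the same effect as halving the MDE. The sketch below assumes a two-sided test and two equal-sized arms; the income figures are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def sample_size_per_arm(mde, sd, alpha=0.05, power=0.80):
    # Normal-approximation sample size per arm (two-sided test, equal arms)
    z = NormalDist()
    multiplier = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    # Only the standardized effect mde / sd enters the calculation
    return 2 * multiplier * (sd / mde) ** 2


# Hypothetical incomes in USD: the same 5,000 USD MDE at two levels of spread
mde = 5_000
for sd in (20_000, 40_000):
    print(f"sd = {sd:,} USD: {sample_size_per_arm(mde, sd):,.0f} per arm")
```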
Statistical Confidence / Precision
Type I errors occur when the null hypothesis is true but is rejected (a false positive). Type II errors occur when the null hypothesis is false but is not rejected (a false negative). To calculate the necessary sample size, power calculations require that you specify the acceptable probability of each error: the significance level (α), which is the probability of a Type I error, and the Type II error rate (β), whose complement 1 − β is the statistical power. For impact evaluations, researchers typically set α to 5% and β to 20%, which corresponds to 80% power.
The greater the precision desired (a lower α or a higher power), the larger the required sample size.
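The choice of α and power enters the formula only through the factor (z_{1−α/2} + z_{1−β})², so its effect can be compared directly against the conventional 5% / 80% setting. A minimal sketch, assuming a two-sided test; the function name `power_multiplier` is hypothetical.

```python
from statistics import NormalDist


def power_multiplier(alpha, power):
    """(z_{1-alpha/2} + z_{1-beta})^2: the factor that the significance level
    and power contribute to the sample size (two-sided test)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2


# Relative sample size compared with alpha = 5% and power = 80%
baseline = power_multiplier(0.05, 0.80)
for alpha, power in [(0.05, 0.80), (0.05, 0.90), (0.01, 0.80), (0.01, 0.90)]:
    ratio = power_multiplier(alpha, power) / baseline
    print(f"alpha = {alpha}, power = {power}: {ratio:.2f}x the baseline sample")
```

Because the other inputs cancel out in the ratio, the printed factors apply regardless of the MDE and standard deviation.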
Additional Factors for Clustered Sampling
When calculating sample sizes for a clustered sample, the sample size formula includes an inflation factor called the “design effect.” The design effect depends on two additional variables: the intracluster correlation (ICC) and the average cluster size, represented by ρ and m, respectively. Details of each variable follow.
Intracluster Correlation
The intracluster correlation is the proportion of the total variation in the outcome that is explained by variation between clusters. When most of the variation lies within clusters, the ICC is low; when units in the same cluster closely resemble one another and clusters differ substantially from each other, the ICC is high. As the intracluster correlation increases, the necessary sample size increases.
Average Cluster Size
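As the number of units per cluster increases, the total number of units required increases (whenever ρ is greater than zero), because additional units from the same cluster add less independent information than units from new clusters. The design effect formula is most accurate when all clusters are of equal size.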
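A minimal sketch of the standard design effect, 1 + (m − 1)ρ, assuming equal-sized clusters. It converts a hypothetical individually randomized requirement of 400 units per arm into a number of clusters for several cluster sizes; the function names and the ICC of 0.05 are illustrative assumptions, not recommendations.

```python
import math


def design_effect(icc, m):
    """Design effect for equal-sized clusters of m units: 1 + (m - 1) * icc."""
    return 1 + (m - 1) * icc


def clusters_needed(n_individual, icc, m):
    """Clusters per arm needed to match the power of n_individual
    individually randomized units per arm (equal cluster sizes assumed)."""
    n_total = n_individual * design_effect(icc, m)
    # Subtract a tiny tolerance so floating-point noise does not add a cluster
    return math.ceil(n_total / m - 1e-9)


# Hypothetical inputs: 400 units per arm without clustering, ICC = 0.05
for m in (10, 20, 40):
    k = clusters_needed(400, icc=0.05, m=m)
    print(f"m = {m}: design effect {design_effect(0.05, m):.2f}, "
          f"{k} clusters, {k * m} units per arm")
```

Larger clusters mean fewer clusters but more units in total, consistent with the point that the necessary sample size grows with m when ρ is positive.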
Indirect Effects on Sample Size
If program take-up is low, the observed effect in the treatment group is smaller, because the program’s effect is averaged over units that do not participate. In effect, a lower proportion of compliers dilutes the effect the study must detect. For example, if the take-up rate is 50% (and no one in the control group takes up the program), the observed difference between the treatment and control groups is half what it would be with a 100% take-up rate. Since the effect to be detected is half the size, the necessary sample size quadruples.
Poor data quality (e.g., missing observations or measurement error) also increases the necessary sample size. The best way to avoid poor data quality is to act proactively: train enumerators effectively, create and implement a data quality assurance plan, monitor data quality in the field, and conduct back checks. Ideally, a field coordinator will be on the ground monitoring data collection.
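The take-up arithmetic can be sketched as follows. This assumes the program has the same effect on everyone who takes it up and that the observed effect scales with the difference in take-up rates between treatment and control; the function names and the 10% effect are hypothetical.

```python
def observed_effect(effect_on_participants, takeup_treatment, takeup_control=0.0):
    """Observed treatment-control difference when only some units take up,
    assuming a constant effect among those who participate."""
    return effect_on_participants * (takeup_treatment - takeup_control)


def sample_inflation(takeup_treatment, takeup_control=0.0):
    """Factor by which the required sample grows relative to full take-up,
    since sample size scales with 1 / (effect size)^2."""
    return 1 / (takeup_treatment - takeup_control) ** 2


# Hypothetical program that raises income by 10% among participants
for takeup in (1.0, 0.75, 0.5):
    print(f"take-up {takeup:.0%}: observed effect "
          f"{observed_effect(0.10, takeup):.3f}, "
          f"sample size x{sample_inflation(takeup):.2f}")
```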
This article is part of the topic Sampling & Power Calculations