Sample Size


Read First

The size of the sample determines whether an impact of the program or intervention under study can be statistically distinguished from the null hypothesis of no effect.


Why is sample size important?

It is rarely cost-effective to collect data from the full population of interest, so a sample is used instead. The size of the sample affects the precision of your estimates, so it is important to think about the trade-off between precision and cost, i.e. the marginal value of additional observations.

What factors influence what sample size I need?

The sample size formula for a simple random sample (SRS) is as follows.

[Figure: Sample Size Formula]
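For reference, a standard textbook form of the SRS formula is sketched below, assuming a two-sided test comparing mean outcomes between a treatment group and a control group under a normal approximation. Here N is the total sample size, z_q is the q-th quantile of the standard normal distribution, and P (the share of the sample assigned to treatment) is an assumed symbol not defined elsewhere in this article; the figure may use different notation, such as t-quantiles in place of z-quantiles.

$$
N = \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{P\,(1-P)} \cdot \frac{\sigma^2}{D^2}
$$

With an even split (P = 0.5), the denominator P(1 − P) equals 1/4, which is why the formula is often written with a leading factor of 4.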

Below find a description of each element in the formula, including expected direction of relation to sample size.

Expected size of impact

D: Minimum Detectable Effect Size (MDES)

The lowest effect size you want to be able to precisely distinguish from zero. If you set the MDES at 10%, a 7% increase in income would not necessarily be distinguishable from a null effect. The appropriate assumption to use for MDES will depend on the expected impact of the program. For example, if a program is expected to raise incomes by a minimum of 10%, it may not be necessary to be able to distinguish program impacts of less than 10% from a null effect.

The smaller the effect we want to be able to distinguish, the larger the sample size required.
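A minimal Python sketch of the SRS calculation makes the inverse-square relationship between the MDES and the sample size concrete. It assumes a two-sided difference-in-means test with a normal approximation; the function name `srs_sample_size` and the parameter `p_treat` (the share of the sample assigned to treatment, defaulting to an even split) are hypothetical, not part of any particular package.

```python
from math import ceil
from statistics import NormalDist


def srs_sample_size(mdes, sigma, alpha=0.05, power=0.80, p_treat=0.5):
    """Total N for a two-sided difference-in-means test (normal approximation).

    mdes    -- minimum detectable effect size D, in outcome units
    sigma   -- standard deviation of the outcome
    alpha   -- significance level (type I error rate)
    power   -- 1 - beta, where beta is the type II error rate
    p_treat -- hypothetical share of the sample assigned to treatment
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)
    multiplier = (z_alpha + z_beta) ** 2 / (p_treat * (1 - p_treat))
    # Round up: a fractional observation cannot be surveyed.
    return ceil(multiplier * sigma ** 2 / mdes ** 2)


# Halving the MDES roughly quadruples the required sample.
for d in (0.2, 0.1):
    print(d, srs_sample_size(mdes=d, sigma=1.0))
```

With `sigma=1.0`, the MDES is expressed in standard-deviation units; moving from 0.2 to 0.1 raises N from about 785 to about 3,140.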

Variation in outcome

σ: standard deviation of the outcome measure in the population

The higher the level of variance in the outcome, the larger the sample size required, as the image below illustrates.

[Figure: Sample size and variance]

Statistical confidence / precision

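α: the probability of a “type I error” (concluding there is an effect when there is none) -- typically set to 5%

β: the probability of a “type II error” (failing to detect an effect that exists) -- power, 1 − β, is typically set to 80%, so β is typically 20%

The smaller α and β (i.e. the more confidence and power required), the larger the sample size required.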
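In the usual normal-approximation formula, α and β enter the sample size only through the factor (z_{1-α/2} + z_{1-β})². The sketch below, with a hypothetical helper `z_multiplier`, compares that factor across a few common settings relative to the 5% / 80% convention.

```python
from statistics import NormalDist


def z_multiplier(alpha, power):
    """(z_{1-alpha/2} + z_{1-beta})**2, the factor through which alpha and power enter N."""
    std_normal = NormalDist()
    return (std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)) ** 2


baseline = z_multiplier(alpha=0.05, power=0.80)
for alpha, power in [(0.05, 0.80), (0.01, 0.80), (0.05, 0.90)]:
    ratio = z_multiplier(alpha, power) / baseline
    print(f"alpha={alpha}, power={power}: N x {ratio:.2f} relative to 5%/80%")
```

Tightening α from 5% to 1% raises the required sample by roughly half, and raising power from 80% to 90% raises it by roughly a third, holding everything else fixed.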

Additional considerations for clustered sampling

Sample size formula (clustered design):

[Figure: Sample Size Formula (clustering)]
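A standard textbook form of the clustered formula is sketched below, under the same assumptions as a textbook SRS calculation (two-sided test, two arms, normal approximation, P as an assumed share of the sample assigned to treatment). It multiplies the SRS sample size by the design effect 1 + ρ(m − 1); the figure may use different notation.

$$
N = \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{P\,(1-P)} \cdot \frac{\sigma^2}{D^2} \cdot \left[1 + \rho\,(m-1)\right]
$$

Here N counts individual units, so the number of clusters needed is N / m.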

Level of clustering

ρ: intracluster correlation coefficient (ICC)

m: number of units per cluster

As a whole, the second part of this formula (which distinguishes it from the SRS formula above) is referred to as the Design Effect.
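A minimal sketch of how the design effect inflates an SRS sample size, assuming the standard expression 1 + ρ(m − 1) and equal-sized clusters. The helpers `design_effect` and `clustered_sample_size`, and the example values (an SRS requirement of 785, ρ = 0.05, 20 units per cluster), are hypothetical.

```python
from math import ceil


def design_effect(rho, m):
    """Variance inflation from sampling m units per cluster with intracluster correlation rho."""
    return 1 + rho * (m - 1)


def clustered_sample_size(n_srs, rho, m):
    """Inflate an SRS sample size by the design effect.

    Returns (number of units, number of clusters), assuming equal cluster sizes.
    """
    n_units = ceil(n_srs * design_effect(rho, m))
    n_clusters = ceil(n_units / m)
    return n_units, n_clusters


print(clustered_sample_size(n_srs=785, rho=0.05, m=20))
```

Even a modest ICC nearly doubles the sample in this example. At the extremes, ρ = 0 leaves the SRS size unchanged, while ρ = 1 makes each cluster worth a single observation, so adding units within a cluster stops helping.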

Indirect Effects on Sample Size

Take up

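The effect you can observe, and therefore the MDES you need, is “diluted” by the proportion of compliers.

If program take-up is 50% (and no one in the control group takes up the program), the observed effect in the treatment group will be half the size it would be with 100% take-up.

Because the required sample size is inversely proportional to the square of the MDES, halving the MDES quadruples n.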
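The take-up arithmetic can be sketched as follows, assuming the observed (intention-to-treat) effect equals the full effect on compliers scaled by the difference in take-up between treatment and control. The helper `takeup_inflation` is hypothetical.

```python
def takeup_inflation(takeup_treat, takeup_control=0.0):
    """Factor by which N must grow when only some units take up the program.

    The observed effect is the full effect times the take-up difference
    between arms, and N scales with 1 / effect**2.
    """
    diff = takeup_treat - takeup_control
    if not 0 < diff <= 1:
        raise ValueError("take-up difference must be in (0, 1]")
    return 1 / diff ** 2


print(takeup_inflation(0.5))  # 4.0: 50% take-up quadruples the sample
```

Take-up in the control group (for example, through spillovers or access to similar programs elsewhere) narrows the difference further, which is why `takeup_control` is included.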

Data quality

Poor data quality effectively increases the required sample size:

  • Missing observations
  • High measurement error

The best way to avoid this is to have a field coordinator on the ground monitoring data collection.
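The two data-quality problems listed above can be folded into a calculation roughly as sketched below. The sketch assumes that missing observations occur at random (non-random attrition can bias estimates, not just reduce precision) and that measurement error is classical, i.e. independent of the true outcome with mean zero. The helper names and example values are hypothetical.

```python
from math import ceil


def adjust_for_attrition(n_required, attrition_rate):
    """Sample to enroll so that about n_required observations remain after attrition."""
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must be in [0, 1)")
    return ceil(n_required / (1 - attrition_rate))


def sigma_with_measurement_error(sigma_true, sigma_error):
    """SD of the observed outcome when classical measurement error is added."""
    return (sigma_true ** 2 + sigma_error ** 2) ** 0.5


# Expecting 10% missing observations: enroll more than the formula's N.
print(adjust_for_attrition(785, 0.10))
# Measurement error raises the sigma that enters the sample size formula.
print(sigma_with_measurement_error(3.0, 4.0))
```

Because σ enters the formula squared, noisy measurement raises the required sample in proportion to the added variance, not the added standard deviation.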

Back to Parent

This article is part of the topic Sampling & Power Calculations
