# Sample Size

## Read First

The size of the sample determines whether you can statistically distinguish the impact of the studied program or intervention from a null effect.

## Guidelines

### Why is sample size important?

It is rarely cost-effective to collect data from the full population of interest. Rather, a sample is used. The size of the sample will affect the precision of your estimates. It is important to think about the trade-off between precision and cost, i.e. the marginal value of added observations.

### What factors influence what sample size I need?

The sample size formula for a simple random sample is as follows.
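A common textbook form is sketched below. It assumes a two-sided test comparing the means of two equal-sized groups, with $n$ the total sample across both groups; the leading constant changes under other designs, so treat it as an assumption rather than the only valid version.

$$
n = \frac{4\sigma^2 \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{D^2}
$$

Here $z_{q}$ denotes the $q$-th quantile of the standard normal distribution.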

Below find a description of each element in the formula, including expected direction of relation to sample size.

#### Expected size of impact

D: Minimum Detectable Effect Size (MDES)

The MDES is the smallest effect size you want to be able to reliably distinguish from zero. If you set the MDES at 10%, a 7% increase in income would not necessarily be distinguishable from a null effect. The appropriate MDES to assume depends on the expected impact of the program. For example, if a program is expected to raise incomes by a minimum of 10%, it may not be necessary to be able to distinguish program impacts of less than 10% from a null effect.

The smaller the effect we want to be able to distinguish, the larger the sample size required.

#### Variation in outcome

σ: standard deviation in population outcome measure

The higher the level of variance in the outcome, the larger the sample size required, as the image below illustrates.

#### Statistical confidence / precision

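α: probability of a “type I error” (finding an effect when there is none). This is the significance level, typically set to 5%.

β: probability of a “type II error” (failing to detect an effect that exists). Statistical power is 1 − β and is typically set to 80%, i.e. β = 20%.

The lower the significance level and the higher the power, the larger the sample size required.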
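The relationships described above can be checked numerically. The sketch below is illustrative only: the function name `srs_sample_size` is hypothetical, and it assumes the textbook form n = 4σ²(z₁₋α/₂ + z₁₋β)²/D² for a two-sided test with two equal-sized groups, which may differ from the formula used for a particular design.

```python
from math import ceil
from statistics import NormalDist


def srs_sample_size(mdes, sigma, alpha=0.05, power=0.80):
    """Total sample size (both groups combined) under simple random sampling.

    Assumed form: n = 4 * sigma^2 * (z_{1-alpha/2} + z_{power})^2 / mdes^2,
    i.e. a two-sided test with equal allocation to treatment and control.
    mdes and sigma must be expressed in the same units.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value tied to type I error
    z_power = NormalDist().inv_cdf(power)          # quantile tied to power = 1 - beta
    n = 4 * sigma**2 * (z_alpha + z_power) ** 2 / mdes**2
    return ceil(n)  # round up: a fractional observation cannot be sampled


# Hypothetical inputs: outcome standardized so sigma = 1, MDES of 0.2 standard deviations.
print(srs_sample_size(mdes=0.2, sigma=1.0))
# A smaller MDES, a noisier outcome, or higher power each raise the required n.
print(srs_sample_size(mdes=0.1, sigma=1.0))
print(srs_sample_size(mdes=0.2, sigma=1.5))
print(srs_sample_size(mdes=0.2, sigma=1.0, power=0.90))
```

Because the MDES enters the formula squared, halving it roughly quadruples the required sample, while variance and the confidence parameters scale n in the directions described above.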

### Additional considerations for clustered sampling

Sample size formula (clustered design)
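A common way to write this is sketched below. It assumes equal-sized clusters and a simple random sample formula for a two-sided test with two equal-sized groups, multiplied by the standard design effect; other designs would change the first factor.

$$
n = \frac{4\sigma^2 \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{D^2} \left[1 + \rho\,(m - 1)\right]
$$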

#### Level of clustering

ρ: intracluster correlation coefficient

m: number of units per cluster

Taken as a whole, the second part of this formula, which distinguishes it from the SRS formula above, is referred to as the *Design Effect*.
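To give a sense of how much clustering can inflate the sample, here is a minimal sketch using the standard design effect 1 + ρ(m − 1), which assumes equal-sized clusters. The function names and the input values are hypothetical.

```python
from math import ceil


def design_effect(rho, m):
    """Standard design effect for equal-sized clusters: 1 + rho * (m - 1)."""
    return 1 + rho * (m - 1)


def clustered_sample_size(n_srs, rho, m):
    """Inflate a simple-random-sample size by the design effect.

    Returns (total units, number of clusters), both rounded up.
    """
    n_units = ceil(n_srs * design_effect(rho, m))
    n_clusters = ceil(n_units / m)
    return n_units, n_clusters


# Hypothetical inputs: 785 units needed under SRS, rho = 0.05, 20 units per cluster.
units, clusters = clustered_sample_size(n_srs=785, rho=0.05, m=20)
print(units, clusters)
```

With ρ = 0 or m = 1 the design effect equals 1 and the clustered design needs no more units than a simple random sample; the inflation grows with both ρ and m.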

### Indirect Effects on Sample Size

#### Take up

The MDES is “diluted” by the proportion of compliers.

If program take up is 50%, the observed effect in the treatment group will be half the size it would be with 100% take up.

If the MDES is half the size, n quadruples, because the required sample size is inversely proportional to the square of the MDES.
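The arithmetic behind this can be sketched as follows. The helper name `inflate_for_take_up` is hypothetical, and the sketch assumes that nobody in the control group receives the program and that the required sample size scales with 1/MDES².

```python
from math import ceil


def inflate_for_take_up(n_full_take_up, take_up):
    """Scale a sample size computed for 100% take up to a lower take-up rate.

    Assumes the observed effect shrinks in proportion to take up, so the
    required n grows by a factor of 1 / take_up**2.
    """
    if not 0 < take_up <= 1:
        raise ValueError("take_up must be in (0, 1]")
    return ceil(n_full_take_up / take_up**2)


# Hypothetical input: 785 units would suffice with full take up.
print(inflate_for_take_up(785, take_up=0.5))  # four times the original n
print(inflate_for_take_up(785, take_up=0.8))
```

Even moderate non-compliance is costly: at 80% take up the required sample is already more than one and a half times the full-take-up figure.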

#### Data quality

Poor data quality effectively increases the required sample size, for example through:

- Missing observations
- High measurement error

The best way to avoid this is to have a field coordinator on the ground monitoring data collection.

## Back to Parent

This article is part of the topic Sampling & Power Calculations
