

(5 intermediate revisions by 3 users not shown) 
Line 1: 
−  Creating a statistically valid sample representative of the population of interest for the impact evaluation is a crucial aspect of impact evaluation design. This task can be roughly divided into two phases: sample design and implementation. Implementation typically means writing a software program to enact the sampling strategy.
 +  For information on sampling approaches, please see [[Sampling]]. For information on sample size and power calculations, see [[Sample Size and Power Calculations]]. 
−   
−   
−  == Read First ==
 
−  * To calculate exact sample size, you need to know the effect of the program and the mean and standard deviation of your outcome of interest for both the treatment and the control group. You cannot know these with certainty at the start of an impact evaluation. For this reason, power calculations require estimates and assumptions, and can seem like more of an art than a science.
 
−   
−  * Sampling code requires extra care! Errors cannot be corrected after the intervention (or survey) has started. Always ask a second person to double-check your code before you use the sample it generated in the field. For DIME projects, you should always consult a member of DIME Analytics before sending a sample to the field. Do not randomize the sample from a temporary data set or a data set constructed only for this purpose. Instead, always randomize from a [[Master_Data_Set|master data set]]. If no master data set exists for the [[Unit_of_Observation|unit of observation]] you are sampling on, start by creating one.
 
−   
−  == Sampling ==
 
−   
−  === Sample Size ===
 
−  Power calculations are a statistical tool to help determine [[Sample Size]]. This matters because a sample that is too small will not allow you to detect a statistically significant effect, while a sample that is too large can waste limited resources.
 
−  You can estimate either sample size or minimum detectable effect. Which you should estimate depends on the research design and constraints of a specific impact evaluation. The types of questions you can answer through power calculations include:
 
−  * Given that I want to be able to statistically distinguish program impact of a 10% change in my outcome of interest, what is the minimum sample size needed?
 
−  * Given that I only have budget to sample 1,000 households, what is the minimum effect size that I will be able to distinguish from a null effect? (this is known as [[Minimum Detectable Effect]])
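
The two questions above can be sketched with the standard normal-approximation formula for comparing two means with equal-sized arms. This is only an illustration, not the method used in the linked Stata or Optimal Design articles. The function names, the default 5% significance level, and the assumption of a common standard deviation in both arms are choices made for this example.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Units needed in EACH arm to detect `effect` with a two-sided test.

    Assumes individual-level randomization, equal arm sizes, and the same
    standard deviation `sd` in treatment and control.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 / effect ** 2)


def minimum_detectable_effect(n_per_arm, sd, alpha=0.05, power=0.80):
    """Smallest effect detectable with `n_per_arm` units in each arm."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sd * math.sqrt(2 / n_per_arm)


# Hypothetical inputs: outcome mean 100, standard deviation 50.
# Question 1: sample size needed to detect a 10% change (10 units).
n_needed = sample_size_per_arm(effect=10, sd=50)
# Question 2: MDE with a budget of 1,000 households (500 per arm).
mde = minimum_detectable_effect(n_per_arm=500, sd=50)
```

The outcome mean and standard deviation here are invented for illustration; in practice they come from baseline data or comparable studies.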
 
−   
−   
−  Power calculations should be done at the [[Impact Evaluation Design]] stage. They are most typically done using [https://www.stata.com/ Stata] or [http://hlmsoft.net/od/ Optimal Design] (see [[Power Calculations in Optimal Design]] and [[Power Calculations in Stata]]). Power calculations can be used to determine either sample size (using the standard assumption of 80% power) or power (if sample size is constrained).
 
−   
−  Intuition:
 
−  [[Media:Sample Size Intuition.png|Summary of Determinants of Sample Size]]
 
−   
−  === Sample Design ===
 
−  ''Population'': What is the population of interest for the impact evaluation? In other words, what population does your sample need to represent? This will vary depending on the study design. Some data on the overall population is required, in order to draw a representative sample.
 
−   
−  ''Stratification'': To ensure a representative sample, you can use [[Stratified Random Sample|stratification]]. A typical variable to stratify on is gender. When you stratify on gender, you guarantee that your sample has the same proportion of women as the population frame you are sampling from.
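
A minimal sketch of proportional stratified sampling, assuming the sampling frame is a list of records with a unique `id` field and a stratum field such as `gender`. The function name, field names, and seed are hypothetical and chosen only for this example.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_key, fraction, seed):
    """Draw the same fraction of units from every stratum.

    `frame` is a list of dicts with a unique "id" key (an assumption of
    this sketch). Units are sorted by id first so that the draw does not
    depend on the order in which the frame happens to be stored.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in sorted(frame, key=lambda u: u["id"]):
        strata[unit[stratum_key]].append(unit)

    sample = []
    for name in sorted(strata):               # fixed stratum order
        units = strata[name]
        k = round(len(units) * fraction)      # units to draw in this stratum
        sample.extend(rng.sample(units, k))
    return sample


# Hypothetical frame: 60 women and 40 men.
frame = [{"id": i, "gender": "female" if i < 60 else "male"} for i in range(100)]
sample = stratified_sample(frame, "gender", fraction=0.1, seed=12345)
```

Because the same fraction is drawn within each stratum, the sample keeps the frame's 60/40 split, up to rounding when stratum sizes do not divide evenly.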
 
−   
−  === Sample Selection ===
 
−  The most basic sampling technique is a simple random sample. This works well for studies of small populations with a complete sampling frame. More typically, impact evaluations rely on [[Multistage (Cluster) Sampling|multistage or clustered sampling]], often with [[Stratified Random Sample|stratification]].
 
−   
−  You should always work from a [[Master_Data_Set|master data set]] of the population (sampling frame). If you do not have a master data set for the [[Unit_of_Observation|unit of observation]] you are sampling from (for example, households, villages, clinics, schools), you should always start by creating one. In the field, this is done by a [[listing]] at the lowest level of clustering possible. If it is impossible to do a listing, an alternative is to do an "on-the-spot" randomization. There are a few different methods, for example, a "random walk" in which enumerators spin a bottle to determine a random direction. But without knowing the total number of households, this will always be biased towards the households at the center of the village. In addition, it is hard to monitor whether protocols are adhered to in the field, and there is no systematic way of tracing when replacements were used and how they were selected.
 
−   
−  === Randomization in Stata ===
 
−  All sampling code you produce must be reproducible. Any code that includes randomization needs version, seed and sort to be reproducible. See [[Randomization in Stata|reproducible randomization in Stata]] for details.
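
The same three ingredients apply outside Stata. The following Python sketch is an analogue, not the Stata procedure described in the linked article: it fixes the order of the frame (sort), fixes the random-number seed (seed), and records the interpreter version (version), since the sampling routine's behaviour could differ between releases. The function name and seed value are assumptions for illustration.

```python
import random
import sys


def draw_sample(ids, n, seed):
    """Reproducibly draw `n` unique IDs from `ids`."""
    ordered = sorted(ids)            # sort: same starting order on every run
    rng = random.Random(seed)        # seed: same random numbers on every run
    return sorted(rng.sample(ordered, n))


# version: store this alongside the sample so the draw can be rerun later
PYTHON_VERSION = sys.version.split()[0]

# The input order does not matter because the IDs are sorted before drawing.
first = draw_sample([5, 3, 1, 4, 2], n=2, seed=42)
second = draw_sample([1, 2, 3, 4, 5], n=2, seed=42)
```

Sorting on a unique ID is what makes the result independent of how the data set was stored; sorting on a non-unique variable would leave ties in arbitrary order.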
 
−   
−  == Power Calculations ==
 
−   
−  === Software for Power Calculations ===
 
−  [http://www.stata.com/ Stata] is better for [[Reproducible Research|reproducible research]], in that the power calculations are codified in a do-file. However, it is less visual and intuitive than [[Power Calculations in Optimal Design|Optimal Design]], and Stata's built-in program for sample size calculations, ''power'', does not allow for corrections for clustering (there are user-written programs to do this, but all have some pitfalls). See [[Power Calculations in Stata]] for details.
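
For intuition about what a clustering correction does, one common approximation inflates an individual-level sample size by the design effect 1 + (m − 1)ρ, where m is the number of units per cluster and ρ is the intra-cluster correlation. The sketch below assumes equal cluster sizes and uses hypothetical function names; it is not a substitute for the user-written Stata programs mentioned above.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from sampling clusters of equal size."""
    return 1 + (cluster_size - 1) * icc


def clustered_sample_size(n_individual, cluster_size, icc):
    """Clusters and units needed per arm after the clustering correction.

    `n_individual` is the per-arm sample size from an individual-level
    power calculation.
    """
    n_inflated = n_individual * design_effect(cluster_size, icc)
    clusters = math.ceil(n_inflated / cluster_size)
    return clusters, clusters * cluster_size


# Hypothetical inputs: 393 units per arm before correction,
# 20 households per village, intra-cluster correlation of 0.05.
clusters, units = clustered_sample_size(393, cluster_size=20, icc=0.05)
```

Even a small intra-cluster correlation can nearly double the required sample when clusters are large, which is why ignoring clustering tends to produce underpowered designs.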
 
−   
−  [https://sites.google.com/site/optimaldesignsoftware/home Optimal Design] creates graphs to visualize trade-offs and relationships between the various components of the sample size equation. However, transparency is an issue when using this software. Most people just save the graphs it creates, which can be difficult to replicate later. Other issues with Optimal Design are:
 
−  * It cannot calculate power for an individual-level randomization with a binary outcome
 
−  * It assumes equal mean and variance for treatment and control (for an RCT this is generally okay)
 
−  * It only gives you the total number of clusters or the total sample size, assuming an equal split, whereas you might want to fix the size of your treatment group (say, because of budget constraints) and calculate the control group size
 
−  See [[Power Calculations in Optimal Design]] for details.
 
−   
−  == Back to Parent ==
 
−  This article is part of the topic [[Sampling & Power Calculations]]
 
−   
−  == Additional Resources ==
 
−  * [https://www.povertyactionlab.org/sites/default/files/resources/2017.01.11TheDangerofUnderpoweredEvaluations.pdf The Danger of Underpowered Evaluations], J-PAL North America
 
−  * [http://unstats.un.org/unsd/demographic/sources/surveys/Series_F98en.pdf Designing Household Survey Samples: Practical Guidelines] United Nations, Department of Economic and Social Affairs, Statistics Division  2008
 
−  * Why it makes sense to revisit power calculations after data has been collected: http://andrewgelman.com/2017/03/03/yesmakessensedesignanalysispowercalculationsdatacollected/
 
−  * Development Impact Blog: [http://blogs.worldbank.org/impactevaluations/powercalculationswhatsoftwareshouldiuse "Power Calculations: What software should I use?"]
 
−   
−  [[Category: Sampling & Power Calculations ]]
 