Randomization is a critical step for ensuring exogeneity in experimental methods and randomized controlled trials (RCTs). Stata provides a replicable, reliable, and well-documented way to randomize treatment before fieldwork begins.

Read First

  • Common alternatives to using Stata for randomization include: (i) using Excel's RAND function; (ii) randomizing directly within an electronic survey platform such as SurveyCTO; or (iii) randomizing through a public lottery.
  • Randomizing in Stata is preferred to randomizing in Excel or in survey software because it is transparent and reproducible, and it gives the research team more time to run balance tests and double-check assignments.
  • Make sure to set the version, set the seed, sort the data, and use unique IDs when randomizing in Stata.
  • For information on how to draw a stratified random sample, see Stratified Random Sample.
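
As a minimal sketch of the four steps listed above (version, seed, sort, unique IDs), the do-file below assigns half of a simulated sample to treatment. The variable names (hh_id, rand, treatment), the sample size, the version number, and the seed value are illustrative assumptions, not taken from any particular project.

```stata
* Minimal sketch: simple reproducible randomization (all names are hypothetical)
version 13.1                       // fix the version so the random-number generator does not change
clear
set obs 100                        // simulated sample standing in for real data
generate hh_id = _n                // hypothetical unique household ID

isid hh_id                         // stop with an error if the ID is not unique
sort hh_id                         // sort on the unique ID for a stable starting order
set seed 846215                    // arbitrary seed chosen for illustration

generate double rand = runiform()  // one random draw per observation
sort rand hh_id                    // random order; the ID breaks any ties
generate byte treatment = (_n <= _N/2)  // first half of the random order is treated

tabulate treatment                 // check the resulting group sizes
```

Running the same do-file again on the same data should reproduce the same assignment, which is what makes the result checkable before it is preloaded into the survey software.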

Randomization in Stata

During surveys, we often need to randomize various aspects of the questionnaire. For example, sometimes we need to randomize which household members to interview, and sometimes which set of questions to ask. While most CAPI software includes a random number generator, using it is not the preferred option. Randomizing in Stata, for example, and then preloading the generated data file into the survey software is in almost all cases the better of the two options. The main advantages of randomizing in Stata rather than in CAPI software are as follows:

  • Randomization in Stata is transparent and reproducible, which is important for publishing research.
  • Randomization results in Stata can be made dependent on one another, which guarantees that no disproportionately large share of observations falls into any one group. Randomization in SurveyCTO is always independent, which means that some groups could end up with no observations assigned to them if the number of observations per group is low.
  • Randomization in Stata provides the option of ensuring that the result of the randomization is balanced over other variables, i.e. strata. This means that we can guarantee that, for example, not all female respondents end up in a certain group.
  • Randomization in Stata is done before the survey takes place. This provides an opportunity to double-check the result of the randomization and to fix bugs and typos in the randomization code before it is used in the field, when it would be too late to fix them.
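
The dependent, stratified assignment described in the list above could be sketched as follows. This is only one possible approach, run as a do-file on simulated data; the variable names (hh_id, female, rand, treatment), the stratum sizes, the version number, and the seed are hypothetical.

```stata
* Minimal sketch: dependent assignment within strata (all names are hypothetical)
version 13.1                            // fix the version for a stable random-number generator
clear
set obs 120                             // simulated sample standing in for real data
generate hh_id = _n                     // hypothetical unique ID
generate byte female = mod(_n, 3) == 0  // hypothetical stratum: 40 female, 80 male

isid hh_id                              // confirm the ID is unique
sort hh_id                              // stable starting order
set seed 392817                         // arbitrary seed chosen for illustration

generate double rand = runiform()       // one random draw per observation
* Within each stratum, order by the random draw and treat the first half.
bysort female (rand hh_id): generate byte treatment = (_n <= _N/2)

tabulate female treatment               // each stratum is split evenly by construction
```

Because treatment is assigned by rank within each stratum rather than by a separate draw for each observation, the group sizes are fixed in advance. An independent rule such as treatment = (rand < 0.5) would not give that guarantee.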