Difference between revisions of "Randomization in Stata"

Revision as of 16:07, 24 January 2017

Steps for effectively using Stata to randomize a survey questionnaire

Here are a few steps that should be followed to create a reproducible randomization using Stata:

  • Use a dataset that has a unique ID (respondent ID, household number, etc.).
  • While writing a do-file, pay close attention to the following things:
    • Set the version. Setting Stata's version in a do-file ensures that the same randomization algorithm is used every time, as the algorithm sometimes changes between Stata versions.
      For example - version 12.0
    • Set the seed. This makes sure that the same random number is generated for the first observation, for the second observation, and so on, every time the code is run.
      For example - set seed 12345
    • Sort the data properly. The data should be sorted so that observations are in the same order every time the code is run. Ideally, sort on an ID variable that uniquely and fully identifies each observation.
  • Convert the random numbers into categorical variables or dummy variables. This lets you check whether the resulting groups are balanced.
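
The steps above can be sketched as a short do-file. This is only a minimal illustration, not a definitive implementation: it assumes Stata's built-in auto dataset as a stand-in for the survey sample, and the variable names hh_id, rand_num and treatment are hypothetical.

```stata
* Minimal sketch of a reproducible randomization (illustrative assumptions only)
version 12.0                 // fix the version so the random-number algorithm stays the same
clear all
sysuse auto, clear           // example data shipped with Stata, standing in for the sample

* Hypothetical unique ID; a real sample should already contain one
generate hh_id = _n
isid hh_id                   // stops with an error if hh_id is not a unique identifier

set seed 12345               // same seed, same sequence of random numbers
sort hh_id                   // same order every run, so each ID gets the same draw

generate double rand_num = runiform()

* Convert the random numbers into a dummy: the half of the sample with the
* lowest draws gets treatment = 1 (hh_id breaks any ties deterministically)
sort rand_num hh_id
generate byte treatment = (_n <= _N/2)

sort hh_id
tabulate treatment           // check that the two groups are balanced in size
```

Running the do-file twice should give every hh_id the same value of treatment both times; if it does not, the version, seed or sort order is not being fixed.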

The end goal is a CSV file containing the ID variable used for randomization and the categorical variables created from the random numbers. This file is preloaded into SurveyCTO so that, once an enumerator enters the respondent ID at the start of the questionnaire, the result of the randomization is loaded into the form and can be used in various sections of the survey.
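
A minimal sketch of writing such a CSV file is shown below. It assumes a toy sample of six observations, and the names hh_id, treatment and randomization_preload.csv are hypothetical; how the file is then attached to a SurveyCTO form is not covered here. The sketch uses outsheet because it is available in Stata 12; in Stata 13 and later, export delimited does the same job.

```stata
* Sketch: keep only the ID and the assignment, then write them to a CSV file
version 12.0
clear
set obs 6                    // toy sample, for illustration only
generate hh_id = _n          // hypothetical unique ID

set seed 12345
sort hh_id
generate double rand_num = runiform()

sort rand_num hh_id
generate byte treatment = (_n <= _N/2)

* Export only what the survey form needs, in a stable order
sort hh_id
keep hh_id treatment
outsheet using "randomization_preload.csv", comma replace   // hypothetical file name
```

The resulting file has one header row with the variable names followed by one row per ID.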