Randomization in Excel

[[Randomization | Randomization]] involves randomly assigning subjects to one of two groups: the '''treatment group''', which receives the policy intervention being evaluated, and the '''control group''', which remains in the status quo (untreated). Randomization is critical in [[Randomized Evaluations: Principles of Study Design | field experiments]] that determine the impact of policy interventions. This article discusses the advantages and disadvantages of '''randomizing in Excel''' and gives a step-by-step guide to doing it.
== Read First ==
* [[Randomized Evaluations: Principles of Study Design | Randomized evaluations]] are field experiments that use [[Randomization | randomization]] to determine the effectiveness of an intervention.
* If [[Randomization in Stata | randomization using Stata]] is feasible, it should always be the preferred option, because [[Randomization in Stata | randomization in Stata]] is more easily [[Reproducible Research | reproducible]].
* [[Randomization | Randomization]] can also be done using [[SurveyCTO Programming | SurveyCTO]]; however, [[Stata Coding Practices | Stata]] is preferred over SurveyCTO.


== Overview ==


[[Randomization in Stata | Stata]] is preferred over Excel for the following reasons:
* '''Easy documentation.''' [[Randomization | Randomization]] done in [[Randomization in Stata | Stata]] can be documented step by step in do-files, which makes it easier to [[Reproducible Research#Replication and Reproducibility |reproduce]] the results.
* '''Version control.''' [[Stata Coding Practices | Stata]] lets you fix the version of Stata used for randomization (with the <code>version</code> command). This is useful when different researchers use different versions of [[Stata Coding Practices | Stata]].
* '''Consistent results.''' Because of this documentation and version control, [[Randomization | randomization]] in Stata gives the same results across runs, provided the random seed is set.
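The consistency point does not depend on Stata itself: any random-number generator that accepts a seed returns the same draws on every run. The minimal Python sketch below (not Stata code; the function <code>assign_half</code>, the household IDs and the seed value 12345 are arbitrary choices for illustration) shows that two runs with the same seed produce identical assignments.

```python
import random


def assign_half(ids, seed):
    """Sort units by a seeded random draw; the lower half goes to treatment."""
    rng = random.Random(seed)  # private generator, fixed by the seed
    draws = {unit: rng.random() for unit in ids}  # one draw per unit, in list order
    ordered = sorted(ids, key=lambda unit: draws[unit])
    cutoff = len(ordered) // 2
    return {unit: ("treatment" if rank < cutoff else "control")
            for rank, unit in enumerate(ordered)}


ids = [f"hh{n:02d}" for n in range(1, 11)]  # ten hypothetical household IDs
run_1 = assign_half(ids, seed=12345)
run_2 = assign_half(ids, seed=12345)
print(run_1 == run_2)  # True: same seed, same assignment
```

Changing or omitting the seed will generally give a different assignment, which is the situation an unseeded spreadsheet recalculation is in.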


[[Stata Coding Practices | Stata]] might not be available in some cases. For those cases, the advantages and disadvantages of '''randomizing using Excel''' are as follows:


'''Advantages:'''
Here are some of the advantages of randomizing in Excel:
* '''Stratification.''' [[Balance tests | Balance]] can be achieved during randomization in Excel by stratifying, that is, by sorting on the strata before sorting on the random values.
* '''Simplicity.''' Randomization using Excel is simple to implement and produces a record.
* '''Popularity.''' Since Excel is widely used, project staff commonly understand and use it.
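Stratification in a sort-based procedure comes from the sort order: once observations are sorted by stratum and then by random number, an alternating rule such as <code>MOD</code> on the serial number splits every stratum almost evenly. The Python sketch below mimics those spreadsheet steps under stated assumptions; the function name <code>stratified_assign</code>, the two strata ("north", "south"), the unit IDs and the seed are all hypothetical.

```python
import random
from collections import Counter


def stratified_assign(units, seed):
    """units: list of (unit_id, stratum) pairs. Returns {unit_id: group}."""
    rng = random.Random(seed)
    # One fixed random number per unit (the counterpart of pasting values).
    rows = [(uid, stratum, rng.random()) for uid, stratum in units]
    # Sort by stratum first, then by the random number.
    rows.sort(key=lambda row: (row[1], row[2]))
    # Serial number 1..N in sorted order; odd serials -> treatment (MOD(serial, 2) = 1).
    return {uid: ("treatment" if serial % 2 == 1 else "control")
            for serial, (uid, _stratum, _draw) in enumerate(rows, start=1)}


# Hypothetical data: 7 units in "north", 8 units in "south".
units = [(f"u{n:02d}", "north" if n <= 7 else "south") for n in range(1, 16)]
groups = stratified_assign(units, seed=12345)
counts = Counter((stratum, groups[uid]) for uid, stratum in units)
print(sorted(counts.items()))
# prints: [(('north', 'control'), 3), (('north', 'treatment'), 4),
#          (('south', 'control'), 4), (('south', 'treatment'), 4)]
```

Within each stratum the treatment and control counts differ by at most one; which stratum receives the extra treated unit depends on where that stratum starts in the overall serial order, not on the random draws.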


'''Disadvantages:'''
Some of the disadvantages of randomizing in Excel are as follows:
* '''Transparency.''' Randomization in Excel is less transparent to beneficiaries than public randomization, such as drawing names from a hat.
* '''Replicability.''' Randomization in Excel is less [[Reproducible Research#Replication and Reproducibility | replicable]] than [[Randomization in Stata|randomization in Stata]]: the worksheet <code>RAND()</code> function cannot be given a seed, so the same draw cannot be regenerated later.
* '''Errors.''' Since the randomization involves copying and pasting, it is prone to human error.
* '''Flexibility.''' Excel is less flexible when the randomization plan changes, since the manual steps have to be repeated.


==Steps for Randomization in Excel ==  
Here are the steps for '''randomization using Excel''':


# '''Randomization rule.''' Define the rule for assigning groups before generating any random numbers. For example, after sorting, the observations with the lowest 50% of random numbers are assigned to treatment and the rest to control.
# '''Random numbers.''' Assign a random number to each observation with <code>=RAND()</code>. Because <code>RAND()</code> generates new numbers every time the sheet recalculates, copy the column and use "Paste Values" to fix the numbers.
# '''Sorting.''' Sort the observations (entire rows, not just the random-number column) by random number, from lowest to highest.
# '''Order.''' Create an ordered serial number (1, 2, 3, and so on). If you need to balance the data, first sort by the strata, then by the random values.
# '''Assignment.''' Assign groups using either the <code>MOD</code> or the <code>IF</code> formula. For example, assuming the serial number is in column B, <code>=IF(MOD(B2,2)=1,"Treatment","Control")</code> alternates treatment and control down the sorted list. <!-- Add stratification and balanced part here-->
# '''Finish.''' Save the file as the record of the random numbers and the resulting assignment.
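The steps above can be mirrored in code, which also makes the procedure easier to rerun. The Python sketch below follows the "lowest 50% go to treatment" rule from step 1 and the <code>IF</code> variant of step 5; it illustrates the logic under assumptions and is not part of the Excel workflow itself. The function name <code>randomize_lowest_half</code>, the seed, the ten IDs and the output file name <code>randomization_record.csv</code> are all hypothetical.

```python
import csv
import random


def randomize_lowest_half(ids, seed, record_path):
    """Assign the lowest 50% of random draws to treatment and save a record."""
    rng = random.Random(seed)
    # Step 1 (rule): after sorting, serials 1..floor(N/2) are treatment, the rest control.
    n_treat = len(ids) // 2
    # Step 2: one random number per observation, stored once (like "Paste Values").
    table = [{"id": unit, "rand": rng.random()} for unit in ids]
    # Step 3: sort whole rows by the random number, lowest to highest.
    table.sort(key=lambda row: row["rand"])
    for serial, row in enumerate(table, start=1):
        row["serial"] = serial  # Step 4: ordered serial number.
        # Step 5: IF-style rule, like =IF(serial<=n_treat,"Treatment","Control").
        row["group"] = "Treatment" if serial <= n_treat else "Control"
    # Step 6: save the record (random numbers, serials and groups) to a CSV file.
    with open(record_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "rand", "serial", "group"])
        writer.writeheader()
        writer.writerows(table)
    return table


table = randomize_lowest_half([f"hh{n:02d}" for n in range(1, 11)],
                              seed=12345, record_path="randomization_record.csv")
for row in table:
    print(row["serial"], row["id"], row["group"])
```

With an odd number of observations, <code>len(ids) // 2</code> puts the middle observation in control; which group that unit joins is a choice the randomization rule in step 1 should state in advance.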


== Related Pages ==
This article is part of the topic [[Randomized Control Trials]]




[[Category: Impact Evaluation Design ]]

Revision as of 18:02, 12 July 2022
