Randomization in Excel
Latest revision as of 19:21, 8 August 2023
Randomization involves assigning subjects randomly to one of two groups: the treatment group, which receives the policy intervention being evaluated, and the control group, which is untreated. Randomizing in Excel has advantages and disadvantages. This article gives a step-by-step guide to randomizing in Excel.
Read First
- Randomized Evaluations are field experiments that use randomization to determine the effectiveness of an intervention.
- If randomization using Stata is feasible, then it should always be the preferred option as it is more easily reproducible.
- Randomization can also be done using SurveyCTO. However, Stata is the preferred option over SurveyCTO.
- Sampling is the process of randomly selecting units from a population of interest to represent the characteristics of that population. It is crucial to conducting randomized experiments that evaluate the effectiveness of an intervention.
Overview
Stata is preferred over Excel for the following reasons:
- Easy documentation: every step of the randomization can be documented in files, which makes it easier to reproduce the results.
- Stata gives us the option of setting which version of Stata we use for randomization. This is useful when different researchers use different versions of Stata.
- Better documentation: the randomization results in Stata remain consistent across runs.
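Consistency across runs comes from fixing the state of the random-number generator before any numbers are drawn. The sketch below illustrates the idea in Python rather than Stata; the function name `draw` and the seed value are hypothetical and chosen only for this example.

```python
import random


def draw(n, seed=None):
    """Return n random numbers from a generator started at the given seed."""
    rng = random.Random(seed)  # seed=None starts from an unpredictable state
    return [rng.random() for _ in range(n)]


# Hypothetical seed value: any fixed integer works, as long as it is recorded.
first_run = draw(5, seed=650321)
second_run = draw(5, seed=650321)

# The same seed reproduces the same numbers, so the assignment can be re-created.
print(first_run == second_run)
```

A draw without a recorded seed cannot be regenerated later, which is why the seed belongs in the documentation of the randomization.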
Stata might not be available in some cases. For those cases, the advantages of randomizing using Excel are as follows:
- Balance / stratification can be achieved during randomization using Excel.
- Simplicity: simple to implement and produces a record.
- Popularity: since Excel is widely used, it is commonly understood and used by project staff.
Some of the disadvantages of using Excel to randomize are as follows:
- Transparency: Excel is more mysterious to beneficiaries than public randomization (for example, drawing names from a hat).
- Replicability: randomization in Excel is less replicable than randomization in Stata.
- Errors: since the randomization involves copying and pasting, it is prone to human error.
- Flexibility: it is less flexible to changes in the randomization plan.
Steps for Randomization in Excel
The steps for randomizing in Excel are as follows:
- Randomization Rule: decide on the assignment rule first. For example, the lowest 50% of the random numbers will be treatment and the rest will be control.
- =rand(): assign a random number to each observation with the formula =rand(). Then copy the numbers and use "paste values" so that Excel stops recalculating the randomization.
- Sorting: sort the observations by their random numbers, from lowest to highest.
- Order: create an ordered serial number (1, 2, 3, and so on) for the sorted observations. If you need to balance the data, first sort by the strata, then by the random values.
- Assignment: assign groups using either the mod or the if formulas.
- Finish. Save the record.
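The logic of these steps can be checked outside the spreadsheet. The following is a minimal sketch in Python, not the Excel procedure itself: the function name `randomize`, the `rule` argument, the seed value, and the household and region names are all assumptions made for illustration. It draws one fixed random number per observation, sorts by stratum and then by random number, builds a serial number within each stratum, and assigns groups with either a mod-style or an if-style rule.

```python
import random
from collections import Counter


def randomize(ids, strata=None, rule="mod", seed=12345):
    """Assign each id to 'treatment' or 'control', mirroring the steps above."""
    rng = random.Random(seed)  # hypothetical seed, recorded so the draw can be repeated
    strata = strata or {}
    # Draw one random number per observation and keep it fixed
    # (the analogue of =rand() followed by "paste values").
    rows = [(strata.get(unit, "all"), rng.random(), unit) for unit in ids]
    # Sort by stratum first, then by the random number.
    rows.sort(key=lambda row: (str(row[0]), row[1]))
    sizes = Counter(stratum for stratum, _, _ in rows)

    assignment = {}
    serial = 0
    previous = None
    for stratum, _, unit in rows:
        # Ordered serial number, restarting at 1 in each stratum.
        serial = serial + 1 if stratum == previous else 1
        previous = stratum
        if rule == "mod":
            treated = serial % 2 == 1  # alternate down the sorted list
        else:
            treated = serial <= sizes[stratum] / 2  # lowest 50% are treated
        assignment[unit] = "treatment" if treated else "control"
    return assignment


# Hypothetical example: eight households in two regions.
households = [f"hh{n:02d}" for n in range(1, 9)]
region = {h: ("north" if n < 4 else "south") for n, h in enumerate(households)}
result = randomize(households, strata=region)
for household in households:
    print(household, region[household], result[household])
```

The two rules differ only when a stratum has an odd number of observations: the mod-style rule gives the extra observation to treatment, while the if-style rule gives it to control.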