# Difference between revisions of "Randomization in Excel"


## Latest revision as of 00:03, 19 July 2022

Randomization involves assigning subjects randomly to one of two groups: the **treatment group**, which receives the policy intervention being evaluated, and the **control group**, which remains untreated (the status quo). **Randomizing in Excel** has advantages and disadvantages. This article gives a step-by-step guide to randomizing using Excel.

## Read First

- Randomized Evaluations are field experiments that use randomization to determine the effectiveness of an intervention.
- If randomization using Stata is feasible, then it should always be the preferred option as randomization in Stata is more easily reproducible.
- Randomization can also be done using SurveyCTO; however, Stata is preferred over **SurveyCTO**.
- Sampling is the process of randomly selecting units from a population of interest to represent the characteristics of that population. It is crucial to conducting randomized experiments when deciding the effectiveness of an intervention.

## Overview

Stata is preferred over Excel for the following reasons:

- **Easy documentation.** Randomization done in Stata can be better documented through files. Every step can be documented, which makes it easier to reproduce the results.
- Stata gives us the option of setting which version of Stata we use for randomization. This is useful when different researchers use different versions of Stata.
- **Better documentation and version control.** Documentation of randomization results in Stata remains consistent across various runs.
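The point about results staying consistent across runs can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not Stata code; the function name `draw` and the seed value are hypothetical. It assumes that reproducibility comes from fixing the seed of the random number generator, so that rerunning the same script yields the same random numbers.

```python
import random


def draw(n, seed):
    """Return n random numbers from a generator started at a fixed seed."""
    rng = random.Random(seed)  # hypothetical seed; any recorded integer works
    return [rng.random() for _ in range(n)]


first_run = draw(5, seed=20220719)
second_run = draw(5, seed=20220719)

# With the same seed, both runs produce identical numbers.
print(first_run == second_run)
```

A spreadsheet formula that recalculates on every change offers no equivalent of the recorded seed, which is why the Excel steps rely on pasting values and saving the record.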

Stata might not be available in some cases. For those cases, the advantages and disadvantages of **randomizing using Excel** are as follows:

**Advantages:**
Here are some of the advantages of randomizing using Excel:

- Balance/stratification can be achieved during randomization using Excel.
- **Simplicity.** Randomization using Excel is simple to implement and produces a record.
- **Popularity.** Since Excel is widely used, it is commonly understood and used by project staff.

**Disadvantages:**
Some of the disadvantages of using Excel to randomize are as follows:

- **Transparency.** Randomization in Excel is more mysterious to beneficiaries than public randomization (for example, drawing names from a hat).
- **Replicability.** Randomization in Excel is less replicable than randomization in Stata.
- **Errors.** Since the randomization involves copying and pasting, it is subject to human error.
- **Flexibility.** It is also less flexible to changes in the randomization plan.

## Steps for Randomization in Excel

Here are the steps for a successful **randomization using Excel**:

1. **Randomization Rule.** Define the rule for assignment. For example, the lowest 50% will be assigned to treatment and the rest to control.
2. **=rand().** Assign a random number to each observation using `=rand()`. Then use "paste values" on the random numbers so that they stop recalculating.
3. **Sorting.** Sort the random numbers from the lowest to the highest.
4. **Order.** Create an ordered serial number. If you need to balance the data, first sort by the strata, then by the random values.
5. **Assignment.** Assign groups using either the `mod` or the `if` formula.
6. **Finish.** Save the record.
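The logic of the steps above can be sketched outside a spreadsheet. The following Python sketch is an illustration only, not the Excel procedure itself. The function name `randomize`, the seed value, and the example strata are hypothetical. It also assumes one particular `mod` rule (odd serial numbers go to treatment, even ones to control); the article does not prescribe a specific rule.

```python
import random


def randomize(units, strata=None, seed=12345):
    """Mimic the spreadsheet steps: random number, sort, serial number, mod rule."""
    rng = random.Random(seed)  # fixed seed stands in for "paste values"

    # Step 2: one random number per observation (the =rand() column).
    rows = [
        {"unit": u, "stratum": strata[i] if strata else 0, "rand": rng.random()}
        for i, u in enumerate(units)
    ]

    # Steps 3 and 4: sort by stratum first, then by the random number,
    # and create an ordered serial number over the sorted rows.
    rows.sort(key=lambda r: (r["stratum"], r["rand"]))
    for serial, row in enumerate(rows, start=1):
        row["serial"] = serial
        # Step 5 (assumed rule): odd serial -> treatment, even -> control.
        row["group"] = "treatment" if serial % 2 == 1 else "control"

    return rows


# Hypothetical example: ten units in two strata.
example_units = list(range(1, 11))
example_strata = ["north"] * 4 + ["south"] * 6
for row in randomize(example_units, example_strata):
    print(row["serial"], row["stratum"], row["unit"], row["group"])
```

Because rows in the same stratum are adjacent after sorting and the groups alternate down the list, the treatment and control counts within each stratum differ by at most one.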