Difference between revisions of "Variable Construction"

Jump to: navigation, search
(Created page with "=Read First= ==Workflow== ==Common tasks== ===Dealing with outliers=== ==Documentation==")
 
Line 4: Line 4:
==Common tasks==
==Common tasks==
===Dealing with outliers===
===Dealing with outliers===
While there are many rules of thumb for how to define an outlier, there is no silver bullet. Some consider an outlier to be any data point that is three standard deviations away from the mean of the same data point for all observations. This may be a starting point, but one needs to qualitatively consider if this is a correct approach. Approaches to outliers include, but are not limited to:
#Replacing the outlier values with a missing value.
#Winsorization, or replacing any values bigger than a certain percentile, often the 99th, with the value at that percentile. This prevents very large values from biasing the mean. It also maintains an equality of impact aspect. For example, if all project benefits go to a single observation in the treatment group, then the mean would still be high, but that is rarely a desired outcome in development. Winsorization thus penalizes inequitable distribution of the benefits of a project.
==Documentation==
==Documentation==

Revision as of 15:05, 28 January 2021

Read First

Workflow

Common tasks

Dealing with outliers

While there are many rules of thumb for how to define an outlier, there is no silver bullet. Some consider an outlier to be any data point that is three standard deviations away from the mean of the same data point for all observations. This may be a starting point, but one needs to qualitatively consider if this is a correct approach. Approaches to outliers include, but are not limited to:

  1. Replacing the outlier values with a missing value.
  2. Winsorization, or replacing any values bigger than a certain percentile, often the 99th, with the value at that percentile. This prevents very large values from biasing the mean. It also maintains an equality of impact aspect. For example, if all project benefits go to a single observation in the treatment group, then the mean would still be high, but that is rarely a desired outcome in development. Winsorization thus penalizes inequitable distribution of the benefits of a project.

Documentation