Data analysis is the process of exploring and describing trends and results from data. Data analysis typically occurs in two stages: exploratory analysis and final analysis. This page provides guidance on how to organize analysis files and output results in an orderly and reproducible manner.
Data Analysis
After analyzing data and before disseminating results, research teams must export analyses.
Due to the long life span of a typical impact evaluation, multiple generations of team members often contribute to the same data work. Clear methods for organizing the data folder, structuring the data sets in the folder, and identifying the observations in the data sets are critical.
Principal component analysis (PCA) is a way to create an index from a group of variables that provide similar information. It lets us keep as much of that information as possible without including variables that would cause multicollinearity and without having to choose one variable among many.
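As a rough illustration of the idea, the sketch below builds an index from the first principal component of a few standardized variables, using power iteration on their correlation matrix. The function name, the asset-style data, and the choice of power iteration are all assumptions made for this example; in practice a statistical package would normally be used. The sketch also assumes the variables are positively correlated with one another and that none of them is constant.

```python
from statistics import mean, pstdev

def first_component_index(rows, iterations=200):
    """Return (weights, scores) for the first principal component.

    rows: list of equal-length tuples, one tuple per observation.
    Hypothetical helper: assumes positively correlated, non-constant variables.
    """
    n_vars = len(rows[0])
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    # Standardize each variable so that no single scale dominates the index.
    z = [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]
    n = len(z)
    # Correlation matrix of the standardized variables.
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(n_vars)]
            for i in range(n_vars)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    w = [1.0] * n_vars
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(n_vars)) for i in range(n_vars)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    # The index is the weighted sum of the standardized variables.
    scores = [sum(wi * zi for wi, zi in zip(w, r)) for r in z]
    return w, scores

# Made-up data: three related asset counts for five households.
households = [(1, 2, 1), (2, 3, 2), (3, 5, 2), (4, 6, 4), (5, 8, 5)]
weights, index = first_component_index(households)
```

Here `index` holds one value per household, and `weights` shows how much each variable contributes to it.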
The synthetic control method is a statistical method to evaluate treatment effects in comparative case studies. It creates a synthetic version of treated units by weighting variables and observations in the control group.
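The difference-in-differences method is a quasi-experimental approach that compares the changes in outcomes over time between a population enrolled in a program (the treatment group) and a population that is not (the comparison group). The difference between the two groups' changes is taken as the estimate of the program's effect, on the assumption that both groups would otherwise have followed parallel trends.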
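The comparison of changes can be written out for the simplest case of two groups and two periods. The following is a minimal sketch with made-up numbers; the function name and the row layout are assumptions for illustration, and a real analysis would usually estimate this in a regression with standard errors.

```python
from statistics import mean

def did_estimate(rows):
    """Two-group, two-period difference-in-differences.

    rows: iterable of (treated: bool, post: bool, outcome: float).
    """
    cells = {}
    for treated, post, outcome in rows:
        cells.setdefault((treated, post), []).append(outcome)
    # Change over time within each group.
    change_treated = mean(cells[(True, True)]) - mean(cells[(True, False)])
    change_comparison = mean(cells[(False, True)]) - mean(cells[(False, False)])
    # The estimate is the difference between those two changes.
    return change_treated - change_comparison

# Made-up outcomes: (treated, post, outcome)
rows = [
    (True, False, 10.0), (True, False, 12.0),
    (True, True, 18.0), (True, True, 20.0),
    (False, False, 9.0), (False, False, 11.0),
    (False, True, 12.0), (False, True, 14.0),
]
estimate = did_estimate(rows)
print(estimate)  # 5.0
```

The treatment group's mean rises by 8 and the comparison group's by 3, so the estimate is 5.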
Matching is a quasi-experimental method in which the researcher uses statistical techniques to construct an artificial control group by matching each treated unit with a non-treated unit of similar characteristics.
Data visualization is a method of expressing descriptive statistics and analytical results through visual representations of the data (e.g., charts and graphs). It can also help the research team understand the data better during exploratory analysis.
An ID variable is a variable that identifies each entity in a dataset (person, household, etc.) with a distinct value. This article lists five properties of ID variables that researchers should keep in mind when creating, collecting, and merging data.
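A quick check that an ID variable really does give each entity a distinct value might look like the sketch below. It covers only two checks, missing IDs and duplicated IDs, and is not a rendering of the article's five properties; the function name, the `hh_id` field, and the records are hypothetical.

```python
from collections import Counter

def check_id(records, id_key):
    """Report missing and duplicated values of a candidate ID variable.

    records: list of dicts, one per observation.
    """
    ids = [r.get(id_key) for r in records]
    # An ID that is absent or empty cannot identify anything.
    missing = sum(1 for v in ids if v is None or v == "")
    # Any value that appears more than once fails to identify a single entity.
    counts = Counter(v for v in ids if v is not None and v != "")
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    return {"missing": missing, "duplicates": duplicates}

# Hypothetical household records
records = [
    {"hh_id": "H001", "size": 4},
    {"hh_id": "H002", "size": 3},
    {"hh_id": "H002", "size": 5},
    {"hh_id": None, "size": 2},
]
report = check_id(records, "hh_id")
print(report)  # {'missing': 1, 'duplicates': ['H002']}
```

Running a check like this before merging helps catch observations that would otherwise be matched to the wrong entity or dropped.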
Propensity score matching (PSM) is a quasi-experimental method in which the researcher uses statistical techniques to construct an artificial control group by matching each treated unit with a non-treated unit of similar characteristics. Using these matches, the researcher can estimate the impact of an intervention.
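To make the matching step concrete, the sketch below pairs each treated unit with the non-treated unit that has the closest propensity score and averages the outcome differences. It assumes the scores have already been estimated elsewhere (for example with a logit model), matches with replacement, and uses made-up numbers; the function name and data layout are illustrative only.

```python
from statistics import mean

def att_by_score_matching(treated, controls):
    """Nearest-neighbor matching on a precomputed propensity score.

    treated, controls: lists of (propensity_score, outcome).
    Returns the mean outcome difference across matched pairs.
    """
    diffs = []
    for score, outcome in treated:
        # With replacement: the same control unit may be matched more than once.
        _, matched_outcome = min(controls, key=lambda c: abs(c[0] - score))
        diffs.append(outcome - matched_outcome)
    return mean(diffs)

# Made-up (score, outcome) pairs
treated = [(0.80, 15.0), (0.60, 12.0)]
controls = [(0.75, 11.0), (0.55, 10.0), (0.20, 7.0)]
att = att_by_score_matching(treated, controls)
print(att)  # 3.0
```

The first treated unit is matched to the control with score 0.75 and the second to the control with score 0.55, giving differences of 4 and 2.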
