Difference between revisions of "Data Analysis"





add introductory 1-2 sentences here




== Read First ==
* include here key points you want to make sure all readers understand


== Preparing the Data Set for Analysis ==


During data cleaning we replace values that would otherwise bias a variable. The first part of data analysis is then to edit the variables so that they fit the statistical models we are using.


1. [[Standardization]] - Convert all values in each variable into the same unit. If values in one variable are recorded in different units, errors follow: 1000 gram will be interpreted as one thousand times larger than 1 kg.
2. [[Aggregation]] - We often collect variables disaggregated over categories (income collected as different income categories) or disaggregated over instances (harvest value over multiple crops). Disaggregated data collection is used to improve the quality of the data collected, but in the analysis we are often interested in the aggregated value.
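
The two steps above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the field names (`household`, `crop`, `weight`, `unit`, `value`), the conversion table and the sample records are all hypothetical and do not come from any particular data set.

```python
# Hypothetical conversion factors to a common unit (kilograms).
TO_KG = {"kg": 1.0, "gram": 0.001}

def standardize_weight(amount, unit):
    """Convert a weight recorded in `unit` to kilograms."""
    return amount * TO_KG[unit]

def aggregate_harvest_value(records):
    """Sum harvest value over all crops for each household."""
    totals = {}
    for rec in records:
        totals[rec["household"]] = totals.get(rec["household"], 0.0) + rec["value"]
    return totals

# Hypothetical sample data: one row per household and crop.
harvest = [
    {"household": "A", "crop": "maize", "weight": 1000, "unit": "gram", "value": 40.0},
    {"household": "A", "crop": "beans", "weight": 1, "unit": "kg", "value": 25.0},
    {"household": "B", "crop": "maize", "weight": 2, "unit": "kg", "value": 80.0},
]

# Standardization: every weight ends up in the same unit.
for row in harvest:
    row["weight_kg"] = standardize_weight(row["weight"], row["unit"])

# Aggregation: one total harvest value per household.
household_value = aggregate_harvest_value(harvest)
```

After standardization, 1000 gram and 1 kg are stored as the same weight, and the analysis can use one harvest value per household instead of one per crop.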


== Outputting the Result of the Analysis ==
Just like the rest of your code, the output of results must be replicable. There are different degrees of replicability. At a minimum, every part of the results used in a table must be replicable.


Even better is when all parts of the same table are output to a single file. Tables sometimes consist of results from multiple estimations, and it is preferable that these are output to a single file. See the Stata command [[estout]].

Optimally, all tables are output in a way that requires no manual formatting. A very common tool for this is LaTeX. DIME has prepared material for getting started with LaTeX that assumes no prior knowledge of LaTeX and aims to explain the workflow from software such as Stata and R to final reports using LaTeX: [https://github.com/worldbank/DIME-LaTeX-Templates]
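
As a rough sketch of the idea of writing a complete table to one file with no manual formatting, the Python snippet below renders results from two estimations as a LaTeX `tabular`. It is not `estout` and not part of the DIME templates: the function `latex_table`, the file name `results.tex` and the coefficient values are hypothetical and used for illustration only.

```python
def latex_table(columns, rows):
    """Render estimation results as a LaTeX tabular.

    columns: list of column titles, one per estimation.
    rows: list of (label, values) pairs with one value per estimation.
    """
    lines = [
        "\\begin{tabular}{l" + "c" * len(columns) + "}",
        "\\hline",
        " & " + " & ".join(columns) + " \\\\",
        "\\hline",
    ]
    for label, values in rows:
        cells = " & ".join(f"{v:.3f}" for v in values)
        lines.append(f"{label} & {cells} \\\\")
    lines += ["\\hline", "\\end{tabular}"]
    return "\n".join(lines)

# Hypothetical coefficients from two estimations, combined in one table.
table = latex_table(
    ["(1)", "(2)"],
    [("Treatment", [0.125, 0.098]), ("Constant", [1.5, 1.42])],
)

# Write the whole table to a single file that the report can include.
with open("results.tex", "w") as f:
    f.write(table + "\n")
```

Because the file is regenerated every time the code runs, the table in the report always matches the latest estimation results.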
 
== Different Specific Types of Analysis ==
 
=== Principal Component Analysis ===
[[Principal Component Analysis (PCA)]] is an analytical tool that seeks to explain the maximum amount of variance with the fewest principal components.
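
To make "explaining variance with few components" concrete, the sketch below computes the share of total variance captured by the first principal component in the simplest case of two variables, using the closed-form eigenvalues of their 2x2 covariance matrix. The function name and the sample data are hypothetical; a real analysis would normally use the PCA routine of a statistical package and would often standardize the variables first.

```python
import math

def first_component_share(xs, ys):
    """Share of total variance explained by the first principal component
    of two variables, from the eigenvalues of their covariance matrix."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of the symmetric 2x2 matrix [[var_x, cov], [cov, var_y]].
    mean = (var_x + var_y) / 2
    spread = math.sqrt(((var_x - var_y) / 2) ** 2 + cov ** 2)
    largest, smallest = mean + spread, mean - spread
    return largest / (largest + smallest)

# Hypothetical data: two strongly correlated measures.
share = first_component_share([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.8])
```

When the two variables are strongly correlated, the share is close to 1, meaning a single component summarizes almost all of the variation. When they are uncorrelated with equal variance, the share falls to 0.5.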
 
=== Cost Effectiveness Analysis ===


One type of analysis is [[Cost-effectiveness Analysis]].


== Back to Parent ==
This article is part of the topic [[Data Analysis]].


== Additional Resources ==
 
* list here other articles related to this topic, with a brief description and link


[[Category: Data Analysis]]

Revision as of 14:05, 26 October 2017

