Difference between revisions of "Aggregation"
Revision as of 18:58, 26 October 2017
Aggregation is not a conceptually difficult topic, but unless the best practices described in this article are followed, issues like missing values can easily cause the aggregates to be incorrect.
- Make sure to use the Stata commands designed specifically for aggregation, such as egen rowtotal()
- Make sure to check whether missing values were created in your aggregates. Missing values can be correct, but be sure that you can explain all of them
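The second check can be sketched as follows. This is only an illustration on made-up data: the variable names (income1, income2, income3, total_income, n_components) and all values are hypothetical.

```stata
* Hypothetical data: three income categories, some values missing
clear
input income1 income2 income3
100 200 50
100   . 50
  .   .  .
end

* Aggregate; the missing option keeps the total missing
* when every component is missing
egen total_income = rowtotal(income1 income2 income3), missing

* Number of components that were actually reported in each row
egen n_components = rownonmiss(income1 income2 income3)

* How many aggregates are missing?
count if missing(total_income)

* Inspect rows where not all components were reported,
* so that each of them can be explained
list if n_components < 3
```

Keeping a count such as n_components next to the aggregate makes it easy to tell a total based on all components from one based on only some of them.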
To help respondents recall information, we often split questions into categories. A common example is income, where we often split the income question into different categories of income even if we are only interested in total income.
To properly aggregate categories in survey data, make sure to clean the categories for survey codes and then use commands that properly handle missing data. See egen rowtotal() below if you are using Stata.
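A minimal sketch of this two-step workflow, assuming that -99 means "don't know" and -88 means "refused". These codes, the variable names and the values are all hypothetical; use the codes from your own questionnaire.

```stata
* Hypothetical data: -99 = "don't know", -88 = "refused" (assumed codes)
clear
input inc_wage inc_farm inc_other
500 100  20
500 -99  20
-88 -99 -99
end

* Without cleaning, the survey codes are added up as if they were amounts
egen total_raw = rowtotal(inc_wage inc_farm inc_other)

* Recode survey codes to extended missing values,
* which keeps the reason for the missing value
recode inc_wage inc_farm inc_other (-99 = .d) (-88 = .r)

* Aggregate the cleaned categories; rows where every
* category is missing stay missing
egen total_clean = rowtotal(inc_wage inc_farm inc_other), missing

list
```

In the second row the uncleaned total is 421 instead of 520, because the -99 code was treated as a negative amount.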
It is common to ask the same question over a number of repeated instances, and most of the time we want to aggregate the amounts. The issues are similar to those described already, but missing values are even more common.
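When the repeated instances are stored as rows (long format) rather than as separate variables, one possible approach is the egen function total() combined with bysort. The sketch below uses made-up harvest data; hh_id, harvest_no and harvest_kg are hypothetical names.

```stata
* Hypothetical long-format data: one row per household and harvest
clear
input hh_id harvest_no harvest_kg
1 1 40
1 2 60
2 1 30
2 2  .
3 1  .
3 2  .
end

* Sum across instances within each household. Missing instances are
* ignored, and the missing option keeps the total missing when every
* instance for the household is missing
bysort hh_id: egen total_kg = total(harvest_kg), missing

* Number of instances that were actually reported per household
bysort hh_id: egen n_reported = count(harvest_kg)

list, sepby(hh_id)
```

If the instances are stored as separate variables (wide format), egen rowtotal() applies in the same way as for categories.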
Do not use regular addition with plus signs like var1 + var2, as this is likely to lead to a lot of values being incorrectly reported as missing values. Instead, in Stata one should use the egen function rowtotal():
egen total_income = rowtotal(income1 income2 income3)
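A minimal sketch of the difference between the two approaches, using made-up data (the values and the names total_plus and total_income_m are hypothetical):

```stata
* Hypothetical data: three income categories, some values missing
clear
input income1 income2 income3
100 200 50
100   . 50
  .   .  .
end

* Plain addition: the result is missing as soon as
* one component is missing
generate total_plus = income1 + income2 + income3

* rowtotal() treats missing components as 0
egen total_income = rowtotal(income1 income2 income3)

* With the missing option, a row where every component
* is missing stays missing instead of becoming 0
egen total_income_m = rowtotal(income1 income2 income3), missing

list
```

Note that without the missing option, rowtotal() returns 0 for a row where all components are missing. Whether 0 or missing is the right value for such rows depends on the survey, so it is a choice to make deliberately.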
Several factors risk biasing aggregates. They are easy to avoid once you know what to look for.
Missing values are common in survey data and cause errors when aggregating. In most programming languages, the expression var1 + var2 only returns a valid value if both variables have valid values; in Stata, the result is missing if either variable is missing. That is why we need to use a specialized command like egen rowtotal() when aggregating variables.
Make sure that your variables are standardized to the same unit (if your variables have units). See standardization for more details.
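As an illustration, suppose amounts were reported together with a time unit. The sketch below converts everything to yearly amounts before aggregating. The unit codes (1 = per week, 2 = per month), the conversion factors (52 and 12) and all variable names are assumptions made for this example.

```stata
* Hypothetical data: amounts with a unit code
* (assumed: 1 = per week, 2 = per month)
clear
input wage wage_unit remit remit_unit
100 1 200 2
 50 2   . 2
end

* Convert each variable to a yearly amount before aggregating
generate wage_year = wage * 52 if wage_unit == 1
replace  wage_year = wage * 12 if wage_unit == 2

generate remit_year = remit * 52 if remit_unit == 1
replace  remit_year = remit * 12 if remit_unit == 2

* Aggregate only the standardized variables
egen income_year = rowtotal(wage_year remit_year), missing

list
```

Adding wage and remit directly in the first row would mix a weekly and a monthly amount and give 300, a number with no meaningful unit.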
If adjustments, for example winsorization for outliers, have been applied to the disaggregated variables, then it is usually not a good idea to apply that same adjustment to the aggregate of those variables.
This article is part of the topic Data Analysis