Difference between revisions of "Aggregation"

Revision as of 18:58, 26 October 2017

Aggregation is not a conceptually difficult topic, but unless the best practices described in this article are followed, issues such as missing values can easily make the aggregates incorrect.

Read First

  • Make sure to use Stata commands designed for aggregation, such as egen rowtotal(), rather than regular addition
  • Make sure to check whether missing values were created in your aggregates. Some may be legitimate, but be sure that you can explain all of them


To help respondents recall information, we often split questions into categories. A common example is income, where we often split the income question into different categories of income even if we are only interested in total income.

To properly aggregate categories in survey data, first clean the categories of survey codes and then use commands that properly handle missing data. See egen rowtotal() below if you are using Stata.
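The cleaning step can be sketched as follows. This is only an illustration: the survey codes -99 ("don't know") and -88 ("refused") and the toy data are assumptions, so substitute the codes your questionnaire actually uses.

```stata
* Toy data with hypothetical survey codes -99 and -88
clear
input income1 income2 income3
100 -99 20
50  30  -88
end

* Recode survey codes to extended missing values before aggregating
foreach var of varlist income1 income2 income3 {
    * -99 is assumed to mean "don't know"
    replace `var' = .d if `var' == -99
    * -88 is assumed to mean "refused"
    replace `var' = .r if `var' == -88
}

* Aggregate only after the codes have been cleaned
egen total_income = rowtotal(income1 income2 income3)
list
```

Without the cleaning step, the codes would be added as if they were real amounts, and the first row would total 21 instead of 120.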

Repeat groups

We commonly ask the same question over a number of repeated instances, and in most of those cases we want to aggregate the amounts. The issues are similar to those already described, but missing values are even more common.
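A minimal sketch of aggregating over a repeat group in wide format is shown below. The variable names (hhid, plot_income_1 to plot_income_3) and the data are hypothetical.

```stata
* Toy repeat group: up to three plots per household
clear
input hhid plot_income_1 plot_income_2 plot_income_3
1 200 150 .
2 300 .   .
3 .   .   .
end

* Sum across all instances; with the missing option the total is
* missing only when every instance is missing
egen total_plot_income = rowtotal(plot_income_*), missing

* Count how many instances were actually reported
egen n_plots_reported = rownonmiss(plot_income_*)
list
```

Keeping a count of reported instances next to the total makes it easier to tell a true zero from a household with no answers at all.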

Stata commands

Do not use regular addition with plus signs, such as var1 + var2, because the result is missing whenever any one of the variables is missing, so many aggregates will be incorrectly reported as missing. Instead, in Stata, use the egen function rowtotal(), which treats missing values as zero.

For example:

egen total_income = rowtotal(income1 income2 income3)
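To see the difference between the two approaches, the following sketch compares them on a small made-up dataset. The values are invented for illustration only.

```stata
* Toy data: a complete row, a partly missing row, and a fully missing row
clear
input income1 income2 income3
100 50 20
100 .  20
.   .  .
end

* Regular addition: missing as soon as any component is missing
generate sum_plus = income1 + income2 + income3

* rowtotal() treats missing components as zero
egen total_income = rowtotal(income1 income2 income3)

* With the missing option, the total is missing only when
* all components are missing
egen total_income_m = rowtotal(income1 income2 income3), missing

list
```

In the second row, sum_plus is missing while both totals are 120. In the third row, total_income is 0 and total_income_m is missing, so choose the variant that matches how an all-missing observation should be interpreted in your data.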


Risks

Several factors risk biasing aggregates. They are easily avoided if one knows what to look for.

Missing values

Missing values are common in survey data and cause errors when aggregating. In most programming languages, the expression var1 + var2 yields a valid result only if both variables have valid values. That is why we need to use a specialized command like egen rowtotal() when aggregating variables.
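One way to review how missing values affect an aggregate is sketched below, using the egen function rowmiss() on invented data. The variable names are the same illustrative ones used in this article.

```stata
* Toy data with some missing components
clear
input income1 income2 income3
100 50 20
100 .  20
.   .  .
end

egen total_income = rowtotal(income1 income2 income3), missing

* Number of missing components per observation
egen n_missing = rowmiss(income1 income2 income3)
tabulate n_missing

* Inspect the observations where at least one component is missing
list if n_missing > 0

* Count aggregates that ended up missing
count if missing(total_income)
```

Each observation that shows up in these checks should have an explanation, for example a question that was skipped by design.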


If your variables have units, make sure that they are all standardized to the same unit before aggregating. See standardization for more details.

Double corrections

If adjustments, for example winsorization for outliers, have been applied to the disaggregated variables, then it is usually not a good idea to apply that same adjustment again to the aggregate of those variables.
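As an illustration, the sketch below adjusts each component once and then aggregates without adjusting again. It uses simulated data and a simple one-sided top-coding at the 99th percentile as a stand-in for winsorization. The cutoff and the variable names are assumptions, not a recommendation.

```stata
* Simulated income components (illustrative only)
clear
set obs 200
set seed 12345
forvalues i = 1/3 {
    generate income`i' = exp(rnormal(5, 1))
}

* Top-code each component at its 99th percentile
foreach var of varlist income1 income2 income3 {
    quietly summarize `var', detail
    * min() ignores missing arguments, so keep missing values missing
    generate `var'_w = min(`var', r(p99)) if !missing(`var')
}

* Aggregate the adjusted components; do not top-code the total again
egen total_income_w = rowtotal(income1_w income2_w income3_w)
summarize total_income_w, detail
```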

Back to Parent

This article is part of the topic Data Analysis
