Aggregation

Aggregation is not a conceptually difficult topic, but unless the best practices described in this article are followed, issues such as missing values can easily cause the aggregates to be incorrect.

Read First

  • Use the Stata commands described in this article, such as egen rowtotal(), rather than regular addition with plus signs
  • Check whether missing values were created in your aggregates. Some may be legitimate, but be sure that you can explain all of them

Categories

To help respondents recall information, we often split questions into categories. A common example is income, where we often split the income question into different categories of income even if we are only interested in total income.

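To properly aggregate categories in survey data, first clean the categories of survey codes (placeholder values that stand for answers such as "don't know"), so that they are not summed as if they were real amounts, and then use commands that properly handle missing data. See egen rowtotal() below if you are using Stata.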
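A minimal sketch of this two-step approach is shown here. The variable names and the use of -99 as the "don't know" code are assumptions made for illustration; replace them with the codes your survey actually uses.

```stata
* Hypothetical example data: three income categories, -99 = "don't know" (assumed code)
clear
input income_wage income_farm income_other
100  50 -99
200   .  30
-99   .   .
end

* Step 1: recode the survey code to an extended missing value
foreach var of varlist income_wage income_farm income_other {
    replace `var' = .d if `var' == -99
}

* Step 2: aggregate with a command that handles missing values.
* The missing option keeps the total missing when every category is missing
egen total_income = rowtotal(income_wage income_farm income_other), missing
list
```

Without the recoding step, the -99 values would be added to the total as if they were real amounts.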

Repeat groups

It is common that we ask the same question over a number of repeated instances, and most of the time we want to aggregate the amounts. The issues are similar to those described already, but missing values are even more common, since respondents often have fewer instances than the maximum number of repetitions.
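As a sketch, assuming a repeat group stored in wide format with the hypothetical variable names plot_harvest_1 to plot_harvest_3, the aggregate and the number of instances that contributed to it could be computed like this:

```stata
* Hypothetical example data: harvest reported for up to three plots
clear
input plot_harvest_1 plot_harvest_2 plot_harvest_3
10  5  .
 7  .  .
 .  .  .
end

* Sum across all repeated instances; missing instances are ignored,
* and the total stays missing if no instance has a value
egen total_harvest = rowtotal(plot_harvest_*), missing

* Count how many instances actually contributed to each total
egen n_plots = rownonmiss(plot_harvest_*)
list
```

Keeping the count next to the total makes it easier to explain why some totals are small or missing.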

Stata commands

Do not use regular addition with plus signs like

var1 + var2

as the result is missing whenever any of the summed variables is missing, so many values end up incorrectly reported as missing. Instead, in Stata one should use the egen function rowtotal().

For example:

egen total_income = rowtotal(income1 income2 income3)
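The difference between the two approaches can be seen in a small made-up data set. Note that rowtotal() takes a space-separated variable list, and that by default it returns 0 when all variables are missing; the missing option returns a missing value in that case instead.

```stata
* Hypothetical example data
clear
input income1 income2 income3
10 20 30
10  . 30
 .  .  .
end

* Plus signs: missing as soon as one component is missing
gen total_plus = income1 + income2 + income3

* rowtotal(): missing components are treated as 0
egen total_row = rowtotal(income1 income2 income3)

* rowtotal() with the missing option: missing only if all components are missing
egen total_row_m = rowtotal(income1 income2 income3), missing

list
* total_plus  is 60, ., .
* total_row   is 60, 40, 0
* total_row_m is 60, 40, .
```

Whether an observation with no reported components should be 0 or missing depends on the survey design, so the choice between the two rowtotal() variants should be made deliberately.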

Risks

There are factors that risk biasing aggregates. They are easily avoided if one knows what to do.

Missing values

Missing values are common in survey data and cause errors when aggregating. In Stata, as in most programming languages, the expression var1 + var2 only returns a valid value if both variables have valid values; if either is missing, the result is missing. That is why we need to use a specialized command like egen rowtotal() when aggregating variables.
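One way to verify that every missing value in an aggregate can be explained is to count the missing components per observation and compare that count with the aggregate. The following is a sketch with hypothetical variable names:

```stata
* Hypothetical example data
clear
input income1 income2 income3
10 20 30
10  . 30
 .  .  .
end

egen total_income = rowtotal(income1 income2 income3), missing

* Number of missing components in each observation
egen n_missing = rowmiss(income1 income2 income3)

* How many aggregates are missing?
count if missing(total_income)

* The aggregate should be missing only when all three components are missing
assert missing(total_income) == (n_missing == 3)
```

If the assert fails, some aggregate is missing (or non-missing) for a reason that has not yet been accounted for.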

Standardization

Make sure that your variables are standardized to the same unit (if your variables have units). See standardization for more details.
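For example, if one income category was reported per month and another per year, they need to be converted to the same period before they are added. The variable names and reporting periods below are assumptions made for illustration:

```stata
* Hypothetical example data: wage reported per month, farm income per year
clear
input wage_month farm_year
100 600
 50   .
end

* Convert the monthly amount to a yearly amount before aggregating
gen wage_year = wage_month * 12

* Aggregate only variables that share the same unit
egen income_year = rowtotal(wage_year farm_year), missing
list
```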

Double corrections

If adjustments, for example winsorization for outliers, have been applied to the disaggregated variables, then it is usually not a good idea to apply that same adjustment again to the aggregate of those variables.
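As an illustration, the sketch below winsorizes each component at its 99th percentile and then aggregates, without winsorizing the total a second time. The simulated data, the variable names, and the choice of the 99th percentile are all assumptions; this is one simple way to winsorize, not the only one.

```stata
* Hypothetical example data: 100 observations of two income components
clear
set obs 100
gen income1 = _n
gen income2 = 2 * _n

* Winsorize each component at its own 99th percentile
foreach var of varlist income1 income2 {
    quietly summarize `var', detail
    scalar p99 = r(p99)
    replace `var' = p99 if `var' > p99 & !missing(`var')
}

* Aggregate the already adjusted components; the total is not winsorized again
egen total_income = rowtotal(income1 income2), missing
```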

Back to Parent

This article is part of the topic Data Analysis
