Unit of Observation
While the specific term unit of observation is not always well known, it is a concept that everyone who works with data has come across. Having an exact understanding of this concept, getting into the habit of thinking about your data sets in terms of unit of observation, and organizing your data sets and your project folder accordingly are key to efficient data work. Mistakes related to this concept are more common than one would expect, and those mistakes will bias your analysis.
Read First
- Never trust the file name by itself as an indicator of what the unit of observation is. Always perform some tests to convince yourself of the unit of observation.
Definition
The most common context where the concept of unit of observation is used is to describe a data set. A non-technical way to explain unit of observation in this context is that it is what each row in the data set represents. Just as a distance does not make sense unless we know whether it is measured in miles or kilometers, we need to know the unit of our data set. We often have a good idea of what the unit of observation is at first glance, but do not trust this; always test that your assumption is correct.
Methods to confirm the Unit of Observation in a data set
The first time you use a data set that you did not create yourself, start by making sure that you have no doubt about what the unit of observation is. The name of the data file often suggests it, but you should always test that before believing it.
Look for duplicates. For example, after down
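A duplicate check of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column name household_id, the sample rows, and the helper find_duplicate_ids are hypothetical and not taken from any real data set. The idea is that if the data set really has one row per household, no household ID should appear on more than one row.

```python
from collections import Counter


def find_duplicate_ids(rows, id_columns):
    """Return the ID values that appear on more than one row, with their counts."""
    counts = Counter(tuple(row[col] for col in id_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical data set that is believed to have one row per household.
rows = [
    {"household_id": 101, "village": "A"},
    {"household_id": 102, "village": "A"},
    {"household_id": 102, "village": "A"},  # second row for the same household
    {"household_id": 103, "village": "B"},
]

duplicates = find_duplicate_ids(rows, ["household_id"])
if duplicates:
    print("household_id does not uniquely identify rows:", duplicates)
else:
    print("household_id uniquely identifies every row")
```

If the check reports duplicates, either the data set contains errors or the unit of observation is not what the file name suggested. Passing several columns in id_columns tests whether a combination of variables, rather than a single ID, identifies each row.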
Usages other than in data sets
The examples below all have many similarities to how unit of observation is used in the context of a data set. They are included to explain the concept further or to highlight small differences in usage.
Regressions
The unit of observation in a regression is what the N (the number of observations) represents. This is closely related to how the concept is used for a data set, since N is the number of rows from the data set included in the regression. Interpreting the regression correctly therefore depends on understanding the unit of observation. In most cases this is trivial, but we have had cases where regressions were misinterpreted because a monitoring data set was believed to have the unit of observation "households" while it actually was "packages distributed to households". Since the vast majority of households received only one package each, this mistake was easier to make than it might first seem.
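One way to catch this kind of mistake is to compare N, the number of rows, with the number of distinct households before running the regression. The sketch below uses made-up delivery records; the column names household_id and package_id and the helper summarize_units are hypothetical. When the two numbers differ, the rows cannot represent households.

```python
def summarize_units(rows, id_column):
    """Return (number of rows, number of distinct values in id_column)."""
    n_rows = len(rows)
    n_units = len({row[id_column] for row in rows})
    return n_rows, n_units


# Hypothetical monitoring data: one row per package, not per household.
deliveries = [
    {"household_id": 1, "package_id": "P1"},
    {"household_id": 2, "package_id": "P2"},
    {"household_id": 3, "package_id": "P3"},
    {"household_id": 3, "package_id": "P4"},  # household 3 received two packages
]

n_rows, n_households = summarize_units(deliveries, "household_id")
print(f"N (rows): {n_rows}, distinct households: {n_households}")
if n_rows != n_households:
    print("The unit of observation is not the household.")
```

Because most households appear only once, the difference between the two counts can be small, which is exactly why the comparison is worth making explicitly rather than judging by eye.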
Surveys
The concept of unit of observation can also be used to describe, for example, surveys. The unit of observation in a survey is the type of respondent, for example household, company, or school. In the cases of company and school, the respondent is a person, for example the CEO or the principal, but they provide answers about the company or the school. If they were asked questions about themselves, then
Back to Parent
This article is part of the topic Data Management