Difference between revisions of "Unit of Observation"

Revision as of 15:46, 3 February 2017

While the specific term Unit of Observation is not always well known, it is a concept that everyone who works with data has come across. Having an exact understanding of this concept, getting into the habit of thinking about your data sets in terms of unit of observation, and organizing your data sets and your project folder accordingly are key to efficient data work. Mistakes related to this concept are more common than one would expect, and those mistakes will bias your analysis.


Read First

  • Never trust the file name by itself as an indicator of what the unit of observation is. Always perform some tests to convince yourself of the unit of observation.

Definition

The concept of unit of observation is most commonly used to describe a data set. A non-technical way to explain unit of observation in this context is that it is what each row in the data set represents. Just as a distance does not make sense unless we know whether it is measured in miles or kilometers, we need to know the unit of our data set. We often have a good idea of what the unit of observation is at first glance, but do not trust this impression; always test that your assumption is correct.

Methods to confirm the Unit of Observation in a data set

The first time you use a data set you have not created yourself, you should always start by making sure that you have no doubt about what the unit of observation is. You often get this information from the name of the data file, but you should always test it before believing it. The most obvious way to find out is to ask the person who sent you the data set, but the rest of this section assumes that, for whatever reason, you cannot confirm the unit of observation that easily.

If you open a data set for which you have good reason to believe that the unit of observation is, for example, the household, then look for a household ID variable and test whether it uniquely and fully identifies the data set, meaning that no two rows share the same value and no row is missing a value. If it does, you are done. However, if you do not find such a variable, you will have to find other information that uniquely and fully identifies the data set. In this case you could, for example, look for a variable with the name of the household head and test whether it uniquely identifies all observations. Names are often not unique across a country, so you might have to add region name and village name to the test.
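
The test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name check_identifier, the variable names region, village and head_name, and the three example rows are all hypothetical and do not come from any real data set. The sketch treats an empty or absent value as missing.

```python
from collections import Counter


def check_identifier(rows, id_vars):
    """Report whether id_vars together uniquely and fully identify rows.

    rows: list of dicts, one dict per observation (row).
    id_vars: list of variable names that are tested in combination.
    """
    missing_rows = []  # indexes of rows where an ID variable is empty or absent
    keys = []          # ID value combinations of the rows with complete IDs
    for index, row in enumerate(rows):
        values = tuple(row.get(var) for var in id_vars)
        if any(value is None or value == "" for value in values):
            missing_rows.append(index)
        else:
            keys.append(values)
    # A combination that occurs more than once does not uniquely identify a row.
    duplicate_keys = sorted(key for key, n in Counter(keys).items() if n > 1)
    return {
        "fully": not missing_rows,
        "uniquely": not duplicate_keys,
        "missing_rows": missing_rows,
        "duplicate_keys": duplicate_keys,
    }


# Hypothetical example data: two household heads share the same name.
households = [
    {"region": "North", "village": "A", "head_name": "Amina"},
    {"region": "North", "village": "B", "head_name": "Amina"},
    {"region": "South", "village": "C", "head_name": "Joseph"},
]

name_only = check_identifier(households, ["head_name"])
name_and_place = check_identifier(households, ["region", "village", "head_name"])
print(name_only["uniquely"], name_and_place["uniquely"])
```

In this made-up example the name alone fails the uniqueness test, while the combination of region, village and name passes it. Passing the test on one data set supports the household as the unit of observation for that data set; it does not guarantee that the same variables will work on other data.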

Usages other than in data sets

The examples below are all similar to how unit of observation is used in the context of a data set. They are included to explain the concept further or to highlight small differences in usage.

Regressions

The unit of observation in a regression is what the N (the number of observations) represents. This is closely related to how the concept is used for a data set, as N is the number of rows from the data set included in the regression. Interpreting the regression correctly therefore depends on understanding the unit of observation. In most cases this is trivial, but we have had cases where regressions were misinterpreted because a monitoring data set was believed to have the unit of observation "households" while it actually was "packages distributed to households". Since the vast majority of households received only one package each, this mistake was easier to make than it might first seem.
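
A quick way to catch this kind of mistake is to compare the number of rows with the number of distinct households. The sketch below uses made-up data; the variable names package_id and household_id are assumptions for illustration only.

```python
# Hypothetical monitoring data: one row per package distributed.
packages = [
    {"package_id": 1, "household_id": "H1"},
    {"package_id": 2, "household_id": "H2"},
    {"package_id": 3, "household_id": "H3"},
    {"package_id": 4, "household_id": "H3"},  # H3 received a second package
]

n_rows = len(packages)  # the N a regression on this data would report
n_households = len({row["household_id"] for row in packages})

# If the unit of observation were the household, the two counts would be equal.
unit_is_household = n_rows == n_households
print(n_rows, n_households, unit_is_household)
```

When only a few households appear more than once, the two counts are close to each other, which is why the difference is easy to overlook without an explicit check.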

Surveys

The concept of unit of observation can also be used to describe, for example, surveys. The unit of observation in a survey is the type of respondent, for example household, company or school. In the cases of company and school, the respondent is a person, for example the CEO or the principal, but they provide answers about the company or the school. If they would be asked questions about themselves, then


Back to Parent

This article is part of the topic Data Management

