The unit of observation is the unit at or for which data is collected. Common examples include individual, household, community, or school. Clearly identifying the unit of observation is important for a logical survey design, organized data collection, a sound data folder set-up, and an unbiased analysis. This page discusses unit of observation in the context of surveys and datasets and explains how to confirm the unit of observation for a given dataset.

• When working with a dataset that you have not created yourself, identifying the unit of observation is the first step to understanding the data.
• Mistakes related to unit of observation introduce bias into analyses. Always double-check the unit of observation before working with data.

## Unit of Observation in Surveys

In the context of a survey, the unit of observation describes the unit at or for which survey data is collected. Many times, the unit of observation in a survey is the type of respondent. However, sometimes a respondent provides answers about a larger entity, which is the unit of observation. For example, if school principals are the survey respondents but they provide answers about their schools, the unit of observation is school. If mothers are the survey respondents but they provide answers about their households, the unit of observation is household. However, if school principals are the survey respondents and they provide answers about themselves, then the unit of observation is principal. Similarly, if mothers are the survey respondents and they provide answers about themselves, the unit of observation is mother. Identifying the unit of observation early in the study design is critical for designing a high-quality survey and effectively planning primary data collection.

## Unit of Observation in Datasets

When working with a dataset that you have not created yourself, always start by identifying the unit of observation. In many cases, there seems to be little risk of confusion about the unit of observation. We often have a good intuition for it at the first glance of a dataset or a file name. However, always test that your assumption is correct: errors due to an unclear understanding of unit of observation are more common than one might imagine. Consider, for example, monitoring data whose unit of observation is “packages distributed to households.” Since most households in the dataset received only one package, one could easily mistake the unit of observation for “household.” Clarifying and confirming the unit of observation before beginning to work with a dataset avoids biased analysis and paves the way for a correct interpretation of regression and analysis results.
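The package example can be made concrete with a small sketch. The records, the field names `household_id` and `package_id`, and the helper `duplicated_values` are all hypothetical and serve only to illustrate the check. If the assumed unit were household, no household ID should repeat.

```python
from collections import Counter

# Hypothetical monitoring records: one row per package distributed.
records = [
    {"household_id": "H1", "package_id": "P1"},
    {"household_id": "H2", "package_id": "P2"},
    {"household_id": "H3", "package_id": "P3"},
    {"household_id": "H3", "package_id": "P4"},  # second package for H3
]


def duplicated_values(rows, key):
    """Return the values of `key` that appear in more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# A repeated household ID shows that "household" is not the unit of observation.
household_duplicates = duplicated_values(records, "household_id")
# No repeated package ID is consistent with "package" being the unit.
package_duplicates = duplicated_values(records, "package_id")

print(household_duplicates)
print(package_duplicates)
```

A single repeated household is enough to reject "household" as the unit of observation, even when almost every other household appears only once.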

Note that a dataset is always incorrectly constructed if it has more than one unit of observation. Even if the two units of observation share the same variables, including them in the same dataset is incorrect, bad practice, and a major source of error. Any such dataset should be split into separate datasets, one per unit of observation.

### Confirming Unit of Observation

The most obvious way to confirm the unit of observation in a new dataset is to ask the person from whom you received it. If that is not possible, begin by inferring the unit of observation. Suppose you believe the unit of observation is household. Open the dataset, look for a household ID variable, and test whether it is uniquely and fully identifying. If it is, then you are done. If not, search for other information that uniquely and fully identifies the observations in the dataset. In this case, for example, look for a variable containing the name of the household head and test whether it uniquely identifies all observations. Names are often not unique across a country, so you might have to add region name and village name to the test. Once you have found the information that uniquely and fully identifies the observations, create an appropriate ID variable from it if one does not yet exist.
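The test described above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure. It assumes that "fully identifying" means no observation has a missing value in the identifying variables. The data, the field names (`region`, `village`, `head_name`, `hh_id`), and the helper `is_unique_and_complete` are hypothetical.

```python
def is_unique_and_complete(rows, keys):
    """True if no row is missing any key and no key combination repeats."""
    seen = set()
    for row in rows:
        combo = tuple(row.get(k) for k in keys)
        # Assumption: "fully identifying" means no missing or empty values.
        if any(value is None or value == "" for value in combo):
            return False
        if combo in seen:
            return False
        seen.add(combo)
    return True


# Hypothetical household data without an ID variable.
households = [
    {"region": "North", "village": "A", "head_name": "Amina Diallo"},
    {"region": "North", "village": "B", "head_name": "Amina Diallo"},
    {"region": "South", "village": "A", "head_name": "Jean Moreau"},
]

# The household head's name alone repeats, so it does not identify households.
name_only = is_unique_and_complete(households, ["head_name"])

# Adding region and village makes the combination unique in this example.
full_key = is_unique_and_complete(households, ["region", "village", "head_name"])

# Once an identifying combination is found, create a simple ID variable.
if full_key:
    for i, row in enumerate(households, start=1):
        row["hh_id"] = i

id_ok = is_unique_and_complete(households, ["hh_id"])
print(name_only, full_key, id_ok)
```

If the combined key still fails the test, either more identifying information is needed or the assumed unit of observation is wrong.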