De-identification

Revision as of 20:49, 16 November 2017

Read First

  • Some survey variables allow individual respondents to be identified. These variables are called Personally Identifiable Information (PII).
  • It is the responsibility of researchers to make sure PII is stored safely and that …
  • PII must be saved in encrypted folders and removed from data sets as early in the project as possible.
  • No PII can ever be publicly released without explicit consent.

Personally Identifiable Information

In the context of a survey, personally identifiable information (PII) consists of the variables that, on their own or in combination with other variables, can lead to the identification of a single surveyed individual. The following variables may lead to personal identification:

  • Names of survey respondent, household members, enumerators and other individuals
  • Names of schools, clinics, villages and possibly other administrative units (depending on the survey)
  • Dates of birth
  • GPS coordinates
  • Contact information
  • Record identifiers (social security number, process number, medical record number, national clinic code, license plate, IP address)
  • Pictures (of individuals, houses, etc.)
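When direct identifiers like those above are not needed for analysis, the simplest option is to drop them from the analysis copy of the data. The sketch below is a minimal illustration, assuming the survey is loaded as a list of Python dicts (one per respondent). All column names in it are hypothetical and must be replaced with the variable names of the actual survey:

```python
from typing import Dict, List, Set

Record = Dict[str, object]

# Hypothetical column names for the direct identifiers listed above;
# replace them with the variable names used in your own survey.
DIRECT_IDENTIFIERS: Set[str] = {
    "respondent_name",
    "enumerator_name",
    "village_name",
    "date_of_birth",
    "gps_lat",
    "gps_lon",
    "phone_number",
    "national_id",
    "photo_path",
}


def drop_identifiers(records: List[Record], identifiers: Set[str]) -> List[Record]:
    """Return new records with every identifier column removed.

    The input records are not modified, so the raw data can still be
    kept (encrypted) while the cleaned copy is used for analysis.
    """
    return [
        {column: value for column, value in record.items() if column not in identifiers}
        for record in records
    ]


if __name__ == "__main__":
    survey = [
        {"respondent_name": "A. Example", "phone_number": "555-0100",
         "plot_size_ha": 1.5, "main_crop": "maize"},
    ]
    print(drop_identifiers(survey, DIRECT_IDENTIFIERS))
    # [{'plot_size_ha': 1.5, 'main_crop': 'maize'}]
```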


A few examples of sensitive variables that, depending on the survey context, may contain personally identifying information:

  • Age
  • Gender
  • Ethnicity
  • Grades, salary, job position
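Whether variables like these identify someone depends on how many respondents share each combination of values. For example, an elderly man from a small ethnic group may be the only such person in the sample. The sketch below is a minimal illustration, assuming a list of dicts with hypothetical column names. It counts respondents per combination of these variables and flags combinations shared by fewer than `k` respondents, which is the idea behind k-anonymity. The default `k = 5` is an illustrative assumption, not a rule from this page:

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]


def small_cells(
    records: List[Record], quasi_identifiers: Sequence[str], k: int = 5
) -> Dict[Tuple[object, ...], int]:
    """Return each combination of quasi-identifier values shared by fewer
    than k respondents, mapped to the number of respondents sharing it."""
    counts = Counter(
        tuple(record[column] for column in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


if __name__ == "__main__":
    # Hypothetical records and column names.
    survey = [
        {"age": 34, "gender": "F", "ethnicity": "A"},
        {"age": 34, "gender": "F", "ethnicity": "A"},
        {"age": 71, "gender": "M", "ethnicity": "B"},
    ]
    print(small_cells(survey, ["age", "gender", "ethnicity"], k=2))
    # {(71, 'M', 'B'): 1}
```

Any flagged combination points to respondents who could be singled out. Those variables are candidates for one of the solutions discussed below.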


As these variables illustrate, what exactly counts as PII depends on the context of each survey. For example, if a survey covers a small farming community, variables such as plot size and crops cultivated can be combined to identify an individual household. Likewise, administrative units can be considered PII if each of them contains only a few individuals. The guidelines for handling PII are discussed below, but three common solutions are (1) dropping PII variables, (2) replacing names with anonymous codes, and (3) adding white noise.
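Solutions (2) and (3) can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed procedure. The records are Python dicts, and the column names are hypothetical. Anonymous IDs are random 8-character hex codes from Python's `secrets` module. The noise is Gaussian with a standard deviation of 0.01 degrees, roughly 1 km of latitude, and that value is chosen only for illustration. The crosswalk linking IDs back to names is itself PII and belongs in an encrypted folder.

```python
import random
import secrets
from typing import Dict, List, Tuple

Record = Dict[str, object]


def assign_anonymous_ids(
    records: List[Record], name_column: str
) -> Tuple[List[Record], Dict[str, object]]:
    """Replace the name column with a random anonymous ID.

    Returns the de-identified records plus a crosswalk (anon_id -> name).
    The crosswalk is itself PII: keep it only in an encrypted folder.
    """
    crosswalk: Dict[str, object] = {}
    deidentified: List[Record] = []
    for record in records:
        code = secrets.token_hex(4)  # 8 hex characters, unrelated to the name
        while code in crosswalk:  # guard against the rare random collision
            code = secrets.token_hex(4)
        crosswalk[code] = record[name_column]
        new_record = {k: v for k, v in record.items() if k != name_column}
        new_record["anon_id"] = code
        deidentified.append(new_record)
    return deidentified, crosswalk


def jitter_coordinates(
    records: List[Record],
    lat_column: str,
    lon_column: str,
    sd_degrees: float,
    rng: random.Random,
) -> List[Record]:
    """Return copies of the records with independent Gaussian (white) noise
    added to each coordinate; the input records are left unchanged."""
    noisy: List[Record] = []
    for record in records:
        new_record = dict(record)
        new_record[lat_column] = float(record[lat_column]) + rng.gauss(0.0, sd_degrees)
        new_record[lon_column] = float(record[lon_column]) + rng.gauss(0.0, sd_degrees)
        noisy.append(new_record)
    return noisy


if __name__ == "__main__":
    # Hypothetical example record; column names are assumptions.
    survey = [
        {"respondent_name": "A. Example", "gps_lat": -1.2921,
         "gps_lon": 36.8219, "plot_size_ha": 1.5},
    ]
    deidentified, crosswalk = assign_anonymous_ids(survey, "respondent_name")
    released = jitter_coordinates(
        deidentified, "gps_lat", "gps_lon", sd_degrees=0.01, rng=random.Random(2017)
    )
    print(released)  # no name column; coordinates shifted by random noise
    # `crosswalk` stays in an encrypted folder and is never released.
```

The noise should be drawn once and the perturbed values saved. If several independently jittered versions of the same data were released, averaging them would move back toward the true coordinates. A fixed seed makes the result reproducible. However, anyone who knows the seed could regenerate and subtract the noise, so the seed needs the same protection as the crosswalk.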

Guidelines

Folder Encryption

De-identification

Anonymous IDs

Back to Parent

This article is part of the topic Data Analysis


Additional Resources

  • list here other articles related to this topic, with a brief description and link