Difference between revisions of "ID Variable Properties"

(Created page with "{{subst:dime_wiki}}")
 
Line 1: Line 1:
<span style="font-size:150%">
An ID variable that identifies an observation should have the properties listed below. Note that this applies to the ID variable that identifies observations across the data sets in our project folder. Some commands in Stata, for example <code>reclink</code>, require an <code>idmaster()</code> and an <code>idusing()</code> variable; ID variables created temporarily for such a command do not need to have all of these properties.
<span style="color:#ff0000"> '''NOTE: this article is only a template. Please add content!''' </span>
</span>


== Read First ==
*
==Uniquely Identifying==
This is the most commonly referred to property of an ID variable.


add introductory 1-2 sentences here
==Fully Identifying==


==Constant Across a Project==


==Constant Throughout the Duration of a Project==


== Read First ==
==Anonymous IDs==
* include here key points you want to make sure all readers understand
 
Sometimes we have access to IDs that satisfy all the properties above, but we should be very careful before using them. Examples include individual national IDs, public company IDs, and a hospital's patient IDs. Since records linked to those IDs are available to people outside our team, we cannot guarantee the privacy of the data we collect if we use them. In all of these cases we need to create our own ID that has no association with the ID created by someone else and is unique to our project, so that it is an anonymous ID that identifies the observation only to us. We can include the other ID in the master data set so that we can merge data quickly, but the information in the master data set then becomes even more sensitive than usual.


There is an exception to this rule that can simplify the data work, but it should be used only with care. If a project has a high-level unit of observation for which the project team is absolutely certain that it will not collect sensitive data, and there is an official code for that unit, then we could perhaps use this code. This could, for example, be done for districts or regions, so that we can more easily include publicly available data on those districts or regions. However, if there is any chance that we will include data that is not publicly available, for example district budgets, then we need to make our own code. Also, if even a single instance of a unit of observation contains only a few observations of another level, for example a school with few students or a village with few households, then we have to create anonymous IDs for ''all'' instances at that level: not just for that one school or village, but for all schools or villages.


== Guidelines ==
It is never incorrect to create an anonymous ID, so if there is any uncertainty whether a public ID can be used, then always go for the anonymous option.
* organize information on the topic into subsections. for each subsection, include a brief description / overview, with links to articles that provide details
===Subsection 1===
===Subsection 2===
===Subsection 3===


== Back to Parent ==
This article is part of the topic [[*topic name, as listed on main page*]]
This article is part of the topic [[Data Management]]




Line 25: Line 30:
* list here other articles related to this topic, with a brief description and link


[[Category: *category name* ]]
[[Category: Data Management ]]

Revision as of 23:51, 6 February 2017

An ID variable that identifies an observation should have the properties listed below. Note that this applies to the ID variable that identifies observations across the data sets in our project folder. Some commands in Stata, for example reclink, require an idmaster() and an idusing() variable; ID variables created temporarily for such a command do not need to have all of these properties.

Read First


Uniquely Identifying

This is the most commonly referred to property of an ID variable.
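
As an illustration only, the uniqueness property can be checked mechanically before an ID variable is used to merge data sets. The sketch below is in Python rather than Stata (where the isid command performs a comparable check), and the function name and the example values are made up for this illustration.

```python
from collections import Counter


def duplicated_ids(ids):
    """Return the ID values that appear more than once, in first-seen order."""
    counts = Counter(ids)
    return [value for value, n in counts.items() if n > 1]


# Made-up example values: one household ID has been entered twice.
household_ids = [1001, 1002, 1003, 1002]
print(duplicated_ids(household_ids))
```

An empty result means every observation has its own ID value; any value that is returned needs to be resolved before the variable can serve as an ID.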

Fully Identifying

Constant Across a Project

Constant Throughout the Duration of a Project

Anonymous IDs

Sometimes we have access to IDs that satisfy all the properties above, but we should be very careful before using them. Examples include individual national IDs, public company IDs, and a hospital's patient IDs. Since records linked to those IDs are available to people outside our team, we cannot guarantee the privacy of the data we collect if we use them. In all of these cases we need to create our own ID that has no association with the ID created by someone else and is unique to our project, so that it is an anonymous ID that identifies the observation only to us. We can include the other ID in the master data set so that we can merge data quickly, but the information in the master data set then becomes even more sensitive than usual.
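
One possible way to create such an ID is to assign project IDs in a random order, so that the new ID carries no information about the external one. The following is a minimal sketch under that assumption, not a prescribed procedure; the function name, the seed, and the example national IDs are hypothetical.

```python
import random


def assign_anonymous_ids(external_ids, seed, start=1):
    """Map each external ID to a project ID assigned in a shuffled order."""
    # Sort first so that the same seed always reproduces the same mapping.
    unique_ids = sorted(set(external_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    # Project IDs are consecutive integers handed out in the shuffled order.
    return {ext: start + i for i, ext in enumerate(unique_ids)}


# Made-up example values.
national_ids = ["NID-48213", "NID-10977", "NID-73350"]
crosswalk = assign_anonymous_ids(national_ids, seed=20170206)
```

The resulting crosswalk between the external ID and the project ID is exactly the sensitive link described above, so it would be kept only in the master data set and never in the data sets used for analysis.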

There is an exception to this rule that can simplify the data work, but it should be used only with care. If a project has a high-level unit of observation for which the project team is absolutely certain that it will not collect sensitive data, and there is an official code for that unit, then we could perhaps use this code. This could, for example, be done for districts or regions, so that we can more easily include publicly available data on those districts or regions. However, if there is any chance that we will include data that is not publicly available, for example district budgets, then we need to make our own code. Also, if even a single instance of a unit of observation contains only a few observations of another level, for example a school with few students or a village with few households, then we have to create anonymous IDs for all instances at that level: not just for that one school or village, but for all schools or villages.

It is never incorrect to create an anonymous ID, so if there is any uncertainty whether a public ID can be used, then always go for the anonymous option.
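
The rules in this section can be summarized as a small decision sketch. The article only says "few" observations, so the min_children threshold below is a hypothetical parameter that each project would have to set itself; the function and argument names are likewise assumptions made for this illustration.

```python
def can_use_public_code(collects_nonpublic_data, child_counts, min_children, certain=True):
    """Return True only if the official code may be used for this level of observation.

    child_counts maps each unit (e.g. each school) to the number of
    lower-level observations it contains (e.g. its students).
    """
    # Any uncertainty, or any data that is not publicly available, rules it out.
    if not certain or collects_nonpublic_data:
        return False
    # A single small unit is enough to require anonymous IDs for every unit.
    return all(n >= min_children for n in child_counts.values())


# Made-up example: one school has only four students in the data.
schools = {"school_A": 120, "school_B": 4}
print(can_use_public_code(False, schools, min_children=10))
```

Here the one small school means the function returns False, so all schools would receive anonymous IDs, not just the small one.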

Back to Parent

This article is part of the topic Data Management


Additional Resources

  • list here other articles related to this topic, with a brief description and link