Difference between revisions of "Iedropone"

(3 intermediate revisions by the same user not shown)
Line 2: Line 2:


== Read First ==
== Read First ==
* '''iedropone''' is used to make sure that no additional observations are dropped when dropping an exact number of observations.
*<code>iedropone</code> is used to make sure that no additional observations are dropped when dropping an exact number of observations.
* This command is a part of the package [[Stata_Coding_Practices#ietoolkit|ietoolkit]], to install all the commands in this package including this command, type <code>ssc install ietoolkit</code> in Stata.
* This command is a part of the package [[Stata_Coding_Practices#ietoolkit|ietoolkit]]. To install all the commands in this package, type <code>ssc install ietoolkit</code> in '''Stata'''.


== Intended use cases ==
== Intended use cases ==
It is common that observations needs to be dropped when cleaning a data set. For example, we might know that an interview was done incorrectly and the data for that observation needs to be dropped. Or a whole village should be dropped. At the same time it is important that these observations are dropped so that they do not introduce error in the analysis, it is also important that we do not delete more observation than exactly those observations. When we first write <code>drop if HHID == 123456</code> we can easily check that Stata deletes exactly one observation. And if we want to delete all observations from one village and we know that it is 12 observations in that village, we can write <code>drop if village_code == 123</code> and check that exactly 12 observations are deleted.
It is common that observations needs to be dropped when [[Data Cleaning|cleaning]] a [[Master Dataset|dataset]]. For example, we might know that an interview was done incorrectly and the data for that observation needs to be dropped. Or perhaps a whole village should be dropped. At the same time, it is important that these observations are dropped so that they do not introduce error in the [[Data Analysis|analysis]]. It is also important that we do not delete more observations than intended. When we first write <code>drop if HHID == 123456</code> we can easily check that [[Stata Coding Practices|Stata]] deletes exactly one observation. And if we want to delete all observations from one village and we know that it is 12 observations in that village, we can write <code>drop if village_code == 123</code> and check that exactly 12 observations are deleted.


However, in the cleaning process the data can change, especially if the data collection is still ongoing. And that means that more observations might incorrectly be deleted, or observations that are supposed to be deleted are no longer deleted. Let's say that someone change all village code of 123 to missing as it is incorrect. If that change happens before the code that drops those village, then these twelve villages are no longer deleted. In these examples when we delete based on ID information we are likely to catch the mistake eventually, but perhaps not after some damage is done. And when we delete observations without having clear IDs available, then this might be an issue that we never catches. This is where '''iedropone''' comes in. It will test that exactly one observation is dropped if no number is specified, and it can be set to test for any other number of observations. If it is not the expected number of observations that will be dropped, then iedropone will throw an error and you get a chance to investigate why the number has changed.
However, in the '''cleaning process''' the data can change, especially if the [[Primary Data Collection|data collection]] is still ongoing. And that means that more observations might incorrectly be deleted, or observations that are supposed to be deleted are not. Let's say that someone changes all village code of 123 to missing as it is incorrect. If that change happens before the code that drops those village, then these twelve villages are no longer deleted. In these examples, when we delete based on ID information, we are likely to catch the mistake eventually, but perhaps not after some damage is done. And when we delete observations without having clear IDs available, then this might be an issue that we never catch. This is where <code>iedropone</code> comes in. It will test that exactly one observation is dropped if no number is specified, and it can be set to test for any other number of observations. If it the expected number of observations that are not dropped, then <code>iedropone</code> will return an error and you get a chance to investigate why the number has changed.


=== Intended Work Flow ===
=== Intended Work Flow ===
Simply replace the command ''drop'' with iedropone, and keep running the code as normal.
Simply replace the command <code>drop</code> with <code>iedropone</code>, and keep running the code as normal.


== Instructions ==
== Instructions ==
These instructions are meant to help you understand how to use the command. For technical instructions on how to implement the command in Stata see the help files by typing <code>help iedropone</code> in Stata.
These instructions are meant to help you understand how to use the command. For technical instructions on how to implement the command in [[Stata Coding Practices|Stata]], see the help files by typing <code>help iedropone</code> in '''Stata'''.


The usage of this command is easy. The syntax works the same way as Stata's built in command drop, however, there are some more options to iedropone, see help file. The main challenge when using iedropone is to document really well why that exact number of observations were expected, so that it is possible for anyone in the future who may get an error message from iedropone to know why the observation was dropped and what could have caused that more or less observations are now dropped.
The usage of this command is easy. The syntax works the same way as '''Stata's''' built in command <code>drop</code>, however, there are more options when using <code>iedropone</code> (see help file). The main challenge when using <code>iedropone</code> is to document well why a particular number of observations were dropped, so that it is possible for anyone in the future who may get an error message from <code>iedropone</code> to know why those observations were dropped and what could have caused an incorrect number of observations to be dropped now.


== Related Pages ==
== Related Pages ==

Latest revision as of 15:34, 9 August 2023

This article is meant to describe use cases, work flow, and the reasoning used when developing the commands. For instructions on how to use the command specifically in Stata and for a complete list of the options available, see the help files by typing help iedropone in Stata.

Read First

  • iedropone is used to make sure that no additional observations are dropped when dropping an exact number of observations.
  • This command is a part of the package ietoolkit. To install all the commands in this package, type ssc install ietoolkit in Stata.

Intended use cases

It is common that observations need to be dropped when cleaning a dataset. For example, we might know that an interview was done incorrectly and the data for that observation needs to be dropped. Or perhaps a whole village should be dropped. It is important that these observations are dropped so that they do not introduce error in the analysis, but it is equally important that we do not delete more observations than intended. When we first write drop if HHID == 123456, we can easily check that Stata deletes exactly one observation. And if we want to delete all observations from one village and we know that there are 12 observations in that village, we can write drop if village_code == 123 and check that exactly 12 observations are deleted.
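As a minimal sketch of that manual check, the snippet below builds a small toy dataset and confirms the number of matching observations before each drop. The toy data (20 households, 12 of them in village 123) is invented for illustration; only built-in Stata commands are used.

```stata
* Toy data, invented for illustration: 20 households, 12 of them in village 123
clear
set obs 20
generate long HHID = 123455 + _n
generate int village_code = cond(_n > 8, 123, 200)

* Confirm that exactly one observation matches before dropping it
count if HHID == 123456
assert r(N) == 1
drop if HHID == 123456

* Confirm that exactly 12 observations match before dropping the village
count if village_code == 123
assert r(N) == 12
drop if village_code == 123
```

This works, but it takes three lines per drop, and the check is easy to forget or to leave out when the code is edited later.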

However, in the cleaning process the data can change, especially if the data collection is still ongoing. That means that more observations might incorrectly be deleted, or that observations that are supposed to be deleted are not. Let's say that someone changes every village code of 123 to missing because it is incorrect. If that change happens before the code that drops that village, then those twelve observations are no longer deleted. In these examples, when we delete based on ID information, we are likely to catch the mistake eventually, but perhaps not before some damage is done. And when we delete observations without having clear IDs available, this might be an issue that we never catch. This is where iedropone comes in. It will test that exactly one observation is dropped if no number is specified, and it can be set to test for any other number of observations. If the number of observations that would be dropped differs from the expected number, then iedropone will return an error and you get a chance to investigate why the number has changed.
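The recoding scenario described above can be sketched as follows. This assumes ietoolkit is installed, and it assumes the expected count is passed through the numobs() option as documented in the help file; confirm the option name with help iedropone. The toy data is invented for illustration.

```stata
* Toy data, invented for illustration: 20 households, 12 of them in village 123
clear
set obs 20
generate long HHID = 123455 + _n
generate int village_code = cond(_n > 8, 123, 200)

* An upstream cleaning step recodes the incorrect village code to missing
replace village_code = . if village_code == 123

* A plain drop would now silently drop nothing.
* iedropone expects 12 matches, finds none, and exits with an error instead.
* capture noisily shows the error message without halting this demo.
capture noisily iedropone if village_code == 123, numobs(12)
display "Return code: " _rc

* No observations were removed, so the dataset is unchanged
count
```

In a real do-file you would normally leave out capture noisily, so that the error stops execution and forces someone to investigate.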

Intended Work Flow

Simply replace the command drop with iedropone, and keep running the code as normal.
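A minimal sketch of that replacement, using the two drops from the example above. It assumes ietoolkit is installed and that numobs() is the option that sets the expected count (see help iedropone); the toy data is invented for illustration.

```stata
* Toy data, invented for illustration: 20 households, 12 of them in village 123
clear
set obs 20
generate long HHID = 123455 + _n
generate int village_code = cond(_n > 8, 123, 200)

* Before: drop if HHID == 123456
* With no number specified, exactly one observation must match
iedropone if HHID == 123456

* Before: drop if village_code == 123
* Exactly 12 observations must match
iedropone if village_code == 123, numobs(12)
```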

Instructions

These instructions are meant to help you understand how to use the command. For technical instructions on how to implement the command in Stata, see the help files by typing help iedropone in Stata.

This command is easy to use. The syntax works the same way as Stata's built-in command drop, although iedropone has some additional options (see the help file). The main challenge when using iedropone is to document well why a particular number of observations was expected to be dropped, so that anyone who later gets an error message from iedropone can tell why those observations were dropped and what could have caused a different number of observations to be dropped now.
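One way to keep that documentation next to the code is a comment directly above the command. The reason given in the comment below is entirely hypothetical and only shows the kind of detail that helps a future reader; the toy data is likewise invented, and numobs() should be confirmed against help iedropone.

```stata
* Toy data, invented for illustration: 20 households, 12 of them in village 123
clear
set obs 20
generate long HHID = 123455 + _n
generate int village_code = cond(_n > 8, 123, 200)

* Hypothetical documentation, for illustration only:
* Village 123 was surveyed with an outdated questionnaire, so all of its
* households are excluded. The field team's listing shows 12 households
* in this village, hence numobs(12). If this line fails, first check
* whether village_code was recoded upstream or whether new submissions
* have arrived since the listing was made.
iedropone if village_code == 123, numobs(12)
```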
