Difference between revisions of "Iedropone"
Kbjarkefur (talk | contribs)
Latest revision as of 15:34, 9 August 2023
This article describes the use cases, workflow, and reasoning used when developing the command. For instructions on how to use the command specifically in Stata and for a complete list of the options available, see the help file by typing help iedropone in Stata.
Read First
- iedropone is used to make sure that no additional observations are dropped when dropping an exact number of observations.
- This command is a part of the package ietoolkit. To install all the commands in this package, type ssc install ietoolkit in Stata.
Intended use cases
It is common that observations need to be dropped when cleaning a dataset. For example, we might know that an interview was done incorrectly and that the data for that observation needs to be dropped. Or perhaps a whole village should be dropped. It is important that these observations are dropped so that they do not introduce error in the analysis. It is equally important that we do not delete more observations than intended. When we first write drop if HHID == 123456 we can easily check that Stata deletes exactly one observation. And if we want to delete all observations from one village and we know that there are 12 observations in that village, we can write drop if village_code == 123 and check that exactly 12 observations are deleted.
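The manual check described above can be sketched with built-in Stata commands only. The toy data below is made up for illustration (20 households, 12 of them in village 123); only the variable names HHID and village_code come from the text.

```stata
* Toy data (hypothetical values): 20 households, 12 of them in village 123
clear
set obs 20
generate long HHID = 123443 + _n                  // row 13 gets HHID 123456
generate village_code = cond(_n <= 12, 123, 456)  // rows 1-12 are village 123

* Check that the condition matches exactly one household, then drop it
count if HHID == 123456
assert r(N) == 1
drop if HHID == 123456

* Check that the village has exactly 12 observations, then drop them
count if village_code == 123
assert r(N) == 12
drop if village_code == 123
```

This check only protects the data for as long as someone remembers to keep the count and assert lines next to every drop.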
However, the data can change during the cleaning process, especially if the data collection is still ongoing. That means that more observations than intended might be deleted, or that observations that are supposed to be deleted are not. Let's say that someone changes all village codes of 123 to missing because the code is incorrect. If that change happens before the code that drops that village, then these twelve observations are no longer deleted. In these examples, where we delete based on ID information, we are likely to catch the mistake eventually, but perhaps not before some damage is done. And when we delete observations without having clear IDs available, this might be an issue that we never catch. This is where iedropone comes in. It tests that exactly one observation is dropped if no number is specified, and it can be set to test for any other number of observations. If the number of observations that would be dropped differs from the expected number, then iedropone returns an error and you get a chance to investigate why the number has changed.
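A minimal sketch of that scenario, assuming ietoolkit is installed and using made-up toy data. The numobs() option is how the help file, as far as we know, lets you set the expected number of observations; confirm the exact syntax with help iedropone before relying on it.

```stata
* Requires the ietoolkit package: ssc install ietoolkit
* Toy data (hypothetical values): 20 households, 12 of them in village 123
clear
set obs 20
generate long HHID = 123443 + _n
generate village_code = cond(_n <= 12, 123, 456)

* Default behavior: expects exactly one observation to match
iedropone if HHID == 123456

* Simulate an upstream change: someone recodes village 123 to missing
replace village_code = . if village_code == 123

* The condition no longer matches 12 observations, so the command should
* stop with an error instead of silently dropping nothing.
* capture noisily shows the message but lets this example keep running.
capture noisily iedropone if village_code == 123, numobs(12)
display "return code: " _rc
```

In a real do-file you would leave out capture so that the error halts the run and forces someone to investigate.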
Intended Work Flow
Simply replace the command drop with iedropone, and keep running the code as normal.
Instructions
These instructions are meant to help you understand how to use the command. For technical instructions on how to implement the command in Stata, see the help file by typing help iedropone in Stata.
This command is easy to use. The syntax works the same way as Stata's built-in command drop, although iedropone has more options (see the help file). The main challenge when using iedropone is to document well why a particular number of observations was dropped. Anyone who gets an error message from iedropone in the future should be able to find out why those observations were dropped and what could have caused a different number of observations to match now.
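One way to document each drop is a comment directly above the command stating the reason and the expected count. The sketch below assumes ietoolkit is installed; the data and the bracketed reasons are placeholders, not real cases.

```stata
* Requires the ietoolkit package: ssc install ietoolkit
* Toy data (hypothetical values): 20 households, 12 of them in village 123
clear
set obs 20
generate long HHID = 123443 + _n
generate village_code = cond(_n <= 12, 123, 456)

* HHID 123456: [placeholder: why this interview is invalid and who decided].
* Exactly 1 observation is expected to be dropped.
iedropone if HHID == 123456

* Village 123: [placeholder: why the whole village is excluded].
* 12 observations in this village when this line was written.
iedropone if village_code == 123, numobs(12)
```

If either command later fails, the comment tells the reader what count was expected and where to start looking for the upstream change.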