This article describes the intended use cases and workflow of iedropone, and the reasoning behind the command's design. For instructions on how to use the command in Stata and for a complete list of its options, see the help file by typing help iedropone in Stata.
- iedropone drops observations only if exactly the expected number of observations (one by default) match the condition, so that no more and no fewer observations than intended are dropped.
- This command is part of the ietoolkit package. To install all the commands in the package, including iedropone, type ssc install ietoolkit in Stata.
Intended use cases
It is common that observations need to be dropped when cleaning a data set. For example, we might know that an interview was done incorrectly and that the data for that observation needs to be dropped, or that a whole village should be dropped. While it is important that these observations are dropped so that they do not introduce errors in the analysis, it is equally important that we do not drop any observations other than exactly those. When we first write drop if HHID == 123456, we can easily check in Stata's output that exactly one observation is deleted. And if we want to drop all observations from one village, and we know that there are 12 observations in that village, we can write drop if village_code == 123 and check that exactly 12 observations are deleted.
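Without iedropone, the check described above has to be done by hand. The sketch below shows that manual pattern using only built-in Stata commands and a hypothetical toy data set (20 households, 12 of them in village 123). It counts the matching observations, asserts the expected number, and only then drops them. This is an illustration of the idea, not how iedropone is implemented.

```stata
* Hypothetical toy data: 20 households, 12 of them in village 123
clear
set obs 20
generate HHID = 100000 + _n
generate village_code = cond(_n <= 12, 123, 456)

* Count the observations that match the drop condition and stop
* with an error unless there are exactly the 12 we expect
count if village_code == 123
assert r(N) == 12

* Only now drop them
drop if village_code == 123
```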
However, the data can change during the cleaning process, especially while data collection is still ongoing. As a result, more observations than intended might be dropped, or observations that are supposed to be dropped might no longer be dropped. Say that someone changes all village codes of 123 to missing because the code is incorrect. If that change happens before the line that drops the village, then those 12 observations are no longer dropped. When we drop based on ID information, as in these examples, we are likely to catch the mistake eventually, but perhaps not before some damage is done. And when we drop observations without clear IDs available, this might be an error we never catch.
This is where iedropone comes in. By default it tests that exactly one observation is dropped, and it can be set to test for any other number of observations. If the number of observations that would be dropped is not the expected number, iedropone throws an error, which gives you a chance to investigate why the number has changed.
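The scenario above can be reproduced with hypothetical toy data. In this sketch an earlier cleaning step recodes village code 123 to missing, so a plain drop if village_code == 123 would silently drop nothing, while a count-and-assert guard fails. capture is used here only so that the example runs to the end; in a real do-file the failed assert would stop execution.

```stata
* Hypothetical toy data: 20 households, 12 of them in village 123
clear
set obs 20
generate village_code = cond(_n <= 12, 123, 456)

* Hypothetical earlier cleaning step: code 123 is judged incorrect
* and set to missing before the code that drops the village runs
replace village_code = . if village_code == 123

* A plain -drop if village_code == 123- would now drop 0 observations
* without complaint. The guard detects the change instead:
count if village_code == 123
capture assert r(N) == 12
display "assert return code: " _rc
```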
Intended workflow
Simply replace the command drop with iedropone wherever you intend to drop a known number of observations, and run the code as usual.
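The following is a minimal sketch of the swap. It assumes ietoolkit is installed (ssc install ietoolkit) and uses hypothetical toy data. The condition is exactly the one you would have given to drop, and with no number specified iedropone expects exactly one matching observation. The second call uses a condition that matches two observations. It is wrapped in capture only to show that it returns an error instead of dropping them.

```stata
* Hypothetical toy data: 20 households with unique IDs
clear
set obs 20
generate HHID = 100000 + _n

* Before: drop if HHID == 100005
* After: same condition, but iedropone errors unless exactly
* one observation matches
iedropone if HHID == 100005

* This condition matches two observations, which is not the
* expected number, so iedropone throws an error
capture iedropone if inlist(HHID, 100006, 100007)
display "iedropone return code: " _rc
```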
These instructions are meant to help you understand how to use the command. For technical instructions on how to implement the command in Stata, see the help file by typing help iedropone in Stata.
The command is easy to use. Its syntax works the same way as Stata's built-in command drop, although iedropone has some additional options (see the help file). The main challenge when using iedropone is to document clearly why exactly that number of observations was expected. That way, anyone who later gets an error message from iedropone can understand why the observations were being dropped and what could have caused more or fewer observations to match.
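One way to follow this advice is to put the explanation in a comment directly above each iedropone call. The sketch below uses hypothetical toy data and a hypothetical reason for the drop, and it again assumes that ietoolkit is installed. The comment records why exactly one observation is expected, so whoever hits an error later knows what to check.

```stata
* Hypothetical toy data: 20 households with unique IDs
clear
set obs 20
generate HHID = 100000 + _n

* Hypothetical documentation of the drop:
* Household 100012 was interviewed incorrectly (reported by the
* field team). HHID is unique, so exactly one observation should
* match. If iedropone errors here, check whether HHID 100012 was
* changed, duplicated or removed earlier in the cleaning code.
iedropone if HHID == 100012
```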