Iefieldkit
Revision as of 21:00, 29 April 2020
Primary data collection and cleaning involve highly repetitive but extremely important processes that contribute to high-quality reproducible research. DIME Analytics has developed iefieldkit as a Stata package to standardize and simplify best practices involved in primary data collection. The iefieldkit package consists of commands that automate error-checking for electronic Open Data Kit (ODK)-based survey modules, duplicate checking and resolution, data cleaning (renaming, labeling, recoding), survey harmonization, and codebook creation.
Read First
- iefieldkit aims to provide Stata-based tools for managing the primary data collection process from start to finish.
- iefieldkit currently consists of four commands: ietestform, ieduplicates, iecompdup, and iecodebook.
- All commands in the package can be used independently, and are developed for use in a wide range of contexts.
- See the open-source code on GitHub for public contribution and comment.
- To install the package, type the following in the Stata Command window:
  ssc install iefieldkit
Overview
One of the most important developments in economics research over the past two decades has been the rise of empirical data collection, especially of unique primary datasets collected by the researchers themselves. The authors of iefieldkit have supported a wide range of primary data collection efforts in fields including agriculture, health, energy and environment, edutainment, financial and private sector development, fragility, conflict, violence, gender, governance, and transport. They have developed workflows to support general best practices for data collection. As a rule, they develop new packages only to fill an essential gap in Stata functionality. iefieldkit aims to provide Stata-based tools for managing the primary data collection process from start to finish.
All commands use spreadsheet-based workflows, so their inputs and outputs are far more human-readable than Stata do-files performing the same tasks would be. As a result, these tasks can be supported and reviewed by personnel who specialize in fieldwork rather than in coding. The increasing diversity and specialization of research teams has made accessibility for personnel who are not proficient in Stata an essential component of data management workflows, and this package takes that development seriously.
Commands
Before Data Collection
Before data collection, ietestform allows rapid error-checking of ODK-based electronic survey forms, including best practices for SurveyCTO-style forms. This ensures that data, once collected, will import into Stata in a Stata-friendly format, for example by avoiding name conflicts and ensuring that variable names and labels comply with Stata's rules.
ietestform complements the ODK syntax test on the SurveyCTO server. It runs tests that show researchers how to use ODK programming language features to ensure high data quality. The command is especially useful when the data that will be imported into Stata is subject to restrictions beyond ODK syntax, such as Stata's 32-character limit on variable names.
During Data Collection
During data collection, ieduplicates and iecompdup (both previously released as part of the ietoolkit package, but now moved to this package) provide a workflow for detecting and resolving duplicate entries in the dataset. This ensures that the final survey dataset is a correct record of the survey sample and can be merged onto the master sampling database.
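A minimal sketch of this workflow on Stata's bundled auto dataset follows. The names survey_id, submission_key, and duplicates_report.xlsx are hypothetical, and a duplicate is simulated by copying one observation. The sketch assumes iefieldkit is installed and uses the `using` / `uniquevars()` form of ieduplicates; option names may differ between releases, so check help ieduplicates and help iecompdup for the syntax of your installed version.

```stata
* Sketch only: simulate a duplicated ID, inspect it, then export a report.
* Assumes iefieldkit is installed; survey_id, submission_key and
* duplicates_report.xlsx are hypothetical names used for illustration.
sysuse auto, clear
generate survey_id = _n

* Simulate a second submission for ID 1: copy observation 1 to the end
* of the dataset, then change one value in the copy (the last observation).
expand 2 in 1
replace price = price + 100 in l

* A key that is unique for every row, like a form submission key.
generate submission_key = _n

* Compare the two observations with survey_id == 1 and list only the
* variables whose values differ between them.
iecompdup survey_id, id(1) didifference

* Start from a fresh report, then export every duplicated ID to a
* spreadsheet for review. In the intended workflow, resolutions are
* entered in this spreadsheet and applied when the command is rerun.
capture erase "duplicates_report.xlsx"
ieduplicates survey_id using "duplicates_report.xlsx", uniquevars(submission_key)
```

Keeping the resolution decisions in the spreadsheet report, rather than in hand-written `replace` or `drop` lines, means field staff can review and fill them in directly.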
After Data Collection
After data collection, the iecodebook commands provide a workflow for rapidly cleaning, harmonizing, and documenting datasets. iecodebook takes its instructions from an Excel spreadsheet, which gives a better-structured and easier-to-follow overview (especially for non-technical users) than the same operations written directly in a do-file.
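As a minimal sketch, the template, apply, and export subcommands of iecodebook can be chained on Stata's bundled auto dataset as below. The spreadsheet file names are hypothetical, and the manual editing step happens in Excel between the template and apply calls; check help iecodebook for the exact column layout and options of your installed version.

```stata
* Sketch only: iecodebook round trip on Stata's example data.
* Assumes iefieldkit is installed; auto_codebook.xlsx and
* auto_documentation.xlsx are hypothetical file names.
sysuse auto, clear

* 1. Write a spreadsheet template listing every variable in memory.
capture erase "auto_codebook.xlsx"
iecodebook template using "auto_codebook.xlsx"

* 2. (Manual step) Open auto_codebook.xlsx in Excel, enter new names,
*    labels and recodes for the variables to change, and save it.

* 3. Apply the instructions in the spreadsheet to the data in memory.
iecodebook apply using "auto_codebook.xlsx"

* 4. Export a codebook documenting the cleaned dataset.
capture erase "auto_documentation.xlsx"
iecodebook export using "auto_documentation.xlsx"
```

Because the spreadsheet holds every rename, label, and recode in one table, a field coordinator can review the cleaning decisions without reading the do-file.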
Additional Resources
- Visit the iefieldkit GitHub page: https://github.com/worldbank/iefieldkit