Difference between revisions of "Iefieldkit"

Line 1: Line 1:
===Summary===
<code>iefieldkit</code> is a package of Stata commands that standardizes and simplifies best practices for high-quality, [[Reproducible Research | reproducible]], [[Primary Data Collection | primary data collection]]. The package currently supports three major components of the data workflow: survey design; survey completion; and data cleaning and survey harmonization. This page explains the package and its commands.


iefieldkit provides a set of commands that enable a reproducible primary data collection and cleaning workflow. The package is developed to facilitate a workflow including (1) data collection (in particular using opendatakit.org, more specifically surveycto.com); (2) basic data cleaning, such as labeling and recoding; (3) reconciling survey rounds; (4) preparing codebooks to document data sets. iefieldkit was developed to standardize and simplify best practices for high-quality primary data collection across the World Bank's Development Research Group Impact Evaluations team (DIME). The commands can also be used independently, and are developed to be applicable to many other contexts as well. See https://github.com/worldbank/iefieldkit for more details, or read the DIME Wiki entries for:
==Read First==
*<code>iefieldkit</code> aims to provide Stata-based tools for managing the primary data collection process from start to finish.
*<code>iefieldkit</code> currently consists of four commands: <code>ietestform</code>, <code>ieduplicates</code>, <code>iecompdup</code>, and <code>iecodebook</code>.
*All commands in the package can be used independently, and are developed for use in a wide range of contexts.
*See the open-source code [https://github.com/worldbank/iefieldkit here] for public contribution and comment.


- [[ietestform]]
==Overview==


- [[ieduplicates]]
One of the most important developments in economics research over the past two decades has been the rise of empirical data collection, especially with unique primary datasets collected by the researchers themselves. The authors of <code>iefieldkit</code> have supported the implementation of a wide range of primary data collection in fields including agriculture, health, energy and environment, edutainment, financial and private sector development, fragility, conflict, violence, gender, governance, and transport. They have developed workflows to support general best practices for data collection. As a rule, they develop new packages only in order to fill an essential gap in Stata functionality. <code>iefieldkit</code> aims to provide Stata-based tools for managing the primary data collection process from start to finish.


- [[iecodebook]]
All commands utilize spreadsheet-based workflows so that their inputs and outputs are significantly more human-readable than the Stata do-files that would complete the same tasks. These tasks can therefore be supported and reviewed by personnel who specialize in field work rather than code tools. The increasing diversity and specialization of research teams has made accessibility to non-Stata-proficient personnel an essential component of data management workflows, and this package takes this development seriously.


===Details===
==Commands==


The iefieldkit  package is a set of commands designed to simplify a series of tedious and repetitive tasks for Stata users who are in the process of collecting primary survey data in the field. This package currently supports three major components of that workflow: survey design; survey completion; and data cleaning and survey harmonization.
===Before Data Collection===


One of the most important developments in economics research over the past two decades has been the rise of empirical data collection, especially with unique primary datasets collected by the researchers themselves. The authors of iefieldkit have supported the implementation of a wide range of primary data collection in fields including agriculture, health, energy and environment, edutainment, financial and private sector development, fragility, conflict, and violence, gender, governance, and transport. They have developed workflows to support general best practices for data collection, and as a rule develop new packages only when they fill an essential gap in Stata functionality. The packages here are a first attempt to provide Stata-based tools for managing the primary data collection process using native tools from start to finish.
Before data collection occurs, <code>ietestform</code> allows for rapid error-checking of ODK-based electronic surveys, including best practices for [[SurveyCTO Coding Practices | SurveyCTO]]-styled forms. This ensures that data, once collected, will import in Stata-friendly formats, for example by avoiding name conflicts and ensuring compliant variable naming and labelling.


Specifically, iefieldkit performs three essential tasks. Before data collection occurs, iefieldkit allows for rapid error-checking of ODK-based electronic surveys, including best practices for SurveyCTO-styled forms. This ensures that data, once collected, will import in Stata-friendly formats -- such as avoiding name conflicts and ensuring compliant variable naming and labelling. While data collection is ongoing, ieduplicates and iecompdup provide a workflow for detecting and resolving duplicate entries in the dataset, ensuring that the final survey dataset will be a correct record of the survey sample to merge onto the master sampling database. Finally, once data collection is complete, the iecodebook commands provide a workflow for rapidly cleaning, harmonizing, and documenting datasets.
===During Data Collection===


All three commands utilize spreadsheet-based workflows so that their inputs and outputs are significantly more human-readable than Stata dofiles completing the same tasks would be, and these tasks can be supported and reviewed by personnel who specialize in field work rather than code tools. The increasing diversity and specialization of research teams has made accessibility to non-Stata-proficient personnel an essential component of data management workflows, and the iefieldkit package takes this development seriously. The code is also open-source and available for public contribution and comment on GitHub at https://github.com/worldbank/iefieldkit.
During data collection, <code>ieduplicates</code> and <code>iecompdup</code> (both previously released as a part of the package <code>ietoolkit</code> but now moved to this package) provide a workflow for detecting and resolving duplicate entries in the dataset, ensuring that the final survey dataset will be a correct record of the survey sample to merge onto the master sampling database.
===After Data Collection===
 
After data collection, the <code>iecodebook</code> commands provide a workflow for rapidly [[Data Cleaning | cleaning]], harmonizing, and [[Data Documentation | documenting]] datasets. <code>iecodebook</code> uses input specified in an Excel sheet, which provides a better-structured and easier-to-follow overview – especially for non-technical users – than the same operations written directly in a do-file.
==Additional Resources==
 
[[Category: Stata]]

Revision as of 21:01, 10 May 2019

iefieldkit is a package of Stata commands that standardizes and simplifies best practices for high-quality, reproducible, primary data collection. The package currently supports three major components of the data workflow: survey design; survey completion; and data cleaning and survey harmonization. This page explains the package and its commands.

Read First

  • iefieldkit aims to provide Stata-based tools for managing the primary data collection process from start to finish.
  • iefieldkit currently consists of four commands: ietestform, ieduplicates, iecompdup, and iecodebook.
  • All commands in the package can be used independently, and are developed for use in a wide range of contexts.
  • See the open-source code here for public contribution and comment.

Overview

One of the most important developments in economics research over the past two decades has been the rise of empirical data collection, especially with unique primary datasets collected by the researchers themselves. The authors of iefieldkit have supported the implementation of a wide range of primary data collection in fields including agriculture, health, energy and environment, edutainment, financial and private sector development, fragility, conflict, violence, gender, governance, and transport. They have developed workflows to support general best practices for data collection. As a rule, they develop new packages only in order to fill an essential gap in Stata functionality. iefieldkit aims to provide Stata-based tools for managing the primary data collection process from start to finish.

All commands utilize spreadsheet-based workflows so that their inputs and outputs are significantly more human-readable than the Stata do-files that would complete the same tasks. These tasks can therefore be supported and reviewed by personnel who specialize in field work rather than code tools. The increasing diversity and specialization of research teams has made accessibility to non-Stata-proficient personnel an essential component of data management workflows, and this package takes this development seriously.

Commands

Before Data Collection

Before data collection occurs, ietestform allows for rapid error-checking of ODK-based electronic surveys, including best practices for SurveyCTO-styled forms. This ensures that data, once collected, will import in Stata-friendly formats, for example by avoiding name conflicts and ensuring compliant variable naming and labelling.
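
A minimal sketch of how a form check might look in a do-file. The file paths are hypothetical placeholders, and the syntax is given as recalled from the command's help file, so confirm it with help ietestform for the installed version.

```stata
* One-time installation of the package from SSC
ssc install iefieldkit, replace

* Hypothetical paths: point these at your own form definition and report file
local form   "survey/baseline_form.xlsx"
local report "survey/baseline_form_report.csv"

* Test the ODK/SurveyCTO form definition and save the flagged issues as a report
ietestform using "`form'", reportsave("`report'")
```

The report is a spreadsheet-readable file, so field staff who do not use Stata can review the flagged issues and correct the form definition directly.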

During Data Collection

During data collection, ieduplicates and iecompdup (both previously released as a part of the package ietoolkit but now moved to this package) provide a workflow for detecting and resolving duplicate entries in the dataset, ensuring that the final survey dataset will be a correct record of the survey sample to merge onto the master sampling database.
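
A sketch of the duplicates workflow, under several assumptions: the data set path, the ID variable hhid, the submission key variable key, and the ID value 1001 are all hypothetical. The ieduplicates syntax shown (using plus the force option) is what recent versions of the package are understood to accept; earlier releases took a folder() option instead, so check help ieduplicates before relying on it.

```stata
* Hypothetical raw data with an ID variable (hhid) and a unique submission key (key)
use "data/raw_survey.dta", clear

* Compare, variable by variable, the two submissions that share one ID value
* (1001 is a made-up ID used only for illustration)
iecompdup hhid, id(1001)

* Write all duplicates in hhid to an Excel report for the field team to resolve.
* Assumption: force acknowledges that unresolved duplicates are removed from
* the data in memory until the report says how to handle them.
ieduplicates hhid using "data/duplicates_report.xlsx", uniquevars(key) force
```

Running iecompdup first keeps both copies of the duplicated observation in memory, which the comparison needs.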

After Data Collection

After data collection, the iecodebook commands provide a workflow for rapidly cleaning, harmonizing, and documenting datasets. iecodebook uses input specified in an Excel sheet, which provides a better-structured and easier-to-follow overview – especially for non-technical users – than the same operations written directly in a do-file.
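
The cleaning workflow can be sketched in two steps. The paths below are hypothetical, and the subcommands are given as recalled from the help file, so confirm them with help iecodebook.

```stata
* Hypothetical path to the de-duplicated survey data
use "data/deduplicated_survey.dta", clear

* Step 1: create an Excel codebook template listing the variables in memory
iecodebook template using "documentation/cleaning_codebook.xlsx"

* Step 2: after new names, labels, and recodes are filled in in the spreadsheet,
* apply them to the data in memory
iecodebook apply using "documentation/cleaning_codebook.xlsx"
```

In practice the two steps run at different times: the template is created once, edited in Excel, and then applied each time the do-file is run.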

Additional Resources