Difference between revisions of "Iefieldkit"
Revision as of 17:50, 30 April 2020
Primary data collection and cleaning involve highly repetitive but extremely important processes that contribute to high-quality, reproducible research. DIME Analytics has developed iefieldkit as a Stata package to standardize and simplify best practices in primary data collection. iefieldkit consists of commands that automate error-checking for electronic Open Data Kit (ODK)-based survey modules, duplicate checking and resolution, data cleaning and survey harmonization, and codebook creation.
Read First
- DIME Analytics Bootcamp on Reproducible Research.
- Stata coding practices.
- iefieldkit currently consists of four commands: ietestform, ieduplicates, iecompdup, and iecodebook.
- Each of these commands can be used independently in a wide range of contexts.
- The iefieldkit open-source code is available on GitHub for public contribution and comment.
- To install the package, type ssc install iefieldkit in the Stata command box.
Objective
One of the most important developments in economics over the past two decades has been the rise of empirical research, through primary as well as secondary data collection. The authors of iefieldkit have developed the package to support data collection by researchers directly in a wide range of fields like agriculture, health, energy and environment, transport, financial and private sector development, gender, governance, and fragility, conflict and violence (FCV). iefieldkit therefore supports general best practices in primary data collection from start to finish:
- Before data collection: ietestform.
- During data collection: ieduplicates and iecompdup.
- After data collection: iecodebook.
The four commands in this package keep inputs and outputs human-readable by working with spreadsheets instead of Stata do-files. This allows field personnel who do not specialize in code tools to understand and review the tasks involved in primary data collection. iefieldkit thus recognizes the vital role played by field personnel in supporting data management and data cleaning, even if they are not proficient in Stata.
Before Data Collection
In Open Data Kit (ODK)-based electronic survey kits, including SurveyCTO, survey forms (or questionnaires) are typically built in Excel using a specialized structured syntax. Before the research team starts field data collection, they can use ietestform to test ODK-based electronic survey forms for common errors and for adherence to best practices for SurveyCTO-based forms.
The SurveyCTO server has a built-in test feature that checks the ODK syntax of a form when the research team uploads it. ietestform complements these built-in tests to help ensure that the collected data is in a format that is easily readable in Stata and is of high quality.
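As a rough illustration, a call to ietestform might look like the sketch below. Both file paths are hypothetical placeholders, and the using/reportsave() syntax is assumed from recent releases of the package. Option names have changed between versions, so confirm them with help ietestform before relying on this.

```stata
* Minimal sketch; both file paths are hypothetical placeholders.
* Syntax assumed from recent iefieldkit releases: check -help ietestform-.
ietestform using "questionnaire/baseline_form.xlsx", ///
    reportsave("questionnaire/baseline_form_report.csv")
```

The report is written as a separate file, so it can be shared with team members who do not work in Stata.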
During Data Collection
During data collection, ieduplicates and iecompdup (both previously released as a part of the package ietoolkit but now moved to this package) provide a workflow for detecting and resolving duplicate entries in the dataset, ensuring that the final survey dataset will be a correct record of the survey sample to merge onto the master sampling database.
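The duplicates workflow could be sketched roughly as follows. The dataset, the variable names hhid and key, the ID value 55424, and all file paths are hypothetical, and the syntax is assumed from recent releases of the package, so check help ieduplicates and help iecompdup for the version installed.

```stata
* Minimal sketch; dataset, variable names, ID value, and paths are hypothetical.
use "data/raw/baseline_raw.dta", clear

* Flag duplicates in the ID variable and list them in an Excel report.
* uniquevars() names a variable that is unique for every submission
* (in SurveyCTO data this is commonly the variable KEY).
ieduplicates hhid using "documentation/duplicates_report.xlsx", ///
    uniquevars(key)

* Reload the raw data before comparing, on the assumption that
* ieduplicates drops unresolved duplicates from the data in memory.
use "data/raw/baseline_raw.dta", clear

* Compare the two submissions that share one ID value to see where they differ.
iecompdup hhid, id(55424)
```

Decisions about which submission to keep are then recorded in the Excel report rather than in the do-file, which keeps the resolution process reviewable by field staff.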
After Data Collection
After data collection, the iecodebook commands provide a workflow for rapidly cleaning, harmonizing, and documenting datasets. iecodebook takes its input from an Excel sheet, which gives a more structured and easier-to-follow overview, especially for non-technical users, than the same operations written directly in a do-file.
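A typical sequence might look like the sketch below. The dataset and codebook paths are hypothetical, the steps are normally run at separate times (the template is filled in by hand in Excel between the first and second step), and the subcommand syntax is assumed from recent releases, so confirm it with help iecodebook.

```stata
* Minimal sketch; dataset and codebook paths are hypothetical.
use "data/raw/baseline_raw.dta", clear

* Step 1: create an Excel template listing the variables in memory.
iecodebook template using "documentation/cleaning_codebook.xlsx"

* Step 2 (only after the template has been filled in by hand in Excel):
* apply the renames, labels, and recodes specified there.
iecodebook apply using "documentation/cleaning_codebook.xlsx"

* Step 3 (optional): document the cleaned dataset as a codebook.
iecodebook export using "documentation/final_codebook.xlsx"
```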
Additional Resources
- Visit the iefieldkit GitHub page.