Difference between revisions of "Primary Data Collection"

(104 intermediate revisions by 5 users not shown)
Line 1: Line 1:
<onlyinclude>
'''Primary data collection''' is the process of gathering data through [[Field Surveys|surveys]], interviews, or experiments. A typical example of primary data is '''household surveys'''. In this form of data collection, researchers can personally ensure that primary data meets the standards of [[Monitoring Data Quality | quality]], availability, [[Power Calculations in Stata | statistical power]] and [[Sampling & Power Calculations | sampling]] required for a particular research question. With globally increasing access to specialized [[Software Tools |survey tools]],  [[Survey Firm | survey firms]], and [[Training Guidelines: Content and Structure|field manuals]], primary data has become the dominant source for empirical inquiry in development economics.
'''Primary data collection''' is the process of gathering data through [[Field Surveys|surveys]], interviews, or experiments. A typical example of primary data is '''household surveys'''. In this form of data collection, researchers can personally ensure that primary data meets the standards of [[Monitoring Data Quality | quality]], availability, [[Power Calculations in Stata | statistical power]] and [[Sampling & Power Calculations | sampling]] required for a particular research question. With globally increasing access to specialized [[Software Tools |survey tools]],  [[Survey Firm | survey firms]], and field manuals, primary data has become the dominant source for empirical inquiry in development economics.
</onlyinclude>
== Read First ==
== Read First ==
*[https://github.com/worldbank/dime-standards/blob/master/dime-research-standards/README.md The DIME Research Standards] provide a comprehensive checklist to ensure that collection and handling of research data is in line with global best practices.
* [[Field Surveys|Field surveys]] are one of the most effective media for '''primary data collection'''. Depending on the research question, these interviews may take the form of household surveys, business (firm) surveys, or agricultural (farm) surveys.
*[[Field Surveys|Personal interviews]] are one of the most effective media for primary data collection. Depending on the research question, these interviews may take the form of household surveys, business (firm) surveys, or agricultural (farm) surveys.
* The [[Impact Evaluation Team|research team]] must [[Field Management|plan]] and [[Preparing for Field Data Collection|prepare]] for '''primary data collection''' in advance.
*<code>[[iefieldkit]]</code> is a Stata package that aids primary data collection. It currently supports three major components of that workflow: [[Questionnaire Design|survey design]]; survey completion; and [[Data Cleaning|data-cleaning]] and [[Iefieldkit|survey harmonization]].
* <code>[[iefieldkit]]</code> is a Stata package that aids '''primary data collection'''. It currently supports three major components of this process: [[Iefieldkit#Before Data Collection|testing survey instruments]]; [[Iefieldkit#During Data Collection|survey completion]]; and [[Data Cleaning|data-cleaning]] and [[Iefieldkit#After Data Collection|survey harmonization]].


== Guidelines ==
== Overview ==
While impact evaluations often benefit from [[Secondary Data Sources|secondary sources of data]] like administrative data, census data, or household data, these  sources may not always be available. In such cases, researchers need to collect data directly through a series of [[Questionnaire Design|well-designed]] interviews and [[Field Surveys|surveys]]. The process of collecting primary data requires a great deal of foresight, [[Field Management|planning]] and coordination.
While impact evaluations often benefit from [[Secondary Data Sources|secondary sources of data]] like [[Administrative and Monitoring Data|administrative data]], census data, or household data, these  sources may not always be available. In such cases, the [[Impact Evaluation Team|research team]] will need to collect data directly using well-designed [[Computer-Assisted Personal Interviews (CAPI)|interviews]] and [[Field Surveys|surveys]], and the '''research team''' typically owns the data that it collects. However, even then, the research team must keep in mind certain [[Research Ethics|ethical concerns]] related to owning and handling sensitive, or [[Personally Identifiable Information (PII)|personally identifiable information (PII)]].
Listed below are the crucial steps involved in the [[Preparing for Field Data Collection | preparation and collection]] of primary data.


=== Pre-register research ===
Before moving on to the discussion of concerns about ownership and handling, however, it is important to understand the process of '''collecting primary data'''. The process of '''primary data collection''' consists of several steps, from [[Questionnaire Design|questionnaire development]] to [[Enumerator Training|enumerator training]]. Each of these steps is listed below and requires detailed [[Field Management|planning]] and coordination among the members of the '''research team'''.
The first step with any new research project is to [[Pre-Registration | pre-register]] your research, including the methodology, and draft a [[Pre-Analysis Plan | pre-analysis plan]].


=== Acquire approval from human subjects ===
== Develop Questionnaire ==  
There are strict rules about [[Human Subjects Approval | acquiring approval from human subjects]]. Researchers must understand the [[Research Ethics|ethics]] and rules for [[Data Security|security of sensitive data]], and should use proper tools for [[Encryption | encryption]] and [[De-identification | de-identification]] of [[Personally Identifiable Information_(PII)|personally identifiable information (PII)]].
The first step of '''primary data collection''' is to [[Questionnaire Design|design a survey instrument]] (or '''questionnaire'''). It is important to remember that drafting a questionnaire from scratch can be a time-consuming process, so the [[Impact Evaluation Team|research team]] should try to use existing resources as far as possible. While '''developing''' the questionnaire, keep the following things in mind:
* '''Plan.''' The '''research team''' should start with a clear understanding of the [[Theory of Change|theory of change]] for the project. List key outcomes of interest, and the main variables that can be used to measure these outcomes. A good starting point for this is the [[Pre-Analysis Plan|pre-analysis plan]].


=== Compile the survey budget ===
* '''Modules.''' Divide the questionnaire into individual modules, each with a group of questions that are related to one aspect of the [[Field Surveys|survey]]. Unless the context of the study is entirely new, perform a [[Literature Review for Questionnaire|literature review]] of existing well-tested and reliable surveys to prepare the general structure of the questionnaire. One example of a resource for past studies and questionnaires is the [https://microdata.worldbank.org/index.php/home World Bank Microdata Library].
Researchers must prepare a [[Survey Budget | survey budget]] before [[Procuring a Survey Firm|procuring a survey firm]]. This step allows researchers to calculate expected costs of conducting a study, and compare these with the proposals of firms that submit an '''expression of interest (EOI)'''.


=== Determine relevant parameters of a study ===
* '''Measurement challenges.''' Often, '''research teams''' face challenges in measuring certain outcomes, for instance, abstract concepts (like empowerment), or socially sensitive topics that people do not wish to talk about (like drug abuse). In such cases, try to use indicators that are easy to identify, or build a level of comfort with respondents before moving to the sensitive topics.
After agreeing upon a budget, researchers then decide upon factors like the adequate '''sampling frame''' (which is a list of individuals or units in a population from which a sample can be drawn), [[Sample Size | sample size]], and [[Sampling & Power Calculations | statistical power]] based on which they can then [[Randomized_Control_Trials|randomize treatment]].


=== Procure a survey firm ===
* '''Translation.''' [[Questionnaire Translation|Translating]] the questionnaire is a very important step. The '''research team''' must hire only professional translators to translate the questionnaire into all local languages that are spoken in the study location.
The next step is to [[Procuring a Survey Firm|procure a survey firm]] after issuing detailed [[Survey Firm TOR|terms of reference (TOR)]], and performing due diligence among local research firm options.


=== Carry out a pre-pilot===
== Pilot Questionnaire ==
The '''first stage''' of the [[Survey Pilot|survey pilot]], the '''pre-pilot''', involves two things: [[Piloting Survey Content |piloting content]] and [[Piloting Survey Protocols| piloting protocols]]. Clear protocols allow researchers to ensure that [[Preparing for Field Data Collection|field collection]] is carried out consistently across teams and/or regions, and ensure that published [[Reproducible_Research|research is reproducible]].
[[Survey Pilot|Survey pilot]] is the process of carrying out interviews and tests on different components of a survey, including [[Checklist: Content-focused Pilot|content]] and [[Checklist:_Data-focused_Pilot|protocols]]. A good '''pilot''' provides the research team with important feedback before they start the process of [[Primary Data Collection|data collection]]. This feedback can help the [[Impact Evaluation Team|research team]] review and improve [[Questionnaire Design|instrument design]], [[Questionnaire Translation|translations]], as well as [[Survey Protocols|survey protocols]] related to [[Checklist:_Piloting_Survey_Protocols|interview scheduling]], [[Sampling|sampling]], and [[Geo Spatial Data|geo data]].


=== Refine and review the survey design ===
A '''pilot''' has [[Survey Pilot#Stages of a Survey Pilot|three stages]] - '''pre-pilot''', [[Checklist: Content-focused Pilot|content-focused pilot]], and [[Checklist:_Data-focused_Pilot|data-focused pilot]]. Typically, the '''pilot''' is carried out before [[Procuring a Survey Firm|hiring a survey firm]]. The '''research team''' must draft a clear [[Survey Pilot#Timeline|timeline]] for the '''pilot''', and allocate enough time for each component of the pilot. [https://www.worldbank.org/en/research/dime/data-and-analytics DIME Analytics] has also created the following checklists to assist researchers and [[Survey Pilot Participants|enumerators]] in preparing for, and implementing a pilot:
The '''first stage''' of the [[Survey Pilot|survey pilot]] allows researchers to develop a [[Questionnaire_Design|design]] for the instrument. The researchers then conduct the '''second stage''' of the survey pilot, called [[Piloting_Survey_Content|content-focused pilot]], to review and refine the structure of the instrument.
*[[Preparing_for_the_survey_checklist|Checklist: Preparing for a survey pilot]]
*[[Checklist:_Content-focused_Pilot|Checklist: Refining questionnaire content]]
*[[Checklist:_Data-focused_Pilot|Checklist: Refining questionnaire data]]


=== Translate the survey instrument ===
== Pilot Recruitment Strategy ==
After the content-focused pilot, the research firm [[Questionnaire_Translation|translates the instrument]] into all local languages. This step helps to ensure that the survey can be taken by more people, therefore making the study more effective.
Besides testing [[Questionnaire Design|content]] and [[Survey Protocols|protocols]], it is also important for the [[Impact Evaluation Team|research team]] to '''pilot recruitment strategy''' before starting data collection. This is especially important in the following cases:
* '''Smaller sample.''' Before finalizing the target population, the '''research team''' often surveys a smaller sample of the population that is selected for an intervention. The research team can use this to finalize the '''sampling frame''' (which is a list of individuals or units in a population from which a sample can be drawn), [[Sampling_&_Power_Calculations | sample size]], and [[Power_Calculations | statistical power]] based on which they can then start data collection.


=== Program the instrument ===
* '''Low or unknown take-up.''' '''Take-up rate''' is the percentage of eligible people who accept a benefit, or participate in data collection. Sometimes the '''take-up rates''' can be low, or unknown. In such cases, the research team should reconsider the [[Sampling|sampling]] strategy, and test it before starting data collection.  
After obtaining [[IRB Approval]], researchers [[Questionnaire Programming|program the questionnaire]]. This step makes it easier to share surveys that rely on methods like [[Computer-Assisted Personal Interviews (CAPI)]] or [[Computer-Assisted Field Entry (CAFE)]].
<br/> Also refer to [[SurveyCTO_Coding_Practices|SurveyCTO coding practices]] to learn more about programming surveys.


=== Train enumerators and monitor data quality ===
* '''Different participation rates.''' Sometimes participation can differ based on factors like gender, age, social status, etc. This requires the research team to consider different strategies like [[Stratified Random Sample|stratified sampling]].
After validating the programming of the questionnaire, the researchers [[Enumerator Training | train enumerators]] and [[Monitoring_Data_Quality|monitor data quality]] to generate a '''final draft''' of the instrument. '''Monitoring''' can be done in the form of [[Back_Checks|back checks]], [[Monitoring Data Quality#High Frequency Checks|high frequency checks]], as well as other methods.


=== Maintain  an organized data folder ===
One of the ways to test the '''recruitment strategy''' is to test 3 different recruitment strategies, say, '''A''', '''B''', and '''C'''. The research team can then finalize the strategy that has the highest '''take-up rates'''. Another method is identifying the ideal incentives which can ensure higher participation by the eligible population. After finalizing a recruitment strategy, the research team can move on to drafting the [[Survey Firm TOR|terms of reference (TOR)]] for the data collection.
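
The comparison of recruitment strategies described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pilot records, the strategy labels, and the function names <code>take_up_rates</code> and <code>best_strategy</code> are hypothetical and not part of any DIME tool. With a pilot this small, differences between strategies may simply reflect noise, so the result should be read as a rough guide.

```python
from collections import Counter


def take_up_rates(records):
    """Return {strategy: share of invited people who took part}.

    `records` is a list of (strategy, took_part) pairs, one per invited person.
    """
    invited = Counter()
    accepted = Counter()
    for strategy, took_part in records:
        invited[strategy] += 1
        if took_part:
            accepted[strategy] += 1
    return {s: accepted[s] / invited[s] for s in invited}


def best_strategy(records):
    """Return the strategy with the highest take-up rate."""
    rates = take_up_rates(records)
    return max(rates, key=rates.get)


# Hypothetical pilot data: four invitations per strategy.
pilot = [
    ("A", True), ("A", False), ("A", False), ("A", False),
    ("B", True), ("B", True), ("B", True), ("B", False),
    ("C", True), ("C", True), ("C", False), ("C", False),
]

rates = take_up_rates(pilot)
print(rates)
print("Highest take-up:", best_strategy(pilot))
```

In practice the research team would also weigh the cost of each strategy and any incentives offered, which this sketch does not model.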
DIME has created a Stata package, <code>[[iefolder]]</code>. Part of the DIME Analytics <code>[[ietoolkit]]</code>, this package helps increase project efficiency, and reduces the risk of error in a study.


== Back to Parent ==
== TOR: Create Budget and Plan Fieldwork ==
This article is part of the [[Main Page|DIME Wiki main page]]
After finalizing the '''survey instrument''' and '''recruitment strategy''', the [[Impact Evaluation Team|research team]] must prepare a detailed [[Survey Firm TOR|terms of reference (TOR)]] for hiring a [[Survey Firm|survey firm]]. The '''terms of reference (TOR)''' define the structure of the project, as well as the responsibilities of the '''survey firm'''. While preparing the '''TOR''', the '''research team''' must create a [[Survey Budget|survey budget]], and [[Field Management|plan the fieldwork]].
=== Create budget ===
The '''research team''' should calculate standard, as well as project-specific costs, and prepare a '''survey budget'''. In this stage, the research team should also consider what [[Sampling|sample size]] it can afford for the data collection. This allows the research team to calculate expected costs of conducting a study, and compare these with the proposals of [[Survey Firm|survey firms]] that respond to the '''terms of reference (TOR)'''.
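
As a rough illustration of how a budget translates into an affordable sample size, the sketch below splits the budget into fixed costs and a per-interview cost. The function name <code>affordable_sample_size</code> and all figures are hypothetical. A real survey budget has many more line items, and the result must still be checked against the sample size required for adequate statistical power.

```python
def affordable_sample_size(total_budget, fixed_costs, cost_per_interview):
    """Return the largest number of interviews the budget can pay for.

    Fixed costs (training, equipment, management) are paid first; the rest
    of the budget is divided by the variable cost of one interview.
    """
    if cost_per_interview <= 0:
        raise ValueError("cost_per_interview must be positive")
    variable_budget = total_budget - fixed_costs
    if variable_budget < 0:
        return 0
    return int(variable_budget // cost_per_interview)


# Hypothetical figures, all in the same currency.
n_affordable = affordable_sample_size(
    total_budget=100_000, fixed_costs=28_000, cost_per_interview=30
)
print(n_affordable)
```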
 
=== Plan fieldwork ===
It is also important to plan '''fieldwork''' in advance to give potential '''survey firms''' an idea of the responsibilities and tasks involved in the data collection. For the [[Impact Evaluation Team#Field Coordinators|field coordinators (FCs)]], this includes deciding number of interviews each enumerator will conduct in a day, number of field teams, modes of transport, and keeping extra buffer time for possible delays. Similarly, for the survey firm, this involves defining basic parameters like [[Sampling & Power Calculations|sample size]], [[Sampling|sampling strategy]], [[Timeline of Survey Pilot|timeline]], etc.
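
The fieldwork parameters mentioned above (interviews per enumerator per day, number of enumerators, and buffer time) can be combined into a first estimate of the fieldwork duration. The sketch below is a simplified illustration: the function name <code>fieldwork_days</code>, the 15% buffer, and the example figures are assumptions, and it ignores travel days, rest days, and revisits.

```python
import math


def fieldwork_days(sample_size, enumerators, interviews_per_day, buffer_percent=15):
    """Estimate working days needed, including a buffer for delays."""
    daily_capacity = enumerators * interviews_per_day
    if daily_capacity <= 0:
        raise ValueError("daily capacity must be positive")
    base_days = math.ceil(sample_size / daily_capacity)
    # Round the buffer up so that it is never lost to rounding.
    buffer_days = math.ceil(base_days * buffer_percent / 100)
    return base_days + buffer_days


# Hypothetical plan: 2,400 interviews, 20 enumerators, 4 interviews each per day.
days = fieldwork_days(sample_size=2400, enumerators=20, interviews_per_day=4)
print(days)
```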
 
== Contract Survey Firm ==
After the [[Impact Evaluation Team|research team]] finalizes and issues the '''terms of reference (TOR)''', multiple [[Survey Firm|survey firms]] can express interest in signing a contract with the '''research team.''' The research team will then select one of these '''survey firms''', and sign a contract with the selected survey firm. This completes the process of [[Procuring a Survey Firm|procuring a survey firm]].
 
After signing the contract, the research team and the survey firm should agree on the parameters defined in the '''terms of reference (TOR)''', the '''survey timeline''', and discuss possible scenarios and common issues that might arise during data collection. One such issue that the research team and the survey firm must discuss in detail is the [[Data Quality Assurance Plan|data quality assurance plan]].
 
== Data Quality Assurance Plan ==
The [[Impact Evaluation Team|research team]] must draft a [[Data Quality Assurance Plan|data quality assurance plan]], and share it with everyone in the '''research team''', as well as the [[Survey Firm|survey firm]] before starting with data collection. A '''data quality assurance plan''' considers everything that could go wrong ahead of time, and makes a plan to resolve these issues. Some of the issues that can affect [[Monitoring Data Quality|data quality]] include errors in [[Questionnaire Programming|programming]] or [[Questionnaire Translation|translation]], '''attrition''' (or dropping out of respondents during a [[Field Surveys|survey]]), and faulty '''tablets''' used during [[Computer-Assisted Personal Interviews (CAPI)|computer-assisted personal interviews (CAPI)]], among others. A comprehensive '''data quality assurance plan''' has 3 major components for each of the following stages - '''before''', '''during''', and '''after''' data collection.
=== Before data collection ===
Before data collection, the '''research team''' can include the following in the '''data quality assurance plan''':
* '''Survey design and programming.''' Make sure the [[Questionnaire Design|instrument design and structure]] are in line with the context of the study. Use the [[Survey Pilot|pilot]] to review and revise the '''instrument'''. Hire a professional translator to perform the [[Questionnaire Translation|translation]]. Check the [[Questionnaire Programming|programmed instrument]] for bugs, and make sure all '''skip patterns''' and '''repeat groups''' work properly.
 
* '''Enumerator training.''' [[Enumerator Training|Train enumerators]] and conduct regular feedback sessions with them to refine the [[Checklist: Content-focused Pilot|survey content]] and [[Checklist:_Piloting_Survey_Protocols|protocols]]. Wherever possible, conduct [[Survey Pilot#Pen-and-Paper Pilots|pen-and-paper pilots]], since in that case it is easier for '''enumerators''' to write down the issues they are facing. Make sure enumerators conduct several practice interviews before the actual [[Field Surveys|fieldwork]] starts.
 
=== During data collection ===
During data collection, the '''research team''' can include the following in the '''data quality assurance plan''':
* '''Communication and reporting.''' Clear communication is important to ensure that both '''enumerators''' and '''respondents''' are able to understand the questions in the survey instrument. It also allows [[Impact Evaluation Team#Field Coordinators (FCs)|field coordinators]] to regularly discuss issues faced by '''enumerators'''. For instance, enumerators may face issues like faulty equipment or connectivity issues, which can affect the quality of data.
 
* '''Field monitoring.''' '''Field coordinators (FCs)''' and '''supervisors''' should monitor the performance of enumerators. There should be a clear list of parameters that supervisors will use to judge performance. They should also share useful feedback after (not during) the interviews, and ensure that respondents are able to understand the questions correctly. Ask supervisors to fill in a '''tracking sheet''' or a form that records observations about each enumerator who works under a supervisor.
 
* '''Minimize attrition.''' There are several reasons for '''attrition''' of respondents. For instance, it is possible that the respondent moved away from the location of the study, or refuses to participate. It is important to first identify the reason for '''attrition'''. Generally, attrition rates of more than 5% are considered poor, and the research team must try to resolve these issues. High attrition rates can affect the quality of data and introduce bias in the  results of a study.
 
* '''Back checks and real-time checks.''' At the same time, it is also important to constantly [[Monitoring Data Quality|monitor quality]] of every new round of data shared by the '''field teams'''. In a [[Back Checks|back check]], an experienced enumerator asks some selected questions to the respondent again to compare the answers. Similarly, supervisors can conduct real-time data quality checks (or '''spot checks'''), and [[Monitoring_Data_Quality#High_Frequency_Checks|high frequency checks]] to check the quality of responses.
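
Two of the checks in the list above, the attrition rate and the back check comparison, lend themselves to a short illustration. The sketch below is not the method used by any particular DIME tool. The function names, respondent IDs, and question names are hypothetical, and the 5% threshold is taken from the guideline stated in the list.

```python
ATTRITION_FLAG = 0.05  # guideline from the text: above 5% is considered poor


def attrition_rate(baseline_ids, followup_ids):
    """Share of baseline respondents who were not interviewed again."""
    baseline = set(baseline_ids)
    if not baseline:
        raise ValueError("baseline_ids must not be empty")
    lost = baseline - set(followup_ids)
    return len(lost) / len(baseline)


def back_check_mismatch_rate(original, back_check, questions):
    """Share of re-asked answers that differ from the original interview.

    `original` and `back_check` map respondent ID -> {question: answer}.
    """
    compared = 0
    mismatches = 0
    for respondent_id, recheck in back_check.items():
        first = original.get(respondent_id)
        if first is None:
            continue  # no original interview to compare against
        for question in questions:
            if question in first and question in recheck:
                compared += 1
                if first[question] != recheck[question]:
                    mismatches += 1
    return mismatches / compared if compared else 0.0


# Hypothetical data: 20 baseline respondents, 18 found again at follow-up.
attrition = attrition_rate(range(1, 21), range(1, 19))
print(attrition, "flag" if attrition > ATTRITION_FLAG else "ok")

original = {
    101: {"hh_size": 5, "owns_land": "yes"},
    102: {"hh_size": 3, "owns_land": "no"},
    103: {"hh_size": 6, "owns_land": "yes"},
}
back_check = {
    101: {"hh_size": 5, "owns_land": "yes"},
    102: {"hh_size": 4, "owns_land": "no"},
}
mismatch = back_check_mismatch_rate(original, back_check, ["hh_size", "owns_land"])
print(mismatch)
```

A high mismatch rate does not by itself show enumerator error, since some answers can legitimately change between visits. For that reason back checks usually focus on questions whose answers should be stable.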
 
=== After data collection ===
After data collection ends, the [[Survey Firm|survey firm]] usually provides a final '''field report'''. This report can be used to improve data quality in the last stage of the data collection process. It can provide qualitative information to the '''research team''' about everything that could not be captured by the survey instrument, such as:
* '''Issues in understanding.''' Sometimes respondents do not understand a question and answer randomly. This information is especially important for the research team if a study or experiment only shows marginal impact.
 
* '''Limited option choices.''' Sometimes respondents may convey that the option choices for particular questions were not comprehensive, which can also affect the quality of data.
 
* '''Other feedback.''' It also allows the research team to understand issues like size and structure of the communities that were part of the [[Sampling|sample]]. Such information is often useful to weight each group within a sample differently, which can improve accuracy of results.
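
One common way to weight groups differently, as mentioned in the last point above, is to scale each group by the ratio of its population share to its sample share. The sketch below assumes that the field report (or another source) provides community sizes. The function name <code>group_weights</code> and the figures are hypothetical, and the appropriate weighting scheme always depends on the sampling design.

```python
def group_weights(population_sizes, sample_sizes):
    """Return {group: population share / sample share} for each group."""
    total_population = sum(population_sizes.values())
    total_sample = sum(sample_sizes.values())
    weights = {}
    for group, sampled in sample_sizes.items():
        if sampled == 0:
            raise ValueError(f"no sampled units in group {group!r}")
        weights[group] = (population_sizes[group] * total_sample) / (
            total_population * sampled
        )
    return weights


# Hypothetical communities: equal samples drawn from unequal populations.
weights = group_weights(
    population_sizes={"community_1": 600, "community_2": 400},
    sample_sizes={"community_1": 50, "community_2": 50},
)
print(weights)
```

Here the larger community is under-represented in the sample, so its observations receive a weight above 1, while the smaller community receives a weight below 1.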
 
== Obtain Ethical Approval ==
Members of the [[Impact Evaluation Team|research team]] must ensure that they [[Protecting Human Research Subjects|protect the rights]] of all '''human subjects''' in a study, including the '''right to privacy'''. In this context, all living individuals whose sensitive or [[Personally Identifiable Information (PII)|personally identifiable information (PII)]] is contained in the data collected by the '''research team''' are considered '''human subjects'''. In this step, the '''research team''' must consider issues like [[IRB Approval|IRB approvals]], [[Informed Consent|informed consent]], and [[Data Security|data security]].
 
=== IRB approvals ===
The [[Impact Evaluation Team|research team]] must obtain [[IRB Approval|IRB approvals]] for studies that use [[Personally Identifiable Information (PII)|personally identifiable information]]. '''Institutional review boards (IRBs)''' are organizations that review and monitor research studies to ensure the welfare of '''human subjects'''.
 
In addition to '''IRB approvals''', the research team should also obtain approvals from local institutions in the location of the study. This will ensure that the study complies with local regulations, and does not violate any laws in that area, particularly with respect to the '''right to privacy'''.
 
=== Informed consent ===
Before involving any individual in a research study, the [[Impact Evaluation Team|research team]] must obtain [[Informed Consent|informed consent]] from each individual. This means that the '''research team''' must clearly mention all possible risks and benefits from participating in a study, either for the [[Survey Pilot|survey pilot]] or as a [[Survey Pilot Participants|respondent]] in the actual data collection.
 
=== Data security ===
The [[Impact Evaluation Team|research team]] must understand the [[Research Ethics|ethics]] and rules for [[Data Security|data security]], and should use proper tools for [[Encryption | encryption]] and [[De-identification | de-identification]] of [[Personally Identifiable Information_(PII)|personally identifiable information (PII)]]. '''Data security''' also means ensuring that members of the '''research team''' who are not listed by the [[IRB Approval|IRB]] cannot access any '''confidential data'''. Data can be '''confidential''' for multiple reasons, but the most common reason is that it contains '''personally identifiable information (PII)'''. Another reason is that the data was shared under a data usage license that requires it to be kept confidential.
 
== Train Enumerators ==
Finally, the [[Impact Evaluation Team|research team]] must plan and conduct comprehensive [[Enumerator Training|enumerator training]]. '''Enumerator training''' is usually a joint effort between the '''research team''' and the [[Survey Firm|survey firm]]. The [[Training Guidelines: Content and Structure|content and structure]] of the training can be divided into the following sections:
=== Objectives ===
The purpose of the training should be to ensure that all '''field staff''' know all the [[Survey Protocols|survey protocols]]. Also ensure that '''enumerators''' understand all questions in the [[Questionnaire Design|survey instrument]]. They should also be comfortable with using '''tablets''' used in [[Computer-Assisted Personal Interviews (CAPI)|CAPI]], or paper forms used for [[Pen-and-Paper Personal Interviews (PAPI)|PAPI]]. Finally, ensure that all '''field staff''' know and understand their duties.
 
=== Planning ===
In terms of planning, the '''survey firm''' should coordinate with the research team on logistics, such as deciding a venue for the training, printing '''field manuals''', questionnaires, and the agenda for each session. It is a good idea to recruit '''enumerators''' and experienced trainers in advance. The [[Impact Evaluation Team#Field Coordinators (FCs)|field coordinator (FC)]] should finalize the '''field manual''', update the [[Enumerator Training#Enumerator Manual|training manual]], and also make sure the trainers are aware of the objectives and the context of the study. Finally, prepare quizzes for the assessments, and plan practice interviews for enumerators.
=== Components ===
Generally, the '''survey firm''' leads the training, while the '''field coordinator (FC)''' monitors the sessions. The training should explain the context and [[Piloting Survey Content|content]] of the questionnaire,  [[Experimental Methods|methods]] used for data collection, [[Sampling|sample selection]], and [[Survey Protocols|protocols]]. It is also important to anticipate potential issues that enumerators may face, and train them on how to handle these issues.
=== Assessment ===
Conduct assessments and quizzes to select enumerators for the actual data collection. Select enumerators based on scores on these quizzes, observations of the '''field coordinator (FC)''' and '''supervisors''', communication skills, and familiarity with the '''survey instrument'''. Always train more enumerators than you need for the data collection. Provide regular feedback during the training to ensure transparency.
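
The quiz-based part of this selection can be illustrated with a short ranking sketch. It covers only the quiz scores. The other criteria named above (observations by the field coordinator and supervisors, communication skills, and familiarity with the instrument) need human judgment and are not modeled here. The function name <code>select_enumerators</code>, the enumerator IDs, and the scores are hypothetical.

```python
from statistics import mean


def select_enumerators(quiz_scores, needed):
    """Return the `needed` trainees with the highest average quiz score.

    `quiz_scores` maps trainee ID -> list of quiz scores.
    Ties are broken alphabetically by ID so the result is reproducible.
    """
    averages = {name: mean(scores) for name, scores in quiz_scores.items()}
    ranked = sorted(averages, key=lambda name: (-averages[name], name))
    return ranked[:needed]


# Hypothetical training results: four trainees, two positions.
scores = {
    "E01": [80, 90],
    "E02": [60, 70],
    "E03": [95, 85],
    "E04": [70, 75],
}
selected = select_enumerators(scores, needed=2)
print(selected)
```

Training more enumerators than needed, as recommended above, leaves a ranked reserve list in case a selected enumerator drops out.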
=== Tips and ideas ===
Finally, follow '''best practices''' in training. Examples of these practices include taking notes during the sessions, recording training sessions so that enumerators can watch them again later, and regular practice interviews.
== Related Pages ==
[[Special:WhatLinksHere/Primary_Data_Collection|Click here for pages that link to this topic]].


== Additional Resources ==
== Additional Resources ==
* Oxfam, [http://policy-practice.oxfam.org.uk/publications/planning-survey-research-578973  Brief on Planning Survey Research]
* DIME Analytics (World Bank), [https://osf.io/u5evr Engage With Data Collectors]
* World Bank (DIME), [http://web.worldbank.org/archive/website01542/WEB/IMAGES/SURVEY.PDF Guide on Planning, Preparing & Monitoring Household Surveys]
* DIME Analytics (World Bank), [https://osf.io/357uv Design and Pilot a Survey]
* World Bank (DIME Analytics), [https://github.com/worldbank/DIME-Resources/blob/master/survey-preparing.pdf Guidelines on Preparing for Data Collection]
* DIME Analytics (World Bank), [https://osf.io/aqv2g Overview: Working with Survey Firms]
* Oxfam, [https://oxfamilibrary.openrepository.com/bitstream/handle/10546/620522/cs-going-digital-data-quality-data-collection-240718-en.pdf?sequence=1&isAllowed=y Case study on using electronic data collection (SurveyCTO) and Stata to improve data quality in the field]
* DIME Analytics (World Bank), [https://osf.io/ezm68 Overview of SurveyCTO at the World Bank]
 
* DIME Analytics (World Bank), [https://osf.io/t5kq3 SurveyCTO Resources]
* DIME Analytics (World Bank), [https://osf.io/un2hk SurveyCTO: Case Management and Advanced Offline Features]
* IPA-JPAL-SurveyCTO, [https://d37djvu3ytnwxt.cloudfront.net/assets/courseware/v1/30701099ebb94072fdfcf1ec96d8227a/asset-v1:MITx+JPAL102x+1T2017+type@asset+block/4.6_High_quality_data_accurate_data.pdf Collecting High Quality Data]
* Oxfam, [http://policy-practice.oxfam.org.uk/publications/planning-survey-research-578973 Planning Survey Research]
* Oxfam, [https://oxfamilibrary.openrepository.com/bitstream/handle/10546/620522/cs-going-digital-data-quality-data-collection-240718-en.pdf?sequence=1&isAllowed=y Case study: Improving data quality with digital data collection]
* SurveyCTO, [https://www.youtube.com/watch?v=tHb-3bnfRLo Data quality with SurveyCTO]
[[Category: Primary Data Collection ]]
[[Category: Primary Data Collection ]]
[[Category: Research Design]]

Latest revision as of 18:41, 5 July 2023

Primary data collection is the process of gathering data through surveys, interviews, or experiments. A typical example of primary data is household surveys. In this form of data collection, researchers can personally ensure that primary data meets the standards of quality, availability, statistical power and sampling required for a particular research question. With globally increasing access to specialized survey tools, survey firms, and field manuals, primary data has become the dominant source for empirical inquiry in development economics.

Read First

Overview

While impact evaluations often benefit from secondary sources of data like administrative data, census data, or household data, these sources may not always be available. In such cases, the research team will need to collect data directly using well-designed interviews and surveys, and the research team typically owns the data that it collects. However, even then, the research team must keep in mind certain ethical concerns related to owning and handling sensitive, or personally identifiable information (PII).

Before moving on to the discussion of concerns about ownership and handling, however, it is important to understand the process of collecting primary data. The process of primary data collection consists of several steps, from questionnaire development to enumerator training. Each of these steps is listed below and requires detailed planning and coordination among the members of the research team.

Develop Questionnaire

The first step of primary data collection is to design a survey instrument (or questionnaire). It is important to remember that drafting a questionnaire from scratch can be a time-consuming process, so the research team should try to use existing resources as far as possible. While developing the questionnaire, keep the following things in mind:

  • Plan. The research team should start with a clear understanding of the theory of change for the project. List key outcomes of interest, and the main variables that can be used to measure these outcomes. A good starting point for this is the pre-analysis plan.
  • Modules. Divide the questionnaire into individual modules, each with a group of questions that are related to one aspect of the survey. Unless the context of the study is entirely new, perform a literature review of existing well-tested and reliable surveys to prepare the general structure of the questionnaire. One example of a resource for past studies and questionnaires is the World Bank Microdata Library.
  • Measurement challenges. Often, research teams face challenges in measuring certain outcomes, for instance, abstract concepts (like empowerment), or socially sensitive topics that people do not wish to talk about (like drug abuse). In such cases, try to use indicators that are easy to identify, or build a level of comfort with respondents before moving to the sensitive topics.
  • Translation. Translating the questionnaire is a very important step. The research team must hire only professional translators to translate the questionnaire into all local languages that are spoken in the study location.

Pilot Questionnaire

A survey pilot is the process of carrying out interviews and tests on different components of a survey, including content and protocols. A good pilot provides the research team with important feedback before they start the process of data collection. This feedback can help the research team review and improve instrument design, translations, and survey protocols related to interview scheduling, sampling, and geo data.

A pilot has three stages: pre-pilot, content-focused pilot, and data-focused pilot. Typically, the pilot is carried out before hiring a survey firm. The research team must draft a clear timeline for the pilot, and allocate enough time for each component of the pilot. DIME Analytics has also created checklists to assist researchers and enumerators in preparing for, and implementing, a pilot.

Pilot Recruitment Strategy

Besides testing content and protocols, it is also important for the research team to pilot recruitment strategy before starting data collection. This is especially important in the following cases:

  • Smaller sample. Before finalizing the target population, the research team often surveys a smaller sample of the population that is selected for an intervention. The research team can use this to finalize the sampling frame (which is a list of individuals or units in a population from which a sample can be drawn), sample size, and statistical power based on which they can then start data collection.
  • Low or unknown take-up. Take-up rate is the percentage of eligible people who accept a benefit, or participate in data collection. Sometimes the take-up rates can be low, or unknown. In such cases, the research team should reconsider the sampling strategy, and test it before starting data collection.
  • Different participation rates. Sometimes participation can differ based on factors like gender, age, social status, etc. This requires the research team to consider different strategies like stratified sampling.

One of the ways to test the recruitment strategy is to test 3 different recruitment strategies, say, A, B, and C. The research team can then finalize the strategy that has the highest take-up rates. Another method is identifying the ideal incentives which can ensure higher participation by the eligible population. After finalizing a recruitment strategy, the research team can move on to drafting the terms of reference (TOR) for the data collection.
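The comparison of recruitment strategies can be sketched in a few lines of Python. This is only an illustration: the `PilotResult` structure, the helper names, and the pilot counts are all hypothetical, and with small pilot samples the differences between strategies may be due to chance.

```python
from dataclasses import dataclass


@dataclass
class PilotResult:
    strategy: str       # label of the recruitment strategy, e.g. "A"
    invited: int        # eligible people approached during the pilot
    participated: int   # people who agreed to take part


def take_up_rate(result: PilotResult) -> float:
    """Share of invited people who participated."""
    if result.invited == 0:
        raise ValueError("no one was invited under strategy " + result.strategy)
    return result.participated / result.invited


def best_strategy(results: list[PilotResult]) -> PilotResult:
    """Return the strategy with the highest observed take-up rate."""
    return max(results, key=take_up_rate)


# Hypothetical pilot counts, for illustration only.
pilot = [
    PilotResult("A", invited=120, participated=54),
    PilotResult("B", invited=115, participated=69),
    PilotResult("C", invited=125, participated=50),
]
rates = {r.strategy: take_up_rate(r) for r in pilot}
best = best_strategy(pilot)
print(rates, "-> highest take-up:", best.strategy)
```

In practice the team would also weigh the cost of each strategy and whether it recruits a representative group, not only the raw take-up rate.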

TOR: Create Budget and Plan Fieldwork

After finalizing the survey instrument and recruitment strategy, the research team must prepare a detailed terms of reference (TOR) for hiring a survey firm. The terms of reference (TOR) define the structure of the project, as well as the responsibilities of the survey firm. While preparing the TOR, the research team must create a survey budget, and plan the fieldwork.

Create budget

The research team should calculate standard, as well as project-specific costs, and prepare a survey budget. In this stage, the research team should also consider what sample size it can afford for the data collection. This allows the research team to calculate expected costs of conducting a study, and compare these with the proposals of survey firms that respond to the terms of reference (TOR).
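As a rough illustration of how a budget translates into an affordable sample size, the sketch below subtracts fixed costs from the total budget and divides the remainder by a per-interview cost. The function name, the split into fixed and per-interview costs, and all figures are assumptions made for the example; a real survey budget has many more line items.

```python
import math


def affordable_sample_size(total_budget: float,
                           fixed_costs: float,
                           cost_per_interview: float) -> int:
    """Largest number of interviews that fits within the budget.

    fixed_costs: costs that do not depend on sample size
                 (for example training, equipment, management).
    cost_per_interview: variable cost of one completed interview.
    """
    if cost_per_interview <= 0:
        raise ValueError("cost_per_interview must be positive")
    remaining = total_budget - fixed_costs
    if remaining < 0:
        return 0  # fixed costs alone exceed the budget
    return math.floor(remaining / cost_per_interview)


# Hypothetical figures, for illustration only.
n = affordable_sample_size(total_budget=100_000,
                           fixed_costs=40_000,
                           cost_per_interview=25)
print(n)
```

The result can then be compared with the sample size required by the power calculations to see whether the study is feasible at the available budget.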

Plan fieldwork

It is also important to plan fieldwork in advance to give potential survey firms an idea of the responsibilities and tasks involved in the data collection. For the field coordinators (FCs), this includes deciding the number of interviews each enumerator will conduct in a day, the number of field teams, and the modes of transport, and keeping extra buffer time for possible delays. Similarly, for the survey firm, this involves defining basic parameters like sample size, sampling strategy, timeline, etc.
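The fieldwork parameters mentioned above can be combined into a simple duration estimate. The sketch below is a minimal example under stated assumptions: every enumerator completes the same number of interviews per day, and the buffer is expressed as a share of the base duration. The function name, the 15% default buffer, and the numbers are hypothetical.

```python
import math


def fieldwork_days(sample_size: int,
                   enumerators: int,
                   interviews_per_day: int,
                   buffer_share: float = 0.15) -> int:
    """Estimated number of field days, including buffer time for delays."""
    daily_capacity = enumerators * interviews_per_day
    if daily_capacity <= 0:
        raise ValueError("daily capacity must be positive")
    base_days = math.ceil(sample_size / daily_capacity)
    return math.ceil(base_days * (1 + buffer_share))


# Hypothetical plan: 2,400 interviews, 20 enumerators, 4 interviews each per day.
days = fieldwork_days(sample_size=2400, enumerators=20, interviews_per_day=4)
print(days)
```

Travel time between sites, revisits, and rest days are not modeled here and would normally lengthen the estimate.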

Contract Survey Firm

After the research team finalizes and issues the terms of reference (TOR), multiple survey firms can express interest in signing a contract with the research team. The research team will then select one of these survey firms, and sign a contract with the selected survey firm. This completes the process of procuring a survey firm.

After signing the contract, the research team and the survey firm should agree on the parameters defined in the terms of reference (TOR), the survey timeline, and discuss possible scenarios and common issues that might arise during data collection. One such issue that the research team and the survey firm must discuss in detail is the data quality assurance plan.

Data Quality Assurance Plan

The research team must draft a data quality assurance plan, and share it with everyone in the research team, as well as the survey firm, before starting data collection. A data quality assurance plan considers everything that could go wrong ahead of time, and makes a plan to resolve these issues. Some of the issues that can affect data quality include errors in programming or translation, attrition (that is, respondents dropping out during a survey), and faulty tablets used during computer-assisted personal interviews (CAPI), among others. A comprehensive data quality assurance plan has components for each of the following three stages: before, during, and after data collection.

Before data collection

Before data collection, the research team can include the following in the data quality assurance plan:

  • Enumerator training. Train enumerators and conduct regular feedback sessions with them to refine the survey content and protocols. Wherever possible, conduct pen-and-paper pilots, since in that case it is easier for enumerators to write down the issues they are facing. Make sure enumerators conduct several practice interviews before the actual fieldwork starts.

During data collection

During data collection, the research team can include the following in the data quality assurance plan:

  • Communication and reporting. Clear communication is important to ensure that both enumerators and respondents are able to understand the questions in the survey instrument. It also allows field coordinators to regularly discuss issues faced by enumerators. For instance, enumerators may face issues like faulty equipment or connectivity issues, which can affect the quality of data.
  • Field monitoring. Field coordinators (FCs) and supervisors should monitor the performance of enumerators. There should be a clear list of parameters that supervisors will use to judge performance. They should also share useful feedback after (not during) the interviews, and ensure that respondents are able to understand the questions correctly. Ask supervisors to fill in a tracking sheet or a form that records observations about each enumerator who works under a supervisor.
  • Minimize attrition. There are several reasons for attrition of respondents. For instance, a respondent may have moved away from the location of the study, or may refuse to participate. It is important to first identify the reason for attrition. Generally, attrition rates of more than 5% are considered poor, and the research team must try to resolve these issues. High attrition rates can affect the quality of data and introduce bias in the results of a study.
  • Back checks and real-time checks. At the same time, it is also important to constantly monitor the quality of every new round of data shared by the field teams. In a back check, an experienced enumerator asks the respondent some selected questions again to compare the answers. Similarly, supervisors can conduct real-time data quality checks (or spot checks), and high frequency checks to check the quality of responses.
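Two of the checks in the list above, the attrition rate and the back check comparison, lend themselves to a short sketch. The data layout (respondent IDs as strings, answers as dictionaries keyed by question name), the function names, and the sample values are assumptions for illustration, not a prescribed format.

```python
def attrition_rate(baseline_ids, followup_ids) -> float:
    """Share of baseline respondents who are missing from the follow-up."""
    baseline = set(baseline_ids)
    if not baseline:
        raise ValueError("baseline is empty")
    lost = baseline - set(followup_ids)
    return len(lost) / len(baseline)


def back_check_mismatch_rate(original, back_check, questions) -> float:
    """Share of re-asked answers that differ from the original interview.

    original, back_check: dicts mapping respondent ID -> {question: answer}.
    questions: the subset of questions that were asked again.
    """
    compared = 0
    mismatches = 0
    for resp_id, recheck in back_check.items():
        first = original.get(resp_id)
        if first is None:
            continue  # respondent not found in the original data
        for q in questions:
            if q in first and q in recheck:
                compared += 1
                if first[q] != recheck[q]:
                    mismatches += 1
    return mismatches / compared if compared else 0.0


# Hypothetical data, for illustration only.
baseline = [f"h{i:02d}" for i in range(1, 21)]   # 20 households
followup = baseline[:-2]                          # 2 households not found
rate = attrition_rate(baseline, followup)
needs_attention = rate > 0.05                     # 5% threshold from the text

original = {"h01": {"hh_size": 5, "owns_land": "yes"},
            "h02": {"hh_size": 3, "owns_land": "no"}}
rechecked = {"h01": {"hh_size": 5, "owns_land": "no"},
             "h02": {"hh_size": 3, "owns_land": "no"}}
mismatch = back_check_mismatch_rate(original, rechecked, ["hh_size", "owns_land"])
print(rate, needs_attention, mismatch)
```

Computing mismatch rates separately for each enumerator can help supervisors target feedback, although some differences are expected for questions whose answers can legitimately change between visits.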

After data collection

After data collection ends, the survey firm usually provides a final field report. This report can be used to improve data quality in the last stage of the data collection process. It can provide qualitative information to the research team about everything that could not be captured by the survey instrument, such as:

  • Issues in understanding. Sometimes respondents do not understand a question and answer randomly. This information is especially important for the research team if a study or experiment only shows marginal impact.
  • Limited option choices. Sometimes respondents may convey that the option choices for particular questions were not comprehensive, which can also affect the quality of data.
  • Other feedback. It also allows the research team to understand issues like size and structure of the communities that were part of the sample. Such information is often useful to weight each group within a sample differently, which can improve accuracy of results.
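To illustrate how information on community size can feed into weights, the sketch below computes a simple weight for each group as its population size divided by the number of sampled units, and then a weighted mean. This assumes simple random sampling within each group, which may not match the actual sampling design; the function names, group labels, and numbers are hypothetical.

```python
def design_weights(population_sizes: dict, sample_sizes: dict) -> dict:
    """Weight per group: how many population units each sampled unit represents."""
    weights = {}
    for group, n_sampled in sample_sizes.items():
        if n_sampled <= 0:
            raise ValueError(f"group {group!r} has no sampled units")
        weights[group] = population_sizes[group] / n_sampled
    return weights


def weighted_mean(values_by_group: dict, weights: dict) -> float:
    """Mean of an outcome where each observation carries its group's weight."""
    total = 0.0
    weight_sum = 0.0
    for group, values in values_by_group.items():
        for value in values:
            total += weights[group] * value
            weight_sum += weights[group]
    return total / weight_sum


# Hypothetical example: a large and a small community, 4 interviews in each.
population = {"village_a": 400, "village_b": 100}
sampled = {"village_a": 4, "village_b": 4}
outcome = {"village_a": [1, 1, 0, 0], "village_b": [1, 1, 1, 1]}

w = design_weights(population, sampled)
estimate = weighted_mean(outcome, w)
print(w, estimate)
```

In this example the unweighted mean would be 0.75, while the weighted mean gives more influence to the larger community.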

Obtain Ethical Approval

Members of the research team must ensure that they protect the rights of all human subjects in a study, including the right to privacy. In this context, all living individuals whose sensitive or personally identifiable information (PII) is contained in the data collected by the research team are considered human subjects. In this step, the research team must consider issues like IRB approvals, informed consent, and data security.

IRB approvals

The research team must obtain IRB approvals for studies that use personally identifiable information. Institutional review boards (IRBs) are organizations that review and monitor research studies to ensure the welfare of human subjects.

In addition to IRB approvals, the research team should also obtain approvals from local institutions in the location of the study. This will ensure that the study complies with local regulations, and does not violate any laws in that area, particularly with respect to the right to privacy.

Informed consent

Before involving any individual in a research study, the research team must obtain informed consent from each individual. This means that the research team must clearly mention all possible risks and benefits from participating in a study, either for the survey pilot or as a respondent in the actual data collection.

Data security

The research team must understand the ethics and rules for data security, and should use proper tools for encryption and de-identification of personally identifiable information (PII). Data security also means ensuring that members of the research team who are not listed by the IRB cannot access any confidential data. Data can be confidential for multiple reasons, but the most common reason is that it contains personally identifiable information (PII). Another reason is that the data was shared under a data usage license that requires it to be kept confidential.
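One small part of de-identification can be sketched as follows: dropping direct identifiers from a record and replacing the respondent ID with a keyed hash. This is a minimal illustration, not a complete de-identification procedure and not a substitute for encryption. The field names in `PII_FIELDS`, the record layout, and the key are hypothetical; in practice the key would be stored securely and separately from the data.

```python
import hashlib
import hmac

# Hypothetical list of direct identifiers to drop from each record.
PII_FIELDS = {"name", "phone", "gps_lat", "gps_lon"}


def pseudonymize_id(raw_id: str, secret_key: bytes) -> str:
    """Replace an ID with a keyed hash so it cannot be reversed without the key."""
    digest = hmac.new(secret_key, raw_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record: dict, secret_key: bytes,
               id_field: str = "respondent_id") -> dict:
    """Return a copy of the record without PII fields and with a pseudonymous ID."""
    clean = {k: v for k, v in record.items()
             if k not in PII_FIELDS and k != id_field}
    clean["pseudo_id"] = pseudonymize_id(str(record[id_field]), secret_key)
    return clean


# Hypothetical record and key, for illustration only.
key = b"replace-with-a-secret-key"
record = {"respondent_id": "h01", "name": "Jane Doe", "phone": "555-0100",
          "gps_lat": 1.23, "gps_lon": 4.56, "hh_size": 5}
clean = deidentify(record, key)
print(clean)
```

Because the same ID and key always produce the same pseudonym, records can still be linked across survey rounds. Indirect identifiers (for example, rare combinations of age, occupation, and village) are not handled by this sketch and need separate review.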

Train Enumerators

Finally, the research team must plan and conduct comprehensive enumerator training. Enumerator training is usually a joint effort between the research team and the survey firm. The content and structure of the training can be divided into the following sections:

Objectives

The purpose of the training should be to ensure that all field staff know all the survey protocols. Also ensure that enumerators understand all questions in the survey instrument. They should also be comfortable using the tablets used in computer-assisted personal interviews (CAPI), or the paper forms used in pen-and-paper personal interviews (PAPI). Finally, ensure that all field staff know and understand their duties.

Planning

In terms of planning, the survey firm should coordinate with the research team on logistics, such as deciding a venue for the training, printing field manuals, questionnaires, and the agenda for each session. It is a good idea to recruit enumerators and experienced trainers in advance. The field coordinator (FC) should finalize the field manual, update the training manual, and also make sure the trainers are aware of the objectives and the context of the study. Finally, prepare quizzes for the assessments, and plan practice interviews for enumerators.

Components

Generally, the survey firm leads the training, while the field coordinator (FC) monitors the sessions. The training should explain the context and content of the questionnaire, methods used for data collection, sample selection, and protocols. It is also important to anticipate potential issues that enumerators may face, and train them on how to handle these issues.

Assessment

Conduct assessments and quizzes to select enumerators for the actual data collection. Select enumerators based on scores on these quizzes, observations of the field coordinator (FC) and supervisors, communication skills, and familiarity with the survey instrument. Always train more enumerators than you need for the data collection. Provide regular feedback during the training to ensure transparency.

Tips and ideas

Finally, follow best practices in training. Examples of these practices include taking notes during the sessions, recording training sessions so that enumerators can watch them again later, and regular practice interviews.

Related Pages


Additional Resources