Primary Data Collection
Primary data collection is the process of gathering data through surveys, interviews, or experiments. A typical example of primary data is household surveys. In this form of data collection, researchers can personally ensure that primary data meets the standards of quality, availability, statistical power and sampling required for a particular research question. With globally increasing access to specialized survey tools, survey firms, and field manuals, primary data has become the dominant source for empirical inquiry in development economics.
- The DIME Research Standards provide a comprehensive checklist to ensure that collection and handling of research data is in line with global best practices.
- Field surveys are one of the most effective methods of primary data collection. Depending on the research question, these surveys may take the form of household surveys, business (firm) surveys, or agricultural (farm) surveys.
- The research team must plan and prepare for primary data collection in advance.
iefieldkit is a Stata package that aids primary data collection. It currently supports three major components of this process: testing survey instruments, survey completion, and data cleaning and survey harmonization.
While impact evaluations often benefit from secondary sources of data like administrative data, census data, or household data, these sources may not always be available. In such cases, the research team will need to collect data directly using well-designed interviews and surveys, and the research team typically owns the data that it collects. However, even then, the research team must keep in mind certain ethical concerns related to owning and handling sensitive or personally identifiable information (PII).
Before moving on to the discussion of concerns about ownership and handling, however, it is important to understand the process of collecting primary data. This process consists of several steps, from questionnaire development to enumerator training. Each of these steps is described below, and each requires detailed planning and coordination among the members of the research team.
The first step of primary data collection is to design a survey instrument (or questionnaire). Drafting a questionnaire from scratch can be a time-consuming process, so the research team should try to use existing resources as far as possible. While developing the questionnaire, keep the following things in mind:
- Plan. The research team should start with a clear understanding of the theory of change for the project. List key outcomes of interest, and the main variables that can be used to measure these outcomes. A good starting point for this is the pre-analysis plan.
- Modules. Divide the questionnaire into individual modules, each with a group of questions that are related to one aspect of the survey. Unless the context of the study is entirely new, perform a literature review of existing well-tested and reliable surveys to prepare the general structure of the questionnaire. One example of a resource for past studies and questionnaires is the World Bank Microdata Library.
- Measurement challenges. Often, research teams face challenges in measuring certain outcomes, for instance, abstract concepts (like empowerment), or socially sensitive topics that people do not wish to talk about (like drug abuse). In such cases, try to use indicators that are easy to identify, or build a level of comfort with respondents before moving to the sensitive topics.
- Translation. Translating the questionnaire is a very important step. The research team must hire only professional translators to translate the questionnaire into all local languages that are spoken in the study location.
A survey pilot is the process of carrying out interviews and tests on different components of a survey, including content and protocols. A good pilot provides the research team with important feedback before they start the process of data collection. This feedback can help the research team review and improve instrument design and translations, as well as survey protocols related to interview scheduling, sampling, and geo data.
A pilot has three stages: pre-pilot, content-focused pilot, and data-focused pilot. Typically, the pilot is carried out before hiring a survey firm. The research team must draft a clear timeline for the pilot, and allocate enough time for each component of the pilot. DIME Analytics has also created the following checklists to assist researchers and enumerators in preparing for, and implementing, a pilot:
- Checklist: Preparing for a survey pilot
- Checklist: Refining questionnaire content
- Checklist: Refining questionnaire data
Pilot Recruitment Strategy
Besides testing content and protocols, it is also important for the research team to pilot the recruitment strategy before starting data collection. This is especially important in the following cases:
- Smaller sample. Before finalizing the target population, the research team often surveys a smaller sample of the population that is selected for an intervention. The research team can use this to finalize the sampling frame (which is a list of individuals or units in a population from which a sample can be drawn), sample size, and statistical power based on which they can then start data collection.
- Low or unknown take-up. Take-up rate is the percentage of eligible people who accept a benefit, or participate in data collection. Sometimes the take-up rates can be low, or unknown. In such cases, the research team should reconsider the sampling strategy, and test it before starting data collection.
- Different participation rates. Sometimes participation can differ based on factors like gender, age, social status, etc. This requires the research team to consider different strategies like stratified sampling.
One of the ways to do this is to test 3 different recruitment strategies, say, A, B, and C. The research team can then finalize the strategy that has the highest take-up rates. Another method is identifying the ideal incentives which can ensure higher participation by the eligible population.
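The comparison of recruitment strategies described above can be sketched in a few lines of Python. This is only an illustration: the pilot log, its counts, and the function name `take_up_rates` are hypothetical and do not come from any DIME tool.

```python
from collections import defaultdict


def take_up_rates(records):
    """Return {strategy: share of contacted people who agreed to participate}."""
    contacted = defaultdict(int)
    accepted = defaultdict(int)
    for strategy, agreed in records:
        contacted[strategy] += 1
        if agreed:
            accepted[strategy] += 1
    return {s: accepted[s] / contacted[s] for s in contacted}


# Hypothetical pilot log: (strategy, whether the person agreed to participate).
pilot_log = (
    [("A", True)] * 12 + [("A", False)] * 28
    + [("B", True)] * 22 + [("B", False)] * 18
    + [("C", True)] * 17 + [("C", False)] * 23
)

rates = take_up_rates(pilot_log)
best = max(rates, key=rates.get)  # strategy with the highest take-up rate

for strategy in sorted(rates):
    print(f"Strategy {strategy}: take-up {rates[strategy]:.1%}")
print("Highest take-up:", best)
```

With pilot samples this small, differences between strategies may partly reflect chance, so the ranking is best treated as one input to the decision rather than a definitive result.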
TOR and Procurement
After finalizing the survey instrument and recruitment strategy, the research team must prepare terms of reference (TOR) and the survey budget. This step allows the research team to calculate expected costs of conducting a study, and compare these with the proposals of firms that submit an expression of interest (EOI). It involves the following steps:
- Create budget. The research team should calculate standard, as well as project-specific costs, and prepare a survey budget. In this stage, the research team should also consider what sample size it can afford for the data collection.
- Plan fieldwork. It is also important to plan fieldwork in advance to assess costs. For the field coordinators (FCs), this includes deciding the number of interviews each enumerator will conduct in a day, the number of field teams, and the modes of transport, and keeping extra buffer time for possible delays. Similarly, for the survey firm, this involves reviewing and finalizing basic parameters like sample size, sampling strategy, timeline, etc.
- Contract survey firm. Finally, the research team will procure a survey firm and sign a contract with the selected firm. After signing the contract, the research team and the survey firm should agree on the parameters defined in the terms of reference (TOR), the survey timeline, and discuss possible scenarios and common issues.
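The fieldwork arithmetic behind the budget can be sketched as follows. This is a minimal illustration, assuming a single cost item (enumerator days) and a buffer expressed as a whole-number percentage; the function name `fieldwork_plan` and all figures are hypothetical, and a real survey budget will include many more cost lines.

```python
import math


def fieldwork_plan(sample_size, enumerators, interviews_per_day,
                   buffer_pct, cost_per_enumerator_day):
    """Estimate fieldwork duration and enumerator cost for a survey round."""
    # Interviews the whole field team can complete in one day.
    team_interviews_per_day = enumerators * interviews_per_day
    # Days needed without any delays, rounded up to whole days.
    base_days = math.ceil(sample_size / team_interviews_per_day)
    # Extra days reserved for possible delays.
    buffer_days = math.ceil(base_days * buffer_pct / 100)
    total_days = base_days + buffer_days
    return {
        "base_days": base_days,
        "buffer_days": buffer_days,
        "total_days": total_days,
        "enumerator_cost": total_days * enumerators * cost_per_enumerator_day,
    }


# Hypothetical inputs: 1,200 interviews, 10 enumerators, 4 interviews each per day,
# a 20% time buffer, and a daily cost of 30 (in any currency) per enumerator.
plan = fieldwork_plan(
    sample_size=1200,
    enumerators=10,
    interviews_per_day=4,
    buffer_pct=20,
    cost_per_enumerator_day=30.0,
)
print(plan)
```

Running the same function for several candidate sample sizes gives a quick sense of which sample size the budget can support.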
Data Quality Assurance Plan
The research team must draft a data quality assurance plan, and share it with everyone in the research team, as well as the survey firm, before starting data collection. A data quality assurance plan considers everything that could go wrong ahead of time, and makes a plan to resolve these issues. Some of the issues that can affect data quality include errors in programming or translation, attrition (respondents dropping out of a survey), and faulty tablets used during computer-assisted personal interviews (CAPI), among others. A comprehensive data quality assurance plan has components for each of three stages: before, during, and after data collection.
Before data collection
Before data collection, the research team can include the following in the data quality assurance plan:
- Survey design and programming. Make sure the instrument design and structure are in line with the context of the study. Use the pilot to review and revise the instrument. Hire a professional translator to perform the translation. Check the programmed instrument for bugs, and make sure all skip patterns and repeat groups work properly.
- Enumerator training. Train enumerators and conduct regular feedback sessions with them to refine the survey content and protocols. Wherever possible, conduct pen-and-paper pilots, since in that case it is easier for enumerators to write down the issues they are facing. Make sure enumerators conduct several practice interviews before the actual fieldwork starts.
During data collection
During data collection, the research team can include the following in the data quality assurance plan:
- Communication and reporting. Clear communication is important to ensure that both enumerators and respondents are able to understand the questions in the survey instrument. It also allows field coordinators to regularly discuss issues faced by enumerators. For instance, enumerators may face problems like faulty equipment or poor connectivity, which can affect the quality of data.
- Field monitoring. Field coordinators (FCs) and supervisors should monitor the performance of enumerators. There should be a clear list of parameters that supervisors will use to judge performance. They should also share useful feedback after (not during) the interviews, and ensure that respondents are able to understand the questions correctly. Ask supervisors to fill in a tracking sheet or a form that records observations about each enumerator who works under a supervisor.
- Minimize attrition. There are several reasons for attrition of respondents. For instance, it is possible that the respondent has moved away from the location of the study, or refuses to participate. It is important to first identify the reason for attrition. Generally, attrition rates of more than 5% are considered poor, and the research team must try to resolve the underlying issues. High attrition rates can affect the quality of data and introduce bias in the results of a study.
- Back checks and real-time checks. At the same time, it is also important to constantly monitor the quality of every new round of data shared by the field teams. In a back check, an experienced enumerator asks the respondent a selection of questions again and compares the answers with the original responses. Similarly, supervisors can conduct real-time data quality checks (or spot checks) and high frequency checks to assess the quality of responses.
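Two of the checks above reduce to simple calculations: the attrition rate, and the share of back check answers that differ from the original interview. The sketch below assumes hypothetical respondent IDs, question names, and data structures; only the 5% attrition threshold is taken from the text above.

```python
def attrition_rate(baseline_ids, followup_ids):
    """Share of baseline respondents who do not appear in the follow-up round."""
    baseline = set(baseline_ids)
    if not baseline:
        raise ValueError("baseline_ids must not be empty")
    lost = baseline - set(followup_ids)
    return len(lost) / len(baseline)


def back_check_mismatch_rate(original, back_check, questions):
    """Share of compared answers where the back check differs from the original."""
    compared = 0
    mismatched = 0
    for respondent_id, bc_answers in back_check.items():
        if respondent_id not in original:
            continue  # no original interview to compare against
        for question in questions:
            if question in bc_answers and question in original[respondent_id]:
                compared += 1
                if bc_answers[question] != original[respondent_id][question]:
                    mismatched += 1
    return mismatched / compared if compared else 0.0


# Hypothetical example: 40 baseline respondents, 37 found again at follow-up.
rate = attrition_rate(range(1, 41), range(1, 38))
print(f"Attrition: {rate:.1%}", "(above 5% threshold)" if rate > 0.05 else "")

# Hypothetical original answers and back check answers, keyed by respondent ID.
original = {
    1: {"hh_size": 5, "owns_land": "yes"},
    2: {"hh_size": 3, "owns_land": "no"},
    3: {"hh_size": 6, "owns_land": "yes"},
}
back_check = {
    1: {"hh_size": 5, "owns_land": "yes"},
    2: {"hh_size": 4, "owns_land": "no"},
}
mismatch = back_check_mismatch_rate(original, back_check, ["hh_size", "owns_land"])
print(f"Back check mismatch rate: {mismatch:.1%}")
```

In practice, mismatch rates are usually reviewed per enumerator and per question, so that feedback can be targeted; the overall rate here is only the simplest version of that comparison.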
After data collection
Obtain Ethical Approval
There are strict rules about acquiring approval for research involving human subjects. Researchers must understand the ethics and rules for security of sensitive data, and should use proper tools for encryption and de-identification of personally identifiable information (PII).
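As a rough illustration of de-identification, the sketch below drops direct identifiers from survey records and replaces the original respondent ID with a random study ID. The field names (`name`, `phone`, `gps`, `respondent_id`) and the function `deidentify` are hypothetical. The sketch does not perform encryption and does not address indirect identifiers, so it is not a substitute for the tools and protocols required by an ethics review.

```python
import secrets

# Hypothetical list of direct identifiers; the real list depends on the survey.
PII_FIELDS = {"name", "phone", "gps"}


def deidentify(records, id_field="respondent_id", pii_fields=PII_FIELDS):
    """Drop PII fields and swap the original ID for a random study ID.

    Returns the cleaned records and the ID key. The key links study IDs back
    to respondents, so it must be stored separately and encrypted.
    """
    key = {}
    clean = []
    for record in records:
        original_id = record[id_field]
        if original_id not in key:
            key[original_id] = secrets.token_hex(8)  # 16-character random ID
        row = {
            field: value
            for field, value in record.items()
            if field not in pii_fields and field != id_field
        }
        row["study_id"] = key[original_id]
        clean.append(row)
    return clean, key


# Hypothetical raw survey records.
raw = [
    {"respondent_id": "HH-001", "name": "A. Example", "phone": "000", "hh_size": 5},
    {"respondent_id": "HH-002", "name": "B. Example", "phone": "111", "hh_size": 3},
]
clean, id_key = deidentify(raw)
print(clean)
```

Random IDs are used here instead of hashed IDs because a hash of a known ID format can sometimes be reversed by trying all possible inputs.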
After validating the programming of the questionnaire, the researchers train enumerators and monitor data quality to generate a final draft of the instrument. Monitoring can be done in the form of back checks, high frequency checks, as well as other methods.
DIME Analytics has created a Stata command, iefolder. Part of the DIME Analytics Stata package ietoolkit, it helps increase project efficiency, and reduces the risk of error in a study.
- Oxfam, Brief on Planning Survey Research
- DIME (World Bank), Guide on Planning, Preparing & Monitoring Household Surveys
- DIME Analytics (World Bank), Guidelines on Preparing for Data Collection
- Oxfam, Case study on using electronic data collection (SurveyCTO) and Stata to improve data quality in the field