Dimewiki - Enumerator Training (revision of 2021-06-01 by Maria Jones, section: Quiz and test scores)
<hr />
<div>'''Enumerator training''' is an extremely important part of [[Primary Data Collection|primary data collection]], and should be planned in advance. It is a joint effort between the [[Impact Evaluation Team#Field Coordinators (FCs)|field coordinators]], the [[Survey Firm|survey firm]], and other members of the [[Impact Evaluation Team|impact evaluation team]] (or '''research team'''). The research team must prepare and approve an '''enumerator manual''' (or '''field manual'''). The enumerator manual acts as the basis for the training content, and helps organize the training.<br />
<br />
== Read First == <br />
* The [[Impact Evaluation Team|research team]] should make sure all members of the [[Monitoring_Data_Quality#Communication_and_Reporting|field team]] are familiar with the [[Survey Protocols|survey protocols]] and [[Questionnaire Design|survey design]] by the end of the '''enumerator training'''.<br />
* Always train more enumerators than are required for the [[Primary Data Collection|field data collection]]. <br />
* Select the best enumerators at the end of the training, based on rigorous assessments.<br />
* The field team should follow the [[Enumerator Training#Scientific approach|scientific approach]] of enumerator training, and train enumerators to ensure [[Research Ethics#Confidentiality|confidentiality]] of respondents during the [[Field Surveys|survey]].<br />
* Broadly, the training can be divided into the following components - '''objectives''', '''planning''', '''content''', '''structure''', and '''enumerator assessment'''.<br />
<br />
== Training Objectives ==<br />
The [[Impact Evaluation Team|research team]] should use the '''enumerator training''' to provide the rest of the team members with a clear overview of the context, objectives, and relevance of the impact evaluation. A good, well-organized enumerator training deals with the following aspects: <br />
* '''Survey protocols''': The training should ensure that all members of the [[Monitoring_Data_Quality#Communication_and_Reporting|field team]] have a clear understanding of the [[Survey Protocols|survey protocols]]. The research team must [[Checklist: Piloting Survey Protocols|pilot all protocols]] well in advance, as part of [[Preparing for Field Data Collection|preparing for data collection]].<br />
<br />
* '''Survey instrument''': The research team must ensure that all enumerators understand all the questions in the [[Questionnaire Design|survey instrument]]. The enumerators should also be able to use tablets (in the case of [[Computer-Assisted Personal Interviews (CAPI)|computer-assisted personal interviews (CAPI)]]) or paper forms (in the case of [[Pen-and-Paper Personal Interviews (PAPI)|pen-and-paper personal interviews (PAPI)]]).<br />
<br />
* '''Key roles''': The training should also ensure that all members of the research team, [[Survey Firm|survey firm]], and the field team understand their roles and duties. This allows everyone to take responsibility for their tasks, and remain committed throughout the process of [[Primary Data Collection|data collection]]. For instance, the survey firm executes the tasks involved in data collection, while the [[Impact Evaluation Team#Field Coordinators (FCs)|field coordinators (FCs)]] supervise these tasks, and ensure the quality of the work done by enumerators and the survey firm. Similarly, the [[Impact Evaluation Team#Research Assistants (RAs)|research assistants (RAs)]] provide support in preparing the [[Data Quality Assurance Plan|data quality assurance plan]].<br />
<br />
== Planning == <br />
Before starting with '''enumerator training''', it is important for everyone involved in the [[Primary Data Collection|data collection]] to be aware of their roles and responsibilities. '''Planning''' is a continuous process that requires constant interaction between the [[Survey Firm|survey firm]] and the [[Impact Evaluation Team#Field Coordinators (FCs)|field coordinators (FCs)]]. This stage has the following components:<br />
=== Logistics and recruitment ===<br />
The survey firm is responsible for coordinating '''logistics''' and '''recruitment''', which includes: <br />
* Finalizing the training venue. <br />
* Providing materials like printed [[Questionnaire Design|questionnaires]] (or '''survey forms''') and training agenda.<br />
* Providing tablets, pens, and notebooks. <br />
* Hiring potential enumerators and skilled '''supervisors''' to help with the training. <br />
In this process, the survey firm should coordinate with '''field coordinators (FCs)''' to understand the [[Theory of Change|context]] of the impact evaluation, and become familiar with the [[Questionnaire Design|questionnaire content]].<br />
<br />
=== Train support staff ===<br />
In the context of '''enumerator training''', the following people are considered a part of the '''support staff''' - survey facilitators, survey firm managers, and potential supervisors. The '''field coordinators (FCs)''' are responsible for training support staff to make sure they are familiar with various aspects of the project, including the [[Theory of Change|context of the study]], the [[Questionnaire Design|questionnaire content]], and the potential [[Survey Protocols|survey protocols]]. The support staff can then work with the survey firm during the actual enumerator training.<br />
<br />
=== Enumerator manual ===<br />
An '''enumerator manual''' (or '''field manual''') is extremely important because it is the primary resource used during the '''enumerator training'''. It also acts as an important resource for enumerators during the [[Field Surveys|field survey]]. Field manuals contain all [[Survey Protocols|field protocols]], provide crucial guidelines to the survey firm, and also provide [[Training Guidelines: Content and Structure|content]] for the training. Refer to '''Figure 1''' below for a '''field manual template'''. A comprehensive field manual should list the following:<br />
* '''Study objectives:''' The field manual should briefly explain the purpose of the study, and the possible outcomes that the [[Impact Evaluation Team|research team]] hopes to achieve. This provides enumerators and field teams a good reference during the actual [[Field Surveys|field interview]], and helps them understand their roles more clearly.<br />
* '''Roles and responsibilities:''' The field manual should also list the roles and responsibilities of each member in the field team. This allows field staff to take more responsibility for their work, and perform their tasks efficiently.<br />
<br />
* '''Survey protocols:''' [[Survey Protocols|Survey protocols]] play an important role in [[Monitoring Data Quality|ensuring high data quality]] in the field. The field manual should list all protocols, along with examples that explain the importance of following these protocols.<br />
<br />
* '''Key terms:''' The field manual should clearly define all key terms that are used in the questionnaire, as well as throughout the field manual. Key terms include common acronyms like '''Open Data Kit (ODK)''', and technical terms like [[Sampling|sampling frames]].<br />
<br />
* '''Instructions:''' The field manual should also provide detailed instructions on how to operate and [[Training_Guidelines:_Content_and_Structure#Using_tablets|use tablets]] during the [[Field Surveys|field interviews]]. This also helps to ensure consistency during the [[Primary Data Collection|data collection]], and [[Data Quality Assurance Plan|improve data quality]].<br />
<br />
* '''Description of questions:''' The field manual should also explain the questions that are part of the [[Questionnaire Design|questionnaire]], along with common rules and methods for asking questions during the [[Field Surveys|field interview]].<br />
<br />
* '''Frequently asked questions (FAQs):''' Finally, the field manual should include a list of '''frequently asked questions (FAQs)'''. These are questions that often come up during the training sessions, and help to resolve common doubts that may arise during [[Field Surveys|fieldwork]].<br />
<br />
[[File:Fieldmanualtemplate.png|500px|thumb|center|'''Figure 1: Template for developing a field manual ''']]<br />
<br />
=== Finalize training time frame ===<br />
The '''training time frame''' refers to the duration of the '''enumerator training''', and depends on factors like:<br />
* '''Length and complexity of the questionnaire.''' If the questionnaire is longer, and is more complex (that is, has several modules, and many [https://docs.surveycto.com/02-designing-forms/02-additional-topics/02.repeats.html#:~:text=Use%20repeat%20groups%20to%20ask,those%20filling%20out%20your%20form. repeat groups]), then the training will also need to be longer to make sure enumerators are comfortable with the questionnaire.<br />
* '''Capacity of potential enumerators.''' If the potential enumerators are more experienced, the duration of the training will be shorter, compared to a situation where the potential enumerators have less experience.<br />
* '''Complexity of study design.''' Again, if the study itself is based on a complex [[Theory of Change|theory of change]], or is trying to answer questions that were not a part of any previous studies, the training too will have to be longer to explain the objectives and [[Survey Protocols|protocols]].<br />
<br />
Further, keep the following points in mind when deciding the '''time frame''':<br />
* '''Sufficient rest.''' Include sufficient time to rest after the sessions.<br />
* '''Practice sessions.''' Include extra days for practice in the classrooms, as well as in the field.<br />
* '''Extra day for enumerator selection.''' Include at least one day for the process of selecting enumerators for the actual [[Field Surveys|survey]]. <br />
* '''Field manual.''' Use the field manual as a guide for finalizing the time frame, since the manual contains all information about the study and its various aspects.<br />
<br />
== Assessing Enumerators ==<br />
After the '''enumerator training''' is complete, the [[Impact Evaluation Team#Field Coordinators (FCs)|field coordinators (FCs)]], the [[Survey Firm|survey firm]], and the '''supervisors''' should coordinate to conduct the '''enumerator assessment'''. Always train more enumerators than are needed for the actual [[Field Surveys|survey]] (or interview). This motivates enumerators to perform better. It also ensures that a group of qualified enumerators is available as backup in case a few enumerators are unable to conduct the interviews on a given day. The following are the main criteria for '''enumerator selection''':<br />
* '''Scores on regular quizzes and final test'''<br />
* '''Field practice'''<br />
* '''Participation'''<br />
* '''Interpersonal skills'''<br />
* '''Previous experience'''<br />
=== Quiz and test scores ===<br />
Although '''quiz and test scores''' may seem like a highly academic criterion of enumerator assessment, they offer important feedback which field teams can use to improve the training sessions. It is important to reassure the enumerators that the frequent quizzes act more as a measure of how much progress each enumerator has made during the enumerator training, and less as a measure of performance. Be creative while preparing the final test. The final test should assess the following:<br />
<br />
* '''Understanding of materials.''' Use the regular quizzes and final test to assess how well an enumerator has understood the training materials such as field manuals, [[Survey Protocols|protocols]], and [[Training Guidelines: Content and Structure|standard guidelines]] for conducting interviews.<br />
<br />
* '''Reading skills in different languages.''' The quizzes should also evaluate basic reading skills in the relevant languages. This also includes assessing familiarity of enumerators with the [[Questionnaire Translation|translated versions]] of the questionnaire in various languages. For example, if the questions are to be asked in English and Hindi, it is important to ensure that enumerators are able to read out the questions in both languages during the interview. <br />
<br />
* '''Understanding of questions.''' The quizzes should also assess if enumerators are able to understand the meaning and relevance of certain questions. For example, in a COVID-19 pulse survey, it would be important for the [[Impact Evaluation Team|research team]] to assess how households are preparing to deal with the economic and health-related consequences of COVID-19. In this case, for a question about how households assess the threat of COVID-19, enumerators must be able to understand the question themselves, before asking the respondents. Further, enumerators should be able to explain how respondents can answer the question using a scale from 1 to 5, with 1 meaning '''"No threat"''', and 5 meaning '''"Severe threat"'''. <br />
<br />
* '''Numeracy skills.''' The quiz must also assess basic numeracy skills (like counting, adding, etc.) of enumerators, including the ability to enter responses on a [[Training_Guidelines:_Content_and_Structure#Using_tablets|tablet]].<br />
<br />
'''NOTE:''' The research team and the field staff should also keep the following things in mind while designing the final test:<br />
* '''Be well-organized.''' Create all quizzes and the final test before the start of enumerator training. Edit the quizzes based on observations during the training sessions.<br />
<br />
* '''Conduct regular quizzes as well as a final test.''' Carry out one quiz per day to test understanding of topics covered on the previous day. Correct the quizzes quickly, ideally on the same day. Share feedback simultaneously, to allow enumerators to correct their mistakes and fill any gaps in their understanding. The final test should provide a comprehensive assessment of the training content. It is useful to adapt the final test based on earlier quizzes, focusing on areas where enumerators scored poorly on the quizzes. <br />
<br />
* '''Be transparent.''' Inform enumerators before the start of training that they will be required to take regular quizzes and a final test. Share the skills that they should focus on in order to fully utilize the training. This also allows enumerators to concentrate better during sessions.<br />
<br />
* '''Encourage enumerators.''' Quizzes can often be stressful, so motivate enumerators throughout the training. Inform the enumerators that their scores in these quizzes are only one of the several criteria for evaluating them. Provide constructive feedback to enumerators after sharing results of each quiz.<br />
<br />
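The day-to-day adaptation described above can be sketched in a few lines: given each day's quiz scores grouped by topic, flag the topics where the group averaged poorly so trainers can revisit them in review sessions and weight them on the final test. The topic names, scores, and the 70-point cutoff below are purely hypothetical, not a DIME standard.

```python
# Hypothetical daily quiz scores by topic (0-100), one list per topic.
quiz_scores = {
    "consent protocol":  [55, 60, 58, 70],
    "household roster":  [85, 90, 78, 88],
    "tablet navigation": [65, 72, 60, 68],
    "skip patterns":     [45, 50, 62, 48],
}

THRESHOLD = 70  # assumed cutoff below which a topic "needs review"

def weak_topics(scores, threshold=THRESHOLD):
    """Return topics whose average score falls below the threshold,
    weakest first, so trainers can prioritize them in feedback
    sessions and on the final test."""
    averages = {topic: sum(s) / len(s) for topic, s in scores.items()}
    return sorted((t for t, avg in averages.items() if avg < threshold),
                  key=averages.get)

print(weak_topics(quiz_scores))
# → ['skip patterns', 'consent protocol', 'tablet navigation']
```

Running this after each quiz gives a simple, repeatable way to decide which modules to re-teach the next morning.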
=== Field practice ===<br />
'''Field practice''' is a very important part of '''enumerator training''', as it allows enumerators and the rest of the field staff to test [[Survey Protocols|survey protocols]], as well as the [[Checklist:_Content-focused_Pilot|survey content]]. '''Field practice''' can take the following forms:<br />
* '''Team exercises:''' These involve enumerators getting together to practice [[Questionnaire_Design#Modules|questionnaire modules]] in pairs or in groups.<br />
<br />
* '''Mock interviews:''' Trainers can also conduct '''mock interviews''' with enumerators, and hold discussion sessions afterwards.<br />
<br />
* '''Pilot interviews with administrative officials:''' In some cases, it is also useful to conduct practice interviews with government officials. Data from these interviews is not included in the final dataset. These are only meant to provide feedback which is especially useful when the field team is not familiar with the culture and social norms in the location of the study.<br />
<br />
Keep the following things in mind regarding '''field practice''': <br />
* '''Plan regular sessions.''' Plan field practice sessions in such a manner that each potential enumerator conducts at least one practice interview.<br />
<br />
* '''Use facilitators to monitor.''' Split the enumerators into groups, and assign one experienced enumerator as a '''facilitator''' in each group. The facilitators can monitor and observe interviews to ensure that enumerators are following all protocols. <br />
<br />
* '''Keep regular feedback sessions.''' The facilitators should take notes on each enumerator's performances during practice, and share their comments at the end of the day. <br />
<br />
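The group-splitting step above can be sketched as a simple round-robin assignment, so that every practice interview has an observing facilitator. The names are hypothetical; a real team would draw them from its training roster.

```python
# Hypothetical roster: experienced enumerators double as facilitators.
facilitators = ["Fatima", "Grace"]
trainees = ["Ade", "Binta", "Chen", "Deo", "Esi", "Femi"]

def make_groups(facilitators, trainees):
    """Split trainees evenly across facilitators, one facilitator
    per group, so no practice interview goes unobserved."""
    groups = {f: [] for f in facilitators}
    for i, trainee in enumerate(trainees):
        # Round-robin: trainee i goes to facilitator i mod (number of groups).
        groups[facilitators[i % len(facilitators)]].append(trainee)
    return groups

print(make_groups(facilitators, trainees))
# → {'Fatima': ['Ade', 'Chen', 'Esi'], 'Grace': ['Binta', 'Deo', 'Femi']}
```

Rotating the trainee list between sessions lets each facilitator observe, and comment on, a different set of enumerators each day.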
'''NOTE:''' The following is an '''observation checklist''' which facilitators can use to observe enumerators during field practice:<br />
*'''Use of proper equipment:''' It is important to ensure that each enumerator is using the proper equipment during the practice interview. For example, a pen and a notebook in case of a [[Pen-and-Paper Personal Interviews (PAPI)|pen-and-paper interview (PAPI)]], and tablets in case of a [[Computer-Assisted Personal Interviews (CAPI)|computer-assisted personal interview (CAPI)]].<br />
* '''Selection of the correct respondent:''' It is important to ensure that the enumerator selects the correct respondent during practice. Interviewing incorrect respondents can cause serious problems for [[Monitoring Data Quality|data quality]], such as '''duplicates''', '''missing values''', and even '''outliers'''.<br />
* '''Proper introduction:''' The enumerators should be able to correctly and concisely introduce the survey objectives to the respondent. They must also be able to read out the [[Informed Consent|informed consent form]] in the language spoken by the respondent.<br />
* '''Introductory sentences:''' The enumerators should be able to read the introductory sentences before questions (if any), as well as all questions correctly in the language spoken by the respondent.<br />
* '''Clarifications and polite probing:''' Sometimes a respondent may have trouble understanding a question, or their answer might not be satisfactory. In such situations, the enumerator should be able to clarify as and when required. They must also be able to probe the respondent in a polite manner, and at the right time.<br />
* '''Familiarity with the survey and confidence:''' The facilitators must also check how familiar the enumerators are with the [[Questionnaire Design|survey content]]. Also evaluate the level of confidence in practice sessions for each enumerator.<br />
* '''Language proficiency:''' Check the level of comfort and fluency of enumerators in each of the languages used in the questionnaire.<br />
* '''Interactions with the respondent:''' Evaluate enumerators on their interactions with the respondent, both verbally and non-verbally. The enumerators should be polite and respectful throughout.<br />
* '''Patience and attention to detail:''' The enumerators should answer all follow-up questions from respondents patiently. Evaluate enumerators on their attention to detail and whether they stick to all protocols.<br />
* '''Creating a conducive environment:''' Evaluate enumerators on whether they make the respondents feel comfortable during the interview. The enumerator should reassure the respondent that the impact evaluation study will [[Protecting Human Research Subjects|protect their rights]], including their '''right to privacy'''.<br />
<br />
=== Participation ===<br />
Another criterion for evaluating enumerator performance is their '''participation''' throughout the training sessions. '''Facilitators''' should observe each enumerator and take regular notes. They can score enumerators on a scale of 1 to 5, where '''"1 = Poor"''', '''"2 = Weak"''', '''"3 = Average"''', '''"4 = Strong"''', and '''"5 = Excellent"'''. Some criteria to evaluate '''participation''' are:<br />
* '''Punctuality:''' Facilitators should keep note of enumerators who are punctual for training sessions. This also encourages enumerators to take the sessions seriously.<br />
<br />
* '''Active participation and initiative:''' This includes awarding a higher score to enumerators who take part in classroom discussions, and take initiative to improve in areas where they might be weaker.<br />
<br />
* '''Attitude and integrity:''' The attitude of enumerators during the training is also a very important aspect. Facilitators should take note of, and award higher scores to enumerators who are eager to learn and correct mistakes, and respond positively to feedback after quizzes and training sessions.<br />
<br />
* '''Team work:''' The field team often faces various challenging situations in the field. The training sessions are therefore a good time to think about creating a positive atmosphere in the team. Award higher scores to enumerators who work well in a team, and are willing to help their teammates with any issues they might face.<br />
<br />
* '''Communication skills:''' Good communication skills are also an extremely important quality for enumerators. Enumerators should be able to convey their issues, clarify any doubts they face, and participate in review sessions to improve the overall quality of the [[Field Surveys|survey]].<br />
<br />
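Taken together, the criteria above can feed a simple composite ranking for the final selection: final-test scores plus facilitator ratings on the 1 to 5 scale, each rescaled and weighted, with the top candidates kept and a few more retained as backups. The names, ratings, and weights below are illustrative assumptions, not a prescribed scheme; a real team would agree on its weights before training begins.

```python
# Hypothetical assessment records: final-test score (0-100) and
# facilitator ratings on the 1-5 scale for the other criteria.
candidates = {
    "Asha":  {"test": 88, "field": 4, "participation": 5, "interpersonal": 4},
    "Binta": {"test": 72, "field": 3, "participation": 4, "interpersonal": 5},
    "Chen":  {"test": 95, "field": 5, "participation": 3, "interpersonal": 4},
    "Deo":   {"test": 60, "field": 2, "participation": 3, "interpersonal": 3},
}

# Assumed weights, agreed on before training starts; they must sum to 1.
WEIGHTS = {"test": 0.4, "field": 0.3, "participation": 0.2, "interpersonal": 0.1}

def composite(scores):
    """Weighted score with everything rescaled to 0-1:
    the test is out of 100, the facilitator ratings out of 5."""
    return (WEIGHTS["test"] * scores["test"] / 100
            + sum(WEIGHTS[k] * scores[k] / 5
                  for k in ("field", "participation", "interpersonal")))

def select(candidates, needed, backups=1):
    """Rank candidates by composite score and keep `needed` plus backups."""
    ranked = sorted(candidates, key=lambda n: composite(candidates[n]),
                    reverse=True)
    return ranked[:needed + backups]

print(select(candidates, needed=2, backups=1))
# → ['Chen', 'Asha', 'Binta']
```

Keeping the backups on the ranked list makes it straightforward to call up a replacement if an enumerator drops out once fieldwork begins.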
== Best Practices ==<br />
It is important to convey the importance of their role in the research to the enumerators. This will allow enumerators to take ownership of the project, which is essential for ensuring that enumerators remain committed throughout the duration of the [[Primary Data Collection|data collection]]. This will be particularly important for projects lasting several weeks, or projects involving a significant amount of travel, since these can result in enumerator fatigue. In this regard, it is important to follow certain '''best practices'''.<br />
<br />
=== Scientific approach===<br />
The most important quality of a good survey team is a commitment to the '''scientific method''', the standard approach for surveys that produces concrete, valuable results that can be defended during peer review. Clarify to enumerators that the scientific approach means that they are committed to identifying the actual situation on the ground, and not one that appears to exist because of errors in the way opinions are taken into account.<br />
<br />
In development research, the only way that a comparison between different groups is possible is to use the same survey method for all respondents. This means that the following must be as similar as possible for each group:<br />
* '''Process''': Introduce the process in the same way to each group.<br />
* '''Anonymity''': Make people in each group feel equally comfortable that the process is anonymous.<br />
* '''Confidentiality''': Make people feel that their responses will be confidential.<br />
* '''Duration''': Give each group roughly the same amount of time to fill in the questionnaire.<br />
* '''Discussions''': Guide the discussion session in a similar manner for each group.<br />
* '''Collection and filing''': Collect and file all the questionnaires systematically. <br />
<br />
If we deviate from this approach, for example by treating one group differently from all the others, we won’t be able to tell if the differences between that group and the others are because of actual differences, or simply because we failed to use the same survey method for the groups.<br />
<br />
'''NOTE:''' Finally, there will still be times when a particular situation is different from the circumstances of what is considered an ideal interview. In such cases, it is possible that the FAQs compiled by the team might not be of any help. To ensure that enumerators are prepared for such situations, include sessions to train enumerators on the various aspects of study methodology and approach.<br />
<br />
===Confidentiality and anonymity===<br />
One of the key selling points of the interview for many respondents will be a commitment to [[De-identification|anonymizing]] all interviews and safeguarding respondents’ [[Research_Ethics#Confidentiality|confidentiality]]. Tablets are a very useful tool in helping enumerators achieve this. However, enumerators must also ensure that all interactions with respondents meet the strictest criteria for confidentiality. This entails:<br />
* Not disclosing any opinions, claims, and other features that can be associated with individuals.<br />
* Using confidential information only for the purposes set out in the training, and not for any other purpose.<br />
* Not copying or retaining any written information or record that could be associated with identifying features of individuals, or any other kind of [[Personally Identifiable Information (PII)|identifying information]].<br />
* Returning all confidential information (including notes, memos, photographs) to the survey team at the conclusion of the surveys, or when demanded by the survey team.<br />
* Not disclosing any confidential information to any employee, consultant or third party unless it has been approved by the survey team.<br />
<br />
===Interview practice and field testing===<br />
Before going out in the field, it is important that all enumerators practice interviewing at least twice. This helps them become familiar with the questionnaire, and also allows them to receive feedback on their interviewing skills. It is normal for the first few interviews conducted by each enumerator to be of a lesser quality, so it is important to discard these and not include them in the main dataset.<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Primary Data Collection]]<br />
<br />
== Additional Resources ==<br />
* DIME Analytics (World Bank), [https://osf.io/wb86g/ Training Data Collectors]<br />
* DIME Analytics (World Bank), [https://osf.io/n7ctd/ SurveyCTO Guide For Data Collectors]<br />
[[Category: Research Design]]<br />
[[Category: Primary Data Collection]]</div>
<hr />
<div>'''Enumerator training''' is an extremely important part of the [[Primary Data Collection|primary data collection]], and should be planned in advance. It is a joint effort between the [[Impact Evaluation Team#Field Coordinators (FCs)|field coordinators]], the [[Survey Firm|survey firm]], and other members of the [[Impact Evaluation Team|impact evaluation team]] (or '''research team'''). The research team must prepare and approve an '''enumerator manual''' (or '''field manual'''). The enumerator manual acts as the basis for the training content, and helps organize the training.<br />
<br />
== Read First == <br />
* The [[Impact Evaluation Team|research team]] should make sure all members of the [[Monitoring_Data_Quality#Communication_and_Reporting|field team]] are familiar with the [[Survey Protocols|survey protocols]] and [[Questionnaire Design|survey design]] by the end of the '''enumerator training'''.<br />
* Always train more enumerators than are required for the [[Primary Data Collection|field data collection]]. <br />
* Select the best enumerators at the end of the training, based on rigorous assessments.<br />
* The field team should follow the [[Enumerator Training#Scientific approach|scientific approach]] of enumerator training, and train enumerators to ensure [[Research Ethics#Confidentiality|confidentiality]] of respondents during the [[Field Surveys|survey]].<br />
* Broadly, the training can be divided into the following components - '''objectives''', '''planning''', '''content''', '''structure''', and '''enumerator assessment'''.<br />
<br />
== Training Objectives ==<br />
The [[Impact Evaluation Team|research team]] should use the '''enumerator training''' to provide the rest of the team members with a clear overview of the context, objectives, and relevance of the impact evaluation. A good, well-organized enumerator training deals with the following aspects: <br />
* '''Survey protocols''': The training should ensure that all members of the [[Monitoring_Data_Quality#Communication_and_Reporting|field team]] have a clear understanding of the [[Survey Protocols|survey protocols]]. The research team must [[Checklist: Piloting Survey Protocols|pilot all protocols]] well in advance, as part of [[Preparing for Field Data Collection|preparing for data collection]].<br />
<br />
* '''Survey instrument''': The research team must ensure that the all enumerators understand all the questions in the [[Questionnaire Design|survey instrument]]. The enumerators should also be able to use the tablets (in case of [[Computer-Assisted Personal Interviews (CAPI)|computer-assisted personal interviews (CAPI)]], or paper forms (in case of [[Pen-and-Paper Personal Interviews (PAPI)|pen-and-paper personal interviews (PAPI)]].<br />
<br />
* '''Key roles''': The training should also ensure that all members of the research team, [[Survey Firm|survey firm]], and the field team understand their roles and duties. This allows everyone to take responsibility of their tasks, and remain committed throughout the process of [[Primary Data Collection|data collection]]. For instance, the survey firm executes the tasks involved in data collection, while the [[Impact Evaluation Team#Field Coordinators (FCs)|field coordinators (FCs)]] supervise these tasks, and ensure quality of the work done by enumerators and the survey firm. Similarly, the [[Impact Evaluation Team#Research Assistants (RAs)|research assistants (RAs)]] provide support in preparing the [[Data Quality Assurance Plan|data quality assurance plan]].<br />
<br />
== Planning == <br />
Before starting with '''enumerator training''', it is important for everyone involved in the [[Primary Data Collection|data collection]] to be aware of their roles and responsibilities. '''Planning''' is a continuous process that requires constant interaction between the [[Survey Firm|survey firm]] and the [[Impact Evaluation Team#Field Coordinators (FCs)|field coordinators (FCs)]]. This stage has the following components:<br />
=== Logistics and recruitment ===<br />
The survey firm is responsible for coordinating '''logistics''' and '''recruitment''', which includes: <br />
* Finalizing the training venue. <br />
* Providing materials like printed [[Questionnaire Design|questionnaires]] (or '''survey forms''') and training agenda.<br />
* Providing tablets, pens, and notebooks. <br />
* Hiring potential enumerators and skilled '''supervisors''' to help with the training. <br />
In this process, the survey firm should coordinate with '''field coordinators (FCs)''' to understand the [[Theory of Change|context]] of the impact evaluation, and become familiar with the [[Questionnaire Design|questionnaire content]].<br />
<br />
=== Train support staff ===<br />
In the context of '''enumerator training''', the following people are considered a part of the '''support staff''' - survey facilitators, survey firm managers, and potential supervisors. The '''field coordinators (FCs)''' are responsible for training support staff to make sure they are familiar with various aspects of the project, including the [[Theory of Change|context of the study]], the [[Questionnaire Design|questionnaire content]], and the potential [[Survey Protocols|survey protocols]]. The support staff can then work with the survey firm during the actual enumerator training.<br />
<br />
=== Enumerator manual ===<br />
An '''enumerator manual''' (or '''field manual''') is extremely important because it is the primary resource used during the '''enumerator training'''. It also acts as an important resource for enumerators during the [[Field Surveys|field survey]]. Field manuals contain all [[Survey Protocols|field protocols]], provide crucial guidelines to the survey firm, and also provide [[Training Guidelines: Content and Structure|content]] for the training Refer to '''Figure 1''' below for a '''field manual template'''. A comprehensive field manual should list the following:<br />
* '''Study objectives:''' The field manual should briefly explain the purpose of the study, and the possible outcomes that the [[Impact Evaluation Team|research team]] hopes to achieve. This provides enumerators and field teams a good reference during the actual [[Field Surveys|field interview]], and helps them understand their roles more clearly.<br />
* '''Roles and responsibilities:''' The field manual should also list the roles and responsibilities of each member in the field team. This allows field staff to take more responsibility for their work, and perform their tasks efficiently.<br />
<br />
* '''Survey protocols:''' [[Survey Protocols|Survey protocols]] play an important role in [[Monitoring Data Quality|ensuring high data quality]] in the field. The field manual should list all protocols, along with examples that explain the importance of following these protocols.<br />
<br />
* '''Key terms:''' The field manual should clearly define all key terms that are used in the questionnaire, as well as throughout the field manual. Key terms include common acronyms like '''Open Data Kit (ODK)''', and technical terms like [[Sampling|sampling frames]].<br />
<br />
* '''Instructions:''' The field manual should also provide detailed instructions on how to operate and [[Training_Guidelines:_Content_and_Structure#Using_tablets|use tablets]] during the [[Field Surveys|field interviews]]. This also helps to ensure consistency during the [[Primary Data Collection|data collection]], and [[Data Quality Assurance Plan|improve data quality]].<br />
<br />
* '''Description of questions:''' The field manual should also explain the questions that are part of the [[Questionnaire Design|questionnaire]], along with common rules and methods for asking questions during the [[Field Surveys|field interview]].<br />
<br />
* '''Frequently asked questions (FAQs):''' Finally, the field manual should include a list of '''frequently asked questions (FAQs)'''. These are questions that often come up during the training sessions, and help to resolve common doubts that may arise during [[Field Surveys|fieldwork]].<br />
<br />
[[File:Fieldmanualtemplate.png|500px|thumb|center|'''Figure 1: Template for developing a field manual ''']]<br />
<br />
=== Finalize training time frame ===<br />
The '''training time frame''' refers to the duration of the '''enumerator training''', and depends on factors like:<br />
* '''Length and complexity of the questionnaire.''' If the questionnaire is longer, and is more complex (that is, has several modules, and many [https://docs.surveycto.com/02-designing-forms/02-additional-topics/02.repeats.html#:~:text=Use%20repeat%20groups%20to%20ask,those%20filling%20out%20your%20form. repeat groups]), then the training will also need to be longer to make sure enumerators are comfortable with the questionnaire.<br />
* '''Capacity of potential enumerators.''' If the potential enumerators are more experienced, the duration of the training will be shorter, compared to a situation where the potential enumerators have less experience.<br />
* '''Complexity of study design.''' Again, if the study itself is based on a complex [[Theory of Change|theory of change]], or is trying to answer questions that were not a part of any previous studies, the training too will have to be longer to explain the objectives and [[Survey Protocols|protocols]].<br />
<br />
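For example, a repeat group in a SurveyCTO/ODK form (one of the features that makes a questionnaire more complex to train on) might look like the sketch below in an XLSForm '''survey''' sheet. The field names are purely illustrative, not taken from any actual questionnaire.<br />

```
type          name         label
begin repeat  hh_member    Household member roster
text          member_name  Name of household member
integer       member_age   Age of ${member_name}
end repeat
```

Enumerators need hands-on practice with forms like this, since each repeat instance must be opened, filled, and saved correctly on the tablet.<br />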
Further, keep the following points in mind when deciding the '''time frame''':<br />
* '''Sufficient rest.''' Include sufficient time to rest after the sessions.<br />
* '''Practice sessions.''' Include extra days for practice in the classrooms, as well as in the field.<br />
* '''Extra day for enumerator selection.''' Include at least one day for the process of selecting enumerators for the actual [[Field Surveys|survey]]. <br />
* '''Field manual.''' Use the field manual as a guide for finalizing the time frame, since the manual contains all information about the study and its various aspects.<br />
<br />
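A simple way to budget the time frame is to tally the components above explicitly. The day counts in this sketch are illustrative assumptions, not recommendations; adjust them to the questionnaire and team.<br />

```python
# Illustrative tally of training days; every number passed in is an
# assumption to be replaced with the team's own estimates.
def training_days(content_days, classroom_practice_days, field_practice_days,
                  rest_days=1, selection_days=1):
    """Total training days: content sessions, classroom and field practice,
    rest days, and at least one day for enumerator selection."""
    return (content_days + classroom_practice_days + field_practice_days
            + rest_days + selection_days)

# Example: a moderately complex questionnaire with experienced enumerators.
print(training_days(content_days=4, classroom_practice_days=2,
                    field_practice_days=2))  # prints 10
```

A longer or more complex questionnaire would raise `content_days` and the practice days accordingly.<br />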
== Assessing Enumerators ==<br />
After the '''enumerator training''' is complete, the [[Impact Evaluation Team#Field Coordinators (FCs)|field coordinators (FCs)]], the [[Survey Firm|survey firm]], and the '''supervisors''' should coordinate to conduct the '''enumerator assessment'''. Always train more enumerators than are needed for the actual [[Field Surveys|survey]] (or interview). This motivates enumerators to perform better. It also ensures that a group of qualified enumerators are available as backup in case a few enumerators are unable to conduct the interviews on a given day. The following are the main criteria for '''enumerator selection''':<br />
* '''Scores on regular quizzes and final test'''<br />
* '''Field practice'''<br />
* '''Participation'''<br />
* '''Interpersonal skills'''<br />
* '''Previous experience'''<br />
=== Quiz and test scores ===<br />
Although '''quiz and test scores''' may seem like a highly academic criterion of enumerator assessment, they offer important feedback which field teams can use to improve the training sessions. It is important to reassure the enumerators that the frequent quizzes act more as a measure of how much progress each enumerator has made during the enumerator training, and less as a measure of performance. Be creative while preparing the final test. The final test should assess the following:<br />
<br />
* '''Understanding of materials.''' Use the quiz to get an idea of how well an enumerator has understood the training materials such as field manuals, [[Survey Protocols|protocols]], and [[Training Guidelines: Content and Structure|standard guidelines]] for conducting interviews.<br />
<br />
* '''Reading skills in different languages.''' The quiz should also evaluate basic reading skills in the relevant languages. This also includes assessing familiarity of enumerators with the [[Questionnaire Translation|translated versions]] of the questionnaire in various languages. For example, if the questions are to be asked in English and Hindi, it is important to ensure that enumerators are able to read out the questions in both languages during the interview. <br />
<br />
* '''Understanding of questions.''' The quiz should also assess if enumerators are able to understand the meaning and relevance of certain questions. For example, in a COVID-19 pulse survey, it would be important for the [[Impact Evaluation Team|research team]] to assess how households are preparing to deal with the economic and health-related consequences of COVID-19. In this case, for a question about how households assess the threat of COVID-19, enumerators must be able to understand the question themselves, before asking the respondents. Further, enumerators should be able to explain how respondents can answer the question using a scale from 1 to 5, with 1 meaning '''"No threat"''', and 5 meaning '''"Severe threat"'''. <br />
<br />
* '''Numeracy skills.''' The quiz must also assess basic numeracy skills (like counting, adding, etc.) of enumerators, including the ability to enter responses on a [[Training_Guidelines:_Content_and_Structure#Using_tablets|tablet]].<br />
<br />
'''NOTE:''' The research team and the field staff should also keep the following things in mind while designing the final test:<br />
* '''Be well-organized.''' Create all quizzes and the final test before the start of enumerator training. Edit the quizzes based on observations during the training sessions.<br />
<br />
* '''Conduct regular quizzes as well as a final test.''' Carry out one quiz per day to test understanding of topics covered on the previous day. Correct the quizzes quickly, ideally on the same day. Share feedback along with the corrected quizzes, to allow enumerators to correct their mistakes and fill any gaps in their understanding. The final test should provide a comprehensive assessment of the training content. <br />
<br />
* '''Be transparent.''' Inform enumerators before the start of training that they will be required to take regular quizzes and a final test. Share the skills that they should focus on in order to fully utilize the training. This also allows enumerators to concentrate better during sessions.<br />
<br />
* '''Encourage enumerators.''' Quizzes can often be stressful, so motivate enumerators throughout the training. Inform the enumerators that their scores in these quizzes are only one of the several criteria for evaluating them. Provide constructive feedback to enumerators after sharing results of each quiz.<br />
<br />
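A minimal sketch of same-day quiz correction, assuming answers are collected digitally; the answer key, names, and responses are all hypothetical.<br />

```python
# Hypothetical answer key and responses for one daily quiz.
answer_key = {"q1": "b", "q2": "consent", "q3": "d"}
responses = {
    "Asha": {"q1": "b", "q2": "consent", "q3": "a"},
    "Ravi": {"q1": "b", "q2": "consent", "q3": "d"},
}

def grade(response, key):
    """Share of questions answered correctly (0 to 1)."""
    return sum(response.get(q) == a for q, a in key.items()) / len(key)

scores = {name: grade(r, answer_key) for name, r in responses.items()}

# Questions that more than half the group missed, to revisit next session.
missed = [q for q, a in answer_key.items()
          if sum(r.get(q) != a for r in responses.values())
          > len(responses) / 2]
```

Scoring this way makes it easy to return results the same day and to spot topics the whole group struggled with.<br />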
=== Field practice ===<br />
'''Field practice''' is a very important part of '''enumerator training''', as it allows enumerators and the rest of the field staff to test [[Survey Protocols|survey protocols]], as well as the [[Checklist:_Content-focused_Pilot|survey content]]. '''Field practice''' can take the following forms:<br />
* '''Team exercises:''' These involve enumerators getting together to practice [[Questionnaire_Design#Modules|questionnaire modules]] in pairs or in groups.<br />
<br />
* '''Mock interviews:''' Trainers can also conduct '''mock interviews''' with enumerators, and hold discussion sessions afterwards.<br />
<br />
* '''Pilot interviews with administrative officials:''' In some cases, it is also useful to conduct practice interviews with government officials. Data from these interviews is not included in the final dataset. These are only meant to provide feedback, which is especially useful when the field team is not familiar with the culture and social norms in the location of the study.<br />
<br />
Keep the following things in mind regarding '''field practice''': <br />
* '''Plan regular sessions.''' Plan field practice sessions so that each potential enumerator conducts at least one practice interview.<br />
<br />
* '''Use facilitators to monitor.''' Split the enumerators into groups, and assign one experienced enumerator as a '''facilitator''' in each group. The facilitators can monitor and observe interviews to ensure that enumerators are following all protocols. <br />
<br />
* '''Keep regular feedback sessions.''' The facilitators should take notes on each enumerator's performance during practice, and share their comments at the end of the day. <br />
<br />
'''NOTE:''' The following is an '''observation checklist''' which facilitators can use to observe enumerators during field practice:<br />
*'''Use of proper equipment:''' It is important to ensure that each enumerator is using the proper equipment during the practice interview. For example, a pen and a notebook in case of a [[Pen-and-Paper Personal Interviews (PAPI)|pen-and-paper interview (PAPI)]], and tablets in case of a [[Computer-Assisted Personal Interviews (CAPI)|computer-assisted personal interview (CAPI)]].<br />
* '''Selection of the correct respondent:''' It is important to ensure that the enumerator selects the correct respondent during practice. Interviewing incorrect respondents can cause serious problems for [[Monitoring Data Quality|data quality]], such as '''duplicates''', '''missing values''', and even '''outliers'''.<br />
* '''Proper introduction:''' The enumerators should be able to correctly and concisely introduce the survey objectives to the respondent. They must also be able to read out the [[Informed Consent|informed consent form]] in the language spoken by the respondent.<br />
* '''Introductory sentences:''' The enumerators should be able to read the introductory sentences before questions (if any), as well as all questions correctly in the language spoken by the respondent.<br />
* '''Clarifications and polite probing:''' Sometimes a respondent may have trouble understanding a question, or their answer might not be satisfactory. In such situations, the enumerator should be able to clarify as and when required. They must also be able to probe the respondent in a polite manner, and at the right time.<br />
* '''Familiarity with the survey and confidence:''' The facilitators must also check how familiar the enumerators are with the [[Questionnaire Design|survey content]]. Also evaluate the level of confidence in practice sessions for each enumerator.<br />
* '''Language proficiency:''' Check the level of comfort and fluency of enumerators in each of the languages used in the questionnaire.<br />
* '''Interactions with the respondent:''' Evaluate enumerators on their interactions with the respondent, both verbally and non-verbally. The enumerators should be polite and respectful throughout.<br />
* '''Patience and attention to detail:''' The enumerators should answer all follow-up questions from respondents patiently. Evaluate enumerators on their attention to detail and whether they stick to all protocols.<br />
* '''Creating a conducive environment:''' Evaluate enumerators on whether they make the respondents feel comfortable during the interview. The enumerator should reassure the respondent that the impact evaluation study will [[Protecting Human Research Subjects|protect their rights]], including their '''right to privacy'''.<br />
<br />
=== Participation ===<br />
Another criterion for evaluating enumerator performance is '''participation''' throughout the training sessions. '''Facilitators''' should observe each enumerator and take regular notes. They can score enumerators on a scale of 1 to 5, where '''"1 = Poor"''', '''"2 = Weak"''', '''"3 = Average"''', '''"4 = Strong"''', and '''"5 = Excellent"'''. Some criteria to evaluate '''participation''' are:<br />
* '''Punctuality:''' Facilitators should keep note of enumerators who are punctual for training sessions. This also encourages enumerators to take the sessions seriously.<br />
<br />
* '''Active participation and initiative:''' This includes awarding a higher score to enumerators who take part in classroom discussions, and take initiative to improve in areas where they might be weaker.<br />
<br />
* '''Attitude and integrity:''' The attitude of enumerators during the training is also a very important aspect. Facilitators should take note of, and award higher scores to enumerators who are eager to learn and correct mistakes, and respond positively to feedback after quizzes and training sessions.<br />
<br />
* '''Team work:''' The field team often faces various challenging situations in the field. The training sessions are therefore a good time to think about creating a positive atmosphere in the team. Award higher scores to enumerators who work well in a team, and are willing to help their teammates with any issues they might face.<br />
<br />
* '''Communication skills:''' Good communication skills are also extremely important for enumerators. Enumerators should be able to convey their issues, clarify any doubts they face, and participate in review sessions to improve the overall quality of the [[Field Surveys|survey]].<br />
<br />
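One way to turn the assessment criteria into a selection decision is a weighted composite score. The weights, names, and scores below are hypothetical; the research team should agree on the weights before the training starts.<br />

```python
# Hypothetical weights and 0-1 scores for each assessment criterion.
weights = {"quizzes": 0.4, "field_practice": 0.3, "participation": 0.3}
candidates = {
    "Asha": {"quizzes": 0.9, "field_practice": 0.8, "participation": 1.0},
    "Ravi": {"quizzes": 0.7, "field_practice": 0.9, "participation": 0.8},
    "Mei":  {"quizzes": 0.8, "field_practice": 0.6, "participation": 0.9},
}

def composite(scores):
    """Weighted average across the assessment criteria."""
    return sum(weights[c] * scores[c] for c in weights)

ranked = sorted(candidates, key=lambda name: composite(candidates[name]),
                reverse=True)
n_needed, n_reserve = 2, 1   # always train more enumerators than needed
selected = ranked[:n_needed]
reserves = ranked[n_needed:n_needed + n_reserve]
```

Ranking all trainees and keeping the next-best as reserves implements the advice above: train more enumerators than required, and keep qualified backups available.<br />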
== Best Practices ==<br />
It is important to convey to enumerators the importance of their role in the research. This will allow enumerators to take ownership of the project, which is essential for ensuring that enumerators remain committed throughout the duration of the [[Primary Data Collection|data collection]]. This is particularly important for projects lasting several weeks, or projects that involve a significant amount of travel, since these can result in enumerator fatigue. In this regard, it is important to follow certain '''best practices'''.<br />
<br />
=== Scientific approach===<br />
The most important quality of a good survey team is a commitment to the '''scientific method'''. The scientific method is the standard approach for surveys because it produces credible results that can be defended during peer review. Clarify to enumerators that the scientific approach means a commitment to identifying the actual situation on the ground, and not one that appears to exist because of errors in the way opinions are recorded.<br />
<br />
In development research, the only way to compare different groups is to use the same survey method for all respondents. This means that the following must be as similar as possible for each group:<br />
* '''Process''': Introduce the process in the same way to each group.<br />
* '''Anonymity''': Make people in each group feel equally comfortable that the process is anonymous.<br />
* '''Confidentiality''': Make people feel that their responses will be confidential.<br />
* '''Duration''': Give each group roughly the same amount of time to fill in the questionnaire.<br />
* '''Discussions''': Guide the discussion session in a similar manner for each group.<br />
* '''Collection and filing''': Collect and file all the questionnaires systematically. <br />
<br />
If we deviate from this approach, for example by treating one group differently from all the others, we won’t be able to tell if the differences between that group and the others are because of actual differences, or simply because we failed to use the same survey method for the groups.<br />
<br />
'''NOTE:''' Finally, there will still be times when a particular situation differs from the circumstances of an ideal interview. In such cases, it is possible that the FAQs compiled by the team might not be of any help. To ensure that enumerators are prepared for such situations, include sessions to train enumerators on the various aspects of study methodology and approach.<br />
<br />
===Confidentiality and anonymity===<br />
One of the key selling points of the interview for many respondents will be a commitment to [[De-identification|anonymizing]] all interviews and safeguarding respondents’ [[Research_Ethics#Confidentiality|confidentiality]]. Tablets are a very useful tool in helping enumerators achieve this. However, enumerators must also ensure that all interactions with respondents meet the strictest criteria for confidentiality. This entails:<br />
* Not disclosing any opinions, claims, and other features that can be associated with individuals.<br />
* Using confidential information only for the purposes set out in the training, and not for any other purpose.<br />
* Not copying or retaining any written information or record that could be associated with identifying features of individuals, or any other kind of [[Personally Identifiable Information (PII)|identifying information]].<br />
* Returning all confidential information (including notes, memos, photographs) to the survey team at the conclusion of the surveys, or when demanded by the survey team.<br />
* Not disclosing any confidential information to any employee, consultant or third party unless it has been approved by the survey team.<br />
<br />
===Interview practice and field testing===<br />
Before going out in the field, it is important that all enumerators practice interviewing at least twice. This helps them become familiar with the questionnaire, and also allows them to receive feedback on their interviewing skills. It is normal for the first few interviews conducted by each enumerator to be of lower quality, so it is important to discard these and not include them in the main dataset.<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Primary Data Collection]]<br />
<br />
== Additional Resources ==<br />
* DIME Analytics (World Bank), [https://osf.io/wb86g/ Training Data Collectors]<br />
* DIME Analytics (World Bank), [https://osf.io/n7ctd/ SurveyCTO Guide For Data Collectors]<br />
[[Category: Research Design]]<br />
[[Category: Primary Data Collection]]</div>
Maria jones
https://dimewiki.worldbank.org/index.php?title=Survey_Firm_TOR&diff=7092
Survey Firm TOR
2020-06-04T19:33:42Z
<p>Maria jones: /* Overview */</p>
<hr />
<div><br />
A [[Survey Firm|survey firm]] term of reference (TOR) defines the structure of the project and breaks down the responsibilities of all parties involved, including that of the [[Impact Evaluation Team|impact evaluation team]]. The TOR is an important document for [[Procuring a Survey Firm | survey firm procurement]] and ongoing monitoring of survey firm performance. This page suggests the scope of work for the survey firm and impact evaluation team, discusses deliverables, and provides an example excerpt from a survey firm TOR. <br />
<br />
== Read First ==<br />
*Prepare the TOR early in the firm procurement process.<br />
*Within the TOR, clearly delineate tasks between the survey firm and the impact evaluation team; define deliverables carefully; and specify all expectations and consequences if the expectations are not met. <br />
*Refer to the TOR throughout field work. Any changes to survey protocols or sampling that deviate from the TOR must be confirmed in writing. <br />
<br />
==Overview==<br />
<br />
A detailed TOR is essential for smoothly running fieldwork. Prepare the TOR before publishing the Request for Expression of Interest (rEOI) for [[Survey Firm|survey firms]]; publish it after shortlisting firms based on the expression of interest (EOI). After hiring the survey firm, monitor and provide feedback early and often, using the TOR as an anchor for expectations. Any changes to survey protocols or sampling that deviate from the TOR must be confirmed in writing. If there are cost implications, a contract modification may be required.<br />
<br />
== Defining the Scope of Work ==<br />
<br />
=== Survey Firm Tasks ===<br />
<br />
# Obtain necessary permits or clearance for the survey<br />
# [[Questionnaire Translation|Translate]] and [[Survey Pilot|field test]] all questionnaires<br />
# Create detailed field procedure plan<br />
# Recruit, [[Enumerator Training|train]], and contract experienced field staff<br />
# Manage and oversee household survey data collection<br />
<br />
=== Impact Evaluation Team Tasks ===<br />
<br />
# [[Questionnaire Design|Design questionnaire]]<br />
# Manage [[Survey Pilot | survey pilot]]<br />
# [[Questionnaire Programming| Program]] questionnaire<br />
# [[Sampling & Power Calculations|Sample]]<br />
# [[Monitoring Data Quality|Monitor]] data quality<br />
# [[Data Cleaning| Clean]] data<br />
# Conduct [[Data Analysis|data analysis]]<br />
<br />
The field coordinator should work closely with the survey firm at every step of the process. <br />
<br />
== Defining the Deliverables ==<br />
<br />
Review of deliverables is the formal and enforceable way to provide feedback to the survey firm. It is important to assess quality and provide feedback early and often. <br />
<br />
Deliverables should be spaced throughout the contract rather than only at the end of the survey. Each major stage of the process should include a deliverable. Deliverables should be specified as a detailed list in the TOR, with intermediate outputs. Finally, deliverables should have clearly indicated quality standards. Consequences should be clearly outlined in the case that the firm does not meet standards. <br />
<br />
[[File:Example_Survey_Firm_Deliverables.png|Example List of Deliverables | 800px]]<br />
== TOR Example Excerpt ==<br />
<br />
===Activity 5 : Household survey data collection===<br />
Train all enumerators, supervisors, and data managers on the administration of questionnaires provided by the research team. <br />
The training should also serve as a screening process for skilled interviewers. Consequently, the survey company must recruit more interviewers for the training than will be ultimately hired for the project. Five enumerators should be included in the training as a reserve.<br />
The following components must be included in the training:<br />
<br />
# Theoretical: training should include a review of the theory behind the questionnaire and of each question, so that trainees fully understand the objective of each question. Standard quantitative interviewing techniques and field protocols should also be covered.<br />
#Classroom practice: training should include individual and group exercises for trainees to become familiar with the practice of asking questions and filling questionnaires. This part of the training may include in-class demonstrations, where the questionnaire is projected and one interviewer completes the questionnaire in front of the classroom. The training may also use vignettes, where the company designs case scenarios based on typical households (perhaps those found during the supervisor training or piloting) and has interviewers complete the questionnaire based on the vignette. Finally, the trainees should conduct pilot interviews with the same subject, each completing a questionnaire for the interview, to test consistency across interviewers. <br />
# Field practice: after the theoretical and classroom practice, the interviewers should go to the field to administer the full questionnaire to a small number of households (outside the study sample). The pre-test should not focus on major adjustments to the questionnaire, but rather simulate the administration of the questionnaire under normal circumstances. All field team members must demonstrate they clearly understand their roles and are correctly following the survey protocols.<br />
# Evaluation: following the training, interviewers, supervisors and data managers should be evaluated based on their understanding of the questionnaire and their ability to correctly record data. <br />
<br />
The training period will conclude only when the field teams have demonstrated mastery of the designated tasks. Decisions as to which field staff will take part in the data collection must be made on the basis of this evaluation.<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Primary Data Collection]]<br />
<br />
== Additional Resources ==<br />
*DIME Analytics' [https://github.com/worldbank/DIME-Resources/blob/master/Survey%20firm%20Technical%20Review%20_%20Scoring%20Matrix.xlsx Survey Firm Technical Review Scoring Matrix]<br />
*DIME Analytics' [https://github.com/worldbank/DIME-Resources/blob/master/survey-firms.pdf Working with Survey Firms]<br />
[[Category: Primary Data Collection]]</div>
Maria jones
https://dimewiki.worldbank.org/index.php?title=Procuring_a_Survey_Firm&diff=7091
Procuring a Survey Firm
2020-06-04T19:32:15Z
<p>Maria jones: /* Procurement Timeline */</p>
<hr />
<div><onlyinclude>This article covers the procurement process for the data collection agency. </onlyinclude><br />
<br />
<br />
== Read First ==<br />
<br />
Responsibility for survey firm procurement varies. There are three primary modes: survey firm contracted by the research team (World Bank or University), survey firm contracted by the government (or other implementing partner), or direct hire of data collectors. In the first case, the research team will manage the full selection process. In the second case, the IE team role will be limited to technical assistance on developing the terms of reference and designing a scoring matrix. The third case is much more rare, and typically only employed for pilots, survey audits, or small-scale qualitative data collection. This article focuses on the first case, in which the research team manages the full procurement process.<br />
<br />
== Guidelines ==<br />
<br />
=== Who will collect the data? ===<br />
Many World Bank-funded surveys are conducted by a private [[Survey Firm]]. Other options include: <br />
<br />
1. Government agency / Ministry<br />
* Pros: enumerators have sector knowledge; may be logistically simpler if project is paying for survey<br />
* Cons: may not be perceived as independent; may be unwilling to survey control sites; quality controls, performance incentives, and household (HH) survey experience may be limited<br />
2. National Statistics Office <br />
* Pros: Often high capacity<br />
* Cons: IE surveys are not in typical scope of work (focus on nationally representative surveys), busy with existing surveys, may not be interested in small-scale contracts<br />
3. Directly hire enumerators<br />
* Pros: Highest degree of flexibility and control over the process<br />
* Cons: Procurement challenge (many individual consultants), full responsibility for logistics, requires much more time/effort from research team<br />
<br />
== Procurement Timeline==<br />
<br />
{| class="wikitable" style="margin-left: 10px; margin-right: auto; border: none;"<br />
|-<br />
! Stage <br />
! Minimum time required<br />
|-<br />
| Due diligence: research local survey firm options<br />
| 2 weeks<br />
|-<br />
| Prepare detailed Terms of Reference (TORs)<br />
| 2 weeks<br />
|-<br />
| Publish request for Expression of Interest (rEOI)<br />
| 1 day<br />
|-<br />
| Firms submit expression of interest (EOI)<br />
| 3 weeks<br />
|-<br />
| Shortlist firms based on EOI<br />
| 1 day<br />
|-<br />
| Publish TOR and call for proposals <br />
| 1 day<br />
|-<br />
| Shortlisted firms submit technical and financial proposals<br />
| 3 weeks<br />
|-<br />
| Evaluation of technical then financial proposals<br />
| 1 week<br />
|-<br />
| Negotiations and award of contract to selected firm<br />
| 1 week<br />
|-<br />
| Contract published and signed<br />
| 1 week<br />
|-<br />
|}<br />
<br />
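Adding up the minimum times in the table gives the earliest realistic procurement duration. The sketch below is illustrative, assuming stages run strictly one after another and a 5-working-day week.<br />

```python
# Minimum stage durations from the procurement timeline above.
week_stages = [2, 2, 3, 3, 1, 1, 1]   # stages measured in weeks
day_stages = [1, 1, 1]                # one-day stages (publications, shortlist)

total_working_days = sum(w * 5 for w in week_stages) + sum(day_stages)
print(total_working_days)  # prints 68: roughly 14 calendar weeks end to end
```

In practice some stages (for example, due diligence and TOR drafting) can overlap, so treat this as a lower bound on calendar time for a strictly sequential process, and plan procurement accordingly far in advance of fieldwork.<br />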
For guidelines and examples, refer to [[Survey Firm TOR|Terms of Reference (TOR)]]<br />
<br />
=== Terms of Reference ===<br />
The [[Survey Firm TOR]] specifies scope of work, responsibilities, required activities, and deliverables. Developing a detailed TOR is essential!<br />
<br />
Be sure that expectations and standards are clearly spelled out, along with potential consequences. Otherwise, even if you detect fraud, you may not be able to do anything about it.<br />
<br />
=== Troubleshooting ===<br />
<br />
* If you find problems with observable quality or representativeness of data, for example from a [[Survey Audit]]<br />
** Examples: Enumerators not visiting households / falsifying data, Enumerators falsifying information to shorten interviews, Field Teams dropping households that weren’t actually unavailable (just difficult to get to, or not available on first visit)<br />
** Clearly specified protocols and standards in the TORs are the mechanism for dealing with this. If you have laid out consequences for fraudulent activities in the TORs, those consequences can be invoked: the firm is given a warning, and if the problem continues, the contract can be cancelled.<br />
<br />
* If you become aware of logistical (typically sub-contractor) problems<br />
** Examples: Enumerators supposed to be paid by day, instead paid by questionnaire; enumerators paid much less than promised or salary withheld<br />
** It is very difficult for the research team to intervene in this case, as there is no formal relationship with the sub-contractors. It can only formally be dealt with if there are observable consequences for survey protocols or data quality, violating the TORs.<br />
** Though hard to address, it is important to know when these problems are happening. It's a good idea for field coordinators to build relationships with enumerators to understand how the work is going and identify issues. <br />
<br />
* Know average survey costs in the context you will work in. If a proposal seems too cheap to be true, it probably is.<br />
<br />
* Be aware that this is potentially a repeated game; in some markets, survey firms face limited competition. <br />
<br />
* Field coordinators should communicate progress and problems to the research team often<br />
<br />
* Be careful with asking for things that are not clearly specified in the TORs<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Primary Data Collection]]<br />
<br />
== Additional Resources ==<br />
*DIME Analytics' [https://github.com/worldbank/DIME-Resources/blob/master/Survey%20firm%20Technical%20Review%20_%20Scoring%20Matrix.xlsx Survey Firm Technical Review Scoring Matrix]<br />
*DIME Analytics' [https://github.com/worldbank/DIME-Resources/blob/master/survey-firms.pdf Working with Survey Firms]<br />
[[Category: Primary Data Collection]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Procuring_a_Survey_Firm&diff=7090Procuring a Survey Firm2020-06-04T19:31:59Z<p>Maria jones: /* Procurement Timeline */</p>
<hr />
<div><onlyinclude>This article covers the procurement process for the data collection agency. </onlyinclude><br />
<br />
<br />
== Read First ==<br />
<br />
Responsibility for survey firm procurement varies. There are three primary modes: a survey firm contracted by the research team (World Bank or university), a survey firm contracted by the government (or another implementing partner), or direct hire of data collectors. In the first case, the research team manages the full selection process. In the second case, the IE team's role is limited to technical assistance on developing the terms of reference and designing a scoring matrix. The third case is much rarer, and typically only employed for pilots, survey audits, or small-scale qualitative data collection. This article focuses on the first case, in which the research team manages the full procurement process.<br />
<br />
== Guidelines ==<br />
<br />
=== Who will collect the data? ===<br />
Many World Bank-funded surveys are conducted by a private [[Survey Firm]]. Other options include: <br />
<br />
1. Government agency / Ministry<br />
* Pros: enumerators have sector knowledge; may be logistically simpler if project is paying for survey<br />
* Cons: may not be perceived as independent; may be unwilling to survey control sites; may lack quality controls, performance incentives, or household survey experience<br />
2. National Statistics Office <br />
* Pros: Often high capacity<br />
* Cons: IE surveys are outside their typical scope of work (the focus is on nationally representative surveys); often busy with existing surveys; may not be interested in small-scale contracts<br />
3. Directly hire enumerators<br />
* Pros: Highest degree of flexibility and control over the process<br />
* Cons: Procurement challenge (many individual consultants), full responsibility for logistics, requires much more time/effort from research team<br />
<br />
== Procurement Timeline==<br />
<br />
{| class="wikitable" style="margin-left: 10px; margin-right: auto; border: none;"<br />
|-<br />
! Stage <br />
! Minimum time required<br />
|-<br />
| Due diligence: research local survey firm options<br />
| 2 weeks<br />
|-<br />
| Prepare detailed Terms of Reference (TORs)<br />
| 2 weeks<br />
|-<br />
| Publish request for Expression of Interest (rEOI)<br />
| 1 day<br />
|-<br />
| Firms submit expression of interest (EOI)<br />
| 3 weeks<br />
|-<br />
| Shortlist firms based on EOI<br />
| 1 day<br />
|-<br />
| Publish TOR and call for proposals <br />
| 1 day<br />
|-<br />
| Shortlisted firms submit technical and financial proposals<br />
| 3 weeks<br />
|-<br />
| Evaluation of technical then financial proposals<br />
| 1 week<br />
|-<br />
| Negotiations and award of contract to selected firm<br />
| 1 week<br />
|-<br />
| Contract published and signed<br />
| 1 week<br />
|-<br />
|}<br />
<br />
For guidelines and examples, refer to the [[Survey Firm TOR|Terms of Reference (TOR)]] article.<br />
<br />
=== Terms of Reference ===<br />
The [[Survey Firm TOR]] specifies scope of work, responsibilities, required activities, and deliverables. Developing a detailed TOR is essential!<br />
<br />
Be sure that expectations and standards are clearly spelled out, along with the consequences for failing to meet them. Otherwise, even if you detect fraud, you may not be able to do anything about it!<br />
<br />
=== Troubleshooting ===<br />
<br />
* If you find problems with observable quality or representativeness of data, for example from a [[Survey Audit]]<br />
** Examples: enumerators not visiting households or falsifying data; enumerators falsifying information to shorten interviews; field teams dropping households that were not actually unavailable (just difficult to reach, or not available on the first visit)<br />
** Clearly specified protocols and standards in the TORs are the mechanism for dealing with this. If the TORs lay out consequences for fraudulent activities, those consequences can be invoked: the firm is given a warning, and if the problem continues, the contract can be cancelled.<br />
<br />
* If you become aware of logistical (typically sub-contractor) problems<br />
** Examples: enumerators supposed to be paid by the day instead paid per questionnaire; enumerators paid much less than promised, or salaries withheld<br />
** It is very difficult for the research team to intervene in this case, as there is no formal relationship with the sub-contractors. It can only be formally addressed if there are observable consequences for survey protocols or data quality that violate the TORs.<br />
** Though hard to address, it is important to know when these problems are happening. It's a good idea for field coordinators to build relationships with enumerators to understand how the work is going and identify issues. <br />
<br />
* Know the average survey costs in the context where you will work. If a proposal seems too cheap to be true, it probably is.<br />
<br />
* Be aware that this is potentially a repeated game: in some markets, survey firms face limited competition. <br />
<br />
* Field coordinators should communicate progress and problems to the research team often.<br />
<br />
* Be careful about asking for things that are not clearly specified in the TORs.<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Primary Data Collection]]<br />
<br />
== Additional Resources ==<br />
*DIME Analytics' [https://github.com/worldbank/DIME-Resources/blob/master/Survey%20firm%20Technical%20Review%20_%20Scoring%20Matrix.xlsx Survey Firm Technical Review Scoring Matrix]<br />
*DIME Analytics' [https://github.com/worldbank/DIME-Resources/blob/master/survey-firms.pdf Working with Survey Firms]<br />
[[Category: Primary Data Collection]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Survey_Firm&diff=7089Survey Firm2020-06-04T19:28:42Z<p>Maria jones: /* Guidelines */</p>
<hr />
<div><onlyinclude>Data collection is often done through a local research or survey firm. Survey firms may be local or international, for-profit or non-profit. </onlyinclude><br />
<br />
<br />
== Read First ==<br />
Identifying a high-quality and trustworthy survey firm is the most important step in [[Preparing for Data Collection]]. <br />
<br />
== Guidelines ==<br />
=== Local Firm ===<br />
* Pros: Typically have good network of enumerators, know local context and work in local language(s)<br />
* Cons: Experience and capacity vary. Lack of familiarity with Bank procurement can be a challenge. <br />
<br />
=== International Firm (e.g. Gallup, Ipsos) ===<br />
* Pros: Extensive survey experience. Well-versed in Bank procurement. <br />
* Cons: Less knowledge of context, management may not speak local language, may not have good relationship with local staff or enumerators.<br />
<br />
=== Research NGO ===<br />
* Pros: Not-for-profit; typically more budget transparency, and incentives more in line with those of the research team. The relationship with the research team is more collaborative, resembling a partnership. <br />
* Cons: May not be the most cost-effective; may not be experienced with competitive bidding processes for survey work.<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Survey Firm Procurement]]<br />
<br />
<br />
== Additional Resources ==<br />
*DIME Analytics' [https://github.com/worldbank/DIME-Resources/blob/master/Survey%20firm%20Technical%20Review%20_%20Scoring%20Matrix.xlsx Survey Firm Technical Review Scoring Matrix]<br />
*DIME Analytics' [https://github.com/worldbank/DIME-Resources/blob/master/survey-firms.pdf Working with Survey Firms]<br />
[[Category: Survey Firm Procurement ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Publishing_Data&diff=7031Publishing Data2020-05-26T17:12:52Z<p>Maria jones: /* Publishing */</p>
<hr />
<div>Data publication is the release of data and data documentation following [[Primary Data Collection | data collection]] and [[Data Analysis | analysis]]. Data publication is an increasingly common standard that bolsters research transparency and [[Reproducible Research | reproducibility]]. Preparation for data publication begins in the early stages of research: effective [[Data Management | data management]] and analytics throughout the project will ensure that the research team can easily publish data when the time comes and that outside users can access and use the data to [[Reproducible Research | replicate]] the researcher's primary results. This page will discuss preparing and publishing data, code, documentation, and directories.<br />
<br />
==Read First==<br />
*[https://github.com/worldbank/dime-standards/tree/master/dime-research-standards/pillar-5-data-publication DIME Data Publication Standards]<br />
*Before publishing data, remove all [[De-identification#Personally Identifiable Information | personally-identifying information (PII)]] such as names, locations or financial records. <br />
*Accompany published data with proper [[Data Documentation | documentation]] to ensure that users understand the data.<br />
*Publish data within a comprehensive directory that includes all necessary data files, raw outputs, and code.<br />
*[[Getting started with GitHub | GitHub]], [https://osf.io/ The Open Science Framework], and [https://www.researchgate.net/ ResearchGate] are all platforms on which researchers can publish data, code, and directories.<br />
<br />
==Preparing for Release == <br />
<br />
=== Preparing Data===<br />
<br />
Released data should allow any user to [[Reproducible Research | replicate]] research findings. Therefore, released data should be [[Data Cleaning | clean]] and [[Data_Cleaning#Labels | well-labelled]], contain all variables used in [[Data Analysis | data analysis]], and include [[ID Variable Properties | identifying variables]]. Make sure to maintain the privacy of respondents by carefully [[De-identification | de-identifying]] any sensitive or [[De-identification#Personally Identifiable Information | personally-identifying information (PII)]] such as names, locations, or financial records, none of which are [[Research Ethics | ethical]] to publish. <br />
<br />
===Preparing Data Documentation===<br />
<br />
Analysis datasets should be easily understandable to researchers trying to replicate results. Therefore, it is important that proper [[Data Documentation | documentation]], including variable dictionaries and survey instruments, accompany the data release. This ensures that users can easily understand the data. See the [[Checklist: Microdata Catalog submission|Microdata Catalog Checklist]] for instructions on how to prepare data and documentation for primary data release.<br />
<br />
===Preparing Code and Directory===<br />
<br />
For full reproducibility, release a structured directory that allows a user to immediately run your code after changing the project directory. If you have followed the DIME Wiki’s protocols and effectively [[Data Management | managed]] data throughout your research project via, among other things, an organized [[DataWork Folder | project folder]] and [[Master Do-files | master do-file]], you will already have well-written and reproducible [[Stata Coding Practices | code]] within a well-structured directory. <br />
<br />
The folders should include all de-identified data necessary for the analysis, all code necessary for the analysis, and the raw outputs used in the paper. Using <code>[[iefolder]]</code> from DIME’s <code>[[ietoolkit]]</code> can help standardize your directory. In either the /dofiles/ folder or the root directory, include a [[Master Do-files | master script]] (.do or .r, for example). The master script should allow the reviewer to change one line of code to set their directory path; it should then run the entire project and re-create all the raw outputs exactly as supplied. Check that all code will run completely on a new computer: install any required user-written commands in the master script and make sure that settings like <code>version</code>, <code>matsize</code>, and <code>varabbrev</code> are set. All outputs should clearly correspond by name to an exhibit in the paper, and vice versa.<br />
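As a sketch, a minimal master do-file following these guidelines might look like the one below; all folder names, file paths, and the specific packages installed are hypothetical and should be adapted to your project:<br />

```stata
* Master do-file sketch -- all paths, file names, and packages are illustrative
clear all
version 15          // pin the Stata version so results do not drift
set varabbrev off   // disallow ambiguous variable abbreviations
set matsize 800     // only needed in Stata 15 and earlier

* The one line a reviewer should need to change:
global projectfolder "C:/Users/reviewer/replication"

* Install user-written commands the project depends on
ssc install ietoolkit, replace
ssc install estout, replace

* Run the entire project and re-create all raw outputs
do "$projectfolder/dofiles/1-cleaning.do"
do "$projectfolder/dofiles/2-analysis.do"
```

A reviewer then only edits the <code>global projectfolder</code> line and runs this single file to reproduce every output.<br />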
<br />
==Publishing==<br />
A data publication platform must be able to handle structured directories and provide a stable, structured URL for your project.<br />
<br />
[[DIME_Datasets_on_Microdata_Catalog| DIME survey data]] is typically published and released through the [[Microdata Catalog]]. <br />
<br />
[[Getting started with GitHub | GitHub]], [https://osf.io/ The Open Science Framework], and [https://www.researchgate.net/ ResearchGate] are often used for replication packages, as these platforms allow for publication of data, documentation, and code.<br />
<br />
== Author’s Preprint ==<br />
<br />
Consider releasing an author’s copy or preprint, but check with your publisher before doing so: not all journals will accept material that has already been released, so you may need to wait until acceptance is confirmed. You can release the preprint on a number of preprint websites, many of which are topic-specific. You can also host the file on GitHub and link to it directly from your personal website or whichever medium you use to share the preprint. Do not use Dropbox or Google Drive for this purpose: many organizations block staff from accessing these tools, which would prevent them from reaching your material. <br />
<br />
== Additional Resources==<br />
<br />
*Find an example of a published World Bank directory for replication [https://github.com/worldbank/Water-When-It-Counts here].<br />
*Read the Berkeley Initiative for Transparency in the Social Sciences’ [https://www.bitss.org/2016/05/23/out-of-the-file-drawer-tips-on-prepping-data-for-publication/ tips] on preparing data for publication.<br />
<br />
[[Category: Publishing Data]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Primary_Data_Collection&diff=6245Primary Data Collection2020-04-20T13:41:51Z<p>Maria jones: </p>
<hr />
<div><onlyinclude><br />
'''Primary data collection''' is the process of gathering data through [[Field Surveys|surveys]], interviews, or experiments. A typical example of primary data is '''household surveys'''. In this form of data collection, researchers can personally ensure that primary data meets the standards of [[Monitoring Data Quality | quality]], availability, [[Power Calculations in Stata | statistical power]] and [[Sampling & Power Calculations | sampling]] required for a particular research question. With globally increasing access to specialized [[Software Tools |survey tools]], [[Survey Firm | survey firms]], and field manuals, primary data has become the dominant source for empirical inquiry in development economics.<br />
</onlyinclude><br />
== Read First ==<br />
*[https://github.com/worldbank/dime-standards/blob/master/dime-research-standards/README.md The DIME Research Standards] provide a comprehensive checklist to ensure that collection and handling of research data is in line with global best practices.<br />
*[[Field Surveys|Personal interviews]] are one of the most effective media for primary data collection. Depending on the research question, these interviews may take the form of household surveys, business (firm) surveys, or agricultural (farm) surveys.<br />
*<code>[[iefieldkit]]</code> is a Stata package that aids primary data collection. It currently supports three major components of that workflow: [[Questionnaire Design|survey design]]; survey completion; and [[Data Cleaning|data-cleaning]] and [[Iefieldkit|survey harmonization]].<br />
<br />
== Guidelines ==<br />
While impact evaluations often benefit from [[Secondary Data Sources|secondary sources of data]] like administrative data, census data, or household data, these sources may not always be available. In such cases, researchers need to collect data directly through a series of [[Questionnaire Design|well-designed]] interviews and [[Field Surveys|surveys]]. The process of collecting primary data requires a great deal of foresight, [[Field Management|planning]] and coordination. <br />
Listed below are the crucial steps involved in the [[Preparing for Field Data Collection | preparation and collection]] of primary data.<br />
<br />
=== Acquire human subjects approval ===<br />
There are strict rules about [[Human Subjects Approval | acquiring human subjects approval]]. Researchers must understand the [[Research Ethics|ethics]] and rules for [[Data Security|security of sensitive data]], and should use proper tools for [[Encryption | encryption]] and [[De-identification | de-identification]] of [[Personally Identifiable Information_(PII)|personally identifiable information (PII)]].<br />
<br />
=== Compile the survey budget ===<br />
Researchers must prepare a [[Survey Budget | survey budget]] before [[Procuring a Survey Firm|procuring a survey firm]]. This step allows researchers to calculate expected costs of conducting a study, and compare these with the proposals of firms that submit an '''expression of interest (EOI)'''.<br />
<br />
=== Determine relevant parameters of a study ===<br />
After agreeing upon a budget, researchers then decide upon factors like the adequate '''sampling frame''' (which is a list of individuals or units in a population from which a sample can be drawn), [[Sample Size | sample size]], and [[Sampling & Power Calculations | statistical power]] based on which they can then [[Randomized_Control_Trials|randomize treatment]].<br />
<br />
=== Procure a survey firm ===<br />
The next step is to [[Procuring a Survey Firm|procure a survey firm]] after issuing detailed [[Survey Firm TOR|terms of reference (TOR)]], and performing due diligence among local research firm options.<br />
<br />
=== Carry out a pre-pilot===<br />
The '''first stage''' of the [[Survey Pilot|survey pilot]], the '''pre-pilot''' involves two things: [[Piloting Survey Content |piloting content]] and [[Piloting Survey Protocols| piloting protocols]]. Clear protocols allow researchers to ensure that [[Preparing for Field Data Collection|field collection]] is carried out consistently across teams and/or regions, and ensure that published [[Reproducible_Research|research is reproducible]].<br />
<br />
=== Refine and review the survey design ===<br />
The '''first stage''' of the [[Survey Pilot|survey pilot]] allows researchers to develop a [[Questionnaire_Design|design]] for the instrument. The researchers then conduct the '''second stage''' of the survey pilot, called [[Piloting_Survey_Content|content-focused pilot]], to review and refine the structure of the instrument.<br />
<br />
=== Translate the survey instrument ===<br />
After the content-focused pilot, the research firm [[Questionnaire_Translation|translates the instrument]] into all local languages. This step helps ensure that the survey can reach more respondents, thereby making the study more effective.<br />
<br />
=== Program the instrument ===<br />
After obtaining [[IRB Approval|IRB approval]], researchers [[Questionnaire Programming|program the questionnaire]]. This step makes it easier to share surveys that rely on methods like [[Computer-Assisted Personal Interviews (CAPI)]] or [[Computer-Assisted Field Entry (CAFE)]].<br />
<br/> Also refer to [[SurveyCTO_Coding_Practices|SurveyCTO coding practices]] to learn more about programming surveys.<br />
<br />
=== Train enumerators and monitor data quality ===<br />
After validating the programming of the questionnaire, the researchers [[Enumerator Training | train enumerators]] and [[Monitoring_Data_Quality|monitor data quality]] to generate a '''final draft''' of the instrument. '''Monitoring''' can be done in the form of [[Back_Checks|back checks]], [[Monitoring Data Quality#High Frequency Checks|high frequency checks]], as well as other methods.<br />
<br />
=== Maintain an organized data folder ===<br />
DIME Analytics has created a Stata command, <code>[[iefolder]]</code>. Part of the DIME Analytics Stata package <code>[[ietoolkit]]</code>, it helps increase project efficiency and reduces the risk of error in a study.<br />
<br />
== Related Pages ==<br />
[[Special:WhatLinksHere/Primary_Data_Collection|Click here for pages that link to this topic]].<br />
<br />
== Additional Resources ==<br />
* Oxfam, [http://policy-practice.oxfam.org.uk/publications/planning-survey-research-578973 Brief on Planning Survey Research]<br />
* World Bank (DIME), [http://web.worldbank.org/archive/website01542/WEB/IMAGES/SURVEY.PDF Guide on Planning, Preparing & Monitoring Household Surveys]<br />
* World Bank (DIME Analytics), [https://github.com/worldbank/DIME-Resources/blob/master/survey-preparing.pdf Guidelines on Preparing for Data Collection]<br />
* Oxfam, [https://oxfamilibrary.openrepository.com/bitstream/handle/10546/620522/cs-going-digital-data-quality-data-collection-240718-en.pdf?sequence=1&isAllowed=y Case study on using electronic data collection (SurveyCTO) and Stata to improve data quality in the field]<br />
<br />
[[Category: Primary Data Collection ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=SurveyCTO_Coding_Practices&diff=4791SurveyCTO Coding Practices2018-11-19T16:02:42Z<p>Maria jones: /* Additional Resources */</p>
<hr />
<div><onlyinclude>This article discusses solutions to common issues in the SurveyCTO programming language. For a general introduction to structuring your approach to CAPI programming, and for best-practice settings, see the [[Questionnaire Programming]] topic.</onlyinclude><br />
<br />
'''Read First:''' All coding examples linked in this section are stored in Google Drive. SurveyCTO also allows you to pull this code directly to your server using the URL of the Google Sheet (alternatively, you can copy the code to Excel).<br />
<br />
== Labelling ==<br />
<br />
To speed up data import, all SurveyCTO surveys should have a labeling column called "label:stata" in both the questionnaire and the value labeling tab, which will be used to download and process the data. These labels should be in English, be no longer than 32 characters, and use no special characters. The research assistant responsible for data management can be of great assistance in preparing this. See the SurveyCTO documentation on "Translating a form into multiple languages" for more details.<br />
<br />
== Randomization ==<br />
In the field, the [[randomization in SurveyCTO | best practice when randomizing anything]] is to prepare the randomization before the field activities start, and preload the result of the randomization into the survey so that it is replicable. What follows are some examples of SurveyCTO forms that randomly select survey participants:<br />
<br />
* [[SurveyCTO Random Draw of Beneficiaries 1|Random draw of beneficiaries from a large pool]], without knowing if the potential beneficiaries are valid participants - this form randomly prioritizes participation over a group of IDs, which are then verified by the enumerator until a final group of 8 participants are registered.<br />
<br />
* [[SurveyCTO Random Draw of Beneficiaries 2|Random draw of any number of beneficiaries using a repeat group]] - here we randomly prioritize a group of IDs using an elegant and concise repeat-group solution; however, this is not recommended for field use, as it is not replicable without adaptation.<br />
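To illustrate the preloading approach described above, here is a minimal sketch in Stata; the file names, ID variable, and seed are hypothetical:<br />

```stata
* Sketch: replicable randomization prepared before fieldwork
* (file names, variables, and the seed below are illustrative)
use "sample_frame.dta", clear
isid household_id          // the frame must have a unique ID
sort household_id          // fix the sort order before drawing
set seed 650489            // a fixed seed makes the draw replicable
generate double rand = runiform()
sort rand
generate priority = _n     // 1 = first household to approach
export delimited household_id priority using "preload.csv", replace
```

The resulting CSV can then be attached to the SurveyCTO form as preloaded data, so enumerators work through households in the pre-assigned order and the draw can be reproduced exactly.<br />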
<br />
== Repeat Groups / Rosters ==<br />
This section lists code examples for special requirements related to rosters and repeat groups. These can be used to develop useful functionality within forms, particularly for interacting with the responses from a household, plot, or crop roster. Here are some examples:<br />
<br />
* [[SurveyCTO Repeat Group Using Previous Choices|Setting Up Repeat Group Using Previous Choices]] - there are many cases when you want to repeat a set of questions over previously selected responses, such as a set of crops cultivated or activities performed. This example shows the 2 main ways of coding this.<br />
<br />
* [[SurveyCTO Select from Roster Age Order|Select Member in Roster Based on Criteria]] - in this example, we have a roster of children, and we want the respondent to be asked to select the youngest child if that child's mother is present; if she is not present, to select the second-youngest child if that child's mother is present; and so forth. <br />
<br />
* [[SurveyCTO Conditional Filtering|Filtering on Conditions of Repeat Group Questions]] - this example utilizes responses found inside a repeat group roster as conditions upon which to filter choices for questions further down in a form.<br />
<br />
* [[SurveyCTO Filtering in Repeated Questions|Filtering in Repeated Choice Questions]] - this example shows how to code a repeating question where the list of choices is reduced if an option was previously selected.<br />
<br />
=== Agriculture Survey Advice ===<br />
There are many challenges in coding the agriculture sections of household surveys. There is a lot of data to capture at different and changing levels: per season, per plot, per crop, etc., and sometimes you might want to change the level of questions from crop within plot within season to, for example, just the crop level. It is important that respondents are able to recall harvest and sales information as accurately as possible, so surveys must be structured to account for this. Here are some example forms that walk through the main issues and suggest designs to overcome them.<br />
<br />
* [[SurveyCTO Dealing with 'Other' Crops Over Different Repeat Group Levels |Dealing with 'Other' Crops Over Different Repeat Group Levels]] - in SurveyCTO, it is very difficult to introduce new crops in different repeats and recall them at other points in the survey. This example form discusses these difficulties and suggests a structure for referring back to them in other sections.<br />
<br />
== Groups ==<br />
Use groups liberally, but do not overuse them. In general, groups are used to fulfill one of the purposes below:<br />
<br />
* [[Relevance Condition to Multiple Fields|Apply a relevance condition to multiple fields]].<br />
<br />
* [[Multiple Questions Displayed at the Same Time |Display multiple questions on the same screen]].<br />
<br />
* Frame all the questions of a module in a group. Only do this at the highest level of the survey; do not use groups for sub-levels of a module.<br />
<br />
== Choice Lists ==<br />
Choice lists are the answer options an enumerator can choose from in a ''select_one'' or ''select_multiple'' question. They are listed in the choices tab of the SurveyCTO questionnaire. Open Data Kit, the platform on which SurveyCTO is built, places very few restrictions on how you can code your options. However, there are [[SurveyCTO Choice Lists|choice list best practices]] that matter for data quality.<br />
<br />
* [[SurveyCTO Dynamically Populated Choice Lists|Dynamically Populated Choice Lists - basic]] - it is possible to program dynamically populated choice lists using answers given by the respondents in a previous question.<br />
<br />
* [[SurveyCTO Dynamically Populated Choice Lists From Select One|Dynamically Populated Choice Lists - from repeated select_one]] - a specific example of dynamically populated choice list is when you populate a ''select_multiple'' question with answers from a ''select_one'' asked in a repeat group. For example, say that you list crops grown in a repeat group where each repeat is a crop, and later you want to be able to ask "''which crop did you grow the most?''" and only the crops already selected in the repeat group should display. <br />
<br />
== Other Tips and Tricks ==<br />
* [[SurveyCTO HTML Input|Question font formatting in HTML]] - SurveyCTO accepts HTML commands in the text of questions. This can be used to <font color="red"> highlight </font> and '''emphasize''' key information, among other uses.<br />
<br />
<br />
=== Categories to add to this page: ===<br />
* Household rosters<br />
** General examples<br />
** Updating rosters from previous rounds on the tablet during the interview<br />
* ID and identification <br />
** Assigning IDs in the field - both when the sample is known before the survey launches and when respondents are sampled in the field<br />
<br />
== Additional Resources ==<br />
* IFPRI, [https://www.surveycto.com/best-practices/pro-tips-for-agricultural-surveys-from-ifpri/ tips on coding complex agricultural surveys in SurveyCTO] <br />
<br />
[[Category: SurveyCTO Coding Practices ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Geo_Spatial_Data&diff=4784Geo Spatial Data2018-08-06T20:24:59Z<p>Maria jones: /* Examples of Papers */</p>
<hr />
<div>== Read First ==<br />
<onlyinclude><br />
The proliferation of remote sensing technology has created new opportunities for high-resolution, affordable geospatial data, which have great potential as a source of impact evaluation data. <br />
<br />
</onlyinclude><br />
<br />
== Guidelines ==<br />
<br />
===Repositories of Spatial Data===<br />
The following repositories pull in spatial data from a variety of sources.<br />
*[https://earthengine.google.com/datasets/ Google Earth Engine]: Stores petabytes of satellite imagery on Google's cloud.<br />
*[http://sedac.ciesin.columbia.edu/ Socioeconomic Data and Applications Center (SEDAC)]: Provides links to a number of spatially referenced datasets.<br />
*[http://geoquery.org/ AidData geo.query]: Allows users to extract data to administrative boundaries.<br />
<br />
===Satellite-Based Datasets===<br />
The following are commonly used datasets from satellite imagery or derived from satellite imagery.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Dataset<br />
! Spatial Resolution<br />
! Temporal Resolution<br />
! Description<br />
|-<br />
| [https://ngdc.noaa.gov/eog/viirs/index.html Nighttime Lights: VIIRS]<br />
| 300m<br />
| Monthly, 2012 to Present<br />
| Nighttime lights are increasingly used as a metric for local economic development. <br />
|-<br />
| [https://ngdc.noaa.gov/eog/dmsp/downloadV4composites.html Nighttime Lights: DMSP-OLS]<br />
| 750m<br />
| Annual, 1992-2013<br />
| For studies that need a long time series of nighttime lights, DMSP-OLS data can be combined with VIIRS data. VIIRS, however, offers [http://journals.sfu.ca/apan/index.php/apan/article/view/7/pdf_7 several improvements] over DMSP-OLS, including higher resolution and less light saturation in urban areas.<br />
|-<br />
| [https://landsat.usgs.gov/ Landsat]<br />
| 30m<br />
| Every 16 days, 1972 to Present<br />
| Landsat images capture the earth across [https://landsat.usgs.gov/what-are-band-designations-landsat-satellites multiple spectral bands], including spectra unobservable to the human eye. Different combinations of these spectral bands emphasize different aspects of the earth. One of the most common indices is the Normalized Difference Vegetation Index ([https://earthobservatory.nasa.gov/Features/MeasuringVegetation/measuring_vegetation_2.php NDVI]), which provides a measure of vegetation biomass. A list of common indices can be found [http://pro.arcgis.com/en/pro-app/help/data/imagery/indices-gallery.htm here].<br />
|-<br />
| [https://www.esa-landcover-cci.org/?q=node/175 ESA Land Cover]<br />
| 300m<br />
| Annual, 1992 to 2015<br />
| Classifies land cover into one of [https://www.theia-land.fr/en/products/land-cover-globcover 22 land cover types].<br />
|}<br />
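To make the table concrete, a band index such as NDVI is computed pixel by pixel from two of the spectral bands above: NDVI = (NIR - Red) / (NIR + Red). The sketch below illustrates the arithmetic only; the reflectance values are invented, not taken from any real scene.<br />

```python
# NDVI from red and near-infrared (NIR) reflectance values.
# Band values here are illustrative, not from any real scene.
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    if nir + red == 0:
        return 0.0  # avoid division by zero over no-data pixels
    return (nir - red) / (nir + red)

# Dense vegetation reflects strongly in NIR and absorbs red light,
# so NDVI approaches 1; bare soil or water gives values near 0 or below.
print(ndvi(0.50, 0.08))  # vegetated pixel: high NDVI
print(ndvi(0.30, 0.25))  # sparse cover: low NDVI
```

In practice this computation is run over whole rasters with a GIS or a raster library rather than per pixel in a loop.<br />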
<br />
===Georeferenced Data Sources===<br />
*[http://aiddata.org/datasets AidData]: AidData has geocoded foreign aid projects from a number of different donors and countries, including all approved [http://aiddata.org/data/world-bank-geocoded-research-release-level-1-v1-4-2 World Bank projects] from 1995 to 2014, [http://aiddata.org/datasets Chinese official finance] from 2000 to 2014, and [http://aiddata.org/data/afdb-2009-2010-all-approved-projects African Development Bank] projects approved in 2009-2010.<br />
*[http://afrobarometer.org/data/geocoded-data Afrobarometer]: Afrobarometer has surveyed attitudes on democracy, governance and society across 36 countries in Africa in 6 survey rounds from 1999 to 2015. In partnership with AidData, Afrobarometer has recently geocoded the surveys.<br />
*[https://dhsprogram.com/ Demographic and Health Surveys (DHS)]: DHS surveys from USAID are georeferenced at the enumeration area level. However, DHS [https://dhsprogram.com/pubs/pdf/SAR7/SAR7.pdf randomly displaces] the geographic coordinates to protect respondent confidentiality. <br />
*[http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/EXTLSMS/0,,contentMDK:23512006~pagePK:64168445~piPK:64168309~theSitePK:3358997,00.html Living Standards Measurement Survey (LSMS)]: Most LSMS datasets are geocoded at the enumeration area level. <br />
<br />
===Impact Evaluation with Geospatial Data===<br />
The emergence of georeferenced data has provided opportunities to evaluate foreign investments at lower cost than traditional RCTs. These evaluations have been dubbed Geospatial Impact Evaluations (GIEs); see [http://docs.aiddata.org/ad4/pdfs/wps44_a_primer_on_geospatial_impact_evaluation_methods_tools_and_applications.pdf here] for an AidData working paper that describes GIE methods and applications and reviews a number of studies that conduct GIEs. In addition, the paper highlights two R packages relevant to working with geospatial data: (1) [https://github.com/itpir/geoMatch geoMATCH], which implements matching while accounting for geographic spillovers from treatment to control units, and (2) [https://github.com/itpir/geoSIMEX geoSIMEX], which allows users to account for spatial imprecision in analysis. <br />
<br />
===Using intersections to produce usable data for Stata===<br />
<br />
*For a project with a spatial component, you first need geographic polygons that represent your project: for instance, polygons marking the boundaries of your villages.<br />
*For some themes, such as the locations of hospitals and clinics, or rainfall, data are only available in GIS formats. Such data come with coordinates that position each record in space.<br />
*After importing the data into a GIS software package, overlay the new layer on the geographic layer that represents your project.<br />
*Then intersect the two layers and request, for instance, a mean over your geographic polygons, or a maximum, or a count. In QGIS, you will find many of these spatial operations under the Vector/Geoprocessing Tools menu.<br />
*These tools also allow you to subtract polygons from one another; the resulting shapes can then be intersected with a data layer, which is often useful.<br />
*All that remains is to export the newly generated data for use in Stata.<br />
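For the simplest case, counting point features (say, clinics) per project polygon, the overlay step above can be sketched without a GIS at all. The polygons, point coordinates, and village names below are invented for illustration; a real workflow would use a GIS package on actual shapefiles.<br />

```python
# A minimal sketch of the overlay/aggregation step: counting point
# features (e.g. clinics) falling inside each project polygon.
# All coordinates and names are made up for illustration.

def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (list of (x, y) vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Toggle 'inside' each time a ray cast to the right crosses an edge.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

villages = {
    "village_a": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "village_b": [(5, 0), (9, 0), (9, 4), (5, 4)],
}
clinics = [(1, 1), (3, 2), (6, 1)]

# One count per polygon: the "count over your geographical polygons" step.
counts = {name: sum(point_in_polygon(x, y, poly) for x, y in clinics)
          for name, poly in villages.items()}
print(counts)  # {'village_a': 2, 'village_b': 1}
```

The resulting one-row-per-polygon table is exactly the shape of data that exports cleanly to a statistics package.<br />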
<br />
===Data Interpolation===<br />
<br />
Interpolation is useful when you want to estimate values in between spatial measurements. You will not want to do this in every setting, since the granularity of your data carries information in itself. For some themes, however, such as the level of a groundwater table or of ground contamination, interpolation is appropriate because the underlying phenomenon is continuous.<br />
<br />
*The main thing to know is that the mathematical method you choose for the interpolation has a large impact on your results.<br />
*When in doubt (which is most of the time), a common choice is kriging, since the interpolated surface passes through the measurement points exactly: its distance from each measurement point is zero.<br />
*GIS software packages usually include modules for interpolation. Many also support dynamic modelling, that is, tracking the state of your variables across space and over time.<br />
*Interpolating your data leads naturally to the production of heat maps.<br />
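Kriging itself requires a geostatistics package, but the exactness property described above, that the surface passes through each measurement point, can be illustrated with a simpler method: inverse-distance weighting (IDW). The sketch below is a stand-in under that substitution, not an implementation of kriging, and the sample well readings are invented.<br />

```python
# Inverse-distance weighting (IDW): a simple spatial interpolator that,
# like kriging, returns the measured value exactly at a measurement point.
# Sample readings (x, y, value) are invented for illustration.
import math

def idw(x, y, samples, power=2):
    """Interpolate the value at (x, y) from [(xi, yi, value)] samples."""
    num, den = 0.0, 0.0
    for xi, yi, vi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0:
            return vi  # exactly at a measurement point: no smoothing
        w = 1.0 / d ** power
        num += w * vi
        den += w
    return num / den

samples = [(0, 0, 10.0), (10, 0, 20.0), (0, 10, 30.0)]
print(idw(0, 0, samples))  # 10.0: the surface honors the measurement
print(idw(5, 0, samples))  # a distance-weighted average of the readings
```

Evaluating such a function on a regular grid of (x, y) points is what produces the continuous surface a GIS renders as a heat map.<br />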
<br />
===Heat Maps===<br />
*When producing heat maps, note that there are mathematical tools that can enhance the spatial patterns in your data.<br />
*These tools are transformations applied to the surface, such as first differences or Fourier transforms. They can sharpen your results considerably, and may even reveal patterns you would otherwise miss.<br />
<br />
===Examples of Papers===<br />
<br />
Many influential papers using these types of data have been published in journals, including:<br />
<br />
* J. Vernon Henderson, Adam Storeygard, and David N. Weil. 2012. [http://pubs.aeaweb.org/doi/pdfplus/10.1257/aer.102.2.994 Measuring Economic Growth from Outer Space]. ''American Economic Review'', 102(2): 994-1028.<br />
<br />
* Dave Donaldson and Adam Storeygard. 2016. [http://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.30.4.171 The View from Above: Applications of Satellite Data in Economics]. ''Journal of Economic Perspectives'', 30(4): 171-198.<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Secondary Data Sources]]<br />
<br />
== Additional Resources ==<br />
* QGIS documentation, which covers many topics in great detail: https://docs.qgis.org/2.2/en/docs/index.html<br />
<br />
[[Category: Secondary Data Sources ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Geo_Spatial_Data&diff=4783Geo Spatial Data2018-08-06T20:24:49Z<p>Maria jones: /* Examples of Papers */</p>
<hr />
<div>== Read First ==<br />
<onlyinclude><br />
The proliferation of remote sensing technology has created new opportunities for high-resolution, affordable geospatial data, which have great potential as a source of impact evaluation data. <br />
<br />
</onlyinclude><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== Guidelines ==<br />
<br />
===Repositories of Spatial Data===<br />
The following are repositories of spatial data. The following sites pull in spatial data from a variety of sources.<br />
*[https://earthengine.google.com/datasets/ Google Earth Engine]: Stores petabytes of satellite imagery on google's cloud.<br />
*[http://sedac.ciesin.columbia.edu/ Socio Economic Data and Applications Center (SEDAC)]: Provides links to a number of spatially referenced datasets.<br />
*[http://geoquery.org/ AidData geo.query]: Allows users to extract data to administrative boundaries.<br />
<br />
===Satellite-Based Datasets===<br />
The following are commonly used datasets from satellite imagery or derived from satellite imagery.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Dataset<br />
! Spatial Resolution<br />
! Temporal Resolution<br />
! Description<br />
|-<br />
| [https://ngdc.noaa.gov/eog/viirs/index.html Nighttime Lights: VIIRS]<br />
| 300m<br />
| Monthly, 2012 to Present<br />
| Nighttime lights has increasingly been used as a metric for local economic development. <br />
|-<br />
| [https://ngdc.noaa.gov/eog/dmsp/downloadV4composites.html Nighttime Lights: DMSP-OLS]<br />
| 750m<br />
| Annual, 1992-2013<br />
| For studies that need a long time series of nighttime lights, DMSP-OLS data can be combined with VIIRS data. VIIRS, however, has [http://journals.sfu.ca/apan/index.php/apan/article/view/7/pdf_7 several improvements] over DMSP-OLS, including a high resolution and less light saturation in urban areas.<br />
|-<br />
| [https://landsat.usgs.gov/ Landsat]<br />
| 30m<br />
| Every 16 days, 1972 to Present<br />
| Landsat images capture the earth across [https://landsat.usgs.gov/what-are-band-designations-landsat-satellites multiple spectral bands], including spectra unobservable to the human eye. Different combinations of these spectral bands emphasize different aspects of the earth. One of the most common indices is the Normalized Difference Vegetation Index ([https://earthobservatory.nasa.gov/Features/MeasuringVegetation/measuring_vegetation_2.php NDVI]), which provides a measure of vegetation biomass. A list of common indices can be found [http://pro.arcgis.com/en/pro-app/help/data/imagery/indices-gallery.htm here].<br />
|-<br />
| [https://www.esa-landcover-cci.org/?q=node/175 ESA Land Cover]<br />
| 300m<br />
| Annual, 1992 to 2015<br />
| Classifies land cover into one of [https://www.theia-land.fr/en/products/land-cover-globcover 22 land cover types].<br />
|}<br />
<br />
===Georeferenced Data Sources===<br />
*[http://aiddata.org/datasets AidData]: AidData has geocoded foreign aid projects from a number of different donors and countries, including all approved [http://aiddata.org/data/world-bank-geocoded-research-release-level-1-v1-4-2 World Bank projects] from 1995 to 2014, [http://aiddata.org/datasets Chinese official finance] from 2000 to 2014, and [http://aiddata.org/data/afdb-2009-2010-all-approved-projects African Development Bank] projects approved in 2009-2010.<br />
*[http://afrobarometer.org/data/geocoded-data Afrobarometer]: Afrobarometer has surveyed attitudes on democracy, governance and society across 36 countries in Africa in 6 survey rounds from 1999 to 2015. In partnership with AidData, Afrobarometer has recently geocoded the surveys.<br />
*[https://dhsprogram.com/ Demographic and Health Surveys (DHS)]: DHS surveys from USAID are georeferenced at the enumeration area level. However, DHS [https://dhsprogram.com/pubs/pdf/SAR7/SAR7.pdf randomly displaces] the geographic coordinates to protect respondent confidentiality. <br />
*[http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/EXTLSMS/0,,contentMDK:23512006~pagePK:64168445~piPK:64168309~theSitePK:3358997,00.html Living Standards Measurement Survey (LSMS)]: Most LSMS datasets are geocoded at the enumeration area level. <br />
<br />
===Impact Evaluation with Geospatial Data===<br />
The emergence of georeferenced data has provided opportunities to evaluate foreign investments at lower costs than traditional RCTs. These evaluations have been dubbed Geospatial Impact Evaluations (GIEs); see [http://docs.aiddata.org/ad4/pdfs/wps44_a_primer_on_geospatial_impact_evaluation_methods_tools_and_applications.pdf here] for a working paper from AidData that describes methods and applications to perform GIEs. The paper describes a number of papers that conduct GIEs. In addition, it highlights two R packages that employ methods relevant to using geospatial data: (1) [https://github.com/itpir/geoMatch geoMATCH], which employs matching while accounting for geographic spillover from treatment to control units and (2) [https://github.com/itpir/geoSIMEX geoSIMEX], which allows users to account for spatial imprecision in analysis. <br />
<br />
===Use of intersection to produce usable data for Stata===<br />
<br />
*For a particular project that has a spatial component, you first need to have geographical polygons that are representative of your project. For instance, the polygons that shows the limits of your villages.<br />
*Then, for some particular themes you will only find data in the form of GIS data. Examples of this are hospitals and clinics, or rainfall. The data in question is then intrinsically accompanied with coordinates, that allows to position the information in a spatial setting.<br />
*So then, after importing the particular data to a GIS software, you overlay your new data to the geographical layer that represents your project.<br />
*Once done, and simply intersect both layers, and ask for instance for a mean over your geographical polygons, or for a maximum or for a count. In QGIS, for instance, you will find a bunch of these useful spacial operations in the Vector/Geoprocessing Tools section.<br />
*These tools also allow you for instance to substract polygons from others; Then why not intersecting with data layer afterwards? It is much useful.<br />
*All that needs to be done then is to export your newly generated data.<br />
<br />
===Data Interpolation===<br />
<br />
This is useful for those who want to generate information in between spatial measurements. You will not want to this is several settings, since the granularity of your data has a value. However for some themes, like for the level of a groundwater table, or for the level of a ground contamination, this is useful as the nature of the subject itself (ex: contamination) is continuous.<br />
<br />
*The main thing one should know, is that the mathematical method you chose for the interpolation has a large impact on your results.<br />
*In case of doubts (or in much of cases, let's say), one should chose to use krigging, since the interpolation (let's visualize it as a surface) goes through the measurement points exactly. It's distance at a measurement point is equal to zero.<br />
*GIS softwares usually have modules that allow to do interpolation. They also allow to do dynamic modelling (state of your variables, in space, with time).<br />
*The interpolation of your data lead to the production of heat maps.<br />
<br />
===Heat Maps===<br />
*When producing heat maps, one should know that there exist mathematical tools that allow you to enhance the spatial shapes that your data.<br />
*These "tools" are in fact transformations on your surface, such as first difference, Fourrier, ect. They can provide a much better definition of your results, and even allow you to "see" something that you might have missed when not using them.<br />
<br />
===Examples of Papers===<br />
<br />
* Many influential papers using these type of data have been published in journals<br />
<br />
* J. Vernon Henderson, Adam Storeygard, and David N. Weil. 2012. [http://pubs.aeaweb.org/doi/pdfplus/10.1257/aer.102.2.994 Measuring Economic Growth from Outer Space]. In '''American Economic Review''', 102(2): 994-1028. <br />
<br />
* Dave Donaldson and Adam Storeygard. 2016. <br />
[http://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.30.4.171 The View from Above: Applications of Satellite Data in Economics]. '''Journal of Economic Perspectives''', 30(4):171-198.<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Secondary Data Sources]]<br />
<br />
== Additional Resources ==<br />
* Documentation for QGIS which covers a lot of topics in great detail: https://docs.qgis.org/2.2/en/docs/index.html<br />
<br />
[[Category: Secondary Data Sources ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Geo_Spatial_Data&diff=4782Geo Spatial Data2018-08-06T20:23:57Z<p>Maria jones: /* Examples of Papers */</p>
<hr />
<div>== Read First ==<br />
<onlyinclude><br />
The proliferation of remote sensing technology has created new opportunities for high-resolution, affordable geospatial data, which have great potential as a source of impact evaluation data. <br />
<br />
</onlyinclude><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== Guidelines ==<br />
<br />
===Repositories of Spatial Data===<br />
The following are repositories of spatial data. The following sites pull in spatial data from a variety of sources.<br />
*[https://earthengine.google.com/datasets/ Google Earth Engine]: Stores petabytes of satellite imagery on google's cloud.<br />
*[http://sedac.ciesin.columbia.edu/ Socio Economic Data and Applications Center (SEDAC)]: Provides links to a number of spatially referenced datasets.<br />
*[http://geoquery.org/ AidData geo.query]: Allows users to extract data to administrative boundaries.<br />
<br />
===Satellite-Based Datasets===<br />
The following are commonly used datasets from satellite imagery or derived from satellite imagery.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Dataset<br />
! Spatial Resolution<br />
! Temporal Resolution<br />
! Description<br />
|-<br />
| [https://ngdc.noaa.gov/eog/viirs/index.html Nighttime Lights: VIIRS]<br />
| 300m<br />
| Monthly, 2012 to Present<br />
| Nighttime lights has increasingly been used as a metric for local economic development. <br />
|-<br />
| [https://ngdc.noaa.gov/eog/dmsp/downloadV4composites.html Nighttime Lights: DMSP-OLS]<br />
| 750m<br />
| Annual, 1992-2013<br />
| For studies that need a long time series of nighttime lights, DMSP-OLS data can be combined with VIIRS data. VIIRS, however, has [http://journals.sfu.ca/apan/index.php/apan/article/view/7/pdf_7 several improvements] over DMSP-OLS, including a high resolution and less light saturation in urban areas.<br />
|-<br />
| [https://landsat.usgs.gov/ Landsat]<br />
| 30m<br />
| Every 16 days, 1972 to Present<br />
| Landsat images capture the earth across [https://landsat.usgs.gov/what-are-band-designations-landsat-satellites multiple spectral bands], including spectra unobservable to the human eye. Different combinations of these spectral bands emphasize different aspects of the earth. One of the most common indices is the Normalized Difference Vegetation Index ([https://earthobservatory.nasa.gov/Features/MeasuringVegetation/measuring_vegetation_2.php NDVI]), which provides a measure of vegetation biomass. A list of common indices can be found [http://pro.arcgis.com/en/pro-app/help/data/imagery/indices-gallery.htm here].<br />
|-<br />
| [https://www.esa-landcover-cci.org/?q=node/175 ESA Land Cover]<br />
| 300m<br />
| Annual, 1992 to 2015<br />
| Classifies land cover into one of [https://www.theia-land.fr/en/products/land-cover-globcover 22 land cover types].<br />
|}<br />
<br />
===Georeferenced Data Sources===<br />
*[http://aiddata.org/datasets AidData]: AidData has geocoded foreign aid projects from a number of different donors and countries, including all approved [http://aiddata.org/data/world-bank-geocoded-research-release-level-1-v1-4-2 World Bank projects] from 1995 to 2014, [http://aiddata.org/datasets Chinese official finance] from 2000 to 2014, and [http://aiddata.org/data/afdb-2009-2010-all-approved-projects African Development Bank] projects approved in 2009-2010.<br />
*[http://afrobarometer.org/data/geocoded-data Afrobarometer]: Afrobarometer has surveyed attitudes on democracy, governance and society across 36 countries in Africa in 6 survey rounds from 1999 to 2015. In partnership with AidData, Afrobarometer has recently geocoded the surveys.<br />
*[https://dhsprogram.com/ Demographic and Health Surveys (DHS)]: DHS surveys from USAID are georeferenced at the enumeration area level. However, DHS [https://dhsprogram.com/pubs/pdf/SAR7/SAR7.pdf randomly displaces] the geographic coordinates to protect respondent confidentiality. <br />
*[http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/EXTLSMS/0,,contentMDK:23512006~pagePK:64168445~piPK:64168309~theSitePK:3358997,00.html Living Standards Measurement Survey (LSMS)]: Most LSMS datasets are geocoded at the enumeration area level. <br />
<br />
===Impact Evaluation with Geospatial Data===<br />
The emergence of georeferenced data has provided opportunities to evaluate foreign investments at lower costs than traditional RCTs. These evaluations have been dubbed Geospatial Impact Evaluations (GIEs); see [http://docs.aiddata.org/ad4/pdfs/wps44_a_primer_on_geospatial_impact_evaluation_methods_tools_and_applications.pdf here] for a working paper from AidData that describes methods and applications to perform GIEs. The paper describes a number of papers that conduct GIEs. In addition, it highlights two R packages that employ methods relevant to using geospatial data: (1) [https://github.com/itpir/geoMatch geoMATCH], which employs matching while accounting for geographic spillover from treatment to control units and (2) [https://github.com/itpir/geoSIMEX geoSIMEX], which allows users to account for spatial imprecision in analysis. <br />
<br />
===Use of intersection to produce usable data for Stata===<br />
<br />
*For a particular project that has a spatial component, you first need to have geographical polygons that are representative of your project. For instance, the polygons that shows the limits of your villages.<br />
*Then, for some particular themes you will only find data in the form of GIS data. Examples of this are hospitals and clinics, or rainfall. The data in question is then intrinsically accompanied with coordinates, that allows to position the information in a spatial setting.<br />
*So then, after importing the particular data to a GIS software, you overlay your new data to the geographical layer that represents your project.<br />
*Once done, and simply intersect both layers, and ask for instance for a mean over your geographical polygons, or for a maximum or for a count. In QGIS, for instance, you will find a bunch of these useful spacial operations in the Vector/Geoprocessing Tools section.<br />
*These tools also allow you for instance to substract polygons from others; Then why not intersecting with data layer afterwards? It is much useful.<br />
*All that needs to be done then is to export your newly generated data.<br />
<br />
===Data Interpolation===<br />
<br />
This is useful for those who want to generate information in between spatial measurements. You will not want to this is several settings, since the granularity of your data has a value. However for some themes, like for the level of a groundwater table, or for the level of a ground contamination, this is useful as the nature of the subject itself (ex: contamination) is continuous.<br />
<br />
*The main thing one should know, is that the mathematical method you chose for the interpolation has a large impact on your results.<br />
*In case of doubts (or in much of cases, let's say), one should chose to use krigging, since the interpolation (let's visualize it as a surface) goes through the measurement points exactly. It's distance at a measurement point is equal to zero.<br />
*GIS softwares usually have modules that allow to do interpolation. They also allow to do dynamic modelling (state of your variables, in space, with time).<br />
*The interpolation of your data lead to the production of heat maps.<br />
<br />
===Heat Maps===<br />
*When producing heat maps, one should know that there exist mathematical tools that allow you to enhance the spatial shapes that your data.<br />
*These "tools" are in fact transformations on your surface, such as first difference, Fourrier, ect. They can provide a much better definition of your results, and even allow you to "see" something that you might have missed when not using them.<br />
<br />
===Examples of Papers===<br />
<br />
* Many influential papers using these type of data have been published in journals<br />
<br />
* J. Vernon Henderson, Adam Storeygard, and David N. Weil. 2012. [http://pubs.aeaweb.org/doi/pdfplus/10.1257/aer.102.2.994 Measuring Economic Growth from Outer Space]. In '''American Economic Review''', 102(2): 994-1028. <br />
<br />
* Dave Donaldson and Adam Storeygard. 2016. [http://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.30.4.171 The View from Above: Applications of Satellite Data in Economics]. ''Journal of Economic Perspectives'', 30(4):171-198.<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Secondary Data Sources]]<br />
<br />
== Additional Resources ==<br />
* Documentation for QGIS which covers a lot of topics in great detail: https://docs.qgis.org/2.2/en/docs/index.html<br />
<br />
[[Category: Secondary Data Sources ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Geo_Spatial_Data&diff=4781Geo Spatial Data2018-08-06T20:23:28Z<p>Maria jones: /* Examples of Papers */</p>
<hr />
<div>== Read First ==<br />
<onlyinclude><br />
The proliferation of remote sensing technology has created new opportunities for high-resolution, affordable geospatial data, which have great potential as a source of impact evaluation data. <br />
<br />
</onlyinclude><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== Guidelines ==<br />
<br />
===Repositories of Spatial Data===<br />
The following are repositories of spatial data. The following sites pull in spatial data from a variety of sources.<br />
*[https://earthengine.google.com/datasets/ Google Earth Engine]: Stores petabytes of satellite imagery on google's cloud.<br />
*[http://sedac.ciesin.columbia.edu/ Socio Economic Data and Applications Center (SEDAC)]: Provides links to a number of spatially referenced datasets.<br />
*[http://geoquery.org/ AidData geo.query]: Allows users to extract data to administrative boundaries.<br />
<br />
===Satellite-Based Datasets===<br />
The following are commonly used datasets from satellite imagery or derived from satellite imagery.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Dataset<br />
! Spatial Resolution<br />
! Temporal Resolution<br />
! Description<br />
|-<br />
| [https://ngdc.noaa.gov/eog/viirs/index.html Nighttime Lights: VIIRS]<br />
| 300m<br />
| Monthly, 2012 to Present<br />
| Nighttime lights has increasingly been used as a metric for local economic development. <br />
|-<br />
| [https://ngdc.noaa.gov/eog/dmsp/downloadV4composites.html Nighttime Lights: DMSP-OLS]<br />
| 750m<br />
| Annual, 1992-2013<br />
| For studies that need a long time series of nighttime lights, DMSP-OLS data can be combined with VIIRS data. VIIRS, however, has [http://journals.sfu.ca/apan/index.php/apan/article/view/7/pdf_7 several improvements] over DMSP-OLS, including a high resolution and less light saturation in urban areas.<br />
|-<br />
| [https://landsat.usgs.gov/ Landsat]<br />
| 30m<br />
| Every 16 days, 1972 to Present<br />
| Landsat images capture the earth across [https://landsat.usgs.gov/what-are-band-designations-landsat-satellites multiple spectral bands], including spectra unobservable to the human eye. Different combinations of these spectral bands emphasize different aspects of the earth. One of the most common indices is the Normalized Difference Vegetation Index ([https://earthobservatory.nasa.gov/Features/MeasuringVegetation/measuring_vegetation_2.php NDVI]), which provides a measure of vegetation biomass. A list of common indices can be found [http://pro.arcgis.com/en/pro-app/help/data/imagery/indices-gallery.htm here].<br />
|-<br />
| [https://www.esa-landcover-cci.org/?q=node/175 ESA Land Cover]<br />
| 300m<br />
| Annual, 1992 to 2015<br />
| Classifies land cover into one of [https://www.theia-land.fr/en/products/land-cover-globcover 22 land cover types].<br />
|}<br />
<br />
===Georeferenced Data Sources===<br />
*[http://aiddata.org/datasets AidData]: AidData has geocoded foreign aid projects from a number of different donors and countries, including all approved [http://aiddata.org/data/world-bank-geocoded-research-release-level-1-v1-4-2 World Bank projects] from 1995 to 2014, [http://aiddata.org/datasets Chinese official finance] from 2000 to 2014, and [http://aiddata.org/data/afdb-2009-2010-all-approved-projects African Development Bank] projects approved in 2009-2010.<br />
*[http://afrobarometer.org/data/geocoded-data Afrobarometer]: Afrobarometer has surveyed attitudes on democracy, governance and society across 36 countries in Africa in 6 survey rounds from 1999 to 2015. In partnership with AidData, Afrobarometer has recently geocoded the surveys.<br />
*[https://dhsprogram.com/ Demographic and Health Surveys (DHS)]: DHS surveys from USAID are georeferenced at the enumeration area level. However, DHS [https://dhsprogram.com/pubs/pdf/SAR7/SAR7.pdf randomly displaces] the geographic coordinates to protect respondent confidentiality. <br />
*[http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/EXTLSMS/0,,contentMDK:23512006~pagePK:64168445~piPK:64168309~theSitePK:3358997,00.html Living Standards Measurement Survey (LSMS)]: Most LSMS datasets are geocoded at the enumeration area level. <br />
<br />
===Impact Evaluation with Geospatial Data===<br />
The emergence of georeferenced data has provided opportunities to evaluate foreign investments at lower costs than traditional RCTs. These evaluations have been dubbed Geospatial Impact Evaluations (GIEs); see [http://docs.aiddata.org/ad4/pdfs/wps44_a_primer_on_geospatial_impact_evaluation_methods_tools_and_applications.pdf here] for a working paper from AidData that describes methods and applications to perform GIEs. The paper describes a number of papers that conduct GIEs. In addition, it highlights two R packages that employ methods relevant to using geospatial data: (1) [https://github.com/itpir/geoMatch geoMATCH], which employs matching while accounting for geographic spillover from treatment to control units and (2) [https://github.com/itpir/geoSIMEX geoSIMEX], which allows users to account for spatial imprecision in analysis. <br />
<br />
===Use of intersection to produce usable data for Stata===<br />
<br />
*For a particular project that has a spatial component, you first need to have geographical polygons that are representative of your project. For instance, the polygons that shows the limits of your villages.<br />
*Then, for some particular themes you will only find data in the form of GIS data. Examples of this are hospitals and clinics, or rainfall. The data in question is then intrinsically accompanied with coordinates, that allows to position the information in a spatial setting.<br />
*So then, after importing the particular data to a GIS software, you overlay your new data to the geographical layer that represents your project.<br />
*Once done, and simply intersect both layers, and ask for instance for a mean over your geographical polygons, or for a maximum or for a count. In QGIS, for instance, you will find a bunch of these useful spacial operations in the Vector/Geoprocessing Tools section.<br />
*These tools also allow you for instance to substract polygons from others; Then why not intersecting with data layer afterwards? It is much useful.<br />
*All that needs to be done then is to export your newly generated data.<br />
<br />
===Data Interpolation===<br />
<br />
This is useful for those who want to generate information in between spatial measurements. You will not want to this is several settings, since the granularity of your data has a value. However for some themes, like for the level of a groundwater table, or for the level of a ground contamination, this is useful as the nature of the subject itself (ex: contamination) is continuous.<br />
<br />
*The main thing one should know, is that the mathematical method you chose for the interpolation has a large impact on your results.<br />
*In case of doubts (or in much of cases, let's say), one should chose to use krigging, since the interpolation (let's visualize it as a surface) goes through the measurement points exactly. It's distance at a measurement point is equal to zero.<br />
*GIS softwares usually have modules that allow to do interpolation. They also allow to do dynamic modelling (state of your variables, in space, with time).<br />
*The interpolation of your data lead to the production of heat maps.<br />
<br />
===Heat Maps===<br />
*When producing heat maps, one should know that there exist mathematical tools that allow you to enhance the spatial shapes that your data.<br />
*These "tools" are in fact transformations on your surface, such as first difference, Fourrier, ect. They can provide a much better definition of your results, and even allow you to "see" something that you might have missed when not using them.<br />
<br />
===Examples of Papers===<br />
<br />
* Many influential papers using these types of data have been published in journals, for example:<br />
<br />
* J. Vernon Henderson, Adam Storeygard, and David N. Weil. 2012. [http://pubs.aeaweb.org/doi/pdfplus/10.1257/aer.102.2.994 Measuring Economic Growth from Outer Space]. ''American Economic Review'', 102(2): 994-1028. <br />
<br />
* Dave Donaldson and Adam Storeygard. 2016. [http://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.30.4.171 The View from Above: Applications of Satellite Data in Economics]. ''Journal of Economic Perspectives'', 30(4): 171-198.<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Secondary Data Sources]]<br />
<br />
== Additional Resources ==<br />
* Documentation for QGIS, which covers many topics in great detail: https://docs.qgis.org/2.2/en/docs/index.html<br />
<br />
[[Category: Secondary Data Sources ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices&diff=4780Stata Coding Practices2018-08-03T19:57:34Z<p>Maria jones: /* Additional Resources */</p>
<hr />
<div>This page lists resources developed at DIME as well as by other people and organizations.<br />
<br />
== ietoolkit ==<br />
<onlyinclude><br />
At DIME we have developed a package of Stata commands designed specifically for impact evaluations, though they can be useful in other contexts as well. The package is called '''ietoolkit''' and can be installed from the SSC server. To install the package, type <code>ssc install ietoolkit</code> in your Stata command window.<br />
</onlyinclude><br />
Please visit our GitHub page for details: https://github.com/worldbank/ietoolkit<br />
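A minimal installation sketch (requires internet access; <code>adoupdate</code> keeps user-written packages current):<br />

```stata
* Install ietoolkit from the SSC server
ssc install ietoolkit

* Later, check for and install updates to the package
adoupdate ietoolkit, update
```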
<br />
'''ietoolkit''' provides a set of commands that address different aspects of data management and data analysis in relation to Impact Evaluations. These include the following: <br />
# [[iebaltab]] is a tool for multiple treatment arm balance tables<br />
# [[ieboilsave]] performs checks before saving a data set<br />
# [[ieboilstart]] standardizes the boilerplate code at the top of all do-files<br />
# [[iedropone]] drops observations and verifies that the correct number was dropped<br />
# [[ieduplicates]] and [[iecompdup]] are useful tools to identify and correct duplicates, particularly in primary survey data<br />
# [[iefolder]] sets up project folders and creates master do-files that link to all sub-folders<br />
# [[iegitaddmd]] adds a placeholder file to empty folders so that folder structures with empty folders can be shared on GitHub<br />
# [[iegraph]] produces graphs of estimation results in common impact evaluation regression models<br />
# [[iematch]] is an algorithm for matching observations in one group to "the most similar" observations in another group<br />
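As an illustration of how these commands are used, here is a sketch of [[iebaltab]]. The variable names are hypothetical, and option names may differ between versions, so check <code>help iebaltab</code> for the exact syntax of your installed version:<br />

```stata
* Balance table for two hypothetical baseline covariates (age, income)
* across the treatment arms defined by the variable "treatment"
iebaltab age income, grpvar(treatment) savetex("outputs/balance.tex") replace
```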
<br />
== Stata Command Repository ==<br />
<br />
Repository with a large number of [https://github.com/worldbank/stata Stata ado-files]. These commands cannot be installed through SSC; click the link for installation instructions. This repository contains a broad variety of Stata commands (ado-files) which are useful in data management, statistical analysis, and the production of graphics. In many cases, these ado-files reduce the production of routine items from a tedious programming task to a single command line, such as data import and cleaning; production of summary statistics tables; and categorical bar charts with confidence intervals.<br />
<br />
== DIME's Stata IE Visual Library ==<br />
<br />
We have developed a repository where we collect [https://github.com/worldbank/Stata-IE-Visual-Library Stata graph examples] on GitHub. Feel free to submit your own example code there. <br />
<br />
== Additional Resources ==<br />
<br />
* [http://www.poverty-action.org/researchers/research-resources/stata-programs Stata modules for data collection and analysis] developed by Innovations for Poverty Action<br />
* The [https://github.com/PovertyAction/odkmeta odkmeta command], also from Innovations for Poverty Action<br />
* [http://geocenter.github.io/StataTraining/portfolio/01_resource/ Stata cheat sheets] on GitHub<br />
<br />
[[Category: Stata ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Data_Analysis&diff=4779Data Analysis2018-08-03T19:56:17Z<p>Maria jones: /* Additional Resources */</p>
<hr />
<div><br />
<onlyinclude><br />
Data analysis refers to the full process of exploring and describing trends and results from data. Data analysis typically has two stages:<br />
<br />
# Exploratory Analysis<br />
# Final Analysis<br />
</onlyinclude><br />
In exploratory analysis, the emphasis is on producing easily understood summaries of trends in the data, so that the reports, publications, and presentations that need to be produced can begin to be outlined. Once those stories come together, the code is re-written in a "final" form appropriate for public release with the results.<br />
<br />
== Preparing the Dataset for Analysis ==<br />
<br />
Once data is collected, it must be recombined into a final format for analysis, including the construction of derived variables not present in the initial collection. See [[Data Cleaning]].<br />
<br />
== Organizing Analysis Files ==<br />
<br />
Analysis programs that are exploratory in nature should be kept in an "exploratory" folder and separated according to topic. Particularly when folder syncing over [https://www.dropbox.com Dropbox] or [https://www.github.com GitHub] is being used, separating these files by function (rather than combining them into a single "analysis" file) allows multiple researchers to work simultaneously and modularly.<br />
<br />
When the final analysis workflow is agreed upon for a given publication or other output, a final analysis file should be collated for that output only in the "final" analysis folder. This allows selective reuse of the code from the exploratory analyses, in preparation for the final release of the code if required. This allows any collaborator, referee, or replicator to access only the code used to prepare the final outputs and reproduce them exactly.<br />
<br />
== Outputting Analytical Results ==<br />
<br />
Since the final analysis do-files are intended to be fully replicable, and the code itself is considered a vital, shareable output, all tables and figures should be created in such a way that the files are ordered, named, placed, and formatted appropriately. Running the analysis do-file should result in ''only'' the necessary files in the "outputs" folder, with names like "figure_1.png", "table_1.xlsx", and so on.<br />
<br />
For some applications (such as creating internal presentations or simple Word reports), file types like PNG and XLSX are sufficiently functional. For larger projects with multiple collaborators, particularly when syncing over a [https://www.github.com GitHub] service, plaintext file types such as EPS, CSV, and TEX are the preferred formats. Tables and figures should at minimum be produced by this file such that no further mathematical calculations are required; they should furthermore be organized and formatted as closely to the published versions as possible. For figures this is typically easy to achieve with an appropriate <code>graph export</code> command in Stata or the equivalent. [https://www.latex-project.org LaTeX] is a particularly powerful tool for doing this with tables. DIME provides several guides on both processes. See [[Exporting Analysis |exporting analysis results]] for more details and more resources.<br />
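For instance, a final do-file might end with export commands like the following. The variable and file names are illustrative; <code>esttab</code> is part of the user-written ''estout'' package (<code>ssc install estout</code>):<br />

```stata
* Export the current graph to the outputs folder, ready for publication
graph export "outputs/figure_1.png", replace width(2000)

* Run the final regression and export the table to LaTeX
regress outcome treatment
esttab using "outputs/table_1.tex", replace se label
```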
<br />
== Resources for Specific Analytical Tasks ==<br />
<br />
===Spatial/GIS Analysis===<br />
<br />
[[Spatial Analysis]] involves using geospatial information from your data to explore relationships mediated by proximity or connectedness. This can be descriptive (such as map illustrations) or informative (such as distance to and quality of the nearest road).<br />
<br />
===Randomization Inference===<br />
<br />
[[Randomization Inference]] techniques replace the "normal" p-values from regression analyses with values based on the treatment assignment methodology, and are generally recommended for reporting in experiments where the estimated effect is of a randomly assigned treatment controlled by the implementer and researcher.<br />
<br />
===Heterogeneous Effects Analysis===<br />
<br />
=== Cost Effectiveness Analysis ===<br />
<br />
[[Cost-effectiveness Analysis]] is the economic analysis of the costs and benefits of an impact evaluation project.<br />
<br />
=== Regression Discontinuity Analysis ===<br />
Here is [http://www-personal.umich.edu/~cattaneo/books/Cattaneo-Idrobo-Titiunik_2018_CUP-Vol2.pdf a practical guide] for analyzing regression discontinuity studies. <br />
<br />
=== Data Visualization ===<br />
<br />
[[Data visualization]] is a critical step in effectively communicating your research results.<br />
<br />
== Additional Resources ==<br />
* The Stata cheat sheet on [http://geocenter.github.io/StataTraining/pdf/StataCheatSheet_analysis_201615_June-REV.pdf Data analysis] is a useful reminder of relevant Stata code. The cheat sheet on [http://geocenter.github.io/StataTraining/pdf/StataCheatSheet_programming15_2016_June_TE-REV.pdf Stata programming] is a good resource for more advanced analytical tasks in Stata. <br />
<br />
<br />
[[Category: Data Analysis ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Data_visualization&diff=4776Data visualization2018-08-03T19:54:00Z<p>Maria jones: /* Additional Resources */</p>
<hr />
<div>Data visualization is creating a visual representation of your data, for example in the form of a chart or a graph. Choosing the right format to visualize your data is critical to effectively communicating the results of your study. Good visualizations can be more memorable and persuasive than pure text. <br />
<br />
<br />
== Read First ==<br />
Specific code for data visualization is available on the pages for software-specific tools (e.g. [[iegraph]]). This page discusses general principles for data visualization. <br />
<br />
== Guidelines ==<br />
<br />
=== What type of data visualization should I use? ===<br />
The best format for data visualization will depend on the type of data and results you wish to display, as well as the medium in which they will be displayed. For example, online interfaces allow for more dynamic visualizations than printed articles. <br />
<br />
* [https://www.data-to-viz.com/ Data to Viz] provides a handy decision tree. <br />
<br />
* The [http://www.visual-literacy.org/periodic_table/periodic_table.html Periodic Table of Visualization] provides a catalogue of data visualization types with visual examples. <br />
<br />
* Gapminder.org's [https://www.gapminder.org/tools/#$chart-type=bubbles interactive visualization tools] provide beautiful examples of effective visualizations. <br />
<br />
===Stata Visual Library===<br />
The DIME Analytics team has created a [https://worldbank.github.io/Stata-IE-Visual-Library/ Stata Visual Library for Impact Evaluation] which shows examples of graphs and provides the code used to create them. You can contribute to the library [https://github.com/worldbank/Stata-IE-Visual-Library on our GitHub]. <br />
<br />
=== Data Visualization in R ===<br />
R has many options for data visualization; the ggplot2 package is one of the best. Here is a list of [http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html 50 ggplot2 visualizations with full R code]. <br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Data Analysis]]<br />
<br />
<br />
== Additional Resources ==<br />
* Harvard Business Review article on [https://hbr.org/2016/06/visualizations-that-really-work/ Visualizations That Really Work]<br />
* Stata cheat sheets on [http://geocenter.github.io/StataTraining/pdf/StataCheatSheet_visualization15_Plots_2016_June-REV.pdf Data visualization] and [http://geocenter.github.io/StataTraining/pdf/StataCheatSheet_visualization15_Syntax_2016_June-REV.pdf customizing data visualization] are useful reminders of relevant Stata code. <br />
<br />
<br />
<br />
[[Category: Data Analysis]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Data_Cleaning&diff=4775Data Cleaning2018-08-03T19:52:36Z<p>Maria jones: /* Additional Resources */</p>
<hr />
<div><br />
<onlyinclude><br />
Data cleaning is an essential step between data collection and data analysis. Raw primary data is always imperfect and needs to be prepared so that it is easy to use in the analysis; this is the high-level goal of data cleaning. In extremely rare cases, the only preparation needed is to document the data set, for example by adding labels. In the vast majority of cases, however, there are many small things that need to be addressed in the data set itself. This could mean addressing data points that are incorrect, or replacing values that are not real data points but codes explaining why there is no real data point.<br />
</onlyinclude><br />
== Read First ==<br />
<br />
*See this [[Checklist:_Data_Cleaning|checklist]], which can be used to make sure that common cleaning actions have been carried out when applicable.<br />
*As a [[Impact_Evaluation_Team#Research_Assistant|Research Assistant]] (RA) or [[Impact_Evaluation_Team#Field_Coordinator|Field Coordinator]] (FC), do not spend time trying to fix irregularities in the data at the expense of identifying as many irregularities as possible.<br />
*The quality of the analysis will never be better than the quality of the data cleaning.<br />
*There is no such thing as an exhaustive list of what to do during data cleaning, as each project has individual cleaning needs, but this article provides a very good place to start.<br />
*After finishing the data cleaning for each round of data collection, the data can be [[Publishing Data|released]].<br />
<br />
== The Goal of Cleaning ==<br />
<br />
There are two main goals when cleaning the data set:<br />
<br />
#Cleaning individual data points that invalidate or incorrectly bias the analysis.<br />
#Preparing a clean data set so that it is easy to use for other researchers. Both for researchers inside your team and outside your team.<br />
<br />
[[File:Picture2.png|700px|link=|center]]<br />
Another overarching goal of the cleaning process is to understand the data and the data collection really well. Much of this understanding feeds directly into the two points above, but a really good data cleaning process should also result in documented lessons learned that can be used in future data collection, both in later rounds of the same project and in data collections for other, similar projects.<br />
<br />
=== Cleaning individual data points ===<br />
<br />
In impact evaluations, our analysis often comes down to testing for statistical differences in means between the control group and the treatment arms. We do so through regression analysis, where we include control variables, fixed effects, and different error estimators, among many other tools. In essence, though, one can think of it as an advanced comparison of means. While this is far from a complete description of impact evaluation analysis, it may give the person cleaning a data set for the first time a framework for what cleaning a data set should achieve.<br />
<br />
It is difficult to have an intuition for the math behind a regression, but it is easy to have an intuition for the math behind a mean. Anything that biases a mean will bias a regression, and while there are many more things that can bias a regression, this is a good place to start for anyone cleaning a data set for the first time. The researcher in charge of the analysis is trained in what else needs to be done for the specific regression models used. The articles linked below go through specific examples, but it is probably obvious to most readers that outliers, typos in the data, survey codes (often values like -999 or -888), etc. bias means, so it is never wrong to start with those examples.<br />
<br />
=== Prepare a clean data set ===<br />
<br />
The second goal of the data cleaning is to document the data set so that variables, values, and anything else are as self-explanatory as possible. This will help other researchers to whom you grant access to the data set, but it will also help you and your research team when accessing the data set in the future. At the time of the data collection or the data cleaning, you know the data set much better than you will at any time in the future. Carefully documenting this knowledge so that it can be used at the time of analysis is often the difference between a good analysis and a great analysis.<br />
<br />
== Role Division during Data Cleaning ==<br />
As a [[Impact_Evaluation_Team#Research_Assistant|Research Assistant]] (RA) or [[Impact_Evaluation_Team#Field_Coordinator|Field Coordinator]] (FC), spend time identifying and documenting irregularities in the data. It is never bad to suggest corrections to irregularities, but a common mistake RAs and FCs make is to spend too much time trying to fix irregularities at the expense of having enough time to identify and document as many as possible. One major reason is that different regression models might require different ways of correcting issues, and this is often a perspective only the PI has. In such cases, much time may have been spent on a correction that is not valid given the regression model used in the analysis.<br />
<br />
Eventually the [[Impact_Evaluation_Team#Principal_Investigator|Principal Investigator]] (PI) and the RA or FC will have a common understanding of which corrections can be made without involving the PI, but until then, it is recommended that the RA focus her/his time on identifying and documenting as many issues as possible rather than on how to fix them. It is no problem to do both, as long as the fixing does not come at the cost of identifying as many issues as possible.<br />
<br />
== Import Data ==<br />
<br />
The first step in cleaning the data is to import the data. If you work with secondary data (data prepared by someone else), this step is often straightforward, but it is frequently underestimated when working with primary data. It is very important that any change, no matter how small, always be made in Stata (or in R or any other scripting language). Even if you know that there are incorrect submissions in your raw data (duplicates, pilot data mixed with the main data, etc.), those deletions should always be done in such a way that they can be replicated by re-running code. Without this, the analysis might no longer be valid. See the article on [[DataWork_Survey_Round#Raw_Folder|raw data folders]] for more details.<br />
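As a sketch of what this looks like in practice (the folder paths, variable name, and cutoff date below are all illustrative):<br />

```stata
* Import the raw data exactly as received -- never edit the raw file itself
import delimited using "DataWork/raw/endline_raw.csv", clear

* Reproducible, documented correction: drop pilot interviews
* (assumes submissiondate has already been converted to a Stata date)
drop if submissiondate < date("2018-03-01", "YMD")

* Save to an intermediate folder, leaving the raw folder untouched
save "DataWork/intermediate/endline_imported.dta", replace
```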
<br />
=== Importing Primary Survey Data ===<br />
<br />
All modern CAPI survey data collection tools provide methods for importing the raw data in a way that drastically reduces the amount of work needed when cleaning the data. These methods typically include a Stata do-file that generates labels and much more from the questionnaire code and applies them to the raw data as it is imported. If you are working in SurveyCTO, see this article on [[SurveyCTO Stata Template | SurveyCTO's Stata Template]].<br />
<br />
== Examples of Data Cleaning Actions ==<br />
<br />
The material in this section has been generated with primary survey data in mind, although a lot of these practices are also applicable when cleaning other types of data sets.<br />
<br />
'''Data Cleaning Check List'''. This is a checklist that can be used to make sure that all common aspects of data cleaning have been covered. Note that this is not an exhaustive list; such a list is impossible to create, as individual data sets and the analysis methods used on them all require different cleaning whose details depend on the context of that data set.<br />
<br />
===ID Variables===<br />
It's important that the clean dataset be uniquely and fully identifiable by a single variable. It is often the case that when [[Primary Data Collection|primary data]] is imported, there are [[Duplicates and Survey Logs|duplicated entries]]. These cases must be carefully documented, and should only be corrected after discussing with the [[Field Coordinator]] and field team what caused them, so the right observations are kept in the dataset. [[ieduplicates]], part of [[Stata Coding Practices#ietoolkit|ietoolkit]], is a useful command to identify and correct duplicated entries. Once duplicates are corrected, the observations can be linked to the [[Master Data Set|master dataset]], and the dataset [[De-identification|de-identified]].<br />
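A quick way to test this in Stata, using a hypothetical ID variable <code>hhid</code> (<code>isid</code> exits with an error if the variable does not uniquely identify the observations):<br />

```stata
* Verify that hhid uniquely and fully identifies the data set
isid hhid

* If it does not, inspect the duplicated entries before correcting them
duplicates report hhid
duplicates tag hhid, gen(dup)
list hhid if dup > 0
```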
<br />
=== Incorrect Data and Other Irregularities ===<br />
<br />
There are countless ways irregularities can appear in a primary data set, so no exhaustive list of what should be done is possible. This section gives a few examples: <br />
<br />
'''Outliers'''. There are many rules of thumb for how to define an outlier, but there is no silver bullet. One rule of thumb is to flag any data point that is more than three standard deviations away from the mean of that variable across all observations. This may be a starting point, but one needs to consider qualitatively whether it is the correct approach. Observations with outliers should not be dropped, but in some cases the data point for that observation is replaced with a missing value. There are often better approaches. One common approach is winsorization, where any value larger than a certain percentile, often the 99th, is replaced with the value at that percentile. This way, very large values are prevented from biasing the mean. This also has an equality-of-impact aspect. For example, if all the benefit of a project went to a single observation in the treatment group, the mean would still be high, but that is rarely a desired outcome in development. Winsorization thus penalizes inequitable distribution of a project's benefits.<br />
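A minimal winsorization sketch in Stata, using a hypothetical variable <code>income</code>:<br />

```stata
* Replace values above the 99th percentile with the 99th percentile value
summarize income, detail
replace income = r(p99) if income > r(p99) & !missing(income)
```

User-written commands such as ''winsor'' (available from SSC) implement variations of this.<br />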
<!----- EDIT HERE -------><br />
<br />
'''Illogical Values'''. This is to test that one data point is possible in relation to another value. For example, if a respondent is male, then the respondent cannot answer that he is pregnant. This simple case is something that can and should be programmed into the questionnaire so that it does not happen. However, no questionnaire ever can be pre-programmed to control for every such case.<br />
<br />
'''Typos'''. If it is obvious beyond any doubt that the response is incorrect due to a simple typo, then it is a good idea to correct the type as long as it is done in a documented and reproducible way.<br />
<br />
=== Survey Codes and Missing Values ===<br />
<br />
Almost all data collection done through surveys of any sort allows the respondent to answer something like "Do not know" or "Declined to answer" for individual questions. These answers are usually recorded using survey codes on the format -999, -88 or something similar. It is obvious that these numbers will bias means and regressions if they are left as such. These values must be replaced with missing values in Stata. <br />
<br />
Stata has several missing values. The most well know is the regular missing value represented by a single "." but we would lose the difference in meaning between "Do not know" and "Declined to answer" if both codes representing them were replaced with the regular missing value. Stata offers a solution with its extended missing values. They are represented by ".a", ".b", ".c" etc. all the way to ".z". Stata handles these values the same as "." in commands that expect a numeric value, but they can be labeled differently and the original information is therefore not lost. Make sure that the same letter ".a", ".b" etc. is used to always represent only one thing across your project. The missing values should be assigned value labels so that they can be interpreted. See [http://www.stata.com/manuals13/u12.pdf#u12.2 Stata Manual Missing Values] for more details on missing values.<br />
<br />
Missing values can be used for much more than just survey codes. Any value that we remove because we found out is incorrect should be replaced with a missing value. In a [[Master Data Set]], there should be no regular missing values. All missing values in a master data set should contain an explanation of why we do not have that information for that observation.<br />
<br />
=== No Strings ===<br />
<br />
All data should be stored in numeric format. There are multiple reasons for this, but the two most important are that (1) numbers are stored more efficiently and (2) many Stata commands expect values to be stored numerically. Categorical string variables should be stored as numeric codes and have value labels assigned.<br />
<br />
There are two exceptions where string variables are allowed. The two examples are listed below:<br />
<br />
'''Numbers that cannot be stored correctly numerically'''. There are two cases of this exception. The first case is when a number is more than 15 digits long. This can happen when working with some national IDs. If a continuous variable has more than 15 digits, then it should be rounded and converted to a different scale, as a precision of 16 digits is not even possible in natural sciences. An ID can for obvious reasons not be rounded. The other case is that of numbers starting with a zero. This is sometimes the case in some national IDs and it is also sometimes the case with telephone numbers in some countries. Any leading zeros are removed by Stata and therefore have to be stored as a string.<br />
<br />
'''Non-categorical text'''. Text answers that cannot be converted into categories need to be stored as strings. One example is open-ended questions. Open-ended questions should, in general, be avoided, but sometimes the questionnaire asks the respondent to answer a question in his or her own words, and then that answer has to be stored as strings. Another example is if the respondent is asked to specify the answer after answering ''Other'' in a multiple choice question. A different example where string format is needed is some cases of proper names, for example, the name of the respondent. Not all proper names should be stored as string as some can be made into categories. For example, if you collect data on respondents and multiple respondents live in the same villages, then the variable with the village names should be converted into a categorical numeric variable and have a value label assigned. See the section on value labels below.<br />
<br />
=== Labels ===<br />
There are several ways to add helpful descriptive text to a data set in Stata, but the two most common and important ways are variables labels and value labels.<br />
<br />
'''Variable Labels'''<br />
All variables in a clean data set should have variable labels describing the variable. The label can be up to 80 characters long so there is a limitation to how much information can be included here. In addition to a brief explanation of the variable, it is usually good to include information such as unit or currency used in the variable and other things that are not possible to read from the values themselves.<br />
<br />
'''Value Labels'''<br />
Categorical variables should always be stored numerically and have value labels that describe what the numeric code represents. For example, yes and no questions should be stored as 0 and 1 and have the label ''No'' for data cells with 0, and the label ''Yes'' for all data cells with 1. This should be applied to all multiple choice variables.<br />
<br />
There are tools in Stata to convert categorical string variables to a categorical numeric variable where the strings are automatically applied as value labels. The most common tool is the command <code>encode</code>. However, if you use <code>encode</code>, you should always use the two options <code>label()</code> and <code>noextend</code>. Without these two options, Stata assigns a code to each string value in alphabetic order. There is no guarantee that the alphabetic order is changed when observations are added or removed, or if someone else makes changes earlier in the code. <code>label()</code> forces you to manually create the label before using encode (this requires some manual work but it is worth it). <code>noextend</code> throws an error if there is a value in the data that does not exist in the pre-defined label. This way you are notified that you need to add the new value to the value label you created manually. Or you can change the string value if there is a typo or similar that is the reason why that string value was not assigned a value label.<br />
<br />
== Additional Resources ==<br />
* The Stata Cheat Sheets on [http://geocenter.github.io/StataTraining/pdf/StataCheatsheet_processing_15_June_2016_TE-REV.pdf Data processing] and [http://geocenter.github.io/StataTraining/pdf/StataCheatsheet_Transformation15_June_2016_TE-REV.pdf Data Transformation] are helpful reminder of relevant stata code<br />
* The [https://github.com/Quartz/bad-data-guide#values-are-missing Quartz guide to bad data] on Github has lots of helpful tips for dealing with the kind of data problems that often come up in real world settings.<br />
<br />
[[Category: Data Cleaning ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Data_Cleaning&diff=4774Data Cleaning2018-08-03T19:50:22Z<p>Maria jones: /* Additional Resources */</p>
<hr />
<div><br />
<onlyinclude><br />
Data cleaning is an essential step between data collection and data analysis. Raw primary data is always imperfect and needs to be prepared so that it is easy to use in the analysis. This is the high-level goal of data cleaning. In rare cases, the only preparation needed is to document the data set, for example by adding labels. In the vast majority of cases, however, there are many small things that need to be addressed in the data set itself. This means both correcting data points that are incorrect, and replacing values that are not real data points but codes explaining why a real data point is missing.<br />
</onlyinclude><br />
== Read First ==<br />
<br />
*See this [[Checklist:_Data_Cleaning|check list]] that can be used to make sure that common cleaning actions have been done when applicable.<br />
*As a [[Impact_Evaluation_Team#Research_Assistant|Research Assistant]] (RA) or [[Impact_Evaluation_Team#Field_Coordinator|Field Coordinator]] (FC), do not spend time trying to fix irregularities in the data at the expense of not having time to identify as many irregularities as possible.<br />
*The quality of the analysis will never be better than the quality of data cleaning.<br />
*There is no such thing as an exhaustive list of what to do during data cleaning as each project will have individual cleaning needs, but this article provides a very good place to start.<br />
*After finishing the data cleaning for each round of data collection, data can be [[Publishing Data|released]].<br />
<br />
== The Goal of Cleaning ==<br />
<br />
There are two main goals when cleaning the data set:<br />
<br />
#Cleaning individual data points that invalidate or incorrectly bias the analysis.<br />
#Preparing a clean data set so that it is easy to use for other researchers, both inside and outside your team.<br />
<br />
[[File:Picture2.png|700px|link=|center]]<br />
Another overarching goal of the cleaning process is to understand the data and the data collection really well. Much of this understanding feeds directly into the two points above, but a really good data cleaning process should also result in documented lessons learned that can be used in future data collection, both in later rounds of the same project and in other similar projects.<br />
<br />
=== Cleaning individual data points ===<br />
<br />
In impact evaluations, our analysis often comes down to testing for statistical differences in means between the control group and any of the treatment arms. We do so through regression analysis where we include control variables, fixed effects, and different error estimators, among many other tools. In essence, though, one can think of it as an advanced comparison of means. While this is far from a complete description of impact evaluation analysis, it may give the person cleaning a data set for the first time a framework for what cleaning a data set should achieve.<br />
<br />
It is difficult to have an intuition for the math behind a regression, but it is easy to have an intuition for the math behind a mean. Anything that biases a mean will bias a regression, and while there are many more things that can bias a regression, this is a good place to start for anyone cleaning a data set for the first time. The researcher in charge of the analysis is trained in what else needs to be done for the specific regression models used. The articles linked below go through specific examples, but it is probably obvious to most readers that outliers, typos in the data, survey codes (often values like -999 or -888), etc. bias means, so it is never wrong to start with those examples.<br />
<br />
=== Prepare a clean data set ===<br />
<br />
The second goal of the data cleaning is to document the data set so that variables, values, and anything else is as self-explanatory as possible. This will help other researchers that you grant access to this data set, but it will also help you and your research team when accessing the data set in the future. At the time of the data collection or at the time of the data cleaning, you know the data set much better than you will at any time in the future. Carefully documenting this knowledge so that it can be used at the time of analysis is often the difference between a good analysis and a great analysis.<br />
<br />
== Role Division during Data Cleaning ==<br />
As a [[Impact_Evaluation_Team#Research_Assistant|Research Assistant]] (RA) or [[Impact_Evaluation_Team#Field_Coordinator|Field Coordinator]] (FC), spend time identifying and documenting irregularities in the data. It is never bad to suggest corrections to irregularities, but a common mistake RAs and FCs make is spending too much time trying to fix irregularities at the expense of having enough time to identify and document as many as possible. One major reason is that different regression models might require different ways of correcting an issue, and this is often a perspective only the PI has. In such cases, much time might have been spent coming up with a correction that is not valid given the regression model used in the analysis.<br />
<br />
Eventually the [[Impact_Evaluation_Team#Principal_Investigator|Principal Investigator]] (PI) and the RA or FC will have a common understanding of which correction calls can be made without involving the PI, but until then, it is recommended that the RA focus his or her time on identifying and documenting as many issues as possible rather than spending a lot of time on how to fix them. There is no problem with doing both, as long as the fixing does not happen at the cost of identifying as many issues as possible.<br />
<br />
== Import Data ==<br />
<br />
The first step in cleaning the data is to import the data. If you work with secondary data (data prepared by someone else), this step is often straightforward, but it is often underestimated when working with primary data. It is very important that any change, no matter how small, is always made in Stata (or in R or any other scripting language). Even if you know that there are incorrect submissions in your raw data (duplicates, pilot data mixed with the main data, etc.), those deletions should always be done in such a way that they can be replicated by re-running code. If the deletions cannot be traced, the analysis might no longer be reproducible. See the article on [[DataWork_Survey_Round#Raw_Folder|raw data folders]] for more details.<br />
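Deletions made in code can be sketched in Stata as follows (the file name, variable names, and submission ID here are hypothetical, and the reason for each deletion should always be documented in a comment):<br />
<pre>
* Import the raw data exactly as received
import delimited using "raw/survey_data.csv", clear

* Remove known invalid submissions in code, never by hand,
* so the deletions can be replicated by re-running the do-file
drop if key == "uuid:hypothetical-pilot-submission"  // pilot interview, confirmed with field team
</pre>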
<br />
=== Importing Primary Survey Data ===<br />
<br />
All modern CAPI survey data collection tools provide methods for importing the raw data in a way that drastically reduces the amount of work needed when cleaning the data. These methods typically include a Stata do-file that generates labels and much more from the questionnaire code and applies that to the raw data as it is being imported. If you are working in SurveyCTO, see this article on [[SurveyCTO Stata Template | SurveyCTO's Stata Template]].<br />
<br />
== Examples of Data Cleaning Actions ==<br />
<br />
The material in this section has been generated with primary survey data in mind, although a lot of these practices are also applicable when cleaning other types of data sets.<br />
<br />
'''Data Cleaning Check List'''. This is a check list that can be used to make sure that all common aspects of data cleaning have been covered. Note that this is not an exhaustive list. Such a list is impossible to create, as individual data sets and the analysis methods used on them all require different cleaning whose details depend on the context of that data set.<br />
<br />
===ID Variables===<br />
It's important that the clean dataset be uniquely and fully identified by a single ID variable. It is often the case that when [[Primary Data Collection|primary data]] is imported, there are [[Duplicates and Survey Logs|duplicated entries]]. These cases must be carefully documented, and should only be corrected after discussing with the [[Field Coordinator]] and field team what caused them, so that the right observations are kept in the dataset. [[ieduplicates]], a command in [[Stata Coding Practices#ietoolkit|ietoolkit]], is useful for identifying and correcting duplicated entries. Once duplicates are corrected, the observations can be linked to the [[Master Data Set|master dataset]], and the dataset can be [[De-identification|de-identified]].<br />
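As a minimal sketch using only built-in Stata commands (assuming the ID variable is called <code>hhid</code>), duplicates can be flagged for documentation and the corrected data set can be tested for uniqueness; <code>ieduplicates</code> automates and extends this workflow.<br />
<pre>
* Flag duplicated IDs so they can be documented and discussed
* with the field team before any correction is made
duplicates tag hhid, gen(dup)
list hhid if dup > 0

* After corrections, confirm that hhid fully and uniquely
* identifies the data set; isid throws an error if it does not
isid hhid
</pre>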
<br />
=== Incorrect Data and Other Irregularities ===<br />
<br />
There are countless ways that irregularities can appear in a primary data set, so there is no way to make an exhaustive list of what should be done. This section gives a few examples: <br />
<br />
'''Outliers'''. There are many rules of thumb for how to define an outlier, but there is no silver bullet. One rule of thumb is to flag any data point that is more than three standard deviations away from the mean of that variable across all observations. This can be a starting point, but one always needs to qualitatively consider whether it is the correct approach. Observations with outliers should not be dropped, but in some cases the data point for that observation is replaced with a missing value. There are often better approaches. One common approach is winsorization, where any value larger than a certain percentile, often the 99th, is replaced with the value at that percentile. This prevents very large values from biasing the mean. Winsorization also has an equality-of-impact aspect. For example, if all the benefits of a project went to a single observation in the treatment group, the mean would still be high, but that is rarely a desired outcome in development. So winsorization penalizes an inequitable distribution of a project's benefits.<br />
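Winsorizing at the 99th percentile can be sketched in Stata as follows (the variable <code>income</code> is hypothetical; user-written commands such as <code>winsor2</code> offer similar functionality):<br />
<pre>
* Find the 99th percentile and replace any larger value with it,
* leaving missing values untouched
_pctile income, p(99)
replace income = r(r1) if income > r(r1) & !missing(income)
</pre>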
<br />
'''Illogical Values'''. These checks test whether one data point is possible in relation to another value. For example, if a respondent is male, then the respondent cannot answer that he is pregnant. Simple cases like this can and should be programmed into the questionnaire so that they never happen, but no questionnaire can ever be pre-programmed to control for every such case.<br />
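Such checks can be written as simple logical conditions in Stata (variable names and codes are hypothetical):<br />
<pre>
* List impossible combinations so they can be documented and
* raised with the field team (assuming sex: 1 = male, 0 = female)
list hhid sex pregnant if sex == 1 & pregnant == 1
</pre>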
<br />
'''Typos'''. If it is obvious beyond any doubt that a response is incorrect due to a simple typo, then it is a good idea to correct the typo, as long as it is done in a documented and reproducible way.<br />
<br />
=== Survey Codes and Missing Values ===<br />
<br />
Almost all data collection done through surveys of any sort allows the respondent to answer something like "Do not know" or "Declined to answer" to individual questions. These answers are usually recorded using survey codes in a format like -999, -88 or something similar. These numbers will obviously bias means and regressions if they are left as they are, so they must be replaced with missing values in Stata. <br />
<br />
Stata has several missing values. The most well known is the regular missing value, represented by a single ".", but we would lose the difference in meaning between "Do not know" and "Declined to answer" if both codes representing them were replaced with the regular missing value. Stata offers a solution with its extended missing values. They are represented by ".a", ".b", ".c" etc., all the way to ".z". Stata handles these values the same as "." in commands that expect a numeric value, but they can be labeled differently, so the original information is not lost. Make sure that each letter ".a", ".b" etc. is always used to represent the same thing across your project. The missing values should be assigned value labels so that they can be interpreted. See [http://www.stata.com/manuals13/u12.pdf#u12.2 Stata Manual Missing Values] for more details on missing values.<br />
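Recoding survey codes into labeled extended missing values can be sketched as follows (the variable, codes, and label name are hypothetical):<br />
<pre>
* Use the same letter for the same meaning across the whole project
replace income = .a if income == -999   // .a = do not know
replace income = .b if income == -888   // .b = declined to answer

* Label the extended missing values so the information is not lost
label define income_lbl .a "Do not know" .b "Declined to answer"
label values income income_lbl
</pre>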
<br />
Missing values can be used for much more than just survey codes. Any value that we remove because we found out it is incorrect should be replaced with a missing value. In a [[Master Data Set]], there should be no regular missing values; every missing value in a master data set should carry an explanation of why we do not have that information for that observation.<br />
<br />
=== No Strings ===<br />
<br />
All data should be stored in numeric format. There are multiple reasons for this, but the two most important are that (1) numbers are stored more efficiently and (2) many Stata commands expect values to be stored numerically. Categorical string variables should be stored as numeric codes and have value labels assigned.<br />
<br />
There are two exceptions where string variables are allowed. The two examples are listed below:<br />
<br />
'''Numbers that cannot be stored correctly numerically'''. There are two cases of this exception. The first case is a number that is more than 15 digits long, which can happen when working with some national IDs. If a continuous variable has more than 15 digits, it should be rounded or converted to a different scale, since Stata's numeric formats cannot store that many significant digits without losing precision. An ID can for obvious reasons not be rounded, so it must be stored as a string. The other case is numbers starting with a zero, which occurs in some national IDs and in telephone numbers in some countries. Stata removes any leading zeros from numeric values, so these numbers also have to be stored as strings.<br />
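For example, when a data set is imported with <code>import delimited</code>, ID columns can be forced to string format so that leading zeros are preserved (the file name and column number are hypothetical):<br />
<pre>
* Read column 2 as a string so leading zeros are not dropped
import delimited using "raw/ids.csv", stringcols(2) clear
</pre>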
<br />
'''Non-categorical text'''. Text answers that cannot be converted into categories need to be stored as strings. One example is open-ended questions. These should, in general, be avoided, but sometimes the questionnaire asks the respondent to answer a question in his or her own words, and that answer has to be stored as a string. Another example is when the respondent is asked to specify an answer after selecting ''Other'' in a multiple choice question. Some proper names, for example the name of the respondent, also require string format. However, not all proper names should be stored as strings, as some can be made into categories. For example, if you collect data on respondents and multiple respondents live in the same village, then the variable with the village names should be converted into a categorical numeric variable with a value label assigned. See the section on value labels below.<br />
<br />
=== Labels ===<br />
There are several ways to add helpful descriptive text to a data set in Stata, but the two most common and important are variable labels and value labels.<br />
<br />
'''Variable Labels'''<br />
All variables in a clean data set should have variable labels describing the variable. A label can be up to 80 characters long, so there is a limit to how much information can be included here. In addition to a brief explanation of the variable, it is usually good to include information such as the unit or currency used in the variable and other things that cannot be read from the values themselves.<br />
<br />
'''Value Labels'''<br />
Categorical variables should always be stored numerically and have value labels that describe what the numeric code represents. For example, yes and no questions should be stored as 0 and 1 and have the label ''No'' for data cells with 0, and the label ''Yes'' for all data cells with 1. This should be applied to all multiple choice variables.<br />
<br />
There are tools in Stata to convert a categorical string variable into a categorical numeric variable where the strings are automatically applied as value labels. The most common tool is the command <code>encode</code>. However, if you use <code>encode</code>, you should always use the two options <code>label()</code> and <code>noextend</code>. Without these options, Stata assigns a code to each string value in alphabetic order, and there is no guarantee that this order stays the same when observations are added or removed, or if someone else makes changes earlier in the code. <code>label()</code> forces you to manually create the value label before using <code>encode</code> (this requires some manual work, but it is worth it). <code>noextend</code> throws an error if there is a value in the data that does not exist in the pre-defined label. This way you are notified that you need to add the new value to the value label you created manually, or you can correct the string value if a typo or similar is the reason it was not assigned a value label.<br />
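A minimal sketch of this workflow (the variable and village names are hypothetical):<br />
<pre>
* Define the value label manually so the codes never depend on
* the alphabetic order of the strings in the current data
label define village_lbl 1 "Village A" 2 "Village B"

* noextend throws an error if a string value is missing from the label
encode village, gen(village_code) label(village_lbl) noextend
</pre>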
<br />
== Additional Resources ==<br />
* The [https://github.com/Quartz/bad-data-guide#values-are-missing Quartz guide to bad data] on Github has lots of helpful tips for dealing with the kind of data problems that often come up in real world settings.<br />
<br />
[[Category: Data Cleaning ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Primary_Data_Collection&diff=4773Primary Data Collection2018-07-31T19:14:18Z<p>Maria jones: /* Additional Resources */</p>
<hr />
<div>== Read First ==<onlyinclude><br />
Primary data is directly generated by the researcher. Household surveys are the prototypical example of primary data collection. Unlike [[Secondary Data Sources]], primary data collection can be personally directed by the researcher to ensure it meets the standards of quality, availability, statistical power, and sampling required for a particular research inquiry. With globally increasing access to survey tools such as software, field manuals, and specialized firms, data collected and owned by the researcher has become the dominant method of empirical inquiry in development economics.</onlyinclude><br />
<br />
<br />
== Types of primary data ==<br />
The most common type of primary data is data from personal interviews. Depending on the research, these may take the form of household surveys, business (firm) surveys, or agricultural (farm) surveys. Some studies may also include objective measurements such as [[Anthropometric Indicators]]. <br />
<br />
== Modes of primary data collection ==<br />
Surveys can be conducted on paper ([[Pen-and-Paper Personal Interviews (PAPI)]]) or electronically ([[Computer-Assisted Personal Interviews (CAPI)]]), or a combination of the two ([[Computer-Assisted Field Entry (CAFE)]]). <br />
<br />
== Preparing for primary data collection ==<br />
The following are critical steps in preparing for primary data collection:<br />
* Determine sampling frame<br />
* Conduct sampling, based on [[Sample Size]] calculations, taking care to do so [[Randomization in Stata|reproducibly]]<br />
* [[Questionnaire Design and Translation|Design and translate the survey instrument]]<br />
* [[Survey Pilot|Pilot the Survey Instrument]]<br />
* [[Questionnaire Programming|Program the survey instrument]] if data is being collected electronically<br />
* [[Procuring a Survey Firm|Procure a Survey Firm]], taking care to prepare detailed [[Survey Firm TOR|Terms of Reference]]<br />
* [[Preparing for Field Data Collection|Prepare for field work]]<br />
* Create a [[Data Quality Assurance Plan]]<br />
* [[Enumerator Training| Train data collectors]]<br />
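The reproducible-sampling step above can be sketched in Stata as follows (the ID variable, seed, and sample size are hypothetical; what matters for reproducibility is versioning, a fixed seed, and a stable sort order):<br />
<pre>
* Reproducible sampling: version, seed, and a unique sort order
version 13
set seed 123456
isid hhid, sort

* Assign a random number to each observation and sample the
* 500 observations with the smallest values
gen rand = runiform()
sort rand
gen sampled = _n <= 500
</pre>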
<br />
== Additional Resources ==<br />
Oxfam provides [https://oxfamilibrary.openrepository.com/bitstream/handle/10546/620522/cs-going-digital-data-quality-data-collection-240718-en.pdf?sequence=1&isAllowed=y a detailed case study] of how to use electronic data collection (SurveyCTO) combined with Stata code to improve data quality in the field. <br />
<br />
[[Category: Primary Data Collection ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=User_talk:Maria_jones&diff=4772User talk:Maria jones2018-07-31T18:11:08Z<p>Maria jones: Created blank page</p>
<hr />
<div></div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Data_Analysis&diff=4771Data Analysis2018-07-31T18:10:06Z<p>Maria jones: /* Resources for Specific Analytical Tasks */</p>
<hr />
<div><br />
<onlyinclude><br />
Data analysis refers to the full process of exploring and describing trends and results from data. Data analysis typically has two stages:<br />
<br />
# Exploratory Analysis<br />
# Final Analysis<br />
</onlyinclude><br />
In exploratory analysis, emphasis will be on producing easily understood summaries of the trends in the data so that the reports, publications, presentations, and summaries that need to be produced can begin to be outlined. Once those stories begin to come together, the code is re-written in a "final" form which would be appropriate for public release with the results.<br />
<br />
== Preparing the Dataset for Analysis ==<br />
<br />
Once data is collected, it must be recombined into a final format for analysis, including the construction of derived variables not present in the initial collection. See [[Data Cleaning]].<br />
<br />
== Organizing Analysis Files ==<br />
<br />
Analysis programs that are exploratory in nature should be held in an "exploratory" folder and separated according to topic. Particularly when folder syncing over [https://www.dropbox.com Dropbox] or [https://www.github.com Github] is being used, separating these files by function (rather than combining them into a single "analysis" file) allows multiple researchers to work simultaneously and modularly.<br />
<br />
When the final analysis workflow is agreed upon for a given publication or other output, a final analysis file should be collated for that output only in the "final" analysis folder. This allows selective reuse of the code from the exploratory analyses, in preparation for the final release of the code if required. This allows any collaborator, referee, or replicator to access only the code used to prepare the final outputs and reproduce them exactly.<br />
<br />
== Outputting Analytical Results ==<br />
<br />
Since the final analysis do-files are intended to be fully replicable, and the code itself is considered a vital, shareable output, all tables and figures should be created in such a way that the files are ordered, named, placed, and formatted appropriately. Running the analysis do-file should result in ''only'' the necessary files in the "outputs" folder, with names like "figure_1.png", "table_1.xlsx", and so on.<br />
<br />
For some applications (such as creating internal presentations or simple Word reports), file types like PNG and XLSX are sufficiently functional. For larger projects with multiple collaborators, particularly when syncing over a [https://www.github.com GitHub] service, plaintext file types such as EPS, CSV, and TEX are the preferred formats. Tables and figures should at minimum be produced by this file such that no further mathematical calculations are required; they should furthermore be organized and formatted as closely to the published versions as possible. For figures, this is typically easy to achieve with an appropriate <code>graph export</code> command in Stata or the equivalent. [https://www.latex-project.org LaTeX] is a particularly powerful tool for doing this with tables. DIME provides several guides on both processes. See [[Exporting Analysis |exporting analysis results]] for more details and more resources.<br />
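For example (the output paths are hypothetical, and <code>esttab</code> is a user-written command from the <code>estout</code> package):<br />
<pre>
* Export the current graph with its final file name
graph export "outputs/figure_1.png", replace

* Export the most recent regression results directly to TeX
esttab using "outputs/table_1.tex", replace
</pre>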
<br />
== Resources for Specific Analytical Tasks ==<br />
<br />
===Spatial/GIS Analysis===<br />
<br />
[[Spatial Analysis]] involves using geospatial information from your data to explore relationships mediated by proximity or connectedness. This can be descriptive (such as map illustrations) or informative (such as distance to and quality of the nearest road).<br />
<br />
===Randomization Inference===<br />
<br />
[[Randomization Inference]] techniques replace the "normal" p-values from regression analyses with values based on the treatment assignment methodology, and are generally recommended for reporting in experiments where the treatment was randomly assigned under the control of the implementer and researcher.<br />
<br />
===Heterogeneous Effects Analysis===<br />
<br />
=== Cost Effectiveness Analysis ===<br />
<br />
[[Cost-effectiveness Analysis]] is the economic analysis of the costs and benefits of an impact evaluation project.<br />
<br />
=== Regression Discontinuity Analysis ===<br />
Here is [http://www-personal.umich.edu/~cattaneo/books/Cattaneo-Idrobo-Titiunik_2018_CUP-Vol2.pdf a practical guide] for analyzing regression discontinuity studies. <br />
<br />
=== Data Visualization ===<br />
<br />
[[Data visualization]] is a critical step in effectively communicating your research results.<br />
<br />
== Additional Resources ==<br />
<br />
<br />
<br />
[[Category: Data Analysis ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Data_Analysis&diff=4770Data Analysis2018-07-20T16:02:38Z<p>Maria jones: </p>
<hr />
<div><br />
<onlyinclude><br />
Data analysis refers to the full process of exploring and describing trends and results from data. Data analysis typically has two stages:<br />
<br />
# Exploratory Analysis<br />
# Final Analysis<br />
</onlyinclude><br />
In exploratory analysis, the emphasis is on producing easily understood summaries of the trends in the data, so that the reports, publications, presentations, and summaries that need to be produced can begin to be outlined. Once those stories begin to come together, the code is rewritten in a "final" form that would be appropriate for public release with the results.<br />
<br />
== Preparing the Dataset for Analysis ==<br />
<br />
Once data is collected, it must be recombined into a final format for analysis, including the construction of derived variables not present in the initial collection. See [[Data Cleaning]].<br />
<br />
== Organizing Analysis Files ==<br />
<br />
Analysis programs that are exploratory in nature should be held in an "exploratory" folder and separated according to topic. Particularly when folder syncing over [https://www.dropbox.com Dropbox] or [https://www.github.com GitHub] is being used, separating these files by function (rather than combining them into a single "analysis" file) allows multiple researchers to work simultaneously and modularly.<br />
<br />
When the final analysis workflow is agreed upon for a given publication or other output, a final analysis file should be collated for that output only in the "final" analysis folder. This allows selective reuse of the code from the exploratory analyses, in preparation for the final release of the code if required. This allows any collaborator, referee, or replicator to access only the code used to prepare the final outputs and reproduce them exactly.<br />
<br />
== Outputting Analytical Results ==<br />
<br />
Since the final analysis do-files are intended to be fully replicable, and the code itself is considered a vital, shareable output, all tables and figures should be created in such a way that the files are ordered, named, placed, and formatted appropriately. Running the analysis do-file should result in ''only'' necessary files in the "outputs" folder, with names like "figure_1.png", "table_1.xlsx", and so on.<br />
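One way to enforce the "only necessary files" rule is to compare the outputs folder against the expected list after the script runs. This is a small Python sketch with hypothetical file names; the same check is straightforward in Stata or a shell script.<br />

```python
import os
import tempfile

# Hypothetical list of the outputs the final do-file is supposed to produce
expected = {"figure_1.png", "table_1.xlsx"}

# Simulate an outputs folder that also contains a stray intermediate file
outputs = tempfile.mkdtemp()
for name in ["figure_1.png", "table_1.xlsx", "scratch_v2.dta"]:
    open(os.path.join(outputs, name), "w").close()

# Anything not on the expected list should be removed before release
stray = set(os.listdir(outputs)) - expected
print("stray files:", sorted(stray))
```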
<br />
For some applications (such as creating internal presentations or simple Word reports), file types like PNG and XLSX are sufficiently functional. For larger projects with multiple collaborators, particularly when syncing over a [https://www.github.com GitHub] service, plaintext file types such as EPS, CSV, and TEX are the preferred formats. Tables and figures should at minimum be produced by this file so that no further mathematical calculations are required; they should furthermore be organized and formatted as closely to the published versions as possible. For figures, this is typically easy to achieve with an appropriate <code>graph export</code> command in Stata or the equivalent. [https://www.latex-project.org LaTeX] is a particularly powerful tool for doing this with tables. DIME provides several guides on both processes. See [[Exporting Analysis |exporting analysis results]] for more details and more resources.<br />
<br />
== Resources for Specific Analytical Tasks ==<br />
<br />
===Spatial/GIS Analysis===<br />
<br />
[[Spatial Analysis]] involves using geospatial information from your data to explore relationships mediated by proximity or connectedness. This can be descriptive (such as map illustrations) or informative (such as distance to and quality of the nearest road).<br />
<br />
===Randomization Inference===<br />
<br />
[[Randomization Inference]] techniques replace the "normal" p-values from regression analyses with values derived from the treatment assignment methodology itself. They are generally recommended for reporting in experiments where the estimate of interest is the effect of a randomly assigned treatment controlled by the implementer and researcher.<br />
<br />
===Heterogeneous Effects Analysis===<br />
<br />
=== Cost Effectiveness Analysis ===<br />
<br />
[[Cost-effectiveness Analysis]] is the economic analysis of the costs and benefits of the program or policy studied in an impact evaluation project.<br />
<br />
=== Data Visualization ===<br />
<br />
[[Data visualization]] is a critical step in effectively communicating your research results. <br />
<br />
<br />
<br />
== Additional Resources ==<br />
<br />
<br />
<br />
[[Category: Data Analysis ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Questionnaire_Design&diff=4769Questionnaire Design2018-07-20T16:00:16Z<p>Maria jones: </p>
<hr />
<div><br />
== Read First ==<onlyinclude><br />
'''Do''' start with a careful review of existing survey instruments that cover similar topics. '''Don't''' reinvent the wheel -- working from a high-quality, previously-piloted survey instrument will save time and improve the quality of your final output. </onlyinclude><br />
<br />
<br />
== Guidelines ==<br />
<br />
=== Questionnaire design process ===<br />
<br />
When designing a survey instrument from scratch, follow these steps:<br />
<br />
# Review (or draft) a [[Theory of Change]] and [[Reproducible Research#Pre-Analysis Plan | Pre-Analysis Plan]]. <br />
# Make a list of all intermediary and final outcomes of interest, as well as important covariates and sources of heterogeneity<br />
# Prepare an outline of questionnaire modules, based on the above list. Get feedback from research team. <br />
# For each module, prepare a list of specific indicators to measure. Get feedback from research team and implementing partners. <br />
# [[Literature Review for Questionnaire|Review existing questionnaires]] and compile databank of relevant questions for each module<br />
# Draft questionnaire, noting source of each question (e.g. source: Uganda National Panel Survey (LSMS 2013-14), source: Uganda DHS 2011, source: Uganda Social Assistance Grants for Empowerment Programme 2013, Evaluation Follow-Up Survey [http://microdata.worldbank.org/index.php/catalog/2653], source: own design - extra attention required in pilot), and get feedback from research team and implementing partners<br />
# [[Survey Pilot#Guidelines for Survey Pilot| Content-based Pilot]] & resulting revisions<br />
# [[Questionnaire Translation]] & [[Questionnaire Programming]] (can happen concurrently)<br />
<br />
Designing a follow-up questionnaire is simpler. Try to keep as close to the baseline survey instrument as possible, to facilitate panel analysis. It is better to add/subtract questions than to modify existing ones.<br />
<br />
=== Basic Rules for Questionnaire Design ===<br />
<br />
# Group questions into modules<br />
#* Write an introductory script for each module, to guide the flow of the interview<br />
#** Example: ''Now I would like to ask you some questions about your relationships. It’s not that I want to invade your privacy. We are trying to learn how to make young people’s lives safer and happier. Please be open because for our work to be useful to anyone, we need to understand the reality of young people’s lives. Remember that all your answers will be kept strictly confidential.'' <br />
# All questions should have pre-coded answer options. Answer options must be:<br />
#* Clear, simple, and mutually exclusive<br />
#* Exhaustive (tested and refined during the [[Survey Pilot]])<br />
#* Include an 'other' option (but if >5% of respondents choose 'other', the answer choices were insufficiently exhaustive)<br />
# Include hints to the enumerator as necessary, typically coded to appear in italics (not part of the question read to the respondent)<br />
#* Example: "For how many months did you work in the last 12 months?" ''Enumerator: if less than 1 month, round up to 1''<br />
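The 5% rule for 'other' responses is easy to check mechanically once pilot data comes in. Below is a minimal Python sketch with invented responses; in practice this would be a one-line tabulation in Stata.<br />

```python
# Invented pilot responses to a pre-coded question with an 'other' option
responses = ["maize", "other", "beans", "maize", "other", "cassava",
             "maize", "beans", "maize", "beans"]

# If more than 5% chose 'other', the answer choices were not exhaustive enough
share_other = responses.count("other") / len(responses)
needs_revision = share_other > 0.05

print(f"share 'other': {share_other:.0%}; revise answer list: {needs_revision}")
```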
<br />
==== Key elements all questionnaires must have ====<br />
* [[ID_Variable_Properties| Unique ID]]<br />
* [[Human_Subjects_Approval#Informed Consent | informed consent]] <br />
* Most surveys also include identification of survey respondent<br />
<br />
=== Measurement Issues ===<br />
Keep in mind that even simple-seeming data points might not be simple to capture. For example, ''Household size'' will depend entirely on how 'household member' is defined. Only those currently living in the household? Those who have lived in the household for more than 6 of the last 12 months? What about domestic servants, students away at school who are economically dependent on the household, or a household head who has migrated but sends remittances back to support the household?<br />
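The household-size point can be made concrete with a toy roster: the same family yields a different 'household size' under each membership rule. This is a Python sketch with an invented roster; the rules shown are examples, not a recommendation.<br />

```python
# Invented roster: (member, months resident in last 12, economically tied)
roster = [
    ("head",    2, True),    # migrated, but sends remittances back
    ("spouse", 12, True),
    ("child",   8, True),    # away at school part of the year
    ("servant", 12, False),  # lives in the dwelling, separate finances
]

# Rule A: only those resident in the household all 12 months
size_resident = sum(1 for _, months, _ in roster if months == 12)
# Rule B: resident for more than 6 of the last 12 months
size_6_months = sum(1 for _, months, _ in roster if months > 6)
# Rule C: economically tied to the household, wherever they live
size_economic = sum(1 for _, _, tied in roster if tied)

print(size_resident, size_6_months, size_economic)  # 2 3 3
```

Three defensible rules, three different answers: the questionnaire must fix one definition before fieldwork begins.<br />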
<br />
Types of data that are hard to measure in a questionnaire include:<br />
==== [[Difficult topics]] ====<br />
* Things that are hard to estimate or hard to remember. <br />
** Examples include: distance to grocery store, profits, plot size, income in the last year. See [[Recall Bias]]. <br />
* You should pay careful attention during the [[Survey Pilot]] for questions that are hard for the respondent. Questions that seem obvious to you may not be easy to answer, depending on the context. For example, ''Age'' can be difficult if people are innumerate, do not have birth certificates, or do not know their birth year.<br />
<br />
==== [[Sensitive Topics|Sensitive and/or taboo topics]] ====<br />
* Includes any topic perceived as socially undesirable<br />
* Examples include: drug/alcohol use, sexual practices, violent behaviors, criminal activities.<br />
<br />
==== [[Abstract concepts]]====<br />
* May be defined differently across cultures or may not translate well<br />
* Examples include: [[Measuring Empowerment|empowerment]], [[risk aversion]], bargaining power, social cohesion<br />
<br />
==== Outcomes that are not directly observable ====<br />
*Examples include: corruption, quality of care. <br />
* Strategies to use include: Audit Studies<br />
* It is always best to directly measure outcomes when possible. For example, consider the following two measures of literacy:<br />
** "Can you read?" ''Answer choices'': yes, no<br />
** "Can you please read me this sentence?" [Enumerator holds up a showcard with a sentence written in the local language]. ''Answer choices:'' read sentence correctly, read sentence with some errors, unable to read sentence<br />
<br />
The second option, a more objective measure, is always preferable.<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Questionnaire Design and Translation]]<br />
<br />
== Additional Resources ==<br />
<br />
Comprehensive resources on survey design<br />
* Margaret Grosh and Paul Glewwe. 2000. Designing Household Survey Questionnaires for Developing Countries: Lessons from 15 Years of the Living Standards Measurement Study. Volumes 1, 2, and 3. The World Bank.[http://documents.worldbank.org/curated/en/452741468778781879/pdf/multi-page.pdf]<br />
* Dhar, Diva. Instrument Design 101 [PowerPoint slides]. Retrieved from https://www.povertyactionlab.org/sites/default/files/documents/Instrument%20Design_Diva_final.pdf<br />
<br />
* Development Impact Blog: Three New Papers Measuring Stuff that is Difficult to Measure [http://blogs.worldbank.org/impactevaluations/three-new-papers-measuring-stuff-difficult-measure]<br />
<br />
Measurement Topics<br />
* Oxfam on measuring household income: http://policy-practice.oxfam.org.uk/blog/2017/02/real-geek-faq-how-can-i-measure-household-income<br />
<br />
* [https://www.sciencedirect.com/science/article/pii/S0306919217306802?via%3Dihub Measuring food consumption and expenditures in household consumption and expenditure surveys (HCES)]<br />
<br />
<br />
[[Category: Questionnaire Design]] [[Category: Primary Data Collection]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Questionnaire_Design&diff=4768Questionnaire Design2018-07-20T15:59:32Z<p>Maria jones: /* Additional Resources */</p>
Maria joneshttps://dimewiki.worldbank.org/index.php?title=Measuring_Empowerment&diff=4767Measuring Empowerment2018-07-20T15:57:36Z<p>Maria jones: </p>
<hr />
<div>Empowerment, especially of women and girls, is an important but hard-to-measure development outcome. This page provides tips and resources for collecting data on empowerment.<br />
<br />
<br />
== Guidelines ==<br />
<br />
J-PAL prepared an excellent [https://www.povertyactionlab.org/practical-guide-measuring-women-and-girls-empowerment-impact-evaluations Practical Guide to Measuring Women and Girls' Empowerment in Impact Evaluations]. [https://www.povertyactionlab.org/sites/default/files/resources/practical-guide-to-measuring-women-and-girls-empowerment-appendix1.pdf Appendix 1] provides example survey questions related to empowerment and tips on using them, covering the following categories: economic indicators, social indicators, intimate partner and family indicators, political and civic indicators, psychological indicators, education indicators, and health indicators. [https://www.povertyactionlab.org/sites/default/files/resources/practical-guide-to-measuring-women-and-girls-empowerment-appendix2.pdf Appendix 2] provides examples of non-survey instruments. <br />
<br />
Oxfam has a [https://policy-practice.oxfam.org.uk/publications/a-how-to-guide-to-measuring-womens-empowerment-sharing-experience-from-oxfams-i-620271 How-to Guide to Measuring Women's Empowerment], sharing the experiences from their own research. It includes details on their measurement tool, along with example survey modules and Stata do-files. <br />
<br />
Oxfam's [https://views-voices.oxfam.org.uk/methodology/real-geek/2018/07/how-to-measure-womens-empowerment Real Geek blog] provides details of a recent discrete choice experiment to assign implicit weights to empowerment indicators that reflect the views and perceptions of the women interviewed in the survey.<br />
<br />
IFPRI developed a [http://www.ifpri.org/publication/womens-empowerment-agriculture-index Women's Empowerment in Agriculture Index], which measures the empowerment, agency, and inclusion of women in the agriculture sector in an effort to identify ways to overcome the obstacles and constraints they face. The [http://www.ifpri.org/publication/instructional-guide-abbreviated-womens-empowerment-agriculture-index-weai Instructional Guide] provides details on survey methodology, adapting the tool to local contexts, training enumerators, data cleaning, and analysis. <br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Questionnaire Design]]<br />
<br />
<br />
== Additional Resources ==<br />
<br />
[[Category: Questionnaire Design ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Measuring_Empowerment&diff=4766Measuring Empowerment2018-07-20T15:43:55Z<p>Maria jones: Created page with "{{subst:dime_wiki}}"</p>
Maria joneshttps://dimewiki.worldbank.org/index.php?title=Questionnaire_Design&diff=4765Questionnaire Design2018-07-20T15:43:37Z<p>Maria jones: </p>
Maria joneshttps://dimewiki.worldbank.org/index.php?title=Data_visualization&diff=4764Data visualization2018-07-20T15:32:05Z<p>Maria jones: </p>
<hr />
<div>Data visualization is creating a visual representation of your data, for example in the form of a chart or a graph. Choosing the right format to visualize your data is critical to effectively communicating the results of your study. Good visualizations can be more memorable and persuasive than pure text. <br />
<br />
<br />
== Read First ==<br />
Specific code for data visualization is available on the pages for software-specific tools (e.g. [[iegraph]]). This page discusses general principles for data visualization. <br />
<br />
== Guidelines ==<br />
<br />
=== What type of data visualization should I use? ===<br />
The best format for data visualization will depend on the type of data and results you wish to display, as well as the medium in which they will be displayed. For example, online interfaces allow for more dynamic visualizations than printed articles. <br />
<br />
* [https://www.data-to-viz.com/ Data to Viz] provides a handy decision tree. <br />
<br />
* The [http://www.visual-literacy.org/periodic_table/periodic_table.html Periodic Table of Visualization] provides a catalogue of all data visualization types with visual examples. <br />
<br />
* Gapminder.org [https://www.gapminder.org/tools/#$chart-type=bubbles interactive visualization tools] provide beautiful examples of effective visualizations. <br />
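The decision-tree logic these tools encode can be illustrated as a simple lookup from the message you want to convey to a common chart type. The toy Python mapping below is a rough sketch for illustration, not a complete taxonomy.<br />

```python
# Rough mapping from the message to convey to a common chart type
chart_for = {
    "distribution of one numeric variable": "histogram",
    "relationship between two numeric variables": "scatter plot",
    "comparison across categories": "bar chart",
    "trend over time": "line chart",
    "share of a whole": "stacked bar chart",
}

goal = "trend over time"
print(f"For '{goal}', consider a {chart_for[goal]}.")
```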
<br />
===Stata Visual Library===<br />
The DIME Analytics team has created a [https://worldbank.github.io/Stata-IE-Visual-Library/ Stata Visual Library for Impact Evaluation], which shows examples of graphs and provides the code used to create them. You can contribute to the library [https://github.com/worldbank/Stata-IE-Visual-Library on our GitHub]. <br />
<br />
=== Data Visualization in R ===<br />
R has many options for data visualization; the ggplot2 package is one of the best. Here is a list of [http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html 50 ggplot2 visualizations with full R code]. <br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Data Analysis]]<br />
<br />
<br />
== Additional Resources ==<br />
* Harvard Business Review article on [https://hbr.org/2016/06/visualizations-that-really-work/ Visualizations That Really Work]<br />
<br />
<br />
<br />
[[Category: Data Analysis]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Data_visualization&diff=4763Data visualization2018-07-20T15:31:05Z<p>Maria jones: </p>
<hr />
<div>Data visualization is creating a visual representation of your data, for example in the form of a chart or a graph. Choosing the right format to visualize your data is critical to effectively communicating the results of your study. Good visualizations can be more memorable and persuasive than pure text. <br />
<br />
<br />
== Read First ==<br />
Specific code for data visualization is available on the software-specific tools (e.g. [[iegraph]]). This page discusses general principles for data visualization. <br />
<br />
== Guidelines ==<br />
<br />
=== What type of data visualization should I use? ===<br />
The best format for data visualization will depend on the type of data and results you wish to display, as well as the medium in which they will be displayed. For example, online interfaces allow for more dynamic visualizations than printed articles. <br />
<br />
* [https://www.data-to-viz.com/ Data to Viz] provides a handy decision tree. <br />
<br />
* The [http://www.visual-literacy.org/periodic_table/periodic_table.html|Periodic Table of Visualization] provides a catalogue of all data visualization types with visual examples. <br />
<br />
* Gapminder.org [https://www.gapminder.org/tools/#$chart-type=bubbles|interactive visualization tools] provide beautiful examples of effective visualizations. <br />
<br />
===Stata Visual Library===<br />
The DIME Analytics team has created a [https://worldbank.github.io/Stata-IE-Visual-Library/|Stata Visual Library for Impact Evaluation] which shows examples of graphs and provides the codes used to create them. You can contribute to the library [https://github.com/worldbank/Stata-IE-Visual-Library|on our github]. <br />
<br />
=== Data Visualization in R ===<br />
R has many options for data visualization; the ggplot package is one of the best. Here is a list of [http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html|50 ggplot2 Visualizations with full R code] . <br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Data Analysis]]<br />
<br />
<br />
== Additional Resources ==<br />
* Harvard Business Review Article on [https://hbr.org/2016/06/visualizations-that-really-work|Visualizations that Really Work]<br />
<br />
<br />
<br />
[[Category: Data Analysis]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Data_visualization&diff=4762Data visualization2018-07-20T15:30:14Z<p>Maria jones: </p>
<hr />
<div>Data visualization is creating a visual representation of your data, for example in the form of a chart or a graph. Choosing the right format to visualize your data is critical to effectively communicating the results of your study. Good visualizations can be more memorable and persuasive than pure text. <br />
<br />
<br />
== Read First ==<br />
Specific code for data visualization is available on the software-specific tools (e.g. [[iegraph]]). This page discusses general principles for data visualization. [http://www.example.com link title]<br />
<br />
== Guidelines ==<br />
<br />
=== What type of data visualization should I use? ===<br />
The best format for data visualization will depend on the type of data and results you wish to display, as well as the medium in which they will be displayed. For example, online interfaces allow for more dynamic visualizations than printed articles. <br />
<br />
* [https://www.data-to-viz.com/ Data to Viz] provides a handy decision tree. <br />
<br />
* The [http://www.visual-literacy.org/periodic_table/periodic_table.html|Periodic Table of Visualization] provides a catalogue of all data visualization types with visual examples. <br />
<br />
* Gapminder.org [https://www.gapminder.org/tools/#$chart-type=bubbles|interactive visualization tools] provide beautiful examples of effective visualizations. <br />
<br />
===Stata Visual Library===<br />
The DIME Analytics team has created a [https://worldbank.github.io/Stata-IE-Visual-Library/|Stata Visual Library for Impact Evaluation] which shows examples of graphs and provides the codes used to create them. You can contribute to the library [https://github.com/worldbank/Stata-IE-Visual-Library|on our github]. <br />
<br />
=== Data Visualization in R ===<br />
R has many options for data visualization; the ggplot package is one of the best. Here is a list of [http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html|50 ggplot2 Visualizations with full R code] . <br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Data Analysis]]<br />
<br />
<br />
== Additional Resources ==<br />
* Harvard Business Review Article on [https://hbr.org/2016/06/visualizations-that-really-work|Visualizations that Really Work]<br />
<br />
<br />
<br />
[[Category: Data Analysis]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Data_visualization&diff=4761Data visualization2018-07-20T15:02:05Z<p>Maria jones: Created page with "{{subst:dime_wiki}}"</p>
<hr />
<div><span style="font-size:150%"><br />
<span style="color:#ff0000"> '''NOTE: this article is only a template. Please add content!''' </span><br />
</span><br />
<br />
<br />
add introductory 1-2 sentences here<br />
<br />
<br />
<br />
== Read First ==<br />
* include here key points you want to make sure all readers understand<br />
<br />
<br />
== Guidelines ==<br />
* organize information on the topic into subsections. for each subsection, include a brief description / overview, with links to articles that provide details<br />
===Subsection 1===<br />
===Subsection 2===<br />
===Subsection 3===<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[*topic name, as listed on main page*]]<br />
<br />
<br />
== Additional Resources ==<br />
* list here other articles related to this topic, with a brief description and link<br />
<br />
[[Category: *category name* ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Data_Analysis&diff=4760Data Analysis2018-07-20T15:01:16Z<p>Maria jones: </p>
<hr />
<div><br />
<onlyinclude><br />
Data analysis refers to the full process of exploring and describing trends and results from data. Data analysis typically has two stages:<br />
<br />
# Exploratory Analysis<br />
# Final Analysis<br />
</onlyinclude><br />
In exploratory analysis, emphasis will be on producing easily understood summaries of the trends in the data so that the reports, publications, presentations, and summaries that need to be produced can begin to be outlined. Once those stories begin to come together, the code is re-written in a "final" form which would be appropriate for public release with the results.<br />
<br />
== Preparing the Dataset for Analysis ==<br />
<br />
Once data is collected, it must be recombined into a final format for analysis, including the construction of derived variables not present in the initial collection. See [[Data Cleaning]].<br />
<br />
== Organizing Analysis Files ==<br />
<br />
Analysis programs that are exploratory in nature should be kept in an "exploratory" folder and separated according to topic. Particularly when folder syncing over [https://www.dropbox.com Dropbox] or [https://www.github.com GitHub] is being used, separating these files by function (rather than combining them into a single "analysis" file) allows multiple researchers to work simultaneously and modularly.<br />
<br />
When the final analysis workflow is agreed upon for a given publication or other output, a final analysis file should be collated for that output only in the "final" analysis folder. This allows selective reuse of the code from the exploratory analyses, in preparation for the final release of the code if required. This allows any collaborator, referee, or replicator to access only the code used to prepare the final outputs and reproduce them exactly.<br />
<br />
== Outputting Analytical Results ==<br />
<br />
Since the final analysis do-files are intended to be fully replicable, and the code itself is considered a vital, shareable output, all tables and figures should be created in such a way that the files are ordered, named, placed, and formatted appropriately. Running the analysis do-file should result in ''only'' the necessary files in the "outputs" folder, with names like "figure_1.png", "table_1.xlsx", and so on.<br />
<br />
For some applications (such as creating internal presentations or simple Word reports), file types like PNG and XLSX are sufficiently functional. For larger projects with multiple collaborators, particularly when syncing over a [https://www.github.com GitHub] service, plaintext file types such as EPS, CSV, and TEX are the preferred formats. Tables and figures should at minimum be produced by this file such that no further mathematical calculations are required; they should furthermore be organized and formatted as closely to the published versions as possible. For figures, this is typically easy to achieve with an appropriate <code>graph export</code> command in Stata or the equivalent. [https://www.latex-project.org LaTeX] is a particularly powerful tool for doing this with tables. DIME provides several guides on both processes. See [[Exporting Analysis |exporting analysis results]] for more details and more resources.<br />
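The principle that outputs should be fully computed and predictably named can be sketched as follows. This is a minimal illustration, not DIME's workflow: the data, folder name, and file name are invented, and Python stands in for the Stata do-file described above.

```python
import csv
from pathlib import Path
from statistics import mean

# Sketch: the analysis script computes every number itself (no further
# calculations needed downstream) and writes outputs with ordered,
# predictable names. Data and paths here are invented for illustration.

outcome_by_arm = {"control": [2.1, 2.4, 1.9], "treatment": [2.8, 3.0, 2.6]}

outdir = Path("outputs")
outdir.mkdir(exist_ok=True)

with open(outdir / "table_1.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["arm", "n", "mean"])   # headers match the published table
    for arm, values in sorted(outcome_by_arm.items()):
        writer.writerow([arm, len(values), round(mean(values), 2)])
```

Re-running the script regenerates "outputs/table_1.csv" exactly, which is what allows a referee or replicator to reproduce the published table.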
<br />
== Resources for Specific Analytical Tasks ==<br />
<br />
===Spatial/GIS Analysis===<br />
<br />
[[Spatial Analysis]] involves using geospatial information from your data to explore relationships mediated by proximity or connectedness. This can be descriptive (such as map illustrations) or informative (such as distance to and quality of the nearest road).<br />
<br />
===Randomization Inference===<br />
<br />
[[Randomization Inference]] techniques replace the "normal" p-values from regression analyses with values based on the treatment assignment methodology, and are generally recommended for reporting in experiments where treatment assignment was randomly controlled by the implementer and researcher.<br />
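The core of the technique can be sketched in a few lines: re-randomize the treatment assignment many times and ask how often a difference as large as the observed one arises by chance. The data and the simple difference-in-means estimator below are invented for illustration; real applications would re-run the actual estimation under the actual assignment mechanism.

```python
import random
from statistics import mean

# Sketch of a randomization-inference p-value. Outcomes, assignment,
# and the number of permutation draws are all illustrative choices.

random.seed(20240329)

outcomes  = [3.1, 2.9, 3.4, 2.2, 2.5, 2.0, 3.6, 2.3]
treatment = [1,   1,   1,   0,   0,   0,   1,   0]   # observed assignment

def diff_in_means(assign):
    t = [y for y, d in zip(outcomes, assign) if d == 1]
    c = [y for y, d in zip(outcomes, assign) if d == 0]
    return mean(t) - mean(c)

observed = diff_in_means(treatment)

extreme = 0
draws = 1000
for _ in range(draws):
    shuffled = random.sample(treatment, len(treatment))  # permute assignment
    if abs(diff_in_means(shuffled)) >= abs(observed):
        extreme += 1

p_value = extreme / draws   # share of placebo assignments at least as extreme
print(round(observed, 2), p_value)
```

Note that the p-value here comes entirely from the assignment mechanism, not from an assumed error distribution, which is the point of the method.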
<br />
===Heterogeneous Effects Analysis===<br />
<br />
=== Cost Effectiveness Analysis ===<br />
<br />
[[Cost-effectiveness Analysis]] is the economic analysis of the costs and benefits of an impact evaluation project.<br />
<br />
=== Data Visualization ===<br />
<br />
[[Data visualization]] is a critical step in effectively communicating your research results. <br />
<br />
<br />
== Additional Resources ==<br />
<br />
<br />
[[Category: Data Analysis ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Data_Analysis&diff=4759Data Analysis2018-07-20T15:00:31Z<p>Maria jones: </p>
<hr />
<div><br />
<onlyinclude><br />
Data analysis refers to the full process of exploring and describing trends and results from data. Data analysis typically has two stages:<br />
<br />
# Exploratory Analysis<br />
# Final Analysis<br />
</onlyinclude><br />
In exploratory analysis, emphasis will be on producing easily understood summaries of the trends in the data so that the reports, publications, presentations, and summaries that need to be produced can begin to be outlined. Once those stories begin to come together, the code is re-written in a "final" form which would be appropriate for public release with the results.<br />
<br />
== Preparing the Dataset for Analysis ==<br />
<br />
Once data is collected, it must be recombined into a final format for analysis, including the construction of derived variables not present in the initial collection. See [[Data Cleaning]].<br />
<br />
== Organizing Analysis Files ==<br />
<br />
Analysis programs that is exploratory in nature should be held in an "exploratory" folder and separated according to topic. Particularly when folder syncing over [https://www.dropbox.com Dropbox] or [https://www.github.com Github] is being used, separating these files by function (rather than combining them into a single "analysis" file) allows multiple researchers to work simultaneously and modularly.<br />
<br />
When the final analysis workflow is agreed upon for a given publication or other output, a final analysis file should be collated for that output only in the "final" analysis folder. This allows selective reuse of the code from the exploratory analyses, in preparation for the final release of the code if required. This allows any collaborator, referee, or replicator to access only the code used to prepare the final outputs and reproduce them exactly.<br />
<br />
== Outputting Analytical Results ==<br />
<br />
Since the final analysis do-files are intended to be fully replicable, and the code itself is considered a vital, shareable output, all tables and figures should be created in such a way that the files are ordered, named, placed, and formatted appropriately. Running the analysis dofile should result in ''only'' necessary files in the "outputs" folder, with names like "figure_1.png", "table_1.xlsx", and so on.<br />
<br />
For some applications (such as creating internal presentations or simple Word reports, file types like PNG and XLSX are sufficiently functional. For larger projects with multiple collaborators, particularly when syncing over a [https://www.github.com GitHub] service, plaintext file types such as EPS, CSV, and TEX will be the preferred formats. Tables and figures should at minimum be produced by this file such that no further mathematical calculations are required; they should furthermore be organized and formatted as nearly to the published versions as possible. Figures are typically easy to do this in by using an appropriate <code>graph export</code> command in Stata or the equivalent. [https://www.latex-project.org LaTeX] is a particularly powerful tool for doing this with tables. DIME provides several guides on both processes. See [[Exporting Analysis |exporting analysis results]] for more details and more resources.<br />
<br />
== Resources for Specific Analytical Tasks ==<br />
<br />
===Spatial/GIS Analysis===<br />
<br />
[[Spatial Analysis]] involves using geospatial information from your data to explore relationships mediated by proximity or connectiveness. This can be descriptive (such as map illustrations) or informative (such as distance to and quality of the nearest road).<br />
<br />
===Randomization Inference===<br />
<br />
[[Randomization Inference]] techniques replace the "normal" p-values from regression analyses with values based on the treatment assignment methodology, and are generally recommended for reporting in experiments whose estimates are of randomly assigned treatment controlled by the implementer and researcher.<br />
<br />
===Heterogeneous Effects Analysis===<br />
<br />
=== Cost Effectiveness Analysis ===<br />
<br />
[[Cost-effectiveness Analysis]] is the economic analysis of the costs and benefits of an impact evaluation project.<br />
<br />
=== Data Visualization ===<br />
[[Data visualization]] is a critical step in effectively communicating your research results. <br />
<br />
== Additional Resources ==<br />
<br />
<br />
[[Category: Data Analysis ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Survey_Budget&diff=4757Survey Budget2018-07-09T19:26:35Z<p>Maria jones: /* Additional Resources */</p>
<hr />
<div><br />
== Read First ==<onlyinclude><br />
Preparing estimated survey budgets is an important component of planning for and designing an impact evaluation. While the final cost of the exercise may depend on bids from research firms, it is important for the research team to have an accurate estimate of data collection costs, to verify that the data strategy is feasible and the study will be [[Sampling & Power Calculations|well-powered]] given the research budget. </onlyinclude><br />
<br />
== Guidelines ==<br />
=== Step 1: Make a list of things your budget should include ===<br />
* Salaries<br />
* Allowances<br />
* Equipment<br />
* Transport<br />
* Stationery<br />
* Communication<br />
* Other<br />
<br />
=== Step 2: Talk to people who have implemented surveys in your setting! ===<br />
* How much do they pay for standard survey cost items?<br />
* How do they organize transport for enumerators? Accommodation?<br />
Add estimated item costs to the list you made.<br />
<br />
=== Step 3: Assumptions - Think through how the fieldwork might be organized ===<br />
* Number of surveys/person/day?<br />
* Number of teams that can be realistically supervised?<br />
Consider: time constraints for implementation (especially for a baseline survey), training duration, and field logistics (e.g., will enumerators travel by private car or public transport?).<br />
<br />
=== Step 4: Bring it all together ===<br />
Link each budget line item with the assumptions and with the standard rates. It's best practice to add buffer survey days in case of delays (15-20% extra is recommended). <br />
<br />
Budget for contingencies: unanticipated delays, re-training, fuel price hikes.<br />
<br />
Make sure to take into account gross versus net budgeting: there may be country-specific administrative costs (e.g. taxes) that you have forgotten.<br />
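The roll-up in Step 4 amounts to simple arithmetic linking line items to assumptions. The sketch below uses made-up unit costs and counts; the 15% buffer follows the 15-20% rule of thumb above, and the 10% contingency rate is an illustrative assumption, not a recommendation.

```python
# Illustrative budget roll-up. All unit costs, counts, and the 10%
# contingency rate are invented; only the 15% buffer follows the
# rule of thumb in the text.

surveys = 1200
surveys_per_enumerator_day = 8          # Step 3 assumption
enumerators = 10
daily_salary = 25.0                     # Step 2: local standard rate
daily_transport = 10.0

survey_days = surveys / (surveys_per_enumerator_day * enumerators)  # 15 days
survey_days *= 1.15                     # add 15% buffer for delays

field_cost = survey_days * enumerators * (daily_salary + daily_transport)
total = field_cost * 1.10               # add a 10% contingency line

print(round(survey_days, 2), round(total, 2))
```

Keeping the assumptions as named variables, rather than hard-coded totals, makes it easy to re-run the budget when a rate or productivity assumption changes.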
<br />
== Back to Parent ==<br />
This article is part of the topic [[Survey Firm Procurement]]<br />
<br />
<br />
== Additional Resources ==<br />
<br />
<br />
<br />
[[Category: Survey Firm Procurement]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Survey_Budget&diff=4756Survey Budget2018-07-09T19:26:19Z<p>Maria jones: /* Step 4: Bring it all together */</p>
<hr />
<div><br />
== Read First ==<onlyinclude><br />
Preparing estimated survey budgets is an important component of planning for and designing an impact evaluation. While the final cost of the exercise may depend on bids from research firms; it is important for the research team to have an accurate estimate of data collection costs, to verify that the data strategy is feasible and the study will be [[Sampling & Power Calculations|well-powered]] given the research budget. </onlyinclude><br />
<br />
== Guidelines ==<br />
=== Step 1: Make a list of things your budget should include ===<br />
* Salaries<br />
* Allowances<br />
* Equipment<br />
* Transport<br />
* Stationery<br />
* Communication<br />
* Other<br />
<br />
=== Step 2: Talk to people who have implemented surveys in your setting! ===<br />
* How much do they pay for standard survey cost items?<br />
* How do they organize transport for enumerators? Accommodation?<br />
Add estimated item costs to the list you made<br />
<br />
=== Step 3: Assumptions - Think through how the fieldwork might be organized ===<br />
* Number of surveys/person/day?<br />
* Number of teams that can be realistically supervised?<br />
Consider: time constraints for implementation (especially if a baseline survey), training duration, field logistics (e.g. will enumerators move by private car? public transport?)<br />
<br />
=== Step 4: Bring it all together ===<br />
Link each budget line item with the assumptions and with the standard rates. It's best practice to add buffer survey days in case of delays (15-20% extra is recommended). <br />
<br />
Budget for contingencies: unanticipated delays, re-training, fuel price hikes<br />
<br />
Make sure to take into account gross versus net budgeting. There may be country-related administrative costs (i.e. taxes) that you have forgotten.<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Survey Firm Procurement]]<br />
<br />
<br />
== Additional Resources ==<br />
Please add here any articles related to this topic, with a brief description and link<br />
<br />
[[Category: Survey Firm Procurement]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Survey_Budget&diff=4755Survey Budget2018-07-09T19:25:16Z<p>Maria jones: /* Step 3: Assumptions - Think through how the fieldwork might be organized */</p>
<hr />
<div><br />
== Read First ==<onlyinclude><br />
Preparing estimated survey budgets is an important component of planning for and designing an impact evaluation. While the final cost of the exercise may depend on bids from research firms; it is important for the research team to have an accurate estimate of data collection costs, to verify that the data strategy is feasible and the study will be [[Sampling & Power Calculations|well-powered]] given the research budget. </onlyinclude><br />
<br />
== Guidelines ==<br />
=== Step 1: Make a list of things your budget should include ===<br />
* Salaries<br />
* Allowances<br />
* Equipment<br />
* Transport<br />
* Stationery<br />
* Communication<br />
* Other<br />
<br />
=== Step 2: Talk to people who have implemented surveys in your setting! ===<br />
* How much do they pay for standard survey cost items?<br />
* How do they organize transport for enumerators? Accommodation?<br />
Add estimated item costs to the list you made<br />
<br />
=== Step 3: Assumptions - Think through how the fieldwork might be organized ===<br />
* Number of surveys/person/day?<br />
* Number of teams that can be realistically supervised?<br />
Consider: time constraints for implementation (especially if a baseline survey), training duration, field logistics (e.g. will enumerators move by private car? public transport?)<br />
<br />
=== Step 4: Bring it all together ===<br />
Link each budget line item with the assumptions and with the standard rates<br />
Add a buffer survey days in case there are delays. 15-20% extra is a good idea<br />
Budget for contingencies: unanticipated delays, re-training, fuel price hikes<br />
Make sure to take into account gross versus net budgeting, there may be country related administrative costs (i.e. taxes) that you have forgotten.<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Survey Firm Procurement]]<br />
<br />
<br />
== Additional Resources ==<br />
Please add here any articles related to this topic, with a brief description and link<br />
<br />
[[Category: Survey Firm Procurement]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Survey_Budget&diff=4754Survey Budget2018-07-09T19:23:56Z<p>Maria jones: /* Step 2: Talk to people who have implemented surveys in your setting! */</p>
<hr />
<div><br />
== Read First ==<onlyinclude><br />
Preparing estimated survey budgets is an important component of planning for and designing an impact evaluation. While the final cost of the exercise may depend on bids from research firms; it is important for the research team to have an accurate estimate of data collection costs, to verify that the data strategy is feasible and the study will be [[Sampling & Power Calculations|well-powered]] given the research budget. </onlyinclude><br />
<br />
== Guidelines ==<br />
=== Step 1: Make a list of things your budget should include ===<br />
* Salaries<br />
* Allowances<br />
* Equipment<br />
* Transport<br />
* Stationery<br />
* Communication<br />
* Other<br />
<br />
=== Step 2: Talk to people who have implemented surveys in your setting! ===<br />
* How much do they pay for standard survey cost items?<br />
* How do they organize transport for enumerators? Accommodation?<br />
Add estimated item costs to the list you made<br />
<br />
=== Step 3: Assumptions - Think through how the fieldwork might be organized ===<br />
# surveys/person/day?<br />
# teams I can realistically monitor?<br />
time constraints<br />
training duration<br />
transport: car hire/okada/own bikes?<br />
<br />
=== Step 4: Bring it all together ===<br />
Link each budget line item with the assumptions and with the standard rates<br />
Add a buffer survey days in case there are delays. 15-20% extra is a good idea<br />
Budget for contingencies: unanticipated delays, re-training, fuel price hikes<br />
Make sure to take into account gross versus net budgeting, there may be country related administrative costs (i.e. taxes) that you have forgotten.<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Survey Firm Procurement]]<br />
<br />
<br />
== Additional Resources ==<br />
Please add here any articles related to this topic, with a brief description and link<br />
<br />
[[Category: Survey Firm Procurement]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Survey_Budget&diff=4753Survey Budget2018-07-09T19:23:32Z<p>Maria jones: /* Step 1: Make a list of things your budget should include */</p>
<hr />
<div><br />
== Read First ==<onlyinclude><br />
Preparing estimated survey budgets is an important component of planning for and designing an impact evaluation. While the final cost of the exercise may depend on bids from research firms; it is important for the research team to have an accurate estimate of data collection costs, to verify that the data strategy is feasible and the study will be [[Sampling & Power Calculations|well-powered]] given the research budget. </onlyinclude><br />
<br />
== Guidelines ==<br />
=== Step 1: Make a list of things your budget should include ===<br />
* Salaries<br />
* Allowances<br />
* Equipment<br />
* Transport<br />
* Stationery<br />
* Communication<br />
* Other<br />
<br />
=== Step 2: Talk to people who have implemented surveys in your setting! ===<br />
How much do they pay for standard survey cost items?<br />
How do they organize transport for enumerators? Accommodation?<br />
Add item costs to the list you made<br />
<br />
=== Step 3: Assumptions - Think through how the fieldwork might be organized ===<br />
# surveys/person/day?<br />
# teams I can realistically monitor?<br />
time constraints<br />
training duration<br />
transport: car hire/okada/own bikes?<br />
<br />
=== Step 4: Bring it all together ===<br />
Link each budget line item with the assumptions and with the standard rates<br />
Add a buffer survey days in case there are delays. 15-20% extra is a good idea<br />
Budget for contingencies: unanticipated delays, re-training, fuel price hikes<br />
Make sure to take into account gross versus net budgeting, there may be country related administrative costs (i.e. taxes) that you have forgotten.<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Survey Firm Procurement]]<br />
<br />
<br />
== Additional Resources ==<br />
Please add here any articles related to this topic, with a brief description and link<br />
<br />
[[Category: Survey Firm Procurement]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Anthropometric_Indicators&diff=4752Anthropometric Indicators2018-07-09T19:19:04Z<p>Maria jones: </p>
<hr />
<div>Anthropometry is the measurement of the human body, typically used to assess nutritional well-being and outcomes such as stunting and wasting. <br />
<br />
== Guidelines ==<br />
Anthropometric measurements include: weight (including birth weight), height/length, knee height (a proxy for height), mid-upper arm circumference, head circumference, waist circumference, and calf circumference. These can be used to calculate indices to identify low birth weight, stunting, wasting, BMI, head circumference for age, and acute malnutrition. <br />
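The simplest of these indices, BMI, can be computed directly from two measurements. The sketch below is illustrative only: age- and sex-specific indices such as stunting and wasting z-scores require the WHO growth reference tables, which are not reproduced here. The cut-offs used are the standard WHO adult BMI categories.

```python
# Minimal sketch of turning raw anthropometric measurements into an
# index. BMI is the simplest case; stunting/wasting z-scores need the
# WHO reference tables and are not shown. Cut-offs are the standard
# WHO adult BMI classification.

def bmi(weight_kg, height_m):
    return weight_kg / height_m ** 2

def bmi_category(value):
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"

v = bmi(68.0, 1.70)
print(round(v, 1), bmi_category(v))   # 23.5 normal
```

In survey data work, computing such indices in code (rather than in the field) keeps the raw measurements available for quality checks.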
<br />
[https://www.fantaproject.org FANTA], the Food and Nutrition Technical Assistance project of USAID, developed a detailed [https://www.fantaproject.org/tools/anthropometry-guide Guide to Anthropometry], including an [https://www.fantaproject.org/sites/default/files/resources/MODULE-1-FANTA-Anthropometry-Guide-May2018.pdf overview of types of anthropometric data]; details and classifications for [https://www.fantaproject.org/sites/default/files/resources/MODULE-2-FANTA-Anthropometry-Guide-May2018.pdf children from 0-5], [https://www.fantaproject.org/sites/default/files/resources/MODULE-3-FANTA-Anthropometry-Guide-May2018.pdf youth from 5-19], [https://www.fantaproject.org/sites/default/files/resources/MODULE-4-FANTA-Anthropometry-Guide-May2018.pdf pregnant and postpartum women and girls], and [https://www.fantaproject.org/sites/default/files/resources/MODULE-5-FANTA-Anthropometry-Guide-May2018.pdf adults 18 years and older]; and [https://www.fantaproject.org/sites/default/files/resources/MODULE-6-FANTA-Anthropometry-Guide-May2018.pdf measurement protocols and equipment guidance]. <br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Primary Data Collection]]<br />
<br />
<br />
== Additional Resources ==<br />
<br />
[https://www.fantaproject.org/tools/anthropometry-guide FANTA Guide to Anthropometry]<br />
<br />
[[Category: Primary Data Collection ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Anthropometric_Indicators&diff=4751Anthropometric Indicators2018-07-09T19:16:51Z<p>Maria jones: </p>
<hr />
<div>Anthropometry is the measurement of the human body, typically used to assess nutritional well-being and outcomes such as stunting and wasting. <br />
<br />
<br />
== Guidelines ==<br />
Anthropometric measurements include: weight (including birth weight), height/length, knee height (a proxy for height), mid-upper arm circumference, head circumference, waist circumference, and calf circumference. These can be used to calculate indices to identify low birth weight, stunting, wasting, BMI, head circumference for age, and acute malnutrition. <br />
<br />
[https://www.fantaproject.org FANTA], Food And Nutrition Technical Assistance project of USAID, developed a detailed guide to anthropometric data[https://www.fantaproject.org/tools/anthropometry-guide], including an overview of types of anthropometric data; details and classifications for children from 0-5, youth from 5-19, pregnant and postpartum women and girls, and adults 18 years and older; measurement protocols and equipment guidance. <br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Primary Data Collection]]<br />
<br />
<br />
== Additional Resources ==<br />
<br />
[https://www.fantaproject.org/tools/anthropometry-guide FANTA Guide to Anthropometry]<br />
<br />
[[Category: Primary Data Collection ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Anthropometric_Indicators&diff=4750Anthropometric Indicators2018-07-09T19:04:12Z<p>Maria jones: Created page with "{{subst:dime_wiki}}"</p>
<hr />
<div><span style="font-size:150%"><br />
<span style="color:#ff0000"> '''NOTE: this article is only a template. Please add content!''' </span><br />
</span><br />
<br />
<br />
add introductory 1-2 sentences here<br />
<br />
<br />
<br />
== Read First ==<br />
* include here key points you want to make sure all readers understand<br />
<br />
<br />
== Guidelines ==<br />
* organize information on the topic into subsections. for each subsection, include a brief description / overview, with links to articles that provide details<br />
===Subsection 1===<br />
===Subsection 2===<br />
===Subsection 3===<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[*topic name, as listed on main page*]]<br />
<br />
<br />
== Additional Resources ==<br />
* list here other articles related to this topic, with a brief description and link<br />
<br />
[[Category: *category name* ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Primary_Data_Collection&diff=4749Primary Data Collection2018-07-09T19:02:54Z<p>Maria jones: </p>
<hr />
<div>== Read First ==<onlyinclude><br />
Primary data is directly generated by the researcher. Household surveys are the prototypical example of primary data collection. Unlike [[Secondary Data Sources]], primary data collection can be personally directed by the researcher to ensure it meets the standards of quality, availability, statistical power, and sampling required for a particular research inquiry. With globally increasing access to survey tools such as software, field manuals, and specialized firms, data collected and owned by the researcher has become the dominant method of empirical inquiry in development economics.</onlyinclude><br />
<br />
<br />
== Types of primary data ==<br />
The most common form of primary data collection is the personal interview. Depending on the research, these interviews may take the form of household surveys, business (firm) surveys, or agricultural (farm) surveys. Some studies also include objective measurements such as [[Anthropometric Indicators]]. <br />
<br />
== Modes of primary data collection ==<br />
Surveys can be conducted on paper ([[Pen-and-Paper Personal Interviews (PAPI)]]) or electronically ([[Computer-Assisted Personal Interviews (CAPI)]]), or a combination of the two ([[Computer-Assisted Field Entry (CAFE)]]). <br />
<br />
== Preparing for primary data collection ==<br />
The following are critical steps in preparing for primary data collection:<br />
* Determine sampling frame<br />
* Conduct sampling, based on [[Sample Size]] calculations, taking care to do so [[Randomization in Stata|reproducibly]]<br />
* [[Questionnaire Design and Translation|Design and translate the survey instrument]]<br />
* [[Survey Pilot|Pilot the Survey Instrument]]<br />
* [[Questionnaire Programming|Program the survey instrument]] if data is being collected electronically<br />
* [[Procuring a Survey Firm|Procure a Survey Firm]], taking care to prepare detailed [[Survey Firm TOR|Terms of Reference]]<br />
* [[Preparing for Field Data Collection|Prepare for field work]]<br />
* Create a [[Data Quality Assurance Plan]]<br />
* [[Enumerator Training| Train data collectors]]<br />
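The sampling step above hinges on reproducibility: anyone re-running the code against the same sampling frame with the same seed should draw an identical sample. The sketch below illustrates the idea in Python with a hypothetical frame of household IDs; DIME projects typically implement this in Stata (see [[Randomization in Stata]]).<br />

```python
import random

# Hypothetical sampling frame: unique household IDs from a census listing.
sampling_frame = [f"HH-{i:04d}" for i in range(1, 501)]

def draw_sample(frame, n, seed):
    """Draw a reproducible simple random sample of size n from the frame."""
    frame = sorted(frame)      # fix the order before sampling
    rng = random.Random(seed)  # seed the generator so the draw is replicable
    return sorted(rng.sample(frame, n))

# The same frame and seed always yield the same sample.
sample_a = draw_sample(sampling_frame, 50, seed=20180709)
sample_b = draw_sample(sampling_frame, 50, seed=20180709)
assert sample_a == sample_b
```

In practice, the seed and the exact version of the sampling frame should be committed to version control alongside the code, so the draw can be audited and replicated later.<br />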
<br />
== Additional Resources ==<br />
<br />
<br />
[[Category: Primary Data Collection ]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Pre-Analysis_Plan&diff=4748Pre-Analysis Plan2018-06-29T14:43:39Z<p>Maria jones: </p>
<hr />
<div><onlyinclude>A pre-analysis plan (PAP) lays out how the researcher will analyze data, at the design stage of an impact evaluation. The objective of a PAP is to prevent data mining and specification searching. </onlyinclude><br />
<br />
<br />
== Read First ==<br />
While most economics journals do not currently require PAPs as a condition for publication, researchers may choose to produce a PAP prior to data analysis to: (i) increase the credibility of their findings; and (ii) fine-tune their analysis strategy before seeing the data.<br />
<br />
While PAPs provide the benefit of potentially reducing the prevalence of spurious results, this comes at the cost of tying researchers' hands more formally to ex ante analysis plans, which may limit the potential for exploratory learning. Benjamin Olken provides a summary of the costs and benefits associated with fully pre-specifying the analysis for a development economics RCT [https://www.aeaweb.org/articles?id=10.1257/jep.29.3.61]. He notes that "forcing all papers to be fully pre-specified from start to end would likely result in simpler papers, which could potentially lose some of the nuance of current work", but that "in many contexts, pre-specification of one (or a few) key primary outcome variables, statistical specifications, and control variables offers a number of advantages".<br />
<br />
== Guidelines ==<br />
<br />
The Berkeley Initiative for Transparency in the Social Sciences (BITSS) prepared a [https://www.bitss.org/wp-content/uploads/2015/12/Pre-Analysis-Plan-Template.pdf Pre-Analysis Plan template], which provides the overall structure and guidance on what details to include (.doc and .tex formats available). <br />
<br />
We recommend also consulting this [http://blogs.worldbank.org/impactevaluations/a-pre-analysis-plan-checklist Pre-analysis plan checklist] from the Development Impact Blog. <br />
<br />
You can find 13 examples of pre-analysis plans at the [https://www.povertyactionlab.org/Hypothesis-Registry J-PAL Hypothesis Registry]. <br />
<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Research Ethics]]<br />
<br />
<br />
== Additional Resources ==<br />
*Olken, Benjamin A. 2015. "[https://www.aeaweb.org/articles?id=10.1257/jep.29.3.61 Promises and Perils of Pre-analysis Plans]." Journal of Economic Perspectives, 29(3): 61-80. DOI: 10.1257/jep.29.3.61<br />
<br />
<br />
<br />
[[Category: Research Ethics]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Ietoolkit&diff=4746Ietoolkit2018-06-21T18:51:42Z<p>Maria jones: Redirected page to Stata Coding Practices#ietoolkit</p>
<hr />
<div>#redirect [[Stata Coding Practices #ietoolkit |ietoolkit]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Ietoolkit&diff=4745Ietoolkit2018-06-21T18:50:21Z<p>Maria jones: Redirected page to Ietoolkit</p>
<hr />
<div>#redirect [[ietoolkit|Stata Coding Practices #ietoolkit]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Sensitive_Topics&diff=4736Sensitive Topics2018-06-15T16:42:19Z<p>Maria jones: </p>
<hr />
<div>This article provides guidance on how to collect data on sensitive topics. <br />
<br />
<br />
== Read First ==<onlyinclude><br />
For certain topics, respondents have incentives to conceal the truth, due to taboos or social pressure (e.g. social desirability bias or fear of retaliation). This can create bias whose size and direction are hard to predict. To avoid this, it is essential to guarantee anonymity and confidentiality, and to develop [[Survey Protocols]] that protect privacy and maximize trust. If this is not sufficient, experimental methods such as the [[Randomized Response Technique]], [[List Experiments]] and [[Endorsement Experiments]] can be used.</onlyinclude><br />
<br />
== Guidelines ==<br />
=== Survey Design for Sensitive Data ===<br />
* Never start with difficult or sensitive modules! Begin with easy questions and work up to harder ones, since respondents typically become more comfortable with, and more trusting of, the enumerator as the interview proceeds. <br />
* Consider the survey mode: self-administered questionnaires may yield more accurate data than face-to-face interviews.<br />
* Frame questions to avoid social desirability bias.<br />
* One possible strategy is a [[List Experiments|list experiment]]: ask respondents to count the number of true statements among a list that contains '''one''' sensitive statement (e.g. "My partner is sometimes violent with me"). The difference in mean counts between the treatment arms reveals the prevalence of the sensitive behavior (or lack thereof).<br />
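The counting strategy in the last bullet can be illustrated with a small simulation (all numbers are invented for illustration): the control arm counts true statements among neutral items only, the treatment arm counts them with the sensitive item added, and the difference in mean counts estimates the prevalence of the sensitive behavior.<br />

```python
import random

random.seed(0)

TRUE_PREVALENCE = 0.30  # assumed share for whom the sensitive statement is true
N_PER_ARM = 50_000

def neutral_count():
    # Count of true statements among 3 neutral items (each true w.p. 0.5).
    return sum(random.random() < 0.5 for _ in range(3))

# Control arm sees only the 3 neutral items; treatment arm also sees
# the sensitive item, which is true with probability TRUE_PREVALENCE.
control = [neutral_count() for _ in range(N_PER_ARM)]
treatment = [neutral_count() + (random.random() < TRUE_PREVALENCE)
             for _ in range(N_PER_ARM)]

estimate = sum(treatment) / N_PER_ARM - sum(control) / N_PER_ARM
print(f"Estimated prevalence: {estimate:.3f}")  # close to 0.30
```

Because no individual respondent ever reveals whether the sensitive statement is true for them, the technique preserves anonymity while still identifying the population-level prevalence.<br />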
<br />
=== Survey Protocols for Collecting Sensitive Data ===<br />
* Make sure the respondent knows that responses will never be personally identified. This should be part of the Informed Consent module. <br />
* Conduct interviews privately, without even family members present (especially when discussing issues such as domestic violence).<br />
* Enumerators who share characteristics with the respondent (same gender, age, ethnic group, or background) may garner greater trust.<br />
<br />
== Back to Parent ==<br />
This article is part of the topic [[Questionnaire Design]]<br />
<br />
<br />
== Additional Resources ==<br />
* [https://graemeblair.com/papers/sensitive.pdf Survey Methods for Sensitive Topics]<br />
* Bowling, Ann. 2005. "[https://academic.oup.com/jpubhealth/article/27/3/281/1511097 Mode of questionnaire administration can have serious effects on data quality]." Journal of Public Health 27(3): 281-291.<br />
* Kreuter, Frauke, Stanley Presser, and Roger Tourangeau. 2009. "[https://academic.oup.com/poq/article/72/5/847/1833162 Social Desirability Bias in CATI, IVR, and Web Surveys: The Effects of Mode and Question Sensitivity]." Public Opinion Quarterly 72(5): 847-865. DOI: 10.1093/poq/nfn063<br />
<br />
[[Category: Questionnaire Design]]</div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Sensitive_Topics&diff=4735Sensitive Topics2018-06-15T16:40:17Z<p>Maria jones: </p>
<hr />
<div></div>Maria joneshttps://dimewiki.worldbank.org/index.php?title=Sensitive_Topics&diff=4734Sensitive Topics2018-06-15T16:37:26Z<p>Maria jones: </p>
<hr />
<div></div>Maria jones