Difference between revisions of "Identity Checks"
Latest revision as of 14:10, 7 August 2023
Interviewing the correct individuals is a crucial part of data collection, and preparing to overcome the possible barriers to this goal should be part of the data quality assurance plan.
Causes
The following are common reasons why identity checks are needed, that is, why the wrong person may end up being interviewed:
Enumerator mistake
Checking identities normally depends on the information at hand to confirm that the person found is the one being tracked. Sometimes enumerators fail to confirm that all of the identity variables were checked, and continue even if only some of them were correct. It can even be the case that the right person is found, but some of their information is wrong in the tracking data (such as a wrong birthdate or address). This makes it hard for enumerators to decide conclusively on the respondent's identity, and after considerable effort, they may be inclined to confirm it. Sometimes even the respondent may believe that they are the person being searched for. A neighbor with the same name, someone with the same nickname or other matching characteristics may lead many people (such as the ones playing dice in J-PAL's story) to point the enumerator to a person who is similar to, but not actually, the one in the sample.
Enumerator fraud
There are different degrees of fraud that can be committed by an enumerator, from an entirely fabricated interview to an actual interview conducted with someone with very similar identity characteristics, so that even if the actual respondent is not found, it could be seen as an honest mistake. Respondents may also fake their identities. In data collections where respondents receive a gift to compensate for the time they invest in answering the survey, people may try to convince enumerators that they are part of the sample. In surveys with adolescents or children, it is also not unusual for them to give false identities as a prank on enumerators.
Solutions
Based on these possibilities, the research team needs to prepare the identification process thoroughly, covering all possible loopholes. During this process, some of the factors to keep in mind are:
- Assemble the available information on respondents in advance for identity checks. This information may come from administrative data sets, a Baseline Survey (in the case of a follow-up interview), etc. Characteristics such as the individual's full name, the full names of other household members and close relatives, birthdate, identification numbers, address, occupation and other relevant information can be used to create identity checks. These could be used both as inputs for the tracking work of enumerators and as checks built into the questionnaire to confirm the respondent's identity.
- There's a trade-off between checking for enumerator fraud and respondent fraud. The more information you give enumerators for tracking, the higher the probability that they will be able to find the correct respondent. On the other hand, these variables will be known to the interviewer, and will no longer be useful to check for enumerator fraud (the enumerator will be able to use these correct answers in a potential fraudulent interview). It is important to find a balance between both of these checks, which depends on the incentives for both of these frauds. In one case where respondents' birthdates were available in advance and fraud incentives were identified for both the interviewer and the respondent, for example, the questionnaire was programmed to block the interview when the day of birth given was incorrect, but not when the month was incorrect. This way, the day of birth acts as a filter for respondent fraud (since the interviewer will immediately know that this is not the correct respondent), while the month of birth can still be used to check for enumerator fraud.
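The day/month split in the birthdate example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the programming of any particular survey tool: the function name `check_birthdate`, its arguments and the keys of the returned dictionary are all hypothetical. The day of birth is validated during the interview, while a month mismatch is only stored for later review by the research team.

```python
def check_birthdate(reported_day, reported_month, preloaded_day, preloaded_month):
    """Hypothetical sketch of the day/month identity check.

    A wrong day of birth blocks the interview (filter for respondent fraud).
    The month is never validated in the field; a mismatch is only stored as
    a hidden flag for the research team (check for enumerator fraud).
    """
    day_matches = reported_day == preloaded_day
    month_matches = reported_month == preloaded_month
    return {
        "allow_interview": day_matches,        # visible to the enumerator
        "flag_for_review": not month_matches,  # not shown to the enumerator
    }


# Example: correct day, wrong month. The interview proceeds but is flagged.
result = check_birthdate(reported_day=12, reported_month=4,
                         preloaded_day=12, preloaded_month=5)
```

Because the enumerator never learns whether the month matched, it remains usable as an independent check on the interview.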
Map all possible measures to check identities. These could be:
- Adding questions to confirm identity: the first questions in a survey could be used specifically to confirm the respondent's identity, preloading identification numbers, birthdates or any other personal information. By preloading these checks on the questionnaire itself, you also limit the amount of information given to enumerators (they will receive alerts when the information doesn't match, but still won't know the correct answer, which makes it harder to fake interviews).
- Recording interviews: although reviewing recordings is time-intensive, it provides important information to understand the work being done in the field and why inconsistencies were found in an interview. Checking inconsistencies through audio can help to confirm the identity of interviewees, and may also flag problems such as methodological errors, where the interviewer created an inconsistency by asking a question poorly or stating the question incorrectly. It is important to plan the recording of interviews thoroughly prior to the data collection, since this will have to be approved by the Institutional Review Board. In specific cases, such as when interviewing children or focusing on sensitive information such as income or sexual habits, it may be impossible to record interviews (either because it is not approved by the IRB or because it may bias responses and make respondents uncomfortable). The equipment will also have to be tested with audio recording enabled, since data storage demands will increase considerably. This may be a problem in cases where tablets provided by the survey firm are of low quality. Recording full interviews may also be difficult when they are conducted by phone. At the time of this writing (May 2020), SurveyCTO is adding features to its early release version to support recording phone interviews. This will be done using the VOICE_CALL audio feature in audio audits.
- Recording GPS data: recording the geographical location at the time of the interview is another possible way to show that, at the very least, the interviewer was at the designated location. However, this also requires testing in advance, since low-quality tablets may take a very long time to reach the desired accuracy. In SurveyCTO, for example, the required accuracy can be changed when programming the questionnaire, which can help to overcome this issue.
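As a rough illustration of how recorded GPS data could be checked after the interview, the Python sketch below compares the recorded coordinates with the expected location of the respondent. The function names, the 200-metre tolerance and the way the reported accuracy is subtracted from the distance are all assumptions made for this example; they are not part of SurveyCTO or any other tool.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def gps_flag(recorded, expected, accuracy_m, max_distance_m=200.0):
    """Return True if the interview location looks too far from the expected one.

    `recorded` and `expected` are (latitude, longitude) tuples. The reported
    accuracy is subtracted so that imprecise readings are not flagged unfairly
    (an assumption of this sketch, as is the default tolerance).
    """
    distance = haversine_m(*recorded, *expected)
    return distance - accuracy_m > max_distance_m
```

A flag from a check like this is a reason to review the interview, for example through its audio recording, rather than proof of fraud, since respondents may legitimately be interviewed away from their listed address.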
Analyzing metadata
Apps such as SurveyCTO allow their users to store metadata from an interview, such as light and sound levels, or even the probability that a conversation actually happened during the survey.
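A minimal sketch of how such metadata might be screened is shown below in Python. The field names (`interview_id`, `conversation_prob`, `mean_sound_level`) and both thresholds are hypothetical and chosen only for illustration; the actual names and scales depend on the survey software and on how the data is exported.

```python
def flag_suspicious(interviews, min_conversation_prob=0.5, min_sound_level=20.0):
    """Return (interview_id, reasons) pairs for interviews worth reviewing.

    `interviews` is a list of dicts with hypothetical keys: "interview_id",
    "conversation_prob" (0 to 1) and "mean_sound_level". Thresholds are
    illustrative assumptions, not recommended values.
    """
    flagged = []
    for row in interviews:
        reasons = []
        if row["conversation_prob"] < min_conversation_prob:
            reasons.append("low conversation probability")
        if row["mean_sound_level"] < min_sound_level:
            reasons.append("unusually quiet")
        if reasons:
            flagged.append((row["interview_id"], reasons))
    return flagged


sample = [
    {"interview_id": "A1", "conversation_prob": 0.9, "mean_sound_level": 45.0},
    {"interview_id": "A2", "conversation_prob": 0.2, "mean_sound_level": 12.0},
]
flags = flag_suspicious(sample)
```

Interviews flagged in this way could then be prioritized for other checks on the respondent's identity.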
- Raise the probability of successful back-checks. J-PAL's story shows the difficulties in finding a respondent. If it is this hard to find them once, imagine doing it again for a back-check. The interview should capture as much contact information as possible, such as phone numbers of family members, an updated address, etc., to allow for a back-check. However, long interviews may also make respondents reluctant to repeat the process in a back-check, even if told that it will be a much shorter subset of questions. The power of persuasion of back-checkers is essential in these cases. If gifts are being distributed to respondents, this could also be structured so that, for interviews flagged for back-checks, the gift is only given after the back-check is completed, guaranteeing an incentive to take part in this second interview.
- Pilot and review the surveying procedure exhaustively to avoid errors and unexpected events. It is hard to predict all problems that may affect the data collection and, more specifically, the identification of respondents. An example was observed in several self-administered questionnaires with adolescents in schools: enumerators would identify the students, enter their identification in the first part of the questionnaire and then hand them the tablets to answer the rest of the survey. After the pilots, though, it became clear that some students would go back and change their identity information, answering the survey as one of their schoolmates. Having observed this in the pilot allowed the research team to add other measures to the data collection procedure, such as blocking changes in the first part of the survey or having enumerators check that the identity of each student is still the same at the beginning and the end of the survey.
- Have a clear agreement on the procedure and the penalties for wrong and fraudulent interviews. The process of finding and dealing with these interviews is delicate. Wrong interviews will need to be conducted again, demanding additional transportation costs, enumerator time and all related inputs. This can be met with resistance, both by enumerators and by the survey firm. Thus, it is important that, before starting the data collection, the whole team is aware of the problems that may arise, as well as their consequences. Fraudulent interviews may also result in contract disputes, either between the research team and the survey firm or between the firm and enumerators. It is important to cover these possibilities as much as possible in the contracts, preventing grey areas that may result in unsolvable conflicts and endanger the completion of the data collection.
- Provide continuous feedback to enumerators and contact them promptly when inconsistencies are found. Checking these inconsistencies promptly raises the probability that enumerators still remember the details of the interview, making it easier to understand what happened. Additionally, it also shows enumerators that their interviews are being thoroughly checked, incentivizing them to find the correct respondent and not falsify the interview.
- Ensure a reasonable balance between the amount of effort needed to find respondents and the compensation of enumerators. If the enumerator in J-PAL's story had a work contract that expected him to complete 5 interviews per day, how would he ever be able to spend that whole day tracking the respondent near the dice table? Having a clear understanding of the difficulties faced by the enumerator team in the tracking work allows for a realistic approach, and may help to increase the compensation of enumerators and to plan their work in a feasible manner. The research team usually has limited influence over this decision, but having a realistic budget and describing in detail in the Terms of Reference the specific demands to be expected of enumerators are steps that may prove valuable in guaranteeing the correct balance of incentives in the fieldwork.
- Assert the importance of the procedures for the research. During enumerator training, it is good to show enumerators the big picture of the impact evaluation, helping them understand why it is important to find the exact subjects in the sample. Sometimes an enumerator has had a long career in this area, where market research and other surveys may demand different sampling methodologies (such as stratifying subjects, so that they need to find a respondent matching characteristics A, B and C, but have the freedom to select who to interview, as long as these attributes are met). Understanding why our demands may be different is another step to ensure that they stick to this methodology.