Difference between revisions of "SurveyCTO Random Draw of Beneficiaries 1"

Latest revision as of 23:40, 20 July 2023

Best Practice

This page gives a detailed example of one way to perform randomization in SurveyCTO. If you would like an overview of the process, read the Randomization in SurveyCTO page first.

This example comes from an agriculture survey in Brazil, where the survey firm did a poor job in listing the members of each association. We are looking to survey 8 people in each association, but we cannot be sure that each person on the list is actually a valid member. The pools are quite large (>100) and most are assumed to be valid IDs, but we still need to be careful that there are enough IDs chosen for the list.

We approach the problem in 2 stages:

  1. Randomly select enough IDs from the total to be almost certain that 8 will be valid (here we choose 25).
  2. Have the enumerator validate that these IDs are in fact proper members of the association. If some are not, more IDs are pulled from the randomized list and validated until 8 are selected for participation.

Ideally, you should not need to select participants in this manner - you would like a sample frame from which to select a sample before your survey teams go to the field to administer the household survey. But in this case, it was not possible because of a problem with the listing process.

The approach works nicely for selecting a small number of people, but would be rather clunky if you are looking to select, say, 200 from 1,000 members. The main advantage over other methods is that you can easily preload the randomized numbers created in Stata or R to ensure replicability.

This approach can be altered or scaled for application in various situations. It's important to consider how many draws will all but guarantee that enough IDs remain once duplicates and invalid IDs have been dropped. If you are choosing from a smaller pool, or your IDs are more likely to be dropped for not being valid, you should increase the number of draws and the number of questions used to validate members. In this example we assume that we will be able to select 8 people from the 25 drawn IDs.
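One way to check whether a given number of draws is enough is to simulate it before fielding the form. The following is a minimal Python sketch, not part of the original form: the function name, the pool size of 100, and the assumed shares of valid IDs are all hypothetical values chosen for illustration. It draws IDs with replacement, de-duplicates them, and estimates how often at least 8 valid IDs remain.

```python
import random


def prob_enough_valid(pool_size, n_draws, n_needed, share_valid,
                      n_sims=10_000, seed=12345):
    """Estimate the chance that n_draws draws (with replacement) from a pool
    yield at least n_needed distinct valid IDs.

    Assumption: a fixed share of the pool is valid. Which IDs are valid does
    not matter for the estimate, so IDs 1..n_valid are treated as valid.
    """
    rng = random.Random(seed)
    n_valid = round(pool_size * share_valid)
    successes = 0
    for _ in range(n_sims):
        # A set de-duplicates the drawn IDs, as the form does in Stage 1.
        drawn = {rng.randint(1, pool_size) for _ in range(n_draws)}
        valid_drawn = sum(1 for member_id in drawn if member_id <= n_valid)
        if valid_drawn >= n_needed:
            successes += 1
    return successes / n_sims


if __name__ == "__main__":
    # Hypothetical scenarios: 25 draws from a pool of 100, 8 members needed.
    for share in (0.9, 0.7, 0.5):
        estimate = prob_enough_valid(pool_size=100, n_draws=25,
                                     n_needed=8, share_valid=share)
        print(f"share valid = {share}: P(enough IDs) ~ {estimate:.3f}")
```

If the estimated probability is too low for your pool size and expected share of valid IDs, increase the number of draws and rerun the check.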

Code Example

Here is the code example (https://drive.google.com/open?id=1f2SuYEmzMnmzFxsobMa2qae16-5LRLxnKCF7kP6x374) for a form that selects 8 members (IDs) from an association to participate in a survey round. The label column of the calculated fields also describes what each line does.

Stage 1 - Random draws

  • We first define 25 random numbers between 0 and 1. In practice, random numbers should be preloaded into SurveyCTO forms (see Randomization in SurveyCTO), which this example does not do.
  • Each of these 25 random numbers is scaled to select an ID between 1 and N, using the calculated field: round((${randX} * ${num}+.5), 0 )
  • We then concatenate all of these IDs so that the field ${randmems} has the structure of a select_multiple field, that is, a space-separated list of 25 numbers, e.g. "43 1 83 1 30 9 30 etc."
  • This list is then de-duplicated, so that we are left with a list of up to 25 unique IDs to verify, e.g. "43 1 83 30 9 etc." This is a list of randomly selected participants, and can be applied to many different contexts.
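The Stage 1 steps can be sketched outside the form to check the logic. This is an illustrative Python analogue, not the form's own code: the function names are hypothetical, and it assumes each random number lies in [0, 1) and that de-duplication keeps the first occurrence of each ID, as the example lists above suggest.

```python
import math


def scale_to_id(rand, num):
    """Mirror round(rand * num + 0.5, 0): map a number in [0, 1) to 1..num."""
    x = rand * num + 0.5
    # floor(x + 0.5) rounds halves up; Python's round() rounds halves to even.
    return math.floor(x + 0.5)


def dedupe_keep_order(ids):
    """Drop repeated IDs, keeping the first occurrence of each."""
    return list(dict.fromkeys(ids))


def draw_member_list(rands, num):
    """Return the raw and de-duplicated ID lists as space-separated strings."""
    ids = [scale_to_id(r, num) for r in rands]
    randmems = " ".join(str(i) for i in ids)
    randmems_dedup = " ".join(str(i) for i in dedupe_keep_order(ids))
    return randmems, randmems_dedup


if __name__ == "__main__":
    # Hypothetical random numbers chosen to reproduce the example list above.
    rands = [0.425, 0.005, 0.825, 0.005, 0.295, 0.085, 0.295]
    raw, dedup = draw_member_list(rands, num=100)
    print(raw)
    print(dedup)
```

Keeping both lists as space-separated strings matches the select_multiple structure, so later steps can pick items by position.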

Stage 2 - Verification of IDs

  • The fields ${idX} each pull one of these IDs, in order, from the list. Don't forget that the first item is indexed at '0', not '1'. These are then set up as choice names.
  • Then we ask the enumerator to verify whether each of the first 8 IDs is a valid association member. If they all are, the survey finishes. If not, the survey asks about further IDs in sets of 3 until there are at least 8 valid IDs.
  • The field ${idsX_missing} in each question holds the number of valid IDs still needed. When this reaches '0' or below, the form presents the final list.
  • Note that if none of the IDs in a question are valid, that question's ${idsX} field takes the value '0', which must be removed before concatenating the final list - this is performed in the calculated fields ${idsX_final}.
  • To prepare the final list, we need to take the index numbers from the verification stage (the choice values) and pull the associated member ID from the de-duplicated list of randomly selected IDs from the first stage. The calculated field: selected-at(${randmems_dedup}, selected-at(${concat_final}, X)-1) does this for each of the 8 final IDs.
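The Stage 2 logic can be traced with a small Python sketch. It is an approximation for illustration only: selected_at() is a hypothetical stand-in for SurveyCTO's zero-indexed selected-at(), is_valid stands in for the enumerator's answers, and the batching (8 IDs first, then sets of 3) follows the description above rather than the form itself.

```python
def selected_at(space_separated, index):
    """Stand-in for selected-at(): zero-indexed item of a space-separated list."""
    return space_separated.split()[index]


def verify_members(randmems_dedup, is_valid, n_needed=8,
                   first_batch=8, later_batch=3):
    """Return up to n_needed valid member IDs, in randomized order."""
    ids = randmems_dedup.split()
    answers = []   # one entry per question, like the ${idsX} fields
    position = 0   # how many IDs have been shown so far
    n_valid = 0
    batch = first_batch
    while n_valid < n_needed and position < len(ids):
        shown = range(position, min(position + batch, len(ids)))
        # Choice values are 1-based positions in the de-duplicated list.
        chosen = [str(i + 1) for i in shown if is_valid(ids[i])]
        # A question with no valid IDs is recorded as '0'.
        answers.append(" ".join(chosen) if chosen else "0")
        n_valid += len(chosen)
        position += batch
        batch = later_batch
    # Drop the '0' placeholders before concatenating, like ${idsX_final}.
    concat_final = " ".join(a for a in answers if a != "0")
    # Map each 1-based index back to its member ID (hence the "- 1").
    return [
        selected_at(randmems_dedup, int(selected_at(concat_final, x)) - 1)
        for x in range(min(n_needed, n_valid))
    ]


if __name__ == "__main__":
    # Hypothetical de-duplicated list; IDs 1 and 9 are treated as invalid.
    dedup = "43 1 83 30 9 12 55 7 61 18 90 3 27 70"
    print(verify_members(dedup, lambda member_id: member_id not in {"1", "9"}))
```

In this hypothetical run the first 8 IDs yield 6 valid members, so one further set of 3 is shown, and the first 8 valid IDs in randomized order form the final list.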

Back to Parent

This article is part of the topic SurveyCTO Coding Practices