SurveyCTO Additional Topics


This article explains and provides examples of miscellaneous topics in SurveyCTO programming that may be of some interest to users.

Read First

  • Gives concrete examples of randomization in SurveyCTO to complement the general overview of randomization in SurveyCTO
  • Also suggests methods to minimize coding mistakes due to copying, how to display multiple questions at once during a survey, and how to edit SurveyCTO HTML input
  • Briefly covers calculations, audio audits, and sensor data

Random Draw of Beneficiaries Example 1

This section gives a detailed example of one way to perform randomization in SurveyCTO. For an overview of the process, read the general page on randomization in SurveyCTO first.

This example comes from an agriculture survey in Brazil, where the survey firm did a poor job in listing the members of each association. We are looking to survey 8 people in each association, but we cannot be sure that each person on the list is actually a valid member. The pools are quite large (>100) and most are assumed to be valid IDs, but we still need to be careful that there are enough IDs chosen for the list.

We approach the problem in 2 stages:

  1. Randomly select enough IDs from the total to be almost certain that 8 will be valid (here we choose 25)
  2. Have the enumerator validate that these IDs are in fact proper members of the association. If not, more IDs are pulled from the randomized list to validate until 8 are selected for participation.

Ideally, you should not need to select participants in this manner. You would prefer to have a sample frame from which to select a sample before your survey teams go to the field to administer the household survey. But in this case, it was not possible because of a problem with the listing process.

The approach works nicely for selecting a smaller number of people, but would be rather clunky if you are looking to select, say, 200 from 1,000 members. The main advantage over other methods is that you can easily preload the randomized numbers created in Stata or R to ensure replicability.

This approach can be altered or scaled for application in various situations. It is important to choose a number of draws that will almost guarantee that enough IDs remain once duplicates and invalid IDs have been dropped. If you are choosing from a smaller pool, or your IDs are more likely to be dropped for not being valid, you should increase the number of draws and the number of questions used to validate members. In this example we assume that we will be able to select 8 people from the 25 drawn IDs.
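One way to check whether a given number of draws is enough is to simulate the draw before programming the form. The sketch below is only an illustration in Python, not part of the original form: the share of valid IDs (`valid_share`) is a hypothetical input that you would have to guess for your own context, and each ID is assumed to be valid independently of the others.

```python
import random


def chance_of_enough_valid_ids(pool_size, draws, valid_share, needed,
                               trials=2000, seed=1):
    """Estimate how often `draws` random draws leave at least `needed` valid IDs.

    valid_share is a hypothetical guess of the share of listed IDs that are
    real members; it is an assumption, not a value from the example survey.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is replicable
    successes = 0
    for _ in range(trials):
        # Draw with replacement; the set drops duplicate IDs.
        drawn = {rng.randint(1, pool_size) for _ in range(draws)}
        # Each remaining ID is valid with probability valid_share.
        valid = sum(1 for _ in drawn if rng.random() < valid_share)
        if valid >= needed:
            successes += 1
    return successes / trials


if __name__ == "__main__":
    # Pool of 100 IDs, 25 draws, 8 needed, assuming 80% of listed IDs are valid.
    print(chance_of_enough_valid_ids(100, 25, 0.8, 8))
```

If the estimated chance is not close to 1 for a pessimistic guess of `valid_share`, increase the number of draws.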

Code Example

Here is the code example for a form that selects 8 members (IDs) from an association to participate in a survey round. The label column of the calculated fields also denotes what actions are taking place on each line.

Stage 1 - Random draws

  • We first define 25 random numbers between 0 and 1. In practice, random numbers should be preloaded into SurveyCTO forms rather than generated inside the form as they are here, so that the draw is replicable.
  • Each of these 25 random numbers is scaled to select an ID between 1 and N, using the calculated field: round((${randX} * ${num}+.5), 0 )
  • We then concatenate all of these IDs so that the field ${randmems} has the structure of a select_multiple field and is a list of 25 numbers, e.g. "43, 1, 83, 1, 30, 9, 30, etc."
  • This list is then de-duplicated, so that we are left with a list of individual IDs of up to 25 IDs to verify from, e.g. "43, 1, 83, 30, 9, etc." This is a list of randomly selected participants, and can be applied to many different contexts.
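The logic of Stage 1 can be sketched outside SurveyCTO as follows. This is an illustrative Python version, not the form itself: the function names are made up for this sketch, and it assumes that the form's round() rounds halves upward.

```python
import math
import random


def scale_to_id(rand_value, pool_size):
    """Mirror round((rand * num + .5), 0): map a number in [0, 1) to an ID in 1..pool_size."""
    shifted = rand_value * pool_size + 0.5
    rounded = math.floor(shifted + 0.5)  # round half up (assumed rounding rule)
    return min(max(rounded, 1), pool_size)  # guard against edge cases such as rand = 1


def dedup_keep_order(ids):
    """Drop repeated IDs while keeping the order of first appearance."""
    return list(dict.fromkeys(ids))


def draw_candidate_ids(pool_size, draws=25, rng=None):
    """Return (all drawn IDs, de-duplicated IDs), like ${randmems} and its de-duplicated version."""
    rng = rng or random.Random()
    randmems = [scale_to_id(rng.random(), pool_size) for _ in range(draws)]
    return randmems, dedup_keep_order(randmems)


if __name__ == "__main__":
    randmems, randmems_dedup = draw_candidate_ids(100, rng=random.Random(2024))
    print(randmems)
    print(randmems_dedup)
```

Passing a seeded generator plays the same role as preloading random numbers: the same seed always gives the same list.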

Stage 2 - Verification of IDs

  • The field ${idX} pulls each of these IDs in order from the list. Don't forget that the first instance is indexed at '0', not '1'. These are then set up as choice names.
  • Then we ask the enumerator to verify whether each of the first 8 IDs is a valid association member. If all 8 are valid, the survey finishes. If not, the survey asks about sets of 3 further IDs until there are at least 8 valid IDs.
  • The field ${idsX_missing} in each question takes the number of IDs that still need to be validated. When this reaches '0' or below, the form presents the final list.
  • Note that if no IDs are valid, the ${idsX} field takes the value of '0', which must be removed before concatenating the final list. This is performed in the calculated field ${idsX_final}.
  • To prepare the final list, we need to take the index numbers from the verification stage (the choice values) and pull the associated member ID from the de-duplicated randomly selected ID list from the first stage. The calculated field: selected-at(${randmems_dedup}, selected-at(${concat_final}, X)-1) does this for each of the 8 final IDs.
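The verification flow of Stage 2 can be sketched in the same way. This Python version is an assumption-laden illustration rather than a translation of the form: `is_valid` stands in for the enumerator's answers, and the function name and parameters are hypothetical.

```python
def select_valid_ids(randmems_dedup, is_valid, needed=8, first_batch=8, batch_size=3):
    """Check candidates in batches until `needed` IDs are confirmed as valid.

    randmems_dedup: de-duplicated list of randomly drawn IDs (Stage 1 output).
    is_valid: function standing in for the enumerator's verification.
    """
    confirmed_positions = []  # 1-based positions, like the choice values in the form
    start = 0
    batch = first_batch
    while len(confirmed_positions) < needed and start < len(randmems_dedup):
        end = min(start + batch, len(randmems_dedup))
        for pos in range(start, end):
            if is_valid(randmems_dedup[pos]):
                confirmed_positions.append(pos + 1)
        start = end
        batch = batch_size  # after the first 8, ask about 3 more at a time
    # Same idea as selected-at(list, choice_value - 1): positions are 1-based,
    # list lookups are 0-based.
    return [randmems_dedup[p - 1] for p in confirmed_positions[:needed]]


if __name__ == "__main__":
    candidates = [43, 1, 83, 30, 9, 12, 55, 7, 61, 20, 4, 18]
    not_members = {83, 7}  # made-up verification results
    print(select_valid_ids(candidates, lambda i: i not in not_members))
```

If the candidate list runs out before enough IDs are confirmed, the sketch returns fewer than `needed` IDs, which is the situation the choice of 25 draws is meant to avoid.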

Random Draw of Beneficiaries Example 2

In this example, we describe an elegant way of randomly selecting a group of IDs (beneficiaries in the example) using only a small repeat group in which a list of IDs 1 to N is randomly ordered. It can be used in many situations and can be scaled much more easily than the example in the prior section. For an overview of randomization in SurveyCTO, see the general page on that topic.

It is NOT recommended for use in data collection forms as it is not replicable in this format, though it is a nice illustration of how to think about programming solutions to this kind of problem in general.

Code Example

Here is the code example for this form.

  • First we select how many IDs (in this case association members) we want to randomly order.
  • The repeat group is 'powered' through the dynamic repeat-count, (insert code here) which continues producing randomly selected IDs until the condition count-selected(${unique_ids})>=${num} is fulfilled - i.e. there are the same number of unique draws (deduplicated previous draws) as there are numbers to draw from, implying that all potential IDs to draw from have been exhausted. Deduplication of the draws ensures that all potential IDs are covered.
  • The second repeat group shows the draw unfold, and would not be necessary if this were to be used in a form.
  • The field ${unique_ids} contains the randomly ordered IDs, which can be accessed using a selected-at() command as shown in the other randomization example.

It could also be amended to only select a certain number of draws from the entire pool by changing the expression in the repeat-count of the repeat group to be equal to a specific or predefined (in another integer field) value.
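The idea behind the repeat group can be sketched as a simple loop. The Python below is an illustration only and does not reproduce the form's repeat-count expression; the `target` parameter is a hypothetical stand-in for stopping early at a predefined number of unique IDs.

```python
import random


def random_order(num, target=None, rng=None):
    """Keep drawing IDs from 1..num until `target` unique IDs have been collected.

    With target=None the loop runs until every ID has appeared once, which
    mirrors the stopping condition count-selected(${unique_ids}) >= ${num}.
    Returns the unique IDs in order of first appearance and the number of draws.
    """
    rng = rng or random.Random()
    target = num if target is None else min(target, num)
    unique_ids = []
    draws = 0
    while len(unique_ids) < target:
        draw = rng.randint(1, num)
        draws += 1
        if draw not in unique_ids:  # de-duplicate: keep first appearances only
            unique_ids.append(draw)
    return unique_ids, draws


if __name__ == "__main__":
    ordered, draws = random_order(10, rng=random.Random(3))
    print(ordered, draws)
```

The number of draws needed grows faster than the pool size, because late draws mostly repeat IDs that have already appeared.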

Relevance Condition to Multiple Fields

There are ways to apply the same relevance code to multiple fields without copying it. Copying the same code to a couple of fields can be acceptable depending on personal preference, but copying to 5 or more fields should always be avoided as it increases the likelihood of making programming mistakes.

It is recommended to use a begin group field to apply the same relevance code to all the fields inside the group. The relevance code is then written only once, on the begin group field, and does not need to be repeated for each field that requires it. A questionnaire coded like this will be much easier to update during the development phase, or during field work when time is short and many other tasks are keeping the project team preoccupied.

In general, using begin group fields in a form avoids repetition of code and makes the whole survey easier to read and understand.

Code example

Here is a code example that shows how a group is used so that the relevance condition does not have to be applied to each field inside the group.

Note that there is no problem with having a field be required which is not always shown: required means that if the field is displayed then data needs to be recorded for this field before the enumerator is allowed to swipe past this question.
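The interaction between group relevance and required fields can be sketched as follows. This Python snippet is only a simplified model of the behavior described above, not SurveyCTO's actual implementation, and the field names are made up.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    required: bool = False


def visible_fields(group_relevant, fields):
    """All fields in a group share the group's relevance: shown together or not at all."""
    return list(fields) if group_relevant else []


def missing_required(group_relevant, fields, answers):
    """Required only applies to fields that are actually displayed."""
    return [
        f.name
        for f in visible_fields(group_relevant, fields)
        if f.required and answers.get(f.name) in (None, "")
    ]


if __name__ == "__main__":
    crop_group = [Field("crop_name", required=True), Field("crop_area", required=True)]
    # Group hidden: nothing is required, even though both fields are marked required.
    print(missing_required(False, crop_group, {}))
    # Group shown: unanswered required fields block progress.
    print(missing_required(True, crop_group, {"crop_name": "cabbage"}))
```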

Displaying Multiple Questions at the Same Time

Sometimes a set of questions should be displayed at the same time on the tablet during the interview, without swiping. This is useful if the answer to a question has two parts, for example, a question about distance where both an amount and the unit of that amount are asked. Other examples, in addition to distance, are time duration (in minutes, hours, days...), land area (in acres, hectares...), and salary (per week, per month, per year...). The respondent will have to choose between these units of measurement after stating the amount. Best practice is to enter the number (amount, price, quantity...) and select the unit on the same screen for easier reading. In these cases, data is collected with less error if both amount and unit can be recorded on the same screen. See link to an example in the next section.

Another reason to display multiple questions on one screen is that a set of yes/no questions flows better if the enumerator does not have to swipe and re-read the full question for each item. An example could be if we are asking about a household's assets and we ask, "Does your household own any of the following assets?". See link to an example in the next section.

Coding Example

Here is a code example that shows examples of both cases mentioned above.

Note that both answer options, meter and kilometer, are provided as distance units despite the conversion between the two being simple. We always want to record the answer the way the respondent answered the question and we should never ask the enumerator to do arithmetic in the field. Later, we can convert all the units in the dataset to standardize quantity, distance, and time measurements across observations.
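The later standardization step can be as simple as the following sketch. It is written in Python purely for illustration; the unit labels and the handling of unlisted units are assumptions, and the same logic can be written in Stata or R.

```python
# Conversion factors to a common unit (meters). Labels are hypothetical choice names.
METERS_PER_UNIT = {"meter": 1.0, "kilometer": 1000.0}


def to_meters(amount, unit):
    """Convert a recorded (amount, unit) pair to meters.

    Returns None for units without a known factor (for example an
    "other, specify" answer), so they can be reviewed by hand.
    """
    factor = METERS_PER_UNIT.get(unit)
    if factor is None:
        return None
    return amount * factor


if __name__ == "__main__":
    answers = [(300, "meter"), (2.5, "kilometer"), (3, "other")]
    print([to_meters(amount, unit) for amount, unit in answers])
```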

An "other specify" option can be selected in the list of units of measurement if the respondent's answer is not available on the list.

HTML Input

This section discusses best practices when editing SurveyCTO HTML Input and provides useful commands. For example, in some surveys it can be useful to highlight different parts of text in bold or a different color, either for a set of repeating questions where the text changes slightly or for a block of text that is particularly important.

Best Practices

  • It is probably better to display repeating questions such as "How much cabbage did you cultivate on plot 3 during the rainy season?" with the parts that change between repetitions (here the crop, the plot number, and the season) highlighted, for example in bold.
  • Enumerator instructions can be highlighted in different colors, e.g. " * IMPORTANT * Don't forget to..."
  • Note that these commands can only be used for labels and hints, not for choice options.


The following commands should enclose the text you want to highlight:

  • Bold: <b> text here </b>
  • Italic: <i> text here </i>
  • Colors: <font color="red"> text here </font>
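When many labels need the same highlighting, it can help to generate them rather than type the tags by hand. The sketch below uses Python only as an illustration; the helper names are made up, and the choice of which words to highlight is an assumption.

```python
def bold(text):
    """Wrap text in the bold tags listed above."""
    return f"<b>{text}</b>"


def colored(text, color):
    """Wrap text in the font color tag listed above."""
    return f'<font color="{color}">{text}</font>'


def plot_question(crop, plot, season):
    """Build a repeating question label with the changing parts in bold."""
    return (
        f"How much {bold(crop)} did you cultivate on plot {bold(plot)} "
        f"during the {bold(season)} season?"
    )


if __name__ == "__main__":
    print(plot_question("cabbage", 3, "rainy"))
    print(colored("* IMPORTANT *", "red") + " Don't forget to...")
```

The resulting strings can be pasted into the label or hint column of the form definition.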

More examples can be found here

Integrating Calculations

SurveyCTO has developed a best practices guide for using calculations, which helps with the design of smarter surveys. For example, you can use calculations to find out how long it takes respondents to reach a certain point in the survey, to monitor your respondents’ observance of suggested response times for skill assessments, or to ensure that PII doesn’t get captured in your data analysis. This guide provides tips and examples for using calculations in SurveyCTO.

Conducting Audio Audits

SurveyCTO supports random audio audits as a part of the survey meta-data. Audio audits are audio recordings that occur during an interview without an indication that the recording has been initiated. They are one of several tools that research teams can use to ensure that they are collecting the highest possible quality of data. They also provide a cost-effective way for research teams to better understand how enumerators are conducting surveys in the field. You can learn more about best practices, logistical and ethical considerations of audio audits in this SurveyCTO article.

Collecting Sensor Data

SurveyCTO can collect sensor meta-data using built-in Android device sensors. Android devices can come with a number of sensors beyond GPS, including an accelerometer, gyroscope, light sensor, and microphone, among others. The sensor data field types in SurveyCTO use these sensors to capture data during the survey that can give users an idea of:

  • Light conditions around the device
  • How much the device moved
  • How loud the sounds were around the device
  • The tone of the sounds around the device
  • An estimate of whether a conversation was taking place around the device

SurveyCTO has built Stata commands to help users easily analyze large volumes of sensor data streams. Sensor streams can be time-consuming to work with because for every submission, a sensor stream records a stream of observations (potentially thousands) and stores it as an additional .csv file attached to the submission. You can learn about these commands in this article and you can access the scto package here. Visit this SurveyCTO help article to learn more about sensor data.
