Difference between revisions of "SurveyCTO Form Settings"

(Undo revision 5383 by Ppaskov (talk))
(One intermediate revision by the same user not shown)
Thousands of users in more than 150 countries depend on SurveyCTO to conduct [[Computer-Assisted Personal Interviews (CAPI) | computer-assisted personal interviews (CAPI)]].  This article discusses common approaches to sophisticated [[Questionnaire Design | design]] and [[Questionnaire Programming | programming]] in SurveyCTO. For a general introduction to how to structure your approach to CAPI programming, see [[Questionnaire Programming]].
In the settings tab of a [[Computer-Assisted Personal Interviews (CAPI) | SurveyCTO]] [[Questionnaire Design | questionnaire]], there are a handful of key settings: Questionnaire Name, ID, Version,  [[Encryption]], and Language Settings. These settings should be specified when creating and updating surveys. This page outlines the best practices for each setting.  
== Read First ==
*Add labels to the SurveyCTO form to speed up data import and [[Data Cleaning |cleaning]].
*The questionnaire name, questionnaire ID and questionnaire version ensure that the different questionnaires and the datasets linked to them remain separate.
*Repeat groups/rosters for households, crops, activities, or otherwise can be filtered based on previous choices, and more.
*Each questionnaire ID is associated both with a questionnaire and a dataset. If you change the questionnaire ID, which you should do when you make changes to a questionnaire, you change the file to which the submitted data is saved.
*Use question groups to apply a relevant condition to multiple fields, display multiple questions on the same screen, or frame all questions on a module in a group.
*If you upload a completely different survey with a different questionnaire name and file name but with the same ID as a questionnaire already on the server, then the questionnaire already on the server will be replaced by the new one. See [[SurveyCTO Form Settings#Version control of questionnaires already used in the field|version control of questionnaires already used in the field]].
*Employ dynamically populated choice lists to list choices contingent upon previous responses.  
*Encryption protects the privacy of the data. SurveyCTO servers are installed on third-party cloud servers; encryption helps ensure that the data does not end up in the wrong hands in the case of a security breach.
*Gather additional data via audio audits and sensor data to improve data quality, monitoring, and precision.
== Name and ID==
==Best Practices==
It is common to set both the name and the ID to the same value. While the name value allows spaces, the ID does not; instead, use underscores (e.g. KBMAP_BL_HH_v1). The ID is a particularly important value: it dictates to which dataset a questionnaire's data is saved and what the dataset is called upon export. The questionnaire name also carries some functionality. For example, if the name starts with the word test, the form will not be visible to enumerators. The more recently added "deploy" feature provides the same functionality.
=== Labelling ===
Follow a strict structure when naming your questionnaire and giving it an ID. Otherwise, the name and ID may create confusion for the research team down the road. One recommended way to generate the questionnaire name and ID follows:
To speed up data import, all SurveyCTO surveys should have a language labelling column in both the questionnaire and the value labeling tab called "label:stata." This will be used to download and process the data. These labels should be in English, no longer than 32 characters, and use no special characters. The research assistant responsible for data management helps to prepare this. For detailed instructions on how to use multiple languages on SurveyCTO, go to your SurveyCTO server, open up the documentation pages and search for "Translating a form into multiple languages".
{| class="wikitable"  style="margin-left: auto; margin-right: auto; border: none;"
=== Randomizing ===
|-
In the field, the [[Randomization in SurveyCTO | best practice when randomizing anything]] is to prepare randomization before the field activities begin – ideally in [[Randomization in Stata | Stata]] and preload the assignments into the survey. Ways to randomly select survey participants in SurveyCTO include:
! Meaning
*[[SurveyCTO Random Draw of Beneficiaries 1|Randomly drawing beneficiaries from a large pool]] without knowing if the potential beneficiaries are valid participants: this form randomly prioritizes participation over a group of [[ID Variable Properties | IDs]], which are then verified by the enumerator until a final group of 8 participants are registered.
! Part of name
*[[SurveyCTO Random Draw of Beneficiaries 2|Randomly drawing of any number of beneficiaries using repeat group]]: this form randomly prioritizes a group of IDs using an elegant and concise repeat group solution. However, this is not recommended for use in the field as it's not [[Reproducible Research | replicable]] without adaptation.
! Description
=== Managing Repeat Groups / Rosters ===
! Other examples
This section lists code examples that fulfill special requirements related to rosters and repeat groups. These can be used to develop interesting functionalities within forms, particularly with responses from a household, plot or crop roster. Here are some examples:
|-
* [[SurveyCTO Repeat Group Using Previous Choices|Setting Up Repeat Group Using Previous Choices]]: this form shows two main ways of coding to repeat a set of questions over previously selected responses (i.e. a set of crops cultivated or activities performed).  
|colspan="4" style="text-align:center;"|Example survey name : KBMAP_BL_HH_v1
* [[SurveyCTO Select from Roster Age Order|Select Member in Roster Based on Criteria]]: in this example we have a roster of children, and we want the respondent to select the youngest child if that child's mother is present; if she is not present, the respondent is asked to select the second youngest child if that child's mother is present, and so forth.
|-
*  [[SurveyCTO Conditional Filtering|Filtering on Conditions of Repeat Group Questions]]:  this form utilizes responses found inside a repeat group roster as conditions upon which to filter choices for questions further down in a form.
|Project
* [[SurveyCTO Filtering in Repeated Questions|Filtering in Repeated Choice Questions]] - this form shows how to code a repeating question where the list of choices is reduced if an option was previously selected.
| KBMAP
* [[SurveyCTO Dealing with 'Other' Crops Over Different Repeat Group Levels |Dealing with 'Other' Crops Over Different Repeat Group Levels]]: this form presents a solution for introducing new crops in different repeats and being able to recall them at other points in the survey. In general, many challenges arise when coding agriculture sections of household surveys. There is a lot of data to capture at different and changing levels: per season, per plot, per crop, etc. Sometimes you might want to change the level of questions from crop within plot within season to, for example, just the crop level. It's important that the respondents are able to recall harvest and sales information as accurately as possible, therefore we must structure surveys well to account for this. This form presents useful information for doing so.
| The name of the project, use an abbreviation or other short form.
=== Using Question Groups ===
A server is usually linked to only one project, so this part does not matter as much on the server. But all data associated with this questionnaire will be tagged with this name, and there the project name is indeed useful.
Use a lot of question groups but do not overuse them. In general, use question groups to:
| Any project name. Do not use survey round (baseline, follow up, back check etc.) or unit of observation (household, village, school etc.)
*[[Relevance Condition to Multiple Fields|Apply a relevance condition to multiple fields]].
|-
*[[Multiple Questions Displayed at the Same Time |Display multiple questions on the same screen]].
| Survey round
* Frame all the questions on a module in a group: only do this at the highest level of the survey (i.e. do not use this for sub-levels of a module).
| BL
=== Applying Choice Lists ===
| Indicates which survey this is within the project.
Choice lists are the answer options from which an enumerator chooses in a ''select_one'' or ''select_multiple'' question. They are listed in the choices tab in the SurveyCTO questionnaire. Open Data Kit (ODK), the form standard on which SurveyCTO is built, places very few restrictions on how you can code your options. However, there are [[SurveyCTO Choice Lists|choice list best practices]] that matter for data quality:
Since a difference in name here indicates a different dataset on the server, then we want to differentiate between baseline and baseline pilot, as we do not want to mix pilot data with real data.
* [[SurveyCTO Dynamically Populated Choice Lists|Dynamically Populated Choice Lists - basic]]: it is possible to program dynamically populated choice lists using answers given by the respondents in a previous question.
| Baseline (BL), Baseline pilot (BLpilot), Endline (EL), monitoring data, population listing etc.
* [[SurveyCTO Dynamically Populated Choice Lists From Select One|Dynamically Populated Choice Lists - from repeated select_one]]: a specific example of dynamically populated choice list is when you populate a ''select_multiple'' question with answers from a ''select_one'' asked in a repeat group. For example, say that you list crops grown in a repeat group where each repeat is a crop, and later you want to be able to ask "''which crop did you grow the most?''" and only the crops already selected in the repeat group should display.  
|-
=== Integrating Calculations===
| Unit of observation
SurveyCTO has developed a great best practices guide for using calculations, which help with the design of smarter surveys. For example, you can use calculations to find out how long it takes respondents to reach a certain point in your survey, to monitor your respondents’ observance of suggested response times for skill assessments or to ensure that [[Personally Identifiable Information (PII) | PII]] doesn’t get captured in your [[Data Analysis | data analysis]]. [https://www.surveycto.com/blog/using-calculations/ This guide] provides tips and examples for using calculations in SurveyCTO.
| HH
=== Conducting Audio Audits===
| What type of respondents will be interviewed? Household, school, etc. It is both needed to separate multiple survey within a survey round (doctor/patient, teacher/school), but it is also helpful documentation to anyone using the dataset in the future.
SurveyCTO supports random audio audits as a part of the survey meta-data. Audio audits are audio recordings that occur during an interview without an indication that the recording has been initiated. They are one of several tools that research teams can use to ensure that they are collecting the highest possible [[Data Quality Assurance Plan | quality of data]]. They also provide a cost-effective way for research teams to better understand how enumerators are conducting surveys in the field. You can learn more about best practices, logistical and [[Research Ethics | ethical]] considerations of audio audits in [https://www.surveycto.com/blog/audio-audits-best-practices/ this SurveyCTO article]. 
| Household (HH), village, civil servant, etc.
=== Collecting Sensor Data===
|-
SurveyCTO can collect sensor meta-data using built-in Android device sensors. Android devices can come with a number of sensors beyond GPS including an accelerometer, gyroscope, light sensor, microphone, among others. The sensor data field types on SurveyCTO use these sensors to capture data during the survey that can provide users with an idea of:
| Version
*The light conditions around the device.
*How much the device moved.
*How loud the sounds were around the device.
*The pitch of the sounds around the device.
*An estimate of whether a conversation was taking place around the device.
SurveyCTO has built Stata commands to help users easily analyze large volumes of sensor data streams. Sensor streams can be time-consuming to work with because for every submission, a sensor stream records a stream of observations (potentially thousands) and stores it as an additional .csv file attached to the submission. You can learn about these commands in [https://www.surveycto.com/blog/stata-sensor-data/ this article] and you can access the scto package [https://github.com/surveycto/scto here]. Visit [https://support.surveycto.com/hc/en-us/articles/360009552713-Using-sensor-meta-data?flash_digest=bc582e369a0560a8424218ec245209eefb41bfe4 this SurveyCTO help article] to learn more about sensor data.
===Updating Roster from Previous Rounds on Tablet During Interview===
===Confirming IDs and Identification===
===Assigning IDs in the Field===


| v1
| This is different from the setting version. When you are developing a questionnaire you use the version setting to indicate to the server that you have made updates. After you are happy with your updates (that usually takes testing several uploads) you update the version number in the name, and download this new version to the tablets. The reason why this is important is explained below (KB: link to that section).
| Number them in natural order, v1, v2 etc.
|}
Your SurveyCTO subscription level determines the number of questionnaires (counting each unique ID as one questionnaire) that you can store on your server. If you run out of space,  you can either upgrade your server or download questionnaires no longer in use and all the data associated with them. After ensuring that you have all data, you can delete the questionnaires you are no longer using to make space for a new questionnaire.
=== Naming and Versions===
When you update the questionnaire ID, all new data will be saved in a new dataset on the server with a reference to the new ID. If you have started to collect data with one questionnaire, but for some reason have to make an update to your questionnaire, then you should save the form with a new ID. The new questionnaire is likely to have different variables and different variable types for existing variables. Hence the data this questionnaire generates will be different. If you do not change the questionnaire ID, the server will save the new and slightly different dataset to the old dataset.
SurveyCTO's servers allow users to save data collected using an updated version of the questionnaire to the same dataset as for the previous version. While this feature does a fine job consolidating data, researchers can do an even better job with Stata. If you have already started to collect data and then make an update to the questionnaire, [[Naming Conventions | change the name]] from KBMAP_BL_HH_v1 to KBMAP_BL_HH_v2. You will end up with two datasets and two different import do files. After you have imported the datasets separately to Stata, you can edit the files as necessary – [[Reproducible Research | reproducibly]] via do files – in order to append them without any complications.
Before appending the different versions, create a variable called "questionnaireVersion" or similar. In this variable, store the version number of the questionnaire. You can then include fixed effects based on this variable in the [[Data Analysis | regressions]] to adjust for any bias introduced from collecting data with slightly different questionnaires.
===Naming Excel Files===
When uploading questionnaires written in Excel to SurveyCTO Server, the Excel file name is irrelevant to how your questionnaire is tracked on SurveyCTO Server and on tablets. While it is good practice to follow [[Naming Conventions | naming conventions]] for the Excel files saved on your computer, there is no technical requirement to do so.
===Naming Pilot SurveyCTO Forms===
When you begin the [[Survey Pilot | pilot]] questionnaire, indicate in both the questionnaire name and ID that it is a development version of the form; the best way to do so is by including the word ''pilot''. Keep ''pilot'' in the name until you have completed collecting pilot data. After removing ''pilot'' for the final version, there is little risk that someone will use the pilot form to collect the final data. Follow these protocols even if you do not make any changes between the pilot and the real data collection. After all, if you do not change the name, then the server will save the real data to the same dataset as the pilot data.
== Version ==
If two questionnaires have the same ID, then they must have different version settings. This lets the server confirm that you are replacing an outdated questionnaire on the server with an updated one. A simple formula converts the date and time to a unique version number; there is no good reason not to use it every time. See this [https://docs.google.com/spreadsheets/d/1UBJBE4YWk1rXKCQRTWSpRtHMzAdoA9J_-cyh0XyJwaA/edit?usp=sharing settings example] for more information.
== Encryption ==
All data collected should be [[Encryption|encrypted]]. This [[Research Ethics | protects the privacy]] of respondents in case, for example, tablets are stolen or network security is breached. Encryption is easy and in line with the privacy promises made during [[Informed Consent | informed consent]].
While data transferred from the tablet to SurveyCTO servers over the internet are automatically encrypted using SSL, that only protects the data in transit, so it is still necessary to encrypt the data itself. This protects the data from anyone who might get into the servers or get hold of the survey tablets. You can set data to be encrypted in [[SurveyCTO_Programming_Work_Flow#Template_Forms|template forms]]. Do so before data collection begins.
For more detailed instructions on how to encrypt your data, go to your SurveyCTO server, open up the documentation pages and search for "How do I encrypt my data?"
== Language ==
SurveyCTO’s language options are useful for questionnaires written in multiple languages. In the Excel file, all columns used to create text or media (i.e. label, hint, constraint message, relevance message, media:image, media:audio, and media:video) can be defined for different languages. For example, to define which label column is English and which is French, name the columns ''label:English'' and ''label:French''. Then, in the app on the tablet, the enumerator can go to <code>change language</code> and select English or French (or whatever language name follows "label:").
It is best practice to explicitly indicate the language of each column. However, it is also possible to name the label column of one language ''label'' and the others, for example, ''label:French'' and ''label:Wolof''. The language used in the ''label'' column, where no language is defined, is considered the default language. In this case, the language column in the settings tab must be used to define what that language is called. Otherwise, the <code>change language</code> option on the tablet will not display properly.
For detailed instructions on how to use multiple languages on SurveyCTO, go to your SurveyCTO server, open up the documentation pages and search for "Translating a form into multiple languages".
== Back to parent topic ==
This article is a part of the topic [[Questionnaire Programming]].
== Additional Resources ==
*IFPRI’s [https://www.surveycto.com/best-practices/pro-tips-for-agricultural-surveys-from-ifpri/ tips on coding complex agricultural surveys in SurveyCTO]
*DIME Analytics' [https://github.com/worldbank/DIME-Resources/blob/master/survey-cto.pdf SurveyCTO] slides
*See resources on question font formatting in HTML [[SurveyCTO HTML Input|here]]: SurveyCTO accepts HTML commands in the text of questions. This can be used to <font color="red"> highlight </font> and '''emphasize''' key information, among other uses.
[[Category: Questionnaire Programming]]
 
[[Category: SurveyCTO Coding Practices ]]

Revision as of 22:04, 9 June 2019

In the settings tab of a SurveyCTO questionnaire, there are a handful of key settings: Questionnaire Name, ID, Version,  Encryption, and Language Settings. These settings should be specified when creating and updating surveys. This page outlines the best practices for each setting.  

Read First

  • The questionnaire name, questionnaire ID and questionnaire version ensure that the different questionnaires and the datasets linked to them remain separate.
  • Each questionnaire ID is associated both with a questionnaire and a dataset. If you change the questionnaire ID, which you should do when you make changes to a questionnaire, you change the file to which the submitted data is saved.
  • If you upload a completely different survey with a different questionnaire name and file name but with the same ID as a questionnaire already on the server, then the questionnaire already on the server will be replaced by the new one. See version control of questionnaires already used in the field.
  • Encryption protects the privacy of the data. SurveyCTO servers are installed on third-party cloud servers; encryption helps ensure that the data does not end up in the wrong hands in the case of a security breach.

Name and ID

It is common to set both the name and the ID to the same value. While the name value allows spaces, the ID does not; instead, use underscores (e.g. KBMAP_BL_HH_v1). The ID is a particularly important value: it dictates to which dataset a questionnaire's data is saved and what the dataset is called upon export. The questionnaire name also carries some functionality. For example, if the name starts with the word test, the form will not be visible to enumerators. The more recently added "deploy" feature provides the same functionality.

Follow a strict structure when naming your questionnaire and giving it an ID. Otherwise, the name and ID may create confusion for the research team down the road. One recommended way to generate the questionnaire name and ID follows:

The parts of the example survey name KBMAP_BL_HH_v1 are listed below. Each entry gives the meaning of the part, the part as it appears in the name (in parentheses), a description, and other examples.

  • Project (KBMAP): The name of the project; use an abbreviation or other short form. A server is usually linked to only one project, so this part matters less on the server. However, all data associated with this questionnaire will be tagged with this name, and there the project name is indeed useful. Other examples: any project name. Do not use the survey round (baseline, follow-up, back check, etc.) or the unit of observation (household, village, school, etc.).
  • Survey round (BL): Indicates which survey this is within the project. Because a difference in name here means a different dataset on the server, differentiate between baseline and baseline pilot so that pilot data is not mixed with real data. Other examples: Baseline (BL), Baseline pilot (BLpilot), Endline (EL), monitoring data, population listing, etc.
  • Unit of observation (HH): The type of respondent interviewed: household, school, etc. This is needed to separate multiple surveys within a survey round (doctor/patient, teacher/school), and it is also helpful documentation for anyone using the dataset in the future. Other examples: Household (HH), village, civil servant, etc.
  • Version (v1): This is different from the version setting. While developing a questionnaire, you use the version setting to tell the server that you have made updates. Once you are happy with your updates (which usually takes several test uploads), update the version number in the name and download this new version to the tablets. The reason this matters is explained under "Naming and Versions" below. Other examples: number versions in natural order (v1, v2, etc.).
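
The naming scheme above can be sketched in a few lines of Python. This is only an illustration: the helper build_form_id and its rule that each part contains letters and digits only are assumptions made for this example, not part of SurveyCTO.

```python
import re


def build_form_id(project, survey_round, unit, version):
    """Join the four name parts into an ID such as KBMAP_BL_HH_v1.

    Hypothetical helper for illustration; not a SurveyCTO function.
    """
    parts = [project, survey_round, unit, f"v{version}"]
    for part in parts:
        # Assumed rule: letters and digits only inside each part, so the
        # underscore is the sole separator and the ID contains no spaces.
        if not re.fullmatch(r"[A-Za-z0-9]+", part):
            raise ValueError(f"invalid name part: {part!r}")
    return "_".join(parts)


form_id = build_form_id("KBMAP", "BL", "HH", 1)
pilot_id = build_form_id("KBMAP", "BLpilot", "HH", 1)
print(form_id)   # KBMAP_BL_HH_v1
print(pilot_id)  # KBMAP_BLpilot_HH_v1
```

Because the pilot and the baseline produce different IDs, their submissions would land in different datasets on the server.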

Your SurveyCTO subscription level determines the number of questionnaires (counting each unique ID as one questionnaire) that you can store on your server. If you run out of space, you can either upgrade your server or download questionnaires no longer in use and all the data associated with them. After ensuring that you have all data, you can delete the questionnaires you are no longer using to make space for a new questionnaire.

Naming and Versions

When you update the questionnaire ID, all new data is saved in a new dataset on the server with a reference to the new ID. If you have started to collect data with one questionnaire but then have to update it, save the form with a new ID. The new questionnaire is likely to have different variables, or different variable types for existing variables, so the data it generates will differ. If you do not change the questionnaire ID, the server will save the new and slightly different data to the old dataset.

SurveyCTO's servers allow users to save data collected using an updated version of the questionnaire to the same dataset as the previous version. While this feature does a fine job of consolidating data, researchers can do an even better job with Stata. If you have already started to collect data and then update the questionnaire, change the name from KBMAP_BL_HH_v1 to KBMAP_BL_HH_v2. You will end up with two datasets and two different import do files. After importing the datasets separately into Stata, you can edit them as necessary (reproducibly, via do files) in order to append them without complications.

Before appending the different versions, create a variable called "questionnaireVersion" or similar, and store the version number of the questionnaire in it. You can then include fixed effects based on this variable in the regressions to adjust for any bias introduced by collecting data with slightly different questionnaires.
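
The tag-then-append step can be sketched as follows. The article recommends doing this in Stata via do files; the Python below is a stand-in to show the logic only, and the helpers tag_version and append_versions, the sample rows, and the "plot_size" question are all made up for this example.

```python
def tag_version(rows, version):
    """Return copies of the rows with a questionnaireVersion field added."""
    return [{**row, "questionnaireVersion": version} for row in rows]


def append_versions(*tagged):
    """Stack datasets; fields missing from a version are filled with None."""
    fields = []
    for rows in tagged:
        for row in rows:
            for key in row:
                if key not in fields:
                    fields.append(key)
    return [{f: row.get(f) for f in fields} for rows in tagged for row in rows]


# Made-up submissions: v2 added a hypothetical "plot_size" question.
v1 = [{"hhid": "101", "crop": "maize"}]
v2 = [{"hhid": "102", "crop": "beans", "plot_size": "0.5"}]

combined = append_versions(tag_version(v1, 1), tag_version(v2, 2))
for row in combined:
    print(row)
```

Tagging each dataset before appending keeps the origin of every observation recoverable, which is what makes the version fixed effects possible later.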

Naming Excel Files

When uploading questionnaires written in Excel to SurveyCTO Server, the Excel file name is irrelevant to how your questionnaire is tracked on SurveyCTO Server and on tablets. While it is good practice to follow naming conventions for the Excel files saved on your computer, there is no technical requirement to do so.

Naming Pilot SurveyCTO Forms

When you begin the pilot questionnaire, indicate in both the questionnaire name and ID that it is a development version of the form; the best way to do so is by including the word pilot. Keep pilot in the name until you have completed collecting pilot data. After removing pilot for the final version, there is little risk that someone will use the pilot form to collect the final data. Follow these protocols even if you do not make any changes between the pilot and the real data collection. After all, if you do not change the name, then the server will save the real data to the same dataset as the pilot data.

Version

If two questionnaires have the same ID, then they must have different version settings. This lets the server confirm that you are replacing an outdated questionnaire on the server with an updated one. A simple formula converts the date and time to a unique version number; there is no good reason not to use it every time. See this settings example for more information.
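
As a rough illustration of a date-and-time version number, the sketch below formats a timestamp as ten digits. The yymmddhhmm pattern is an assumption made for this example, since the formula in the linked settings example is not reproduced here, and version_from_timestamp is a hypothetical helper.

```python
from datetime import datetime


def version_from_timestamp(moment):
    """Format a datetime as a 10-digit yymmddhhmm version string (assumed pattern)."""
    return moment.strftime("%y%m%d%H%M")


earlier = version_from_timestamp(datetime(2019, 6, 9, 22, 4))
later = version_from_timestamp(datetime(2019, 6, 10, 8, 30))
print(earlier)  # 1906092204
print(later)    # 1906100830
```

Under this pattern, a form saved later always gets a larger number, provided two uploads are not made within the same minute.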

Encryption

All data collected should be encrypted. This protects the privacy of respondents in case, for example, tablets are stolen or network security is breached. Encryption is easy and in line with the privacy promises made during informed consent.

While data transferred from the tablet to SurveyCTO servers over the internet are automatically encrypted using SSL, that only protects the data in transit, so it is still necessary to encrypt the data itself. This protects the data from anyone who might get into the servers or get hold of the survey tablets. You can set data to be encrypted in template forms. Do so before data collection begins.

For more detailed instructions on how to encrypt your data, go to your SurveyCTO server, open up the documentation pages and search for "How do I encrypt my data?"

Language

SurveyCTO’s language options are useful for questionnaires written in multiple languages. In the Excel file, all columns used to create text or media (i.e. label, hint, constraint message, relevance message, media:image, media:audio, and media:video) can be defined for different languages. For example, to define which label column is English and which is French, name the columns label:English and label:French. Then, in the app on the tablet, the enumerator can go to change language and select English or French (or whatever language name follows "label:").

It is best practice to explicitly indicate the language of each column. However, it is also possible to name the label column of one language label and the others, for example, label:French and label:Wolof. The language used in the label column, where no language is defined, is considered the default language. In this case, the language column in the settings tab must be used to define what that language is called. Otherwise, the change language option on the tablet will not display properly.

For detailed instructions on how to use multiple languages on SurveyCTO, go to your SurveyCTO server, open up the documentation pages and search for "Translating a form into multiple languages".
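
The column-naming rule can be illustrated with a small check on a list of column headers. This is a sketch only: label_languages is a hypothetical helper, and its default_language argument stands in for the language column in the settings tab.

```python
def label_languages(headers, default_language=None):
    """List the languages named by the label columns, in column order.

    Hypothetical helper; default_language models the settings-tab value
    that names the language of an unsuffixed "label" column.
    """
    languages = []
    for header in headers:
        name, _, language = header.partition(":")
        if name != "label":
            continue  # hint, media and other columns are ignored here
        if language:
            languages.append(language)
        elif default_language:
            languages.append(default_language)
        else:
            raise ValueError("unnamed label column needs a default language")
    return languages


explicit = label_languages(["type", "name", "label:English", "label:French"])
mixed = label_languages(["type", "name", "label", "label:Wolof"], "French")
print(explicit)  # ['English', 'French']
print(mixed)     # ['French', 'Wolof']
```

The error raised for an unnamed label column without a default mirrors the case described above, in which the language list on the tablet cannot be displayed properly.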

Back to parent topic

This article is a part of the topic Questionnaire Programming.

Additional Resources