Variable Names



Revision as of 16:15, 18 February 2021

Variable names are one of the most important aspects of questionnaire design. Properly named variables improve the quality of data collection by giving members of the research team useful insights into the information captured by each variable. Using good naming conventions for variables allows others who are using the data to not only understand the purpose of the variable, but also its type (such as integer, date, string).

Read First

  • Questionnaire design is an important aspect of primary data collection.
  • Well-designed questionnaires improve the quality of data collection, as well as the subsequent data analysis.
  • The research team should therefore spend a considerable amount of time thinking about the different ways in which the dataset can be used for analysis.
  • Proper variable names make it easier for others to use and analyze the dataset at later stages.

Question Numbers VS Descriptions

Most questionnaires use one of two broad methods for naming variables:

  • Question numbers like 1, 2, 3 or A1, A2a, A2b. This method makes it easy to refer to questions during enumerator training and discussions on the survey questions.
  • Descriptions like gender, age, employment, and so on. This method allows the research team to indicate which variables belong together using prefixes, like health_selfcare and health_medicine, and suffixes, like awareness_likert and understanding_likert. This method also makes it easier to understand the questionnaire when it is in the Excel format, and to read the resulting dataset when it is viewed as a table or as a list of variables.
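The grouping effect of prefixes and suffixes can be illustrated with a minimal sketch. It uses Python rather than Stata only to stay self-contained. The variable list is taken from the examples above, and the helper names `with_prefix` and `with_suffix` are hypothetical, introduced here for illustration.

```python
# Variable names taken from the examples in the text above.
variables = [
    "gender", "age", "employment",
    "health_selfcare", "health_medicine",
    "awareness_likert", "understanding_likert",
]


def with_prefix(names, prefix):
    """Return the names that start with `prefix` followed by an underscore."""
    return [name for name in names if name.startswith(prefix + "_")]


def with_suffix(names, suffix):
    """Return the names that end with an underscore followed by `suffix`."""
    return [name for name in names if name.endswith("_" + suffix)]


# Variables that belong together can be selected by name alone.
health_vars = with_prefix(variables, "health")
likert_vars = with_suffix(variables, "likert")

print(health_vars)
print(likert_vars)
```

With numbered names such as A1 or A2a, the same selection would need a separate lookup table, because the names carry no information about what the variables contain.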

Generally speaking, a questionnaire undergoes various rounds of discussions during the development stage, and it may also change when it is used for multiple rounds of data collection. This can become messy in the case of numbered variable names because adding or dropping a variable means that all subsequent variables also need to be renamed if one wants to keep a perfect sequence. This problem gets harder to deal with in the case of an Excel form with various calculations and relevance conditions, since each of these refers to other variables by name. It also means that users have to rewrite do-files which used the old names, which in turn affects reproducibility.
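The renumbering problem can be sketched in a few lines of Python. This is an illustration under assumed inputs, not a description of any survey tool: the function `number_questions` and the inserted question "marital status" are hypothetical.

```python
def number_questions(labels):
    """Assign sequential names Q1, Q2, ... to the questions in order."""
    return {f"Q{i}": label for i, label in enumerate(labels, start=1)}


# Hypothetical first draft of a questionnaire.
old = number_questions(["gender", "age", "employment"])

# A later draft inserts one new question before "age".
new = number_questions(["gender", "marital status", "age", "employment"])

# Numbered names that now point at a different question than before.
changed = sorted(name for name in old if new.get(name) != old[name])

print(changed)
```

Every name listed in `changed` silently refers to a different question in the new draft, so any script written against the old numbering would still run but would analyze the wrong variable. With descriptive names, `age` and `employment` keep their names regardless of where the new question is inserted.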

This is, however, not the case with the description method. The same variable names can be reused across different surveys, which makes it easier to reuse quality control methods and to use the same do-files for the analysis of multiple survey rounds. Moreover, do-files that use descriptive variable names are much easier to write and to read. For example, another member of the research team can readily understand what the command tabulate gender if age < 25 means. However, if the command is tabulate a26 if b12 < 25, they will have to look up both variables in the variable dictionary of the original questionnaire.
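The extra lookup step that numbered names impose can be sketched as follows. This is a rough Python analogue of the tabulate example, not Stata code. The records and the codebook mapping (a26 to gender, b12 to age) are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical codebook mapping numbered names to descriptive names.
codebook = {"a26": "gender", "b12": "age"}

# Hypothetical records as they would look with numbered variable names.
records = [
    {"a26": "female", "b12": 22},
    {"a26": "male", "b12": 31},
    {"a26": "female", "b12": 19},
    {"a26": "male", "b12": 24},
]

# With numbered names, the codebook is needed before the data can be read.
renamed = [{codebook[key]: value for key, value in row.items()} for row in records]

# Once the names are descriptive, the filter states its own meaning:
# count respondents by gender among those younger than 25.
gender_under_25 = Counter(row["gender"] for row in renamed if row["age"] < 25)

print(dict(gender_under_25))
```

If the variables are named descriptively from the start, the renaming step and the codebook lookup are not needed at all.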

Naming Conventions

General Tips

Related Pages

Click here for pages that link to this topic.

Additional Resources