Difference between revisions of "Variable Names"
Revision as of 15:03, 18 February 2021
Variable names are one of the most important aspects of questionnaire design. Properly named variables improve the quality of data collection by giving members of the research team useful insights into the information captured by each variable. Good naming conventions let others who use the data understand not only the purpose of each variable, but also its type (such as integer, date, or string).
Read First
- Questionnaire design is an important aspect of primary data collection.
- Well-designed questionnaires improve the quality of data collection, as well as the subsequent data analysis.
- The research team should therefore spend a considerable amount of time thinking about the different ways in which the dataset can be used for analysis.
- Proper variable names make it easier for others to use and analyze the dataset at later stages.
Question Numbers VS Descriptions
Defenders of the number method often mention the usefulness of question numbers for navigating questionnaires and referring to questions during questionnaire development and enumerator training: one can simply call out “Go to question 15” to a room full of enumerators in training and everybody knows more or less where to find the question in the printed version of the questionnaire. I have also heard more than once from a client that question numbers would make it easier to find variables in a dataset. I do not really buy the second argument (in my opinion a well-structured dataset with descriptive names is easier to navigate than a numbered one), but if there is one point I would concede to the proponents of the number method, it is the one about usefulness during training. Other than that, I believe that there are enough points in favour of a descriptive naming scheme to make it superior to numbering:
- Descriptive names help to structure the questionnaire and dataset. The careful use of prefixes and suffixes helps to indicate which variables belong together, either thematically (e.g. health_selfcare, health_medicine) or by type (e.g. awareness_likert, understanding_likert). This makes it easier to understand the questionnaire when looking at it as an XLSForm, and the dataset when looking at it in tabular format or as a list of variables.
- Questionnaires can change a lot during the development phase, or over time if they are used over multiple survey rounds. This can become messy quickly with numbered questionnaires: adding or dropping a variable means that all subsequent variables also need to be renamed if one wants to keep a perfect sequence. That is especially annoying if you have an XLSForm with lots of dependencies in calculations and relevances, and it creates havoc in the scripts of data analysts who have already worked with the old names. To avoid this scenario, researchers sometimes resort to “extending” question numbers and end up with variable names like Q13 Q14a Q14b Q15 Q16 Q17a Q17b Q17c, which undermine what is maybe the only advantage a numbered questionnaire has over descriptive variable names: intuitive navigation for printed questionnaires.
- The same descriptive variable names can be reused across different surveys, making it easier to reuse parts of mobile forms, recycle quality control backends and run similar types of analyses across multiple surveys.
- Descriptive variable names make form development, scripted data analysis and backend development more pleasant and less error-prone. Arguably, one does not have to be a statistics wizard to understand what the command tab gender if age < 25 means, whereas tab a26 if a28 < 25 requires one to look for the variables in a codebook of some sort.
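Two of the points above can be illustrated with a short Python sketch: grouping variables by a shared prefix or suffix, and counting the renames a numbered scheme needs when one question is inserted. The function names (group_by_affix, renumber_after_insert), the underscore separator and the Q1 to Q5 example are assumptions made for illustration only; they are not part of any survey tool, and the sketch is not a recommended implementation.

```python
from collections import defaultdict


def group_by_affix(names, position="prefix", sep="_"):
    """Group variable names by the text before the first separator
    (prefix) or after the last separator (suffix)."""
    groups = defaultdict(list)
    for name in names:
        if sep not in name:
            # Names without a separator go into an unnamed group.
            groups[""].append(name)
        elif position == "prefix":
            groups[name.split(sep, 1)[0]].append(name)
        else:
            groups[name.rsplit(sep, 1)[1]].append(name)
    return dict(groups)


def renumber_after_insert(numbered_names, index, prefix="Q"):
    """Return the old-to-new renames needed to keep a perfect sequence
    when a new question is inserted at position `index` (0-based).

    Assumes the names are a gap-free sequence Q1, Q2, ... (hypothetical).
    """
    renames = {}
    for i, old in enumerate(numbered_names):
        if i >= index:
            # Every question from the insertion point onward shifts by one.
            renames[old] = f"{prefix}{i + 2}"
    return renames


# Descriptive names taken from the examples in the text.
variables = [
    "health_selfcare",
    "health_medicine",
    "awareness_likert",
    "understanding_likert",
]
by_prefix = group_by_affix(variables, "prefix")
by_suffix = group_by_affix(variables, "suffix")

# Hypothetical numbered questionnaire: insert a new question as Q3.
renames = renumber_after_insert(["Q1", "Q2", "Q3", "Q4", "Q5"], index=2)

print(by_prefix["health"])
print(by_suffix["likert"])
print(renames)
```

With descriptive names, inserting a question leaves every existing name untouched; in the numbered example, every question from the insertion point onward has to be renamed, along with any calculation, relevance or analysis script that refers to it.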
Naming Conventions
General Tips
Related Pages
Click here for pages that link to this topic.
Additional Resources
- Jan Schenk, Variable Names in Survey Research
- Petri Silen, Useful Tips for Naming Your Variables