Difference between revisions of "Naming Conventions"
(Added content on file naming conventions for code-compatible files and a new reference.) |
|||
(3 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
'''Impact Evaluation''' projects should follow a clear file naming convention as many team members will need to understand and interact with files over the project lifetime. It is very important to use a naming convention that not only you understand but someone looking at the files after years also understands. | |||
== Read First == | == Read First == | ||
*Follow clear and consistent naming conventions for all files | *Follow clear and consistent naming conventions for all files | ||
*Pay special attention to naming conventions for code-compatible files | *Pay special attention to naming conventions for code-compatible files | ||
*Use version control | *Use version control software, such as [[Getting Started with GitHub|Git/Github]], instead of naming the folders _v01, _v02, old, new, etc. | ||
== Naming requirements for code files == | == Naming requirements for code files == | ||
Line 10: | Line 10: | ||
=== Spaces === | === Spaces === | ||
Introducing spaces between words in a file name (including the folder path) can break a file's path when it | Introducing spaces between words in a file name (including the folder path) can break a file's path when it is read by code, so while a Word document may be called ''2019-10-30 Sampling Procedure Description.docx'', | ||
a related code file would have a name like | a related code file would have a name like ''sampling-endline.do''. | ||
=== Timestamps === | === Timestamps === | ||
Adding timestamps to binary files as in the example above can be useful, as it is not straightforward to track changes using version control software. | Adding timestamps to binary files as in the example above can be useful, as it is not straightforward to track changes using version control software. | ||
For plaintext files version-controlled using Git, timestamps are an unnecessary distraction. | For plaintext files version-controlled using Git, timestamps are an unnecessary distraction. | ||
Output tables, graphs, and | Output tables, graphs, and [[Data Documentation|documentation]] are an exception to this and it is good practice to date them. Instead of using versions, they should be dated for clarity. For example = "_2017June8" rather than "_v02". | ||
=== Capitalization === | === Capitalization === | ||
Line 22: | Line 22: | ||
=== Sortability === | === Sortability === | ||
Use names that can be sorted easily, for example through alphabetization. The best names from a coding perspective are usually the opposite of those from an English perspective. For example, for a deidentified household dataset from the baseline round, you should prefer a name like | Use names that can be sorted easily, for example through alphabetization. The best names from a coding perspective are usually the opposite of those from an English perspective. For example, for a [[De-identification|deidentified]] household [[Master Dataset|dataset]] from the baseline round, you should prefer a name like ''baseline-household-deidentified.dta'', rather than the opposite way around as occurs in natural language. This ensures that all baseline data stays together, then all baseline-household data, and finally provides unique information about this specific file. | ||
==Version Control == | ==Version Control == | ||
It is generally recommended to use version control software such as Git/GitHub instead of naming the folders _v01, _v02, _old, _new, _final, etc. The only exception to this would be tables, graphs, and | It is generally recommended to use version control software such as [[Getting Started with GitHub|Git/GitHub]] instead of naming the folders _v01, _v02, _old, _new, _final, etc. The only exception to this would be tables, graphs, and [[Data Documentation|documentation]] and it is good practice to date those. For these kinds of files, it should be dated i.e. ''filename''_2017June08 rather than ''filename''_v02. | ||
=== Using Github === | === Using Github === | ||
Github is an excellent tool used for version control for documentation, and do files. It also allows users to collaborate on codes together making it easier for multiple people to work on the same code. More information on what Github does and how to get started with Github can be found [https://guides.github.com/activities/hello-world/ here]. | [[Getting Started with GitHub|Github]] is an excellent tool used for version control for [[Data Documentation|documentation]], and '''do files'''. It also allows users to collaborate on codes together making it easier for multiple people to work on the same code. More information on what '''Github''' does and how to get started with '''Github''' can be found [https://guides.github.com/activities/hello-world/ here]. | ||
=== Using Box / Dropbox Version Control features === | === Using Box / Dropbox Version Control features === | ||
Line 36: | Line 36: | ||
Box and Dropbox are both file hosting/cloud storage services which provide version control features. | Box and Dropbox are both file hosting/cloud storage services which provide version control features. | ||
'''Important''': It is important that you pay attention to your subscriptions to DropBox and/or Box when using them for version control as they only store previous versions for certain time. For example - Free Dropbox stores different versions of a files from the last 30 days. Similarly, Box does not allow users with free accounts to access versions of their files after 30 days. | |||
For more information on Dropbox's version control, here are some useful links on [https://www.dropbox.com/en/help/11 recovering older versions of a file], [https://www.dropbox.com/en/help/9114 version history information], and [https://www.dropbox.com/help/113 extended version history for Dropbox Pro users]. | |||
More information on Box's version control, the number of histories saved, and tracking older versions can be found [https://community.box.com/t5/Managing-Your-Content/How-To-Track-Your-Files-and-File-Versions-Version-History/ta-p/329 here]. | More information on Box's version control, the number of histories saved, and tracking older versions can be found [https://community.box.com/t5/Managing-Your-Content/How-To-Track-Your-Files-and-File-Versions-Version-History/ta-p/329 here]. | ||
== Variable Names == | |||
== Back to Parent == | == Back to Parent == |
Latest revision as of 18:55, 14 August 2023
Impact Evaluation projects should follow a clear file naming convention as many team members will need to understand and interact with files over the project lifetime. It is very important to use a naming convention that not only you understand but someone looking at the files after years also understands.
Read First
- Follow clear and consistent naming conventions for all files
- Pay special attention to naming conventions for code-compatible files
- Use version control software, such as Git/Github, instead of naming the folders _v01, _v02, old, new, etc.
Naming requirements for code files
Files accessed by code have special naming requirements, since different software and operating systems read file names in different ways.
Spaces
Introducing spaces between words in a file name (including the folder path) can break a file's path when it is read by code, so while a Word document may be called 2019-10-30 Sampling Procedure Description.docx, a related code file would have a name like sampling-endline.do.
Timestamps
Adding timestamps to binary files as in the example above can be useful, as it is not straightforward to track changes using version control software. For plaintext files version-controlled using Git, timestamps are an unnecessary distraction. Output tables, graphs, and documentation are an exception to this and it is good practice to date them. Instead of using versions, they should be dated for clarity. For example = "_2017June8" rather than "_v02".
Capitalization
Code-compatible files should never include capital letters, as strings and file paths are case-sensitive in some software.
Sortability
Use names that can be sorted easily, for example through alphabetization. The best names from a coding perspective are usually the opposite of those from an English perspective. For example, for a deidentified household dataset from the baseline round, you should prefer a name like baseline-household-deidentified.dta, rather than the opposite way around as occurs in natural language. This ensures that all baseline data stays together, then all baseline-household data, and finally provides unique information about this specific file.
Version Control
It is generally recommended to use version control software such as Git/GitHub instead of naming the folders _v01, _v02, _old, _new, _final, etc. The only exception to this would be tables, graphs, and documentation and it is good practice to date those. For these kinds of files, it should be dated i.e. filename_2017June08 rather than filename_v02.
Using Github
Github is an excellent tool used for version control for documentation, and do files. It also allows users to collaborate on codes together making it easier for multiple people to work on the same code. More information on what Github does and how to get started with Github can be found here.
Using Box / Dropbox Version Control features
Box and Dropbox are both file hosting/cloud storage services which provide version control features.
Important: It is important that you pay attention to your subscriptions to DropBox and/or Box when using them for version control as they only store previous versions for certain time. For example - Free Dropbox stores different versions of a files from the last 30 days. Similarly, Box does not allow users with free accounts to access versions of their files after 30 days.
For more information on Dropbox's version control, here are some useful links on recovering older versions of a file, version history information, and extended version history for Dropbox Pro users.
More information on Box's version control, the number of histories saved, and tracking older versions can be found here.
Variable Names
Back to Parent
This article is part of the topic Data Management
Additional Resources
- Guidelines on naming conventions from Duke University