Difference between revisions of "Naming Conventions"

Jump to: navigation, search
 
(25 intermediate revisions by 6 users not shown)
Line 1: Line 1:
Impact Evaluation projects should follow a clear naming convention as all the files in the project will be used throughout the project duration. Since over the course of an impact evaluation project, personnel doing research, or on the field might change, it is very important to use a naming convention that not only you understand but someone looking at the files after years also understands.  
'''Impact Evaluation''' projects should follow a clear file naming convention as many team members will need to understand and interact with files over the project lifetime. It is very important to use a naming convention that not only you understand but someone looking at the files after years also understands.  


== Read First ==
== Read First ==
Use the version control in Box/DropBox instead of naming the folders _v01, _v02, old, new, etc. Output tables, graphs, and documentations are an exception to this and it is good practice to date them. Instead of using versions, they should be dated for clarity. For example = "_2017June8" rather than "_v02".
*Follow clear and consistent naming conventions for all files
*Pay special attention to naming conventions for code-compatible files
*Use version control software, such as [[Getting Started with GitHub|Git/Github]], instead of naming the folders _v01, _v02, old, new, etc.  


== Guidelines ==
== Naming requirements for code files ==
* organize information on the topic into subsections. for each subsection, include a brief description / overview, with links to articles that provide details
Files accessed by code have special naming requirements, since different software and operating systems read file names in different ways.
===Subsection 1===
 
===Subsection 2===
=== Spaces ===
===Subsection 3===
Introducing spaces between words in a file name (including the folder path) can break a file's path when it is read by code, so while a Word document may be called ''2019-10-30 Sampling Procedure Description.docx'',
a related code file would have a name like ''sampling-endline.do''.
 
=== Timestamps ===
Adding timestamps to binary files as in the example above can be useful, as it is not straightforward to track changes using version control software.
For plaintext files version-controlled using Git, timestamps are an unnecessary distraction.
Output tables, graphs, and [[Data Documentation|documentation]] are an exception to this and it is good practice to date them. Instead of using versions, they should be dated for clarity. For example = "_2017June8" rather than "_v02".
 
=== Capitalization ===
Code-compatible files should never include capital letters, as strings and file paths are case-sensitive in some software.
 
=== Sortability ===
Use names that can be sorted easily, for example through alphabetization. The best names from a coding perspective are usually the opposite of those from an English perspective. For example, for a [[De-identification|deidentified]] household [[Master Dataset|dataset]] from the baseline round, you should prefer a name like ''baseline-household-deidentified.dta'', rather than the opposite way around as occurs in natural language. This ensures that all baseline data stays together, then all baseline-household data, and finally provides unique information about this specific file.
 
==Version Control ==
 
It is generally recommended to use version control software such as [[Getting Started with GitHub|Git/GitHub]] instead of naming the folders _v01, _v02, _old, _new, _final, etc. The only exception to this would be tables, graphs, and [[Data Documentation|documentation]] and it is good practice to date those. For these kinds of files, it should be dated i.e. ''filename''_2017June08 rather than ''filename''_v02.
 
=== Using Github ===
 
[[Getting Started with GitHub|Github]] is an excellent tool used for version control for [[Data Documentation|documentation]], and '''do files'''. It also allows users to collaborate on codes together making it easier for multiple people to work on the same code. More information on what '''Github''' does and how to get started with '''Github''' can be found [https://guides.github.com/activities/hello-world/ here].
 
=== Using Box / Dropbox Version Control features ===
 
Box and Dropbox are both file hosting/cloud storage services which provide version control features.
 
'''Important''': It is important that you pay attention to your subscriptions to DropBox and/or Box when using them for version control as they only store previous versions for certain time. For example - Free Dropbox stores different versions of a files from the last 30 days. Similarly, Box does not allow users with free accounts to access versions of their files after 30 days.
 
For more information on Dropbox's version control, here are some useful links on [https://www.dropbox.com/en/help/11 recovering older versions of a file], [https://www.dropbox.com/en/help/9114 version history information], and [https://www.dropbox.com/help/113 extended version history for Dropbox Pro users].
 
More information on Box's version control, the number of histories saved, and tracking older versions can be found [https://community.box.com/t5/Managing-Your-Content/How-To-Track-Your-Files-and-File-Versions-Version-History/ta-p/329 here].
 
 
== Variable Names ==


== Back to Parent ==
== Back to Parent ==
This article is part of the topic [[*topic name, as listed on main page*]]
This article is part of the topic [[Data Management]]
 


== Additional Resources ==
== Additional Resources ==
* list here other articles related to this topic, with a brief description and link
* [https://www2.stat.duke.edu/~rcs46/lectures\_2015/01-markdown-git/slides/naming-slides/naming-slides.pdf| Guidelines on naming conventions] from Duke University


[[Category: Data Management ]]
[[Category: Data Management ]]

Latest revision as of 18:55, 14 August 2023

Impact Evaluation projects should follow a clear file naming convention as many team members will need to understand and interact with files over the project lifetime. It is very important to use a naming convention that not only you understand but someone looking at the files after years also understands.

Read First

  • Follow clear and consistent naming conventions for all files
  • Pay special attention to naming conventions for code-compatible files
  • Use version control software, such as Git/Github, instead of naming the folders _v01, _v02, old, new, etc.

Naming requirements for code files

Files accessed by code have special naming requirements, since different software and operating systems read file names in different ways.

Spaces

Introducing spaces between words in a file name (including the folder path) can break a file's path when it is read by code, so while a Word document may be called 2019-10-30 Sampling Procedure Description.docx, a related code file would have a name like sampling-endline.do.

Timestamps

Adding timestamps to binary files as in the example above can be useful, as it is not straightforward to track changes using version control software. For plaintext files version-controlled using Git, timestamps are an unnecessary distraction. Output tables, graphs, and documentation are an exception to this and it is good practice to date them. Instead of using versions, they should be dated for clarity. For example = "_2017June8" rather than "_v02".

Capitalization

Code-compatible files should never include capital letters, as strings and file paths are case-sensitive in some software.

Sortability

Use names that can be sorted easily, for example through alphabetization. The best names from a coding perspective are usually the opposite of those from an English perspective. For example, for a deidentified household dataset from the baseline round, you should prefer a name like baseline-household-deidentified.dta, rather than the opposite way around as occurs in natural language. This ensures that all baseline data stays together, then all baseline-household data, and finally provides unique information about this specific file.

Version Control

It is generally recommended to use version control software such as Git/GitHub instead of naming the folders _v01, _v02, _old, _new, _final, etc. The only exception to this would be tables, graphs, and documentation and it is good practice to date those. For these kinds of files, it should be dated i.e. filename_2017June08 rather than filename_v02.

Using Github

Github is an excellent tool used for version control for documentation, and do files. It also allows users to collaborate on codes together making it easier for multiple people to work on the same code. More information on what Github does and how to get started with Github can be found here.

Using Box / Dropbox Version Control features

Box and Dropbox are both file hosting/cloud storage services which provide version control features.

Important: It is important that you pay attention to your subscriptions to DropBox and/or Box when using them for version control as they only store previous versions for certain time. For example - Free Dropbox stores different versions of a files from the last 30 days. Similarly, Box does not allow users with free accounts to access versions of their files after 30 days.

For more information on Dropbox's version control, here are some useful links on recovering older versions of a file, version history information, and extended version history for Dropbox Pro users.

More information on Box's version control, the number of histories saved, and tracking older versions can be found here.


Variable Names

Back to Parent

This article is part of the topic Data Management

Additional Resources