Impact Evaluation projects should follow a clear file naming convention as many team members will need to understand and interact with files over the project lifetime. It is very important to use a naming convention that not only you understand but someone looking at the files after years also understands.
- Follow clear and consistent naming conventions for all files
- Pay special attention to naming conventions for code-compatible files
- Use version control softwares(such as Git/Github) instead of naming the folders _v01, _v02, old, new, etc.
Naming requirements for code files
Files accessed by code have special naming requirements, since different software and operating systems read file names in different ways.
Introducing spaces between words in a file name (including the folder path) can break a file's path when it's read by code, so while a Word document may be called 2019-10-30 Sampling Procedure Description.docx, a related code file would have a name like sampling-endline.do.
Adding timestamps to binary files as in the example above can be useful, as it is not straightforward to track changes using version control software. For plaintext files version-controlled using Git, timestamps are an unnecessary distraction. Output tables, graphs, and documentations are an exception to this and it is good practice to date them. Instead of using versions, they should be dated for clarity. For example = "_2017June8" rather than "_v02".
Code-compatible files should never include capital letters, as strings and file paths are case-sensitive in some software.
Use names that can be sorted easily, for example through alphabetization. The best names from a coding perspective are usually the opposite of those from an English perspective. For example, for a deidentified household dataset from the baseline round, you should prefer a name like baseline-household-deidentified.dta, rather than the opposite way around as occurs in natural language. This ensures that all baseline data stays together, then all baseline-household data, and finally provides unique information about this specific file.
It is generally recommended to use version control software such as Git/GitHub instead of naming the folders _v01, _v02, _old, _new, _final, etc. The only exception to this would be tables, graphs, and documentations and it is good practice to date those. For these kinds of files, it should be dated i.e. filename_2017June08 rather than filename_v02.
Github is an excellent tool used for version control for documentation, and do files. It also allows users to collaborate on codes together making it easier for multiple people to work on the same code. More information on what Github does and how to get started with Github can be found here.
Using Box / Dropbox Version Control features
Box and Dropbox are both file hosting/cloud storage services which provide version control features.
Important : It is important that you pay attention to your subscription of DropBox/Box when using them for version control as they only store previous versions for certain time. For example - Free Dropbox stores different versions of a files from the last 30 days. Similarly, Box does not allow users with free accounts to access versions of their files.
More information on Box's version control, the number of histories saved, and tracking older versions can be found here.
Back to Parent
This article is part of the topic Data Management
- Guidelines on naming conventions from Duke University