Difference between revisions of "Master Do-files"


Revision as of 22:09, 30 January 2017

The master do-file is the main do-file, used to call all the other do-files in a project. Running it should run every do-file needed to go from importing the raw data through cleaning, construction, and analysis to outputting the results. The master do-file therefore also functions as a map of the data folder.

Read First

  • The person creating the master do-file should be able to run the do-files for all stages (cleaning, construction, analysis, exporting tables, etc.) from the master do-file, and anyone else should be able to run all of them just by changing the paths to their own Dropbox/Box folders.
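
A minimal sketch of this idea is shown below. The user names, folder paths, and do-file names are hypothetical placeholders, not part of any real project; the point is only that the root folder is the single thing another user has to change.

```stata
* Root folder: the only lines another user should need to edit.
* User names and paths are hypothetical placeholders.
if c(username) == "researcher1" {
    global projectfolder "C:/Users/researcher1/Dropbox/ExampleProject"
}
if c(username) == "researcher2" {
    global projectfolder "/Users/researcher2/Box/ExampleProject"
}

* Sub-folder globals are built from the root folder
global dofiles "$projectfolder/dofiles"
global data    "$projectfolder/data"

* Run each stage in order (file names are assumptions)
do "$dofiles/01_cleaning.do"
do "$dofiles/02_construct.do"
do "$dofiles/03_analysis.do"
do "$dofiles/04_export_tables.do"
```

Because every later path is derived from `$projectfolder`, the individual do-files never need a hard-coded path of their own.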

Components of a Master Do file

Intro Header


The intro header should contain enough descriptive information that somebody unfamiliar with the do-file can read it and understand what the do-file does and what it produces. Examples of information to put in the header are the purpose of the do-file, an outline of the do-file, the data files required to run it correctly, the data files it creates, and the name of the variable that uniquely identifies the unit of observation in the datasets.
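
As an illustration, a header along these lines could sit at the top of the file. Every project name, file name, and variable name here is made up for the example; the layout is one possible convention, not a required format.

```stata
/*******************************************************************************
*  PROJECT:   Example Project (hypothetical)
*  PURPOSE:   Run all do-files, from importing raw data to exporting tables
*
*  OUTLINE:   PART 1: Settings (version, memory, default options)
*             PART 2: Folder paths and globals
*             PART 3: Run cleaning, construction, and analysis do-files
*
*  REQUIRES:  data/raw/household_survey.csv
*  CREATES:   data/final/household_analysis.dta
*
*  ID VAR:    hhid (uniquely identifies each household)
*******************************************************************************/
```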

Settings to Declare in the Master do-file


After the intro header, settings that are used throughout the project should also be declared in the master do-file. Some of the settings are as follows:

Version Settings

[Figure: sample of version, memory limits, and settings defined in the master do-file.]

The Stata version needs to be declared in the master do-file. Since things like Stata's randomization algorithm sometimes change across versions, declaring the version number is important for making sure that the analysis is reproducible.
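
A short sketch of a version declaration follows. The version number and the seed value are arbitrary choices for illustration, and setting a seed next to the version is an assumption added here rather than something the text above prescribes.

```stata
* Interpret all following commands as Stata 13.1 would.
* The number is an example; it cannot exceed the installed version.
version 13.1

* Assumption: if the project draws random numbers, fixing the seed
* together with the version makes those draws repeatable.
set seed 650184
```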

Basic and Advanced Memory Limits

Stata's memory settings control, among other things, the maximum number of variables a dataset can have (maxvar), the maximum matrix size, which limits the number of variables that can be used in Stata's estimation commands (matsize), and how much system memory Stata uses and how long it holds on to it (niceness, min_memory, max_memory). Declaring these limits in the master do-file ensures that every do-file runs under the same limits and that the analysis does not stop partway because a limit is too low.
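
The settings named above could be declared roughly as follows. The numeric values are illustrative assumptions, not recommendations, and which settings are available or still have an effect depends on the Stata edition and release, so it is worth checking `help memory` before copying them.

```stata
* maxvar can only be changed when no data are in memory
clear all

* maxvar is only settable in Stata/SE and Stata/MP
if c(SE) == 1 | c(MP) == 1 {
    set maxvar 10000
}

* Maximum matrix size used by estimation commands (example value)
set matsize 400

* How quickly Stata returns unused memory to the operating system
* (0 to 10; 5 is Stata's default)
set niceness 5

* Lower and upper bounds on the memory Stata may use
* (0 and . are Stata's defaults, meaning no floor and no cap)
set min_memory 0
set max_memory .
```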

Default Options

Default options such as more on/off, pause on/off, and variable abbreviation should also be set in the master do-file. Declaring them in the main file means that the other do-files, when run through the master do-file, do not have to declare them again.
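
For example, the three options mentioned above might be set like this. The particular on/off choices are assumptions for the sketch; a project may reasonably prefer different ones.

```stata
* Do not stop at each screenful of output waiting for a keypress
set more off

* Let pause commands in the do-files actually halt execution for debugging
pause on

* Require full variable names so that abbreviations cannot silently
* match a different variable after the dataset changes
set varabbrev off
```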

Globals

Globals should also be defined in the master do-file. Common globals to declare there include conversion rates for standardizing units, variable lists used across the project, and any assumptions that need to be made explicit. Since globals defined in one do-file remain available to all other do-files run in the same Stata session, it is important to declare every global the project needs in the master do-file.
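
A brief sketch of such globals is given below. The global names, the variable names in the list, and the assumed discount rate are all hypothetical; only the hectare-to-acre factor is a standard conversion.

```stata
* Conversion rate for standardizing units (1 hectare = 2.47105 acres)
global acres_per_hectare 2.47105

* Variable list reused across do-files (variable names are hypothetical)
global hh_controls "hh_size head_age head_female"

* An assumption made explicit in one place (value is illustrative)
global discount_rate 0.05
```

A do-file called later in the same session can then refer to them directly, for instance `generate plot_acres = plot_hectares * $acres_per_hectare` or `regress income treatment $hh_controls`, where the variables in those commands are again placeholders.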

Back to Parent

This article is part of the topic Data Management

Additional Resources