Stata Coding Practices: Programming (Ado-files). DIME Wiki, revision of 10 February 2021 by Bbdaniels.
<hr />
<div>Programs and ado-files are the main ways to condense and generalize Stata code. Writing code that applies to arbitrary inputs and saving it in a separate file keeps the main do-file cleaner. It also makes it easier to re-use the same analytical process on other datasets in the future. Stata has special commands that enable this functionality. The user-written commands available on SSC are all ado-files written by other programmers. You can also embed programs in ordinary do-files to save space and improve the organization of code.<br />
<br />
==Read First==<br />
<br />
This article uses the terms "programming", "ado-files", and "user-written commands" somewhat interchangeably, in contrast to the ordinary writing of do-files. It does not assume that you are actually writing an ado-file (as opposed to a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> definition in an ordinary do-file), nor that you are writing a command for distribution. In either case, Stata programming relies on several core features:<br />
<br />
* The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command sets up the code environment for writing a program into memory.<br />
* The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses inputs into a program as macros that can be used within the scope of that program execution.<br />
* The <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> commands create names for temporary variables, files, and other objects (such as scalars and matrices). These objects exist only within the scope of program execution, so they cannot conflict with whatever data structures are already in use.<br />
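
The sketch below shows how the three features fit together in one complete program. The name <syntaxhighlight lang="stata" inline>maxdev</syntaxhighlight> and what it computes (the largest absolute deviation of a variable from its mean) are hypothetical. They were chosen only to put <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> side by side. It is not an existing command.

<syntaxhighlight lang="stata">
* Hypothetical example: -maxdev- is an illustrative name, not an existing command
cap prog drop maxdev
prog def maxdev, rclass
    // syntax: parse one numeric variable and optional if/in restrictions
    syntax varname(numeric) [if] [in]
    marksample touse

    // tempname/tempvar: names that cannot clash with the user's scalars or variables
    tempname mu
    tempvar dev

    // Deviation of each observation from the sample mean
    qui summarize `varlist' if `touse'
    scalar `mu' = r(mean)
    qui gen double `dev' = `varlist' - `mu' if `touse'

    // Return the largest absolute deviation; `mu' and `dev' vanish when the program ends
    qui summarize `dev'
    return scalar maxdev = max(abs(r(min)), abs(r(max)))
end

sysuse auto.dta , clear
maxdev price
display r(maxdev)
</syntaxhighlight>

On the auto data, this should report roughly 9740.74 for price: its maximum (15906) minus its mean (about 6165.26). The temporary scalar and variable are dropped automatically when the program ends.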
<br />
==The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command defines the scope of a Stata program inside a do-file or ado-file. When a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> block is executed, Stata stores the commands written inside the block under the command name given in the <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command. It keeps them until the end of the session, or until the program is dropped. Stata will not redefine a program that already exists, so running <syntaxhighlight lang="stata" inline>capture program drop</syntaxhighlight> before the block ensures the name is available. The <syntaxhighlight lang="stata" inline>capture</syntaxhighlight> prefix suppresses the error that would occur when no such program exists yet. For example, we might write the following program in an ordinary do-file:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop autoreg<br />
prog def autoreg<br />
<br />
reg price mpg i.foreign<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
After executing this command block (note that <syntaxhighlight lang="stata" inline>end</syntaxhighlight> tells Stata where to stop reading), we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
If we did this, Stata would output:<br />
<br />
<syntaxhighlight lang="stata"><br />
. autoreg<br />
<br />
      Source |       SS           df       MS      Number of obs   =        74<br />
-------------+----------------------------------   F(2, 71)        =     14.07<br />
       Model |   180261702         2  90130850.8   Prob > F        =    0.0000<br />
    Residual |   454803695        71  6405685.84   R-squared       =    0.2838<br />
-------------+----------------------------------   Adj R-squared   =    0.2637<br />
       Total |   635065396        73  8699525.97   Root MSE        =    2530.9<br />
<br />
------------------------------------------------------------------------------<br />
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]<br />
-------------+----------------------------------------------------------------<br />
         mpg |  -294.1955   55.69172    -5.28   0.000    -405.2417   -183.1494<br />
             |<br />
     foreign |<br />
     Foreign |   1767.292    700.158     2.52   0.014     371.2169    3163.368<br />
       _cons |   11905.42   1158.634    10.28   0.000     9595.164    14215.67<br />
------------------------------------------------------------------------------<br />
</syntaxhighlight><br />
<br />
In other words, Stata has stored the command <syntaxhighlight lang="stata" inline>reg price mpg i.foreign</syntaxhighlight> and will execute it whenever <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> is run, as if it were an ordinary command.<br />
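
A program stays in memory until the session ends, so it can be inspected or removed with subcommands of <syntaxhighlight lang="stata" inline>program</syntaxhighlight> itself. The sketch below first re-creates <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> so that it runs on its own. It then lists the programs in memory, prints the stored body of <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight>, and drops it.

<syntaxhighlight lang="stata">
* Re-create autoreg so this snippet runs on its own
cap prog drop autoreg
prog def autoreg
    reg price mpg i.foreign
end

program dir              // list the programs currently held in memory
program list autoreg     // print the commands stored under the name autoreg
program drop autoreg     // remove it from memory

* autoreg is now an unrecognized command, so calling it returns error 199
capture autoreg
display _rc
</syntaxhighlight>

After the drop, calling <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> fails with return code 199 (unrecognized command) until the definition block is run again.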
<br />
As a first extension, we might write a command that does not depend on any particular dataset, such as one that lists all the values of each variable. Such a program might look like the following:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
foreach var of varlist * {<br />
    qui levelsof `var' , local(levels)<br />
    di "Levels of `var': `: var label `var''"<br />
    foreach word in `levels' {<br />
        di " `word'"<br />
    }<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
We could then run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist<br />
</syntaxhighlight><br />
<br />
Similarly, we could use any other dataset in place of <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight>. We now have a useful piece of code that we can run with any dataset open, without re-writing a mildly complex loop each time. When we want to save such a snippet, we usually write an ado-file. We name the file <syntaxhighlight lang="stata" inline>levelslist.ado</syntaxhighlight> and add starbang (<syntaxhighlight lang="stata" inline>*!</syntaxhighlight>) lines with metadata about the code, along with explanatory comments. The full file would look something like this:<br />
<br />
<syntaxhighlight lang="stata"><br />
*! Version 0.1 published 24 November 2020<br />
*! by Benjamin Daniels bbdaniels@gmail.com<br />
<br />
// A program to print all levels of variables<br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
// Loop over variables<br />
foreach var of varlist * {<br />
<br />
    // Get levels and display name and label of variable<br />
    qui levelsof `var' , local(levels)<br />
    di "Levels of `var': `: var label `var''"<br />
<br />
    // Print the value of each level for the current variable<br />
    foreach word in `levels' {<br />
        di " `word'"<br />
    }<br />
<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
Programs have global scope in Stata. To make <syntaxhighlight lang="stata" inline>levelslist</syntaxhighlight> available to all do-files in a reproducibility package, the runfile therefore only needs to execute <syntaxhighlight lang="stata" inline>run levelslist.ado</syntaxhighlight> once. However, this command is not very useful yet. It prints far too much information, particularly for variables that take many integer or continuous values. The next section introduces code that lets such commands be customized for each context in which they are used.<br />
<br />
==The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command goes inside a program block and parses what the user types after the command name. This lets the program's inputs be customized for the context in which it is executed. It supports all the main elements of ordinary Stata commands:<br />
* input lists, such as variable lists or file names<br />
* <syntaxhighlight lang="stata" inline>if</syntaxhighlight> and <syntaxhighlight lang="stata" inline>in</syntaxhighlight> restrictions<br />
* <syntaxhighlight lang="stata" inline>using</syntaxhighlight> targets<br />
* <syntaxhighlight lang="stata" inline>=</syntaxhighlight> expressions<br />
* weights<br />
* options, written after the comma in the command<br />
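
The hypothetical program below shows which local macros each element creates. It declares several elements at once and passes the resulting macros back as <syntaxhighlight lang="stata" inline>r()</syntaxhighlight> results. The program name, the <syntaxhighlight lang="stata" inline>level()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>detail</syntaxhighlight> options, and the returned names are all made up for illustration.

<syntaxhighlight lang="stata">
* Hypothetical program: shows which local macros -syntax- creates
cap prog drop syntaxtour
prog def syntaxtour, rclass
    syntax varlist(numeric) [if] [in] [fweight] [, Level(integer 95) DETail]
    return local varlist "`varlist'"      // expanded, validated variable names
    return local ifexp   `"`if'"'         // the if-restriction as typed, or empty
    return local inrange "`in'"           // the in-range as typed, or empty
    return local weight  "`weight'"       // weight type, e.g. fweight
    return local wexp    "`exp'"          // weight expression
    return scalar level = `level'         // 95 unless the user overrides it
    return local detail  "`detail'"       // "detail" if specified, else empty
end

sysuse auto.dta , clear
syntaxtour price-mpg if foreign == 1 in 1/60 [fw = rep78] , level(90) det
return list
</syntaxhighlight>

After this call, <syntaxhighlight lang="stata" inline>return list</syntaxhighlight> should show:
* <syntaxhighlight lang="stata" inline>price mpg</syntaxhighlight> as the expanded variable list
* <syntaxhighlight lang="stata" inline>fweight</syntaxhighlight> as the weight type
* 90 as the level
* <syntaxhighlight lang="stata" inline>detail</syntaxhighlight> as the switch

The exact spacing stored in the if, in, and weight-expression macros is best checked in your own Stata session.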
<br />
The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command supports many automated checks and advanced features, particularly for factor variables and time-series operators (<syntaxhighlight lang="stata" inline>fv</syntaxhighlight> and <syntaxhighlight lang="stata" inline>ts</syntaxhighlight>). Its help file documents them extensively. For advanced applications, always consult the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> help file to see how to accomplish your objective. For now, we will take a simple tour of how <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> makes a command adaptive.<br />
<br />
First, let's add simple syntax allowing the user to select the variables and observations they want to include. We might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if]<br />
preserve<br />
<br />
// Implement [if]<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
    qui levelsof `var' , local(levels)<br />
    di " "<br />
    di "Levels of `var': `: var label `var''"<br />
    foreach word in `levels' {<br />
        di " `word'"<br />
    }<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
There are several key features to note here. First, we write <syntaxhighlight lang="stata" inline>anything</syntaxhighlight> in the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command. This allows the user to type anything they like as the arguments to the program. The input is stored in the string local <syntaxhighlight lang="stata" inline>`anything'</syntaxhighlight> and can be used anywhere in the program. Here, <syntaxhighlight lang="stata" inline>foreach var of varlist</syntaxhighlight> then checks that it is a valid list of variables. Recall that local macros in Stata have strictly local scope. Locals from the calling do-file are not passed into the program, and locals from the program are not passed back into the calling do-file.<br />
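
The scoping rule is easy to verify. In the sketch below, <syntaxhighlight lang="stata" inline>scopedemo</syntaxhighlight> is a hypothetical program. It cannot see the caller's local <syntaxhighlight lang="stata" inline>x</syntaxhighlight>, and the local <syntaxhighlight lang="stata" inline>y</syntaxhighlight> it defines disappears when it ends. The brackets simply make empty macros visible in the output.

<syntaxhighlight lang="stata">
cap prog drop scopedemo
prog def scopedemo
    di "inside the program, x = [`x']"    // prints []: the caller's local is not visible
    local y "defined inside"
    di "inside the program, y = [`y']"
end

local x "defined outside"
scopedemo
di "back in the do-file, y = [`y']"       // prints []: the program's local did not leak out
</syntaxhighlight>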
<br />
Second, we write <syntaxhighlight lang="stata" inline>[if]</syntaxhighlight> in brackets to declare that the user may optionally add an if-restriction to the command. On its own, this only creates another local string macro, <syntaxhighlight lang="stata" inline>`if'</syntaxhighlight>, containing the restriction. Stata provides the shortcut <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> to implement it. Calling <syntaxhighlight lang="stata" inline>marksample touse</syntaxhighlight> creates a temporary indicator variable, whose name is stored in the local <syntaxhighlight lang="stata" inline>`touse'</syntaxhighlight>. It equals 1 for observations that satisfy the restriction and 0 for those that do not.<br />
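
One way to see what <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> produces is to count the marked observations. The hypothetical program below does this and returns the count as <syntaxhighlight lang="stata" inline>r(N)</syntaxhighlight>. It declares only <syntaxhighlight lang="stata" inline>[if]</syntaxhighlight> and <syntaxhighlight lang="stata" inline>[in]</syntaxhighlight>, so the marker reflects just those restrictions.

<syntaxhighlight lang="stata">
cap prog drop countsample
prog def countsample, rclass
    syntax [if] [in]
    marksample touse                 // 1 if the observation meets if/in, 0 otherwise
    qui count if `touse'
    return scalar N = r(N)
end

sysuse auto.dta , clear
countsample if foreign == 1          // r(N) = 22, the number of foreign cars
countsample in 1/10                  // r(N) = 10
</syntaxhighlight>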
<br />
Then the restriction must be applied. We <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> the data and keep only the eligible observations before running the rest of the code. This is appropriate here because a <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> issued inside a program is undone automatically when the program ends. That holds no matter what happens later in the program, even if it exits with an error, so <syntaxhighlight lang="stata" inline>restore</syntaxhighlight> is not needed. For this reason, we will often use <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> in programs only in this way. We prefer other methods for loading and re-loading data inside the program block.<br />
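
A small sketch can check that <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> is undone automatically, even when a program stops with an error. The program <syntaxhighlight lang="stata" inline>keepforeign</syntaxhighlight> and its <syntaxhighlight lang="stata" inline>fail</syntaxhighlight> option are hypothetical. The line <syntaxhighlight lang="stata" inline>exit 498</syntaxhighlight> simply stops the program with a non-zero return code.

<syntaxhighlight lang="stata">
cap prog drop keepforeign
prog def keepforeign
    syntax [, FAIL]
    preserve
    qui keep if foreign == 1
    di "Observations inside the program: " _N
    // Optionally stop with an error, to show the data are restored anyway
    if "`fail'" != "" exit 498
end

sysuse auto.dta , clear
keepforeign
di "Observations after a normal exit: " _N
capture noisily keepforeign , fail
di "Observations after an error (return code " _rc "): " _N
</syntaxhighlight>

Both displays after the calls should show all 74 observations of the auto data. The captured call should report return code 498.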
<br />
Now, we can run commands like:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist foreign<br />
levelslist foreign make if foreign == 1<br />
<br />
sysuse census.dta , clear<br />
levelslist region<br />
levelslist state if region == 1<br />
</syntaxhighlight><br />
<br />
Other <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> elements work similarly. <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> also accounts for <syntaxhighlight lang="stata" inline>in</syntaxhighlight> restrictions and weights. When the program declares a <syntaxhighlight lang="stata" inline>varlist</syntaxhighlight>, it also accounts for missing values in those variables. Other elements, such as <syntaxhighlight lang="stata" inline>using</syntaxhighlight> and options, must be handled directly in the program code. The <syntaxhighlight lang="stata" inline>using</syntaxhighlight> element is typically used to target a file on the operating system. It is the feature of choice when you want to import or export data. Always test and implement it with compound double quotes (for example, <syntaxhighlight lang="stata" inline>`"`using'"'</syntaxhighlight>). Also decide whether the <syntaxhighlight lang="stata" inline>`using'</syntaxhighlight> macro should include the word <syntaxhighlight lang="stata" inline>using</syntaxhighlight> itself; writing <syntaxhighlight lang="stata" inline>[using/]</syntaxhighlight> instead stores only the file name. See the help file for details.<br />
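
The hypothetical program <syntaxhighlight lang="stata" inline>savecopy</syntaxhighlight> below is a sketch of the <syntaxhighlight lang="stata" inline>using/</syntaxhighlight> form together with compound double quotes. It saves the data in memory to whatever file the user names. A <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight> serves as the target so the example leaves nothing behind. With a real path, the compound quotes keep file names that contain spaces or quotation marks intact.

<syntaxhighlight lang="stata">
cap prog drop savecopy
prog def savecopy
    // using/ stores only the file name in `using', without the word "using"
    syntax using/ [, replace]
    di `"Saving a copy of the data to: `using'"'
    // Compound double quotes protect paths with spaces or embedded quotes
    save `"`using'"' , `replace'
end

sysuse auto.dta , clear
tempfile target
savecopy using "`target'" , replace
</syntaxhighlight>

The <syntaxhighlight lang="stata" inline>replace</syntaxhighlight> option is declared in lowercase, so it cannot be abbreviated. That is a sensible choice for an option that overwrites files.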
<br />
Finally, options let the user switch optional behavior on or off. Let's allow the user to request value labels by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if] , [VALuelabels]<br />
preserve<br />
<br />
// Implement [if]<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
    qui levelsof `var' , local(levels)<br />
    di " "<br />
    di "Levels of `var': `: var label `var''"<br />
    foreach word in `levels' {<br />
        // Implement value label option if specified<br />
        if "`valuelabels'" != "" {<br />
            local thisLabel : label (`var') `word'<br />
            local thisLabel = ": `thisLabel'"<br />
        }<br />
        di " `word'`thisLabel'"<br />
    }<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
The user can specify the <syntaxhighlight lang="stata" inline>valuelabels</syntaxhighlight> option either in full or abbreviated as <syntaxhighlight lang="stata" inline>, val</syntaxhighlight>. That is the minimum abbreviation marked by the capital letters in <syntaxhighlight lang="stata" inline>VALuelabels</syntaxhighlight>. When the option is specified, the <syntaxhighlight lang="stata" inline>`valuelabels'</syntaxhighlight> macro contains <syntaxhighlight lang="stata" inline>"valuelabels"</syntaxhighlight>; otherwise it is empty. A simple conditional on this macro is therefore enough to check for the option and act on it. Now we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse census.dta , clear<br />
levelslist region , val<br />
</syntaxhighlight><br />
<br />
and we would get:<br />
<br />
<syntaxhighlight lang="stata"><br />
Levels of region: Census region<br />
1: NE<br />
2: N Cntrl<br />
3: South<br />
4: West<br />
</syntaxhighlight><br />
<br />
However, the command would fail if we ran <syntaxhighlight lang="stata" inline>levelslist region state , val</syntaxhighlight>. The variable <syntaxhighlight lang="stata" inline>state</syntaxhighlight> is a string: its values are text, so they cannot be looked up in a value label. We might therefore let the user specify which variables to show labels for, as follows:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if] , [VALuelabels(string asis)]<br />
preserve<br />
<br />
// Implement [if]<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
    qui levelsof `var' , local(levels)<br />
    di " "<br />
    di "Levels of `var': `: var label `var''"<br />
    foreach word in `levels' {<br />
        // Implement valuelabels option<br />
        local thisLabel ""<br />
        if strpos(" `valuelabels' "," `var' ") >= 1 {<br />
            local thisLabel : label (`var') `word'<br />
            local thisLabel = ": `thisLabel'"<br />
        }<br />
<br />
        // Display value (and label if requested)<br />
        di " `word'`thisLabel'"<br />
    }<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
The option is now declared as <syntaxhighlight lang="stata" inline>[VALuelabels(string asis)]</syntaxhighlight>. The <syntaxhighlight lang="stata" inline>`valuelabels'</syntaxhighlight> macro therefore contains either the text written inside the option or nothing at all. The implementation needs two changes:<br />
* First, we reset <syntaxhighlight lang="stata" inline>`thisLabel'</syntaxhighlight> on every pass so that it is empty whenever a label does not apply.<br />
* Second, we use a tool like <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight> to check whether the current variable occurs in the list. When we write the help file, we will make clear that this option must contain a list of variables.<br />
<br />
It is possible to require this through the options syntax itself, but doing so can introduce issues. For example, if the command first loads data, a <syntaxhighlight lang="stata" inline>varlist</syntaxhighlight> check might fail on the data currently in memory. This approach also requires users to type full variable names, since abbreviations would only match if we expanded them with commands like <syntaxhighlight lang="stata" inline>unab</syntaxhighlight>. Also note the extra spaces around both arguments of <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight>. They ensure that a variable is matched only by its complete name, not because its name is a substring of another name in the list. Now we can run <syntaxhighlight lang="stata" inline>levelslist region state , val(region)</syntaxhighlight> and get the results we wanted.<br />
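
The effect of the padding can be checked directly. In the sketch below, the contents of <syntaxhighlight lang="stata" inline>valuelabels</syntaxhighlight> are made up, including a hypothetical variable <syntaxhighlight lang="stata" inline>region2</syntaxhighlight>. Without padding, <syntaxhighlight lang="stata" inline>region</syntaxhighlight> is found inside <syntaxhighlight lang="stata" inline>region2</syntaxhighlight>; with padding, only a whole-word match counts. The extended macro function <syntaxhighlight lang="stata" inline>: list posof</syntaxhighlight> is shown as an alternative that also matches whole words only.

<syntaxhighlight lang="stata">
* Hypothetical option contents: region2 is a made-up variable name
local valuelabels "region2 state"

* Without padding, "region" is found inside "region2"
di strpos("`valuelabels'", "region")            // 1

* With padding, only the complete word " region " would match
di strpos(" `valuelabels' ", " region ")        // 0

* Alternative: -: list posof- returns the position of a whole word (0 if absent)
local pos : list posof "region" in valuelabels
di `pos'                                        // 0
</syntaxhighlight>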
<br />
==The <syntaxhighlight lang="stata" inline>temp</syntaxhighlight> commands==<br />
<br />
Stata has a set of <syntaxhighlight lang="stata" inline>temp</syntaxhighlight> commands that can be used to store information temporarily. This functionality lets a program create variables, files, and named objects such as scalars and matrices without any risk of overwriting those already in use. Everything created this way is discarded when the program ends.</div>
<hr />
<div>Programs and ado-files are the main methods by which Stata code is condensed and generalized. By writing versions of code that apply to arbitrary inputs and saving that code in a separate file, the application of the code is cleaner in the main do-file and it becomes easier to re-use the same analytical process on other datasets in the future. Stata has special commands that enable this functionality. All commands on SSC are written as ado-files by other programmers; it is also possible to embed programs in ordinary do-files to save space and improve organization of code.<br />
<br />
==Read First==<br />
<br />
This article will refer somewhat interchangeably to the concepts of "programming", "ado-files", and "user-written commands". This is in contrast to ordinary programming of do-files. The article does not assume that you are actually writing an ado-file (as opposed to a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> definition in an ordinary dofile); and it does not assume you are writing a command for distribution. That said, Stata programming functionality is achieved using several core features:<br />
<br />
* The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command sets up the code environment for writing a program into memory.<br />
* The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses inputs into a program as macros that can be used within the scope of that program execution.<br />
* The <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> commands all create objects that can be used within the scope of program execution to avoid any conflict with arbitrary data structures.<br />
<br />
==The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command defines the scope of a Stata program inside a do-file or ado-file. When a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command block is executed, Stata stores (until the end of the session) the sequence of commands written inside the block and assigns them to the command name used in the <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command. Using <syntaxhighlight lang="stata" inline>program drop</syntaxhighlight> before the block will ensure that the command space is available. For example, we might write the following program in an ordinary do-file:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop autoreg<br />
prog def autoreg<br />
<br />
reg price mpg i.foreign<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
After executing this command block (note that <syntaxhighlight lang="stata" inline>end</syntaxhighlight> tells Stata where to stop reading), we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
If we did this, Stata would output:<br />
<br />
<syntaxhighlight lang="stata"><br />
. autoreg<br />
<br />
Source | SS df MS Number of obs = 74<br />
-------------+---------------------------------- F(2, 71) = 14.07<br />
Model | 180261702 2 90130850.8 Prob > F = 0.0000<br />
Residual | 454803695 71 6405685.84 R-squared = 0.2838<br />
-------------+---------------------------------- Adj R-squared = 0.2637<br />
Total | 635065396 73 8699525.97 Root MSE = 2530.9<br />
<br />
------------------------------------------------------------------------------<br />
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]<br />
-------------+----------------------------------------------------------------<br />
mpg | -294.1955 55.69172 -5.28 0.000 -405.2417 -183.1494<br />
|<br />
foreign |<br />
Foreign | 1767.292 700.158 2.52 0.014 371.2169 3163.368<br />
_cons | 11905.42 1158.634 10.28 0.000 9595.164 14215.67<br />
------------------------------------------------------------------------------<br />
</syntaxhighlight><br />
<br />
All this is to say is that Stata has taken the command <syntaxhighlight lang="stata" inline>reg price mpg i.foreign</syntaxhighlight> and will execute it whenever <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> is run as if it were an ordinary command.<br />
<br />
As a first extension, we might try writing a command that is not dependent on the data, such as one that would list all the values of each variable for us. Such a program might look like the following:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
foreach var of varlist * {<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
We could then run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist<br />
</syntaxhighlight><br />
<br />
Similarly, we could use any other dataset in place of <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight>. This means we would now have a useful piece of code that we could execute with any dataset open, without re-writing what is a mildly complex loop each time. When we want to save such a snippet, we usually write an ado-file: we name the file <syntaxhighlight lang="stata" inline>levelslist.ado</syntaxhighlight> and we add a starbang line and some comments with some metadata about the code. The full file would look something like this:<br />
<br />
<syntaxhighlight lang="stata"><br />
*! Version 0.1 published 24 November 2020<br />
*! by Benjamin Daniels bbdaniels@gmail.com<br />
<br />
// A program to print all levels of variables<br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
// Loop over variables<br />
foreach var of varlist * {<br />
<br />
// Get levels and display name and label of variable<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
<br />
// Print the value of each level for the current variable<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
The file would then just need to be run using <syntaxhighlight lang="stata" inline>run levelslist.ado</syntaxhighlight> in the runfile for the reproducibility package to ensure that the command <syntaxhighlight lang="stata" inline>levelslist</syntaxhighlight> would be available to all do-files in that package (since programs have a global scope in Stata). However, this command is not very useful at this stage: it outputs far too much useless information, particularly when variables take integer or continuous values with many levels. The next section will introduce code that allows such commands to be customizable within each context you want to use them.<br />
<br />
==The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command takes a program block and allows its inputs to be customized based on the context it is being executed in. The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command enables all the main features of Stata that appear in ordinary commands, including input lists (such as variable lists or file names), <syntaxhighlight lang="stata" inline>if</syntaxhighlight> and <syntaxhighlight lang="stata" inline>in</syntaxhighlight> restrictions, <syntaxhighlight lang="stata" inline>using</syntaxhighlight> targets, <syntaxhighlight lang="stata" inline>=</syntaxhighlight> applications, weights, and options (after the option comma in the command).<br />
<br />
The help file for the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command is extensive and allows lots of automated checks and advanced features, particularly for modern features like factor variables and time series (<syntaxhighlight lang="stata" inline>fv</syntaxhighlight> and <syntaxhighlight lang="stata" inline>ts</syntaxhighlight>). For advanced applications, always consult the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> help file to see how to accomplish your objective. For now, we will take a simple tour of how <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> creates an adaptive command.<br />
<br />
First, let's add simple syntax allowing the user to select the variables and observations they want to include. We might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
qui levelsof `var' , local(levels)<br />
di " "<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
There are several key features to note here. First, we write <syntaxhighlight lang="stata" inline>anything</syntaxhighlight> in the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command to allow the user to write absolutely anything they like as the arguments to be passed into the program. By default, this is assigned to the string local <syntaxhighlight lang="stata" inline>`anything'</syntaxhighlight> and can be recovered throughout the program. Recall that local macros in Stata have strictly local scope; in this case, that means locals from the calling do-file will not be passed into the program, and locals from the program will not be passed back into the calling do-file.<br />
<br />
Second, we write <syntaxhighlight lang="stata" inline>[if]</syntaxhighlight> in brackets to declare that the user can optionally declare an if-restriction to the command. This does nothing on its own: it simply creates another local string macro called <syntaxhighlight lang="stata" inline>`if'</syntaxhighlight> containing the restriction. However, Stata provides the implementation shortcut <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> to implement this restriction. By calling <syntaxhighlight lang="stata" inline>marksample touse</syntaxhighlight>, Stata creates a temporary variable <syntaxhighlight lang="stata" inline>`touse'</syntaxhighlight> for every observation indicating whether it satisfies the if-restriction or not. <br />
<br />
Then, the if-restriction must be applied: we can <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> the data and then <syntaxhighlight lang="stata" inline>drop</syntaxhighlight> the ineligible observations before running more code. This is an appropriate choice here for several reasons: <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> will always restore the data to the original state at the end of program execution, no matter what happens later in the program, due to its scope; <syntaxhighlight lang="stata" inline>restore</syntaxhighlight> is not even needed here. For this reason, we will often only use <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> in this context in programming, and prefer other methods for loading and re-loading data inside the program block.<br />
<br />
Now, we can run commands like:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist foreign<br />
levelslist foreign make if foreign == 1<br />
<br />
sysuse census.dta<br />
levelslist region<br />
levelslist state if region == 1<br />
</syntaxhighlight><br />
<br />
Other <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> elements work similarly, although they are not parsed through <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> (except <syntaxhighlight lang="stata" inline>in</syntaxhighlight>). The <syntaxhighlight lang="stata" inline>using</syntaxhighlight> syntax is typically used to target a file on the operating system; when you want to import or export data this is the feature of choice, and you should always test and implement it with compound double quotes (for example, <syntaxhighlight lang="stata" inline>`" `using' "'</syntaxhighlight>) and determine whether or not you want to pass <syntaxhighlight lang="stata" inline>using</syntaxhighlight> itself into the <syntaxhighlight lang="stata" inline>`using'</syntaxhighlight> macro by writing <syntaxhighlight lang="stata" inline>[using/]</syntaxhighlight> instead. See the helpfile for details.<br />
<br />
Finally, the options syntax allows optional triggers to be implemented. Let's allow the user to request value labels, by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if] , [VALuelabels]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
qui levelsof `var' , local(levels)<br />
di " "<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
// Implement value label option if specified<br />
if "`valuelabels'" != "" {<br />
local thisLabel : label (`var') `word'<br />
local thisLabel = ": `thisLabel'"<br />
}<br />
di " `word'`thisLabel'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
When the <syntaxhighlight lang="stata" inline>valuelabels</syntaxhighlight> option is specified (using either <syntaxhighlight lang="stata" inline>, val</syntaxhighlight> as an allowed abbreviation by the capitalization or writing out its full name), the <syntaxhighlight lang="stata" inline>`valuelabels'</syntaxhighlight> macro will contain <syntaxhighlight lang="stata" inline>"valuelabels"</syntaxhighlight>. Otherwise it will be empty. Therefore simple conditionals allow options to be checked and executed. Now we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse census.dta , clear<br />
levelslist region , val<br />
</syntaxhighlight><br />
<br />
and we would get:<br />
<br />
<syntaxhighlight lang="stata"><br />
Levels of region: Census region<br />
1: NE<br />
2: N Cntrl<br />
3: South<br />
4: West<br />
</syntaxhighlight><br />
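<br />
To see exactly what <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> stores for an on/off option, the sketch below isolates that behavior in a hypothetical program, <syntaxhighlight lang="stata" inline>optdemo</syntaxhighlight>, which is not part of the original tutorial. It displays the option macro and also returns it in <syntaxhighlight lang="stata" inline>r(valuelabels)</syntaxhighlight> so that the result can be checked.<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop optdemo<br />
prog def optdemo, rclass<br />
    // Hypothetical demonstration program, not from the original article<br />
    syntax [, VALuelabels]<br />
    // The macro holds the full lowercase option name, or nothing<br />
    di `"valuelabels macro: [`valuelabels']"'<br />
    return local valuelabels "`valuelabels'"<br />
end<br />
<br />
optdemo , val          // valuelabels macro: [valuelabels]<br />
optdemo , valuelabels  // valuelabels macro: [valuelabels]<br />
optdemo                // valuelabels macro: []<br />
</syntaxhighlight><br />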
<br />
However, the command would fail if we ran <syntaxhighlight lang="stata" inline>levelslist region state , val</syntaxhighlight>, because <syntaxhighlight lang="stata" inline>state</syntaxhighlight> is a string variable and cannot have value labels. So we might want to allow the user to specify the list of variables for which labels should be shown, as follows:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if] , [VALuelabels(string asis)]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
    qui levelsof `var' , local(levels)<br />
    di " "<br />
    di "Levels of `var': `: var label `var''"<br />
    foreach word in `levels' {<br />
        // Implement valuelabels option: reset, then fill only for listed variables<br />
        local thisLabel ""<br />
        if strpos(" `valuelabels' "," `var' ") >= 1 {<br />
            local thisLabel : label (`var') `word'<br />
            local thisLabel ": `thisLabel'"<br />
        }<br />
<br />
        // Display value (and label if requested)<br />
        di " `word'`thisLabel'"<br />
    }<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
Because the option is now declared as <syntaxhighlight lang="stata" inline>[VALuelabels(string asis)]</syntaxhighlight>, the <syntaxhighlight lang="stata" inline>`valuelabels'</syntaxhighlight> macro will contain whatever the user writes inside the parentheses, or nothing if the option is omitted. This requires two changes to the implementation. First, we reset <syntaxhighlight lang="stata" inline>`thisLabel'</syntaxhighlight> at every iteration so that it is empty whenever it does not apply. Second, we use a function like <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight> to check whether the current variable occurs in the list. When we write the helpfile, we will make clear that this option takes a list of variables. It is possible to require a variable list through the options syntax itself, but that can introduce issues. For example, if the command first loads data, a <syntaxhighlight lang="stata" inline>varlist</syntaxhighlight> check might fail against the data currently in memory. Because the check compares names as plain text, users must also supply full variable names; accepting abbreviations would require expanding them with a command like <syntaxhighlight lang="stata" inline>unab</syntaxhighlight>. Finally, note the extra spaces around both arguments of <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight>. These ensure that a variable whose name is a substring of another variable's name does not trigger the option. Now we can run <syntaxhighlight lang="stata" inline>levelslist region state , val(region)</syntaxhighlight> and get the results we wanted.<br />
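<br />
The effect of the padding can be checked directly. The sketch below uses two variable names from <syntaxhighlight lang="stata" inline>census.dta</syntaxhighlight>, <syntaxhighlight lang="stata" inline>pop</syntaxhighlight> and <syntaxhighlight lang="stata" inline>pop18p</syntaxhighlight>. It also shows the <syntaxhighlight lang="stata" inline>: list posof</syntaxhighlight> extended macro function as one possible alternative to <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight>; the original program does not use it. Because <syntaxhighlight lang="stata" inline>posof</syntaxhighlight> compares whole space-separated words, it needs no padding. The local <syntaxhighlight lang="stata" inline>valuelabels</syntaxhighlight> is set by hand here to imitate what a user might pass to the option.<br />
<br />
<syntaxhighlight lang="stata"><br />
// Unpadded: "pop" is found inside "pop18p", a false match<br />
di strpos("pop18p", "pop")              // displays 1<br />
// Padded: " pop " does not occur in " pop18p ", so no match<br />
di strpos(" pop18p ", " pop ")          // displays 0<br />
<br />
// Alternative: position of a whole word in a list (0 if absent)<br />
local valuelabels "pop18p region"<br />
local pos : list posof "pop" in valuelabels<br />
di `pos'                                // displays 0<br />
local pos : list posof "region" in valuelabels<br />
di `pos'                                // displays 2<br />
</syntaxhighlight><br />
<br />
Inside <syntaxhighlight lang="stata" inline>levelslist</syntaxhighlight>, the equivalent check would compute <syntaxhighlight lang="stata" inline>local pos : list posof "`var'" in valuelabels</syntaxhighlight> and then test <syntaxhighlight lang="stata" inline>if `pos' > 0</syntaxhighlight>. Either approach still requires full variable names.<br />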
<br />
==The <syntaxhighlight lang="stata" inline>temp</syntaxhighlight> commands==<br />
<br />
Stata has a set of <syntaxhighlight lang="stata" inline>temp</syntaxhighlight> commands that can be used to store information temporarily. This functionality</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Research_Documentation&diff=7925Research Documentation2021-01-27T23:19:57Z<p>Bbdaniels: /* What to include in research documentation */</p>
<hr />
<div>When [[Dissemination|publishing research]], it is important to make '''documentation''' available so that readers can understand the details of the [[Research design|research design]] that the work reports. This includes all of the technical details and decisions that could influence how the findings are read or understood. Usually, this will involve producing a document along the lines of a methodological note or appendix. That document will describe how a given study was designed and how the design was carried out. The level of detail in such a document should be relatively high. This page will describe some common approaches to compiling this kind of material and retaining the needed information in an organized fashion throughout the life of a research project.<br />
<br />
== Read First ==<br />
* '''Research documentation''' provides the context needed to understand the results of a given research output.<br />
* There is no standard form for this documentation, and its location and format will depend on the type of research output produced.<br />
* For academic materials, this documentation often takes the form of a structured methodological appendix.<br />
* For policy outputs or online products, it may be appropriate to include an informative <code>README</code> webpage or document.<br />
* The most important process for preparing this documentation will be retaining and organizing the needed information throughout the life of the project, so that the team will not have to search through communications or data archives for small details at publication time.<br />
<br />
== What to include in research documentation ==<br />
<br />
'''Research documentation''' should include all the information that is needed to understand the underlying design for the research output. This can include descriptions of:<br />
<br />
* Populations of interest that informed the study<br />
* Methods of sampling or other sources of data about selecting the units of observation that were actually included in the study<br />
* Power calculations and pre-analysis plans<br />
* Field work, including data collection or experimental manipulation, such as study protocols and monitoring or quality assurance information<br />
* Data collection tools such as survey instruments, search keywords, and instructions or code for API requests or database queries<br />
* Statistical approaches such as definitions of key constructed indicators, corrections or adjustments to data, and precise definitions of estimators and estimation procedures<br />
* Data completeness, including units or quantities that were planned but not observed, and "tracking" information<br />
<br />
All of the research documentation taken together should broadly allow a reader to understand how information was gathered, what it represents, what kind of information and data files to expect, and how to relate that information to the results of the research. Research documentation is not a complete guide to data, however; it does not need to provide the level of detail or instructions that would enable a reader to approach different research questions using the same data.<br />
<br />
Documentation will take different forms depending on the information included. Much of it will be written narrative rather than, for example, formal data sets. Understanding research documentation should not require the user to have any special software or to undertake any analytical tasks themselves. Relevant datasets (such as tracking of units of observation over time) might be included alongside the documentation, but the documentation should summarize in narrative form all the information from that dataset that is likely to affect the interpretation of the research.<br />
<br />
== Structuring research documentation as a publication appendix ==<br />
<br />
If you are preparing documentation to accompany the publication of an academic output such as a working paper or journal article, the most common form of research documentation is a structured supplemental appendix. Check the journal's publication process for details. Some publishers allow unlimited supplementary materials to be included in a format such as an author-created document. These materials may or may not be included under the peer review of the main manuscript and might only be intended to provide context for readers and reviewers. In this case you should provide complete information in that material. Other publishers expect all supplementary materials to be read and reviewed as part of the publication process. In this case you should provide the minimum additional detail required to understand the research here (since much of the appendix will likely be taken up by supplementary results rather than documentation), and consider other methods for releasing complete documentation, such as self-publication on OSF or Zenodo.<br />
<br />
Since there is unlimited space and you may have a large amount of material to include in a documentation appendix, organization is essential. It is appropriate to have several appendices that cover different aspects of the research. For example, Appendix A may include information about the study population and data, such as the total number of units available for observation, the number selected or included for observation, the number successfully included, and descriptive statistics about subgroups, strata, clusters, or other units relevant to the research. It could be accompanied by a tracking dataset with full information about the process. Appendix B might include information about an intended experimental manipulation in one section, and information about implementation, take-up, and fidelity in a second section. It could be accompanied by a dataset with key indicators. Appendix C might include data collection protocols and definitions of constructed variables and comparisons with alternative definitions, and be accompanied by data collection instruments and illustrative figures. Each appendix should include relevant references. Supplementary exhibits should be numbered to correspond with the appendix they pertain to. More granular appendices are generally preferable so that referencing and numbering remain relatively uncomplicated.<br />
<br />
There have been many attempts to standardize some of these elements, such as the STROBE and CONSORT reporting checklists. Journals will let you know if they expect these exact templates to be followed. Even if they are not required, such templates can still be used directly or to provide inspiration or structure for the materials you might want to include.</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Research_Documentation&diff=7924Research Documentation2021-01-27T22:38:28Z<p>Bbdaniels: /* Structuring research documentation as a publication appendix */</p>
<hr />
<div>When [[Dissemination|publishing research]], it is important to make '''documentation''' available so that readers can understand the details of the [[Research design|research design]] that the work reports. This includes all of the technical details and decisions that could influence how the findings are read or understood. Usually, this will involve producing a document along the lines of a methodological note or appendix. That document will describe how a given study was designed and how the design was carried out. The level of detail in such a document should be relatively high. This page will describe some common approaches to compiling this kind of material and retaining the needed information in an organized fashion throughout the life of a research project.<br />
<br />
== Read First ==<br />
* '''Research documentation''' provides the context needed to understand the results of a given research output.<br />
* There is no standard form for this documentation, and its location and format will depend on the type of research output produced.<br />
* For academic materials, this documentation often takes the form of a structured methodological appendix.<br />
* For policy outputs or online products, it may be appropriate to include an informative <code>README</code> webpage or document.<br />
* The most important process for preparing this documentation will be retaining and organizing the needed information throughout the life of the project, so that the team will not have to search through communications or data archives for small details at publication time.<br />
<br />
== What to include in research documentation ==<br />
<br />
'''Research documentation''' should include all the information that is needed to understand the underlying design for the research output. This can include descriptions of:<br />
<br />
* Populations of interest that informed the study<br />
* Methods of sampling or other sources of data about selecting the units of observation that were actually included in the study<br />
* Field work, including data collection or experimental manipulation, such as study protocols and monitoring or quality assurance information<br />
* Data collection tools such as survey instruments, search keywords, and instructions or code for API requests or database queries<br />
* Statistical approaches such as definitions of key constructed indicators, corrections or adjustments to data, and precise definitions of estimators and estimation procedures<br />
* Data completeness, including units or quantities that were planned but not observed, and "tracking" information<br />
<br />
All of the research documentation taken together should broadly allow a reader to understand how information was gathered, what it represents, what kind of information and data files to expect, and how to relate that information to the results of the research. Research documentation is not a complete guide to data, however; it does not need to provide the level of detail or instructions that would enable a reader to approach different research questions using the same data.<br />
<br />
Documentation will take different forms depending on the information included. Much of it will be written narrative rather than, for example, formal data sets. Understanding research documentation should not require the user to have any special software or to undertake any analytical tasks themselves. Relevant datasets (such as tracking of units of observation over time) might be included alongside the documentation, but the documentation should summarize in narrative form all the information from that dataset that is likely to affect the interpretation of the research.<br />
<br />
== Structuring research documentation as a publication appendix ==<br />
<br />
If you are preparing documentation to accompany the publication of an academic output such as a working paper or journal article, the most common form of research documentation is a structured supplemental appendix. Check the journal's publication process for details. Some publishers allow unlimited supplementary materials to be included in a format such as an author-created document. These materials may or may not be included under the peer review of the main manuscript and might only be intended to provide context for readers and reviewers. In this case you should provide complete information in that material. Other publishers expect all supplementary materials to be read and reviewed as part of the publication process. In this case you should provide the minimum additional detail required to understand the research here (since much of the appendix will likely be taken up by supplementary results rather than documentation), and consider other methods for releasing complete documentation, such as self-publication on OSF or Zenodo.<br />
<br />
Since there is unlimited space and you may have a large amount of material to include in a documentation appendix, organization is essential. It is appropriate to have several appendices that cover different aspects of the research. For example, Appendix A may include information about the study population and data, such as the total number of units available for observation, the number selected or included for observation, the number successfully included, and descriptive statistics about subgroups, strata, clusters, or other units relevant to the research. It could be accompanied by a tracking dataset with full information about the process. Appendix B might include information about an intended experimental manipulation in one section, and information about implementation, take-up, and fidelity in a second section. It could be accompanied by a dataset with key indicators. Appendix C might include data collection protocols and definitions of constructed variables and comparisons with alternative definitions, and be accompanied by data collection instruments and illustrative figures. Each appendix should include relevant references. Supplementary exhibits should be numbered to correspond with the appendix they pertain to. More granular appendices are generally preferable so that referencing and numbering remain relatively uncomplicated.<br />
<br />
There have been many attempts to standardize some of these elements, such as the STROBE and CONSORT reporting checklists. Journals will let you know if they expect these exact templates to be followed. Even if they are not required, such templates can still be used directly or to provide inspiration or structure for the materials you might want to include.</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Research_Documentation&diff=7923Research Documentation2021-01-27T22:30:09Z<p>Bbdaniels: /* Structuring research documentation as a publication appendix */</p>
<hr />
<div>When [[Dissemination|publishing research]], it is important to make '''documentation''' available so that readers can understand the details of the [[Research design|research design]] that the work reports. This includes all of the technical details and decisions that could influence how the findings are read or understood. Usually, this will involve producing a document along the lines of a methodological note or appendix. That document will describe how a given study was designed and how the design was carried out. The level of detail in such a document should be relatively high. This page will describe some common approaches to compiling this kind of material and retaining the needed information in an organized fashion throughout the life of a research project.<br />
<br />
== Read First ==<br />
* '''Research documentation''' provides the context needed to understand the results of a given research output.<br />
* There is no standard form for this documentation, and its location and format will depend on the type of research output produced.<br />
* For academic materials, this documentation often takes the form of a structured methodological appendix.<br />
* For policy outputs or online products, it may be appropriate to include an informative <code>README</code> webpage or document.<br />
* The most important process for preparing this documentation will be retaining and organizing the needed information throughout the life of the project, so that the team will not have to search through communications or data archives for small details at publication time.<br />
<br />
== What to include in research documentation ==<br />
<br />
'''Research documentation''' should include all the information that is needed to understand the underlying design for the research output. This can include descriptions of:<br />
<br />
* Populations of interest that informed the study<br />
* Methods of sampling or other sources of data about selecting the units of observation that were actually included in the study<br />
* Field work, including data collection or experimental manipulation, such as study protocols and monitoring or quality assurance information<br />
* Data collection tools such as survey instruments, search keywords, and instructions or code for API requests or database queries<br />
* Statistical approaches such as definitions of key constructed indicators, corrections or adjustments to data, and precise definitions of estimators and estimation procedures<br />
* Data completeness, including units or quantities that were planned but not observed, and "tracking" information<br />
<br />
All of the research documentation taken together should broadly allow a reader to understand how information was gathered, what it represents, what kind of information and data files to expect, and how to relate that information to the results of the research. Research documentation is not a complete guide to data, however; it does not need to provide the level of detail or instructions that would enable a reader to approach different research questions using the same data.<br />
<br />
Documentation will take different forms depending on the information included. Much of it will be written narrative rather than, for example, formal data sets. Understanding research documentation should not require the user to have any special software or to undertake any analytical tasks themselves. Relevant datasets (such as tracking of units of observation over time) might be included alongside the documentation, but the documentation should summarize in narrative form all the information from that dataset that is likely to affect the interpretation of the research.<br />
<br />
== Structuring research documentation as a publication appendix ==<br />
<br />
If you are preparing documentation to accompany the publication of an academic output such as a working paper or journal article, the most common form of research documentation is a structured supplemental appendix. Check the journal's publication process for details. Some publishers allow unlimited supplementary materials to be included in a format such as an author-created document. These materials may or may not be included under the peer review of the main manuscript and might only be intended to provide context for readers and reviewers. In this case you should provide complete information in that material. Other publishers expect all supplementary materials to be read and reviewed as part of the publication process. In this case you should provide the minimum additional detail required to understand the research here (since much of the appendix will likely be taken up by supplementary results rather than documentation), and consider other methods for releasing complete documentation, such as self-publication on OSF or Zenodo.<br />
<br />
Since there is unlimited space and you may have a large amount of material to include in a documentation appendix, organization is essential. It is appropriate to have several appendices that cover different aspects of the research. For example, Appendix A may include information about the study population and data, such as the total number of units available for observation, the number selected or included for observation, the number successfully included, and descriptive statistics about subgroups, strata, clusters, or other units relevant to the research. It could be accompanied by a tracking dataset with full information about the process. Appendix B might include information about an intended experimental manipulation in one section, and information about implementation, take-up, and fidelity in a second section. It could be accompanied by a dataset with key indicators. Appendix C might include definitions of constructed variables and comparisons with alternative definitions, including some illustrative figures. Each appendix should include relevant references. Supplementary exhibits should be numbered to correspond with the appendix they pertain to. More granular appendices are generally preferable so that referencing and numbering remain relatively uncomplicated.<br />
<br />
There have been many attempts to standardize some of these elements, such as the STROBE and CONSORT reporting checklists. Journals will let you know if they expect these exact templates to be followed. Even if they are not required, such templates can still be used directly or to provide inspiration or structure for the materials you might want to include.</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Research_Documentation&diff=7922Research Documentation2021-01-27T22:10:03Z<p>Bbdaniels: </p>
<hr />
<div>When [[Dissemination|publishing research]], it is important to make '''documentation''' available so that readers can understand the details of the [[Research design|research design]] that the work reports. This includes all of the technical details and decisions that could influence how the findings are read or understood. Usually, this will involve producing a document along the lines of a methodological note or appendix. That document will describe how a given study was designed and how the design was carried out. The level of detail in such a document should be relatively high. This page will describe some common approaches to compiling this kind of material and retaining the needed information in an organized fashion throughout the life of a research project.<br />
<br />
== Read First ==<br />
* '''Research documentation''' provides the context needed to understand the results of a given research output.<br />
* There is no standard form for this documentation, and its location and format will depend on the type of research output produced.<br />
* For academic materials, this documentation often takes the form of a structured methodological appendix.<br />
* For policy outputs or online products, it may be appropriate to include an informative <code>README</code> webpage or document.<br />
* The most important process for preparing this documentation will be retaining and organizing the needed information throughout the life of the project, so that the team will not have to search through communications or data archives for small details at publication time.<br />
<br />
== What to include in research documentation ==<br />
<br />
'''Research documentation''' should include all the information that is needed to understand the underlying design for the research output. This can include descriptions of:<br />
<br />
* Populations of interest that informed the study<br />
* Methods of sampling or other sources of data about selecting the units of observation that were actually included in the study<br />
* Field work, including data collection or experimental manipulation, such as study protocols and monitoring or quality assurance information<br />
* Data collection tools such as survey instruments, search keywords, and instructions or code for API requests or database queries<br />
* Statistical approaches such as definitions of key constructed indicators, corrections or adjustments to data, and precise definitions of estimators and estimation procedures<br />
* Data completeness, including units or quantities that were planned but not observed, and "tracking" information<br />
<br />
All of the research documentation taken together should broadly allow a reader to understand how information was gathered, what it represents, what kind of information and data files to expect, and how to relate that information to the results of the research. Research documentation is not a complete guide to data, however; it does not need to provide the level of detail or instructions that would enable a reader to approach different research questions using the same data.<br />
<br />
Documentation will take different forms depending on the information included. Much of it will be written narrative rather than, for example, formal data sets. Understanding research documentation should not require the user to have any special software or to undertake any analytical tasks themselves. Relevant datasets (such as tracking of units of observation over time) might be included alongside the documentation, but the documentation should summarize in narrative form all the information from that dataset that is likely to affect the interpretation of the research.<br />
<br />
== Structuring research documentation as a publication appendix ==<br />
<br />
If you are preparing documentation to accompany the publication of an academic output such as a working paper or journal article, the most common form of research documentation is a structured supplemental appendix. Check the journal's publication process for details. Some publishers allow unlimited supplementary materials to be included in a format such as an author-created document. These materials may or may not be included under the peer review of the main manuscript and might only be intended to provide context for readers and reviewers. In this case you should provide complete information in that material. Other publishers expect all supplementary materials to be read and reviewed as part of the publication process. In this case you should provide the minimum additional detail required to understand the research here, and consider other methods for releasing complete documentation, such as self-publication on OSF or Zenodo.</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Research_Documentation&diff=7921Research Documentation2021-01-27T21:49:15Z<p>Bbdaniels: /* What to include in research documentation */</p>
<hr />
<div>When [[Dissemination|publishing research]], it is important to make '''documentation''' available so that readers can understand the details of the [[Research design|research design]] that the work reports. This includes all of the technical details and decisions that could influence how the findings are read or understood. Usually, this will involve producing a document along the lines of a methodological note or appendix. That document will describe how a given study was designed and how the design was carried out. The level of detail in such a document should be relatively high. This page will describe some common approaches to compiling this kind of material and retaining the needed information in an organized fashion throughout the life of a research project.<br />
<br />
== Read First ==<br />
* '''Research documentation''' provides the context needed to understand the results of a given research output.<br />
* There is no standard form for this documentation, and its location and format will depend on the type of research output produced.<br />
* For academic materials, this documentation often takes the form of a structured methodological appendix.<br />
* For policy outputs or online products, it may be appropriate to include an informative <code>README</code> webpage or document.<br />
* The most important process for preparing this documentation will be retaining and organizing the needed information throughout the life of the project, so that the team will not have to search through communications or data archives for small details at publication time.<br />
<br />
== What to include in research documentation ==<br />
<br />
'''Research documentation''' should include all the information that is needed to understand the underlying design for the research output. This can include descriptions of:<br />
<br />
* Populations of interest that informed the study<br />
* Methods of sampling or other sources of data about selecting the units of observation that were actually included in the study<br />
* Field work, including data collection or experimental manipulation, such as study protocols and monitoring or quality assurance information<br />
* Statistical approaches such as definitions of key constructed indicators, corrections or adjustments to data, and precise definitions of estimators and estimation procedures<br />
* Data completeness, including units or quantities that were planned but not observed, and "tracking" information<br />
<br />
All of the research documentation taken together should broadly allow a reader to understand how information was gathered, what it represents, what kind of information and data files to expect, and how to relate that information to the results of the research. Research documentation is not a complete guide to data, however; it does not need to provide the level of detail or instructions that would enable a reader to approach different research questions using the same data.<br />
<br />
Documentation will take different forms depending on the information included. Much of it will be written narrative rather than, for example, formal data sets. Understanding research documentation should not require the user to have any special software or to undertake any analytical tasks themselves. Relevant datasets (such as tracking of units of observation over time) might be included alongside the documentation, but the documentation should summarize in narrative form all the information from that dataset that is likely to affect the interpretation of the research.</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Research_Documentation&diff=7920Research Documentation2021-01-27T21:44:22Z<p>Bbdaniels: </p>
<hr />
<div>When [[Dissemination|publishing research]], it is important to make '''documentation''' available so that readers can understand the details of the [[Research design|research design]] that the work reports. This includes all of the technical details and decisions that could influence how the findings are read or understood. Usually, this will involve producing a document along the lines of a methodological note or appendix. That document will describe how a given study was designed and how the design was carried out. The level of detail in such a document should be relatively high. This page describes some common approaches to compiling this kind of material and retaining the needed information in an organized fashion throughout the life of a research project.<br />
<br />
== Read First ==<br />
* '''Research documentation''' provides the context needed to understand the results of a given research output.<br />
* There is no standard form for this documentation, and its location and format will depend on the type of research output produced.<br />
* For academic materials, this documentation often takes the form of a structured methodological appendix.<br />
* For policy outputs or online products, it may be appropriate to include an informative <code>README</code> webpage or document.<br />
* The most important process for preparing this documentation will be retaining and organizing the needed information throughout the life of the project, so that the team will not have to search through communications or data archives for small details at publication time.<br />
<br />
== What to include in research documentation ==<br />
<br />
'''Research documentation''' should include all the information that is needed to understand the underlying design for the research output. This can include descriptions of:<br />
<br />
* Populations of interest that informed the study<br />
* Sampling methods, or other information about how the units of observation actually included in the study were selected<br />
* Field work, including data collection or experimental manipulation, such as study protocols and monitoring or quality assurance information<br />
* Statistical approaches such as definitions of key constructed indicators, corrections or adjustments to data, and precise definitions of estimators and estimation procedures<br />
* Data completeness, including units or quantities that were planned but not observed, and "tracking" information</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Research_Documentation&diff=7919Research Documentation2021-01-26T22:13:53Z<p>Bbdaniels: </p>
<hr />
<div>When [[Dissemination|publishing research]], it is important to make '''documentation''' available so that readers can understand the details of the [[Research design|research design]] that the work reports. This includes all of the technical details and decisions that could influence how the findings are read or understood. Usually, this will involve producing a document along the lines of a methodological note or appendix. That document will describe how a given study was designed and how the design was carried out. The level of detail in such a document should be relatively high. This page describes some common approaches to compiling this kind of material and retaining the needed information in an organized fashion throughout the life of a research project.<br />
<br />
== Read First ==<br />
* '''Research documentation''' provides the context needed to understand the results of a given research output.<br />
* There is no standard form for this documentation, and its location and format will depend on the type of research output produced.<br />
* For academic materials, this documentation often takes the form of a structured methodological appendix.<br />
* For policy outputs or online products, it may be appropriate to include an informative <code>README</code> webpage or document.<br />
* The most important process for preparing this documentation will be retaining and organizing the needed information throughout the life of the project, so that the team will not have to search through communications or data archives for small details at publication time.</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Programming_(Ado-files)&diff=7879Stata Coding Practices: Programming (Ado-files)2021-01-13T15:34:36Z<p>Bbdaniels: </p>
<hr />
<div>Programs and ado-files are the main methods by which Stata code is condensed and generalized. By writing versions of code that apply to arbitrary inputs and saving that code in a separate file, the application of the code is cleaner in the main do-file and it becomes easier to re-use the same analytical process on other datasets in the future. Stata has special commands that enable this functionality. All commands on SSC are written as ado-files by other programmers; it is also possible to embed programs in ordinary do-files to save space and improve organization of code.<br />
<br />
==Read First==<br />
<br />
This article uses the terms "programming", "ado-files", and "user-written commands" somewhat interchangeably, in contrast to the ordinary writing of do-files. It does not assume that you are actually writing an ado-file (as opposed to a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> definition in an ordinary do-file), nor that you are writing a command for distribution. In any of these cases, Stata programming relies on several core features:<br />
<br />
* The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command sets up the code environment for writing a program into memory.<br />
* The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses inputs into a program as macros that can be used within the scope of that program execution.<br />
* The <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> commands all create objects that can be used within the scope of program execution to avoid any conflict with arbitrary data structures.<br />
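<br />
As a minimal sketch of these temporary objects in practice, the hypothetical program <syntaxhighlight lang="stata" inline>tempdemo</syntaxhighlight> below (the name is illustrative only) requests one object of each kind while <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight> is in memory. The temporary variable, scalar, and file are all discarded when the program ends, which the variable count displayed afterwards should confirm.<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop tempdemo<br />
prog def tempdemo<br />
<br />
    // Each temp command stores a unique, unused name in a local macro<br />
    tempvar doubled<br />
    tempname meanprice<br />
    tempfile snapshot<br />
<br />
    // Temporary variable: dropped automatically when the program ends<br />
    qui gen `doubled' = 2 * price<br />
    di "Variables while the program runs: " c(k)<br />
<br />
    // Temporary scalar: cannot overwrite a scalar the user already has<br />
    qui sum price<br />
    scalar `meanprice' = r(mean)<br />
    di "Mean price: " %9.2f `meanprice'<br />
<br />
    // Temporary file: erased automatically when the program ends<br />
    qui save `"`snapshot'"'<br />
<br />
end<br />
<br />
sysuse auto.dta , clear<br />
tempdemo<br />
di "Variables after the program ends: " c(k)<br />
</syntaxhighlight><br />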
<br />
==The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command defines the scope of a Stata program inside a do-file or ado-file. When a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command block is executed, Stata stores (until the end of the session) the sequence of commands written inside the block and assigns them to the command name used in the <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command. Running <syntaxhighlight lang="stata" inline>program drop</syntaxhighlight> before the block (usually with <syntaxhighlight lang="stata" inline>capture</syntaxhighlight>, so that it does not fail when the program does not exist yet) ensures that the name is available, since defining a program whose name is already in memory causes an error. For example, we might write the following program in an ordinary do-file:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop autoreg<br />
prog def autoreg<br />
<br />
    reg price mpg i.foreign<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
After executing this command block (note that <syntaxhighlight lang="stata" inline>end</syntaxhighlight> tells Stata where to stop reading), we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
If we did this, Stata would output:<br />
<br />
<syntaxhighlight lang="stata"><br />
. autoreg<br />
<br />
      Source |       SS           df       MS      Number of obs   =        74<br />
-------------+----------------------------------   F(2, 71)        =     14.07<br />
       Model |   180261702         2  90130850.8   Prob > F        =    0.0000<br />
    Residual |   454803695        71  6405685.84   R-squared       =    0.2838<br />
-------------+----------------------------------   Adj R-squared   =    0.2637<br />
       Total |   635065396        73  8699525.97   Root MSE        =    2530.9<br />
<br />
------------------------------------------------------------------------------<br />
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]<br />
-------------+----------------------------------------------------------------<br />
         mpg |  -294.1955   55.69172    -5.28   0.000    -405.2417   -183.1494<br />
             |<br />
     foreign |<br />
     Foreign |   1767.292    700.158     2.52   0.014     371.2169    3163.368<br />
       _cons |   11905.42   1158.634    10.28   0.000     9595.164    14215.67<br />
------------------------------------------------------------------------------<br />
</syntaxhighlight><br />
<br />
In other words, Stata has stored the command <syntaxhighlight lang="stata" inline>reg price mpg i.foreign</syntaxhighlight> and will execute it whenever <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> is run, just as if it were an ordinary command.<br />
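<br />
Because the stored commands run exactly as if they had been typed, the results they leave behind are also available afterwards. The sketch below reuses the <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> example and assumes only <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight>. It shows that defining a program whose name is already in memory fails (with return code 110) unless it is dropped first, and that the estimates posted by <syntaxhighlight lang="stata" inline>regress</syntaxhighlight> in <syntaxhighlight lang="stata" inline>e()</syntaxhighlight> remain accessible after the program returns. The local <syntaxhighlight lang="stata" inline>redefine_rc</syntaxhighlight> is an illustrative name.<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop autoreg<br />
prog def autoreg<br />
<br />
    reg price mpg i.foreign<br />
<br />
end<br />
<br />
// Defining the same name again without dropping it first fails<br />
cap prog def autoreg<br />
local redefine_rc = _rc<br />
di "Return code when redefining autoreg: `redefine_rc'"<br />
<br />
sysuse auto.dta , clear<br />
qui autoreg<br />
<br />
// Results posted by regress are still available after the program returns<br />
di "Observations used: " e(N)<br />
di "Coefficient on mpg: " %9.4f _b[mpg]<br />
</syntaxhighlight><br />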
<br />
As a first extension, we might write a command that does not depend on any particular dataset, such as one that lists all the values of each variable in memory. Such a program might look like the following:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
foreach var of varlist * {<br />
    qui levelsof `var' , local(levels)<br />
    di "Levels of `var': `: var label `var''"<br />
    foreach word in `levels' {<br />
        di " `word'"<br />
    }<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
We could then run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist<br />
</syntaxhighlight><br />
<br />
Similarly, we could open any other dataset in place of <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight>. We now have a useful piece of code that we can execute with any dataset in memory, without re-writing a mildly complex loop each time. When we want to save such a snippet, we usually write an ado-file: we name the file <syntaxhighlight lang="stata" inline>levelslist.ado</syntaxhighlight>, matching the program name, and add a starbang (<syntaxhighlight lang="stata" inline>*!</syntaxhighlight>) line and some comments with metadata about the code. The full file would look something like this:<br />
<br />
<syntaxhighlight lang="stata"><br />
*! Version 0.1 published 24 November 2020<br />
*! by Benjamin Daniels bbdaniels@gmail.com<br />
<br />
// A program to print all levels of variables<br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
// Loop over variables<br />
foreach var of varlist * {<br />
<br />
    // Get levels and display name and label of variable<br />
    qui levelsof `var' , local(levels)<br />
    di "Levels of `var': `: var label `var''"<br />
<br />
    // Print the value of each level for the current variable<br />
    foreach word in `levels' {<br />
        di " `word'"<br />
    }<br />
<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
The runfile of the reproducibility package would then only need to execute <syntaxhighlight lang="stata" inline>run levelslist.ado</syntaxhighlight> to make the command <syntaxhighlight lang="stata" inline>levelslist</syntaxhighlight> available to all do-files in that package, since a program stored in memory can be called from anywhere for the rest of the Stata session. However, this command is not very useful yet: it prints far too much information, particularly for variables that take many distinct integer or continuous values. The next section introduces code that lets such commands be customized to each context in which they are used.<br />
<br />
==The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command, placed inside a program block, allows the program's inputs to be customized to the context in which it is executed. It supports all the main input elements that appear in ordinary Stata commands, including input lists (such as variable lists or file names), <syntaxhighlight lang="stata" inline>if</syntaxhighlight> and <syntaxhighlight lang="stata" inline>in</syntaxhighlight> restrictions, <syntaxhighlight lang="stata" inline>using</syntaxhighlight> targets, <syntaxhighlight lang="stata" inline>=exp</syntaxhighlight> expressions, weights, and options (after the comma in the command).<br />
<br />
The help file for the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command is extensive: the command supports many automated checks and advanced features, particularly for factor variables and time-series operators (<syntaxhighlight lang="stata" inline>fv</syntaxhighlight> and <syntaxhighlight lang="stata" inline>ts</syntaxhighlight>). For advanced applications, always consult the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> help file to see how to accomplish your objective. For now, we will take a simple tour of how <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> creates an adaptive command.<br />
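<br />
As a minimal illustration of what <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> produces, the hypothetical program <syntaxhighlight lang="stata" inline>showsyntax</syntaxhighlight> below declares an optional variable list, optional <syntaxhighlight lang="stata" inline>if</syntaxhighlight> and <syntaxhighlight lang="stata" inline>in</syntaxhighlight> restrictions, and one option, <syntaxhighlight lang="stata" inline>detail</syntaxhighlight> (also an illustrative name), and then displays the local macros that <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> creates. It is a sketch for exploring the parser, not a useful command.<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop showsyntax<br />
prog def showsyntax, rclass<br />
<br />
    // Optional varlist, if, in, and one option abbreviable to "det"<br />
    syntax [varlist] [if] [in] [, DETail]<br />
<br />
    // Each element is stored in a local macro with the same name<br />
    di `"varlist: `varlist'"'<br />
    di `"if:      `if'"'<br />
    di `"in:      `in'"'<br />
    di `"detail:  `detail'"'<br />
<br />
    // Return two of the parsed elements so they can be inspected<br />
    return local varlist "`varlist'"<br />
    return local detail "`detail'"<br />
<br />
end<br />
<br />
sysuse auto.dta , clear<br />
showsyntax price mpg if foreign == 1 , det<br />
</syntaxhighlight><br />
<br />
Because the variable list is declared as optional, running <syntaxhighlight lang="stata" inline>showsyntax if foreign == 1</syntaxhighlight> instead would fill <syntaxhighlight lang="stata" inline>`varlist'</syntaxhighlight> with every variable in memory.<br />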
<br />
First, let's add simple syntax allowing the user to select the variables and observations they want to include. We might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
    qui levelsof `var' , local(levels)<br />
    di " "<br />
    di "Levels of `var': `: var label `var''"<br />
    foreach word in `levels' {<br />
        di " `word'"<br />
    }<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
There are several key features to note here. First, we write <syntaxhighlight lang="stata" inline>anything</syntaxhighlight> in the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command to allow the user to write absolutely anything they like as the arguments to be passed into the program. By default, this is assigned to the string local <syntaxhighlight lang="stata" inline>`anything'</syntaxhighlight> and can be recovered throughout the program. Recall that local macros in Stata have strictly local scope; in this case, that means locals from the calling do-file will not be passed into the program, and locals from the program will not be passed back into the calling do-file.<br />
<br />
Second, we write <syntaxhighlight lang="stata" inline>[if]</syntaxhighlight> in brackets to declare that the user may optionally specify an if-restriction for the command. This does nothing on its own: it simply creates another local string macro called <syntaxhighlight lang="stata" inline>`if'</syntaxhighlight> containing the restriction. However, Stata provides the shortcut <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> to implement this restriction. Calling <syntaxhighlight lang="stata" inline>marksample touse</syntaxhighlight> creates a temporary variable, whose name is stored in the local <syntaxhighlight lang="stata" inline>`touse'</syntaxhighlight>, that equals 1 for observations satisfying the if-restriction and 0 otherwise.<br />
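<br />
The hypothetical program <syntaxhighlight lang="stata" inline>countsample</syntaxhighlight> below is a minimal sketch of <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> on its own: it accepts only <syntaxhighlight lang="stata" inline>if</syntaxhighlight> and <syntaxhighlight lang="stata" inline>in</syntaxhighlight>, marks the eligible observations, and counts them. With <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight> loaded, the restriction <syntaxhighlight lang="stata" inline>if foreign == 1</syntaxhighlight> should flag the 22 foreign cars, and <syntaxhighlight lang="stata" inline>in 1/10</syntaxhighlight> the first ten observations. The local <syntaxhighlight lang="stata" inline>n_foreign</syntaxhighlight> is an illustrative name.<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop countsample<br />
prog def countsample, rclass<br />
<br />
    // Accept only optional if and in restrictions<br />
    syntax [if] [in]<br />
<br />
    // Create a temporary 0/1 indicator; its name is stored in the local touse<br />
    marksample touse<br />
<br />
    // Count the observations flagged as eligible and return the count<br />
    qui count if `touse'<br />
    di "Observations meeting the restriction: " r(N)<br />
    return scalar N = r(N)<br />
<br />
end<br />
<br />
sysuse auto.dta , clear<br />
countsample if foreign == 1<br />
local n_foreign = r(N)<br />
countsample in 1/10<br />
</syntaxhighlight><br />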
<br />
Then, the if-restriction must be applied: we <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> the data and then drop the ineligible observations (here, with <syntaxhighlight lang="stata" inline>keep if</syntaxhighlight>) before running more code. This is appropriate here because data preserved inside a program are restored automatically when the program ends, whether it finishes normally or exits with an error, so no explicit <syntaxhighlight lang="stata" inline>restore</syntaxhighlight> is needed. For this reason, in programs we usually use <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> only in this way, and prefer other methods (such as temporary files) for saving and re-loading data inside the program block.<br />
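<br />
To see the automatic restore at work, the hypothetical program <syntaxhighlight lang="stata" inline>keepforeign</syntaxhighlight> below keeps a subset of the data and then deliberately exits with an error. This sketch assumes <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight> is loaded; all 74 observations should be back in memory afterwards, even though the program never reaches a <syntaxhighlight lang="stata" inline>restore</syntaxhighlight>.<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop keepforeign<br />
prog def keepforeign<br />
<br />
    preserve<br />
    qui keep if foreign == 1<br />
    di "Observations inside the program: " _N<br />
<br />
    // Deliberately exit with an error after changing the data<br />
    error 198<br />
<br />
end<br />
<br />
sysuse auto.dta , clear<br />
cap noi keepforeign<br />
di "Return code: " _rc<br />
di "Observations after the program: " _N<br />
</syntaxhighlight><br />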
<br />
Now, we can run commands like:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist foreign<br />
levelslist foreign make if foreign == 1<br />
<br />
sysuse census.dta , clear<br />
levelslist region<br />
levelslist state if region == 1<br />
</syntaxhighlight><br />
<br />
Other <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> elements work similarly, although most of them are not handled by <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> (the exceptions are <syntaxhighlight lang="stata" inline>in</syntaxhighlight> restrictions and weights). The <syntaxhighlight lang="stata" inline>using</syntaxhighlight> element is typically used to target a file on disk, and it is the element of choice when a program imports or exports data. Always test and implement file paths with compound double quotes (for example, <syntaxhighlight lang="stata" inline>`"`using'"'</syntaxhighlight>) so that paths containing spaces are handled correctly. Also decide whether the <syntaxhighlight lang="stata" inline>`using'</syntaxhighlight> macro should contain the word <syntaxhighlight lang="stata" inline>using</syntaxhighlight> itself: by default it does, while writing <syntaxhighlight lang="stata" inline>[using/]</syntaxhighlight> instead stores only the file name. See the helpfile for details.<br />
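<br />
The hypothetical program <syntaxhighlight lang="stata" inline>savefirst</syntaxhighlight> below is a minimal sketch of the <syntaxhighlight lang="stata" inline>using/</syntaxhighlight> form: it saves the first few observations of the current data to a file named by the user. The option <syntaxhighlight lang="stata" inline>n()</syntaxhighlight> and its default of 10 are assumptions made for illustration, and a <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight> stands in for a real path so that the example leaves nothing behind.<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop savefirst<br />
prog def savefirst<br />
<br />
    // using/ stores only the file name: no "using" keyword and no quotes<br />
    syntax using/ [, N(integer 10) replace]<br />
<br />
    preserve<br />
    qui keep in 1/`n'<br />
<br />
    // Compound double quotes keep paths with spaces intact<br />
    qui save `"`using'"' , `replace'<br />
<br />
end<br />
<br />
sysuse auto.dta , clear<br />
tempfile firstrows<br />
savefirst using `"`firstrows'"' , n(5) replace<br />
use `"`firstrows'"' , clear<br />
di "Observations saved: " _N<br />
</syntaxhighlight><br />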
<br />
Finally, options allow the user to switch on optional behavior. Let's allow the user to request value labels by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if] , [VALuelabels]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
    qui levelsof `var' , local(levels)<br />
    di " "<br />
    di "Levels of `var': `: var label `var''"<br />
    foreach word in `levels' {<br />
        // Implement value label option if specified<br />
        if "`valuelabels'" != "" {<br />
            local thisLabel : label (`var') `word'<br />
            local thisLabel = ": `thisLabel'"<br />
        }<br />
        di " `word'`thisLabel'"<br />
    }<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
When the <syntaxhighlight lang="stata" inline>valuelabels</syntaxhighlight> option is specified (either in full or abbreviated to <syntaxhighlight lang="stata" inline>, val</syntaxhighlight>, the minimum abbreviation indicated by the capitalization in the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> statement), the <syntaxhighlight lang="stata" inline>`valuelabels'</syntaxhighlight> macro will contain <syntaxhighlight lang="stata" inline>"valuelabels"</syntaxhighlight>; otherwise it will be empty. A simple conditional on whether the macro is empty therefore checks whether the option was specified. Now we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse census.dta , clear<br />
levelslist region , val<br />
</syntaxhighlight><br />
<br />
and we would get:<br />
<br />
<syntaxhighlight lang="stata"><br />
Levels of region: Census region<br />
1: NE<br />
2: N Cntrl<br />
3: South<br />
4: West<br />
</syntaxhighlight><br />
<br />
However, the command would fail if we ran <syntaxhighlight lang="stata" inline>levelslist region state , val</syntaxhighlight>, because <syntaxhighlight lang="stata" inline>state</syntaxhighlight> is a string variable and cannot have value labels. Instead, we might allow the user to specify the list of variables for which labels should be shown, as follows:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if] , [VALuelabels(string asis)]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
    qui levelsof `var' , local(levels)<br />
    di " "<br />
    di "Levels of `var': `: var label `var''"<br />
    foreach word in `levels' {<br />
        // Implement valuelabels option<br />
        local thisLabel ""<br />
        if strpos(" `valuelabels' "," `var' ") >= 1 {<br />
            local thisLabel : label (`var') `word'<br />
            local thisLabel = ": `thisLabel'"<br />
        }<br />
<br />
        // Display value (and label if requested)<br />
        di " `word'`thisLabel'"<br />
    }<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
Because the option is now declared as <syntaxhighlight lang="stata" inline>[VALuelabels(string asis)]</syntaxhighlight>, the <syntaxhighlight lang="stata" inline>`valuelabels'</syntaxhighlight> macro will either contain whatever the user typed inside the option's parentheses or be empty. We need to adjust the implementation slightly. First, we reset <syntaxhighlight lang="stata" inline>`thisLabel'</syntaxhighlight> on every iteration so that it is empty whenever a label is not requested. Second, we use a function like <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight> to check whether the current variable occurs in the list; when we write the helpfile, we will make clear that this option takes a list of variables. It is possible to require a variable list through the option syntax itself, but this can introduce problems: if, for example, the command first loads data, a <syntaxhighlight lang="stata" inline>varlist</syntaxhighlight> check might fail on the data currently in memory. This approach also requires the user to type full variable names, because abbreviations will not match (unless the program expands them with a command like <syntaxhighlight lang="stata" inline>unab</syntaxhighlight>). Finally, note the extra spaces around both arguments of <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight>; these ensure that a variable whose name is a substring of another variable's name does not trigger the option. Now we can run <syntaxhighlight lang="stata" inline>levelslist region state , val(region)</syntaxhighlight> and get the results we wanted.<br />
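<br />
The effect of the padding can be checked directly. This sketch uses two illustrative names, <syntaxhighlight lang="stata" inline>state</syntaxhighlight> and <syntaxhighlight lang="stata" inline>state2</syntaxhighlight>, where one is a prefix of the other, and assumes the user asked for labels on <syntaxhighlight lang="stata" inline>state2</syntaxhighlight> only. No data needs to be in memory; the locals <syntaxhighlight lang="stata" inline>naive_*</syntaxhighlight> and <syntaxhighlight lang="stata" inline>padded_*</syntaxhighlight> are hypothetical names.<br />
<br />
<syntaxhighlight lang="stata"><br />
// Suppose the user asked for value labels on state2 only<br />
local valuelabels "state2"<br />
<br />
foreach var in state state2 {<br />
    // Naive check: "state" is also found inside "state2"<br />
    local naive_`var' = strpos("`valuelabels'", "`var'") >= 1<br />
<br />
    // Padded check: " state " is not found inside " state2 "<br />
    local padded_`var' = strpos(" `valuelabels' ", " `var' ") >= 1<br />
<br />
    di "`var': naive = `naive_`var'', padded = `padded_`var''"<br />
}<br />
</syntaxhighlight><br />
<br />
Only the padded check correctly leaves <syntaxhighlight lang="stata" inline>state</syntaxhighlight> unflagged while still flagging <syntaxhighlight lang="stata" inline>state2</syntaxhighlight>.<br />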
<br />
==The <syntaxhighlight lang="stata" inline>temp</syntaxhighlight> commands==<br />
<br />
Stata has a set of <syntaxhighlight lang="stata" inline>temp</syntaxhighlight> commands that can be used to store information temporarily. This functionality</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Programming_(Ado-files)&diff=7878Stata Coding Practices: Programming (Ado-files)2021-01-13T15:31:05Z<p>Bbdaniels: </p>
<hr />
<div>Programs and ado-files are the main methods by which Stata code is condensed and generalized. By writing versions of code that apply to arbitrary inputs and saving that code in a separate file, the application of the code is cleaner in the main do-file and it becomes easier to re-use the same analytical process on other datasets in the future. Stata has special commands that enable this functionality. All commands on SSC are written as ado-files by other programmers; it is also possible to embed programs in ordinary do-files to save space and improve organization of code.<br />
<br />
==Read First==<br />
<br />
This article uses the terms "programming", "ado-files", and "user-written commands" somewhat interchangeably, in contrast to the ordinary writing of do-files. It does not assume that you are actually writing an ado-file (as opposed to a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> definition in an ordinary do-file), nor that you are writing a command for distribution. In any of these cases, Stata programming relies on several core features:<br />
<br />
* The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command sets up the code environment for writing a program into memory.<br />
* The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses inputs into a program as macros that can be used within the scope of that program execution.<br />
* The <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> commands all create objects that can be used within the scope of program execution to avoid any conflict with arbitrary data structures.<br />
<br />
==The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command defines the scope of a Stata program inside a do-file or ado-file. When a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command block is executed, Stata stores (until the end of the session) the sequence of commands written inside the block and assigns them to the command name used in the <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command. Running <syntaxhighlight lang="stata" inline>program drop</syntaxhighlight> before the block (usually with <syntaxhighlight lang="stata" inline>capture</syntaxhighlight>, so that it does not fail when the program does not exist yet) ensures that the name is available, since defining a program whose name is already in memory causes an error. For example, we might write the following program in an ordinary do-file:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop autoreg<br />
prog def autoreg<br />
<br />
    reg price mpg i.foreign<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
After executing this command block (note that <syntaxhighlight lang="stata" inline>end</syntaxhighlight> tells Stata where to stop reading), we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
If we did this, Stata would output:<br />
<br />
<syntaxhighlight lang="stata"><br />
. autoreg<br />
<br />
      Source |       SS           df       MS      Number of obs   =        74<br />
-------------+----------------------------------   F(2, 71)        =     14.07<br />
       Model |   180261702         2  90130850.8   Prob > F        =    0.0000<br />
    Residual |   454803695        71  6405685.84   R-squared       =    0.2838<br />
-------------+----------------------------------   Adj R-squared   =    0.2637<br />
       Total |   635065396        73  8699525.97   Root MSE        =    2530.9<br />
<br />
------------------------------------------------------------------------------<br />
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]<br />
-------------+----------------------------------------------------------------<br />
         mpg |  -294.1955   55.69172    -5.28   0.000    -405.2417   -183.1494<br />
             |<br />
     foreign |<br />
     Foreign |   1767.292    700.158     2.52   0.014     371.2169    3163.368<br />
       _cons |   11905.42   1158.634    10.28   0.000     9595.164    14215.67<br />
------------------------------------------------------------------------------<br />
</syntaxhighlight><br />
<br />
In other words, Stata has stored the command <syntaxhighlight lang="stata" inline>reg price mpg i.foreign</syntaxhighlight> and will execute it whenever <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> is run, just as if it were an ordinary command.<br />
<br />
As a first extension, we might write a command that does not depend on any particular dataset, such as one that lists all the values of each variable in memory. Such a program might look like the following:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
foreach var of varlist * {<br />
    qui levelsof `var' , local(levels)<br />
    di "Levels of `var': `: var label `var''"<br />
    foreach word in `levels' {<br />
        di " `word'"<br />
    }<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
We could then run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist<br />
</syntaxhighlight><br />
<br />
Similarly, we could open any other dataset in place of <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight>. We now have a useful piece of code that we can execute with any dataset in memory, without re-writing a mildly complex loop each time. When we want to save such a snippet, we usually write an ado-file: we name the file <syntaxhighlight lang="stata" inline>levelslist.ado</syntaxhighlight>, matching the program name, and add a starbang (<syntaxhighlight lang="stata" inline>*!</syntaxhighlight>) line and some comments with metadata about the code. The full file would look something like this:<br />
<br />
<syntaxhighlight lang="stata"><br />
*! Version 0.1 published 24 November 2020<br />
*! by Benjamin Daniels bbdaniels@gmail.com<br />
<br />
// A program to print all levels of variables<br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
// Loop over variables<br />
foreach var of varlist * {<br />
<br />
    // Get levels and display name and label of variable<br />
    qui levelsof `var' , local(levels)<br />
    di "Levels of `var': `: var label `var''"<br />
<br />
    // Print the value of each level for the current variable<br />
    foreach word in `levels' {<br />
        di " `word'"<br />
    }<br />
<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
The runfile of the reproducibility package would then only need to execute <syntaxhighlight lang="stata" inline>run levelslist.ado</syntaxhighlight> to make the command <syntaxhighlight lang="stata" inline>levelslist</syntaxhighlight> available to all do-files in that package, since a program stored in memory can be called from anywhere for the rest of the Stata session. However, this command is not very useful yet: it prints far too much information, particularly for variables that take many distinct integer or continuous values. The next section introduces code that lets such commands be customized to each context in which they are used.<br />
<br />
==The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command, placed inside a program block, allows the program's inputs to be customized to the context in which it is executed. It supports all the main input elements that appear in ordinary Stata commands, including input lists (such as variable lists or file names), <syntaxhighlight lang="stata" inline>if</syntaxhighlight> and <syntaxhighlight lang="stata" inline>in</syntaxhighlight> restrictions, <syntaxhighlight lang="stata" inline>using</syntaxhighlight> targets, <syntaxhighlight lang="stata" inline>=exp</syntaxhighlight> expressions, weights, and options (after the comma in the command).<br />
<br />
The help file for the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command is extensive: the command supports many automated checks and advanced features, particularly for factor variables and time-series operators (<syntaxhighlight lang="stata" inline>fv</syntaxhighlight> and <syntaxhighlight lang="stata" inline>ts</syntaxhighlight>). For advanced applications, always consult the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> help file to see how to accomplish your objective. For now, we will take a simple tour of how <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> creates an adaptive command.<br />
<br />
First, let's add simple syntax allowing the user to select the variables and observations they want to include. We might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
    qui levelsof `var' , local(levels)<br />
    di " "<br />
    di "Levels of `var': `: var label `var''"<br />
    foreach word in `levels' {<br />
        di " `word'"<br />
    }<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
There are several key features to note here. First, we write <syntaxhighlight lang="stata" inline>anything</syntaxhighlight> in the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command to allow the user to write absolutely anything they like as the arguments to be passed into the program. By default, this is assigned to the string local <syntaxhighlight lang="stata" inline>`anything'</syntaxhighlight> and can be recovered throughout the program. Recall that local macros in Stata have strictly local scope; in this case, that means locals from the calling do-file will not be passed into the program, and locals from the program will not be passed back into the calling do-file.<br />
<br />
Second, we write <syntaxhighlight lang="stata" inline>[if]</syntaxhighlight> in brackets to declare that the user may optionally specify an if-restriction for the command. This does nothing on its own: it simply creates another local string macro called <syntaxhighlight lang="stata" inline>`if'</syntaxhighlight> containing the restriction. However, Stata provides the shortcut <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> to implement this restriction. Calling <syntaxhighlight lang="stata" inline>marksample touse</syntaxhighlight> creates a temporary variable, whose name is stored in the local <syntaxhighlight lang="stata" inline>`touse'</syntaxhighlight>, that equals 1 for observations satisfying the if-restriction and 0 otherwise.<br />
<br />
Then, the if-restriction must be applied: we <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> the data and then <syntaxhighlight lang="stata" inline>keep</syntaxhighlight> only the eligible observations before running more code. This is an appropriate choice here because a <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> issued inside a program is automatically restored when the program ends, no matter what happens later in the program (including an error); an explicit <syntaxhighlight lang="stata" inline>restore</syntaxhighlight> is not even needed. For this reason, we will usually use <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> only in this way when programming, and prefer other methods for loading and re-loading data inside the program block.<br />
<br />
Now, we can run commands like:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist foreign<br />
levelslist foreign make if foreign == 1<br />
<br />
sysuse census.dta , clear<br />
levelslist region<br />
levelslist state if region == 1<br />
</syntaxhighlight><br />
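As a sketch of one of those other methods, the same restriction can be applied by passing the marker variable directly to each command as an if-qualifier, which leaves the data in memory untouched and needs no <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight>. This is a minimal illustration rather than part of the program above: the name <syntaxhighlight lang="stata" inline>levelslist_touse</syntaxhighlight> is hypothetical, it declares <syntaxhighlight lang="stata" inline>varlist</syntaxhighlight> instead of <syntaxhighlight lang="stata" inline>anything</syntaxhighlight> so names are validated up front, it also accepts an <syntaxhighlight lang="stata" inline>in</syntaxhighlight> range, and it uses the <syntaxhighlight lang="stata" inline>novarlist</syntaxhighlight> option of <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> so that missing values in the listed variables do not exclude observations.<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist_touse<br />
prog def levelslist_touse<br />
<br />
    // Hypothetical variant: varlist checks that the variables exist<br />
    syntax varlist [if] [in]<br />
<br />
    // Mark the sample from if and in; novarlist ignores missing values<br />
    // in the listed variables when building the marker<br />
    marksample touse , novarlist<br />
<br />
    foreach var of varlist `varlist' {<br />
        // Apply the restriction directly instead of preserve and keep<br />
        qui levelsof `var' if `touse' , local(levels)<br />
        di " "<br />
        di "Levels of `var': `: var label `var''"<br />
        foreach word in `levels' {<br />
            di "    `word'"<br />
        }<br />
    }<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
It could be called like the earlier version, for example <syntaxhighlight lang="stata" inline>levelslist_touse foreign make if foreign == 1</syntaxhighlight> or <syntaxhighlight lang="stata" inline>levelslist_touse foreign in 1/10</syntaxhighlight>, after loading <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight>.<br />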
<br />
Other <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> elements work similarly, although most are not handled by <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> (the exceptions are <syntaxhighlight lang="stata" inline>in</syntaxhighlight> restrictions and weights, which <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> also accounts for). The <syntaxhighlight lang="stata" inline>using</syntaxhighlight> syntax is typically used to target a file on the operating system; when you want to import or export data, this is the feature of choice. You should always test and implement it with compound double quotes (for example, <syntaxhighlight lang="stata" inline>`" `using' "'</syntaxhighlight>) so that file paths containing spaces or quotation marks are handled correctly, and you should decide whether the word <syntaxhighlight lang="stata" inline>using</syntaxhighlight> itself should be stored in the <syntaxhighlight lang="stata" inline>`using'</syntaxhighlight> macro; writing <syntaxhighlight lang="stata" inline>[using/]</syntaxhighlight> instead of <syntaxhighlight lang="stata" inline>[using]</syntaxhighlight> stores only the file name. See the helpfile for details.<br />
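For instance, a minimal sketch of a program that writes the data in memory to a user-chosen file might look like the following. The program name <syntaxhighlight lang="stata" inline>savecopy</syntaxhighlight> and the file name in the example call are hypothetical; the program uses <syntaxhighlight lang="stata" inline>using/</syntaxhighlight> so that the <syntaxhighlight lang="stata" inline>`using'</syntaxhighlight> macro holds only the file name, wraps that macro in compound double quotes when passing it to <syntaxhighlight lang="stata" inline>save</syntaxhighlight>, and forwards an optional <syntaxhighlight lang="stata" inline>replace</syntaxhighlight> switch.<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop savecopy<br />
prog def savecopy<br />
<br />
    // using/ stores the file name without the word "using"<br />
    syntax using/ [, replace]<br />
<br />
    // Compound double quotes protect paths with spaces or quote characters<br />
    di as text `"Saving data to `using'"'<br />
    save `"`using'"' , `replace'<br />
<br />
end<br />
<br />
// Example call with a file name that contains a space<br />
sysuse auto.dta , clear<br />
savecopy using "auto copy.dta" , replace<br />
</syntaxhighlight><br />
<br />
If <syntaxhighlight lang="stata" inline>replace</syntaxhighlight> is not typed, the <syntaxhighlight lang="stata" inline>`replace'</syntaxhighlight> macro is empty, so <syntaxhighlight lang="stata" inline>save</syntaxhighlight> refuses to overwrite an existing file, just as it would if called directly.<br />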
<br />
Finally, the options syntax allows optional switches to be implemented. Let's allow the user to request value labels by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if] , [VALuelabels]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
qui levelsof `var' , local(levels)<br />
di " "<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
// Implement value label option if specified<br />
if "`valuelabels'" != "" {<br />
local thisLabel : label (`var') `word'<br />
local thisLabel = ": `thisLabel'"<br />
}<br />
di " `word'`thisLabel'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
When the <syntaxhighlight lang="stata" inline>valuelabels</syntaxhighlight> option is specified (either as <syntaxhighlight lang="stata" inline>, val</syntaxhighlight>, an abbreviation allowed by the capitalization in <syntaxhighlight lang="stata" inline>VALuelabels</syntaxhighlight>, or with its full name), the <syntaxhighlight lang="stata" inline>`valuelabels'</syntaxhighlight> macro will contain <syntaxhighlight lang="stata" inline>"valuelabels"</syntaxhighlight>. Otherwise it will be empty. Therefore, a simple conditional is enough to check for and execute an option. Now we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse census.dta , clear<br />
levelslist region , val<br />
</syntaxhighlight><br />
<br />
and we would get:<br />
<br />
<syntaxhighlight lang="stata"><br />
Levels of region: Census region<br />
1: NE<br />
2: N Cntrl<br />
3: South<br />
4: West<br />
</syntaxhighlight><br />
<br />
However, the command would fail if we ran <syntaxhighlight lang="stata" inline>levelslist region state , val</syntaxhighlight>, because <syntaxhighlight lang="stata" inline>state</syntaxhighlight> is a string variable and cannot have value labels. So we might want to allow the user to specify a list of variables to show labels for, as follows:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if] , [VALuelabels(string asis)]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
qui levelsof `var' , local(levels)<br />
di " "<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
// Implement valuelabels option<br />
local thisLabel ""<br />
if strpos(" `valuelabels' "," `var' ") >= 1 {<br />
local thisLabel : label (`var') `word'<br />
local thisLabel = ": `thisLabel'"<br />
}<br />
<br />
// Display value (and label if requested)<br />
di " `word'`thisLabel'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
Because we now allow the option as <syntaxhighlight lang="stata" inline>[VALuelabels(string asis)]</syntaxhighlight>, the <syntaxhighlight lang="stata" inline>`valuelabels'</syntaxhighlight> macro will contain either the string written into the option or nothing. We need to rewrite the implementation slightly. First, we need to reset <syntaxhighlight lang="stata" inline>`thisLabel'</syntaxhighlight> at each iteration so it is emptied whenever it does not apply; otherwise a label left over from a previous variable could be displayed. Second, we need to use a tool like <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight> to check whether a variable occurs in the list; when we write the helpfile, we will make clear that this option needs to take a list of variables. It is possible to require this through the options syntax itself, but that can introduce issues (if, for example, the command first loads data, a <syntaxhighlight lang="stata" inline>varlist</syntaxhighlight> check might fail on the data currently in memory). In this kind of operation, it is doubly clear that the full names of variables need to be used (to avoid needing to pull in commands like <syntaxhighlight lang="stata" inline>unab</syntaxhighlight>). Also, note the extra spaces around both arguments of <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight>; these ensure that a variable whose name is a substring of another variable's name does not trigger the option. Now, we can run <syntaxhighlight lang="stata" inline>levelslist region state , val(region)</syntaxhighlight> and get the results we wanted.<br />
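For comparison, here is a minimal sketch of the alternative mentioned above, which validates the option through the options syntax itself. Declaring the option as <syntaxhighlight lang="stata" inline>VALuelabels(varlist)</syntaxhighlight> makes <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> check that every name refers to a variable in the data currently in memory and expand abbreviations to full names, so the <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight> check can rely on full names without calling <syntaxhighlight lang="stata" inline>unab</syntaxhighlight>. As noted above, this is only safe when the relevant data are already loaded when <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> runs. The program name <syntaxhighlight lang="stata" inline>levelslist_vl</syntaxhighlight> is hypothetical.<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist_vl<br />
prog def levelslist_vl<br />
<br />
    // The option must now name existing variables; abbreviations are expanded<br />
    syntax anything [if] , [VALuelabels(varlist)]<br />
<br />
    // Implement [if]<br />
    preserve<br />
    marksample touse<br />
    qui keep if `touse' == 1<br />
<br />
    foreach var of varlist `anything' {<br />
        qui levelsof `var' , local(levels)<br />
        di " "<br />
        di "Levels of `var': `: var label `var''"<br />
        foreach word in `levels' {<br />
            // Reset, then look up the label only for requested variables<br />
            local thisLabel ""<br />
            if strpos(" `valuelabels' ", " `var' ") >= 1 {<br />
                local thisLabel : label (`var') `word'<br />
                local thisLabel = ": `thisLabel'"<br />
            }<br />
            di "    `word'`thisLabel'"<br />
        }<br />
    }<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
With <syntaxhighlight lang="stata" inline>census.dta</syntaxhighlight> loaded and variable abbreviation left on (the default), <syntaxhighlight lang="stata" inline>levelslist_vl region state , val(reg)</syntaxhighlight> should expand <syntaxhighlight lang="stata" inline>reg</syntaxhighlight> to <syntaxhighlight lang="stata" inline>region</syntaxhighlight>, while a name that matches no variable would stop the command with an error before any output is produced.<br />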
<br />
==The <syntaxhighlight lang="stata" inline>temp</syntaxhighlight> commands==<br />
<br />
Stata has a set of <syntaxhighlight lang="stata" inline>temp</syntaxhighlight> commands that can be used to store information temporarily.</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Programming_(Ado-files)&diff=7776Stata Coding Practices: Programming (Ado-files)2020-11-25T15:30:58Z<p>Bbdaniels: /* The syntax command */</p>
Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Programming_(Ado-files)&diff=7775Stata Coding Practices: Programming (Ado-files)2020-11-25T15:26:14Z<p>Bbdaniels: /* The syntax command */</p>
<hr />
<div>Programs and ado-files are the main methods by which Stata code is condensed and generalized. By writing versions of code that apply to arbitrary inputs and saving that code in a separate file, the application of the code is cleaner in the main do-file and it becomes easier to re-use the same analytical process on other datasets in the future. Stata has special commands that enable this functionality. All commands on SSC are written as ado-files by other programmers; it is also possible to embed programs in ordinary do-files to save space and improve organization of code.<br />
<br />
==Read First==<br />
<br />
This article will refer somewhat interchangeably to the concepts of "programming", "ado-files", and "user-written commands". This is in contrast to ordinary programming of do-files. The article does not assume that you are actually writing an ado-file (as opposed to a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> definition in an ordinary dofile); and it does not assume you are writing a command for distribution. That said, Stata programming functionality is achieved using several core features:<br />
<br />
* The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command sets up the code environment for writing a program into memory.<br />
* The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses inputs into a program as macros that can be used within the scope of that program execution.<br />
* The <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> commands all create objects that can be used within the scope of program execution to avoid any conflict with arbitrary data structures.<br />
<br />
==The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command defines the scope of a Stata program inside a do-file or ado-file. When a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command block is executed, Stata stores (until the end of the session, or until the program is dropped) the sequence of commands written inside the block and assigns them to the command name used in the <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command. Using <syntaxhighlight lang="stata" inline>program drop</syntaxhighlight> before the block ensures that the command name is free, so the block can be re-run without an "already defined" error. For example, we might write the following program in an ordinary do-file:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop autoreg<br />
prog def autoreg<br />
<br />
reg price mpg i.foreign<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
After executing this command block (note that <syntaxhighlight lang="stata" inline>end</syntaxhighlight> tells Stata where to stop reading), we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
If we did this, Stata would output:<br />
<br />
<syntaxhighlight lang="stata"><br />
. autoreg<br />
<br />
Source | SS df MS Number of obs = 74<br />
-------------+---------------------------------- F(2, 71) = 14.07<br />
Model | 180261702 2 90130850.8 Prob > F = 0.0000<br />
Residual | 454803695 71 6405685.84 R-squared = 0.2838<br />
-------------+---------------------------------- Adj R-squared = 0.2637<br />
Total | 635065396 73 8699525.97 Root MSE = 2530.9<br />
<br />
------------------------------------------------------------------------------<br />
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]<br />
-------------+----------------------------------------------------------------<br />
mpg | -294.1955 55.69172 -5.28 0.000 -405.2417 -183.1494<br />
|<br />
foreign |<br />
Foreign | 1767.292 700.158 2.52 0.014 371.2169 3163.368<br />
_cons | 11905.42 1158.634 10.28 0.000 9595.164 14215.67<br />
------------------------------------------------------------------------------<br />
</syntaxhighlight><br />
<br />
All this is to say that Stata has stored the command <syntaxhighlight lang="stata" inline>reg price mpg i.foreign</syntaxhighlight> and will execute it whenever <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> is run, as if it were an ordinary command.<br />
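To confirm what Stata has stored, the built-in <syntaxhighlight lang="stata" inline>program dir</syntaxhighlight> and <syntaxhighlight lang="stata" inline>program list</syntaxhighlight> commands can be used. The short, self-contained check below (output not shown) redefines <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> as above, lists the programs currently in memory, displays the commands stored under that name, and finally removes the program again.<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop autoreg<br />
prog def autoreg<br />
    reg price mpg i.foreign<br />
end<br />
<br />
// List the names of all programs currently defined in memory<br />
program dir<br />
<br />
// Display the commands stored under the name autoreg<br />
program list autoreg<br />
<br />
// Remove the program from memory when it is no longer needed<br />
program drop autoreg<br />
</syntaxhighlight><br />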
<br />
As a first extension, we might try writing a command that is not tied to a particular dataset, such as one that lists all the values of each variable for us. Such a program might look like the following:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
foreach var of varlist * {<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
We could then run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist<br />
</syntaxhighlight><br />
<br />
Similarly, we could use any other dataset in place of <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight>. This means we now have a useful piece of code that we can execute with any dataset open, without re-writing a mildly complex loop each time. When we want to save such a snippet, we usually write an ado-file: we name the file <syntaxhighlight lang="stata" inline>levelslist.ado</syntaxhighlight> and add a starbang line (a comment beginning with <syntaxhighlight lang="stata" inline>*!</syntaxhighlight>) and some comments with metadata about the code. The full file would look something like this:<br />
<br />
<syntaxhighlight lang="stata"><br />
*! Version 0.1 published 24 November 2020<br />
*! by Benjamin Daniels bbdaniels@gmail.com<br />
<br />
// A program to print all levels of variables<br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
// Loop over variables<br />
foreach var of varlist * {<br />
<br />
// Get levels and display name and label of variable<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
<br />
// Print the value of each level for the current variable<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
The file would then only need to be run using <syntaxhighlight lang="stata" inline>run levelslist.ado</syntaxhighlight> in the runfile for the reproducibility package to make the command <syntaxhighlight lang="stata" inline>levelslist</syntaxhighlight> available to all do-files in that package, since a program stays defined in memory for the rest of the Stata session. However, this command is not very useful at this stage: it outputs far too much information, particularly when variables take integer or continuous values with many levels. The next section introduces code that allows such commands to be customized for each context in which you want to use them.<br />
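As a sketch of how this might look in a runfile, assume a hypothetical project layout in which ado-files sit in a subfolder whose path is stored in the global <syntaxhighlight lang="stata" inline>adodir</syntaxhighlight>. Either approach below would make <syntaxhighlight lang="stata" inline>levelslist</syntaxhighlight> available: <syntaxhighlight lang="stata" inline>run</syntaxhighlight> defines the program immediately, while <syntaxhighlight lang="stata" inline>adopath ++</syntaxhighlight> puts the folder at the front of Stata's ado search path so that Stata loads <syntaxhighlight lang="stata" inline>levelslist.ado</syntaxhighlight> automatically the first time the command is called.<br />
<br />
<syntaxhighlight lang="stata"><br />
// Hypothetical location of the project's ado-files; adjust to your project<br />
global adodir "code/ado"<br />
<br />
// Option 1: define the program now by running the ado-file<br />
run "${adodir}/levelslist.ado"<br />
<br />
// Option 2: let Stata find levelslist.ado on demand<br />
adopath ++ "${adodir}"<br />
<br />
// Either way, the command is now available to later do-files<br />
sysuse auto.dta , clear<br />
levelslist<br />
</syntaxhighlight><br />
<br />
In practice one of the two options is enough; the explicit <syntaxhighlight lang="stata" inline>run</syntaxhighlight> has the advantage that the runfile shows exactly which version of the file was loaded.<br />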
<br />
==The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command takes a program block and allows its inputs to be customized based on the context in which it is executed. It enables all the main features of Stata that appear in ordinary commands, including input lists (such as variable lists or file names), <syntaxhighlight lang="stata" inline>if</syntaxhighlight> and <syntaxhighlight lang="stata" inline>in</syntaxhighlight> restrictions, <syntaxhighlight lang="stata" inline>using</syntaxhighlight> targets, <syntaxhighlight lang="stata" inline>=</syntaxhighlight> expressions, weights, and options (after the option comma in the command).<br />
<br />
The help file for the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command is extensive and documents many automated checks and advanced features, including support for factor variables and time-series operators (<syntaxhighlight lang="stata" inline>fv</syntaxhighlight> and <syntaxhighlight lang="stata" inline>ts</syntaxhighlight>). For advanced applications, always consult the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> help file to see how to accomplish your objective. For now, we will take a simple tour of how <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> makes a command adaptable.<br />
<br />
First, let's add simple syntax allowing the user to select the variables and observations they want to include. We might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
qui levelsof `var' , local(levels)<br />
di " "<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
There are several key features to note here. First, we write <syntaxhighlight lang="stata" inline>anything</syntaxhighlight> in the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command to allow the user to write absolutely anything they like as the arguments to be passed into the program. By default, this is assigned to the string local <syntaxhighlight lang="stata" inline>`anything'</syntaxhighlight> and can be recovered throughout the program. Recall that local macros in Stata have strictly local scope; in this case, that means locals from the calling do-file will not be passed into the program, and locals from the program will not be passed back into the calling do-file.<br />
<br />
Second, we write <syntaxhighlight lang="stata" inline>[if]</syntaxhighlight> in brackets to indicate that the user may optionally specify an if-restriction. This does nothing on its own: it simply creates another local string macro called <syntaxhighlight lang="stata" inline>`if'</syntaxhighlight> containing the restriction. However, Stata provides the shortcut <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> to implement it. Calling <syntaxhighlight lang="stata" inline>marksample touse</syntaxhighlight> creates a temporary variable, whose name is stored in the local <syntaxhighlight lang="stata" inline>`touse'</syntaxhighlight>, that equals 1 for observations satisfying the if-restriction and 0 for all others.<br />
<br />
Then, the if-restriction must be applied: we <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> the data and then drop the ineligible observations (here with <syntaxhighlight lang="stata" inline>keep if</syntaxhighlight>) before running the rest of the code. This is an appropriate choice because a <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> issued inside a program is automatically undone when the program ends, whether it finishes normally or stops with an error; no <syntaxhighlight lang="stata" inline>restore</syntaxhighlight> command is needed. For this reason, in programming we will often use <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> only in this way, and prefer other methods for loading and re-loading data inside the program block.<br />
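<br />
The scoping rule described above can be checked with a small hypothetical program, <syntaxhighlight lang="stata" inline>showscope</syntaxhighlight> (an illustrative name), which displays a local set by the caller and defines a local of its own. This sketch assumes it is run from a single do-file; neither local should cross the program boundary.<br />
<br />
<syntaxhighlight lang="stata"><br />
* Hypothetical program: locals do not cross the program boundary<br />
capture program drop showscope<br />
program define showscope<br />
    syntax anything<br />
    display "anything = `anything'"<br />
    * mylocal was defined by the caller, so it is empty here<br />
    display "mylocal  = [`mylocal']"<br />
    * This local is discarded when the program ends<br />
    local fromprogram "set inside the program"<br />
end<br />
<br />
local mylocal "set in the calling do-file"<br />
showscope price mpg anything else<br />
* fromprogram was never defined in the caller, so it is empty here<br />
display "fromprogram = [`fromprogram']"<br />
</syntaxhighlight><br />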
<br />
Now, we can run commands like:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist foreign<br />
levelslist foreign make if foreign == 1<br />
<br />
sysuse census.dta , clear<br />
levelslist region<br />
levelslist state if region == 1<br />
</syntaxhighlight><br />
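<br />
To confirm that the <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> inside the program leaves the caller's data untouched, the hypothetical program <syntaxhighlight lang="stata" inline>keepdemo</syntaxhighlight> below applies an if-restriction with <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> and <syntaxhighlight lang="stata" inline>keep</syntaxhighlight>, reports the number of observations inside the program, and lets Stata restore the data automatically when the program ends. It is a sketch using the auto dataset; the program name is illustrative.<br />
<br />
<syntaxhighlight lang="stata"><br />
* Hypothetical program: preserve inside a program is undone automatically<br />
capture program drop keepdemo<br />
program define keepdemo<br />
    syntax [if]<br />
    * touse is 1 for observations meeting the restriction, 0 otherwise<br />
    marksample touse<br />
    preserve<br />
    quietly keep if `touse'<br />
    display "Inside the program: " _N " observations"<br />
end<br />
<br />
sysuse auto.dta , clear<br />
keepdemo if foreign == 1<br />
display "After the program:  " _N " observations"<br />
</syntaxhighlight><br />
<br />
With the auto dataset, the program should report 22 observations inside and 74 afterwards, without any <syntaxhighlight lang="stata" inline>restore</syntaxhighlight> command.<br />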
<br />
Other <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> elements work similarly, although <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> only takes the sample-defining elements into account (the varlist, <syntaxhighlight lang="stata" inline>if</syntaxhighlight>, <syntaxhighlight lang="stata" inline>in</syntaxhighlight>, and weights). The <syntaxhighlight lang="stata" inline>using</syntaxhighlight> element is typically used to target a file on the operating system, which makes it the feature of choice whenever a command imports or exports data. Declare it as <syntaxhighlight lang="stata" inline>[using/]</syntaxhighlight> if you want the <syntaxhighlight lang="stata" inline>`using'</syntaxhighlight> macro to contain only the file name, without the word <syntaxhighlight lang="stata" inline>using</syntaxhighlight> itself, and always test and implement it with compound double quotes (for example, <syntaxhighlight lang="stata" inline>`"`using'"'</syntaxhighlight>) so that paths containing spaces or quotation marks are handled correctly. See the help file for details.<br />
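<br />
As a sketch of the <syntaxhighlight lang="stata" inline>using</syntaxhighlight> element, the hypothetical program <syntaxhighlight lang="stata" inline>savesubset</syntaxhighlight> below declares <syntaxhighlight lang="stata" inline>using/</syntaxhighlight> so that the macro holds only the file name, wraps that name in compound double quotes when saving, and writes the observations that satisfy an if-restriction to a file. The program name and its <syntaxhighlight lang="stata" inline>replace</syntaxhighlight> option are illustrative assumptions; the example saves to a tempfile so it does not write into your working directory.<br />
<br />
<syntaxhighlight lang="stata"><br />
* Hypothetical program: save the observations meeting [if] to a file<br />
capture program drop savesubset<br />
program define savesubset<br />
    * using/ stores only the file name, without the word using<br />
    syntax [if] using/ [, replace]<br />
    marksample touse<br />
    preserve<br />
    quietly keep if `touse'<br />
    * Compound double quotes keep the path intact<br />
    save `"`using'"' , `replace'<br />
end<br />
<br />
sysuse auto.dta , clear<br />
tempfile foreigncars<br />
savesubset if foreign == 1 using `"`foreigncars'"' , replace<br />
use `"`foreigncars'"' , clear<br />
display _N " observations saved"<br />
</syntaxhighlight><br />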
<br />
Finally, the options part of <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> lets the user switch optional behavior on. Let's allow the user to request value labels by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if] , [VALuelabels]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
qui levelsof `var' , local(levels)<br />
di " "<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
// Implement value label option if specified<br />
if "`valuelabels'" != "" {<br />
local thisLabel : label (`var') `word'<br />
local thisLabel = ": `thisLabel'"<br />
}<br />
di " `word'`thisLabel'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
When the <syntaxhighlight lang="stata" inline>valuelabels</syntaxhighlight> option is specified (either abbreviated as <syntaxhighlight lang="stata" inline>, val</syntaxhighlight>, the shortest abbreviation allowed by the capitalization in the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> line, or written out in full), the <syntaxhighlight lang="stata" inline>`valuelabels'</syntaxhighlight> macro will contain <syntaxhighlight lang="stata" inline>"valuelabels"</syntaxhighlight>; otherwise it will be empty. A simple conditional is therefore enough to check for an option and act on it. Now we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse census.dta , clear<br />
levelslist region , val<br />
</syntaxhighlight><br />
<br />
and we would get:<br />
<br />
<syntaxhighlight lang="stata"><br />
Levels of region: Census region<br />
1: NE<br />
2: N Cntrl<br />
3: South<br />
4: West<br />
</syntaxhighlight><br />
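<br />
The abbreviation rule for options can be verified directly. The hypothetical program <syntaxhighlight lang="stata" inline>optdemo</syntaxhighlight> below (an illustrative name) declares the same <syntaxhighlight lang="stata" inline>VALuelabels</syntaxhighlight> option and returns the contents of the <syntaxhighlight lang="stata" inline>`valuelabels'</syntaxhighlight> macro in <syntaxhighlight lang="stata" inline>r(flag)</syntaxhighlight>, so the abbreviated, full, and omitted forms can be compared.<br />
<br />
<syntaxhighlight lang="stata"><br />
* Hypothetical program: report what syntax stores for the option<br />
capture program drop optdemo<br />
program define optdemo, rclass<br />
    * VALuelabels: val is the shortest accepted abbreviation<br />
    syntax [, VALuelabels]<br />
    return local flag "`valuelabels'"<br />
end<br />
<br />
optdemo , val<br />
local abbreviated "`r(flag)'"<br />
optdemo , valuelabels<br />
local full "`r(flag)'"<br />
optdemo<br />
local omitted "`r(flag)'"<br />
display "val: [`abbreviated']  valuelabels: [`full']  omitted: [`omitted']"<br />
</syntaxhighlight><br />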
<br />
However, the command would fail if we ran <syntaxhighlight lang="stata" inline>levelslist region state , val</syntaxhighlight>, because <syntaxhighlight lang="stata" inline>state</syntaxhighlight> is a string variable and cannot have value labels. So we might instead allow the user to specify the list of variables for which labels should be shown, as follows:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if] , [VALuelabels(string asis)]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
qui levelsof `var' , local(levels)<br />
di " "<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
// Implement valuelabels option<br />
local thisLabel ""<br />
if strpos(" `valuelabels' "," `var' ") >= 1 {<br />
local thisLabel : label (`var') `word'<br />
local thisLabel = ": `thisLabel'"<br />
}<br />
di " `word'`thisLabel'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
Because we now declare the option as <syntaxhighlight lang="stata" inline>[VALuelabels(string asis)]</syntaxhighlight>, the macro will either contain the string written into the option or be empty. We need to rewrite the implementation slightly. First, we reset <syntaxhighlight lang="stata" inline>`thisLabel'</syntaxhighlight> at each iteration so that it is empty whenever it does not apply. Second, we use a tool like <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight> to check whether the current variable occurs in the list; when we write the help file, we will make clear that this option takes a list of variables. It is possible to require this through the options syntax itself, but doing so can introduce issues: if, for example, the command first loads data, a <syntaxhighlight lang="stata" inline>varlist</syntaxhighlight> check might fail on the data currently in memory. This kind of check also makes it doubly clear that full variable names must be used (otherwise a command like <syntaxhighlight lang="stata" inline>unab</syntaxhighlight> would be needed to expand abbreviations). Finally, note the extra spaces around both arguments of <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight>; they ensure that a variable whose name is a substring of another name does not trigger the option. Now we can run <syntaxhighlight lang="stata" inline>levelslist region state , val(region)</syntaxhighlight> and get the results we wanted.<br />
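<br />
The effect of the extra spaces around the <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight> arguments can be seen without loading any data. The sketch below assumes a hypothetical option value containing a variable named <syntaxhighlight lang="stata" inline>region2</syntaxhighlight>; it compares an unpadded search, the padded search used in the program, and Stata's <syntaxhighlight lang="stata" inline>list ... in</syntaxhighlight> macro function, which tests whether words belong to a list.<br />
<br />
<syntaxhighlight lang="stata"><br />
* Hypothetical option contents; region2 is an illustrative variable name<br />
local valuelabels "region2 pop"<br />
local var "region"<br />
<br />
* Unpadded search: region is found inside region2 (a false positive)<br />
local naive = strpos("`valuelabels'", "`var'") > 0<br />
* Padded search: only whole, space-delimited names match<br />
local padded = strpos(" `valuelabels' ", " `var' ") > 0<br />
* The list macro function tests whether every word of var is in valuelabels<br />
local inlist : list var in valuelabels<br />
<br />
display "naive = `naive', padded = `padded', list in = `inlist'"<br />
</syntaxhighlight><br />
<br />
Here the unpadded search should wrongly report a match (naive = 1), while the padded search and the list function should not (padded = 0, list in = 0).<br />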
<br />
==The <syntaxhighlight lang="stata" inline>temp</syntaxhighlight> commands==</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Programming_(Ado-files)&diff=7774Stata Coding Practices: Programming (Ado-files)2020-11-24T22:43:44Z<p>Bbdaniels: /* The program command */</p>
<hr />
<div>Programs and ado-files are the main methods by which Stata code is condensed and generalized. By writing versions of code that apply to arbitrary inputs and saving that code in a separate file, the application of the code is cleaner in the main do-file and it becomes easier to re-use the same analytical process on other datasets in the future. Stata has special commands that enable this functionality. All commands on SSC are written as ado-files by other programmers; it is also possible to embed programs in ordinary do-files to save space and improve organization of code.<br />
<br />
==Read First==<br />
<br />
This article will refer somewhat interchangeably to "programming", "ado-files", and "user-written commands", as distinct from writing ordinary do-files. The article does not assume that you are actually writing an ado-file (as opposed to a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> definition in an ordinary do-file), nor that you are writing a command for distribution. In either case, Stata programming relies on several core features:<br />
<br />
* The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command sets up the code environment for writing a program into memory.<br />
* The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses inputs into a program as macros that can be used within the scope of that program execution.<br />
* The <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> commands all create objects that can be used within the scope of program execution without conflicting with whatever variables, files, or other named objects already exist.<br />
<br />
==The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command defines the scope of a Stata program inside a do-file or ado-file. When a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> block is executed, Stata stores the sequence of commands written inside the block in memory (until the end of the session or until the program is dropped) and assigns them to the command name given in the <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command. Running <syntaxhighlight lang="stata" inline>capture program drop</syntaxhighlight> with that name before the block ensures that the name is free, so the block can be re-run without an error saying the program is already defined. For example, we might write the following program in an ordinary do-file:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop autoreg<br />
prog def autoreg<br />
<br />
reg price mpg i.foreign<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
After executing this command block (note that <syntaxhighlight lang="stata" inline>end</syntaxhighlight> tells Stata where to stop reading), we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
If we did this, Stata would output:<br />
<br />
<syntaxhighlight lang="stata"><br />
. autoreg<br />
<br />
Source | SS df MS Number of obs = 74<br />
-------------+---------------------------------- F(2, 71) = 14.07<br />
Model | 180261702 2 90130850.8 Prob > F = 0.0000<br />
Residual | 454803695 71 6405685.84 R-squared = 0.2838<br />
-------------+---------------------------------- Adj R-squared = 0.2637<br />
Total | 635065396 73 8699525.97 Root MSE = 2530.9<br />
<br />
------------------------------------------------------------------------------<br />
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]<br />
-------------+----------------------------------------------------------------<br />
mpg | -294.1955 55.69172 -5.28 0.000 -405.2417 -183.1494<br />
|<br />
foreign |<br />
Foreign | 1767.292 700.158 2.52 0.014 371.2169 3163.368<br />
_cons | 11905.42 1158.634 10.28 0.000 9595.164 14215.67<br />
------------------------------------------------------------------------------<br />
</syntaxhighlight><br />
<br />
In other words, Stata has stored the command <syntaxhighlight lang="stata" inline>reg price mpg i.foreign</syntaxhighlight> and will execute it whenever <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> is run, as if it were an ordinary command.<br />
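<br />
Because <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> behaves like an ordinary command, prefixes such as <syntaxhighlight lang="stata" inline>quietly</syntaxhighlight> work with it, and the estimation results left behind by <syntaxhighlight lang="stata" inline>regress</syntaxhighlight> inside the program should remain available afterwards. The sketch below assumes the auto dataset and redefines <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> so that it runs on its own.<br />
<br />
<syntaxhighlight lang="stata"><br />
* autoreg as defined above, redefined so this block runs on its own<br />
capture program drop autoreg<br />
program define autoreg<br />
    regress price mpg i.foreign<br />
end<br />
<br />
sysuse auto.dta , clear<br />
* Prefixes work on the program just as on any other command<br />
quietly autoreg<br />
* Estimation results stored by regress remain available afterwards<br />
display "Coefficient on mpg: " _b[mpg]<br />
display "Observations used:  " e(N)<br />
</syntaxhighlight><br />
<br />
The displayed coefficient should match the -294.1955 in the regression output above, based on 74 observations.<br />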
<br />
As a first extension, we might try writing a command that does not depend on any particular dataset, such as one that lists all the values of each variable. Such a program might look like the following:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
foreach var of varlist * {<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
We could then run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist<br />
</syntaxhighlight><br />
<br />
Similarly, we could use any other dataset in place of <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight>. We now have a useful piece of code that can be executed with any dataset open, without rewriting a mildly complex loop each time. When we want to save such a snippet, we usually write an ado-file: we name the file <syntaxhighlight lang="stata" inline>levelslist.ado</syntaxhighlight> and add a star-bang line (a comment beginning with <syntaxhighlight lang="stata" inline>*!</syntaxhighlight>) along with some other comments containing metadata about the code. The full file would look something like this:<br />
<br />
<syntaxhighlight lang="stata"><br />
*! Version 0.1 published 24 November 2020<br />
*! by Benjamin Daniels bbdaniels@gmail.com<br />
<br />
// A program to print all levels of variables<br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
// Loop over variables<br />
foreach var of varlist * {<br />
<br />
// Get levels and display name and label of variable<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
<br />
// Print the value of each level for the current variable<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
The file would then only need to be executed with <syntaxhighlight lang="stata" inline>run levelslist.ado</syntaxhighlight> in the runfile of the reproducibility package to make the command <syntaxhighlight lang="stata" inline>levelslist</syntaxhighlight> available to all do-files in that package, since a program, once defined, stays available for the rest of the Stata session regardless of which do-file defined it. However, the command is not yet very useful: it prints far too much information, particularly for variables that take many distinct integer or continuous values. The next section introduces code that lets such commands be customized for each context in which they are used.<br />
<br />
==The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses the arguments passed to a program block, so that the program's inputs can be customized for the context in which it is executed. It supports all the main elements that appear in ordinary Stata commands, including input lists (such as variable lists or file names), <syntaxhighlight lang="stata" inline>if</syntaxhighlight> and <syntaxhighlight lang="stata" inline>in</syntaxhighlight> restrictions, <syntaxhighlight lang="stata" inline>using</syntaxhighlight> targets, <syntaxhighlight lang="stata" inline>=exp</syntaxhighlight> expressions, weights, and options (written after the comma in the command).<br />
<br />
The help file for the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command is extensive and documents many automated checks and advanced features, including support for factor variables and time-series operators (<syntaxhighlight lang="stata" inline>fv</syntaxhighlight> and <syntaxhighlight lang="stata" inline>ts</syntaxhighlight>). For advanced applications, always consult the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> help file to see how to accomplish your objective. For now, we will take a simple tour of how <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> makes a command adaptable.<br />
<br />
First, let's add simple syntax allowing the user to select the variables and observations they want to include. We might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
qui levelsof `var' , local(levels)<br />
di " "<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
There are several key features to note here. First, we write <syntaxhighlight lang="stata" inline>anything</syntaxhighlight> in the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command to allow the user to write absolutely anything they like as the arguments to be passed into the program. By default, this is assigned to the string local <syntaxhighlight lang="stata" inline>`anything'</syntaxhighlight> and can be recovered throughout the program. Recall that local macros in Stata have strictly local scope; in this case, that means locals from the calling do-file will not be passed into the program, and locals from the program will not be passed back into the calling do-file.<br />
<br />
Second, we write <syntaxhighlight lang="stata" inline>[if]</syntaxhighlight> in brackets to indicate that the user may optionally specify an if-restriction. This does nothing on its own: it simply creates another local string macro called <syntaxhighlight lang="stata" inline>`if'</syntaxhighlight> containing the restriction. However, Stata provides the shortcut <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> to implement it. Calling <syntaxhighlight lang="stata" inline>marksample touse</syntaxhighlight> creates a temporary variable, whose name is stored in the local <syntaxhighlight lang="stata" inline>`touse'</syntaxhighlight>, that equals 1 for observations satisfying the if-restriction and 0 for all others.<br />
<br />
Then, the if-restriction must be applied: we <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> the data and then drop the ineligible observations (here with <syntaxhighlight lang="stata" inline>keep if</syntaxhighlight>) before running the rest of the code. This is an appropriate choice because a <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> issued inside a program is automatically undone when the program ends, whether it finishes normally or stops with an error; no <syntaxhighlight lang="stata" inline>restore</syntaxhighlight> command is needed. For this reason, in programming we will often use <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> only in this way, and prefer other methods for loading and re-loading data inside the program block.<br />
<br />
Now, we can run commands like:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist foreign<br />
levelslist foreign make if foreign == 1<br />
<br />
sysuse census.dta , clear<br />
levelslist region<br />
levelslist state if region == 1<br />
</syntaxhighlight><br />
<br />
Other <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> elements work similarly, although <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> only takes the sample-defining elements into account (the varlist, <syntaxhighlight lang="stata" inline>if</syntaxhighlight>, <syntaxhighlight lang="stata" inline>in</syntaxhighlight>, and weights). The <syntaxhighlight lang="stata" inline>using</syntaxhighlight> element is typically used to target a file on the operating system, which makes it the feature of choice whenever a command imports or exports data. Declare it as <syntaxhighlight lang="stata" inline>[using/]</syntaxhighlight> if you want the <syntaxhighlight lang="stata" inline>`using'</syntaxhighlight> macro to contain only the file name, without the word <syntaxhighlight lang="stata" inline>using</syntaxhighlight> itself, and always test and implement it with compound double quotes (for example, <syntaxhighlight lang="stata" inline>`"`using'"'</syntaxhighlight>) so that paths containing spaces or quotation marks are handled correctly. See the help file for details.<br />
<br />
Finally, the options part of <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> lets the user switch optional behavior on. Let's allow the user to request value labels by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if] , [VALuelabels]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
qui levelsof `var' , local(levels)<br />
di " "<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
// Implement value label option if specified<br />
if "`valuelabels'" != "" {<br />
local thisLabel : label (`var') `word'<br />
local thisLabel = ": `thisLabel'"<br />
}<br />
di " `word'`thisLabel'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
When the <syntaxhighlight lang="stata" inline>valuelabels</syntaxhighlight> option is specified (either abbreviated as <syntaxhighlight lang="stata" inline>, val</syntaxhighlight>, the shortest abbreviation allowed by the capitalization in the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> line, or written out in full), the <syntaxhighlight lang="stata" inline>`valuelabels'</syntaxhighlight> macro will contain <syntaxhighlight lang="stata" inline>"valuelabels"</syntaxhighlight>; otherwise it will be empty. A simple conditional is therefore enough to check for an option and act on it. Now we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse census.dta , clear<br />
levelslist region , val<br />
</syntaxhighlight><br />
<br />
and we would get:<br />
<br />
<syntaxhighlight lang="stata"><br />
Levels of region: Census region<br />
1: NE<br />
2: N Cntrl<br />
3: South<br />
4: West<br />
</syntaxhighlight><br />
<br />
However, the command would fail if we ran <syntaxhighlight lang="stata" inline>levelslist region state , val</syntaxhighlight>, because <syntaxhighlight lang="stata" inline>state</syntaxhighlight> is a string variable and cannot have value labels. So we might instead allow the user to specify the list of variables for which labels should be shown, as follows:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if] , [VALuelabels(string asis)]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
qui levelsof `var' , local(levels)<br />
di " "<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
// Implement valuelabels option<br />
local thisLabel ""<br />
if strpos(" `valuelabels' "," `var' ") >= 1 {<br />
local thisLabel : label (`var') `word'<br />
local thisLabel = ": `thisLabel'"<br />
}<br />
di " `word'`thisLabel'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
Because we now declare the option as <syntaxhighlight lang="stata" inline>[VALuelabels(string asis)]</syntaxhighlight>, the macro will either contain the string written into the option or be empty. We need to rewrite the implementation slightly. First, we reset <syntaxhighlight lang="stata" inline>`thisLabel'</syntaxhighlight> at each iteration so that it is empty whenever it does not apply. Second, we use a tool like <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight> to check whether the current variable occurs in the list. In this kind of operation, it is doubly clear that the full names of variables need to be used. Also, note the extra spaces around both arguments of <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight>; they ensure that a variable whose name is a substring of another name does not trigger the option. Now we can run <syntaxhighlight lang="stata" inline>levelslist region state , val(region)</syntaxhighlight> and get the results we wanted.<br />
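<br />
If users were allowed to abbreviate variable names in the option, one way to recover the full names is <syntaxhighlight lang="stata" inline>unab</syntaxhighlight>, which expands abbreviations and wildcards against the variables currently in memory. The sketch below assumes the census dataset and a hypothetical option value written with an abbreviation and a wildcard.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse census.dta , clear<br />
* Hypothetical option contents using an abbreviation and a wildcard<br />
local valuelabels "reg med*"<br />
* unab expands them to the full names of variables in memory<br />
unab valuelabels : `valuelabels'<br />
display "`valuelabels'"<br />
</syntaxhighlight><br />
<br />
In the census dataset this should display <syntaxhighlight lang="stata" inline>region medage</syntaxhighlight>; note that <syntaxhighlight lang="stata" inline>unab</syntaxhighlight> needs the relevant data to be in memory, which is the same limitation discussed for <syntaxhighlight lang="stata" inline>varlist</syntaxhighlight> checks.<br />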
<br />
==The <syntaxhighlight lang="stata" inline>temp</syntaxhighlight> commands==</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Programming_(Ado-files)&diff=7773Stata Coding Practices: Programming (Ado-files)2020-11-24T22:19:00Z<p>Bbdaniels: /* The syntax command */</p>
<hr />
<div>Programs and ado-files are the main methods by which Stata code is condensed and generalized. By writing versions of code that apply to arbitrary inputs and saving that code in a separate file, the application of the code is cleaner in the main do-file and it becomes easier to re-use the same analytical process on other datasets in the future. Stata has special commands that enable this functionality. All commands on SSC are written as ado-files by other programmers; it is also possible to embed programs in ordinary do-files to save space and improve organization of code.<br />
<br />
==Read First==<br />
<br />
This article will refer somewhat interchangeably to "programming", "ado-files", and "user-written commands", as distinct from writing ordinary do-files. The article does not assume that you are actually writing an ado-file (as opposed to a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> definition in an ordinary do-file), nor that you are writing a command for distribution. In either case, Stata programming relies on several core features:<br />
<br />
* The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command sets up the code environment for writing a program into memory.<br />
* The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses inputs into a program as macros that can be used within the scope of that program execution.<br />
* The <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> commands all create objects that can be used within the scope of program execution without conflicting with whatever variables, files, or other named objects already exist.<br />
<br />
==The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command defines the scope of a Stata program inside a do-file or ado-file. When a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> block is executed, Stata stores the sequence of commands written inside the block in memory (until the end of the session or until the program is dropped) and assigns them to the command name given in the <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command. Running <syntaxhighlight lang="stata" inline>capture program drop</syntaxhighlight> with that name before the block ensures that the name is free, so the block can be re-run without an error saying the program is already defined. For example, we might write the following program in an ordinary do-file:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop autoreg<br />
prog def autoreg<br />
<br />
reg price mpg i.foreign<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
After executing this command block (note that <syntaxhighlight lang="stata" inline>end</syntaxhighlight> tells Stata where to stop reading), we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
If we did this, Stata would output:<br />
<br />
<syntaxhighlight lang="stata"><br />
. autoreg<br />
<br />
Source | SS df MS Number of obs = 74<br />
-------------+---------------------------------- F(2, 71) = 14.07<br />
Model | 180261702 2 90130850.8 Prob > F = 0.0000<br />
Residual | 454803695 71 6405685.84 R-squared = 0.2838<br />
-------------+---------------------------------- Adj R-squared = 0.2637<br />
Total | 635065396 73 8699525.97 Root MSE = 2530.9<br />
<br />
------------------------------------------------------------------------------<br />
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]<br />
-------------+----------------------------------------------------------------<br />
mpg | -294.1955 55.69172 -5.28 0.000 -405.2417 -183.1494<br />
|<br />
foreign |<br />
Foreign | 1767.292 700.158 2.52 0.014 371.2169 3163.368<br />
_cons | 11905.42 1158.634 10.28 0.000 9595.164 14215.67<br />
------------------------------------------------------------------------------<br />
</syntaxhighlight><br />
<br />
In other words, Stata has stored the command <syntaxhighlight lang="stata" inline>reg price mpg i.foreign</syntaxhighlight> and will execute it whenever <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> is run, as if it were an ordinary command.<br />
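<br />
Because Stata keeps a program stored under its name, re-running a definition block without first dropping the program fails. The sketch below uses a hypothetical program <syntaxhighlight lang="stata" inline>hello</syntaxhighlight> to show the error (return code 110, "already defined") and how <syntaxhighlight lang="stata" inline>capture program drop</syntaxhighlight> avoids it.<br />
<br />
<syntaxhighlight lang="stata"><br />
* Hypothetical program used only to illustrate redefinition<br />
capture program drop hello<br />
program define hello<br />
    display "first version"<br />
end<br />
<br />
* Defining the same name again without dropping it first fails<br />
capture program define hello<br />
local rc = _rc<br />
display "Return code: `rc'"<br />
<br />
* Dropping first frees the name, so the block can be re-run safely<br />
capture program drop hello<br />
program define hello<br />
    display "second version"<br />
end<br />
hello<br />
</syntaxhighlight><br />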
<br />
As a first extension, we might try writing a command that does not depend on any particular dataset, such as one that lists all the values of each variable. Such a program might look like the following:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
foreach var of varlist * {<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
We could then run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist<br />
</syntaxhighlight><br />
<br />
Similarly, we could use any other dataset in place of <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight>. We now have a useful piece of code that can be executed with any dataset open, without rewriting a mildly complex loop each time. When we want to save such a snippet, we usually write an ado-file: we name the file <syntaxhighlight lang="stata" inline>levelslist.ado</syntaxhighlight> and add a star-bang line (a comment beginning with <syntaxhighlight lang="stata" inline>*!</syntaxhighlight>) along with some other comments containing metadata about the code. The full file would look something like this:<br />
<br />
<syntaxhighlight lang="stata"><br />
*! Version 0.1 published 24 November 2020<br />
*! by Benjamin Daniels bbdaniels@gmail.com<br />
<br />
// A program to print all levels of variables<br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
// Loop over variables<br />
foreach var of varlist * {<br />
<br />
// Get levels and display name and label of variable<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
<br />
// Print the value of each level for the current variable<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
The file would then just need to be run using <syntaxhighlight lang="stata" inline>run levelslist.ado</syntaxhighlight> in the runfile of the reproducibility package. Because programs have global scope in Stata, the command <syntaxhighlight lang="stata" inline>levelslist</syntaxhighlight> would then be available to every do-file in that package. However, this command is not very useful yet: it outputs far too much information, particularly for variables that take integer or continuous values with many levels. The next section introduces code that allows such commands to be customized for each context in which you use them.<br />
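
A runfile excerpt might look like the sketch below. The folder <syntaxhighlight lang="stata" inline>code/ado</syntaxhighlight> and the do-file name are hypothetical. The second option, adding the folder to the ado-path with <syntaxhighlight lang="stata" inline>adopath</syntaxhighlight>, is an alternative not discussed above: Stata then loads <syntaxhighlight lang="stata" inline>levelslist.ado</syntaxhighlight> automatically the first time the command is called, because the file name matches the program name.<br />

<syntaxhighlight lang="stata"><br />
// Hypothetical runfile excerpt; all paths below are assumptions
// Option 1: define levelslist for the rest of the session
run "code/ado/levelslist.ado"

// Option 2: add the folder to the ado-path so Stata finds levelslist.ado on demand
adopath ++ "code/ado"
// Report which file Stata will use, including its *! lines
which levelslist

// Any do-file called afterwards can use the command
do "code/describe_data.do"
</syntaxhighlight><br />

Either way, the starbang lines make it easy to confirm which version of the command a package used.<br />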
<br />
==The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses the inputs that a user types after the program name, so that the program block can adapt to the context in which it is executed. It enables all the main features that appear in ordinary Stata commands, including input lists (such as variable lists or file names), <syntaxhighlight lang="stata" inline>if</syntaxhighlight> and <syntaxhighlight lang="stata" inline>in</syntaxhighlight> restrictions, <syntaxhighlight lang="stata" inline>using</syntaxhighlight> targets, <syntaxhighlight lang="stata" inline>=</syntaxhighlight> expressions, weights, and options (after the comma in the command).<br />
<br />
The help file for the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command is extensive and documents many automated checks and advanced features, particularly for factor variables and time-series operators (<syntaxhighlight lang="stata" inline>fv</syntaxhighlight> and <syntaxhighlight lang="stata" inline>ts</syntaxhighlight>). For advanced applications, always consult the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> help file to see how to accomplish your objective. For now, we will take a simple tour of how <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> creates an adaptive command.<br />
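
To see what <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> hands to a program, a hypothetical program <syntaxhighlight lang="stata" inline>showmacros</syntaxhighlight> can simply display the macros it creates. This sketch declares a <syntaxhighlight lang="stata" inline>varlist</syntaxhighlight> rather than <syntaxhighlight lang="stata" inline>anything</syntaxhighlight>, so <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> itself checks that the variables exist and expands wildcards such as <syntaxhighlight lang="stata" inline>m*</syntaxhighlight>; the option name <syntaxhighlight lang="stata" inline>TItle()</syntaxhighlight> is made up for illustration.<br />

<syntaxhighlight lang="stata"><br />
cap prog drop showmacros
prog def showmacros
  // varlist must name existing variables; [if], [in] and the option are optional
  syntax varlist [if] [in] [, TItle(string)]
  di `"varlist: `varlist'"'
  di `"if:      `if'"'
  di `"in:      `in'"'
  di `"title:   `title'"'
end

sysuse auto.dta , clear
showmacros price m* if foreign == 1 in 1/60 , ti(Example)
</syntaxhighlight><br />

Misspelling a variable name stops the program with an error before any of its own code runs, which is one of the automated checks mentioned above.<br />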
<br />
First, let's add simple syntax allowing the user to select the variables and observations they want to include. We might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
qui levelsof `var' , local(levels)<br />
di " "<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
There are several key features to note here. First, we write <syntaxhighlight lang="stata" inline>anything</syntaxhighlight> in the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command to allow the user to write absolutely anything they like as the arguments to be passed into the program. By default, this is assigned to the string local <syntaxhighlight lang="stata" inline>`anything'</syntaxhighlight> and can be recovered throughout the program. Recall that local macros in Stata have strictly local scope; in this case, that means locals from the calling do-file will not be passed into the program, and locals from the program will not be passed back into the calling do-file.<br />
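
The scoping rule can be checked directly with a throwaway program (the names <syntaxhighlight lang="stata" inline>scopedemo</syntaxhighlight>, <syntaxhighlight lang="stata" inline>greeting</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>inner</syntaxhighlight> are illustrative). Run from a do-file, both displayed values are empty:<br />

<syntaxhighlight lang="stata"><br />
cap prog drop scopedemo
prog def scopedemo
  // The caller's local greeting is not visible here, so this shows ""
  di `"Inside the program, greeting is: "`greeting'""'
  // This local disappears when the program ends
  local inner "set inside the program"
end

local greeting "hello"
scopedemo
// The program's local inner did not leak out, so this also shows ""
di `"Outside the program, inner is: "`inner'""'
</syntaxhighlight><br />

This is why everything a program needs must arrive through its arguments, which <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> parses.<br />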
<br />
Second, we write <syntaxhighlight lang="stata" inline>[if]</syntaxhighlight> in brackets to indicate that the user may optionally specify an if-restriction. This does nothing on its own: it simply creates another local string macro called <syntaxhighlight lang="stata" inline>`if'</syntaxhighlight> containing the restriction. However, Stata provides the shortcut <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> to implement it. Calling <syntaxhighlight lang="stata" inline>marksample touse</syntaxhighlight> creates a temporary indicator variable, whose name is stored in the local <syntaxhighlight lang="stata" inline>`touse'</syntaxhighlight>, that equals 1 for observations satisfying the if-restriction and 0 for all others.<br />
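
The sketch below makes the marker variable's effect visible. The program name <syntaxhighlight lang="stata" inline>countsample</syntaxhighlight> is hypothetical, and it is declared <syntaxhighlight lang="stata" inline>rclass</syntaxhighlight> only so that the count can be retrieved afterwards in <syntaxhighlight lang="stata" inline>r(N)</syntaxhighlight>. It also accepts <syntaxhighlight lang="stata" inline>[in]</syntaxhighlight>, which <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> handles in the same way as <syntaxhighlight lang="stata" inline>[if]</syntaxhighlight>.<br />

<syntaxhighlight lang="stata"><br />
cap prog drop countsample
prog def countsample , rclass
  syntax [if] [in]
  // touse = 1 for observations meeting the if/in restrictions, 0 otherwise
  marksample touse
  qui count if `touse'
  di as text "Observations in sample: " as result r(N)
  return scalar N = r(N)
end

sysuse auto.dta , clear
countsample if foreign == 1
countsample in 1/10
</syntaxhighlight><br />

In <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight>, the first call counts the 22 foreign cars and the second counts the first 10 observations.<br />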
<br />
Then the restriction must be applied: we <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> the data and then drop the ineligible observations (here with <syntaxhighlight lang="stata" inline>keep if</syntaxhighlight>) before running more code. This is an appropriate choice because <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> automatically restores the data to their original state when the program ends, whether it finishes normally or exits with an error; an explicit <syntaxhighlight lang="stata" inline>restore</syntaxhighlight> is not needed. For this reason, in programs we will often use <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> only in this way, and prefer other methods for loading and re-loading data inside the program block.<br />
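
The claim that <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> restores the data even after an error can be tested with a deliberately failing program (the name <syntaxhighlight lang="stata" inline>wipedata</syntaxhighlight> is illustrative):<br />

<syntaxhighlight lang="stata"><br />
cap prog drop wipedata
prog def wipedata
  preserve
  // Remove all data inside the program
  drop _all
  di "Observations inside the program: " _N
  // Stop with an error on purpose; no restore command is ever reached
  error 198
end

sysuse auto.dta , clear
// capture noisily shows the error message but lets the do-file continue
capture noisily wipedata
// preserve has put all 74 observations back
di "Observations after the program: " _N
</syntaxhighlight><br />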
<br />
Now, we can run commands like:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist foreign<br />
levelslist foreign make if foreign == 1<br />
<br />
sysuse census.dta , clear
levelslist region<br />
levelslist state if region == 1<br />
</syntaxhighlight><br />
<br />
Other <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> elements work similarly, although <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> only accounts for the elements that define the sample (<syntaxhighlight lang="stata" inline>if</syntaxhighlight>, <syntaxhighlight lang="stata" inline>in</syntaxhighlight>, weights, and missing values in a varlist). The <syntaxhighlight lang="stata" inline>using</syntaxhighlight> element is typically used to target a file on the operating system, and it is the feature of choice when you want to import or export data. Always implement and test it with compound double quotes (for example, <syntaxhighlight lang="stata" inline>`" `using' "'</syntaxhighlight>) so that file paths containing spaces are handled correctly. Also decide whether the <syntaxhighlight lang="stata" inline>`using'</syntaxhighlight> macro should include the word <syntaxhighlight lang="stata" inline>using</syntaxhighlight> itself: by default it does, while writing <syntaxhighlight lang="stata" inline>[using/]</syntaxhighlight> stores only the file name. See the help file for details.<br />
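
As a sketch of the <syntaxhighlight lang="stata" inline>using/</syntaxhighlight> pattern, the hypothetical program <syntaxhighlight lang="stata" inline>savesubset</syntaxhighlight> saves the observations meeting an if-restriction to a file. The file name deliberately contains a space to show why the compound double quotes matter; it is written to the current working directory and erased at the end.<br />

<syntaxhighlight lang="stata"><br />
cap prog drop savesubset
prog def savesubset
  // using/ stores the file name without the word "using"
  syntax using/ [if] [, replace]
  preserve
  marksample touse
  qui keep if `touse'
  // Compound double quotes keep file names with spaces intact
  save `"`using'"' , `replace'
end

sysuse auto.dta , clear
savesubset using "foreign cars.dta" if foreign == 1 , replace
use "foreign cars.dta" , clear
count
erase "foreign cars.dta"
</syntaxhighlight><br />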
<br />
Finally, the options syntax lets the user switch on optional behavior. Let's allow the user to request value labels by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if] , [VALuelabels]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
qui levelsof `var' , local(levels)<br />
di " "<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
// Implement value label option if specified<br />
if "`valuelabels'" != "" {<br />
local thisLabel : label (`var') `word'<br />
local thisLabel = ": `thisLabel'"<br />
}<br />
di " `word'`thisLabel'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
When the <syntaxhighlight lang="stata" inline>valuelabels</syntaxhighlight> option is specified (either as <syntaxhighlight lang="stata" inline>, val</syntaxhighlight>, the shortest abbreviation allowed by the capital letters in the declaration, or with its full name), the <syntaxhighlight lang="stata" inline>`valuelabels'</syntaxhighlight> macro contains <syntaxhighlight lang="stata" inline>"valuelabels"</syntaxhighlight>. Otherwise, it is empty. Simple conditionals therefore allow options to be checked and executed. Now we could run:<br />
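
A stripped-down program (the name <syntaxhighlight lang="stata" inline>optdemo</syntaxhighlight> is illustrative) shows what the option macro contains in each case:<br />

<syntaxhighlight lang="stata"><br />
cap prog drop optdemo
prog def optdemo
  syntax [, VALuelabels]
  di `"valuelabels contains: "`valuelabels'""'
end

// Shows an empty string
optdemo
// Both of these show "valuelabels"
optdemo , val
optdemo , valuelabels
</syntaxhighlight><br />

An abbreviation shorter than the capitalized part, such as <syntaxhighlight lang="stata" inline>, v</syntaxhighlight>, is rejected by <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> with an invalid-option error.<br />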
<br />
<syntaxhighlight lang="stata"><br />
sysuse census.dta , clear
levelslist region , val<br />
</syntaxhighlight><br />
<br />
and we would get:<br />
<br />
<syntaxhighlight lang="stata"><br />
Levels of region: Census region<br />
1: NE<br />
2: N Cntrl<br />
3: South<br />
4: West<br />
</syntaxhighlight><br />
<br />
However, the command would then fail if we ran <syntaxhighlight lang="stata" inline>levelslist region state , val</syntaxhighlight>, because <syntaxhighlight lang="stata" inline>state</syntaxhighlight> is a string variable and cannot have value labels. So we might instead allow the user to specify the list of variables for which labels should be shown, as follows:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if] , [VALuelabels(string asis)]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
qui levelsof `var' , local(levels)<br />
di " "<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
// Implement valuelabels option<br />
local thisLabel ""<br />
if strpos(" `valuelabels' "," `var' ") >= 1 {<br />
local thisLabel : label (`var') `word'<br />
local thisLabel = ": `thisLabel'"<br />
}<br />
di " `word'`thisLabel'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
Because the option is now declared as <syntaxhighlight lang="stata" inline>[VALuelabels(string asis)]</syntaxhighlight>, the macro will contain either the text written inside the option or nothing. We need to rewrite the implementation slightly. First, we reset <syntaxhighlight lang="stata" inline>`thisLabel'</syntaxhighlight> on every iteration so that it is empty whenever the option does not apply to the current variable. Second, we use a function like <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight> to check whether the variable occurs in the list. This kind of matching requires the user to write out full variable names, because an abbreviation will not match. Also note the extra spaces around both arguments of <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight>; these ensure that a variable whose name is only part of a listed name does not trigger the option. Now we can run <syntaxhighlight lang="stata" inline>levelslist region state , val(region)</syntaxhighlight> and get the results we wanted.<br />
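
The effect of the padding is easy to check at the command line. The names <syntaxhighlight lang="stata" inline>price2</syntaxhighlight> and <syntaxhighlight lang="stata" inline>mpg</syntaxhighlight> below are just example strings. The last lines show an alternative not used in the code above, the <syntaxhighlight lang="stata" inline>list posof</syntaxhighlight> macro function, which also compares whole list elements and returns 0 when there is no match.<br />

<syntaxhighlight lang="stata"><br />
// Unpadded: "price" is found inside "price2", a false match (shows 1)
di strpos("price2 mpg", "price")

// Padded: only a whole, space-delimited name matches (shows 0, then 8)
di strpos(" price2 mpg ", " price ")
di strpos(" price2 mpg ", " mpg ")

// Alternative: list posof compares whole list elements (shows 0)
local valuelabels "price2 mpg"
local pos : list posof "price" in valuelabels
di `pos'
</syntaxhighlight><br />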
<br />
==The <syntaxhighlight lang="stata" inline>temp</syntaxhighlight> commands==</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Programming_(Ado-files)&diff=7772Stata Coding Practices: Programming (Ado-files)2020-11-24T22:16:02Z<p>Bbdaniels: /* The syntax command */</p>
<hr />
<div>Programs and ado-files are the main methods by which Stata code is condensed and generalized. Writing code that applies to arbitrary inputs, and saving that code in a separate file, keeps the main do-file cleaner and makes it easier to re-use the same analytical process on other datasets in the future. Stata has special commands that enable this functionality. All commands on SSC are ado-files written by other programmers; it is also possible to embed programs in ordinary do-files to save space and improve the organization of code.<br />
<br />
==Read First==<br />
<br />
This article will refer somewhat interchangeably to the concepts of "programming", "ado-files", and "user-written commands", as distinct from ordinary do-file coding. The article does not assume that you are actually writing an ado-file (as opposed to a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> definition in an ordinary do-file), nor that you are writing a command for distribution. That said, Stata programming functionality relies on several core features:<br />
<br />
* The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command sets up the code environment for writing a program into memory.<br />
* The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses inputs into a program as macros that can be used within the scope of that program execution.<br />
* The <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> commands all create objects that can be used within the scope of program execution to avoid any conflict with arbitrary data structures.<br />
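
The three features listed above usually appear together. As an illustration, here is a hypothetical program, <syntaxhighlight lang="stata" inline>sumtotal</syntaxhighlight> (not part of the original article), that uses <syntaxhighlight lang="stata" inline>program</syntaxhighlight> to define the command, <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> to parse a variable list and an optional if-restriction, and <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight> to create a working variable that is dropped automatically when the program ends:<br />

<syntaxhighlight lang="stata"><br />
cap prog drop sumtotal
prog def sumtotal
  // syntax: require existing numeric variables; allow an optional if-restriction
  syntax varlist(numeric) [if]
  marksample touse
  // tempvar: reserve a variable name that cannot clash with the user's data
  tempvar total
  egen `total' = rowtotal(`varlist') if `touse'
  summarize `total'
end

sysuse auto.dta , clear
sumtotal price mpg if foreign == 1
// The temporary variable is gone after the program ends: still 12 variables
describe , short
</syntaxhighlight><br />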
<br />
==The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command defines the scope of a Stata program inside a do-file or ado-file. When a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> block is executed, Stata stores the sequence of commands written inside the block (until the end of the session, or until the program is dropped) and assigns it to the command name used in the <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command. Running <syntaxhighlight lang="stata" inline>capture program drop</syntaxhighlight> with that name before the block ensures that the name is available, since Stata will not redefine a program that is already in memory. For example, we might write the following program in an ordinary do-file:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop autoreg
prog def autoreg<br />
<br />
reg price mpg i.foreign<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
After executing this command block (note that <syntaxhighlight lang="stata" inline>end</syntaxhighlight> tells Stata where to stop reading), we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
If we did this, Stata would output:<br />
<br />
<syntaxhighlight lang="stata"><br />
. autoreg<br />
<br />
Source | SS df MS Number of obs = 74<br />
-------------+---------------------------------- F(2, 71) = 14.07<br />
Model | 180261702 2 90130850.8 Prob > F = 0.0000<br />
Residual | 454803695 71 6405685.84 R-squared = 0.2838<br />
-------------+---------------------------------- Adj R-squared = 0.2637<br />
Total | 635065396 73 8699525.97 Root MSE = 2530.9<br />
<br />
------------------------------------------------------------------------------<br />
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]<br />
-------------+----------------------------------------------------------------<br />
mpg | -294.1955 55.69172 -5.28 0.000 -405.2417 -183.1494<br />
|<br />
foreign |<br />
Foreign | 1767.292 700.158 2.52 0.014 371.2169 3163.368<br />
_cons | 11905.42 1158.634 10.28 0.000 9595.164 14215.67<br />
------------------------------------------------------------------------------<br />
</syntaxhighlight><br />
<br />
In other words, Stata has stored the command <syntaxhighlight lang="stata" inline>reg price mpg i.foreign</syntaxhighlight> and will execute it whenever <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> is run, just as if it were an ordinary command.<br />
<br />
As a first extension, we might try writing a command that does not depend on any particular dataset, such as one that lists all the values of each variable in whatever data are loaded. Such a program might look like the following:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
foreach var of varlist * {<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
We could then run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist
</syntaxhighlight><br />
<br />
Similarly, we could use any other dataset in place of <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight>. We now have a useful piece of code that we can execute with any dataset open, without re-writing a mildly complex loop each time. To save such a snippet, we usually write an ado-file: we name the file <syntaxhighlight lang="stata" inline>levelslist.ado</syntaxhighlight> and add starbang lines (comments beginning with <syntaxhighlight lang="stata" inline>*!</syntaxhighlight>) recording metadata such as the version and author, plus ordinary comments describing the code. The full file would look something like this:<br />
<br />
<syntaxhighlight lang="stata"><br />
*! Version 0.1 published 24 November 2020<br />
*! by Benjamin Daniels bbdaniels@gmail.com<br />
<br />
// A program to print all levels of variables<br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
// Loop over variables<br />
foreach var of varlist * {<br />
<br />
// Get levels and display name and label of variable<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
<br />
// Print the value of each level for the current variable<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
The file would then just need to be run using <syntaxhighlight lang="stata" inline>run levelslist.ado</syntaxhighlight> in the runfile of the reproducibility package. Because programs have global scope in Stata, the command <syntaxhighlight lang="stata" inline>levelslist</syntaxhighlight> would then be available to every do-file in that package. However, this command is not very useful yet: it outputs far too much information, particularly for variables that take integer or continuous values with many levels. The next section introduces code that allows such commands to be customized for each context in which you use them.<br />
<br />
==The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses the inputs that a user types after the program name, so that the program block can adapt to the context in which it is executed. It enables all the main features that appear in ordinary Stata commands, including input lists (such as variable lists or file names), <syntaxhighlight lang="stata" inline>if</syntaxhighlight> and <syntaxhighlight lang="stata" inline>in</syntaxhighlight> restrictions, <syntaxhighlight lang="stata" inline>using</syntaxhighlight> targets, <syntaxhighlight lang="stata" inline>=</syntaxhighlight> expressions, weights, and options (after the comma in the command).<br />
<br />
The help file for the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command is extensive and documents many automated checks and advanced features, particularly for factor variables and time-series operators (<syntaxhighlight lang="stata" inline>fv</syntaxhighlight> and <syntaxhighlight lang="stata" inline>ts</syntaxhighlight>). For advanced applications, always consult the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> help file to see how to accomplish your objective. For now, we will take a simple tour of how <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> creates an adaptive command.<br />
<br />
First, let's add simple syntax allowing the user to select the variables and observations they want to include. We might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
qui levelsof `var' , local(levels)<br />
di " "<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
There are several key features to note here. First, we write <syntaxhighlight lang="stata" inline>anything</syntaxhighlight> in the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command to allow the user to write absolutely anything they like as the arguments to be passed into the program. By default, this is assigned to the string local <syntaxhighlight lang="stata" inline>`anything'</syntaxhighlight> and can be recovered throughout the program. Recall that local macros in Stata have strictly local scope; in this case, that means locals from the calling do-file will not be passed into the program, and locals from the program will not be passed back into the calling do-file.<br />
<br />
Second, we write <syntaxhighlight lang="stata" inline>[if]</syntaxhighlight> in brackets to indicate that the user may optionally specify an if-restriction. This does nothing on its own: it simply creates another local string macro called <syntaxhighlight lang="stata" inline>`if'</syntaxhighlight> containing the restriction. However, Stata provides the shortcut <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> to implement it. Calling <syntaxhighlight lang="stata" inline>marksample touse</syntaxhighlight> creates a temporary indicator variable, whose name is stored in the local <syntaxhighlight lang="stata" inline>`touse'</syntaxhighlight>, that equals 1 for observations satisfying the if-restriction and 0 for all others.<br />
<br />
Then the restriction must be applied: we <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> the data and then drop the ineligible observations (here with <syntaxhighlight lang="stata" inline>keep if</syntaxhighlight>) before running more code. This is an appropriate choice because <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> automatically restores the data to their original state when the program ends, whether it finishes normally or exits with an error; an explicit <syntaxhighlight lang="stata" inline>restore</syntaxhighlight> is not needed. For this reason, in programs we will often use <syntaxhighlight lang="stata" inline>preserve</syntaxhighlight> only in this way, and prefer other methods for loading and re-loading data inside the program block.<br />
<br />
Now, we can run commands like:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist foreign<br />
levelslist foreign make if foreign == 1<br />
<br />
sysuse census.dta , clear
levelslist region<br />
levelslist state if region == 1<br />
</syntaxhighlight><br />
<br />
Other <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> elements work similarly, although <syntaxhighlight lang="stata" inline>marksample</syntaxhighlight> only accounts for the elements that define the sample (<syntaxhighlight lang="stata" inline>if</syntaxhighlight>, <syntaxhighlight lang="stata" inline>in</syntaxhighlight>, weights, and missing values in a varlist). The <syntaxhighlight lang="stata" inline>using</syntaxhighlight> element is typically used to target a file on the operating system, and it is the feature of choice when you want to import or export data. Always implement and test it with compound double quotes (for example, <syntaxhighlight lang="stata" inline>`" `using' "'</syntaxhighlight>) so that file paths containing spaces are handled correctly. Also decide whether the <syntaxhighlight lang="stata" inline>`using'</syntaxhighlight> macro should include the word <syntaxhighlight lang="stata" inline>using</syntaxhighlight> itself: by default it does, while writing <syntaxhighlight lang="stata" inline>[using/]</syntaxhighlight> stores only the file name. See the help file for details.<br />
<br />
Finally, the options syntax lets the user switch on optional behavior. Let's allow the user to request value labels by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if] , [VALuelabels]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
qui levelsof `var' , local(levels)<br />
di " "<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
// Implement value label option if specified<br />
if "`valuelabels'" != "" {<br />
local thisLabel : label (`var') `word'<br />
local thisLabel = ": `thisLabel'"<br />
}<br />
di " `word'`thisLabel'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
When the <syntaxhighlight lang="stata" inline>valuelabels</syntaxhighlight> option is specified (either as <syntaxhighlight lang="stata" inline>, val</syntaxhighlight>, the shortest abbreviation allowed by the capital letters in the declaration, or with its full name), the <syntaxhighlight lang="stata" inline>`valuelabels'</syntaxhighlight> macro contains <syntaxhighlight lang="stata" inline>"valuelabels"</syntaxhighlight>. Otherwise, it is empty. Simple conditionals therefore allow options to be checked and executed. Now we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse census.dta , clear
levelslist region , val<br />
</syntaxhighlight><br />
<br />
and we would get:<br />
<br />
<syntaxhighlight lang="stata"><br />
Levels of region: Census region<br />
1: NE<br />
2: N Cntrl<br />
3: South<br />
4: West<br />
</syntaxhighlight><br />
<br />
However, the command would then fail if we ran <syntaxhighlight lang="stata" inline>levelslist region state , val</syntaxhighlight>, because <syntaxhighlight lang="stata" inline>state</syntaxhighlight> is a string variable and cannot have value labels. So we might instead allow the user to specify the list of variables for which labels should be shown, as follows:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
syntax anything [if] , [VALuelabels(string asis)]<br />
<br />
// Implement [if]<br />
preserve<br />
marksample touse<br />
qui keep if `touse' == 1<br />
<br />
// Main program loops<br />
foreach var of varlist `anything' {<br />
qui levelsof `var' , local(levels)<br />
di " "<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
// Implement valuelabels option<br />
local thisLabel ""<br />
if strpos(" `valuelabels' "," `var' ") >= 1 {<br />
local thisLabel : label (`var') `word'<br />
local thisLabel = ": `thisLabel'"<br />
}<br />
di " `word'`thisLabel'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
Because the option is now declared as <syntaxhighlight lang="stata" inline>[VALuelabels(string asis)]</syntaxhighlight>, the macro will contain either the text written inside the option or nothing. We need to rewrite the implementation slightly. First, we reset <syntaxhighlight lang="stata" inline>`thisLabel'</syntaxhighlight> on every iteration so that it is empty whenever the option does not apply to the current variable. Second, we use a function like <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight> to check whether the variable occurs in the list. This kind of matching requires the user to write out full variable names, because an abbreviation will not match. Also note the extra spaces around both arguments of <syntaxhighlight lang="stata" inline>strpos()</syntaxhighlight>; these ensure that a variable whose name is only part of a listed name does not trigger the option. Now we can run <syntaxhighlight lang="stata" inline>levelslist region state , val(region)</syntaxhighlight> and get the results we wanted.<br />
<br />
==The <syntaxhighlight lang="stata" inline>temp</syntaxhighlight> commands==</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Programming_(Ado-files)&diff=7771Stata Coding Practices: Programming (Ado-files)2020-11-24T21:43:46Z<p>Bbdaniels: /* The syntax command */</p>
<hr />
<div>Programs and ado-files are the main methods by which Stata code is condensed and generalized. Writing code that applies to arbitrary inputs, and saving that code in a separate file, keeps the main do-file cleaner and makes it easier to re-use the same analytical process on other datasets in the future. Stata has special commands that enable this functionality. All commands on SSC are ado-files written by other programmers; it is also possible to embed programs in ordinary do-files to save space and improve the organization of code.<br />
<br />
==Read First==<br />
<br />
This article will refer somewhat interchangeably to the concepts of "programming", "ado-files", and "user-written commands", as distinct from ordinary do-file coding. The article does not assume that you are actually writing an ado-file (as opposed to a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> definition in an ordinary do-file), nor that you are writing a command for distribution. That said, Stata programming functionality relies on several core features:<br />
<br />
* The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command sets up the code environment for writing a program into memory.<br />
* The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses inputs into a program as macros that can be used within the scope of that program execution.<br />
* The <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> commands all create objects that can be used within the scope of program execution to avoid any conflict with arbitrary data structures.<br />
<br />
==The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command defines the scope of a Stata program inside a do-file or ado-file. When a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> block is executed, Stata stores the sequence of commands written inside the block (until the end of the session, or until the program is dropped) and assigns it to the command name used in the <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command. Running <syntaxhighlight lang="stata" inline>capture program drop</syntaxhighlight> with that name before the block ensures that the name is available, since Stata will not redefine a program that is already in memory. For example, we might write the following program in an ordinary do-file:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop autoreg
prog def autoreg<br />
<br />
reg price mpg i.foreign<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
After executing this command block (note that <syntaxhighlight lang="stata" inline>end</syntaxhighlight> tells Stata where to stop reading), we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
If we did this, Stata would output:<br />
<br />
<syntaxhighlight lang="stata"><br />
. autoreg<br />
<br />
      Source |       SS           df       MS      Number of obs   =        74<br />
-------------+----------------------------------   F(2, 71)        =     14.07<br />
       Model |   180261702         2  90130850.8   Prob > F        =    0.0000<br />
    Residual |   454803695        71  6405685.84   R-squared       =    0.2838<br />
-------------+----------------------------------   Adj R-squared   =    0.2637<br />
       Total |   635065396        73  8699525.97   Root MSE        =    2530.9<br />
<br />
------------------------------------------------------------------------------<br />
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]<br />
-------------+----------------------------------------------------------------<br />
         mpg |  -294.1955   55.69172    -5.28   0.000    -405.2417   -183.1494<br />
             |<br />
     foreign |<br />
    Foreign  |   1767.292    700.158     2.52   0.014     371.2169    3163.368<br />
       _cons |   11905.42   1158.634    10.28   0.000     9595.164    14215.67<br />
------------------------------------------------------------------------------<br />
</syntaxhighlight><br />
<br />
In other words, Stata has stored the command <syntaxhighlight lang="stata" inline>reg price mpg i.foreign</syntaxhighlight> and will execute it whenever <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> is run, as if it were an ordinary command.<br />
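<br />
The commands Stata has stored can be inspected directly. The following short sketch uses the built-in <syntaxhighlight lang="stata" inline>program dir</syntaxhighlight>, <syntaxhighlight lang="stata" inline>program list</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>program drop</syntaxhighlight> commands; it re-defines <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> first so that it runs on its own.<br />
<br />
<syntaxhighlight lang="stata"><br />
// Define autoreg as above so this sketch runs on its own<br />
cap prog drop autoreg<br />
prog def autoreg<br />
  reg price mpg i.foreign<br />
end<br />
<br />
// List the names of all programs currently in memory<br />
program dir<br />
<br />
// Print the stored lines that make up autoreg<br />
program list autoreg<br />
<br />
// Remove the definition: afterwards Stata no longer recognizes autoreg<br />
// (unless it can find an autoreg.ado file on the ado-path)<br />
program drop autoreg<br />
</syntaxhighlight><br />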
<br />
As a first extension, we might write a command that does not depend on the variable names of any particular dataset, such as one that lists all the values of each variable. Such a program might look like the following:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
foreach var of varlist * {<br />
  qui levelsof `var' , local(levels)<br />
  di "Levels of `var': `: var label `var''"<br />
  foreach word in `levels' {<br />
    di " `word'"<br />
  }<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
We could then run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist<br />
</syntaxhighlight><br />
<br />
Similarly, we could use any other dataset in place of <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight>. We now have a reusable piece of code that can be executed with any dataset open, without rewriting a mildly complex loop each time. To save such a snippet, we usually write an ado-file: we name the file <syntaxhighlight lang="stata" inline>levelslist.ado</syntaxhighlight> and add starbang lines (lines beginning with <syntaxhighlight lang="stata" inline>*!</syntaxhighlight>) containing metadata such as the version, date, and author, along with comments explaining the code. The full file would look something like this:<br />
<br />
<syntaxhighlight lang="stata"><br />
*! Version 0.1 published 24 November 2020<br />
*! by Benjamin Daniels bbdaniels@gmail.com<br />
<br />
// A program to print all levels of variables<br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
// Loop over variables<br />
foreach var of varlist * {<br />
<br />
  // Get levels and display name and label of variable<br />
  qui levelsof `var' , local(levels)<br />
  di "Levels of `var': `: var label `var''"<br />
<br />
  // Print the value of each level for the current variable<br />
  foreach word in `levels' {<br />
    di " `word'"<br />
  }<br />
<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
The file then only needs to be loaded with <syntaxhighlight lang="stata" inline>run levelslist.ado</syntaxhighlight> in the runfile of the reproducibility package for the command <syntaxhighlight lang="stata" inline>levelslist</syntaxhighlight> to be available to every do-file in that package, because a program stays defined for the rest of the Stata session once it has been loaded. However, the command is not very useful yet: it prints far too much output, particularly for integer or continuous variables with many distinct values. The next section introduces code that lets such commands be customized each time they are used.<br />
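<br />
A runfile that loads the command might look like the sketch below. The folder <syntaxhighlight lang="stata" inline>code/ado</syntaxhighlight> is a hypothetical location. Two options are shown and only one is needed: <syntaxhighlight lang="stata" inline>run</syntaxhighlight>, as described above, or adding the folder to the ado-path with <syntaxhighlight lang="stata" inline>adopath +</syntaxhighlight>, which lets Stata find <syntaxhighlight lang="stata" inline>levelslist.ado</syntaxhighlight> automatically the first time the command is called (this relies on the file name matching the program name). With the second option, <syntaxhighlight lang="stata" inline>which</syntaxhighlight> can also display the file's <syntaxhighlight lang="stata" inline>*!</syntaxhighlight> lines.<br />
<br />
<syntaxhighlight lang="stata"><br />
// Runfile sketch: "code/ado" is a hypothetical folder in the package<br />
<br />
// Option 1: define the program for the rest of the session<br />
run "code/ado/levelslist.ado"<br />
<br />
// Option 2: put the folder on the ado-path so Stata loads levelslist on first use<br />
adopath + "code/ado"<br />
which levelslist   // shows the file location and its *! lines<br />
<br />
// Either way, any later do-file can now call the command<br />
sysuse auto.dta , clear<br />
levelslist<br />
</syntaxhighlight><br />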
<br />
==The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command lets a program accept inputs that can be customized each time it is called. It supports all the main elements of ordinary Stata command syntax, including input lists (such as variable lists or file names), <syntaxhighlight lang="stata" inline>if</syntaxhighlight> and <syntaxhighlight lang="stata" inline>in</syntaxhighlight> restrictions, <syntaxhighlight lang="stata" inline>using</syntaxhighlight> targets, <syntaxhighlight lang="stata" inline>=exp</syntaxhighlight> expressions, weights, and options (after the comma in the command).<br />
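<br />
A quick way to see what <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> does is a hypothetical program, <syntaxhighlight lang="stata" inline>showsyntax</syntaxhighlight>, that simply prints the local macros it creates. This sketch covers an optional variable list, <syntaxhighlight lang="stata" inline>if</syntaxhighlight>, <syntaxhighlight lang="stata" inline>in</syntaxhighlight>, a string option, and an on/off switch. Capital letters in an option name mark its minimum abbreviation, so <syntaxhighlight lang="stata" inline>t()</syntaxhighlight> would also work for <syntaxhighlight lang="stata" inline>title()</syntaxhighlight>.<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop showsyntax<br />
prog def showsyntax<br />
<br />
  // Optional varlist, if, in, and two options:<br />
  // Title() takes a string, noLabel is an on/off switch<br />
  syntax [varlist] [if] [in] [, Title(string) noLabel]<br />
<br />
  // syntax stores each element in a local macro of the same name<br />
  di "varlist: `varlist'"<br />
  di "if:      `if'"<br />
  di "in:      `in'"<br />
  di "title:   `title'"<br />
  di "label:   `label'"<br />
<br />
end<br />
<br />
sysuse auto.dta , clear<br />
showsyntax price mpg if foreign == 1 in 1/60 , title("Test run") nolabel<br />
</syntaxhighlight><br />
<br />
Because the variable list is declared optional here, omitting it makes <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> fill <syntaxhighlight lang="stata" inline>varlist</syntaxhighlight> with every variable in the dataset, and an option that is not declared (for example <syntaxhighlight lang="stata" inline>, badoption</syntaxhighlight>) stops the program with an error.<br />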
<br />
The help file for the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command is extensive and documents many automated checks and advanced features, particularly for factor variables and time-series operators (the <syntaxhighlight lang="stata" inline>fv</syntaxhighlight> and <syntaxhighlight lang="stata" inline>ts</syntaxhighlight> specifiers). For advanced applications, consult the <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> help file to see how to accomplish your objective. For now, we will take a simple tour of how <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> makes a command adaptive.<br />
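<br />
As one example of these automated checks, the hypothetical <syntaxhighlight lang="stata" inline>autoreg2</syntaxhighlight> below generalizes <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight>. The specifiers in <syntaxhighlight lang="stata" inline>varlist(numeric fv)</syntaxhighlight> make <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> reject string variables while accepting factor-variable notation such as <syntaxhighlight lang="stata" inline>i.foreign</syntaxhighlight>, and the <syntaxhighlight lang="stata" inline>if</syntaxhighlight> and <syntaxhighlight lang="stata" inline>in</syntaxhighlight> restrictions are passed straight through to <syntaxhighlight lang="stata" inline>regress</syntaxhighlight>, which treats the first variable as the outcome. This is a sketch, not a complete command.<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop autoreg2<br />
prog def autoreg2<br />
<br />
  // Numeric variables only; factor-variable notation (i.) allowed<br />
  syntax varlist(numeric fv) [if] [in]<br />
<br />
  // First variable is the outcome, the rest are regressors<br />
  regress `varlist' `if' `in'<br />
<br />
end<br />
<br />
sysuse auto.dta , clear<br />
autoreg2 price mpg i.foreign            // same model as autoreg<br />
autoreg2 price mpg if foreign == 0      // domestic cars only<br />
capture noisily autoreg2 make mpg       // fails: make is a string variable<br />
</syntaxhighlight><br />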
<br />
First, let's add simple syntax allowing the user to select the variables and observations they want to include. We might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
  // Accept a variable list plus optional if/in restrictions<br />
  syntax varlist [if] [in]<br />
<br />
  // Mark the observations selected by [if] and [in];<br />
  // novarlist keeps observations that have missing values in varlist<br />
  marksample touse , novarlist<br />
<br />
  // Main program loops, restricted to the marked observations<br />
  foreach var of varlist `varlist' {<br />
    qui levelsof `var' if `touse' , local(levels)<br />
    di " "<br />
    di "Levels of `var': `: var label `var''"<br />
    foreach word in `levels' {<br />
      di " `word'"<br />
    }<br />
  }<br />
<br />
end<br />
</syntaxhighlight><br />
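<br />
One way to address the earlier problem of too much output is to add an option. The sketch below extends the program with a hypothetical <syntaxhighlight lang="stata" inline>maxlevels()</syntaxhighlight> option; the option name and its default of 20 are assumptions for illustration. Variables with more distinct values than the limit are reported but their values are not listed.<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
  // maxlevels() is an optional integer option with a default of 20<br />
  syntax varlist [if] [in] [, MAXlevels(integer 20)]<br />
  marksample touse , novarlist<br />
<br />
  foreach var of varlist `varlist' {<br />
    qui levelsof `var' if `touse' , local(levels)<br />
    local nlevels : word count `levels'<br />
    di " "<br />
    di "Levels of `var': `: var label `var''"<br />
<br />
    // Skip listing variables with too many distinct values<br />
    if `nlevels' > `maxlevels' {<br />
      di "  (`nlevels' levels, more than maxlevels(`maxlevels'); not listed)"<br />
      continue<br />
    }<br />
<br />
    foreach word in `levels' {<br />
      di "  `word'"<br />
    }<br />
  }<br />
<br />
end<br />
<br />
sysuse auto.dta , clear<br />
levelslist foreign rep78 price if mpg > 20 , max(10)<br />
</syntaxhighlight><br />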
<br />
==The <syntaxhighlight lang="stata" inline>temp</syntaxhighlight> commands==</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Programming_(Ado-files)&diff=7770Stata Coding Practices: Programming (Ado-files)2020-11-24T20:49:07Z<p>Bbdaniels: </p>
<hr />
<div>Programs and ado-files are the main methods by which Stata code is condensed and generalized. By writing versions of code that apply to arbitrary inputs and saving that code in a separate file, the application of the code is cleaner in the main do-file and it becomes easier to re-use the same analytical process on other datasets in the future. Stata has special commands that enable this functionality. All commands on SSC are written as ado-files by other programmers; it is also possible to embed programs in ordinary do-files to save space and improve organization of code.<br />
<br />
==Read First==<br />
<br />
This article will refer somewhat interchangeably to the concepts of "programming", "ado-files", and "user-written commands". This is in contrast to ordinary programming of do-files. The article does not assume that you are actually writing an ado-file (as opposed to a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> definition in an ordinary dofile); and it does not assume you are writing a command for distribution. That said, Stata programming functionality is achieved using several core features:<br />
<br />
* The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command sets up the code environment for writing a program into memory.<br />
* The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses inputs into a program as macros that can be used within the scope of that program execution.<br />
* The <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> commands all create objects that can be used within the scope of program execution to avoid any conflict with arbitrary data structures.<br />
<br />
==The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command defines the scope of a Stata program inside a do-file or ado-file. When a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command block is executed, Stata stores (until the end of the session) the sequence of commands written inside the block and assigns them to the command name used in the <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command. Using <syntaxhighlight lang="stata" inline>program drop</syntaxhighlight> before the block will ensure that the command space is available. For example, we might write the following program in an ordinary do-file:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop<br />
prog def autoreg<br />
<br />
reg price mpg i.foreign<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
After executing this command block (note that <syntaxhighlight lang="stata" inline>end</syntaxhighlight> tells Stata where to stop reading), we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
If we did this, Stata would output:<br />
<br />
<syntaxhighlight lang="stata"><br />
. autoreg<br />
<br />
Source | SS df MS Number of obs = 74<br />
-------------+---------------------------------- F(2, 71) = 14.07<br />
Model | 180261702 2 90130850.8 Prob > F = 0.0000<br />
Residual | 454803695 71 6405685.84 R-squared = 0.2838<br />
-------------+---------------------------------- Adj R-squared = 0.2637<br />
Total | 635065396 73 8699525.97 Root MSE = 2530.9<br />
<br />
------------------------------------------------------------------------------<br />
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]<br />
-------------+----------------------------------------------------------------<br />
mpg | -294.1955 55.69172 -5.28 0.000 -405.2417 -183.1494<br />
|<br />
foreign |<br />
Foreign | 1767.292 700.158 2.52 0.014 371.2169 3163.368<br />
_cons | 11905.42 1158.634 10.28 0.000 9595.164 14215.67<br />
------------------------------------------------------------------------------<br />
</syntaxhighlight><br />
<br />
All this is to say is that Stata has taken the command <syntaxhighlight lang="stata" inline>reg price mpg i.foreign</syntaxhighlight> and will execute it whenever <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> is run as if it were an ordinary command.<br />
<br />
As a first extension, we might try writing a command that is not dependent on the data, such as one that would list all the values of each variable for us. Such a program might look like the following:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
foreach var of varlist * {<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
We could then run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
Similarly, we could use any other dataset in place of <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight>. This means we would now have a useful piece of code that we could execute with any dataset open, without re-writing what is a mildly complex loop each time. When we want to save such a snippet, we usually write an ado-file: we name the file <syntaxhighlight lang="stata" inline>levelslist.ado</syntaxhighlight> and we add a starbang line and some comments with some metadata about the code. The full file would look something like this:<br />
<br />
<syntaxhighlight lang="stata"><br />
*! Version 0.1 published 24 November 2020<br />
*! by Benjamin Daniels bbdaniels@gmail.com<br />
<br />
// A program to print all levels of variables<br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
// Loop over variables<br />
foreach var of varlist * {<br />
<br />
// Get levels and display name and label of variable<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
<br />
// Print the value of each level for the current variable<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
The file would then just need to be run using <syntaxhighlight lang="stata" inline>run levelslist.ado</syntaxhighlight> in the runfile for the reproducibility package to ensure that the command <syntaxhighlight lang="stata" inline>levelslist</syntaxhighlight> would be available to all do-files in that package (since programs have a global scope in Stata). However, this command is not very useful at this stage: it outputs far too much useless information, particularly when variables take integer or continuous values with many levels. The next section will introduce code that allows such commands to be customizable within each context you want to use them.<br />
<br />
==The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command==<br />
<br />
==The <syntaxhighlight lang="stata" inline>temp</syntaxhighlight> commands==</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Programming_(Ado-files)&diff=7769Stata Coding Practices: Programming (Ado-files)2020-11-24T20:46:41Z<p>Bbdaniels: /* The program command */</p>
<hr />
<div>Programs and ado-files are the main methods by which Stata code is condensed and generalized. By writing versions of code that apply to arbitrary inputs and saving that code in a separate file, the application of the code is cleaner in the main do-file and it becomes easier to re-use the same analytical process on other datasets in the future. Stata has special commands that enable this functionality. All commands on SSC are written as ado-files by other programmers; it is also possible to embed programs in ordinary do-files to save space and improve organization of code.<br />
<br />
==Read First==<br />
<br />
This article will refer somewhat interchangeably to the concepts of "programming", "ado-files", and "user-written commands". This is in contrast to ordinary programming of do-files. The article does not assume that you are actually writing an ado-file (as opposed to a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> definition in an ordinary dofile); and it does not assume you are writing a command for distribution. That said, Stata programming functionality is achieved using several core features:<br />
<br />
* The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command sets up the code environment for writing a program into memory.<br />
* The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses inputs into a program as macros that can be used within the scope of that program execution.<br />
* The <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> commands all create objects that can be used within the scope of program execution to avoid any conflict with arbitrary data structures.<br />
<br />
==The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command defines the scope of a Stata program inside a do-file or ado-file. When a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command block is executed, Stata stores (until the end of the session) the sequence of commands written inside the block and assigns them to the command name used in the <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command. Using <syntaxhighlight lang="stata" inline>program drop</syntaxhighlight> before the block will ensure that the command space is available. For example, we might write the following program in an ordinary do-file:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop<br />
prog def autoreg<br />
<br />
reg price mpg i.foreign<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
After executing this command block (note that <syntaxhighlight lang="stata" inline>end</syntaxhighlight> tells Stata where to stop reading), we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
If we did this, Stata would output:<br />
<br />
<syntaxhighlight lang="stata"><br />
. autoreg<br />
<br />
Source | SS df MS Number of obs = 74<br />
-------------+---------------------------------- F(2, 71) = 14.07<br />
Model | 180261702 2 90130850.8 Prob > F = 0.0000<br />
Residual | 454803695 71 6405685.84 R-squared = 0.2838<br />
-------------+---------------------------------- Adj R-squared = 0.2637<br />
Total | 635065396 73 8699525.97 Root MSE = 2530.9<br />
<br />
------------------------------------------------------------------------------<br />
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]<br />
-------------+----------------------------------------------------------------<br />
mpg | -294.1955 55.69172 -5.28 0.000 -405.2417 -183.1494<br />
|<br />
foreign |<br />
Foreign | 1767.292 700.158 2.52 0.014 371.2169 3163.368<br />
_cons | 11905.42 1158.634 10.28 0.000 9595.164 14215.67<br />
------------------------------------------------------------------------------<br />
</syntaxhighlight><br />
<br />
All this is to say is that Stata has taken the command <syntaxhighlight lang="stata" inline>reg price mpg i.foreign</syntaxhighlight> and will execute it whenever <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> is run as if it were an ordinary command.<br />
<br />
As a first extension, we might try writing a command that is not dependent on the data, such as one that would list all the values of each variable for us. Such a program might look like the following:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
foreach var of varlist * {<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
We could then run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
Similarly, we could use any other dataset in place of <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight>. This means we would now have a useful piece of code that we could execute with any dataset open, without re-writing what is a mildly complex loop each time. When we want to save such a snippet, we usually write an ado-file: we name the file <syntaxhighlight lang="stata" inline>levelslist.ado</syntaxhighlight> and we add a starbang line and some comments with some metadata about the code. The full file would look something like this:<br />
<br />
<syntaxhighlight lang="stata"><br />
*! Version 0.1 published 24 November 2020<br />
*! by Benjamin Daniels bbdaniels@gmail.com<br />
<br />
// A program to print all levels of variables<br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
// Loop over variables<br />
foreach var of varlist * {<br />
<br />
// Get levels and display name and label of variable<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
<br />
// Print the value of each level for the current variable<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
The file would then just need to be run using <syntaxhighlight lang="stata" inline>run levelslist.ado</syntaxhighlight> in the runfile for the reproducibility package to ensure that the command <syntaxhighlight lang="stata" inline>levelslist</syntaxhighlight> would be available to all do-files in that package (since programs have a global scope in Stata). However, this command is not very useful at this stage: it outputs far too much useless information, particularly when variables take integer or continuous values with many levels. The next section will introduce code that allows such commands to be customizable within each context you want to use them.</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Programming_(Ado-files)&diff=7768Stata Coding Practices: Programming (Ado-files)2020-11-24T20:44:35Z<p>Bbdaniels: /* The program command */</p>
<hr />
<div>Programs and ado-files are the main methods by which Stata code is condensed and generalized. By writing versions of code that apply to arbitrary inputs and saving that code in a separate file, the application of the code is cleaner in the main do-file and it becomes easier to re-use the same analytical process on other datasets in the future. Stata has special commands that enable this functionality. All commands on SSC are written as ado-files by other programmers; it is also possible to embed programs in ordinary do-files to save space and improve organization of code.<br />
<br />
==Read First==<br />
<br />
This article will refer somewhat interchangeably to the concepts of "programming", "ado-files", and "user-written commands". This is in contrast to ordinary programming of do-files. The article does not assume that you are actually writing an ado-file (as opposed to a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> definition in an ordinary dofile); and it does not assume you are writing a command for distribution. That said, Stata programming functionality is achieved using several core features:<br />
<br />
* The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command sets up the code environment for writing a program into memory.<br />
* The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses inputs into a program as macros that can be used within the scope of that program execution.<br />
* The <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> commands all create objects that can be used within the scope of program execution to avoid any conflict with arbitrary data structures.<br />
<br />
==The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command defines the scope of a Stata program inside a do-file or ado-file. When a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command block is executed, Stata stores (until the end of the session) the sequence of commands written inside the block and assigns them to the command name used in the <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command. Using <syntaxhighlight lang="stata" inline>program drop</syntaxhighlight> before the block will ensure that the command space is available. For example, we might write the following program in an ordinary do-file:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop<br />
prog def autoreg<br />
<br />
reg price mpg i.foreign<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
After executing this command block (note that <syntaxhighlight lang="stata" inline>end</syntaxhighlight> tells Stata where to stop reading), we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
If we did this, Stata would output:<br />
<br />
<syntaxhighlight lang="stata"><br />
. autoreg<br />
<br />
Source | SS df MS Number of obs = 74<br />
-------------+---------------------------------- F(2, 71) = 14.07<br />
Model | 180261702 2 90130850.8 Prob > F = 0.0000<br />
Residual | 454803695 71 6405685.84 R-squared = 0.2838<br />
-------------+---------------------------------- Adj R-squared = 0.2637<br />
Total | 635065396 73 8699525.97 Root MSE = 2530.9<br />
<br />
------------------------------------------------------------------------------<br />
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]<br />
-------------+----------------------------------------------------------------<br />
mpg | -294.1955 55.69172 -5.28 0.000 -405.2417 -183.1494<br />
|<br />
foreign |<br />
Foreign | 1767.292 700.158 2.52 0.014 371.2169 3163.368<br />
_cons | 11905.42 1158.634 10.28 0.000 9595.164 14215.67<br />
------------------------------------------------------------------------------<br />
</syntaxhighlight><br />
<br />
All this is to say is that Stata has taken the command <syntaxhighlight lang="stata" inline>reg price mpg i.foreign</syntaxhighlight> and will execute it whenever <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> is run as if it were an ordinary command.<br />
<br />
As a first extension, we might try writing a command that is not dependent on the data, such as one that would list all the values of each variable for us. Such a program might look like the following:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
foreach var of varlist * {<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
We could then run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
Similarly, we could use any other dataset in place of <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight>. This means we would now have a useful piece of code that we could execute with any dataset open, without re-writing what is a mildly complex loop each time. When we want to save such a snippet, we usually write an ado-file: we name the file <syntaxhighlight lang="stata" inline>levelslist.ado</syntaxhighlight> and we add a hashbang line with some metadata about the code. The full file would looks something like this:<br />
<br />
<syntaxhighlight lang="stata"><br />
*! Version 0.1 published 24 November 2020<br />
*! by Benjamin Daniels bbdaniels@gmail.com<br />
<br />
// A program to print all levels of variables<br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
// Loop over variables<br />
foreach var of varlist * {<br />
<br />
// Get levels and display name and label of variable<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
<br />
// Print the value of each level for the current variable<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
The file would then just need to be run using <syntaxhighlight lang="stata" inline>run levelslist.ado</syntaxhighlight> in the runfile for the reproducibility package to ensure that the command <syntaxhighlight lang="stata" inline>levelslist</syntaxhighlight> would be available to all do-files in that package (since programs have a global scope in Stata). However, this command is not very useful at this stage: it outputs far too much useless information, particularly when variables take integer or continuous values with many levels. The next section will introduce code that allows such commands to be customizable within each context you want to use them.</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Programming_(Ado-files)&diff=7767Stata Coding Practices: Programming (Ado-files)2020-11-24T20:39:54Z<p>Bbdaniels: /* The program command */</p>
<hr />
<div>Programs and ado-files are the main methods by which Stata code is condensed and generalized. By writing versions of code that apply to arbitrary inputs and saving that code in a separate file, the application of the code is cleaner in the main do-file and it becomes easier to re-use the same analytical process on other datasets in the future. Stata has special commands that enable this functionality. All commands on SSC are written as ado-files by other programmers; it is also possible to embed programs in ordinary do-files to save space and improve organization of code.<br />
<br />
==Read First==<br />
<br />
This article will refer somewhat interchangeably to the concepts of "programming", "ado-files", and "user-written commands". This is in contrast to ordinary programming of do-files. The article does not assume that you are actually writing an ado-file (as opposed to a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> definition in an ordinary dofile); and it does not assume you are writing a command for distribution. That said, Stata programming functionality is achieved using several core features:<br />
<br />
* The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command sets up the code environment for writing a program into memory.<br />
* The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses inputs into a program as macros that can be used within the scope of that program execution.<br />
* The <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> commands all create objects that can be used within the scope of program execution to avoid any conflict with arbitrary data structures.<br />
<br />
==The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command defines the scope of a Stata program inside a do-file or ado-file. When a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command block is executed, Stata stores (until the end of the session) the sequence of commands written inside the block and assigns them to the command name used in the <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command. Using <syntaxhighlight lang="stata" inline>program drop</syntaxhighlight> before the block will ensure that the command space is available. For example, we might write the following program in an ordinary do-file:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop<br />
prog def autoreg<br />
<br />
reg price mpg i.foreign<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
After executing this command block (note that <syntaxhighlight lang="stata" inline>end</syntaxhighlight> tells Stata where to stop reading), we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
If we did this, Stata would output:<br />
<br />
<syntaxhighlight lang="stata"><br />
. autoreg<br />
<br />
Source | SS df MS Number of obs = 74<br />
-------------+---------------------------------- F(2, 71) = 14.07<br />
Model | 180261702 2 90130850.8 Prob > F = 0.0000<br />
Residual | 454803695 71 6405685.84 R-squared = 0.2838<br />
-------------+---------------------------------- Adj R-squared = 0.2637<br />
Total | 635065396 73 8699525.97 Root MSE = 2530.9<br />
<br />
------------------------------------------------------------------------------<br />
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]<br />
-------------+----------------------------------------------------------------<br />
mpg | -294.1955 55.69172 -5.28 0.000 -405.2417 -183.1494<br />
|<br />
foreign |<br />
Foreign | 1767.292 700.158 2.52 0.014 371.2169 3163.368<br />
_cons | 11905.42 1158.634 10.28 0.000 9595.164 14215.67<br />
------------------------------------------------------------------------------<br />
</syntaxhighlight><br />
<br />
All this is to say is that Stata has taken the command <syntaxhighlight lang="stata" inline>reg price mpg i.foreign</syntaxhighlight> and will execute it whenever <syntaxhighlight lang="stata" inline>autoreg</syntaxhighlight> is run as if it were an ordinary command.<br />
<br />
As a first extension, we might try writing a command that is not dependent on the data, such as one that would list all the values of each variable for us. Such a program might look like the following:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
foreach var of varlist * {<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
We could then run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
levelslist<br />
</syntaxhighlight><br />
<br />
Similarly, we could use any other dataset in place of <syntaxhighlight lang="stata" inline>auto.dta</syntaxhighlight>. We now have a useful piece of code that we can execute with any dataset open, without re-writing a mildly complex loop each time. When we want to save such a snippet, we usually write an ado-file: we name the file <syntaxhighlight lang="stata" inline>levelslist.ado</syntaxhighlight> and add <syntaxhighlight lang="stata" inline>*!</syntaxhighlight> comment lines with some metadata about the code (Stata's <syntaxhighlight lang="stata" inline>which</syntaxhighlight> command displays these lines). The full file would look something like this:<br />
<br />
<syntaxhighlight lang="stata"><br />
*! Version 0.1 published 24 November 2020<br />
*! by Benjamin Daniels bbdaniels@gmail.com<br />
<br />
// A program to print all levels of variables<br />
cap prog drop levelslist<br />
prog def levelslist<br />
<br />
// Loop over variables<br />
foreach var of varlist * {<br />
<br />
// Get levels and display name and label of variable<br />
qui levelsof `var' , local(levels)<br />
di "Levels of `var': `: var label `var''"<br />
<br />
// Print the value of each level for the current variable<br />
foreach word in `levels' {<br />
di " `word'"<br />
}<br />
<br />
}<br />
<br />
end<br />
</syntaxhighlight><br />
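<br />
Once the file is saved, Stata finds it automatically whenever its folder is on the ado-path. A minimal sketch, assuming <code>levelslist.ado</code> has been saved in the current working directory (in practice, the PERSONAL directory or a project folder is a more typical location):<br />
<br />
<syntaxhighlight lang="stata"><br />
* Remove any in-memory definition so Stata loads the program from the file<br />
capture program drop levelslist<br />
<br />
* Add the current working directory (assumed to contain levelslist.ado) to the ado-path<br />
adopath + "`c(pwd)'"<br />
<br />
* Show where Stata finds the command, followed by the file's *! lines<br />
which levelslist<br />
<br />
* The first call loads levelslist.ado automatically<br />
sysuse auto.dta, clear<br />
levelslist<br />
</syntaxhighlight><br />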
<br />
However, the program is not very useful at this stage: it prints far too much information, particularly for variables that take many distinct integer or continuous values.</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Programming_(Ado-files)&diff=7766Stata Coding Practices: Programming (Ado-files)2020-11-24T20:21:28Z<p>Bbdaniels: /* The program command */</p>
<hr />
<div>Programs and ado-files are the main methods by which Stata code is condensed and generalized. By writing versions of code that apply to arbitrary inputs and saving that code in a separate file, the application of the code is cleaner in the main do-file and it becomes easier to re-use the same analytical process on other datasets in the future. Stata has special commands that enable this functionality. All commands on SSC are written as ado-files by other programmers; it is also possible to embed programs in ordinary do-files to save space and improve organization of code.<br />
<br />
==Read First==<br />
<br />
This article will refer somewhat interchangeably to the concepts of "programming", "ado-files", and "user-written commands". This is in contrast to ordinary programming of do-files. The article does not assume that you are actually writing an ado-file (as opposed to a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> definition in an ordinary do-file); and it does not assume you are writing a command for distribution. That said, Stata programming functionality is achieved using several core features:<br />
<br />
* The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command sets up the code environment for writing a program into memory.<br />
* The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses inputs into a program as macros that can be used within the scope of that program execution.<br />
* The <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> commands all create objects that can be used within the scope of program execution to avoid any conflict with arbitrary data structures.<br />
<br />
==The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command==<br />
<br />
The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command defines the scope of a Stata program inside a do-file or ado-file. When a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command block is executed, Stata stores (until the end of the session) the sequence of commands written inside the block and assigns them to the command name used in the <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command. Running <syntaxhighlight lang="stata" inline>capture program drop</syntaxhighlight> with that name before the block removes any existing program of the same name, so the block can be re-run without an error. For example, we might write the following program in an ordinary do-file:<br />
<br />
<syntaxhighlight lang="stata"><br />
cap prog drop autoreg<br />
prog def autoreg<br />
<br />
reg price mpg i.foreign<br />
<br />
end<br />
</syntaxhighlight><br />
<br />
After executing this command block (note that <syntaxhighlight lang="stata" inline>end</syntaxhighlight> tells Stata where to stop reading), we could run:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
autoreg<br />
</syntaxhighlight><br />
<br />
If we did this, Stata would output:<br />
<br />
<syntaxhighlight lang="stata"><br />
. autoreg<br />
<br />
Source | SS df MS Number of obs = 74<br />
-------------+---------------------------------- F(2, 71) = 14.07<br />
Model | 180261702 2 90130850.8 Prob > F = 0.0000<br />
Residual | 454803695 71 6405685.84 R-squared = 0.2838<br />
-------------+---------------------------------- Adj R-squared = 0.2637<br />
Total | 635065396 73 8699525.97 Root MSE = 2530.9<br />
<br />
------------------------------------------------------------------------------<br />
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]<br />
-------------+----------------------------------------------------------------<br />
mpg | -294.1955 55.69172 -5.28 0.000 -405.2417 -183.1494<br />
|<br />
foreign |<br />
Foreign | 1767.292 700.158 2.52 0.014 371.2169 3163.368<br />
_cons | 11905.42 1158.634 10.28 0.000 9595.164 14215.67<br />
------------------------------------------------------------------------------<br />
</syntaxhighlight></div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Programming_(Ado-files)&diff=7765Stata Coding Practices: Programming (Ado-files)2020-11-24T19:24:46Z<p>Bbdaniels: </p>
<hr />
<div>Programs and ado-files are the main methods by which Stata code is condensed and generalized. By writing versions of code that apply to arbitrary inputs and saving that code in a separate file, the application of the code is cleaner in the main do-file and it becomes easier to re-use the same analytical process on other datasets in the future. Stata has special commands that enable this functionality. All commands on SSC are written as ado-files by other programmers; it is also possible to embed programs in ordinary do-files to save space and improve organization of code.<br />
<br />
==Read First==<br />
<br />
This article will refer somewhat interchangeably to the concepts of "programming", "ado-files", and "user-written commands". This is in contrast to ordinary programming of do-files. The article does not assume that you are actually writing an ado-file (as opposed to a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> definition in an ordinary do-file); and it does not assume you are writing a command for distribution. That said, Stata programming functionality is achieved using several core features:<br />
<br />
* The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command sets up the code environment for writing a program into memory.<br />
* The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses inputs into a program as macros that can be used within the scope of that program execution.<br />
* The <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> commands all create objects that can be used within the scope of program execution to avoid any conflict with arbitrary data structures.<br />
<br />
==The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command==</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Programming_(Ado-files)&diff=7764Stata Coding Practices: Programming (Ado-files)2020-11-24T19:23:53Z<p>Bbdaniels: Created page with "Programs and ado-files are the main methods by which Stata code is condensed and generalized. By writing versions of code that apply to arbitrary inputs and saving that code i..."</p>
<hr />
<div>Programs and ado-files are the main methods by which Stata code is condensed and generalized. By writing versions of code that apply to arbitrary inputs and saving that code in a separate file, the application of the code is cleaner in the main do-file and it becomes easier to re-use the same analytical process on other datasets in the future. Stata has special commands that enable this functionality. All commands on SSC are written as ado-files by other programmers; it is also possible to embed programs in ordinary do-files to save space and improve organization of code.<br />
<br />
==Read First==<br />
<br />
This article will refer somewhat interchangeably to the concepts of "programming", "ado-files", and "user-written commands". This is in contrast to ordinary programming of do-files. The article does not assume that you are actually writing an ado-file (as opposed to a <syntaxhighlight lang="stata" inline>program</syntaxhighlight> definition in an ordinary do-file); and it does not assume you are writing a command for distribution. That said, Stata programming functionality is achieved using several core features:<br />
<br />
* The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command sets up the code environment for writing a program into memory.<br />
* The <syntaxhighlight lang="stata" inline>syntax</syntaxhighlight> command parses inputs into a program as macros that can be used within the scope of that program execution.<br />
* The <syntaxhighlight lang="stata" inline>tempvar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>tempfile</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>tempname</syntaxhighlight> commands all create objects that can be used within the scope of program execution to avoid any conflict with arbitrary data structures.<br />
<br />
==The <syntaxhighlight lang="stata" inline>program</syntaxhighlight> command==</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7719Stata Coding Practices: Visualization2020-11-18T22:25:00Z<p>Bbdaniels: /* Graphics options */</p>
<hr />
<div>Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
There is an enormous number of options available for each specific type of graph in Stata, and we will not cover those here. When drawing a graph, refer to the help file for its command to understand the full range of options available. These typically include key elements like marker shapes and sizes; coloration of lines, markers, and fill elements; transparency and added text; and so on. These elements allow you to create exactly the visual components you want to display, and many resources exist on using graphical elements to convey information efficiently to readers, so we do not cover them in this section.<br />
<br />
However, some elements are common to all graphs, and it is typically beneficial to standardize these components across all the graphs you create for a single piece of work. One workable setup that covers the main bases is the following code, which creates global macros that can easily be added to any graph command. The specific settings here are not recommendations; they illustrate common graphical elements. In particular, this code:<br />
<br />
* Left-aligns the graph title<br />
* Sets the background colors to white<br />
* Turns off axis lines<br />
* Rotates y-axis labels to horizontal and removes horizontal gridlines<br />
* Left-aligns the x-axis title<br />
* Removes coloration and bordering from the legend<br />
<br />
These settings are implemented as follows: <br />
<br />
<syntaxhighlight lang="stata"><br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
xscale(noline) xtit(,placement(left) justification(left)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
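<br />
Once defined, the globals are simply appended to the options of any graph. A short sketch, assuming the two globals above have already been defined in the session; the title text passed to <syntaxhighlight lang="stata" inline>title()</syntaxhighlight> should be merged with the styling stored in the global, since Stata combines repeated <syntaxhighlight lang="stata" inline>title()</syntaxhighlight> options:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
* A -twoway- graph: title text here, title styling from graph_opts<br />
scatter price mpg, title("Price and mileage") ${graph_opts}<br />
<br />
* A -graph- command uses graph_opts_1, which omits the x-axis settings<br />
graph bar (mean) price, over(foreign) title("Mean price by origin") ${graph_opts_1}<br />
</syntaxhighlight><br />
<br />
The second global presumably omits the <syntaxhighlight lang="stata" inline>xscale()</syntaxhighlight> and x-title settings because <syntaxhighlight lang="stata" inline>graph bar</syntaxhighlight> draws a categorical axis rather than a continuous x-axis.<br />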
<br />
Two further settings are usually needed when creating graphs for publication: the file type of the exported image and the aspect ratio (width-to-height) of the graph. The aspect ratio is set using the <syntaxhighlight lang="stata" inline>ysize()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>xsize()</syntaxhighlight> options, whose arguments give the height and width of the graph in inches.<br />
<br />
The choice of file type is also important. PNG images tend to be of reasonable quality and are natively viewable on all operating systems, as well as in web browsers when stored in places like GitHub and Zenodo. However, PNG images will typically be of insufficient quality for print media; journals may prefer high-resolution TIFF or vector EPS images, which may not be natively viewable on your operating system. You should never use <syntaxhighlight lang="stata" inline>graph save</syntaxhighlight> to create <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files unless you intend to combine graphs later. (Similarly, the <syntaxhighlight lang="stata" inline>saving()</syntaxhighlight> option is discouraged in all other uses.)<br />
<br />
One way to implement these settings is with code like the following. Note the file type is explicit in the file path extension for the <syntaxhighlight lang="stata" inline>graph export</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
scatter price mpg ///<br />
, nodraw ${graph_opts}<br />
<br />
graph draw , ysize(7)<br />
graph export "scatter.png" , width(4000)<br />
</syntaxhighlight><br />
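<br />
Because journals and web platforms may want different file types, it can be convenient to export the same graph several times. A minimal sketch using a named graph held in memory; the graph name <code>pricempg</code> and the output file names are illustrative only:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
* Build the graph without rendering it, and keep it in memory under a name<br />
scatter price mpg, nodraw name(pricempg, replace)<br />
<br />
* Display it with the desired aspect ratio (height and width in inches)<br />
graph display pricempg, ysize(7) xsize(5)<br />
<br />
* PNG and TIFF accept a pixel width; EPS is a vector format<br />
graph export "scatter.png", name(pricempg) width(4000) replace<br />
graph export "scatter.tif", name(pricempg) width(4000) replace<br />
graph export "scatter.eps", name(pricempg) replace<br />
</syntaxhighlight><br />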
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously, and in doing so they provide one of the highest degrees of cross-system consistency that is possible in creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created while Stata remains open. This is a simple scheme which incorporates many of the universally-applicable options above for all graphs, particularly region coloring and axis marking. As with any third-party scheme, you should read the documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
<br />
Schemes do not appear to control the default graphics font. The font can instead be set using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
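<br />
A sketch of these font settings, together with a check of which scheme is active. Helvetica is only the example font from the text, and the EPS and PostScript export settings shown here are separate from the Graph window setting:<br />
<br />
<syntaxhighlight lang="stata"><br />
* Font used when rendering graphs in the Graph window<br />
graph set window fontface "Helvetica"<br />
<br />
* EPS and PostScript exports keep their own font settings<br />
graph set eps fontface "Helvetica"<br />
graph set ps fontface "Helvetica"<br />
<br />
* Confirm which scheme is currently active<br />
display "Active scheme: `c(scheme)'"<br />
</syntaxhighlight><br />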
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present various aspects of a single analysis at the same time. Combining graphs is especially useful when facing constraints on the number of allowable exhibits, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combining graphs: overlaying multiple pieces of information on the same set of axes, or combining multiple visualizations into a single image with multiple panels (either aligned or not, although Stata handles alignment somewhat poorly).<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Starting from the first axis, and proceeding in order of the commands written, Stata will layer graphs on top of each other on the same set of axes. Including a second (possibly invisible) axis allows further possibilities. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
twoway ///<br />
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
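<br />
The example above depends on project data that is not shown here. A reproducible sketch of the same layering idea with <code>auto.dta</code>: a frequency histogram on a second y-axis, drawn first so that it sits underneath a local polynomial line on the first axis:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
twoway ///<br />
    /// Histogram of mileage, counted on the second (right-hand) axis<br />
    (histogram mpg, frequency yaxis(2) fcolor(gs14) lcolor(none)) ///<br />
    /// Local polynomial fit of price on mileage, on the first axis<br />
    (lpoly price mpg, lcolor(black) lwidth(thick)) ///<br />
    /// Overall options<br />
    , ytitle("Price", axis(1)) ytitle("Number of cars", axis(2)) ///<br />
      legend(order(1 "Cars (histogram)" 2 "Price (local polynomial)"))<br />
</syntaxhighlight><br />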
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the graphs included. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the visualizations included in the final image are intended to share a common legend. To save processing time when combining graphs, consider rendering the underlying graphs using the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option, which saves graph rendering until the combined graph is drawn. Rendering the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
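<br />
Following the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> suggestion, graphs can also be combined directly from memory by name rather than from saved <code>.gph</code> files. A minimal sketch; the graph names are illustrative only:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
* Create each panel without rendering it, keeping it in memory by name<br />
scatter price mpg, nodraw name(panel1, replace) title("Price and mileage")<br />
scatter price weight, nodraw name(panel2, replace) title("Price and weight")<br />
<br />
* Combine the in-memory graphs; only the combined graph is rendered<br />
graph combine panel1 panel2, altshrink rows(1) name(combined, replace)<br />
</syntaxhighlight><br />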
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command creates visualizations of one or more variables in the dataset, with a Y-axis and a categorical axis. Its main strength is that it uses the <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax to rapidly calculate many possible statistics for any number of variables. The <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> options provide flexibility for any desired subgrouping of the results.<br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
graph hbar ///<br />
(mean) price (median) price (max) length ///<br />
, asc yvaroptions( label(labsize(vsmall)) ///<br />
relabel(1 "Mean of Price" 2 "Median of Price" 3 "Max of Length") ) ///<br />
over(foreign) by(rep78 , c(1)) ///<br />
ysize(7) blabel(bar,size(vsmall)) <br />
</syntaxhighlight><br />
<br />
And we would obtain:<br />
<br />
[[file:graph-hbar.png|4000px]]<br />
<br />
The main shortcoming of this command is that it provides little customization of how the results are displayed, such as combining various statistics. For example, it cannot combine the <syntaxhighlight lang="stata" inline>(mean)</syntaxhighlight> and <syntaxhighlight lang="stata" inline>(semean)</syntaxhighlight> statistics in different styles to produce a bar graph with confidence intervals. (You might try <syntaxhighlight lang="stata" inline>betterbar</syntaxhighlight>, available from SSC, for that.) Similarly, multiple variables with very different scales may be difficult to display in the same graphic, and numerical variables with non-numerical interpretations, such as dates or labelled variables, may not be handled correctly without extensive manipulation.<br />
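<br />
Without a user-written command, one manual alternative is to compute the statistics first with <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> and then draw them with <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. A sketch, assuming a normal-approximation 95% interval (the mean plus or minus 1.96 standard errors) is acceptable; note that <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> replaces the data in memory:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
* Collapse to one row per group with the mean and its standard error<br />
collapse (mean) mean_price = price (semean) se_price = price, by(foreign)<br />
<br />
* Normal-approximation 95% confidence bounds<br />
generate ci_lo = mean_price - 1.96 * se_price<br />
generate ci_hi = mean_price + 1.96 * se_price<br />
<br />
* Bars for the means, capped spikes for the intervals<br />
twoway ///<br />
    (bar mean_price foreign, barwidth(0.6) fcolor(gs12) lcolor(none)) ///<br />
    (rcap ci_hi ci_lo foreign, lcolor(black)) ///<br />
    , xlabel(0 "Domestic" 1 "Foreign") ytitle("Mean price") legend(off)<br />
</syntaxhighlight><br />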
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command also provides a range of subcommands for other graphing functions, such as drawing, saving, and exporting graphs. These are not described here, and apart from these utility functions, most other <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> subcommands should rarely be needed.<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command (often abbreviated <syntaxhighlight lang="stata" inline>tw</syntaxhighlight>) enables many of the same visualization approaches of the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command. Unlike with <syntaxhighlight lang="stata" inline>graph</syntaxhighlight>, <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> creates an open-ended environment where multiple variables, various graphing styles, and several simultaneous axis environments can be combined. <br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
tw ///<br />
(scatter weight length , mc(gray)) ///<br />
(lpoly weight length , lc(red)) ///<br />
(scatter weight length ///<br />
if rep78 == 2 ///<br />
, mlab(make) mlabsize(vsmall) mlabc(black) mc(black)) ///<br />
, yscale(r(0)) ylab(#6)<br />
</syntaxhighlight><br />
<br />
[[file:tw-scatter.png]]<br />
<br />
<br />
The <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> option can be used with <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>; the <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option cannot. <br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
tw ///<br />
(scatter weight length , mc(gray)) ///<br />
(lpoly weight length , lc(red)) ///<br />
(scatter weight length ///<br />
if rep78 == 2 ///<br />
, mlab(make) mlabsize(vsmall) mlabc(black) mc(black)) ///<br />
, yscale(r(0)) ylab(#6) ///<br />
by(foreign , legend(off)) <br />
</syntaxhighlight><br />
<br />
This yields:<br />
<br />
[[File:Tw-scatter-by.png]]<br />
<br />
Instead of using the <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option, code where multiple subsets of data are intended for the same graphing axes must be written explicitly. Usually this is not too complicated, unless there is a large or unknown number of groupings. In those cases, loops are typically used to compensate for the lack of the <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option, as in the following code:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
levelsof foreign , local(levels)<br />
<br />
local colors = "red black"<br />
<br />
local counter 0<br />
foreach level in `levels' {<br />
local ++counter<br />
local graphs = "`graphs'" ///<br />
+ " (scatter weight length if foreign == `level' " ///<br />
+ " , mc(`: word `counter' of `colors''))" ///<br />
+ " (lpoly weight length if foreign == `level' " ///<br />
+ " , lc(`: word `counter' of `colors''))"<br />
}<br />
<br />
tw `graphs' ///<br />
, legend(on pos(5) ring(0) c(1) ///<br />
order(0 "Origin:" 2 "Domestic" 4 "Foreign") ) ///<br />
yscale(r(0)) ylab(#6) ///<br />
xtit("Car Length (in.)") ytit("Car Weight (lbs.)")<br />
</syntaxhighlight><br />
<br />
This code produces:<br />
<br />
[[file:tw-scatter-over.png]]<br />
<br />
===Built-in visualization commands===<br />
<br />
There are a small number of built-in visualization commands which do not need to be called through the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> commands. The most common are:<br />
<br />
* <syntaxhighlight lang="stata" inline>histogram</syntaxhighlight><br />
* <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight> <br />
* <syntaxhighlight lang="stata" inline>lpoly</syntaxhighlight><br />
* <syntaxhighlight lang="stata" inline>scatter</syntaxhighlight><br />
* <syntaxhighlight lang="stata" inline>marginsplot</syntaxhighlight> <br />
<br />
These can be called quickly to create simple graphs, such as using <syntaxhighlight lang="stata" inline>lowess price mpg, by(foreign)</syntaxhighlight> to create the following:<br />
<br />
[[file:lowess-by.png]]<br />
<br />
In general, however, these should be called within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment, because their behavior and options differ between the two contexts. For example, <syntaxhighlight lang="stata" inline>lpoly</syntaxhighlight> will not accept the <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> option outside of <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>; and <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight> will not create the scatterplot shown above inside a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
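<br />
A sketch of the difference in behavior described above, using the same <code>auto.dta</code> variables:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
* Standalone lowess draws the scatterplot and the smoothed line together<br />
lowess price mpg, by(foreign)<br />
<br />
* Inside twoway, lowess draws only the line, so the scatter is added explicitly<br />
twoway (scatter price mpg, mcolor(gs10)) (lowess price mpg, lcolor(red)), by(foreign)<br />
<br />
* Through twoway, lpoly curves can be split into panels with by()<br />
twoway lpoly price mpg, by(foreign)<br />
</syntaxhighlight><br />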
<br />
===User-written visualization commands===<br />
<br />
There are many user-written commands that produce visualizations as all or part of their functionality. These commands are usually purpose-built and cannot be combined with others through a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment. Additionally, depending on how the command is written, they may or may not take graphical options in the usual way. User-written commands will often have some set of the following features:<br />
<br />
* They will not take any options. This is rare.<br />
* They will take any regular <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> options as regular options. This is typical when the command is graphing data but not doing much customizable preprocessing.<br />
* They will take any regular <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> options within a special option, usually called something like <syntaxhighlight lang="stata" inline>graphoptions()</syntaxhighlight>. This is typical when the primary options are passed to a more important part of the command, like a regression model, before visualizing the results of that command.<br />
* They will take plot-specific <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> options, in cases where multiple elements are combined and general options would not allow appropriate styling, such as combining scatter plots, CIs, and regression lines. These types of options will be specified in the command help file.<br />
* They will allow you to add arbitrary additional plots in the same environment using an option such as <syntaxhighlight lang="stata" inline>addplot()</syntaxhighlight>, which follows the <syntaxhighlight lang="stata" inline>marginsplot</syntaxhighlight> syntax. This is uncommon.</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7695Stata Coding Practices: Visualization2020-11-12T16:42:46Z<p>Bbdaniels: /* Built-in visualization commands */</p>
<hr />
<div>Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
There is an enormous number of options available for each specific type of graph in Stata, and we do not cover them here. When drawing a graph, refer to the help file for its command to understand the full range of options available. These typically include key elements like marker shapes and sizes; coloration of lines, markers, and fill elements; transparency and added text; and so on. These elements let you create exactly the visual components you want to display, and there are many resources on using graphical elements to convey information to readers efficiently.<br />
<br />
However, some elements are common to all graphs, and it is typically beneficial to standardize these components across all the graphs you create for a single piece of work. One workable approach that covers the main bases is to store these settings in global macros that can easily be called in every graph. The specific settings below are not recommendations; they are meant to illustrate common graphical elements. In particular, this code:<br />
<br />
* Left-aligns the graph title<br />
* Sets the background colors to white<br />
* Turns off axis lines<br />
* Rotates the y-axis labels to horizontal and removes the y-axis gridlines<br />
* Left-aligns the x-axis title<br />
* Removes coloration and bordering from the legend<br />
<br />
These settings are implemented as follows: <br />
<br />
<syntaxhighlight lang="stata"><br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
xscale(noline) xtit(,placement(left) justification(left)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
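<br />
As a minimal sketch of how these macros might be used (the bar chart, variables, and statistic are illustrative only), the <syntaxhighlight lang="stata" inline>-graph-</syntaxhighlight> version is simply appended to a <syntaxhighlight lang="stata" inline>graph bar</syntaxhighlight> command like any other set of options:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
// Mean price by origin, styled with the -graph- settings defined above<br />
// (if graph_opts_1 has not been defined, the macro expands to nothing)<br />
graph bar (mean) price ///<br />
, over(foreign) ${graph_opts_1}<br />
</syntaxhighlight><br />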
<br />
Two further settings are usually needed when creating graphs for publication: the file type of the exported image and the aspect ratio (width-to-height) of the graph. The aspect ratio is set using the <syntaxhighlight lang="stata" inline>xsize()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>ysize()</syntaxhighlight> options, which set the width and height of the graph in inches.<br />
<br />
The choice of file type is also important. PNG images tend to be of reasonable quality and are natively viewable on all operating systems, as well as in web browsers when stored in places like GitHub and Zenodo. However, PNG images are raster images with a fixed resolution and will typically be of insufficient quality for print media; journals may prefer high-resolution TIFF images or vector formats such as EPS. These may not be natively viewable in your operating system. You should never use <syntaxhighlight lang="stata" inline>graph save</syntaxhighlight> to create <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files unless you intend to combine graphs later. (Similarly, the <syntaxhighlight lang="stata" inline>saving()</syntaxhighlight> option is discouraged for any other use.)<br />
<br />
One way to implement these settings is with code like the following. Note the file type is explicit in the file path extension for the <syntaxhighlight lang="stata" inline>graph export</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
scatter price mpg ///<br />
, nodraw ${graph_opts}<br />
<br />
graph display , ysize(7)<br />
graph export "scatter.png" , replace<br />
</syntaxhighlight><br />
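<br />
The same graph can be exported in several formats by changing the file extension. In the sketch below, the file names and the pixel widths passed to <syntaxhighlight lang="stata" inline>width()</syntaxhighlight> are illustrative assumptions; check your target journal's actual requirements:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
scatter price mpg , ${graph_opts}<br />
<br />
// PNG for web viewing; width() sets the image width in pixels<br />
graph export "scatter.png" , width(2000) replace<br />
<br />
// High-resolution TIFF for print<br />
graph export "scatter.tif" , width(3000) replace<br />
<br />
// EPS (a vector format) for print<br />
graph export "scatter.eps" , replace<br />
</syntaxhighlight><br />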
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously, and in doing so they provide one of the highest degrees of cross-system consistency that is possible in creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created while Stata remains open. This is a simple scheme which incorporates many of the universally-applicable options above for all graphs, particularly region coloring and axis marking. As with any third-party scheme, you should read the documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
<br />
Schemes do not appear to be able to control the default graphics font. The font can instead be set using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
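<br />
A minimal session-setup sketch, assuming you want Stata's built-in monochrome scheme <syntaxhighlight lang="stata" inline>s1mono</syntaxhighlight> instead of a third-party scheme, and that the Helvetica font is installed on your system. The active scheme can be checked through <syntaxhighlight lang="stata" inline>c(scheme)</syntaxhighlight>:<br />
<br />
<syntaxhighlight lang="stata"><br />
// Apply a built-in scheme for the rest of the session<br />
set scheme s1mono<br />
display "Current scheme: `c(scheme)'"<br />
<br />
// Set the font used in the Graph window (assumes Helvetica is installed)<br />
graph set window fontface "Helvetica"<br />
<br />
// Restore Stata's default font if needed<br />
// graph set window fontface default<br />
</syntaxhighlight><br />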
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present various aspects of a single analysis at the same time. Combining graphs is especially useful when facing constraints on the number of allowable exhibits, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combining graphs: overlaying multiple pieces of information on the same set of axes, or combining multiple visualizations into a single image with multiple panels (either aligned or not, although Stata handles alignment somewhat poorly).<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Stata draws the plots in the order they are written, layering each one on top of the previous ones on the same set of axes. Including a second (possibly invisible) axis allows further possibilities. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command (the variables and the <syntaxhighlight lang="stata" inline>${hist_opts}</syntaxhighlight> and <syntaxhighlight lang="stata" inline>${pct}</syntaxhighlight> globals belong to the project this example comes from and are not defined in this article):<br />
<br />
<syntaxhighlight lang="stata"><br />
twoway ///<br />
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
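<br />
Because the variables above are specific to one project, the sketch below shows the same layering idea with the built-in auto dataset: a histogram drawn on a second (right-hand) y-axis behind a local polynomial fit on the first. The variables and styling are illustrative only:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
twoway ///<br />
/// Background layer: frequency histogram on the second y-axis<br />
(histogram mpg , frequency yaxis(2) fcolor(gs14) lcolor(none)) ///<br />
/// Foreground layer: smoothed price on the first y-axis<br />
(lpoly price mpg , lcolor(black) lwidth(thick)) ///<br />
/// Overall options<br />
, ytitle("Price (USD)" , axis(1)) ytitle("Number of cars" , axis(2)) ///<br />
legend(on order(2 "Price (local polynomial fit)" 1 "Number of cars"))<br />
</syntaxhighlight><br />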
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the graphs included. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the visualizations in the final image are intended to share a common legend. To save processing time when combining graphs, consider creating the underlying graphs with the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option, which skips rendering them individually so that only the combined graph is drawn. Rendering the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
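<br />
As a sketch of this workflow using graphs held in memory rather than <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files (the graph names <syntaxhighlight lang="stata" inline>g1</syntaxhighlight> and <syntaxhighlight lang="stata" inline>g2</syntaxhighlight> and the plotted variables are illustrative), each panel is created with <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> and a <syntaxhighlight lang="stata" inline>name()</syntaxhighlight>, and only the combined graph is rendered:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
// Create each panel in memory without drawing it<br />
scatter price mpg , nodraw name(g1 , replace) title("Price")<br />
scatter weight mpg , nodraw name(g2 , replace) title("Weight")<br />
<br />
// Render only the combined graph; xcommon gives both panels the same x-axis scale<br />
graph combine g1 g2 , cols(2) xcommon<br />
</syntaxhighlight><br />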
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command creates visualizations of one or more variables in the dataset, with a numerical Y-axis and a categorical axis. The main strength of the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command used in this way is that it uses the <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax to rapidly calculate many possible statistics for any number of variables. The <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> options provide the flexibility to create any desired subgrouping of the results.<br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
graph hbar ///<br />
(mean) price (median) price (max) length ///<br />
, asc yvaroptions( label(labsize(vsmall)) ///<br />
relabel(1 "Mean of Price" 2 "Median of Price" 3 "Max of Length") ) ///<br />
over(foreign) by(rep78 , c(1)) ///<br />
ysize(7) blabel(bar,size(vsmall)) <br />
</syntaxhighlight><br />
<br />
And we would obtain:<br />
<br />
[[file:graph-hbar.png|4000px]]<br />
<br />
The main shortcoming of this command is that it provides little control over how the results are displayed, such as combining various statistics. For example, it cannot combine the <syntaxhighlight lang="stata" inline>(mean)</syntaxhighlight> and <syntaxhighlight lang="stata" inline>(sem)</syntaxhighlight> statistics in different styles to produce a bar graph with confidence intervals. (You might try <syntaxhighlight lang="stata" inline>betterbar</syntaxhighlight>, available from SSC, for that.) Similarly, multiple variables with very different scales may not be easy to display in the same graphic, and numerical variables which have non-numerical interpretations (such as dates or labelled variables) may not be handled correctly without extensive manipulation.<br />
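<br />
If you prefer not to install another command, one common workaround is to compute the statistics yourself and draw them with <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. The sketch below assumes a normal approximation (mean ± 1.96 standard errors) for the confidence intervals, which is a simplification; the variables are illustrative:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
// One row per repair record: mean and standard error of price<br />
collapse (mean) price_mean=price (sem) price_se=price , by(rep78)<br />
drop if missing(rep78)<br />
<br />
// Approximate 95% confidence interval bounds (normal approximation)<br />
generate ci_lo = price_mean - 1.96 * price_se<br />
generate ci_hi = price_mean + 1.96 * price_se<br />
<br />
// Bars for the means, capped spikes for the intervals<br />
twoway ///<br />
(bar price_mean rep78 , barwidth(0.8) fcolor(gs12) lcolor(none)) ///<br />
(rcap ci_hi ci_lo rep78 , lcolor(black)) ///<br />
, legend(off) ytitle("Mean price (USD)") xtitle("Repair record, 1978")<br />
</syntaxhighlight><br />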
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command also provides subcommands for other graphing functions, such as drawing, saving, and exporting graphs. These are not described here, and most of the remaining <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> subcommands should rarely be needed.<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command (often abbreviated <syntaxhighlight lang="stata" inline>tw</syntaxhighlight>) enables many of the same visualization approaches as the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command. Unlike <syntaxhighlight lang="stata" inline>graph</syntaxhighlight>, however, <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> creates an open-ended environment where multiple variables, various graphing styles, and several simultaneous axes can be combined. <br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
tw ///<br />
(scatter weight length , mc(gray)) ///<br />
(lpoly weight length , lc(red)) ///<br />
(scatter weight length ///<br />
if rep78 == 2 ///<br />
, mlab(make) mlabsize(vsmall) mlabcolor(black) mc(black)) ///<br />
, yscale(r(0)) ylab(#6)<br />
</syntaxhighlight><br />
<br />
[[file:tw-scatter.png]]<br />
<br />
<br />
The <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> option can be used with <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>; the <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option cannot. <br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
tw ///<br />
(scatter weight length , mc(gray)) ///<br />
(lpoly weight length , lc(red)) ///<br />
(scatter weight length ///<br />
if rep78 == 2 ///<br />
, mlab(make) mlabsize(vsmall) mlabcolor(black) mc(black)) ///<br />
, yscale(r(0)) ylab(#6) ///<br />
by(foreign , legend(off)) <br />
</syntaxhighlight><br />
<br />
This yields:<br />
<br />
[[File:Tw-scatter-by.png]]<br />
<br />
Because the <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option is not available, code that plots multiple subsets of the data on the same axes must be written explicitly. Usually this is not too complicated, unless there is a large or unknown number of groups. In those cases, loops are typically used to compensate for the missing <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option, in code like the following:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
levelsof foreign , local(levels)<br />
<br />
local colors = "red black"<br />
<br />
local graphs "" // Start from an empty list of plots<br />
local counter 0<br />
foreach level in `levels' {<br />
local ++counter<br />
local graphs = "`graphs'" ///<br />
+ " (scatter weight length if foreign == `level' " ///<br />
+ " , mc(`: word `counter' of `colors''))" ///<br />
+ " (lpoly weight length if foreign == `level' " ///<br />
+ " , lc(`: word `counter' of `colors''))"<br />
}<br />
<br />
tw `graphs' ///<br />
, legend(on pos(5) ring(0) c(1) ///<br />
order(0 "Origin:" 2 "Domestic" 4 "Foreign") ) ///<br />
yscale(r(0)) ylab(#6) ///<br />
xtit("Car Length (in.)") ytit("Car Weight (lbs.)")<br />
</syntaxhighlight><br />
<br />
This code produces:<br />
<br />
[[file:tw-scatter-over.png]]<br />
<br />
===Built-in visualization commands===<br />
<br />
There are a small number of built-in visualization commands which do not need to be called through the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> commands. The most common are:<br />
<br />
* <syntaxhighlight lang="stata" inline>histogram</syntaxhighlight><br />
* <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight> <br />
* <syntaxhighlight lang="stata" inline>lpoly</syntaxhighlight><br />
* <syntaxhighlight lang="stata" inline>scatter</syntaxhighlight><br />
* <syntaxhighlight lang="stata" inline>marginsplot</syntaxhighlight> <br />
<br />
These can be called quickly to create simple graphs, such as using <syntaxhighlight lang="stata" inline>lowess price mpg, by(foreign)</syntaxhighlight> to create the following:<br />
<br />
[[file:lowess-by.png]]<br />
<br />
In most cases, however, these commands should be called within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment, because their behavior and options change depending on how they are called. For example, <syntaxhighlight lang="stata" inline>lpoly</syntaxhighlight> will not accept the <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> option outside of <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight> will not create the scatterplot shown above inside a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
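<br />
For example, the following sketch (using the same auto data; colors are illustrative) rebuilds the <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight> graph above inside <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, where the scatterplot must be added explicitly, and then uses <syntaxhighlight lang="stata" inline>lpoly</syntaxhighlight>, which accepts <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> in this context:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
// Inside twoway, lowess draws only the smoothed line, so add the scatter explicitly<br />
twoway ///<br />
(scatter price mpg , mcolor(gs10)) ///<br />
(lowess price mpg , lcolor(red)) ///<br />
, by(foreign)<br />
<br />
// lpoly accepts by() when it is called through twoway<br />
twoway (scatter price mpg , mcolor(gs10)) (lpoly price mpg , lcolor(red)) , by(foreign)<br />
</syntaxhighlight><br />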
<br />
===User-written visualization commands===<br />
<br />
There are many user-written commands that produce visualizations as all or part of their functionality. These commands are usually purpose-built and cannot be combined with others through a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment. Additionally, depending on how the command is written, they may or may not take graphical options in the usual way. User-written commands will often have some set of the following features:<br />
<br />
* They will not take any options. This is rare.<br />
* They will take any regular <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> options as regular options. This is typical when the command is graphing data but not doing much customizable preprocessing.<br />
* They will take any regular <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> options within a special option, usually called something like <syntaxhighlight lang="stata" inline>graphoptions()</syntaxhighlight>. This is typical when the primary options are passed to a more important part of the command, like a regression model, before visualizing the results of that command.<br />
* They will take plot-specific <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> options, in cases where multiple elements are combined and general options would not allow appropriate styling, such as combining scatter plots, CIs, and regression lines. These types of options will be specified in the command help file.<br />
* They will allow you to add arbitrary additional plots in the same environment using an option such as <syntaxhighlight lang="stata" inline>addplot()</syntaxhighlight>, which follows the <syntaxhighlight lang="stata" inline>marginsplot</syntaxhighlight> syntax. This is uncommon.</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7694Stata Coding Practices: Visualization2020-11-12T16:40:00Z<p>Bbdaniels: /* User-written visualization commands */</p>
<hr />
<div>Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
There is an enormous number of options available for each specific type of graph in Stata, and we do not cover them here. When drawing a graph, refer to the help file for its command to understand the full range of options available. These typically include key elements like marker shapes and sizes; coloration of lines, markers, and fill elements; transparency and added text; and so on. These elements let you create exactly the visual components you want to display, and there are many resources on using graphical elements to convey information to readers efficiently.<br />
<br />
However, some elements are common to all graphs, and it is typically beneficial to standardize these components across all the graphs you create for a single piece of work. One workable approach that covers the main bases is to store these settings in global macros that can easily be called in every graph. The specific settings below are not recommendations; they are meant to illustrate common graphical elements. In particular, this code:<br />
<br />
* Left-aligns the graph title<br />
* Sets the background colors to white<br />
* Turns off axis lines<br />
* Rotates the y-axis labels to horizontal and removes the y-axis gridlines<br />
* Left-aligns the x-axis title<br />
* Removes coloration and bordering from the legend<br />
<br />
These settings are implemented as follows: <br />
<br />
<syntaxhighlight lang="stata"><br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
xscale(noline) xtit(,placement(left) justification(left)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
<br />
Two further settings are usually needed when creating graphs for publication: the file type of the exported image and the aspect ratio (width-to-height) of the graph. The aspect ratio is set using the <syntaxhighlight lang="stata" inline>xsize()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>ysize()</syntaxhighlight> options, which set the width and height of the graph in inches.<br />
<br />
The choice of file type is also important. PNG images tend to be of reasonable quality and are natively viewable on all operating systems, as well as in web browsers when stored in places like GitHub and Zenodo. However, PNG images are raster images with a fixed resolution and will typically be of insufficient quality for print media; journals may prefer high-resolution TIFF images or vector formats such as EPS. These may not be natively viewable in your operating system. You should never use <syntaxhighlight lang="stata" inline>graph save</syntaxhighlight> to create <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files unless you intend to combine graphs later. (Similarly, the <syntaxhighlight lang="stata" inline>saving()</syntaxhighlight> option is discouraged for any other use.)<br />
<br />
One way to implement these settings is with code like the following. Note the file type is explicit in the file path extension for the <syntaxhighlight lang="stata" inline>graph export</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
scatter price mpg ///<br />
, nodraw ${graph_opts}<br />
<br />
graph display , ysize(7)<br />
graph export "scatter.png" , replace<br />
</syntaxhighlight><br />
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously, and in doing so they provide one of the highest degrees of cross-system consistency that is possible in creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created while Stata remains open. This is a simple scheme which incorporates many of the universally-applicable options above for all graphs, particularly region coloring and axis marking. As with any third-party scheme, you should read the documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
<br />
Schemes do not appear to be able to control the default graphics font. The font can instead be set using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present various aspects of a single analysis at the same time. Combining graphs is especially useful when facing constraints on the number of allowable exhibits, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combining graphs: overlaying multiple pieces of information on the same set of axes, or combining multiple visualizations into a single image with multiple panels (either aligned or not, although Stata handles alignment somewhat poorly).<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Stata draws the plots in the order they are written, layering each one on top of the previous ones on the same set of axes. Including a second (possibly invisible) axis allows further possibilities. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command (the variables and the <syntaxhighlight lang="stata" inline>${hist_opts}</syntaxhighlight> and <syntaxhighlight lang="stata" inline>${pct}</syntaxhighlight> globals belong to the project this example comes from and are not defined in this article):<br />
<br />
<syntaxhighlight lang="stata"><br />
twoway ///<br />
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the included graphs. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the panels in the final image should share a common legend. To save processing time when combining graphs, consider creating the underlying graphs with the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option, which defers rendering until the combined graph is drawn. Rendering the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
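<br />
As a minimal sketch of that workflow, assuming the panels can be built from the built-in <syntaxhighlight lang="stata" inline>auto</syntaxhighlight> dataset (the graph names <syntaxhighlight lang="stata" inline>g1</syntaxhighlight> and <syntaxhighlight lang="stata" inline>g2</syntaxhighlight> are hypothetical), each panel can be created with <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> and a <syntaxhighlight lang="stata" inline>name()</syntaxhighlight>, and the named graphs held in memory can then be combined without saving any <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
// Create each panel in memory without rendering it (hypothetical names g1, g2)<br />
scatter price mpg , nodraw name(g1, replace) title("Price and Mileage")<br />
scatter price weight , nodraw name(g2, replace) title("Price and Weight")<br />
<br />
// Combine the named graphs from memory; only the combined graph is rendered<br />
graph combine g1 g2 , cols(2) altshrink<br />
</syntaxhighlight><br />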
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command creates visualizations of one or more variables in the dataset. Its chart types (such as <syntaxhighlight lang="stata" inline>graph bar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>graph hbar</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>graph dot</syntaxhighlight>) have a Y-axis and a categorical axis. The main strength of the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command used in this way is that it uses the <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax to rapidly calculate many possible statistics for any number of variables. The <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> options provide flexibility to do any desired subgrouping of the results.<br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
graph hbar ///<br />
(mean) price (median) price (max) length ///<br />
, asc yvaroptions( label(labsize(vsmall)) ///<br />
relabel(1 "Mean of Price" 2 "Median of Price" 3 "Max of Length") ) ///<br />
over(foreign) by(rep78 , c(1)) ///<br />
ysize(7) blabel(bar,size(vsmall)) <br />
</syntaxhighlight><br />
<br />
And we would obtain:<br />
<br />
[[file:graph-hbar.png|4000px]]<br />
<br />
The main shortcoming of this command is that it offers little control over how the results are displayed, such as combining different statistics in one graph. For example, it cannot combine the <syntaxhighlight lang="stata" inline>(mean)</syntaxhighlight> and <syntaxhighlight lang="stata" inline>(semean)</syntaxhighlight> statistics in different styles to produce a bar graph with confidence intervals. (You might try <syntaxhighlight lang="stata" inline>betterbar</syntaxhighlight>, available from SSC, for that.) Similarly, variables with very different scales can be hard to display in the same graphic, and numeric variables with non-numeric interpretations, such as dates or labelled variables, may not be handled correctly without extensive manipulation.<br />
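<br />
One way to get such a graph without a user-written command is to compute the statistics yourself and layer them in <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. The sketch below assumes a normal-approximation 95% confidence interval (the mean plus or minus 1.96 standard errors), a choice made for illustration rather than a general recommendation; the variable names <syntaxhighlight lang="stata" inline>price_mean</syntaxhighlight>, <syntaxhighlight lang="stata" inline>price_sem</syntaxhighlight>, <syntaxhighlight lang="stata" inline>ci_lo</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>ci_hi</syntaxhighlight> are hypothetical:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
// One row per repair record, with the mean and standard error of price<br />
collapse (mean) price_mean = price (semean) price_sem = price , by(rep78)<br />
drop if missing(rep78)<br />
<br />
// Normal-approximation 95% confidence interval (an assumption for illustration)<br />
gen ci_lo = price_mean - 1.96 * price_sem<br />
gen ci_hi = price_mean + 1.96 * price_sem<br />
<br />
// Layer bars for the means and capped spikes for the intervals on the same axes<br />
twoway ///<br />
  (bar price_mean rep78 , barwidth(0.7) fc(gs12) lc(none)) ///<br />
  (rcap ci_hi ci_lo rep78 , lc(black)) ///<br />
  , legend(off) xtit("Repair Record 1978") ytit("Mean Price (USD)")<br />
</syntaxhighlight><br />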
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command also provides subcommands for other graphing functions, such as drawing, saving, and exporting graphs. These are not described here, and apart from them, most other subcommands should rarely be needed.<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command (often abbreviated <syntaxhighlight lang="stata" inline>tw</syntaxhighlight>) enables many of the same visualization approaches of the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command. Unlike with <syntaxhighlight lang="stata" inline>graph</syntaxhighlight>, <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> creates an open-ended environment where multiple variables, various graphing styles, and several simultaneous axis environments can be combined. <br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
tw ///<br />
(scatter weight length , mc(gray)) ///<br />
(lpoly weight length , lc(red)) ///<br />
(scatter weight length ///<br />
if rep78 == 2 ///<br />
, mlab(make) mlabsize(vsmall) mlabc(black) mc(black)) ///<br />
, yscale(r(0)) ylab(#6)<br />
</syntaxhighlight><br />
<br />
[[file:tw-scatter.png]]<br />
<br />
<br />
The <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> option can be used with <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>; the <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option cannot. <br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
tw ///<br />
(scatter weight length , mc(gray)) ///<br />
(lpoly weight length , lc(red)) ///<br />
(scatter weight length ///<br />
if rep78 == 2 ///<br />
, mlab(make) mlabsize(vsmall) mlabc(black) mc(black)) ///<br />
, yscale(r(0)) ylab(#6) ///<br />
by(foreign , legend(off)) <br />
</syntaxhighlight><br />
<br />
This yields:<br />
<br />
[[File:Tw-scatter-by.png]]<br />
<br />
Because the <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option is not available, plots of multiple subsets of data that are intended to share the same axes must be written explicitly. This is usually not complicated unless there is a large or unknown number of groups. In those cases, a loop can build the list of plots to compensate for the missing <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option, as in the following code:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
levelsof foreign , local(levels)<br />
<br />
local colors = "red black"<br />
<br />
local counter 0<br />
foreach level in `levels' {<br />
local ++counter<br />
local graphs = "`graphs'" ///<br />
+ " (scatter weight length if foreign == `level' " ///<br />
+ " , mc(`: word `counter' of `colors''))" ///<br />
+ " (lpoly weight length if foreign == `level' " ///<br />
+ " , lc(`: word `counter' of `colors''))"<br />
}<br />
<br />
tw `graphs' ///<br />
, legend(on pos(5) ring(0) c(1) ///<br />
order(0 "Origin:" 2 "Domestic" 4 "Foreign") ) ///<br />
yscale(r(0)) ylab(#6) ///<br />
xtit("Car Length (in.)") ytit("Car Weight (lbs.)")<br />
</syntaxhighlight><br />
<br />
This code produces:<br />
<br />
[[file:tw-scatter-over.png]]<br />
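<br />
When the groups are defined by a single numeric variable and one plot type per group is enough, the built-in <syntaxhighlight lang="stata" inline>separate</syntaxhighlight> command may offer a shorter alternative to the loop. The sketch below assumes that case: it creates one copy of the variable per group (here <syntaxhighlight lang="stata" inline>weight0</syntaxhighlight> and <syntaxhighlight lang="stata" inline>weight1</syntaxhighlight>, named from the values of <syntaxhighlight lang="stata" inline>foreign</syntaxhighlight>), which <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> then draws as separate series on the same axes:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
// Create weight0 (domestic) and weight1 (foreign), each missing outside its group<br />
separate weight , by(foreign)<br />
<br />
// Each new variable is drawn as its own series with its own legend key<br />
twoway scatter weight0 weight1 length ///<br />
  , mc(red black) ///<br />
  legend(on pos(5) ring(0) c(1) order(1 "Domestic" 2 "Foreign")) ///<br />
  yscale(r(0)) ylab(#6) ///<br />
  xtit("Car Length (in.)") ytit("Car Weight (lbs.)")<br />
</syntaxhighlight><br />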
<br />
===Built-in visualization commands===<br />
<br />
There are a small number of built-in visualization commands which do not need to be called through the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> commands. The most common are:<br />
<br />
* <syntaxhighlight lang="stata" inline>histogram</syntaxhighlight><br />
* <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight> <br />
* <syntaxhighlight lang="stata" inline>lpoly</syntaxhighlight><br />
* <syntaxhighlight lang="stata" inline>scatter</syntaxhighlight><br />
<br />
These can be called quickly to create simple graphs, such as using <syntaxhighlight lang="stata" inline>lowess price mpg, by(foreign)</syntaxhighlight> to create the following:<br />
<br />
[[file:lowess-by.png]]<br />
<br />
In most cases, however, these commands should be called within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment, because their behavior and options change depending on how they are called. For example, <syntaxhighlight lang="stata" inline>lpoly</syntaxhighlight> will not accept the <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> option outside of <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight> will not create the scatterplot shown above inside a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
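<br />
For example, a graph similar to the <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight> graph above can be built inside <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> by layering the scatterplot explicitly, which also lets <syntaxhighlight lang="stata" inline>lpoly</syntaxhighlight> be split with <syntaxhighlight lang="stata" inline>by()</syntaxhighlight>. A minimal sketch, with colors and line patterns chosen only for illustration:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
// Inside -twoway-, -lowess- draws only the smoothed line,<br />
// so the scatterplot is added as its own layer<br />
twoway ///<br />
  (scatter price mpg , mc(gray)) ///<br />
  (lowess price mpg , lc(red)) ///<br />
  (lpoly price mpg , lc(black) lp(dash)) ///<br />
  , by(foreign , legend(off))<br />
</syntaxhighlight><br />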
<br />
===User-written visualization commands===<br />
<br />
There are many user-written commands that produce visualizations as all or part of their functionality. These commands are usually purpose-built and cannot be combined with others through a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment. Additionally, depending on how the command is written, they may or may not take graphical options in the usual way. User-written commands will often have some set of the following features:<br />
<br />
* They will not take any options. This is rare.<br />
* They will take any regular <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> options as regular options. This is typical when the command is graphing data but not doing much customizable preprocessing.<br />
* They will take any regular <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> options within a special option, usually called something like <syntaxhighlight lang="stata" inline>graphoptions()</syntaxhighlight>. This is typical when the primary options are passed to a more important part of the command, like a regression model, before visualizing the results of that command.<br />
* They will take plot-specific <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> options, in cases where multiple elements are combined and general options would not allow appropriate styling, such as combining scatter plots, CIs, and regression lines. These types of options will be specified in the command help file.<br />
* They will allow you to add arbitrary additional plots in the same environment using an option such as <syntaxhighlight lang="stata" inline>addplot()</syntaxhighlight>, which follows the <syntaxhighlight lang="stata" inline>marginsplot</syntaxhighlight> syntax. This is uncommon.</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7693Stata Coding Practices: Visualization2020-11-12T16:16:36Z<p>Bbdaniels: /* Built-in visualization commands */</p>
<hr />
<div>Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
There is an enormous number of options available for each specific type of graph in Stata, and we will not cover them here. When drawing a graph, refer to the help file for its command to understand the full range of options available. These typically include key elements like marker shapes and sizes; coloration of lines, markers, and fill elements; transparency and added text; and so on. These elements let you create exactly the visual components you want to display, and many resources exist on using them to convey information efficiently to readers.<br />
<br />
However, some elements are common to all graphs, and it is typically beneficial to standardize these components across all the graphs you create for a single piece of work. One workable setup that covers the main bases is the following code, which creates global macros that can easily be called in all graphs. The specific settings here are not recommendations; they illustrate common graphical elements. In particular, this code:<br />
<br />
* Left-aligns the graph title<br />
* Sets the background colors to white<br />
* Turns off axis lines<br />
* Makes the y-axis labels horizontal and removes the y-axis gridlines<br />
* Left-aligns the x-axis title<br />
* Removes coloration and bordering from the legend<br />
<br />
These settings are implemented as follows: <br />
<br />
<syntaxhighlight lang="stata"><br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
xscale(noline) xtit(,placement(left) justification(left)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
<br />
Two further settings are usually needed when creating graphs for publication: the file type of the exported image and its aspect ratio (width-to-height). The aspect ratio is set using the <syntaxhighlight lang="stata" inline>ysize()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>xsize()</syntaxhighlight> options, which take the height and width of the graph in inches.<br />
<br />
The choice of file type is also important. PNG images are of reasonable quality and natively viewable on all operating systems, as well as in web browsers when stored in places like GitHub and Zenodo. However, PNG is a raster format with a fixed resolution and is often insufficient for print media; journals may prefer high-resolution TIFF images or vector EPS files, which may not be natively viewable on your operating system. You should never use <syntaxhighlight lang="stata" inline>graph save</syntaxhighlight> to create <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files unless you intend to combine graphs later. (Similarly, the <syntaxhighlight lang="stata" inline>saving()</syntaxhighlight> option is discouraged for all other uses.)<br />
<br />
One way to implement these settings is with code like the following. Note that the file type is determined by the file extension given to the <syntaxhighlight lang="stata" inline>graph export</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
scatter price mpg ///<br />
, nodraw ${graph_opts}<br />
<br />
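// Render the graph created with -nodraw-, setting its height in inches<br />
graph display , ysize(7)<br />
// The .png extension sets the export format; replace allows re-running<br />
graph export "scatter.png" , replace<br />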
</syntaxhighlight><br />
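<br />
To produce several formats, the same graph can be exported repeatedly, changing only the file extension. The sketch below assumes a 6-by-4-inch graph and a 2400-pixel width for the raster files; both numbers are illustrative, not recommended values. The <syntaxhighlight lang="stata" inline>width()</syntaxhighlight> option applies to raster formats such as PNG and TIFF, while EPS is a vector format:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
// Aspect ratio: 6 inches wide by 4 inches tall (illustrative values)<br />
scatter price mpg , xsize(6) ysize(4)<br />
<br />
// PNG for screens and repositories; width() sets the size in pixels<br />
graph export "scatter.png" , width(2400) replace<br />
<br />
// TIFF as a high-resolution raster alternative for print<br />
graph export "scatter.tif" , width(2400) replace<br />
<br />
// EPS as a vector format for print<br />
graph export "scatter.eps" , replace<br />
</syntaxhighlight><br />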
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously, and in doing so they provide one of the highest degrees of cross-system consistency that is possible in creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created while Stata remains open. This is a simple scheme which incorporates many of the universally-applicable options above for all graphs, particularly region coloring and axis marking. As with any third-party scheme, you should read the documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
<br />
Schemes do not appear to control the default graphics font. The font can instead be set using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
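<br />
A minimal sketch of setting both together, using the built-in <syntaxhighlight lang="stata" inline>s2mono</syntaxhighlight> scheme as a stand-in for a third-party scheme. The <syntaxhighlight lang="stata" inline>graph set eps</syntaxhighlight> line is included on the assumption that EPS exports are also needed, since that format has its own font setting:<br />
<br />
<syntaxhighlight lang="stata"><br />
// Built-in scheme used here in place of a third-party scheme<br />
set scheme s2mono<br />
<br />
// Font used when drawing graphs in the Graph window<br />
graph set window fontface "Helvetica"<br />
<br />
// EPS exports are configured separately<br />
graph set eps fontface "Helvetica"<br />
<br />
sysuse auto.dta , clear<br />
scatter price mpg<br />
</syntaxhighlight><br />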
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present various aspects of a single analysis at the same time. Combining graphs is especially useful when facing constraints on the number of allowable exhibits, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combining graphs: overlaying multiple pieces of information on the same set of axes, or combining multiple visualizations into a single image with multiple panels (either aligned or not, although Stata handles alignment somewhat poorly).<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Starting from the first axis, and proceeding in order of the commands written, Stata will layer graphs on top of each other on the same set of axes. Including a second (possibly invisible) axis allows further possibilities. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
twoway ///<br />
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the included graphs. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the panels in the final image should share a common legend. To save processing time when combining graphs, consider creating the underlying graphs with the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option, which defers rendering until the combined graph is drawn. Rendering the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command creates visualizations of one or more variables in the dataset. Its chart types (such as <syntaxhighlight lang="stata" inline>graph bar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>graph hbar</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>graph dot</syntaxhighlight>) have a Y-axis and a categorical axis. The main strength of the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command used in this way is that it uses the <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax to rapidly calculate many possible statistics for any number of variables. The <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> options provide flexibility to do any desired subgrouping of the results.<br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
graph hbar ///<br />
(mean) price (median) price (max) length ///<br />
, asc yvaroptions( label(labsize(vsmall)) ///<br />
relabel(1 "Mean of Price" 2 "Median of Price" 3 "Max of Length") ) ///<br />
over(foreign) by(rep78 , c(1)) ///<br />
ysize(7) blabel(bar,size(vsmall)) <br />
</syntaxhighlight><br />
<br />
And we would obtain:<br />
<br />
[[file:graph-hbar.png|4000px]]<br />
<br />
The main shortcoming of this command is that it offers little control over how the results are displayed, such as combining different statistics in one graph. For example, it cannot combine the <syntaxhighlight lang="stata" inline>(mean)</syntaxhighlight> and <syntaxhighlight lang="stata" inline>(semean)</syntaxhighlight> statistics in different styles to produce a bar graph with confidence intervals. (You might try <syntaxhighlight lang="stata" inline>betterbar</syntaxhighlight>, available from SSC, for that.) Similarly, variables with very different scales can be hard to display in the same graphic, and numeric variables with non-numeric interpretations, such as dates or labelled variables, may not be handled correctly without extensive manipulation.<br />
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command also provides subcommands for other graphing functions, such as drawing, saving, and exporting graphs. These are not described here, and apart from them, most other subcommands should rarely be needed.<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command (often abbreviated <syntaxhighlight lang="stata" inline>tw</syntaxhighlight>) enables many of the same visualization approaches of the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command. Unlike with <syntaxhighlight lang="stata" inline>graph</syntaxhighlight>, <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> creates an open-ended environment where multiple variables, various graphing styles, and several simultaneous axis environments can be combined. <br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
tw ///<br />
(scatter weight length , mc(gray)) ///<br />
(lpoly weight length , lc(red)) ///<br />
(scatter weight length ///<br />
if rep78 == 2 ///<br />
, mlab(make) mlabsize(vsmall) mlabc(black) mc(black)) ///<br />
, yscale(r(0)) ylab(#6)<br />
</syntaxhighlight><br />
<br />
[[file:tw-scatter.png]]<br />
<br />
<br />
The <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> option can be used with <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>; the <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option cannot. <br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
tw ///<br />
(scatter weight length , mc(gray)) ///<br />
(lpoly weight length , lc(red)) ///<br />
(scatter weight length ///<br />
if rep78 == 2 ///<br />
, mlab(make) mlabsize(vsmall) mlabc(black) mc(black)) ///<br />
, yscale(r(0)) ylab(#6) ///<br />
by(foreign , legend(off)) <br />
</syntaxhighlight><br />
<br />
This yields:<br />
<br />
[[File:Tw-scatter-by.png]]<br />
<br />
Because the <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option is not available, plots of multiple subsets of data that are intended to share the same axes must be written explicitly. This is usually not complicated unless there is a large or unknown number of groups. In those cases, a loop can build the list of plots to compensate for the missing <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option, as in the following code:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
levelsof foreign , local(levels)<br />
<br />
local colors = "red black"<br />
<br />
local counter 0<br />
foreach level in `levels' {<br />
local ++counter<br />
local graphs = "`graphs'" ///<br />
+ " (scatter weight length if foreign == `level' " ///<br />
+ " , mc(`: word `counter' of `colors''))" ///<br />
+ " (lpoly weight length if foreign == `level' " ///<br />
+ " , lc(`: word `counter' of `colors''))"<br />
}<br />
<br />
tw `graphs' ///<br />
, legend(on pos(5) ring(0) c(1) ///<br />
order(0 "Origin:" 2 "Domestic" 4 "Foreign") ) ///<br />
yscale(r(0)) ylab(#6) ///<br />
xtit("Car Length (in.)") ytit("Car Weight (lbs.)")<br />
</syntaxhighlight><br />
<br />
This code produces:<br />
<br />
[[file:tw-scatter-over.png]]<br />
<br />
===Built-in visualization commands===<br />
<br />
There are a small number of built-in visualization commands which do not need to be called through the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> commands. The most common are:<br />
<br />
* <syntaxhighlight lang="stata" inline>histogram</syntaxhighlight><br />
* <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight> <br />
* <syntaxhighlight lang="stata" inline>lpoly</syntaxhighlight><br />
* <syntaxhighlight lang="stata" inline>scatter</syntaxhighlight><br />
<br />
These can be called quickly to create simple graphs, such as using <syntaxhighlight lang="stata" inline>lowess price mpg, by(foreign)</syntaxhighlight> to create the following:<br />
<br />
[[file:lowess-by.png]]<br />
<br />
In most cases, however, these commands should be called within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment, because their behavior and options change depending on how they are called. For example, <syntaxhighlight lang="stata" inline>lpoly</syntaxhighlight> will not accept the <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> option outside of <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight> will not create the scatterplot shown above inside a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
<br />
===User-written visualization commands===</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=File:Lowess-by.png&diff=7692File:Lowess-by.png2020-11-12T16:12:52Z<p>Bbdaniels: </p>
<hr />
<div></div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7691Stata Coding Practices: Visualization2020-11-10T21:44:13Z<p>Bbdaniels: /* The twoway command */</p>
<hr />
<div>Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
There is an enormous number of options available for each specific type of graph in Stata, and we do not cover them here. When drawing a graph, refer to the help file for its command to understand the full range of options available. These typically include key elements like marker shapes and sizes; coloration of lines, markers, and fill elements; transparency; and added text. These elements let you create exactly the visual components you want to display, and there are many resources on using graphical elements to convey information to readers efficiently.<br />
<br />
However, some elements are common to all graphs, and it is typically beneficial to standardize these components across all the graphs you create for a single piece of work. One workable approach that covers the main bases is the following code, which creates global macros that can easily be called in all graphs. The specific settings here are not recommendations; they are for illustration of common graphical elements. In particular, this code:<br />
<br />
* Left-aligns the graph title<br />
* Sets the background colors to white<br />
* Turns off axis lines<br />
* Rotates the y-axis labels to horizontal and removes the y-axis gridlines<br />
* Left-aligns the x-axis title<br />
* Removes coloration and bordering from the legend<br />
<br />
These settings are implemented as follows: <br />
<br />
<syntaxhighlight lang="stata"><br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
xscale(noline) xtit(,placement(left) justification(left)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
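<br />
As a sketch of how these globals are meant to be used, each one is appended to the options of a graph command. This assumes the globals above have already been defined in the same session; the guard at the top simply stops with an error if they have not. The title text is given separately, on the assumption that Stata merges repeated <syntaxhighlight lang="stata" inline>title()</syntaxhighlight> options so that both the text and the styling from the global are applied.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
// Assumes graph_opts and graph_opts_1 (defined above) exist in this session<br />
if `"${graph_opts}"' == "" | `"${graph_opts_1}"' == "" {<br />
display as error "Define graph_opts and graph_opts_1 first"<br />
exit 198<br />
}<br />
<br />
// -twoway- graph: title text given here, title styling taken from the global<br />
scatter price mpg , title("Price and mileage") ${graph_opts}<br />
<br />
// -graph- graph: graph_opts_1 omits the x-axis options, presumably because<br />
// -graph bar- has a categorical axis rather than an x-axis<br />
graph bar (mean) price , over(foreign) title("Mean price by origin") ${graph_opts_1}<br />
</syntaxhighlight><br />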
<br />
Two further settings matter when creating graphs for publication: the file type of the exported image and its aspect ratio (width-to-height). The aspect ratio is set using the <syntaxhighlight lang="stata" inline>ysize()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>xsize()</syntaxhighlight> options, which specify the height and width of the graph (in inches by default).<br />
<br />
The choice of file type is also important. PNG images tend to be of reasonable quality and are natively viewable on all operating systems, as well as in web browsers when stored in places like GitHub and Zenodo. However, PNG images will typically be of insufficient quality for print media; journals may instead require high-resolution TIFF or vector EPS images, which may not be natively viewable on your operating system. You should never use <syntaxhighlight lang="stata" inline>graph save</syntaxhighlight> to create <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files unless you intend to combine graphs later. (Similarly, the <syntaxhighlight lang="stata" inline>saving()</syntaxhighlight> option is discouraged for all other uses.)<br />
<br />
One way to implement these settings is with code like the following. Note the file type is explicit in the file path extension for the <syntaxhighlight lang="stata" inline>graph export</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
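sysuse auto.dta , clear<br />
<br />
// Create the graph in memory without rendering it in the Graph window<br />
scatter price mpg ///<br />
, nodraw ${graph_opts}<br />
<br />
// Display the stored graph with a height of 7 (inches), then export it<br />
graph display , ysize(7)<br />
graph export "scatter.png" , replace<br />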
</syntaxhighlight><br />
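<br />
When a graph is needed both for the web and for print, the same graph can be exported several times, with the format chosen by each file extension. This is a sketch: the file names are hypothetical, and the <syntaxhighlight lang="stata" inline>width()</syntaxhighlight> option (in pixels) applies only to raster formats such as PNG and TIFF.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
// Create the graph once<br />
scatter price mpg<br />
<br />
// PNG for web viewing, 2000 pixels wide<br />
graph export "scatter.png" , width(2000) replace<br />
<br />
// TIFF (raster) and EPS (vector) for print submission<br />
graph export "scatter.tif" , width(2000) replace<br />
graph export "scatter.eps" , replace<br />
</syntaxhighlight><br />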
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously, and in doing so they provide one of the highest degrees of cross-system consistency that is possible in creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created for the rest of the Stata session. This is a simple scheme which incorporates many of the universally applicable options above for all graphs, particularly region coloring and axis marking. As with any third-party scheme, you should read the documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
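<br />
Because <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> changes the default for the rest of the session, it can help to confirm which scheme is active; the <syntaxhighlight lang="stata" inline>scheme()</syntaxhighlight> option overrides it for a single graph. A minimal sketch using two schemes that ship with Stata, <code>s2color</code> and <code>s2mono</code>:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
// Set the session default and check it through the c(scheme) setting<br />
set scheme s2color<br />
display "`c(scheme)'"<br />
<br />
// Override the default for this graph only; later graphs still use s2color<br />
scatter price mpg , scheme(s2mono)<br />
</syntaxhighlight><br />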
<br />
Schemes do not appear to be able to control the default graphics font. The font can instead be set using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
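<br />
The window setting does not necessarily carry over to every export format: PostScript and EPS exports have their own font settings in <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>. A sketch setting the same font for the Graph window and for EPS export; Helvetica is only an example, and any font installed on the system could be named instead.<br />
<br />
<syntaxhighlight lang="stata"><br />
// Font used when rendering graphs in the Graph window<br />
graph set window fontface "Helvetica"<br />
<br />
// Font used when exporting graphs to EPS<br />
graph set eps fontface "Helvetica"<br />
<br />
// List the current settings to check them<br />
graph set<br />
</syntaxhighlight><br />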
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present various aspects of a single analysis at the same time. Combining graphs is especially useful when facing constraints on the number of allowable exhibits, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combining graphs: overlaying multiple pieces of information on the same set of axes, or combining multiple visualizations into a single image with multiple panels (either aligned or not, although Stata handles alignment somewhat poorly).<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Stata layers the plots on the same set of axes in the order in which they are written, so later plots are drawn on top of earlier ones. Including a second (possibly invisible) axis allows further possibilities. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command (its variables and the globals it calls come from a project dataset and are not defined in this article):<br />
<br />
<syntaxhighlight lang="stata"><br />
twoway ///<br />
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
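<br />
Because the example above relies on project-specific variables, here is a smaller self-contained sketch of the same layering idea using the auto dataset: a histogram drawn against a second y-axis, with a smoothed line layered on top against the first axis. The variables, styling, and output file name are illustrative only.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
twoway ///<br />
/// Layer 1: histogram of mpg against a second (right-hand) y-axis<br />
(histogram mpg , freq yaxis(2) fc(gs14) lc(none) width(2)) ///<br />
/// Layer 2: smoothed price against the first y-axis, drawn on top<br />
(lpoly price mpg , lc(black) lw(thick)) ///<br />
/// Overall options<br />
, legend(order(1 "Number of cars" 2 "Price (smoothed)")) ///<br />
ytitle("Price (USD)") ytitle("Frequency", axis(2))<br />
<br />
graph export "layered.png" , replace<br />
</syntaxhighlight><br />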
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the included graphs. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the visualizations in the final image are intended to share a common legend. To save processing time when combining graphs, consider creating the underlying graphs with the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option, which defers rendering until the combined graph is drawn. Rendering graphs in the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
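<br />
Graphs created in the same session can also be combined by the names given in their <syntaxhighlight lang="stata" inline>name()</syntaxhighlight> option, without saving any <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files. A minimal sketch with the auto dataset; the graph names and output file name are hypothetical:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
// Build each panel in memory without rendering it, giving each a name<br />
scatter price mpg , nodraw name(g1, replace) title("Price vs. mileage")<br />
scatter price weight , nodraw name(g2, replace) title("Price vs. weight")<br />
<br />
// Combine the named graphs side by side; ycommon puts both on the same y-scale<br />
graph combine g1 g2 , ycommon cols(2) name(combined, replace)<br />
graph export "combined.png" , replace<br />
</syntaxhighlight><br />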
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command creates visualizations of one or more variables in the dataset that have a y-axis and a categorical axis, such as <syntaxhighlight lang="stata" inline>graph bar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>graph hbar</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>graph dot</syntaxhighlight>. Its main strength is that it uses the <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax to rapidly calculate many possible statistics for any number of variables. The <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> options provide the flexibility to do any desired subgrouping of the results.<br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
graph hbar ///<br />
(mean) price (median) price (max) length ///<br />
, asc yvaroptions( label(labsize(vsmall)) ///<br />
relabel(1 "Mean of Price" 2 "Median of Price" 3 "Max of Length") ) ///<br />
over(foreign) by(rep78 , c(1)) ///<br />
ysize(7) blabel(bar,size(vsmall)) <br />
</syntaxhighlight><br />
<br />
And we would obtain:<br />
<br />
[[file:graph-hbar.png|4000px]]<br />
<br />
The main shortcoming of this command is that it provides little control over how the results are displayed, such as combining different statistics in one display. For example, it cannot draw the <syntaxhighlight lang="stata" inline>(mean)</syntaxhighlight> and <syntaxhighlight lang="stata" inline>(sem)</syntaxhighlight> statistics in different styles to produce a bar graph with confidence intervals. (You might try <syntaxhighlight lang="stata" inline>betterbar</syntaxhighlight>, available from SSC, for that.) Similarly, variables with very different scales may be difficult to display in the same graphic, and numerical variables with non-numerical interpretations, such as dates or labelled variables, may not be handled as intended without extensive manipulation.<br />
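<br />
One workaround that avoids user-written commands is to compute the statistics first with <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> and then draw them with <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, layering bars and capped spikes. This is a minimal sketch that assumes a normal-approximation 95% interval (the mean plus or minus 1.96 standard errors); note that <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> replaces the data in memory.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
// Group means and standard errors of the mean (replaces the data in memory)<br />
collapse (mean) mean_price = price (sem) se_price = price , by(foreign)<br />
<br />
// Approximate 95% confidence bounds under a normal approximation<br />
generate ci_lo = mean_price - 1.96 * se_price<br />
generate ci_hi = mean_price + 1.96 * se_price<br />
<br />
// Bars for the means, capped spikes for the intervals<br />
twoway ///<br />
(bar mean_price foreign , barwidth(0.6) fc(gs12) lc(none)) ///<br />
(rcap ci_hi ci_lo foreign , lc(black)) ///<br />
, xlabel(0 "Domestic" 1 "Foreign") xtitle("") ///<br />
ytitle("Mean price") legend(off)<br />
</syntaxhighlight><br />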
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command also provides a range of subcommands for other functions, such as drawing, saving, and exporting graphs. These are not described in detail here; beyond these, most other <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> subcommands should rarely be needed.<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command (often abbreviated <syntaxhighlight lang="stata" inline>tw</syntaxhighlight>) enables many of the same visualization approaches as the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command. Unlike <syntaxhighlight lang="stata" inline>graph</syntaxhighlight>, however, <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> creates an open-ended environment in which multiple variables, various graphing styles, and several simultaneous axes can be combined. <br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
tw ///<br />
(scatter weight length , mc(gray)) ///<br />
(lpoly weight length , lc(red)) ///<br />
(scatter weight length ///<br />
if rep78 == 2 ///<br />
, mlab(make) mlabsize(vsmall) mlabc(black) mc(black)) ///<br />
, yscale(r(0)) ylab(#6)<br />
</syntaxhighlight><br />
<br />
[[file:tw-scatter.png]]<br />
<br />
<br />
The <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> option can be used with <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>; the <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option cannot. <br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
tw ///<br />
(scatter weight length , mc(gray)) ///<br />
(lpoly weight length , lc(red)) ///<br />
(scatter weight length ///<br />
if rep78 == 2 ///<br />
, mlab(make) mlabsize(vsmall) mlabc(black) mc(black)) ///<br />
, yscale(r(0)) ylab(#6) ///<br />
by(foreign , legend(off)) <br />
</syntaxhighlight><br />
<br />
This yields:<br />
<br />
[[File:Tw-scatter-by.png]]<br />
<br />
Because the <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option is not available, code in which multiple subsets of data are drawn on the same axes must be written explicitly. Usually this is not too complicated, unless there is a large or unknown number of groups. In those cases, a loop is typically used to compensate for the loss of the <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option, in code like the following:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
// Collect the distinct values of the grouping variable<br />
levelsof foreign , local(levels)<br />
<br />
// One color per group, matched to the groups by position<br />
local colors = "red black"<br />
<br />
// Build the list of plots: one scatter and one lpoly per group<br />
local graphs ""<br />
local counter 0<br />
foreach level in `levels' {<br />
local ++counter<br />
local graphs = "`graphs'" ///<br />
+ " (scatter weight length if foreign == `level' " ///<br />
+ " , mc(`: word `counter' of `colors''))" ///<br />
+ " (lpoly weight length if foreign == `level' " ///<br />
+ " , lc(`: word `counter' of `colors''))"<br />
}<br />
<br />
// Draw all plots together; plots 2 and 4 are the lpoly lines for each group<br />
tw `graphs' ///<br />
, legend(on pos(5) ring(0) c(1) ///<br />
order(0 "Origin:" 2 "Domestic" 4 "Foreign") ) ///<br />
yscale(r(0)) ylab(#6) ///<br />
xtit("Car Length (in.)") ytit("Car Weight (lbs.)")<br />
</syntaxhighlight><br />
<br />
This code produces:<br />
<br />
[[file:tw-scatter-over.png]]<br />
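<br />
For simple cases, Stata's built-in <syntaxhighlight lang="stata" inline>separate</syntaxhighlight> command offers an alternative to building the plot list in a loop: it splits one variable into a new variable per group, and these can then be passed to a single <syntaxhighlight lang="stata" inline>scatter</syntaxhighlight> command. This sketch covers only the scatter layers of the example above; the <syntaxhighlight lang="stata" inline>lpoly</syntaxhighlight> fits would still need to be added separately for each group.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
// Create one weight variable per value of foreign (weight0 and weight1);<br />
// the names of the new variables are returned in r(varlist)<br />
separate weight , by(foreign)<br />
local wvars `r(varlist)'<br />
<br />
// Each new variable becomes its own series, with its own color<br />
scatter `wvars' length ///<br />
, mc(red black) ///<br />
legend(order(1 "Domestic" 2 "Foreign")) ///<br />
xtit("Car Length (in.)") ytit("Car Weight (lbs.)")<br />
</syntaxhighlight><br />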
<br />
===Built-in visualization commands===<br />
<br />
===User-written visualization commands===</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=File:Tw-scatter-over.png&diff=7690File:Tw-scatter-over.png2020-11-10T21:43:25Z<p>Bbdaniels: </p>
<hr />
<div></div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7689Stata Coding Practices: Visualization2020-11-10T21:36:20Z<p>Bbdaniels: /* The twoway command */</p>
<hr />
<div>Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
There is an enormous number of options available for each specific type of graph in Stata, and we do not cover them here. When drawing a graph, refer to the help file for its command to understand the full range of options available. These typically include key elements like marker shapes and sizes; coloration of lines, markers, and fill elements; transparency; and added text. These elements let you create exactly the visual components you want to display, and there are many resources on using graphical elements to convey information to readers efficiently.<br />
<br />
However, some elements are common to all graphs, and it is typically beneficial to standardize these components across all the graphs you create for a single piece of work. One workable approach that covers the main bases is the following code, which creates global macros that can easily be called in all graphs. The specific settings here are not recommendations; they are for illustration of common graphical elements. In particular, this code:<br />
<br />
* Left-aligns the graph title<br />
* Sets the background colors to white<br />
* Turns off axis lines<br />
* Rotates the y-axis labels to horizontal and removes the y-axis gridlines<br />
* Left-aligns the x-axis title<br />
* Removes coloration and bordering from the legend<br />
<br />
These settings are implemented as follows: <br />
<br />
<syntaxhighlight lang="stata"><br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
xscale(noline) xtit(,placement(left) justification(left)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
<br />
Two further settings matter when creating graphs for publication: the file type of the exported image and its aspect ratio (width-to-height). The aspect ratio is set using the <syntaxhighlight lang="stata" inline>ysize()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>xsize()</syntaxhighlight> options, which specify the height and width of the graph (in inches by default).<br />
<br />
The choice of file type is also important. PNG images tend to be of reasonable quality and are natively viewable on all operating systems, as well as in web browsers when stored in places like GitHub and Zenodo. However, PNG images will typically be of insufficient quality for print media; journals may instead require high-resolution TIFF or vector EPS images, which may not be natively viewable on your operating system. You should never use <syntaxhighlight lang="stata" inline>graph save</syntaxhighlight> to create <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files unless you intend to combine graphs later. (Similarly, the <syntaxhighlight lang="stata" inline>saving()</syntaxhighlight> option is discouraged for all other uses.)<br />
<br />
One way to implement these settings is with code like the following. Note the file type is explicit in the file path extension for the <syntaxhighlight lang="stata" inline>graph export</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
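sysuse auto.dta , clear<br />
<br />
// Create the graph in memory without rendering it in the Graph window<br />
scatter price mpg ///<br />
, nodraw ${graph_opts}<br />
<br />
// Display the stored graph with a height of 7 (inches), then export it<br />
graph display , ysize(7)<br />
graph export "scatter.png" , replace<br />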
</syntaxhighlight><br />
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously, and in doing so they provide one of the highest degrees of cross-system consistency that is possible in creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created for the rest of the Stata session. This is a simple scheme which incorporates many of the universally applicable options above for all graphs, particularly region coloring and axis marking. As with any third-party scheme, you should read the documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
<br />
Schemes do not appear to be able to control the default graphics font. The font can instead be set using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present various aspects of a single analysis at the same time. Combining graphs is especially useful when facing constraints on the number of allowable exhibits, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combining graphs: overlaying multiple pieces of information on the same set of axes, or combining multiple visualizations into a single image with multiple panels (either aligned or not, although Stata handles alignment somewhat poorly).<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Stata layers the plots on the same set of axes in the order in which they are written, so later plots are drawn on top of earlier ones. Including a second (possibly invisible) axis allows further possibilities. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command (its variables and the globals it calls come from a project dataset and are not defined in this article):<br />
<br />
<syntaxhighlight lang="stata"><br />
twoway ///<br />
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the included graphs. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the visualizations in the final image are intended to share a common legend. To save processing time when combining graphs, consider creating the underlying graphs with the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option, which defers rendering until the combined graph is drawn. Rendering graphs in the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command creates visualizations of one or more variables in the dataset that have a y-axis and a categorical axis, such as <syntaxhighlight lang="stata" inline>graph bar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>graph hbar</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>graph dot</syntaxhighlight>. Its main strength is that it uses the <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax to rapidly calculate many possible statistics for any number of variables. The <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> options provide the flexibility to do any desired subgrouping of the results.<br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
graph hbar ///<br />
(mean) price (median) price (max) length ///<br />
, asc yvaroptions( label(labsize(vsmall)) ///<br />
relabel(1 "Mean of Price" 2 "Median of Price" 3 "Max of Length") ) ///<br />
over(foreign) by(rep78 , c(1)) ///<br />
ysize(7) blabel(bar,size(vsmall)) <br />
</syntaxhighlight><br />
<br />
And we would obtain:<br />
<br />
[[file:graph-hbar.png|4000px]]<br />
<br />
The main shortcoming of this command is that it provides little control over how the results are displayed, such as combining different statistics in one display. For example, it cannot draw the <syntaxhighlight lang="stata" inline>(mean)</syntaxhighlight> and <syntaxhighlight lang="stata" inline>(sem)</syntaxhighlight> statistics in different styles to produce a bar graph with confidence intervals. (You might try <syntaxhighlight lang="stata" inline>betterbar</syntaxhighlight>, available from SSC, for that.) Similarly, variables with very different scales may be difficult to display in the same graphic, and numerical variables with non-numerical interpretations, such as dates or labelled variables, may not be handled as intended without extensive manipulation.<br />
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command also provides a range of subcommands for other functions, such as drawing, saving, and exporting graphs. These are not described in detail here; beyond these, most other <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> subcommands should rarely be needed.<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command (often abbreviated <syntaxhighlight lang="stata" inline>tw</syntaxhighlight>) enables many of the same visualization approaches as the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command. Unlike <syntaxhighlight lang="stata" inline>graph</syntaxhighlight>, however, <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> creates an open-ended environment in which multiple variables, various graphing styles, and several simultaneous axes can be combined. <br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
tw ///<br />
(scatter weight length , mc(gray)) ///<br />
(lpoly weight length , lc(red)) ///<br />
(scatter weight length ///<br />
if rep78 == 2 ///<br />
, mlab(make) mlabsize(vsmall) mlabc(black) mc(black)) ///<br />
, yscale(r(0)) ylab(#6)<br />
</syntaxhighlight><br />
<br />
[[file:tw-scatter.png]]<br />
<br />
<br />
The <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> option can be used with <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>; the <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option cannot. <br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
tw ///<br />
(scatter weight length , mc(gray)) ///<br />
(lpoly weight length , lc(red)) ///<br />
(scatter weight length ///<br />
if rep78 == 2 ///<br />
, mlab(make) mlabsize(vsmall) mlabc(black) mc(black)) ///<br />
, yscale(r(0)) ylab(#6) ///<br />
by(foreign , legend(off)) <br />
</syntaxhighlight><br />
<br />
This yields:<br />
<br />
[[File:Tw-scatter-by.png]]<br />
<br />
Loops must typically be used to compensate for the loss of the <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option.<br />
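<br />
A minimal sketch of such a loop follows. It assumes the auto dataset and uses <syntaxhighlight lang="stata" inline>foreign</syntaxhighlight> as an illustrative grouping variable; the local macro names are hypothetical. Each level of the grouping variable becomes its own <syntaxhighlight lang="stata" inline>scatter</syntaxhighlight> layer. As a result, the groups are distinguished by color on a single set of axes, rather than split into separate panels as they would be with <syntaxhighlight lang="stata" inline>by()</syntaxhighlight>:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
local layers ""<br />
local legend ""<br />
local i = 0<br />
<br />
// One scatter layer and one legend key per level of foreign<br />
levelsof foreign , local(groups)<br />
foreach g of local groups {<br />
    local ++i<br />
    local layers `layers' (scatter weight length if foreign == `g')<br />
    local lab : label (foreign) `g'<br />
    local legend `legend' `i' "`lab'"<br />
}<br />
<br />
// Draw all layers together, labelling each with its value label<br />
tw `layers' , legend(order(`legend'))<br />
</syntaxhighlight><br />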
<br />
===Built-in visualization commands===<br />
<br />
===User-written visualization commands===</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=File:Tw-scatter-by.png&diff=7688File:Tw-scatter-by.png2020-11-10T21:20:27Z<p>Bbdaniels: </p>
<hr />
<div></div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7687Stata Coding Practices: Visualization2020-11-10T21:10:42Z<p>Bbdaniels: /* The twoway command */</p>
<hr />
<div>Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
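<br />
The following sketch illustrates the difference described for built-in commands. It assumes the auto dataset, and the variable name <syntaxhighlight lang="stata" inline>weight_smooth</syntaxhighlight> is hypothetical. The code uses the same smoother three ways:<br />
* The standalone <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight> command draws its own scatter plot and curve with its own defaults.<br />
* The standalone command can also save the smoothed values.<br />
* <syntaxhighlight lang="stata" inline>twoway lowess</syntaxhighlight> draws only the curve, so it can be layered with other plots.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
// Standalone command: scatter plus smoothed curve, drawn with lowess's own defaults<br />
lowess weight length<br />
<br />
// The standalone command can also store the smoothed values without drawing a graph<br />
lowess weight length , generate(weight_smooth) nograph<br />
<br />
// twoway version: only the curve, layered on a separately styled scatter<br />
tw (scatter weight length , mc(gray)) (lowess weight length , lc(red))<br />
</syntaxhighlight><br />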
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
There is an enormous number of options available for each specific type of graph in Stata, and we will not cover them here. When drawing a graph, refer to the help file for its command to see the full range of specific options available. These typically include key elements such as:<br />
* marker shapes and sizes<br />
* the colors of lines, markers, and fill elements<br />
* transparency<br />
* added text<br />
These elements let you create exactly the visual components you want to display, and many resources exist on using graphical elements to convey information to readers efficiently.<br />
<br />
However, some elements are common to all graphs, and it is typically beneficial to standardize them across all the graphs you create for a single piece of work. One workable setup that covers the main bases is the following code, which stores common options in global macros that can easily be included in any graph. The specific settings here are not recommendations; they are for illustration of common graphical elements. In particular, this code:<br />
<br />
* Left-aligns the graph title<br />
* Sets the background colors to white<br />
* Turns off axis lines<br />
* Rotates the y-axis labels to horizontal and removes the y-axis gridlines<br />
* Left-aligns the x-axis title<br />
* Removes coloration and bordering from the legend<br />
<br />
These settings are implemented as follows: <br />
<br />
<syntaxhighlight lang="stata"><br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
xscale(noline) xtit(,placement(left) justification(left)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
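<br />
As a usage sketch, assuming the globals above have been defined, the second macro can be appended to a <syntaxhighlight lang="stata" inline>graph bar</syntaxhighlight> command. It presumably omits the x-axis options because commands of this kind have a categorical axis rather than a numeric x-axis. The output file name is illustrative:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
// Bar chart of mean price by origin, styled with the -graph- options macro<br />
graph bar (mean) price , over(foreign) ${graph_opts_1}<br />
graph export "bar.png" , replace<br />
</syntaxhighlight><br />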
<br />
Two further settings are usually needed when creating graphs for publication: the file type of the exported image and its aspect ratio (width-to-height). The aspect ratio is set using the <syntaxhighlight lang="stata" inline>ysize()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>xsize()</syntaxhighlight> options, which give the height and width of the graph (in inches by default).<br />
<br />
The choice of file type is also important. PNG images are of reasonable quality and are natively viewable on all operating systems, as well as in web browsers when stored in places like GitHub and Zenodo. However, PNG is a raster format, and its resolution will often be insufficient for print media. Journals may instead prefer high-resolution TIFF images or vector EPS images, which may not be natively viewable on your operating system. You should never use <syntaxhighlight lang="stata" inline>graph save</syntaxhighlight> to create <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files unless you intend to combine graphs later. (Similarly, the <syntaxhighlight lang="stata" inline>saving()</syntaxhighlight> option is discouraged for any other purpose.)<br />
<br />
One way to implement these settings is with code like the following. Note the file type is explicit in the file path extension for the <syntaxhighlight lang="stata" inline>graph export</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
scatter price mpg ///<br />
, nodraw ${graph_opts}<br />
<br />
graph draw , ysize(7)<br />
graph export "scatter.png" , replace<br />
</syntaxhighlight><br />
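<br />
For print, the same graph might instead be exported at a higher pixel width or in a vector format. The sketch below uses illustrative file names. It sets the aspect ratio directly on the graph and then uses the <syntaxhighlight lang="stata" inline>width()</syntaxhighlight> option of <syntaxhighlight lang="stata" inline>graph export</syntaxhighlight>, which sets the pixel width of PNG and TIFF output. EPS output is vector-based:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
// 3:2 width-to-height ratio, set directly on the graph (dimensions in inches)<br />
scatter price mpg , xsize(6) ysize(4)<br />
<br />
// Raster formats: width() sets the image width in pixels<br />
graph export "scatter.png" , width(2400) replace<br />
graph export "scatter.tif" , width(2400) replace<br />
<br />
// Vector format: scales without loss of resolution<br />
graph export "scatter.eps" , replace<br />
</syntaxhighlight><br />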
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously, and in doing so they provide one of the highest degrees of cross-system consistency that is possible in creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created while Stata remains open. This is a simple scheme that incorporates many of the universally applicable options above, particularly region coloring and axis marking. As with any third-party scheme, you should read its documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
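<br />
The following sketch may help you check which schemes Stata can find, or apply a scheme to a single graph without changing the session default. Here <syntaxhighlight lang="stata" inline>s1mono</syntaxhighlight> is simply one of the built-in schemes, chosen for illustration:<br />
<br />
<syntaxhighlight lang="stata"><br />
// List the schemes available on Stata's search path, including third-party ones<br />
graph query , schemes<br />
<br />
// Display the scheme currently in effect for this session<br />
display "`c(scheme)'"<br />
<br />
// Apply a different scheme to this graph only; the session default is unchanged<br />
sysuse auto.dta , clear<br />
scatter price mpg , scheme(s1mono)<br />
</syntaxhighlight><br />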
<br />
Schemes do not appear to control the default graphics font. The font can instead be set using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
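<br />
Here is a short sketch of the font settings, assuming Helvetica is installed on your system. As we understand it, the Graph window and EPS exports each have their own font setting, so both may need to be set if you export EPS files:<br />
<br />
<syntaxhighlight lang="stata"><br />
// Font for graphs drawn in the Graph window<br />
graph set window fontface "Helvetica"<br />
<br />
// Separate font setting used for EPS exports<br />
graph set eps fontface "Helvetica"<br />
<br />
// With no arguments, list the current graph settings<br />
graph set<br />
</syntaxhighlight><br />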
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present various aspects of a single analysis at the same time. Combining graphs is especially useful when facing constraints on the number of allowable exhibits, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combining graphs: overlaying multiple pieces of information on the same set of axes, or combining multiple visualizations into a single image with multiple panels (either aligned or not, although Stata handles alignment somewhat poorly).<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Starting from the first axis, and proceeding in order of the commands written, Stata will layer graphs on top of each other on the same set of axes. Including a second (possibly invisible) axis allows further possibilities. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
twoway ///<br />
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
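<br />
The example above relies on project data and on globals (<syntaxhighlight lang="stata" inline>${hist_opts}</syntaxhighlight>, <syntaxhighlight lang="stata" inline>${pct}</syntaxhighlight>) that are not defined here. The following self-contained sketch shows the same layering idea using the auto dataset, with illustrative variable choices. It draws a histogram on a second y-axis underneath a scatter plot and a smoothed line on the main y-axis:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
twoway ///<br />
/// Histogram on the right-hand axis, drawn first so it sits underneath<br />
(histogram weight , freq yaxis(2) fc(gs14) lc(gs12)) ///<br />
/// Scatter and smoothed line on the left-hand axis<br />
(scatter price weight , mc(gray)) ///<br />
(lpoly price weight , lc(red) lw(thick)) ///<br />
/// Overall options<br />
, ytit("Price (USD)") ytit("Number of cars", axis(2)) ///<br />
legend(order(2 "Price" 3 "Smoothed price" 1 "Cars per weight bin"))<br />
</syntaxhighlight><br />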
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the included graphs. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the visualizations in the final image are intended to share a common legend. To save processing time when combining graphs, consider creating the underlying graphs with the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option, which defers rendering until the combined graph is drawn. Rendering the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
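<br />
The following is a minimal sketch of that workflow, keeping graphs in memory rather than in <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files; the panel names and output file name are illustrative. Each panel is created with <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> and a <syntaxhighlight lang="stata" inline>name()</syntaxhighlight> so that <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> can refer to it. Only the combined graph is rendered:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
// Create each panel in memory without rendering it<br />
scatter price mpg , nodraw name(panel_a, replace) title("A. Price and mileage")<br />
scatter price weight , nodraw name(panel_b, replace) title("B. Price and weight")<br />
<br />
// Render only the combined graph, with a shared y-axis scale, then export it<br />
graph combine panel_a panel_b , cols(2) ycommon xsize(8) ysize(4)<br />
graph export "combined.png" , width(2400) replace<br />
</syntaxhighlight><br />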
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command creates visualizations of one or more variables in the dataset, with a y-axis and a categorical axis. Its main strength, used in this way, is that it uses the <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax to rapidly calculate many possible statistics for any number of variables. The <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> options provide the flexibility to create any desired subgrouping of the results.<br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
graph hbar ///<br />
(mean) price (median) price (max) length ///<br />
, asc yvaroptions( label(labsize(vsmall)) ///<br />
relabel(1 "Mean of Price" 2 "Median of Price" 3 "Max of Length") ) ///<br />
over(foreign) by(rep78 , c(1)) ///<br />
ysize(7) blabel(bar,size(vsmall)) <br />
</syntaxhighlight><br />
<br />
And we would obtain:<br />
<br />
[[file:graph-hbar.png|4000px]]<br />
<br />
The main shortcoming of this command is that it offers little control over how the results are displayed, for example when combining several statistics. It cannot combine the <syntaxhighlight lang="stata" inline>(mean)</syntaxhighlight> and <syntaxhighlight lang="stata" inline>(sem)</syntaxhighlight> statistics in different styles to produce a bar graph with confidence intervals. (You might try <syntaxhighlight lang="stata" inline>betterbar</syntaxhighlight>, available from SSC, for that.) Similarly, variables with very different scales can be hard to display in the same graphic, and numeric variables with non-numeric interpretations, such as dates or labelled variables, may not be handled as intended without extensive manipulation.<br />
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command also provides a range of syntaxes for other graphing functions, such as drawing, saving, and exporting graphs. These are not described here; apart from them, most of the remaining syntaxes should rarely be needed.<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command (often abbreviated <syntaxhighlight lang="stata" inline>tw</syntaxhighlight>) enables many of the same visualization approaches as the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command. Unlike <syntaxhighlight lang="stata" inline>graph</syntaxhighlight>, however, <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> creates an open-ended environment in which multiple variables, various graphing styles, and several simultaneous sets of axes can be combined.<br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
tw ///<br />
(scatter weight length , mc(gray)) ///<br />
(lpoly weight length , lc(red)) ///<br />
(scatter weight length ///<br />
if rep78 == 2 ///<br />
, mlab(make) mlabsize(vsmall) mlabc(black) mc(black)) ///<br />
, yscale(r(0)) ylab(#6)<br />
</syntaxhighlight><br />
<br />
[[file:tw-scatter.png]]<br />
<br />
<br />
The <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> option can be used with <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>; the <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option cannot. Loops must typically be used to compensate for the loss of the <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> option.<br />
<br />
===Built-in visualization commands===<br />
<br />
===User-written visualization commands===</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=File:Tw-scatter.png&diff=7686File:Tw-scatter.png2020-11-10T21:09:06Z<p>Bbdaniels: Bbdaniels uploaded a new version of File:Tw-scatter.png</p>
<hr />
<div></div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=File:Tw-scatter.png&diff=7685File:Tw-scatter.png2020-11-10T20:52:05Z<p>Bbdaniels: </p>
<hr />
<div></div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7684Stata Coding Practices: Visualization2020-11-10T20:18:29Z<p>Bbdaniels: /* Specific Visualization Approaches */</p>
<hr />
<div>Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
There are an enormous number of options available for each specific type of graph in Stata, and we will not cover those here. When drawing a graph, refer to the specific help file for its command to understand the full range of specific options available. These typically include key elements like marker shapes and sizes; coloration of lines, markers, and fill elements; transparency and added text; and so on. All of these elements will allow you to create the exact visual components you want to display and there are a large number of resources on using graphical elements to efficiently convey information to readers. Therefore we do not cover these elements in this section.<br />
<br />
However, some elements are common to all graphs and it is typically beneficial to standardize these components across all the graphs you create for a single piece of work. One workable setting that covers the main bases is the following code, which creates global macros called easily into all graphs. The specific settings here are not recommendations, but are for illustration purposes of common graphical elements. In particular, this code:<br />
<br />
* Left-aligns the graph title<br />
* Sets the background colors to white<br />
* Turns off axis lines<br />
* Rotates y-labels 90 degrees<br />
* Left-aligns the x-axis title<br />
* Removes coloration and bordering from the legend<br />
<br />
These settings are implemented as follows: <br />
<br />
<syntaxhighlight lang="stata"><br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
xscale(noline) xtit(,placement(left) justification(left)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
<br />
Two further primary settings will be desired when creating graphs for publication purposes: the file type of the exported image file and the aspect ratio (width-to-height) of the file. The aspect ratio is set using the <syntaxhighlight lang="stata" inline>ysize()</syntaxhighlight> or <syntaxhighlight lang="stata" inline>xsize()</syntaxhighlight> options, with integers as the arguments.<br />
<br />
The choice of file type is also important. PNG images tend to be of reasonable quality and natively viewable on all operating systems as well as on web browsers when stored in places like GitHub and Zenodo. However, PNG images will typically be insufficient quality for print media; journals may prefer "lossless" TIFF or EPS images. These may not be natively viewable in your operating system. You should never use <syntaxhighlight lang="stata" inline>graph save</syntaxhighlight> to create <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files unless you intend to combine graphs later. (Similarly, the <syntaxhighlight lang="stata" inline>saving()</syntaxhighlight> option is discouraged in all other uses.)<br />
<br />
One way to implement these settings is with code like the following. Note the file type is explicit in the file path extension for the <syntaxhighlight lang="stata" inline>graph export</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
scatter price mpg ///<br />
, nodraw ${graph_opts}<br />
<br />
graph draw , ysize(7)<br />
graph export "scatter.png"<br />
</syntaxhighlight><br />
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously, and in doing so they provide one of the highest degrees of cross-system consistency that is possible in creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created while Stata remains open. This is a simple scheme which incorporates many of the universally-applicable options above for all graphs, particularly region coloring and axis marking. As with any third-party scheme, you should read the documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
<br />
One thing that schemes cannot do, apparently, is control the default graphics font. This can be done using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present various elects of a single analysis at the same time. Combining graphs is especially useful when facing constraints on the number of allowable exhibits, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combing graphs: overlaying multiple pieces of information on the same set of axes, or combining multiple visualizations into a single image with multiple panels (either aligned or not, although Stata handles alignment somewhat poorly).<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Starting from the first axis, and proceeding in order of the commands written, Stata will layer graphs on top of each other on the same set of axes. Including a second (possibly invisible) axis allows further possibilities. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
twoway ///<br />
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the graphs included. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the visualizations included in the final image are intended to share a common legend. To save processing time when combining graphs, consider rendering the underlying graphs using the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option, which saves graph rendering until the combined graph is drawn. Rendering the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command creates visualizations of one or more variables in the dataset. The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command creates visualizations which have a Y-axis and a categorical axis. The main strength of the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command used in this way is that it uses the <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax to rapidly calculate many possible statistics for any number of variables. The <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> options provide flexibility to do any desired subgrouping of the results.<br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
graph hbar ///<br />
(mean) price (median) price (max) length ///<br />
, asc yvaroptions( label(labsize(vsmall)) ///<br />
relabel(1 "Mean of Price" 2 "Median of Price" 3 "Max of Length") ) ///<br />
over(foreign) by(rep78 , c(1)) ///<br />
ysize(7) blabel(bar,size(vsmall)) <br />
</syntaxhighlight><br />
<br />
And we would obtain:<br />
<br />
[[file:graph-hbar.png|4000px]]<br />
<br />
The main shortcoming of this command is that it provides little customization of the actual display of the results, such as combining various statistics. For example, it cannot combine the <syntaxhighlight lang="stata" inline>(mean)</syntaxhighlight> and <syntaxhighlight lang="stata" inline>(sem)</syntaxhighlight> options in different styles such that a bar graph with confidence intervals would be produced. (You might try <syntaxhighlight lang="stata" inline>betterbar</syntaxhighlight>, available from SSC, for that.) Similarly, multiple variables with very different scales may not be possible to display in the same graphic easily, and numerical variables which have non-numerical interpretations - such as dates or labelled variables - may not be easily or correctly handled as intended without extensive manipulation.<br />
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command also provides a range of subcommands for other graphing functions, such as drawing, saving, and exporting graphs. These are not described here. Apart from these utility functions, most other <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> subcommands should rarely be needed.<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
===Built-in visualization commands===<br />
<br />
===User-written visualization commands===</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7683Stata Coding Practices: Visualization2020-11-10T20:16:49Z<p>Bbdaniels: /* The graph command */</p>
<hr />
<div>Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
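<br />
To make the distinction concrete, the following sketch draws related plots with the first three approaches using Stata's bundled auto data. The choice of variables is arbitrary. User-written commands are left out because they must be installed separately and each has its own syntax:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
// 1. The graph suite: pre-packaged, collapse-style statistics<br />
graph bar (mean) price, over(foreign)<br />
<br />
// 2. The twoway suite: flexible plots layered on abstract axes<br />
twoway (scatter price mpg) (lfit price mpg)<br />
<br />
// 3. A built-in command used on its own...<br />
lowess price mpg<br />
// ...and the corresponding plot type used inside a twoway environment<br />
twoway (scatter price mpg) (lowess price mpg)<br />
</syntaxhighlight><br />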
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
There are an enormous number of options available for each specific type of graph in Stata, and we will not cover those here. When drawing a graph, refer to the specific help file for its command to understand the full range of specific options available. These typically include key elements like marker shapes and sizes; coloration of lines, markers, and fill elements; transparency and added text; and so on. All of these elements will allow you to create the exact visual components you want to display and there are a large number of resources on using graphical elements to efficiently convey information to readers. Therefore we do not cover these elements in this section.<br />
<br />
However, some elements are common to all graphs, and it is typically beneficial to standardize these components across all the graphs you create for a single piece of work. One workable approach that covers the main bases is the following code, which creates global macros that can easily be called in all graphs. The specific settings here are not recommendations; they illustrate common graphical elements. In particular, this code:<br />
<br />
* Left-aligns the graph title<br />
* Sets the background colors to white<br />
* Turns off axis lines<br />
* Rotates y-axis labels to horizontal and removes y-axis gridlines<br />
* Left-aligns the x-axis title<br />
* Removes coloration and bordering from the legend<br />
<br />
These settings are implemented as follows: <br />
<br />
<syntaxhighlight lang="stata"><br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
xscale(noline) xtit(,placement(left) justification(left)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
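<br />
The following sketch shows how the second macro might be applied to a <syntaxhighlight lang="stata" inline>graph bar</syntaxhighlight> command. It assumes the globals above have already been defined in the same session. The separate macro presumably exists because <syntaxhighlight lang="stata" inline>graph bar</syntaxhighlight> has a categorical rather than a numeric x-axis, so x-axis options such as <syntaxhighlight lang="stata" inline>xscale()</syntaxhighlight> do not apply:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
// Assumes ${graph_opts_1} has been defined as above<br />
graph bar (mean) price, over(foreign) ${graph_opts_1} ///<br />
  ytitle("Mean price")<br />
</syntaxhighlight><br />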
<br />
Two further settings are usually needed when creating graphs for publication: the file type of the exported image and its aspect ratio (width-to-height). The aspect ratio is set using the <syntaxhighlight lang="stata" inline>ysize()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>xsize()</syntaxhighlight> options, which take the height and width of the graph (in inches by default) as their arguments.<br />
<br />
The choice of file type is also important. PNG images tend to be of reasonable quality and are natively viewable on all operating systems, as well as in web browsers when stored in places like GitHub and Zenodo. However, PNG images are raster images and will typically be of insufficient resolution for print media; journals may prefer high-resolution TIFF images or vector formats such as EPS. These may not be natively viewable on your operating system. You should never use <syntaxhighlight lang="stata" inline>graph save</syntaxhighlight> to create <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files unless you intend to combine graphs later. (Similarly, the <syntaxhighlight lang="stata" inline>saving()</syntaxhighlight> option is discouraged for all other uses.)<br />
<br />
One way to implement these settings is with code like the following. Note the file type is explicit in the file path extension for the <syntaxhighlight lang="stata" inline>graph export</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
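// Assumes ${graph_opts} has been defined as above;<br />
// nodraw creates the graph in memory without rendering it<br />
scatter price mpg ///<br />
, nodraw ${graph_opts}<br />
<br />
// Display the graph at the desired size, then export it;<br />
// the .png extension sets the file type, and replace allows re-running<br />
graph display , ysize(7)<br />
graph export "scatter.png" , replace<br />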
</syntaxhighlight><br />
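<br />
To produce both a screen-friendly and a print-ready version of the same figure, the graph can be exported several times with different file extensions. This sketch uses the auto data. The file names, sizes, and <syntaxhighlight lang="stata" inline>width()</syntaxhighlight> pixel values are illustrative assumptions, not recommendations:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
// Set the aspect ratio explicitly (width and height in inches)<br />
scatter price mpg, xsize(7) ysize(5)<br />
<br />
// Raster formats: width() sets the width of the exported image in pixels<br />
graph export "scatter.png", width(2000) replace<br />
graph export "scatter.tif", width(3000) replace<br />
<br />
// Vector format: scales without losing resolution<br />
graph export "scatter.eps", replace<br />
</syntaxhighlight><br />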
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously, and in doing so they provide one of the highest degrees of cross-system consistency that is possible in creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created while Stata remains open. This is a simple scheme which incorporates many of the universally-applicable options above for all graphs, particularly region coloring and axis marking. As with any third-party scheme, you should read the documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
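<br />
A scheme can also be applied to a single graph with the <syntaxhighlight lang="stata" inline>scheme()</syntaxhighlight> option, which leaves the session default unchanged. The sketch below uses Stata's built-in <code>s1mono</code> and <code>s2mono</code> schemes purely for illustration. It then reads <syntaxhighlight lang="stata" inline>c(scheme)</syntaxhighlight> to report the scheme currently in effect:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
// Use a scheme for this graph only<br />
scatter price mpg, scheme(s1mono)<br />
<br />
// Change the default scheme for all subsequent graphs in this session<br />
set scheme s2mono<br />
<br />
// Confirm which scheme is now the default<br />
display "`c(scheme)'"<br />
</syntaxhighlight><br />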
<br />
One thing that schemes cannot do, apparently, is control the default graphics font. This can be done using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present various elements of a single analysis at the same time. Combining graphs is especially useful when facing constraints on the number of allowable exhibits, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combining graphs: overlaying multiple pieces of information on the same set of axes, or combining multiple visualizations into a single image with multiple panels (either aligned or not, although Stata handles alignment somewhat poorly).<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Stata layers the plots on the same set of axes in the order they are written, so each later plot is drawn on top of the earlier ones. Including a second (possibly invisible) axis allows further possibilities, such as displaying series with very different scales together. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
twoway ///<br />
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
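<br />
The example above relies on project data and on globals (such as <syntaxhighlight lang="stata" inline>${hist_opts}</syntaxhighlight> and <syntaxhighlight lang="stata" inline>${pct}</syntaxhighlight>) that are defined elsewhere. The following is a smaller, self-contained sketch of the same layering idea using the auto data. The particular plots, colors, and labels are arbitrary choices:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
twoway ///<br />
  /// Layer 1: histogram of mpg on a second y-axis, drawn first (at the back)<br />
  (histogram mpg, yaxis(2) fcolor(gs14) lcolor(none)) ///<br />
  /// Layer 2: scatter of price against mpg on the first y-axis<br />
  (scatter price mpg, mcolor(black)) ///<br />
  /// Layer 3: linear fit drawn on top of the scatter<br />
  (lfit price mpg, lcolor(red)) ///<br />
  , legend(order(2 "Price" 3 "Linear fit" 1 "Distribution of MPG")) ///<br />
    ytitle("Price") ytitle("Density", axis(2))<br />
</syntaxhighlight><br />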
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the included graphs. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the visualizations in the final image are intended to share a common legend. To save processing time when combining graphs, consider creating the underlying graphs with the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option. This skips rendering each graph individually, so only the combined graph is drawn. Rendering the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
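<br />
As an alternative that avoids writing <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files at all, graphs can be kept in memory with <syntaxhighlight lang="stata" inline>name()</syntaxhighlight> and combined by name. The panel names, titles, and output file name below are illustrative:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
// Create each panel in memory without rendering it<br />
scatter price mpg, name(panel1, replace) nodraw title("Price and mileage")<br />
scatter price weight, name(panel2, replace) nodraw title("Price and weight")<br />
<br />
// Combine the named graphs side by side, sharing a common y-axis scale<br />
graph combine panel1 panel2, cols(2) ycommon altshrink<br />
<br />
// Export the combined graph<br />
graph export "combined.png", replace<br />
</syntaxhighlight><br />
<br />
Graphs held in memory are lost when Stata closes. Saving <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files therefore remains useful when the panels are created in separate runs or separate do-files.<br />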
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
In forms such as <syntaxhighlight lang="stata" inline>graph bar</syntaxhighlight> and <syntaxhighlight lang="stata" inline>graph hbar</syntaxhighlight>, the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command creates visualizations of one or more variables in the dataset that have a y-axis and a categorical axis. The main strength of the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command used in this way is that it uses the <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax to rapidly calculate many possible statistics for any number of variables. The <syntaxhighlight lang="stata" inline>over()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>by()</syntaxhighlight> options make it easy to subgroup the results in any desired way.<br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
graph hbar ///<br />
(mean) price (median) price (max) length ///<br />
, asc yvaroptions( label(labsize(vsmall)) ///<br />
relabel(1 "Mean of Price" 2 "Median of Price" 3 "Max of Length") ) ///<br />
over(foreign) by(rep78 , c(1)) ///<br />
ysize(7) blabel(bar,size(vsmall)) <br />
</syntaxhighlight><br />
<br />
And we would obtain:<br />
<br />
[[file:graph-hbar.png|4000px]]<br />
<br />
The main shortcoming of this command is that it offers little control over how the results are displayed, for example when combining several statistics. It cannot combine the <syntaxhighlight lang="stata" inline>(mean)</syntaxhighlight> and <syntaxhighlight lang="stata" inline>(sem)</syntaxhighlight> statistics in different styles to produce a bar graph with confidence intervals. (You might try <syntaxhighlight lang="stata" inline>betterbar</syntaxhighlight>, available from SSC, for that.) Similarly, multiple variables with very different scales may be difficult to display in the same graphic. Numerical variables that have non-numerical interpretations, such as dates or labelled variables, may not be handled as intended without extensive manipulation.<br />
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command also provides a range of subcommands for other graphing functions, such as drawing, saving, and exporting graphs. These are not described here. Apart from these utility functions, most other <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> subcommands should rarely be needed.<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
===Built-in visualization commands===<br />
<br />
===User-written visualization commands===</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7682Stata Coding Practices: Visualization2020-11-10T20:02:37Z<p>Bbdaniels: /* The graph command */</p>
<hr />
<div>Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
There are an enormous number of options available for each specific type of graph in Stata, and we will not cover those here. When drawing a graph, refer to the specific help file for its command to understand the full range of specific options available. These typically include key elements like marker shapes and sizes; coloration of lines, markers, and fill elements; transparency and added text; and so on. All of these elements will allow you to create the exact visual components you want to display and there are a large number of resources on using graphical elements to efficiently convey information to readers. Therefore we do not cover these elements in this section.<br />
<br />
However, some elements are common to all graphs, and it is typically beneficial to standardize these components across all the graphs you create for a single piece of work. One workable approach that covers the main bases is the following code, which creates global macros that can easily be called in all graphs. The specific settings here are not recommendations; they illustrate common graphical elements. In particular, this code:<br />
<br />
* Left-aligns the graph title<br />
* Sets the background colors to white<br />
* Turns off axis lines<br />
* Rotates y-axis labels to horizontal and removes y-axis gridlines<br />
* Left-aligns the x-axis title<br />
* Removes coloration and bordering from the legend<br />
<br />
These settings are implemented as follows: <br />
<br />
<syntaxhighlight lang="stata"><br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
xscale(noline) xtit(,placement(left) justification(left)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
<br />
Two further settings are usually needed when creating graphs for publication: the file type of the exported image and its aspect ratio (width-to-height). The aspect ratio is set using the <syntaxhighlight lang="stata" inline>ysize()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>xsize()</syntaxhighlight> options, which take the height and width of the graph (in inches by default) as their arguments.<br />
<br />
The choice of file type is also important. PNG images tend to be of reasonable quality and are natively viewable on all operating systems, as well as in web browsers when stored in places like GitHub and Zenodo. However, PNG images are raster images and will typically be of insufficient resolution for print media; journals may prefer high-resolution TIFF images or vector formats such as EPS. These may not be natively viewable on your operating system. You should never use <syntaxhighlight lang="stata" inline>graph save</syntaxhighlight> to create <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files unless you intend to combine graphs later. (Similarly, the <syntaxhighlight lang="stata" inline>saving()</syntaxhighlight> option is discouraged for all other uses.)<br />
<br />
One way to implement these settings is with code like the following. Note the file type is explicit in the file path extension for the <syntaxhighlight lang="stata" inline>graph export</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
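// Assumes ${graph_opts} has been defined as above;<br />
// nodraw creates the graph in memory without rendering it<br />
scatter price mpg ///<br />
, nodraw ${graph_opts}<br />
<br />
// Display the graph at the desired size, then export it;<br />
// the .png extension sets the file type, and replace allows re-running<br />
graph display , ysize(7)<br />
graph export "scatter.png" , replace<br />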
</syntaxhighlight><br />
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously, and in doing so they provide one of the highest degrees of cross-system consistency that is possible in creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created while Stata remains open. This is a simple scheme which incorporates many of the universally-applicable options above for all graphs, particularly region coloring and axis marking. As with any third-party scheme, you should read the documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
<br />
One thing that schemes cannot do, apparently, is control the default graphics font. This can be done using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present various elements of a single analysis at the same time. Combining graphs is especially useful when facing constraints on the number of allowable exhibits, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combining graphs: overlaying multiple pieces of information on the same set of axes, or combining multiple visualizations into a single image with multiple panels (either aligned or not, although Stata handles alignment somewhat poorly).<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Stata layers the plots on the same set of axes in the order they are written, so each later plot is drawn on top of the earlier ones. Including a second (possibly invisible) axis allows further possibilities, such as displaying series with very different scales together. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
twoway ///<br />
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the included graphs. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the visualizations in the final image are intended to share a common legend. To save processing time when combining graphs, consider creating the underlying graphs with the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option. This skips rendering each graph individually, so only the combined graph is drawn. Rendering the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
In forms such as <syntaxhighlight lang="stata" inline>graph bar</syntaxhighlight> and <syntaxhighlight lang="stata" inline>graph hbar</syntaxhighlight>, the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command creates visualizations of one or more variables in the dataset that have a y-axis and a categorical axis.<br />
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta, clear<br />
<br />
graph hbar ///<br />
(mean) price (median) price (max) length ///<br />
, asc yvaroptions( label(labsize(vsmall)) ///<br />
relabel(1 "Mean of Price" 2 "Median of Price" 3 "Max of Length") ) ///<br />
over(foreign) by(rep78 , c(1)) ///<br />
ysize(7) blabel(bar,size(vsmall)) <br />
</syntaxhighlight><br />
<br />
And we would obtain:<br />
<br />
[[file:graph-hbar.png|4000px]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command also provides a range of subcommands for other graphing functions, such as drawing, saving, and exporting graphs. These are not described here. Apart from these utility functions, most other <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> subcommands should rarely be needed.<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
===Built-in visualization commands===<br />
<br />
===User-written visualization commands===</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=File:Graph-hbar.png&diff=7681File:Graph-hbar.png2020-11-10T19:59:33Z<p>Bbdaniels: Bbdaniels uploaded a new version of File:Graph-hbar.png</p>
<hr />
<div></div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7680Stata Coding Practices: Visualization2020-11-10T19:56:05Z<p>Bbdaniels: /* The graph command */</p>
<hr />
<div>Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
There are an enormous number of options available for each specific type of graph in Stata, and we will not cover those here. When drawing a graph, refer to the specific help file for its command to understand the full range of specific options available. These typically include key elements like marker shapes and sizes; coloration of lines, markers, and fill elements; transparency and added text; and so on. All of these elements will allow you to create the exact visual components you want to display and there are a large number of resources on using graphical elements to efficiently convey information to readers. Therefore we do not cover these elements in this section.<br />
<br />
However, some elements are common to all graphs, and it is typically beneficial to standardize them across all the graphs you create for a single piece of work. One workable approach is the following code, which stores common settings in global macros that can then be called in any graph. The specific settings here are not recommendations; they illustrate common graphical elements. In particular, this code:<br />
<br />
* Left-aligns the graph title<br />
* Sets the background colors to white<br />
* Turns off axis lines<br />
* Displays y-axis labels horizontally and removes y-axis gridlines<br />
* Left-aligns the x-axis title<br />
* Removes coloration and bordering from the legend<br />
<br />
These settings are implemented as follows: <br />
<br />
<syntaxhighlight lang="stata"><br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
xscale(noline) xtit(,placement(left) justification(left)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
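A minimal usage sketch follows. It assumes the two globals above have already been defined in the same session. The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> version is passed to a scatter plot and the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> version to a bar chart. The second global omits the <syntaxhighlight lang="stata" inline>xscale()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>xtitle()</syntaxhighlight> settings, presumably because bar and box charts use a categorical axis rather than a numeric x-axis.<br />

<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />

// -twoway- plot: picks up the title, region, axis, label, and legend settings<br />
scatter price mpg , ${graph_opts}<br />

// -graph- plot: same styling, minus the x-axis options<br />
graph bar (mean) price , over(foreign) ${graph_opts_1}<br />
</syntaxhighlight><br />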
<br />
Two further settings matter when creating graphs for publication: the file type of the exported image and its aspect ratio (width-to-height). The aspect ratio is set using the <syntaxhighlight lang="stata" inline>ysize()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>xsize()</syntaxhighlight> options. These take the height and width of the graph, in inches, as numeric arguments.<br />
<br />
The choice of file type is also important. PNG images are of reasonable quality and can be viewed natively on all operating systems, as well as in web browsers when stored in places like GitHub and Zenodo. However, PNG images are typically of insufficient quality for print media. Journals may prefer "lossless" TIFF or EPS images, which may not be natively viewable on your operating system. Do not use <syntaxhighlight lang="stata" inline>graph save</syntaxhighlight> to create <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files unless you intend to combine graphs later. (Similarly, avoid the <syntaxhighlight lang="stata" inline>saving()</syntaxhighlight> option except for that purpose.)<br />
<br />
One way to implement these settings is with code like the following. Note that <syntaxhighlight lang="stata" inline>graph export</syntaxhighlight> determines the file type from the extension in the file path:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
scatter price mpg ///<br />
, nodraw ${graph_opts}<br />
<br />
graph draw , ysize(7)<br />
// -replace- lets the do-file overwrite the image when it is rerun<br />
graph export "scatter.png" , replace<br />
</syntaxhighlight><br />
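The same pattern extends to other file types, since <syntaxhighlight lang="stata" inline>graph export</syntaxhighlight> picks the format from the extension. The sketch below writes PNG, TIFF, and EPS copies of one named graph. The graph name, file names, and pixel width are illustrative choices only.<br />

<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />

// Create the graph once and name it so it can be saved by name later<br />
scatter price mpg , name(price_mpg, replace)<br />

// PNG: easy to view anywhere; width() sets the image width in pixels<br />
graph export "scatter.png" , width(2000) replace<br />

// TIFF and EPS: formats that print journals often request<br />
graph export "scatter.tif" , width(2000) replace<br />
graph export "scatter.eps" , replace<br />

// Save a .gph file only if the graph will be combined later<br />
graph save price_mpg "scatter.gph" , replace<br />
</syntaxhighlight><br />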
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously. In doing so, they provide about the highest degree of cross-system consistency possible when creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
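Because <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> should fail if the scheme file cannot be found, a run file might check for the file first. As above, this sketch assumes that <syntaxhighlight lang="stata" inline>${directory}</syntaxhighlight> holds the repository path. Falling back to <code>s2color</code> is just one possible choice.<br />

<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />

// findfile searches the ado-path (which includes PERSONAL) and errors if the file is absent<br />
capture findfile scheme-uncluttered.scheme<br />
if _rc {<br />
    display as error "scheme-uncluttered.scheme not found; using s2color instead"<br />
    set scheme s2color<br />
}<br />
else {<br />
    set scheme uncluttered<br />
}<br />

// Confirm which scheme is now active<br />
display "Current scheme: `c(scheme)'"<br />
</syntaxhighlight><br />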
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created for the rest of the Stata session. This simple scheme incorporates many of the universally applicable options above, particularly region coloring and axis marking. As with any third-party scheme, you should read the documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
<br />
Schemes do not appear to be able to control the default graphics font. The font can instead be set using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
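A slightly fuller sketch of the font settings follows. The window setting applies to graphs drawn in the Graph window. EPS and PostScript exports are configured separately. Helvetica is only an example and needs to be available on the system.<br />

<syntaxhighlight lang="stata"><br />
// Font used when graphs are drawn in the Graph window<br />
graph set window fontface "Helvetica"<br />

// EPS and PostScript exports have their own font settings<br />
graph set eps fontface "Helvetica"<br />
graph set ps fontface "Helvetica"<br />

// With no arguments, graph set lists the current settings<br />
graph set<br />
</syntaxhighlight><br />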
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present several aspects of a single analysis at once. It is especially useful when the number of allowable exhibits is limited, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combining graphs. The first overlays multiple pieces of information on the same set of axes. The second combines multiple visualizations into a single image with multiple panels, aligned or not, although Stata handles alignment somewhat poorly.<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Stata draws the plots in the order they are written, layering each one on top of the previous ones on the same set of axes. Adding a second (possibly invisible) axis allows further possibilities. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command. The variables and the <syntaxhighlight lang="stata" inline>${hist_opts}</syntaxhighlight> and <syntaxhighlight lang="stata" inline>${pct}</syntaxhighlight> globals come from the underlying project and are not defined in this article.<br />
<br />
<syntaxhighlight lang="stata"><br />
twoway ///<br />
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
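The example above relies on project data, so here is a minimal layered sketch on the bundled <syntaxhighlight lang="stata" inline>auto</syntaxhighlight> dataset instead. A histogram on a second y-axis is drawn first, at the back. The scatter, linear fit, and smoother are then drawn on top of it in the order written. The variable choices are arbitrary.<br />

<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />

twoway ///<br />
  /// Back layer: frequency histogram of mileage on a second y-axis<br />
  (histogram mpg , freq yaxis(2) fc(gs14) lc(none)) ///<br />
  /// Middle layer: raw data points<br />
  (scatter price mpg , mc(gs8)) ///<br />
  /// Top layers: linear fit and local smoother on the same axes<br />
  (lfit price mpg , lc(black)) ///<br />
  (lowess price mpg , lc(red) lp(dash)) ///<br />
  /// Overall options: plots 3 and 4 are the fit lines<br />
  , legend(order(3 "Linear fit" 4 "Lowess") pos(12)) ///<br />
  ytitle("Price") ytitle("Cars (histogram)", axis(2))<br />
</syntaxhighlight><br />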
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the graphs included. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the visualizations in the final image are intended to share a common legend. To save processing time when combining graphs, consider creating the underlying graphs with the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option. This defers rendering until the combined graph is drawn. Rendering graphs in the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
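Here is a self-contained sketch of that workflow. It uses named graphs held in memory rather than saved <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files. Each panel is created with <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight>, so only the combined image is rendered and exported. The graph names, titles, and layout options are illustrative.<br />

<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />

// Build each panel in memory without rendering it<br />
scatter price mpg , name(panel_mpg, replace) nodraw title("A. Mileage")<br />
scatter price weight , name(panel_weight, replace) nodraw title("B. Weight")<br />

// Render only the combined graph: two columns sharing a common y-axis scale<br />
graph combine panel_mpg panel_weight , cols(2) ycommon name(combined, replace)<br />
graph export "combined.png" , replace<br />
</syntaxhighlight><br />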
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
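The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates visualizations of one or more variables in the dataset. Examples include <syntaxhighlight lang="stata" inline>graph bar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>graph hbar</syntaxhighlight>, <syntaxhighlight lang="stata" inline>graph box</syntaxhighlight>, and <syntaxhighlight lang="stata" inline>graph dot</syntaxhighlight>. Each of these graphs has a y-axis and a categorical axis.<br />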
<br />
For example, we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />

graph hbar ///<br />
(mean) price (median) price (max) length ///<br />
, asc yvaroptions( label(labsize(vsmall)) ///<br />
relabel(1 "Mean of Price" 2 "Median of Price" 3 "Max of Length") ) ///<br />
over(foreign) by(rep78 , c(1)) ///<br />
ysize(7) blabel(bar,size(vsmall)) <br />
</syntaxhighlight><br />
<br />
And we would obtain:<br />
<br />
[[file:graph-hbar.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command also provides subcommands for other graphing functions, such as drawing, saving, and exporting graphs. These are not described here. Apart from them, most other <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> subcommands should rarely be needed.<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
===Built-in visualization commands===<br />
<br />
===User-written visualization commands===</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=File:Graph-hbar.png&diff=7679File:Graph-hbar.png2020-11-10T19:55:33Z<p>Bbdaniels: Bbdaniels uploaded a new version of File:Graph-hbar.png</p>
<hr />
<div></div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=File:Graph-hbar.png&diff=7678File:Graph-hbar.png2020-11-10T19:54:26Z<p>Bbdaniels: </p>
<hr />
<div></div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7677Stata Coding Practices: Visualization2020-11-10T19:29:42Z<p>Bbdaniels: </p>
<hr />
<div></div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7675Stata Coding Practices: Visualization2020-11-09T22:20:42Z<p>Bbdaniels: /* Graphics options */</p>
<hr />
<div>(This page is under construction.)<br />
<br />
Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
There are an enormous number of options available for each specific type of graph in Stata, and we will not cover those here. When drawing a graph, refer to the specific help file for its command to understand the full range of specific options available. These typically include key elements like marker shapes and sizes; coloration of lines, markers, and fill elements; transparency and added text; and so on. All of these elements will allow you to create the exact visual components you want to display and there are a large number of resources on using graphical elements to efficiently convey information to readers. Therefore we do not cover these elements in this section.<br />
<br />
However, some elements are common to all graphs and it is typically beneficial to standardize these components across all the graphs you create for a single piece of work. One workable setting that covers the main bases is the following code, which creates global macros called easily into all graphs. The specific settings here are not recommendations, but are for illustration purposes of common graphical elements. In particular, this code:<br />
<br />
* Left-aligns the graph title<br />
* Sets the background colors to white<br />
* Turns off axis lines<br />
* Rotates y-labels 90 degrees<br />
* Left-aligns the x-axis title<br />
* Removes coloration and bordering from the legend<br />
<br />
These settings are implemented as follows: <br />
<br />
<syntaxhighlight lang="stata"><br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
xscale(noline) xtit(,placement(left) justification(left)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
<br />
Two further primary settings will be desired when creating graphs for publication purposes: the file type of the exported image file and the aspect ratio (width-to-height) of the file. The aspect ratio is set using the <syntaxhighlight lang="stata" inline>ysize()</syntaxhighlight> or <syntaxhighlight lang="stata" inline>xsize()</syntaxhighlight> options, with integers as the arguments.<br />
<br />
The choice of file type is also important. PNG images tend to be of reasonable quality and natively viewable on all operating systems as well as on web browsers when stored in places like GitHub and Zenodo. However, PNG images will typically be insufficient quality for print media; journals may prefer "lossless" TIFF or EPS images. These may not be natively viewable in your operating system. You should never use <syntaxhighlight lang="stata" inline>graph save</syntaxhighlight> to create <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files unless you intend to combine graphs later. (Similarly, the <syntaxhighlight lang="stata" inline>saving()</syntaxhighlight> option is discouraged in all other uses.)<br />
<br />
One way to implement these settings is with code like the following. Note the file type is explicit in the file path extension for the <syntaxhighlight lang="stata" inline>graph export</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
scatter price mpg ///<br />
, nodraw ${graph_opts}<br />
<br />
graph draw , ysize(7)<br />
graph export "scatter.png"<br />
</syntaxhighlight><br />
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously, and in doing so they provide one of the highest degrees of cross-system consistency that is possible in creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created while Stata remains open. This is a simple scheme which incorporates many of the universally-applicable options above for all graphs, particularly region coloring and axis marking. As with any third-party scheme, you should read the documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
<br />
One thing that schemes cannot do, apparently, is control the default graphics font. This can be done using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present various elects of a single analysis at the same time. Combining graphs is especially useful when facing constraints on the number of allowable exhibits, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combing graphs: overlaying multiple pieces of information on the same set of axes, or combining multiple visualizations into a single image with multiple panels (either aligned or not, although Stata handles alignment somewhat poorly).<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Starting from the first axis and proceeding in the order in which the commands are written, Stata layers the plots on top of each other on the same set of axes, so later plots are drawn over earlier ones. Adding a second (possibly invisible) axis allows further possibilities. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
twoway ///<br />
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the graphs included. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the visualizations included in the final image are intended to share a common legend.<br />
<br />
To save processing time when combining graphs, consider creating the underlying graphs with the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option, which defers rendering until the combined graph is drawn. Rendering the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
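<br />
Here is a minimal sketch of this workflow. It uses the <syntaxhighlight lang="stata" inline>auto</syntaxhighlight> dataset that ships with Stata, and the panel names and output file name are hypothetical. Each panel is built in memory with <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight>, and only the combined image is rendered:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
* Build each panel in memory without rendering it<br />
scatter price mpg , nodraw name(panel1, replace) title("Price and mileage")<br />
scatter price weight , nodraw name(panel2, replace) title("Price and weight")<br />
<br />
* Render once, as a single combined image with a shared y-axis scale<br />
graph combine panel1 panel2 , cols(2) ycommon<br />
graph export "combined.png" , replace<br />
</syntaxhighlight><br />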
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
===Built-in visualization commands===<br />
<br />
===User-written visualization commands===</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7674Stata Coding Practices: Visualization2020-11-09T22:18:09Z<p>Bbdaniels: /* Graphics options */</p>
<hr />
<div>(This page is under construction.)<br />
<br />
Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
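<br />
The sketch below contrasts the first three types by drawing related information with each. It uses the <syntaxhighlight lang="stata" inline>auto</syntaxhighlight> dataset that ships with Stata, and the graph names are arbitrary. User-written commands are left out because each has its own syntax and must be installed separately.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
* 1. -graph- suite: the command computes the summary statistic itself<br />
graph bar (mean) price , over(foreign) name(g_bar, replace)<br />
<br />
* 2. -twoway- suite: several plots layered on the same axes<br />
twoway (scatter price mpg) (lfit price mpg) , name(g_twoway, replace)<br />
<br />
* 3. Built-in command used on its own, then the equivalent -twoway- layer<br />
lowess price mpg , name(g_lowess, replace)<br />
twoway (scatter price mpg) (lowess price mpg) , name(g_lowess_tw, replace)<br />
</syntaxhighlight><br />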
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
Stata offers an enormous number of options for each specific type of graph, and we will not cover those here. When drawing a graph, refer to the help file for its command to understand the full range of options available. These typically include key elements such as marker shapes and sizes; coloration of lines, markers, and fill elements; transparency; and added text. These elements allow you to create exactly the visual components you want to display, and many other resources cover how to use them to convey information to readers efficiently. Therefore we do not cover these elements in this section.<br />
<br />
However, some elements are common to all graphs, and it is typically beneficial to standardize these components across all the graphs you create for a single piece of work. One workable approach that covers the main bases is the following code, which creates global macros that can easily be included in any graph command. The specific settings here are not recommendations; they are meant to illustrate common graphical elements. In particular, this code:<br />
<br />
* Left-aligns the graph title<br />
* Sets the background colors to white<br />
* Turns off axis lines<br />
* Rotates y-axis labels to horizontal and removes horizontal gridlines<br />
* Left-aligns the x-axis title<br />
* Removes coloration and bordering from the legend<br />
<br />
These settings are implemented as follows: <br />
<br />
<syntaxhighlight lang="stata"><br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
xscale(noline) xtit(,placement(left) justification(left)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
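<br />
Here is a usage sketch. It assumes the two globals above have been defined and uses the <syntaxhighlight lang="stata" inline>auto</syntaxhighlight> dataset that ships with Stata. The shared options are expanded first, and graph-specific options such as title text follow them. Stata generally merges repeated options like <syntaxhighlight lang="stata" inline>title()</syntaxhighlight>, so the text supplied here should combine with the justification and position settings stored in the global.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
* -twoway- graph: shared options first, then graph-specific text<br />
twoway (scatter price mpg) , ${graph_opts} ///<br />
  title("Price and mileage") xtitle("Mileage (mpg)")<br />
<br />
* -graph- family: use the variant without x-axis options<br />
graph bar (mean) price , over(foreign) ${graph_opts_1} ///<br />
  title("Mean price by origin")<br />
</syntaxhighlight><br />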
<br />
Two further settings matter when creating graphs for publication: the file type of the exported image and its aspect ratio (width-to-height). The aspect ratio is set using the <syntaxhighlight lang="stata" inline>ysize()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>xsize()</syntaxhighlight> options, which set the height and width of the graph in inches.<br />
<br />
The choice of file type is also important. PNG images tend to be of reasonable quality and are natively viewable on all operating systems, as well as in web browsers when stored in places like GitHub and Zenodo. However, PNG images will often be of insufficient resolution for print media; journals may prefer high-resolution TIFF or vector-based EPS images, which may not be natively viewable in your operating system. You should never use <syntaxhighlight lang="stata" inline>graph save</syntaxhighlight> to create <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files unless you intend to combine graphs later. (Similarly, the <syntaxhighlight lang="stata" inline>saving()</syntaxhighlight> option is discouraged for any other use.)<br />
<br />
One way to implement these settings is with code like the following. Note that the file type is determined by the extension of the file path given to the <syntaxhighlight lang="stata" inline>graph export</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
scatter price mpg ///<br />
, nodraw ${graph_opts}<br />
<br />
graph draw , ysize(7)<br />
graph export "scatter.png"<br />
</syntaxhighlight><br />
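<br />
The same pattern extends to exporting several file types at once. The sketch below again uses the <syntaxhighlight lang="stata" inline>auto</syntaxhighlight> dataset. It sets both dimensions explicitly and exports one graph as PNG, TIFF, and EPS. The <syntaxhighlight lang="stata" inline>width()</syntaxhighlight> value (in pixels, for PNG and TIFF only) and the file names are illustrative assumptions, not journal requirements.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
scatter price mpg , nodraw<br />
<br />
* Redisplay the graph in memory at 5 x 4 inches<br />
graph display , xsize(5) ysize(4)<br />
<br />
* Raster formats: width() sets the image width in pixels<br />
graph export "scatter.png" , replace width(2400)<br />
graph export "scatter.tif" , replace width(2400)<br />
<br />
* Vector format: no width() needed, since EPS scales without loss<br />
graph export "scatter.eps" , replace<br />
</syntaxhighlight><br />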
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously, and in doing so they provide one of the highest degrees of cross-system consistency that is possible in creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
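<br />
For comparison, here is a minimal sketch that uses only schemes shipped with Stata. The <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command changes the default for the rest of the session. The <syntaxhighlight lang="stata" inline>scheme()</syntaxhighlight> option applies a scheme to a single graph without changing that default. The <syntaxhighlight lang="stata" inline>c(scheme)</syntaxhighlight> value reports the scheme currently in effect.<br />
<br />
<syntaxhighlight lang="stata"><br />
* Session-wide default<br />
set scheme s2mono<br />
display "`c(scheme)'"<br />
<br />
* One graph only; the session default remains s2mono<br />
sysuse auto.dta , clear<br />
scatter price mpg , scheme(s1color)<br />
</syntaxhighlight><br />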
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created while Stata remains open. This is a simple scheme that incorporates many of the universally applicable options above, particularly region coloring and axis marking. As with any third-party scheme, you should read its documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
<br />
Schemes do not appear to be able to control the default graphics font. The font can instead be set using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present several aspects of a single analysis at the same time. It is especially useful when the number of allowable exhibits is limited, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combining graphs: overlaying multiple pieces of information on the same set of axes, or arranging multiple visualizations as separate panels of a single image (aligned or not, although Stata handles alignment somewhat poorly).<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Starting from the first axis and proceeding in the order in which the commands are written, Stata layers the plots on top of each other on the same set of axes, so later plots are drawn over earlier ones. Adding a second (possibly invisible) axis allows further possibilities. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
twoway ///<br />
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
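<br />
The command above relies on project-specific variables and globals. Below is a smaller, self-contained sketch of the same layering idea. It uses the <syntaxhighlight lang="stata" inline>auto</syntaxhighlight> dataset that ships with Stata and draws a frequency histogram on a second y-axis behind a scatter plot. The output file name is hypothetical.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
twoway ///<br />
  /// First layer: histogram on the second (right-hand) axis, drawn at the back<br />
  (histogram mpg , freq yaxis(2) fcolor(gs14) lcolor(none)) ///<br />
  /// Second layer: scatter on the first (left-hand) axis, drawn on top<br />
  (scatter price mpg , mcolor(black)) ///<br />
  /// Overall options<br />
  , ytitle("Price") ytitle("Frequency", axis(2)) legend(off)<br />
<br />
graph export "overlay.png" , replace<br />
</syntaxhighlight><br />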
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the graphs included. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the visualizations included in the final image are intended to share a common legend.<br />
<br />
To save processing time when combining graphs, consider creating the underlying graphs with the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option, which defers rendering until the combined graph is drawn. Rendering the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
===Built-in visualization commands===<br />
<br />
===User-written visualization commands===</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7673Stata Coding Practices: Visualization2020-11-09T22:16:56Z<p>Bbdaniels: /* Graphics options */</p>
<hr />
<div>(This page is under construction.)<br />
<br />
Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
Stata offers an enormous number of options for each specific type of graph, and we will not cover those here. When drawing a graph, refer to the help file for its command to understand the full range of options available. These typically include key elements such as marker shapes and sizes; coloration of lines, markers, and fill elements; transparency; and added text. These elements allow you to create exactly the visual components you want to display, and many other resources cover how to use them to convey information to readers efficiently. Therefore we do not cover these elements in this section.<br />
<br />
However, some elements are common to all graphs, and it is typically beneficial to standardize these components across all the graphs you create for a single piece of work. One workable approach that covers the main bases is the following code, which creates global macros that can easily be included in any graph command. The specific settings here are not recommendations; they are meant to illustrate common graphical elements. In particular, this code:<br />
<br />
* Left-aligns the graph title<br />
* Sets the background colors to white<br />
* Turns off axis lines<br />
* Rotates y-axis labels to horizontal and removes horizontal gridlines<br />
* Left-aligns the x-axis title<br />
* Removes coloration and bordering from the legend<br />
<br />
These settings are implemented as follows: <br />
<br />
<syntaxhighlight lang="stata"><br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
xscale(noline) xtit(,placement(left) justification(left)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
<br />
Two further settings matter when creating graphs for publication: the file type of the exported image and its aspect ratio (width-to-height). The aspect ratio is set using the <syntaxhighlight lang="stata" inline>ysize()</syntaxhighlight> and <syntaxhighlight lang="stata" inline>xsize()</syntaxhighlight> options, which set the height and width of the graph in inches.<br />
<br />
The choice of file type is also important. PNG images tend to be of reasonable quality and are natively viewable on all operating systems, as well as in web browsers when stored in places like GitHub and Zenodo. However, PNG images will often be of insufficient resolution for print media; journals may prefer high-resolution TIFF or vector-based EPS images, which may not be natively viewable in your operating system. You should never use <syntaxhighlight lang="stata" inline>graph save</syntaxhighlight> to create <syntaxhighlight lang="stata" inline>.gph</syntaxhighlight> files unless you intend to combine graphs later.<br />
<br />
One way to implement these settings is with code like the following. Note that the file type is determined by the extension of the file path given to the <syntaxhighlight lang="stata" inline>graph export</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
scatter price mpg ///<br />
, nodraw ${graph_opts}<br />
<br />
graph draw , ysize(7)<br />
graph export "scatter.png"<br />
</syntaxhighlight><br />
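<br />
The one case where <syntaxhighlight lang="stata" inline>graph save</syntaxhighlight> is warranted is combining graphs later. That case might look like the following sketch, which uses the <syntaxhighlight lang="stata" inline>auto</syntaxhighlight> dataset that ships with Stata; the graph and file names are hypothetical.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
* Build each panel in memory, then save it as a .gph file for later combination<br />
scatter price mpg , nodraw name(p1, replace)<br />
graph save p1 "panel-1.gph" , replace<br />
scatter price weight , nodraw name(p2, replace)<br />
graph save p2 "panel-2.gph" , replace<br />
<br />
* Later (possibly in another do-file): combine the saved panels and export<br />
graph combine "panel-1.gph" "panel-2.gph" , cols(2)<br />
graph export "panels.png" , replace<br />
</syntaxhighlight><br />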
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously, and in doing so they provide one of the highest degrees of cross-system consistency that is possible in creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created while Stata remains open. This is a simple scheme that incorporates many of the universally applicable options above, particularly region coloring and axis marking. As with any third-party scheme, you should read its documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
<br />
Schemes do not appear to be able to control the default graphics font. The font can instead be set using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present several aspects of a single analysis at the same time. It is especially useful when the number of allowable exhibits is limited, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combining graphs: overlaying multiple pieces of information on the same set of axes, or arranging multiple visualizations as separate panels of a single image (aligned or not, although Stata handles alignment somewhat poorly).<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Starting from the first axis and proceeding in the order in which the commands are written, Stata layers the plots on top of each other on the same set of axes, so later plots are drawn over earlier ones. Adding a second (possibly invisible) axis allows further possibilities. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
twoway ///<br />
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the graphs included. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the visualizations included in the final image are intended to share a common legend.<br />
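<br />
The sketch below shows some of those layout options. It builds graphs in memory from the <syntaxhighlight lang="stata" inline>auto</syntaxhighlight> dataset that ships with Stata, and the graph names are arbitrary. In it, <syntaxhighlight lang="stata" inline>cols(1)</syntaxhighlight> stacks the panels vertically. The <syntaxhighlight lang="stata" inline>xcommon</syntaxhighlight> option gives them a common x-axis scale, <syntaxhighlight lang="stata" inline>imargin()</syntaxhighlight> sets the margin around each panel, and <syntaxhighlight lang="stata" inline>title()</syntaxhighlight> adds an overall title.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto.dta , clear<br />
<br />
scatter price mpg , nodraw name(top, replace)<br />
scatter weight mpg , nodraw name(bottom, replace)<br />
<br />
* Stack vertically, align the x-axis scales, and add an overall title<br />
graph combine top bottom , cols(1) xcommon imargin(small) ///<br />
  title("Price and weight by mileage")<br />
</syntaxhighlight><br />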
<br />
To save processing time when combining graphs, consider creating the underlying graphs with the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option, which defers rendering until the combined graph is drawn. Rendering the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
===Built-in visualization commands===<br />
<br />
===User-written visualization commands===</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7672Stata Coding Practices: Visualization2020-11-09T22:08:06Z<p>Bbdaniels: /* Graphics options */</p>
<hr />
<div>(This page is under construction.)<br />
<br />
Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
Stata offers an enormous number of options for each specific type of graph, and we will not cover those here. When drawing a graph, refer to the help file for its command to understand the full range of options available. These typically include key elements such as marker shapes and sizes; coloration of lines, markers, and fill elements; transparency; and added text. These elements allow you to create exactly the visual components you want to display, and many other resources cover how to use them to convey information to readers efficiently. Therefore we do not cover these elements in this section.<br />
<br />
However, some elements are common to all graphs, and it is typically beneficial to standardize these components across all the graphs you create for a single piece of work. One workable approach that covers the main bases is the following code, which stores these settings in global macros that can easily be added to any graph. The specific settings here are not recommendations; they are for illustration of common graphical elements. In particular, this code:<br />
<br />
* Left-aligns the graph title<br />
* Sets the background colors to white<br />
* Turns off axis lines<br />
* Displays y-axis labels horizontally (rotating them 90 degrees from the default) and removes y-axis grid lines<br />
* Left-aligns the x-axis title<br />
* Removes coloration and bordering from the legend<br />
<br />
These settings are implemented as follows: <br />
<br />
<syntaxhighlight lang="stata"><br />
<br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
xscale(noline) xtit(,placement(left) justification(left)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
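<br />
Once the globals are defined in the current session, they can be appended to the options of any graph. The following is a minimal usage sketch, again using the built-in <code>auto</code> dataset and arbitrary graph names. The second global leaves out the x-axis options, presumably because <syntaxhighlight lang="stata" inline>graph bar</syntaxhighlight> uses a categorical axis rather than a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> x-axis.<br />
<br />
<syntaxhighlight lang="stata"><br />
// Assumes graph_opts and graph_opts_1 were defined earlier in this session<br />
sysuse auto, clear<br />
<br />
// -twoway- graph using the shared settings<br />
twoway (scatter price mpg) (lfit price mpg), ${graph_opts} name(g_styled, replace)<br />
<br />
// -graph- graph using the settings without x-axis options<br />
graph bar (mean) price, over(foreign) ${graph_opts_1} name(g_styled_bar, replace)<br />
</syntaxhighlight><br />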
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously, and in doing so they provide one of the highest degrees of cross-system consistency that is possible in creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created while Stata remains open. This is a simple scheme which incorporates many of the universally applicable options above, particularly region coloring and axis marking. As with any third-party scheme, you should read the documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
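<br />
A run file can also check that the scheme file is reachable before applying it. The following sketch assumes the same <code>${directory}</code> global used above. It relies on <syntaxhighlight lang="stata" inline>findfile</syntaxhighlight>, which searches the ado-path (including PERSONAL). If the file is absent, the sketch keeps the current scheme, which Stata reports in <code>c(scheme)</code>. It also shows the <code>scheme()</code> option, which applies a scheme to a single graph only.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
<br />
// findfile searches the ado-path, which includes the PERSONAL directory<br />
capture findfile scheme-uncluttered.scheme<br />
if _rc == 0 {<br />
    set scheme uncluttered<br />
}<br />
else {<br />
    display as error "scheme-uncluttered.scheme not found; keeping scheme `c(scheme)'"<br />
}<br />
<br />
// A scheme can also be applied to one graph without changing the session default<br />
sysuse auto, clear<br />
scatter price mpg, scheme(s2mono) name(g_mono, replace)<br />
</syntaxhighlight><br />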
<br />
Schemes do not appear to be able to control the default graphics font. The font can be set instead using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
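<br />
The window setting may not carry over to every export format. <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight> also takes an export type such as <code>ps</code> or <code>eps</code>, so a run file might set the font for each format it uses. This is a sketch only; which formats you need is an assumption to check against <syntaxhighlight lang="stata" inline>help graph set</syntaxhighlight>.<br />
<br />
<syntaxhighlight lang="stata"><br />
// Font for graphs drawn in the Graph window<br />
graph set window fontface "Helvetica"<br />
<br />
// Fonts for PostScript and EPS exports are set separately<br />
graph set ps fontface "Helvetica"<br />
graph set eps fontface "Helvetica"<br />
<br />
// With no further arguments, -graph set- displays the current settings<br />
graph set<br />
</syntaxhighlight><br />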
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present various aspects of a single analysis at the same time. Combining graphs is especially useful when facing constraints on the number of allowable exhibits, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combining graphs: overlaying multiple pieces of information on the same set of axes, or combining multiple visualizations into a single image with multiple panels (either aligned or not, although Stata handles alignment somewhat poorly).<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Stata draws the plots in the order in which they are written, layering each one on top of the previous ones on the same set of axes. Including a second (possibly invisible) axis allows further possibilities, such as showing series with very different scales in the same graph. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
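// Assumes the project data (date, voucher_use, mtb, rifres) and the globals hist_opts and pct are defined elsewhere (not shown)<br />
twoway ///<br />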
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
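<br />
The command above depends on the project's own variables and globals. The following is a minimal, self-contained sketch of the same layering idea. It uses the built-in <code>auto</code> dataset as a stand-in, and the variable choices are purely illustrative. A histogram sits on a second y-axis behind two local polynomial lines. Because it is written first, it is drawn at the back.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto, clear<br />
<br />
twoway ///<br />
    /// Histogram on the second y-axis, written first so it is drawn at the back<br />
    (histogram mpg, freq yaxis(2) fcolor(gs14) lcolor(gs14)) ///<br />
    /// Smoothed price lines by origin, drawn on top of the histogram<br />
    (lpoly price mpg if foreign == 0, lcolor(black)) ///<br />
    (lpoly price mpg if foreign == 1, lcolor(red)) ///<br />
    /// Overall options: legend order refers to plot numbers 1 to 3<br />
    , legend(on order(2 "Domestic" 3 "Foreign" 1 "Cars (count)")) ///<br />
    ytitle("Price") ytitle("Number of cars", axis(2)) ///<br />
    name(g_layer, replace)<br />
</syntaxhighlight><br />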
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
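<br />
The <code>.gph</code> files combined above have to be created first. One way is the <code>saving()</code> option, which writes each graph to disk as it is drawn. The sketch below uses hypothetical file names in the working directory and the built-in <code>auto</code> data, and each panel carries its own title.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto, clear<br />
<br />
// Save each panel, with its own title, to a .gph file (hypothetical file names)<br />
scatter price mpg, title("Panel A") saving("panel-a.gph", replace)<br />
scatter price weight, title("Panel B") saving("panel-b.gph", replace)<br />
<br />
// Combine the saved files into a single image<br />
graph combine "panel-a.gph" "panel-b.gph", altshrink name(g_files, replace)<br />
</syntaxhighlight><br />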
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the graphs included. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the visualizations included in the final image are intended to share a common legend.<br />
<br />
To save processing time when combining graphs, consider creating the underlying graphs with the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option, which defers rendering until the combined graph is drawn. Rendering graphs in the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
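<br />
The following sketch shows the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> approach. It keeps the panels in memory under arbitrary names rather than on disk, again using the built-in <code>auto</code> data for illustration. Only the final <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> call renders to the Graph window.<br />
<br />
<syntaxhighlight lang="stata"><br />
sysuse auto, clear<br />
<br />
// Build the panels in memory without drawing them<br />
scatter price mpg, nodraw name(panel_a, replace)<br />
scatter price weight, nodraw name(panel_b, replace)<br />
<br />
// Only the combined graph is rendered<br />
graph combine panel_a panel_b, rows(1) name(g_combined, replace)<br />
</syntaxhighlight><br />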
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
===Built-in visualization commands===<br />
<br />
===User-written visualization commands===</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7671Stata Coding Practices: Visualization2020-11-09T22:05:03Z<p>Bbdaniels: /* Graphics options */</p>
<hr />
<div>(This page is under construction.)<br />
<br />
Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
There is an enormous number of options available for each specific type of graph in Stata, and we will not cover them here. When drawing a graph, refer to the help file for its command to understand the full range of options available. These typically include key elements such as marker shapes and sizes; the coloration of lines, markers, and fill elements; transparency; added text; and so on. These elements allow you to create exactly the visual components you want to display, and a large number of resources exist on using graphical elements to convey information to readers efficiently.<br />
<br />
However, some elements are common to all graphs, and it is typically beneficial to standardize these components across all the graphs you create for a single piece of work. One workable approach that covers the main bases is the following code, which stores these settings in global macros that can easily be added to any graph. The specific settings here are not recommendations; they are for illustration of common graphical elements:<br />
<br />
<syntaxhighlight lang="stata"><br />
<br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
xscale(noline) xtit(,placement(left) justification(left)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
yscale(noline) ylab(,angle(0) nogrid) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously, and in doing so they provide one of the highest degrees of cross-system consistency that is possible in creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created while Stata remains open. This is a simple scheme which incorporates many of the universally applicable options above, particularly region coloring and axis marking. As with any third-party scheme, you should read the documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
<br />
Schemes do not appear to be able to control the default graphics font. The font can be set instead using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present various aspects of a single analysis at the same time. Combining graphs is especially useful when facing constraints on the number of allowable exhibits, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combining graphs: overlaying multiple pieces of information on the same set of axes, or combining multiple visualizations into a single image with multiple panels (either aligned or not, although Stata handles alignment somewhat poorly).<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Stata draws the plots in the order in which they are written, layering each one on top of the previous ones on the same set of axes. Including a second (possibly invisible) axis allows further possibilities, such as showing series with very different scales in the same graph. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
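// Assumes the project data (date, voucher_use, mtb, rifres) and the globals hist_opts and pct are defined elsewhere (not shown)<br />
twoway ///<br />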
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the graphs included. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the visualizations included in the final image are intended to share a common legend.<br />
<br />
To save processing time when combining graphs, consider creating the underlying graphs with the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option, which defers rendering until the combined graph is drawn. Rendering graphs in the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
===Built-in visualization commands===<br />
<br />
===User-written visualization commands===</div>Bbdanielshttps://dimewiki.worldbank.org/index.php?title=Stata_Coding_Practices:_Visualization&diff=7670Stata Coding Practices: Visualization2020-11-09T22:02:19Z<p>Bbdaniels: /* Graphics options */</p>
<hr />
<div>(This page is under construction.)<br />
<br />
Modern Stata versions have extremely powerful graphics capabilities which allow the rapid creation of publication-quality graphics from almost any kind of tabular data. Although the default graphical commands and settings leave much to be desired, the customizability and interoperability of Stata's visualization tools mean that almost any imaginable output can be rendered using Stata's built-in graphics engine.<br />
<br />
==Read First==<br />
<br />
Stata graphics are typically created using one of four command types. Each has specific use cases, strengths, and weaknesses, and it is important to be familiar with the abilities and limitations of each when considering which to use to create a particular visualization. All four methods (except some user-written commands) use the same basic styling syntax discussed in this article.<br />
<br />
* The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command suite creates pre-packaged visualizations, typically based on Stata's native <syntaxhighlight lang="stata" inline>collapse</syntaxhighlight> syntax and statistics.<br />
* The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> suite, which is the most commonly used tool, allows a flexible and open-ended approach to visualizing any amount of information in an abstract set of axes.<br />
* Built-in graphical commands (such as <syntaxhighlight lang="stata" inline>lowess</syntaxhighlight>) offer pre-packaged visualizations that do not follow the <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> style. These commands are typically better used within a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment and may behave differently when used independently.<br />
* User-written commands (such as <syntaxhighlight lang="stata" inline>iegraph</syntaxhighlight> or <syntaxhighlight lang="stata" inline>spmap</syntaxhighlight>) create custom visualizations, but typically have unique purpose-built syntaxes and cannot be integrated in a <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> environment.<br />
<br />
==General Graphics Tools==<br />
<br />
===Graphics options===<br />
<br />
There is an enormous number of options available for each specific type of graph in Stata, and we will not cover them here. When drawing a graph, refer to the help file for its command to understand the full range of options available. These typically include key elements such as marker shapes and sizes; the coloration of lines, markers, and fill elements; transparency; added text; and so on. These elements allow you to create exactly the visual components you want to display, and a large number of resources exist on using graphical elements to convey information to readers efficiently.<br />
<br />
However, some elements are common to all graphs, and it is typically beneficial to standardize these components across all the graphs you create for a single piece of work. One workable approach that covers the main bases is the following code, which stores these settings in global macros that can easily be added to any graph:<br />
<br />
<syntaxhighlight lang="stata"><br />
<br />
// For -twoway- graphs<br />
global graph_opts ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ylab(,angle(0) nogrid) ///<br />
xtit(,placement(left) justification(left)) ///<br />
yscale(noline) xscale(noline) ///<br />
legend(region(lc(none) fc(none)))<br />
<br />
// For -graph- graphs<br />
global graph_opts_1 ///<br />
title(, justification(left) color(black) span pos(11)) ///<br />
graphregion(color(white)) ///<br />
ylab(,angle(0) nogrid) yscale(noline) ///<br />
legend(region(lc(none) fc(none)))<br />
</syntaxhighlight><br />
<br />
===Graphical schemes===<br />
<br />
Graphical schemes apply a large number of these options simultaneously, and in doing so they provide one of the highest degrees of cross-system consistency that is possible in creating graphs. Stata includes several built-in graphical schemes; the familiar "Stata blue" graphs are created using the <code>s2color</code> scheme.<br />
<br />
The graph scheme can be changed using the <syntaxhighlight lang="stata" inline>set scheme</syntaxhighlight> command. Stata will use the <syntaxhighlight lang="stata" inline>sysdir</syntaxhighlight> path to search for matching graph schemes, so for example a third-party scheme file (like [https://github.com/graykimbrough/uncluttered-stata-graphs Uncluttered]) might be included in the top-level directory of a repository and applied in the run file by writing:<br />
<br />
<syntaxhighlight lang="stata"><br />
sysdir set PERSONAL "${directory}/"<br />
set scheme uncluttered<br />
</syntaxhighlight><br />
<br />
This directs Stata to search for <syntaxhighlight lang="stata" inline>scheme-uncluttered.scheme</syntaxhighlight> and apply it to all graphics created while Stata remains open. This is a simple scheme which incorporates many of the universally applicable options above, particularly region coloring and axis marking. As with any third-party scheme, you should read the documentation; notably, this scheme provides a specific color palette and turns off the legend by default.<br />
<br />
Schemes do not appear to be able to control the default graphics font. The font can be set instead using <syntaxhighlight lang="stata" inline>graph set</syntaxhighlight>, as in <syntaxhighlight lang="stata" inline>graph set window fontface "Helvetica"</syntaxhighlight>.<br />
<br />
===Combining Stata graphics===<br />
<br />
Combining multiple graphs into a single image is an excellent way to present various aspects of a single analysis at the same time. Combining graphs is especially useful when facing constraints on the number of allowable exhibits, or when one or more graphical elements are very simple but important.<br />
<br />
There are two main approaches to combining graphs: overlaying multiple pieces of information on the same set of axes, or combining multiple visualizations into a single image with multiple panels (either aligned or not, although Stata handles alignment somewhat poorly).<br />
<br />
Overlaying graphics is accomplished using <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> syntax. In <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>, the graph axes are abstract, so with some abuse of notation it is possible to draw just about anything. Stata draws the plots in the order in which they are written, layering each one on top of the previous ones on the same set of axes. Including a second (possibly invisible) axis allows further possibilities, such as showing series with very different scales in the same graph. For example, with the Uncluttered scheme applied and Helvetica set as the graph font, we might write the following <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command:<br />
<br />
<syntaxhighlight lang="stata"><br />
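// Assumes the project data (date, voucher_use, mtb, rifres) and the globals hist_opts and pct are defined elsewhere (not shown)<br />
twoway ///<br />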
/// Stacked histogram using total/subset approach<br />
(histogram date ///<br />
, freq yaxis(2) fc(gs14) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
(histogram date if voucher_use == 0 ///<br />
, freq yaxis(2) fc(gs10) ls(none) start(19997) width(7) barwidth(6) ) ///<br />
/// Positivity<br />
(lpoly mtb date if voucher_use == 0 , lc(black) lw(thick) lp(solid)) ///<br />
(lpoly mtb date if voucher_use == 1 , lc(red) lw(thick) lp(solid)) ///<br />
(lpoly rifres date if voucher_use == 0 , lc(black) lw(thick) lp(dash)) ///<br />
(lpoly rifres date if voucher_use == 1 , lc(red) lw(thick) lp(dash)) ///<br />
/// Data collection<br />
(function 0.8 , lc(black) range(20193 20321)) /// <br />
(scatteri 0.8 20193 "Round 1" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
(function 0.8 , lc(black) range(20814 20877)) /// <br />
(scatteri 0.8 20814 "Round 2" , mlabcolor(black) m(none) mlabpos(1)) /// <br />
/// Overall options <br />
, legend(on size(vsmall) pos(12) ///<br />
order( ///<br />
2 "TB Tests Done, non-PPIA" ///<br />
1 "TB Tests Done, PPIA" ///<br />
3 "TB Positive Rate, non-PPIA" ///<br />
4 "TB Positive Rate, PPIA" ///<br />
5 "Rifampicin Resistance, non-PPIA" ///<br />
6 "Rifampicin Resistance, PPIA" )) ///<br />
${hist_opts} xoverhang ///<br />
ylab(${pct}) ytit("Weekly Tests (Histogram)", axis(2)) ///<br />
xtit(" ") xlab(,labsize(small) format(%tdMon_CCYY))<br />
</syntaxhighlight><br />
<br />
If we did, we would obtain something like:<br />
<br />
[[File:twoway-layer.png]]<br />
<br />
Alternatively, we might like to display information in panels that would not layer well together, or from commands which cannot be combined by <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight>. For example, after creating some graphs with user-written commands (and including their panel titles), we might write:<br />
<br />
<syntaxhighlight lang="stata"><br />
graph combine ///<br />
"${git}/outputs/f-discontinuity-1.gph" ///<br />
"${git}/outputs/f-discontinuity-2.gph" ///<br />
"${git}/outputs/f-discontinuity-3.gph" ///<br />
"${git}/outputs/f-discontinuity-4.gph" ///<br />
, altshrink<br />
</syntaxhighlight><br />
<br />
And we would obtain something like:<br />
<br />
[[File:graph-combine.png]]<br />
<br />
The <syntaxhighlight lang="stata" inline>graph combine</syntaxhighlight> command provides many options for customizing the layout and alignment of the graphs included. The user-written <syntaxhighlight lang="stata" inline>grc1leg</syntaxhighlight> command may also be useful when all of the visualizations included in the final image are intended to share a common legend.<br />
<br />
To save processing time when combining graphs, consider creating the underlying graphs with the <syntaxhighlight lang="stata" inline>nodraw</syntaxhighlight> option, which defers rendering until the combined graph is drawn. Rendering graphs in the Graph window is computationally costly in Stata and is best avoided whenever possible.<br />
<br />
==Specific Visualization Approaches==<br />
<br />
===The <syntaxhighlight lang="stata" inline>graph</syntaxhighlight> command===<br />
<br />
===The <syntaxhighlight lang="stata" inline>twoway</syntaxhighlight> command===<br />
<br />
===Built-in visualization commands===<br />
<br />
===User-written visualization commands===</div>Bbdaniels