Stata Coding Practices: Debugging
Debugging is the process of fixing runtime errors or other unexpected behaviors in Stata code. Unlike normal code execution, debugging involves intentionally preventing code from running completely so the user can investigate the current state of data or memory and determine what code would produce the desired outputs in a complete execution of the code.
Read First
Stata code is typically debugged using four tools: trace, pause, display, and capture. Understanding how to use these tools interactively will allow you to pinpoint and correct errors in code.
- The trace command requests that Stata display additional levels of detail as it executes code. Nearly all Stata code and commands use other commands as part of their execution, and an appropriately set trace readout will allow you to pinpoint the line or command where an issue is occurring.
- The pause command freezes the execution of Stata code at the designated location and optionally displays information about Stata's memory state. While paused, Stata allows the user to interact with the data as it is at that point in code execution, and then to resume execution when finished.
- The display command is used to extract specific information about Stata's current state without pausing or tracing all code execution. Since Stata has rich information on hand at all times about macros and data in memory, it is possible to detect problems when unexpected or inappropriate values are displayed during code execution.
- The capture command is used to extract specific information about errors resulting from commands while still allowing code to continue executing past a point that would otherwise result in Stata breaking and refusing to continue.
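As a minimal sketch of the capture behavior described above, the following do-file fragment attempts a command that is expected to fail and then inspects the return code stored in _rc. The file name autoBAD.dta is a hypothetical, deliberately invalid name chosen for illustration.

```stata
* Hypothetical example: try to load a dataset that does not exist
capture sysuse autoBAD.dta

* _rc holds the return code of the captured command (0 means success)
local rc = _rc
if `rc' != 0 {
    display as error `"sysuse failed with return code `rc'"'
}

* Execution continues here even though the command above failed
sysuse auto.dta, clear
display `"Loaded `c(N)' observations"'
```

Copying _rc into a local macro immediately is a defensive habit, since any later capture would overwrite it.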
Principles of code debugging in Stata
When code execution produces errors or unexpected results, it is typically very difficult to determine what is causing an issue simply by looking at the code and results. Such a bug usually results from an intermediate state arising from a complex combination of code, which is not easily foreseeable when writing the code. Typical examples occur when code is generalized or looped such that code that is written or tested in a given environment is then applied to situations where it was not originally envisioned or tested. For example, a loop that includes conditional statements or data subsetting may result in empty datasets or undefined macros or expressions and cause later code to be invalid.
Debugging Stata code follows a three-step process each time such an issue is encountered. First, the author must determine where the issue is occurring, by finding the literal line of code that is producing the error or unexpected result. This is usually done with trace or display. Next, the author must determine why the input to that line is different from what they had anticipated, which typically involves inserting a pause or display command to assess the state of various memory elements at that point. Finally, the author needs to address the problem, which typically involves following the same process upstream in the code to find the line where the incorrect input is being generated. This process is iterated until the root cause is found and a code correction can be made.
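The second step can be sketched with pause. The loop below is a hypothetical example on the built-in auto dataset: it stops whenever a subset is smaller than an arbitrary threshold of 5 observations (an assumption made only for illustration), so the data can be inspected interactively at that point.

```stata
* Enable pausing; while pause is off (the default), pause commands are ignored
pause on

sysuse auto.dta, clear

* Hypothetical loop: summarize price within each repair-record category
forvalues r = 1/5 {
    quietly count if rep78 == `r'
    * Hand control to the user if a subset is unexpectedly small
    if r(N) < 5 {
        pause Only `r(N)' observations with rep78 == `r'
    }
    quietly summarize price if rep78 == `r'
    display `"rep78 = `r': mean price = `r(mean)'"'
}

pause off
```

While paused, any Stata command can be typed at the prompt; typing q resumes execution and typing BREAK stops the do-file.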
Locating code issues with trace
When a command is executed in Stata and cannot be successfully run through to its endpoint, the error messages that are displayed are often uninformative, particularly for user-written commands. Thankfully, Stata provides a useful set of commands for users to examine code execution line-by-line, and Stata makes the data and memory state accessible to users before and after the execution of every single command.
When there is an error in code, Stata will break during that command and report its error state, and refuse to continue executing any more code. This usually makes it easy to detect the exact command that is unsuccessful and begin to understand and correct the error. However, when do-files execute other do-files, or when commands are called, Stata by default suppresses reporting the line-by-line evaluation and execution of these commands (displaying line-by-line results is relatively slow in Stata and should be avoided by using quietly where possible).
To detect the location of errors in code that is called from a single do-file, the trace setting needs to be invoked. By running set trace on and set tracedepth # (where # is an integer), and then re-running the code, Stata will report detailed evaluations of the code up to the level requested. Using set tracedepth 1 does nothing. Using set tracedepth 2 will expand the evaluation of each command called by the main code, allowing detection of erroneous inputs to those commands by showing where those commands have stopped executing.
For example, running:
    local bad BAD
    sysuse auto`bad'.dta
returns:
    . sysuse auto`bad'.dta
    file "autoBAD.dta" not found
    r(601);
and running it after set trace on and set tracedepth 2 returns:
    . sysuse auto`bad'.dta
    --------------------------------------------------------------------------- begin sysuse ---
    - version 8
    - gettoken first : 0, parse(" ,") quotes
    - if `"`first'"'=="dir" {
    = if `"autoBAD.dta"'=="dir" {
      gettoken first 0 : 0, parse(" ,")
      sysusedir `0'
      exit
      }
    - local 0 `"using `0'"'
    = local 0 `"using autoBAD.dta"'
    - syntax using/ [, CLEAR REPLACE]
    - local clear = cond("`replace'"!="", "clear", "`clear'")
    = local clear = cond(""!="", "clear", "")
    - if bsubstr(`"`using'"',-4,.)!=".dta" {
    = if bsubstr(`"autoBAD.dta"',-4,.)!=".dta" {
      local using `"`using'.dta"'
      }
    - quietly findfile `"`using'"'
    = quietly findfile `"autoBAD.dta"'
    file "autoBAD.dta" not found
    ----------------------------------------------------------------------------- end sysuse ---
    r(601);
This result makes the error more explicit and, among other things, helps detect that the contents of the macro `bad' are the problem. When finished, set trace off will return Stata to its normal display behavior. The trace help file details various other options that help quickly parse through the material that is returned. You may also need to check settings of any code that is executed quietly to ensure it is correctly displayed in the Results window.
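One way to keep trace output manageable is to switch tracing on only around the suspect command. The sketch below reuses the failing sysuse call from the example above and wraps it in capture noisily, a design choice (not something the example above requires) that lets the do-file reach set trace off even though the command fails.

```stata
* Trace only the suspect command, then restore normal output
local bad BAD

set tracedepth 2
set trace on
* capture noisily shows the output and error but lets execution continue
capture noisily sysuse auto`bad'.dta
local rc = _rc
set trace off

display `"Return code from sysuse: `rc'"'
```

Without the capture, the do-file would stop at the error with tracing still switched on, and set trace off would have to be typed manually.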
Locating code issues with display
While trace is an invaluable tool for locating issues in code by manual inspection, its outputs are voluminous and can be hard to search and parse, regardless of whether you are doing so in the Results window directly or printing to a log file. In many situations, Stata's default error behavior of breaking at a failed command makes trace unnecessary for finding the location of the issue. In other cases, Stata code will break inside a loop, and while you could diagnose such an issue by having trace report all the calculations inside the loop, it is often much simpler to report only the information you need to find the bug, using display. (The workflow detailed here, by the way, is easily adapted for any output command, such as matlist.)
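As a hedged illustration of locating a failure inside a loop, the sketch below regresses price on several variables from the built-in auto dataset. The variable list is hypothetical and deliberately includes make, a string variable, so that one iteration fails; the display line printed before each regression identifies which iteration that is.

```stata
sysuse auto.dta, clear

* Hypothetical loop: regress price on each variable in turn
foreach xvar in mpg weight make length {
    * Report the iteration before the command that may fail
    display `"Starting iteration for xvar = `xvar' (`c(N)' observations)"'
    capture regress price `xvar'
    if _rc != 0 {
        display as error `"  regress failed for `xvar' with return code `=_rc'"'
        continue
    }
    display `"  coefficient on `xvar' = `=_b[`xvar']'"'
}
```

The capture is optional here: without it the loop would simply stop at the failing iteration, and the last "Starting iteration" line in the Results window would still point to the culprit.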
The display command is simple: it prints the requested information to the Results window. Thanks to the large amount of information stored by Stata, this is an invaluable access point to information about the state of the program just before the code breaks. For example, you can write display commands like:
    display `" This is iteration `i'. The dependent variable is `yvar'. "'
    display `" It is labelled `: var lab `yvar''. Output to {browse `filepath'}. "'
    display `" There are `c(N)' observations and `c(k)' variables in the dataset. "'
    display `" The last regression coefficient was `=_b[mpg]'. "'
    display `" Average cluster size was `=e(N)/e(N_clust)'. "'
Debugging with display should make liberal use of the built-in Stata features for accessing and handling information and results. These include macros, whether defined by loops or created directly; extended functions (see help extended_fcn); SMCL code to create clickable links (see help smcl); on-the-fly evaluation with `=' and `:'; and system information from return, ereturn, and creturn. Finally, the text passed to display, as with most other string-handling commands, should almost always be enclosed in compound double quotes `" "' so that any possible input, including strings that themselves contain quotation marks, is handled correctly.
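A short sketch of why compound double quotes matter, using a hypothetical macro named note whose contents include quotation marks, followed by an extended function and an inline expression evaluated inside the displayed string:

```stata
* A hypothetical note that itself contains double quotes
local note `"The "auto" dataset"'

* Plain double quotes would end the string at the embedded quote,
* so compound double quotes are used instead
display `"Note: `note'"'

* Extended functions and expressions are evaluated before printing
sysuse auto.dta, clear
display `"mpg is labelled: `: variable label mpg'"'
display `"Mean of 2 and 4 is `= (2 + 4) / 2'"'
```

Writing the first display with plain double quotes would produce an error, because the quote before auto would close the string early.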