Stata Coding Practices: Debugging
Debugging is the process of fixing runtime errors or other unexpected behaviors in Stata code. Unlike normal code execution, debugging involves intentionally preventing code from running completely so the user can investigate the current state of data or memory and determine what code would produce the desired outputs in a complete execution of the code.
Read First
Stata code is typically debugged using four tools: trace
, pause
, display
, and capture
. Understanding how to use these tools interactively will allow you to pinpoint and correct errors in code.
- The
trace
command requests that Stata display additional levels of detail as it executes code. Nearly all Stata code and commands use other commands as part of their execution, and an appropriately settrace
readout will allow you to pinpoint the line or command where an issue is occurring. - The
pause
command freezes the execution of Stata code in the designated location and optionally displays information about Stata's memory state. While paused, Stata allows the user to interact with the data as it is at that point in code execution and to permit code execution to continue when finished. - The
display
command is used to extract specific information about Stata's current state without pausing or tracing all code execution. Since Stata has rich information on hand at all times about macros and data in memory, it is possible to detect problems when unexpected or inappropriate values are displayed during code execution. - The
capture
command is used to extract specific information about errors resulting from commands while still allowing code to continue executing past a point that would otherwise result in Stata breaking and refusing to continue.
Principles of code debugging in Stata
When code execution produces errors or unexpected results, it is typically very difficult to determine what is causing an issue simply by looking at the code and results. Such a bug usually results from an intermediate state arising from a complex combination of code, which is not easily foreseeable when writing the code. Typical examples occur when code is generalized or looped such that code that is written or tested in a given environment is then applied to situations where it was not originally envisioned or tested. For example, a loop that includes conditional statements or data subsetting may result in empty datasets or undefined macros or expressions and cause later code to be invalid.
Debugging Stata code follows a three-step process each time such an issue is encountered. First, the author must determine where the issue is occurring, by finding the literal line of code that is producing the error or unexpected result. This is usually done with trace
or display
. Next, the author must determine why the input to that line is different than what they had anticipated, which typically involves inserting a pause
or display
command to assess the state of various memory elements at that point. Finally, the author needs to address the problem, which typically involves following the same process upstream in the code to find the line where the incorrect input is being generated. This process is iterated until the root cause is found and a code correction can be made.
Detecting the location of code errors
When a command is executed in Stata and cannot be successfully run through to its endpoint, the error messages that are displayed are often uninformative, particularly for user-written commands. Thankfully, Stata provides a useful set of commands for users to examine code execution line-by-line, and Stata makes the data and memory state accessible to users before and after the execution of every single command.
When there is an error in code, Stata will break during that command and report its error state, and refuse to continue executing any more code. This usually makes it easy to detect the exact command that is unsuccessful and begin to understand and correct the error. However, when do-files execute other do-files, or when commands are called, Stata by default suppresses reporting the line-by-line evaluation and execution of these commands (displaying line-by-line results is relatively slow in Stata and should be avoided by using quietly
where possible).
To detect the location of errors in code that is called from a single do-file, the trace
setting needs to be invoked. By running set trace on
and set tracedepth #
(where #
is an integer), and then re-running the code, Stata will report detailed evaluations of the code up to the level requested. Using set tracedepth 1
does nothing. Using set tracedepth 2
will expand the evaluation of each command called by the main code, allowing detection of erroneous inputs to those commands by showing where those commands have stopped executing.
For example, running:
local bad BAD sysuse auto`bad'.dta
returns:
. sysuse auto`bad'.dta file "autoBAD.dta" not found r(601);