Stata Coding Practices: Debugging
Debugging is the process of fixing runtime errors or other unexpected behaviors in Stata code. Unlike normal code execution, debugging involves intentionally preventing code from running completely so the user can investigate the current state of data or memory and determine what code would produce the desired outputs in a complete execution of the code.
Read First
Stata code is typically debugged using three tools: trace
, pause
, and display
. Understanding how to use these tools interactively will allow you to pinpoint and correct errors in code.
- The
trace
command requests that Stata display additional levels of detail as it executes code. Nearly all Stata code and commands use other commands as part of their execution, and an appropriately settrace
readout will allow you to pinpoint the line or command where an issue is occurring. - The
pause
command freezes the execution of Stata code in the designated location and optionally displays information about Stata's memory state. While paused, Stata allows the user to interact with the data as it is at that point in code execution and to permit code execution to continue when finished. - The
display
command is used to extract specific information about Stata's current state without pausing or tracing all code execution. Since Stata has rich information on hand at all times about macros and data in memory, it is possible to detect problems when unexpected or inappropriate values are displayed during code execution.
Principles of Code Debugging in Stata
When code execution produces errors or unexpected results, it is typically very difficult to determine what is causing an issue simply by looking at the code and results. Such a bug usually results from an intermediate state arising from a complex combination of code, which is not easily foreseeable when writing the code. Typical examples occur when code is generalized or looped such that code that is written or tested in a given environment is then applied to situations where it was not originally envisioned or tested. For example, a loop that includes conditional statements or data subsetting may result in empty datasets or undefined macros or expressions and cause later code to be invalid.
Debugging Stata code follows a three-step process each time such an issue is encountered. First, the author must determine where the issue is occurring, by finding the literal line of code that is producing the error or unexpected result. This is usually done with trace
or display
. Next, the author must determine why the input to that line is different than what they had anticipated, which typically involves inserting a pause
or display
command to assess the state of various memory elements at that point. Finally, the author needs to address the problem, which typically involves following the same process upstream in the code to find the line where the incorrect input is being generated. This process is iterated until the root cause is found and a code correction can be made.