Ieboilstart
Latest revision as of 20:15, 4 June 2019

ieboilstart is a Stata command that standardizes version, memory, and other Stata settings across all users for a project. Such code is usually referred to as boilerplate code, hence the command name. Research teams should standardize settings via boilerplate code throughout the course of a project – and especially during randomization – to ensure that code behaves in a replicable manner between users. This page describes how to use ieboilstart.

Read First

  • ieboilstart does not guard against discrepancies in computer setup or Stata type. Users are responsible for making sure they run the same type of Stata (Small/IC/SE/MP) on the same computer setup (PC/Mac/Linux).
  • ieboilstart standardizes some settings by default (e.g. set more off and set varabbrev off) and allows users to specify additional settings through options. For detailed instructions on how to implement the command and its options, type help ieboilstart in Stata.
  • This command is part of the package ietoolkit. To install all commands in this package, including ieboilstart, type ssc install ietoolkit in Stata.
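A minimal sketch of installing ietoolkit only when ieboilstart is not already available, so the same setup lines can sit at the top of a shared do-file. It uses only Stata's built-in which, capture, and ssc commands. The check-then-install pattern is a suggestion and is not part of ietoolkit itself.

```stata
* Install ietoolkit only if ieboilstart is not already installed
capture which ieboilstart
if _rc != 0 {
    ssc install ietoolkit
}

* Show where ieboilstart is installed and which version it is
which ieboilstart
```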

Overview

ieboilstart is a Stata command that standardizes version, memory, and other Stata settings across all users for a project. It standardizes some settings by default and also allows users to specify additional settings through options. Run ieboilstart at the top of do-files to keep results as consistent as possible across users. If a project consists of many do-files that are run from a master do-file, then it is only necessary to run ieboilstart 1) at the top of master do-files that run other do-files, and 2) at the top of do-files that include randomization.
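A sketch of where ieboilstart sits in a master do-file, assuming Stata 14 or later is installed. The folder in the global project and the child do-file names (cleaning, analysis) are hypothetical placeholders. The child do-files do not need their own ieboilstart call unless they include randomization.

```stata
* Hypothetical master do-file: run ieboilstart once, at the very top
ieboilstart, version(14.0)
`r(version)'

* Hypothetical project folder; replace with your own path
global project "C:/projects/my_study"

* Run each child do-file if it exists (file names are placeholders)
foreach dofile in cleaning analysis {
    capture confirm file "${project}/`dofile'.do"
    if _rc == 0 {
        do "${project}/`dofile'.do"
    }
    else {
        display as error "Placeholder do-file not found: `dofile'.do"
    }
}
```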

Disclaimer: Ensuring Identical Results

For technical reasons, ieboilstart cannot guarantee that different types of Stata (Small/IC/SE/MP or PC/Mac/Linux) behave exactly the same in every possible context. The command does not guard against discrepancies in Stata itself or in user-contributed commands. It is solely a collection of common practices that reduce the risk of the same code running differently for different users.
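Because ieboilstart cannot remove differences in Stata type or operating system, it can help to print those details at the top of a project's log so team members can compare setups. This is a minimal sketch using Stata's built-in c-class values. Which values a team chooses to record is an assumption, not something ieboilstart does.

```stata
* Record the Stata setup in the log so team members can compare environments
display "Stata version:    " c(stata_version)
display "Operating system: " c(os)
display "Machine type:     " c(machine_type)
display "SE or MP (1=yes): " c(SE)
display "MP (1=yes):       " c(MP)
```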

Implementation

ieboilstart sets three types of settings: version settings, memory settings and other settings.

Version Settings

As impact evaluations and other research projects often span many years, the same code will likely be run in different versions of Stata over time, which may introduce discrepancies. Randomization, for example, is extremely sensitive to the Stata version, since the randomization algorithm may change from one version of Stata to the next. Research teams should therefore run their code under the same Stata version and, as discussed above, ideally on the same computer setup, so that the code behaves as consistently as possible across users. To correctly set the version:

  1. Run ieboilstart with the version option (line 1)
  2. Call one of the returned values (line 2)
ieboilstart, version(14.0)
`r(version)'

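Setting the version is best practice and perhaps the most important setting to establish via ieboilstart. Two lines are needed because any version setting made inside a command is reverted as soon as that command ends, so ieboilstart cannot set the version on the user's behalf. Instead, it returns the version setting in r(version), which the do-file must then execute on the line immediately after ieboilstart.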
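To show why the version line matters for randomization, here is a minimal sketch of a do-file that assigns treatment reproducibly. It uses hypothetical example data: 100 units with a unique identifier unit_id. The seed value and the 50/50 assignment rule are arbitrary choices for illustration. Fixing the version keeps Stata's random-number behaviour consistent across installed versions, while set seed and a unique, stable sort order fix the draw itself.

```stata
* Randomization do-file: start with ieboilstart and the version line
ieboilstart, version(14.0)
`r(version)'

* Hypothetical example data: 100 units with a unique ID
clear
set obs 100
generate unit_id = _n

* Fix the random draw: set a seed and sort on a unique ID before drawing
set seed 215597
isid unit_id, sort
generate random_draw = runiform()

* Assign the half of units with the lowest draws to treatment
sort random_draw unit_id
generate treatment = (_n <= _N / 2)
```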

Memory Settings

In Stata versions before Stata 12, memory is assigned statically. In other words, a fixed amount of memory is assigned to Stata, and if that amount is exceeded (for example, when expanding a dataset or running a complex calculation), Stata stops with an out-of-memory error. In Stata 12 and later, memory is assigned dynamically: a small amount of memory is assigned when Stata starts and is increased as needed. For Stata 12 and above, the only memory limit is that imposed by the computer's hardware.

ieboilstart can set the fixed memory in Stata 11 with the setmem option, which is simply ignored in Stata 12 and later. In Stata 12 and later, dynamic memory can be fine-tuned with set min_memory, set max_memory, set niceness, and set segmentsize. However, even highly advanced users rarely need to change these settings as long as they are at Stata's recommended default values, which ieboilstart ensures.
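To see the memory settings that ieboilstart standardizes, Stata 12 and later can report them directly. This is a minimal sketch using query memory and the matching c-class values. It only displays the settings and does not change any of them.

```stata
* Stata 12 and later: list the dynamic memory settings
query memory

* The same settings as c-class values
display "min_memory:  " c(min_memory)
display "max_memory:  " c(max_memory)
display "niceness:    " c(niceness)
display "segmentsize: " c(segmentsize)
```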

Other Settings

Other settings are standardized via ieboilstart because they are either widely preferred or reduce the risk of errors between users. These settings can be reverted to personal preferences after running ieboilstart, or by using the custom() option.
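One way to confirm which of these settings are in effect after ieboilstart, and then apply a personal preference on top of them, is sketched below. It reads the built-in c(more) and c(varabbrev) values. Setting linesize to 120 is only an example of a personal preference, not something ieboilstart requires.

```stata
ieboilstart, version(14.0)
`r(version)'

* Check that the standardized settings are in effect
display "more:      " c(more)
display "varabbrev: " c(varabbrev)

* A personal preference applied after ieboilstart (example only)
set linesize 120
```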

set more off

With more set to off, which is ieboilstart's default, Stata keeps running when the output reaches the end of the Results window. Otherwise it pauses and requires the user to tell it to resume each time, which can happen hundreds of times in a large project.
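The same behaviour can be reproduced by hand. In this minimal sketch, a loop prints more lines than fit in the Results window. Because more is set to off, Stata prints them all without stopping.

```stata
* With more off, this loop prints all 200 lines without pausing
set more off
forvalues i = 1/200 {
    display "Output line `i'"
}
```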

pause on

With pause on, which is ieboilstart's default, users can take advantage of Stata's pause command, a useful debugging tool. Type help pause in Stata for more details.
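This is a minimal sketch of pause as a debugging tool. The local debug switch is a hypothetical convention and is not part of ieboilstart. With debug set to 1, Stata stops at the pause line so you can inspect the data (see help pause for how to resume). With debug set to 0, the pause line is ignored and the do-file runs to the end.

```stata
* Hypothetical switch: 1 while debugging, 0 for a full run
local debug = 0
if `debug' pause on
else pause off

sysuse auto, clear
summarize price

* Stops here only when pause is on; otherwise this line is ignored
pause Inspect the data after summarizing price
```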

set varabbrev off

With varabbrev off, which is ieboilstart's default, variable abbreviation is turned off to avoid errors. Otherwise, Stata allows variable abbreviation. For example, if you have a variable called harvest, you can refer to it by typing just harv, as long as no other variable name starts with harv. Copy and paste the code below and run it in Stata to see for yourself.

clear
set obs 100
set varabbrev on

// Generate a tomato harvest variable and sum it using variable abbreviation
generate harvest_tomato = runiform()
summarize harv    // works: harv uniquely identifies harvest_tomato

// Generate a potato harvest variable and try to sum it using variable abbreviation
generate harvest_potato = runiform()
summarize harv    // error: harv is now an ambiguous abbreviation

Variable abbreviation is prone to hard-to-spot errors, especially when several people collaborate on code: a do-file that runs today can break as soon as someone adds a variable with the same prefix. Thus, ieboilstart turns variable abbreviation off by default.
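The example above can be rerun with abbreviation turned off to see the difference. In this sketch, summarize harv is wrapped in capture so the do-file keeps running. Stata should report that the variable harv is not found (return code 111), while the full variable name works. The scalar name abbrev_rc is only an illustrative choice.

```stata
clear
set obs 100
set varabbrev off

generate harvest_tomato = runiform()

* With abbreviation off, harv is not matched to harvest_tomato
capture summarize harv
scalar abbrev_rc = _rc
display "Return code for summarize harv: " scalar(abbrev_rc)

* Full variable names always work
summarize harvest_tomato
```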

Back to Parent

This article is part of the topic ietoolkit
