Difference between revisions of "R Coding Practices"



==Package installation==
R packages are collections of functions, data, and documentation that extend the functionality of base R. They are essential for everything from data cleaning to statistical modeling and visualization.
===Installing CRAN packages===
CRAN (the Comprehensive R Archive Network) is the primary repository for R packages.
To install a single package from CRAN:
<code>install.packages("tidyverse")</code>
To install several packages at once:
<code>install.packages(c("tidyverse", "haven", "data.table"))</code>
A package only needs to be installed once, but it must be loaded in each new R session before its functions can be used:
<code>library(tidyverse)</code>
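Because <code>install.packages()</code> reinstalls a package every time it runs, scripts shared within a team often install only what is missing. The following is a minimal sketch of that idea; the helper names <code>find_missing_packages()</code> and <code>install_if_missing()</code> are hypothetical and not part of base R or any package.

```r
# Return the subset of `packages` that is not available in the current library.
# requireNamespace() returns FALSE when a package cannot be loaded.
find_missing_packages <- function(packages) {
  is_installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  packages[!is_installed]
}

# Install only the packages that are missing (hypothetical helper).
# Returns the names of the packages it tried to install, invisibly.
install_if_missing <- function(packages) {
  missing_packages <- find_missing_packages(packages)
  if (length(missing_packages) > 0) {
    install.packages(missing_packages)
  }
  invisible(missing_packages)
}
```

Calling <code>install_if_missing(c("tidyverse", "haven", "data.table"))</code> at the top of a script would then install only the packages that are absent. Each package still needs to be loaded with <code>library()</code> afterwards.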


==Comments and script structure==
Running code that returns the right result is only the first half of the job. The other half is making sure your code is easy to follow, test, and reuse. This helps teams catch mistakes, audit decisions, and collaborate more effectively. Poorly structured code increases the risk of errors.
'''1. Use header comments to introduce each script:'''
<code>##################################################
# Script: 01_clean_data.R
# Purpose: Clean raw baseline data
# Author: First Last
# Date: 2025-05-30
# Inputs: data/raw/baseline.csv
# Outputs: data/clean/baseline_clean.rds
##################################################</code>
'''2. Use section headers to structure code within each script:'''
<code># Load packages -------------------------------------------------------
# Import data ---------------------------------------------------------
# Clean variables -----------------------------------------------------
# Save outputs --------------------------------------------------------</code>
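As an illustration, a short script organized under these section headers might look like the sketch below. The data frame is made up and stands in for <code>data/raw/baseline.csv</code>, and the output is written to a temporary folder rather than <code>data/clean/</code> so that the example runs anywhere.

```r
# Load packages -------------------------------------------------------
# (this sketch uses base R only)

# Import data ---------------------------------------------------------
# Made-up data standing in for the raw baseline file
baseline_raw <- data.frame(
  household_id = c(1, 2, 2, 3),
  income       = c(100, 250, 250, NA)
)

# Clean variables -----------------------------------------------------
# Drop exact duplicate rows, then rows with missing income
baseline_clean <- unique(baseline_raw)
baseline_clean <- baseline_clean[!is.na(baseline_clean$income), ]

# Save outputs --------------------------------------------------------
# Assumption: a temporary folder replaces data/clean/ in this sketch
output_path <- file.path(tempdir(), "baseline_clean.rds")
saveRDS(baseline_clean, output_path)
```

In RStudio, comment lines that end in four or more dashes also appear as entries in the script's document outline, which makes longer scripts easier to navigate.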
'''Good syntax''' makes it easy to understand what the code is doing and why. You should:
*Use clear, expressive names for variables and objects (e.g., baseline_data instead of bd).
*Avoid deeply nested code and one-liners that sacrifice clarity.
*Write logic in small chunks: break long chains of operations into separate steps, or comment them carefully.
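The contrast between a dense one-liner and the same logic in small steps can be sketched as follows. The income values and object names are made up for illustration.

```r
# Made-up household incomes, with one missing value
household_income <- c(120, NA, 85, 240, 60)

# Hard to read: several nested operations on a single line
share_above_100_nested <- round(mean(household_income[!is.na(household_income)] > 100) * 100, 1)

# Easier to follow: one step per line, each with a descriptive name
observed_income <- household_income[!is.na(household_income)]  # drop missing values
is_above_100    <- observed_income > 100                       # flag incomes above 100
share_above_100 <- round(mean(is_above_100) * 100, 1)          # percentage of flagged incomes
```

Both versions return the same number, but the second one lets a reader inspect each intermediate object when checking the result.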


==Naming objects==
In R, object names are one of the most fundamental tools for writing readable, maintainable, and collaborative code. This includes variable names, function names, data frame names, and any other user-defined object. Good naming helps future users (and your future self) quickly understand what your code is doing without constantly referring back to earlier lines.
 
'''General Principles'''
 
*'''Be descriptive:''' Use names that clearly describe what the object represents. For example, use <code>total_income</code> instead of <code>x</code> or <code>ti</code>.
 
*'''Use lowercase letters:''' Stick to lowercase letters and use underscores (_) to separate words. This style is consistent with tidyverse conventions and improves readability (e.g., <code>household_id</code>, not <code>HouseholdID</code> or <code>houseHoldId</code>).
 
*'''Avoid abbreviations:''' Unless widely recognized (e.g., GDP, ID, ISO), avoid abbreviations that may not be clear to others.
 
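*'''Don't mask base R names:''' Avoid naming objects <code>data</code>, <code>mean</code>, <code>sum</code>, <code>T</code>, <code>c</code>, etc. These names are already defined in base R, and reusing them hides the originals and can lead to hard-to-spot bugs.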
 
*'''Consistency is key:''' Pick a style and stick with it throughout your project (e.g., always use snake_case or always use dot.case, but don’t mix them).
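The risk of reusing names that R already defines can be seen with <code>T</code>, which is an ordinary variable that defaults to <code>TRUE</code> rather than a reserved word. The sketch below uses made-up object names to show how an innocent-looking assignment changes its meaning.

```r
# T is a regular variable in base R, so assigning to it is allowed
T <- 0

# Code that relies on T as shorthand for TRUE now behaves differently
flag_with_T <- isTRUE(T)      # FALSE, because T is now 0

# Removing the masking object makes the base R value visible again
rm(T)
flag_after_rm <- isTRUE(T)    # TRUE

# Writing TRUE and FALSE in full avoids the problem: they are reserved words
flag_explicit <- isTRUE(TRUE)
```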
 
'''For example:'''
*<code>survey_data_2023</code>: Descriptive and specific
 
*<code>income_total</code>: Meaningful variable name
 
*<code>calculate_growth</code>: Action-based function name
 
*<code>beneficiary_status</code>: Clear and readable
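Used together, names like these let a short piece of code read almost like a sentence. The sketch below is illustrative only: the data values are made up, and the body of <code>calculate_growth()</code> is an assumption about what such a function might do.

```r
# Made-up survey data using descriptive, snake_case names
survey_data_2023 <- data.frame(
  household_id       = 1:3,
  income_total       = c(100, 250, 80),
  beneficiary_status = c("enrolled", "not enrolled", "enrolled")
)

# Action-based function name: the verb says what the function does
# (hypothetical definition: relative change between two values)
calculate_growth <- function(initial_value, final_value) {
  (final_value - initial_value) / initial_value
}

# Mean income among enrolled households
is_enrolled          <- survey_data_2023$beneficiary_status == "enrolled"
mean_income_enrolled <- mean(survey_data_2023$income_total[is_enrolled])

income_growth <- calculate_growth(initial_value = 100, final_value = 125)
```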
 
==Style and white space==


== Additional Resources ==
* [https://www.r-bloggers.com/r-code-best-practices/ R-bloggers post on best practices]
* DIME Analytics, World Bank [https://github.com/dime-wb-trainings/shiny-training?tab=readme-ov-file/ shiny training]
* Tidyverse Team, [https://style.tidyverse.org/ Tidyverse style guide]
==Related DIME Analytics Trainings==
* Training session recording on [https://osf.io/8sgrh/files/osfstorage/68387ddec1e467f319a4db5a/ Introduction to R Shiny]
* Training session recording on [https://osf.io/8sgrh/files/osfstorage/65ac8ac0b1f2b501d5b0ef7d/ Big data workflows with R <code>data.table</code>]
[[Category: Coding Practices]]

Latest revision as of 13:51, 3 June 2025

This article lays out some best practices for coding in R. Although R can be used without it, the RStudio integrated development environment makes R easier to use and is the standard among R users. There is no single set of best practices: the guidelines below are suggestions that can and should be adapted to each project's needs and to users' preferences.


Read First

  • RStudio
  • Comments
  • Object names

Style and white space

Loops in R

Tidyverse

Version control

RStudio projects

Additional Resources

Related DIME Analytics Trainings