2 Debugging R Code

Debugging is an essential skill for any R programmer. It’s the process of finding and fixing errors or unexpected behaviors in your code. This chapter will cover various techniques and tools to help you effectively debug R code.

2.1 Types of Errors in R

Understanding the different types of errors you may encounter is the first step in debugging code. In R, we typically deal with four main categories of errors: syntax, runtime, logical, and semantic.

2.1.1 Syntax Errors

Syntax errors occur when the R interpreter cannot parse your code because it violates the language’s syntax rules. Because these errors prevent your code from running at all, they are usually quicker to spot and fix than other kinds of errors. Let’s look at a few common syntax errors in R:

To Do

Cut down and/or move some examples to an appendix

2.1.1.1 Missing terms

Common syntax errors in R include missing terms in function calls or expressions. For example, if you forget to include a comma between arguments in a function call, R will throw an error:

plot(x, y type = "l")
       # ^ Missing comma

# Error: unexpected symbol in "plot(x, y type"

Or, if you forget to include a plus sign in a linear model formula, R will throw an error:

lm(y ~ x1 + x2 x3, data = my_data)
            # ^ Missing plus sign

# Error: unexpected symbol in "lm(y ~ x1 + x2 x3"

A more disruptive kind of syntax error occurs when you forget to close a parenthesis or quotation mark. Instead of reporting an error, the console waits for more input:

lm(y ~ x1 + x2, data = my_data
                            # ^ Missing closing parenthesis

# > lm(y ~ x1 + x2, data = my_data
# +                            
# ^ R is waiting for more input.

print("Hello, world!)
                  # ^ Missing quotation mark

# > print("Hello, world!)
# + 
# ^ R is waiting for more input.

To recover, first stop R from waiting for more input by pressing Esc (in RStudio) or Ctrl + C (in a terminal), then rerun the code with the missing parenthesis or quotation mark added.
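One way to check a snippet for syntax errors without executing it is base R's parse(). The sketch below assumes the code is available as a character string; the helper name check_syntax is hypothetical and only illustrates the idea.

```r
# Hypothetical helper: parse code without evaluating it and report
# whether the parser accepted it.
check_syntax <- function(code) {
  tryCatch(
    {
      parse(text = code)   # parses only; nothing is evaluated
      "syntax OK"
    },
    error = function(e) conditionMessage(e)
  )
}

bad  <- check_syntax('plot(x, y type = "l")')   # missing comma
good <- check_syntax('plot(x, y, type = "l")')  # well-formed call

print(bad)
print(good)
```

Because parse() never evaluates the code, x and y do not need to exist. An unclosed parenthesis is reported as an unexpected end of input instead of leaving the console waiting.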

2.1.1.2 Case and Naming Sensitivity

R is case-sensitive, so you must use the exact spelling and case for function names, variable names, and other identifiers. Strictly speaking, a misspelled name still parses; the error appears when R tries to look the name up. For example, if you misspell a function name, R will throw an error:

summry(my_data)  # Should be summary()

# Error in summry(my_data) : could not find function "summry"

Similarly, if you use the wrong case for a variable name, R will not recognize the variable:

N_STUDENTS <- 20
y <- n_students + 5  # Should be N_STUDENTS

# Error: object 'n_students' not found

2.1.1.3 Wrong Operators

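A common mistake is writing = where the equality operator == is needed. Inside a function call, = names an argument rather than testing equality, so the call below passes 1 as the argument x instead of filtering the rows where x equals 1: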

subset(df, x = 1)
           # ^ Should be x == 1

# Error in subset.default(df, x = 1) : 'subset' must be logical

2.1.1.4 Symbols instead of Strings

Forgetting the quotation marks around strings makes R treat them as object names, which usually do not exist:

factor(c(male, female, male))  # Should be c("male", "female", "male")

# Error: object 'male' not found

2.1.2 Runtime Errors

Runtime errors occur during the execution of your code. The code is syntactically correct but ends up encountering issues during execution. These errors can be caused by various reasons like trying to perform an operation on an object of the wrong type, accessing a non-existent variable, or using incorrect functions or data structures.

Attempting to calculate a correlation with non-numeric data:

df <- data.frame(x = c("a", "b", "c"), y = c(1, 2, 3))
cor(df)
# Error in cor(df) : 'x' must be numeric

In this example, cor() expects numeric data, but the x column contains character data, so the call fails at runtime. Note that the 'x' in the error message refers to the first argument of cor(), which here is the whole data frame, not to the column named x.
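One possible fix is to restrict the correlation to the numeric columns. The sketch below uses an illustrative data frame with an extra numeric column z, which is an assumption added so that there is something to correlate.

```r
# Illustrative data: one character column and two numeric columns.
df <- data.frame(
  x = c("a", "b", "c"),
  y = c(1, 2, 3),
  z = c(2, 4, 7)
)

# Identify the numeric columns, then correlate only those.
is_num <- vapply(df, is.numeric, logical(1))
cor_matrix <- cor(df[, is_num, drop = FALSE])
print(cor_matrix)
```

Using drop = FALSE keeps the result a data frame even when only one numeric column remains.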

Trying to access a non-existent variable:

print(non_existent_variable)
# Error: object 'non_existent_variable' not found

In this case, the variable non_existent_variable doesn’t exist in the current environment, leading to a runtime error when trying to print it.

2.1.3 Logical Errors

Logical errors are the most subtle and often the hardest to detect. These errors don’t cause the program to crash or produce error messages, but they result in incorrect output. They occur when the code doesn’t do what you intended it to do.

A flawed algorithm for calculating a factorial:

# Note: this definition also masks base::factorial()
factorial <- function(n) {
  result <- 0  # Should be result <- 1
  for (i in 1:n) {
    result <- result + i  # Should be result <- result * i
  }
  return(result)
}
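A corrected version might look like the sketch below. The name factorial_fixed is hypothetical and is chosen so that the example does not mask R's built-in factorial().

```r
# Hypothetical corrected version of the flawed function above.
factorial_fixed <- function(n) {
  result <- 1                 # multiplicative identity, not 0
  for (i in seq_len(n)) {     # seq_len(0) is empty, so n = 0 returns 1
    result <- result * i
  }
  result
}

factorial_fixed(5)
```

Using seq_len(n) instead of 1:n also avoids a second subtle bug: 1:0 counts down through 1 and 0, so the loop would run twice for n = 0.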

Off-by-one errors:

days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
for (i in 0:12) {  # Should be 1:12
 print(days_in_month[i])
}

numeric(0)
[1] 31
[1] 28
[1] 31
[1] 30
[1] 31
[1] 30
[1] 31
[1] 31
[1] 30
[1] 31
[1] 30
[1] 31
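The stray numeric(0) at the top of the output comes from indexing with 0, which R accepts silently. A minimal sketch of the usual remedy is to build the index with seq_along():

```r
days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

# seq_along() yields 1, 2, ..., length(x), so index 0 is never used.
for (i in seq_along(days_in_month)) {
  print(days_in_month[i])
}

# Indexing with 0 returns an empty vector rather than an error,
# which is why the original loop produced no error message.
length(days_in_month[0])
```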

2.1.4 Semantic Errors

Semantic errors occur when code is syntactically correct and runs without throwing exceptions, but produces incorrect or unintended results. These errors are logical flaws in the program that don’t align with the intended behavior. They can be challenging to identify because they don’t manifest as errors or warnings.

Take the following example of a function that checks if a person is an adult:

is_adult <- function(age) {
  age > 18 & age < 65  # Should be age >= 18 & age < 65
}


is_adult(17)

[1] FALSE

is_adult(18)

[1] FALSE

is_adult(19)

[1] TRUE

In this case, the function is_adult() incorrectly classifies 18-year-olds as non-adults. This is a semantic error that doesn’t produce an error message but produces incorrect results.
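Boundary mistakes like this one are easiest to catch by testing the values on either side of each limit. The sketch below assumes the intended rule is the one given in the code comment above (18 inclusive, 65 exclusive); the name is_adult_fixed is hypothetical.

```r
# Hypothetical corrected version using an inclusive lower bound.
is_adult_fixed <- function(age) {
  age >= 18 & age < 65
}

# Check the values on either side of each boundary.
boundary_ages <- c(17, 18, 64, 65)
results <- is_adult_fixed(boundary_ages)
print(results)
```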

To Do

Add a section comparing and contrasting semantic vs. logical errors.

2.2 Understanding Error Messages

The first step in debugging is to understand the error messages R provides. R error messages typically include:

The call in which the error occurred, shown after “Error in”
A brief description of the problem
For parse errors in sourced files, the line and column where parsing failed

For example:

Error in mean(x) : object 'x' not found

This error tells us that the function mean() was called with an object x, but x doesn’t exist in the current environment. From this message, we can infer that x is not defined or is out of scope.
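The parts of an error message can also be inspected programmatically. The following sketch captures the error as a condition object instead of letting it stop the script; undefined_x is a deliberately undefined, hypothetical name.

```r
# Capture the error instead of letting it halt execution.
err <- tryCatch(mean(undefined_x), error = function(e) e)

class(err)             # the condition classes, including "error"
conditionMessage(err)  # the description of the problem
conditionCall(err)     # the call associated with the error, if recorded
```

conditionMessage() and conditionCall() correspond to the description and the “Error in” part of the printed message.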

2.3 Using `print()` Statements

One of the simplest debugging techniques is to use print() statements throughout your code. This helps you track the values of variables and the flow of your program.

calculate_mean <- function(data) {
  print("Entering function")
  print(paste("Data:", data))
  result <- mean(data)
  print(paste("Result:", result))
  return(result)
}

When you run this function, you’ll see the printed messages in the console, which can help you identify where things are going wrong.

calculate_mean(c(1, 2, 3, 4, 5))

[1] "Entering function"
[1] "Data: 1" "Data: 2" "Data: 3" "Data: 4" "Data: 5"
[1] "Result: 3"

[1] 3
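Notice that paste() is vectorized, so the “Data:” line above is repeated once per element. A slightly tidier sketch collapses the vector first and uses str() for a compact summary; the function name calculate_mean_verbose is hypothetical.

```r
calculate_mean_verbose <- function(data) {
  # toString() collapses the vector into one comma-separated string,
  # so paste() produces a single line instead of one per element.
  print(paste("Data:", toString(data)))
  # str() prints a compact summary of type, length, and leading values.
  str(data)
  result <- mean(data)
  print(paste("Result:", result))
  result
}

result <- calculate_mean_verbose(c(1, 2, 3, 4, 5))
```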

2.3.1 Using Diagnostic Messages

In addition to print() statements, you can use message(), warning(), and stop() to provide more informative diagnostic messages.

2.3.1.1 `message()`

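The message() function is similar to print(), but it is intended for informational and diagnostic messages rather than regular output. It writes to the standard error stream, and its output can be silenced with suppressMessages().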

calculate_mean <- function(data) {
  if (length(data) == 0) {
    message("Data is empty")
  }
  result <- mean(data)
  return(result)
}

When you run this function with an empty data vector, you’ll see the message “Data is empty” in the console. The function will continue to execute, but the message can help you identify potential issues.
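As a small illustration of how messages differ from printed output, the sketch below runs the same kind of function with and without suppressMessages(). The name calculate_mean_msg is hypothetical.

```r
calculate_mean_msg <- function(data) {
  if (length(data) == 0) {
    message("Data is empty")  # sent to stderr, not stdout
  }
  mean(data)
}

# The message appears, and mean() of an empty numeric vector is NaN.
loud <- calculate_mean_msg(numeric(0))

# suppressMessages() silences message() output without changing the result.
quiet <- suppressMessages(calculate_mean_msg(numeric(0)))
```

Both calls return NaN; only the first one shows the message.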

2.3.1.2 `warning()`

The warning() function is used to display non-fatal issues that may affect the results of your code.

calculate_mean_with_warning <- function(data) {
  if (length(data) == 0) {
    warning("Data is empty")
  }
  result <- mean(data)
  return(result)
}

When you run this function with an empty data vector, you’ll see a warning message in the console. The function will continue to execute, but the warning can help you identify potential problems. We can also use warnings() to retrieve a list of all warnings that have been generated.

warnings()

We can modify the behavior of warnings by setting the global option warn. The three most commonly used levels are listed below; negative values, such as options(warn = -1), ignore warnings entirely.

options(warn = 0), the default, defers warnings until the top-level call has completed and then displays them.
options(warn = 1) displays warnings as they occur but does not halt execution.
options(warn = 2) converts warnings into errors that halt execution.

For debugging purposes, you may wish to use options(warn = 2) to catch issues early in your code.
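A minimal sketch of this approach follows. It changes the option temporarily and restores the previous value afterwards, so the stricter setting does not leak into the rest of the session.

```r
# options() returns the previous values, which makes restoring them easy.
old <- options(warn = 2)

# as.numeric("abc") normally warns; with warn = 2 it raises an error instead.
outcome <- tryCatch(
  as.numeric("abc"),
  error = function(e) conditionMessage(e)
)

options(old)  # restore the earlier warning level
print(outcome)
```

The captured message is prefixed with “(converted from warning)”, which makes it easy to tell promoted warnings apart from genuine errors.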

2.3.1.3 `stop()`

The stop() function allows you to halt execution and display an error message. This can be useful for catching unexpected conditions in your code.

calculate_mean_with_stop <- function(data) {
  if (length(data) == 0) {
    stop("Data is empty")
  }
  result <- mean(data)
  return(result)
}

calculate_mean_with_stop(numeric())

Error in calculate_mean_with_stop(numeric()): Data is empty
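For simple input checks, base R's stopifnot() is a compact alternative to an explicit if and stop() pair. The sketch below is one way to use it; the function name calculate_mean_checked is hypothetical.

```r
calculate_mean_checked <- function(data) {
  # stopifnot() raises an error naming the first condition that is not TRUE.
  stopifnot(is.numeric(data), length(data) > 0)
  mean(data)
}

ok <- calculate_mean_checked(c(2, 4, 6))

# Capture the error so the script can continue.
failure <- tryCatch(
  calculate_mean_checked(numeric()),
  error = function(e) conditionMessage(e)
)
print(failure)
```

The trade-off is that the generated message is less descriptive than a hand-written one such as “Data is empty”.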

2.4 Using `browser()`

The browser() function is a powerful tool for interactive debugging. When R encounters a browser() statement, it pauses execution and allows you to inspect the current environment.

calculate_mean_browser <- function(data) {
  browser()
  result <- mean(data)
  return(result)
}

calculate_mean_browser(1:5)

Called from: calculate_mean_browser(1:5)
debug: result <- mean(data)
debug: return(result)

[1] 3

When you run this function, R will pause at the browser() call, allowing you to examine and manipulate variables. This is particularly useful for understanding the state of your program at a specific point in time.

When you’re in the browser, you can navigate through the code using the following commands:

ls() to list the objects in the current environment.
print(object) to display the value of an object.
c to continue execution by exiting the browser.
f to finish the current loop or function.
n to step to the next line.
s to step into a function call.
where to display the call stack.
r to invoke a “resume” restart if one is available; otherwise r is evaluated as an ordinary R expression.
Q to quit the debugger.

We’ll see in Section 2.8 that RStudio provides a more user-friendly interface for this type of debugging through icons and breakpoints.
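Pausing on every call can be tedious, so browser() is often placed behind a condition. The following sketch pauses only when the input contains missing values; the function name and the condition are illustrative assumptions.

```r
calculate_mean_conditional <- function(data) {
  # Pause only when something looks wrong, and only in an interactive session,
  # so that scripts run with Rscript are not interrupted.
  if (interactive() && anyNA(data)) {
    browser()
  }
  mean(data, na.rm = TRUE)
}

result <- calculate_mean_conditional(c(1, NA, 3))
```

In an interactive session, this call opens the browser because the vector contains an NA; typing c continues and returns the mean of the remaining values.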

2.5 Using an interactive debugger

The debug() function flags a function so that every call to it opens the interactive debugger, which lets you step through the function line by line. For example:

debug(calculate_mean)
calculate_mean(c(1, 2, 3, 4, 5))

debugging in: calculate_mean(c(1, 2, 3, 4, 5))
debug: {
    if (length(data) == 0) {
        message("Data is empty")
    }
    result <- mean(data)
    return(result)
}
debug: if (length(data) == 0) {
    message("Data is empty")
}
debug: result <- mean(data)
debug: return(result)
exiting from: calculate_mean(c(1, 2, 3, 4, 5))

[1] 3

Notice that calculate_mean() is now in debug mode. Whenever you call it, R pauses at the first line of the function and lets you step through the code. You can use the same commands as in browser() to inspect variables and control the flow of the program.

If we want to remove the debugger, we would use:

undebug(calculate_mean)

If we only want to debug the next call, we can use debugonce() instead. It debugs a single call to the function and then removes the debugging flag automatically.
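It is easy to forget which functions are still flagged for debugging. Base R's isdebugged() reports the current state, as in this sketch with a hypothetical stand-in function add_one:

```r
# A small stand-in function (hypothetical) to flag for debugging.
add_one <- function(x) x + 1

debug(add_one)
flag_after_debug <- isdebugged(add_one)    # every call would now open the debugger

undebug(add_one)
flag_after_undebug <- isdebugged(add_one)  # the flag has been cleared

print(c(flag_after_debug, flag_after_undebug))
```

The function is never called here, so the debugger itself does not open.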

2.6 Using `traceback()`

traceback() shows you the sequence of function calls that led to an error. It’s particularly useful for understanding errors in complex, nested function calls.

f <- function(x) g(x)
g <- function(x) h(x)
h <- function(x) x + "a"

f(10)  # This will cause an error

Error in x + "a": non-numeric argument to binary operator

traceback()
# 3. h(x)
# 2. g(x)
# 1. f(10)
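traceback() is meant for use after an uncaught error at the console. When the error is caught inside a script, a rough equivalent is to record the call stack from a calling handler. The sketch below is one way to do that with base R; the variable names are hypothetical.

```r
f <- function(x) g(x)
g <- function(x) h(x)
h <- function(x) x + "a"

captured_calls <- NULL

outcome <- tryCatch(
  withCallingHandlers(
    f(10),
    # A calling handler runs before the stack unwinds, so sys.calls()
    # still includes the frames for f(), g(), and h().
    error = function(e) captured_calls <<- sys.calls()
  ),
  error = function(e) conditionMessage(e)
)

# Names of the functions on the stack when the error was signalled.
# The list also contains the handler machinery (tryCatch and friends).
stack_functions <- vapply(
  as.list(captured_calls),
  function(cl) deparse(cl[[1]])[1],
  character(1)
)
print(stack_functions)
```

Unlike traceback(), the captured stack is listed from the outermost call to the innermost one.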

2.7 Implementing Error Handling

To Do

Add a section describing fail-fast principles

Proper error handling can make debugging easier. Use try(), tryCatch(), and custom error messages to manage potential issues.

safe_mean <- function(x) {
  tryCatch(
    mean(x),
    error = function(e) {
      message("An error occurred: ", e$message)
      return(NA)
    }
  )
}

safe_mean("bad input")

Warning in mean.default(x): argument is not numeric or logical: returning NA

[1] NA

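In this example, safe_mean() attempts to calculate the mean of a vector. If an error occurs, the handler prints a message and returns NA instead of halting execution. Note that the call above does not actually reach the error handler: mean() only issues a warning for character input and returns NA itself, which is why the output shows a warning rather than the “An error occurred” message.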
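Because mean() signals a warning rather than an error for character input, a variant that should intercept both needs a warning handler as well. The following is a minimal sketch; the name safe_mean_strict and the undefined object are hypothetical.

```r
safe_mean_strict <- function(x) {
  tryCatch(
    mean(x),
    warning = function(w) {
      message("A warning occurred: ", conditionMessage(w))
      NA_real_
    },
    error = function(e) {
      message("An error occurred: ", conditionMessage(e))
      NA_real_
    }
  )
}

from_warning <- safe_mean_strict("bad input")       # mean() warns on character input
from_error   <- safe_mean_strict(undefined_object)  # evaluating the argument fails
ok           <- safe_mean_strict(c(1, 2, 3))
```

Arguments are evaluated lazily, so the lookup of the undefined object happens inside tryCatch() and is handled there.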

2.8 Using RStudio’s Debugging Tools

If you’re using RStudio, take advantage of its built-in debugging tools:

Breakpoints: Click next to a line number to set a breakpoint.
Environment pane: Inspect variable values during debugging.
Debug toolbar: Step through code, examine the call stack, and more.

To Do

Add screenshot of the debugger

2.9 Leveraging Package-Specific Debugging Tools

Some R packages provide their own debugging tools. For example, the debugr package offers a more streamlined print debugging experience.

To Do

Add an example of using debugr

2.10 Common Debugging Pitfalls

Be aware of common issues that can complicate debugging:

Scope issues: Ensure you’re looking at the correct environment.
Data type mismatches: Check that your functions are receiving the expected data types.
Missing values: Be cautious of NA values affecting your calculations.
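The three pitfalls above can each be reproduced in a few lines. The values and names in this sketch are illustrative only.

```r
values <- c(1, NA, 3)

# Missing values: one NA makes the mean NA unless it is removed.
mean_with_na    <- mean(values)                # NA
mean_without_na <- mean(values, na.rm = TRUE)  # 2

# Data types: numbers read as text are character, not numeric.
text_numbers <- c("1", "2", "3")
class(text_numbers)                            # "character"
converted <- as.numeric(text_numbers)

# Scope: assignments inside a function do not change the caller's variable.
counter <- 0
increment <- function() {
  counter <- counter + 1  # creates a local copy
  counter
}
returned <- increment()   # 1
counter                   # still 0
```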

2.11 Best Practices for Debuggable Code

Write code that’s easier to debug:

Use meaningful variable names
Break complex operations into smaller, testable functions
Comment your code thoroughly
Use version control to track changes

2.12 Debugging in Quarto

To Do

Add a section on debugging with Quarto

2.13 Summary

By mastering these debugging techniques and tools, you’ll be well-equipped to tackle even the most challenging coding issues in R. Remember, effective debugging is not just about fixing errors; it’s about understanding your code more deeply and writing more robust programs.
