factorial <- function(n) {
result <- 0
for (i in 1:n) {
result <- result + i # Should be result <- result * i
}
return(result)
}
2 Debugging R Code
Debugging is an essential skill for any R programmer. It’s the process of finding and fixing errors or unexpected behaviors in your code. This chapter will cover various techniques and tools to help you effectively debug R code.
2.1 Types of Errors in R
Understanding the different types of errors you may encounter is the first step in debugging code. In R, we typically deal with four main categories of errors: syntax, runtime, logical, and semantic.
2.1.1 Syntax Errors
Syntax errors occur when the R interpreter cannot parse your code due to violations of the language’s syntax rules. These errors prevent your code from running at all. These errors are quicker to spot and fix because they prevent the code from running. Let’s look at a couple common syntax errors in R:
Cut down and/or move some examples to an appendix
2.1.1.1 Missing terms
Common syntax errors in R include missing terms in function calls or expressions. For example, if you forget to include a comma between arguments in a function call, R will throw an error:
plot(x, y type = "l")
# ^ Missing comma
# Error: unexpected symbol in "plot(x, y type"
Or, if you forget to include a plus sign in a linear model formula, R will throw an error:
lm(y ~ x1 + x2 x3, data = my_data)
# ^ Missing plus sign
# Error: unexpected symbol in "lm(y ~ x1 + x2 x3"
The more severe syntax errors appear when you forget to close parentheses or quotation marks. For example:
lm(y ~ x1 + x2, data = my_data
# ^ Missing closing parenthesis
# > lm(y ~ x1 + x2, data = my_data
# +
# ^ R is waiting for more input.
print("Hello, world!)
# ^ Missing quotation mark
# > print("Hello, world!)
# +
# ^ R is waiting for more input.
These syntax errors will require you to first stop R from waiting for more input by pressing Esc
or Ctrl + C
and then adding the missing parenthesis or quotation mark.
2.1.1.2 Case and Naming Sensitivity
R is case-sensitive, so you must use the correct case for function names, variable names, and other identifiers. For example, if you try to call a function with the wrong name, R will throw an error:
summry(my_data) # Should be summary()
# Error in summry(my_data) : could not find function "summry"
Similarly, if you use the wrong case for a variable name, R will not recognize the variable:
N_STUDENTS <- 20
y <- n_students + 5 # Should be N_STUDENTS
# Error: object 'n_students' not found
2.1.1.3 Wrong Operators
Using an assignment operator instead of the equality operator in a conditional statement.
subset(df, x = 1)
# ^ Should be x == 1
# Error in subset.default(df, x = 1) : 'subset' must be logical
2.1.1.4 Symbols instead of Strings
Using symbols instead of strings in function calls or assignments.
2.1.2 Runtime Errors
Runtime errors occur during the execution of your code. The code is syntactically correct but ends up encountering issues during execution. These errors can be caused by various reasons like trying to perform an operation on an object of the wrong type, accessing a non-existent variable, or using incorrect functions or data structures.
- Attempting to calculate a correlation with non-numeric data
df <- data.frame(x = c("a", "b", "c"), y = c(1, 2, 3))
cor(df)
# Error in cor(df) : 'x' must be numeric
In this example, the cor()
function expects numeric data, but the x
column contains character data, resulting in a runtime error when trying to calculate the correlation.
- Trying to access a non-existent variable.
print(non_existent_variable)
# Error: object 'non_existent_variable' not found
In this case, the variable non_existent_variable
doesn’t exist in the current environment, leading to a runtime error when trying to print it.
2.1.3 Logical Errors
Logical errors are the most subtle and often the hardest to detect. These errors don’t cause the program to crash or produce error messages, but they result in incorrect output. They occur when the code doesn’t do what you intended it to do.
Flawed algorithm for calculating factorial
Off-by-one errors:
2.1.4 Semantic Errors
Semantic errors occur when code is syntactically correct and runs without throwing exceptions, but produces incorrect or unintended results. These errors are logical flaws in the program that don’t align with the intended behavior. They can be challenging to identify because they don’t manifest as errors or warnings.
Take the following example of a function that checks if a person is an adult:
is_adult <- function(age) {
age > 18 & age < 65 # Should be age >= 18 & age < 65
}
is_adult(17)
[1] FALSE
is_adult(18)
[1] FALSE
is_adult(19)
[1] TRUE
In this case, the function is_adult()
incorrectly classifies 18-year-olds as non-adults. This is a semantic error that doesn’t produce an error message but produces incorrect results.
Add a section comparing and contrasting semantic vs. logical errors.
2.2 Understanding Error Messages
The first step in debugging is to understand the error messages R provides. R error messages typically include:
- The type of error
- The line number where the error occurred
- A brief description of the problem
For example:
in mean(x) : object 'x' not found Error
This error tells us that the function mean()
was called with an object x
, but x
doesn’t exist in the current environment. From this message, we can infer that x
is not defined or is out of scope.
2.3 Using print()
Statements
One of the simplest debugging techniques is to use print()
statements throughout your code. This helps you track the values of variables and the flow of your program.
When you run this function, you’ll see the printed messages in the console, which can help you identify where things are going wrong.
calculate_mean(c(1, 2, 3, 4, 5))
[1] "Entering function"
[1] "Data: 1" "Data: 2" "Data: 3" "Data: 4" "Data: 5"
[1] "Result: 3"
[1] 3
2.3.1 Using Diagnostic Messages
In addition to print()
statements, you can use message()
, warning()
, and stop()
to provide more informative diagnostic messages.
2.3.1.1 message()
The message()
function is similar to print()
but is intended for informational messages rather than debugging output.
When you run this function with an empty data vector, you’ll see the message “Data is empty” in the console. The function will continue to execute, but the message can help you identify potential issues.
2.3.1.2 warning()
The warning()
function is used to display non-fatal issues that may affect the results of your code.
When you run this function with an empty data vector, you’ll see a warning message in the console. The function will continue to execute, but the warning can help you identify potential problems. We can also use warnings()
to retrieve a list of all warnings that have been generated.
warnings()
We can modify the behavior of warnings by setting the global option warn
. In particular, warn
offers three levels of support:
-
options(warn = 0)
will suppress all warnings until the top-level function has completed. -
options(warn = 1)
will display warnings as they occur but not halt execution. -
options(warn = 2)
converts warnings into errors that will halt execution.
For debugging purposes, you may wish to use options(warn = 2)
to catch issues early in your code.
2.3.1.3 stop()
The stop()
function allows you to halt execution and display an error message. This can be useful for catching unexpected conditions in your code.
2.4 Using browser()
The browser()
function is a powerful tool for interactive debugging. When R encounters a browser()
statement, it pauses execution and allows you to inspect the current environment.
calculate_mean_browser <- function(data) {
browser()
result <- mean(data)
return(result)
}
calculate_mean_browser(1:5)
Called from: calculate_mean_browser(1:5)
debug: result <- mean(data)
debug: return(result)
[1] 3
When you run this function, R will pause at the browser()
call, allowing you to examine and manipulate variables. This is particularly useful for understanding the state of your program at a specific point in time.
When you’re in the browser, you can navigate through the code using the following commands:
-
ls()
to list the objects in the current environment. -
print(object)
to display the value of an object. -
c
to continue execution by exiting the browser. -
f
to finish the current loop or function. -
n
to step to the next line. -
s
to step into a function call. -
where
to display the call stack. -
r
to restart if you want to re-run the function. -
Q
to quit the debugger.
We’ll see in Section 2.8 that RStudio provides a more user-friendly interface for this type of debugging through icons and breakpoints.
2.5 Using an interactive debugger
The debug()
function allows you to step through a function line by line. We can use debug()
to immediately start debugging a function when it is called. For example:
debugging in: calculate_mean(c(1, 2, 3, 4, 5))
debug: {
if (length(data) == 0) {
message("Data is empty")
}
result <- mean(data)
return(result)
}
debug: if (length(data) == 0) {
message("Data is empty")
}
debug: result <- mean(data)
debug: return(result)
exiting from: calculate_mean(c(1, 2, 3, 4, 5))
[1] 3
Notice, the function calculate_mean
is now in debug mode. When you run the function, R will pause at the first line of the function and allow you to step through the code. You can use similar commands as we saw with browser()
to inspect variables and control the flow of the program.
If we want to remove the debugger, we would use:
undebug(calculate_mean)
Though, if we only want to debug the next call, we can use debugonce()
instead. This will only debug the next call to the function without needing to remove the debugger afterwards.
2.6 Using traceback()
traceback()
shows you the sequence of function calls that led to an error. It’s particularly useful for understanding errors in complex, nested function calls.
f <- function(x) g(x)
g <- function(x) h(x)
h <- function(x) x + "a"
f(10) # This will cause an error
Error in x + "a": non-numeric argument to binary operator
traceback()
# 3. h(x)
# 2. g(x)
# 1. f(10)
2.7 Implementing Error Handling
Add a section describing fail-fast principles
Proper error handling can make debugging easier. Use try()
, tryCatch()
, and custom error messages to manage potential issues.
safe_mean <- function(x) {
tryCatch(
mean(x),
error = function(e) {
message("An error occurred: ", e$message)
return(NA)
}
)
}
safe_mean("bad input")
Warning in mean.default(x): argument is not numeric or logical: returning NA
[1] NA
In this example, safe_mean()
attempts to calculate the mean of a vector. If an error occurs, it prints a message and returns NA
instead of halting execution.
2.8 Using RStudio’s Debugging Tools
If you’re using RStudio, take advantage of its built-in debugging tools:
- Breakpoints: Click next to a line number to set a breakpoint.
- Environment pane: Inspect variable values during debugging.
- Debug toolbar: Step through code, examine the call stack, and more.
Add screenshot of the debugger
2.9 Leveraging Package-Specific Debugging Tools
Some R packages provide their own debugging tools. For example, the debugr
package offers a more streamlined print debugging experience.
Add an example of using debugr
2.10 Common Debugging Pitfalls
Be aware of common issues that can complicate debugging:
- Scope issues: Ensure you’re looking at the correct environment.
- Data type mismatches: Check that your functions are receiving the expected data types.
- Missing values: Be cautious of
NA
values affecting your calculations.
2.11 Best Practices for Debuggable Code
Write code that’s easier to debug:
- Use meaningful variable names
- Break complex operations into smaller, testable functions
- Comment your code thoroughly
- Use version control to track changes
2.12 Debugging in Quarto
Add a section on debugging with Quarto
2.13 Summary
Through mastering these debugging techniques and tools, you’ll be well-equipped to tackle even the most challenging coding issues in R. Remember, effective debugging is not just about fixing errors—it’s about understanding your code more deeply and writing more robust programs.