Structured Errors in Plumber APIs

#r #plumber

If you’ve used the Plumber package to make R models or other code accessible to others via an API, sooner or later you will need to decide how to handle and report errors.

By default, Plumber will catch R-level errors (like calls to stop()) and report them to users of your API as a JSON-encoded error message with HTTP status code 500 – also known as Internal Server Error. This might look something like the following from the command line:

$ curl -v localhost:8000/status
> GET /status HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.64.0
> Accept: */*
> 
< HTTP/1.1 500 Internal Server Error
< Date: Sun, 24 Mar 2019 22:56:27 GMT
< Content-Type: application/json
< Date: Sun, 24 Mar 2019 10:56:27 PM GMT
< Connection: close
< Content-Length: 97
< 
* Closing connection 0
{"error":["500 - Internal server error"],"message":["Error: Missing required 'id' parameter.\n"]}

There are two problems with this approach. First, it gives you almost no control over how errors are reported to real users. Second, it is badly behaved at the protocol level: every failure is reported as a 500, even though HTTP status codes allow for much more granular and semantically meaningful error reporting.

In my view, the key to overcoming these problems is to treat errors as more than simply a message and to attach additional context when they are emitted. This is sometimes called structured error handling, and although it has not been used much historically in R, this may be changing. As you’ll see, we can take advantage of R’s powerful condition system to implement rich error handling and reporting for Plumber APIs with relative ease.

But first, it’s worth asking precisely what we want to get out of such an error handling system – that is, how can we distinguish the errors we want our users to see from the ones we don’t?

Operational vs. Programmer Errors

Part of the issue here is that R (and Plumber) treat all errors as essentially the same, when in practice this is not the case.

The folks at Joyent coined the terms operational and programmer errors in the context of JavaScript, and I think this distinction is apt for R as well. To quote the article at length:

People use the term “errors” to talk about both operational and programmer errors, but they’re really quite different. Operational errors are error conditions that all correct programs must deal with, and as long as they’re dealt with, they don’t necessarily indicate a bug or even a serious problem. “File not found” is an operational error, but it doesn’t necessarily mean anything’s wrong. It might just mean the program has to create the file it’s looking for first.

By contrast, programmer errors are bugs. They’re cases where you made a mistake, maybe by forgetting to validate user input, mistyping a variable name, or something like that. By definition there’s no way to handle those. If there were, you would have just used the error handling code in place of the code that caused the error!

In the context of Plumber APIs, we want to notify users of operational errors, because we require that they address these errors in order to use the API correctly. Programmer errors, on the other hand, might generate bizarre or misleading messages – so it’s not clear we want users to see them at all. At the same time, it is very important that we see them so that we can start to track down the underlying bugs that caused them.

Operational and programmer errors also have a very natural expression in terms of HTTP status codes; for the most part, 4xx codes are for client (operational) errors, and 5xx codes are for server (programmer) errors.
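The mapping can be made concrete with a tiny classifier. This is only a sketch: `error_kind()` is a hypothetical helper, not part of Plumber, and it treats every code outside the 400 to 599 range as a non-error.

```r
# Hypothetical helper (not part of Plumber): map HTTP status codes onto the
# operational/programmer distinction described above.
error_kind <- function(status) {
  status <- as.integer(status)
  ifelse(
    status >= 400L & status < 500L, "operational (client)",
    ifelse(status >= 500L & status < 600L, "programmer (server)", "not an error")
  )
}

error_kind(c(400, 404, 500, 200))
```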

By design, Plumber assumes that any error it encounters while running your code is a programmer error. This is the right default, but it does mean that you need to go out of your way to report operational errors instead. You can see this clearly by attempting to use R-style errors in a Plumber API.

R-Style Error Handling in Plumber

Suppose we have the following simple plumber.R file, which allows users to query the status of some familiar institutional patients:

records <- data.frame(
  id = 1:3,
  name = c("George", "Sally", "Michael"),
  admitted = c("2018-01-03", "2018-04-14", "2018-05-26"),
  released = c("2018-11-27", "2018-12-25", NA)
)

#* @param id:numeric* The patient's ID number.
#* @serializer unboxedJSON
#* @get /status
status <- function(id = NULL) {
  id <- as.integer(id)
  record <- records[records$id == id,]
  record$status <- if (!is.na(record$released)) "Released" else "Admitted"

  unclass(record)
}

You can run this in the usual way with:

server <- plumber::plumb("plumber.R")
server$run(port = 8000, debug = TRUE, swagger = FALSE)

And normal queries should look like the following from the command line (with curl and jq):

$ curl -s localhost:8000/status?id=2 | jq
{
  "id": 2,
  "name": "Sally",
  "admitted": "2018-04-14",
  "released": "2018-12-25",
  "status": "Released"
}
$ curl -s localhost:8000/status?id=3 | jq
{
  "id": 3,
  "name": "Michael",
  "admitted": "2018-05-26",
  "released": null,
  "status": "Admitted"
}

Of course, there are a number of ways the endpoint could fail, so let’s add some R-style error handling:

status <- function(id) {
  if (missing(id)) {
    stop("Missing required 'id' parameter.", call. = FALSE)
  }
  id <- suppressWarnings(as.integer(id))
  if (is.na(id)) {
    stop("The 'id' parameter must be a positive integer.", call. = FALSE)
  }

  record <- records[records$id == id,]
  if (nrow(record) == 0) {
    stop("No patient found with id: ", id, ".", call. = FALSE)
  }
  record$status <- if (!is.na(record$released)) "Released" else "Admitted"

  unclass(record)
}
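Because an endpoint is an ordinary R function, it can be exercised at the console without a running server. The sketch below repeats the data and the function above (adding `stringsAsFactors = FALSE`, an assumption for pre-4.0 R) and catches each error the way a generic `tryCatch()` would. What it illustrates: all three failures arrive as plain `simpleError` objects, so only the message text tells them apart. `catch_error()` is a hypothetical helper.

```r
records <- data.frame(
  id = 1:3,
  name = c("George", "Sally", "Michael"),
  admitted = c("2018-01-03", "2018-04-14", "2018-05-26"),
  released = c("2018-11-27", "2018-12-25", NA),
  stringsAsFactors = FALSE
)

status <- function(id) {
  if (missing(id)) {
    stop("Missing required 'id' parameter.", call. = FALSE)
  }
  id <- suppressWarnings(as.integer(id))
  if (is.na(id)) {
    stop("The 'id' parameter must be a positive integer.", call. = FALSE)
  }
  record <- records[records$id == id, ]
  if (nrow(record) == 0) {
    stop("No patient found with id: ", id, ".", call. = FALSE)
  }
  record$status <- if (!is.na(record$released)) "Released" else "Admitted"
  unclass(record)
}

# Hypothetical helper: evaluate `expr` and return the error object, if any.
catch_error <- function(expr) tryCatch(expr, error = function(e) e)

errs <- list(
  missing   = catch_error(status()),
  invalid   = catch_error(status("cats")),
  not_found = catch_error(status("4"))
)

# Every one of these is a plain "simpleError"; nothing but the message differs.
sapply(errs, function(e) class(e)[1])
sapply(errs, conditionMessage)
```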

We can then test some error conditions:

$ curl localhost:8000/status | jq
{
  "error": "500 - Internal server error",
  "message": "Error: Missing required 'id' parameter.\n"
}
$ curl localhost:8000/status?id=cats | jq
{
  "error": "500 - Internal server error",
  "message": "Error: The 'id' parameter must be a positive integer.\n"
}
$ curl localhost:8000/status?id=4 | jq
{
  "error": "500 - Internal server error",
  "message": "Error: No patient found with id: 4.\n"
}

You might notice that I passed debug = TRUE to the run() method above; this is because Plumber will only show the message field in error responses in “debug” mode. This is partly for privacy – error messages could expose internal state you’d prefer users not to see – but it is also in recognition of the point I made above: random R error messages are rarely helpful to users.

Plumber’s default error handler does a few useful things:

Prints the error to the console, so we can see it on the server side. This is absolutely essential for tracking down bugs.
Sets the status code to 500; and
Adds the error message to the response (as the message field you see above), but only when running in debug mode.
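To make those three steps concrete, here is a rough approximation of such a handler. It is not Plumber's actual source: `default_like_handler()` is hypothetical, `debug` is passed in explicitly rather than read from the server, and `res` is a mock environment standing in for Plumber's response object. (The real message field also carries an "Error: " prefix and a trailing newline, as in the output above.)

```r
# Hypothetical approximation of the default handler's three steps.
default_like_handler <- function(req, res, err, debug = FALSE) {
  print(err)                               # 1. log it on the server side
  res$status <- 500                        # 2. always report a 500
  body <- list(error = "500 - Internal server error")
  if (debug) {
    body$message <- conditionMessage(err)  # 3. expose the message in debug mode only
  }
  body                                     # a list, left for a serializer to turn into JSON
}

# Mock response object: an environment, so the status change is visible outside.
res <- new.env()
res$status <- 200

default_like_handler(NULL, res, simpleError("Missing required 'id' parameter."))
res$status
```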

Unfortunately, this all means that we can’t use the default handler to send operational error messages back to the user. Instead, we can either circumvent it by constructing error responses manually or override it with smarter code.

Manual Error Reporting

To generate useful operational errors for users, we need to do two things: first, come up with a meaningful payload for errors; and second, ensure that errors set an appropriate HTTP status code. Both of these can be accomplished by manually modifying the response object that Plumber exposes as the magic parameter res.

There are many, many different takes on how to report errors in JSON; I’m going to use a pretty simple one here and include just a status code¹ and a message. For example:

{
  "status": 400,
  "message": "Missing required parameter."
}

Similarly, there is some debate on how to map errors like “invalid parameter” to HTTP status codes, but here I’ll use 400. Both 422 and 409 are common alternatives. For the case when a patient can’t be found, I also think it makes sense to use 404.
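One way to keep those choices in a single place is a small lookup table plus a payload builder. Both `error_status` and `error_payload()` are hypothetical names for this sketch; `jsonlite::toJSON()` with `auto_unbox = TRUE` is the same call the endpoint code uses, and it also takes care of escaping the message text.

```r
# Hypothetical mapping from kinds of operational error to the status codes
# chosen here (422 would be a reasonable alternative to 400).
error_status <- c(
  missing_param = 400L,
  invalid_param = 400L,
  not_found     = 404L
)

# Build a {"status": ..., "message": ...} payload for a given kind of error.
error_payload <- function(kind, message) {
  jsonlite::toJSON(
    list(status = error_status[[kind]], message = message),
    auto_unbox = TRUE
  )
}

error_payload("not_found", "No patient found with id: 4.")
# gives {"status":404,"message":"No patient found with id: 4."}
```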

status <- function(id, res) {
  if (missing(id)) {
    res$status <- 400
    res$body <- jsonlite::toJSON(auto_unbox = TRUE, list(
      status = 400,
      message = "Missing required 'id' parameter."
    ))
    return(res)
  }
  id <- suppressWarnings(as.integer(id))
  if (is.na(id)) {
    res$status <- 400
    res$body <- jsonlite::toJSON(auto_unbox = TRUE, list(
      status = 400,
      message = "The 'id' parameter must be a positive integer."
    ))
    return(res)
  }

  record <- records[records$id == id,]
  if (nrow(record) == 0) {
    res$status <- 404
    res$body <- jsonlite::toJSON(auto_unbox = TRUE, list(
      status = 404,
      message = paste0("No patient found with id: ", id, ".")
    ))
    return(res)
  }
  record$status <- if (!is.na(record$released)) "Released" else "Admitted"

  unclass(record)
}

This gives us much nicer, more meaningful errors we can safely pass down to users of the API:

$ curl -s localhost:8000/status | jq
{
  "status": 400,
  "message": "Missing required 'id' parameter."
}
$ curl -s localhost:8000/status?id=moose | jq
{
  "status": 400,
  "message": "The 'id' parameter must be a positive integer."
}
$ curl -s localhost:8000/status?id=4 | jq
{
  "status": 404,
  "message": "No patient found with id: 4."
}

The code to manipulate res objects for error handling ends up involving a lot of copy & paste, especially for larger APIs where you want to report certain classes of errors in a standard way. Ideally, we want to provide some helper functions so that API authors do the right thing without needing to copy so much code.
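A first step in that direction, short of the condition-based approach, is to fold the repeated lines into one function. `respond_error()` is a hypothetical helper; the sketch relies on Plumber's `res` having reference semantics (it is an R6 object), which a plain environment mimics here for trying it out.

```r
# Hypothetical helper: set the status and a JSON error body on `res` in one go.
respond_error <- function(res, status, message) {
  res$status <- status
  res$body <- jsonlite::toJSON(
    list(status = status, message = message),
    auto_unbox = TRUE
  )
  res
}

# Inside an endpoint, each check then shrinks to:
#   if (missing(id)) {
#     return(respond_error(res, 400, "Missing required 'id' parameter."))
#   }

# Stand-in for Plumber's response object.
res <- new.env()
respond_error(res, 404, "No patient found with id: 4.")
res$status
```

Each check still needs its own `return()`, which is the remaining boilerplate that signalling a condition avoids: an error unwinds the endpoint on its own.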

Emitting Errors via Custom Conditions

The underlying machinery that powers R’s stop(), warning(), and message() is the concept of a condition. We can construct and “signal” error-like conditions using a simple S3 object that inherits from the "error" class:

api_error <- function(message, status) {
  err <- structure(
    list(message = message, status = status),
    class = c("api_error", "error", "condition")
  )
  # stop() signals the condition and, unlike signalCondition(), also halts
  # execution when no handler catches it.
  stop(err)
}

# Works like stop():
api_error("Bad request.", 400)
#> Error: Bad request.
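The payoff of attaching `status` to the condition is that a handler can dispatch on class and read the field back. The sketch below is self-contained, so it defines `api_error()` itself, using `stop()` so that an uncaught condition still aborts. `classify()` is a hypothetical stand-in for the class-based dispatch an error handler performs.

```r
# Self-contained version of api_error(): stop() signals the condition and,
# if no handler catches it, aborts with "Error: <message>".
api_error <- function(message, status) {
  stop(structure(
    list(message = message, status = status),
    class = c("api_error", "error", "condition")
  ))
}

# Hypothetical dispatcher: in tryCatch() the first matching handler listed
# wins, so api_error conditions are caught first and any other error falls
# through to a generic 500.
classify <- function(expr) {
  tryCatch(
    expr,
    api_error = function(e) list(status = e$status, message = conditionMessage(e)),
    error = function(e) list(status = 500, message = "Internal server error.")
  )
}

classify(api_error("Bad request.", 400))  # status 400, message "Bad request."
classify(stop("object 'x' not found"))    # status 500, generic message
```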

Moreover, since these are S3 objects, we can use the class attribute to sort out which errors are purposeful, operational errors that need to be reported to the user, and those that are not:

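error_handler <- function(req, res, err) {
  if (!inherits(err, "api_error")) {
    res$status <- 500
    res$body <- jsonlite::toJSON(auto_unbox = TRUE, list(
      status = 500,
      message = "Internal server error."
    ))

    # Print the internal error so we can see it from the server side. A more
    # robust implementation would use proper logging.
    print(err)
  } else {
    # We know that the message is intended to be user-facing. Building the
    # body with jsonlite ensures quotes in the message are escaped properly.
    res$status <- err$status
    res$body <- jsonlite::toJSON(auto_unbox = TRUE, list(
      status = err$status,
      message = err$message
    ))
  }
  # The body is built by hand, so declare its content type explicitly.
  res$setHeader("Content-Type", "application/json")
  res
}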

# Add this to the server with
# server$setErrorHandler(error_handler)
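If you want to test the operational/programmer split without starting a server, one option is to move the decision into a pure function and keep the handler as a thin wrapper around it. This is a refactoring suggestion rather than the approach above: `error_payload_for()` and `thin_error_handler()` are hypothetical names, and the wrapper skips header handling so that a plain environment can serve as a mock `res`.

```r
# Hypothetical pure function: decide what the client should see.
error_payload_for <- function(err) {
  if (inherits(err, "api_error")) {
    list(status = err$status, message = conditionMessage(err))
  } else {
    list(status = 500, message = "Internal server error.")
  }
}

# Thin wrapper with the same (req, res, err) signature as error_handler().
thin_error_handler <- function(req, res, err) {
  if (!inherits(err, "api_error")) {
    print(err)  # programmer error: log it on the server side
  }
  payload <- error_payload_for(err)
  res$status <- payload$status
  res$body <- jsonlite::toJSON(payload, auto_unbox = TRUE)
  res
}

# Exercise both branches against a mock response object.
op_err <- structure(
  list(message = "Bad request.", status = 400),
  class = c("api_error", "error", "condition")
)
res <- new.env()
thin_error_handler(NULL, res, op_err)
res$status  # 400
thin_error_handler(NULL, res, simpleError("object 'x' not found"))
res$status  # 500
```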

I’d also advise writing some helper functions, like the following:

not_found <- function(message = "Not found.") {
  api_error(message = message, status = 404)
}

missing_params <- function(message = "Missing required parameters.") {
  api_error(message = message, status = 400)
}

invalid_params <- function(message = "Invalid parameter value(s).") {
  api_error(message = message, status = 400)
}
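Since the helpers differ only in status code and default message, they could also be generated by a small factory. This is an optional variation, not something the approach requires; `make_api_error()` is a hypothetical name, and `api_error()` is restated (using `stop()`) so the block runs on its own.

```r
api_error <- function(message, status) {
  stop(structure(
    list(message = message, status = status),
    class = c("api_error", "error", "condition")
  ))
}

# Hypothetical factory: returns a helper with the status code baked in.
make_api_error <- function(status, default_message) {
  force(status)
  force(default_message)
  function(message = default_message) {
    api_error(message = message, status = status)
  }
}

not_found      <- make_api_error(404, "Not found.")
missing_params <- make_api_error(400, "Missing required parameters.")
invalid_params <- make_api_error(400, "Invalid parameter value(s).")

# Catch one to check that the status travels with the condition.
e <- tryCatch(not_found(), api_error = function(e) e)
e$status             # 404
conditionMessage(e)  # "Not found."
```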

These helper functions allow us to simplify and clarify the code so that it is as concise and familiar looking as it was when we were using stop():

status <- function(id, res) {
  if (missing(id)) {
    missing_params("Missing required 'id' parameter.")
  }
  id <- suppressWarnings(as.integer(id))
  if (is.na(id)) {
    invalid_params("The 'id' parameter must be a positive integer.")
  }

  record <- records[records$id == id,]
  if (nrow(record) == 0) {
    not_found(paste0("No patient found with id: ", id, "."))
  }
  record$status <- if (!is.na(record$released)) "Released" else "Admitted"

  unclass(record)
}

Using a custom error handler and the structured error support of S3 conditions, we now have a way to emit operational errors with ease and a consistent JSON error reporting format. This is an essential piece of providing a robust, user-friendly Plumber API.

¹ I like having the original status code as part of the error payload. That way, even if I don’t have access to the full original request (e.g. someone just copy & pasted the error message to me, or it’s not in the logs, or a proxy along the way did not forward it appropriately), I still have a good idea where to look.