Building an Error Library

#rust

Error handling is one of those things that quietly shapes the entire experience of a system. When it works well, users recover quickly, support teams stay efficient, and developers can focus on real problems. When it doesn’t, even small issues turn into long, frustrating investigations.

To improve this, it helps to start with a simple question: what do we actually want from our errors? A big part of the problem with many existing approaches is that they treat all errors more or less the same. In practice, though, not all errors are equal. There’s an important difference between something the client did wrong and something that failed inside the system, and that difference should shape how we handle them.

When a client makes a mistake—bad input, incorrect API usage, missing data—the system should make it obvious what went wrong and how to fix it. That means returning clear, human-readable messages, along with structured error codes for programmatic use. It also means logging enough context and publishing metrics so patterns can be spotted over time. Ideally, clients should be able to resolve these issues on their own, without needing to contact support.

At the same time, if support does get involved, they should have everything they need to understand the issue quickly. The goal is to avoid pulling in developers for cases that aren’t actually system problems.

Service errors are a different story. These are failures inside the system or in its dependencies, and they require a different kind of response. Instead of focusing on user clarity, the focus shifts to fast diagnosis. Errors should automatically trigger alerts, include detailed diagnostic information, and make it easier to identify which component, or even which upstream service, caused the issue.

A good error library connects all of these needs. It gives clients clarity, gives support teams enough context to investigate, and gives developers the signals they need to quickly find and fix real problems.

Types of Errors

Every interaction in the system—whether it’s a function call, API request, internal service, or upstream dependency—ultimately results in one of three outcomes: success, a client error, or a service error.

Making this distinction explicit is important because it drives how the system responds, who needs to act, and what kind of information should be surfaced. At its simplest, this classification can be represented with a small enum:

pub enum ResultType {
    Success,
    ClientError,
    ServiceError,
}

With this in place, every response can be consistently tagged with a specific type. That single piece of information becomes a powerful signal: it determines whether we return a user-friendly message, enable self-service debugging, or trigger alerts and deeper diagnostics for engineers.
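As a rough sketch of how that tag could drive behavior, the function below maps each ResultType to a follow-up action. The Action enum and the action_for function are hypothetical names introduced only for illustration; they are not part of any crate mentioned here.

```rust
/// Outcome classification, repeated here so the sketch compiles on its own.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ResultType {
    Success,
    ClientError,
    ServiceError,
}

/// Hypothetical follow-up actions (an assumption for this example).
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Nothing to do beyond recording a success metric.
    RecordSuccess,
    /// Return a readable message and count the error for trend analysis.
    ExplainToClient,
    /// Log full diagnostics and alert engineers.
    AlertEngineers,
}

/// Chooses what the system does next based only on the result type.
pub fn action_for(tp: ResultType) -> Action {
    match tp {
        ResultType::Success => Action::RecordSuccess,
        ResultType::ClientError => Action::ExplainToClient,
        ResultType::ServiceError => Action::AlertEngineers,
    }
}
```

Because the match is exhaustive, adding a fourth outcome later would force every such decision point to be revisited at compile time.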

Error Classification

In practice, errors are rarely simple. A single request can pass through multiple subsystems, and failures can happen at different stages for different reasons. What appears as a single error often represents a combination of underlying issues, each with its own meaning and implications.

For example, a client might send malformed data, which should be treated as a BadRequest. At the same time, a downstream dependency might fail due to a dropped connection—an entirely different category of problem. Even though both issues surface during the same flow, they should be classified differently because they require different responses.

To make this manageable, we can introduce a classification layer using simple enums that describe the nature of the error. These classifications are more specific than the high-level ResultType and help standardize how we reason about failures across the system.

For instance, authentication error categories might look like this:

pub enum AuthErrorType {
    BadRequest,
    BadAuthenticationMethod,
    AuthenticationDenied,
    Service,
}

This approach establishes a shared vocabulary within a domain while remaining lightweight and easy to reuse. The same classifications can be used across APIs and services, avoiding duplication and ensuring consistency. Instead of reinventing error categories in each place, we build a unified system of error types that behave consistently everywhere.

We can then map these domain-specific classifications back to the higher-level ResultType. This allows us to automatically determine whether an error is a client or service issue based on its classification:

impl ResultKind for AuthErrorType {
    fn tp(&self) -> ResultType {
        match self {
            AuthErrorType::BadRequest
            | AuthErrorType::BadAuthenticationMethod
            | AuthErrorType::AuthenticationDenied => ResultType::ClientError,

            AuthErrorType::Service => ResultType::ServiceError,
        }
    }

    fn signature(&self) -> &'static str {
        match self {
            AuthErrorType::BadRequest => "Auth-BadRequest",
            AuthErrorType::BadAuthenticationMethod => "Auth-BadAuthMethod",
            AuthErrorType::AuthenticationDenied => "Auth-Unauthenticated",
            AuthErrorType::Service => "Auth-Service",
        }
    }
}

This mapping separates what the error means from how it is handled. The classification captures domain-specific detail, while the result type drives behavior—how the system responds, whether the issue is surfaced to the client, and how it is monitored internally.

By keeping classification and outcome distinct, the model remains both expressive and consistent: errors carry precise meaning at the domain level while fitting into a unified system-wide structure.

At its core, the ResultKind trait implemented above is small:

pub trait ResultKind {
    /// Returns the classification of the result (e.g. success, client error, service error).
    fn tp(&self) -> ResultType;

    /// Returns a stable identifier for the specific error classification.
    fn signature(&self) -> &'static str;
}
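Code that only cares about outcomes can then be written once against the trait. The following is a minimal sketch under that assumption: metric_label and should_alert are hypothetical helpers, the label format is invented for the example, and AuthErrorType is trimmed to two variants to keep the block short.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ResultType {
    Success,
    ClientError,
    ServiceError,
}

pub trait ResultKind {
    fn tp(&self) -> ResultType;
    fn signature(&self) -> &'static str;
}

/// Trimmed-down classification used only in this sketch.
pub enum AuthErrorType {
    BadRequest,
    Service,
}

impl ResultKind for AuthErrorType {
    fn tp(&self) -> ResultType {
        match self {
            AuthErrorType::BadRequest => ResultType::ClientError,
            AuthErrorType::Service => ResultType::ServiceError,
        }
    }

    fn signature(&self) -> &'static str {
        match self {
            AuthErrorType::BadRequest => "Auth-BadRequest",
            AuthErrorType::Service => "Auth-Service",
        }
    }
}

/// Hypothetical helper: builds a metric label from any classification.
pub fn metric_label<K: ResultKind>(kind: &K) -> String {
    let outcome = match kind.tp() {
        ResultType::Success => "success",
        ResultType::ClientError => "client_error",
        ResultType::ServiceError => "service_error",
    };
    format!("{outcome}.{}", kind.signature())
}

/// Hypothetical helper: only service errors should page an engineer.
pub fn should_alert<K: ResultKind>(kind: &K) -> bool {
    kind.tp() == ResultType::ServiceError
}
```

Neither helper knows anything about authentication, so the same functions would work for any other domain enum that implements ResultKind.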

Reporting

The final step is reporting. Error reporting is tightly coupled to the error itself, and instead of reinventing the wheel, we can build on top of Rust’s standard Error trait. It already provides most of what we need to make error reporting both useful and consistent.

For public-facing errors, the primary representation should rely on fmt::Display. This is what clients see, so it needs to be clear, concise, and readable. It should communicate what went wrong without exposing unnecessary internal details.

For example, an error with the classification AuthErrorType::BadRequest could be represented like this:

Invalid SAS token signature
type: ClientError
signature: Auth-BadRequest

This format gives the client enough information to understand the issue and act on it, while also providing structured fields (like type and signature) that can be used for debugging or automation.
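One way to produce that three-line layout is a small Display implementation. This is only a sketch: PublicError is a hypothetical struct made up for the example, and it assumes the type line is simply the Debug name of the ResultType variant.

```rust
use std::fmt;

#[derive(Clone, Copy, Debug)]
pub enum ResultType {
    Success,
    ClientError,
    ServiceError,
}

/// Hypothetical public-facing view of an error (an assumption for this sketch).
pub struct PublicError {
    pub message: String,
    pub tp: ResultType,
    pub signature: &'static str,
}

impl fmt::Display for PublicError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Human-readable message first, structured fields after it.
        writeln!(f, "{}", self.message)?;
        writeln!(f, "type: {:?}", self.tp)?;
        write!(f, "signature: {}", self.signature)
    }
}
```

Calling to_string() on a PublicError built from the values above yields the message, the type line, and the signature line, in that order, with no internal details attached.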

Service errors require a different approach. Here, the focus is on diagnostics rather than presentation. We want to capture as much useful context as possible, including nested errors and, when available, a backtrace.

This is where Debug and Error::source() come into play. The Debug implementation can be used for internal logging, allowing us to traverse and print the full chain of errors. By following Error::source(), we can expose underlying causes, which is especially important when dealing with failures in upstream services.
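Walking the chain needs nothing beyond the standard library. In the sketch below, UpstreamFailed and ConnectionDropped are hypothetical error types invented for the example, and error_chain is an illustrative helper rather than an API of any crate.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical low-level cause.
#[derive(Debug)]
pub struct ConnectionDropped;

impl fmt::Display for ConnectionDropped {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("connection dropped")
    }
}

impl Error for ConnectionDropped {}

/// Hypothetical wrapper that records its underlying cause.
#[derive(Debug)]
pub struct UpstreamFailed {
    pub source: ConnectionDropped,
}

impl fmt::Display for UpstreamFailed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("token service request failed")
    }
}

impl Error for UpstreamFailed {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Collects the message of the error and of every nested cause, outermost first.
pub fn error_chain(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut messages = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        messages.push(cause.to_string());
        current = cause.source();
    }
    messages
}
```

For an UpstreamFailed wrapping a ConnectionDropped, error_chain returns the wrapper's message followed by the cause's message, which is the order a log reader usually wants.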

It’s worth keeping Debug output focused. There’s no need to dump every field of a struct—only the information that helps explain the failure. If an error is simply wrapping another error, it’s often enough to delegate to the nested error rather than adding redundant noise.

In this model, reporting naturally splits into two layers: a clean, user-facing view via Display, and a rich, diagnostic view via Debug. Together, they ensure that errors are both understandable to clients and actionable for engineers.

As an additional improvement, we can include a service attribute in our error types. This allows us to explicitly indicate which upstream service is responsible for the failure, making it easier to trace issues across system boundaries and speeding up diagnosis in distributed environments.

With this in mind, error types should implement std::error::Error and expose a small amount of structured diagnostic information:

use std::backtrace::Backtrace;

pub trait ErrorDiagnostic: std::error::Error {
    type Kind: ResultKind;

    /// Returns the classification of this error.
    fn kind(&self) -> Self::Kind;

    /// Returns the responsible service, if applicable.
    fn service(&self) -> Option<&'static str>;

    /// Returns a backtrace for debugging, if available.
    fn backtrace(&self) -> Option<&Backtrace>;
}

This keeps the design idiomatic while providing enough structure for classification, observability, and cross-service debugging.

Here is how an AuthError could look:

/// Error for use in authentication routines.
// `Clone` is not derived: `Box<dyn std::error::Error>` does not implement it.
#[derive(Debug, thiserror::Error)]
pub enum AuthError {
    #[error("Bad request")]
    BadRequest(ErrorMessage),
    #[error("Bad authentication method")]
    BadAuthenticationMethod(ErrorMessage),
    #[error("Authentication denied")]
    AuthenticationDenied(ErrorMessage),
    #[error("Internal server error")]
    Internal(#[source] Box<dyn std::error::Error>),
}

impl ErrorDiagnostic for AuthError {
    type Kind = AuthErrorType;

    fn kind(&self) -> Self::Kind {
        match self {
            AuthError::BadRequest(_) => AuthErrorType::BadRequest,
            AuthError::BadAuthenticationMethod(_) => AuthErrorType::BadAuthenticationMethod,
            AuthError::AuthenticationDenied(_) => AuthErrorType::AuthenticationDenied,
            AuthError::Internal(_) => AuthErrorType::Service,
        }
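    }

    // This minimal example tracks neither an upstream service nor a backtrace.
    fn service(&self) -> Option<&'static str> {
        None
    }

    fn backtrace(&self) -> Option<&Backtrace> {
        None
    }
}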

This design allows the system to present appropriate error messages to clients, generate meaningful client and server metrics, and trigger alerts when system-level issues occur.
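To show the pieces working together without external crates, here is a self-contained sketch. TokenError, TokenErrorType, the "token-store" service name, and the summarize helper are all hypothetical; the two traits are repeated from the text so the block compiles on its own.

```rust
use std::backtrace::Backtrace;
use std::error::Error;
use std::fmt;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ResultType {
    Success,
    ClientError,
    ServiceError,
}

pub trait ResultKind {
    fn tp(&self) -> ResultType;
    fn signature(&self) -> &'static str;
}

pub trait ErrorDiagnostic: Error {
    type Kind: ResultKind;
    fn kind(&self) -> Self::Kind;
    fn service(&self) -> Option<&'static str>;
    fn backtrace(&self) -> Option<&Backtrace>;
}

/// Hypothetical classification with one client and one service category.
pub enum TokenErrorType {
    Expired,
    Service,
}

impl ResultKind for TokenErrorType {
    fn tp(&self) -> ResultType {
        match self {
            TokenErrorType::Expired => ResultType::ClientError,
            TokenErrorType::Service => ResultType::ServiceError,
        }
    }

    fn signature(&self) -> &'static str {
        match self {
            TokenErrorType::Expired => "Token-Expired",
            TokenErrorType::Service => "Token-Service",
        }
    }
}

/// Hypothetical error type used only for this example.
#[derive(Debug)]
pub enum TokenError {
    Expired,
    StoreUnavailable,
}

impl fmt::Display for TokenError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TokenError::Expired => f.write_str("token expired"),
            TokenError::StoreUnavailable => f.write_str("token store unavailable"),
        }
    }
}

impl Error for TokenError {}

impl ErrorDiagnostic for TokenError {
    type Kind = TokenErrorType;

    fn kind(&self) -> Self::Kind {
        match self {
            TokenError::Expired => TokenErrorType::Expired,
            TokenError::StoreUnavailable => TokenErrorType::Service,
        }
    }

    // Only the upstream failure names a responsible service.
    fn service(&self) -> Option<&'static str> {
        match self {
            TokenError::Expired => None,
            TokenError::StoreUnavailable => Some("token-store"),
        }
    }

    // No backtrace is captured in this sketch.
    fn backtrace(&self) -> Option<&Backtrace> {
        None
    }
}

/// Builds a one-line summary suitable for a log entry or metric tag.
pub fn summarize<E: ErrorDiagnostic>(err: &E) -> String {
    let kind = err.kind();
    let service = err.service().unwrap_or("-");
    format!("{} [{:?} {} service={}]", err, kind.tp(), kind.signature(), service)
}
```

The summarize function depends only on the traits, so the same logging or metrics code could serve every error type in an application.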

Final notes

These ideas are reflected in the ntex-error crate, which applies this model in practice.

The approach is primarily suited to application development rather than library or framework design. For that reason, ntex-error focuses on providing building blocks rather than prescribing concrete error categories or types, leaving those decisions to the application domain.

It provides the core building blocks, including the Error container, the ErrorDiagnostic and ResultKind traits, and utilities for error formatting, such as the fmt_diag helper.

This model builds on work by Max Gortman, originally developed in the context of private applications.