Fedor

Posted on Mar 25

Are you sure you know how to handle errors in Go properly?

#go #architecture #backend #productivity

If you've been writing in Go for more than a couple of months, the if err != nil construct is already muscle memory. The language's philosophy dictates a simple rule: we don't ignore errors, we don't hide them, and we don't rely on global exception catchers - we handle them explicitly.

But it's one thing to write a three-hundred-line script, and quite another to design a system that needs to be maintained for years. In large projects, improper error handling makes logs unreadable, and finding the root cause of a bug takes hours. In this article, we'll discuss how to build effective error tracing in a clean architecture, avoid code duplication, and properly pass the problem's context through all application layers.

Panic? What panic?

It would seem that the "always return the error up the stack" rule is the absolute baseline. That's what basic Go guides tell you. Actually, it's kind of true, but not entirely. It's like saying a brick doesn't taste good - technically true, but it doesn't quite capture the whole picture. So what about panic()?

There is one pattern for situations where an error is fatal and the application physically cannot continue working - functions with the Must prefix. For example, when loading the configuration during application startup. If there is no config, the database has nothing to connect to, and there is no reason for the server to start. Returning an error here is pointless - you need to crash.

Here is what it looks like in code:

// package config

func Load() (*Config, error) {
    // get the env path from flags
    cfg := &Config{
        // some default fields...
    }

    if err := godotenv.Load(envPath); err != nil {
        return nil, fmt.Errorf("load .env: %w", err)
    }

    if err := env.Parse(cfg); err != nil {
        return nil, fmt.Errorf("parse env: %w", err)
    }

    return cfg, nil
}

func MustLoad() *Config {
    cfg, err := Load()
    if err != nil {
        panic("failed to load config: " + err.Error())
    }

    return cfg
}

If you look closely at the code, you'll notice a small detail: in Load(), we use %w, but we also have the good old %v in Go. Why the distinction?

The %w (wrap) verb stores the original error inside the new one, so callers can still find it later with errors.Is() or errors.As(). The %v (value) verb, on the other hand, only formats the error's text into the new message: the original error value is lost, and with it any possibility of programmatic checks.
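To make the difference concrete, here is a minimal sketch. The sentinel ErrMissing and the helper names are made up for this illustration and are not part of the config package above.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMissing is a hypothetical sentinel error, used only for this illustration.
var ErrMissing = errors.New("file missing")

// wrapW wraps the cause with %w, so it stays reachable in the error chain.
func wrapW() error { return fmt.Errorf("load .env: %w", ErrMissing) }

// wrapV formats the cause with %v, so only its text survives.
func wrapV() error { return fmt.Errorf("load .env: %v", ErrMissing) }

// matches reports whether ErrMissing can still be found inside err.
func matches(err error) bool { return errors.Is(err, ErrMissing) }

func main() {
	fmt.Println(wrapW(), matches(wrapW())) // load .env: file missing true
	fmt.Println(wrapV(), matches(wrapV())) // load .env: file missing false
}
```

Both messages read the same to a human, but only the %w version can be matched in code.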

This raises an obvious question: why do we even need %w in the Load() function if it’s called exactly once by MustLoad(), which is just going to panic anyway? No one is going to inspect that error in a dying process.

The answer: we don't. In this specific case, it makes zero functional difference whether you wrap it or not - the application is about to hit a brick wall. But through the lens of architectural canons, Load() is an independent entity. It doesn't know who will call it tomorrow. Perhaps you'll write a CLI utility that tries to load the config and, if it fails, falls back to a default one. Therefore, the function is obliged to honor its contract: wrap the error and pass it up.
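Here is roughly what such a future caller could look like. This is only a sketch under assumptions: the Config struct, the file-based Load, the LoadOrDefault helper, and the default address are all hypothetical. It reads a plain file instead of using godotenv so that it runs with the standard library alone.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// Config is a hypothetical, minimal stand-in for the article's config struct.
type Config struct{ Addr string }

// Load reads the file at path and wraps any failure with %w.
func Load(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("load config: %w", err)
	}
	return &Config{Addr: string(data)}, nil
}

// LoadOrDefault falls back to defaults only when the file does not exist.
// Any other failure is still reported to the caller.
func LoadOrDefault(path string) (*Config, error) {
	cfg, err := Load(path)
	if errors.Is(err, fs.ErrNotExist) {
		return &Config{Addr: "localhost:8080"}, nil
	}
	return cfg, err
}

func main() {
	cfg, err := LoadOrDefault("no-such-config-file.env")
	if err != nil {
		fmt.Println("fatal:", err)
		return
	}
	fmt.Println(cfg.Addr)
}
```

The fallback works only because Load wrapped the cause with %w. Had it used %v, errors.Is would have nothing to match against.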

MustLoad(), knowing that a panic is imminent, simply crashes with a formatted message. We don't need fmt.Errorf inside a panic() here; fmt is for wrapping or formatting errors, and at the end of the line, a simple string is enough. It’s a pattern strictly for the application’s entry point, like main(), where a failure to initialize makes further execution impossible. In reality, most developers just develop a habit of slapping %w everywhere on autopilot, and there’s nothing wrong with that. In a config loader, do whatever you like - it doesn't really matter.

Okay, we have sorted out the wrapping and must-prefix. But how do we make these wrapped errors show the exact place where our program crashed?

Why const op is the absolute baseline

Discussions often pop up in Go about how to get an error's stack trace. Many drag heavy third-party libraries into their projects, which use runtime.Caller under the hood. Let me be blunt: it's slow, redundant, and you simply don't need it.

The Go developers demonstrated a simpler and more elegant approach in the Upspin project: the op pattern. Each function defines an op (operation) constant with its name and, when returning an error, wraps it with this name as a prefix. We get explicit tracing for little more than the cost of formatting a string, with no stack capture at all.

Personally, I use the naming formula folder.subfolder.FunctionName. It's a battle-tested approach that makes logs transparent and lets you quickly locate the point of failure.

But here lies a nuance that is often forgotten. Having gotten their hands on this pattern, developers start wrapping errors in literally every tiny function without fully studying the pattern. Yes, this is often just a banal misunderstanding. As a result, the log turns into an endlessly long and duplicated list that looks more like the Dublin Spire than an actual log. To avoid this, I highly recommend a rule: wrap errors only at layer boundaries or in the main method that calls smaller sub-functions (which, in turn, simply pass these errors up without wrapping).
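A small sketch of that rule, with hypothetical names (validateName, normalize, and CreateProfile are invented for the example): the helpers pass errors up untouched, and only the method at the boundary adds its op.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errEmptyName is a hypothetical low-level error for this sketch.
var errEmptyName = errors.New("empty name")

// validateName is a small helper: it returns the error as-is, without an op.
func validateName(name string) error {
	if name == "" {
		return errEmptyName
	}
	return nil
}

// normalize is another helper that passes errors up without wrapping them.
func normalize(name string) (string, error) {
	if err := validateName(name); err != nil {
		return "", err
	}
	return strings.ToLower(name), nil
}

// CreateProfile sits at the layer boundary, so it is the only place
// where the op prefix is added.
func CreateProfile(name string) (string, error) {
	const op = "usecase.styleprofile.CreateProfile"

	n, err := normalize(name)
	if err != nil {
		return "", fmt.Errorf("%s: %w", op, err)
	}
	return n, nil
}

func main() {
	_, err := CreateProfile("")
	fmt.Println(err) // usecase.styleprofile.CreateProfile: empty name
}
```

The message carries one op per layer instead of one per function, which keeps the log line short.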

And since we're talking about layer boundaries, it's high time we apply our theory to a real architecture and clear up the terminology a bit.

If you look at modern Go projects, you'll notice that the main entities live in the domain package, not model. Why is that? The term model was dragged in from the ancient Model-View-Controller architecture. But the name Domain reflects the essence much more accurately, because a domain is your business area. This is a direct borrowing from DDD, which really helps make the code readable. However, full-blown DDD in Go is often overkill, spawning debates about where to put interfaces and whether to use anemic or rich models (although with rich ones, the code is cleaner).

But let's get back to our errors. You might ask me again: author, why do we even need this Domain in the context of error handling? Because that is exactly where our custom business errors live. Things like domain.ErrNotFound, domain.ErrEmpty, or domain.ErrInvalidFormat are declared right there in the domain. This allows all application layers to know about them, check them via errors.Is, and avoid creating cross-layer dependencies.
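As a sketch, such a dictionary can be as small as a handful of sentinel values. The error messages below are my assumption. In a real project the variables would sit in package domain; they are declared in package main here only so the example runs as a single file.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel errors that would normally be domain.ErrNotFound and friends.
var (
	ErrNotFound      = errors.New("not found")
	ErrEmpty         = errors.New("empty value")
	ErrInvalidFormat = errors.New("invalid format")
)

// describe shows how any layer can recognise a domain error through
// errors.Is, no matter how many op prefixes were added on the way up.
func describe(err error) string {
	switch {
	case errors.Is(err, ErrNotFound):
		return "not found"
	case errors.Is(err, ErrEmpty):
		return "empty"
	case errors.Is(err, ErrInvalidFormat):
		return "invalid format"
	default:
		return "unknown"
	}
}

// twiceWrapped simulates an error that crossed two layer boundaries.
func twiceWrapped() error {
	inner := fmt.Errorf("postgres.styleprofile.GetInstructionByID: %w", ErrNotFound)
	return fmt.Errorf("usecase.styleprofile.GetInstructionByID: %w", inner)
}

func main() {
	fmt.Println(describe(twiceWrapped())) // not found
}
```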

Let's see what this looks like.

Layer 1: Adapter (Working with the outside world)

I intentionally avoid the term "repository". In modern realities, Postgres is just as much an external resource as Redis or a neighboring microservice's API. By calling this layer an Adapter, we unify the architectural approach, and the poor soul who has to read this crappy code won't have to spend ages figuring out "what's inside the repository". Whether it's a MongoDB unexpectedly used for caching for some reason, or Redis suddenly deciding to become the primary database for God knows what reason - it doesn't matter. They are all adapters, and all layers depend on interfaces, not on each other, to ensure modularity and testability.

Here is what a clean and proper method for fetching data from Postgres looks like:

func (s *StyleProfileDB) GetInstructionByID(ctx context.Context, id uuid.UUID) (*domain.Instruction, error) {
    // op formula: folder (postgres) + subfolder (styleprofile) + method. 
    // No need to write the word "adapter", "postgres" already provides full context.
    const op = "postgres.styleprofile.GetInstructionByID"

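    // The selected columns must match the Scan targets below, in order.
    query := "SELECT id, name, content FROM instructions WHERE id = $1"
    var ins domain.Instruction

    // QueryRow(ctx, ...) is the pgx-style signature; with database/sql the
    // equivalent call is s.db.QueryRowContext(ctx, query, id).
    err := s.db.QueryRow(ctx, query, id).Scan(&ins.ID, &ins.Name, &ins.Content)
    if err != nil {
        // Check for the driver's "no rows" sentinel (pgx has its own pgx.ErrNoRows)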
        if errors.Is(err, sql.ErrNoRows) {
            // Translate the infrastructure error into a domain error
            return nil, fmt.Errorf("%s: %w", op, domain.ErrNotFound)
        }
        // Wrap any other error
        return nil, fmt.Errorf("%s: %w", op, err)
    }

    return &ins, nil
}

Here we use errors.Is(err, sql.ErrNoRows). The nuance is that errors.Is does not compare strings: it walks the chain of wrapped errors and reports whether any link in it matches the given value. And pay attention to a crucial detail: we translate the infrastructure error (sql.ErrNoRows) into a domain error (domain.ErrNotFound). The business logic above shouldn't know that we are using SQL specifically. It has no use for that detail, because it operates purely on interfaces.
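The effect of that translation can be checked in isolation. In the sketch below, translate mirrors the error branch of the adapter method, ErrNotFound stands in for domain.ErrNotFound, and the helper names are hypothetical. No database connection is needed, since only the sentinel value from database/sql is used.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
)

// ErrNotFound is a stand-in for domain.ErrNotFound.
var ErrNotFound = errors.New("not found")

// translate mirrors the adapter's error branch: sql.ErrNoRows becomes the
// domain error, everything else is wrapped as-is.
func translate(op string, err error) error {
	if errors.Is(err, sql.ErrNoRows) {
		return fmt.Errorf("%s: %w", op, ErrNotFound)
	}
	return fmt.Errorf("%s: %w", op, err)
}

// sample returns what the adapter would hand to the usecase on a miss.
func sample() error {
	return translate("postgres.styleprofile.GetInstructionByID", sql.ErrNoRows)
}

// leaksSQL reports whether the infrastructure sentinel is still visible.
func leaksSQL(err error) bool { return errors.Is(err, sql.ErrNoRows) }

// isNotFound reports whether the domain sentinel is in the chain.
func isNotFound(err error) bool { return errors.Is(err, ErrNotFound) }

func main() {
	err := sample()
	fmt.Println(err)             // postgres.styleprofile.GetInstructionByID: not found
	fmt.Println(isNotFound(err)) // true
	fmt.Println(leaksSQL(err))   // false
}
```

After the translation, upper layers can match the domain error, while the SQL sentinel is no longer reachable from the chain.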

And one more thing: there is no logging in this code. Why? Because the adapter is a dumb component with zero knowledge of the business context. The fact that a record wasn't found might be an expected system scenario. Or it might not be - that's for the business to decide. In this case, logging at the database level will only lead to cluttering your monitoring.

So where do we handle and log it? The error, neatly wrapped in the Adapter, moves one level up.

Layer 2: Usecase (Business Logic)

Our error lands in the Usecase layer. This is the application level of our system that manages the business logic. Requests arrive here, system decisions are made here, and this is exactly where logging happens.

Data is passed between layers via DTOs (Data Transfer Objects). And here the question arises again: "Why should I create yet more structs if I can just pass the domain model?"

The answer is simple: a DTO makes your Usecase completely independent of the transport (be it HTTP, gRPC, or CLI). If your application is even slightly more complex than a basic CRUD, you will inevitably find that passing a hypothetical struct from the Domain directly to the handler is a one-way ticket to hell. A DTO is your strict contract. Period. The Usecase doesn't need anything else from you.

func (u *Usecase) GetInstructionByID(ctx context.Context, input dto.GetInstructionByIDInput) (dto.GetInstructionByIDOutput, error) {
    const op = "usecase.styleprofile.GetInstructionByID"

    // DTO validation happens exactly in the Usecase
    if err := validator.Validate(&input); err != nil {
        // The validation error also lives in the domain. We just return it.
        return dto.GetInstructionByIDOutput{}, fmt.Errorf("%s: %w", op, domain.ErrEmpty)
    }

    ctxWithTimeout, cancel := context.WithTimeout(ctx, u.cfg.ContextTimeout)
    defer cancel()

    // Go to the adapter for data via an interface
    instruction, err := u.db.GetInstructionByID(ctxWithTimeout, input.ID)
    if err != nil {
        // Log taking the business context into account
        if errors.Is(err, domain.ErrNotFound) {
            u.log.Warn(op, slog.String("error", err.Error()))
            return dto.GetInstructionByIDOutput{}, fmt.Errorf("%s: %w", op, domain.ErrNotFound)
        }

        u.log.Error(op, slog.String("error", err.Error()))
        return dto.GetInstructionByIDOutput{}, fmt.Errorf("%s: %w", op, err)
    }

    // Return the response DTO
    return dto.GetInstructionByIDOutput{
        Instruction: instruction,
    }, nil
}

There are a couple of important nuances here. Why do we validate the DTO specifically in the Usecase and not in the HTTP handler? And why do we pass and return the DTO by value, not by pointer? To answer the first question, just imagine that tomorrow the business asks to add a gRPC endpoint for this exact method. If the validation stays in the HTTP handler, you'll have to duplicate it in the new transport. The Usecase must protect itself. And note: the validation error (domain.ErrEmpty) also lives in the domain; we just return it, without logging. As for passing by value rather than by pointer, it's simply because, by their nature, DTOs should not mutate. If someone changes their copy somewhere else, it shouldn't affect our business logic.
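The by-value argument is easy to demonstrate. The structs below are simplified assumptions, not the article's real dto and domain types: here the output DTO copies plain fields, and renameCopy plays the role of a transport layer that edits what it received.

```go
package main

import "fmt"

// Instruction is a simplified stand-in for domain.Instruction.
type Instruction struct {
	ID      string
	Name    string
	Content string
}

// GetInstructionByIDOutput is a hypothetical response DTO that copies only
// the fields the caller is allowed to see.
type GetInstructionByIDOutput struct {
	Name    string
	Content string
}

// toOutput builds the DTO by value, so the result shares no memory with ins.
func toOutput(ins Instruction) GetInstructionByIDOutput {
	return GetInstructionByIDOutput{Name: ins.Name, Content: ins.Content}
}

// renameCopy simulates a transport layer that edits the DTO it received.
// It works on its own copy because the argument is passed by value.
func renameCopy(out GetInstructionByIDOutput) GetInstructionByIDOutput {
	out.Name = "changed by handler"
	return out
}

func main() {
	ins := Instruction{ID: "42", Name: "formal", Content: "Use a formal tone."}
	out := toOutput(ins)
	edited := renameCopy(out)

	fmt.Println(ins.Name)    // formal
	fmt.Println(out.Name)    // formal
	fmt.Println(edited.Name) // changed by handler
}
```

Keep in mind that a DTO field holding a pointer or a slice would still share memory with the original, so the guarantee only covers fields that are copied.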

Here we smoothly transition to logging rules. What should you even write to a log? You should log business decisions and actual failures. If a user sent an empty ID - that's the user's problem, the system worked as expected, there's nothing to log here. If the adapter didn't find a record - that's a routine situation, we write a Warn. If the database dropped dead - that's an Error.

And for the love of God, stop dragging monstrous logrus setups into your projects. The times when they were vitally necessary are long gone. The standard log/slog out of the box does absolutely everything a modern application needs. Paired with our op pattern, it produces perfect structured logs without unnecessary allocations and a zoo of dependencies. Forget about that legacy stuff and write clean code.
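For illustration, here is a sketch of those logging rules on top of log/slog. The helper names logFailure and levelFor, the error values, and the in-memory JSON handler are assumptions made for the example. A real service would write to os.Stdout or a collector instead.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log/slog"
	"strings"
)

// ErrNotFound is a stand-in for domain.ErrNotFound.
var ErrNotFound = errors.New("not found")

// errDBDown is a hypothetical infrastructure failure.
var errDBDown = errors.New("connection refused")

// logFailure applies the rule from the text: an expected business outcome
// is a Warn, everything else is an Error. The op is used as the message.
func logFailure(log *slog.Logger, op string, err error) {
	if errors.Is(err, ErrNotFound) {
		log.Warn(op, slog.String("error", err.Error()))
		return
	}
	log.Error(op, slog.String("error", err.Error()))
}

// levelFor logs err into an in-memory JSON handler and reports which
// level ended up in the output.
func levelFor(err error) string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	logFailure(logger, "usecase.styleprofile.GetInstructionByID", err)

	switch {
	case strings.Contains(buf.String(), `"level":"WARN"`):
		return "WARN"
	case strings.Contains(buf.String(), `"level":"ERROR"`):
		return "ERROR"
	}
	return ""
}

func main() {
	fmt.Println(levelFor(ErrNotFound)) // WARN
	fmt.Println(levelFor(errDBDown))   // ERROR
}
```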

The business logic has done its job, the logs are written, now we need to somehow return the result (or error) to the client. We are moving to the final frontier.

Layer 3: Handler (Transport)

The result of the Usecase's work is passed to the Handler. This is the simplest layer of the application. Its job is to parse the HTTP request, form a DTO, call the Usecase, and properly format the HTTP response.

func (h *Handler) getInstructionByID(w http.ResponseWriter, r *http.Request) {
    // Extract ID from URL and check the format
    idFromURL := chi.URLParam(r, "id")
    id, err := uuid.Parse(idFromURL)
    if err != nil {
        // Assuming render is github.com/go-chi/render: render.Status stores the
        // status code and render.JSON writes it together with Content-Type.
        // Calling w.WriteHeader first would send the headers too early.
        render.Status(r, http.StatusBadRequest)
        render.JSON(w, r, map[string]string{"error": "Invalid ID format"})
        return
    }

    // Form a DTO to pass to the business logic
    input := dto.GetInstructionByIDInput{ID: id}

    // Call the Usecase, again via an interface
    output, err := h.usecase.GetInstructionByID(r.Context(), input)
    if err != nil {
        // Map domain errors to HTTP statuses
        if errors.Is(err, domain.ErrNotFound) {
            render.Status(r, http.StatusNotFound)
            render.JSON(w, r, map[string]string{"error": "Instruction not found"})
            return
        }

        if errors.Is(err, domain.ErrEmpty) {
            render.Status(r, http.StatusBadRequest)
            render.JSON(w, r, map[string]string{"error": "Bad request data"})
            return
        }

        // If the error is unknown - return a standard 500 status
        render.Status(r, http.StatusInternalServerError)
        render.JSON(w, r, map[string]string{"error": "Internal server error"})
        return
    }

    // Return the successful result (the status defaults to 200 OK)
    render.JSON(w, r, output)
}

The main nuance of this layer: the Handler does not log errors. At all. This task has already been completed in the Usecase, and if we add a log here too, we'll get duplication in our monitoring. The Handler is isolated from the database and infrastructure. The only thing it does with an error is match it against our dictionary of domain errors via errors.Is and return the corresponding HTTP status to the client. If the error is unknown, the client gets a 500 Internal Server Error status, which hides the internal workings of the system and avoids leaking table names or internal IP addresses to the outside world.
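When the list of domain errors grows, the mapping can be pulled into one small function. The following is a sketch rather than a required part of the approach. The name statusFor and the sentinel values are assumptions standing in for the domain package.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Stand-ins for domain.ErrNotFound and domain.ErrEmpty.
var (
	ErrNotFound = errors.New("not found")
	ErrEmpty    = errors.New("empty value")
)

// errUnknown is a hypothetical failure the handler knows nothing about.
var errUnknown = errors.New("connection refused")

// statusFor maps an error from the usecase to an HTTP status code.
// Anything it does not recognise becomes a 500, so no details leak out.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrNotFound):
		return http.StatusNotFound
	case errors.Is(err, ErrEmpty):
		return http.StatusBadRequest
	default:
		return http.StatusInternalServerError
	}
}

// wrap adds a usecase op prefix, as the layer below the handler would.
func wrap(err error) error {
	return fmt.Errorf("usecase.styleprofile.GetInstructionByID: %w", err)
}

func main() {
	fmt.Println(statusFor(nil))               // 200
	fmt.Println(statusFor(wrap(ErrNotFound))) // 404
	fmt.Println(statusFor(wrap(ErrEmpty)))    // 400
	fmt.Println(statusFor(wrap(errUnknown)))  // 500
}
```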

Wrapping up

Proper error handling in Go is not just mindlessly copying if err != nil. Must functions are used exclusively for fatal crashes at startup. The op pattern allows you to create a clear call stack without wasting resources or using reflection magic. Errors live in the domain, are wrapped via %w strictly at layer boundaries, and are logged only where there is a business context, using the lightweight slog. By sticking to this approach, you get code that is easy to read, a pleasure to debug, and cheap to maintain. (And one more thing: never collaborate with dictatorships. The restrictions and systems you build are exactly what allow them to feel comfortable and wage wars.)

Good luck!
