Error Handling: Return Codes or Exceptions? 3 Critical Differences

#systemarchitecture #software

Introduction: Error Handling, The Dark Side of Software

In my software development journey, I've seen countless times that writing error-free code is an illusion. The real challenge isn't preventing errors from occurring, but rather how we manage them when they do. This is a critical topic that directly impacts a system's stability and reliability.

When it comes to error management, we encounter two main approaches: return codes and exceptions. While both serve the purpose of reporting and handling errors, they differ significantly in their implementation, impact on code, and performance characteristics. Understanding these differences in depth has always guided me in deciding which method is more appropriate for which scenario.

Difference 1: Impact on Flow Control and Code Readability

Return codes are values returned by a function to indicate whether it completed its task successfully or encountered an error. This is typically done with an int or enum type; 0 represents success, while other values represent specific error conditions. With this method, the calling code must always check the return value and take action accordingly.

This approach makes the code's flow quite explicit, as you are forced to write if or switch blocks for every possible error condition. However, this can sometimes create an "error check hell" similar to "callback hell." Especially in deeply nested function calls, the main business logic can get lost amidst successive error checking blocks. In a production ERP system, during a complex stock update operation, I personally experienced a drop in code readability of over 40% when I had to check return codes at every step. Following the main business flow required carefully examining every line.

ℹ️ Return Code Example

The C-like pseudo-code below demonstrates how return codes affect flow. There's an if check at every step.

int process_order(Order *order) {
    int result = validate_order(order);
    if (result != SUCCESS) {
        log_error("Order validation failed: %d", result);
        return result;
    }

    result = reserve_stock(order->items);
    if (result != SUCCESS) {
        log_error("Stock reservation failed: %d", result);
        return result;
    }

    result = charge_customer(order->customer, order->total_price);
    if (result != SUCCESS) {
        log_error("Customer charge failed: %d", result);
        return result;
    }

    result = dispatch_order(order);
    if (result != SUCCESS) {
        log_error("Order dispatch failed: %d", result);
        return result;
    }

    return SUCCESS;
}

Exceptions, on the other hand, offer a different mechanism. When an error occurs, control leaves the normal program flow and "jumps" up the call stack until it reaches the first try-catch block that handles (catches) the error. This way, your main business logic isn't cluttered with error checking and appears cleaner. When I use FastAPI for the backend of my side product, I rely heavily on exceptions to keep the business logic simple, especially in API endpoints. If a user is unauthorized or the data format is incorrect, I raise an HTTPException directly, and the framework's exception handling catches it centrally and converts it into an appropriate HTTP response without polluting the main processing code. In my experience, this makes the main flow of the code up to 50% more readable. However, knowing where exceptions can be raised and catching them in the right places can be challenging in complex systems.

💡 Exception Example

The same order processing scenario with exceptions in Python shows a simpler main code flow.

class OrderValidationError(Exception): pass
class StockReservationError(Exception): pass
class PaymentError(Exception): pass
class OrderDispatchError(Exception): pass

def validate_order(order):
    if not order.is_valid():
        raise OrderValidationError("Order data is invalid.")

def reserve_stock(items):
    if not all_stock_available(items):
        raise StockReservationError("Not enough stock for items.")

def charge_customer(customer, price):
    if not customer.can_be_charged(price):
        raise PaymentError("Customer payment failed.")

def dispatch_order(order):
    if not order.can_be_dispatched():
        raise OrderDispatchError("Order cannot be dispatched.")

def process_order(order):
    try:
        validate_order(order)
        reserve_stock(order.items)
        charge_customer(order.customer, order.total_price)
        dispatch_order(order)
        return "SUCCESS"
    except OrderValidationError as e:
        log_error(f"Order processing failed: {e}")
        return "FAILED_VALIDATION"
    except StockReservationError as e:
        log_error(f"Order processing failed: {e}")
        return "FAILED_STOCK"
    except PaymentError as e:
        log_error(f"Order processing failed: {e}")
        return "FAILED_PAYMENT"
    except OrderDispatchError as e:
        log_error(f"Order processing failed: {e}")
        return "FAILED_DISPATCH"
    except Exception as e:
        log_error(f"An unexpected error occurred: {e}")
        return "FAILED_UNKNOWN"
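The central handling described earlier for API endpoints can be sketched without any framework. This is not FastAPI's actual mechanism, only an illustration of the idea: `ApiError`, `get_profile`, and `handle_request` are hypothetical names, and the `(status, body)` tuple stands in for a real HTTP response.

```python
class ApiError(Exception):
    """Hypothetical application error carrying an HTTP-style status code."""

    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def get_profile(user):
    # Business logic stays linear: it raises instead of returning error codes.
    if user is None:
        raise ApiError(401, "Not authenticated")
    if "name" not in user:
        raise ApiError(422, "Missing field: name")
    return {"name": user["name"]}


def handle_request(endpoint, *args):
    """Single place where errors are turned into (status, body) responses."""
    try:
        return 200, endpoint(*args)
    except ApiError as e:
        return e.status_code, {"error": e.detail}
    except Exception:
        # Last-resort fallback: do not leak internals to the caller.
        return 500, {"error": "Internal server error"}
```

The endpoint function contains no error-to-response mapping at all; that translation lives in one wrapper, which is the property the paragraph on API endpoints relies on.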

Difference 2: Error Propagation and Management Scope

When using return codes, it is entirely your responsibility to propagate an error condition up the call stack. When a function returns an error, the calling function must check this error and either handle it itself or pass it up as its own return value. This manual propagation carries both flexibility and risk. Flexibility, because you have full control over at what level and how the error is handled. The risk is that if you forget to check or propagate an error code, the error can be silently swallowed, leading to unexpected behavior. Years ago, on an internal banking platform, I remember a critical financial transaction's return code being overlooked, causing a failed transaction to be reported as successful, and it took 2 days to correct. Such "silent errors" can seriously threaten system integrity.

⚠️ Risk of Silently Swallowing Errors with Return Codes

In the example below, the error is swallowed because the return value of the do_something_risky function is not checked.

static int error_condition = 1; // Placeholder so the sketch compiles

int do_something_risky(void) {
    // ... some operations ...
    if (error_condition) {
        return -1; // Error code
    }
    return 0; // Success
}

int main(void) {
    do_something_risky(); // Error code is not checked!
    // Program continues as if everything is fine
    return 0;
}

Exceptions, on the other hand, manage error propagation automatically. When an exception is thrown, the program abandons its normal flow and searches backward along the call stack for an appropriate try-catch block to handle the error. If it's not caught anywhere, it leads to program termination. This "catch or rethrow" principle significantly reduces the risk of errors being silently swallowed. When an exception is thrown, it is either caught and handled somewhere, or the program crashes; both ensure the error is noticed. Especially in enterprise applications with deep layers, exceptions make it much easier to identify the source of an error and propagate it to the correct location. In a production ERP with a 7-layer architecture (UI -> API -> Business Logic -> Domain -> Infrastructure -> DB/External Service), ensuring an error was carried up to the highest level with correct metadata was much more automatic and reliable thanks to exceptions. Managing this complexity with manual return codes would have created much more boilerplate code and potential for errors.
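A minimal sketch of this automatic propagation, assuming a much smaller stack than the ERP described above. All layer functions and exception classes here are hypothetical; the point is that the intermediate layer needs no error-handling code, and that `raise ... from` keeps the original cause attached as metadata.

```python
class InfrastructureError(Exception):
    """Hypothetical error raised by the lowest layer."""


class OrderServiceError(Exception):
    """Hypothetical business-layer error that carries extra metadata."""

    def __init__(self, message, order_id):
        super().__init__(message)
        self.order_id = order_id


def load_order_row(order_id):
    # Infrastructure layer: pretend the database is unreachable.
    raise InfrastructureError("connection refused")


def repository_get(order_id):
    # Intermediate layer: no error handling, the exception passes through.
    return load_order_row(order_id)


def service_get_order(order_id):
    # Business layer: add context and keep the original cause via "from".
    try:
        return repository_get(order_id)
    except InfrastructureError as e:
        raise OrderServiceError("could not load order", order_id) from e


def api_get_order(order_id):
    # Top layer: the only place that turns the error into a response.
    try:
        return {"ok": True, "order": service_get_order(order_id)}
    except OrderServiceError as e:
        return {
            "ok": False,
            "order_id": e.order_id,
            "error": str(e),
            "cause": str(e.__cause__),
        }
```

With return codes, `repository_get` would have needed its own check-and-forward logic, and so would every other layer in between.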

However, the automatic propagation of exceptions also brings its own challenges. Catching general types like Exception or Throwable in the wrong places can mask specific errors and again lead to unexpected situations. That's why I always prefer to catch specific exception types whenever possible. For example, I don't handle a database connection error and a business rule violation in the same way. Each should have its own except (or catch) block.
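As a rough illustration of handling those two cases differently, the sketch below uses two hypothetical exception classes and a hypothetical `run_job` wrapper. The broad `except Exception` appears only as a logging fallback that re-raises, so nothing is masked.

```python
import logging

logger = logging.getLogger("orders")


class DatabaseConnectionError(Exception):
    """Hypothetical infrastructure failure; often worth retrying."""


class BusinessRuleViolation(Exception):
    """Hypothetical domain failure; retrying will not help."""


def run_job(job):
    try:
        job()
        return "done"
    except DatabaseConnectionError:
        # Infrastructure problem: signal that the job can be retried later.
        logger.warning("database unavailable, scheduling retry")
        return "retry"
    except BusinessRuleViolation as e:
        # Domain problem: report it, do not retry.
        logger.info("rejected: %s", e)
        return "rejected"
    except Exception:
        # Fallback only: log with stack trace, then re-raise unchanged.
        logger.exception("unexpected failure")
        raise
```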

Difference 3: Impact on Performance and Resource Usage

Performance, while often the last factor considered when choosing an error management strategy, is a topic that cannot be ignored, especially in systems requiring high performance or in critical "hot paths." Return codes, by simply returning a value, impose almost no overhead on performance. Returning an integer or performing an if check are very cheap operations in terms of CPU cycles. Therefore, return codes are commonly preferred in low-level systems like the Linux kernel or in embedded systems programming. When I was developing a kernel module blacklist application (in a scenario related to CVE-2026-31431), I managed error conditions with return codes, because every microsecond was critical there and the overhead of throwing an exception was unacceptable.

ℹ️ Low Cost of Return Codes

Return codes only require a register value or a simple write operation to the stack.

// Simple function call and return value check
int res = do_calculation(a, b);
if (res < 0) {
    // Error handling
}
// This operation is quite fast.

Exceptions, on the other hand, can consume significantly more resources when thrown. When an exception is thrown, the runtime moves backward up the call stack to find a catch block, unwinding the stack in the process, and often creating an exception object. This object typically contains information such as the type of error, its message, and the stack trace from where it was thrown. Creating a stack trace and unwinding the stack can impose a considerable CPU and memory load, especially in deep call stacks. For instance, in a critical reporting component of a production ERP, when millions of rows were processed in each query, I chose to use return codes instead of throwing exceptions for an expected data deficiency. This is because I observed that a 1000ms report increased to 5000ms when an exception was thrown every 100 milliseconds in this scenario. Using exceptions for such "expected" errors can unnecessarily slow down the system.
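One way to check this on your own workload is a small micro-benchmark. The sketch below is an assumption-laden toy, not a reproduction of the ERP measurement: the data is synthetic, every second row is deliberately "missing" its value, and the resulting timings will vary by interpreter version and machine.

```python
import timeit


def find_code(row):
    """Return-value style: None signals the expected 'missing' case."""
    return row.get("value")


def find_exc(row):
    """Exception style: a missing key raises KeyError."""
    return row["value"]


def count_missing_codes(rows):
    missing = 0
    for row in rows:
        if find_code(row) is None:
            missing += 1
    return missing


def count_missing_exceptions(rows):
    missing = 0
    for row in rows:
        try:
            find_exc(row)
        except KeyError:
            missing += 1
    return missing


# Synthetic data: every second row lacks the field (assumption for the demo).
rows = [{"value": i} if i % 2 else {} for i in range(100_000)]

if __name__ == "__main__":
    t_codes = timeit.timeit(lambda: count_missing_codes(rows), number=5)
    t_exc = timeit.timeit(lambda: count_missing_exceptions(rows), number=5)
    print(f"return value: {t_codes:.3f}s, exceptions: {t_exc:.3f}s")
```

Both functions compute the same count; only the signalling mechanism differs, so any gap in the printed timings comes from raising and catching.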

Therefore, it's crucial to use exceptions only for "exceptional" situations. That is, for situations you wouldn't expect in the normal flow of the program, but which, if they occur, prevent the work from continuing. For situations like rate limiting violations in an API gateway, returning an HTTP response code might be a more performant and appropriate method than using an exception. However, situations like a database connection loss or the inability to find a critical configuration file are truly exceptional and warrant throwing an exception.

My Pragmatic Approach: When Do I Prefer Which?

One of the most important things I've learned in my twenty years of experience is that there is no "right" or "wrong" error management strategy. Everything depends on the context and the project's needs. In my own work, I generally make this distinction:

Low-Level Systems and Performance-Critical Areas: For libraries written in C/C++, kernel modules, or performance-critical services running on bare-metal servers such as my own, I generally prefer return codes. Here, every CPU cycle and memory allocation matters. For example, in the native C++ module of my Android spam application, in the call blocking logic, I use int return codes instead of throwing exceptions for each number check. This keeps performance optimal even when querying hundreds of thousands of numbers within milliseconds.
Application Level and Business Logic: For enterprise applications, API services, and business logic layers written in high-level languages (Python, Java, C#), I prefer exceptions. Go is the notable outlier among high-level languages, since it idiomatically reports errors as returned values rather than exceptions. Exceptions keep the code cleaner and more readable, avoiding if chains that disrupt the business flow. In one client project, an order management system with complex business rules, I managed business rule violations with specific exceptions like BusinessRuleViolationException. This allowed me to easily understand what the error was and which business rule was violated.

💡 Recommendation for Error Classification

You can classify error conditions into three main categories to determine your management strategy:

Critical/Unexpected Errors: Errors that prevent the program from running normally (DB connection loss, file access error). Generally, Exceptions are used.

Business Logic Errors: Expected situations that violate a business rule, such as invalid user input or out-of-stock products. Depending on the situation, Exceptions or Return Codes (e.g., a Result type) may be preferred.

Warning/Informational States: Situations where the operation can continue, but with some minor issues. Generally, Logging or a special status object is used.
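For the middle category, a Result type can be sketched in a few lines of Python. The `Ok`, `Err`, and `Result` names below are hypothetical helpers written for this illustration (Python has no built-in Result type), and `reserve_stock` is a simplified stand-in for real stock logic.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


Result = Union[Ok[T], Err[E]]


def reserve_stock(available: int, requested: int) -> Result[int, str]:
    """Expected business outcome: out of stock is a value, not an exception."""
    if requested <= 0:
        return Err("quantity must be positive")
    if requested > available:
        return Err("out of stock")
    return Ok(available - requested)


def describe(result: Result[int, str]) -> str:
    # The caller has to inspect the result before it can reach the value.
    if isinstance(result, Ok):
        return f"reserved, {result.value} left"
    return f"rejected: {result.error}"
```

Compared with a bare int code, the failure case cannot be confused with a valid value, and a type checker can flag callers that use `result.value` without checking first.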

Sometimes, it's necessary to use both approaches together. For example, I might convert return codes from a low-level library's C API into an application-level exception to propagate them to higher layers. This allows each layer to use its natural error management mechanism and simplifies integration. In the backend of the financial calculators I built for my own site, some critical algorithms written in Go use return codes, while the Python-based API layer converts them into its own exceptions. This hybrid approach provides me with both performance and ease of development.
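The boundary translation described above might look roughly like this. Everything here is hypothetical: `native_compute` stands in for a low-level function that reports status codes, and the code table is invented for the example (a real library documents its own values).

```python
class NativeError(Exception):
    """Base class for errors translated from the low-level layer."""

    def __init__(self, code, message):
        super().__init__(f"{message} (code {code})")
        self.code = code


class InvalidInputError(NativeError):
    pass


class ResourceExhaustedError(NativeError):
    pass


# Hypothetical code table for the illustration.
_ERRORS = {
    -1: (InvalidInputError, "invalid input"),
    -2: (ResourceExhaustedError, "resource exhausted"),
}


def native_compute(x):
    """Stand-in for a C-style function: returns (status, value)."""
    if x < 0:
        return -1, 0
    return 0, x * 2


def check(status):
    """Translate a non-zero status code into an exception."""
    if status == 0:
        return
    exc_type, message = _ERRORS.get(status, (NativeError, "unknown error"))
    raise exc_type(status, message)


def compute(x):
    # Application-level wrapper: callers see exceptions, never raw codes.
    status, value = native_compute(x)
    check(status)
    return value
```

Keeping the translation in a single `check` function means a forgotten status check can only happen in one place, the wrapper layer, rather than at every call site.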

Error Management in the Real World: My Observations and Lessons

Choosing the right tool for error management is as important as using it correctly. Over the years, I've seen some common anti-patterns and the lessons I've learned from them:

Ignoring Return Codes: One of the most common mistakes I've encountered is not checking return codes. Calling a function and proceeding without assigning its return value to a variable or checking it is inviting a potential disaster. While developing an ERP for a manufacturing company, I saw that overlooking the return code of a data validation function in a critical module led to incorrect data entering the system, resulting in approximately 3000 incorrect invoices being issued. This situation required a team of 5 people to work for a week on manual correction operations.
Catching Exceptions Broadly (Catch All): General catches like catch (Exception e) or except Exception as e can mask specific errors, making debugging much harder. In my experience, it's best to catch and handle specific exception types whenever possible. If I use a general catch block, it's usually a top-level "fallback" handler that performs a last-resort action like logging the error and returning a generic error message to the user.
Using Exceptions for Expected Conditions: Situations like a user entering the wrong password or an API call returning an expected business error code are not "exceptional." These are a normal part of the business flow and are generally better managed with return codes, boolean values, or special status objects (e.g., a Result<T, E> type). Using exceptions for such situations can negatively impact performance and reduce code readability. Last month, in an API for my side product, when I managed user authentication failure with an exception, I noticed during a stress test that the server's CPU usage jumped from 20% to 80% with 5000 requests per second. After switching to return codes, this load dropped back to around 30%.
Insufficient Logging and Observability: When errors occur, the best way to understand what happened is through good logs. Regardless of whether you use return codes or exceptions, logging every error at an appropriate level (INFO, WARNING, ERROR) and with sufficient detail (stack trace, relevant variable values) is critically important. In my own systems, I proactively monitor errors using journald and centralized log collection tools. When I experienced a PostgreSQL WAL bloat situation, a timely ERROR log allowed me to intervene on April 28 at 03:14 AM before the disk filled up.
Transactional Outbox and Idempotency: Error management becomes even more complex in distributed systems. When an operation fails, I use mechanisms like the transactional outbox pattern or idempotent operations to ensure the system remains consistent. This allows the system to safely retry operations or revert to a consistent state even after an error.
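To make the idempotency point from the last item concrete, here is a minimal in-memory sketch. `PaymentGateway`, `TransientError`, and `charge_with_retry` are hypothetical names; a real system would persist the keys and add backoff between attempts, both of which are left out here.

```python
class TransientError(Exception):
    """Hypothetical error for failures that are safe to retry."""


class PaymentGateway:
    """In-memory stand-in for a remote service with idempotency keys."""

    def __init__(self, fail_first=1):
        self.processed = {}   # idempotency key -> stored result
        self.charges = 0      # how many times money actually moved
        self._failures_left = fail_first

    def charge(self, key, amount):
        if key in self.processed:
            # Replay of a known key: return the stored result, charge nothing.
            return self.processed[key]
        self.charges += 1
        self.processed[key] = {"charged": amount}
        if self._failures_left > 0:
            # Simulate a lost response after the charge was applied.
            self._failures_left -= 1
            raise TransientError("timeout waiting for response")
        return self.processed[key]


def charge_with_retry(gateway, key, amount, attempts=3):
    for attempt in range(1, attempts + 1):
        try:
            return gateway.charge(key, amount)
        except TransientError:
            if attempt == attempts:
                raise  # Out of attempts: let the caller decide.
```

Because the retry reuses the same key, the second attempt returns the stored result instead of charging again, which is what makes retrying after an ambiguous failure safe.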

Conclusion: Error Management is an Art

The choice between return codes and exceptions is not just a technical preference, but a pragmatic decision based on the nature of the project, team habits, and system performance requirements. Both approaches have their advantages and disadvantages, and in my experience, the best results often come from a clever combination of these two methods.

The important thing, no matter which method you choose, is to be consistent, not to ignore errors, and to ensure the system remains predictable and reliable even in error situations. Error management is a fundamental discipline that ensures software not only "works" but also "works reliably." Let's not forget that the code we write must correctly manage not only the happy paths but also the failing paths. This is one of the most important factors that, in the long run, facilitates system maintenance, reduces troubleshooting time, and most importantly, increases user trust.