DEV Community

Abdulkabir Musa


You Need Your Code to Be More Reliable Than People Using It

There's a harsh truth about software development that many engineers learn the hard way: users will break your application in ways you never imagined possible. They'll enter dates in the wrong format, upload 50 GB files to your profile picture field, click submit 47 times in rapid succession, and somehow manage to get your application into states you didn't even know existed.

The knee-jerk reaction is often to blame the user. "They should have known better!" But here's the thing: your code needs to be more reliable than the people using it.

The Reality of User Behavior

Users don't read documentation. They don't carefully review error messages. They're distracted, in a hurry, or simply don't have the technical background to understand what your application expects. And that's perfectly fine—it's not their job to accommodate your code. It's your code's job to accommodate them.

Consider these real-world scenarios:

  • A user copies text from a PDF that includes invisible Unicode characters
  • Someone's internet connection drops mid-transaction
  • A user navigates away from the page while your async operation is still running
  • Someone clicks "back" after submitting a form
  • A user leaves their session open for three days straight
  • Multiple tabs of your application are open simultaneously

If any of these scenarios can break your application or corrupt data, that's not a user problem—that's a code reliability problem.
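
Take the first scenario, the pasted PDF text. Here's a minimal sketch of cleaning it up with Python's standard `unicodedata` module. The `clean_text` helper is hypothetical, and so is its definition of "invisible": Unicode format characters (category `Cf`, such as the zero-width space) plus control characters other than tab and newline.

```python
import unicodedata


def clean_text(raw: str) -> str:
    """Normalize pasted text and drop invisible characters (sketch)."""
    # NFKC folds compatibility forms, such as the non-breaking space
    # (U+00A0) or the "fi" ligature (U+FB01), into plain equivalents.
    normalized = unicodedata.normalize("NFKC", raw)
    kept = [
        ch
        for ch in normalized
        # Assumption: keep tab and newline, drop every other control (Cc)
        # or format (Cf) character.
        if ch in "\t\n" or unicodedata.category(ch) not in ("Cf", "Cc")
    ]
    return "".join(kept).strip()


print(repr(clean_text("Jane\u200b Doe\ufeff")))  # prints 'Jane Doe'
```

One trade-off to be aware of: category `Cf` also covers the zero-width joiner used in some emoji sequences and scripts, so a filter this blunt suits fields like names or IDs better than free-form text.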

What Reliability Actually Means

Reliable code isn't just code that works under ideal conditions. It's code that:

  1. Validates rigorously - Never trust input, even from your own frontend
  2. Fails gracefully - When things go wrong, degrade functionality rather than crash
  3. Handles edge cases - The 99% case is important, but the 1% case is where bugs live
  4. Maintains data integrity - No matter what users do, your data stays consistent
  5. Recovers automatically - When possible, fix issues without user intervention
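
Points 2 and 5 can be combined in a few lines. The following is a rough sketch, not a production retry library: `retry_with_backoff` and its parameters are made up for illustration, and it assumes that `ConnectionError` is the only failure worth retrying.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    fallback: T,
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry a flaky operation, then degrade to a fallback instead of crashing."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            # Assumption: only connection errors are transient. Any other
            # exception propagates to the caller immediately.
            if attempt == attempts - 1:
                break
            # Exponential backoff with jitter, so many clients retrying
            # at once do not hit the server in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
    return fallback
```

Only wrap operations that are safe to repeat. Retrying a non-idempotent write is exactly how duplicate orders and double transfers happen.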

The Cost of Unreliable Code

I once worked on a financial application where a race condition allowed users to double-submit wire transfers. The bug was rare—it required clicking submit twice within a 200ms window. "Users won't do that," we thought. But with thousands of users, "rare" happened daily. Each incident required manual intervention, customer service calls, and potential financial liability.

The fix took two hours to implement: disable the button on first click and add idempotency keys. The cost of not implementing it from the start? Hundreds of hours of support time and damaged customer trust.

Strategies for Building Reliable Code

1. Input Validation Everywhere

// Bad: Trust the frontend
function createUser(data) {
  return database.users.insert(data);
}

// Good: Validate everything
function createUser(data) {
  const validated = userSchema.parse(data); // Throws on invalid data
  const sanitized = sanitizeInput(validated);
  return database.users.insert(sanitized);
}

Never assume input is clean, even from your own UI. Browsers can be manipulated, APIs can be called directly, and middleware can fail.

2. Idempotency for All State Changes

Every operation that changes state should be idempotent—running it multiple times should produce the same result as running it once.

# Bad: Can create duplicates
def submit_order(user_id, items):
    order = Order.create(user_id=user_id, items=items)
    return order

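# Good: Uses idempotency key
# Note: also put a unique constraint on idempotency_key. Two concurrent
# requests can both pass the lookup below before either one inserts.
def submit_order(user_id, items, idempotency_key):
    existing = Order.find_by_idempotency_key(idempotency_key)
    if existing:
        return existing

    order = Order.create(
        user_id=user_id,
        items=items,
        idempotency_key=idempotency_key
    )
    return order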
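
The lookup-then-create pattern still leaves a small window where two requests with the same key both miss the lookup. One way to close it is to let the database enforce uniqueness. Here's a rough sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and this version of `submit_order` are assumptions for illustration, and `INSERT OR IGNORE` is SQLite-specific syntax; other databases have their own equivalents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        items TEXT NOT NULL,
        idempotency_key TEXT NOT NULL UNIQUE
    )
    """
)


def submit_order(conn, user_id, items, idempotency_key):
    """Insert the order once; repeat calls with the same key return the same id."""
    with conn:  # commits on success, rolls back on an exception
        # If the key already exists, the UNIQUE constraint makes this a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO orders (user_id, items, idempotency_key) "
            "VALUES (?, ?, ?)",
            (user_id, items, idempotency_key),
        )
        row = conn.execute(
            "SELECT id FROM orders WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
    return row[0]


first = submit_order(conn, 1, "widget", "key-abc")
second = submit_order(conn, 1, "widget", "key-abc")  # e.g. a double click
print(first == second)  # prints True
```

The client generates the key once per logical action (one per form render, for example) and sends the same key on every retry.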

3. Defensive Database Operations

-- Bad: Lets the balance go negative
UPDATE users SET balance = balance - 100 WHERE id = ?;

-- Good: Ensures constraints are maintained
UPDATE users
SET balance = balance - 100
WHERE id = ? AND balance >= 100;

-- Then check affected rows: 0 means no such user or insufficient balance
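
Here's what that affected-rows check might look like from application code. This is a minimal sketch with Python's built-in `sqlite3` module; the `withdraw` function, the `WithdrawalRejected` exception, and the one-row demo table are all made up for illustration.

```python
import sqlite3


class WithdrawalRejected(Exception):
    """Raised when the guarded UPDATE matched no row."""


def withdraw(conn, user_id, amount):
    with conn:  # commits on success, rolls back on an exception
        cursor = conn.execute(
            "UPDATE users SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, user_id, amount),
        )
        # rowcount is the number of rows the UPDATE changed. Zero means
        # the user does not exist or the balance was too low.
        if cursor.rowcount != 1:
            raise WithdrawalRejected(f"could not withdraw {amount} for user {user_id}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO users (id, balance) VALUES (1, 150)")

withdraw(conn, 1, 100)  # succeeds, balance is now 50
try:
    withdraw(conn, 1, 100)  # rejected, balance stays at 50
except WithdrawalRejected as exc:
    print(exc)
```

Because the balance check and the decrement happen in a single statement, two concurrent withdrawals cannot both succeed against the same 150.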

4. Timeouts and Circuit Breakers

// Bad: Wait forever for a response
const response = await fetch(externalAPI);

// Good: Fail fast with timeout
async function fetchWithTimeout(url, ms = 5000) {
  const controller = new AbortController();
  // Abort the request if no response arrives within `ms` milliseconds
  const timeout = setTimeout(() => controller.abort(), ms);

  try {
    return await fetch(url, { signal: controller.signal });
  } catch (error) {
    if (error.name === 'AbortError') {
      // Handle timeout gracefully
      return fallbackResponse();
    }
    throw error;
  } finally {
    // Always clear the timer so it cannot fire after we are done
    clearTimeout(timeout);
  }
}
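
A timeout protects a single call. A circuit breaker protects you from making the same doomed call over and over. Here's a bare-bones sketch of the idea in Python. The `CircuitBreaker` class, its thresholds, and the injectable `clock` are assumptions for illustration, and it is not thread-safe as written.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock              # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("skipping call while circuit is open")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                # Open (or re-open) the circuit and restart the cool-down.
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Callers catch `CircuitOpenError` the same way they would handle a timeout: serve a cached value, a default, or a friendly error, without waiting on a service that is known to be down.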

5. Rate Limiting and Resource Protection

Users will accidentally (or intentionally) hammer your endpoints. Your code needs to protect itself:

from functools import wraps
import time

def rate_limit(max_calls, time_window):
    # In-memory and per-process: fine for a demo, but multiple workers
    # need a shared store
    calls = {}

    def decorator(f):
        @wraps(f)
        def wrapped(*args, **kwargs):
            # get_current_user_id() is assumed to come from your auth layer
            user_id = get_current_user_id()
            # Monotonic clock: unaffected by system clock adjustments
            now = time.monotonic()

            # Clean old entries
            calls[user_id] = [t for t in calls.get(user_id, [])
                              if now - t < time_window]

            if len(calls[user_id]) >= max_calls:
                # Flask-style (body, status) response
                return {"error": "Rate limit exceeded"}, 429

            calls[user_id].append(now)
            return f(*args, **kwargs)
        return wrapped
    return decorator

The Mindset Shift

Building reliable code requires a mindset shift from "this should work" to "how could this fail?" Start asking questions like:

  • What happens if this takes 10 seconds instead of 100ms?
  • What if this function is called with null?
  • What if two users do this at the exact same time?
  • What if the network fails halfway through?
  • What if this external service is down?
  • What if someone sends me a 100MB string?

Testing for Reliability

Unit tests are great, but they mostly exercise the paths you already thought of, so they rarely catch reliability issues. You also need:

  • Chaos testing - Randomly kill processes, simulate network failures
  • Load testing - See what breaks under pressure
  • Fuzzing - Send random garbage input and see what happens
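  • Concurrency testing - Run the same operation from multiple threads, processes, or instances at once
  • Time-travel testing - Test with system clocks set to edge cases such as daylight saving transitions, leap days, and year boundaries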
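
Fuzzing doesn't require special tooling to get started. Here's a small sketch using only the standard library. `parse_quantity` is a hypothetical function under test, and the property being checked (either raise `ValueError` or return a value within the documented range) is an assumption about its contract.

```python
import random
import string


def parse_quantity(raw: str) -> int:
    """Hypothetical function under test: parse a quantity from 1 to 1000."""
    text = raw.strip()
    # isascii() rejects digits from other scripts that isdigit() accepts.
    if not text.isascii() or not text.isdigit():
        raise ValueError("quantity must be a whole number")
    value = int(text)
    if not 1 <= value <= 1000:
        raise ValueError("quantity out of range")
    return value


def fuzz(iterations: int = 10_000, seed: int = 0) -> None:
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    # Printable ASCII plus a few troublemakers: zero-width space,
    # Arabic-Indic digit one, superscript two, and a NUL byte.
    alphabet = string.printable + "\u200b\u0661\u00b2\x00"
    for _ in range(iterations):
        raw = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 12)))
        try:
            result = parse_quantity(raw)
        except ValueError:
            continue  # rejecting bad input cleanly is the expected outcome
        # Anything that was accepted must satisfy the contract.
        assert 1 <= result <= 1000, (raw, result)


fuzz()
```

Any exception other than `ValueError` escapes the loop and fails the run, which is the point: the fuzzer is hunting for inputs that crash the parser instead of being rejected by it.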

The Payoff

Yes, building reliable code takes more time upfront. You'll write more validation logic, more error handling, more defensive checks. But the payoff is enormous:

  • Fewer production incidents and 3 AM wake-up calls
  • Reduced support burden
  • Increased user trust
  • Lower maintenance costs
  • Better sleep at night

Conclusion

Your users will make mistakes. They'll encounter network issues, browser quirks, and timing problems. They'll use your application in ways you never anticipated. That's not a bug in your users—it's the reality of building software for humans.

The question isn't whether users will do unexpected things. The question is: when they do, will your code handle it gracefully, or will everything fall apart?

Make your code more reliable than the people using it. Your future self (and your on-call rotation) will thank you.


What reliability lessons have you learned the hard way? Share your war stories in the comments below.
