Kshitij Sharma

Posted on Apr 23

When Your “Clean” REST API Becomes a Production Nightmare

#api #webdev #systemdesign #backend

Everything looked perfect on paper.

Clean endpoints
Nice resource naming
Proper HTTP methods

Then production hit:

Clients started retrying aggressively
Data inconsistencies appeared
Versioning became a mess
One change broke 3 consumers

That’s when reality kicks in:

REST API design is not about elegance — it’s about survivability under change.

The Real Constraints of REST APIs in Production

You’re not designing endpoints.
You’re designing contracts under uncertainty.

What actually shapes your API:

Multiple clients (web, mobile, third-party)
Network unreliability
Backward compatibility pressure
Partial failures
Latency budgets
Data ownership boundaries

Ignoring these = brittle APIs that collapse under scale.

Resource Modeling Is Where Most People Fail

Everyone talks about /users and /orders.

That’s surface-level.

The real question:

What is the lifecycle of your resource?

Bad Design (naive CRUD mindset)

POST /orders
GET /orders/:id
PUT /orders/:id
DELETE /orders/:id

Looks fine. Completely wrong for real systems.

Why?

Orders aren’t freely mutable
State transitions matter (created → paid → shipped)
Business rules are ignored

Model State Transitions Explicitly

Better:

POST   /orders
POST   /orders/:id/pay
POST   /orders/:id/ship
POST   /orders/:id/cancel

Now:

You encode business logic in the API
You prevent invalid transitions
You reduce client-side bugs
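
One way to back those endpoints is a small transition table that every action handler consults. This is a minimal sketch, not a prescribed design: the `cancelled` status, the rule that a paid order can still be cancelled, and the names `TRANSITIONS` and `nextStatus` are all assumptions made for illustration.

```javascript
// Hypothetical transition table: status -> action -> next status.
// Statuses follow the created → paid → shipped flow; "cancelled" is assumed.
const TRANSITIONS = {
  created: { pay: 'paid', cancel: 'cancelled' },
  paid: { ship: 'shipped', cancel: 'cancelled' },
  shipped: {},
  cancelled: {},
};

// Returns the next status, or null when the action is not allowed
// from the current status (or the status itself is unknown).
function nextStatus(current, action) {
  if (!Object.hasOwn(TRANSITIONS, current)) return null;
  const allowed = TRANSITIONS[current];
  return Object.hasOwn(allowed, action) ? allowed[action] : null;
}
```

A handler for POST /orders/:id/ship could call `nextStatus(order.status, 'ship')` and reject the request when it gets `null`, so the rules live in one place instead of being scattered across handlers.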

Idempotency: The Thing That Saves You From Chaos

Most APIs break under retries.

Reality:

Clients retry
Proxies retry
Load balancers retry

If your endpoint isn’t idempotent → duplicate operations.

Real Failure Case

Payment API:

POST /payments

Client times out → retries → duplicate charge.

Congrats, you just lost user trust.

Fix: Idempotency Keys

POST /payments
Idempotency-Key: 8f3a-xyz-123

Server logic:

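// Pseudocode: exists, loadResult, processPayment, and storeResult are placeholders
if (exists(idempotencyKey)) {
  return loadResult(idempotencyKey); // replay the earlier response, do not charge again
}

const response = processPayment();
storeResult(idempotencyKey, response); // store the response itself, keyed by the idempotency key
return response;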
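
The key only helps if the client sends the same one on every retry of the same operation. Here is a rough client-side sketch, assuming a hypothetical `send(body, headers)` transport function that returns a promise; backoff between attempts is left out to keep it short.

```javascript
const { randomUUID } = require('node:crypto');

// Retries a payment request, reusing ONE idempotency key across all attempts.
// `send` is a hypothetical transport: (body, headers) => Promise<response>.
async function payWithRetry(send, body, maxAttempts = 3) {
  const key = randomUUID(); // generated once per operation, not once per attempt
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send(body, { 'Idempotency-Key': key });
    } catch (err) {
      // For example a timeout: the server may or may not have processed the request,
      // which is exactly why the retry must carry the same key.
      lastError = err;
    }
  }
  throw lastError;
}
```

Generating a fresh key inside the loop would defeat the purpose: the server would see each retry as a brand-new payment.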

Partial Failure Handling (The Silent Killer)

Your API calls:

DB
Cache
External service

One fails.

Now what?

Most APIs:

Return 500 and pray.

That’s not a strategy.

Better Approach: Explicit Failure Semantics

Return partial success where valid
Use compensating actions
Log correlation IDs

Example:

{
  "status": "partial_success",
  "data": {...},
  "failed_dependencies": ["inventory-service"]
}
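
A response like that can be assembled with `Promise.allSettled`, which waits for every call instead of bailing out on the first rejection. The sketch below is one possible shape, not the only one: the `gather` helper, the `failure` status, and the dependency names are assumptions.

```javascript
// Calls every dependency and reports which ones failed instead of failing
// the whole request. `dependencies` maps a hypothetical service name to an
// async function that fetches that service's part of the response.
async function gather(dependencies) {
  const names = Object.keys(dependencies);
  // The async wrapper turns synchronous throws into rejections as well.
  const settled = await Promise.allSettled(
    names.map(async (name) => dependencies[name]())
  );

  const data = {};
  const failed = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') data[names[i]] = outcome.value;
    else failed.push(names[i]);
  });

  let status = 'success';
  if (names.length > 0 && failed.length === names.length) status = 'failure';
  else if (failed.length > 0) status = 'partial_success';

  return { status, data, failed_dependencies: failed };
}
```

Whether a partial result is acceptable is a business decision per endpoint: a product page can render without inventory counts, a checkout probably cannot.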

Versioning: Where APIs Go to Die

Naive approach:

/v1/users
/v2/users

Problem:

You now maintain 2 systems forever
Clients don’t migrate

Better Strategy: Evolution Over Versioning

Add fields, don’t remove
Use default values
Deprecate gradually
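
In practice, "add fields, use defaults" often means the server fills in newer fields for old clients that never send them. A small sketch, assuming two hypothetical fields (`locale`, `marketingOptIn`) that were added after the first release:

```javascript
// Hypothetical defaults for fields added after the first release.
const USER_DEFAULTS = { locale: 'en', marketingOptIn: false };

// Old clients that omit the newer fields get defaults.
// Fields the server does not know about are ignored rather than rejected.
function normalizeUserInput(input) {
  return {
    name: input.name,
    email: input.email,
    locale: input.locale ?? USER_DEFAULTS.locale,
    marketingOptIn: input.marketingOptIn ?? USER_DEFAULTS.marketingOptIn,
  };
}
```

An old client posting only `name` and `email` keeps working, and a new client can opt in to the extra fields without anyone bumping a version.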

When Versioning Is Actually Needed

Breaking contract changes
Semantic shifts (not just fields)

Even then:

Prefer header-based versioning

Accept: application/vnd.myapi.v2+json
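
On the server, the version can be read from that media type with a small parser. This sketch assumes the `vnd.myapi` vendor name from the header above and falls back to version 1 when the header is missing or does not match; the fallback policy is an assumption, and some teams prefer to reject such requests instead.

```javascript
// Extracts the version from a vendor media type such as
// application/vnd.myapi.v2+json. Falls back to defaultVersion when the
// header is absent or does not contain a matching media type.
function versionFromAccept(acceptHeader, defaultVersion = 1) {
  const match = /application\/vnd\.myapi\.v(\d+)\+json/.exec(acceptHeader ?? '');
  return match ? Number(match[1]) : defaultVersion;
}
```

In an Express handler this could be called as `versionFromAccept(req.get('Accept'))` and used to pick the response shape.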

Overfetching vs Underfetching

Classic REST problem.

Overfetching

GET /users/:id

Returns:

Name
Email
Address
Preferences
Activity logs

Client only needs name.

Waste:

bandwidth
latency

Underfetching

Client needs:

user
orders
payments

Makes 3 calls.

If those calls run one after another, total latency is roughly the sum of three round trips.

Practical Fix: Controlled Expansion

GET /users/:id?include=orders,payments

Trade-off:

More complex backend
Better client efficiency
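
Part of that backend complexity is validating what clients ask for, so that `include` cannot be used to pull arbitrary relations. A minimal sketch with an allow-list; the `ALLOWED_INCLUDES` set and `parseInclude` name are assumptions:

```javascript
// Relations a client may expand on GET /users/:id (hypothetical allow-list).
const ALLOWED_INCLUDES = new Set(['orders', 'payments']);

// Parses a raw ?include= value such as "orders,payments" into a
// de-duplicated list of allowed relations plus a list of unknown names.
function parseInclude(raw) {
  if (typeof raw !== 'string' || raw.trim() === '') {
    return { include: [], unknown: [] };
  }
  const requested = [
    ...new Set(raw.split(',').map((name) => name.trim()).filter(Boolean)),
  ];
  return {
    include: requested.filter((name) => ALLOWED_INCLUDES.has(name)),
    unknown: requested.filter((name) => !ALLOWED_INCLUDES.has(name)),
  };
}
```

The handler can then decide whether unknown names are silently ignored or rejected with a 400. Either is defensible as long as it is documented.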

Implementation: What a Production-Ready API Looks Like

Express.js Example (Opinionated Structure)

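const crypto = require('crypto');
const express = require('express');

const app = express();
app.use(express.json()); // parse JSON bodies so req.body is populated

// Middleware: request ID for tracing
app.use((req, res, next) => {
  req.id = crypto.randomUUID();
  next();
});

// Idempotency store. In-memory for illustration only: it is lost on restart
// and not shared between instances.
const store = new Map();

// processPayment, getOrder, and shipOrder are application helpers defined elsewhere.
app.post('/payments', async (req, res) => {
  const key = req.get('Idempotency-Key');
  if (!key) {
    return res.status(400).json({ error: 'Idempotency-Key header is required' });
  }

  // Replay the stored response instead of processing the payment again
  if (store.has(key)) {
    return res.json(store.get(key));
  }

  const result = await processPayment(req.body);

  store.set(key, result);
  res.json(result);
});

// Explicit state transition
app.post('/orders/:id/ship', async (req, res) => {
  const order = await getOrder(req.params.id);
  if (!order) {
    return res.status(404).json({ error: 'Order not found' });
  }

  // 409 Conflict: the request is well-formed, but the order is in the wrong state
  if (order.status !== 'paid') {
    return res.status(409).json({ error: 'Invalid state' });
  }

  await shipOrder(order);
  res.json({ status: 'shipped' });
});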

Common Mistakes That Kill APIs

❌ Treating REST Like CRUD

You ignore:

business logic
state transitions
invariants

❌ Ignoring Timeouts and Retries

Your system works… until network instability hits.
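
A small first step is to put an upper bound on every outbound call. The `withTimeout` helper below is a hypothetical sketch that races any promise against a timer. It only stops waiting; it does not cancel the underlying operation.

```javascript
// Rejects if `promise` has not settled within `ms` milliseconds.
// The underlying work keeps running; only the caller stops waiting.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Since a timed-out call may still have succeeded on the other side, timeouts and idempotency go together: any retry after a timeout needs to be safe to repeat.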

❌ No Observability

No:

request IDs
structured logs
tracing

Debugging becomes guessing.
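
Structured logging does not need a framework to get started. As a rough sketch, a hypothetical `logLine` helper can emit one JSON object per line, with the request ID passed in as a field so every line for a request can be found with a single search:

```javascript
// Builds one JSON log line. Extra fields (requestId, orderId, ...) are
// merged in so log tooling can filter on them.
function logLine(level, message, fields = {}) {
  return JSON.stringify({
    ts: new Date().toISOString(),
    level,
    message,
    ...fields,
  });
}
```

Inside a handler this might look like `console.log(logLine('info', 'order shipped', { requestId: req.id }))`, assuming a middleware has already set `req.id`.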

❌ Tight Coupling to DB Schema

Changing the DB schema → breaks the API

Fix:

API is a contract, not a reflection of your database

❌ Misusing HTTP Status Codes

People do:

200 OK (with error inside body)

Or:

500 for everything

Both are wrong.
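
One way to avoid both extremes is a single mapping from domain errors to statuses. The error codes below are hypothetical; the point is that known failures get a specific 4xx or 5xx, and only genuinely unexpected errors fall through to 500 without leaking internal messages.

```javascript
// Hypothetical error codes raised by the domain layer, mapped to HTTP statuses.
const STATUS_BY_CODE = {
  VALIDATION_FAILED: 400,
  NOT_FOUND: 404,
  INVALID_STATE: 409,
  DEPENDENCY_UNAVAILABLE: 503,
};

// Converts an error into { status, body }. Unknown errors become a generic
// 500 so internal details are not exposed to clients.
function toHttpError(err) {
  if (Object.hasOwn(STATUS_BY_CODE, String(err.code))) {
    return {
      status: STATUS_BY_CODE[err.code],
      body: { error: { code: err.code, message: err.message } },
    };
  }
  return {
    status: 500,
    body: { error: { code: 'INTERNAL', message: 'Internal error' } },
  };
}
```

A handler's catch block can then do `const { status, body } = toHttpError(err); res.status(status).json(body);` and clients get a status they can branch on.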

Trade-offs You Can’t Escape

Flexibility vs Simplicity

Flexible APIs → harder to maintain
Simple APIs → limited use cases

Performance vs Consistency

Strong consistency → slower
Eventual consistency → complex

Versioning vs Evolution

Versioning → fragmentation
Evolution → constraints on change

Abstraction vs Control

High abstraction → easy usage
Low abstraction → better performance

What a Mature REST API Actually Looks Like

Explicit state transitions
Idempotent operations
Backward-compatible changes
Observability baked in
Controlled data fetching
Failure-aware responses

Final Reality Check

If your API:

breaks on retries
can’t evolve without versioning chaos
hides business logic
lacks observability

It’s not production-ready.

Key Takeaways

REST is not CRUD — it’s contract design under failure
Idempotency is non-negotiable
State transitions must be explicit
Versioning is a last resort, not default
Many production failures come from network behavior (timeouts, retries, partial failures), not from the happy-path code
API design is about handling bad conditions, not ideal flows

If you design APIs assuming everything works perfectly,
your system will fail the moment it doesn’t.

DEV Community