DEV Community

Cover image for I Stopped Using REST After This Production Incident
Rohit Tiwari
Rohit Tiwari

Posted on • Originally published at rohittiwari.me

I Stopped Using REST After This Production Incident

It started with a production incident

[INCIDENT — e.g. "A customer was charged twice. The client received a network timeout, retried the request, and our order service created two identical orders seconds apart. Same card. Two charges. Support was already on the phone before I knew anything was wrong."]

I found the root cause in twenty minutes. But I spent three more hours in the logs trying to reconstruct exactly what had happened — because nothing in the system agreed on how to report things. Different error shapes. Different HTTP codes. Some endpoints returned nothing at all on failure.

We fixed it. But I spent the following week thinking about the actual problem — which wasn't the bug. It was that our API had no shared contract for anything.


The naming problem REST never solved

Here is the specific thing that kept bothering me:

DELETE /orders/123
Enter fullscreen mode Exit fullscreen mode

Looks clean. But in our system, cancelling an order actually:

  1. Validates the order can be cancelled given its current state
  2. Issues a refund through Stripe
  3. Releases reserved inventory
  4. Sends a cancellation email to the customer
  5. Writes a compliance log entry

The URL says DELETE. The operation does five things. Those two facts are not just different — they are actively misleading. Every developer who touches this endpoint has to go read the implementation to understand what they are actually calling. There is no other way to know.

Multiply that by fifty endpoints. A hundred. Every URL is technically accurate and practically useless.


The fix: name the operation, not the noun

I started naming operations after what they actually do — not what noun they touch, but the whole operation.

# REST — says what thing, not what happens
DELETE /orders/123

# PACT — says exactly what happens
POST /api/v1/private/cancel-order-and-refund-and-notify
Enter fullscreen mode Exit fullscreen mode

That second URL is longer. It is also completely honest. You know what will happen before you call it. You know what to test. You know what to put in the incident report.

I called this convention PACT — Protocol for Action-based Coordinated Transport.

The resource segment of the URL is not a noun. It is a description of the full operation — whatever that involves, however many systems it touches.


The URL structure

POST /api/:version/:scheme/:resource
Enter fullscreen mode Exit fullscreen mode

Three segments after the prefix. That is all.

Segment What it is Examples
:version API contract version — team picks the format v1 v2 2024-01 stable
:scheme Access level — enforced before your code runs public private internal
:resource The full operation name cancel-order-and-refund-and-notify

Simple operations

POST /api/v1/public/fetch-product-catalogue
POST /api/v1/private/update-billing-address
POST /api/v1/private/delete-draft-post
Enter fullscreen mode Exit fullscreen mode

Compound operations — where this shines

POST /api/v1/private/register-user-and-send-welcome
POST /api/v1/private/cancel-order-and-refund-and-notify
POST /api/v1/private/downgrade-plan-and-prorate-and-email
POST /api/v1/internal/expire-stale-sessions-and-audit-log
Enter fullscreen mode Exit fullscreen mode

Read those compound operations. You know exactly what they do. No docs needed. No implementation dive. No guesswork.

When you ship v2, old and new run in parallel — clients migrate on their own timeline:

POST /api/v1/private/cancel-order-and-refund   # still running
POST /api/v2/private/cancel-order-and-refund   # new contract alongside it
Enter fullscreen mode Exit fullscreen mode

Seven URL prefixes, one naming pattern

The same idea extends across every transport in your system:

POST   /api/:version/:scheme/:resource     # application operations
POST   /webhook/:provider/:event           # Stripe, GitHub, SendGrid
WS     /ws/:version/:resource             # real-time bidirectional
GET    /stream/:version/:resource         # SSE — server pushes to client
POST   /events/:version/:resource         # internal event bus
POST   /rpc/:version/:resource            # service-to-service calls
GET    /health                            # liveness — no version needed
Enter fullscreen mode Exit fullscreen mode

One naming convention. Every transport. If you know what something does, you know what it is called — regardless of how it travels.


Responses: always HTTP 200, always the same shape

This is the other half of what the incident exposed. Every endpoint had invented its own error format.

PACT fixes it with one rule: always HTTP 200. Success or failure lives inside the body.

// Works
{
  "success": true,
  "requestId": "req_a3f9b2c1",
  "data": { "orderId": "ord_789", "refunded": true },
  "meta": { "duration": 42, "cached": false }
}
Enter fullscreen mode Exit fullscreen mode
// Fails  still HTTP 200
{
  "success": false,
  "requestId": "req_a3f9b2c1",
  "error": {
    "category": "CONFLICT",
    "code": "ORDER_ALREADY_CANCELLED",
    "message": "This order cannot be cancelled",
    "retryable": false,
    "source": "runner"
  },
  "meta": { "duration": 8 }
}
Enter fullscreen mode Exit fullscreen mode

requestId is present on every response — success or failure. Write your error handling once. It works on every endpoint, every version, forever.


Eight error categories

No more arguing about 422 vs 400 vs 409:

Category Retry? When
VALIDATION No Bad input — wrong types, missing fields
AUTH No Token missing or invalid
FORBIDDEN No Valid token, wrong permissions
NOT_FOUND No The thing does not exist
CONFLICT No Wrong state — duplicate, already done
RATE_LIMITED ✅ Yes Too many requests — wait and retry
INTERNAL ✅ Yes Server error
UNAVAILABLE ✅ Yes Dependency is down

retryable: true = try again. retryable: false = fix the request first. That is all your client needs to decide what to do next.


Per-operation files — contract.ts and runner.ts

Each operation is a folder:

/transports/api/v1/cancel-order-and-refund/
  contract.ts    ← rules — declared here, enforced automatically
  runner.ts      ← business logic only
  test.spec.ts
Enter fullscreen mode Exit fullscreen mode

contract.ts — you declare, the framework enforces:

export const contract = {
  scheme: 'private',

  schema: z.object({
    orderId: z.string(),
    reason:  z.string().optional(),
  }),

  idempotency: { required: true }, // ← prevents the double-charge incident
  cache:       { enabled: false },
  rateLimit:   { rpm: 30 },
}
Enter fullscreen mode Exit fullscreen mode

runner.ts — zero protocol knowledge. No HTTP. No auth checks. No rate limiting. All of that is already handled before your code runs:

export default async function run(ctx: PactContext) {
  // ctx.payload   — validated, matches schema exactly
  // ctx.identity  — logged-in user (null if public)
  // ctx.tenant    — resolved tenant (null if single-tenant)
  // ctx.version   — 'v1', 'v2' — read-only, for logging only

  const order = await db.orders.find(ctx.payload.orderId)
  if (!order) throw new NotFoundError('ORDER_NOT_FOUND')

  await stripe.refund(order.paymentId)
  await email.send('cancellation', { order })
  await db.orders.cancel(order.id)

  return { cancelled: true, refunded: true }
  // PACT wraps it: { success: true, data: { cancelled: true, refunded: true } }
}
Enter fullscreen mode Exit fullscreen mode

The double-charge from my incident? Solved by idempotency: { required: true }. Client sends X-Idempotency-Key. Same key seen again — stored result returned immediately, runner never executes twice. Declared at the contract level. Visible. Auditable. Automatic.


Versioning and deprecation

When v1 is being retired in favour of v2, you set one field:

// /transports/api/v1/cancel-order-and-refund/contract.ts
deprecated: '2026-06-01'  // 90-day minimum window required
Enter fullscreen mode Exit fullscreen mode

From that point, every response includes:

"meta": {
  "duration": 24,
  "cached": false,
  "deprecation": "2026-06-01"
}
Enter fullscreen mode Exit fullscreen mode

Plus a standard Sunset HTTP header that infrastructure tools read automatically. After the date, the route returns VERSION_RETIRED under UNAVAILABLE — not a silent 404. Clients get a clear, machine-readable signal that they need to upgrade.


Tenant context — not in the URL

Multi-tenancy lives in context, not path segments:

JWT token claim     →  wins always
X-Tenant-ID header  →  fallback when token has no tenant claim
Neither present     →  single-tenant mode — not an error
Enter fullscreen mode Exit fullscreen mode

No /api/v1/acme/private/cancel-order in your routes. The URL says what happens. The context carries who is asking.


The folder structure

/transports             ← all communication channels
  /api
    /v1
      /cancel-order-and-refund
        contract.ts  runner.ts  test.spec.ts
      /register-user-and-send-welcome
        contract.ts  runner.ts  test.spec.ts
    /v2
      /cancel-order-and-refund    ← new contract, runs in parallel
        contract.ts  runner.ts  test.spec.ts
  /webhook
    /stripe
      /payment-succeeded
  /ws/v1/order-tracking-updates
  /stream/v1/activity-feed
  /events/v1/order-placed
  /rpc/v1/check-permissions

/core                   ← framework — never edit for features
/adapters               ← outbound: stripe, sendgrid, openai, anthropic
/queue
  publisher.ts
  consumer.ts
  /jobs                 ← background jobs are queue consumers
    /send-invoice-reminders
    /expire-stale-sessions
/libs                   ← all shared internal code
  /utils  /types  /constants  /validators  /errors  /helpers  /middleware
/config
server.ts
Enter fullscreen mode Exit fullscreen mode

A new developer gets a bug report on POST /api/v1/private/cancel-order-and-refund. They open /transports/api/v1/cancel-order-and-refund/runner.ts. One path. No searching.


What I actually learned

The most valuable thing about operation-based naming was something I did not expect: it forced clearer thinking at design time.

When you have to write cancel-order-and-refund-and-notify as a URL, you commit to that operation as a unit. You think about what complete success means. You think about partial failure. You think about idempotency. You think about the audit trail.

When you write DELETE /orders/123, you are thinking about a resource state change — not the five downstream effects. That difference in framing is where a lot of production incidents are born.

The URL is a contract with everyone who reads it. It should be honest.


What PACT is — and is not

Not a framework. Not a package. Not a spec with a committee.

A convention — a set of rules a team agrees to follow. You implement it in whatever language and stack you already use. The value is consistency: every operation behaves the same way, reports errors the same way, handles retries the same way, deprecates the same way.

I built a full specification covering all seven transports, the complete request/response envelope, eight error categories, a 20-step middleware chain, versioning and deprecation contract, folder structure, and a compliance checklist. Drop a comment if you want it.


Does operation-based naming resonate with you? Where does it break down? What would you change?


PACT — Protocol for Action-based Coordinated Transport

Top comments (0)