
Ama
I ship a lot of API/webhook integrations. Here’s how I make them NOT hurt in production 🔥

If you do freelance backend long enough, you start noticing a pattern:

Clients don’t pay for “beautiful code”.
They pay for it working tomorrow.

And webhook integrations are the fastest way to get random chaos:

  • duplicate events
  • out-of-order delivery
  • retries that DDoS you
  • and the classic “it worked yesterday 🤡”

So here’s my real-world baseline for building webhook/API integrations that don’t wake me up at 3AM.

No theory. Just a practical checklist + a simple architecture that scales.


1) Assume the webhook will be duplicated. Because it will. ✅

If you process every incoming request as “unique”, you’re cooked.

Rule: every webhook must be idempotent.

That means you need an event id or a hash that lets you say:

“Seen it. Skipping.”

Real workflow:

  • extract event_id from payload (or generate a hash from stable fields)
  • store it with a status
  • on repeat: return 200 OK and do nothing

Because if you return a 500, the provider will just retry harder.
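That whole workflow fits in a few lines. Here's a sketch in Python, with an in-memory dict standing in for a DB table with a unique index (`dedup_key`, `seen_events`, and the `received_at` exclusion are all illustrative names I made up, not any provider's API):

```python
import hashlib
import json

# in production: a DB table with a UNIQUE index on the dedup key
seen_events = {}

def dedup_key(payload: dict) -> str:
    """Use the provider's event id if present, else hash the stable fields."""
    if "event_id" in payload:
        return payload["event_id"]
    # "received_at" is an example of a volatile field you'd exclude from the hash
    stable = {k: v for k, v in payload.items() if k != "received_at"}
    return hashlib.sha256(json.dumps(stable, sort_keys=True).encode()).hexdigest()

def handle_webhook(payload: dict) -> int:
    key = dedup_key(payload)
    if key in seen_events:
        return 200  # "Seen it. Skipping." -- still 200, never 500
    seen_events[key] = "received"
    # ...enqueue for async processing here...
    return 200
```

Same payload twice, one processing run, two calm 200s.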


2) Acknowledge fast. Process async. ⚡

A webhook handler that does real work inside the HTTP request is a trap.

It feels fine until:

  • your DB is slow for 5 seconds
  • the provider timeout hits
  • retries begin
  • now you’re processing the same event 5 times

My default:

  1. Receive webhook
  2. Validate signature / basic checks
  3. Save event to DB (raw payload + metadata)
  4. Return 200 OK fast
  5. Process the event in a worker/job queue

This makes your system calm.
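Here's roughly what that five-step flow looks like, with Python's stdlib `queue.Queue` standing in for a real job queue (SQS, Celery, Sidekiq, whatever you run) and the signature check stubbed to a header-presence test:

```python
import json
import queue

event_queue = queue.Queue()  # stand-in for a real job queue

def receive_webhook(raw_body: bytes, headers: dict) -> int:
    # step 2: validate signature / basic checks (stubbed here)
    if "X-Signature" not in headers:
        return 400
    # step 3: persist raw payload + metadata (here: carried in the queue item)
    event = {"raw": raw_body.decode(), "headers": headers, "status": "received"}
    event_queue.put(event)
    # step 4: return 200 fast -- steps 5 happens in a worker, not in this request
    return 200

def process_next() -> dict:
    """Worker side: pull one stored event and do the real work."""
    event = event_queue.get()
    payload = json.loads(event["raw"])
    event["status"] = "processed"
    return payload
```

The HTTP handler never touches business logic; it just validates, stores, and acks.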


3) Store raw payloads. Future you will thank you 🧠

When something breaks, the client will say:

“I don’t know, it just didn’t send.”

If you don’t store raw payloads, you have no evidence and no replay.

I always store:

  • full raw JSON payload
  • headers (at least important ones)
  • provider name
  • received timestamp
  • processing status
  • error message if failed

Then you can:

  • replay events
  • debug edge cases
  • prove what happened

It turns “guessing” into “knowing”.
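A minimal version of that event table, sketched with SQLite (column names are my convention; adapt to your stack). Bonus: the `UNIQUE` constraint on `event_id` gives you dedup for free:

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE webhook_events (
        id           INTEGER PRIMARY KEY,
        provider     TEXT NOT NULL,
        event_id     TEXT UNIQUE,          -- dedup key
        raw_payload  TEXT NOT NULL,        -- full raw JSON, untouched
        headers      TEXT,                 -- at least the important ones
        received_at  TEXT NOT NULL,
        status       TEXT DEFAULT 'received',
        error        TEXT                  -- short human-readable error if failed
    )
""")

def store_event(provider: str, event_id: str, raw_payload: str, headers: dict) -> None:
    # INSERT OR IGNORE + the UNIQUE index: duplicates are silently dropped
    conn.execute(
        "INSERT OR IGNORE INTO webhook_events"
        " (provider, event_id, raw_payload, headers, received_at)"
        " VALUES (?, ?, ?, ?, ?)",
        (provider, event_id, raw_payload, json.dumps(headers),
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
```

Replay is now a `SELECT` away instead of an argument with the client.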


4) Security: verify signatures or don’t pretend it’s secure 🔒

If the provider supports signatures, verify them.

Not later. Not “we’ll add it after MVP”.

Right away.

Because otherwise you’re basically running:

public endpoint that triggers actions

That’s how you get spam, abuse, or worse.
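Most providers that sign webhooks use some flavor of HMAC over the raw request body (header name and encoding vary per provider, so check their docs). The two details that matter: hash the raw bytes, not the re-serialized JSON, and compare in constant time:

```python
import hashlib
import hmac

def verify_signature(raw_body: bytes, received_sig: str, secret: bytes) -> bool:
    """HMAC-SHA256 over the raw body, hex-encoded, constant-time compare."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # hmac.compare_digest avoids timing side-channels that `==` would leak
    return hmac.compare_digest(expected, received_sig)
```

Reject with a 401 before the payload ever touches your event store.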


5) Rate limits and backoff: retries are not your enemy, your implementation is 😅

When processing fails, don’t do instant retries like a maniac.

Use backoff:

  • 1 min
  • 5 min
  • 30 min
  • 2 hours
  • dead-letter queue (manual review)

Most integrations fail because:

  • temporary provider downtime
  • temporary DB issue
  • network nonsense

Backoff makes it survive like a tank.
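That schedule is trivial to encode. A sketch (delays in seconds; `None` meaning "stop retrying, park it in the dead-letter queue" is my convention):

```python
# 1 min, 5 min, 30 min, 2 hours -- in seconds
BACKOFF_SCHEDULE = [60, 300, 1800, 7200]

def next_retry_delay(attempt: int):
    """Seconds to wait before retry number `attempt` (0-based).

    Returns None once the schedule is exhausted: stop retrying and
    send the event to the dead-letter queue for manual review."""
    if attempt < len(BACKOFF_SCHEDULE):
        return BACKOFF_SCHEDULE[attempt]
    return None
```

Your job runner reads the delay, re-enqueues with that wait, and dead-letters on `None`.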


6) Logging that actually helps, not “we logged something” 📝

I log at two layers:

Request layer

  • request id
  • provider
  • event id
  • status returned

Job layer

  • event id
  • job attempt
  • result
  • full error stack (if any)

And one extra rule:
If job fails, save a short human-readable error near the event record.

So later I can scan the DB and instantly see patterns.


7) My minimal scalable structure (simple but powerful)

I like separating responsibilities like this:

  • webhook_controller
    accepts HTTP, validates, stores event, returns response fast

  • event_store
    saves raw payloads, dedup keys, statuses

  • processor
    contains business logic: “what do we do with this event”

  • adapters
    provider-specific mapping (CRM A vs CRM B)

  • queue/worker
    runs processing asynchronously with retry rules

This lets you add new integrations without rewriting everything.

You just add a new adapter.
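The adapter layer is the part people skip. A toy sketch of the idea: each adapter maps its provider's payload shape into one normalized shape, so the processor never knows or cares which CRM the event came from (the CRM names and field names here are invented):

```python
from abc import ABC, abstractmethod

class Adapter(ABC):
    """Provider-specific mapping: raw payload -> one normalized event shape."""
    @abstractmethod
    def normalize(self, payload: dict) -> dict: ...

class CrmAAdapter(Adapter):
    def normalize(self, payload: dict) -> dict:
        return {"contact_id": payload["id"], "email": payload["email_address"]}

class CrmBAdapter(Adapter):
    def normalize(self, payload: dict) -> dict:
        return {"contact_id": payload["contactId"], "email": payload["email"]}

ADAPTERS = {"crm_a": CrmAAdapter(), "crm_b": CrmBAdapter()}

def process(provider: str, payload: dict) -> dict:
    # the processor only ever sees the normalized shape
    return ADAPTERS[provider].normalize(payload)
```

New integration = one new class + one registry entry. The controller, store, and worker don't change.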


Common production “gotchas” (learned the annoying way) 🤝

Out-of-order events

You might receive “updated” before “created”.

Solution:

  • allow upserts
  • store event history
  • process based on current state
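A last-write-wins upsert keyed on the event's own timestamp handles "updated before created" without special-casing (in-memory dict standing in for your table; real code would read whatever timestamp field your provider sends):

```python
# record_id -> current state; stands in for your actual table
records = {}

def apply_event(record_id: str, state: dict, event_ts: int) -> None:
    """Upsert where the latest provider timestamp wins.

    If 'updated' (ts=2) lands before 'created' (ts=1), the late
    'created' is simply a no-op instead of clobbering newer data."""
    current = records.get(record_id)
    if current is None or event_ts >= current["updated_at"]:
        records[record_id] = {"state": state, "updated_at": event_ts}
```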

Provider sends partial data

Sometimes they send only IDs and you must fetch details.

Solution:

  • use a “hydration step” in the worker (API pull)
  • cache if needed
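The hydration step is really just "look up by ID, with a cache in front". A sketch where `provider_api` is a hypothetical callable wrapping the provider's REST endpoint:

```python
_cache = {}  # object_id -> details; use a TTL cache in real life

def fetch_details(provider_api, object_id: str) -> dict:
    """Hydration step: the webhook only carried an ID, pull the rest via API.

    `provider_api` is any callable taking an object id and returning its
    details -- in production, a thin wrapper over the provider's REST API."""
    if object_id in _cache:
        return _cache[object_id]
    details = provider_api(object_id)
    _cache[object_id] = details
    return details
```

Run it in the worker, never in the webhook request, so a slow provider API can't cause timeouts.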

Webhook timeouts

If you process inside request, you lose.

Solution:

  • fast ACK, async processing

TL;DR 🧾

If you want webhook integrations that behave in production:

  • idempotency is mandatory
  • acknowledge fast, process async
  • store raw payloads
  • verify signatures
  • implement sane retries
  • log like you’ll debug it later (because you will)

If you’ve ever shipped webhooks in production, you already know:

it’s never “done”.

it’s “stable enough to survive real traffic” 😄

Drop your worst webhook horror story below 👇
