I ship a lot of API/webhook integrations. Here’s how I make them NOT hurt in production 🔥
If you do freelance backend long enough, you start noticing a pattern:
Clients don’t pay for “beautiful code”.
They pay for it working tomorrow.
And webhook integrations are the fastest way to get random chaos:
- duplicate events
- out-of-order delivery
- retries that DDoS you
- and the classic “it worked yesterday 🤡”
So here’s my real-world baseline for building webhook/API integrations that don’t wake me up at 3AM.
No theory. Just a practical checklist + a simple architecture that scales.
1) Assume the webhook will be duplicated. Because it will. ✅
If you process every incoming request as “unique”, you’re cooked.
Rule: every webhook must be idempotent.
That means you need an event id or a hash that lets you say:
“Seen it. Skipping.”
Real workflow:
- extract event_id from payload (or generate a hash from stable fields)
- store it with a status
- on repeat: return 200 OK and do nothing
Because if you return 500, they will retry harder.
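A minimal sketch of that workflow in Python, using an in-process SQLite table as a stand-in for the real database. The table name `seen_events`, the helper names, and the "stable fields" (`type`, `object_id`, `occurred_at`) are assumptions for illustration; use whatever your provider actually guarantees to be stable across redeliveries.

```python
import hashlib
import json
import sqlite3


def dedup_key(payload: dict) -> str:
    """Use the provider's event id if present, else hash the stable fields."""
    if "event_id" in payload:
        return str(payload["event_id"])
    # Assumption: these three fields do not change between redeliveries.
    stable = {k: payload.get(k) for k in ("type", "object_id", "occurred_at")}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_events ("
        " dedup_key TEXT PRIMARY KEY,"
        " status TEXT NOT NULL)"
    )


def record_if_new(conn: sqlite3.Connection, payload: dict) -> bool:
    """Return True the first time an event is seen, False on every repeat."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen_events (dedup_key, status)"
            " VALUES (?, 'received')",
            (dedup_key(payload),),
        )
    # The primary key makes the insert a no-op for duplicates (rowcount 0).
    return cur.rowcount == 1
```

The handler returns 200 OK either way. It only enqueues work when `record_if_new` returns True. Letting the primary key do the check (instead of SELECT, then INSERT) avoids a race when two copies of the same event arrive at once.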
2) Acknowledge fast. Process async. ⚡
A webhook handler that does real work inside the HTTP request is a trap.
It feels fine until:
- your DB is slow for 5 seconds
- the provider timeout hits
- retries begin
- now you’re processing the same event 5 times
My default:
- Receive webhook
- Validate signature / basic checks
- Save event to DB (raw payload + metadata)
- Return 200 OK fast
- Process the event in a worker/job queue
This makes your system calm.
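Here is a framework-agnostic sketch of that flow, assuming an in-memory dict and a standard-library queue in place of a real DB and job queue. The names `handle_webhook`, `process`, and `worker` are hypothetical; in production the queue would be something durable.

```python
import queue
import threading

events: dict = {}                   # stand-in for the events table
jobs: queue.Queue = queue.Queue()   # stand-in for a durable job queue
_lock = threading.Lock()


def handle_webhook(raw_body: bytes, headers: dict) -> int:
    """HTTP-side work only: store the event, enqueue a job, return a status."""
    # Signature and basic checks would go here, before anything is stored.
    with _lock:
        event_pk = len(events) + 1
        events[event_pk] = {"raw": raw_body, "headers": headers, "status": "received"}
    jobs.put(event_pk)
    return 200


def process(event_pk: int) -> None:
    """Placeholder for the slow business logic."""
    events[event_pk]["status"] = "processed"


def worker() -> None:
    """Runs outside the request path; None is the shutdown signal."""
    while True:
        event_pk = jobs.get()
        if event_pk is None:
            jobs.task_done()
            break
        try:
            process(event_pk)
        except Exception as exc:
            events[event_pk]["status"] = "failed"
            events[event_pk]["error"] = str(exc)
        finally:
            jobs.task_done()
```

The request path never calls `process`, so a slow database or a slow third-party API delays the job, not the acknowledgement.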
3) Store raw payloads. Future you will thank you 🧠
When something breaks, the client will say:
“I don’t know, it just didn’t send.”
If you don’t store raw payloads, you have no evidence and no replay.
I always store:
- full raw JSON payload
- headers (at least important ones)
- provider name
- received timestamp
- processing status
- error message if failed
Then you can:
- replay events
- debug edge cases
- prove what happened
It turns “guessing” into “knowing”.
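As a rough sketch, the fields above map onto one table plus a replay helper. SQLite is used here only so the example is self-contained; the table name `webhook_events` and the function names are assumptions, not a prescribed schema.

```python
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS webhook_events (
    id          INTEGER PRIMARY KEY,
    provider    TEXT NOT NULL,
    raw_payload TEXT NOT NULL,   -- body exactly as received
    headers     TEXT NOT NULL,   -- JSON of the headers worth keeping
    received_at TEXT NOT NULL,   -- UTC, ISO 8601
    status      TEXT NOT NULL DEFAULT 'received',
    error       TEXT             -- short message if processing failed
)
"""


def store_event(conn: sqlite3.Connection, provider: str,
                raw_body: bytes, headers: dict) -> int:
    with conn:
        cur = conn.execute(
            "INSERT INTO webhook_events (provider, raw_payload, headers, received_at)"
            " VALUES (?, ?, ?, ?)",
            (provider, raw_body.decode("utf-8"), json.dumps(headers),
             datetime.now(timezone.utc).isoformat()),
        )
    return cur.lastrowid


def replay(conn: sqlite3.Connection, event_id: int, process) -> None:
    """Re-run processing from the stored payload and record the outcome."""
    (raw,) = conn.execute(
        "SELECT raw_payload FROM webhook_events WHERE id = ?", (event_id,)
    ).fetchone()
    try:
        process(json.loads(raw))
        status, error = "processed", None
    except Exception as exc:
        status, error = "failed", str(exc)[:200]
    with conn:
        conn.execute(
            "UPDATE webhook_events SET status = ?, error = ? WHERE id = ?",
            (status, error, event_id),
        )
```

Because `replay` reads only from the stored row, you can re-run any event after fixing a bug without asking the provider to resend it.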
4) Security: verify signatures or don’t pretend it’s secure 🔒
If the provider supports signatures, verify them.
Not later. Not “we’ll add it after MVP”.
Right away.
Because otherwise you’re basically running:
a public endpoint that triggers actions
That’s how you get spam, abuse, or worse.
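A minimal sketch of the check, assuming the provider signs the raw request body with HMAC-SHA256 and sends the hex digest in a header. That scheme is common but not universal; the exact header name, encoding, and any timestamp prefix come from your provider’s docs, and the function names here are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, raw_body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed scheme, check your provider)."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, raw_body: bytes, received: str) -> bool:
    expected = sign(secret, raw_body)
    # compare_digest runs in constant time, so the comparison does not leak
    # how many leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Two details matter in practice: verify against the raw bytes you received (not a re-serialized JSON object), and reject the request before storing or enqueuing anything.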
5) Rate limits and backoff: retries are not your enemy, your implementation is 😅
When processing fails, don’t do instant retries like a maniac.
Use backoff:
- 1 min
- 5 min
- 30 min
- 2 hours
- dead-letter queue (manual review)
Most integrations fail because:
- temporary provider downtime
- temporary DB issue
- network nonsense
Backoff makes it survive like a tank.
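The schedule above fits in a small lookup. This is a sketch with hypothetical names (`BACKOFF`, `next_retry_at`); most job queues have their own retry settings that you would configure instead.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Delays from the schedule above; after the last one the event is dead-lettered.
BACKOFF = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
]


def next_retry_at(failed_attempts: int, now: datetime) -> Optional[datetime]:
    """Return when to retry, or None when the event belongs in the dead-letter queue."""
    if failed_attempts < 1:
        raise ValueError("failed_attempts starts at 1")
    if failed_attempts > len(BACKOFF):
        return None
    return now + BACKOFF[failed_attempts - 1]
```

With this table an event gets four retries spread over about two and a half hours, then stops and waits for a human.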
6) Logging that actually helps, not “we logged something” 📝
I log at two layers:
Request layer
- request id
- provider
- event id
- status returned
Job layer
- event id
- job attempt
- result
- full error stack (if any)
And one extra rule:
If a job fails, save a short human-readable error next to the event record.
So later I can scan the DB and instantly see patterns.
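A small sketch of the two layers with the standard `logging` module. The logger names, the JSON line format, and the 200-character cap on the stored error are assumptions for illustration.

```python
import json
import logging

request_log = logging.getLogger("webhooks.request")
job_log = logging.getLogger("webhooks.job")


def log_request(request_id: str, provider: str, event_id: str, status: int) -> None:
    """Request layer: one structured line per incoming webhook."""
    request_log.info(json.dumps({
        "request_id": request_id, "provider": provider,
        "event_id": event_id, "status": status,
    }))


def run_job(event: dict, attempt: int, process) -> bool:
    """Job layer: run one attempt and keep a short error on the event if it fails."""
    try:
        process(event)
    except Exception as exc:
        # Logger.exception logs at ERROR level and appends the full traceback.
        job_log.exception("event_id=%s attempt=%d result=failed",
                          event["event_id"], attempt)
        event["status"] = "failed"
        event["error"] = f"{type(exc).__name__}: {exc}"[:200]
        return False
    job_log.info("event_id=%s attempt=%d result=ok", event["event_id"], attempt)
    event["status"] = "processed"
    return True
```

The full stack goes to the logs. The short `error` string goes next to the event, so a query grouped by error message shows patterns at a glance.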
7) My minimal scalable structure (simple but powerful)
I like separating responsibilities like this:
- webhook_controller: accepts HTTP, validates, stores event, returns response fast
- event_store: saves raw payloads, dedup keys, statuses
- processor: contains business logic: “what do we do with this event”
- adapters: provider-specific mapping (CRM A vs CRM B)
- queue/worker: runs processing asynchronously with retry rules
This lets you add new integrations without rewriting everything.
You just add a new adapter.
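To make the adapter idea concrete, here is a sketch where two made-up providers map into one internal event shape. Everything here is hypothetical: the `ContactEvent` fields, both payload layouts, and the `crm_a` / `crm_b` keys.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ContactEvent:
    """Provider-neutral shape the processor works with (hypothetical)."""
    event_id: str
    kind: str
    email: str


class Adapter(Protocol):
    def to_event(self, payload: dict) -> ContactEvent: ...


class CrmAAdapter:
    def to_event(self, payload: dict) -> ContactEvent:
        return ContactEvent(payload["id"], payload["type"], payload["contact"]["email"])


class CrmBAdapter:
    def to_event(self, payload: dict) -> ContactEvent:
        return ContactEvent(payload["eventId"], payload["action"], payload["emailAddress"])


# Adding an integration means adding one class and one entry here.
ADAPTERS: dict = {"crm_a": CrmAAdapter(), "crm_b": CrmBAdapter()}


def normalize(provider: str, payload: dict) -> ContactEvent:
    return ADAPTERS[provider].to_event(payload)
```

The processor only ever sees `ContactEvent`, so provider quirks stay inside the adapter that knows about them.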
Common production “gotchas” (learned the annoying way) 🤝
Out-of-order events
You might receive “updated” before “created”.
Solution:
- allow upserts
- store event history
- process based on current state
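One way to sketch “upsert and respect current state”, assuming each event carries a version (or timestamp) that increases with every change to the object. Not every provider gives you one, so treat `version`, `object_id`, and `data` as hypothetical field names.

```python
def apply_event(state: dict, event: dict) -> bool:
    """Upsert the record unless a newer event was already applied.

    Returns True if the event changed the state, False if it was stale.
    """
    current = state.get(event["object_id"])
    if current is not None and current["version"] >= event["version"]:
        return False  # stale or duplicate: keep the newer state
    # No "must already exist" check: an "updated" that arrives before
    # "created" simply creates the record.
    state[event["object_id"]] = {**event["data"], "version": event["version"]}
    return True
```

If “updated” (version 2) arrives first, it creates the record. When “created” (version 1) shows up later, it is skipped instead of overwriting newer data.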
Provider sends partial data
Sometimes they send only IDs and you must fetch details.
Solution:
- use a “hydration step” in the worker (API pull)
- cache if needed
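A sketch of a hydration step with a small TTL cache. The `fetch` callable stands in for the provider API call, and `contact_id`, `Hydrator`, and the 60-second TTL are assumptions for illustration.

```python
import time
from typing import Callable


class Hydrator:
    """Fetch full records by id, caching results for a short TTL."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float = 60.0):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._cache: dict = {}  # object_id -> (fetched_at, record)

    def get(self, object_id: str) -> dict:
        now = time.monotonic()
        cached = self._cache.get(object_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        record = self._fetch(object_id)  # the provider API call in a real worker
        self._cache[object_id] = (now, record)
        return record


def hydrate(event: dict, hydrator: Hydrator) -> dict:
    """Return a copy of the event with the full contact attached."""
    return {**event, "contact": hydrator.get(event["contact_id"])}
```

This runs in the worker, not in the HTTP handler, so a slow provider API only slows the job. Keep the TTL short: a stale cache can hide the very change the webhook was telling you about.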
Webhook timeouts
If you process inside the request, you lose.
Solution:
- fast ACK, async processing
TL;DR 🧾
If you want webhook integrations that behave in production:
- idempotency is mandatory
- acknowledge fast, process async
- store raw payloads
- verify signatures
- implement sane retries
- log like you’ll debug it later (because you will)
If you’ve ever shipped webhooks in production, you already know:
it’s never “done”.
it’s “stable enough to survive real traffic” 😄
Drop your worst webhook horror story below 👇