DEV Community

George Belsky

I Stopped Building Webhook Retry Logic. Here's What I Use Instead.

Every backend team eventually builds the same thing: reliable message delivery between services. And every team builds it wrong at least once.

The Webhook Retry Stack

Here's what "just use webhooks" actually means in production:

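import asyncio
import json
import random

import requests
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from requests.exceptions import ConnectionError, Timeout

# Assuming FastAPI, which the decorator style suggests.
# verify_hmac, sign, db, process_order, dlq, alert, and WEBHOOK_SECRET are
# application-specific and not shown here.
app = FastAPI()

# Receiver: build an HTTP endpoint
@app.post("/webhooks/orders")
async def receive_order(req: Request):
    body = await req.body()  # raw bytes: the signature must cover exactly these

    # Verify HMAC signature (or get spoofed)
    signature = req.headers.get("x-webhook-signature")
    if not verify_hmac(signature, body, WEBHOOK_SECRET):
        return JSONResponse({"error": "invalid signature"}, status_code=401)

    # Idempotency check (webhooks arrive twice, sometimes three times)
    idempotency_key = req.headers.get("x-idempotency-key")
    if db.exists("processed_webhooks", idempotency_key):
        return {"status": "already processed"}

    process_order(json.loads(body))
    db.insert("processed_webhooks", idempotency_key)
    return {"status": "ok"}

class RetryableError(Exception):
    """Raised on a 5xx response so the attempt is retried."""

# Sender: retry with backoff
async def send_with_retry(url, payload, max_retries=5):
    for attempt in range(max_retries):
        try:
            # requests is blocking, so run it in a worker thread
            resp = await asyncio.to_thread(
                requests.post, url, json=payload, headers=sign(payload), timeout=10
            )
            if 200 <= resp.status_code < 300:
                return resp
            if resp.status_code < 500:
                break  # a 4xx will not succeed on retry; go straight to the DLQ
            raise RetryableError()
        except (ConnectionError, Timeout, RetryableError):
            if attempt < max_retries - 1:  # no point sleeping after the last attempt
                # Exponential backoff with jitter, capped at 300 seconds
                delay = min(2 ** attempt + random.uniform(0, 1), 300)
                await asyncio.sleep(delay)
    dlq.send(payload)  # dead letter queue
    alert(f"Webhook delivery failed after at most {max_retries} attempts")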
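
The snippet above calls `sign` and `verify_hmac` without defining them. Here is a minimal sketch of what they could look like, assuming HMAC-SHA256 with a hex digest carried in `x-webhook-signature`. The hash, the encoding, and the names `serialize` and `sign_body` are my assumptions, not part of the original code. The point it illustrates: sign the exact bytes you send, and compare digests in constant time.

```python
import hashlib
import hmac
import json
from typing import Optional


def serialize(payload: dict) -> bytes:
    """Produce the exact bytes that are both signed and sent."""
    return json.dumps(payload, separators=(",", ":"), sort_keys=True).encode("utf-8")


def sign_body(body: bytes, secret: bytes) -> dict:
    """Return the signature header for an already-serialized body."""
    digest = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return {"x-webhook-signature": digest}


def verify_hmac(signature: Optional[str], body: bytes, secret: bytes) -> bool:
    """Recompute the digest over the raw body and compare in constant time."""
    if not signature:
        return False
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Compare as bytes so a non-ASCII header value cannot raise TypeError
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))
```

If the sender lets its HTTP library re-serialize the JSON after signing, the bytes on the wire can differ from the bytes that were signed, and verification fails for reasons that are painful to debug. Sending the output of `serialize` as the request body avoids that.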

And this is the simplified version. Production adds:

  • DLQ consumer that retries or alerts
  • Monitoring for delivery success rate
  • Alerting on DLQ depth
  • Cleanup cron for the idempotency table
  • Secret rotation for HMAC keys
  • Circuit breaker when receiver is down
  • Thundering herd protection when receiver comes back up

That's 200+ lines of infrastructure code. For every pair of services that need to talk.
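
To give a sense of what one item on that list costs, here is a rough sketch of a circuit breaker. It is a simplified illustration, not production code: the class name, the thresholds, and the injectable `clock` parameter are all assumptions of mine.

```python
import time


class CircuitBreaker:
    """Stop calling a failing receiver until a cool-down has passed."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cool-down, requests are allowed again;
        # one more failure re-opens the circuit.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

A sender would check `allow_request()` before each attempt and call `record_success()` or `record_failure()` afterwards. This version lets every caller through once the cool-down expires, so it does nothing about the thundering herd on its own. That needs a limit on trial requests as well.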

The Alternative: Let the Platform Deliver

import os

from axme import AxmeClient, AxmeClientConfig

client = AxmeClient(AxmeClientConfig(api_key=os.environ["AXME_API_KEY"]))

intent_id = client.send_intent({
    "intent_type": "intent.order.process.v1",
    "to_agent": "agent://myorg/production/order-processor",
    "payload": {
        "order_id": "ORD-2026-00142",
        "customer": "acme-corp",
        "total": 4999.50,
    },
})
result = client.wait_for(intent_id)

No webhook endpoint on the receiver. No HMAC. No idempotency table. No retry logic. No DLQ. No monitoring for delivery failures.

The platform handles at-least-once delivery on all channels.

Five Ways to Receive (Not Just Webhooks)

The receiver picks the delivery mode that fits its architecture:

| Mode | Transport | Best For |
| --- | --- | --- |
| stream | SSE (server-sent events) | Real-time agents, always-on services |
| poll | GET request | Serverless functions, cron jobs |
| http | Webhook POST | Traditional services (but platform handles retry) |
| inbox | Human queue | Approvals, reviews, manual tasks |
| internal | Platform-handled | Reminders, escalations, notifications |

The sender doesn't care which mode the receiver uses. send_intent() is the same regardless.

This is the key difference from webhooks: the receiver chooses how to get messages, not the sender. The sender doesn't need to know if the receiver is a Lambda function, a Kubernetes pod, or a human with an email inbox.
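
The decoupling is easier to see in a toy model. The sketch below is an in-memory stand-in written for illustration only. It is not the AXME SDK, and `ToyPlatform`, `register_stream`, `send`, and `poll` are hypothetical names. It covers just two of the modes: a stream-style receiver that registers a callback, and a poll-style receiver that fetches when it is ready.

```python
from collections import defaultdict, deque


class ToyPlatform:
    """In-memory stand-in for a delivery platform; not the AXME SDK."""

    def __init__(self):
        self.handlers = {}                # agent -> callback (stream-style)
        self.queues = defaultdict(deque)  # agent -> pending messages (poll-style)

    def register_stream(self, agent, callback):
        self.handlers[agent] = callback

    def send(self, agent, payload):
        # The sender's call is identical whichever mode the receiver chose
        if agent in self.handlers:
            self.handlers[agent](payload)
        else:
            self.queues[agent].append(payload)

    def poll(self, agent):
        queue = self.queues[agent]
        return queue.popleft() if queue else None


platform = ToyPlatform()
received = []
platform.register_stream("agent://demo/streamer", received.append)

platform.send("agent://demo/streamer", {"order_id": "A"})  # pushed to the callback
platform.send("agent://demo/poller", {"order_id": "B"})    # held until polled
next_message = platform.poll("agent://demo/poller")
```

Both `send` calls look the same from the sender's side. Only the receiver's registration decides whether the message is pushed or held.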

What You Stop Building

| Component | With Webhooks | With AXME |
| --- | --- | --- |
| HTTP endpoint on receiver | You build it | Not needed (for stream/poll/inbox) |
| HMAC verification | You build it | Platform handles |
| Idempotency table | You build it | Built into intent lifecycle |
| Retry with backoff | You build it | Platform handles (configurable) |
| Dead letter queue | You build it | Platform handles |
| Delivery monitoring | You build it | Built-in lifecycle events |
| Secret rotation | You manage | Platform manages |
| Thundering herd protection | You build it | Platform handles |

When Webhooks Are Still Fine

Webhooks work well when:

  • The receiver is always up (99.9%+ uptime)
  • Occasional message loss is acceptable
  • You only have 2-3 service pairs communicating
  • You already have the retry infrastructure built

Webhooks break down when:

  • You have 10+ services that need reliable delivery
  • Receivers go down for minutes/hours (deploys, incidents)
  • You need delivery guarantees (financial transactions, compliance)
  • You need human approval gates in the delivery chain
  • You're tired of debugging "why didn't the webhook arrive"
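
To put a number on the downtime point: with the `2 ** attempt` backoff from the sender snippet, the retry window is short. A quick back-of-the-envelope helper shows how short. It ignores jitter and request time, and `backoff_window` is a name I made up for this calculation.

```python
def backoff_window(sleeps: int, cap: float = 300.0) -> float:
    """Total seconds covered by `sleeps` backoff delays, ignoring jitter."""
    return float(sum(min(2 ** attempt, cap) for attempt in range(sleeps)))


print(backoff_window(5))   # 31.0 seconds: about half a minute
print(backoff_window(12))  # 1411.0 seconds: roughly 23.5 minutes
```

Five backoff steps cover about half a minute, so any deploy or incident longer than that pushes messages to the DLQ. Even twelve steps, with the 300-second cap, cover less than half an hour.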

Try It

Working example: the sender submits an order and the receiver processes it via an SSE stream, with no webhook endpoint needed:

github.com/AxmeAI/reliable-delivery-without-webhooks

Python, TypeScript, and Go implementations included.

Built with AXME - 5 delivery bindings with at-least-once guarantees. Alpha - feedback welcome.
