Every backend team eventually builds the same thing: reliable message delivery between services. And every team builds it wrong at least once.
The Webhook Retry Stack
Here's what "just use webhooks" actually means in production:
```python
import random
import time

import requests
from flask import Flask, request

app = Flask(__name__)
# verify_hmac, db, process_order, sign, dlq, alert and WEBHOOK_SECRET
# stand in for your own app-specific helpers and config.

# Receiver: build an HTTP endpoint
@app.post("/webhooks/orders")
def receive_order():
    # Verify HMAC signature over the raw body (or get spoofed)
    signature = request.headers.get("x-webhook-signature")
    if not verify_hmac(signature, request.get_data(), WEBHOOK_SECRET):
        return {"error": "invalid signature"}, 401

    # Idempotency check (webhooks arrive twice, sometimes three times)
    idempotency_key = request.headers.get("x-idempotency-key")
    if db.exists("processed_webhooks", idempotency_key):
        return {"status": "already processed"}, 200

    process_order(request.get_json())
    # Not atomic: a crash between these two calls reprocesses the order
    db.insert("processed_webhooks", idempotency_key)
    return {"status": "ok"}, 200


# Sender: retry with exponential backoff and jitter
def send_with_retry(url, payload, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            resp = requests.post(url, json=payload, headers=sign(payload), timeout=10)
            if 200 <= resp.status_code < 300:
                return resp
            if resp.status_code < 500 and resp.status_code != 429:
                break  # other 4xx: retrying won't help
        except (requests.ConnectionError, requests.Timeout):
            pass  # network failure: worth retrying
        if attempt < max_attempts - 1:
            time.sleep(min(2 ** attempt + random.uniform(0, 1), 300))
    dlq.send(payload)  # dead letter queue
    alert(f"Webhook delivery failed after {max_attempts} attempts")
```
And this is the simplified version. Production adds:
- DLQ consumer that retries or alerts
- Monitoring for delivery success rate
- Alerting on DLQ depth
- Cleanup cron for the idempotency table
- Secret rotation for HMAC keys
- Circuit breaker when receiver is down
- Thundering herd protection when receiver comes back up
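Each bullet is real code. As one example, the circuit breaker might start out like the minimal sketch below. The class name, the default thresholds, and the injectable `clock` are illustrative assumptions, not taken from any particular library. After `failure_threshold` consecutive failures the breaker opens and rejects calls; once `cooldown_seconds` have passed it lets calls through again (half-open), and a single further failure re-opens it.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker (illustrative sketch, not a library API).

    Closed: requests flow. Open: requests are rejected until the cooldown
    passes. Half-open: requests flow again; one failure re-opens the circuit.
    """

    def __init__(self, failure_threshold=5, cooldown_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True
        # Half-open once the cooldown has elapsed
        return self.clock() - self.opened_at >= self.cooldown_seconds

    def record_success(self):
        # Any success closes the circuit and resets the failure count
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # (Re)open the circuit and restart the cooldown
            self.opened_at = self.clock()
```

A sender would call `allow_request()` before each delivery attempt and report the outcome with `record_success()` or `record_failure()`; while the circuit is open, messages go to the DLQ or a holding queue instead of hammering a receiver that is down.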
That's 200+ lines of infrastructure code. For every pair of services that need to talk.
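Secret rotation is a good example of a line item that hides real work: during the changeover the receiver has to accept signatures made with either the old or the new key. A minimal sketch using Python's standard `hmac` module, assuming HMAC-SHA256 over the raw body with a hex-encoded signature. `sign_body` is a hypothetical helper, and this `verify_hmac` takes a list of active secrets rather than the single `WEBHOOK_SECRET` used in the receiver code.

```python
import hashlib
import hmac


def sign_body(body: bytes, secret: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_hmac(signature, body: bytes, secrets) -> bool:
    """Accept a signature made with any currently active secret.

    During a rotation the receiver lists both the old and the new secret,
    so deliveries already signed with the old key still verify.
    """
    if not signature:
        return False
    # Compare as bytes in constant time; a malformed header simply fails
    received = signature.encode()
    return any(
        hmac.compare_digest(received, sign_body(body, secret).encode())
        for secret in secrets
    )
```

Once every sender has switched to the new key, the old secret is dropped from the list, which is one more deploy to coordinate per service pair.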
The Alternative: Let the Platform Deliver
```python
import os

from axme import AxmeClient, AxmeClientConfig

client = AxmeClient(AxmeClientConfig(api_key=os.environ["AXME_API_KEY"]))

intent_id = client.send_intent({
    "intent_type": "intent.order.process.v1",
    "to_agent": "agent://myorg/production/order-processor",
    "payload": {
        "order_id": "ORD-2026-00142",
        "customer": "acme-corp",
        "total": 4999.50,
    },
})

result = client.wait_for(intent_id)
```
No webhook endpoint on the receiver (unless it opts into the http delivery mode). No HMAC. No idempotency table. No retry logic. No DLQ. No monitoring for delivery failures.
The platform handles at-least-once delivery on all channels.
Five Ways to Receive (Not Just Webhooks)
The receiver picks the delivery mode that fits its architecture:
| Mode | Transport | Best For |
|---|---|---|
| stream | SSE (server-sent events) | Real-time agents, always-on services |
| poll | GET request | Serverless functions, cron jobs |
| http | Webhook POST | Traditional services (but platform handles retry) |
| inbox | Human queue | Approvals, reviews, manual tasks |
| internal | Platform-handled | Reminders, escalations, notifications |
The sender doesn't care which mode the receiver uses. send_intent() is the same regardless.
This is the key difference from webhooks: the receiver chooses how to get messages, not the sender. The sender doesn't need to know if the receiver is a Lambda function, a Kubernetes pod, or a human with an email inbox.
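The idea of receiver-chosen delivery can be illustrated with a toy in-memory router. This is not AXME's implementation or API: `Router`, `Binding`, and the mode strings are hypothetical names, only push (stream/http) and poll are modeled, and retries, persistence, and acknowledgements are left out. The point is structural: the receiver registers how it wants to be reached, and the sender's `send()` call never changes.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional

# Toy, in-memory illustration only: not AXME's implementation or API.


@dataclass
class Binding:
    """How one receiver has chosen to get its messages."""
    mode: str  # "stream" or "http" (pushed) or "poll" (pulled)
    push: Optional[Callable[[dict], None]] = None  # used by push modes
    pending: Deque[dict] = field(default_factory=deque)  # used by "poll"


class Router:
    def __init__(self) -> None:
        self.bindings: Dict[str, Binding] = {}

    def register(self, address: str, binding: Binding) -> None:
        # The receiver, not the sender, decides how it is reached
        self.bindings[address] = binding

    def send(self, to: str, message: dict) -> None:
        # The sender's call is identical whatever mode the receiver picked
        binding = self.bindings[to]
        if binding.mode == "poll":
            binding.pending.append(message)  # held until the receiver asks
        else:
            binding.push(message)  # e.g. written to an SSE stream or POSTed

    def poll(self, address: str) -> List[dict]:
        # Poll-mode receivers drain whatever has queued up since last time
        binding = self.bindings[address]
        messages = list(binding.pending)
        binding.pending.clear()
        return messages
```

Switching a receiver from push to poll only changes what that receiver registers; `send()` and every sender stay the same.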
What You Stop Building
| Component | With Webhooks | With AXME |
|---|---|---|
| HTTP endpoint on receiver | You build it | Not needed (for stream/poll/inbox) |
| HMAC verification | You build it | Platform handles |
| Idempotency table | You build it | Built into intent lifecycle |
| Retry with backoff | You build it | Platform handles (configurable) |
| Dead letter queue | You build it | Platform handles |
| Delivery monitoring | You build it | Built-in lifecycle events |
| Secret rotation | You manage | Platform manages |
| Thundering herd protection | You build it | Platform handles |
When Webhooks Are Still Fine
Webhooks work well when:
- The receiver is always up (99.9%+ uptime)
- Occasional message loss is acceptable
- You only have 2-3 service pairs communicating
- You already have the retry infrastructure built
Webhooks break down when:
- You have 10+ services that need reliable delivery
- Receivers go down for minutes/hours (deploys, incidents)
- You need delivery guarantees (financial transactions, compliance)
- You need human approval gates in the delivery chain
- You're tired of debugging "why didn't the webhook arrive"
Try It
Working example: the sender submits an order and the receiver processes it via an SSE stream, with no webhook endpoint needed:
github.com/AxmeAI/reliable-delivery-without-webhooks
Python, TypeScript, and Go implementations included.
Built with AXME: 5 delivery bindings with at-least-once guarantees. Currently in alpha; feedback welcome.