Gabriel Anhaia

Posted on May 24

Pull-Based vs Push-Based Architecture: The Choice That Decides Your Reliability Story

#systemdesign #architecture #distributedsystems #reliability

Pull-based and push-based aren't just delivery styles. They're reliability stories. Pull buffers producer spikes and consumer outages and pays for it in latency; push delivers in milliseconds and passes every spike and outage straight to the other side. Picking the wrong one means your most predictable failure shape becomes your worst.

Most teams pick by accident. Someone says "we need webhooks," everyone nods, and three months later a marketing campaign fires a million events at a receiver that was sized for a quiet Wednesday. Or someone picks Kafka because it's on the resume bingo card, and now a tiny webhook integration runs through a 7-broker cluster with a 4-week retention policy.

The right way to pick is to ask: which failure shape can I afford?

Push and pull, in failure-shape terms

A push system has the producer drive the data. The producer decides when to send, how much to send, and where to send it. The consumer is the passenger. When the producer is calm, life is easy. When the producer panics (a viral campaign, a backfill job, a fan-out from a single upstream event) the consumer absorbs the punch.

A pull system inverts the relationship. The consumer drives. It decides when to fetch, how much to fetch, and how fast to drain. The producer writes to a buffer (a log, a queue, a table) and walks away. When the consumer is slow, work piles up in the buffer. When the consumer dies, work waits.

That's the whole tradeoff. Push gives you low latency and amplifies producer spikes. Pull gives you natural backpressure and survives consumer outages.

Everything else in this post is a footnote on those two sentences.

Push: webhooks, server-sent events, server-initiated jobs

The push family is broader than people think. Webhooks are push. Server-sent events are push. WebSocket fan-out is push. A cron job firing HTTP POSTs at downstream services is push. AWS SNS topics that fan out to HTTPS endpoints are push.

What unites them: the producer decides the timing, and the network round-trip happens at production time.

The good news is latency. A push event lands at the consumer milliseconds after it's produced. That's why payment processors use webhooks for payment.succeeded. The merchant wants to update the order page right now, not in 30 seconds when a poller wakes up.

The bad news is that the consumer has no say. If the producer fires 50,000 webhooks per second and the consumer can absorb 5,000, the producer doesn't know or care. The 45,000 overflow becomes failed deliveries, retries, or both. Stripe, for example, retries failed webhooks with exponential backoff for up to 3 days, but most teams' receivers fall over long before the retries help.

There's no shared buffer between producer and consumer in pure push. The network is the buffer, and the network is bad at buffering.

Pull: polling consumers, Kafka-style log readers, batch jobs

Pull means the consumer initiates. A SQS consumer that calls ReceiveMessage every second. A Kafka consumer group that calls poll() and gets a batch back. A nightly ETL job that scans a transactions table where created_at > last_run. All pull.

The key property: there's a buffer between producer and consumer, and the consumer chooses its rate.

Here's the minimum viable Kafka consumer in Python. Production code, not pseudocode:

from confluent_kafka import Consumer, KafkaError
import json
import logging

log = logging.getLogger(__name__)

def make_consumer(group_id: str) -> Consumer:
    return Consumer({
        "bootstrap.servers": "kafka-1:9092,kafka-2:9092",
        "group.id": group_id,
        "enable.auto.commit": False,        # we commit after work succeeds
        "auto.offset.reset": "earliest",
        "max.poll.interval.ms": 300_000,    # 5 min — kicks slow consumers out
        "session.timeout.ms": 45_000,
    })

def run(topic: str, group_id: str, handle):
    c = make_consumer(group_id)
    c.subscribe([topic])
    try:
        while True:
            msg = c.poll(timeout=1.0)
            if msg is None:
                continue
            if msg.error():
                # partition EOF is fine, anything else we log and continue
                if msg.error().code() != KafkaError._PARTITION_EOF:
                    log.error("poll error: %s", msg.error())
                continue
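            try:
                handle(json.loads(msg.value()))
                c.commit(message=msg, asynchronous=False)
            except Exception:
                # don't commit, and don't keep polling: committing a later
                # offset would silently skip this message. Re-raise so a
                # restart resumes from the last committed offset.
                log.exception("handler failed for offset %d", msg.offset())
                raise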
    finally:
        c.close()

Notice what this consumer controls: poll timing, fetch size (via consumer configuration), commit timing, and what "processed" means. The producer doesn't see any of this. If the consumer dies for 6 hours, Kafka holds the messages (default retention is a week, and you can configure it to keep them far longer). When the consumer comes back, it picks up at the last committed offset and drains the backlog.

That's the magic of pull. The buffer absorbs the consumer's downtime.
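
To make the "picks up at the last committed offset" behavior concrete, here is a minimal sketch that uses a toy in-memory log instead of Kafka. `InMemoryLog` and `drain` are hypothetical names invented for this illustration; real brokers track offsets per partition and per consumer group, which this deliberately ignores.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryLog:
    """Toy stand-in for one partition: an append-only list plus one committed offset."""
    records: list = field(default_factory=list)
    committed: int = 0  # offset of the next record the consumer should read

    def append(self, record) -> None:
        # The producer only ever appends; it never looks at the consumer's position.
        self.records.append(record)

    def poll(self, max_records: int) -> list:
        # Return (offset, record) pairs starting at the committed offset.
        end = min(self.committed + max_records, len(self.records))
        return [(i, self.records[i]) for i in range(self.committed, end)]

    def commit(self, offset: int) -> None:
        # Commit the position after the record that was just processed.
        self.committed = offset + 1


def drain(log: InMemoryLog, handle, batch_size: int = 100) -> int:
    """Process everything from the committed offset onward; return the count."""
    processed = 0
    while True:
        batch = log.poll(batch_size)
        if not batch:
            return processed
        for offset, record in batch:
            handle(record)
            log.commit(offset)  # commit only after the handler succeeded
            processed += 1
```

If the producer appends while nothing is draining (the "consumer is down" case), the next `drain` call starts at `log.committed` and works through everything that piled up, without the producer having noticed the gap.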

How they handle producer spikes (pull absorbs, push amplifies)

This is the failure mode that surprises teams most.

Imagine a checkout system that emits an order.placed event for every order. On a normal day, 100 orders per second. Black Friday morning, 8,000 per second for 90 seconds, then back to normal.

In a pull system with Kafka or SQS in front of the consumers, the buffer takes the spike. The consumer group sees the backlog rising and drains it at whatever rate it can sustain. End-to-end latency goes from 50ms to tens of seconds, or to minutes if the consumers have little headroom, and recovers once the backlog drains. Nothing breaks, and nothing is dropped.
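
To put rough numbers on the pull case, here is a back-of-the-envelope sketch. It assumes a single rectangular spike, an empty buffer at the start, and a constant consumer rate. The consumer rates used in the example (6,000 and 500 events per second) are assumptions for illustration, not figures from the scenario above.

```python
def spike_backlog(spike_rate: float, base_rate: float,
                  consume_rate: float, spike_seconds: float) -> dict:
    """Estimate peak backlog and catch-up time for one rectangular spike.

    All rates are in events per second. Assumes the buffer is empty when the
    spike starts and that consume_rate stays constant throughout.
    """
    if consume_rate <= base_rate:
        raise ValueError("consumers cannot keep up even at the base rate")
    # Backlog grows at (spike_rate - consume_rate) for the length of the spike.
    peak_backlog = max(0.0, (spike_rate - consume_rate) * spike_seconds)
    return {
        "peak_backlog": peak_backlog,
        # Queueing delay for an event that arrives at the very end of the spike.
        "peak_delay_s": peak_backlog / consume_rate,
        # After the spike, the backlog shrinks at (consume_rate - base_rate).
        "drain_s": peak_backlog / (consume_rate - base_rate),
    }


# 8,000/s for 90 s on top of a 100/s baseline:
fast = spike_backlog(8000, 100, consume_rate=6000, spike_seconds=90)
slow = spike_backlog(8000, 100, consume_rate=500, spike_seconds=90)
```

Under these assumptions, consumers that can sustain 6,000 events per second peak at a 30-second delay, while consumers sized for 500 per second peak at about 22 minutes and need roughly 28 minutes to drain. Either way the events are still in the buffer; the headroom you pay for decides how stale they get.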

In a push system, the producer tries to POST 8,000 webhooks per second at the consumer's /webhook endpoint. The consumer's connection pool runs out. New connections queue. The load balancer's queue fills. Requests start timing out at 30 seconds. The producer's retry logic kicks in and now we have 8,000 original requests plus retries, hammering an already-overloaded receiver. This is the classic retry storm, and it's how short producer spikes turn into multi-hour outages.

The fix in push systems is rate limiting at the producer (Stripe caps webhook concurrency per endpoint), backoff schedules, and dead letter queues for permanently-failed deliveries. All real, all expensive to build, all things teams discover after their first incident.
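
As one piece of that producer-side work, a backoff schedule can be sketched in a few lines. This is a minimal illustration of exponential backoff with full jitter, not any particular vendor's schedule; `backoff_delays` and its default values are assumptions made for the example.

```python
import random
from typing import Optional


def backoff_delays(attempts: int, base_s: float = 1.0, cap_s: float = 3600.0,
                   rng: Optional[random.Random] = None) -> list:
    """Full-jitter exponential backoff.

    Delay n is drawn uniformly from [0, min(cap_s, base_s * 2**n)], so retries
    from many failed deliveries spread out instead of arriving in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for n in range(attempts):
        ceiling = min(cap_s, base_s * (2 ** n))
        delays.append(rng.uniform(0.0, ceiling))
    return delays
```

The jitter matters as much as the exponent: without it, every delivery that failed in the same second retries in the same second, and the retry storm simply repeats on a schedule.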

Pull doesn't need any of that. The buffer is the rate limiter.

How they handle consumer outages (push loses, pull catches up)

The mirror image of the spike scenario is the outage scenario.

Your consumer is down for 2 hours. Database migration, deploy bug, whatever.

In a pull system, nothing happens to the producer. It keeps writing to Kafka. Messages accumulate in the partitions. When the consumer comes back, it sees the unprocessed offsets and works through them. If you provisioned enough partitions to allow parallelism, you can catch up in minutes by scaling the consumer group temporarily.

In a push system, every webhook delivery during those 2 hours fails. The producer's retry policy decides what happens next. Stripe retries for up to 3 days; AWS SNS retries according to a delivery policy you can configure per subscription; GitHub does not automatically redeliver failed deliveries at all, and just records the failure in a delivery log nobody looks at. If the retry window is shorter than your outage, those events are gone unless the producer is willing to backfill on request, and most aren't.

This is why "we use webhooks for everything" usually becomes "we use webhooks plus a daily reconciliation job that pulls the truth from the source-of-record API." Teams end up building pull on top of push because push alone loses events.
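
The core of such a reconciliation job is a set difference between what the source of record has and what arrived locally. A minimal sketch, assuming the source records have already been fetched as dicts with hypothetical `id` and `created_at` fields:

```python
from datetime import datetime, timezone


def reconcile(source_records: list, local_ids: set, since: datetime) -> list:
    """Return source records created after `since` that never arrived locally."""
    return [
        record for record in source_records
        if record["created_at"] > since and record["id"] not in local_ids
    ]


# Hypothetical data: three orders at the source, one webhook lost in transit.
source = [
    {"id": "ord_1", "created_at": datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)},
    {"id": "ord_2", "created_at": datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)},
    {"id": "ord_3", "created_at": datetime(2024, 5, 1, 11, 0, tzinfo=timezone.utc)},
]
received = {"ord_1", "ord_3"}
last_run = datetime(2024, 5, 1, 0, 0, tzinfo=timezone.utc)
missing = reconcile(source, received, last_run)
```

In a real job the `source` list would come from paging through the producer's read API, and each missing record would go through the same idempotent handler as a webhook delivery.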

Backpressure mechanisms per style

Pull has backpressure for free. The consumer doesn't poll faster than it can process. The buffer grows during slow periods and shrinks during fast ones. The producer never knows.

Push needs explicit backpressure, and it's awkward.

The HTTP-level mechanism is 429 Too Many Requests plus a Retry-After header. The producer is supposed to read this and back off. Some producers honor it. Many don't (your in-house service that fires HTTP from a cron job). And 429 only works if your service can still respond to the producer with a 429, which means it's not actually overloaded, just selectively rejecting.

The transport-level mechanism is connection limits. Set your reverse proxy to accept N concurrent connections, no more. Past that, requests get rejected at the LB. The producer sees connection failures and (if well-behaved) retries with backoff.
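
Both mechanisms come down to the same move: cap the work in flight and reject the rest cheaply. A minimal, framework-free sketch of that idea follows. `AdmissionGate` is a hypothetical helper written for this example, and the 5-second `Retry-After` value is an arbitrary assumption.

```python
import threading


class AdmissionGate:
    """Caps in-flight requests; callers over the cap get a 429 with Retry-After."""

    def __init__(self, max_in_flight: int, retry_after_s: int = 5):
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._retry_after_s = retry_after_s

    def handle(self, work):
        """Run work() if a slot is free; return (status, headers, result)."""
        # Non-blocking acquire: reject immediately instead of queueing the caller.
        if not self._slots.acquire(blocking=False):
            return 429, {"Retry-After": str(self._retry_after_s)}, None
        try:
            return 200, {}, work()
        finally:
            self._slots.release()
```

The rejection path does no I/O and holds no slot, which is what lets the service keep answering 429 while the accepted requests are still running.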

Neither of these is as clean as "the queue grew by 10,000 messages." That's why senior engineers reach for pull whenever the work doesn't strictly need millisecond latency.

Real systems are usually hybrid: push for notifications, pull for processing

Once you've stared at the tradeoff long enough, you stop picking one. You pick both.

The pattern: push a tiny notification, pull the heavy data.

# producer side — the push is a 200-byte heads-up
def on_order_placed(order_id: str):
    db.insert("orders", order_id=order_id, status="placed", ...)
    notify_webhook(
        url=subscriber.webhook_url,
        body={"event": "order.placed", "order_id": order_id},
    )

# consumer side — push handler just records intent, pulls the rest
@app.post("/webhooks/orders", status_code=202)
async def receive(req: Request):
    event = await req.json()
    # idempotent enqueue keyed on order_id
    await tasks.enqueue("fetch_order", order_id=event["order_id"])
    # FastAPI takes the status from the decorator; returning a tuple would not set it
    return {"ok": True}

# worker — pulls from the queue, fetches the full record at its own pace
async def fetch_order(order_id: str):
    order = await api.get(f"/orders/{order_id}")  # producer's read API
    await db.upsert("local_orders", **order)

Webhook delivery is now a 200-byte ping. The receiver's job is to write the order ID into a queue and return 202 in under 50ms. The actual processing happens on a worker pool that drains the queue at whatever rate is healthy. If the worker pool goes down, the queue grows; no webhook deliveries fail.

Stripe's webhook guidance points the same way: return a 2xx quickly and handle the event asynchronously. Treat the webhook as a notification, not a payload. The full source of truth is the producer's read API, which you can call at your own cadence.

Kafka Connect, Debezium CDC pipelines, and most production event systems split the work the same way. The connector pushes change events into Kafka as they happen; consumers pull them at their own pace. Nobody pushes a 10MB payload through 14 hops of HTTP.

The gotcha: webhooks need retry + idempotency on the receiver; most teams under-engineer this

Here's where most teams ship the bug that bites them six months in.

Webhooks retry. Stripe retries. SNS retries. GitHub deliveries get redelivered by hand or by script. Your in-house webhook producer retries because you copied the pattern from Stripe's docs. Retries mean the same logical event arrives at your receiver multiple times. If your handler isn't idempotent, you charge the customer twice, you send two emails, you double the inventory decrement.

The minimum viable receiver looks like this:

from fastapi import FastAPI, Request, HTTPException
import hmac, hashlib, json, asyncpg

# Assumed to be defined elsewhere: verify() (an HMAC check built on hmac and
# hashlib), WEBHOOK_SECRET, pool (an asyncpg pool), and queue (your task queue).
app = FastAPI()

@app.post("/webhooks/payments")
async def receive(req: Request):
    body = await req.body()
    sig = req.headers.get("Stripe-Signature", "")

    # 1. verify signature: reject forgeries before any DB touch
    if not verify(body, sig, WEBHOOK_SECRET):
        raise HTTPException(401, "bad signature")

    event = json.loads(body)
    event_id = event["id"]   # Stripe gives you a unique id per event

    async with pool.acquire() as conn:
        try:
            async with conn.transaction():
                # 2. idempotency: record the event id; the unique constraint
                # rejects repeat deliveries
                await conn.execute(
                    "INSERT INTO processed_events (id, received_at) VALUES ($1, NOW())",
                    event_id,
                )
                # 3. enqueue actual work, don't process inline. Enqueueing inside
                # the txn means a failed enqueue rolls back the insert, so the
                # producer's retry isn't mistaken for a duplicate.
                await queue.enqueue("handle_payment_event", event_id=event_id, payload=event)
        except asyncpg.UniqueViolationError:
            # already recorded: return 200 so the producer stops retrying
            return {"status": "duplicate"}

    # 4. ack within the producer's timeout (check its docs; often 10-30s)
    return {"status": "queued"}

Three things every webhook receiver needs and most don't have:

Signature verification before anything else. The receiver is on the public internet. Anyone can POST to it. If you write to the DB before checking the signature, you've given anyone who finds the URL an unauthenticated way to write to your database and trigger your handlers.

Idempotency on the event ID, persisted in your own store. Not Redis configured as a cache (it can evict), not in-memory (it resets on deploy). A real table with a unique constraint. The INSERT ... ON CONFLICT DO NOTHING pattern or a try/except on the unique violation. The point is: the second delivery of the same event should be a no-op.

Decouple receipt from processing. Return 2xx fast, do the work async. If you process inline, a slow downstream call makes the producer think you failed and retry, turning one event into ten attempts. The receiver becomes a 50ms-budget endpoint whose only job is to validate, dedupe, and enqueue.
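
For the signature check, here is a minimal sketch of a timestamped HMAC-SHA256 scheme in the `t=<timestamp>,v1=<hex digest>` header shape that Stripe documents. The function names, the 300-second tolerance, and the `now` parameter are assumptions for this example; for real Stripe traffic, the official library's `stripe.Webhook.construct_event` is the safer choice.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(body: bytes, secret: str, timestamp: int) -> str:
    """Build a header value of the form t=<unix ts>,v1=<hex hmac>."""
    signed = f"{timestamp}.".encode() + body
    digest = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={digest}"


def verify(body: bytes, header: str, secret: str,
           tolerance_s: int = 300, now: Optional[float] = None) -> bool:
    """Check the header against the raw body; reject stale or malformed input."""
    parts = [p.strip().split("=", 1) for p in header.split(",") if "=" in p]
    timestamps = [value for key, value in parts if key == "t"]
    candidates = [value for key, value in parts if key == "v1"]
    if len(timestamps) != 1 or not candidates:
        return False
    try:
        ts = int(timestamps[0])
    except ValueError:
        return False
    # Reject old timestamps so a captured request cannot be replayed later.
    current = time.time() if now is None else now
    if abs(current - ts) > tolerance_s:
        return False
    signed = f"{ts}.".encode() + body
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return any(
        hmac.compare_digest(expected.encode(), candidate.encode())
        for candidate in candidates
    )
```

Two details are easy to get wrong: the HMAC must be computed over the raw request bytes, not over re-serialized JSON, and the comparison should be constant-time.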

The thing that keeps biting people: idempotency keys on Stripe's side don't help here. Those protect your API calls to Stripe from being processed twice. They do nothing for webhook deliveries flowing the other way. Different problem, different key.
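
The receiver-side key can be as simple as a primary key on the event ID. A minimal sketch using SQLite so it runs anywhere; `INSERT OR IGNORE` is SQLite's counterpart to Postgres's `INSERT ... ON CONFLICT DO NOTHING`, and the helper names are assumptions for this example.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the dedupe table; the primary key is the unique constraint."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events ("
        "id TEXT PRIMARY KEY, "
        "received_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def first_delivery(conn: sqlite3.Connection, event_id: str) -> bool:
    """Record the event id; return True only the first time it is seen."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (id) VALUES (?)",
            (event_id,),
        )
    # rowcount is 1 when the row was inserted and 0 when it was ignored.
    return cur.rowcount == 1
```

A handler would call `first_delivery` right after the signature check and return a 2xx without enqueueing anything when it comes back `False`.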

Picking the right side

A small checklist for your next system.

Pick push when: latency under one second matters, payloads are small, the consumer is reachable and well-sized, you control or trust the producer's retry behavior, and you're willing to invest in idempotency + signature + DLQ on the receiver.

Pick pull when: throughput is bursty, the consumer might be slow or down, you want replay, you want fan-out to multiple consumer groups at different speeds, or latency above a few seconds is acceptable.

Pick hybrid when: you want push's latency but pull's reliability, or when the payload is large and only some consumers care about the full record. This is the production default for a reason.

The mistake to avoid is treating delivery style as a taste preference. It's a reliability decision. The day your traffic doubles or your consumer crashes, the architecture you picked will either absorb the event or become the headline of the incident report.

If this was useful

Pull vs push is one of those choices that looks like plumbing and turns out to be load-bearing. The System Design Pocket Guide: Fundamentals walks through the same tradeoff lens for the rest of the core building blocks (queues, caches, replication, consistency) so the next time you sketch a system on a whiteboard, you're picking failure shapes on purpose instead of by accident.

What's the worst push-vs-pull mismatch you've inherited, and what did the cleanup actually look like?