Gabriel Anhaia

Posted on May 23

Design a Real-Time Ad-Bidding System (The Way Interviewers Actually Want)

#systemdesign #interview #adtech #distributedsystems

Book: System Design Pocket Guide: Interviews — 15 Real System Designs, Step by Step
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

Ad-tech interviews are the system design loop with the highest fail rate, and it's not because the candidates are weak. It's because every "design Twitter" template they rehearsed maps badly to a 100ms auction.

The interviewer isn't testing whether you can sketch a load balancer in front of a service. They want to know if you understand that the system is not a service. It's a deadline. And the deadline is roughly 100 milliseconds, hard.

If you treat this like a normal request/response design, you'll spend 20 minutes drawing things the interviewer already assumes you can draw, and you'll never get to the four components that actually decide the auction.

The clock is the system

The whole design hangs off one number. From the moment a user's browser hits a publisher's page, the SSP has roughly 100ms to send back an ad. Maybe 120ms. After that the page renders without you, and your bid is worthless even if it would have won.

That budget gets carved up something like this:

~20ms, network round-trip from SSP edge to your exchange
~10ms, request parse, user lookup, decoration
~60ms, fanout to bidders + wait for responses
~10ms, auction logic, response build, send

Notice what's missing. There's no slot for "wait for Kafka," no slot for "double-check the database," no slot for "let me consult the ML model with a 200ms SLA." If a step doesn't fit in single-digit milliseconds, it doesn't happen on the hot path.
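One way to make the budget concrete is to derive the fanout cutoff from the deadline instead of hard-coding it. The sketch below is illustrative only: the Budget type, its field names, and the ms helper are made up for this example, and the overhead figures are assumptions you would replace with measured values (the total would typically come from the SSP, for instance OpenRTB's tmax field).

```go
package main

import (
	"fmt"
	"time"
)

// Budget splits the SSP's end-to-end deadline into fixed overheads.
// Whatever remains is the time bidders get. All figures are assumptions.
type Budget struct {
	Total   time.Duration // deadline granted by the SSP
	Network time.Duration // SSP edge <-> exchange round-trip
	PreBid  time.Duration // parse, user lookup, decoration
	Auction time.Duration // auction logic, response build, send
}

// FanoutCutoff returns the time left for the bidder fanout, or an error
// when the fixed overheads already consume the whole deadline.
func (b Budget) FanoutCutoff() (time.Duration, error) {
	left := b.Total - b.Network - b.PreBid - b.Auction
	if left <= 0 {
		return 0, fmt.Errorf("no time left for fanout within %v", b.Total)
	}
	return left, nil
}

// ms keeps the durations below readable.
func ms(n int) time.Duration { return time.Duration(n) * time.Millisecond }

func main() {
	b := Budget{Total: ms(100), Network: ms(20), PreBid: ms(10), Auction: ms(10)}
	cutoff, err := b.FanoutCutoff()
	if err != nil {
		fmt.Println("skip auction:", err)
		return
	}
	fmt.Println("fanout cutoff:", cutoff)
}
```

With a 100ms total and 40ms of fixed overhead, the fanout gets 60ms. If the SSP grants less time than the overheads need, the honest move is to skip the auction rather than start one you cannot finish.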

The mental shift the interviewer is looking for: the system runs to a deadline, not to completion. You don't collect bids until everyone answers. You collect bids until the clock runs out, then you auction whatever showed up.

Read that twice.

That single idea reframes every component below.

The actors (60-second OpenRTB primer)

Run through this fast so the interviewer knows you know it. Don't camp here.

Publisher: the website or app showing the ad slot. Loads a script.
SSP (Supply-Side Platform): the publisher's broker. Sells inventory.
Ad Exchange: the auction venue. This is what you're designing.
DSP (Demand-Side Platform): the advertiser's broker. Decides whether and how much to bid for this impression on behalf of N campaigns.
Bidder: the actual service inside a DSP that returns a price + creative.

The exchange receives a BidRequest (OpenRTB 2.6 protobuf or JSON), fans it out to N DSPs, picks the winner, returns a BidResponse to the SSP. That's it. Everything below is about how you survive doing that 1.5 million times a second.

Component 1: Deadline-bounded fanout

This is the single component that separates the candidates who've thought about ad-tech from the ones who haven't. Get this right and you've earned the rest of the interview.

You do not wait for all bidders to respond. You set a deadline of, say, 60ms after fanout starts, and at 60ms you take whatever you have and run the auction. A bid that arrives at 61ms is logged and dropped.

Here's the shape in Go with gRPC, because it's the cleanest way to show context-cancellation deadlines:

func (e *Exchange) RunAuction(
    parent context.Context,
    req *pb.BidRequest,
) (*pb.BidResponse, error) {
    // 60ms hard cutoff on the fanout, separate from
    // the outer 100ms SSP deadline
    ctx, cancel := context.WithTimeout(parent, 60*time.Millisecond)
    defer cancel()

    bidders := e.eligibleBidders(req) // ~30-150 in practice
    // buffered to len(bidders): each goroutine sends at most
    // once, so a send never blocks
    bids := make(chan *pb.Bid, len(bidders))

    var wg sync.WaitGroup
    for _, b := range bidders {
        wg.Add(1)
        go func(b Bidder) {
            defer wg.Done()
            // hedge: fire to primary, then if no reply
            // by 30ms also fire to a warm replica
            bid, err := e.hedgedBid(ctx, b, req, 30*time.Millisecond)
            if err != nil {
                return // silently drop, do NOT block fanout
            }
            bids <- bid
        }(b)
    }
    // close the channel once every bidder has answered or
    // failed, so a fast auction can finish before the cutoff
    go func() {
        wg.Wait()
        close(bids)
    }()

    // gather until cutoff, never wait past it
    var collected []*pb.Bid
    for {
        select {
        case bid, ok := <-bids:
            if !ok {
                return e.auction(collected, req), nil
            }
            collected = append(collected, bid)
        case <-ctx.Done():
            return e.auction(collected, req), nil
        }
    }
}

The hedgedBid helper is the tail-latency trick. You send the request to a primary bidder pod. If you don't get a response within ~30ms (well under the cutoff), you fire the same request at a warm replica and race them:

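func (e *Exchange) hedgedBid(
    ctx context.Context,
    b Bidder,
    req *pb.BidRequest,
    hedgeAfter time.Duration,
) (*pb.Bid, error) {
    // cancel on return so the losing attempt is aborted
    // instead of running until the fanout deadline
    ctx, cancel := context.WithCancel(ctx)
    defer cancel()

    type outcome struct {
        bid *pb.Bid
        err error
    }
    // room for both attempts, so neither goroutine blocks
    results := make(chan outcome, 2)

    fire := func(addr string) {
        c := pb.NewBidderClient(b.Conn(addr))
        bid, err := c.Bid(ctx, req)
        results <- outcome{bid, err}
    }

    go fire(b.Primary)
    inFlight, hedged := 1, false

    timer := time.NewTimer(hedgeAfter)
    defer timer.Stop()

    var lastErr error
    for inFlight > 0 {
        select {
        case out := <-results:
            if out.err == nil {
                return out.bid, nil
            }
            lastErr = out.err
            inFlight--
            if !hedged {
                // primary failed fast, try the replica now
                hedged = true
                inFlight++
                go fire(b.Replica)
            }
        case <-timer.C:
            if !hedged {
                // primary is slow, race a replica
                hedged = true
                inFlight++
                go fire(b.Replica)
            }
        case <-ctx.Done():
            return nil, ctx.Err()
        }
    }
    // both attempts failed before the deadline
    return nil, lastErr
}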

If the interviewer pushes on hedging, the right thing to say is: hedged requests trade extra QPS for tail-latency cuts. Google's "The Tail at Scale" paper has the numbers: deferring the hedge until the 95th-percentile expected latency caps the extra load at roughly 5 percent while cutting the tail substantially. In ad-tech, p99 latency is revenue, so you pay the QPS.
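The 30ms hedge delay used above is only cheap if it sits near the tail of the bidder's latency distribution. Here is a minimal sketch of how you might pick the delay from observed latencies and estimate the extra load it creates. The function names and the synthetic 1ms..100ms samples are assumptions for illustration; real input would come from per-bidder latency metrics.

```go
package main

import (
	"fmt"
	"sort"
)

// percentileMs returns the nearest-rank p-th percentile (1..100) of
// latency samples given in milliseconds. The input is not modified.
func percentileMs(samplesMs []float64, p int) float64 {
	if len(samplesMs) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samplesMs...)
	sort.Float64s(sorted)
	rank := (p*len(sorted) + 99) / 100 // ceil(p/100 * n) in integer math
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// hedgedCount reports how many requests would still be outstanding at
// hedgeAfterMs and would therefore trigger a second, hedged request.
func hedgedCount(samplesMs []float64, hedgeAfterMs float64) int {
	n := 0
	for _, s := range samplesMs {
		if s > hedgeAfterMs {
			n++
		}
	}
	return n
}

func main() {
	// Synthetic latencies, 1ms through 100ms, one sample each.
	samples := make([]float64, 0, 100)
	for i := 1; i <= 100; i++ {
		samples = append(samples, float64(i))
	}
	hedgeAfter := percentileMs(samples, 95)
	extra := float64(hedgedCount(samples, hedgeAfter)) / float64(len(samples))
	fmt.Printf("hedge after %.0fms, extra load %.0f%%\n", hedgeAfter, extra*100)
}
```

Hedging at p95 means about one request in twenty gets duplicated. Hedge earlier and the extra QPS climbs quickly; hedge later and the replica has too little time left before the cutoff to matter.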

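The gotcha here, and they will ask: what if the hedged bidder is non-idempotent? In RTB a bid response is an offer, not a charge: nothing is billed until the impression is won and served, so sending the same request twice is safe. If both responses do come back, treat the second one as a duplicate at the auction layer, keyed on request ID plus bidder rather than on the bid ID alone, since a replica may mint a different bid ID for the same request. Logging-wise you note both arrived so the bidder team can see their primary is slow.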

Component 2: Budget pacer

A DSP signs a campaign for, say, $50,000 over a day. If you do nothing, the bidder will burn through it in the first 15 minutes when traffic is peakiest. That's bad for the advertiser, bad for the platform (lopsided delivery), and the interviewer wants to hear you've thought about it.

Budget pacing is a pre-bid throttle. Before the exchange sends a BidRequest to a bidder, it checks: does this advertiser have budget left in the current pacing window?

The data structure is a sharded token bucket per advertiser:

type PacerKey struct {
    AdvertiserID string
    Window       string // "day:2026-05-23" or "hour:2026-05-23T14"
}

type TokenBucket struct {
    mu         sync.Mutex
    tokens     float64 // dollars, or "impressions"
    refillRate float64 // tokens per second
    capacity   float64
    lastRefill time.Time
}

func (tb *TokenBucket) Take(cost float64) bool {
    tb.mu.Lock()
    defer tb.mu.Unlock()

    now := time.Now()
    elapsed := now.Sub(tb.lastRefill).Seconds()
    tb.tokens = min(tb.capacity, tb.tokens+elapsed*tb.refillRate)
    tb.lastRefill = now

    if tb.tokens < cost {
        return false
    }
    tb.tokens -= cost
    return true
}

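Refill rate = budget divided by window seconds. A $50K daily budget at evenly-paced delivery is $50,000 / 86,400s, or ~$0.58/second of refill. Capacity is the burst allowance, a few seconds or minutes of refill, not the whole budget: a bucket that starts full with the entire day's budget can spend it all in the first minutes, which is exactly what pacing is meant to prevent. If the bucket has $0.58 and the impression is expected to cost $1.50, you skip this bidder for this auction.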
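To check the arithmetic, here is a deterministic variant of the bucket that takes the clock as an argument instead of calling time.Now(). It is a sketch under stated assumptions: pacedBucket and newPacedBucket are names invented for this example, the bucket starts empty, and the 60-second burst allowance is an arbitrary choice, not a recommendation.

```go
package main

import "fmt"

// pacedBucket mirrors the TokenBucket above but takes the clock as an
// argument (seconds since window start) so the example is deterministic.
// It is single-goroutine on purpose; the mutex is left out.
type pacedBucket struct {
	tokens     float64 // dollars currently available
	refillRate float64 // dollars per second
	capacity   float64 // burst allowance in dollars
	lastRefill float64 // seconds since window start
}

// newPacedBucket paces budget evenly over windowSec and allows bursts
// worth burstSec of refill. The bucket starts empty, so nothing can be
// spent before it has been earned.
func newPacedBucket(budget, windowSec, burstSec float64) *pacedBucket {
	rate := budget / windowSec
	return &pacedBucket{refillRate: rate, capacity: rate * burstSec}
}

// take refills for the elapsed time, then spends cost if it is available.
func (b *pacedBucket) take(cost, nowSec float64) bool {
	b.tokens += (nowSec - b.lastRefill) * b.refillRate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.lastRefill = nowSec
	if b.tokens < cost {
		return false
	}
	b.tokens -= cost
	return true
}

func main() {
	b := newPacedBucket(50000, 86400, 60) // $50K/day, 60s burst (assumed)
	fmt.Printf("refill: $%.2f/s, burst cap: $%.2f\n", b.refillRate, b.capacity)
	fmt.Println("t=1s, $1.50:", b.take(1.50, 1)) // only ~$0.58 earned so far
	fmt.Println("t=3s, $1.50:", b.take(1.50, 3)) // ~$1.74 earned, enough
}
```

One second into the window the bucket holds about $0.58, so a $1.50 impression is skipped; by the third second enough has accrued to take it.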

The hard part is that this isn't a single bucket. It's millions of buckets, one per (advertiser, window) pair, and the exchange has to consult the right one inside the parse phase, that ~10ms slot before fanout starts. Single global Redis with MULTI/EXEC per advertiser doesn't fit.

The shape that works:

Each exchange pod owns a shard of advertiser IDs (consistent hashing).
The pod keeps the bucket in-memory.
A side channel (Kafka topic per shard, or a Redis stream) publishes refill events and out-of-band budget adjustments.
Pods sync state every ~5s to a durable store so a pod restart loses at most 5s of pacing accuracy.

The gotcha: don't put the pacer behind a network call from the parse phase. You'll blow the 10ms budget. Inline it. Yes, that means each exchange pod holds the slice of advertiser state mapped to its shard, and yes, that means consistent hashing and rebalancing pain. That's the trade.
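The shard-ownership piece can be sketched as a hash ring with virtual nodes. This is an illustration, not a production ring: the ring type, the pod names, the 64 virtual nodes per pod, and the FNV-1a hash are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ring maps advertiser IDs to pods via consistent hashing with
// virtual nodes.
type ring struct {
	points []uint32          // sorted hash points on the ring
	owner  map[uint32]string // hash point -> pod name
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// newRing places vnodes points per pod on the ring. On the rare hash
// collision the first pod to claim a point keeps it.
func newRing(pods []string, vnodes int) *ring {
	r := &ring{owner: make(map[uint32]string)}
	for _, p := range pods {
		for v := 0; v < vnodes; v++ {
			pt := hashKey(fmt.Sprintf("%s#%d", p, v))
			if _, taken := r.owner[pt]; taken {
				continue
			}
			r.owner[pt] = p
			r.points = append(r.points, pt)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// podFor returns the pod owning the first point at or after the key's
// hash, wrapping around at the end of the ring.
func (r *ring) podFor(advertiserID string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hashKey(advertiserID)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

// advID builds a synthetic advertiser ID for the demo.
func advID(i int) string { return fmt.Sprintf("adv-%d", i) }

func main() {
	full := newRing([]string{"pod-a", "pod-b", "pod-c"}, 64)
	fewer := newRing([]string{"pod-a", "pod-b"}, 64)
	moved := 0
	for i := 0; i < 1000; i++ {
		if full.podFor(advID(i)) != fewer.podFor(advID(i)) {
			moved++
		}
	}
	fmt.Printf("adv-42 lives on %s; %d of 1000 advertisers move when pod-c leaves\n",
		full.podFor(advID(42)), moved)
}
```

The property worth stating out loud: when a pod leaves, only the advertisers it owned get reassigned, so only that slice of pacer state has to be rehydrated elsewhere.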

Component 3: Click-fraud / invalid-traffic signals

Estimates vary by channel and by who is measuring, but figures in the 15-25% range are commonly quoted for invalid ad traffic (bots, click farms, ad-stacking, malformed user agents; the IAB tracks this). You don't want to bid on it. The DSPs definitely don't want to pay for it.

So the exchange runs a fraud/IVT sidecar during the parse phase, before fanout. The contract has to be brutal about latency: under 5ms or it doesn't happen.

Sidecar API:

service FraudSignal {
  rpc Score(ScoreRequest) returns (ScoreResponse);
}

message ScoreRequest {
  string request_id = 1;
  string ip = 2;
  string user_agent = 3;
  string device_id = 4;
  string publisher_id = 5;
  int64  timestamp_ms = 6;
}

message ScoreResponse {
  float  fraud_score = 1; // 0.0 clean .. 1.0 fraud
  repeated string signals = 2; // "ip_blocklist", "ua_anomaly", "click_velocity"
  bool   served_from_cache = 3;
  int32  latency_us = 4;
}

Implementation rules the interviewer will probe:

Deployed as a sidecar, same pod as the exchange instance. Localhost UDS or loopback gRPC, no network. Latency dominated by serialization, not network.
Cache aggressively on (IP, UA) hash with ~60s TTL. Most fraud signals don't flip in seconds.
Failure mode is fail-open with a flag. If the sidecar times out (>5ms), you set signals: ["sidecar_timeout"] and proceed without blocking the auction. The decision to drop the request is policy, not infrastructure.
The actual scoring (IP reputation, UA fingerprinting, click velocity, bot lists) is a model + rule engine fed by an async pipeline. The sidecar is just the read path.
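The fail-open rule is easy to get subtly wrong, so here is a minimal sketch of the caller side. Everything in it is hypothetical scaffolding: scoreFunc stands in for the generated FraudSignal client, fakeSidecar simulates a fast or slow sidecar, and "sidecar_error" is a signal name added for this example alongside the "sidecar_timeout" flag described above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// scoreFunc stands in for the generated FraudSignal client's Score call
// (hypothetical signature, reduced to the score for brevity).
type scoreFunc func(ctx context.Context) (float64, error)

// verdict is what the pre-bid pipeline carries forward into the auction.
type verdict struct {
	Score   float64
	Signals []string
}

// scoreWithBudget calls the sidecar under a hard time budget and fails
// open: on timeout or error the auction proceeds with a marker signal.
func scoreWithBudget(parent context.Context, budget time.Duration, score scoreFunc) verdict {
	ctx, cancel := context.WithTimeout(parent, budget)
	defer cancel()

	type result struct {
		score float64
		err   error
	}
	done := make(chan result, 1) // buffered so a late reply never blocks
	go func() {
		s, err := score(ctx)
		done <- result{s, err}
	}()

	select {
	case r := <-done:
		switch {
		case r.err == nil:
			return verdict{Score: r.score}
		case errors.Is(r.err, context.DeadlineExceeded):
			return verdict{Signals: []string{"sidecar_timeout"}}
		default:
			return verdict{Signals: []string{"sidecar_error"}}
		}
	case <-ctx.Done():
		return verdict{Signals: []string{"sidecar_timeout"}}
	}
}

// fakeSidecar returns a scorer that answers after delay, or gives up
// when the context expires. It exists only to exercise both paths.
func fakeSidecar(score float64, delay time.Duration) scoreFunc {
	return func(ctx context.Context) (float64, error) {
		select {
		case <-time.After(delay):
			return score, nil
		case <-ctx.Done():
			return 0, ctx.Err()
		}
	}
}

// ms keeps the durations below readable.
func ms(n int) time.Duration { return time.Duration(n) * time.Millisecond }

// demo runs one scoring call against a fake sidecar.
func demo(sidecarDelay, budget time.Duration) verdict {
	return scoreWithBudget(context.Background(), budget, fakeSidecar(0.9, sidecarDelay))
}

func main() {
	fmt.Printf("%+v\n", demo(0, ms(5)))       // fast sidecar: real score
	fmt.Printf("%+v\n", demo(ms(200), ms(5))) // slow sidecar: fail open
}
```

Note that the function never returns an error. The caller always gets a verdict, and whether a "sidecar_timeout" request is still allowed into the auction stays a policy decision made elsewhere.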

The candidate-killer question: "What about novel fraud patterns?" The honest answer is the sidecar can't catch those at request time. Novel patterns get caught by the post-hoc IVT pipeline downstream: same logs, batch jobs, model retraining. The sidecar is the first line of defense, and only for known fraud, at <5ms.

Component 4: Frequency cap & user-state lookup

You don't want to show the same person the same ad 200 times in a day. Frequency caps say "no more than N impressions per (user, campaign) per window."

This is the fastest, simplest component, but candidates sometimes forget it exists. Don't.

Redis with TTL-keyed counters does the whole job:

KEY:   freq:{user_id}:{campaign_id}:{window}
VALUE: integer counter
TTL:   window expiry (e.g. 86400s for daily cap)

Read pattern, during the parse phase:

func (f *FreqCap) Allow(
    ctx context.Context,
    userID, campaignID string,
    maxImpressions int,
) (bool, error) {
    key := fmt.Sprintf(
        "freq:%s:%s:%s",
        userID, campaignID, dayBucket(),
    )
    // single round-trip GET, ~1ms p99 on a co-located redis
    n, err := f.rdb.Get(ctx, key).Int()
    if errors.Is(err, redis.Nil) {
        return true, nil // never seen, allow
    }
    if err != nil {
        return true, nil // fail-open, log
    }
    return n < maxImpressions, nil
}

Write pattern, on win (not on bid; you don't want to count impressions you didn't actually serve):

func (f *FreqCap) RecordImpression(
    ctx context.Context,
    userID, campaignID string,
    ttl time.Duration,
) {
    key := fmt.Sprintf(
        "freq:%s:%s:%s",
        userID, campaignID, dayBucket(),
    )
    pipe := f.rdb.Pipeline()
    pipe.Incr(ctx, key)
    pipe.Expire(ctx, key, ttl)
    _, _ = pipe.Exec(ctx) // best-effort, do NOT block response
}

Two gotchas worth mentioning:

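Plain EXPIRE resets the TTL on every call, so INCR + EXPIRE in a pipeline pushes the expiry out with each impression. Either set the TTL only when the key has none (create the key with SET ... NX EX before the INCR, or use EXPIRE ... NX on Redis 7.0+), or keep INCR + EXPIRE and accept that keys will re-extend their TTL. The cost of being slightly over the cap is far less than the cost of measuring it perfectly.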
Cookieless users. Roughly half your traffic has no stable user ID. Frequency cap on a fingerprint or device-graph hash, and be explicit with the interviewer that the cap is "soft" for those segments.
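The snippets above call a dayBucket helper that isn't shown. One plausible shape, assuming UTC day boundaries, is sketched below. The versions here take the time as an argument so they can be tested (the calls above use a no-argument form), and ttlUntilNextDay, freqKey, and atUTC are hypothetical helpers added for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// dayBucket returns the UTC calendar day used as the window part of
// the key. Assumption: daily caps reset at midnight UTC.
func dayBucket(now time.Time) string {
	return now.UTC().Format("2006-01-02")
}

// ttlUntilNextDay returns how long a key written at now should live so
// that it expires exactly when its bucket ends.
func ttlUntilNextDay(now time.Time) time.Duration {
	u := now.UTC()
	next := time.Date(u.Year(), u.Month(), u.Day()+1, 0, 0, 0, 0, time.UTC)
	return next.Sub(u)
}

// freqKey builds the counter key in the freq:{user}:{campaign}:{window}
// layout described above.
func freqKey(userID, campaignID string, now time.Time) string {
	return fmt.Sprintf("freq:%s:%s:%s", userID, campaignID, dayBucket(now))
}

// atUTC builds a fixed UTC timestamp for the examples.
func atUTC(year, month, day, hour, minute int) time.Time {
	return time.Date(year, time.Month(month), day, hour, minute, 0, 0, time.UTC)
}

func main() {
	now := atUTC(2026, 5, 23, 14, 30)
	fmt.Println(freqKey("u-123", "c-9", now))
	fmt.Println("ttl:", ttlUntilNextDay(now))
}
```

Because the day is part of the key name, a re-extended TTL never stretches the cap window itself. It only delays cleanup of yesterday's counter, which is a memory cost rather than a correctness bug.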

The fanout topology

Now the picture comes together. One slide, end to end:

                  Publisher / SSP
                        │  (OpenRTB BidRequest)
                        ▼
            ┌───────────────────────┐
            │   Exchange ingress    │
            │  (gRPC, 100ms timeout)│
            └─────────┬─────────────┘
                      │ parse, decorate
                      ▼
          ┌─────────────────────────┐
          │ Pre-bid pipeline (<10ms)│
          │  - Fraud sidecar (<5ms) │
          │  - Freq cap GET (~1ms)  │
          │  - Pacer Take() (in-mem)│
          └─────────┬───────────────┘
                    │
                    ▼
         ┌─────────────────────────┐
         │ Deadline-bounded fanout │
         │ (60ms cutoff, hedged)   │
         └──┬──────┬──────┬─────┬──┘
            ▼      ▼      ▼     ▼
          DSP1   DSP2   DSP3   ...DSPn
            │      │      │     │
            └──────┴──────┴──┬──┘
                             ▼
                  ┌─────────────────────┐
                  │ Auction + response  │
                  │  (second-price)     │
                  └───────────┬─────────┘
                              │
              ┌───────────────┴────────────────┐
              ▼ (hot path)                     ▼ (async)
       Response to SSP                Kafka → S3 → warehouse
                                      + freq cap INCR on win
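The auction box in the diagram is the smallest piece of code in the whole system. A minimal second-price sketch, assuming a single floor price per request and CPM bids in dollars; the bid type and secondPrice function are illustrative, not the e.auction method referenced earlier.

```go
package main

import "fmt"

// bid is one bidder's offer for the impression.
type bid struct {
	Bidder string
	Price  float64 // CPM in dollars
}

// secondPrice picks the highest bid at or above the floor and charges
// the runner-up's price, or the floor when no other bid clears it.
// ok is false when nothing clears the floor.
func secondPrice(bids []bid, floor float64) (winner bid, clearing float64, ok bool) {
	second := floor
	for _, b := range bids {
		if b.Price < floor {
			continue // below the floor, not eligible
		}
		switch {
		case !ok:
			winner, ok = b, true
		case b.Price > winner.Price:
			// old winner becomes the runner-up
			second, winner = winner.Price, b
		case b.Price > second:
			second = b.Price
		}
	}
	if !ok {
		return bid{}, 0, false
	}
	return winner, second, true
}

func main() {
	bids := []bid{{"dsp-1", 5.00}, {"dsp-2", 3.00}, {"dsp-3", 1.00}}
	if w, price, ok := secondPrice(bids, 0.50); ok {
		fmt.Printf("%s wins and pays $%.2f\n", w.Bidder, price)
	}
}
```

It is worth telling the interviewer that much of the industry has moved to first-price auctions, where the winner simply pays its own bid. The deadline-bounded shape of the system is the same either way; only this function changes.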

The async logging deserves its own paragraph. Every bid request, every bidder response, every win, every loss, every fraud signal: all of it must be logged. None of it can be logged synchronously.

The pattern is write-behind: serialize the log record into a local ring buffer or memory-mapped file in the exchange pod (sub-millisecond), then a background goroutine flushes batches to Kafka. From Kafka, downstream consumers fan it out to S3 (raw), to ClickHouse or BigQuery (analytics), to the fraud retraining pipeline, to the budget reconciliation job, and to the billing system.

If Kafka is down, the local buffer keeps queuing. If the buffer fills, you drop the oldest non-critical logs (debug-level diagnostics) but never drop billing-critical records. Those overflow to local disk and replay later. The hot path never blocks.
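The drop policy described above can be sketched as a bounded queue with two escape hatches. This is a simplified, single-goroutine illustration: the record and logBuffer types are invented for the example, the spill slice stands in for the on-disk overflow file, and a real implementation would need a lock or a single-writer design plus a background flusher.

```go
package main

import "fmt"

// record is one log entry waiting to be flushed.
type record struct {
	Payload  string
	Critical bool // billing-critical records must never be dropped
}

// logBuffer is a bounded in-memory queue. When full it evicts the
// oldest non-critical record. If everything queued is critical, a new
// critical record spills over and a new non-critical one is dropped.
type logBuffer struct {
	capacity int
	queue    []record
	spill    []record // stand-in for the on-disk overflow file
	dropped  int      // count of discarded non-critical records
}

// add enqueues r without ever blocking the caller.
func (b *logBuffer) add(r record) {
	if len(b.queue) < b.capacity {
		b.queue = append(b.queue, r)
		return
	}
	for i, q := range b.queue {
		if !q.Critical {
			// evict the oldest non-critical record to make room
			b.queue = append(b.queue[:i], b.queue[i+1:]...)
			b.queue = append(b.queue, r)
			b.dropped++
			return
		}
	}
	if r.Critical {
		b.spill = append(b.spill, r)
		return
	}
	b.dropped++
}

// drain hands the queued batch to the flusher (the Kafka producer in
// the design above) and resets the queue.
func (b *logBuffer) drain() []record {
	batch := b.queue
	b.queue = nil
	return batch
}

func main() {
	buf := &logBuffer{capacity: 2}
	buf.add(record{"debug-1", false})
	buf.add(record{"bill-1", true})
	buf.add(record{"bill-2", true}) // evicts debug-1
	buf.add(record{"bill-3", true}) // queue is all critical: spills
	fmt.Printf("queued=%d spilled=%d dropped=%d\n",
		len(buf.queue), len(buf.spill), buf.dropped)
}
```

The point to make in the interview is that add never blocks and never returns an error to the auction path. Backpressure from Kafka turns into dropped diagnostics and a growing spill file, not into slower auctions.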

Interview-killer: a candidate says "we'll log to Postgres on every auction." That's a no-hire signal. Postgres on the hot path of a 100ms auction at 1.5M QPS is unworkable. Mention this trap explicitly so the interviewer knows you've thought about it.

Cold start, capacity, and the long tail

Quick numbers for the capacity section, which interviewers love because it's where you separate handwaving from arithmetic.

QPS: a mid-size exchange handles 1-2 million BidRequest/sec at peak. Call it 1.5M.
Fanout factor: average 50 bidders per request, so ~75M outbound gRPC calls/sec at the bidder tier.
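Payload: typical BidRequest is 2-8KB, BidResponse 1-4KB. At ~4KB per request, SSP-facing ingress is ~6 GB/s. The bidder-facing side is the expensive one: forwarding each request to 50 bidders is ~300 GB/s outbound before any trimming or compression. This is why exchanges live in big-iron datacenters or hyperscaler edge regions.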
p99 latency budget: 100ms end-to-end, with 60ms allocated to fanout. Your p99 on the fanout path must be under 60ms or you start dropping winners.
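The capacity numbers above are simple enough to script, which is a handy way to sanity-check them on a whiteboard. A small sketch, assuming a mid-range 4KB request and decimal units (1 GB = 1e9 bytes); the sizing type and its fields are made up for this example.

```go
package main

import "fmt"

// sizing holds the back-of-envelope inputs. All values are assumptions
// taken from the ranges quoted above.
type sizing struct {
	IngressQPS   float64 // BidRequests per second from SSPs
	Fanout       float64 // average bidders contacted per request
	RequestBytes float64 // average BidRequest size
}

// bidderCallsPerSec is the outbound call rate at the bidder tier.
func (s sizing) bidderCallsPerSec() float64 { return s.IngressQPS * s.Fanout }

// ingressBytesPerSec is the SSP-facing inbound bandwidth.
func (s sizing) ingressBytesPerSec() float64 { return s.IngressQPS * s.RequestBytes }

// fanoutBytesPerSec is the bidder-facing outbound bandwidth if every
// request is forwarded in full to every bidder.
func (s sizing) fanoutBytesPerSec() float64 { return s.bidderCallsPerSec() * s.RequestBytes }

func main() {
	s := sizing{IngressQPS: 1.5e6, Fanout: 50, RequestBytes: 4000}
	fmt.Printf("bidder calls: %.0fM/s\n", s.bidderCallsPerSec()/1e6)
	fmt.Printf("SSP ingress:  %.0f GB/s\n", s.ingressBytesPerSec()/1e9)
	fmt.Printf("fanout out:   %.0f GB/s\n", s.fanoutBytesPerSec()/1e9)
}
```

The fanout line is the one that should make you pause: multiplying ingress by the fanout factor applies to bytes as well as calls, which is why real exchanges filter bidders aggressively and trim what they forward.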

Cold start of a new exchange pod is its own headache. The in-memory pacer state has to be hydrated from the durable store before the pod takes traffic, the bidder connection pool has to be warmed (gRPC channel establishment is expensive), and the fraud sidecar's cache is empty so the first ~30s of traffic on a new pod is slightly more expensive. The standard fix: ramp the pod's traffic up gradually via the load balancer's slow-start config (Envoy's slow_start_config for example), and pre-warm pacer + bidder pools before the pod reports ready.

The long tail also matters. Mobile traffic, in particular, has p99.9 latencies that are wildly worse than p99 because of flaky cellular networks at the SSP edge. Some exchanges run a separate, faster (40ms cutoff) auction path for mobile-app inventory and a slower (80ms cutoff) one for desktop web. Mentioning this differentiation is a strong signal that you've thought past the textbook design.

The 90-second answer

If they pull the "OK, summarize the whole design" move at the end, this is the shape that lands:

"It's a 100ms hard deadline system, not a request/response service. The exchange takes a BidRequest from an SSP, runs a pre-bid pipeline in ~10ms: fraud sidecar over UDS, frequency cap GET on a co-located Redis, and an in-memory token-bucket pacer per advertiser sharded by consistent hashing across exchange pods. Then it fans out to ~50 eligible bidders over gRPC with a 60ms deadline, using hedged requests to a warm replica if the primary doesn't reply by 30ms. At the cutoff, whatever bids arrived run through second-price auction, the winner gets returned to the SSP, and on win the frequency cap counter increments in a Redis pipeline. Logging is write-behind: every bid, win, and signal goes to a local ring buffer, then a background goroutine batches into Kafka, then S3 and warehouse downstream. The hot path never touches a database or a synchronous queue. Capacity sizing is ~1.5M QPS ingress and 75M gRPC fanout/sec, with p99 fanout under 60ms. The deadline is the system."

That's the answer. The four components most candidates miss (deadline-bounded fanout, the in-memory pacer, the fraud sidecar contract, and write-behind logging) are exactly what the interviewer is grading on.

What's the part of an ad-tech design you've seen candidates blow most often? Drop it in the comments. I'm especially curious which gotcha tripped you in a real loop, not the textbook ones.

If this was useful

Ad-tech auctions are one of the 15 designs in System Design Pocket Guide: Interviews — 15 Real System Designs, Step by Step. The book walks each system the way an interviewer expects you to walk it: hook, components, the gotchas candidates miss, and the 90-second answer. The bidding chapter goes deeper on the auction math and the failure modes around bidder timeout distributions, if that's the part you want to drill on next.