Gabriel Anhaia

Posted on May 23

Design a Multiplayer Game Lobby (NAT Traversal, Matchmaking, Anti-Cheat)

#systemdesign #interview #gamedev #networking

Book: System Design Pocket Guide: Interviews — 15 Real System Designs, Step by Step
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

Every system design candidate who says "design Twitter" rehearses the wrong question. Real-time multiplayer is the hardest distributed-systems problem in the consumer-software world, and almost nobody preps for it. If the interviewer at Riot or Epic or Activision throws "design our game lobby" at you and you start sketching REST endpoints behind a load balancer, you've already lost the loop.

This post walks the design the way the interview actually goes. Four hard problems. Each one invalidates something you took for granted from web work.

Why a game backend doesn't look like a web backend

You designed REST APIs. You did Kafka. You shipped a CRUD service behind ALB. None of it applies cleanly here. The inversion is what interviewers want you to name in the first two minutes:

Assumption	Web backend	Game backend
Transport	TCP, HTTP/2	UDP (with reliability layer on top)
State authority	Client tells server what happened	Server simulates, client renders
Latency budget	200ms is fine	50ms is the budget, 150ms is broken
Threat model	Bots, scrapers, abuse	Every player is a potential cheater

Each row is its own hard problem. The transport row gives you NAT traversal. The state-authority row gives you the server-authoritative loop. The latency row forces UDP and snapshot tuning. The threat row forces a layered anti-cheat subsystem.

The candidate who treats these as four independent design questions, in this order, runs the clock cleanly. The candidate who tries to draw one big system diagram first gets stuck at the first NAT question and never recovers.

Hard problem 1: NAT traversal (STUN, TURN, ICE)

In a web backend, a client opens a TCP connection to a public server and you're done. In a game, two clients often need to talk to each other for peer-to-peer voice, or your dedicated game server has to push UDP packets to a player whose home router did NAT 30 minutes ago and the mapping is about to expire.

Public IPv4 is scarce. Your player's machine sits behind a home router that rewrites the source address on outgoing packets. When you, sitting on a dedicated server, try to send a UDP datagram to that player, the router has no idea which internal host you meant. The packet drops.

Three pieces fix this, and they're standardized in the WebRTC stack:

STUN (Session Traversal Utilities for NAT, RFC 8489). A small public server that does one job: when a client sends it a packet, STUN replies with "this is the source IP and port I saw." That's the client's reflexive address, the public IP:port their NAT exposed for this session. STUN is cheap. A bare-metal STUN server handles tens of thousands of clients per box.

TURN (Traversal Using Relays around NAT, RFC 8656). When two clients are both behind symmetric NATs that won't let an unsolicited inbound packet through, STUN alone fails. TURN is a relay. Both clients send their traffic to TURN, TURN forwards. It's expensive (you pay for every byte of game traffic) and adds latency (50ms+ depending on routing), but it always works. About 8-15% of consumer connections need TURN in practice.

ICE (Interactive Connectivity Establishment, RFC 8445). The protocol that orchestrates the dance. Each peer gathers candidate addresses: host (LAN), server-reflexive (via STUN), relayed (via TURN). They exchange candidates through a signaling channel (your matchmaker, over WebSocket or HTTPS). Then they probe pairs in priority order. Best is direct host-to-host on LAN, then reflexive-to-reflexive, last resort is relayed.

The handshake looks like this:

client A                 signaling           client B
   |                         |                  |
   | (1) gather candidates   |                  |
   |  host: 192.168.1.5:50000                   |
   |  STUN: 203.0.113.10:33421                  |
   |  TURN: 198.51.100.7:3478                   |
   |                         |                  |
   | (2) offer (SDP) ------->|----------------->|
   |                         |                  | (3) gather
   |                         |                  | (4) answer
   |<------------------------|<-----------------|
   |                         |                  |
   | (5) ICE probes: try pairs in priority order|
   |   host<->host -- fails (different LANs)    |
   |   reflexive<->reflexive -- succeeds        |
   |                                            |
   | (6) media flows directly, P2P              |
   |<==========================================>|

The gotcha most candidates miss: NAT mappings expire. The router holds the public→internal mapping only as long as traffic flows. If your client goes idle for 30 seconds, the mapping dies, the server's outbound packets vanish into nothing, and the player drops with no error. Fix: send a keepalive ping every 15-20 seconds on every active session. STUN binding requests work for this, or any tiny UDP packet your protocol tolerates.

The second gotcha: symmetric NAT. Some carrier-grade NATs (CGNAT, common on mobile networks) assign a different external port per destination. STUN gives you the port you used to talk to STUN, not the port you'd use to talk to your peer. ICE detects this and falls back to TURN. Budget for it.

Hard problem 2: Matchmaking and ELO routing

In a web backend, a queue is FIFO. You take the next job. In matchmaking, FIFO produces unplayable matches: a Bronze player against a Diamond, four solo queuers against a stack, regions mixed. The constraints matter more than the order.

Matchmaking is constraint satisfaction with a relaxation schedule. Each ticket arrives with constraints (skill rating, region, party size, queue type, language preference, role for class-based games). The matcher tries to assemble a match where every constraint holds, and when no valid match exists, widens constraints over time so players don't wait forever.

The classic shape. Initial skill band tight, expand with wait time:

import time
from dataclasses import dataclass

@dataclass
class Ticket:
    player_id: str
    skill: int          # MMR / ELO, e.g. 1500
    region: str         # "us-east", "eu-west"
    party_size: int
    queued_at: float    # unix epoch

def skill_band(ticket: Ticket, now: float) -> tuple[int, int]:
    """
    Skill window widens with wait time.
    0s waited  -> +/- 50
    30s waited -> +/- 150
    60s waited -> +/- 300
    """
    waited = now - ticket.queued_at
    # piecewise linear: flat early, steep after 30s
    if waited < 10:
        delta = 50
    elif waited < 30:
        delta = 50 + (waited - 10) * 5  # 50 -> 150
    else:
        delta = min(150 + (waited - 30) * 5, 500)
    return (ticket.skill - int(delta), ticket.skill + int(delta))

def find_match(
    pool: list[Ticket],
    team_size: int = 5,
) -> list[Ticket] | None:
    """
    Returns 2*team_size tickets that form a valid match,
    or None if no valid grouping exists yet.
    """
    now = time.time()
    # bucket by region first; cross-region is non-negotiable
    by_region: dict[str, list[Ticket]] = {}
    for t in pool:
        by_region.setdefault(t.region, []).append(t)

    for region, tickets in by_region.items():
        if len(tickets) < team_size * 2:
            continue

        # sort by skill, then sweep windows
        tickets.sort(key=lambda t: t.skill)
        for i in range(len(tickets) - team_size * 2 + 1):
            candidates = tickets[i : i + team_size * 2]
            # check every candidate's band overlaps the group center
            skills = [c.skill for c in candidates]
            center = sum(skills) / len(skills)
            ok = True
            for c in candidates:
                low, high = skill_band(c, now)
                if not (low <= center <= high):
                    ok = False
                    break
            if ok:
                return candidates
    return None

That's the kernel. Production matchmakers add:

Party stickiness: a 3-stack and a 2-stack joining one team must stay together.
Role fill for class-based games (League, Overwatch): match must contain N tanks, M DPS.
Backfill: slot a new player into a match in progress when someone drops.
Smurf / new-account dampening: wider skill on accounts under N games.
Connection-quality matching: players inside the same ISP bubble play together when possible (latency, jitter).

The honest answer to "how do you actually solve this at scale": you don't run one global matcher. You shard by region and queue type, each shard runs the constraint loop every 1-2 seconds on its local pool, and tickets that don't match within 60 seconds bump priority. Big studios use a hybrid. Primary constraints (region, queue type) shard the pool, secondary constraints (skill, role) run inside each shard.

The gotcha. If your skill band widens too fast, ratings stop meaning anything and your good players quit because every match is a stomp. If it widens too slow, queue times balloon and your bad players quit because they wait 4 minutes for a match they lose in 6. The tuning is a product decision, not a math one.

Hard problem 3: Server-authoritative game state

In a web backend, the client submits a form, the server validates, and "validates" means "checks the input matches the schema." In a game, the server runs the entire simulation. The client renders what the server says happened. Anything else is exploitable in 48 hours.

The loop on a dedicated game server, at 60Hz:

tick = 0
while running:
    # 1. drain inbound input from all connected clients (UDP)
    inputs = drain_input_buffer()  # per-player input frames

    # 2. validate every input
    #    - timestamp in window [now - 200ms, now]?
    #    - move within physically possible delta since last frame?
    #    - has cooldown for ability X expired?
    valid_inputs = [i for i in inputs if validate(i, state)]

    # 3. apply inputs to authoritative state
    state = step(state, valid_inputs, dt=1/60)

    # 4. snapshot to every client (UDP, unreliable, 20-30Hz)
    if tick % 2 == 0:  # snapshot at 30Hz, sim at 60Hz
        for client in clients:
            snapshot = build_delta_snapshot(client, state)
            send_udp(client.addr, snapshot)

    tick += 1
    sleep_until_next_tick()

Three numbers you'll want to defend in the interview:

Simulation tick rate: 60Hz to 128Hz. Counter-Strike runs at 64Hz or 128Hz, Valorant at 128Hz, most mobile games at 30Hz. Higher tick = lower input-to-response latency = more CPU per server = fewer matches per box.
Snapshot rate: 20Hz to 30Hz typically. Sending a snapshot every sim frame is wasteful, since clients can interpolate between snapshots without the player noticing.
Snapshot delta vs full: sending the full world state every tick costs bandwidth you don't have. You send a full snapshot once when the client joins, then deltas. Only the entities that changed, only the fields of those entities that changed, with a sequence number so the client can detect gaps.

The client compensates for the gap between sim and snapshot rate with two tricks. Client-side prediction: the client runs the same physics locally, immediately, on the local player's input, so the player sees instant response. Server reconciliation: when the next server snapshot arrives, the client compares its predicted state to the server's truth, and if they diverge it smoothly corrects (this is what feels like "rubber-banding" when your network hiccups).

Other players are handled with interpolation. The client holds back rendering of other players by ~100ms and interpolates between the last two received snapshots. Smooths over jitter. Combined with lag compensation: when a player shoots, the server rewinds the world to where the shooter saw it at the time they fired, and adjudicates the hit on that historical state. This is the only way "I clearly hit them" doesn't make every player rage-quit.

Worth knowing the family of papers here. Valve's Source Multiplayer Networking page from 2003 is still the cleanest writeup of this entire stack and every interviewer in the space has read it. Cite it if it comes up.

The gotcha. If you do prediction without reconciliation, the local player's view drifts from the server's and they walk through walls on their screen but die instantly on the server's. If you do reconciliation without smooth interpolation, every correction is a visible teleport. Both are required, neither is optional.

Hard problem 4: Anti-cheat (the layered defense)

In a web backend, security is "rate-limit, validate input, sanitize." In a game, every player is an adversary with months to develop tools and money to spend on hacks. Anti-cheat is a subsystem with four layers, each catching what the previous layer missed.

Layer 1: server-side input validation. The server already runs the authoritative simulation. Reject any input that's physically impossible. Move delta exceeds max run speed? Reject and log. Aim direction changed faster than human reflexes (snapping 180° in <50ms repeatedly)? Flag. Fire rate exceeds the weapon's cooldown? Reject. This catches the laziest cheats: speed hacks, basic aimbots, ammo dupes. It also costs you nothing because you're running the simulation anyway.

Layer 2: client-side agent. A kernel-mode (Vanguard, EAC, BattlEye) or user-mode (older systems) process that runs on the player's machine, scans for known cheat binaries, hooks suspicious DLLs, watches for memory edits to the game process, and reports back. Kernel-mode is more effective and more controversial. It's privileged code on your customers' machines and a single bug is a CVE. The trade-off is real. Riot ships Vanguard for Valorant, Epic ships EAC for Fortnite, both kernel-mode. Counter-Strike 2 uses VAC (user-mode) and supplements with server-side. Pick deliberately; it's a brand decision.

Layer 3: behavioral detection. Run statistical analysis over recorded match telemetry. A player whose aim-to-target acceleration distribution is too smooth (real humans overshoot and correct, aimbots don't) gets flagged. A player whose enemy-detection-through-wall rate is two standard deviations above the population gets flagged. This is where ML actually does useful work. Feature engineering matters (per-engagement features, not per-match), and the model output is suspicion score, not ban.

Layer 4: ban management. Behavioral flags don't trigger immediate bans. They feed a review queue. Confirmed cheats get a delayed ban (ban waves). Valve famously batches bans so cheat developers don't immediately know which signature got them caught. Hardware bans (HWID), payment-method bans, IP-range bans all stack. False positives in this layer kill your game's reputation, so the bar for automated permanent ban is high.

A representative pipeline:

client input ──> Layer 1 (server validation)
                      │ rejects impossible inputs immediately
                      ▼
                 simulation step
                      │
                      ├─> match telemetry stream
                      │       (positions, hits, deaths, inputs)
                      ▼
                 snapshot to clients
                      │
                      ▼
           Layer 2: client agent telemetry
                      │ (process scan, memory protect, etc.)
                      ▼
             ┌────────┴────────┐
             ▼                 ▼
    Layer 3: behavioral    rule-based flags
        ML scoring         (impossible KDR, etc.)
             │                 │
             └────────┬────────┘
                      ▼
              Layer 4: review
                      │
                      ▼
              ban wave / appeal

The gotcha that ends careers: never ban only on Layer 3 model output. Always require corroborating signal from Layer 1 or Layer 2. False-positive permabans on legitimate players is the single fastest way to destroy a competitive game's community.

The lobby itself (the easy part)

This is the only part of the system that looks like web. A lobby service is a CRUD-ish app:

Create / join / leave a lobby (REST or WebSocket).
Lobby state (members, ready status, map vote, party size) in Redis with TTL.
Lobby chat via WebSocket or your existing voice infra.
Once everyone's ready, hand off to the matchmaker (or directly to a dedicated game server allocation).

A Postgres + Redis + a thin Node or Go service handles this for millions of CCU. The reason candidates over-design it is they've seen Redis-backed lobby code before and it feels safe. Spend three minutes here, max. The interviewer wants to hear the four hard problems above.

Putting it together: request and connection flow

Walk this in your answer, end to end:

Client logs in over HTTPS, gets an auth token (your normal web identity stack: Cognito, Auth0, whatever).
Client opens a WebSocket to the lobby service. Creates or joins a lobby.
Players ready up. Lobby hands the party to the matchmaker as a ticket.
Matchmaker runs the constraint loop, finds a match, asks the session allocator for a dedicated game server in the right region.
Allocator returns a server IP, port, and join token. Lobby pushes that to every client over the WebSocket.
Clients connect to the game server over UDP, using the join token. If direct connection fails (P2P games), ICE candidate exchange happens through the lobby's WebSocket as signaling.
Game server runs the authoritative simulation, broadcasts snapshots, validates inputs (Layer 1 anti-cheat).
Anti-cheat client agent on each player's machine streams telemetry in parallel (Layer 2). Server-side behavioral pipeline (Layer 3) consumes match telemetry.
Match ends. Server reports result. ELO update for each player. Lobby cleans up. Anti-cheat review queue gets any flagged sessions.

The 90-second answer

When the interviewer says "design a multiplayer game lobby" and looks at the clock, this is what you say:

"Game backends invert four web defaults. They're UDP not TCP, server-authoritative not client-authoritative, latency-bounded at 50ms not 200ms, and adversarial against every player. So the system has four hard problems on top of a CRUD lobby.

First, NAT traversal. STUN to discover public reflexive address, TURN as relay fallback when symmetric NAT blocks direct connection, ICE to orchestrate candidate exchange and probing. Keepalives every 15-20 seconds so router mappings don't expire.

Second, matchmaking is constraint satisfaction, not FIFO. Skill band starts tight, widens with wait time, sharded by region and queue type. Each shard runs the constraint loop on its local pool every couple of seconds.

Third, server-authoritative simulation. Dedicated server runs at 60-128Hz, broadcasts delta snapshots at 20-30Hz, clients do prediction-and-reconciliation locally with lag compensation for hit registration. Source Engine networking model is the canonical reference.

Fourth, anti-cheat is four layers. Server-side input validation (free, catches obvious), client-side kernel agent (Vanguard / EAC / BattlEye for AAA), behavioral ML on telemetry (suspicion scoring), and a ban-management queue with delayed ban waves to obscure detection signatures.

The lobby itself (Redis-backed state, WebSocket for client connection, Postgres for durable account data) is the smallest piece. Most of the engineering goes into those four problems."

Then you stop talking and let them pick the one they want to drill into.

Which of the four would you push hardest on in your own answer, and which one have you seen done badly in production? Drop the war stories in the comments.

If this was useful

This walkthrough is the same shape every system design question has. Name the inversions first, then go problem-by-problem. The System Design Pocket Guide: Interviews does this for 15 different designs, including real-time and game-adjacent ones. If you're prepping for senior interviews at studios or real-time-product companies, the matchmaking and anti-cheat chapters in particular cover what this post had to compress.