Jay Shah
What the Candidates Tournament Taught Me About Algorithm Design

I've been following chess tournaments casually for years, but when I started thinking seriously about algorithm design, the 2026 FIDE Candidates Tournament stopped me cold. Not because of the drama — though Hikaru Nakamura (2811) arriving as the top seed and Praggnanandhaa (2760) carrying the weight of a billion chess fans will generate plenty of that — but because the tournament format itself is a textbook computer science problem hiding in plain sight.

Eight players. Every player faces every other player twice, once with each color. Fourteen rounds over 18 days in Paphos, Cyprus (March 29 – April 15). When I sketched it out on a whiteboard, what I saw wasn't a chess bracket. I saw a fully connected directed graph with 8 nodes and 56 directed edges, a combinatorial scheduling problem, a ranking algorithm stress test, and a constraint satisfaction engine operating under adversarial conditions. This article is my attempt to work through what the Candidates reveals about building systems that actually hold up.

Follow the 2026 Candidates live at shatranj.live/candidates as the games unfold.


Eight Nodes, 56 Edges, Zero Easy Rounds

Start with the graph. In a double round-robin tournament with n players, the number of games played is n × (n − 1). With 8 players that's 8 × 7 = 56 games. Each game is a directed edge (White → Black) between two nodes. Every node has an out-degree of 7 (you play White against each opponent once) and an in-degree of 7 (you play Black against each opponent once), giving every node a total degree of 14, one game per round.

This is the complete directed graph K₈ with each undirected edge replaced by two antiparallel directed edges. The moment you draw it, you immediately see the scheduling problem: you need to decompose this graph into 14 perfect matchings (one per round), where each matching covers all 8 nodes exactly once, and across all 14 rounds, each ordered pair (i, j) appears exactly once.

That decomposition problem has a name in combinatorics: it's a 1-factorization of the complete graph. For even n, a round-robin schedule always exists and can be constructed with the polygon rotation method (fix one node, rotate the others). The computational complexity is O(n²) to construct the full schedule, linear in the number of games. FIDE doesn't reinvent this per tournament; they use a pre-verified schedule and a separate draw for color assignments. The formal proof that a 1-factorization of Kₙ exists for every even n is a standard result in combinatorial graph theory, closely related to Walecki's decompositions of complete graphs into Hamiltonian cycles and paths.

The graph framing also immediately surfaces a constraint invisible in casual coverage: there are no "easy" rounds. Every round is a perfect matching. Fabiano Caruana (2795) cannot strategically rest in a round where weak opponents are clustered elsewhere. The structure forces complete coverage by construction.


Round-Robin Scheduling Is a Real CS Problem

Most developers encounter scheduling algorithms in the context of operating system process schedulers or job queues, but tournament scheduling is the original form. The round-robin scheduler in your OS kernel shares more than a name with a chess round-robin: both are solutions to the same fairness problem. How do you allocate a shared resource (CPU time, White pieces, favorable pairings) equitably across competing actors?

The polygon rotation method for constructing a round-robin schedule works like this: label the players 1 through n. Fix player n in place. Arrange players 1 through n − 1 in a regular polygon. In each round, rotate the polygon by one position. Player at position i in the polygon plays the player diagonally opposite. Player n plays whoever ends up at the "top" position.

```python
def generate_schedule(players):
    """Circle-method round robin: one half of a double round-robin."""
    n = len(players)
    if n % 2 != 0:
        players = players + ["BYE"]  # pad odd fields; don't mutate the caller's list
        n += 1

    fixed = players[-1]       # one player stays in place
    rotating = players[:-1]   # everyone else rotates each round
    schedule = []

    for round_num in range(n - 1):
        pairs = [(fixed, rotating[0])]
        for i in range(1, n // 2):
            # pair off players across the polygon (indices i and n - 1 - i)
            pairs.append((rotating[i], rotating[n - 1 - i]))
        schedule.append(pairs)
        rotating = [rotating[-1]] + rotating[:-1]  # rotate right by 1

    return schedule
```

For the Candidates specifically, this generates 7 rounds for the first half and the colors are swapped for the second half, so each directed edge appears exactly once in each direction. The elegance here is that the schedule is provably fair at construction time, not fair in expectation. No post-hoc correction needed.
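Because the fairness is structural, it can be confirmed with a one-time offline check rather than a runtime monitor. Here's a sketch (a compact restatement of the circle method, under the same assumptions as above) that asserts both invariants: every round is a perfect matching, and every unordered pair meets exactly once per half.

```python
from itertools import combinations

def circle_method(players):
    # Compact circle-method round robin (one half of a double round-robin).
    n = len(players)
    fixed, rotating = players[-1], players[:-1]
    rounds = []
    for _ in range(n - 1):
        pairs = [(fixed, rotating[0])]
        pairs += [(rotating[i], rotating[n - 1 - i]) for i in range(1, n // 2)]
        rounds.append(pairs)
        rotating = [rotating[-1]] + rotating[:-1]
    return rounds

players = list(range(8))
rounds = circle_method(players)

# Invariant 1: every round is a perfect matching (all 8 players appear once).
for rnd in rounds:
    seen = sorted(p for pair in rnd for p in pair)
    assert seen == players

# Invariant 2: across the 7 rounds, every unordered pair appears exactly once.
played = {frozenset(pair) for rnd in rounds for pair in rnd}
assert played == {frozenset(c) for c in combinations(players, 2)}
```

Run once when the schedule is generated, and the property holds for the life of the tournament.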

There's a deeper lesson for distributed systems engineers in this: when you can prove a property holds structurally, you don't need runtime checks to verify it. The Candidates schedule doesn't need a fairness monitor. The fairness is baked into the factorization.


The Elo Rating System as a Ranking Algorithm

The eight Candidates players carry Elo ratings computed from hundreds to thousands of rated games played over their careers. Understanding how Elo works — and why it has been the standard for competitive ranking since Arpad Elo introduced it to FIDE in 1970 — reveals a lot about the tradeoffs in any ranking algorithm.

The Elo system models the probability that player A beats player B as a logistic function of their rating difference:

```python
# Elo expected score and update formula
def elo_expected(rating_a, rating_b):
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating, expected, actual, k=10):
    return rating + k * (actual - expected)

# Example: Sindarov (2726) vs Nakamura (2811)
E_sindarov = elo_expected(2726, 2811)           # expected score ~0.380
# If Sindarov wins (actual = 1.0), his new rating:
new_rating = elo_update(2726, E_sindarov, 1.0)  # ~2732.2, a gain of ~6.2 points
```

The 400-point constant in the denominator means a player rated 400 points higher is expected to outscore the lower-rated player 10 to 1 (an expected score of about 0.91 versus 0.09). The K-factor controls how quickly ratings respond to new results. Per the FIDE Rating Regulations effective January 2024, FIDE uses K=10 for players rated above 2400 (as all Candidates participants are), K=20 for players under 2400, and K=40 for new players building their initial rating.

This is gradient descent in disguise. Each game is a training step. The "loss function" is the difference between predicted performance (expected score) and actual performance. The K-factor is the learning rate. Too high a K-factor and ratings oscillate wildly: a single upset sends ratings swinging implausibly. Too low and the system takes years to converge to an accurate estimate of true strength.
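Two properties of the update rule are worth making concrete. This sketch restates the formulas from above (the specific K values are just illustrations): with a shared K-factor the update is zero-sum, and K linearly scales the step taken on each game's prediction error.

```python
def elo_expected(ra, rb):
    return 1 / (1 + 10 ** ((rb - ra) / 400))

def elo_delta(ra, rb, score_a, k=10):
    # Rating change for player A after scoring `score_a` (1, 0.5, or 0).
    return k * (score_a - elo_expected(ra, rb))

ra, rb = 2726, 2811

# Zero-sum: expected scores sum to 1, so whatever A gains, B loses.
da = elo_delta(ra, rb, 1.0)  # A wins
db = elo_delta(rb, ra, 0.0)  # B loses the same game
assert abs(da + db) < 1e-9

# K is the learning rate: quadrupling K quadruples the step on the same error.
assert abs(elo_delta(ra, rb, 1.0, k=40) - 4 * elo_delta(ra, rb, 1.0, k=10)) < 1e-9
```

The zero-sum property is why the rating pool is conserved among players with the same K-factor, and why mixed K-factors (a 2400+ player facing a K=40 newcomer) can inflate or deflate the pool over time.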

The Candidates field exposes a specific stress case for Elo: rating compression at the top. The difference between Nakamura (2811) and Matthias Bluebaum (2678) is 133 points, which the Elo formula translates to an expected score of roughly 0.68 for Nakamura in any given game. But at this level, preparation quality and opening theory can swing that probability substantially in any individual game. Elo is a long-run estimator; it was never meant to predict single games with high confidence.

For more background on the players and their ratings heading into the tournament, the Candidates 2026 preview has full analysis.


Minimax, Game Trees, and the AlphaZero Breakthrough

Every move in a chess game is a node in a search tree. White evaluates candidate moves, for each move evaluates Black's best response, for each response evaluates White's best counter, and so on. This recursive structure is minimax search: maximize your score on your turns, minimize your score on your opponent's turns.

Formally, for a game tree of depth d with branching factor b, brute-force minimax visits O(b^d) nodes. Chess has a branching factor around 35 and modern engines search to depths of 25-30 plies. That's 35^30 ≈ 10^46 nodes, completely intractable without pruning.

Alpha-beta pruning reduces this to O(b^(d/2)) in the best case, effectively doubling the depth searchable in the same time budget. Stockfish 17, the dominant classical engine, combines alpha-beta with NNUE (Efficiently Updatable Neural Networks) for position evaluation and routinely searches 25+ plies deep with selective extensions pushing tactical lines to 40+ plies.
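For intuition about the pruning mechanism, here's a minimal alpha-beta over a hand-built toy tree (nested lists as internal nodes, integers as leaf evaluations). It's nothing like a real engine, but the cutoff logic is the same:

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    # Leaves are ints; internal nodes are lists of children.
    if isinstance(node, int):
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        val = alphabeta(child, not maximizing, alpha, beta)
        if maximizing:
            best = max(best, val)
            alpha = max(alpha, best)
        else:
            best = min(best, val)
            beta = min(beta, best)
        if beta <= alpha:
            break  # prune: the opponent will never allow this line
    return best

# Maximizing root: max(min(3, 5), min(2, 9)) = 3.
# In the second subtree the 9 is never evaluated: once the minimizer
# sees 2 (below alpha = 3), the whole branch is cut off.
assert alphabeta([[3, 5], [2, 9]], maximizing=True) == 3
```

The cutoff is why move ordering matters so much in real engines: searching the best move first makes cutoffs happen early, which is how alpha-beta approaches its O(b^(d/2)) best case.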

Then AlphaZero came along and inverted the entire approach.

"AlphaZero is a remarkable demonstration of how far a general-purpose reinforcement learning algorithm can go when given the right environment. It learned chess from scratch in a matter of hours and proceeded to defeat Stockfish."
Demis Hassabis, co-founder of DeepMind and senior author on the AlphaZero paper published in Science, December 2018

Per the published Silver et al. (2018) findings, AlphaZero trained by playing 44 million self-play games in 9 hours. It replaced the handcrafted evaluation function with a deep residual neural network and replaced exhaustive search with Monte Carlo Tree Search (MCTS) guided by a policy network. Where Stockfish at the time searched roughly 70 million positions per second, AlphaZero searched around 80,000 — nearly 1,000 times fewer. Yet it won.

The lesson is uncomfortable for engineers who love optimizing the obvious bottleneck: sometimes the right answer is a different algorithm, not a faster implementation of the current one. Nakamura and Caruana both prepare with Stockfish. Their seconds run engine evaluations to ±0.01 pawns. Yet the most decisive games in the Candidates will likely be decided by preparation surprises: novelties prepared at home that take an opponent out of their precomputed lines. Preparation, not calculation speed, is the winning variable.


Opening Preparation as Caching

Before Round 1 of the Candidates begins, each player has spent months building an opening repertoire. Praggnanandhaa's team has analyzed thousands of lines in the Sicilian Defense, the Nimzo-Indian, and sharp gambits tailored to the specific tendencies of each opponent. This is exactly what it looks like computationally: pre-computing a lookup table and caching the results.

Opening theory extends roughly 20-25 moves deep in heavily analyzed lines, sometimes deeper in forced theoretical variations. A top player working through their home analysis isn't calculating over the board; they're pattern-matching against memorized positions. The calculation starts the moment they leave book. Their cognitive load drops to near zero in the opening phase; all the hard work was done at home.
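Computationally, a repertoire is a lookup table with a fallback to expensive computation. A sketch (the position keys and moves below are purely illustrative, not anyone's actual preparation):

```python
# Opening "book" as a precomputed cache keyed by the move sequence so far.
# Illustrative entries only; a real book is keyed by position (e.g. FEN).
opening_book = {
    "start": "e4",
    "start e4 c5": "Nf3",        # a prepared Sicilian line
    "start e4 c5 Nf3 d6": "d4",
}

def calculate(position):
    # Stand-in for over-the-board calculation (the expensive path).
    return "a3"

def choose_move(position):
    move = opening_book.get(position)
    if move is not None:
        return move, "book"      # cache hit: instant recall, near-zero load
    return calculate(position), "over the board"  # cache miss: think

assert choose_move("start e4 c5") == ("Nf3", "book")
assert choose_move("start e4 e5") == ("a3", "over the board")
```

The interesting engineering happens at the cache boundary: the first out-of-book move is where cognitive load spikes, which is exactly why novelties are timed to land there.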

The rest days built into the Candidates schedule (there are two in the 18-day span) serve a specific function: cache invalidation. After each round, a player's team analyzes how the opponent responded, identifies which lines were tested, and updates the repertoire. Lines the opponent knows get deprioritized; new analysis gets computed for the next encounter.

This is exactly the problem distributed cache systems face. Your cached data was computed against a snapshot of the world. The world changed. When do you invalidate? The Candidates schedule forces a particular invalidation strategy: you must update between games against the same opponent, because in the second game that opponent already knows what you played in the first. Wei Yi's (2754) team won't repeat the same Ruy Lopez approach in the second game against Javokhir Sindarov (2726) that they used in the first — Sindarov will have the counter ready.

The analogy extends to cache poisoning. If Giri's (2760) team plants a line in an online blitz game (a "poison pill"), opponents studying Giri's recent games might over-prepare against something Giri never intends to play in the Candidates. Cache poisoning is an active home-preparation strategy at this level.


The Endgame as Constraint Satisfaction

As pieces come off the board, the chess position transitions from a combinatorially explosive search problem to something tractable. This is the mathematical structure of chess endgames: constraint satisfaction in a rapidly shrinking search space.

Consider a King and Rook versus King endgame. The board has 64 squares. The two kings can be placed in at most 64 × 63 configurations (order matters). The Rook has at most 62 remaining squares. That's a state space of roughly 64 × 63 × 62 ≈ 250,000 positions, orders of magnitude smaller than middlegame search trees. The open-source Syzygy tablebases cover all endgame positions with up to seven pieces and give the game-theoretic value (win/draw/loss) plus a distance metric (DTZ, distance to zeroing the 50-move counter) for every legal position — a database spanning hundreds of trillions of positions, computed via brute-force retrograde analysis: work backward from checkmate, label all positions that can force checkmate in one move, then label all positions from which the mating side can force a position labeled checkmate-in-one, and so on.
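The retrograde idea fits in a few lines for a toy game. Here it labels positions of the {1, 2, 3} subtraction game (take 1-3 stones; whoever cannot move loses), propagating win/loss values outward from the terminal position, in the same spirit as tablebase generation:

```python
def retrograde_labels(max_n, takes=(1, 2, 3)):
    # Label each position WIN/LOSS for the player to move, working
    # outward from the terminal position n = 0 (no moves available: a loss).
    label = {0: "LOSS"}
    for n in range(1, max_n + 1):
        # A position is a WIN iff some move reaches a position labeled LOSS.
        moves = [n - t for t in takes if n - t >= 0]
        label[n] = "WIN" if any(label[m] == "LOSS" for m in moves) else "LOSS"
    return label

labels = retrograde_labels(12)
# Known result for this game: multiples of 4 are losses for the side to move.
assert all((labels[n] == "LOSS") == (n % 4 == 0) for n in range(13))
```

Chess tablebases do the same thing at scale, except the "position" is a full piece configuration and the propagation runs over trillions of states instead of a dozen.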

Endgame tablebases are, in a meaningful sense, solved chess: for the positions they cover, there's no longer a search problem. The "algorithm" is a lookup. What remains is the problem of navigating from the current position to a tablebase position, which may require playing 50-80 moves with perfect precision.

Esipenko (2695) and Bluebaum (2678), the two lowest-rated players in the field, will face the endgame constraint problem acutely. When Nakamura or Caruana reaches a technically winning endgame, conversion becomes a test of technique (a human approximation of tablebase lookup) rather than creativity. The search space reduction that comes from simplified positions is a genuine advantage for the higher-rated player, because their technique is more precisely calibrated.


Tiebreaking Algorithms and Distributed System Consensus

Suppose two players finish the 2026 Candidates on equal points. Who goes to the World Championship match against Gukesh? FIDE uses a specific cascade of tiebreaking criteria, each one a distinct algorithm, applied in priority order until the tie is resolved.

The cascade for the Candidates is:

  1. Direct encounter: Head-to-head result between the tied players. Simple, deterministic, O(1) lookup.
  2. Sonneborn-Berger score: For each game won, add the final score of the opponent defeated. For each draw, add half the final score of the opponent drawn against. This weights wins against stronger opponents more heavily.
  3. Number of wins: Raw decisive game count.
  4. ARO (Average Rating of Opponents): Mean Elo of all opponents faced.
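The first two steps of the cascade are easy to sketch. This uses a hypothetical three-player mini-tournament (the results are invented for illustration) and breaks ties on points with Sonneborn-Berger:

```python
# Hypothetical single round-robin: A beats B, A draws C, B beats C.
# results[(x, y)] is x's score against y (1 win, 0.5 draw, 0 loss).
results = {
    ("A", "B"): 1.0, ("B", "A"): 0.0,
    ("A", "C"): 0.5, ("C", "A"): 0.5,
    ("B", "C"): 1.0, ("C", "B"): 0.0,
}
players = ["A", "B", "C"]

total = {p: sum(results[(p, q)] for q in players if q != p) for p in players}

def sonneborn_berger(p):
    # Full final score of each defeated opponent, half of each drawn one.
    sb = 0.0
    for q in players:
        if q == p:
            continue
        if results[(p, q)] == 1.0:
            sb += total[q]
        elif results[(p, q)] == 0.5:
            sb += total[q] / 2
    return sb

# Two-step cascade: points first, Sonneborn-Berger as the tiebreak key.
standings = sorted(players, key=lambda p: (total[p], sonneborn_berger(p)),
                   reverse=True)

assert total == {"A": 1.5, "B": 1.0, "C": 0.5}
assert sonneborn_berger("A") == 1.25  # beat B (1.0) + half of C's 0.5
assert standings == ["A", "B", "C"]
```

The full FIDE cascade would extend the sort key with direct encounter, win count, and ARO, in that order; a tuple key makes the priority order explicit in one line.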

Sonneborn-Berger is the interesting one algorithmically. It's a second-order metric: it doesn't just ask "how many points did you score?" but "who did you score those points against, and how did they do?" This is structurally similar to Google's PageRank, which doesn't count just how many links point to a page but weights those links by the authority of the pages linking to them. PageRank iterates that weighting to a fixed point; Sonneborn-Berger applies a single pass of it.

The connection between Sonneborn-Berger and PageRank has been noted by researchers in computational social choice. The Wikipedia entry on the Sonneborn-Berger score describes it as the oldest surviving weighted-win tiebreak in competitive chess, dating to the 1870s — decades before graph theory had language for what it was computing. Claude Shannon, in his foundational 1950 paper "Programming a Computer for Playing Chess" in Philosophical Magazine, was the first to formally frame chess as a search problem — but he was explicit that evaluation functions, not just search depth, determine playing strength. The tiebreak problem is the evaluation function problem applied to the final standings.

The distributed systems parallel is direct: tiebreaking in a chess tournament is the same problem as consensus under Byzantine conditions. When multiple nodes (players) have identical observable state (point score), you need a protocol that produces a single agreed-upon ordering. The tiebreaker cascade is that protocol. Its properties (determinism, resistance to manipulation, computational simplicity) are exactly the properties you'd want in a distributed consensus mechanism.

The attack surface is also instructive. A player who understands Sonneborn-Berger can deliberately target opponents who are performing well in the tournament, because wins against strong opponents contribute more to the tiebreak score than wins against weaker ones. This is tiebreak optimization, gaming the secondary metric, and it's a real strategic consideration in the final rounds.


Designing Systems That Perform Under Adversarial Conditions

The 2026 Candidates field includes Praggnanandhaa at 22 years old, already a top-10 player with a career trajectory that suggests he'll be the dominant player of his generation. Set that alongside Nakamura at 37, who has been in the top 5 globally for nearly two decades. Both will face the same structural adversarial conditions: an opponent who has studied them specifically, knows their tendencies, and has tailored anti-preparation to their style.

This is what distinguishes a chess tournament from most benchmarks we use to evaluate systems. The Candidates is not a static test. It's an adversarial benchmark where each "tester" adapts dynamically to what they observe. When Caruana wins with a novelty in Round 1, every remaining opponent immediately studies that game and prepares a counter. The environment shifts under you as you play.

"Chess is not about brute force. It's about finding patterns, identifying weaknesses, and taking advantage of your opponent's psychology. The machine has no psychology, but every human opponent does."
Garry Kasparov, 13th World Chess Champion, from Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins (PublicAffairs, 2017)

Claude Shannon recognized this in 1950 when he first framed chess as a computer science problem. The challenge isn't building a system that performs well on a fixed distribution; it's building one that maintains performance when the input distribution is being actively manipulated by an intelligent adversary. That's the actual production environment for most serious software systems: adversarial users, adversarial network conditions, adversarial market dynamics.

The Candidates reveals several properties that make systems robust under these conditions:

Deep evaluation over shallow heuristics. Stockfish's NNUE evaluator can be surprised by genuinely novel positions that fall outside its training distribution. AlphaZero's learned representations generalize better to positions it hasn't seen. Systems built on deep models are more robust to novel adversarial inputs than systems built on handcrafted rules, but they require enormously more compute to build.

Preparation depth as a moat. The players who survive the Candidates are those whose opening and middlegame analysis extends deeper than what their opponents can anticipate. In software, this is the equivalent of having tested your system against failure modes your attackers haven't tried yet. The moat isn't cleverness; it's thoroughness.

Graceful degradation under resource pressure. Rounds 12-14 of the Candidates will be played by tired players with depleted reserves. The systems that hold up are the ones whose core evaluation function remains reliable under cognitive load, equivalent to a service that degrades gracefully under high latency rather than cascading into complete failure.

Structural fairness eliminating meta-game. The round-robin structure means there's no path to the title that avoids the strongest opponents. You can't route around Nakamura or Caruana. Systems designed with this constraint, where every node must be reachable, are more honest about where their bottlenecks actually are.

The 14 rounds of the 2026 Candidates in Paphos will produce exactly 56 games, roughly 3,000-4,000 individual moves, and somewhere between zero and a handful of genuinely decisive moments that determine who challenges for the World Championship. Each of those moments will be the output of algorithms (scheduling, rating, search, preparation, tiebreaking) that were designed long before the first move was played.

That's not a metaphor. It's literally what a chess tournament is. And if you watch it that way, you'll see things you can't unsee.


Follow the 2026 Candidates Tournament live, including standings, round-by-round results, and game analysis, at shatranj.live/candidates.
