Olivier Bazin

Posted on Jun 21

I didn't set out to write a Lichess client

#api #rust #apiclient #ai

AI Wrote the Endpoints. I Wrote the Library.

I didn't set out to write a Lichess client. I was building a complete chess training application, and I needed to integrate a handful of Lichess API features to make it work. The easy path would have been to wire up just those few endpoints for myself and move on. But open source gives me something every single day, and rather than build only for me, I wanted to give a little of it back: a real, complete, reusable library instead of a private helper. litchee is that: the detour that became its own thing.

I built a Rust client that covers every documented operation in the Lichess API. All 184 of them — users, games, tournaments, puzzles, studies, broadcasts, board and bot play, the opening explorer, the tablebase. One person, a few days, evenings mostly.

The endpoints were the easy part.

That sentence would have been a lie three years ago, so let me be honest about why it isn't one now — and about what actually turned out to be hard.

Why API clients are usually half-finished

Open the client library for almost any large API and you'll find the same thing: the popular 20% is covered, the rest is a // TODO or simply absent. This isn't laziness. It's economics. Writing a binding for an endpoint nobody on your team uses — reading the spec, modelling the response, writing the test — costs the same as writing one everybody uses, and pays back far less. So a single maintainer rationally stops at the endpoints they need. Completeness was a luxury you bought with a team.

I went in assuming litchee would be the same: cover board and bot play well, gesture at the rest. That's not what happened, and the reason is worth being precise about.

What changed: breadth got cheap

Lichess publishes an OpenAPI spec. I vendored it into the repo as a git submodule and treated it as the source of truth — not documentation to read, but a contract to conform to. From there, generating an endpoint became a mechanical loop: point an AI at the spec entry, get the request shape, the response DTO, and a test fixture built from the spec's own example.

The cost of breadth — the thing that used to make complete clients uneconomical — fell close to zero. Forty endpoints or four hundred, the marginal one was no longer expensive.

I want to be careful here, because this is the part that gets oversold. *The AI did not "build the library." It produced volume.* Correct-looking, spec-shaped, test-backed volume, but volume. And volume, on its own, is not a library. It's a junk drawer.

The catch: 184 endpoints don't add up to one library

Here is the thing nobody warns you about. When breadth is free, you generate a lot of it, and very quickly you're staring at 184 endpoints that were each written in isolation. Each one is locally reasonable. Together they're incoherent — different naming, different error handling, different shapes for the same idea, the same DTO modelled three slightly different ways.

A library is not a pile of endpoints. A library is a set of promises that hold across all of them. That coherence is the entire value, and it's exactly what generation-in-isolation destroys. So the real work — the part that took the time — was deciding what those promises were and forcing every generated piece to obey them.

A few of the ones I committed to and didn't break:

One concern per file. Each business concern (board, swiss, puzzles…) is a single flat file holding its endpoints, its DTOs, and its tests together.
Hard size limits. No file over 900 lines, no function over 20. When something strains, it gets split. This sounds arbitrary; in practice it's the pressure that keeps any single piece from quietly becoming a mess.
Every DTO is Lichess-prefixed. LichessGame, LichessUser, LichessToken. Boring, and you always know what you're holding.
Builders for anything with options, so a request with twelve optional parameters doesn't become a twelve-argument function.
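As a minimal sketch of the builder promise (not litchee's actual API), the example below uses a hypothetical `ExportRequest` type with hypothetical `max`, `rated`, and `opening` options. Only the required argument is positional; every optional parameter is a chained setter, and only the options the caller set end up in the query string.

```rust
/// Hypothetical request type for illustration; the names are assumptions,
/// not litchee's real types.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ExportRequest {
    username: String,
    max: Option<u32>,
    rated: Option<bool>,
    opening: Option<bool>,
}

impl ExportRequest {
    /// Only the required argument is positional.
    pub fn new(username: impl Into<String>) -> Self {
        Self { username: username.into(), ..Self::default() }
    }

    /// Each optional parameter is a chained setter.
    pub fn max(mut self, n: u32) -> Self {
        self.max = Some(n);
        self
    }

    pub fn rated(mut self, yes: bool) -> Self {
        self.rated = Some(yes);
        self
    }

    pub fn opening(mut self, yes: bool) -> Self {
        self.opening = Some(yes);
        self
    }

    /// Illustrative request path built from the required argument.
    pub fn path(&self) -> String {
        format!("/api/games/user/{}", self.username)
    }

    /// Render only the options the caller actually set.
    pub fn query(&self) -> String {
        let mut parts = Vec::new();
        if let Some(n) = self.max {
            parts.push(format!("max={}", n));
        }
        if let Some(r) = self.rated {
            parts.push(format!("rated={}", r));
        }
        if let Some(o) = self.opening {
            parts.push(format!("opening={}", o));
        }
        parts.join("&")
    }
}

fn main() {
    let req = ExportRequest::new("bobby").max(5).rated(true);
    // Prints the path and the two options that were set.
    println!("{}?{}", req.path(), req.query());
}
```

Adding a thirteenth option to a type like this is one new field and one new setter; no existing call site changes.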

None of these are clever. That's the point. They're decisions, and an AI won't make them for you — it will happily generate code that violates every one, because each violation is locally fine. They're only wrong in aggregate, and aggregate is the one thing a per-endpoint generator can't see.

The part I couldn't hand off: the public surface

If I had to name the single thing that stayed stubbornly, irreducibly human, it was deciding what to make pub.

Marking something public is deciding what the library is. It's a promise you can't take back without breaking everyone who believed you. The generated code doesn't know that a field being public is a contract; it just knows the field exists. So the shape of the crate — what's exported, what's hidden, how the module tree mirrors the API's concerns, what a caller actually touches — was a series of small, deliberate, one-way decisions.

You can see the whole intended ergonomics in one call:

let client = LichessClient::builder().token("lip_…").build()?;

let mut games = client.games().export_user("bobby").max(5).stream().await?;
while let Some(game) = games.next().await {
    println!("game {}", game?.id);
}

That reads the way I wanted the whole library to read. Getting there wasn't a generation problem — it was a taste problem, made one decision at a time.

Error handling is design, not generation

The clearest place this shows up is errors.

The lazy version of "handle errors" is a single catch-all: something went wrong, here's a string. An AI will give you that instantly, and it's useless to the person calling your code, because they can't react to it. They can only log it.

I wanted a caller to be able to match on what went wrong. So the taxonomy was something I designed by hand, and then let the AI fill in:

pub enum ApiErrorKind {
    BadRequest,                                    // 400
    Unauthorized,                                  // 401
    Forbidden,                                     // 403
    NotFound,                                      // 404
    RateLimited { retry_after_secs: Option<u64> }, // 429, with the backoff hint
    Server,                                        // 5xx
    SwissUnauthorizedEdit,                         // a 401 that means something specific
    // …
}

That SwissUnauthorizedEdit variant is the whole argument in miniature. The Lichess API returns 401 when you try to edit a Swiss tournament you don't own — the same status code as "your token is invalid," but a completely different problem with a completely different fix. No generator infers that distinction from a status code. I only found it by reading the spec like a person who'd have to debug it later. Modelling it as its own variant is design work. The AI couldn't have known it mattered, because mattering is a human judgement.
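To make the distinction concrete, here is a rough sketch of how a status code plus request context could map onto those variants. It assumes the client knows which operation it was performing; the `Operation` enum, the `classify` and `advice` functions, and the `Unexpected` fallback variant are hypothetical and exist only so the sketch compiles on its own. They are not litchee's implementation.

```rust
#[derive(Debug, PartialEq)]
pub enum ApiErrorKind {
    BadRequest,
    Unauthorized,
    Forbidden,
    NotFound,
    RateLimited { retry_after_secs: Option<u64> },
    Server,
    SwissUnauthorizedEdit,
    /// Hypothetical fallback so this sketch is total; not from the article.
    Unexpected { status: u16 },
}

/// Hypothetical: which operation produced the response.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Operation {
    SwissEdit,
    Other,
}

/// The status code alone cannot tell the two 401s apart; the context can.
pub fn classify(status: u16, op: Operation, retry_after_secs: Option<u64>) -> ApiErrorKind {
    match (status, op) {
        (401, Operation::SwissEdit) => ApiErrorKind::SwissUnauthorizedEdit,
        (400, _) => ApiErrorKind::BadRequest,
        (401, _) => ApiErrorKind::Unauthorized,
        (403, _) => ApiErrorKind::Forbidden,
        (404, _) => ApiErrorKind::NotFound,
        (429, _) => ApiErrorKind::RateLimited { retry_after_secs },
        (500..=599, _) => ApiErrorKind::Server,
        (other, _) => ApiErrorKind::Unexpected { status: other },
    }
}

/// What a caller can do once errors are values rather than strings.
pub fn advice(kind: &ApiErrorKind) -> &'static str {
    match kind {
        ApiErrorKind::Unauthorized => "check or renew the token",
        ApiErrorKind::SwissUnauthorizedEdit => "authenticate as the tournament's owner",
        ApiErrorKind::RateLimited { .. } => "back off, then retry",
        ApiErrorKind::Server => "retry later",
        _ => "inspect the request",
    }
}

fn main() {
    let generic = classify(401, Operation::Other, None);
    let swiss = classify(401, Operation::SwissEdit, None);
    // Same status code, two different fixes.
    println!("{:?}: {}", generic, advice(&generic));
    println!("{:?}: {}", swiss, advice(&swiss));
}
```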

And this is where it stops being a matter of taste and becomes a matter of correctness. An API is not just a set of URLs — it's a contract. It comes with rules about how it may be used: rate limits you must back off from, scopes a token must carry, states in which an operation is simply not allowed. Those rules aren't optional extras a client can round off; they're the terms of the contract. A serious client is precisely the one that implements them — that turns "this is forbidden" into a value the caller can see and obey, rather than a surprise they discover in production.

There's a natural fit here that's easy to miss. Deterministic, exhaustive error handling and a faithfully-respected API contract are the same work seen from two sides. Every rule the API imposes is, on the client's side, an error it must be able to name and hand back. Map the contract completely and your error type writes itself; model your errors as one specific variant per failure mode and you've transcribed the contract. The catch-all string fails on both counts at once: it neither respects the contract nor lets the caller honour it. So this isn't a flourish you add for polish — it's the most basic obligation a client has to the service it speaks to, and it's exactly the obligation a generator, working one endpoint at a time, has no way to even perceive.

Complete also has to mean correct

There's a quieter kind of work that breadth-for-free doesn't touch: making the thing hold up in production.

It's worth being clear about why this isn't optional, because it's tempting to treat it as polish you get to later. Completeness is a promise. When I say litchee covers all 184 operations, a reader hears "I can rely on all 184" — that's the entire point of claiming completeness in the first place. So the moment any one of them disintegrates on real input, the promise was false. And a false promise of completeness is worse than an honest gap: an absent endpoint tells you the truth up front and you plan around it, while a present-but-broken one lies to you until the worst possible moment — under load, in production, on the one input you didn't test. Breadth that isn't correct doesn't expand what the library can do; it expands the surface on which it can betray you.

There's a second reason, more specific to how this code came to exist. A spec describes the happy path — it gives one tidy example per endpoint, and a generator, quite reasonably, makes that example pass. But production is the unhappy path: the malformed chunk, the dropped connection, the burst that trips a rate limit, the stream that stays open for an hour. None of that is in the spec, so none of it is in the generated code. The example is exactly the input you will almost never see at scale; the inputs you will see are the ones nobody wrote down. Closing that gap is not generation filling in a blank — it's a person anticipating what the wire actually does.

Lichess streams many endpoints as newline-delimited JSON — event streams, live board state, game exports — one JSON value per line, with blank keep-alive lines in between. The naive version splits the response on newlines and hopes each line is whole. It isn't: the network hands you bytes in arbitrary chunks, and a JSON object routinely arrives split across two of them. So there's a small, unglamorous buffer that reassembles lines across chunk boundaries, skips the keep-alives, and yields a clean Stream of typed values. It's a few dozen lines, it has nothing to do with any particular endpoint, and it's the kind of correctness no spec describes and no generator volunteers.
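The reassembly step can be sketched with nothing but the standard library. This is an illustration of the technique, not litchee's code: `LineBuffer` and `push` are hypothetical names, and the sketch stops at complete text lines, leaving out the JSON parsing and the async `Stream` wrapper a real client would put on top.

```rust
/// Reassembles newline-delimited records from arbitrarily sized chunks.
/// Hypothetical helper for illustration only.
#[derive(Default)]
pub struct LineBuffer {
    pending: Vec<u8>,
}

impl LineBuffer {
    /// Feed one network chunk; get back every complete, non-blank line.
    /// Bytes after the last '\n' stay buffered until a later chunk ends the line.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            // Take the line including its '\n', leaving the rest buffered.
            let raw: Vec<u8> = self.pending.drain(..=pos).collect();
            let text = String::from_utf8_lossy(&raw);
            let line = text.trim();
            // Blank lines are keep-alives, not records.
            if !line.is_empty() {
                lines.push(line.to_string());
            }
        }
        lines
    }
}

fn main() {
    let mut buf = LineBuffer::default();
    // One JSON object split across two chunks, a keep-alive, then a second
    // object that is also split.
    let chunks: [&[u8]; 3] = [b"{\"id\":\"a", b"1\"}\n\n{\"id\":", b"\"b2\"}\n"];
    for chunk in chunks {
        for line in buf.push(chunk) {
            println!("{}", line);
        }
    }
}
```

Splitting on the byte `\n` before decoding is safe for UTF-8, because that byte never occurs inside a multi-byte sequence.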

The same goes for token redaction so secrets don't leak into logs, for read timeouts that don't strangle a long-lived stream, for honouring Retry-After. None of it is breadth. All of it is the difference between "covers the API" and "safe to depend on."
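Two of those safeguards are small enough to sketch with the standard library alone. The `Token` wrapper and the `retry_after` function below are hypothetical names, not litchee's API. The parser handles only the delay-in-seconds form of `Retry-After`; the header may also carry an HTTP date, which this sketch treats as "use the fallback".

```rust
use std::fmt;
use std::time::Duration;

/// Hypothetical wrapper whose Debug output never contains the secret.
pub struct Token(String);

impl Token {
    pub fn new(raw: impl Into<String>) -> Self {
        Token(raw.into())
    }

    /// The one explicit way to reach the raw value, e.g. to build a header.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Token {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Logging a struct that contains a Token prints this, not the secret.
        f.write_str("Token(***)")
    }
}

/// Parse the delay-in-seconds form of Retry-After, falling back when the
/// header is missing or in a form this sketch does not handle.
pub fn retry_after(header: Option<&str>, fallback: Duration) -> Duration {
    header
        .and_then(|v| v.trim().parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or(fallback)
}

fn main() {
    let token = Token::new("not-a-real-token");
    println!("{:?}", token);
    println!("{:?}", retry_after(Some("60"), Duration::from_secs(5)));
}
```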

The bottleneck moved

So here's what I actually learned, and it's the only claim in this piece I'd defend hard.

AI didn't make architecture matter less. It made it matter *more* — because it removed the thing that used to absorb all the effort. When typing 184 endpoints by hand was the cost, that cost dominated everything; you never got far enough to feel the design problems. Now the typing is nearly free, and what's left standing, fully exposed, is the part that was always the real work: the invariants, the error taxonomy, the public surface, the judgement about what this library is.

The bottleneck moved from typing to taste. AI is a breadth multiplier. It is not an architect, and the more breadth it gives you, the more architect you have to be.

I'm genuinely unsure where the line settles as these tools get better. Maybe some of the taste becomes mechanical too. But for now, the honest report from building one complete client, solo, is this: the machine wrote the endpoints, and that turned out to be the part that didn't need me. The library — the promises, the shape, the judgement calls — is the part that did.

What's next

The surface is complete, and that was always meant to be a foundation, not the finish line. As it stands, the client mirrors the API faithfully: one method per operation, typed in, typed out. That's the right base layer — but it's a low-level one. Nobody wakes up wanting to "call the game-export endpoint with eight parameters and fold a stream into an accumulator." They wake up with a question.

So the next layer is a facade over the raw endpoints — a small set of higher-level features that answer the questions people actually ask:

What's the average rating of the opponents in my last twenty losses — am I losing up, or losing down?
Which puzzle themes do I consistently overlook — the tactics I keep missing without noticing?
How does my time control or time of day correlate with my results?

Not one of these is a new endpoint. Every one is a composition of endpoints that already exist: the losses question folds game export with a result filter and a rating average; the puzzle-themes question joins puzzle activity against theme metadata. And this is the thread I want to pull through one last time, because it's the same one running through everything above — that composition is only possible because the client is complete.
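The losses question, for instance, reduces to a filter and a fold once the games are in hand. The sketch below assumes the exported games have already been reduced to a hypothetical `GameSummary` (the outcome from the player's side plus the opponent's rating) and arrive newest first; none of these names come from litchee.

```rust
/// Hypothetical, simplified view of an exported game from the player's side.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Win,
    Loss,
    Draw,
}

#[derive(Debug, Clone, Copy)]
pub struct GameSummary {
    pub outcome: Outcome,
    pub opponent_rating: u32,
}

/// Average opponent rating over the first `n` losses in `games`
/// (assumed newest first). Returns None when there are no losses.
pub fn avg_opponent_rating_in_losses<I>(games: I, n: usize) -> Option<f64>
where
    I: IntoIterator<Item = GameSummary>,
{
    let (sum, count) = games
        .into_iter()
        .filter(|g| g.outcome == Outcome::Loss) // the result filter
        .take(n) // the last n losses
        .fold((0u64, 0u32), |(s, c), g| (s + u64::from(g.opponent_rating), c + 1));
    if count == 0 {
        None
    } else {
        Some(sum as f64 / f64::from(count))
    }
}

fn main() {
    let games = vec![
        GameSummary { outcome: Outcome::Loss, opponent_rating: 1800 },
        GameSummary { outcome: Outcome::Win, opponent_rating: 1500 },
        GameSummary { outcome: Outcome::Loss, opponent_rating: 1600 },
        GameSummary { outcome: Outcome::Draw, opponent_rating: 1700 },
        GameSummary { outcome: Outcome::Loss, opponent_rating: 2000 },
    ];
    println!("{:?}", avg_opponent_rating_in_losses(games, 20));
}
```

Because the function takes any iterator, the same fold could in principle run over a streamed export without collecting every game first.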

A partial client doesn't merely make these features harder; it makes some of them impossible, and you don't get to know which ones in advance. The feature you want next month might hinge on the single endpoint a 20%-coverage client never bothered to implement. Completeness is what keeps the whole space of compositions open — it's the substrate, and the facade is simply what you get to build once the ground is laid. You can't compose your way to a feature out of endpoints that aren't there.

And the shape repeats one final time. The 184 endpoints — the breadth — were the cheap, generated part. Deciding which questions are worth answering, and what it should feel like to ask them, is taste again. Completeness made the features possible; choosing and shaping them is the human work, same as it ever was. Which is a fitting place to draw the 1.0 line: freeze the complete, faithful surface, then build the opinionated layer on top of it — in the open.

litchee is on crates.io and GitHub, listed among the client libraries in the Lichess API docs, and featured in This Week in Rust #656. It's MIT, and I'm about to freeze the API for 1.0 — so don't hesitate to send me feedback while it's still cheap to change.

Top comments (1)

Mathéo Delbarre • Jun 21

What a fascinating read. The way you reframed the whole problem is refreshingly honest compared to most of what gets written about AI in dev 😊

What struck me most is your line "the bottleneck moved from typing to taste". That's exactly it. The AI freed you from the mechanical grind, and in doing so it made visible everything that was never mechanical in the first place: the invariants, the error taxonomy, what you decide to make public. Things that were just buried under hours of repetitive typing 🙂

The SwissUnauthorizedEdit detail is so telling. A 401 that means two completely different things depending on context, and that only someone thinking "how will a user debug this at 2am" will model correctly. No generator can have that intention. I have a genuine question: when you talk about the next layer, the facade with real user questions (losses against higher-rated opponents, missed puzzle themes...), are you planning to use the same approach, or will you start from actual usage patterns first? Because at that point you're no longer completing a spec, you're inferring what people actually want, which feels like it requires human judgement at an even higher level 😊

Also, small world moment: I was also in This Week in Rust 656 with ZamSync! A sync engine for clinics running on 2G sitting right next to a complete Lichess client, that's a pretty fun TWIR lineup 😄 Congrats on the feature, well deserved!

Either way, litchee is a great example of what open source can produce when you choose to build for everyone rather than just for yourself. Looking forward to seeing what that opinionated layer ends up looking like 🙂