Don't Trust the Score: Building Server-Authoritative Validation for a Prize-Based Mobile Game

#flutter #gamedev #security #webdev

When there's no money involved, a cheater in your game is mostly an annoyance. A leaderboard gets polluted, someone brags about a fake high score, you sigh and move on.

The moment a score can be converted into a real prize (a gift card, a draw entry, actual money), that calculation changes completely. Now a "high score" is a withdrawal request. I'm building a mobile arcade game where gameplay earns entries into prize draws, and the uncomfortable truth I had to start from is this:

Any client-side game can be modified by a motivated attacker.

That's not pessimism; it's just the reality of shipping a game onto someone else's device. The APK can be patched to submit fake scores. Methods can be hooked at runtime on a rooted device. Old winning requests can be replayed. Local timers and collision checks can be edited. And the backend endpoint that records scores can be called directly with a forged payload, skipping the game entirely.

So the design principle the whole system is built on is blunt: the game client is an input device, not an authority. It is allowed to tell the server what the player did. It is never allowed to tell the server what the player earned.

A quick note on the stack before we go further: my client is built in Flutter and my backend runs as TypeScript edge functions, so that's what the code samples look like. But nothing in this post is Flutter-specific. The model (server-issued seeds, deterministic gameplay, server-side replay, one-time run IDs) applies exactly the same whether your client is Flutter, Kotlin, React Native, or Unity. The language changes; the principle doesn't. Treat the snippets as illustration, not prescription.

This post walks through how I built that — the seed-and-replay model that makes it work, the specific checks that close off each attack, and the things that turned out to be harder than I expected.

The naive version (and why it dies instantly)

The obvious first design is the one almost every tutorial shows you:

Player finishes game → app sends { score: 4820 } → server saves 4820

This is fine for a casual leaderboard. For a prize game it's indefensible. Nothing stops anyone from opening a network inspector, watching that request, and replaying it with score: 999999. You haven't built a game backend; you've built a "type your own prize" form.

The first instinct people have is to "sign" the score on the client, or obfuscate the app, or add a Play Integrity check and call it done. None of that fixes the core problem, because the secret used to sign lives on the device the attacker controls, and integrity checks tell you the binary looks legit — they can't tell you whether the player actually survived 45 seconds or cleared six lines. You can't patch your way out of trusting an untrusted machine.

The only thing that actually works is to stop trusting the score and make the server compute it.

The core idea: deterministic seeds + replay

Here's the model I used.

Every game in the app is deterministic. Given the same starting seed, the game world unfolds identically every single time — the same Snake food spawns in the same cells, the same Tetris-style pieces come out of the bag in the same order. There's no "true" randomness in the gameplay loop; everything random is driven by a pseudo-random generator that's seeded from a single value.

Because of that property, the server and the client can run the exact same game from the exact same seed and get the exact same world. That's the whole trick. If the server can reproduce your game, the server doesn't need to believe your score — it can recompute it.

The flow looks like this:

1. Player starts a mission. The client calls start-mission-run. The server creates a one-time run, generates a fresh random seed, stamps a rules version and a 15-minute expiry, and returns the seed to the client.
2. Player plays. The Flutter game seeds its RNG with the server's seed and runs normally. While playing, it records a compact transcript of the player's inputs: every direction change, every move, rotate, hold, and hard-drop, each tagged with the game tick it happened on.
3. Player submits. Instead of sending a score, the client sends the transcript, a hash of that transcript, and the run ID it was issued.
4. Server replays. submit-score loads the run, takes the original seed, re-runs the player's recorded inputs through a server-side copy of the game logic, and computes the authoritative score and completion itself. The score the client claims is, at best, used for analytics. It is never what awards a prize.
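To make the flow concrete, here is a minimal sketch of the data that moves through it. It assumes a TypeScript backend with access to node:crypto, which both Deno and Node provide. The shapes MissionRun, TranscriptEvent and ScoreSubmission, the helper issueRun, and the RULES_VERSION value are all hypothetical. The post doesn't publish its schema.

```typescript
import { randomBytes, randomUUID } from "node:crypto";

// Hypothetical shapes; not the app's actual schema.
type RunStatus = "issued" | "submitted";

interface MissionRun {
  id: string;
  userId: string;
  missionId: string;
  gameId: string;
  seed: string;
  rulesVersion: string;
  status: RunStatus;
  issuedAt: number; // epoch millis
  expiresAt: number; // epoch millis
}

// One recorded input, tagged with the game tick it happened on (hypothetical input codes).
type InputCode = "up" | "down" | "left" | "right" | "rotate" | "hold" | "hardDrop";
interface TranscriptEvent {
  tick: number;
  input: InputCode;
}

// What the client may send: inputs and identifiers, never a score the server trusts.
interface ScoreSubmission {
  serverRunId: string;
  clientRunId: string;
  transcript: TranscriptEvent[];
  transcriptHash: string; // hex SHA-256 of the serialized transcript
  claimedScore?: number; // analytics only
}

const RUN_TTL_MS = 15 * 60 * 1000; // the 15-minute expiry
const RULES_VERSION = "rules-v1"; // placeholder value

// Hypothetical core of start-mission-run: a fresh, unpredictable seed per run.
function issueRun(userId: string, missionId: string, gameId: string, now: number = Date.now()): MissionRun {
  return {
    id: randomUUID(),
    userId,
    missionId,
    gameId,
    seed: randomBytes(16).toString("hex"), // 128 bits from a CSPRNG
    rulesVersion: RULES_VERSION,
    status: "issued",
    issuedAt: now,
    expiresAt: now + RUN_TTL_MS,
  };
}
```

ScoreSubmission is defined mainly to show what it leaves out. The claimedScore field is optional, and in this design nothing that awards a prize would read it.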

The deterministic RNG

The piece that makes all of this possible is small and unglamorous — a seeded pseudo-random generator that behaves identically on Dart (client) and on the Deno/TypeScript backend.

The seed comes in as a string, so first it gets folded into a 32-bit integer (an FNV-1a-style hash), then each "random" value is produced by a classic linear congruential generator:

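function seedInt(seed: string): number {
  // FNV-1a-style fold: 2166136261 is the 32-bit offset basis, 16777619 the FNV prime.
  let hash = 2166136261;
  for (let i = 0; i < seed.length; i++) {
    hash ^= seed.charCodeAt(i); // UTF-16 code unit, same as Dart's codeUnitAt
    hash = Math.imul(hash, 16777619); // 32-bit multiply, no float rounding
  }
  return hash >>> 0 || 1; // unsigned 32-bit result; never a zero state
}

function nextRand(state: number): number {
  // LCG with the Numerical Recipes constants, mod 2^32. Since state < 2^32, the
  // intermediate value stays below 2^53 (exact in a double) before >>> 0 reduces it mod 2^32.
  return (state * 1664525 + 1013904223) >>> 0;
}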

Nothing exotic here — and that's the point. It's deterministic, it's portable, and it produces the same stream of numbers on both sides as long as both sides advance it the same number of times. When Snake needs to spawn food, both the device and the server call nextRand and map the result onto the grid the same way:

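// rng, cols, rows and occupied() are the game's module-level state; the server
// replay must advance rng exactly as many times as the client does.
function nextFood(): [number, number] {
  let candidate: [number, number];
  do {
    rng = nextRand(rng);
    candidate = [rng % cols, Math.floor(rng / cols) % rows];
  } while (occupied(candidate)); // re-roll until the cell is free (assumes the board isn't full)
  return candidate;
}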

Same seed, same sequence, same food. The server can sit down and play your exact game from your exact inputs.
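One way to check the "same seed, same food" property is to make the spawn function pure. This sketch repeats seedInt and nextRand from above so it runs on its own. It replaces the module-level rng with explicit state passing, so no hidden global can drift between client and server. The spawnSequence helper and the 20 by 20 board are illustrative assumptions, not the game's real code.

```typescript
function seedInt(seed: string): number {
  let hash = 2166136261;
  for (let i = 0; i < seed.length; i++) {
    hash ^= seed.charCodeAt(i);
    hash = Math.imul(hash, 16777619);
  }
  return hash >>> 0 || 1;
}

function nextRand(state: number): number {
  return (state * 1664525 + 1013904223) >>> 0;
}

type Cell = [number, number];

// Pure variant: takes the RNG state in and hands the advanced state back out.
// Loops forever if every cell is occupied, so callers must handle a full board.
function nextFood(
  rng: number,
  cols: number,
  rows: number,
  occupied: (cell: Cell) => boolean,
): { cell: Cell; rng: number } {
  let cell: Cell;
  do {
    rng = nextRand(rng);
    cell = [rng % cols, Math.floor(rng / cols) % rows];
  } while (occupied(cell));
  return { cell, rng };
}

// Hypothetical helper: what one side (device or server) spawns on an empty board.
function spawnSequence(seed: string, n: number, cols = 20, rows = 20): Cell[] {
  let rng = seedInt(seed);
  const cells: Cell[] = [];
  for (let i = 0; i < n; i++) {
    const next = nextFood(rng, cols, rows, () => false);
    cells.push(next.cell);
    rng = next.rng;
  }
  return cells;
}
```

A design note: in an LCG with a power-of-two modulus, the low bits have short periods (bit 0 simply alternates), so rng % cols shows visible patterns. Using the high bits, for example (rng >>> 16) % cols, spreads food more evenly. However, that changes the sequence, so client and server would have to switch together under a new rules version.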

What submit-score actually checks

Replay is the heart of it, but on its own it isn't enough. The submission endpoint is layered, and each layer closes off a specific attack. Here's what a score submission goes through.

Rate limits, including one-time IDs. There's a burst limit and a daily limit per user, but the two that matter most are the single-use limits: the clientRunId and the server-issued run ID are each capped at exactly one submission per day. That's what makes a replay attack pointless. Resending a captured winning request just gets rejected as a duplicate.

await enforceRateLimit({
  action: "submit-score:server-run",
  key: `${user.id}:${body.serverRunId}`,
  limit: 1,
  windowSeconds: 24 * 60 * 60,
});
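The post doesn't show enforceRateLimit itself. A minimal in-memory stand-in with the same options might look like the following. The names tryConsumeRateLimit and RateWindow are hypothetical, and memory-backed state only works inside a single process.

```typescript
// Hypothetical fixed-window limiter keyed by action + key.
// Edge-function instances don't share memory, so production needs shared storage.
interface RateWindow {
  count: number;
  resetAt: number; // epoch millis when this window ends
}

const rateWindows = new Map<string, RateWindow>();

// Returns true if the call is allowed (and counts it), false if the limit is used up.
function tryConsumeRateLimit(
  opts: { action: string; key: string; limit: number; windowSeconds: number },
  now: number = Date.now(),
): boolean {
  const id = `${opts.action}:${opts.key}`;
  const current = rateWindows.get(id);
  if (!current || now >= current.resetAt) {
    // No window yet, or the previous one expired: start fresh with this call counted.
    rateWindows.set(id, { count: 1, resetAt: now + opts.windowSeconds * 1000 });
    return true;
  }
  if (current.count >= opts.limit) return false;
  current.count++;
  return true;
}
```

In production the single-use limit should be one atomic operation, for example an insert against a unique key. Otherwise two concurrent copies of the same request can both pass the check before either one is recorded.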

Run state and expiry. The run has to actually exist, belong to this user, match this mission and game, still be in the issued state, and not be past its 15-minute expiry. A run that was already submitted is rejected with a 409 "already submitted". A stale or reused run never reaches the scoring logic.
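The run-state checks fit in one pure function. This is a sketch under assumptions: the MissionRun shape and checkRun are hypothetical. The post specifies only the 409 for a repeat submission; the other status codes are illustrative choices.

```typescript
// Hypothetical run shape with just the fields these checks need.
interface MissionRun {
  id: string;
  userId: string;
  missionId: string;
  gameId: string;
  status: "issued" | "submitted";
  expiresAt: number; // epoch millis
}

type RunCheck =
  | { ok: true }
  | { ok: false; httpStatus: 400 | 404 | 409 | 410; reason: string };

function checkRun(
  run: MissionRun | undefined,
  req: { userId: string; missionId: string; gameId: string },
  now: number = Date.now(),
): RunCheck {
  // Another user's run is reported as "not found" so run IDs can't be probed.
  if (!run || run.userId !== req.userId) return { ok: false, httpStatus: 404, reason: "run not found" };
  if (run.missionId !== req.missionId || run.gameId !== req.gameId) {
    return { ok: false, httpStatus: 400, reason: "mission or game mismatch" };
  }
  if (run.status !== "issued") return { ok: false, httpStatus: 409, reason: "already submitted" };
  if (now > run.expiresAt) return { ok: false, httpStatus: 410, reason: "run expired" };
  return { ok: true };
}
```

Checking status and then marking the run as submitted must happen atomically, for example with a conditional update that only matches issued rows. Otherwise two racing submissions can both see the run as unsubmitted.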

Transcript hash. The server recomputes a SHA-256 over the submitted transcript and compares it to the hash the client sent. A mismatch doesn't necessarily mean malice, but it gets recorded as a risk flag — and risk flags matter, because:

The replay decides everything. The server feeds the seed and the player's inputs into its own copy of the game and produces the authoritative score, duration, and completion. A mission only counts as completed when the replay succeeds and there are zero risk flags. Anything flagged — hash mismatch, rules version mismatch, a replay that doesn't add up — doesn't get auto-rewarded; it gets held for review.
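The hash check and the "zero risk flags" rule can be sketched together. This sketch makes several assumptions:

- It uses node:crypto for SHA-256.
- It uses a canonical serialization (JSON of [tick, input] pairs) that the Dart client would have to reproduce byte for byte.
- A replay callback stands in for the server-side game logic.

The flag names, the held_for_review status and the decide function are hypothetical.

```typescript
import { createHash } from "node:crypto";

interface TranscriptEvent { tick: number; input: string }

// Arrays instead of objects, so key order can't differ between the Dart and JS encoders.
function canonicalTranscript(events: TranscriptEvent[]): string {
  return JSON.stringify(events.map((e) => [e.tick, e.input]));
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

type RiskFlag = "transcript_hash_mismatch" | "rules_version_mismatch" | "replay_failed";

interface ReplayResult { ok: boolean; score: number; completed: boolean }

interface Decision {
  score: number; // always the server's replayed score
  missionCompleted: boolean;
  status: "completed" | "held_for_review" | "not_completed";
  flags: RiskFlag[];
}

function decide(
  submission: { transcript: TranscriptEvent[]; transcriptHash: string; rulesVersion: string },
  run: { rulesVersion: string },
  replay: (events: TranscriptEvent[]) => ReplayResult, // server-side copy of the game
): Decision {
  const flags: RiskFlag[] = [];
  const serverHash = sha256Hex(canonicalTranscript(submission.transcript));
  if (serverHash !== submission.transcriptHash.toLowerCase()) flags.push("transcript_hash_mismatch");
  if (submission.rulesVersion !== run.rulesVersion) flags.push("rules_version_mismatch");

  // The authoritative result comes only from replaying the inputs.
  const result = replay(submission.transcript);
  if (!result.ok) flags.push("replay_failed");

  // Completed only with a successful replay AND zero flags; anything flagged waits for review.
  const missionCompleted = flags.length === 0 && result.completed;
  const status: Decision["status"] =
    flags.length > 0 ? "held_for_review" : missionCompleted ? "completed" : "not_completed";
  return { score: result.ok ? result.score : 0, missionCompleted, status, flags };
}
```

The property that matters is that score comes only from replay. Nothing the client claims about its own score feeds into the decision.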

Play Integrity, bound to the payload. On Android, the app fetches a Play Integrity token right before submitting and binds it to a hash of the exact score payload. The server decodes that token with Google's API (server-side, because on-device checks can be bypassed), confirms the binary is a recognized Play build running on a device that passes the device-integrity check, and stores the verdict. Crucially, this is treated as one risk signal among many, not as proof of a legitimate score. Integrity tells you the app and device look real. It cannot tell you the player earned the points.
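Here is one way to turn the decoded Play Integrity verdict into risk flags rather than a pass/fail. The payload fields used here follow Google's Play Integrity documentation as understood here and should be checked against the current docs:

- requestDetails.requestHash
- appIntegrity.appRecognitionVerdict
- deviceIntegrity.deviceRecognitionVerdict
- the values PLAY_RECOGNIZED and MEETS_DEVICE_INTEGRITY

The binding scheme (hex SHA-256 of the exact score payload passed as the request hash) is an assumption, as are the flag names and the freshness window. The call to Google's decode endpoint is left out; this function only inspects its result.

```typescript
import { createHash } from "node:crypto";

// Subset of the decoded token payload (verify field names against Google's docs).
interface IntegrityPayload {
  requestDetails?: { requestPackageName?: string; requestHash?: string; timestampMillis?: string };
  appIntegrity?: { appRecognitionVerdict?: string };
  deviceIntegrity?: { deviceRecognitionVerdict?: string[] };
}

type IntegrityFlag =
  | "integrity_missing"
  | "integrity_wrong_package"
  | "integrity_hash_mismatch"
  | "integrity_stale"
  | "app_not_play_recognized"
  | "device_integrity_failed";

function integrityFlags(
  payload: IntegrityPayload | undefined,
  expected: { packageName: string; scorePayload: string; nowMs: number; maxAgeMs: number },
): IntegrityFlag[] {
  if (!payload) return ["integrity_missing"];
  const flags: IntegrityFlag[] = [];

  // Hypothetical binding: the client sent SHA-256 (hex) of the exact score payload as requestHash.
  const expectedHash = createHash("sha256").update(expected.scorePayload, "utf8").digest("hex");
  if (payload.requestDetails?.requestPackageName !== expected.packageName) flags.push("integrity_wrong_package");
  if (payload.requestDetails?.requestHash !== expectedHash) flags.push("integrity_hash_mismatch");

  // A missing or old timestamp suggests a token captured earlier and reused.
  const issuedAt = Number(payload.requestDetails?.timestampMillis);
  if (!Number.isFinite(issuedAt) || expected.nowMs - issuedAt > expected.maxAgeMs) flags.push("integrity_stale");

  if (payload.appIntegrity?.appRecognitionVerdict !== "PLAY_RECOGNIZED") flags.push("app_not_play_recognized");
  if (!payload.deviceIntegrity?.deviceRecognitionVerdict?.includes("MEETS_DEVICE_INTEGRITY")) {
    flags.push("device_integrity_failed");
  }
  return flags;
}
```

Every failed check adds a flag instead of rejecting outright. The flags would then join the replay's risk flags, which keeps integrity as one signal among many.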

The part that was harder than I expected: Block Drop

Most of the games were straightforward to make replay-safe. Block Drop, my Tetris-style game, was not. It has the biggest state space of anything in the app — a full board, a bag of upcoming pieces, plus moves, rotations, holds, and hard-drops — and all of it has to reproduce identically on the server. So it became the game that ate most of my time, and it broke in two separate phases.

The first round of problems was the gameplay itself. Before I could even think about validation, Block Drop had UI glitches I had to chase down — pieces and board state not behaving the way they should on the device. That's the unglamorous reality of this kind of game: you can't validate a game that isn't yet behaving correctly, because you don't have a stable "correct" to compare against. So step one was just getting the game to play right.

Once the UI was fixed, a second class of problems showed up: server-side validation. With the game finally behaving on the phone, the server replay still didn't line up with what had happened on the device. This is the trap with a game this complex: it's not enough for the game to look right on screen. The server has to land on the exact same board state the player saw, from the same seed, after the same sequence of inputs. Any tiny divergence between how the Dart client and the TypeScript server handle a piece compounds, and by the end of a run the two sides disagree about the score entirely.

The fix that mattered was a design decision: don't trust the client's summary of what happened; replay the board itself. The tempting shortcut is to let the client report higher-level events ("I cleared these lines") and have the server tally them up. But the server can't actually verify a reported line clear; it just has to take the client's word for it, which defeats the whole point. So Block Drop's replay is board-state based. The server regenerates the deterministic piece bag from the seed, applies the player's recorded controls one tick at a time, and works out the line clears itself from the resulting board:

function lockPiece() {
  // place the current piece into the board grid
  // then clear completed lines based on the actual board,
  // not on anything the client reported
  clearLines();
  spawn();
}
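Here is a minimal sketch of the two server-side pieces that paragraph describes: a deterministic piece bag and line clears computed from the board. It is not Block Drop's actual code. The 7-piece bag, the Board encoding (0 = empty, top row first), and the names nextBag and clearLines are assumptions. The nextRand LCG from earlier is redefined so the block runs alone.

```typescript
const PIECES = ["I", "O", "T", "S", "Z", "J", "L"] as const;
type Piece = (typeof PIECES)[number];

function nextRand(state: number): number {
  return (state * 1664525 + 1013904223) >>> 0;
}

// Seeded 7-bag: a Fisher-Yates shuffle driven only by the LCG, so both sides
// get the same order as long as they consume the RNG identically.
function nextBag(rng: number): { bag: Piece[]; rng: number } {
  const bag: Piece[] = [...PIECES];
  for (let i = bag.length - 1; i > 0; i--) {
    rng = nextRand(rng);
    const j = (rng >>> 16) % (i + 1); // high bits: an LCG's low bits have short periods
    [bag[i], bag[j]] = [bag[j], bag[i]];
  }
  return { bag, rng };
}

// Board rows, top first; 0 = empty, anything else = filled.
type Board = number[][];

// Drops every full row and pads empty rows on top. The cleared count is derived
// from the board itself, never from anything the client reported.
function clearLines(board: Board): { board: Board; cleared: number } {
  const width = board[0]?.length ?? 0;
  const kept = board.filter((row) => row.some((cell) => cell === 0));
  const cleared = board.length - kept.length;
  const empty = Array.from({ length: cleared }, () => new Array<number>(width).fill(0));
  return { board: [...empty, ...kept], cleared };
}
```

Because cleared comes from the rows themselves, a client that claims four lines on a board with no full rows simply gets zero.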

If you take one thing from this section: the hardest part of server-authoritative validation isn't the security idea, it's the determinism. Making the same logic produce identical results in two languages, across two runtimes, every time — that's the real work. Block Drop is also the game I'd tell anyone to test most aggressively, precisely because its state space is large enough to hide a desync you won't notice until a real player's score quietly fails to validate.
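One way to hunt the desyncs described above is a per-tick digest. The client (say, in a debug build) and the server replay each record a short hash of the board after every tick. The first mismatch then points at the tick where the two implementations stopped agreeing. The helpers boardDigest and firstDivergence are hypothetical, and truncating SHA-256 to 16 hex characters is only to keep logs short.

```typescript
import { createHash } from "node:crypto";

// Short digest of the board at a given tick; both sides must serialize rows the same way.
function boardDigest(board: number[][], tick: number): string {
  const text = `${tick}|${board.map((row) => row.join(",")).join(";")}`;
  return createHash("sha256").update(text).digest("hex").slice(0, 16);
}

// Index of the first tick whose digests differ, or -1 if the two logs fully agree.
function firstDivergence(client: string[], server: string[]): number {
  const n = Math.min(client.length, server.length);
  for (let i = 0; i < n; i++) {
    if (client[i] !== server[i]) return i;
  }
  // Same prefix but different lengths: one side stopped (or kept going) early.
  return client.length === server.length ? -1 : n;
}
```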

What I'd tell anyone building a game that pays out

If you're putting real value behind a score, here's the short version of everything above:

Never let the client be the source of truth for anything that earns money. It reports inputs. The server decides outcomes.
Make your gameplay deterministic from a server-issued seed, so the backend can independently reproduce and score the run.
Issue one-time run IDs and reject duplicates. This is what neutralizes replay attacks, and it's cheap to build.
Treat Play Integrity / attestation as a signal, not a verdict. It catches tampered binaries and sketchy devices. It does not prove a score is real.
Don't auto-pay flagged runs. Server-authoritative scoring plus conservative eligibility plus manual review on high-value events is what keeps the economy honest.

None of these is clever on its own. Stacked together, they move you from "anyone can type their own prize" to "you'd have to defeat a server that already knows what should have happened." For a game where a high score is a withdrawal request, that's the bar.