<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: xytras</title>
    <description>The latest articles on DEV Community by xytras (@xytras).</description>
    <link>https://dev.to/xytras</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3578733%2F526b0df4-4700-4d5f-a8e0-2bfca420f14d.png</url>
      <title>DEV Community: xytras</title>
      <link>https://dev.to/xytras</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/xytras"/>
    <language>en</language>
    <item>
      <title>Three things every indie multiplayer game gets wrong in production</title>
      <dc:creator>xytras</dc:creator>
      <pubDate>Fri, 15 May 2026 20:26:22 +0000</pubDate>
      <link>https://dev.to/xytras/three-things-every-indie-multiplayer-game-gets-wrong-in-production-3m8j</link>
      <guid>https://dev.to/xytras/three-things-every-indie-multiplayer-game-gets-wrong-in-production-3m8j</guid>
      <description>&lt;p&gt;Most indie multiplayer games ship with three architectural decisions that look fine at MVP and break somewhere between 50 and 500 concurrent players. After hosting servers for indie survival/MMO games across UE, Unity, and Godot for several years, these are the three failure modes I keep seeing.&lt;/p&gt;

&lt;h2&gt;Failure 1: Trusting the client for any value that affects other players&lt;/h2&gt;

&lt;p&gt;The pattern: the client computes "I dealt 25 damage to player X" or "my final survival time was 14:32" and sends that to the server. The server records it.&lt;/p&gt;

&lt;p&gt;This fails the moment one player decompiles the client and starts sending fake values. They're suddenly invincible, or topping leaderboards with impossible scores, or duplicating resources. Trust collapses for the honest players who watched it happen.&lt;/p&gt;

&lt;p&gt;The fix that has held up: make the server authoritative for anything that touches other players. The client sends INTENT ("I shot at player X from this position") and the server validates and applies. Latency goes up because there's a round trip. Cheat surface drops to almost zero because the client never gets to be the source of truth on anything competitive.&lt;/p&gt;

&lt;p&gt;Single-player progression can stay client-side. Leaderboards, PvP outcomes, shared world state, and currency cannot.&lt;/p&gt;
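
&lt;p&gt;A minimal sketch of that shape in Python. The message fields, range check, damage table, and &lt;code&gt;log_event&lt;/code&gt; call are hypothetical stand-ins, not any engine's API; the point is that the server recomputes every value from its own state and the packet only carries intent.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import math
import time

MAX_WEAPON_RANGE = 40.0  # metres; make this per-weapon in a real game
DAMAGE_BY_WEAPON = {"rifle": 25, "pistol": 12}

def handle_shot_intent(server_state, shooter_id, msg):
    """Client sends intent ("I shot at player X from this position");
    the server validates and applies."""
    shooter = server_state.players[shooter_id]
    target = server_state.players.get(msg["target_id"])
    if target is None or not target.alive:
        return  # stale or fabricated target: drop the intent

    # Never trust a client-reported position; use the server's copy.
    if math.dist(shooter.position, target.position) &gt; MAX_WEAPON_RANGE:
        return  # impossible shot: drop it, optionally flag for review

    # Damage comes from a server-side table, never from the packet.
    damage = DAMAGE_BY_WEAPON.get(shooter.weapon, 0)
    target.hp = max(0, target.hp - damage)
    server_state.log_event(shooter_id, "shot",
                           target_id=msg["target_id"],
                           damage=damage, at=time.time())&lt;/code&gt;&lt;/pre&gt;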

&lt;h2&gt;Failure 2: No versioning or event log on player save data&lt;/h2&gt;

&lt;p&gt;The pattern: each player's save is a JSON blob. Every save overwrites the previous version. The only backup is whatever your hosting provider's nightly snapshot grabbed.&lt;/p&gt;

&lt;p&gt;This fails when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A bad patch corrupts saves and you find out 18 hours later.&lt;/li&gt;
&lt;li&gt;A duplication exploit briefly works and you can't tell which players exploited.&lt;/li&gt;
&lt;li&gt;Two parallel sessions (player reconnects on phone while desktop session still active) race-condition on the same save.&lt;/li&gt;
&lt;li&gt;A support ticket comes in saying "I lost 3 hours of progress" and you can only restore to 4am yesterday.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fix: a per-player append-only event log. The "save" becomes a projection of the events. Rolling back to any second is a re-projection of the events up to that point, not a restore from backup. Audit becomes trivial because every progression jump has a source event. Race conditions become detectable instead of silent.&lt;/p&gt;

&lt;p&gt;This is more work than a JSON blob. It's also the difference between "support is a war zone" and "support takes 5 minutes."&lt;/p&gt;
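
&lt;p&gt;A sketch of the shape, assuming SQLite; the table layout, the &lt;code&gt;xp_gain&lt;/code&gt;/&lt;code&gt;item_add&lt;/code&gt; event kinds, and the payload fields are illustrative, not a schema recommendation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json
import sqlite3

db = sqlite3.connect("saves.db")
db.execute("""CREATE TABLE IF NOT EXISTS player_events (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    player_id TEXT NOT NULL,
    ts        REAL NOT NULL,
    kind      TEXT NOT NULL,
    payload   TEXT NOT NULL)""")

def append_event(player_id, ts, kind, payload):
    # Writes only ever append; nothing is overwritten.
    db.execute("INSERT INTO player_events (player_id, ts, kind, payload)"
               " VALUES (?, ?, ?, ?)",
               (player_id, ts, kind, json.dumps(payload)))
    db.commit()

def project_save(player_id, until_ts=None):
    """The save is a fold over events; pass until_ts to roll back
    to any point in time without touching backups."""
    q = "SELECT kind, payload FROM player_events WHERE player_id = ?"
    args = [player_id]
    if until_ts is not None:
        q += " AND ts &amp;lt;= ?"
        args.append(until_ts)
    save = {"xp": 0, "inventory": {}}
    for kind, payload in db.execute(q + " ORDER BY id", args):
        data = json.loads(payload)
        if kind == "xp_gain":
            save["xp"] += data["amount"]
        elif kind == "item_add":
            inv = save["inventory"]
            inv[data["item"]] = inv.get(data["item"], 0) + data["count"]
    return save&lt;/code&gt;&lt;/pre&gt;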

&lt;h2&gt;Failure 3: One binary doing auth, matchmaking, simulation, persistence, and anti-cheat&lt;/h2&gt;

&lt;p&gt;The pattern: the game server process does everything. Player auth happens in the same binary as the world tick. Persistence writes block on the same thread as physics. Anti-cheat runs inline with the simulation step.&lt;/p&gt;

&lt;p&gt;This fails because those workloads are different shapes. World simulation is CPU-bound. Auth is I/O-bound and bursty. Persistence is database-bound and write-heavy. Anti-cheat scans are CPU-bursty.&lt;/p&gt;

&lt;p&gt;When you bolt them all into one process, you get cascading failures. A noisy auth attack spikes CPU and the world tick starts dropping frames. A bad database write blocks for 200ms and the simulation hitches. An anti-cheat scan kicks in and 80 players see a momentary disconnect.&lt;/p&gt;

&lt;p&gt;The fix is unglamorous: split the stateless concerns out. Auth, persistence, matchmaking, and anti-cheat coordination all live as separate services that talk to the game server over an internal API. The game server keeps the one job that only it can do: running the world simulation. Everything else scales horizontally, fails independently, and can be replaced without taking down the game.&lt;/p&gt;
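
&lt;p&gt;The same isolation idea in miniature, within a single process: the tick loop enqueues writes and a worker drains them, so a slow write can't stall the simulation. In production the worker becomes its own service behind that internal API; every name below is illustrative.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import queue
import threading

class FakeDB:
    # Stand-in for whatever persistence client you actually use.
    def save(self, player_id, snapshot):
        print("saved", player_id)

write_queue = queue.Queue()

def persistence_worker(db):
    # Its own thread here; its own process or service at scale.
    # Either way, a 200ms write no longer hitches the world tick.
    while True:
        player_id, snapshot = write_queue.get()
        db.save(player_id, snapshot)
        write_queue.task_done()

def world_tick(world):
    world.step_physics()
    world.resolve_combat()
    # Persistence is fire-and-forget from the tick's point of view.
    for player in world.dirty_players():
        write_queue.put((player.id, player.snapshot()))

threading.Thread(target=persistence_worker, args=(FakeDB(),),
                 daemon=True).start()&lt;/code&gt;&lt;/pre&gt;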

&lt;p&gt;Most indie games discover this around the time they hit 100-200 concurrent players, when reports start coming in that "everyone got disconnected for 90 seconds." That's usually one of the stateless concerns starving the simulation thread.&lt;/p&gt;

&lt;h2&gt;The shape that works&lt;/h2&gt;

&lt;p&gt;The three fixes share a pattern: separate concerns that have different failure modes, and don't let the client be the source of truth for anything that crosses the player boundary.&lt;/p&gt;

&lt;p&gt;If you want to read deeper on the architecture that holds these together, I've written about the tick-server vs event-driven split here: &lt;a href="https://gsb.supercraft.host/blog/multiplayer-game-backend-architecture/" rel="noopener noreferrer"&gt;https://gsb.supercraft.host/blog/multiplayer-game-backend-architecture/&lt;/a&gt; and on per-player event-sourcing for save data here: &lt;a href="https://gsb.supercraft.host/blog/player-data-schema-design-nosql-vs-sql/" rel="noopener noreferrer"&gt;https://gsb.supercraft.host/blog/player-data-schema-design-nosql-vs-sql/&lt;/a&gt;. The orchestration side (splitting stateless services from the game server) is covered here: &lt;a href="https://gsb.supercraft.host/blog/game-server-orchestration-guide/" rel="noopener noreferrer"&gt;https://gsb.supercraft.host/blog/game-server-orchestration-guide/&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;None of this is required at 5 players. All of it becomes required somewhere between 50 and 500 concurrent. The earlier you put the architecture in place, the cheaper the migration is.&lt;/p&gt;

&lt;p&gt;What's the failure mode you've seen most often in production multiplayer? I've been catching mostly category 3 lately as more indie teams hit the auth-starves-simulation problem.&lt;/p&gt;

</description>
      <category>gamedev</category>
      <category>multiplayer</category>
      <category>backend</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Why most AI agent memory implementations break in production</title>
      <dc:creator>xytras</dc:creator>
      <pubDate>Fri, 15 May 2026 07:51:40 +0000</pubDate>
      <link>https://dev.to/xytras/why-most-ai-agent-memory-implementations-break-in-production-3ep7</link>
      <guid>https://dev.to/xytras/why-most-ai-agent-memory-implementations-break-in-production-3ep7</guid>
      <description>&lt;p&gt;Every team trying to give AI agents memory is solving the same three problems badly. After running production agent memory for several months across two codebases, here are the failure modes I keep hitting and the one pattern that actually works.&lt;/p&gt;

&lt;h2&gt;Failure 1: Embed everything as vectors and call it memory&lt;/h2&gt;

&lt;p&gt;The instinct is reasonable. You have a vector database, you have embeddings, you have a retrieval API. Memory looks like "stuff a conversation in, get relevant chunks out." So you dump every session's transcript, every decision, every code review into the same embedding store and retrieve by similarity.&lt;/p&gt;

&lt;p&gt;This breaks because facts and conversations have different retrieval shapes.&lt;/p&gt;

&lt;p&gt;Ask the agent "what did we decide about JWT vs opaque session tokens?" and the embedding store returns five things kind-of-about-tokens by vector similarity. Three of them are old debate snippets. One is a tangential comment from a different feature. The actual decision record is in there somewhere, ranked alongside the noise.&lt;/p&gt;

&lt;p&gt;The agent then synthesizes an answer from "five tokenish memories," which gives you a confident summary of the team's thoughts on tokens. What you actually wanted was the single decision record that says "use opaque session tokens, set 2025-04-12, still active."&lt;/p&gt;

&lt;p&gt;The fix isn't to abandon vectors. It's to separate the layers. Structured decision records get an id, a claim, a source, an active_at, and (when relevant) a supersedes_id. Conversations and exploratory reasoning stay in the embedding store. Queries hit both, merge, and prioritize structured records over vector neighbors when the structured record exists for the same topic.&lt;/p&gt;
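
&lt;p&gt;A sketch of the two-layer query using the record fields named above. &lt;code&gt;decision_store&lt;/code&gt;, &lt;code&gt;vector_store&lt;/code&gt;, and &lt;code&gt;topic_of&lt;/code&gt; are hypothetical stand-ins, not a specific library, and chunks being tagged with a topic at ingest is an assumption:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from dataclasses import dataclass
from typing import Optional

@dataclass
class DecisionRecord:
    id: str
    claim: str
    source: str                      # link, transcript ref, doc
    active_at: str                   # ISO timestamp
    supersedes_id: Optional[str] = None
    topic: str = ""

def answer_query(query, decision_store, vector_store, topic_of):
    # Structured retrieval finds decisions; vector retrieval finds
    # the surrounding discussion. Structured wins on overlap.
    decisions = decision_store.lookup(topic_of(query))
    chunks = vector_store.similar(query, k=5)
    covered = {d.topic for d in decisions}
    # Drop vector neighbors on a topic a record already answers:
    # the record is the source of truth, the chunks are only color.
    chunks = [c for c in chunks if c.topic not in covered]
    return decisions, chunks&lt;/code&gt;&lt;/pre&gt;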

&lt;h2&gt;Failure 2: Summarize and discard&lt;/h2&gt;

&lt;p&gt;The pattern: every session, the agent writes a summary of what happened. The raw events are discarded. The next session starts by loading the summary.&lt;/p&gt;

&lt;p&gt;This breaks because summaries are lossy compressions, and the next summary compresses the first summary, not the original events.&lt;/p&gt;

&lt;p&gt;A real example I watched happen across four sessions on the same project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session 1: "We agreed to enforce idempotency at the receiver before any side-effect fires. Webhook X currently doesn't and that's blocking the migration."&lt;/li&gt;
&lt;li&gt;Session 1 summary: "Decided idempotency must be enforced at the receiver. Webhook X needs updating."&lt;/li&gt;
&lt;li&gt;Session 2 summary (built from session 1 summary): "Idempotency is being enforced at the receiver. Webhook X update is in progress."&lt;/li&gt;
&lt;li&gt;Session 3 summary: "Idempotency is enforced at the receiver. Webhook X has been updated."&lt;/li&gt;
&lt;li&gt;Session 4: "The webhook layer enforces idempotency."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Webhook X was never updated. The agent now believes a thing that isn't true and will plan against that belief.&lt;/p&gt;

&lt;p&gt;The fix is to keep events as the source of truth. Summaries reference event ids, not free text. When a summary goes weird, you can re-summarize from the events with a fresh model and recover the original signal. Without this, you're playing a game of telephone with your own state.&lt;/p&gt;
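
&lt;p&gt;A sketch of summaries as derived data; &lt;code&gt;event_store&lt;/code&gt; and &lt;code&gt;llm_summarize&lt;/code&gt; are hypothetical stand-ins for your event store and model call:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def summarize_session(event_store, session_id, llm_summarize):
    events = event_store.events_for(session_id)  # raw events, kept forever
    return {
        "session_id": session_id,
        "summary": llm_summarize([e.text for e in events]),
        # The summary cites what it compressed, so a drifted summary
        # can always be rebuilt from the originals.
        "event_ids": [e.id for e in events],
    }

def resummarize(event_store, old_summary, llm_summarize):
    # Re-derive from the referenced events, never from the previous
    # summary; this is what breaks the game of telephone.
    events = [event_store.get(eid) for eid in old_summary["event_ids"]]
    return llm_summarize([e.text for e in events])&lt;/code&gt;&lt;/pre&gt;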

&lt;h2&gt;Failure 3: Append-only memory with no supersedes-relations&lt;/h2&gt;

&lt;p&gt;The pattern: every decision becomes a new record. Old records aren't deleted because deleting historical context feels wrong. Conflicts resolve at retrieval time by "newest wins" or by vibes.&lt;/p&gt;

&lt;p&gt;This breaks because retrieval ranks by relevance, not recency, and "newest wins" never gets the chance to fire when the retriever surfaces the older record as the stronger match.&lt;/p&gt;

&lt;p&gt;Concrete: "We use JWT for service-to-service auth" gets recorded in week 1. In week 4 the team switches: "We replaced JWT with opaque session tokens for service-to-service. JWT is deprecated." Both records exist. Both are retrievable.&lt;/p&gt;

&lt;p&gt;In week 8, the agent is asked about service-to-service auth. The query phrasing happens to match the JWT record more strongly (because the JWT record uses the exact phrase "service-to-service" while the replacement record uses "S2S"). The agent confidently retrieves "we use JWT" and starts building against it.&lt;/p&gt;

&lt;p&gt;This isn't a hypothetical. I have seen it in production three separate times across two projects. The fix is supersedes-relations as a first-class concept. When a decision replaces another, the new record points at the old via supersedes_id. Retrieval filters out superseded records by default. The old record stays in the database for audit, but it's not surfaced unless explicitly queried.&lt;/p&gt;
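
&lt;p&gt;A sketch of the default filter, reusing the &lt;code&gt;DecisionRecord&lt;/code&gt; shape from the earlier sketch:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def active_decisions(records, include_superseded=False):
    if include_superseded:
        return list(records)  # explicit audit path
    superseded = {r.supersedes_id for r in records if r.supersedes_id}
    # Surface a record only if nothing points at it via supersedes_id:
    # the week-4 token record hides the week-1 JWT record by default.
    return [r for r in records if r.id not in superseded]&lt;/code&gt;&lt;/pre&gt;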

&lt;h2&gt;The pattern that does work&lt;/h2&gt;

&lt;p&gt;The shape that has held up under load for me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Decisions are records, not sentences.&lt;/strong&gt; Each one has an id, a textual claim, a source (link, transcript ref, doc), an active_at timestamp, and a supersedes_id field that's null unless this record replaces another.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provenance is mandatory.&lt;/strong&gt; A record without a source is auto-flagged as low-trust. The agent can't ground an answer in a record it can't trace.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supersedes-relations are first-class.&lt;/strong&gt; Replacements use the supersedes_id field, not deletion or "newest wins."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversations stay in the embedding store, separately.&lt;/strong&gt; Vector retrieval finds discussions. Structured retrieval finds decisions. Both run for each query, and structured wins when they conflict.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resummarization runs against events, never against the previous summary.&lt;/strong&gt; Summaries are derived data, refreshed periodically. They never become the source of truth.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern is tool-agnostic. You can implement it with SQLite and a few tables. You can implement it with a managed memory service. What matters isn't the storage layer; it's the discipline of treating decisions as records with provenance and supersedes-relations.&lt;/p&gt;
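
&lt;p&gt;For the SQLite route, a minimal version of the table and the default query, using the field names from this post; everything else about the schema is up to you:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import sqlite3

db = sqlite3.connect("memory.db")
db.executescript("""
CREATE TABLE IF NOT EXISTS decisions (
    id            TEXT PRIMARY KEY,
    claim         TEXT NOT NULL,
    source        TEXT,            -- NULL source gets auto-flagged low-trust
    active_at     TEXT NOT NULL,   -- ISO timestamp
    supersedes_id TEXT REFERENCES decisions(id)
);
""")

# Default retrieval: only records nothing has superseded.
ACTIVE_DECISIONS = """
SELECT * FROM decisions d
WHERE NOT EXISTS (
    SELECT 1 FROM decisions newer WHERE newer.supersedes_id = d.id
)"""&lt;/code&gt;&lt;/pre&gt;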

&lt;p&gt;If you want to read further on why structure beats pure vector retrieval for this exact problem, I went deeper on the agent-memory-vs-vector-db decision tree here: &lt;a href="https://memnode.dev/articles/agent-memory-vs-vector-db" rel="noopener noreferrer"&gt;https://memnode.dev/articles/agent-memory-vs-vector-db&lt;/a&gt;. And on why inspectable provenance beats opaque embeddings for trust here: &lt;a href="https://memnode.dev/articles/lineage-and-provenance-in-agent-memory" rel="noopener noreferrer"&gt;https://memnode.dev/articles/lineage-and-provenance-in-agent-memory&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Curious what failure modes other people are hitting. The three above are the ones I keep seeing. There's probably a fourth I haven't caught yet.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>productivity</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
