Werner Kasselman

the next software stack needs more than code generation

Most people in software are staring at the wrong milestone. Models write API handlers, unit tests, and migrations fast enough that typing isn't the limiting factor anymore. In a world of high-concurrency agents, the act of writing code is no longer the bottleneck. That part of the problem is finished.

The real trouble starts the moment that code lands. Why was this change made? Which requirement forced it? And who actually checked the risky paths in the auth flow? You can still answer those questions today, but it takes a kind of technical archaeology—digging through PR threads, Slack messages, and documentation that was out of date the day it was written. That workflow held up while humans set the pace. It breaks the moment you stop being the bottleneck.

the velocity trap

Most teams run AI-assisted development through a loop of prompt, branch, code, review, and merge. At low volume, it holds up. Then usage increases. You start seeing changes that look fine but carry no clear origin story. A feature flag shows up in production with a name nobody recognizes. An environment variable gets added "just to make something work" and stays there for six months because nobody is sure what it’s gating.

Then we have a growing crowd of "psychosis coders" who think they are shipping masterpieces because they saw an agent move a cursor. They hit approve the second the diff looks plausible, never noticing the trail of empty TODO comments, shallow mocks, and tests that don't actually assert anything meaningful. They are shipping "passable" trash masquerading as velocity.

Maintaining real quality at agentic speeds requires a gauntlet. In my own work, I have to run Model B against Model A like a caffeine-fueled nitpicker for ten rounds just to reach consensus. Then Model C does the same dance. This cross-model review is mandatory to maintain velocity without the system collapsing into a pile of actual slop.

But even this gauntlet is a patch, not a solution. We are burning a mountain of tokens to force quality through a pipe that was never meant to handle it. This is "Approval Theater" as a survival strategy. Neither your carefully crafted markdown files, nor your prompt engineering, nor your harness stacking solves this.

why clean merges still fail

Agent A updates PricingEngine::price() to apply a discount based on User::join_date.

Agent B removes join_date from User and introduces a UserMetadata lookup that returns Option<NaiveDate>.

The pricing path now depends on a value that may not exist. In the failure case, the lookup returns None, and a later fallback resolves that missing value to Money::default(), producing 0.00.

Both changes compile. Both pass their unit tests. Because they don't touch the same lines of code, Git merges them without a single conflict.
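The interaction can be sketched in a few lines. This is an illustrative reconstruction of the scenario above, not code from any real repository: types are simplified (a year as `u32` instead of `NaiveDate`, cents instead of a full `Money` type) so the snippet is self-contained.

```rust
use std::collections::HashMap;

// Default for a tuple struct over u64 is 0 -- i.e. 0.00. That default is
// exactly the silent failure mode described above.
#[derive(Default)]
struct Money(u64); // cents

// Agent B's change: join_date now lives in a metadata lookup that can miss.
fn lookup_join_year(user_id: u64, metadata: &HashMap<u64, u32>) -> Option<u32> {
    metadata.get(&user_id).copied()
}

// Agent A's change: discount based on join date.
fn price(user_id: u64, base_cents: u64, metadata: &HashMap<u64, u32>) -> Money {
    match lookup_join_year(user_id, metadata) {
        // Long-tenured users get 10% off.
        Some(year) if year < 2020 => Money(base_cents * 90 / 100),
        Some(_) => Money(base_cents),
        // The missing-metadata path: the fallback resolves to the default,
        // which is 0.00. Each change is correct alone; the interaction fails.
        None => Money::default(),
    }
}

fn main() {
    let mut md = HashMap::new();
    md.insert(1u64, 2015u32); // user 1 has metadata
    // user 2 hits the missing-metadata path: the price silently becomes 0
    println!("user 1: {}", price(1, 1000, &md).0);
    println!("user 2: {}", price(2, 1000, &md).0);
}
```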

In production, the pricing logic fails. Revenue doesn't drop to zero. That would be obvious. It becomes inconsistent instead. Some users are charged correctly. Others hit the missing metadata path and get a zero price. Support tickets appear first. Finance notices the reconciliation mismatch three weeks later.

You're left trying to unwind two changes that were never evaluated together. Each was correct in isolation; the failure only existed in the interaction. A human developer might have caught that by holding the context in their head, but that assumption doesn't scale when dozens of agents are moving at once.

the idempotency crisis

There is a deeper, uglier problem with agents and Git: retries. When a prompt fails or a network timeout hits, an agent often tries again. In a standard Git flow, this leads to double-commits, "dirty" working directories, or a messed-up HEAD state that requires a human to untangle. Then come extra worktrees, agents that never check whether they're on the right branch in the right tree, and agents that ignore the documentation paths you've specified and pollute the repository root with stray markdown files instead.

Git wasn't built for idempotent operations from a thousand concurrent workers. It was built for a human at a terminal who can see when a command failed. If the next stack doesn't have request-level idempotency built into the storage layer, you aren't building a system; you're building a race condition.
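What request-level idempotency in the storage layer could look like, as a hypothetical sketch: key each operation by a hash of its intent plus its content, so a retry resolves to the same record instead of a duplicate. A real implementation would use a content hash like blake3; the standard library's `DefaultHasher` stands in here so the snippet stays dependency-free.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Derive a deterministic operation id from (intent, content).
fn op_id(intent: &str, content: &str) -> u64 {
    let mut h = std::collections::hash_map::DefaultHasher::new();
    intent.hash(&mut h);
    content.hash(&mut h);
    h.finish()
}

struct Store {
    episodes: HashMap<u64, String>,
}

impl Store {
    // Applying the same (intent, content) pair twice resolves to the same
    // record: a retry after a timeout cannot duplicate work.
    fn apply(&mut self, intent: &str, content: &str) -> u64 {
        let id = op_id(intent, content);
        self.episodes.entry(id).or_insert_with(|| content.to_string());
        id
    }
}

fn main() {
    let mut store = Store { episodes: HashMap::new() };
    let first = store.apply("add discount rule", "fn price() { /* ... */ }");
    let retry = store.apply("add discount rule", "fn price() { /* ... */ }");
    println!("same id: {}, episodes stored: {}", first == retry, store.episodes.len());
}
```

The agent doesn't need to know whether its first attempt landed; re-sending is always safe.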

files are the wrong primitive now

Git shows you what changed in the text, but it doesn't show you why. You see two files modified, but you can’t see the requirement that triggered the edit. We review diffs and guess at intent.

Agents don't operate on files; they operate on relationships. A discount rule depends on a user attribute; a billing flow depends on an auth decision. When we take that rich graph of intent and flatten it into files, we lose the fidelity of the work. This mismatch leads to "clean" merges that are semantically murky, repeated edits to the same symbols, and retries that converge on something other than what we actually meant to build.

building the floor

I'm building a stack that treats intent as the primary object, not the diff. It's not one tool. It's a set of components doing work Git was never designed for.

aivcs is the version control core: a 9-crate Rust workspace. It uses blake3 for content-addressed hashing and groups changes around intent as an Episode instead of scattering them across commits. An Episode carries the requirement that triggered the change, the symbols actually touched, and the evidence (tests, benchmarks, profiles) attached when the work lands. It can import Git history as a baseline and export structured Episodes back into a branch, so teams don’t have to migrate all at once.
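A plausible shape for an Episode, inferred from the description above. The field names are my assumptions for illustration, not aivcs's actual API:

```rust
// Illustrative only: one way an Episode might be structured.
struct Episode {
    requirement: String,     // the "why": the requirement that triggered the change
    symbols: Vec<String>,    // symbols actually touched, not a file list
    evidence: Vec<Evidence>, // attached when the work lands, not reconstructed later
    content_hash: [u8; 32],  // content-addressed id (blake3 in aivcs)
}

enum Evidence {
    Test { name: String, passed: bool },
    Benchmark { name: String, ns_per_iter: u64 },
    Profile { name: String },
}

fn main() {
    let e = Episode {
        requirement: "apply legacy discount for pre-2020 users".to_string(),
        symbols: vec!["PricingEngine::price".to_string()],
        evidence: vec![Evidence::Test { name: "pricing_legacy".to_string(), passed: true }],
        content_hash: [0u8; 32],
    };
    println!("episode touches {} symbol(s), hash len {}", e.symbols.len(), e.content_hash.len());
}
```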

trstr is the parsing layer. It’s spec-grounded, not grammar-by-example. When an agent edits a symbol, the system knows what that symbol is, not just which bytes moved. Tree-sitter is built for editor features. This needs stricter guarantees.

sqry handles symbol-level indexing. It builds the graph from a rule like “apply a legacy discount” to every call site, call chain, and dependent type that touches it. That’s what lets an Episode carry semantic scope instead of a file list. It’s also how you catch the PricingEngine / UserMetadata class of failure before merge.
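The kind of query this enables can be sketched as a graph walk: start from one symbol and collect everything transitively affected. The graph here is hand-built for illustration; in sqry it would be derived from parsed code.

```rust
use std::collections::{HashMap, HashSet};

// Walk the dependency graph from a starting symbol to every symbol that
// transitively depends on it -- the semantic scope an Episode would carry.
fn affected(graph: &HashMap<&str, Vec<&str>>, start: &str) -> HashSet<String> {
    let mut seen = HashSet::new();
    let mut stack = vec![start.to_string()];
    while let Some(sym) = stack.pop() {
        if seen.insert(sym.clone()) {
            for dep in graph.get(sym.as_str()).into_iter().flatten() {
                stack.push(dep.to_string());
            }
        }
    }
    seen
}

fn main() {
    let mut graph = HashMap::new();
    // Removing User::join_date reaches the pricing path, then billing.
    graph.insert("User::join_date", vec!["PricingEngine::price"]);
    graph.insert("PricingEngine::price", vec!["BillingFlow::invoice"]);
    let hits = affected(&graph, "User::join_date");
    println!("{} symbols in scope", hits.len());
}
```

With this index, Agent B's removal of `join_date` would surface `PricingEngine::price` as in-scope before the merge, not three weeks later in a reconciliation report.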

wsmux is the concurrency layer: a CRDT over the code graph. When dozens of agents edit the same repository, the merge surface isn’t text. It’s operations on symbols and relationships. wsmux makes those edits converge instead of producing two clean merges that disagree at runtime.
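The convergence property can be shown with a toy last-writer-wins map over symbols. wsmux's actual CRDT operates on a richer code graph; this sketch only demonstrates the guarantee that matters, that replicas merging in any order end up identical.

```rust
use std::collections::HashMap;

#[derive(Clone, PartialEq, Debug)]
struct Edit { ts: u64, agent: u32, value: String }

#[derive(Clone, PartialEq, Debug, Default)]
struct SymbolMap(HashMap<String, Edit>);

impl SymbolMap {
    fn set(&mut self, sym: &str, edit: Edit) {
        match self.0.get(sym) {
            // Keep the newer edit; tie-break on agent id so every replica
            // resolves a concurrent pair identically.
            Some(cur) if (cur.ts, cur.agent) >= (edit.ts, edit.agent) => {}
            _ => { self.0.insert(sym.to_string(), edit); }
        }
    }
    fn merge(&mut self, other: &SymbolMap) {
        for (sym, edit) in &other.0 {
            self.set(sym, edit.clone());
        }
    }
}

fn main() {
    let mut a = SymbolMap::default();
    let mut b = SymbolMap::default();
    // Two agents edit the same symbol concurrently on different replicas.
    a.set("PricingEngine::price", Edit { ts: 10, agent: 1, value: "v1".to_string() });
    b.set("PricingEngine::price", Edit { ts: 10, agent: 2, value: "v2".to_string() });
    let mut ab = a.clone(); ab.merge(&b);
    let mut ba = b.clone(); ba.merge(&a);
    // Merge order doesn't matter: both replicas converge on one edit.
    println!("converged: {}", ab == ba);
}
```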

The storage layer is idempotent by construction. The same operation with the same content and intent resolves to the same Episode. Retries don’t duplicate work. A thousand workers hitting a flaky network stop being a race condition.

This doesn’t replace Git. It sits alongside it.

The goal is simple: when something changes, you can answer why without digging through history. Decisions travel with the change. Evidence is attached when the change is made, not reconstructed later.

The system remembers what changed. It should also remember why.

The bottleneck moved. The stack didn’t. That gap is where the risk lives.
