Sushrut Mishra

Claude Code got the architecture wrong (so we ran a controlled experiment to find out why)

If you have used Claude Code on a large codebase, you have probably felt this. The output compiles. The tests pass. But something feels off. The API surface duplicates something that already exists. The approach is a workaround dressed up as an implementation. A senior engineer on your team would have done it differently. The instinct is to blame the model. The actual problem is something else entirely.

Coding agents explore large codebases through trial and error

Claude Code, Cursor, and every other coding agent navigate your codebase the same way: grep, glob, read files, repeat. On a small codebase this works well enough. On a codebase with millions of lines across thousands of files, this process produces a systematically incomplete picture.
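To make the failure mode concrete, here is a minimal sketch of grep-style exploration under a tool-call budget. This is a hypothetical illustration, not Claude Code's actual tooling: the function name, the `.java` filter, and the `max_hits` budget are all assumptions made for the example.

```python
import os
import re

def grep_codebase(root, pattern, max_hits=50):
    """Naive grep-style search: scan files until a hit budget is exhausted.

    On a multi-million-line tree, the budget runs out long before the
    search space does, so whole subsystems are simply never seen.
    """
    hits = []
    rx = re.compile(pattern)
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.endswith(".java"):  # assumption: Java-only repo
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="ignore") as f:
                    for lineno, line in enumerate(f, 1):
                        if rx.search(line):
                            hits.append((path, lineno, line.strip()))
                            if len(hits) >= max_hits:
                                # Budget exhausted: the agent now reasons
                                # from a partial view of the codebase.
                                return hits
            except OSError:
                pass
    return hits
```

Everything the agent concludes downstream is conditioned on whichever files happened to match first, which is why the picture it builds is systematically incomplete rather than randomly incomplete.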

The agent makes the best architectural decision it can with what it found. When extension points exist but the agent never found them, it creates a parallel implementation instead of extending what already exists. When conventions exist but the agent never saw them, it writes code that a senior engineer would reject in review. The model is capable. The context is incomplete.

The experiment

We ran a controlled test to measure exactly how much this matters. Same agent (Claude Code with Opus 4.6), same task, same codebase (Elasticsearch: 3.85 million lines of Java across 29,000+ files). The single variable: whether Bito's AI Architect, a codebase intelligence layer that builds a knowledge graph of your entire system and exposes it to coding agents via MCP, was providing context or not.

The task was implementing deterministic terms aggregation using the TPUT (Three-Phase Uniform Threshold) algorithm, a multi-phase distributed coordination problem that requires changes across Elasticsearch's entire search pipeline.
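TPUT itself is well documented in the distributed top-k literature, and its three phases can be sketched independently of Elasticsearch. The version below is a simplified, self-contained illustration over in-memory shards, not the code from either pull request:

```python
def tput_top_k(shards, k):
    """Three-phase distributed top-k (TPUT) over per-shard term counts.

    shards: list of dicts mapping term -> local count.
    Returns the k globally most frequent terms with exact counts.
    """
    m = len(shards)

    # Phase 1: each shard sends its local top-k; partial sums give a
    # lower bound tau on the k-th highest global count.
    partial = {}
    for shard in shards:
        top = sorted(shard.items(), key=lambda kv: -kv[1])[:k]
        for term, count in top:
            partial[term] = partial.get(term, 0) + count
    tau = sorted(partial.values(), reverse=True)[k - 1] if len(partial) >= k else 0

    # Phase 2: with threshold T = tau / m, every shard reports all terms
    # with local count >= T. A term below T on all m shards has a global
    # count below tau, so it cannot make the top k.
    threshold = tau / m
    candidates = {}
    for shard in shards:
        for term, count in shard.items():
            if count >= threshold:
                candidates[term] = candidates.get(term, 0) + count

    # (Full TPUT also prunes candidates whose upper bound cannot beat
    # tau; omitted here for brevity.)

    # Phase 3: resolve gaps by fetching exact counts for every remaining
    # candidate from every shard, including counts below the threshold.
    exact = {t: sum(s.get(t, 0) for s in shards) for t in candidates}
    return sorted(exact.items(), key=lambda kv: -kv[1])[:k]
```

The multi-round structure is the point: phase 2's threshold depends on phase 1's partial sums, and phase 3 must go back to the shards again, which is exactly the shard-coordination capability the unaided agent concluded the framework lacked.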

Without AI Architect

Claude Code concluded the framework could not support multi-round shard communication and built a workaround instead. It created a separate aggregation type that forces every shard to return all unique terms in a single pass. Technically functional. Severe memory risk on high-cardinality fields. Zero multi-shard tests. 6 files changed. And critically, not actually TPUT.
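As a rough reconstruction (ours, not the actual diff), the workaround pattern amounts to merging every shard's full term dictionary on the coordinator in a single round:

```python
def single_pass_terms(shards, k):
    """Workaround pattern: every shard ships ALL of its unique terms to
    the coordinator in one pass. Results are correct, but coordinator
    memory grows with total term cardinality rather than with k, which
    is the severe memory risk on high-cardinality fields.
    """
    merged = {}
    for shard in shards:
        for term, count in shard.items():  # entire term dictionary crosses the wire
            merged[term] = merged.get(term, 0) + count
    return sorted(merged.items(), key=lambda kv: -kv[1])[:k]
```

Compare this with TPUT, where the threshold in phase 2 bounds what each shard sends; here there is no threshold at all, so a field with millions of distinct values means millions of entries held in memory at once.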

With AI Architect

Claude Code understood exactly where to extend the pipeline, identified the correct integration point, followed Elasticsearch's own API conventions, and implemented genuine multi-phase TPUT with threshold computation, refinement rounds, and gap resolution. 27 files changed. Full test coverage across all coordination layers.

Same agent. Same codebase. Completely different architecture.

Why the gap exists

The agent without context made a reasonable conclusion based on incomplete information. It explored a 3.85-million-line codebase without a map and missed the extension points entirely. That is not a failure of reasoning. It is a failure of information.

AI Architect builds a knowledge graph of your entire codebase, mapping architecture, extension points, conventions, dependencies, and call graphs, and delivers that context to your coding agent before it writes a single line of code. The agent stops guessing and starts reasoning about your actual system.

The difference in output reflects the difference in understanding. A 6-file workaround versus a 27-file production-grade implementation. Both came from the same model on the same day.

What this means for your team

Most engineering teams accept the 6-file version because they never saw that the 27-file version was possible. The architectural shortcuts your coding agents take today are a direct reflection of what they understand about your codebase, and most of them understand very little.

If your team is running Claude Code or Cursor on a large codebase, this experiment is worth reading in full. We published the complete side-by-side comparison, the layer-by-layer breakdown of what TPUT actually required, and links to both pull requests so you can read the code yourself.

Read the full experiment: The TPUT implementation Claude Code got wrong and AI Architect got right

If you want to connect AI Architect to your coding agent and run it on your own codebase, get started at bito.ai.
