DEV Community

Cover image for AI Coding Agents Need a Foundation, Not a Canvas
Russel Dsouza
Russel Dsouza

Posted on

AI Coding Agents Need a Foundation, Not a Canvas

  • Same prompt, two repos: blank Expo repo → 47min / $5.20 / un-mergeable; wired foundation → 11min / $0.85 / mergeable.
  • The model didn't get smarter. The repo got more legible.
  • AGENTS.md helps ~4% (hand-written) or hurts 2–3% (LLM-generated), per a 138-repo / 5,694-PR study. Both add >20% to token cost. It's not the fix.
  • A real foundation ships one visible convention per cross-cutting concern, typed boundaries, one way to do each thing, pre-installed skills, and a working end-to-end path.
  • The evaluation heuristic: "Open the repo and ask Claude Code to add a screen. If the first thing the agent does is install three packages, the foundation is decorative."

Cold open

I had a working theory that the gap between coding agents that "feel useful" and coding agents that "feel like a coworker" was a model gap. Smarter base model, better tool use, longer context window. The usual story.

Then I ran the same prompt against two different repos and the theory died on contact.

The prompt was: "Add a weekly summary screen that fetches the last seven days of data from Supabase and renders a bar chart." Mid-tier feature, nothing exotic. I ran it against a fresh npx create-expo-app and against a fully wired React Native / Expo / Supabase foundation.

Same model. Same prompt. Two completely different sessions.

On the blank repo, the agent installed three chart libraries, picked one, then quietly imported a different one in the second file it wrote. It hardcoded the Supabase URL. It invented an api/ folder, then later invented a services/ folder, then never used either. After 47 minutes and roughly $5.20 in tokens, it produced 600 lines of code I would not merge into anything.

On the foundation, the agent did something that, at first, felt anticlimactic. It opened one existing screen, read it, and then wrote one new screen that looked exactly like the existing one. Same chart primitive. Same data fetcher. Same file structure. 11 minutes. About 85 cents in tokens. Mergeable on the first read.

The model hadn't gotten smarter. The repo had gotten more legible.

The blank-canvas problem, named

Every coding agent — Claude Code, Cursor, Codex, Windsurf, the lot — is doing roughly the same thing under the hood. It reads the repo, then writes code that "fits." When the repo has shape, "fits" means matching the shape. When the repo has no shape, "fits" means matching the median of the public internet.

The public internet is a thousand React Native tutorials, each of which makes different micro-decisions. State library: Redux, Zustand, Jotai, Recoil, Context. Data fetching: TanStack Query, SWR, raw fetch, Supabase client wrappers. Navigation: Expo Router, React Navigation, stack vs. tabs vs. drawer. Theming: NativeWind, styled-components, StyleSheet, restyle. The "median" picks one of each, often differently per file.

This is the blank-canvas problem. The agent does not fail because it is dumb. It fails because every choice is open and the choices don't compose.

The AGENTS.md cul-de-sac

The current industry answer is AGENTS.md — a Markdown file at the root of the repo describing your conventions. I have written several. They help, in the way that a sticky note on the fridge helps. They do not solve the problem.

The numbers are unkind. A study earlier this year analyzed AGENTS.md impact across 138 repositories and 5,694 pull requests. The headline:

  • LLM-generated AGENTS.md files hurt agent performance by 2–3%.
  • Hand-written AGENTS.md files improved performance by 4%.
  • Both raised token costs by 20%+.

4% is not the breakthrough you were hoping for. It's noise.

The intuition behind why is straightforward once you sit with it: prose about code is a weaker signal than code itself. An agent that reads three screens which all use the same data fetcher infers, very confidently, that the fourth screen should use the same fetcher. An agent that reads a sentence in AGENTS.md saying "use the data fetcher in lib/" sometimes does, sometimes doesn't.

The pattern is the prior. The AGENTS.md is a cache.

What an actual agent-ready foundation contains

The phrase "agent-ready" is doing a lot of work in boilerplate marketing. Here is what it should mean, concretely:

1. Visible conventions

Every cross-cutting concern — auth, data fetching, navigation, theming, state, payments, push — appears in at least one fully wired screen. The agent has a working example to copy. Not three contradictory examples. Not one half-finished example. One canonical example.

2. Typed boundaries

// Supabase generated types — the agent cannot fake its way past these
export interface Database {
  public: {
    Tables: {
      daily_summaries: {
        Row: {
          id: string;
          user_id: string;
          date: string;
          total_calories: number;
          // ...
        };
        Insert: { /* ... */ };
        Update: { /* ... */ };
      };
    };
  };
}
Enter fullscreen mode Exit fullscreen mode

If the agent calls a column that doesn't exist, tsc fails. The foundation makes the type system enforce the prior.

3. One way to do each thing

If your foundation has both Redux and Zustand, the agent will use both. If it has only Zustand, the agent will use Zustand. Foundations make choices the agent doesn't have to.

4. Pre-installed skills and slash commands

.claude/skills/
  add-screen.md           — knows where screens go, how they're wired
  wire-supabase-rpc.md    — knows how RPCs are exposed
  add-stripe-product.md   — knows the webhook layout
Enter fullscreen mode Exit fullscreen mode

A skill is a scoped operation the agent can invoke directly. It is not prompt engineering. It is the foundation describing what it lets you do.

5. A working end-to-end path

Auth → database read → typed UI → push notification, all wired up once. The agent reads the whole chain in a single context window and now has a template for every future feature that touches the chain.

What a real foundation looks like on disk

Here's what a real React Native + Expo + Supabase foundation looks like when an agent opens it:

your-app/
├── app/                        # Expo Router screens
│   ├── (tabs)/
│   │   ├── index.tsx           # Home + today's entries
│   │   ├── add.tsx             # Camera + analysis flow
│   │   └── stats.tsx           # Daily summary chart
│   └── auth/
│       └── login.tsx           # OAuth + email + Apple
├── modules/
│   ├── db/
│   │   ├── supabaseClient.ts
│   │   └── supabaseServer.ts
│   ├── auth/
│   │   └── useAuth.ts
│   └── vision/
│       └── analyzeFood.ts      # Vision API integration
├── supabase/
│   └── migrations/             # Real, dated, ordered migrations
├── components/                 # NativeWind primitives
└── .claude/
    └── skills/                 # Scoped agent operations
Enter fullscreen mode Exit fullscreen mode

When the agent opens this and you ask for a weekly-summary feature, it reads stats.tsx, reuses the chart primitive in components/, calls the same supabaseServer client, drops the screen in (tabs)/, and follows the same migration pattern if it needs a new column.

The agent never asks "what state library should I use?" because the repo answered the question.

The economics, in detail

I tracked twelve sessions across three teammates — six from blank repos, six from foundations — using the same model and same prompt categories. Averages:

Metric Blank repo Foundation Delta
Tokens per shipped feature $4.30 $0.95 4.5× cheaper
Wall time per feature 42 min 9 min 4.7× faster
Diffs requiring full rewrite 4 of 6 0 of 6
Hallucinated imports 11 total 1 total 11× fewer

This is not a controlled study. It is, however, consistent with what teammates and customers report. The gap is real and it is wide.

If you ship one app a year, the gap doesn't matter much — you'll spend a month either way. If you ship three or four, the gap is the difference between "we can" and "we can't."

What this does not mean

Two clarifications, because the foundation argument gets oversold.

A foundation does not replace judgment. The agent still ships things you have to review. The foundation just narrows what the agent can ship.

A foundation is not forever. Two years from now the model will be stronger, the framework will have shifted, and today's foundation will feel constraining. Throw it away when it does. Until then, it earns its keep every session.

How to evaluate any foundation you're considering

Whether you're considering a paid template, a free GitHub template, or something a friend shipped — the test is the same:

Open the repo and ask Claude Code to add a screen. If the first thing the agent does is install three packages, the foundation is decorative.

Other tests, in order:

  1. Count real screens. Five or more, all using the same conventions, is the bar.
  2. Grep for any. If the data layer leaks any, the agent has no priors.
  3. Find the skills directory. No skills, no scoped operations, the foundation is half-built.
  4. Read the migrations folder. Real, dated migrations are a foundation signal. A single init.sql is a boilerplate signal.
  5. Run the first session. Token cost and wall-clock time will tell you everything within 15 minutes.

FAQ

Is a foundation just a fancy boilerplate?
No. A boilerplate is the packaging. A foundation is the wiring. Most boilerplates are not foundations.

Why doesn't a great AGENTS.md fix this?
It helps about 4%, per the 138-repo study. The pattern is a much stronger prior than prose about the pattern.

Does this only apply to React Native?
No. It applies to every stack. React Native is the worst case because its ecosystem is the most fragmented, so the gap is largest there.

Should I buy or build the foundation?
Build if you'll ship three or more apps on the same stack. Buy if this is your first or second.


If you take one thing away: do not hand an AI coding agent an empty room and ask it to build a house. Give it a foundation. Then ask it to add the rooms.

For the longer breakdown — including the specific "one way to do each thing" heuristics I use to audit a foundation — see the Applighter blog.

What's the weirdest thing an AI coding agent has done in your blank repo? Drop it in the comments — I'm collecting the failure modes for a follow-up. (Bonus points for a token-cost screenshot.)

Top comments (0)