I watched a teammate spend 20 minutes complaining that Copilot "doesn't understand our codebase." Then I looked at the repo. No README. No architecture docs. No module descriptions. Just code.
If that sounds familiar, keep reading. Because the fix took me an hour, and it changed everything about how AI performs on my projects.
Most AI code quality problems aren't AI problems. They're context problems.
## What I learned from your comments
After my METR study post got 35 comments, something @hilton_fernandes pointed out stuck with me: AI is actually useful for developing in codebases you're not acquainted with — because it learns from existing code patterns. The flip side? If there are no documented patterns, AI has nothing to learn from.
@waqasra2022skipq made a similar point from the debugging angle: lacking a mental model of your project slows down everything — and AI will keep adding more files and functions without ever building that model for you.
Those two observations are why context files matter more than model upgrades.
## The experiment
Same task: "add pagination to the users endpoint." Two attempts, same model, same codebase.
| | Round 1: No context | Round 2: With AGENTS.md |
|---|---|---|
| ORM pattern | ❌ Wrong (raw SQL) | ✅ Matched team's Knex style |
| Error handling | ❌ Generic try/catch | ✅ Used our AppError class |
| Pagination | ❌ Offset-based | ✅ Cursor-based (our standard) |
| Tests | ❌ None generated | ✅ Co-located, used test factories |
| Usable without edits? | No — needed full rewrite | ~90% ready |
The AI didn't get smarter between attempts. The context did.
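To make the pagination row concrete: here's a minimal sketch of the cursor-based approach the context file steers the AI toward. This is an illustration, not our actual endpoint — a plain array stands in for the database, and `paginate` and its cursor encoding are hypothetical helpers.

```javascript
// Cursor-based pagination sketch. A cursor encodes the last-seen id, so pages
// stay stable when new rows are inserted; with offset pagination, every insert
// shifts all later pages. A plain array stands in for the DB here.
function paginate(rows, { cursor = null, limit = 2 } = {}) {
  // Decode the cursor back to the last-seen id (null means first page).
  const afterId =
    cursor === null ? null : Number(Buffer.from(cursor, 'base64').toString());
  const start =
    afterId === null ? 0 : rows.findIndex((r) => r.id === afterId) + 1;
  const page = rows.slice(start, start + limit);
  const last = page[page.length - 1];
  return {
    data: page,
    // Next cursor is the id of the last row on this page, base64-encoded.
    nextCursor: last ? Buffer.from(String(last.id)).toString('base64') : null,
  };
}

const users = [{ id: 1 }, { id: 2 }, { id: 3 }];
const first = paginate(users, { limit: 2 }); // first page: ids 1 and 2
const second = paginate(users, { cursor: first.nextCursor }); // next page: id 3
console.log(first.data, second.data);
```

In the real endpoint the slice becomes a `WHERE id > ?` query through the query builder, but the shape of the contract (opaque cursor in, `data` plus `nextCursor` out) is the part the AI picked up from the context file.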
## The uncomfortable math
Everyone's waiting for GPT-6 or Claude Next to "finally get it right." But here's what I keep seeing:
A mediocre model with good context outperforms a frontier model with zero context.
Think about it like onboarding. You wouldn't drop a senior engineer into your codebase with no docs and expect them to match your team's patterns on day one. Why do we expect that from AI?
## What actually works: 30 lines of markdown
I keep a file called AGENTS.md at the project root:
```markdown
# AGENTS.md

## Conventions
- Error handling: wrap in try/catch, use AppError class
- Pagination: cursor-based, not offset
- Tests: co-located, use test factories
- Naming: camelCase for JS, snake_case for DB

## Common Gotchas
- Don't use `users` table directly — go through UserService
- Rate limiting is middleware-level, not per-route
```
Takes maybe an hour to write well. And it's portable — I've used variations with Cursor, Copilot, and Claude Code. The format changes; the knowledge doesn't.
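For illustration, here's roughly what the "wrap in try/catch, use AppError" convention produces. The `AppError` class and `getUser` handler below are hypothetical stand-ins, not our real code — your team's error class will differ; the point is that one documented line is enough for the AI to mimic the shape.

```javascript
// Hypothetical minimal AppError, matching the convention in AGENTS.md.
// Carries an HTTP status code and marks the error as expected (operational)
// rather than a programming bug.
class AppError extends Error {
  constructor(message, statusCode = 500) {
    super(message);
    this.statusCode = statusCode;
    this.isOperational = true;
  }
}

// Handlers wrap work in try/catch and rethrow as AppError, per the convention.
async function getUser(id, userService) {
  try {
    const user = await userService.findById(id);
    if (!user) throw new AppError(`user ${id} not found`, 404);
    return user;
  } catch (err) {
    // Pass through errors that are already AppErrors; wrap everything else.
    if (err instanceof AppError) throw err;
    throw new AppError('failed to load user', 500);
  }
}
```

Note the handler also goes through `UserService` instead of hitting the `users` table, which is exactly the gotcha the file documents.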
## What it doesn't solve
I won't oversell this. The honest trade-offs:
- Setup cost is real. Maybe 2-3 days for a large project. And it needs maintenance — when patterns evolve, the file evolves too.
- Greenfield projects? AI will still hallucinate conventions when there aren't any yet.
- High-stakes code (auth, payments, migrations) — I still do full manual review regardless.
But for the 80% of code that follows established patterns? Context files are the highest-leverage investment I've found.
## The era question
Here's what I keep coming back to. Nobody knows what the AI tooling landscape looks like in a year. That's unsettling. Models will change, tools will change, pricing will change.
But documented conventions? Those are durable. Whether you're using Copilot today or some agent framework next year, the AI still needs to know your team's patterns. The markdown file that took you an hour to write will still be useful in 2027.
A solo developer today can build what took a team of 10 — but only if the AI can pick up the patterns without a month of onboarding. Context files are how you get there.
## The open question (I actually want your answer)
Here's what I haven't cracked: how do you keep context files in sync with a fast-moving codebase?
I've tried pre-commit hooks that validate AGENTS.md against actual code patterns. It sort of works. But I'm curious — has anyone found a better approach? Or do you just accept some drift and do periodic manual updates?
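To show what I mean by "sort of works," here's the gist of that drift check, heavily simplified. The rules mirror the gotchas in my AGENTS.md, but the patterns, file names, and `checkSource` helper are hypothetical; a real hook would walk staged files and exit nonzero on violations.

```javascript
// Sketch of an AGENTS.md drift check: scan source text for patterns the
// context file forbids. Rules and paths are illustrative, not a real ruleset.
const rules = [
  {
    pattern: /from\(['"]users['"]\)/,
    message: 'query the users table via UserService',
  },
  {
    pattern: /OFFSET\s+\d+/i,
    message: 'use cursor-based pagination, not offset',
  },
];

// Returns one "file: message" string per violated rule.
function checkSource(source, filename) {
  const violations = [];
  for (const { pattern, message } of rules) {
    if (pattern.test(source)) violations.push(`${filename}: ${message}`);
  }
  return violations;
}

// This snippet trips both rules: direct table access and offset pagination.
const bad = `db.from('users').limit(10).offset(20); // SELECT ... OFFSET 20`;
console.log(checkSource(bad, 'example.js'));
```

The weakness is obvious: regexes catch the gotchas you thought to encode, not the conventions that drifted silently, which is why I still end up doing periodic manual passes.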
I'm also wondering: what do you put in your context files that I'm missing? Every time I think mine are complete, someone mentions a convention I forgot to document.
Your answers genuinely shape what I write next. The METR post started as a simple study summary — your comments turned it into a month-long investigation into how AI actually performs. If something here doesn't match your experience, or you've found something better, I want to know.
Thanks for being here.
P.S. I package what I learn into tools. If you want context files and spec templates your AI follows automatically: 3 Skill Files.