DEV Community

Léo Le Roy
AI Coding Standards at Scale: Versioned AI Rules for Cursor, Claude Code, and Beyond

AI Coding Standards at Scale: How We Versioned Our Team's AI Rules with Claude Code

"AI amplifies best practices… and worst ones."


TL;DR

  • AI without shared rules amplifies chaos as fast as it amplifies good practices
  • Treating AI instructions as versioned, distributed team contracts changes the dynamic
  • Layered context (universal → tech-specific → project) keeps rules relevant without noise
  • Two workflow gates — /plan before coding, /review before merging — move feedback upstream
  • Scaling to multiple teams requires parameterizing identity, not forking the repo
  • Measuring whether any of it actually works is still an open problem

The Paradox at the Heart of AI-Assisted Development

AI coding standards weren't a priority for us until they became a problem. As a lead software engineer at HiPay, I watched the same pattern repeat on every team that adopted Claude Code or other AI coding assistants without shared configuration: disciplined developers got more productive, and undisciplined ones got more chaotic just as quickly.

When I looked at how everyone on our team had configured their AI assistant, what I found wasn't reassuring. Each developer had their own setup. Different context files, different constraints, different rules. One had configured their assistant to always check for tests before suggesting changes. Another had given it no architectural context at all. A third had no guardrails around secrets or git conventions.

The consequences were concrete. A hardcoded API key committed to a feature branch. Changelog entries perpetually forgotten until the MR got blocked at review. Architecture patterns that varied wildly between contributors who sat ten meters apart. And one developer whose AI was reading hundreds of lines of irrelevant files on every request — tokens burned, context polluted, slower responses for no benefit.

The culprit wasn't the AI. Every tool is only as effective as the constraints you put on it. The real problem was that we had treated AI configuration as a personal preference — and we were paying for it in rework, review friction, and repeated mistakes.


The Mental Shift: Treating AI Rules as Versioned Team Contracts

The fix felt obvious in retrospect.

We already version everything that matters: our linters, our CI pipelines, our Docker configs, our commit message format. We had collectively agreed that "it works differently on my machine" wasn't acceptable for build tooling. Why was it acceptable for the AI assistant that helps us write, review, and ship code?

The key insight: AI instructions aren't personal preferences. They're team contracts. When I tell an AI assistant "never modify the core transaction service without checking its dependents first," that's not my opinion — it's institutional knowledge that affects everyone on the team.

So I built a centralized repository — a single source of truth for how our AI assistants should behave. It's versioned, published as a package, and distributed to every project with a single command.

Two design decisions that turned out to matter more than I expected:

Rules are copied, not referenced. Each project gets its own local copy of the relevant rules at install time. This means a project can stay on v1.8 while another adopts v1.9 — no hard coupling between repos, no "the rules changed upstream and broke our workflow."

Updates are pull, not push. Teams explicitly pull new versions. No surprises. No rules changing under you while you're mid-sprint.

One command installs the right rules for your stack: core conventions, tech-specific constraints, and your team's context, all in one shot.
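To make "copied, not referenced" and "pull, not push" concrete, here is a minimal sketch of what an install step could look like. It is not the actual package: the `.ai-rules` directory, the `rules.lock.json` file, and the function names are all hypothetical, and a real installer would fetch a published package rather than read a local folder.

```python
import json
import shutil
import tempfile
from pathlib import Path


def install_rules(package_dir: Path, project_dir: Path, version: str) -> Path:
    """Copy one published version of the rules into a project and record the pin."""
    source = package_dir / version
    if not source.is_dir():
        raise FileNotFoundError(f"rules version {version} is not published")
    target = project_dir / ".ai-rules"  # hypothetical location
    # Copy, not symlink: later upstream releases cannot change this project.
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(source, target)
    # The lock file (hypothetical name) records which version the project pulled.
    (target / "rules.lock.json").write_text(json.dumps({"version": version}))
    return target


def installed_version(project_dir: Path) -> str:
    lock = project_dir / ".ai-rules" / "rules.lock.json"
    return json.loads(lock.read_text())["version"]


def demo_install() -> tuple[str, str]:
    """Two projects pinned to different versions of the same package."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for version in ("1.8", "1.9"):
            release = root / "package" / version
            release.mkdir(parents=True)
            (release / "universal.md").write_text(f"rules for {version}\n")
        for name, version in (("billing", "1.8"), ("checkout", "1.9")):
            (root / name).mkdir()
            install_rules(root / "package", root / name, version)
        return installed_version(root / "billing"), installed_version(root / "checkout")
```

Because each project holds its own copy and its own pin, upgrading is an explicit re-run of the install with a new version, which is the pull model described above.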


The Layering Problem: How We Structured Our AI Rules Across Tools

The first version of the standards repo was a flat list of conventions. It worked for one team. It didn't scale.

The problem is that "context" isn't one-size-fits-all. An AI working on a NestJS backend needs to understand hexagonal architecture, port/adapter boundaries, and use-case placement. An AI working on a React frontend needs hook rules, component structure conventions, and testing library patterns. An AI on a Symfony codebase needs controller-service-repository layering and PHP typing conventions.

Giving all these teams one monolithic rules file means half of it is noise for any given developer. And noise in AI context is expensive — it dilutes the rules that matter and burns tokens on every request.

I landed on a three-layer model:

Layer 1 — Universal rules. Always active, regardless of tech stack. No secrets in code. Git convention enforcement. No destructive operations without explicit confirmation. These are the floor — non-negotiable across the entire org.

Layer 2 — Tech-specific rules. Selected automatically based on what the project uses. NestJS projects get hexagonal architecture constraints. React projects get hooks rules. The AI only sees what's relevant to its context.

Layer 3 — Project context. This is where institutional knowledge lives. What does this specific repository do? What are its known failure patterns? Which modules are fragile? What's the domain model? This layer is different per repo, and it captures the things no linter or generic rule can encode.

The key design principle: layers compose, they don't conflict. Layer 2 adds stack-specific detail on top of the universal rules. Layer 3 adds context without replacing anything. The AI sees a coherent, layered picture, not a contradictory mess.
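The three-layer selection can be sketched in a few lines. This is an illustration under assumptions, not the real repository layout: the `Rule` type, the rule texts, and the `payments-api` repository name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    layer: int   # 1 = universal, 2 = tech-specific, 3 = project context
    scope: str   # "*" for universal, a stack name, or a repository name
    text: str


# Hypothetical rule set, for illustration only.
RULES = [
    Rule(1, "*", "Never commit secrets."),
    Rule(1, "*", "Follow the git commit convention."),
    Rule(2, "nestjs", "Keep use-cases inside the application layer."),
    Rule(2, "react", "Follow the rules of hooks."),
    Rule(3, "payments-api", "The transaction service is shared by several modules."),
]


def compose_context(stacks: set[str], repo: str) -> list[str]:
    """Return the rule texts an assistant should see, lowest layer first."""
    selected = [
        rule for rule in RULES
        if rule.layer == 1
        or (rule.layer == 2 and rule.scope in stacks)
        or (rule.layer == 3 and rule.scope == repo)
    ]
    # Layers only append; nothing in a higher layer removes a lower one.
    return [rule.text for rule in sorted(selected, key=lambda r: r.layer)]
```

A NestJS project would call `compose_context({"nestjs"}, "payments-api")` and never see the React rules, which is the noise reduction the layering is meant to provide.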


Two Workflows That Changed How We Ship: /plan and /review with Claude Code

The layered rules were the foundation. But what actually changed day-to-day development was two structured workflows built on top of them.

I didn't design these from scratch. I drew inspiration from mattpocock/skills and other open-source AI skill frameworks, then assembled the elements that mattered most for our context: persistent memory of past architectural decisions, impact analysis to detect blast radius before writing a line, and structured prompting that forces the AI to surface its own uncertainty — asking questions instead of confidently producing a wrong answer.

The underlying problem with code review: feedback arrives after the work is done. A developer writes two hours of code, opens an MR, and then finds out their change breaks a module they hadn't considered. Either they rewrite a significant chunk, or the reviewer wastes their review budget on "you forgot to update the changelog" instead of real architecture questions.

The solution was to move AI guidance upstream.

Before writing a single line: /plan

When starting a new ticket, the developer invokes /plan. Rather than reading the codebase blindly, the AI uses dedicated tooling to run targeted searches and impact analysis — querying its memory for past decisions, identifying what the proposed change will touch, reading only the relevant stack conventions.

What comes back isn't code. It's a structured plan: validated scope, identified risks, recommended approach.

A concrete example: a developer was about to add a CSV export filter to our transaction module. Without /plan, they would have modified the core transaction service — a reasonable first instinct. After analyzing the impact, the AI found that this service was used by three other modules, and that changing its signature would be a breaking change for all of them. It recommended creating a new use-case instead, leaving the existing contract intact.

That discovery took three minutes. Without /plan, it would have emerged during code review — after the code was written.
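The impact analysis behind that recommendation boils down to a reverse-dependency walk. The sketch below assumes a module-level import graph is already available. The graph contents and module names are hypothetical, and the tooling /plan actually uses is not shown in this article.

```python
from collections import deque

# Hypothetical import graph: module -> modules it depends on.
DEPENDS_ON = {
    "transaction-service": [],
    "widget": ["transaction-service"],
    "dashboard": ["transaction-service"],
    "reporting": ["transaction-service"],
    "auth": [],
}


def blast_radius(target: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every module that directly or transitively depends on target."""
    # Invert the graph: module -> modules that depend on it.
    dependents: dict[str, set[str]] = {name: set() for name in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)
    # Breadth-first walk over the inverted graph.
    affected: set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for module in dependents.get(current, ()):
            if module not in affected:
                affected.add(module)
                queue.append(module)
    return affected
```

A non-empty result for the service you are about to change is the signal to consider a new use-case instead of a signature change.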

Before opening the MR: /review

After the code is written, /review runs a systematic checklist: changelog entry in the correct format, no hardcoded secrets, no debug code left in, architecture boundaries respected, blast radius identified.

The output is a structured report with clear pass/warn/block verdicts:

🔍 Review — PROJ-1234 : CSV Export Filter

✅ CHANGELOG : [feat][PROJ-1234]: Add CSV export filter — format OK
✅ Architecture : ExportCsvUseCase in application/ — boundaries OK
✅ No console.log or debug code detected

⚠  Blast radius: TransactionService touched by 3 modules
   → Verify widget.spec.ts and dashboard.spec.ts still pass

❌ Secret detected: EXPORT_API_KEY hardcoded at line 47
   → Move to .env and add to .env.example

──────────────────────────────────────────────
Verdict: ✋ Changes requested
  Blocking: 1 (hardcoded secret)
  Warning:  1 (cross-module test coverage)

The human reviewer never has to catch the secret. They can spend their review time on the architecture question: is this use-case in the right place? Is the abstraction correct?
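A checklist like the one above can be approximated with a handful of deterministic checks. The sketch below is an assumption-heavy illustration, not the actual /review implementation: the regular expressions, the accepted changelog types other than `feat`, and the choice to treat debug code as blocking are all hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed changelog format, modeled on "[feat][PROJ-1234]: Add CSV export filter".
CHANGELOG_RE = re.compile(r"^\[(feat|fix|chore)\]\[[A-Z]+-\d+\]: \S.*$")
# Naive heuristic: a string literal assigned to a name ending in KEY, SECRET or TOKEN.
SECRET_RE = re.compile(r"\b[A-Z0-9_]*(KEY|SECRET|TOKEN)\s*=\s*['\"][^'\"]+['\"]")
DEBUG_RE = re.compile(r"\bconsole\.log\(")


@dataclass
class Finding:
    level: str  # "pass", "warn", or "block"
    message: str


def review(changelog_line: str, diff_lines: list[str], dependents: int) -> list[Finding]:
    """Run the checks in a fixed order and collect one finding per result."""
    findings = []
    ok = bool(CHANGELOG_RE.match(changelog_line))
    findings.append(Finding("pass" if ok else "block", "changelog format"))
    debug = any(DEBUG_RE.search(line) for line in diff_lines)
    findings.append(Finding("block" if debug else "pass", "debug code"))
    if dependents > 0:
        findings.append(Finding("warn", f"blast radius: {dependents} modules"))
    for number, line in enumerate(diff_lines, start=1):
        if SECRET_RE.search(line):
            findings.append(Finding("block", f"hardcoded secret at line {number}"))
    return findings


def verdict(findings: list[Finding]) -> str:
    """Any blocking finding requests changes; warnings alone do not."""
    if any(f.level == "block" for f in findings):
        return "changes requested"
    return "approved"
```

Pattern-based secret detection like this will miss cases and raise false positives, so it complements a dedicated secret scanner rather than replacing one.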

One last thing /review does that's easy to overlook: it saves the decisions made — the chosen approach, the trade-offs accepted — into a persistent memory. The next time someone invokes /plan on the same codebase, those decisions are already there. The system learns from each cycle.
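One simple way to picture that memory is an append-only log that /review writes and /plan reads. This is a sketch under assumptions: the JSON Lines file, the field names, and the second ticket are hypothetical, and the real storage mechanism is not described in this article.

```python
import json
import tempfile
from pathlib import Path


def record_decision(store: Path, ticket: str, module: str, decision: str) -> None:
    """Append one decision as a JSON line so earlier entries are never rewritten."""
    entry = {"ticket": ticket, "module": module, "decision": decision}
    with store.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")


def decisions_for(store: Path, module: str) -> list[str]:
    """Return past decisions recorded for a module, oldest first."""
    if not store.exists():
        return []
    lines = store.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line]
    return [e["decision"] for e in entries if e["module"] == module]


def demo_memory() -> list[str]:
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "decisions.jsonl"  # hypothetical file name
        record_decision(store, "PROJ-1234", "transaction",
                        "Add ExportCsvUseCase; keep TransactionService signature.")
        # Hypothetical second ticket, unrelated to the transaction module.
        record_decision(store, "PROJ-1250", "dashboard", "Cache totals per day.")
        return decisions_for(store, "transaction")
```

Keeping the log in the repository means the decisions are reviewed and versioned like any other change.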

That's the shift. AI handles the checklist. Humans handle the judgment. And nothing gets forgotten.

Two gates, two minutes each: /plan surfaces risks before you write a line, /review catches what humans shouldn't have to.

[Figure: What changes when AI configuration stops being a personal preference.]


Scaling to Multiple Teams: The Hard Part

The system worked well for one team. Then a second team wanted to adopt it.

The problem: every context file was implicitly written for our team. Our ticket prefix. Our team name. Our release workflows. A second team could technically install the package, but they'd get rules that referenced the wrong context at every turn.

The naive solution is to fork the repo. This fails for a simple reason: the moment you fork, you have two standards instead of one. Improvements to the universal rules don't propagate. After six months, the two copies have diverged enough that merging is impractical. You've solved the short-term adoption problem and created a long-term maintenance problem.

The principle I landed on: standardize the structure, parameterize the identity.

Universal rules — the security guardrails, the architecture constraints, the git conventions — stay universal. They belong to the org, not to any single team.

Team identity is injected at install time. The AI assistant's context file gets populated with the correct team name, ticket prefix, and team-specific context. The install process prompts for these values once and bakes them in.

Team-specific workflows live in a dedicated namespace, not in core. A team can add their own release runbook, their own known failure patterns — without touching shared rules.
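"Parameterize the identity" can be as small as a template filled in once at install time. The sketch below uses Python's standard `string.Template`. The template text, placeholder names, and the `Payments` team are hypothetical, not the real context file.

```python
from string import Template

# Hypothetical context template shipped with the shared package.
CONTEXT_TEMPLATE = Template(
    "Team: $team_name\n"
    "Ticket prefix: $ticket_prefix\n"
    "Changelog format: [type][$ticket_prefix-<number>]: summary\n"
)


def render_context(team_name: str, ticket_prefix: str) -> str:
    """Fill the shared template with one team's identity."""
    # substitute() raises KeyError if a placeholder has no value,
    # so a half-configured context file is never produced.
    return CONTEXT_TEMPLATE.substitute(team_name=team_name, ticket_prefix=ticket_prefix)
```

Calling `render_context("Payments", "PROJ")` yields a context file that references the right ticket prefix, while the template itself stays identical for every team.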

The governance model this creates: one team owns the standards repo (they publish versions, review changes to universal rules). Other teams contribute their context via pull requests — adding their project context files, their team-specific runbooks, their known failure patterns. Shared ownership without fragmentation.

The practical result: a new team goes from zero to a fully configured AI assistant in minutes. They inherit all org-wide guardrails automatically, then add their specific context on top.


What's Next: Open Problems in AI Standards Governance

The multi-team generalization (v1.10) was the milestone that changed the scope of the project. It went from "our team's internal tooling" to "something other teams can actually adopt without forking it." That shift opens a different set of questions.

The immediate next step is rolling it out to other teams and observing what breaks. Rules that felt obvious to us may be confusing to a team with a different tech stack or a different way of working. The contribution model — teams submitting PRs to add their own project context — will be the real test of whether the governance model holds up at scale.

The longer-term question is one I don't have an answer to yet: how do you know when a rule has become obsolete? The codebase evolves, the team's practices evolve, and a rule that was hard-won six months ago may be actively misleading today. Keeping the standards fresh is a discipline problem as much as a tooling problem — and I haven't solved it.

There's also a deeper measurement challenge I'm still working through. Is the system actually being used? Are the workflows genuinely changing how developers write and review code, or are they being quietly skipped when the deadline is tight? And beyond usage — is it actually making things better? Fewer regressions, less rework, faster reviews? Answering these questions properly requires instrumentation I'm still building out. Gut feel says yes. But gut feel is exactly what I was trying to replace.



FAQ

How do you standardize AI coding assistants across a development team?
Treat AI configuration files (.cursorrules for Cursor, CLAUDE.md for Claude Code, or equivalent) as versioned team contracts, not personal preferences. Publish them as a package, distribute them to every project with a single install command, and use a pull model so teams explicitly opt into updates — no surprises mid-sprint.

What are .cursorrules / CLAUDE.md files and how do they work for team AI standards?
These are context files that AI coding assistants (Cursor, Claude Code, and others) read automatically in any project. They tell the assistant how to behave: what architecture patterns to follow, what guardrails to apply, what team conventions to enforce. The tool name differs; the principle is the same. By versioning these files in a shared repo and distributing them like a package, you turn individual AI configs into shared team knowledge.

How do you scale AI assistant configuration (Cursor, Claude Code…) across multiple teams without forking?
Standardize the structure, parameterize the identity. Universal rules (security guardrails, git conventions, architecture constraints) stay shared. Team-specific values — ticket prefix, team name, project context — are injected at install time. New teams inherit all org-wide rules automatically and add their context on top, without touching the core.

What is the /plan workflow?
/plan is a structured workflow invoked before writing any code. It runs targeted impact analysis, queries past architectural decisions from memory, and returns a structured plan — validated scope, identified risks, recommended approach — before a single line is written. It's designed to surface the kind of problems that would otherwise only appear during code review.


If you're building something similar — or have a different approach to AI coding standards at team scale — I'd love to compare notes in the comments. And if you want to dig into the Claude Code setup that powers /plan and /review, drop a comment and I'll share the repo.
