Amit Kochman

Posted on Jul 1 • Originally published at pandorian.ai

The Real Reason AI Code Never Follows Your Standards

#ai #devops #codereview #engineering

TL;DR: AI coding tools generate statistically plausible code, not organizationally compliant code. The models behind Copilot, Cursor, and Claude Code were trained on public repositories. They have never seen your Confluence page, your ADRs, or your internal architecture docs. The result is code that works and violates your standards at the same time. This is not a prompt engineering problem. It is a structural one. Pandorian imports your existing standards and automatically enforces them on every pull request and repository scan.

What's in this post

Your Codebase Has Standards. Copilot Has Never Read Them.
AI Tools Are Prediction Engines, Not Compliance Engines
The Drift Is Not Random. It Is Systematic.
Every Prompt-Level Fix Shares the Same Flaw
Agent Skills Tell the AI What to Do. They Cannot Verify That It Did.
Your Standards Already Exist. They Just Need Enforcement.
One Place to Define, Enforce, and Govern
Standards in Docs Die. Standards with Enforcement Live.

Your Codebase Has Standards. Copilot Has Never Read Them.

You wrote the standards. Your team agreed on them. They live in Confluence, in architecture decision records, in internal wikis written by the engineers who spent years building this system. They cover things like how you structure service boundaries, how errors get handled, how data flows through your stack.

Then your team started using Copilot and Cursor, and the code that came out stopped matching what you decided.

This is not a configuration problem. It is not a prompt engineering problem. It is not a problem your senior engineers can review their way out of. It is structural. The models behind AI coding tools were trained on public code repositories. They have never seen your documentation. They have no way to know what your organization decided.

AI Tools Are Prediction Engines, Not Compliance Engines

Understanding why AI code ignores your standards requires understanding what AI coding tools actually do. Copilot, Cursor, and Claude Code are built on large language models trained to predict what plausible code looks like in a given context. They are extremely good at this. They typically produce syntactically correct, logically coherent code that solves the problem at hand.

What they cannot do is check whether that code conforms to the architectural decisions your organization made last quarter, the security patterns your security team mandated, or the naming conventions your platform team established three years ago. That knowledge lives in your internal documentation. It was never part of their training data.

Stack Overflow described this precisely in a March 2026 piece on coding guidelines for AI: coding agents contributing to an enterprise codebase need to follow the organization's standards and guidelines. But unlike a new junior developer who can be pointed to documentation, agents lack the accumulated context that comes from months of working inside a specific codebase. The instructions they receive at prompt time are not the same as the organizational standards your team built over years.

One VP of Engineering, quoted by CIO.com, described the result: AI tools fail to align with established coding standards, creating additional work in review, refactoring, and rework. That rework is not a one-time cost. It compounds as AI adoption scales.

The Drift Is Not Random. It Is Systematic.

When a single developer uses Cursor for a week without your standards enforced, the resulting drift is small. When 100 developers use Cursor every day for six months, the drift becomes systemic: whole categories of code follow the model's defaults instead of your decisions.

Sonar's 2026 State of Code survey, covering 1,100+ professional developers, found that 40% of developers report AI has increased technical debt by creating unnecessary or duplicative code. That is not a signal from edge cases. That is the mainstream of professional development.

The drift takes specific forms. REST URL paths that should follow one convention follow three across your services. Error handling patterns established by your platform team appear in half the repositories and nowhere in the other half. Security checks your team mandated for data access code appear when a senior engineer wrote the PR and vanish when the developer used Cursor to write it. None of these violations are bugs. They all pass syntax checks. They all pass review. They all reach production. And they compound.
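The URL-convention drift described above is easy to make measurable. Below is a minimal Python sketch, assuming your standard is kebab-case path segments (a hypothetical rule chosen for illustration; substitute whatever your platform team decided). `classify_segment` and `drifted_services` are illustrative names, and a real check would read route definitions or OpenAPI specs rather than a hand-written dictionary.

```python
import re
from typing import Dict, List, Optional


def classify_segment(segment: str) -> Optional[str]:
    """Return the naming convention a URL path segment uses, or None if it has none."""
    if segment.startswith("{") or segment.isdigit():
        return None  # path parameters and literal IDs carry no convention
    if re.fullmatch(r"[a-z0-9]+(?:-[a-z0-9]+)+", segment):
        return "kebab-case"
    if re.fullmatch(r"[a-z0-9]+(?:_[a-z0-9]+)+", segment):
        return "snake_case"
    if re.fullmatch(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+", segment):
        return "camelCase"
    return None  # single lowercase words such as "orders" fit every convention


def drifted_services(routes: Dict[str, List[str]], expected: str = "kebab-case") -> List[str]:
    """List services whose multi-word path segments use any other convention."""
    drifted = []
    for service, paths in routes.items():
        styles = set()
        for path in paths:
            for segment in path.strip("/").split("/"):
                style = classify_segment(segment)
                if style is not None:
                    styles.add(style)
        if styles - {expected}:
            drifted.append(service)
    return sorted(drifted)


# Hypothetical routes for three services.
routes = {
    "billing": ["/v1/payment-methods/{id}", "/v1/invoices"],
    "orders": ["/v1/order_items", "/v1/orders/{orderId}"],
    "users": ["/v1/userProfiles"],
}
print(drifted_services(routes))  # ['orders', 'users']
```

Each service here passes any syntax check; only a rule that encodes the organization's decision can tell them apart.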

Faros.ai's 2026 AI Engineering Report found that code churn, defined as lines deleted relative to lines added, rose 861% in organizations with high AI adoption. A significant portion of that churn is likely rework: code written by AI tools that did not conform to organizational standards and had to be rewritten later.
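To make the metric concrete, here is a small Python sketch. The ratio follows the report's definition as quoted above; reading it from `git diff --numstat` (which prints added lines, deleted lines, and the path, tab-separated) is an assumption about how you might measure it locally, not Faros.ai's methodology. Arithmetically, an 861% increase puts churn at 9.61 times its baseline.

```python
def churn_ratio(numstat: str) -> float:
    """Lines deleted relative to lines added, summed over `git diff --numstat` output."""
    added = deleted = 0
    for line in numstat.strip().splitlines():
        a, d, _path = line.split("\t", 2)
        if a == "-" or d == "-":
            continue  # binary files report "-" instead of line counts
        added += int(a)
        deleted += int(d)
    return deleted / added if added else 0.0


def percent_change(before: float, after: float) -> float:
    """Relative change in percent: +861 means the new value is 9.61x the baseline."""
    return (after - before) / before * 100


# Hypothetical numstat output for one change set.
sample = "12\t3\tsrc/api.py\n-\t-\tassets/logo.png\n8\t5\tsrc/db.py"
print(churn_ratio(sample))          # 0.4  (8 deleted / 20 added)
print(percent_change(0.10, 0.961))  # approximately 861
```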

Every Prompt-Level Fix Shares the Same Flaw

When AI code stops following your standards, the instinct is to improve the instructions. Write a better system prompt. Add your standards to Copilot's repository instructions file (.github/copilot-instructions.md). Put your architecture guidelines in a Cursor rules file. Add a CLAUDE.md with service boundary notes. These all help at the margin. They all fail at scale. And they all fail for the same reason.

Augment Code's analysis of Cursor's rules system is direct: rules are injected into the system prompt as advisory text, so the model treats them as suggestions rather than gating constraints. No mechanism exists to block code generation when a rule is violated. Cursor's own documentation acknowledges this explicitly: AI guidance should not be your only security control.

Copilot's repository instruction files have character limits. They capture the surface layer of your standards while missing the judgment layer. CLAUDE.md files are per-project and per-developer. All of these approaches share a fundamental architectural property: they inject context at the start of a generation session. There is no post-generation check. No org-level signal that the instruction was followed. The problem is not that they are unsophisticated. It is that they operate at the wrong point in the process.
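The difference in where each approach sits can be sketched in a few lines of Python. This is a toy model under stated assumptions: `Rule`, `build_prompt`, and `gate` are hypothetical names, and a regex stands in for a real check. It is not how Cursor, Copilot, or Claude Code work internally; it only contrasts injecting text before generation with inspecting output after it.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    name: str
    guidance: str   # advisory text, as it would appear in a rules file or CLAUDE.md
    forbidden: str  # regex a post-generation check searches for (hypothetical)


def build_prompt(rules: List[Rule], task: str) -> str:
    """Prompt-level approach: rules become text the model may or may not follow."""
    bullets = "\n".join(f"- {r.guidance}" for r in rules)
    return f"Follow these rules:\n{bullets}\n\nTask: {task}"


def gate(generated_code: str, rules: List[Rule]) -> List[str]:
    """Post-generation approach: inspect the output and name every violated rule."""
    return [r.name for r in rules if re.search(r.forbidden, generated_code)]


rules = [Rule("no-raw-sql", "Use the repository layer; never build SQL strings by hand.",
              r"execute\(\s*f[\"']")]
prompt = build_prompt(rules, "Add a lookup-by-id endpoint")  # sent to the model; nothing checks it
generated = 'cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")'
print(gate(generated, rules))  # ['no-raw-sql']
```

`build_prompt` returns text and nothing else, so whether the model honored it leaves no record. `gate` runs on the output, so a violation produces a result a CI job could fail on.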

Agent Skills Tell the AI What to Do. They Cannot Verify That It Did.

Agent skills, whether packaged as SKILL.md files, Claude Code skills, or structured instruction libraries, are the most advanced form of this approach. Rather than a flat rules file, agent skills define modular, reusable instruction sets scoped to specific tasks and contexts. A skill for your data pipeline standards. A skill for your API design patterns. A skill that loads automatically when an agent touches a payments service file.
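Scoped loading is what makes skills feel targeted. Below is a minimal Python sketch of the idea, assuming a simple table from file globs to skill names; `SKILL_SCOPES` and `skills_for` are hypothetical, and real frameworks use their own selection logic (Claude Code, for instance, matches skills to a task by their descriptions).

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical mapping from file globs to skill names, for illustration only.
SKILL_SCOPES: Dict[str, List[str]] = {
    "services/payments/*": ["payments-standards", "pci-data-handling"],
    "pipelines/*": ["data-pipeline-standards"],
    "api/*": ["api-design-patterns"],
}


def skills_for(paths: List[str]) -> List[str]:
    """Return the skills to load for the files an agent is about to edit."""
    selected = set()
    for path in paths:
        for pattern, skills in SKILL_SCOPES.items():
            if fnmatchcase(path, pattern):  # fnmatch's "*" also matches "/"
                selected.update(skills)
    return sorted(selected)


print(skills_for(["services/payments/refunds/handler.py", "README.md"]))
# ['payments-standards', 'pci-data-handling']
```

The function's output is a list of instructions to load and nothing more. It records nothing about whether the generated code honored them.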

This is meaningfully better than a single .cursorrules file. Skills can carry real organizational context. They can be version-controlled, shared across the team, and refined over time. They represent the state of the art in prompt-level standards guidance.

They still share the core limitation of every prompt-level approach: they tell the AI what to do. They cannot verify that it did. There is no mechanism in any agent skills framework that generates a PR-level check. There is no org-level report showing which repositories have active skills, which developers are using them, and whether the skills are producing compliant output. A skill that fires every session and is quietly ignored by the model leaves no trace. The violation still reaches the main branch.

Pandorian is built to work alongside agent skills, not instead of them. Skills define intent at generation time. Pandorian enforces outcomes after generation, at the PR and repository scan level.

Your Standards Already Exist. They Just Need Enforcement.

Here is the practical reality for most engineering organizations: the standards already exist. They are in Confluence. They are in architecture decision records. They are in the onboarding docs that your most experienced engineers wrote when the system was being designed. They are in the runbooks, the wikis, the ADRs marked "accepted" two years ago.

The problem is not that the standards do not exist. The problem is that they live in documents, and documents do not enforce themselves.

Pandorian's Guideline Importer takes your existing standards and turns them into active enforcement. The workflow is:

Extract — Pandorian connects to your existing sources (Confluence, Markdown docs, ADRs) and extracts the standards already written there.
Compile — The extracted content gets compiled into discrete, enforceable guidelines, each scoped to the domain it covers.
Score — Each guideline is scored for focus, clarity, and enforceability. Vague guidelines get flagged for refinement before being deployed as checks.
Enforce — The scored guidelines run as automated checks on every pull request and every repository scan.
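The four steps can be sketched as a toy pipeline in Python. This is not Pandorian's implementation or API: every name below is hypothetical, keyword matching stands in for real extraction, and a regex stands in for a real check. It only shows how the stages hand off to each other: prose in, scoped guidelines out, vague ones held back, the rest run against the lines a PR adds.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

NORMATIVE = re.compile(r"\b(must|never|always)\b", re.IGNORECASE)
VAGUE = ("appropriate", "clean", "where possible", "as needed")


@dataclass
class Guideline:
    text: str
    scope: str                     # heading of the doc section the rule came from
    pattern: Optional[str] = None  # regex for a forbidden call, if one can be derived
    score: float = 0.0


def extract(markdown: str) -> List[Guideline]:
    """Extract: treat lines with normative keywords as candidate standards."""
    guidelines, scope = [], "general"
    for line in markdown.splitlines():
        if line.startswith("#"):
            scope = line.lstrip("#").strip().lower()
        elif NORMATIVE.search(line):
            guidelines.append(Guideline(text=line.strip("-* "), scope=scope))
    return guidelines


def compile_guideline(g: Guideline) -> Guideline:
    """Compile: a backticked name in a prohibition becomes a forbidden-call pattern."""
    token = re.search(r"`([\w.]+)`", g.text)
    if token and re.search(r"\b(never|must not)\b", g.text, re.IGNORECASE):
        g.pattern = re.escape(token.group(1)) + r"\s*\("
    return g


def score(g: Guideline) -> Guideline:
    """Score: half for being machine-checkable, half for avoiding vague wording."""
    checkable = 0.5 if g.pattern else 0.0
    clear = 0.0 if any(v in g.text.lower() for v in VAGUE) else 0.5
    g.score = checkable + clear
    return g


def enforce(diff: str, guidelines: List[Guideline]) -> List[str]:
    """Enforce: check lines a PR adds against every fully scored guideline."""
    findings = []
    for line in diff.splitlines():
        if not line.startswith("+") or line.startswith("+++"):
            continue  # skip context, removed lines, and file headers
        for g in guidelines:
            if g.score == 1.0 and re.search(g.pattern, line[1:]):
                findings.append(f"[{g.scope}] {g.text} | {line[1:].strip()}")
    return findings


doc = """# Error handling
- Never call `eval` on request data.
- Errors must always be handled in an appropriate way.

# Logging
- Services must not use `print` for logging.
"""
diff = """+++ b/app/handlers.py
+    result = eval(payload["expr"])
-    print("debug")
"""

guidelines = [score(compile_guideline(g)) for g in extract(doc)]
flagged = [g.text for g in guidelines if g.score < 1.0]  # held back for refinement
findings = enforce(diff, guidelines)
print(flagged)   # ['Errors must always be handled in an appropriate way.']
print(findings)  # one finding, for the eval call
```

The removed `print` line is ignored because only added lines are checked, and the vague error-handling guideline never runs as a check until someone rewrites it into something enforceable.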

The standards your senior engineers wrote years ago, the ones AI tools have never read, become the checks that run against every PR. Every time Cursor generates code that violates your service boundary pattern, the violation surfaces in the PR with context, before it reaches the main branch.

One Place to Define, Enforce, and Govern

The downstream impact of AI code not following your standards is not just technical. It is organizational. Inconsistent patterns make onboarding harder. When the codebase reflects three different approaches to the same problem, a new engineer cannot tell which one is right. Standards drift forces senior engineers to re-explain decisions that were already settled. Compliance reviews surface violations that should have been caught at the PR level months earlier.

Pandorian gives engineering leaders a single place to resolve this:

Standards catalog — Every guideline in one place, with ownership, version history, and scope.
PR enforcement — Every pull request checked against applicable guidelines automatically, before merge.
Repository scans — Existing drift across the codebase visible in one report, not discovered during audits.
Leadership visibility — Compliance posture across repos and teams without manual review.
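As a rough illustration of the records behind a catalog and a compliance view, here is a minimal Python sketch. The field names, `CheckResult`, and the pass-rate metric are assumptions for illustration, not Pandorian's data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    guideline_id: str
    owner: str   # team accountable for the standard
    scope: str   # where it applies, e.g. "services/*"
    text: str
    version: int = 1
    history: List[str] = field(default_factory=list)  # earlier wording, oldest first

    def revise(self, new_text: str) -> None:
        """Keep the previous wording in history and bump the version."""
        self.history.append(self.text)
        self.text = new_text
        self.version += 1


@dataclass
class CheckResult:
    repo: str
    guideline_id: str
    passed: bool


def compliance_by_repo(results: List[CheckResult]) -> Dict[str, float]:
    """Share of guideline checks that passed in each repository."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # repo -> [passed, total]
    for r in results:
        counts[r.repo][0] += int(r.passed)
        counts[r.repo][1] += 1
    return {repo: passed / total for repo, (passed, total) in counts.items()}


entry = CatalogEntry("err-001", "platform", "services/*", "Wrap errors in a ServiceError.")
entry.revise("Wrap errors in a ServiceError that carries the request ID.")

results = [
    CheckResult("billing", "err-001", True),
    CheckResult("billing", "sec-004", False),
    CheckResult("orders", "err-001", True),
]
print(entry.version, compliance_by_repo(results))  # 2 {'billing': 0.5, 'orders': 1.0}
```

A plain pass rate per repository is the simplest posture metric; weighting checks by severity would be one possible refinement.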

Standards in Docs Die. Standards with Enforcement Live.

AI coding tools are not going to start reading your Confluence page. They were not built to be compliance engines. They were built to generate plausible code, and they do that extremely well.

Your job is not to make Copilot or Cursor compliant at the generation stage. Your job is to build the enforcement layer that checks what they generate against what your organization decided. That layer does not exist in prompt files. It does not exist in Cursor rules, which are suggestions dressed up as instructions. It does not exist in PR review processes that were not designed for AI code volume.

It exists in automated enforcement that runs on every PR and every scan, against the standards you already wrote. The issue is not whether your standards exist. The issue is whether anything actually holds your codebase to them.

Written by Amit Kochman, GTM Operations Director at Pandorian
