Most teams running AI agents have no enforcement at the git layer. Here's what's quietly building up in your repo — and the two lines of defence that stop it.
An AI agent just wrote and committed 47 files. Did you review all of them?
Probably not.
Nobody does. That's the point — agents move faster than review. And if nothing is enforcing standards at the git layer, bad code reaches the repo at the same speed the agent writes it.
This is the problem quality gates were built for. It used to be a slow, human-speed problem. Now it's urgent.
The Migration That Broke Everything (And What Actually Saved It)
Three years ago I was maintaining a white-label registry platform — a government web app powering multiple clients. We had to migrate from Vue 2 to Vue 3. Vue 3 changed almost everything: the reactivity system, the component model, the entire ecosystem. Some of that pain was inevitable. But the wall we hit in the first hour? That was ours.
The terminal had thousands of errors. Some components were in TypeScript. Some weren't. Some had proper props with default values. Others had been copy-pasted years ago and never revisited. Slot handling had changed between Vue 2 and Vue 3 — a component would render in isolation, pass unit tests, and then silently break in a parent layout. Every broken slot had to be found by hand, by loading the full app, by navigating to the right screen.
The framework changed. But the real damage was the absence of guardrails — no enforced patterns, no consistent structure, nothing that would have caught the drift commit by commit before it compounded into a wall.
That was the human-speed version. Years of unchecked inconsistency, made visible all at once by a forced migration.
Now Imagine an Agent Doing That Every Five Minutes
Now imagine the same thing — but an agent is committing code every five minutes.
No guardrails means the agent generates code in whatever pattern it infers from context. Some files use TypeScript strictly. Others don't. Some follow your component conventions. Others are plausible-looking code that passes a basic check but violates three rules you defined in your style guide six months ago. The agent doesn't know. It wasn't told. And nothing is stopping it.
By the time a human reviews it, the inconsistency is already in the repo. Multiplied across 47 files. And now it's part of your baseline.
This isn't a hypothetical risk — it's the default state of any agentic workflow without enforcement. The agent is fast. Capable. And completely indifferent to your project's conventions unless those conventions are defined, fed to the agent, and enforced at the git layer.
Quality gates are that enforcement. When a human pushes, they've at least read the code. When an agent pushes, the gate is the review.
Some teams argue that agents are still "junior devs" and humans are still in control. I think that's already outdated.
Two Lines of Defence
Whether the code comes from a human or an agent, you need the same two layers:
```
Developer or Agent
        ↓
Line 1: Quality Gates (pre-push, before code hits the repo)
    → linter        (ESLint, ruff, clippy...)
    → formatter     (Prettier, black...)
    → type checker  (tsc, mypy...)
    → unit tests    (jest, pytest, cargo test...)
        ↓
Repository → CI Pipeline
        ↓
Line 2: Integration / UI Tests (before code hits production)
        ↓
Production
```
Line 1 is a pre-push git hook that detects your stack automatically and runs the four checks in sequence — linting, formatting, type checking, and unit tests — using only the tools your project actually has installed. If anything fails, the push is blocked:
```
❌ Quality Gates FAILED — push blocked.
```
No Husky, no package manager, no Node.js dependency. The hook works in any git repo regardless of language. One line sets it up for the whole team: `git config core.hooksPath .contextkit/hooks`. See the full stack support in the docs.
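To make the mechanics concrete, here is a minimal sketch of what such a pre-push hook could look like in a Node project. This is illustrative only, not ContextKit's actual implementation — the tool list and the skip-when-absent behaviour are assumptions:

```shell
#!/bin/sh
# Minimal pre-push quality gate (illustrative sketch).
# Each check runs only if its tool is on PATH; any failure blocks the push.

fail() {
  echo "❌ Quality Gates FAILED — push blocked."
  exit 1
}

run_if_present() {
  tool="$1"
  if command -v "$tool" >/dev/null 2>&1; then
    echo "→ $*"
    "$@" || fail          # a non-zero exit from any check aborts the push
  else
    echo "· skipping $tool (not installed)"
  fi
}

run_if_present eslint .             # linter
run_if_present prettier --check .   # formatter
run_if_present tsc --noEmit         # type checker
run_if_present jest --bail          # unit tests

echo "✅ Quality gates passed."
```

Because git runs whatever `core.hooksPath` points at, a plain POSIX script like this needs no Husky and no package manager — which is exactly the design the article describes.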
Line 2 catches what line 1 structurally can't — things that only appear at runtime: broken user flows, visual regressions, components that pass unit tests in isolation but fail in a real browser context. Tools like Cypress and Playwright handle this.
The key insight: because unit tests already ran at line 1, your integration suite can focus purely on critical user paths rather than trying to cover everything. An hour-long test suite is expensive. The more you invest in unit coverage at line 1 — fast, cheap, build-time feedback — the leaner and more focused line 2 can be.
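As a sketch of that leaner line 2, here is what a hypothetical CI step might look like when it only runs a critical-path suite. The `tests/critical` folder and the bare `playwright` binary are invented for illustration; real setups typically invoke `npx playwright test`:

```shell
# Hypothetical CI step for line 2. Unit tests already gated the push at
# line 1, so this job only exercises critical user paths.
run_line2() {
  if command -v playwright >/dev/null 2>&1 && [ -d tests/critical ]; then
    playwright test tests/critical   # critical-path suite only
  else
    echo "line2-skipped"             # nothing to run in this environment
  fi
}

run_line2
```

The point is not the exact runner — it is that line 2 stays small and fast because line 1 already carried the broad coverage.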
Line 1 stops bad code at the gate. Line 2 stops broken behaviour from reaching users.
What You Need Before the Gates Can Work
Most teams already have standards. They're in Confluence, a Notion doc, a wiki somewhere. Someone wrote them. Someone approved them. And then a deadline hit, and the push went out anyway — because nothing mechanically stopped it.
That's the real problem. Not that standards don't exist. It's that documentation you have to remember to follow isn't a guardrail. It's a suggestion. And suggestions don't scale with agents.
What gates actually enforce are standards that live close to the code — version-controlled, readable by your AI tools, and checked on every push:
```
.contextkit/standards/
├── glossary.md        ← project terminology
├── code-style.md      ← coding conventions
├── testing.md         ← test patterns
├── architecture.md    ← decisions and constraints
└── ai-guidelines.md   ← rules for AI-generated code
```
This is the loop: standards define what correct looks like. Gates enforce it. Agents read the standards and write to them. Without the standards, the agent guesses. Without the gates, the guesses reach the repo unchecked. And the Confluence doc nobody checked before pushing? It doesn't help either.
This is exactly what I didn't have three years ago. The standards existed — loosely, informally, in people's heads and in docs nobody opened under pressure. If code-style.md had been enforced from day one, the inconsistency we found during the migration would have been caught year by year, push by push, instead of all at once.
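For a sense of what lives in such a file, here is a hypothetical excerpt of a code-style.md. The rules are invented for illustration, shaped by the migration pain described above:

```markdown
# Code Style (illustrative excerpt)

- New components are TypeScript; no new plain-JS files.
- Props are explicitly typed and carry default values.
- Named slots only; document the expected slot content where the slot is defined.
```

Rules at this level of specificity are what a gate can check and an agent can follow — the opposite of a Confluence page nobody opens under pressure.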
I ended up building ContextKit to handle this — standards folder, git hooks, and bridge files for whatever AI tools your team uses. One install.
TL;DR
With agents writing code at scale, quality gates are no longer optional — they're the only automated review that scales with them. Letting agents push without enforced gates is the fastest way to degrade a codebase.
Line 1 — Quality Gates (pre-push)
Runs linter, formatter, type checker, and unit tests before code hits the repo.
Line 2 — Integration / UI Tests (CI)
Catches broken flows, runtime regressions, visual bugs before code hits production.
Strong unit coverage at line 1 reduces the cost and surface area of line 2.
The real question isn't whether you trust your agent. It's whether your repo does.
I'm curious how teams are actually handling this right now:
- Are you letting agents commit directly to your repo?
- Or is every change still gated by human review?
- If you have gates — what are you actually enforcing?
I have a strong opinion here — but I'd genuinely like to hear what's working for other teams first.
```
npm i -g @nolrm/contextkit && cd your-project && ck install
```
- npmjs.com/package/@nolrm/contextkit
- github.com/nolrm/contextkit
- contextkit-docs.vercel.app/docs/quality-gates
Written by Marlon Maniti. I build tools for AI-native development workflows. Follow for more on context engineering, squad pipelines, and shipping with AI at speed.