Vibe coding works for the first week or two. You describe what you want, the agent writes it, tests pass, you ship. A few weeks in, progress falls off a cliff. New prompts start breaking older features in ways that pass the obvious tests, but later surface in production.
Vibe coding is the version where you fully trust the agent, don't read or only skim the code, and ship. Agentic coding is the version where you still read every diff, but the line between the two is a convention that decays when you're tired, when the diff is large, or when you're four hours in and the feature is almost done. So I'm treating vibe coding here as the failure mode of agentic coding rather than a separate thing.
The issue is structural: coding agents have no equivalent of the source/generated-output boundary that a compiler gives us, so prompt, code, tests, and previous agent output are all editable and all treated as input. The fix has to come from the harness vendors, in the form of a protected region the agent can read but can't rewrite without an explicit human unlock, because another instruction file isn't going to cut it. Until they ship the real thing, the workarounds are all a bit unsatisfying.

A public case of vibe coding failing. Lemkin was experimenting on a personal project, but real production systems have been wiped the same way. Replit is apparently "the safest place for vibe coding" according to its marketing.
It's tempting to read this as a problem that only kicks in once you have a real team or a serious codebase, but even the vendors selling these agents are starting to see its limits. From a recent interview:
"if you close your eyes, and you don't look at the code, and you have AIs build things with shaky foundations as you add another floor, and another floor, and another floor, things start to kind of crumble."
Michael Truell, Cursor CEO
I don't want to cast blame on the users here ("professional" SWEs doing vibe coding is another story). The dream is real: a tool that lets you build production software without the years of engineering muscle memory it usually takes. The marketing says it's safe and the product produces plausible work. The loop stays quiet until something breaks, and the dev forums are full of stories where it did: leaked secrets, runaway agents, silent regressions.
Even if you don't use agents, or you always read the diffs carefully, you still have to deal with the consequences. They usually arrive as a vibe-coded PR or demo from a non-technical colleague that engineering then has to finish properly. It's hard to be the engineer who always says no, especially when these colleagues are excited to contribute and think they made something good. The question is whether we want to fix it, control it, or ban it.
Why this fails
The agent reads both the prompt and the code and treats them as equally important, since either can be changed at any time. This is different from a compiler, which operates in one direction. You write Go, it produces assembly, and there's no confusion about which side to edit. If you change the Go file, the assembly gets regenerated next time. If you edit the assembly directly, the next compile silently overwrites your change.
Now, picture a compiler that is right 95% of the time. Sometimes it regenerates code in a different file you didn't plan to modify, treating its previous output as input for the next run. Nobody reads the assembly because the main reason for trusting the compiler is that you don't have to. So, when things go wrong, nobody notices. The compiler continues to treat its past output as if it were the source, causing errors to accumulate unnoticed.

Compilers gave us assembly we never had to look at. The agent loop asks us to look at both.
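To put a rough number on the 95% thought experiment, here is a minimal sketch of how quickly unread errors compound. It assumes each run is independently correct with the same probability, which is a simplification, and the function names are made up for illustration.

```python
def p_all_correct(per_run_accuracy: float, runs: int) -> float:
    """Chance that every one of `runs` independent runs is correct."""
    return per_run_accuracy ** runs


def runs_until_below(per_run_accuracy: float, threshold: float) -> int:
    """Smallest number of runs after which the all-correct chance drops below `threshold`."""
    if not 0.0 < per_run_accuracy < 1.0:
        raise ValueError("per_run_accuracy must be strictly between 0 and 1")
    runs, p = 0, 1.0
    while p >= threshold:
        runs += 1
        p *= per_run_accuracy
    return runs


# With a 95% per-run success rate, a clean history becomes less likely
# than a coin flip after 14 unreviewed runs.
print(round(p_all_correct(0.95, 14), 3))  # 0.488
print(runs_until_below(0.95, 0.5))        # 14
```

Real agent runs are not independent, since later runs build on earlier mistakes, so this is closer to a floor on the problem than a forecast.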
To make this concrete, let's say that in week 1 you ask the agent to add a payment flow, and it does the right thing, e.g., a GDPR consent check before the charge and the amount bounded by the user's daily cap:
if not user.has_consent("payments"):
    raise PaymentDenied("missing consent")
if amount <= 0 or amount > user.daily_cap:
    raise PaymentDenied("amount out of bounds")
You revisit the same function weeks later and ask the agent for a quick cleanup pass, and it comes back looking like this:
if amount <= 0:
    raise PaymentDenied("amount out of bounds")
if amount > user.daily_cap and not user.is_premium:
    raise PaymentDenied("amount out of bounds")
The tests still pass and the code is clean and readable, but the GDPR check is gone, and the fraud cap has been silently dropped for premium users without anyone asking for it.
I've been calling this logic drift. The code shape is roughly the same, but an earlier constraint is subtly relaxed. An invariant becomes conditional, a guard gets moved a few lines down past the thing it was supposed to guard, an authorization check gets duplicated and one of the copies is wrong. The diff just says a guard moved. The source never stated that the guard was load-bearing, so the review never catches the moment it is no longer load-bearing.
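Here is a small runnable sketch of how both versions of the payment check can pass the same "obvious" tests. The `User` class, the `denied` helper, and the specific test cases are assumptions made up for illustration; only the two checks come from the snippets above.

```python
from dataclasses import dataclass


class PaymentDenied(Exception):
    pass


@dataclass
class User:  # hypothetical stand-in for the real user model
    daily_cap: int
    is_premium: bool = False
    consents: frozenset = frozenset({"payments"})

    def has_consent(self, scope: str) -> bool:
        return scope in self.consents


def check_week1(user, amount):
    if not user.has_consent("payments"):
        raise PaymentDenied("missing consent")
    if amount <= 0 or amount > user.daily_cap:
        raise PaymentDenied("amount out of bounds")


def check_after_cleanup(user, amount):
    if amount <= 0:
        raise PaymentDenied("amount out of bounds")
    if amount > user.daily_cap and not user.is_premium:
        raise PaymentDenied("amount out of bounds")


def denied(check, user, amount) -> bool:
    try:
        check(user, amount)
    except PaymentDenied:
        return True
    return False


def obvious_tests(check) -> bool:
    """Happy path plus out-of-bounds amounts for a default user."""
    basic = User(daily_cap=100)
    return not denied(check, basic, 50) and denied(check, basic, 0) and denied(check, basic, 500)


def invariant_tests(check) -> bool:
    """The constraints nobody wrote down: consent always, cap for everyone."""
    no_consent = User(daily_cap=100, consents=frozenset())
    premium = User(daily_cap=100, is_premium=True)
    return denied(check, no_consent, 50) and denied(check, premium, 500)


print(obvious_tests(check_week1), obvious_tests(check_after_cleanup))      # True True
print(invariant_tests(check_week1), invariant_tests(check_after_cleanup))  # True False
```

The second pair of tests only exists if someone knew the constraints were load-bearing, which is exactly the knowledge the diff doesn't carry.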
This actually happened in the Linux kernel recently. A maintainer submitted a patch generated by an AI that removed a __read_mostly annotation. This annotation is a placement hint: it groups rarely written data together so that it doesn't share cache lines with data that is written often, and removing it can cause cache-line contention on multi-core systems. On review, the line seemed like a simple cleanup, so the patch was accepted, and Torvalds later said that he would have viewed it differently if he had known it was written by AI.
The shape of a fix
The fix needs to be in the harness, the layer between the model and your filesystem (Cursor, Claude Code, Replit, an IDE plugin). The simplest implementation is a way of tagging a comment and the code immediately following it as human-owned, so that the agent can read it, reference it, and suggest a patch, but cannot apply the patch until a human unlocks it. That puts the source/assembly boundary back into the code.
Protected regions like this are a really old idea. Code generators have used BEGIN USER CODE / END USER CODE markers for decades because rerunning the generator overwrites whatever you had hand-edited inside the generated file. Agentic coding has the same overwrite problem, except there's no generator and no rerun, just an agent editing ordinary source files in the background. There's no codegen template to put the markers in, so the lock has to live one layer up, in the harness itself.
A # lock: comment does the job one statement at a time, in the spirit of Python's # type: comments or coverage.py's # pragma: no cover:
def charge_card(user, amount, idempotency_key):
    # lock: gdpr art 6 - refuse charge if no payment consent
    if not user.has_consent("payments"):
        raise PaymentDenied("missing consent")
    # lock: fraud SLA - reject amounts <=0 or above user.daily_cap
    if amount <= 0 or amount > user.daily_cap:
        raise PaymentDenied("amount out of bounds")
    invoice = build_invoice(user, amount, idempotency_key)
    metrics.timing("invoice.build", invoice.elapsed_ms)
    receipt = stripe.charge(invoice.token, amount)
    # lock: pci audit trail, compliance keeps asking, dont remove
    log.info("charged", user=user.id, amount=amount)
    return receipt
The # lock: comment locks itself and the syntax node immediately below, so attaching it to an if covers the whole block and attaching it to a single call covers just that line. The comment contains the motivation and is locked along with the code.
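One way a harness might compute those spans for Python is with the standard ast module. This is a minimal sketch, not any vendor's implementation: `locked_spans` and `LOCK_PREFIX` are hypothetical names, and it assumes the lock comment sits on its own line above the statement it protects.

```python
import ast

LOCK_PREFIX = "# lock:"  # hypothetical marker, as used in the example above


def locked_spans(source: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) line spans covered by lock comments."""
    # Map the first line of every statement to its last line. A compound
    # statement such as `if` ends on the last line of its body.
    last_line: dict[int, int] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.stmt):
            last_line[node.lineno] = max(node.end_lineno, last_line.get(node.lineno, 0))

    spans = []
    for number, text in enumerate(source.splitlines(), start=1):
        if not text.strip().startswith(LOCK_PREFIX):
            continue
        # The lock covers the comment itself plus the next statement below it.
        following = [start for start in last_line if start > number]
        end = last_line[min(following)] if following else number
        spans.append((number, end))
    return spans


EXAMPLE = '''\
def charge_card(user, amount):
    # lock: refuse charge if no payment consent
    if not user.has_consent("payments"):
        raise PaymentDenied("missing consent")
    receipt = charge(user, amount)
    # lock: audit trail
    log.info("charged", user=user.id, amount=amount)
    return receipt
'''

print(locked_spans(EXAMPLE))  # [(2, 4), (6, 7)]
```

The lock on the `if` covers the comment and the whole block (lines 2 to 4), while the lock on the log call covers the comment and that one line. A line-based scan like this would also match `# lock:` inside a multi-line string, so a sturdier version would probably read comments through the tokenize module instead.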
Note that this approach does not rely on the model cooperating. The harness already sits between the agent and the filesystem. Before applying any patch, it parses the file, determines where the locks are, and refuses any edit that touches a locked span unless the user has explicitly unlocked it.
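The refusal step could look roughly like the sketch below. It assumes the harness already knows the locked spans as (start, end) line pairs in the old file and sees the full old and new file contents; `blocked_spans`, `apply_patch`, and the sample files are hypothetical, and the diffing uses Python's standard difflib.

```python
import difflib

Span = tuple[int, int]  # 1-based, inclusive (start, end) lines in the old file


def blocked_spans(old: str, new: str, locks: list[Span],
                  unlocked: frozenset = frozenset()) -> list[Span]:
    """Return the locked spans that rewriting `old` into `new` would touch."""
    matcher = difflib.SequenceMatcher(None, old.splitlines(), new.splitlines(), autojunk=False)
    hit = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        for start, end in locks:
            if (start, end) in unlocked:
                continue  # a human explicitly unlocked this span
            if tag == "insert":
                # New lines land between old lines i1 and i1 + 1.
                touches = start <= i1 < end
            else:
                # Old lines i1 + 1 .. i2 are replaced or deleted.
                touches = i1 + 1 <= end and i2 >= start
            if touches:
                hit.add((start, end))
    return sorted(hit)


def apply_patch(old: str, new: str, locks: list[Span],
                unlocked: frozenset = frozenset()) -> str:
    """Return the new contents, or refuse if a locked span would change."""
    blocked = blocked_spans(old, new, locks, unlocked)
    if blocked:
        raise PermissionError(f"patch touches locked lines: {blocked}")
    return new


OLD = '''\
def charge_card(user, amount):
    # lock: gdpr art 6 - refuse charge if no payment consent
    if not user.has_consent("payments"):
        raise PaymentDenied("missing consent")
    # lock: fraud SLA - reject amounts <=0 or above user.daily_cap
    if amount <= 0 or amount > user.daily_cap:
        raise PaymentDenied("amount out of bounds")
    return charge(user, amount)
'''
LOCKS = [(2, 4), (5, 7)]

DRIFTED = '''\
def charge_card(user, amount):
    if amount <= 0:
        raise PaymentDenied("amount out of bounds")
    if amount > user.daily_cap and not user.is_premium:
        raise PaymentDenied("amount out of bounds")
    return charge(user, amount)
'''
HARMLESS = OLD.replace(
    "    return charge(user, amount)\n",
    "    receipt = charge(user, amount)\n    return receipt\n",
)

print(blocked_spans(OLD, DRIFTED, LOCKS))   # [(2, 4), (5, 7)]
print(blocked_spans(OLD, HARMLESS, LOCKS))  # []
```

The check runs on the patch, not on the prompt, so it holds whether or not the model was told about the locks. The agent can still edit everything outside the spans, and can still propose a change to a locked span for a human to unlock and apply.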
What's been tried
The first answer everyone reaches for is discipline (agentic coding is a trap): use the agent less, keep diffs small, review everything. This all works well right up until the tool itself drains whatever self-discipline you have left. You pull the lever and a perfectly functional piece of code drops out of the app. And even if your own discipline is strong, you cannot enforce it on others.
Traditional engineering processes work well for humans, but don't scale to the volume of changes agents produce. Requirements live outside the code and are not generally read by agents. Tests, types, and linters all give the agent rails to follow, but none of them says: don't change this line, ever. Code review can catch some of the drift, but it has a scale problem: reviewing a feature takes far longer than it takes an agent to spit out a new one.
The harness vendors themselves have caught up some too, but most of what they've shipped is still not hard constraints. Persistent memory survives sessions, skills bundle known procedures, code search has gone from grep to semantic indexing, and AGENTS.md files politely beg the agent not to touch certain functions. Cursor has project rules, Claude Code has hooks that can intercept tool calls, GitHub Copilot has custom instructions, and OpenCode has modes that can't write to production files at all. I actually use a lot of it.

AGENTS.md, on closer inspection.
So that's roughly where I land. The harness vendors aren't going to ship a real lock anytime soon, and until they do, the only boundary that reliably holds is one the agent can't see or touch. Current solutions are helpful, but only as advisory hints, not as the lock itself.