Sergei Frangulov

Posted on Jun 1

Vibecoding in unskilled hands: 11 ways it quietly breaks

#ai #llm #productivity #career

You can get a working demo out of an AI coding agent in an hour. That first hour is the trap.

The speed is real. A prototype or a small script comes together in front of you, and it is easy to believe the whole project will go like that. It will not. Most vibecoding failures get blamed on the model, but in my experience few of them are the model's fault. The bottleneck is almost always the person driving it, and the bill arrives later, over the long distance, when mistakes are expensive to undo.

Here are eleven places I keep watching it break, and what actually causes each one.

1. The short distance lies

The first hour is genuine productivity. The curve flips after that. What sped you up early starts to slow you down: duplicates pile up, earlier decisions quietly contradict each other, and there is no single architecture holding it together. "Almost done" turns into months of patching loosely connected code. The beginner reads the easy start as a property of the whole road, and plans nothing for the tenth iteration or for coherence over time.

2. The visible success of AI projects is mostly a bubble

Trending repositories and viral wrappers create the impression that everything works by itself. When I scanned GitHub trending in mid-2026, several agent repos had pulled hundreds of thousands of stars in seven or eight months: one skills framework near 202k, one open coding agent near 164k. That is faster than almost any historical open-source growth, and a large share of it is inflated, driven by marketing, benchmark-maxxed READMEs, and trending-as-a-service badges rather than by working software or organic demand. A small, well-packaged project earns a few thousand stars the honest way while the giants farm hundreds of thousands. A star is a vanity metric. Beginners calibrate against this storefront and conclude they are the ones doing it wrong.

3. The model has no standing picture of your project

Each run sees a limited context window, and that window gets actively trimmed to fit a budget. Some tools prune idle context by design, without telling you. So the model is sharp inside a tight, well-scoped task and loses the thread on a large one: it forgets earlier decisions, contradicts code it just wrote, and over-builds. Holding the whole system in your head is still a human job. At minimum you have to keep architecture docs current, which is its own discipline, and most people skip it.
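
A minimal sketch of how oldest-first trimming makes a session forget an earlier decision. It assumes a simple policy (drop the oldest unpinned turns until the history fits) and counts tokens by splitting on whitespace; real tools use the model's own tokenizer and their own, often undocumented, pruning rules. `Turn`, `count_tokens`, and `fit_to_budget` are hypothetical names, and the `pinned` flag stands in for the architecture docs you deliberately keep in context.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str
    text: str
    pinned: bool = False  # hypothetical: pinned turns (e.g. architecture notes) are never trimmed


def count_tokens(text: str) -> int:
    # Crude stand-in: real tools count with the model's tokenizer, not whitespace.
    return len(text.split())


def fit_to_budget(history: list[Turn], budget: int) -> list[Turn]:
    """Drop the oldest unpinned turns until the history fits the token budget."""
    kept = list(history)  # copy, so the caller's history is untouched
    total = sum(count_tokens(t.text) for t in kept)
    i = 0
    while total > budget and i < len(kept):
        if kept[i].pinned:
            i += 1  # skip pinned turns, keep looking further along
            continue
        total -= count_tokens(kept[i].text)
        del kept[i]  # the next turn slides into index i
    return kept


history = [
    Turn("user", "Architecture: all storage goes through repo.py", pinned=True),
    Turn("user", "Decision: use UUIDs for every primary key in the schema"),
    Turn("assistant", "Done, migrated the users table to UUID keys"),
    Turn("user", "Now add an orders table"),
]

trimmed = fit_to_budget(history, budget=20)
for turn in trimmed:
    print(turn.role, "|", turn.text)
```

With a 20-token budget the pinned architecture note survives, but the turn recording the UUID decision is the first unpinned one to go, so the request for an orders table arrives without the rule it was supposed to follow.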

4. Garbage in, garbage out, and you pay per token for it

Weak input is the main source of bad output: a vague spec, no codebase context, no examples, no acceptance criteria. The model is not telepathic. It fills the gaps with the most probable answer, not the one you needed. The beginner opens a chat and types a wish instead of a brief, then spends the afternoon arguing with the result. Skilled operators spend most of their time here, before the first prompt.
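
The four gaps named above can be made explicit before the first prompt. A minimal sketch, assuming the brief is assembled in Python and then pasted into or sent to whatever agent you use; `Brief`, `missing`, and `render` are hypothetical names, and the section headings are one possible layout, not a format any tool requires.

```python
from dataclasses import dataclass, field


@dataclass
class Brief:
    goal: str
    context: list[str] = field(default_factory=list)     # relevant files, constraints, prior decisions
    examples: list[str] = field(default_factory=list)    # input/output pairs or reference code
    acceptance: list[str] = field(default_factory=list)  # checks that decide "done"

    def missing(self) -> list[str]:
        """Name the parts the model would otherwise fill with its most probable guess."""
        gaps = []
        if not self.context:
            gaps.append("context")
        if not self.examples:
            gaps.append("examples")
        if not self.acceptance:
            gaps.append("acceptance criteria")
        return gaps

    def render(self) -> str:
        """Refuse to produce a prompt until the brief is complete."""
        gaps = self.missing()
        if gaps:
            raise ValueError("brief is still a wish; missing: " + ", ".join(gaps))
        sections = [
            "Goal:\n" + self.goal,
            "Context:\n" + "\n".join("- " + c for c in self.context),
            "Examples:\n" + "\n".join("- " + e for e in self.examples),
            "Acceptance criteria:\n" + "\n".join("- " + a for a in self.acceptance),
        ]
        return "\n\n".join(sections)


wish = Brief(goal="Add CSV export")
print(wish.missing())  # ['context', 'examples', 'acceptance criteria']

brief = Brief(
    goal="Add CSV export to the reports page",
    context=["Reports are built in reports/service.py", "No new dependencies"],
    examples=["2 rows in -> header line plus 2 data lines out"],
    acceptance=["Existing tests pass", "Export of an empty report yields only the header"],
)
prompt = brief.render()
print(prompt)
```

The point of failing loudly in `render` is that the missing pieces get noticed before the model guesses them, not after an afternoon of arguing with the result.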

5. One run is one sample, not a verdict

Because models are tuned on human preference, they lean toward the most typical, average answer. Research on alignment (Kirk et al., ICLR 2024) found that RLHF measurably reduces the diversity of outputs for a given prompt. So a single response is one draw from a distribution that has already collapsed toward the median. It is not the best answer, and not the only correct one. Without a precise process you get the internet average instead of an engineering call for your context. Asking for several options and picking one helps, but only when there is somebody qualified to pick.
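
To make "one draw" concrete, here is a toy simulation. The three answers and their probabilities are invented for illustration, not measured from any model; `sample` stands in for a single run, and `fits_our_context` is a hypothetical scorer playing the role of the qualified person doing the picking. It uses only the standard library (`random.Random.choices`).

```python
import random
from collections import Counter
from typing import Callable

# Made-up distribution for illustration: most of the mass sits on the "typical" answer.
ANSWERS = {
    "generic ORM + REST service": 0.80,
    "append-only event log": 0.12,
    "single SQLite file": 0.08,
}


def sample(rng: random.Random) -> str:
    """One run of the model: a single draw from the distribution."""
    options = list(ANSWERS)
    return rng.choices(options, weights=[ANSWERS[o] for o in options], k=1)[0]


def best_of_n(n: int, score: Callable[[str], float], rng: random.Random) -> str:
    """Draw n candidates, drop duplicates (keeping first-seen order), let the scorer pick."""
    candidates = list(dict.fromkeys(sample(rng) for _ in range(n)))
    return max(candidates, key=score)


def fits_our_context(answer: str) -> float:
    # Hypothetical reviewer judgment for an offline, single-user desktop tool.
    return {"single SQLite file": 1.0, "append-only event log": 0.3}.get(answer, 0.0)


rng = random.Random(42)
print(Counter(sample(rng) for _ in range(1000)).most_common())
print(best_of_n(30, fits_our_context, rng))
```

A single `sample` call returns the typical answer most of the time. Drawing several candidates only pays off because `fits_our_context` encodes what this project actually needs; with a scorer that simply rewards the familiar answer, best-of-n hands the median straight back.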

6. Reasoning depth is a dial, not a fixed trait

"The model is lazy" is usually a misread. On current models, effort is a setting and depth follows from how you frame the task. The old habit of prompting "don't be lazy, be thorough" is now an anti-pattern: vendors warn that capable models over-trigger on it. The real skill is knowing when to turn effort up. Push it to maximum in the wrong place and you buy overthinking and a worse answer. The beginner never touches the dial and takes the first shallow response as the ceiling of what the model can do.

7. It is a capable executor without a global view, not an autonomous engineer

A useful calibration is by seniority. As a junior it is overpriced: you pay top-model rates for intern-level autonomy and still check every line. As a mid-level engineer it is excellent on a well-scoped task. As a senior or architect it does not hold system coherence or judgment, and it cannot tell you what not to build. The beginner delegates exactly the part it cannot do.

Buying autonomy off the shelf is no shortcut either. We run a multi-agent orchestration tool called gastown. The author loves westerns, so the agents are named mayor, deacon, convoy, hounds, raccoons. It took two weeks of spare evenings to half-integrate it into our pipeline, and even then not for every task. Simple tools are not really autonomous. Capable ones cost you weeks of setup.

8. Memory is primitive and needs manual management

The assistant loses context between sessions and on resume, and it drops things on purpose to fit the window. This is not a guess. One popular coding agent's own postmortem admitted that it had quietly trimmed reasoning in idle sessions to save cost and shipped that change without a changelog, and that a hidden instruction to answer in under twenty-five words had measurably lowered output quality. Bloated instruction layers degrade output on their own, and people keep piling them on. If you do not understand how the memory behaves, you re-explain the same context every session and wonder why yesterday it understood and today it does not.

9. The setup around the model is a separate skill

The value comes from the layer you build around the model: project instructions, ready-made skills, hooks, tools, rules, context. Beginners work straight out of the box, never notice that layer exists, and blame the model for the result. This is applied knowledge of how your own system fits together, and it has to be budgeted like any other engineering work. The license is the cheap part.

10. Your skills expire the moment you learn them

The operator's competence depreciates faster than the models change. A technique that worked last week is stale this week, and silent changes in model behavior keep your mental model out of date. Staying current is a daily habit, not a one-time milestone, and most people are not up for it. The only thing that works for me is reading the field every day. You can automate the gathering, but someone still has to filter the noise by hand.

11. The line keeps moving, but responsibility does not

The boundary between what the model can do and what you must do slides toward the model over time. Responsibility does not slide with it. The mechanics will keep getting absorbed, but intent, taste, and the decision of what not to build stay with the human, and no one can date when that changes. "Let's wait for AGI" is not a strategy. It is the excuse that produces unskilled hands, because it treats the model as the accountable author rather than as a tool in your own hands.

The pattern

None of these is about the model being weak. Each is about a person not seeing the line between what the model does and what they still owe, and that line is masked by hype and kept moving by how fast the field changes. Plenty of people will tell you none of this is critical and it is all solvable. They are probably right. It is solvable. In skilled hands.

Treat all of the above as a snapshot of 2026, not a verdict for all time.

Which of these eleven hits your team hardest, and what did you actually do about it?

DEV Community