Iurii Rogulia

Posted on Jun 30 • Originally published at iurii.rogulia.fi

Vibe Coding Problems: What AI-Generated Code Gets Wrong

#ai #technicaldebt #architecture #bestpractices

A few months ago, a founder sent me a repository and said: "The developer used AI to build it. It works in demos but breaks in production and we can't figure out why."

I've seen this enough times now that I have a checklist before I open the code. Not because I want to be right, but because the patterns are so consistent it would be irresponsible to pretend otherwise. The codebase compiles. The tests pass. The structure looks vaguely like something a senior developer would produce. And yet it is still broken — just in ways that don't show up until you have real users, real data, or someone reading it carefully.

This is not an argument against AI-assisted development. I use it myself. This is about what happens when an LLM drives the whole project with nobody steering — no architect in the loop, no one reviewing whether the fifth approach to authentication should have replaced the first four. The output looks like code. It is not, in any meaningful sense, a designed system.

Five Overlapping Ways to Do the Same Thing

The most reliable tell is duplication — not the obvious kind, but the kind that happens when a model adds a feature, is asked about it again later in a different context, and adds it again without checking what already exists.

Authentication is the canonical example. In a recent rescue project — a codebase under 30k lines — I counted five distinct authentication flows: one in middleware, one in individual page components, one duplicated across API routes, one wrapped in a client-side hook that did its own token check, and one experimental Passport.js setup in a folder no route imported anymore. Each was added at a different point in the conversation history. None was deliberately redundant. The model never went back and removed what it had made obsolete.

The same happens with validation. There's a Zod schema at the API boundary, a manual check function three files away, and a third set of checks inside the database utility — written at different times, with slightly different rules. Which one is authoritative? Nobody knows, including the person who asked the AI to write them.

This matters in production because when you fix a bug in one layer, the other layers still have the old behaviour. You patch the API validation and the database utility silently rejects inputs that should now be valid. Or the opposite. The system is not one thing — it's an archaeological site of previous attempts.
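
One way out of the three-validators situation is to keep a single rule set and have every layer call it. What follows is a minimal sketch, not code from the project described: the `OrderInput` shape, its rules, and the names `validateOrderInput`, `handleCreateOrder`, and `saveOrder` are all assumptions made for illustration.

```typescript
// Hypothetical input shape; field names and rules are illustrative only.
interface OrderInput {
  email: string;
  quantity: number;
}

interface ValidationResult {
  value: OrderInput | null; // null when validation failed
  errors: string[];
}

// The single authoritative rule set. Every layer calls this function
// instead of re-implementing its own checks.
function validateOrderInput(raw: unknown): ValidationResult {
  const errors: string[] = [];
  const candidate = (raw ?? {}) as Partial<OrderInput>;
  const email = candidate.email;
  const quantity = candidate.quantity;

  if (typeof email !== "string" || email.indexOf("@") === -1) {
    errors.push("email must contain '@'");
  }
  if (typeof quantity !== "number" || !Number.isInteger(quantity) || quantity < 1) {
    errors.push("quantity must be a positive integer");
  }
  if (errors.length > 0 || typeof email !== "string" || typeof quantity !== "number") {
    return { value: null, errors };
  }
  return { value: { email, quantity }, errors: [] };
}

// Persistence layer: accepts only the already-validated type and adds no
// rules of its own. (Stub: a real version would write to the database.)
const savedOrders: OrderInput[] = [];
function saveOrder(order: OrderInput): void {
  savedOrders.push(order);
}

// API boundary: validates once, then passes the typed value down.
function handleCreateOrder(body: unknown): { status: number; errors: string[] } {
  const result = validateOrderInput(body);
  if (result.value === null) {
    return { status: 400, errors: result.errors };
  }
  saveOrder(result.value);
  return { status: 201, errors: [] };
}
```

With this shape, changing a rule means changing one function, and the question of which validator is authoritative has an answer.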

Tests That Assert the Wrong Thing

A passing test suite is not evidence that the code is correct. In a vibe-coded repo, it is often evidence of the opposite.

Here's the shape of the problem:

```typescript
it("calculates the order total", () => {
  const total = calculateOrderTotal(items);
  expect(typeof total).toBe("number");
});
```

That test passes. It will always pass, because calculateOrderTotal does return a number. It just returns the wrong number. The VAT is not applied. Discounts are double-counted. The shipping is hardcoded to zero. None of this is caught because the test only verifies that the function runs and returns something of type number, not that the number is correct.

When a model writes tests for code it just wrote, it tends to test the implementation rather than the behaviour. It knows what the function does and it asserts that. What it doesn't know — because nobody told it — is what the function should do in edge cases, error conditions, or inputs that don't look like the happy path.

The result is a CI pipeline that's permanently green on a system with incorrect business logic. The tests are not lying. They are asking the wrong question.
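
For contrast, here is a sketch of a test that asks the right question: a known input with a hand-computed expected total. The business rules (discount applied once before VAT, shipping added after VAT), the 24% rate, and the two-argument signature are assumptions for the example, not the rules of any real project. Amounts are integer cents so the expected value is exact. In a Jest-style suite the final check would be `expect(total).toBe(3070)`.

```typescript
// Hypothetical order model; all amounts are integer cents.
interface LineItem {
  unitPriceCents: number;
  quantity: number;
}

interface OrderOptions {
  discountCents: number; // applied once, before VAT (assumed rule)
  vatRate: number; // e.g. 0.24 for 24%
  shippingCents: number; // added after VAT (assumed rule)
}

function calculateOrderTotal(items: LineItem[], opts: OrderOptions): number {
  const subtotal = items.reduce(
    (sum, item) => sum + item.unitPriceCents * item.quantity,
    0,
  );
  // The discount can never push the order below zero.
  const discounted = Math.max(0, subtotal - opts.discountCents);
  const vat = Math.round(discounted * opts.vatRate);
  return discounted + vat + opts.shippingCents;
}

// Behavioural check with a hand-computed expectation:
//   subtotal   = 2 * 1000 + 1 * 500 = 2500
//   discounted = 2500 - 500         = 2000
//   VAT (24%)  = 480
//   total      = 2000 + 480 + 590   = 3070
const total = calculateOrderTotal(
  [
    { unitPriceCents: 1000, quantity: 2 },
    { unitPriceCents: 500, quantity: 1 },
  ],
  { discountCents: 500, vatRate: 0.24, shippingCents: 590 },
);

if (total !== 3070) {
  throw new Error(`expected 3070, got ${total}`);
}
```

A missing VAT step, a double-counted discount, or zeroed shipping would each change the number and fail this check, which the typeof assertion never could.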

Everything in `package.json`

Open the dependencies and you will find a history of the model's training data.

There is axios for HTTP. There is also node-fetch. There is also the native fetch with a polyfill that's not needed in the Node version being used. There are three date libraries: moment (because tutorials used it for years), date-fns (because someone told the model moment was deprecated), and dayjs (because another prompt mentioned it was lighter). They are all installed. Two of them are used in different files. One is not used at all but was added to package.json during a refactor that never happened.

The same with utilities. lodash and underscore. uuid and nanoid and crypto.randomUUID() all used in different places for the same purpose. A PDF library installed alongside a different PDF library, each used in one file.

Unused dependencies are not just an aesthetic problem. They expand the attack surface for dependency vulnerabilities. They bloat the build. They make it harder to understand what the system actually uses. And they are almost impossible to clean up confidently without reading every import in the codebase — because there is no documentation saying why any of them were added.

The proportion is consistent enough to be predictable. The last three vibe-coded repos I audited had between 25% and 35% of their direct dependencies entirely unused. In one of them, removing the unused entries cut the production Docker image by roughly 40%.
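
The check behind that percentage can be sketched in a few lines. This is a rough illustration of the idea, not how depcheck works internally: it takes the declared dependency names and the source text of each file as plain arrays (reading them from disk is left out), and it only recognises static `import`, side-effect `import`, and `require` forms. Dynamic `import()` calls, config files, and CLI-only packages would need a manual look.

```typescript
// "lodash/debounce" -> "lodash", "@scope/pkg/sub" -> "@scope/pkg"
function packageNameOf(specifier: string): string {
  const parts = specifier.split("/");
  return specifier.charAt(0) === "@" ? parts.slice(0, 2).join("/") : parts[0];
}

// Returns the declared dependencies that no source file imports.
function findUnusedDependencies(declared: string[], sources: string[]): string[] {
  // Matches: from "x" | import "x" | require("x")
  const importPattern = /(?:from\s+|import\s+|require\(\s*)["']([^"']+)["']/g;
  const used = new Set<string>();

  for (const source of sources) {
    let match: RegExpExecArray | null;
    while ((match = importPattern.exec(source)) !== null) {
      used.add(packageNameOf(match[1]));
    }
  }
  return declared.filter((name) => !used.has(name));
}

// Usage with made-up inputs:
const unused = findUnusedDependencies(
  ["axios", "node-fetch", "moment", "date-fns"],
  [
    'import axios from "axios";',
    'import { format } from "date-fns/format";',
  ],
);
// unused is ["node-fetch", "moment"]
```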

Security Patterns Copied from Tutorials

The model has processed an enormous amount of tutorial code. Tutorials are written to be easy to follow, which means they cut security corners that a production system cannot afford.

I find access tokens stored in localStorage because that is how most OAuth tutorials store them: it is easy to read from JavaScript, which is exactly the problem, since any script running on the page, including an injected one, can read it too. I find CORS configured as Access-Control-Allow-Origin: * on an API that handles authenticated requests, because that is the configuration that makes the demo work without the CORS error. I find SQL built by interpolating user input straight into the query string:

```typescript
const result = await db.query(`SELECT * FROM orders WHERE user_id = '${userId}'`);
```

This is an injection vulnerability. The model has seen this pattern in thousands of tutorials and Stack Overflow answers where the point was to demonstrate something else, not to demonstrate safe query construction.
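
The usual fix is a parameterised query, where the SQL text is constant and the value travels separately. A minimal sketch follows. The `Db` interface and the recording fake are stand-ins written for this example so it runs without a database. The `$1` placeholder is the PostgreSQL style used by node-postgres; other drivers use different placeholder syntax (`?` in the common MySQL clients), and the article does not say which database the project used.

```typescript
// Minimal stand-in for a client whose query method takes the SQL text
// and the bound values as separate arguments.
interface Db {
  query(text: string, values: unknown[]): Promise<unknown[]>;
}

// Safe version: the statement is a constant string. userId is passed as a
// bound value and is never spliced into the SQL.
function findOrdersForUser(db: Db, userId: string): Promise<unknown[]> {
  return db.query("SELECT * FROM orders WHERE user_id = $1", [userId]);
}

// A recording fake, only here to show what the driver would receive.
const calls: { text: string; values: unknown[] }[] = [];
const fakeDb: Db = {
  query(text, values) {
    calls.push({ text, values });
    return Promise.resolve([]);
  },
};

// A hostile input leaves the SQL text unchanged; it arrives as data.
findOrdersForUser(fakeDb, "x' OR '1'='1");
```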

The one that I find most alarming, because it is the hardest to reverse: API keys committed into the codebase, sometimes in client-side bundles. The model puts configuration where it has seen configuration put before — directly in the code, or in a .env file that was never added to .gitignore. The key is now in the git history. Rotating it is mandatory. If it is a third-party service key, you must assume it was seen.
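
The alternative to configuration-in-code is to read secrets from the environment and refuse to start without them. A minimal sketch, assuming two made-up variable names (`PAYMENT_API_KEY`, `DATABASE_URL`); `loadConfig` takes the environment as an argument so it can be exercised without touching the real one. In a Node process the call would be `loadConfig(process.env)`.

```typescript
// Hypothetical configuration shape; the variable names are illustrative.
interface AppConfig {
  paymentApiKey: string;
  databaseUrl: string;
}

// Reads secrets from an environment-like object and fails at startup if any
// are missing, rather than falling back to a value hardcoded in the source.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing: string[] = [];
  const read = (name: string): string => {
    const value = env[name];
    if (value === undefined || value === "") {
      missing.push(name);
      return "";
    }
    return value;
  };

  const config: AppConfig = {
    paymentApiKey: read("PAYMENT_API_KEY"),
    databaseUrl: read("DATABASE_URL"),
  };

  if (missing.length > 0) {
    // Report the names only; never log the values themselves.
    throw new Error("Missing required environment variables: " + missing.join(", "));
  }
  return config;
}
```

This does nothing for a key that is already in the git history. That key still has to be rotated.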

I am not describing carelessness. I am describing what happens when a system that has learned from millions of examples of code-written-for-explanation is asked to produce code-that-must-actually-be-secure. These are different tasks. The model was not told they were different.
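
The CORS wildcard mentioned above has an equally small fix: answer only the origins you actually serve. A sketch under stated assumptions: the two origins are placeholders, and `corsHeadersFor` is a hypothetical helper that returns headers for whatever server framework is in use to attach.

```typescript
// Hypothetical allow-list; replace with the front-end origins actually served.
const ALLOWED_ORIGINS: readonly string[] = [
  "https://app.example.com",
  "https://admin.example.com",
];

// Returns the CORS headers for a request's Origin header. Unknown origins get
// no Access-Control-Allow-Origin header, so the browser blocks the response.
function corsHeadersFor(origin: string | undefined): Record<string, string> {
  if (origin === undefined || ALLOWED_ORIGINS.indexOf(origin) === -1) {
    return {};
  }
  return {
    "Access-Control-Allow-Origin": origin,
    // Tells caches that the response differs per requesting origin.
    Vary: "Origin",
  };
}
```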

The Schema That Disagrees With the Code

Database drift is the slowest-moving problem and often the most expensive to fix.

The migration files exist. They do not reflect the current state of the database, because migrations were generated at different stages of development and some were edited directly rather than through a new migration. Others were run manually on the production database by someone who was in a hurry. The migration history and the actual schema are no longer the same document.

The ORM models have columns that do not exist. The database has columns that the ORM does not know about. Foreign keys are defined in the code but not enforced at the database level — or enforced in the database but not declared in the ORM, so queries that cross that boundary fail in unpredictable ways. Indexes were added at the model's suggestion but are on columns that are never actually queried, while the columns that appear in every WHERE clause have no index.

The practical consequence: you cannot do a fresh setup from the migration history alone. The schema and the code are entangled with the specific database instance that was used during development. Moving to a new environment requires manual reconstruction. Rolling back a bad migration cannot be done safely, because the recorded history no longer describes the state the rollback would be undoing.
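
Before any of that can be repaired, the drift has to be measured. A minimal sketch of the comparison, assuming both sides have already been reduced to column names per table: the model side from the ORM definitions, the database side from something like `information_schema.columns`. The `Schema` type and `findSchemaDrift` are names invented for this example, and it ignores column types, indexes, and foreign keys.

```typescript
// Column names per table.
type Schema = Record<string, string[]>;

interface DriftReport {
  table: string;
  missingInDatabase: string[]; // declared in the model, absent in the DB
  unknownToModel: string[]; // present in the DB, absent in the model
}

function findSchemaDrift(models: Schema, database: Schema): DriftReport[] {
  // Every table that appears on either side, model tables first.
  const tables = Object.keys(models).concat(
    Object.keys(database).filter((t) => !(t in models)),
  );
  const reports: DriftReport[] = [];

  for (const table of tables) {
    const modelCols = models[table] || [];
    const dbCols = database[table] || [];
    const missingInDatabase = modelCols.filter((c) => dbCols.indexOf(c) === -1);
    const unknownToModel = dbCols.filter((c) => modelCols.indexOf(c) === -1);
    if (missingInDatabase.length > 0 || unknownToModel.length > 0) {
      reports.push({ table, missingInDatabase, unknownToModel });
    }
  }
  return reports;
}
```

An empty result is the precondition for trusting the migration history again. Anything else is the list of columns to reconcile by hand.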

A README That Describes a Different Product

The documentation was written by the same model that built the app — and it imagines what the finished version would look like.

The README describes role-based permissions that exist only as placeholder comments. The API reference lists endpoints that were planned but never added. The "quick start" fails on step three because a dependency was renamed and the docs weren't updated. Two of the environment variables it lists have different names from the ones the code actually reads.

This is not the usual kind of documentation drift, where someone wrote accurate docs and the code moved on. The model produces confident-sounding documentation based on the intended design, not the actual implementation. It describes the system it was asked to build, not the system it built. A new developer setting up the project is working from a description of software that does not exist.
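
The environment-variable mismatch is the easiest part of this to check mechanically. A rough sketch, assuming the documented names have already been pulled out of the README by hand and that the code reads variables as `process.env.NAME` or `process.env["NAME"]` with upper-case names; `compareEnvDocs` and the variable names in the usage are made up for illustration.

```typescript
// Collects names read as process.env.NAME or process.env["NAME"].
function envVarsReadByCode(sources: string[]): string[] {
  const pattern =
    /process\.env(?:\.([A-Z_][A-Z0-9_]*)|\[["']([A-Z_][A-Z0-9_]*)["']\])/g;
  const found: string[] = [];

  for (const source of sources) {
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(source)) !== null) {
      const name = match[1] || match[2];
      if (found.indexOf(name) === -1) {
        found.push(name);
      }
    }
  }
  return found;
}

// Compares what the README promises with what the code actually reads.
function compareEnvDocs(documented: string[], sources: string[]) {
  const read = envVarsReadByCode(sources);
  return {
    documentedButNeverRead: documented.filter((n) => read.indexOf(n) === -1),
    readButUndocumented: read.filter((n) => documented.indexOf(n) === -1),
  };
}

// Usage with made-up inputs:
const envReport = compareEnvDocs(
  ["DATABASE_URL", "PAYMENT_KEY"],
  [
    "const url = process.env.DATABASE_URL;",
    'const key = process.env["PAYMENT_API_KEY"];',
  ],
);
// envReport.documentedButNeverRead is ["PAYMENT_KEY"]
// envReport.readButUndocumented is ["PAYMENT_API_KEY"]
```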

Why This Happens

None of this is random. It follows directly from how large language models work.

The model's training data is dominated by tutorial code, Stack Overflow answers, and open-source repositories in various states of completion. Tutorial code is written to illustrate a point, not to be maintained. Stack Overflow answers solve the specific question asked, without regard for what surrounds them. Open-source repositories include everything from exemplary production code to abandoned weekend projects — and the model cannot easily distinguish between them.

More structurally: the model optimises for plausible-looking output. Each response is evaluated on whether it seems correct, not on whether it integrates cleanly with the thirty responses that preceded it. There is no persistent memory of previous architectural decisions. When you ask it to add authentication at the start of a project and ask again halfway through, it adds authentication again — and there is no mechanism that asks whether the first attempt should be removed.

There is also no one in the role of architect. An experienced developer making these decisions would delete the previous approach before adding a new one. They would read the test and ask whether it is actually testing the right thing. They would notice that three HTTP clients are installed and remove two of them. The model does none of this unless explicitly prompted, and even then it only does it locally — for the current file or function, not the system as a whole.

The result is not incompetent code. It is code that is locally coherent but globally incoherent — each piece makes sense in isolation, and the whole doesn't hold together.

There is also a quieter cause that is harder to write about because it concerns hiring decisions rather than technology. Many of the projects I see arrive in this state because the company used AI assistance to substitute for a senior engineering review that nobody on the team was qualified to do. The model was not asked to draft a feature that a senior engineer would then evaluate — it was the only entity in the loop with any opinion about software design. That arrangement produces the codebases described in this article more reliably than any specific prompt or tool choice. The fix is not better prompts. The fix is putting an experienced developer back into the decision path, even part-time.

What Cleanup Actually Looks Like

The sequence I follow when I take one of these projects on:

Audit first. Read the codebase before touching it. Map what's actually there: every auth mechanism, every validation layer, every dependency that's imported somewhere. The goal is to understand the actual system, not the intended one. This is also what I do in a technical due diligence engagement before a client commits to acquiring or building on top of an existing codebase.

Dead code removal. Once you have the map, remove what is not used. This is tedious and requires confidence — you need to be sure a dependency is genuinely unused before removing it. Tools like depcheck help, but manual review is necessary for anything that touches shared utilities.

Real tests. Not tests that assert the code runs, but tests that assert it produces correct results for known inputs, including edge cases and error conditions. This usually means reading the actual business logic and asking: what is this function supposed to do, and in which cases can it fail? Write those tests before refactoring anything.

Dependency cull. Consolidate to one HTTP client, one date library, one ID generation utility. Document why each remaining dependency exists.

Security pass. Tokens out of localStorage. CORS locked to actual origins. SQL through parameterised queries or an ORM with no raw interpolation. Secrets out of the repository and into an actual secrets manager. If anything was committed, rotate it.
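
For the first item in that pass, one common destination for a token that leaves localStorage is an HttpOnly cookie set by the server. This is an assumption about the replacement, not the only option, and it brings its own requirements (CSRF protection, for one). A minimal sketch of the header value, with a hypothetical cookie name `session`:

```typescript
// Builds a Set-Cookie header value for a session token. HttpOnly keeps the
// cookie out of reach of page JavaScript, which is the property that
// localStorage lacks.
function sessionCookie(token: string, maxAgeSeconds: number): string {
  return [
    `session=${encodeURIComponent(token)}`,
    "HttpOnly", // not readable via document.cookie
    "Secure", // only sent over HTTPS
    "SameSite=Strict", // not sent on cross-site requests
    "Path=/",
    `Max-Age=${maxAgeSeconds}`,
  ].join("; ");
}
```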

This is not a rewrite. The underlying application logic is usually salvageable — the business rules, the data model, most of the UI. What needs replacing is the infrastructure that holds it together.

I emphasise this because the first instinct of most founders, when they realise what they have, is to throw it away and start again. That instinct is almost always wrong. A rewrite means rebuilding everything the AI happened to get right alongside everything it got wrong, paying for it a second time, and arriving — months later — at a system that has not yet been tested against real users. Stabilising what exists is usually two to four times cheaper than starting over, and you keep the parts that already work. The codebase is in a worse state than it looks. It is rarely in a worse state than starting from zero.

What is a vibe-coded codebase?

A codebase built primarily by prompting an LLM with no experienced developer reviewing the output, making architectural decisions, or enforcing consistency. The code compiles and demos work, but the system is globally incoherent: duplicate auth flows, tests that assert the wrong thing, security patterns copied from tutorials, and a schema that has drifted from the migration history.

Should I rewrite a vibe-coded codebase or fix it?

Almost always fix it. The underlying business logic and data model are usually salvageable. A rewrite means rebuilding everything the AI happened to get right alongside everything it got wrong, paying for it twice, and arriving months later at a system with no real-user battle-testing. Stabilising what exists is typically two to four times cheaper than starting over.

What are the most common security problems in AI-generated code?

Three patterns I find in nearly every vibe-coded codebase: access tokens stored in localStorage (readable by any JavaScript on the page); CORS configured as Access-Control-Allow-Origin: * on authenticated APIs; and SQL built by string concatenation instead of parameterised queries. API keys committed to git history are also common; if found, rotate them immediately.

How do I audit a codebase for vibe-coded problems?

Read before touching anything. Map every authentication mechanism, every validation layer, and every dependency that is actually imported somewhere. Use depcheck to identify unused packages, then manually verify anything that touches shared utilities. Look for duplicate implementations of the same concept. The goal is to understand the actual system before assuming the README describes it correctly.

How do I prevent these problems when using AI to write code?

Put an experienced developer back into the decision path, even part-time. An LLM should draft features that a senior engineer evaluates and integrates, not drive the whole project. Practically: use a CLAUDE.md or equivalent to enforce architectural rules at the prompt level, run lint gates on every commit, and never let the model add a dependency or auth flow without a human reviewing whether the existing one should be removed first.

If what I've described sounds like the codebase on your laptop — or the one a developer just handed you — that's what my rescue projects service is for. I'll tell you honestly what's there and what it'll take to fix it.

For the broader argument about why AI amplifies existing skill gaps rather than closing them, see AI Coding Is Wind. For the practical tools that prevent these patterns from forming in the first place — CLAUDE.md, lint gates, stop prompts — read Prompts That Keep an AI Agent From Wrecking Your Codebase.
