DEV Community

Ricardo M Santos for NotTheCode

Originally published at notthecode.com

Securing AI Generated Code: You Ship It, You Own It

A developer merges a pull request on Friday afternoon. Monday morning, a security review flags a SQL injection flaw in a new endpoint. When you ask what happened, the answer is: “Copilot generated that part. I didn’t really read it.”

That answer is worse than the bug.

The security problem with AI-generated code is not only that models can produce insecure patterns. It’s that teams start treating shipped code as if ownership can be outsourced along with the typing. Scanners, SAST, and dependency audits still matter, but they sit downstream of a more basic failure: somebody merged code they didn’t understand. If you want to secure AI-generated code, you have to fix that first.

The Illusion of "Done" (And the Vulnerabilities You Inherit)

AI code generation is fast. That speed creates a dangerous shortcut in a developer’s head: the code appeared in seconds, so the task must be nearly finished.

It isn’t.

Code from an LLM carries familiar risks, and it often packages them in a way that looks polished enough to trust. Training data contains insecure patterns: string-built SQL queries, outdated authentication flows, hard-coded secrets, permissive CORS policies, weak validation, and examples copied from old blog posts that never should have made it into production. The model does not know your threat model, your trust boundaries, or which endpoint is exposed to the public internet. It does not know that the DbContext it suggested bypasses your tenant boundary checks, or that the helper it introduced skips the authorization policy your team uses everywhere else.
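Here is a minimal TypeScript sketch of the first pattern on that list, a string-built query, next to a parameterized query. The helper names are hypothetical. The { text, values } shape is an assumption modeled on the query object that Node drivers such as node-postgres accept. The safe version also filters by tenant, which is exactly the kind of boundary the model cannot infer.

```typescript
// Sketch only: findOrderUnsafe and findOrderSafe are hypothetical helpers.
// The { text, values } shape mirrors the query object node-postgres accepts.

interface ParameterizedQuery {
  text: string;
  values: unknown[];
}

// The pattern models often reproduce: input concatenated into SQL text.
// An orderId such as "1 OR 1=1" changes what the statement means.
function findOrderUnsafe(orderId: string): string {
  return `SELECT * FROM orders WHERE id = ${orderId}`;
}

// Parameterized version: input travels as bound values, never as SQL text.
// It also filters by tenant, a boundary the model cannot infer on its own.
function findOrderSafe(tenantId: string, orderId: string): ParameterizedQuery {
  return {
    text: "SELECT * FROM orders WHERE tenant_id = $1 AND id = $2",
    values: [tenantId, orderId],
  };
}

const hostileId = "1 OR 1=1";
console.log(findOrderUnsafe(hostileId));
// SELECT * FROM orders WHERE id = 1 OR 1=1
console.log(findOrderSafe("tenant-a", hostileId).values);
```

In .NET the equivalent review question is whether the generated code uses parameters or EF Core LINQ rather than interpolating input into raw SQL.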

The testing story is no better. AI-generated tests skew toward happy paths. They verify that the endpoint returns 200 OK when the request is well formed and the caller is authorized. They rarely probe malformed input, privilege escalation, race conditions and other concurrency failures, or the authorization edge cases that cause real incidents. That gives teams a green build and a false sense of safety. Coverage goes up. Confidence goes up. The system is not safer.
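Below is a sketch of the negative cases a happy-path suite tends to skip. It is written against a hypothetical canCancelOrder rule: the caller must be in the same tenant, must be the owner or hold a support role, and the order must still be open. The types and the rule are illustrative assumptions, not a real policy. The point is that only one row in the table checks that access succeeds. Every other row checks how it should fail.

```typescript
// Sketch only: Caller, Order, and canCancelOrder are hypothetical.

interface Caller {
  userId: string;
  tenantId: string;
  roles: string[];
}

interface Order {
  id: string;
  tenantId: string;
  ownerId: string;
  status: "open" | "shipped" | "cancelled";
}

// Rule under test: same tenant, owner or support role, open orders only.
function canCancelOrder(caller: Caller, order: Order): boolean {
  if (caller.tenantId !== order.tenantId) return false;
  const isOwner = caller.userId === order.ownerId;
  const isSupport = caller.roles.includes("support");
  return (isOwner || isSupport) && order.status === "open";
}

const order: Order = { id: "o1", tenantId: "t1", ownerId: "u1", status: "open" };

// Each case names the failure mode it probes; only the first is a happy path.
const cases: Array<[string, Caller, Order, boolean]> = [
  ["owner cancels own open order", { userId: "u1", tenantId: "t1", roles: [] }, order, true],
  ["other user in same tenant", { userId: "u2", tenantId: "t1", roles: [] }, order, false],
  ["support agent from another tenant", { userId: "u3", tenantId: "t2", roles: ["support"] }, order, false],
  ["owner id reused across tenants", { userId: "u1", tenantId: "t2", roles: [] }, order, false],
  ["owner cancels shipped order", { userId: "u1", tenantId: "t1", roles: [] }, { ...order, status: "shipped" }, false],
];

for (const [name, caller, target, expected] of cases) {
  const actual = canCancelOrder(caller, target);
  if (actual !== expected) throw new Error(`${name}: expected ${expected}, got ${actual}`);
}
console.log(`${cases.length} authorization cases passed`);
// 5 authorization cases passed
```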

This is how you end up with dependency hell and architectural drift. Each pull request adds one more unreviewed assumption: a stale package, an extra abstraction, a hidden permission check, a helper method no one can explain. Any single change looks harmless. In aggregate, they create an attack surface the team no longer understands.

That’s the real danger behind the illusion of “done.” AI makes unfinished work look finished, and teams inherit vulnerabilities the moment they confuse generated output with completed engineering.

The Reinvested-Time Thesis

A common claim about AI-assisted development sounds reasonable on the surface: “AI saves me two hours a day.”

The better question is what happens to those two hours.

If the answer is “we ship more features,” the trade has been misunderstood. AI did not remove the hard parts of the job. It removed some of the typing. Boilerplate generation, DTO mapping, CRUD scaffolding, repetitive handlers, and test skeletons can be produced faster. Architectural judgment did not get cheaper. Security review did not get easier. Integration testing still requires someone who understands how the system fails under stress, not how it behaves in a demo.

I think of this as the Reinvested-Time Thesis: every hour AI saves on code generation should be reinvested into the work the model cannot own.

That means reading generated code with more care, not less. It means tracing package additions before npm install or dotnet restore turns a suggestion into a dependency. It means writing the negative tests the model skipped. It means checking whether a generated abstraction matches the architecture you intended, or whether it quietly introduced a second pattern your team will have to support for the next two years.
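One way to trace package additions is to diff the dependency maps in package.json before and after a change. Every entry that is new, or whose version range changed, then gets a human look. This is a minimal sketch that assumes only the dependencies and devDependencies fields matter. diffDependencies is a hypothetical helper, and some-suggested-lib is a made-up package name.

```typescript
// Sketch only: diffDependencies is a hypothetical helper, and PackageJson
// models just the two fields this check reads.

interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

interface DependencyChange {
  name: string;
  from: string | null; // null means the package is new in this change
  to: string;
}

// Merge runtime and dev dependencies into one name -> version-range map.
function allDependencies(pkg: PackageJson): Record<string, string> {
  return { ...pkg.dependencies, ...pkg.devDependencies };
}

// Return every package that is new or whose version range changed.
function diffDependencies(before: PackageJson, after: PackageJson): DependencyChange[] {
  const old = allDependencies(before);
  const changes: DependencyChange[] = [];
  for (const [name, to] of Object.entries(allDependencies(after))) {
    // hasOwnProperty avoids treating inherited keys like "constructor" as packages.
    const from = Object.prototype.hasOwnProperty.call(old, name) ? old[name] : null;
    if (from !== to) changes.push({ name, from, to });
  }
  return changes.sort((a, b) => a.name.localeCompare(b.name));
}

const before: PackageJson = { dependencies: { express: "^4.19.2" } };
const after: PackageJson = {
  dependencies: { express: "^4.19.2", "some-suggested-lib": "^1.0.0" },
  devDependencies: { vitest: "^1.6.0" },
};

for (const change of diffDependencies(before, after)) {
  console.log(`${change.name}: ${change.from ?? "(new)"} -> ${change.to}`);
}
// some-suggested-lib: (new) -> ^1.0.0
// vitest: (new) -> ^1.6.0
```

Running this against the manifest in a pull request keeps the review focused on what actually changed, before any install step pulls the code onto a machine.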

Teams that treat AI as pure acceleration often ship faster with less understanding. Teams that treat AI as time reallocation keep human judgment where it belongs and raise their confidence in what they ship. Delivery may still speed up. The difference is that the saved time gets spent on review, verification, and system thinking instead of disappearing into a larger sprint commitment.

That trade is less exciting on a dashboard. It’s much better for production.

The Tooling Supply Chain: It’s Not Just npm Anymore

Most teams already understand supply chain risk in runtime dependencies. You inspect package.json and package-lock.json, review new NuGet packages and their entries in packages.lock.json, and watch for known CVEs. Good. Keep doing that.

The supply chain is bigger now.

When you install an AI coding extension in VS Code or JetBrains Rider, you are not adding a harmless convenience feature. You are granting software broad visibility into source code, prompts, file contents, and often nearby context that may include secrets or internal contracts. When you connect an agent to an MCP server, you may be exposing local files, shell commands, Git state, issue trackers, database schemas, or internal documentation. Those permissions matter as much as a new production dependency, and in some cases more, because they sit upstream of the code you ship.
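As one illustration of auditing that tooling, this sketch checks an MCP client configuration for two risks. The first is a server launched through npx with an unpinned package, which can resolve to a different version than the one you reviewed. The second is a credential stored in plain text. The sketch assumes a config shaped like the mcpServers object (command, args, env) that some MCP clients use; check your client's documentation for its actual format. auditMcpConfig, isPinned, and example-tracker-mcp are hypothetical.

```typescript
// Sketch only: the config shape is an assumption modeled on the
// "mcpServers" object (command, args, env) that some MCP clients use.
// auditMcpConfig and isPinned are hypothetical helpers.

interface McpServerEntry {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// "name@1.2.3" or "@scope/name@1.2.3" counts as pinned; a bare name,
// "@latest", or a range like "^1.0.0" can resolve to a different version.
function isPinned(spec: string): boolean {
  const at = spec.lastIndexOf("@");
  if (at <= 0) return false; // no version, or only the leading scope marker
  return /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/.test(spec.slice(at + 1));
}

function auditMcpConfig(config: McpConfig): string[] {
  const findings: string[] = [];
  for (const [name, server] of Object.entries(config.mcpServers)) {
    if (server.command === "npx") {
      // The first argument that is not a flag is the package npx will run.
      const spec = (server.args ?? []).find((arg) => !arg.startsWith("-"));
      if (spec !== undefined && !isPinned(spec)) {
        findings.push(`${name}: runs unpinned package "${spec}"`);
      }
    }
    for (const key of Object.keys(server.env ?? {})) {
      if (/TOKEN|SECRET|PASSWORD|API_KEY/i.test(key)) {
        findings.push(`${name}: credential ${key} stored in plain-text config`);
      }
    }
  }
  return findings;
}

const config: McpConfig = {
  mcpServers: {
    filesystem: {
      command: "npx",
      args: ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/projects"],
    },
    tracker: {
      command: "npx",
      args: ["-y", "example-tracker-mcp@2.1.0"],
      env: { TRACKER_API_KEY: "placeholder" },
    },
  },
};

auditMcpConfig(config).forEach((finding) => console.log(finding));
// filesystem: runs unpinned package "@modelcontextprotocol/server-filesystem"
// tracker: credential TRACKER_API_KEY stored in plain-text config
```

A check like this only covers what the config file declares. What a server can actually read, write, or transmit once it runs still needs a human review.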

That changes the frame. AI-generated code and third-party packages are both forms of code you didn’t write. They deserve the same skepticism. Hallucinated imports can name packages that don’t exist yet, which an attacker can then register, or typosquatted lookalikes of real ones. Suggested libraries can be abandoned, stale, or incompatible with your standards. The tooling itself can exfiltrate data, over-collect context, or create a path for unsafe automation in the developer environment.

For .NET teams, that can mean reviewing every new package reference in .csproj files and checking whether a generated example uses raw SQL where parameterized queries or EF Core APIs would be safer. For TypeScript teams, it means treating generated import statements as part of your supply chain review, not as harmless glue code. In both cases, it also means reviewing the editor extensions, agent permissions, and MCP integrations that produced the code in the first place.
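For the TypeScript side, here is a minimal sketch of that review step. It extracts the bare import specifiers from generated code and flags any package the manifest does not declare, which is where hallucinated or typosquatted names surface. It is regex-based and deliberately simple. It misses dynamic import() and require(), and it would flag builtins imported without the node: prefix. A production check would lean on the TypeScript compiler API or your bundler's resolver. undeclaredImports and packageName are hypothetical helpers, and express-input-sanitizer-pro is an invented package name.

```typescript
// Sketch only: packageName and undeclaredImports are hypothetical helpers.

// Map a specifier to its package: "lodash/fp" -> "lodash",
// "@scope/pkg/sub" -> "@scope/pkg". Relative paths and node: builtins -> null.
function packageName(specifier: string): string | null {
  if (specifier.startsWith(".") || specifier.startsWith("/") || specifier.startsWith("node:")) {
    return null;
  }
  const parts = specifier.split("/");
  return specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
}

// Return packages imported by the source that the manifest does not declare.
// Covers static `import ... from`, `export ... from`, and side-effect imports.
function undeclaredImports(source: string, declared: Set<string>): string[] {
  // A fresh global regex per call, so exec() starts scanning from index 0.
  const importRe = /(?:import|export)\s[^'"]*?from\s*['"]([^'"]+)['"]|import\s*['"]([^'"]+)['"]/g;
  const missing = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = importRe.exec(source)) !== null) {
    const name = packageName(match[1] ?? match[2]);
    if (name !== null && !declared.has(name)) missing.add(name);
  }
  return Array.from(missing).sort();
}

const generated = `
import express from "express";
import { z } from "zod";
import { sanitize } from "express-input-sanitizer-pro";
import { loadOrder } from "./orders";
import type { Request } from "express";
`;

console.log(undeclaredImports(generated, new Set(["express", "zod"])).join(", "));
// express-input-sanitizer-pro
```

A flagged name is not proof of an attack. It is a package nobody on the team has agreed to depend on yet.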

If you wouldn’t add an unknown runtime dependency without scrutiny, you shouldn’t trust an unknown AI tool with your repo, your environment, or your merge path.

The First-Person Merge Rule

The fastest way to spot a team losing accountability is to listen to how it talks after an incident.

“Claude added that endpoint.”
“Copilot missed the null check.”
“Codex forgot the authorization attribute.”

That language sounds harmless. It isn’t. The moment developers describe merged code in the third person, responsibility starts drifting away from the people who shipped it. Reviews get softer because the author can imply the tool is the real source. Retros get muddier because “the AI did it” cuts off the harder question: why did a human approve it? Junior developers hear that phrasing and learn the worst possible lesson from the AI era: shipping code without understanding it is acceptable if the output came from a model.

The rule I want teams to adopt is simple: if you merge it, you wrote it.

I call this the First-Person Merge Rule.

Under that rule, the correct sentence is not “Copilot forgot validation.” The correct sentence is “I merged code without validation.” Not “Claude created an unsafe query,” but “I approved an unsafe query.” The tool may have generated the text, but it did not click merge, approve the PR, or accept the risk on behalf of your team.

This is not about blame theatre. It is about preserving a clean line of ownership. Once that line gets blurry — what another post here calls the accountability fog — security suffers fast. Reviewers stop pressing because the human author sounds less like an owner and more like a courier. Incident analysis turns into tool criticism instead of process correction. Teams focus on model quality while ignoring the broken discipline that allowed bad code into production.

Language shapes culture more than most engineering teams admit. If your team bans “the AI did it” as an excuse, review quality improves because authors know they own the output. Postmortems improve because they stay centered on decision-making. Security conversations improve because accountability is no longer transferable.

Use the rule in pull requests. Use it in retrospectives. Use it when coaching junior developers. The merge button has one name attached to it. That name matters.

How to Actually Own Code You Didn’t Write

Ownership only matters if it changes behavior.

The practical model is to treat AI output the same way you’d treat a pull request from a new contractor on their second day in the codebase. You would read every line. You would question assumptions. You would verify they understood your authorization model, tenancy rules, error handling, and deployment constraints. The same standard applies here. A useful shorthand is to treat the AI like an Infinite Intern: fast, eager, and able to produce a lot of code without knowing which parts are unsafe in your system.

That mindset works best when paired with Intent Architecture: you direct the work with bounded tasks, constrain what the model can touch, and verify the output against your system’s real failure modes instead of its optimistic guesses. The difference is concrete — “Generate an ASP.NET Core endpoint for order cancellation that enforces this policy” is reviewable; “build the cancellation feature” is how you get architectural drift you will own later.
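The "constrain what the model can touch" step can also be checked mechanically after the fact. This is a minimal sketch that assumes a bounded task is described by a list of allowed paths. Any changed file outside that scope gets flagged for a deliberate human decision. TaskScope, outOfScope, and the repository paths are hypothetical.

```typescript
// Sketch only: TaskScope and outOfScope are hypothetical, and the paths
// describe an imaginary ASP.NET Core repository.

interface TaskScope {
  task: string;
  allowedPaths: string[]; // directory prefixes ending in "/" or exact file paths
}

// Return changed files that fall outside the paths the task was allowed to touch.
function outOfScope(changedFiles: string[], scope: TaskScope): string[] {
  return changedFiles.filter(
    (file) =>
      !scope.allowedPaths.some((allowed) =>
        allowed.endsWith("/") ? file.startsWith(allowed) : file === allowed,
      ),
  );
}

const cancellation: TaskScope = {
  task: "Order cancellation endpoint that enforces the cancellation policy",
  allowedPaths: ["src/Orders/Cancellation/", "tests/Orders/Cancellation/"],
};

const changed = [
  "src/Orders/Cancellation/CancelOrderEndpoint.cs",
  "tests/Orders/Cancellation/CancelOrderTests.cs",
  "src/Auth/PolicyRegistry.cs",
  "Directory.Packages.props",
];

console.log(outOfScope(changed, cancellation).join("\n"));
// src/Auth/PolicyRegistry.cs
// Directory.Packages.props
```

A flagged file is not automatically wrong. It is a prompt to decide, on purpose, whether the task grew beyond what was asked.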

A working checklist looks like this:

  • Run the generated code through a real review gate. The pillar’s Zero Trust audit covers the mechanics — happy-path gaps, hallucinated dependencies, hard-coded secrets. The ownership rule sits on top of it: if you can’t explain a line, you don’t merge it.
  • Reinvest the time AI saved into that review and into the negative tests the model skipped, not into the next ticket.
  • Adopt the First-Person Merge Rule in pull requests and retros. “The AI did it” is not an accepted sentence.
  • Treat your AI tooling as supply chain. Verify the provenance of every package the model suggests, and audit what your extensions, agents, and MCP servers can read, write, and transmit.
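The hard-coded-secrets part of that review gate can be partially automated. This minimal sketch scans only the added lines of a unified diff against two illustrative patterns. The pattern list, scanAddedLines, PaymentsClient, and the sample diff are all assumptions, and a dedicated secret scanner covers far more credential formats. It supplements reading the code and does not replace it.

```typescript
// Sketch only: scanAddedLines and the two patterns are illustrative
// assumptions; dedicated secret scanners cover many more credential formats.

interface Finding {
  line: number; // 1-based line number within the diff text
  rule: string;
  text: string;
}

const SECRET_PATTERNS: Array<[string, RegExp]> = [
  ["AWS access key ID", /\bAKIA[0-9A-Z]{16}\b/],
  ["hard-coded credential", /(api[_-]?key|secret|password|token)\s*[:=]\s*["'][^"']{8,}["']/i],
];

// Scan only lines the change adds ("+"), skipping the "+++" file header.
function scanAddedLines(diff: string): Finding[] {
  const findings: Finding[] = [];
  diff.split("\n").forEach((raw, index) => {
    if (!raw.startsWith("+") || raw.startsWith("+++")) return;
    const added = raw.slice(1);
    for (const [rule, pattern] of SECRET_PATTERNS) {
      if (pattern.test(added)) findings.push({ line: index + 1, rule, text: added.trim() });
    }
  });
  return findings;
}

const diff = [
  "+++ b/src/payments.ts",
  "+const client = new PaymentsClient({",
  '+  apiKey: "live_9f8e7d6c5b4a3210",',
  "+  region: process.env.PAYMENTS_REGION,",
  "+});",
].join("\n");

for (const f of scanAddedLines(diff)) {
  console.log(`line ${f.line}: ${f.rule}: ${f.text}`);
}
// line 3: hard-coded credential: apiKey: "live_9f8e7d6c5b4a3210",
```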

None of this is glamorous. It is disciplined engineering. The teams that stay safe with AI are not the ones with the flashiest tools. They are the ones that kept ownership attached to the human who shipped the code.

That’s the standard: you ship it, you own it.

Frequently Asked Questions

Is AI-generated code secure?

Not by default. Models reproduce insecure patterns from their training data — string-built SQL, missing authorization checks, hard-coded secrets — and they have no knowledge of your threat model. AI-generated code can be made secure, but only through the same review, testing, and dependency scrutiny you apply to anything else. The security of what you merge is your responsibility, not the model’s.

Who is responsible for AI-generated code?

The person who merges it. A tool can generate text, but it cannot approve a pull request or accept risk on your team’s behalf. That is the First-Person Merge Rule: if you merge it, you wrote it.

What are the biggest risks of AI-generated code?

Four recurring ones: insecure patterns inherited from training data, happy-path tests that hide real failure modes, hallucinated or stale third-party dependencies, and the AI tooling itself — extensions, agents, and MCP servers that hold broad access to your environment.

How do you review AI-generated code safely?

Treat it like a pull request from a contractor on their second day: read every line, question every assumption, and verify it against your real failure modes. The working model is direct, constrain, verify — give the model bounded tasks, limit what it can touch, and test the edge cases it skipped.
