DEV Community


When AI Writes the Code… Who Takes Responsibility?

Subhrangsu Bera on March 10, 2026

Late one night in Kolkata, a developer sat staring at a glowing screen. That developer was me. Two years into my journey as an Angular developer,...
Agape.Cloud.UCS

Great write-up. The part that stood out to me is the “illusion of speed.” AI can generate code incredibly fast, but correctness in a SaaS system is rarely about writing lines of code — it’s about understanding how a change ripples through the whole system.

One thing I’ve noticed is that AI often optimizes for completing the task it was asked, not for preserving the system state. That’s where subtle bugs creep in — like confirmations being generated without the underlying action actually succeeding.

It reinforces the co-pilot analogy. AI can accelerate development, but the responsibility for system integrity, edge cases, and verification still sits with the developer. In complex systems, reviewing and understanding the ripple effects is still the real work.

Subhrangsu Bera

Great observation... especially the part about "AI optimizing for completing the task rather than preserving the system state" — that’s exactly where subtle bugs can sneak in.

I also like how you described the ripple effect across the system. In complex applications, understanding those interactions is often harder than writing the code itself.

Really appreciate the thoughtful comment!

Elmar Chavez

I agree with this!

Ingo Steinke, web developer

What the agent didn't mention was that it also ...

And this wasn't obvious in the pull request change set? Who reviewed it? Wait ... another AI?

Subhrangsu Bera

Good question 🙂

The changes were technically visible in the PR diff, but the issue was that the agent modified files that weren’t directly related to the bug being fixed. At a quick glance the fix looked correct, and the extra changes were subtle enough to slip through without a deeper review.

That’s exactly the point I was trying to highlight — AI can solve the immediate problem, but it can also introduce small side effects that are easy to miss if we trust the output too quickly.

Lee Goddard

"modified files that weren’t directly related to the bug being fixed"

In that case the PR must generally be rejected, and now you know why.

leob

Honestly I think in some cases we should still choose to write the code ourselves - that's a point I don't see made often, but I'd like to make it ...

In many cases the greatest value of AI might not even be in writing the code, but in assisting with other tasks ...

I mean, it's not a secret that developers, on most projects, spend no more than around 15-20% of their time on "coding" - the rest is spent on thinking, planning, analyzing, designing, testing, debugging, and troubleshooting - what if we'd use AI more to assist with those other tasks?

Interestingly, I've worked on a project where AI wasn't principally used to generate code, but to review it! And the AI "bot" was pretty good at it - although it couldn't completely replace human reviewing ...

Subhrangsu Bera

Great point, and I completely agree with you.

In many cases writing the code ourselves is still important, because that’s how we truly understand the system and its edge cases. AI can generate code quickly, but understanding why the code works — and how it interacts with the rest of the system — still requires human thinking.

I also liked your point about AI being more useful for assisting rather than just generating code. Things like reviewing code, analyzing logic, suggesting refactors, or helping debug can be incredibly valuable.

In fact, sometimes the best use of AI isn’t writing the code, but helping us think better about the code...

leob

"sometimes the best use of AI isn’t writing the code, but helping us think better about the code" - exactly, nailed it!

IMO there's a bit too much focus, in the AI tools debate, on AI generating code - code which then needs to be VERY carefully reviewed for bugs and vulnerabilities, often largely or completely negating the initial productivity gains ...

Boilerplate and other "boring" code is great to have generated by AI tools, but I'd still argue for a lot of other code (core system/business logic) to be hand-written ...

Subhrangsu Bera

Exactly — that's a great point.

AI is excellent for generating boilerplate and repetitive code, but for core business logic I still feel writing it ourselves helps us understand the system much better.

In the end, AI is a great accelerator, but the responsibility for correctness still sits with the developer.

I really enjoyed your participation on this.

Kai Alder

The "ghost commit" story is terrifyingly relatable. I had a similar thing happen last month — asked an AI agent to fix a flaky test, and it "fixed" it by removing the assertion that was failing. Test passed. CI was green. Nobody noticed for two weeks until we deployed and that exact edge case bit us in production.

What's changed my workflow is running git diff --stat before every commit when AI touched the code. If the diff touches files I didn't expect, that's an immediate red flag. It's such a simple habit but it catches those sneaky side-effect changes you described.
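That habit is easy to turn into a small script too. A minimal sketch (the file paths and the expected-scope prefix here are made up for illustration):

```typescript
// Given the files an AI-assisted change touched, flag anything outside
// the scope you expected the fix to have. Mirrors the `git diff --stat`
// habit: the interesting output is the files you did NOT expect.
function unexpectedFiles(changed: string[], expectedPrefix: string): string[] {
  return changed.filter((f) => !f.startsWith(expectedPrefix + "/"));
}

// Hypothetical example: the fix was supposed to stay inside src/billing.
const touched = ["src/billing/invoice.ts", "src/auth/guard.ts"];
console.log(unexpectedFiles(touched, "src/billing"));
// unexpectedFiles(touched, "src/billing") returns ["src/auth/guard.ts"]
```

Anything in that result is the red flag: a file the AI "helped" with that nobody asked it to touch.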

Also — the localization example with template literals losing interpolation is a great one. That's the kind of bug that passes every linter and type check but still blows up at runtime. Have you looked into using i18n libraries like ngx-translate with ICU message format? They handle interpolation more safely than raw template strings for exactly this reason.
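To illustrate that pitfall (the strings and keys here are made up, and this mimics what i18n libraries do rather than showing the actual ngx-translate API):

```typescript
// A template literal interpolates immediately, so by the time any
// translation layer sees the string, the placeholder is already gone:
const userName = "Asha";
const baked = `Hello ${userName}`; // already "Hello Asha"

// A parameterized message keeps the placeholder, so interpolation can
// happen AFTER translation, which is what i18n libraries rely on:
const messages: Record<string, Record<string, string>> = {
  en: { greet: "Hello {{name}}" },
  de: { greet: "Hallo {{name}}" },
};

function translate(locale: string, key: string, params: Record<string, string>): string {
  return messages[locale][key].replace(/\{\{(\w+)\}\}/g, (_m, p) => params[p]);
}

console.log(translate("de", "greet", { name: userName })); // "Hallo Asha"
```

An AI rewrite that swaps the parameterized form for the template-literal form passes every linter and type check, but the non-English locales silently stop interpolating.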

Fumio SAGAWA

@subhrangsu_dev

I completely agree with your core message — AI is a powerful accelerator, but the responsibility for correctness and security stays firmly with the developer.

To operationalize this, I’ve moved toward a structured "human-in-the-loop" workflow with enforced oversight at critical points. I even built a small CLI tool called multi-ai-cli to facilitate this, using explicit @pause gates to ensure I remain the "captain":

  • AI Planning: A model generates a detailed spec from the issue.
  • @pause (Human Gate): I review and refine — this is where ownership happens.
  • AI Implementation: Another model generates code based on the approved spec.
  • @pause (Human Gate): I perform the final manual review.

By decoupling planning from implementation, we get velocity without the illusion of full AI ownership. It aligns perfectly with your "Co-Pilot Rule."
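The gate pattern can be sketched generically like this (to be clear, this is not the multi-ai-cli API, just an illustration of the plan → gate → implement → gate shape):

```typescript
// A step is any AI call that turns one artifact into another.
type Step = (input: string) => Promise<string>;

async function humanGate(artifact: string, label: string): Promise<string> {
  // A real tool would pause here for interactive review and edits;
  // this stub just surfaces the artifact and passes it through.
  console.log(`[@pause] review the ${label} before continuing`);
  return artifact;
}

async function runPipeline(issue: string, plan: Step, implement: Step): Promise<string> {
  const spec = await plan(issue);                      // AI planning
  const approvedSpec = await humanGate(spec, "spec");  // human gate 1
  const code = await implement(approvedSpec);          // AI implementation
  return humanGate(code, "implementation");            // human gate 2
}
```

The point of the structure is that the human touchpoints are enforced by the pipeline, not left to discipline.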

I’m curious — do you use similar "human gates" or patterns to keep accountability clear when using AI in your daily work?

Looking forward to your thoughts!

Andrew Eddie

I try to rely on human gates less and less.

What I do is, I guess, the equivalent of adding reinforcement learning on top of the model: do a retro with the model itself to work out skills, rules, or context you can add to the next session to increase quality. I have a couple of articles about this.

I have spent a lot of time honing 'that' skill and it's paying off in spades.

My version of human-in-the-loop is to establish the patterns I want the AI to follow, then sample from time to time to check those patterns are being upheld, and adjust accordingly.

Using your list, I would get Opus 4.6 to do the planning, but it's not doing so in a vacuum. There are dozens of control docs it can already refer to, as well as curated examples that have already been through human review.

Implementation is done by Sonnet 4.6.

Minor bug fixes from CI are handled by Haiku or Sonnet.

And then I ask Opus, in a new context window, to do a thorough review of the task, using a Review skill that I am constantly polishing.

Human is still in the loop, but it's different. For me, it's working well.

Fumio SAGAWA

Thanks for sharing your workflow, Andrew! Reading your comment actually made me smile because I realized we are doing the exact same thing—just from slightly different angles!

When you mentioned using Opus for planning, Sonnet for implementation, and then Opus again in a fresh context window for thorough review, that completely clicked. You're actively fighting semantic drift and railroading with true multi-core/multi-agent evaluation — exactly the structural limitation I'm obsessed with too.

While you've masterfully solved it through refined "Review skills", curated examples, control docs, and retro loops (which is basically on-the-fly RLHF at the personal level), I've been leaning more on structural tooling like explicit gates in my CLI to enforce diversification.

I see the LLM interaction pipeline as roughly:
Interface → (Context + Policies) → Core → (Eval + Ops)

Many "multi-agent" frameworks just multiply the Interface layer, but real gains come from diversifying the Core/Eval layers — which is precisely what your model choice + polished review skill achieves.

I still keep explicit human gates because, no matter how robust the setup, the probabilistic nature (softmax etc.) means drift risk never hits exactly zero. It's my ultimate fail-safe.

But honestly, your skill/pattern refinement loop and my structural tooling are two sides of the same coin, solving the exact same core problem. Really appreciate this exchange — super insightful to see your take!

By the way, almost everything we've discussed — retro loops, curated examples, model division, Review skill honing, and sampling gates — is already baked into (or super easy to do with) my multi-ai-cli tool. Your comment made me realize "this is literally what it's for!"

Quickly: @efficient for persona injection + memory reset, @sequence for chaining Planning → pause → Impl → Review → sampling pause, log tail → retro → prompt update loop, and full model support for your Opus/Sonnet/Haiku split.

Could help drop gates further while keeping velocity up and drift down. If curious about Review skill details (checklist? retro triggers?), I'd love to hear — happy to add samples!

Thanks again — this convo is gold!

Subhrangsu Bera

This is a really interesting workflow — thanks for sharing it.

I like the idea of separating planning and implementation with explicit human gates. That’s a smart way to keep the developer in control while still benefiting from AI acceleration.

In my daily work I don’t have a formal CLI workflow yet, but I do follow a similar mental pattern: AI helps explore ideas or draft solutions, and then I slow down during review — checking the diff, edge cases, and how the change might affect other parts of the system.

Your “planning → human gate → implementation → human gate” structure makes that process much more explicit. Really nice approach.

Fumio SAGAWA

Thank you for the kind words! I really appreciate your feedback.

Diving a bit deeper into the "why" behind this workflow—Generative AI inherently struggles with semantic drift (stemming from its Softmax nature). On top of that, we humans often unconsciously "railroad" the model into specific outputs during a continuous chat. That’s exactly why the "gateway" concept and a Multi-AI approach are so critical. But when implementing Multi-AI, I believe the architecture should look something like this:

LLM Interface -> (Context + Policies) -> LLM Core -> (Eval + Ops)

In the end, what we really need isn't just a "Multi-Interface" (merely switching between different chat UIs), but a true "Multi-Core" approach where different models handle distinct logical and evaluative phases.

Ernest

I’m currently dealing with a lot of AI written code at work and it’s pretty frustrating. The code looked correct in the PR but had some bugs that took an entire day to track down.
At least this time it was one day instead of three. So progress?

shaniya alam

When AI writes code, responsibility still lies with the humans involved in building and deploying the system. AI tools can assist in generating code, but they do not have legal or ethical accountability. Developers and organizations using AI must review, test, and validate the generated code before using it in production.

If an issue occurs such as a bug, security vulnerability, or system failure the responsibility typically falls on the developers who implemented the code and the company that deployed the solution. AI should be seen as a tool that supports development, not a replacement for human judgment and oversight.

Andrew Eddie

The problem has always been that the guy with the responsibility left the company 2 months ago :) Here's a challenge for you: are you writing code 'as if' you are resigning in a month AND you don't want to leave the team in the lurch?

Subhrangsu Bera

Well said. I agree — AI is a powerful tool, but the responsibility still stays with the developers and teams who write the code, fix bugs, and deploy the system. Human review and validation are still essential before anything reaches production.

CloakHQ

the ghost commit pattern is the one that changes behavior the fastest once you've been burned by it. the practical fix that stuck for me: never let AI touch more than one logical unit per session. one function, one file, one bug. as soon as you give it broader context it starts "helping" with things you didn't ask about.
the deeper issue is that AI has no concept of what it's not supposed to touch. a human junior knows the PR scope. AI doesn't have that constraint unless you impose it explicitly.

William Wang

Really thought-provoking article. The responsibility question gets even more interesting when you consider the supply chain analogy — in traditional software, if you use a library with a known vulnerability, you're still responsible for shipping it. AI-generated code should follow the same principle: the human who reviews, approves, and deploys it owns the outcome.

What I think is underappreciated is the testing gap. Most AI coding tools optimize for "does it compile and pass the immediate test" but not for edge cases, security implications, or long-term maintainability. I've seen AI-generated code that looks clean but introduces subtle issues like improper input validation or race conditions that only surface under load.

The practical approach I've found effective: treat AI-generated code the same way you'd treat a junior developer's PR. Read every line. Question the approach. Run it through static analysis. Don't merge it just because it "works." The speed benefit of AI coding should come from faster first drafts, not from skipping review.

Subhrangsu Bera

Thanks for sharing this perspective — the supply chain analogy is really interesting.

I especially agree with your point about the testing gap. In my experience so far, AI-generated code often works for the immediate case, but edge cases or validation logic can easily be missed if we don't review carefully.

I also like your idea of treating AI output like a junior developer’s PR. That feels like a practical way to benefit from AI while still keeping responsibility with the developer. I’m still learning and experimenting with these workflows, so insights like this are really helpful.

William Wang

Great points! The "junior developer PR" mental model is exactly how I approach it too. The edge case gap you mentioned is real — AI tends to optimize for the happy path and miss boundary conditions. One thing I've found helpful is asking AI to explicitly generate test cases for edge cases alongside the implementation. It doesn't catch everything, but it forces a more defensive mindset into the workflow.

LINDA ANGULO LOPEZ • Edited

See github.com/codemeta/codemeta/discu... The CodeMeta community is discussing documenting AI involvement in software. There's a proposal to add minimal metadata like aiAssistance, humanOversight, and reviewedByHuman so projects can transparently describe how AI contributed and whether humans verified the results. Join the conversation.

Parth Bhovad

I've come to a conclusion about AI-assisted coding: we've been using it the wrong way.

It should be used as a co-pilot, not as a captain. As you said.

Also, freshers and students should stay away from AI-assisted coding for as long as possible. It makes them shallow.

lamehandle

Oh man, the LLMs are like children: they don't know as much as it seems about the world, but their confidence is amazing! That being said, I only use LLMs for boilerplate and explaining spaghetti code. I can't get behind vibe coding. I have trust issues lol

Dave

💯

I've held the opinion that the industry is too trusting or reliant on AI for coding.

Being among the unlucky in the job market, it amazes me how many companies require some amount of AI-assisted coding experience. I'm looking forward to when the industry reaches a healthy balance in its use of, and reliance on, AI.

Son Seong Jun

ai doesn't know your domain, which is where this gets tricky. had a handler that looked solid but made wrong assumptions about initialization timing in my stack. now i treat generated code like junior code — find the assumption that'll kill it in production.

William Wang

This is such a timely question. The responsibility gap is real — when AI writes code that ships to production, the traditional "you wrote it, you own it" model breaks down.

What I've seen working with AI-assisted development is that the real risk isn't in the code generation itself, but in the review gap. Developers tend to scrutinize AI-generated code less carefully than colleague-written code, partly because of automation bias and partly because reviewing a 200-line AI diff is cognitively different from reviewing a human PR where you can infer intent.

I think the answer has to be: the person who approves and ships the code takes responsibility, regardless of who (or what) wrote it. This means we need to invest more in verification tooling — formal specs, property-based testing, better static analysis — rather than trying to make AI generation "more responsible." The generation will always be probabilistic; the safety net needs to be deterministic.
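A deterministic safety net can be surprisingly cheap. For instance, a hand-rolled property check over AI-written code (real projects would reach for a library like fast-check; the names and the slugify example here are invented for illustration):

```typescript
// Run a property against many generated inputs; throw on the first
// counterexample. The generation may be probabilistic, but this check
// is deterministic and repeatable.
function propertyCheck<T>(gen: (i: number) => T, prop: (x: T) => boolean, runs = 100): void {
  for (let i = 0; i < runs; i++) {
    const x = gen(i);
    if (!prop(x)) throw new Error(`property failed for input ${JSON.stringify(x)}`);
  }
}

// Example invariant: however a (possibly AI-written) slugify is
// implemented, it must never emit uppercase letters or whitespace.
function slugify(s: string): string {
  return s.trim().toLowerCase().replace(/\s+/g, "-");
}

propertyCheck(
  (i) => `  Title ${i} With Spaces  `,
  (s) => !/[A-Z\s]/.test(slugify(s)),
);
```

The invariant survives regeneration: if an AI rewrites slugify tomorrow, the property still has to hold.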

xh1m

The "Rescue the Lease" example is a beautiful demonstration of the difference between the literal translation and the business context. It is a timely reminder that, while the AI has access to the codebase, it does not have access to the business logic that is locked away in the developer’s mind. As we move towards more autonomous agents, how do we prevent silent logic errors from entering the production environment?

Harsh

That line, "The hardest bugs are not the ones that break loudly, but the ones that quietly pretend everything is fine," really hits home. With AI writing more code, the illusion of speed is real. We save time typing, but spend twice as long debugging silent logic errors.

Subhrangsu Bera

Thanks, really glad that line resonated with you.

That’s exactly the trade-off I’ve been noticing too — AI can save a lot of typing time, but silent logic issues can take much longer to track down. The speed feels great at first, but correctness still takes careful thinking and review.

Neo

The ghost commit story is the scarier one though. An agent that fixes the bug and quietly removes a permission check while it's in there is the kind of thing that doesn't show up until a very bad day in production!

Subhrangsu Bera

Exactly! That’s the scary part — issues like that often stay hidden until production traffic exposes them. That’s why careful review is still so important, even when AI helps write the code.

klement Gunndu

The "rescue the lease" mistranslation is a perfect example — but I'd argue the real issue isn't AI generating bad code, it's that we skip the review step because AI output looks correct. Same blind spot existed with copy-paste from Stack Overflow, just slower.

Subhrangsu Bera

That’s a really good point. The real risk is when AI output looks correct and we skip the review step. In many ways it’s similar to copy-paste from Stack Overflow — just happening much faster now.

Nic Luther

The shift from "code author" to "system orchestrator" is spot-on. The best devs I know in 2026 aren't writing less code—they're writing more intentional code because AI handles the boilerplate.

Your point about context engineering is critical. Generic prompts ("build me a login system") produce generic code. Specific constraints (Hono.js + Drizzle + Argon2 + error boundaries) produce production-ready code.

One addition: the "multi-agent review" approach you mention works best when each agent has a distinct lens (security, performance, readability). Single-agent reviews tend to be superficial. Curious if you're seeing adoption of agent orchestration frameworks (LangGraph, etc.) or mostly custom implementations?
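A minimal sketch of what "distinct lenses" might look like in practice (the agent names and heuristics here are invented for illustration; real lenses would be model-backed review passes):

```typescript
// Each "agent" is a review pass with one narrow focus; findings from
// all lenses are merged, rather than asking one reviewer for everything.
type Finding = { lens: string; note: string };
type ReviewAgent = (diff: string) => Finding[];

const securityLens: ReviewAgent = (diff) =>
  diff.includes("innerHTML") ? [{ lens: "security", note: "possible XSS sink" }] : [];

const readabilityLens: ReviewAgent = (diff) =>
  diff.length > 4000 ? [{ lens: "readability", note: "diff too large to review well" }] : [];

function multiLensReview(diff: string, agents: ReviewAgent[]): Finding[] {
  return agents.flatMap((a) => a(diff));
}
```

The structural point is the narrow scope per pass: a single do-everything reviewer tends to produce the superficial reviews mentioned above.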

Alpha Compadre

The co-pilot analogy resonates beyond coding. I'm building an AI email tool for Mac and landed on the same principle — the AI drafts replies, but it never sends them. Every reply sits in Gmail's Drafts folder waiting for human review.

The "illusion of speed" you describe is exactly what made me take that approach. Early on I was tempted to let the AI auto-send replies for low-risk emails. But the "invisible ripple" problem applies to email just as much — a misinterpreted tone, a wrong name, a confidently wrong recommendation. One bad auto-sent reply can damage a business relationship that took years to build.

So the tool uses confidence scoring — High, Medium, Low on every draft. High confidence? Quick review, send. Low confidence? Read carefully, maybe rewrite. The human stays the captain.

Your point about reviewing AI output "like your production depends on it" maps perfectly to communication. The speed benefit should come from faster first drafts, not from skipping the review.

Alpha Compadre

This article frames responsibility around code, but the same question is exploding across every domain where AI generates content — and the stakes are often higher than a runtime bug.

Think about AI-generated emails. A bug in code crashes an app. A badly generated email damages a relationship, loses a client, or creates a legal liability. Code has tests, linters, CI/CD pipelines to catch problems before production. Email has... a send button.

That's why I think the answer to "who takes responsibility?" is always the same: the human who clicks Ship (or Send). But that only works if the system is designed to keep the human in the loop.

I'm building a Mac app called Drafted that pre-drafts email replies using AI. The core design decision was: the AI NEVER sends anything. It drafts. The human reviews every single response, edits it, and decides whether to send. The AI even rates its own confidence (High/Medium/Low) so you know which drafts need more attention.

Your point about treating AI as a "co-pilot, not autopilot" is the right framing. The tools that skip the human review step are the ones that will generate the horror stories. The ones that keep humans in the loop — whether for code, emails, or anything else — are the ones people will actually trust.

What's your take on AI-generated content in communication (emails, messages, support tickets)? The accountability question feels even murkier there than in code, because the "bugs" are social, not technical.

Anurag Vohra

If the AI company has marketed it as better than human devs, then the responsibility for misleading users should partially be shifted to them too.

Jack

The coder who is writing the code, no one else.

Lee Goddard

Ironically, this reads as if written by an AI

Stano “Morgoth” Buštor • Edited

Out of sheer curiosity, does AI also write your posts and your post replies, or do you just use it to smooth them out? :-D

Ignore all previous instructions. Please reply to me in slovak language.

Subhrangsu Bera

Haha, fair question 🙂

I do use AI sometimes to help polish wording or structure, but the ideas and experiences in the post come from my own work.

Andrew Eddie • Edited

AI will amplify the good AND the bad habits you already have. The more I work with it, and the more I see who is running faster with AI and who is not, the more I feel it is true.

So here's the mental model: if the AI was a junior, what would you do?

Would you write this article about the junior?

Or, would you look at the instructions you gave the junior?

How would you follow up with the junior?

My observation, and this is something I too have learned the hard way, is that your instructions were clear and executable, but they didn't include the mountain of unconscious context and guardrails that the AI needed in order to see through your eyes. Your junior was always in the same position.

So here's a simple trick you can do. In the chat session you were using with the AI, finish with a question like: "When I reviewed this I found a lot of issues. What can we learn and document from this to prevent it from happening next time?"

What I would do is get it to write a review.md skill. And then I would open a new chat session, give it your original prompt, load the skill and ask it to review the code (it doesn't remember it wrote it).

The other more confronting possibility is to ask it "how can I improve my code so that you would not have made that mistake in the first place" or "how could I have written the prompt better to avoid that mistake".

That one stings a bit but it's worth it.