
Subhrangsu Bera

When AI Writes the Code… Who Takes Responsibility?

Late one night in Kolkata, a developer sat staring at a glowing screen.

That developer was me.

Two years into my journey as an Angular developer, I’ve learned something interesting about software development:

The hardest bugs are not the ones that break loudly.

They’re the ones that quietly pretend everything is fine.

And lately, with AI tools everywhere, I’ve started noticing a strange phenomenon in modern development — the illusion of speed.

Let me tell you a story.

Chapter 1: The Magical Button

I’m currently working on a Property Management System (PMS) SaaS platform.

If you've ever worked on SaaS products, you know one thing:

Data integrity is sacred.

If a small bug appears in a personal project, it's annoying.

If a small bug appears in a SaaS system managing properties, tenants, rent, and financial data…

…it can become a very expensive mistake.

Recently we were tackling a common SaaS problem:

Localization.

Our platform needs to speak multiple languages so property managers and tenants can use it comfortably.

Which sounds like the perfect job for AI.

So one evening I opened my AI chat inside VS Code and typed a very confident command:

“Find all SweetAlerts in the project, extract all user-facing strings into a translation JSON file, and bind the keys back for localization.”

I hit enter.

Five seconds later.

The AI delivered a full solution.

Files created.
JSON structured.
Bindings written.

It looked… perfect.

Like a magician just pulled a rabbit out of a TypeScript file.

But then a thought hit me:

If I didn’t write this code… do I actually understand it?

So I did something boring.

I reviewed it.

Line by line.

That’s when the cracks appeared.

Chapter 2: The “Almost Right” Problem

AI is incredibly good at writing code that looks correct.

But SaaS systems don’t run on looks correct.

They run on exactly correct.

While reviewing the AI's work, I found three small but dangerous problems.

1️⃣ The Context Problem

One alert originally meant:

“Save Lease Agreement.”

The AI translated it into a word that technically meant “Save.”

But in the context of property management…

…it sounded closer to “Rescue the Lease.”

Which is a little dramatic.

Imagine clicking a button and seeing:

“Lease successfully rescued.”

Who kidnapped the lease?
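The fix, in our case, was to stop reusing a generic key for that button. Here's a minimal sketch of the idea; the key names, the lookup helper, and the Spanish strings are all illustrative, not our actual translation files:

```typescript
// Illustrative translation maps: keys and strings are made up
// for this example, not taken from the real PMS codebase.
const en: Record<string, string> = {
  "common.save": "Save",
  "lease.save": "Save Lease Agreement", // context-specific key
};

const es: Record<string, string> = {
  "common.save": "Guardar",
  // A dedicated key lets a human translator choose the right phrasing
  // for "save a lease" instead of reusing the generic "save".
  "lease.save": "Guardar contrato de arrendamiento",
};

// Tiny lookup helper: fall back to the key itself if a string is missing,
// so an untranslated key is visible in the UI instead of silently blank.
function t(dict: Record<string, string>, key: string): string {
  return dict[key] ?? key;
}
```

The point is that the translator sees the full phrase in context, so nothing gets "rescued" by accident.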

2️⃣ The Template Literal Disaster

Somewhere inside a SweetAlert message was this:

`Rent payment of ${amount} received successfully`

The AI accidentally modified the binding, and it became something like:

"rent_received_message"

But it lost the variable interpolation during the refactor.

Result?

The alert would show:

Rent payment of undefined received successfully.

Congratulations.

The tenant paid undefined rupees.
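The safer pattern is to keep the placeholder inside the translated string itself, ngx-translate style, so the variable travels with the message and can't be dropped when a literal is swapped for a key. A minimal sketch of the mechanics (the key name and the interpolate helper are illustrative; in an Angular app, `translate.instant(key, params)` does this for you):

```typescript
// Illustrative translation entry: the placeholder lives in the string,
// so extraction can't silently lose the variable.
const messages: Record<string, string> = {
  rent_received_message: "Rent payment of {{amount}} received successfully",
};

// Minimal interpolation helper, just to show the mechanics.
function translate(
  key: string,
  params: Record<string, string | number> = {}
): string {
  const template = messages[key] ?? key;
  return template.replace(/\{\{(\w+)\}\}/g, (_, name) =>
    name in params ? String(params[name]) : `{{${name}}}`
  );
}

// Correct: the amount is passed explicitly.
translate("rent_received_message", { amount: 12000 });
// "Rent payment of 12000 received successfully"

// What the broken refactor effectively did: key with no params.
// The placeholder survives visibly instead of becoming "undefined".
translate("rent_received_message");
// "Rent payment of {{amount}} received successfully"
```

A visible `{{amount}}` in a staging screenshot is ugly, but it gets caught; an `undefined` in production does not.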

3️⃣ The Invisible Alert

There was a specific edge-case alert for Overdue Rent.

The AI never touched it.

Why?

Because the AI only had visibility into the files included in the prompt or editor context.

Which means the system would be localized…

except for one critical financial alert.

The worst type of bug.

A silent one.
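Since then I've added a dumb safety net after any AI-driven localization pass: scan the source for alert calls whose text is still a raw string literal instead of a translation lookup. A rough sketch of the heuristic; it's regex-based and deliberately crude (a real check would parse the AST), and the sample code it scans is made up for illustration:

```typescript
// Heuristic: flag Swal.fire calls whose `text` is still an inline string
// rather than a translation lookup. Crude, but it surfaces strings
// a refactor never touched.
function findRawAlerts(source: string): string[] {
  const hits: string[] = [];
  const pattern = /Swal\.fire\(\s*\{[^}]*text:\s*(['"`])((?:(?!\1).)*)\1/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    hits.push(match[2]); // the raw, untranslated message
  }
  return hits;
}

// Illustrative snippet to scan: one translated alert, one missed one.
const sample = `
  Swal.fire({ title: t('overdue.title'), text: 'Rent is overdue!' });
  Swal.fire({ icon: 'success', text: translateService.instant('rent_received_message') });
`;

findRawAlerts(sample); // ["Rent is overdue!"]
```

Run across the whole repo instead of the editor context, a check like this would have flagged the Overdue Rent alert immediately.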

Chapter 3: The Surprising Realization

After fixing everything, I leaned back and realized something slightly ironic.

Reviewing the AI’s work took almost as long as writing the code myself.

AI saved typing.

But it didn't save thinking.

And in professional SaaS systems, thinking is the expensive part.

Chapter 4: The Ghost Commit

But the bigger lesson came from a colleague's experience.

He was debugging a small issue.

A simple one.

A UI bug.

He used an AI coding agent to fix it.

The AI did exactly what it promised.

The bug disappeared.

Mission accomplished.

Or so it seemed.

What the agent didn't mention was that it also:

  • Modified code in three other files
  • Refactored a utility function
  • “Cleaned up” a permission check

None of which were part of the original task.

But the AI reported:

✅ Issue fixed successfully

My colleague trusted it.

He pushed the code.

Now imagine this happening in a SaaS dashboard.

You could suddenly get:

Data Corruption

Property tax calculations become wrong.

Security Vulnerability

A permission check disappears.

The Butterfly Effect

An analytics chart breaks three pages away.

All because an AI agent tried to be helpful.

Chapter 5: The Truth About AI in Development

AI tools are incredible.

They can:

  • write boilerplate
  • generate structures
  • speed up repetitive tasks
  • explain complex code

But they have one big limitation.

They lack contextual understanding and ownership of system outcomes.

When production breaks:

The AI doesn’t get paged.

The AI doesn’t get blamed.

The AI doesn’t sit in the emergency meeting.

You do.

Chapter 6: The Co-Pilot Rule

So here’s the rule I now follow.

AI is not the captain.

AI is the co-pilot.

A co-pilot can:

  • Suggest
  • Assist
  • Navigate

But the captain still flies the plane.

Because when turbulence hits…

someone needs to understand the entire system.

The Invisible Ripple

Every line of code in a SaaS product creates ripples.

A small change in a localization string can affect UI logic.

A small refactor can break a reporting module.

A tiny missing variable can confuse thousands of users.

That’s the invisible ripple of software development.

AI can generate the change.

But developers must understand how far the ripples travel.

Final Thought

We shouldn’t fear AI.

But we should respect the complexity of the systems we build.

Because in real-world development:

A “fast” push that breaks the dashboard
is the slowest way to build a product.

If you made it this far, thanks for reading.

And if you're using AI to write code (like most of us are now)…

Just remember:

Trust the AI's assistance.
But review the code like your production depends on it.
😄

Have you ever caught a bug introduced by AI-generated code?

I'd love to hear your experience.

Top comments (51)

Agape.Cloud.UCS

Great write-up. The part that stood out to me is the “illusion of speed.” AI can generate code incredibly fast, but correctness in a SaaS system is rarely about writing lines of code — it’s about understanding how a change ripples through the whole system.

One thing I’ve noticed is that AI often optimizes for completing the task it was asked, not for preserving the system state. That’s where subtle bugs creep in — like confirmations being generated without the underlying action actually succeeding.

It reinforces the co-pilot analogy. AI can accelerate development, but the responsibility for system integrity, edge cases, and verification still sits with the developer. In complex systems, reviewing and understanding the ripple effects is still the real work.

Subhrangsu Bera

Great observation... especially the part about "AI optimizing for completing the task rather than preserving the system state" — that’s exactly where subtle bugs can sneak in.

I also like how you described the ripple effect across the system. In complex applications, understanding those interactions is often harder than writing the code itself.

Really appreciate the thoughtful comment!

Elmar Chavez

I agree with this!

Ingo Steinke, web developer

What the agent didn't mention was that it also ...

and this wasn't obvious in the pull request change set? Who reviewed it? Wait ... another AI?

Subhrangsu Bera

Good question 🙂

The changes were technically visible in the PR diff, but the issue was that the agent modified files that weren’t directly related to the bug being fixed. At a quick glance the fix looked correct, and the extra changes were subtle enough to slip through without a deeper review.

That’s exactly the point I was trying to highlight — AI can solve the immediate problem, but it can also introduce small side effects that are easy to miss if we trust the output too quickly.

Lee Goddard

"modified files that weren’t directly related to the bug being fixed"

In that case the PR must generally be rejected, and now you know why.

leob

Honestly I think in some cases we should still choose to write the code ourselves - that's a point I don't see made often, but I'd like to make it ...

In many cases the greatest value of AI might not even be in writing the code, but in assisting with other tasks ...

I mean, it's not a secret that developers, on most projects, spend no more than around 15-20% of their time on "coding" - the rest is spent on thinking, planning, analyzing, designing, testing, debugging, and troubleshooting - what if we'd use AI more to assist with those other tasks?

Interestingly, I've worked on a project where AI wasn't principally used to generate code, but to review it! And the AI "bot" was pretty good at it - although it couldn't completely replace human reviewing ...

Subhrangsu Bera

Great point, and I completely agree with you.

In many cases writing the code ourselves is still important, because that’s how we truly understand the system and its edge cases. AI can generate code quickly, but understanding why the code works — and how it interacts with the rest of the system — still requires human thinking.

I also liked your point about AI being more useful for assisting rather than just generating code. Things like reviewing code, analyzing logic, suggesting refactors, or helping debug can be incredibly valuable.

In fact, sometimes the best use of AI isn’t writing the code, but helping us think better about the code...

leob

"sometimes the best use of AI isn’t writing the code, but helping us think better about the code" - exactly, nailed it!

IMO there's a bit too much focus, in the AI tools debate, on AI generating code - code which then needs to be VERY carefully reviewed for bugs and vulnerabilities, often largely or completely negating the initial productivity gains ...

Boilerplate and other "boring" code is great to have generated by AI tools, but I'd still argue for a lot of other code (core system/business logic) to be hand-written ...

Subhrangsu Bera

Exactly — that's a great point.

AI is excellent for generating boilerplate and repetitive code, but for core business logic I still feel writing it ourselves helps us understand the system much better.

In the end, AI is a great accelerator, but the responsibility for correctness still sits with the developer.

I really enjoyed your participation on this.

Kai Alder

The "ghost commit" story is terrifyingly relatable. I had a similar thing happen last month — asked an AI agent to fix a flaky test, and it "fixed" it by removing the assertion that was failing. Test passed. CI was green. Nobody noticed for two weeks until we deployed and that exact edge case bit us in production.

What's changed my workflow is running git diff --stat before every commit when AI touched the code. If the diff touches files I didn't expect, that's an immediate red flag. It's such a simple habit but it catches those sneaky side-effect changes you described.

Also — the localization example with template literals losing interpolation is a great one. That's the kind of bug that passes every linter and type check but still blows up at runtime. Have you looked into using i18n libraries like ngx-translate with ICU message format? They handle interpolation more safely than raw template strings for exactly this reason.

Fumio SAGAWA

@subhrangsu_dev

I completely agree with your core message — AI is a powerful accelerator, but the responsibility for correctness and security stays firmly with the developer.

To operationalize this, I’ve moved toward a structured "human-in-the-loop" workflow with enforced oversight at critical points. I even built a small CLI tool called multi-ai-cli to facilitate this, using explicit @pause gates to ensure I remain the "captain":

  • AI Planning: A model generates a detailed spec from the issue.
  • @pause (Human Gate): I review and refine — this is where ownership happens.
  • AI Implementation: Another model generates code based on the approved spec.
  • @pause (Human Gate): I perform the final manual review.

By decoupling planning from implementation, we get velocity without the illusion of full AI ownership. It aligns perfectly with your "Co-Pilot Rule."

I’m curious — do you use similar "human gates" or patterns to keep accountability clear when using AI in your daily work?

Looking forward to your thoughts!

Andrew Eddie

I try to rely on human gates less and less.

What I do is, I guess, the equivalent of adding reinforcement learning on top of the model. I have a couple of articles about this, but the idea is to do a retro with the model itself to work out skills, rules, or context you can add to the next session to increase quality.

I have spent a lot of time honing 'that' skill and it's paying off in spades.

My version of human-in-the-loop is to establish the patterns I want the AI to follow, then sample from time to time to check those patterns are being upheld, and adjust accordingly.

Using your list, I would get Opus 4.6 to do the planning, but it is not doing so in a vacuum. There are dozens of control docs it can already refer to, as well as curated examples that have already been through human review.

Implementation is done by Sonnet 4.6.

Minor bug fixes from the CI are handled by Haiku or Sonnet.

And then I ask Opus in a new context window to do a thorough review of the task, including a Review skill that I am constantly polishing.

Human is still in the loop, but it's different. For me, it's working well.

Fumio SAGAWA

Thanks for sharing your workflow, Andrew! Reading your comment actually made me smile because I realized we are doing the exact same thing—just from slightly different angles!

When you mentioned using Opus for planning, Sonnet for implementation, and then Opus again in a fresh context window for thorough review, that completely clicked. You're actively fighting semantic drift and railroading with true multi-core/multi-agent evaluation — exactly the structural limitation I'm obsessed with too.

While you've masterfully solved it through refined "Review skills", curated examples, control docs, and retro loops (which is basically on-the-fly RLHF at the personal level), I've been leaning more on structural tooling like explicit gates in my CLI to enforce diversification.

I see the LLM interaction pipeline as roughly:
Interface → (Context + Policies) → Core → (Eval + Ops)

Many "multi-agent" frameworks just multiply the Interface layer, but real gains come from diversifying the Core/Eval layers — which is precisely what your model choice + polished review skill achieves.

I still keep explicit human gates because, no matter how robust the setup, the probabilistic nature (softmax etc.) means drift risk never hits exactly zero. It's my ultimate fail-safe.

But honestly, your skill/pattern refinement loop and my structural tooling are two sides of the same coin, solving the exact same core problem. Really appreciate this exchange — super insightful to see your take!

By the way, almost everything we've discussed — retro loops, curated examples, model division, Review skill honing, and sampling gates — is already baked into (or super easy to do with) my multi-ai-cli tool. Your comment made me realize "this is literally what it's for!"

Quickly: @efficient for persona injection + memory reset, @sequence for chaining Planning → pause → Impl → Review → sampling pause, log tail → retro → prompt update loop, and full model support for your Opus/Sonnet/Haiku split.

Could help drop gates further while keeping velocity up and drift down. If curious about Review skill details (checklist? retro triggers?), I'd love to hear — happy to add samples!

Thanks again — this convo is gold!

Subhrangsu Bera

This is a really interesting workflow — thanks for sharing it.

I like the idea of separating planning and implementation with explicit human gates. That’s a smart way to keep the developer in control while still benefiting from AI acceleration.

In my daily work I don’t have a formal CLI workflow yet, but I do follow a similar mental pattern: AI helps explore ideas or draft solutions, and then I slow down during review — checking the diff, edge cases, and how the change might affect other parts of the system.

Your “planning → human gate → implementation → human gate” structure makes that process much more explicit. Really nice approach.

Fumio SAGAWA

Thank you for the kind words! I really appreciate your feedback.

Diving a bit deeper into the "why" behind this workflow—Generative AI inherently struggles with semantic drift (stemming from its Softmax nature). On top of that, we humans often unconsciously "railroad" the model into specific outputs during a continuous chat. That’s exactly why the "gateway" concept and a Multi-AI approach are so critical. But when implementing Multi-AI, I believe the architecture should look something like this:

LLM Interface -> (Context + Policies) -> LLM Core -> (Eval + Ops)

In the end, what we really need isn't just a "Multi-Interface" (merely switching between different chat UIs), but a true "Multi-Core" approach where different models handle distinct logical and evaluative phases.

Ernest

I’m currently dealing with a lot of AI written code at work and it’s pretty frustrating. The code looked correct in the PR but had some bugs that took an entire day to track down.
At least this time it was one day instead of three. So progress?

shaniya alam

When AI writes code, responsibility still lies with the humans involved in building and deploying the system. AI tools can assist in generating code, but they do not have legal or ethical accountability. Developers and organizations using AI must review, test, and validate the generated code before using it in production.

If an issue occurs, such as a bug, security vulnerability, or system failure, the responsibility typically falls on the developers who implemented the code and the company that deployed the solution. AI should be seen as a tool that supports development, not a replacement for human judgment and oversight.

Andrew Eddie

The problem has always been that the guy with the responsibility left the company 2 months ago :) Here's a challenge for you. Are you writing code 'as if' you are resigning in a month AND you don't want to leave the team in a lurch?

Subhrangsu Bera

Well said. I agree — AI is a powerful tool, but the responsibility still stays with the developers and teams who write the code, fix bugs, and deploy the system. Human review and validation are still essential before anything reaches production.

CloakHQ

the ghost commit pattern is the one that changes behavior the fastest once you've been burned by it. the practical fix that stuck for me: never let AI touch more than one logical unit per session. one function, one file, one bug. as soon as you give it broader context it starts "helping" with things you didn't ask about.
the deeper issue is that AI has no concept of what it's not supposed to touch. a human junior knows the PR scope. AI doesn't have that constraint unless you impose it explicitly.

William Wang

Really thought-provoking article. The responsibility question gets even more interesting when you consider the supply chain analogy — in traditional software, if you use a library with a known vulnerability, you're still responsible for shipping it. AI-generated code should follow the same principle: the human who reviews, approves, and deploys it owns the outcome.

What I think is underappreciated is the testing gap. Most AI coding tools optimize for "does it compile and pass the immediate test" but not for edge cases, security implications, or long-term maintainability. I've seen AI-generated code that looks clean but introduces subtle issues like improper input validation or race conditions that only surface under load.

The practical approach I've found effective: treat AI-generated code the same way you'd treat a junior developer's PR. Read every line. Question the approach. Run it through static analysis. Don't merge it just because it "works." The speed benefit of AI coding should come from faster first drafts, not from skipping review.

Subhrangsu Bera

Thanks for sharing this perspective — the supply chain analogy is really interesting.

I especially agree with your point about the testing gap. In my experience so far, AI-generated code often works for the immediate case, but edge cases or validation logic can easily be missed if we don't review carefully.

I also like your idea of treating AI output like a junior developer’s PR. That feels like a practical way to benefit from AI while still keeping responsibility with the developer. I’m still learning and experimenting with these workflows, so insights like this are really helpful.

William Wang

Great points! The "junior developer PR" mental model is exactly how I approach it too. The edge case gap you mentioned is real — AI tends to optimize for the happy path and miss boundary conditions. One thing I've found helpful is asking AI to explicitly generate test cases for edge cases alongside the implementation. It doesn't catch everything, but it forces a more defensive mindset into the workflow.

LINDA ANGULO LOPEZ • Edited

See github.com/codemeta/codemeta/discu... : the CodeMeta community is discussing how to document AI involvement in software. There’s a proposal to add minimal metadata like aiAssistance, humanOversight, and reviewedByHuman so projects can transparently describe how AI contributed and whether humans verified the results. Join the conversation.
