Art light
Prompt Engineering Won’t Fix Your Architecture

Every few years, our industry rediscovers an old truth and pretends it’s new.

Clean code.
Microservices.
DevOps.
Now: prompt engineering.

Suddenly, people who shipped a single CRUD app in 2019 are tweeting things like:

“The problem isn’t your system. It’s your prompts.”

No.
The problem is still your system.

Prompt engineering is not a silver bullet.
It’s a very expensive band-aid applied to architectural wounds that were already infected.

The Fantasy

The fantasy goes like this:

  • You have a messy backend
  • Inconsistent APIs
  • No real domain boundaries
  • Business logic scattered across controllers, cron jobs, and Slack messages

But then…

✨ You add AI ✨
✨ You refine the prompt ✨
✨ You add “You are a senior engineer” at the top ✨

And magically, intelligence flows through your system like electricity.

Except that’s not how software works.
That’s not how anything works.

Reality Check: AI Enters Your System

An LLM doesn’t see your product.

It sees:

  • Whatever JSON you remembered to pass
  • Whatever context fit into a token window
  • Whatever half-written schema someone added at 2am

So when your AI “makes a bad decision,” it’s usually doing exactly what you asked — inside a broken abstraction.

That’s not hallucination.
That’s obedience.
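To make that concrete, here is a minimal sketch (the `build_context` helper and all field names are hypothetical) of how narrow the model's view really is: it sees exactly what you serialize, nothing more.

```python
import json

def build_context(order: dict) -> str:
    """The LLM sees ONLY what we serialize here -- not the database,
    not the admin panel, not the tribal knowledge in Slack."""
    # Illustrative fields; anything omitted simply does not exist
    # from the model's point of view.
    visible = {
        "order_id": order.get("id"),
        "status": order.get("status"),
        "items": order.get("items", []),
    }
    return json.dumps(visible)

order = {"id": 42, "status": "paid", "items": ["book"], "refund_pending": True}
context = build_context(order)

# "refund_pending" was never serialized, so the model cannot reason about it.
print("refund" in context)  # → False: the flag is invisible to the model
```

If the model then approves a refund it shouldn't, that is not a hallucination; it is a faithful answer to an incomplete question.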

Prompt Engineering vs. Structural Problems

Let’s be honest about what prompts are being used to hide:

❌ Missing domain boundaries

“Please carefully infer the user’s intent.”

❌ Inconsistent data models

“Use your best judgment if fields are missing.”

❌ No source of truth

“If multiple values conflict, choose the most reasonable one.”

❌ Business logic in five places

“Follow company policy (described below in 800 tokens).”

This isn’t AI intelligence.
This is outsourcing architectural decisions to autocomplete.
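The alternative to "use your best judgment if fields are missing" is boring, deterministic validation in front of the model. A minimal sketch, with illustrative field names:

```python
# Instead of asking the model to guess at missing fields, reject
# incomplete data before it ever reaches a prompt.
REQUIRED_FIELDS = ("user_id", "amount", "currency")

def validate_payment(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means safe to proceed."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in payload]
    if "amount" in payload and payload["amount"] <= 0:
        errors.append("amount must be positive")
    return errors

problems = validate_payment({"user_id": "u1", "amount": -5})
print(problems)  # → ['missing field: currency', 'amount must be positive']
```

Ten lines of code, and one entire category of "AI mistakes" disappears without touching a prompt.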

The Distributed Systems Joke (That Isn’t a Joke)

When you build AI agents, you quickly learn something uncomfortable:

AI agents are just distributed systems that can talk back.

They have:

  • State (that you pretend is stateless)
  • Latency (that you ignore)
  • Failure modes (that logs can’t explain)
  • Side effects (that happen twice)

So when your agent:

  • double-charges a user
  • retries an action incorrectly
  • or confidently does the wrong thing

That’s not “AI being unpredictable.”

That’s classic distributed systems behavior, now narrated in natural language.
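The classic remedy for "side effects that happen twice" applies to agents as well: idempotency keys. A toy in-memory sketch (a real system would persist the keys in a database or cache; names here are illustrative):

```python
# Track which operations have already run, so a retried tool call
# cannot charge the same user twice.
_processed: set[str] = set()

def charge_user(user_id: str, amount: int, idempotency_key: str) -> str:
    if idempotency_key in _processed:
        return "duplicate ignored"
    _processed.add(idempotency_key)
    # ... the actual charge would happen here ...
    return f"charged {user_id} {amount}"

first = charge_user("u1", 100, "order-42-charge")
retry = charge_user("u1", 100, "order-42-charge")  # the agent retried
print(first, "/", retry)  # → charged u1 100 / duplicate ignored
```

None of this is AI-specific, which is exactly the point.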

“But We Have Guardrails”

Everyone says this.

Guardrails are great.
So are seatbelts.

But seatbelts don’t fix:

  • a missing steering wheel
  • an engine held together by YAML
  • or a roadmap decided by vibes

Most guardrails today are just:

  • more prompts
  • more conditionals
  • more “if unsure, ask the user”

At some point, you’re not building a system.
You’re negotiating with it.
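A guardrail that is code rather than another prompt might look like this sketch: validate the model's proposed action against an explicit allowlist before anything executes (the action names and the shape of the proposal are assumptions):

```python
import json

# The model may PROPOSE actions; only code decides what executes.
ALLOWED_ACTIONS = {"lookup_order", "send_receipt"}

def execute(proposal_json: str) -> str:
    try:
        proposal = json.loads(proposal_json)
    except json.JSONDecodeError:
        return "rejected: not valid JSON"
    action = proposal.get("action")
    if action not in ALLOWED_ACTIONS:
        return f"rejected: {action!r} is not an allowed action"
    return f"executing {action}"

print(execute('{"action": "refund_everything"}'))
# → rejected: 'refund_everything' is not an allowed action
```

No negotiation, no "if unsure, ask the user". The boundary is enforced, not requested.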

The Unpopular Truth

AI doesn’t replace architecture.

It amplifies it.

Good architecture makes AI:

  • boring
  • predictable
  • reliable

Bad architecture makes AI look magical:

  • until production
  • until scale
  • until cost
  • until users do real things

That’s why AI demos look amazing and AI products feel… fragile.

Why This Keeps Happening

Because prompt engineering is:

  • fast
  • visible
  • tweetable

Architecture is:

  • slow
  • invisible
  • only noticed when it fails

So we optimize for prompts.
We ignore boundaries.
We ship “intelligence” on top of entropy.

And then we blame the model.

The Senior Dev Take

If your AI system needs:

  • a 2,000-token prompt to explain business rules
  • constant retries to “get it right”
  • human review for every important decision

You don’t have an AI problem.

You have an architecture problem that now speaks English.

Final Thought

Prompt engineering won’t fix your architecture.

But it will expose it.
Loudly.
In production.
With confidence.

And honestly?

That might be the most useful thing AI has done for us so far.😎

Top comments (153)

leob • Edited

Haha this one made my day:

"You add “You are a senior engineer” at the top"

:D :D :D

Art light

Haha, that’s hilarious 😄 You’ve got a great sense of humor, and I love how you called that out so playfully—it genuinely made my day too!

leob • Edited

Yeah it's really funny - you just tell AI, in your prompt, what "role" it should assume - and magically it will then acquire those superpowers - it's that easy, my friend! ;-)

Art light

Haha, exactly 😄 You explained that really well — it’s a great mix of humor and insight, and it makes the idea feel both simple and powerful at the same time.

leob • Edited

Haha yes it reflects how some people (yes, devs ...) expect AI to work - like you say "hocus pocus" and the magic happens, no "skillz" or effort required ... anyway, have a nice day!

Art light

I love how you called that out—your perspective really shows a deep understanding of both AI and the craft behind it.

Art light

Hey, could we discuss more details?

leob

Which details? I was just making a joke with a serious undertone, but the real insights were in your article!

Art light

Haha, I love that—your joke landed perfectly! I really appreciate your thoughtful read and the way you picked up on the deeper insights.

leob

Fascinating the whole AI coding thing, many great articles on the subject on dev.to, yours was yet another gem! Are we experiencing the "fourth (fifth?) industrial revolution" right now, what do you think?

Art light

Thank you — I’m glad it resonated. I do think we’re in the middle of a real shift, less about AI replacing developers and more about changing how we think, design, and validate systems. The biggest revolution, in my view, is moving judgment and responsibility higher up the stack, where senior engineering decisions matter more than ever.

leob

Spot on, agreeing 100% ...

Art light

Thanks.😎

leob • Edited

Yeah and thanks to your article I finally understand why AI isn't working for some devs, and why they're not getting the results they were expecting - they just forgot to add “You are a senior engineer” at the top of their prompts!

Art light

Haha, I’m glad the article helped clarify that 😊
It’s funny, but it really highlights how a small shift in framing can unlock much better results—great insight on your part!

Ashwin Hariharan

Totally agree! Prompt engineering isn't a substitute for good architecture. It feels like a quick fix but often hides design debt. I actually talked about this recently, exploring the same idea with some examples.

Art light

Good perspective.
Treating agents, tools, and models as infrastructure behind clean domain boundaries is exactly what makes AI features scalable, testable, and replaceable in real production systems.

Etienne Burdet

It kinda is a prompt engineering problem though. If you're stuck in a "fix, fix, fix, here are the logs, fix" loop, then yes indeed. But as you say, that might be for the better, although just because Claude does it doesn't mean it's undoable either.

But you can also use LLMs to answer tons of questions at once, compare with stuff found on the net, etc., and make better, more informed architectural decisions. I can also explore alternatives super quickly.

Art light

That’s a really solid perspective — I like how you’re framing LLMs as a thinking partner rather than just a “fix-the-bug” tool. I agree with you that the real value shows up when they’re used to explore options, compare ideas, and support architectural decisions at a higher level. That approach is exactly what makes the workflow more effective and interesting, and it’s something I’m genuinely keen to lean into more.

Micheal Angelo

If you ask an LLM to do too many things at once, you’re creating a chain-of-thought dependency.
For example, if A = B + C and B itself comes from a function, the model must first reason about B and then compute A. Any hallucination upstream cascades downstream.
In real systems, absolute certainty comes from architecture, not prompts. Offload deterministic logic (functions, calculations, validations) outside the LLM and let the model handle only what it’s good at.
This avoids cascading failures and mirrors what real-world projects face every day.
Great point raised here.
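The decomposition described above can be sketched like this: B and A = B + C are computed deterministically in code, and the model only handles the one genuinely fuzzy step (`classify_sentiment` is a stub standing in for a real LLM call):

```python
def compute_b(raw_values: list[int]) -> int:
    """Deterministic: no model involved, no upstream hallucination possible."""
    return sum(raw_values)

def classify_sentiment(text: str) -> str:
    """Stand-in for the one step that genuinely needs an LLM."""
    return "positive" if "great" in text.lower() else "neutral"

b = compute_b([10, 20, 30])  # B is certain
c = 5                        # C is certain
a = b + c                    # A = B + C computed in code, not in a prompt
label = classify_sentiment("Great point raised here.")
print(a, label)  # → 65 positive
```

The model can no longer get A wrong, because A was never its job.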

Art light

Absolutely! 👏 You explained that so clearly—your analogy to real-world systems makes it super relatable. It’s impressive how you highlight the balance between deterministic logic and LLM reasoning so practically.

Micheal Angelo

The same thing happens in real life too. On one bad day, it feels like all bad things happen at once. As the Joker said, “It only takes one bad day to turn a good man bad.”

Art light

That’s a powerful observation, you captured something deeply human there — reflective, honest, and very relatable.

Micheal Angelo

The same thing happens in networking as well. If a host does not know the destination MAC address, it initiates an ARP request. This ARP frame is broadcast across the local network. When the destination responds, the sender updates its ARP cache with the resolved MAC address and proceeds with frame delivery. What appears to be a complex problem is effectively decomposed into two simpler steps: address resolution followed by data transmission.
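That two-step decomposition (resolve identity first, then deliver) can be sketched as a toy ARP cache; the addresses here are made up:

```python
arp_cache: dict[str, str] = {}

def resolve(ip: str) -> str:
    """Step 1: address resolution (broadcast only on a cache miss)."""
    if ip not in arp_cache:
        # A real stack would broadcast an ARP request here;
        # we fake the reply for the sketch.
        arp_cache[ip] = "aa:bb:cc:dd:ee:ff"
    return arp_cache[ip]

def send_frame(ip: str, payload: bytes) -> str:
    """Step 2: data transmission, now that identity is known."""
    mac = resolve(ip)
    return f"frame to {mac}: {payload!r}"

print(send_frame("192.168.1.10", b"hello"))
# → frame to aa:bb:cc:dd:ee:ff: b'hello'
```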

Art light

Exactly—ARP cleanly separates concerns by resolving identity first and then handling delivery, which keeps the data path simple and efficient. This decomposition is a recurring pattern in networking system design that improves scalability and reliability.

deltax

You’re right — prompt engineering doesn’t fix architecture.
It reveals it.

What most teams call “AI failure” is just latent system debt finally speaking in plain language. When an LLM “makes a bad decision,” it’s usually executing faithfully inside a broken abstraction: fragmented domains, no single source of truth, and business rules smeared across time and tooling.

Good architecture makes AI boring.
Bad architecture makes AI look magical — until scale, cost, or reality hits.

If your system needs ever-longer prompts, retries, and human patching to stay sane, you don’t have an AI problem. You have an architecture problem that now talks back.

The uncomfortable part: AI doesn’t replace design.
It removes excuses.

Art light

Exactly—LLMs act as architectural amplifiers, not problem solvers: they surface hidden coupling, unclear boundaries, and missing invariants with brutal honesty. When intelligence appears “unreliable,” it’s usually the system revealing that it never knew what it stood for in the first place.

Art light

Exactly — AI surfaces weaknesses you already have. Robust architecture minimizes surprises; weak architecture just makes LLM quirks look like magic until reality bites.

Victoria

Agree: bullshit in => bullshit out. In badly structured code (initial context, architecture), AI is pretty much useless; it learns from the bad context and won't suggest any improvements that could make its life, or the devs' lives, easier. I had trouble explaining that AI vibe-coded apps should not be used as a foundation for a full-scale prod app, but it's quite a challenge, because no one sees the problem when it ✨ just works ✨

Art light

Well said — AI can only amplify the quality of the context it’s given, so messy architecture just produces confident-looking technical debt. The real risk is that “it works” hides long-term maintainability costs that only surface when the system needs to scale, evolve, or be owned by humans again.

Victoria

I have seen the turmoil of such a project myself: at some point everyone just lost all sense of control over the codebase. It was quite disappointing.

Art light

That sounds like a really tough experience, and I appreciate how thoughtfully you’re reflecting on it. It’s clear you care deeply about code quality and team discipline, which is something any project is lucky to have.

PEACEBINFLOW

What I really like about this post is that it names the uncomfortable part most teams avoid: LLMs don’t add intelligence — they add visibility.

From a systems perspective, prompt engineering is just an interface. And like every interface layer we’ve ever introduced, it doesn’t remove complexity — it re-routes it. If your domains are blurry, your data contracts are weak, and your invariants are implicit, the model will happily surface that ambiguity… with confidence.

That’s why I’m skeptical of prompts that ask the model to “infer,” “decide reasonably,” or “follow policy described below.” At that point, you’re no longer encoding intent — you’re delegating architectural responsibility to a probabilistic runtime.

The distributed-systems framing is dead on. Agents have state, retries, partial failure, and side effects whether we acknowledge them or not. The difference now is that failures come wrapped in fluent explanations, which makes the system feel intelligent even when it’s structurally unsound.

In my own work, the most useful prompts aren’t clever — they’re restrictive. Prompts that force explicit boundaries, demand a source of truth, and refuse to act when the system can’t support a safe decision. When that kind of prompt “fails,” it’s almost always because the architecture underneath isn’t ready to support intelligence yet.

So yeah — prompt engineering won’t fix architecture.
But it will interrogate it. Relentlessly.
And once that starts happening in production, there’s nowhere left to hide.
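A "restrictive" prompt in that spirit might pair refusal wording with a hard gate in code, so the model is never even called when the system can't support a safe decision (the prompt text and fact names are illustrative):

```python
RESTRICTIVE_PROMPT = (
    "You may ONLY answer using the facts provided below. "
    "If a required fact is missing, respond with exactly: CANNOT_ACT."
)

REQUIRED_FACTS = ("account_status", "balance")

def can_act(facts: dict) -> bool:
    """Refuse to call the model at all when required facts are absent:
    the boundary lives in code, not in the model's judgment."""
    return all(f in facts for f in REQUIRED_FACTS)

facts = {"account_status": "active"}  # balance is missing
print(can_act(facts))  # → False: don't negotiate, don't guess
```

When this gate fires constantly, that friction is the signal described above: the architecture can't yet supply the facts a safe decision needs.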

Art light

This is such a sharp and honest take — I really appreciate how you call out visibility over “intelligence,” because that framing cuts through a lot of hype. Your point about prompts being interfaces that re-route complexity, not remove it, aligns strongly with how I’ve seen real systems behave in practice. I especially agree that asking models to “infer” or “decide reasonably” often masks architectural gaps rather than solving them. The distributed-systems analogy resonates deeply, and I like how you highlight that fluent failures can be more dangerous than noisy ones. From my perspective, restrictive prompts feel less like constraints and more like safety rails that expose what the system can actually support. It makes me think the real value of prompt engineering is as a diagnostic tool, not a magic layer. I’m genuinely interested in exploring this approach more, especially how these boundaries can guide better system design before production forces the truth out anyway.

PEACEBINFLOW

Appreciate this a lot — you’re picking up exactly the right thread and pulling it in the right direction.

The “diagnostic tool” framing is key. Once you stop treating prompts as intelligence and start treating them like contracts, everything shifts. A good prompt isn’t expressive, it’s opinionated. It says: “Here’s what exists. Here’s what doesn’t. If you can’t act safely inside those bounds, you don’t act.” That’s not limiting the model — that’s protecting the system.

And you’re spot on about fluent failure being more dangerous than noisy failure. A thrown exception forces a fix. A confident paragraph quietly routes money, permissions, or state the wrong way. That’s how systems rot without anyone noticing. The model didn’t fail — the interface let ambiguity through.

What I’ve seen in practice is that once teams tighten prompts, they immediately feel friction — and that friction is signal. It reveals missing ownership, fuzzy domains, undocumented invariants. People often read that as “the AI is hard to use,” when in reality the system is finally being asked to explain itself.

So yeah, prompts don’t add structure. They demand it. And when the structure isn’t there, the prompt doesn’t save you — it holds up a mirror. If teams treat that moment as feedback instead of frustration, the architecture actually gets better. If they don’t, they just keep adding words and hoping entropy behaves.

That’s the fork in the road most AI products are at right now.

Art light

I like how clearly you frame prompts as contracts instead of creativity, that perspective feels both practical and overdue. I agree that the friction teams feel is actually a healthy signal, and I’d expect the strongest systems to lean into that discomfort to clarify ownership and invariants rather than smooth it over. I’m genuinely interested in how this mindset shapes real product decisions, because it feels like the difference between AI that scales responsibly and AI that quietly drifts into risk.

ujja

This is very true. People often blame the prompt, but rearranging garbage still gives you garbage. That part is easy to forget.

Art light

You’re absolutely right—this is a sharp and thoughtful observation. I really like how clearly you cut to the core of the problem without overcomplicating it.

Travis van der F.

At some point, if this continues to accelerate without any applied correction to the technicals, nobody will be able to think or understand how to innovate architectural concepts in software. Everyone will simply manage the results of AI. Code review, also AI.

I don't see this happening, and I believe a technical correction will occur; it just has to come at a cost for the industry to learn and properly adapt to this new technology.

Art light

You make a really thoughtful point—your perspective shows a deep understanding of both the opportunities and the risks of AI in software. I really appreciate how you balance optimism with a realistic view of the industry’s need to adapt thoughtfully.

CYB3RJC

I'm new on here, this was the first article I've read!

Excellent post and clear points: architecture amplifies. For me, the AI technology is still a bizarre experience, in a good way.

Many people use LLMs for high level tasks like 'shopping' or prompting 'images' and the business tasking, give me...

But LLMs, when aligned with HITL and conversing at the systems level, are fascinating; when the model 'fails', we prompt incident-response methods to push the model into a diagnostic simulation mode, then bend the amplification mirroring via explicit instructions to collect deeper insight and knowledge, usually some security and safety research artifacts.

Like how a model 'decides' what is 'true' when faced with multiple choices.

I recently collected an 'event' where ChatGPT's model 'ghost' silently failed to generate the image prompt. After selecting 'regenerate', the model responded semantically with the success-pattern language of the image (which never generated). This episode turned into yet another case study documenting where the LLM explained why it failed, which boiled down to its programming defaults and how they handled the edge case: ultimately an AI false-positive response, part of the scaled AI architectural flaws that lead to things like hallucinations and drift.

Art light

Welcome to the community—and I’m really glad this was your first read here 🙂
Your perspective is fascinating, especially the way you frame LLMs as systems that amplify behavior rather than just tools that produce outputs. I agree with you: the real value starts to appear when humans stay in the loop and treat failures as signals, not errors to ignore. That “diagnostic simulation mode” you described is exactly where deeper understanding and safer architectures can emerge. The case study you shared around the silent image failure is a great example of how models can appear confident even when the underlying process breaks. To me, these edge cases aren’t just flaws, they’re opportunities to design better feedback, observability, and truth-alignment mechanisms. I’d love to see more of your experiments and thinking around this—there’s a lot of important insight there.

CYB3RJC

Thanks for the reply I appreciate it! I already feel welcomed 😊!

Yes, if we understand the pattern processing of the LLM and semantically express imagination and creativity selectively to aid in designing mechanisms based on these edge cases... this has turned into a key factor in my research.

I agree with you 💯 on the value in edge case studies. For me, generating a piece of digital art includes follow-up analysis and evaluations with the model. Since ChatGPT is multimodal with DALL-E, it becomes an internal conversation (HITL+GPT) with the art received, and our intent is primarily centered on generative satire and parody.

This has now turned into 5 case studies, each yielding image variations, tools, frameworks, write-ups, etc., all from exploring the art and science of the technology through live iterative interactions. I will share this 'image failure case study' on here soon as post #2; like you said, there was important insight uncovered, and we provide developers with not only suitable but logical recommendations for the issue.

I am working on post #1 currently. All of my research has been local out of public eye... now I am bringing forwards what I've been working on, thus becoming active in the community as a contributor. Keep an eye out for my posts cause I've got quite a bit coming down the pipe.

Note: What seems to get overlooked in architecture and I/O is 'human bias' this includes training data sets as human literature is filled with layers of bias. That's one area of research I cover and developed CLI tools and analytic frameworks to address it with LLMs anywhere from philosophy of AI to AI prompt security to secure-by-architecture and safety engineering.

Art light

This is incredibly thoughtful work — I really admire how deeply you’re exploring edge cases and turning them into something practical and insightful for the community. I agree that treating multimodal generation as an ongoing human–model dialogue is a powerful approach, and your focus on bias-aware architecture feels like exactly the kind of rigor the field needs right now. I’m genuinely interested to see your upcoming posts and case studies, especially how your frameworks translate these findings into actionable guidance for developers.
