hefty

Posted on May 18

Stop Calling It Vibe Coding When You Need Engineering

#ai #programming #productivity #vibecoding

The most useful thing about "vibe coding" is also the thing that makes it dangerous: it feels like progress before the system has earned your trust.

You describe the app. The model writes a lot of code. The demo starts to move. For prototypes, that is magic. For production software, it is where the bill starts.

The mistake is treating the first 70% as proof that the last 30% will be easy. It usually is not. The last 30% is where the vague requirements become edge cases, the generated architecture starts pushing back, and the missing tests stop being a detail.

That is why the more interesting shift is not "AI writes code now." We already know that. The real shift is from vibe coding to agentic engineering: structured work where agents operate inside specs, tests, memory, review loops, and clear human control.

Vibe coding is great until the code matters

Vibe coding works best when the cost of being wrong is low.

Want to explore a UI idea? Fine. Want to build a throwaway internal script? Great. Want to generate a starter app so you can see the shape of a product? That is a legitimate use case.

The problem starts when the prototype quietly becomes the foundation.

AI-generated code can look more finished than it is. It may compile. It may even pass the happy-path click test. But production work has a different standard:

Can another developer understand the structure?
Are the failure cases explicit?
Do tests cover the behavior that matters?
Is the architecture still sane after the fifth change request?
Can you safely modify it next month?

That is the part vibe coding tends to hide. The model can generate volume faster than you can inspect intent. If the workflow is just "prompt, accept, prompt again," you are not removing engineering work. You are moving it downstream, where it is harder to see.

The 70% problem is really a trust problem

The "70% problem" is a good way to frame this: AI gets you impressively far, then the remaining work becomes disproportionately expensive.

That does not mean AI coding is bad. It means code generation is not the same as software delivery.

The first 70% rewards speed. The last 30% rewards judgment. Those are different muscles.

Early on, the agent can make broad moves:

scaffold the app
wire up common patterns
generate boilerplate
suggest APIs
implement obvious flows

Later, the work becomes less about typing and more about control:

deciding what should not be abstracted
catching incorrect assumptions
tightening data boundaries
deleting clever-but-useless code
proving behavior with tests

That is why serious AI coding workflows start to look less like chat and more like engineering operations. You need constraints. You need feedback. You need durable context. You need a way to say, "This is the contract. This is the test. This is the part you are allowed to change."

Without that, the agent is just producing plausible text in a code-shaped format.

Agentic engineering changes the unit of work

The useful unit is no longer a prompt. It is a task with context, acceptance criteria, tools, and review.

That sounds less exciting than "build me an app," but it is the difference between a demo and a workflow you can keep using.
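
As a rough illustration of what "a task with context, acceptance criteria, tools, and review" could look like on disk, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `AgentTask` name, its fields, and the example paths are hypothetical and do not come from any particular agent framework.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class AgentTask:
    """Hypothetical unit of agent work: a spec, criteria, and a scope."""
    spec_path: str                   # context lives in a file, not in chat memory
    acceptance_criteria: list[str]   # what "done" means, written before implementation
    allowed_paths: list[str]         # glob patterns the agent is allowed to change
    needs_human_review: bool = True  # review stays on unless explicitly waived

    def out_of_scope(self, changed_files: list[str]) -> list[str]:
        """Return the changed files that match none of the allowed patterns."""
        return [
            path
            for path in changed_files
            if not any(fnmatch(path, pattern) for pattern in self.allowed_paths)
        ]


# Illustrative values only; the paths are made up.
task = AgentTask(
    spec_path="specs/invoice-export.md",
    acceptance_criteria=["CSV export follows the column order in the spec"],
    allowed_paths=["src/export/*", "tests/export/*"],
)

violations = task.out_of_scope(["src/export/csv.py", "src/auth/session.py"])
print(violations)  # files the agent touched outside its scope
```

The point of the sketch is the shape, not the specifics: the scope check turns "this is the part you are allowed to change" into something a script can enforce before a human ever opens the diff.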

Agentic engineering is the practice of making AI agents operate inside an engineering system:

specs before implementation
tests before trust
small scopes instead of giant rewrites
file-based handoffs instead of chat memory guesses
human review at the points where judgment matters
repeatable skills for common work

This is where tools like Hermes Agent are worth watching. The interesting part is not that it is another chatbot interface. The project points toward a more operational model: agents with memory, custom skills, subagents, and deployment options that let them run as part of a workflow instead of sitting off to the side as a text box.

That is a different posture. A coding assistant answers. An engineering agent should remember, delegate, run tools, adapt to local patterns, and leave artifacts that humans can audit.

It still needs supervision. Maybe more supervision, not less. But the supervision moves from babysitting every line to designing the system the agent works inside.

Parallel agents only help if the work is shaped correctly

Once people see agents as workers instead of autocomplete, the next temptation is obvious: run more of them.

That can help. It can also create a beautiful mess.

Parallel agents are only useful when the work can be split cleanly. If three agents all edit the same files, disagree about architecture, and invent their own assumptions, you did not gain throughput. You created a merge conflict with confidence.

The better pattern is boring and effective:

one agent explores a specific question
one agent owns a narrow implementation area
one agent verifies behavior or checks risks
all of them write results back to files
a human or orchestrator integrates the output
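
One way to check that the work really splits cleanly is to compare file ownership before any agent starts. The sketch below is a hedged illustration, not a feature of any specific tool: the `ownership_conflicts` function, the agent names, and the file paths are all hypothetical.

```python
from itertools import combinations


def ownership_conflicts(
    assignments: dict[str, set[str]],
) -> dict[tuple[str, str], set[str]]:
    """Map each pair of agents to the files that both of them claim."""
    conflicts = {}
    # Sort by agent name so the pair keys come out in a stable order.
    ordered = sorted(assignments.items(), key=lambda item: item[0])
    for (agent_a, files_a), (agent_b, files_b) in combinations(ordered, 2):
        shared = files_a & files_b
        if shared:
            conflicts[(agent_a, agent_b)] = shared
    return conflicts


# Hypothetical plan: the verifier accidentally claims an implementation file.
plan = {
    "explorer": {"notes/auth-options.md"},
    "implementer": {"src/auth/session.py", "src/auth/tokens.py"},
    "verifier": {"tests/auth/test_session.py", "src/auth/session.py"},
}

conflicts = ownership_conflicts(plan)
for pair, files in conflicts.items():
    print(pair, sorted(files))
```

If the result is non-empty, the split is not clean yet, and it is cheaper to fix the plan than to untangle the merge afterwards.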

This is also why memory and custom skills matter. If every agent starts cold, you spend a large share of each run re-explaining the codebase. If the agent can carry durable project knowledge and reusable workflows, it has a better shot at producing work that fits.

The goal is not autonomy for its own sake. The goal is less repeated context loading, fewer sloppy handoffs, and faster movement through well-defined tasks.

Programming for agents may need stricter boundaries

The Weft project is another signal in the same direction: developers are starting to think about languages and runtimes where humans, LLMs, and infrastructure are all first-class parts of the system.

That framing matters because agent work is not just "call an LLM and hope." Durable execution, explicit state, recoverable tasks, and clear boundaries become much more important when a model is allowed to act over time.

This is where the hype often gets ahead of the engineering.

Agents are not magically reliable because they can call tools. Tool access gives them more surface area to fail. The workflow has to make failure visible:

what did the agent read?
what did it change?
what assumptions did it make?
what tests did it run?
what still needs human review?

If you cannot answer those questions, you do not have an agentic workflow. You have a longer prompt chain.
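
The five questions above map naturally onto a small report that every agent run has to fill in. This is a minimal sketch under assumed names: `RunReport`, its fields, and `unanswered` are hypothetical, and `None` is used here to mean "the run never answered this," while an empty list means "answered, and the answer is nothing."

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RunReport:
    """Hypothetical per-run artifact; one field per audit question."""
    files_read: Optional[list[str]] = None     # what did the agent read?
    files_changed: Optional[list[str]] = None  # what did it change?
    assumptions: Optional[list[str]] = None    # what assumptions did it make?
    tests_run: Optional[list[str]] = None      # what tests did it run?
    needs_review: Optional[list[str]] = None   # what still needs human review?


def unanswered(report: RunReport) -> list[str]:
    """Return the names of the questions the run left unanswered (None)."""
    return [f.name for f in fields(report) if getattr(report, f.name) is None]


# Illustrative run: the agent never recorded its assumptions.
report = RunReport(
    files_read=["src/export/csv.py"],
    files_changed=["src/export/csv.py"],
    tests_run=["pytest tests/export"],
    needs_review=[],  # explicitly "nothing left to review"
)

gaps = unanswered(report)
print(gaps)
```

A pipeline could refuse to hand a run to a reviewer while `gaps` is non-empty. That does not make the agent reliable, but it does make the missing information visible.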

The practical upgrade path

You do not need to throw away vibe coding. You need to stop using it as the whole process.

A more production-friendly workflow looks like this:

Use vibe coding for exploration.
Freeze the useful direction into a short spec.
Break the work into small tasks with file ownership.
Ask the agent to implement against the spec, not the vibe.
Require tests, logs, or screenshots depending on the change.
Review the diff like you would review a human teammate's work.
Capture repeatable patterns as reusable project instructions or skills.

That last step is underrated. If you keep prompting the same rule every day, it belongs in the system, not in your short-term memory. Agents get more useful when the workflow teaches them how your project actually works.

Where this still breaks down

Agentic engineering is not a magic maturity badge.

It can add overhead. It can produce too much process around simple work. It can create false confidence if the agent writes tests that merely confirm its own misunderstanding. It can also make teams lazy about architecture if they assume "the agent will fix it later."

The rule of thumb is simple: match the process to the blast radius.

For a prototype, vibe coding is fine. For a user-facing system, you need constraints. For critical paths, you need human review, meaningful tests, and boring operational discipline.
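
"Match the process to the blast radius" can be written down as a lookup, which keeps the rule from turning into a vibe of its own. The tiers below follow the three cases just described, but the check names (`spec`, `scoped_changes`, `ops_checklist`, and so on) are assumptions chosen for illustration, not a standard.

```python
from enum import Enum


class BlastRadius(Enum):
    PROTOTYPE = "prototype"
    USER_FACING = "user_facing"
    CRITICAL_PATH = "critical_path"


# Hypothetical mapping from blast radius to the checks a change must pass.
REQUIRED_CHECKS: dict[BlastRadius, set[str]] = {
    BlastRadius.PROTOTYPE: set(),  # exploration: no gate
    BlastRadius.USER_FACING: {"spec", "tests", "scoped_changes"},
    BlastRadius.CRITICAL_PATH: {
        "spec", "tests", "scoped_changes", "human_review", "ops_checklist",
    },
}


def missing_checks(radius: BlastRadius, completed: set[str]) -> set[str]:
    """Return the required checks that have not been completed yet."""
    return REQUIRED_CHECKS[radius] - completed


done = {"spec", "tests"}
print(sorted(missing_checks(BlastRadius.PROTOTYPE, done)))
print(sorted(missing_checks(BlastRadius.CRITICAL_PATH, done)))
```

The prototype tier deliberately requires nothing, so the same gate can sit in front of every change without adding process to throwaway work.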

The future of AI coding is probably not one giant prompt that builds the perfect app. It is smaller, sharper loops where agents do real work inside systems that make their output inspectable.

That is less magical.

It is also much closer to engineering.

Top comments (2)

Harjot Singh • May 29

renaming the workflow doesn't fix the underlying gap, agreed. the real failure mode tho: vibe-vs-engineering is a false binary for greenfield builds. there's a third lane - deterministic gen w/ failure-classifier + retry caps + typed handoffs between phases - where the OUTPUT is engineering-grade but the INPUT is a single prompt. building moonshift on that bet, $3 per shipped saas, code into ur own gh + vercel. first run free if u want a counter-example for ur post.

hefty • Jun 1

I actually agree that there is a third lane here. But the moment you introduce failure classifiers, retry caps, typed handoffs, and deterministic phases, you are already moving out of “vibe coding” and into engineering.

The input being a single prompt is not the important part. The important part is whether the workflow makes assumptions visible, failures classifiable, outputs reviewable, and changes maintainable after the first run.

For greenfield builds, I can definitely see a prompt-driven pipeline getting very far, especially for scaffolding and common SaaS patterns. But the real test is not “can it ship the first version?” It is: can another developer understand the system, safely change it next month, debug the weird edge case, and trust the generated architecture after the fifth requirement change?

So yes, I’m open to counter-examples. But to me, a deterministic generation pipeline with typed handoffs is not a refutation of the post. It is basically the more structured agentic engineering workflow the post is arguing for.