The most useful thing about "vibe coding" is also the thing that makes it dangerous: it feels like progress before the system has earned your trust.
You describe the app. The model writes a lot of code. The demo starts to move. For prototypes, that is magic. For production software, it is where the bill starts.
The mistake is treating the first 70% as proof that the last 30% will be easy. It usually is not. The last 30% is where the vague requirements become edge cases, the generated architecture starts pushing back, and the missing tests stop being a detail.
That is why the more interesting shift is not "AI writes code now." We already know that. The real shift is from vibe coding to agentic engineering: structured work where agents operate inside specs, tests, memory, review loops, and clear human control.
Vibe coding is great until the code matters
Vibe coding works best when the cost of being wrong is low.
Want to explore a UI idea? Fine. Want to build a throwaway internal script? Great. Want to generate a starter app so you can see the shape of a product? That is a legitimate use case.
The problem starts when the prototype quietly becomes the foundation.
AI-generated code can look more finished than it is. It may compile. It may even pass the happy-path click test. But production work has a different standard:
- Can another developer understand the structure?
- Are the failure cases explicit?
- Do tests cover the behavior that matters?
- Is the architecture still sane after the fifth change request?
- Can you safely modify it next month?
That is the part vibe coding tends to hide. The model can generate code faster than you can check that it does what you meant. If the workflow is just "prompt, accept, prompt again," you are not removing engineering work. You are moving it downstream, where it is harder to see.
The 70% problem is really a trust problem
The "70% problem" is a good way to frame this: AI gets you impressively far, then the remaining work becomes weirdly expensive.
That does not mean AI coding is bad. It means code generation is not the same as software delivery.
The first 70% rewards speed. The last 30% rewards judgment. Those are different muscles.
Early on, the agent can make broad moves:
- scaffold the app
- wire up common patterns
- generate boilerplate
- suggest APIs
- implement obvious flows
Later, the work becomes less about typing and more about control:
- deciding what should not be abstracted
- catching incorrect assumptions
- tightening data boundaries
- deleting clever-but-useless code
- proving behavior with tests
That is why serious AI coding workflows start to look less like chat and more like engineering operations. You need constraints. You need feedback. You need durable context. You need a way to say, "This is the contract. This is the test. This is the part you are allowed to change."
Without that, the agent is just producing plausible text in a code-shaped format.
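To make "this is the part you are allowed to change" concrete, here is a minimal sketch of a scope check on an agent's diff. The function name, the glob patterns, and the file paths are all hypothetical, chosen only for illustration; a real setup would read the changed files from version control.

```python
from fnmatch import fnmatch


def out_of_scope(changed_files, allowed_patterns):
    """Return the changed files that match none of the allowed glob patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical task boundary: the agent may touch the billing module and its tests.
allowed = ["src/billing/*", "tests/billing/*"]

# Hypothetical list of files the agent modified in one run.
changed = [
    "src/billing/invoice.py",
    "tests/billing/test_invoice.py",
    "src/auth/session.py",
]

violations = out_of_scope(changed, allowed)
if violations:
    print("Agent edited files outside its scope:", violations)
```

A check like this does not prove the change is correct. It only turns one part of the contract into something a script can reject before a human spends time on the diff.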
Agentic engineering changes the unit of work
The useful unit is no longer a prompt. It is a task with context, acceptance criteria, tools, and review.
That sounds less exciting than "build me an app," but it is the difference between a demo and a workflow you can keep using.
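As a rough sketch of what "a task with context, acceptance criteria, tools, and review" could look like as data, consider the structure below. The class name, field names, and example values are assumptions made for illustration, not the format of any particular agent framework.

```python
from dataclasses import dataclass


@dataclass
class AgentTask:
    """One unit of agent work. All field names here are illustrative."""

    goal: str
    context_files: list[str]
    acceptance_criteria: list[str]
    allowed_tools: list[str]
    reviewer: str = ""

    def missing_parts(self) -> list[str]:
        """Name the parts that are still empty, so a bare prompt is easy to spot."""
        missing = []
        if not self.context_files:
            missing.append("context")
        if not self.acceptance_criteria:
            missing.append("acceptance criteria")
        if not self.allowed_tools:
            missing.append("tools")
        if not self.reviewer:
            missing.append("review")
        return missing

    def is_ready(self) -> bool:
        return not self.missing_parts()


# A bare prompt carries a goal and nothing else.
bare = AgentTask("Build me an app", [], [], [])

# A scoped task (hypothetical paths and criteria) fills in every part.
scoped = AgentTask(
    goal="Add CSV export to the invoice list",
    context_files=["docs/specs/invoice-export.md"],
    acceptance_criteria=["Export includes a header row", "Empty list yields header only"],
    allowed_tools=["run_tests"],
    reviewer="human",
)

print(bare.missing_parts())
print(scoped.is_ready())
```

The point of the structure is that an incomplete task can be rejected before any code is generated.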
Agentic engineering is the practice of making AI agents operate inside an engineering system:
- specs before implementation
- tests before trust
- small scopes instead of giant rewrites
- file-based handoffs instead of chat memory guesses
- human review at the points where judgment matters
- repeatable skills for common work
This is where tools like Hermes Agent are worth watching. The interesting part is not that it is another chatbot interface. The project points toward a more operational model: agents with memory, custom skills, subagents, and deployment options that let them run as part of a workflow instead of sitting off to the side as a text box.
That is a different posture. A coding assistant answers. An engineering agent should remember, delegate, run tools, adapt to local patterns, and leave artifacts that humans can audit.
It still needs supervision. Maybe more supervision, not less. But the supervision moves from babysitting every line to designing the system the agent works inside.
Parallel agents only help if the work is shaped correctly
Once people see agents as workers instead of autocomplete, the next temptation is obvious: run more of them.
That can help. It can also create a beautiful mess.
Parallel agents are only useful when the work can be split cleanly. If three agents all edit the same files, disagree about architecture, and invent their own assumptions, you did not gain throughput. You created a merge conflict with confidence.
The better pattern is boring and effective:
- one agent explores a specific question
- one agent owns a narrow implementation area
- one agent verifies behavior or checks risks
- all of them write results back to files
- a human or orchestrator integrates the output
This is also why memory and custom skills matter. If every agent starts cold, you spend half the run re-explaining the codebase. If the agent can carry durable project knowledge and reusable workflows, it has a better shot at producing work that fits.
The goal is not autonomy for its own sake. The goal is less repeated context loading, fewer sloppy handoffs, and faster movement through well-defined tasks.
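Whether work "can be split cleanly" can be checked before any agent starts. The sketch below assumes each agent declares the files it intends to write and flags every pair whose write sets overlap. The agent names and paths are hypothetical.

```python
from itertools import combinations


def ownership_conflicts(assignments):
    """Return (agent_a, agent_b, shared_files) for each pair with overlapping write sets."""
    conflicts = []
    for (a, files_a), (b, files_b) in combinations(sorted(assignments.items()), 2):
        shared = sorted(set(files_a) & set(files_b))
        if shared:
            conflicts.append((a, b, shared))
    return conflicts


# Hypothetical plan: each agent lists the files it is allowed to write.
plan = {
    "explorer": ["notes/export-options.md"],
    "implementer": ["src/export/csv.py", "src/export/__init__.py"],
    "verifier": ["tests/test_csv_export.py", "src/export/csv.py"],
}

for a, b, shared in ownership_conflicts(plan):
    print(f"{a} and {b} both want to write: {shared}")
```

In this plan the verifier also claims an implementation file, so the overlap surfaces as a planning problem rather than as a merge conflict later.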
Programming for agents may need stricter boundaries
The Weft project is another signal in the same direction: developers are starting to think about languages and runtimes where humans, LLMs, and infrastructure are all first-class parts of the system.
That framing matters because agent work is not just "call an LLM and hope." Durable execution, explicit state, recoverable tasks, and clear boundaries become much more important when a model is allowed to act over time.
This is where the hype often gets ahead of the engineering.
Agents are not magically reliable because they can call tools. Tool access gives them more ways to fail. The workflow has to make failure visible:
- what did the agent read?
- what did it change?
- what assumptions did it make?
- what tests did it run?
- what still needs human review?
If you cannot answer those questions, you do not have an agentic workflow. You have a longer prompt chain.
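One way to make those five questions answerable is to require a small report file from every run. The sketch below is an assumed shape, not a standard: the field names mirror the questions above, and `None` means "not recorded", which is different from an empty list meaning "nothing to report".

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class RunReport:
    """Audit record for one agent run. Field names are illustrative."""

    files_read: Optional[list[str]] = None
    files_changed: Optional[list[str]] = None
    assumptions: Optional[list[str]] = None
    tests_run: Optional[list[str]] = None
    needs_review: Optional[list[str]] = None

    def unanswered(self) -> list[str]:
        """Return the questions the agent never answered (still None)."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical run: the agent logged its reads, writes, and tests,
# but said nothing about its assumptions or what still needs review.
report = RunReport(
    files_read=["docs/specs/invoice-export.md", "src/export/csv.py"],
    files_changed=["src/export/csv.py"],
    tests_run=["tests/test_csv_export.py"],
)

print(report.unanswered())
print(json.dumps(asdict(report), indent=2))
```

A run whose report still has unanswered fields can be treated as incomplete, whatever the diff looks like.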
The practical upgrade path
You do not need to throw away vibe coding. You need to stop using it as the whole process.
A more production-friendly workflow looks like this:
- Use vibe coding for exploration.
- Freeze the useful direction into a short spec.
- Break the work into small tasks with file ownership.
- Ask the agent to implement against the spec, not the vibe.
- Require tests, logs, or screenshots depending on the change.
- Review the diff like you would review a human teammate's work.
- Capture repeatable patterns as reusable project instructions or skills.
That last step is underrated. If you keep prompting the same rule every day, it belongs in the system, not in your short-term memory. Agents get more useful when the workflow teaches them how your project actually works.
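A crude way to find rules that belong in the system is to count how often the same instruction shows up in your own prompts. The sketch below assumes you have a list of past prompt lines. The exact-match normalization and the threshold of three are arbitrary choices for illustration.

```python
from collections import Counter


def repeated_rules(prompt_lines, min_count=3):
    """Return instructions that appear at least min_count times, ignoring case and spacing."""
    normalized = (
        " ".join(line.lower().split()) for line in prompt_lines if line.strip()
    )
    counts = Counter(normalized)
    return [rule for rule, n in counts.most_common() if n >= min_count]


# Hypothetical prompt history collected over a few days.
history = [
    "Use pnpm, not npm.",
    "use pnpm,  not npm.",
    "Add a test for the export.",
    "Use pnpm, not npm.",
    "Rename the handler.",
]

print(repeated_rules(history))
```

Anything this surfaces is a candidate for a project instruction file or a reusable skill rather than another reminder typed by hand.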
Where this still breaks down
Agentic engineering is not a magic maturity badge.
It can add overhead. It can produce too much process around simple work. It can create false confidence if the agent writes tests that merely confirm its own misunderstanding. It can also make teams lazy about architecture if they assume "the agent will fix it later."
The rule of thumb is simple: match the process to the blast radius.
For a prototype, vibe coding is fine. For a user-facing system, you need constraints. For critical paths, you need human review, meaningful tests, and boring operational discipline.
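The rule of thumb can be written down as a lookup so it is applied the same way every time. The tiers below follow the three cases in the paragraph above, but the specific control names are assumptions, and a real team would choose its own.

```python
from enum import Enum


class BlastRadius(Enum):
    PROTOTYPE = 1
    USER_FACING = 2
    CRITICAL_PATH = 3


# Illustrative mapping only: which controls each tier requires is a team decision.
REQUIRED_CONTROLS = {
    BlastRadius.PROTOTYPE: set(),
    BlastRadius.USER_FACING: {"spec", "tests"},
    BlastRadius.CRITICAL_PATH: {"spec", "tests", "human_review", "ops_checklist"},
}


def missing_controls(radius, controls_in_place):
    """Return the required controls that are not yet in place, in sorted order."""
    return sorted(REQUIRED_CONTROLS[radius] - set(controls_in_place))


# A change with a spec and tests is enough for a user-facing feature,
# but not for a critical path.
in_place = {"spec", "tests"}
print(missing_controls(BlastRadius.PROTOTYPE, in_place))
print(missing_controls(BlastRadius.USER_FACING, in_place))
print(missing_controls(BlastRadius.CRITICAL_PATH, in_place))
```

Keeping the prototype tier empty is deliberate: the lookup should add process only where being wrong is expensive.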
The future of AI coding is probably not one giant prompt that builds the perfect app. It is smaller, sharper loops where agents do real work inside systems that make their output inspectable.
That is less magical.
It is also much closer to engineering.