Pizza Cat

Posted on Jun 11

The End of Vibe Coding: Why I Switched to Structured AI Workflows

#ai #webdev #productivity #saas

I spent 3 months "vibe coding" my SaaS. Then I realized I was spending more time fixing the AI's mistakes than I would have spent writing the code myself. Here's the system that changed everything.

In early June 2026, two HN threads with a combined ~1,300 comments told me something had shifted.

Thread 1 (~1,100 comments): "What was your 'oh shit' moment with GenAI?"
Thread 2 (~230 comments): "What tools have you made for yourself since AI?"

Both threads had the same pattern: people started with unfiltered excitement ("I built a whole app in one weekend!"), then hit a wall ("I'm spending more time fixing its bugs than writing code from scratch").

I know the feeling. I lived it.

The Vibe Coding Trap

When I started building MultiPost — an AI-powered cross-platform content tool — I was deep in "vibe coding" mode:

Me: "Make it look better"
AI: *adds Tailwind, restyles everything*
Me: "Add a filter by date"
AI: *adds a date picker, breaks the layout*
Me: "Fix that bug where posts don't show"
AI: *fixes the filter, introduces a null pointer*
Me: "Okay now add a dark mode toggle"
AI: *regenerates half the component from scratch*

Three weeks later I had a working feature and zero understanding
of how any of it actually held together.

This was my daily rhythm for weeks. Fast output, slow cleanup. The ratio kept getting worse as the codebase grew.

I was optimizing for speed of generation instead of speed of delivery.

The "Oh Shit" Moment

It came when I reviewed a feature I'd built entirely through unstructured AI sessions. The feature worked. But:

The code had 3 different patterns for the same thing (Auth0 token handling in one place, hardcoded keys in another)
Error handling was inconsistent — some functions returned null, others threw, others returned Result types
Database queries were scattered across the codebase instead of in a repository layer
A security reviewer would have cried

The AI didn't do this maliciously. It did this because I asked it to "fix this" and "add that" without ever giving it a structural framework. Every session was a fresh context with no memory of architectural decisions from the previous one.

The Fix: Structured AI Workflows

I switched from "vibe coding" to what I call structured AI workflows. The principle is simple:

Don't ask AI to write code. Ask it to execute a plan that you've designed together.

Here's the actual system I use:

1. Design Before Generate (15 minutes)

Before I let AI write a single line of code for a feature:

Step 1: Describe the feature to the AI in plain English
Step 2: Ask AI to list the components, data flow, and edge cases
Step 3: I review the plan — fix the architecture BEFORE code exists
Step 4: Only then: "Implement component X as designed"

The key change: I'm reviewing a plan, not debugging code. A plan review takes 2 minutes. Debugging generated code takes 30 minutes.
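The four steps can be treated as a gate: no implementation prompt exists until a plan has been reviewed. A minimal sketch of that idea follows. The names `createPlan`, `approve`, and `implementationPrompt` are hypothetical and only illustrate the ordering, not any particular tool.

```javascript
// Hypothetical plan record: an implementation prompt is only produced
// for components that appear in a plan a human has approved.
function createPlan(feature, components) {
  return { feature, components: [...components], approved: false };
}

// Returns a copy marked as reviewed; the draft itself is left untouched.
function approve(plan) {
  return { ...plan, approved: true };
}

function implementationPrompt(plan, component) {
  if (!plan.approved) {
    throw new Error(`Plan for "${plan.feature}" has not been reviewed yet`);
  }
  if (!plan.components.includes(component)) {
    throw new Error(`"${component}" is not part of the approved plan`);
  }
  return `Implement ${component} as designed in the plan for ${plan.feature}.`;
}

const draft = createPlan('date filter', ['DateRangePicker', 'filterPosts']);
const reviewed = approve(draft);
console.log(implementationPrompt(reviewed, 'filterPosts'));
```

Calling `implementationPrompt(draft, ...)` throws, which is the point: the cheap review has to happen before any code is requested.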

2. One Concern Per Session

This was my biggest mistake. I'd ask AI to "build the auth system" in one go, which meant it generated auth UI, backend routes, database schema, and middleware all at once. That was too much to review.

Instead:

Session 1: "Design the auth data model. Here are the constraints..."
Session 2: "Implement the auth API routes based on the data model from session 1"
Session 3: "Build the login page UI"

Each session has one clear output that I can fully review in under 5 minutes.
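The session split above can be written down as data, so the next session is always the first one whose prerequisites have already been reviewed. This is a sketch under assumptions: the `needs` links, especially session 3 depending on session 2, are illustrative, and `nextSession` is a hypothetical helper.

```javascript
// Hypothetical session list mirroring the three auth sessions above.
// The dependency of session 3 on session 2 is an assumption.
const sessions = [
  { id: 1, output: 'auth data model', needs: [] },
  { id: 2, output: 'auth API routes', needs: [1] },
  { id: 3, output: 'login page UI', needs: [2] },
];

// Returns the first unreviewed session whose prerequisites are all
// reviewed, or null when nothing is ready to start.
function nextSession(all, reviewedIds) {
  const done = new Set(reviewedIds);
  const ready = all.find(
    (s) => !done.has(s.id) && s.needs.every((id) => done.has(id))
  );
  return ready ?? null;
}

console.log(nextSession(sessions, [1]).output);
```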

3. Enforce Structure with Prompt Templates

I stopped typing one-off prompts. Every session starts with a structured template:

CONTEXT: [what we're building, decisions made so far]
TASK: [single, specific output]
CONSTRAINTS: [tech stack, patterns to follow, things to avoid]
OUTPUT FORMAT: [expected delivery — code block, diagram, plain text]

This alone cut my "fix AI's mistakes" time by ~60%.
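One way to keep the template from drifting back into one-off prompts is to build it with a function that refuses to produce a prompt when a section is missing. A minimal sketch; `buildPrompt` and its field names are hypothetical, and the sample values are made up for illustration.

```javascript
// Hypothetical helper: assembles the four-section template and
// throws if any section is missing or blank.
const SECTIONS = ['context', 'task', 'constraints', 'outputFormat'];
const LABELS = {
  context: 'CONTEXT',
  task: 'TASK',
  constraints: 'CONSTRAINTS',
  outputFormat: 'OUTPUT FORMAT',
};

function buildPrompt(fields) {
  const missing = SECTIONS.filter(
    (key) => typeof fields[key] !== 'string' || fields[key].trim() === ''
  );
  if (missing.length > 0) {
    throw new Error(`Missing prompt sections: ${missing.join(', ')}`);
  }
  return SECTIONS.map((key) => `${LABELS[key]}: ${fields[key].trim()}`).join('\n');
}

const prompt = buildPrompt({
  context: 'MultiPost auth; data model already agreed in session 1',
  task: 'Implement the login API route',
  constraints: 'Reuse verifyToken; no hardcoded secrets',
  outputFormat: 'One code block',
});
console.log(prompt);
```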

4. The "Review Gate" Practice

After every implementation session, before accepting the code:

Does it match the plan? (If not, reject — don't fix inline)
Are error paths handled? (If not, ask for them specifically)
Is it consistent with existing code? (Same patterns, same conventions)

This sounds like a lot. It takes 3-5 minutes per session and saves hours of later debugging.
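The three questions reduce to a small decision function. The sketch below is hypothetical: `reviewGate` and its field names are invented for illustration, and the follow-up for the consistency check is an assumption, since the checklist above does not say what to do when it fails.

```javascript
// Hypothetical review gate: three yes/no answers in, one decision out.
function reviewGate({ matchesPlan, errorPathsHandled, consistentWithCodebase }) {
  // A plan mismatch is a hard reject: regenerate, don't patch inline.
  if (!matchesPlan) {
    return {
      decision: 'reject',
      followUps: ['Regenerate against the plan instead of fixing inline'],
    };
  }
  const followUps = [];
  if (!errorPathsHandled) {
    followUps.push('Ask for the missing error paths specifically');
  }
  if (!consistentWithCodebase) {
    // Assumed follow-up; not stated in the checklist.
    followUps.push('Ask for a rewrite using the existing patterns');
  }
  return { decision: followUps.length === 0 ? 'accept' : 'revise', followUps };
}

console.log(
  reviewGate({ matchesPlan: true, errorPathsHandled: false, consistentWithCodebase: true })
);
```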

The Before and After (Real Code)

Here's what a typical piece of vibe-coded auth logic looked like in my codebase — three different sessions, three different patterns:

- // Session 1: "Add auth check here"
- function getUserId(req) {
-   const token = req.headers.authorization?.split(' ')[1]
-   return token ? jwt.verify(token, process.env.SECRET).sub : null
- }
- 
- // Session 2: "Also add auth here"
- const requireAuth = (req, res, next) => {
-   try {
-     const user = jwt.verify(req.cookies.token, 'hardcoded-secret-123')
-     req.user = user
-     next()
-   } catch { res.sendStatus(401) }
- }
- 
- // Session 3: "Fix the auth bug"
- async function getUser(req) {
-   if (!req.headers['x-auth']) return { error: 'no auth' }
-   const result = await db.query(`SELECT * FROM users WHERE token = '${req.headers['x-auth']}'`)
-   return result.rows[0]
- }

Three approaches. Three error-handling styles. A hardcoded secret. A SQL injection waiting to happen. All doing "the same thing" because each session had no memory of the last.

After applying the structured workflow (one concern per session, review gate between each):

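+ // repository/auth.js: single source of truth, one pattern
+ 
+ import jwt from 'jsonwebtoken' // assumed package; the original snippet never names its JWT library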
+ 
+ export function verifyToken(token) {
+   try {
+     return jwt.verify(token, process.env.AUTH_SECRET)
+   } catch {
+     return null
+   }
+ }
+ 
+ export function requireAuth(req, res, next) {
+   const token = req.cookies.token ?? req.headers.authorization?.split(' ')[1]
+   const payload = verifyToken(token)
+   if (!payload) return res.status(401).json({ error: 'unauthorized' })
+   req.userId = payload.sub
+   next()
+ }

One file. One pattern. One error-handling strategy. The AI wrote both versions — the difference was whether I gave it a structure to work within.
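The refactor drops the Session 3 database lookup entirely, so it does not show how the SQL injection would be closed if a lookup were still needed. A sketch of that case, assuming `db` is a node-postgres-style client whose `query(text, values)` accepts `$1` placeholders. `findUserByToken` and the stub `db` are hypothetical; the stub only records the call so the example runs without a database.

```javascript
// Looks up a user by token with a bound parameter instead of string
// interpolation. Returns null when the token is empty or nothing matches.
async function findUserByToken(db, token) {
  if (typeof token !== 'string' || token === '') return null;
  const result = await db.query('SELECT * FROM users WHERE token = $1', [token]);
  return result.rows[0] ?? null;
}

// Hypothetical stand-in for a real client, used only to show the call shape.
const calls = [];
const stubDb = {
  async query(text, values) {
    calls.push({ text, values });
    return { rows: [] };
  },
};

// The hostile input travels as a value and never becomes part of the SQL text.
findUserByToken(stubDb, "x' OR '1'='1").then((user) => {
  console.log(user, calls[0].text, calls[0].values);
});
```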

Real Numbers: Before vs After

For MultiPost specifically:

Metric	Vibe Coding	Structured Workflows
Time to feature completion	4-6 hours	3-4 hours
Code review reject rate	~40%	~10%
Post-deploy bugs per feature	3-5	0-1
Time spent debugging per week	8+ hours	2-3 hours

The interesting part: structured workflows are actually faster overall, even though they feel slower in the moment. The upfront planning time pays for itself in avoided debugging.

The Flow Difference

Here's what the two approaches look like side by side:

VIBE CODING:
  "Build feature X" → AI writes 500 lines → Deploy → Bug → Debug → Patch → Repeat
  │                      │                        │
  └── You skip design    └── 1 huge chunk         └── 30 min to find the issue

STRUCTURED WORKFLOW:
  "Design feature X" → Review plan → "Implement component 1" → Review → "Implement 2" → Review → Ship
  │                     │            │                          │
  └── 2 min plan        └── 2 min    └── small, reviewable      └── 3 min, catches bugs early

The vibe coding path feels faster because you see output immediately. The structured path is faster because you almost never backtrack.

The Bigger Pattern

Looking at that ~230-comment HN thread, multiple people independently came to the same conclusion:

"Most of the tools I write now are bridges to various SaaS products that have APIs but no CLIs."

"I made a harness to discipline AI output. The opposite of vibe coding. Using it daily."

"Superpowers guides the model to use careful, methodical approaches. Great for multi-step planning."

The community is self-organizing around the same insight: AI is a powerful junior engineer. Junior engineers need structure to produce quality work.

What I Use Now

I ended up building a CLI called Content Bridge to enforce this exact workflow. It's literally this article turned into a tool — you write a plan, review it, implement one component at a time, and the CLI enforces the structure so you don't have to remember the rules.

But you don't need any tool to start. You can begin today with one change:

Before your next AI session, spend 2 minutes writing down what you want it to produce. Not how — what. Review that plan before generating anything.

The generation is the easy part. The structure is the work.

What's your approach? Have you hit the same vibe coding trap? Drop your workflow (or horror story) in the comments — I read all of them.

Top comments (7)

Alex Shev • Jun 11

This is the right direction. The real shift is from asking AI for code to giving it a repeatable workflow: context, constraints, checks, and a clear definition of done.

That is also where terminal-first tooling becomes interesting. If the workflow can run, verify, and leave an audit trail in the repo, it stops being a chat session and starts behaving like a build artifact.

Pizza Cat • Jun 11

Thanks! That is exactly the insight I was trying to capture — moving from ad-hoc prompting to something that behaves like a build artifact.

I actually ended up building a small CLI tool around this workflow approach: a content bridge that transforms, verifies, and publishes across platforms in one command. When the output of an AI session is a verified artifact rather than a chat transcript, the whole economics of development changes.

What has been your experience with making AI workflows reproducible? Have you found any patterns that work particularly well?

Alex Shev • Jun 11

For me the reproducibility pattern starts by turning the AI interaction into a small contract, not a conversation.

The parts that help most are: a fixed input shape, repo or project context checked into files, explicit acceptance checks, and a final artifact that can be inspected outside the chat. If the workflow cannot be rerun by another person or another agent, it is still mostly vibe coding with better notes.

The CLI/tooling layer matters because it gives the process a stable boundary. Prompts can change, models can change, but the workflow still says: here is the context, here are the allowed actions, here is how we verify the output.

Pizza Cat • Jun 14

Your "contract vs. conversation" framing actually shifted how I'm thinking about this. You're right: if the workflow can't be rerun by another person or another agent, it's still vibe coding with better notes. That's a brutally honest litmus test, and most of what's sold as "AI engineering" today fails it.

A question I've been sitting with since reading your reply: in practice, what does the contract boundary look like for you? Do you define it as a schema/type definition, a shell script with well-defined arguments, or something looser like a markdown spec that an agent parses?

I ask because I've been experimenting with treating the contract as a CLI argument signature: --input=file.md --output=published --verify=preview. It works for simple cases, but I'm curious how it scales when the workflow involves judgment calls (e.g., "does this output meet the quality bar?").
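For the simple cases, the argument signature can be checked before anything runs. The sketch below is hand-rolled and hypothetical: `parseContract` is not the API of any real CLI, and the allowed values for --output and --verify are assumptions made up for illustration.

```javascript
// Hypothetical contract for the three flags mentioned above.
// The allowed values for --output and --verify are assumptions.
const CONTRACT = {
  input: (v) => v.endsWith('.md'),
  output: (v) => ['draft', 'published'].includes(v),
  verify: (v) => ['none', 'preview'].includes(v),
};

// Parses --name=value arguments and fails fast on anything outside the contract.
function parseContract(argv) {
  const values = {};
  for (const arg of argv) {
    const match = /^--([a-z]+)=(.+)$/.exec(arg);
    if (!match || !Object.hasOwn(CONTRACT, match[1])) {
      throw new Error(`Unknown or malformed argument: ${arg}`);
    }
    const [, name, value] = match;
    if (!CONTRACT[name](value)) {
      throw new Error(`Invalid value for --${name}: ${value}`);
    }
    values[name] = value;
  }
  const missing = Object.keys(CONTRACT).filter((name) => !Object.hasOwn(values, name));
  if (missing.length > 0) {
    throw new Error(`Missing required flags: ${missing.join(', ')}`);
  }
  return values;
}

console.log(parseContract(['--input=file.md', '--output=published', '--verify=preview']));
```

This only covers the deterministic part of the boundary. A judgment call such as a quality bar does not fit into a flag check and would need its own verification step.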

Alex Shev • Jun 14

For me the strongest boundary is usually an executable one: a CLI contract or a typed input/output schema that a human can run without reading the whole conversation. A markdown spec can work as the design layer, but I do not trust it as the contract until there is something deterministic around it: arguments, fixtures, expected outputs, or at least a validation step. The CLI argument shape you mentioned is a good starting point because it forces the fuzzy workflow to become repeatable.

chneg cheng • Jun 22

@huaian666 Totally — the best dev tools really do start as personal scratches. What surprised me was realizing how many people had the same problem once I put the rough v1 out there. Shipping the janky version early is what told me I wasn't alone.
