DEV Community

Mladen Stepanić

Workflow Engineering > Prompt Engineering

... it's early 2026. Remember when I said AI is a tool? I still believe that. But I've been using Claude Code for a few months now and I need to update the nuance a bit: AI is a tool, but the way you set up the workshop matters more than the tool itself.

I built a book inventory app. Nothing fancy, it tracks books across households, lets users invite others to share their collections. Hono on the backend, React with Vite on the frontend, Neon Postgres for the database, deployed on Vercel. A boring stack for a boring app. And I mean that as a compliment. Oh, and before you ask - no, it doesn't have any AI-powered features. No "smart recommendations," no "AI-curated reading lists." It's a CRUD app. It stores books. The irony is not lost on me.

But the way I built it? That part wasn't boring at all.

The Codex Chapter

I started with Codex. I had high hopes. Same skills loaded, same setup, same project. Progress was slow. Not because Codex is bad, it's not, but because the ergonomics didn't work for me. Worktrees felt awkward. Parallel agents were running, but I couldn't see what they were doing well enough to react in time. I was spending more energy managing the tool than building the app. That said, this could easily be a skill issue on my part. Codex might click better for someone with different habits or a different workflow; I'm not here to tell you it's a bad tool. It just wasn't the right fit for how I work.

So I switched to Claude Code. And things exploded.

Same skills. Same project. Different interface. Claude Code's TUI let me see parallel agents running in tmux, let me react fast, let me stay in the flow. That's it. That's the difference. Not smarter AI, not better models: better ergonomics. If you take one thing from this post: capability is table stakes. The interface determines your productivity.

The Workflow Is the Skill

Here's where it gets interesting. I didn't just open Claude Code and say "build me a book inventory app." That's how you get a mess.

Instead, I designed a pipeline. Every feature goes through the same stages:

First, brainstorming. I use obra's superpowers skill, specifically the brainstorming mode, for the planning phase. The quality of the definitions it produces is a noticeable jump over vanilla planning. The output here isn't code; it's clarity about what I'm building and why.

Then, specification. The planning phase generates an OpenSpec, a structured definition of what needs to be built. Still no code.

Then, decomposition. The spec gets broken into beads (if you haven't seen Steve Yegge's beads repo on GitHub, go look). Each bead is a tight, focused task. This is where the magic starts, because beads keep everything scoped. No context window bloat, no agents wandering off into tangents.

Then, implementation. This is where I hand the OpenSpec and the beads to Claude Code and say: go. Parallel agents in tmux pick up the tasks and run.
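To make the decomposition stage concrete, here's a minimal sketch of what a bead-style task might capture. The field names and example tasks are my own illustration, not the actual beads schema; the point is that each task carries its own tight scope and acceptance criteria, plus explicit dependencies so agents only pick up work that's actually ready.

```typescript
// Hypothetical sketch of a bead-style task. Field names are
// illustrative, not the actual beads schema.
interface Bead {
  id: string;
  title: string;
  // Tight scope: the only files this task is allowed to touch.
  scope: string[];
  // Explicit completion criteria keep agents from wandering.
  acceptance: string[];
  dependsOn: string[];
}

const invitationBeads: Bead[] = [
  {
    id: "hh-1",
    title: "Add households and household_members tables",
    scope: ["db/schema.ts", "db/migrations/"],
    acceptance: ["migration applies cleanly", "tables have FK constraints"],
    dependsOn: [],
  },
  {
    id: "hh-2",
    title: "POST /invitations endpoint",
    scope: ["src/routes/invitations.ts"],
    acceptance: ["only household creator can invite", "returns 201 with token"],
    dependsOn: ["hh-1"],
  },
];

// A bead is ready to hand to an agent when all of its
// dependencies are done and it isn't done itself.
function readyBeads(beads: Bead[], done: Set<string>): Bead[] {
  return beads.filter(
    (b) => !done.has(b.id) && b.dependsOn.every((d) => done.has(d)),
  );
}
```

With a structure like this, "run agents in parallel" just means handing out every bead that `readyBeads` returns at once.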

The One-Shot That Wasn't

Let me tell you about the household feature. Users living in the same space should have access to all books in that space. And I needed an invitation system so the household creator could invite others.
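The core access rule is simple enough to state in a few lines. This is a hedged sketch of my own (the real schema and names may differ): a user can see a book if and only if they belong to the household that owns it.

```typescript
// Illustrative types: the actual app schema may differ.
interface HouseholdMember {
  userId: string;
  householdId: string;
}

interface Book {
  id: string;
  householdId: string;
}

// The access rule: a user can see a book only if they are a
// member of the household that owns it.
function canAccessBook(
  userId: string,
  book: Book,
  memberships: HouseholdMember[],
): boolean {
  return memberships.some(
    (m) => m.userId === userId && m.householdId === book.householdId,
  );
}
```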

I kicked off the pipeline. Brainstorming produced a clean definition. OpenSpec captured the full scope. Beads broke it into tasks. I handed it to Claude Code and the parallel agents basically one-shotted the whole thing, household concept and invitations, implemented and working.

Sounds impressive, right?

But was it really a one-shot? The implementation was, sure. But I front-loaded the intelligence into planning, specification, and decomposition. The "shot" landed cleanly because the planning was rigorous. Take away the pipeline and ask Claude Code to "implement household sharing with invitations" cold? You'll get something. Whether you'll get something good is another question.

This mirrors how experienced developers actually work. Nobody good just starts coding a multi-faceted feature. You think it through, you break it down, then you execute. I just happened to have AI on both sides, doing the thinking and the executing. My job was designing the workflow between them.

Three Layers of "Does It Actually Work?"

I can hear the skeptics: "Sure, AI generated the code. But does it work?"

Fair question. Here's my answer: TDD is in place from the start. Yes, through a skill. Tests exist before implementation, not as an afterthought.

But tests only tell you the logic is correct. So I also use a Playwright skill with Chrome to watch actual end-to-end runs. I see the app doing what it's supposed to do. No manual clicking through screens, no "I think it works." I watch it work.
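An end-to-end check in that layer looks roughly like this. The routes, labels, and book title below are made up for illustration; the real selectors in my app differ, and running this requires `@playwright/test` plus a dev server.

```typescript
// Hypothetical e2e check: an invited member can see a book
// shared through their household. Selectors are illustrative.
import { test, expect } from "@playwright/test";

test("invited user sees the household's books", async ({ page }) => {
  await page.goto("http://localhost:5173/login");
  await page.getByLabel("Email").fill("guest@example.com");
  await page.getByRole("button", { name: "Log in" }).click();

  await page.goto("http://localhost:5173/households/1");
  // The shared book should be visible to the invited member.
  await expect(page.getByText("The Pragmatic Programmer")).toBeVisible();
});
```

The point isn't this specific test; it's that "I watch it work" is itself automated and repeatable.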

And then, at the end of each meaningful session, I spawn dedicated reviewer agents, one for frontend, one for backend, one for security. Their findings go into a follow-up PR.

Three layers: TDD catches logic errors. Playwright catches visual and integration errors. Reviewers catch architectural and security issues. None of them manual.

And then there's the fourth layer: me. I do a thorough manual code review after all of this. AI catches a lot, but I still read the code myself. I need to understand what's in my codebase, I need to know why decisions were made, and I need to catch the things that automated tools miss. The subtle logic that's technically correct but wrong for the product, the naming that'll confuse me in three months, the architectural drift that no linter will flag. If you skip this step, you're not building software, you're accumulating code you don't own.

The Cross-Model Twist

Here's something that might raise eyebrows: I've been experimenting with using Codex as my reviewer.

Yes, that Codex. The one I moved away from for building. Turns out, Codex 5.3 with maxed out thinking produces genuinely valuable review feedback. And it makes sense when you think about it, review is a different cognitive task than generation. You're evaluating against criteria, not creating from scratch. Codex's deep thinking mode suits that well. The ergonomics that frustrated me during building don't matter for review because it's single-threaded, focused work.

I'm not loyal to one tool. I'm assembling the best pipeline I can from whatever works. Claude Code plans, specs, decomposes, and implements. Codex reviews. Each playing to its strengths.

What This Actually Means

In my last post I said AI is a tool and that you need to know what you're doing to use it well. I stand by that. But now I'd add: the emerging skill isn't coding, and it isn't prompting either. It's pipeline design.

Knowing which skills to load, when to brainstorm vs. spec vs. decompose, when to run agents in parallel, when to bring in a different model entirely, that's the craft now. The code is the output. The workflow is the work.

Will this change again in six months? Probably. But the principle won't: understand what you're building, design a process that keeps quality high, and use whatever tools make that process smooth. Boring? Maybe. But boring apps that work are what people actually need.

And I still think I'll have a lifetime of work fixing vibe-coded messes. Some things don't change.
