DEV Community

Jono Herrington

Posted on • Originally published at jonoherrington.com

Build the System, Not the Prompt

If I had to roll out AI again, I wouldn't change the tools. I'd change the approach. I'd start with one repeatable workflow, map every step, define what good output looks like, encode it once, and turn the whole thing into a pipeline. Then I'd improve the system instead of rewriting prompts. That framework didn't come from reading about AI adoption. It came from getting it wrong first and then building my way out across engineering, content creation, and personal workflows until the pattern became impossible to ignore.

The real unlock wasn't a single AI doing a task well. It was learning to orchestrate multiple agents through a shared system that produces consistent output. You can give AI the same prompt twice and get two different results. The only way to get reliability from something inherently variable is to surround it with structure ... defined inputs, clear standards, encoded context. The system is what makes the output trustworthy. The prompt never will.

The only way to get reliability from something inherently variable is to surround it with structure.

Here's how that works in practice.

Start With One Repeatable Workflow

Resist every instinct to go wide.

Most AI rollouts start by giving everyone access and seeing what happens. That approach produces twenty people prompting individually, all getting decent results, none of them building on each other. I read a thread last year where over 500 experienced engineers described what happened when their companies rolled out AI. The stories were remarkably similar. Leadership gave everyone access, maybe ran a training session, and then measured adoption by how many people were using the tools. Almost nobody described a system.

Pick one workflow instead. Not the most exciting one. Not the one with the biggest potential ROI on a slide deck. The most repeatable one. The task that happens the same way, with the same inputs, producing roughly the same shape of output, over and over again. For my engineering team, that was scaffolding a new service endpoint. Every engineer did it. Every engineer did it slightly differently. And every time AI helped with it, the slight differences multiplied.

One workflow gives you a contained environment where you can see what works, what breaks, and what the tool actually needs from you before you've spread the experiment across your entire surface area.

Map Every Step

The mapping is where most teams skip ahead, and it's where the real value hides.

Sit down and write out what a human actually does when they complete this workflow. Not the idealized version. Not the documented version from a wiki page nobody has updated since 2023. The real version. The one that includes the implicit decisions people make without thinking about them ... which logging pattern to use, how to handle the auth layer, whether to write the test first or after, what error messages should say.

When we mapped our endpoint scaffolding workflow, we found twelve distinct decisions that engineers were making individually every time. Twelve places where the output could diverge. Most of those decisions were invisible. Nobody had ever written them down because they felt obvious to the person making them. What's obvious to the engineer who's been on the team for three years is a guess for the engineer who started last month. And it's completely opaque to the AI.

The map doesn't have to be pretty. Ours was a markdown file with numbered steps and notes about where judgment calls happen. But having it at all changed the conversation from "how do we prompt this better" to "what decisions does this workflow actually require."
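The file really can be that simple. As an illustration (the step names and decision answers below are hypothetical, not our actual map), a workflow map in markdown might look like:

```markdown
## Workflow: scaffold a new service endpoint

1. Create the route module from the service template
   - Decision: which logging pattern? → structured JSON logger
2. Wire the auth layer
   - Decision: middleware or per-route guard? → middleware, always
3. Write tests
   - Decision: test-first or test-after? → test-first for handlers
4. Define error responses
   - Decision: what should error messages say? → follow the error catalog
```

The point isn't the format. It's that every judgment call now has a written answer instead of a per-engineer guess.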

Define Good Output

This is the step most teams skip entirely, and it's the one that makes everything else work.

Before we let the tool generate a single line of code for our mapped workflow, we wrote down what a good result looks like. Not vaguely. Specifically. A good endpoint scaffold in our system uses this error handling pattern. It logs with this format. It follows this naming convention. It includes these specific tests. The auth layer integrates this way. State management follows this approach.

Most of that had been living in people's heads or "decided" in meetings that produced no artifacts. Writing it down was uncomfortable because it forced arguments we'd been deferring. Two engineers had different opinions about retry logic. A tech lead and an architect disagreed on how granular logging should be. The AI had been scaling both approaches simultaneously because nobody had picked a winner.

This is where the non-deterministic nature of AI makes systems essential. A deterministic tool gives you the same output every time. AI doesn't. If you haven't defined what good looks like in writing, every interaction is a coin flip between five technically valid approaches. Define it once and the system has something to aim at.
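A written definition of good can be a short checklist. Everything below is illustrative ... the field names, file conventions, and test list are assumptions, not our actual standards:

```markdown
# Definition of good: endpoint scaffold

- Errors: use the shared result-wrapping pattern; never throw raw exceptions
- Logging: structured JSON with `ts`, `level`, `request_id`, `msg`
- Naming: route files named after the resource; handlers named as verbs
- Tests: one happy path, one auth failure, one validation failure
- Auth: integrate via the shared middleware, never inline checks
```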

If you haven't defined what good looks like in writing, every interaction is a coin flip between five technically valid approaches.

Add Rules, Context, and Examples Once

Once you have the map and the definition of good, you encode it.

Instead of every person carrying the context in their head and typing it fresh each session, you write it down once in a form the tool can consume. For us, that meant markdown files in the repo. Rules for the architectural patterns. Examples of correct output. Context about our specific stack, our conventions, our decisions. All of it sitting alongside the code, where both humans and AI workflows could reference it.
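As a sketch, "markdown files in the repo" might be laid out like this (directory and file names are hypothetical):

```
repo/
├── src/
└── ai/
    ├── rules.md        # architectural patterns the output must follow
    ├── context.md      # stack, conventions, decisions already made
    └── examples/
        └── endpoint.md # a known-good scaffold to imitate
```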

The first time an engineer used the encoded workflow instead of prompting from scratch, the output matched our standards on the first pass. Not because the engineer was more skilled. Not because the prompt was more clever. Because the system already knew what good looked like.

The new hire who joined last week gets the same quality output as the tech lead who defined the patterns. The context travels with the system, not with the person.

Turn It Into a Pipeline

A workflow with mapped steps, defined outputs, and encoded context stops being a prompt and starts being a pipeline.

A prompt is a request. A pipeline is infrastructure. A prompt gets you one good result. A pipeline gets you a hundred consistent ones. And a pipeline can be improved. Update the system once and every future interaction runs through the better version.

When we found an edge case in our endpoint scaffolding pipeline, we didn't adjust one engineer's prompt. We updated the canonical pattern, and every engineer's next interaction benefited from the fix. When we realized our logging context was missing a specific format requirement, we added it once, and it propagated everywhere. The improvements compound because the system is shared.

I've since built pipelines well beyond engineering. My content creation runs through an editorial system with multiple AI agents handling drafting, editing, and grading in sequence. Financial workflows, personal automation, code review ... each one started the same way. One repeatable task. Map the steps. Define good. Encode it. Improve the system.
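As a rough sketch of the sequencing idea, a pipeline can be nothing more than an ordered list of stages, each consuming the previous stage's output. The stage functions below are stand-ins, not my actual setup; a real version would call an AI agent with the encoded rules and context files folded into each prompt:

```python
from typing import Callable

# A stage is any function from text to text: draft, edit, grade, and so on.
Stage = Callable[[str], str]

def run_pipeline(stages: list[Stage], payload: str) -> str:
    """Run the payload through each stage in order.

    Improving one stage (or the shared context it reads) improves every
    future run, which is what makes the pipeline compound."""
    for stage in stages:
        payload = stage(payload)
    return payload

# Hypothetical stages standing in for AI agents.
def draft(topic: str) -> str:
    return f"DRAFT: {topic}"

def edit(text: str) -> str:
    return text.replace("DRAFT", "EDITED")

print(run_pipeline([draft, edit], "build the system"))  # EDITED: build the system
```

Swapping the stage list, or fixing one stage, changes behavior in one place for everyone who runs it.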

A prompt gets you one good result. A pipeline gets you a hundred consistent ones.

Improve the System, Not the Prompt

This is where most teams leave the real multiplier on the table.

The default behavior with AI is to optimize the prompt. The output wasn't quite right, so you rewrite the instructions. You add more context. You try a different framing. And maybe it works better this time. But that improvement lives in your head, in that one session, and it disappears the moment someone else sits down to do the same task.

The alternative is to improve the system. When something doesn't work, you don't rewrite the prompt. You update the encoded rules, the documented standards, the context files that every future interaction draws from. The fix propagates. It compounds. It gets better for everyone, every time, without anyone needing to remember what worked last Tuesday.

The team that has ten encoded pipelines and average prompting skills will outperform the team with zero pipelines and a Slack channel full of prompt tips. Every single time. Because one team is building infrastructure and the other is improvising, one prompt at a time.

If you're leading an AI rollout right now, pick one workflow. The most boring, repeatable one you have. Map it. Define what good output looks like. Write it down. Encode it. And then do the same thing with the next workflow, and the next.

The compound return isn't in the prompt. It never was.


One email a week from The Builder's Leader. The frameworks, the blind spots, and the conversations most leaders avoid. Subscribe for free.
