Process automation is older than software. The teams that remember will be the ones still shipping.
Fourth in a short series. The first piece argued that the gap between AI-agents-in-pilots and AI-agents-in-production is being closed by a quiet infrastructure rebuild. The second showed what that infrastructure looks like. The third was about the failure modes the dashboard does not show. This piece is the constructive counterpoint: what installing the missing discipline actually consists of, and why teams that install it will be the ones still shipping when the hype rotates.
An older engineer I know — the kind of person who has seen three industry waves come and go — told me, when I asked him what he thought about the agentic moment, that he was struck mostly by how much of the conversation he had heard before.
“Every five years,” he said, “we discover that the way to coordinate work in a large organisation is to write down what the work is. We rediscover this with a slightly different vocabulary every time. The vocabulary changes. The discovery is the same.”
I have been thinking about that for a few weeks, and I think he is right. There is a discipline of process automation that is older than software — older, in some forms, than electricity. We have been forgetting and rediscovering pieces of it for the better part of a century. The agentic hype, in particular, is convincing teams to skip the rediscovery this time around. Which is a problem, because the discipline is what keeps the wheels on.
This piece is about what the discipline actually is.
Five layers a production process needs
If you take an experienced operations engineer — someone who has shipped real workflows in regulated industries, and lived with what they shipped — and you ask them what makes the difference between a process that survives and a process that breaks, you will get a list that has been more or less the same since the 1970s.
Figure 1. The unfashionable list. Each layer holds up the ones above it.
At the bottom is audit completeness. Every decision the process made — by which agent, on what input, against which version of the rules, with what outcome — is captured in one place, by name. Not in three log files, not in two tools, not in a Slack channel. One record. The kind of thing you can hand to a regulator without rewriting it first.
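To make "one record, by name" concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `AuditRecord` and `AuditLog` names, the field names, and the in-memory list standing in for what would have to be durable, append-only storage in a real system.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """One decision the process made. Field names are illustrative."""
    run_id: str
    step: str
    agent: str
    rules_version: str
    input_digest: str  # hash of the input, so the record stays small
    outcome: str
    recorded_at: str


class AuditLog:
    """A single destination for every decision (in memory here, as a stand-in)."""

    def __init__(self):
        self._records = []

    def record(self, *, run_id, step, agent, rules_version, payload, outcome):
        # Canonical JSON so the same input always yields the same digest.
        canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
        rec = AuditRecord(
            run_id=run_id,
            step=step,
            agent=agent,
            rules_version=rules_version,
            input_digest=hashlib.sha256(canonical).hexdigest(),
            outcome=outcome,
            recorded_at=datetime.now(timezone.utc).isoformat(),
        )
        self._records.append(rec)
        return rec

    def for_run(self, run_id):
        return [r for r in self._records if r.run_id == run_id]

    def export(self, run_id):
        """The thing you hand over: every decision for one run, as JSON."""
        return json.dumps([asdict(r) for r in self.for_run(run_id)], indent=2)
```

The keyword-only arguments are the design choice worth noticing: a step cannot log a decision without naming the agent, the rules version, and the outcome, because the call will not run without them.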
Above that is declared structure. The process exists as a thing you can read and a thing you can run, and these are the same thing. An analyst reads the diagram. An engineer reads the source. They are looking at the same artifact. Nobody has to translate.
Above that is typed contracts. Each step has an input shape and an output shape, both checked. If the upstream step changes its output and the downstream step is not updated, the system refuses to run. It does not silently pass the wrong shape forward and let production discover it.
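A rough sketch of what "the system refuses to run" could mean, in Python. The `Step` and `check_pipeline` names are hypothetical, and describing a shape as a plain mapping from field name to type is a simplification chosen for brevity, not a claim about how any particular platform does it.

```python
from dataclasses import dataclass


class ContractError(Exception):
    """Raised before anything runs, when two adjacent steps disagree."""


@dataclass(frozen=True)
class Step:
    name: str
    consumes: dict  # field name -> expected type
    produces: dict  # field name -> produced type


def check_pipeline(steps):
    """Verify that each step's output shape satisfies the next step's input shape."""
    for upstream, downstream in zip(steps, steps[1:]):
        for field_name, expected in downstream.consumes.items():
            actual = upstream.produces.get(field_name)
            if actual is None:
                raise ContractError(
                    f"{downstream.name} needs '{field_name}', "
                    f"which {upstream.name} does not produce"
                )
            if actual is not expected:
                raise ContractError(
                    f"{downstream.name} expects '{field_name}' as "
                    f"{expected.__name__}, but {upstream.name} produces "
                    f"{actual.__name__}"
                )


# Two steps that agree on the shape passing between them.
extract = Step("extract_claim", {"document": str},
               {"claim_id": str, "amount": float})
score = Step("score_claim", {"claim_id": str, "amount": float},
             {"risk": float})
check_pipeline([extract, score])  # passes silently
```

If `extract_claim` is later changed to produce `amount` as a string and `score_claim` is left alone, `check_pipeline` raises a `ContractError` before a single claim is processed.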
Above that is bounded scope. Each step in the process is allowed to touch a specific, declared set of tools, data, and external systems. Nothing else. If the score-the-claim step tries to send an email, the runtime refuses, regardless of how plausible the email is.
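The score-the-claim example can be sketched in a few lines of Python. The `ScopedToolbox` class and the two tool functions are invented for this illustration; the only point is that the allow-list is declared per step and checked on every call.

```python
class ScopeViolation(Exception):
    """Raised when a step reaches for a tool it never declared."""


class ScopedToolbox:
    """Gives a step access to its declared tools and nothing else."""

    def __init__(self, step_name, tools, allowed):
        self._step_name = step_name
        self._tools = tools
        self._allowed = frozenset(allowed)

    def call(self, tool_name, *args, **kwargs):
        # Refuse first, regardless of how plausible the request looks.
        if tool_name not in self._allowed:
            raise ScopeViolation(
                f"step '{self._step_name}' may not call '{tool_name}'"
            )
        return self._tools[tool_name](*args, **kwargs)


# Hypothetical tools registered with the runtime.
sent_emails = []


def lookup_policy(claim_id):
    return {"claim_id": claim_id, "limit": 5000}


def send_email(to, body):
    sent_emails.append((to, body))


ALL_TOOLS = {"lookup_policy": lookup_policy, "send_email": send_email}

# The scoring step declares one tool. Email is not on the list.
score_tools = ScopedToolbox("score_claim", ALL_TOOLS,
                            allowed={"lookup_policy"})
```

With this in place, `score_tools.call("lookup_policy", "C-17")` works, and `score_tools.call("send_email", ...)` raises `ScopeViolation` without the email function ever being invoked.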
At the top is cost SLOs in the source. The economic envelope of the process — what it is allowed to cost per run, per day, per month — is part of the program. Not a dashboard the finance team monitors. Part of the program.
Each layer is a claim the next layer rests on. If the audit log is incomplete, the structure means nothing — you cannot demonstrate which version of the process actually ran. If the structure is undeclared, the contracts mean nothing — there is no shared understanding to type-check against. If the contracts are unenforced, the scope means nothing — agents will pass arguments to tools that those tools were never built to accept. And if the scope is unbounded, the cost SLO means nothing — the process can spend money in ways the budget never anticipated.
Most teams shipping AI agents in 2026 have, at best, two of these five. Often one. Often zero. They make up the difference with vigilance and pagers. It works for a while. It does not scale.
The two-language problem
To see why this matters, consider the most common shape of an enterprise AI deployment, in any industry, anywhere in the world.
Figure 2. The two-language problem. The more important your workflow, the more this gap will cost you.
On one side, in a git repository, lives the code. Engineers wrote it. It is tested, reviewed, deployed, instrumented. When it changes, the change is reviewed. When something goes wrong, you can git blame your way back to the moment.
On the other side, in a Confluence page or a PDF or a SharePoint folder, lives the policy. Compliance wrote it. Legal approved it. It is the document the regulator will ask for in the audit. It says what is required. It says what is forbidden. It is the law of the land for this workflow.
Between them is a gap, and into that gap fall every important failure your enterprise will have over the next three years.
The reason is mechanical. Code and policy are written in two different languages, live in two different systems, and are owned by two different organisations. The two systems do not talk to each other. The code cannot verify the policy. The policy cannot constrain the code. When the policy changes, the change has to be translated into code by hand, by someone who hopefully understood both. When the code changes, the change has to be reviewed against the policy by hand, by someone who hopefully read both.
This works, in the sense that nothing immediately catches fire. It also has a one-hundred-percent failure rate over a long enough horizon. The policy and the code drift. Edge cases the policy contemplates are not in the code. Edge cases the code handles are not in the policy. The drift is silent until something happens — a regulator’s question, a customer’s complaint, a Tuesday — and you discover that what you do is not what you said you would do.
Every team that has run a workflow in production for more than two years knows this in their bones. They know that the audit went well last time because someone remembered to update the policy. They know that the small incident in March happened because the engineer didn’t read the latest version. They know that the next problem will happen because nobody can hold both documents in their head at the same time.
The fix is not better documentation. The fix is to stop having two artifacts.
The same thing, in one source
Imagine, for a moment, that the policy and the code lived in the same source file. Not next to each other. Not linked. The same source file.
Figure 3. What it looks like when the workflow, the policy, and the budget all share one source.
The workflow declaration sits next to a governance rule that constrains it. The governance rule sits next to a budget that bounds them both. All three are written in the same syntax. All three are read by the same compiler. All three are checked against each other before anything ships.
If the workflow changes in a way that violates the policy, the build fails. Not at runtime — at build time, before anything is deployed. If the budget is exceeded by the workflow’s expected cost, the build fails. If a step references a tool that the policy disallows, the build fails. The compiler is the contract between engineering and compliance. It cannot be bypassed by someone forgetting to update a Confluence page.
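As a sketch of the idea, and only that, here is a toy "build" in Python that checks a workflow declaration against a policy before returning anything deployable. The `StepDecl`, `Policy`, and `build` names are made up for this example, and the two rule kinds (disallowed tools, required ordering) are assumptions standing in for whatever a real governance language would express.

```python
from dataclasses import dataclass


class BuildError(Exception):
    """The build fails; nothing is deployed."""


@dataclass(frozen=True)
class StepDecl:
    name: str
    tools: tuple = ()


@dataclass(frozen=True)
class Policy:
    disallowed_tools: frozenset = frozenset()
    # Pairs of (earlier, later): 'later' may only appear after 'earlier'.
    must_precede: tuple = ()


def violations(steps, policy):
    """Return every way the workflow breaks the policy, as readable strings."""
    found = []
    position = {step.name: i for i, step in enumerate(steps)}
    for step in steps:
        for tool in step.tools:
            if tool in policy.disallowed_tools:
                found.append(
                    f"step '{step.name}' uses disallowed tool '{tool}'"
                )
    for earlier, later in policy.must_precede:
        if later in position and (
            earlier not in position or position[earlier] > position[later]
        ):
            found.append(f"'{later}' must be preceded by '{earlier}'")
    return found


def build(steps, policy):
    """Produce a deployable workflow only if the policy is satisfied."""
    found = violations(steps, policy)
    if found:
        raise BuildError("; ".join(found))
    return tuple(steps)


policy = Policy(
    disallowed_tools=frozenset({"send_email"}),
    must_precede=(("human_review", "pay_claim"),),
)
workflow = [
    StepDecl("score_claim", tools=("lookup_policy",)),
    StepDecl("human_review"),
    StepDecl("pay_claim", tools=("issue_payment",)),
]
deployable = build(workflow, policy)
```

Delete the `human_review` step, or give `score_claim` the `send_email` tool, and `build` raises `BuildError` with a message naming the rule that was broken. The forgotten Confluence page is no longer part of the failure path.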
This is not a science fiction proposal. This is how every serious software discipline has worked for half a century. Type systems. Schema validation. Formal contracts in distributed systems. The compiler-as-discipline pattern is older than most of the people writing AI agents today. The novelty is applying it to process as well as code, to policy as well as logic, to cost as well as correctness.
The pushback I get when I describe this is always the same: “but our policy is too complicated to encode.” And it is true that natural-language policy is more expressive than any formal system. It is also true that natural-language policy is less defensible than any formal system. The choice is not between formal-but-impoverished and rich-but-vague. The choice is between enforced and unenforced. Most policies, in practice, are not as complex as they sound; they are written in legalese to defend their authors. The actual logic — what amounts get reviewed, by whom, under what conditions — fits in a few dozen lines of declarative source.
The teams that learn to do this will be the teams that ship faster, not slower. They will not need three meetings to understand whether a change is compliant. The compiler will tell them. They will not need a quarterly compliance review to find drift. There will be no drift to find.
Where the bug gets caught
There is a strange asymmetry in the cost of bugs. Anyone who has worked in software long enough has an intuitive feel for it, and almost no one outside software has been told about it explicitly.
Figure 4. The same bug, caught in three different places.
A bug caught at compile time costs essentially nothing. The author sees a red squiggle, fixes it, moves on. The bug never enters the system. There is no incident, no rollback, no post-mortem. Most engineers do not even count compile-time errors as bugs. They are part of the act of writing.
A bug caught at deployment time costs a deployment. Maybe an evening. Maybe a rollback and a hotfix. Annoying, but contained. The damage is internal — engineering pays the cost, the customer never sees it.
A bug that escapes into production costs whatever the bug actually does. If it sends a wire to the wrong account, it costs the wire plus the recovery plus the regulatory disclosure plus the trust. If it deletes a customer’s data, it costs the data plus the trust plus the lawsuit plus possibly the company. The cost in production is not bounded by anything inside the system. It is bounded only by what the world makes of the bug.
Each layer outward is roughly an order of magnitude more expensive than the one inside it. This is not a precise number. The point is the shape: the cost of a bug grows roughly exponentially with how late you catch it.
Most agentic systems being shipped today do not have a compile step at all. The platform interprets a configuration. The configuration is checked, in some places, but not as a whole. The checks are at runtime, at deployment time, sometimes only when production traffic hits a particular path. The cost shape of bugs in these systems is therefore the worst possible shape: most bugs are caught in production, where they are most expensive.
This is a choice the platforms make. It is a defensible choice for a research prototype. It is an indefensible choice for a workflow that processes claims, transfers money, or makes hiring decisions. The teams that pick the platform without a compile step will, on average, pay an order of magnitude more for their bugs than teams that pick the platform with one. Over a few years, this difference shows up in everything: cost-of-incidents, time-to-recover, regulatory standing, engineering morale.
It is one of those things that is not visible until it is. And then it is the only thing that matters.
Cost is a number the program knows about
Here is a small thing that is, when you sit with it, an enormous thing.
In every conventional engineering discipline — civil, mechanical, electrical, chemical — the cost envelope of a system is part of the system’s specification. A bridge has a cost. A circuit has a power budget. A process plant has a throughput target. These are not numbers a finance team monitors after construction. They are constraints the design has to honour, in the design phase, on paper, before anything is built.
In software, somehow, this got lost.
Figure 5. The budget is part of the program graph, not a sidecar.
Most software systems, including the agentic systems being built today, treat cost as an emergent property. The system runs. Bills accrue. A dashboard somewhere — or, in the worst cases, an invoice at the end of the month — reports what the system spent. The finance team adjusts. The engineering team apologises. Nobody is held accountable, because nobody wrote down what the cost was supposed to be in the first place.
The fix is to make cost a typed quantity in the program. A first-class construct. A budget node that the workflow references, that the runtime enforces, that the compiler validates. When a step is added that pushes the projected cost above the declared budget, the build fails. When a run starts to exceed its budget, it pauses, alerts, and refuses to continue without explicit approval. The runtime knows what the program is supposed to cost. The program is the source of truth.
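Here is one way that could look, sketched in Python under stated assumptions: the `Budget`, `check_projected_cost`, and `RunMeter` names are hypothetical, the per-step cost estimates are invented numbers, and a real system would need per-day and per-month envelopes as well as the per-run one shown here.

```python
from dataclasses import dataclass
from decimal import Decimal


class BudgetExceededAtBuild(Exception):
    """Projected cost is above the declared budget; the build fails."""


class RunPaused(Exception):
    """A run hit its budget and needs explicit approval to continue."""


@dataclass(frozen=True)
class Budget:
    per_run: Decimal  # declared in the source, next to the workflow


def check_projected_cost(step_costs, budget):
    """Build-time check: sum the estimated step costs against the budget."""
    projected = sum(step_costs.values(), Decimal("0"))
    if projected > budget.per_run:
        raise BudgetExceededAtBuild(
            f"projected {projected} exceeds budget {budget.per_run}"
        )
    return projected


class RunMeter:
    """Run-time check: refuse any charge that would break the budget."""

    def __init__(self, budget):
        self.budget = budget
        self.spent = Decimal("0")
        self.overrun_approved = False

    def approve_overrun(self):
        # Stands in for an explicit human approval step.
        self.overrun_approved = True

    def charge(self, amount):
        over = self.spent + amount > self.budget.per_run
        if over and not self.overrun_approved:
            raise RunPaused(
                f"charge of {amount} would exceed {self.budget.per_run}"
            )
        self.spent += amount


budget = Budget(per_run=Decimal("0.10"))
estimates = {"extract_claim": Decimal("0.02"), "score_claim": Decimal("0.05")}
projected = check_projected_cost(estimates, budget)
```

Adding a third step estimated at 0.05 pushes the projection to 0.12 and fails the build. `Decimal` is used rather than floats so that comparisons against the declared budget are exact.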
This single change does more for FinOps discipline than every dashboard ever built. Because the question stops being “how much did we spend?” — a question that can only be answered after the fact, by an exhausted analyst with three browser tabs open — and becomes “what did we say we’d spend?” — a question with one answer, in the source.
If you want to know whether a team takes its production agents seriously, look for this. If their budget lives in a spreadsheet, they have a hobby. If their budget lives in the source they shipped, they have a system.
What the agentic hype is selling you
Now, with all of this in view, look honestly at what the major agent platforms are selling in 2026.
They are selling you a notebook with extra steps. They are selling you a configuration file in a YAML dialect that nobody compiles, that nobody type-checks, that nobody validates against your governance, that nobody bounds against your budget. They are selling you a beautiful drag-and-drop canvas that produces a workflow that lives only in their database, that you cannot version-control, that you cannot review in a pull request, that you cannot enforce a policy against.
They are selling you, in short, the absence of every layer in the five-layer diagram in Figure 1.
This is not because the people who build these platforms are foolish. They are not. It is because the platforms were built for the demo era — where the goal is to make a thing that produces an impressive output in a controlled setting, fast. They are excellent at this. They are not built for the era we are now actually entering, where the goal is to run a thousand of these things a day in production, on workflows that touch real money and real people and that you have to defend in a hearing.
The platforms will, eventually, grow into this. Some of them are starting to. But the gap between what they ship today and what production-grade process automation requires is much larger than the marketing suggests, and it will be filled by teams who choose the unfashionable option.
The hype cycle has a predictable rhythm. The new thing is exciting because it skips the boring parts. Then production reveals that the boring parts were load-bearing. Then the boring parts get added back, one at a time, painfully, by the teams that survive long enough to need them. The teams that wait for the hype to install the boring parts on their behalf wait a long time, because hype does not install boring parts. Hype installs new features.
The teams that install the boring parts themselves, in advance — that pick the platform with the compiler, that demand the typed contracts, that put the budget in the source — those teams will be the ones still shipping in 2030. They will look, today, like they are moving slowly. In three years they will look like they were moving exactly the right speed.
A craft, not a product
I want to close on something that the previous three pieces have circled around without quite saying.
Process automation is a craft. It is older than the software industry. It is older than the computer industry. It has roots in the time-and-motion studies of the early twentieth century, and before that in the trade guilds, and before that in the apprenticeship traditions of every settled civilisation that ever had to coordinate work across more than one room.
Crafts have a particular shape. They have practitioners who get better with time. They have apprentices who are taught by the people who came before. They have standards that are enforced not by management but by the practitioners themselves, refusing to ship work that does not meet them. They have a sense of what is right and what is sloppy that does not need to be justified to outsiders, because the outsiders cannot tell, and the practitioners can.
The agentic moment is, among other things, a moment when this craft is being threatened by people who do not know it exists. They believe they are inventing process automation from scratch. They are not. They are reinventing it badly, with worse tools, in a hurry, for a market that does not yet know enough to demand otherwise.
The job of anyone who has been around long enough to remember is to keep the craft alive. To insist that the boring parts are load-bearing. To choose, at every fork, the platform that compiles over the platform that interprets, the system that types over the system that hopes, the source that is auditable over the source that is convenient. To pay the small short-term price for the large long-term sanity.
This is not a heroic posture. It is a workmanlike one. There is no glory in it, and there is no Twitter audience for it, and there is no conference circuit for it. There is only the quiet satisfaction, three years on, of looking at a system that is still shipping, that has not had its catastrophe, that the regulators have not flagged, that the customers have not left — and knowing that this is not luck. It is the result of discipline, applied early, when it was unfashionable.
The previous piece in this series ended with the line “look at the shape.”
I want to end this one with the older line that the discipline of engineering keeps coming back to, because every generation has to learn it again.
Build it like it has to last.
The series ends here, for now. If you found these useful, the previous pieces are linked at the top. The discipline they describe is older than I am, and will outlast all of us. The job is to remember it.










