The previous post focused on failure modes: where agents fail, and where we fail when reviewing their output. This follow-up shifts from diagnosis to design.
What follows is a sample workflow that is explicitly structured to counter those failure modes. It is not the only way to do this, and it is not meant to be permanent. It is one workable design intended to turn known risks into explicit constraints and force verification back into the loop.
This post assumes familiarity with the earlier analysis. If you haven’t read it, the failure modes this workflow responds to are covered here:
Designing agentic workflows: where agents fail and where we fail
What this workflow is (and isn’t)
This workflow is designed for developers who:
- are new to agentic coding and want a cautious, structured starting point
- need reviewable commits and explicit verification gates
- work in enterprise environments with audit, compliance, or production risk
- want to prevent common AI failure modes such as premature completion claims, silent test deletion, and shallow or hard-coded implementations
It is not designed for:
- exploratory hacking
- green-field personal projects
- environments where reviewability and accountability are optional
As tools improve and teams gain experience, much of this should be streamlined or removed. This is scaffolding, not a permanent prescription.
The key constraint this workflow enforces
This workflow is built around a single constraint that was implicit in the first post but not yet operationalised:
verification must be independent of the language model.
The failure modes discussed earlier cluster around the same structural issue: the system is allowed to participate in judging its own success. Confidence, summaries, and “done” signals become substitutes for evidence.
This workflow explicitly separates proposal from verification.
The agent can propose changes, generate code, and assemble artifacts. Verification is performed independently using external tools such as test runners, linters, type checkers, static analysis, and real execution.
The intent is trust, but verify: the model is trusted to propose changes and assemble artifacts, while tools with different incentives handle verification.
Why a workflow is necessary at all
Agentic systems optimise against what they can observe:
- green tests
- plausible diffs
- explicit completion signals
Humans under review pressure tend to do the same.
A workflow has to deliberately change the optimisation surface by:
- bounding work into small, reviewable units
- making intent explicit and durable
- forcing independent, machine-verifiable evidence before claims of completion
- preventing large, ambiguous diffs from accumulating
Without these constraints, a workflow does not materially change those outcomes.
The core design principles
All variants of the workflow follow the same underlying rules, regardless of tool.
1. Intent is captured before execution
Each task starts with a written intent that defines:
- what is being changed
- what is explicitly not being changed
- what success looks like
- what evidence will be used to verify it
This prevents intent from living only in conversation history and reduces the chance of silent scope drift.
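The intent record above can be sketched as a small data structure. The field names and example values here are illustrative only; any durable format (a markdown template, YAML front matter) works equally well, and nothing here is prescribed by the workflow repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskIntent:
    """Durable record of what a task is allowed to change.

    Frozen so the intent cannot drift silently mid-task; a new
    intent means a new record, reviewed like any other artifact.
    """
    change: str                  # what is being changed
    out_of_scope: list[str]      # what is explicitly NOT being changed
    success_criteria: list[str]  # what success looks like
    evidence: list[str]          # machine-verifiable proof required


# Hypothetical example task, not taken from the repository:
intent = TaskIntent(
    change="Add retry with backoff to the payment client",
    out_of_scope=["No changes to the public API", "No new dependencies"],
    success_criteria=["Transient 503s are retried up to 3 times"],
    evidence=["pytest tests/test_payment_retry.py", "mypy src/payment"],
)
```

Because the record is written before execution, a reviewer can later diff the actual change against `out_of_scope` instead of reconstructing intent from chat history.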
2. Planning and execution are separated
The agent does not immediately start modifying code.
There is a planning step that produces an explicit, reviewable plan. Only after that plan is accepted does execution begin.
This keeps architectural and behavioural decisions visible and reduces surprise diffs.
3. Changes are deliberately small
Each loop is constrained to a narrow scope:
- one concern
- one behavioural change
- one verification target
This keeps review within human cognitive limits and avoids the shift from verification to plausibility that occurs as diffs grow.
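One way to make the scope constraint mechanical rather than aspirational is a diff-size budget. This is a minimal sketch; the thresholds are illustrative defaults I chose for the example, not recommendations from the repository, and in practice the numbers would come from `git diff --shortstat` in a pre-commit or CI hook.

```python
def within_review_budget(files_changed: int, lines_changed: int,
                         max_files: int = 5, max_lines: int = 200) -> bool:
    """Return True if a diff fits the chosen review budget.

    Limits are deliberately crude: the point is to force a split
    before review, not to measure complexity precisely.
    """
    return files_changed <= max_files and lines_changed <= max_lines


# A 3-file, 120-line diff fits; a 12-file, 900-line diff must be
# split into smaller loops before it reaches a reviewer.
```

The gate does not judge the content of the diff at all; it only refuses to let a loop accumulate more change than a human can actually verify.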
4. Verification is independent and machine-verifiable
Completion requires evidence produced by tools that are not the language model.
Examples include:
- test execution results
- static analysis outputs
- type-checker passes or failures
- runtime traces or logs from real execution
The model’s explanations and summaries provide context, not verification. This is a deliberate application of trust, but verify.
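A verification gate of this kind can be sketched as a runner that only trusts exit codes. The commands below are stand-ins I invented for the example; a real gate would invoke pytest, mypy, a linter, and so on, as configured by the team. The essential property is that the model's own output never appears anywhere in the check.

```python
import subprocess
import sys


def run_verification_gate(commands: list[list[str]]) -> bool:
    """Run each external check; pass only if every tool exits 0.

    Evidence is the tool's exit code, never a model-generated
    summary of what the tool "would" say.
    """
    failed = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            failed.append((cmd, proc.returncode))
    for cmd, rc in failed:
        print(f"FAILED ({rc}): {' '.join(cmd)}")
    return not failed


# Stand-in checks for demonstration: one passing, one failing.
gate_passed = run_verification_gate([
    [sys.executable, "-c", "assert 1 + 1 == 2"],             # stands in for the test suite
    [sys.executable, "-c", "import this_module_is_missing"], # a deliberately failing check
])
```

Because one stand-in check fails, `gate_passed` is false and the loop cannot claim completion, regardless of how confident the model's accompanying summary sounds.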
5. Cleanup is mandatory
Every loop ends with a cleanup pass:
- remove temporary scaffolding
- remove dead code
- consolidate overlapping helpers
- update comments to match behaviour
Cleanup is treated as part of correctness rather than a cosmetic improvement. Residue compounds review cost and cognitive load over time.
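The cleanup pass can be partly automated with a residue scan. The marker convention below (`# TEMP`, `# SCAFFOLD`, `# DEBUG-ONLY`) is a hypothetical team convention, not part of the workflow repository; the point is that scaffolding is tagged when written so it can be found mechanically when the loop closes.

```python
import re

# Hypothetical marker convention; adapt to whatever your team tags
# temporary scaffolding with.
SCAFFOLD_MARKERS = re.compile(r"#\s*(TEMP|SCAFFOLD|DEBUG-ONLY)\b")


def find_residue(source: str) -> list[int]:
    """Return 1-based line numbers still carrying scaffolding markers."""
    return [i for i, line in enumerate(source.splitlines(), start=1)
            if SCAFFOLD_MARKERS.search(line)]


snippet = """\
def handler(event):
    # TEMP: remove once the real queue is wired up
    return {"status": "ok"}
"""
```

Here `find_residue(snippet)` flags line 2, turning "did we clean up?" from a memory exercise into a checkable condition at the end of every loop.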
The sample workflows
The repository contains several variants of the same workflow adapted for different tools. Structurally, they are identical.
https://github.com/daniel-butler-irl/sample-agentic-workflows
Each workflow documents:
- the phases of the loop
- what the agent is allowed to do in each phase
- what the human is expected to review
- what artifacts must exist before progressing
There is also a generic methodology document in the repository (docs/methodology.md) that describes the workflow shape and constraints in a tool-agnostic way, using command-style notation to illustrate phases.
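The "artifacts must exist before progressing" rule can be sketched as a simple phase gate. The phase names and file names below are illustrative assumptions of mine; the repository's methodology document uses its own command-style notation rather than these fixed paths.

```python
from pathlib import Path

# Illustrative artifact requirements per phase; not the repository's
# actual file layout.
REQUIRED_ARTIFACTS = {
    "plan":    ["intent.md"],
    "execute": ["intent.md", "plan.md"],
    "verify":  ["intent.md", "plan.md", "evidence/"],
}


def missing_artifacts(phase: str, root: Path) -> list[str]:
    """List required artifacts that do not yet exist for a phase.

    An empty result means the phase may begin; anything else blocks
    progression until the artifact is produced and reviewed.
    """
    return [name for name in REQUIRED_ARTIFACTS[phase]
            if not (root / name).exists()]
```

The gate is deliberately dumb: it does not judge artifact quality (that is the human's job in review), it only refuses to let a phase start without its inputs on disk.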
The important part is not the syntax. It’s the shape of the loop.
This post intentionally does not walk through the workflow step by step. The mechanics are documented in the repository. A follow-up post will walk through one variant in detail and show how the constraints described here are enforced in practice.
How this mitigates the earlier failure modes
This workflow does not attempt to fix the model. It changes the environment the model operates in.
Small scopes, durable intent, and independent verification reduce the surface area where:
- requirements can disappear silently
- shallow implementations can pass unnoticed
- reviewers are pushed into plausibility checks
- architectural decisions slip through unexamined
The workflow assumes these failures will occur if the structure allows them. Its job is to make them harder to hide and cheaper to detect.
This is a starting point
If you already have strong internal workflows, you may only need pieces of this.
If you are early in agentic coding adoption, starting with something like this avoids learning the hard lessons in production.
As tools add better built-in guardrails, some of this will become redundant. Until then, workflow design remains the most reliable control surface we have.
This repository is meant to be copied, adapted, and eventually outgrown.
This is one way to do it — not the only way.