The previous post focused on failure modes: where agents fail, and where we fail when reviewing their output. This follow-up shifts from diagnosis to design.
What follows is a sample workflow that is explicitly structured to counter those failure modes. It is not the only way to do this, and it is not meant to be permanent. It is one workable design intended to turn known risks into explicit constraints and force verification back into the loop.
This post assumes familiarity with the earlier analysis. If you haven’t read it, the failure modes this workflow responds to are covered here:
Designing agentic workflows: where agents fail and where we fail
What this workflow is (and isn’t)
This workflow is designed for developers who:
- are new to agentic coding and want a cautious, structured starting point
- need reviewable commits and explicit verification gates
- work in enterprise environments with audit, compliance, or production risk
- want to prevent common AI failure modes such as premature completion claims, silent test deletion, and shallow or hard-coded implementations
It is not designed for:
- exploratory hacking
- green-field personal projects
- environments where reviewability and accountability are optional
As tools improve and teams gain experience, much of this should be streamlined or removed. This is scaffolding, not a permanent prescription.
The key constraint this workflow enforces
This workflow is built around a single constraint that was implicit in the first post but not yet operationalised:
verification must be independent of the language model.
The failure modes discussed earlier cluster around the same structural issue: the system is allowed to participate in judging its own success. Confidence, summaries, and “done” signals become substitutes for evidence.
This workflow explicitly separates proposal from verification.
The agent can propose changes, generate code, and assemble artifacts. Verification is performed independently using external tools such as test runners, linters, type checkers, static analysis, and real execution.
The intent is trust, but verify: the model is trusted to propose changes and assemble artifacts, while tools with different incentives handle verification.
Why a workflow is necessary at all
Agentic systems optimise against what they can observe:
- green tests
- plausible diffs
- explicit completion signals
Humans under review pressure tend to do the same.
A workflow has to deliberately change the optimisation surface by:
- bounding work into small, reviewable units
- making intent explicit and durable
- forcing independent, machine-verifiable evidence before claims of completion
- preventing large, ambiguous diffs from accumulating
Without these constraints, a workflow does not materially change those outcomes.
The core design principles
All variants of the workflow follow the same underlying rules, regardless of tool.
1. Intent is captured before execution
Each task starts with a written intent that defines:
- what is being changed
- what is explicitly not being changed
- what success looks like
- what evidence will be used to verify it
This prevents intent from living only in conversation history and reduces the chance of silent scope drift.
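The intent record above can be sketched as a small data structure. The field names and example values here are illustrative only; any durable format (a markdown template, YAML front matter) works equally well, and nothing here is prescribed by the workflow repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskIntent:
    """Durable record of what a task is allowed to change.

    Frozen so the intent cannot drift silently mid-task; a new
    intent means a new record, reviewed like any other artifact.
    """
    change: str                  # what is being changed
    out_of_scope: list[str]      # what is explicitly NOT being changed
    success_criteria: list[str]  # what success looks like
    evidence: list[str]          # machine-verifiable proof required


# Hypothetical example task, not taken from the repository:
intent = TaskIntent(
    change="Add retry with backoff to the payment client",
    out_of_scope=["No changes to the public API", "No new dependencies"],
    success_criteria=["Transient 503s are retried up to 3 times"],
    evidence=["pytest tests/test_payment_retry.py", "mypy src/payment"],
)
```

Because the record is written before execution, a reviewer can later diff the actual change against `out_of_scope` instead of reconstructing intent from chat history.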
2. Planning and execution are separated
The agent does not immediately start modifying code.
There is a planning step that produces an explicit, reviewable plan. Only after that plan is accepted does execution begin.
This keeps architectural and behavioural decisions visible and reduces surprise diffs.
3. Changes are deliberately small
Each loop is constrained to a narrow scope:
- one concern
- one behavioural change
- one verification target
This keeps review within human cognitive limits and avoids the shift from verification to plausibility that occurs as diffs grow.
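One way to make the scope constraint mechanical rather than aspirational is a diff-size budget. This is a minimal sketch; the thresholds are illustrative defaults I chose for the example, not recommendations from the repository, and in practice the numbers would come from `git diff --shortstat` in a pre-commit or CI hook.

```python
def within_review_budget(files_changed: int, lines_changed: int,
                         max_files: int = 5, max_lines: int = 200) -> bool:
    """Return True if a diff fits the chosen review budget.

    Limits are deliberately crude: the point is to force a split
    before review, not to measure complexity precisely.
    """
    return files_changed <= max_files and lines_changed <= max_lines


# A 3-file, 120-line diff fits; a 12-file, 900-line diff must be
# split into smaller loops before it reaches a reviewer.
```

The gate does not judge the content of the diff at all; it only refuses to let a loop accumulate more change than a human can actually verify.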
4. Verification is independent and machine-verifiable
Completion requires evidence produced by tools that are not the language model.
Examples include:
- test execution results
- static analysis outputs
- type-checker passes or failures
- runtime traces or logs from real execution
The model’s explanations and summaries provide context, not verification. This is a deliberate application of trust, but verify.
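A verification gate of this kind can be sketched as a runner that only trusts exit codes. The commands below are stand-ins I invented for the example; a real gate would invoke pytest, mypy, a linter, and so on, as configured by the team. The essential property is that the model's own output never appears anywhere in the check.

```python
import subprocess
import sys


def run_verification_gate(commands: list[list[str]]) -> bool:
    """Run each external check; pass only if every tool exits 0.

    Evidence is the tool's exit code, never a model-generated
    summary of what the tool "would" say.
    """
    failed = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            failed.append((cmd, proc.returncode))
    for cmd, rc in failed:
        print(f"FAILED ({rc}): {' '.join(cmd)}")
    return not failed


# Stand-in checks for demonstration: one passing, one failing.
gate_passed = run_verification_gate([
    [sys.executable, "-c", "assert 1 + 1 == 2"],             # stands in for the test suite
    [sys.executable, "-c", "import this_module_is_missing"], # a deliberately failing check
])
```

Because one stand-in check fails, `gate_passed` is false and the loop cannot claim completion, regardless of how confident the model's accompanying summary sounds.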
5. Cleanup is mandatory
Every loop ends with a cleanup pass:
- remove temporary scaffolding
- remove dead code
- consolidate overlapping helpers
- update comments to match behaviour
Cleanup is treated as part of correctness rather than a cosmetic improvement. Residue compounds review cost and cognitive load over time.
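The cleanup pass can be partly automated with a residue scan. The marker convention below (`# TEMP`, `# SCAFFOLD`, `# DEBUG-ONLY`) is a hypothetical team convention, not part of the workflow repository; the point is that scaffolding is tagged when written so it can be found mechanically when the loop closes.

```python
import re

# Hypothetical marker convention; adapt to whatever your team tags
# temporary scaffolding with.
SCAFFOLD_MARKERS = re.compile(r"#\s*(TEMP|SCAFFOLD|DEBUG-ONLY)\b")


def find_residue(source: str) -> list[int]:
    """Return 1-based line numbers still carrying scaffolding markers."""
    return [i for i, line in enumerate(source.splitlines(), start=1)
            if SCAFFOLD_MARKERS.search(line)]


snippet = """\
def handler(event):
    # TEMP: remove once the real queue is wired up
    return {"status": "ok"}
"""
```

Here `find_residue(snippet)` flags line 2, turning "did we clean up?" from a memory exercise into a checkable condition at the end of every loop.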
The sample workflows
The repository contains several variants of the same workflow adapted for different tools. Structurally, they are identical.
https://github.com/daniel-butler-irl/sample-agentic-workflows
Each workflow documents:
- the phases of the loop
- what the agent is allowed to do in each phase
- what the human is expected to review
- what artifacts must exist before progressing
There is also a generic methodology document in the repository (docs/methodology.md) that describes the workflow shape and constraints in a tool-agnostic way, using command-style notation to illustrate phases.
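The "artifacts must exist before progressing" rule can be sketched as a simple phase gate. The phase names and file names below are illustrative assumptions of mine; the repository's methodology document uses its own command-style notation rather than these fixed paths.

```python
from pathlib import Path

# Illustrative artifact requirements per phase; not the repository's
# actual file layout.
REQUIRED_ARTIFACTS = {
    "plan":    ["intent.md"],
    "execute": ["intent.md", "plan.md"],
    "verify":  ["intent.md", "plan.md", "evidence/"],
}


def missing_artifacts(phase: str, root: Path) -> list[str]:
    """List required artifacts that do not yet exist for a phase.

    An empty result means the phase may begin; anything else blocks
    progression until the artifact is produced and reviewed.
    """
    return [name for name in REQUIRED_ARTIFACTS[phase]
            if not (root / name).exists()]
```

The gate is deliberately dumb: it does not judge artifact quality (that is the human's job in review), it only refuses to let a phase start without its inputs on disk.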
The important part is not the syntax. It’s the shape of the loop.
This post intentionally does not walk through the workflow step by step. The mechanics are documented in the repository. A follow-up post will walk through one variant in detail and show how the constraints described here are enforced in practice.
How this mitigates the earlier failure modes
This workflow does not attempt to fix the model. It changes the environment the model operates in.
Small scopes, durable intent, and independent verification reduce the surface area where:
- requirements can disappear silently
- shallow implementations can pass unnoticed
- reviewers are pushed into plausibility checks
- architectural decisions slip through unexamined
The workflow assumes these failures will occur if the structure allows them. Its job is to make them harder to hide and cheaper to detect.
This is a starting point
If you already have strong internal workflows, you may only need pieces of this.
If you are early in agentic coding adoption, starting with something like this avoids learning the hard lessons in production.
As tools add better built-in guardrails, some of this will become redundant. Until then, workflow design remains the most reliable control surface we have.
This repository is meant to be copied, adapted, and eventually outgrown.
This is one way to do it — not the only way.