A lightweight, spec-driven workflow for keeping AI coding agents focused, efficient, and reviewable as projects grow.
Since February, I’ve been refining a small workflow for coding with AI agents.
Not because I wanted more process.
Because I was tired of wasting tokens.
If you’ve used a coding agent on a real side project, you probably know the pattern. At the beginning, everything feels almost too good. The project is small. The agent can read most of it. “Plan, then build” works well enough.
Then the project grows.
Suddenly, the same loop starts to feel expensive. The agent reads files it doesn’t need. It rediscovers decisions you already made. It mixes planning with implementation. And even after all that, it can still miss the intent you already had in your head.
I looked at existing spec-driven workflows, including things like spec-kit and BMAD, and I understood why people like them. They give agent work more structure. But for the way I build, they felt a little too heavy—too much ceremony before I could keep moving.
Workflows are at their best when they fit the developer and the specific project, not when they force you into a generic mold. What matters is not a rigid framework but the core ideas that keep an agent focused.
When I first started experimenting with these boundaries, I was using opencode. I later moved to Codex because I preferred its GUI for my daily development loop, and I built a lightweight, spec-driven workflow extension around it. The underlying principles are tool-agnostic, though: they can be applied to almost any modern coding agent environment.
The workflow is designed around a few focused jobs:
- Plan the work explicitly before touching code.
- Explore a larger feature safely before building it.
- Implement exactly one task at a time.
- Review that single task against its spec.
- Write research notes and fuzz-test pure functions.
The fundamental idea is simple: Don’t ask the agent to re-understand the whole project every time. Give it enough project memory, task intent, and quality checks to make the next step reliable, without turning your repository into a bureaucracy.
The Problem Is Not Code Generation
Agents can write code. The harder problem is attention.
I don't anthropomorphize agents. They aren't digital employees filling human roles; they are software processes. They read a specific input state and produce an output state based on an inferred intent.
When you look at them as data pipelines, the default plan/build loop becomes obviously flawed. It asks a single execution loop to process far too many variables at once:
Understand Codebase ➔ Infer Product Intent ➔ Design Change ➔ Implement ➔ Test ➔ Review ➔ Remember Context
That works when the project data footprint is small. It gets much worse once the project has history.
The agent processes context too broadly. It spends its token budget on orientation instead of execution. It forgets decisions that were obvious in a previous session. And because most of the intent lives in ephemeral chat history, every new session starts with a weaker, fuzzier version of the input parameters.
The result is not just slower work. It is worse work, because the agent’s limited context window is being spent in the wrong places. It is an input filtering problem.
The Unit Is a Task Spec
To change the shape of this loop, every meaningful piece of agent work should start with a task spec. Not a giant requirements document. Not a permanent source of truth. Just a temporary contract.
A task spec is highly modular and typically includes:
| Component | Purpose |
|---|---|
| Functional Goal & Non-Goals | What the change must achieve, plus explicit boundaries on what not to touch. |
| User-Facing Behavior | What the end result actually looks like. |
| Technical Plan | Likely code touchpoints and implementation steps. |
| Test Expectations | Verification requirements and expected evidence. |
That small document changes the entire feel of the development loop.
You can plan a task in one session, implement it in another, and review it in a third. The implementation agent doesn’t have to guess what you meant. The reviewer doesn’t have to reconstruct intent from a raw git diff. The next session doesn’t have to dig through a messy chat history to figure out why a change was made.
The task becomes the handoff—and more importantly, a context filter. Instead of letting the agent wander through the whole project, the task gives it a strict reason to read some things and ignore others. That is where the real token savings come from: making context intentional.
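To make the "context filter" idea concrete, here is a minimal sketch in Python. The `TaskSpec` class, its field names, and the `files_in_scope` helper are all hypothetical; they only illustrate how the touchpoints listed in a technical plan could narrow what an agent is asked to read.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class TaskSpec:
    """Hypothetical in-memory form of a task spec (field names are assumptions)."""
    goal: str
    non_goals: list[str] = field(default_factory=list)
    behavior: str = ""
    # Glob patterns taken from the "Technical Plan" part of the spec.
    touchpoints: list[str] = field(default_factory=list)
    test_expectations: list[str] = field(default_factory=list)


def files_in_scope(spec: TaskSpec, repo_files: list[str]) -> list[str]:
    """Keep only the files that match at least one touchpoint pattern."""
    return [
        path for path in repo_files
        if any(fnmatch(path, pattern) for pattern in spec.touchpoints)
    ]


spec = TaskSpec(
    goal="Add rate limiting to the login endpoint",
    non_goals=["Do not change the session storage layer"],
    behavior="A sixth failed login within a minute returns HTTP 429",
    touchpoints=["auth/*.py", "tests/auth/*.py"],
    test_expectations=["New test covering the 429 response passes"],
)

repo_files = [
    "auth/login.py",
    "auth/session.py",
    "billing/invoice.py",
    "tests/auth/test_login.py",
    "README.md",
]

# Only the auth files and their tests survive the filter.
print(files_in_scope(spec, repo_files))
```

The point is not the glob matching itself but the direction of the lookup: the spec decides what gets read, instead of the agent deciding after it has already read everything.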
Intent Specs vs. Project Specs
One distinction matters immensely if you want to keep your workflow fast: Task specs are intent specs, not project specs.
They describe what the project needed at a specific moment. They are highly valuable while the work is being planned, built, and reviewed. They are not meant to become immortal, living architecture documents that you have to maintain forever.
If a task creates durable knowledge, that knowledge should immediately move somewhere durable:
💡 Where Durable Knowledge Belongs:
Source code comments • MAP.md files • Package maps • Official architecture docs • README updates
The task says what should happen while it's happening. The project records what became true. Keeping these separate stops the process from feeling like a chore.
MAP.md Is the Agent’s First Five Minutes
Another vital piece of this puzzle is keeping an orientation layer, like a MAP.md file, at the root of a project or package. This file explains what the project does, what the important files are responsible for, and where specific behaviors live.
It does not replace reading code; it makes the first code read less random.
This is useful even when working alone, since everyone forgets where a specific behavior lives after a few weeks away from a file. For an agent, the payoff is larger. Before it burns thousands of tokens searching a large repository, it reads the map and makes a much better first guess about where to look.
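As a rough illustration of what such a map might contain, the sketch below renders one from a small dictionary. The section layout, the `render_map` helper, and the example project and file names are assumptions made for this example, not a required MAP.md format.

```python
def render_map(project: str, summary: str, entries: dict[str, str]) -> str:
    """Render a small orientation file; the section layout is an assumption."""
    lines = [f"# {project}", "", summary, "", "## Where things live", ""]
    # One bullet per path, stating what that file or folder is responsible for.
    lines += [f"- `{path}`: {role}" for path, role in entries.items()]
    return "\n".join(lines) + "\n"


map_text = render_map(
    project="invoice-service",
    summary="Generates and emails monthly invoices.",
    entries={
        "billing/invoice.py": "builds invoice totals from usage records",
        "billing/mailer.py": "sends rendered invoices by email",
        "tests/billing/": "unit tests for totals and mail formatting",
    },
)
print(map_text)
```

In practice the file would usually be written by hand and kept short; what matters is that each line answers "where does this behavior live?" in a single read.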
Epics Stop Big Features From Becoming One Giant Prompt
For larger features, a healthy workflow uses a container—an Epic. An epic is not just a giant task; it’s an exploration boundary that keeps the agent from building and planning simultaneously.
The flow naturally separates into distinct phases:
- Create an epic for the broader goal.
- Explore the codebase and write an isolated context.md.
- Split the epic into hyper-focused, bite-sized tasks.
- Implement and review one isolated task at a time.
Exploration gets its own artifact. Implementation gets small tasks. Review has something concrete to compare against. The epic context becomes the shared memory for that specific slice of work, preventing the agent from trying to hold an entire system architecture in its head while writing code.
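The phases above could be laid out on disk along these lines. This is only a sketch: the `epics/<slug>/tasks/` directory structure, the file naming, and the `scaffold_epic` helper are assumptions for illustration, and the example writes into a temporary directory rather than a real repository.

```python
from pathlib import Path
import tempfile


def scaffold_epic(root: Path, slug: str, task_titles: list[str]) -> list[Path]:
    """Create a context.md plus one small task file per title (hypothetical layout)."""
    epic_dir = root / "epics" / slug
    tasks_dir = epic_dir / "tasks"
    tasks_dir.mkdir(parents=True, exist_ok=True)

    # Exploration gets its own artifact, separate from any task.
    context = epic_dir / "context.md"
    context.write_text(f"Context for epic: {slug}\n\n(Exploration notes go here.)\n")

    created = [context]
    for number, title in enumerate(task_titles, start=1):
        file_name = f"{number:02d}-{title.lower().replace(' ', '-')}.md"
        task_file = tasks_dir / file_name
        task_file.write_text(f"Task: {title}\nStatus: planned\n")
        created.append(task_file)
    return created


with tempfile.TemporaryDirectory() as tmp:
    paths = scaffold_epic(Path(tmp), "rate-limiting", ["Add limiter", "Wire into login"])
    names = [p.relative_to(tmp).as_posix() for p in paths]

print(names)
```

Each task file then only needs to reference the shared context.md instead of repeating the exploration.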
The Review Pipeline: Decoupling Execution States
The review step is where this input/output approach truly pays off. Without a written task spec, evaluating an agent's output often defaults to a vague, vibe-based question: "Does this code look okay?"
With a task spec, review becomes a strict, binary data validation problem: "Does the output state satisfy the input contract?"
To make this reliable, the review process should be handled by a completely separate agent invocation with a read-only posture.
[Task Spec + Diff] ➔ Ingestion ➔ Read-Only Validation Pipeline ➔ Pass/Fail + Discrepancies
When you ask a single agent session to both implement code and review its own changes, the context window becomes muddy. The generation bias pollutes the evaluation.
By passing the task spec and the resulting code diff into a fresh, isolated process that has no write permissions, you change the nature of the evaluation. Its processing loop isn't trying to figure out how to write the next line of code; its sole instruction is to parse the diff as data and validate it against the spec requirements, technical plan, and verification evidence.
In my experience, decoupling the generation pipeline from the validation pipeline makes review results noticeably more predictable.
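A minimal sketch of the two ends of that validation step is shown below. Everything here is an assumption for illustration: the prompt wording, the "PASS or FAIL on the first line" reply format, and the `build_review_input` and `parse_verdict` helpers. The call to the agent itself is deliberately left out, since it depends on the tool you use.

```python
from dataclasses import dataclass


@dataclass
class ReviewResult:
    passed: bool
    discrepancies: list[str]


def build_review_input(spec_text: str, diff_text: str) -> str:
    """Combine the task spec and the diff into one read-only review request."""
    return (
        "You are a read-only reviewer. Do not propose or write new code.\n"
        "Check the diff against the task spec.\n"
        "Reply with PASS or FAIL on the first line, then one discrepancy per line.\n\n"
        f"--- TASK SPEC ---\n{spec_text}\n\n--- DIFF ---\n{diff_text}\n"
    )


def parse_verdict(reply: str) -> ReviewResult:
    """Turn the reviewer's reply into a pass/fail result plus discrepancies."""
    lines = [line.strip() for line in reply.strip().splitlines() if line.strip()]
    if not lines or lines[0].upper() not in {"PASS", "FAIL"}:
        # Treat anything unparseable as a failure rather than a silent pass.
        return ReviewResult(False, ["Reviewer reply did not start with PASS or FAIL"])
    # Strip leading bullet dashes and spaces from each discrepancy line.
    discrepancies = [line.lstrip("- ") for line in lines[1:]]
    return ReviewResult(lines[0].upper() == "PASS", discrepancies)


result = parse_verdict("FAIL\n- Missing test for the 429 response")
print(result.passed, result.discrepancies)
```

Defaulting to a failure on an unparseable reply keeps the check strict: a reviewer that drifts off format cannot accidentally approve a change.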
Tailoring the Contract to Your Project
Because every project is different, a good workflow shouldn't be magical or rigid. It needs an adapter layer—a local contract (like a .agents/workflow-config.md) that maps the philosophy to your specific stack.
One project might use make test, another uses npm test. One repository might store research notes under docs/research/, while another skips notes entirely. The adapter simply tells the workflow how this specific project functions, keeping the workflow highly portable without forcing every repository into the exact same layout.
What to Take Away
You don't need to adopt a massive project management framework to make AI coding efficient. You just need to give the agent a few useful boundaries:
- Map the project layout to orient the agent instantly.
- Write down strict intent before starting implementation.
- Build exactly one isolated task at a time.
- Review against the written intent, treating the output as data to be verified.
If your agent sessions keep turning into long, expensive, and frustrating rediscovery loops, try giving the work an explicit contract before you ever ask it to generate a line of code.
If you want to see an example of how this configuration layer and task lifecycle are implemented structurally, check out the open-source repository pattern at gtindo/gt-workflow.