If you use Claude Code or OpenAI Codex heavily, you’ve probably noticed the same thing I did:
AI coding agents are powerful, but they can burn through tokens fast.
Especially on medium or large repositories, the default workflow tends to create long autonomous loops:
- analyze the repo
- implement code
- run tests
- fix errors
- repeat until done
That sounds productive in theory.
In practice, it often means:
- repeated repo scans
- repeated tool output
- repeated reasoning loops
- unpredictable usage spikes
And all of that costs tokens.
What helped me was not a better model, but a better workflow.
This simple change gave me three benefits:
- significantly lower token usage
- much more structured AI development
- cheap hybrid workflows across multiple models
The core idea is simple:
- use a task-based workflow
- keep a small .ai/ workspace inside your repo
The Problem With Default AI Coding Workflows
Most AI coding agents are allowed to do too much in one go.
You ask for a feature, and the model tries to solve the whole thing autonomously:
- understand the repository
- create a plan
- implement everything
- debug everything
- review everything
That creates long execution loops where the model keeps sending large amounts of context back and forth.
On a bigger codebase, that usually means:
- it rescans the repo
- it revisits files it already saw
- it repeats reasoning it already did
- it keeps trying to “finish the whole job” in one session
That’s where token usage explodes.
The Fix: Task-Based AI Development
Instead of letting the agent solve the entire feature in one autonomous loop, break the work into small controlled steps.
The workflow becomes:
- plan the feature
- generate structured tasks
- execute tasks one at a time
- review changes
That changes the model’s behavior completely.
Instead of endless loops, you get short, focused operations.
That alone cuts down token usage dramatically.
The .ai/ Workspace
Inside the repository, I add a small workspace like this:
    .ai/
    ├── plan.md
    ├── tasks/
    ├── tasks-done/
    ├── repo-map.md
    └── context.md
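The layout above can be bootstrapped in seconds; here is a minimal sketch (file names follow the convention described in this post, adjust to taste):

```shell
# Create the .ai/ workspace skeleton: two task directories and three notes files.
mkdir -p .ai/tasks .ai/tasks-done
touch .ai/plan.md .ai/repo-map.md .ai/context.md
```

Committing these files keeps the workflow visible to every agent and every teammate.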
Each file has a very specific purpose.
plan.md
High-level feature planning.
tasks/
Individual implementation tasks.
For example:
    .ai/tasks/01-auth-service.md
    .ai/tasks/02-oauth-provider.md
    .ai/tasks/03-login-ui.md
Each task contains things like:
- files to modify
- expected code structure
- verification steps
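For instance, 01-auth-service.md might look like this (paths and commands are illustrative, not a required schema):

```markdown
# Task 01: auth service

## Files to modify
- src/auth/service.ts   (hypothetical path)

## Expected structure
- An AuthService with login() and logout() methods

## Verification
- npm test -- auth
- npm run lint
```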
tasks-done/
Finished tasks get moved here.
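Completing a task is then just a file move; for example (using the illustrative task name from above):

```shell
# Archive a finished task so .ai/tasks/ only contains open work.
mkdir -p .ai/tasks .ai/tasks-done      # ensure the workspace exists
touch .ai/tasks/01-auth-service.md     # placeholder task for this example
mv .ai/tasks/01-auth-service.md .ai/tasks-done/
```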
repo-map.md
A lightweight overview of the project structure.
This is useful because the model no longer has to scan the whole repository every time. It can use the repo map as a compact reference instead.
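One cheap way to seed repo-map.md is a shallow directory listing; this sketch uses find (more portable than tree) and skips common noise directories:

```shell
# Write a compact overview of the project structure into the repo map:
# directories only, two levels deep, excluding VCS and dependency folders.
mkdir -p .ai
find . -maxdepth 2 -type d \
  -not -path '*/.git*' -not -path '*/node_modules*' \
  | sort > .ai/repo-map.md
```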
context.md
A place for extra project-specific context the model may need repeatedly.
Add Workflow Rules With CLAUDE.md and AGENTS.md
To make this work, the agent needs explicit instructions.
- Claude Code reads CLAUDE.md
- Codex reads AGENTS.md
These files define how the agent should behave.
For example:
Planner
- create the implementation plan
- write the plan to .ai/plan.md
- generate tasks

Coder
- implement one task at a time
- modify only files needed for that task

Tester
- run tests and lint checks

Reviewer
- review the diff and improve code quality
The most important rule is this:
Never implement the entire feature in a single loop. Always work task-by-task.
That sounds small, but it makes a big difference.
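Put together, a minimal CLAUDE.md (or AGENTS.md) might look like this; the wording is a sketch based on the roles above, not a fixed format:

```markdown
# Workflow rules

- Planner: write the implementation plan to .ai/plan.md, then generate task files in .ai/tasks/.
- Coder: implement exactly one task at a time; modify only the files that task names.
- Tester: run the task's verification steps (tests, lint) after each task.
- Reviewer: review the resulting diff and improve code quality.

Never implement the entire feature in a single loop. Always work task-by-task.
Prefer .ai/repo-map.md over rescanning the whole repository.
```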
Example Prompts
After asking for a feature, I prompt the agent like this:
> Based on my request, create or update .ai/plan.md and create task files in .ai/tasks/. Do not implement anything yet. Stop after the plan and tasks are created. Use a timestamp prefix YYYYMMDD-HHMM- to ensure unique filenames.
Then I move into execution with:
> Implement the tasks from .ai/tasks/ sequentially. Only modify files required for the current task. Complete one task, run the verification steps, then stop.
This keeps the workflow bounded and predictable.
Why This Saves Tokens
Token usage usually doesn’t explode during planning.
It explodes during execution loops.
A rough pattern looks like this:
- Planning → small
- Execution loops → very large
- Review → small
By forcing the model to:
- implement one task at a time
- avoid scanning the full repo repeatedly
- stop after each step
you eliminate most of the expensive loops.
The difference is surprisingly noticeable.
A Nice Side Effect: Better Engineering Discipline
The token savings are great.
But the second benefit might actually be more important:
your AI workflow becomes much more organized.
Instead of chaotic autonomous behavior, the AI starts to behave more like a small engineering team:
planner → coder → tester → reviewer
The .ai/tasks/ files act like lightweight tickets.
Even if you switch context, come back later, or hand work off to another model, you still know:
- what task is next
- which files should change
- how to verify the result
That’s a much cleaner way to work than relying on one giant chat session and hoping the model remembers the right things.
This Also Unlocks Multi-Agent, Multi-Model Workflows
This is where it gets even more interesting.
Because the workflow lives in markdown files instead of inside one tool’s memory, it becomes portable.
That means you can switch between Anthropic, OpenAI, and other providers depending on the task.
For example:
- Terminal 1 (Claude Code): planning, architectural reasoning, code review
- Terminal 2 (Codex): brute-force execution, writing tests, handling concurrent tasks
And if you want even more flexibility, you can route tools through something like LiteLLM + OpenRouter.
That gives you a universal interface to many different models, so you can choose the cheapest, fastest, or strongest option depending on the task.
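As an illustration, a LiteLLM proxy config.yaml that exposes one alias backed by OpenRouter looks roughly like this (the model slug is an assumption, check OpenRouter's catalog for current names):

```yaml
# Route a single model alias through OpenRouter via the LiteLLM proxy.
model_list:
  - model_name: coding-agent                      # the name your tools will call
    litellm_params:
      model: openrouter/anthropic/claude-sonnet-4 # assumed OpenRouter slug
      api_key: os.environ/OPENROUTER_API_KEY      # read from the environment
```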
Because the plan and task files live in .ai/tasks/ as plain text, any agent can pick them up and continue from the same shared workflow.
That means:
- less vendor lock-in
- fewer rate-limit bottlenecks
- better cost control
- easier experimentation with multiple models
In other words:
the future of AI coding probably isn’t one magical agent.
It’s a structured workspace where multiple agents can collaborate.
The Result
With this setup:
- Claude Code and Codex usage become much more predictable
- token consumption drops significantly
- development becomes more structured
- hybrid multi-model workflows become easy
Instead of one AI agent running uncontrolled loops, you get something closer to a small AI development team working through clearly defined tasks.
And the best part is that the setup is simple:
just a few files and a small repository convention.
Try It Yourself
I published a small bootstrap script that sets this workflow up automatically in a repository by generating both CLAUDE.md and AGENTS.md.
https://github.com/nils-fl/AgenticCodingInit
If you're using Claude Code or Codex heavily, it’s worth trying.
A small change in workflow can make a surprisingly large difference in both cost and productivity.
Final Thought
A lot of people assume the solution to expensive AI coding is “use a better model.”
I’m increasingly convinced the real solution is better process design.
Smaller loops.
Clearer tasks.
Less context thrashing.
More discipline.
That’s how you stop burning tokens.