DEV Community

Nils Flaschel

Stop Burning Tokens in AI Coding Agents (Claude & Codex)

If you use Claude Code or OpenAI Codex heavily, you’ve probably noticed the same thing I did:

AI coding agents are powerful, but they can burn through tokens fast.

Especially on medium or large repositories, the default workflow tends to create long autonomous loops:

  1. analyze the repo
  2. implement code
  3. run tests
  4. fix errors
  5. repeat until done

That sounds productive in theory.

In practice, it often means:

  • repeated repo scans
  • repeated tool output
  • repeated reasoning loops
  • unpredictable usage spikes

And all of that costs tokens.

What helped me was not a better model, but a better workflow.

This simple change gave me three benefits:

  1. significantly lower token usage
  2. much more structured AI development
  3. cheap hybrid workflows across multiple models

The core idea is simple:

  • use a task-based workflow
  • keep a small .ai/ workspace inside your repo

The Problem With Default AI Coding Workflows

Most AI coding agents are allowed to do too much in one go.

You ask for a feature, and the model tries to solve the whole thing autonomously:

  • understand the repository
  • create a plan
  • implement everything
  • debug everything
  • review everything

That creates long execution loops where the model keeps sending large amounts of context back and forth.

On a bigger codebase, that usually means:

  • it rescans the repo
  • it revisits files it already saw
  • it repeats reasoning it already did
  • it keeps trying to “finish the whole job” in one session

That’s where token usage explodes.


The Fix: Task-Based AI Development

Instead of letting the agent solve the entire feature in one autonomous loop, break the work into small controlled steps.

The workflow becomes:

  1. plan the feature
  2. generate structured tasks
  3. execute tasks one at a time
  4. review changes

That changes the model’s behavior completely.

Instead of endless loops, you get short, focused operations.

That alone cuts down token usage dramatically.


The .ai/ Workspace

Inside the repository, I add a small workspace like this:

.ai/
  plan.md
  tasks/
  tasks-done/
  repo-map.md
  context.md
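Setting the workspace up takes two commands; a minimal sketch matching the layout above:

```shell
# Create the .ai/ workspace skeleton inside the repository
mkdir -p .ai/tasks .ai/tasks-done

# Seed the planning and context files so agents can find them
touch .ai/plan.md .ai/repo-map.md .ai/context.md
```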

Each file has a very specific purpose.

plan.md

High-level feature planning.

tasks/

Individual implementation tasks.

For example:

.ai/tasks/01-auth-service.md
.ai/tasks/02-oauth-provider.md
.ai/tasks/03-login-ui.md

Each task contains things like:

  • files to modify
  • expected code structure
  • verification steps
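A concrete task file might look like the sketch below, written as a shell heredoc so it can be scripted. The file paths, class name, and test command inside it are illustrative placeholders, not part of any fixed schema:

```shell
mkdir -p .ai/tasks

# Sketch of one task file; adapt the fields to your project
cat > .ai/tasks/01-auth-service.md <<'EOF'
# Task 01: Auth service

## Files to modify
- src/auth/service.ts  (placeholder path)

## Expected structure
- An AuthService class exposing login() and logout()

## Verification
- run the project's auth test suite
EOF
```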

tasks-done/

Finished tasks get moved here.

repo-map.md

A lightweight overview of the project structure.

This is useful because the model no longer has to scan the whole repository every time. It can use the repo map as a compact reference instead.
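How you generate the map is up to you. A minimal sketch using plain `find` (in a git repo, `git ls-files` or `tree` work just as well):

```shell
# Write a shallow directory listing into the repo map,
# skipping hidden directories like .git
mkdir -p .ai
find . -maxdepth 2 -type d -not -path '*/.*' | sort > .ai/repo-map.md
```

Regenerate it whenever the project structure changes meaningfully, so the model reads a fresh map instead of rescanning the tree.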

context.md

A place for extra project-specific context the model may need repeatedly.


Add Workflow Rules With CLAUDE.md and AGENTS.md

To make this work, the agent needs explicit instructions.

  • Claude Code reads CLAUDE.md
  • Codex reads AGENTS.md

These files define how the agent should behave.

For example:

Planner

  • create implementation plan
  • write the plan to .ai/plan.md
  • generate tasks

Coder

  • implement one task at a time
  • modify only files needed for that task

Tester

  • run tests and lint checks

Reviewer

  • review the diff and improve code quality

The most important rule is this:

Never implement the entire feature in a single loop. Always work task-by-task.

That sounds small, but it makes a big difference.
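Put together, a minimal CLAUDE.md (or AGENTS.md) encoding these roles and the task-by-task rule could look roughly like this; the exact wording is up to you:

```shell
# Sketch of a minimal rules file; same content works for AGENTS.md
cat > CLAUDE.md <<'EOF'
# Workflow rules

- Planner: write the plan to .ai/plan.md and generate tasks in .ai/tasks/.
- Coder: implement one task at a time; modify only files needed for that task.
- Tester: run tests and lint checks after each task.
- Reviewer: review the diff and improve code quality.

Never implement the entire feature in a single loop.
Always work task-by-task. Move finished tasks to .ai/tasks-done/.
EOF
```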


Example Prompts

After asking for a feature, I prompt the agent like this:

Based on my request, create or update .ai/plan.md and create task files in .ai/tasks/. Do not implement anything yet. Stop after the plan and tasks are created. Use a timestamp prefix YYYYMMDD-HHMM- to ensure unique filenames.

Then I move into execution with:

Implement the tasks from .ai/tasks/ sequentially. Only modify files required for the current task. Complete one task, run the verification steps, then stop.

This keeps the workflow bounded and predictable.


Why This Saves Tokens

Token usage usually doesn’t explode during planning.

It explodes during execution loops.

A rough pattern looks like this:

Planning → small
Execution loops → very large
Review → small

By forcing the model to:

  • implement one task at a time
  • avoid scanning the full repo repeatedly
  • stop after each step

you eliminate most of the expensive loops.

The difference is surprisingly noticeable.


A Nice Side Effect: Better Engineering Discipline

The token savings are great.

But the second benefit might actually be more important:

your AI workflow becomes much more organized.

Instead of chaotic autonomous behavior, the AI starts to behave more like a small engineering team:

planner → coder → tester → reviewer

The .ai/tasks/ files act like lightweight tickets.

Even if you switch context, come back later, or hand work off to another model, you still know:

  • what task is next
  • which files should change
  • how to verify the result

That’s a much cleaner way to work than relying on one giant chat session and hoping the model remembers the right things.


This Also Unlocks Multi-Agent, Multi-Model Workflows

This is where it gets even more interesting.

Because the workflow lives in markdown files instead of inside one tool’s memory, it becomes portable.

That means you can switch between Anthropic, OpenAI, and other providers depending on the task.

For example:

  • Terminal 1 (Claude Code): planning, architectural reasoning, code review
  • Terminal 2 (Codex): brute-force execution, writing tests, handling concurrent tasks

And if you want even more flexibility, you can route tools through something like LiteLLM + OpenRouter.

That gives you a universal interface to many different models, so you can choose the cheapest, fastest, or strongest option depending on the task.
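As a sketch, a LiteLLM proxy config routed through OpenRouter could look roughly like this. The model slugs and the env-var name are placeholders; check the LiteLLM docs for the current config format:

```shell
# Sketch of a LiteLLM proxy config; model identifiers are illustrative
cat > litellm-config.yaml <<'EOF'
model_list:
  - model_name: planner
    litellm_params:
      model: openrouter/anthropic/claude-sonnet    # placeholder slug
      api_key: os.environ/OPENROUTER_API_KEY
  - model_name: coder
    litellm_params:
      model: openrouter/openai/gpt-4.1             # placeholder slug
      api_key: os.environ/OPENROUTER_API_KEY
EOF
```

With this in place, agents talk to one endpoint and you swap the underlying model per role without touching the workflow files.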

Because the plan and task files live in .ai/ as plain text, any agent can pick them up and continue from the same shared workflow.

That means:

  • less vendor lock-in
  • fewer rate-limit bottlenecks
  • better cost control
  • easier experimentation with multiple models

In other words:

the future of AI coding probably isn’t one magical agent.

It’s a structured workspace where multiple agents can collaborate.


The Result

With this setup:

  • Claude Code and Codex usage becomes much more predictable
  • token consumption drops significantly
  • development becomes more structured
  • hybrid multi-model workflows become easy

Instead of one AI agent running uncontrolled loops, you get something closer to a small AI development team working through clearly defined tasks.

And the best part is that the setup is simple:
just a few files and a small repository convention.


Try It Yourself

I published a small bootstrap script that sets this workflow up automatically in a repository by generating both CLAUDE.md and AGENTS.md.

https://github.com/nils-fl/AgenticCodingInit

If you're using Claude Code or Codex heavily, it’s worth trying.

A small change in workflow can make a surprisingly large difference in both cost and productivity.


Final Thought

A lot of people assume the solution to expensive AI coding is “use a better model.”

I’m increasingly convinced the real solution is better process design.

Smaller loops.
Clearer tasks.
Less context thrashing.
More discipline.

That’s how you stop burning tokens.
