I Realized I Was Paying an AI to Reread My Stack Traces, and What I Did to Fix That

#ai #opensource #showdev #tooling

I was thirty prompts deep into building a new feature when the realization hit me: I was paying an AI coding agent to reread my own stack traces, abandoned plans, package docs, and outdated assumptions from an hour ago.

The agent was still writing useful code, but the session itself was leaking tokens, model budget, and my own review time.

There’s no line item on your API invoice that says "Stale context: $30." But you feel the tax later. You feel it when a tiny UI tweak takes way too long, when the agent reasons from outdated information, or when a premium model spends part of its run reprocessing history the current task didn't even need.

The Context Window Trap

Most of my AI coding sessions started beautifully clean: one request, a couple of files, clear constraints. But as the chat went on, it picked up baggage:

Architecture guesses sat right next to pasted errors.
Old corrections sat next to new instructions.
Partial plans stayed in the window long after the task changed.

By the time I asked for a minor frontend change, the agent might still be lugging my database migration history into the prompt.

Then there was the routing problem. Because I was using one generic agent for everything, it was bouncing between architecture, implementation, review, security, and debugging, all in the same chat. Sometimes an expensive model was wasting cycles on simple cleanups; other times, a cheap model was making production-risk decisions because there was no clean escalation path.

The agent hadn't suddenly become useless. The process had just become one long chat. Everything was landing in the same context window, whether the current task needed it or not.
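To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes every turn resends the whole transcript as input and ignores prompt caching, so real bills will differ. The figures (30 turns, 2,000 tokens added per turn, a 4,000-token artifact) are illustrative assumptions, not measurements.

```python
def transcript_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Input tokens processed when every turn resends the full history."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # this turn's material joins the transcript
        total += history            # the whole transcript is read again as input
    return total


def scoped_input_tokens(turns: int, tokens_per_turn: int, artifact_tokens: int) -> int:
    """Input tokens when each turn reads one fixed-size artifact plus its own prompt."""
    return turns * (artifact_tokens + tokens_per_turn)


if __name__ == "__main__":
    print(transcript_input_tokens(30, 2_000))     # 930000
    print(scoped_input_tokens(30, 2_000, 4_000))  # 180000
```

The transcript total grows with the square of the number of turns, while the scoped total grows linearly. That gap is the stale-context tax in this simplified model.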

Breaking Down the Monolith

I realized the engineering process didn't need a bigger context window; it needed separated concerns. So, I built oowl to break down that monolith.

It's a workflow layer for OpenCode designed to turn one massive, leaky chat into scoped artifacts and defined role handoffs.

I chose OpenCode specifically because I needed its native swarm support. A primary agent can spin up several sub-agents on the fly and assign each one a prompt and model, which gave me the exact primitive I needed to build out artifacts, approval gates, file locks, and distinct reviewer phases.

The Workflow: Artifacts Over Transcripts

Instead of dumping everything into an endless transcript, I structured the workflow like a proper engineering pipeline:

request
  -> dispatcher
  -> architect writes design.md
  -> user approval
  -> planner writes implementation.md
  -> user approval
  -> builder schedules locked tasks
  -> implementation agents execute
  -> reviewer writes review.md

The key here is that each handoff leaves behind a tangible artifact. The design captures the architecture before code changes. The implementation plan captures the work breakdown before agents touch files. File locks capture scope. The review document captures the final check.
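The pipeline above can be modeled as an ordered list of stages, some of which block on user approval. This is a minimal Python sketch of that idea, not oowl's actual implementation: the `Stage` class, the `run` function, and the `approve` callback are hypothetical names, and only the role and artifact names come from the diagram.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stage:
    role: str
    artifact: Optional[str] = None  # file the stage leaves behind, if any
    needs_approval: bool = False    # True when the user must sign off


# Stage order mirrors the diagram; everything else is a hypothetical model.
PIPELINE = [
    Stage("dispatcher"),
    Stage("architect", "design.md", needs_approval=True),
    Stage("planner", "implementation.md", needs_approval=True),
    Stage("builder"),
    Stage("implementation agents"),
    Stage("reviewer", "review.md"),
]


def run(pipeline: List[Stage], approve: Callable[[str], bool]) -> List[str]:
    """Run stages in order and return the roles that ran.

    Stops right after a gated stage whose artifact the user rejects.
    """
    ran = []
    for stage in pipeline:
        ran.append(stage.role)
        if stage.needs_approval and not approve(stage.artifact):
            break  # rejected artifact: nothing downstream starts
    return ran
```

Rejecting `implementation.md`, for example with `run(PIPELINE, lambda artifact: artifact != "implementation.md")`, stops the run after the planner, before any agent touches files.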

The next agent gets the artifact and files for its job, not the whole transcript.

The frontend agent reads the UI task and assigned component files. The database agent reads schema context and migration notes. The security reviewer reads the sensitive surfaces and threat assumptions.
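One way to picture that scoping is an allowlist per role, sketched below in Python. The role keys reuse names from the tier table, and mapping the "security reviewer" to `security-auditor` is my assumption. The input labels and the `build_context` helper are hypothetical and only illustrate the filtering, not how oowl stores or passes context.

```python
from typing import Dict

# Hypothetical allowlist: which inputs each role is allowed to read.
CONTEXT_SCOPE = {
    "frontend-engineer": {"ui-task", "component-files"},
    "database-engineer": {"schema-context", "migration-notes"},
    "security-auditor": {"sensitive-surfaces", "threat-assumptions"},
}


def build_context(role: str, available: Dict[str, str]) -> Dict[str, str]:
    """Return only the inputs the given role is scoped to read."""
    wanted = CONTEXT_SCOPE[role]
    return {name: text for name, text in available.items() if name in wanted}


if __name__ == "__main__":
    workspace = {
        "ui-task": "Tighten button padding",
        "component-files": "src/Button.tsx",
        "migration-notes": "Added orders table",
        "chat-transcript": "thirty prompts of history",
    }
    # The frontend role never sees the migration notes or the transcript.
    print(sorted(build_context("frontend-engineer", workspace)))
```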

Mapping Models to Risk

Because the work was finally scoped, mapping models to specific tasks became straightforward. I could use cheap, fast models to route, schedule, and handle low-risk edits. Mid-tier models could plan, implement, and review normal work. And the heavy hitters could be reserved strictly for architecture escalation and security review.

A standard tier profile looks something like this:

Agent	Model	Tier
dispatcher	deepseek-v4-flash	cheap
builder	deepseek-v4-flash	cheap
low-engineer	deepseek-v4-flash	cheap
architect	minimax-m2.7	mid
planner	minimax-m2.7	mid
frontend-engineer	minimax-m2.7	mid
backend-engineer	minimax-m2.7	mid
reviewer	minimax-m2.7	mid
database-engineer	deepseek-v4-pro	mid
high-architect	qwen3.7-max	premium
security-auditor	glm-5.1	premium

The specific models matter less than the boundaries themselves. Giving each role a job, a scope, and a model profile made the cost and risk predictable before a request even started running.
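Expressed as data, the profile is just a lookup from role to model and tier. The Python sketch below copies the rows from the table. The dictionary layout and the two helper functions are illustrative assumptions, not the format oowl uses for its model profiles.

```python
from typing import Dict, List, Tuple

# role -> (model, tier), copied from the table above
MODEL_PROFILE: Dict[str, Tuple[str, str]] = {
    "dispatcher": ("deepseek-v4-flash", "cheap"),
    "builder": ("deepseek-v4-flash", "cheap"),
    "low-engineer": ("deepseek-v4-flash", "cheap"),
    "architect": ("minimax-m2.7", "mid"),
    "planner": ("minimax-m2.7", "mid"),
    "frontend-engineer": ("minimax-m2.7", "mid"),
    "backend-engineer": ("minimax-m2.7", "mid"),
    "reviewer": ("minimax-m2.7", "mid"),
    "database-engineer": ("deepseek-v4-pro", "mid"),
    "high-architect": ("qwen3.7-max", "premium"),
    "security-auditor": ("glm-5.1", "premium"),
}


def model_for(role: str) -> str:
    """Look up the model a role runs on."""
    return MODEL_PROFILE[role][0]


def roles_in_tier(tier: str) -> List[str]:
    """List the roles assigned to a tier, in table order."""
    return [role for role, (_, t) in MODEL_PROFILE.items() if t == tier]
```

Because the mapping is fixed before a request starts, `roles_in_tier("premium")` shows which roles can spend premium-model budget, here only `high-architect` and `security-auditor`.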

Fixing the Leaks

Enforcing these boundaries shut down each of the issues I was facing:

Problem	oowl mechanism
stale chat context	scoped artifacts
wrong model for the task	role-based model
surprise file edits	assigned file locks
invisible architecture choices	design approval gate
implementation drift	plan approval gate
self-review	separate reviewer phase
vague completion	verification requirement

Small work can stay small. Larger work hits the necessary gates at the exact points where mistakes would cost the most: design, planning, implementation, and review.
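The file-lock row is the easiest one to make concrete. Below is a minimal Python sketch of the idea, assuming each task is handed an exclusive set of paths before it runs. The function names, task names, and paths are hypothetical, and this is not how oowl implements its locks.

```python
from typing import Dict, List, Set


def assign_locks(tasks: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each file to the task that owns it.

    Raises ValueError if two tasks claim the same file.
    """
    owner: Dict[str, str] = {}
    for task, files in tasks.items():
        for path in sorted(files):
            if path in owner:
                raise ValueError(f"{path} is already locked by {owner[path]}")
            owner[path] = task
    return owner


def out_of_scope_edits(task: str, edited: Set[str], owner: Dict[str, str]) -> List[str]:
    """Return edited paths the task does not hold a lock for."""
    return sorted(path for path in edited if owner.get(path) != task)


if __name__ == "__main__":
    owner = assign_locks({
        "ui-tweak": {"src/Button.tsx"},
        "add-index": {"migrations/001.sql"},
    })
    # The UI task touched a migration it was never assigned.
    print(out_of_scope_edits("ui-tweak", {"src/Button.tsx", "migrations/001.sql"}, owner))
```

A surprise edit then shows up as a non-empty list that can be rejected before review, instead of as an unexpected diff discovered afterwards.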

Trying it Out

If you want to try this workflow, run the installer from your project root, then start OpenCode:

npx @jimzandueta/oowl install

opencode

Installing oowl with npx writes the OpenCode agent framework into the project: agents, commands, prompts, model profiles, runtime config, and workflow instructions.

I also added an optional oowl init command that reads your project's shape and recommends extra skills per agent. You can approve, edit, or skip them.

oowl init

From there, the dispatcher decides whether the work fits one implementer or needs the full design, plan, build, and review flow.

What's Next

Right now, oowl is deeply tied to OpenCode. My next goal is to make this same workflow portable to other AI coding tools, including Codex and Claude Code.

If that version becomes cowl, I'm going to have a really tough decision on whether the owl logo survives or gets replaced by a very serious cow.

But regardless of the tool, the underlying leak is exactly the same: when the engineering process devolves into one large chat, stale context will always end up on the bill.