4 Hard Lessons on Optimizing AI Coding Agents

peter.zeng — Fri, 22 May 2026 13:44:14 +0000

4 Hard Lessons on Optimizing AI Coding Agents (Claude Code + Cost)

I've been running the Claude Code CLI in production for months now: building, shipping, and watching the token meter spin. Here's what I wish I'd known before I started.

1. Your Context Strategy Is Everything

The developers getting 10x from Claude Code aren't prompt whisperers. They're context engineers.

The 2026 consensus is clear: CLAUDE.md is no longer optional. Keep it lean. Ask yourself: "Would removing this line cause the agent to make mistakes?" If not, cut it. No one has time to scroll through 15k tokens of stale docs every session.

One .claudeignore tweak took my sessions from 150k tokens to 60k, a 60% drop. The culprits? The node_modules and dist folders. Block them. You can still feed individual files manually when needed.
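Before deciding what to block, it helps to see where the bulk actually sits. Here is a minimal sketch that walks a project and estimates tokens per top-level folder. The 4-characters-per-token ratio is a rough heuristic rather than a real tokenizer, and the function name is my own, not part of any tool.

```python
import os
from collections import Counter

CHARS_PER_TOKEN = 4  # assumption: crude heuristic, not a real tokenizer


def estimate_tokens_by_dir(root: str) -> Counter:
    """Sum file sizes under each top-level entry of root, converted to rough tokens."""
    totals: Counter = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Files directly in root are grouped under "."
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                totals[top] += os.path.getsize(path) // CHARS_PER_TOKEN
            except OSError:
                continue  # skip unreadable or vanished files
    return totals
```

Calling estimate_tokens_by_dir(".").most_common(5) from the project root lists the heaviest folders first. In a typical JavaScript project I would expect node_modules and dist to top that list, which makes them the obvious candidates to exclude.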

The golden rule: engineer before you prompt.

2. The /ghost, OODA, L99, and PERSONA Patterns

Stop talking to Claude Code like it's ChatGPT. It has 40+ tools built in—most people use 3 or 4.

Four prompt frameworks that actually moved the needle for me:

/ghost: Kills the AI filler voice. "It's worth noting that..." disappears. You get clean, direct prose for docs and PRs.
OODA: Forces a concrete decision instead of "it depends." Feed it your constraints (team size, shipping cadence) and demand a plan.
L99: Unlocks expert depth. Instead of tutorial answers, you get production-ready patterns.
PERSONA: Frames the response through an expert lens. "Senior AWS Solutions Architect with 10 years experience" actually works.

Also: be direct. The system prompt explicitly tells Claude Code to be terse. If you write "Hey, could you maybe take a look at..." you're working against the grain. A crisp "Fix this function" is what the agent wants.

3. Cost Optimization Isn't Magic—It's Math

Let's run the numbers. A 200-call agent session on Opus with growing context can hit 4M input tokens. At $5/M, that's $20 for input alone, before any output. A 20-developer team running 50 such sessions a day between them? That's up to $1,000 a day, or $10k+ per month.
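To see how a session gets to roughly 4M input tokens, here is the arithmetic as a small sketch. The price, call count, and session count come from the paragraph above. The starting context, the per-call growth, and the 22 working days are my own illustrative assumptions, and I read the 50 sessions as a team-wide daily total.

```python
PRICE_PER_MILLION_USD = 5.0   # input price quoted above
CALLS_PER_SESSION = 200       # from the text
SESSIONS_PER_DAY = 50         # from the text, read as a team-wide total
WORKING_DAYS_PER_MONTH = 22   # assumption

# Hypothetical growth curve: every call resends the whole context so far.
START_CONTEXT_TOKENS = 10_000
GROWTH_PER_CALL_TOKENS = 100


def session_input_tokens(calls: int, start: int, growth: int) -> int:
    """Total input tokens when call i sends start + growth * i tokens."""
    return sum(start + growth * i for i in range(calls))


def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """Input-only cost in USD for a given token count."""
    return tokens / 1_000_000 * price_per_million


tokens = session_input_tokens(
    CALLS_PER_SESSION, START_CONTEXT_TOKENS, GROWTH_PER_CALL_TOKENS
)
session_cost = input_cost_usd(tokens, PRICE_PER_MILLION_USD)
daily_cost = session_cost * SESSIONS_PER_DAY
monthly_cost = daily_cost * WORKING_DAYS_PER_MONTH

print(f"input tokens per session: {tokens:,}")
print(f"input cost per session:   ${session_cost:,.2f}")
print(f"team input cost per day:  ${daily_cost:,.2f}")
print(f"team input cost per month: ${monthly_cost:,.2f}")
```

Under these assumptions a session lands just under 4M input tokens and about $20, and the monthly ceiling is around $22k if every session grows that large. If only about half of them do, you end up near the $10k+ figure.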

Here's where the real waste hides: 70–85% of your tokens are input tokens. And 80% of that input comes from reading project files.

Three fixes that actually work:

Cache your prompts. Anthropic and OpenAI discount cache hits by up to 90% and 50%, respectively. That repeated system prompt? Stop paying full price for it on every turn.
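For a feel of what that discount does to a repeated prefix, here is a rough sketch. The 15k-token prefix and the 200 turns are illustrative numbers, and the model is deliberately simplified: it bills the first turn at full price and ignores cache expiry and any premium a provider charges for cache writes.

```python
def session_prefix_cost_usd(
    prefix_tokens: int,
    turns: int,
    price_per_million: float,
    cache_discount: float = 0.0,
) -> float:
    """Cost of resending the same prefix on every turn of a session.

    The first turn pays full price; later turns pay (1 - cache_discount).
    Simplification: no cache expiry and no cache-write premium.
    """
    if turns < 1:
        return 0.0
    full_price = prefix_tokens / 1_000_000 * price_per_million
    return full_price + (turns - 1) * full_price * (1.0 - cache_discount)


# Hypothetical session: 15k-token system prompt, 200 turns, $5/M input.
uncached = session_prefix_cost_usd(15_000, 200, 5.0)
cached = session_prefix_cost_usd(15_000, 200, 5.0, cache_discount=0.90)
print(f"uncached: ${uncached:.2f}  cached: ${cached:.2f}")
```

With these made-up inputs the prefix goes from about $15 per session to under $2. Real savings depend on how often the cache actually hits, so treat this as an upper bound.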

Skill layers slash token spend. One team cut consumption from 10.4M to 3.7M tokens—$9.21 to $2.81 per session—by routing context through a reusable Skill layer. Think of it as middleware for tokens.

Route by difficulty. A classification model routing simple queries to Sonnet (or even Haiku for boilerplate) while reserving Opus for architectural decisions can cut 40–70% of spend with no quality loss.
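The routing idea can be sketched in a few lines. To keep it runnable I've swapped the classification model for a keyword heuristic, which is a stand-in and not what you'd ship. The keyword lists and the tier labels "haiku", "sonnet", and "opus" are placeholders, not API model identifiers.

```python
# Hypothetical keyword lists; a real setup would use a trained classifier.
ARCHITECTURE_HINTS = ("architecture", "design", "migration", "trade-off")
BOILERPLATE_HINTS = ("boilerplate", "scaffold", "rename", "docstring")


def classify(task: str) -> str:
    """Return 'hard', 'easy', or 'normal' using a crude keyword match."""
    text = task.lower()
    if any(hint in text for hint in ARCHITECTURE_HINTS):
        return "hard"
    if any(hint in text for hint in BOILERPLATE_HINTS):
        return "easy"
    return "normal"


def route(task: str) -> str:
    """Map a task to a model tier label (placeholders, not model IDs)."""
    return {"hard": "opus", "easy": "haiku", "normal": "sonnet"}[classify(task)]


print(route("Generate boilerplate for a new React component"))
print(route("Fix the off-by-one bug in pagination"))
print(route("Design the architecture for multi-region failover"))
```

Checking the hard hints first is a deliberate choice: misrouting an architectural question to a small model costs more than overpaying for a rename.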

Don't let perfect be the enemy of cheap.

4. Agentic Workflows Beat Scripts

The shift from "scripting" to "orchestration" is real. The teams shipping fastest in 2026 aren't writing prompts—they're designing workflows.

Two patterns worth stealing:

TDD governance at the prompt level. Encode your testing discipline into the agent's loop: write, run, observe errors, fix, repeat. With inference costs 5–10x lower than 2025, this feedback loop is now economically sane for 500-line changes.
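The write, run, observe, fix loop can be pinned down in code so the agent cannot skip the "run" part. This is a minimal sketch: run_tests and apply_fix are hypothetical callables you would supply (for example a wrapper around your test runner and a call into the agent), and the run budget is an arbitrary default.

```python
from typing import Callable, Optional, Tuple


def tdd_loop(
    run_tests: Callable[[], Tuple[bool, str]],
    apply_fix: Callable[[str], None],
    max_runs: int = 5,
) -> Optional[int]:
    """Run tests, hand failures to the fixer, and repeat until green.

    Returns the number of test runs it took to pass, or None if the
    budget ran out while the tests were still failing.
    """
    for attempt in range(1, max_runs + 1):
        passed, output = run_tests()
        if passed:
            return attempt
        if attempt < max_runs:
            # Hand the observed failure output to the fixer, then re-run.
            apply_fix(output)
    return None


# Stub usage: two "bugs", each fix call removes one.
state = {"bugs": 2}
result = tdd_loop(
    run_tests=lambda: (state["bugs"] == 0, f"{state['bugs']} failing"),
    apply_fix=lambda output: state.update(bugs=state["bugs"] - 1),
)
print(result)
```

The hard cap matters as much as the loop itself: without it, a fix that never converges keeps burning tokens.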

Leave Opus 4.6 habits behind. Opus 4.7 treats you differently. The old "fine-grained pair programming" approach backfires. Instead, specify the full task in the first turn—goals, constraints, acceptance criteria—and let the agent work. And stop defaulting to max effort. xhigh is the new sweet spot. max overthinks and slows you down.
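One way to make the "full task in the first turn" habit stick is to refuse to send anything without goals, constraints, and acceptance criteria. A small sketch follows. The TaskBrief class and its text layout are my own illustration, not a format the agent requires, and the example task is made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskBrief:
    """Hypothetical container for a first-turn task specification."""

    goal: str
    constraints: List[str] = field(default_factory=list)
    acceptance_criteria: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Build the first-turn prompt; refuse to render without criteria."""
        if not self.acceptance_criteria:
            raise ValueError("add at least one acceptance criterion")
        lines = [f"Goal: {self.goal}", "", "Constraints:"]
        lines += [f"- {item}" for item in self.constraints] or ["- none"]
        lines += ["", "Acceptance criteria:"]
        lines += [f"- {item}" for item in self.acceptance_criteria]
        return "\n".join(lines)


brief = TaskBrief(
    goal="Add rate limiting to the public API",
    constraints=["No new dependencies", "Keep the existing route handlers"],
    acceptance_criteria=["All existing tests pass", "A 429 is returned over the limit"],
)
print(brief.render())
```

The ValueError is the useful part: if you can't state how you'll know the task is done, the agent can't either.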

Bottom line: Engineer your context, use prompt frameworks that drive decisions, route strategically for cost, and design workflows instead of writing scripts. The agents are ready. Your wallet (and your weekend) will thank you.