Running an AI agent on Claude Max costs $200 a month. By Friday each week I had already used 70-80% of my weekly usage limit, and I almost switched platforms entirely.
Instead, I discovered something counterintuitive: using the right model for each specific task.
My agent: Wiz
Wiz operates with persistent memory and full infrastructure access. It functions more like a junior developer than a chatbot. Its workload: hourly task checks, autonomous night shifts, Discord processing, daily reports, job scraping, a blog pipeline, and production deployments.
Initial approach: Sonnet 4.5 as default, Opus for complex decisions, Haiku for simple lookups. Then the automations scaled, and token usage climbed right back up.
So I tried something unconventional: making Haiku the baseline.
Why it worked
Most agent work is execution, not creativity:
- Reading files and running scripts
- Sending messages and updating task lists
- Scraping with predefined criteria
- Generating structured reports
- Following checklists
- Triggering automations based on conditions
Haiku excels at precision-based execution. You do not need sophisticated reasoning for these.
The three-tier model
- Haiku (95%) — execution and automation
- Sonnet (4%) — user-facing interactions, content creation
- Opus (1%) — complex reasoning, architecture decisions
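A minimal sketch of how that split could be wired up as a routing function. The task category names (`architecture_decision`, `content_creation`, and so on) and the `pick_tier` helper are hypothetical, made up for illustration; this is not how Wiz is actually implemented. The point is only that Haiku is the default and the larger models are opt-in.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"


# Hypothetical task categories, loosely based on the tiers listed above.
OPUS_TASKS = {"complex_reasoning", "architecture_decision"}
SONNET_TASKS = {"user_interaction", "content_creation"}


def pick_tier(task_kind: str) -> Tier:
    """Return the model tier for a task, falling back to Haiku."""
    if task_kind in OPUS_TASKS:
        return Tier.OPUS
    if task_kind in SONNET_TASKS:
        return Tier.SONNET
    # Everything else (scripts, scraping, reports, checklists) is execution work.
    return Tier.HAIKU
```

Anything not explicitly listed, such as `pick_tier("run_script")`, falls through to Haiku, so a new automation runs on the cheapest tier unless someone deliberately promotes it.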
Result: weekly usage dropped from 70-80% to ~40% of my Claude Max quota.
The expensive model was slowing things down with extra reasoning I did not need.
Originally published on Digital Thoughts — where I write about building AI agents in the real world.