DEV Community

sunyifu
sunyifu

Posted on

How I Cut My AI Agent's API Bill by 60% — Without Losing Quality

Two months into self-hosting my AI agent, I opened my Anthropic dashboard and saw a number I didn't love. $47 for the month. Not catastrophic, but way more than it needed to be.

After a week of tweaking, I got that down to ~$15/month — same quality for daily tasks, same channels, same skills. Here's exactly what I changed.

(If you're new here: Part 1 covers setting up OpenClaw, and Part 2 covers the Skills I use daily.)

Why Was I Overpaying?

The default setup uses one model for everything. I had Claude Sonnet 4 handling every message — including "thanks", "ok", and "what time is it?". That's like using a sports car to go get milk.

Here's what the models actually cost (as of early 2026):

Model Input Output Best For
Claude Haiku 4.5 $1/1M tokens $5/1M tokens Quick tasks, greetings, lookups
Claude Sonnet 4 $3/1M tokens $15/1M tokens Balanced daily use
Claude Opus 4.6 $5/1M tokens $25/1M tokens Deep research, analysis

Haiku is 5x cheaper than Opus on output tokens, and 3x cheaper than Sonnet. For "what's the weather?" or "remind me to check the deployment" — Haiku handles it just fine.

Prices from Anthropic's pricing page. Check there for the latest rates.

Fix #1: Use Different Models for Different Channels

This was the single biggest win. OpenClaw lets you assign models per channel using modelByChannel in your config:

// ~/.openclaw/openclaw.json
{
  "channels": {
    "modelByChannel": {
      "whatsapp": {
        "default": "anthropic/claude-haiku-4-5"
      },
      "telegram": {
        "default": "anthropic/claude-haiku-4-5"
      },
      "discord": {
        "default": "anthropic/claude-sonnet-4"
      },
      "slack": {
        "default": "anthropic/claude-sonnet-4"
      }
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

My logic: WhatsApp and Telegram are mostly quick personal messages — Haiku is perfect. Discord and Slack are where I do actual work stuff (code review, debugging), so those get Sonnet.

This one change shifted about 50% of my traffic to Haiku. Immediate cost drop.

Want smarter routing? The community has built auto-routers like iblai-openclaw-router that analyze message complexity and route to Haiku/Sonnet/Opus automatically. I haven't tried it yet, but the concept is solid — route "hi" to Haiku and "analyze this architecture" to Opus, per message.

Fix #2: Limit Conversation History

This one surprised me. By default, OpenClaw sends your recent conversation history with every request — so the model has context. But that means every "yes" reply carries a long tail of previous messages worth of tokens.

OpenClaw lets you cap this with historyLimit:

{
  "messages": {
    "groupChat": {
      "historyLimit": 10
    }
  },
  "channels": {
    "whatsapp": {
      "dmHistoryLimit": 8
    },
    "telegram": {
      "dmHistoryLimit": 8
    },
    "discord": {
      "historyLimit": 15
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

I keep Discord higher (15) because work conversations need more context. WhatsApp and Telegram get 8 — plenty for casual back-and-forth.

Before this change, I was sending ~10,000 input tokens per request with all the history. After: ~4,000. That alone cut my input costs by more than half.

Fix #3: Use Anthropic's Prompt Caching

This one's not an OpenClaw setting — it's an Anthropic API feature. If your agent sends the same system prompt and conversation prefix with every request (which it does), prompt caching avoids reprocessing those tokens.

The savings are real:

Operation Cost
Normal input $3/1M tokens (Sonnet)
Cache write (first request) $3.75/1M tokens (1.25x)
Cache read (subsequent) $0.30/1M tokens (0.1x)

That's a 90% discount on repeated context. If your system prompt is 1,000 tokens and you send 50 messages, you pay full price once and 10% for the other 49.

OpenClaw supports this if your API provider has caching enabled. Check your Anthropic dashboard — if you see "cache read tokens" in your usage, it's already working.

Fix #4: Pick a Cheaper Default Model

Sounds obvious, but I was overthinking it. I switched my default from Sonnet to Haiku 4.5 for most channels, and honestly? For 80% of my daily interactions, I can't tell the difference.

Haiku handles:

  • Quick Q&A and lookups ✅
  • Smart home commands ✅
  • Simple reminders and scheduling ✅
  • Casual conversation ✅

Where I notice the difference: complex code review, long-form writing, and nuanced analysis. For those, I keep Sonnet (or Opus) on my work channels.

The mental shift: default cheap, upgrade where it matters — not the other way around.

Fix #5: Monitor and Set Limits

You can't optimize what you don't measure. OpenClaw has a built-in stats command:

openclaw stats
Enter fullscreen mode Exit fullscreen mode
Today's Usage:
  Input tokens:  45,230
  Output tokens: 12,450
  Estimated cost: $0.23

This month:
  Total tokens: 1,234,567
  Estimated cost: $8.45
Enter fullscreen mode Exit fullscreen mode

I check this weekly. It helps me spot when something's off — like that time a Discord channel was generating way more traffic than I expected.

Also set up usage limits on your Anthropic account directly. The Anthropic Console lets you configure monthly spend caps so you never get a surprise bill. I set mine at $25/month — if I ever hit it, something's wrong.

Quick Math: What Should You Expect to Pay?

Here's a rough formula:

Monthly Cost = (Daily Messages × Avg Tokens × Price per Token) × 30
Enter fullscreen mode Exit fullscreen mode

For 50 messages/day with Claude Sonnet 4 (avg 800 input + 400 output tokens):

  • Input: 50 × 800 × ($3/1M) × 30 = $3.60
  • Output: 50 × 400 × ($15/1M) × 30 = $9.00
  • Total: ~$12.60/month

With half your traffic on Haiku 4.5 instead:

  • Haiku portion (25 msgs): 25 × 800 × ($1/1M) × 30 + 25 × 400 × ($5/1M) × 30 = $0.60 + $1.50 = $2.10
  • Sonnet portion (25 msgs): 25 × 800 × ($3/1M) × 30 + 25 × 400 × ($15/1M) × 30 = $1.80 + $4.50 = $6.30
  • Total: ~$8.40/month

Add prompt caching on top and you're looking at even less.

Three Setups to Get You Started

Tight budget (~$5-10/month):

  • Default model: Haiku 4.5 for all channels
  • History limit: 5 messages
  • Prompt caching: enabled
  • Good for: personal use, casual messaging

Balanced (~$15-25/month):

  • Haiku 4.5 for WhatsApp/Telegram, Sonnet for Discord/Slack
  • History limit: 8-15 messages depending on channel
  • Prompt caching: enabled
  • Good for: daily driver with some work use

Quality-first (~$30-50/month):

  • Sonnet for most channels, Opus for specific work channels
  • History limit: 20+ messages
  • Good for: heavy work use, code review, research

More details and full config examples on open-claw.me/blog/openclaw-model-selection-cost.

TL;DR

  1. Route models by channel — Haiku for casual, Sonnet for work. Biggest single win.
  2. Limit conversation history — 8-15 messages is enough for most chats.
  3. Use prompt caching — 90% off repeated context, basically free.
  4. Default to a cheaper model — upgrade where it matters, not everywhere.
  5. Monitor weekly — catch surprises early, set spend caps.

These five changes took my bill from $47 to ~$15. The agent works just as well for daily tasks — it just uses the right tool for the job now.

Config details may vary by OpenClaw version — check docs.openclaw.ai for your setup.


What's your monthly AI agent bill looking like? I'm curious if anyone's running leaner — or if you've found optimizations I missed. Drop a comment.

Top comments (0)