dohko

I Cut My AI Coding Costs by 73% Without Losing Quality — Here's the Exact Setup

I was spending $15/day on AI coding tools. After two weeks of optimizing, I'm at $4/day with the same (arguably better) output quality.

Here's every change I made, with real numbers.


The Baseline: Where the Money Was Going

Before optimization, here's what a typical day looked like:

Claude Opus 4.6:     $8.50  (12 calls, avg 180K context)
Claude Sonnet 4.6:   $3.20  (25 calls, avg 80K context)
GPT-5.4:             $2.10  (8 calls, avg 100K context)
Misc (Mini, tools):  $1.20
────────────────────────────
Total:              $15.00/day

The problem was obvious once I tracked it: I was using Opus for everything. Bug fixes, test writing, documentation — all going through the most expensive model.


Change 1: Task-Based Model Routing (Saved 45%)

The single biggest win. I categorized every coding task and assigned the cheapest model that performs well:

# model-router.yaml
routing_rules:
  # Tier 1: Cheap & fast ($0.001-0.01 per call)
  documentation:
    model: gpt-5.4-mini
    max_context: 20000

  test_generation:
    model: claude-sonnet-4.6
    max_context: 30000

  code_formatting:
    model: gpt-5.4-mini
    max_context: 10000

  commit_messages:
    model: gpt-5.4-mini
    max_context: 5000

  # Tier 2: Mid-range ($0.01-0.50 per call)
  bug_fixes:
    model: claude-sonnet-4.6
    max_context: 50000

  code_review:
    model: claude-sonnet-4.6
    max_context: 80000

  feature_implementation:
    model: claude-sonnet-4.6
    max_context: 100000

  # Tier 3: Premium ($0.50-5.00 per call)
  architecture_decisions:
    model: claude-opus-4.6
    max_context: 200000

  complex_refactors:
    model: claude-opus-4.6
    max_context: 150000

  security_review:
    model: claude-opus-4.6
    max_context: 100000

Result (same day, same tasks):

Before: $15.00/day
After:   $8.25/day  (-45%)

The insight: 65% of coding tasks don't need frontier-model reasoning. Sonnet and Mini handle them perfectly.
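In code, routing can be as simple as a dictionary lookup with a safe default. Here's a minimal Python sketch; the task names mirror the config above, while the `route` helper and the mid-tier fallback are my own assumptions:

```python
# Minimal task-based router: cheapest capable model per task type.
# Task names mirror the YAML config; the mid-tier default is an assumption.
ROUTING_RULES = {
    "documentation":          {"model": "gpt-5.4-mini",      "max_context": 20_000},
    "commit_messages":        {"model": "gpt-5.4-mini",      "max_context": 5_000},
    "bug_fixes":              {"model": "claude-sonnet-4.6", "max_context": 50_000},
    "architecture_decisions": {"model": "claude-opus-4.6",   "max_context": 200_000},
}

# Unknown task types fall back to the mid tier, never the premium tier.
DEFAULT = {"model": "claude-sonnet-4.6", "max_context": 50_000}

def route(task_type: str) -> dict:
    """Pick the model and context budget for a task type."""
    return ROUTING_RULES.get(task_type, DEFAULT)

print(route("documentation")["model"])        # gpt-5.4-mini
print(route("refactor_everything")["model"])  # claude-sonnet-4.6 (safe default)
```

The key design choice is the default: a typo in a task name should cost you a mid-tier call, not an Opus call.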


Change 2: Context Window Discipline (Saved 33%)

I was loading entire directories "just in case." A 200K context call costs 10x more than a 20K one.

The rule: Only include files the AI needs to READ or MODIFY for THIS task.

# ❌ Before: throw everything at it
claude "fix the auth bug" --include "src/**/*"
# Context: 185K tokens → ~$1.50 per call

# ✅ After: surgical context
claude "fix the auth bug" \
  --include "src/auth/handler.ts" \
  --include "src/auth/middleware.ts" \
  --include "tests/auth.test.ts" \
  --include "AGENTS.md"
# Context: 18K tokens → ~$0.15 per call
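To make the roughly 10x gap concrete, here's a back-of-envelope estimator. The ~4 characters per token ratio and the $3.00 per million input tokens rate are illustrative assumptions, not quoted provider pricing:

```python
# Rough input-cost estimate before sending a call.
# Assumptions: ~4 characters per token, $3.00 per million input tokens
# (illustrative numbers, not real provider pricing).
CHARS_PER_TOKEN = 4
USD_PER_M_INPUT = 3.00

def estimate_input_cost(contents: list[str]) -> float:
    """Estimate input cost in USD for the given file contents."""
    tokens = sum(len(text) for text in contents) / CHARS_PER_TOKEN
    return tokens / 1_000_000 * USD_PER_M_INPUT

# A 185K-token context vs an 18K-token one at the same linear rate:
big = estimate_input_cost(["x" * 740_000])   # ~185K tokens
small = estimate_input_cost(["x" * 72_000])  # ~18K tokens
print(f"{big / small:.1f}x")  # 10.3x
```

Because input pricing is linear in tokens, trimming context is the one optimization that needs no provider support at all.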

For multi-file tasks, I use the plan-then-execute pattern:

# Step 1: Cheap model maps the task ($0.10)
claude --model sonnet "Given AGENTS.md and the error log, 
which files need to change to fix issue #142? Just list them."

# Step 2: Load only those files for the fix ($0.50-2.00)
claude --model opus --include [files from step 1] "Fix issue #142"
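The same pattern in script form, with a hypothetical `call_model(model, prompt, include=...)` wrapper standing in for whatever client you actually use:

```python
# Plan-then-execute: a cheap model picks the files, a premium model does the fix.
# `call_model(model, prompt, include=...)` is a hypothetical wrapper, not a real API.
def plan_then_execute(issue: str, call_model) -> str:
    # Step 1: cheap model maps the task over a small context.
    plan = call_model(
        "claude-sonnet-4.6",
        f"Which files need to change to fix {issue}? Just list them, one per line.",
    )
    files = [line.strip() for line in plan.splitlines() if line.strip()]
    # Step 2: premium model sees only the files the plan named.
    return call_model("claude-opus-4.6", f"Fix {issue}", include=files)
```

The premium model never sees the full repo; it only pays for the files the cheap model nominated.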

Result:

Before: $8.25/day
After:  $5.50/day  (-33%)

Change 3: Prompt Caching (Saved 18%)

If you're making multiple calls against the same codebase in a session, cache the context:

# Using Anthropic's prompt caching
import anthropic

client = anthropic.Anthropic()

# First call: send full context (cached on server)
response = client.messages.create(
    model="claude-sonnet-4.6",
    system=[{
        "type": "text",
        "text": project_context,  # Your AGENTS.md + source files
        "cache_control": {"type": "ephemeral"}
    }],
    messages=[{"role": "user", "content": "Fix the auth bug"}]
)

# Subsequent calls: cached prefix = 90% cheaper
response2 = client.messages.create(
    model="claude-sonnet-4.6",
    system=[{
        "type": "text",
        "text": project_context,  # Same text = cache hit
        "cache_control": {"type": "ephemeral"}
    }],
    messages=[{"role": "user", "content": "Now write tests for the fix"}]
)

Cache hit savings: Up to 90% on input tokens for repeated context.
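Rough arithmetic on what that discount means over a session. The base rate and the flat 90% discount are simplifying assumptions (real pricing also adds a small cache-write premium on the first call):

```python
# Session input cost with caching: the first call pays full price for the
# shared context, later calls pay ~10% of it. Rates are illustrative.
USD_PER_TOKEN = 3.00 / 1_000_000   # assumed base input rate
CACHE_DISCOUNT = 0.90              # cached tokens assumed to bill at ~10%

def session_input_cost(context_tokens: int, calls: int) -> float:
    """Total input cost for `calls` requests sharing one cached context."""
    first = context_tokens * USD_PER_TOKEN
    rest = (calls - 1) * context_tokens * USD_PER_TOKEN * (1 - CACHE_DISCOUNT)
    return first + rest

# Five calls against a 100K-token context:
with_cache = session_input_cost(100_000, 5)
without_cache = 5 * 100_000 * USD_PER_TOKEN
print(f"{with_cache / without_cache:.0%}")  # 28% of the uncached bill
```

Under these assumptions, caching pays off after the second call and keeps compounding the longer the session runs.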

Result:

Before: $5.50/day
After:  $4.50/day  (-18%)

Change 4: The 3-Attempt Rule (Saved 11%)

The most expensive waste: feeding more tokens to a stuck AI. If three attempts with different approaches don't work, the problem needs human insight.

# Sketch: `ai_code`, `approaches`, and `notify_developer` are app-specific helpers.
MAX_ATTEMPTS = 3
attempt_costs = []

for attempt in range(MAX_ATTEMPTS):
    result = ai_code(task, approach=approaches[attempt])
    attempt_costs.append(result.cost)

    if result.tests_pass:
        break
else:
    # After 3 failures, flag for human review
    notify_developer(task, attempt_costs)
    # Saved: potentially $5-20 in wasted tokens

Result:

Before: $4.50/day
After:  $4.00/day  (-11%)

The Final Numbers

Original:                    $15.00/day  ($450/month)
After model routing:          $8.25/day  (-45%)
After context discipline:     $5.50/day  (-33%)  
After prompt caching:         $4.50/day  (-18%)
After 3-attempt rule:         $4.00/day  (-11%)
────────────────────────────────────────────────
Final:                        $4.00/day  ($120/month)
Total savings:                73%

And the output quality? Slightly better. Using the right model for each task often produces more focused, appropriate code than throwing Opus at everything.


The Quick-Start Checklist

  1. ☐ Track costs per task category for one week
  2. ☐ Create a model routing config (start with 3 tiers)
  3. ☐ Set hard context limits per task type
  4. ☐ Implement plan-then-execute for multi-file tasks
  5. ☐ Enable prompt caching if your provider supports it
  6. ☐ Add the 3-attempt circuit breaker
  7. ☐ Review weekly and adjust routing rules
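For step 1, even a flat CSV is enough. A sketch using only the standard library (the `category,usd` row format is my own convention, not part of any tool):

```python
# Sum spend per task category from CSV text of "category,usd" rows.
import collections
import csv
import io

def spend_by_category(csv_text: str) -> dict:
    """Return total USD spent per task category."""
    totals = collections.defaultdict(float)
    for category, usd in csv.reader(io.StringIO(csv_text)):
        totals[category] += float(usd)
    return dict(totals)

log = "bug_fixes,0.50\ndocumentation,0.25\nbug_fixes,1.25\n"
print(spend_by_category(log))  # {'bug_fixes': 1.75, 'documentation': 0.25}
```

One week of this is usually enough to see which categories deserve a cheaper tier.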

Full Cost Optimization Kit

The complete model routing configs, cost tracking dashboards, and optimization scripts are in the AI Dev Toolkit — 264 production frameworks including multi-model routing, prompt caching patterns, and budget enforcement tools.


What's your AI coding budget? Have you optimized it? Share your numbers in the comments — I'm curious how others are managing costs.
