dohko

I Cut My AI Coding Costs by 73% Without Losing Quality — Here's the Exact Setup

I was spending $15/day on AI coding tools. After two weeks of optimizing, I'm at $4/day with the same (arguably better) output quality.

Here's every change I made, with real numbers.


The Baseline: Where the Money Was Going

Before optimization, here's what a typical day looked like:

Claude Opus 4.6:     $8.50  (12 calls, avg 180K context)
Claude Sonnet 4.6:   $3.20  (25 calls, avg 80K context)
GPT-5.4:             $2.10  (8 calls, avg 100K context)
Misc (Mini, tools):  $1.20
────────────────────────────
Total:              $15.00/day

The problem was obvious once I tracked it: I was using Opus for everything. Bug fixes, test writing, documentation — all going through the most expensive model.


Change 1: Task-Based Model Routing (Saved 45%)

The single biggest win. I categorized every coding task and assigned the cheapest model that performs well:

# model-router.yaml
routing_rules:
  # Tier 1: Cheap & fast ($0.001-0.01 per call)
  documentation:
    model: gpt-5.4-mini
    max_context: 20000

  test_generation:
    model: claude-sonnet-4.6
    max_context: 30000

  code_formatting:
    model: gpt-5.4-mini
    max_context: 10000

  commit_messages:
    model: gpt-5.4-mini
    max_context: 5000

  # Tier 2: Mid-range ($0.01-0.50 per call)
  bug_fixes:
    model: claude-sonnet-4.6
    max_context: 50000

  code_review:
    model: claude-sonnet-4.6
    max_context: 80000

  feature_implementation:
    model: claude-sonnet-4.6
    max_context: 100000

  # Tier 3: Premium ($0.50-5.00 per call)
  architecture_decisions:
    model: claude-opus-4.6
    max_context: 200000

  complex_refactors:
    model: claude-opus-4.6
    max_context: 150000

  security_review:
    model: claude-opus-4.6
    max_context: 100000

Result (same day, same tasks):

Before: $15.00/day
After:   $8.25/day  (-45%)

The insight: 65% of coding tasks don't need frontier-model reasoning. Sonnet and Mini handle them perfectly.
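In code, routing can be as simple as a dictionary lookup with a safe default. Here's a minimal Python sketch; the task names mirror the config above, while the `route` helper and the mid-tier fallback are my own assumptions:

```python
# Minimal task-based router: cheapest capable model per task type.
# Task names mirror the YAML config; the mid-tier default is an assumption.
ROUTING_RULES = {
    "documentation":          {"model": "gpt-5.4-mini",      "max_context": 20_000},
    "commit_messages":        {"model": "gpt-5.4-mini",      "max_context": 5_000},
    "bug_fixes":              {"model": "claude-sonnet-4.6", "max_context": 50_000},
    "architecture_decisions": {"model": "claude-opus-4.6",   "max_context": 200_000},
}

# Unknown task types fall back to the mid tier, never the premium tier.
DEFAULT = {"model": "claude-sonnet-4.6", "max_context": 50_000}

def route(task_type: str) -> dict:
    """Pick the model and context budget for a task type."""
    return ROUTING_RULES.get(task_type, DEFAULT)

print(route("documentation")["model"])        # gpt-5.4-mini
print(route("refactor_everything")["model"])  # claude-sonnet-4.6 (safe default)
```

The key design choice is the default: a typo in a task name should cost you a mid-tier call, not an Opus call.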


Change 2: Context Window Discipline (Saved 33%)

I was loading entire directories "just in case." A 200K context call costs 10x more than a 20K one.

The rule: Only include files the AI needs to READ or MODIFY for THIS task.

# ❌ Before: throw everything at it
claude "fix the auth bug" --include "src/**/*"
# Context: 185K tokens → ~$1.50 per call

# ✅ After: surgical context
claude "fix the auth bug" \
  --include "src/auth/handler.ts" \
  --include "src/auth/middleware.ts" \
  --include "tests/auth.test.ts" \
  --include "AGENTS.md"
# Context: 18K tokens → ~$0.15 per call
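To make the roughly 10x gap concrete, here's a back-of-envelope estimator. The ~4 characters per token ratio and the $3.00 per million input tokens rate are illustrative assumptions, not quoted provider pricing:

```python
# Rough input-cost estimate before sending a call.
# Assumptions: ~4 characters per token, $3.00 per million input tokens
# (illustrative numbers, not real provider pricing).
CHARS_PER_TOKEN = 4
USD_PER_M_INPUT = 3.00

def estimate_input_cost(contents: list[str]) -> float:
    """Estimate input cost in USD for the given file contents."""
    tokens = sum(len(text) for text in contents) / CHARS_PER_TOKEN
    return tokens / 1_000_000 * USD_PER_M_INPUT

# A 185K-token context vs an 18K-token one at the same linear rate:
big = estimate_input_cost(["x" * 740_000])   # ~185K tokens
small = estimate_input_cost(["x" * 72_000])  # ~18K tokens
print(f"{big / small:.1f}x")  # 10.3x
```

Because input pricing is linear in tokens, trimming context is the one optimization that needs no provider support at all.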

For multi-file tasks, I use the plan-then-execute pattern:

# Step 1: Cheap model maps the task ($0.10)
claude --model sonnet "Given AGENTS.md and the error log, 
which files need to change to fix issue #142? Just list them."

# Step 2: Load only those files for the fix ($0.50-2.00)
claude --model opus --include [files from step 1] "Fix issue #142"
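The same pattern in script form, with a hypothetical `call_model(model, prompt, include=...)` wrapper standing in for whatever client you actually use:

```python
# Plan-then-execute: a cheap model picks the files, a premium model does the fix.
# `call_model(model, prompt, include=...)` is a hypothetical wrapper, not a real API.
def plan_then_execute(issue: str, call_model) -> str:
    # Step 1: cheap model maps the task over a small context.
    plan = call_model(
        "claude-sonnet-4.6",
        f"Which files need to change to fix {issue}? Just list them, one per line.",
    )
    files = [line.strip() for line in plan.splitlines() if line.strip()]
    # Step 2: premium model sees only the files the plan named.
    return call_model("claude-opus-4.6", f"Fix {issue}", include=files)
```

The premium model never sees the full repo; it only pays for the files the cheap model nominated.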

Result:

Before: $8.25/day
After:  $5.50/day  (-33%)

Change 3: Prompt Caching (Saved 18%)

If you're making multiple calls against the same codebase in a session, cache the context:

# Using Anthropic's prompt caching
import anthropic

client = anthropic.Anthropic()

# First call: send full context (cached on server)
response = client.messages.create(
    model="claude-sonnet-4.6",
    system=[{
        "type": "text",
        "text": project_context,  # Your AGENTS.md + source files
        "cache_control": {"type": "ephemeral"}
    }],
    messages=[{"role": "user", "content": "Fix the auth bug"}]
)

# Subsequent calls: cached prefix = 90% cheaper
response2 = client.messages.create(
    model="claude-sonnet-4.6",
    system=[{
        "type": "text",
        "text": project_context,  # Same text = cache hit
        "cache_control": {"type": "ephemeral"}
    }],
    messages=[{"role": "user", "content": "Now write tests for the fix"}]
)

Cache hit savings: Up to 90% on input tokens for repeated context.
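Rough arithmetic on what that discount means over a session. The base rate and the flat 90% discount are simplifying assumptions (real pricing also adds a small cache-write premium on the first call):

```python
# Session input cost with caching: the first call pays full price for the
# shared context, later calls pay ~10% of it. Rates are illustrative.
USD_PER_TOKEN = 3.00 / 1_000_000   # assumed base input rate
CACHE_DISCOUNT = 0.90              # cached tokens assumed to bill at ~10%

def session_input_cost(context_tokens: int, calls: int) -> float:
    """Total input cost for `calls` requests sharing one cached context."""
    first = context_tokens * USD_PER_TOKEN
    rest = (calls - 1) * context_tokens * USD_PER_TOKEN * (1 - CACHE_DISCOUNT)
    return first + rest

# Five calls against a 100K-token context:
with_cache = session_input_cost(100_000, 5)
without_cache = 5 * 100_000 * USD_PER_TOKEN
print(f"{with_cache / without_cache:.0%}")  # 28% of the uncached bill
```

Under these assumptions, caching pays off after the second call and keeps compounding the longer the session runs.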

Result:

Before: $5.50/day
After:  $4.50/day  (-18%)

Change 4: The 3-Attempt Rule (Saved 11%)

The most expensive waste: feeding more tokens to a stuck AI. If three attempts with different approaches don't work, the problem needs human insight.

# Sketch: `ai_code`, `approaches`, and `notify_developer` are app-specific helpers.
MAX_ATTEMPTS = 3
attempt_costs = []

for attempt in range(MAX_ATTEMPTS):
    result = ai_code(task, approach=approaches[attempt])
    attempt_costs.append(result.cost)

    if result.tests_pass:
        break
else:
    # After 3 failures, flag for human review
    notify_developer(task, attempt_costs)
    # Saved: potentially $5-20 in wasted tokens

Result:

Before: $4.50/day
After:  $4.00/day  (-11%)

The Final Numbers

Original:                    $15.00/day  ($450/month)
After model routing:          $8.25/day  (-45%)
After context discipline:     $5.50/day  (-33%)  
After prompt caching:         $4.50/day  (-18%)
After 3-attempt rule:         $4.00/day  (-11%)
────────────────────────────────────────────────
Final:                        $4.00/day  ($120/month)
Total savings:                73%

And the output quality? Slightly better. Using the right model for each task often produces more focused, appropriate code than throwing Opus at everything.


The Quick-Start Checklist

  1. ☐ Track costs per task category for one week
  2. ☐ Create a model routing config (start with 3 tiers)
  3. ☐ Set hard context limits per task type
  4. ☐ Implement plan-then-execute for multi-file tasks
  5. ☐ Enable prompt caching if your provider supports it
  6. ☐ Add the 3-attempt circuit breaker
  7. ☐ Review weekly and adjust routing rules
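For step 1, even a flat CSV is enough. A sketch using only the standard library (the `category,usd` row format is my own convention, not part of any tool):

```python
# Sum spend per task category from CSV text of "category,usd" rows.
import collections
import csv
import io

def spend_by_category(csv_text: str) -> dict:
    """Return total USD spent per task category."""
    totals = collections.defaultdict(float)
    for category, usd in csv.reader(io.StringIO(csv_text)):
        totals[category] += float(usd)
    return dict(totals)

log = "bug_fixes,0.50\ndocumentation,0.25\nbug_fixes,1.25\n"
print(spend_by_category(log))  # {'bug_fixes': 1.75, 'documentation': 0.25}
```

One week of this is usually enough to see which categories deserve a cheaper tier.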

Full Cost Optimization Kit

The complete model routing configs, cost tracking dashboards, and optimization scripts are in the AI Dev Toolkit — 264 production frameworks including multi-model routing, prompt caching patterns, and budget enforcement tools.


What's your AI coding budget? Have you optimized it? Share your numbers in the comments — I'm curious how others are managing costs.
