DEV Community

Custodia-Admin

Posted on • Originally published at pagebolt.dev

How to Optimize AI Agent Costs — Inference, API Calls, and Infrastructure


Agents are expensive. Every API call costs money. Every inference costs money. Every screenshot costs money. At scale, the bill adds up fast.

Your agent workflow might cost $0.02 per execution. That's fine for 100 runs. At 10,000 runs per month, you're paying $200. At 100,000 runs, you're at $2,000.

Here's how to cut those costs without sacrificing performance.

Where Agent Costs Live

1. Inference (LLM calls)

  • GPT-4: $0.03 per 1K input tokens
  • GPT-3.5: $0.0005 per 1K input tokens
  • Claude 3: $0.003 per 1K input tokens

A single agent workflow might make 5-10 LLM calls. Each call costs tokens. At scale, this dominates the budget.
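A quick way to see where the money goes is to price a workflow from its token counts. Here's a minimal sketch using the list prices above (the rates are the ones quoted in this post and will drift; check your provider's current pricing):

```python
# Approximate input-token prices per 1K tokens, from the list above.
# Treat these as placeholders -- providers change pricing often.
PRICE_PER_1K_INPUT = {
    "gpt-4": 0.03,
    "gpt-3.5": 0.0005,
    "claude-3": 0.003,
}

def workflow_cost(calls_per_run, model, tokens_per_call, runs):
    """Estimate monthly inference cost: calls/run x tokens/call x rate x runs."""
    rate = PRICE_PER_1K_INPUT[model]
    return calls_per_run * (tokens_per_call / 1000) * rate * runs

# 5 GPT-4 calls of ~1K input tokens each, 10,000 runs/month:
monthly = workflow_cost(calls_per_run=5, model="gpt-4",
                        tokens_per_call=1000, runs=10_000)
print(f"${monthly:,.2f}/month")  # -> $1,500.00/month
```

Run this against your own call counts before optimizing; it tells you whether inference is actually your biggest line item.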

2. API Calls

  • Stripe: $0 per call (but rate limits bite at high volume)
  • AWS API calls: $0.0000002 per call (negligible)
  • Third-party/custom APIs: depends on the provider's pricing

3. Infrastructure

  • Browser automation: Puppeteer, Playwright, Selenium = CPU-intensive
  • PageBolt API: Pay per screenshot/video
  • Hosting agents: EC2, Lambda, serverless containers

4. Data Transfer

  • Screenshots, videos, logs = bandwidth costs
  • S3 storage: $0.023 per GB/month

Cost Optimization Strategies

Strategy 1: Reduce Inference Calls

Not every step needs an LLM call. Many agent actions are deterministic.

# Bad: LLM decides every step
for page in pages:
    decision = llm.call(f"What should I do with {page}?")
    execute(decision)

# Good: Use logic for deterministic steps, LLM only for ambiguous ones
for page in pages:
    if page.matches_pattern(expected_format):
        execute_deterministic_action(page)
    else:
        decision = llm.call(f"How should I handle this unexpected format: {page}?")
        execute(decision)

Result: 80% fewer LLM calls. Cost drops from $200 to $40.

Strategy 2: Batch Processing

Process multiple items in a single LLM call instead of one-by-one.

# Bad: 1 LLM call per item
for item in items:
    classification = llm.call(f"Classify this: {item}")

# Good: Batch 10 items per LLM call
for batch in chunks(items, size=10):
    classifications = llm.call(f"Classify these 10 items: {batch}")

Result: 10x fewer calls. Cost drops proportionally.
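The `chunks` helper used above isn't a built-in; a minimal version looks like this:

```python
def chunks(items, size):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 25 items in batches of 10 -> 3 LLM calls instead of 25.
batches = list(chunks(list(range(25)), size=10))
```

One caveat: the model must return per-item results in a parseable format (e.g. a JSON array in the same order as the inputs), so batching trades a little prompt engineering for far fewer calls.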

Strategy 3: Caching and Memoization

Agent workflows often repeat the same tasks. Cache the results.

cache = {}

def process_page(url):
    if url in cache:
        return cache[url]

    result = agent.process(url)
    cache[url] = result
    return result

If 30% of your agent's work is duplicate, caching eliminates that cost.

Result: 30% cost reduction for repeated workflows.
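For single-process agents, Python's `functools.lru_cache` does the same job as the bare dict above, with a size bound so the cache can't grow forever. The `expensive_agent_process` function here is a stand-in for your real workflow call:

```python
from functools import lru_cache

calls = {"count": 0}

def expensive_agent_process(url):
    # Stand-in for the real agent workflow (LLM calls, scraping, etc.).
    calls["count"] += 1
    return f"processed:{url}"

@lru_cache(maxsize=1024)  # bounded, unlike a bare dict
def process_page(url: str) -> str:
    return expensive_agent_process(url)

process_page("https://example.com/a")
process_page("https://example.com/a")  # cache hit: no second agent call
```

If your agents run across multiple processes or machines, swap the in-memory cache for a shared store such as Redis; the pattern is the same.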

Strategy 4: Use Cheaper Models for Initial Filtering

Use GPT-3.5 for simple filtering, GPT-4 only for complex reasoning.

# Tier 1: GPT-3.5 for classification
initial_category = gpt35.call(f"Is this a support ticket or sales inquiry? {message}")

# Tier 2: GPT-4 only if ambiguous
if initial_category == "ambiguous":
    refined_category = gpt4.call(f"Deeper analysis required for: {message}")

Result: 90% of work done at 1/60th the cost.
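The arithmetic behind that claim, assuming every item goes through the cheap model and a fraction (say 10%) escalates to GPT-4. The per-call prices follow the ~1K-token calls priced earlier in this post and are assumptions, not guarantees:

```python
GPT4_PER_CALL = 0.03     # ~1K input tokens at $0.03/1K
GPT35_PER_CALL = 0.0005  # ~1K input tokens at $0.0005/1K (60x cheaper)

def blended_cost(n_items, escalation_rate):
    """Every item hits GPT-3.5; only a fraction also hits GPT-4."""
    return n_items * GPT35_PER_CALL + n_items * escalation_rate * GPT4_PER_CALL

all_gpt4 = 10_000 * GPT4_PER_CALL    # $300: everything through GPT-4
tiered = blended_cost(10_000, 0.10)  # $5 + $30 = $35
```

Even after paying for the escalated calls twice (cheap pass plus expensive pass), the blended cost here is under an eighth of running everything through GPT-4.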

Strategy 5: Minimize Screenshots/Videos

Screenshots and videos are expensive to store and process. Capture selectively.

# Bad: Screenshot every step
for step in workflow:
    execute(step)
    screenshot()  # 10 screenshots per workflow

# Good: Screenshot only critical steps
critical_steps = ["login", "form_submission", "confirmation"]
for step in workflow:
    execute(step)
    if step.name in critical_steps:
        screenshot()  # 3 screenshots per workflow

Result: 70% fewer screenshots. Saves on storage and API costs.

Strategy 6: Optimize API Calls

Not all APIs are equal. Some are slow, some are expensive.

# Bad: one API call per transaction
for transaction in transactions:
    payments_api.create_charge(transaction)  # 1 API call per transaction

# Good: use the provider's batch endpoint, where one exists
payments_api.create_charges_batch(transactions)  # 1 API call for 100 transactions

Result: 100x fewer API calls.
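Not every provider offers a batch endpoint. When one doesn't, you can't reduce the number of billable calls, but you can cut wall-clock time by pipelining requests concurrently. A sketch with Python's standard library; `submit_transaction` is a hypothetical stand-in for one network call to your payments API:

```python
from concurrent.futures import ThreadPoolExecutor

def submit_transaction(txn):
    # Hypothetical stand-in for one blocking network call.
    return {"id": txn["id"], "status": "ok"}

transactions = [{"id": i, "amount": 100} for i in range(100)]

# 10 calls in flight at a time instead of 100 sequential round-trips.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(submit_transaction, transactions))
```

Keep `max_workers` below the provider's rate limit, or the retries will cost you more than the sequential loop did.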

Real-World Example

Agent workflow: "Process 10,000 customer support tickets per month"

Original costs:

  • 5 LLM calls per ticket × 10,000 tickets = 50,000 GPT-4 calls (~1K input tokens each) = $1,500
  • 3 screenshots per ticket × 10,000 = 30,000 screenshots = $300
  • Infrastructure: $200
  • Total: $2,000/month

Optimized:

  • Tier 1 (GPT-3.5): 10,000 calls = $5
  • Tier 2 (GPT-4, 20% of tickets): 2,000 calls = $60
  • Batch processing on Tier 2: 80% fewer calls, so $12 instead of $60
  • Selective screenshots (3 → 1 per ticket): $100
  • Infrastructure: $200
  • Total: $317/month

Savings: 84%

Where to Focus First

  1. Identify your biggest cost driver — Is it inference? Screenshots? API calls?
  2. Optimize that single thing — Often yields 50%+ savings
  3. Iterate — Move to the next biggest cost driver

Most teams can cut costs by 60-80% with targeted optimizations.


Try it free: 100 requests/month on PageBolt—optimize your agent workflow costs while maintaining visibility into every action. No credit card required.
