How to Optimize AI Agent Costs — Inference, API Calls, and Infrastructure
Agents are expensive. Every API call costs money. Every inference costs money. Every screenshot costs money. At scale, the bill adds up fast.
Your agent workflow might cost $0.02 per execution. That's fine for 100 runs. At 10,000 runs per month, you're paying $200. At 100,000 runs, you're at $2,000.
Here's how to cut those costs without sacrificing performance.
Where Agent Costs Live
1. Inference (LLM calls)
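- GPT-4: $0.03 per 1K input tokens
- GPT-3.5 Turbo: $0.0005 per 1K input tokens
- Claude 3 Sonnet: $0.003 per 1K input tokens
These are list prices for input tokens and they change often; output tokens are billed separately, usually at a higher rate. A single agent workflow might make 5-10 LLM calls, and each call is billed on the tokens it sends and receives. At scale, inference usually dominates the budget.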
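To make the per-token prices concrete, here is a minimal sketch of an inference cost estimate. It uses the input-token prices listed above and ignores output tokens; the function names and the 1,000-token prompt size are assumptions for illustration, not measurements.

```python
# Illustrative input-token prices in USD per 1K tokens, copied from the list above.
# Real prices change; output tokens are billed separately and are ignored here.
PRICE_PER_1K_INPUT = {"gpt-4": 0.03, "gpt-3.5": 0.0005, "claude-3": 0.003}


def inference_cost(model: str, calls: int, input_tokens_per_call: int) -> float:
    """Input-token cost in USD for `calls` calls to `model`."""
    total_tokens = calls * input_tokens_per_call
    return total_tokens / 1000 * PRICE_PER_1K_INPUT[model]


def monthly_inference_cost(
    model: str, calls_per_run: int, input_tokens_per_call: int, runs_per_month: int
) -> float:
    """Monthly input-token cost for a workflow that makes `calls_per_run` calls per run."""
    total_calls = calls_per_run * runs_per_month
    return inference_cost(model, total_calls, input_tokens_per_call)


if __name__ == "__main__":
    # Assumed workload: 5 calls per run, 1,000 input tokens per call, 10,000 runs per month.
    for model in PRICE_PER_1K_INPUT:
        cost = monthly_inference_cost(model, 5, 1000, 10_000)
        print(f"{model}: ${cost:,.2f}/month")
```

Under those assumptions the same workload costs roughly $1,500 a month on GPT-4 and $25 on GPT-3.5, which is why model choice and call count are the first things to look at.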
2. API Calls
- Stripe: no per-call fee (but rate-limited at high volume)
- AWS API calls: about $0.0000002 per call, e.g. Lambda requests at $0.20 per million (negligible)
- Third-party API calls: depends on the provider's pricing
3. Infrastructure
- Browser automation: Puppeteer, Playwright, Selenium = CPU-intensive
- PageBolt API: Pay per screenshot/video
- Hosting agents: EC2, Lambda, serverless containers
4. Data Transfer
- Screenshots, videos, logs = bandwidth costs
- S3 storage: $0.023 per GB/month
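A rough storage estimate helps decide whether this category is worth optimizing at all. The sketch below uses the S3 price listed above; the file size, the retention period, and the decimal definition of a gigabyte are assumptions, and transfer and request fees are left out.

```python
S3_USD_PER_GB_MONTH = 0.023  # storage price from the list above


def monthly_storage_cost(
    files_per_month: int, avg_file_kb: float, retention_months: int = 1
) -> float:
    """Steady-state monthly storage cost in USD.

    Assumes files are deleted after `retention_months`, so the bucket holds
    `files_per_month * retention_months` files once it has filled up.
    """
    stored_files = files_per_month * retention_months
    stored_gb = stored_files * avg_file_kb / 1_000_000  # 1 GB taken as 1,000,000 KB
    return stored_gb * S3_USD_PER_GB_MONTH


if __name__ == "__main__":
    # Assumed workload: 30,000 screenshots per month at 500 KB each.
    print(f"1 month retention:   ${monthly_storage_cost(30_000, 500, 1):.2f}")
    print(f"12 months retention: ${monthly_storage_cost(30_000, 500, 12):.2f}")
```

Retention is the main lever here: the same capture rate costs twelve times as much to store if nothing is ever deleted for a year.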
Cost Optimization Strategies
Strategy 1: Reduce Inference Calls
Not every step needs an LLM call. Many agent actions are deterministic.
    # Bad: LLM decides every step
    for page in pages:
        decision = llm.call(f"What should I do with {page}?")
        execute(decision)

    # Good: Use logic for deterministic steps, LLM only for ambiguous ones
    for page in pages:
        if page.matches_pattern(expected_format):
            execute_deterministic_action(page)
        else:
            # Pass the page itself so the model can see what it is handling
            decision = llm.call(f"How should I handle this unexpected format? {page}")
            execute(decision)
Result: if 80% of steps are deterministic, you make 80% fewer LLM calls, and a $200 inference bill drops to about $40.
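The rule-first pattern can be sketched as a small runnable example. The order-status format, the action strings, and the `CountingLLM` stand-in are all hypothetical; in practice the rule would match whatever your agent sees most often, and the callable would wrap a real model client.

```python
import re
from typing import Callable

# Hypothetical format for pages the agent already knows how to handle.
KNOWN_FORMAT = re.compile(r"Order #(\d+): (shipped|pending|cancelled)")


def handle_page(page: str, llm_call: Callable[[str], str]) -> str:
    """Return an action for `page`, calling the LLM only when the rule does not match."""
    match = KNOWN_FORMAT.fullmatch(page.strip())
    if match:
        order_id, status = match.groups()
        return f"record_status:{order_id}:{status}"
    return llm_call(f"How should I handle this unexpected page?\n\n{page}")


class CountingLLM:
    """Stand-in for a real model client; counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return "escalate_to_human"


if __name__ == "__main__":
    llm = CountingLLM()
    pages = [
        "Order #1: shipped",
        "Order #2: pending",
        "Order #3: cancelled",
        "Order #4: shipped",
        "Refund dispute, see attachment",
    ]
    actions = [handle_page(page, llm) for page in pages]
    print(actions)
    print(f"LLM calls: {llm.calls} of {len(pages)} pages")
```

Counting the calls this way also gives you the number that matters: the share of traffic the rules absorb is the share of inference cost you stop paying.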
Strategy 2: Batch Processing
Process multiple items in a single LLM call instead of one-by-one.
    # Bad: 1 LLM call per item
    for item in items:
        classification = llm.call(f"Classify this: {item}")

    # Good: Batch up to 10 items per LLM call
    # chunks() is a helper that yields lists of at most `size` items
    for batch in chunks(items, size=10):
        classifications = llm.call(f"Classify these {len(batch)} items: {batch}")
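Result: 10x fewer calls. The per-call overhead (system prompt, instructions, request latency) is paid once per batch instead of once per item. The tokens for the items themselves cost the same either way, so the saving depends on how much of each prompt is overhead.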
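A slightly fuller sketch of the batching loop is shown below. It assumes the model is asked to reply with a JSON array and that `llm_call` is any callable taking a prompt and returning text; both are illustrative choices, and the length check is there because a batched reply can silently drop or add an item.

```python
import json
from itertools import islice
from typing import Callable, Iterable, Iterator


def chunks(items: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def classify_in_batches(
    items: Iterable[str], llm_call: Callable[[str], str], size: int = 10
) -> list[str]:
    """Classify items with one LLM call per batch instead of one per item."""
    labels: list[str] = []
    for batch in chunks(items, size):
        numbered = "\n".join(f"{i + 1}. {item}" for i, item in enumerate(batch))
        instruction = (
            f"Classify each of the {len(batch)} items below. "
            "Reply with a JSON array of labels, one per item, in the same order."
        )
        reply = json.loads(llm_call(instruction + "\n" + numbered))
        # Guard against a reply that does not line up with the batch.
        if not isinstance(reply, list) or len(reply) != len(batch):
            raise ValueError(f"expected {len(batch)} labels, got {reply!r}")
        labels.extend(reply)
    return labels
```

With 25 items and a batch size of 10 this makes three calls, the last one carrying five items. Larger batches save more overhead but make a single malformed reply more expensive to retry.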
Strategy 3: Caching and Memoization
Agent workflows often repeat the same tasks. Cache the results.
    cache = {}

    def process_page(url):
        if url in cache:
            return cache[url]
        result = agent.process(url)
        cache[url] = result
        return result
If 30% of your agent's work is duplicate, caching eliminates that cost, provided the cached results are still valid when they are reused.
Result: up to 30% cost reduction for repeated workflows.
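One way to bound the cache and measure how much work it actually saves is the standard library's `functools.lru_cache`, which tracks hits and misses. In this sketch `agent_process` is a hypothetical stand-in for the expensive agent run, and the `maxsize` of 1024 is an arbitrary choice.

```python
from functools import lru_cache


def agent_process(url: str) -> str:
    """Hypothetical stand-in for the expensive agent run."""
    return f"processed:{url}"


@lru_cache(maxsize=1024)  # bounded, unlike a plain dict that grows forever
def process_page(url: str) -> str:
    return agent_process(url)


def hit_rate() -> float:
    """Fraction of process_page calls that were served from the cache."""
    info = process_page.cache_info()
    total = info.hits + info.misses
    return info.hits / total if total else 0.0


if __name__ == "__main__":
    urls = ["/a", "/b", "/c", "/a", "/d", "/e", "/f", "/b", "/g", "/a"]
    for url in urls:
        process_page(url)
    print(process_page.cache_info())
    print(f"hit rate: {hit_rate():.0%}")
```

The hit rate is the figure to watch: it tells you what share of agent runs the cache is absorbing, and therefore roughly what share of the cost. An in-process cache like this is lost on restart, so a long-running or multi-worker deployment would need a shared store instead.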
Strategy 4: Use Cheaper Models for Initial Filtering
Use GPT-3.5 for simple filtering, GPT-4 only for complex reasoning.
    # Tier 1: GPT-3.5 for classification (`ticket` is the text being classified)
    initial_category = gpt35.call(
        f"Is this a support ticket or sales inquiry? Answer 'ambiguous' if unsure.\n{ticket}"
    )

    # Tier 2: GPT-4 only if ambiguous
    if initial_category == "ambiguous":
        refined_category = gpt4.call(f"Deeper analysis required...\n{ticket}")
Result: if 90% of items are resolved in Tier 1, that work runs at 1/60th of the GPT-4 input price ($0.0005 vs. $0.03 per 1K tokens).
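The tiering logic and its effect on cost can be sketched as follows. The label set, the prompt wording, and the two model callables are assumptions for illustration; the cost helper simply encodes that every item pays for the cheap call and only escalated items also pay for the strong one.

```python
from typing import Callable

CONFIDENT_LABELS = {"support", "sales"}


def classify_ticket(
    ticket: str,
    cheap_model: Callable[[str], str],
    strong_model: Callable[[str], str],
) -> tuple[str, str]:
    """Return (label, tier). The strong model runs only if the cheap one is unsure."""
    prompt = (
        "Is this a support ticket or a sales inquiry? "
        "Answer with one word: support, sales, or ambiguous.\n\n" + ticket
    )
    label = cheap_model(prompt).strip().lower()
    if label in CONFIDENT_LABELS:
        return label, "tier1"
    return strong_model(prompt).strip().lower(), "tier2"


def blended_cost_per_item(
    cheap_cost: float, strong_cost: float, escalation_rate: float
) -> float:
    """Average cost per item when a fraction `escalation_rate` reaches the strong model."""
    return cheap_cost + escalation_rate * strong_cost


if __name__ == "__main__":
    # Per-call costs assume about 1K input tokens per call at the prices quoted earlier.
    print(blended_cost_per_item(0.0005, 0.03, escalation_rate=0.10))
```

At a 10% escalation rate the blended cost is about $0.0035 per item, against $0.03 if every item went to the stronger model. The escalation rate is worth measuring on real traffic, since the saving shrinks quickly as it rises.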
Strategy 5: Minimize Screenshots/Videos
Screenshots and videos are expensive to store and process. Capture selectively.
    # Bad: Screenshot every step
    for step in workflow:
        execute(step)
        screenshot()  # 10 screenshots per workflow

    # Good: Screenshot only critical steps
    critical_steps = ["login", "form_submission", "confirmation"]
    for step in workflow:
        execute(step)
        if step.name in critical_steps:
            screenshot()  # 3 screenshots per workflow
Result: 70% fewer screenshots. Saves on storage and API costs.
Strategy 6: Optimize API Calls
Not all APIs are equal. Some are slow, some are expensive.
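    # Bad: One request per record (`api` is a placeholder client, not a real library)
    for record in records:
        api.create_record(record)  # 1 API call per record

    # Good: Use a bulk endpoint where the provider offers one
    api.create_records_batch(records)  # 1 API call for 100 records
Result: up to 100x fewer API calls where a bulk endpoint exists. Not every provider has one: Stripe, for example, creates one charge per request, so check your provider's documentation before assuming a batch call is available.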
Real-World Example
Agent workflow: "Process 10,000 customer support tickets per month"
Original costs:
- 5 LLM calls per ticket × 10,000 = 50,000 calls = $1,500 (GPT-4, about 1K input tokens per call)
- 3 screenshots per ticket × 10,000 = 30,000 screenshots = $300
- Infrastructure: $200
- Total: $2,000/month
Optimized:
- Tier 1 (GPT-3.5): 10,000 calls = $5
- Tier 2 (GPT-4, 20% only): 2,000 calls = $60, or $12 after batching (80% reduction)
- Selective screenshots (3 → 1 per ticket): $100
- Infrastructure: $200
- Total: $5 + $12 + $100 + $200 = $317/month
Savings: 84%
Where to Focus First
- Identify your biggest cost driver — Is it inference? Screenshots? API calls?
- Optimize that single thing — Often yields 50%+ savings
- Iterate — Move to the next biggest cost driver
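The first step in the list above is a simple ranking exercise. The sketch below applies it to the original monthly costs from the support-ticket example ($1,500 inference, $300 screenshots, $200 infrastructure); the function name and the category labels are illustrative.

```python
def rank_cost_drivers(costs: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (name, amount, share_of_total) tuples, largest cost first."""
    total = sum(costs.values())
    if total <= 0:
        return []
    ranked = sorted(costs.items(), key=lambda item: item[1], reverse=True)
    return [(name, amount, amount / total) for name, amount in ranked]


if __name__ == "__main__":
    monthly_costs = {"inference": 1500, "screenshots": 300, "infrastructure": 200}
    for name, amount, share in rank_cost_drivers(monthly_costs):
        print(f"{name:<15} ${amount:>7,.0f}  {share:.0%}")
```

In that example inference accounts for 75% of the bill, so it is the place to start; halving screenshot costs first would move the total by less than 8%.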
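As the support-ticket example shows, targeted optimizations can cut costs by well over half. The exact figure depends on how much of your workload is deterministic, repeated, or simple enough for a cheaper model.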
Try it free: 100 requests/month on PageBolt—optimize your agent workflow costs while maintaining visibility into every action. No credit card required.