Franklin Mayoyo

Your AI Agent is Bleeding Money (And You're Making It Worse)

Let me guess: you're building a SaaS product with Claude. You've got customers making API calls, burning through tokens like a Tesla on fire. Your API bill looks like a phone number. And somewhere, a product manager is asking why your margins are trash.

I'll tell you why: you're shoving encyclopedias into every single API call when you could be carrying a library card.

The $10K/Month Mistake

Here's what I see in every production codebase I audit:

# Your current nightmare
system_prompt = """
You are DataCorp's financial analyst assistant.

BRAND GUIDELINES:
- Use active voice in all communications
- Keep sentences under 20 words
- Reference our Q4 2024 performance metrics
- Always cite from our approved data sources
... [2,000 more words]

FINANCIAL ANALYSIS PROCEDURES:
1. Always validate data sources
2. Calculate rolling 12-month averages
3. Compare to industry benchmarks from FinData Inc.
4. Generate variance reports
... [1,500 more words]

DATA SOURCES:
Our approved databases are:
- PostgreSQL cluster at db.datacorp.com
  Schema: [500 words of table definitions]
- MongoDB analytics at analytics.datacorp.com
  Collections: [400 words of collection schemas]
- S3 data lake structure: [600 words]

EXCEL FORMATTING STANDARDS:
- Header row always bold, 12pt Calibri
- Currency formatted to 2 decimals
- Charts use company colors: #1E3A8A, #3B82F6, #60A5FA
... [800 more words]

REPORT TEMPLATES:
[Monthly report template: 500 words]
[Quarterly analysis template: 700 words]
[Executive summary template: 400 words]

Now please analyze this quarter's revenue data.
"""

# Token cost per request: ~9,000 tokens
# Requests per month: 50,000
# Cost at Claude Sonnet 4: $27/million input tokens
# Your monthly burn: $12,150
# And that's JUST the system prompt

Every. Single. API. Call.

You're paying to load the same information 50,000 times a month. That's not engineering—that's setting money on fire while your CFO watches.

The Skill Solution: Pay Per Use, Not Per Prayer

Skills work on progressive disclosure. Claude loads them in three levels:

Level 1: Metadata (~100 tokens, ALWAYS loaded)

---
name: datacorp-financial-analysis
description: "Analyze financial data following DataCorp standards and procedures. Use for revenue analysis, variance reports, and executive summaries."
---

That's it. 100 tokens. Claude now knows this Skill exists.
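Claude scans that metadata for every Skill you attach, so it's worth confirming the frontmatter really is tiny. A rough sanity check, assuming you want to verify the budget with the SDK's token-counting endpoint:

# Sanity-check the "~100 tokens" claim by counting the frontmatter
# as message content (an approximation, but close enough to budget with)
metadata = '''---
name: datacorp-financial-analysis
description: "Analyze financial data following DataCorp standards and procedures. Use for revenue analysis, variance reports, and executive summaries."
---'''

count = client.messages.count_tokens(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": metadata}],
)
print(count.input_tokens)  # expect something in the ~50-100 range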

Level 2: Instructions (Under 5k tokens, loaded WHEN TRIGGERED)

Only loaded when someone actually asks for financial analysis:

# DataCorp Financial Analysis Skill

## Quick Start
Use our PostgreSQL cluster for source data.
For schema details, see [SCHEMAS.md](SCHEMAS.md).
For Excel formatting, see [FORMATTING.md](FORMATTING.md).

## Analysis Procedures
1. Validate data sources
2. Calculate 12-month rolling averages
3. Compare to benchmarks
4. Generate variance reports

For detailed procedures, see [PROCEDURES.md](PROCEDURES.md).

Level 3: Resources (Unlimited, loaded AS NEEDED)

financial-analysis-skill/
├── SKILL.md              # Loaded when Skill triggers
├── SCHEMAS.md            # Only if Claude needs schema details
├── FORMATTING.md         # Only if creating Excel files
├── PROCEDURES.md         # Only if doing complex analysis
├── templates/
│   ├── monthly.md        # Only if monthly report requested
│   ├── quarterly.md      # Only if quarterly report requested
│   └── executive.md      # Only if exec summary requested
└── scripts/
    ├── validate_data.py  # Never enters context, just executes
    └── benchmark.py      # Never enters context, just executes

Now watch your token economics transform:

import anthropic

client = anthropic.Anthropic()

# One-time setup: Upload your Skill
from anthropic.lib import files_from_dir

skill = client.beta.skills.create(
    display_title="DataCorp Financial Analysis",
    files=files_from_dir("/path/to/financial_analysis_skill"),
    betas=["skills-2025-10-02"]
)

# Now your API calls look like this:
response = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    betas=["code-execution-2025-08-25", "skills-2025-10-02"],
    container={
        "skills": [{
            "type": "custom",
            "skill_id": skill.id,
            "version": "latest"
        }]
    },
    messages=[{
        "role": "user",
        "content": "Analyze Q4 revenue vs Q3"
    }],
    tools=[{
        "type": "code_execution_20250825",
        "name": "code_execution"
    }]
)

# Token cost breakdown:
# - Skill metadata: 100 tokens (always)
# - SKILL.md: 2,000 tokens (when financial analysis triggered)
# - SCHEMAS.md: 500 tokens (if needed)
# - PROCEDURES.md: 1,500 tokens (if needed)
# Total: 4,100 tokens vs 9,000 tokens
# Savings: 54% on this request
# But wait, it gets better...

The Script Multiplier: Zero-Token Execution

Here's where your accountant starts crying tears of joy:

Executable scripts never enter the context window.

When Claude runs a script, only the OUTPUT consumes tokens. The code itself? Zero.

# scripts/validate_revenue_data.py
import pandas as pd
import json
import sys

def validate_revenue_data(filepath):
    """
    Comprehensive revenue data validation.
    50 lines of sophisticated logic that you'd normally
    need to explain to Claude in excruciating detail.
    """
    df = pd.read_csv(filepath)

    issues = []

    # Check for negative revenue (red flag)
    if (df['revenue'] < 0).any():
        negative_count = (df['revenue'] < 0).sum()
        issues.append(f"CRITICAL: {negative_count} negative revenue entries")

    # Validate date continuity (sort first so the day gaps are meaningful)
    df['date'] = pd.to_datetime(df['date'])
    df = df.sort_values('date')
    date_gaps = df['date'].diff().dt.days
    if (date_gaps > 1).any():
        issues.append("WARNING: Date gaps detected in data")

    # Check for outliers (3 sigma rule)
    mean_rev = df['revenue'].mean()
    std_rev = df['revenue'].std()
    outliers = df[abs(df['revenue'] - mean_rev) > 3 * std_rev]
    if len(outliers) > 0:
        issues.append(f"INFO: {len(outliers)} statistical outliers detected")

    # Validate currency consistency
    if df['currency'].nunique() > 1:
        issues.append("ERROR: Multiple currencies in dataset")

    # Check completeness
    missing_data = df.isnull().sum()
    if missing_data.any():
        issues.append(f"WARNING: Missing data in columns: {missing_data[missing_data > 0].to_dict()}")

    return {
        'valid': len([i for i in issues if 'ERROR' in i or 'CRITICAL' in i]) == 0,
        'issues': issues,
        'records': len(df),
        'date_range': f"{df['date'].min():%Y-%m-%d} to {df['date'].max():%Y-%m-%d}",
        'total_revenue': float(df['revenue'].sum()),
        'currency': df['currency'].iloc[0]
    }

if __name__ == "__main__":
    result = validate_revenue_data(sys.argv[1])
    print(json.dumps(result, indent=2))

Claude runs: python scripts/validate_revenue_data.py q4_revenue.csv

Claude gets back:

{
  "valid": true,
  "issues": ["INFO: 2 statistical outliers detected"],
  "records": 1247,
  "date_range": "2024-10-01 to 2024-12-31",
  "total_revenue": 1847293.42,
  "currency": "USD"
}

Token cost of that 50-line validation script: ZERO

If you made Claude write that logic from scratch every time? 800+ tokens. Per request. Forever. At 50,000 requests a month, that's 40 million tokens (over $1,000 a month at the rates above) for logic that now runs for free.

Real Numbers From Real Production Systems

Let me show you what this looks like at scale.

Case Study: MarketingCo (B2B SaaS, 200 employees)

Before Skills:

# Every request included:
system_prompt = """
[Brand guidelines: 2,000 tokens]
[SEO requirements: 1,500 tokens]
[Product catalog: 3,000 tokens]
[Writing templates: 2,500 tokens]
[Competitor positioning: 1,000 tokens]
Total: 10,000 tokens per request
"""

# Monthly usage:
# - 100,000 API calls
# - 10,000 tokens per call = 1 billion input tokens
# - Claude Sonnet 4: $27/million tokens
# - Monthly cost: $27,000 (just system prompts!)

After Skills:

# Marketing Skill structure:
marketing-skill/
├── SKILL.md (2k tokens)
├── BRAND.md (loaded if brand voice needed)
├── SEO.md (loaded if SEO optimization needed)
├── products/
│   ├── product_a.json (loaded per product)
│   ├── product_b.json
│   └── ...
├── templates/ (loaded per content type)
└── scripts/
    └── check_seo.py (zero tokens, just runs)

# Typical blog post request:
# - Skill metadata: 100 tokens (always)
# - SKILL.md: 2,000 tokens (triggered)
# - SEO.md: 1,500 tokens (needed for blog)
# - templates/blog.md: 600 tokens (needed)
# - One product file: 200 tokens (needed)
# Total: 4,400 tokens per request

# New monthly cost:
# - 100,000 API calls
# - 4,400 tokens per call = 440 million input tokens
# - Monthly cost: $11,880
# SAVINGS: $15,120/month = $181,440/year

Case Study: FinTechStartup (Series A, Tight Margins)

The Setup:

  • Building AI-powered financial advisory tool
  • 10,000 users, averaging 5 queries/day
  • 50,000 API calls/day = 1.5M/month

Before Skills:

# System prompt included:
# - Compliance guidelines: 3,000 tokens
# - Investment strategies: 4,000 tokens
# - Risk assessment procedures: 2,000 tokens
# - Market data schemas: 2,500 tokens
# - Report formatting: 1,500 tokens
# Total: 13,000 tokens per request

# Monthly burn:
# 1.5M calls × 13,000 tokens = 19.5 billion tokens
# Cost: $526,500/month on input tokens alone
# Annual run rate: $6.3 MILLION

After Skills:

# Investment advisory Skill:
investment-skill/
├── SKILL.md (3k tokens)
├── COMPLIANCE.md (only if compliance check needed)
├── STRATEGIES.md (only if strategy analysis needed)
├── RISK_MODELS.md (only if risk assessment needed)
├── schemas/
│   └── market_data.sql (only if schema needed)
├── templates/
│   └── reports/ (only if report generation needed)
└── scripts/
    ├── risk_calculator.py (zero tokens)
    ├── compliance_check.py (zero tokens)
    └── portfolio_optimizer.py (zero tokens)

# Average request now:
# - Skill metadata: 100 tokens
# - SKILL.md: 3,000 tokens
# - One additional file: 2,000 tokens (average)
# - Script execution: 0 tokens
# Total: 5,100 tokens per request

# New monthly cost:
# 1.5M calls × 5,100 tokens = 7.65 billion tokens
# Cost: $206,550/month
# SAVINGS: $319,950/month = $3.8 MILLION/year

That's not a typo. $3.8 million saved per year.

The Multi-Skill Power Move

Here's where it gets spicy. You can combine up to 8 Skills per request:

# The old way: Everything in one bloated prompt
system_prompt = """
[Financial analysis: 5,000 tokens]
[Excel formatting: 2,000 tokens]
[PowerPoint templates: 2,500 tokens]
[Company branding: 1,500 tokens]
[Data validation: 2,000 tokens]
Total: 13,000 tokens every request
"""

# The Skills way: Load only what's needed
response = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    betas=["code-execution-2025-08-25", "skills-2025-10-02"],
    container={
        "skills": [
            {"type": "custom", "skill_id": "skill_financial", "version": "latest"},
            {"type": "anthropic", "skill_id": "xlsx", "version": "latest"},
            {"type": "anthropic", "skill_id": "pptx", "version": "latest"}
        ]
    },
    messages=[{
        "role": "user",
        "content": "Analyze Q4 revenue and create an executive presentation"
    }],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)

# What actually loads:
# - All Skill metadata: 300 tokens (3 × 100)
# - Financial Skill: 3,000 tokens (triggered)
# - Excel Skill: 2,000 tokens (triggered for data analysis)
# - PowerPoint Skill: 2,500 tokens (triggered for presentation)
# Total: 7,800 tokens

# But if user just asks "What was our Q4 revenue?":
# - All Skill metadata: 300 tokens
# - Financial Skill: 3,000 tokens
# - Excel Skill: NOT loaded (not needed)
# - PowerPoint Skill: NOT loaded (not needed)
# Total: 3,300 tokens
# Savings: 58% on this simpler query

Anthropic's Pre-Built Skills: Free Optimization

Anthropic provides pre-built Skills for common tasks:

# Use Anthropic's optimized Skills for document generation
response = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    betas=["code-execution-2025-08-25", "skills-2025-10-02"],
    container={
        "skills": [
            {"type": "anthropic", "skill_id": "xlsx", "version": "latest"},
            {"type": "anthropic", "skill_id": "pptx", "version": "latest"},
            {"type": "anthropic", "skill_id": "docx", "version": "latest"},
            {"type": "anthropic", "skill_id": "pdf", "version": "latest"}
        ]
    },
    messages=[{
        "role": "user",
        "content": "Create a quarterly report with data analysis and charts"
    }],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)

# These are maintained by Anthropic
# Zero effort, just better performance

Downloading Generated Files: The Right Way

When Skills create documents, you need to download them properly:

# Step 1: Create the document
response = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    betas=["code-execution-2025-08-25", "skills-2025-10-02"],
    container={
        "skills": [{"type": "anthropic", "skill_id": "xlsx", "version": "latest"}]
    },
    messages=[{
        "role": "user",
        "content": "Create Q4 revenue analysis spreadsheet"
    }],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)

# Step 2: Extract file IDs from response
def extract_file_ids(response):
    file_ids = []
    for item in response.content:
        if item.type == 'bash_code_execution_tool_result':
            content_item = item.content
            if content_item.type == 'bash_code_execution_result':
                for file in content_item.content:
                    if hasattr(file, 'file_id'):
                        file_ids.append(file.file_id)
    return file_ids

# Step 3: Download using Files API
for file_id in extract_file_ids(response):
    # Get metadata
    file_metadata = client.beta.files.retrieve_metadata(
        file_id=file_id,
        betas=["files-api-2025-04-14"]
    )

    # Download content
    file_content = client.beta.files.download(
        file_id=file_id,
        betas=["files-api-2025-04-14"]
    )

    # Save to disk
    file_content.write_to_file(file_metadata.filename)
    print(f"Downloaded: {file_metadata.filename}")

Version Control: Production vs Development

Smart teams version their Skills:

# Production: Pin to specific versions for stability
production_config = {
    "skills": [{
        "type": "custom",
        "skill_id": "skill_01AbCdEfGhIjKlMnOpQrStUv",
        "version": "1759178010641129"  # Specific version timestamp
    }]
}

# Development: Use latest for rapid iteration
development_config = {
    "skills": [{
        "type": "custom",
        "skill_id": "skill_01AbCdEfGhIjKlMnOpQrStUv",
        "version": "latest"  # Always get newest
    }]
}

# Update workflow:
# 1. Develop and test with "latest"
from anthropic.lib import files_from_dir

new_version = client.beta.skills.versions.create(
    skill_id="skill_01AbCdEfGhIjKlMnOpQrStUv",
    files=files_from_dir("/path/to/updated_skill"),
    betas=["skills-2025-10-02"]
)

# 2. Deploy to production with specific version
# 3. Monitor and rollback if needed
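One guardrail worth adding, sketched below (the APP_ENV variable and helper are my own convention, not part of the Skills API): resolve the version from your deploy environment so production can never silently track "latest".

import os

def skill_config(skill_id: str, pinned_version: str) -> dict:
    # Production resolves to the pinned version; everything else tracks latest
    env = os.environ.get("APP_ENV", "development")
    version = pinned_version if env == "production" else "latest"
    return {"type": "custom", "skill_id": skill_id, "version": version}

container = {"skills": [
    skill_config("skill_01AbCdEfGhIjKlMnOpQrStUv", "1759178010641129")
]}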

The Prompt Caching Gotcha

Skills work WITH prompt caching, but there's a trap:

# This caches well:
response1 = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    betas=[
        "code-execution-2025-08-25",
        "skills-2025-10-02",
        "prompt-caching-2024-07-31"
    ],
    container={
        "skills": [
            {"type": "anthropic", "skill_id": "xlsx", "version": "latest"}
        ]
    },
    messages=[{"role": "user", "content": "Analyze sales data"}],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)

# This breaks the cache (different Skills list):
response2 = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    betas=[
        "code-execution-2025-08-25",
        "skills-2025-10-02",
        "prompt-caching-2024-07-31"
    ],
    container={
        "skills": [
            {"type": "anthropic", "skill_id": "xlsx", "version": "latest"},
            {"type": "anthropic", "skill_id": "pptx", "version": "latest"}  # Cache miss
        ]
    },
    messages=[{"role": "user", "content": "Create presentation"}],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)

# Solution: Keep Skills list consistent for related workflows
# Group your API calls by Skill combinations
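In practice that means routing requests through a fixed table of Skill combinations instead of assembling the list ad hoc per call. A minimal sketch (the workflow names and helper are hypothetical):

WORKFLOW_SKILLS = {
    # Same list, same order, for every call in a workflow family
    "spreadsheet_reports": [
        {"type": "anthropic", "skill_id": "xlsx", "version": "latest"},
    ],
    "board_decks": [
        {"type": "anthropic", "skill_id": "xlsx", "version": "latest"},
        {"type": "anthropic", "skill_id": "pptx", "version": "latest"},
    ],
}

def run_workflow(workflow: str, user_content: str):
    # Every call in a family shares one container config, so the
    # cached prefix stays valid across requests
    return client.beta.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=4096,
        betas=[
            "code-execution-2025-08-25",
            "skills-2025-10-02",
            "prompt-caching-2024-07-31"
        ],
        container={"skills": WORKFLOW_SKILLS[workflow]},
        messages=[{"role": "user", "content": user_content}],
        tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
    )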

The Migration Checklist

Ready to stop hemorrhaging money? Here's your path:

Week 1: Audit

# Calculate your current burn
def audit_current_costs():
    """
    1. Count your API calls per month
    2. Measure your average system prompt length
    3. Calculate: calls × prompt_tokens × $27/million
    4. Cry into your spreadsheet
    """
    monthly_calls = 100000  # Your number
    avg_system_tokens = 8000  # Measure yours

    monthly_input_tokens = monthly_calls * avg_system_tokens
    monthly_cost = (monthly_input_tokens / 1_000_000) * 27

    print(f"Current monthly burn: ${monthly_cost:,.2f}")
    print(f"Annual run rate: ${monthly_cost * 12:,.2f}")
    return monthly_cost
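With the placeholder numbers above, calling it quantifies the damage:

audit_current_costs()
# Current monthly burn: $21,600.00
# Annual run rate: $259,200.00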

Week 2: Extract Your First Skill

# Start with your most-used procedures
# Financial analysis, content generation, data validation, etc.

# Create skill directory:
"""
my-first-skill/
├── SKILL.md
└── README.md
"""

# SKILL.md template:
"""
---
name: my-company-analytics
description: Company-specific data analysis procedures. Use for financial analysis, revenue reporting, and KPI calculations.
---

# Analytics Skill

## Quick Start
For standard revenue analysis, use our PostgreSQL cluster.
Connection details in company vault.

## Common Tasks
- Revenue analysis: See [REVENUE.md](REVENUE.md)
- KPI dashboards: See [KPI.md](KPI.md)

## Data Sources
All company data sources documented in [SOURCES.md](SOURCES.md)
"""

Week 3: Upload and Test

from anthropic.lib import files_from_dir

# Upload
skill = client.beta.skills.create(
    display_title="Company Analytics",
    files=files_from_dir("/path/to/my-first-skill"),
    betas=["skills-2025-10-02"]
)

# Test with real queries
test_queries = [
    "Analyze Q4 revenue",
    "Generate monthly KPI report",
    "Compare revenue vs last quarter"
]

for query in test_queries:
    response = client.beta.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=4096,
        betas=["code-execution-2025-08-25", "skills-2025-10-02"],
        container={
            "skills": [{"type": "custom", "skill_id": skill.id, "version": "latest"}]
        },
        messages=[{"role": "user", "content": query}],
        tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
    )

    # Validate responses
    # Measure token usage
    # Calculate savings
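For the token-usage step, nothing fancy is needed: every Messages API response carries a usage object. A small helper you could call inside the loop:

def report_usage(query, response):
    # input_tokens is what you pay for on the request; watch it drop
    # as the Skill replaces your old mega-prompt
    print(f"{query!r}: {response.usage.input_tokens} input / "
          f"{response.usage.output_tokens} output tokens")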

Week 4: Measure and Scale

def measure_savings():
    """
    Track token usage before/after
    Calculate cost savings
    Project annual impact
    Send victory email to CFO
    """
    old_tokens_per_request = 8000
    new_tokens_per_request = 3500  # Your actual number

    savings_per_request = old_tokens_per_request - new_tokens_per_request
    monthly_calls = 100000

    monthly_savings = (monthly_calls * savings_per_request / 1_000_000) * 27
    annual_savings = monthly_savings * 12

    print(f"Monthly savings: ${monthly_savings:,.2f}")
    print(f"Annual savings: ${annual_savings:,.2f}")
    print(f"Efficiency gain: {(savings_per_request/old_tokens_per_request)*100:.1f}%")
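Run with the numbers above, it prints:

measure_savings()
# Monthly savings: $12,150.00
# Annual savings: $145,800.00
# Efficiency gain: 56.2%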

The Bottom Line

Skills are not a "nice-to-have" feature. They're a fundamental architectural shift in how you should build with LLMs.

Stop doing this:

  • Copying 10,000-token system prompts into every API call
  • Burning $500K+ annually on repeated context
  • Explaining the same procedures to Claude 100,000 times

Start doing this:

  • Load instructions on-demand (54-61% token reduction in the examples above)
  • Execute scripts at zero token cost
  • Compose Skills for complex workflows
  • Version your domain knowledge like code

The math is brutal: a Series A fintech making 1.5M API calls/month with bloated prompts burns $526K monthly. With Skills, that drops to $206K. That's $3.8M saved per year.

Your investors didn't give you millions to spend it on redundant context loading.

Fix your architecture. Deploy Skills. Save money.


P.S. - If you're still copying system prompts after reading this, I can't help you. But your competitors using Skills will eat your lunch while spending 60% less on AI costs.
