Franklin Mayoyo

Your AI Agent is Bleeding Money (And You're Making It Worse)

Let me guess: you're building a SaaS product with Claude. You've got customers making API calls, burning through tokens like a Tesla on fire. Your API bill looks like a phone number. And somewhere, a product manager is asking why your margins are trash.

I'll tell you why: you're shoving encyclopedias into every single API call when you could be carrying a library card.

The $10K/Month Mistake

Here's what I see in every production codebase I audit:

# Your current nightmare
system_prompt = """
You are DataCorp's financial analyst assistant.

BRAND GUIDELINES:
- Use active voice in all communications
- Keep sentences under 20 words
- Reference our Q4 2024 performance metrics
- Always cite from our approved data sources
... [2,000 more words]

FINANCIAL ANALYSIS PROCEDURES:
1. Always validate data sources
2. Calculate rolling 12-month averages
3. Compare to industry benchmarks from FinData Inc.
4. Generate variance reports
... [1,500 more words]

DATA SOURCES:
Our approved databases are:
- PostgreSQL cluster at db.datacorp.com
  Schema: [500 words of table definitions]
- MongoDB analytics at analytics.datacorp.com
  Collections: [400 words of collection schemas]
- S3 data lake structure: [600 words]

EXCEL FORMATTING STANDARDS:
- Header row always bold, 12pt Calibri
- Currency formatted to 2 decimals
- Charts use company colors: #1E3A8A, #3B82F6, #60A5FA
... [800 more words]

REPORT TEMPLATES:
[Monthly report template: 500 words]
[Quarterly analysis template: 700 words]
[Executive summary template: 400 words]

Now please analyze this quarter's revenue data.
"""

# Token cost per request: ~9,000 tokens
# Requests per month: 50,000
# Cost at Claude Sonnet 4: $27/million input tokens
# Your monthly burn: $12,150
# And that's JUST the system prompt

Every. Single. API. Call.

You're paying to load the same information 50,000 times a month. That's not engineering—that's setting money on fire while your CFO watches.

The Skill Solution: Pay Per Use, Not Per Prayer

Skills work on progressive disclosure. Claude loads them in three levels:

Level 1: Metadata (~100 tokens, ALWAYS loaded)

---
name: datacorp-financial-analysis
description: "Analyze financial data following DataCorp standards and procedures. Use for revenue analysis, variance reports, and executive summaries."
---

That's it. 100 tokens. Claude now knows this Skill exists.
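Claude scans that metadata for every Skill you attach, so it's worth confirming the frontmatter really is tiny. A rough sanity check, assuming you want to verify the budget with the SDK's token-counting endpoint:

# Sanity-check the "~100 tokens" claim by counting the frontmatter
# as message content (an approximation, but close enough to budget with)
metadata = '''---
name: datacorp-financial-analysis
description: "Analyze financial data following DataCorp standards and procedures. Use for revenue analysis, variance reports, and executive summaries."
---'''

count = client.messages.count_tokens(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": metadata}],
)
print(count.input_tokens)  # expect something in the ~50-100 range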

Level 2: Instructions (Under 5k tokens, loaded WHEN TRIGGERED)

Only loaded when someone actually asks for financial analysis:

# DataCorp Financial Analysis Skill

## Quick Start
Use our PostgreSQL cluster for source data.
For schema details, see [SCHEMAS.md](SCHEMAS.md).
For Excel formatting, see [FORMATTING.md](FORMATTING.md).

## Analysis Procedures
1. Validate data sources
2. Calculate 12-month rolling averages
3. Compare to benchmarks
4. Generate variance reports

For detailed procedures, see [PROCEDURES.md](PROCEDURES.md).

Level 3: Resources (Unlimited, loaded AS NEEDED)

financial-analysis-skill/
├── SKILL.md              # Loaded when Skill triggers
├── SCHEMAS.md            # Only if Claude needs schema details
├── FORMATTING.md         # Only if creating Excel files
├── PROCEDURES.md         # Only if doing complex analysis
├── templates/
│   ├── monthly.md        # Only if monthly report requested
│   ├── quarterly.md      # Only if quarterly report requested
│   └── executive.md      # Only if exec summary requested
└── scripts/
    ├── validate_data.py  # Never enters context, just executes
    └── benchmark.py      # Never enters context, just executes

Now watch your token economics transform:

import anthropic

client = anthropic.Anthropic()

# One-time setup: Upload your Skill
from anthropic.lib import files_from_dir

skill = client.beta.skills.create(
    display_title="DataCorp Financial Analysis",
    files=files_from_dir("/path/to/financial_analysis_skill"),
    betas=["skills-2025-10-02"]
)

# Now your API calls look like this:
response = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    betas=["code-execution-2025-08-25", "skills-2025-10-02"],
    container={
        "skills": [{
            "type": "custom",
            "skill_id": skill.id,
            "version": "latest"
        }]
    },
    messages=[{
        "role": "user",
        "content": "Analyze Q4 revenue vs Q3"
    }],
    tools=[{
        "type": "code_execution_20250825",
        "name": "code_execution"
    }]
)

# Token cost breakdown:
# - Skill metadata: 100 tokens (always)
# - SKILL.md: 2,000 tokens (when financial analysis triggered)
# - SCHEMAS.md: 500 tokens (if needed)
# - PROCEDURES.md: 1,500 tokens (if needed)
# Total: 4,100 tokens vs 9,000 tokens
# Savings: 54% on this request
# But wait, it gets better...

The Script Multiplier: Zero-Token Execution

Here's where your accountant starts crying tears of joy:

Executable scripts never enter the context window.

When Claude runs a script, only the OUTPUT consumes tokens. The code itself? Zero.

# scripts/validate_revenue_data.py
import pandas as pd
import json
import sys

def validate_revenue_data(filepath):
    """
    Comprehensive revenue data validation.
    50 lines of sophisticated logic that you'd normally
    need to explain to Claude in excruciating detail.
    """
    df = pd.read_csv(filepath)

    issues = []

    # Check for negative revenue (red flag)
    if (df['revenue'] < 0).any():
        negative_count = (df['revenue'] < 0).sum()
        issues.append(f"CRITICAL: {negative_count} negative revenue entries")

    # Validate date continuity (sort first so the day gaps are meaningful)
    df['date'] = pd.to_datetime(df['date'])
    df = df.sort_values('date')
    date_gaps = df['date'].diff().dt.days
    if (date_gaps > 1).any():
        issues.append("WARNING: Date gaps detected in data")

    # Check for outliers (3 sigma rule)
    mean_rev = df['revenue'].mean()
    std_rev = df['revenue'].std()
    outliers = df[abs(df['revenue'] - mean_rev) > 3 * std_rev]
    if len(outliers) > 0:
        issues.append(f"INFO: {len(outliers)} statistical outliers detected")

    # Validate currency consistency
    if df['currency'].nunique() > 1:
        issues.append("ERROR: Multiple currencies in dataset")

    # Check completeness
    missing_data = df.isnull().sum()
    if missing_data.any():
        issues.append(f"WARNING: Missing data in columns: {missing_data[missing_data > 0].to_dict()}")

    return {
        'valid': len([i for i in issues if 'ERROR' in i or 'CRITICAL' in i]) == 0,
        'issues': issues,
        'records': len(df),
        'date_range': f"{df['date'].min():%Y-%m-%d} to {df['date'].max():%Y-%m-%d}",
        'total_revenue': float(df['revenue'].sum()),
        'currency': df['currency'].iloc[0]
    }

if __name__ == "__main__":
    result = validate_revenue_data(sys.argv[1])
    print(json.dumps(result, indent=2))

Claude runs: python scripts/validate_revenue_data.py q4_revenue.csv

Claude gets back:

{
  "valid": true,
  "issues": ["INFO: 2 statistical outliers detected"],
  "records": 1247,
  "date_range": "2024-10-01 to 2024-12-31",
  "total_revenue": 1847293.42,
  "currency": "USD"
}

Token cost of that 50-line validation script: ZERO

If you made Claude write that logic from scratch every time? 800+ tokens. Per request. Forever. At 50,000 requests a month, that's 40 million tokens (over $1,000 a month at the rates above) for logic that now runs for free.

Real Numbers From Real Production Systems

Let me show you what this looks like at scale.

Case Study: MarketingCo (B2B SaaS, 200 employees)

Before Skills:

# Every request included:
system_prompt = """
[Brand guidelines: 2,000 tokens]
[SEO requirements: 1,500 tokens]
[Product catalog: 3,000 tokens]
[Writing templates: 2,500 tokens]
[Competitor positioning: 1,000 tokens]
Total: 10,000 tokens per request
"""

# Monthly usage:
# - 100,000 API calls
# - 10,000 tokens per call = 1 billion input tokens
# - Claude Sonnet 4: $27/million tokens
# - Monthly cost: $27,000 (just system prompts!)

After Skills:

# Marketing Skill structure:
marketing-skill/
├── SKILL.md (2k tokens)
├── BRAND.md (loaded if brand voice needed)
├── SEO.md (loaded if SEO optimization needed)
├── products/
│   ├── product_a.json (loaded per product)
│   ├── product_b.json
│   └── ...
├── templates/ (loaded per content type)
└── scripts/
    └── check_seo.py (zero tokens, just runs)

# Typical blog post request:
# - Skill metadata: 100 tokens (always)
# - SKILL.md: 2,000 tokens (triggered)
# - SEO.md: 1,500 tokens (needed for blog)
# - templates/blog.md: 600 tokens (needed)
# - One product file: 200 tokens (needed)
# Total: 4,400 tokens per request

# New monthly cost:
# - 100,000 API calls
# - 4,400 tokens per call = 440 million input tokens
# - Monthly cost: $11,880
# SAVINGS: $15,120/month = $181,440/year

Case Study: FinTechStartup (Series A, Tight Margins)

The Setup:

  • Building AI-powered financial advisory tool
  • 10,000 users, averaging 5 queries/day
  • 50,000 API calls/day = 1.5M/month

Before Skills:

# System prompt included:
# - Compliance guidelines: 3,000 tokens
# - Investment strategies: 4,000 tokens
# - Risk assessment procedures: 2,000 tokens
# - Market data schemas: 2,500 tokens
# - Report formatting: 1,500 tokens
# Total: 13,000 tokens per request

# Monthly burn:
# 1.5M calls × 13,000 tokens = 19.5 billion tokens
# Cost: $526,500/month on input tokens alone
# Annual run rate: $6.3 MILLION

After Skills:

# Investment advisory Skill:
investment-skill/
├── SKILL.md (3k tokens)
├── COMPLIANCE.md (only if compliance check needed)
├── STRATEGIES.md (only if strategy analysis needed)
├── RISK_MODELS.md (only if risk assessment needed)
├── schemas/
│   └── market_data.sql (only if schema needed)
├── templates/
│   └── reports/ (only if report generation needed)
└── scripts/
    ├── risk_calculator.py (zero tokens)
    ├── compliance_check.py (zero tokens)
    └── portfolio_optimizer.py (zero tokens)

# Average request now:
# - Skill metadata: 100 tokens
# - SKILL.md: 3,000 tokens
# - One additional file: 2,000 tokens (average)
# - Script execution: 0 tokens
# Total: 5,100 tokens per request

# New monthly cost:
# 1.5M calls × 5,100 tokens = 7.65 billion tokens
# Cost: $206,550/month
# SAVINGS: $319,950/month = $3.8 MILLION/year

That's not a typo. $3.8 million saved per year.

The Multi-Skill Power Move

Here's where it gets spicy. You can combine up to 8 Skills per request:

# The old way: Everything in one bloated prompt
system_prompt = """
[Financial analysis: 5,000 tokens]
[Excel formatting: 2,000 tokens]
[PowerPoint templates: 2,500 tokens]
[Company branding: 1,500 tokens]
[Data validation: 2,000 tokens]
Total: 13,000 tokens every request
"""

# The Skills way: Load only what's needed
response = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    betas=["code-execution-2025-08-25", "skills-2025-10-02"],
    container={
        "skills": [
            {"type": "custom", "skill_id": "skill_financial", "version": "latest"},
            {"type": "anthropic", "skill_id": "xlsx", "version": "latest"},
            {"type": "anthropic", "skill_id": "pptx", "version": "latest"}
        ]
    },
    messages=[{
        "role": "user",
        "content": "Analyze Q4 revenue and create an executive presentation"
    }],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)

# What actually loads:
# - All Skill metadata: 300 tokens (3 × 100)
# - Financial Skill: 3,000 tokens (triggered)
# - Excel Skill: 2,000 tokens (triggered for data analysis)
# - PowerPoint Skill: 2,500 tokens (triggered for presentation)
# Total: 7,800 tokens

# But if user just asks "What was our Q4 revenue?":
# - All Skill metadata: 300 tokens
# - Financial Skill: 3,000 tokens
# - Excel Skill: NOT loaded (not needed)
# - PowerPoint Skill: NOT loaded (not needed)
# Total: 3,300 tokens
# Savings: 58% on this simpler query

Anthropic's Pre-Built Skills: Free Optimization

Anthropic provides pre-built Skills for common tasks:

# Use Anthropic's optimized Skills for document generation
response = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    betas=["code-execution-2025-08-25", "skills-2025-10-02"],
    container={
        "skills": [
            {"type": "anthropic", "skill_id": "xlsx", "version": "latest"},
            {"type": "anthropic", "skill_id": "pptx", "version": "latest"},
            {"type": "anthropic", "skill_id": "docx", "version": "latest"},
            {"type": "anthropic", "skill_id": "pdf", "version": "latest"}
        ]
    },
    messages=[{
        "role": "user",
        "content": "Create a quarterly report with data analysis and charts"
    }],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)

# These are maintained by Anthropic
# Zero effort, just better performance

Downloading Generated Files: The Right Way

When Skills create documents, you need to download them properly:

# Step 1: Create the document
response = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    betas=["code-execution-2025-08-25", "skills-2025-10-02"],
    container={
        "skills": [{"type": "anthropic", "skill_id": "xlsx", "version": "latest"}]
    },
    messages=[{
        "role": "user",
        "content": "Create Q4 revenue analysis spreadsheet"
    }],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)

# Step 2: Extract file IDs from response
def extract_file_ids(response):
    file_ids = []
    for item in response.content:
        if item.type == 'bash_code_execution_tool_result':
            content_item = item.content
            if content_item.type == 'bash_code_execution_result':
                for file in content_item.content:
                    if hasattr(file, 'file_id'):
                        file_ids.append(file.file_id)
    return file_ids

# Step 3: Download using Files API
for file_id in extract_file_ids(response):
    # Get metadata
    file_metadata = client.beta.files.retrieve_metadata(
        file_id=file_id,
        betas=["files-api-2025-04-14"]
    )

    # Download content
    file_content = client.beta.files.download(
        file_id=file_id,
        betas=["files-api-2025-04-14"]
    )

    # Save to disk
    file_content.write_to_file(file_metadata.filename)
    print(f"Downloaded: {file_metadata.filename}")

Version Control: Production vs Development

Smart teams version their Skills:

# Production: Pin to specific versions for stability
production_config = {
    "skills": [{
        "type": "custom",
        "skill_id": "skill_01AbCdEfGhIjKlMnOpQrStUv",
        "version": "1759178010641129"  # Specific version timestamp
    }]
}

# Development: Use latest for rapid iteration
development_config = {
    "skills": [{
        "type": "custom",
        "skill_id": "skill_01AbCdEfGhIjKlMnOpQrStUv",
        "version": "latest"  # Always get newest
    }]
}

# Update workflow:
# 1. Develop and test with "latest"
from anthropic.lib import files_from_dir

new_version = client.beta.skills.versions.create(
    skill_id="skill_01AbCdEfGhIjKlMnOpQrStUv",
    files=files_from_dir("/path/to/updated_skill"),
    betas=["skills-2025-10-02"]
)

# 2. Deploy to production with specific version
# 3. Monitor and rollback if needed
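One guardrail worth adding, sketched below (the APP_ENV variable and helper are my own convention, not part of the Skills API): resolve the version from your deploy environment so production can never silently track "latest".

import os

def skill_config(skill_id: str, pinned_version: str) -> dict:
    # Production resolves to the pinned version; everything else tracks latest
    env = os.environ.get("APP_ENV", "development")
    version = pinned_version if env == "production" else "latest"
    return {"type": "custom", "skill_id": skill_id, "version": version}

container = {"skills": [
    skill_config("skill_01AbCdEfGhIjKlMnOpQrStUv", "1759178010641129")
]}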

The Prompt Caching Gotcha

Skills work WITH prompt caching, but there's a trap:

# This caches well:
response1 = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    betas=[
        "code-execution-2025-08-25",
        "skills-2025-10-02",
        "prompt-caching-2024-07-31"
    ],
    container={
        "skills": [
            {"type": "anthropic", "skill_id": "xlsx", "version": "latest"}
        ]
    },
    messages=[{"role": "user", "content": "Analyze sales data"}],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)

# This breaks the cache (different Skills list):
response2 = client.beta.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    betas=[
        "code-execution-2025-08-25",
        "skills-2025-10-02",
        "prompt-caching-2024-07-31"
    ],
    container={
        "skills": [
            {"type": "anthropic", "skill_id": "xlsx", "version": "latest"},
            {"type": "anthropic", "skill_id": "pptx", "version": "latest"}  # Cache miss
        ]
    },
    messages=[{"role": "user", "content": "Create presentation"}],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)

# Solution: Keep Skills list consistent for related workflows
# Group your API calls by Skill combinations
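In practice that means routing requests through a fixed table of Skill combinations instead of assembling the list ad hoc per call. A minimal sketch (the workflow names and helper are hypothetical):

WORKFLOW_SKILLS = {
    # Same list, same order, for every call in a workflow family
    "spreadsheet_reports": [
        {"type": "anthropic", "skill_id": "xlsx", "version": "latest"},
    ],
    "board_decks": [
        {"type": "anthropic", "skill_id": "xlsx", "version": "latest"},
        {"type": "anthropic", "skill_id": "pptx", "version": "latest"},
    ],
}

def run_workflow(workflow: str, user_content: str):
    # Every call in a family shares one container config, so the
    # cached prefix stays valid across requests
    return client.beta.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=4096,
        betas=[
            "code-execution-2025-08-25",
            "skills-2025-10-02",
            "prompt-caching-2024-07-31"
        ],
        container={"skills": WORKFLOW_SKILLS[workflow]},
        messages=[{"role": "user", "content": user_content}],
        tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
    )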

The Migration Checklist

Ready to stop hemorrhaging money? Here's your path:

Week 1: Audit

# Calculate your current burn
def audit_current_costs():
    """
    1. Count your API calls per month
    2. Measure your average system prompt length
    3. Calculate: calls × prompt_tokens × $27/million
    4. Cry into your spreadsheet
    """
    monthly_calls = 100000  # Your number
    avg_system_tokens = 8000  # Measure yours

    monthly_input_tokens = monthly_calls * avg_system_tokens
    monthly_cost = (monthly_input_tokens / 1_000_000) * 27

    print(f"Current monthly burn: ${monthly_cost:,.2f}")
    print(f"Annual run rate: ${monthly_cost * 12:,.2f}")
    return monthly_cost
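With the placeholder numbers above, calling it quantifies the damage:

audit_current_costs()
# Current monthly burn: $21,600.00
# Annual run rate: $259,200.00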

Week 2: Extract Your First Skill

# Start with your most-used procedures
# Financial analysis, content generation, data validation, etc.

# Create skill directory:
"""
my-first-skill/
├── SKILL.md
└── README.md
"""

# SKILL.md template:
"""
---
name: my-company-analytics
description: Company-specific data analysis procedures. Use for financial analysis, revenue reporting, and KPI calculations.
---

# Analytics Skill

## Quick Start
For standard revenue analysis, use our PostgreSQL cluster.
Connection details in company vault.

## Common Tasks
- Revenue analysis: See [REVENUE.md](REVENUE.md)
- KPI dashboards: See [KPI.md](KPI.md)

## Data Sources
All company data sources documented in [SOURCES.md](SOURCES.md)
"""

Week 3: Upload and Test

from anthropic.lib import files_from_dir

# Upload
skill = client.beta.skills.create(
    display_title="Company Analytics",
    files=files_from_dir("/path/to/my-first-skill"),
    betas=["skills-2025-10-02"]
)

# Test with real queries
test_queries = [
    "Analyze Q4 revenue",
    "Generate monthly KPI report",
    "Compare revenue vs last quarter"
]

for query in test_queries:
    response = client.beta.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=4096,
        betas=["code-execution-2025-08-25", "skills-2025-10-02"],
        container={
            "skills": [{"type": "custom", "skill_id": skill.id, "version": "latest"}]
        },
        messages=[{"role": "user", "content": query}],
        tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
    )

    # Validate responses
    # Measure token usage
    # Calculate savings
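For the token-usage step, nothing fancy is needed: every Messages API response carries a usage object. A small helper you could call inside the loop:

def report_usage(query, response):
    # input_tokens is what you pay for on the request; watch it drop
    # as the Skill replaces your old mega-prompt
    print(f"{query!r}: {response.usage.input_tokens} input / "
          f"{response.usage.output_tokens} output tokens")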

Week 4: Measure and Scale

def measure_savings():
    """
    Track token usage before/after
    Calculate cost savings
    Project annual impact
    Send victory email to CFO
    """
    old_tokens_per_request = 8000
    new_tokens_per_request = 3500  # Your actual number

    savings_per_request = old_tokens_per_request - new_tokens_per_request
    monthly_calls = 100000

    monthly_savings = (monthly_calls * savings_per_request / 1_000_000) * 27
    annual_savings = monthly_savings * 12

    print(f"Monthly savings: ${monthly_savings:,.2f}")
    print(f"Annual savings: ${annual_savings:,.2f}")
    print(f"Efficiency gain: {(savings_per_request/old_tokens_per_request)*100:.1f}%")
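Run with the numbers above, it prints:

measure_savings()
# Monthly savings: $12,150.00
# Annual savings: $145,800.00
# Efficiency gain: 56.2%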

The Bottom Line

Skills are not a "nice-to-have" feature. They're a fundamental architectural shift in how you should build with LLMs.

Stop doing this:

  • Copying 10,000-token system prompts into every API call
  • Burning $500K+ annually on repeated context
  • Explaining the same procedures to Claude 100,000 times

Start doing this:

  • Load instructions on-demand (54-61% token reduction in the examples above)
  • Execute scripts at zero token cost
  • Compose Skills for complex workflows
  • Version your domain knowledge like code

The math is brutal: a Series A fintech making 1.5M API calls/month with bloated prompts burns $526K monthly. With Skills, that drops to $206K. That's $3.8M saved per year.

Your investors didn't give you millions to spend it on redundant context loading.

Fix your architecture. Deploy Skills. Save money.


P.S. - If you're still copying system prompts after reading this, I can't help you. But your competitors using Skills will eat your lunch while spending 60% less on AI costs.
