Let me guess: you're building a SaaS product with Claude. You've got customers making API calls, burning through tokens like a Tesla on fire. Your AWS bill looks like a phone number. And somewhere, a product manager is asking why your margins are trash.
I'll tell you why: you're shoving encyclopedias into every single API call when you could be carrying a library card.
The $10K/Month Mistake
Here's what I see in every production codebase I audit:
# Your current nightmare
system_prompt = """
You are DataCorp's financial analyst assistant.
BRAND GUIDELINES:
- Use active voice in all communications
- Keep sentences under 20 words
- Reference our Q4 2024 performance metrics
- Always cite from our approved data sources
... [2,000 more words]
FINANCIAL ANALYSIS PROCEDURES:
1. Always validate data sources
2. Calculate rolling 12-month averages
3. Compare to industry benchmarks from FinData Inc.
4. Generate variance reports
... [1,500 more words]
DATA SOURCES:
Our approved databases are:
- PostgreSQL cluster at db.datacorp.com
Schema: [500 words of table definitions]
- MongoDB analytics at analytics.datacorp.com
Collections: [400 words of collection schemas]
- S3 data lake structure: [600 words]
EXCEL FORMATTING STANDARDS:
- Header row always bold, 12pt Calibri
- Currency formatted to 2 decimals
- Charts use company colors: #1E3A8A, #3B82F6, #60A5FA
... [800 more words]
REPORT TEMPLATES:
[Monthly report template: 500 words]
[Quarterly analysis template: 700 words]
[Executive summary template: 400 words]
Now please analyze this quarter's revenue data.
"""
# Token cost per request: ~9,000 tokens
# Requests per month: 50,000
# Cost at Claude Sonnet 4: $27/million input tokens
# Your monthly burn: $12,150
# And that's JUST the system prompt
Every. Single. API. Call.
You're paying to load the same information 50,000 times a month. That's not engineering—that's setting money on fire while your CFO watches.
The Skill Solution: Pay Per Use, Not Per Prayer
Skills work on progressive disclosure. Claude loads them in three levels:
Level 1: Metadata (~100 tokens, ALWAYS loaded)
---
name: datacorp-financial-analysis
description: "Analyze financial data following DataCorp standards and procedures. Use for revenue analysis, variance reports, and executive summaries."
---
That's it. 100 tokens. Claude now knows this Skill exists.
Level 2: Instructions (Under 5k tokens, loaded WHEN TRIGGERED)
Only loaded when someone actually asks for financial analysis:
# DataCorp Financial Analysis Skill
## Quick Start
Use our PostgreSQL cluster for source data.
For schema details, see [SCHEMAS.md](SCHEMAS.md).
For Excel formatting, see [FORMATTING.md](FORMATTING.md).
## Analysis Procedures
1. Validate data sources
2. Calculate 12-month rolling averages
3. Compare to benchmarks
4. Generate variance reports
For detailed procedures, see [PROCEDURES.md](PROCEDURES.md).
Level 3: Resources (Unlimited, loaded AS NEEDED)
financial-analysis-skill/
├── SKILL.md # Loaded when Skill triggers
├── SCHEMAS.md # Only if Claude needs schema details
├── FORMATTING.md # Only if creating Excel files
├── PROCEDURES.md # Only if doing complex analysis
├── templates/
│ ├── monthly.md # Only if monthly report requested
│ ├── quarterly.md # Only if quarterly report requested
│ └── executive.md # Only if exec summary requested
└── scripts/
├── validate_data.py # Never enters context, just executes
└── benchmark.py # Never enters context, just executes
Now watch your token economics transform:
import anthropic
client = anthropic.Anthropic()
# One-time setup: Upload your Skill
from anthropic.lib import files_from_dir
skill = client.beta.skills.create(
display_title="DataCorp Financial Analysis",
files=files_from_dir("/path/to/financial_analysis_skill"),
betas=["skills-2025-10-02"]
)
# Now your API calls look like this:
response = client.beta.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=4096,
betas=["code-execution-2025-08-25", "skills-2025-10-02"],
container={
"skills": [{
"type": "custom",
"skill_id": skill.id,
"version": "latest"
}]
},
messages=[{
"role": "user",
"content": "Analyze Q4 revenue vs Q3"
}],
tools=[{
"type": "code_execution_20250825",
"name": "code_execution"
}]
)
# Token cost breakdown:
# - Skill metadata: 100 tokens (always)
# - SKILL.md: 2,000 tokens (when financial analysis triggered)
# - SCHEMAS.md: 500 tokens (if needed)
# - PROCEDURES.md: 1,500 tokens (if needed)
# Total: 4,100 tokens vs 9,000 tokens
# Savings: 54% on this request
# But wait, it gets better...
The Script Multiplier: Zero-Token Execution
Here's where your accountant starts crying tears of joy:
Executable scripts never enter the context window.
When Claude runs a script, only the OUTPUT consumes tokens. The code itself? Zero.
# scripts/validate_revenue_data.py
import pandas as pd
import json
import sys
def validate_revenue_data(filepath):
"""
Comprehensive revenue data validation.
50 lines of sophisticated logic that you'd normally
need to explain to Claude in excruciating detail.
"""
df = pd.read_csv(filepath)
issues = []
# Check for negative revenue (red flag)
if (df['revenue'] < 0).any():
negative_count = (df['revenue'] < 0).sum()
issues.append(f"CRITICAL: {negative_count} negative revenue entries")
# Validate date continuity
df['date'] = pd.to_datetime(df['date'])
date_gaps = df['date'].diff().dt.days
if (date_gaps > 1).any():
issues.append(f"WARNING: Date gaps detected in data")
# Check for outliers (3 sigma rule)
mean_rev = df['revenue'].mean()
std_rev = df['revenue'].std()
outliers = df[abs(df['revenue'] - mean_rev) > 3 * std_rev]
if len(outliers) > 0:
issues.append(f"INFO: {len(outliers)} statistical outliers detected")
# Validate currency consistency
if df['currency'].nunique() > 1:
issues.append("ERROR: Multiple currencies in dataset")
# Check completeness
missing_data = df.isnull().sum()
if missing_data.any():
issues.append(f"WARNING: Missing data in columns: {missing_data[missing_data > 0].to_dict()}")
return {
'valid': len([i for i in issues if 'ERROR' in i or 'CRITICAL' in i]) == 0,
'issues': issues,
'records': len(df),
'date_range': f"{df['date'].min()} to {df['date'].max()}",
'total_revenue': float(df['revenue'].sum()),
'currency': df['currency'].iloc[0]
}
if __name__ == "__main__":
result = validate_revenue_data(sys.argv[1])
print(json.dumps(result, indent=2))
Claude runs: python scripts/validate_revenue_data.py q4_revenue.csv
Claude gets back:
{
"valid": true,
"issues": ["INFO: 2 statistical outliers detected"],
"records": 1247,
"date_range": "2024-10-01 to 2024-12-31",
"total_revenue": 1847293.42,
"currency": "USD"
}
Token cost of that 50-line validation script: ZERO
If you made Claude write that logic from scratch every time? 800+ tokens. Per request. Forever.
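To put a number on that, here's a back-of-the-envelope calculation using the same request volume and input rate as the earlier example:
# Rough cost of re-explaining that validation logic on every call,
# at the request volume and input rate used above in this article
regenerated_tokens = 800       # tokens of logic/explanation per request
monthly_calls = 50_000
monthly_cost = (regenerated_tokens * monthly_calls / 1_000_000) * 27
print(f"${monthly_cost:,.2f}/month to re-explain one script")  # ~$1,080
That's over a thousand dollars a month to keep restating one helper that a Skill ships once as a file.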
Real Numbers From Real Production Systems
Let me show you what this looks like at scale.
Case Study: MarketingCo (B2B SaaS, 200 employees)
Before Skills:
# Every request included:
system_prompt = """
[Brand guidelines: 2,000 tokens]
[SEO requirements: 1,500 tokens]
[Product catalog: 3,000 tokens]
[Writing templates: 2,500 tokens]
[Competitor positioning: 1,000 tokens]
Total: 10,000 tokens per request
"""
# Monthly usage:
# - 100,000 API calls
# - 10,000 tokens per call = 1 billion input tokens
# - Claude Sonnet 4: $27/million tokens
# - Monthly cost: $27,000 (just system prompts!)
After Skills:
# Marketing Skill structure:
marketing-skill/
├── SKILL.md (2k tokens)
├── BRAND.md (loaded if brand voice needed)
├── SEO.md (loaded if SEO optimization needed)
├── products/
│ ├── product_a.json (loaded per product)
│ ├── product_b.json
│ └── ...
├── templates/ (loaded per content type)
└── scripts/
└── check_seo.py (zero tokens, just runs)
# Typical blog post request:
# - Skill metadata: 100 tokens (always)
# - SKILL.md: 2,000 tokens (triggered)
# - SEO.md: 1,500 tokens (needed for blog)
# - templates/blog.md: 600 tokens (needed)
# - One product file: 200 tokens (needed)
# Total: 4,400 tokens per request
# New monthly cost:
# - 100,000 API calls
# - 4,400 tokens per call = 440 million input tokens
# - Monthly cost: $11,880
# SAVINGS: $15,120/month = $181,440/year
Case Study: FinTechStartup (Series A, Tight Margins)
The Setup:
- Building AI-powered financial advisory tool
- 10,000 users, averaging 5 queries/day
- 50,000 API calls/day = 1.5M/month
Before Skills:
# System prompt included:
# - Compliance guidelines: 3,000 tokens
# - Investment strategies: 4,000 tokens
# - Risk assessment procedures: 2,000 tokens
# - Market data schemas: 2,500 tokens
# - Report formatting: 1,500 tokens
# Total: 13,000 tokens per request
# Monthly burn:
# 1.5M calls × 13,000 tokens = 19.5 billion tokens
# Cost: $526,500/month on input tokens alone
# Annual run rate: $6.3 MILLION
After Skills:
# Investment advisory Skill:
investment-skill/
├── SKILL.md (3k tokens)
├── COMPLIANCE.md (only if compliance check needed)
├── STRATEGIES.md (only if strategy analysis needed)
├── RISK_MODELS.md (only if risk assessment needed)
├── schemas/
│ └── market_data.sql (only if schema needed)
├── templates/
│ └── reports/ (only if report generation needed)
└── scripts/
├── risk_calculator.py (zero tokens)
├── compliance_check.py (zero tokens)
└── portfolio_optimizer.py (zero tokens)
# Average request now:
# - Skill metadata: 100 tokens
# - SKILL.md: 3,000 tokens
# - One additional file: 2,000 tokens (average)
# - Script execution: 0 tokens
# Total: 5,100 tokens per request
# New monthly cost:
# 1.5M calls × 5,100 tokens = 7.65 billion tokens
# Cost: $206,550/month
# SAVINGS: $319,950/month = $3.8 MILLION/year
That's not a typo. $3.8 million saved per year.
The Multi-Skill Power Move
Here's where it gets spicy. You can combine up to 8 Skills per request:
# The old way: Everything in one bloated prompt
system_prompt = """
[Financial analysis: 5,000 tokens]
[Excel formatting: 2,000 tokens]
[PowerPoint templates: 2,500 tokens]
[Company branding: 1,500 tokens]
[Data validation: 2,000 tokens]
Total: 13,000 tokens every request
"""
# The Skills way: Load only what's needed
response = client.beta.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=4096,
betas=["code-execution-2025-08-25", "skills-2025-10-02"],
container={
"skills": [
{"type": "custom", "skill_id": "skill_financial", "version": "latest"},
{"type": "anthropic", "skill_id": "xlsx", "version": "latest"},
{"type": "anthropic", "skill_id": "pptx", "version": "latest"}
]
},
messages=[{
"role": "user",
"content": "Analyze Q4 revenue and create an executive presentation"
}],
tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)
# What actually loads:
# - All Skill metadata: 300 tokens (3 × 100)
# - Financial Skill: 3,000 tokens (triggered)
# - Excel Skill: 2,000 tokens (triggered for data analysis)
# - PowerPoint Skill: 2,500 tokens (triggered for presentation)
# Total: 7,800 tokens
# But if user just asks "What was our Q4 revenue?":
# - All Skill metadata: 300 tokens
# - Financial Skill: 3,000 tokens
# - Excel Skill: NOT loaded (not needed)
# - PowerPoint Skill: NOT loaded (not needed)
# Total: 3,300 tokens
# Savings: 58% on this simpler query
Anthropic's Pre-Built Skills: Free Optimization
Anthropic provides pre-built Skills for common tasks:
# Use Anthropic's optimized Skills for document generation
response = client.beta.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=4096,
betas=["code-execution-2025-08-25", "skills-2025-10-02"],
container={
"skills": [
{"type": "anthropic", "skill_id": "xlsx", "version": "latest"},
{"type": "anthropic", "skill_id": "pptx", "version": "latest"},
{"type": "anthropic", "skill_id": "docx", "version": "latest"},
{"type": "anthropic", "skill_id": "pdf", "version": "latest"}
]
},
messages=[{
"role": "user",
"content": "Create a quarterly report with data analysis and charts"
}],
tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)
# These are maintained by Anthropic
# Zero effort, just better performance
Downloading Generated Files: The Right Way
When Skills create documents, you need to download them properly:
# Step 1: Create the document
response = client.beta.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=4096,
betas=["code-execution-2025-08-25", "skills-2025-10-02"],
container={
"skills": [{"type": "anthropic", "skill_id": "xlsx", "version": "latest"}]
},
messages=[{
"role": "user",
"content": "Create Q4 revenue analysis spreadsheet"
}],
tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)
# Step 2: Extract file IDs from response
def extract_file_ids(response):
file_ids = []
for item in response.content:
if item.type == 'bash_code_execution_tool_result':
content_item = item.content
if content_item.type == 'bash_code_execution_result':
for file in content_item.content:
if hasattr(file, 'file_id'):
file_ids.append(file.file_id)
return file_ids
# Step 3: Download using Files API
for file_id in extract_file_ids(response):
# Get metadata
file_metadata = client.beta.files.retrieve_metadata(
file_id=file_id,
betas=["files-api-2025-04-14"]
)
# Download content
file_content = client.beta.files.download(
file_id=file_id,
betas=["files-api-2025-04-14"]
)
# Save to disk
file_content.write_to_file(file_metadata.filename)
print(f"Downloaded: {file_metadata.filename}")
Version Control: Production vs Development
Smart teams version their Skills:
# Production: Pin to specific versions for stability
production_config = {
"skills": [{
"type": "custom",
"skill_id": "skill_01AbCdEfGhIjKlMnOpQrStUv",
"version": "1759178010641129" # Specific version timestamp
}]
}
# Development: Use latest for rapid iteration
development_config = {
"skills": [{
"type": "custom",
"skill_id": "skill_01AbCdEfGhIjKlMnOpQrStUv",
"version": "latest" # Always get newest
}]
}
# Update workflow:
# 1. Develop and test with "latest"
from anthropic.lib import files_from_dir
new_version = client.beta.skills.versions.create(
skill_id="skill_01AbCdEfGhIjKlMnOpQrStUv",
files=files_from_dir("/path/to/updated_skill"),
betas=["skills-2025-10-02"]
)
# 2. Deploy to production with specific version
# 3. Monitor and rollback if needed
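For steps 2 and 3, promotion and rollback can be as simple as swapping the pinned version string. A minimal sketch; it assumes the create response exposes the new version identifier as new_version.version:
# Sketch: promote the tested version to production, keep the old one for rollback
previous_version = production_config["skills"][0]["version"]

# Promote: pin production to the version you just validated with "latest"
production_config["skills"][0]["version"] = new_version.version  # assumed attribute

# Rollback: re-pin the previous version if monitoring flags a regression
def rollback():
    production_config["skills"][0]["version"] = previous_version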
The Prompt Caching Gotcha
Skills work WITH prompt caching, but there's a trap:
# This caches well:
response1 = client.beta.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=4096,
betas=[
"code-execution-2025-08-25",
"skills-2025-10-02",
"prompt-caching-2024-07-31"
],
container={
"skills": [
{"type": "anthropic", "skill_id": "xlsx", "version": "latest"}
]
},
messages=[{"role": "user", "content": "Analyze sales data"}],
tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)
# This breaks the cache (different Skills list):
response2 = client.beta.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=4096,
betas=[
"code-execution-2025-08-25",
"skills-2025-10-02",
"prompt-caching-2024-07-31"
],
container={
"skills": [
{"type": "anthropic", "skill_id": "xlsx", "version": "latest"},
{"type": "anthropic", "skill_id": "pptx", "version": "latest"} # Cache miss
]
},
messages=[{"role": "user", "content": "Create presentation"}],
tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)
# Solution: Keep Skills list consistent for related workflows
# Group your API calls by Skill combinations
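One practical way to keep the list consistent is to define a fixed, ordered Skill bundle per workflow and reuse it verbatim. A sketch (the bundle names and helper are mine, not part of the API):
# Hypothetical Skill bundles: one fixed, ordered list per workflow type.
# Reusing the exact same list keeps the cached prefix stable across calls.
SKILL_BUNDLES = {
    "spreadsheet_analysis": [
        {"type": "anthropic", "skill_id": "xlsx", "version": "latest"}
    ],
    "exec_reporting": [
        {"type": "anthropic", "skill_id": "xlsx", "version": "latest"},
        {"type": "anthropic", "skill_id": "pptx", "version": "latest"}
    ],
}

def run_workflow(workflow, user_prompt):
    # Same bundle -> same container block -> better cache hit rate
    return client.beta.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=4096,
        betas=["code-execution-2025-08-25", "skills-2025-10-02", "prompt-caching-2024-07-31"],
        container={"skills": SKILL_BUNDLES[workflow]},
        messages=[{"role": "user", "content": user_prompt}],
        tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
    )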
The Migration Checklist
Ready to stop hemorrhaging money? Here's your path:
Week 1: Audit
# Calculate your current burn
def audit_current_costs():
"""
1. Count your API calls per month
2. Measure your average system prompt length
3. Calculate: calls × prompt_tokens × $27/million
4. Cry into your spreadsheet
"""
monthly_calls = 100000 # Your number
avg_system_tokens = 8000 # Measure yours
monthly_input_tokens = monthly_calls * avg_system_tokens
monthly_cost = (monthly_input_tokens / 1_000_000) * 27
print(f"Current monthly burn: ${monthly_cost:,.2f}")
print(f"Annual run rate: ${monthly_cost * 12:,.2f}")
return monthly_cost
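If you'd rather measure than estimate, the token counting endpoint will tell you exactly what your current system prompt costs per request. A quick sketch, assuming your existing mega-prompt lives in system_prompt:
# Count the real input tokens of your current setup with the token counting endpoint
count = client.messages.count_tokens(
    model="claude-sonnet-4-5-20250929",
    system=system_prompt,  # your existing mega-prompt
    messages=[{"role": "user", "content": "Analyze Q4 revenue vs Q3"}]
)
print(f"Input tokens per request: {count.input_tokens:,}")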
Week 2: Extract Your First Skill
# Start with your most-used procedures
# Financial analysis, content generation, data validation, etc.
# Create skill directory:
"""
my-first-skill/
├── SKILL.md
└── README.md
"""
# SKILL.md template:
"""
---
name: my-company-analytics
description: Company-specific data analysis procedures. Use for financial analysis, revenue reporting, and KPI calculations.
---
# Analytics Skill
## Quick Start
For standard revenue analysis, use our PostgreSQL cluster.
Connection details in company vault.
## Common Tasks
- Revenue analysis: See [REVENUE.md](REVENUE.md)
- KPI dashboards: See [KPI.md](KPI.md)
## Data Sources
All company data sources documented in [SOURCES.md](SOURCES.md)
"""
Week 3: Upload and Test
from anthropic.lib import files_from_dir
# Upload
skill = client.beta.skills.create(
display_title="Company Analytics",
files=files_from_dir("/path/to/my-first-skill"),
betas=["skills-2025-10-02"]
)
# Test with real queries
test_queries = [
"Analyze Q4 revenue",
"Generate monthly KPI report",
"Compare revenue vs last quarter"
]
for query in test_queries:
response = client.beta.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=4096,
betas=["code-execution-2025-08-25", "skills-2025-10-02"],
container={
"skills": [{"type": "custom", "skill_id": skill.id, "version": "latest"}]
},
messages=[{"role": "user", "content": query}],
tools=[{"type": "code_execution_20250825", "name": "code_execution"}]
)
# Validate responses
# Measure token usage
# Calculate savings
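For the measurement step, you don't have to guess: every Messages API response carries a usage block with the real counts. A small helper you can call inside the loop above:
def log_usage(query, response):
    # Each response reports the actual input/output token counts
    usage = response.usage
    print(f"{query}: {usage.input_tokens:,} input / {usage.output_tokens:,} output tokens")
Compare those input counts against your Week 1 baseline to get the real per-request savings.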
Week 4: Measure and Scale
def measure_savings():
"""
Track token usage before/after
Calculate cost savings
Project annual impact
Send victory email to CFO
"""
old_tokens_per_request = 8000
new_tokens_per_request = 3500 # Your actual number
savings_per_request = old_tokens_per_request - new_tokens_per_request
monthly_calls = 100000
monthly_savings = (monthly_calls * savings_per_request / 1_000_000) * 27
annual_savings = monthly_savings * 12
print(f"Monthly savings: ${monthly_savings:,.2f}")
print(f"Annual savings: ${annual_savings:,.2f}")
print(f"Efficiency gain: {(savings_per_request/old_tokens_per_request)*100:.1f}%")
The Bottom Line
Skills are not a "nice-to-have" feature. They're a fundamental architectural shift in how you should build with LLMs.
Stop doing this:
- Copying 10,000-token system prompts into every API call
- Burning $500K+ annually on repeated context
- Explaining the same procedures to Claude 100,000 times
Start doing this:
- Load instructions on-demand (54-75% token reduction)
- Execute scripts at zero token cost
- Compose Skills for complex workflows
- Version your domain knowledge like code
The math is brutal: the FinTech example above, at 1.5M API calls/month with bloated prompts, burns $526K monthly. With Skills, that drops to $206K. That's $3.8M saved per year.
Your investors didn't give you millions to spend on redundant context loading.
Fix your architecture. Deploy Skills. Save money.
P.S. - If you're still copying system prompts after reading this, I can't help you. But your competitors using Skills will eat your lunch while spending 60% less on AI costs.