ARE EXTRA SERVICES BURNING YOUR TOKENS?
I built an autonomous AI trading agent that runs 24/7, scanning hundreds of crypto and traditional finance markets, analyzing technical indicators, and executing trades when signals align.
It worked perfectly, until I checked the bill.
The Problem: Silent Token Bleeding
Every 60 seconds, the system called an expensive AI model to analyze potentially dozens of markets simultaneously. Even when markets were quiet or showing weak signals, it still ran full AI analysis.
| Metric | Before |
|---|---|
| AI calls/day | 7,200 |
| Daily token cost | $8-$52 |
| Monthly cost | $240-$600 |
I was paying premium AI prices for noise.
The Architecture (What Changed)
Before: Scan → Trigger → AI Research → Execute (every trigger burns tokens)
After: Scan → Trigger → TA Filter (cheap) → AI Research (only CONFIRMED signals) → Execute
The key insight: use expensive AI as a last resort, not a first step.
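A minimal sketch of that ordering, with the cheap checks placed in front of the paid call. The `Signal` class, the `run_cycle` function, and the `ai_research` callback are hypothetical names used for illustration, not the agent's actual code; the thresholds of 80 and 65 are the tuned values quoted in this post.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Signal:
    market: str
    trigger_score: float  # from the cheap statistical trigger (0-100)
    ta_score: float       # from the technical-analysis filter (0-100)


def run_cycle(
    signals: Iterable[Signal],
    ai_research: Callable[[Signal], bool],
    trigger_threshold: float = 80,
    ta_threshold: float = 65,
) -> tuple[list[str], int]:
    """Return (markets approved for execution, number of AI calls made)."""
    approved: list[str] = []
    ai_calls = 0
    for s in signals:
        # Cheap gates first: both are plain arithmetic, no tokens spent.
        if s.trigger_score < trigger_threshold:
            continue
        if s.ta_score < ta_threshold:
            continue
        # Only signals that survive both gates reach the expensive model.
        ai_calls += 1
        if ai_research(s):
            approved.append(s.market)
    return approved, ai_calls
```

Returning the call count alongside the approvals makes it easy to log how many signals each cycle actually paid for.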
Layer 1: Skill System Purge
My agent loaded 100+ skills on every turn: ASCII art generators, pixel art tools, Minecraft server managers, smart home controllers. None of them were relevant to autonomous trading.
Fix: Removed 16 unnecessary categories (80+ skills). The system prompt shrank 50-60%, saving 2,000+ tokens per turn.
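One way to express the purge is a category allowlist applied before the system prompt is assembled. This is a sketch under assumptions: the `Skill` shape, the category names, and the four-characters-per-token estimate are illustrative and are not Hermes Agent's real skill API or tokenizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    category: str
    prompt_text: str  # text this skill contributes to the system prompt


# Hypothetical categories; the real ones depend on the agent's skill registry.
ALLOWED_CATEGORIES = frozenset({"trading", "market-data", "risk"})


def select_skills(skills, allowed=ALLOWED_CATEGORIES):
    """Keep only skills whose category is on the allowlist."""
    return [s for s in skills if s.category in allowed]


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); real tokenizers differ."""
    return len(text) // 4


def prompt_tokens(skills) -> int:
    """Approximate tokens the given skills add to every turn."""
    return sum(estimate_tokens(s.prompt_text) for s in skills)
```

Comparing `prompt_tokens(all_skills)` with `prompt_tokens(select_skills(all_skills))` gives a quick estimate of the per-turn saving before changing anything in production.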
Layer 2: Pre-AI Technical Analysis Filter
I built a statistical pre-filter that runs multi-timeframe technical analysis:
- EMA crossovers (1h/4h/1d)
- RSI, ADX, ATR calculations
- Volume confirmation
- Trend alignment scoring
All of it is pure computation with zero AI cost. Only signals that score ≥65/100 are marked CONFIRMED and proceed to AI analysis.
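A stripped-down sketch of such a filter for a long signal, in pure Python. The EMA periods (9/21), the RSI band, the 20-point weights, and the WEAK cutoff of 40 are assumptions made for illustration; only the 1h/4h/1d timeframes, the ≥65 CONFIRMED threshold, and the three labels come from the system described here. ADX and ATR are left out to keep the example short.

```python
def ema(values, period):
    """Exponential moving average, seeded with the first value."""
    alpha = 2 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def rsi(closes, period=14):
    """Wilder's RSI at the last close."""
    if len(closes) <= period:
        raise ValueError("need more than `period` closes")
    deltas = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
    if avg_loss == 0:
        return 100.0
    return 100 - 100 / (1 + avg_gain / avg_loss)


def score_long_signal(closes_by_tf, volumes, fast=9, slow=21):
    """Score a long setup from 0 to 100 using illustrative weights."""
    score = 0
    # Trend alignment: 20 points per timeframe with fast EMA above slow EMA.
    for tf in ("1h", "4h", "1d"):
        closes = closes_by_tf[tf]
        if ema(closes, fast)[-1] > ema(closes, slow)[-1]:
            score += 20
    # Momentum: 20 points if 1h RSI is bullish but not overbought.
    if 50 <= rsi(closes_by_tf["1h"]) <= 70:
        score += 20
    # Volume confirmation: 20 points if the last bar beats the prior average.
    prior = volumes[:-1]
    if volumes[-1] > sum(prior) / len(prior):
        score += 20
    return score


def classify(score, confirmed=65, weak=40):
    """Map a score to the CONFIRMED / WEAK / REJECTED labels."""
    if score >= confirmed:
        return "CONFIRMED"
    return "WEAK" if score >= weak else "REJECTED"
```

Everything here runs on candle data the agent has already fetched, so rejecting a signal costs CPU time only.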
Layer 3: Systematic Tuning
| Setting | Before | After |
|---|---|---|
| Scan interval | 60s | 3min |
| Trigger threshold | 75 | 80 |
| Max AI per cycle | 5 | 2 |
| Max tokens/call | 2048 | 1024 |
| News fetch | Every call | Removed |
| System prompt | ~2,200 chars | ~1,645 chars |
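Keeping these knobs in one immutable config object makes the before/after comparison explicit. The sketch below assumes a hypothetical `TuningConfig` class and a `pick_for_ai` helper that enforces the per-cycle cap by keeping the highest-scoring triggers; how the real agent chooses which signals to keep is not stated in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TuningConfig:
    scan_interval_s: int
    trigger_threshold: int
    max_ai_per_cycle: int
    max_tokens_per_call: int
    fetch_news: bool


# Values taken from the table above.
BEFORE = TuningConfig(scan_interval_s=60, trigger_threshold=75,
                      max_ai_per_cycle=5, max_tokens_per_call=2048,
                      fetch_news=True)
AFTER = TuningConfig(scan_interval_s=180, trigger_threshold=80,
                     max_ai_per_cycle=2, max_tokens_per_call=1024,
                     fetch_news=False)


def pick_for_ai(scored, cfg):
    """Given (market, trigger_score) pairs, return the markets to send to AI.

    Assumption: when more signals qualify than the cap allows,
    the highest trigger scores win.
    """
    eligible = [(m, s) for m, s in scored if s >= cfg.trigger_threshold]
    eligible.sort(key=lambda pair: pair[1], reverse=True)
    return [m for m, _ in eligible[:cfg.max_ai_per_cycle]]
```

For example, with scores of 92, 85, 81, and 78, the old settings would send all four markets to the model, while the new settings send only the top two.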
The Results
| Metric | Before | After | Reduction |
|---|---|---|---|
| AI calls/day | 7,200 | ~960 | 87% |
| Daily cost | $8-$52 | $3-$10 | 80%+ |
| System prompt | 2,200 chars | 1,645 chars | 25% |
| Monthly cost | $240-$600 | $90-$300 | $150-$300 saved |
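The call counts in the table follow directly from the scan interval and the per-cycle cap. The short calculation below reproduces them, assuming every cycle hits its cap, which makes both figures upper bounds.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_ai_calls_per_day(scan_interval_s: int, max_ai_per_cycle: int) -> int:
    """Upper bound on daily AI calls if every cycle uses its full cap."""
    cycles_per_day = SECONDS_PER_DAY // scan_interval_s
    return cycles_per_day * max_ai_per_cycle


before = max_ai_calls_per_day(60, 5)    # 1,440 cycles x 5 calls
after = max_ai_calls_per_day(180, 2)    # 480 cycles x 2 calls
reduction_pct = 100 * (before - after) / before

print(before, after, round(reduction_pct, 1))  # 7200 960 86.7
```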
The Architecture That Emerged
The system now has five distinct layers, each reducing load for the next:
- Heartbeat (every 3 min) — scans 230+ markets
- Trigger Engine — fires statistical signals (price spikes, volume, breakouts)
- TA Filter — multi-TF analysis, scores signals CONFIRMED/WEAK/REJECTED
- AI Research (only CONFIRMED) — deep analysis with reasoning
- Risk Gates — 10 independent compliance checks before execution
The AI model is deployed only when statistical analysis has already validated the opportunity.
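The final layer can be modeled as a set of independent predicates that must all pass before an order goes out. The three gates below are hypothetical examples; the ten checks the system actually runs are not listed in this post.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Order:
    market: str
    notional_usd: float
    leverage: float


Gate = Callable[[Order], bool]


def max_notional(limit_usd: float) -> Gate:
    """Gate that rejects orders larger than `limit_usd`."""
    return lambda o: o.notional_usd <= limit_usd


def max_leverage(limit: float) -> Gate:
    """Gate that rejects orders above a leverage limit."""
    return lambda o: o.leverage <= limit


def allowed_markets(markets: set[str]) -> Gate:
    """Gate that rejects orders on markets outside an allowlist."""
    return lambda o: o.market in markets


def failed_gates(order: Order, gates: dict[str, Gate]) -> list[str]:
    """Run every gate independently; an empty list means the order may execute."""
    return [name for name, gate in gates.items() if not gate(order)]


# Illustrative limits only.
GATES: dict[str, Gate] = {
    "max_notional": max_notional(1_000.0),
    "max_leverage": max_leverage(3.0),
    "allowed_markets": allowed_markets({"BTC", "ETH"}),
}
```

Evaluating every gate instead of stopping at the first failure means the log shows all the reasons an order was blocked, which helps when tuning limits.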
Key Lessons for Building AI Systems
- Never let AI compute what you can calculate — Technical indicators are math. Run them cheaply first.
- Every service loaded costs something — Don't load skills your agent doesn't need. They accumulate.
- Align frequency with signal timeframes — Trading 4-hour candles? Don't scan every 60 seconds.
- Use statistical thresholds before AI — Filter with math, reserve AI for nuance.
- Build defensive architecture — Each layer should reduce workload for the next.
Why This Matters
This isn't just about saving tokens. It demonstrates:
- Systems thinking — Mapped entire architecture, identified bottlenecks systematically
- Data-driven optimization — Measured before/after metrics with clear ROI
- Real-world AI deployment — Running 24/7 AI systems in production
- Engineering rigor — Layered architecture where each component has a specific purpose
Built with Hermes Agent, Next.js 16, Hyperliquid API, and OpenRouter. Source code on GitHub. I'm a Hermes Agent contributor. If your team builds AI systems and needs someone who knows where the hidden costs live, let's talk.