DEV Community

Julian Martinez
I Slashed My AI Trading Agent Token Costs by 80% — Here's the Architecture

ARE EXTRA SERVICES BURNING YOUR TOKENS?

I built an autonomous AI trading agent that runs 24/7, scanning hundreds of crypto and traditional finance markets, analyzing technical indicators, and executing trades when signals align.

It worked perfectly, until I checked the bill.

The Problem: Silent Token Bleeding

Every 60 seconds, the system called an expensive AI model to analyze potentially dozens of markets simultaneously. Even when markets were quiet or showing weak signals, it still ran full AI analysis.

| Metric | Before |
| --- | --- |
| AI calls/day | 7,200 |
| Daily token cost | $8-$52 |
| Monthly cost | $240-$600 |

I was paying premium AI prices for noise.


The Architecture (What Changed)

Before: Scan → Trigger → AI Research → Execute
                                ↓
                        Every trigger burns tokens

After: Scan → Trigger → TA Filter → AI Research → Execute
                          (cheap)      ↑
                              Only CONFIRMED signals

The key insight: use expensive AI as a last resort, not a first step.

Layer 1: Skill System Purge

My agent loaded 100+ skills on every turn: ASCII art generators, pixel art tools, Minecraft server managers, smart home controllers. None of them were relevant to autonomous trading.

Fix: Removed 16 unnecessary categories (80+ skills). The system prompt shrank by 50-60%, and each turn now saves more than 2,000 tokens.
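A minimal sketch of the idea, not Hermes Agent's actual skill API: it assumes each skill carries a category label and a prompt description, and that an allow-list decides what gets loaded. The `Skill` class, the category names, and the 4-characters-per-token estimate are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    category: str
    prompt_text: str  # description that would be injected into the system prompt


# Hypothetical allow-list: only categories a trading agent needs.
ALLOWED_CATEGORIES = frozenset({"trading", "market-data", "risk"})


def select_skills(skills, allowed=ALLOWED_CATEGORIES):
    """Keep only skills whose category is on the allow-list."""
    return [s for s in skills if s.category in allowed]


def estimate_tokens(text):
    """Rough heuristic (assumption): about 4 characters per token."""
    return len(text) // 4


def prompt_tokens(skills):
    """Estimated tokens the given skills add to every turn."""
    return sum(estimate_tokens(s.prompt_text) for s in skills)


all_skills = [
    Skill("place_order", "trading", "Place a market or limit order. " * 4),
    Skill("fetch_candles", "market-data", "Fetch OHLCV candles. " * 4),
    Skill("ascii_art", "creative", "Render text as ASCII art. " * 4),
    Skill("minecraft_admin", "gaming", "Manage a Minecraft server. " * 4),
]

active = select_skills(all_skills)
saved = prompt_tokens(all_skills) - prompt_tokens(active)
print([s.name for s in active], "tokens saved per turn:", saved)
```

An allow-list fails closed: a newly installed skill stays out of the prompt until someone decides the agent needs it.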

Layer 2: Pre-AI Technical Analysis Filter

I built a statistical pre-filter that runs multi-timeframe technical analysis:

  • EMA crossovers (1h/4h/1d)
  • RSI, ADX, ATR calculations
  • Volume confirmation
  • Trend alignment scoring

All pure computation. Zero AI cost. Only signals that score ≥65/100 are marked "CONFIRMED" and proceed to AI analysis.
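A simplified sketch of such a filter, under stated assumptions: it covers only EMA trend alignment, RSI, and volume confirmation (ADX and ATR are left out for brevity). The point weights, the 9/21 EMA periods, the 1.5x volume multiple, and the WEAK cutoff of 40 are illustrative choices. Only the ≥65 CONFIRMED threshold comes from the post.

```python
def ema(values, period):
    """Exponential moving average, seeded with the first value."""
    k = 2 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(v * k + out[-1] * (1 - k))
    return out


def rsi(closes, period=14):
    """RSI from simple sums of gains and losses over the last `period` changes."""
    window = closes[-(period + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if losses == 0:
        return 100.0 if gains > 0 else 50.0
    return 100 - 100 / (1 + gains / losses)


def trend_up(closes, fast=9, slow=21):
    """True when the fast EMA sits above the slow EMA on the latest bar."""
    return ema(closes, fast)[-1] > ema(closes, slow)[-1]


def score_long_signal(closes_by_tf, volumes):
    """Score a long setup from 0 to 100 (weights are assumptions)."""
    score = 0
    # Trend alignment: 20 points per timeframe in an uptrend (max 60).
    score += 20 * sum(trend_up(c) for c in closes_by_tf.values())
    # Momentum without being overbought on the 1h chart: 20 points.
    if 50 <= rsi(closes_by_tf["1h"]) <= 70:
        score += 20
    # Volume confirmation: last bar above 1.5x the prior 20-bar average: 20 points.
    recent = volumes[-21:-1]
    if volumes[-1] > 1.5 * (sum(recent) / len(recent)):
        score += 20
    return score


def classify(score, confirm_at=65, weak_at=40):
    """Map a score to CONFIRMED / WEAK / REJECTED."""
    if score >= confirm_at:
        return "CONFIRMED"
    return "WEAK" if score >= weak_at else "REJECTED"


# Synthetic data: a steady uptrend on every timeframe plus a volume spike.
rising = [100 + i for i in range(60)]
closes = {"1h": rising, "4h": rising, "1d": rising}
spike = [100] * 20 + [200]
print(classify(score_long_signal(closes, spike)))  # CONFIRMED
```

Everything here is plain arithmetic on candle data, so it can run on every scan without spending a token.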

Layer 3: Systematic Tuning

| Setting | Before | After |
| --- | --- | --- |
| Scan interval | 60s | 3min |
| Trigger threshold | 75 | 80 |
| Max AI per cycle | 5 | 2 |
| Max tokens/call | 2048 | 1024 |
| News fetch | Every call | Removed |
| System prompt | ~2,200 chars | ~1,645 chars |
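As a sketch of how the trigger threshold and the per-cycle cap could work together, assume each trigger carries a 0-100 score, that the threshold is inclusive, and that the strongest triggers win when more qualify than the cap allows. The function name and the tuple shape are hypothetical.

```python
def select_for_ai(signals, threshold=80, max_per_cycle=2):
    """Pick which triggers get an AI call this cycle.

    signals: list of (market, score) tuples.
    Keeps scores at or above the threshold, strongest first,
    and caps the result at max_per_cycle.
    """
    eligible = [s for s in signals if s[1] >= threshold]
    eligible.sort(key=lambda s: s[1], reverse=True)
    return eligible[:max_per_cycle]


cycle = [("BTC", 91), ("ETH", 79), ("SOL", 85), ("DOGE", 80)]
print(select_for_ai(cycle))  # [('BTC', 91), ('SOL', 85)]
```

The cap puts a hard ceiling on spend per cycle no matter how noisy the market gets.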

The Results

| Metric | Before | After | Reduction |
| --- | --- | --- | --- |
| AI calls/day | 7,200 | ~960 | 87% |
| Daily cost | $8-$52 | $3-$10 | 80%+ |
| System prompt | 2,200 chars | 1,645 chars | 25% |
| Monthly cost | $240-$600 | $90-$300 | $150-$300 saved |
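The call counts follow directly from the scan interval and the per-cycle cap. The quick check below assumes the cap is reached on every scan, which makes these figures upper bounds.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_ai_calls_per_day(scan_interval_s, max_ai_per_cycle):
    """Upper bound on daily AI calls if every scan hits the per-cycle cap."""
    cycles = SECONDS_PER_DAY // scan_interval_s
    return cycles * max_ai_per_cycle


before = max_ai_calls_per_day(60, 5)   # 1,440 cycles x 5 calls
after = max_ai_calls_per_day(180, 2)   # 480 cycles x 2 calls
reduction = 1 - after / before

print(before, after, f"{reduction:.1%}")  # prints: 7200 960 86.7%
```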

The Architecture That Emerged

The system now has five distinct layers, each reducing load for the next:

  1. Heartbeat (every 3 min) — scans 230+ markets
  2. Trigger Engine — fires statistical signals (price spikes, volume, breakouts)
  3. TA Filter — multi-TF analysis, scores signals CONFIRMED/WEAK/REJECTED
  4. AI Research (only CONFIRMED) — deep analysis with reasoning
  5. Risk Gates — 10 independent compliance checks before execution

The AI model is deployed only when statistical analysis has already validated the opportunity.
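The layering can be sketched as a single cycle function in which every stage is a plain callable. This is an illustration under assumptions, not the project's code: the function signature, the dict-shaped markets and decisions, and the single size-limit gate are all hypothetical stand-ins.

```python
def run_cycle(markets, trigger, ta_filter, ai_research, risk_gates, execute):
    """Run one heartbeat; return (number of AI calls, executed decisions)."""
    ai_calls = 0
    executed = []
    for market in markets:
        if not trigger(market):                 # Layer 2: cheap statistical trigger
            continue
        if ta_filter(market) != "CONFIRMED":    # Layer 3: TA filter, still no AI cost
            continue
        ai_calls += 1                           # Layer 4: the only paid step
        decision = ai_research(market)
        if decision is None:                    # the model may decline to trade
            continue
        if all(gate(decision) for gate in risk_gates):  # Layer 5: every gate must pass
            execute(decision)
            executed.append(decision)
    return ai_calls, executed


# Stub stages standing in for the real components.
markets = [
    {"name": "BTC", "spike": True, "ta": "CONFIRMED", "size": 50},
    {"name": "ETH", "spike": True, "ta": "WEAK", "size": 50},
    {"name": "SOL", "spike": False, "ta": "CONFIRMED", "size": 50},
    {"name": "XRP", "spike": True, "ta": "CONFIRMED", "size": 500},
]
orders = []
ai_calls, executed = run_cycle(
    markets,
    trigger=lambda m: m["spike"],
    ta_filter=lambda m: m["ta"],
    ai_research=lambda m: {"market": m["name"], "size": m["size"]},
    risk_gates=[lambda d: d["size"] <= 100],  # one example gate: position size limit
    execute=orders.append,
)
print(ai_calls, [d["market"] for d in executed])  # 2 ['BTC']
```

Four markets enter the loop, two reach the model, and one trade clears the risk gate.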

Key Lessons for Building AI Systems

  1. Never let AI compute what you can calculate — Technical indicators are math. Run them cheaply first.
  2. Every service loaded costs something — Don't load skills your agent doesn't need. They accumulate.
  3. Align frequency with signal timeframes — Trading 4-hour candles? Don't scan every 60 seconds.
  4. Use statistical thresholds before AI — Filter with math, reserve AI for nuance.
  5. Build defensive architecture — Each layer should reduce workload for the next.

Why This Matters

This isn't just about saving tokens. It demonstrates:

  • Systems thinking — Mapped entire architecture, identified bottlenecks systematically
  • Data-driven optimization — Measured before/after metrics with clear ROI
  • Real-world AI deployment — Running 24/7 AI systems in production
  • Engineering rigor — Layered architecture where each component has a specific purpose

Built with Hermes Agent, Next.js 16, Hyperliquid API, and OpenRouter. Source code on GitHub. I'm a Hermes Agent contributor. If your team builds AI systems and needs someone who knows where the hidden costs live, let's talk.

Top comments (6)

Vadym Arnaut

The pre-AI statistical filter is the layer most teams underuse. We
saw a similar 60-70% drop in another domain (auto-translation of user
content for our LMS) by not sending text to the model unless a content
hash had actually changed. Same shape -- cheap deterministic gate before
the expensive call. Two things that compounded for us: caching keyed
by (content_hash, source_lang, target_lang) instead of (entity_id,
locale), and bundling canonical artifacts (Bible verses in our case)
as substitution tokens so the model only translates the prose around
them, not the verbatim quotes. Wonder if you've explored a similar
cache layer for the AI analysis stage -- same signal triggering the
same prompt-shape should ideally hit a cache, not re-invoke.

Andrii Krugliak

The Layer 1 skill purge is the part nobody talks about. I spent a week running four agents and discovered I'd been shipping the full 100-skill manifest on every turn: the same problem in a totally different domain. Trimming to 12 skills cut my context burn by ~60% and the agents made fewer wrong-tool choices, which I didn't predict. Cheap pre-filter first, expensive reasoning last is a useful rule. The thing your post doesn't address: when the cheap TA filter is wrong, you lose the trade you would have won.

Harjot Singh

80% reduction is exactly the kind of result that shows up when you stop treating the agent as one monolithic premium call. Curious how much of your win came from each lever - in these architectures it's usually some mix of: caching the stable context instead of re-sending it, pruning what actually goes in the window, and routing cheap calls to a small model.

The trading-agent angle adds a nice twist: a lot of the loop is structured, repetitive classification/parsing that a small fast model nails for cents, and you only need the expensive reasoning model for the genuinely ambiguous decisions. That task-difficulty split is where the 80% almost always hides. Would love to see the actual model-routing breakdown if you do a follow-up - which calls you kept on the big model vs pushed down. Great architecture writeup.

VishnuRv

yup! this really helped me
