Julian Martinez

Posted on May 12

I Slashed My AI Trading Agent Token Costs by 80% — Here's the Architecture

#hermes #ai #automation #opensource

ARE EXTRA SERVICES BURNING YOUR TOKENS?

I built an autonomous AI trading agent that runs 24/7, scanning hundreds of crypto and traditional finance markets, analyzing technical indicators, and executing trades when signals align.

It worked perfectly until I checked the bill.

The Problem: Silent Token Bleeding

Every 60 seconds, the system called an expensive AI model to analyze potentially dozens of markets simultaneously. Even when markets were quiet or showing weak signals, it still ran full AI analysis.

Metric	Before
AI calls/day	7,200
Daily token cost	$8-$52
Monthly cost	$240-$600

I was paying premium AI prices for noise.

The Architecture (What Changed)

Before: Scan → Trigger → AI Research → Execute
                                ↓
                        Every trigger burns tokens

After: Scan → Trigger → TA Filter → AI Research → Execute
                          (cheap)      ↑
                              Only CONFIRMED signals

The key insight: use expensive AI as a last resort, not a first step.

Layer 1: Skill System Purge

My agent loaded 100+ skills on every turn — ASCII art generators, pixel art tools, Minecraft server managers, smart home controllers. None of them relevant to autonomous trading.

Fix: Removed 16 unnecessary categories (80+ skills). The system prompt shrank 50-60%, saving 2,000+ tokens per turn.
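A minimal sketch of the idea, assuming skills carry a category tag and a block of prompt text. The `Skill` class, the category names, and the 4-characters-per-token estimate are illustrative assumptions, not Hermes Agent's actual skill format:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    category: str      # hypothetical category tag
    prompt_text: str   # text injected into the system prompt when loaded


def select_skills(skills, allowed_categories):
    """Keep only skills whose category is on the allowlist."""
    return [s for s in skills if s.category in allowed_categories]


def estimate_tokens(skills):
    """Rough estimate using ~4 characters per token (a rule of thumb, not exact)."""
    return sum(len(s.prompt_text) for s in skills) // 4


# Placeholder manifest: every skill contributes 400 characters of prompt text.
ALL_SKILLS = [
    Skill("ascii_art", "creative", "x" * 400),
    Skill("minecraft_admin", "gaming", "x" * 400),
    Skill("smart_home", "iot", "x" * 400),
    Skill("market_scan", "trading", "x" * 400),
    Skill("order_exec", "trading", "x" * 400),
]

kept = select_skills(ALL_SKILLS, {"trading"})
saved_per_turn = estimate_tokens(ALL_SKILLS) - estimate_tokens(kept)
```

An allowlist is used here rather than a blocklist so that newly installed skills stay out of the prompt until someone opts them in.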

Layer 2: Pre-AI Technical Analysis Filter

Built a statistical pre-filter that runs multi-timeframe technical analysis:

EMA crossovers (1h/4h/1d)
RSI, ADX, ATR calculations
Volume confirmation
Trend alignment scoring

All pure computation. Zero AI cost. Only signals that score ≥65/100 are marked "CONFIRMED" and proceed to AI analysis.
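To make the filter concrete, here is a simplified sketch covering EMA crossovers, RSI, and volume confirmation (ADX and ATR are left out for brevity). Only the 65-point cutoff comes from this post. The 9/21 EMA periods, the 20-point weights, the RSI band, the 1.5x volume multiple, the 40-point WEAK cutoff, and the long-only bias are all assumptions for illustration:

```python
CONFIRMED_THRESHOLD = 65  # from the post
WEAK_THRESHOLD = 40       # assumption: the post does not give this cutoff


def ema(values, period):
    """Exponential moving average, seeded with the first value."""
    alpha = 2 / (period + 1)
    out = values[0]
    for v in values[1:]:
        out = alpha * v + (1 - alpha) * out
    return out


def rsi(closes, period=14):
    """Simple-average RSI over the last `period` changes (not Wilder's smoothing)."""
    deltas = [b - a for a, b in zip(closes[:-1], closes[1:])][-period:]
    gains = sum(d for d in deltas if d > 0)
    losses = sum(-d for d in deltas if d < 0)
    if gains == 0 and losses == 0:
        return 50.0   # no movement at all: treat as neutral
    if losses == 0:
        return 100.0  # only gains in the window
    return 100 - 100 / (1 + gains / losses)


def score_signal(closes_by_tf, volumes):
    """Score a long signal out of 100 using illustrative weights."""
    score = 0
    # Trend alignment: 20 points per timeframe where the fast EMA is above the slow EMA.
    for tf in ("1h", "4h", "1d"):
        closes = closes_by_tf[tf]
        if ema(closes, 9) > ema(closes, 21):
            score += 20
    # Momentum: 20 points if the 1h RSI is bullish but not overbought.
    if 50 < rsi(closes_by_tf["1h"]) < 70:
        score += 20
    # Volume confirmation: 20 points if the latest bar exceeds 1.5x the prior average.
    baseline = sum(volumes[:-1]) / len(volumes[:-1])
    if volumes[-1] > 1.5 * baseline:
        score += 20
    return score


def classify(score):
    if score >= CONFIRMED_THRESHOLD:
        return "CONFIRMED"
    if score >= WEAK_THRESHOLD:
        return "WEAK"
    return "REJECTED"


# Synthetic steady uptrend with a volume spike on the last bar.
rising = [100.0 + i for i in range(60)]
example_score = score_signal(
    {"1h": rising, "4h": rising, "1d": rising},
    [10.0] * 20 + [30.0],
)
example_label = classify(example_score)
```

Nothing here touches a model, so it can run on every trigger at effectively no cost.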

Layer 3: Systematic Tuning

Setting	Before	After
Scan interval	60s	3min
Trigger threshold	75	80
Max AI per cycle	5	2
Max tokens/call	2048	1024
News fetch	Every call	Removed
System prompt	~2,200 chars	~1,645 chars
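These settings could be captured in a small config object, sketched below with the values from the table. The field names and the `pick_for_ai` helper are hypothetical, and treating the threshold as inclusive is an assumption:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    scan_interval_s: int
    trigger_threshold: int
    max_ai_per_cycle: int
    max_tokens_per_call: int
    fetch_news: bool


BEFORE = AgentConfig(60, 75, 5, 2048, True)
AFTER = AgentConfig(180, 80, 2, 1024, False)


def pick_for_ai(trigger_scores, cfg):
    """Markets to send to the model this cycle: at or above threshold, strongest first, capped."""
    eligible = [(m, s) for m, s in trigger_scores.items() if s >= cfg.trigger_threshold]
    eligible.sort(key=lambda pair: pair[1], reverse=True)
    return [market for market, _ in eligible[: cfg.max_ai_per_cycle]]


# Hypothetical trigger scores for one scan cycle.
scores = {"BTC": 92, "ETH": 78, "SOL": 85, "DOGE": 81, "XRP": 60}
picked_before = pick_for_ai(scores, BEFORE)  # four markets clear 75
picked_after = pick_for_ai(scores, AFTER)    # only the top two at or above 80
```

Ranking before capping means the per-cycle limit drops the weakest candidates, not arbitrary ones.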

The Results

Metric	Before	After	Reduction
AI calls/day	7,200	~960	87%
Daily cost	$8-$52	$3-$10	80%+
System prompt	2,200 chars	1,645 chars	25%
Monthly cost	$240-$600	$90-$300	$150-$300 saved
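The call counts line up with the ceiling implied by the scan interval and the per-cycle cap, if they are read as worst-case figures where every cycle uses its full AI budget. A quick check of that arithmetic (the function name is mine):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_ai_calls_per_day(scan_interval_s, max_ai_per_cycle):
    """Upper bound on daily AI calls if every cycle hits its cap."""
    cycles_per_day = SECONDS_PER_DAY // scan_interval_s
    return cycles_per_day * max_ai_per_cycle


before = max_ai_calls_per_day(60, 5)    # 1,440 cycles x 5 calls
after = max_ai_calls_per_day(180, 2)    # 480 cycles x 2 calls
reduction_pct = round((1 - after / before) * 100)
```

Actual usage should land below these ceilings, since the TA filter rejects most triggers before they reach the model.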

The Architecture That Emerged

The system now has five distinct layers, each reducing load for the next:

Heartbeat (every 3 min) — scans 230+ markets
Trigger Engine — fires statistical signals (price spikes, volume, breakouts)
TA Filter — multi-TF analysis, scores signals CONFIRMED/WEAK/REJECTED
AI Research (only CONFIRMED) — deep analysis with reasoning
Risk Gates — 10 independent compliance checks before execution

The AI model is deployed only when statistical analysis has already validated the opportunity.
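The layering can be sketched as a single loop in which each stage sees only what the previous one let through. Every callable and field name below is a hypothetical stand-in, not the project's actual interface:

```python
def all_gates_pass(decision, gates):
    """Risk gates are independent checks; every one must approve."""
    return all(gate(decision) for gate in gates)


def run_cycle(markets, fires_trigger, ta_label, ai_research, gates, execute):
    """One heartbeat: scan, trigger, TA filter, AI research, risk gates, execute."""
    stats = {"scanned": 0, "triggered": 0, "confirmed": 0, "ai_calls": 0, "executed": 0}
    for market in markets:
        stats["scanned"] += 1
        if not fires_trigger(market):
            continue
        stats["triggered"] += 1
        if ta_label(market) != "CONFIRMED":
            continue
        stats["confirmed"] += 1
        decision = ai_research(market)  # the only step that costs tokens
        stats["ai_calls"] += 1
        if decision is not None and all_gates_pass(decision, gates):
            execute(decision)
            stats["executed"] += 1
    return stats


# Stubbed stages and placeholder data, just to show the flow.
markets = [
    {"symbol": "BTC", "spike": True, "ta": "CONFIRMED"},
    {"symbol": "ETH", "spike": True, "ta": "WEAK"},
    {"symbol": "SOL", "spike": False, "ta": "CONFIRMED"},
]
executed = []
stats = run_cycle(
    markets,
    fires_trigger=lambda m: m["spike"],
    ta_label=lambda m: m["ta"],
    ai_research=lambda m: {"symbol": m["symbol"], "side": "buy"},
    gates=[lambda d: d["side"] in ("buy", "sell")],
    execute=executed.append,
)
```

In this toy run, three markets are scanned but only one reaches the model. That ratio is the whole point of the design.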

Key Lessons for Building AI Systems

Never let AI compute what you can calculate — Technical indicators are math. Run them cheaply first.
Every service loaded costs something — Don't load skills your agent doesn't need. They accumulate.
Align frequency with signal timeframes — Trading 4-hour candles? Don't scan every 60 seconds.
Use statistical thresholds before AI — Filter with math, reserve AI for nuance.
Build defensive architecture — Each layer should reduce workload for the next.

Why This Matters

This isn't just about saving tokens. It demonstrates:

Systems thinking — Mapped entire architecture, identified bottlenecks systematically
Data-driven optimization — Measured before/after metrics with clear ROI
Real-world AI deployment — Running 24/7 AI systems in production
Engineering rigor — Layered architecture where each component has a specific purpose

Built with Hermes Agent, Next.js 16, Hyperliquid API, and OpenRouter. Source code on GitHub. I'm a Hermes Agent contributor. If your team builds AI systems and needs someone who knows where the hidden costs live, let's talk.

Top comments (6)

Vadym Arnaut • May 13

The pre-AI statistical filter is the layer most teams underuse. We saw a similar 60-70% drop in another domain (auto-translation of user content for our LMS) by not sending text to the model unless a content hash had actually changed. Same shape -- cheap deterministic gate before the expensive call. Two things that compounded for us: caching keyed by (content_hash, source_lang, target_lang) instead of (entity_id, locale), and bundling canonical artifacts (Bible verses in our case) as substitution tokens so the model only translates the prose around them, not the verbatim quotes. Wonder if you've explored a similar cache layer for the AI analysis stage -- same signal triggering the same prompt-shape should ideally hit a cache, not re-invoke.

Andrii Krugliak • May 15

The Layer 1 skill purge is the part nobody talks about. I spent a week running four agents and discovered I'd been shipping the full 100-skill manifest on every turn: same problem, totally different domain. Trimming to 12 skills cut my context burn by ~60%, and the agents made fewer wrong-tool choices, which I didn't predict. Cheap pre-filter first, expensive reasoning last is a useful rule. The thing your post doesn't address: when the cheap TA filter is wrong, you lose the trade you would have won.

Harjot Singh • May 30

80% reduction is exactly the kind of result that shows up when you stop treating the agent as one monolithic premium call. Curious how much of your win came from each lever - in these architectures it's usually some mix of: caching the stable context instead of re-sending it, pruning what actually goes in the window, and routing cheap calls to a small model.

The trading-agent angle adds a nice twist: a lot of the loop is structured, repetitive classification/parsing that a small fast model nails for cents, and you only need the expensive reasoning model for the genuinely ambiguous decisions. That task-difficulty split is where the 80% almost always hides. Would love to see the actual model-routing breakdown if you do a follow-up - which calls you kept on the big model vs pushed down. Great architecture writeup.

VishnuRv • May 13

yup! this really helped me
