My trading bot lost $176 in its first real backtest.
Not because of a bug. Not because of bad data. The algorithm was working exactly as designed; it just couldn't figure out when to exit trades.
The bot would enter positions with 48.6% accuracy (better than random), hold them for an average of 27 bars, and then... panic. It would close winning trades too early and hold losing trades too long. Classic human behavior, except this was supposed to be an emotionless machine.
That was Run 4. Two runs later (Run 5 and Run 6B), I had a system that generated $507 profit on completely unseen 2024-2025 data (1.87 years, 45,246 bars), with a Sharpe ratio of 6.94 and max drawdown of 0.98%.
For perspective: With proper position sizing (Half Kelly), that same system could turn $10K into $102K over the same period. Compare that to:
- Savings account (5% APY): $11,025
- S&P 500 (11% avg): $12,321
- Hedge funds (12%): $12,544
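As a sanity check on those baselines, here's the compounding math behind them. Note that the listed figures correspond to two full years of annual compounding (a rounding-up of the 1.87-year test window):

```python
# Sketch of the compounding math behind the baseline comparisons.
# The figures in the text match two full years of annual compounding.

def compound(principal, annual_rate, years):
    """Future value with annual compounding."""
    return principal * (1 + annual_rate) ** years

print(round(compound(10_000, 0.05, 2)))  # savings account at 5% APY
print(round(compound(10_000, 0.11, 2)))  # S&P 500 at 11% average
print(round(compound(10_000, 0.12, 2)))  # hedge funds at 12%
```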
This is the story of Amertume, a gold trading bot built with xLSTM (Extended Long Short-Term Memory) and PPO (Proximal Policy Optimization) that combines deep learning and reinforcement learning.
Why I Built This
I wanted to build a trading system that could pass prop firm evaluations not because I'm obsessed with trading, but because it's a perfect testbed for combining deep learning and reinforcement learning.
The constraint is simple: make 10% profit without losing more than 5% in drawdown. The challenge is that an estimated 97% of traders fail these evaluations.
This became my design goal: build a system that survives volatility without blowing up.
Why Most Trading Bots Fail (And Why Mine Did Too)
Before Amertume, I tried everything:
- Run 1: LSTM models with basic features (overtrading problem - 1981 trades, -$867 loss)
- Run 2: Fixed transaction costs (oscillated between 9-983 trades, unstable)
- Run 3: Better xLSTM encoder with focal loss (hold exploit - avg 41 bars, always hitting max time)
They all had the same core problems:
- Overtrading: Run 1 executed 1981 trades in training because transaction costs were invisible (0.00004 vs 0.01 log returns)
- Hold Exploit: Run 2-3 learned to hold positions for exactly 60 bars (max time limit) instead of exiting naturally
- Exit Paralysis: Run 4 became too selective (only 37 trades in 1.87 years) but still lost money because it didn't know when to close
But there was a deeper problem I discovered: 1-minute data is too noisy.
The 1-Minute → 15-Minute Pivot
My first 4 encoder training attempts used 1-minute OHLCV data. The results were terrible:
Encoder v1-v4 (1-minute data):
- Accuracy: 50.3% (coin flip)
- Problem: Model just memorized training data
- Insight: Predicting next 1-minute move is basically random noise
Why 1-minute failed:
- Gold moves $0.10-$0.50 per minute (mostly noise)
- News events cause instant spikes (unpredictable)
- Spread costs eat profits on short timeframes
- ATR(14) on 1-min = only 14 minutes of context
Encoder v5+ (15-minute data):
- Validation accuracy: 42.3%
- Test accuracy: 41.9% (an 8.6-point edge over the 33.3% random baseline)
- 3-class classification: UP/DOWN/NEUTRAL (random baseline = 33.3%)
- ATR(14) on 15-min = 3.5 hours of context
- Filters out microstructure noise
- Captures actual momentum moves
The math:
- 1-min: 1440 bars/day → 99% noise, 1% signal
- 15-min: 96 bars/day → 70% noise, 30% signal
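The pivot itself is just an aggregation step. Here's a minimal stdlib sketch of rolling 1-minute OHLCV bars into 15-minute bars (in practice a library like pandas handles this; the data here is hypothetical):

```python
# A minimal sketch of aggregating 1-minute OHLCV bars into 15-minute bars.
# In a real pipeline, pandas resample would do this; shown here from scratch.

def resample_ohlcv(bars, group_size=15):
    """bars: list of (open, high, low, close, volume) tuples in time order."""
    out = []
    for i in range(0, len(bars) - len(bars) % group_size, group_size):
        chunk = bars[i:i + group_size]
        out.append((
            chunk[0][0],                   # open of the first 1-min bar
            max(b[1] for b in chunk),      # highest high in the window
            min(b[2] for b in chunk),      # lowest low in the window
            chunk[-1][3],                  # close of the last 1-min bar
            sum(b[4] for b in chunk),      # total volume
        ))
    return out
```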
Switching to 15-minute was the breakthrough that made xLSTM encoder actually work.
The bot needed to understand: "Is this a breakout I should chase, or noise I should ignore?"
That's where xLSTM comes in.
What is xLSTM?
xLSTM is the 2024 evolution of LSTM, introduced by a team led by Sepp Hochreiter (co-inventor of the original LSTM in 1997).
The key innovation: Instead of just remembering sequences, xLSTM has two types of memory:
- sLSTM (scalar memory): Tracks single values over time with exponential gating
  - Perfect for: price momentum, volatility regimes, trend strength
- mLSTM (matrix memory): Stores relationships between multiple features
  - Perfect for: correlations (DXY vs Gold), multi-timeframe patterns
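To make the exponential-gating idea concrete, here's a toy scalar sLSTM step. This is a deliberately simplified sketch, not the actual architecture: the weights are made-up scalars, and a real sLSTM operates on vectors with learned parameters. The key features shown are the exponential input gate and the normalizer state that keeps the output bounded:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def slstm_step(x, h, c, n, w):
    """One toy sLSTM step with scalar state.

    c: cell state, n: normalizer state (both scalars).
    w: dict of made-up scalar weights for each gate.
    """
    z = math.tanh(w["z"] * x + w["rz"] * h)   # candidate value
    i = math.exp(w["i"] * x + w["ri"] * h)    # exponential input gate
    f = sigmoid(w["f"] * x + w["rf"] * h)     # forget gate
    o = sigmoid(w["o"] * x + w["ro"] * h)     # output gate
    c = f * c + i * z
    n = f * n + i                             # normalizer tracks total gate mass
    h = o * (c / n)                           # normalized hidden state
    return h, c, n
```

Because `c / n` is a weighted average of past `tanh` candidates, the hidden state stays bounded even though the input gate can grow exponentially.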
Why xLSTM (not XGBoost, Random Forest, or Transformer)?
XGBoost and Random Forest are powerful for tabular data but struggle with temporal dependencies. Tree-based models predict by averaging values in leaf nodes; if the test data falls outside the training range (common in financial markets), they simply return the nearest leaf's average. This "extrapolation ceiling" is fatal for trading, where regime changes and unprecedented volatility are the norm.
Transformers solve the extrapolation problem but introduce computational overhead that's prohibitive for real-time trading. Self-attention requires quadratic memory (O(n²)) in sequence length. For a 60-bar window with 25 features (1,500 tokens), the attention matrix alone grows to 2.25 million entries per layer.
Why xLSTM wins for trading:
xLSTM processes sequentially, updating its memory state bar-by-bar. It can handle arbitrarily long context without memory growing with sequence length, and it naturally captures temporal dependencies.
For financial time series, this translates to:
- Better regime detection (remembers volatility patterns from 1000+ bars ago)
- Faster inference (linear complexity vs. quadratic for Transformers)
- Natural extrapolation (unlike tree-based models, can predict beyond training ranges)
- Less overfitting (sequential processing = natural regularization)
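The memory argument is easy to verify with back-of-envelope arithmetic, using the 1,500-token framing from above (the 128-dim state matches the embedding size used later in the pipeline):

```python
# Back-of-envelope memory comparison for one layer (counting entries, not bytes).

def attention_matrix_entries(seq_len):
    # Self-attention scores: one entry per (query, key) pair -> O(n^2)
    return seq_len * seq_len

def recurrent_state_entries(hidden_dim):
    # A recurrent model carries a fixed-size state regardless of sequence length
    return hidden_dim

print(attention_matrix_entries(1_500))  # 2,250,000 entries per layer
print(recurrent_state_entries(128))     # 128 entries, independent of context
```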
The Architecture: xLSTM + PPO + Triple Barrier
Here's how Amertume works:
Raw OHLCV (15-min gold prices)
↓
Feature Engineering (25 features)
↓
xLSTM Encoder (frozen, pre-trained)
↓
128-dim embedding (market state)
↓
PPO Agent (trainable)
↓
Action: BUY / SELL / HOLD
Why this architecture is hard to replicate:
The magic isn't in any single component; it's in how they're wired together:
- xLSTM encoder is pre-trained separately (7 training runs, 22 epochs, Focal Loss with gamma=2.0)
- Then frozen (no gradients during RL training)
- PPO learns on top of frozen embeddings (not end-to-end)
- Curriculum learning (3 stages, each with different volatility filtering)
- Triple Barrier exits (agent can't close positions manually)
Each piece alone is standard. The combination + training procedure is what makes it work.
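Since the agent can't close positions manually, exits are entirely mechanical. Here's a minimal sketch of the Triple Barrier exit logic for a long position, with illustrative parameters (the article's actual barriers are ATR-scaled at a 2:1 reward-to-risk ratio):

```python
# A minimal sketch of Triple Barrier exit logic for a long position:
# the trade closes at whichever barrier is hit first — profit target,
# stop loss, or a maximum holding time. Parameters are illustrative.

def triple_barrier_exit(prices, entry, take_profit, stop_loss, max_bars=60):
    """Return (exit_index, reason) for a long entered at `entry`.

    prices: bar closes after entry, in time order.
    """
    upper = entry + take_profit
    lower = entry - stop_loss
    for i, p in enumerate(prices[:max_bars]):
        if p >= upper:
            return i, "take_profit"
        if p <= lower:
            return i, "stop_loss"
    last = min(len(prices), max_bars) - 1
    return last, "time_limit"
```

The "hold exploit" from Runs 2-3 shows why the time barrier matters: without natural exits, the agent learned to ride every position to the 60-bar cap.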
Want to See the Full System?
This is just the beginning. The full blog post covers:
Complete Architecture Breakdown
- Feature engineering pipeline (25 features from raw OHLCV)
- xLSTM pre-training with Triple Barrier labeling
- PPO training with curriculum learning (calm → mixed → full volatility)
- Dynamic ATR Triple Barrier (2:1 RR) implementation
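As a taste of that last item: "Dynamic ATR" means the barrier distances scale with current volatility rather than being fixed dollar amounts. Here's a minimal sketch of Wilder's ATR(14), the volatility measure involved (simple-average seed, then Wilder smoothing):

```python
# A sketch of Wilder's ATR(14). True range uses the previous close
# so that gaps between bars count toward measured volatility.

def atr(highs, lows, closes, period=14):
    """Wilder-smoothed Average True Range over aligned OHLC series."""
    trs = []
    for i in range(1, len(closes)):
        tr = max(
            highs[i] - lows[i],
            abs(highs[i] - closes[i - 1]),
            abs(lows[i] - closes[i - 1]),
        )
        trs.append(tr)
    value = sum(trs[:period]) / period                # seed: simple average
    for tr in trs[period:]:
        value = (value * (period - 1) + tr) / period  # Wilder smoothing
    return value
```

On 15-minute bars, ATR(14) summarizes the last 3.5 hours of volatility, which is the context window the barriers adapt to.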
The 6 Failed Runs
- Run 1: Overtrading disaster (1981 trades, -$867)
- Run 2-3: Hold exploit (agent gaming the time limit)
- Run 4: Exit paralysis (48.6% entry accuracy but -$176 loss)
- Run 5: EV trap (agent refused to trade)
- Run 6: The breakthrough (6.94 Sharpe, 0.98% drawdown)
Kelly Criterion Position Sizing
- Why the $507 PnL is deliberately conservative (0.01 micro-lot stress test)
- Projections with proper position sizing:
- 1% risk: $10K → $18K (81.6% return)
- 2% risk: $10K → $30K (206% return)
- Half Kelly: $10K → $102K (924% return)
- The brutal truth about drawdowns and sleep quality
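For readers unfamiliar with Kelly sizing, the core formula is short. This sketch uses the standard Kelly fraction for a fixed reward-to-risk ratio; the 45% win rate is illustrative, not the system's actual statistic:

```python
# A sketch of the Kelly fraction for a 2:1 reward-to-risk system.
# f* = p - (1 - p) / R, where p is win rate and R is reward:risk.

def kelly_fraction(win_rate, reward_risk_ratio):
    """Fraction of capital to risk per trade under full Kelly."""
    return win_rate - (1 - win_rate) / reward_risk_ratio

full = kelly_fraction(0.45, 2.0)  # illustrative 45% win rate, 2:1 RR
half = full / 2                   # Half Kelly: smaller bets, smoother equity
print(round(full, 4))
print(round(half, 4))
```

Half Kelly is the usual practical choice because full Kelly maximizes long-run growth at the cost of drawdowns most humans can't sit through.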
Academic Comparison
- How Amertume compares to recent papers
- Kalman-Enhanced DRL: 13.12 Sharpe (vs my 6.94)
- Why action space reduction is underappreciated
- Statistical significance analysis (294 trades, 95% CI)
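The significance analysis boils down to asking how wide the uncertainty band is around a win rate estimated from 294 trades. A sketch using the normal-approximation 95% interval (the 162-win figure is hypothetical, just to show the width at this sample size):

```python
import math

# A sketch of a 95% normal-approximation confidence interval for a win rate,
# with n = 294 trades; the 162 wins here are hypothetical, for illustration.

def win_rate_ci(wins, n, z=1.96):
    """Return (low, high) bounds of the approximate 95% CI."""
    p = wins / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = win_rate_ci(162, 294)
print(round(low, 3), round(high, 3))  # roughly a ±5.7-point band
```

Even at ~300 trades the interval spans more than ten percentage points, which is why trade count matters so much when judging whether an edge is real.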
Production Deployment
- Live testing on demo account
- Safety features (kill-switches, latency checks)
- What could go wrong (overfitting, regime change, execution issues)
Full References
- 20+ academic papers cited
- xLSTM, PPO, Focal Loss, Triple Barrier, Kelly Criterion
- Comparison papers on tree-based vs deep learning
Disclaimer: This is educational content about machine learning and trading system design. Trading involves substantial risk of loss. I am not a financial advisor. Do your own research and never risk money you can't afford to lose.