Manas Mudbari

Can AI Remember the Market? Teaching LLMs to Detect When the Rules Change

TL;DR

We built a memory system for LLMs to track Bitcoin market regimes. The LLM can't predict tomorrow's price any better than a coin flip (nobody can, honestly). But it can detect major market regime changes with zero false alarms, and unlike every statistical method, it tells you why the regime changed in plain English. That explainability is the real contribution.


The Problem: AI Models Have Amnesia

Imagine you trained an AI model to predict Bitcoin prices during the 2020-2021 bull run, a period when institutional investors were piling in, central banks were printing money, and everything was going up. The model learns the rules of that world pretty well.

Then 2022 arrives. The Fed starts aggressively hiking interest rates. Crypto exchange FTX collapses. Luna implodes. The entire market enters a prolonged bear phase.

Your model, still operating on the old rules, has no idea what hit it.

This is called concept drift: when the underlying patterns that a model learned no longer reflect reality. It's one of the most underappreciated problems in applied machine learning, especially in financial markets where the "rules" can change overnight.

Traditional fixes are crude: either retrain the model on new data (expensive and reactive), or bolt on statistical alarms that fire whenever the data looks unusual (they tell you that something changed, but never why).


Our Idea: Give the AI a Memory

Large language models (LLMs) like GPT-4 have been trained on enormous amounts of text, including financial news, market commentary, earnings reports, and macroeconomic analysis. They already "know" things like "when the Fed raises rates, risk assets tend to fall" or "Bitcoin historically rallies in the months before a halving."

What if we could structure that knowledge into a formal memory system that the LLM consults before making predictions? Instead of treating every 24-hour window as if it exists in isolation, the model would have context: what regime is the market in right now, what has happened before in similar conditions, and what did the model itself predict recently?

That's the core idea of this paper. We built four types of adaptive memory and tested them on seven years of Bitcoin data.


The Four Memory Types

Think of each memory type as a different "cheat sheet" the AI gets to read before making its prediction:

1. Regime Memory
The AI is told what "mode" the market is currently in (e.g., "Macro Bear Market") and what characteristics define that mode. Like giving a student a study guide that says: "Right now we're in a period defined by Fed tightening, exchange failures, and risk-off sentiment."

2. News Memory
Recent headlines are ranked by importance and fed to the model, along with any major events that happened during the same calendar window in previous years. Think of it as saying: "Here are the most important things happening right now, and here's what happened at this time of year historically."

3. Similarity Memory
The current market conditions (price momentum, volatility, volume) are compared against every similar-looking period in the past seven years. The top five most similar historical windows are retrieved, along with what actually happened next. Essentially: "The last five times the market looked like this, here's what followed."

4. Relative Memory
The AI is shown a log of its own recent predictions: how accurate it's been, whether it's been systematically biased toward UP or DOWN, and what its last seven predictions were. This lets it self-correct: "I've been wrong six times in a row predicting UP, maybe I should reconsider."
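To make the Similarity Memory idea concrete, here's a minimal sketch of what retrieval could look like. It summarizes each 7-day window as a (momentum, volatility) feature pair and ranks past windows by distance; the function names and feature choice are my illustrative simplification (the actual pipeline also uses volume), not the paper's exact code:

```python
import math

def features(window):
    """Summarize a window of daily closes as (momentum, volatility)."""
    returns = [(b - a) / a for a, b in zip(window, window[1:])]
    momentum = sum(returns)
    mean = momentum / len(returns)
    volatility = math.sqrt(sum((r - mean) ** 2 for r in returns) / len(returns))
    return (momentum, volatility)

def retrieve_similar(history, current, window=7, k=5):
    """Return the k past windows whose features are closest to the
    current window's, paired with what actually happened next."""
    target = features(current)
    candidates = []
    for i in range(len(history) - window):
        w = history[i:i + window]
        next_return = (history[i + window] - w[-1]) / w[-1]  # what followed
        distance = math.dist(features(w), target)
        candidates.append((distance, i, next_return))
    candidates.sort()
    return candidates[:k]
```

The retrieved `(distance, start_index, next_return)` tuples would then be rendered into the prompt as "the last five times the market looked like this, here's what followed."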


Two Tests

We ran the system on two tasks:

Task 1: Predict Tomorrow's Price Direction

Given the last 7 days of Bitcoin price data, predict whether the price will be higher or lower 24 hours from now.
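The label itself is simple to construct from a close-price series (a trivial sketch, not the paper's data loader):

```python
def direction_labels(closes, horizon=1):
    """Label each day UP if the close `horizon` days later is higher, else DOWN."""
    return ["UP" if closes[i + horizon] > closes[i] else "DOWN"
            for i in range(len(closes) - horizon)]
```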

Task 2: Detect Regime Changes

Given the current market conditions and recent news, determine whether Bitcoin has transitioned into a fundamentally new market regime.

We tested against 6 real historical regime transitions that occurred between 2017 and 2024.


What We Found

On Price Prediction: Nobody Wins

| Method | Accuracy |
| --- | --- |
| LSTM (traditional neural net) | 50.8% |
| LLM with no memory | 50.1% |
| LLM + Similarity Memory | 51.3% |
| LLM + News Memory | 48.6% |
| LLM + Regime Memory | 47.1% |
| LLM + Relative Memory | 49.0% |

Every single method lands within a coin-flip range of 50%. This is actually an important and honest result, confirming that short-term Bitcoin price prediction is genuinely hard regardless of how sophisticated your model is. Nobody has cracked this, and we didn't pretend to either.

The statistical analysis confirmed that none of the differences between methods are statistically significant. In plain terms: the margin of error swallows all the differences.
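A quick back-of-the-envelope check shows why. Under the null hypothesis of a fair coin, the standard error on accuracy over N days is sqrt(0.25/N), and even the best variant is well inside that band. The N = 500 below is a hypothetical test-set size for illustration (the post doesn't state the exact count):

```python
import math

def two_sided_p(accuracy, n):
    """Normal-approximation p-value for accuracy vs. a 50% coin flip."""
    se = math.sqrt(0.25 / n)                      # std. error under the null
    z = abs(accuracy - 0.5) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

# Best memory variant (51.3%) over a hypothetical 500-day test set:
print(two_sided_p(0.513, 500))  # well above the 0.05 threshold
```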

On Regime Detection: The LLM Has a Unique Edge

| Method | Detected | False Alarm Rate | Can Explain Why? |
| --- | --- | --- | --- |
| CUSUM (statistical) | 5/6 (83%) | N/A | No |
| LLM | 3/6 (50%) | 0% | Yes |
| BinSeg (statistical) | 2/6 (33%) | N/A | No |
| Bollinger Bands | 1/6 (17%) | N/A | No |

The LLM doesn't win on raw detection rate; CUSUM beats it handily. But two things stand out:

Zero false alarms. The LLM never incorrectly flagged a regime change when there wasn't one. It only raised its hand when it was genuinely confident.

It can explain itself. When CUSUM fires, it just says "something changed." When the LLM fires, it says things like: "Fed tightening beginning in Q1 2022, combined with the collapse of the Terra/Luna ecosystem in May, has fundamentally altered risk appetite. The current regime shows classic bear market characteristics: declining volume, high correlation with equities, and consistent negative news flow from exchange insolvencies."

That explanation has real practical value. A risk manager doesn't just want to know the alarm went off; they want to know why, so they can decide what to do.
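For readers unfamiliar with CUSUM: it accumulates small deviations from an expected level and fires when the running sum crosses a threshold. A textbook two-sided version on daily returns looks like this; the drift `k` and threshold `h` values are illustrative (in units of daily return), not the paper's configuration:

```python
def cusum(returns, target=0.0, k=0.005, h=0.05):
    """Two-sided CUSUM: return the indices where the cumulative
    deviation of returns from `target` exceeds the threshold h."""
    hi = lo = 0.0
    alarms = []
    for i, r in enumerate(returns):
        hi = max(0.0, hi + (r - target) - k)  # upward drift accumulator
        lo = max(0.0, lo + (target - r) - k)  # downward drift accumulator
        if hi > h or lo > h:
            alarms.append(i)
            hi = lo = 0.0  # reset after an alarm
    return alarms
```

Note the contrast with the LLM: when this fires, all you get is an index.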


Where the LLM Struggled

The most interesting failure was the Institutional Accumulation regime (April 2019 to February 2020). This was a quiet period of slow, steady accumulation by institutional players like Grayscale, with no dramatic headlines, no price explosions, and no obvious trigger.

The LLM scored 0% on detecting this transition. It relies heavily on news hooks and dramatic price movements. Slow, structural, low-noise regime changes are essentially invisible to it.

This reveals a genuine limitation: LLMs reason from narrative, and quiet regimes have no narrative.


The Bigger Picture

The paper makes a case that LLMs and traditional statistical methods are complementary, not competing:

  • Use CUSUM as a cheap, fast first-stage detector (it's great at catching that something changed)
  • Use the LLM as a second stage to interpret what changed and why

Neither alone is the full answer. Together, they cover each other's weaknesses.
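In code, that two-stage design is just a thin wrapper: run the cheap detector everywhere, and call the expensive model only on alarms. Here `cusum_fn` and `llm_explain` are hypothetical stand-ins (the latter for an actual LLM API call that returns an explanation or vetoes the alarm); this is a sketch of the architecture, not the paper's implementation:

```python
def two_stage_detector(returns, cusum_fn, llm_explain):
    """Stage 1: cheap statistical alarm over the whole series.
    Stage 2: LLM interpretation, invoked only when an alarm fires
    (which is what keeps the API cost in single-digit dollars)."""
    events = []
    for idx in cusum_fn(returns):
        # The LLM sees the recent window and must either confirm the
        # regime change with a plain-English explanation, or return None.
        verdict = llm_explain(returns[max(0, idx - 30):idx + 1])
        if verdict is not None:
            events.append((idx, verdict))
    return events
```

The LLM's zero-false-alarm behavior acts as the filter here: statistical noise that trips stage 1 gets vetoed in stage 2.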


What We Released

Everything is open source:

  • The full Bitcoin OHLCV dataset (2017-2024) with labeled regimes
  • 50 annotated news events
  • All model code, prompts, and raw LLM responses
  • A reproducibility checklist so anyone can replicate every number in the paper

The total API cost to run every LLM experiment in the paper was about $4.40. The entire pipeline is accessible to any individual researcher without institutional compute budgets.


The full paper is available on engrXiv. Code and data are on GitHub.
