By Ada Corujo, Risk Analyst at Apex Hedge Fund
There's a lot of noise about AI and crypto. Most of it is marketing. This is an attempt to describe what machine learning actually does — and doesn't do — in a serious digital asset management context.
The problem ML is solving
Traditional portfolio management models were built for assets that trade on regulated exchanges with predictable liquidity, clear fundamentals, and relatively stable correlation structures. Digital assets break most of those assumptions.
Crypto markets trade 24/7. Liquidity can collapse in minutes. Correlations between assets are unstable — they're low during normal conditions and spike toward 1.0 during stress events, which is precisely when you need diversification to work. On-chain data gives you signals that have no equivalent in traditional markets. None of this fits neatly into a model built for equities.
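It's easy to put numbers on what that correlation spike costs. Here's a minimal sketch that computes volatility for an equal-weighted portfolio from its covariance matrix. It assumes 10 assets, each with the same 80% annualized volatility and the same pairwise correlation. All three numbers are illustrative assumptions, not market estimates.

```python
import numpy as np

def equal_weight_portfolio_vol(n_assets: int, asset_vol: float, rho: float) -> float:
    """Volatility of an equal-weighted portfolio in which every asset has the same
    volatility and every pair of assets shares the same correlation rho."""
    weights = np.full(n_assets, 1.0 / n_assets)
    corr = np.full((n_assets, n_assets), rho)
    np.fill_diagonal(corr, 1.0)
    cov = corr * asset_vol**2                 # covariance = correlation * vol_i * vol_j
    return float(np.sqrt(weights @ cov @ weights))

# Same setup in closed form: sigma_p^2 = sigma^2 * (1/N + (1 - 1/N) * rho)
for rho in (0.3, 0.95):  # illustrative "normal" vs "stress" correlation
    vol = equal_weight_portfolio_vol(n_assets=10, asset_vol=0.80, rho=rho)
    print(f"rho={rho:.2f}: portfolio vol {vol:.1%} vs 80.0% for a single asset")
```

Under these assumptions, moving correlation from 0.3 to 0.95 raises portfolio volatility from roughly 49% to roughly 78%. Most of the diversification benefit is gone.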
ML doesn't fix these problems. But it handles certain parts of them better than classical approaches.
What we actually use it for
Signal generation from on-chain data
On-chain metrics — wallet activity, exchange inflows and outflows, miner behavior, network transaction volume — contain information about supply and demand dynamics that price alone doesn't capture. The challenge is that these signals are noisy, non-linear, and interact with each other in ways that are hard to model explicitly.
Gradient boosted trees and random forests handle this reasonably well. You're not predicting price — you're generating a probability distribution over near-term conditions that informs position sizing and hedging decisions. The model doesn't tell you what to buy. It tells you something about the risk environment you're operating in.
```python
# Simplified example of feature set for on-chain signal model
features = [
    'exchange_netflow_btc_7d',      # Exchange inflow minus outflow
    'active_addresses_30d_change',  # Network activity trend
    'miner_reserve_change_14d',     # Miner selling pressure
    'stablecoin_supply_ratio',      # Dry powder on sidelines
    'funding_rate_perpetuals',      # Sentiment in derivatives market
    'realized_volatility_7d',       # Short-term vol regime
]
```
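These features don't have clean linear relationships with outcomes, and their effects depend on each other. Tree-based models pick up both through successive splits. Linear regression doesn't, unless you engineer the interaction terms by hand.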
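Here's a minimal sketch of that difference, run on synthetic data only. The feature values are random standardized stand-ins. The label is deliberately built so that it depends only on the sign of exchange netflow times funding rate. Neither feature is informative on its own. The labeling rule is a hypothetical construction, not a real market relationship. The last line shows the kind of output the model actually produces: class probabilities, not a price.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

features = [
    "exchange_netflow_btc_7d",
    "active_addresses_30d_change",
    "miner_reserve_change_14d",
    "stablecoin_supply_ratio",
    "funding_rate_perpetuals",
    "realized_volatility_7d",
]

rng = np.random.default_rng(42)
X = rng.normal(size=(4000, len(features)))  # synthetic, standardized stand-ins

# Synthetic label: 1 when netflow and funding share a sign, else 0.
# Only the interaction carries information; each feature alone is noise.
netflow = X[:, features.index("exchange_netflow_btc_7d")]
funding = X[:, features.index("funding_rate_perpetuals")]
y = (netflow * funding > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

linear = LogisticRegression().fit(X_train, y_train)
trees = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

linear_acc = linear.score(X_test, y_test)
tree_acc = trees.score(X_test, y_test)
print(f"logistic regression: {linear_acc:.2f}  gradient boosting: {tree_acc:.2f}")

# Output for one observation: a probability per class, not a price forecast.
latest = X_test[:1]
print("class probabilities:", trees.predict_proba(latest)[0].round(2))
```

On this construction, logistic regression stays near coin-flip accuracy and the boosted trees recover the interaction. Real on-chain data is far noisier, so the gap there will be smaller and harder to measure.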
Dynamic correlation estimation
Standard portfolio theory uses historical correlation matrices. The problem is that crypto correlations are regime-dependent — they look one way during trending markets and completely different during liquidation cascades.
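Here's a minimal sketch of why a single full-sample matrix misleads. It simulates two assets with an assumed calm regime (2% daily vol, correlation 0.2) and an assumed stress regime (6% daily vol, correlation 0.9). Both parameter sets are illustrative. The static estimate lands in between and understates the correlation you face during stress.

```python
import numpy as np

def simulate_regime(rng, n_days, vol, rho):
    """Draw daily returns for two assets with a given volatility and correlation."""
    cov = np.array([[vol**2, rho * vol**2],
                    [rho * vol**2, vol**2]])
    return rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_days)

def corr(x):
    """Correlation between the two columns of an (n_days, 2) return array."""
    return np.corrcoef(x, rowvar=False)[0, 1]

rng = np.random.default_rng(7)
calm = simulate_regime(rng, n_days=800, vol=0.02, rho=0.2)    # illustrative calm regime
stress = simulate_regime(rng, n_days=200, vol=0.06, rho=0.9)  # illustrative stress regime
full_sample = np.vstack([calm, stress])

calm_corr = corr(calm)
stress_corr = corr(stress)
full_corr = corr(full_sample)
print(f"calm {calm_corr:.2f} | stress {stress_corr:.2f} | full sample {full_corr:.2f}")
```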
We use a rolling Hidden Markov Model to identify regime states and estimate separate correlation matrices per regime. When the model detects a shift toward a stress regime, the portfolio construction logic adjusts accordingly — reducing concentration, increasing hedge ratios, tightening drawdown controls.
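```python
# Regime detection, simplified
# returns_matrix: array of shape (n_days, n_assets) holding asset returns
from hmmlearn import hmm

model = hmm.GaussianHMM(
    n_components=3,          # intended as low vol / trending / stress
    covariance_type="full",  # one full covariance matrix per state
    n_iter=100,
    random_state=42,         # EM is initialization-sensitive; fix the seed
)
model.fit(returns_matrix)    # in a rolling setup, refit on a trailing window
regimes = model.predict(returns_matrix)
# State numbers are arbitrary: map them to low vol / trending / stress after
# fitting, e.g. by comparing each state's covariance.
```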
This isn't predicting the future. It's describing the current environment more accurately than a static model does.
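The regime labels are only half the job. The separate correlation matrices per regime still have to be estimated, and the arbitrary state numbers need names. The sketch below shows one way to do both with NumPy alone, given a returns array and a label array like the one `model.predict(returns_matrix)` returns. The function names are hypothetical. The data is synthetic, and its regimes are deliberately labeled out of order to stand in for HMM output. With `covariance_type="full"`, hmmlearn also exposes the fitted per-state covariance matrices as `model.covars_`, which could be converted to correlations in the same way.

```python
import numpy as np

def per_regime_correlations(returns: np.ndarray, regimes: np.ndarray) -> dict:
    """Empirical correlation matrix of asset returns within each regime label.

    returns: (n_days, n_assets); regimes: (n_days,) integer state labels.
    """
    out = {}
    for state in np.unique(regimes):
        block = returns[regimes == state]
        if len(block) < returns.shape[1] + 1:
            continue  # too few observations for a stable estimate
        out[int(state)] = np.corrcoef(block, rowvar=False)
    return out

def states_by_volatility(returns: np.ndarray, regimes: np.ndarray) -> list:
    """Order state labels from calmest to most volatile, using the average
    per-asset standard deviation of returns within each state."""
    vols = {int(s): returns[regimes == s].std(axis=0).mean() for s in np.unique(regimes)}
    return sorted(vols, key=vols.get)

# Synthetic stand-in for HMM output: three regimes, three assets.
rng = np.random.default_rng(1)

def draw(n, vol, rho, n_assets=3):
    corr = np.full((n_assets, n_assets), rho)
    np.fill_diagonal(corr, 1.0)
    return rng.multivariate_normal(np.zeros(n_assets), corr * vol**2, size=n)

returns = np.vstack([draw(500, 0.02, 0.3), draw(300, 0.035, 0.5), draw(100, 0.07, 0.9)])
regimes = np.array([2] * 500 + [0] * 300 + [1] * 100)  # deliberately scrambled labels

order = states_by_volatility(returns, regimes)
corrs = per_regime_correlations(returns, regimes)
off_diag = ~np.eye(3, dtype=bool)
stress_state = order[-1]
print("calmest -> most volatile:", order)
print("mean pairwise corr in stress state:", corrs[stress_state][off_diag].mean().round(2))
```

Ranking states by volatility is just one labeling convention. It works here because the synthetic regimes differ mainly in volatility. With real data, the "trending" state may need a different criterion, such as mean return.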
Anomaly detection for risk monitoring
Position-level risk monitoring runs a one-class SVM against a baseline of normal market microstructure. When intraday behavior deviates significantly from baseline — unusual bid-ask spread widening, abnormal order book depth changes, correlated liquidation signals across multiple assets — it flags for manual review before the situation becomes a loss event.
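Before the detector can run, those deviations have to be turned into a numeric matrix. Here's a minimal sketch of that step for one asset. Every input name, the 3x liquidation threshold, and the synthetic order book data are hypothetical. The three columns map onto the signals described above: relative bid-ask spread, change in resting depth, and the share of assets liquidating heavily at the same time.

```python
import numpy as np

def microstructure_features(bid, ask, depth, liq_volume, liq_baseline):
    """Build one feature row per timestamp (hypothetical inputs).

    bid, ask, depth: (n_times,) top-of-book prices and resting depth for one asset
    liq_volume: (n_times, n_assets) liquidation volume per asset
    liq_baseline: (n_assets,) typical liquidation volume per asset
    """
    mid = (bid + ask) / 2.0
    spread_bps = (ask - bid) / mid * 1e4                # relative spread in basis points
    depth_change = np.zeros_like(depth, dtype=float)
    depth_change[1:] = np.diff(depth) / depth[:-1]      # period-over-period depth change
    # Share of assets whose liquidations exceed 3x baseline at the same timestamp.
    liq_breadth = (liq_volume > 3.0 * liq_baseline).mean(axis=1)
    return np.column_stack([spread_bps, depth_change, liq_breadth])

# Synthetic "normal" period: 1,000 snapshots, 5 assets tracked for liquidations.
rng = np.random.default_rng(3)
n = 1000
mid_px = 30000 + np.cumsum(rng.normal(0, 10, n))
half_spread = rng.uniform(0.5, 1.5, n)
depth = rng.uniform(80, 120, n)
liq = rng.exponential(1.0, size=(n, 5))

normal_conditions_matrix = microstructure_features(
    mid_px - half_spread, mid_px + half_spread, depth, liq, np.ones(5)
)
print(normal_conditions_matrix.shape)  # (1000, 3)
```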
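```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# RBF kernels are distance-based, so standardize features before fitting.
detector = make_pipeline(StandardScaler(), OneClassSVM(kernel='rbf', nu=0.05))
detector.fit(normal_conditions_matrix)
# current_conditions must be 2-D: (n_observations, n_features).
# Returns -1 for anomalies, 1 for normal
anomaly_flags = detector.predict(current_conditions)
```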
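False positive rate matters a lot here. Too many false flags and the system gets ignored. In a one-class SVM, nu is an upper bound on the fraction of training points treated as outliers, so it effectively sets the baseline flag rate. We tune it conservatively and validate against historical stress events.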
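That trade-off can be checked directly. Here's a minimal sketch, assuming synthetic Gaussian features for normal periods and a shifted cluster standing in for a historical stress episode. Real validation would replay actual stress periods. For each candidate nu, it measures the flag rate on held-out normal data (false positives) and on the stress sample (detections).

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(5)
normal_train = rng.normal(size=(2000, 3))           # synthetic "normal" features
normal_holdout = rng.normal(size=(1000, 3))         # unseen normal periods
stress_sample = rng.normal(loc=4.0, size=(200, 3))  # synthetic stand-in for a stress episode

results = {}
for nu in (0.01, 0.05, 0.2):
    detector = make_pipeline(StandardScaler(), OneClassSVM(kernel="rbf", nu=nu))
    detector.fit(normal_train)
    false_positive_rate = np.mean(detector.predict(normal_holdout) == -1)
    detection_rate = np.mean(detector.predict(stress_sample) == -1)
    results[nu] = (false_positive_rate, detection_rate)
    print(f"nu={nu:.2f}  false positives {false_positive_rate:.1%}  "
          f"stress detected {detection_rate:.1%}")
```

On data this cleanly separated, every nu catches the stress cluster, so the lowest nu wins. Real stress episodes overlap more with normal conditions, and that overlap is what makes the choice hard.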
What ML doesn't do
It doesn't predict price. Anyone telling you their ML model predicts crypto prices with meaningful accuracy is either confused about what their model is doing or lying.
It doesn't replace judgment. The models generate inputs to decisions — they don't make decisions. When a regime shift flag fires, a human looks at it. When anomaly detection triggers, a human reviews the position. The model narrows the decision space. It doesn't close it.
It doesn't handle black swan events well. A model trained on historical data has no meaningful representation of events outside that distribution. March 2020, the Luna implosion, the FTX collapse: each was outside the training distribution of any model fitted before it happened. The response to those events came from the risk framework, not from ML output.
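Tree ensembles show one concrete version of this failure. Every split threshold is learned from the training data, so any input beyond the training range lands in the same edge leaves. The prediction freezes at the boundary. Here's a minimal sketch on a synthetic linear relationship. Other model families fail out of distribution in different ways, not necessarily this one.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(11)
x_train = rng.uniform(-2.0, 2.0, size=(2000, 1))  # the "historical" range
y_train = x_train.ravel()                          # a simple linear relationship

model = GradientBoostingRegressor(random_state=0).fit(x_train, y_train)

inside = model.predict([[1.5]])             # within the training range
outside = model.predict([[5.0], [10.0]])    # far outside anything seen in training
print(inside, outside)
```

The model tracks the relationship inside the training range, but it returns the same value for 5.0 and 10.0.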
The honest summary
ML in portfolio management is useful for specific, well-defined tasks where the signal-to-noise ratio is high enough and the training data is sufficient. On-chain signal extraction, regime detection, and anomaly monitoring fit that description reasonably well.
It's not a competitive advantage by itself. A gradient boosted tree trained on the same public on-chain data everyone has access to isn't going to generate persistent alpha. The edge, if there is one, comes from the combination of the model, the features you engineer, the risk framework it feeds into, and the execution discipline around it.
Anyone who tells you otherwise is selling something.
Ada Corujo is a Risk Analyst at Apex Hedge Fund, an SEC-regulated digital asset manager. apexhedgefund.com