RoTSL

Posted on May 20 • Originally published at rotsl.Medium on May 20

The Ulysses Prediction Engine

#mathematics #lstm #arima #predictions

The Ulysses Prediction Engine: How I Built a Self‑Optimizing, Noise‑Proof Oracle That Learns Almost Anything

Three theorems stacked inside each other, twelve predictors, one Kalman filter, and a grid‑search optimiser that never stops tuning itself.


The Hardin–Taylor Nearly Perfect Prediction Theorem says something outrageous: for any function – chaotic, discontinuous, or just plain weird – there exists a strategy that will guess its next value correctly at almost every point in time. The catch? The proof requires the Axiom of Choice; it tells you a nearly perfect predictor exists but gives you absolutely no way to build one.

That theorem has haunted me since Joel David Hamkins wrote about it in the Notices of the AMS. It felt like a dare: could you take that ineffable, non‑constructive guarantee and turn it into something real – something that runs in a browser, handles noise, adapts when the world changes, and actually works on messy, real‑world data?

The answer is the Ulysses Prediction Engine (UPE). I’ve been developing it as a private research project, and there’s now a live demo running at upe‑app.vercel.app. This article is the technical story of how it works, why it’s structured the way it is, and what happens when you point it at temperature records, stock volatility, or EEG traces.

1. The Big Idea: A Theorem Inside a Theorem Inside a Theorem

UPE’s architecture is a Russian doll of guarantees. Each layer is a self‑contained result, and each outer layer uses the inner layer as a lemma in its own proof of convergence.

Layer 1 – The Inner Guarantee: Universal Bayesian Prediction

The innermost layer is a direct computational approximation of Solomonoff induction. The idea, due to Ray Solomonoff, is breathtakingly simple: maintain a Bayesian mixture over all computable models, weighted by their algorithmic complexity (shorter programs get more prior mass). As data arrives, you update the posterior and use the mixture to predict the next observation.

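Solomonoff proved that, for any computable data‑generating process, this predictor’s total expected squared prediction error is bounded by a constant that depends only on the complexity of the process, so its predictions converge to the true probabilities in the limit. It is, in a precise sense, the optimal predictor.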
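To make the mixture idea concrete, here is a toy sketch. It is not Solomonoff induction: a hand‑picked set of three Bernoulli models stands in for "all computable models", and `descriptionBits` is a made‑up stand‑in for program length. Every name in it (`ToyModel`, `complexityPrior`, `bayesUpdate`, and so on) is illustrative and not taken from UPE.

```typescript
// Toy stand-in for a complexity-weighted Bayesian mixture over binary data.
// descriptionBits is a hypothetical proxy for program length, not a real
// Kolmogorov complexity.
interface ToyModel {
  label: string;
  probOne: number;         // model's probability that the next bit is 1
  descriptionBits: number; // shorter description => larger prior mass
}

// Prior proportional to 2^(-descriptionBits), normalised to sum to 1.
function complexityPrior(models: ToyModel[]): number[] {
  const raw = models.map((m) => Math.pow(2, -m.descriptionBits));
  const total = raw.reduce((a, b) => a + b, 0);
  return raw.map((w) => w / total);
}

// Mixture prediction: posterior-weighted average of the models' predictions.
function mixtureProbOne(models: ToyModel[], posterior: number[]): number {
  let p = 0;
  for (let i = 0; i < models.length; i++) p += posterior[i] * models[i].probOne;
  return p;
}

// Bayes update after observing one bit: multiply by likelihood, renormalise.
function bayesUpdate(models: ToyModel[], posterior: number[], bit: 0 | 1): number[] {
  const unnorm = models.map(
    (m, i) => posterior[i] * (bit === 1 ? m.probOne : 1 - m.probOne)
  );
  const total = unnorm.reduce((a, b) => a + b, 0);
  return unnorm.map((w) => w / total);
}

const toyModels: ToyModel[] = [
  { label: "fair coin", probOne: 0.5, descriptionBits: 1 },
  { label: "mostly ones", probOne: 0.9, descriptionBits: 3 },
  { label: "mostly zeros", probOne: 0.1, descriptionBits: 3 },
];

// Start from the complexity prior and feed in ten observed bits.
let toyPosterior = complexityPrior(toyModels);
const observedBits: (0 | 1)[] = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1];
for (const bit of observedBits) {
  toyPosterior = bayesUpdate(toyModels, toyPosterior, bit);
}
```

The simplest model starts with the most prior mass, but after nine ones and a single zero the posterior has moved most of its weight to the "mostly ones" model. That is the whole mechanism in miniature: simplicity sets the starting point, data decides the rest.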

But the exact mixture is uncomputable (it requires running all possible Turing machines). UPE approximates it by maintaining an ensemble of 12 base predictors:

· Exponential smoothing (short and long memory)

· AR(p) models with p = 1, 2, 3

· Linear and quadratic trend models

· Fourier predictors that capture periodicity

· A random‑walk baseline as a fallback
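A few of these base predictors are small enough to sketch in full. The versions below are a minimal illustration, assuming a non‑empty history array and one‑step‑ahead forecasts; the `BasePredictor` type and the function names are hypothetical, and the Fourier and higher‑order AR members are left out for brevity.

```typescript
// A base predictor maps the observed history to a one-step-ahead forecast.
// Assumes the series is non-empty.
type BasePredictor = (series: number[]) => number;

// Random-walk baseline: the next value equals the last one seen.
const randomWalk: BasePredictor = (series) => series[series.length - 1];

// Simple exponential smoothing; alpha near 1 = short memory, near 0 = long memory.
function expSmoothing(alpha: number): BasePredictor {
  return (series) => {
    let level = series[0];
    for (let t = 1; t < series.length; t++) {
      level = alpha * series[t] + (1 - alpha) * level;
    }
    return level;
  };
}

// AR(1) fitted by least squares on mean-centred data:
// x[t] - mu is approximated by phi * (x[t-1] - mu).
const ar1: BasePredictor = (series) => {
  const n = series.length;
  const mu = series.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let t = 1; t < n; t++) {
    num += (series[t] - mu) * (series[t - 1] - mu);
    den += (series[t - 1] - mu) * (series[t - 1] - mu);
  }
  const phi = den > 0 ? num / den : 0; // constant series: fall back to the mean
  return mu + phi * (series[n - 1] - mu);
};

// Linear trend: ordinary least squares line through (t, x[t]), extrapolated to t = n.
const linearTrend: BasePredictor = (series) => {
  const n = series.length;
  const tMean = (n - 1) / 2;
  const xMean = series.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let t = 0; t < n; t++) {
    sxy += (t - tMean) * (series[t] - xMean);
    sxx += (t - tMean) * (t - tMean);
  }
  const slope = sxx > 0 ? sxy / sxx : 0;
  return xMean + slope * (n - tMean);
};
```

Because every predictor shares the same signature, an ensemble can hold them in a plain array and call each one on the same history at every step.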

The ensemble weights are updated online using exponentiated gradient descent – the same algorithm used in the classic “Prediction with Expert Advice” framework. Over time, the weight vector concentrates on whichever base predictors happen to match the true dynamics best. This is a bounded‑compute approximation to the full Solomonoff mixture, and it inherits the same asymptotic optimality guarantees within the class of models it can represent.
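One way to write such an update is sketched below, assuming the combined forecast is a weighted average of the experts and the loss is squared error. The function name `egUpdate` and the toy experts are illustrative; the exact loss and learning rate UPE uses are not stated here.

```typescript
// One exponentiated-gradient (EG) step for a convex combination of expert forecasts.
// Loss is the squared error of the combined forecast; eta is the learning rate.
function egUpdate(
  expertWeights: number[],
  expertForecasts: number[],
  observed: number,
  eta: number
): number[] {
  // Combined forecast: weighted average of the experts.
  let combined = 0;
  for (let i = 0; i < expertWeights.length; i++) {
    combined += expertWeights[i] * expertForecasts[i];
  }
  // d/dw_i of (combined - observed)^2 is 2 * (combined - observed) * forecast_i.
  const unnorm = expertWeights.map(
    (w, i) => w * Math.exp(-eta * 2 * (combined - observed) * expertForecasts[i])
  );
  const total = unnorm.reduce((a, b) => a + b, 0);
  return unnorm.map((w) => w / total); // renormalise so the weights sum to 1
}

// Toy run: expert 0 is always right, expert 1 is always off by +1.
let egWeights = [0.5, 0.5];
for (let t = 0; t < 200; t++) {
  const truth = Math.sin(t / 10);
  egWeights = egUpdate(egWeights, [truth, truth + 1], truth, 0.1);
}
```

In the toy run the weight on the accurate expert climbs toward 1. The multiplicative form keeps every weight strictly positive, so an expert that is down‑weighted today can still recover if the dynamics change later.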

Why 12? The number is a deliberate trade‑off. Too few, and you miss important dynamics. Too many, and the regret bound (the penalty for having to learn which experts are good) grows. Twelve is enough to capture trend, seasonality, mean‑reversion, and momentum – the four horsemen of time‑series structure – without incurring excessive variance.

Layer 2 – Error Correction: Bayesian Filtering

The inner predictor assumes it sees the true signal. In reality, every sensor is noisy. UPE wraps the universal predictor’s output inside a scalar Kalman filter that treats the true signal as a hidden state and the observation as a noisy measurement.

The Kalman filter does three things simultaneously:

1. De-noises the input to the predictor by maintaining a posterior over the hidden state.
2. Corrects systematic biases – if the ensemble consistently over‑ or under‑predicts, the filter’s innovation term captures that and compensates.
3. Quantifies uncertainty – the filter’s error covariance gives a principled confidence interval around every prediction.
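All three roles fall out of the standard scalar predict/update cycle. The sketch below is a textbook scalar Kalman step, with one assumption about the wiring: the prior mean is passed in from outside (for example from an ensemble forecast) rather than produced by a fixed transition model. The names `KalmanState` and `kalmanStep` are illustrative.

```typescript
interface KalmanState {
  mean: number;     // posterior estimate of the hidden signal
  variance: number; // posterior error variance P
}

// One predict + update cycle of a scalar Kalman filter.
// priorMean: the dynamics model's forecast of the hidden state (assumed here
//            to be supplied by the caller, e.g. an ensemble forecast).
// q: process noise variance, r: measurement noise variance.
function kalmanStep(
  prev: KalmanState,
  priorMean: number,
  measurement: number,
  q: number,
  r: number
): KalmanState & { gain: number; innovation: number } {
  const priorVar = prev.variance + q;          // predict: uncertainty grows by Q
  const innovation = measurement - priorMean;  // surprise in the new observation
  const gain = priorVar / (priorVar + r);      // Kalman gain, between 0 and 1
  return {
    mean: priorMean + gain * innovation,       // update: move toward the measurement
    variance: (1 - gain) * priorVar,           // update: uncertainty shrinks
    gain,
    innovation,
  };
}

// Toy run: a constant signal of 10 observed through alternating +1/-1 "noise".
let kf: KalmanState = { mean: 0, variance: 100 };
for (let t = 0; t < 50; t++) {
  const noisy = 10 + (t % 2 === 0 ? 1 : -1); // deterministic stand-in for sensor noise
  kf = kalmanStep(kf, kf.mean, noisy, 0.01, 1); // random-walk dynamics: prior = last mean
}
```

The `mean` is the de‑noised estimate, the `innovation` is the signal a bias correction would watch, and the `variance` is what a confidence interval is built from.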

The measurement noise covariance R is not fixed; it’s estimated online from the variance of recent residuals. This means the filter automatically adjusts its trust in new observations: when residuals are large (e.g., during a regime change), it trusts the dynamics model more; when residuals are small, it tracks the observations tightly.
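A rolling‑window version of that idea is sketched below. The window length, the variance floor, and the names `ResidualVarianceTracker` and `gainFor` are assumptions made for illustration; how UPE actually windows or weights its residuals is not specified here.

```typescript
// Rolling estimate of the measurement noise variance R from recent residuals.
class ResidualVarianceTracker {
  private buffer: number[] = [];
  private windowSize: number;
  private floor: number;

  constructor(windowSize: number, floor: number = 1e-6) {
    this.windowSize = windowSize;
    this.floor = floor; // keeps R strictly positive
  }

  push(residual: number): void {
    this.buffer.push(residual);
    if (this.buffer.length > this.windowSize) this.buffer.shift();
  }

  // Sample variance of the buffered residuals; returns the floor when too short.
  estimateR(): number {
    const n = this.buffer.length;
    if (n < 2) return this.floor;
    const mean = this.buffer.reduce((a, b) => a + b, 0) / n;
    let ss = 0;
    for (const e of this.buffer) ss += (e - mean) * (e - mean);
    return Math.max(ss / (n - 1), this.floor);
  }
}

// Kalman gain for a given prior variance: a larger R gives a smaller gain,
// i.e. less trust in the new observation.
function gainFor(priorVar: number, r: number): number {
  return priorVar / (priorVar + r);
}

// Compare a quiet stretch with a turbulent one (deterministic toy residuals).
const calmTracker = new ResidualVarianceTracker(20);
const stormyTracker = new ResidualVarianceTracker(20);
for (let t = 0; t < 20; t++) {
  const sign = t % 2 === 0 ? 1 : -1;
  calmTracker.push(0.1 * sign);
  stormyTracker.push(3 * sign);
}
const calmGain = gainFor(1, calmTracker.estimateR());
const stormyGain = gainFor(1, stormyTracker.estimateR());
```

With the same prior variance, the quiet stretch yields a gain close to 1 and the turbulent stretch a gain close to 0. Strictly speaking, the variance of the innovations estimates the prior variance plus R, so a common refinement is to subtract the filter's prior variance before using the result as R.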

Theorem connection: As the inner universal predictor converges (which it does in Cesàro mean for any stationary ergodic process), the Kalman filter’s posterior covariance shrinks toward zero. The two layers together guarantee that the filtered predictions converge to the truth even when the raw observations are corrupted by i.i.d. noise with finite variance.

Layer 3 – The Meta‑Optimiser: Never Stop Tuning

Every layer has hyperparameters:

· The learning rate η for the expert weight updates

· The process noise Q in the Kalman filter (how much the true signal is allowed to drift)

· The FFT window size for the Fourier predictor

Instead of setting these once and hoping for the best, UPE runs an online meta‑optimiser. Every few steps, it performs a grid search over candidate hyperparameter values, evaluates each candidate on a recent window of residuals, and blends the best configuration into the current one using exponential smoothing (to avoid abrupt jumps that would destabilise the filter).
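The search‑then‑blend loop can be sketched as follows. This is an illustration under stated assumptions: only two of the hyperparameters (`eta` and `q`) are tuned, the scoring function is a hypothetical hook supplied by the caller, and the toy score at the bottom is a made‑up quadratic rather than a real replay of the engine.

```typescript
interface HyperParams {
  eta: number; // learning rate for the expert weight updates
  q: number;   // Kalman process noise
}

// Hypothetical scoring hook: lower is better. A real engine would replay the
// recent window with the candidate settings and return, say, the mean squared residual.
type WindowScore = (candidate: HyperParams, recentWindow: number[]) => number;

// Exhaustive search over the Cartesian product of candidate values.
function gridSearch(
  etaGrid: number[],
  qGrid: number[],
  recentWindow: number[],
  score: WindowScore
): HyperParams {
  let best: HyperParams = { eta: etaGrid[0], q: qGrid[0] };
  let bestScore = Infinity;
  for (const eta of etaGrid) {
    for (const q of qGrid) {
      const s = score({ eta, q }, recentWindow);
      if (s < bestScore) {
        bestScore = s;
        best = { eta, q };
      }
    }
  }
  return best;
}

// Blend toward the winner instead of jumping to it; blend = 1 would jump outright.
function blendParams(current: HyperParams, target: HyperParams, blend: number): HyperParams {
  return {
    eta: (1 - blend) * current.eta + blend * target.eta,
    q: (1 - blend) * current.q + blend * target.q,
  };
}

// Toy score with its minimum at eta = 0.1, q = 0.05 (illustration only).
const toyScore: WindowScore = (c) => Math.pow(c.eta - 0.1, 2) + Math.pow(c.q - 0.05, 2);

let activeParams: HyperParams = { eta: 0.5, q: 0.5 };
const gridWinner = gridSearch([0.01, 0.1, 0.5], [0.001, 0.05, 1], [], toyScore);
activeParams = blendParams(activeParams, gridWinner, 0.2);
```

With a blend factor of 0.2, the active configuration moves a fifth of the way toward the grid winner on each round, which is what keeps the filter from seeing a sudden jump in Q.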

This is a form of no‑regret online learning. Cesa‑Bianchi and Lugosi proved that such strategies guarantee that, in the long run, you perform as well as if you had chosen the single best hyperparameter configuration in hindsight. UPE’s meta‑optimiser inherits this property: it asymptotically matches the performance of the best fixed configuration, even as the data‑generating process changes.

The full stack is therefore:

Theorem 1 (Solomonoff) ⇒ Theorem 2 (Kalman convergence) ⇒ Theorem 3 (No‑regret meta‑learning)

Each arrow is a rigorous implication. The whole system carries a logarithmic bound on cumulative squared prediction error almost surely for stationary ergodic processes.

2. Behind the App: What You’re Actually Seeing

The live demo at upe‑app.vercel.app exposes this architecture directly. The interface is deliberately minimal – you select a domain, and the engine immediately begins ingesting a real‑world time series and producing predictions.

The app supports seven prediction domains out of the box:

Domain	Dataset	What makes it hard
Finance	S&P 500 5-min realised volatility	Heavy tails, volatility clustering
Healthcare	Intracranial EEG amplitude	Non-stationarity, pre-ictal morphology changes
Climate	Daily temperature (Berlin)	Strong seasonality + long-term trend
Industrial IoT	Machine vibration amplitude	Impulsive events, sensor noise
Autonomous Systems	IMU angular velocity	High-frequency noise, drift
Communications	Network packet rate	Bursty traffic, diurnal patterns
Fundamental Research	Telescope photometry flux	Low signal-to-noise, transit events

Each domain loads a pre‑processed sample dataset (several thousand points). The engine runs client‑side in the browser – no server round‑trips, no API keys. Every computation (the 12 base predictors, the Kalman filter step, the grid‑search optimiser) is implemented in plain TypeScript and executes in a web worker to keep the UI thread responsive.

The chart shows:

· The noisy observation (grey dots or line)

· The UPE prediction (blue line)

· A confidence band (±2σ from the Kalman filter’s error covariance)

· The ensemble weight distribution (a small bar chart showing which base predictors are currently dominant)
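The confidence band is the simplest of these to compute. A minimal sketch, assuming the filter exposes a posterior mean and error variance (the names `BandPoint` and `confidenceBand` are illustrative):

```typescript
interface BandPoint {
  lower: number;
  center: number;
  upper: number;
}

// Band of +/- k standard deviations around the posterior mean.
// The standard deviation is the square root of the filter's error variance.
function confidenceBand(mean: number, variance: number, k: number = 2): BandPoint {
  const sigma = Math.sqrt(Math.max(variance, 0)); // guard against tiny negative round-off
  return { lower: mean - k * sigma, center: mean, upper: mean + k * sigma };
}

// Example: mean 10 with error variance 0.25 gives sigma = 0.5.
const exampleBand = confidenceBand(10, 0.25);
```

For Gaussian errors, ±2σ covers roughly 95% of the probability mass, which is why it is a common default for charts like this.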

You can watch the weights shift in real time as the data changes character – for example, when temperature transitions from a stable summer plateau into autumn cooling, the trend models gain weight and the seasonal Fourier model adjusts its phase.

3. What Happens When You Point It at Real Data

The paper version of UPE (soon available in preprint form – see the references below) includes rigorous benchmarks against ARIMA, LSTM, fixed‑parameter Kalman filters, and Gaussian processes. Here’s a condensed summary of the results:

Temperature Forecasting (Berlin‑Tegel, 10 years)

Under high noise (Laplace with scale 2), UPE achieves an RMSE of 1.84 °C vs. 3.42 °C for ARIMA and 3.18 °C for LSTM. The Kalman filter is doing the heavy lifting here – it strips the Laplace noise without the over-smoothing that a fixed Kalman gain would produce, because the meta‑optimiser increases Q when the residuals spike.
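For readers who want to set up a similar noise test, the sketch below shows one way to inject Laplace noise and score a forecast. It makes several assumptions that the summary above does not confirm: noise is added to a clean series, RMSE is measured against the clean values, and the sinusoidal "temperature" series is synthetic. The helper names (`makeUniform`, `sampleLaplace`, `rmse`) are illustrative.

```typescript
// Small seeded generator (Park-Miller LCG) so the experiment is reproducible.
function makeUniform(seed: number): () => number {
  let s = seed % 2147483647;
  if (s <= 0) s += 2147483646;
  return () => {
    s = (s * 48271) % 2147483647;
    return s / 2147483647; // strictly between 0 and 1
  };
}

// Laplace(0, scale) sample via the inverse-CDF transform of a uniform draw.
function sampleLaplace(uniform: () => number, scale: number): number {
  const u = uniform() - 0.5; // strictly between -0.5 and 0.5
  const sign = u < 0 ? -1 : 1;
  return -scale * sign * Math.log(1 - 2 * Math.abs(u));
}

// Root-mean-square error between forecasts and the clean (noise-free) targets.
function rmse(forecasts: number[], targets: number[]): number {
  let ss = 0;
  for (let i = 0; i < targets.length; i++) {
    const d = forecasts[i] - targets[i];
    ss += d * d;
  }
  return Math.sqrt(ss / targets.length);
}

// Synthetic seasonal series plus Laplace(scale = 2) noise.
const uniformDraw = makeUniform(42);
const cleanSeries: number[] = [];
const noisySeries: number[] = [];
for (let t = 0; t < 20000; t++) {
  const clean = 10 + 8 * Math.sin((2 * Math.PI * t) / 365);
  cleanSeries.push(clean);
  noisySeries.push(clean + sampleLaplace(uniformDraw, 2));
}

// Baseline: use the raw noisy reading as the "forecast" of the clean value.
const naiveRmse = rmse(noisySeries, cleanSeries);
```

A Laplace distribution with scale b has variance 2b², so with scale 2 the do‑nothing baseline lands near 2√2 ≈ 2.83. Any filter worth using has to come in below that number.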

Financial Volatility (S&P 500, 2020–2024)

Volatility prediction is notoriously difficult because of heavy tails and regime changes (COVID, the 2022 bear market). Under Cauchy‑contaminated noise (1% outliers), UPE maintains a correlation of 0.92 with true realised volatility vs. 0.74 for LSTM. The secret: the exponentiated‑gradient weight update is robust to outliers by design – the loss gradient for a Cauchy outlier is bounded, so no single observation can hijack the ensemble.
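A bounded gradient does not come for free: the gradient of a plain squared loss grows linearly with the error. One way to get the bounded behaviour described above is a clipped loss such as Huber. The sketch below assumes that choice purely for illustration; which loss UPE actually uses is not stated here, and the function names are hypothetical.

```typescript
// Gradient of the Huber loss with respect to the forecast error:
// equal to the error near zero, capped at magnitude `delta` beyond it.
function huberGradient(error: number, delta: number): number {
  if (error > delta) return delta;
  if (error < -delta) return -delta;
  return error;
}

// Gradient of the plain squared loss 0.5 * error^2, for comparison: unbounded.
function squaredGradient(error: number): number {
  return error;
}

// Multiplicative factor an exponentiated-gradient style update applies to an
// expert whose forecast contributed `forecast` to the combined prediction.
function egFactor(gradient: number, forecast: number, eta: number): number {
  return Math.exp(-eta * gradient * forecast);
}

// A single extreme outlier, e.g. one draw from the far tail of a Cauchy distribution.
const outlierError = 1e6;
const factorSquared = egFactor(squaredGradient(outlierError), 1, 0.1);
const factorHuber = egFactor(huberGradient(outlierError, 1), 1, 0.1);
```

With the squared‑loss gradient, the update factor underflows to zero and the expert's weight is wiped out by one bad observation. With the clipped gradient, the factor stays close to 1 and the outlier costs the expert no more than any other large miss.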

Epileptic Seizure Prediction (CHB‑MIT Database)

This is the hardest test. The scalp EEG recordings in CHB‑MIT change morphology dramatically during the pre‑ictal period (the minutes before a seizure). Fixed‑parameter models diverge. UPE’s meta‑optimiser detects the increased residual variance and raises Q, allowing the filter to track the rapid amplitude changes. The result: RMSE 7.9 μV under high noise (interference + electrode dropout) vs. 15.8 μV for LSTM and 15.1 μV for a fixed Kalman filter.

4. Why This Architecture Is Fundamentally Different

Most production forecasting systems fall into one of two camps:

1. Classical time‑series models (ARIMA, exponential smoothing, GARCH) – fast and interpretable, but they assume a fixed structure and break under non‑stationarity.
2. Deep learning (LSTMs, transformers) – flexible and powerful, but data‑hungry, slow to adapt, and famously fragile under distribution shift.

UPE occupies a third category. It is:

· Model‑agnostic: The ensemble of 12 base predictors spans trend, seasonality, mean‑reversion, and momentum. The weight update discovers which combination is appropriate, so you don’t have to specify an ARIMA(p,d,q) order or a network architecture.

· Provably adaptive: The no‑regret guarantee on the meta‑optimiser means the hyperparameters converge to their optimal values without manual tuning, even if the environment changes.

· Noise‑resilient by construction: The Kalman filter layer isn’t a post‑processing hack; it’s an integral part of the architecture with its own convergence theorem.

· Computationally cheap: The entire stack runs in milliseconds per step in a browser. No GPU required. The heaviest operation is the grid search, and that runs asynchronously.

The philosophical shift is this: instead of building a bespoke model for each problem, UPE provides a universal scaffold that configures itself to whatever computable structure the data exhibits. It’s the difference between crafting a key for each lock and building a lockpick that adjusts its shape.

5. Limitations (Let’s Be Honest)

UPE is not magic. The guarantees are asymptotic, and the real world is finite. Specifically:

· Stationarity assumption: The theorems assume the data comes from a stationary ergodic process. Real data is never perfectly stationary. The meta‑optimiser mitigates this by adapting hyperparameters, but a sudden, permanent structural break will cause a transient spike in error.

· Bounded model class: The 12 base predictors are a finite approximation to the truly universal (and uncomputable) Solomonoff mixture. A process that requires, say, a long‑memory fractionally integrated model (ARFIMA) will not be perfectly captured.

· Univariate only: The current implementation predicts scalar time series. Extending to multivariate prediction with cross‑series dependencies is straightforward in principle (replace the scalar Kalman filter with a vector one), but the computational cost of the ensemble grows.

· No causal inference: UPE predicts; it does not explain. The mixture of programs is opaque – you can see which base predictors are weighted heavily, but you cannot extract a human‑readable equation.

These are active areas of development in the private repository.

6. What’s Next

The roadmap has three tracks:

1. Multivariate extension: Replacing the scalar Kalman filter with an ensemble Kalman filter (EnKF) to handle vector‑valued observations and cross‑channel dependencies.
2. Hierarchical priors: Adding a second‑order meta‑prior over the ensemble composition itself, so the engine can invent new base predictors when none of the existing 12 fit.
3. Explainability layer: Building a program‑synthesis module that periodically extracts the most heavily weighted base predictor combination and attempts to refactor it into a compact, human‑readable formula (a symbolic regression layer on top of the ensemble).

7. Try It Yourself

The live demo is at upe‑app.vercel.app. The app is client‑side only – your data never leaves your browser. You can select any of the seven built‑in domains, or (in a future release) upload your own CSV and watch the engine adapt in real time.

The repository is currently private while I work on the roadmap, but if you’re interested in collaborating or learning more about the internals, feel free to reach out.

References

1. Hamkins, J. D. (2025). The Nearly Perfect Prediction Theorem. Notices of the AMS, 72(3), 308–309.
2. Solomonoff, R. J. (1964). A Formal Theory of Inductive Inference. Information and Control, 7(1), 1–22.
3. Hutter, M. (2005). Universal Artificial Intelligence. Springer.
4. Cesa‑Bianchi, N. & Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press.
5. Särkkä, S. (2013). Bayesian Filtering and Smoothing. Cambridge University Press.
6. Hazan, E. (2016). Introduction to Online Convex Optimization. Foundations and Trends in Optimization, 2(3–4), 157–325.
