
정상록

Kronos: The First Open-Source Foundation Model for Financial K-Line Data

What if we could treat financial candlestick data the way GPT treats natural language? That's exactly what Kronos does.

Built by researchers from Tsinghua and Nanjing University, Kronos is the first open-source foundation model specifically designed for financial market K-line (candlestick) data. It was accepted at AAAI 2026 and has gained 14,800+ GitHub stars.

The Problem with General-Purpose Models

General-purpose time series foundation models (TSFMs) like TimeMoE or Moirai don't perform well on financial data. Financial time series have:

  • Low signal-to-noise ratio — markets are inherently noisy
  • Strong non-stationarity — statistical properties change over time
  • Higher-order dependencies — complex patterns beyond simple autocorrelation

Kronos addresses these challenges with a domain-specific approach.
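The non-stationarity point is easy to demonstrate. The sketch below uses plain NumPy on synthetic data (not real market data) to simulate a volatility regime change; a rolling estimate drifts sharply across the break, which is exactly the shifting-statistics problem that trips up general-purpose models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic return series with a volatility regime change:
# calm first half (1% vol), turbulent second half (3% vol).
returns = np.concatenate([
    rng.normal(0, 0.01, 500),
    rng.normal(0, 0.03, 500),
])

# A rolling standard deviation exposes the non-stationarity:
# a risk model calibrated on the first regime is badly wrong
# in the second one.
window = 100
rolling_vol = np.array([returns[i:i + window].std()
                        for i in range(len(returns) - window)])

print(f"early vol estimate: {rolling_vol[0]:.4f}")
print(f"late  vol estimate: {rolling_vol[-1]:.4f}")
```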

Two-Stage Framework

Stage 1: Custom Tokenizer

Kronos converts continuous multi-dimensional OHLCV (Open, High, Low, Close, Volume) data into hierarchical discrete tokens using Binary Spherical Quantization (BSQ). This preserves both price dynamics and trading activity patterns.
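Kronos's tokenizer is a learned encoder, but the core BSQ quantization step can be sketched in a few lines: project a latent vector onto the unit hypersphere, then snap each coordinate to ±1/√d, so the resulting sign pattern doubles as a discrete token id. The latent values and dimensionality below are illustrative, not taken from the repo, and the hierarchical (coarse/fine) split is omitted.

```python
import numpy as np

def bsq_quantize(z):
    """Binary Spherical Quantization sketch: project a latent
    vector onto the unit sphere, then snap each coordinate to
    +/- 1/sqrt(d). The sign pattern doubles as a discrete
    token id (an integer in [0, 2^d))."""
    z = np.asarray(z, dtype=float)
    u = z / np.linalg.norm(z)          # unit-sphere projection
    d = u.shape[0]
    signs = np.where(u >= 0, 1.0, -1.0)
    q = signs / np.sqrt(d)             # quantized codeword (unit norm)
    bits = (signs > 0).astype(int)     # binary code
    token_id = int("".join(map(str, bits)), 2)
    return q, token_id

# A toy 8-dim latent; in Kronos this would come from the
# tokenizer's encoder over an OHLCV window (names here are
# illustrative, not the repo's API).
latent = [0.3, -1.2, 0.7, 0.1, -0.5, 2.0, -0.1, 0.4]
codeword, token = bsq_quantize(latent)
print(token)  # prints 181
```

With 8 binary dimensions the codebook has 2^8 = 256 entries; scaling the bit width grows the vocabulary exponentially without storing an explicit codebook, which is the appeal of BSQ over classic vector quantization.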

Stage 2: Autoregressive Transformer

A decoder-only transformer (GPT-style) is pre-trained on 12 billion+ K-line records from 45 global exchanges. It covers:

  • Asset types: Stocks, futures, forex, options, cryptocurrency
  • Timeframes: 1min, 5min, 15min, 30min, 60min, daily, weekly, biweekly

Benchmark Results

The zero-shot performance improvements are significant:

| Metric | Improvement | Compared To |
| --- | --- | --- |
| Price prediction (RankIC) | 93% | TimeMoE (SOTA TSFM) |
| Volatility prediction (MAE) | 9% | Previous TSFM |
| Synthetic data fidelity | 22% | Previous models |
| Return prediction (full-shot) | 60% | DLinear |

Quick Start

from model import Kronos, KronosTokenizer, KronosPredictor

# Load the pretrained tokenizer and model from Hugging Face
tokenizer = KronosTokenizer.from_pretrained("NeoQuasar/Kronos-Tokenizer-base")
model = Kronos.from_pretrained("NeoQuasar/Kronos-small")
predictor = KronosPredictor(model, tokenizer, max_context=512)

# x_df: OHLCV context window; x_ts / y_ts: timestamps of the
# context bars and of the bars to forecast
pred_df = predictor.predict(
    df=x_df,
    x_timestamp=x_ts,
    y_timestamp=y_ts,
    pred_len=120
)
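The snippet above assumes `x_df`, `x_ts`, and `y_ts` already exist. One minimal way to build them with synthetic data is shown below; the column names (open/high/low/close/volume) follow the repo's examples, but verify them against your installed version.

```python
import numpy as np
import pandas as pd

# Build a toy OHLCV frame for a 512-bar context window plus
# 120 bars of forecast timestamps, on a 5-minute grid.
lookback, pred_len = 512, 120
idx = pd.date_range("2024-01-01", periods=lookback + pred_len, freq="5min")

rng = np.random.default_rng(1)
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.001, lookback)))
x_df = pd.DataFrame({
    "open":   close * (1 + rng.normal(0, 0.0005, lookback)),
    "high":   close * 1.001,
    "low":    close * 0.999,
    "close":  close,
    "volume": rng.integers(1_000, 10_000, lookback).astype(float),
})

x_ts = pd.Series(idx[:lookback])   # timestamps of the context bars
y_ts = pd.Series(idx[lookback:])   # timestamps to forecast
print(len(x_ts), len(y_ts))
```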

Three model sizes are available on Hugging Face:

  • Kronos-mini (4.1M params, context 2048) — lightweight, CPU-friendly
  • Kronos-small (24.7M params, context 512) — general prediction
  • Kronos-base (102.3M params, context 512) — highest accuracy

All under MIT license.

Fine-Tuning

Fine-tuning scripts were released in August 2025:

# Tokenizer fine-tuning
python finetune_tokenizer.py --data_path ./data/your_market/ --epochs 50

# Model fine-tuning (multi-GPU)
torchrun --nproc_per_node=4 finetune_model.py \
    --model_name NeoQuasar/Kronos-base \
    --data_path ./data/your_market/ \
    --distributed

The release also includes Qlib-based examples for China A-shares and a built-in backtesting pipeline.
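As a sanity check independent of Qlib, predictions can be scored with a minimal vectorized long/short backtest: go long when the predicted next close is above the current close, short otherwise, and tally realized returns. This is an illustrative sketch, not the repo's pipeline; `predicted_close` below is a synthetic stand-in for `pred_df["close"]`.

```python
import numpy as np
import pandas as pd

# Minimal vectorized backtest sketch (NOT the repo's Qlib
# pipeline). Synthetic price path and a noisy stand-in for
# the model's predicted closes.
rng = np.random.default_rng(2)
actual_close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 250))))
predicted_close = actual_close.shift(-1) + rng.normal(0, 0.5, 250)

signal = np.sign(predicted_close - actual_close)   # +1 long, -1 short
next_ret = actual_close.pct_change().shift(-1)     # realized next-bar return
strategy_ret = (signal * next_ret).dropna()

print(f"hit rate: {(strategy_ret > 0).mean():.2%}")
print(f"cum return: {(1 + strategy_ret).prod() - 1:.2%}")
```

In a real evaluation you would also subtract transaction costs and slippage, which can easily erase a thin statistical edge.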

Limitations to Know

  • Context length is 512 tokens for small/base models
  • Cannot predict black swan events or regime shifts
  • Kronos-large (499M params) remains private
  • It's a probabilistic tool, not a profit guarantee

Disclaimer: Not financial advice. Kronos is a research and analytical tool.
