TL;DR
- Korea's equity market is having a moment, and TOSS Securities recently opened an Open API — a rare, developer-friendly on-ramp for retail quants.
- Microsoft's Qlib is the best open-source "AI research + backtest" quant platform, but it does not officially support the Korean market.
- So I built a small Node.js/TypeScript + Redis middleware that pulls quotes from the TOSS Open API, normalizes them into Qlib's CSV convention, and feeds dump_bin.py.
- I also wrote a Korean-language "Qlib Getting Started" guide for Korean developers, including a full KRX data-integration section.
- Next up: a Korea-specialized middleware that also ingests secondary data (corporate disclosures / DART filings, etc.) and is reusable across trading bots — not just Qlib.
Why now? The Korean market opportunity
Korea's stock market has been unusually active lately, and for developers the timing is interesting for one specific reason: TOSS Securities opened an Open API. Historically, Korean retail brokerage automation meant wrestling with legacy Windows-only OCX/COM bridges. A clean, OAuth2-based HTTP API changes the game — it means you can build data pipelines and trading tooling on any stack, on any OS.
Meanwhile, the best open-source quant research stack — Microsoft Qlib — has no first-class Korea support. Its region setting only covers CN / US / TW. That gap is exactly where a middleware belongs.
Qlib is a calculator, not an oracle. No framework saves you from bad data or sloppy methodology. But if the data plumbing is clean, the research loop gets a lot faster.
What is Qlib, quickly
Qlib is Microsoft Research's AI-oriented quantitative investment platform (open-sourced 2020, ~40k+ GitHub stars). It covers the full ML pipeline — data → factor computation → model training → backtest → reporting — in one framework.
A few things that make it stand out:
- All-in-one pipeline. No more gluing zipline (backtest) + backtrader (execution) + a separate factor library.
- Purpose-built data infra. A binary storage format plus a two-tier cache (ExpressionCache + DatasetCache). In Microsoft's own benchmark (800 symbols × 14 factors, 2007–2020 daily, 1 CPU), the fully-cached path runs in 7.4s vs. 365s for MySQL — roughly 49× faster.
- Expression-based factor engine. Define a factor as a string like Ref($close, 1)/$close - 1 and the engine handles vectorization + caching for you.
- A reproducible Model Zoo. 25+ SOTA models (LightGBM, GRU, ALSTM, Transformer, TRA, TFT…) on the same Alpha158 / Alpha360 datasets, comparable under identical backtest conditions.
- Non-stationarity tooling. Rolling retraining and DDG-DA (meta-learning for concept drift) ship as benchmarks — a Qlib-specific strength.
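To make the expression string from the list above concrete, here is a minimal pure-Python sketch of what Ref($close, 1)/$close - 1 evaluates to on a single price series. It only mimics the semantics (Ref($close, 1) is the previous bar's close); it is not Qlib's engine, and the helper names `ref` and `prev_close_ratio` are made up for illustration. Note that the expression divides yesterday's close by today's, so it is not the usual close-to-close return.

```python
from typing import List, Optional


def ref(series: List[float], n: int) -> List[Optional[float]]:
    """Return the series lagged by n bars (n >= 0); the first n slots have no value."""
    if n < 0:
        raise ValueError("this sketch only handles lags, n >= 0")
    lag = min(n, len(series))
    return [None] * lag + list(series[: len(series) - lag])


def prev_close_ratio(close: List[float]) -> List[Optional[float]]:
    """Evaluate Ref($close, 1) / $close - 1 element by element."""
    lagged = ref(close, 1)
    return [None if p is None else p / c - 1 for p, c in zip(lagged, close)]


close = [100.0, 110.0, 99.0]  # illustrative closes, oldest first
values = prev_close_ratio(close)
print(values)  # first entry is None: there is no previous bar to reference
```

In Qlib itself you would pass the string to the data layer and let it vectorize and cache the result; the loop above is only there to show the arithmetic.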
The one thing Qlib deliberately leaves out: live broker order execution. That's out of scope by design — which matters for how I scoped the middleware below.
The Korean-developer gap: a KR "Getting Started" guide
Since Qlib's docs and community are largely CN/EN-centric, I wrote a Korean-language getting-started guide aimed at Python developers standing up a quant/ML backtest environment for the first time.
It covers:
- Project overview, core strengths, and an honest comparison vs. zipline / backtrader / vectorbt / QuantConnect.
- Install paths (pip / source / Docker), including the Apple Silicon brew install libomp gotcha for LightGBM.
- The 2026 data reality: the official download script is paused; the guide points to the community investment_data dataset instead.
- First workflow with qrun, a code-based custom workflow, and the expression engine.
- Benchmarks (Alpha158 vs Alpha360, DDG-DA dynamic adaptation).
- A full "Korean developer" section: wiring KRX data into Qlib.
- Pitfalls — install, data quality, and methodology (look-ahead bias, transaction cost, overfitting, "IC 0.05 is a starting point, not a good number").
Guide (Korean): Qlib-getting-started-KR.md
Connecting KRX data to Qlib
Qlib doesn't officially support Korea, but its dump_bin.py only needs CSV. So the recipe is: collect OHLCV → write CSV in Qlib's convention → convert to Qlib binary.
# pip install pykrx
import os

from pykrx import stock

os.makedirs("csv_kr", exist_ok=True)
tickers = stock.get_market_ticker_list(market="KOSPI")
for t in tickers[:50]:  # first 50 KOSPI tickers, to keep the demo small
    df = stock.get_market_ohlcv("20180101", "20260630", t)
    # pykrx returns Korean column names; map them to Qlib's CSV convention
    df = df.reset_index().rename(columns={
        "날짜": "date", "시가": "open", "고가": "high",
        "저가": "low", "종가": "close", "거래량": "volume",
    })
    df["symbol"] = t
    df["factor"] = 1.0  # Qlib adjust-price factor; 1.0 if unadjusted
    df.to_csv(f"csv_kr/{t}.csv", index=False)

python scripts/dump_bin.py dump_all \
--csv_path ./csv_kr \
--qlib_dir ~/.qlib/qlib_data/kr_data \
--include_fields open,close,high,low,volume,factor \
--date_field_name date --symbol_field_name symbol
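Before running the conversion, it can help to sanity-check each per-symbol CSV, since a malformed file is much easier to catch here than after it has been turned into binary. The sketch below is one possible check, not part of Qlib or the guide: the function name `check_qlib_csv` and the specific rules (required columns, strictly ascending ISO dates, high/low bracketing open/close) are my assumptions about what "Qlib's convention" needs for the recipe above.

```python
import csv
import io
from typing import List

# Columns written by the export recipe above.
REQUIRED_COLUMNS = ("date", "open", "high", "low", "close", "volume", "symbol", "factor")


def check_qlib_csv(text: str) -> List[str]:
    """Return human-readable problems found in one per-symbol CSV (empty list means OK)."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems: List[str] = []
    previous_date = ""
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        date = row["date"]
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if date <= previous_date:
            problems.append(f"line {line_no}: date {date} is duplicated or out of order")
        previous_date = date
        try:
            o, h, l, c = (float(row[k]) for k in ("open", "high", "low", "close"))
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: non-numeric or missing price")
            continue
        if h < max(o, c) or l > min(o, c):
            problems.append(f"line {line_no}: high/low do not bracket open/close")
    return problems


HEADER = "date,open,high,low,close,volume,symbol,factor\n"
good_csv = HEADER + "2024-01-02,100,105,99,104,1000,005930,1.0\n2024-01-03,104,106,103,105,1200,005930,1.0\n"
bad_csv = HEADER + "2024-01-03,104,106,103,105,1200,005930,1.0\n2024-01-02,100,101,99,104,1000,005930,1.0\n"
print(check_qlib_csv(good_csv))
print(check_qlib_csv(bad_csv))
```

The sample rows are illustrative numbers, not real quotes. In practice you would read each file in csv_kr/ and pass its text to the check.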
Korea-specific checklist (this is where naive ports break):
| Item | Why it matters |
|---|---|
| Adjusted price (factor) | Splits/dividends must be reflected or returns get distorted. |
| Trading rules | REG_CN applies China's ±10% limit and T+1; Korea is ±30% and T+0. Customize the executor. |
| Delisting / halts | Survivorship bias: ideally include delisted names. |
| Calendar | Verify the Korean trading-holiday calendar is generated. |
| Alpha158 factors | Factor definitions are market-agnostic, but re-validate them on Korean data: a good IC on CSI300 doesn't guarantee one on KOSPI. |
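For the adjusted-price row of the checklist, here is a small sketch of how a factor column could be derived from split events instead of hard-coding 1.0. It assumes Qlib's convention that factor = adjusted price / original price, handles splits only (dividends are ignored), and uses a hypothetical helper `split_factors` with made-up prices.

```python
from typing import Dict, List


def split_factors(dates: List[str], splits: Dict[str, float]) -> List[float]:
    """Backward-adjustment factors for ascending dates.

    splits maps an ex-date to its ratio (new shares per old share, e.g. 5.0 for 5:1).
    Rows on or after the latest split keep factor 1.0; earlier rows are scaled down.
    """
    factors = [1.0] * len(dates)
    running = 1.0
    for i in range(len(dates) - 1, -1, -1):  # walk from newest to oldest
        factors[i] = running
        if dates[i] in splits:
            running /= splits[dates[i]]  # every row older than the ex-date is scaled
    return factors


# Illustrative data only: a 5:1 split that takes effect on 2024-01-04.
dates = ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]
raw_close = [500.0, 510.0, 102.0, 103.0]
splits = {"2024-01-04": 5.0}

factors = split_factors(dates, splits)
adjusted_close = [price * f for price, f in zip(raw_close, factors)]
print(factors)
print(adjusted_close)
```

With the factor applied, the pre-split closes land on the same scale as the post-split ones, so the split day no longer looks like an 80% crash in the return series.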
The middleware: TOSS Open API → Qlib
Rather than cram OAuth2, token expiry, rate limits, and pagination into the same Python codebase that does factor research, I split concerns. A separate middleware drops normalized CSVs; Qlib just consumes them.
TOSS Open API --OAuth2--> [Node.js/TS middleware] --CSV(csv_kr/*.csv)--> scripts/dump_bin.py --> ~/.qlib/qlib_data/kr_data
                                     |
                  Redis (token cache + market data cache)
Middleware (English README): toss-qlib-middleware/README_EN.md
Authentication (confirmed spec)
| Item | Detail |
|---|---|
| Flow | OAuth2 Client Credentials Grant (no user login step) |
| Token issuance | POST {TOSS_BASE_URL}/oauth2/token with grant_type / client_id / client_secret as a form-urlencoded body (not Basic Auth) |
| Lifetime | 86,400s (24h), no refresh token — you must re-issue with the client secret before expiry |
| Call header | Authorization: Bearer {access_token} |
| Account/order APIs | need an extra X-Tossinvest-Account header (not called here) |
I verified the endpoint by actually hitting POST /oauth2/token: even with wrong credentials it returns a real {"error":"invalid_client", ...}, which confirms the path and the request shape. (As of mid-2026 the service is still in a pre-registration phase, so the candle/price field schemas are unconfirmed and the middleware treats them defensively.)
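The token call described in the table can be sketched with nothing but the standard library. This is written in Python for consistency with the rest of the post (the middleware itself is TypeScript) and only builds the request without sending it. The function names are hypothetical, the base URL is a placeholder, and grant_type=client_credentials is the standard OAuth2 value for this flow rather than something I am quoting from the TOSS docs.

```python
import urllib.parse
import urllib.request
from typing import Dict


def build_token_request(base_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) the Client Credentials token request."""
    # Credentials go in a form-urlencoded body, not in a Basic Auth header.
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/oauth2/token",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def bearer_header(access_token: str) -> Dict[str, str]:
    """Header to attach to market-data calls once a token has been issued."""
    return {"Authorization": f"Bearer {access_token}"}


# Placeholder values; the real base URL comes from TOSS_BASE_URL in your .env.
request = build_token_request("https://example.invalid", "my-client-id", "my-client-secret")
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing the request to urllib.request.urlopen and reading the access token from the JSON response, whose exact field names I have not confirmed here.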
Redis caching strategy
| Cached item | Key | TTL | Reason |
|---|---|---|---|
| Access token | toss:access_token | 86400 − safety margin | No refresh token → re-issue well before expiry |
| Token refresh lock | toss:access_token:lock | 10s (SET NX) | Stops a thundering herd of simultaneous re-issues |
| Finalized past candles | toss:candles:{symbol}:{interval}:{start}:{end} | 1 day | Closed candles never change |
| Today's candles | same key | 30s | Values keep updating intraday |
| Current price | toss:price:{symbol} | 5s | Fresh, per-symbol so batches reuse hits |
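The first two rows of the table (token cache plus refresh lock) fit together roughly like the sketch below. It is Python rather than the middleware's TypeScript, and it swaps Redis for a tiny in-memory stand-in so it runs anywhere. `InMemoryStore`, `get_token`, and the 300-second safety margin are my assumptions; the post only says "86400 − safety margin".

```python
import time
from typing import Callable, Dict, Optional, Tuple

TOKEN_KEY = "toss:access_token"
LOCK_KEY = "toss:access_token:lock"
TOKEN_LIFETIME_S = 86_400
SAFETY_MARGIN_S = 300  # assumption: the source does not state the margin
LOCK_TTL_S = 10


class InMemoryStore:
    """Stand-in for Redis with just GET, SET (EX, optional NX), and DEL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)  # expired entries behave as missing
            return None
        return entry[0]

    def set(self, key: str, value: str, ex: int, nx: bool = False) -> bool:
        if nx and self.get(key) is not None:
            return False  # NX: only set when the key does not exist
        self._data[key] = (value, self._clock() + ex)
        return True

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def get_token(store: InMemoryStore, issue: Callable[[], str]) -> Optional[str]:
    """Return the cached token, or re-issue it if this caller wins the lock.

    Returns None when another caller holds the lock; wait briefly and retry.
    """
    token = store.get(TOKEN_KEY)
    if token is not None:
        return token
    if not store.set(LOCK_KEY, "1", ex=LOCK_TTL_S, nx=True):
        return None
    try:
        token = issue()
        # Expire the cache entry before the token itself expires.
        store.set(TOKEN_KEY, token, ex=TOKEN_LIFETIME_S - SAFETY_MARGIN_S)
        return token
    finally:
        store.delete(LOCK_KEY)
```

The lock has its own short TTL so a crashed refresher cannot block everyone else for longer than ten seconds. With a real Redis client the same idea maps onto SET with the NX and EX options.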
On 401 the cache is invalidated and the request retried once; on 429 it backs off using the Retry-After header. The candles endpoint returns at most 200 rows and has no start/end filter, so the middleware paginates backward with a before cursor, then returns the merged result sorted ascending.
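The backward pagination described above could look roughly like this. It is a Python sketch of the logic, not the middleware's TypeScript code, and it rests on assumptions: `fetch_page(before)` is a hypothetical callable that returns up to 200 candles strictly older than the `before` cursor (or the newest ones when it is None), and each candle carries an ISO `date` field, which is my guess at a normalized shape rather than the confirmed TOSS schema.

```python
from typing import Callable, Dict, List, Optional

PAGE_LIMIT = 200  # the endpoint returns at most 200 rows per call

Candle = Dict[str, object]  # assumed shape: {"date": "YYYY-MM-DD", ...}
FetchPage = Callable[[Optional[str]], List[Candle]]


def fetch_range(fetch_page: FetchPage, start: str, end: str) -> List[Candle]:
    """Page backward from the newest candle until `start` is covered.

    The endpoint has no start/end filter, so filtering happens client-side.
    """
    collected: Dict[str, Candle] = {}
    before: Optional[str] = None
    while True:
        page = fetch_page(before)
        if not page:
            break
        for candle in page:
            day = str(candle["date"])
            if start <= day <= end:
                collected[day] = candle  # keyed by date, so overlaps de-duplicate
        oldest = min(str(c["date"]) for c in page)
        if oldest <= start or len(page) < PAGE_LIMIT:
            break  # reached the requested start, or the history ran out
        if before is not None and oldest >= before:
            break  # cursor did not move; stop instead of looping forever
        before = oldest
    return [collected[day] for day in sorted(collected)]  # merged, ascending
```

A usage example would pass a function that calls the upstream candles endpoint with the cursor. Because each page costs one upstream call, caching the merged result under the per-range key from the table avoids repeating the walk.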
API surface
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /api/candles/:symbol?start=&end=&interval=day | Normalized candle JSON (Redis-cached, before pagination) |
| GET | /api/prices?symbols=005930,000660 | Batch current-price lookup (chunked at 200) |
| POST | /api/export/qlib {symbols, start, end, outDir?} | Fetch symbols → write csv_kr/{symbol}.csv |
Or skip the server and export CSV straight from the CLI:
npm run export:qlib -- --symbols 005930,000660 --start 2020-01-01 --end 2026-07-01
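As a small illustration of the "chunked at 200" note on the batch price route, this is one way a long symbol list could be split into comma-joined groups. It is a Python sketch with hypothetical helper names (`chunked`, `price_queries`); the de-duplication step is my own addition, not something the post states.

```python
from typing import Iterator, List, Sequence

MAX_SYMBOLS_PER_CALL = 200  # batch size named in the API surface table


def chunked(symbols: Sequence[str], size: int = MAX_SYMBOLS_PER_CALL) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` symbols."""
    if size <= 0:
        raise ValueError("size must be positive")
    for i in range(0, len(symbols), size):
        yield list(symbols[i:i + size])


def price_queries(symbols: Sequence[str]) -> List[str]:
    """Build one comma-joined symbols value per chunk, dropping duplicates in order."""
    unique = list(dict.fromkeys(symbols))
    return [",".join(group) for group in chunked(unique)]


print(price_queries(["005930", "000660", "005930"]))
```

Each returned string would become the value of the symbols query parameter for one upstream call, and because current prices are cached per symbol, repeated symbols across batches can still reuse cache hits.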
Quick start
cd TechDoc/Quant_Qlib/toss-qlib-middleware
npm install
npm run setup # interactively creates .env, optionally test-issues a real token
npm run typecheck
npm test # passes WITHOUT Redis (in-memory adapter validates the logic)
npm run dev # http://localhost:4000, requires a real Redis instance
Why trading (order execution) is intentionally out of scope
Auth and market-data retrieval are common needs that look nearly identical for everyone — perfect for shared middleware. Order logic (state tracking, dedupe-on-retry, risk limits, fill confirmation) varies completely by strategy and risk tolerance, so shipping it generically would be irresponsible. Qlib itself leaves live execution out too, and a backtest never guarantees live performance. The middleware exposes TossAuthService + TossApiClient as clean extension points if you want to add orders yourself — but test with tiny/paper trades first.
What's next: secondary data + bot-agnostic
The current middleware stops at price/candle data. The roadmap is a Korea-specialized middleware that also ingests secondary data — corporate disclosures and filings (e.g. DART), so strategies can react to events, not just prices.
And crucially: this middleware won't be Qlib-only. The normalization layer is generic enough that the same authenticated, cached, rate-limit-aware data feed can back any trading bot — Qlib is just the first consumer. Think of it as a reusable Korea-market data plane: one integration, many downstream engines.
If you're building anything on the Korean market with Python or TypeScript, I'd love feedback on which secondary datasets matter most to you.
Links
- Microsoft Qlib — https://github.com/microsoft/qlib
- Korean "Getting Started" guide — https://github.com/gameworkerkim/vibe-investing/blob/main/TechDoc/Quant_Qlib/Qlib-getting-started-KR.md
- Middleware (English README) — https://github.com/gameworkerkim/vibe-investing/blob/main/TechDoc/Quant_Qlib/toss-qlib-middleware/README_EN.md
- TOSS Securities Open API docs — https://developers.tossinvest.com/docs
Not investment advice. This is data-pipeline tooling; investment decisions and their consequences are your own.