DEV Community

Jim L


How I built a LOF arbitrage monitor for HK/CN ETFs (and what I learned about 'free' alpha)

I keep seeing the same question in HK/SG investor chats: "the S&P 500 QDII ETF is trading 5% above NAV again — is this free money?"

Short answer: not really. But the idea — that on-exchange ETF prices can drift from their net asset value — is real enough that I wanted a dashboard that just told me, every 15 minutes, which Chinese LOF/QDII ETFs were trading most disconnected from the underlying. So I built one.

This is the boring-but-useful write-up: what a LOF is, why premiums happen, what the pipeline looks like, and the three things I got wrong on the first try.

What's a LOF, quickly

LOF = Listed Open-Ended Fund. It's a mutual fund wrapper that also trades on-exchange. QDII LOFs are the ones that hold offshore assets — S&P 500, Nasdaq, HK tech, gold miners, etc.

The premium/discount mechanic:

  • NAV is published once a day (T+1 for offshore QDII — you get yesterday's value tomorrow morning).
  • On-exchange price moves live during the trading day.
  • When retail piles into, say, 华夏纳斯达克 (ChinaAMC's Nasdaq QDII) after a big US overnight rally, the price can float well above the last-known NAV. That gap is the premium.
  • In theory, the fund house can issue new units to arb it down. In practice, QDII quotas are capped, so premiums can persist for days.

So: premium ≠ free profit. It's mostly "the market is front-running tomorrow's NAV update." But unusual premiums are worth watching, because that's where forced-selling and fat-finger trades show up.
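In code, the number the dashboard sorts on is just a ratio; a minimal sketch (function name is mine):

```python
def premium(price: float, nav_estimate: float) -> float:
    """On-exchange premium vs intra-day estimated NAV, as a fraction."""
    return price / nav_estimate - 1.0

# e.g. trading at 1.05 against an estimated NAV of 1.00
print(f"{premium(1.05, 1.00):.1%}")  # → 5.0%
```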

The pipeline

Stack ended up boringly simple. Four moving parts:

```
Eastmoney REST  ─┐
                 ├─► Python collector (every 15 min, cron)
Tiantian REST  ──┘         │
                           ▼
                     SQLite (append-only)
                           │
                           ▼
                Next.js /tools/lof-premium (ISR 15min)
```

No Kafka, no Redis, no Airflow. It's a 200-line Python script and a static-ish Next.js page.

Collector

The collector is two functions:

```python
import json
import requests

def secid(code: str) -> str:
    # push2 wants a market-prefixed id: 0 = Shenzhen (16xxxx LOFs), 1 = Shanghai
    return ("0." if code.startswith("16") else "1.") + code

def fetch_realtime(code: str) -> dict:
    # 东财 (East Money) push2 API: last price + bid/ask
    url = f"https://push2.eastmoney.com/api/qt/stock/get?secid={secid(code)}&fields=f43,f60,f169,f170"
    return requests.get(url, timeout=10).json()["data"]

def fetch_nav(code: str) -> dict:
    # 天天基金 (Tiantian) fundgz API: "估值" (intra-day NAV estimate)
    url = f"https://fundgz.1234567.com.cn/js/{code}.js"
    text = requests.get(url, timeout=10).text
    # response is JSONP: jsonpgz({...}); strip the wrapper, parse the middle
    return json.loads(text[text.index("{"): text.rindex("}") + 1])
```

One gotcha that cost me an afternoon: on weekends and holidays, fundgz returns HTML (a friendly "市场休市" — "market closed" — page) instead of the usual JSONP. The first version crashed every Saturday at 09:15 until I added a content-type check.
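The guard that fixed it can be a pure function over the response, so it's testable without the network (function name is mine, not the API's):

```python
def is_market_closed_page(content_type: str, body: str) -> bool:
    """True when fundgz served its holiday HTML page instead of JSONP."""
    return "html" in content_type.lower() or body.lstrip().startswith("<")
```

The collector calls this on `resp.headers.get("Content-Type", "")` and `resp.text`, and simply skips the snapshot when it returns True.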

Why not just use one source?

East Money gives you the live price but not an intra-day NAV estimate. Tiantian gives you the NAV estimate but not the level-2 price. So you have to join them on the fund code. The cross-check also catches the case where one API starts returning stale data, which happens more often than you'd think.
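A minimal sketch of that join, assuming each feed has already been parsed into a dict keyed by fund code (field names here are mine, not the APIs'):

```python
def join_snapshot(price_rows: dict, nav_rows: dict) -> list[dict]:
    """Inner-join the two feeds on fund code. A code missing from either
    side (failed or stale fetch) is dropped rather than half-recorded."""
    joined = []
    for code, p in price_rows.items():
        n = nav_rows.get(code)
        if n is not None:
            joined.append({"code": code, "price": p["price"], "nav_est": n["nav_est"]})
    return joined
```

Dropping the code entirely beats writing a row with a NULL side: an append-only table with half-rows poisons every downstream average.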

Storage

Single SQLite file, one row per (code, timestamp). Append-only. ~300 funds × 26 snapshots/day × ~250 trading days ≈ 2M rows/year. SQLite eats that for breakfast.
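The whole schema fits in one statement; column names are my own, but the shape is exactly one row per (code, timestamp), with the composite primary key doubling as the dedupe guard:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real thing points at lof.db
conn.execute("""
    CREATE TABLE IF NOT EXISTS snapshot (
        code    TEXT NOT NULL,   -- fund code, e.g. '161130'
        ts_utc  TEXT NOT NULL,   -- ISO-8601 UTC timestamp
        price   REAL NOT NULL,   -- on-exchange last price
        nav_est REAL NOT NULL,   -- intra-day estimated NAV
        PRIMARY KEY (code, ts_utc)
    )
""")
conn.execute("INSERT INTO snapshot VALUES (?, ?, ?, ?)",
             ("161130", "2024-06-03T07:00:00+00:00", 1.05, 1.00))
```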

I briefly tried Postgres. Moved back to SQLite after two weeks because the entire deploy is a file copy and backups are cp lof.db lof.db.bak.

Frontend

Next.js 15, ISR with revalidate: 900. The page is essentially a table sorted by absolute premium, with a tiny sparkline of the last 48 hours per fund.

The sparkline was the part I over-engineered. First I pulled in a charting library (120KB), then I swapped it for a 40-line inline SVG component. Same visual, 3% of the bundle size.

Three things I got wrong on the first try

1. I trusted the "premium" column on 东财

The portal shows a premium column. It's computed off yesterday's official NAV, not the intra-day estimate. For a QDII holding US stocks that rallied 2% overnight, "yesterday's NAV" understates the fund by 2% before the market even opens, so the premium column is systematically inflated on up days and depressed on down days.

Using the estimated NAV instead (the one Tiantian publishes intra-day) cut the noise dramatically. The high-premium list used to be "whatever went up last night in the US." Now it's actually unusual positioning.
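The same 2%-overnight-rally case, in numbers:

```python
# US index rallied 2% overnight; the official NAV is still yesterday's value
t1_nav  = 1.00   # yesterday's published NAV (what the portal column anchors on)
nav_est = 1.02   # intra-day estimate after the 2% overnight move
price   = 1.03   # on-exchange price

portal_premium = price / t1_nav  - 1   # 3.0%: inflated by the stale anchor
actual_premium = price / nav_est - 1   # ~1.0%: the real positioning
```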

2. I assumed 15-minute cadence was fine

It mostly is. But around 09:30 and 14:57 (CN market open / close auction) the price moves 0.5–2% in a single minute. A 15-minute snapshot misses those.

Compromise: 15-min during the day, 1-min windows around open/close auctions. cron with two schedules.
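The two-schedule setup can be sketched as a crontab fragment like this (paths and the `--fine` flag are hypothetical; `CRON_TZ` is a cronie extension that keeps the schedule in Beijing hours even on a UTC box):

```cron
CRON_TZ=Asia/Shanghai

# 15-min cadence through the CN session, Mon-Fri
*/15 9-11,13-14 * * 1-5  /usr/bin/python3 /opt/lof/collect.py

# 1-min windows around the open and close auctions
25-35  9 * * 1-5  /usr/bin/python3 /opt/lof/collect.py --fine
55-59 14 * * 1-5  /usr/bin/python3 /opt/lof/collect.py --fine
```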

3. I forgot time zones, twice

  • Tiantian returns Beijing time with no tz marker.
  • East Money returns Unix timestamps in ms.
  • My server runs UTC.
  • My browser renders in Sydney time.

First bug: charts were off by 8 hours. Second bug: I "fixed" it by hard-coding +8, then flew to Sydney, and everything shifted again.

Final rule: store UTC in SQLite, tag Beijing explicitly at the API boundary, format to the browser's locale in the client. Boring, but it's the only approach that survives moving.
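That rule, as a sketch with the stdlib `zoneinfo` (function names are mine):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

BEIJING = ZoneInfo("Asia/Shanghai")

def tag_beijing(naive: datetime) -> datetime:
    # API boundary: Tiantian timestamps are Beijing wall-clock, no tz marker
    return naive.replace(tzinfo=BEIJING)

def to_utc_iso(dt: datetime) -> str:
    # storage: SQLite only ever sees UTC
    return dt.astimezone(timezone.utc).isoformat()

# the 15:00 Beijing close lands at 07:00 UTC, regardless of server locale
print(to_utc_iso(tag_beijing(datetime(2024, 6, 3, 15, 0))))
# → 2024-06-03T07:00:00+00:00
```

The browser-side half is just `toLocaleString()` on the UTC value; no hard-coded offsets anywhere.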

Does the data actually give you alpha?

Honestly — mostly no. Here's what a month of logs looks like in practice:

  • 80% of the top-premium funds on any given day are just "US market gapped up overnight, retail is buying the reopen." By the time you see it, the arb is gone.
  • 15% are chronic premium funds — usually QDII with exhausted quota. You can't subscribe at NAV even if you wanted to. The premium is a structural access-fee, not mispricing.
  • Maybe 5% are genuinely odd: a small-cap sector LOF that jumped on news nobody else was tracking, or a fund where the manager announced something that moved NAV estimate but not price yet.

That 5% is the reason the dashboard exists. Not as a trading signal on its own, but as a "huh, why is this one weird?" attention filter.

What I'd do differently if I rebuilt it today

  • Push notifications instead of pull. I still refresh the page. A Telegram bot that pings me when premium > 2σ would be 10x more useful.
  • Historical NAV backfill. My DB starts from the day I deployed. If I'd backfilled 2 years from Tiantian's archive, regime comparisons ("is this premium unusual for this fund?") would actually work.
  • Skip the live sparkline. Nobody looks at it. A single "premium now vs 7-day avg" number would convey more.
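For the Telegram-bot idea, the trigger is just a z-score against the fund's own premium history; a sketch with stdlib `statistics` (function name is mine):

```python
import statistics

def premium_is_anomalous(history: list[float], current: float,
                         sigmas: float = 2.0) -> bool:
    """Flag a premium more than `sigmas` std-devs from this fund's history."""
    mu = statistics.fmean(history)
    sd = statistics.stdev(history)
    return sd > 0 and abs(current - mu) > sigmas * sd
```

Comparing each fund against its own history (not a cross-sectional average) is what makes chronic-premium QDII funds stop triggering.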

Summary for the impatient

  • LOF premium = on-exchange price minus intra-day estimated NAV. Don't use the portal's published premium column; it's anchored on T-1 NAV.
  • Two APIs, join on fund code, cross-check. SQLite is enough. 15-min cadence + 1-min around auctions.
  • Most "premiums" are just stale-NAV artifacts or quota constraints. The signal you want is the ~5% of funds that are genuinely priced weird today.
  • Store UTC, tag at the boundary, format at render. Every time.

If you end up building something similar and hit a case I didn't cover — especially around holiday calendars for A-shares vs HK vs US simultaneously — I'd love to compare notes in the comments.
