I built a free Morningstar X-Ray replacement for European/US ETF investors

#opensource #python #showdev #sideprojects

When Morningstar paywalled their Portfolio X-Ray tool in 2025 ($249/year), European investors got hit the hardest. Most alternatives only accept US tickers. If you held iShares, Vanguard, Xtrackers, or Synchrony funds identified by ISIN, you had basically no options.

So I built a free replacement: nwc-advisory.com/xray

What it does

Enter your fund ISINs and amounts (no login, no account). It fetches public factsheet data, decomposes each fund into underlying holdings, and shows 6 analysis views:

Asset Allocation -- before and after look-through
Geographic Exposure -- economic exposure, not fund domicile (your IE-domiciled S&P 500 ETF shows as US, not Ireland)
Sector Breakdown -- GICS sectors across your entire portfolio
Stock Overlap -- stocks appearing in 2+ funds, with fund similarity percentages
Fee Analysis -- weighted TER vs cheapest passive alternative, with annual savings in your currency
Top Holdings -- your 30 largest underlying positions
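
To make the look-through and fee views concrete, here is a minimal sketch of the arithmetic behind them. It is illustrative only, not the tool's actual code: the function names, the dict-based data shapes, and the use of fractional weights and TERs (0.0007 for 0.07%) are all assumptions.

```python
from collections import defaultdict


def look_through(portfolio, fund_holdings):
    """Spread each fund's invested amount across its underlying holdings.

    portfolio:     {isin: amount invested}              (hypothetical shape)
    fund_holdings: {isin: {holding name: weight 0..1}}  (hypothetical shape)
    """
    exposure = defaultdict(float)
    for isin, amount in portfolio.items():
        for name, weight in fund_holdings.get(isin, {}).items():
            exposure[name] += amount * weight
    return dict(exposure)


def top_holdings(portfolio, fund_holdings, n=30):
    """Return the n largest underlying positions as (name, amount) pairs."""
    exposure = look_through(portfolio, fund_holdings)
    return sorted(exposure.items(), key=lambda item: item[1], reverse=True)[:n]


def weighted_ter(positions):
    """Amount-weighted average TER for a list of (amount, ter) pairs."""
    total = sum(amount for amount, _ in positions)
    if total == 0:
        return 0.0
    return sum(amount * ter for amount, ter in positions) / total
```

For example, 10,000 in a fund charging 0.07% plus 5,000 in a fund charging 0.20% costs 7 + 10 = 17 per year, a weighted TER of 17 / 15,000, or roughly 0.113%. The annual saving against a cheaper alternative is the portfolio value multiplied by the difference between the two TERs.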

Tech stack

Backend: Python/FastAPI
Data: 13 async factsheet fetchers (iShares CSV exports, JustETF, FT.com, Gerifonds PDF parsing, and 9 more)
Cache: 24h disk-backed JSON + in-memory LRU (500 entries). Popular (already cached) funds return in under 100ms; cold fetches take 15-30s
PDF: ReportLab for 6-page PDF reports (email-gated)
Frontend: Vanilla HTML/CSS/JS, no framework
Hosting: Single Ubuntu server, systemd service, nginx reverse proxy
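
The cache line above describes two tiers: a 24h disk-backed JSON store and a 500-entry in-memory LRU. A minimal sketch of that pattern follows. The class name, the one-file-per-ISIN layout, and the record format are assumptions for illustration; the real implementation may differ.

```python
import json
import time
from collections import OrderedDict
from pathlib import Path
from typing import Optional


class FactsheetCache:
    """Hypothetical two-tier cache: in-memory LRU in front of JSON files."""

    def __init__(self, directory: Path, ttl: float = 24 * 3600, max_entries: int = 500):
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)
        self.ttl = ttl
        self.max_entries = max_entries
        self.memory = OrderedDict()  # isin -> (stored_at, data)

    def _path(self, isin: str) -> Path:
        # Assumes ISINs are plain alphanumeric strings, so they are safe as file names.
        return self.directory / f"{isin}.json"

    def _remember(self, isin: str, entry: tuple) -> None:
        # Mark as most recently used, then evict the least recently used overflow.
        self.memory[isin] = entry
        self.memory.move_to_end(isin)
        while len(self.memory) > self.max_entries:
            self.memory.popitem(last=False)

    def get(self, isin: str, now: Optional[float] = None) -> Optional[dict]:
        now = time.time() if now is None else now
        entry = self.memory.get(isin)
        if entry is None and self._path(isin).exists():
            # Memory miss: fall back to the disk tier.
            record = json.loads(self._path(isin).read_text())
            entry = (record["stored_at"], record["data"])
        if entry is None or now - entry[0] > self.ttl:
            return None  # missing or stale; the caller should refetch
        self._remember(isin, entry)
        return entry[1]

    def put(self, isin: str, data: dict, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._path(isin).write_text(json.dumps({"stored_at": now, "data": data}))
        self._remember(isin, (now, data))
```

Writing through to disk on every `put` means a restart only loses the memory tier, so the slow 15-30s fetch is paid at most once per fund per TTL window.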

The interesting technical bits

Name normalization was harder than expected. The same company appears as "Nestle SA", "Nestle AG", "NESTLE S.A.", and "Nestlé" across different data sources. I strip accents, remove corporate suffixes (AG, SA, Ltd, Inc, Corp, PLC, Group, Holdings), and normalize whitespace before deduplicating.
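
A minimal sketch of this kind of normalization, using only the standard library. It is an illustration rather than the production code: the function name is hypothetical, and lowercasing, dropping punctuation, and stripping suffixes only from the end of the name are assumptions beyond what is described above.

```python
import re
import unicodedata

# Corporate suffixes listed in the post, lowercased for comparison.
SUFFIXES = {"ag", "sa", "ltd", "inc", "corp", "plc", "group", "holdings"}


def normalize_name(name: str) -> str:
    # Decompose accented characters, then drop the combining marks ("é" -> "e").
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and remove punctuation so "S.A." becomes "sa".
    cleaned = re.sub(r"[^\w\s]", "", unaccented.lower())
    # split() also collapses runs of whitespace.
    tokens = cleaned.split()
    # Strip trailing suffixes, but never reduce the name to nothing.
    while len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)
```

With this sketch, "Nestle SA", "Nestle AG", "NESTLE S.A.", and "Nestlé" all collapse to the same key, which can then be used for deduplication.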

Fund-pair overlap uses a Jaccard-like calculation on the intersection of underlying holdings, weighted by position size. This tells you "your MSCI World and S&P 500 ETFs are 78% identical" -- meaning you're paying two sets of fees for nearly the same exposure.
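
One common way to weight a Jaccard-style overlap by position size is the weighted Jaccard (Ruzicka) similarity: the sum of the smaller weight for each holding divided by the sum of the larger weight. The sketch below uses that formula as an assumption; the post does not spell out the exact calculation, and the function name and input shape are hypothetical.

```python
def weighted_overlap(fund_a, fund_b):
    """Weighted Jaccard similarity between two funds.

    Each fund is a dict of {normalized holding name: portfolio weight}.
    Returns a value between 0.0 (no shared holdings) and 1.0 (identical).
    """
    names = set(fund_a) | set(fund_b)
    # For each holding, the shared part is the smaller of the two weights.
    shared = sum(min(fund_a.get(n, 0.0), fund_b.get(n, 0.0)) for n in names)
    # The union part is the larger of the two weights.
    union = sum(max(fund_a.get(n, 0.0), fund_b.get(n, 0.0)) for n in names)
    return shared / union if union else 0.0
```

Because the weights matter, two funds that share hundreds of tiny positions but differ in their largest ones score lower than a plain set-based Jaccard would suggest.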

ISIN-based routing determines which fetchers to try first. CH-prefixed ISINs go to Swiss sources (Gerifonds, SwissFundData), IE/LU ISINs go to JustETF and iShares, and everything falls back through the 9 legacy fetchers.
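
The routing can be sketched as a lookup on the two-letter country prefix of the ISIN. The fetcher identifiers below are placeholder strings, and the legacy list is shortened to two hypothetical entries; only the prefix-to-source mapping comes from the description above.

```python
# Preferred sources per ISIN country prefix, as described in the post.
PREFERRED = {
    "CH": ["gerifonds", "swissfunddata"],
    "IE": ["justetf", "ishares"],
    "LU": ["justetf", "ishares"],
}

# Placeholder names standing in for the legacy fetchers.
LEGACY = ["legacy_fetcher_1", "legacy_fetcher_2"]


def fetcher_order(isin: str) -> list:
    """Return fetcher names in the order they should be tried."""
    prefix = isin[:2].upper()
    preferred = PREFERRED.get(prefix, [])
    # Preferred sources first, then every legacy fetcher not already listed.
    return preferred + [name for name in LEGACY if name not in preferred]
```

A caller would walk this list and stop at the first fetcher that returns usable holdings, so an unknown prefix simply goes straight to the fallback chain.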

What I couldn't replicate

Morningstar's style box (3x3 value/growth/blend grid) -- proprietary classification
P/E and P/B ratios -- would need a real-time data feed
Performance tracking -- out of scope for a free tool

Also: Property Comps API

Separately, I built an API serving 4.2M+ government-recorded property transactions across 11 markets (UK, France, Dubai, Singapore, NYC, and 6 more). If you're building anything in proptech:

Swagger docs: api.nwc-advisory.com/docs
Free tier available

Happy to answer questions about the architecture or data sources.

Top comments (2)

New Way Capital Advisory • Mar 23

If you want to try it, here are some ISINs to get started:

IE00B5BMR087 (iShares Core S&P 500)
IE00B4L5Y983 (iShares Core MSCI World)
CH0237935637 (iShares Swiss Dividend ETF)

Paste them at nwc-advisory.com/xray with any amounts. Takes about 15 seconds for the first analysis (cached after that).

Happy to answer any questions about the tech or the data sources.

Apex Stack • Mar 24

The name normalization problem is one of the most underrated challenges in financial data pipelines. We hit the same wall building a stock/ETF platform that pulls from yfinance and Finnhub — the same company shows up as "Shopify Inc", "Shopify Inc.", "SHOPIFY INC", and "Shopify" depending on the source and the day. Your strip-accents + suffix normalization approach is cleaner than what we ended up with (a manual synonym table for the top 500 tickers, which scales terribly).

The European underserved angle is interesting — our GSC data for stockvs.com shows Dutch pages (NL locale) consistently outperforming English for certain tickers, which suggests European investors are genuinely underserved by English-language financial data tools. Building ISIN-first rather than ticker-first is the right call for that audience.

The 15-30s cold start vs 100ms cached tradeoff is the exact tension in financial data APIs. Our yfinance batch pipeline has the same shape: first-hit is slow because it is pulling live data, everything after is cached for the session. One thing we found useful: a background pre-warm job that refreshes the cache for the top 500 most-requested tickers before market open, so the first users of the day never see the cold-start latency.

The fund overlap Jaccard calculation is a useful feature — the "you are paying two expense ratios for 78% the same exposure" framing is exactly the kind of insight that makes a financial tool feel worth trusting.