Yash Joshi

Posted on • Originally published at yashjoshi.cv

Building QuantTrade AI: Where Wall Street Meets Machine Learning 📈

You're watching NVIDIA ($NVDA) trade at $184.86 on Stocktitan, and something just doesn't look right. The technical indicators are bullish: volume is up and every signal is green. But then you hop on Twitter real quick and look up the latest headlines:

"Trump Administration Greenlights H200 Chip Sales to China with 25% Revenue Share"
"Democrats Warn Nvidia Sales Could Boost China's Military Edge"
"China May Reject H200 Imports Despite U.S. Approval"

So... is this good news or bad news for NVIDIA? You could spend 3+ hours reading 10-K filings, cross-referencing different sources, and digging through performance analysis.

Or you could just ask QuantTrade AI, which has already done the work.

Meet QuantTrade AI: Your Trading Research Copilot

Imagine typing into a chat interface:

You: What's the NVIDIA export control situation? How does it affect their revenue?

QuantTrade AI Engine instantly:

  • Fetches NVIDIA's latest SEC filings (10-K, 10-Q, 8-K forms)
  • Chunks them into searchable segments
  • Retrieves the exact sections mentioning export controls and China revenue exposure
  • Cross-references with real-time news sentiment from Alpha Vantage
  • Generates a cited, evidence-backed answer like:

QuantTradeAI Copilot: Based on NVIDIA's Q2 FY2026 10-Q filing (Section: Geographic Revenue Breakdown), China represented 14% of total Data Center revenue in Q1 2026. The January 15, 2026 BIS rule allows H200 exports under a 50% cap relative to U.S. sales, which analysts estimate at ~1M chips. However, news from Reuters (Jan 16) indicates Chinese regulators may block imports. Risk score: Medium-High (35% from regulatory uncertainty, factored via SHAP model).

All while you're watching the TradingView chart update in real-time. No context-switching. No information overload.

🎯 The Secret Sauce: RAG + Explainable AI

Here's where it gets interesting. Most AI trading tools are black boxes: they spit out predictions with zero transparency. We flipped that model.

Multi-Layered RAG Pipeline:

  • Ingest SEC filings and chunk them into 1000-token segments
  • Generate embeddings using sentence-transformers (384-dimensional vectors)
  • Store vectors in pgvector (PostgreSQL extension) for lightning-fast retrieval
  • When you ask a question, we perform semantic search to find the most relevant filing sections
  • Feed those sections, along with the conversation context, to the LLMs (Opus 4.5 and Llama 3.3)
  • Get back answers that cite specific regulatory documents
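
The chunk-and-retrieve steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: `chunk_tokens`, `toy_embed`, and `top_k` are hypothetical names, whitespace-separated words stand in for real tokens, the 100-token overlap is an assumed value, and `toy_embed` is a hashed bag-of-words placeholder for the sentence-transformers model. In the real pipeline the similarity ranking would run inside pgvector rather than in a Python loop.

```python
import hashlib
import math


def chunk_tokens(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` whitespace tokens, overlapping by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    tokens = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text: str, dim: int = 384) -> list[float]:
    """Hypothetical stand-in for a real embedding model: hashed bag of words."""
    vec = [0.0] * dim
    for word in text.lower().split():
        index = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[index] += 1.0
    return vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the question, best first."""
    query_vec = toy_embed(question)
    scored = [(cosine_similarity(query_vec, toy_embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

Calling `top_k("china export controls", chunk_tokens(filing_text))` would return the best-matching segments, which are then passed to the LLM as context. Overlapping windows are one common way to avoid cutting a relevant sentence in half at a chunk boundary.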

Explainable Risk Scoring:
Instead of just saying "this stock is risky," we use SHAP (SHapley Additive exPlanations) to show why:

  • 40% weighted by volatility (NVDA's 30-day realized vol: 28.4%)
  • 30% by maximum drawdown (recent 15% correction from $220 → $185)
  • 20% by beta (NVDA beta: 1.67, above the S&P 500's baseline of 1.0)
  • 10% by momentum indicators (RSI: 54, MACD neutral)

> Risk Score for NVDA: 6.8/10 (Medium-High)
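
To make the weighting concrete, here is a minimal sketch of how a fixed-weight breakdown like the one above could be computed. It is a plain weighted sum, not SHAP itself: SHAP would derive per-feature attributions from a fitted model. The `risk_breakdown` function is a hypothetical name, and the 0-10 sub-scores in the usage note are made-up inputs, since the article does not say how each raw metric is normalized.

```python
# Weights taken from the breakdown above; they should sum to 1.0.
WEIGHTS = {
    "volatility": 0.40,
    "max_drawdown": 0.30,
    "beta": 0.20,
    "momentum": 0.10,
}


def risk_breakdown(sub_scores: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Combine 0-10 sub-scores into a total and a per-factor contribution map."""
    missing = WEIGHTS.keys() - sub_scores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    # Each factor contributes weight * sub-score points to the total.
    contributions = {name: WEIGHTS[name] * sub_scores[name] for name in WEIGHTS}
    total = round(sum(contributions.values()), 2)
    return total, contributions
```

For example, hypothetical sub-scores of 7.0 (volatility), 8.0 (drawdown), 6.0 (beta), and 5.0 (momentum) give contributions of 2.8, 2.4, 1.2, and 0.5 points. Showing those four numbers next to the total is what lets a user see which factor is driving the score.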

Transparency matters when real money is on the line. That's why every prediction comes with a breakdown showing which factors contributed and by how much.

🛠️ Under the Hood: The Stack

  • Backend: Python, FastAPI, SQLAlchemy, PostgreSQL + pgvector, Celery + Redis
  • Frontend: React/Next.js, TypeScript, TradingView Lightweight Charts
  • AI/ML: Llama/Anthropic API, LangChain, Chroma/Pinecone
  • Data: yfinance, Alpha Vantage, SEC EDGAR API, googlefinance, NYSE
  • DevOps: Docker, GitHub Actions

┌─────────────┐
│   Frontend  │  React/Next.js with TradingView-style charts
│  (Web App)  │
└──────┬──────┘
       │
┌──────▼──────────────────┐
│   API Gateway/Backend   │  FastAPI REST API & CI/CD Pipeline
└──────┬──────────────────┘
       │
   ┌───┴───────────────────────────────┐
   │                                   │
┌──▼──────────┐              ┌─────────▼────────┐
│   Data      │              │   RAG/Copilot    │
│  Services   │              │    Service       │
│             │              │                  │
│ - Market    │              │ - Embeddings     │
│   Data      │              │ - Vector Store   │
│ - News      │              │ - LLM Agent      │
│ - Filings   │              │ - Tool Calling   │
└─────────────┘              └──────────────────┘
       │
┌──────▼──────────────────┐
│   Storage Layer         │
│                         │
│ - PostgreSQL (OHLCV)    │
│ - Vector DB (RAG)       │
│ - Object Storage        │
└─────────────────────────┘

💡 The Build Process: Solving the Concurrency Puzzle

The Challenge:

How do you pull real-time stock data, fetch news sentiment, scrape SEC filings, generate embeddings, and run vector searches, all while keeping the UI responsive and query times under a second?

The Naive Approach (That Fails):
Blocking API calls. The user clicks a stock, the backend freezes for 3-5 seconds while it fetches everything sequentially, and the TradingView chart just... sits there spinning.

The Solution:
Asynchronous task orchestration with Celery + Redis:

  • Celery workers handle heavy lifting in the background (fetching news, processing filings, generating embeddings)
  • Lazy-loading of ML models: the embeddings model initializes only when it is first needed (with rate limiting applied), not on every API call
  • Aggressive caching of computed indicators (SMA, RSI, MACD, Bollinger Bands)
  • pgvector's native cosine distance operations for efficient nearest-neighbor search
  • Frontend streams data progressively: charts load immediately, and copilot results stream in as documents are retrieved
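
The difference between the blocking approach and the concurrent one can be shown in a small, self-contained sketch. Note the swap: `asyncio` stands in for Celery + Redis here so the example runs without a broker, and the three `fetch_*` coroutines are hypothetical placeholders that only simulate network latency. `get_embedding_model` illustrates the lazy-loading idea with `functools.lru_cache`; it returns a dummy dict instead of a real model.

```python
import asyncio
import time
from functools import lru_cache


@lru_cache(maxsize=1)
def get_embedding_model() -> dict:
    """Hypothetical lazy loader: the expensive setup runs once, on first use."""
    return {"name": "placeholder-model", "loaded_at": time.monotonic()}


async def fetch_prices(symbol: str) -> dict:
    await asyncio.sleep(0.1)  # simulated network latency
    return {"symbol": symbol, "source": "prices"}


async def fetch_news(symbol: str) -> dict:
    await asyncio.sleep(0.1)  # simulated network latency
    return {"symbol": symbol, "source": "news"}


async def fetch_filings(symbol: str) -> dict:
    await asyncio.sleep(0.1)  # simulated network latency
    return {"symbol": symbol, "source": "filings"}


async def load_dashboard(symbol: str) -> list[dict]:
    """Start all three fetches at once and wait for them together."""
    return await asyncio.gather(
        fetch_prices(symbol),
        fetch_news(symbol),
        fetch_filings(symbol),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(load_dashboard("NVDA"))
    elapsed = time.perf_counter() - start
    print([r["source"] for r in results], f"{elapsed:.2f}s")
```

Run sequentially, the three simulated calls would take roughly the sum of their delays; with `asyncio.gather` the total is close to the slowest single call. A task queue such as Celery applies the same idea across worker processes, which suits CPU-heavy jobs like embedding generation better than a single event loop.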

Result? We went from "Trust me bro" AI to verifiable, mathematical transparency. The chart renders in ~500ms while the AI copilot fetches context in parallel. No blocking, no lag.

Real-time news feed and homepage
QuantTrade AI analyzing NVIDIA ($NVDA) in real-time
Real-time Finance Research

Check out the repo: GitHub: QuantTrade-AI
Subscribe for deep dives: I'll be breaking down the architecture, sharing the repo, and documenting challenges as we scale this thing.

Next issue: "How We Built a Vector Database for SEC Filings Using PostgreSQL + pgvector"
