
Robins163

Posted on • Originally published at unknowntoplay.hashnode.dev

System Design: Real-Time Stock Trading Platform — From My Experience Building Mstock

TL;DR: I helped build Mstock, a stock trading platform handling 100K+ concurrent users during market hours. This post walks through the full system design — from WebSocket connections for live prices, to order matching, to the caching and database layers — with architecture diagrams, sequence flows, and the hard lessons we learned in production.

The Problem

Designing a stock trading platform isn't like building a typical web app. The constraints are brutal:

  • Latency: Order placement must complete in under 100ms. Users lose real money on delays.
  • Concurrency: 100K+ users online simultaneously during the 6 hour 15 minute trading window (9:15 AM - 3:30 PM IST).
  • Data velocity: Stock prices update every second across 5,000+ symbols. That's about 5,000 messages/second to ingest, and far more once each update fans out to every subscribed client.
  • Accuracy: A rounding error in price or quantity isn't a bug — it's a financial incident.
  • Availability: Downtime during market hours = regulatory scrutiny + angry traders + lost revenue.
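
To put the data-velocity constraint in numbers, here is a rough back-of-the-envelope sketch. The 5,000 symbols, one update per second, and 100K users come from the list above; the average watchlist size of 20 symbols is purely an assumption for illustration, not a measured Mstock figure.

```javascript
// Back-of-the-envelope throughput estimate.
// avgWatchlistSize is an assumed value chosen only to show the arithmetic.
function estimateThroughput({ symbols, updatesPerSecond, users, avgWatchlistSize }) {
  // Messages arriving from the exchange feed
  const ingestPerSecond = symbols * updatesPerSecond;
  // Messages leaving the WebSocket tier: one per user, per subscribed symbol, per update
  const fanOutPerSecond = users * avgWatchlistSize * updatesPerSecond;
  return { ingestPerSecond, fanOutPerSecond };
}

const estimate = estimateThroughput({
  symbols: 5000,
  updatesPerSecond: 1,
  users: 100000,
  avgWatchlistSize: 20,
});
console.log(estimate); // { ingestPerSecond: 5000, fanOutPerSecond: 2000000 }
```
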

I worked on Mstock as part of the frontend architecture team, but I had deep exposure to the full stack. Here's how a system like this is designed from the ground up.

High-Level Architecture

  STOCK TRADING PLATFORM — HIGH LEVEL ARCHITECTURE
  ═══════════════════════════════════════════════════

  ┌─────────────────────────────────────────────────────────────┐
  │                      CLIENT LAYER                           │
  │                                                             │
  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐ │
  │  │   Web    │  │ Mobile   │  │ Mobile   │  │  Desktop   │ │
  │  │  (React) │  │  (iOS)   │  │(Android) │  │  Terminal  │ │
  │  └────┬─────┘  └────┬─────┘  └────┬─────┘  └─────┬──────┘ │
  │       │              │              │              │        │
  │       └──────────────┴──────┬───────┴──────────────┘        │
  │                             │                               │
  └─────────────────────────────┼───────────────────────────────┘
                                │  HTTPS + WebSocket
                                ▼
  ┌─────────────────────────────────────────────────────────────┐
  │                    GATEWAY LAYER                             │
  │                                                             │
  │  ┌──────────────────────────────────────────────────────┐   │
  │  │              API Gateway / Load Balancer              │   │
  │  │         (Nginx / AWS ALB / Kong / Traefik)           │   │
  │  │                                                      │   │
  │  │  - SSL termination                                   │   │
  │  │  - Rate limiting (1000 req/min per user)             │   │
  │  │  - WebSocket upgrade handling                        │   │
  │  │  - IP whitelisting for exchange connections          │   │
  │  └──────────────────────┬───────────────────────────────┘   │
  │                         │                                   │
  └─────────────────────────┼───────────────────────────────────┘
                            │
              ┌─────────────┼─────────────┐
              ▼             ▼             ▼
  ┌─────────────────────────────────────────────────────────────┐
  │                   APPLICATION LAYER                         │
  │                                                             │
  │  ┌────────────┐  ┌────────────┐  ┌────────────────────┐    │
  │  │   Order    │  │  Market    │  │    Portfolio       │    │
  │  │  Service   │  │   Data     │  │    Service         │    │
  │  │            │  │  Service   │  │                    │    │
  │  │ - Place    │  │ - Live     │  │ - Holdings         │    │
  │  │ - Cancel   │  │   prices   │  │ - P&L calculation  │    │
  │  │ - Modify   │  │ - OHLCV    │  │ - Margin check     │    │
  │  │ - History  │  │ - Depth    │  │ - Fund transfer    │    │
  │  └──────┬─────┘  └──────┬─────┘  └──────┬─────────────┘    │
  │         │               │               │                   │
  │  ┌────────────┐  ┌────────────┐  ┌────────────────────┐    │
  │  │   Auth     │  │ WebSocket  │  │   Notification     │    │
  │  │  Service   │  │   Server   │  │    Service         │    │
  │  │            │  │            │  │                    │    │
  │  │ - JWT      │  │ - Price    │  │ - Order status     │    │
  │  │ - 2FA      │  │   feed     │  │ - Price alerts     │    │
  │  │ - Session  │  │ - Order    │  │ - Margin calls     │    │
  │  │            │  │   updates  │  │ - Push + Email     │    │
  │  └────────────┘  └────────────┘  └────────────────────┘    │
  │                                                             │
  └──────────────────────────┬──────────────────────────────────┘
                             │
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
  ┌─────────────────────────────────────────────────────────────┐
  │                     DATA LAYER                              │
  │                                                             │
  │  ┌────────────┐  ┌────────────┐  ┌────────────────────┐    │
  │  │   Redis    │  │ PostgreSQL │  │    TimescaleDB     │    │
  │  │  Cluster   │  │  (Primary) │  │  (Time-series)     │    │
  │  │            │  │            │  │                    │    │
  │  │ - Sessions │  │ - Users    │  │ - OHLCV candles    │    │
  │  │ - Prices   │  │ - Orders   │  │ - Tick data        │    │
  │  │ - Pub/Sub  │  │ - Trades   │  │ - Historical       │    │
  │  │ - Queues   │  │ - Holdings │  │   charts           │    │
  │  └────────────┘  └────────────┘  └────────────────────┘    │
  │                                                             │
  └─────────────────────────────────────────────────────────────┘
                             │
                             ▼
  ┌─────────────────────────────────────────────────────────────┐
  │                  EXCHANGE LAYER                              │
  │                                                             │
  │  ┌────────────┐  ┌────────────┐  ┌────────────────────┐    │
  │  │    NSE     │  │    BSE     │  │    MCX             │    │
  │  │  Exchange  │  │  Exchange  │  │  (Commodities)     │    │
  │  └────────────┘  └────────────┘  └────────────────────┘    │
  │                                                             │
  │  Connected via: FIX Protocol / Exchange API                 │
  │  Latency requirement: < 5ms to exchange gateway             │
  │                                                             │
  └─────────────────────────────────────────────────────────────┘

Core Component Deep-Dives

1. WebSocket Server — The Real-Time Backbone

The WebSocket server is the heart of the real-time experience. Every user has a persistent connection that receives live price updates, order status changes, and alerts.

  WEBSOCKET CONNECTION LIFECYCLE
  ══════════════════════════════

  Client                          Server
    │                                │
    │──── HTTP Upgrade Request ─────▶│
    │     GET /ws                    │
    │     Upgrade: websocket         │
    │     Connection: Upgrade        │
    │     Sec-WebSocket-Key: ...     │
    │                                │
    │◀── 101 Switching Protocols ───│
    │     Upgrade: websocket         │
    │     Sec-WebSocket-Accept: ...  │
    │                                │
    │◀════ WebSocket Connected ═════▶│
    │                                │
    │──── Subscribe: RELIANCE ──────▶│
    │──── Subscribe: INFY ──────────▶│
    │──── Subscribe: TCS ───────────▶│
    │                                │
    │◀─── Price: RELIANCE 2450.50 ──│  (every ~1s)
    │◀─── Price: INFY 1623.75 ──────│
    │◀─── Price: TCS 3890.20 ───────│
    │◀─── Depth: RELIANCE {...} ────│
    │                                │
    │──── Place Order ──────────────▶│
    │◀─── Order Confirmed ──────────│
    │◀─── Order Executed ───────────│
    │                                │
    │──── Heartbeat (ping) ─────────▶│  (every 30s)
    │◀─── Heartbeat (pong) ─────────│
    │                                │
flowchart TD
    A[Client Connects via WS] --> B[Authenticate JWT]
    B -->|Valid| C[Add to Connection Pool]
    B -->|Invalid| D[Reject + Close]
    C --> E[Client Subscribes to Symbols]
    E --> F[Add to Symbol Channels]
    F --> G[Receive Price Updates via Redis Pub/Sub]
    G --> H[Broadcast to Subscribed Clients]
    H --> G

    style D fill:#ef4444,color:#fff
    style H fill:#22c55e,color:#fff

WebSocket Server Implementation

// ws-server.js — WebSocket server with Redis Pub/Sub for price distribution
const WebSocket = require('ws');
const Redis = require('ioredis');
const jwt = require('jsonwebtoken');

const wss = new WebSocket.Server({ noServer: true });
const redisSub = new Redis();  // Subscriber connection
const redisPub = new Redis();  // Publisher connection

// Track subscriptions: symbol → Set of WebSocket clients
const subscriptions = new Map();

// Track clients: ws → { userId, subscribedSymbols }
const clients = new Map();

// Handle new WebSocket connections
wss.on('connection', (ws, req) => {
  const userId = req.userId; // Set during upgrade authentication
  clients.set(ws, { userId, subscribedSymbols: new Set() });

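  ws.on('message', (raw) => {
    let msg;
    try {
      msg = JSON.parse(raw);
    } catch (err) {
      // Ignore malformed frames instead of letting one bad client crash the process
      return;
    }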

    switch (msg.type) {
      case 'subscribe':
        handleSubscribe(ws, msg.symbols);
        break;
      case 'unsubscribe':
        handleUnsubscribe(ws, msg.symbols);
        break;
      case 'order':
        handleOrder(ws, msg.data);
        break;
    }
  });

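  ws.on('close', () => {
    // Clean up every subscription this client held
    const client = clients.get(ws);
    if (client) {
      handleUnsubscribe(ws, [...client.subscribedSymbols]);
      clients.delete(ws);
    }
  });
});

// Remove a client from the given symbols; drop the Redis channel
// once the last local subscriber for a symbol is gone
function handleUnsubscribe(ws, symbols) {
  const client = clients.get(ws);
  if (!client) return;

  for (const symbol of symbols) {
    client.subscribedSymbols.delete(symbol);
    const subs = subscriptions.get(symbol);
    if (subs) {
      subs.delete(ws);
      if (subs.size === 0) {
        subscriptions.delete(symbol);
        redisSub.unsubscribe(`price:${symbol}`);
      }
    }
  }
}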

function handleSubscribe(ws, symbols) {
  const client = clients.get(ws);

  for (const symbol of symbols) {
    // Add to client's subscription set
    client.subscribedSymbols.add(symbol);

    // Add to symbol's subscriber set
    if (!subscriptions.has(symbol)) {
      subscriptions.set(symbol, new Set());
      // First subscriber for this symbol — subscribe to Redis channel
      redisSub.subscribe(`price:${symbol}`);
    }
    subscriptions.get(symbol).add(ws);
  }
}

// When Redis publishes a price update, broadcast to all subscribed clients
redisSub.on('message', (channel, message) => {
  const symbol = channel.replace('price:', '');
  const subscribers = subscriptions.get(symbol);

  if (subscribers) {
    const payload = JSON.stringify({
      type: 'price',
      symbol,
      data: JSON.parse(message),
      timestamp: Date.now(),
    });

    for (const ws of subscribers) {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(payload);
      }
    }
  }
});
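
The server above is created with `noServer: true` and reads `req.userId`, which means something has to authenticate the HTTP upgrade first. Here is a minimal sketch of that step. `createUpgradeHandler`, the `?token=` query parameter, and the `userId` payload field are all assumptions for illustration; `verifyToken` would wrap `jwt.verify` in a real setup. Keep in mind that tokens in query strings tend to end up in access logs, so a short-lived token is the safer choice here.

```javascript
// Hypothetical helper: builds an 'upgrade' handler that authenticates
// before handing the socket over to the ws server.
// `wss` is a WebSocket.Server created with { noServer: true }.
function createUpgradeHandler({ wss, verifyToken }) {
  return function onUpgrade(req, socket, head) {
    let payload;
    try {
      // Assumption: the client connects to /ws?token=<jwt>
      const { searchParams } = new URL(req.url, 'http://placeholder');
      payload = verifyToken(searchParams.get('token'));
    } catch (err) {
      // Reject before the WebSocket handshake completes
      socket.write('HTTP/1.1 401 Unauthorized\r\n\r\n');
      socket.destroy();
      return;
    }

    req.userId = payload.userId; // assumed payload field
    wss.handleUpgrade(req, socket, head, (ws) => {
      wss.emit('connection', ws, req);
    });
  };
}

// Wiring (not executed here; JWT_SECRET is an assumed env variable):
// const server = http.createServer();
// server.on('upgrade', createUpgradeHandler({
//   wss,
//   verifyToken: (token) => jwt.verify(token, process.env.JWT_SECRET),
// }));
```
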

Scaling WebSockets to 100K+ Connections

A single Node.js process can handle ~10K-50K WebSocket connections (depending on message frequency). For 100K+, you need horizontal scaling:

  WEBSOCKET SCALING ARCHITECTURE
  ══════════════════════════════

                    ┌───────────────────┐
                    │   Load Balancer   │
                    │  (Sticky Sessions │
                    │   by IP hash)     │
                    └────────┬──────────┘
                             │
            ┌────────────────┼────────────────┐
            ▼                ▼                ▼
  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
  │  WS Server 1 │ │  WS Server 2 │ │  WS Server 3 │
  │  ~35K conns  │ │  ~35K conns  │ │  ~35K conns  │
  └──────┬───────┘ └──────┬───────┘ └──────┬───────┘
         │                │                │
         └────────────────┼────────────────┘
                          │
                 ┌────────▼────────┐
                 │  Redis Pub/Sub  │
                 │                 │
                 │  All 3 servers  │
                 │  subscribe to   │
                 │  same channels  │
                 └─────────────────┘

  Key insight: Redis Pub/Sub acts as the broadcast bus.
  When a price update comes in, ALL servers receive it
  and broadcast to their own connected clients.

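Why sticky sessions? WebSocket connections are stateful: each server keeps its clients' subscription sets in memory. IP-hash load balancing routes a reconnecting client back to the same server without any shared session store. Note that the server code shown earlier discards a client's subscriptions when the socket closes, so the client still has to re-send its subscription list after it reconnects.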

2. Order Flow — The Critical Path

The order flow is the most latency-sensitive part of the system. From the moment a user clicks "Buy" to the order reaching the exchange, every millisecond counts.

  ORDER FLOW — END TO END
  ═══════════════════════

  ┌────────┐     ┌──────────┐     ┌──────────────┐     ┌──────────┐     ┌──────────┐
  │ Client │────▶│   API    │────▶│    Order     │────▶│ Exchange │────▶│  Order   │
  │  (UI)  │     │ Gateway  │     │   Service    │     │ Gateway  │     │ Matching │
  └────────┘     └──────────┘     └──────────────┘     └──────────┘     └──────────┘
                                        │
                                        │ Validates:
                                        ├─ Sufficient margin?
                                        ├─ Valid symbol?
                                        ├─ Market hours?
                                        ├─ Price within circuit limits?
                                        ├─ Quantity within limits?
                                        └─ User not restricted?
                                        │
                                        ▼
                                  ┌──────────────┐
                                  │  Risk Check  │
                                  │   Engine     │
                                  └──────────────┘

  Timeline:
  ─────────────────────────────────────────────────────▶ time
  │ User clicks │ Validation │ Risk Check │ To Exchange │
  │   "Buy"     │   ~5ms     │   ~3ms     │   ~10ms     │
  │             │            │            │             │
  │             └────────────┴────────────┴─────────────│
  │                    Total: ~20-50ms                   │
sequenceDiagram
    participant Client
    participant API as API Gateway
    participant Order as Order Service
    participant Risk as Risk Engine
    participant Exchange
    participant WS as WebSocket Server
    participant DB as PostgreSQL

    Client->>API: Place Order (BUY RELIANCE x100 @ 2450)
    API->>Order: Validate & Forward
    Order->>Order: Check market hours, symbol, quantity
    Order->>Risk: Margin check
    Risk-->>Order: Margin OK (available: ₹5,00,000)
    Order->>DB: Save order (status: PENDING)
    Order->>Exchange: Submit via FIX Protocol
    Exchange-->>Order: Order ID: ORD123456
    Order->>DB: Update status: SUBMITTED
    Order->>WS: Notify client
    WS-->>Client: Order Submitted ✓

    Note over Exchange: Exchange matches order...

    Exchange-->>Order: Execution Report (FILLED @ 2449.50)
    Order->>DB: Update status: EXECUTED
    Order->>DB: Update holdings + positions
    Order->>WS: Notify client
    WS-->>Client: Order Executed ✓ (100 x 2449.50)
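
The status changes in the sequence above (PENDING, SUBMITTED, EXECUTED, plus REJECTED on failure) form a small state machine. One way to keep out-of-order exchange messages from corrupting an order is to check every transition explicitly. This is only a sketch: CANCELLED is an assumed extra state for user cancels, and partial fills are left out.

```javascript
// Allowed order status transitions. CANCELLED is an assumption;
// the other statuses are the ones used in the flow above.
const TRANSITIONS = new Map([
  ['PENDING', ['SUBMITTED', 'REJECTED']],
  ['SUBMITTED', ['EXECUTED', 'REJECTED', 'CANCELLED']],
  ['EXECUTED', []],
  ['REJECTED', []],
  ['CANCELLED', []],
]);

function canTransition(from, to) {
  return (TRANSITIONS.get(from) ?? []).includes(to);
}

// Returns the new status, or throws if the move is not allowed
function nextStatus(current, requested) {
  if (!canTransition(current, requested)) {
    throw new Error(`Illegal order transition: ${current} -> ${requested}`);
  }
  return requested;
}

console.log(nextStatus('PENDING', 'SUBMITTED'));   // SUBMITTED
console.log(canTransition('EXECUTED', 'PENDING')); // false
```
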

Order Validation Code

// order-service.js — Order validation and submission
const MARKET_OPEN = { hour: 9, minute: 15 };
const MARKET_CLOSE = { hour: 15, minute: 30 };

async function placeOrder(userId, orderData) {
  const { symbol, quantity, price, type, side } = orderData;

  // Step 1: Basic validation
  validateOrderParams(symbol, quantity, price, type, side);

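  // Step 2: Check market hours (IST = UTC + 5:30)
  const now = new Date();
  // Work in minutes since midnight so the +5:30 offset carries over correctly
  const istTotalMinutes = (now.getUTCHours() * 60 + now.getUTCMinutes() + 330) % 1440;
  const istHour = Math.floor(istTotalMinutes / 60);
  const istMin = istTotalMinutes % 60;
  if (!isMarketOpen(istHour, istMin)) {
    throw new Error('Market is closed. Trading hours: 9:15 AM - 3:30 PM IST');
  }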

  // Step 3: Margin check — can the user afford this order?
  const requiredMargin = calculateMargin(symbol, quantity, price, side);
  const availableMargin = await getAvailableMargin(userId);

  if (requiredMargin > availableMargin) {
    throw new Error(
      `Insufficient margin. Required: ₹${requiredMargin}, Available: ₹${availableMargin}`
    );
  }

  // Step 4: Block margin (prevent double-spending)
  await blockMargin(userId, requiredMargin);

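  // Step 5: Save order to database (release the blocked margin if the insert fails)
  let order;
  try {
    order = await db.query(
      `INSERT INTO orders (user_id, symbol, quantity, price, type, side, status, created_at)
       VALUES ($1, $2, $3, $4, $5, $6, 'PENDING', NOW())
       RETURNING *`,
      [userId, symbol, quantity, price, type, side]
    );
  } catch (err) {
    await releaseMargin(userId, requiredMargin);
    throw err;
  }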

  // Step 6: Submit to exchange
  let exchangeOrderId;
  try {
    exchangeOrderId = await exchangeGateway.submit({
      orderId: order.rows[0].id,
      symbol,
      quantity,
      price,
      type,  // LIMIT, MARKET, SL, SL-M
      side,  // BUY, SELL
    });
  } catch (err) {
    // Submission failed: release the blocked margin and mark the order rejected
    await releaseMargin(userId, requiredMargin);
    await db.query(
      'UPDATE orders SET status = $1 WHERE id = $2',
      ['REJECTED', order.rows[0].id]
    );
    throw err;
  }

  // From here on the order is live at the exchange, so a failure below
  // must not release margin or mark the order REJECTED.
  await db.query(
    'UPDATE orders SET exchange_order_id = $1, status = $2 WHERE id = $3',
    [exchangeOrderId, 'SUBMITTED', order.rows[0].id]
  );

  // Notify client via WebSocket
  notifyClient(userId, {
    type: 'order_update',
    orderId: order.rows[0].id,
    status: 'SUBMITTED',
    exchangeOrderId,
  });

  return { orderId: order.rows[0].id, status: 'SUBMITTED' };
}
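
One thing to watch in steps 3 and 4: checking the margin and then blocking it as two separate calls leaves a window where two concurrent orders can both pass the check. A common way to close that window is a single conditional UPDATE. The sketch below assumes a pg-style client whose `query` resolves to an object with `rowCount`, uses the column names from the `accounts` table in the schema section of this post, and takes `db` as a parameter, so it is a hypothetical variant of `blockMargin` rather than the one called above.

```javascript
// Check-and-block in one statement: the WHERE clause is the margin check.
// Assumes db.query(sql, params) resolves to { rowCount } (pg-style client).
async function blockMargin(db, userId, amount) {
  const result = await db.query(
    `UPDATE accounts
        SET available_margin = available_margin - $1,
            blocked_margin   = blocked_margin + $1
      WHERE user_id = $2
        AND available_margin >= $1`,
    [amount, userId]
  );

  // No row updated means the guard failed: not enough margin, or no such account
  if (result.rowCount !== 1) {
    throw new Error('Insufficient margin');
  }
}
```
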

3. Market Data Pipeline — 5,000 Price Updates Per Second

Stock exchanges push price data via their feed APIs. We need to process, store, and broadcast this data to 100K+ users in real time.

  MARKET DATA PIPELINE
  ════════════════════

  ┌──────────┐     ┌──────────────┐     ┌──────────┐     ┌──────────────┐
  │   NSE    │────▶│   Exchange   │────▶│  Price   │────▶│    Redis     │
  │ Exchange │     │  Feed Parser │     │ Processor│     │   Pub/Sub    │
  │  Feed    │     │              │     │          │     │              │
  │          │     │ - Binary     │     │ - LTP    │     │ price:INFY   │
  │ 5000     │     │   protocol   │     │ - OHLCV  │     │ price:TCS    │
  │ symbols  │     │ - Decode     │     │ - Change │     │ price:...    │
  │ ~1msg/s  │     │ - Validate   │     │ - Volume │     │              │
  │ each     │     │              │     │ - Depth  │     │ ~5000 msg/s  │
  └──────────┘     └──────────────┘     └──────────┘     └──────┬───────┘
                                                                │
                                                    ┌───────────┼───────────┐
                                                    ▼           ▼           ▼
                                              ┌──────────┐ ┌──────────┐ ┌──────────┐
                                              │  WS      │ │  WS      │ │  WS      │
                                              │ Server 1 │ │ Server 2 │ │ Server 3 │
                                              │ 35K conn │ │ 35K conn │ │ 35K conn │
                                              └──────────┘ └──────────┘ └──────────┘
                                                    │           │           │
                                                    ▼           ▼           ▼
                                                  Clients    Clients    Clients
// price-processor.js — Process exchange feed and publish to Redis
const Redis = require('ioredis');
const redis = new Redis();

class PriceProcessor {
  constructor() {
    this.lastPrices = new Map();
  }

  async processTickData(tick) {
    const {
      symbol,
      ltp,       // Last Traded Price
      open,
      high,
      low,
      close,
      volume,
      timestamp,
    } = tick;

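    // Calculate change and % change against the previous close.
    // Round the difference to paise so floating-point noise never reaches clients,
    // and guard against a zero previous close.
    const prevClose = this.lastPrices.get(symbol)?.close || close;
    const change = Number((ltp - prevClose).toFixed(2));
    const changePercent = prevClose ? ((change / prevClose) * 100).toFixed(2) : '0.00';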

    const priceData = {
      symbol,
      ltp,
      open,
      high,
      low,
      close: prevClose,
      volume,
      change,
      changePercent: parseFloat(changePercent),
      timestamp,
    };

    // Update local cache
    this.lastPrices.set(symbol, priceData);

    // Publish to Redis — all WS servers will receive this
    await redis.publish(
      `price:${symbol}`,
      JSON.stringify(priceData)
    );

    // Store latest price in Redis hash (for API queries)
    await redis.hset(`stock:${symbol}`, {
      ltp: ltp.toString(),
      open: open.toString(),
      high: high.toString(),
      low: low.toString(),
      volume: volume.toString(),
      change: change.toString(),
      changePercent: changePercent,
      updatedAt: timestamp.toString(),
    });
  }
}
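
Since a rounding error here is a financial incident, it can help to do price arithmetic on integers and format only at the edge. The sketch below converts rupees to integer paise before subtracting. `toPaise` and `priceChange` are hypothetical helpers, and the code assumes feed prices carry at most two decimal places, as they do in the examples in this post.

```javascript
// Convert a rupee amount to integer paise.
// Assumes at most two decimal places in the input.
function toPaise(rupees) {
  return Math.round(rupees * 100);
}

// Change and percent change computed on integers, formatted only for output
function priceChange(ltp, prevClose) {
  const prevPaise = toPaise(prevClose);
  const changePaise = toPaise(ltp) - prevPaise;
  // Guard against a zero previous close
  const changePercent = prevPaise === 0 ? 0 : (changePaise / prevPaise) * 100;
  return {
    change: (changePaise / 100).toFixed(2),
    changePercent: changePercent.toFixed(2),
  };
}

console.log(0.1 + 0.2);                   // 0.30000000000000004
console.log(priceChange(2450.5, 2449.3)); // { change: '1.20', changePercent: '0.05' }
```
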

4. Database Design

  DATABASE SCHEMA (Simplified)
  ════════════════════════════

  ┌──────────────────┐       ┌──────────────────┐
  │      users       │       │     accounts     │
  ├──────────────────┤       ├──────────────────┤
  │ id          PK   │──┐    │ id          PK   │
  │ email            │  │    │ user_id     FK   │──┐
  │ phone            │  │    │ balance          │  │
  │ pan_number       │  │    │ blocked_margin   │  │
  │ kyc_status       │  │    │ available_margin │  │
  │ created_at       │  └───▶│ updated_at       │  │
  └──────────────────┘       └──────────────────┘  │
                                                    │
  ┌──────────────────┐       ┌──────────────────┐  │
  │      orders      │       │    holdings      │  │
  ├──────────────────┤       ├──────────────────┤  │
  │ id          PK   │       │ id          PK   │  │
  │ user_id     FK   │──┐    │ user_id     FK   │◀─┘
  │ symbol           │  │    │ symbol           │
  │ quantity         │  │    │ quantity         │
  │ price            │  │    │ avg_price        │
  │ type (LIMIT/..)  │  │    │ current_value    │
  │ side (BUY/SELL)  │  │    │ pnl              │
  │ status           │  │    │ updated_at       │
  │ exchange_order_id│  │    └──────────────────┘
  │ executed_price   │  │
  │ executed_at      │  │    ┌──────────────────┐
  │ created_at       │  │    │     trades       │
  └──────────────────┘  │    ├──────────────────┤
                        │    │ id          PK   │
                        └───▶│ order_id    FK   │
                             │ symbol           │
                             │ quantity         │
                             │ price            │
                             │ side             │
                             │ exchange_trade_id│
                             │ executed_at      │
                             └──────────────────┘

  ┌──────────────────────────────────────────────┐
  │         ohlcv_candles (TimescaleDB)          │
  ├──────────────────────────────────────────────┤
  │ time         TIMESTAMPTZ  (hypertable key)   │
  │ symbol       TEXT                             │
  │ open         DECIMAL(12,2)                    │
  │ high         DECIMAL(12,2)                    │
  │ low          DECIMAL(12,2)                    │
  │ close        DECIMAL(12,2)                    │
  │ volume       BIGINT                           │
  │ interval     TEXT  ('1m','5m','15m','1h','1d')│
  └──────────────────────────────────────────────┘
erDiagram
    USERS ||--o{ ACCOUNTS : has
    USERS ||--o{ ORDERS : places
    USERS ||--o{ HOLDINGS : owns
    ORDERS ||--o{ TRADES : generates

    USERS {
        int id PK
        string email
        string phone
        string pan_number
        string kyc_status
    }

    ORDERS {
        int id PK
        int user_id FK
        string symbol
        int quantity
        decimal price
        string type
        string side
        string status
    }

    HOLDINGS {
        int id PK
        int user_id FK
        string symbol
        int quantity
        decimal avg_price
    }

    TRADES {
        int id PK
        int order_id FK
        string symbol
        int quantity
        decimal price
        timestamp executed_at
    }

Why Two Databases?

  PostgreSQL vs TimescaleDB — DECISION MATRIX
  ═══════════════════════════════════════════

  ┌──────────────────┬──────────────────┬──────────────────────┐
  │                  │   PostgreSQL     │    TimescaleDB       │
  ├──────────────────┼──────────────────┼──────────────────────┤
  │ Used for         │ Users, Orders,   │ Price candles,       │
  │                  │ Holdings, Trades │ tick data, charts    │
  ├──────────────────┼──────────────────┼──────────────────────┤
  │ Query pattern    │ Random access    │ Time-range queries   │
  │                  │ by user/order ID │ "RELIANCE, last 30d" │
  ├──────────────────┼──────────────────┼──────────────────────┤
  │ Write pattern    │ Low-medium       │ Very high            │
  │                  │ (orders/trades)  │ (5000 writes/sec)    │
  ├──────────────────┼──────────────────┼──────────────────────┤
  │ Retention        │ Forever          │ 2 years, then        │
  │                  │                  │ aggregate + archive  │
  ├──────────────────┼──────────────────┼──────────────────────┤
  │ Why              │ ACID for money   │ Optimized for        │
  │                  │ transactions     │ time-series inserts  │
  │                  │                  │ + range scans        │
  └──────────────────┴──────────────────┴──────────────────────┘
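
To make the time-series side concrete, here is a small sketch of how raw ticks roll up into the OHLCV candles stored in `ohlcv_candles`. TimescaleDB can do this kind of bucketing inside the database with `time_bucket`; the JavaScript below only illustrates the logic. The tick shape (`timestamp` in milliseconds, `ltp`, `volume` as the quantity traded in that tick) is an assumption.

```javascript
// Bucket ticks into fixed-width OHLCV candles.
// Assumes ticks arrive in time order and volume is per-tick, not cumulative.
function aggregateCandles(ticks, intervalMs) {
  const candles = new Map(); // bucket start time -> candle
  for (const { timestamp, ltp, volume } of ticks) {
    const bucket = Math.floor(timestamp / intervalMs) * intervalMs;
    const candle = candles.get(bucket);
    if (!candle) {
      // First tick in this bucket sets all four prices
      candles.set(bucket, { time: bucket, open: ltp, high: ltp, low: ltp, close: ltp, volume });
    } else {
      candle.high = Math.max(candle.high, ltp);
      candle.low = Math.min(candle.low, ltp);
      candle.close = ltp; // latest tick wins
      candle.volume += volume;
    }
  }
  return [...candles.values()];
}

const oneMinute = 60 * 1000;
const candles = aggregateCandles(
  [
    { timestamp: 0, ltp: 100, volume: 10 },
    { timestamp: 20000, ltp: 105, volume: 5 },
    { timestamp: 59000, ltp: 98, volume: 7 },
    { timestamp: 60000, ltp: 99, volume: 3 },
  ],
  oneMinute
);
console.log(candles.length); // 2
```
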

5. Caching Strategy

I covered caching in detail in my Redis caching post, but here's how it fits into the trading platform specifically:

  WHAT WE CACHE (AND WHAT WE DON'T)
  ══════════════════════════════════

  CACHED (Redis):
  ┌──────────────────────────────────────────────────┐
  │  stock:RELIANCE    → { ltp, ohlcv, volume }     │  TTL: 2s
  │  depth:RELIANCE    → { bids[], asks[] }          │  TTL: 1s
  │  watchlist:user123 → [RELIANCE, INFY, TCS]       │  TTL: 5min
  │  session:abc123    → { userId, permissions }      │  TTL: 30min
  │  portfolio:user123 → { holdings, pnl }            │  TTL: 60s
  │  config:margins    → { symbol: margin% }          │  TTL: 1hr
  └──────────────────────────────────────────────────┘

  NOT CACHED (Direct DB):
  ┌──────────────────────────────────────────────────┐
  │  Order placement    → Must be 100% accurate      │
  │  Account balance    → Real-time for margin calc   │
  │  Trade execution    → Exchange is source of truth │
  │  Fund transfer      → Financial transaction       │
  └──────────────────────────────────────────────────┘

  Rule: If a stale value could cause financial loss,
        don't cache it. Go to the database.
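
For the entries in the cached list, the read path is a plain cache-aside lookup with a TTL. A minimal sketch follows. `cachedRead` and the loader callback are hypothetical names, and `cache` is any client exposing `get(key)` and `set(key, value, 'EX', seconds)`, which is the shape ioredis provides.

```javascript
// Cache-aside read: try the cache, fall back to the database, then store with a TTL.
// A miss is signalled by null, as with ioredis.
async function cachedRead(cache, key, ttlSeconds, loadFromDb) {
  const hit = await cache.get(key);
  if (hit !== null) {
    return JSON.parse(hit);
  }

  const fresh = await loadFromDb();
  await cache.set(key, JSON.stringify(fresh), 'EX', ttlSeconds);
  return fresh;
}

// Example call (not executed here; loadPortfolio is a hypothetical DB query):
// const portfolio = await cachedRead(redis, 'portfolio:user123', 60, () => loadPortfolio('user123'));
```
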

6. Authentication & Security

  AUTH FLOW — JWT + TOTP 2FA
  ══════════════════════════

  ┌────────┐                    ┌──────────┐                 ┌───────┐
  │ Client │                    │   Auth   │                 │ Redis │
  └───┬────┘                    │ Service  │                 └───┬───┘
      │                         └────┬─────┘                     │
      │  1. POST /login              │                           │
      │  { email, password }         │                           │
      │─────────────────────────────▶│                           │
      │                              │  2. Verify password       │
      │                              │     (bcrypt compare)      │
      │                              │                           │
      │  3. Request TOTP code        │                           │
      │◀─────────────────────────────│                           │
      │                              │                           │
      │  4. POST /verify-2fa         │                           │
      │  { code: "482910" }          │                           │
      │─────────────────────────────▶│                           │
      │                              │  5. Verify TOTP           │
      │                              │                           │
      │                              │  6. Create session ──────▶│
      │                              │     SET session:uuid      │
      │                              │     { userId, perms }     │
      │                              │     TTL: 30 min           │
      │  7. Return JWT               │                           │
      │  { accessToken,              │                           │
      │    refreshToken }            │                           │
      │◀─────────────────────────────│                           │
      │                              │                           │
      │  8. Subsequent requests      │                           │
      │  Authorization: Bearer jwt   │                           │
      │─────────────────────────────▶│                           │
      │                              │  9. Verify JWT            │
      │                              │  10. Check session ──────▶│
      │                              │      in Redis             │
      │                              │◀──────────────────────────│
      │  11. Response                │                           │
      │◀─────────────────────────────│                           │

Handling Failure Scenarios

In production, things break. Here's how we designed for the common failure modes:

  FAILURE SCENARIOS & MITIGATIONS
  ═══════════════════════════════

  ┌─────────────────────────────────────────────────────────────┐
  │ SCENARIO              │ IMPACT        │ MITIGATION          │
  ├───────────────────────┼───────────────┼─────────────────────┤
  │ Redis goes down       │ Slower reads  │ Fall back to DB     │
  │                       │               │ Serve stale prices  │
  ├───────────────────────┼───────────────┼─────────────────────┤
  │ WS server crashes     │ Users         │ Client auto-        │
  │                       │ disconnected  │ reconnect with      │
  │                       │               │ exponential backoff │
  ├───────────────────────┼───────────────┼─────────────────────┤
  │ Exchange feed drops   │ Stale prices  │ Show "last updated" │
  │                       │               │ timestamp, warn user│
  ├───────────────────────┼───────────────┼─────────────────────┤
  │ Order rejected by     │ Order fails   │ Release margin,     │
  │ exchange              │               │ notify user, retry  │
  │                       │               │ if transient        │
  ├───────────────────────┼───────────────┼─────────────────────┤
  │ DB write fails during │ Inconsistent  │ Use DB transaction  │
  │ order execution       │ state         │ + idempotency key   │
  ├───────────────────────┼───────────────┼─────────────────────┤
  │ Network partition     │ Split brain   │ Prefer consistency  │
  │ between services      │               │ (reject orders      │
  │                       │               │ rather than risk    │
  │                       │               │ double execution)   │
  └───────────────────────┴───────────────┴─────────────────────┘
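
The "idempotency key" mitigation deserves a concrete picture. If the exchange re-sends an execution report after a timeout or reconnect, applying it twice would double the holding. A sketch of the idea, keyed by the exchange's trade ID, is below. `createExecutionHandler` is a hypothetical name, and the in-memory Set stands in for what would more likely be a UNIQUE constraint on `trades.exchange_trade_id` inside the same database transaction.

```javascript
// Apply each execution report at most once, keyed by exchange trade ID.
// The Set and Map are in-memory stand-ins for database tables.
function createExecutionHandler() {
  const seenTradeIds = new Set();
  const holdings = new Map(); // symbol -> quantity

  return function applyExecution({ exchangeTradeId, symbol, side, quantity }) {
    if (seenTradeIds.has(exchangeTradeId)) {
      // Duplicate delivery: report current state, change nothing
      return { applied: false, quantity: holdings.get(symbol) ?? 0 };
    }
    seenTradeIds.add(exchangeTradeId);

    const signed = side === 'BUY' ? quantity : -quantity;
    const updated = (holdings.get(symbol) ?? 0) + signed;
    holdings.set(symbol, updated);
    return { applied: true, quantity: updated };
  };
}

const applyExecution = createExecutionHandler();
const report = { exchangeTradeId: 'T1', symbol: 'RELIANCE', side: 'BUY', quantity: 100 };
console.log(applyExecution(report)); // { applied: true, quantity: 100 }
console.log(applyExecution(report)); // { applied: false, quantity: 100 }
```
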

Client Reconnection Strategy

// client-reconnect.js — WebSocket reconnection with exponential backoff
class TradingWebSocket {
  constructor(url) {
    this.url = url;
    this.retryCount = 0;
    this.maxRetries = 10;
    this.subscriptions = new Set();
    this.connect();
  }

  connect() {
    this.ws = new WebSocket(this.url);

    this.ws.onopen = () => {
      console.log('Connected to trading server');
      this.retryCount = 0;

      // Re-subscribe to all symbols after reconnection
      if (this.subscriptions.size > 0) {
        this.ws.send(JSON.stringify({
          type: 'subscribe',
          symbols: [...this.subscriptions],
        }));
      }
    };

    this.ws.onclose = () => {
      if (this.retryCount < this.maxRetries) {
        // Exponential backoff: 1s, 2s, 4s, 8s, 16s... capped at 30s
        const delay = Math.min(1000 * Math.pow(2, this.retryCount), 30000);
        console.log(`Reconnecting in ${delay}ms (attempt ${this.retryCount + 1})`);
        this.retryCount++;
        setTimeout(() => this.connect(), delay);
      } else {
        console.error('Max reconnection attempts reached');
      }
    };

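    this.ws.onmessage = (event) => {
      // A malformed frame should not take down the message handler
      let data;
      try {
        data = JSON.parse(event.data);
      } catch (err) {
        console.error('Ignoring malformed message', err);
        return;
      }
      this.handleMessage(data);
    };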
  }

  subscribe(symbols) {
    symbols.forEach((s) => this.subscriptions.add(s));
    if (this.ws.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify({ type: 'subscribe', symbols }));
    }
  }

  handleMessage(data) {
    switch (data.type) {
      case 'price':
        this.onPriceUpdate?.(data);
        break;
      case 'order_update':
        this.onOrderUpdate?.(data);
        break;
      case 'alert':
        this.onAlert?.(data);
        break;
    }
  }
}
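
One refinement worth considering: when a WebSocket server crashes, every client it was serving starts the same backoff schedule at the same moment, so they all retry in lockstep. Adding random jitter spreads the reconnects out. A minimal sketch, assuming the "full jitter" variant; `backoffDelay` and its option names are hypothetical and not part of the class above:

```javascript
// backoff-jitter.js: hypothetical helper for illustration.
// "Full jitter": pick a random delay between 0 and the exponential ceiling,
// so clients dropped by the same crash do not all reconnect at once.
function backoffDelay(retryCount, { baseMs = 1000, maxMs = 30000, random = Math.random } = {}) {
  // Same ceiling as the fixed schedule: 1s, 2s, 4s, ... capped at maxMs
  const ceiling = Math.min(baseMs * 2 ** retryCount, maxMs);
  return Math.floor(random() * ceiling);
}
```

In the `onclose` handler, the fixed delay could be swapped for `backoffDelay(this.retryCount)`. The `random` option exists only so the function can be tested deterministically.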

Scaling Considerations

  SCALING FROM 10K TO 100K+ USERS
  ════════════════════════════════

  10K Users (Single Instance)
  ┌──────────┐  ┌──────────┐  ┌──────────┐
  │ 1 Node.js│  │ 1 Redis  │  │ 1 PgSQL  │
  │ instance │  │ instance │  │ instance │
  └──────────┘  └──────────┘  └──────────┘

  50K Users (Horizontal + Read Replicas)
  ┌──────────┐  ┌──────────┐  ┌──────────┐
  │ 3 Node.js│  │ Redis    │  │ PgSQL    │
  │ instances│  │ Sentinel │  │ Primary  │
  │ + LB     │  │ (HA)     │  │ + 2 Read │
  └──────────┘  └──────────┘  │ Replicas │
                               └──────────┘

  100K+ Users (Full Cluster)
  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐
  │ 5+ Node  │  │ Redis    │  │ PgSQL    │  │Timescale │
  │ instances│  │ Cluster  │  │ Primary  │  │ DB for   │
  │ + ALB    │  │ (6 nodes │  │ + Read   │  │ candles  │
  │ + Auto-  │  │  3M+3S)  │  │ Replicas │  │ + ticks  │
  │  scale   │  │          │  │ + PgPool │  │          │
  └──────────┘  └──────────┘  └──────────┘  └──────────┘

  Key metrics to monitor:
  ├── WebSocket connections per server (target: <40K)
  ├── Redis memory usage; cache hit rate (target: >90%)
  ├── DB connection pool utilization (target: <80%)
  ├── Order processing latency P99 (target: <100ms)
  └── Price broadcast latency (target: <50ms from exchange)
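
For the P99 target above, it helps to be precise about what is being measured. A minimal sketch, assuming the nearest-rank definition of a percentile over a window of latency samples; `percentile` is a hypothetical helper, and a real deployment would more likely read this from its metrics system:

```javascript
// latency-percentile.js: hypothetical helper for illustration.
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) {
    throw new Error('percentile() needs at least one sample');
  }
  const sorted = [...samples].sort((a, b) => a - b); // numeric sort, copy first
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Order latencies in milliseconds for one measurement window
const latenciesMs = [120, 15, 30, 45];
const p99 = percentile(latenciesMs, 99);
const withinBudget = p99 < 100;
```

With only four samples, the single slow order decides the P99, which is why the metric is only meaningful over a large enough window.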

Common Mistakes in Trading System Design

1. Using REST for Real-Time Data

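Polling each of 5,000 symbols once per second works out to 5,000 HTTP requests/second per user. Even batched into a single request, every poll pays the full request/response overhead. Use WebSockets and push only the symbols each user has subscribed to.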

2. Not Separating Read and Write Databases

Order writes and portfolio reads have very different patterns. Use read replicas for dashboards, primary for order execution.
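
A rough sketch of what that split can look like in application code. `DbRouter` is a hypothetical name, and `primary` and `replicas` are assumed to be any objects exposing a `query(sql, params)` method, such as connection pools:

```javascript
// db-router.js: hypothetical sketch for illustration.
// `primary` and `replicas` are any objects with a query(sql, params) method.
class DbRouter {
  constructor(primary, replicas) {
    this.primary = primary;
    this.replicas = replicas;
    this.next = 0; // index of the replica to use for the next read
  }

  // Writes, and reads that must see the latest data, go to the primary.
  write(sql, params) {
    return this.primary.query(sql, params);
  }

  // Dashboard-style reads rotate across replicas (round-robin) and
  // fall back to the primary when no replica is configured.
  read(sql, params) {
    if (this.replicas.length === 0) {
      return this.primary.query(sql, params);
    }
    const replica = this.replicas[this.next];
    this.next = (this.next + 1) % this.replicas.length;
    return replica.query(sql, params);
  }
}
```

Anything on the order-execution path would call `write()` even for its reads, since replicas can lag behind the primary.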

3. Caching Account Balances

A stale balance means a user might place an order they can't afford. Always read balance from the primary database.

4. No Circuit Breaker for Exchange Connection

If the exchange gateway is slow, your order queue backs up. Use a circuit breaker pattern to fail fast and inform users.
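
A minimal sketch of the pattern, assuming a simple consecutive-failure threshold and a fixed cool-down. `CircuitBreaker`, its option names, and the default values are hypothetical and chosen for illustration only:

```javascript
// circuit-breaker.js: hypothetical sketch; thresholds are illustrative.
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeoutMs = 10000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock, so tests do not need real timers
    this.failures = 0; // consecutive failures since the last success
    this.openedAt = null; // null means the circuit is closed
  }

  async call(fn) {
    if (this.openedAt !== null) {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        // Fail fast without touching the exchange gateway
        throw new Error('Circuit open: exchange gateway unavailable');
      }
      // Cool-down elapsed: let calls through again as a trial (half-open)
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.openedAt = null; // a success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.failureThreshold) {
        this.openedAt = this.now(); // open, or re-open after a failed trial
      }
      throw err;
    }
  }
}
```

The order service would wrap each gateway call in `breaker.call(...)` and turn the "Circuit open" error into an immediate, clear message to the user instead of letting requests queue up behind a slow exchange.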

5. Synchronous Order Processing

Order validation, margin check, and exchange submission can be pipelined. Validate first, then submit asynchronously and notify via WebSocket.
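
Here is a rough sketch of that split, assuming an in-memory array in place of a real message queue. `OrderPipeline`, `validateOrder`, `submitToExchange`, and `notify` are hypothetical names; the margin check is left out for brevity and would sit next to validation on the fast path:

```javascript
// order-pipeline.js: hypothetical sketch. The queue, submitToExchange, and
// notify stand in for a real message queue, exchange gateway, and WebSocket push.
function validateOrder(order) {
  if (!order.symbol) return 'symbol is required';
  if (!Number.isInteger(order.qty) || order.qty <= 0) return 'qty must be a positive integer';
  if (order.side !== 'BUY' && order.side !== 'SELL') return 'side must be BUY or SELL';
  return null; // null means the order is valid
}

class OrderPipeline {
  constructor({ submitToExchange, notify }) {
    this.submitToExchange = submitToExchange;
    this.notify = notify;
    this.queue = [];
  }

  // Fast path: validate, enqueue, and answer the client right away.
  accept(order) {
    const error = validateOrder(order);
    if (error) return { accepted: false, error };
    this.queue.push(order);
    return { accepted: true };
  }

  // Slow path: runs outside the request and drains the queue in order.
  async drain() {
    while (this.queue.length > 0) {
      const order = this.queue.shift();
      try {
        const result = await this.submitToExchange(order);
        this.notify({ type: 'order_update', order, status: 'SUBMITTED', result });
      } catch (err) {
        this.notify({ type: 'order_update', order, status: 'FAILED', reason: err.message });
      }
    }
  }
}
```

The HTTP handler only ever calls `accept()`, so its latency no longer depends on the exchange. The `order_update` events match the message type the client's `handleMessage` already switches on.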

Key Takeaways

  • WebSocket + Redis Pub/Sub is the backbone for real-time price distribution across multiple servers
  • Separate your services: order execution, market data, portfolio, and auth have very different scaling and consistency requirements
  • Cache aggressively but carefully: stock prices, yes (2s TTL); account balances, never
  • Design for failure: exchanges go down, connections drop, databases lag — graceful degradation beats crashing
  • Use the right database for the job: PostgreSQL for transactional data, TimescaleDB for time-series, Redis for hot data
  • Scale horizontally with sticky-session load balancing, using Redis as the broadcast bus between servers
  • Latency budget: know where every millisecond goes in your order flow

Connect with Me

If you found this useful, I write deep-dive system design posts and visual explainers every week.

Next post: "The Node.js Event Loop — Finally Explained Right" — where I break down the event loop with animated diagrams and show exactly why setTimeout(fn, 0) doesn't mean what you think it does.

Drop a comment if you have questions about any part of this architecture — happy to go deeper on any component.
