Lycore Development

Posted on May 22

How to Build a Trading Platform: Architecture, Features, and the Hard Engineering Problems

#architecture #backend #softwareengineering #systemdesign

Why Trading Platforms Are Among the Hardest Software to Build

Most software has a generous margin for error. A bug in your e-commerce checkout means a failed transaction — annoying, recoverable. A bug in your trading platform's order matching engine means incorrect executions, real financial losses, and potentially regulatory consequences. The gap between "it works" and "it works correctly under all market conditions" is wider in trading software than almost anywhere else.

I've spent time building and reviewing trading platforms across retail brokerage, institutional execution, and DeFi. This post is a practical engineering guide: the architecture decisions that matter, the features you can't cut corners on, and the failure modes that will bite you if you're not prepared.

This is not financial advice, and building a regulated trading platform requires legal and compliance expertise beyond the scope of any engineering post. What this covers is the engineering substance of the problem.

The Core Components Every Trading Platform Needs

1. Order Management System (OMS)

The OMS is the heart of the platform. It receives orders from users, validates them, routes them for execution, tracks their lifecycle, and reconciles the results. Every other component interacts with it.

Key requirements:

Idempotency: Order submission must be idempotent. Network timeouts are common; if a user retries a submission, you must not create duplicate orders.
State machine correctness: An order has a defined lifecycle (pending → submitted → partially filled → filled, or pending → cancelled, etc.). Transitions must be atomic and auditable.
Audit trail: Every state change, every modification, every cancellation must be logged with timestamp, actor, and reason. This is not optional in any regulated context.

from enum import Enum
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
import uuid


def _utc_now() -> datetime:
    # Timezone-aware UTC; datetime.utcnow() is naive and deprecated since Python 3.12.
    return datetime.now(timezone.utc)

class OrderStatus(str, Enum):
    PENDING = "pending"
    SUBMITTED = "submitted"
    PARTIALLY_FILLED = "partially_filled"
    FILLED = "filled"
    CANCELLED = "cancelled"
    REJECTED = "rejected"
    EXPIRED = "expired"

class OrderSide(str, Enum):
    BUY = "buy"
    SELL = "sell"

class OrderType(str, Enum):
    MARKET = "market"
    LIMIT = "limit"
    STOP = "stop"
    STOP_LIMIT = "stop_limit"

@dataclass
class Order:
    user_id: str
    symbol: str
    side: OrderSide
    order_type: OrderType
    # float keeps the example short; use decimal.Decimal for quantities and
    # prices in production to avoid rounding errors.
    quantity: float
    limit_price: Optional[float] = None
    stop_price: Optional[float] = None

    # System-managed fields
    order_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    client_order_id: Optional[str] = None  # Idempotency key from client
    status: OrderStatus = OrderStatus.PENDING
    filled_quantity: float = 0.0
    average_fill_price: Optional[float] = None
    created_at: datetime = field(default_factory=_utc_now)
    updated_at: datetime = field(default_factory=_utc_now)

    def validate(self) -> list[str]:
        """Validate order before submission. Returns list of error messages."""
        errors = []

        if self.quantity <= 0:
            errors.append("Quantity must be positive")

        if self.order_type in (OrderType.LIMIT, OrderType.STOP_LIMIT):
            if self.limit_price is None or self.limit_price <= 0:
                errors.append("Limit price required and must be positive")

        if self.order_type in (OrderType.STOP, OrderType.STOP_LIMIT):
            if self.stop_price is None or self.stop_price <= 0:
                errors.append("Stop price required and must be positive")

        return errors

    def can_transition_to(self, new_status: OrderStatus) -> bool:
        """Enforce valid state machine transitions."""
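        # Terminal states (FILLED, CANCELLED, REJECTED, EXPIRED) have no
        # entry here, so no transition out of them is ever valid.
        valid_transitions = {
            OrderStatus.PENDING: {
                OrderStatus.SUBMITTED, OrderStatus.REJECTED,
                OrderStatus.CANCELLED
            },
            OrderStatus.SUBMITTED: {
                OrderStatus.PARTIALLY_FILLED, OrderStatus.FILLED,
                OrderStatus.CANCELLED, OrderStatus.EXPIRED
            },
            OrderStatus.PARTIALLY_FILLED: {
                OrderStatus.FILLED, OrderStatus.CANCELLED
            },
        }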
        return new_status in valid_transitions.get(self.status, set())


class OrderManagementSystem:

    def __init__(self, db, risk_engine, execution_router, audit_log):
        self.db = db
        self.risk = risk_engine
        self.router = execution_router
        self.audit = audit_log

    def submit_order(self, order: Order) -> dict:
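        # Idempotency check. In production, back this with a unique constraint
        # on the idempotency key so concurrent retries cannot both pass.
        if order.client_order_id:
            existing = self.db.find_by_client_order_id(order.client_order_id)
            if existing:
                return {"status": "duplicate", "order_id": existing.order_id}

        # Validation. Rejections are persisted and audited like any other
        # state change.
        errors = order.validate()
        if errors:
            order.status = OrderStatus.REJECTED
            self.db.save(order)
            self.audit.log("order_rejected", order, reason="; ".join(errors))
            return {"status": "rejected", "errors": errors}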

        # Pre-trade risk checks
        risk_result = self.risk.check(order)
        if not risk_result.approved:
            order.status = OrderStatus.REJECTED
            self.db.save(order)
            self.audit.log("order_rejected", order, reason=risk_result.reason)
            return {"status": "rejected", "reason": risk_result.reason}

        # Submit
        order.status = OrderStatus.SUBMITTED
        self.db.save(order)
        self.audit.log("order_submitted", order)

        # Route to execution (async in production)
        self.router.route(order)

        return {"status": "submitted", "order_id": order.order_id}
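
The idempotency check in submit_order is a read followed by a write, so two concurrent retries could in principle both pass it. One common way to close that gap is to let the database enforce uniqueness. The following is a minimal sketch using SQLite from the standard library; the table layout and the insert_order_idempotent helper are illustrative assumptions, not part of the OMS above:

```python
import sqlite3
import uuid

def create_schema(conn: sqlite3.Connection) -> None:
    # The UNIQUE constraint is what actually guarantees idempotency.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS orders (
            order_id        TEXT PRIMARY KEY,
            user_id         TEXT NOT NULL,
            client_order_id TEXT NOT NULL,
            symbol          TEXT NOT NULL,
            quantity        TEXT NOT NULL,
            UNIQUE (user_id, client_order_id)
        )
        """
    )

def insert_order_idempotent(conn, user_id, client_order_id, symbol, quantity):
    """Insert an order, or return the existing one for a retried key.

    Returns a tuple (order_id, created).
    """
    order_id = str(uuid.uuid4())
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
                (order_id, user_id, client_order_id, symbol, str(quantity)),
            )
        return order_id, True
    except sqlite3.IntegrityError:
        # Another request with the same key won the race; return its order.
        row = conn.execute(
            "SELECT order_id FROM orders "
            "WHERE user_id = ? AND client_order_id = ?",
            (user_id, client_order_id),
        ).fetchone()
        return row[0], False
```

Scoping the key per user, as assumed here, keeps one client's idempotency keys from colliding with another's.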

2. Market Data Infrastructure

Your platform needs real-time market data: current prices, order book depth, trade history, and historical data for charts. This is harder than it looks because:

Volume is high: A single liquid equity can generate thousands of price updates per second
Latency matters: Stale prices cause bad user decisions and, in some architectures, bad executions
Data quality matters: Bad ticks (erroneous price prints) need to be filtered
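
As a rough illustration of bad-tick filtering, the sketch below rejects prints that deviate too far from the median of recent accepted prices. The class name, window size, and 10% threshold are assumptions chosen for the example; real feeds usually need per-symbol tuning:

```python
from collections import deque
from statistics import median

class BadTickFilter:
    """Rejects prints that deviate too far from the recent median price."""

    def __init__(self, window: int = 20, max_deviation: float = 0.10,
                 min_history: int = 5):
        self.window = deque(maxlen=window)
        self.max_deviation = max_deviation
        self.min_history = min_history

    def accept(self, price: float) -> bool:
        if price <= 0:
            return False
        # Only judge once there is some history to compare against.
        if len(self.window) >= self.min_history:
            reference = median(self.window)
            if abs(price - reference) / reference > self.max_deviation:
                return False  # rejected ticks never enter the window
        self.window.append(price)
        return True
```

Because rejected ticks never enter the window, a genuine price gap (for example after a trading halt) would be rejected indefinitely, so a production filter also needs a reset rule.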

The architecture decision is whether to build your own market data pipeline or use a managed provider. For most platforms, managed providers (Polygon.io, Alpaca, Interactive Brokers data feeds) are the right answer — the engineering investment in a production-grade market data system is substantial and the differentiation is minimal.

When you do need to build your own data handling layer, a time-series database is essential. TimescaleDB (a Postgres extension) handles most use cases well and, if you already run Postgres, avoids adding a separate database to operate:

-- TimescaleDB hypertable for OHLCV data
CREATE TABLE ohlcv (
    time        TIMESTAMPTZ NOT NULL,
    symbol      TEXT NOT NULL,
    open        NUMERIC(18, 8) NOT NULL,
    high        NUMERIC(18, 8) NOT NULL,
    low         NUMERIC(18, 8) NOT NULL,
    close       NUMERIC(18, 8) NOT NULL,
    volume      NUMERIC(24, 8) NOT NULL
);

SELECT create_hypertable('ohlcv', 'time');
CREATE INDEX ON ohlcv (symbol, time DESC);

-- Continuous aggregate: roll the finer-grained bars in ohlcv up into 1-hour candles
CREATE MATERIALIZED VIEW ohlcv_1h
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    symbol,
    first(open, time) AS open,
    max(high) AS high,
    min(low) AS low,
    last(close, time) AS close,
    sum(volume) AS volume
FROM ohlcv
GROUP BY bucket, symbol;

3. Risk Engine

The risk engine sits between order submission and execution. It enforces position limits, buying power constraints, and market risk parameters. It is not optional.

Pre-trade risk checks for a retail platform typically include:

Buying power: Does the user have sufficient funds/margin to cover this order?
Position limits: Would this order exceed maximum allowed position size per symbol?
Order size limits: Is this order unreasonably large (potential fat-finger error)?
Market hours: Is this market currently open for the order type being submitted?
Symbol restrictions: Is this symbol available for trading on this platform?

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RiskCheckResult:
    approved: bool
    reason: Optional[str] = None
    warnings: list = field(default_factory=list)

class PreTradeRiskEngine:

    def __init__(self, account_service, position_service, config):
        self.accounts = account_service
        self.positions = position_service
        self.config = config

    def check(self, order: Order) -> RiskCheckResult:
        account = self.accounts.get(order.user_id)

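        # Buying power check. Applies to buys only: a sell that closes a long
        # position does not consume cash. Short-sale margin is not modelled here.
        if order.side == OrderSide.BUY:
            estimated_cost = self._estimate_order_cost(order)
            if account.available_cash < estimated_cost:
                return RiskCheckResult(
                    approved=False,
                    reason=f"Insufficient buying power. Required: {estimated_cost:.2f}, Available: {account.available_cash:.2f}"
                )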

        # Position limit check
        current_position = self.positions.get(order.user_id, order.symbol)
        new_position = current_position.quantity + (
            order.quantity if order.side == OrderSide.BUY else -order.quantity
        )

        max_position = self.config.get_max_position(order.symbol, account.tier)
        if abs(new_position) > max_position:
            return RiskCheckResult(
                approved=False,
                reason=f"Order would exceed maximum position limit of {max_position} for {order.symbol}"
            )

        # Fat finger check
        if order.quantity > self.config.fat_finger_threshold:
            return RiskCheckResult(
                approved=False,
                reason=f"Order size {order.quantity} exceeds maximum single order size {self.config.fat_finger_threshold}"
            )

        return RiskCheckResult(approved=True)

    def _estimate_order_cost(self, order: Order) -> float:
        # A buy with a limit price cannot cost more than quantity * limit_price
        if order.order_type in (OrderType.LIMIT, OrderType.STOP_LIMIT) and order.limit_price:
            return order.quantity * order.limit_price
        # For market and stop orders, use last price with a buffer
        last_price = self.positions.get_last_price(order.symbol)
        return order.quantity * last_price * 1.02  # 2% buffer for slippage
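
The engine above leaves out two of the checks listed earlier: market hours and symbol restrictions. A hedged sketch of how they might look follows. The session times assume US equities, holidays and half days are not handled, and the function names are hypothetical:

```python
from datetime import datetime, time, timedelta, timezone, tzinfo
from typing import Optional

MARKET_OPEN = time(9, 30)   # assumption: US equities regular session
MARKET_CLOSE = time(16, 0)

# Fixed UTC-5 offset for the demo only. In production pass
# zoneinfo.ZoneInfo("America/New_York") so daylight saving is handled.
EST_FIXED = timezone(timedelta(hours=-5))

def is_market_open(now: datetime, exchange_tz: tzinfo) -> bool:
    """Regular-session check. `now` must be timezone-aware."""
    local = now.astimezone(exchange_tz)
    if local.weekday() >= 5:  # Saturday or Sunday
        return False
    return MARKET_OPEN <= local.time() < MARKET_CLOSE

def check_symbol_and_hours(symbol: str, now: datetime,
                           tradable_symbols: set,
                           exchange_tz: tzinfo = EST_FIXED) -> Optional[str]:
    """Returns a rejection reason, or None if both checks pass."""
    if symbol not in tradable_symbols:
        return f"{symbol} is not available for trading on this platform"
    if not is_market_open(now, exchange_tz):
        return "Market is closed"
    return None
```

Returning a reason string maps directly onto the reason field of RiskCheckResult.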

4. Real-Time Portfolio and P&L

Users need to see their current positions, unrealised P&L, and account value in real time. This is a read-heavy workload that benefits from a separate read model updated by the execution feed.

WebSocket connections are the standard for pushing portfolio updates to frontend clients. The architecture: execution fills update a portfolio state store (Redis works well here for latency), and a WebSocket gateway pushes diffs to connected clients.
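
To make the read model concrete, here is a minimal sketch of the position state that execution fills might update. It assumes average-cost accounting and long-only positions; the Position class and its method names are illustrative, not a prescribed design:

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class Position:
    quantity: Decimal = Decimal("0")
    avg_cost: Decimal = Decimal("0")
    realised_pnl: Decimal = Decimal("0")

    def apply_fill(self, side: str, qty: Decimal, price: Decimal) -> None:
        """Long-only sketch: buys move the average cost, sells realise P&L."""
        if side == "buy":
            total_cost = self.avg_cost * self.quantity + price * qty
            self.quantity += qty
            self.avg_cost = total_cost / self.quantity
        else:
            if qty > self.quantity:
                raise ValueError("short selling is not modelled in this sketch")
            self.realised_pnl += (price - self.avg_cost) * qty
            self.quantity -= qty
            if self.quantity == 0:
                self.avg_cost = Decimal("0")

    def unrealised_pnl(self, mark_price: Decimal) -> Decimal:
        # Mark-to-market against the latest price from the data feed.
        return (mark_price - self.avg_cost) * self.quantity
```

A gateway built on this would recompute unrealised_pnl whenever a fill or a price update arrives and push only the changed fields to connected clients.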

The Features You Cannot Cut Corners On

Order History and Statements

Every trade must be recorded and retrievable. Users need complete trade history for tax purposes. Regulators need it for compliance purposes. Your operations team needs it for reconciliation.

This means: immutable trade records, complete audit trails, export capabilities (CSV at minimum), and retention policies that meet your regulatory requirements. Retention periods for financial records commonly fall in the five-to-seven-year range, but the exact figure depends on the jurisdiction and the type of record.
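
One possible way to make trade records tamper-evident is to chain each entry to the hash of the previous one, so any later modification is detectable. This is a sketch of that idea only, with an in-memory list standing in for durable storage; the AppendOnlyTradeLog class is a hypothetical name:

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry

class AppendOnlyTradeLog:
    def __init__(self):
        self._entries = []

    def append(self, record: dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        # sort_keys gives a stable serialisation, so the hash is reproducible
        payload = json.dumps(record, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        self._entries.append(
            {"prev": prev_hash, "payload": payload, "hash": entry_hash}
        )
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; False means some entry was altered."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            expected = hashlib.sha256(
                (prev_hash + entry["payload"]).encode()
            ).hexdigest()
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True
```

A hash chain detects tampering but does not prevent it; database permissions and write-once storage still matter.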

Account Security

Trading accounts are high-value targets. The security requirements go beyond standard web application security:

MFA mandatory, not optional: TOTP or a hardware key where possible; SMS is the weakest option because of SIM-swap attacks
Session management: Short session timeouts, concurrent session detection, geographic anomaly alerts
Withdrawal address whitelisting: For crypto platforms, withdrawals only to pre-approved addresses
Transaction monitoring: Flag unusual patterns — unusually large trades, trading at unusual hours, rapid position changes

Reconciliation

End-of-day reconciliation between your internal records and your execution venue records is not optional. Discrepancies exist — execution venues make mistakes, network issues cause message loss, edge cases in your OMS create inconsistencies. Daily automated reconciliation with exception alerting catches these before they compound.
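
The core of a reconciliation job is a keyed comparison of two sets of records. A minimal sketch, assuming both sides can be reduced to a mapping from execution ID to (symbol, quantity, price); the function and field names are illustrative:

```python
from decimal import Decimal

def reconcile(internal: dict, venue: dict) -> dict:
    """Compare internal fills against the venue's report.

    Each input maps execution_id -> (symbol, quantity, price).
    """
    internal_ids, venue_ids = set(internal), set(venue)
    return {
        # The venue reports a fill we never recorded (e.g. a lost message)
        "missing_internally": sorted(venue_ids - internal_ids),
        # We recorded a fill the venue does not know about
        "missing_at_venue": sorted(internal_ids - venue_ids),
        # Both sides have the fill but disagree on its details
        "mismatched": sorted(
            exec_id for exec_id in internal_ids & venue_ids
            if internal[exec_id] != venue[exec_id]
        ),
    }
```

Any non-empty list in the result is an exception that should raise an alert rather than be corrected silently.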

The Infrastructure Reality

A trading platform is not a typical web application. The requirements that differentiate it:

Latency: Order submission to acknowledgement needs to be fast — users notice delays above 200ms, and anything above 1 second creates trust issues. This means database query optimisation, connection pooling, and careful attention to your critical path.

Reliability: Trading platforms need 99.9%+ uptime during market hours. Planned maintenance windows need to be outside market hours. Unplanned outages during high-volatility market sessions are severe reputational events.

Consistency over availability: When a network partition forces you to choose between availability and consistency, trading platforms choose consistency. It is better to reject an order than to create an inconsistent state.

Disaster recovery: You need point-in-time recovery for your trade database, tested regularly. RTO (recovery time objective) and RPO (recovery point objective) need to be defined and designed for before you go live.
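
In code, choosing consistency over availability often amounts to failing closed: if account state cannot be confirmed, the order is rejected rather than accepted optimistically. A small sketch, where fetch_buying_power is a hypothetical callable standing in for an account service client:

```python
def submit_fail_closed(order: dict, fetch_buying_power) -> dict:
    """Reject the order unless buying power can be confirmed right now."""
    try:
        buying_power = fetch_buying_power(order["user_id"])
    except (TimeoutError, ConnectionError):
        # Fail closed: an unreachable dependency means "no", never "probably yes"
        return {"status": "rejected", "reason": "account state unavailable"}

    if buying_power < order["cost"]:
        return {"status": "rejected", "reason": "insufficient buying power"}
    return {"status": "accepted"}
```

The user sees a rejection they can retry, which is easier to recover from than an order accepted against stale balances.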

For teams building fintech and trading infrastructure, our team at Lycore has hands-on experience with the full stack — from order management systems to real-time market data pipelines to regulatory reporting. The complexity is significant but manageable with the right architecture from the start.

What Most Teams Get Wrong

Starting with the UI: The beautiful trading interface is the last thing to build, not the first. The OMS, risk engine, and execution connectivity need to be solid before the front end matters.

Underestimating reconciliation: Teams consistently underinvest in reconciliation infrastructure and spend months retrofitting it after launch. Build it in from day one.

Ignoring the operational side: A trading platform needs a full operational runbook, clear escalation paths for execution issues, and relationships with your execution venues' technical support teams. You will have incidents. Being prepared for them is the difference between a recoverable situation and a crisis.

Not testing failure modes: Test what happens when your execution venue connection drops mid-order. Test what happens when the market data feed goes stale. Test what happens when your database primary fails over. These scenarios will occur in production.
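
Failure modes become much easier to test when time is injectable. As one example, here is a sketch of a stale-feed guard with a fake clock; the StalenessGuard and FakeClock names and the five-second threshold are assumptions for illustration:

```python
import time

class StalenessGuard:
    """Tracks the last market data tick and reports when the feed goes quiet."""

    def __init__(self, max_age_seconds: float, clock=time.monotonic):
        self.max_age = max_age_seconds
        self.clock = clock
        self.last_tick_at = None

    def on_tick(self) -> None:
        self.last_tick_at = self.clock()

    def is_stale(self) -> bool:
        if self.last_tick_at is None:
            return True  # no data yet counts as stale
        return self.clock() - self.last_tick_at > self.max_age

class FakeClock:
    """Test double: time only moves when the test advances it."""

    def __init__(self):
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds

# Simulate a feed that stops updating
clock = FakeClock()
guard = StalenessGuard(max_age_seconds=5.0, clock=clock)
guard.on_tick()
clock.advance(3.0)
fresh_after_3s = guard.is_stale()   # still within the 5-second budget
clock.advance(3.0)
stale_after_6s = guard.is_stale()   # 6 seconds without a tick
```

The same pattern of injecting the clock or the connection makes dropped venue connections and database failovers reproducible in tests.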

Building something in the fintech or trading space? I'm happy to discuss architecture in the comments — the specifics vary a lot by asset class, regulatory jurisdiction, and execution model.
