DEV Community: Tapesh Chandra Das

TickForge: a low-latency C++20 market data & order execution engine published: false tags: cpp, systemsprogramming, quant, lowlatency

Tapesh Chandra Das — Mon, 06 Jul 2026 20:36:49 +0000

Why I'm building this

Most "low-latency trading" writeups stay at the conceptual level — kernel bypass, lock-free queues, cache-friendly memory layouts — without much code you can actually run, break, and rebuild. I wanted the real thing, or at least a serious foundation of it, so I started TickForge: a C++20 market data and order execution engine, built to understand the architecture behind low-latency trading systems by actually implementing it.

Repo: github.com/td-02/DPDKTrade

This is a foundation-stage project, not a finished trading system — and I want to be upfront about that. The goal right now is a clean, extensible architecture, not a feature checklist.

What's built so far

Fixed-depth L2 order book — bounded memory, deterministic behavior, no dynamic growth on the hot path
Custom 62-byte wire frame for market data and order events — fixed-size, no serialization overhead
Imbalance-based strategy signal — a simple first strategy to validate the book → strategy → risk pipeline end to end
Pre-trade risk guard — a gate between signal generation and order dispatch
Engine layer wiring book, strategy, and risk together into one control flow
Transport scaffolding for both AF_PACKET (baseline) and DPDK-ring (kernel-bypass target), with an initial ring benchmark

Design principles

A few constraints I set for myself up front, because they're the ones that actually matter in this domain:

No dynamic allocation on the hot path. Anything that runs per-tick or per-order needs to avoid malloc/new entirely.
Fixed-size, predictable data structures. The order book and wire protocol are both bounded and pre-allocated rather than growing dynamically.
Modern C++20 throughout, with modern CMake for the build — no legacy build cruft.
AF_PACKET first, DPDK second. Rather than jumping straight to DPDK, I wanted a normal-socket baseline in place first, so the eventual kernel-bypass path has something concrete to benchmark against.

Currently built and tested on WSL2 (Ubuntu 24.04) with GCC/Clang + Ninja.

What's next

Fill out the DPDK-ring transport path and get it talking to the rest of the engine
Proper benchmarking: DPDK-ring vs. AF_PACKET, with real latency histograms rather than a single average
Expand the strategy layer beyond the initial imbalance signal
Docker-based build for portability beyond WSL2

Why I'm sharing this now, mid-build

I'd rather post the architecture while it's still small enough to review in one sitting than wait until it's "done" (it never really is). If you've built order books, market data handlers, or kernel-bypass networking code professionally, I'd genuinely like structural feedback — what would you have done differently at this stage, before more gets built on top of it?

Repo, again: github.com/td-02/DPDKTrade

I Built a Fully Local, AI-Native Hedge Fund System (Multi-Agent, Auditable, No Paid APIs)

Tapesh Chandra Das — Thu, 16 Apr 2026 20:38:41 +0000

Most “AI trading projects” fall into one of three categories:

notebook experiments
single-model pipelines
or black-box systems with no observability

They don’t resemble real trading systems.

So I built something closer to production reality:

A free, portable, AI-native hedge fund prototype with:

multi-agent decision making
backtesting + paper execution
full audit infrastructure
and zero paid API dependencies

Project:
https://github.com/td-02/ai-native-hedge-fund

What this actually is

This is not just a model.

It’s a complete trading runtime with:

data ingestion
research layer
strategy ensemble
risk management
execution system
audit + tracing

All wired together into a single pipeline.

From the README:

“A production-grade multi-agent trading system with backtesting, paper execution, and full audit infrastructure. No paid APIs required.”

Why I built this

The gap is simple:

Most people optimize models.
Real systems fail on integration, control, and visibility.

This project focuses on:

system design over isolated intelligence
traceability over black-box outputs
reliability over demos
Architecture (high-level)

The system runs as a structured pipeline:

Data Ingest
→ Data Quality Gate
→ Research Agent
→ Strategy Ensemble
→ Overlays (alpha / arbitrage / macro)
→ Fund Manager
→ Risk Manager
→ Execution Controls
→ Broker Router
→ Audit + Tracing

There’s also a more advanced AI-native v2 layer with:

regime-aware routing
AI forecast calibration
benchmark-relative optimization
“no-harm” guards

This is closer to how institutional systems evolve over time.

Multi-agent system (core idea)

Instead of one model making decisions, the system uses:

Research & Data
market data ingestion (yfinance)
deterministic + LLM-based research
multi-agent research council
Strategy Layer
trend following
mean reversion
volatility carry
event-driven strategies
alpha signals (earnings, volume, options proxies, etc.)
Risk & Execution
portfolio aggregation
VaR / ES constraints
drawdown brakes
TWAP/VWAP-style execution
broker failover

Each component is independent but coordinated.

What makes this different

Fully free stack yfinance for data Ollama for local LLMs Alpaca for paper execution

No paid APIs required.

Auditability by design

Every decision is:

logged
traceable
reproducible

Artifacts include:

audit logs in Postgres
decision snapshots
TraceLM execution traces
heartbeat monitoring

This is rarely implemented properly in open projects.

Production-style reliability

The system includes:

circuit breakers per stage
retries and degraded modes
data quality validation
dead-man heartbeat

These are not typical in hobby projects, but essential in real systems.

Runs anywhere

You can deploy it on:

Docker
Oracle Always Free
Render / Railway
GitHub Actions (scheduled execution)

This makes it practical, not just experimental.

Backtest performance (honest view)

The current strategy (momentum + trend, ETF universe):

Sharpe: ~0.61
CAGR: ~7.6%
Max drawdown: ~-25.8%

Comparable to SPY in some regimes, worse in others.

This is intentional.

The goal here is not “alpha marketing” — it’s building the system correctly first.

Observability (critical but ignored in most projects)

This system includes:

TraceLM-based execution tracing
structured audit logs
decision-level introspection

You can answer:

why a trade happened
which agent contributed
what constraints were applied

Without this, scaling any AI system becomes guesswork.

Enterprise-style runtime (beyond toy setups)

The system already includes:

FastAPI service layer
Celery worker queues (research / strategy / execution)
Redis + Postgres/TimescaleDB
feature flags
metrics (Prometheus)

This is closer to a distributed system than a script.

What’s still missing

This is not production-ready.

Key gaps:

stronger portfolio optimization
execution realism (slippage, liquidity impact)
better regime detection (HMM, change-point models)
more robust ML layers (RL, transformers)
real capital deployment safeguards

These are hard problems and intentionally not abstracted away.

Who this is for
ML engineers moving into finance
developers interested in multi-agent systems
early quant developers
people tired of toy AI demos
Key lessons from building this
Systems matter more than models
Observability is non-negotiable
Reliability is harder than intelligence
Finance is a coordination problem, not just prediction
Final note

Most AI projects optimize for:

demos
benchmarks
or isolated components

This project tries to optimize for:

structure
traceability
and real system constraints

If you’re building in AI + systems + finance, this is the direction that actually compounds.

I built a replay testing tool for MCP servers — here's why and how it works

Tapesh Chandra Das — Wed, 01 Apr 2026 18:49:55 +0000

When your AI agent does something unexpected, where do you look?

For most teams right now: stderr noise, missing logs, or vendor black boxes. The execution path disappears, you have no idea what the agent actually sent to the tool, and there's no way to reproduce the failure in a test.

I kept hitting this wall while building MCP agents, so I built mcpscope — an open source observability and replay testing layer for MCP servers.

The problem

MCP (Model Context Protocol) is becoming the standard way AI agents call external tools. But the tooling around it is still catching up. When something goes wrong in production:

There's no standard trace format for MCP traffic
Tool call failures vanish into stderr with no context
Schema changes on upstream servers break your agent silently
There's no way to reproduce a production failure in a test environment

This is the gap mcpscope fills.

How it works

mcpscope is a transparent proxy. You point it at your MCP server and it intercepts every JSON-RPC message — recording requests, responses, latency, and errors — without changing a single line in your server.

go install github.com/td-02/mcp-observer@latest
mcpscope proxy --server ./your-mcp-server --db traces.db

Open http://localhost:4444 and you have a live dashboard showing every tool call, with P50/P95/P99 latency histograms and error timelines.

For Python servers:

mcpscope proxy -- uv run server.py

For HTTP MCP servers:

mcpscope proxy --transport http --upstream-url http://127.0.0.1:8080

The feature I'm most excited about: replay

This is the part I haven't seen in any other MCP tooling.

Once mcpscope has recorded your production traces, you can export and replay them against your server in CI:

# Export real production traces
mcpscope export --config ./mcpscope.example.json --output traces.json --limit 200

# Replay in CI — fail on errors or latency regressions
mcpscope replay --input traces.json --fail-on-error --max-latency-ms 500 -- uv run server.py

Record in prod. Replay in CI. Catch regressions before they reach your agent.

This unlocks a workflow that wasn't possible before: take a session where your agent behaved unexpectedly, export the exact traces, and turn them into a reproducible test case. No more "it only happens in production."

Schema drift in CI

The other thing that kept biting me: upstream MCP servers changing their tool schemas without warning, silently breaking my agent.

# Capture baseline
mcpscope snapshot --server ./your-mcp-server --output baseline.json
git add baseline.json && git commit -m "chore: add MCP baseline snapshot"

# On every PR:
mcpscope snapshot --server ./your-mcp-server --output current.json
mcpscope diff baseline.json current.json --exit-code

The --exit-code flag makes it CI-friendly — exits non-zero on breaking changes so your PR check fails before the change reaches your agent. There's a GitHub Actions example in the repo.

Everything else in v0.1.0

Live web dashboard — tool call feed, latency percentile views, error timelines
Alerts — Slack, PagerDuty, or any webhook
OpenTelemetry export — plugs into Grafana or Jaeger via OTLP gRPC
SQLite trace store — local by default, Postgres-ready, configurable retention
Workspace + environment scoping — prod vs staging
Docker + Docker Compose included

Why open source and MIT

Tool call data can contain sensitive information. I wanted something that keeps traces local by default, plugs into the stack you already have, and can run in air-gapped environments. No telemetry, MIT licensed.

What's next

Per-team budget enforcement, audit log export (CSV and JSON), and a hosted cloud version are on the roadmap.

But right now I'm most interested in hearing from people building MCP agents — what are you running into that mcpscope doesn't solve yet?

Repo: https://github.com/td-02/mcp-observer