Len

Posted on Jun 4

A Gym-style API for algorithmic trading research, in Rust

#devchallenge #githubchallenge #rust #githubcopilot

GitHub “Finish-Up-A-Thon” Challenge Submission

This is a submission for the GitHub Finish-Up-A-Thon Challenge

What I Built

Chapaty is an open-source Rust backtesting framework with a Gym-style reset / step / act API for algorithmic trading. Strategy logic lives in a single act function. Order execution, the matching engine, data sync, and reporting sit behind the simulation environment. A 400-point parameter grid over 9 years of end-of-day market data runs in ~1 second on an 8-core laptop.
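To make the reset / step / act idea concrete, here is a minimal sketch of such a loop. It is an illustration only: the names Observation, Action, Agent, ToyEnv, ThresholdAgent, and run_episode are hypothetical and are not Chapaty's actual API, and the toy environment replaces the real matching engine with a single long-or-flat position.

```rust
// Hypothetical sketch of a Gym-style loop. None of these names are Chapaty's API.

/// What the agent sees at each step: here just the latest close price.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Observation {
    pub close: f64,
}

/// The decisions an agent can return from `act`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Hold,
    Buy,
    Sell,
}

/// Strategy logic lives in a single `act` function.
pub trait Agent {
    fn act(&mut self, obs: &Observation) -> Action;
}

/// Toy environment: replays a price series and tracks one unit position.
pub struct ToyEnv {
    prices: Vec<f64>,
    cursor: usize,
    position: f64, // 1.0 = long one unit, 0.0 = flat
}

impl ToyEnv {
    pub fn new(prices: Vec<f64>) -> Self {
        assert!(prices.len() >= 2, "need at least two prices to take a step");
        Self { prices, cursor: 0, position: 0.0 }
    }

    /// Rewind to the first bar and return the initial observation.
    pub fn reset(&mut self) -> Observation {
        self.cursor = 0;
        self.position = 0.0;
        Observation { close: self.prices[0] }
    }

    /// Apply the action, advance one bar, and return (observation, reward, done).
    /// The reward is the mark-to-market PnL of the position over that bar.
    pub fn step(&mut self, action: Action) -> (Observation, f64, bool) {
        match action {
            Action::Buy => self.position = 1.0,
            Action::Sell => self.position = 0.0,
            Action::Hold => {}
        }
        let prev = self.prices[self.cursor];
        self.cursor += 1;
        let close = self.prices[self.cursor];
        let reward = self.position * (close - prev);
        let done = self.cursor + 1 >= self.prices.len();
        (Observation { close }, reward, done)
    }
}

/// Example agent: go long below one price level, go flat above another.
pub struct ThresholdAgent {
    pub buy_below: f64,
    pub sell_above: f64,
}

impl Agent for ThresholdAgent {
    fn act(&mut self, obs: &Observation) -> Action {
        if obs.close < self.buy_below {
            Action::Buy
        } else if obs.close > self.sell_above {
            Action::Sell
        } else {
            Action::Hold
        }
    }
}

/// Drive one episode: reset once, then alternate act and step until done.
pub fn run_episode(env: &mut ToyEnv, agent: &mut impl Agent) -> f64 {
    let mut obs = env.reset();
    let mut total = 0.0;
    loop {
        let action = agent.act(&obs);
        let (next, reward, done) = env.step(action);
        total += reward;
        obs = next;
        if done {
            break;
        }
    }
    total
}
```

The point of the split is that run_episode never changes when the strategy does: a new idea only needs a new Agent implementation.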

To speed up the Build-Measure-Learn loop for algorithmic trading strategies, we need two things:

An accessible framework that people are already comfortable with and that is well-established: Gym.
A fast programming language with fearless concurrency, so we can natively run many different strategies in parallel: Rust.
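As a rough sketch of the second point, the block below evaluates a parameter grid in parallel using only the standard library. Everything in it is an assumption for illustration: Params, grid, backtest, and run_grid are hypothetical names, the backtest is a toy momentum rule standing in for a real simulation, and this is not how Chapaty itself schedules work.

```rust
use std::ops::Range;
use std::thread;

/// One point in a hypothetical two-dimensional parameter grid.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Params {
    pub lookback: usize,
    pub min_move: u32,
}

/// Build the full grid: every (lookback, min_move) combination.
pub fn grid(lookbacks: Range<usize>, min_moves: Range<u32>) -> Vec<Params> {
    lookbacks
        .flat_map(|lookback| {
            min_moves.clone().map(move |min_move| Params { lookback, min_move })
        })
        .collect()
}

/// Stand-in for a backtest: a pure function of the parameters and shared data.
/// Toy rule: be long over the next bar whenever the price rose by more than
/// `min_move` over the last `lookback` bars.
pub fn backtest(p: Params, prices: &[f64]) -> f64 {
    let mut pnl = 0.0;
    for i in p.lookback..prices.len().saturating_sub(1) {
        if prices[i] - prices[i - p.lookback] > p.min_move as f64 {
            pnl += prices[i + 1] - prices[i];
        }
    }
    pnl
}

/// Split the grid into contiguous chunks and evaluate each chunk on its own
/// scoped thread. Each thread writes into a disjoint slice of `results`, so
/// no locks are needed and the output order matches the input order.
pub fn run_grid(params: &[Params], prices: &[f64], workers: usize) -> Vec<f64> {
    let workers = workers.max(1);
    let chunk = ((params.len() + workers - 1) / workers).max(1);
    let mut results = vec![0.0; params.len()];
    thread::scope(|s| {
        for (ps, out) in params.chunks(chunk).zip(results.chunks_mut(chunk)) {
            s.spawn(move || {
                for (p, slot) in ps.iter().zip(out.iter_mut()) {
                    *slot = backtest(*p, prices);
                }
            });
        }
    });
    results
}
```

Because the price data is only borrowed immutably and every thread owns its own output slice, the compiler rejects any accidental sharing of mutable state. Since each grid point is computed by the same pure function in both cases, the parallel results in this sketch are identical to a sequential run.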

The motivation for Chapaty was to unify these worlds in an open-source project, so that everyone has access to a framework for developing algorithms without having to focus on infrastructure.

Demo

End-to-end workflow: Running make run in the chapaty-template project executes the example strategy from this blog post and produces a QuantStats tearsheet.

The template repo has a make run quick-start that reproduces the demo above. It also ships with prompt templates for LLM-assisted development with GitHub Copilot.

Chapaty Template, a quick start showcasing the power of Chapaty with GitHub Copilot
Chapaty Core Lib, the library I built over four years, with over 25k SLOC and more than 350 unit tests.
Chapaty Blogpost, a technical write-up about the design decisions I made.

The Comeback Story

I started building Chapaty four years ago, driven by my vision of a standardized framework for algorithmic trading research that everyone can use. Development was a steady nights-and-weekends effort alongside my full-time job as a software engineer, and over that time the project grew to over 25,000 lines of code and more than 350 unit tests. What finally motivated me to push it across the finish line this year was the feedback I got from testers and the ability to integrate my framework into modern LLM workflows. To complete the transition from a personal codebase to a polished open-source tool, I decided to stabilize the API, finalize the core library, publish it on crates.io, and build a 60-second starter template that integrates with GitHub Copilot, so anybody can use my software out of the box.

My Experience with GitHub Copilot

With GitHub Copilot you don't need to be a Rust expert. Describe your strategy in plain English, and instruct GitHub Copilot to generate the Rust code. Backtesting is a process that can take traders months. With Chapaty and GitHub Copilot, a first backtest is now possible in under 5 minutes. Try it yourself with the official Chapaty Template!

The process is as follows:

Pass the AI.md to GitHub Copilot.
Describe your trading strategy in plain English.
Let GitHub Copilot implement the trading strategy for you.
Run make run to get the full backtest results of your trading strategy.

Top comments (3)

mote • Jun 10

Gym-style API for backtesting is a natural fit — RL researchers already think in reset/step/act, and the Markov property maps cleanly to portfolio state transitions. The 1-second grid search across 400 params on 8 cores is where Rust really shines: Python would be minutes, maybe hours with that combinatorial explosion.

One design question: the Gym API assumes discrete time steps (each step moves one bar forward). Real market microstructure has asynchronous events (limit fills, cancelations, partial executions at different timestamps). How does Chapaty handle sub-bar order events? In my experience, backtesting frameworks either oversimplify (fill at next bar open) or explode in complexity (event-driven order books). Curious where Chapaty lands on that spectrum.

Len • Jun 10

Thanks for your comment. Yes, RL researchers think in reset/step/act. I didn’t talk about the Markov property, nor did I mention portfolio state transitions. The Markov property is interesting for people implementing trading agents. In a future version I want to add a gym/portfolio environment alongside my existing gym/trading environment. The current version only covers trade state transitions of N different trades, produced by M different agents.

Regarding your two questions: both are covered in my longer blog post (see the Design Decisions section).

The framework is data agnostic. It can handle any data with a point-in-time timestamp. The runtime of a backtest scales linearly with the amount of data to be processed. The different event timestamps caused by asynchronous events are handled by timeframe synchronisation. The simulation steps through time by selecting the next strictly monotonically increasing point_in_time timestamp across all data sources. It then extends the view on the data up to this point in time.
Currently, realism on fills and slippage is still pragmatic rather than microstructure-accurate. I assume that orders are filled at their stop loss, take profit, or entry prices. To support microstructure-accurate fills, I first need the market data. Once I have it, I would extend the matching engine. Because the engine already handles partial fills, I do not expect this to cause an explosion in complexity. For now, Chapaty therefore lands on the "fill at next bar open" end of the spectrum. Extending it to order-book-level granularity is a natural extension of the matching logic I already built.
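The time-stepping rule from the first point above can be sketched as follows. This is a simplified illustration under stated assumptions, not Chapaty's implementation: Source and Timeline are hypothetical names, each source is reduced to a list of timestamps sorted in ascending order, and timestamps are plain u64 values.

```rust
/// Hypothetical data source, reduced to its point-in-time timestamps.
/// Assumption: timestamps are sorted in ascending order.
pub struct Source {
    pub timestamps: Vec<u64>,
}

/// Walks several sources in lockstep along one shared clock.
pub struct Timeline<'a> {
    sources: &'a [Source],
    /// Per source: how many of its events are visible so far.
    cursors: Vec<usize>,
}

impl<'a> Timeline<'a> {
    pub fn new(sources: &'a [Source]) -> Self {
        Self { sources, cursors: vec![0; sources.len()] }
    }

    /// Pick the smallest not-yet-seen timestamp across all sources, then
    /// extend every source's visible window up to and including it.
    /// Returns None once every source is exhausted.
    pub fn step(&mut self) -> Option<u64> {
        let next = self
            .sources
            .iter()
            .zip(&self.cursors)
            .filter_map(|(s, &c)| s.timestamps.get(c).copied())
            .min()?;
        for (s, c) in self.sources.iter().zip(self.cursors.iter_mut()) {
            // Consume every event at or before `next`, so the following
            // step is guaranteed to return a strictly larger timestamp.
            while s.timestamps.get(*c).map_or(false, |&t| t <= next) {
                *c += 1;
            }
        }
        Some(next)
    }

    /// Everything source `i` has revealed up to the current point in time.
    pub fn visible(&self, i: usize) -> &[u64] {
        &self.sources[i].timestamps[..self.cursors[i]]
    }
}
```

With a daily source at timestamps 10, 20, 30 and an intraday source at 5, 10, 15, 25, successive calls to step in this sketch yield 5, 10, 15, 20, 25, 30. Each cursor only ever moves forward, so the total work is linear in the number of events, which is consistent with the linear scaling mentioned above.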

I hope my explanation helps. I'm happy to clarify or answer any follow-ups. Thanks for asking. Those are great observations!

Valentyn Kit • Jun 25

The Gym-style act/step split is a clean fit for strategy research. The part I'd push on is the matching engine: EOD bars hide the fills that actually decide whether a strategy is real, so order-matching fidelity (partial fills, slippage, queue position) is usually where backtests quietly lie. How does Chapaty model fills between bars, and is a run bit-for-bit reproducible across the 400-point grid? Reproducibility is the thing I'd want nailed down before trusting a parameter sweep.