
Ian Sosunov

Posted on • Originally published at cxrtisxl.Medium on

Latency Wars: The Architecture Of A Real-Time Trading Game

Overview

Last year, I challenged myself to develop a game. This article outlines my vision for the architecture of a real-time trading game, where timing is crucial.

Wolf Street is a game about paper trading. The MVP was a simple 30-second binary option, with a grand plan to upgrade it to “real” paper trading after launch.

For those unfamiliar with these financial derivatives, you can think of an option as a bet. The trader, or “player,” is simply trying to predict the price movement within a given time interval.

Here, let me also convince you not to play binary options with real money: most online platforms will simply scam you.

Our idea was entirely different — we wanted to create a safe and honest space for trading without risking any real money. That’s how it should work: sponsors create branded trading tournaments, players participate in them with virtual tokens (which can’t be purchased with real money but can be acquired by playing the game), climb the leaderboard, and win real prizes.

Here’s how the game looked.

However, the implementation of this idea couldn’t be simple — to ensure the fairness of the game, we had to work with the BTC price in real-time, and the delays between players’ actions and their execution had to be minimal, since we were dealing with a binary option.

Let’s ask the right questions and build the architecture together!

Designing the Architecture

The market data will be streamed from polygon.io. All trades should be handled by the Game Engine, so in the simplest form, the architecture looks like this:

It might work for a POC, but there are a lot of issues if we’re talking about a production-quality game.

First of all, the Game Engine is responsible for everything: processing market data feeds, serving this data to clients, validating users’ actions, and handling trades. What could go wrong? Everything. Too many client connections, Polygon errors, game errors, and so on. The worst thing here is that any of the errors could potentially break everything else. Also, adding new game features will turn the codebase into a nightmare. Let’s split the Game Engine into several services, each serving its own purpose. Additionally, let’s add a database, as our current service doesn’t save any game progress.

Here, the price data originates from Polygon. The Market Feed processes it, builds candles, and sends them to the Trade Engine and the Clients via WebSocket. The Core service handles all non-trading activities, such as account management. We store user data and all the trades in a Postgres database.

When a player makes a trade, a request is sent to the Trade Engine. Since the Market Feed streams the current asset price to it, the Trade Engine always has the latest price and can handle the trade properly.
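To make this concrete, here is a minimal sketch of how the Trade Engine could keep the latest price in memory from the Market Feed’s WebSocket stream. The article never names the stack, so TypeScript, the `ws` package, the URL, and the tick format are all assumptions for illustration.

```typescript
// trade-engine/price-cache.ts (a sketch, not the actual game code).
// Assumes the Market Feed pushes JSON ticks like {"symbol":"BTCUSD","price":64123.5,"ts":1700000000000}
// over a plain WebSocket; the URL and message shape are illustrative.
import WebSocket from "ws";

const MARKET_FEED_URL = process.env.MARKET_FEED_URL ?? "ws://market-feed:8080/stream";

// Latest known price per symbol, kept in memory so trade handling never waits on I/O.
const lastPrice = new Map<string, { price: number; ts: number }>();

function connect(): void {
  const ws = new WebSocket(MARKET_FEED_URL);

  ws.on("message", (raw) => {
    const tick = JSON.parse(raw.toString());
    lastPrice.set(tick.symbol, { price: tick.price, ts: tick.ts });
  });

  // Reconnect with a small delay if the feed drops.
  ws.on("close", () => setTimeout(connect, 1_000));
  ws.on("error", () => ws.close());
}

connect();

// Used by the trade handler: open the deal at the freshest price we have.
export function currentPrice(symbol: string): { price: number; ts: number } | undefined {
  return lastPrice.get(symbol);
}
```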

Yet, we are far from the complete system. Let’s examine the current flow.

Please note that I have omitted some requests not directly related to the topic of handling a deal, such as those made during the service setup stage or those related to retrieving user account data.

See the issue? How should the client be notified that the trade has been closed and the balance has been updated? Long polling (periodic HTTP requests) is a potential solution here, but is it the best one? We already have live price updates provided by the Market Feed, and now we want the same kind of updates for user balance and trades.

Additionally, it’s crucial to consider high loads. What happens if thousands or tens of thousands of clients connect to the Market Feed? Will its performance degrade? The Market Feed has an extremely important mission: to feed the system, not only the clients but also the heart of the game, the Trade Engine, with live price data. Managing Clients’ WebSocket connections is definitely outside the scope of this microservice, and any issues handling them could cause bugs in price data processing.

Another thing to consider: what happens if we update and redeploy the Market Feed service while the game is running? It will lose all historical price data. Remember the screenshot from the game? We show 2 minutes of price history on the chart, so restarting the Market Feed would leave the UI with a broken chart for up to 2 minutes while fresh data accumulates to repopulate the history.

Let’s solve both problems by adding Redis to cache the Market Feed price history and a new microservice to handle WebSocket connections and serve all real-time updates to Clients. I’d also add an API microservice to create a unified entry point for all HTTP REST requests. We could use it to proxy WS connections as well, but in the actual project we decided to keep the WS and HTTP services separate.
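As a rough illustration of the caching part, here’s how the Market Feed could keep a rolling 2-minute candle history in a Redis sorted set, so a restart doesn’t empty the chart. The key name, candle shape, window size, and the `ioredis` client are assumptions, not the game’s actual code.

```typescript
// market-feed/history.ts (sketch): keep a rolling price history in Redis so a
// Market Feed restart doesn't wipe the 2-minute chart. Key name, candle shape,
// and the window size are assumptions for illustration.
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://redis:6379");

const HISTORY_KEY = "price:history:BTCUSD";
const WINDOW_MS = 2 * 60 * 1000; // the 2 minutes shown on the chart

interface Candle {
  open: number;
  high: number;
  low: number;
  close: number;
  ts: number; // candle start, epoch ms
}

// Called by the Market Feed every time a candle is finalized.
export async function saveCandle(candle: Candle): Promise<void> {
  // A sorted set scored by timestamp makes range queries and trimming trivial.
  await redis.zadd(HISTORY_KEY, candle.ts, JSON.stringify(candle));
  await redis.zremrangebyscore(HISTORY_KEY, 0, Date.now() - WINDOW_MS);
}

// Called by the WS Notifier when a client first subscribes, to backfill the chart.
export async function loadHistory(): Promise<Candle[]> {
  const raw = await redis.zrangebyscore(HISTORY_KEY, Date.now() - WINDOW_MS, "+inf");
  return raw.map((c) => JSON.parse(c) as Candle);
}
```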

Note that we now split our services into two groups: internal (in blue) and external, which includes Polygon and everything that the end user can access.

Let’s break down what’s happening here.

Internal

  • Market Feed: processes the price stream from Polygon, saves the data to Redis, and serves it via WebSocket to the Trade Engine and the WS Notifier. That’s how these services always have the latest price.
  • Trade Engine: processes trade requests from the API, gets live price data via WebSocket from the Market Feed, and stores trade data in Postgres.
  • Core: processes API requests related to the user’s account and stores user data in Postgres.
  • Postgres: the place where user and trade data are stored. It uses the Postgres NOTIFY mechanism to push data updates to the WS Notifier (see the sketch after this list).
  • Redis: stores price history from the Market Feed, with direct read access from the WS Notifier.
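Here is a hypothetical sketch of the Trade Engine’s write path that pairs with the NOTIFY mechanism above. The article doesn’t say whether the real game used a database trigger or an application-level `pg_notify`, so this shows the latter; the table and channel names are invented.

```typescript
// trade-engine/store.ts (sketch): persist a trade update and notify listeners.
// The real game might use a database trigger instead; here the application calls
// pg_notify directly. Table and channel names are invented.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function closeTrade(tradeId: string, closePrice: number): Promise<void> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    const { rows } = await client.query(
      `UPDATE trades
          SET close_price = $2, status = 'closed'
        WHERE id = $1
        RETURNING id, user_id, status, open_price, close_price`,
      [tradeId, closePrice],
    );
    // The WS Notifier is LISTENing on this channel and receives the row as JSON.
    await client.query("SELECT pg_notify('trade_updates', $1)", [JSON.stringify(rows[0])]);
    await client.query("COMMIT");
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}
```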

External

  • API: a unified entry point for all HTTP REST API requests.
  • WS Notifier: responsible for serving Clients live updates on price and user data (balance). It provides price history from Redis on a Client’s first subscription and proxies live price updates from the Market Feed. Additionally, the service subscribes to Postgres for trade and user data updates and delivers them to the appropriate Clients (a sketch of this wiring follows the list).
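And here is a rough sketch of the WS Notifier side: backfilling the chart from Redis on subscribe, proxying live prices, and routing Postgres notifications to the right Client. Ports, channel and key names, and the auth shortcut are all illustrative assumptions.

```typescript
// ws-notifier/server.ts (sketch): Redis backfill on subscribe, proxied price
// updates from the Market Feed, and Postgres LISTEN for per-user trade updates.
// Ports, channel/key names, and the auth shortcut are illustrative.
import WebSocket, { WebSocketServer } from "ws";
import Redis from "ioredis";
import { Client } from "pg";

const wss = new WebSocketServer({ port: 8081 });
const redis = new Redis(process.env.REDIS_URL ?? "redis://redis:6379");

// userId -> open sockets, so user-specific updates reach only the right player.
const sockets = new Map<string, Set<WebSocket>>();

wss.on("connection", async (ws, req) => {
  // In the real game the user would come from the auth handshake; a query param stands in here.
  const userId = new URL(req.url ?? "/", "http://placeholder").searchParams.get("userId") ?? "anon";
  if (!sockets.has(userId)) sockets.set(userId, new Set());
  sockets.get(userId)!.add(ws);

  // 1. Backfill the 2-minute chart from Redis on the first subscription.
  const raw = await redis.zrangebyscore("price:history:BTCUSD", Date.now() - 120_000, "+inf");
  ws.send(JSON.stringify({ type: "history", candles: raw.map((c) => JSON.parse(c)) }));

  ws.on("close", () => sockets.get(userId)?.delete(ws));
});

// 2. Live prices proxied from the Market Feed to every connected client.
const feed = new WebSocket(process.env.MARKET_FEED_URL ?? "ws://market-feed:8080/stream");
feed.on("message", (raw) => {
  for (const set of sockets.values()) for (const ws of set) ws.send(raw.toString());
});

// 3. Trade and balance updates arrive via Postgres NOTIFY and are routed per user.
const pg = new Client({ connectionString: process.env.DATABASE_URL });
pg.connect().then(() => pg.query("LISTEN trade_updates"));
pg.on("notification", (msg) => {
  const update = JSON.parse(msg.payload ?? "{}");
  for (const ws of sockets.get(update.user_id) ?? []) {
    ws.send(JSON.stringify({ type: "trade_update", trade: update }));
  }
});
```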

Now, let’s take a look at the updated interaction diagram.

Facing Latencies

It looks like everything is just perfect now, but while trying to play my own game, I encountered a problem that ruined the entire experience! In options trading, timing is critical. Especially when we’re talking about 30-second options — the price is incredibly volatile on low timeframes.

So, playing the game from Southeast Asia with the game server in Frankfurt (EU), I got one or even two seconds of latency between clicking the button to open a trade and its actual execution. It was incredibly frustrating to lose deals solely because of this, and it happened frequently. You click “UP” expecting the price to go up, watch the spinner for roughly 2 seconds while the price graph spikes, and then see that your deal opened at a totally different, much higher price than you wanted. A little dip and you lose. If the deal had opened at the original price, you would have won. The game’s technical limitations simply did not allow it.

Making it Geo-Distributed

Let’s build a system that will allow players to open deals almost instantly. We need to fix the request marked orange on the previous interaction diagram. To achieve it, we need to bring game servers closer to the players. First, we need to decide which services should be geo-distributed. Obviously, Trade Engine. We decided to leave Core in just one replica in the EU, as it was primarily used during game loading and didn’t contribute significantly to latencies.

If we move the Trade Engine, we also need to move the API. There’s zero sense in bringing the Trade Engine closer to the end user if it’s only accessible through an API that isn’t in the same location; that would cause even more latency, with requests ping-ponging all over the world.

Now we have the Trade Engine and the API located closer to the end user. All that remains is to geo-distribute the WS Notifier. Although it simply delivers data from the EU, we chose to do this for several reasons: a geo-distributed WS Notifier ensures the price the client sees stays as close as possible to the price the Trade Engine sees, and it also allows better horizontal scaling based on regional loads.

The Trade Engine and the WS Notifier both get price data from the Market Feed, which is located in the EU, so they share the same base delay from the EU to their region. When a client connects to the WS Notifier, the price divergence from the regional Trade Engine is now caused only by the client-to-regional-WS-Notifier latency, not by the client-to-EU-WS-Notifier latency.

Now take a look at the previous schema again. There is a “save trade data” step in the Trade Engine before it returns trade data to the Client. This step requires the Trade Engine to make a request to Postgres in the EU, which is a very expensive action in terms of latency.

We need to do one more optimization: become optimistic. By this, I mean we can consider the deal open even before all the checks have passed and before the data is stored in the DB. That allows us to return the deal’s timestamp and price instantly, at the cost of only the request delay between the Client and the Trade Engine. You could argue that it’s not safe, since the checks or the database write might fail, but in practice that only happens if someone tries to cheat the game, and we shouldn’t prioritize a smooth UI for those individuals. In that case, the deal will be marked as active on the client, but it will fail the checks and won’t be executed on the backend. For those who play honestly, the frontend takes care of all validations, such as allowing only one active deal at a time, so their deals will always be valid.
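A minimal sketch of what this optimistic open could look like on the Trade Engine, assuming an HTTP endpoint and reusing the in-memory price cache from the earlier sketch; the route, status codes, and helper stubs are invented for the example.

```typescript
// trade-engine/open-deal.ts (sketch): respond with the execution price and
// timestamp first, then validate and persist in the background. Route, status
// codes, and the helper stubs are invented; currentPrice() is the in-memory
// cache from the earlier Trade Engine sketch.
import express from "express";
import { currentPrice } from "./price-cache";

interface Deal {
  userId: string;
  symbol: string;
  direction: "UP" | "DOWN";
  openPrice: number;
  openedAt: number;
}

// Stubs standing in for the real checks and the Postgres write.
async function validateDeal(deal: Deal): Promise<void> { /* e.g. only one active deal per user */ }
async function persistTrade(deal: Deal): Promise<void> { /* INSERT INTO trades ... */ }

const app = express();
app.use(express.json());

app.post("/deals", (req, res) => {
  const { userId, symbol, direction } = req.body;
  const tick = currentPrice(symbol);
  if (!tick) return res.status(503).json({ error: "no price yet" });

  const deal: Deal = { userId, symbol, direction, openPrice: tick.price, openedAt: Date.now() };

  // Respond immediately: the client starts its 30-second countdown right away.
  res.status(202).json(deal);

  // Checks and persistence happen after the response. If they fail (e.g. a
  // cheating attempt), the deal is simply never executed on the backend.
  setImmediate(async () => {
    try {
      await validateDeal(deal);
      await persistTrade(deal);
    } catch (err) {
      console.error("optimistic deal rejected", err);
    }
  });
});

app.listen(3000);
```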

What about closing the deals? As you may recall, in our current system the Trade Engine closes the deal and updates the data in Postgres, which notifies the WS Notifier, which in turn sends the update to the appropriate Client. Since Postgres might be located far from the Client, this may take some time, but it’s actually not a problem at all. The problem was opening the deal at the right time; after that, the Trade Engine closes it in 30 seconds, and the client can wait an extra 1–2 seconds (or so) for the result.

We run the 30-second timer on the Client, adjusted for the deal opening time received from the Trade Engine. After it expires, we simply show a spinner indicating that the deal has closed but we’re waiting for the result. This could potentially be optimized: the Trade Engine could notify the WS Notifier as soon as the deal closes, before updating the state in Postgres. In the real game, however, it wasn’t a significant issue at all.
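For completeness, here’s one way the client-side countdown could be anchored to the opening timestamp returned by the Trade Engine (ignoring clock skew, which a real client would have to account for); all names here are illustrative.

```typescript
// client/deal-timer.ts (sketch): a countdown anchored to the server-side opening
// timestamp, so network latency doesn't shorten or stretch the 30-second window.
// Clock skew between client and server is ignored here for brevity.
const DEAL_DURATION_MS = 30_000;

export function startDealCountdown(
  openedAt: number,                  // timestamp from the Trade Engine response
  onTick: (msLeft: number) => void,  // update the countdown shown in the UI
  onExpired: () => void,             // show the "waiting for the result" spinner
): () => void {
  const interval = setInterval(() => {
    const msLeft = openedAt + DEAL_DURATION_MS - Date.now();
    if (msLeft <= 0) {
      clearInterval(interval);
      onExpired(); // the actual win/loss arrives later via the WS Notifier
    } else {
      onTick(msLeft);
    }
  }, 100);

  // Returns a cancel function in case the deal is rejected or the view unmounts.
  return () => clearInterval(interval);
}
```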

Let’s update the architecture schema and the interaction diagram.

I added colors to the arrows to illustrate client communication latency. Blue arrows represent low latency within a single region, while red arrows indicate higher latency due to requests crossing regional boundaries.

We addressed the issue, allowing deals to open almost instantly!

In this configuration, the game was published and performed extremely well even under high loads during tournaments.

Feel free to give it a try in Telegram: t.me/WolfStreetGameBot.

Thank you for reading.

I hope this article has offered valuable insights to help you approach project architecture with greater confidence and clarity. For more in-depth articles on tech and business, follow me on Medium and X.

X: @cxrtisxl

LinkedIn: Ian Sosunov

Medium: Ian Sosunov
