Let’s be honest: modern trading isn’t a problem of data scarcity — it’s a problem of data overload and the speed of interpretation. We live in an era where millions of tick rows are generated every second, but when you ask Gemini or Claude to analyze a market situation, you hit an “intellectual dead end.”
The issue isn’t that the models are “dumb.” The problem lies in the architectural gap:
- Hallucinations: In financial data, this is fatal. A single hallucinated zero in an asset price turns a strategy into a catastrophe.
- Stale Data: LLMs lack live data. A model’s training set is always “yesterday,” while the crypto market lives in the “now.”
- API Bottlenecks: Traditional APIs are a chokepoint. You can’t simply “stuff” terabytes of historical data from ClickHouse into a chatbot’s context window via a standard JSON export. It’s expensive, slow, and inefficient.
The MCP Revolution
The solution we’ll be using is the Model Context Protocol (MCP). This is the standard that, by 2025–2026, finally buried the “crutches” of endless prompts filled with copy-pasted data. MCP allows a model not just to know data, but to have access to it. It’s like giving the AI a key to your analytical laboratory rather than just a book of answers. It is secure, typed, and real-time.
The Engineering Stack
In this project, we aren’t building a “money button.” We’re doing serious engineering, using a stack that commands respect:
- Go: Chosen for predictable performance, strong typing, and first-class concurrency to handle high-velocity data streams.
- ClickHouse: The uncompromising database for OLAP tasks. If you need to find an abnormal spread among a billion records from the last six months in under 200 milliseconds, there are almost no alternatives.
- Gemini / Claude: The top layer acting as a high-level analyst, capable of connecting anomalies in liquidity pools with broader market context.
Our Goal: A system that handles a simple query like: “Find abnormal spreads on the SOL/USDC pair between Raydium and Binance over the last 15 minutes and explain if this was caused by slippage” — and returns not just raw numbers, but an engineer-validated conclusion.
We are building a bridge between the “cold iron” of the database and the flexible intelligence of the neural network, using MCP as the interface of trust.
System Architecture
Designing systems involving LLMs often slides into one of two extremes: either we try to “brute-force” the model by feeding it all the data via a prompt (expensive and inefficient), or we rely on “Text-to-SQL,” where the model writes its own queries to the database (dangerous and unstable). In our arbitrage-intel-mcp project, we take a third path: strictly typed tools via the Model Context Protocol.
The architecture is built on the principle of separation of concerns: the AI is responsible for intent and interpretation, Go handles the business logic and security, and ClickHouse performs the heavy computation on time-series data.
How It Works in Practice
The entire interaction cycle can be broken down into five distinct stages:
- The Intent: The user tells Claude or Gemini: “Find spreads above 1% on Solana over the last hour.” The model doesn’t guess the answer. It looks at its arsenal of Tools provided by our MCP server.
- The Tool Call: The AI generates a structured JSON request. This isn’t an abstract chat response; it’s a specific function call, such as get_top_spreads with parameters {"min_spread": 1.0, "network": "solana"}.
- The Bridge: This is where Go steps in. Our server receives the call, validates the parameters (protecting ClickHouse from SQL injections or malformed queries), and translates the intent into optimized SQL. In this setup, Go acts as a controlled abstraction layer rather than just a simple proxy.
- The Heavy Lifting: ClickHouse executes the query. Thanks to the MergeTree engine and proper indexing on the detected_at column, it sifts through millions of records and returns a compact JSON result. We don’t pass raw logs to the AI—we pass only the "essence."
- The Reasoning: Once it receives the data, the AI becomes more than just an interface. It correlates the numbers, notices that a spread might be caused by low pool volume (Slippage), and provides the user with an engineered verdict rather than a dry table of data.
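To make stage 3 (“The Bridge”) concrete, here is a minimal sketch of what parameter validation might look like before the intent is translated into SQL. This is illustrative, not the repository’s actual code: the function name `validate_spread_args`, the allowed network list, and the spread bounds are all assumptions.

```rust
// Hypothetical whitelist of networks the server is willing to query.
const ALLOWED_NETWORKS: &[&str] = &["solana", "ethereum", "bsc"];

/// Validate tool-call arguments before any SQL is built, so malformed
/// or hostile input never reaches ClickHouse.
pub fn validate_spread_args(min_spread: f64, network: &str) -> Result<(), String> {
    if !min_spread.is_finite() || min_spread < 0.0 || min_spread > 100.0 {
        return Err(format!("min_spread out of range: {}", min_spread));
    }
    if !ALLOWED_NETWORKS.contains(&network) {
        return Err(format!("unknown network: {}", network));
    }
    Ok(())
}
```

Only after this check passes does the server build a parameterized query; the model never gets to influence the SQL text itself.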
Implementation in Code
In the arbitrage-intel-mcp repository, this logic is centralized in the transport/mcp layer. We aren't writing complex parsing logic; instead, we define a contract that the AI is required to follow.
Here is how a tool registration looks in Go:
func (s *MCPServer) RegisterArbitrageTools() {
s.server.RegisterTool("get_top_spreads", "Fetch current arbitrage opportunities by spread percentage",
func(ctx context.Context, args struct {
MinSpread float64 `json:"min_spread"`
Network string `json:"network"`
}) (interface{}, error) {
opportunities, err := s.analyzer.FindHighSpreads(ctx, args.MinSpread, args.Network)
if err != nil {
return nil, fmt.Errorf("failed to analyze spreads: %w", err)
}
return opportunities, nil
},
)
}
Why This Stack?
If you’ve ever tried debugging LangChain chains, you know how quickly they turn into a “black box.” A strictly typed MCP layer gives us predictability: we can see exactly which tool the AI triggered, what data it received, and how it interpreted it.
ClickHouse solves the scalability problem. Arbitrage is a game of milliseconds. While a standard database is still calculating an average spread, the window of opportunity has already closed. ClickHouse handles this on the fly, allowing our AI analyst to operate on live market slices rather than “historical dust.”
The result is a system where the AI doesn’t hallucinate about prices — it works with real facts provided by a robust backend. In the next section, we’ll dive into how to prepare your data in ClickHouse to ensure these queries execute instantly.
ClickHouse Setup
In trading, there is a strong temptation to store everything in the database: every tick, every order book change, and every liquidity update. But when we are building a system where the “brain” is an LLM, this approach creates more noise than value. For our AI Analyst, we deliberately chose a strategy of storing already processed events (Opportunities) rather than raw data streams. This helps conserve the model’s context window: instead of forcing the AI to aggregate millions of rows, we feed it “distilled” data — anomalies that have already been detected and are ready for high-level reasoning.
The data schema in ClickHouse should be concise and optimized for fast time-based slices. This is what our base contract looks like:
CREATE TABLE arbitrage_opportunities (
exchange_pair String,
spread_pct Float64,
volume_usd Float64,
detected_at DateTime64(3, 'UTC'),
chain String
) ENGINE = MergeTree()
ORDER BY (detected_at, exchange_pair);
Why this design?
A few notes on the engineering choices. We use DateTime64(3) because in 2026, arbitrage on second-level intervals is already a thing of the past; milliseconds matter to understand the true sequence of events and to avoid missing the moment when an opportunity window closes. The ORDER BY starts with detected_at, and this is critical: 99% of AI queries via MCP will sound like “what happened in the last N minutes?”. This field order allows ClickHouse to instantly skip irrelevant data blocks on disk without scanning the entire table.
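Since almost every AI query boils down to “the last N minutes,” the backend can build that query from a pre-validated integer interval. The helper below is a sketch, not the repository’s code: the function name `recent_opportunities_sql` is hypothetical, and in real code the spread threshold would be bound through the driver rather than formatted into the string.

```rust
/// Build the canonical "last N minutes" slice over arbitrage_opportunities.
/// `minutes` comes from a validated integer, never from raw user text.
pub fn recent_opportunities_sql(minutes: u32, min_spread: f64) -> String {
    format!(
        "SELECT exchange_pair, spread_pct, volume_usd, detected_at \
         FROM arbitrage_opportunities \
         WHERE detected_at >= now64(3) - INTERVAL {} MINUTE \
           AND spread_pct > {} \
         ORDER BY detected_at DESC",
        minutes, min_spread
    )
}
```

Because `detected_at` leads the `ORDER BY` key, the `WHERE detected_at >= …` predicate lets ClickHouse prune whole data parts before reading a single row.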
MergeTree Engine: The Industry Standard
As the storage engine, we use MergeTree. For those coming from traditional relational databases: forget heavy indexes and slow INSERTs. ClickHouse writes data in parts and merges them in the background. This gives us massive write throughput and, more importantly for analytics, insane performance for aggregate queries.
For our backend, this means that even with billions of records in the table, the AI will get an answer to a query like “average spread over the last week” in a fraction of a second. We are building a solid foundation where the database handles all the heavy lifting of metric computation, leaving the AI to play the role of an intelligent interpreter.
Implementing the MCP Server in Rust
The transition from concept and database to living code is the moment when architectural abstractions collide with performance reality. Why did I choose Rust to implement the MCP server in the arbitrage-intel-mcp project, despite being a long-time Go fan? The answer is simple: determinism.
In arbitrage and data analytics, latency matters. In Go, we are always balancing development speed against Garbage Collector pauses. In Rust, we pay an “education tax” (the Borrow Checker), but in return we get a binary with predictable memory usage and async behavior that does not produce runtime surprises. When your AI agent requests analysis over a million rows, you want to be sure the backend won’t decide to clean up the heap at that exact moment.
Architecture: Clean Code on Rust Steroids
To prevent the project from turning into a bowl of spaghetti made of SQL queries and JSON parsers, we strictly follow clean architecture principles. In Rust, this is naturally expressed through the module system, which allows us to isolate business logic from implementation details.
arbitrage-intel-mcp/
└── src/
├── main.rs # Composition: wiring dependencies and starting the server
├── domain/ # “Holy of holies”: data models and business rules
├── infrastructure/ # Dirty work: ClickHouse drivers and MCP specifics
└── usecase/ # Orchestration: scenarios like “find spread”, “compare prices”
Why does this matter? If tomorrow MCP 2.0 appears or we decide to migrate from ClickHouse to TimescaleDB, we won’t have to rewrite the analysis logic. We’ll just swap out a file in infrastructure.
In Go, we would express this using interfaces:
type OpportunityRepo interface {
GetTopSpreads(limit int) ([]Opportunity, error)
}
In Rust, we use traits, which give us the same flexibility but with zero-cost abstractions.
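Translating that Go interface into a trait might look like the sketch below. The trait and struct names mirror the Go example; the in-memory implementation is a stand-in for the real ClickHouse-backed repository and is purely illustrative (a production version would be async).

```rust
pub struct Opportunity {
    pub exchange_pair: String,
    pub spread_pct: f64,
}

/// Rust counterpart of the Go OpportunityRepo interface.
pub trait OpportunityRepo {
    fn get_top_spreads(&self, limit: usize) -> Result<Vec<Opportunity>, String>;
}

/// In-memory stub standing in for the ClickHouse implementation,
/// useful for unit-testing the usecase layer.
pub struct InMemoryRepo {
    pub data: Vec<Opportunity>,
}

impl OpportunityRepo for InMemoryRepo {
    fn get_top_spreads(&self, limit: usize) -> Result<Vec<Opportunity>, String> {
        let mut sorted: Vec<Opportunity> = self
            .data
            .iter()
            .map(|o| Opportunity {
                exchange_pair: o.exchange_pair.clone(),
                spread_pct: o.spread_pct,
            })
            .collect();
        // Widest spreads first, then truncate to the requested limit.
        sorted.sort_by(|a, b| b.spread_pct.partial_cmp(&a.spread_pct).unwrap());
        sorted.truncate(limit);
        Ok(sorted)
    }
}
```

Swapping ClickHouse for TimescaleDB then means writing one new `impl OpportunityRepo`, with the usecase layer untouched.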
MCP as a Trust Boundary
The biggest mistake when working with LLMs is giving the model direct access to the database. MCP (Model Context Protocol) solves this by acting as a controlled gateway. Our Rust server doesn’t merely forward queries; it defines the “tools” that the AI is allowed to see.
When the AI wants data, it calls get_top_spreads. For the model, this is a black box. Inside the Rust server, the real work happens: we validate input parameters, check types, and only then delegate to the usecase layer.
Implementation: The Heart of the Server
Below is a fragment of the MCP server implementation that ties all layers together. We use the tokio async runtime and serde typing to process the JSON-RPC messages on which the protocol is based.
use serde_json::{json, Value};
use crate::usecase::analyzer::ArbitrageAnalyzer;
pub struct McpHandler {
analyzer: ArbitrageAnalyzer,
}
impl McpHandler {
pub async fn handle_tool_call(&self, tool_name: &str, arguments: Value) -> anyhow::Result<Value> {
match tool_name {
"get_top_spreads" => {
// Extract and validate arguments from JSON
let limit = arguments["limit"].as_i64().unwrap_or(10) as usize;
// Invoke business logic
let opportunities = self.analyzer.find_high_profit_windows(limit).await?;
// Return a response understandable by the LLM
Ok(json!({ "opportunities": opportunities }))
},
_ => Err(anyhow::anyhow!("Unknown tool: {}", tool_name)),
}
}
}
Interacting with ClickHouse
The infrastructure/clickhouse.rs layer is responsible for executing heavy queries. Thanks to the clickhouse-rs Rust driver, we get streaming data processing. We don’t just embed raw SQL directly in server methods — we encapsulate it inside repositories.
The key point here is preparing data for the LLM context. LLMs are bad at digesting raw table dumps. That’s why, in the Rust layer, we aggregate data so that the model receives a concise answer: not 1,000 tick rows, but the top 5 anomalies with computed Z-scores and liquidity.
impl ClickHouseRepository {
pub async fn fetch_anomalies(&self, min_spread: f64) -> Result<Vec<ArbitrageOpportunity>> {
let sql = "SELECT * FROM arbitrage_opportunities WHERE spread_pct > ? ORDER BY detected_at DESC LIMIT 10";
let mut cursor = self.client.query(sql).bind(min_spread).fetch::<ArbitrageOpportunity>()?;
let mut result = Vec::new();
while let Some(row) = cursor.next().await? {
result.push(row);
}
Ok(result)
}
}
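The “top 5 anomalies with Z-scores” aggregation mentioned above can be sketched as pure logic. This is an illustration of the idea, not the project’s actual analyzer: the function name `top_anomalies` and the scoring are assumptions, and in practice part of this math can be pushed down into ClickHouse itself.

```rust
/// Score each observed spread by its Z-score and keep only the `top_n`
/// strongest anomalies, so the LLM sees a handful of outliers instead
/// of a raw table dump. Returns (spread, z_score) pairs.
pub fn top_anomalies(spreads: &[f64], top_n: usize) -> Vec<(f64, f64)> {
    if spreads.is_empty() {
        return Vec::new();
    }
    let n = spreads.len() as f64;
    let mean = spreads.iter().sum::<f64>() / n;
    let var = spreads.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt();

    let mut scored: Vec<(f64, f64)> = spreads
        .iter()
        .map(|&s| (s, if std > 0.0 { (s - mean) / std } else { 0.0 }))
        .collect();
    // Largest absolute deviation first.
    scored.sort_by(|a, b| b.1.abs().partial_cmp(&a.1.abs()).unwrap());
    scored.truncate(top_n);
    scored
}
```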
The Rust implementation gives us something no Python script or simple prompt ever could: control. We’ve built a system where the AI analyst operates within strict, typed boundaries. It can request data and analyze it, but it can never execute DROP TABLE or “hallucinate” a database schema, because response structures are rigidly defined in our Rust models.
At this stage, arbitrage-intel-mcp evolves from a set of ideas into an engineering platform. We’ve achieved safety through types, speed through ClickHouse, and flexibility through MCP. In the final part, we’ll look at how to package all of this with Docker and feed the configuration to your favorite AI client.
AI Integration: How the Model Becomes an Analyst
When the Rust backend is ready and ClickHouse is filled with data, there’s one final step left — to “open up” this data to AI. In the world of traditional web development, we would start designing a REST or gRPC API, writing Swagger documentation, and thinking about role-based access control. In the world of MCP (Model Context Protocol), we take a different path: we describe a contract that an AI client (whether it’s Claude Desktop or Gemini) uses to launch our binary as a child process.
The AI “sees” our server through a configuration file called mcp-config.json. For it, this is not just a list of endpoints, but a declarative description of tools (Tools) with clear instructions on when and why to call them. If in Go we’re used to interfaces as a way to isolate logic, here MCP acts as the interface between deterministic code and a probabilistic model. We strictly limit the access surface: the AI cannot “wander” through the entire database — it only sees the methods we explicitly exported in our Rust code.
Configuration: A Handshake Between AI and Rust
For Claude or Gemini to launch our analytics engine, the server description must be added to the configuration file (for example, in the Claude Desktop settings directory). It looks like a process launch instruction:
{
"mcpServers": {
"arbitrage-intel": {
"command": "/path/to/arbitrage-intel-mcp",
"args": [],
"env": {
"CLICKHOUSE_HOST": "localhost",
"CLICKHOUSE_PORT": "9000",
"CLICKHOUSE_DB": "arbitrage_db"
}
}
}
}
As soon as the AI loads this config, it calls our listTools method (which we implemented in infrastructure/mcp.rs). From that moment on, the model “knows” that it can request spreads using the get_top_spreads tool.
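For reference, a tools list returned to the client might look roughly like the fragment below. The top-level `name`/`description`/`inputSchema` fields follow the MCP tool-listing shape; the exact schema values here are illustrative, not copied from the repository.

```json
{
  "tools": [
    {
      "name": "get_top_spreads",
      "description": "Fetch current arbitrage opportunities by spread percentage",
      "inputSchema": {
        "type": "object",
        "properties": {
          "min_spread": { "type": "number" },
          "network": { "type": "string" },
          "limit": { "type": "integer" }
        }
      }
    }
  ]
}
```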
From Prompt to SQL Query
Imagine the following scenario: you type in the chat: “Analyze the last 10 trades for the BTC/USDT pair between Binance and Bybit. Do you see a pattern of spread widening before news releases?”
The AI does not try to guess the answer. It runs an internal reasoning cycle (Chain of Thought):
- Intent identification: The user needs BTC/USDT data.
- Tool selection: The list of available tools includes get_top_spreads.
- Call construction: The model generates a JSON-RPC request: {"method": "callTool", "params": {"name": "get_top_spreads", "arguments": {"limit": 10}}}.
This request arrives at the stdin of our Rust application. The infrastructure layer receives it, the usecase layer processes it, and ClickHouse returns the result. The key point is that the AI receives not raw bytes, but structured data.
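Once deserialized, the incoming call can be modeled as a closed set of typed variants, which is what makes the boundary strict: anything outside the enum is rejected before it touches the database. This is a simplified sketch (the `ToolCall` enum and `dispatch` function are hypothetical names; the real server deserializes JSON-RPC with serde first).

```rust
/// Typed model of a tool call after JSON-RPC deserialization.
pub enum ToolCall {
    GetTopSpreads { limit: usize },
    Unknown(String),
}

/// Dispatch a call to the matching handler; unknown tools fail fast
/// instead of reaching the storage layer.
pub fn dispatch(call: &ToolCall) -> Result<String, String> {
    match call {
        ToolCall::GetTopSpreads { limit } => {
            Ok(format!("fetching top {} spreads", limit))
        }
        ToolCall::Unknown(name) => Err(format!("Unknown tool: {}", name)),
    }
}
```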
Response Synthesis
After receiving JSON data from ClickHouse, the AI acts as the final aggregator. It correlates timestamps (our detected_at) with its internal knowledge of news release times. As a result, you don’t get a table — you get an analytical report: “Yes, I observe a spread increase to 0.8% two minutes before the inflation data release. This was accompanied by a surge in volume on Bybit...”
For us, as engineers, the critical point is that we fully control this flow. If the model attempts to request data it doesn’t have access to via the available Tools, it will receive an error at the Rust code level. In Go, we would call this strict validation at the system boundary. Here, it’s the only way to make AI interaction safe and reproducible in production.
Case Study: An AI Agent in Action
The main problem with classic trading bots is their “tunnel vision.” If the code contains a condition like if spread > 1.0% { execute() }, the bot will press the button even if the order book liquidity is only one hundred dollars. As a result, slippage will eat up all the profit, leaving us with losses and fees. Our AI Analyst, operating via the MCP protocol, solves exactly this problem by adding a layer of “common sense” on top of the raw data from ClickHouse.
Let’s look at a real scenario. An event is written into the arbitrage_opportunities table: the SOL/USDC pair, with the spread between Raydium and Binance increasing to 1.5%. A regular Go or Python script would already have sent the transaction. But our agent, after retrieving the data through the get_top_spreads tool, sees a broader picture.
In the usecase/analyzer.rs layer, we prepared the data so that the agent sees not only the percentage, but also the market depth:
#[derive(Serialize)]
pub struct AnalysisResult {
pub spread_pct: f64,
pub volume_usd: f64,
pub liquidity_depth: f64,
pub risk_score: u8,
}
impl ArbitrageAnalyzer {
pub async fn analyze_opportunity(&self, opp: ArbitrageOpportunity) -> AnalysisResult {
let risk = if opp.volume_usd < 500.0 { 80 } else { 10 };
AnalysisResult {
spread_pct: opp.spread_pct,
volume_usd: opp.volume_usd,
liquidity_depth: opp.volume_usd * 0.8,
risk_score: risk,
}
}
}
When the AI model analyzes this result, it doesn’t just see 1.5%. It correlates that with volume_usd: 200.0. In the Claude or Gemini console, we would see the following output:
“I’ve detected a 1.5% arbitrage window on the SOL/USDC pair. However, the liquidity pool volume is only $200. At this size, the expected slippage would exceed 2%, completely offsetting the potential profit. Recommendation: skip the trade.”
This is the fundamental difference in our approach. In Go development, we would call this complex business validation or a set of filters. With Rust + MCP, we delegate the interpretation of edge cases to a model capable of accounting for dynamic context. We’re not just storing logs — we’re giving the system the ability to “understand” why a specific number in ClickHouse at a given moment represents a trap rather than an opportunity.
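The agent’s verdict above can also be written down as a deterministic filter. This is a toy heuristic, not the project’s model: the function name `should_execute` and the linear slippage estimate are assumptions made purely to illustrate the “spread must survive slippage” reasoning.

```rust
/// Decide whether a spread is worth taking given the liquidity available.
/// Toy model: expected slippage grows with the share of pool liquidity
/// the trade would consume.
pub fn should_execute(spread_pct: f64, volume_usd: f64, trade_size_usd: f64) -> bool {
    if volume_usd <= 0.0 {
        return false;
    }
    let expected_slippage_pct = (trade_size_usd / volume_usd) * 2.0;
    spread_pct > expected_slippage_pct
}
```

With the case-study numbers (1.5% spread, $200 of liquidity), the filter rejects the trade, matching the agent’s recommendation.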
That’s what transforms our project from a simple database connector into a full-fledged AI assistant that saves money by filtering out low-quality trades.
Conclusion and GitHub
In summary, the MCP protocol fundamentally changes the rules of the game: it transforms a static database into a full-fledged, long-term “memory” layer for AI. By choosing the Rust + ClickHouse + MCP stack, we achieved an architecture where each component performs its role with maximum efficiency. ClickHouse relentlessly processes massive volumes of tick data, the LLM takes on the role of an analyst evaluating risks and context, and Rust serves as a solid, deterministic bridge between them.
In the Go ecosystem, building clean architecture and implementing dependency injection (DI) typically relies on interfaces, while garbage collector pauses are accepted as a given. In Rust, we achieve the same level of modularity through traits — gaining strict type safety and predictable performance without runtime overhead in return.
As a result, we didn’t build yet another demo wrapper around the ChatGPT API, but a reliable foundation ready for production-level high-frequency trading workloads. The entire source code — from domain models to server configuration — is open and waiting for your pull requests. Take the architecture, study the implementation, and adapt it to your own trading strategies in the project repository: https://github.com/Zmey56/arbitrage-intel-mcp.