mihir mohapatra

Posted on May 31 • Edited on Jun 6

I Built a Production-Oriented Multi-Provider AI Chatbot in Rust — Here's How

#rust #backend #ai #llm

Most AI chatbot tutorials reach for Python. FastAPI, LangChain, a quick requests.post — done in 20 minutes. And that's fine for prototyping. But when I wanted to build something I'd actually put behind a real API — something with proper async concurrency, typed errors, and zero GC pauses — I reached for Rust instead.

This is a writeup of chatbot, a production-oriented Rust backend that unifies Claude, OpenAI, and Ollama behind a single interface — with a Web UI, CLI mode, and Docker support baked in.

Why Rust for an AI Backend?

It's a fair question. LLM API calls are network-bound, so why does the backend language even matter?

A few reasons:

Predictable latency. No GC pauses under load means P99 response times stay stable when you're handling dozens of concurrent conversations.
Memory efficiency. A Tokio task is a small heap-allocated state machine rather than an OS thread with its own stack, so it is far cheaper than a Python thread. You serve more users per instance.
Type safety. Rust's ownership model makes API contracts explicit. You can't accidentally share mutable conversation state across requests without synchronization: the compiler rejects it.
Long-term maintainability. Explicit types and explicit error handling make the code self-documenting.

For LLM apps specifically: yes, roughly 95% of your wall-clock time is spent waiting for the model to respond. But the other 5% (routing, state management, provider selection, connection handling) is all yours to control, and Rust's compiler catches a large class of mistakes in that part before the code ever runs.

What It Does

Connects to Claude (Anthropic), OpenAI, or Ollama (local) via a unified chat interface
Serves a Web UI at http://localhost:8080 by default, or runs in CLI mode
Maintains per-session conversation history in memory
Supports runtime provider switching via a connect-first flow
Ships with a Dockerfile for containerized deployment
Supports OpenRouter as an OpenAI-compatible provider

Architecture Overview

User (Browser or CLI)
        │
        ▼
  Axum HTTP Server (web.rs)
        │
        ├──▶ Conversation State (Arc<Mutex<Vec<Message>>>)
        │
        └──▶ Runtime Config (config.rs)
                    │
                    ▼
            ChatClient (client.rs)
                    │
           ┌────────┼────────┐
           ▼        ▼        ▼
        Claude   OpenAI   Ollama
         API      API      API

The project has a clean five-module layout in src/:

src/
├── main.rs          # Startup routing + CLI loop
├── config.rs        # Provider enum + env/runtime config
├── client.rs        # Provider-specific HTTP clients
├── conversation.rs  # In-memory chat state model
└── web.rs           # Axum routes, connect flow, chat API

Each module has exactly one responsibility. No god objects, no tangled imports.

The Provider Abstraction

The heart of the project is client.rs. Instead of sprinkling provider-specific logic everywhere, all outbound AI calls go through a single ChatClient that dispatches based on the active provider.

The key insight: Claude uses the Anthropic native API format, while OpenAI and Ollama both speak the OpenAI-compatible schema. Separating these two code paths keeps the provider logic honest — you're not faking compatibility where there isn't any.

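// Simplified concept; per the module layout, Provider lives in config.rs
// and ChatClient in client.rs.
pub enum Provider {
    Claude, // Anthropic native API
    OpenAI, // also covers OpenAI-compatible endpoints such as OpenRouter
    Ollama, // local server, OpenAI-compatible schema
}

pub struct ChatClient {
    pub provider: Provider,
    pub model: String,
    pub base_url: String,
    pub api_key: Option<String>, // None for Ollama
    pub max_tokens: u32,         // MAX_TOKENS, default 1024
    pub system_prompt: String,
    pub http: reqwest::Client,   // reused across requests (pools connections)
}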

When you send a message, the client picks the right HTTP contract:

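// `Result` with a single type parameter is presumably `anyhow::Result`.
// Claude has its own wire format; OpenAI and Ollama share one.
pub async fn send(&self, messages: &[Message]) -> Result<String> {
    match self.provider {
        Provider::Claude => self.send_claude(messages).await,
        Provider::OpenAI | Provider::Ollama => self.send_openai_compat(messages).await,
    }
}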

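This means adding a new provider (Gemini, Cohere, etc.) in the future is a matter of adding one enum variant, one match arm, and one method. If the provider speaks the OpenAI-compatible schema, the new arm can reuse send_openai_compat. The rest of the application stays untouched.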
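To make the two code paths concrete, here is a minimal sketch of what differs between them before any JSON is encoded. It is not the project's code: `RequestPlan` and `plan_request` are hypothetical names, and the assumption that `base_url` is joined with `/v1/messages` or `/v1/chat/completions` is mine (it does match the OpenRouter `BASE_URL` shown later). The header names and the placement of the system prompt follow the public Anthropic Messages and OpenAI Chat Completions APIs.

```rust
// Hypothetical helper types for illustration; not taken from the project.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Provider {
    Claude,
    OpenAI,
    Ollama,
}

#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Everything that differs between the two wire formats, minus JSON encoding.
#[derive(Debug)]
pub struct RequestPlan {
    pub url: String,
    pub headers: Vec<(String, String)>,
    /// Anthropic takes the system prompt as a top-level `system` field.
    pub top_level_system: Option<String>,
    pub messages: Vec<Message>,
}

pub fn plan_request(
    provider: Provider,
    base_url: &str,
    api_key: Option<&str>,
    system_prompt: &str,
    history: &[Message],
) -> RequestPlan {
    // Tolerate a trailing slash in the configured base URL.
    let base = base_url.trim_end_matches('/');
    let mut headers = vec![("content-type".to_string(), "application/json".to_string())];
    match provider {
        Provider::Claude => {
            if let Some(key) = api_key {
                headers.push(("x-api-key".to_string(), key.to_string()));
            }
            headers.push(("anthropic-version".to_string(), "2023-06-01".to_string()));
            RequestPlan {
                url: format!("{base}/v1/messages"),
                headers,
                top_level_system: Some(system_prompt.to_string()),
                messages: history.to_vec(),
            }
        }
        Provider::OpenAI | Provider::Ollama => {
            // Ollama runs locally and needs no key, so the header is optional.
            if let Some(key) = api_key {
                headers.push(("authorization".to_string(), format!("Bearer {key}")));
            }
            // OpenAI-compatible APIs take the system prompt as the first message.
            let mut messages = vec![Message {
                role: "system".to_string(),
                content: system_prompt.to_string(),
            }];
            messages.extend_from_slice(history);
            RequestPlan {
                url: format!("{base}/v1/chat/completions"),
                headers,
                top_level_system: None,
                messages,
            }
        }
    }
}
```

Keeping the plan as plain data means the differences can be unit-tested without a network call; the real client would then serialize it and hand it to reqwest.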

Shared Conversation State

Multi-turn chat requires message history that outlives a single request. In Rust's async model, sharing state across request handlers requires explicit synchronization. The project does this with Arc<Mutex<...>>, Rust's standard pattern for shared mutable state:

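// conversation.rs - shared across all handlers
use std::sync::{Arc, Mutex}; // assumption: the project may use tokio::sync::Mutex instead
use serde::{Deserialize, Serialize};

pub type SharedConversation = Arc<Mutex<Vec<Message>>>;

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Message {
    pub role: String,    // "user" or "assistant"
    pub content: String,
}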

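The Arc gives every Axum handler a cheap, reference-counted handle to the same conversation (each handler runs in its own async task), and the Mutex ensures only one handler touches the history at a time. The type system rules out data races on the history. It does not order whole turns, though: two requests in flight at once can still interleave their messages.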
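One practical consequence of this pattern is lock scope: the mutex should be held only while reading or writing the history, never across the slow provider call. The sketch below illustrates that under stated assumptions. It uses `std::sync::Mutex` and plain threads so it runs without Tokio, and `push_user_and_snapshot`, `push_assistant`, and `handle_turn` are hypothetical helpers, not functions from the project.

```rust
use std::sync::{Arc, Mutex};

#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

pub type SharedConversation = Arc<Mutex<Vec<Message>>>;

/// Append the user's message and return a snapshot of the history.
/// The guard is dropped when this function returns, so the lock is
/// released before any slow provider call is made.
pub fn push_user_and_snapshot(conv: &SharedConversation, text: &str) -> Vec<Message> {
    let mut history = conv.lock().expect("conversation mutex poisoned");
    history.push(Message {
        role: "user".to_string(),
        content: text.to_string(),
    });
    history.clone()
}

/// Record the assistant's reply under a second, short-lived lock.
pub fn push_assistant(conv: &SharedConversation, text: &str) {
    let mut history = conv.lock().expect("conversation mutex poisoned");
    history.push(Message {
        role: "assistant".to_string(),
        content: text.to_string(),
    });
}

/// Stand-in for a request handler. `fake_reply` is a placeholder for the
/// provider call, which would run at this point with no lock held.
pub fn handle_turn(conv: &SharedConversation, text: &str) {
    let snapshot = push_user_and_snapshot(conv, text);
    let fake_reply = format!("echo after {} messages", snapshot.len());
    push_assistant(conv, &fake_reply);
}
```

Calling `handle_turn` from several threads at once is free of data races, but user and assistant messages from different callers can interleave in the shared Vec. Keying the history by a session ID would be one way to avoid that.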

Dual-Mode: Web UI + CLI

The app launches in Web UI mode by default, but also supports a terminal workflow:

# Default: serves Web UI at http://localhost:8080
cargo run

# CLI mode: interactive terminal chat
cargo run -- cli

# Explicit web mode on a custom port
PORT=3000 cargo run -- web

This is useful in different contexts — the Web UI for demos and sharing, the CLI for scripting and piping into other tools.
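The startup routing behind those three commands can be sketched in a few lines. This is an illustration rather than the project's main.rs: `Mode` and `parse_mode` are hypothetical names, and the only behavior taken from the commands above is that no argument means web mode, `cli` and `web` select a mode, and `PORT` overrides the default of 8080.

```rust
#[derive(Debug, PartialEq)]
pub enum Mode {
    Web { port: u16 },
    Cli,
}

/// `args` are the arguments after the program name; `port_env` is the raw
/// value of the PORT environment variable, if it is set.
pub fn parse_mode(args: &[&str], port_env: Option<&str>) -> Result<Mode, String> {
    match args.first().copied() {
        Some("cli") => Ok(Mode::Cli),
        // No argument falls through to web mode, matching plain `cargo run`.
        None | Some("web") => {
            let port = match port_env {
                Some(raw) => raw
                    .parse::<u16>()
                    .map_err(|e| format!("invalid PORT {raw:?}: {e}"))?,
                None => 8080,
            };
            Ok(Mode::Web { port })
        }
        Some(other) => Err(format!("unknown mode {other:?}, expected `web` or `cli`")),
    }
}
```

In a real `main`, the inputs would come from `std::env::args().skip(1)` and `std::env::var("PORT").ok()`. Passing them in as parameters keeps the function testable without touching process state.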

Quick Start

Option A: No Rust Required

Download the prebuilt Windows executable (v1.0.1) and run it:

.\chatbot.exe
# Opens http://localhost:8080

Option B: From Source

# 1. Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# 2. Clone and configure
git clone https://github.com/MihirMohapatra/chatbot.git
cd chatbot
cp .env.example .env

# 3. Run
cargo run

Provider Configuration

Set your provider in .env:

Claude (Anthropic)

PROVIDER=claude
ANTHROPIC_API_KEY=sk-ant-...

OpenAI

PROVIDER=openai
OPENAI_API_KEY=sk-...

Ollama (Local — no API key needed)

# Pull a model first
ollama pull llama3.2

PROVIDER=ollama
MODEL=llama3.2

OpenRouter (access any model via OpenAI-compatible API)

PROVIDER=openai
OPENAI_API_KEY=sk-or-...
BASE_URL=https://openrouter.ai/api
MODEL=anthropic/claude-sonnet-4

All environment variables:

Variable	Default	Description
`PROVIDER`	`claude`	`claude`, `openai`, `ollama`
`ANTHROPIC_API_KEY`	—	Required for Claude
`OPENAI_API_KEY`	—	Required for OpenAI (and for OpenRouter)
`MODEL`	provider default	Override the model name
`BASE_URL`	provider default	Override the API endpoint
`MAX_TOKENS`	`1024`	Response token cap
`SYSTEM_PROMPT`	built-in	Custom assistant behavior
`PORT`	`8080`	Port for the Web UI
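The table translates fairly directly into a loader. The following is a sketch under assumptions, not the project's config.rs: `Config` and `load_config` are hypothetical, the "provider default" model and endpoint are left as `None` because their actual values are not stated here, and the lookup is passed in as a closure so the function can be exercised without real environment variables.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Provider {
    Claude,
    OpenAI,
    Ollama,
}

#[derive(Debug, PartialEq)]
pub struct Config {
    pub provider: Provider,
    pub api_key: Option<String>,
    pub model: Option<String>,         // None means "use the provider default"
    pub base_url: Option<String>,      // None means "use the provider default"
    pub system_prompt: Option<String>, // None means "use the built-in prompt"
    pub max_tokens: u32,
}

/// `get` abstracts the environment; in `main` it could be
/// `|k| std::env::var(k).ok()`.
pub fn load_config(get: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
    // PROVIDER defaults to `claude`, as in the table.
    let provider = match get("PROVIDER").as_deref().unwrap_or("claude") {
        "claude" => Provider::Claude,
        "openai" => Provider::OpenAI,
        "ollama" => Provider::Ollama,
        other => return Err(format!("unknown PROVIDER {other:?}")),
    };

    // Fail early when the selected provider needs a key that is missing.
    let api_key = match provider {
        Provider::Claude => {
            Some(get("ANTHROPIC_API_KEY").ok_or("ANTHROPIC_API_KEY is required for Claude")?)
        }
        Provider::OpenAI => {
            Some(get("OPENAI_API_KEY").ok_or("OPENAI_API_KEY is required for OpenAI")?)
        }
        Provider::Ollama => None,
    };

    // MAX_TOKENS defaults to 1024; a malformed value is an error, not a silent default.
    let max_tokens = match get("MAX_TOKENS") {
        Some(raw) => raw
            .parse::<u32>()
            .map_err(|e| format!("invalid MAX_TOKENS {raw:?}: {e}"))?,
        None => 1024,
    };

    Ok(Config {
        provider,
        api_key,
        model: get("MODEL"),
        base_url: get("BASE_URL"),
        system_prompt: get("SYSTEM_PROMPT"),
        max_tokens,
    })
}
```

Validating everything once at startup means a missing key shows up as a clear message when the process launches instead of as a failed request later.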

Docker

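docker build -t chatbot .
# Use an absolute host path for the bind mount
docker run -it --rm -v "$(pwd)/.env:/data/.env" chatbot

Host networking lets the container reach an Ollama server listening on the host's localhost:

docker run -it --rm --network host -v "$(pwd)/.env:/data/.env" chatbot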

Rust vs Python: Backend Performance Perspective

For backend workloads like this (concurrent HTTP + JSON + state management), Rust generally outperforms Python on the metrics that matter in production. The comparison below is qualitative; it is not a benchmark of this project:

Metric	Rust	Python	Why It Matters
Throughput (req/sec)	Higher	Lower	More concurrent users per instance
P95/P99 latency	Lower under load	Higher under load	More stable response times
Memory per worker	Lower	Higher	Better infra cost and density
CPU efficiency	Higher	Lower	More headroom before scaling out

Note: For LLM apps, model/API network time dominates total latency. But Rust still wins on concurrency behavior, memory footprint, and server efficiency — which directly impacts cost and reliability at scale.

What's Next (Roadmap)

Streamed responses — token-by-token streaming via SSE
Persistent chat history — SQLite or Postgres backend
Metrics + tracing — tracing crate + OpenTelemetry integration
Integration tests for provider adapters
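For the streaming item, the parsing half is small enough to sketch now. The function below splits a Server-Sent Events body into events following the SSE framing rules (fields as `name: value` lines, a blank line ends an event, lines starting with `:` are comments). `SseEvent` and `parse_sse` are hypothetical names, and this version takes a complete body as a string; a real streaming client would feed it chunks from the HTTP response and buffer partial lines.

```rust
/// One parsed SSE event: optional `event:` name plus the joined `data:` lines.
#[derive(Debug, PartialEq)]
pub struct SseEvent {
    pub event: Option<String>,
    pub data: String,
}

pub fn parse_sse(body: &str) -> Vec<SseEvent> {
    let mut events = Vec::new();
    let mut name: Option<String> = None;
    let mut data: Vec<&str> = Vec::new();

    for line in body.lines() {
        // A blank line terminates the current event.
        if line.is_empty() {
            if !data.is_empty() {
                events.push(SseEvent {
                    event: name.take(),
                    data: data.join("\n"),
                });
                data.clear();
            }
            name = None;
            continue;
        }
        // Lines starting with a colon are comments, often used as keep-alives.
        if line.starts_with(':') {
            continue;
        }
        // Split "field: value"; a single leading space in the value is dropped.
        let (field, value) = match line.split_once(':') {
            Some((f, v)) => (f, v.strip_prefix(' ').unwrap_or(v)),
            None => (line, ""),
        };
        match field {
            "data" => data.push(value),
            "event" => name = Some(value.to_string()),
            _ => {} // `id` and `retry` are ignored in this sketch
        }
    }
    // An event that was not closed by a blank line is discarded.
    events
}
```

OpenAI-compatible endpoints end a stream with a `data: [DONE]` event, so a caller would stop when it sees that payload and otherwise decode each `data` string as a JSON chunk with serde.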

Try It / Contribute

The project is open source and MIT licensed:

👉 github.com/MihirMohapatra/chatbot

If you're exploring Rust for backend systems, or building something that needs to talk to multiple AI providers without writing boilerplate for each one, this is a good starting point. Issues and PRs welcome.

Built with Rust 1.80+, Tokio, Axum, reqwest, serde, and anyhow.
