Most AI chatbot tutorials reach for Python. FastAPI, LangChain, a quick requests.post — done in 20 minutes. And that's fine for prototyping. But when I wanted to build something I'd actually put behind a real API — something with proper async concurrency, typed errors, and zero GC pauses — I reached for Rust instead.
This is a writeup of chatbot, a production-oriented Rust backend that unifies Claude, OpenAI, and Ollama behind a single interface — with a Web UI, CLI mode, and Docker support baked in.
Why Rust for an AI Backend?
It's a fair question. LLM API calls are network-bound, so why does the backend language even matter?
A few reasons:
- Predictable latency. No GC pauses under load means P99 response times stay stable when you're handling dozens of concurrent conversations.
- Memory efficiency. Each async task in Tokio is dramatically cheaper than a Python thread. You serve more users per instance.
- Type safety. Rust's ownership model makes API contracts explicit. You can't accidentally share mutable conversation state across requests without synchronization; the compiler rejects code that tries.
- Long-term maintainability. Explicit types and explicit error handling make the code self-documenting.
For LLM apps specifically: yes, 95% of your wall-clock time is waiting for the model to respond. But the other 5% — routing, state management, provider selection, connection handling — is all yours to control. Rust makes that part bulletproof.
What It Does
- Connects to Claude (Anthropic), OpenAI, or Ollama (local) via a unified chat interface
- Serves a Web UI at http://localhost:8080 by default, or runs in CLI mode
- Maintains per-session conversation history in memory
- Supports runtime provider switching via a connect-first flow
- Ships with a Dockerfile for containerized deployment
- Supports OpenRouter as an OpenAI-compatible provider
Architecture Overview
User (Browser or CLI)
        │
        ▼
Axum HTTP Server (web.rs)
        │
        ├──▶ Conversation State (Arc<Mutex<Vec<Message>>>)
        │
        └──▶ Runtime Config (config.rs)
                    │
                    ▼
             ChatClient (client.rs)
                    │
           ┌────────┼────────┐
           ▼        ▼        ▼
         Claude   OpenAI   Ollama
          API      API      API
The project has a clean five-module layout in src/:
src/
├── main.rs # Startup routing + CLI loop
├── config.rs # Provider enum + env/runtime config
├── client.rs # Provider-specific HTTP clients
├── conversation.rs # In-memory chat state model
└── web.rs # Axum routes, connect flow, chat API
Each module has exactly one responsibility. No god objects, no tangled imports.
The Provider Abstraction
The heart of the project is client.rs. Instead of sprinkling provider-specific logic everywhere, all outbound AI calls go through a single ChatClient that dispatches based on the active provider.
The key insight: Claude uses the Anthropic native API format, while OpenAI and Ollama both speak the OpenAI-compatible schema. Separating these two code paths keeps the provider logic honest — you're not faking compatibility where there isn't any.
// Simplified concept from client.rs
pub enum Provider {
    Claude,
    OpenAI,
    Ollama,
}

pub struct ChatClient {
    pub provider: Provider,
    pub model: String,
    pub base_url: String,
    pub api_key: Option<String>,
    pub max_tokens: u32,
    pub system_prompt: String,
    pub http: reqwest::Client,
}
When you send a message, the client picks the right HTTP contract:
pub async fn send(&self, messages: &[Message]) -> Result<String> {
    match self.provider {
        Provider::Claude => self.send_claude(messages).await,
        Provider::OpenAI | Provider::Ollama => self.send_openai_compat(messages).await,
    }
}
This means adding a new provider (Gemini, Cohere, etc.) in the future is a matter of adding one arm and one method — the rest of the application stays untouched.
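To make the two code paths concrete, here is a minimal, std-only sketch of how the request target and auth headers differ per provider. RequestSpec and request_spec are hypothetical names, not taken from client.rs, and the way the base URL is joined with the path is an assumption about the project. The endpoint paths and header names follow the providers' public HTTP APIs.

```rust
// Hypothetical sketch: the names below are illustrative, not the project's API.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Provider {
    Claude,
    OpenAI,
    Ollama,
}

/// Where a request goes and which headers accompany it.
#[derive(Debug, PartialEq)]
pub struct RequestSpec {
    pub url: String,
    pub headers: Vec<(String, String)>,
}

pub fn request_spec(provider: Provider, base_url: &str, api_key: Option<&str>) -> RequestSpec {
    // Tolerate a trailing slash in the configured base URL.
    let base = base_url.trim_end_matches('/');
    let mut headers = vec![("content-type".to_string(), "application/json".to_string())];
    match provider {
        // Anthropic's native Messages API: the key goes in `x-api-key`,
        // and an `anthropic-version` header is sent with every request.
        Provider::Claude => {
            if let Some(key) = api_key {
                headers.push(("x-api-key".to_string(), key.to_string()));
            }
            headers.push(("anthropic-version".to_string(), "2023-06-01".to_string()));
            RequestSpec { url: format!("{base}/v1/messages"), headers }
        }
        // OpenAI-compatible chat completions: bearer token when a key is set.
        // Ollama runs locally and normally needs no key.
        Provider::OpenAI | Provider::Ollama => {
            if let Some(key) = api_key {
                headers.push(("authorization".to_string(), format!("Bearer {key}")));
            }
            RequestSpec { url: format!("{base}/v1/chat/completions"), headers }
        }
    }
}

fn main() {
    let claude = request_spec(Provider::Claude, "https://api.anthropic.com", Some("sk-ant-test"));
    let ollama = request_spec(Provider::Ollama, "http://localhost:11434/", None);
    println!("{}", claude.url);
    println!("{}", ollama.url);
}
```

The request bodies differ too: Anthropic takes the system prompt as a top-level system field, while the OpenAI-compatible schema carries it as a message with the system role. Keeping those differences inside one match arm each is what lets the rest of the application ignore them.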
Shared Conversation State
Multi-turn chat requires persistent message history. In Rust's async model, sharing state across request handlers requires explicit synchronization. The project does this with Arc<Mutex<...>>, Rust's standard pattern for shared mutable state:
// conversation.rs - shared across all handlers
pub type SharedConversation = Arc<Mutex<Vec<Message>>>;

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Message {
    pub role: String, // "user" or "assistant"
    pub content: String,
}
The Arc lets every Axum handler hold a cheap clone of the same conversation (each handler runs in its own async task), and the Mutex ensures only one handler touches the history at a time. That rules out data races at compile time. The ordering of concurrent requests is still up to the application.
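Here is a small, self-contained sketch of that pattern. It uses plain OS threads in place of Axum handler tasks and std::sync::Mutex; whether the project uses the std mutex or Tokio's is not stated here, so treat that as an assumption. The append_turn helper is hypothetical, and Message drops the serde derives so the example needs no external crates.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String, // "user" or "assistant"
    pub content: String,
}

pub type SharedConversation = Arc<Mutex<Vec<Message>>>;

/// Hypothetical helper: append one message and return the new history length.
/// The lock is released when `history` goes out of scope at the end of the function.
pub fn append_turn(conv: &SharedConversation, role: &str, content: &str) -> usize {
    let mut history = conv.lock().expect("conversation mutex poisoned");
    history.push(Message {
        role: role.to_string(),
        content: content.to_string(),
    });
    history.len()
}

fn main() {
    let conv: SharedConversation = Arc::new(Mutex::new(Vec::new()));

    // Eight threads stand in for eight concurrent request handlers.
    let workers: Vec<_> = (0..8)
        .map(|i| {
            let conv = Arc::clone(&conv); // cheap: only bumps a reference count
            thread::spawn(move || {
                append_turn(&conv, "user", &format!("message {i}"));
            })
        })
        .collect();
    for w in workers {
        w.join().expect("worker panicked");
    }

    // Every push landed; the order of messages depends on scheduling.
    println!("{} messages stored", conv.lock().unwrap().len());
}
```

In an async handler, a common approach is to clone the history while holding the lock, drop the guard, and only then await the provider call, so one slow model response does not block every other request.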
Dual-Mode: Web UI + CLI
The app launches in Web UI mode by default, but also supports a terminal workflow:
# Default: serves Web UI at http://localhost:8080
cargo run
# CLI mode: interactive terminal chat
cargo run -- cli
# Explicit web mode on a custom port
PORT=3000 cargo run -- web
This is useful in different contexts — the Web UI for demos and sharing, the CLI for scripting and piping into other tools.
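The mode switch can be sketched roughly as follows. This is an illustration rather than the project's main.rs: Mode, parse_mode, and parse_port are hypothetical names, and rejecting unknown arguments or falling back to 8080 on an unparsable PORT are assumptions about behavior the article does not specify.

```rust
use std::env;

#[derive(Debug, PartialEq)]
pub enum Mode {
    Web,
    Cli,
}

/// Pick the mode from the first argument after the program name.
/// No argument means Web, matching the default described above.
pub fn parse_mode(first_arg: Option<&str>) -> Result<Mode, String> {
    match first_arg {
        None | Some("web") => Ok(Mode::Web),
        Some("cli") => Ok(Mode::Cli),
        Some(other) => Err(format!("unknown mode '{other}', expected 'web' or 'cli'")),
    }
}

/// Read the port from an optional PORT value, falling back to 8080
/// when the variable is missing or not a valid port number.
pub fn parse_port(raw: Option<&str>) -> u16 {
    raw.and_then(|s| s.trim().parse().ok()).unwrap_or(8080)
}

fn main() {
    let first = env::args().nth(1);
    let port_var = env::var("PORT").ok();
    match parse_mode(first.as_deref()) {
        Ok(Mode::Web) => println!("would serve on port {}", parse_port(port_var.as_deref())),
        Ok(Mode::Cli) => println!("would start the terminal loop"),
        Err(e) => eprintln!("{e}"),
    }
}
```

Keeping the parsing in pure functions that take Option<&str> means the startup logic can be unit-tested without spawning the binary or mutating the process environment.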
Quick Start
Option A: No Rust Required
Download the prebuilt Windows executable (v1.0.1) and run it:
.\chatbot.exe
# Opens http://localhost:8080
Option B: From Source
# 1. Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# 2. Clone and configure
git clone https://github.com/MihirMohapatra/chatbot.git
cd chatbot
cp .env.example .env
# 3. Run
cargo run
Provider Configuration
Set your provider in .env:
Claude (Anthropic)
PROVIDER=claude
ANTHROPIC_API_KEY=sk-ant-...
OpenAI
PROVIDER=openai
OPENAI_API_KEY=sk-...
Ollama (Local — no API key needed)
# Pull a model first
ollama pull llama3.2
PROVIDER=ollama
MODEL=llama3.2
OpenRouter (access any model via OpenAI-compatible API)
PROVIDER=openai
OPENAI_API_KEY=sk-or-...
BASE_URL=https://openrouter.ai/api
MODEL=anthropic/claude-sonnet-4
All environment variables:
| Variable | Default | Description |
|---|---|---|
| PROVIDER | claude | claude, openai, ollama |
| ANTHROPIC_API_KEY | (none) | Required for Claude |
| OPENAI_API_KEY | (none) | Required for OpenAI |
| MODEL | provider default | Override the model name |
| BASE_URL | provider default | Override the API endpoint |
| MAX_TOKENS | 1024 | Response token cap |
| SYSTEM_PROMPT | built-in | Custom assistant behavior |
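As a rough illustration of how these variables and defaults could be resolved, here is a std-only sketch. Config and load_config are hypothetical names rather than the contents of config.rs. It covers only three of the variables, assumes a missing key is reported at load time (the real connect-first flow may defer that check), and assumes the .env file has already been loaded into the process environment.

```rust
use std::env;

#[derive(Debug, PartialEq)]
pub struct Config {
    pub provider: String,
    pub api_key: Option<String>,
    pub max_tokens: u32,
}

/// Build a Config from any key lookup. Passing the lookup in as a closure
/// keeps the function testable without touching the real environment.
pub fn load_config(get: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
    // PROVIDER defaults to "claude", as in the table above.
    let provider = get("PROVIDER")
        .unwrap_or_else(|| "claude".to_string())
        .to_lowercase();

    // Only the hosted providers need a key; Ollama runs locally.
    let api_key = match provider.as_str() {
        "claude" => Some(get("ANTHROPIC_API_KEY").ok_or("ANTHROPIC_API_KEY is required for Claude")?),
        "openai" => Some(get("OPENAI_API_KEY").ok_or("OPENAI_API_KEY is required for OpenAI")?),
        "ollama" => None,
        other => return Err(format!("unknown PROVIDER '{other}'")),
    };

    // MAX_TOKENS defaults to 1024 when unset; a non-numeric value is an error.
    let max_tokens = match get("MAX_TOKENS") {
        Some(raw) => raw
            .trim()
            .parse::<u32>()
            .map_err(|_| format!("MAX_TOKENS must be a number, got '{raw}'"))?,
        None => 1024,
    };

    Ok(Config { provider, api_key, max_tokens })
}

fn main() {
    // In the real binary the lookup is simply the process environment.
    match load_config(|key| env::var(key).ok()) {
        Ok(cfg) => println!("provider={} max_tokens={}", cfg.provider, cfg.max_tokens),
        Err(e) => eprintln!("config error: {e}"),
    }
}
```

Failing fast on a malformed MAX_TOKENS, instead of silently falling back to the default, is a design choice in this sketch; either behavior is defensible as long as it is documented.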
Docker
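docker build -t chatbot .
# Bind-mount .env by absolute path (a bare ".env" would be read as a volume name)
docker run -it --rm -v "$(pwd)/.env:/data/.env" chatbot
For Ollama with host networking:
docker run -it --rm --network host -v "$(pwd)/.env:/data/.env" chatbot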
Rust vs Python: Backend Performance Perspective
For backend workloads like this (concurrent HTTP + JSON + state management), Rust typically outperforms Python on the metrics that matter in production. The comparison below is qualitative, not a benchmark of this project:
| Metric | Rust | Python | Why It Matters |
|---|---|---|---|
| Throughput (req/sec) | Higher | Lower | More concurrent users per instance |
| P95/P99 latency | Lower under load | Higher under load | More stable response times |
| Memory per worker | Lower | Higher | Better infra cost and density |
| CPU efficiency | Higher | Lower | More headroom before scaling out |
Note: For LLM apps, model/API network time dominates total latency. But Rust still wins on concurrency behavior, memory footprint, and server efficiency — which directly impacts cost and reliability at scale.
What's Next (Roadmap)
- Streamed responses: token-by-token streaming via SSE
- Persistent chat history: SQLite or Postgres backend
- Metrics + tracing: tracing crate + OpenTelemetry integration
- Integration tests for provider adapters
Try It / Contribute
The project is open source and MIT licensed:
👉 github.com/MihirMohapatra/chatbot
If you're exploring Rust for backend systems, or building something that needs to talk to multiple AI providers without writing boilerplate for each one, this is a good starting point. Issues and PRs welcome.
Built with Rust 1.80+, Tokio, Axum, reqwest, serde, and anyhow.