mihir mohapatra

I Built a Production-Oriented Multi-Provider AI Chatbot in Rust — Here's How

Most AI chatbot tutorials reach for Python. FastAPI, LangChain, a quick requests.post — done in 20 minutes. And that's fine for prototyping. But when I wanted to build something I'd actually put behind a real API — something with proper async concurrency, typed errors, and zero GC pauses — I reached for Rust instead.

This is a writeup of chatbot, a production-oriented Rust backend that unifies Claude, OpenAI, and Ollama behind a single interface — with a Web UI, CLI mode, and Docker support baked in.


Why Rust for an AI Backend?

It's a fair question. LLM API calls are network-bound, so why does the backend language even matter?

A few reasons:

  • Predictable latency. Rust has no garbage collector, so there are no collection pauses to inflate P99 response times when you're handling dozens of concurrent conversations.
  • Memory efficiency. A Tokio task is a small heap-allocated state machine rather than an OS thread with its own stack, so it is far cheaper than a Python thread. You serve more users per instance.
  • Type safety. Rust's ownership model makes API contracts explicit. You can't accidentally share mutable conversation state across requests without synchronization; the compiler rejects it.
  • Long-term maintainability. Explicit types and explicit error handling make the code self-documenting.

For LLM apps specifically: yes, 95% of your wall-clock time is waiting for the model to respond. But the other 5% (routing, state management, provider selection, connection handling) is all yours to control, and Rust's compile-time checks catch many mistakes in that part before it ships.


What It Does

  • Connects to Claude (Anthropic), OpenAI, or Ollama (local) via a unified chat interface
  • Serves a Web UI at http://localhost:8080 by default, or runs in CLI mode
  • Maintains per-session conversation history in memory
  • Supports runtime provider switching via a connect-first flow
  • Ships with a Dockerfile for containerized deployment
  • Supports OpenRouter as an OpenAI-compatible provider

Architecture Overview

User (Browser or CLI)
        │
        ▼
  Axum HTTP Server (web.rs)
        │
        ├──▶ Conversation State (Arc<Mutex<Vec<Message>>>)
        │
        └──▶ Runtime Config (config.rs)
                    │
                    ▼
            ChatClient (client.rs)
                    │
           ┌────────┼────────┐
           ▼        ▼        ▼
        Claude   OpenAI   Ollama
         API      API      API

The project has a clean five-module layout in src/:

src/
├── main.rs          # Startup routing + CLI loop
├── config.rs        # Provider enum + env/runtime config
├── client.rs        # Provider-specific HTTP clients
├── conversation.rs  # In-memory chat state model
└── web.rs           # Axum routes, connect flow, chat API

Each module has exactly one responsibility. No god objects, no tangled imports.


The Provider Abstraction

The heart of the project is client.rs. Instead of sprinkling provider-specific logic everywhere, all outbound AI calls go through a single ChatClient that dispatches based on the active provider.

The key insight: Claude uses the Anthropic native API format, while OpenAI and Ollama both speak the OpenAI-compatible schema. Separating these two code paths keeps the provider logic honest — you're not faking compatibility where there isn't any.

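// Simplified concept. Per the layout above, Provider lives in config.rs
// and ChatClient in client.rs.
pub enum Provider {
    Claude,  // Anthropic native API
    OpenAI,  // also covers OpenAI-compatible endpoints such as OpenRouter
    Ollama,  // local, OpenAI-compatible
}

pub struct ChatClient {
    pub provider: Provider,
    pub model: String,
    pub base_url: String,
    pub api_key: Option<String>,  // None for Ollama
    pub max_tokens: u32,
    pub system_prompt: String,
    pub http: reqwest::Client,    // holds the connection pool
}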

When you send a message, the client picks the right HTTP contract:

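use anyhow::Result; // anyhow is in the project's dependency list

impl ChatClient {
    /// Dispatches to the wire format the active provider expects.
    pub async fn send(&self, messages: &[Message]) -> Result<String> {
        match self.provider {
            Provider::Claude => self.send_claude(messages).await,
            Provider::OpenAI | Provider::Ollama => self.send_openai_compat(messages).await,
        }
    }
}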

This means adding a new provider (Gemini, Cohere, etc.) in the future is a matter of adding one enum variant, one match arm, and one method. The rest of the application stays untouched.
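
To make the two code paths concrete, here is a minimal std-only sketch of what differs between the Anthropic and OpenAI-compatible requests before any JSON is built. RequestShape and shape_request are hypothetical names for illustration, not the project's actual code. The endpoint paths and header names follow the public Anthropic Messages and OpenAI Chat Completions APIs; the sketch assumes the project sends the system prompt as a top-level field for Claude and as a leading system message otherwise.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Provider {
    Claude,
    OpenAI,
    Ollama,
}

#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Hypothetical summary of the parts of a request that differ per provider.
#[derive(Debug, PartialEq)]
pub struct RequestShape {
    pub path: &'static str,
    pub auth_header: Option<(&'static str, String)>,
    pub top_level_system: Option<String>,
    pub messages: Vec<Message>,
}

pub fn shape_request(
    provider: Provider,
    api_key: Option<&str>,
    system_prompt: &str,
    history: &[Message],
) -> RequestShape {
    match provider {
        // Anthropic: key in `x-api-key`, system prompt outside the message list.
        // (A real request also needs an `anthropic-version` header.)
        Provider::Claude => RequestShape {
            path: "/v1/messages",
            auth_header: api_key.map(|k| ("x-api-key", k.to_string())),
            top_level_system: Some(system_prompt.to_string()),
            messages: history.to_vec(),
        },
        // OpenAI-compatible: bearer token, system prompt as the first message.
        Provider::OpenAI | Provider::Ollama => {
            let mut messages = Vec::with_capacity(history.len() + 1);
            messages.push(Message {
                role: "system".into(),
                content: system_prompt.into(),
            });
            messages.extend_from_slice(history);
            RequestShape {
                path: "/v1/chat/completions",
                auth_header: api_key.map(|k| ("Authorization", format!("Bearer {k}"))),
                top_level_system: None,
                messages,
            }
        }
    }
}
```

Keeping this step free of I/O means the per-provider differences can be unit-tested without a network call; the actual send would then serialize the shape and hand it to the HTTP client.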


Shared Conversation State

Multi-turn chat requires persistent message history. In Rust's async model, sharing state across request handlers requires explicit synchronization. The project does this with Arc<Mutex<...>>, Rust's standard pattern for shared mutable state:

// conversation.rs - shared across all handlers
pub type SharedConversation = Arc<Mutex<Vec<Message>>>;

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Message {
    pub role: String,    // "user" or "assistant"
    pub content: String,
}

The Arc lets every Axum handler hold a cheap clone of the same conversation (each request is handled in its own async task), and the Mutex ensures only one handler touches the history at a time. That rules out data races at compile time; the order in which concurrent requests append to the history is still up to the application.
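
As a rough illustration of how a handler might use this type, the sketch below appends a message and returns a snapshot of the history while holding the lock only briefly. push_and_snapshot is a hypothetical helper, and the sketch assumes std::sync::Mutex; the project may use Tokio's async mutex instead.

```rust
use std::sync::{Arc, Mutex};

#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

pub type SharedConversation = Arc<Mutex<Vec<Message>>>;

/// Appends one message and returns a copy of the full history.
/// The lock is released when this function returns, so the caller can
/// perform the slow provider request without blocking other handlers.
pub fn push_and_snapshot(conv: &SharedConversation, role: &str, content: &str) -> Vec<Message> {
    let mut history = conv.lock().expect("conversation mutex poisoned");
    history.push(Message {
        role: role.to_string(),
        content: content.to_string(),
    });
    history.clone()
}
```

Cloning the history costs an allocation per request, but it avoids holding a std mutex guard across an .await point, which is a common source of stalls in async handlers.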


Dual-Mode: Web UI + CLI

The app launches in Web UI mode by default, but also supports a terminal workflow:

# Default: serves Web UI at http://localhost:8080
cargo run

# CLI mode: interactive terminal chat
cargo run -- cli

# Explicit web mode on a custom port
PORT=3000 cargo run -- web

This is useful in different contexts — the Web UI for demos and sharing, the CLI for scripting and piping into other tools.
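
The startup routing behind those commands could look roughly like the following. This is a sketch under assumptions: Mode and select_mode are hypothetical names, and rejecting unknown subcommands or an unparsable PORT is a choice made here rather than something the project documents.

```rust
#[derive(Debug, PartialEq)]
pub enum Mode {
    Web { port: u16 },
    Cli,
}

/// Picks the run mode from the arguments after the program name and the
/// optional PORT value. No subcommand means web mode on port 8080.
pub fn select_mode(args: &[String], port_env: Option<&str>) -> Result<Mode, String> {
    match args.first().map(String::as_str) {
        Some("cli") => Ok(Mode::Cli),
        None | Some("web") => {
            let port = match port_env {
                Some(raw) => raw
                    .parse::<u16>()
                    .map_err(|e| format!("invalid PORT {raw:?}: {e}"))?,
                None => 8080,
            };
            Ok(Mode::Web { port })
        }
        Some(other) => Err(format!("unknown mode {other:?}; expected `web` or `cli`")),
    }
}
```

In main, this would be called with the collected output of std::env::args().skip(1) and std::env::var("PORT").ok().as_deref(), keeping the parsing itself free of global state so it is easy to test.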


Quick Start

Option A: No Rust Required

Download the prebuilt Windows executable (v1.0.1) and run it:

.\chatbot.exe
# Opens http://localhost:8080

Option B: From Source

# 1. Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# 2. Clone and configure
git clone https://github.com/MihirMohapatra/chatbot.git
cd chatbot
cp .env.example .env

# 3. Run
cargo run

Provider Configuration

Set your provider in .env:

Claude (Anthropic)

PROVIDER=claude
ANTHROPIC_API_KEY=sk-ant-...

OpenAI

PROVIDER=openai
OPENAI_API_KEY=sk-...

Ollama (Local — no API key needed)

# Pull a model first
ollama pull llama3.2

# Then set in .env
PROVIDER=ollama
MODEL=llama3.2

OpenRouter (access any model via OpenAI-compatible API)

PROVIDER=openai
OPENAI_API_KEY=sk-or-...
BASE_URL=https://openrouter.ai/api
MODEL=anthropic/claude-sonnet-4

All environment variables:

Variable           Default           Description
PROVIDER           claude            claude, openai, ollama
ANTHROPIC_API_KEY  -                 Required for Claude
OPENAI_API_KEY     -                 Required for OpenAI
MODEL              provider default  Override the model name
BASE_URL           provider default  Override the API endpoint
MAX_TOKENS         1024              Response token cap
SYSTEM_PROMPT      built-in          Custom assistant behavior
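
A minimal sketch of how these variables might be resolved into a typed config, applying the defaults from the table. Config and load_config are hypothetical names, and case-insensitive matching of PROVIDER is an assumption. The lookup is passed in as a closure so the logic can be tested without touching the process environment; the provider-default model and endpoint are left as None because their values are not listed here.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Provider {
    Claude,
    OpenAI,
    Ollama,
}

#[derive(Debug, PartialEq)]
pub struct Config {
    pub provider: Provider,
    pub api_key: Option<String>,
    pub model: Option<String>,         // None = use the provider default
    pub base_url: Option<String>,      // None = use the provider default
    pub max_tokens: u32,
    pub system_prompt: Option<String>, // None = use the built-in prompt
}

pub fn load_config(get: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
    let raw = get("PROVIDER").unwrap_or_else(|| "claude".to_string());
    let provider = match raw.to_ascii_lowercase().as_str() {
        "claude" => Provider::Claude,
        "openai" => Provider::OpenAI,
        "ollama" => Provider::Ollama,
        other => return Err(format!("unknown PROVIDER {other:?}")),
    };
    // Hosted providers need a key; a local Ollama does not.
    let api_key = match provider {
        Provider::Claude => {
            Some(get("ANTHROPIC_API_KEY").ok_or("ANTHROPIC_API_KEY is required for Claude")?)
        }
        Provider::OpenAI => {
            Some(get("OPENAI_API_KEY").ok_or("OPENAI_API_KEY is required for OpenAI")?)
        }
        Provider::Ollama => None,
    };
    let max_tokens = match get("MAX_TOKENS") {
        Some(raw) => raw
            .parse::<u32>()
            .map_err(|e| format!("invalid MAX_TOKENS {raw:?}: {e}"))?,
        None => 1024,
    };
    Ok(Config {
        provider,
        api_key,
        model: get("MODEL"),
        base_url: get("BASE_URL"),
        max_tokens,
        system_prompt: get("SYSTEM_PROMPT"),
    })
}
```

At startup this could be called as load_config(|k| std::env::var(k).ok()) once the .env file has been loaded into the environment.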

Docker

docker build -t chatbot .
# Bind mounts need an absolute host path; a bare ".env" is parsed as a volume name
docker run -it --rm -v "$(pwd)/.env:/data/.env" chatbot

For Ollama with host networking:

docker run -it --rm --network host -v "$(pwd)/.env:/data/.env" chatbot

Rust vs Python: Backend Performance Perspective

For backend workloads like this (concurrent HTTP + JSON + state management), Rust typically outperforms Python on the metrics that matter in production. The comparison below is qualitative, not a benchmark of this project:

Metric                Rust              Python             Why It Matters
Throughput (req/sec)  Higher            Lower              More concurrent users per instance
P95/P99 latency       Lower under load  Higher under load  More stable response times
Memory per worker     Lower             Higher             Better infra cost and density
CPU efficiency        Higher            Lower              More headroom before scaling out

Note: For LLM apps, model/API network time dominates total latency. But Rust still wins on concurrency behavior, memory footprint, and server efficiency — which directly impacts cost and reliability at scale.


What's Next (Roadmap)

  • Streamed responses: token-by-token streaming via SSE
  • Persistent chat history: SQLite or Postgres backend
  • Metrics + tracing: tracing crate + OpenTelemetry integration
  • Integration tests for provider adapters

Try It / Contribute

The project is open source and MIT licensed:

👉 github.com/MihirMohapatra/chatbot

If you're exploring Rust for backend systems, or building something that needs to talk to multiple AI providers without writing boilerplate for each one, this is a good starting point. Issues and PRs welcome.


Built with Rust 1.80+, Tokio, Axum, reqwest, serde, and anyhow.
