Fortune Ndlovu

Build an AI Chatbot Backend in Rust: Step-by-Step Tutorial

What We're Building and Why

In this tutorial, we'll build a complete AI chatbot backend from scratch using Rust. You'll learn both Rust programming concepts and AI API integration as we create a REST API that connects to Google's Gemini AI.

Reference project: https://github.com/Fortune-Ndlovu/rust-ai-chatbot/tree/main

What is Rust?
Rust is a high-performance systems programming language started by Graydon Hoare in 2006 and later sponsored by Mozilla as a safer, modern alternative to C++. It achieves memory safety and data-race-free concurrency without a garbage collector by using a strict compiler that rejects entire classes of bugs at compile time. You can explore its development history through the Rust Foundation.

What You'll Build:

  • A REST API server that accepts chat messages
  • Integration with Google Gemini AI (free tier available)
  • An interactive CLI client to chat with your bot
  • Proper error handling and input validation
  • A working project you can play with

AI Concepts You'll Learn:

  • How LLM APIs accept prompts and return responses as JSON
  • API key authentication and keeping secrets out of version control
  • Handling rate limits, model fallbacks, and API errors

Prerequisites

Before starting this tutorial, you should have:

  • Rust installed - Install Rust (latest stable version recommended)
  • Basic terminal/command line knowledge - Comfortable running commands in terminal
  • Text editor or IDE - VS Code, IntelliJ IDEA, or any Rust-capable editor
  • Google account - For accessing Gemini API (free tier available)

Helpful but not required:

  • Familiarity with HTTP, REST APIs, and JSON

Verifying Rust Installation:

rustc --version    # Should show Rust version
cargo --version    # Should show Cargo version

If you don't have Rust installed, follow the official installation guide. The installation includes both rustc (compiler) and cargo (package manager).

Let's get started!


Project Structure

Before we dive into the code, here's the complete project structure you'll be building:

rust-ai-chatbot/
β”œβ”€β”€ Cargo.toml              # Project manifest: defines dependencies and metadata
β”œβ”€β”€ Cargo.lock              # Lock file: pins exact dependency versions (auto-generated)
β”œβ”€β”€ .env                    # Environment variables: stores your Gemini API key (not in git)
β”œβ”€β”€ .env.example            # Template for .env file: shows required environment variables
β”œβ”€β”€ .gitignore              # Git ignore rules: excludes build artifacts and secrets
β”œβ”€β”€ README.md               # Project documentation: quick start guide and usage
β”œβ”€β”€ BLOG.md                 # This tutorial: complete step-by-step guide
└── src/
    β”œβ”€β”€ main.rs             # Main server code: REST API, routes, and Gemini integration
    └── cli.rs              # CLI client module: interactive chat interface

Now that you understand the structure, let's build it step by step!


Step 1: Project Setup

Creating the Project

Open your terminal and run:

cargo new rust-ai-chatbot
cd rust-ai-chatbot

What cargo new does:

  • Creates a new Rust project directory
  • Initializes a Git repository
  • Creates Cargo.toml (project manifest)
  • Sets up src/main.rs with a basic template

Understanding Cargo:
Cargo is Rust's package manager and build tool (like npm for Node.js or pip for Python). It streamlines the development process by automating tasks such as downloading libraries, compiling code, and managing project dependencies.


Step 2: Configuring Dependencies (Cargo.toml)

Create or edit Cargo.toml with the following content:

[package]
name = "rust-ai-chatbot"
version = "0.1.0"
edition = "2021"

[dependencies]
axum = "0.7"
tokio = { version = "1", features = ["full"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
reqwest = { version = "0.11", features = ["json"] }
dotenv = "0.15"
tower = "0.4"
tower-http = { version = "0.5", features = ["cors"] }

Let's understand each dependency:

  1. axum = "0.7" - Web framework

    • Rust concept: Modern async web framework built on Tokio
    • What it does: Handles HTTP routing, request parsing, response formatting
    • Why we need it: To build our REST API endpoints
  2. tokio = { version = "1", features = ["full"] } - Async runtime

    • Rust concept: Enables async/await syntax in Rust
    • What it does: Provides runtime for concurrent operations
    • Why we need it: To handle multiple requests without blocking threads
    • Allows our server to handle multiple chat requests simultaneously
  3. serde = { version = "1.0", features = ["derive"] } - Serialization

    • Rust concept: Framework for converting Rust types to/from formats like JSON
    • What it does: Auto-generates code to serialize/deserialize structs
    • Why we need it: To convert between Rust structs and JSON (API format)
    • LLM APIs use JSON, so we need to convert our Rust data to JSON
  4. serde_json = "1.0" - JSON support

    • Rust concept: JSON implementation for Serde
    • What it does: Handles JSON parsing and generation
    • Why we need it: Gemini API communicates in JSON format
  5. reqwest = { version = "0.11", features = ["json"] } - HTTP client

    • Rust concept: Async HTTP client library
    • What it does: Makes HTTP requests to external APIs
    • Why we need it: To call the Gemini API
    • This is how we send prompts to the AI and receive responses
  6. dotenv = "0.15" - Environment variables

    • Rust concept: Loads .env files
    • What it does: Reads environment variables from .env file
    • Why we need it: To securely store API keys (never commit to git!)
  7. tower and tower-http - Middleware

    • Rust concept: Middleware framework for HTTP
    • What it does: Adds cross-cutting concerns (CORS, logging, etc.)
    • Why we need it: For production-ready features

When working with AI APIs, you typically need an HTTP client (reqwest) to make API calls, JSON serialization (serde) to format requests and responses, and an async runtime (tokio) to handle concurrent requests efficiently.


Step 3: Environment Configuration

Create a .env file in your project root:

# Google Gemini API Key
# Get your free API key from: https://makersuite.google.com/app/apikey
GEMINI_API_KEY=your_api_key_here

Note on API Keys

  • LLM APIs require authentication via API keys
  • Never commit API keys to version control
  • Store them in .env files (which should be in .gitignore)
  • Free tier APIs (like Gemini) have rate limits, but are great for learning

Getting Your Gemini API Key:

  1. Visit https://makersuite.google.com/app/apikey
  2. Sign in with your Google account
  3. Click "Create API Key"
  4. Copy the key and paste it in .env (replace your_api_key_here)

For more information on env vars, see working with environment variables in Rust.
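
As a quick sanity check, here's a minimal sketch (assuming the dotenv crate from our Cargo.toml) that you could drop into a scratch main.rs to confirm the key loads; it reports whether the variable is set without ever printing the key itself:

use std::env;

fn main() {
    // Load variables from .env into the process environment
    dotenv::dotenv().ok();

    // Report presence only; never print the secret itself
    match env::var("GEMINI_API_KEY") {
        Ok(_) => println!("GEMINI_API_KEY is set"),
        Err(_) => println!("GEMINI_API_KEY is missing - check your .env file"),
    }
}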


Step 4: The Main Server Code (src/main.rs)

Now let's build the complete server. Here's the full src/main.rs file:

mod cli;

use axum::{
    extract::Json,
    http::StatusCode,
    response::Json as ResponseJson,
    routing::{get, post},
    Router,
};
use serde::{Deserialize, Serialize};
use std::env;
use std::time::Duration;

#[derive(Deserialize)]
struct ChatRequest {
    message: String,
}

#[derive(Serialize)]
struct ChatResponse {
    response: String,
}

#[derive(Serialize)]
struct ErrorResponse {
    error: String,
}

// Gemini API request structures
#[derive(Serialize)]
struct GeminiRequest {
    contents: Vec<Content>,
}

#[derive(Serialize, Deserialize)]
struct Content {
    parts: Vec<Part>,
}

#[derive(Serialize, Deserialize)]
struct Part {
    text: String,
}

// Gemini API response structures
#[derive(Deserialize)]
struct GeminiResponse {
    candidates: Vec<Candidate>,
}

#[derive(Deserialize)]
struct Candidate {
    content: Content,
}

// Gemini API error response
#[derive(Deserialize)]
struct GeminiErrorResponse {
    error: GeminiError,
}

#[derive(Deserialize)]
struct GeminiError {
    code: u16,
    message: String,
    status: String,
}

async fn call_gemini_api(message: &str) -> Result<String, String> {
    let api_key = env::var("GEMINI_API_KEY")
        .map_err(|_| "GEMINI_API_KEY not found in environment variables".to_string())?;

    // Try different models - Use actual available models from API
    // Models that support generateContent: gemini-2.5-flash, gemini-flash-latest, gemini-pro-latest, etc.
    let models = ["gemini-2.5-flash", "gemini-flash-latest", "gemini-pro-latest", "gemini-2.0-flash"];
    let api_versions = ["v1beta", "v1"];

    // Create HTTP client with timeout
    let client = reqwest::Client::builder()
        .timeout(Duration::from_secs(30))
        .build()
        .map_err(|e| format!("Failed to create HTTP client: {}", e))?;

    let request_body = GeminiRequest {
        contents: vec![Content {
            parts: vec![Part {
                text: message.to_string(),
            }],
        }],
    };

    // Try different API versions and models
    for api_version in &api_versions {
        for model in &models {
            let url = format!(
                "https://generativelanguage.googleapis.com/{}/models/{}:generateContent?key={}",
                api_version, model, api_key
            );

            let response = match client
                .post(&url)
                .json(&request_body)
                .send()
                .await
            {
                Ok(resp) => {
                    eprintln!("Trying: {} (model: {})", api_version, model);
                    resp
                }
                Err(e) => {
                    eprintln!("Failed to send request to {}: {}", url, e);
                    continue;
                }
            };

            let status = response.status();
            let status_code = status.as_u16();

            if status.is_success() {
                match response.json::<GeminiResponse>().await {
                    Ok(gemini_response) => {
                        if let Some(text) = gemini_response
                            .candidates
                            .first()
                            .and_then(|c| c.content.parts.first())
                            .map(|p| p.text.clone())
                        {
                            eprintln!("Success with {} / {}", api_version, model);
                            return Ok(text);
                        }
                    }
                    Err(e) => {
                        eprintln!("Failed to parse response from {}: {}", url, e);
                        continue;
                    }
                }
            } else {
                // Read response text first (can only consume once)
                let error_text = response.text().await.unwrap_or_else(|_| "Unknown error".to_string());

                // Try to parse as structured error
                if let Ok(error_response) = serde_json::from_str::<GeminiErrorResponse>(&error_text) {
                    eprintln!(
                        "API error from {} / {}: {} ({}): {}",
                        api_version, model, error_response.error.status, error_response.error.code, error_response.error.message
                    );
                } else {
                    eprintln!("HTTP {} from {} / {}: {}", status_code, api_version, model, error_text);
                }
                // Continue to next model/version
                continue;
            }
        }
    }

    Err("Failed to get response from Gemini API. Please check your API key and model availability.".to_string())
}

async fn chat_handler(Json(payload): Json<ChatRequest>) -> Result<ResponseJson<ChatResponse>, (StatusCode, ResponseJson<ErrorResponse>)> {
    if payload.message.trim().is_empty() {
        return Err((
            StatusCode::BAD_REQUEST,
            ResponseJson(ErrorResponse {
                error: "Message cannot be empty".to_string(),
            }),
        ));
    }

    // Validate message length
    if payload.message.len() > 10000 {
        return Err((
            StatusCode::BAD_REQUEST,
            ResponseJson(ErrorResponse {
                error: "Message is too long (max 10000 characters)".to_string(),
            }),
        ));
    }

    match call_gemini_api(&payload.message).await {
        Ok(response) => Ok(ResponseJson(ChatResponse { response })),
        Err(e) => {
            eprintln!("Error calling Gemini API: {}", e);
            Err((
                StatusCode::INTERNAL_SERVER_ERROR,
                ResponseJson(ErrorResponse { error: e }),
            ))
        }
    }
}

#[tokio::main]
async fn main() {
    // Load environment variables
    dotenv::dotenv().ok();

    // Check for CLI mode
    let args: Vec<String> = std::env::args().collect();
    if args.len() > 1 && (args[1] == "chat" || args[1] == "cli" || args[1] == "--chat" || args[1] == "--cli") {
        cli::run_interactive_chat().await;
        return;
    }

    // Verify API key is set
    if env::var("GEMINI_API_KEY").is_err() {
        eprintln!("Warning: GEMINI_API_KEY not found in environment variables");
        eprintln!("   Please create a .env file with your API key");
    }

    // Health check endpoint
    async fn health() -> &'static str {
        "OK"
    }

    // Build the application router
    let app = Router::new()
        .route("/", get(health))
        .route("/health", get(health))
        .route("/chat", post(chat_handler));

    // Run the server
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000")
        .await
        .expect("Failed to bind to port 3000. Is another server running?");

    println!("πŸš€ Server running on http://localhost:3000");
    println!("πŸ“ POST to http://localhost:3000/chat with {{ \"message\": \"your message\" }}");
    println!("πŸ’‘ Health check: http://localhost:3000/health");

    axum::serve(listener, app)
        .await
        .expect("Server failed to start");
}

Now let's explore how this code works and what makes Rust so effective for building AI backends.

When you look at the imports at the top, you'll notice we're using modules to organize our code. The mod cli; declaration tells Rust to look for a file called cli.rs in the same directory. This keeps our code organized without needing complex directory structures. The use statements bring types into scope, so instead of writing axum::Router::new() everywhere, we can just write Router::new(). It's a small thing, but it makes the code much more readable.

The interesting part is how Axum's Json extractor works. When a request comes in with JSON data, Axum automatically converts it into our ChatRequest struct. If the JSON doesn't match our struct shape, Axum returns a 400 error before our handler even runs. This is what people mean when they talk about Rust's type safety: the compiler and framework work together to catch errors before they become runtime problems. You can learn more about Rust's type system in the official book.

Our struct definitions use derive macros to automatically generate serialization code. When you write #[derive(Deserialize)], Rust generates all the code needed to convert JSON into your struct at compile time. There's no runtime reflection or dynamic parsing; the compiler knows exactly what fields your struct has and generates optimized code to parse them. This means when JSON like {"message": "hello"} arrives, it becomes a ChatRequest { message: "hello" } struct with zero runtime overhead.
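
To make that concrete, here's a self-contained sketch of the round trip, using the same serde and serde_json crates from our Cargo.toml:

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
struct ChatRequest {
    message: String,
}

fn main() {
    // JSON in -> struct (what the server does with incoming requests)
    let req: ChatRequest = serde_json::from_str(r#"{"message": "hello"}"#).unwrap();
    println!("{:?}", req); // ChatRequest { message: "hello" }

    // Struct -> JSON out (what the server does with responses)
    let json = serde_json::to_string(&req).unwrap();
    println!("{}", json); // {"message":"hello"}
}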

The Gemini API structures show how Rust handles nested data. We have GeminiRequest containing a vector of Content, which contains a vector of Part, which contains a String. This mirrors the JSON structure the API expects, where you have arrays of conversation turns that can contain multiple parts (text, images, etc.). The Vec type is Rust's growable array, similar to lists in other languages, but with compile-time guarantees about memory safety.
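
To see the nesting in action, this sketch (structs repeated from main.rs, trimmed to the Serialize side) prints the exact JSON body our request structs produce:

use serde::Serialize;

#[derive(Serialize)]
struct GeminiRequest { contents: Vec<Content> }

#[derive(Serialize)]
struct Content { parts: Vec<Part> }

#[derive(Serialize)]
struct Part { text: String }

fn main() {
    let request = GeminiRequest {
        contents: vec![Content {
            parts: vec![Part { text: "Hello".to_string() }],
        }],
    };
    // Prints: {"contents":[{"parts":[{"text":"Hello"}]}]}
    println!("{}", serde_json::to_string(&request).unwrap());
}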

What's neat about the call_gemini_api function is how it handles errors. The return type Result<String, String> means the function can either succeed with a String response or fail with a String error message. This forces you to handle both cases; you can't accidentally ignore an error like you might in languages with exceptions. The ? operator is Rust's way of saying "if this fails, return the error immediately." It's syntactic sugar for a common pattern, but it makes error handling feel natural rather than tedious. Learn more about error handling in Rust.
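
Here's a small self-contained sketch of what the ? operator saves you from writing, using the same GEMINI_API_KEY lookup as our server:

use std::env;

// Without ?: match on the Result and propagate the error by hand
fn api_key_verbose() -> Result<String, String> {
    match env::var("GEMINI_API_KEY") {
        Ok(key) => Ok(key),
        Err(_) => Err("GEMINI_API_KEY not set".to_string()),
    }
}

// With ?: map the error type, then let ? return early on failure
fn api_key_concise() -> Result<String, String> {
    let key = env::var("GEMINI_API_KEY").map_err(|_| "GEMINI_API_KEY not set".to_string())?;
    Ok(key)
}

fn main() {
    // Both behave identically; only the ergonomics differ
    println!("{:?}", api_key_verbose());
    println!("{:?}", api_key_concise());
}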

The model fallback logic demonstrates Rust's iteration capabilities. We try multiple API versions and models in nested loops, and if one fails, we just continue to the next one. The match expression handles the HTTP response, checking if it's successful or an error. What's interesting is that Rust's pattern matching is exhaustive; the compiler won't let you compile code that doesn't handle all possible cases. This means you can't accidentally forget to handle an error condition.
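
A tiny illustration of that exhaustiveness: delete either arm below and the compiler rejects the program.

fn describe(result: Result<String, String>) -> String {
    // The compiler requires both the Ok and Err arms to be handled
    match result {
        Ok(text) => format!("Success: {}", text),
        Err(e) => format!("Failure: {}", e),
    }
}

fn main() {
    println!("{}", describe(Ok("pong".to_string())));
    println!("{}", describe(Err("timeout".to_string())));
}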

When we process the response, we use Option chaining to safely navigate through nested data. The .first() method returns an Option, which is Rust's way of saying "this might not exist." Then .and_then() says "if this exists, apply this function, otherwise return None." This pattern lets us safely extract the text from deep inside the response structure without worrying about null pointer exceptions or index out of bounds errors. The compiler ensures we handle the case where any part of the chain might be missing.
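
Here's that chain stripped down to a runnable sketch with the same candidate/part shapes:

struct Part { text: String }
struct Content { parts: Vec<Part> }
struct Candidate { content: Content }

fn first_text(candidates: &[Candidate]) -> Option<String> {
    candidates
        .first()                               // Option<&Candidate>
        .and_then(|c| c.content.parts.first()) // Option<&Part>
        .map(|p| p.text.clone())               // Option<String>
}

fn main() {
    // An empty candidate list yields None instead of panicking
    assert_eq!(first_text(&[]), None);

    let candidates = vec![Candidate {
        content: Content { parts: vec![Part { text: "hi".into() }] },
    }];
    assert_eq!(first_text(&candidates), Some("hi".to_string()));
}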

The chat_handler function shows how Axum's extractors work. The parameter Json(payload) tells Axum to automatically deserialize the request body into a ChatRequest. If the JSON is malformed or missing required fields, Axum handles it before our code runs. The return type is interesting too; we can return either a successful response or a tuple of status code and error response. This gives us fine-grained control over HTTP semantics while keeping everything type-safe.

Input validation happens before we even call the AI. We check if the message is empty or too long, returning appropriate HTTP status codes. This is defensive programming; we validate early and fail fast with clear error messages. The .trim() method removes whitespace, and .len() gives us the byte length of the string. These are simple operations, but they prevent us from wasting API calls on invalid input.
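
One subtlety: .len() counts bytes, not characters, so multi-byte input (emoji, accented text) reaches the limit sooner than you might expect. A quick demonstration:

fn main() {
    let ascii = "hello";
    let emoji = "πŸ¦€πŸ¦€";

    // .len() is UTF-8 bytes; .chars().count() is Unicode scalar values
    println!("{}: {} bytes, {} chars", ascii, ascii.len(), ascii.chars().count()); // 5 bytes, 5 chars
    println!("{}: {} bytes, {} chars", emoji, emoji.len(), emoji.chars().count()); // 8 bytes, 2 chars
}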

The main function uses the #[tokio::main] attribute macro, which transforms our async main function into a regular synchronous main that sets up the Tokio runtime. This is how Rust enables async/await syntax; the runtime handles all the scheduling and task management behind the scenes. When we call .await on something, the runtime can pause that task and work on other tasks while waiting for I/O to complete. This is how we handle thousands of concurrent requests without creating thousands of threads.
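
For intuition, #[tokio::main] expands to roughly the following (a simplified sketch; the macro's real output differs in its details):

fn main() {
    // Build a multi-threaded runtime, then drive the async body to completion
    tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .expect("failed to build Tokio runtime")
        .block_on(async {
            // ...the body of your async main goes here...
        });
}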

Building the router is straightforward; we just chain .route() calls to define our endpoints. Each route maps a path and HTTP method to a handler function. The compiler verifies that our handlers have the correct signatures, so we can't accidentally wire up a handler that expects different parameters. When we start the server with axum::serve(), it begins listening on port 3000 and handling incoming requests. The server runs until the process exits, and thanks to Tokio's async runtime, it can handle many requests concurrently without blocking.


Step 5: The Interactive CLI Client (src/cli.rs)

Create src/cli.rs with this complete code:

use std::io::{self, Write};

pub async fn run_interactive_chat() {
    println!("πŸ€– Rust AI Chatbot - Interactive Mode");
    println!("Type 'exit' or 'quit' to end the conversation\n");

    loop {
        print!("You: ");
        io::stdout().flush().unwrap();

        let mut input = String::new();
        match io::stdin().read_line(&mut input) {
            Ok(_) => {
                let message = input.trim();

                if message.is_empty() {
                    continue;
                }

                if message.eq_ignore_ascii_case("exit") || message.eq_ignore_ascii_case("quit") {
                    println!("πŸ‘‹ Goodbye!");
                    break;
                }

                // Send request to local server
                match send_chat_request(message).await {
                    Ok(response) => {
                        println!("Bot: {}\n", response);
                    }
                    Err(e) => {
                        eprintln!("Error: {}\n", e);
                    }
                }
            }
            Err(error) => {
                eprintln!("Error reading input: {}\n", error);
                break;
            }
        }
    }
}

async fn send_chat_request(message: &str) -> Result<String, String> {
    let client = reqwest::Client::new();
    let body = serde_json::json!({
        "message": message
    });

    let response = client
        .post("http://localhost:3000/chat")
        .json(&body)
        .send()
        .await
        .map_err(|e| format!("Failed to connect to server: {}. Make sure the server is running on port 3000.", e))?;

    if !response.status().is_success() {
        let error_text = response.text().await.unwrap_or_else(|_| "Unknown error".to_string());
        return Err(format!("Server error: {}", error_text));
    }

    let chat_response: serde_json::Value = response
        .json()
        .await
        .map_err(|e| format!("Failed to parse response: {}", e))?;

    chat_response["response"]
        .as_str()
        .map(|s| s.to_string())
        .ok_or_else(|| "Invalid response format".to_string())
}

The CLI code is simpler than the server, but it shows how Rust handles interactive I/O. The loop keyword creates an infinite loop, and we use print! instead of println! so the cursor stays on the same line. The .flush() call forces the output to appear immediately; without it, you might not see the prompt until after you type something. The .unwrap() here is fine for a CLI tool since we want it to crash if stdout fails, but you'd be more careful in a server.

Reading user input uses mutable references. The mut keyword makes the variable mutable, and &mut input passes a mutable reference to read_line(). This lets the function modify the string without taking ownership. When the user presses Enter, read_line() fills the string with their input, including the newline, so we call .trim() to remove it.

The match expression handles the result of reading input. If it succeeds, we process the message. If it fails (maybe the terminal closed), we break out of the loop. This is another example of Rust forcing you to handle errors explicitly; you can't accidentally ignore a failed read operation.

When we send the request to the server, we use the same reqwest client and serde_json::json! macro. The json! macro is convenient for creating simple JSON objects inline; it's compile-time checked, so you get errors if you write invalid JSON syntax. The response handling uses the same patterns we saw in the server code: checking status codes, parsing JSON, and handling errors with Result types.
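
For instance, this sketch shows the json! macro building a nested value inline; malformed JSON syntax here would be a compile error, not a runtime one. (The "options" field is purely illustrative; our /chat endpoint only reads "message".)

fn main() {
    let topic = "Rust";
    // Expressions interpolate directly into the JSON structure
    let body = serde_json::json!({
        "message": format!("Tell me about {}", topic),
        "options": { "stream": false } // illustrative only; ignored by our server
    });
    println!("{}", body); // {"message":"Tell me about Rust","options":{"stream":false}}
}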


Step 6: Running Your Chatbot

1. Start the Server

In one terminal:

cargo run

You should see:

πŸš€ Server running on http://localhost:3000
πŸ“ POST to http://localhost:3000/chat with { "message": "your message" }
πŸ’‘ Health check: http://localhost:3000/health

2. Use Interactive Mode

In another terminal:

cargo run -- chat

You'll see:

πŸ€– Rust AI Chatbot - Interactive Mode
Type 'exit' or 'quit' to end the conversation

You: 

Type a message and get AI responses!

3. Test with HTTP

In PowerShell:

$body = @{message="Hello! Tell me about Rust."} | ConvertTo-Json
Invoke-RestMethod -Uri http://localhost:3000/chat -Method Post -Body $body -ContentType "application/json"
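
Or, on Linux/macOS (or anywhere curl is available), the equivalent request:

curl -X POST http://localhost:3000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello! Tell me about Rust."}'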

Conclusion

Congratulations! You've built a complete AI chatbot backend in Rust. Keep learning and building more projects to reinforce these concepts.

Happy coding! πŸ¦€

