The cloud is great, until you get the bill.
Like many developers, I wanted to build "Agentic Workflows"—systems where one AI model’s output triggers another’s action. But every time I wanted to test a multi-step chain, I hit the same three walls:
- The Token Tax: Iterating on complex logic meant burning API credits.
- Privacy: I couldn't build workflows that touched sensitive local files because everything had to be shipped to a remote server.
- Latency: Waiting for a server round-trip just to summarize a local text file felt inefficient.
I realized I didn't need another wrapper around the OpenAI API. I needed a "Unix pipe" for local intelligence.
So I built LAO (Local AI Orchestrator)—a desktop tool written entirely in Rust (from the backend logic to the GUI) that chains local models like Llama 3 and Whisper into powerful, offline workflows.
What is LAO?
LAO is a cross-platform desktop application that lets you visually build Directed Acyclic Graphs (DAGs) of AI tasks.
Instead of writing Python scripts to glue models together, you define steps: "Watch this folder for audio files" -> "Transcribe with Whisper" -> "Summarize with Llama 3" -> "Save to Markdown".
And here is the kicker: It runs 100% offline. No internet, no API keys, no monthly subscriptions.
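To make the idea concrete, here is a minimal sketch of how such a chain could be declared in Rust. The `Workflow` builder and its method names are illustrative only, not LAO's actual API:

```rust
// Hypothetical builder for declaring a step chain.
// `Workflow` and its methods are illustrative, not LAO's real API.
#[derive(Debug, Default)]
struct Workflow {
    name: String,
    steps: Vec<(String, String)>, // (step name, plugin that runs it)
}

impl Workflow {
    fn new(name: &str) -> Self {
        Workflow { name: name.into(), ..Default::default() }
    }
    fn step(mut self, name: &str, plugin: &str) -> Self {
        self.steps.push((name.into(), plugin.into()));
        self
    }
}

fn main() {
    let notes = Workflow::new("meeting-notes")
        .step("transcribe", "whisper")
        .step("summarize", "llama3")
        .step("save", "markdown-writer");
    println!("{:#?}", notes);
}
```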
The Stack: "Full Stack" Rust
I chose Rust not just for performance, but for correctness. When you are orchestrating heavy compute tasks (like loading a 7GB model into RAM), memory safety isn't optional.
1. The Core (Systems Engineering)
The backend isn't just a script runner. It's a custom DAG Engine that handles:
- Dependency Resolution: Ensuring Step B never runs if Step A fails (see the sketch after this list).
- Hot-Swappable Plugins: I built a dynamic plugin system using Rust's libloading. Plugins (like the Ollama interface or the Whisper engine) are compiled as shared libraries (.so/.dll) and loaded at runtime.
- State Management: If the app crashes, the WorkflowScheduler knows exactly where it left off.
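Here is that fail-fast dependency resolution as a compact sketch. This is my approximation, assuming a simple name-keyed step graph; LAO's real scheduler also persists state between runs:

```rust
use std::collections::HashSet;

#[derive(Debug)]
struct Step {
    name: String,
    deps: Vec<String>, // names of steps that must succeed first
}

fn run_workflow(steps: &[Step]) -> Result<(), String> {
    let mut done: HashSet<&str> = HashSet::new();
    let mut remaining: Vec<&Step> = steps.iter().collect();

    while !remaining.is_empty() {
        // Pick any step whose dependencies have all succeeded.
        let idx = remaining
            .iter()
            .position(|s| s.deps.iter().all(|d| done.contains(d.as_str())))
            .ok_or_else(|| "cycle or unsatisfiable dependency".to_string())?;
        let step = remaining.swap_remove(idx);

        execute(step)?; // fail-fast: downstream steps never run after an error
        done.insert(step.name.as_str());
    }
    Ok(())
}

fn execute(step: &Step) -> Result<(), String> {
    println!("running {}", step.name);
    Ok(()) // a real step would invoke a plugin here
}

fn main() {
    let steps = vec![
        Step { name: "transcribe".into(), deps: vec![] },
        Step { name: "summarize".into(), deps: vec!["transcribe".into()] },
        Step { name: "save".into(), deps: vec!["summarize".into()] },
    ];
    run_workflow(&steps).unwrap();
}
```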
2. The UI (egui)
Initially, I considered using a web-based frontend. But I wanted LAO to feel like a tool, not a website.
I built the entire interface using egui (an immediate mode GUI library for Rust).
- Why egui? It compiles to a single binary with the backend. No Electron bloat, no 200MB memory overhead just to render a button.
- Performance: The visual graph editor renders at 60 FPS even with complex workflows, because egui draws through a native GPU backend.
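If you haven't seen immediate mode before, here is a minimal, runnable sketch of the pattern with eframe/egui. It assumes a recent eframe (~0.27+); the `run_native` app-creator signature differs slightly across versions, and the widget names are just a demo, not LAO's actual UI:

```rust
use eframe::egui;

#[derive(Default)]
struct Demo {
    runs: u32,
}

impl eframe::App for Demo {
    fn update(&mut self, ctx: &egui::Context, _frame: &mut eframe::Frame) {
        // Immediate mode: the whole UI is re-declared every frame,
        // so there is no retained widget tree to keep in sync.
        egui::CentralPanel::default().show(ctx, |ui| {
            ui.heading("LAO-style panel");
            if ui.button("Run workflow").clicked() {
                self.runs += 1;
            }
            ui.label(format!("Completed runs: {}", self.runs));
        });
    }
}

fn main() -> eframe::Result<()> {
    eframe::run_native(
        "demo",
        eframe::NativeOptions::default(),
        Box::new(|_cc| Ok(Box::<Demo>::default())),
    )
}
```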
The Coolest Part: Dynamic Plugin Loading
The hardest technical challenge was making the system extensible without recompiling the core. I needed a way to load new capabilities (like a new model type) on the fly.
Here is a snippet from the PluginManager that handles hot reloading. It safely unloads the old plugin code and swaps in the new version while the app is running:
```rust
// core/plugin_manager.rs
pub fn hot_reload_plugin(&mut self, name: &str) -> Result<()> {
    println!("🔄 Hot reloading plugin: {}", name);

    // 1. Emit unload event so the UI can clean up
    self.emit_event(PluginEvent::PluginUnloaded {
        plugin_name: name.to_string(),
    });

    // 2. Safely remove the old library from memory
    if self.registry.plugins.contains_key(name) {
        // In the real impl, we drop the dynamic library handle here
        self.registry.plugins.remove(name);
    }

    // 3. Load the new binary from disk
    self.load_plugins()?;

    println!("✓ Successfully hot reloaded plugin: {}", name);
    Ok(())
}
```
This architecture lets developers write their own Rust plugins for LAO: drop the .dll or .so file into the plugins/ folder, and the engine picks it up instantly.
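For the curious, the underlying libloading pattern looks roughly like this. This is a hedged sketch: the `LaoPlugin` trait and `lao_plugin_entry` symbol are names I made up for illustration (LAO's real ABI lives in the repo), and a production plugin interface needs more care than this:

```rust
use libloading::{Library, Symbol};

// Illustrative plugin interface; not LAO's actual ABI.
pub trait LaoPlugin {
    fn name(&self) -> &str;
    fn run(&self, input: &str) -> Result<String, String>;
}

// Keep the Library alive as long as the plugin: dropping the handle
// unmaps the code that the trait object's vtable points into.
pub struct LoadedPlugin {
    pub plugin: Box<dyn LaoPlugin>, // dropped before `_lib` (declaration order)
    _lib: Library,
}

pub fn load_plugin(path: &str) -> Result<LoadedPlugin, libloading::Error> {
    unsafe {
        let lib = Library::new(path)?;
        // The plugin crate exports this constructor. Returning a fat pointer
        // across `extern "C"` only works when host and plugin are built with
        // the same compiler version — a hardened ABI would use a C-safe handle.
        let entry: Symbol<unsafe extern "C" fn() -> *mut dyn LaoPlugin> =
            lib.get(b"lao_plugin_entry")?;
        let plugin = Box::from_raw(entry());
        Ok(LoadedPlugin { plugin, _lib: lib })
    }
}
```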
Why This Matters
We are moving toward a future where "Edge AI" is the standard. Not every task needs a massive cluster.
- Cost: $0.00.
- Privacy: Your data never leaves your SSD.
- Control: You own the model, the prompt, and the pipeline.
If you are interested in Systems Programming, Rust, or Local AI, check out the code. I’m actively looking for contributors to help build new plugins!
GitHub Repo: github.com/abendrothj/lao