David

We Just Shipped a Coding Agent Inside a Desktop AI App — Here's How It Works

I've been building Locally Uncensored, an open-source desktop app for running AI locally. Today we shipped v2.2.2 and it's the biggest update yet. Let me walk you through what's new and some of the technical decisions behind it.

A Coding Agent That Runs on Your Machine

The headline feature is Codex — a dedicated coding agent tab that sits alongside the regular chat. It reads your codebase, writes files, and runs shell commands autonomously. Up to 20 tool iterations per task, all running locally through Ollama.
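The bounded tool loop can be sketched roughly like this. This is a minimal illustration, not the app's actual implementation: the model and tool runner are passed in as closures (the real app wires these to Ollama and the MCP registry), and the `TOOL:` prefix convention here is purely hypothetical.

```rust
const MAX_ITERATIONS: usize = 20;

// Ask the model, run any tool it requests, feed the result back,
// and stop on a plain-text answer or after 20 iterations.
fn agent_loop(
    mut prompt: String,
    call_model: impl Fn(&str) -> String,
    run_tool: impl Fn(&str) -> String,
) -> String {
    for _ in 0..MAX_ITERATIONS {
        let reply = call_model(&prompt);
        // Convention for this sketch only: tool calls are prefixed "TOOL:".
        if let Some(tool) = reply.strip_prefix("TOOL:") {
            let result = run_tool(tool);
            prompt.push_str(&format!("\nTool result: {result}"));
        } else {
            return reply; // no tool requested = final answer
        }
    }
    String::from("[stopped: iteration limit reached]")
}
```

The hard cap matters: a local model that keeps hallucinating tool calls would otherwise loop forever.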

The UI is a three-tab system: LU (chat) | Codex (coding agent) | OpenClaw (experimental). Each mode gets its own conversation history so your coding sessions don't mix with casual chat.

The folder picker uses native Windows dialogs via the rfd crate — no Electron file picker jank. Select a project directory and the agent gets working directory context for all file and shell operations.

13 MCP Tools with Smart Filtering

We replaced the old hardcoded 7-tool system with a dynamic MCP tool registry. The full set:

```
web_search, web_fetch, file_read, file_write,
file_list, file_search, shell_execute, code_execute,
system_info, process_list, screenshot,
image_generate, run_workflow
```

The interesting technical challenge here was token cost. Sending 13 tool definitions with every API call eats a lot of context, especially with local models where every token matters.

The solution: keyword-based tool filtering. Before each request, we scan the user's message for patterns and only include relevant tool definitions. A message about files gets file_read, file_write, file_search. A message about running code gets shell_execute, code_execute. This saves ~80% of tool-definition tokens on average.
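In sketch form, the filtering is just a keyword-to-tools table scanned against the lowercased message. The keyword sets below are illustrative guesses, not the app's real rules:

```rust
// Map trigger keywords to the tool definitions worth sending with the request.
fn filter_tools(message: &str) -> Vec<&'static str> {
    const RULES: &[(&[&str], &[&str])] = &[
        (&["file", "read", "write", "folder"], &["file_read", "file_write", "file_search"]),
        (&["run", "execute", "shell", "command"], &["shell_execute", "code_execute"]),
        (&["search", "web", "http"], &["web_search", "web_fetch"]),
    ];
    let lower = message.to_lowercase();
    let mut tools: Vec<&'static str> = RULES
        .iter()
        .filter(|(keywords, _)| keywords.iter().any(|k| lower.contains(k)))
        .flat_map(|(_, tools)| tools.iter().copied())
        .collect();
    tools.dedup();
    tools
}
```

A message that matches no rule can fall back to a small default set, so the model is never left tool-less.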

We also built a JSON repair layer because local LLMs are notoriously bad at producing clean JSON for tool calls. The parser handles trailing commas, single quotes instead of double quotes, missing closing braces, and other common failures. If JSON parsing fails, it falls back to Hermes XML format extraction.
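Two of those repairs can be sketched in a few lines: normalizing single quotes and stripping trailing commas before handing the string to a real parser. This is a deliberately naive version; a production repair layer also has to track whether it's inside a string literal and handle escapes.

```rust
// Normalize single quotes to double quotes, then drop any comma whose next
// non-whitespace character is a closing brace or bracket.
fn repair_json(raw: &str) -> String {
    let normalized: String = raw.chars().map(|c| if c == '\'' { '"' } else { c }).collect();
    let chars: Vec<char> = normalized.chars().collect();
    let mut out = String::with_capacity(chars.len());
    for (i, &c) in chars.iter().enumerate() {
        if c == ',' {
            if let Some(&next) = chars[i + 1..].iter().find(|ch| !ch.is_whitespace()) {
                if next == '}' || next == ']' {
                    continue; // trailing comma: skip it
                }
            }
        }
        out.push(c);
    }
    out
}
```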

Provider-Agnostic Thinking Mode

This one was surprisingly tricky. Different providers handle "thinking" completely differently:

  • Ollama: Native think: true API parameter
  • OpenAI / Anthropic: System prompt injection with <think> tag parsing
  • Gemma 4: Custom <|channel>thought tag format that needs stripping

We unified all of this behind a single toggle. The frontend shows collapsible thinking blocks regardless of which provider generated them. The parser detects the format automatically.
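For the tag-based providers, the core of that parsing is splitting a response into hidden reasoning and visible answer. A minimal sketch for the `<think>` variant (the real parser also handles Ollama's native field and the other tag formats):

```rust
// Split a response into (optional thinking block, visible answer).
fn split_thinking(response: &str) -> (Option<String>, String) {
    if let (Some(start), Some(end)) = (response.find("<think>"), response.find("</think>")) {
        if start < end {
            let thought = response[start + "<think>".len()..end].trim().to_string();
            let mut visible = String::new();
            visible.push_str(&response[..start]);
            visible.push_str(&response[end + "</think>".len()..]);
            return (Some(thought), visible.trim().to_string());
        }
    }
    (None, response.trim().to_string()) // no thinking block found
}
```

The frontend can then render the first element as a collapsible block and the second as the normal message.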

File Upload with Vision — Cross-Provider

Drag & drop, paste (Ctrl+V), or click the 📎 button. Up to 5 images per message. Vision-capable models describe what they see.

The gnarly part: each provider wants images in a different format. Ollama wants base64 in a specific field. OpenAI wants base64 data URIs. Anthropic wants base64 with explicit media types. We handle the conversion automatically based on which provider is active.
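The conversion boils down to one match over the active provider. The payload shapes below follow each API's public documentation but are simplified fragments, not the app's exact serialization:

```rust
enum Provider { Ollama, OpenAi, Anthropic }

// Build the provider-specific image fragment from one base64-encoded image.
fn image_payload(provider: &Provider, base64: &str, media_type: &str) -> String {
    match provider {
        // Ollama: raw base64 strings in an "images" array on the message
        Provider::Ollama => format!(r#"{{"images": ["{base64}"]}}"#),
        // OpenAI: a data URI inside an image_url content part
        Provider::OpenAi => format!(
            r#"{{"type": "image_url", "image_url": {{"url": "data:{media_type};base64,{base64}"}}}}"#
        ),
        // Anthropic: base64 source with an explicit media_type
        Provider::Anthropic => format!(
            r#"{{"type": "image", "source": {{"type": "base64", "media_type": "{media_type}", "data": "{base64}"}}}}"#
        ),
    }
}
```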

Native PC Control Through Rust

All system operations go through Tauri's Rust backend:

```rust
// shell_execute runs the command on a blocking worker thread via
// tokio::task::spawn_blocking — no more UI freezes during long commands.
#[tauri::command]
async fn shell_execute(command: String, workdir: String) -> Result<ShellResult, String> {
    tokio::task::spawn_blocking(move || {
        // Execute the command in `workdir`, capture stdout/stderr,
        // and return Result<ShellResult, String>
    })
    .await
    .map_err(|e| e.to_string())? // flatten the JoinError if the task panicked
}
```

The key decision was bypassing Tauri's sandbox entirely for filesystem and shell operations. A local AI app that can't read your files or run commands isn't very useful. The permission system (7 categories, 3 levels) gives users control over what the agent can do.
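A "7 categories, 3 levels" scheme can be modeled as a small lookup with default-deny. The category and level names here are my guesses for illustration; the app's actual names may differ.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Category { FileSystem, Shell, Network, System, Screenshot, ImageGen, Workflow }

#[derive(Clone, Copy)]
enum Level { Deny, Ask, Allow }

struct Permissions { levels: HashMap<Category, Level> }

impl Permissions {
    // Anything not explicitly configured is denied.
    fn check(&self, cat: Category) -> Level {
        *self.levels.get(&cat).unwrap_or(&Level::Deny)
    }
}
```

Default-deny means a fresh install can't touch the shell or filesystem until the user opts in.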

UI Changes That Actually Matter

  • 15% larger everything — root font-size bumped to 18.4px. Sounds minor but it makes the whole app feel less cramped
  • Collapsible code blocks — anything over 4 lines collapses with a "Show all X lines" button. Huge for agent responses that dump entire files
  • Monochrome tool output blocks — collapsed by default, click to expand. Keeps the chat clean
  • Real-time elapsed counter during generation so you know the model isn't frozen
  • Stop button actually works now: an AbortSignal is threaded through all fetch calls. Embarrassing that this was broken before, but here we are

What's Next

Image generation is stubbed out in the permission system as "Coming Soon". ComfyUI integration is the next big focus — wrapping node-based workflows into something a non-technical user can actually use.

The app is MIT licensed and the .exe is a single download. No Docker, no Python environment, no cloud account needed. Just Ollama running in the background.

GitHub: PurpleDoubleD/locally-uncensored
Website: locallyuncensored.com

If you try it, feedback genuinely helps — especially hardware-specific issues we can't test for. Open an issue or drop a comment in the GitHub Discussions.
