Most local LLM UIs let you chat. Some let you tweak parameters. But what if the model could actually do things — browse the web, read files, execute code, generate images — autonomously?
That's Agent Mode in Locally Uncensored v2.1. And the hardest part wasn't the tools themselves — it was making it work with models that were never designed for tool calling.
## The Tool Calling Problem with Local Models
OpenAI and Anthropic have native function calling baked into their APIs. You send tool definitions, the model returns structured `tool_calls`, you execute them, feed results back. Clean.
Local models through Ollama? It depends. Some support native tool calling (Llama 3.1+, Qwen 2.5, Mistral, Hermes 3). Many don't — especially abliterated/uncensored models, which are exactly the ones our users want to run.
So I built a three-tier strategy system.
## Strategy 1: Native Tool Calling
For models in a curated compatibility list (llama3.1-3.3, qwen2.5, mistral, phi-4, deepseek-v3, gemma3/4...), we send OpenAI-format function definitions through Ollama's API:
```typescript
const strategy = resolveAgentStrategy(modelName)
// Returns 'native' | 'hermes_xml' | 'template_fix'
```
Native is clean and reliable. The model returns structured JSON tool calls, we execute them, append results, loop.
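The resolution step itself is just a lookup against that curated list. A minimal sketch, assuming simple prefix/substring matching — the family names below are the ones mentioned above, and the real list and matching rules live in the app:

```typescript
type AgentStrategy = "native" | "hermes_xml" | "template_fix";

// Families known to support native tool calling (illustrative subset).
const NATIVE_FAMILIES = [
  "llama3.1", "llama3.2", "llama3.3",
  "qwen2.5", "mistral", "phi-4", "deepseek-v3", "gemma3",
];

function resolveAgentStrategy(modelName: string): AgentStrategy {
  const name = modelName.toLowerCase();
  // Exact family prefix: trust native tool calling.
  if (NATIVE_FAMILIES.some((f) => name.startsWith(f))) return "native";
  // A variant (e.g. an abliterated fork) of a native family: its chat
  // template may be broken, so try the template fix first.
  if (NATIVE_FAMILIES.some((f) => name.includes(f))) return "template_fix";
  // Everything else: inject tools via the Hermes XML format.
  return "hermes_xml";
}
```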
## Strategy 2: Hermes XML Fallback
For models without native support, we inject tool definitions into the system prompt using NousResearch's Hermes format:
```xml
<tools>
[{"type": "function", "function": {"name": "web_search", ...}}]
</tools>
```
The model responds with:
```xml
<tool_call>{"name": "web_search", "arguments": {"query": "..."}}</tool_call>
```
And we parse results back as:
```xml
<tool_response>{"name": "web_search", "content": "..."}</tool_response>
```
Parsing is where it gets messy. Local models don't always produce valid JSON. My parser has a three-level fallback:
1. `JSON.parse()` on the extracted content
2. Nested object extraction (find the inner `{}` that's valid)
3. Regex field extraction (pull `"name"` and `"arguments"` individually)
This handles most of the creative JSON that 7B models produce.
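A sketch of what that three-level fallback can look like — the function name and regexes here are illustrative, not the app's actual parser:

```typescript
// Parse a <tool_call> block emitted by a model that may or may not
// produce valid JSON. Returns null if nothing usable can be recovered.
function parseToolCall(raw: string): { name: string; arguments: unknown } | null {
  const inner = raw.match(/<tool_call>([\s\S]*?)<\/tool_call>/)?.[1] ?? raw;

  // Level 1: the happy path — the model emitted valid JSON.
  try {
    return JSON.parse(inner);
  } catch { /* fall through */ }

  // Level 2: grab the outermost {...} span and hope it parses.
  const braces = inner.match(/\{[\s\S]*\}/);
  if (braces) {
    try {
      return JSON.parse(braces[0]);
    } catch { /* fall through */ }
  }

  // Level 3: regex out "name" and "arguments" individually.
  const name = inner.match(/"name"\s*:\s*"([^"]+)"/)?.[1];
  const args = inner.match(/"arguments"\s*:\s*(\{[\s\S]*?\})/)?.[1];
  if (name) {
    let parsedArgs: unknown = {};
    try { parsedArgs = args ? JSON.parse(args) : {}; } catch { /* keep {} */ }
    return { name, arguments: parsedArgs };
  }
  return null;
}
```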
## Strategy 3: Template Fix (The Weird One)
Here's a scenario: you have an abliterated Llama 3.1 model. The base model supports native tool calling, but the abliteration process stripped the chat template that makes it work.
The fix: detect the model family via `ollama show`, look up the correct Go template (stored as string constants for each family), then create a new model variant via `ollama create` with the original weights + correct template. The new model gets named `<original>:agent`.
If the template fix works → upgrade to native strategy. If it fails → fall back to Hermes XML. The user never sees any of this.
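Assuming the Ollama CLI, the variant creation can be sketched roughly like this — `TEMPLATE_BY_FAMILY` is a stand-in with a placeholder body, since the real app ships a known-good Go template per family:

```typescript
import { execFileSync } from "node:child_process";
import { writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Stand-in for the per-family template constants described in the post.
const TEMPLATE_BY_FAMILY: Record<string, string> = {
  "llama3.1": `{{- /* known-good Llama 3.1 tool-calling template goes here */ -}}`,
};

// <original>:agent naming, as described above.
function agentVariantName(model: string): string {
  return `${model.split(":")[0]}:agent`;
}

// A Modelfile that reuses the original weights but swaps in a working template.
function buildModelfile(model: string, template: string): string {
  return `FROM ${model}\nTEMPLATE """${template}"""\n`;
}

function createAgentVariant(model: string, family: string): string {
  const modelfilePath = join(tmpdir(), "agent-modelfile");
  writeFileSync(modelfilePath, buildModelfile(model, TEMPLATE_BY_FAMILY[family]));
  const variant = agentVariantName(model);
  // Requires a local Ollama install; re-registers the weights under the new name.
  execFileSync("ollama", ["create", variant, "-f", modelfilePath]);
  return variant;
}
```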
## The Agent Loop
The core loop is a while-loop that keeps going until the model stops requesting tools:
```typescript
while (running && !aborted) {
  // 1. Compact context if approaching token limit
  compactMessages(messages, maxContext * 0.8)

  // 2. Call LLM with tools
  const response = await provider.chatWithTools(messages, tools)

  // 3. If no tool calls → done
  if (!response.toolCalls?.length) break

  // 4. For each tool call:
  for (const call of response.toolCalls) {
    // Check permission (auto vs requires confirmation)
    if (needsConfirmation(call)) {
      await waitForUserApproval() // Promise-based pause
    }
    const result = await executeTool(call)
    messages.push(toolResultMessage(result))
  }
}
```
The permission system is interesting: `web_search` and `web_fetch` are auto-approved (low risk), but `file_write`, `code_execute`, and `image_generate` pause for user confirmation. The approval uses a Promise that resolves when the user clicks approve/reject in the UI.
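The Promise-based pause can be sketched as a map of pending resolvers, settled later by the UI's approve/reject handler. Names here are illustrative:

```typescript
// Resolvers for tool calls that are waiting on the user, keyed by call id.
const pendingApprovals = new Map<string, (approved: boolean) => void>();

// Called by the agent loop: parks the resolver and suspends until settled.
function waitForUserApproval(callId: string): Promise<boolean> {
  return new Promise((resolve) => pendingApprovals.set(callId, resolve));
}

// Called from the UI when the user clicks approve or reject.
function settleApproval(callId: string, approved: boolean): void {
  pendingApprovals.get(callId)?.(approved);
  pendingApprovals.delete(callId);
}
```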
## The 7 Tools
- `web_search` — Rust backend with 3-tier fallback: SearXNG → DuckDuckGo HTML scrape → Wikipedia API
- `web_fetch` — Fetches URLs via Tauri IPC, then runs an HTML→text extractor using DOMParser + TreeWalker (strips nav, headers, and ads; preserves heading structure)
- `file_read` — Sandboxed to `~/agent-workspace/` with path traversal prevention
- `file_write` — Same sandbox; requires user approval, auto-creates directories
- `code_execute` — Writes Python to a temp file, spawns a subprocess with a 30s timeout via poll-based monitoring
- `image_generate` — Submits a ComfyUI workflow, polls history every 1s for up to 5 minutes
- `run_workflow` — Executes a saved Agent Workflow, with a depth counter (max 5) to prevent recursive bombs
Every tool wraps errors as string results ("Error: ...") instead of throwing — the model sees the error and can try a different approach. This is crucial for autonomous operation.
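That convention is a thin wrapper around every tool invocation — a sketch, with hypothetical names:

```typescript
// Every tool returns a string; the agent loop never has to catch anything.
type Tool = (args: Record<string, unknown>) => Promise<string>;

async function executeToolSafely(tool: Tool, args: Record<string, unknown>): Promise<string> {
  try {
    return await tool(args);
  } catch (err) {
    // The model sees this text as the tool result and can react to it —
    // e.g. retry with different arguments or pick another tool.
    return `Error: ${err instanceof Error ? err.message : String(err)}`;
  }
}
```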
## Agent Workflows: Visual Step Chains
Beyond ad-hoc tool calling, there's a visual workflow builder for reusable multi-step chains. Six step types:
- `prompt` — Send to LLM with optional tool whitelist
- `tool` — Execute specific tool with templated arguments
- `condition` — Branch based on output evaluation (contains, equals, truthy...)
- `loop` — Repeat until condition met (capped at 100 iterations)
- `user_input` — Pause for human input
- `memory_save` — Persist data to the memory store
Variables flow between steps via `{{variable}}` interpolation. The engine is sequential with branching support, and the `user_input` step uses the same Promise-based pause pattern as tool approval.
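The interpolation step is simple enough to sketch (illustrative, not the engine's exact code):

```typescript
// Substitute {{name}} placeholders with step outputs; unknown variables
// are left untouched so a later step (or the user) can spot them.
function interpolate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    key in vars ? vars[key] : match,
  );
}
```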
Three workflows ship built-in:
- "Research Topic": input → search → pick URL → fetch → summarize → save to memory
- "Summarize URL": input → fetch → summarize
- "Code Review": input → read file → review
## What I Learned
The biggest insight: tool calling with local models is 80% parsing and error recovery, 20% actual tool execution. Cloud APIs give you clean JSON. A quantized 8B model gives you... creative interpretations of JSON. Building robust fallbacks for malformed output is what makes or breaks the experience.
The code is MIT licensed: Locally Uncensored on GitHub
Locally Uncensored is a standalone desktop app for local AI — chat, image gen, video gen, and now agent mode. Single .exe/.AppImage/.dmg, no Docker.