Most local LLM UIs let you chat. Some let you tweak parameters. But what if the model could actually do things — browse the web, read files, execute code, generate images — autonomously?
That's Agent Mode in Locally Uncensored v2.1. And the hardest part wasn't the tools themselves — it was making it work with models that were never designed for tool calling.
## The Tool Calling Problem with Local Models
OpenAI and Anthropic have native function calling baked into their APIs. You send tool definitions, the model returns structured `tool_calls`, you execute them, feed results back. Clean.
Local models through Ollama? It depends. Some support native tool calling (Llama 3.1+, Qwen 2.5, Mistral, Hermes 3). Many don't — especially abliterated/uncensored models, which are exactly the ones our users want to run.
So I built a three-tier strategy system.
## Strategy 1: Native Tool Calling
For models in a curated compatibility list (llama3.1-3.3, qwen2.5, mistral, phi-4, deepseek-v3, gemma3/4...), we send OpenAI-format function definitions through Ollama's API:
```typescript
const strategy = resolveAgentStrategy(modelName)
// Returns 'native' | 'hermes_xml' | 'template_fix'
```
Native is clean and reliable. The model returns structured JSON tool calls, we execute them, append results, loop.
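The resolution step itself is just a lookup against that curated list. A minimal sketch, assuming simple prefix/substring matching — the family names below are the ones mentioned above, and the real list and matching rules live in the app:

```typescript
type AgentStrategy = "native" | "hermes_xml" | "template_fix";

// Families known to support native tool calling (illustrative subset).
const NATIVE_FAMILIES = [
  "llama3.1", "llama3.2", "llama3.3",
  "qwen2.5", "mistral", "phi-4", "deepseek-v3", "gemma3",
];

function resolveAgentStrategy(modelName: string): AgentStrategy {
  const name = modelName.toLowerCase();
  // Exact family prefix: trust native tool calling.
  if (NATIVE_FAMILIES.some((f) => name.startsWith(f))) return "native";
  // A variant (e.g. an abliterated fork) of a native family: its chat
  // template may be broken, so try the template fix first.
  if (NATIVE_FAMILIES.some((f) => name.includes(f))) return "template_fix";
  // Everything else: inject tools via the Hermes XML format.
  return "hermes_xml";
}
```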
## Strategy 2: Hermes XML Fallback
For models without native support, we inject tool definitions into the system prompt using NousResearch's Hermes format:
```xml
<tools>
[{"type": "function", "function": {"name": "web_search", ...}}]
</tools>
```
The model responds with:
```xml
<tool_call>{"name": "web_search", "arguments": {"query": "..."}}</tool_call>
```
And we parse results back as:
```xml
<tool_response>{"name": "web_search", "content": "..."}</tool_response>
```
Parsing is where it gets messy. Local models don't always produce valid JSON. My parser has a three-level fallback:
1. `JSON.parse()` on the extracted content
2. Nested object extraction (find the inner `{}` that's valid)
3. Regex field extraction (pull `"name"` and `"arguments"` individually)
This handles most of the creative JSON that 7B models produce.
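A sketch of what that three-level fallback can look like — the function name and regexes here are illustrative, not the app's actual parser:

```typescript
// Parse a <tool_call> block emitted by a model that may or may not
// produce valid JSON. Returns null if nothing usable can be recovered.
function parseToolCall(raw: string): { name: string; arguments: unknown } | null {
  const inner = raw.match(/<tool_call>([\s\S]*?)<\/tool_call>/)?.[1] ?? raw;

  // Level 1: the happy path — the model emitted valid JSON.
  try {
    return JSON.parse(inner);
  } catch { /* fall through */ }

  // Level 2: grab the outermost {...} span and hope it parses.
  const braces = inner.match(/\{[\s\S]*\}/);
  if (braces) {
    try {
      return JSON.parse(braces[0]);
    } catch { /* fall through */ }
  }

  // Level 3: regex out "name" and "arguments" individually.
  const name = inner.match(/"name"\s*:\s*"([^"]+)"/)?.[1];
  const args = inner.match(/"arguments"\s*:\s*(\{[\s\S]*?\})/)?.[1];
  if (name) {
    let parsedArgs: unknown = {};
    try { parsedArgs = args ? JSON.parse(args) : {}; } catch { /* keep {} */ }
    return { name, arguments: parsedArgs };
  }
  return null;
}
```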
## Strategy 3: Template Fix (The Weird One)
Here's a scenario: you have an abliterated Llama 3.1 model. The base model supports native tool calling, but the abliteration process stripped the chat template that makes it work.
The fix: detect the model family via `ollama show`, look up the correct Go template (stored as string constants for each family), then create a new model variant via `ollama create` with the original weights + correct template. The new model gets named `<original>:agent`.
If the template fix works → upgrade to native strategy. If it fails → fall back to Hermes XML. The user never sees any of this.
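Assuming the Ollama CLI, the variant creation can be sketched roughly like this — `TEMPLATE_BY_FAMILY` is a stand-in with a placeholder body, since the real app ships a known-good Go template per family:

```typescript
import { execFileSync } from "node:child_process";
import { writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Stand-in for the per-family template constants described in the post.
const TEMPLATE_BY_FAMILY: Record<string, string> = {
  "llama3.1": `{{- /* known-good Llama 3.1 tool-calling template goes here */ -}}`,
};

// <original>:agent naming, as described above.
function agentVariantName(model: string): string {
  return `${model.split(":")[0]}:agent`;
}

// A Modelfile that reuses the original weights but swaps in a working template.
function buildModelfile(model: string, template: string): string {
  return `FROM ${model}\nTEMPLATE """${template}"""\n`;
}

function createAgentVariant(model: string, family: string): string {
  const modelfilePath = join(tmpdir(), "agent-modelfile");
  writeFileSync(modelfilePath, buildModelfile(model, TEMPLATE_BY_FAMILY[family]));
  const variant = agentVariantName(model);
  // Requires a local Ollama install; re-registers the weights under the new name.
  execFileSync("ollama", ["create", variant, "-f", modelfilePath]);
  return variant;
}
```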
## The Agent Loop
The core loop is a while-loop that keeps going until the model stops requesting tools:
```typescript
while (running && !aborted) {
  // 1. Compact context if approaching token limit
  compactMessages(messages, maxContext * 0.8)

  // 2. Call LLM with tools
  const response = await provider.chatWithTools(messages, tools)

  // 3. If no tool calls → done
  if (!response.toolCalls?.length) break

  // 4. For each tool call:
  for (const call of response.toolCalls) {
    // Check permission (auto vs requires confirmation)
    if (needsConfirmation(call)) {
      await waitForUserApproval() // Promise-based pause
    }
    const result = await executeTool(call)
    messages.push(toolResultMessage(result))
  }
}
```
The permission system is interesting: `web_search` and `web_fetch` are auto-approved (low risk), but `file_write`, `code_execute`, and `image_generate` pause for user confirmation. The approval uses a Promise that resolves when the user clicks approve/reject in the UI.
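The Promise-based pause can be sketched as a map of pending resolvers, settled later by the UI's approve/reject handler. Names here are illustrative:

```typescript
// Resolvers for tool calls that are waiting on the user, keyed by call id.
const pendingApprovals = new Map<string, (approved: boolean) => void>();

// Called by the agent loop: parks the resolver and suspends until settled.
function waitForUserApproval(callId: string): Promise<boolean> {
  return new Promise((resolve) => pendingApprovals.set(callId, resolve));
}

// Called from the UI when the user clicks approve or reject.
function settleApproval(callId: string, approved: boolean): void {
  pendingApprovals.get(callId)?.(approved);
  pendingApprovals.delete(callId);
}
```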
## The 7 Tools
- `web_search` — Rust backend with 3-tier fallback: SearXNG → DuckDuckGo HTML scrape → Wikipedia API
- `web_fetch` — Fetches URLs via Tauri IPC, then runs an HTML→text extractor using DOMParser + TreeWalker (strips nav, headers, and ads; preserves heading structure)
- `file_read` — Sandboxed to `~/agent-workspace/` with path traversal prevention
- `file_write` — Same sandbox; requires user approval, auto-creates directories
- `code_execute` — Writes Python to a temp file, spawns a subprocess with a 30s timeout via poll-based monitoring
- `image_generate` — Submits a ComfyUI workflow, polls history every 1s for up to 5 minutes
- `run_workflow` — Executes a saved Agent Workflow, with a depth counter (max 5) to prevent recursive bombs
Every tool wraps errors as string results ("Error: ...") instead of throwing — the model sees the error and can try a different approach. This is crucial for autonomous operation.
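That convention is a thin wrapper around every tool invocation — a sketch, with hypothetical names:

```typescript
// Every tool returns a string; the agent loop never has to catch anything.
type Tool = (args: Record<string, unknown>) => Promise<string>;

async function executeToolSafely(tool: Tool, args: Record<string, unknown>): Promise<string> {
  try {
    return await tool(args);
  } catch (err) {
    // The model sees this text as the tool result and can react to it —
    // e.g. retry with different arguments or pick another tool.
    return `Error: ${err instanceof Error ? err.message : String(err)}`;
  }
}
```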
## Agent Workflows: Visual Step Chains
Beyond ad-hoc tool calling, there's a visual workflow builder for reusable multi-step chains. Six step types:
- `prompt` — Send to LLM with optional tool whitelist
- `tool` — Execute specific tool with templated arguments
- `condition` — Branch based on output evaluation (contains, equals, truthy...)
- `loop` — Repeat until condition met (capped at 100 iterations)
- `user_input` — Pause for human input
- `memory_save` — Persist data to the memory store
Variables flow between steps via `{{variable}}` interpolation. The engine is sequential with branching support, and the `user_input` step uses the same Promise-based pause pattern as tool approval.
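The interpolation step is simple enough to sketch (illustrative, not the engine's exact code):

```typescript
// Substitute {{name}} placeholders with step outputs; unknown variables
// are left untouched so a later step (or the user) can spot them.
function interpolate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    key in vars ? vars[key] : match,
  );
}
```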
Three workflows ship built-in:
- "Research Topic": input → search → pick URL → fetch → summarize → save to memory
- "Summarize URL": input → fetch → summarize
- "Code Review": input → read file → review
## What I Learned
The biggest insight: tool calling with local models is 80% parsing and error recovery, 20% actual tool execution. Cloud APIs give you clean JSON. A quantized 8B model gives you... creative interpretations of JSON. Building robust fallbacks for malformed output is what makes or breaks the experience.
The code is MIT licensed: Locally Uncensored on GitHub
Locally Uncensored is a standalone desktop app for local AI — chat, image gen, video gen, and now agent mode. Single .exe/.AppImage/.dmg, no Docker.