Calvin Sturm

I Built a Local-First Agent Runtime in Rust (and Why Wrapping Existing CLIs Didn’t Work)

I’ve been trying to make local AI workflows reliable for real day-to-day use: coding tasks, browser tasks, repeatable evals, and auditable tool execution.

I first tried adding trust/approval controls around existing agent CLIs. That approach hit a hard limit quickly: when tool execution is deeply native to the host app, external wrappers can’t reliably enforce policy boundaries.

So I built my own runtime: LocalAgent.

GitHub: https://github.com/CalvinSturm/LocalAgent


Why I built this

I kept seeing the same failure pattern with local 20–30B models:

  • brittle tool behavior
  • occasional non-answers
  • inconsistent step execution
  • hard-to-debug failures without replayable state

The answer wasn’t just “pick a better model.”

The answer was to harden the runtime process:

  • explicit safety gates
  • deterministic artifacts
  • policy + approvals
  • eval + baseline comparisons
  • replay + verification

What LocalAgent is

LocalAgent is a local-first agent runtime CLI focused on control and reliability.

It supports:

  • local providers: LM Studio, Ollama, llama.cpp server
  • tool calling with hard gates
  • trust workflows (policy, approvals, audit)
  • replayable run artifacts
  • MCP stdio tool sources (including Playwright MCP)
  • deterministic eval harnesses
  • TUI chat mode

Safety defaults (important)

Defaults are intentionally restrictive:

  • trust is off
  • shell is disabled
  • write tools are not exposed
  • file write execution is disabled

You have to explicitly enable risky capabilities.
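As a config-style sketch, the defaults above amount to a struct where every risky capability starts off. The struct and field names here are hypothetical stand-ins, not LocalAgent's actual types:

```rust
// Illustrative only: field names mirror the defaults listed above,
// not LocalAgent's real configuration types.
#[derive(Debug)]
struct SafetyDefaults {
    trust_enabled: bool,       // trust is off
    shell_enabled: bool,       // shell is disabled
    expose_write_tools: bool,  // write tools are not exposed
    file_writes_enabled: bool, // file write execution is disabled
}

impl Default for SafetyDefaults {
    fn default() -> Self {
        Self {
            trust_enabled: false,
            shell_enabled: false,
            expose_write_tools: false,
            file_writes_enabled: false,
        }
    }
}

fn main() {
    let d = SafetyDefaults::default();
    // Enabling any risky capability is an explicit act, never the default.
    assert!(!d.trust_enabled && !d.shell_enabled);
    println!("{d:?}");
}
```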


Architecture (high level)

At a high level, each run does:

  1. Build runtime context (provider/model/workdir/state/settings)
  2. Prepare prompt messages (session/task memory/instructions if enabled)
  3. Apply compaction (if configured)
  4. Call model (streaming or non-streaming)
  5. If tool calls are returned:
    • run TrustGate decision first
    • execute only if allowed
    • normalize tool result envelope
    • feed tool result back to model
  6. Repeat until final output or exit condition
  7. Write artifacts/events best-effort for replay/debug

This design keeps side effects behind explicit gates and makes failures inspectable.
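Step 5 is the core of the design: the gate runs before anything touches the world. A minimal sketch of that gate-then-execute flow, using hypothetical types and names rather than LocalAgent's actual API:

```rust
// Illustrative sketch of step 5 above. All types here are stand-ins,
// not LocalAgent's real internals.

enum GateDecision {
    Allow,
    Deny(String), // carries the policy reason for the denial
}

struct ToolCall {
    name: String,
    args: String,
}

// Hypothetical trust gate: policy is evaluated before any side effect.
fn trust_gate(call: &ToolCall) -> GateDecision {
    if call.name == "shell" {
        GateDecision::Deny("shell is disabled by default".into())
    } else {
        GateDecision::Allow
    }
}

// Hypothetical executor producing a normalized result envelope.
fn execute_tool(call: &ToolCall) -> String {
    format!(
        "{{\"tool\":\"{}\",\"args\":\"{}\",\"ok\":true}}",
        call.name, call.args
    )
}

// Gate first, execute only if allowed, normalize, and queue the result
// to feed back to the model on the next call.
fn handle_tool_calls(calls: Vec<ToolCall>, transcript: &mut Vec<String>) {
    for call in calls {
        let envelope = match trust_gate(&call) {
            GateDecision::Allow => execute_tool(&call),
            GateDecision::Deny(reason) => format!(
                "{{\"tool\":\"{}\",\"ok\":false,\"denied\":\"{}\"}}",
                call.name, reason
            ),
        };
        transcript.push(envelope);
    }
}

fn main() {
    let mut transcript = Vec::new();
    handle_tool_calls(
        vec![
            ToolCall { name: "read_file".into(), args: "README.md".into() },
            ToolCall { name: "shell".into(), args: "rm -rf /tmp/x".into() },
        ],
        &mut transcript,
    );
    for line in &transcript {
        println!("{line}");
    }
}
```

Because denials are normalized into the same envelope shape as results, they land in the artifacts/events stream too, which is what makes failures inspectable on replay.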


Why this is better than wrapper-only trust

External wrappers are useful, but they’re limited when tool execution happens inside another runtime you don’t control.

With LocalAgent:

  • tool identity/args are first-class internal data
  • policy and approvals are evaluated before side effects
  • event/audit/run artifacts are generated in one execution graph
  • replay and verification use the same runtime semantics

In short: security and reliability controls are part of the execution model, not bolted on.


Quickstart

```bash
cargo install --path . --force
localagent init
localagent doctor --provider lmstudio
localagent --provider lmstudio --model <model> chat --tui
```

One-shot run:

```bash
localagent --provider ollama --model qwen3:8b --prompt "Summarize README.md" run
```


Slow hardware notes

On slower CPUs and first-token-heavy setups, automatic retries can create a bad UX: the prompt gets re-sent before the first attempt completes. During debugging, use larger timeouts and disable retries:

```bash
localagent --provider llamacpp \
  --base-url http://localhost:5001/v1 \
  --model default \
  --http-timeout-ms 300000 \
  --http-stream-idle-timeout-ms 120000 \
  --http-max-retries 0 \
  --prompt "..." run
```


What I’ve learned so far

The biggest reliability gains came from process constraints, not model hype:

  • bounded tasks
  • strict output expectations
  • pre-exec arg validation
  • deterministic evals + baselines
  • replayable artifacts for root-cause debugging
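The pre-exec arg validation point can be sketched concretely: reject a malformed tool call before it runs, so the failure is a clean, replayable event instead of a confusing side effect. The spec and validator below are hypothetical, not LocalAgent's actual validation layer:

```rust
use std::collections::HashMap;

// Illustrative only: a tool declares which args it requires,
// and calls are checked against that before execution.
struct ToolSpec {
    name: &'static str,
    required: &'static [&'static str],
}

fn validate_args(spec: &ToolSpec, args: &HashMap<String, String>) -> Result<(), String> {
    for key in spec.required {
        match args.get(*key) {
            None => return Err(format!("{}: missing required arg '{}'", spec.name, key)),
            Some(v) if v.trim().is_empty() => {
                return Err(format!("{}: arg '{}' is empty", spec.name, key))
            }
            _ => {}
        }
    }
    Ok(())
}

fn main() {
    let spec = ToolSpec { name: "read_file", required: &["path"] };

    let mut good = HashMap::new();
    good.insert("path".to_string(), "README.md".to_string());
    assert!(validate_args(&spec, &good).is_ok());

    // A model hallucinating or omitting args fails loudly here,
    // before the tool ever executes.
    let bad: HashMap<String, String> = HashMap::new();
    assert!(validate_args(&spec, &bad).is_err());
}
```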

For high-ambiguity reasoning, I still route to stronger hosted models.
For a lot of productivity helper work, local models are viable when the runtime is disciplined.

Current docs

  • README: project overview + workflows
  • CLI reference: complete command/flag map
  • provider setup guide (LM Studio/Ollama/llama.cpp)
  • templates, policy docs, and eval docs

Repo: https://github.com/CalvinSturm/LocalAgent


Feedback I’d love

  • What local model + runtime combos are most stable for tool-calling?
  • Which prompt/output constraints improved reliability most for you?
  • What would make local-first coding workflows feel “production-ready”?

If this is useful, I can write a follow-up with concrete eval/baseline workflows and model routing strategy.
