DEV Community

Ilbets

Supervise a multi-agent setup with Local LLMs

There’s a popular misconception that local LLMs are not useful for anything beyond passing “trust me, bro” benchmarks. In reality, they can be surprisingly effective when used for the right tasks with the right setup. I’ve been using them for a while to supervise my agents in TaskSquad, and they’ve proven to be genuinely useful.

Each TaskSquad daemon has a “Supervisor,” powered by a local LLM (e.g., Qwen3.5 or Gemma 4). It follows a very simple instruction: regularly attach to the running TSQ harness, check whether everything is working properly, and if something requires intervention, act on the user’s behalf (granting permissions or answering questions). If it gets stuck, it sends a message via TSQ to request manual intervention.
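The loop described above can be sketched roughly as follows. This is a minimal illustration, not TaskSquad's actual implementation: the `tsq` CLI, its subcommands, and the status keywords are all assumptions standing in for whatever the real harness exposes.

```python
import subprocess
import time

def decide(status: str) -> str:
    """Map a harness status report to a supervisor action.

    The keywords below are hypothetical; a real supervisor would
    parse whatever the harness actually emits (or ask the local
    LLM to classify the status text).
    """
    if "permission" in status:
        return "approve"
    if "question" in status:
        return "answer"
    if "error" in status:
        return "escalate"
    return "noop"

def supervise(poll_seconds: int = 60) -> None:
    while True:
        # Attach to the running harness and read its status
        # (hypothetical CLI invocation).
        status = subprocess.run(
            ["tsq", "status"], capture_output=True, text=True
        ).stdout.lower()
        action = decide(status)
        if action == "escalate":
            # Stuck: request manual intervention via TSQ.
            subprocess.run(["tsq", "send", "Manual intervention required"])
        elif action != "noop":
            # Act on the user's behalf.
            subprocess.run(["tsq", "act", action])
        time.sleep(poll_seconds)
```

The important design point is that the local LLM only has to make small, local decisions (approve / answer / escalate), which is exactly the kind of task small quantized models handle well.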

In practice, it mostly deals with “out of tokens” situations: it schedules a bash command that sends a “Resume work” message once the timeout period has passed.
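That resume-after-timeout behavior amounts to a one-shot timer. A minimal sketch, again assuming a hypothetical `tsq send` command in place of the real TaskSquad CLI:

```python
import subprocess
import threading

def schedule_resume(delay_seconds: float, send=None) -> threading.Timer:
    """Fire a 'Resume work' message once the token quota should have reset.

    `send` is injectable for testing; by default it shells out to a
    hypothetical `tsq send` command.
    """
    send = send or (lambda: subprocess.run(["tsq", "send", "Resume work"]))
    timer = threading.Timer(delay_seconds, send)
    timer.daemon = True  # don't block supervisor shutdown
    timer.start()
    return timer
```

For example, `schedule_resume(30 * 60)` would nudge the agent back to work after half an hour, without the supervisor having to busy-wait.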

For this to work well, the key is to manage the context effectively and pick the right model, harness, and backend. I use:

  • the claw-code harness
  • omlx to enable hot/cold cache
  • MLX-optimized, instruction-tuned, quantized models

This setup reaches ~40 tokens/s generation with a 32k context window and solid accuracy.
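Wiring the supervisor to the local model is straightforward if the backend serves an OpenAI-compatible chat endpoint, which most local servers do. The sketch below assumes that; the port, model name, and endpoint path are placeholders, not omlx's actual interface.

```python
import json
import urllib.request

def build_chat_request(prompt: str, model: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask_local(prompt: str, base_url: str = "http://localhost:8080/v1") -> str:
    """Query a local OpenAI-compatible server (URL and model are placeholders)."""
    body = json.dumps(build_chat_request(prompt, "local-quantized-model")).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Keeping prompts short and reusing a stable system prefix is what lets the hot/cold cache pay off: the bulk of the 32k context is cached across supervisor polls, so only the fresh status text needs to be processed each time.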
