varun pratap Bhardwaj

Posted on • Originally published at qualixar.com

Run Multi-Agent Teams from Claude Code with Qualixar OS (25 MCP Tools)


Qualixar OS is an open-source agent orchestration runtime. You give it a task in plain English, and it designs a team of AI agents, picks a topology, runs them, and evaluates the output through an adversarial judge pipeline. It ships with 25 MCP tools, so you can drive the entire system from Claude Code without touching a browser.

This post walks through connecting Qualixar OS as an MCP server in Claude Code and using it to design, run, and evaluate a multi-agent code review team -- all from your terminal.


Installation

npx qualixar-os

That starts the server and opens the dashboard at localhost:3000/dashboard/. You can also install globally:

npm install -g qualixar-os
qos serve --dashboard --port 3000

Qualixar OS auto-detects Ollama for local inference. No API keys required to start. Add cloud providers (Anthropic, OpenAI, Azure, etc.) later through the Settings tab if you want more power.

MCP Server Configuration

Add this to your ~/.claude.json:

{
  "mcpServers": {
    "qualixar-os": {
      "command": "npx",
      "args": ["qualixar-os", "--mcp"]
    }
  }
}
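If you prefer to script this instead of editing the file by hand, here is a minimal sketch in Python (standard library only). The path, server name, and command come straight from the snippet above; the helper function itself is illustrative, not part of Qualixar OS.

```python
import json
from pathlib import Path

def add_qualixar_server(config_path: str) -> dict:
    """Merge the qualixar-os MCP server entry into a Claude config file."""
    path = Path(config_path).expanduser()
    # Load the existing config if present so other settings survive the merge.
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("mcpServers", {})["qualixar-os"] = {
        "command": "npx",
        "args": ["qualixar-os", "--mcp"],
    }
    path.write_text(json.dumps(config, indent=2))
    return config
```

Run it with `add_qualixar_server("~/.claude.json")`; existing keys in the file are preserved.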

Restart Claude Code. You now have 25 tools available. Run list tools in Claude Code to verify they appear.

The same config works in Cursor, Windsurf, VS Code (with MCP extension), or any MCP-compatible client.

The 25 MCP Tools

Tools are organized by domain. Here is the full catalog.

Task Execution

| Tool | What it does |
| --- | --- |
| run_task | Submit a task. Forge AI auto-designs the agent team. Accepts optional topology, budget_usd, mode, and simulate (dry-run). |
| get_status | Poll task status by ID. |
| list_tasks | List recent tasks (most recent 50). |
| pause_task | Pause a running task. |
| resume_task | Resume a paused task. |
| cancel_task | Cancel a task. |
| redirect_task | Change a task's prompt mid-execution. Useful for steering agents without restarting. |

Agents and Forge AI

| Tool | What it does |
| --- | --- |
| list_agents | List all registered agents and their current state. |
| list_topologies | List the 13 available execution topologies (sequential, debate, hierarchical, etc.). |
| get_forge_designs | Retrieve the team designs Forge AI generated. Shows agent roles, tool assignments, topology selection, and estimated cost. |

Quality and Memory

| Tool | What it does |
| --- | --- |
| get_judge_results | Get structured evaluation results from the judge pipeline. Includes per-criterion scores, severity ratings, and improvement suggestions. |
| search_memory | Search SLM-Lite memory by query. Supports filtering by layer (episodic, semantic, procedural, behavioral) and result limits. |
| get_rl_stats | Get reinforcement learning stats -- which topologies perform best for which task types over time. |

Chat and Data

| Tool | What it does |
| --- | --- |
| send_chat_message | Send a message in a chat conversation (streaming via WebSocket on the dashboard side). |
| list_connectors | List configured data connectors. |
| test_connector | Test a connector's connection. |
| list_datasets | List available datasets. |
| preview_dataset | Preview rows from a dataset. |
| search_vectors | Search the vector store. |

Blueprints and Prompts

| Tool | What it does |
| --- | --- |
| list_blueprints | List saved agent blueprints. |
| deploy_blueprint | Deploy a blueprint as a running agent team. |
| list_prompts | List prompt templates. |
| create_prompt | Create a new prompt template. |

System

| Tool | What it does |
| --- | --- |
| get_cost | Cost breakdown -- per model, per agent, per task. |
| get_system_config | Current system configuration (providers, models, budget limits). |

If you are on a tight context budget, Qualixar OS also offers 7 domain-grouped tools (qos_task, qos_system, qos_agents, qos_context, qos_quality, qos_workspace, qos_workflow_create) that pack the same 25 operations into fewer tool definitions using an action discriminator. Set QOS_TIER=core to expose only 2 tools (task + system), or QOS_TIER=extended for 4. Default is full.
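The action-discriminator pattern behind those grouped tools can be sketched as a simple dispatch table. The handler bodies and routing entries below are illustrative stand-ins, not Qualixar's actual implementation:

```python
# Sketch of an action-discriminator dispatcher, like the grouped qos_* tools:
# one tool definition, many operations, selected by an "action" field.
# Handler names and return shapes are hypothetical.
from typing import Any, Callable

HANDLERS: dict[str, dict[str, Callable[..., Any]]] = {
    "qos_task": {
        "run": lambda prompt, **kw: {"status": "pending", "prompt": prompt, **kw},
        "status": lambda task_id: {"taskId": task_id, "status": "executing"},
        "cancel": lambda task_id: {"taskId": task_id, "status": "cancelled"},
    },
}

def dispatch(tool: str, action: str, **params: Any) -> Any:
    """Route a grouped tool call to the handler named by its action field."""
    try:
        handler = HANDLERS[tool][action]
    except KeyError:
        raise ValueError(f"unknown action {action!r} for tool {tool!r}")
    return handler(**params)
```

The trade-off is fewer tool definitions in the model's context at the cost of a slightly less self-describing schema per operation.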

Tutorial: Code Review Team via Forge AI

Here is a concrete walkthrough. You are in Claude Code, Qualixar OS is connected as an MCP server, and you want to run a multi-agent code review on a pull request.

Step 1: Submit the task

Call run_task with your prompt:

run_task({
  prompt: "Review the authentication module in src/auth/ for security vulnerabilities, code quality issues, and test coverage gaps. Produce a structured report.",
  type: "code",
  mode: "power"
})

Forge AI reads the prompt, decides this is a code quality task, and auto-designs a team.

Step 2: Inspect the Forge design

Call get_forge_designs to see what Forge created:

get_forge_designs({ taskType: "code" })

Forge might return something like:

  • 3 agents: Security Analyst, Code Quality Reviewer, Test Coverage Auditor
  • Topology: Debate (two reviewers produce independent reports, a judge synthesizes)
  • Tools assigned: file_read, code_search, file_write
  • Estimated cost: $0.04

If you disagree with the topology, you can cancel and re-submit with an explicit override:

run_task({
  prompt: "...",
  topology: "hierarchical"
})

Step 3: Monitor execution

Poll status:

get_status({ taskId: "task_abc123" })

Status transitions: pending -> forge_designing -> executing -> judging -> completed (or rejected -> retry loop, up to 5 rounds).
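If you want to wait on a task programmatically rather than polling by hand, the loop looks like this. A minimal sketch: `get_status` here is any callable returning the status dict from the tool above, and the set of terminal states is an assumption based on the transitions listed (rejected feeds the retry loop, so it is not treated as terminal here):

```python
import time
from typing import Callable

# Assumed terminal states; "rejected" re-enters the retry loop, so it is excluded.
TERMINAL = {"completed", "cancelled", "failed"}

def poll_until_done(get_status: Callable[[str], dict], task_id: str,
                    interval: float = 2.0, timeout: float = 600.0) -> dict:
    """Poll get_status until the task reaches a terminal state or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(task_id)
        if status.get("status") in TERMINAL:
            return status
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```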

Step 4: Check quality scores

Once execution completes, the judge pipeline runs automatically. Retrieve results:

get_judge_results({ taskId: "task_abc123" })

The judge returns structured feedback: per-criterion scores (correctness, completeness, clarity), an overall verdict (approved/rejected), severity ratings on any issues found, and specific improvement suggestions. If rejected, Forge automatically redesigns the team using the judge's feedback and re-executes -- up to 5 rounds, with a 3x budget cap as a safeguard.
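The reject-and-redesign loop with its two safeguards (5 rounds, 3x budget cap) is easy to picture as plain control flow. A sketch under stated assumptions: `run` and `judge` are stand-ins for Forge execution and the judge pipeline, and the exact point at which Qualixar checks the cap is not documented here:

```python
from typing import Callable, Optional, Tuple

def execute_with_retries(run: Callable[[str], Tuple[str, float]],
                         judge: Callable[[str], bool],
                         prompt: str, budget_usd: float,
                         max_rounds: int = 5,
                         budget_multiplier: float = 3.0) -> Optional[str]:
    """Re-run a task on rejection, stopping after max_rounds or 3x the budget."""
    spent = 0.0
    for _ in range(max_rounds):
        output, cost = run(prompt)   # one execute + judge round
        spent += cost
        if judge(output):
            return output            # approved verdict ends the loop
        if spent >= budget_usd * budget_multiplier:
            break                    # budget safeguard tripped
    return None                      # all rounds rejected
```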

Step 5: View in the dashboard

Open localhost:3000/dashboard/ to see the full execution visually. The 24-tab dashboard shows real-time agent activity (Swarms tab), judge verdicts (Judges tab), cost breakdown (Cost tab), and the final output (Chat tab). Everything you did from Claude Code is reflected there.

Advanced: Topology Selection and Cost Constraints

Choosing a topology

Qualixar OS supports 13 execution topologies. A few worth knowing:

| Topology | When to use |
| --- | --- |
| sequential | Step-by-step pipelines where order matters |
| parallel | Independent analyses you want to run simultaneously |
| debate | When you want adversarial quality (two agents argue, judge decides) |
| hierarchical | Complex tasks that need decomposition into subtasks |
| hybrid | PII-sensitive work -- routes sensitive fields to local models, offloads the rest to cloud |

Pass topology to run_task to override Forge's automatic selection.
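As a mental model, the table above reduces to a small decision function. This is a toy heuristic for choosing your own override, not Forge's actual selection logic (which the RL stats suggest is learned per task type):

```python
def pick_topology(needs_order: bool, independent_subtasks: bool,
                  adversarial_review: bool, has_pii: bool) -> str:
    """Toy heuristic mirroring the topology table; purely illustrative."""
    if has_pii:
        return "hybrid"        # keep sensitive fields on local models
    if adversarial_review:
        return "debate"        # two agents argue, judge decides
    if needs_order:
        return "sequential"
    if independent_subtasks:
        return "parallel"
    return "hierarchical"      # default to decomposition for complex tasks
```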

Budget constraints

run_task({
  prompt: "...",
  budget_usd: 0.10
})

Forge respects the budget when selecting models and team size. Cost tracking is available during and after execution via get_cost.

Dry run

run_task({
  prompt: "...",
  simulate: true
})
Enter fullscreen mode Exit fullscreen mode

Returns the Forge design and cost estimate without actually running agents.

A2A: Agent-to-Agent Protocol

Qualixar OS also implements the A2A protocol (v0.3). When the server is running, it exposes an agent card at:

GET http://localhost:3000/.well-known/agent-card

This means external A2A-compatible agents can discover and submit tasks to your Qualixar OS instance. Internal agents also communicate via A2A. Both MCP (tool calling from IDE) and A2A (agent-to-agent federation) work simultaneously on the same server.
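Fetching the card needs nothing beyond the standard library. The URL is the one shown above; the fields inside the returned card follow the A2A spec and are not detailed in this post, so the sketch just parses whatever JSON comes back:

```python
import json
from urllib.request import urlopen

def fetch_agent_card(base_url: str = "http://localhost:3000") -> dict:
    """Fetch the A2A agent card from a running Qualixar OS instance."""
    with urlopen(f"{base_url}/.well-known/agent-card") as resp:
        return json.loads(resp.read().decode("utf-8"))
```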

Links

If you run into issues or have questions, open an issue on GitHub or comment below.


The Qualixar AI Agent Reliability Platform

Seven open-source primitives. Seven peer-reviewed papers. One reliability platform.

  • SuperLocalMemory — persistent memory + learning for AI agents (16K+ monthly installs)
  • Qualixar OS — universal agent runtime with 13 topologies
  • SLM Mesh — P2P coordination across AI sessions
  • SLM MCP Hub — federate 430+ MCP tools through one gateway
  • AgentAssay — token-efficient agent testing
  • AgentAssert — behavioral contracts + drift detection
  • SkillFortify — formal verification for agent skills

Start here → qualixar.com
