varun pratap Bhardwaj

Posted on • Originally published at qualixar.com

Run Multi-Agent Teams from Claude Code with Qualixar OS (25 MCP Tools)


Qualixar OS is an open-source agent orchestration runtime. You give it a task in plain English, and it designs a team of AI agents, picks a topology, runs them, and evaluates the output through an adversarial judge pipeline. It ships with 25 MCP tools, so you can drive the entire system from Claude Code without touching a browser.

This post walks through connecting Qualixar OS as an MCP server in Claude Code and using it to design, run, and evaluate a multi-agent code review team -- all from your terminal.


Installation

npx qualixar-os

That starts the server and opens the dashboard at localhost:3000/dashboard/. You can also install globally:

npm install -g qualixar-os
qos serve --dashboard --port 3000

Qualixar OS auto-detects Ollama for local inference. No API keys required to start. Add cloud providers (Anthropic, OpenAI, Azure, etc.) later through the Settings tab if you want more power.

MCP Server Configuration

Add this to your ~/.claude.json:

{
  "mcpServers": {
    "qualixar-os": {
      "command": "npx",
      "args": ["qualixar-os", "--mcp"]
    }
  }
}
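If you prefer to script this instead of editing the file by hand, here is a minimal sketch in Python (standard library only). The path, server name, and command come straight from the snippet above; the helper function itself is illustrative, not part of Qualixar OS.

```python
import json
from pathlib import Path

def add_qualixar_server(config_path: str) -> dict:
    """Merge the qualixar-os MCP server entry into a Claude config file."""
    path = Path(config_path).expanduser()
    # Load the existing config if present so other settings survive the merge.
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("mcpServers", {})["qualixar-os"] = {
        "command": "npx",
        "args": ["qualixar-os", "--mcp"],
    }
    path.write_text(json.dumps(config, indent=2))
    return config
```

Run it with `add_qualixar_server("~/.claude.json")`; existing keys in the file are preserved.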

Restart Claude Code. You now have 25 tools available. Run list tools in Claude Code to verify they appear.

The same config works in Cursor, Windsurf, VS Code (with MCP extension), or any MCP-compatible client.

The 25 MCP Tools

Tools are organized by domain. Here is the full catalog.

Task Execution

| Tool | What it does |
| --- | --- |
| run_task | Submit a task. Forge AI auto-designs the agent team. Accepts optional topology, budget_usd, mode, and simulate (dry-run). |
| get_status | Poll task status by ID. |
| list_tasks | List recent tasks (most recent 50). |
| pause_task | Pause a running task. |
| resume_task | Resume a paused task. |
| cancel_task | Cancel a task. |
| redirect_task | Change a task's prompt mid-execution. Useful for steering agents without restarting. |

Agents and Forge AI

| Tool | What it does |
| --- | --- |
| list_agents | List all registered agents and their current state. |
| list_topologies | List the 13 available execution topologies (sequential, debate, hierarchical, etc.). |
| get_forge_designs | Retrieve the team designs Forge AI generated. Shows agent roles, tool assignments, topology selection, and estimated cost. |

Quality and Memory

| Tool | What it does |
| --- | --- |
| get_judge_results | Get structured evaluation results from the judge pipeline. Includes per-criterion scores, severity ratings, and improvement suggestions. |
| search_memory | Search SLM-Lite memory by query. Supports filtering by layer (episodic, semantic, procedural, behavioral) and result limits. |
| get_rl_stats | Get reinforcement learning stats -- which topologies perform best for which task types over time. |

Chat and Data

| Tool | What it does |
| --- | --- |
| send_chat_message | Send a message in a chat conversation (streaming via WebSocket on the dashboard side). |
| list_connectors | List configured data connectors. |
| test_connector | Test a connector's connection. |
| list_datasets | List available datasets. |
| preview_dataset | Preview rows from a dataset. |
| search_vectors | Search the vector store. |

Blueprints and Prompts

| Tool | What it does |
| --- | --- |
| list_blueprints | List saved agent blueprints. |
| deploy_blueprint | Deploy a blueprint as a running agent team. |
| list_prompts | List prompt templates. |
| create_prompt | Create a new prompt template. |

System

| Tool | What it does |
| --- | --- |
| get_cost | Cost breakdown -- per model, per agent, per task. |
| get_system_config | Current system configuration (providers, models, budget limits). |

If you are on a tight context budget, Qualixar OS also offers 7 domain-grouped tools (qos_task, qos_system, qos_agents, qos_context, qos_quality, qos_workspace, qos_workflow_create) that pack the same 25 operations into fewer tool definitions using an action discriminator. Set QOS_TIER=core to expose only 2 tools (task + system), or QOS_TIER=extended for 4. Default is full.
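The action-discriminator pattern behind those grouped tools can be sketched as a simple dispatch table. The handler bodies and routing entries below are illustrative stand-ins, not Qualixar's actual implementation:

```python
# Sketch of an action-discriminator dispatcher, like the grouped qos_* tools:
# one tool definition, many operations, selected by an "action" field.
# Handler names and return shapes are hypothetical.
from typing import Any, Callable

HANDLERS: dict[str, dict[str, Callable[..., Any]]] = {
    "qos_task": {
        "run": lambda prompt, **kw: {"status": "pending", "prompt": prompt, **kw},
        "status": lambda task_id: {"taskId": task_id, "status": "executing"},
        "cancel": lambda task_id: {"taskId": task_id, "status": "cancelled"},
    },
}

def dispatch(tool: str, action: str, **params: Any) -> Any:
    """Route a grouped tool call to the handler named by its action field."""
    try:
        handler = HANDLERS[tool][action]
    except KeyError:
        raise ValueError(f"unknown action {action!r} for tool {tool!r}")
    return handler(**params)
```

The trade-off is fewer tool definitions in the model's context at the cost of a slightly less self-describing schema per operation.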

Tutorial: Code Review Team via Forge AI

Here is a concrete walkthrough. You are in Claude Code, Qualixar OS is connected as an MCP server, and you want to run a multi-agent code review on a pull request.

Step 1: Submit the task

Call run_task with your prompt:

run_task({
  prompt: "Review the authentication module in src/auth/ for security vulnerabilities, code quality issues, and test coverage gaps. Produce a structured report.",
  type: "code",
  mode: "power"
})

Forge AI reads the prompt, decides this is a code quality task, and auto-designs a team.

Step 2: Inspect the Forge design

Call get_forge_designs to see what Forge created:

get_forge_designs({ taskType: "code" })

Forge might return something like:

  • 3 agents: Security Analyst, Code Quality Reviewer, Test Coverage Auditor
  • Topology: Debate (two reviewers produce independent reports, a judge synthesizes)
  • Tools assigned: file_read, code_search, file_write
  • Estimated cost: $0.04

If you disagree with the topology, you can cancel and re-submit with an explicit override:

run_task({
  prompt: "...",
  topology: "hierarchical"
})

Step 3: Monitor execution

Poll status:

get_status({ taskId: "task_abc123" })

Status transitions: pending -> forge_designing -> executing -> judging -> completed (or rejected -> retry loop, up to 5 rounds).
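If you want to wait on a task programmatically rather than polling by hand, the loop looks like this. A minimal sketch: `get_status` here is any callable returning the status dict from the tool above, and the set of terminal states is an assumption based on the transitions listed (rejected feeds the retry loop, so it is not treated as terminal here):

```python
import time
from typing import Callable

# Assumed terminal states; "rejected" re-enters the retry loop, so it is excluded.
TERMINAL = {"completed", "cancelled", "failed"}

def poll_until_done(get_status: Callable[[str], dict], task_id: str,
                    interval: float = 2.0, timeout: float = 600.0) -> dict:
    """Poll get_status until the task reaches a terminal state or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(task_id)
        if status.get("status") in TERMINAL:
            return status
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```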

Step 4: Check quality scores

Once execution completes, the judge pipeline runs automatically. Retrieve results:

get_judge_results({ taskId: "task_abc123" })

The judge returns structured feedback: per-criterion scores (correctness, completeness, clarity), an overall verdict (approved/rejected), severity ratings on any issues found, and specific improvement suggestions. If rejected, Forge automatically redesigns the team using the judge's feedback and re-executes -- up to 5 rounds, with a 3x budget cap as a safeguard.
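The reject-and-redesign loop with its two safeguards (5 rounds, 3x budget cap) is easy to picture as plain control flow. A sketch under stated assumptions: `run` and `judge` are stand-ins for Forge execution and the judge pipeline, and the exact point at which Qualixar checks the cap is not documented here:

```python
from typing import Callable, Optional, Tuple

def execute_with_retries(run: Callable[[str], Tuple[str, float]],
                         judge: Callable[[str], bool],
                         prompt: str, budget_usd: float,
                         max_rounds: int = 5,
                         budget_multiplier: float = 3.0) -> Optional[str]:
    """Re-run a task on rejection, stopping after max_rounds or 3x the budget."""
    spent = 0.0
    for _ in range(max_rounds):
        output, cost = run(prompt)   # one execute + judge round
        spent += cost
        if judge(output):
            return output            # approved verdict ends the loop
        if spent >= budget_usd * budget_multiplier:
            break                    # budget safeguard tripped
    return None                      # all rounds rejected
```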

Step 5: View in the dashboard

Open localhost:3000/dashboard/ to see the full execution visually. The 24-tab dashboard shows real-time agent activity (Swarms tab), judge verdicts (Judges tab), cost breakdown (Cost tab), and the final output (Chat tab). Everything you did from Claude Code is reflected there.

Advanced: Topology Selection and Cost Constraints

Choosing a topology

Qualixar OS supports 13 execution topologies. A few worth knowing:

| Topology | When to use |
| --- | --- |
| sequential | Step-by-step pipelines where order matters |
| parallel | Independent analyses you want to run simultaneously |
| debate | When you want adversarial quality (two agents argue, judge decides) |
| hierarchical | Complex tasks that need decomposition into subtasks |
| hybrid | PII-sensitive work -- routes sensitive fields to local models, offloads the rest to cloud |

Pass topology to run_task to override Forge's automatic selection.
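As a mental model, the table above reduces to a small decision function. This is a toy heuristic for choosing your own override, not Forge's actual selection logic (which the RL stats suggest is learned per task type):

```python
def pick_topology(needs_order: bool, independent_subtasks: bool,
                  adversarial_review: bool, has_pii: bool) -> str:
    """Toy heuristic mirroring the topology table; purely illustrative."""
    if has_pii:
        return "hybrid"        # keep sensitive fields on local models
    if adversarial_review:
        return "debate"        # two agents argue, judge decides
    if needs_order:
        return "sequential"
    if independent_subtasks:
        return "parallel"
    return "hierarchical"      # default to decomposition for complex tasks
```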

Budget constraints

run_task({
  prompt: "...",
  budget_usd: 0.10
})

Forge respects the budget when selecting models and team size. Cost tracking is available during and after execution via get_cost.

Dry run

run_task({
  prompt: "...",
  simulate: true
})
Enter fullscreen mode Exit fullscreen mode

Returns the Forge design and cost estimate without actually running agents.

A2A: Agent-to-Agent Protocol

Qualixar OS also implements the A2A protocol (v0.3). When the server is running, it exposes an agent card at:

GET http://localhost:3000/.well-known/agent-card

This means external A2A-compatible agents can discover and submit tasks to your Qualixar OS instance. Internal agents also communicate via A2A. Both MCP (tool calling from IDE) and A2A (agent-to-agent federation) work simultaneously on the same server.
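Fetching the card needs nothing beyond the standard library. The URL is the one shown above; the fields inside the returned card follow the A2A spec and are not detailed in this post, so the sketch just parses whatever JSON comes back:

```python
import json
from urllib.request import urlopen

def fetch_agent_card(base_url: str = "http://localhost:3000") -> dict:
    """Fetch the A2A agent card from a running Qualixar OS instance."""
    with urlopen(f"{base_url}/.well-known/agent-card") as resp:
        return json.loads(resp.read().decode("utf-8"))
```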

Links

If you run into issues or have questions, open an issue on GitHub or comment below.


The Qualixar AI Agent Reliability Platform

Seven open-source primitives. Seven peer-reviewed papers. One reliability platform.

  • SuperLocalMemory — persistent memory + learning for AI agents (16K+ monthly installs)
  • Qualixar OS — universal agent runtime with 13 topologies
  • SLM Mesh — P2P coordination across AI sessions
  • SLM MCP Hub — federate 430+ MCP tools through one gateway
  • AgentAssay — token-efficient agent testing
  • AgentAssert — behavioral contracts + drift detection
  • SkillFortify — formal verification for agent skills

Start here → qualixar.com
