Run Multi-Agent Teams from Claude Code with Qualixar OS (25 MCP Tools)
Qualixar OS is an open-source agent orchestration runtime. You give it a task in plain English, and it designs a team of AI agents, picks a topology, runs them, and evaluates the output through an adversarial judge pipeline. It ships with 25 MCP tools, so you can drive the entire system from Claude Code without touching a browser.
This post walks through connecting Qualixar OS as an MCP server in Claude Code and using it to design, run, and evaluate a multi-agent code review team -- all from your terminal.
## Installation

```sh
npx qualixar-os
```

That starts the server and opens the dashboard at localhost:3000/dashboard/. You can also install globally:

```sh
npm install -g qualixar-os
qos serve --dashboard --port 3000
```
Qualixar OS auto-detects Ollama for local inference. No API keys required to start. Add cloud providers (Anthropic, OpenAI, Azure, etc.) later through the Settings tab if you want more power.
## MCP Server Configuration

Add this to your ~/.claude.json:

```json
{
  "mcpServers": {
    "qualixar-os": {
      "command": "npx",
      "args": ["qualixar-os", "--mcp"]
    }
  }
}
```
Restart Claude Code. You now have 25 tools available; ask Claude Code to list its tools to verify they appear.
The same config works in Cursor, Windsurf, VS Code (with MCP extension), or any MCP-compatible client.
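For Cursor, for instance, the equivalent file is typically `.cursor/mcp.json` in your project (or `~/.cursor/mcp.json` for a global setup); the server entry itself is identical:

```json
{
  "mcpServers": {
    "qualixar-os": {
      "command": "npx",
      "args": ["qualixar-os", "--mcp"]
    }
  }
}
```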
## The 25 MCP Tools

Tools are organized by domain. Here is the full catalog.
### Task Execution

| Tool | What it does |
|---|---|
| `run_task` | Submit a task. Forge AI auto-designs the agent team. Accepts optional topology, budget_usd, mode, and simulate (dry-run). |
| `get_status` | Poll task status by ID. |
| `list_tasks` | List recent tasks (most recent 50). |
| `pause_task` | Pause a running task. |
| `resume_task` | Resume a paused task. |
| `cancel_task` | Cancel a task. |
| `redirect_task` | Change a task's prompt mid-execution. Useful for steering agents without restarting. |
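As a sketch of that mid-flight steering, a redirect_task call might look like this (the taskId and parameter names are assumed for illustration; check the tool's actual schema):

```js
redirect_task({
  taskId: "task_abc123",
  prompt: "Drop the style review; focus the remaining agents on security findings."
})
```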
### Agents and Forge AI

| Tool | What it does |
|---|---|
| `list_agents` | List all registered agents and their current state. |
| `list_topologies` | List the 13 available execution topologies (sequential, debate, hierarchical, etc.). |
| `get_forge_designs` | Retrieve the team designs Forge AI generated. Shows agent roles, tool assignments, topology selection, and estimated cost. |
### Quality and Memory

| Tool | What it does |
|---|---|
| `get_judge_results` | Get structured evaluation results from the judge pipeline. Includes per-criterion scores, severity ratings, and improvement suggestions. |
| `search_memory` | Search SLM-Lite memory by query. Supports filtering by layer (episodic, semantic, procedural, behavioral) and result limits. |
| `get_rl_stats` | Get reinforcement learning stats -- which topologies perform best for which task types over time. |
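For example, a search_memory call filtered to one layer might look like this (parameter names are assumed from the description above, not confirmed against the tool schema):

```js
search_memory({
  query: "authentication review findings",
  layer: "episodic",
  limit: 5
})
```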
### Chat and Data

| Tool | What it does |
|---|---|
| `send_chat_message` | Send a message in a chat conversation (streaming via WebSocket on the dashboard side). |
| `list_connectors` | List configured data connectors. |
| `test_connector` | Test a connector's connection. |
| `list_datasets` | List available datasets. |
| `preview_dataset` | Preview rows from a dataset. |
| `search_vectors` | Search the vector store. |
### Blueprints and Prompts

| Tool | What it does |
|---|---|
| `list_blueprints` | List saved agent blueprints. |
| `deploy_blueprint` | Deploy a blueprint as a running agent team. |
| `list_prompts` | List prompt templates. |
| `create_prompt` | Create a new prompt template. |
### System

| Tool | What it does |
|---|---|
| `get_cost` | Cost breakdown -- per model, per agent, per task. |
| `get_system_config` | Current system configuration (providers, models, budget limits). |
If you are on a tight context budget, Qualixar OS also offers 7 domain-grouped tools (qos_task, qos_system, qos_agents, qos_context, qos_quality, qos_workspace, qos_workflow_create) that pack the same 25 operations into fewer tool definitions using an action discriminator. Set QOS_TIER=core to expose only 2 tools (task + system), or QOS_TIER=extended for 4. Default is full.
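MCP client configs generally support a per-server env block, so assuming Qualixar OS reads QOS_TIER from the environment (as described above), the core tier could be enabled like this in ~/.claude.json (a sketch, not verified against the project's docs):

```json
{
  "mcpServers": {
    "qualixar-os": {
      "command": "npx",
      "args": ["qualixar-os", "--mcp"],
      "env": { "QOS_TIER": "core" }
    }
  }
}
```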
## Tutorial: Code Review Team via Forge AI
Here is a concrete walkthrough. You are in Claude Code, Qualixar OS is connected as an MCP server, and you want to run a multi-agent code review on a pull request.
### Step 1: Submit the task

Call run_task with your prompt:

```js
run_task({
  prompt: "Review the authentication module in src/auth/ for security vulnerabilities, code quality issues, and test coverage gaps. Produce a structured report.",
  type: "code",
  mode: "power"
})
```
Forge AI reads the prompt, decides this is a code quality task, and auto-designs a team.
### Step 2: Inspect the Forge design

Call get_forge_designs to see what Forge created:

```js
get_forge_designs({ taskType: "code" })
```
Forge might return something like:

- 3 agents: Security Analyst, Code Quality Reviewer, Test Coverage Auditor
- Topology: Debate (two reviewers produce independent reports, a judge synthesizes)
- Tools assigned: file_read, code_search, file_write
- Estimated cost: $0.04
If you disagree with the topology, you can cancel and re-submit with an explicit override:

```js
run_task({
  prompt: "...",
  topology: "hierarchical"
})
```
### Step 3: Monitor execution

Poll status:

```js
get_status({ taskId: "task_abc123" })
```

Status transitions: pending -> forge_designing -> executing -> judging -> completed (or rejected -> retry loop, up to 5 rounds).
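That lifecycle can be modeled as a small state machine. This is an illustrative sketch of the documented transitions, not Qualixar OS internals:

```js
// Allowed task-status transitions, per the lifecycle described above
// (illustrative only -- not the actual Qualixar OS implementation).
const TRANSITIONS = {
  pending: ["forge_designing"],
  forge_designing: ["executing"],
  executing: ["judging"],
  judging: ["completed", "rejected"],
  rejected: ["forge_designing"], // a rejection sends the task back to Forge for redesign
  completed: [],
};

function canTransition(from, to) {
  return (TRANSITIONS[from] || []).includes(to);
}
```

A status that tries to skip a stage (say, pending straight to executing) is not a valid transition under this model.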
### Step 4: Check quality scores

Once execution completes, the judge pipeline runs automatically. Retrieve results:

```js
get_judge_results({ taskId: "task_abc123" })
```
The judge returns structured feedback: per-criterion scores (correctness, completeness, clarity), an overall verdict (approved/rejected), severity ratings on any issues found, and specific improvement suggestions. If rejected, Forge automatically redesigns the team using the judge's feedback and re-executes -- up to 5 rounds, with a 3x budget cap as a safeguard.
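The rejected-then-redesign loop with its 5-round limit and 3x budget cap can be sketched roughly like this (executeRound and judge are hypothetical stand-ins for illustration, not Qualixar OS APIs):

```js
// Illustrative sketch of the judge retry loop described above.
// executeRound and judge are hypothetical callbacks, not real Qualixar OS APIs.
function runWithRetries(executeRound, judge, baseBudgetUsd, maxRounds = 5) {
  const budgetCap = 3 * baseBudgetUsd; // 3x budget safeguard
  let spent = 0;
  for (let round = 1; round <= maxRounds; round++) {
    const { output, cost } = executeRound(round); // Forge redesigns on each retry
    spent += cost;
    if (judge(output).verdict === "approved") return { output, round, spent };
    if (spent >= budgetCap) break; // stop before blowing past the cap
  }
  return { output: null, spent }; // all rounds rejected or budget exhausted
}
```

The cap means a pathological task cannot burn more than three times its stated budget even if every round is rejected.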
### Step 5: View in the dashboard
Open localhost:3000/dashboard/ to see the full execution visually. The 24-tab dashboard shows real-time agent activity (Swarms tab), judge verdicts (Judges tab), cost breakdown (Cost tab), and the final output (Chat tab). Everything you did from Claude Code is reflected there.
## Advanced: Topology Selection and Cost Constraints

### Choosing a topology
Qualixar OS supports 13 execution topologies. A few worth knowing:
| Topology | When to use |
|---|---|
| `sequential` | Step-by-step pipelines where order matters |
| `parallel` | Independent analyses you want to run simultaneously |
| `debate` | When you want adversarial quality (two agents argue, judge decides) |
| `hierarchical` | Complex tasks that need decomposition into subtasks |
| `hybrid` | PII-sensitive work -- routes sensitive fields to local models, offloads the rest to cloud |
Pass topology to run_task to override Forge's automatic selection.
### Budget constraints

```js
run_task({
  prompt: "...",
  budget_usd: 0.10
})
```
Forge respects the budget when selecting models and team size. Cost tracking is available during and after execution via get_cost.
### Dry run

```js
run_task({
  prompt: "...",
  simulate: true
})
```
Returns the Forge design and cost estimate without actually running agents.
## A2A: Agent-to-Agent Protocol

Qualixar OS also implements the A2A protocol (v0.3). When the server is running, it exposes an agent card at:

```
GET http://localhost:3000/.well-known/agent-card
```
This means external A2A-compatible agents can discover and submit tasks to your Qualixar OS instance. Internal agents also communicate via A2A. Both MCP (tool calling from IDE) and A2A (agent-to-agent federation) work simultaneously on the same server.
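With the server running locally, you could fetch the card from a terminal to confirm discovery works:

```
curl http://localhost:3000/.well-known/agent-card
```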
## Links
- GitHub: github.com/qualixar/qualixar-os
- Documentation: qualixar.com
- Paper: arXiv:2604.06392 -- formal topology semantics, empirical evaluation (1 of 7 papers in the Qualixar ecosystem)
- Research Ecosystem: Judge pipeline backed by AgentAssert (arXiv:2602.22302). Memory powered by SuperLocalMemory (3 papers). Evaluation via AgentAssay. Skill testing via SkillFortify.
- License: FSL-1.1-ALv2 (converts to Apache 2.0 after two years)
- Tests: 2,936 passing
If you run into issues or have questions, open an issue on GitHub or comment below.
## The Qualixar AI Agent Reliability Platform
Seven open-source primitives. Seven peer-reviewed papers. One reliability platform.
- SuperLocalMemory — persistent memory + learning for AI agents (16K+ monthly installs)
- Qualixar OS — universal agent runtime with 13 topologies
- SLM Mesh — P2P coordination across AI sessions
- SLM MCP Hub — federate 430+ MCP tools through one gateway
- AgentAssay — token-efficient agent testing
- AgentAssert — behavioral contracts + drift detection
- SkillFortify — formal verification for agent skills
Start here → qualixar.com
