## TL;DR
I built Task Orchestrator, an open-source MCP server that adds production safety to Claude Code agents. It catches semantic failures (hallucinations, wrong answers), not just crashes, learns from those mistakes, and works to prevent them from recurring.
GitHub: github.com/TC407-api/task-orchestrator
- MIT licensed
- 680+ tests
- Provider-agnostic (works with any LLM)
## The Problem
Here's a stat that should terrify anyone deploying AI agents:
"Less than 1 in 3 teams are satisfied with their AI agent guardrails and observability" - Cleanlab AI Agents Report 2025
I've been building with Claude Code for months. It's incredible for development velocity. But here's what I noticed:
- Agents hallucinate file paths that don't exist
- They suggest fixes that introduce new bugs
- They claim "tests pass" without running them
- They repeat the same errors again and again
The tools exist to catch crashes. Nothing exists to catch semantic failures.
## The Math Problem
At 95% reliability per step, a 20-step agent workflow has only a 36% success rate overall.
0.95^20 ≈ 0.358, or about 35.8%
That's not a bug; it's compound probability. Run a workflow often enough and every step that can fail eventually will.
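To check the arithmetic yourself, here is a minimal sketch. It assumes every step succeeds or fails independently with the same probability, which real workflows only approximate, and the function name `workflow_success_rate` is my own, not part of Task Orchestrator.

```python
def workflow_success_rate(per_step_reliability: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    if not 0.0 <= per_step_reliability <= 1.0:
        raise ValueError("per_step_reliability must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_reliability ** steps


# Compare a few per-step reliabilities over the same 20-step workflow.
for reliability in (0.95, 0.99, 0.999):
    rate = workflow_success_rate(reliability, 20)
    print(f"{reliability:.1%} per step over 20 steps -> {rate:.1%} overall")
```

Under the same assumption, raising per-step reliability to 99% lifts the 20-step workflow to roughly 82%, which is why catching failures at each step matters so much.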
## What I Built
Task Orchestrator is an MCP server that adds an immune system to Claude Code:
### 1. Semantic Failure Detection
Not "did it crash?" but "did it actually do the right thing?"
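As an illustration of the idea (not Task Orchestrator's actual implementation), a semantic check might compare what an agent claims against what can be verified. The `AgentReport` structure and `find_semantic_failures` function below are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class AgentReport:
    """Hypothetical structure for what an agent claims it did."""
    referenced_files: list[str]
    claims_tests_pass: bool
    test_exit_code: Optional[int] = None  # None means no test run was recorded


def find_semantic_failures(report: AgentReport, repo_root: Path) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    # Hallucinated paths: the agent referenced files that are not on disk.
    for name in report.referenced_files:
        if not (repo_root / name).exists():
            problems.append(f"referenced file does not exist: {name}")
    # Unbacked claims: "tests pass" needs a recorded, successful test run.
    if report.claims_tests_pass:
        if report.test_exit_code is None:
            problems.append("claimed tests pass, but no test run was recorded")
        elif report.test_exit_code != 0:
            problems.append(
                f"claimed tests pass, but the run exited with code {report.test_exit_code}"
            )
    return problems


report = AgentReport(referenced_files=["no/such/dir/missing_file.py"], claims_tests_pass=True)
for problem in find_semantic_failures(report, Path(".")):
    print(problem)
```

Nothing here crashed, yet both claims are flagged, which is the difference between crash detection and semantic detection.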
### 2. ML-Powered Learning
The system learns from failures. When something goes wrong, the failure pattern is stored, and you get a warning before a similar prompt runs.
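To make the store-then-warn loop concrete, here is a deliberately simple sketch. It swaps in plain token overlap (Jaccard similarity) rather than a trained model, so treat it as a stand-in for the idea and not as the project's ML pipeline. `FailureMemory`, its methods, and the 0.5 threshold are all hypothetical.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase word tokens used for a crude similarity measure."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


class FailureMemory:
    """Hypothetical in-memory store of prompts that previously led to failures."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold  # assumed cutoff; tune for your own prompts
        self._patterns: list[tuple[set[str], str]] = []  # (prompt tokens, failure note)

    def record_failure(self, prompt: str, note: str) -> None:
        self._patterns.append((_tokens(prompt), note))

    def warnings_for(self, prompt: str) -> list[str]:
        """Return notes from stored failures whose prompts overlap enough with this one."""
        current = _tokens(prompt)
        matches = []
        for stored, note in self._patterns:
            union = current | stored
            # Jaccard similarity: shared tokens divided by all distinct tokens.
            if union and len(current & stored) / len(union) >= self.threshold:
                matches.append(note)
        return matches


memory = FailureMemory()
memory.record_failure(
    "update the config loader in src/config.py",
    "agent edited src/config.py, which does not exist",
)
print(memory.warnings_for("update the config loader in src/settings.py"))
```

A real system would likely use embeddings or a classifier instead of token overlap, but the flow is the same: record the failure, then check new prompts against the record before running them.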
### 3. Human-in-the-Loop
High-risk operations queue for human approval.
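One way to picture the approval queue is a gate that runs low-risk operations immediately and parks high-risk ones until a person signs off. This is a sketch under my own assumptions: `ApprovalGate` and the `HIGH_RISK_MARKERS` list are hypothetical, and a real risk policy would be far more careful than substring matching.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical markers of risky operations; a real policy would be project-specific.
HIGH_RISK_MARKERS = ("rm -rf", "git push --force", "drop table")


@dataclass
class ApprovalGate:
    """Runs low-risk operations immediately and holds high-risk ones for review."""
    pending: list[tuple[str, Callable[[], object]]] = field(default_factory=list)

    def submit(self, description: str, action: Callable[[], object]):
        if any(marker in description.lower() for marker in HIGH_RISK_MARKERS):
            self.pending.append((description, action))
            return None  # nothing runs until a human approves
        return action()

    def approve(self, index: int = 0):
        """Called by a human reviewer: run and remove one pending operation."""
        _, action = self.pending.pop(index)
        return action()

    def reject(self, index: int = 0) -> str:
        """Called by a human reviewer: discard one pending operation without running it."""
        description, _ = self.pending.pop(index)
        return description


gate = ApprovalGate()
gate.submit("list files in build/", lambda: print("listing files"))
gate.submit("rm -rf build/", lambda: print("deleting build/"))
print("waiting for approval:", [description for description, _ in gate.pending])
```

The key property is that the risky action is captured as a callable and never executed until `approve` is called.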
### 4. Cost Tracking
Know what you're spending across providers.
### 5. Self-Healing
Circuit breakers that back off automatically.
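For readers new to the pattern, here is a minimal circuit breaker with exponential backoff. It is a generic sketch, not the project's code: the class name, the default threshold of 3 failures, and the delay values are assumptions chosen for illustration.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker: opens after repeated failures, then backs off exponentially."""

    def __init__(self, failure_threshold: int = 3, base_delay: float = 1.0,
                 max_delay: float = 60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.clock = clock  # injectable so the breaker can be tested without sleeping
        self.failures = 0
        self.open_until = 0.0

    def call(self, func, *args, **kwargs):
        if self.clock() < self.open_until:
            raise RuntimeError("circuit open: backing off before the next attempt")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                # Double the wait for each failure past the threshold, up to max_delay.
                exponent = self.failures - self.failure_threshold
                delay = min(self.base_delay * 2 ** exponent, self.max_delay)
                self.open_until = self.clock() + delay
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

Wrapping each provider call in `breaker.call(...)` means a failing provider stops being hammered: once the threshold is hit, calls fail fast until the backoff window passes, and a single success resets the count.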
## Getting Started
    git clone https://github.com/TC407-api/task-orchestrator.git
    cd task-orchestrator && pip install -r requirements.txt
    cp .env.example .env.local
    claude mcp add task-orchestrator python mcp_server.py
Restart Claude Code. Done.
## What's Next
Core is free forever. For teams that need more, enterprise features are in development - see the roadmap for details.
I'm committed to maintaining and improving this project as long as there's interest. This isn't abandonware.
I want your input:
- What features would improve your AI agent workflows?
- What problems are you running into that this could solve?
GitHub: github.com/TC407-api/task-orchestrator
Star if you think AI agents need better safety.
Built by someone tired of AI agents failing silently.
This post was written with Claude Code, but all thoughts, ideas, and architecture decisions are my own - the result of countless hours of research, experimentation, and real-world frustration.