Any AI Agent Can Now "Vibe Check" LLM Outputs — No Code Required
Your AI agent just generated a customer email. It's grammatically perfect. The JSON is valid. But it accidentally threatened to cancel the customer's account instead of apologizing.
No guardrail caught it because no guardrail was checking meaning.
With Semantix v0.1.4, any MCP-capable agent — Claude Desktop, Claude Code, Cursor, or your own — can validate text against semantic intents as a tool call. Zero code changes. Zero API keys. Runs locally.
The Problem: Agents Don't Verify Their Own Output
LLM agents are getting more autonomous. They write emails, generate reports, draft code reviews, and respond to customers. But they operate on a trust-based system: generate output, ship it, hope for the best.
What if the agent could verify its own output before sending it? Not structurally — semantically. "Does this text actually do what I intended?"
That's what the Semantix MCP server enables.
What's New in v0.1.4: The Universal Standard Release
MCP Server: verify_text_intent
Semantix now ships a built-in MCP server that exposes a single, powerful tool: verify_text_intent.
Any MCP-capable agent can call it:
```json
{
  "text": "We sincerely apologize for the inconvenience and have credited your account.",
  "intent_description": "The text must be a sincere customer apology that offers a concrete resolution.",
  "threshold": 0.5
}
```
Response:
```json
{
  "score": 0.91,
  "passed": true,
  "reason": null
}
```
If it fails, the agent gets a structured correction suggestion — enabling cross-agent self-healing:
````json
{
  "score": 0.18,
  "passed": false,
  "reason": null,
  "correction_suggestion": "## Semantix Verification Failed\n\n### What went wrong\n- **Score:** 0.1800 (threshold 0.5 not met)\n\n### What is required\nThe text must be a sincere customer apology...\n\n### Rejected output\n```\nYour account has been flagged for termination.\n```\n\nPlease generate a new response that satisfies the requirement above."
}
````
The agent reads the correction, regenerates, and tries again. Self-healing across any agent framework — no SDK integration needed.
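That retry loop can be sketched in a few lines. This is a minimal, self-contained simulation, not the Semantix implementation: `generate` stands in for the agent's LLM call, and `verify_text_intent` is a stub that mimics the JSON shapes shown above rather than making a real MCP tool call.

```python
import json

# Canned drafts standing in for LLM output: a bad first attempt, then a fix.
RESPONSES = [
    "Your account has been flagged for termination.",
    "We sincerely apologize and have credited your account.",
]

def generate(prompt):
    """Stand-in for the agent's LLM call."""
    return RESPONSES.pop(0)

def verify_text_intent(text, intent_description, threshold=0.5):
    """Stub mimicking the MCP tool's JSON response shape."""
    passed = "apologize" in text
    result = {"score": 0.91 if passed else 0.18, "passed": passed}
    if not passed:
        result["correction_suggestion"] = (
            "Rewrite as a sincere apology that offers a concrete resolution."
        )
    return json.dumps(result)

def respond(intent, max_retries=3):
    prompt = intent
    for _ in range(max_retries):
        text = generate(prompt)
        result = json.loads(verify_text_intent(text, intent))
        if result["passed"]:
            return text
        # Feed the structured correction back into the next attempt.
        prompt = result["correction_suggestion"]
    return text  # best effort after exhausting retries

print(respond("The text must be a sincere customer apology."))
```

The key design point is that the correction is machine-readable: the agent never needs Semantix-specific logic, it just appends the suggestion to its next prompt.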
Setup: 3 Lines
```shell
pip install "semantix-ai[mcp,nli]"
```
Add to your claude_desktop_config.json:
```json
{
  "mcpServers": {
    "semantix-verify": {
      "command": "mcp",
      "args": ["run", "semantix/mcp/server.py"],
      "cwd": "/path/to/your/semantix-ai"
    }
  }
}
```
That's it. Claude Desktop (or any MCP client) can now call verify_text_intent before responding.
NLI Accuracy Fixes
v0.1.4 also ships critical fixes to the NLI judge that dramatically improve scoring accuracy:
- Entailment index fix — The model's label order is `{0: contradiction, 1: entailment, 2: neutral}`. We were accidentally reading the neutral logit instead of entailment. Fixed.
- Softmax calibration — Raw logits are now converted to true 0-1 probability scores via `apply_softmax=True`. Before this, scores were unbounded and hard to threshold meaningfully.
- Progressive tense hypothesis — NLI cross-encoders score dramatically better when the hypothesis is framed as ongoing action. "The text must politely decline an invitation" becomes "Someone is politely declining an invitation." This single change pushed scores from ~0.3 to 0.88+ for well-written declines.
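The first two fixes can be illustrated together. Given raw logits in the model's label order `{0: contradiction, 1: entailment, 2: neutral}`, the calibrated score is the softmax probability at index 1, not index 2. The logit values below are made up for illustration; only the label order comes from the release notes.

```python
import math

# Label order for the NLI cross-encoder: 0=contradiction, 1=entailment, 2=neutral.
ENTAILMENT_INDEX = 1

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw logits for a well-written polite decline.
logits = [-2.1, 3.4, 0.2]  # [contradiction, entailment, neutral]

probs = softmax(logits)
score = probs[ENTAILMENT_INDEX]  # calibrated 0-1 entailment probability
wrong = probs[2]                 # the old bug read the neutral slot instead

print(round(score, 2), round(wrong, 2))
```

With these logits the entailment probability is above 0.9 while the neutral probability is near zero, which shows why reading the wrong index made good outputs look like failures.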
Why MCP?
MCP (Model Context Protocol) is becoming the universal standard for agent-tool communication. By shipping Semantix as an MCP tool rather than a library-only solution, we get:
- Universal compatibility — Works with Claude Desktop, Claude Code, Cursor, and any future MCP client
- Zero integration code — Agents call it as a tool, not as a library import
- Language agnostic — Your agent doesn't need to be written in Python
- Self-healing bridge — The `correction_suggestion` field gives any agent enough context to retry intelligently
This is what "validate meaning, not shape" looks like at the agent layer.
The Architecture
```
Your Agent (any MCP client)
        |
        v
MCP tool call: verify_text_intent
        |
        v
Semantix MCP Server (FastMCP)
        |
        v
NLIJudge (lazy-loaded singleton)
        |
        v
Cross-encoder: "Does this text entail the intent?"
        |
        +-- score >= threshold --> {"passed": true, "score": 0.91}
        |
        +-- score < threshold --> {"passed": false, "correction_suggestion": "..."}
```
The NLI model loads lazily on the first tool call — server startup is instant. The judge runs locally on CPU with no API keys.
20 Automated Tests, Zero Model Loading
The MCP test suite covers tool registration, response schema, correction suggestions, and dependency error handling — all without loading the actual NLI model. We mock the judge so tests run in milliseconds:
```python
import json
from unittest.mock import patch

from semantix.mcp.server import verify_text_intent

# Excerpt from the test class; _mock_judge is a suite helper that builds
# a stubbed judge with a fixed score.
@patch("semantix.mcp.server._get_judge")
def test_failing_response_includes_correction(self, mock_get):
    mock_get.return_value = _mock_judge(passed=False, score=0.15)
    result = json.loads(verify_text_intent("bad text", "some intent"))
    assert result["passed"] is False
    assert "correction_suggestion" in result
```
The server also handles missing dependencies gracefully — if sentence-transformers isn't installed, it returns an error JSON instead of crashing.
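That graceful-degradation pattern is roughly the following (the error wording and `HAVE_NLI` flag are illustrative; here the missing dependency is simulated so the sketch runs anywhere):

```python
import json

HAVE_NLI = False  # simulate sentence-transformers being absent

def verify_text_intent(text, intent_description, threshold=0.5):
    if not HAVE_NLI:
        # Return structured error JSON instead of raising, so the MCP
        # client still receives a well-formed tool response.
        return json.dumps({
            "passed": False,
            "error": "NLI dependencies missing: "
                     "pip install 'semantix-ai[mcp,nli]'",
        })
    # ... normal scoring path would run here ...
    return json.dumps({"passed": True, "score": 0.91})

result = json.loads(verify_text_intent("hi", "some intent"))
print(result["error"])
```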
Get Started
```shell
pip install "semantix-ai[mcp,nli]"

# Test it locally
python -c "
from semantix.mcp.server import verify_text_intent
print(verify_text_intent(
    'I appreciate the invitation but unfortunately I will not be able to attend.',
    'The text must politely decline an invitation'
))
"

# Run as MCP server
mcp run semantix/mcp/server.py
```
What's Next
Semantix is a semantic type system for AI outputs. v0.1.3 added self-healing retries. v0.1.4 makes it universal via MCP. The roadmap includes:
- More judge backends — Anthropic, Cohere, local LLMs via Ollama
- Pydantic integration — Semantic fields inside Pydantic models
- Streaming validation — Real-time intent checking during generation
Links
- GitHub: github.com/labrat-akhona/semantix-ai
- PyPI: pypi.org/project/semantix-ai
- Install: `pip install "semantix-ai[mcp,nli]"`
Star the repo if this is useful. Open an issue if it isn't.
Built by Akhona Eland in South Africa.
Top comments (2)
One surprising insight is that while many focus on building AI agents, the real challenge is integrating them into existing workflows. We've seen that the tech isn't the bottleneck - it's aligning outputs with business processes. In my experience with enterprise teams, starting with a clear mapping of decision points where AI adds value can prevent the dreaded "pilot purgatory." Think practically about where an AI agent can replace or enhance a human decision-maker, rather than just adding a cool feature. - Ali Muwwakkil (ali-muwwakkil on LinkedIn)
Ali, this is a profound insight. "Pilot Purgatory" is exactly what happens when the "Semantic Gap" isn't addressed.
We've noticed that enterprise teams are hesitant to move past the pilot phase because they lack a Deterministic Trust Layer. They can't audit "vibes," and they can't rely on "hope-driven prompts."
That's why I built Semantix — to turn those "decision points" you mentioned into Semantic Contracts. If the AI deviates from the business process, the system doesn't just fail; it catches the error and self-corrects before it ever touches the workflow.
Currently working on an Immutable Audit Trail for v0.1.5 to provide the exact "mapping" you mentioned for compliance. Would love to hear your thoughts on how important "Auditability" is for the enterprise teams you work with!