You ask your LLM to write a polite decline to a meeting invite. It returns:
"I appreciate the invitation, but I would rather set myself on fire than attend your team-building retreat."
You run it through your Pydantic model. It passes. It's a string. The right length. Valid UTF-8. Technically a "response."
But it's not a polite decline. It's a career-ending email.
This is the gap nobody's filling. We have type systems for data structures — int, str, Pydantic models. We validate shape obsessively. But we have nothing for meaning.
Until now.
Introducing Semantix
Semantix is a semantic type system for LLM outputs. Instead of checking "is this a string?", it checks "does this string actually say what it's supposed to say?"
```python
from semantix import Intent, validate_intent

class ProfessionalDecline(Intent):
    """The text must politely decline an invitation
    without being rude or aggressive."""

@validate_intent
def decline_invite(event: str) -> ProfessionalDecline:
    return call_my_llm(event)

result = decline_invite("the company retreat")
# ✓ Validated — the output actually IS a polite decline
# ✗ Raises SemanticIntentError if the LLM went off the rails
```
Three lines of setup. One decorator. Your LLM output is now semantically typed.
How It Works
The core idea is simple:
- You define an Intent — a class whose docstring describes the semantic contract.
- You decorate your LLM function — the return type hint tells Semantix what to validate against.
- A Judge evaluates the output — comparing what the LLM said against what it was supposed to mean.
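To make the flow concrete, here is a toy sketch of what a decorator like this could do under the hood. Everything here (`toy_judge`, the 0.5 threshold, the keyword check) is invented for illustration; it is not the library's actual implementation, only the shape of the idea: read the return annotation, judge the output against the intent's docstring, raise on failure.

```python
import functools

class SemanticIntentError(Exception):
    """Raised when an output fails its semantic contract."""

class Intent:
    """Subclasses carry the semantic contract in their docstring."""

class PoliteDecline(Intent):
    """The text must politely decline an invitation."""

def toy_judge(text: str, contract: str) -> float:
    # Stand-in judge: a real one would use embeddings, NLI, or an LLM call.
    return 1.0 if "unfortunately" in text.lower() else 0.0

def validate_intent(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        output = func(*args, **kwargs)
        intent = func.__annotations__["return"]    # read the return type hint
        score = toy_judge(output, intent.__doc__)  # judge output vs. contract
        if score < 0.5:
            raise SemanticIntentError(f"score {score:.2f} below threshold")
        return output
    return wrapper

@validate_intent
def decline(event: str) -> PoliteDecline:
    return f"Unfortunately I can't make the {event}, but thank you!"

print(decline("retreat"))  # passes the toy judge and returns the text
```

The key trick is that the return type hint does double duty: it documents the function for humans and hands the validator a semantic contract to enforce.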
The Judge is the interesting part. Semantix ships with three:
EmbeddingJudge — compares sentence embeddings using cosine similarity. Fast, runs locally, no API key. Good for clear-cut intents.
```python
from semantix import validate_intent, EmbeddingJudge

@validate_intent(judge=EmbeddingJudge())
def summarize(text: str) -> ConciseSummary:
    return call_llm(text)
```
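For intuition, this is the cosine-similarity calculation an embedding judge rests on. The 3-dimensional vectors below are made up for the example; a real judge would embed the full intent docstring and the full output with a sentence-embedding model:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

intent_vec = [0.9, 0.1, 0.3]   # toy embedding of the intent docstring
output_vec = [0.8, 0.2, 0.4]   # toy embedding of the LLM output

score = cosine_similarity(intent_vec, output_vec)
print(round(score, 3))
```

A score near 1.0 means the output points in the same semantic direction as the intent; the judge passes the output when the score clears its threshold.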
LLMJudge — asks GPT-4o-mini "does this text satisfy this requirement? Yes or No." More accurate, needs an API key, costs fractions of a cent per call.
NLIJudge — uses a cross-encoder NLI model to check if the output entails the intent. Best of both worlds: accurate like an LLM judge, local like an embedding judge.
You pick the speed/accuracy tradeoff that fits your use case. And you can swap judges without changing any other code.
The Feature That Made Me Build This
Here's what pushed me over the edge. I was building an AI agent for a client that needed to generate customer-facing responses. The responses had to be:
- Professional in tone
- Factually grounded in the company's data
- Free of any promises or commitments
Pydantic could check that the response was a non-empty string under 500 characters. Great. But the LLM kept slipping in phrases like "I guarantee this will be resolved" — structurally valid, semantically dangerous.
So I built Semantix. And the feature I'm most proud of is smart retries:
```python
from semantix import validate_intent, get_last_failure, EmbeddingJudge

@validate_intent(judge=EmbeddingJudge(), retries=3)
def respond(query: str) -> SafeCustomerResponse:
    hint = ""
    if failure := get_last_failure():
        hint = (
            f"\n\nYour previous attempt scored {failure.score:.2f}. "
            "Remove any promises or guarantees."
        )
    return call_llm(f"Respond to: {query}{hint}")
```
get_last_failure() gives your LLM function access to the reason the previous attempt failed. So each retry isn't just "try again" — it's "try again, but here's what went wrong." The LLM gets smarter with each attempt.
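A rough mental model of that retry loop, as self-contained Python (hypothetical names and logic, not the library's internals): stash each failed attempt somewhere the wrapped function can read it, then call the function again.

```python
class Failure:
    def __init__(self, score):
        self.score = score

_last_failure = None

def get_last_failure():
    return _last_failure

def run_with_retries(generate, judge, threshold=0.7, retries=3):
    """Initial attempt plus up to `retries` retries; each failure is
    recorded so the next call to generate() can react to it."""
    global _last_failure
    _last_failure = None
    for _ in range(retries + 1):
        output = generate()            # generate() may consult get_last_failure()
        score = judge(output)
        if score >= threshold:
            _last_failure = None
            return output
        _last_failure = Failure(score)  # feedback for the next attempt
    raise RuntimeError("all attempts failed semantic validation")

def generate():
    # Toy LLM: drops the dangerous wording once it sees a prior failure.
    if get_last_failure() is None:
        return "I guarantee this will be resolved today."
    return "We will look into this right away."

def judge(text):
    return 0.2 if "guarantee" in text else 0.9

print(run_with_retries(generate, judge))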
Composable Intents
Real-world requirements are rarely one-dimensional. Semantix lets you combine intents:
```python
from semantix import AllOf, AnyOf

# Must satisfy ALL three intents
SafeResponse = ProfessionalTone & NoPromises & FactuallyGrounded

# Must satisfy AT LEAST ONE — either formal or casual decline
FlexibleDecline = AnyOf(FormalDecline, CasualDecline)

@validate_intent(judge=EmbeddingJudge())
def respond(msg: str) -> SafeResponse:
    return call_llm(msg)
```
The & and | operators work on Intent classes directly. Under the hood, AllOf concatenates the docstrings with "AND" and uses the minimum threshold. AnyOf uses "OR" and the maximum threshold.
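That composition rule can be sketched in a few lines. This is my reading of the description above, not the library's code; the per-intent `threshold` attributes are invented for the example:

```python
class Intent:
    threshold = 0.7  # assumed default for the sketch

class ProfessionalTone(Intent):
    """The text must be professional in tone."""
    threshold = 0.8

class NoPromises(Intent):
    """The text must not promise or guarantee any outcome."""
    threshold = 0.6

def all_of(*intents):
    # Combined contract: every docstring must hold; lowest bar wins.
    doc = " AND ".join(i.__doc__ for i in intents)
    return doc, min(i.threshold for i in intents)

def any_of(*intents):
    # Combined contract: any docstring may hold; highest bar wins.
    doc = " OR ".join(i.__doc__ for i in intents)
    return doc, max(i.threshold for i in intents)

doc, threshold = all_of(ProfessionalTone, NoPromises)
print(threshold)  # minimum of the two thresholds
```

The judge then sees one combined contract string instead of evaluating each intent separately, which keeps a composed intent to a single validation call.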
Streaming Support
If you're streaming LLM responses (and you probably should be), Semantix validates once the full stream is assembled:
```python
from semantix import StreamCollector

collector = StreamCollector(ProfessionalDecline, judge=my_judge)

for chunk in collector.wrap(llm_stream()):
    print(chunk, end="")  # stream to user in real-time

result = collector.result()  # validate the complete output
```
Your users see the response streaming in. Behind the scenes, Semantix is collecting chunks. The moment the stream ends, it validates. If it fails, you catch the error and handle it — retry, fall back to a template, or flag for human review.
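The collect-then-validate pattern itself is simple enough to show end to end. This toy version is illustrative only (StreamCollector's real internals may differ): the wrapper yields chunks untouched while keeping a copy, and validation runs once over the joined text.

```python
class SemanticIntentError(Exception):
    pass

class ToyStreamCollector:
    def __init__(self, judge, threshold=0.7):
        self.judge = judge
        self.threshold = threshold
        self.chunks = []

    def wrap(self, stream):
        for chunk in stream:
            self.chunks.append(chunk)  # keep a copy for later validation
            yield chunk                # pass through to the caller unchanged

    def result(self):
        full = "".join(self.chunks)    # judge only the complete text
        if self.judge(full) < self.threshold:
            raise SemanticIntentError("stream failed semantic validation")
        return full

def fake_stream():
    yield from ["Unfortunately ", "I must ", "decline."]

collector = ToyStreamCollector(judge=lambda t: 0.9 if "decline" in t else 0.1)
streamed = "".join(collector.wrap(fake_stream()))  # what the user saw live
print(collector.result())
```

Wrapping `collector.result()` in a `try`/`except SemanticIntentError` is where the fallback logic (retry, template, human review) would live.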
How It Compares
I built Semantix because the existing tools solve a different problem:
| | Semantix | Guardrails AI | NeMo Guardrails | Instructor |
|---|---|---|---|---|
| Validates meaning | ✅ | ❌ Schema-focused | ✅ Dialogue rails | ❌ Schema-focused |
| Zero required deps | ✅ | ❌ | ❌ | ❌ |
| Works with any LLM | ✅ Any function | ⚠️ Wrappers | ⚠️ Config files | ⚠️ Patched clients |
| Pluggable backends | ✅ 3 built-in + custom | ❌ | ❌ | ❌ |
| Lines to validate | ~5 | ~20+ | ~30+ | ~10 |
Semantix isn't a replacement for Pydantic or Guardrails. It's the layer above them. After you know the shape is right, verify the meaning is right too.
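The layering can be pictured as two gates in sequence. Both checks below are stand-ins (the shape gate mimics what a Pydantic model would enforce, the keyword scan mimics a semantic judge); the point is the ordering, not the checks themselves:

```python
def check_shape(text):
    # Layer 1: structural validation, the kind Pydantic handles.
    return isinstance(text, str) and 0 < len(text) <= 500

def check_meaning(text):
    # Layer 2: semantic validation; a keyword scan stands in for a judge.
    banned = ("guarantee", "promise")
    return not any(word in text.lower() for word in banned)

def validate(text):
    if not check_shape(text):
        raise ValueError("shape check failed")     # right type, right size?
    if not check_meaning(text):
        raise ValueError("semantic check failed")  # right meaning?
    return text

validate("We'll look into this right away.")  # passes both layers
```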
Try It
```bash
pip install semantix-ai

# With embedding judge (fast, local)
pip install "semantix-ai[embeddings]"

# With OpenAI judge (accurate)
pip install "semantix-ai[openai]"
```
Check out the repo: github.com/labrat-akhona/semantix-ai
It's MIT licensed, Python 3.10+, and the core has zero dependencies. I'd love feedback — open an issue or drop a comment below.
I'm Akhona, an automation engineer based in South Africa. I build AI-powered tools and integrations. You can find me on GitHub.