## What is Director-Class AI?
An open-source Python library that guards LLM output in real time. It watches tokens as they stream and halts generation the moment it detects a hallucination.
It scores each claim against source documents using NLI (Natural Language Inference, via DeBERTa/FactCG models) plus optional RAG knowledge grounding.
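The claim-scoring idea can be sketched in plain Python. The `entailment_score` function below is a self-contained token-overlap stand-in for the real DeBERTa/FactCG NLI model, and the function names and threshold are illustrative, not the library's actual API:

```python
import re

def entailment_score(claim: str, source: str) -> float:
    """Stand-in for an NLI model: fraction of claim tokens found in the source.
    A real checker would run a cross-encoder (e.g. DeBERTa) over the pair."""
    claim_tokens = set(re.findall(r"[a-z0-9]+", claim.lower()))
    source_tokens = set(re.findall(r"[a-z0-9]+", source.lower()))
    return len(claim_tokens & source_tokens) / max(len(claim_tokens), 1)

def check_claims(claims, sources, threshold=0.9):
    """Score each claim against its best-supporting source; flag weak ones.
    The 0.9 threshold is only sensible for this toy overlap heuristic."""
    flagged = []
    for claim in claims:
        best = max(entailment_score(claim, s) for s in sources)
        if best < threshold:
            flagged.append((claim, best))
    return flagged

sources = ["The Eiffel Tower is in Paris and was completed in 1889."]
claims = [
    "The Eiffel Tower is in Paris.",
    "The Eiffel Tower is located in Berlin, Germany.",
]
print(check_claims(claims, sources))
# → [('The Eiffel Tower is located in Berlin, Germany.', 0.625)]
```

In the real system this runs per streamed claim, so generation can halt as soon as a claim falls below threshold rather than after the full response.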
Install from PyPI:

```shell
pip install director-ai
```
Two-line integration:

```python
import openai
from director_ai import guard

client = guard(openai.OpenAI())  # wraps any OpenAI/Anthropic client
```
## Benchmarks (measured, not aspirational)
| Metric | Value | Conditions |
|---|---|---|
| Balanced accuracy | 75.8% | FactCG on LLM-AggreFact (29,320 samples) |
| GPU latency | 14.6ms/pair | GTX 1060, ONNX, batch=16 |
| L40S latency | 0.5ms/pair | FP16, batch=32 |
| E2E catch rate | 90.7% | Hybrid mode, 600 HaluEval traces |
| Rust BM25 speedup | 10.2x | Over pure Python implementation |
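For context on the BM25 row: Okapi BM25 is the standard lexical-retrieval scorer, shown here as a minimal pure-Python implementation (the kind of baseline a Rust port can plausibly outpace by an order of magnitude). The `k1` and `b` values are the conventional defaults, not necessarily what director-ai uses:

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    # document frequency of each term across the corpus
    df = Counter()
    for doc in tokenized:
        for term in set(doc):
            df[term] += 1
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            # term-frequency saturation (k1) and length normalisation (b)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = ["the cat sat on the mat", "dogs chase cats in the park"]
print(bm25_scores("cat mat", docs))  # first doc scores higher
```

The hot loops here (tokenisation, per-term scoring) are exactly what benefits from moving to Rust.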
## Framework Integrations
LangChain, LlamaIndex, LangGraph, CrewAI, Haystack, DSPy, Semantic Kernel, and SDK Guard (wraps OpenAI/Anthropic/Bedrock/Gemini/Cohere clients).
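The SDK Guard approach of wrapping an existing client can be sketched generically. `SimpleClient`, `guarded`, and `verify` below are illustrative names for the pattern, not director-ai's actual API:

```python
# Generic sketch of the client-wrapping pattern: intercept completions,
# verify them, and halt with an error when verification fails.

class SimpleClient:
    """Stand-in for an LLM SDK client (OpenAI, Anthropic, etc.)."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

def guarded(client, verify):
    """Wrap a client so every completion passes through a verifier."""
    original = client.complete  # keep the unwrapped method
    def complete(prompt: str) -> str:
        output = original(prompt)
        if not verify(output):
            raise ValueError("generation halted: output failed verification")
        return output
    client.complete = complete
    return client

client = guarded(SimpleClient(), verify=lambda text: "hallucination" not in text)
print(client.complete("hello"))  # → "echo: hello"
```

Because the wrapper preserves the client's call signature, downstream code (or a framework like LangChain) can use the guarded client unchanged.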
## Honest Limitations
- NLI-only scoring needs KB grounding for domain use (medical FPR=100% without KB)
- ONNX CPU is slow (383ms/pair) — GPU recommended
- Long documents need >=16GB VRAM
- Summarisation accuracy weakest (AggreFact-CNN 68.8%)
## Quality
- 3,545 tests, 91% coverage
- Sigstore-signed releases, SLSA provenance
- OpenSSF Best Practices: 100%
- 19 CI and security health badges
## Links
- GitHub: github.com/anulum/director-ai
- Docs: anulum.github.io/director-ai
- PyPI: pypi.org/project/director-ai
Licensed AGPL-3.0, with commercial licensing available.
Would love feedback from anyone working on LLM reliability, RAG pipelines, or AI safety!