The Problem
LLM evaluation tools like Patronus AI excel at hallucination detection, toxicity checks, and semantic relevance. But they don't catch the structural failures:
- A JSON response missing required fields
- A function call with malformed parameters
- Output that violates schema constraints
- Latency budget overruns silently degrading UX
- Cost explosions from runaway token usage
These aren't hallucinations. They're verification failures.
The Solution
correctover-patronus is an adapter that runs Correctover's 87 deterministic verification rules as native Patronus evaluators. Every verdict comes with a recomputable proof hash — meaning you can verify the verifier.
pip install correctover-patronus
The 6 Dimensions
| Dimension | What It Checks | Example |
|---|---|---|
| Structure | Output format validity | JSON parses correctly |
| Schema | Field presence & types | Required fields exist |
| Identity | Semantic relevance to input | Response addresses the question |
| Integrity | Forbidden pattern absence | No Tracebacks or error messages |
| Latency | Response time budget | Under 30s threshold |
| Cost | Token usage budget | Under 10k token limit |
Usage
Full 6-Dimension Verification
from correctover_patronus import CorrectoverEvaluator, CorrectoverConfig
config = CorrectoverConfig(
min_confidence=0.7,
latency_rules={"max_ms": 5000},
cost_rules={"max_tokens": 4000}
)
evaluator = CorrectoverEvaluator(config=config)
result = evaluator.evaluate(
task_input="Summarize this article...",
task_output="The article discusses...",
task_context={"source": "article", "word_count": 1500}
)
print(f"Overall: {result.score:.2f} ({'PASS' if result.pass_ else 'FAIL'})")
print(f"Proof hash: {result.metadata['proof_hash']}")
for dim, info in result.metadata['dimensions'].items():
print(f" {dim}: {info['status']} (score={info['score']:.2f})")
Individual Dimensions
from correctover_patronus import correctover_structure, correctover_integrity
# Check if output is valid JSON
is_valid = correctover_structure(task_output='{"key": "value"}')
# Check for error patterns
is_clean = correctover_integrity(task_output="Result: 42")
Patronus Experiments Integration
from correctover_patronus import correctover_full
# Use in Patronus experiments for systematic benchmarking
results = patronus.evaluate(
evaluators=[correctover_full],
dataset=my_dataset,
experiment_name="correctover-benchmark"
)
Recomputable Proof
Every evaluation produces a proof_hash in the metadata. This hash covers:
- The input text
- The output text
- The verification rules applied
- The verdict for each dimension
You can re-run the same verification and get the same hash. No black boxes.
Architecture
┌─────────────────┐ ┌──────────────────────┐
│ Patronus AI │────>│ correctover-patronus │
│ Framework │ │ (this adapter) │
└─────────────────┘ └──────────┬───────────┘
│
┌──────────────▼──────────────┐
│ Correctover SDK │
│ (87 rules, 6 dimensions) │
│ P50 verification: 22us │
└──────────────┬──────────────┘
│
┌──────────────▼──────────────┐
│ Verification Request │
│ -> Verdict + Proof Hash │
│ -> Metadata + Tags │
└─────────────────────────────┘
Performance
- P50 verification latency: 22μs
- Self-healing rules: 87
- SDK size: 586KB
- Zero external API calls — fully deterministic, local execution
Links
- GitHub: https://github.com/Correctover/correctover-patronus
- PyPI: https://pypi.org/project/correctover-patronus/
- Core SDK: https://github.com/Correctover/correctover-crewai
- Standards: https://github.com/Correctover/correctover-crewai/blob/main/STANDARDS.md
- Conformance Board: https://api.babyblueviper.com/conformance
Failover ≠ Correctover.™
*Correctover verifies. Patronus evaluates. Together: complete output assurance.
Top comments (0)