DEV Community

correctover
correctover

Posted on

Introducing correctover-patronus: 6-Dimensional Verification for Patronus AI

The Problem

LLM evaluation tools like Patronus AI excel at hallucination detection, toxicity checks, and semantic relevance. But they don't catch the structural failures:

  • A JSON response missing required fields
  • A function call with malformed parameters
  • Output that violates schema constraints
  • Latency budget overruns silently degrading UX
  • Cost explosions from runaway token usage

These aren't hallucinations. They're verification failures.

The Solution

correctover-patronus is an adapter that runs Correctover's 87 deterministic verification rules as native Patronus evaluators. Every verdict comes with a recomputable proof hash — meaning you can verify the verifier.

pip install correctover-patronus
Enter fullscreen mode Exit fullscreen mode

The 6 Dimensions

Dimension What It Checks Example
Structure Output format validity JSON parses correctly
Schema Field presence & types Required fields exist
Identity Semantic relevance to input Response addresses the question
Integrity Forbidden pattern absence No Tracebacks or error messages
Latency Response time budget Under 30s threshold
Cost Token usage budget Under 10k token limit

Usage

Full 6-Dimension Verification

from correctover_patronus import CorrectoverEvaluator, CorrectoverConfig

config = CorrectoverConfig(
    min_confidence=0.7,
    latency_rules={"max_ms": 5000},
    cost_rules={"max_tokens": 4000}
)

evaluator = CorrectoverEvaluator(config=config)
result = evaluator.evaluate(
    task_input="Summarize this article...",
    task_output="The article discusses...",
    task_context={"source": "article", "word_count": 1500}
)

print(f"Overall: {result.score:.2f} ({'PASS' if result.pass_ else 'FAIL'})")
print(f"Proof hash: {result.metadata['proof_hash']}")
for dim, info in result.metadata['dimensions'].items():
    print(f"  {dim}: {info['status']} (score={info['score']:.2f})")
Enter fullscreen mode Exit fullscreen mode

Individual Dimensions

from correctover_patronus import correctover_structure, correctover_integrity

# Check if output is valid JSON
is_valid = correctover_structure(task_output='{"key": "value"}')

# Check for error patterns
is_clean = correctover_integrity(task_output="Result: 42")
Enter fullscreen mode Exit fullscreen mode

Patronus Experiments Integration

from correctover_patronus import correctover_full

# Use in Patronus experiments for systematic benchmarking
results = patronus.evaluate(
    evaluators=[correctover_full],
    dataset=my_dataset,
    experiment_name="correctover-benchmark"
)
Enter fullscreen mode Exit fullscreen mode

Recomputable Proof

Every evaluation produces a proof_hash in the metadata. This hash covers:

  • The input text
  • The output text
  • The verification rules applied
  • The verdict for each dimension

You can re-run the same verification and get the same hash. No black boxes.

Architecture

┌─────────────────┐     ┌──────────────────────┐
│   Patronus AI   │────>│  correctover-patronus │
│   Framework     │     │  (this adapter)       │
└─────────────────┘     └──────────┬───────────┘
                                   │
                    ┌──────────────▼──────────────┐
                    │     Correctover SDK         │
                    │  (87 rules, 6 dimensions)   │
                    │  P50 verification: 22us     │
                    └──────────────┬──────────────┘
                                   │
                    ┌──────────────▼──────────────┐
                    │   Verification Request      │
                    │   -> Verdict + Proof Hash    │
                    │   -> Metadata + Tags         │
                    └─────────────────────────────┘
Enter fullscreen mode Exit fullscreen mode

Performance

  • P50 verification latency: 22μs
  • Self-healing rules: 87
  • SDK size: 586KB
  • Zero external API calls — fully deterministic, local execution

Links


Failover ≠ Correctover.

*Correctover verifies. Patronus evaluates. Together: complete output assurance.

Top comments (0)