Catching LLM Hallucinations at the Gateway with Patronus AI and Bifrost

Use Bifrost to integrate Patronus AI for real-time detection of hallucinations, PII leaks, toxicity, and prompt injection attacks across your AI infrastructure.

LLMs produce fluent responses that can be factually incorrect, and these errors slip past application logs undetected. When you're running AI in production, safeguards for hallucination detection and comprehensive safety evaluation shift from nice-to-have to mission-critical. Bifrost, the open-source Go-based AI gateway by Maxim AI, intercepts these risks by plugging in Patronus AI as a guardrail integration at the gateway level. This guide covers how to wire up Patronus AI inside Bifrost so every prompt and completion is screened for factual integrity, sensitive data exposure, harmful content, and attack patterns before it reaches your users.

The Core Problem: LLM Unreliability at Scale

Models sometimes output plausible-sounding but factually groundless text, a phenomenon researchers call hallucination. It's not a bug; it's baked into how LLMs work. Research from OpenAI argues that training and evaluation reward confident guessing over admitting uncertainty, which is why even state-of-the-art models keep producing convincing falsehoods.

The operational consequences are severe. Gartner predicts that more than 40% of agentic AI projects will be canceled by the end of 2027, with inadequate risk controls among the reasons cited. If teams discover bad output only after it has been served to customers or used in decisions, it's too late.

Safety evaluation in LLM systems spans multiple dimensions:

Factual grounding: are claims supported by evidence in the context?
Data privacy: do prompts or responses expose personal or sensitive information?
Content safety: does output include harmful, abusive, or illegal material?
Instruction integrity: can user input trick the model into ignoring its guidelines?
Behavioral bias: does output exhibit age, gender, or racial stereotypes?
Output format: does structured output (JSON, CSV, code) conform to schema?

Building these checks into individual applications means duplicating validation logic everywhere. A gateway-based approach centralizes evaluation: one rules engine, one audit trail, consistent policy across every provider and model.

How Bifrost Guardrails Enforce Safety Checks

Bifrost separates evaluation into two components: the provider (the service that performs checks) and the rule (when and what to evaluate). Guardrails are an Enterprise feature designed for teams that need uniform policy enforcement across all AI traffic rather than scattered, per-app logic.

A guardrail provider is a connection to an external evaluator backend, Patronus AI in this case. A guardrail rule connects that provider to specific conditions: which traffic to inspect, whether to screen the input, the output, or both, and what fraction to sample. Rule conditions are written in CEL (Common Expression Language), so you can target checks to specific routes, models, or provider combinations.

When a request matches the rule's conditions, Bifrost ships the selected text to the provider for evaluation. If any evaluator fails, Bifrost returns GUARDRAIL_INTERVENED and blocks the unsafe response. This creates a centralized enforcement gate for AI governance sitting upstream of all connected providers instead of scattered across services.
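
The "any evaluator fails" decision can be sketched in a few lines of Python. This is an illustration of the behavior described above, not Bifrost's implementation: the EvaluatorResult type and the "ALLOW" label are hypothetical names, and only GUARDRAIL_INTERVENED comes from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvaluatorResult:
    evaluator: str  # e.g. "pii" or "judge"
    passed: bool    # True when the evaluator found no problem


def decide(results: List[EvaluatorResult]) -> str:
    """Block as soon as any single evaluator reports a failure."""
    if any(not r.passed for r in results):
        return "GUARDRAIL_INTERVENED"
    # "ALLOW" is a hypothetical label for the pass-through case.
    return "ALLOW"
```

Because one failing evaluator is enough to block, adding evaluators to a provider can only make a rule stricter, never more permissive.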

Integrating Patronus AI: Step-by-Step Setup

Connecting Patronus AI to Bifrost involves three phases: creating the provider configuration, specifying evaluators, and attaching the provider to a rule that decides which traffic gets checked. You'll need a Patronus API key (available from the Patronus dashboard).

Phase 1: Define the Patronus provider

Register a provider by setting provider_name: "patronus-ai" and supplying your API key. Bifrost can read the key as a literal value or resolve it from an environment variable through a reference such as env.PATRONUS_API_KEY, which keeps secrets out of config files. The API endpoint defaults to https://api.patronus.ai, and Bifrost appends /v1/evaluate when calling Patronus's evaluation API.
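
To make the two conventions above concrete, here is a minimal Python sketch of how an env.-prefixed value and the evaluation URL could be resolved. The function names are hypothetical and the logic is an assumption about the behavior described, not Bifrost's actual code.

```python
import os

DEFAULT_BASE_URL = "https://api.patronus.ai"
EVALUATE_PATH = "/v1/evaluate"


def resolve_secret(value: str, environ=os.environ) -> str:
    """Return the literal value, or look it up when it starts with 'env.'."""
    if value.startswith("env."):
        name = value[len("env."):]
        if name not in environ:
            raise KeyError(f"environment variable {name} is not set")
        return environ[name]
    return value


def evaluate_url(base_url: str = DEFAULT_BASE_URL) -> str:
    """Append the evaluation path, tolerating a trailing slash on base_url."""
    return base_url.rstrip("/") + EVALUATE_PATH
```

Passing the environment in as a parameter keeps the lookup easy to test without touching the real process environment.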

Phase 2: Configure which evaluators to run

Each provider executes one or more evaluators (at least one is required). An evaluator definition specifies the Patronus evaluator name (e.g., pii, toxicity-perspective-api, judge), an optional criteria profile (e.g., patronus:is-concise), and an explain strategy controlling when Patronus returns detailed explanations (never, on-fail, on-success, or always).

This example registers a provider with two evaluators, one scanning for PII and one checking response conciseness:

{
  "guardrails_config": {
    "guardrail_providers": [
      {
        "id": 40,
        "provider_name": "patronus-ai",
        "policy_name": "patronus-quality-checks",
        "enabled": true,
        "timeout": 30,
        "config": {
          "api_key": "env.PATRONUS_API_KEY",
          "base_url": "https://api.patronus.ai",
          "evaluators": [
            { "evaluator": "pii", "explain_strategy": "on-fail" },
            {
              "evaluator": "judge",
              "criteria": "patronus:is-concise",
              "explain_strategy": "on-fail"
            }
          ],
          "capture": "none"
        }
      }
    ]
  }
}
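
Before loading a config like this, it can help to lint it against the constraints stated in this guide: at least one evaluator, an explain strategy from the four listed values, and a capture mode of none, fails-only, or all. The Python sketch below is a hypothetical pre-flight check, not part of Bifrost; it assumes explain_strategy may be omitted and that capture defaults to none.

```python
EXPLAIN_STRATEGIES = {"never", "on-fail", "on-success", "always"}
CAPTURE_MODES = {"none", "fails-only", "all"}


def validate_provider(provider: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    config = provider.get("config", {})
    evaluators = config.get("evaluators", [])
    if not evaluators:
        problems.append("at least one evaluator is required")
    for i, entry in enumerate(evaluators):
        if not entry.get("evaluator"):
            problems.append(f"evaluators[{i}]: missing evaluator name")
        strategy = entry.get("explain_strategy")
        # Assumption: a missing explain_strategy is allowed.
        if strategy is not None and strategy not in EXPLAIN_STRATEGIES:
            problems.append(f"evaluators[{i}]: unknown explain_strategy {strategy!r}")
    capture = config.get("capture", "none")
    if capture not in CAPTURE_MODES:
        problems.append(f"unknown capture mode {capture!r}")
    return problems
```

Running it over each entry in guardrail_providers before deployment catches typos such as "on_fail" that would otherwise only surface at request time.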

Phase 3: Bind the provider to a guardrail rule

Rules reference one or more providers through provider_config_ids and define the triggering conditions. The rule below points at the provider with id 40 and runs Patronus checks on all OpenAI responses, sampling 100% of traffic:

{
  "guardrail_rules": [
    {
      "id": 401,
      "name": "patronus-openai-output",
      "description": "Run Patronus checks on OpenAI responses",
      "enabled": true,
      "cel_expression": "provider == 'openai'",
      "apply_to": "output",
      "sampling_rate": 100,
      "timeout": 30,
      "provider_config_ids": [40]
    }
  ]
}
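
The way a rule's fields combine can be sketched as follows. This Python example is an assumption-laden illustration: it stands in a plain predicate for the CEL expression (a real deployment evaluates cel_expression with a CEL engine), reads sampling_rate as a percentage, and uses "both" as an assumed apply_to value for rules that screen input and output.

```python
import random
from typing import Callable, Dict


def should_evaluate(
    rule: Dict,
    request: Dict,
    phase: str,
    matches: Callable[[Dict], bool],
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether one rule applies to one request in the given phase."""
    if not rule["enabled"]:
        return False
    # "both" is an assumed value name; the rule shown above uses "output".
    if rule["apply_to"] not in (phase, "both"):
        return False
    if not matches(request):
        return False
    # sampling_rate is read as a percentage: 100 always samples, 0 never does.
    return rng() * 100 < rule["sampling_rate"]


rule = {"enabled": True, "apply_to": "output", "sampling_rate": 100}


def is_openai(request: Dict) -> bool:
    """Stand-in for the CEL expression provider == 'openai'."""
    return request["provider"] == "openai"
```

With the rule above, an OpenAI response is always checked, while an OpenAI prompt (phase "input") or a response from another provider is skipped.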

Alternatively, navigate the Bifrost dashboard: Guardrails > Providers > select Patronus AI > add evaluators > attach to a rule under Guardrails > Configuration. For infrastructure-as-code teams, the management API at /api/guardrails/patronus-ai supports the same workflows.

Available Patronus Evaluators and Built-in Presets

Patronus covers a broad spectrum of safety checks. Hallucination detection itself uses a Patronus evaluator configured with an evaluator ID or criteria from your Patronus workspace rather than a fixed preset. Bifrost surfaces the following common checks as dashboard presets:

PII Detection: the pii evaluator flags personally identifiable data
Toxicity Screening: the toxicity-perspective-api evaluator catches harmful language
Prompt Injection Protection: the judge evaluator with patronus:prompt-injection criteria
Refusal Detection: the judge evaluator with patronus:answer-refusal criteria
Bias Mitigation: patronus:no-age-bias, patronus:no-gender-bias, patronus:no-racial-bias
Response Quality: patronus:is-concise, patronus:is-helpful, patronus:is-polite
Structured Validation: patronus:is-json, patronus:is-code, patronus:is-csv

For domain-specific needs, use a custom evaluator by supplying your own evaluator ID and optional criteria. This path lets you deploy hallucination or groundedness evaluators tuned to your retrieval pipelines and business context, since those live in your Patronus account rather than in fixed presets. Pairing general safety screening with custom factuality checks in a single rule gives you layered, multi-dimension protection.
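
A layered evaluator list of this kind could be assembled with a small helper. In the Python sketch below, the preset short names, the build_evaluators function, and the custom evaluator ID and criteria are all hypothetical; only the evaluator and criteria strings inside PRESETS come from the list above.

```python
# Short names on the left are hypothetical; values mirror the presets above.
PRESETS = {
    "pii": {"evaluator": "pii"},
    "toxicity": {"evaluator": "toxicity-perspective-api"},
    "prompt-injection": {"evaluator": "judge", "criteria": "patronus:prompt-injection"},
    "answer-refusal": {"evaluator": "judge", "criteria": "patronus:answer-refusal"},
}


def build_evaluators(preset_names, custom=(), explain_strategy="on-fail"):
    """Combine preset and custom evaluator entries into one config list."""
    entries = []
    for name in preset_names:
        entries.append({**PRESETS[name], "explain_strategy": explain_strategy})
    for entry in custom:
        entries.append({**entry, "explain_strategy": explain_strategy})
    return entries


# Placeholder custom evaluator; substitute the ID from your own workspace.
groundedness = {"evaluator": "my-groundedness-evaluator", "criteria": "my-org:grounded-in-context"}
layered = build_evaluators(["pii", "prompt-injection"], custom=[groundedness])
```

The resulting list can be serialized with json.dumps and placed under the provider's evaluators key.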

Scaling Safely: Best Practices for Production Deployments

Adding evaluation at the gateway introduces latency and cost, so tune rules to match the risk profile of each route. Here's how to keep Patronus checks effective without throttling throughput:

Set the sampling_rate to evaluate only a representative slice of low-risk traffic while screening all high-risk flows (e.g., customer-facing completions). Use CEL expressions to apply expensive evaluators only where they matter; for instance, run factuality checks on RAG endpoints but not on internal tooling. The capture field controls what Patronus stores: none (default) keeps results out of the Patronus dashboard, fails-only captures failures for debugging and audit, and all records everything.
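
The three capture modes reduce to a simple rule, sketched here in Python. The function name is hypothetical, and the sketch only restates the semantics described above rather than reproducing Patronus's storage logic.

```python
def should_capture(mode: str, passed: bool) -> bool:
    """Return True when an evaluation result would be stored by Patronus."""
    if mode == "none":
        return False       # nothing reaches the Patronus dashboard
    if mode == "fails-only":
        return not passed  # keep failures for debugging and audit
    if mode == "all":
        return True        # record every evaluation
    raise ValueError(f"unknown capture mode: {mode!r}")
```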

Pair fails-only capture with Bifrost's audit logs to build a tamper-proof record of guardrail interventions for compliance audits (SOC 2, GDPR, HIPAA). Tune the timeout to prevent slow evaluator calls from blocking your request pipeline.
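
The effect of a timeout can be sketched with Python's standard library. This example is an assumption throughout: the check callable, the "allow" and "block" labels, and the on_timeout parameter are hypothetical, the timeout is assumed to be in seconds, and this guide does not state whether Bifrost blocks or allows a request when an evaluator times out.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(check, text, timeout_s, on_timeout="block"):
    """Run check(text) but stop waiting after timeout_s seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(check, text)
    try:
        passed = future.result(timeout=timeout_s)
        return "allow" if passed else "block"
    except FutureTimeout:
        # Hypothetical policy knob: fail closed ("block") or open ("allow").
        return on_timeout
    finally:
        # Do not wait for a slow evaluator thread to finish.
        executor.shutdown(wait=False)


def slow_check(text):
    """Simulated evaluator that takes 0.3 seconds and then passes."""
    time.sleep(0.3)
    return True
```

Calling call_with_timeout(slow_check, "some output", 0.05) returns the on_timeout value instead of stalling the request for the full 0.3 seconds, which is the trade-off the timeout setting controls.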

In regulated or high-throughput settings, layer Patronus safety checks with Bifrost's other governance tools. Secrets detection catches leaked API keys and tokens, pattern-based redaction enforces custom masking rules, and role-based access control gates which teams can reach which providers.

Since Bifrost supports on-premises and VPC-isolated deployments, evaluation traffic and logs stay inside your infrastructure boundary. Real-time monitoring through Bifrost's telemetry and observability shows which rules fire most often and where risky output clusters across all supported providers, giving you continuous visibility into safety performance.

Next Steps: Deploy and Monitor

LLM safety evaluation works best when uniform across your entire model ecosystem, not confined to individual services. Deploying Patronus AI inside the Bifrost AI gateway lets your platform team enforce a single, consistent safety posture for PII, toxicity, prompt injection, and factuality checks, with granular control over what runs, when, and where.

Ready to add safety guardrails to your AI infrastructure? Book a demo with the Bifrost team to see enterprise guardrails in action, or explore the resource library for deeper guides on governance and deployment patterns.