<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Patrick Joubert</title>
    <description>The latest articles on DEV Community by Patrick Joubert (@patrick_joubert_428bd9bc3).</description>
    <link>https://dev.to/patrick_joubert_428bd9bc3</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3173796%2F04fc44ca-c9b3-4875-beaa-26d46931211b.jpg</url>
      <title>DEV Community: Patrick Joubert</title>
      <link>https://dev.to/patrick_joubert_428bd9bc3</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/patrick_joubert_428bd9bc3"/>
    <language>en</language>
    <item>
      <title>AI Agents in production: Why guardrails fail and what actually works</title>
      <dc:creator>Patrick Joubert</dc:creator>
      <pubDate>Mon, 23 Feb 2026 00:39:28 +0000</pubDate>
      <link>https://dev.to/patrick_joubert_428bd9bc3/ai-agents-in-production-why-guardrails-fail-and-what-actually-works-144p</link>
      <guid>https://dev.to/patrick_joubert_428bd9bc3/ai-agents-in-production-why-guardrails-fail-and-what-actually-works-144p</guid>
      <description>&lt;h1&gt;
  
  
  AI Agents in Production: Why Guardrails Fail and What Actually Works
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Monitoring is too late. Guardrails are too dumb. Here's the missing layer between your LLM and your database.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Your AI agent just refunded $50,000 to the wrong customer.&lt;/p&gt;

&lt;p&gt;Your observability dashboard caught it... 3 minutes later. Your LLM guardrails? They checked for prompt injection and toxicity, but had no idea the customer ID was hallucinated. Your compliance team is asking questions you can't answer.&lt;/p&gt;

&lt;p&gt;This isn't a hypothetical. After building and selling three AI companies (Recast.AI to SAP, Ponicode to CircleCI, Beamap to Steria), I've seen this pattern repeat: &lt;strong&gt;agents fail in production not because the LLM is bad, but because there's a missing layer in the stack.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Your Agent Stack Has a Blind Spot
&lt;/h2&gt;

&lt;p&gt;Here's the typical AI agent architecture today:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Input
    ↓
Agent Framework (LangChain, CrewAI, etc.)
    ↓
LLM (GPT-4, Claude, etc.)
    ↓
Tools/APIs (Stripe, Database, Email, etc.)
    ↓
Production Systems
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what's missing? &lt;strong&gt;There's no layer that validates decisions before execution.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your agent can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✓ Query your database&lt;/li&gt;
&lt;li&gt;✓ Call your Stripe API&lt;/li&gt;
&lt;li&gt;✓ Send emails to customers&lt;/li&gt;
&lt;li&gt;✓ Make refunds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But nothing sits between "the LLM decided to do this" and "it's now done in production."&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Current Solutions Don't Work
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Observability = Post-Mortem
&lt;/h3&gt;

&lt;p&gt;Tools like Langfuse, LangSmith, and Arize are excellent for debugging &lt;em&gt;after&lt;/em&gt; something breaks. But they're fundamentally reactive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They log what happened&lt;/li&gt;
&lt;li&gt;They alert you when metrics degrade&lt;/li&gt;
&lt;li&gt;They help you understand failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;They don't prevent bad decisions from executing.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When your agent hallucinates a customer ID and processes a refund, observability tools take a perfect screenshot of the disaster.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. LLM Guardrails = Stateless Theater
&lt;/h3&gt;

&lt;p&gt;Guardrails (NeMo, Llama Guard, Anthropic's Constitutional AI) check for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt injection&lt;/li&gt;
&lt;li&gt;Toxic output&lt;/li&gt;
&lt;li&gt;PII leakage&lt;/li&gt;
&lt;li&gt;Jailbreaks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is critical for safety, but &lt;strong&gt;guardrails are stateless&lt;/strong&gt;. They don't know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is this customer ID real?&lt;/li&gt;
&lt;li&gt;Does this user have permission for this action?&lt;/li&gt;
&lt;li&gt;Is this refund amount consistent with the order history?&lt;/li&gt;
&lt;li&gt;Has this decision been audited for compliance?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A guardrail will happily approve: &lt;em&gt;"Process a refund of $50,000 for customer ID cust_hallucinated123"&lt;/em&gt; because the text itself looks fine.&lt;/p&gt;
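&lt;p&gt;To make the gap concrete, here's a minimal sketch contrasting a stateless text check with a stateful context check. Everything here is hypothetical: the &lt;code&gt;knownCustomers&lt;/code&gt; store and both functions are stand-ins, not any real guardrail API:&lt;/p&gt;

```typescript
// Hypothetical sketch: a stateless guardrail vs. a stateful context check.
// `knownCustomers` stands in for a real customer store.
const knownCustomers = new Set(["cust_456", "cust_789"]);

// A stateless guardrail only inspects the text itself.
function guardrailCheck(prompt: string): boolean {
  const blockedPhrases = ["ignore previous instructions", "reveal your system prompt"];
  return !blockedPhrases.some((p) => prompt.toLowerCase().includes(p));
}

// A stateful check validates the referenced entity against real state.
function contextCheck(customerId: string): boolean {
  return knownCustomers.has(customerId);
}

const request = "Process a refund of $50,000 for customer ID cust_hallucinated123";
console.log(guardrailCheck(request));              // true: the text looks fine
console.log(contextCheck("cust_hallucinated123")); // false: the entity does not exist
```

&lt;p&gt;Both checks pass or fail independently, which is exactly the problem: the text-level filter has no way to notice that the entity it mentions was never real.&lt;/p&gt;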

&lt;h3&gt;
  
  
  3. RAG = Context Retrieval, Not Validation
&lt;/h3&gt;

&lt;p&gt;Vector databases help agents retrieve relevant context, but they don't enforce it. An agent can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieve the correct customer data&lt;/li&gt;
&lt;li&gt;Ignore it completely&lt;/li&gt;
&lt;li&gt;Hallucinate different data&lt;/li&gt;
&lt;li&gt;Execute anyway&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;RAG gives your agent a library card. It doesn't make sure they read the books.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Missing Layer: Decision Runtimes
&lt;/h2&gt;

&lt;p&gt;After our third company exit, we started working with AI teams at Series A startups and enterprise SaaS companies. The pattern was universal: &lt;strong&gt;agents needed a layer between decision and execution that could validate context, enforce policies, and audit actions in real-time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We call this a &lt;strong&gt;Decision Runtime&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is a Decision Runtime?
&lt;/h3&gt;

&lt;p&gt;A decision runtime sits between your agent and your production systems. Before any action executes, it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Validates context&lt;/strong&gt; — Are the entities in this decision real?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enforces policies&lt;/strong&gt; — Does this user have permission? Is this within limits?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audits decisions&lt;/strong&gt; — Creates an immutable record for compliance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prevents hallucinations&lt;/strong&gt; — Blocks actions based on non-existent data&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Think of it like a type system for agent behavior. Your code won't compile if types don't match. Your agent won't execute if context doesn't validate.&lt;/p&gt;
&lt;/blockquote&gt;
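&lt;p&gt;The four steps above can be sketched as a single validation pipeline. This is illustrative only: the &lt;code&gt;Decision&lt;/code&gt; shape, the lookups, and the limits are assumptions, not a real runtime's API:&lt;/p&gt;

```typescript
// Hypothetical sketch of the four validation steps as one pipeline.
// The lookups are stand-ins for calls against your real systems.
interface Decision {
  action: string;
  actor: string;
  customerId: string;
  amount: number;
}

interface ValidationResult {
  valid: boolean;
  reason: string;
}

const knownCustomers = new Set(["cust_456"]);
const allowedActors = new Set(["support_agent"]);
const REFUND_LIMIT = 10_000;

const auditLog: Decision[] = [];

function validate(decision: Decision): ValidationResult {
  // 1. Validate context: is the referenced entity real?
  if (!knownCustomers.has(decision.customerId)) {
    return { valid: false, reason: "customer_id does not exist in context graph" };
  }
  // 2. Enforce policies: permission and amount limits.
  if (!allowedActors.has(decision.actor)) {
    return { valid: false, reason: "actor lacks permission for this action" };
  }
  if (decision.amount > REFUND_LIMIT) {
    return { valid: false, reason: "amount exceeds policy limit" };
  }
  // 3. Audit: record the decision before execution.
  auditLog.push(decision);
  // 4. Only validated decisions reach production systems.
  return { valid: true, reason: "ok" };
}
```

&lt;p&gt;Because the pipeline sits between the LLM's decision and the tool call, a hallucinated ID is rejected before Stripe or the database ever sees it.&lt;/p&gt;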

&lt;h2&gt;
  
  
  How It Works: The Technical Architecture
&lt;/h2&gt;

&lt;p&gt;Here's the same stack with a decision runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Input
    ↓
Agent Framework
    ↓
LLM Output
    ↓
Decision Runtime ← [validates before execution]
    ↓
Tools/APIs (only if validated)
    ↓
Production Systems
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example: Preventing the $50K Hallucinated Refund
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Agent wants to execute this&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;process_refund&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cust_hallucinated123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;customer_request&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Decision runtime validates context&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;validation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;decisionRuntime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Response:&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;customer_id does not exist in context graph&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;blocked&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;alternative&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Request customer verification before refund&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Agent receives feedback, can retry with correct context&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The decision runtime maintains a &lt;strong&gt;context graph&lt;/strong&gt; — a real-time representation of entities, relationships, and state that the agent must respect.&lt;/p&gt;

&lt;p&gt;Unlike RAG (which suggests context), the decision runtime &lt;strong&gt;enforces&lt;/strong&gt; it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Context Graphs (and Hypergraphs), Not Traditional Databases?
&lt;/h2&gt;

&lt;p&gt;We built our decision runtime on hypergraph architecture because agent decisions aren't simple key-value lookups:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Does customer X exist?" → Database query&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real agent decision:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Can user A refund customer B's order C, given policy D, audit trail E, and compliance requirement F?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is an &lt;strong&gt;n-ary relationship&lt;/strong&gt; across multiple entities. Hypergraphs model this naturally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hyperedge: refund_decision_001
  ├─ user: user_123 (role: support_agent)
  ├─ customer: cust_456 (status: verified)
  ├─ order: order_789 (amount: $50K, date: 2024-01-15)
  ├─ policy: refund_policy_v2 (max_amount: $10K)
  └─ audit: requires_manager_approval

Validation: FAIL (amount exceeds policy limit)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The entire decision context lives in a single traversable structure: no cross-table joins, no extra round trips, and no gaps where a hallucinated entity can slip through.&lt;/p&gt;
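&lt;p&gt;The hyperedge above can be modelled as one structure whose validation traverses every member at once. These types are illustrative, not Rippletide's actual schema:&lt;/p&gt;

```typescript
// Illustrative model of the refund hyperedge: one structure that links
// user, customer, order, policy, and audit requirement.
// (Hypothetical types, not Rippletide's actual schema.)
interface RefundHyperedge {
  user: { id: string; role: string };
  customer: { id: string; status: string };
  order: { id: string; amount: number };
  policy: { maxAmount: number };
  audit: { requiresManagerApproval: boolean };
}

// One traversal of the edge answers the full n-ary question.
function validateEdge(edge: RefundHyperedge): string {
  if (edge.customer.status !== "verified") {
    return "FAIL: customer not verified";
  }
  if (edge.order.amount > edge.policy.maxAmount) {
    return "FAIL: amount exceeds policy limit";
  }
  return "PASS";
}

const refund: RefundHyperedge = {
  user: { id: "user_123", role: "support_agent" },
  customer: { id: "cust_456", status: "verified" },
  order: { id: "order_789", amount: 50_000 },
  policy: { maxAmount: 10_000 },
  audit: { requiresManagerApproval: true },
};

console.log(validateEdge(refund)); // "FAIL: amount exceeds policy limit"
```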

&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Fintech: Preventing Fraudulent Transactions
&lt;/h3&gt;

&lt;p&gt;An AI agent processing payments needs to validate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer identity&lt;/li&gt;
&lt;li&gt;Transaction history&lt;/li&gt;
&lt;li&gt;Risk score&lt;/li&gt;
&lt;li&gt;Regulatory compliance&lt;/li&gt;
&lt;li&gt;Fraud patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A decision runtime blocks any transaction where context doesn't align, even if the LLM thinks it should proceed.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Healthcare: HIPAA-Compliant Agent Actions
&lt;/h3&gt;

&lt;p&gt;Medical AI agents must audit every decision:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who accessed what patient data?&lt;/li&gt;
&lt;li&gt;Was consent verified?&lt;/li&gt;
&lt;li&gt;Is this action within protocol?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Decision runtimes create immutable audit trails that satisfy regulatory requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. SaaS: Customer-Facing Agents
&lt;/h3&gt;

&lt;p&gt;Support agents powered by LLMs need boundaries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can this agent offer a refund to this customer tier?&lt;/li&gt;
&lt;li&gt;Is this discount within policy limits?&lt;/li&gt;
&lt;li&gt;Does this user have permission for this account action?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without a decision runtime, you're hoping the LLM "remembers" your rules.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Built: Rippletide
&lt;/h2&gt;

&lt;p&gt;After validating this with 30+ AI engineering teams, we built &lt;a href="https://www.rippletide.com" rel="noopener noreferrer"&gt;Rippletide&lt;/a&gt; — the first production context graph for AI agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context graph store&lt;/strong&gt; — Real-time entity and relationship tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy enforcement engine&lt;/strong&gt; — Declarative rules for agent behavior&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit-first architecture&lt;/strong&gt; — Every decision is immutable and traceable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Framework-agnostic&lt;/strong&gt; — Works with LangChain, CrewAI, raw LLM APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Bedrock integration&lt;/strong&gt; — Native support for Claude, Llama, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're currently working with 8 design partners across fintech, healthtech, and AI-native SaaS companies.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future: Decision Runtimes as Infrastructure Primitives
&lt;/h2&gt;

&lt;p&gt;Five years ago, observability was still a "nice to have." Then it became infrastructure: Datadog, New Relic, and Sentry are now standard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decision runtimes are following the same path.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As agents move from demos to production, the gap between "the LLM decided" and "it executed in prod" becomes unacceptable. Regulated industries won't adopt agents without it. Enterprise buyers won't trust agents without it.&lt;/p&gt;

&lt;p&gt;The companies shipping reliable agents in 2026 will have three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt; — What happened?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Guardrails&lt;/strong&gt; — Is the output safe?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decision Runtime&lt;/strong&gt; — Should this execute?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;If you're shipping AI agents to production and need to solve this problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visit &lt;a href="https://www.rippletide.com" rel="noopener noreferrer"&gt;rippletide.com&lt;/a&gt; to learn more&lt;/li&gt;
&lt;li&gt;We're accepting 2-3 more design partners for our beta program&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The best time to add a decision runtime is before your agent makes a $50K mistake. The second best time is now.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Patrick Joubert is the founder and CEO of &lt;a href="https://www.rippletide.com" rel="noopener noreferrer"&gt;Rippletide&lt;/a&gt;. He previously founded and sold Recast.AI (acquired by SAP), Ponicode (acquired by CircleCI), and Beamap (acquired by Steria). Rippletide recently raised $5M in seed funding.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>agentaichallenge</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
