<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dominic Peters</title>
    <description>The latest articles on DEV Community by Dominic Peters (@dombinic).</description>
    <link>https://dev.to/dombinic</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3982707%2Fa9876d0b-b193-4dd7-a4eb-6df01f6d3148.png</url>
      <title>DEV Community: Dominic Peters</title>
      <link>https://dev.to/dombinic</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dombinic"/>
    <language>en</language>
    <item>
      <title>Your AI Agents Are Failing Silently — Here's How to Catch It</title>
      <dc:creator>Dominic Peters</dc:creator>
      <pubDate>Sat, 13 Jun 2026 16:44:27 +0000</pubDate>
      <link>https://dev.to/dombinic/your-ai-agents-are-failing-silently-heres-how-to-catch-it-3gca</link>
      <guid>https://dev.to/dombinic/your-ai-agents-are-failing-silently-heres-how-to-catch-it-3gca</guid>
<description>&lt;p&gt;Last month I ran hundreds of LangChain agent calls in production. Some of them failed silently: wrong tool sequences, latency spikes, even hallucinated outputs. My logs showed zero errors. No exceptions. No warnings.&lt;br&gt;
The agent just did the wrong thing, quietly.&lt;/p&gt;

&lt;p&gt;Traditional monitoring tools weren't built for this. Datadog can tell you a function threw an exception. It can't tell you your agent called &lt;em&gt;delete_file&lt;/em&gt; when it's never done that before, or that your LLM is suddenly generating 10x more tokens than usual, or that output quality has been slowly degrading over the last 500 runs.&lt;/p&gt;

&lt;p&gt;So I built Drift.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Drift Does&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Drift hooks into your agent's execution and applies statistical anomaly detection to the event stream in real time. Three detectors run simultaneously:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency &amp;amp; token SPC&lt;/strong&gt; — Uses rolling z-scores to flag when a tool call or LLM response takes significantly longer or uses significantly more tokens than its baseline. Catches hung API calls, runaway generation, and upstream provider issues.&lt;/p&gt;
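&lt;p&gt;To make the rolling z-score idea concrete, here is a minimal sketch using only the standard library. It is not Drift's actual implementation: the class name &lt;code&gt;RollingZScore&lt;/code&gt;, the window size, the threshold, and the minimum sample count are all assumptions chosen for illustration.&lt;/p&gt;

```python
from collections import deque
from statistics import fmean, pstdev


class RollingZScore:
    """Hypothetical rolling z-score check over a fixed-size window."""

    def __init__(self, window=50, threshold=3.0, min_samples=10):
        self.values = deque(maxlen=window)  # most recent normal readings
        self.threshold = threshold          # assumed cutoff, in standard deviations
        self.min_samples = min_samples      # warm-up before any flagging

    def observe(self, value):
        """Return the z-score if the value is anomalously high, else None."""
        z = None
        if len(self.values) >= self.min_samples:
            mean = fmean(self.values)
            std = pstdev(self.values)
            if std > 0:
                z = (value - mean) / std
        anomalous = z is not None and z > self.threshold
        if not anomalous:
            # Only normal readings update the baseline, so one spike
            # does not inflate the mean and mask the next one.
            self.values.append(value)
        return z if anomalous else None


detector = RollingZScore()
for latency_ms in [200, 210, 190, 205, 195, 202, 198, 207, 193, 201]:
    detector.observe(latency_ms)  # warm-up: builds the baseline

spike = detector.observe(2500)    # far above the baseline, so a z-score is returned
```

&lt;p&gt;Keeping one such detector per tool and per metric (latency, token count) would give each tool its own baseline.&lt;/p&gt;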

&lt;p&gt;&lt;strong&gt;Sequence anomaly detection&lt;/strong&gt; — Builds a Markov transition matrix of tool-call sequences and flags when the agent takes a path that's never or rarely been seen. Catches agents going off-script, skipping required steps, or making dangerous tool calls they've never made before.&lt;/p&gt;
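&lt;p&gt;A first-order transition model of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not Drift's code: &lt;code&gt;TransitionModel&lt;/code&gt;, its method names, and the 5% rarity cutoff are hypothetical.&lt;/p&gt;

```python
from collections import defaultdict


class TransitionModel:
    """Hypothetical first-order Markov model of tool-call transitions."""

    def __init__(self, min_probability=0.05):
        # counts[prev][nxt] = number of times `nxt` directly followed `prev`
        self.counts = defaultdict(lambda: defaultdict(int))
        self.min_probability = min_probability  # assumed rarity cutoff

    def learn(self, sequence):
        """Count every adjacent pair of tool calls in one agent run."""
        for prev, nxt in zip(sequence, sequence[1:]):
            self.counts[prev][nxt] += 1

    def probability(self, prev, nxt):
        """Estimate P(nxt | prev) from the observed counts."""
        row = self.counts.get(prev, {})
        total = sum(row.values())
        return row.get(nxt, 0) / total if total else 0.0

    def check(self, prev, nxt):
        """Return 'novel', 'rare', or None for an observed transition."""
        p = self.probability(prev, nxt)
        if p == 0.0:
            return "novel"
        if self.min_probability > p:
            return "rare"
        return None


model = TransitionModel()
for _ in range(20):
    model.learn(["search_web", "parse_document", "write_response"])

known = model.check("search_web", "parse_document")  # seen in every baseline run
unseen = model.check("search_web", "delete_file")    # never seen in the baseline
```

&lt;p&gt;Storing counts rather than probabilities keeps updates cheap, and each row of the matrix can be normalized on demand.&lt;/p&gt;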

&lt;p&gt;&lt;strong&gt;Output drift detection&lt;/strong&gt; — Tracks output length, vocabulary diversity, and structural patterns over time. Flags when outputs shift significantly from baseline. Catches hallucination drift, prompt injection effects, and gradual quality degradation.&lt;/p&gt;
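&lt;p&gt;As a rough sketch of the kind of features involved, the snippet below computes output length, a type-token ratio as a stand-in for vocabulary diversity, and a coarse structure label. The function &lt;code&gt;output_features&lt;/code&gt;, the class &lt;code&gt;StructureBaseline&lt;/code&gt;, and the three structure labels are assumptions made for illustration; Drift's real feature set may differ.&lt;/p&gt;

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker


def output_features(text):
    """Hypothetical features: length, type-token ratio, coarse structure."""
    words = re.findall(r"\w+", text.lower())
    diversity = len(set(words)) / len(words) if words else 0.0
    if FENCE in text:
        structure = "code_block"
    elif text.lstrip().startswith(("{", "[")):
        structure = "json_like"
    else:
        structure = "plain_text"
    return {"length": len(text), "diversity": diversity, "structure": structure}


class StructureBaseline:
    """Remembers which structure labels have appeared in normal outputs."""

    def __init__(self):
        self.seen = set()

    def learn(self, text):
        self.seen.add(output_features(text)["structure"])

    def is_novel(self, text):
        return output_features(text)["structure"] not in self.seen


baseline = StructureBaseline()
baseline.learn("The report is ready and has been sent.")

drifted = "Here you go:\n" + FENCE + "\nprint('hello')\n" + FENCE
novel = baseline.is_novel(drifted)  # a code block was never seen in the baseline
```

&lt;p&gt;The numeric features (length, diversity) could then be compared against a per-tool baseline with any z-score check, while the structure label is a simple set-membership test.&lt;/p&gt;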

&lt;p&gt;&lt;strong&gt;Three Lines to Add It&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install drift-detection&lt;/code&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from drift import DriftGuard
from drift.callbacks.langchain import DriftCallbackHandler

guard = DriftGuard(on_anomaly=lambda a: print(f"🚨 {a}"))
handler = DriftCallbackHandler(guard)

# Add to any LangChain agent, chain, or LLM
agent.run("your query", callbacks=[handler])

guard.report()&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;What It Looks Like in Action&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;DRIFT DEMO — Agent Anomaly Detection

[Phase 1] Building baseline with 20 normal agent runs...
  ✓ 120 events processed, 0 anomalies

[Phase 2] Injecting anomalies...

--- Injecting: Latency spike on 'search_web' ---
  🚨 [CRITICAL] latency_spike: 'search_web' latency is 2500.0ms,
     76.3σ above mean (200.7ms ± 30.1)

--- Injecting: Novel tool sequence (search_web → delete_file) ---
  🚨 [HIGH] sequence_anomaly: Novel transition: 'search_web' → 'delete_file'
     (never observed; known transitions: ['parse_document'])

--- Injecting: Output drift on 'write_response' ---
  🚨 [CRITICAL] token_anomaly: 'write_response' token_count is 2000 tokens,
     66.5σ above mean (107.8 ± 28.5)
  🚨 [HIGH] output_drift: output_length is 3014 chars, 43.3σ above baseline
  🚨 [MEDIUM] output_drift: novel structure 'code_block' (seen: ['plain_text'])&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The latency spike at 76σ. The &lt;em&gt;delete_file&lt;/em&gt; call never seen before. The token count 18x baseline. All caught automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design Decisions&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Minimal dependencies: the core requires only numpy, with no embedding models or network calls&lt;/li&gt;
&lt;li&gt;Per-tool baselines: each tool gets its own statistical baseline&lt;/li&gt;
&lt;li&gt;Non-blocking: Drift never crashes your agent; its own errors go to stderr&lt;/li&gt;
&lt;li&gt;Framework-agnostic core: thin adapters for each framework, LangChain first&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What's Coming&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CrewAI and OpenAI Agents SDK support&lt;/li&gt;
&lt;li&gt;Persistent baselines across runs&lt;/li&gt;
&lt;li&gt;Slack / PagerDuty alerting&lt;/li&gt;
&lt;li&gt;Hosted dashboard for teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Try It&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install drift-detection&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/dombinic/Drift" rel="noopener noreferrer"&gt;dombinic/Drift&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Star it if it's useful. Open an issue if you want a feature — I read everything.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>langchain</category>
      <category>python</category>
    </item>
  </channel>
</rss>
