<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sunil Prakash</title>
    <description>The latest articles on DEV Community by Sunil Prakash (@sunilprakash).</description>
    <link>https://dev.to/sunilprakash</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3838537%2F32569555-b587-4112-9fc0-0ea4a37e9fa4.png</url>
      <title>DEV Community: Sunil Prakash</title>
      <link>https://dev.to/sunilprakash</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sunilprakash"/>
    <language>en</language>
    <item>
      <title>Every AI toolchain is inventing its own safety layer.</title>
      <dc:creator>Sunil Prakash</dc:creator>
      <pubDate>Tue, 12 May 2026 18:26:09 +0000</pubDate>
      <link>https://dev.to/sunilprakash/every-ai-toolchain-is-inventing-its-own-safety-layer-we-shipped-one-that-works-across-all-of-them-1016</link>
      <guid>https://dev.to/sunilprakash/every-ai-toolchain-is-inventing-its-own-safety-layer-we-shipped-one-that-works-across-all-of-them-1016</guid>
      <description>&lt;h2&gt;
  
  
  Same policy. Three runtimes.
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Claude Code, with @jamjet/claude-code-hook installed as a PreToolUse hook:
&amp;gt; Delete the old customer records from the staging DB.

  Tool request: bash.shell_exec
  Args: psql -c "DELETE FROM customers WHERE created_at &amp;lt; '2024-01-01'"

  JamJet policy: BLOCKED (rule: shell.exec)
  Audit: ~/.jamjet/audit/2026-05-11/claude-code-hook.jsonl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# OpenAI Agents SDK (TS), with @jamjet/openai-guardrail wired into a refund tool:
JamjetPolicyBlocked: JamJet policy: BLOCKED
  (tool: payments.refund, rule: payments.*)

  Audit: ~/.jamjet/audit/2026-05-11/openai-guardrail.jsonl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Claude Desktop talking to a Postgres MCP server, fronted by @jamjet/mcp-shim:
{"jsonrpc":"2.0","id":7,"error":{
  "code": -32000,
  "message": "JamJet policy: BLOCKED (rule: *delete*)",
  "data": {"tool": "postgres.delete_all_rows", "audit": "mcp-shim.jsonl"}
}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same &lt;code&gt;policy.yaml&lt;/code&gt;. Three runtimes. One audit log.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The models are real. The tool calls came from real agent loops. The destructive payloads never reached the tool function.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The fragmentation problem
&lt;/h2&gt;

&lt;p&gt;The market is converging on "control AI agent actions." But the primitives are not portable.&lt;/p&gt;

&lt;p&gt;Anthropic shipped &lt;a href="https://code.claude.com/docs/en/hooks" rel="noopener noreferrer"&gt;Claude Code hooks&lt;/a&gt; — &lt;code&gt;PreToolUse&lt;/code&gt;, &lt;code&gt;PostToolUse&lt;/code&gt;, &lt;code&gt;Notification&lt;/code&gt;, and friends. They run as subprocesses, get JSON on stdin, and decide whether the tool call proceeds. OpenAI ships &lt;a href="https://openai.github.io/openai-agents-python/guardrails/" rel="noopener noreferrer"&gt;tool guardrails&lt;/a&gt; in the Agents SDK — Python (and now TS) callables you attach to a tool, with tripwire booleans that abort the run. The MCP ecosystem is sprouting gateways and proxies for the same purpose: &lt;a href="https://mcpx.dev" rel="noopener noreferrer"&gt;MCPX&lt;/a&gt;, &lt;a href="https://github.com/IBM/mcp-context-forge" rel="noopener noreferrer"&gt;IBM ContextForge&lt;/a&gt;, &lt;a href="https://github.com/microsoft/mcp-gateway" rel="noopener noreferrer"&gt;Microsoft's MCP Gateway&lt;/a&gt;, &lt;a href="https://www.lasso.security/" rel="noopener noreferrer"&gt;Lasso Security's MCP Gateway&lt;/a&gt; — all reasonable answers to the same wire-level question.&lt;/p&gt;

&lt;p&gt;Each one is competently designed for its own context. None of them speak the same policy.&lt;/p&gt;

&lt;p&gt;A team I talked to last month runs Claude Code for engineering workflows, the OpenAI Agents SDK for a customer-facing copilot, and two MCP servers wired into Cursor for ad-hoc database work. Their security review asked one question: &lt;em&gt;what can the agents do?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The honest answer required reading a &lt;code&gt;settings.json&lt;/code&gt;, a Python guardrail file, two MCP gateway configs in different YAML dialects, and an internal Confluence page describing the production &lt;code&gt;if&lt;/code&gt;-statements. Three audit trails in three formats. Three approval flows — one Slack bot, one PagerDuty escalation, and one human paging through the OpenAI trace viewer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkgk1430keud29tk0ebop.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkgk1430keud29tk0ebop.png" alt="Today: every toolchain invents its own safety layer" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The platforms are not the problem. The hook API is good. The guardrail API is good. The MCP proxy pattern is good. The problem is the seam between them — every team writes their own.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thesis
&lt;/h2&gt;

&lt;p&gt;JamJet is the action-control plane for AI agents. One policy file. One audit trail. Across hooks, guardrails, MCP gateways, SDKs, and custom runtimes.&lt;/p&gt;

&lt;p&gt;The portable layer underneath all of them is a single &lt;code&gt;policy.yaml&lt;/code&gt; schema and a single audit JSONL schema. Every adapter reads the same YAML, writes the same JSONL. &lt;code&gt;jamjet audit show&lt;/code&gt; tails the lot in one chronological view.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17l9nh3i3hyr1n5974cm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17l9nh3i3hyr1n5974cm.png" alt="JamJet: one policy file, every adapter, one audit log" width="800" height="509"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Phase 2 shipped five packages today: &lt;a href="https://www.npmjs.com/package/@jamjet/cloud" rel="noopener noreferrer"&gt;&lt;code&gt;@jamjet/cloud@0.3.0&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://www.npmjs.com/package/@jamjet/claude-code-hook" rel="noopener noreferrer"&gt;&lt;code&gt;@jamjet/claude-code-hook@0.1.0&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://www.npmjs.com/package/@jamjet/mcp-shim" rel="noopener noreferrer"&gt;&lt;code&gt;@jamjet/mcp-shim@0.1.0&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://www.npmjs.com/package/@jamjet/openai-guardrail" rel="noopener noreferrer"&gt;&lt;code&gt;@jamjet/openai-guardrail@0.1.0&lt;/code&gt;&lt;/a&gt;, and &lt;a href="https://www.npmjs.com/package/@jamjet/cli" rel="noopener noreferrer"&gt;&lt;code&gt;@jamjet/cli@0.1.0&lt;/code&gt;&lt;/a&gt; on npm; plus &lt;code&gt;jamjet 0.8.3&lt;/code&gt; on PyPI with &lt;code&gt;jamjet.integrations.openai_guardrail&lt;/code&gt; as the Python sister. Source at &lt;a href="https://github.com/jamjet-labs/jamjet-policy" rel="noopener noreferrer"&gt;jamjet-labs/jamjet-policy&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three adapters in one paragraph each
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;@jamjet/claude-code-hook&lt;/code&gt;&lt;/strong&gt; wires into Claude Code's &lt;code&gt;PreToolUse&lt;/code&gt; hook. One hook entry in &lt;code&gt;~/.claude/settings.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
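&lt;pre class="highlight json-doc"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"PreToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"jamjet-hook --policy ~/.jamjet/policy.yaml"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;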
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every tool call — native or MCP — runs through the policy before Claude Code invokes it. What it does: enforce, audit, and surface approval prompts as blocks in v0.1. What it does &lt;em&gt;not&lt;/em&gt; do: replace Claude Code's own hook system. It is the hook.&lt;/p&gt;
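
&lt;p&gt;For readers who have not written a Claude Code hook before, the contract is small enough to sketch. This is a minimal illustration, not the source of &lt;code&gt;@jamjet/claude-code-hook&lt;/code&gt;. It assumes the hook receives a JSON payload with a &lt;code&gt;tool_name&lt;/code&gt; field on stdin and that exit code 2 blocks the call, as Claude Code's hook documentation describes. The &lt;code&gt;BLOCKED_PATTERNS&lt;/code&gt; list and the &lt;code&gt;decide&lt;/code&gt; and &lt;code&gt;run_hook&lt;/code&gt; helpers are hypothetical names.&lt;/p&gt;

```python
import fnmatch
import io
import json

# Hypothetical stand-in for rules that would be loaded from policy.yaml.
BLOCKED_PATTERNS = ["*delete*", "*shell*"]


def decide(payload):
    """Return (exit_code, message) for one PreToolUse payload."""
    tool_name = payload.get("tool_name", "")
    for pattern in BLOCKED_PATTERNS:
        if fnmatch.fnmatchcase(tool_name.lower(), pattern):
            # Exit code 2 asks Claude Code to block the tool call.
            return 2, "policy: BLOCKED (rule: " + pattern + ")"
    return 0, ""


def run_hook(stdin, stderr):
    """Read one JSON payload, write the reason to stderr, return the exit code."""
    exit_code, message = decide(json.load(stdin))
    if message:
        stderr.write(message + "\n")
    return exit_code


# Demo with in-memory streams instead of the real stdin and stderr.
demo_err = io.StringIO()
demo_code = run_hook(io.StringIO('{"tool_name": "bash.shell_exec"}'), demo_err)
print(demo_code, demo_err.getvalue().strip())
```

&lt;p&gt;A real hook script would end with &lt;code&gt;sys.exit(run_hook(sys.stdin, sys.stderr))&lt;/code&gt; so that the exit code reaches Claude Code.&lt;/p&gt;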

&lt;p&gt;&lt;strong&gt;&lt;code&gt;@jamjet/mcp-shim&lt;/code&gt;&lt;/strong&gt; sits between an MCP client (Claude Desktop, Cursor, an OpenAI Agents SDK MCP client) and any MCP server. You swap the server's &lt;code&gt;command&lt;/code&gt; for the shim, pass the policy path, and put the real server after &lt;code&gt;--&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json-doc"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"postgres"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@jamjet/mcp-shim"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--policy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~/.jamjet/policy.yaml"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"--server"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"postgres"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"postgres-mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--db"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"postgresql://localhost/mydb"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The shim relays MCP traffic transparently. On a blocked &lt;code&gt;tools/call&lt;/code&gt;, it returns a JSON-RPC error to the client — and the real MCP server never sees the request. What it does &lt;em&gt;not&lt;/em&gt; do: replace the MCP protocol. It speaks MCP on both ends.&lt;/p&gt;
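
&lt;p&gt;The blocked response has the same shape as the JSON-RPC error shown at the top of this post. A rough sketch of how a proxy could build it follows. The &lt;code&gt;blocked_response&lt;/code&gt; helper is a hypothetical name, and the error code and &lt;code&gt;data&lt;/code&gt; fields are copied from that sample output, not from the shim's source.&lt;/p&gt;

```python
import json


def blocked_response(request, rule, audit_file):
    """Build the JSON-RPC error a proxy could return for a blocked tools/call."""
    return {
        "jsonrpc": "2.0",
        # Echo the request id so the client can pair the error with its call.
        "id": request["id"],
        "error": {
            "code": -32000,
            "message": "JamJet policy: BLOCKED (rule: " + rule + ")",
            "data": {"tool": request["params"]["name"], "audit": audit_file},
        },
    }


# An MCP tools/call request carries the tool name in params.name.
request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {"name": "postgres.delete_all_rows", "arguments": {}},
}
response = blocked_response(request, "*delete*", "mcp-shim.jsonl")
print(json.dumps(response, indent=2))
```

&lt;p&gt;Because the proxy answers the request itself, nothing is forwarded, and the client sees an ordinary JSON-RPC failure for that one call.&lt;/p&gt;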

&lt;p&gt;&lt;strong&gt;&lt;code&gt;@jamjet/openai-guardrail&lt;/code&gt;&lt;/strong&gt; (and its Python sister, &lt;code&gt;jamjet.integrations.openai_guardrail&lt;/code&gt;) plugs into the OpenAI Agents SDK's &lt;code&gt;inputGuardrails&lt;/code&gt; API. One line on a tool definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@openai/agents&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;jamjetGuardrail&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@jamjet/openai-guardrail&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;refund&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;payments.refund&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;inputGuardrails&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;jamjetGuardrail&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;policy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;~/.jamjet/policy.yaml&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})],&lt;/span&gt;
  &lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;refundCustomer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Blocks throw &lt;code&gt;JamjetPolicyBlocked&lt;/code&gt;. Approval-required calls throw &lt;code&gt;JamjetApprovalRequired&lt;/code&gt; in v0.1 — the SDK aborts the run, audit gets written, and the run id is recoverable. What it does &lt;em&gt;not&lt;/em&gt; do: replace the SDK's tripwire pattern. It is a tripwire.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffudaewp3o7b3laikm5bg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffudaewp3o7b3laikm5bg.png" alt="JamJet plugs into the extension points your tools already give you" width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The unified policy and audit
&lt;/h2&gt;

&lt;p&gt;The policy file every adapter reads:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*delete*"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt;   &lt;span class="nv"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;block&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shell.exec"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt;  &lt;span class="nv"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;block&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payments.*"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt;  &lt;span class="nv"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;require_approval&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;database.read_*"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;allow&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
&lt;span class="na"&gt;audit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;~/.jamjet/audit&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Glob match. Four actions: &lt;code&gt;allow&lt;/code&gt;, &lt;code&gt;block&lt;/code&gt;, &lt;code&gt;require_approval&lt;/code&gt;, &lt;code&gt;audit&lt;/code&gt;. Same shape in every adapter.&lt;/p&gt;
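
&lt;p&gt;To make the matching concrete, here is a small sketch of glob-based rule evaluation using Python's standard &lt;code&gt;fnmatch&lt;/code&gt; module. It is an illustration of the idea, not JamJet's evaluator: first-match-wins ordering and a default of &lt;code&gt;allow&lt;/code&gt; when nothing matches are assumptions, and &lt;code&gt;evaluate&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
import fnmatch

# The rules from the policy file above, written out as Python data.
RULES = [
    ("*delete*", "block"),
    ("shell.exec", "block"),
    ("payments.*", "require_approval"),
    ("database.read_*", "allow"),
]


def evaluate(tool_name, rules=RULES, default="allow"):
    """Return (action, matched_pattern) for a tool name.

    Assumes the first matching rule wins and that unmatched tools
    fall through to the default action.
    """
    for pattern, action in rules:
        if fnmatch.fnmatchcase(tool_name, pattern):
            return action, pattern
    return default, None


for name in ["payments.refund", "postgres.delete_rows", "database.read_rows", "fs.read_file"]:
    print(name, evaluate(name))
```

&lt;p&gt;Note that &lt;code&gt;*&lt;/code&gt; in &lt;code&gt;fnmatch&lt;/code&gt; also matches dots, so &lt;code&gt;*delete*&lt;/code&gt; catches a tool in any namespace.&lt;/p&gt;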

&lt;p&gt;The audit log every adapter writes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ jamjet audit show
v 2026-05-11T10:14:02Z  claude-code-hook    fs.read_file           ALLOWED
x 2026-05-11T10:14:18Z  claude-code-hook    bash.shell_exec        BLOCKED               shell.exec
x 2026-05-11T10:21:47Z  mcp-shim            postgres.delete_rows   BLOCKED               *delete*
~ 2026-05-11T10:33:11Z  openai-guardrail    payments.refund        WAITING_FOR_APPROVAL  payments.*
v 2026-05-11T10:41:55Z  python-sdk          customers.search       ALLOWED
x 2026-05-11T10:52:09Z  openai-guardrail    db.drop_table          BLOCKED               *delete*
v 2026-05-11T11:07:33Z  mcp-shim            github.list_issues     ALLOWED
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh2qhbkjka6rzb96759gl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh2qhbkjka6rzb96759gl.png" alt="Audit unification: one CLI tails every adapter" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Four files in &lt;code&gt;~/.jamjet/audit/2026-05-11/&lt;/code&gt;, one row per decision, sorted by timestamp. Pending approvals live in &lt;code&gt;~/.jamjet/pending/&amp;lt;run-id&amp;gt;.json&lt;/code&gt; and clear via &lt;code&gt;jamjet approve &amp;lt;run-id&amp;gt;&lt;/code&gt; or &lt;code&gt;jamjet reject &amp;lt;run-id&amp;gt;&lt;/code&gt;. The audit format is documented in the &lt;a href="https://github.com/jamjet-labs/jamjet-policy/tree/main/conformance" rel="noopener noreferrer"&gt;conformance spec&lt;/a&gt;, and the v1 schema is what each adapter is tested against in CI.&lt;/p&gt;
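
&lt;p&gt;Since each adapter writes its own JSONL file, a unified view is a merge sorted by timestamp. The sketch below shows that idea under stated assumptions: the field names (&lt;code&gt;ts&lt;/code&gt;, &lt;code&gt;adapter&lt;/code&gt;, &lt;code&gt;tool&lt;/code&gt;, &lt;code&gt;decision&lt;/code&gt;) are hypothetical, the real schema is the one in the conformance spec, and &lt;code&gt;load_audit_rows&lt;/code&gt; is not part of the CLI.&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path


def load_audit_rows(day_dir):
    """Read every *.jsonl file in one day's audit directory, oldest row first."""
    rows = []
    for path in sorted(Path(day_dir).glob("*.jsonl")):
        for line in path.read_text().splitlines():
            if line.strip():
                rows.append(json.loads(line))
    # UTC timestamps in one fixed ISO 8601 format sort correctly as strings.
    return sorted(rows, key=lambda row: row["ts"])


# Demo with two throwaway files; the field names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    day = Path(tmp)
    (day / "mcp-shim.jsonl").write_text(
        json.dumps({"ts": "2026-05-11T10:21:47Z", "adapter": "mcp-shim",
                    "tool": "postgres.delete_rows", "decision": "BLOCKED"}) + "\n"
    )
    (day / "claude-code-hook.jsonl").write_text(
        json.dumps({"ts": "2026-05-11T10:14:02Z", "adapter": "claude-code-hook",
                    "tool": "fs.read_file", "decision": "ALLOWED"}) + "\n"
    )
    merged = load_audit_rows(day)

for row in merged:
    print(row["ts"], row["adapter"], row["tool"], row["decision"])
```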

&lt;p&gt;This is the part of the launch we are most willing to defend: the answer to "what can the agents do?" now lives in one place, and you read it once.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's honest
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Each adapter is at &lt;strong&gt;v0.1&lt;/strong&gt;. The policy YAML and audit JSONL shapes are committed to v1 and covered by conformance tests across all four adapters. Adapter-specific options will evolve in minor versions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approval surfaces as exceptions or blocks in v0.1&lt;/strong&gt; for hook, guardrail, and Python adapters. The filesystem flow works end-to-end today — &lt;code&gt;jamjet approve &amp;lt;run-id&amp;gt;&lt;/code&gt; flips a pending file and the next run unblocks. SDK-integrated approval (the OpenAI Agents SDK approval API, Claude Code's native settings surface) and a web UI both land with JamJet Cloud sync in v0.2.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP shim is stdio only in v0.1.&lt;/strong&gt; HTTP/SSE MCP transports land in Phase 3 alongside the Java/Spring adapter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JamJet Cloud sync&lt;/strong&gt; — shared team policies, cloud audit retention, signed approvals — is the v0.2 milestone. Today's flow is local-only by design, so nothing leaves the developer's machine unless you opt into Cloud.&lt;/li&gt;
&lt;li&gt;One Phase 1 line still applies: the demo agent prompts are real, the enforcement path is real, the audit is real. Pre-baked deterministic agents are clearly labelled as such.&lt;/li&gt;
&lt;/ul&gt;
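
&lt;p&gt;The filesystem approval flow mentioned in the list above can be pictured as a small state file per run. This sketch only illustrates the pattern. The JSON fields and the helper names &lt;code&gt;request_approval&lt;/code&gt;, &lt;code&gt;resolve&lt;/code&gt;, and &lt;code&gt;is_approved&lt;/code&gt; are hypothetical, and the real pending-file format may differ.&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path


def request_approval(pending_dir, run_id, tool):
    """Record a pending approval and return the file that tracks it."""
    path = Path(pending_dir) / (run_id + ".json")
    path.write_text(json.dumps({"run_id": run_id, "tool": tool, "status": "pending"}))
    return path


def resolve(pending_dir, run_id, status):
    """Flip a pending file to "approved" or "rejected"."""
    path = Path(pending_dir) / (run_id + ".json")
    record = json.loads(path.read_text())
    record["status"] = status
    path.write_text(json.dumps(record))


def is_approved(pending_dir, run_id):
    """True only if the run has a pending file that was approved."""
    path = Path(pending_dir) / (run_id + ".json")
    if not path.exists():
        return False
    return json.loads(path.read_text())["status"] == "approved"


# Demo in a temporary directory standing in for the pending directory.
with tempfile.TemporaryDirectory() as tmp:
    request_approval(tmp, "run-123", "payments.refund")
    before = is_approved(tmp, "run-123")
    resolve(tmp, "run-123", "approved")
    after = is_approved(tmp, "run-123")

print(before, after)
```

&lt;p&gt;An adapter following this pattern would check the file on the next run and let the call through only once it reads as approved.&lt;/p&gt;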

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Claude Code hook:&lt;/span&gt;
npm i &lt;span class="nt"&gt;-g&lt;/span&gt; @jamjet/claude-code-hook

&lt;span class="c"&gt;# MCP shim (zero-install):&lt;/span&gt;
npx &lt;span class="nt"&gt;-y&lt;/span&gt; @jamjet/mcp-shim &lt;span class="nt"&gt;--help&lt;/span&gt;

&lt;span class="c"&gt;# OpenAI Agents SDK guardrail (TS):&lt;/span&gt;
pnpm add @jamjet/openai-guardrail
&lt;span class="c"&gt;# or Python:&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;jamjet  &lt;span class="c"&gt;# includes jamjet.integrations.openai_guardrail&lt;/span&gt;

&lt;span class="c"&gt;# Unified CLI:&lt;/span&gt;
npm i &lt;span class="nt"&gt;-g&lt;/span&gt; @jamjet/cli
jamjet audit show
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Star &lt;a href="https://github.com/jamjet-labs/jamjet-policy" rel="noopener noreferrer"&gt;jamjet-labs/jamjet-policy&lt;/a&gt; — the Phase 2 monorepo.&lt;/li&gt;
&lt;li&gt;Read the &lt;a href="https://dev.to/blog/blocking-unsafe-ai-tool-calls/"&gt;Phase 1 launch post&lt;/a&gt; for the deeper argument about why the runtime, not the model, is the safety boundary.&lt;/li&gt;
&lt;li&gt;Join the &lt;a href="https://discord.gg/SAYnEj86fr" rel="noopener noreferrer"&gt;JamJet Discord&lt;/a&gt; to talk through your toolchain — we want to know which extension points to plug into next.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Phase 3 is the Java/Spring adapter, MCP HTTP/SSE transport, and JamJet Cloud sync. Same policy. More surfaces.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>opensource</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Your AI agent already emits OpenTelemetry. Why aren't you watching it?</title>
      <dc:creator>Sunil Prakash</dc:creator>
      <pubDate>Sat, 09 May 2026 02:16:14 +0000</pubDate>
      <link>https://dev.to/sunilprakash/your-ai-agent-already-emits-opentelemetry-why-arent-you-watching-it-b06</link>
      <guid>https://dev.to/sunilprakash/your-ai-agent-already-emits-opentelemetry-why-arent-you-watching-it-b06</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: Spring AI, LangChain4j, Koog (Kotlin), the Python OpenLLMetry-style instrumentations, and the Go OTel SDKs all emit &lt;code&gt;gen_ai.*&lt;/code&gt; spans natively now. So you don't need a vendor SDK to make your agent observable; you need an OTLP endpoint that knows what to do with the spans your framework is already putting on the wire. Here's what that looks like in a few lines of YAML or one Kotlin extension.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Someone on your team shipped an LLM agent two months ago. Today it ran up a $400 bill in twenty minutes, hallucinated a refund policy to a real customer, and got stuck in a tool-calling loop that retried the same broken &lt;code&gt;payments.create&lt;/code&gt; call seventeen times before the rate limiter caught it.&lt;/p&gt;

&lt;p&gt;You'd like to know which of those things happened first, which agent was responsible (you're up to four now), what the user typed, and why the planner decided that calling the payments API was a reasonable response to "how do I unsubscribe."&lt;/p&gt;

&lt;p&gt;If you're observing your agents the same way you observe your other services, you can answer maybe two of those questions and only after a long Slack thread with the engineer who wrote the prompt. The trace span called &lt;code&gt;POST /chat&lt;/code&gt; doesn't help you. Neither does the metric for p99 latency on &lt;code&gt;/v1/agent/run&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This post is about why that gap exists, why it's about to close, and what to do about it today.&lt;/p&gt;

&lt;h2&gt;
  
  
  The agent observability gap
&lt;/h2&gt;

&lt;p&gt;Two existing approaches sort of work and mostly don't:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generic APM (Datadog, New Relic, Honeycomb-as-default-config)&lt;/strong&gt; treats your agent like any other HTTP service. You get latency histograms, error rates, and a top-level span. You don't get the prompt, the model, the token counts, the tool calls, or the cost. The signal is buried under "request body" or not captured at all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vendor LLM-observability SDKs (Langfuse, Helicone, Phoenix, the proprietary ones)&lt;/strong&gt; capture all the right signals but ship as a heavy SDK that you bolt onto your service. Every framework upgrade is now a coordination problem. Every backend switch is a rewrite. And the more frameworks your stack uses (Spring AI for the orchestrator, LangChain4j for the RAG service, Koog for that one Kotlin pilot), the more SDKs you carry.&lt;/p&gt;

&lt;p&gt;Neither is the right shape. The right shape is: &lt;strong&gt;your framework emits standard signal, your backend understands standard signal, you change zero application code.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Until recently that wasn't possible. The OpenTelemetry community had been working on the &lt;code&gt;gen_ai.*&lt;/code&gt; semantic conventions for a year, but framework support was uneven and the conventions kept shifting.&lt;/p&gt;

&lt;p&gt;That changed in the last six months. Concretely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Spring AI 1.0&lt;/strong&gt; emits &lt;code&gt;gen_ai.client.chat&lt;/code&gt;, &lt;code&gt;gen_ai.tool.execute&lt;/code&gt;, and friends via Micrometer Observations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangChain4j&lt;/strong&gt; emits the same via its &lt;code&gt;ChatModelListener&lt;/code&gt; API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Koog 0.8&lt;/strong&gt; ships a first-class OpenTelemetry feature with &lt;code&gt;addDatadogExporter&lt;/code&gt;, &lt;code&gt;addLangfuseExporter&lt;/code&gt;, &lt;code&gt;addWeaveExporter&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python OpenLLMetry / OpenInference&lt;/strong&gt; instrumentations (Anthropic, OpenAI, LangChain, LlamaIndex) emit the same conventions and stream through standard OTel exporters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Go's &lt;code&gt;otel-instrumentation-genai&lt;/code&gt;&lt;/strong&gt; is in alpha with the same shape.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words: &lt;strong&gt;the signal is on the wire, in standard form, regardless of which framework your team picked.&lt;/strong&gt; What's missing is a backend that does something useful with it.&lt;/p&gt;
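
&lt;p&gt;As one example of "something useful", a backend can turn token counts on a span into a cost estimate. The sketch below assumes the span carries the &lt;code&gt;gen_ai.request.model&lt;/code&gt;, &lt;code&gt;gen_ai.usage.input_tokens&lt;/code&gt;, and &lt;code&gt;gen_ai.usage.output_tokens&lt;/code&gt; attributes from the OpenTelemetry GenAI semantic conventions. The price table, the model name, and the &lt;code&gt;span_cost&lt;/code&gt; helper are hypothetical placeholders, not real pricing or any vendor's API.&lt;/p&gt;

```python
# Hypothetical prices in USD per million tokens; not real vendor pricing.
PRICES = {"example-model": {"input": 3.0, "output": 15.0}}


def span_cost(attributes, prices=PRICES):
    """Estimate the USD cost of one gen_ai span from its token counts."""
    model = attributes.get("gen_ai.request.model")
    price = prices.get(model)
    if price is None:
        return None  # Unknown model: report no cost instead of guessing.
    input_tokens = attributes.get("gen_ai.usage.input_tokens", 0)
    output_tokens = attributes.get("gen_ai.usage.output_tokens", 0)
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


span = {
    "gen_ai.request.model": "example-model",
    "gen_ai.usage.input_tokens": 1200,
    "gen_ai.usage.output_tokens": 300,
}
print(span_cost(span))
```

&lt;p&gt;Summing this per trace or per agent is what lets a backend answer the "which agent ran up the bill" question without any change to application code.&lt;/p&gt;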

&lt;h2&gt;
  
  
  The four-line Spring AI version
&lt;/h2&gt;

&lt;p&gt;Stock Spring Boot 3.5 + Spring AI 1.0. No vendor SDK on the classpath. Just the standard OTel pieces (&lt;code&gt;micrometer-tracing-bridge-otel&lt;/code&gt;, &lt;code&gt;opentelemetry-exporter-otlp&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;application.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;management&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;tracing&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${JAMJET_API_URL}/v1/otlp/v1/traces&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;authorization&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;${JAMJET_API_KEY}"&lt;/span&gt;
  &lt;span class="na"&gt;tracing&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;sampling&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;probability&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Spring AI's &lt;code&gt;gen_ai.client.chat&lt;/code&gt; spans get serialized as standard OTLP/HTTP-protobuf and posted to a JamJet Cloud project. Demo: &lt;a href="https://github.com/jamjet-labs/jamjet-runtime-java/tree/main/examples/spring-ai-engram-cloud-demo" rel="noopener noreferrer"&gt;jamjet-runtime-java/examples/spring-ai-engram-cloud-demo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The same pattern works against any OTLP-aware backend. The endpoint URL is the only thing that varies.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Kotlin Koog one-liner
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nc"&gt;AIAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;install&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OpenTelemetry&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;setServiceInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;serviceName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"memory-agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;serviceVersion&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1.0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;addJamjetCloudExporter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;// reads JAMJET_API_KEY from env&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;About 20 lines wrapping the standard &lt;code&gt;OtlpHttpSpanExporter&lt;/code&gt;. Demo: &lt;a href="https://github.com/jamjet-labs/jamjet-runtime-java/tree/main/examples/kotlin-koog-engram-cloud-demo" rel="noopener noreferrer"&gt;jamjet-runtime-java/examples/kotlin-koog-engram-cloud-demo&lt;/a&gt;. We've filed an upstream YouTrack issue proposing that this exporter be added to &lt;code&gt;agents-features-opentelemetry-jvm&lt;/code&gt; directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  And for Python / Go folks
&lt;/h2&gt;

&lt;p&gt;The same shape works:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OTEL_EXPORTER_OTLP_TRACES_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://api.jamjet.dev/v1/otlp/v1/traces"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OTEL_EXPORTER_OTLP_TRACES_HEADERS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"authorization=Bearer &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;JAMJET_API_KEY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Plus &lt;code&gt;pip install opentelemetry-instrumentation-anthropic&lt;/code&gt; (or &lt;code&gt;-openai&lt;/code&gt;, &lt;code&gt;-langchain&lt;/code&gt;, etc.) and a one-line &lt;code&gt;instrument()&lt;/code&gt; call. The OpenInference and OpenLLMetry projects each ship instrumentations for the major frameworks.&lt;/p&gt;
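
&lt;p&gt;If you want to sanity-check the header variable before starting your app, here is a minimal sketch in plain Python. It parses the comma-separated &lt;code&gt;key=value&lt;/code&gt; list that the OTLP header variables use. The helper name &lt;code&gt;parse_otlp_headers&lt;/code&gt; and the &lt;code&gt;demo-key&lt;/code&gt; fallback are hypothetical and not part of any OpenTelemetry package, and lower-casing the keys is my own choice for the example.&lt;/p&gt;

```python
import os


def parse_otlp_headers(raw):
    """Parse a comma-separated key=value list into a dict of headers.

    Hypothetical helper for illustration; it is not an OpenTelemetry API.
    """
    headers = {}
    for pair in raw.split(","):
        if "=" not in pair:
            continue  # skip empty or malformed entries
        key, value = pair.split("=", 1)
        headers[key.strip().lower()] = value.strip()
    return headers


# Fall back to a placeholder value so the sketch runs without any setup.
raw = os.environ.get(
    "OTEL_EXPORTER_OTLP_TRACES_HEADERS", "authorization=Bearer demo-key"
)
# Print only the header names so the API key never lands in a log.
print(sorted(parse_otlp_headers(raw)))
```

&lt;p&gt;The exporter does this parsing for you at startup. The sketch is only useful for checking that the variable is shaped the way you expect.&lt;/p&gt;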

&lt;p&gt;&lt;strong&gt;Go:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Standard &lt;code&gt;otelhttp&lt;/code&gt; for the LLM client wrapper, plus &lt;code&gt;otlptracehttp.New(...)&lt;/code&gt; configured to point at &lt;code&gt;/v1/otlp/v1/traces&lt;/code&gt;. The instrumentation surface is younger but moving fast.&lt;/p&gt;

&lt;p&gt;The point: &lt;strong&gt;once your framework speaks &lt;code&gt;gen_ai.*&lt;/code&gt; OTel, the only language-specific code you write is the exporter setup. And that's stock OTel boilerplate, not vendor-specific.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What you get on the other side
&lt;/h2&gt;

&lt;p&gt;The interesting question, which generic OTel backends won't help you with, is &lt;em&gt;what the receiver does with these spans&lt;/em&gt;. The signals an agent owner actually wants:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent network graph&lt;/strong&gt; — every cross-agent call is a node, edges show who-called-whom with cost and latency rolled up per edge. (W3C &lt;code&gt;traceparent&lt;/code&gt; plus a &lt;code&gt;jj&lt;/code&gt; &lt;code&gt;tracestate&lt;/code&gt; segment links agents across HTTP hops.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost rollups per agent / model / end-user&lt;/strong&gt; — computed server-side from &lt;code&gt;gen_ai.usage.*_tokens&lt;/code&gt; against current vendor pricing. No pricing table in your app.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failure-mode pie chart&lt;/strong&gt; — typed exception classification, not just "HTTP 5xx" buckets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-agent identity&lt;/strong&gt; — when a user request fans out across three agents, you see the same end-user-id stitched across all of them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy enforcement + audit export&lt;/strong&gt; — Ed25519-signed JSON+CSV+PDF audit packages for the SOC2-ish surface, OTLP-formatted exports for SIEM tools (Splunk, Datadog Logs).&lt;/li&gt;
&lt;/ul&gt;
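
&lt;p&gt;To make the cost rollup concrete, here is a rough sketch of the idea in Python. It is not JamJet's implementation. It assumes each span arrives as a plain dict of attributes, uses a made-up &lt;code&gt;PRICES&lt;/code&gt; table with a placeholder model name, and groups by &lt;code&gt;gen_ai.agent.name&lt;/code&gt;, which is an assumption about how the agent is identified on the span.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical prices in USD per one million tokens. Real vendor pricing
# differs and changes over time; "model-a" is a placeholder name.
PRICES = {"model-a": {"input": 3.0, "output": 15.0}}


def rollup_cost(spans):
    """Sum estimated cost per agent from gen_ai.usage.* token counts."""
    totals = defaultdict(float)
    for attrs in spans:
        price = PRICES.get(attrs.get("gen_ai.request.model"))
        if price is None:
            continue  # unknown model: no price, so it is left out of the rollup
        cost = (
            attrs.get("gen_ai.usage.input_tokens", 0) * price["input"]
            + attrs.get("gen_ai.usage.output_tokens", 0) * price["output"]
        ) / 1_000_000
        totals[attrs.get("gen_ai.agent.name", "unknown")] += cost
    return dict(totals)


spans = [
    {
        "gen_ai.agent.name": "billing",
        "gen_ai.request.model": "model-a",
        "gen_ai.usage.input_tokens": 1000,
        "gen_ai.usage.output_tokens": 500,
    },
]
print(rollup_cost(spans))
```

&lt;p&gt;Because the token counts travel on the span, the pricing table can live entirely on the receiving side.&lt;/p&gt;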

&lt;p&gt;If "the safety layer behind your AI agents" is something you've been trying to articulate to your CTO, that's the shape we're building for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this architecture is the durable bet
&lt;/h2&gt;

&lt;p&gt;Three reasons a stock OTel exporter plus an LLM-aware backend will beat vendor SDKs over the next two years:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No SDK on the classpath / requirements.txt / go.mod.&lt;/strong&gt; The exporter is already in your app for HTTP tracing — you change one URL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend-portable.&lt;/strong&gt; Every line in those demos works against Honeycomb, Tempo, Jaeger, Datadog, or a self-hosted OTel collector. That's a real CTO-pitch argument when "vendor lock-in" comes up in the procurement review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The frameworks are doing the work.&lt;/strong&gt; Spring AI's Observation handlers, LangChain4j's listeners, Koog's OpenTelemetry feature, Python OpenLLMetry instrumentations — these aren't vendor projects. They're the framework's own contracts. Every release brings new signals for free.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Demos: &lt;a href="https://github.com/jamjet-labs/jamjet-runtime-java" rel="noopener noreferrer"&gt;github.com/jamjet-labs/jamjet-runtime-java&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Cloud sign-up: &lt;a href="https://jamjet.dev" rel="noopener noreferrer"&gt;jamjet.dev&lt;/a&gt; (free tier)&lt;/li&gt;
&lt;li&gt;Spring Boot starter on Maven Central (for users who want a richer in-process path than stock OTLP): &lt;code&gt;dev.jamjet:jamjet-cloud-spring-boot-starter:0.2.0&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're at &lt;strong&gt;Devoxx this week&lt;/strong&gt; and want to see this running against a real Spring AI or Koog agent, drop me a line; I'm happy to do a five-minute hallway walkthrough. If your stack is Python or Go and you'd like the equivalent demo for your language, that's the next post. Let me know in the comments which framework you use so I can pick the right starting point.&lt;/p&gt;

&lt;p&gt;If you've got opinions on what AI-agent observability &lt;em&gt;should&lt;/em&gt; look like, especially the bits I've glossed over (multi-tenancy, on-prem, BYO collector), the comments are open.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>java</category>
      <category>genai</category>
      <category>observability</category>
    </item>
    <item>
      <title>The State of Memory in Java AI Agents (April 2026)</title>
      <dc:creator>Sunil Prakash</dc:creator>
      <pubDate>Tue, 07 Apr 2026 17:12:08 +0000</pubDate>
      <link>https://dev.to/sunilprakash/the-state-of-memory-in-java-ai-agents-april-2026-13c6</link>
      <guid>https://dev.to/sunilprakash/the-state-of-memory-in-java-ai-agents-april-2026-13c6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This post was originally published on &lt;a href="https://jamjet.dev/blog/state-of-memory-java-ai-agents/" rel="noopener noreferrer"&gt;jamjet.dev&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you're building AI agents in Java today, your options for persistent memory range from "store the last 20 chat messages in Postgres" to "run a Python service in a sidecar container and call it over HTTP." There is no Java-native equivalent to Mem0, Zep, or Letta — the libraries Python developers reach for when they need real memory.&lt;/p&gt;

&lt;p&gt;This post is a tour of every option a Java developer has in April 2026, why most of them stop at chat history, what "real memory" should actually mean, and one library we shipped to fill the gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  The scenario every Java AI developer recognises
&lt;/h2&gt;

&lt;p&gt;You're building an AI agent in Spring Boot. Maybe it's a customer support copilot, maybe it's a coding assistant, maybe it's a research agent. You wire up Spring AI or LangChain4j, write a few tools, and the first conversation works.&lt;/p&gt;

&lt;p&gt;Then your user comes back the next day. The agent doesn't remember them. It doesn't remember they're allergic to peanuts. It doesn't remember they're working on the Acme migration. It doesn't remember they prefer verbose explanations. Every conversation starts from zero.&lt;/p&gt;

&lt;p&gt;You search for "Java AI agent memory" and end up with three kinds of results:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tutorials on how to store chat messages in Postgres&lt;/li&gt;
&lt;li&gt;Marketing pages for Mem0 and Zep — Python only&lt;/li&gt;
&lt;li&gt;GitHub issues asking why there's no Java SDK&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is the gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  "Memory" means three different things
&lt;/h2&gt;

&lt;p&gt;Before we tour the libraries, we need to be precise. There are at least three different things people mean when they say "agent memory":&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Conversation history.&lt;/strong&gt; The last N messages of the current session. Solved problem — every framework ships this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. State checkpointing.&lt;/strong&gt; Snapshots of agent execution state for resume and replay. Solved by LangGraph, Koog persistence, Temporal-style runtimes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Long-term knowledge memory.&lt;/strong&gt; Facts about the user, their preferences, their projects, their history — extracted from conversations, stored durably, retrievable across sessions, and de-conflicted when they change. This is what Mem0 and Zep do. &lt;strong&gt;It is not solved on the JVM.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The rest of this post is about the third one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What real memory needs
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fact extraction.&lt;/strong&gt; An LLM reads a conversation and pulls out discrete, atomic facts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conflict detection.&lt;/strong&gt; When a new fact contradicts an old one, the system invalidates the old fact.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid retrieval.&lt;/strong&gt; Vector + keyword + graph walk fused together.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporal reasoning.&lt;/strong&gt; Facts have validity windows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token-budgeted context assembly.&lt;/strong&gt; Pick which facts go in the prompt and respect the budget.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decay and consolidation.&lt;/strong&gt; Stale facts fade, frequent facts get promoted, duplicates merge.&lt;/li&gt;
&lt;/ul&gt;
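
&lt;p&gt;Conflict detection and validity windows are easier to picture with a toy example. The sketch below is a deliberate simplification in Python: it assumes every fact is a single-valued (subject, attribute) pair and uses integer timestamps. Production systems typically detect conflicts with embeddings and an LLM instead. The &lt;code&gt;Fact&lt;/code&gt; and &lt;code&gt;FactStore&lt;/code&gt; names, and the move to Denver, are hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    subject: str
    attribute: str
    value: str
    valid_from: int                 # logical timestamp when the fact became true
    valid_to: Optional[int] = None  # None means the fact is still current


class FactStore:
    """Toy store: one current value per (subject, attribute) pair."""

    def __init__(self):
        self.facts = []

    def assert_fact(self, subject, attribute, value, now):
        """Record a fact, closing the window of any current fact it contradicts."""
        for fact in self.facts:
            same_slot = fact.subject == subject and fact.attribute == attribute
            if same_slot and fact.valid_to is None:
                if fact.value == value:
                    return fact      # already known, nothing to change
                fact.valid_to = now  # contradicted: invalidate, do not delete
        new_fact = Fact(subject, attribute, value, valid_from=now)
        self.facts.append(new_fact)
        return new_fact

    def current(self, subject):
        """Return the facts about a subject that are still valid."""
        return {
            f.attribute: f.value
            for f in self.facts
            if f.subject == subject and f.valid_to is None
        }


store = FactStore()
store.assert_fact("alice", "city", "Austin", now=1)
store.assert_fact("alice", "allergy", "peanuts", now=1)
store.assert_fact("alice", "city", "Denver", now=5)  # supersedes Austin
print(store.current("alice"))
```

&lt;p&gt;The old fact is kept with a closed validity window instead of being deleted, which is what lets a memory layer answer "where did Alice live before?" later on.&lt;/p&gt;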

&lt;h2&gt;
  
  
  Tour: every option Java developers have today
&lt;/h2&gt;

&lt;h3&gt;
  
  
  LangChain4j ChatMemory
&lt;/h3&gt;

&lt;p&gt;Most popular JVM AI framework. Ships &lt;code&gt;ChatMemory&lt;/code&gt; interface with &lt;code&gt;MessageWindowChatMemory&lt;/code&gt; and &lt;code&gt;TokenWindowChatMemory&lt;/code&gt;. Persistence via developer-implemented &lt;code&gt;ChatMemoryStore&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;What it does: stores message objects, respects token/count limits. What it does not do: extract facts, deduplicate, retrieve semantically, reason about time. The docs are explicit — &lt;code&gt;ChatMemory&lt;/code&gt; is a container abstraction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Spring AI ChatMemory
&lt;/h3&gt;

&lt;p&gt;Shipped GA in 2025 with broad backend support: JDBC, Cassandra, Mongo, Neo4j, Cosmos DB. Three advisors plug it into &lt;code&gt;ChatClient&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;VectorStoreChatMemoryAdvisor&lt;/code&gt; is the closest thing to "semantic memory" — it indexes raw messages in your VectorStore. But it indexes raw messages, not extracted facts. No entity model, no relationship graph, no conflict detection.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google ADK for Java
&lt;/h3&gt;

&lt;p&gt;Ships 1.0.0 with two memory implementations: &lt;code&gt;InMemoryMemoryService&lt;/code&gt; (keyword matching only) and &lt;code&gt;VertexAiMemoryBankService&lt;/code&gt; (Vertex AI only). Memory Bank is excellent but Google Cloud-locked.&lt;/p&gt;

&lt;h3&gt;
  
  
  Koog (JetBrains)
&lt;/h3&gt;

&lt;p&gt;Kotlin-first framework with &lt;code&gt;AgentMemory&lt;/code&gt; storing facts by &lt;code&gt;Concept&lt;/code&gt;, &lt;code&gt;Subject&lt;/code&gt;, &lt;code&gt;Scope&lt;/code&gt;. Closest competitor on the "facts about subjects" axis.&lt;/p&gt;

&lt;p&gt;Two caveats: Java consumption is awkward, and GitHub issue JetBrains/koog#1001 documents that AgentMemory floods prompts as facts accumulate — no token budgeting.&lt;/p&gt;

&lt;h3&gt;
  
  
  Embabel
&lt;/h3&gt;

&lt;p&gt;Rod Johnson's JVM agent framework. Uses a blackboard pattern — shared state per agent run.&lt;/p&gt;

&lt;p&gt;Per the maintainers: &lt;em&gt;"in Embabel it's not about conversational memory so much as domain objects that are stored in the blackboard during the flow."&lt;/em&gt; Long-term memory is an explicit non-goal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mem0 Java SDK (the one that doesn't exist)
&lt;/h3&gt;

&lt;p&gt;The top Google result is &lt;code&gt;me.pgthinker:mem0-client-java&lt;/code&gt;, a community wrapper at version 0.1.3, last updated nine months ago, with 9 GitHub stars. It's a thin REST client requiring a Python Mem0 server alongside your JVM app.&lt;/p&gt;

&lt;p&gt;No official Mem0 Java client exists. Python and Node.js only.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zep Java SDK (also doesn't exist)
&lt;/h3&gt;

&lt;p&gt;Zep's official clients are Python, TypeScript, and Go. No Java SDK.&lt;/p&gt;

&lt;h3&gt;
  
  
  DIY (what most teams actually do)
&lt;/h3&gt;

&lt;p&gt;When Java teams need real memory today, they assemble:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Postgres + pgvector (or Qdrant) for embeddings&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;JdbcChatMemoryRepository&lt;/code&gt; for messages&lt;/li&gt;
&lt;li&gt;Custom advisor that calls an LLM to extract facts&lt;/li&gt;
&lt;li&gt;Custom retrieval layer combining vector and keyword search&lt;/li&gt;
&lt;li&gt;Nightly cron job for decay and dedup&lt;/li&gt;
&lt;li&gt;Custom token-budgeting in the prompt builder&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Roughly 1,500–3,000 lines of bespoke Java per team. Quietly diverges between projects. Rarely gets temporal reasoning right. Almost never gets consolidation right.&lt;/p&gt;
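
&lt;p&gt;For the retrieval layer in step 4, one common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below shows that technique in Python. It is an illustration of the general approach, not a claim about what any library in this post uses. The function name and the document ids are made up, and &lt;code&gt;k=60&lt;/code&gt; is a commonly used default.&lt;/p&gt;

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into a single ranking.

    Each id scores 1 / (k + position) in every list it appears in,
    and the scores are summed across lists.
    """
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    # Highest score first; ties are broken by id so the output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]   # hypothetical ids from an embedding search
keyword_hits = ["b", "d", "a"]  # hypothetical ids from a keyword search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

&lt;p&gt;RRF only needs ranks, not comparable scores, which is why it is a popular starting point when the two retrievers score on different scales.&lt;/p&gt;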

&lt;h2&gt;
  
  
  The pattern
&lt;/h2&gt;

&lt;p&gt;Every Java memory option lives in one of two boxes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chat history persistence&lt;/strong&gt; (LangChain4j, Spring AI core, Embabel)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State checkpointing&lt;/strong&gt; (LangGraph4j, Koog persistence)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing in between. No JVM-native library that does fact extraction + conflict resolution + temporal graph + hybrid retrieval + consolidation in one dependency.&lt;/p&gt;

&lt;p&gt;The Python ecosystem has had Mem0 since 2024 and Zep/Graphiti since early 2025. The Java ecosystem is roughly 18 months behind.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we built
&lt;/h2&gt;

&lt;p&gt;I run &lt;a href="https://jamjet.dev" rel="noopener noreferrer"&gt;JamJet&lt;/a&gt;. As we built our agent runtime, the memory gap kept showing up. So we built a memory layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engram&lt;/strong&gt; is a durable memory system that does the things on the list:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fact extraction from conversation messages via LLM&lt;/li&gt;
&lt;li&gt;Conflict detection — vector similarity threshold plus LLM resolution&lt;/li&gt;
&lt;li&gt;Hybrid retrieval — vector + SQLite FTS5 keyword + graph walk&lt;/li&gt;
&lt;li&gt;Temporal knowledge graph with validity windows&lt;/li&gt;
&lt;li&gt;Token-budgeted context assembly with three output formats&lt;/li&gt;
&lt;li&gt;5-operation consolidation engine: &lt;strong&gt;decay&lt;/strong&gt;, &lt;strong&gt;promote&lt;/strong&gt;, &lt;strong&gt;dedup&lt;/strong&gt;, &lt;strong&gt;summarize&lt;/strong&gt;, &lt;strong&gt;reflect&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;MCP server option&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Runs against SQLite by default. No Postgres, no Qdrant, no Neo4j, no Python sidecar.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;dev.jamjet.engram.EngramClient&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;dev.jamjet.engram.EngramConfig&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.List&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.Map&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;EngramClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;EngramConfig&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="o"&gt;()))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;      &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"I'm allergic to peanuts and live in Austin"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"assistant"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Got it, I'll remember that."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;),&lt;/span&gt;
        &lt;span class="s"&gt;"alice"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"what should I cook for dinner"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"alice"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"system_prompt"&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"text"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Maven Central:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;dev.jamjet&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;jamjet-sdk&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;0.4.3&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apache 2.0. Rust runtime published as &lt;code&gt;jamjet-engram&lt;/code&gt; on crates.io.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it doesn't do (yet)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;No Spring Boot auto-configuration yet (starter on roadmap)&lt;/li&gt;
&lt;li&gt;No JDBC backend (SQLite-first, Postgres in 0.5.x)&lt;/li&gt;
&lt;li&gt;No managed cloud option&lt;/li&gt;
&lt;li&gt;No published LongMemEval / DMR scores yet (benchmarks are running, and we won't cherry-pick the results)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;dev.jamjet&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;jamjet-sdk&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;0.4.3&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or run it as an MCP server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;jamjet-engram-server
engram serve &lt;span class="nt"&gt;--db&lt;/span&gt; memory.db
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/jamjet-labs/jamjet" rel="noopener noreferrer"&gt;github.com/jamjet-labs/jamjet&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;If you've been quietly rolling your own memory layer in Java, I'd love to hear what you ended up with. Reach out via GitHub issues or the comments below.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>java</category>
      <category>rust</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Introducing Agentic AI for Serious Engineers: A Free Practical Book + Code Repo</title>
      <dc:creator>Sunil Prakash</dc:creator>
      <pubDate>Fri, 27 Mar 2026 04:33:12 +0000</pubDate>
      <link>https://dev.to/sunilprakash/introducing-agentic-ai-for-serious-engineers-a-free-practical-book-code-repo-64g</link>
      <guid>https://dev.to/sunilprakash/introducing-agentic-ai-for-serious-engineers-a-free-practical-book-code-repo-64g</guid>
      <description>&lt;p&gt;Agentic AI is moving fast.&lt;/p&gt;

&lt;p&gt;There are new frameworks, new demos, new orchestration patterns, and new opinions almost every week. That is exciting, but it also creates a problem for engineers trying to build real systems.&lt;/p&gt;

&lt;p&gt;A lot of the material online is either too abstract, too framework-specific, too hype-heavy, or too focused on demo magic instead of production reality.&lt;/p&gt;

&lt;p&gt;I wanted something different.&lt;/p&gt;

&lt;p&gt;So I started writing &lt;strong&gt;Agentic AI for Serious Engineers&lt;/strong&gt;, a free practical book and code repo for engineers who want to build agent systems that are not just impressive in a demo but usable, testable, observable, and trustworthy in real environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I wrote it
&lt;/h2&gt;

&lt;p&gt;I kept running into the same gap.&lt;/p&gt;

&lt;p&gt;There is plenty of excitement around agents, but much less practical material on questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What actually makes a system agentic?&lt;/li&gt;
&lt;li&gt;When should you use an agent instead of a workflow?&lt;/li&gt;
&lt;li&gt;How should tools be designed so they are safe and reliable to call?&lt;/li&gt;
&lt;li&gt;What does good context design look like?&lt;/li&gt;
&lt;li&gt;How do you evaluate an agent system beyond “it seemed to work”?&lt;/li&gt;
&lt;li&gt;Where should human approval sit in the architecture?&lt;/li&gt;
&lt;li&gt;How do you think about reliability, security, and observability from the start?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are the questions this book tries to answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who this is for
&lt;/h2&gt;

&lt;p&gt;This book is for engineers and architects building real AI systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;backend engineers&lt;/li&gt;
&lt;li&gt;platform engineers&lt;/li&gt;
&lt;li&gt;staff+ engineers&lt;/li&gt;
&lt;li&gt;software architects&lt;/li&gt;
&lt;li&gt;technical leads&lt;/li&gt;
&lt;li&gt;data engineers working on production AI applications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It assumes you already understand software systems.&lt;/p&gt;

&lt;p&gt;It does &lt;strong&gt;not&lt;/strong&gt; assume you already know how to design agent systems well.&lt;/p&gt;

&lt;h2&gt;
  
  
  What makes it different
&lt;/h2&gt;

&lt;p&gt;This project is intentionally practical.&lt;/p&gt;

&lt;p&gt;It is not a catalog of trendy frameworks.&lt;br&gt;
It is not a collection of magical claims.&lt;br&gt;
It is not “plug in this library and everything works.”&lt;/p&gt;

&lt;p&gt;Instead, it focuses on the engineering questions that matter when systems move closer to production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tool contracts&lt;/li&gt;
&lt;li&gt;control flow&lt;/li&gt;
&lt;li&gt;context boundaries&lt;/li&gt;
&lt;li&gt;evaluation&lt;/li&gt;
&lt;li&gt;approval gates&lt;/li&gt;
&lt;li&gt;reliability&lt;/li&gt;
&lt;li&gt;hardening&lt;/li&gt;
&lt;li&gt;traceability&lt;/li&gt;
&lt;li&gt;operating tradeoffs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not just to help someone build an agent.&lt;/p&gt;

&lt;p&gt;The goal is to help them build one with better judgment.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s inside
&lt;/h2&gt;

&lt;p&gt;The repo currently includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;7 chapters&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;2 threaded end-to-end projects&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;per-chapter code&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;tests and eval-oriented structure&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;diagrams, principles, and roadmap&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;a free online version of the book&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some of the topics include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what “agentic” actually means&lt;/li&gt;
&lt;li&gt;tools, context, and the agent loop&lt;/li&gt;
&lt;li&gt;workflow-first design&lt;/li&gt;
&lt;li&gt;multi-agent systems&lt;/li&gt;
&lt;li&gt;human-in-the-loop architecture&lt;/li&gt;
&lt;li&gt;evaluating and hardening agents&lt;/li&gt;
&lt;li&gt;when not to use agents&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The kind of engineer I wrote this for
&lt;/h2&gt;

&lt;p&gt;I wrote this for the engineer who looks at agentic AI and thinks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is interesting.&lt;br&gt;&lt;br&gt;
But how do I build it in a way I can actually trust?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the center of gravity of the book.&lt;/p&gt;

&lt;p&gt;Not anti-agent.&lt;br&gt;&lt;br&gt;
Not pro-hype.&lt;br&gt;&lt;br&gt;
Just serious about engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Read it free
&lt;/h2&gt;

&lt;p&gt;You can read the book and explore the code here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/sunilp/agentic-ai" rel="noopener noreferrer"&gt;https://github.com/sunilp/agentic-ai&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you end up reading it, I’d genuinely love feedback on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what feels most useful&lt;/li&gt;
&lt;li&gt;what is still unclear&lt;/li&gt;
&lt;li&gt;what you’d want covered more deeply&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’m continuing to improve it, and I’d love for it to become a genuinely useful resource for engineers building the next generation of AI systems.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>softwareengineering</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Your AI CLI Writes Code. Mine Tells You What It'll Break.</title>
      <dc:creator>Sunil Prakash</dc:creator>
      <pubDate>Sun, 22 Mar 2026 15:18:05 +0000</pubDate>
      <link>https://dev.to/sunilprakash/your-ai-cli-writes-code-mine-tells-you-what-itll-break-296l</link>
      <guid>https://dev.to/sunilprakash/your-ai-cli-writes-code-mine-tells-you-what-itll-break-296l</guid>
      <description>&lt;p&gt;AI CLI tools are everywhere right now. Claude Code, Gemini CLI, GitHub Copilot in the terminal — they'll write your code, refactor your modules, even run your tests.&lt;/p&gt;

&lt;p&gt;But ask any of them: &lt;strong&gt;"If I rename this function, what breaks?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They'll scan the files they can see, make their best guess, and probably miss the SQL view that reads the column you're about to change. Or the Java batch job that calls your Python function through a stored procedure. Or the dbt model downstream of the table your migration is about to alter.&lt;/p&gt;

&lt;p&gt;That's not a knock on AI. It's just not what LLMs are built for. Dependency analysis needs &lt;strong&gt;deterministic static analysis&lt;/strong&gt;, not probabilistic text generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gap in every AI CLI
&lt;/h2&gt;

&lt;p&gt;Here's what I noticed building with these tools: they're incredible at &lt;em&gt;writing&lt;/em&gt; code but terrible at &lt;em&gt;understanding what already depends on it&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Ask Claude Code to "add retry logic to the HTTP client" — brilliant. Ask it "what will break if I change the response shape of &lt;code&gt;getUser&lt;/code&gt;" — it'll read a few files and give you a confident answer that misses half the callers.&lt;/p&gt;

&lt;p&gt;That's because LLMs work with whatever fits in context. Your codebase has thousands of files. The SQL stored procedure that calls your function through &lt;code&gt;EXEC&lt;/code&gt; isn't in the context window.&lt;/p&gt;

&lt;h2&gt;
  
  
  Static analysis that crosses language boundaries
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://jam.sunilprakash.com" rel="noopener noreferrer"&gt;Jam&lt;/a&gt; to fill this gap. It's a developer CLI with 40+ commands, but the one that keeps saving me is &lt;code&gt;jam trace --impact&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It uses tree-sitter to parse your entire workspace (TypeScript, Python, Java, and SQL), builds a SQLite index of every function, call site, import, and SQL column reference, and then gives you a deterministic answer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;jam trace updateBalance &lt;span class="nt"&gt;--impact&lt;/span&gt;

Impact Analysis &lt;span class="k"&gt;for &lt;/span&gt;updateBalance
───────────────────────────────────
Direct callers:
  PaymentService.processRefund&lt;span class="o"&gt;()&lt;/span&gt;  &lt;span class="o"&gt;[&lt;/span&gt;Java]
  BATCH_NIGHTLY_RECONCILE         &lt;span class="o"&gt;[&lt;/span&gt;SQL]

Column dependents:
  VIEW v_customer_summary   &lt;span class="o"&gt;(&lt;/span&gt;reads customer.balance&lt;span class="o"&gt;)&lt;/span&gt;
  PROC_MONTHLY_STATEMENT    &lt;span class="o"&gt;(&lt;/span&gt;reads customer.balance&lt;span class="o"&gt;)&lt;/span&gt;

Risk: HIGH — 2 callers across 2 languages, 2 column dependents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No hallucination. No "I think these files might be affected." Two callers in two languages, two column dependents, risk level HIGH. Deterministic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why LLMs can't do this (yet)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context window limits&lt;/strong&gt;: Your 200-file Java project with SQL migrations rarely fits in a context window in full, so the model reasons over a slice of it. Static analysis indexes everything.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cross-language boundaries&lt;/strong&gt;: A Java class running &lt;code&gt;EXEC update_user&lt;/code&gt; is calling a SQL stored procedure. An LLM sees a string. A tree-sitter-based index resolves it to a cross-language call.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Column-level tracking&lt;/strong&gt;: When a SQL view reads &lt;code&gt;customer.balance&lt;/code&gt; and your function writes to &lt;code&gt;customer.balance&lt;/code&gt;, that's a dependency. An LLM that never loaded the view can't track it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Determinism&lt;/strong&gt;: Ask an LLM the same question twice and you can get different answers. Ask &lt;code&gt;jam trace&lt;/code&gt; twice and you get the same graph. For impact analysis, you need guarantees, not guesses.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
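
&lt;p&gt;To make the cross-language and column-level points concrete, here is a minimal Python sketch. It is not Jam's implementation: Jam parses with tree-sitter, while this stand-in uses simplified regular expressions, and every function name and sample statement below is made up for illustration.&lt;/p&gt;

```python
import re

# Hypothetical, simplified patterns. A real indexer would walk a parse tree
# (for example from tree-sitter) instead of matching text with regexes.
STRING_LITERAL = re.compile(r'"([^"\\]*(?:\\.[^"\\]*)*)"')
EXEC_CALL = re.compile(r"\b(?:EXEC|CALL)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)
COLUMN_REF = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\.([A-Za-z_][A-Za-z0-9_]*)\b")
UPDATE_SET = re.compile(r"\bUPDATE\s+(\w+)\s+SET\s+(\w+)\s*=", re.IGNORECASE)


def find_procedure_calls(java_source):
    """Return SQL procedure names invoked via EXEC/CALL inside string literals."""
    calls = []
    # Embedded SQL lives inside double-quoted literals, so only look there.
    for literal in STRING_LITERAL.findall(java_source):
        calls.extend(EXEC_CALL.findall(literal))
    return calls


def find_column_reads(sql_source):
    """Return the set of qualified table.column references in a statement."""
    return {f"{table}.{column}" for table, column in COLUMN_REF.findall(sql_source)}


def find_column_writes(sql_source):
    """Return table.column for the first assignment of each UPDATE ... SET."""
    return {f"{table}.{column}" for table, column in UPDATE_SET.findall(sql_source)}


# Made-up sample inputs.
java = 'stmt.execute("EXEC update_user ?, ?");'
view = "CREATE VIEW v_customer_summary AS SELECT customer.id, customer.balance FROM customer"
writer = "UPDATE customer SET balance = balance - 10 WHERE id = 1"

procedure_calls = find_procedure_calls(java)
# A column that one statement writes and another reads is a dependency edge.
shared_columns = find_column_reads(view).intersection(find_column_writes(writer))

print(procedure_calls)         # ['update_user']
print(sorted(shared_columns))  # ['customer.balance']
```

&lt;p&gt;The regexes miss plenty (multi-column SET clauses, unqualified columns, dynamic SQL), which is exactly why a parser-backed index is the better foundation.&lt;/p&gt;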

&lt;h2&gt;
  
  
  Not replacing AI — complementing it
&lt;/h2&gt;

&lt;p&gt;Jam isn't anti-AI. It literally has AI built in — &lt;code&gt;jam ask&lt;/code&gt;, &lt;code&gt;jam go&lt;/code&gt; (agentic execution), &lt;code&gt;jam commit&lt;/code&gt; (AI-powered commit messages), &lt;code&gt;jam review&lt;/code&gt;. It works with Ollama, Copilot, OpenAI, Anthropic, and Groq.&lt;/p&gt;

&lt;p&gt;But for the question "what breaks if I change this?" — AI is the wrong tool. You wouldn't ask ChatGPT to run your test suite. You shouldn't ask it to trace your dependency graph either.&lt;/p&gt;

&lt;p&gt;The best workflow: use Claude Code or Gemini CLI to &lt;em&gt;write&lt;/em&gt; the change, then use &lt;code&gt;jam trace --impact&lt;/code&gt; to &lt;em&gt;verify&lt;/em&gt; the blast radius before you ship.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Trace any symbol's callers and callees&lt;/span&gt;
jam trace createProvider &lt;span class="nt"&gt;--depth&lt;/span&gt; 5

&lt;span class="c"&gt;# Get the full impact report&lt;/span&gt;
jam trace updateBalance &lt;span class="nt"&gt;--impact&lt;/span&gt;

&lt;span class="c"&gt;# Output as Mermaid diagram for docs&lt;/span&gt;
jam trace handleRequest &lt;span class="nt"&gt;--mermaid&lt;/span&gt;

&lt;span class="c"&gt;# JSON for CI/automation&lt;/span&gt;
jam trace processPayment &lt;span class="nt"&gt;--impact&lt;/span&gt; &lt;span class="nt"&gt;--json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tree-sitter parsing&lt;/strong&gt; — Builds ASTs for TypeScript, Python, Java, SQL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Symbol extraction&lt;/strong&gt; — Functions, classes, methods, imports, call sites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-language detection&lt;/strong&gt; — Java &lt;code&gt;EXEC&lt;/code&gt;/&lt;code&gt;CALL&lt;/code&gt; → SQL procedures. SQL column refs in SELECT/UPDATE/INSERT/DELETE&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQLite index&lt;/strong&gt; — Local database, fast graph queries, incremental updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Impact analysis&lt;/strong&gt; — Walks the graph upstream, finds column dependents, calculates risk&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The index rebuilds in seconds. No cloud. No API calls. Pure local static analysis.&lt;/p&gt;
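
&lt;p&gt;The last two steps can be sketched with nothing but Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. Treat this as an illustration under assumptions: the table names, column names, and the &lt;code&gt;RefundController.post&lt;/code&gt; row are hypothetical, and Jam's real schema isn't shown in this post.&lt;/p&gt;

```python
import sqlite3
from collections import deque

# Hypothetical schema for illustration only; not Jam's actual index layout.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE calls (caller TEXT, callee TEXT, language TEXT);
    CREATE TABLE column_reads (reader TEXT, column_name TEXT);
    CREATE TABLE column_writes (writer TEXT, column_name TEXT);
    """
)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        ("PaymentService.processRefund", "updateBalance", "Java"),
        ("BATCH_NIGHTLY_RECONCILE", "updateBalance", "SQL"),
        # Made-up transitive caller, added to show the upstream walk.
        ("RefundController.post", "PaymentService.processRefund", "Java"),
    ],
)
conn.execute("INSERT INTO column_writes VALUES (?, ?)", ("updateBalance", "customer.balance"))
conn.executemany(
    "INSERT INTO column_reads VALUES (?, ?)",
    [
        ("v_customer_summary", "customer.balance"),
        ("PROC_MONTHLY_STATEMENT", "customer.balance"),
        ("v_orders", "orders.total"),
    ],
)


def upstream_callers(conn, symbol):
    """Breadth-first walk from symbol to its direct and transitive callers."""
    seen = []
    queue = deque([symbol])
    while queue:
        current = queue.popleft()
        rows = conn.execute(
            "SELECT caller FROM calls WHERE callee = ? ORDER BY caller", (current,)
        ).fetchall()
        for (caller,) in rows:
            # Skip already-visited nodes so call cycles terminate.
            if caller != symbol and caller not in seen:
                seen.append(caller)
                queue.append(caller)
    return seen


def column_dependents(conn, symbol):
    """Find everything that reads a column the given symbol writes."""
    return conn.execute(
        """
        SELECT DISTINCT r.reader, r.column_name
        FROM column_writes AS w
        JOIN column_reads AS r ON r.column_name = w.column_name
        WHERE w.writer = ?
        ORDER BY r.reader
        """,
        (symbol,),
    ).fetchall()


print(upstream_callers(conn, "updateBalance"))
print(column_dependents(conn, "updateBalance"))
```

&lt;p&gt;Because both functions are plain queries over a fixed index with an explicit sort order, running them twice on the same index returns the same lists. Risk scoring is left out here, since the post doesn't spell out how Jam calculates it.&lt;/p&gt;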

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @sunilp-org/jam-cli
jam trace &amp;lt;any-function-name&amp;gt;
jam trace &amp;lt;any-function-name&amp;gt; &lt;span class="nt"&gt;--impact&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No API key needed for trace — it's pure static analysis. The AI features (ask, go, commit, review) auto-detect your provider.&lt;/p&gt;

&lt;p&gt;978 tests. MIT licensed. Works everywhere Node runs.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://github.com/sunilp/jam-cli" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://jam.sunilprakash.com" rel="noopener noreferrer"&gt;Website&lt;/a&gt; | &lt;a href="https://www.npmjs.com/package/@sunilp-org/jam-cli" rel="noopener noreferrer"&gt;npm&lt;/a&gt; | &lt;a href="https://marketplace.visualstudio.com/items?itemName=sunilp.jam-cli-vscode" rel="noopener noreferrer"&gt;VSCode Extension&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>developertools</category>
      <category>productivity</category>
      <category>cli</category>
    </item>
  </channel>
</rss>
