<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: bolt</title>
    <description>The latest articles on DEV Community by bolt (@bolty).</description>
    <link>https://dev.to/bolty</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3879828%2F8ca68d32-1ebf-48a8-b429-1459b620cbf6.png</url>
      <title>DEV Community: bolt</title>
      <link>https://dev.to/bolty</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bolty"/>
    <language>en</language>
    <item>
      <title>I caught my AI agent posting a customer's SSN to Slack. Here's what I built to stop it.</title>
      <dc:creator>bolt</dc:creator>
      <pubDate>Wed, 15 Apr 2026 07:07:04 +0000</pubDate>
      <link>https://dev.to/bolty/i-caught-my-ai-agent-posting-a-customers-ssn-to-slack-heres-what-i-built-to-stop-it-p44</link>
      <guid>https://dev.to/bolty/i-caught-my-ai-agent-posting-a-customers-ssn-to-slack-heres-what-i-built-to-stop-it-p44</guid>
      <description>&lt;p&gt;Two months ago I was testing a support automation I'd built with CrewAI. The agent reads Jira tickets, pulls customer context, and posts a summary to the team Slack channel. Simple workflow. The kind of thing every second startup is building right now.&lt;/p&gt;

&lt;p&gt;It worked perfectly until a ticket came through with a customer's Social Security number in the description. Someone on the support team had typed it in during a phone verification. The agent grabbed the full ticket, composed a nice summary, and tried to post it to Slack. Name, email, phone, SSN, credit card number. Everything. Headed to a channel with 40 people in it.&lt;/p&gt;

&lt;p&gt;I caught it because I was watching the terminal. If I'd been in a meeting, that SSN would be sitting in Slack right now. Searchable. Visible to everyone. Probably backed up to some compliance archive too.&lt;/p&gt;

&lt;p&gt;That was the moment I realized something was very wrong with how we deploy AI agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The gap nobody is talking about&lt;/strong&gt;&lt;br&gt;
There's a lot of work happening on LLM security right now. Prompt injection. Jailbreaking. Guardrails on model outputs. Important stuff.&lt;br&gt;
But almost nobody is looking at what happens after the LLM decides to call a tool.&lt;/p&gt;

&lt;p&gt;Your agent has an API token for Slack. It has credentials for Jira, for GitHub, for AWS. When the LLM decides to post a message or create a user or modify permissions, that API call goes straight through. No inspection. No filter. Nothing checking what's actually in the request body.&lt;/p&gt;

&lt;p&gt;Think about it. A Slack POST to &lt;code&gt;chat.postMessage&lt;/code&gt; looks identical at the HTTP level whether the body contains "meeting at 3pm" or a customer's SSN. The endpoint is the same. The method is the same. The token is valid. The only difference is what's inside the payload.&lt;br&gt;
And right now, for most agent deployments, nobody is reading the payload.&lt;/p&gt;
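&lt;p&gt;A toy illustration of that blind spot (nothing here is Interven code; the check is deliberately naive):&lt;/p&gt;

```python
# Two hypothetical Slack chat.postMessage bodies. At the HTTP level the
# method, path, and auth header are identical; only the payload differs,
# so an endpoint-only policy cannot tell them apart.
benign = {"channel": "#support", "text": "meeting at 3pm"}
leaky = {"channel": "#support", "text": "Customer SSN: 123-45-6789"}

def endpoint_only_check(method: str, path: str) -> bool:
    # This filter sees the method and endpoint, never the body.
    return (method, path) == ("POST", "/api/chat.postMessage")

# Both requests pass: benign and leaky payloads are indistinguishable
# to any control that stops at the endpoint level.
for body in (benign, leaky):
    assert endpoint_only_check("POST", "/api/chat.postMessage")
```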

&lt;p&gt;&lt;strong&gt;What I built&lt;/strong&gt;&lt;br&gt;
I spent a few months building Interven. It's an inline gateway that sits between your AI agents and whatever APIs they call.&lt;br&gt;
Instead of your agent calling Slack directly, it sends the request to Interven. Interven reads the payload, runs 14 security engines on it, and decides what to do.&lt;/p&gt;
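&lt;p&gt;Conceptually, the routing looks something like this sketch. Every name in it is invented for illustration (the real gateway's engines replace the one-line substring check):&lt;/p&gt;

```python
# Minimal sketch of the inline-gateway idea: the agent is configured with
# the gateway's base URL instead of the real API's, the gateway inspects
# the body, and only clean requests are forwarded upstream.
UPSTREAMS = {"slack": "https://slack.com/api"}  # illustrative routing table

def gateway_route(path: str, body: str) -> tuple[str, str]:
    tool, _, rest = path.lstrip("/").partition("/")
    if "123-45-6789" in body:  # stand-in for the real scanning engines
        return ("blocked", "")
    return ("forwarded", f"{UPSTREAMS[tool]}/{rest}")

assert gateway_route("/slack/chat.postMessage", "hi team") == \
    ("forwarded", "https://slack.com/api/chat.postMessage")
assert gateway_route("/slack/chat.postMessage", "SSN 123-45-6789")[0] == "blocked"
```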

&lt;p&gt;Those engines include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PII detection&lt;/strong&gt; that catches emails, phone numbers, SSNs, credit card numbers (including dashed and spaced formats). Ten patterns covering the most common PII types.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secrets scanning&lt;/strong&gt; that finds API keys, database passwords, tokens. If your agent accidentally includes an AWS access key in a Slack message, it gets caught.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Threat intelligence matching&lt;/strong&gt; against 170,000+ indicators from five feeds (OpenPhish, Feodo Tracker, and others), updated every hour. If your agent tries to share files with a known malicious domain, it gets flagged.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic intent analysis&lt;/strong&gt; with seven labels. If the agent's action looks like privilege escalation, data exfiltration, or social engineering, it gets flagged even if the payload data itself is clean.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-agent trust scoring&lt;/strong&gt; that adapts automatically. If an agent starts getting denied a lot, Interven tightens enforcement for that specific agent without affecting others.&lt;/li&gt;
&lt;/ul&gt;
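&lt;p&gt;To make the first two engines concrete, here is a deliberately tiny sketch of pattern-based scanning. These three regexes are illustrative stand-ins, not Interven's actual pattern set:&lt;/p&gt;

```python
import re

# Toy versions of the PII and secrets engines described above.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Credit cards in plain, dashed, or space-separated 4x4 groups:
CARD_RE = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")
# AWS access key IDs: "AKIA" followed by 16 uppercase letters/digits:
AWS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

def scan(text: str) -> list[str]:
    findings = []
    if SSN_RE.search(text):
        findings.append("pii:ssn")
    if CARD_RE.search(text):
        findings.append("pii:card")
    if AWS_KEY_RE.search(text):
        findings.append("secret:aws_key")
    return findings

assert scan("SSN is 123-45-6789") == ["pii:ssn"]
assert scan("card 4111 1111 1111 1111") == ["pii:card"]
assert scan("key AKIAIOSFODNN7EXAMPLE") == ["secret:aws_key"]
```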

&lt;p&gt;After all 14 engines run, Interven makes one of four decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ALLOW&lt;/strong&gt; if the request is clean. Forward it to the real API. The agent gets the response normally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DENY&lt;/strong&gt; if something dangerous was detected. Block the request entirely. Create a security incident automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SANITIZE&lt;/strong&gt; if the request contains PII but is otherwise fine. Strip the sensitive data, replace it with redaction tokens, and forward the clean version. The agent's work still gets done. The SSN just isn't in the message anymore.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;REQUIRE APPROVAL&lt;/strong&gt; if the request is risky but not clearly dangerous. Pause it and let a human decide.&lt;/li&gt;
&lt;/ul&gt;
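&lt;p&gt;The four-way outcome can be sketched as a small deterministic function. The thresholds below are invented for illustration and are not Interven's actual policy values:&lt;/p&gt;

```python
# Minimal sketch of the four-way decision, driven by a deterministic
# risk score plus a PII flag. Same input, same output, every time.
def decide(risk_score: float, has_pii: bool) -> str:
    if risk_score >= 0.7:
        return "DENY"              # block and open a security incident
    if risk_score >= 0.4:
        return "REQUIRE_APPROVAL"  # pause for a human decision
    if has_pii:
        return "SANITIZE"          # strip the PII, forward the rest
    return "ALLOW"                 # forward to the real API unchanged

assert decide(0.1, False) == "ALLOW"
assert decide(0.1, True) == "SANITIZE"
assert decide(0.5, False) == "REQUIRE_APPROVAL"
assert decide(0.9, True) == "DENY"
```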

&lt;p&gt;&lt;strong&gt;Why SANITIZE matters more than blocking&lt;/strong&gt;&lt;br&gt;
This was the design decision I spent the most time on and the one that has gotten the most positive feedback.&lt;br&gt;
When most security tools detect PII in a request, they block it. Request denied. Done.&lt;/p&gt;

&lt;p&gt;That works for attacks. But what about the support agent that's doing its job? If you block the Slack message entirely, the team doesn't get their summary. The workflow breaks. Someone has to write the summary manually. The agent becomes a liability instead of an asset.&lt;/p&gt;

&lt;p&gt;Sanitize is the middle ground. The support agent posts a summary with a customer's SSN in it. Interven catches the SSN, replaces it with &lt;code&gt;[REDACTED:pii:71992a34]&lt;/code&gt;, and forwards the clean message to Slack. The team gets the context they need. The customer's data stays safe. Nobody has to do anything.&lt;/p&gt;
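&lt;p&gt;A sketch of that substitution, in the spirit of the &lt;code&gt;[REDACTED:pii:71992a34]&lt;/code&gt; token shown above. The short-hash scheme is my illustrative assumption, not necessarily how Interven derives its token IDs:&lt;/p&gt;

```python
import hashlib
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    # Replace each SSN with a token carrying a short hash, so the same
    # value always redacts to the same token without revealing it.
    def token(m: re.Match) -> str:
        digest = hashlib.sha256(m.group().encode()).hexdigest()[:8]
        return f"[REDACTED:pii:{digest}]"
    return SSN_RE.sub(token, text)

msg = "Customer verified over the phone, SSN 123-45-6789, follow up tomorrow."
clean = redact(msg)
assert "123-45-6789" not in clean
assert "[REDACTED:pii:" in clean
```

The message still reads naturally in Slack; only the sensitive value is swapped out.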

&lt;p&gt;I tested this with a real Jira ticket and a real Slack workspace. The redacted message actually shows up in the Slack channel. The agent reports success. The team sees the summary. The SSN is gone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it looks like in practice&lt;/strong&gt;&lt;br&gt;
I recorded a 14-minute demo with three real scenarios. Everything uses real APIs, not mocks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 1: Code review agent.&lt;/strong&gt; A CrewAI agent reviews pull requests on a real GitHub repository. It posts a review comment (allowed), tries to create a PR (paused for human approval since it's a write operation), and tries to add an external collaborator with admin access (denied, risk score 0.59, flagged as privilege escalation).&lt;br&gt;
Three actions from one agent. One allowed, one required approval, one blocked. That's the point. Interven doesn't block everything. It makes policy-based decisions about each individual request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 2: Support agent.&lt;/strong&gt; A CrewAI agent reads a real Jira ticket (SCRUM-5, containing customer PII in the description) through the Interven gateway, then posts a summary to a real Slack channel. The SSN, email, and phone number are stripped. The clean message appears in Slack with redaction tokens. The team gets the summary. The PII never leaves the gateway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 3: Compromised agent.&lt;/strong&gt; A CrewAI agent with a malicious objective tries to steal credentials from Drive, post AWS access keys to Slack, escalate IAM privileges, share files with an external attacker domain, and access a known malicious GitHub repository. Five attacks across four tools. All blocked. The agent's trust score degrades automatically. One click to suspend it permanently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo&lt;/strong&gt;: &lt;a href="https://vimeo.com/1179128874" rel="noopener noreferrer"&gt;https://vimeo.com/1179128874&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The technical decisions that matter&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;No AI in the decision path.&lt;/strong&gt; Every decision Interven makes is deterministic. Pattern matching, policy evaluation, risk scoring. Same input, same output, every time. I did this on purpose. If you use an LLM to decide whether to block an LLM's actions, you've created something that can be prompt injected at the security layer. That defeats the purpose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Any REST API in 60 seconds.&lt;/strong&gt; You enter a tool name and base URL. If the API has an OpenAPI spec, Interven fetches it and discovers all operations automatically. I tested with Jira and it pulled in 620 operations from a single spec URL. Policies can target all operations, filter by HTTP method (block all writes), filter by category, or target specific operations.&lt;/p&gt;
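&lt;p&gt;Operation discovery plus method-level targeting can be sketched like this. The spec dict is a hand-written fragment for illustration, not Jira's real OpenAPI document:&lt;/p&gt;

```python
# Sketch: enumerate operations from an OpenAPI-style paths object, then
# apply a "block all writes" filter by HTTP method.
spec = {
    "paths": {
        "/rest/api/3/issue/{id}": {"get": {}, "put": {}, "delete": {}},
        "/rest/api/3/search": {"get": {}},
    }
}

def discover_operations(spec: dict) -> list[tuple[str, str]]:
    # Walk paths -> methods, the way a gateway might enumerate operations.
    return [(m.upper(), p) for p, ops in spec["paths"].items() for m in ops]

def block_writes(method: str) -> bool:
    # "Block all writes" expressed as a method filter.
    return method in {"POST", "PUT", "PATCH", "DELETE"}

ops = discover_operations(spec)
assert ("GET", "/rest/api/3/search") in ops
assert not block_writes("GET") and block_writes("PUT")
```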

&lt;p&gt;&lt;strong&gt;Per-agent, not per-tool.&lt;/strong&gt; Each agent has its own trust score, its own call history, its own risk profile. A compromised agent sees tighter enforcement without affecting the other agents in your system.&lt;/p&gt;
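&lt;p&gt;The shape of per-agent trust scoring, as a toy. The decay factor, recovery rate, and threshold here are invented for illustration:&lt;/p&gt;

```python
# Toy per-agent trust score that tightens enforcement after denials,
# without touching any other agent's score.
class AgentTrust:
    def __init__(self) -> None:
        self.score = 1.0

    def record(self, decision: str) -> None:
        if decision == "DENY":
            self.score *= 0.7  # each denial erodes trust multiplicatively
        else:
            self.score = min(1.0, self.score + 0.02)  # slow recovery

    def requires_approval(self) -> bool:
        # Low-trust agents get risky calls paused for a human.
        return self.score < 0.5

agent = AgentTrust()
for _ in range(3):
    agent.record("DENY")
assert agent.requires_approval()  # 0.7 ** 3 = 0.343, below the 0.5 bar
```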

&lt;p&gt;&lt;strong&gt;What's next&lt;/strong&gt;&lt;br&gt;
I'm looking for design partners. Teams that are running AI agents in production and want to see what Interven catches in their real environment. I'll set it up for free. I want to understand what patterns show up in production that I haven't seen in my test lab.&lt;br&gt;
If you're deploying agents with CrewAI, LangChain, n8n, or any framework, and you've ever wondered what your agents are actually sending to your APIs, I'd love to talk.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Demo (14 min, real APIs): &lt;a href="https://vimeo.com/1179128874" rel="noopener noreferrer"&gt;https://vimeo.com/1179128874&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Website: &lt;a href="https://intervensecurity.com" rel="noopener noreferrer"&gt;https://intervensecurity.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm happy to answer any technical questions in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>cybersecurity</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
