<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gary</title>
    <description>The latest articles on DEV Community by Gary (@garyzzz).</description>
    <link>https://dev.to/garyzzz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3967502%2Fa540d629-1cde-47cf-b98e-abbaef7f83f8.jpg</url>
      <title>DEV Community: Gary</title>
      <link>https://dev.to/garyzzz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/garyzzz"/>
    <language>en</language>
    <item>
      <title>Gate: a deterministic PII boundary between your data and AI agents</title>
      <dc:creator>Gary</dc:creator>
      <pubDate>Thu, 04 Jun 2026 07:22:13 +0000</pubDate>
      <link>https://dev.to/garyzzz/gate-a-deterministic-pii-boundary-between-your-data-and-ai-agents-1j4</link>
      <guid>https://dev.to/garyzzz/gate-a-deterministic-pii-boundary-between-your-data-and-ai-agents-1j4</guid>
      <description>&lt;h2&gt;
  
  
  The thing that should have been a 2 a.m. incident
&lt;/h2&gt;

&lt;p&gt;You wire a database tool into your coding agent. PostgreSQL, Databricks, an internal HTTP API — whatever. The agent is &lt;em&gt;useful&lt;/em&gt;. It joins tables you'd forgotten existed, drafts the migration, and writes the report. Productivity goes up.&lt;/p&gt;

&lt;p&gt;Two weeks later, you scroll back through a transcript and see this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; select * from users where signup_at &amp;gt; '2026-04-01' limit 5;

[{
  "id": 41021,
  "full_name": "Alice Johnson",
  "email": "alice.johnson@example.com",
  "phone": "+1 415-555-0142",
  "card_last_four": "4242",
  "ssn": "123-45-6789",
  "status": "active"
}, ...]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's now in the model's context. From there it's in the conversation log, in any summary the agent generates, in the file it just wrote to &lt;code&gt;/tmp&lt;/code&gt;, and — if the harness has any kind of memory or "share session" feature — potentially somewhere on someone else's machine. The agent didn't do anything wrong. Neither did you. The tool returned what you asked for, the model ingested it, life went on.&lt;/p&gt;

&lt;p&gt;This is the default. Every CLI client, MCP server, and &lt;code&gt;curl | jq&lt;/code&gt; pipeline returns the same bytes a human would see — and with agents, there's no human in the loop to triage what enters the model's window.&lt;/p&gt;

&lt;p&gt;This post is about a tool that fixes the leak at the layer where it can be fixed deterministically.&lt;/p&gt;

&lt;h2&gt;
  
  
  What &lt;code&gt;gate&lt;/code&gt; is
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/GaaraZhu/gate" rel="noopener noreferrer"&gt;&lt;code&gt;gate&lt;/code&gt;&lt;/a&gt; is a single Rust binary that sits between an AI agent and the data tools it calls. It intercepts the &lt;em&gt;output&lt;/em&gt; of configured commands, scans it for PII, and rewrites the values to typed placeholders before the bytes reach the model:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://gaarazhu.github.io/images/introducing-gate/demo.gif" rel="noopener noreferrer"&gt;See the demo on the original post&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;- {"id": 1, "email": "alice@example.com", "ssn": "123-45-6789", "status": "active"}
&lt;/span&gt;&lt;span class="gi"&gt;+ {"id": 1, "email": "[PII:email]", "ssn": "[PII:ssn]", "status": "active", "_gate_summary": {"redacted": 2, "types": ["email", "ssn"]}}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The original JSON shape is preserved. The agent can still iterate, count, and reason about rows; it just never gets to see the values it doesn't need.&lt;/p&gt;
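&lt;p&gt;A minimal Python sketch of that shape-preserving rewrite, not &lt;code&gt;gate&lt;/code&gt;'s actual Rust implementation. The &lt;code&gt;PII_FIELDS&lt;/code&gt; table and the &lt;code&gt;redact_row&lt;/code&gt; function are hypothetical names, and this version matches on field names only:&lt;/p&gt;

```python
import json

# Hypothetical rule table: field name to PII type. The real tool combines
# column-name heuristics with value patterns; this sketch only uses names.
PII_FIELDS = {"email": "email", "ssn": "ssn", "phone": "phone"}


def redact_row(row):
    """Return a copy of row with PII values replaced by typed placeholders."""
    redacted = {}
    types = []
    for key, value in row.items():
        pii_type = PII_FIELDS.get(key.lower())
        if pii_type is None:
            redacted[key] = value          # non-PII values pass through unchanged
        else:
            redacted[key] = f"[PII:{pii_type}]"
            types.append(pii_type)
    # Append the summary so the caller can see what was scrubbed.
    redacted["_gate_summary"] = {"redacted": len(types), "types": types}
    return redacted


row = {"id": 1, "email": "alice@example.com", "ssn": "123-45-6789", "status": "active"}
print(json.dumps(redact_row(row)))
```

&lt;p&gt;Keys, key order, and non-PII values are untouched, so code that iterates or counts rows keeps working on the redacted copy.&lt;/p&gt;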

&lt;p&gt;The design constraints, in order of priority:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic.&lt;/strong&gt; No LLM-in-the-loop redaction. Same input → same output, every run, on a plane with no network.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bypass-resistant within the harness's threat model.&lt;/strong&gt; The agent should not be able to disable the filter by asking nicely, calling the tool in a clever way, or shelling out through a different verb.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast on the hot path.&lt;/strong&gt; It runs on every Bash command the agent invokes. If it's slow, people turn it off.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Honest about its limits.&lt;/strong&gt; A false negative is worse than a false positive: the failure mode is silent data exposure, not a noisy block.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result is a small, MIT-licensed Rust tool that adds under 10 ms of overhead on the hot path, builds with &lt;code&gt;cargo build&lt;/code&gt;, and ships on Homebrew.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two access paths
&lt;/h2&gt;

&lt;p&gt;Modern agent harnesses give models data through two doors: MCP servers and shell commands. The Model Context Protocol has become the de facto integration layer for Postgres, Snowflake, GitHub, Linear, and internal APIs, yet most of the published material on it is about &lt;em&gt;building&lt;/em&gt; servers, not about auditing what they return. The model trusts what the server hands back. Nobody is reading the bytes.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;gate&lt;/code&gt; covers both doors: MCP servers via a stdio proxy, and Bash/CLI tools via a harness hook.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. MCP servers (via a stdio proxy)
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;gate mcp&lt;/code&gt; is a tiny stdio JSON-RPC proxy. You register it as the MCP server in your harness; it spawns the real server underneath and forwards every message verbatim — &lt;em&gt;except&lt;/em&gt; &lt;code&gt;tools/call&lt;/code&gt; responses, which are passed through the value-scanner before the bytes return to the model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI ──tools/call──&amp;gt; gate mcp ──forward──&amp;gt; upstream MCP server
                       │
                       │ &amp;lt;── tools/call response with PII
                       │
                       │ Gate 2 scan + redact
                       │
AI &amp;lt;───redacted result─┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The proxy is transparent. Upstream servers run unchanged, and you migrate the whole fleet in one shot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gate init &lt;span class="nt"&gt;--wrap-mcp&lt;/span&gt;                                    &lt;span class="c"&gt;# dry-run: lists every server that would be wrapped&lt;/span&gt;
gate init &lt;span class="nt"&gt;--wrap-mcp&lt;/span&gt; &lt;span class="nt"&gt;--yes&lt;/span&gt;                              &lt;span class="c"&gt;# apply&lt;/span&gt;
gate init &lt;span class="nt"&gt;--wrap-mcp&lt;/span&gt; &lt;span class="nt"&gt;--servers&lt;/span&gt; postgres,github &lt;span class="nt"&gt;--yes&lt;/span&gt;   &lt;span class="c"&gt;# opt-in subset&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This rewrites every server in your &lt;code&gt;~/.claude.json&lt;/code&gt; (or &lt;code&gt;./.mcp.json&lt;/code&gt; for project scope, or the OpenCode / Copilot CLI equivalents) into a &lt;code&gt;gate mcp &amp;lt;original-command&amp;gt;&lt;/code&gt; proxy. Already-proxied servers are skipped, so re-running is idempotent. When you add a new MCP server later, run it again.&lt;/p&gt;

&lt;p&gt;What this means concretely: you can adopt a third-party MCP server you don't control — a vendor's Postgres connector, an internal team's CRM bridge — and still get a deterministic PII boundary between what it returns and what your model ingests. Without changing the server. Without trusting its author to have thought about redaction.&lt;/p&gt;
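&lt;p&gt;One detail worth spelling out: a JSON-RPC response carries an &lt;code&gt;id&lt;/code&gt; but no method name, so a proxy has to remember which request ids were &lt;code&gt;tools/call&lt;/code&gt; in order to know which responses to scan. The Python sketch below illustrates that bookkeeping under the assumption of one JSON message per line on stdio. &lt;code&gt;ToolCallFilter&lt;/code&gt; and &lt;code&gt;scrub_text_items&lt;/code&gt; are hypothetical names, and the email regex is a stand-in for the real scanner:&lt;/p&gt;

```python
import json
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def scrub_text_items(result):
    """Stand-in scanner: masks emails in MCP text content items."""
    for item in result.get("content", []):
        if isinstance(item, dict) and isinstance(item.get("text"), str):
            item["text"] = EMAIL.sub("[PII:email]", item["text"])
    return result


class ToolCallFilter:
    """Tracks tools/call request ids so only their responses are scanned."""

    def __init__(self, scrub):
        self.scrub = scrub      # callable applied to a tools/call result
        self.pending = set()    # ids of in-flight tools/call requests

    def on_request(self, line):
        """Client to server: remember tools/call ids, forward verbatim."""
        msg = json.loads(line)
        if isinstance(msg, dict) and msg.get("method") == "tools/call" and "id" in msg:
            self.pending.add(msg["id"])
        return line

    def on_response(self, line):
        """Server to client: rewrite only responses to tracked ids."""
        msg = json.loads(line)
        is_reply = isinstance(msg, dict) and "method" not in msg
        if not is_reply or msg.get("id") not in self.pending:
            return line         # everything else is forwarded untouched
        self.pending.discard(msg["id"])
        if "result" in msg:
            msg["result"] = self.scrub(msg["result"])
            return json.dumps(msg)
        return line             # error replies carry no result to scan
```

&lt;p&gt;Responses to any other method, such as &lt;code&gt;tools/list&lt;/code&gt;, come back byte-for-byte identical in this sketch, which is the "forwards every message verbatim" behaviour described above.&lt;/p&gt;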

&lt;h3&gt;
  
  
  2. Bash tools (via a harness hook)
&lt;/h3&gt;

&lt;p&gt;Every command the agent wants to run — &lt;code&gt;tkpsql query ...&lt;/code&gt;, &lt;code&gt;psql -c ...&lt;/code&gt;, &lt;code&gt;databricks api post ...&lt;/code&gt;, &lt;code&gt;curl https://internal/...&lt;/code&gt; — passes through &lt;code&gt;gate hook&lt;/code&gt; first. The hook checks whether the command matches a tool listed in config. If it does, the command is silently rewritten to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gate run -- &amp;lt;original command&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;gate run&lt;/code&gt; spawns the original subprocess, captures its stdout, runs the two-stage redaction pipeline on the bytes, and emits the sanitized result back. The agent sees the same JSON structure it always did, with values replaced by &lt;code&gt;[PII:&amp;lt;type&amp;gt;]&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The rewrite happens in the harness's pre-tool-execution hook, which means it is &lt;strong&gt;enforcing&lt;/strong&gt;, not advisory:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; — &lt;code&gt;PreToolUse&lt;/code&gt; hook in &lt;code&gt;~/.claude/settings.json&lt;/code&gt;; Claude Code substitutes the rewritten command via &lt;code&gt;updatedInput&lt;/code&gt; before spawning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenCode&lt;/strong&gt; — a TypeScript plugin's &lt;code&gt;tool.execute.before&lt;/code&gt; handler mutates &lt;code&gt;output.args.command&lt;/code&gt; in-flight.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cursor&lt;/strong&gt; — &lt;code&gt;PreToolUse&lt;/code&gt; hook in &lt;code&gt;.cursor/mcp.json&lt;/code&gt;; Cursor substitutes the rewritten command before spawning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Copilot CLI&lt;/strong&gt; — &lt;code&gt;PreToolUse&lt;/code&gt; hook in &lt;code&gt;.github/hooks/PreToolUse.json&lt;/code&gt; returns &lt;code&gt;modifiedArgs&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI&lt;/strong&gt; — &lt;code&gt;PreToolUse&lt;/code&gt; hook trusted and enabled via the Permissions UI; substitutes the rewritten command before spawning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI&lt;/strong&gt; — &lt;code&gt;PreToolUse&lt;/code&gt; hook; Gemini CLI substitutes the rewritten command after &lt;code&gt;gate init --harness gemini&lt;/code&gt; and a session restart.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent doesn't know &lt;code&gt;gate&lt;/code&gt; is there, and humans running the same commands in a normal terminal are untouched — there's no wrapper script on PATH.&lt;/p&gt;
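&lt;p&gt;The rewrite decision itself is small. Here is a hedged Python sketch of it, assuming the hook matches on the basename of the first word of the command. How &lt;code&gt;gate hook&lt;/code&gt; actually matches commands is not specified here, and &lt;code&gt;CONFIGURED_TOOLS&lt;/code&gt; and &lt;code&gt;rewrite_command&lt;/code&gt; are hypothetical names:&lt;/p&gt;

```python
import shlex

# Hypothetical stand-in for the tools listed in gate's config.
CONFIGURED_TOOLS = {"psql", "tkpsql", "databricks", "curl"}


def rewrite_command(command):
    """Wrap the command in `gate run --` when its program is configured."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return command          # unbalanced quotes: left untouched in this sketch
    if not argv:
        return command
    program = argv[0].rsplit("/", 1)[-1]    # compare on the basename
    if program == "gate" or program not in CONFIGURED_TOOLS:
        return command          # already wrapped, or not a configured tool
    return "gate run -- " + command


print(rewrite_command("psql -c 'select * from users limit 5'"))
print(rewrite_command("ls -la"))
```

&lt;p&gt;This sketch only inspects the first word, so a configured tool that appears later in a pipeline would not be matched by it.&lt;/p&gt;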

&lt;h2&gt;
  
  
  The two-gate detection pipeline
&lt;/h2&gt;

&lt;p&gt;Despite the name, &lt;code&gt;gate&lt;/code&gt; is two filters, applied in sequence, with very different jobs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gate 1: SQL intent analysis (best-effort)
&lt;/h3&gt;

&lt;p&gt;When the intercepted command has a &lt;code&gt;sql_arg&lt;/code&gt; configured (e.g. &lt;code&gt;tkpsql --sql&lt;/code&gt;, &lt;code&gt;psql -c&lt;/code&gt;, &lt;code&gt;databricks --json statement&lt;/code&gt;), &lt;code&gt;gate&lt;/code&gt; extracts the SQL string and runs it through a hand-written tokenizer. The goal is modest: figure out which &lt;em&gt;columns&lt;/em&gt; the query selects, so they're marked for guaranteed redaction regardless of what comes back in the value.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;contact&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;phone&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;profiles&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;signup_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'30 days'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gate 1 extracts &lt;code&gt;first_name&lt;/code&gt;, &lt;code&gt;email&lt;/code&gt; (aliased as &lt;code&gt;contact&lt;/code&gt;), and &lt;code&gt;phone&lt;/code&gt;. Each one that matches a PII heuristic is added to a &lt;code&gt;forced_columns&lt;/code&gt; map. Gate 2 then redacts those fields unconditionally, even if the value happens to be &lt;code&gt;NULL&lt;/code&gt; or &lt;code&gt;"unknown"&lt;/code&gt; or fails every regex check.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why a hand-written tokenizer instead of &lt;code&gt;sqlparser-rs&lt;/code&gt;?&lt;/strong&gt;&lt;br&gt;
Because Gate 1 only needs to find column references. Pulling in a full SQL parser turned out to be a bad trade: more dependencies, more dialect bugs, more code paths where a parse failure could silently drop columns from the plan. The tokenizer is ~300 lines, dialect-agnostic, and on a parse failure it errs toward "I don't know which columns" — which is fine, because Gate 2 then runs on every field.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Gate 1 is &lt;strong&gt;explicitly best-effort&lt;/strong&gt;. It is documented as such. Wildcards, CTEs, function calls around columns, and weird dialects all degrade gracefully:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Gate 1 behaviour&lt;/th&gt;
&lt;th&gt;Safety net&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SELECT email, name FROM u&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;columns extracted ✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SELECT LOWER(email) FROM u&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;function call — column skipped&lt;/td&gt;
&lt;td&gt;Gate 2 catches the value via email regex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SELECT email AS contact&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;alias tracked: &lt;code&gt;contact → email&lt;/code&gt; ✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SELECT * FROM u&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;wildcard — no column hints&lt;/td&gt;
&lt;td&gt;Gate 2 runs on every field; &lt;code&gt;wildcard_policy: reject&lt;/code&gt; can block&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;WITH x AS (SELECT email...)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;only outermost SELECT analysed&lt;/td&gt;
&lt;td&gt;Gate 2 catches via value regex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Non-standard dialect&lt;/td&gt;
&lt;td&gt;may produce empty plan&lt;/td&gt;
&lt;td&gt;Gate 2 catches via value regex&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is the load-bearing design choice in &lt;code&gt;gate&lt;/code&gt;: &lt;strong&gt;Gate 1 is allowed to be wrong, because Gate 2 is the safety net.&lt;/strong&gt;&lt;/p&gt;
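&lt;p&gt;To make the "allowed to be wrong" idea concrete, here is a deliberately small Python sketch of a best-effort column extractor. It is an illustration written for this post, not &lt;code&gt;gate&lt;/code&gt;'s tokenizer: the &lt;code&gt;select_columns&lt;/code&gt; name and the token regex are assumptions. Whenever it sees something it does not understand, it gives no hint for that item instead of guessing:&lt;/p&gt;

```python
import re

# string literal | identifier (optionally qualified) | number | any other symbol
TOKEN = re.compile(r"'(?:[^']|'')*'|[A-Za-z_][\w.]*|\d+(?:\.\d+)?|[^\s\w]")


def select_columns(sql):
    """Map each output name of the outermost SELECT to its source column.

    Returns None when there are no hints to give (wildcard, or no
    top-level SELECT found), which leaves everything to the value scanner.
    """
    depth, in_select = 0, False
    items, current = [], []
    for tok in TOKEN.findall(sql):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        word = tok.upper()
        if not in_select:
            in_select = depth == 0 and word == "SELECT"
            continue
        if depth == 0 and word == "FROM":
            break
        if depth == 0 and tok == ",":
            items.append(current)
            current = []
        else:
            current.append(tok)
    if not in_select:
        return None
    items.append(current)

    columns = {}
    for item in items:
        if item and item[0].upper() in ("DISTINCT", "ALL"):
            item = item[1:]
        if not item or "(" in item:
            continue                # empty item or function call: no hint
        if "*" in item:
            return None             # wildcard: no column hints at all
        first = item[0]
        if not (first[0].isalpha() or first[0] == "_"):
            continue                # literal or symbol, not a column reference
        source = first.split(".")[-1].lower()
        if len(item) == 1:
            columns[source] = source
        elif len(item) == 3 and item[1].upper() == "AS":
            columns[item[2].lower()] = source
        elif len(item) == 2:
            columns[item[1].lower()] = source    # implicit alias
    return columns


print(select_columns("SELECT u.first_name, u.email AS contact, p.phone FROM users u"))
print(select_columns("SELECT * FROM u"))
```

&lt;p&gt;The alias case is why the result is a mapping and not a plain list: the response will contain a key named &lt;code&gt;contact&lt;/code&gt;, and the extractor needs to remember that it came from &lt;code&gt;email&lt;/code&gt;.&lt;/p&gt;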

&lt;h3&gt;
  
  
  Gate 2: value scanning + column-name heuristics
&lt;/h3&gt;

&lt;p&gt;Gate 2 runs on the JSON response after the subprocess returns. For each field, it applies three checks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Forced columns from Gate 1&lt;/strong&gt; → always redact, regardless of value.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Column-name heuristics&lt;/strong&gt; → tokenise the JSON key (handling &lt;code&gt;snake_case&lt;/code&gt;, &lt;code&gt;camelCase&lt;/code&gt;, &lt;code&gt;PascalCase&lt;/code&gt;, &lt;code&gt;UPPER_CASE&lt;/code&gt;) and match against ~50 PII categories. &lt;code&gt;userEmail&lt;/code&gt;, &lt;code&gt;user_email&lt;/code&gt;, and &lt;code&gt;USER_EMAIL&lt;/code&gt; all resolve to the same rule.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Value patterns&lt;/strong&gt; → regex matches for email, US/AU/NZ phone, US SSN, AU ABN, AU Medicare, AU/NZ TFN/IRD (formatted), NZ NHI, NZ bank account numbers, plus a Luhn check for payment cards.&lt;/li&gt;
&lt;/ol&gt;
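&lt;p&gt;The key tokenisation in step 2 can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's code: &lt;code&gt;RULES&lt;/code&gt; is a hypothetical three-entry rule set standing in for the full category list, and &lt;code&gt;key_tokens&lt;/code&gt; and &lt;code&gt;classify_key&lt;/code&gt; are made-up names:&lt;/p&gt;

```python
import re

# Hypothetical, tiny rule set standing in for the full category list.
RULES = {"email": "email", "ssn": "ssn", "phone": "phone"}

# Splits on case boundaries and separators: an acronym run such as "SSN"
# stays together, "Email" and "email" are single words, digits are separate.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def key_tokens(key):
    """Lower-cased word tokens of a snake, camel, Pascal or upper-case key."""
    return [w.lower() for w in WORD.findall(key)]


def classify_key(key):
    """Return the PII type of the first token that matches a rule, else None."""
    for token in key_tokens(key):
        if token in RULES:
            return RULES[token]
    return None


for key in ("userEmail", "user_email", "USER_EMAIL", "status"):
    print(key, key_tokens(key), classify_key(key))
```

&lt;p&gt;Matching on whole tokens, not substrings, keeps a key like &lt;code&gt;emailed_count&lt;/code&gt; from being classified as an email field in this sketch.&lt;/p&gt;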

&lt;p&gt;The column-name match adds a confidence boost to any value match in the same field. Gate 2 always redacts on any match — there is no threshold to clear. Low-confidence matches (e.g. a 9-digit string in a column called &lt;code&gt;tax_id&lt;/code&gt;) are redacted &lt;em&gt;and&lt;/em&gt; flagged with a &lt;code&gt;low-confidence&lt;/code&gt; warning in &lt;code&gt;_gate_summary&lt;/code&gt;. Review flagged columns via &lt;code&gt;gate retro&lt;/code&gt;, then silence a false positive with &lt;code&gt;gate allowlist add &amp;lt;column&amp;gt;&lt;/code&gt;.&lt;/p&gt;
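&lt;p&gt;For the value patterns, a regex alone is enough for a dashed US SSN, while payment cards pair a loose digit pattern with the Luhn checksum to cut false positives. The Python below is a minimal sketch of those two checks only. The regexes and the &lt;code&gt;classify_value&lt;/code&gt; name are assumptions made for illustration, not &lt;code&gt;gate&lt;/code&gt;'s built-in patterns:&lt;/p&gt;

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")        # dashes required
CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")    # 13 to 19 digits, optional separators


def luhn_ok(digits):
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_value(value):
    """Return "ssn", "card", or None for a string value."""
    if SSN.search(value):
        return "ssn"
    for match in CARD.finditer(value):
        digits = re.sub(r"\D", "", match.group())
        if luhn_ok(digits):
            return "card"
    return None


for sample in ("123-45-6789", "4242 4242 4242 4242", "1234567812345678", "123456789"):
    print(sample, classify_value(sample))
```

&lt;p&gt;The last two samples come back as &lt;code&gt;None&lt;/code&gt;: one fails the Luhn check, and the other is a bare nine-digit string that a dashed-SSN pattern does not match, which is the kind of case the column-name check exists for.&lt;/p&gt;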

&lt;p&gt;The output goes back as the same JSON the tool produced, with values rewritten in-place and a &lt;code&gt;_gate_summary&lt;/code&gt; block appended so the agent can reason about what was scrubbed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"rows"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[PII:email]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ssn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[PII:ssn]"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"_gate_summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"redacted"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"types"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ssn"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"warnings"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Honesty about the gaps
&lt;/h2&gt;

&lt;p&gt;The full threat model is in the &lt;a href="https://github.com/GaaraZhu/gate/blob/main/THREAT-MODEL.md" rel="noopener noreferrer"&gt;repo&lt;/a&gt;, but the headlines are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;gate&lt;/code&gt; is not a sandbox.&lt;/strong&gt; It only filters commands explicitly listed in &lt;code&gt;tools:&lt;/code&gt;. Anything else passes through.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The adversary model is an inadvertent agent, not a malicious one.&lt;/strong&gt; &lt;code&gt;sudo gate protect&lt;/code&gt; (Unix) chowns the config to root so a hijacked agent can't disable gate via config edits, but a jailbroken agent that deliberately base64-encodes data, requests CSV output, or exfiltrates through a non-intercepted tool is still out of scope. Combine &lt;code&gt;gate&lt;/code&gt; with harness-level tool restrictions and a read-only database role if you need that boundary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Value regex covers the common cases and AU/NZ.&lt;/strong&gt; Email, US SSN (dashes required — &lt;code&gt;123456789&lt;/code&gt; slips), payment cards (via Luhn), and phone numbers including AU/NZ mobile and landline in both local format (&lt;code&gt;04XX&lt;/code&gt;/&lt;code&gt;02X&lt;/code&gt;, &lt;code&gt;0[2378]&lt;/code&gt;/&lt;code&gt;0[34679]&lt;/code&gt;) and international format (&lt;code&gt;+61&lt;/code&gt;/&lt;code&gt;+64&lt;/code&gt; prefixes) — international-prefix numbers auto-redact in any column; local-format numbers require a PII-named column. AU/NZ-specific identifiers caught by value: ABN (mod-89 checksum), Medicare card (mod-10), formatted TFN and IRD (mod-11, separators required), NZ NHI, and NZ bank account numbers. Bare TFN/IRD strings without separators are not caught by value alone — column-name matching is the safety net there. IBAN, passport, NHS, Aadhaar, and other non-AU/NZ formats rely on column-name matching only; extend &lt;code&gt;pii.patterns&lt;/code&gt; for your region.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP &lt;code&gt;resources/read&lt;/code&gt; and &lt;code&gt;prompts/get&lt;/code&gt; are not redacted.&lt;/strong&gt; Only &lt;code&gt;tools/call&lt;/code&gt; responses go through the scanner.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-JSON output is not redacted.&lt;/strong&gt; If a tool emits CSV or plain text, configure a &lt;code&gt;pipe:&lt;/code&gt; to convert it (the example config uses &lt;code&gt;jq -c .&lt;/code&gt; for curl and a 3-line Python &lt;code&gt;csv.DictReader&lt;/code&gt; for &lt;code&gt;psql --csv&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disable mechanisms exist.&lt;/strong&gt; &lt;code&gt;enabled: false&lt;/code&gt; in config, deleting the config file, or removing the hook entry from the harness settings. &lt;code&gt;sudo gate protect&lt;/code&gt; (Unix) chowns the config to root to block the first two from inside the agent, but the harness settings file is still user-writable.&lt;/li&gt;
&lt;/ul&gt;
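&lt;p&gt;For the non-JSON case, the conversion step can be as small as the list above suggests. A hedged Python sketch of a CSV-to-JSON converter built on &lt;code&gt;csv.DictReader&lt;/code&gt; follows. The function name is hypothetical and this is not the script from the example config:&lt;/p&gt;

```python
import csv
import io
import json


def csv_to_json_lines(text):
    """Convert CSV text with a header row into one JSON object per line."""
    reader = csv.DictReader(io.StringIO(text))
    return "\n".join(json.dumps(row) for row in reader)


sample = "id,email\n1,alice@example.com\n2,bob@example.com\n"
print(csv_to_json_lines(sample))
```

&lt;p&gt;Every value comes out as a string, because CSV carries no type information. That is fine for scanning, since the point is only to give each value a field name the scanner can see.&lt;/p&gt;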

&lt;p&gt;If any of these are deal-breakers, the tool is honest about it up front. Better than discovering it in a post-mortem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why not just mask the data at the source?
&lt;/h2&gt;

&lt;p&gt;Database-level masking (static anonymised copies, dynamic data masking (DDM), row security policies) is the right answer when you control the source and have the access needed to configure it. &lt;code&gt;gate&lt;/code&gt; fills the gap when you don't, and covers the paths masking can't reach.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;gate&lt;/th&gt;
&lt;th&gt;Database masking&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No DB admin access needed&lt;/td&gt;
&lt;td&gt;✅ No changes to the database&lt;/td&gt;
&lt;td&gt;❌ Needs column-level config by a DBA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Works on vendor / external DBs&lt;/td&gt;
&lt;td&gt;✅ Wraps any JSON-returning tool&lt;/td&gt;
&lt;td&gt;❌ Only databases you administer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Covers MCP and API tools&lt;/td&gt;
&lt;td&gt;✅ GitHub, Linear, internal APIs — any &lt;code&gt;tools/call&lt;/code&gt; response&lt;/td&gt;
&lt;td&gt;❌ No masking concept at this layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production data freshness&lt;/td&gt;
&lt;td&gt;✅ Works against live data&lt;/td&gt;
&lt;td&gt;❌ Static copies drift; DDM may lag&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent bypass resistance&lt;/td&gt;
&lt;td&gt;✅ Direct value exposure blocked in harness hook&lt;/td&gt;
&lt;td&gt;❌ Aggregate functions and CASE expressions can bypass DDM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Known gaps&lt;/td&gt;
&lt;td&gt;✅ Documented&lt;/td&gt;
&lt;td&gt;❌ DDM gaps are often silent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;They're complementary: if you have DDM configured, gate is the safety net for the paths and patterns DDM misses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a deterministic CLI and not "just ask the model"
&lt;/h2&gt;

&lt;p&gt;It is technically possible to ask the model to redact its own input before it ingests it. People are building this. I chose not to, for three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cost.&lt;/strong&gt; Every query result would round-trip through a model call. A single agent session might run hundreds of queries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency.&lt;/strong&gt; A hook on every Bash command needs to return in single-digit milliseconds. &lt;code&gt;gate hook&lt;/code&gt;'s passthrough path is in that ballpark; an LLM call is not.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auditability.&lt;/strong&gt; "Why was this field redacted?" needs an answer that survives review. A regex and a tokenizer can be inspected, golden-file tested, and re-run on the same input forever. A model in 2026 will not give the same output on the same input in 2027, and you will not get a stack trace.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Existing PII tools (Presidio, Nightfall, Skyflow) make a different trade: they're mostly built for data-pipeline or SaaS-gateway use, sitting at API boundaries or in batch jobs, not in the agent's tool-execution path with single-digit-ms latency and harness-level hook enforcement. &lt;code&gt;gate&lt;/code&gt; is shaped specifically for that boundary.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it costs to try
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS / Linux via Homebrew&lt;/span&gt;
brew tap GaaraZhu/gate &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; brew &lt;span class="nb"&gt;install &lt;/span&gt;gate

&lt;span class="c"&gt;# or cargo binstall / direct download — see the README&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gate scan             &lt;span class="c"&gt;# pipe your schema in — risk report by PII tier, exits 1 if any found&lt;/span&gt;
gate config           &lt;span class="c"&gt;# creates ~/.config/gate/config.yaml and opens it in your editor&lt;/span&gt;
gate init             &lt;span class="c"&gt;# registers the PreToolUse hook in ~/.claude/settings.json&lt;/span&gt;
gate init &lt;span class="nt"&gt;--wrap-mcp&lt;/span&gt;  &lt;span class="c"&gt;# dry-run: shows which MCP servers would be wrapped&lt;/span&gt;
gate validate         &lt;span class="c"&gt;# compiles all regex patterns, lints the config&lt;/span&gt;
gate retro            &lt;span class="c"&gt;# after a few sessions: tally of what was redacted and where&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;gate disable&lt;/code&gt; to turn it off if you need to debug something, and &lt;code&gt;gate enable&lt;/code&gt; to switch it back on. &lt;code&gt;gate uninstall&lt;/code&gt; removes everything &lt;code&gt;gate&lt;/code&gt; added to your system and asks for confirmation before each step.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this is going
&lt;/h2&gt;

&lt;p&gt;What's next, roughly in priority order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More built-in patterns by region.&lt;/strong&gt; AU/NZ identifiers are now covered natively. Community PRs adding IBAN, passport, NHS, Aadhaar, and other regional formats are welcome.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP &lt;code&gt;resources/read&lt;/code&gt; redaction.&lt;/strong&gt; Closing one of the two documented gaps in the MCP path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows hardening.&lt;/strong&gt; The binary builds and runs on Windows, but test coverage is thin — &lt;code&gt;gate protect&lt;/code&gt; (config ownership transfer) is Unix-only, and edge cases around path handling and terminal output are less exercised. Contributions welcome.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've been holding off on connecting your AI agent to a real data source — database, internal API, or MCP server — because "what the model sees" was a vibes-based decision, this is the layer that turns it into a config file. Try it, scan your schema, and share what you find. The repo is &lt;a href="https://github.com/GaaraZhu/gate" rel="noopener noreferrer"&gt;github.com/GaaraZhu/gate&lt;/a&gt;. The issue tracker is open. The license is MIT.&lt;/p&gt;

&lt;p&gt;I'd rather hear "gate redacted something it shouldn't have" than "gate let something through that it shouldn't have." If you find the second one, that's a security bug and there's a &lt;a href="https://github.com/GaaraZhu/gate/blob/main/SECURITY.md" rel="noopener noreferrer"&gt;process for it&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>security</category>
      <category>claude</category>
    </item>
  </channel>
</rss>
