<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Stella Lin</title>
    <description>The latest articles on DEV Community by Stella Lin (@stella_lin_82914c71e25769).</description>
    <link>https://dev.to/stella_lin_82914c71e25769</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3919980%2F8e5c393d-fbb3-46f3-b7c8-4e185ddfc0f6.jpg</url>
      <title>DEV Community: Stella Lin</title>
      <link>https://dev.to/stella_lin_82914c71e25769</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/stella_lin_82914c71e25769"/>
    <language>en</language>
    <item>
      <title>A HIPAA-safe alert pipeline checklist (8 controls)</title>
      <dc:creator>Stella Lin</dc:creator>
      <pubDate>Fri, 08 May 2026 22:08:52 +0000</pubDate>
      <link>https://dev.to/stella_lin_82914c71e25769/a-hipaa-safe-alert-pipeline-checklist-8-controls-3ppk</link>
      <guid>https://dev.to/stella_lin_82914c71e25769/a-hipaa-safe-alert-pipeline-checklist-8-controls-3ppk</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://theculprit.ai/blog/hipaa-checklist-for-alert-pipelines" rel="noopener noreferrer"&gt;theculprit.ai/blog/hipaa-checklist-for-alert-pipelines&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;The compliance review for a healthtech SaaS usually treats the alert pipeline as a footnote.&lt;/p&gt;

&lt;p&gt;The product is HIPAA-ready, the database is encrypted, the BAAs are signed, the access controls are documented. Then someone runs &lt;code&gt;grep&lt;/code&gt; on a week of monitoring logs and finds patient IDs, member emails, and the occasional plaintext SSN sitting in alert payloads — copies of which were forwarded to a third-party log aggregator (without a BAA), surfaced to an LLM-based incident-analysis tool (also without a BAA), and rendered in plaintext inside a Slack channel that a contractor was a member of last month.&lt;/p&gt;

&lt;p&gt;The product wasn't the leak. The alert pipeline was. And alert pipelines are a near-universal blind spot because the engineering team that built the application isn't the same team that wired up the alerting, and the alerting tools don't advertise themselves as PHI-handling systems.&lt;/p&gt;

&lt;p&gt;This post is the checklist a healthtech engineering team can hand a HIPAA auditor and say "here's how the alert path is treated like the rest of the data path." Eight controls, mapped to the HIPAA Security Rule's Technical Safeguards (45 CFR 164.312), with concrete pointers to what each one looks like in code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where PHI gets into alert payloads
&lt;/h2&gt;

&lt;p&gt;Before the controls, the threat model. A few common paths PHI takes into a monitoring alert:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Stack traces from production exceptions.&lt;/strong&gt; A &lt;code&gt;NullReferenceException&lt;/code&gt; in a patient-record handler captures the request URL, often containing patient identifiers. A failed insert captures the row being inserted, often containing PHI fields. Your error-tracking vendor will happily forward these verbatim to whichever notification channels you've configured — usually without a redaction step in between.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Webhook payloads from third-party services.&lt;/strong&gt; A claims clearinghouse's status webhook may include the member identifier in the body. A pharmacy benefit manager's notification includes the prescription. The alert that fires when the webhook 500s contains the full payload.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database query timeouts.&lt;/strong&gt; Slow-query log lines often include the bound parameters of the query — patient IDs, dates of birth, diagnosis codes. The alert that fires on "slow query" forwards the line.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application logs surfaced into alerts.&lt;/strong&gt; A log line emitted by your code with &lt;code&gt;logger.warn({ user, request })&lt;/code&gt; becomes the body of an alert when an aggregator's threshold fires. The full &lt;code&gt;user&lt;/code&gt; object — email, phone, SSN-last-4 — rides along.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health-check failure responses.&lt;/strong&gt; A health-check endpoint that returns the failing patient-record's ID in its error body propagates that ID into the uptime monitor's alert.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In each case, PHI lands somewhere outside the application's authorized data path: a log aggregator, a notification channel, an incident-analysis tool, an on-call engineer's phone screen. Most of those somewheres are vendors who have not signed a BAA with you.&lt;/p&gt;

&lt;h2&gt;
  
  
  What HIPAA's Technical Safeguards actually require
&lt;/h2&gt;

&lt;p&gt;The relevant subsection of the Security Rule (45 CFR 164.312) names five Technical Safeguards, and every one of them bears on the alert pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;§ 164.312(a)(1) Access control&lt;/strong&gt; — only authorized personnel can decrypt PHI; the system enforces this in code, not by trust.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;§ 164.312(b) Audit controls&lt;/strong&gt; — every access to PHI is recorded; the audit trail itself is tamper-evident.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;§ 164.312(c)(1) Integrity&lt;/strong&gt; — PHI cannot be altered or destroyed by unauthorized parties; this includes side-channel destruction (e.g. a forgotten log-retention policy that deletes the only audit trail of a breach).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;§ 164.312(d) Person or entity authentication&lt;/strong&gt; — every PHI-accessing actor is authenticated with traceable identity, not "the on-call account."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;§ 164.312(e)(1) Transmission security&lt;/strong&gt; — PHI is encrypted in transit; this includes intra-system hops, not just the user-facing TLS layer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The piece that catches most alert pipelines isn't any single safeguard — it's that the alert path is &lt;em&gt;not treated as a PHI path&lt;/em&gt;, so none of these safeguards are applied to it specifically. The Notice of Privacy Practices doesn't mention monitoring alerts. The internal access-control matrix lists the application's data store but not the log aggregator. The audit log captures application-level reads but not "the on-call engineer saw the alert payload."&lt;/p&gt;

&lt;p&gt;The checklist below addresses each gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 8-item checklist
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Tokenize PHI at ingest, before any storage
&lt;/h3&gt;

&lt;p&gt;The first system that receives an alert payload (your ingestion edge) replaces every PHI value with an opaque token before writing the payload to any backing store. Concretely: a regex pass over the raw payload identifies high-confidence PHI shapes (emails, IPs, SSNs, common ID formats); each match is replaced with &lt;code&gt;&amp;lt;EMAIL_a3f9&amp;gt;&lt;/code&gt; / &lt;code&gt;&amp;lt;SSN_b8c4&amp;gt;&lt;/code&gt; / &lt;code&gt;&amp;lt;IP_2c1e&amp;gt;&lt;/code&gt;; and the token-to-real mapping is encrypted with the customer's per-tenant key and stored in a vault separate from the alert event row.&lt;/p&gt;

&lt;p&gt;After this step, the alert row in the operational store contains tokens only. Every downstream stage (correlation, LLM analysis, notification fan-out, log retention) operates on the tokenized form. The vault is read only by code paths that pass an authorization check.&lt;/p&gt;

&lt;p&gt;What this earns: the alert pipeline now satisfies §164.312(a)(1) and §164.312(e)(1) for everything past the ingest edge — there is no PHI to access without going through the vault, and there is no PHI in transit to any downstream system.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Encrypt the vault at rest with customer-controlled keys
&lt;/h3&gt;

&lt;p&gt;The vault that holds the token-to-real mapping is encrypted at rest with a customer-specific symmetric key. Postgres's &lt;code&gt;pgcrypto&lt;/code&gt; extension gives you &lt;code&gt;pgp_sym_encrypt()&lt;/code&gt; for this — the encrypted bytes go into a &lt;code&gt;bytea&lt;/code&gt; column, and only the application's authorized code paths know the key.&lt;/p&gt;

&lt;p&gt;Two decisions that matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Key per tenant, not key per row.&lt;/strong&gt; Per-row keys are a key-management nightmare and don't add real security. Per-tenant keys mean a key rotation only requires re-encrypting one tenant's vault.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The key never enters the alert row's storage system.&lt;/strong&gt; Keys live in your secret store (1Password / AWS Secrets Manager / Cloudflare Workers' bindings) and are pulled into the application process at startup. A snapshot of the database without the keys is not a PHI breach.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What this earns: §164.312(c)(1) integrity (the vault is tamper-evident — modifying ciphertext without the key produces decryption failure) and one half of §164.312(e)(1) (encrypted at rest).&lt;/p&gt;
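&lt;p&gt;For teams doing the encryption in application code instead of &lt;code&gt;pgcrypto&lt;/code&gt;, the same property looks roughly like this with Node's &lt;code&gt;crypto&lt;/code&gt; module (a sketch; the key-derivation input is a stand-in for material pulled from your secret store at startup):&lt;/p&gt;

```typescript
// Application-side analog of pgp_sym_encrypt(): AES-256-GCM with a
// per-tenant key. The key-derivation input below is a stand-in for the
// secret-store material; the vault row shape is illustrative.
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

const tenantKey = scryptSync("tenant-42-secret-from-secret-store", "per-tenant-salt", 32);

function encryptValue(plaintext: string, key: Buffer): Buffer {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // Store iv + auth tag + ciphertext together; the tag makes tampering detectable.
  return Buffer.concat([iv, cipher.getAuthTag(), body]);
}

function decryptValue(stored: Buffer, key: Buffer): string {
  const iv = stored.subarray(0, 12);
  const tag = stored.subarray(12, 28);
  const body = stored.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if the ciphertext was modified
  return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
}

const row = encryptValue("alice@example.com", tenantKey);
console.log(decryptValue(row, tenantKey)); // round-trips to the original value
```

&lt;p&gt;The GCM auth tag is what makes the vault tamper-evident: flipping a ciphertext byte makes &lt;code&gt;decryptValue&lt;/code&gt; throw instead of returning garbage.&lt;/p&gt;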

&lt;h3&gt;
  
  
  3. Use SECURITY DEFINER functions for vault access, not direct SELECTs
&lt;/h3&gt;

&lt;p&gt;Application code never &lt;code&gt;SELECT&lt;/code&gt;s from the vault directly. Instead, it calls a SQL function defined as &lt;code&gt;SECURITY DEFINER&lt;/code&gt; (in Postgres) that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verifies the caller is authorized to decrypt this specific record (the tenant matches, the actor has the right role, the access is being made in the context of an active incident, etc.)&lt;/li&gt;
&lt;li&gt;Decrypts the requested tokens using the tenant's key&lt;/li&gt;
&lt;li&gt;Writes an audit row capturing who decrypted what, when&lt;/li&gt;
&lt;li&gt;Returns the plaintext to the caller&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Wrapping decryption in a function gives you a single chokepoint to enforce all the access-control and audit-logging rules. Without it, every code path that wants to display PHI has to remember to do those checks, and the checks will drift.&lt;/p&gt;

&lt;p&gt;What this earns: §164.312(a)(1) access control (the function is the access enforcement) plus §164.312(b) audit controls (the function writes the audit row).&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Send only tokens to LLM analysis, never raw PHI
&lt;/h3&gt;

&lt;p&gt;Any LLM-driven analysis (root-cause inference, correlation, summarization) operates on the tokenized payload. The model sees &lt;code&gt;&amp;lt;EMAIL_a3f9&amp;gt;&lt;/code&gt; instead of &lt;code&gt;alice@example.com&lt;/code&gt;. The model's output, similarly, contains tokens — your UI rehydrates them into plaintext only on display, only for authenticated users with the right access.&lt;/p&gt;

&lt;p&gt;Why this matters even with a BAA-covered LLM vendor: the model's training data, the model's prompt cache, the model's logs, the inference platform's debug surfaces, the conversation context an engineer might paste into a developer console — all of these are surfaces where the prompt could end up being retained or visible. Sending tokens means none of those surfaces ever holds PHI.&lt;/p&gt;

&lt;p&gt;What this earns: closes the most common HIPAA-blast-radius gap in modern alert pipelines (LLM analysis was a 2023 addition for many teams, and the controls didn't get updated).&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Audit every PHI rehydration
&lt;/h3&gt;

&lt;p&gt;Every time a token is decrypted to plaintext (a UI that shows the original value, an export to PDF, a customer-support tool that surfaces the data), an audit row is written. The audit row captures: who (authenticated user ID), what (which tokens), when (timestamp), in what context (incident ID, ticket ID, support session ID).&lt;/p&gt;

&lt;p&gt;The audit table is append-only — no updates, no deletes from the application — and is itself protected (separate access control, separate retention).&lt;/p&gt;

&lt;p&gt;What this earns: §164.312(b) audit controls. A HIPAA auditor's standard test is "show me a record of every time PHI for patient X was accessed in the last 90 days." If you can produce that report from one audit table, you pass; if you have to assemble it from log files across five vendors, you fail.&lt;/p&gt;
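&lt;p&gt;A sketch of the append-only shape in Postgres (table and role names are illustrative):&lt;/p&gt;

```sql
-- Illustrative only: the audit table is INSERT/SELECT for the application
-- role, with UPDATE and DELETE revoked so the trail is append-only.
CREATE TABLE token_decrypt_audit (
  id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  incident_id uuid        NOT NULL,
  actor_id    uuid        NOT NULL,
  token_count integer     NOT NULL,
  accessed_at timestamptz NOT NULL DEFAULT now()
);

REVOKE ALL ON token_decrypt_audit FROM app_role;
GRANT INSERT, SELECT ON token_decrypt_audit TO app_role;
```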

&lt;h3&gt;
  
  
  6. Default-deny on outbound notifications
&lt;/h3&gt;

&lt;p&gt;Every channel the alert pipeline can fan out to (PagerDuty, Slack, email, webhook) receives the tokenized payload by default. To send plaintext, the channel configuration must explicitly opt in — and the opt-in is logged + reviewed quarterly.&lt;/p&gt;

&lt;p&gt;The default matters because new channels get added regularly ("can we send these to the new on-call rotation in the platform team?"), and the safe default is "yes, with tokens." If the default were "yes, with plaintext," every new channel introduces a fresh BAA conversation that's likely to be skipped under deadline pressure.&lt;/p&gt;

&lt;p&gt;What this earns: §164.312(e)(1) transmission security (PHI doesn't leave the system in plaintext); also a major reduction in BAA scope (you only need BAAs with vendors that actually receive plaintext, which is a much smaller set).&lt;/p&gt;
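&lt;p&gt;The fan-out decision itself is small enough to sketch, assuming hypothetical channel-config and payload shapes:&lt;/p&gt;

```typescript
// Hypothetical channel config and payload shape; not a real API.
type Channel = { name: string; plaintextOptIn: boolean };
type Payload = { tokenized: string; plaintext: string };

// Every plaintext delivery is recorded for the quarterly review.
const optInLog: string[] = [];

function render(payload: Payload, channel: Channel): string {
  if (channel.plaintextOptIn) {
    optInLog.push(channel.name); // logged + reviewed quarterly
    return payload.plaintext;
  }
  return payload.tokenized; // the safe default for every new channel
}

const payload: Payload = {
  tokenized: "login failed for \u003cEMAIL_a3f9\u003e",
  plaintext: "login failed for alice@example.com",
};
console.log(render(payload, { name: "slack-oncall", plaintextOptIn: false }));
```

&lt;p&gt;New channels inherit &lt;code&gt;plaintextOptIn: false&lt;/code&gt; unless someone deliberately flips it, which is exactly the default the control asks for.&lt;/p&gt;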

&lt;h3&gt;
  
  
  7. Auto-resolve quiet alerts to limit retention
&lt;/h3&gt;

&lt;p&gt;The HIPAA Security Rule doesn't specify a retention period for alerts, but the principle behind §164.312(c)(1) is that PHI shouldn't sit indefinitely in places where it's not actively serving a clinical or operational purpose.&lt;/p&gt;

&lt;p&gt;A practical control: incidents that have been quiet for 30 minutes and don't have an active investigation get auto-resolved. Auto-resolve doesn't delete the underlying tokenized payloads (you may need them for a future investigation), but it moves the incident out of the on-call queue and out of the active dashboard. The vault retention policy (separately) governs how long the encrypted plaintext is kept; a defensible default is "as long as the corresponding incident is needed for audit, then deleted via a periodic sweep."&lt;/p&gt;

&lt;p&gt;What this earns: bounded retention of accessible PHI, which limits both the §164.312(c)(1) integrity surface and the volume that has to be re-reviewed during a periodic access audit.&lt;/p&gt;
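&lt;p&gt;The sweep predicate is a few lines; the field names and the 30-minute threshold below are illustrative, not a specific product's schema:&lt;/p&gt;

```typescript
// Sketch of the auto-resolve predicate used by a periodic sweep.
type Incident = {
  status: "open" | "resolved";
  lastEventAt: number;             // epoch ms of the most recent event
  hasActiveInvestigation: boolean;
};

const QUIET_MS = 30 * 60 * 1000; // 30 minutes of silence

function shouldAutoResolve(inc: Incident, now: number): boolean {
  if (inc.status !== "open") return false;
  if (inc.hasActiveInvestigation) return false; // someone is looking
  return now - inc.lastEventAt >= QUIET_MS;     // quiet long enough
}
```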

&lt;h3&gt;
  
  
  8. Tenant-isolation at the database, not at the application
&lt;/h3&gt;

&lt;p&gt;Multi-tenant SaaS architectures often enforce tenant isolation in application code ("the application only queries rows where &lt;code&gt;tenant_id&lt;/code&gt; matches the authenticated session"). For HIPAA, this is too weak — a single bug in any code path that omits the predicate is a cross-tenant breach.&lt;/p&gt;

&lt;p&gt;The control: enforce tenant isolation at the database level via Row-Level Security (Postgres's &lt;code&gt;RLS&lt;/code&gt;), where every row in every PHI-adjacent table has a &lt;code&gt;tenant_id&lt;/code&gt; column and the database itself rejects queries that don't match the active session's tenant. Application code can still construct the query without the predicate; the database refuses to return cross-tenant data.&lt;/p&gt;

&lt;p&gt;What this earns: §164.312(a)(1) access control, with the enforcement layer being the database (not the application). A bug in application code can no longer cause a cross-tenant PHI leak — the bug would have to be in the RLS policy itself, which is much smaller surface area to review.&lt;/p&gt;
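&lt;p&gt;A minimal sketch of what that policy can look like, assuming a &lt;code&gt;tenant_id&lt;/code&gt; column and a session variable carrying the active tenant (the variable name is hypothetical):&lt;/p&gt;

```sql
-- Illustrative RLS policy; 'app.tenant_id' is a hypothetical session variable
-- set by the application when it authenticates the request.
ALTER TABLE incidents ENABLE ROW LEVEL SECURITY;
ALTER TABLE incidents FORCE ROW LEVEL SECURITY;  -- applies to the table owner too

CREATE POLICY tenant_isolation ON incidents
  USING (tenant_id = current_setting('app.tenant_id')::uuid);
```

&lt;p&gt;With the policy in place, a query that omits the &lt;code&gt;tenant_id&lt;/code&gt; predicate simply returns no cross-tenant rows; the bug becomes a missing result, not a breach.&lt;/p&gt;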

&lt;h2&gt;
  
  
  What this looks like in practice
&lt;/h2&gt;

&lt;p&gt;Three concrete patterns from a production alert pipeline that ships with these controls:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edge tokenization in TypeScript:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Conceptual; production code wraps this in vault writes + key resolution.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sanitized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;EMAIL_REGEX&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&amp;lt;EMAIL_&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;salt&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;vault&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;encrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tenantKey&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;hmac&lt;/code&gt; slice keeps tokens deterministic per-value within a tenant — the same email always produces the same token, so correlation across alerts works.&lt;/p&gt;
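&lt;p&gt;A runnable version of the &lt;code&gt;hmac&lt;/code&gt; helper the snippet assumes, using Node's &lt;code&gt;createHmac&lt;/code&gt; (the 4-character slice mirrors the snippet; production would keep more of the digest):&lt;/p&gt;

```typescript
// Deterministic tokenization: the same value with the same per-tenant salt
// always yields the same token, so co-occurrence across alerts survives
// redaction. The 4-hex-char slice is illustrative.
import { createHmac } from "node:crypto";

function emailToken(value: string, tenantSalt: string): string {
  const digest = createHmac("sha256", tenantSalt).update(value).digest("hex");
  return "\u003cEMAIL_" + digest.slice(0, 4) + "\u003e";
}

console.log(emailToken("alice@example.com", "tenant-42-salt"));
```

&lt;p&gt;Salting per tenant means the same email produces different tokens in different tenants, so tokens can't be correlated across customers.&lt;/p&gt;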

&lt;p&gt;&lt;strong&gt;SECURITY DEFINER decryption in Postgres:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;token_decrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_incident_id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_tokens&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt;
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plaintext&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SECURITY&lt;/span&gt; &lt;span class="k"&gt;DEFINER&lt;/span&gt;
&lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
  &lt;span class="c1"&gt;-- Authorization: caller must have read access to this incident's tenant.&lt;/span&gt;
  &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;tenant_members&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;incidents&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p_incident_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="n"&gt;RAISE&lt;/span&gt; &lt;span class="n"&gt;EXCEPTION&lt;/span&gt; &lt;span class="s1"&gt;'forbidden'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;-- Audit the decryption.&lt;/span&gt;
  &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;token_decrypt_audit&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;incident_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_incident_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;array_length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

  &lt;span class="c1"&gt;-- Decrypt + return.&lt;/span&gt;
  &lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;QUERY&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pgp_sym_decrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encrypted_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;())::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;vault&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;ANY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_tokens&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the only path application code uses to see plaintext. Direct &lt;code&gt;SELECT FROM vault&lt;/code&gt; is rejected by RLS; &lt;code&gt;token_decrypt&lt;/code&gt; is the chokepoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token-only LLM prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`Analyze the following sanitized incident events:
&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;sanitizedEvents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;
The events contain placeholder tokens like &amp;lt;EMAIL_x&amp;gt; for redacted PII.
Do not attempt to infer the actual values; reason about the patterns.`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM never sees plaintext. Its output cites the tokens; the UI rehydrates them via &lt;code&gt;token_decrypt&lt;/code&gt; only when an authorized user clicks "show plaintext."&lt;/p&gt;

&lt;h2&gt;
  
  
  What you sign up for
&lt;/h2&gt;

&lt;p&gt;These controls aren't free. Three honest tradeoffs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cluster quality drops slightly when the LLM can't see literal values.&lt;/strong&gt; Two alerts that both reference &lt;code&gt;alice@example.com&lt;/code&gt; cluster trivially when the LLM sees the email; with tokens, the cluster has to come from the surrounding context. Mitigations exist (deterministic tokens that produce the same placeholder for the same value, so co-occurrence is preserved) but don't fully close the gap.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The audit table grows.&lt;/strong&gt; Every rehydration writes a row. At healthcare-SaaS scale (thousands of incidents per month per tenant) the table grows into the millions of rows per year. Plan for this with partitioning + a separate retention policy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The on-call experience adds one click.&lt;/strong&gt; Engineers who used to see the patient ID inline now see &lt;code&gt;&amp;lt;EMAIL_a3f9&amp;gt;&lt;/code&gt; and click "show plaintext." For most incidents the click is unnecessary (the tokens are enough to triage); for the 10% where it matters, the extra click is the cost of the audit trail.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The alternative is the status quo: PHI in alerts, scattered across vendors, audit trails that don't exist, and the cheerful assumption that nobody on the security team will think to look at the alert pipeline during the next audit. That assumption holds until it doesn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this generalizes
&lt;/h2&gt;

&lt;p&gt;The controls above are written for HIPAA but apply with minor edits to any compliance regime that requires (a) data classification, (b) access control, and (c) audit logging on a sensitive-data path. SOC 2's Common Criteria 6 (Logical and Physical Access Controls) maps to controls 1, 3, 5, 8. ISO 27001's Annex A.9 (Access Control) maps to controls 3, 5, 8. GDPR's Article 32 (Security of Processing) maps to controls 1, 2, 4, 6.&lt;/p&gt;

&lt;p&gt;The pattern is universal: &lt;strong&gt;treat the alert pipeline as a sensitive-data path&lt;/strong&gt;, not as a footnote. Every piece of PHI / PII / PCI that flows through the alert path should be tokenized, encrypted, audited, and tenant-isolated by the same primitives that protect the application's data store. Once those primitives exist (controls 1-3 are the foundation), the rest follows.&lt;/p&gt;

&lt;p&gt;This is the architecture behind &lt;a href="https://theculprit.ai" rel="noopener noreferrer"&gt;Culprit's&lt;/a&gt; edge tokenization model — every alert payload arrives, gets tokenized, gets stored encrypted, and downstream tools see only tokens. The /security page documents the specific control mapping if your security team wants to review the architecture during a vendor review.&lt;/p&gt;

</description>
      <category>hipaa</category>
      <category>security</category>
      <category>observability</category>
      <category>compliance</category>
    </item>
    <item>
      <title>Anthropic prompt caching cut our RCA cost by 90%</title>
      <dc:creator>Stella Lin</dc:creator>
      <pubDate>Fri, 08 May 2026 21:44:26 +0000</pubDate>
      <link>https://dev.to/stella_lin_82914c71e25769/anthropic-prompt-caching-cut-our-rca-cost-by-90-5gmb</link>
      <guid>https://dev.to/stella_lin_82914c71e25769/anthropic-prompt-caching-cut-our-rca-cost-by-90-5gmb</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://theculprit.ai/blog/anthropic-prompt-caching-90-percent" rel="noopener noreferrer"&gt;theculprit.ai/blog/anthropic-prompt-caching-90-percent&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;LLM costs in production scale faster than the demo-phase bill suggests they will.&lt;/p&gt;

&lt;p&gt;The shape of the problem: you ship a feature that calls Claude on every meaningful event. The first month the bill is rounding error and nobody looks at it. The second month a customer's traffic ramps and the line item is suddenly five percent of revenue. The third month your finance person sends a polite Slack about whether this is "a real cost trend or a one-time spike," and everyone on the engineering team has to defend an architecture decision they made eight weeks ago when the bill was rounding error.&lt;/p&gt;

&lt;p&gt;You can reduce this. Not by being clever about how you call the model — by being clever about what's &lt;em&gt;constant&lt;/em&gt; across your calls. Anthropic's prompt caching, in our case, takes the per-RCA input cost from full-rate to one-tenth of full-rate on a 90%+ cache-hit rate. That's not a hypothetical; it's what we measure in production, and the math is simple enough to walk through here so you can run the numbers on your own pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pricing structure
&lt;/h2&gt;

&lt;p&gt;Anthropic publishes four price points per model. For Claude Haiku 4.5, the model we run as the default for incident root-cause analysis, those points are (verified from the Anthropic API docs):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Token category&lt;/th&gt;
&lt;th&gt;Haiku 4.5&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Base input&lt;/td&gt;
&lt;td&gt;$1.00 per million tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache write (5-minute TTL)&lt;/td&gt;
&lt;td&gt;$1.25 per million tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache read&lt;/td&gt;
&lt;td&gt;$0.10 per million tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output&lt;/td&gt;
&lt;td&gt;$5.00 per million tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two things to read from that table:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cache read is 10x cheaper than base input.&lt;/strong&gt; Same tokens in the request body, ten percent of the cost — &lt;em&gt;if&lt;/em&gt; you can get them into the cache.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache write is 25% more expensive than base input.&lt;/strong&gt; The first time you send a cached segment, you're paying a small premium so the next request can pay the discount. The math only pays off if you call the model with the same cached segment more than ~1.28 times on average within the 5-minute TTL window.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That second point is the one most teams miss. If your call pattern is "one-shot, cold cache every time," prompt caching makes you slightly worse off. The win comes from repeatable structure across calls.&lt;/p&gt;
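&lt;p&gt;The break-even is worth checking numerically. Using the prices from the table above (a sketch, per million tokens, for one cached segment reused &lt;code&gt;n&lt;/code&gt; times inside the TTL):&lt;/p&gt;

```typescript
// Break-even for prompt caching at the Haiku 4.5 prices quoted above,
// per million tokens, for n calls that reuse one cached segment in-TTL.
const BASE = 1.0;         // base input, $/M tokens
const CACHE_WRITE = 1.25; // first call pays the write premium
const CACHE_READ = 0.1;   // every later call pays the read rate

const cachedCost = (n: number) => CACHE_WRITE + CACHE_READ * (n - 1);
const uncachedCost = (n: number) => BASE * n;

// Solve CACHE_WRITE + CACHE_READ * (n - 1) = BASE * n for n:
const breakEven = (CACHE_WRITE - CACHE_READ) / (BASE - CACHE_READ);
console.log(breakEven.toFixed(2)); // "1.28"
```

&lt;p&gt;One-shot calls (&lt;code&gt;n = 1&lt;/code&gt;) cost $1.25 versus $1.00, so a cold cache every time is strictly worse; at two reuses it's already $1.35 versus $2.00.&lt;/p&gt;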

&lt;h2&gt;
  
  
  What's actually cacheable in an RCA call
&lt;/h2&gt;

&lt;p&gt;A typical RCA call has five sources of tokens:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;System prompt.&lt;/strong&gt; Defines the role ("you are an SRE analyzing an incident"), the JSON schema for the response, and any guardrails. Identical across every call across every tenant. Maybe 800-1500 tokens depending on how rigorous your schema is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval context&lt;/strong&gt; ("here are 3 prior incidents from this same service that resolved similarly"). Static for a few minutes within a Batch run on one tenant + service. Maybe 400-800 tokens depending on how aggressive the retrieval is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-incident events&lt;/strong&gt; ("event 1 at 14:32:01: ConnectionPoolExhausted...; event 2 at 14:32:04: ..."). Unique to the incident under analysis. Cannot be cached across incidents. Typically 1500-3000 tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-incident metadata&lt;/strong&gt; (incident ID, service ID, severity). Tiny but unique.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output tokens.&lt;/strong&gt; The model's response. Cost is fixed at the output rate; caching doesn't apply.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Sources 1 and 2 are cacheable. Sources 3 and 4 are not. Source 5 is irrelevant.&lt;/p&gt;

&lt;p&gt;In our distribution, sources 1 + 2 are roughly 70-80% of the input tokens for a typical RCA call. Cache them at $0.10 per million; pay full rate on the remaining 20-30%; total input cost drops by about 60-70% from the naive baseline. The "90%" headline number rounds up because we measure cache &lt;em&gt;hits&lt;/em&gt;, not total &lt;em&gt;cost&lt;/em&gt;, and within the cached portion the savings really are 90%.&lt;/p&gt;
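&lt;p&gt;The blended rate is easy to derive. A sketch, assuming 75% of input tokens are served from cache (and ignoring the amortized write premium, which is small at high hit counts):&lt;/p&gt;

```typescript
// Blended input cost per million tokens when cachedFraction of the
// request hits the cache at $0.10/MTok and the rest pays $1.00 base.
function blendedInputRate(cachedFraction: number): number {
  const CACHE_READ = 0.1;
  const BASE_INPUT = 1.0;
  return cachedFraction * CACHE_READ + (1 - cachedFraction) * BASE_INPUT;
}

const rate = blendedInputRate(0.75); // 0.325 $/MTok
const savings = 1 - rate / BASE_RATE(); // ~0.675, i.e. the 60-70% drop

function BASE_RATE(): number {
  return 1.0; // naive baseline: everything at base input rate
}
```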

&lt;h2&gt;
  
  
  The two-segment trick
&lt;/h2&gt;

&lt;p&gt;Anthropic's API takes a &lt;code&gt;cache_control&lt;/code&gt; marker per segment in your &lt;code&gt;system&lt;/code&gt; array. Each marker is an independent breakpoint — the cache stores tokens &lt;em&gt;up to&lt;/em&gt; the marker. If you have two segments, the API caches each one separately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Conceptual shape — see rca-prompt.ts for the exact code we run.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                    &lt;span class="c1"&gt;// ~1200 tokens, identical everywhere&lt;/span&gt;
    &lt;span class="na"&gt;cache_control&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ephemeral&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;priorIncidentsContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="c1"&gt;// ~600 tokens, per-tenant per-service&lt;/span&gt;
    &lt;span class="na"&gt;cache_control&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ephemeral&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why two segments instead of one? Because the cache lifetime for those two pieces is different.&lt;/p&gt;

&lt;p&gt;The system prompt almost never changes — every RCA call across every tenant hits the cache. Cache read essentially every time after the first call.&lt;/p&gt;

&lt;p&gt;The retrieval context (prior similar incidents for &lt;em&gt;this&lt;/em&gt; service) changes whenever a new incident on that service resolves and shifts the top-K. Within a single Batch run on one tenant + service, repeats hit the cache. Across tenants, never.&lt;/p&gt;

&lt;p&gt;If you stuff both into a single segment, the moment the retrieval context for tenant A changes, tenant B's hit rate drops too — because the &lt;em&gt;one&lt;/em&gt; combined segment hashes differently. Two segments → independent cache lifetimes → tenant A's churn doesn't punish tenant B.&lt;/p&gt;

&lt;p&gt;The order matters. Anthropic caches &lt;em&gt;up to&lt;/em&gt; each marker, so the more-static segment must come first. If you put per-tenant retrieval first and the static system prompt second, the static prompt's cache key now includes the per-tenant content above it; you've just made the most cacheable segment uncacheable across tenants.&lt;/p&gt;
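&lt;p&gt;You can see the prefix semantics with a toy model: treat the cache key at each breakpoint as a hash of everything up to that marker. (Hashes here stand in for Anthropic's internal cache keying; this illustrates the ordering rule, not the real scheme.)&lt;/p&gt;

```typescript
import { createHash } from 'node:crypto';

// The cache key for segment i covers segments 0..i, because the cache
// stores tokens *up to* each cache_control marker.
function cacheKeys(segments: string[]): string[] {
  return segments.map((_, i) =>
    createHash('sha256')
      .update(segments.slice(0, i + 1).join('\n'))
      .digest('hex')
      .slice(0, 12)
  );
}

// Static-first: the system prompt's key is identical across tenants;
// only the second (retrieval) key differs.
const tenantA = cacheKeys(['SYSTEM_PROMPT', 'tenant-A retrieval']);
const tenantB = cacheKeys(['SYSTEM_PROMPT', 'tenant-B retrieval']);

// Volatile-first: even the "static" prompt's key now churns per tenant,
// so the most cacheable segment misses every time.
const badA = cacheKeys(['tenant-A retrieval', 'SYSTEM_PROMPT']);
const badB = cacheKeys(['tenant-B retrieval', 'SYSTEM_PROMPT']);
```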

&lt;h2&gt;
  
  
  What kills the cache
&lt;/h2&gt;

&lt;p&gt;In rough order of frequency:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 5-minute ephemeral TTL.&lt;/strong&gt; A cached segment expires 5 minutes after its last write. If your call pattern is bursty (RCA calls cluster around incidents, then quiet for an hour), a long quiet period will let every cached segment expire and you'll pay cache &lt;em&gt;write&lt;/em&gt; (slightly above base rate) on the next batch. Spread your calls if you can; if you can't, accept that the first few calls after a quiet period pay full freight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Whitespace drift.&lt;/strong&gt; If you concatenate the system prompt with &lt;code&gt;\n\n&lt;/code&gt; in one place and &lt;code&gt;\n&lt;/code&gt; in another, you have two distinct cache keys. The cache hashes the literal token sequence, not the semantic meaning. Pick one separator and lint for it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trailing dynamic content.&lt;/strong&gt; A common bug: someone adds a timestamp to the "system prompt" — &lt;code&gt;Today's date is 2026-05-08T14:32:01Z&lt;/code&gt; — for "context". The timestamp changes every call. Now nothing cached after the timestamp survives. Keep dynamic content out of cached segments entirely; pass it as a user-message turn instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema version churn.&lt;/strong&gt; If you're iterating on your JSON output schema (a normal early-product activity), every schema edit invalidates every cached system prompt. The cost of "tuning the schema" is partly paid in cache misses. Plan for one or two big schema-stabilization sweeps rather than continuous tweaks.&lt;/p&gt;
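&lt;p&gt;All four killers share a signature: a segment you believed was static hashes differently between two calls. A cheap guard is to fingerprint the cached segments at build time and alert when the digest changes outside a deliberate deploy (&lt;code&gt;segmentFingerprint&lt;/code&gt; is a hypothetical helper, not something we ship):&lt;/p&gt;

```typescript
import { createHash } from 'node:crypto';

// Hash the exact byte sequence of the would-be-cached segments.
// Whitespace drift, a swapped separator, or an embedded timestamp all
// show up as a changed digest between builds that "should" be identical.
function segmentFingerprint(segments: string[]): string {
  const h = createHash('sha256');
  for (const s of segments) h.update(s).update('\u0000'); // unambiguous join
  return h.digest('hex');
}

// The timestamp bug from above: two "identical" prompts, two cache keys,
// zero cache hits between them.
const a = segmentFingerprint(['You are an SRE.', 'Date: 2026-05-08T14:32:01Z']);
const b = segmentFingerprint(['You are an SRE.', 'Date: 2026-05-08T14:32:05Z']);
```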

&lt;h2&gt;
  
  
  The production numbers
&lt;/h2&gt;

&lt;p&gt;Per-RCA cost on Haiku 4.5 with prompt caching enabled, Batch API (which itself adds another 50% off both input and output), 4000 input tokens + 500 output tokens, ~75% of input tokens cached:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input (cached portion, 3000 tokens × 0.5 batch × 0.10 cache-read): &lt;strong&gt;$0.00015&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Input (uncached portion, 1000 tokens × 0.5 batch × 1.00 base): &lt;strong&gt;$0.00050&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Output (500 tokens × 0.5 batch × 5.00): &lt;strong&gt;$0.00125&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Cache write amortized (1200 tokens × 0.5 batch × 1.25, divided across ~30 cache hits per write cycle): &lt;strong&gt;~$0.00003&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total: &lt;strong&gt;~$0.0019 per RCA call.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without caching, same call shape, real-time API: input would be ~$0.004, output would be ~$0.0025, total ~$0.0065. Caching alone cuts input cost by roughly 70%. Batch API halves everything again on top. Caching + Batch is what makes the per-RCA cost sit around a fifth of a cent.&lt;/p&gt;

&lt;p&gt;A cluster of typical incidents at this rate is the difference between "a flat-rate pricing model that works" and "a flat-rate pricing model with worst-case unit economics that don't." We document this in our &lt;a href="https://dev.to/pricing"&gt;pricing rationale&lt;/a&gt; — the discipline isn't a marketing posture, it's the load-bearing constraint that lets the price stay flat.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this generalizes
&lt;/h2&gt;

&lt;p&gt;If you're calling Claude on a per-event or per-incident schedule, the structure above applies to whatever shape your calls take. The questions to answer:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;What in your prompt is identical across every call?&lt;/strong&gt; That's segment 1. If the answer is "nothing," your prompt isn't designed for caching yet — find the constants. There almost always are some.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What is per-tenant or per-context but reused within a short window?&lt;/strong&gt; That's segment 2. Common cases: retrieval context, customer-specific style guidelines, account metadata.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What's truly per-call?&lt;/strong&gt; Goes in the user message turn, never in the cached system block.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Is your call rate above the break-even threshold?&lt;/strong&gt; If you call the same cached prompt fewer than ~1.25 times per 5-minute window, you'll lose money on caching. For a noisy production system this is rarely the bottleneck, but for a low-volume tool it can be.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The pattern doesn't apply only to Claude. OpenAI's prompt caching follows similar economics with different numbers; Gemini's context caching has a different TTL but the same "what's static, what's dynamic" decomposition. The work of setting up your prompts so the static parts cluster at the front pays off across every model that supports caching, which is increasingly all of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  A single test
&lt;/h2&gt;

&lt;p&gt;If you're considering whether prompt caching applies to your pipeline, the cheapest first measurement is also the most informative one: count how many tokens of your typical request are &lt;em&gt;byte-for-byte identical&lt;/em&gt; to the previous request. Not "semantically the same" — literally identical. If the answer is more than 50%, you're leaving money on the table; ship &lt;code&gt;cache_control&lt;/code&gt; on the static prefix and watch the input-cost line item drop on the next billing day.&lt;/p&gt;

&lt;p&gt;If the answer is less than 20%, your prompts are designed for context, not for repetition, and caching probably won't help much without a structural rewrite. Either way, knowing the number is a one-hour exercise that beats arguing about whether caching is worth the complexity.&lt;/p&gt;
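&lt;p&gt;Measuring that is a few lines. A sketch of the prefix measurement against two consecutive serialized requests (&lt;code&gt;commonPrefixRatio&lt;/code&gt; is illustrative; feed it whatever prompt strings you actually send):&lt;/p&gt;

```typescript
// Fraction of the current request that is byte-for-byte identical to
// the previous one, measured from the front, which is the only part
// prefix caching can reuse.
function commonPrefixRatio(prev: string, curr: string): number {
  const max = Math.min(prev.length, curr.length);
  let i = 0;
  while (i < max && prev[i] === curr[i]) i++;
  return curr.length === 0 ? 0 : i / curr.length;
}
```

Run it over a day of logged prompts and take the median; a median above ~0.5 is the signal to ship the breakpoint.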

&lt;p&gt;The architecture above is what makes &lt;a href="https://theculprit.ai" rel="noopener noreferrer"&gt;Culprit's&lt;/a&gt; flat-rate pricing economically defensible — RCA calls cluster around incidents, the system prompt and retrieval context dominate the input tokens, and the cache hit rate sits comfortably above 90%. Same primitives, different vertical: if you're shipping LLM features into production at any scale where the bill is starting to matter, this is the lowest-effort high-yield refactor you have available.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>performance</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>6 regexes for detecting PII in event payloads</title>
      <dc:creator>Stella Lin</dc:creator>
      <pubDate>Fri, 08 May 2026 11:49:03 +0000</pubDate>
      <link>https://dev.to/stella_lin_82914c71e25769/6-regexes-for-detecting-pii-in-event-payloads-1obe</link>
      <guid>https://dev.to/stella_lin_82914c71e25769/6-regexes-for-detecting-pii-in-event-payloads-1obe</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://theculprit.ai/blog/detecting-pii-in-event-payloads" rel="noopener noreferrer"&gt;theculprit.ai/blog/detecting-pii-in-event-payloads&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;This is a working document, not a survey. The patterns below are the ones we actually run against inbound alert payloads in &lt;a href="https://theculprit.ai" rel="noopener noreferrer"&gt;Culprit&lt;/a&gt;'s tokenizer. They are tuned for one job: catch as much PII in unstructured event text as a regex layer can plausibly catch, while erring toward false positives over false negatives. Where they fail, we say so and describe the fallback.&lt;/p&gt;

&lt;p&gt;If you're building a similar pipeline — observability tool, log sanitizer, ingestion middleware in front of an LLM — you can copy the set as-is, but the more useful thing is to read the failure modes and decide whether they apply to your traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  The set
&lt;/h2&gt;

&lt;p&gt;Six patterns. All are stateless, all use the global flag so a single &lt;code&gt;String.prototype.matchAll(regex)&lt;/code&gt; walks the entire payload, all are scoped to word boundaries to avoid eating the surrounding text. The full source is in &lt;code&gt;packages/shared/src/pii-detect.ts&lt;/code&gt;; this is the load-bearing part:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;PII_PATTERNS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;email&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b[&lt;/span&gt;&lt;span class="sr"&gt;A-Za-z0-9._%+-&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+@&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;A-Za-z0-9.-&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;\.[&lt;/span&gt;&lt;span class="sr"&gt;A-Za-z&lt;/span&gt;&lt;span class="se"&gt;]{2,}\b&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ipv4&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b(?:(?:&lt;/span&gt;&lt;span class="sr"&gt;25&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;0-5&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;|2&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;0-4&lt;/span&gt;&lt;span class="se"&gt;]\d&lt;/span&gt;&lt;span class="sr"&gt;|&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;01&lt;/span&gt;&lt;span class="se"&gt;]?\d\d?)\.){3}(?:&lt;/span&gt;&lt;span class="sr"&gt;25&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;0-5&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;|2&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;0-4&lt;/span&gt;&lt;span class="se"&gt;]\d&lt;/span&gt;&lt;span class="sr"&gt;|&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;01&lt;/span&gt;&lt;span class="se"&gt;]?\d\d?)\b&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ipv6&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b(?:[&lt;/span&gt;&lt;span class="sr"&gt;0-9a-fA-F&lt;/span&gt;&lt;span class="se"&gt;]{1,4}&lt;/span&gt;&lt;span class="sr"&gt;:&lt;/span&gt;&lt;span class="se"&gt;){7}[&lt;/span&gt;&lt;span class="sr"&gt;0-9a-fA-F&lt;/span&gt;&lt;span class="se"&gt;]{1,4}\b&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;phone&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;(?:\+?&lt;/span&gt;&lt;span class="sr"&gt;1&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;-.&lt;/span&gt;&lt;span class="se"&gt;\s]?)?(?:\(?\d{3}\)?[&lt;/span&gt;&lt;span class="sr"&gt;-.&lt;/span&gt;&lt;span class="se"&gt;\s]?)?\d{3}[&lt;/span&gt;&lt;span class="sr"&gt;-.&lt;/span&gt;&lt;span class="se"&gt;\s]?\d{4}\b&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ssn&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b\d{3}&lt;/span&gt;&lt;span class="sr"&gt;-&lt;/span&gt;&lt;span class="se"&gt;\d{2}&lt;/span&gt;&lt;span class="sr"&gt;-&lt;/span&gt;&lt;span class="se"&gt;\d{4}\b&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;high_entropy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b[&lt;/span&gt;&lt;span class="sr"&gt;A-Za-z0-9+&lt;/span&gt;&lt;span class="se"&gt;/&lt;/span&gt;&lt;span class="sr"&gt;=_-&lt;/span&gt;&lt;span class="se"&gt;]{40,}\b&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
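&lt;p&gt;A minimal driver over that array looks like this (the first two patterns are reproduced for self-containment; &lt;code&gt;detect&lt;/code&gt; is a sketch of the shape, not the exact module code):&lt;/p&gt;

```typescript
type PiiMatch = { type: string; value: string; index: number };

const PII_PATTERNS = [
  { type: 'email', regex: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g },
  { type: 'ssn', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
  // ...remaining patterns from the set above
];

// Walk every pattern over the payload. matchAll iterates the global
// regex without leaving lastIndex advanced on the shared regex objects.
function detect(payload: string): PiiMatch[] {
  const out: PiiMatch[] = [];
  for (const { type, regex } of PII_PATTERNS) {
    for (const m of payload.matchAll(regex)) {
      out.push({ type, value: m[0], index: m.index! });
    }
  }
  return out.sort((a, b) => a.index - b.index);
}
```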



&lt;p&gt;What each one catches, what it misses, and what we do about it:&lt;/p&gt;

&lt;h3&gt;
  
  
  01 — &lt;code&gt;email&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Catches:&lt;/strong&gt; the overwhelming majority of email addresses you'll see in practice — &lt;code&gt;paula.holman@acme.com&lt;/code&gt;, &lt;code&gt;user+tag@subdomain.example.co.uk&lt;/code&gt;, &lt;code&gt;a@b.io&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False negatives:&lt;/strong&gt; RFC 5322 is much wider than the regex. Quoted local parts (&lt;code&gt;"weird name"@example.com&lt;/code&gt;), addresses with comments, IDN domains in their unicode form (&lt;code&gt;user@münchen.de&lt;/code&gt;). In several years of looking at production alert payloads we have seen exactly zero of these. They are theoretical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False positives:&lt;/strong&gt; anything formatted like an email but used as something else — service-account names that happen to look like addresses, fixture data, JIRA mention syntax in some custom apps. These tokenize harmlessly. The downstream consumer sees an opaque token; the engineer can reveal it if they need to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tradeoff:&lt;/strong&gt; the bracket-class &lt;code&gt;[._%+-]&lt;/code&gt; does not include all RFC-permitted characters. We've never regretted that.&lt;/p&gt;

&lt;h3&gt;
  
  
  02 — &lt;code&gt;ipv4&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Catches:&lt;/strong&gt; every well-formed IPv4. The octet alternation &lt;code&gt;25[0-5]|2[0-4]\d|[01]?\d\d?&lt;/code&gt; rejects out-of-range numbers like &lt;code&gt;999.1.1.1&lt;/code&gt;, which keeps the false-positive rate low.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False negatives:&lt;/strong&gt; effectively zero. Any well-formed dotted-quad IPv4 matches this regex.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False positives:&lt;/strong&gt; four-part version strings and build numbers. Dates rendered as &lt;code&gt;2026.05.05&lt;/code&gt; are safe (only three dot-separated groups, and an IPv4 needs four), so the real false-positive class is software version strings like &lt;code&gt;10.4.2.1&lt;/code&gt;, which the regex cannot distinguish from a private IP. We accept this. A version string tokenized as &lt;code&gt;&amp;lt;TOKEN_…&amp;gt;&lt;/code&gt; in an alert is annoying; an exfiltrated customer IP is a breach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tradeoff:&lt;/strong&gt; consider whether you actually want to tokenize private-range IPs (&lt;code&gt;10.0.0.0/8&lt;/code&gt;, &lt;code&gt;192.168.0.0/16&lt;/code&gt;, &lt;code&gt;172.16.0.0/12&lt;/code&gt;). They are usually internal infrastructure, not customer data. We tokenize them anyway because the line between "internal" and "customer" gets blurry when you're hosting webhooks for customer-on-premise systems, and the asymmetry from §01 still applies.&lt;/p&gt;

&lt;h3&gt;
  
  
  03 — &lt;code&gt;ipv6&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;\b(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Catches:&lt;/strong&gt; fully-expanded IPv6 addresses. &lt;code&gt;2001:0db8:85a3:0000:0000:8a2e:0370:7334&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False negatives:&lt;/strong&gt; every common form of IPv6 you'll actually see. &lt;code&gt;::1&lt;/code&gt;, &lt;code&gt;2001:db8::1&lt;/code&gt;, &lt;code&gt;fe80::1%eth0&lt;/code&gt;, IPv4-mapped IPv6 (&lt;code&gt;::ffff:192.168.1.1&lt;/code&gt;). The &lt;code&gt;::&lt;/code&gt; zero-compression syntax is not handled; neither is the scope identifier; neither is mixed IPv4/IPv6 notation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False positives:&lt;/strong&gt; rare. The &lt;code&gt;:&lt;/code&gt; separator and the strict colon count make accidental matches uncommon.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tradeoff:&lt;/strong&gt; this is the worst pattern in the set, and we have not yet replaced it, because leaked IPv6 addresses are themselves a small class of leaks compared to emails and bearer tokens. When we do replace it, the right move is two patterns — one for the full form, one for the compressed form — both rooted in word boundaries, with the high-entropy fallback as a backstop.&lt;/p&gt;

&lt;h3&gt;
  
  
  04 — &lt;code&gt;phone&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(?:\+?1[-.\s]?)?(?:\(?\d{3}\)?[-.\s]?)?\d{3}[-.\s]?\d{4}\b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Catches:&lt;/strong&gt; North American number formats: &lt;code&gt;5551234567&lt;/code&gt;, &lt;code&gt;555-123-4567&lt;/code&gt;, &lt;code&gt;(555) 123-4567&lt;/code&gt;, &lt;code&gt;+1 555 123 4567&lt;/code&gt;, &lt;code&gt;+1.555.123.4567&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False negatives:&lt;/strong&gt; non-NANP numbers (UK, EU, Asia). International formatting beyond &lt;code&gt;+1&lt;/code&gt;. Note that the pattern has no leading &lt;code&gt;\b&lt;/code&gt;, so long digit runs are only partially masked: &lt;code&gt;5551234567&lt;/code&gt; matches in full, while for a UK number like &lt;code&gt;442012345678&lt;/code&gt; only the trailing ten digits match and the country code survives in the clear. We deliberately don't widen the pattern to arbitrary long digit runs; that catches too many order numbers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False positives:&lt;/strong&gt; seven-to-ten-digit numbers that are not phone numbers. Order IDs. Tracking codes. Long invoice numbers. The regex tokenizes all of them. This is the pattern with the highest false-positive rate in the set, and we accept it for the same reason as §02.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tradeoff:&lt;/strong&gt; if your traffic is global, swap this for &lt;code&gt;libphonenumber&lt;/code&gt; or a per-country regex set. The performance cost is real (tens of milliseconds per payload) but tractable for a worker that runs after the ingest 200 has already returned.&lt;/p&gt;

&lt;h3&gt;
  
  
  05 — &lt;code&gt;ssn&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;\b\d{3}-\d{2}-\d{4}\b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Catches:&lt;/strong&gt; US Social Security Numbers in their canonical hyphenated form.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False negatives:&lt;/strong&gt; SSNs without dashes (&lt;code&gt;123456789&lt;/code&gt;). SSNs with spaces (&lt;code&gt;123 45 6789&lt;/code&gt;). All non-US national-ID formats.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False positives:&lt;/strong&gt; anything formatted like &lt;code&gt;XXX-XX-XXXX&lt;/code&gt;. Some product SKUs, some legacy account numbers, some date-range strings if your team uses an unusual format. Low rate in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tradeoff:&lt;/strong&gt; the regex is conservative on purpose. A payload containing &lt;code&gt;123456789&lt;/code&gt; is more likely to be an order number, an internal ID, or a build artifact than an SSN, and tokenizing every nine-digit run produces a meaningful fraction of false positives in observability traffic. If you specifically need to catch un-hyphenated SSNs (healthcare, payments, government), add a structural rule instead — a &lt;code&gt;ssn&lt;/code&gt; field in your event schema that you tokenize regardless of contents.&lt;/p&gt;

&lt;h3&gt;
  
  
  06 — &lt;code&gt;high_entropy&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;\b[A-Za-z0-9+/=_-]{40,}\b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Catches:&lt;/strong&gt; the long, opaque strings you don't have a more specific pattern for. Bearer tokens. JWTs. Most provider API keys (Stripe &lt;code&gt;sk_live_…&lt;/code&gt;, AWS &lt;code&gt;AKIA…&lt;/code&gt; keys when emitted with the secret, GCP service-account JSON values). Session IDs. Most cryptographic hashes used as identifiers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False negatives:&lt;/strong&gt; anything under 40 characters. Some short access keys (a few cloud providers issue 32-char keys; those slip through). UUIDs without dashes (32 chars) — usually not credentials, but worth knowing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False positives:&lt;/strong&gt; long base64-encoded blobs that are not credentials — encoded protobufs, serialized state strings, image data URIs, signed-but-not-secret payloads. These tokenize as opaque. Engineers occasionally complain that "the tokenizer redacted my decoded protobuf"; we ask them whether the protobuf contained a customer's name and they reread their own alert and stop complaining.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tradeoff:&lt;/strong&gt; the threshold is the entire pattern. We picked 40 by walking back from "what's the shortest credential we want to catch" (≈40-char JWT segment) and "what's the longest non-credential we don't want to catch" (a 32-char dashless UUID or hex MD5; a 64-char hex SHA-256 is over the line and does get tokenized). There is no universally correct number. Instrument the false-positive rate against your traffic and iterate. If you change the number, change it once and write down why; do not let it drift.&lt;/p&gt;
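&lt;p&gt;The boundary cases are cheap to pin down in a test (the example strings below are synthetic, not real credentials):&lt;/p&gt;

```typescript
const HIGH_ENTROPY = /\b[A-Za-z0-9+/=_-]{40,}\b/g;

const count = (s: string) => [...s.matchAll(HIGH_ENTROPY)].length;

// 32-char dashless UUID / hex-MD5 shape: under the threshold, passes
// through untokenized.
const dashlessUuid = 'd41d8cd98f00b204e9800998ecf8427e';

// A 43-char base64url run, the shape of one JWT segment: caught.
const jwtSegment = 'eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4';

// A 64-char hex SHA-256: over the threshold, so it gets tokenized
// (the hashes-as-identifiers class noted above).
const sha256 = 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855';
```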

&lt;h2&gt;
  
  
  What the regex set does not catch
&lt;/h2&gt;

&lt;p&gt;Four categories of PII do not yield to regex, and pretending they do is how detectors gain a reputation for being theater.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Personal names.&lt;/strong&gt; "John Smith" is two common English nouns concatenated. There is no regex that distinguishes "John Smith reported the issue" from "John Smith Auto Parts." The right answer is a structural rule: any field whose schema name is &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;customer_name&lt;/code&gt;, &lt;code&gt;display_name&lt;/code&gt;, &lt;code&gt;full_name&lt;/code&gt;, &lt;code&gt;first_name&lt;/code&gt;, &lt;code&gt;last_name&lt;/code&gt;, &lt;code&gt;account_holder&lt;/code&gt; — tokenize unconditionally, regardless of contents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Addresses.&lt;/strong&gt; Free-text street addresses are unbounded. Same answer: structural rule on field name (&lt;code&gt;address&lt;/code&gt;, &lt;code&gt;street&lt;/code&gt;, &lt;code&gt;mailing_address&lt;/code&gt;, &lt;code&gt;billing_address&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Free-text disclosures.&lt;/strong&gt; "The customer mentioned their phone is 5551234567 and their kid's birthday is on the 5th" — the regex set catches the phone number, but the surrounding context is itself revealing. There is no good regex defense. The defense is a structural rule that says any field named &lt;code&gt;notes&lt;/code&gt;, &lt;code&gt;comment&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, &lt;code&gt;support_message&lt;/code&gt;, &lt;code&gt;customer_message&lt;/code&gt;, &lt;code&gt;summary&lt;/code&gt; is tokenized as a whole field rather than scanned for patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Account numbers, license plates, and other domain-specific identifiers.&lt;/strong&gt; These vary too much across industries. If you have them, you know their format; write a domain-specific regex and add it to the set. If you don't know your domain's identifier formats, you have a discovery problem before you have a detection problem.&lt;/p&gt;

&lt;p&gt;The pattern: &lt;strong&gt;regex catches the universal cases; structural rules on field names catch the contextual cases.&lt;/strong&gt; A serious tokenizer does both. A toy tokenizer does only the first and lets the second class slip through. If you only have time to build one layer, build the structural rules — most of the high-value leaks are in &lt;code&gt;notes&lt;/code&gt;-shaped fields, not in stringified payloads.&lt;/p&gt;
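&lt;p&gt;The structural layer reduces to a recursive walk over the parsed event. A sketch, assuming your events are JSON; the field list mirrors the names above and &lt;code&gt;tokenFor&lt;/code&gt; stands in for whatever placeholder scheme you use:&lt;/p&gt;

```typescript
const SENSITIVE_FIELDS = new Set([
  'name', 'customer_name', 'display_name', 'full_name', 'first_name',
  'last_name', 'account_holder', 'address', 'street', 'mailing_address',
  'billing_address', 'notes', 'comment', 'description', 'support_message',
  'customer_message', 'summary',
]);

// Tokenize any string value under a sensitive field name, regardless of
// contents. Names and free text never reach the regex layer at all.
function applyStructuralRules(
  obj: unknown,
  tokenFor: (v: string) => string
): unknown {
  if (Array.isArray(obj)) return obj.map((v) => applyStructuralRules(v, tokenFor));
  if (obj !== null && typeof obj === 'object') {
    return Object.fromEntries(
      Object.entries(obj).map(([k, v]) =>
        SENSITIVE_FIELDS.has(k) && typeof v === 'string'
          ? [k, tokenFor(v)]
          : [k, applyStructuralRules(v, tokenFor)]
      )
    );
  }
  return obj;
}
```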

&lt;h2&gt;
  
  
  A note on order and replacement
&lt;/h2&gt;

&lt;p&gt;The detection regexes return &lt;code&gt;(type, value, index)&lt;/code&gt; triples. To turn them into a sanitized payload, you have to replace each match with a placeholder without invalidating the indices of subsequent matches.&lt;/p&gt;

&lt;p&gt;The naive approach replaces left-to-right and adjusts every later index by the length delta. This works but is fiddly. The cleaner shape is to sort matches by index and replace right-to-left, which leaves earlier indices untouched. The cleanest shape is to split the original string on the match boundaries and join with placeholders, which sidesteps index arithmetic entirely.&lt;/p&gt;

&lt;p&gt;Whichever you pick, the relevant invariant is: &lt;strong&gt;deduplicate matches that overlap.&lt;/strong&gt; The high-entropy pattern can match a substring that also matches the email pattern (a long enough local-part). Pick one — the more specific pattern wins, every time — and discard the other before replacement.&lt;/p&gt;
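&lt;p&gt;The right-to-left replacement with overlap dedup fits in one function. A sketch over the &lt;code&gt;(type, value, index)&lt;/code&gt; triples; the specificity ranking is an assumption you would set to match your own pattern set:&lt;/p&gt;

```typescript
type PiiMatch = { type: string; value: string; index: number };

// Lower number = more specific; the more specific pattern wins on overlap.
const SPECIFICITY: Record<string, number> = {
  ssn: 0, email: 1, ipv4: 2, ipv6: 3, phone: 4, high_entropy: 5,
};

function sanitize(
  payload: string,
  matches: PiiMatch[],
  tokenFor: (m: PiiMatch) => string
): string {
  // Keep the most specific claim on each span; drop anything overlapping it.
  const kept: PiiMatch[] = [];
  const bySpecificity = [...matches].sort(
    (a, b) => (SPECIFICITY[a.type] ?? 99) - (SPECIFICITY[b.type] ?? 99)
  );
  for (const m of bySpecificity) {
    const overlaps = kept.some(
      (k) => m.index < k.index + k.value.length && k.index < m.index + m.value.length
    );
    if (!overlaps) kept.push(m);
  }
  // Replace right-to-left so earlier indices stay valid.
  let out = payload;
  for (const m of kept.sort((a, b) => b.index - a.index)) {
    out = out.slice(0, m.index) + tokenFor(m) + out.slice(m.index + m.value.length);
  }
  return out;
}
```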

&lt;h2&gt;
  
  
  The shipping bar
&lt;/h2&gt;

&lt;p&gt;If you're building this for production, the bar to clear before you trust the detector with real traffic:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A hand-labeled sample of 200+ alerts from your actual pipeline&lt;/strong&gt;, not synthetic data. Run the detector against it and count false negatives by category. If any category's false-negative rate exceeds 5%, fix the regex or add a structural rule before shipping.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A way to measure false-positive rate continuously in production.&lt;/strong&gt; A weekly report of "tokens issued per category, normalized to traffic volume" — sudden spikes mean the regex started matching something it didn't before.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A reveal flow&lt;/strong&gt; for the engineer who needs to see the original. Without a fast reveal flow, the false-positive rate stops being free — every false positive becomes a stalled investigation for the on-call engineer who needs to know what &lt;code&gt;&amp;lt;TOKEN_a1b2c3&amp;gt;&lt;/code&gt; actually was.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first regex you ship will not be the last one. Build the iteration loop in from the start.&lt;/p&gt;




&lt;p&gt;The pattern set above is the production set as of the date on this post. If you want to read the full module — including the detector function, the sort-by-index ordering, and the rationale comments — it's at &lt;code&gt;packages/shared/src/pii-detect.ts&lt;/code&gt; in the &lt;a href="https://theculprit.ai" rel="noopener noreferrer"&gt;Culprit&lt;/a&gt; repo. The companion piece, on the rest of the pipeline (encrypt-at-ingest, per-tenant token dictionary, audited reveal), is &lt;a href="https://theculprit.ai/blog/keep-pii-out-of-alert-pipeline" rel="noopener noreferrer"&gt;How to keep PII out of your alert pipeline&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>privacy</category>
      <category>programming</category>
      <category>security</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to keep PII out of your alert pipeline</title>
      <dc:creator>Stella Lin</dc:creator>
      <pubDate>Fri, 08 May 2026 11:47:34 +0000</pubDate>
      <link>https://dev.to/stella_lin_82914c71e25769/how-to-keep-pii-out-of-your-alert-pipeline-2cod</link>
      <guid>https://dev.to/stella_lin_82914c71e25769/how-to-keep-pii-out-of-your-alert-pipeline-2cod</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://theculprit.ai/blog/keep-pii-out-of-alert-pipeline" rel="noopener noreferrer"&gt;theculprit.ai/blog/keep-pii-out-of-alert-pipeline&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Every alert that crosses your wire has a PII problem you have not yet acknowledged.&lt;/p&gt;

&lt;p&gt;This is not a sermon. It is a thing you can verify in about ten minutes. Open the last hour of whatever your team uses for paging — Slack, email, a webhook fan-out, a chat-ops channel — and grep for &lt;code&gt;@&lt;/code&gt;. You will find customer email addresses in stack traces. You will find them in user-supplied form data echoed back into the error message. You will find them in the body of "user X did Y and it failed" notifications that someone wrote in 2022 and nobody has touched since. Now grep for &lt;code&gt;192.&lt;/code&gt;, &lt;code&gt;10.&lt;/code&gt;, &lt;code&gt;172.16.&lt;/code&gt; — there are your customer IPs, surfaced into a chat tool whose retention you do not control. Now grep for &lt;code&gt;Bearer&lt;/code&gt; — those are the API tokens your authentication middleware accidentally included in the panic dump.&lt;/p&gt;

&lt;p&gt;The reason this happens is not negligence. The reason it happens is that the alert pipeline is the one place in your stack where the rules of "what data is allowed to leave the boundary" were never written down. The application layer has an ORM that knows which fields are PII. The data warehouse has a column-level access policy. The customer-facing API has a serializer with explicit field allowlists. The alert pipeline has a &lt;code&gt;console.error(err)&lt;/code&gt; and the assumption that whoever reads the alert is sufficiently trustworthy. That assumption stops being true the moment the alert routes to a third-party LLM, a vendor support portal, or a Slack workspace whose member list has drifted.&lt;/p&gt;

&lt;p&gt;This piece is about what to actually do about that.&lt;/p&gt;

&lt;h2&gt;
  
  
  01 — Why the obvious fixes don't survive contact with reality
&lt;/h2&gt;

&lt;p&gt;There are three obvious fixes. All three fail in interesting ways. Walking through why is most of the work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix one: strip PII at the application layer before the alert is emitted.&lt;/strong&gt; The pitch is clean — every &lt;code&gt;logger.error()&lt;/code&gt; call site gets wrapped in a sanitizer that knows the domain types and redacts accordingly. In practice this fails along two axes. The first is enforcement: you cannot reliably grep your codebase for "every place a user-supplied string ends up in an error path." Stack traces capture local variables in some runtimes. Third-party libraries throw errors whose messages include the offending input verbatim. A request-validation middleware throws &lt;code&gt;Invalid email: customer@example.com&lt;/code&gt; and there is no place in your application code where you "decided" to log that. The second is drift: even if you ship a sanitizer today, the next engineer adds a new field, a new exception type, a new integration that emits its own errors, and the sanitizer's allowlist quietly stops keeping up. The fix degrades to a security checklist item that nobody owns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix two: drop alerts that contain PII.&lt;/strong&gt; This sounds principled until you remember why you have an alert pipeline. The whole point is that something is broken, and the alert is your evidence. If your detector flags an alert as containing PII and your response is to discard it, you have built a system that hides the bugs that involve customer data — which is approximately every interesting bug. The detector also has false positives. Drop on false positive and you have built a system that drops random alerts at random rates. This fix tends to get rolled out enthusiastically, generate one P0 about a missed page, and get rolled back within a week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix three: redact in the LLM prompt.&lt;/strong&gt; This one is recent. The shape is "we use an AI tool to triage incidents; we'll add a redaction layer in front of the prompt." It fails because by the time the data reaches that redaction layer, it has already been written to your alert store, your queue, your log aggregator, your chat tool, and probably your email. The LLM prompt is not the boundary. The ingest path is the boundary. Redacting at the prompt is solving a symptom three hops downstream of the cause.&lt;/p&gt;

&lt;p&gt;The pattern in all three is the same: each fix tries to add a filter at one specific point in the pipeline, while the real problem is that the pipeline has many exit points and the data is in plaintext at all of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  02 — The shape of a fix that does work
&lt;/h2&gt;

&lt;p&gt;The architecture worth building has four properties. None of them are novel. Most of them appear in compliance-driven sectors — healthcare, payments — where the obvious fixes have been tried and discarded for thirty years. They are unfamiliar to the observability stack only because observability has historically been treated as an internal tool, not as a data plane that crosses trust boundaries.&lt;/p&gt;

&lt;p&gt;The properties:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The boundary is at ingest, not at presentation.&lt;/strong&gt; The first thing that happens to an inbound alert is encryption-and-vault. Plaintext does not survive the receiving handler.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PII is replaced with reversible placeholders before any downstream consumer sees the payload.&lt;/strong&gt; Correlation, LLM analysis, notification, log aggregation — all of these operate on a sanitized event whose fields read like &lt;code&gt;&amp;lt;TOKEN_a1b2c3&amp;gt;&lt;/code&gt; rather than &lt;code&gt;paula.holman@acme.com&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The placeholder ↔ value map is per-tenant and encrypted with a tenant-scoped key.&lt;/strong&gt; A leak of one tenant's vault cannot unlock another tenant's tokens. A leak of the application database without the per-tenant key cannot unlock anything.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reveal is a single audited route.&lt;/strong&gt; When an authorized user needs to see the original value — debugging a customer-reported incident, responding to a subpoena — they do so through one endpoint that checks tenant scope on every call and writes to an append-only audit log.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The composite property: there is no path through the system that produces plaintext PII as a byproduct. Producing plaintext requires a deliberate, authenticated, audited request. That is the bar.&lt;/p&gt;

&lt;p&gt;Below is a sketch of the data flow. It is deliberately minimal — every edge here is load-bearing, and every node corresponds to a thing you have to either build or buy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftheculprit.ai%2Fblog%2Fpii-pipeline.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftheculprit.ai%2Fblog%2Fpii-pipeline.svg" alt="Data flow diagram: HMAC-signed webhook hits the edge ingest worker, which encrypts the payload into the vault and enqueues it. The tokenizer worker swaps PII for placeholders and fans the sanitized event out to correlation, LLM RCA, and notification — all token-only. Below a dashed trust boundary, the audited reveal route is the single authenticated path back to plaintext."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two observations about this sketch. First, every arrow that leaves the tokenizer carries &lt;code&gt;&amp;lt;TOKEN_…&amp;gt;&lt;/code&gt; placeholders, including the one to the LLM. Second, the &lt;code&gt;REVEAL ROUTE&lt;/code&gt; is the only edge in the entire diagram where plaintext crosses a process boundary, and that crossing is audited.&lt;/p&gt;

&lt;h2&gt;
  
  
  03 — The four hard parts
&lt;/h2&gt;

&lt;p&gt;The architecture above is easy to describe and tedious to build. Each of the four properties from §02 has a "how do we actually" attached to it that takes a week or two to get right.&lt;/p&gt;

&lt;h3&gt;
  
  
  03.1 — Encrypt before vault, with no plaintext window
&lt;/h3&gt;

&lt;p&gt;The naive ingest handler looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;raw_alerts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;alert_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ok&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is wrong in a way that is not visible from the code. The &lt;code&gt;raw_alerts&lt;/code&gt; table holds plaintext. Every backup of that table holds plaintext. Every read replica holds plaintext. Anyone with read access to the application database — your DBA, your ops engineer with break-glass credentials, the support engineer who joined last month and got read-only access by default — has plaintext access to every alert your customers have ever sent. The encryption needs to be inside the same statement as the insert, with the key sourced from a place that the application database does not itself hold.&lt;/p&gt;

&lt;p&gt;The fixed shape uses Postgres' &lt;code&gt;pgcrypto&lt;/code&gt; extension and a server-held symmetric key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rawBody&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rpc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;vault_alert&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;p_payload_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rawBody&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;p_signature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-aiops-signature&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;rejected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="cm"&gt;/* opaque pointer, no body */&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ok&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;vault_alert&lt;/code&gt; RPC, defined once in a migration, performs the HMAC verification and the &lt;code&gt;pgp_sym_encrypt(p_payload_text, current_setting('app.vault_key'))&lt;/code&gt; insert atomically. The application code never holds the key. The database never holds plaintext. The queue carries an opaque identifier — if the queue gets backed up, leaks, or is replayed, no PII is exposed.&lt;/p&gt;

&lt;p&gt;The instinct to skip this step and "just trust the database" is strong. Resist it. The threat model is not "an attacker has root on Postgres." The threat model is "a backup ends up in S3 with the wrong ACL." Encryption-at-rest provided by the platform does not protect you from that. Per-row encryption with an application-held key does.&lt;/p&gt;

&lt;h3&gt;
  
  
  03.2 — A per-tenant token dictionary that does not become a privacy hole itself
&lt;/h3&gt;

&lt;p&gt;The tokenizer's job is to walk a payload, find every PII match, and emit a sanitized version where each match is replaced with a placeholder. The natural shape is a dictionary table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;token_dictionary&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;tenant_id&lt;/span&gt;    &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;placeholder&lt;/span&gt;  &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;-- e.g. "TOKEN_a1b2c3"&lt;/span&gt;
  &lt;span class="n"&gt;encrypted_value&lt;/span&gt; &lt;span class="n"&gt;bytea&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;pii_type&lt;/span&gt;     &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;-- "email" | "ipv4" | ...&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt;   &lt;span class="n"&gt;timestamptz&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;placeholder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are three traps here. The first is using a global placeholder space — if &lt;code&gt;TOKEN_a1b2c3&lt;/code&gt; means &lt;code&gt;customer@acme.com&lt;/code&gt; for tenant A and &lt;code&gt;192.168.1.1&lt;/code&gt; for tenant B, you have inadvertently built a side channel where tenant B's reveal endpoint can confirm whether a given placeholder was issued in tenant A. Always scope the lookup by &lt;code&gt;(tenant_id, placeholder)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The second is encrypting the value with a single global key. The dictionary is, by construction, a high-value target — it is the table that, if leaked, undoes all the work above. The encryption key for &lt;code&gt;encrypted_value&lt;/code&gt; should be derived per-tenant, ideally from a master key combined with &lt;code&gt;tenant_id&lt;/code&gt;. A leak of the dictionary alone yields ciphertext you cannot decrypt without the master key. A leak of the master key alone yields nothing without the dictionary. You have to lose both.&lt;/p&gt;
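One way to sketch the per-tenant derivation with Node's crypto module — a single HMAC step here for brevity; a full HKDF is the more standard choice, and the domain-separation label is an illustrative assumption:

```typescript
import { createHmac } from 'node:crypto';

// Derive a tenant-scoped key from a master key plus the tenant_id.
// Neither artifact alone decrypts anything: a dictionary leak yields
// ciphertext, a master-key leak yields nothing without the rows.
export function tenantKey(masterKey: Buffer, tenantId: string): Buffer {
  return createHmac('sha256', masterKey)
    .update(`tenant-dictionary-key:${tenantId}`) // domain-separation label
    .digest(); // 32 bytes, usable as an AES-256 key
}
```

Because the derivation is deterministic, the tokenizer and the reveal route never need to store the per-tenant key at all — they re-derive it from the master key on each call.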

&lt;p&gt;The third is determinism. If you regenerate placeholders on every ingest, the same email address shows up under five different tokens across five alerts and your correlation engine can no longer tell that "the same customer keeps tripping the same bug." The fix: hash &lt;code&gt;(tenant_id, normalized_value)&lt;/code&gt; and use the hash as the placeholder identifier. Same value within a tenant → same placeholder, every time. Different value or different tenant → different placeholder.&lt;/p&gt;
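The deterministic placeholder can be sketched the same way; the normalization step and the twelve-hex-char truncation are illustrative choices, not the production values:

```typescript
import { createHash } from 'node:crypto';

// Same (tenant, value) pair always yields the same placeholder, so the
// correlation engine can group repeat offenders without seeing the value.
export function placeholderFor(tenantId: string, rawValue: string): string {
  const normalized = rawValue.trim().toLowerCase(); // illustrative normalization
  const digest = createHash('sha256')
    .update(`${tenantId}\u0000${normalized}`) // NUL separator avoids ambiguity
    .digest('hex');
  return `TOKEN_${digest.slice(0, 12)}`; // truncation trades length vs. collisions
}
```

Truncating the digest is a tradeoff: shorter placeholders read better in alerts, but raise the collision probability within a tenant. The `(tenant_id, placeholder)` primary key on the dictionary table surfaces a collision as an insert conflict rather than a silent mismap.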

&lt;h3&gt;
  
  
  03.3 — Match gracefully: false positives are noise, false negatives are leaks
&lt;/h3&gt;

&lt;p&gt;The detection layer is a regex set. Six patterns will catch most of what a typical alert payload contains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;PII_PATTERNS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;email&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b[&lt;/span&gt;&lt;span class="sr"&gt;A-Za-z0-9._%+-&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+@&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;A-Za-z0-9.-&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;\.[&lt;/span&gt;&lt;span class="sr"&gt;A-Za-z&lt;/span&gt;&lt;span class="se"&gt;]{2,}\b&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ipv4&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b(?:(?:&lt;/span&gt;&lt;span class="sr"&gt;25&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;0-5&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;|2&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;0-4&lt;/span&gt;&lt;span class="se"&gt;]\d&lt;/span&gt;&lt;span class="sr"&gt;|&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;01&lt;/span&gt;&lt;span class="se"&gt;]?\d\d?)\.){3}(?:&lt;/span&gt;&lt;span class="sr"&gt;25&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;0-5&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;|2&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;0-4&lt;/span&gt;&lt;span class="se"&gt;]\d&lt;/span&gt;&lt;span class="sr"&gt;|&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;01&lt;/span&gt;&lt;span class="se"&gt;]?\d\d?)\b&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ipv6&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b(?:[&lt;/span&gt;&lt;span class="sr"&gt;0-9a-fA-F&lt;/span&gt;&lt;span class="se"&gt;]{1,4}&lt;/span&gt;&lt;span class="sr"&gt;:&lt;/span&gt;&lt;span class="se"&gt;){7}[&lt;/span&gt;&lt;span class="sr"&gt;0-9a-fA-F&lt;/span&gt;&lt;span class="se"&gt;]{1,4}\b&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;phone&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;(?:\+?&lt;/span&gt;&lt;span class="sr"&gt;1&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;-.&lt;/span&gt;&lt;span class="se"&gt;\s]?)?(?:\(?\d{3}\)?[&lt;/span&gt;&lt;span class="sr"&gt;-.&lt;/span&gt;&lt;span class="se"&gt;\s]?)?\d{3}[&lt;/span&gt;&lt;span class="sr"&gt;-.&lt;/span&gt;&lt;span class="se"&gt;\s]?\d{4}\b&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ssn&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b\d{3}&lt;/span&gt;&lt;span class="sr"&gt;-&lt;/span&gt;&lt;span class="se"&gt;\d{2}&lt;/span&gt;&lt;span class="sr"&gt;-&lt;/span&gt;&lt;span class="se"&gt;\d{4}\b&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;high_entropy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\b[&lt;/span&gt;&lt;span class="sr"&gt;A-Za-z0-9+&lt;/span&gt;&lt;span class="se"&gt;/&lt;/span&gt;&lt;span class="sr"&gt;=_-&lt;/span&gt;&lt;span class="se"&gt;]{40,}\b&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two notes on this set. The phone pattern is intentionally loose — it will match some things that look phone-shaped but are not (long order numbers, tracking IDs). That is the right tradeoff. A false positive becomes an opaque token in the alert; the engineer can still triage the alert and, if needed, reveal the original value. A false negative becomes a customer phone number sitting in your Slack history forever. The asymmetry is real. Tune toward false positives.&lt;/p&gt;

&lt;p&gt;The high-entropy bucket exists because the most consequential leaks are not patterns we know about — they are bearer tokens, session IDs, and API keys whose format depends on whoever issued them. A 40+ character base64-ish blob is not always a credential, but it is almost never something you want surfaced to a third party. The threshold of 40 is tuned to skip UUID-without-dashes (32 chars) and short build SHAs while still catching JWTs and most provider tokens. Lower the threshold and you'll start catching legitimate identifiers; raise it and you'll start missing short access keys. There is no universally correct value. Pick one, instrument the false-positive rate against your traffic, and iterate.&lt;/p&gt;
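The threshold behavior is easy to check directly against the `high_entropy` pattern from the set above — a 32-character UUID-without-dashes falls under the bar, a 43-character base64url blob (the shape of a JWT signature segment) clears it:

```typescript
// Same pattern as the high_entropy entry in PII_PATTERNS above.
const HIGH_ENTROPY = /\b[A-Za-z0-9+\/=_-]{40,}\b/g;

// 32 chars: below the 40-char threshold, deliberately skipped.
const uuidNoDashes = 'f47ac10b58cc4372a5670e02b2c3d479';

// 43 chars of base64url (JWT-signature shape): matched.
const sigSegment = 'dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk';

const hits = (s: string) => s.match(HIGH_ENTROPY) ?? [];
```

Note that a full JWT is three segments joined by dots, and `.` is not in the character class — each segment is tested independently, so short header segments pass through while payload and signature segments long enough to carry anything interesting get caught.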

&lt;p&gt;What this set does not catch: free-text personal names ("John Smith"), structured-but-non-regex identifiers (most account numbers, license plates), and natural-language disclosures ("the customer mentioned their address is …"). For these you need either an ML-based classifier in front of the regex set or a structural rule that says "fields named &lt;code&gt;customer_name&lt;/code&gt;, &lt;code&gt;address&lt;/code&gt;, &lt;code&gt;notes&lt;/code&gt; always tokenize regardless of contents." The structural rule is cheaper, more predictable, and much easier to audit.&lt;/p&gt;

&lt;h3&gt;
  
  
  03.4 — Reveal as a single audited route
&lt;/h3&gt;

&lt;p&gt;Engineers will, eventually, need to see the original value. That is fine. The mistake is to scatter "decrypt this token" calls throughout the application. The right shape is a single route:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /api/incidents/:id/reveal
  body: { placeholders: ["TOKEN_a1b2c3", "TOKEN_b2c3d4"] }
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The route checks: (a) the requesting user is authenticated; (b) the user's tenant matches the incident's tenant; (c) the user has the appropriate role for reveal (operator, admin); (d) the incident is one the user is permitted to view at all. Then, and only then, it pulls the encrypted values from the per-tenant dictionary, decrypts them server-side, returns plaintext over HTTPS, and writes a row to the audit log: who, when, which placeholders, which incident.&lt;/p&gt;
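The four checks compose into a single guard that runs before any decryption. A sketch as a pure function — the request shape, role names, and reason strings are illustrative assumptions, not the repo's types:

```typescript
interface RevealRequest {
  userId: string;           // empty string = unauthenticated (illustrative)
  tenantId: string;         // requester's tenant
  roles: string[];
  incidentTenantId: string; // tenant that owns the incident
  canViewIncident: boolean; // resolved by the caller's normal ACL check
}

type Verdict = { ok: true } | { ok: false; reason: string };

const REVEAL_ROLES = new Set(['operator', 'admin']); // illustrative role names

// All four checks must pass before the route touches the dictionary;
// the audit row is written only on the ok path, after decryption succeeds.
export function authorizeReveal(req: RevealRequest): Verdict {
  if (!req.userId) return { ok: false, reason: 'unauthenticated' };
  if (req.tenantId !== req.incidentTenantId) return { ok: false, reason: 'tenant mismatch' };
  if (!req.roles.some((r) => REVEAL_ROLES.has(r))) return { ok: false, reason: 'missing reveal role' };
  if (!req.canViewIncident) return { ok: false, reason: 'incident not visible' };
  return { ok: true };
}
```

Keeping the guard pure makes the denial paths trivially testable, which matters here more than usual: a bug in this function is a cross-tenant disclosure, not a 403.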

&lt;p&gt;The audit log is not optional. It is the thing that turns "I trust my own employees" from an aspiration into something you can demonstrate to an auditor. It is also what lets you answer the question "did anyone access this customer's data between 2026-04-12 and 2026-04-19?" without combing through application logs. Build it as part of the reveal route from day one; retrofitting it later is annoying.&lt;/p&gt;

&lt;p&gt;One subtle point: the reveal route should not return the entire decrypted payload by default. The caller has to name the placeholders they want. This sounds like friction, but it is what makes the audit log useful — instead of "user accessed incident 1234" (which tells you nothing about what was exposed), you get "user accessed &lt;code&gt;TOKEN_a1b2c3&lt;/code&gt; (email) and &lt;code&gt;TOKEN_b2c3d4&lt;/code&gt; (ipv4) on incident 1234" (which tells you exactly what data crossed the boundary). The friction is the point.&lt;/p&gt;

&lt;h2&gt;
  
  
  04 — What you give up
&lt;/h2&gt;

&lt;p&gt;This is honest-tradeoffs time. Edge tokenization is not free.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency.&lt;/strong&gt; The ingest handler now does an HMAC verify, an encrypted insert, and a queue publish before returning 200. On a warm Worker with the database in the same region, this is roughly 30–80ms more than a no-op write. For most alert pipelines, this is invisible — the sender does not care whether the 200 takes 5ms or 80ms. For pipelines where the sender is a tight retry loop with a low timeout, you may need to instrument and tune.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost.&lt;/strong&gt; You are paying for: a worker invocation per alert, a queue message per alert, an encrypted column write per alert, a tokenizer invocation per alert, a sanitized-event write per alert. None of these are individually expensive. At a million alerts a day, the storage and compute add up to dollars, not hundreds of dollars. The expensive thing in this architecture is the LLM call for root-cause analysis, and that is bounded separately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Operational complexity.&lt;/strong&gt; You now have a key-rotation procedure. You have a backup-and-restore procedure that has to handle the dictionary as a separate concern. You have a "what do we do if the per-tenant key is lost" runbook (answer: you cannot recover the data, which is the point). You have an audit log that needs its own retention policy. None of these are crushing — they are all things compliance-driven sectors have been doing for decades — but they are real engineering work.&lt;/p&gt;

&lt;h2&gt;05 — What you get back&lt;/h2&gt;

&lt;p&gt;The thing you get back is not "compliance." Compliance is a side effect. The thing you get back is that the question "what would happen if our chat tool's history were leaked" stops requiring a long answer. You can route alerts to a third-party LLM without first negotiating a BAA, because there is no PHI in the prompt. You can grant a new vendor — error tracking, on-call, an analytics dashboard — read access to your alert stream without staging a security review for each one, because the alert stream does not contain customer data. You can ship faster.&lt;/p&gt;

&lt;p&gt;The compliance posture follows. SOC 2 Type II's common criteria around confidentiality become almost mechanically satisfiable: encryption at rest (vault), in transit (HTTPS only), and a documented incident-response procedure with a 72-hour notification window. HIPAA's technical safeguards — access control, audit controls, integrity, person/entity authentication, transmission security — map cleanly onto the routes and tables described above. None of this means you are certified. It means that when you decide to pursue certification, the architecture work is already done and the auditor's questions become procedural rather than structural.&lt;/p&gt;

&lt;p&gt;The other thing you get back is code-review time for the rest of your team. Every PR no longer has to be scrutinized for "did this introduce a path that logs customer data." That path doesn't exist. There is one place in the codebase where plaintext appears, and it is a route handler with seventeen lines of authorization logic in front of it.&lt;/p&gt;

&lt;h2&gt;06 — Where to start, if you want to start&lt;/h2&gt;

&lt;p&gt;If you're convinced and looking at your own pipeline, the order matters. Do not start with the tokenizer; start with the vault.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pick the first byte of plaintext PII to remove from a downstream consumer. A reasonable first target is your incident-management tool, since it is usually the most-shared and the least-controlled. The goal of week one is "no plaintext customer email addresses in any incident-management notification we send."&lt;/li&gt;
&lt;li&gt;Build the encryption-at-ingest path. Move every alert-emitting service to write to the encrypted vault first; let the existing pipeline keep running off the encrypted blob, decrypted in the consumer. This is the dangerous step — get this wrong and you lose alerts. Run it in shadow mode for a week.&lt;/li&gt;
&lt;li&gt;Build the tokenizer. Start with the four highest-confidence patterns (email, IPv4, IPv6, SSN). Run it against historical traffic; measure your false-positive and false-negative rates against a hand-labeled sample of 200 alerts. Iterate the regex set until you can defend the numbers.&lt;/li&gt;
&lt;li&gt;Cut over the downstream consumers to the sanitized events. Keep the vault as a fallback for "we lost an alert" debugging.&lt;/li&gt;
&lt;li&gt;Build the reveal route and the audit log. Migrate any "I'll just SSH in and SELECT it" workflows to use the route. Turn on RLS and the lint rules that prevent the back door from being reopened.&lt;/li&gt;
&lt;/ol&gt;
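&lt;p&gt;For step 3, a first cut at the tokenizer can be sketched as follows. The regexes are deliberately simple starting points, not production-grade (IPv6 in particular has forms this misses), and the &lt;code&gt;TOKEN_&lt;/code&gt; naming is illustrative; this is exactly the thing you run against the hand-labeled sample and iterate on.&lt;/p&gt;

```typescript
import { createHash } from "node:crypto";

// First-pass patterns for the four highest-confidence PII types.
// Order matters slightly: emails are replaced before IP patterns so a
// later pattern never matches inside an already-tokenized value.
const PATTERNS: [string, RegExp][] = [
  ["email", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  ["ipv4", /\b(?:\d{1,3}\.){3}\d{1,3}\b/g],
  ["ipv6", /\b(?:[0-9a-fA-F]{1,4}:){2,7}[0-9a-fA-F]{1,4}\b/g],
  ["ssn", /\b\d{3}-\d{2}-\d{4}\b/g],
];

function tokenize(text: string) {
  const found: { [token: string]: string } = {}; // token -> original value
  let sanitized = text;
  for (const [kind, re] of PATTERNS) {
    sanitized = sanitized.replace(re, (value) => {
      // Deterministic token: the same value always maps to the same placeholder.
      const token = "TOKEN_" + createHash("sha256").update(value).digest("hex").slice(0, 6);
      found[token] = value;
      return "[" + kind + ":" + token + "]";
    });
  }
  return { sanitized, found };
}
```

&lt;p&gt;Hashing the value to derive the token keeps placeholders stable across alerts, which is what lets downstream consumers correlate incidents without ever seeing the plaintext.&lt;/p&gt;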

&lt;p&gt;You can be at step 5 in six to eight weeks of focused work. The thing that gates the timeline is not the engineering — it is figuring out which integrations in your existing pipeline are actually emitting PII, and what the structural rules need to be to stop them. That part is investigative.&lt;/p&gt;

&lt;p&gt;If you'd rather not build the pipeline yourself, that is exactly what &lt;a href="https://theculprit.ai" rel="noopener noreferrer"&gt;Culprit&lt;/a&gt; is. The architecture above is a description of what we ship. The piece is here because we want the architecture to be the default approach to alerting, not a thing you have to discover by getting bitten.&lt;/p&gt;

</description>
      <category>pii</category>
      <category>observability</category>
      <category>tokenization</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
