<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Drew Kitchell</title>
    <description>The latest articles on DEV Community by Drew Kitchell (@dk970).</description>
    <link>https://dev.to/dk970</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3866546%2F98c059e0-4a6e-4360-9a55-7995a47ca0ff.jpg</url>
      <title>DEV Community: Drew Kitchell</title>
      <link>https://dev.to/dk970</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dk970"/>
    <language>en</language>
    <item>
      <title>We kept leaking PII into test data. Here’s what actually fixed it.</title>
      <dc:creator>Drew Kitchell</dc:creator>
      <pubDate>Tue, 07 Apr 2026 21:10:31 +0000</pubDate>
      <link>https://dev.to/dk970/we-kept-leaking-pii-into-test-data-heres-what-actually-fixed-it-51gm</link>
      <guid>https://dev.to/dk970/we-kept-leaking-pii-into-test-data-heres-what-actually-fixed-it-51gm</guid>
      <description>&lt;p&gt;We accidentally committed real user emails into test fixtures.&lt;/p&gt;

&lt;p&gt;More than once.&lt;/p&gt;

&lt;p&gt;Not because we didn’t know better—but because the system allowed it.&lt;/p&gt;

&lt;h2&gt;Why this keeps happening&lt;/h2&gt;

&lt;p&gt;If you’re working with real data pipelines, this is pretty easy to fall into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;someone copies production data “just for testing”&lt;/li&gt;
&lt;li&gt;CSV fixtures get reused across environments&lt;/li&gt;
&lt;li&gt;test data slowly drifts toward real data over time&lt;/li&gt;
&lt;li&gt;everyone assumes someone else cleaned it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing malicious—just normal workflow shortcuts.&lt;/p&gt;

&lt;h2&gt;What didn’t work&lt;/h2&gt;

&lt;p&gt;We tried the obvious things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;manual review&lt;/li&gt;
&lt;li&gt;“be careful” reminders&lt;/li&gt;
&lt;li&gt;catching it in PR comments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of that held up.&lt;/p&gt;

&lt;p&gt;If it makes it into a PR, it’s already too late.&lt;/p&gt;

&lt;h2&gt;What actually worked&lt;/h2&gt;

&lt;p&gt;We stopped treating this as a review problem and started treating it as a build-time failure.&lt;/p&gt;

&lt;p&gt;Specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scan for high-risk patterns (emails, tokens, etc.)&lt;/li&gt;
&lt;li&gt;fail CI on detection&lt;/li&gt;
&lt;li&gt;require an explicit override if someone really needs to push&lt;/li&gt;
&lt;/ul&gt;
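
&lt;p&gt;The scan / fail-CI / override steps can be sketched in a few lines of Python. This is a minimal illustration, not the actual tool: the regexes, the file handling, and the &lt;code&gt;PII_SCAN_OVERRIDE&lt;/code&gt; environment variable are all assumptions made for the example.&lt;/p&gt;

```python
# Sketch of a build-time PII gate. Deterministic: same input, same result;
# no ML, no network calls. The pattern set and override env var are
# illustrative choices, not a real tool's interface.
import os
import re
import sys
from pathlib import Path

# High-risk patterns. Real scanners carry many more (phone numbers,
# national IDs, cloud credentials, private keys, ...).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
}

def scan_file(path):
    """Return (pattern_name, line_number) for every high-risk match in a file."""
    findings = []
    text = Path(path).read_text(errors="ignore")
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((name, lineno))
    return findings

def main(paths):
    findings = []
    for p in paths:
        for name, lineno in scan_file(p):
            findings.append((p, name, lineno))
            print(f"{p}:{lineno}: high-risk pattern: {name}")
    # Fail the build on any finding, unless someone explicitly overrides.
    if findings and os.environ.get("PII_SCAN_OVERRIDE") != "1":
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

&lt;p&gt;Run it over tracked files (e.g. &lt;code&gt;python scan.py $(git ls-files)&lt;/code&gt;) and any finding without the override makes the job exit non-zero.&lt;/p&gt;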

&lt;p&gt;Once it breaks the build, people fix it immediately.&lt;/p&gt;

&lt;h2&gt;The bigger issue&lt;/h2&gt;

&lt;p&gt;The deeper problem isn’t just PII.&lt;/p&gt;

&lt;p&gt;It’s that most systems don’t have a way to enforce or prove what data is flowing through them.&lt;/p&gt;

&lt;p&gt;This shows up in a lot of places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;test data&lt;/li&gt;
&lt;li&gt;training data&lt;/li&gt;
&lt;li&gt;AI inputs/outputs&lt;/li&gt;
&lt;li&gt;downstream systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PII leakage is just one visible symptom.&lt;/p&gt;

&lt;h2&gt;What we ended up doing&lt;/h2&gt;

&lt;p&gt;We built a small local CLI to enforce this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deterministic pattern matching&lt;/li&gt;
&lt;li&gt;no network calls&lt;/li&gt;
&lt;li&gt;exits non-zero on high-risk findings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Runs locally and in CI so nothing slips through.&lt;/p&gt;
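
&lt;p&gt;As one possible CI wiring (a sketch, not the repo’s documented setup; &lt;code&gt;scan.py&lt;/code&gt; is a placeholder name for whatever local scanner you run):&lt;/p&gt;

```yaml
# Hypothetical GitHub Actions job: run the local scanner over tracked files.
# A non-zero exit from the scanner fails the build.
pii-gate:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Scan tracked files for high-risk patterns
      run: python scan.py $(git ls-files)
```

&lt;p&gt;The same command doubles as a local pre-commit check, so the gate behaves identically on a laptop and in the pipeline.&lt;/p&gt;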

&lt;p&gt;Repo: &lt;a href="https://github.com/certifieddata/pii-scan.git" rel="noopener noreferrer"&gt;https://github.com/certifieddata/pii-scan.git&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>privacy</category>
    </item>
  </channel>
</rss>
