<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Felipe Carvajal Brown</title>
    <description>The latest articles on DEV Community by Felipe Carvajal Brown (@fcarvajalbrown).</description>
    <link>https://dev.to/fcarvajalbrown</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3813778%2Fbe74c6e6-9d36-4bac-b311-1b61a0b3cfba.jpeg</url>
      <title>DEV Community: Felipe Carvajal Brown</title>
      <link>https://dev.to/fcarvajalbrown</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/fcarvajalbrown"/>
    <language>en</language>
    <item>
      <title>PII masking in Polars: MaskOps 2.0, and two metrics that lied to me</title>
      <dc:creator>Felipe Carvajal Brown</dc:creator>
      <pubDate>Thu, 11 Jun 2026 04:27:39 +0000</pubDate>
      <link>https://dev.to/fcarvajalbrown/pii-masking-in-polars-maskops-20-and-two-metrics-that-lied-to-me-2bh3</link>
      <guid>https://dev.to/fcarvajalbrown/pii-masking-in-polars-maskops-20-and-two-metrics-that-lied-to-me-2bh3</guid>
      <description>&lt;p&gt;MaskOps 2.0 shipped this week. Before I told anyone, I looked at my own numbers. Two of them were lying to me, in opposite directions.&lt;/p&gt;

&lt;p&gt;MaskOps is a Rust plugin for &lt;a href="https://pola.rs" rel="noopener noreferrer"&gt;Polars&lt;/a&gt; that does PII masking inside the dataframe: RUT, CPF, credit cards, IBANs, and thirty-odd more families, air-gapped, with no network call, ever. If you have reached for Microsoft Presidio and found it carries no Latin American identifiers, that is the gap MaskOps fills. It does check-digit-validated RUT, CPF, and CURP detection alongside the EU, US, and APAC families, as a native Polars expression. Version 2.0 is the enterprise line: configurable patterns, structured extraction, an audit pass that counts what it masked, and format-preserving encryption (GDPR Art. 4(5) pseudonymization) for the reversible cases. That part I was sure of. The numbers around it, less so.&lt;/p&gt;

&lt;h2&gt;
  
  
  The first number lied against me: the benchmark
&lt;/h2&gt;

&lt;p&gt;The last thing I checked was the benchmark table in my own README. It said MaskOps ran at 0.4× to 0.7× the speed of plain Python &lt;code&gt;re&lt;/code&gt;. Slower than the language I wrote it to replace.&lt;/p&gt;

&lt;p&gt;I almost opened the profiler. Instead I read the benchmark harness. I should have read it first.&lt;/p&gt;

&lt;p&gt;Here is what it did. For every family, "Credit Card", "EU", "LatAm", it ran the full masker. All thirty-five pattern families at once. Then it compared the time against a Python baseline that ran one regex for that family.&lt;/p&gt;

&lt;p&gt;So the "Credit Card" row timed MaskOps scanning for cards, phones, IBANs, Korean RRNs, and thirty-one others, against Python scanning for cards. The proof sat in the table the whole time: every MaskOps row took the same 2.3 seconds regardless of family, because it always did all the work. Only the Python column moved.&lt;/p&gt;

&lt;p&gt;I was timing my engine doing thirty-five times the work and calling it slow.&lt;/p&gt;

&lt;p&gt;Two fixes. The first was the benchmark, not the code: compare like-for-like. When the row says "Credit Card", mask credit cards, the same job the baseline does. MaskOps already supports selection, so it was one argument: &lt;code&gt;mask_pii("text", patterns=["credit_card"])&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The second was real. Most rows in real data contain no PII. Every pattern MaskOps detects needs a digit, or an &lt;code&gt;@&lt;/code&gt;. A row with neither cannot match anything. So before any regex, walk the bytes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;has_pii_candidate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="nf"&gt;.bytes&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.any&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="nf"&gt;.is_ascii_digit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;||&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sc"&gt;b'@'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If false, return the string untouched. On clean text this skips all thirty-five scans for the price of one pass over the bytes. Output does not change. The same 394 tests pass.&lt;/p&gt;
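
&lt;p&gt;To see why the output cannot change, here is a minimal Python sketch of the same idea. The card-like pattern and the &lt;code&gt;mask_plain&lt;/code&gt; and &lt;code&gt;mask_with_prefilter&lt;/code&gt; helpers are illustrative assumptions, not MaskOps code:&lt;/p&gt;

```python
import re

# Illustrative pattern only: four groups of four digits, e.g. a card-like number.
CARD_LIKE = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")


def has_pii_candidate(value: str) -> bool:
    """Cheap pre-check: a digit or an '@' must be present for any pattern to match."""
    return any(ch in "0123456789@" for ch in value)


def mask_plain(value: str) -> str:
    """Always run the regex."""
    return CARD_LIKE.sub(lambda m: "*" * len(m.group(0)), value)


def mask_with_prefilter(value: str) -> str:
    """Skip the regex entirely when the row cannot contain a match."""
    if not has_pii_candidate(value):
        return value
    return mask_plain(value)


rows = ["Nothing here.", "Card 4111 1111 1111 1111 on file", "mail me at a@b.io"]
# Both paths must agree on every row; only the amount of work differs.
assert [mask_with_prefilter(r) for r in rows] == [mask_plain(r) for r in rows]
```

&lt;p&gt;The pre-check can only say "no" when no pattern could have matched, so it trades one cheap scan for every regex it avoids.&lt;/p&gt;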

&lt;p&gt;PII masking in Polars, measured fairly. One million rows, median of three, against a pure-Python &lt;code&gt;re&lt;/code&gt; baseline with matching coverage:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Data profile&lt;/th&gt;
&lt;th&gt;Speedup vs Python&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;clean (no PII)&lt;/td&gt;
&lt;td&gt;11×–163×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mixed (50% PII)&lt;/td&gt;
&lt;td&gt;1.2×–3.2×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;dense (every row)&lt;/td&gt;
&lt;td&gt;1.3×–2.7×&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;One family still loses on dense data. The European ID set runs four separate regex passes, and a single combined Python regex edges it out, 0.9×. I left that in the README. A table with no losses is a table someone tuned until it lied.&lt;/p&gt;
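
&lt;p&gt;For anyone who wants to check a harness of their own, this is roughly what like-for-like timing looks like in plain Python. It is a sketch, not the MaskOps benchmark code: &lt;code&gt;median_seconds&lt;/code&gt;, the card-like pattern, and the sample rows are assumptions made for illustration.&lt;/p&gt;

```python
import re
import statistics
import time


def median_seconds(job, rows, repeats=3):
    """Run job(rows) `repeats` times and return the median wall-clock time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        job(rows)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Illustrative single-family baseline: one card-like regex, nothing else.
CARD_LIKE = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")


def python_cards_only(rows):
    return [CARD_LIKE.sub("****", row) for row in rows]


rows = ["Card 4111 1111 1111 1111 on file", "Nothing here."] * 1000
baseline = median_seconds(python_cards_only, rows)
# The candidate must be timed on the same rows doing the same single-family job;
# the speedup is then baseline / candidate.
```

&lt;p&gt;The point is the pairing: whatever job the baseline runs, the candidate runs that job and no other.&lt;/p&gt;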

&lt;h2&gt;
  
  
  The second number lied for me: the downloads
&lt;/h2&gt;

&lt;p&gt;The other number was downloads. I shipped the 1.7 through 2.0 releases in one short burst, and the PyPI counter jumped from about ten a day to three and a half thousand on release day. Several hundredfold, overnight.&lt;/p&gt;

&lt;p&gt;It would be easy to write "downloads are exploding." It would also be false.&lt;/p&gt;

&lt;p&gt;That spike sits exactly on the days I pushed releases. It is CI building wheels across the OS and Python matrix, mirrors syncing, bots crawling each new version. PyPI counts all of it. None of it is a person deciding to use the thing. Strip the release days and the real line is flat and small. Single digits, which is the honest state of a young project.&lt;/p&gt;

&lt;p&gt;So I am not going to tell you adoption is taking off. The download number is real and it is mostly noise, and pretending otherwise insults anyone who can open the same pypistats page I did.&lt;/p&gt;
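
&lt;p&gt;Stripping the release days is a few lines of Python. The counts below are made up for illustration, not real pypistats numbers, and &lt;code&gt;organic_median&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;

```python
import statistics

# Made-up daily counts for illustration; not real pypistats numbers.
daily_downloads = {
    "day-01": 9,
    "day-02": 7,
    "day-03": 3400,  # release day
    "day-04": 2900,  # release day
    "day-05": 8,
    "day-06": 6,
}
release_days = {"day-03", "day-04"}


def organic_median(counts, releases):
    """Median daily downloads after dropping the days a release was pushed."""
    kept = [n for day, n in counts.items() if day not in releases]
    return statistics.median(kept)


print(organic_median(daily_downloads, release_days))
```

&lt;p&gt;A median is used rather than a mean so that one missed release day cannot drag the figure back up.&lt;/p&gt;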

&lt;h2&gt;
  
  
  What I take from this
&lt;/h2&gt;

&lt;p&gt;Two instruments. One read low because it measured the wrong thing. One read high because it counted the wrong things. A metric is not a verdict. It is a measurement, and a measurement can be miscalibrated in your favor or against it, and you owe it to yourself to know which.&lt;/p&gt;

&lt;p&gt;Read the harness before the flame graph. Strip the release days before you celebrate the downloads. Then trust what is left.&lt;/p&gt;

&lt;p&gt;MaskOps is open source, MPL-2.0, on &lt;a href="https://pypi.org/project/maskops/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;. It does PII masking inside Polars, air-gapped, with check-digit validation so a random nine-digit number is not mistaken for an ID. It does not do named-entity recognition. The &lt;a href="https://github.com/fcarvajalbrown/MaskOps" rel="noopener noreferrer"&gt;source and the benchmark code&lt;/a&gt; are on GitHub. Run it. If your machine disagrees with mine, I want to know.&lt;/p&gt;
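
&lt;p&gt;For readers who have not met check-digit validation, here is the modulo-11 rule for the Chilean RUT as a short Python sketch. It illustrates the idea only and is not MaskOps's Rust implementation; the function names are mine.&lt;/p&gt;

```python
def rut_check_digit(number: int) -> str:
    """Modulo-11 verifier for a Chilean RUT body (the digits before the dash)."""
    total = 0
    weight = 2
    # Weights cycle 2, 3, 4, 5, 6, 7 starting from the rightmost digit.
    for digit in reversed(str(number)):
        total += int(digit) * weight
        weight = 2 if weight == 7 else weight + 1
    remainder = 11 - (total % 11)
    if remainder == 11:
        return "0"
    if remainder == 10:
        return "K"
    return str(remainder)


def is_valid_rut(text: str) -> bool:
    """Accepts forms like '12.345.678-5' or '12345678-5'."""
    body, _, verifier = text.replace(".", "").partition("-")
    return body.isdigit() and rut_check_digit(int(body)) == verifier.upper()


print(is_valid_rut("12.345.678-5"))  # True
print(is_valid_rut("12.345.678-9"))  # False
```

&lt;p&gt;A random digit string with the right shape passes this test only about one time in eleven, which is what keeps ordinary numbers from being masked as IDs.&lt;/p&gt;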

</description>
      <category>rust</category>
      <category>python</category>
      <category>polars</category>
      <category>privacy</category>
    </item>
    <item>
      <title>I dropped this for three months. Here's what I added when I came back.</title>
      <dc:creator>Felipe Carvajal Brown</dc:creator>
      <pubDate>Mon, 08 Jun 2026 16:28:44 +0000</pubDate>
      <link>https://dev.to/fcarvajalbrown/i-dropped-this-for-three-months-heres-what-i-added-when-i-came-back-285b</link>
      <guid>https://dev.to/fcarvajalbrown/i-dropped-this-for-three-months-heres-what-i-added-when-i-came-back-285b</guid>
      <description>&lt;h1&gt;
  
  
  I dropped this for three months. Here's what I added when I came back.
&lt;/h1&gt;

&lt;p&gt;I started MaskOps in March. It masks PII in Polars DataFrames using Rust — no Python per row, no NLP models, just regex running on Arrow buffers.&lt;/p&gt;

&lt;p&gt;Then I got hired. Cencosud S.A. The project sat untouched until last week.&lt;/p&gt;

&lt;p&gt;Coming back to it, I had a backlog. I shipped the one I kept thinking about at work: &lt;code&gt;mask_pii_audit&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with masking alone
&lt;/h2&gt;

&lt;p&gt;Masking answers "is this field safe to store?" It doesn't answer "what kind of PII just came through, and how much of it?"&lt;/p&gt;

&lt;p&gt;Compliance teams need both. They need the masked value and a count of what was found — by family — without running the column twice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What mask_pii_audit does
&lt;/h2&gt;

&lt;p&gt;It returns a nested Struct: the masked text, plus a count for each of the 33 PII families.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;polars&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;maskops&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;notes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Call me at 555-123-4567. SSN: 123-45-6789.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IBAN: DE89370400440532013000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Nothing here.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]})&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maskops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mask_pii_audit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;notes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;masked&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;counts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌───────────────────────────────┬────────────────────────────┐
│ masked                        ┆ counts                     │
╞═══════════════════════════════╪════════════════════════════╡
│ Call me at ***-***-****. SSN… ┆ {"phone": 1, "ssn": 1, …}  │
│ IBAN: DE89******************  ┆ {"iban": 1, …}             │
│ Nothing here.                 ┆ {"phone": 0, "ssn": 0, …}  │
└───────────────────────────────┴────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The masked output is identical to &lt;code&gt;mask_pii&lt;/code&gt;. A count of zero means that family did not match.&lt;/p&gt;

&lt;h2&gt;
  
  
  One pass
&lt;/h2&gt;

&lt;p&gt;The counting happens inside the existing &lt;code&gt;replace_all&lt;/code&gt; call. A &lt;code&gt;Cell&amp;lt;u32&amp;gt;&lt;/code&gt; in the closure increments on each validated match. No second scan, no cloned strings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
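&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;cell&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Cell&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;Captures&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Regex&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Returns the masked string and the number of matches that passed validation.&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;replace_counted&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Regex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;render&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;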
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;Fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Captures&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Cell&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0u32&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="nf"&gt;.replace_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="n"&gt;caps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Captures&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="nf"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;caps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;masked&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="nf"&gt;.set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="nf"&gt;.get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;masked&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nb"&gt;None&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;caps&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="nf"&gt;.to_string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="nf"&gt;.into_owned&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="nf"&gt;.get&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
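
&lt;p&gt;If Rust is not your daily language, the same one-pass trick reads like this in Python, where &lt;code&gt;re.sub&lt;/code&gt; accepts a callback. This is an analogue for illustration, not code from MaskOps; the SSN-shaped pattern and the &lt;code&gt;render_ssn&lt;/code&gt; rule are assumptions.&lt;/p&gt;

```python
import re
from typing import Callable, Optional, Tuple


def replace_counted(
    pattern: re.Pattern,
    text: str,
    render: Callable[[re.Match], Optional[str]],
) -> Tuple[str, int]:
    """Mask and count in one regex pass; render returns None to reject a match."""
    count = 0

    def on_match(match: re.Match) -> str:
        nonlocal count
        masked = render(match)
        if masked is None:
            return match.group(0)  # failed validation: leave the text as-is
        count += 1
        return masked

    return pattern.sub(on_match, text), count


# Illustrative rule: only mask SSN-shaped numbers whose area is not 000.
SSN_SHAPED = re.compile(r"\b(\d{3})-\d{2}-\d{4}\b")


def render_ssn(match: re.Match) -> Optional[str]:
    return None if match.group(1) == "000" else "***-**-****"


print(replace_counted(SSN_SHAPED, "a 123-45-6789 b 000-12-3456", render_ssn))
```

&lt;p&gt;The rejected match is written back unchanged and does not touch the counter, so the count reflects validated matches only.&lt;/p&gt;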



&lt;h2&gt;
  
  
  A daily audit pattern
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maskops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mask_pii_audit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;free_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;counts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ssn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ssn_total&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;counts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;credit_card&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cc_total&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;counts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iban&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iban_total&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run this at ingest. Log the totals. Alert if a family appears that shouldn't.&lt;/p&gt;
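
&lt;p&gt;The alerting half can be as small as the sketch below. The totals dictionary, the allow-list, and both function names are hypothetical; wire the return value into whatever alerting you already have.&lt;/p&gt;

```python
import logging

logger = logging.getLogger("pii_audit")


def unexpected_families(totals: dict, allowed: set) -> dict:
    """Return families with a non-zero count that are not expected in this feed."""
    return {
        family: count
        for family, count in totals.items()
        if count and family not in allowed
    }


def report(totals: dict, allowed: set) -> bool:
    """Log the totals; return True when something should raise an alert."""
    logger.info("pii totals: %s", totals)
    surprises = unexpected_families(totals, allowed)
    if surprises:
        logger.warning("unexpected PII families: %s", surprises)
    return bool(surprises)


# Hypothetical totals, shaped like the one-row frame the query above returns.
totals = {"ssn_total": 0, "cc_total": 12, "iban_total": 3}
print(report(totals, allowed={"iban_total"}))
```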




&lt;h2&gt;
  
  
  Where it stands
&lt;/h2&gt;

&lt;p&gt;v1.6.0. 33 PII families: EU IDs, US healthcare, LATAM nationals, APAC. Asterisk masking and FF3-1 format-preserving encryption. Polars lazy and streaming supported.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;maskops
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Source: &lt;a href="https://github.com/fcarvajalbrown/MaskOps" rel="noopener noreferrer"&gt;github.com/fcarvajalbrown/MaskOps&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy to answer questions.&lt;/p&gt;

</description>
      <category>polars</category>
      <category>rust</category>
      <category>python</category>
      <category>privacy</category>
    </item>
    <item>
      <title>MaskOps 0.1.0: A Native Polars Plugin for High-Speed PII Masking in Python</title>
      <dc:creator>Felipe Carvajal Brown</dc:creator>
      <pubDate>Mon, 09 Mar 2026 03:55:40 +0000</pubDate>
      <link>https://dev.to/fcarvajalbrown/maskops-010-a-native-polars-plugin-for-high-speed-pii-masking-in-python-850</link>
      <guid>https://dev.to/fcarvajalbrown/maskops-010-a-native-polars-plugin-for-high-speed-pii-masking-in-python-850</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I built a Rust-powered Polars plugin for GDPR-sensitive data (IBAN, EU VAT). On one million rows it detects PII at up to 17 million rows per second and masks it at up to 2.5 million rows per second, with no NLP models, no spaCy, and no Presidio overhead. &lt;code&gt;pip install maskops&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;If you work with financial data, healthcare records, or any GDPR-regulated dataset in Python, you've likely hit the same wall: &lt;strong&gt;de-identifying structured data at scale is painfully slow&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The go-to solution is &lt;a href="https://github.com/microsoft/presidio" rel="noopener noreferrer"&gt;Microsoft Presidio&lt;/a&gt;. It's powerful, but it's built for unstructured text — it spins up a full spaCy NLP pipeline to find a phone number in a CSV column. For structured DataFrames where you already &lt;em&gt;know&lt;/em&gt; which columns contain PII, that's enormous overhead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Presidio with spaCy NER: ~1,000–5,000 rows/s (estimated)&lt;/li&gt;
&lt;li&gt;Presidio with regex-only recognizers: ~10,000–50,000 rows/s (estimated)&lt;/li&gt;
&lt;li&gt;Pure Python &lt;code&gt;re&lt;/code&gt; module: ~1,100,000 rows/s (measured on clean data)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these integrates natively with Polars, one of the fastest DataFrame libraries in Python.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: maskops
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;maskops&lt;/code&gt; is a &lt;strong&gt;native Polars expression plugin&lt;/strong&gt; written in Rust. It extends Polars with two new expressions — &lt;code&gt;mask_pii()&lt;/code&gt; and &lt;code&gt;contains_pii()&lt;/code&gt; — that run directly on Arrow memory buffers with zero Python overhead per row.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;polars&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;maskops&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payments.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Mask all PII in a column
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maskops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mask_pii&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;notes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# "Transfer to DE89370400440532013000" → "Transfer to DE89******************"
&lt;/span&gt;
&lt;span class="c1"&gt;# Boolean detection — filter rows containing PII
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maskops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains_pii&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;free_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No model downloads, no engine initialization, no spaCy.&lt;/p&gt;




&lt;h2&gt;
  
  
  Benchmarks
&lt;/h2&gt;

&lt;p&gt;Tested on 1,000,000 rows, Intel i-series CPU, Python 3.14, Windows.&lt;/p&gt;

&lt;h3&gt;
  
  
  maskops throughput
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Profile&lt;/th&gt;
&lt;th&gt;Expression&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Rows/s&lt;/th&gt;
&lt;th&gt;MB/s&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;clean (no PII)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mask_pii&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.404s&lt;/td&gt;
&lt;td&gt;2,477,599&lt;/td&gt;
&lt;td&gt;54.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;clean (no PII)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;contains_pii&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.169s&lt;/td&gt;
&lt;td&gt;5,915,970&lt;/td&gt;
&lt;td&gt;130.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;dense (all PII)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mask_pii&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1.385s&lt;/td&gt;
&lt;td&gt;722,104&lt;/td&gt;
&lt;td&gt;15.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;dense (all PII)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;contains_pii&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.059s&lt;/td&gt;
&lt;td&gt;16,987,879&lt;/td&gt;
&lt;td&gt;373.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mixed (50/50)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mask_pii&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.760s&lt;/td&gt;
&lt;td&gt;1,315,407&lt;/td&gt;
&lt;td&gt;28.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mixed (50/50)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;contains_pii&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.133s&lt;/td&gt;
&lt;td&gt;7,498,315&lt;/td&gt;
&lt;td&gt;165.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  vs pure Python regex (same machine)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Profile&lt;/th&gt;
&lt;th&gt;maskops &lt;code&gt;mask_pii&lt;/code&gt;
&lt;/th&gt;
&lt;th&gt;Python &lt;code&gt;re&lt;/code&gt;
&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;clean&lt;/td&gt;
&lt;td&gt;0.404s&lt;/td&gt;
&lt;td&gt;0.925s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.3×&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;dense&lt;/td&gt;
&lt;td&gt;1.385s&lt;/td&gt;
&lt;td&gt;1.653s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.2×&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mixed&lt;/td&gt;
&lt;td&gt;0.760s&lt;/td&gt;
&lt;td&gt;1.337s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.8×&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;On clean and mixed data maskops is consistently faster. On dense data (every row is a full IBAN) both are regex-bound — the bottleneck is the pattern itself, not Python overhead.&lt;/p&gt;
&lt;/blockquote&gt;
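
&lt;p&gt;The speedup column is just the ratio of the two times. A few lines of Python reproduce it from the figures in the tables above; the dictionary layout and function names are mine.&lt;/p&gt;

```python
# Times in seconds, copied from the two tables above (1,000,000 rows each).
ROWS = 1_000_000
timings = {
    "clean": {"maskops": 0.404, "python_re": 0.925},
    "dense": {"maskops": 1.385, "python_re": 1.653},
    "mixed": {"maskops": 0.760, "python_re": 1.337},
}


def speedup(profile: str) -> float:
    """How many times faster maskops is than Python re for one profile."""
    t = timings[profile]
    return t["python_re"] / t["maskops"]


def rows_per_second(seconds: float) -> float:
    return ROWS / seconds


for name in timings:
    rate = rows_per_second(timings[name]["maskops"])
    print(name, f"{speedup(name):.1f}x", f"{rate:,.0f} rows/s")
```

&lt;p&gt;The rows/s figures this prints differ slightly from the throughput table, presumably because the times shown there are rounded to three decimals.&lt;/p&gt;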

&lt;h3&gt;
  
  
  vs Microsoft Presidio (estimated)
&lt;/h3&gt;

&lt;p&gt;Presidio processes structured DataFrames via &lt;code&gt;presidio-structured&lt;/code&gt;, which runs a spaCy NLP pipeline per row. Based on community reports and the architecture:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Throughput (structured data)&lt;/th&gt;
&lt;th&gt;Requires NLP model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;maskops&lt;/td&gt;
&lt;td&gt;~700K–17M rows/s&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Presidio (regex-only recognizers)&lt;/td&gt;
&lt;td&gt;~10–50K rows/s*&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Presidio (spaCy NER)&lt;/td&gt;
&lt;td&gt;~1–5K rows/s*&lt;/td&gt;
&lt;td&gt;Yes (250MB+)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;* Estimated from community benchmarks and Presidio's own documentation noting it is "not optimized for bulk structured data." &lt;a href="https://github.com/microsoft/presidio/discussions/1226" rel="noopener noreferrer"&gt;Microsoft confirmed no official throughput benchmarks exist.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;maskops is purpose-built for structured data pipelines where Presidio's NLP overhead is unnecessary.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;The key is the &lt;a href="https://docs.pola.rs/user-guide/plugins/expr_plugins/" rel="noopener noreferrer"&gt;Polars expression plugin system&lt;/a&gt;, introduced in Polars 0.20. It allows you to register custom Rust functions that Polars calls directly on Arrow &lt;code&gt;ChunkedArray&lt;/code&gt; buffers — bypassing Python entirely for the hot loop.&lt;/p&gt;

&lt;p&gt;The architecture is three layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Python (user code)
    ↓  register_plugin_function()
Polars expression engine
    ↓  Arrow ChunkedArray
Rust (maskops core)
    ↓  regex::Regex on &amp;amp;str slices
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each PII type lives in its own Rust module (&lt;code&gt;iban.rs&lt;/code&gt;, &lt;code&gt;vat.rs&lt;/code&gt;) with a static &lt;code&gt;once_cell::sync::Lazy&amp;lt;Regex&amp;gt;&lt;/code&gt;, so each regex is compiled once, on first use, rather than per row.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Rust side: Polars calls this once per Series; the closure runs on each string value&lt;/span&gt;
&lt;span class="nd"&gt;#[polars_expr(output_type=String)]&lt;/span&gt;
&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;mask_pii&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;PolarsResult&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Series&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;ca&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="nf"&gt;.str&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;StringChunked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ca&lt;/span&gt;&lt;span class="nf"&gt;.apply&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;opt_val&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;opt_val&lt;/span&gt;&lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;borrow&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Cow&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Owned&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;mask_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="nf"&gt;.into_series&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
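
&lt;p&gt;For readers who do not write Rust, the same shape can be sketched in Python: a pattern compiled once at module level, and a column-level function that passes nulls through untouched. This is an illustrative analogue, not the maskops source. &lt;code&gt;DIGIT_RUN&lt;/code&gt;, &lt;code&gt;mask_all&lt;/code&gt; and &lt;code&gt;mask_column&lt;/code&gt; are hypothetical names, and the pattern is a deliberately simplified stand-in for the real ones.&lt;/p&gt;

```python
import re

# Compiled once when the module is imported, the rough Python counterpart
# of a lazily initialised static Regex. Simplified stand-in pattern.
DIGIT_RUN = re.compile(r"\d{6,}")


def mask_all(value):
    # Hypothetical helper: star out long digit runs, preserving their length.
    return DIGIT_RUN.sub(lambda m: "*" * len(m.group()), value)


def mask_column(values):
    # Mirror of the Rust closure: None stays None, strings are masked.
    return [None if v is None else mask_all(v) for v in values]


print(mask_column(["ref 12345678", None, "no digits"]))
```

&lt;p&gt;The difference in the real plugin is that this loop runs in compiled Rust over the column's string values, so no Python object is created per row.&lt;/p&gt;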






&lt;h2&gt;
  
  
  Supported PII Patterns (v0.1.0)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Coverage&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;IBAN&lt;/td&gt;
&lt;td&gt;All 36 SEPA countries&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;DE89370400440532013000&lt;/code&gt; → &lt;code&gt;DE89******************&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EU VAT&lt;/td&gt;
&lt;td&gt;All 27 EU member states&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;DE123456789&lt;/code&gt; → &lt;code&gt;DE*********&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Tested against Faker-generated data in 8 EU locales: DE, FR, ES, IT, NL, PL, PT, SE.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Not Just Use Polars &lt;code&gt;.str.replace_all()&lt;/code&gt;?
&lt;/h2&gt;

&lt;p&gt;You could write &lt;code&gt;pl.col("x").str.replace_all(pattern, "****")&lt;/code&gt; directly in Polars. Three things get in the way:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You need one expression per PII type&lt;/strong&gt; — maskops applies all patterns in a single pass.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No detection&lt;/strong&gt; — Polars has no &lt;code&gt;contains_pii()&lt;/code&gt; equivalent without writing the regex yourself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No masking logic&lt;/strong&gt; — &lt;code&gt;mask_pii&lt;/code&gt; preserves the IBAN country code and check digits, which is standard practice for audit trails. A raw &lt;code&gt;str.replace_all&lt;/code&gt; would wipe everything.&lt;/li&gt;
&lt;/ol&gt;
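
&lt;p&gt;To make the three points concrete, here is a minimal pure-Python sketch of the idea: one combined pattern, a detection helper, and a replacement callback that keeps the prefix and stars out the rest. It is not the maskops implementation. The pattern is a simplified assumption (real IBAN and VAT formats vary by country), and the function names are borrowed from the maskops API only for readability.&lt;/p&gt;

```python
import re

# Hypothetical, simplified patterns combined into one alternation.
# Groups 1-2: IBAN-like token (country code + check digits, then the rest).
# Groups 3-4: VAT-like token (country code, then the number).
PII = re.compile(
    r"\b(?:([A-Z]{2}\d{2})([A-Z0-9]{11,30})|([A-Z]{2})(\d{8,12}))\b"
)


def _mask(match):
    # Keep the prefix (group 1 or 3) and star out the body, preserving length.
    prefix = match.group(1) or match.group(3)
    body = match.group(2) or match.group(4)
    return prefix + "*" * len(body)


def mask_pii(text):
    """Mask every IBAN-like and VAT-like token in one scan of the string."""
    return PII.sub(_mask, text)


def contains_pii(text):
    """Report whether any pattern matches, without rewriting the string."""
    return PII.search(text) is not None


print(mask_pii("Payment from DE89370400440532013000"))
print(mask_pii("Invoice VAT: DE123456789"))
print(contains_pii("No PII here"))
```

&lt;p&gt;In this sketch the IBAN keeps &lt;code&gt;DE89&lt;/code&gt; and the VAT number keeps &lt;code&gt;DE&lt;/code&gt;, with every remaining character replaced by a star. A fixed replacement string cannot do that, because the number of stars depends on the length of each match.&lt;/p&gt;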




&lt;h2&gt;
  
  
  Roadmap
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;v0.1.1&lt;/strong&gt;: Email, phone number, IP address patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;v0.1.2&lt;/strong&gt;: Format-Preserving Encryption (FPE/FF3-1) for reversible masking + PyPI publish&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;v0.2.0&lt;/strong&gt;: Latin American IDs (Chilean RUT, Brazilian CPF, Mexican CURP)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Install &amp;amp; Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;maskops
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;polars&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;maskops&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transaction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Payment from DE89370400440532013000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invoice VAT: DE123456789&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No PII here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_columns&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;maskops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mask_pii&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transaction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;masked&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;maskops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains_pii&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transaction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;has_pii&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────┬──────────────────────────────────┬─────────┐
│ transaction                         ┆ masked                           ┆ has_pii │
╞═════════════════════════════════════╪══════════════════════════════════╪═════════╡
│ Payment from DE89370400440532013000 ┆ Payment from DE89*************** ┆ true    │
│ Invoice VAT: DE123456789            ┆ Invoice VAT: DE*********         ┆ true    │
│ No PII here                         ┆ No PII here                      ┆ false   │
└─────────────────────────────────────┴──────────────────────────────────┴─────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Source code: &lt;a href="https://github.com/fcarvajalbrown/MaskOps" rel="noopener noreferrer"&gt;github.com/fcarvajalbrown/MaskOps&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with Rust, pyo3-polars, and maturin. Contributions welcome.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;#rust&lt;/code&gt; &lt;code&gt;#python&lt;/code&gt; &lt;code&gt;#polars&lt;/code&gt; &lt;code&gt;#gdpr&lt;/code&gt; &lt;code&gt;#dataengineering&lt;/code&gt; &lt;code&gt;#privacy&lt;/code&gt; &lt;code&gt;#pii&lt;/code&gt; &lt;code&gt;#opensource&lt;/code&gt;&lt;/p&gt;

</description>
      <category>performance</category>
      <category>privacy</category>
      <category>python</category>
      <category>rust</category>
    </item>
  </channel>
</rss>
