<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dhroov Gupta</title>
    <description>The latest articles on DEV Community by Dhroov Gupta (@dhroov7).</description>
    <link>https://dev.to/dhroov7</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F236530%2F77a3aed6-9081-468a-89a7-3cd7fd6ae80f.jpeg</url>
      <title>DEV Community: Dhroov Gupta</title>
      <link>https://dev.to/dhroov7</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dhroov7"/>
    <language>en</language>
    <item>
      <title>How we built a PII masking layer for LLM APIs — local detection, reversible tokens, one line to integrate</title>
      <dc:creator>Dhroov Gupta</dc:creator>
      <pubDate>Mon, 25 May 2026 17:23:59 +0000</pubDate>
      <link>https://dev.to/dhroov7/how-we-built-a-pii-masking-layer-for-llm-apis-local-detection-reversible-tokens-one-line-to-12c8</link>
      <guid>https://dev.to/dhroov7/how-we-built-a-pii-masking-layer-for-llm-apis-local-detection-reversible-tokens-one-line-to-12c8</guid>
      <description>&lt;p&gt;If you're building LLM features on top of OpenAI or Anthropic, you're almost certainly sending raw user data to a third-party model provider. Names, emails, phone numbers, tax IDs, health records — whatever your users type, it goes straight to the API.&lt;/p&gt;

&lt;p&gt;Here's the uncomfortable part: every attempt to fix this problem seems to make it worse. The most obvious fix — sending your text to a cloud anonymisation service first — means you're solving a data privacy problem by sending your sensitive data to another third party.&lt;/p&gt;

&lt;p&gt;I was talking to a healthtech team recently that had been blocked from using GPT-4 for clinical notes for months. Not because the engineers didn't want to — they did. Legal wouldn't sign off because every API call meant patient data leaving their infrastructure. The problem wasn't capability. It was the missing privacy boundary between their data and the LLM.&lt;/p&gt;

&lt;p&gt;Armos is that boundary. A local detection and masking layer that sits between your application and the LLM API — PII never leaves your server, and real values are restored in the response automatically.&lt;/p&gt;

&lt;p&gt;This is how it works under the hood.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with the obvious approaches
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Option 1: Regex scrubbing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Fast to write, breaks constantly. Email regexes miss edge cases. Names are impossible. You end up with a pile of patterns that need constant maintenance and still let things through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 2: Send everything to a cloud anonymisation API&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Same problem, different server. You haven't kept the data in-house — you've just added a hop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 3: Build it yourself with Presidio&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Microsoft's Presidio is excellent — it's what powers Armos's detection. But it's detection only. You still need to build the masking layer, the vault, the de-masking logic, and wire it into your SDK calls. That's a week of work for a first pass and months of edge cases.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Armos does instead
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96nmc6g1eb6ovxsd4i60.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96nmc6g1eb6ovxsd4i60.png" alt="How it works" width="799" height="580"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Three steps, all local:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Detect&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Presidio + spaCy runs on the text before it leaves your process. No network call. No data sent anywhere during detection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Mask with reversible tokens&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Detected entities are replaced with deterministic tokens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Patient John Smith, Aadhaar 2345 6789 0123"
→
"Patient [PII:NAME:c4587843], Aadhaar [PII:AADHAAR:473adcf3]"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The token format encodes the entity type and a hash of the original value. Same value always maps to the same token — so if "John Smith" appears twice, it gets the same token both times, and the LLM can reason about it consistently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Restore&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After the LLM responds, the library scans the output for tokens and swaps them back. Your application receives the original text. The model never saw the real values.&lt;/p&gt;




&lt;h2&gt;
  
  
  The token vault
&lt;/h2&gt;

&lt;p&gt;Tokens need to map back to real values. The library keeps a vault — a simple key-value store — inside the process by default, with an optional Redis backend for cross-process persistence.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In-memory (default)
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ArmosOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# Redis-backed — tokens survive across requests and processes
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ArmosOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;redis_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis://...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The vault never leaves your infrastructure. Armos has no server. There's no telemetry, no cloud component.&lt;/p&gt;




&lt;h2&gt;
  
  
  The integration
&lt;/h2&gt;

&lt;p&gt;This is the entire change to existing code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# After
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;armos&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ArmosOpenAI&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ArmosOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything downstream works identically — same method signatures, same response objects. The masking and de-masking happen invisibly inside the privacy layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  What gets detected
&lt;/h2&gt;

&lt;p&gt;10 entity types out of the box:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Names&lt;/strong&gt; — via spaCy NER (en_core_web_lg)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Email, phone, credit card, IP&lt;/strong&gt; — Presidio built-ins&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aadhaar, PAN&lt;/strong&gt; — custom regex recognisers (Indian identifiers that no existing tool handles reliably)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSN, IBAN&lt;/strong&gt; — Presidio built-ins with checksum validation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API keys&lt;/strong&gt; — custom pattern recogniser for OpenAI, AWS, GitHub key formats&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Accuracy
&lt;/h2&gt;

&lt;p&gt;I ran a 1,000-sample benchmark across all entity types:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Entity&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Email&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aadhaar&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PAN&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSN&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IBAN&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credit card&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phone&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API keys&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IP address&lt;/td&gt;
&lt;td&gt;99.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Person name&lt;/td&gt;
&lt;td&gt;96.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 3.6% miss rate on names is entirely Indian names — &lt;code&gt;en_core_web_lg&lt;/code&gt; was trained predominantly on Western text. I'm working on a supplemental approach for this.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Streaming support (&lt;code&gt;stream=True&lt;/code&gt; currently passes through unmasked)&lt;/li&gt;
&lt;li&gt;Async clients (&lt;code&gt;AsyncOpenAI&lt;/code&gt;, &lt;code&gt;AsyncAnthropic&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;LangChain and LlamaIndex integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The library is early and I'm actively looking for teams using LLMs on sensitive data who want to trial it and shape where it goes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/armos-ai/armos-python" rel="noopener noreferrer"&gt;github.com/armos-ai/armos-python&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Docs:&lt;/strong&gt; &lt;a href="https://armos.dev" rel="noopener noreferrer"&gt;armos.dev&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;armos
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're hitting this problem or have thoughts on the approach, I'd love to hear from you in the comments.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>privacy</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Distributed Rate Limiter</title>
      <dc:creator>Dhroov Gupta</dc:creator>
      <pubDate>Fri, 28 Apr 2023 22:03:04 +0000</pubDate>
      <link>https://dev.to/dhroov7/distributed-rate-limiter-194e</link>
      <guid>https://dev.to/dhroov7/distributed-rate-limiter-194e</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I recently created a rate limiter library for the distributed systems that can be used to control and limit the number of requests made within a specific period of time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Idea
&lt;/h2&gt;

&lt;p&gt;The idea behind the library is to create a token bucket that replenishes tokens at a certain rate. Each time a request is made, the library checks if there are enough tokens available in the bucket. If there are, it removes a token and allows the request. If not, it rejects the request.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution
&lt;/h2&gt;

&lt;p&gt;I implemented the token bucket algorithm using Redis as a distributed storage system. The library, called &lt;code&gt;dist-rate&lt;/code&gt;, takes an options object that includes the number of tokens in the token bucket, the duration for which the tokens are replenished, and an instance of the Redis client to use for distributed locking and storage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;p&gt;To use the &lt;code&gt;dist-rate-limiter&lt;/code&gt; library in your project, simply install it via npm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;dist-rate-limiter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To use the package, create an instance of the &lt;code&gt;DistRate&lt;/code&gt; class and call the &lt;code&gt;execute()&lt;/code&gt; method with a unique ID for each request. The method returns a boolean indicating whether the request should be allowed or not.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;IORedis&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ioredis&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DistRate&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;distrate&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;redisClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;IORedis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Cluster&lt;/span&gt;&lt;span class="p"&gt;([...]);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rateLimiter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DistRate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;redisClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;allowed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;rateLimiter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can then use the rateLimiter instance to control and limit the number of requests made to your API or server.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;p&gt;The dist-rate library is available on both GitHub and npm. You can find the GitHub repository here: &lt;/p&gt;

&lt;p&gt;Github - &lt;a href="https://github.com/Dhroov7/distRate" rel="noopener noreferrer"&gt;https://github.com/Dhroov7/distRate&lt;/a&gt;&lt;br&gt;
NPM -  &lt;a href="https://www.npmjs.com/package/dist-rate" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/dist-rate&lt;/a&gt;&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>distributedsystems</category>
      <category>redis</category>
      <category>ratelimiter</category>
    </item>
    <item>
      <title>HacktoberFest 2019</title>
      <dc:creator>Dhroov Gupta</dc:creator>
      <pubDate>Wed, 25 Sep 2019 18:52:08 +0000</pubDate>
      <link>https://dev.to/dhroov7/hacktoberfest-2019-2l2c</link>
      <guid>https://dev.to/dhroov7/hacktoberfest-2019-2l2c</guid>
      <description>&lt;p&gt;Hey Everyone,&lt;br&gt;
As we all know HacktoberFest 2019 is starting in 6 days.&lt;br&gt;
So, for new open source developers i've created a repository having easy and beginner friendly issues, so that every new comer can contribute and earn the T-shirt from DigitalOcean...isn't that great!! :)&lt;/p&gt;

&lt;p&gt;Link to the repository:&lt;br&gt;
&lt;a href="https://github.com/Dhroov7/HacktoberFest2019" rel="noopener noreferrer"&gt;Click here&lt;/a&gt;&lt;/p&gt;

</description>
      <category>hacktoberfest</category>
      <category>opensource</category>
      <category>github</category>
    </item>
  </channel>
</rss>
