DEV Community

Defyzzz

I Built a Browser Extension That Catches Your Secrets Before AI Does

We all do it. Copy a chunk of code, paste it into ChatGPT, ask "why doesn't this work?" — and accidentally send your AWS keys, database credentials, or a client's email along with it.

I did it once. I pasted a config file into Claude and realized — three seconds too late — that it contained a production database connection string. Nothing bad happened. But that "oh no" feeling stuck with me.

So I built Nolex — a Chrome extension that sits between your clipboard and AI platforms, scanning everything you upload or paste for sensitive data. If it finds something, it shows you exactly what and lets you redact it before sending.

The key part: everything runs locally in your browser. No servers, no cloud, no telemetry. Your data never leaves your machine.

The Problem Is Bigger Than You Think

Every day, millions of developers paste code snippets, configs, logs, and documents into AI chatbots. Most of the time it's fine. But sometimes that snippet contains:

  • An sk-proj-... OpenAI API key
  • AWS credentials (AKIA...)
  • A postgresql://user:password@host/db connection string
  • A private SSH key
  • A JWT token with user data
  • Credit card numbers from test logs
  • Phone numbers and emails from customer data

You don't even notice it's there. But the platform receives it, and now your secret may live in someone else's logs or training data, or at minimum in their server memory.

This isn't hypothetical. Samsung restricted ChatGPT use in 2023 after engineers pasted internal source code into it. GitHub's secret scanning keeps turning up exposed API keys in public repos, and AI-assisted workflows make this worse, not better.

How Nolex Works

Nolex intercepts two things:

  1. File uploads — when you drag a .env, .json, .py, or any other file into ChatGPT/Claude/DeepSeek
  2. Clipboard paste — when you Ctrl+V code or text into the chat input

Before the data reaches the AI platform, Nolex scans it against 30+ regex patterns covering:

Category         | Examples
AI Platform Keys | OpenAI, Anthropic, Google AI, DeepSeek, Hugging Face, Mistral, Cohere, Replicate
Cloud & Infra    | AWS Access/Secret Keys, AWS Session Tokens
Developer Tokens | GitHub PAT, GitHub OAuth, Slack Bot/User Tokens, Discord Bot Tokens
Payment          | Stripe Secret/Restricted Keys, Stripe Webhooks
Databases        | PostgreSQL, MySQL, MongoDB, Redis connection strings
Personal Data    | Email addresses, phone numbers (international + RU format), credit card numbers
Auth             | JWT tokens, SSH/RSA private keys
Webhooks         | Slack Webhooks, Discord Webhooks
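
To make the scanning step concrete, here is a minimal sketch of a regex-based detector. The three patterns and the `scanText` name are illustrative assumptions, not Nolex's actual rule set, and real rules would need more care around overlaps and false positives.

```javascript
// Illustrative patterns only; the real detector.js rules are not shown in this post.
const PATTERNS = [
  { name: 'AWS_KEY', regex: /\bAKIA[0-9A-Z]{16}\b/g },
  { name: 'DB_URL', regex: /\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis):\/\/[^\s"']+/g },
  { name: 'EMAIL', regex: /\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b/g },
];

// Returns every match with its pattern name and position, sorted by position.
function scanText(text, patterns = PATTERNS) {
  const findings = [];
  for (const { name, regex } of patterns) {
    // matchAll needs the global flag and works on a copy of the regex,
    // so the shared pattern objects keep no state between scans.
    for (const match of text.matchAll(regex)) {
      findings.push({
        name,
        value: match[0],
        start: match.index,
        end: match.index + match[0].length,
      });
    }
  }
  return findings.sort((a, b) => a.start - b.start);
}
```

Keeping `start` and `end` on every finding is what would let a review dialog highlight matches and jump to them.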

If anything matches, you see an interactive dialog.

Each finding is highlighted in the text preview. You can:

  • See exactly what was detected and where
  • Click on any finding to jump to its location in the text
  • Choose which findings to redact and which to keep
  • Replace sensitive values with safe placeholders like ***AWS_KEY_REDACTED***
  • Cancel the upload entirely

If nothing suspicious is found — the file or paste goes through silently. Zero friction when there's nothing to worry about.
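
The redaction step can be sketched in a few lines. This assumes each finding carries a `name` plus `start` and `end` offsets and that findings don't overlap; the `redact` name and the finding shape are hypothetical, only the placeholder format comes from the description above.

```javascript
// Replaces the selected findings with placeholders such as ***AWS_KEY_REDACTED***.
// Each finding is assumed to look like { name, start, end } and not to overlap others.
function redact(text, findings) {
  // Work right to left so earlier offsets stay valid after each replacement.
  const ordered = [...findings].sort((a, b) => b.start - a.start);
  let result = text;
  for (const { name, start, end } of ordered) {
    result = result.slice(0, start) + `***${name}_REDACTED***` + result.slice(end);
  }
  return result;
}
```

Passing only the findings the user ticked in the dialog leaves everything else in the text untouched.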

Architecture: Why Local-First Matters

I considered building a SaaS. Backend API, user accounts, cloud scanning. It would've been easier to monetize.

But think about it: would you send your API keys to another cloud service just to check if you're about to send your API keys to a cloud service? The irony would be painful.

So Nolex runs entirely in the browser:

┌─────────────────────────────────────┐
│           Your Browser              │
│                                     │
│  ┌──────────┐    ┌──────────────┐   │
│  │ You paste│───▶│ content.js   │   │
│  │ or upload│    │ (bridge)     │   │
│  └──────────┘    └──────┬───────┘   │
│                         │           │
│                  ┌──────▼───────┐   │
│                  │ interceptor  │   │
│                  │ (fetch/XHR   │   │
│                  │ monkey-patch)│   │
│                  └──────┬───────┘   │
│                         │           │
│                  ┌──────▼───────┐   │
│                  │ detector.js  │   │
│                  │ (30+ regex)  │   │
│                  └──────┬───────┘   │
│                         │           │
│               findings? │           │
│            ┌────────────┴──┐        │
│            │ yes           │ no     │
│     ┌──────▼──────┐  ┌─────▼─────┐  │
│     │  dialog.js  │  │ Send as-is│  │
│     │ (review UI) │  └───────────┘  │
│     └─────────────┘                 │
│                                     │
│       ❌ Nothing leaves here        │
└─────────────────────────────────────┘

The extension uses Chrome's Manifest V3 with only two permissions:

  • storage — to save your settings locally
  • host_permissions — to intercept requests on AI platform pages

No tabs, no activeTab, no scripting. Minimal attack surface.

The Interception Trick

The interesting technical challenge was intercepting file uploads before they reach the server. AI platforms use fetch() or XMLHttpRequest to send data. Nolex monkey-patches both:

const originalFetch = window.fetch;
window.fetch = async function(url, options) {
    // Extract file from request body
    // Scan with detector.js
    // If findings → show dialog, wait for user decision
    // If clean → pass through to original fetch
    return originalFetch.apply(this, arguments);
};
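
A slightly fuller sketch of the same idea, under stated assumptions: `patchFetch`, `scan`, and `review` are hypothetical names, and only plain string bodies are handled. Real uploads usually arrive as FormData or Blob bodies, which would have to be read through their own async APIs first.

```javascript
// Wraps target.fetch so that string request bodies are scanned before sending.
// scan(text) returns an array of findings; review(text, findings) resolves to the
// text that should be sent, or to null if the user cancels. Both are hypothetical.
function patchFetch(target, scan, review) {
  const originalFetch = target.fetch;
  target.fetch = async function (input, init = {}) {
    if (typeof init.body === 'string') {
      const findings = scan(init.body);
      if (findings.length > 0) {
        const approved = await review(init.body, findings);
        if (approved === null) throw new Error('Upload cancelled by user');
        init = { ...init, body: approved };
      }
    }
    return originalFetch.call(this, input, init);
  };
  // Return a function that removes the patch again.
  return () => { target.fetch = originalFetch; };
}
```

In a page context this would be called as `patchFetch(window, scan, review)`. Taking the target as a parameter also makes the wrapper easy to exercise against a fake object.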

For clipboard, it hooks into the paste event before the AI platform's own handler processes it.
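
One way the paste hook could look, as a sketch rather than the extension's actual code: `createPasteHandler`, `scan`, and `onFindings` are hypothetical names. The handler only steps in when the scan finds something.

```javascript
// Builds a paste handler. scan(text) returns findings; onFindings(text, findings)
// would open the review dialog. Both callbacks are hypothetical.
function createPasteHandler(scan, onFindings) {
  return function handlePaste(event) {
    const text = event.clipboardData ? event.clipboardData.getData('text/plain') : '';
    const findings = text ? scan(text) : [];
    if (findings.length === 0) return; // clean paste: let the page handle it
    // Stop the default insertion and any paste listeners that would run later.
    event.preventDefault();
    event.stopImmediatePropagation();
    onFindings(text, findings);
  };
}

// In the browser, registering in the capture phase on document lets the handler
// run before listeners attached further down the tree, such as the chat input's:
// document.addEventListener('paste', createPasteHandler(scan, showDialog), true);
```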

This means Nolex works on any website — not just ChatGPT. If a site uses fetch to upload files, Nolex can scan them. It works on Claude, DeepSeek, Gemini, Copilot, and any other AI tool that runs in the browser.

Smart Constructor: Build Your Own Patterns

The built-in 30+ patterns cover common cases, but every team has unique secrets. Internal service tokens, custom API key formats, employee IDs — things no generic tool would know about.

That's why Nolex includes a Smart Constructor. You can:

  • Write custom regex patterns with a name and replacement template
  • Test them against sample text in real-time
  • Group patterns into categories
  • Import/Export pattern sets as JSON — share with your team

This is especially useful for security teams who want to enforce company-wide policies. Create a pattern set, export it, distribute to the team.
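
As a rough illustration of the import side, here is how an exported pattern set might be loaded and validated. The JSON shape (`name`, `regex`, `replacement`) and the `loadPatternSet` name are assumptions made for this sketch; the actual export format isn't documented in this post.

```javascript
// Parses an exported pattern set. The JSON shape is an assumption for illustration:
// [{ "name": "...", "regex": "...", "replacement": "..." }]
function loadPatternSet(json) {
  const entries = JSON.parse(json);
  if (!Array.isArray(entries)) throw new TypeError('Pattern set must be an array');
  return entries.map(({ name, regex, replacement }) => {
    if (typeof name !== 'string' || typeof regex !== 'string') {
      throw new TypeError('Each pattern needs a string name and regex');
    }
    return {
      name,
      regex: new RegExp(regex, 'g'), // throws SyntaxError on an invalid pattern
      replacement: replacement ?? `***${name}_REDACTED***`,
    };
  });
}
```

Compiling every regex at import time means a broken pattern fails loudly when the set is loaded, not silently during a scan.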

What's Coming: AI-Powered Detection

Regex is great for structured secrets — API keys have predictable formats, emails follow RFC 5322, credit cards pass the Luhn algorithm. But what about unstructured sensitive data?
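
For credit cards, a regex can only find digit runs that look like card numbers; the Luhn checksum is what separates plausible cards from random digits. A minimal sketch of that check follows. The `passesLuhn` name and the 12 to 19 digit length range are assumptions for illustration.

```javascript
// Luhn checksum: double every second digit from the right, subtract 9 from
// results above 9, and require the total to be divisible by 10.
function passesLuhn(candidate) {
  const digits = candidate.replace(/[\s-]/g, '');
  if (!/^\d{12,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}
```

The well-known test number 4242 4242 4242 4242 passes this check, while changing its last digit makes it fail.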

Think about this scenario: you paste a support ticket into ChatGPT to draft a response. The ticket contains:

"Customer John Smith (john.smith@company.com) reported that the payment from card ending 4242 failed at our London office on Jan 15..."

Regex catches the email and maybe the card fragment. But what about "John Smith"? "London"? The fact that this is a real person's real support ticket?

This is where Named Entity Recognition (NER) comes in — and it's what we're building next for Nolex Pro.

How NER Works in the Browser

We're using Transformers.js — a JavaScript port of Hugging Face's Transformers library — to run a multilingual BERT model directly in the browser:

Model: bert-base-multilingual-cased-ner-hrl
Size: ~710MB (quantized, cached after first load)
Languages: fine-tuned for NER on 10 high-resource languages (English, German, French, Chinese...); the multilingual BERT base covers 100+, including Russian
Categories: Person, Location, Organization

The model runs entirely client-side using WebAssembly and WebGL. No API calls. No server. Your text never leaves your browser — just like the regex engine.

Here's what the pipeline looks like:

Raw text → Strip HTML/XML tags
         → Normalize UPPERCASE → Title Case
         → Split into 200-char chunks (to stay well under BERT's 512-token limit)
         → Run NER on each chunk
         → Merge B-/I- tokens into entities
         → Deduplicate substrings
         → Map back to original text positions
         → Show in dialog alongside regex findings
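
The chunking and "map back" steps are the easiest ones to get subtly wrong, so here is a minimal sketch of how they could fit together. `chunkText` and `toGlobalSpan` are hypothetical names, and the sketch assumes any earlier normalization keeps the text length unchanged so that offsets still line up with the original.

```javascript
// Splits text into chunks of at most maxLen characters, preferring to break at
// a space, and records each chunk's offset in the original text.
function chunkText(text, maxLen = 200) {
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxLen, text.length);
    if (end < text.length) {
      const lastSpace = text.lastIndexOf(' ', end - 1);
      if (lastSpace > start) end = lastSpace + 1; // keep whole words together
    }
    chunks.push({ offset: start, text: text.slice(start, end) });
    start = end;
  }
  return chunks;
}

// Converts an entity position inside a chunk back to a position in the full text.
function toGlobalSpan(chunk, entity) {
  return { ...entity, start: chunk.offset + entity.start, end: chunk.offset + entity.end };
}
```

Because the chunks never overlap and concatenate back to the input, adding the chunk offset is enough to recover global positions.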

One fun challenge: this cased BERT model does badly on UPPERCASE text. Feed it "JOHN SMITH" and it returns nothing. Feed it "John Smith" and it reports 99% confidence. So we normalize case before analysis but display the original text in the dialog.
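
A sketch of what that normalization might look like, assuming only fully uppercase words of two or more letters are rewritten. `normalizeCase` is a hypothetical name. For most scripts the output has the same length as the input, which keeps entity offsets valid against the original text, though a few characters change length when lowercased.

```javascript
// Converts ALL-CAPS words to Title Case and leaves mixed-case words alone.
// Uses Unicode property escapes so that Cyrillic and accented letters also work.
function normalizeCase(text) {
  return text.replace(
    /(?<!\p{L})\p{Lu}{2,}(?!\p{L})/gu,
    (word) => word[0] + word.slice(1).toLowerCase()
  );
}
```

One side effect worth knowing: acronyms get the same treatment, so "IBM" becomes "Ibm" for the model's eyes, even though the dialog still shows the original.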

Another: BERT's tokenizer splits words into subwords. "Yovovich" becomes "Yovo" + "##vich". The ## prefix means "continuation of previous word" — we merge them back during post-processing.
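
The merge step could be sketched like this. It assumes token-level output shaped like `{ word, entity, score }` with BIO tags ("B-PER", "I-PER", "O"); the `mergeTokens` name and the choice to keep the lowest token score per entity are assumptions for this sketch.

```javascript
// Merges token-level NER output into whole entities.
// "##" marks a WordPiece continuation; "I-" tags continue the current entity.
function mergeTokens(tokens) {
  const entities = [];
  let current = null;
  for (const { word, entity, score } of tokens) {
    const isSubword = word.startsWith('##');
    const type = entity === 'O' ? null : entity.slice(2);
    if (current && isSubword) {
      current.text += word.slice(2); // glue "##vich" onto "Yovo"
      current.score = Math.min(current.score, score);
    } else if (current && entity.startsWith('I-') && type === current.type) {
      current.text += ' ' + word; // next word of the same entity
      current.score = Math.min(current.score, score);
    } else {
      if (current) entities.push(current);
      current = type ? { type, text: word.replace(/^##/, ''), score } : null;
    }
  }
  if (current) entities.push(current);
  return entities;
}
```

Joining words with a single space is an approximation; mapping entities back to exact positions in the original text would still rely on character offsets.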

PDF Document Scanning

We're also adding PDF scanning: extract text from uploaded PDFs using Mozilla's pdf.js, then run both regex and NER on the content.

PDFs are tricky because pdf.js returns text items in PDF object order, not reading order. A two-column layout might return all of column 2 before column 1. We sort by Y-coordinate (top to bottom) then X-coordinate (left to right) to reconstruct the natural reading flow.
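
Here is a minimal sketch of that sort, assuming text items shaped like the ones pdf.js returns: a `str` plus a `transform` array whose fifth and sixth entries are the x and y position. `toReadingOrder` and the `lineTolerance` parameter are hypothetical. Note that PDF y-coordinates grow upward, so "top to bottom" means descending y.

```javascript
// Orders pdf.js-style text items into lines, top to bottom then left to right.
// transform[4] is x and transform[5] is y; larger y means higher on the page.
function toReadingOrder(items, lineTolerance = 2) {
  const sorted = [...items].sort((a, b) => b.transform[5] - a.transform[5]);
  const lines = [];
  for (const item of sorted) {
    const line = lines[lines.length - 1];
    if (line && Math.abs(line.y - item.transform[5]) <= lineTolerance) {
      line.items.push(item); // same visual line despite slightly different baselines
    } else {
      lines.push({ y: item.transform[5], items: [item] });
    }
  }
  return lines
    .map((line) =>
      line.items
        .sort((a, b) => a.transform[4] - b.transform[4])
        .map((item) => item.str)
        .join(' ')
    )
    .join('\n');
}
```

Grouping items into lines first avoids an inconsistent comparator when baselines differ by a fraction of a unit. A plain Y-then-X ordering like this reads straight across the page, so a true multi-column layout would still need a column-detection step on top.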

The Numbers

Since publishing on the Chrome Web Store in March 2026:

  • 30+ detection patterns covering 8 categories
  • Zero data sent to any server, ever
  • Works on any website that uses fetch/XHR
  • Extension size: ~150KB (without Pro features)
  • Scan time: < 50ms for regex, 1-3 seconds for NER (first load: ~30s for model download)

Try It

If you've ever had that "oh no" moment after pasting something into an AI chatbot — this is for you.

Top comments (2)

Ali Muwwakkil

A common pitfall is relying on AI tools without understanding their limitations. In our experience with enterprise teams, we often see the real challenge isn't about getting AI to work, but integrating it with existing workflows and ensuring data privacy. A simple framework we use is the "Design for Privacy First," which involves identifying sensitive data at the start and using tools like ChatGPT with strict access controls. - Ali Muwwakkil (ali-muwwakkil on LinkedIn)

Defyzzz

Thanks Ali! You're right — privacy should be the starting point, not an afterthought. That's exactly why Nolex runs 100% locally in the browser — no data ever leaves the device, even during AI-powered analysis. And it's not limited to AI platforms — it works everywhere you can send a message: Slack, Reddit, messengers, email, any website with file uploads or text input. The "access controls" approach works for enterprise, but individual users need something simpler: a tool that just blocks sensitive data before it goes anywhere. No setup, no policies to configure.