Defyzzz

Posted on Apr 11

I Built a Browser Extension That Catches Your Secrets Before AI Does

#nolex #javascript #privacy #ai

We all do it. Copy a chunk of code, paste it into ChatGPT, ask "why doesn't this work?" — and accidentally send your AWS keys, database credentials, or a client's email along with it.

I did it once. I pasted a config file into Claude and realized — three seconds too late — that it contained a production database connection string. Nothing bad happened. But that "oh no" feeling stuck with me.

So I built Nolex — a Chrome extension that sits between your clipboard and AI platforms, scanning everything you upload or paste for sensitive data. If it finds something, it shows you exactly what and lets you redact it before sending.

The key part: everything runs locally in your browser. No servers, no cloud, no telemetry. Your data never leaves your machine.

The Problem Is Bigger Than You Think

Every day, millions of developers paste code snippets, configs, logs, and documents into AI chatbots. Most of the time it's fine. But sometimes that snippet contains:

An sk-proj-... OpenAI API key
AWS credentials (AKIA...)
A postgresql://user:password@host/db connection string
A private SSH key
A JWT token with user data
Credit card numbers from test logs
Phone numbers and emails from customer data

You may not even notice it's there. But once you hit send, your secret sits on someone else's servers, and may end up in their logs or training data.

This isn't hypothetical. In 2023, Samsung banned ChatGPT on company devices after engineers pasted internal source code into it. GitHub has found thousands of exposed API keys in public repos, and AI-assisted workflows make this worse, not better.

How Nolex Works

Nolex intercepts two things:

File uploads — when you drag a .env, .json, .py, or any other file into ChatGPT/Claude/DeepSeek
Clipboard paste — when you Ctrl+V code or text into the chat input

Before the data reaches the AI platform, Nolex scans it against 30+ regex patterns covering:

Category	Examples
AI Platform Keys	OpenAI, Anthropic, Google AI, DeepSeek, Hugging Face, Mistral, Cohere, Replicate
Cloud & Infra	AWS Access/Secret Keys, AWS Session Tokens
Developer Tokens	GitHub PAT, GitHub OAuth, Slack Bot/User Tokens, Discord Bot Tokens
Payment	Stripe Secret/Restricted Keys, Stripe Webhooks
Databases	PostgreSQL, MySQL, MongoDB, Redis connection strings
Personal Data	Email addresses, phone numbers (international + RU format), credit card numbers
Auth	JWT tokens, SSH/RSA private keys
Webhooks	Slack Webhooks, Discord Webhooks
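
To make the scanning step concrete, here is a minimal sketch of a regex detector. The four patterns and the finding shape ({ name, value, start, end }) are illustrative assumptions, far simpler than the real pattern set, which isn't reproduced in this post.

```javascript
// Illustrative patterns only (assumed for this sketch, not Nolex's real list).
const PATTERNS = [
  { name: "AWS_KEY", regex: /\bAKIA[0-9A-Z]{16}\b/g },
  { name: "OPENAI_KEY", regex: /\bsk-proj-[A-Za-z0-9_-]{20,}/g },
  {
    name: "DB_URL",
    regex: /\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis):\/\/[^\s:@/]+:[^\s@/]+@[^\s/]+/g,
  },
  { name: "EMAIL", regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
];

// Returns every match with its position, sorted by where it starts.
function scanText(text, patterns = PATTERNS) {
  const findings = [];
  for (const { name, regex } of patterns) {
    // matchAll needs the "g" flag and works on a copy of the regex,
    // so the shared pattern objects keep no state between scans.
    for (const match of text.matchAll(regex)) {
      findings.push({
        name,
        value: match[0],
        start: match.index,
        end: match.index + match[0].length,
      });
    }
  }
  return findings.sort((a, b) => a.start - b.start);
}
```

Keeping start and end offsets on every finding is what lets a review UI highlight matches and jump to them.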

If anything matches, you see an interactive dialog with each finding highlighted in the text preview. You can:

See exactly what was detected and where
Click on any finding to jump to its location in the text
Choose which findings to redact and which to keep
Replace sensitive values with safe placeholders like ***AWS_KEY_REDACTED***
Cancel the upload entirely

If nothing suspicious is found — the file or paste goes through silently. Zero friction when there's nothing to worry about.
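
Redaction itself is a small string operation. A sketch, assuming each finding is an object with a name plus start/end offsets into the text (a shape chosen for this example) and that findings don't overlap:

```javascript
// Replace each selected finding with a placeholder such as ***AWS_KEY_REDACTED***.
// Assumes findings do not overlap; overlapping matches would need merging first.
function redact(text, findings) {
  // Work from the end of the text backwards so earlier offsets stay valid.
  const ordered = [...findings].sort((a, b) => b.start - a.start);
  let result = text;
  for (const { name, start, end } of ordered) {
    result = result.slice(0, start) + `***${name}_REDACTED***` + result.slice(end);
  }
  return result;
}
```

Passing only the findings the user ticked in the dialog gives the "redact some, keep others" behavior.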

Architecture: Why Local-First Matters

I considered building a SaaS. Backend API, user accounts, cloud scanning. It would've been easier to monetize.

But think about it: would you send your API keys to another cloud service just to check if you're about to send your API keys to a cloud service? The irony would be painful.

So Nolex runs entirely in the browser:

┌─────────────────────────────────────────────┐
│                Your Browser                 │
│                                             │
│  ┌───────────┐    ┌───────────────┐         │
│  │ You paste │───▶│ content.js    │         │
│  │ or upload │    │ (bridge)      │         │
│  └───────────┘    └───────┬───────┘         │
│                           │                 │
│                   ┌───────▼───────┐         │
│                   │ interceptor   │         │
│                   │ (fetch/XHR    │         │
│                   │ monkey-patch) │         │
│                   └───────┬───────┘         │
│                           │                 │
│                   ┌───────▼───────┐         │
│                   │ detector.js   │         │
│                   │ (30+ regex)   │         │
│                   └───────┬───────┘         │
│                           │                 │
│                 findings? │                 │
│             ┌─────────────┴────────┐        │
│             │ yes                  │ no     │
│      ┌──────▼──────┐         ┌─────▼──────┐ │
│      │  dialog.js  │         │ Send as-is │ │
│      │ (review UI) │         └────────────┘ │
│      └─────────────┘                        │
│                                             │
│           ❌ Nothing leaves here            │
└─────────────────────────────────────────────┘

The extension uses Chrome's Manifest V3 with only two permissions:

storage — to save your settings locally
host_permissions — to intercept requests on AI platform pages

No tabs, no activeTab, no scripting. Minimal attack surface.

The Interception Trick

The interesting technical challenge was intercepting file uploads before they reach the server. AI platforms use fetch() or XMLHttpRequest to send data. Nolex monkey-patches both:

const originalFetch = window.fetch;
window.fetch = async function(url, options) {
    // Extract file from request body
    // Scan with detector.js
    // If findings → show dialog, wait for user decision
    // If clean → pass through to original fetch
    return originalFetch.apply(this, arguments);
};
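
Filling in those comments, a fuller sketch might look like the following. It is hedged throughout: installFetchInterceptor, scanText and review are hypothetical names for this example, it only reads string and Blob/File bodies, and it ignores bodies carried inside a Request object. One practical detail: content scripts run in an isolated world, so a patch like this has to be installed in the page's own (main) world to affect the site's fetch.

```javascript
// Hypothetical helper, not Nolex's source. Wraps target.fetch so that text
// bodies are scanned before the real request goes out.
//   scanText(text)         -> array of findings (empty when clean)
//   review(findings, text) -> Promise of replacement text, or null to cancel
function installFetchInterceptor(target, scanText, review) {
  const originalFetch = target.fetch;

  target.fetch = async function (input, init) {
    const body = init?.body;
    let text = null;
    if (typeof body === "string") {
      text = body;
    } else if (typeof Blob !== "undefined" && body instanceof Blob) {
      text = await body.text(); // File extends Blob, so uploads land here
    }

    // Bodies this sketch can't read (FormData, streams) pass through untouched.
    if (text === null) return originalFetch.call(target, input, init);

    const findings = scanText(text);
    if (findings.length === 0) return originalFetch.call(target, input, init);

    // Wait for the user's decision before anything is sent.
    const replacement = await review(findings, text);
    if (replacement === null) {
      throw new DOMException("Upload cancelled by user", "AbortError");
    }
    return originalFetch.call(target, input, { ...init, body: replacement });
  };

  // Return an undo function so the patch can be removed.
  return () => {
    target.fetch = originalFetch;
  };
}
```

In a page this would be called as installFetchInterceptor(window, scanText, showDialog), where showDialog stands in for whatever review UI resolves with the redacted text.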

For clipboard, it hooks into the paste event before the AI platform's own handler processes it.
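
A sketch of that paste hook, under the assumption that a capture-phase listener on document is enough to run before the page's own handlers. makePasteHandler and its two callbacks are hypothetical names for this example.

```javascript
// Builds a paste listener. scanText(text) returns an array of findings;
// onFindings(text, findings, target) is expected to open the review UI.
function makePasteHandler(scanText, onFindings) {
  return function handlePaste(event) {
    const text = event.clipboardData?.getData("text/plain") ?? "";
    if (!text) return; // nothing textual on the clipboard

    const findings = scanText(text);
    if (findings.length === 0) return; // clean: let the page handle the paste

    // Block the page's handlers and the default insertion until reviewed.
    event.preventDefault();
    event.stopImmediatePropagation();
    onFindings(text, findings, event.target);
  };
}

// In a browser, registering with capture = true runs this before handlers
// attached to the chat input itself:
// document.addEventListener("paste", makePasteHandler(scanText, showDialog), true);
```

After review, the approved or redacted text still has to be inserted into the input; how best to do that depends on the editor each platform uses and is left out here.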

This means the approach isn't tied to ChatGPT. If a site uses fetch or XHR to upload files, Nolex can scan them, as long as the extension has host access to that site. It works on Claude, DeepSeek, Gemini, Copilot, and other AI tools that run in the browser.

Smart Constructor: Build Your Own Patterns

The built-in 30+ patterns cover common cases, but every team has unique secrets. Internal service tokens, custom API key formats, employee IDs — things no generic tool would know about.

That's why Nolex includes a Smart Constructor, where you can:

Write custom regex patterns with a name and replacement template
Test them against sample text in real-time
Group patterns into categories
Import/Export pattern sets as JSON — share with your team

This is especially useful for security teams who want to enforce company-wide policies. Create a pattern set, export it, distribute to the team.
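
As an illustration of what importing a shared pattern set could involve, here is a sketch that parses and validates a JSON file. The schema (a top-level patterns array with name, regex, category and replacement fields) is an assumption for this example, not the documented export format.

```javascript
// Parses a pattern-set JSON string and compiles each regex.
// Invalid entries are collected in `errors` instead of aborting the import.
function loadPatternSet(json) {
  const parsed = JSON.parse(json);
  if (!Array.isArray(parsed.patterns)) {
    throw new Error("pattern set must contain a 'patterns' array");
  }
  const compiled = [];
  const errors = [];
  for (const p of parsed.patterns) {
    try {
      if (!p?.name || !p?.regex) throw new Error("missing name or regex");
      compiled.push({
        name: p.name,
        category: p.category ?? "Custom",
        regex: new RegExp(p.regex, "g"), // throws SyntaxError on a bad pattern
        replacement: p.replacement ?? `***${p.name}_REDACTED***`,
      });
    } catch (err) {
      errors.push({ name: p?.name ?? "(unnamed)", message: err.message });
    }
  }
  return { compiled, errors };
}
```

Reporting bad entries rather than rejecting the whole file means one typo in a team-wide set doesn't disable every other pattern.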

What's Coming: AI-Powered Detection

Regex is great for structured secrets — API keys have predictable formats, emails follow RFC 5322, credit cards pass the Luhn algorithm. But what about unstructured sensitive data?
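
For reference, the Luhn check mentioned above is what stops a card-number regex from flagging every long run of digits. A minimal sketch (the 13 to 19 digit range is an assumption about which lengths to accept):

```javascript
// Returns true if the candidate is 13-19 digits (spaces and dashes allowed)
// and passes the Luhn checksum.
function passesLuhn(candidate) {
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{13,19}$/.test(digits)) return false;

  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk from the rightmost digit; double every second one.
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}
```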

Think about this scenario: you paste a support ticket into ChatGPT to draft a response. The ticket contains:

"Customer John Smith (john.smith@company.com) reported that the payment from card ending 4242 failed at our London office on Jan 15..."

Regex catches the email and maybe the card fragment. But what about "John Smith"? "London"? The fact that this is a real person's real support ticket?

This is where Named Entity Recognition (NER) comes in — and it's what we're building next for Nolex Pro.

How NER Works in the Browser

We're using Transformers.js — a JavaScript port of Hugging Face's Transformers library — to run a multilingual BERT model directly in the browser:

Model: bert-base-multilingual-cased-ner-hrl
Size: ~710MB (quantized, cached after first load)
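Languages: fine-tuned for NER on 10 high-resource languages (English, German, French, Chinese...); the underlying multilingual BERT was pretrained on 100+, so other languages, Russian included, rely on cross-lingual transfer
Categories: Person, Location, Organization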

The model runs entirely client-side using WebAssembly and WebGL. No API calls. No server. Your text never leaves your browser — just like the regex engine.

Here's what the pipeline looks like:

Raw text → Strip HTML/XML tags
         → Normalize UPPERCASE → Title Case
         → Split into 200-char chunks (to stay well under BERT's 512-token limit)
         → Run NER on each chunk
         → Merge B-/I- tokens into entities
         → Deduplicate substrings
         → Map back to original text positions
         → Show in dialog alongside regex findings
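
The chunking step is the one that makes "map back to original text positions" possible, so here is a sketch of it. It assumes chunks are cut at the last space inside each 200-character window and that each chunk remembers its offset into the original text. chunkText is a hypothetical name.

```javascript
// Splits text into chunks of at most maxLen characters, preferring to cut
// after a space so words are not split. Each chunk records its offset so
// entity positions inside a chunk can be translated back to the full text.
function chunkText(text, maxLen = 200) {
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxLen, text.length);
    if (end < text.length) {
      const lastSpace = text.lastIndexOf(" ", end - 1);
      // Only cut at the space if it still leaves a non-trivial chunk.
      if (lastSpace > start) end = lastSpace + 1;
    }
    chunks.push({ text: text.slice(start, end), offset: start });
    start = end;
  }
  return chunks;
}
```

An entity found at position p inside a chunk sits at chunk.offset + p in the original text.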

One fun challenge: this cased BERT model all but ignores UPPERCASE text. Feed it "JOHN SMITH" and it returns nothing. Feed it "John Smith" and it tags the name with 99% confidence. So we normalize case before analysis but display the original text in the dialog.
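
A sketch of that normalization, assuming only fully uppercase words of two or more letters get rewritten. normalizeUppercase is a hypothetical name.

```javascript
// Turns ALL-CAPS words into Title Case and leaves everything else alone.
// The lookarounds use Unicode letter classes so Cyrillic and other scripts work.
// For almost all text the result has the same length as the input, which
// keeps character offsets aligned with the original string.
function normalizeUppercase(text) {
  return text.replace(
    /(?<!\p{L})\p{Lu}{2,}(?!\p{L})/gu,
    (word) => word[0] + word.slice(1).toLowerCase()
  );
}
```

The normalized copy should only feed the NER model. Regex patterns for things like API keys are case-sensitive and need the untouched original.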

Another: BERT's tokenizer splits words into subwords. "Yovovich" becomes "Yovo" + "##vich". The ## prefix means "continuation of previous word" — we merge them back during post-processing.
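
A sketch of that merge step. It assumes the model output is a list of tokens shaped like { entity, word, score, index }, with "O" tokens already filtered out, which is how token-classification results typically look. The exact shape is an assumption here.

```javascript
// Merges subword pieces ("##vich") and B-/I- tagged tokens into whole entities.
// Tokens are only merged when their indices are consecutive, so two separate
// names elsewhere in the text are not glued together.
function mergeEntities(tokens) {
  const entities = [];
  let current = null;

  for (const { entity, word, score, index } of tokens) {
    const isSubword = word.startsWith("##");
    const prefix = entity[0]; // "B", "I" or "O"
    const type = entity === "O" ? null : entity.slice(2); // "PER", "LOC", ...
    const adjacent = current !== null && index === current.lastIndex + 1;

    if (adjacent && isSubword) {
      current.text += word.slice(2); // continuation of the same word
    } else if (adjacent && prefix === "I" && type === current.type) {
      current.text += " " + word; // next word of the same entity
    } else {
      if (current) entities.push(current);
      current = type
        ? { type, text: isSubword ? word.slice(2) : word, score, lastIndex: index }
        : null;
      continue;
    }
    current.score = Math.min(current.score, score); // keep the weakest link
    current.lastIndex = index;
  }
  if (current) entities.push(current);
  return entities;
}
```

Joining words with a space suits space-delimited languages; scripts without spaces, such as Chinese, would need a different join.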

PDF Document Scanning

We're also adding PDF scanning: extract text from uploaded PDFs using Mozilla's pdf.js, then run both regex and NER on the content.

PDFs are tricky because pdf.js returns text items in PDF object order, not reading order. A two-column layout might return all of column 2 before column 1. We sort by Y-coordinate (top to bottom) then X-coordinate (left to right) to reconstruct the natural reading flow.
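
A sketch of the sort, operating on plain objects shaped like pdf.js text items, where str is the text and transform[4] / transform[5] are the x and y position. PDF y-coordinates grow upward, so "top to bottom" means descending y. toReadingOrder and the 2-unit line tolerance are assumptions for this example.

```javascript
// Groups text items into lines by y-position, then orders each line by x.
function toReadingOrder(items, lineTolerance = 2) {
  // Highest y first: that is the top of the page in PDF coordinates.
  const byY = [...items].sort((a, b) => b.transform[5] - a.transform[5]);

  const lines = [];
  for (const item of byY) {
    const line = lines[lines.length - 1];
    // Items whose baselines differ by a hair still belong to the same line.
    if (line && Math.abs(line.y - item.transform[5]) <= lineTolerance) {
      line.items.push(item);
    } else {
      lines.push({ y: item.transform[5], items: [item] });
    }
  }

  return lines
    .map((line) =>
      line.items
        .sort((a, b) => a.transform[4] - b.transform[4])
        .map((item) => item.str)
        .join(" ")
    )
    .join("\n");
}
```

Grouping into lines before sorting by x avoids the unstable results a single combined comparator can produce. Note that a plain Y-then-X sort reads straight across the page, so side-by-side columns would still need a column-splitting step first, which is not shown here.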

The Numbers

Since publishing on the Chrome Web Store in March 2026:

30+ detection patterns covering 8 categories
Zero data sent to any server, ever
Works on any website that uses fetch/XHR
Extension size: ~150KB (without Pro features)
Scan time: < 50ms for regex, 1-3 seconds for NER (first load: ~30s for model download)