The Problem: Regex is "Dumb"
If you’ve ever built a secret scanner or a linter, you know the struggle. You write a regex for an API key, and suddenly every random SHA hash or CSS hex code gets flagged as a "Critical Security Risk."
I got tired of false positives in my VS Code extension, DotEnvy. I wanted a scanner that didn't just match patterns; I wanted one that understood context.
The Solution: A Hybrid Engine (Regex + AI)
In version 1.4, I introduced a new architecture. Instead of relying solely on client-side logic, I offloaded the heavy lifting to a custom Python LLM service hosted on Railway.
The Logic Flow:
Layer 1 (Local): Fast regex and entropy analysis runs inside VS Code. If a string looks safe, we ignore it.
Layer 2 (The Filter): If the local score is ambiguous (above 0.4), we don't decide locally; the string is escalated to the next layer.
Layer 3 (The Brain): The extension queries my Python backend. The LLM analyzes the variable name, surrounding code, and entropy to give a final verdict.
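Layers 1 and 2 can be sketched as a local score that blends a regex hit with normalized Shannon entropy, plus a cutoff that decides whether to ignore a string or escalate it. Only the 0.4 threshold comes from the post; the patterns, the 50/50 weighting, the 6-bit normalization, and the names `shannonEntropy`, `localScore`, and `decideLocally` are hypothetical, not DotEnvy's actual code.

```typescript
// Sketch of the local layer: assumed names, patterns, and weights (not DotEnvy's code).
type LocalDecision = "ignore" | "escalate";

// From the post: scores above 0.4 are ambiguous and go to the backend.
const AMBIGUITY_THRESHOLD = 0.4;

// Illustrative patterns only: a Stripe-style key prefix and a generic long token.
const SECRET_PATTERNS: RegExp[] = [
  /sk_(live|test)_[A-Za-z0-9]{16,}/,
  /[A-Za-z0-9+\/_=-]{32,}/,
];

// Shannon entropy in bits per character: H = -sum(p_i * log2(p_i)).
function shannonEntropy(value: string): number {
  const chars = Array.from(value); // iterate by code point, not UTF-16 unit
  if (chars.length === 0) return 0;
  const counts = new Map<string, number>();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let entropy = 0;
  counts.forEach((count) => {
    const p = count / chars.length;
    entropy -= p * Math.log2(p);
  });
  return entropy;
}

// Blend a pattern hit and normalized entropy into a 0..1 score.
// Dividing by 6 bits (log2(64), a base64-like alphabet) is an assumption.
function localScore(value: string): number {
  const patternHit = SECRET_PATTERNS.some((re) => re.test(value)) ? 1 : 0;
  const entropyPart = Math.min(shannonEntropy(value) / 6, 1);
  return 0.5 * patternHit + 0.5 * entropyPart;
}

// Layer 2: anything above the cutoff is handed to the LLM instead of decided here.
function decideLocally(value: string): LocalDecision {
  return localScore(value) > AMBIGUITY_THRESHOLD ? "escalate" : "ignore";
}

console.log(decideLocally("hello"));
console.log(decideLocally("sk_live_abcdefghijklmnop1234"));
console.log(decideLocally("3f786850e387550fdab836ed7e6dc881de23001b"));
```

In this sketch the 40-character hex string (shaped like a git SHA) matches the generic pattern and is escalated rather than flagged outright; telling it apart from a real key is exactly the job the post hands to the LLM in Layer 3.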
The "Secret" Sauce: The 80/20 Rule
Integrating the LLM was only about 20% of the new code, but it provides 80% of the intelligence.
Frontend: TypeScript (VS Code API) handling async/debounced events.
Backend: Python service on Railway.
Result: Zero false positives. It can tell the difference between a real Stripe key and a random test string.
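The frontend's debounced event handling can be sketched as a keyed debouncer: one timer per document, restarted on every change, so only the last edit in a burst triggers a scan and the backend is not hit per keystroke. `createKeyedDebouncer` and the wait time are assumptions for illustration, not the extension's actual implementation.

```typescript
// Hypothetical helper: keeps one pending timer per key (e.g. a document URI),
// so edits in one file never delay or cancel the scan of another.
function createKeyedDebouncer<T>(
  fn: (key: string, value: T) => void,
  waitMs: number,
): (key: string, value: T) => void {
  const timers = new Map<string, ReturnType<typeof setTimeout>>();
  return (key: string, value: T): void => {
    const pending = timers.get(key);
    if (pending !== undefined) {
      clearTimeout(pending); // a newer event arrived: restart the wait for this key
    }
    const handle = setTimeout(() => {
      timers.delete(key);
      fn(key, value); // only the most recent value for this key is delivered
    }, waitMs);
    timers.set(key, handle);
  };
}
```

In an extension, the returned function would typically be called from a `vscode.workspace.onDidChangeTextDocument` listener with `event.document.uri.toString()` as the key. The post doesn't state a wait time, so that value is a tuning choice.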
Watch it in Action
I ran a full suite of integration tests to verify stability. The system now validates the backend connection, analyzes entropy, and confirms secrets in real time.
Under the Hood
Performance: I implemented debouncing so we don't spam the server on every keystroke.
Security: The API key is injected at build time, so it is never committed to the repo.
Reliability: I added a circuit breaker. If the server is down, the extension falls back to local analysis seamlessly.
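A circuit breaker with a local fallback could look roughly like this: after a run of consecutive failures the breaker opens and every request goes straight to local analysis; once a cooldown passes, a trial request is let through (half-open), and success closes the breaker again. The class name, failure threshold, and cooldown are hypothetical; the post doesn't describe its exact policy.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Hypothetical breaker: names, threshold, and cooldown are illustrative assumptions.
class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold = 3, cooldownMs = 30_000, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, so the state machine can be tested without waiting
  }

  get currentState(): BreakerState {
    return this.state;
  }

  // Try the remote verdict; on failure, or while the breaker is open, use the local one.
  // Simplification: while half-open, concurrent calls are not limited to a single trial.
  async call<T>(remote: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.cooldownMs) {
        return fallback(); // server presumed down: skip the network entirely
      }
      this.state = "half-open"; // cooldown elapsed: let a trial request through
    }
    try {
      const result = await remote();
      this.state = "closed"; // success: reset the breaker
      this.failures = 0;
      return result;
    } catch {
      this.failures += 1;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open"; // too many failures, or the trial failed: (re)open
        this.openedAt = this.now();
      }
      return fallback();
    }
  }
}
```

Two simplifications to be aware of: concurrent calls during the half-open window all reach the server, and a server that hangs never counts as a failure. Pairing the remote call with a request timeout (for example, passing `AbortSignal.timeout(ms)` as the `signal` option to `fetch`) covers the second case.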
Let me know what you think about hybrid AI/Local architectures in the comments!