
Tanush Bhootra


How I Built a Real-Time Biometric Telemetry Pipeline to Detect AI Code Injections in VS Code

Hey everyone! I’m a 19-year-old student developer, and over the last few weeks, I’ve been building an independent developer-security pipeline called Epistemic Protocol.

With the explosion of offshore engineering contracting and AI-generated code bloat, technical teams face a severe visibility gap in their software supply chain. Managers see the final Pull Request, but they have no way to confirm how that code was authored.

Here is a deep dive into the 3-tier architecture I built, entirely in public, to address this.

1. The Local Sensor Tier (VS Code Client)

Instead of taking a primitive approach like reading code snippets, I went the biometric route. I built a lightweight VS Code extension that passively captures the raw mechanics of developer interaction. It logs keystroke flight-times (the milliseconds between key presses) and dwell-times (how long a key stays pressed), alongside characters-per-second (CPS) velocity curves and the boundaries of sudden mass pastes.
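To make those signals concrete, here is a minimal sketch of how they could be computed from timestamped key events. The extension's actual code is not shown in this post, so everything here is an assumption for illustration: the `KeyEvent` shape, the function names, and the `PASTE_THRESHOLD_CHARS` cutoff are all hypothetical. Flight-time is taken press-to-press, following the definition above.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    key: str
    down_ms: float  # timestamp when the key was pressed
    up_ms: float    # timestamp when the key was released


# Hypothetical cutoff: a single insertion this large is treated as a paste.
PASTE_THRESHOLD_CHARS = 50


def dwell_times(events):
    """How long each key stayed pressed, in milliseconds."""
    return [e.up_ms - e.down_ms for e in events]


def flight_times(events):
    """Milliseconds between consecutive key presses (press-to-press)."""
    return [b.down_ms - a.down_ms for a, b in zip(events, events[1:])]


def chars_per_second(char_count, start_ms, end_ms):
    """Typing velocity over a window; 0.0 if no time has elapsed."""
    elapsed_s = (end_ms - start_ms) / 1000.0
    if elapsed_s <= 0:
        return 0.0
    return char_count / elapsed_s


def is_mass_paste(inserted_text):
    """Flag a single insertion that is too large to have been typed."""
    return len(inserted_text) >= PASTE_THRESHOLD_CHARS


# Three keystrokes: "d", "e", "f"
events = [
    KeyEvent("d", 0, 80),
    KeyEvent("e", 150, 240),
    KeyEvent("f", 400, 470),
]
dwells = dwell_times(events)
flights = flight_times(events)
cps = chars_per_second(len(events), events[0].down_ms, events[-1].up_ms)
```

Note that only metadata (timings and sizes) is derived here; the content of what was typed is never needed beyond its length.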

2. The Ingestion Tier (Next.js Telemetry Bus)

To prevent local IDE input lag, the extension buffers metadata asynchronously in memory and flushes it as compressed JSON packets during natural pauses in typing. These packets hit a high-throughput Next.js Telemetry Bus running on edge endpoints. Tenant spaces are isolated using custom cryptographic tokens generated inside a secure configuration vault.
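The buffer-then-flush idea can be sketched as follows. This is an illustration under stated assumptions, not the extension's real code: the `TelemetryBuffer` class, the `pause_ms` idle threshold, and the injected `send` callable are hypothetical, zlib stands in for whatever compression the real packets use, and the clock is passed in explicitly so the logic stays deterministic. It models only the pause-based flush decision, not the asynchronous transport.

```python
import json
import zlib
from typing import Callable, Optional


class TelemetryBuffer:
    """Collects event metadata in memory and flushes it after an idle pause."""

    def __init__(self, send: Callable[[bytes], None], pause_ms: float = 1500.0):
        self._send = send            # hypothetical transport hook
        self._pause_ms = pause_ms    # hypothetical idle threshold
        self._events = []
        self._last_event_ms: Optional[float] = None

    def record(self, event: dict, now_ms: float) -> None:
        """Append an event; does no I/O, so it is cheap on the typing path."""
        self._events.append(event)
        self._last_event_ms = now_ms

    def tick(self, now_ms: float) -> bool:
        """Called periodically; flushes only if the user has paused."""
        if not self._events or self._last_event_ms is None:
            return False
        if now_ms - self._last_event_ms < self._pause_ms:
            return False
        payload = zlib.compress(json.dumps(self._events).encode("utf-8"))
        self._events = []
        self._send(payload)
        return True


sent = []
buf = TelemetryBuffer(send=sent.append, pause_ms=1500)
buf.record({"flight_ms": 150}, now_ms=0)
buf.record({"flight_ms": 250}, now_ms=400)
flushed_early = buf.tick(now_ms=1000)  # only 600 ms idle: nothing is sent
flushed_late = buf.tick(now_ms=2000)   # 1600 ms idle: batch is compressed and sent
```

Keeping `record` free of I/O is the point of the design: the typing path only appends to a list, and the network cost is paid during the pause.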

3. The Analytics Tier (Python FastAPI & ML Node)

The Telemetry Bus routes payload vectors straight to a standalone Python FastAPI service, which runs an Isolation Forest machine learning model. Because normal human typing rhythm is highly variable and shaped by muscle memory, a script-emulated dump or a massive 500-line ChatGPT copy-paste stands out to the model immediately as a stark structural anomaly.

The microservice then writes the calculated risk score to a centralized, multi-tenant database.
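For readers who have not met Isolation Forests, here is a from-scratch toy version in plain Python showing why a paste-like session scores as more anomalous than ordinary typing. It is a simplified sketch, not the model running in the service: the three features (mean flight-time, flight-time spread, peak CPS), the synthetic training data, and all function names are assumptions made for illustration. In practice a library implementation such as scikit-learn's `IsolationForest` would be the usual choice.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(row, node, depth=0):
    """Depth at which row lands in a leaf, adjusted for unsplit leaf size."""
    if "size" in node:
        return depth + c_factor(node["size"])
    side = "left" if row[node["feature"]] < node["split"] else "right"
    return path_length(row, node[side], depth + 1)


def fit_forest(rows, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees, each on a random subsample of rows."""
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(rows, size), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, size


def anomaly_score(row, trees, size):
    """Score in (0, 1); higher means the row is isolated in fewer splits."""
    mean_path = sum(path_length(row, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(size))


# Synthetic "human" sessions: (mean flight ms, flight std-dev ms, peak CPS).
data_rng = random.Random(42)
human_sessions = [
    (data_rng.gauss(180, 40), data_rng.gauss(90, 25), data_rng.gauss(5, 1.5))
    for _ in range(200)
]
trees, size = fit_forest(human_sessions)

typical_score = anomaly_score((180.0, 90.0, 5.0), trees, size)
paste_score = anomaly_score((5.0, 1.0, 2000.0), trees, size)
```

The paste-like row sits at the extreme of every feature, so random splits separate it from the rest in very few steps, and its score comes out higher than that of the typical session. A score like this could serve as the risk value the service stores.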


The Live Interface

Here is how the workspace status appears inside the IDE compared with the operational overview:

When data hits the streaming sync layer, individual developer session packets register inside the pipeline logs:

The centralized company command console handles multiple tenants concurrently, displaying a holographic heatmap of team code integrity thresholds:


Build-In-Public Learnings

Architecting this taught me a massive lesson about scaling WebSocket loops and asynchronous micro-buffering under heavy data throughput. So far, latency across the cloud sync plane has been negligible, and the extension has had no noticeable impact on local machine performance.

V1 is officially live. I'd love to hear your unfiltered feedback on the transport mechanics, security model, or ML logic!

Test the live dashboard beta here: https://epistemic-lac.vercel.app/
