How I built a production-ready proctoring system that runs entirely in the browser — no native apps, no server-side video processing.
Online proctoring has traditionally required heavy native desktop applications that install system-level hooks, capture screen contents, and stream video to cloud servers for analysis. These solutions are invasive, expensive to operate, and create privacy concerns.
I built Cerberus.js to prove that modern browser APIs — combined with client-side machine learning — can deliver comparable proctoring capabilities without a single native dependency.
The core constraint: All processing happens locally on the device. No video or audio is ever transmitted to a server.
Architecture Overview
Cerberus.js is a Next.js 16 application with a React 19 frontend. The monitoring system is composed of four independent monitoring hooks that feed events into a fifth hook, the centralized scoring engine:
useInfractionMatrix() ── Central scoring engine
├── useHardwareModule() ── Screen share, monitors, devices
├── useFocusLockdown() ── Fullscreen, keyboard, visibility
├── useVisionEngine() ── MediaPipe face detection
└── useAudioEngine() ── Web Audio API analysis
Each hook monitors a specific attack vector and calls addEvent() when suspicious behavior is detected. The scoring engine deduplicates events (via per-type cooldown timers), computes a running score, and stops accepting events once the suspension threshold (100 points) is reached.
Face Detection with MediaPipe
The most technically interesting component is the face detection engine. We use @mediapipe/tasks-vision with the BlazeFace model — a lightweight face detector that runs in real-time on the GPU via WebGL.
Graceful Degradation Strategy
const modelSources = [FACE_MODEL_LOCAL, FACE_MODEL_URL];
const delegates = ["GPU", "CPU"] as const;
for (const modelPath of modelSources) {
  for (const delegate of delegates) {
    try {
      detector = await FaceDetector.createFromOptions(wasmFileset, {
        baseOptions: { modelAssetPath: modelPath, delegate },
        minDetectionConfidence: 0.5,
      });
      break;
    } catch { continue; }
  }
  if (detector) break;
}
This attempts four combinations in order:
- Local model file (public/models/) on GPU
- Local model file on CPU
- CDN-hosted model on GPU
- CDN-hosted model on CPU
If none succeed, we fire a vision_init_fail event (10pts) and continue with degraded monitoring.
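The fallback order can also be written as a small generic helper. This is a minimal sketch, not code from Cerberus.js: `buildAttempts` and `firstSuccessful` are hypothetical names, and the `load` callback stands in for whatever actually creates the detector.

```typescript
// Hypothetical helper: expand models x delegates into ordered attempts,
// exhausting every delegate for one model before moving to the next model.
function buildAttempts(
  models: string[],
  delegates: string[],
): Array<[string, string]> {
  const attempts: Array<[string, string]> = [];
  for (const model of models) {
    for (const delegate of delegates) {
      attempts.push([model, delegate]);
    }
  }
  return attempts;
}

// Hypothetical helper: return the first result that loads, or null if
// every candidate throws.
async function firstSuccessful<C, R>(
  candidates: C[],
  load: (candidate: C) => Promise<R>,
): Promise<R | null> {
  for (const candidate of candidates) {
    try {
      return await load(candidate);
    } catch {
      // This combination failed; try the next one.
    }
  }
  return null;
}
```

A caller would treat a `null` result as the signal to record the vision_init_fail event and carry on with the remaining monitors.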
Audio Analysis with Web Audio API
Instead of recording audio and sending it to a server, we use the Web Audio API's AnalyserNode to compute RMS (Root Mean Square) values in real-time entirely on the client:
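const ctx = new AudioContext();
const src = ctx.createMediaStreamSource(stream);
const ana = ctx.createAnalyser();
ana.fftSize = 256;
src.connect(ana);

// Time-domain buffer: one unsigned byte per sample, where 128 means silence.
const buf = new Uint8Array(ana.fftSize);
const len = buf.length;
let lastTick = performance.now();

const tick = () => {
  // Measure real elapsed time: requestAnimationFrame usually fires about
  // every 16 ms, so adding a fixed 100 ms per frame would overcount.
  const now = performance.now();
  const elapsedMs = now - lastTick;
  lastTick = now;

  ana.getByteTimeDomainData(buf);
  let sum = 0;
  for (let i = 0; i < len; i++) {
    const v = (buf[i] - 128) / 128; // normalize to [-1, 1)
    sum += v * v;
  }
  const rms = Math.sqrt(sum / len);
  // Accumulate the time spent above the threshold
  if (rms >= AUDIO_RMS_THRESHOLD) {
    accumMs.current += elapsedMs;
  }
  // Flag once the accumulated time reaches 1.5s
  if (accumMs.current >= AUDIO_ACCUMULATION_MS) {
    addEvent("mic_threshold_exceeded", ...);
    accumMs.current = 0;
  }
  animId.current = requestAnimationFrame(tick);
};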
The accumulation window prevents false positives from brief noises while catching sustained audio events (like someone reading answers aloud).
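To make the RMS and accumulation logic easier to test outside the browser, it can be pulled into pure functions. The sketch below is an illustration under assumptions: `computeRms` and `LoudnessAccumulator` are hypothetical names, and the 0.05 threshold in the usage lines is a made-up value, since the article does not state AUDIO_RMS_THRESHOLD.

```typescript
// RMS of unsigned 8-bit time-domain samples (128 = silence), the format
// that AnalyserNode.getByteTimeDomainData writes into its buffer.
function computeRms(samples: Uint8Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) {
    const v = (samples[i] - 128) / 128;
    sum += v * v;
  }
  return Math.sqrt(sum / samples.length);
}

// Hypothetical accumulator: adds elapsed time while the signal is loud and
// reports true once the accumulated time reaches the window, then resets.
class LoudnessAccumulator {
  private accumulatedMs = 0;
  private readonly threshold: number;
  private readonly windowMs: number;

  constructor(threshold: number, windowMs: number) {
    this.threshold = threshold;
    this.windowMs = windowMs;
  }

  update(rms: number, elapsedMs: number): boolean {
    if (rms >= this.threshold) this.accumulatedMs += elapsedMs;
    if (this.accumulatedMs >= this.windowMs) {
      this.accumulatedMs = 0;
      return true;
    }
    return false;
  }
}

// Usage with an assumed threshold of 0.05 and the 1.5 s window:
const accumulator = new LoudnessAccumulator(0.05, 1_500);
accumulator.update(computeRms(new Uint8Array([128, 128])), 16); // quiet frame
```

Passing the elapsed time in as an argument keeps the accumulator independent of how often the animation loop actually runs.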
The Scoring Engine
The useInfractionMatrix hook is the central nervous system. Key design decisions:
Cooldown Dedup via Refs
Each event type has an independent cooldown timer (stored in a ref map, not state). This prevents event storms without dropping genuinely distinct events:
const isCoolingDown = (type: InfractionEventType): boolean => {
  const cooldown = EVENT_COOLDOWN_MS[type];
  if (!cooldown) return false;
  const last = lastEventTime.current[type];
  if (!last) return false;
  return Date.now() - last < cooldown;
};
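The check above only answers whether a type is cooling down; something also has to record the timestamp when an event is accepted. A minimal sketch of both halves together, outside React, might look like this. The event names, cooldown values, and the `CooldownTracker` class are hypothetical, and the clock is passed in so the behavior is easy to verify.

```typescript
// Hypothetical event names and cooldowns, for illustration only.
type EventType = "tab_hidden" | "face_missing";
const COOLDOWN_MS: Partial<Record<EventType, number>> = {
  face_missing: 5_000, // tab_hidden has no cooldown in this sketch
};

class CooldownTracker {
  private lastAccepted = new Map<EventType, number>();

  // Returns true and records the timestamp if the event is accepted;
  // returns false (recording nothing) if the type is still cooling down.
  tryAccept(type: EventType, now: number): boolean {
    const cooldown = COOLDOWN_MS[type];
    const last = this.lastAccepted.get(type);
    if (cooldown !== undefined && last !== undefined && now - last < cooldown) {
      return false;
    }
    this.lastAccepted.set(type, now);
    return true;
  }
}
```

Because each type has its own entry in the map, a burst of one event type never suppresses a different type that fires at the same moment.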
Suspension Gate via Ref
Once the score hits 100, a ref permanently blocks all future events. This is critical because multiple callbacks (RAF loop, event listeners) can fire in the same render cycle — state alone wouldn't prevent race conditions:
if (suspendedRef.current) return;
setTotalScore((prev) => {
  const next = prev + score;
  if (next >= SUSPENSION_THRESHOLD && !suspendedRef.current) {
    suspendedRef.current = true;
    setSuspended(true);
  }
  return next;
});
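The reason a synchronous flag helps can be shown without React at all. The sketch below is an assumption-laden illustration, not the hook's code: `ScoreGate` is a hypothetical class that keeps both the running total and the suspended flag in plain fields, so the gate closes immediately, before any later callback in the same tick can add more points.

```typescript
const SUSPENSION_THRESHOLD = 100; // threshold stated in the article

// Hypothetical plain-object version of the gate.
class ScoreGate {
  private total = 0;
  private suspended = false;

  // Adds points unless already suspended; returns true if the event counted.
  add(points: number): boolean {
    if (this.suspended) return false;
    this.total += points;
    if (this.total >= SUSPENSION_THRESHOLD) this.suspended = true;
    return true;
  }

  get score(): number {
    return this.total;
  }

  get isSuspended(): boolean {
    return this.suspended;
  }
}
```

In a hook, the same fields could live in refs, with the values mirrored into state only so the UI re-renders.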
Session Security
Session tokens are HMAC-SHA256 signed JWTs created using the Web Crypto API — zero external dependencies:
const key = await crypto.subtle.importKey(
  "raw", secret, { name: "HMAC", hash: "SHA-256" },
  false, ["sign", "verify"]
);
const sig = await crypto.subtle.sign(
  { name: "HMAC", hash: "SHA-256" },
  key, new TextEncoder().encode(`${header}.${payload}`)
);
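The snippet above produces a raw signature; turning it into a token also requires base64url-encoding each segment. The following is a minimal sketch under assumptions: `base64UrlEncode`, `encodeSegment`, and `signToken` are hypothetical names, the header is the standard HS256 JWT header, and the secret is passed as a string for simplicity.

```typescript
// Encode bytes as unpadded base64url, the alphabet JWT segments use.
function base64UrlEncode(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Serialize an object to JSON and base64url-encode it.
function encodeSegment(value: object): string {
  return base64UrlEncode(new TextEncoder().encode(JSON.stringify(value)));
}

// Hypothetical helper: build header.payload.signature with HMAC-SHA256.
async function signToken(claims: object, secret: string): Promise<string> {
  const header = encodeSegment({ alg: "HS256", typ: "JWT" });
  const payload = encodeSegment(claims);
  const key = await crypto.subtle.importKey(
    "raw",
    new TextEncoder().encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign"],
  );
  const sig = await crypto.subtle.sign(
    "HMAC",
    key,
    new TextEncoder().encode(`${header}.${payload}`),
  );
  return `${header}.${payload}.${base64UrlEncode(new Uint8Array(sig))}`;
}
```

Verification would follow the same shape with `crypto.subtle.verify`, which compares the signature itself and avoids a hand-written string comparison.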
The API layer enforces:
- IP-based rate limiting: 10 session creations/hour, 100 submissions/hour
- Session-based rate limiting: 30 submissions/minute per session
- CSRF protection: Origin/referer validation on all POST endpoints
- Body size limits: 100KB maximum payload
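The article does not show how the limits are enforced, so the following is only one possible shape: an in-memory sliding-window limiter keyed by IP or session ID. `SlidingWindowLimiter` is a hypothetical class, and a single-process map like this would not be shared across multiple server instances.

```typescript
// Hypothetical in-memory limiter: allows at most `limit` hits per key
// within any rolling window of `windowMs` milliseconds.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the hit if the key is under its limit.
  allow(key: string, now: number): boolean {
    const previous = this.hits.get(key) ?? [];
    // Keep only timestamps that are still inside the window.
    const recent = previous.filter((t) => now - t < this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }
}

// Example: the per-session limit from the list above (30 per minute).
const sessionLimiter = new SlidingWindowLimiter(30, 60_000);
```

A request handler would call `allow(sessionId, Date.now())` and reject the request, typically with HTTP 429, when it returns false.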
Submission Reliability
When the network is unavailable during submission, the client caches the event log in localStorage and retries later, spacing attempts with an escalating backoff schedule (2 seconds up to 60 seconds):
const RETRY_BACKOFF_MS = [2_000, 5_000, 15_000, 30_000, 60_000];
// On page load, flush any pending submissions
useEffect(() => {
  const retry = async () => {
    const pending = loadPending();
    for (const sub of pending) {
      if (sub.retryCount >= MAX_RETRIES) continue;
      const backoff = RETRY_BACKOFF_MS[sub.retryCount];
      if (Date.now() - sub.lastAttempt < backoff) continue;
      const ok = await submitToServer(sub);
      if (ok) removePending(sub.id);
    }
  };
  retry();
}, []);
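The eligibility check inside that loop can be isolated into a pure function, which makes the schedule easy to reason about. This is a sketch under assumptions: `isDueForRetry` and the `PendingSubmission` shape are hypothetical, and MAX_RETRIES is assumed to equal the schedule length because the article does not give its value.

```typescript
const RETRY_BACKOFF_MS = [2_000, 5_000, 15_000, 30_000, 60_000];
// Assumption: one retry per schedule entry; the real value is not stated.
const MAX_RETRIES = RETRY_BACKOFF_MS.length;

// Hypothetical shape of a cached submission.
interface PendingSubmission {
  id: string;
  retryCount: number;
  lastAttempt: number; // epoch milliseconds of the last attempt
}

// True if the submission still has retries left and its backoff has elapsed.
function isDueForRetry(sub: PendingSubmission, now: number): boolean {
  if (sub.retryCount >= MAX_RETRIES) return false;
  // Clamp the index so a larger retry count reuses the longest delay
  // instead of reading past the end of the schedule.
  const index = Math.min(sub.retryCount, RETRY_BACKOFF_MS.length - 1);
  return now - sub.lastAttempt >= RETRY_BACKOFF_MS[index];
}
```

The clamp only matters if MAX_RETRIES is ever set higher than the number of schedule entries, in which case an unclamped lookup would yield `undefined` and the comparison would let the retry through immediately.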
What I Learned
React refs vs state: For high-frequency monitoring loops, refs are essential. State updates trigger re-renders; refs don't. But refs also don't notify you when they change — you must account for this in component lifecycle.
MediaPipe in the browser: The tasks-vision package is surprisingly efficient. Face detection runs at 30+ fps on modern hardware with the GPU delegate. The WASM files are large (~5MB total), but they load once and are cached afterwards.
Browser API limitations: You cannot enumerate running applications, pre-select "Entire Screen" in the share picker, or prevent all forms of cheating. The goal is to raise the cost of cheating, not eliminate it.
Graceful degradation is essential: The model might fail to load (network issues, incompatible hardware, browser restrictions). Every monitoring module must handle failure without breaking the assessment.
Try It Yourself
git clone https://github.com/harishkotra/cerberus-js.git
cd cerberus-js
cp .env.example .env.local
npm install
npm run dev
Open http://localhost:3000, click "Begin Assessment", and grant the requested permissions. The proctoring dashboard will show your camera feed, audio levels, hardware status, and a live event timeline.
The full source code is available on GitHub.



