Building Cerberus.js: Browser-Native Online Proctoring with Client-Side AI

#ai #programming #productivity #dailybuild2026

How I built a production-ready proctoring system that runs entirely in the browser — no native apps, no server-side video processing.

Online proctoring has traditionally required heavy native desktop applications that install system-level hooks, capture screen contents, and stream video to cloud servers for analysis. These solutions are invasive, expensive to operate, and create privacy concerns.

I built Cerberus.js to prove that modern browser APIs — combined with client-side machine learning — can deliver comparable proctoring capabilities without a single native dependency.

The core constraint: All processing happens locally on the device. No video or audio is ever transmitted to a server.

Architecture Overview

Cerberus.js is a Next.js 16 application with a React 19 frontend. The monitoring system is built from five hooks: four independent monitors that feed events into a fifth, the centralized scoring engine:

useInfractionMatrix() ── Central scoring engine
  ├── useHardwareModule()  ── Screen share, monitors, devices
  ├── useFocusLockdown()   ── Fullscreen, keyboard, visibility
  ├── useVisionEngine()    ── MediaPipe face detection
  └── useAudioEngine()     ── Web Audio API analysis

Each monitoring hook covers a specific attack vector and calls addEvent() when it detects suspicious behavior. The scoring engine deduplicates events (via per-type cooldown timers), keeps a running score, and stops accepting events once the score reaches the suspension threshold (100pts).
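Without React, the contract between the monitors and the scoring engine is small. The sketch below is an assumption, not a quote from the codebase. `Monitor`, `startAll`, and the `tab_hidden` event name are hypothetical. Each hook is modeled as a function that does three things: it starts watching, it reports through one shared `addEvent` callback, and it returns a cleanup function (the same shape a `useEffect` cleanup has).

```typescript
// Hypothetical contract, not the Cerberus.js API: each monitor reports through
// a shared addEvent callback and returns a cleanup function.
type AddEvent = (type: string, score: number) => void;
type Monitor = (addEvent: AddEvent) => () => void;

// Start every monitor against the same sink; return one function that stops them all.
function startAll(monitors: Monitor[], addEvent: AddEvent): () => void {
  const stops = monitors.map((start) => start(addEvent));
  return () => {
    // Tear down in reverse start order.
    for (const stop of [...stops].reverse()) stop();
  };
}

// Usage: two fake monitors that each report once when started (scores are arbitrary).
const log: string[] = [];
const stopped: string[] = [];

const fakeFocus: Monitor = (add) => {
  add("tab_hidden", 5);
  return () => { stopped.push("focus"); };
};
const fakeAudio: Monitor = (add) => {
  add("mic_threshold_exceeded", 10);
  return () => { stopped.push("audio"); };
};

const stopAll = startAll([fakeFocus, fakeAudio], (type, score) => {
  log.push(`${type}:${score}`);
});
stopAll();
// log:     ["tab_hidden:5", "mic_threshold_exceeded:10"]
// stopped: ["audio", "focus"]
```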

Face Detection with MediaPipe

The most technically interesting component is the face detection engine. We use @mediapipe/tasks-vision with the BlazeFace model — a lightweight face detector that runs in real-time on the GPU via WebGL.

Graceful Degradation Strategy

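import { FaceDetector, FilesetResolver } from "@mediapipe/tasks-vision";

// WASM_BASE_URL is an assumed config constant pointing at the package's wasm/ folder
const wasmFileset = await FilesetResolver.forVisionTasks(WASM_BASE_URL);

const modelSources = [FACE_MODEL_LOCAL, FACE_MODEL_URL]; // self-hosted copy first
const delegates = ["GPU", "CPU"] as const;               // WebGL first, then CPU

let detector: FaceDetector | null = null;

for (const modelPath of modelSources) {
  for (const delegate of delegates) {
    try {
      detector = await FaceDetector.createFromOptions(wasmFileset, {
        baseOptions: { modelAssetPath: modelPath, delegate },
        minDetectionConfidence: 0.5,
      });
      break; // this combination worked: stop trying delegates
    } catch {
      continue; // try the next delegate / model source
    }
  }
  if (detector) break; // stop trying model sources
}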

This attempts four combinations in order:

Local model file (public/models/) on GPU
Local model file on CPU
CDN-hosted model on GPU
CDN-hosted model on CPU

If none succeed, we fire a vision_init_fail event (10pts) and continue with degraded monitoring.
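The nested loop is an instance of a general pattern: try candidates in priority order and keep the first one that builds. This is a sketch with hypothetical helpers (`firstSuccessful` and `combos`). It does not call MediaPipe at all. The factory in `demo` simply simulates a machine where every GPU attempt throws.

```typescript
// Try each candidate in priority order; return the first one `build` produces
// without throwing, or null so the caller can fall back to degraded monitoring.
async function firstSuccessful<C, T>(
  candidates: readonly C[],
  build: (candidate: C) => Promise<T>,
): Promise<{ candidate: C; value: T } | null> {
  for (const candidate of candidates) {
    try {
      return { candidate, value: await build(candidate) };
    } catch {
      // This option failed; move on to the next one.
    }
  }
  return null;
}

// All (outer, inner) pairs with `outer` as the outer loop,
// matching the source-then-delegate order of the original loop.
function combos<A, B>(outer: readonly A[], inner: readonly B[]): Array<[A, B]> {
  return outer.flatMap((a) => inner.map((b): [A, B] => [a, b]));
}

// Usage: simulate a machine where every GPU attempt throws.
async function demo(): Promise<string> {
  const attempts = combos(["local", "cdn"], ["GPU", "CPU"] as const);
  const picked = await firstSuccessful(attempts, async ([model, delegate]) => {
    if (delegate === "GPU") throw new Error("WebGL unavailable");
    return `${model} model on ${delegate}`;
  });
  return picked ? picked.value : "degraded";
}
// demo() resolves to "local model on CPU"
```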

Audio Analysis with Web Audio API

Instead of recording audio and sending it to a server, we use the Web Audio API's AnalyserNode to compute RMS (Root Mean Square) values in real-time entirely on the client:

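const ctx = new AudioContext();
const src = ctx.createMediaStreamSource(stream); // mic stream from getUserMedia
const ana = ctx.createAnalyser();
ana.fftSize = 256;
src.connect(ana);

const len = ana.fftSize;         // time-domain reads return fftSize samples
const buf = new Uint8Array(len); // bytes 0..255, silence = 128
let lastTs = performance.now();

const tick = (ts: number) => {
  // Use real elapsed time: rAF fires once per display frame, not every 100 ms
  const dt = Math.max(0, ts - lastTs);
  lastTs = ts;

  ana.getByteTimeDomainData(buf);
  let sum = 0;
  for (let i = 0; i < len; i++) {
    const v = (buf[i] - 128) / 128; // normalize to roughly -1..1
    sum += v * v;
  }
  const rms = Math.sqrt(sum / len);
  // Accumulate time spent above the loudness threshold
  if (rms >= AUDIO_RMS_THRESHOLD) {
    accumMs.current += dt;
  }
  // Flag once AUDIO_ACCUMULATION_MS (1.5s) of loud audio has built up
  if (accumMs.current >= AUDIO_ACCUMULATION_MS) {
    addEvent("mic_threshold_exceeded", ...);
    accumMs.current = 0;
  }
  animId.current = requestAnimationFrame(tick);
};
animId.current = requestAnimationFrame(tick);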

The accumulation window prevents false positives from brief noises while catching sustained audio events (like someone reading answers aloud).
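`requestAnimationFrame` fires once per display frame, so the accumulator is easiest to reason about when it is fed real elapsed time rather than a fixed step. Below is a minimal sketch. It assumes hypothetical names (`byteRms` and `LoudnessAccumulator`) and an arbitrary 0.1 threshold, which is not Cerberus.js's `AUDIO_RMS_THRESHOLD`.

```typescript
// RMS of 8-bit time-domain samples as returned by getByteTimeDomainData:
// 128 is silence, 0 and 255 are the negative/positive extremes.
function byteRms(buf: Uint8Array): number {
  if (buf.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < buf.length; i++) {
    const v = (buf[i] - 128) / 128; // map to roughly -1..1
    sum += v * v;
  }
  return Math.sqrt(sum / buf.length);
}

// Counts milliseconds spent at or above the threshold and reports true once
// the total reaches windowMs, then starts counting again from zero.
class LoudnessAccumulator {
  private accumMs = 0;
  private readonly threshold: number;
  private readonly windowMs: number;

  constructor(threshold: number, windowMs: number) {
    this.threshold = threshold;
    this.windowMs = windowMs;
  }

  // Feed one measurement plus the real time elapsed since the previous one.
  update(rms: number, dtMs: number): boolean {
    if (rms >= this.threshold) this.accumMs += dtMs;
    if (this.accumMs >= this.windowMs) {
      this.accumMs = 0;
      return true; // the hook would call addEvent("mic_threshold_exceeded", ...) here
    }
    return false;
  }
}
```

Fed 100 ms frames of a near-full-scale signal, the accumulator fires exactly at the 1,500 ms mark. A 500 ms burst followed by silence never fires it. Note that quiet frames do not drain the total in this sketch, so scattered short noises still add up over a long session. Resetting or decaying the counter on quiet frames would make the check strictly about sustained sound.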

The Scoring Engine

The useInfractionMatrix hook is the central nervous system. Key design decisions:

Cooldown Dedup via Refs

Each event type has an independent cooldown timer (stored in a ref map, not state). This prevents event storms without dropping genuinely distinct events:

const isCoolingDown = (type: InfractionEventType): boolean => {
  const cooldown = EVENT_COOLDOWN_MS[type];
  if (!cooldown) return false;
  const last = lastEventTime.current[type];
  if (!last) return false;
  return Date.now() - last < cooldown;
};
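The same idea can be isolated in a class that is testable without React. This sketch uses hypothetical names: `CooldownFilter`, the `face_missing`/`tab_hidden` types, and the 3 s cooldown in the usage. It also shows a step the snippet implies but does not include: recording the time when an event is accepted. The class takes an injectable clock so tests don't have to wait.

```typescript
// Per-type cooldown: an event passes only if the same type has not been
// accepted within its cooldown window. Types without a cooldown always pass.
class CooldownFilter<T extends string> {
  private readonly last: Partial<Record<T, number>> = {};
  private readonly cooldowns: Partial<Record<T, number>>;
  private readonly now: () => number;

  constructor(cooldowns: Partial<Record<T, number>>, now: () => number = Date.now) {
    this.cooldowns = cooldowns;
    this.now = now;
  }

  // Returns true if the event should be recorded. Only accepted events are
  // stamped, so a storm of rejected events cannot extend the cooldown.
  accept(type: T): boolean {
    const cooldown = this.cooldowns[type];
    const last = this.last[type];
    const t = this.now();
    if (cooldown && last !== undefined && t - last < cooldown) return false;
    this.last[type] = t;
    return true;
  }
}

// Usage with a fake clock (event names and cooldown are illustrative).
let clock = 0;
const filter = new CooldownFilter<"face_missing" | "tab_hidden">(
  { face_missing: 3_000 },
  () => clock,
);
filter.accept("face_missing"); // true
clock = 1_000;
filter.accept("face_missing"); // false: still cooling down
filter.accept("tab_hidden");   // true: no cooldown configured
```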

Suspension Gate via Ref

Once the score reaches 100, a ref permanently blocks all future events. This is critical because several callbacks (the requestAnimationFrame loop, DOM event listeners) can fire before React re-renders. A state value captured in their closures would still hold the old value, letting extra events slip through:

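if (suspendedRef.current) return; // checked synchronously on every call

// scoreRef (a useRef(0) kept alongside totalScore) holds the running total:
// a functional setState updater may run later, during render, so it cannot
// be relied on to flip the gate before the next call arrives
scoreRef.current += score;
setTotalScore(scoreRef.current);

if (scoreRef.current >= SUSPENSION_THRESHOLD) {
  suspendedRef.current = true; // later calls in the same tick return early
  setSuspended(true);
}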
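The ordering guarantee is easier to see without React. In this sketch, `createScoreGate` is hypothetical. Plain closure variables stand in for `useRef` values because, like refs, they change immediately instead of on the next render.

```typescript
// Framework-free model of the suspension gate. The closure variables play the
// role of refs: they change immediately, so a second call in the same tick
// already sees the first call's effect. onSuspend stands in for setSuspended.
function createScoreGate(threshold: number, onSuspend: () => void) {
  let score = 0;
  let suspended = false;

  return {
    // Returns false when the event was dropped because the session is suspended.
    add(points: number): boolean {
      if (suspended) return false;
      score += points;
      if (score >= threshold) {
        suspended = true;
        onSuspend(); // runs exactly once
      }
      return true;
    },
    get score(): number {
      return score;
    },
  };
}

// Usage: three events fired back to back, e.g. from one animation frame.
let suspensions = 0;
const gate = createScoreGate(100, () => { suspensions++; });
const accepted = [60, 50, 40].map((pts) => gate.add(pts));
// accepted: [true, true, false]; gate.score: 110; suspensions: 1
```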

Session Security

Session tokens are HMAC-SHA256 signed JWTs created using the Web Crypto API — zero external dependencies:

const key = await crypto.subtle.importKey(
  "raw", secret, { name: "HMAC", hash: "SHA-256" },
  false, ["sign", "verify"]
);

const sig = await crypto.subtle.sign(
  { name: "HMAC", hash: "SHA-256" },
  key, new TextEncoder().encode(`${header}.${payload}`)
);
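The full round trip fits in a few dozen lines. This is a sketch, not the Cerberus.js implementation. `signJwt`, `verifyJwt`, and the base64url helpers are hypothetical names. It assumes a runtime with global `crypto.subtle`, `btoa`, and `atob` (modern browsers, Node 19+). It omits claim checks such as `exp`. Verification goes through `crypto.subtle.verify` rather than hand-comparing signature strings.

```typescript
// HS256 JWT sign/verify using only Web Crypto.
const enc = new TextEncoder();

// base64url without padding, as JWT segments require (RFC 7515)
function b64url(bytes: Uint8Array): string {
  let bin = "";
  for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
  return btoa(bin).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

function fromB64url(s: string) {
  const b64 = s.replace(/-/g, "+").replace(/_/g, "/");
  const bin = atob(b64.padEnd(Math.ceil(b64.length / 4) * 4, "="));
  return Uint8Array.from(bin, (c) => c.charCodeAt(0));
}

function hmacKey(secret: string): Promise<CryptoKey> {
  return crypto.subtle.importKey(
    "raw", enc.encode(secret), { name: "HMAC", hash: "SHA-256" }, false, ["sign", "verify"],
  );
}

async function signJwt(payload: object, secret: string): Promise<string> {
  const header = b64url(enc.encode(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const body = b64url(enc.encode(JSON.stringify(payload)));
  const sig = await crypto.subtle.sign("HMAC", await hmacKey(secret), enc.encode(`${header}.${body}`));
  return `${header}.${body}.${b64url(new Uint8Array(sig))}`;
}

// Returns the decoded payload, or null for a malformed token or bad signature.
async function verifyJwt(token: string, secret: string): Promise<unknown> {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, body, sig] = parts;
  try {
    const ok = await crypto.subtle.verify(
      "HMAC", await hmacKey(secret), fromB64url(sig), enc.encode(`${header}.${body}`),
    );
    return ok ? JSON.parse(new TextDecoder().decode(fromB64url(body))) : null;
  } catch {
    return null; // invalid base64 or JSON
  }
}
```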

The API layer enforces:

IP-based rate limiting: 10 session creations/hour, 100 submissions/hour
Session-based rate limiting: 30 submissions/minute per session
CSRF protection: Origin/Referer header validation on all POST endpoints
Body size limits: 100 KB maximum payload
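The list gives the limits but not the counting algorithm. One common choice is a fixed-window counter. The sketch below (a hypothetical, in-memory, single-process `FixedWindowLimiter`) illustrates that technique and is not necessarily how Cerberus.js counts requests. A fixed window lets a client burst up to twice the limit across a window boundary. A sliding window avoids that at the cost of more bookkeeping.

```typescript
// Fixed-window rate limiter: at most `limit` hits per key per window.
// State lives in memory, so multiple server instances would need shared storage.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true (and counts the hit) if the request is allowed.
  allow(key: string): boolean {
    const t = this.now();
    const w = this.windows.get(key);
    if (!w || t - w.start >= this.windowMs) {
      this.windows.set(key, { start: t, count: 1 }); // open a fresh window
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count++;
    return true;
  }
}

// Usage matching the limits listed above (keys would be client IPs or session IDs).
const HOUR_MS = 60 * 60 * 1000;
const sessionCreatesPerIp = new FixedWindowLimiter(10, HOUR_MS);
const submissionsPerIp = new FixedWindowLimiter(100, HOUR_MS);
const submissionsPerSession = new FixedWindowLimiter(30, 60_000);
```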

Submission Reliability

When the network is unavailable during submission, the client caches the event log in localStorage and retries on an increasing backoff schedule:

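const RETRY_BACKOFF_MS = [2_000, 5_000, 15_000, 30_000, 60_000];
const MAX_RETRIES = RETRY_BACKOFF_MS.length; // give up after the last step

// Flush pending submissions on page load, then re-check periodically and on reconnect
useEffect(() => {
  let running = false;
  const retry = async () => {
    if (running) return; // don't start overlapping passes
    running = true;
    try {
      for (const sub of loadPending()) {
        if (sub.retryCount >= MAX_RETRIES) continue;
        const backoff = RETRY_BACKOFF_MS[sub.retryCount];
        if (Date.now() - sub.lastAttempt < backoff) continue; // not due yet
        const ok = await submitToServer(sub);
        if (ok) removePending(sub.id);
        // updatePending: assumed helper that rewrites the stored entry
        else updatePending({ ...sub, retryCount: sub.retryCount + 1, lastAttempt: Date.now() });
      }
    } finally {
      running = false;
    }
  };
  retry();
  const timer = setInterval(retry, 2_000);
  window.addEventListener("online", retry);
  return () => {
    clearInterval(timer);
    window.removeEventListener("online", retry);
  };
}, []);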
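Pulling the schedule out of the effect makes it testable on its own. This is a sketch. `PendingSubmission`, `isDue`, and `afterFailure` are hypothetical names. It assumes `lastAttempt` is stored as an epoch-millisecond timestamp and that a submission is abandoned after the last step of the schedule.

```typescript
// Hypothetical shape of a queued submission, mirroring the fields the hook reads.
interface PendingSubmission {
  id: string;
  retryCount: number;  // failed retries so far
  lastAttempt: number; // epoch ms of the most recent attempt
}

const RETRY_BACKOFF_MS = [2_000, 5_000, 15_000, 30_000, 60_000];
const MAX_RETRIES = RETRY_BACKOFF_MS.length;

// A submission is due when it has retries left and its current wait has elapsed.
function isDue(sub: PendingSubmission, now: number): boolean {
  if (sub.retryCount >= MAX_RETRIES) return false;
  return now - sub.lastAttempt >= RETRY_BACKOFF_MS[sub.retryCount];
}

// After a failed attempt: bump the counter and restart the wait from `now`.
function afterFailure(sub: PendingSubmission, now: number): PendingSubmission {
  return { ...sub, retryCount: sub.retryCount + 1, lastAttempt: now };
}
```

Suppose a submission first fails at t = 0. It becomes due at 2 s. If that retry fails, it waits 5 s more (due at 7 s), then 15 s, 30 s, and 60 s. After the fifth failed retry, `isDue` returns false for good.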

What I Learned

React refs vs state: For high-frequency monitoring loops, refs are essential. State updates trigger re-renders; refs don't. But refs also don't notify you when they change, so anything the UI must display still needs a state update alongside the ref.
MediaPipe in the browser: The tasks-vision package is surprisingly efficient. Face detection runs at 30+ fps on modern hardware with the GPU delegate. The WASM files are large (~5MB total), but they only need to be fetched once and are then cached.
Browser API limitations: You cannot enumerate running applications, pre-select "Entire Screen" in the share picker, or prevent all forms of cheating. The goal is to raise the cost of cheating, not eliminate it.
Graceful degradation is essential: The model might fail to load (network issues, incompatible hardware, browser restrictions). Every monitoring module must handle failure without breaking the assessment.

Try It Yourself

git clone https://github.com/harishkotra/cerberus-js.git
cd cerberus-js
cp .env.example .env.local
npm install
npm run dev

Open http://localhost:3000, click "Begin Assessment", and grant the requested permissions. The proctoring dashboard will show your camera feed, audio levels, hardware status, and a live event timeline.