
I've been building SEISMOGRAPH for 3 weeks. Here's what shipped today

tl;dr: pip install seismograph-probe, a Python probe that detects silent LLM API drift using CUSUM change-point detection, with privacy-preserving signal aggregation. 103 tests passing. Dashboard live. Open source.

Three weeks ago I asked a question I couldn't answer:

"Did GPT-4 just change underneath me, or is it my prompt?"

No latency spike. No downtime. Just subtly different outputs from the same prompts, same parameters, same everything. I spent days debugging something that wasn't my fault.

So I built a detector.

Today I'm shipping it publicly.

What's actually working right now
This isn't a concept post. Here's what's live:
The probe SDK: on PyPI today
pip install seismograph-probe

from probe.sdk import ProbeSDK

sdk = ProbeSDK(provider="openai", model="gpt-4-turbo")
result = sdk.run_canary_suite()
print(result.drift_score)  # 0.0 stable → 1.0 significant shift

The probe runs ≤200 canary prompts at temperature=0 daily. These are semantically stable tasks: deterministic questions, structured reasoning, format-adherence checks. The goal is a reliable behavioral baseline, not a capability benchmark.

Privacy boundary: raw prompts and model outputs never leave your machine. The probe extracts SHA-256 feature hashes, distributional stats, and DP-noised aggregates. That's all that transmits.
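To make that boundary concrete, here is a minimal sketch of the kind of payload that could leave the machine. It is not the probe's actual code: the function names, the whitespace/case normalisation, the choice of output length as the statistic, and the Laplace mechanism with these epsilon and clipping values are all assumptions for illustration.

```python
import hashlib
import random
import statistics


def feature_hash(text: str) -> str:
    """SHA-256 hex digest of a normalised output (normalisation is an assumption)."""
    normalised = " ".join(text.split()).lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def summarise(outputs, epsilon=1.0, max_len=2000, seed=None):
    """Build a payload holding hashes and a noised aggregate, never raw text.

    epsilon and max_len are hypothetical parameters, not values from the project.
    """
    if not outputs:
        raise ValueError("need at least one output")
    rng = random.Random(seed)
    # Clip lengths so one record can move the mean by at most max_len / n.
    lengths = [min(len(o), max_len) for o in outputs]
    n = len(lengths)
    scale = max_len / (n * epsilon)
    return {
        "hashes": [feature_hash(o) for o in outputs],
        "n": n,
        "noisy_mean_length": statistics.fmean(lengths) + laplace_noise(scale, rng),
    }


if __name__ == "__main__":
    print(summarise(["Paris", "The capital of France is Paris."], seed=0))
```

With only a couple of outputs the noise swamps the mean, which is the expected trade-off: a differentially private aggregate only becomes informative once enough canary results are pooled.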
CUSUM change-point detection: running
The correlation engine uses CUSUM (Cumulative Sum), a sequential statistical test that's sensitive to gradual drift, not just threshold crossings.
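For readers who haven't met CUSUM before, this is a minimal two-sided tabular CUSUM, written as a sketch rather than as SEISMOGRAPH's detector. The class name and the target, slack, and threshold values are assumptions chosen for the demo.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Cusum:
    target: float     # expected value of the statistic while the model is stable
    slack: float      # per-step deviation tolerated before anything accumulates
    threshold: float  # alarm level for either cumulative sum
    pos: float = 0.0  # accumulated upward deviation
    neg: float = 0.0  # accumulated downward deviation

    def update(self, x: float) -> bool:
        """Feed one observation; return True if either sum exceeds the threshold."""
        self.pos = max(0.0, self.pos + (x - self.target - self.slack))
        self.neg = max(0.0, self.neg + (self.target - x - self.slack))
        return self.pos > self.threshold or self.neg > self.threshold


def first_alarm(series: Iterable[float], detector: Cusum) -> Optional[int]:
    """Index of the first observation that triggers an alarm, or None."""
    for i, x in enumerate(series):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    stable = [0.10] * 10
    drifted = stable + [0.15] * 20  # small, sustained shift
    print(first_alarm(stable, Cusum(0.10, 0.02, 0.20)))
    print(first_alarm(drifted, Cusum(0.10, 0.02, 0.20)))
```

No single value in the drifted series exceeds 0.20, so a plain per-point threshold at that level would never fire. The cumulative sum does, because each small excess over target plus slack is carried forward.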

When I backtest against a known LLM behavioral shift event (Aug-Sep 2025):

Day 0: CUSUM statistic: 0.12 (stable baseline)

Day 11: First elevation detected

Day 19: Alert threshold crossed ← SEISMOGRAPH fires

Day 57: Public postmortem published

38-day lead time (alert on day 19, public postmortem on day 57). That's the number I keep coming back to.
Ingestion gateway: deployed
FastAPI gateway with:

Ed25519-signed batch verification (unsigned batches rejected atomically)
Pydantic v2 schema validation
SQLAlchemy ORM + SQLite (ClickHouse migration planned for Phase 2)
Bearer token auth on audit export endpoint
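The ordering matters more than the framework, so here is a framework-free sketch of one way to read "rejected atomically": verify the signature over the raw bytes, validate every record, and only then store, so a bad batch leaves nothing behind. This is an assumption about the behaviour, not the gateway's code. The field names are hypothetical, and the demo verifier uses HMAC-SHA256 from the standard library purely as a stand-in; the real gateway uses Ed25519, where `verify` would wrap a public-key signature check instead.

```python
import hashlib
import hmac
import json
from typing import Callable

# Hypothetical record schema for illustration only.
REQUIRED_FIELDS = {"model": str, "feature_hash": str, "drift_stat": float}


class BatchRejected(Exception):
    """Raised when a batch fails the signature check or schema validation."""


def ingest(body: bytes, signature: bytes,
           verify: Callable[[bytes, bytes], bool], store: list) -> int:
    # 1. Signature check on the raw bytes, before any parsing.
    if not signature or not verify(body, signature):
        raise BatchRejected("missing or invalid signature")
    # 2. Validate every record before touching storage.
    records = json.loads(body)
    if not isinstance(records, list):
        raise BatchRejected("batch must be a JSON array")
    for rec in records:
        if not isinstance(rec, dict):
            raise BatchRejected("record must be a JSON object")
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(rec.get(field), expected_type):
                raise BatchRejected(f"bad or missing field: {field}")
    # 3. Store all records together; nothing is written on any earlier failure.
    store.extend(records)
    return len(records)


def make_hmac_verifier(key: bytes) -> Callable[[bytes, bytes], bool]:
    """Stand-in verifier (HMAC-SHA256), NOT Ed25519."""
    def verify(body: bytes, signature: bytes) -> bool:
        expected = hmac.new(key, body, hashlib.sha256).digest()
        return hmac.compare_digest(expected, signature)
    return verify
```

Passing the verifier in as a callable keeps the ordering testable without key material, and swapping in a real asymmetric check does not change the control flow.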
Public dashboard: live at localhost, hosted version coming
Dark-mode model weather dashboard. Polls /v1/weather every 60 seconds. Shows per-model drift status across your fleet.

GET /v1/weather
→ [{ "model": "gpt-4-turbo", "status": "STABLE", ... },
   { "model": "claude-3-5-sonnet", "status": "STABLE", ... }]
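If you'd rather consume the endpoint from a script than from the dashboard, something like the following could work. It is a sketch that relies only on the two fields shown above ("model" and "status") and on the local gateway address; the helper names are mine, and "DRIFTING" in the demo is a hypothetical status value, since the only one shown in this post is "STABLE".

```python
import json
import urllib.request

WEATHER_URL = "http://localhost:8000/v1/weather"  # local gateway; adjust as needed


def fetch_weather(url: str = WEATHER_URL, timeout: float = 5.0) -> list:
    """GET the weather endpoint and decode the JSON array it returns."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def non_stable(entries: list) -> list:
    """Names of models whose status is anything other than STABLE."""
    return [e["model"] for e in entries if e.get("status") != "STABLE"]


if __name__ == "__main__":
    # Offline demo with a hand-written payload instead of a live request.
    sample = [
        {"model": "gpt-4-turbo", "status": "STABLE"},
        {"model": "claude-3-5-sonnet", "status": "DRIFTING"},  # hypothetical status
    ]
    print(non_stable(sample))
```

Filtering on "not STABLE" rather than on a specific alert value means the script keeps working if more status values are added later.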
Test suite: 103/103 passing
Not "it works on my machine." 103 tests across probe SDK, storage layer, gateway, CUSUM detector, privacy boundary, and auth. Zero ruff violations across 22 Python files.
Provider ToS compliance: checked
Before adding any provider to the canary suite, I verify that probing it doesn't violate the provider's Terms of Service. Done for: OpenAI ✅, Anthropic ✅, Google Gemini ✅, Mistral ✅, Cohere ✅. Documented in docs/PROVIDER_TOS_CHECKS.md.

What's NOT done yet (being honest)
No hosted gateway yet. The gateway runs locally. Public ingestion endpoint is Phase 1.
No Bayesian online detector yet. CUSUM is running. BayesianOnlineDetector.update() is deferred; it's on the backlog.
No federation yet. Right now it's single-org. The cross-observer agreement scoring that makes it genuinely valuable is Phase 2.
No cloud dashboard. localhost:8000 only for now.

This is Phase 0: I'm proving the detection logic works before scaling it.

The architecture in one diagram
Your app
   │  (gen_ai.* OTel spans)
   ▼
ProbeSDK
   │  SHA-256 hashes + DP-noised stats only
   │  Ed25519-signed batch
   ▼
Ingestion Gateway (FastAPI)
   │  signature check → schema validation → store
   ▼
SQLite / ClickHouse
   │
   ▼
CUSUM Detector ──► DriftAlert
   │
   ▼
/v1/weather dashboard

OTel-native throughout. If you're already emitting gen_ai.* spans, the adapter plugs straight in.

Why this matters (and why it has to be federated)
A single organization's drift signal is almost useless. Your outputs change because your users change. Your prompts change. Your context windows change.

But if 15 independent organizations running the same canary suite all see correlated semantic drift on the same day โ€” that's a model change. That's the signal you can act on.

Single-org signal = private fleet data (yours only).
Multi-org correlated signal = public drift alert.
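Federation isn't built yet, so treat the following as a sketch of the idea and not the Phase 2 design. It assumes each observer reports one boolean drift flag per day, and it promotes a day to a public alert only when enough independent organizations reported and a large enough share of them flagged drift. The function name, the report shape, and the 0.6 agreement fraction are assumptions; the default of 15 organizations echoes the example above.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Report = Tuple[str, str, bool]  # (org_id, day, drift_flag), a hypothetical shape


def public_alert_days(reports: Iterable[Report],
                      min_orgs: int = 15,
                      min_fraction: float = 0.6) -> List[str]:
    """Days on which enough independent orgs agree that drift occurred."""
    seen = defaultdict(set)     # day -> orgs that reported at all
    flagged = defaultdict(set)  # day -> orgs that reported drift
    for org, day, drifted in reports:
        seen[day].add(org)
        if drifted:
            flagged[day].add(org)
    return sorted(
        day for day in seen
        if len(seen[day]) >= min_orgs
        and len(flagged[day]) / len(seen[day]) >= min_fraction
    )


if __name__ == "__main__":
    reports = [(f"org-{i}", "2025-09-01", True) for i in range(15)]
    reports.append(("org-0", "2025-09-02", True))  # a lone signal stays private
    print(public_alert_days(reports))
```

Using sets keyed by organization means duplicate reports from the same observer cannot inflate the agreement count.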

That's the design. Federation is Phase 2. The local probe is shippable today.

Try it / follow along
GitHub: github.com/Tania-coder/SEISMOGRAPH
PyPI: pypi.org/project/seismograph-probe

If you've been burned by a silent model change, I want to hear about it. Open an issue, or find me on Twitter @tatyanti.

The probe is Apache 2.0. The gateway will be too.

Tatiana Radchenko · AI Infrastructure · Aarhus, Denmark
Building in public. Phase 0 of 3.
