TL;DR: I built a Claude Code skill that gives Claude structural eyes for time-series. The interesting bits: (1) it auto-tunes its own configuration via leave-one-out cross-validation (50% → 100% on real PhysioNet ECG), and (2) when the first call comes back borderline, the skill self-corrects with a 4-stage cascade (quick check → alt-domain → fingerprint dim inspection → sliding-window escalation). Validated across 30 use cases on real and shape-realistic data. Free tier, MIT licensed, one-line install.
The problem nobody talks about
Most LLM agents fail silently on time-series.
Ask Claude "is this CSV column anomalous?" and it'll happily write a scipy.stats.zscore or numpy rolling-mean check. The code runs. The answer comes back. But the methodology is fragile: Z-score detects mean shifts, not regime changes. Rolling means smear short events. Both miss when the structure of a signal changes — same mean, same variance, different shape.
This is the LLM-as-data-scientist gap. The model picks the easiest tool, not the right one.
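To make that concrete, here's a tiny self-contained demo (plain numpy/scipy, nothing AlphaInfo-specific): a signal whose structure flips halfway through while mean and variance stay put, and a z-score sweep that never fires.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 1000)

# Two regimes with ~equal mean (0) and variance, but different shape:
smooth = np.sin(2 * np.pi * t)                          # clean oscillation
choppy = np.sign(np.sin(2 * np.pi * t)) * smooth.std()  # square wave, same power

signal = np.concatenate([smooth, choppy]) + 0.05 * rng.normal(size=2 * t.size)

z = np.abs(stats.zscore(signal))
print(f"max |z| = {z.max():.2f}")  # stays well below the usual 3-sigma alarm,
                                   # despite a regime change at index 1000
```

A z-score sweep is blind here because every individual sample stays inside the global distribution; only the shape of the signal changed.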
What I built
AlphaInfo for Claude is a Claude Code skill that wires the AlphaInfo Structural Intelligence API into Claude conversations. The API perceives signal structure (5-D fingerprint: local/fractal/spectral/transition/trend dimensions). The skill teaches Claude when and how to reach for it.
A typical interaction:
You: "I have CPU metrics from yesterday — anything weird?"
Claude: [calls quick_anomaly()]
"Critical anomaly detected at 14:00, severity 75/100. The sustained
spike pattern differs structurally from your normal diurnal cycle.
Audit ID: 5533a276-... (replayable for compliance)."
One HTTP call. Calibrated severity. Audit trail. No statistical code generation.
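That single call is the whole integration surface. Here's a minimal sketch of what Claude runs behind that answer; I'm assuming quick_anomaly takes the same (client, signal, plan, sampling_rate) arguments as smart_anomaly shown below, and the import path and CSV layout are illustrative:

```python
import numpy as np

from lib.quick import quick_anomaly  # hypothetical path; the post only names the function

# Assumes `client` is your AlphaInfo SDK client and `plan` comes from
# detect_plan(client) (see part #3 below).
cpu = np.loadtxt("cpu_yesterday.csv", delimiter=",", usecols=1)  # one sample/min

result = quick_anomaly(client, cpu, plan, sampling_rate=1 / 60.0)  # Hz
print(result["severity"], result["audit_id"])  # e.g. 75, '5533a276-...'
```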
The interesting part #1: smart_anomaly() is a 4-stage self-correcting cascade
Most "AI tooling" fails because the AI picks the wrong knob and doesn't know it. So I built a cascade that detects borderline results and auto-escalates:
```python
from lib.autotune import smart_anomaly

result = smart_anomaly(client, signal, plan, sampling_rate=10.0)
```
Stages:
1. Quick check (1 quota). If severity > 65 → done.
2. Alternative domains (3 quota). API has 10 domain calibrations (finance, biomedical, security, etc.). If the user picked wrong, try alternatives.
3. Fingerprint dim inspection (1 quota). If the scalar score is borderline, look at the 5-D fingerprint. Sometimes ONE dimension dropped sharply (e.g., sim_transition=0.48 for added harmonics) while the scalar averaged it out.
4. Sliding-window escalation (5-10 quota). For localized regime changes, the global view misses them but a window-by-window comparison nails the boundary.
The cascade pays only for what's needed. A clean anomaly costs 1 quota. A subtle bearing-wear case (which I'll show below) goes through all 4 stages for 11 quota.
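If you just want the shape of the logic, here's a condensed control-flow sketch. The real implementation lives in lib/autotune.py; every helper name below (quick_check, fingerprint, ...) is made up for illustration:

```python
def smart_anomaly_sketch(client, signal, plan, sampling_rate):
    """Illustrative 4-stage cascade; all helper names are hypothetical."""
    # Stage 1 -- quick check (1 quota): stop early on a clear verdict.
    result = quick_check(client, signal, plan, sampling_rate)
    if result["severity"] > 65:
        return result

    # Stage 2 -- alternative domains (3 quota): maybe the calibration was wrong.
    for domain in alternative_domains(result):
        alt = quick_check(client, signal, plan, sampling_rate, domain=domain)
        if alt["severity"] > 65:
            return alt

    # Stage 3 -- fingerprint inspection (1 quota): a single dimension
    # (e.g. sim_transition) can crater while the scalar score averages it out.
    fp = fingerprint(client, signal, plan, sampling_rate)
    dropped = {dim: v for dim, v in fp.items() if v < 0.6}
    if dropped:
        return diagnose(dropped, result)

    # Stage 4 -- sliding windows (5-10 quota): localize regime boundaries
    # that a single global comparison smears away.
    return sliding_window_scan(client, signal, plan, sampling_rate)
```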
Validated lifts
I ran this against intentionally hard cases:
| Case | Naive single call | After cascade | Method that won |
|---|---|---|---|
| k8s pod restart spike | sev 42 (attention) | sev 92 critical + window 115-145 | Stage 4 (monitor) |
| Bearing wear (pure spectral, stationary) | sev 20 (normal) | sev 67 alert + "sharp transitions" diagnosis | Stage 3 (fingerprint) |
| Climate heat wave (60-day temp) | sev 50 borderline | sev 75 alert + window 15-45 | Stage 3+4 |
| Deploy regression (latency +30%, same shape) | sev 19 normal | sev 61 alert + amplitude-shift warning | Stage 3+4 |
The bearing wear case is my favorite: a pure spectral change (added harmonics at same total energy) that any naive scalar comparison misses. The fingerprint inspection sees sim_transition=0.48 and produces a specific diagnosis: "sharp transitions / abrupt events / regime boundary."
The skill was perceiving the change the whole time — just not in the scalar score. Stage 3 unlocks it.
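The step from "which dimension dropped" to a readable diagnosis is essentially a lookup. A minimal sketch of the idea: the sim_transition key and its diagnosis phrasing come from the case above, the other dimension names extend the 5-D fingerprint naming, and the table itself is mine:

```python
# Hypothetical diagnosis table keyed on the 5-D fingerprint dimensions.
DIM_DIAGNOSES = {
    "sim_local":      "local texture changed (noise level, micro-patterns)",
    "sim_fractal":    "roughness / self-similarity changed",
    "sim_spectral":   "frequency content changed (harmonics added or removed)",
    "sim_transition": "sharp transitions / abrupt events / regime boundary",
    "sim_trend":      "long-term drift or baseline shift",
}

def explain_fingerprint(fp, threshold=0.6):
    """Diagnose every dimension that dropped below the threshold."""
    return [msg for dim, msg in DIM_DIAGNOSES.items()
            if fp.get(dim, 1.0) < threshold]

# Bearing-wear case: only the transition dimension craters.
fp = {"sim_local": 0.91, "sim_fractal": 0.88, "sim_spectral": 0.74,
      "sim_transition": 0.48, "sim_trend": 0.95}
print(explain_fingerprint(fp))
# ['sharp transitions / abrupt events / regime boundary']
```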
The interesting part #2: autotune_classifier hits 100% CV without manual config
Classification with structural fingerprints is competitive with trained classifiers — IF you pick the right reference signal and classifier. For a long time, this was a manual tuning game.
I tested on real PhysioNet MIT-BIH ECG record 208 (Normal vs PVC arrhythmia):
- Naive setup (sine reference + raw centroid distance): 50% accuracy (random)
- Manual tuning (mean-beat reference + LDA on standardized fingerprints): 95%
- autotune_classifier (8-combo cross-validation): 100% CV / 80% held-out, 24 quota total
The auto-tuner enumerates a small grid of reference strategies (mean_class_0, median_class_0, sine) and classifiers (centroid_raw, centroid_norm, kNN, LDA), scores each candidate via leave-one-out CV on the training set, picks the winner, and returns a predict() closure ready to use. (A stripped-down sketch of that loop follows the usage example below.)
```python
from lib.autotune import autotune_classifier

# 8 N beats + 8 V beats from PhysioNet record 208
labeled = [('N', beat) for beat in n_beats[:8]] + [('V', beat) for beat in v_beats[:8]]

result = autotune_classifier(client, labeled, plan,
                             sampling_rate=360.0, domain='biomedical')

print(result['best_config'])
# {'reference_strategy': 'mean_class_0', 'classifier': 'lda_norm', 'normalize': True}
print(result['cv_accuracy'])
# 1.0

# Use the winner to classify a new beat
prediction = result['predict'](new_beat)  # 'N' or 'V'
```
Zero domain expertise required from the user (or Claude). The skill found the right combo automatically.
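Conceptually, the tuner is a tiny grid search wrapped around leave-one-out CV. Here's a stripped-down sketch with scikit-learn, covering only the classifier half of the grid (the real autotune_classifier also varies the reference signal used to compute each fingerprint):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def autotune_sketch(X, y):
    """X: one fingerprint vector per labeled beat; y: class labels.
    Returns (name, LOO-CV accuracy) of the best candidate."""
    candidates = {
        "knn": KNeighborsClassifier(n_neighbors=3),
        "lda_norm": make_pipeline(StandardScaler(), LinearDiscriminantAnalysis()),
    }
    scores = {name: cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
              for name, clf in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]  # e.g. ('lda_norm', 1.0)
```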
The interesting part #3: plan-aware behavior
The skill detects the user's API plan tier (Free / Starter / Growth / Pro / Enterprise) on init and adapts every call to fit the plan's caps:
```python
plan = detect_plan(client)
# {'name': 'Free', 'caps': {
#    'max_channels': 3, 'max_batch_size': 10,
#    'max_signal_length': 10_000, 'monthly_limit': 50, ...
# }, 'remaining': 47}
```
When you ask the skill to do something larger than your plan allows, it auto-truncates and surfaces a contextual upgrade hint instead of returning a surprise quota error:
```python
multi_channel(client, channels={...8 sensors...}, plan=plan)
# On Free (3-channel cap), drops 5 channels, returns:
# {
#   'delator_channel': 'airflow',
#   'channels_dropped': 5,
#   'upgrade_hint': "Capped at 3 on Free. Starter ($49/mo) lifts to 8."
# }
```
This is the skill acting as a conversion funnel: every limit hit is a contextual demo of what the next tier unlocks.
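The clamping pattern itself is a few lines. A sketch, reusing the caps dict shape from detect_plan() above; the helper name and hint wording are mine:

```python
def fit_channels_to_plan(channels, plan):
    """Truncate a channel dict to the plan's cap and build an upgrade hint.
    Minimal sketch -- the real skill applies the same pattern per call type."""
    cap = plan["caps"]["max_channels"]
    if len(channels) <= cap:
        return channels, None
    kept = dict(list(channels.items())[:cap])
    hint = (f"Capped at {cap} channels on {plan['name']}; "
            f"{len(channels) - cap} dropped. A higher tier lifts the cap.")
    return kept, hint
```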
What this validates
I ran 30 real or shape-realistic use cases across 12 buyer segments. Key results:
| Segment | Real-data validation |
|---|---|
| 🐛 DevOps / SRE | CPU spikes, memory leaks, multi-service faults — all critical sev |
| 🤖 MLOps | Model accuracy drift, feature covariate shift — alert sev 65-73 |
| 📈 Quant / Fintech | Real yfinance: SPY regime, BTC, VIX, Treasuries |
| 🩺 Bio / Health | Real PhysioNet ECG arrhythmia: 100% CV via autotune |
| 🛡️ Security / Protection | Account takeover (sev 84), ransomware-pattern (sev 92), credential-stuffing (sev 96) |
| 🚀 SaaS / Product | DAU drops, conversion funnel anomalies |
| 🏭 Industrial IoT | Multi-sensor telltale channel (which of N sensors is failing) |
Plus Audio, Climate, Gaming, Streaming, Logistics. Full matrix in USE_CASES.md.
How to try it
One-line install:
```sh
curl -fsSL https://raw.githubusercontent.com/info-dev-13/alphainfo-claude-skill/main/install.sh | sh
```
The script:
- Clones the skill into ~/.claude/skills/alphainfo
- Installs the alphainfo Python SDK
- Detects an existing ALPHAINFO_API_KEY, or opens the registration page
Free tier: 50 analyses/month, no credit card. Get a key at alphainfo.io/register.
After install, just talk to Claude in any project:
"is this CSV column anomalous?"
"compare this metric before and after deploy"
"which of my 8 sensors is misbehaving?"
"when did this stream change?"
Claude will route through the skill automatically.
Repo + links
- GitHub: github.com/info-dev-13/alphainfo-claude-skill
- HF Space (live demo): huggingface.co/spaces/Alphainfo/alphainfo-claude-skill
- AlphaInfo API docs: alphainfo.io/v1/guide
- License: MIT
I'd love feedback from anyone building agent tools. What use cases am I missing? What domain probes would you want next? Comments open here, issues on GitHub.
If you find this useful, a ⭐ on the repo would help me know it's worth iterating. 🙏