TL;DR: I built a Claude Code skill that gives Claude structural eyes for time-series. The interesting bits: (1) it auto-tunes its own configuration via leave-one-out cross-validation (50% → 100% on real PhysioNet ECG), and (2) when the first call comes back borderline, the skill self-corrects with a 4-stage cascade (quick check → alt-domain → fingerprint dim inspection → sliding-window escalation). Validated across 30 use cases on real and shape-realistic data. Free tier, MIT licensed, one-line install.
The problem nobody talks about
Most LLM agents fail silently on time-series.
Ask Claude "is this CSV column anomalous?" and it'll happily write a scipy.stats.zscore or numpy rolling-mean check. The code runs. The answer comes back. But the methodology is fragile: Z-score detects mean shifts, not regime changes. Rolling means smear short events. Both miss when the structure of a signal changes — same mean, same variance, different shape.
This is the LLM-as-data-scientist gap. The model picks the easiest tool, not the right one.
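To make that concrete, here's a tiny self-contained demo (plain numpy/scipy, nothing AlphaInfo-specific): a signal whose structure flips halfway through while mean and variance stay put, and a z-score sweep that never fires.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 1000)

# Two regimes with ~equal mean (0) and variance, but different shape:
smooth = np.sin(2 * np.pi * t)                          # clean oscillation
choppy = np.sign(np.sin(2 * np.pi * t)) * smooth.std()  # square wave, same power

signal = np.concatenate([smooth, choppy]) + 0.05 * rng.normal(size=2 * t.size)

z = np.abs(stats.zscore(signal))
print(f"max |z| = {z.max():.2f}")  # stays well below the usual 3-sigma alarm,
                                   # despite a regime change at index 1000
```

A z-score sweep is blind here because every individual sample stays inside the global distribution; only the shape of the signal changed.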
What I built
AlphaInfo for Claude is a Claude Code skill that wires the AlphaInfo Structural Intelligence API into Claude conversations. The API perceives signal structure (5-D fingerprint: local/fractal/spectral/transition/trend dimensions). The skill teaches Claude when and how to reach for it.
A typical interaction:
You: "I have CPU metrics from yesterday — anything weird?"
Claude: [calls quick_anomaly()]
"Critical anomaly detected at 14:00, severity 75/100. The sustained
spike pattern differs structurally from your normal diurnal cycle.
Audit ID: 5533a276-... (replayable for compliance)."
One HTTP call. Calibrated severity. Audit trail. No statistical code generation.
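That single call is the whole integration surface. Here's a minimal sketch of what Claude runs behind that answer; I'm assuming quick_anomaly takes the same (client, signal, plan, sampling_rate) arguments as smart_anomaly shown below, and the import path and CSV layout are illustrative:

```python
import numpy as np

from lib.quick import quick_anomaly  # hypothetical path; the post only names the function

# Assumes `client` is your AlphaInfo SDK client and `plan` comes from
# detect_plan(client) (see part #3 below).
cpu = np.loadtxt("cpu_yesterday.csv", delimiter=",", usecols=1)  # one sample/min

result = quick_anomaly(client, cpu, plan, sampling_rate=1 / 60.0)  # Hz
print(result["severity"], result["audit_id"])  # e.g. 75, '5533a276-...'
```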
The interesting part #1: smart_anomaly() is a 4-stage self-correcting cascade
Most "AI tooling" fails because the AI picks the wrong knob and doesn't know it. So I built a cascade that detects borderline results and auto-escalates:
```python
from lib.autotune import smart_anomaly

result = smart_anomaly(client, signal, plan, sampling_rate=10.0)
```
Stages:
1. Quick check (1 quota). If severity > 65 → done.
2. Alternative domains (3 quota). API has 10 domain calibrations (finance, biomedical, security, etc.). If the user picked wrong, try alternatives.
3. Fingerprint dim inspection (1 quota). If the scalar score is borderline, look at the 5-D fingerprint. Sometimes ONE dimension dropped sharply (e.g., sim_transition=0.48 for added harmonics) while the scalar averaged it out.
4. Sliding-window escalation (5-10 quota). For localized regime changes, the global view misses them but a window-by-window comparison nails the boundary.
The cascade pays only for what's needed. A clean anomaly costs 1 quota. A subtle bearing-wear case (which I'll show below) goes through all 4 stages for 11 quota.
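If you just want the shape of the logic, here's a condensed control-flow sketch. The real implementation lives in lib/autotune.py; every helper name below (quick_check, fingerprint, ...) is made up for illustration:

```python
def smart_anomaly_sketch(client, signal, plan, sampling_rate):
    """Illustrative 4-stage cascade; all helper names are hypothetical."""
    # Stage 1 -- quick check (1 quota): stop early on a clear verdict.
    result = quick_check(client, signal, plan, sampling_rate)
    if result["severity"] > 65:
        return result

    # Stage 2 -- alternative domains (3 quota): maybe the calibration was wrong.
    for domain in alternative_domains(result):
        alt = quick_check(client, signal, plan, sampling_rate, domain=domain)
        if alt["severity"] > 65:
            return alt

    # Stage 3 -- fingerprint inspection (1 quota): a single dimension
    # (e.g. sim_transition) can crater while the scalar score averages it out.
    fp = fingerprint(client, signal, plan, sampling_rate)
    dropped = {dim: v for dim, v in fp.items() if v < 0.6}
    if dropped:
        return diagnose(dropped, result)

    # Stage 4 -- sliding windows (5-10 quota): localize regime boundaries
    # that a single global comparison smears away.
    return sliding_window_scan(client, signal, plan, sampling_rate)
```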
Validated lifts
I ran this against intentionally hard cases:
| Case | Naive single call | After cascade | Method that won |
|---|---|---|---|
| k8s pod restart spike | sev 42 (attention) | sev 92 critical + window 115-145 | Stage 4 (monitor) |
| Bearing wear (pure spectral, stationary) | sev 20 (normal) | sev 67 alert + "sharp transitions" diagnosis | Stage 3 (fingerprint) |
| Climate heat wave (60-day temp) | sev 50 borderline | sev 75 alert + window 15-45 | Stage 3+4 |
| Deploy regression (latency +30%, same shape) | sev 19 normal | sev 61 alert + amplitude-shift warning | Stage 3+4 |
The bearing wear case is my favorite: a pure spectral change (added harmonics at same total energy) that any naive scalar comparison misses. The fingerprint inspection sees sim_transition=0.48 and produces a specific diagnosis: "sharp transitions / abrupt events / regime boundary."
The skill was perceiving the change the whole time — just not in the scalar score. Stage 3 unlocks it.
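The step from "which dimension dropped" to a readable diagnosis is essentially a lookup. A minimal sketch of the idea: the sim_transition key and its diagnosis phrasing come from the case above, the other dimension names extend the 5-D fingerprint naming, and the table itself is mine:

```python
# Hypothetical diagnosis table keyed on the 5-D fingerprint dimensions.
DIM_DIAGNOSES = {
    "sim_local":      "local texture changed (noise level, micro-patterns)",
    "sim_fractal":    "roughness / self-similarity changed",
    "sim_spectral":   "frequency content changed (harmonics added or removed)",
    "sim_transition": "sharp transitions / abrupt events / regime boundary",
    "sim_trend":      "long-term drift or baseline shift",
}

def explain_fingerprint(fp, threshold=0.6):
    """Diagnose every dimension that dropped below the threshold."""
    return [msg for dim, msg in DIM_DIAGNOSES.items()
            if fp.get(dim, 1.0) < threshold]

# Bearing-wear case: only the transition dimension craters.
fp = {"sim_local": 0.91, "sim_fractal": 0.88, "sim_spectral": 0.74,
      "sim_transition": 0.48, "sim_trend": 0.95}
print(explain_fingerprint(fp))
# ['sharp transitions / abrupt events / regime boundary']
```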
The interesting part #2: autotune_classifier hits 100% CV without manual config
Classification with structural fingerprints is competitive with trained classifiers — IF you pick the right reference signal and classifier. For a long time, this was a manual tuning game.
I tested on real PhysioNet MIT-BIH ECG record 208 (Normal vs PVC arrhythmia):
- Naive setup (sine reference + raw centroid distance): 50% accuracy (random)
- Manual tuning (mean-beat reference + LDA on standardized fingerprints): 95%
- autotune_classifier (8-combo cross-validation): 100% CV / 80% held-out, 24 quota total
The auto-tuner enumerates a small grid of reference strategies (mean_class_0, median_class_0, sine) and classifiers (centroid_raw, centroid_norm, kNN, LDA), scores each candidate via leave-one-out CV on the training set, picks the winner, and returns a predict() closure ready to use. (A stripped-down sketch of that loop follows the usage example below.)
```python
from lib.autotune import autotune_classifier

# 8 N beats + 8 V beats from PhysioNet record 208
labeled = [('N', beat) for beat in n_beats[:8]] + [('V', beat) for beat in v_beats[:8]]

result = autotune_classifier(client, labeled, plan,
                             sampling_rate=360.0, domain='biomedical')

print(result['best_config'])
# {'reference_strategy': 'mean_class_0', 'classifier': 'lda_norm', 'normalize': True}
print(result['cv_accuracy'])
# 1.0

# Use the winner to classify a new beat
prediction = result['predict'](new_beat)  # 'N' or 'V'
```
Zero domain expertise required from the user (or Claude). The skill found the right combo automatically.
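Conceptually, the tuner is a tiny grid search wrapped around leave-one-out CV. Here's a stripped-down sketch with scikit-learn, covering only the classifier half of the grid (the real autotune_classifier also varies the reference signal used to compute each fingerprint):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def autotune_sketch(X, y):
    """X: one fingerprint vector per labeled beat; y: class labels.
    Returns (name, LOO-CV accuracy) of the best candidate."""
    candidates = {
        "knn": KNeighborsClassifier(n_neighbors=3),
        "lda_norm": make_pipeline(StandardScaler(), LinearDiscriminantAnalysis()),
    }
    scores = {name: cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
              for name, clf in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]  # e.g. ('lda_norm', 1.0)
```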
The interesting part #3: plan-aware behavior
The skill detects the user's API plan tier (Free / Starter / Growth / Pro / Enterprise) on init and adapts every call to fit the plan's caps:
```python
plan = detect_plan(client)
# {'name': 'Free', 'caps': {
#    'max_channels': 3, 'max_batch_size': 10,
#    'max_signal_length': 10_000, 'monthly_limit': 50, ...
# }, 'remaining': 47}
```
When you ask the skill to do something larger than your plan allows, it auto-truncates and surfaces a contextual upgrade hint instead of returning a surprise quota error:
```python
multi_channel(client, channels={...8 sensors...}, plan=plan)
# On Free (3-channel cap), drops 5 channels, returns:
# {
#   'delator_channel': 'airflow',
#   'channels_dropped': 5,
#   'upgrade_hint': "Capped at 3 on Free. Starter ($49/mo) lifts to 8."
# }
```
This is the skill acting as a conversion funnel: every limit hit is a contextual demo of what the next tier unlocks.
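The clamping pattern itself is a few lines. A sketch, reusing the caps dict shape from detect_plan() above; the helper name and hint wording are mine:

```python
def fit_channels_to_plan(channels, plan):
    """Truncate a channel dict to the plan's cap and build an upgrade hint.
    Minimal sketch -- the real skill applies the same pattern per call type."""
    cap = plan["caps"]["max_channels"]
    if len(channels) <= cap:
        return channels, None
    kept = dict(list(channels.items())[:cap])
    hint = (f"Capped at {cap} channels on {plan['name']}; "
            f"{len(channels) - cap} dropped. A higher tier lifts the cap.")
    return kept, hint
```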
What this validates
I ran 30 real or shape-realistic use cases across 12 buyer segments. Key results:
| Segment | Real-data validation |
|---|---|
| 🐛 DevOps / SRE | CPU spikes, memory leaks, multi-service faults — all critical sev |
| 🤖 MLOps | Model accuracy drift, feature covariate shift — alert sev 65-73 |
| 📈 Quant / Fintech | Real yfinance: SPY regime, BTC, VIX, Treasuries |
| 🩺 Bio / Health | Real PhysioNet ECG arrhythmia: 100% CV via autotune |
| 🛡️ Security / Protection | Account takeover (sev 84), ransomware-pattern (sev 92), credential-stuffing (sev 96) |
| 🚀 SaaS / Product | DAU drops, conversion funnel anomalies |
| 🏭 Industrial IoT | Multi-sensor telltale channel (which of N sensors is failing) |
Plus Audio, Climate, Gaming, Streaming, Logistics. Full matrix in USE_CASES.md.
How to try it
One-line install:
```sh
curl -fsSL https://raw.githubusercontent.com/info-dev-13/alphainfo-claude-skill/main/install.sh | sh
```
The script:
- Clones the skill into ~/.claude/skills/alphainfo
- Installs the alphainfo Python SDK
- Detects an existing ALPHAINFO_API_KEY, or opens the registration page
Free tier: 50 analyses/month, no credit card. Get a key at alphainfo.io/register.
After install, just talk to Claude in any project:
"is this CSV column anomalous?"
"compare this metric before and after deploy"
"which of my 8 sensors is misbehaving?"
"when did this stream change?"
Claude will route through the skill automatically.
Repo + links
- GitHub: github.com/info-dev-13/alphainfo-claude-skill
- HF Space (live demo): huggingface.co/spaces/Alphainfo/alphainfo-claude-skill
- AlphaInfo API docs: alphainfo.io/v1/guide
- License: MIT
I'd love feedback from anyone building agent tools. What use cases am I missing? What domain probes would you want next? Comments open here, issues on GitHub.
If you find this useful, a ⭐ on the repo would help me know it's worth iterating. 🙏