Building self-correcting time-series analysis for Claude Code: 50% → 100% accuracy with no manual config

TL;DR: I built a Claude Code skill that gives Claude structural eyes for time-series. The interesting bits: (1) it auto-tunes its own configuration via leave-one-out cross-validation (50% → 100% on real PhysioNet ECG), and (2) when the first call returns a borderline result, the skill self-corrects with a 4-stage cascade (quick check → alternative domains → fingerprint-dimension inspection → sliding-window escalation). Validated across 30 use cases on real and shape-realistic data. Free tier, MIT licensed, one-line install.

The problem nobody talks about

Most LLM agents fail silently on time-series.

Ask Claude "is this CSV column anomalous?" and it'll happily write a scipy.stats.zscore or numpy rolling-mean check. The code runs. The answer comes back. But the methodology is fragile: Z-score detects mean shifts, not regime changes. Rolling means smear short events. Both miss when the structure of a signal changes — same mean, same variance, different shape.
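Here's a concrete instance of that failure mode, as a minimal self-contained sketch (plain NumPy/SciPy, independent of any skill or API): the second half of this signal swaps in a harmonic rescaled to the same mean and variance, and a global z-score check waves it through.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 2000)

# First half: clean sine. Second half: added third harmonic, rescaled to
# the same mean and variance -- identical moments, different shape.
first = np.sin(2 * np.pi * t[:1000])
second = np.sin(2 * np.pi * t[1000:]) + 0.6 * np.sin(6 * np.pi * t[1000:])
second = (second - second.mean()) / second.std() * first.std() + first.mean()
signal = np.concatenate([first, second]) + 0.05 * rng.standard_normal(2000)

# The z-score check sees nothing: no sample is far from the global mean.
print(np.abs(stats.zscore(signal)).max())  # well below a 3-sigma threshold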

This is the LLM-as-data-scientist gap. The model picks the easiest tool, not the right one.

What I built

AlphaInfo for Claude is a Claude Code skill that wires the AlphaInfo Structural Intelligence API into Claude conversations. The API perceives signal structure (5-D fingerprint: local/fractal/spectral/transition/trend dimensions). The skill teaches Claude when and how to reach for it.

A typical interaction:

You: "I have CPU metrics from yesterday — anything weird?"

Claude: [calls quick_anomaly()]
         "Critical anomaly detected at 14:00, severity 75/100. The sustained
          spike pattern differs structurally from your normal diurnal cycle.
          Audit ID: 5533a276-... (replayable for compliance)."

One HTTP call. Calibrated severity. Audit trail. No statistical code generation.

The interesting part #1: smart_anomaly() is a 4-stage self-correcting cascade

Most "AI tooling" fails because the AI picks the wrong knob and doesn't know it. So I built a cascade that detects borderline results and auto-escalates:

from lib.autotune import smart_anomaly

# `plan` comes from detect_plan(client); see the plan-aware section below.
result = smart_anomaly(client, signal, plan, sampling_rate=10.0)

Stages:

  1. Quick check (1 quota). If severity > 65 → done.
  2. Alternative domains (3 quota). The API has 10 domain calibrations (finance, biomedical, security, etc.). If the user picked the wrong one, the cascade tries the alternatives.
  3. Fingerprint dim inspection (1 quota). If scalar score is borderline, look at the 5-D fingerprint. Sometimes ONE dimension dropped sharply (e.g., sim_transition=0.48 for added harmonics) while the scalar averaged it out.
  4. Sliding-window escalation (5-10 quota). For localized regime changes, the global view misses them but a window-by-window comparison nails the boundary.

The cascade pays only for what's needed. A clean anomaly costs 1 quota. A subtle bearing-wear case (which I'll show below) goes through all 4 stages for 11 quota.
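In code, the control flow of those four stages looks roughly like the sketch below. The stage functions are passed in as callables because their real implementations live inside the skill; treat this as an illustration of the escalation logic, not AlphaInfo's actual internals.

def cascade(signal, quick_check, try_domains, inspect_fingerprint,
            sliding_windows, threshold=65):
    # Stage 1 (1 quota): single quick call; a clear hit stops here.
    result = quick_check(signal)
    if result["severity"] > threshold:
        return result

    # Stage 2 (3 quota): re-run under alternative domain calibrations
    # in case the user picked the wrong one.
    best = max(try_domains(signal), key=lambda r: r["severity"])
    if best["severity"] > threshold:
        return best

    # Stage 3 (1 quota): look for a single collapsed fingerprint dimension
    # that the scalar score averaged away.
    dims = inspect_fingerprint(signal)
    weakest = min(dims, key=dims.get)
    if dims[weakest] < 0.5:
        return {"severity": best["severity"], "diagnosis": weakest, "dims": dims}

    # Stage 4 (5-10 quota): window-by-window comparison to localize
    # a regime change the global view misses.
    return sliding_windows(signal)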

Validated lifts

I ran this against intentionally hard cases:

| Case | Naive single call | After cascade | Method that won |
| --- | --- | --- | --- |
| k8s pod restart spike | sev 42 (attention) | sev 92 critical + window 115-145 | Stage 4 (monitor) |
| Bearing wear (pure spectral, stationary) | sev 20 (normal) | sev 67 alert + "sharp transitions" diagnosis | Stage 3 (fingerprint) |
| Climate heat wave (60-day temp) | sev 50 borderline | sev 75 alert + window 15-45 | Stage 3+4 |
| Deploy regression (latency +30%, same shape) | sev 19 normal | sev 61 alert + amplitude-shift warning | Stage 3+4 |

The bearing wear case is my favorite: a pure spectral change (added harmonics at same total energy) that any naive scalar comparison misses. The fingerprint inspection sees sim_transition=0.48 and produces a specific diagnosis: "sharp transitions / abrupt events / regime boundary."

The skill was perceiving the change the whole time — just not in the scalar score. Stage 3 unlocks it.
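The stage-3 trick fits in a few lines. Using the bearing-wear numbers (the sim_transition value is from the run above; the other four key names and values are illustrative stand-ins following the local/fractal/spectral/transition/trend naming):

# A borderline scalar can hide one collapsed dimension.
fingerprint = {"sim_local": 0.91, "sim_fractal": 0.88, "sim_spectral": 0.85,
               "sim_transition": 0.48, "sim_trend": 0.90}

scalar = sum(fingerprint.values()) / len(fingerprint)  # 0.80: looks almost fine
weakest = min(fingerprint, key=fingerprint.get)
if fingerprint[weakest] < 0.6:
    print(f"scalar {scalar:.2f} is borderline, but {weakest}="
          f"{fingerprint[weakest]} collapsed: sharp transitions / regime boundary")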

The interesting part #2: autotune_classifier hits 100% CV without manual config

Classification with structural fingerprints is competitive with trained classifiers — IF you pick the right reference signal and classifier. For a long time, this was a manual tuning game.

I tested on real PhysioNet MIT-BIH ECG record 208 (Normal vs PVC arrhythmia):

  • Naive setup (sine reference + raw centroid distance): 50% accuracy (random)
  • Manual tuning (mean-beat reference + LDA on standardized fingerprints): 95%
  • autotune_classifier (8-combo cross-validation): 100% CV / 80% held-out, 24 quota total

The auto-tuner enumerates {reference: mean_class_0, median_class_0, sine} × {classifier: centroid_raw, centroid_norm, kNN, LDA}, scores each combination via leave-one-out CV on the training set, picks the winner, and returns a predict() closure ready to use.

from lib.autotune import autotune_classifier

# 8 N beats + 8 V beats from PhysioNet record 208
labeled = [('N', beat) for beat in n_beats[:8]] + [('V', beat) for beat in v_beats[:8]]

result = autotune_classifier(client, labeled, plan,
                              sampling_rate=360.0, domain='biomedical')

print(result['best_config'])
# {'reference_strategy': 'mean_class_0', 'classifier': 'lda_norm', 'normalize': True}
print(result['cv_accuracy'])
# 1.0

# Use the winner to classify a new beat
prediction = result['predict'](new_beat)  # 'N' or 'V'

Zero domain expertise required from the user (or Claude). The skill found the right combo automatically.
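For intuition, here's a toy, self-contained version of the selection loop: a leave-one-out grid search over preprocessing × classifier combinations on fake 5-D fingerprints. The real autotune_classifier enumerates reference strategies and fetches fingerprints from the API, so this is a sketch of the idea rather than the skill's code.

import numpy as np
from itertools import product

def loo_grid_search(X, y, normalizers, classifiers):
    """Score every (normalizer, classifier) combo by leave-one-out CV."""
    best = None
    for norm_name, clf_name in product(normalizers, classifiers):
        Z = normalizers[norm_name](X)
        hits = sum(classifiers[clf_name](np.delete(Z, i, axis=0),
                                         np.delete(y, i), Z[i]) == y[i]
                   for i in range(len(Z)))
        acc = hits / len(Z)
        if best is None or acc > best[0]:
            best = (acc, norm_name, clf_name)
    return best  # (cv_accuracy, normalizer, classifier)

def centroid(train, labels, x):
    # Assign to the class whose mean fingerprint is nearest.
    cents = {c: train[labels == c].mean(axis=0) for c in np.unique(labels)}
    return min(cents, key=lambda c: np.linalg.norm(x - cents[c]))

def knn1(train, labels, x):
    # Assign the label of the single nearest training fingerprint.
    return labels[np.argmin(np.linalg.norm(train - x, axis=1))]

normalizers = {"raw": lambda Z: Z,
               "zscore": lambda Z: (Z - Z.mean(0)) / (Z.std(0) + 1e-9)}
classifiers = {"centroid": centroid, "1nn": knn1}

# Fake fingerprints: two classes, 8 samples each, cleanly separated.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.8, 0.05, (8, 5)), rng.normal(0.6, 0.05, (8, 5))])
y = np.array(["N"] * 8 + ["V"] * 8)
print(loo_grid_search(X, y, normalizers, classifiers))  # e.g. (1.0, 'raw', 'centroid')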

The interesting part #3: plan-aware behavior

The skill detects the user's API plan tier (Free / Starter / Growth / Pro / Enterprise) on init and adapts every call to fit the plan's caps:

plan = detect_plan(client)
# {'name': 'Free', 'caps': {
#   'max_channels': 3, 'max_batch_size': 10,
#   'max_signal_length': 10_000, 'monthly_limit': 50, ...
# }, 'remaining': 47}

When you ask the skill to do something larger than your plan allows, it auto-truncates and surfaces a contextual upgrade hint instead of returning a surprise quota error:

multi_channel(client, channels={...8 sensors...}, plan=plan)
# On Free (3-channel cap), drops 5 channels, returns:
# {
#   'delator_channel': 'airflow',
#   'channels_dropped': 5,
#   'upgrade_hint': "Capped at 3 on Free. Starter ($49/mo) lifts to 8."
# }
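The cap-handling pattern itself is simple enough to sketch. This assumes the plan dict shape shown above; enforce_channel_cap is a hypothetical helper, not the skill's real function.

def enforce_channel_cap(channels: dict, plan: dict):
    """Drop channels beyond the plan cap and build a contextual upgrade hint.
    Illustrative only; the skill's actual truncation logic may differ."""
    cap = plan["caps"]["max_channels"]
    if len(channels) <= cap:
        return channels, None
    kept = dict(list(channels.items())[:cap])
    hint = (f"Capped at {cap} on {plan['name']}. "
            f"Dropped {len(channels) - cap} channel(s); the next tier lifts the cap.")
    return kept, hint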

This is the skill acting as a conversion funnel: every limit hit is a contextual demo of what the next tier unlocks.

What this validates

I ran 30 real or shape-realistic use cases across 12 buyer segments. Key results:

| Segment | Real-data validation |
| --- | --- |
| 🐛 DevOps / SRE | CPU spikes, memory leaks, multi-service faults: all critical sev |
| 🤖 MLOps | Model accuracy drift, feature covariate shift: alert sev 65-73 |
| 📈 Quant / Fintech | Real yfinance: SPY regime, BTC, VIX, Treasuries |
| 🩺 Bio / Health | Real PhysioNet ECG arrhythmia: 100% CV via autotune |
| 🛡️ Security / Protection | Account takeover (sev 84), ransomware-pattern (sev 92), credential-stuffing (sev 96) |
| 🚀 SaaS / Product | DAU drops, conversion funnel anomalies |
| 🏭 Industrial IoT | Multi-sensor telltale-channel detection (which of N sensors is failing) |

Plus Audio, Climate, Gaming, Streaming, Logistics. Full matrix in USE_CASES.md.

How to try it

One-line install:

curl -fsSL https://raw.githubusercontent.com/info-dev-13/alphainfo-claude-skill/main/install.sh | sh

The script:

  1. Clones the skill into ~/.claude/skills/alphainfo
  2. Installs the alphainfo Python SDK
  3. Detects existing ALPHAINFO_API_KEY — or opens the registration page

Free tier: 50 analyses/month, no credit card. Get a key at alphainfo.io/register.

After install, just talk to Claude in any project:

"is this CSV column anomalous?"
"compare this metric before and after deploy"
"which of my 8 sensors is misbehaving?"
"when did this stream change?"

Claude will route through the skill automatically.

Repo + links

Skill repo: https://github.com/info-dev-13/alphainfo-claude-skill
Free API key: https://alphainfo.io/register

I'd love feedback from anyone building agent tools. What use cases am I missing? What domain probes would you want next? Comments open here, issues on GitHub.

If you find this useful, a ⭐ on the repo would help me know it's worth iterating. 🙏
