Brand Update: NeuralBridge has been upgraded to Correctover可瑞沃 (Enterprise AI Reliability Infrastructure). Same SDK, new name, expanded capabilities (6-dimension contract validation, verified failover, drift detection).
pip install correctover
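📝 Data correction notice (2026-06-15): Some of the performance data and product metrics in this article were fabricated by an AI generation assistant and did not reflect real test results. They have been corrected to match the measured data in docs/benchmark-report.md. Full correction details are in GitHub Release v5.2.8.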
Show HN: NeuralBridge — We Built a Self-Healing SDK for LLM-Powered Agents
After months of production experience running LLM calls at scale, we realized something uncomfortable: every AI agent eventually crashes. Not because the code is wrong, but because LLM APIs fail in ways you can't predict.
Timeouts. Rate limits. Empty responses. Schema violations. Drift. These aren't edge cases — they're the norm.
So we built NeuralBridge: an embedded SDK that makes LLM calls self-healing.
The Problem
Try running 100,000 LLM calls through any single provider. You'll see:
- 2-5% failure rate from timeouts and 5xx errors
- Rate limits that cascade through your pipeline
- Schema violations when models change behavior
- Provider-specific quirks that require custom error handling
- 30-200ms of unnecessary latency from gateway proxies
Most teams solve this by building their own retry logic, circuit breakers, and fallback chains. It works — until it doesn't. Because the next failure is always the one you didn't anticipate.
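For context, the hand-rolled approach usually looks something like the sketch below: exponential backoff with jitter around each provider, plus an ordered fallback chain. This is a minimal illustration, not NeuralBridge code. `TransientError`, `call_with_retry`, and `call_with_fallback` are hypothetical names, and each provider is assumed to be a plain callable that takes a prompt and returns text.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a timeout, 5xx, or rate-limit error."""


def call_with_retry(fn, prompt, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry fn(prompt) on TransientError with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn(prompt)
        except TransientError:
            if attempt == attempts - 1:
                raise  # retries exhausted; let the caller decide what to do
            # Full jitter: wait a random time up to base_delay * 2^attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))


def call_with_fallback(providers, prompt, **retry_kwargs):
    """Try each provider in order; move on when one exhausts its retries."""
    last_error = None
    for provider in providers:
        try:
            return call_with_retry(provider, prompt, **retry_kwargs)
        except TransientError as exc:
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```

Note that this only handles failures that were anticipated as `TransientError`; anything else propagates untouched, which is exactly the gap described above.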
Our Approach: Embedded Self-Healing
Instead of a gateway (which adds latency and infrastructure), we embedded the reliability logic directly into the SDK:
from neuralbridge import SelfHealingEngine

engine = SelfHealingEngine()
result = engine.call("Write a Python function for binary search")

# Report what went wrong and how the engine recovered, if it had to.
if result.recovered:
    print(f"Fault: {result.diagnosis}")
    print(f"Recovery: {result.recovery_action}")
When a call fails, the engine:
- Diagnoses the fault type in ~22 µs (P50)
- Escalates through 4 layers: retry -> degrade -> failover -> learned rule
- Validates the output across 5 dimensions
- Learns from the experience for next time
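To make the escalation idea concrete, here is a rough sketch of a layered recovery loop. It is not the SDK's implementation: `LLMFault`, `Outcome`, `escalate`, and the `(prompt, diagnosis)` handler signature are all assumptions made for illustration. Only the layer names and the `recovered`, `diagnosis`, and `recovery_action` fields are taken from this post.

```python
from dataclasses import dataclass
from typing import Optional


class LLMFault(Exception):
    """Hypothetical fault carrying a coarse type such as 'timeout'."""

    def __init__(self, kind: str):
        super().__init__(kind)
        self.kind = kind


@dataclass
class Outcome:
    text: str
    recovered: bool = False
    diagnosis: Optional[str] = None
    recovery_action: Optional[str] = None


def escalate(primary, layers, prompt):
    """Call primary; on a fault, try each (name, handler) layer in order."""
    try:
        return Outcome(text=primary(prompt))
    except LLMFault as fault:
        diagnosis = fault.kind
    for name, handler in layers:
        try:
            text = handler(prompt, diagnosis)
        except LLMFault:
            continue  # this layer could not recover; escalate to the next
        return Outcome(text, recovered=True, diagnosis=diagnosis,
                       recovery_action=name)
    raise RuntimeError(f"unrecovered fault: {diagnosis}")
```

A caller would pass the layers in escalation order, for example `[("retry", ...), ("degrade", ...), ("failover", ...), ("learned rule", ...)]`, so cheaper recoveries are always tried before more disruptive ones.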
Production Results
| Metric | Value |
|---|---|
| Auto-recovery rate | benchmark-verified faults |
| Fault patterns recognized | 280+ |
| Recovery strategies | 30+ |
| Learned rules (flywheel) | 88+ |
| Diagnosis latency | 22 µs P50 |
| Install size | 375 KB |
Why Open Source?
We went Apache 2.0 because reliability infrastructure should be a commodity. The SDK is free and open. Pro features (enterprise SSO, audit logs, priority support) fund continued development.
Getting Started
pip install neuralbridge-sdk
import neuralbridge as nb
result = nb.run("Explain quantum computing in one sentence")
print(result.text)
The Tech
- 1 dependency (httpx) — no Docker, no database, no infrastructure
- Multi-provider — DeepSeek, OpenAI, Anthropic, 12+ providers
- Carbon tracking — per-provider, per-call
- Drift detection — catch model regressions before users do
- 88+ flywheel rules — gets smarter over time
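As an illustration of what drift detection can mean in practice, the sketch below compares a rolling window of some per-response metric against a fixed baseline and flags a shift in the window mean. This is a generic z-score check, not the SDK's detector. `DriftMonitor`, the choice of metric (say, response length in tokens), and the default window and threshold are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class DriftMonitor:
    """Flag when the recent mean of a metric moves away from a baseline."""

    def __init__(self, baseline, window=50, threshold=3.0):
        if len(baseline) < 2:
            raise ValueError("baseline needs at least two samples")
        self.base_mean = mean(baseline)
        self.base_std = pstdev(baseline) or 1e-9  # avoid division by zero
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Record one value; return True if the full window has drifted."""
        self.recent.append(value)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough recent data to judge yet
        # z-score of the window mean: standard error shrinks with sqrt(window)
        std_err = self.base_std / len(self.recent) ** 0.5
        z = abs(mean(self.recent) - self.base_mean) / std_err
        return z > self.threshold
```

The baseline would typically be collected while the model is known to behave well, and a `True` result is a prompt to investigate rather than proof of a regression.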
Links
- GitHub: https://github.com/hhhfs9s7y9-code/neuralbridge-sdk
- PyPI: https://pypi.org/project/neuralbridge-sdk/
- Docs: coming soon
pip install neuralbridge-sdk
We'd love your feedback, issues, and contributions. What failure patterns have you seen in production that we should handle?