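📝 Data correction notice (2026-06-16): Some of the performance figures and product metrics in this article were fabricated by an AI writing assistant and did not reflect real test results. They have been corrected throughout against the measured data in
docs/benchmark-report.md. Full details of the corrections are in GitHub Release v5.2.8.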
Why Your AI API Keeps Breaking (And How to Fix It Before the User Notices)
You know the pattern. Your app calls GPT-4o — it works in dev. You ship. At 2 AM, OpenAI rate-limits you. Your fallback to Claude gets a 503. DeepSeek times out. Your dashboard goes red, your Slack channel fills up, and you're manually restarting pods.
Most teams solve this with a gateway: deploy LiteLLM, configure routing, hope the proxy stays up. That works — until the proxy itself becomes the problem.
There's a different approach. Instead of deploying a separate gateway process, what if resilience lived inside your application, as a library? No extra containers, no exposed ports, no heavyweight middleware added to your supply chain. Just an import that self-heals.
That's what NeuralBridge SDK does.
The Architecture: 4-Level Cascade Self-Healing
Most retry logic is flat: catch exception → sleep → retry. That works for transient glitches, but not when a provider is rate-limiting you or stays down for minutes at a time.
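For contrast, here is roughly what that flat pattern looks like. This is a minimal sketch, not code from any library; `flat_retry` and its parameters are names made up for illustration.

```python
import time

def flat_retry(call, attempts=3, delay_s=1.0, sleep=time.sleep):
    """Flat retry: same call, same delay, no idea why it failed."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception:
            # Out of attempts: surface the last error to the caller
            if attempt == attempts:
                raise
            sleep(delay_s)
```

Every failure gets the same treatment here, whether it was a dropped connection or an exhausted quota that no amount of retrying will fix.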
NeuralBridge implements a 4-level cascade that escalates recovery progressively:
L1: DIAGNOSE — What went wrong? Parse error, categorize
L2: ROUTE — Select optimal model via routing strategies
L3: DEGRADE — Transparent model fallback, circuit breaker
L4: FEEDBACK — Update reliability scores, learn patterns
L1: Diagnosis — Error Intelligence
A 429 from OpenAI means something different than a 429 from DashScope. NeuralBridge's DiagnosisEngine pattern-matches against provider-specific error messages.
Provider-aware profiles include DashScope, OpenAI, DeepSeek, Anthropic, Google, Azure, and Mistral — each with tailored timeout, retry, and RPM limits.
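To make the idea concrete, here is a minimal sketch of provider-aware error classification. It is not NeuralBridge's actual DiagnosisEngine: the `Diagnosis` type, the `RULES` table, the `diagnose` function, and the message patterns are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Diagnosis:
    category: str    # e.g. "rate_limit", "quota", "server_error", "unknown"
    retryable: bool

# Hypothetical per-provider rules: (status code, message pattern, diagnosis).
# The patterns are illustrative, not verbatim provider error strings.
RULES = {
    "openai": [
        (429, re.compile(r"quota", re.I), Diagnosis("quota", False)),
        (429, re.compile(r"rate limit", re.I), Diagnosis("rate_limit", True)),
    ],
    "dashscope": [
        (429, re.compile(r"throttl", re.I), Diagnosis("rate_limit", True)),
    ],
}

def diagnose(provider: str, status: int, message: str) -> Diagnosis:
    # Provider-specific rules win over the generic fallback below
    for code, pattern, diagnosis in RULES.get(provider, []):
        if status == code and pattern.search(message):
            return diagnosis
    # Generic fallback: 429 and 5xx are worth retrying, the rest is not
    if status == 429:
        return Diagnosis("rate_limit", True)
    if 500 <= status < 600:
        return Diagnosis("server_error", True)
    return Diagnosis("unknown", False)
```

The point of the sketch is the split: the same 429 can come back as "retry shortly" or "stop retrying, switch models" depending on who sent it and what the message says.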
L2: Routing — Intelligent Model Selection
When you have multiple models available, NeuralBridge offers five routing strategies: Weighted Response Time (default), Random, RoundRobin, Least Connections, and Fallback.
The health score combines success rate and latency score. Models below threshold are automatically excluded.
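The article doesn't give the exact formula, so the following is only a sketch of how such a score could work. The 0.7/0.3 weighting, the 10-second latency budget, the 0.5 threshold, and every name in the block are assumptions, not NeuralBridge's real values.

```python
import random
from dataclasses import dataclass

@dataclass
class ModelStats:
    name: str
    success_rate: float    # fraction of successful calls, 0.0 to 1.0
    avg_latency_s: float   # average response time in seconds

def latency_score(avg_latency_s: float, budget_s: float = 10.0) -> float:
    """Map latency to 0..1: 1.0 is instant, 0.0 is at or beyond the budget."""
    return max(0.0, 1.0 - avg_latency_s / budget_s)

def health_score(m: ModelStats, w_success: float = 0.7) -> float:
    """Weighted blend of success rate and latency score (weights assumed)."""
    return w_success * m.success_rate + (1.0 - w_success) * latency_score(m.avg_latency_s)

def pick_model(models, threshold=0.5, rng=random):
    # Exclude anything below the health threshold before choosing
    healthy = [m for m in models if health_score(m) >= threshold]
    if not healthy:
        raise RuntimeError("no healthy model available")
    # Healthier models get proportionally more traffic
    weights = [health_score(m) for m in healthy]
    return rng.choices(healthy, weights=weights, k=1)[0]
```

Weighted random selection, rather than always picking the top scorer, keeps some traffic flowing to second-best models so their scores stay fresh.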
L3: Degradation — Transparent Fallback + Circuit Breaker
When diagnosis + routing can't save you, L3 ensures your users still get a response with transparent model fallback and a circuit breaker that prevents thundering-herd retries.
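Here is a minimal sketch of the two pieces working together: a circuit breaker per model, and a fallback loop that skips models whose breaker is open. The class, function, and parameter names are hypothetical and the thresholds are assumed; this is not NeuralBridge's implementation.

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: let a trial request through once the cool-down has passed
        return self.clock() - self.opened_at >= self.reset_after_s

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()

def call_with_fallback(models, breakers, send):
    """Try models in order, skipping any whose breaker is open."""
    last_error = None
    for model in models:
        breaker = breakers[model]
        if not breaker.allow():
            continue
        try:
            result = send(model)
        except Exception as exc:
            breaker.record(False)
            last_error = exc
            continue
        breaker.record(True)
        return model, result
    raise RuntimeError("all models failed or are circuit-open") from last_error
```

Once a model's breaker opens, requests stop hitting it until the cool-down expires, which is what keeps a struggling provider from being hammered by retries.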
L4: Feedback — Learning from Every Request
The Flywheel Learner detects degradation patterns and the system adapts routing based on historical reliability data.
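One simple way to keep "historical reliability data" is an exponential moving average per model. The sketch below assumes that approach; the article doesn't say how the Flywheel Learner actually works, and `ReliabilityTracker`, `alpha`, and the 0.6 threshold are made up for illustration.

```python
from collections import defaultdict

class ReliabilityTracker:
    def __init__(self, alpha=0.2, initial=1.0):
        self.alpha = alpha   # how strongly recent outcomes outweigh history
        self.scores = defaultdict(lambda: initial)

    def record(self, model: str, ok: bool) -> float:
        """Fold one request outcome into the model's running score."""
        outcome = 1.0 if ok else 0.0
        old = self.scores[model]
        self.scores[model] = (1 - self.alpha) * old + self.alpha * outcome
        return self.scores[model]

    def degraded(self, threshold=0.6):
        """Models whose score has dropped below the threshold."""
        return sorted(m for m, s in self.scores.items() if s < threshold)
```

A router could consult `degraded()` before each request and deprioritize those models until their scores recover.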
The Size Comparison
| | NeuralBridge SDK | Gateway (LiteLLM) |
|---|---|---|
| Install size | ~375 KB | ~16.5 MB |
| Dependencies | 1 (httpx) | 40+ |
| Deployment | `import neuralbridge` | Docker + database + Redis |
| Exposed surface | None (in-process) | HTTP server, DB, admin UI |
Quick Start
```shell
pip install neuralbridge-sdk
```

```python
from neuralbridge import NeuralBridge

client = NeuralBridge(api_key="sk-xxx")
response = client.chat().create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```
Links
- PyPI: https://pypi.org/project/neuralbridge-sdk/
- GitHub: https://github.com/neuralbridge-sdk/neuralbridge-sdk
- Install: `pip install neuralbridge-sdk`
The point isn't that gateways are bad. The point is that resilience shouldn't require deploying one. Your API client should be smart enough to handle its own failures — without introducing a new failure mode in the process.