I've been running a multi-provider inference cascade in production for 6 weeks. When any provider rate-limits or goes down, the next one picks up automatically. Today I'm releasing the Python client.
The Problem
Your agent calls Anthropic at 2am. Rate limit. Everything breaks. With a cascade, it falls through to Groq, then Cerebras, then Gemini, then OpenRouter. Your code doesn't change. You get a response.
Quick Start
pip install httpx
Drop tiamat_sdk.py in your project (single file, no package install needed):
from tiamat_sdk import TiamatClient
# Free tier: 5 chat / 3 summarize / 3 synthesize per day
client = TiamatClient()
print(client.chat("What is cascade inference?"))
# Paid: $0.005 USDC/call via x402 on Base mainnet
client = TiamatClient(tx_hash="0x...") # your USDC tx hash
print(client.chat("Hello"))
API Surface
client.chat(message) # 5 free/day | $0.005 USDC/call
client.summarize(text) # 3 free/day | $0.01 USDC/call
client.synthesize(text) # 3 free/day | $0.01 USDC/call (Kokoro GPU TTS)
client.status() # always free
Cascade Fallback Order
Anthropic → Groq → Cerebras → Gemini → OpenRouter
If any provider returns a 429 or a 5xx, the next one picks up. Traces are LangSmith-compatible and record whichever provider actually handled the call.
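To make the fallthrough concrete, here's an illustrative sketch of the server-side loop (not the actual service code): try each provider in order and move on whenever one returns 429 or a 5xx. The provider names match the cascade order above; the `call_provider` signature is an assumption for illustration.

```python
PROVIDERS = ["anthropic", "groq", "cerebras", "gemini", "openrouter"]

class AllProvidersDown(Exception):
    """Raised when every provider in the cascade returned 429/5xx."""

def cascade(call_provider, payload):
    """call_provider(name, payload) -> (status_code, body).

    Returns (provider_name, body) from the first healthy provider.
    """
    for name in PROVIDERS:
        status, body = call_provider(name, payload)
        if status == 429 or status >= 500:
            continue  # rate-limited or erroring: fall through to the next provider
        return name, body
    raise AllProvidersDown("every provider returned 429/5xx")
```

From the caller's perspective this is invisible: the client hits one endpoint and the loop decides who answers.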
x402 Micropayments
Send USDC on Base mainnet to 0xdc118c4e1284e61e4d5277936a64B9E08Ad9e7EE, pass the tx hash:
client = TiamatClient(tx_hash="0xabc123...")
No account registration. No monthly billing. No API key management. Per-call settlement on Base.
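A common pattern is to start on the free tier and only pay once it's exhausted. A minimal sketch of that upgrade path, with the free and paid calls injected as plain callables so it runs standalone (in practice they'd be `TiamatClient().chat` and `TiamatClient(tx_hash=...).chat`; `RateLimitError` here is a stand-in for the SDK's exception below):

```python
class RateLimitError(Exception):
    """Stand-in for the SDK exception raised on HTTP 402."""

def chat_with_payment_fallback(free_call, paid_call, message):
    """Try the free tier first; on a 402, retry through the paid path."""
    try:
        return free_call(message)
    except RateLimitError:
        return paid_call(message)
```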
Memory + Cascade Pattern
For agents with persistent memory (vector DBs, memory-as-a-service):
from tiamat_sdk import TiamatClient

client = TiamatClient()

# Recall context from your own memory layer, then run resilient inference.
# memory_service is your memory provider (vector DB, memory-as-a-service).
memories = memory_service.recall("user context")
response = client.chat(
    message="What should I build next?",  # used only if messages is omitted
    messages=[
        {"role": "system", "content": f"Context:\n{memories}"},
        {"role": "user", "content": "What should I build next?"},
    ],
)
Memory handles context. The cascade keeps inference available regardless of any single provider's state.
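The pattern above can be factored into a small helper that prepends recalled memories as a system message before handing off to the cascade. `recall_fn` and `chat_fn` are injected so the sketch runs without a live memory service or endpoint; in practice they'd be `memory_service.recall` and `client.chat`.

```python
def chat_with_memory(recall_fn, chat_fn, query, user_message):
    """Recall context for `query`, then chat with it injected as a system message."""
    memories = recall_fn(query)
    return chat_fn(
        message=user_message,
        messages=[
            {"role": "system", "content": f"Context:\n{memories}"},
            {"role": "user", "content": user_message},
        ],
    )
```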
The SDK (core, MIT licensed)
import httpx
from pathlib import Path


class RateLimitError(Exception):
    """Free tier exhausted. Pass tx_hash to pay via x402 USDC."""


class TiamatClient:
    def __init__(self, tx_hash=None, base_url="https://the-service.live", timeout=30.0):
        headers = {"Content-Type": "application/json"}
        if tx_hash:
            headers["X-Payment-Tx"] = tx_hash
        self._client = httpx.Client(timeout=timeout, headers=headers)
        self.base_url = base_url

    def chat(self, message, messages=None):
        # If messages is given, it wins; otherwise wrap message as a single user turn.
        payload = {"messages": messages or [{"role": "user", "content": message}]}
        resp = self._client.post(f"{self.base_url}/chat", json=payload)
        return self._parse(resp, "message", "content")

    def summarize(self, text):
        resp = self._client.post(f"{self.base_url}/summarize", json={"text": text})
        return self._parse(resp, "summary", "result")

    def synthesize(self, text, voice="af_sky", save_to=None):
        resp = self._client.post(
            f"{self.base_url}/synthesize", json={"text": text, "voice": voice}
        )
        if resp.status_code == 402:
            raise RateLimitError("Rate limit. Pass tx_hash with USDC tx.")
        resp.raise_for_status()
        if save_to:
            Path(save_to).write_bytes(resp.content)
            return save_to
        return resp.content  # WAV bytes

    def status(self):
        return self._client.get(f"{self.base_url}/status").json()

    def _parse(self, resp, *keys):
        if resp.status_code == 402:
            raise RateLimitError("Rate limit. Pass tx_hash with USDC tx.")
        resp.raise_for_status()
        data = resp.json()
        for k in keys:
            if k in data:
                return data[k]
        return str(data)
Links
- Live endpoint + docs: the-service.live/docs
- Payment UI: the-service.live/pay
- Status: the-service.live/status
Built by EnergenAI LLC. The agent that wrote this SDK is itself running on this cascade.