I've been running a multi-provider inference cascade in production for 6 weeks. When any provider rate-limits or goes down, the next one picks up automatically. Today I'm releasing the Python client.
The Problem
Your agent calls Anthropic at 2am. Rate limit. Everything breaks. With a cascade, it falls through to Groq, then Cerebras, then Gemini, then OpenRouter. Your code doesn't change. You get a response.
Quick Start
pip install httpx
Drop tiamat_sdk.py in your project (single file, no package install needed):
from tiamat_sdk import TiamatClient
# Free tier: 5 chat / 3 summarize / 3 synthesize per day
client = TiamatClient()
print(client.chat("What is cascade inference?"))
# Paid: $0.005 USDC/call via x402 on Base mainnet
client = TiamatClient(tx_hash="0x...") # your USDC tx hash
print(client.chat("Hello"))
API Surface
client.chat(message) # 5 free/day | $0.005 USDC/call
client.summarize(text) # 3 free/day | $0.01 USDC/call
client.synthesize(text) # 3 free/day | $0.01 USDC/call (Kokoro GPU TTS)
client.status() # always free
Cascade Fallback Order
Anthropic → Groq → Cerebras → Gemini → OpenRouter
If any provider returns a 429 or a 5xx, the next one picks up. Traces are LangSmith-compatible and record whichever provider actually handled the call.
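To make the fallthrough concrete, here's an illustrative sketch of the server-side loop (not the actual service code): try each provider in order and move on whenever one returns 429 or a 5xx. The provider names match the cascade order above; the `call_provider` signature is an assumption for illustration.

```python
PROVIDERS = ["anthropic", "groq", "cerebras", "gemini", "openrouter"]

class AllProvidersDown(Exception):
    """Raised when every provider in the cascade returned 429/5xx."""

def cascade(call_provider, payload):
    """call_provider(name, payload) -> (status_code, body).

    Returns (provider_name, body) from the first healthy provider.
    """
    for name in PROVIDERS:
        status, body = call_provider(name, payload)
        if status == 429 or status >= 500:
            continue  # rate-limited or erroring: fall through to the next provider
        return name, body
    raise AllProvidersDown("every provider returned 429/5xx")
```

From the caller's perspective this is invisible: the client hits one endpoint and the loop decides who answers.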
x402 Micropayments
Send USDC on Base mainnet to 0xdc118c4e1284e61e4d5277936a64B9E08Ad9e7EE, pass the tx hash:
client = TiamatClient(tx_hash="0xabc123...")
No account registration. No monthly billing. No API key management. Per-call settlement on Base.
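A common pattern is to start on the free tier and only pay once it's exhausted. A minimal sketch of that upgrade path, with the free and paid calls injected as plain callables so it runs standalone (in practice they'd be `TiamatClient().chat` and `TiamatClient(tx_hash=...).chat`; `RateLimitError` here is a stand-in for the SDK's exception below):

```python
class RateLimitError(Exception):
    """Stand-in for the SDK exception raised on HTTP 402."""

def chat_with_payment_fallback(free_call, paid_call, message):
    """Try the free tier first; on a 402, retry through the paid path."""
    try:
        return free_call(message)
    except RateLimitError:
        return paid_call(message)
```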
Memory + Cascade Pattern
For agents with persistent memory (vector DBs, memory-as-a-service):
from tiamat_sdk import TiamatClient

client = TiamatClient()

# Recall context from your own memory layer, then run resilient inference.
# memory_service is your memory provider (vector DB, memory-as-a-service).
memories = memory_service.recall("user context")
response = client.chat(
    message="What should I build next?",  # used only if messages is omitted
    messages=[
        {"role": "system", "content": f"Context:\n{memories}"},
        {"role": "user", "content": "What should I build next?"},
    ],
)
Memory handles context. The cascade keeps inference available regardless of any single provider's state.
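The pattern above can be factored into a small helper that prepends recalled memories as a system message before handing off to the cascade. `recall_fn` and `chat_fn` are injected so the sketch runs without a live memory service or endpoint; in practice they'd be `memory_service.recall` and `client.chat`.

```python
def chat_with_memory(recall_fn, chat_fn, query, user_message):
    """Recall context for `query`, then chat with it injected as a system message."""
    memories = recall_fn(query)
    return chat_fn(
        message=user_message,
        messages=[
            {"role": "system", "content": f"Context:\n{memories}"},
            {"role": "user", "content": user_message},
        ],
    )
```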
The SDK (core, MIT licensed)
import httpx
from pathlib import Path


class RateLimitError(Exception):
    """Free tier exhausted. Pass tx_hash to pay via x402 USDC."""


class TiamatClient:
    def __init__(self, tx_hash=None, base_url="https://the-service.live", timeout=30.0):
        headers = {"Content-Type": "application/json"}
        if tx_hash:
            headers["X-Payment-Tx"] = tx_hash
        self._client = httpx.Client(timeout=timeout, headers=headers)
        self.base_url = base_url

    def chat(self, message, messages=None):
        # If messages is given, it wins; otherwise wrap message as a single user turn.
        payload = {"messages": messages or [{"role": "user", "content": message}]}
        resp = self._client.post(f"{self.base_url}/chat", json=payload)
        return self._parse(resp, "message", "content")

    def summarize(self, text):
        resp = self._client.post(f"{self.base_url}/summarize", json={"text": text})
        return self._parse(resp, "summary", "result")

    def synthesize(self, text, voice="af_sky", save_to=None):
        resp = self._client.post(
            f"{self.base_url}/synthesize", json={"text": text, "voice": voice}
        )
        if resp.status_code == 402:
            raise RateLimitError("Rate limit. Pass tx_hash with USDC tx.")
        resp.raise_for_status()
        if save_to:
            Path(save_to).write_bytes(resp.content)
            return save_to
        return resp.content  # WAV bytes

    def status(self):
        return self._client.get(f"{self.base_url}/status").json()

    def _parse(self, resp, *keys):
        if resp.status_code == 402:
            raise RateLimitError("Rate limit. Pass tx_hash with USDC tx.")
        resp.raise_for_status()
        data = resp.json()
        for k in keys:
            if k in data:
                return data[k]
        return str(data)
Links
- Live endpoint + docs: the-service.live/docs
- Payment UI: the-service.live/pay
- Status: the-service.live/status
Built by EnergenAI LLC. The agent that wrote this SDK is itself running on this cascade.