My AI API Kept Failing — Until I Built This Simple Client


Last month I was building a content summarizer for a side project. I needed to feed chunks of text to an AI API and get back summaries. Sounded simple, right? Instead, I hit rate limits, 503 errors, and timeouts within the first 50 requests. The official SDK from my chosen provider handled some of this, but it was a black box: I couldn't control the retry logic or see detailed error messages. And when I switched providers (because of pricing changes), I had to rewrite half my code.

I needed a lightweight, reliable way to call any AI API without vendor lock-in. Here's what I learned after a week of trial-and-error.

The Mess I Started With

My first attempt was clean on paper: use requests with a simple try-except. But I quickly discovered that production AI APIs are not as reliable as a local database. They throttle with 429 responses, they can take 30 seconds to answer a single request, and sometimes they fail mid-response. My code became a tangled mess of retry decorators, token counters, and hardcoded URLs.

# This is what I wanted to avoid:
def call_openai(prompt):
    headers = {"Authorization": f"Bearer {OPENAI_KEY}"}
    data = {...}
    resp = requests.post(...)
    if resp.status_code == 429:
        time.sleep(5)
        resp = requests.post(...)
    ...

Every provider had its own auth scheme, endpoint, and error format. I spent more time adapting to each provider's SDK than on my actual app logic.

What Actually Worked

I stripped everything down to a single generic client. It takes an endpoint, headers, and a payload from a per-provider configuration, and it retries with exponential backoff plus jitter. No SDKs, just requests and plain JSON. The key insight: treat the AI API as a black-box HTTP endpoint with a predictable interface (usually a JSON request body with a prompt field and a JSON response with a choices list).

Here's the core I ended up with:

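import random
import time
from typing import Any, Dict, Optional

import requests


def ai_request(
    endpoint: str,
    headers: Dict[str, str],
    payload: Dict[str, Any],
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> Optional[Dict[str, Any]]:
    """Generic AI API caller with exponential backoff and jitter.

    Returns the parsed JSON body, or None once every attempt has failed.
    Raises RuntimeError on non-retryable HTTP errors.
    """
    for attempt in range(max_retries):
        # Default wait: exponential backoff plus jitter.
        wait = (2 ** attempt) * base_delay + random.uniform(0, 1)
        try:
            resp = requests.post(
                endpoint,
                headers=headers,
                json=payload,
                timeout=45,
            )
        except (requests.exceptions.Timeout, requests.exceptions.ConnectionError):
            pass  # Transient network failure: retry with the default wait.
        else:
            if resp.status_code == 200:
                return resp.json()
            if resp.status_code not in (429, 503):
                # Anything else is not worth retrying, so fail fast.
                raise RuntimeError(f"API error {resp.status_code}: {resp.text}")
            # Retry-After can be seconds or an HTTP date; honor only the numeric form.
            retry_after = resp.headers.get("Retry-After", "")
            if retry_after.isdigit():
                wait = int(retry_after) + random.uniform(0, 0.5)
        if attempt < max_retries - 1:  # Don't sleep after the final attempt.
            print(f"Retry {attempt + 1} after {wait:.2f}s...")
            time.sleep(wait)
    return None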

# Example usage with a provider like Interwest AI
# (https://ai.interwestinfo.com/)
provider_config = {
    "endpoint": "https://api.interwestinfo.com/v1/generate",
    "headers": {
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    "payload_template": {
        "prompt": "",
        "max_tokens": 500
    }
}

payload = provider_config["payload_template"].copy()
payload["prompt"] = "Summarize this: ..."
result = ai_request(
    endpoint=provider_config["endpoint"],
    headers=provider_config["headers"],
    payload=payload
)
if result:
    print(result["choices"][0]["text"])
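
The usage example above keeps the key as a literal placeholder. As a minimal sketch of the alternative, the same config dict could be built from environment variables. The function name load_provider_config and the variable names AI_API_ENDPOINT, AI_API_KEY, and AI_MAX_TOKENS are hypothetical, chosen only for illustration.

```python
import os
from typing import Any, Dict, Mapping


def load_provider_config(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build a provider config dict from environment variables.

    AI_API_ENDPOINT, AI_API_KEY and AI_MAX_TOKENS are hypothetical names.
    """
    required = ("AI_API_ENDPOINT", "AI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail loudly at startup instead of sending an empty key later.
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "endpoint": env["AI_API_ENDPOINT"],
        "headers": {
            "Authorization": f"Bearer {env['AI_API_KEY']}",
            "Content-Type": "application/json",
        },
        "payload_template": {
            "prompt": "",
            "max_tokens": int(env.get("AI_MAX_TOKENS", "500")),
        },
    }
```

The env parameter defaults to os.environ, but accepting any mapping makes the loader easy to exercise with a plain dict in tests.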

Where This Falls Short

This approach works well for simple text generation, but it has limits:

Streaming: My client blocks until the full response arrives. For real-time streaming (SSE), you'd need requests with stream=True plus an SSE line parser, or an async client such as aiohttp.
Token management: I don't track token usage or enforce limits. You can add a token counter, but that's extra logic per provider.
Non-standard APIs: Some services (like Anthropic or Cohere) use different response schemas. You'd need to adjust the parsing per provider.
Security: Sending the API key in the Authorization header is fine, but don't hardcode it. In production, load it from environment variables or a secrets manager.
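
For the non-standard APIs point, one option is a small lookup table of per-schema extractors, so the calling code never touches provider-specific keys. This is only a sketch: the schema names and the three response shapes below are assumptions modeled on commonly seen layouts, and each should be checked against the provider's own documentation.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical schema names mapped to functions that pull the generated
# text out of a parsed JSON response. The shapes are assumptions.
EXTRACTORS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "completion": lambda r: r["choices"][0]["text"],
    "chat": lambda r: r["choices"][0]["message"]["content"],
    "content_blocks": lambda r: r["content"][0]["text"],
}


def extract_text(response: Optional[Dict[str, Any]], schema: str) -> Optional[str]:
    """Return the generated text, or None if the response is missing or malformed."""
    extractor = EXTRACTORS.get(schema)
    if extractor is None:
        raise ValueError(f"Unknown schema: {schema}")
    if response is None:
        return None
    try:
        return extractor(response)
    except (KeyError, IndexError, TypeError):
        # The response did not have the shape this schema expects.
        return None
```

With something like this, the schema name can live in the provider config next to the endpoint, and switching providers becomes a config change rather than a code change.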

What I'd Do Differently Next Time

Start with environment-based configuration. Don't hardcode endpoints.
Add structured logging from day one. I had to guess why retries happened.
Separate retry logic from the request function. I'd make a reusable @retry decorator.
Mock the API during development. I lost hours waiting for real network calls.
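
As a rough sketch of the second and third items, here is what a reusable retry decorator with logged retry details might look like. The decorator name, its parameters, and the "ai_client" logger name are all hypothetical. It defaults to Python's built-in ConnectionError and TimeoutError so the block runs without dependencies. With requests you would pass retry_on=(requests.exceptions.Timeout, requests.exceptions.ConnectionError) instead, since those do not inherit from the built-ins.

```python
import functools
import logging
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger("ai_client")  # hypothetical logger name


def retry(
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Retry the wrapped function with exponential backoff and jitter."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except retry_on as exc:
                    if attempt == max_retries - 1:
                        raise  # Out of attempts: surface the last error.
                    wait = (2 ** attempt) * base_delay + random.uniform(0, 1)
                    # Extra fields end up on the log record, so a JSON
                    # formatter can emit them as structured data.
                    logger.warning(
                        "retrying call",
                        extra={
                            "call": func.__name__,
                            "attempt": attempt + 1,
                            "wait_s": round(wait, 2),
                            "error": repr(exc),
                        },
                    )
                    sleep(wait)

        return wrapper

    return decorator
```

Injecting sleep also helps with the last item on the list: tests can pass a no-op and run instantly instead of waiting on real delays.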

The Takeaway

You don't need a heavy SDK to talk to modern AI APIs. In my experience, a small, configurable HTTP client with exponential backoff covers the large majority of simple text-generation use cases. It's easier to debug, portable across providers, and forces you to understand the underlying protocol.

Now I'm curious—how do you handle AI API reliability in your projects? Do you roll your own client or stick with SDKs?