DEV Community

ZNY

AI API Error Handling and Reliability: Production Best Practices

Production AI applications fail in ways traditional software doesn't. Models go down, tokens run out, responses hallucinate, and rate limits hit at the worst moments. Here's how to build reliable AI-powered systems.

The AI Reliability Problem

Traditional APIs return consistent responses or clear errors. AI APIs introduce new failure modes:

  1. Model outages — The provider's model goes down
  2. Rate limits — You've exhausted your quota mid-request
  3. Token limits — Your prompt exceeds context window
  4. Hallucinations — Model returns plausible but wrong answers
  5. Timeouts — Requests hang or take too long to complete
  6. Invalid JSON — Model returns malformed structured data
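Several snippets below catch an `AIAPIError` carrying the HTTP status and any Retry-After hint. The class name and fields are assumptions used for illustration, not a real SDK export; a minimal sketch:

```typescript
// Minimal error type for AI API failures. The name and fields are
// assumptions used throughout this article, not a real SDK export.
class AIAPIError extends Error {
  constructor(
    message: string,
    public readonly status: number,      // HTTP status: 429, 500, 503, ...
    public readonly retryAfter?: number  // suggested wait in milliseconds
  ) {
    super(message);
    this.name = 'AIAPIError';
  }
}
```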

Retry Logic with Exponential Backoff

async function withRetry<T>(
  fn: () => Promise<T>,
  options: {
    maxRetries?: number;
    baseDelay?: number;
    maxDelay?: number;
    onRetry?: (attempt: number, error: Error) => void;
  } = {}
): Promise<T> {
  const {
    maxRetries = 3,
    baseDelay = 1000,
    maxDelay = 30000,
    onRetry
  } = options;

  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const isRetryable = isRetryableError(error);
      const isLastAttempt = attempt > maxRetries;

      if (isLastAttempt || !isRetryable) {
        throw error;
      }

      const delay = Math.min(baseDelay * Math.pow(2, attempt - 1), maxDelay);
      onRetry?.(attempt, error as Error);
      await sleep(delay);
    }
  }
  throw new Error('Unreachable');
}

function isRetryableError(error: unknown): boolean {
  if (error instanceof AIAPIError) {
    // 429 = rate limit; 500/502/503 = server-side errors
    return [429, 500, 502, 503].includes(error.status);
  }
  // Network timeout
  return (error as NodeJS.ErrnoException).code === 'ETIMEDOUT';
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
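The fixed exponential schedule above can synchronize retries across many clients. Full jitter, which randomizes each delay between zero and the exponential cap, avoids that thundering herd. A sketch of the delay calculation, using the same baseDelay/maxDelay defaults:

```typescript
// Full-jitter backoff: a uniformly random delay between 0 and the
// exponential cap, so concurrent clients don't retry in lockstep.
function jitteredDelay(
  attempt: number,   // 1-based attempt counter
  baseDelay = 1000,  // ms
  maxDelay = 30000   // ms
): number {
  const cap = Math.min(baseDelay * Math.pow(2, attempt - 1), maxDelay);
  return Math.random() * cap;
}
```

Swapping this in for the `delay` calculation in `withRetry` keeps the same worst-case wait while spreading retry load over time.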

Handling Rate Limits Gracefully

async function chatWithRateLimit(
  client: ClaudeClient,
  messages: ChatMessage[],
  options: { maxWait?: number } = {}
): Promise<ChatResponse> {
  const { maxWait = 60000 } = options;
  const startTime = Date.now();

  while (true) {
    try {
      return await client.chat(messages);
    } catch (error) {
      if (!(error instanceof AIAPIError) || error.status !== 429) throw error;

      const retryAfter = error.retryAfter || 1000;
      const elapsed = Date.now() - startTime;

      if (elapsed + retryAfter > maxWait) {
        throw new Error('Rate limit exceeded, max wait time reached');
      }

      console.log(`Rate limited, waiting ${retryAfter}ms...`);
      await sleep(retryAfter);
    }
  }
}

Structured Output Validation

Models often return invalid JSON. Always validate:

import { z } from 'zod';

const CodeReviewSchema = z.object({
  bugs: z.array(z.object({
    line: z.number(),
    severity: z.enum(['low', 'medium', 'high']),
    description: z.string()
  })),
  suggestions: z.array(z.string()),
  score: z.number().min(0).max(10)
});

async function reviewCode(code: string): Promise<z.infer<typeof CodeReviewSchema>> {
  const response = await client.chat([
    { role: 'user', content: `Review this code:\n\n${code}\n\nReturn valid JSON.` }
  ]);

  try {
    const parsed = JSON.parse(response.choices[0].message.content);
    return CodeReviewSchema.parse(parsed);
  } catch {
    // Fallback: retry with stricter prompting
    return reviewCodeWithFallback(code);
  }
}
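A frequent cause of the JSON.parse failure above is the model wrapping otherwise-valid JSON in markdown fences or surrounding prose. Stripping that wrapper before parsing rescues many responses; a small helper sketch (the function name is mine):

```typescript
// Extract a JSON payload from model output that may be wrapped in
// ```json fences or surrounded by prose.
function extractJson(raw: string): string {
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  if (fenced) return fenced[1].trim();
  // Otherwise fall back to the first {...} or [...] span.
  const start = raw.search(/[{\[]/);
  return start === -1 ? raw.trim() : raw.slice(start).trim();
}
```

Calling `JSON.parse(extractJson(content))` before schema validation costs nothing on clean output and saves a retry on fenced output.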

Circuit Breaker Pattern

Prevent cascading failures when AI API is degraded:

class CircuitBreaker {
  private failures = 0;
  private lastFailure = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  constructor(
    private readonly threshold: number = 5,
    private readonly timeout: number = 60000
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure > this.timeout) {
        this.state = 'half-open';
      } else {
        throw new Error('Circuit breaker is open');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    this.failures = 0;
    this.state = 'closed';
  }

  private onFailure() {
    this.failures++;
    this.lastFailure = Date.now();
    if (this.failures >= this.threshold) {
      this.state = 'open';
    }
  }
}

// Usage
const breaker = new CircuitBreaker(5, 60000);
const result = await breaker.execute(() => client.chat(messages));

Timeout Strategy

Always bound request duration so a hung call can't block your pipeline. AbortSignal.timeout gives the request a hard overall deadline:

const response = await fetch(url, {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
  body: JSON.stringify(payload),
  signal: AbortSignal.timeout(120000) // 2 minute timeout
});
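When that deadline fires, fetch rejects with an error whose name is 'TimeoutError'. Detecting it by name lets you route timeouts into the retry path rather than treating them as fatal; a small sketch:

```typescript
// True when an error came from AbortSignal.timeout (fetch rejects with
// a DOMException named 'TimeoutError' once the deadline passes).
function isTimeoutError(error: unknown): boolean {
  return error instanceof Error && error.name === 'TimeoutError';
}
```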

Cost Control

Prevent runaway costs with token budgets:

class TokenBudget {
  private spent = 0;

  constructor(
    private readonly maxBudget: number,    // budget in dollars
    private readonly costPerToken: number  // dollars per token
  ) {}

  async executeWithBudget<T>(
    fn: () => Promise<T>,
    estimatedTokens: number
  ): Promise<T> {
    const estimatedCost = estimatedTokens * this.costPerToken;

    if (this.spent + estimatedCost > this.maxBudget) {
      throw new Error(`Budget exceeded. Spent: ${this.spent}, Max: ${this.maxBudget}`);
    }

    const result = await fn();
    this.spent += estimatedCost;
    return result;
  }

  getSpent(): number {
    return this.spent;
  }
}
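Budget enforcement needs a client-side token estimate before the call is made. Absent the provider's tokenizer, a common rough heuristic for English text is about four characters per token; an approximation only, sketched here:

```typescript
// Rough token estimate: ~4 characters per token for English-ish text.
// Use the provider's real tokenizer when accuracy matters.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```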

Building Reliable AI Applications

The key insight: AI APIs require defensive programming at a level traditional APIs don't. Layer retry logic, circuit breakers, validation, and cost controls to build systems that degrade gracefully rather than fail catastrophically.

Get started with reliable AI API access: ofox.ai


This article contains affiliate links.


Tags: api,error-handling,programming,developer,reliability
Canonical URL: https://dev.to/zny10289
