AI generation APIs are moving faster than any other part of the stack right now. A model that doesn't exist in January is the production standard by June. A model that leads every benchmark in Q1 is deprecated by Q3.
If you're building an application that depends on AI video, image, or art generation - and you've integrated directly against a single model API - you've built a dependency that will require architectural intervention within months, not years.
I know because I built that architecture first. Then rebuilt it. Then rebuilt it again. Eventually I stopped rebuilding and built the abstraction layer that makes rebuilding unnecessary.
This is what that abstraction layer looks like, why each decision was made, and the specific patterns that prevent AI generation API churn from becoming a permanent engineering liability.
The Problem With Direct Model Integration
Direct integration against a single AI generation API looks like this:
// Direct integration - brittle
const response = await fetch('https://api.specific-model-provider.com/v1/generate', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.PROVIDER_API_KEY}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
prompt: userPrompt,
// Provider-specific parameters
num_inference_steps: 50,
guidance_scale: 7.5,
width: 1024,
height: 1024
})
})
This works until:
- The provider changes their parameter schema (common at every major model version)
- The provider changes their authentication approach
- A better model becomes available from a different provider
- The provider goes down and you need failover
- Your application needs to support multiple model tiers for different use cases
Each of these events requires touching every callsite in your codebase that directly calls the provider API. In a production application generating content at scale, this is multiple files, multiple test suites, and coordinated deployment. For a fast-moving space where these events happen every few months, the maintenance cost compounds into a significant engineering liability.
The Abstraction Layer That Solves This
The solution is a generation client abstraction that normalizes provider-specific APIs behind a stable interface your application code calls. Provider changes, model additions, and failover logic live in the abstraction layer. Application code never changes when models change.
// Generation client interface - stable
interface GenerationClient {
generateImage(params: ImageGenerationParams): Promise<GenerationResult>
generateVideo(params: VideoGenerationParams): Promise<GenerationResult>
getModelCapabilities(modelId: string): ModelCapabilities
}
// Normalized parameter types - stable
interface ImageGenerationParams {
prompt: string
negativePrompt?: string
modelId: string
width: number
height: number
aspectRatio?: AspectRatio
seed?: number
style?: string
quality?: 'draft' | 'standard' | 'high'
}
interface VideoGenerationParams {
prompt: string
negativePrompt?: string
modelId: string
duration: number
aspectRatio: AspectRatio
referenceImage?: string // base64 or URL for image-to-video
seed?: number
quality?: 'fast' | 'standard' | 'cinematic'
}
interface GenerationResult {
id: string
status: 'pending' | 'processing' | 'complete' | 'failed'
outputUrl?: string
thumbnailUrl?: string
creditsConsumed: number
modelId: string
params: ImageGenerationParams | VideoGenerationParams
error?: string
}
The interface your application calls never changes. The provider adapters behind it absorb all schema differences.
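To make the payoff concrete, here is a sketch of what application code looks like when it depends only on the interface. The `renderHeroImage` function and the stub client are hypothetical, and the types are condensed versions of those above:

```typescript
// Condensed versions of the normalized types defined above
interface GenerationResult {
  id: string
  status: 'pending' | 'processing' | 'complete' | 'failed'
  outputUrl?: string
}

interface ImageGenerationParams {
  prompt: string
  modelId: string
  width: number
  height: number
}

interface GenerationClient {
  generateImage(params: ImageGenerationParams): Promise<GenerationResult>
}

// Application code depends only on the interface, never on a provider.
// Swapping one adapter for another requires no changes here.
async function renderHeroImage(client: GenerationClient, prompt: string): Promise<string> {
  const result = await client.generateImage({ prompt, modelId: 'any-model', width: 1024, height: 1024 })
  if (result.status === 'failed' || !result.outputUrl) throw new Error('generation failed')
  return result.outputUrl
}

// Any implementation satisfies the contract - including a test stub
const stubClient: GenerationClient = {
  async generateImage(params) {
    return { id: 'gen-1', status: 'complete', outputUrl: `https://cdn.example.com/${params.modelId}.png` }
  }
}
```

The stub also hints at a secondary benefit: application code becomes testable without network access to any provider.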
Provider Adapter Pattern
Each model provider gets its own adapter that implements the generation client interface and handles provider-specific translation:
// Provider adapter - absorbs provider-specific schema
class FluxProAdapter implements GenerationClient {
private apiKey: string
private baseUrl = 'https://api.flux-provider.com/v1'
constructor(apiKey: string) {
this.apiKey = apiKey
}
async generateImage(params: ImageGenerationParams): Promise<GenerationResult> {
// Translate normalized params to provider-specific schema
const providerParams = {
prompt: params.prompt,
negative_prompt: params.negativePrompt,
// Provider uses different parameter names
inference_steps: this.qualityToSteps(params.quality),
cfg_scale: 7.0,
width: params.width,
height: params.height,
seed: params.seed ?? Math.floor(Math.random() * 1000000)
}
const response = await fetch(`${this.baseUrl}/images/generate`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${this.apiKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify(providerParams)
})
if (!response.ok) {
  throw new Error(`Provider request failed: ${response.status}`)
}
const data = await response.json()
// Translate provider response to normalized result
return {
id: data.id,
status: this.mapStatus(data.status),
outputUrl: data.output_url,
creditsConsumed: data.credits_used,
modelId: params.modelId,
params
}
}
private qualityToSteps(quality?: string): number {
  const stepMap: Record<string, number> = { draft: 20, standard: 35, high: 50 }
  return stepMap[quality ?? 'standard'] ?? 35
}
private mapStatus(providerStatus: string): GenerationResult['status'] {
  const statusMap: Record<string, GenerationResult['status']> = {
    queued: 'pending',
    running: 'processing',
    succeeded: 'complete',
    failed: 'failed'
  }
  return statusMap[providerStatus] ?? 'pending'
}
async generateVideo(params: VideoGenerationParams): Promise<GenerationResult> {
throw new Error('FluxPro does not support video generation')
}
getModelCapabilities(modelId: string): ModelCapabilities {
return {
supportsImage: true,
supportsVideo: false,
maxResolution: { width: 2048, height: 2048 },
supportedAspectRatios: ['1:1', '16:9', '9:16', '4:3', '3:4'],
supportsNegativePrompt: true,
supportsSeed: true,
supportsImageToVideo: false
}
}
}
When a provider changes their parameter schema, you update one adapter file. Zero application code changes.
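The translation helpers are also the natural unit-test surface for an adapter. A sketch, mirroring the `qualityToSteps` and `mapStatus` logic above as standalone functions so they can be tested without network calls:

```typescript
type Status = 'pending' | 'processing' | 'complete' | 'failed'

// Maps the normalized quality tier to provider-specific step counts
function qualityToSteps(quality?: 'draft' | 'standard' | 'high'): number {
  const stepMap: Record<string, number> = { draft: 20, standard: 35, high: 50 }
  return stepMap[quality ?? 'standard'] ?? 35
}

// Maps provider status strings to the normalized status union,
// defaulting unknown values to 'pending'
function mapStatus(providerStatus: string): Status {
  const statusMap: Record<string, Status> = {
    queued: 'pending',
    running: 'processing',
    succeeded: 'complete',
    failed: 'failed'
  }
  return statusMap[providerStatus] ?? 'pending'
}
```

When a provider renames a status or changes its step semantics, these are the only functions that need new test cases.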
The Router: Model Selection as a First-Class Concern
The abstraction layer enables something more powerful than provider isolation: model routing as explicit application logic.
Different content types have different optimal models. Social format video and cinematic brand video have different quality-cost tradeoffs. Product photography and editorial creative have different model optimization targets. Without a router, model selection happens implicitly in application code - scattered across components, hardcoded against specific model IDs, difficult to change without touching multiple files.
A router makes model selection explicit and configurable:
interface RoutingRule {
contentType: ContentType
quality: 'draft' | 'standard' | 'high'
modelId: string
fallbackModelId?: string
}
type ContentType =
| 'social-video'
| 'cinematic-video'
| 'product-photography'
| 'editorial-image'
| 'art-generation'
| 'concept-draft'
class GenerationRouter {
private rules: RoutingRule[]
private clients: Map<string, GenerationClient>
constructor(rules: RoutingRule[], clients: Map<string, GenerationClient>) {
this.rules = rules
this.clients = clients
}
route(contentType: ContentType, quality: RoutingRule['quality']): string {
const rule = this.rules.find(
r => r.contentType === contentType && r.quality === quality
)
if (!rule) throw new Error(`No routing rule for ${contentType}/${quality}`)
return rule.modelId
}
async generate(
contentType: ContentType,
quality: RoutingRule['quality'],
params: Omit<ImageGenerationParams | VideoGenerationParams, 'modelId'>
): Promise<GenerationResult> {
const modelId = this.route(contentType, quality)
const client = this.clients.get(this.getProviderForModel(modelId))
if (!client) throw new Error(`No client configured for model ${modelId}`)
const fullParams = { ...params, modelId }
try {
if (this.isVideoContent(contentType)) {
return await client.generateVideo(fullParams as VideoGenerationParams)
}
return await client.generateImage(fullParams as ImageGenerationParams)
} catch (error) {
// Fallback to secondary model on failure
const rule = this.rules.find(
r => r.contentType === contentType && r.quality === quality
)
if (rule?.fallbackModelId) {
console.warn(`Primary model ${modelId} failed, falling back to ${rule.fallbackModelId}`)
const fallbackParams = { ...fullParams, modelId: rule.fallbackModelId }
const fallbackClient = this.clients.get(this.getProviderForModel(rule.fallbackModelId))
if (fallbackClient) {
return this.isVideoContent(contentType)
? await fallbackClient.generateVideo(fallbackParams as VideoGenerationParams)
: await fallbackClient.generateImage(fallbackParams as ImageGenerationParams)
}
}
throw error
}
}
private isVideoContent(contentType: ContentType): boolean {
return contentType === 'social-video' || contentType === 'cinematic-video'
}
private getProviderForModel(modelId: string): string {
// Model-to-provider mapping - update when adding new models
const modelProviderMap: Record<string, string> = {
'kling-3-0': 'kling',
'sora-2': 'openai',
'veo-3-1-fast': 'google',
'veo-3-1-quality': 'google',
'flux-2-pro': 'flux',
'google-imagen-4': 'google',
'midjourney': 'midjourney',
'ideogram-v3': 'ideogram',
'nano-banana-2': 'google',
'wan-2-6': 'wan',
'hailuo-02': 'hailuo',
'seedance-2-0': 'seedance'
}
return modelProviderMap[modelId] ?? 'unknown'
}
}
// Configuration - update when models change, zero application code changes
const routingConfig: RoutingRule[] = [
{ contentType: 'cinematic-video', quality: 'high', modelId: 'kling-3-0', fallbackModelId: 'sora-2' },
{ contentType: 'cinematic-video', quality: 'standard', modelId: 'veo-3-1-quality', fallbackModelId: 'kling-3-0' },
{ contentType: 'social-video', quality: 'standard', modelId: 'veo-3-1-fast', fallbackModelId: 'hailuo-02' },
{ contentType: 'social-video', quality: 'draft', modelId: 'wan-2-6' },
{ contentType: 'product-photography', quality: 'high', modelId: 'flux-2-pro', fallbackModelId: 'google-imagen-4' },
{ contentType: 'editorial-image', quality: 'high', modelId: 'midjourney' },
{ contentType: 'art-generation', quality: 'standard', modelId: 'nano-banana-2' },
{ contentType: 'concept-draft', quality: 'draft', modelId: 'wan-2-6' }
]
Application code now calls router.generate('social-video', 'standard', params). When the optimal model for social video changes, you update the routing config. Zero application code changes.
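Condensed to its essence, the routing decision is pure configuration-driven lookup, which is what makes it cheap to test and cheap to change. A minimal sketch (rules abbreviated from the config above):

```typescript
interface RoutingRule {
  contentType: string
  quality: 'draft' | 'standard' | 'high'
  modelId: string
  fallbackModelId?: string
}

const rules: RoutingRule[] = [
  { contentType: 'cinematic-video', quality: 'high', modelId: 'kling-3-0', fallbackModelId: 'sora-2' },
  { contentType: 'social-video', quality: 'standard', modelId: 'veo-3-1-fast', fallbackModelId: 'hailuo-02' }
]

// Pure lookup: no provider knowledge, no side effects
function route(contentType: string, quality: RoutingRule['quality']): string {
  const rule = rules.find(r => r.contentType === contentType && r.quality === quality)
  if (!rule) throw new Error(`No routing rule for ${contentType}/${quality}`)
  return rule.modelId
}
```

Changing which model serves social video is a one-line config edit, verified by a one-line test.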
Async Generation and Status Polling
Most production AI generation APIs are asynchronous - you submit a request and poll for completion rather than waiting synchronously. Building this correctly in the abstraction layer means application code never deals with polling directly:
class AsyncGenerationClient {
private router: GenerationRouter
private pollIntervalMs = 2000
private maxPollAttempts = 150 // ~5 minutes at the base 2s interval
async generateAndWait(
contentType: ContentType,
quality: RoutingRule['quality'],
params: Omit<ImageGenerationParams | VideoGenerationParams, 'modelId'>
): Promise<GenerationResult> {
// Submit generation
const pending = await this.router.generate(contentType, quality, params)
if (pending.status === 'complete') return pending
if (pending.status === 'failed') throw new Error(pending.error ?? 'Generation failed')
// Poll for completion
return this.pollUntilComplete(pending.id, pending.modelId)
}
private async pollUntilComplete(
  generationId: string,
  modelId: string,
  attempt = 0,
  intervalMs = this.pollIntervalMs
): Promise<GenerationResult> {
  if (attempt >= this.maxPollAttempts) {
    throw new Error(`Generation ${generationId} timed out after ${this.maxPollAttempts} attempts`)
  }
  await this.sleep(intervalMs)
  const result = await this.getStatus(generationId, modelId)
  if (result.status === 'complete') return result
  if (result.status === 'failed') throw new Error(result.error ?? 'Generation failed')
  // Back off after the first 10 attempts. The interval is passed as a
  // parameter rather than mutating this.pollIntervalMs, so concurrent
  // polls don't interfere with each other's backoff state.
  const nextIntervalMs = attempt > 10 ? Math.min(intervalMs * 1.5, 10000) : intervalMs
  return this.pollUntilComplete(generationId, modelId, attempt + 1, nextIntervalMs)
}
private sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms))
}
private async getStatus(generationId: string, modelId: string): Promise<GenerationResult> {
  // Route the status check to the correct provider adapter. Index access
  // into the router's private members is a shortcut for brevity - in
  // production, expose a public clientForModel() on the router instead.
  const provider = this.router['getProviderForModel'](modelId)
  const client = this.router['clients'].get(provider)
  if (!client) throw new Error(`No client for model ${modelId}`)
  // Each adapter implements getStatus (part of the adapter contract) - omitted for brevity
  return client.getStatus(generationId)
}
}
This pattern also enables webhook-based completion notification as an alternative to polling - the architecture supports both without application code changes.
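One way the webhook path can be sketched: a registry of in-flight generations keyed by ID, where application code awaits a promise and the webhook handler resolves it. The payload shape and function names below are illustrative assumptions, not a specific provider's webhook contract:

```typescript
interface CompletionPayload {
  id: string
  status: 'complete' | 'failed'
  outputUrl?: string
  error?: string
}

// In-flight generations awaiting a webhook callback
const pending = new Map<string, {
  resolve: (p: CompletionPayload) => void
  reject: (e: Error) => void
}>()

// Application code awaits this instead of polling
function waitForCompletion(generationId: string): Promise<CompletionPayload> {
  return new Promise((resolve, reject) => {
    pending.set(generationId, { resolve, reject })
  })
}

// Called by the HTTP route that receives the provider's webhook POST
function handleWebhook(payload: CompletionPayload): void {
  const waiter = pending.get(payload.id)
  if (!waiter) return // unknown or already-handled generation
  pending.delete(payload.id)
  if (payload.status === 'failed') waiter.reject(new Error(payload.error ?? 'Generation failed'))
  else waiter.resolve(payload)
}
```

A production version would add a timeout per entry and fall back to polling if the webhook never arrives, but the application-facing surface - await a result - stays identical either way.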
The Multi-Model Platform Approach: When to Build vs When to Use
The architecture above is the right approach when you're building a product where generation is a core capability and direct provider relationships are commercially justified.
For applications where AI generation is a feature rather than the core product, there is a third option between "build your own abstraction layer" and "integrate directly against a single provider": use a multi-model platform API that already implements this architecture.
Cliprise provides an API that exposes 47+ models - including every model discussed in this article - behind a single normalized interface. The API integration guide and developers page cover the specific endpoints and integration patterns.
The build-vs-use decision comes down to one question: do you need to negotiate direct provider relationships for commercial terms, or does a unified API with standard credit pricing meet your production requirements?
For most applications integrating AI generation as a feature rather than building generation infrastructure as a product, the unified API approach eliminates the abstraction layer engineering and ongoing provider maintenance, at the cost of dependency on the platform API rather than direct provider relationships.
Parameter Standardization: The Schema Problem Nobody Talks About
Beyond the routing and provider abstraction, there is a parameter standardization problem that compounds across every model you integrate.
Different providers use different parameter names for the same concepts:
| Concept | Flux | Stability | OpenAI | |
|---|---|---|---|---|
| Prompt adherence | `guidance_scale` | `cfg_scale` | `guidance_scale` | `quality` |
| Noise steps | `num_inference_steps` | `steps` | `sample_steps` | (abstracted) |
| Output dimensions | `width` + `height` | `width` + `height` | `aspect_ratio` | `size` |
| Seed for reproducibility | `seed` | `seed` | `seed` | (not supported) |
| Style conditioning | `style_preset` | `style` | `style` | (not supported) |
Without standardization, your application code either contains provider-specific parameter logic (tight coupling) or ignores these parameters entirely (losing significant output quality control).
The normalized parameter types in the abstraction layer above solve this - application code sets quality: 'high' and the adapter translates to the correct provider parameter and value. The seed values guide for reproducible generation covers the specific behavior of seed parameters across major providers.
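One concrete instance of this translation: providers that accept named `aspect_ratio` strings instead of pixel dimensions. A sketch of the conversion an adapter might perform - the ratio labels are the common ones, and the nearest-match strategy is an assumption, not any provider's documented behavior:

```typescript
// Named ratios a hypothetical provider accepts, with numeric values
const namedRatios: Array<{ label: string; value: number }> = [
  { label: '1:1', value: 1 },
  { label: '16:9', value: 16 / 9 },
  { label: '9:16', value: 9 / 16 },
  { label: '4:3', value: 4 / 3 },
  { label: '3:4', value: 3 / 4 }
]

// Translate normalized width/height into the closest named aspect ratio
function toAspectRatio(width: number, height: number): string {
  const ratio = width / height
  let best = namedRatios[0]
  for (const candidate of namedRatios) {
    if (Math.abs(candidate.value - ratio) < Math.abs(best.value - ratio)) best = candidate
  }
  return best.label
}
```

Because this lives in the adapter, application code keeps setting `width` and `height` for every provider, including ones that never accept pixel dimensions.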
Rate Limiting and Cost Management
At production scale, two operational concerns compound quickly without explicit handling: rate limiting and credit cost.
Rate limiting: Most AI generation APIs enforce rate limits at the account level - requests per minute or concurrent generations. Without explicit handling, rate limit errors from one content type can block generation for all content types sharing the same API credentials.
class RateLimitedGenerationClient implements GenerationClient {
private client: GenerationClient
private activeRequests = 0
private maxConcurrent: number
private minRequestIntervalMs: number
private lastRequestTime = 0
constructor(
client: GenerationClient,
maxConcurrent = 3,
minRequestIntervalMs = 500
) {
this.client = client
this.maxConcurrent = maxConcurrent
this.minRequestIntervalMs = minRequestIntervalMs
}
async generateImage(params: ImageGenerationParams): Promise<GenerationResult> {
return this.withRateLimit(() => this.client.generateImage(params))
}
async generateVideo(params: VideoGenerationParams): Promise<GenerationResult> {
return this.withRateLimit(() => this.client.generateVideo(params))
}
private async withRateLimit<T>(fn: () => Promise<T>): Promise<T> {
// Wait if at concurrent limit
while (this.activeRequests >= this.maxConcurrent) {
await this.sleep(100)
}
// Enforce minimum interval between requests
const timeSinceLastRequest = Date.now() - this.lastRequestTime
if (timeSinceLastRequest < this.minRequestIntervalMs) {
await this.sleep(this.minRequestIntervalMs - timeSinceLastRequest)
}
this.activeRequests++
this.lastRequestTime = Date.now()
try {
return await fn()
} finally {
this.activeRequests--
}
}
private sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms))
}
getModelCapabilities(modelId: string): ModelCapabilities {
return this.client.getModelCapabilities(modelId)
}
}
Cost tracking: The normalized GenerationResult type includes creditsConsumed. Wrapping the generation router with cost tracking gives you per-content-type cost data without requiring application-level instrumentation:
class CostTrackingRouter extends GenerationRouter {
private costLog: Array<{ modelId: string; contentType: string; credits: number; timestamp: Date }> = []
async generate(
contentType: ContentType,
quality: RoutingRule['quality'],
params: Omit<ImageGenerationParams | VideoGenerationParams, 'modelId'>
): Promise<GenerationResult> {
const result = await super.generate(contentType, quality, params)
this.costLog.push({
modelId: result.modelId,
contentType,
credits: result.creditsConsumed,
timestamp: new Date()
})
return result
}
getCostByContentType(): Record<string, number> {
return this.costLog.reduce((acc, entry) => {
acc[entry.contentType] = (acc[entry.contentType] ?? 0) + entry.credits
return acc
}, {} as Record<string, number>)
}
}
This data is how you calculate the acceptable output rate and effective cost per deliverable metrics that drive rational model routing decisions. The cost optimization guide for multi-model platforms covers the analytical framework.
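The arithmetic behind that metric is worth making explicit: divide total credits by outputs that passed review, not by outputs generated. A sketch with illustrative numbers (not real model pricing):

```typescript
// Effective cost per deliverable: total credits spent divided by
// outputs that were actually accepted, not outputs generated
function costPerDeliverable(totalCredits: number, acceptedOutputs: number): number {
  if (acceptedOutputs === 0) return Infinity
  return totalCredits / acceptedOutputs
}

// A cheap model with a low acceptance rate can cost more per usable
// output than an expensive one. Hypothetical figures:
// 100 generations at 2 credits each, 15 accepted
const cheapModel = costPerDeliverable(100 * 2, 15)   // ~13.3 credits per deliverable
// 20 generations at 8 credits each, 16 accepted
const premiumModel = costPerDeliverable(20 * 8, 16)  // 10 credits per deliverable
```

This is why per-model credit price alone is a misleading routing signal: the acceptance rate has to be in the denominator.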
Adding a New Model: What It Should Look Like
The test of whether this architecture is working is how long it takes to add a new model when it releases.
With the abstraction layer in place:
- Create a new provider adapter class implementing GenerationClient - or extend an existing provider adapter if the new model is from the same provider with a different model ID
- Register the adapter in the client map
- Add routing rules for the new model
- Deploy
Zero application code changes. Zero test suite changes outside the new adapter. If you're doing this in more than a day for a model from an existing provider, the abstraction layer has a gap.
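The registration steps above reduce to a few lines of wiring. A sketch of what "adding a model" touches - the new model ID and provider name are hypothetical, and the take-over-with-fallback behavior is one possible policy, not the only one:

```typescript
interface RoutingRule {
  contentType: string
  quality: string
  modelId: string
  fallbackModelId?: string
}

// Existing configuration
const modelProviderMap: Record<string, string> = { 'kling-3-0': 'kling' }
const routingRules: RoutingRule[] = [
  { contentType: 'cinematic-video', quality: 'high', modelId: 'kling-3-0' }
]

// Register a new model: map it to its provider and route it.
// If a rule already exists for the slot, the new model takes over
// and the previous model is demoted to fallback.
function registerModel(modelId: string, provider: string, rule: RoutingRule): void {
  modelProviderMap[modelId] = provider
  const existing = routingRules.findIndex(
    r => r.contentType === rule.contentType && r.quality === rule.quality
  )
  if (existing >= 0) {
    routingRules[existing] = { ...rule, fallbackModelId: routingRules[existing].modelId }
  } else {
    routingRules.push(rule)
  }
}

registerModel('new-model-1-0', 'newprovider', {
  contentType: 'cinematic-video',
  quality: 'high',
  modelId: 'new-model-1-0'
})
```

The demote-to-fallback policy means the model you trusted yesterday becomes the safety net for the model you adopted today.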
The models currently worth having adapters for in production AI generation applications: Kling 3.0, Sora 2, Veo 3.1 Fast, Veo 3.1 Quality, Flux 2 Pro, Google Imagen 4, Midjourney, Ideogram v3, Nano Banana 2, Hailuo 02, Seedance 2.0.
The model comparison tool provides current capability data for routing rule decisions. Full model catalog: www.cliprise.app/models.
Summary
The abstraction layer that makes multi-model AI generation integration maintainable has four components:
Normalized interfaces - GenerationClient, ImageGenerationParams, VideoGenerationParams, GenerationResult. Application code calls these interfaces. Provider details never leak into application code.
Provider adapters - one per provider, implementing the normalized interface, absorbing all provider-specific schema translation. Provider API changes require updating one file.
Generation router - content type and quality tier to model ID mapping as explicit configuration. Model selection as a configurable routing decision rather than implicit hardcoding.
Operational wrappers - rate limiting and cost tracking as composable layers around the router. These concerns are handled systematically rather than scattered through application code.
The AI generation model landscape will continue changing faster than any other part of the stack. Building the abstraction layer that insulates your application from that churn is the difference between AI generation as a maintainable feature and AI generation as a permanent source of architectural debt.
The author builds Cliprise - a multi-model AI generation platform with 47+ video, image, and art generation models accessible via unified API and web/mobile interfaces.