DEV Community

Gloria

I Stopped Managing 4 AI SDKs and Routed Everything Through Crun.ai API — Here's the Code

Let me paint a picture.

You're three months into your AI-powered SaaS. You've integrated Flux for image generation, Kling for video, a second image model for higher-resolution output, and an LLM for copy assistance. Four providers. Four API keys. Four completely different authentication patterns. Four sets of response formats. And three different async handling strategies across them, because one provider is synchronous, two are async-poll, and one prefers webhooks.

You ship it. It works. You move on.

Then Kling updates their callback field names. Then your Flux adapter breaks because you forgot the temporary URL has a 12-hour TTL and your queue runs slow on weekends. Then you hire a new engineer, they touch the LLM integration, and somehow the image generation pipeline starts throwing 401s.

Sound familiar?

This is "SDK sprawl" — and it quietly ate about 40% of my team's engineering time before we did something about it. This post shows exactly what we did, with real working code.

The Before Picture: What SDK Sprawl Actually Looks Like

Here's what our codebase looked like before the refactor. Four different functions, four different patterns, zero shared logic:

// ❌ BEFORE: Flux image generation
async function generateFluxImage(prompt: string) {
  const response = await fetch('https://api.flux.ai/v1/generate', {
    method: 'POST',
    headers: {
      'X-Flux-Key': process.env.FLUX_API_KEY!,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ prompt, width: 1024, height: 1024 }),
  })
  const data = await response.json()
  // Returns a temp URL that expires in 12 hours — must download immediately
  return data.image_url
}

// ❌ BEFORE: Kling video generation (completely different pattern)
async function generateKlingVideo(prompt: string) {
  const submit = await fetch('https://api.kling.ai/v1/video/create', {
    method: 'POST',
    headers: { Authorization: `Bearer ${process.env.KLING_KEY}` },
    body: JSON.stringify({ prompt, duration: 5 }),
  })
  const { task_id } = await submit.json()

  // Different polling logic from every other provider
  while (true) {
    await sleep(5000)
    const status = await fetch(`https://api.kling.ai/v1/video/${task_id}`, {
      headers: { Authorization: `Bearer ${process.env.KLING_KEY}` },
    })
    const data = await status.json()
    if (data.status === 'completed') return data.output_video  // different field name
    if (data.status === 'error') throw new Error(data.error_message)
  }
}

// ❌ BEFORE: Third provider — yet another pattern, yet another auth style
async function generateHDImage(prompt: string) {
  // ... completely different auth, different field names, different error codes
}

Three of our four providers, three completely different implementations (the LLM integration was a fourth). No shared retry logic. No unified error handling. No consistent file storage strategy. And when any one of these providers changed something, which they all did regularly, the fix lived in that provider's adapter alone and none of the other integrations benefited from it.
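To make the missing shared layer concrete, here is a minimal sketch of what normalizing one provider's response might look like. The `completed`, `error`, `output_video`, and `error_message` names come from the Kling snippet above; `NormalizedTask` and `normalizeKling` are hypothetical names invented for illustration, not part of any SDK.

```typescript
// Hypothetical normalized shape; not taken from any provider's SDK.
type NormalizedStatus = 'pending' | 'succeeded' | 'failed'

interface NormalizedTask {
  status: NormalizedStatus
  url?: string
  error?: string
}

// Raw shape assumed from the "before" Kling snippet above.
interface KlingStatusResponse {
  status: string
  output_video?: string
  error_message?: string
}

function normalizeKling(raw: KlingStatusResponse): NormalizedTask {
  if (raw.status === 'completed') {
    return { status: 'succeeded', url: raw.output_video }
  }
  if (raw.status === 'error') {
    return { status: 'failed', error: raw.error_message ?? 'unknown error' }
  }
  // Any other status is treated as still in progress.
  return { status: 'pending' }
}
```

Every provider would need its own mapper like this one, and each mapper is another place for a renamed field to break things. That is the maintenance cost the rest of this post is about removing.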

The After: One Client, One Pattern, Every Model

After migrating to Crun.ai API, the same functionality looks like this:

// ✅ AFTER: One client setup covers everything
import axios from 'axios'

const crun = axios.create({
  baseURL: 'https://api.crun.ai/v1',
  headers: {
    Authorization: `Bearer ${process.env.CRUN_API_KEY}`,
    'Content-Type': 'application/json',
  },
  timeout: 30_000,
})

// One image generation function — model is just a config param
async function generateImage(prompt: string, model = 'flux') {
  const { data } = await crun.post('/tasks/image', {
    model,   // swap to 'dall-e-3' by changing this string. Zero other changes.
    prompt,
    size: '1024x1024',
  })
  return data.output.url  // Persistent URL — no expiry, no manual download needed
}

// One video generation function — same shape as image
async function generateVideo(prompt: string, model = 'kling') {
  const { data } = await crun.post('/tasks/video', {
    model,
    prompt,
    duration: 5,
    aspect_ratio: '16:9',
  })
  return data.task_id  // Async — we'll poll or webhook from here
}

Same shape. Same field names. Same auth. The model name is the only thing that changes between providers.

Let me walk through the full implementation now.

Step 1: Base Client + Environment Setup

npm install axios dotenv

# .env
CRUN_API_KEY=your_key_here
CRUN_BASE_URL=https://api.crun.ai/v1

// lib/crun-client.ts
import axios, { AxiosInstance } from 'axios'
import 'dotenv/config'

const createCrunClient = (): AxiosInstance => {
  const client = axios.create({
    baseURL: process.env.CRUN_BASE_URL,
    headers: {
      Authorization: `Bearer ${process.env.CRUN_API_KEY}`,
      'Content-Type': 'application/json',
    },
    timeout: 30_000,
  })

  // Unified error interceptor — one place, all providers
  client.interceptors.response.use(
    (res) => res,
    (err) => {
      const status = err.response?.status
      const message = err.response?.data?.error || err.message
      throw new Error(`[Crun API] ${status ?? 'Network Error'}: ${message}`)
    }
  )

  return client
}

export const crun = createCrunClient()

One client. One error interceptor that covers every model call in your entire codebase.
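One thing to be aware of: the interceptor folds the HTTP status into the message string, so callers that want to branch on it have to parse text. A possible variation, sketched below, keeps the status as a number on a custom error. `CrunApiError`, `toCrunApiError`, and `HttpErrorLike` are hypothetical names, and the `data.error` field is assumed from the interceptor above rather than from any API documentation.

```typescript
// Hypothetical error type that keeps the HTTP status as a number.
class CrunApiError extends Error {
  // undefined means no HTTP response was received (network failure, timeout)
  readonly status: number | undefined

  constructor(status: number | undefined, message: string) {
    super(`[Crun API] ${status ?? 'Network Error'}: ${message}`)
    this.name = 'CrunApiError'
    this.status = status
  }

  // Network failures, rate limits, and server errors are worth retrying;
  // other 4xx responses will fail the same way again.
  get retryable(): boolean {
    return this.status === undefined || this.status === 429 || this.status >= 500
  }
}

// Minimal structural type covering the parts of an Axios-style error used here.
interface HttpErrorLike {
  message: string
  response?: { status?: number; data?: { error?: string } }
}

function toCrunApiError(err: HttpErrorLike): CrunApiError {
  return new CrunApiError(err.response?.status, err.response?.data?.error || err.message)
}
```

Inside the interceptor, the error branch would then become `throw toCrunApiError(err)`, and downstream code can check `err.status` or `err.retryable` instead of searching the message.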

Step 2: Image Generation — Sync, with Model Switching

Image tasks complete in seconds and return synchronously.

// lib/image.ts
import { crun } from './crun-client'

type ImageModel = 'flux' | 'dall-e-3'
type ImageSize = '512x512' | '1024x1024' | '1792x1024'

interface ImageResult {
  taskId: string
  url: string        // Persistent — no TTL, no manual download
  width: number
  height: number
  costUsd: number
}

export async function generateImage(
  prompt: string,
  model: ImageModel = 'flux',
  size: ImageSize = '1024x1024'
): Promise<ImageResult> {
  const { data } = await crun.post('/tasks/image', { model, prompt, size })

  return {
    taskId: data.task_id,
    url: data.output.url,
    width: data.output.width,
    height: data.output.height,
    costUsd: data.usage.cost,
  }
}

Want to A/B test Flux vs. DALL·E? It's literally one argument:

// A/B test: route 50% of requests to each model
const model = Math.random() > 0.5 ? 'flux' : 'dall-e-3'
const result = await generateImage(prompt, model)

No adapter rewrite. No separate code path. One line.
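One caveat with `Math.random()`: the same user can land on a different model on every request. If your experiment needs each user to stay in one bucket, a common approach is to hash a stable identifier instead. The sketch below assumes you have some `userId` string available; `fnv1a` and `pickModel` are illustrative names, not part of the Crun.ai API.

```typescript
type ImageModel = 'flux' | 'dall-e-3'

// FNV-1a 32-bit hash applied to UTF-16 code units: small, dependency-free,
// and stable across runs and machines.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0 // force unsigned 32-bit
}

// The same userId always maps to the same model.
// fluxShare is the fraction of the hash space routed to 'flux'.
function pickModel(userId: string, fluxShare = 0.5): ImageModel {
  const bucket = fnv1a(userId) / 0x100000000 // in [0, 1)
  return bucket < fluxShare ? 'flux' : 'dall-e-3'
}
```

Usage stays a one-liner: `const result = await generateImage(prompt, pickModel(userId))`.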

Step 3: Async Video Generation — Polling Mode

Video generation takes 30 seconds to 3 minutes. You have two solid options for handling the async lifecycle: polling and webhooks. Here's polling first — good for scripts and low-volume use cases.

// lib/video-polling.ts
import { crun } from './crun-client'

type VideoModel = 'kling' | 'luma'

interface VideoResult {
  taskId: string
  url: string
  durationSeconds: number
}

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms))

export async function generateVideoWithPolling(
  prompt: string,
  model: VideoModel = 'kling',
  options: { duration?: 5 | 10; pollIntervalMs?: number; timeoutMs?: number } = {}
): Promise<VideoResult> {
  const { duration = 5, pollIntervalMs = 5_000, timeoutMs = 300_000 } = options

  // 1. Submit the task
  const { data: submitData } = await crun.post('/tasks/video', {
    model,
    prompt,
    duration,
    aspect_ratio: '16:9',
  })
  const taskId: string = submitData.task_id
  console.log(`[video] Task submitted: ${taskId}`)

  // 2. Poll until done
  const deadline = Date.now() + timeoutMs

  while (Date.now() < deadline) {
    await sleep(pollIntervalMs)

    const { data: statusData } = await crun.get(`/tasks/${taskId}`)
    const { status } = statusData
    console.log(`[video] ${taskId}: ${status} (${Math.round((Date.now() - (deadline - timeoutMs)) / 1000)}s elapsed)`)

    if (status === 'succeeded') {
      return {
        taskId,
        url: statusData.output.url,
        durationSeconds: statusData.output.duration,
      }
    }

    if (status === 'failed') {
      throw new Error(`Task ${taskId} failed: ${statusData.error}`)
    }
    // 'pending' | 'processing' → keep waiting
  }

  throw new Error(`Task ${taskId} timed out after ${timeoutMs / 1000}s`)
}

⚠️ Don't set pollIntervalMs below 3000ms. You'll burn rate limit quota and you won't get results any faster — model inference doesn't speed up because you're polling more often.
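If you want the floor enforced in code rather than by convention, one option is to compute the wait schedule up front and clamp it. This is a sketch under my own assumptions: `pollDelays`, the 1.5 growth factor, and the 20-second cap are illustrative choices, not Crun.ai recommendations.

```typescript
const MIN_POLL_MS = 3_000 // the floor recommended above

// Returns the sequence of waits for one task. The first wait is startMs
// (never below the floor), each later wait grows by `factor` up to maxMs,
// and the sequence stops once the next wait would exceed timeoutMs in total.
function pollDelays(
  timeoutMs: number,
  startMs = 5_000,
  factor = 1.5,
  maxMs = 20_000
): number[] {
  const delays: number[] = []
  let next = Math.max(startMs, MIN_POLL_MS)
  let total = 0
  while (total + next <= timeoutMs) {
    delays.push(next)
    total += next
    // Clamp so the floor always wins, even with a factor below 1.
    next = Math.max(MIN_POLL_MS, Math.min(Math.round(next * factor), maxMs))
  }
  return delays
}
```

In the polling loop you would iterate `for (const delay of pollDelays(timeoutMs)) { await sleep(delay); /* check status */ }`. Backing off this way keeps early checks frequent for fast jobs while spending fewer requests on the long ones.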

Step 4: Async Video Generation — Webhook Mode (Production-Recommended)

Polling keeps a loop, its timers, and (if it runs inside a request handler) an open connection alive for the duration of the task. In production, with many concurrent video jobs, that gets expensive fast. Use webhooks instead.

// lib/video-webhook.ts
import { crun } from './crun-client'

export async function submitVideoTask(
  prompt: string,
  model: 'kling' | 'luma' = 'kling',
  webhookUrl: string
): Promise<string> {
  const { data } = await crun.post('/tasks/video', {
    model,
    prompt,
    duration: 5,
    webhook: {
      url: webhookUrl,
      events: ['succeeded', 'failed'],  // Only fire on terminal states
    },
  })

  // Returns immediately with task_id — no blocking
  return data.task_id
}

Your webhook receiver (Express/Fastify/any framework):

// routes/webhook.ts
import { Router } from 'express'
import { db } from '../db'
import { notifyUser } from '../notifications'

const router = Router()

router.post('/webhook/crun', async (req, res) => {
  // ✅ Respond within 5 seconds or Crun will retry the webhook
  // Acknowledge first, then process async
  res.json({ received: true })

  const { task_id, status, output, error } = req.body

  try {
    if (status === 'succeeded') {
      await db.tasks.update(task_id, {
        status: 'done',
        videoUrl: output.url,       // Persistent URL — save it directly
        completedAt: new Date(),
      })
      await notifyUser(task_id)

    } else if (status === 'failed') {
      await db.tasks.update(task_id, {
        status: 'failed',
        errorMessage: error,
      })
    }
  } catch (err) {
    console.error(`[webhook] Failed to process task ${task_id}:`, err)
    // Don't re-throw — response already sent
    // Queue for retry via your job system instead
  }
})

export { router as webhookRouter }

⚠️ Critical: Send your 200 response before doing any async work. If your handler takes longer than 5 seconds to respond, the webhook will be retried — and you'll process the same completion event twice. The pattern above (respond → process async) is the right structure.
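Even with a fast acknowledgement, a retried delivery can still arrive (a dropped connection is enough), so it helps to make the handler idempotent as well. Below is a minimal sketch of a duplicate guard. `claimEvent` is a hypothetical helper, and the in-memory `Set` assumes a single server process.

```typescript
// In-memory record of terminal events already handled.
// Assumption: a single process. With several instances, use a shared store
// instead (for example a unique index on task_id + status in your database).
const handled = new Set<string>()

// Returns true the first time a (taskId, status) pair is seen, false afterwards.
function claimEvent(taskId: string, status: string): boolean {
  const key = `${taskId}:${status}`
  if (handled.has(key)) return false
  handled.add(key)
  return true
}
```

In the route above, the check would go right after destructuring `req.body`: `if (!claimEvent(task_id, status)) return`. A duplicate delivery then becomes a no-op instead of a second database update and a second user notification.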

Step 5: Unified File URL — No More Expiry Bugs

This one deserves its own section because it's bitten so many teams.

When you integrate directly against most model APIs, the file URLs they return are temporary. They might be valid for 1 hour, 12 hours, or 24 hours depending on the provider. If a user opens the URL after it expires, the request fails and they see a broken image or video. If your queue runs slow and picks up a job 13 hours later, the file is gone.

Crun.ai API handles file storage for you. The URLs in output.url are persistent. You can store them directly in your database and reference them indefinitely.

That said, if you ever need to move files to your own storage (your own S3/OSS bucket for compliance, CDN reasons, etc.), here's a clean utility:

// lib/storage.ts
import fs from 'fs'
import path from 'path'
import { pipeline } from 'stream/promises'
import axios from 'axios'

export async function downloadToLocal(
  sourceUrl: string,
  destPath: string
): Promise<void> {
  const response = await axios.get(sourceUrl, {
    responseType: 'stream',
    timeout: 60_000,
  })

  // pipeline() rejects if either the download stream or the file stream
  // errors, and destroys both. A bare .pipe() would miss download errors
  // and leave the promise pending.
  await pipeline(response.data, fs.createWriteStream(destPath))

  console.log(`Saved to: ${path.resolve(destPath)}`)
}

// Usage
await downloadToLocal(result.url, './outputs/product-video.mp4')

Step 6: Reusable Retry Wrapper

Network errors happen. Model services have transient blips. Here's a retry wrapper (a plain higher-order function) that works across all your Crun.ai API calls: image, video, text, everything.

// lib/retry.ts

interface RetryOptions {
  maxAttempts?: number
  baseDelayMs?: number
  maxDelayMs?: number
  shouldRetry?: (error: Error, attempt: number) => boolean
}

export async function withRetry<T>(
  fn: () => Promise<T>,
  options: RetryOptions = {}
): Promise<T> {
  const {
    maxAttempts = 3,
    baseDelayMs = 1_000,
    maxDelayMs = 30_000,
    shouldRetry = defaultShouldRetry,
  } = options

  let lastError: Error

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn()
    } catch (err) {
      lastError = err as Error

      if (attempt === maxAttempts || !shouldRetry(lastError, attempt)) {
        throw lastError
      }

      // Exponential backoff + jitter to prevent thundering herd
      const exponential = baseDelayMs * Math.pow(2, attempt - 1)
      const jitter = Math.random() * 1_000
      const delay = Math.min(exponential + jitter, maxDelayMs)

      console.warn(`[retry] Attempt ${attempt} failed. Retrying in ${Math.round(delay)}ms...`)
      await new Promise((r) => setTimeout(r, delay))
    }
  }

  throw lastError!
}

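function defaultShouldRetry(error: Error, attempt: number): boolean {
  // The client interceptor formats errors as "[Crun API] <status>: <message>".
  // Parse the status from that prefix instead of searching the whole message,
  // so a "400" elsewhere in the text cannot be mistaken for a status code.
  const match = /^\[Crun API\] (\d{3}):/.exec(error.message)
  // No status means a network failure or timeout: retry
  if (!match) return true
  const status = Number(match[1])
  // Retry rate limits (429) and server errors (5xx).
  // Other client errors (4xx) are our fault and will fail the same way again.
  return status === 429 || status >= 500
}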

Using it is clean:

// Wrap any call with automatic retry
const image = await withRetry(
  () => generateImage('a golden retriever on a beach', 'flux'),
  { maxAttempts: 3, baseDelayMs: 1_000 }
)

// Works identically for video
const taskId = await withRetry(
  () => submitVideoTask(prompt, 'kling', webhookUrl),
  { maxAttempts: 2 }
)

Same retry logic. Every model. One implementation.
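To see what the backoff formula produces in practice, here is the same arithmetic pulled out into a standalone function, with the jitter passed in as a parameter so the result is deterministic. `backoffDelay` is an illustrative helper, not part of `lib/retry.ts`.

```typescript
// Same formula as withRetry: base * 2^(attempt - 1) + jitter, capped at maxDelayMs.
function backoffDelay(
  attempt: number,
  baseDelayMs = 1_000,
  maxDelayMs = 30_000,
  jitterMs = 0
): number {
  const exponential = baseDelayMs * Math.pow(2, attempt - 1)
  return Math.min(exponential + jitterMs, maxDelayMs)
}

// Waits after attempts 1..6 with the defaults and no jitter:
const schedule = [1, 2, 3, 4, 5, 6].map((n) => backoffDelay(n))
// [1000, 2000, 4000, 8000, 16000, 30000]
```

With the default `maxAttempts` of 3, only the first two waits ever happen, so a call that fails every time spends between 3 and 5 seconds waiting (1 s + 2 s, plus up to 1 s of jitter on each) before the final error surfaces. The cap only starts to matter from the sixth attempt onward.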

The Complete Flow, End to End

Here's how these pieces fit together in a real request handler:

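// Example: Express route that handles both image and video generation
import express from 'express'
import { withRetry } from './lib/retry'
import { generateImage } from './lib/image'
import { submitVideoTask } from './lib/video-webhook'
import { db } from './db'

const app = express()
app.use(express.json())  // parse JSON bodies so req.body is populated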

app.post('/api/generate', async (req, res) => {
  const { type, prompt, model } = req.body

  try {
    if (type === 'image') {
      const result = await withRetry(() => generateImage(prompt, model))
      await db.assets.create({ type: 'image', url: result.url, taskId: result.taskId })
      return res.json({ success: true, url: result.url })
    }

    if (type === 'video') {
      const taskId = await withRetry(() =>
        submitVideoTask(prompt, model, `${process.env.SERVER_URL}/webhook/crun`)
      )
      await db.tasks.create({ taskId, status: 'pending', type: 'video' })
      return res.json({ success: true, taskId, status: 'pending' })
    }

    res.status(400).json({ error: 'Invalid generation type' })

  } catch (err) {
    console.error('[generate] Error:', err)
    res.status(500).json({ error: (err as Error).message })
  }
})

Clean. Typed. Handles both sync and async generation patterns through the same route.
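One gap worth closing: `req.body` is untyped at runtime, so an unknown `model` string would only surface as a 422 from the API. A small validation step in front of the handler catches that earlier. The sketch below uses the model names from the type definitions earlier in this post; `GenerateRequest` and `parseGenerateRequest` are hypothetical names, and unlike the route above it requires `model` to be present.

```typescript
type GenerateRequest =
  | { type: 'image'; prompt: string; model: 'flux' | 'dall-e-3' }
  | { type: 'video'; prompt: string; model: 'kling' | 'luma' }

// Allowed models per generation type, mirroring ImageModel and VideoModel above.
const MODELS: Record<'image' | 'video', readonly string[]> = {
  image: ['flux', 'dall-e-3'],
  video: ['kling', 'luma'],
}

// Returns a typed request, or a string describing what is wrong with the body.
function parseGenerateRequest(body: unknown): GenerateRequest | string {
  if (typeof body !== 'object' || body === null) return 'body must be a JSON object'
  const { type, prompt, model } = body as Record<string, unknown>
  if (type !== 'image' && type !== 'video') return 'type must be "image" or "video"'
  if (typeof prompt !== 'string' || prompt.trim() === '') {
    return 'prompt must be a non-empty string'
  }
  const allowed = MODELS[type]
  if (typeof model !== 'string' || !allowed.includes(model)) {
    return `model must be one of: ${allowed.join(', ')}`
  }
  return { type, prompt, model } as GenerateRequest
}
```

In the route, this would run first: `const parsed = parseGenerateRequest(req.body)`, followed by `if (typeof parsed === 'string') return res.status(400).json({ error: parsed })`. After that check, `parsed.type` narrows `parsed.model` to the right union for each branch.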

Quick Reference: Common Errors

| Error | Likely Cause | Fix |
| --- | --- | --- |
| 401: Unauthorized | Wrong or missing API key | Check Authorization: Bearer <key> header format |
| 422: Unprocessable Entity | Invalid parameter value | Check duration (only 5 or 10), size format |
| 429: Too Many Requests | Rate limit hit | Add backoff in retry wrapper, reduce concurrency |
| task status: failed | Content filter or prompt issue | Check prompt for policy violations, simplify |
| Webhook not received | Server not publicly accessible | Use ngrok locally, verify URL in Crun dashboard |

What Changed for Our Team

After migrating:

  • AI-related maintenance tickets: from ~11/quarter to ~1/quarter
  • Lines of AI integration code: from ~2,400 to ~380 (our code) + shared client
  • Time to add a new AI capability: from ~1.5 days to ~2 hours
  • Model switching for A/B tests: from 2-day migration to 1-line config change

The biggest change wasn't the numbers — it was that my engineers stopped treating the AI integration layer as "the thing that randomly breaks." It became infrastructure. Stable, boring, reliable infrastructure.

That's the goal.

Wrapping Up

The pattern here isn't magic — it's just good software engineering applied to AI integrations. Normalize your interfaces early, before your integration surface area becomes a domain unto itself.

Crun.ai API gives you a clean Task abstraction that handles the provider differences so you don't have to. If you're building an AI-powered app and you're on your second or third model integration, it's worth considering before the sprawl compounds.

The full repo for the examples in this post: bookmark it, fork it, adapt it.

Drop a comment if you've hit similar SDK sprawl issues — curious how others have approached it. 👇
