Let me paint a picture.
You're three months into your AI-powered SaaS. You've integrated Flux for image generation, Kling for video, a second image model for higher-resolution output, and an LLM for copy assistance. Four providers. Four API keys. Four completely different authentication patterns. Four sets of response formats. And four independent async handling strategies — because one provider is synchronous, two are async-poll, and one prefers webhooks.
You ship it. It works. You move on.
Then Kling updates their callback field names. Then your Flux adapter breaks because you forgot the temporary URL has a 12-hour TTL and your queue runs slow on weekends. Then you hire a new engineer, they touch the LLM integration, and somehow the image generation pipeline starts throwing 401s.
Sound familiar?
This is "SDK sprawl" — and it quietly ate about 40% of my team's engineering time before we did something about it. This post shows exactly what we did, with real working code.
The Before Picture: What SDK Sprawl Actually Looks Like
Here's what our codebase looked like before the refactor. A separate function per provider, each with its own pattern and zero shared logic. Three of them are shown here:
// ❌ BEFORE: Flux image generation
async function generateFluxImage(prompt: string) {
const response = await fetch('https://api.flux.ai/v1/generate', {
method: 'POST',
headers: {
'X-Flux-Key': process.env.FLUX_API_KEY!,
'Content-Type': 'application/json',
},
body: JSON.stringify({ prompt, width: 1024, height: 1024 }),
})
const data = await response.json()
// Returns a temp URL that expires in 12 hours — must download immediately
return data.image_url
}
// ❌ BEFORE: Kling video generation (completely different pattern)
async function generateKlingVideo(prompt: string) {
const submit = await fetch('https://api.kling.ai/v1/video/create', {
method: 'POST',
headers: { Authorization: `Bearer ${process.env.KLING_KEY}` },
body: JSON.stringify({ prompt, duration: 5 }),
})
const { task_id } = await submit.json()
// Different polling logic from every other provider
while (true) {
await sleep(5000)
const status = await fetch(`https://api.kling.ai/v1/video/${task_id}`, {
headers: { Authorization: `Bearer ${process.env.KLING_KEY}` },
})
const data = await status.json()
if (data.status === 'completed') return data.output_video // different field name
if (data.status === 'error') throw new Error(data.error_message)
}
}
// ❌ BEFORE: Third provider — yet another pattern, yet another auth style
async function generateHDImage(prompt: string) {
// ... completely different auth, different field names, different error codes
}
Three providers, three completely different implementations. No shared retry logic. No unified error handling. No consistent file storage strategy. And when any one of these providers changed something, which they all did regularly, the fix lived in that one adapter and nothing else benefited from it.
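To make the missing shared layer concrete, here's a minimal sketch of what normalizing those provider responses by hand would look like. The NormalizedResult type and both function names are my own illustration, not part of any provider's SDK; the field names (image_url, output_video, error_message) are the ones from the snippets above.

```typescript
// Common result shape every adapter maps into (hypothetical, for illustration).
type NormalizedResult =
  | { state: 'succeeded'; url: string }
  | { state: 'failed'; error: string }
  | { state: 'pending' }

// Flux responds synchronously with image_url (field name from the snippet above).
function fromFlux(data: { image_url?: string }): NormalizedResult {
  return data.image_url
    ? { state: 'succeeded', url: data.image_url }
    : { state: 'failed', error: 'Flux response had no image_url' }
}

// Kling is polled; status values and field names are from the snippet above.
function fromKling(data: {
  status: string
  output_video?: string
  error_message?: string
}): NormalizedResult {
  if (data.status === 'completed') {
    return data.output_video
      ? { state: 'succeeded', url: data.output_video }
      : { state: 'failed', error: 'Kling reported completed without output_video' }
  }
  if (data.status === 'error') {
    return { state: 'failed', error: data.error_message ?? 'unknown error' }
  }
  // Any other status is treated as still in progress.
  return { state: 'pending' }
}
```

Every provider you add means another function like these, plus the job of keeping it current when field names change. That maintenance is the cost the rest of this post is about.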
The After: One Client, One Pattern, Every Model
After migrating to the Crun.ai API, the same functionality looks like this:
// ✅ AFTER: One client setup covers everything
import axios from 'axios'
const crun = axios.create({
baseURL: 'https://api.crun.ai/v1',
headers: {
Authorization: `Bearer ${process.env.CRUN_API_KEY}`,
'Content-Type': 'application/json',
},
timeout: 30_000,
})
// One image generation function — model is just a config param
async function generateImage(prompt: string, model = 'flux') {
const { data } = await crun.post('/tasks/image', {
model, // swap to 'dall-e-3' by changing this string. Zero other changes.
prompt,
size: '1024x1024',
})
return data.output.url // Persistent URL — no expiry, no manual download needed
}
// One video generation function — same shape as image
async function generateVideo(prompt: string, model = 'kling') {
const { data } = await crun.post('/tasks/video', {
model,
prompt,
duration: 5,
aspect_ratio: '16:9',
})
return data.task_id // Async — we'll poll or webhook from here
}
Same shape. Same field names. Same auth. The model name is the only thing that changes between providers.
Let me walk through the full implementation now.
Step 1: Base Client + Environment Setup
npm install axios dotenv
# .env
CRUN_API_KEY=your_key_here
CRUN_BASE_URL=https://api.crun.ai/v1
// lib/crun-client.ts
import axios, { AxiosInstance } from 'axios'
import 'dotenv/config'
const createCrunClient = (): AxiosInstance => {
const client = axios.create({
baseURL: process.env.CRUN_BASE_URL,
headers: {
Authorization: `Bearer ${process.env.CRUN_API_KEY}`,
'Content-Type': 'application/json',
},
timeout: 30_000,
})
// Unified error interceptor — one place, all providers
client.interceptors.response.use(
(res) => res,
(err) => {
const status = err.response?.status
const message = err.response?.data?.error || err.message
throw new Error(`[Crun API] ${status ?? 'Network Error'}: ${message}`)
}
)
return client
}
export const crun = createCrunClient()
One client. One error interceptor that covers every model call in your entire codebase.
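One optional refinement: the interceptor above folds the HTTP status into the message string, so downstream code has to parse text to find it. A rough sketch of keeping the status as a field instead is below. CrunApiError, AxiosLikeError, and toCrunApiError are hypothetical names of mine, and the error shape mirrors only the fields the interceptor already reads.

```typescript
// Hypothetical error type that carries the HTTP status alongside the message.
class CrunApiError extends Error {
  readonly status?: number // undefined means no HTTP response (network failure)

  constructor(message: string, status?: number) {
    super(message)
    this.name = 'CrunApiError'
    this.status = status
  }
}

// Only the parts of an Axios error that this sketch reads.
interface AxiosLikeError {
  message: string
  response?: { status?: number; data?: { error?: string } }
}

// Builds the same message text as the interceptor above, plus a status field.
function toCrunApiError(err: AxiosLikeError): CrunApiError {
  const status = err.response?.status
  const detail = err.response?.data?.error || err.message
  return new CrunApiError(`[Crun API] ${status ?? 'Network Error'}: ${detail}`, status)
}
```

Inside the interceptor, the error callback would then throw toCrunApiError(err). The message format stays the same, and callers can branch on err.status directly.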
Step 2: Image Generation — Sync, with Model Switching
Image tasks complete in seconds and return synchronously.
// lib/image.ts
import { crun } from './crun-client'
type ImageModel = 'flux' | 'dall-e-3'
type ImageSize = '512x512' | '1024x1024' | '1792x1024'
interface ImageResult {
taskId: string
url: string // Persistent — no TTL, no manual download
width: number
height: number
costUsd: number
}
export async function generateImage(
prompt: string,
model: ImageModel = 'flux',
size: ImageSize = '1024x1024'
): Promise<ImageResult> {
const { data } = await crun.post('/tasks/image', { model, prompt, size })
return {
taskId: data.task_id,
url: data.output.url,
width: data.output.width,
height: data.output.height,
costUsd: data.usage.cost,
}
}
Want to A/B test Flux vs. DALL·E? It's literally one argument:
// A/B test: route 50% of requests to each model
const model = Math.random() > 0.5 ? 'flux' : 'dall-e-3'
const result = await generateImage(prompt, model)
No adapter rewrite. No separate code path. One line.
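One caveat with Math.random(): the same user can land on a different model on every request, which muddies the comparison. If you want sticky assignment, a sketch like the one below hashes a stable identifier instead. pickModelForUser and fnv1a are hypothetical helpers, and the hash is an FNV-1a-style function applied to UTF-16 code units, picked only because it is short and dependency-free.

```typescript
type ImageModel = 'flux' | 'dall-e-3'

// FNV-1a-style 32-bit hash over UTF-16 code units; stable across runs.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0 // keep it an unsigned 32-bit value
  }
  return hash
}

// Hypothetical helper: the same userId always maps to the same model.
// fluxShare is the fraction of users routed to 'flux' (0.5 = even split).
function pickModelForUser(userId: string, fluxShare = 0.5): ImageModel {
  const bucket = fnv1a(userId) / 0x100000000 // value in [0, 1)
  return bucket < fluxShare ? 'flux' : 'dall-e-3'
}
```

The call site stays a one-liner: generateImage(prompt, pickModelForUser(userId)).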
Step 3: Async Video Generation — Polling Mode
Video generation takes 30 seconds to 3 minutes. You have two solid options for handling the async lifecycle: polling and webhooks. Here's polling first — good for scripts and low-volume use cases.
// lib/video-polling.ts
import { crun } from './crun-client'
type VideoModel = 'kling' | 'luma'
interface VideoResult {
taskId: string
url: string
durationSeconds: number
}
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms))
export async function generateVideoWithPolling(
prompt: string,
model: VideoModel = 'kling',
options: { duration?: 5 | 10; pollIntervalMs?: number; timeoutMs?: number } = {}
): Promise<VideoResult> {
const { duration = 5, pollIntervalMs = 5_000, timeoutMs = 300_000 } = options
// 1. Submit the task
const { data: submitData } = await crun.post('/tasks/video', {
model,
prompt,
duration,
aspect_ratio: '16:9',
})
const taskId: string = submitData.task_id
console.log(`[video] Task submitted: ${taskId}`)
// 2. Poll until done
const deadline = Date.now() + timeoutMs
while (Date.now() < deadline) {
await sleep(pollIntervalMs)
const { data: statusData } = await crun.get(`/tasks/${taskId}`)
const { status } = statusData
console.log(`[video] ${taskId} → ${status} (${Math.round((Date.now() - (deadline - timeoutMs)) / 1000)}s)`)
if (status === 'succeeded') {
return {
taskId,
url: statusData.output.url,
durationSeconds: statusData.output.duration,
}
}
if (status === 'failed') {
throw new Error(`Task ${taskId} failed: ${statusData.error}`)
}
// 'pending' | 'processing' → keep waiting
}
throw new Error(`Task ${taskId} timed out after ${timeoutMs / 1000}s`)
}
⚠️ Don't set pollIntervalMs below 3000ms. You'll burn rate limit quota and you won't get results any faster. Model inference doesn't speed up because you're polling more often.
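If you'd rather not hard-code a single interval, one option is to start at that 3000ms floor and stretch the gap as the task ages. This is a sketch under my own assumptions: nextPollDelayMs is a hypothetical helper, and the 15-second ceiling and 1.5x growth factor are illustrative values, not anything the API prescribes.

```typescript
// Hypothetical helper: delay before poll number `attempt` (0-based).
function nextPollDelayMs(
  attempt: number,
  minMs = 3_000, // floor from the warning above
  maxMs = 15_000, // assumed ceiling; tune to your latency budget
  factor = 1.5 // assumed growth rate per poll
): number {
  const grown = minMs * Math.pow(factor, attempt)
  // Clamp to [minMs, maxMs] so a bad attempt value can't dip under the floor.
  return Math.round(Math.min(Math.max(grown, minMs), maxMs))
}
```

In the polling loop, you would keep a counter and call sleep(nextPollDelayMs(counter++)) in place of the fixed interval. Short tasks still resolve quickly, and long ones cost fewer status requests.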
Step 4: Async Video Generation — Webhook Mode (Production-Recommended)
Polling keeps a loop, its timers, and usually the request that started it alive for the duration of the task. In production, with many concurrent video jobs, that gets expensive fast. Use webhooks instead.
// lib/video-webhook.ts
import { crun } from './crun-client'
export async function submitVideoTask(
prompt: string,
model: 'kling' | 'luma' = 'kling',
webhookUrl: string
): Promise<string> {
const { data } = await crun.post('/tasks/video', {
model,
prompt,
duration: 5,
webhook: {
url: webhookUrl,
events: ['succeeded', 'failed'], // Only fire on terminal states
},
})
// Returns immediately with task_id — no blocking
return data.task_id
}
Your webhook receiver (Express/Fastify/any framework):
// routes/webhook.ts
import { Router } from 'express'
import { db } from '../db'
import { notifyUser } from '../notifications'
const router = Router()
router.post('/webhook/crun', async (req, res) => {
// ✅ Respond within 5 seconds or Crun will retry the webhook
// Acknowledge first, then process async
res.json({ received: true })
const { task_id, status, output, error } = req.body
try {
if (status === 'succeeded') {
await db.tasks.update(task_id, {
status: 'done',
videoUrl: output.url, // Persistent URL — save it directly
completedAt: new Date(),
})
await notifyUser(task_id)
} else if (status === 'failed') {
await db.tasks.update(task_id, {
status: 'failed',
errorMessage: error,
})
}
} catch (err) {
console.error(`[webhook] Failed to process task ${task_id}:`, err)
// Don't re-throw — response already sent
// Queue for retry via your job system instead
}
})
export { router as webhookRouter }
⚠️ Critical: Send your 200 response before doing any async work. If your handler takes longer than 5 seconds to respond, the webhook will be retried, and you'll process the same completion event twice. The pattern above (respond → process async) is the right structure.
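Responding fast lowers the odds of a retry, but it can't rule one out, so it's worth making the handler idempotent as well. Here's a minimal sketch of a dedupe guard. claimEvent and processedEvents are hypothetical names, and the in-memory Set is an assumption that only holds for a single process.

```typescript
// In-memory dedupe guard (hypothetical). Fine for one process; with several
// instances, use a unique constraint or a shared store instead.
const processedEvents = new Set<string>()

// Returns true the first time a (taskId, status) pair is seen, false on repeats.
function claimEvent(taskId: string, status: string): boolean {
  const key = `${taskId}:${status}`
  if (processedEvents.has(key)) return false
  processedEvents.add(key)
  return true
}

// In the webhook handler, right after reading req.body:
//   if (!claimEvent(task_id, status)) return
```

A repeated delivery then becomes a no-op, and notifyUser never fires twice for the same completion.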
Step 5: Unified File URL — No More Expiry Bugs
This one deserves its own section because it's bitten so many teams.
When you integrate directly against most model APIs, the file URLs they return are temporary. They might be valid for 1 hour, 12 hours, or 24 hours depending on the provider. If your users access the URL after it expires — empty. If your queue runs slow and picks up a job 13 hours later — the file is gone.
Crun.ai API handles file storage for you. The URLs in output.url are persistent. You can store them directly in your database and reference them indefinitely.
That said, if you ever need to move files to your own storage (your own S3/OSS bucket for compliance, CDN reasons, etc.), here's a clean utility:
// lib/storage.ts
import fs from 'fs'
import path from 'path'
import { pipeline } from 'stream/promises'
import axios from 'axios'
export async function downloadToLocal(
  sourceUrl: string,
  destPath: string
): Promise<void> {
  const response = await axios.get(sourceUrl, {
    responseType: 'stream',
    timeout: 60_000,
  })
  // pipeline() rejects if either the download stream or the file write fails
  // and cleans up both streams; a bare pipe() would miss errors on the source side
  await pipeline(response.data, fs.createWriteStream(destPath))
  console.log(`Saved to: ${path.resolve(destPath)}`)
}
// Usage
await downloadToLocal(result.url, './outputs/product-video.mp4')
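Hard-coding the destination works for a one-off script. For batch jobs you'll probably want to derive it from the task. The sketch below is one way to do that: destPathFor is a hypothetical helper, and it assumes the file extension shows up at the end of the URL path, which may not hold for every URL.

```typescript
// Hypothetical helper: build a local path from the task ID plus the extension
// found at the end of the URL path, falling back to a caller-supplied default.
function destPathFor(
  taskId: string,
  sourceUrl: string,
  dir = './outputs',
  fallbackExt = 'bin'
): string {
  const pathname = new URL(sourceUrl).pathname // drops any query string
  const match = pathname.match(/\.([A-Za-z0-9]{1,5})$/)
  const ext = match ? match[1].toLowerCase() : fallbackExt
  // Replace anything that isn't a letter, digit, underscore, or hyphen.
  const safeId = taskId.replace(/[^A-Za-z0-9_-]/g, '_')
  return `${dir}/${safeId}.${ext}`
}
```

Used together with the utility above: downloadToLocal(result.url, destPathFor(result.taskId, result.url)).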
Step 6: Reusable Retry Wrapper
Network errors happen. Model services have transient blips. Here's a retry wrapper that works across all your Crun.ai API calls: image, video, text, everything.
// lib/retry.ts
interface RetryOptions {
maxAttempts?: number
baseDelayMs?: number
maxDelayMs?: number
shouldRetry?: (error: Error, attempt: number) => boolean
}
export async function withRetry<T>(
fn: () => Promise<T>,
options: RetryOptions = {}
): Promise<T> {
const {
maxAttempts = 3,
baseDelayMs = 1_000,
maxDelayMs = 30_000,
shouldRetry = defaultShouldRetry,
} = options
let lastError: Error
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
try {
return await fn()
} catch (err) {
lastError = err as Error
if (attempt === maxAttempts || !shouldRetry(lastError, attempt)) {
throw lastError
}
// Exponential backoff + jitter to prevent thundering herd
const exponential = baseDelayMs * Math.pow(2, attempt - 1)
const jitter = Math.random() * 1_000
const delay = Math.min(exponential + jitter, maxDelayMs)
console.warn(`[retry] Attempt ${attempt} failed. Retrying in ${Math.round(delay)}ms...`)
await new Promise((r) => setTimeout(r, delay))
}
}
throw lastError!
}
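function defaultShouldRetry(error: Error, attempt: number): boolean {
  // The interceptor in crun-client.ts prefixes messages with "[Crun API] <status>:",
  // so read the status from that prefix instead of searching the whole message
  // (a bare includes('400') would also match IDs or text that contain "400")
  const match = error.message.match(/^\[Crun API\] (\d{3}):/)
  // No HTTP status means a network failure or timeout: retry
  if (!match) return true
  const status = Number(match[1])
  // 429 (rate limit) is transient, so let the backoff handle it
  if (status === 429) return true
  // Don't retry other client errors (4xx): those are our fault
  if (status >= 400 && status < 500) return false
  // Do retry server errors (5xx)
  return true
}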
Using it is clean:
// Wrap any call with automatic retry
const image = await withRetry(
() => generateImage('a golden retriever on a beach', 'flux'),
{ maxAttempts: 3, baseDelayMs: 1_000 }
)
// Works identically for video
const taskId = await withRetry(
() => submitVideoTask(prompt, 'kling', webhookUrl),
{ maxAttempts: 2 }
)
Same retry logic. Every model. One implementation.
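To get a feel for what the defaults mean in wall-clock time, here's the same backoff formula pulled out as a pure function. backoffDelayMs is a name I made up for this sketch, and jitter is passed in as a parameter (zero here) so the output is deterministic.

```typescript
// Same formula as withRetry above; jitter is a parameter so results are repeatable.
function backoffDelayMs(
  attempt: number, // 1-based, matching withRetry
  baseDelayMs = 1_000,
  maxDelayMs = 30_000,
  jitterMs = 0
): number {
  const exponential = baseDelayMs * Math.pow(2, attempt - 1)
  return Math.min(exponential + jitterMs, maxDelayMs)
}

const schedule = [1, 2, 3, 4, 5, 6].map((attempt) => backoffDelayMs(attempt))
console.log(schedule) // [1000, 2000, 4000, 8000, 16000, 30000]
```

With the default maxAttempts of 3, only the first two delays are ever used, so a call that fails every time gives up after roughly 3 seconds of waiting plus jitter. The 30-second cap only kicks in from the sixth attempt onward.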
The Complete Flow, End to End
Here's how these pieces fit together in a real request handler:
// Example: Express route that handles both image and video generation
import express from 'express'
import { withRetry } from './lib/retry'
import { generateImage } from './lib/image'
import { submitVideoTask } from './lib/video-webhook'
import { db } from './db'
const app = express()
app.use(express.json()) // Parse JSON bodies so req.body is populated
app.post('/api/generate', async (req, res) => {
const { type, prompt, model } = req.body
try {
if (type === 'image') {
const result = await withRetry(() => generateImage(prompt, model))
await db.assets.create({ type: 'image', url: result.url, taskId: result.taskId })
return res.json({ success: true, url: result.url })
}
if (type === 'video') {
const taskId = await withRetry(() =>
submitVideoTask(prompt, model, `${process.env.SERVER_URL}/webhook/crun`)
)
await db.tasks.create({ taskId, status: 'pending', type: 'video' })
return res.json({ success: true, taskId, status: 'pending' })
}
res.status(400).json({ error: 'Invalid generation type' })
} catch (err) {
console.error('[generate] Error:', err)
res.status(500).json({ error: (err as Error).message })
}
})
Clean. Typed. Handles both sync and async generation patterns through the same route.
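One gap worth closing: req.body is untyped at runtime, so a bad model name only surfaces as a 422 after a round trip. A small validator in front of the handler catches that locally. This is a sketch; parseGenerateRequest and GenerateRequest are hypothetical names, and the allowed model lists simply mirror the ImageModel and VideoModel types from earlier in the post.

```typescript
type GenerateRequest =
  | { type: 'image'; prompt: string; model: 'flux' | 'dall-e-3' }
  | { type: 'video'; prompt: string; model: 'kling' | 'luma' }

const IMAGE_MODELS = ['flux', 'dall-e-3'] as const
const VIDEO_MODELS = ['kling', 'luma'] as const

// Hypothetical validator: returns a typed request, or a string describing the problem.
function parseGenerateRequest(body: unknown): GenerateRequest | string {
  if (typeof body !== 'object' || body === null) return 'Body must be a JSON object'
  const { type, prompt, model } = body as Record<string, unknown>
  if (typeof prompt !== 'string' || prompt.trim() === '') return 'prompt is required'

  if (type === 'image') {
    const chosen = model ?? 'flux' // same default as generateImage
    const found = IMAGE_MODELS.find((m) => m === chosen)
    return found ? { type, prompt, model: found } : `Unknown image model: ${String(chosen)}`
  }
  if (type === 'video') {
    const chosen = model ?? 'kling' // same default as submitVideoTask
    const found = VIDEO_MODELS.find((m) => m === chosen)
    return found ? { type, prompt, model: found } : `Unknown video model: ${String(chosen)}`
  }
  return 'Invalid generation type'
}
```

In the route, a string result maps to a 400 response. Anything else is a narrowed union, so the image and video branches get the correct model type without casts.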
Quick Reference: Common Errors
| Error | Likely Cause | Fix |
|---|---|---|
| 401: Unauthorized | Wrong or missing API key | Check Authorization: Bearer <key> header format |
| 422: Unprocessable Entity | Invalid parameter value | Check duration (only 5 or 10), size format |
| 429: Too Many Requests | Rate limit hit | Add backoff in retry wrapper, reduce concurrency |
| task status: failed | Content filter or prompt issue | Check prompt for policy violations, simplify |
| Webhook not received | Server not publicly accessible | Use ngrok locally, verify URL in Crun dashboard |
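For the 429 row specifically: if the response carries a Retry-After header, waiting that long usually beats guessing with exponential backoff. Retry-After is a standard HTTP header, but whether this API sends it is an assumption on my part, and you would need to read it from the response headers before the interceptor discards them. retryAfterMs is a hypothetical helper.

```typescript
// Hypothetical helper: convert a Retry-After header value into milliseconds.
// Returns undefined when the header is missing or unparseable, so the caller
// can fall back to its normal backoff.
function retryAfterMs(header: string | undefined, now: Date = new Date()): number | undefined {
  if (!header) return undefined
  const trimmed = header.trim()
  // Form 1: delay in whole seconds, e.g. "5"
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1_000
  // Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const when = Date.parse(trimmed)
  if (Number.isNaN(when)) return undefined
  return Math.max(0, when - now.getTime()) // never return a negative wait
}
```

A retry loop could then wait retryAfterMs(header) when it returns a number, and use its computed exponential delay otherwise.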
What Changed for Our Team
After migrating:
- AI-related maintenance tickets: from ~11/quarter to ~1/quarter
- Lines of AI integration code: from ~2,400 to ~380 (our code) + shared client
- Time to add a new AI capability: from ~1.5 days to ~2 hours
- Model switching for A/B tests: from 2-day migration to 1-line config change
The biggest change wasn't the numbers — it was that my engineers stopped treating the AI integration layer as "the thing that randomly breaks." It became infrastructure. Stable, boring, reliable infrastructure.
That's the goal.
Wrapping Up
The pattern here isn't magic — it's just good software engineering applied to AI integrations. Normalize your interfaces early, before your integration surface area becomes a domain unto itself.
Crun.ai API gives you a clean Task abstraction that handles the provider differences so you don't have to. If you're building an AI-powered app and you're on your second or third model integration, it's worth considering before the sprawl compounds.
The full repo for the examples in this post: bookmark it, fork it, adapt it.
Drop a comment if you've hit similar SDK sprawl issues — curious how others have approached it. 👇
