Xu Xinglian


Integrating AI Image & Video Generation: A Technical Deep Dive with Code Examples

As developers, we're constantly looking for ways to add value to our applications. AI-powered image and video generation has moved from experimental feature to must-have capability across industries—from e-commerce product visualization to content management systems, from social media tools to marketing automation platforms.

But here's the challenge: the AI model landscape changes weekly. A model that's state-of-the-art today might be surpassed by something faster and cheaper next month. Building directly against individual model providers creates tight coupling and technical debt.

Let's explore how modern aggregation platforms solve this problem and look at practical integration patterns with real code examples.

The Multi-Provider Challenge

If you've integrated AI generation capabilities, you know the pain points:

Inconsistent APIs: Each provider has different authentication schemes, request formats, and error handling patterns. One uses multipart form data, another expects base64-encoded JSON, a third uses presigned URLs.

Variable Performance: Cold start times range from instant to 30+ seconds. Some providers queue requests, others process immediately. Pricing structures vary wildly—per-request, per-second, per-pixel, credit-based systems.

Feature Fragmentation: Provider A has the best text-to-image model but weak video capabilities. Provider B excels at video but lacks editing features. Provider C offers editing but at 3x the cost.

According to Stack Overflow's 2024 Developer Survey, 76% of developers are using or plan to use AI tools in their workflow. But integration complexity remains the top barrier to adoption.
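
To make the fragmentation concrete, here's a minimal sketch of a response normalizer. The provider names and response shapes below are hypothetical stand-ins for the patterns described above, not real provider APIs:

```javascript
// Hypothetical response shapes illustrating the fragmentation above.
// Each "provider" returns its image in a different place and encoding;
// a thin normalizer maps them all onto one internal shape.
function normalizeImageResponse(provider, raw) {
  switch (provider) {
    case 'provider-a': // returns { output: { url: '...' } }
      return { kind: 'url', value: raw.output.url };
    case 'provider-b': // returns base64 JSON: { data: [{ b64_json: '...' }] }
      return { kind: 'base64', value: raw.data[0].b64_json };
    case 'provider-c': // returns a presigned URL: { result: { presigned_url } }
      return { kind: 'url', value: raw.result.presigned_url };
    default:
      throw new Error(`Unknown provider: ${provider}`);
  }
}
```

Downstream code then handles exactly one shape, and adding a provider means adding one case rather than touching every call site.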

Architecture Pattern: Unified Generation Layer

Modern platforms like WaveSpeedAI implement an aggregation pattern that abstracts provider differences behind unified APIs. Here's how this architecture works:

// Traditional multi-provider approach
const providers = {
  imageGen: new ProviderA({ apiKey: process.env.PROVIDER_A_KEY }),
  videoGen: new ProviderB({ apiKey: process.env.PROVIDER_B_KEY }),
  imageEdit: new ProviderC({ token: process.env.PROVIDER_C_TOKEN })
};

// Each requires different implementations
async function generateImage(prompt) {
  return await providers.imageGen.create({
    text: prompt,
    size: '1024x1024'
  });
}

async function generateVideo(prompt) {
  return await providers.videoGen.generate({
    prompt: prompt,
    duration: 5,
    resolution: '720p'
  });
}

// Unified approach
const wavespeed = {
  baseURL: 'https://api.wavespeed.ai/v1',
  apiKey: process.env.WAVESPEED_API_KEY
};

async function generate(model, params) {
  const response = await fetch(`${wavespeed.baseURL}/generate`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${wavespeed.apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ model, ...params })
  });

  if (!response.ok) {
    throw new Error(`Generation failed: ${response.statusText}`);
  }

  return await response.json();
}

// Same function signature for images and videos
const image = await generate('wavespeed-ai/z-image/turbo', {
  prompt: 'A serene mountain landscape',
  width: 1024,
  height: 1024
});

const video = await generate('alibaba/wan-2.6/text-to-video', {
  prompt: 'Ocean waves at sunset',
  duration: 5
});

Real-World Implementation: Image Generation Service

Let's build a production-ready image generation microservice with proper error handling, caching, and fallback logic:

import { createHash } from 'crypto';
import Redis from 'ioredis';

class ImageGenerationService {
  constructor(config) {
    this.apiKey = config.apiKey;
    this.baseURL = config.baseURL || 'https://api.wavespeed.ai/v1';
    this.redis = new Redis(config.redisUrl);
    this.defaultModel = 'wavespeed-ai/z-image/turbo';
    this.fallbackModel = 'bytedance/seedream-v4.5';
  }

  // Generate cache key from parameters
  // Note: JSON.stringify is key-order sensitive, so callers must build
  // params with a consistent key order (as generate() does below)
  getCacheKey(model, params) {
    const normalized = JSON.stringify({ model, ...params });
    return `img:${createHash('sha256').update(normalized).digest('hex')}`;
  }

  // Core generation with retry logic
  async generateWithRetry(model, params, maxRetries = 3) {
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        const response = await fetch(`${this.baseURL}/generate`, {
          method: 'POST',
          headers: {
            'Authorization': `Bearer ${this.apiKey}`,
            'Content-Type': 'application/json'
          },
          body: JSON.stringify({ model, ...params }),
          signal: AbortSignal.timeout(30000) // 30s timeout
        });

        if (!response.ok) {
          // Error bodies may not always be JSON; fall back to statusText
          const error = await response.json().catch(() => ({}));
          throw new Error(`API Error: ${error.message || response.statusText}`);
        }

        return await response.json();
      } catch (error) {
        console.error(`Attempt ${attempt} failed:`, error.message);

        if (attempt === maxRetries) {
          throw error;
        }

        // Exponential backoff
        await new Promise(resolve => 
          setTimeout(resolve, Math.pow(2, attempt) * 1000)
        );
      }
    }
  }

  // Public interface with caching and fallback
  // Public interface with caching and fallback
  async generate(prompt, options = {}) {
    const {
      model = this.defaultModel,
      width = 1024,
      height = 1024,
      useCache = true
    } = options;

    const params = { prompt, width, height };
    const cacheKey = this.getCacheKey(model, params);

    // Check cache first
    if (useCache) {
      const cached = await this.redis.get(cacheKey);
      if (cached) {
        console.log('Cache hit for:', prompt.substring(0, 50));
        return JSON.parse(cached);
      }
    }

    try {
      // Try primary model
      const result = await this.generateWithRetry(model, params);

      // Cache successful generation (24 hour TTL)
      if (useCache) {
        await this.redis.setex(cacheKey, 86400, JSON.stringify(result));
      }

      return result;
    } catch (error) {
      console.error('Primary model failed, trying fallback:', error.message);

      // Fallback to alternative model
      try {
        const result = await this.generateWithRetry(this.fallbackModel, params);

        if (useCache) {
          await this.redis.setex(cacheKey, 86400, JSON.stringify(result));
        }

        return result;
      } catch (fallbackError) {
        throw new Error(`All generation attempts failed: ${fallbackError.message}`);
      }
    }
  }

  // Batch generation with concurrency control
  async generateBatch(prompts, options = {}) {
    const { concurrency = 3 } = options;
    const results = [];

    for (let i = 0; i < prompts.length; i += concurrency) {
      const batch = prompts.slice(i, i + concurrency);
      // Promise.all rejects the whole batch if any prompt fails;
      // use Promise.allSettled instead to collect partial results
      const batchResults = await Promise.all(
        batch.map(prompt => this.generate(prompt, options))
      );
      results.push(...batchResults);
    }

    return results;
  }
}

// Usage example
const service = new ImageGenerationService({
  apiKey: process.env.WAVESPEED_API_KEY,
  redisUrl: process.env.REDIS_URL
});

// Single generation
const image = await service.generate(
  'A futuristic cityscape at night with neon lights',
  { width: 1920, height: 1080, quality: 'high' }
);

// Batch generation
const prompts = [
  'Mountain landscape',
  'Ocean sunset',
  'Forest path'
];
const images = await service.generateBatch(prompts, { concurrency: 2 });

Video Generation with Webhook Callbacks

Video generation takes longer than images (5-60 seconds depending on length and complexity). Synchronous APIs block during processing. Let's implement an async pattern with webhooks:

import { createHash, createHmac, timingSafeEqual } from 'crypto';

class VideoGenerationService {
  constructor(config) {
    this.apiKey = config.apiKey;
    this.baseURL = config.baseURL || 'https://api.wavespeed.ai/v1';
    this.webhookSecret = config.webhookSecret;
  }

  // Initiate async video generation
  async startGeneration(prompt, options = {}) {
    const {
      model = 'alibaba/wan-2.6/text-to-video',
      duration = 5,
      resolution = '720p',
      webhookUrl
    } = options;

    const response = await fetch(`${this.baseURL}/generate/async`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model,
        prompt,
        duration,
        resolution,
        webhook_url: webhookUrl
      })
    });

    if (!response.ok) {
      throw new Error(`Failed to start generation: ${response.statusText}`);
    }

    const { job_id, status } = await response.json();
    return { jobId: job_id, status };
  }

  // Poll for completion (alternative to webhooks)
  async pollStatus(jobId, maxAttempts = 60) {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      const response = await fetch(
        `${this.baseURL}/generate/status/${jobId}`,
        {
          headers: {
            'Authorization': `Bearer ${this.apiKey}`
          }
        }
      );

      if (!response.ok) {
        throw new Error(`Status check failed: ${response.statusText}`);
      }

      const { status, result, error } = await response.json();

      if (status === 'completed') {
        return result;
      }

      if (status === 'failed') {
        throw new Error(`Generation failed: ${error}`);
      }

      // Wait 2 seconds between polls
      await new Promise(resolve => setTimeout(resolve, 2000));
    }

    throw new Error('Generation timeout');
  }

  // Verify webhook signature (security). Use an HMAC keyed with the
  // shared secret (not a plain hash) and a constant-time comparison.
  // Ideally verify the raw request body bytes; re-serializing parsed
  // JSON only works if it matches the provider's serialization exactly.
  verifyWebhook(payload, signature) {
    const raw = Buffer.isBuffer(payload) || typeof payload === 'string'
      ? payload
      : JSON.stringify(payload);

    const expected = createHmac('sha256', this.webhookSecret)
      .update(raw)
      .digest('hex');

    const a = Buffer.from(expected, 'hex');
    const b = Buffer.from(signature || '', 'hex');
    return a.length === b.length && timingSafeEqual(a, b);
  }
}

// Express webhook endpoint
app.post('/webhooks/video-complete', async (req, res) => {
  const signature = req.headers['x-webhook-signature'];

  if (!videoService.verifyWebhook(req.body, signature)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }

  const { job_id, status, result } = req.body;

  if (status === 'completed') {
    // Process completed video
    await processVideo(job_id, result);
  }

  res.json({ received: true });
});

Feature Flagging for Model Selection

Don't hardcode model choices. Use feature flags to switch models without deployments:

class ModelSelector {
  constructor(flagService) {
    this.flags = flagService;
  }

  async getImageModel(quality = 'standard') {
    const modelConfig = await this.flags.get('image-generation-models');

    const models = {
      fast: modelConfig?.fast || 'wavespeed-ai/z-image/turbo',
      standard: modelConfig?.standard || 'wavespeed-ai/qwen-image/text-to-image-2512',
      premium: modelConfig?.premium || 'bytedance/seedream-v4.5'
    };

    return models[quality] || models.standard;
  }

  async getVideoModel(type = 'text-to-video') {
    const modelConfig = await this.flags.get('video-generation-models');

    const models = {
      'text-to-video': modelConfig?.textToVideo || 'alibaba/wan-2.6/text-to-video',
      'image-to-video': modelConfig?.imageToVideo || 'alibaba/wan-2.6/image-to-video',
      'video-edit': modelConfig?.videoEdit || 'kwaivgi/kling-video-o1/video-edit-fast'
    };

    return models[type] || models['text-to-video'];
  }
}

// Usage with LaunchDarkly, Split, or similar
const selector = new ModelSelector(featureFlagClient);

const model = await selector.getImageModel('premium');
const image = await generationService.generate(prompt, { model });
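
The fallback chain can be exercised without a real flag provider by stubbing the flag payload. This sketch mirrors the logic of getImageModel above; the flag payload shape is an assumption:

```javascript
// Resolve an image model from a flag payload, falling back to defaults
// when the flag service returns nothing or an unknown quality is passed
function resolveImageModel(modelConfig, quality = 'standard') {
  const models = {
    fast: modelConfig?.fast || 'wavespeed-ai/z-image/turbo',
    standard: modelConfig?.standard || 'wavespeed-ai/qwen-image/text-to-image-2512',
    premium: modelConfig?.premium || 'bytedance/seedream-v4.5'
  };
  return models[quality] || models.standard;
}
```

Unit-testing this resolution logic in isolation catches the most common flag bug: a missing or malformed payload silently routing all traffic to the wrong model.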

Cost Optimization Strategies

AI generation costs add up quickly at scale. Here's how to optimize:

class CostOptimizer {
  constructor(config) {
    this.costThresholds = config.costThresholds || {
      fast: 0.01,    // $0.01 per generation
      standard: 0.05, // $0.05 per generation
      premium: 0.15   // $0.15 per generation
    };
  }

  // Choose cheapest model that meets quality requirements
  selectModelByBudget(userTier, requiredQuality) {
    const budgets = {
      free: 'fast',
      basic: 'standard',
      pro: 'premium'
    };

    const selectedTier = budgets[userTier] || 'fast';

    // Don't downgrade below required quality
    const qualityRank = { fast: 1, standard: 2, premium: 3 };
    if (qualityRank[requiredQuality] > qualityRank[selectedTier]) {
      return requiredQuality;
    }

    return selectedTier;
  }

  // Estimate cost before generation (prices here are illustrative --
  // check the provider's current pricing page for real numbers)
  estimateCost(model, params) {
    const pricing = {
      'wavespeed-ai/z-image/turbo': 0.005,
      'wavespeed-ai/qwen-image/text-to-image-2512': 0.025,
      'bytedance/seedream-v4.5': 0.04,
      'alibaba/wan-2.6/text-to-video': 0.50,
      'alibaba/wan-2.6/image-to-video': 0.50
    };

    const baseCost = pricing[model] || 0.05;

    // Adjust for parameters
    let multiplier = 1;
    if (params.resolution === '1080p') multiplier *= 1.5;
    if (params.duration > 5) multiplier *= (params.duration / 5);

    return baseCost * multiplier;
  }

  // Track actual spending (db, startOfMonth, userBudgetLimit, and
  // notifyBudgetExceeded are application-specific helpers)
  async trackCost(userId, model, cost) {
    await db.costs.create({
      user_id: userId,
      model: model,
      cost: cost,
      timestamp: new Date()
    });

    // Check if user exceeded budget
    const monthlySpend = await db.costs.aggregate({
      user_id: userId,
      timestamp: { $gte: startOfMonth() }
    });

    if (monthlySpend > userBudgetLimit) {
      await notifyBudgetExceeded(userId, monthlySpend);
    }
  }
}
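
As a worked example of the multipliers above (using the same illustrative prices): a 10-second 1080p video at the hypothetical $0.50 base rate costs 0.50 × 1.5 × (10 / 5) = $1.50. The estimation logic as a standalone function:

```javascript
// Standalone version of the estimation logic above, with an abbreviated
// (and still illustrative) price table
function estimateCost(model, params) {
  const pricing = {
    'wavespeed-ai/z-image/turbo': 0.005,
    'alibaba/wan-2.6/text-to-video': 0.50
  };
  const baseCost = pricing[model] || 0.05;

  let multiplier = 1;
  if (params.resolution === '1080p') multiplier *= 1.5;       // resolution premium
  if (params.duration > 5) multiplier *= params.duration / 5; // pro-rated length
  return baseCost * multiplier;
}
```

Surfacing this estimate to users before they confirm a generation is an easy way to prevent surprise bills on long, high-resolution videos.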

Monitoring and Observability

Production AI services need proper monitoring:

import { metrics } from '@opentelemetry/api';

class GenerationMetrics {
  constructor() {
    this.meter = metrics.getMeter('generation-service');

    this.generationCounter = this.meter.createCounter('generations.total', {
      description: 'Total generation requests'
    });

    this.generationDuration = this.meter.createHistogram('generations.duration', {
      description: 'Generation duration in milliseconds'
    });

    this.generationCost = this.meter.createHistogram('generations.cost', {
      description: 'Generation cost in USD'
    });

    this.cacheHitRate = this.meter.createCounter('cache.hits', {
      description: 'Cache hit count (divide by generations.total for the hit rate)'
    });
  }

  recordGeneration(model, duration, cost, cached = false) {
    this.generationCounter.add(1, { model, cached: cached.toString() });
    this.generationDuration.record(duration, { model });
    this.generationCost.record(cost, { model });

    if (cached) {
      this.cacheHitRate.add(1, { model });
    }
  }
}

// Usage in the generation service. The instance is named genMetrics so
// it does not shadow the `metrics` import from @opentelemetry/api above.
const genMetrics = new GenerationMetrics();

async function generate(model, params) {
  const startTime = Date.now();

  try {
    const result = await actualGeneration(model, params);
    const duration = Date.now() - startTime;
    const cost = costOptimizer.estimateCost(model, params);

    genMetrics.recordGeneration(model, duration, cost, false);

    return result;
  } catch (error) {
    // Record failures too. Avoid putting raw error messages in metric
    // attributes in production -- they create unbounded cardinality.
    genMetrics.generationCounter.add(1, {
      model,
      status: 'failed'
    });
    throw error;
  }
}

Practical Testing Strategies

Testing AI generation is tricky since outputs are non-deterministic. Focus on integration tests:

describe('Image Generation Service', () => {
  let service;

  beforeAll(() => {
    service = new ImageGenerationService({
      apiKey: process.env.TEST_API_KEY,
      redisUrl: process.env.TEST_REDIS_URL
    });
  });

  test('generates image successfully', async () => {
    const result = await service.generate('test prompt', {
      width: 512,
      height: 512,
      useCache: false
    });

    expect(result).toHaveProperty('image_url');
    expect(result.image_url).toMatch(/^https?:\/\//);
  }, 60000); // 60s timeout for generation

  test('uses cache on duplicate requests', async () => {
    const prompt = 'cached test prompt';
    const options = { width: 512, height: 512 };

    // First request
    const result1 = await service.generate(prompt, options);

    // Second request should hit cache
    const startTime = Date.now();
    const result2 = await service.generate(prompt, options);
    const duration = Date.now() - startTime;

    expect(duration).toBeLessThan(100); // Cache should be fast
    expect(result1.image_url).toBe(result2.image_url);
  });

  test('handles API errors gracefully', async () => {
    await expect(
      service.generate('', { model: 'nonexistent-model' })
    ).rejects.toThrow();
  });
});

Resources and Further Reading

For developers looking to implement AI generation:

Platform Documentation: Check out WaveSpeedAI's API docs for detailed endpoint specifications, authentication guides, and model parameters.

Model Explorer: Browse available models at WaveSpeedAI's model catalog to compare capabilities, pricing, and performance characteristics.

Best Practices: The Google Cloud AI Best Practices guide offers excellent general guidance on production AI systems.

Cost Management: a16z's AI economics research provides insights into infrastructure costs and optimization strategies.

Conclusion

Integrating AI generation doesn't have to mean juggling dozens of provider APIs. Aggregation platforms provide production-ready infrastructure that handles the complexity while giving you flexibility to switch models as the landscape evolves.

The key is building the right abstraction layers:

  • Unified generation interfaces that work across models
  • Robust error handling and fallback logic
  • Caching to reduce costs and latency
  • Feature flags for zero-downtime model switching
  • Comprehensive monitoring and cost tracking

Whether you choose WaveSpeedAI or build your own abstraction layer, focus on decoupling your application logic from specific model implementations. The AI landscape will keep changing—your architecture should be ready for it.


Have questions about implementing AI generation in your stack? Drop them in the comments. I'd love to hear about your experiences and challenges with production AI integration.
