Aon infotech

Posted on Jun 16

Building a Text-to-Image Pipeline with Next.js — Architecture Decisions

#nextjs

When I built the backend for free AI image generator no credit card, the core challenge was connecting a Next.js frontend to a GPU inference endpoint efficiently. Here's the architecture thinking — not the specific implementation, but the decisions that shaped it.

The Core Architecture Problem

Text-to-image generation has an unusual request profile:

Each request is computationally expensive (GPU inference)
Requests are unpredictable — traffic spikes randomly
Each generation takes 5-20 seconds
Users expect real-time feedback during the wait This rules out the standard REST API pattern where you send a request and wait for a response. A 15-second HTTP connection held open per user doesn't scale.

The solution: async job queue pattern.

User submits prompt → 
Job created in queue → 
Job ID returned immediately → 
Frontend polls for status → 
Result delivered when ready

The Three Architecture Options

Option A — Synchronous (Simple, Doesn't Scale)

// app/api/generate/route.js
export async function POST(request) {
  const { prompt } = await request.json();

  // BAD: holds connection open for 15+ seconds
  const image = await runInference(prompt);
  return NextResponse.json({ image });
}

Problem: HTTP connections time out. Vercel serverless functions have execution limits. One slow generation blocks the connection.

Option B — Async with Polling (What Works)

// app/api/generate/route.js — Submit job
export async function POST(request) {
  const { prompt } = await request.json();

  // Submit to inference queue, get job ID immediately
  const jobId = await submitToQueue({ prompt });

  // Return immediately — don't wait for completion
  return NextResponse.json({ jobId, status: 'processing' });
}

// app/api/status/[jobId]/route.js — Check status
export async function GET(request, { params }) {
  const { jobId } = params;
  const result = await checkJobStatus(jobId);

  return NextResponse.json({
    status: result.status, // 'processing' | 'completed' | 'failed'
    image: result.image ?? null,
  });
}

Frontend polls /api/status/[jobId] every 2 seconds until complete.

Option C — WebSockets (Best UX, More Complex)

// Real-time updates via WebSocket
// Backend notifies frontend when job completes
// No polling overhead
// More infrastructure to maintain

For an early-stage product, Option B (polling) is the right tradeoff — simpler to implement and debug, good enough UX.

The Frontend — State Management

'use client';
import { useState, useCallback } from 'react';
import Image from 'next/image';

export default function ImageGenerator() {
  const [prompt, setPrompt] = useState('');
  const [status, setStatus] = useState('idle');
  // idle | submitting | processing | complete | error
  const [imageUrl, setImageUrl] = useState(null);

  const generate = useCallback(async () => {
    if (!prompt.trim() || status === 'processing') return;

    setStatus('submitting');

    try {
      // Submit job
      const submitRes = await fetch('/api/generate', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt }),
      });

      const { jobId } = await submitRes.json();
      setStatus('processing');

      // Poll for result
      await pollForResult(jobId);

    } catch {
      setStatus('error');
    }
  }, [prompt, status]);

  const pollForResult = async (jobId) => {
    const maxAttempts = 60; // 2 min timeout

    for (let i = 0; i < maxAttempts; i++) {
      await new Promise(r => setTimeout(r, 2000));

      const statusRes = await fetch(`/api/status/${jobId}`);
      const data = await statusRes.json();

      if (data.status === 'completed') {
        setImageUrl(data.image);
        setStatus('complete');
        return;
      }

      if (data.status === 'failed') {
        setStatus('error');
        return;
      }
    }

    setStatus('error'); // Timeout
  };

  return (
    <div className="flex flex-col gap-4 max-w-2xl mx-auto p-6">
      <textarea
        value={prompt}
        onChange={(e) => setPrompt(e.target.value)}
        placeholder="Describe the image you want..."
        className="w-full p-3 border border-border rounded-xl 
          resize-none h-24 bg-background text-foreground"
        disabled={status === 'processing'}
      />

      <button
        onClick={generate}
        disabled={status === 'processing' || !prompt.trim()}
        className="bg-orange-500 text-white px-6 py-3 
          rounded-full font-semibold transition-colors
          hover:bg-orange-600 disabled:opacity-50"
      >
        {status === 'processing' ? 'Generating...' : 'Generate'}
      </button>

      {/* Loading skeleton — matches output dimensions */}
      {status === 'processing' && (
        <div className="w-full aspect-square rounded-2xl 
          bg-neutral-100 dark:bg-neutral-800 animate-pulse" />
      )}

      {/* Generated result */}
      {status === 'complete' && imageUrl && (
        <div className="relative w-full aspect-square 
          rounded-2xl overflow-hidden">
          <Image
            src={imageUrl}
            alt={prompt}
            fill
            className="object-cover"
            priority
          />
          <a
            href={imageUrl}
            download="generated.png"
            className="absolute bottom-4 right-4 
              bg-white/90 backdrop-blur text-black 
              px-4 py-2 rounded-full text-sm font-semibold
              hover:bg-white transition-colors"
          >
            Download
          </a>
        </div>
      )}

      {status === 'error' && (
        <p className="text-red-500 text-sm text-center">
          Something went wrong. Try again.
        </p>
      )}
    </div>
  );
}

Key UX Decisions

Skeleton loading over spinner

The skeleton placeholder has the exact dimensions of the output image. When the result arrives, no layout shift — the image drops into the same space.

{/* Dimensions match expected output */}
{status === 'processing' && (
  <div className="w-full aspect-square rounded-2xl 
    bg-neutral-100 animate-pulse" />
)}

Disable input during generation

Users shouldn't be able to submit a second job while one is processing. Disable both the input and the button.

Immediate button state change

The button should reflect 'submitting' state before the API call returns — not after. Users notice the lag.

Rate Limiting Without Accounts

Without user accounts, rate limiting falls on IP address:

// middleware.js
const requestCounts = new Map();

export function middleware(request) {
  const ip = request.headers.get('x-forwarded-for') 
    ?? 'unknown';
  const now = Date.now();
  const window = 60_000; // 1 minute
  const limit = 8; // requests per window

  const history = (requestCounts.get(ip) ?? [])
    .filter(t => now - t < window);

  if (history.length >= limit) {
    return new Response('Rate limit exceeded', { 
      status: 429 
    });
  }

  requestCounts.set(ip, [...history, now]);
  return NextResponse.next();
}

export const config = {
  matcher: '/api/generate',
};

Important: In-memory rate limiting resets on server restart and doesn't work across multiple instances. For production, use a distributed cache for rate limit state.

Performance Considerations

Problem	Solution
Cold starts on inference	Keep minimum workers warm
Large image payloads	Use CDN URL instead of base64
Layout shift on load	Pre-sized skeleton placeholder
Polling overhead	Stop polling after completion
Timeout handling	Max attempts with user feedback

What I'd Do Differently

Webhooks over polling. The inference backend notifying the frontend when complete is cleaner than the frontend repeatedly asking. Polling works, but it's chatty.

Better error categorization. A single 'error' state isn't enough. Timeout is different from inference failure is different from invalid prompt. Each deserves different user messaging.

For a full breakdown of how the no-account architecture affects product decisions, I wrote a detailed post here.

Questions about the architecture? Comments open.

DEV Community