DEV Community

田WB

How I Built an AI Birthday Photo Generator with Cloudflare Workers, Gemini 2.5 Flash, and FLUX.2 Pro

TL;DR: Upload a selfie → Gemini analyzes the photo and writes 3 birthday scene prompts → FLUX.2 Pro generates the images → Cloudflare R2 stores them. The whole backend runs on Cloudflare Workers with zero servers.

I recently launched bdayphoto.com, an AI-powered birthday photo generator. Upload your photo and get 3 unique AI birthday celebration scenes in about 60 seconds. Here's how I built the whole thing on Cloudflare's serverless stack.

Before → After — one selfie, three AI-generated birthday scenes:

[Image grid: original selfie beside the three AI-generated birthday scenes]

The Stack

Frontend: Next.js 15 (App Router) → static export → Cloudflare Pages
Backend:  Cloudflare Workers (TypeScript)
Database: Cloudflare D1 (SQLite)
Storage:  Cloudflare R2
Cache:    Cloudflare KV (sessions)
Queue:    Cloudflare Queues
AI:       Gemini 2.5 Flash (via Replicate) + BFL FLUX.2 Pro
Payment:  PayPal
Auth:     Google OAuth

The entire backend runs on a single Cloudflare Worker. No EC2, no containers, no ops headaches.


The Core Pipeline

The generation flow looks like this:

User uploads photo
      ↓
Worker validates + deducts credits + enqueues job
      ↓
Queue consumer picks it up (runs up to 15 min)
      ↓
Step 1: Gemini 2.5 Flash analyzes the photo → outputs 3 scene prompts (JSON)
      ↓
Step 2: Submit 3 FLUX.2 Pro jobs sequentially → each fires a webhook when done
      ↓
Webhook handler saves images to R2, finalizes task
      ↓
User polls /api/task/:id → gets results
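The consumer side of this pipeline can be sketched as a Worker `queue()` handler. The `GenerateJob` payload shape and the `buildJob` helper are my assumptions for illustration; the post doesn't show its exact message format, and the Cloudflare ambient types are replaced by minimal local interfaces so the snippet stands alone:

```typescript
// Hypothetical job payload -- the post doesn't show its exact message shape.
interface GenerateJob {
  taskId: string;
  userId: string;
  photoKey: string; // R2 key of the uploaded selfie
}

// Producer side: the fetch handler enqueues after validating + deducting credits.
function buildJob(taskId: string, userId: string, photoKey: string): GenerateJob {
  return { taskId, userId, photoKey };
}

// Minimal structural types (normally provided by @cloudflare/workers-types).
interface QueueMessage { body: GenerateJob; ack(): void; retry(): void }
interface MessageBatch { messages: QueueMessage[] }

export default {
  // Consumer side: Cloudflare Queues allows up to 15 minutes of wall time here.
  async queue(batch: MessageBatch, env: unknown): Promise<void> {
    for (const msg of batch.messages) {
      try {
        // Step 1: Gemini analysis; Step 2: sequential FLUX.2 Pro submissions.
        msg.ack();
      } catch {
        msg.retry(); // redelivered later, so downstream writes must be idempotent
      }
    }
  },
};
```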

Step 1: Gemini Analyzes the Photo

The hardest part wasn't the image generation. It was writing a prompt good enough to make Gemini output exactly what FLUX needs.

I use Gemini 2.5 Flash via Replicate's API. The system prompt is ~800 words and instructs Gemini to:

  1. Count people in the foreground (ignore background bystanders)
  2. Describe each person's face features in detail (for face preservation in the generated image)
  3. Design 3 completely different birthday scene themes
  4. Output a structured JSON with start_prompt, end_prompt, and 3 scenes

The JSON structure separates shared prompt parts from scene-specific content:

{
  "people_count": 1,
  "start_prompt": "This is the same person from the reference photo. Preserve their exact face shape...",
  "end_prompt": "Shot on Canon EOS R5, 85mm f/1.4, photorealistic, 8K ultra HD",
  "scenes": [
    { "name": "Golden_Gala", "prompt": "Glamorous gold ballroom with 40 metallic gold balloons..." },
    { "name": "Tropical_Paradise", "prompt": "Vibrant beach party setting with palm trees..." },
    { "name": "Enchanted_Garden", "prompt": "Magical outdoor garden with floral arches..." }
  ]
}

When submitting to FLUX, I assemble the full prompt as:

const fullPrompt = [analysis.start_prompt, scene.prompt, analysis.end_prompt]
  .filter(Boolean)
  .join(' ');

This makes it easy to tune the shared "face preservation" instructions without touching each scene prompt.
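One thing worth hardening is the parse step: models sometimes wrap JSON output in a markdown code fence. Here's a sketch of how I'd validate Gemini's response before using it; the type names are mine, not the post's actual code:

```typescript
// Shapes mirroring the JSON structure shown above.
interface Scene { name: string; prompt: string }
interface Analysis {
  people_count: number;
  start_prompt: string;
  end_prompt: string;
  scenes: Scene[];
}

// Parse Gemini's raw text output into a validated Analysis object.
function parseAnalysis(raw: string): Analysis {
  // Strip an optional markdown fence the model may wrap around the JSON.
  const cleaned = raw
    .replace(/^\s*```(?:json)?\s*/i, '')
    .replace(/\s*```\s*$/, '');
  const data = JSON.parse(cleaned) as Analysis;
  if (!Array.isArray(data.scenes) || data.scenes.length !== 3) {
    throw new Error(`expected 3 scenes, got ${data.scenes?.length ?? 0}`);
  }
  return data;
}
```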


Step 2: FLUX.2 Pro Image Generation

I use BFL's FLUX.2 Pro directly (not via Replicate or fal.ai). The BFL API supports:

  • input_image + input_image_2: both set to the user's photo — this enables face-consistent generation
  • webhook_url + webhook_secret: BFL calls back when done instead of polling
const body = {
  prompt: fullPrompt,
  input_image: dataUri,
  input_image_2: dataUri,   // face reference
  width: 1080,
  height: 1920,
  output_format: 'jpeg',
  safety_tolerance: 5,
  webhook_url: webhookUrl,
  webhook_secret: webhookSecret,
};

I submit 3 scenes sequentially, not in parallel. BFL's API occasionally returns "Task not found" errors when you hammer it too fast — sequential submission with a small gap is much more reliable.
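The sequential submission loop might look like this. The 1.5-second gap is an illustrative value, and `submit` is an injected function standing in for the actual BFL fetch call:

```typescript
const DEFAULT_GAP_MS = 1500; // assumed spacing; tune against BFL's behavior

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Submit each scene one at a time, pausing between requests.
// `submit` performs the actual BFL API call and resolves to the BFL job id.
async function submitScenes(
  scenes: { name: string; prompt: string }[],
  submit: (prompt: string) => Promise<string>,
  gapMs: number = DEFAULT_GAP_MS,
): Promise<string[]> {
  const ids: string[] = [];
  for (let i = 0; i < scenes.length; i++) {
    if (i > 0) await sleep(gapMs); // no gap before the first submission
    ids.push(await submit(scenes[i].prompt));
  }
  return ids;
}
```

Awaiting each submission before starting the next keeps the request rate low enough to avoid the "Task not found" flakiness.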


Handling Webhooks on Cloudflare Workers

BFL fires a webhook when each image is ready. The webhook handler:

  1. Verifies the webhook_secret (a UUID generated per task)
  2. Downloads the image from BFL's CDN
  3. Uploads it to R2
  4. Updates the bfl_tasks record
  5. Checks if all 3 are done → finalizes the parent task
// Idempotent finalization using CAS update
const updateResult = await env.DB.prepare(
  `UPDATE tasks SET status = 'done', r2_key_1 = ?, r2_key_2 = ?, r2_key_3 = ?, updated_at = ?
   WHERE id = ? AND status != 'done'`
).bind(r2Keys[1], r2Keys[2], r2Keys[3], now, taskId).run();

if (!updateResult.meta.changes) {
  // Already finalized by another concurrent webhook, skip
  return;
}

The AND status != 'done' guard makes the finalization idempotent — safe even if two webhooks arrive simultaneously.
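Steps 1 through 3 of the handler might be shaped like this. The payload field names, the R2 binding name, and the key layout are all my assumptions; BFL's actual webhook body and the post's real bindings may differ:

```typescript
// Illustrative R2 key layout (an assumption, not the post's actual scheme).
function r2KeyFor(taskId: string, sceneIndex: number): string {
  return `results/${taskId}/scene_${sceneIndex}.jpg`;
}

// Sketch of the webhook handler. Loose structural types keep the snippet
// self-contained; in the Worker these would be Request / R2Bucket.
async function handleBflWebhook(
  request: { json(): Promise<any> },
  env: { BUCKET: { put(key: string, body: any, opts?: any): Promise<any> } },
  storedSecret: string, // webhook_secret loaded from the matching bfl_tasks row
  taskId: string,
  sceneIndex: number,
): Promise<number> {
  const payload = await request.json();
  if (payload.webhook_secret !== storedSecret) return 401; // step 1: verify

  const img = await fetch(payload.result.sample); // step 2: BFL's CDN URL
  await env.BUCKET.put(r2KeyFor(taskId, sceneIndex), img.body, {
    httpMetadata: { contentType: 'image/jpeg' },  // step 3: store in R2
  });
  return 200;
  // steps 4-5: update bfl_tasks, then attempt the CAS finalization shown above
}
```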


Webhook Loss Compensation

Webhooks can fail or get lost. I added a polling compensation mechanism triggered when the user polls /api/task/:id:

// If task is generating and it's been > 30 seconds since submission,
// actively poll BFL's result API for any bfl_tasks that haven't completed
if (task.status === 'generating') {
  ctx.waitUntil(compensateMissingResults(taskId, userId, env));
}

The compensation function queries bfl_tasks where status = 'generating' and the task was submitted more than 30 seconds ago, then polls BFL directly. ctx.waitUntil() runs it asynchronously without delaying the HTTP response.
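The overdue check itself is a small pure function. This sketch assumes ISO-8601 timestamps, which matches the TEXT date columns in the schema:

```typescript
const COMPENSATION_DELAY_MS = 30_000; // the 30-second threshold from the post

// A bfl_tasks row qualifies for direct BFL polling only if it is still
// 'generating' and was submitted more than 30 seconds ago.
function needsCompensation(status: string, submittedAt: string, nowMs: number): boolean {
  if (status !== 'generating') return false;
  return nowMs - Date.parse(submittedAt) > COMPENSATION_DELAY_MS;
}
```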


D1 Schema Design

The database has three main tables:

-- Tracks the overall generation job
CREATE TABLE tasks (
  id TEXT PRIMARY KEY,
  user_id TEXT NOT NULL,
  status TEXT NOT NULL,         -- pending → analyzing → generating → done
  gemini_analysis TEXT,         -- raw JSON from Gemini
  analyze_duration_sec REAL,
  scene_name_1 TEXT, scene_name_2 TEXT, scene_name_3 TEXT,
  r2_key_1 TEXT, r2_key_2 TEXT, r2_key_3 TEXT,
  credits_cost INTEGER,
  error_message TEXT,
  expires_at TEXT,
  created_at TEXT, updated_at TEXT
);

-- One record per FLUX job (3 per task)
CREATE TABLE bfl_tasks (
  id TEXT PRIMARY KEY,
  task_id TEXT NOT NULL,
  scene_index INTEGER NOT NULL,  -- 1, 2, or 3
  bfl_id TEXT,                   -- BFL's job ID
  polling_url TEXT,
  webhook_secret TEXT,
  status TEXT NOT NULL,          -- pending → generating → saving → done | failed
  r2_key TEXT,
  error_message TEXT,
  created_at TEXT, updated_at TEXT
);

-- Tracks concurrent generation lock per user
-- users table has a `generating_since` TEXT field
-- Atomic lock: UPDATE users SET generating_since = ? WHERE id = ? AND (generating_since = '' OR generating_since < ?)

The atomic lock that prevents concurrent generations is a single UPDATE with a conditional WHERE clause: if the update affects 0 rows, another generation for that user is already in flight.
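In Worker code, acquiring that lock could look like the following. The lock TTL (for recovering from a crashed job) is my assumption; the SQL mirrors the schema comment above:

```typescript
const LOCK_TTL_MS = 20 * 60 * 1000; // assumed TTL for stuck locks

// ISO timestamp before which an existing lock is considered stale.
function staleCutoffIso(now: Date, ttlMs: number = LOCK_TTL_MS): string {
  return new Date(now.getTime() - ttlMs).toISOString();
}

// Returns true if we acquired the lock; 0 affected rows means another
// generation is already in flight (or the row id didn't match).
async function tryAcquireLock(
  env: { DB: any }, // D1Database binding in the real Worker
  userId: string,
  now: Date,
): Promise<boolean> {
  const res = await env.DB.prepare(
    `UPDATE users SET generating_since = ?
     WHERE id = ? AND (generating_since = '' OR generating_since < ?)`
  ).bind(now.toISOString(), userId, staleCutoffIso(now)).run();
  return (res.meta.changes ?? 0) > 0;
}
```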


Credits System

I track credits with an event log pattern:

// Deduct credits + create task + log — all in one D1 batch
const batch = [
  env.DB.prepare(
    'UPDATE users SET credits = credits - ? WHERE id = ? AND credits >= ?'
  ).bind(creditsCost, userId, creditsCost),
  env.DB.prepare(
    'INSERT INTO tasks ...'
  ).bind(taskId, ...),
  env.DB.prepare(
    'INSERT INTO credit_logs (change_amount, balance_after, reason) VALUES (?, ?, ?)'
  ).bind(-creditsCost, newBalance, 'generate'),
];
await env.DB.batch(batch);

If the first statement affects 0 rows (concurrent deduction), I clean up and return a 403. The batch is atomic at the D1 level.

Partial failures are handled gracefully: if 1 of 3 FLUX jobs fails, I refund ceil(credits_cost * failed_count / 3) credits.
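The refund formula, rounding up in the user's favor, is just:

```typescript
// Refund a proportional share of the task's cost when k of the 3 scenes fail.
// Math.ceil rounds the refund up, so rounding error favors the user.
function refundFor(creditsCost: number, failedCount: number, totalScenes = 3): number {
  return Math.ceil((creditsCost * failedCount) / totalScenes);
}
```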


Frontend: Next.js + Static Export

The frontend is Next.js 15 with output: 'export' in the Next config, deployed to Cloudflare Pages. Static export means there's no Node.js server, just HTML/CSS/JS on the CDN.

For SEO, I needed server components to export metadata. Since the app was originally all 'use client', I refactored each page:

src/app/page.tsx          ← server component, exports metadata
src/components/home-page.tsx  ← client component with all the interactivity

This way Next.js can statically render the <head> with proper <title>, <meta>, and JSON-LD tags while keeping the interactive bits client-side.
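A minimal sketch of the server-component wrapper; the title and description strings are placeholders, not the site's real copy, and in the actual file metadata would be typed with Next's Metadata type:

```typescript
// src/app/page.tsx (sketch) -- server component: metadata only, no hooks.
export const metadata = {
  title: 'AI Birthday Photo Generator | bdayphoto.com', // placeholder copy
  description: 'Upload a selfie, get 3 AI birthday celebration scenes.',
};

// The default export would just render the client component:
//   import HomePage from '@/components/home-page';
//   export default function Page() { return <HomePage />; }
// (left as a comment here, since JSX needs the Next.js build setup)
```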


Lessons Learned

1. Queue consumers are the secret weapon of Cloudflare Workers.
Without Queues, you'd have to orchestrate long jobs externally. With Queues, you get 15 minutes of wall time for complex AI pipelines.

2. Webhook + polling compensation is more robust than polling alone.
Webhooks are fast but can be lost. Pure polling is slow. The combination — webhooks for speed, compensation polling as a fallback — gives you reliability without burning money on unnecessary API calls.

3. Idempotent operations everywhere.
Multiple webhooks can fire. Queues can retry messages. D1 batches can partially fail. Design every write as a CAS (Compare-And-Swap) update with conditional WHERE clauses. OR IGNORE and AND status != 'done' guards have saved me from data corruption multiple times.

4. BFL over Replicate for FLUX.2 Pro.
BFL is the official API from Black Forest Labs (the FLUX team). It's cheaper, faster, and the parameters are documented correctly. Third-party platforms like Replicate have slightly different parameter semantics.

5. Split Gemini's output into start_prompt + scenes + end_prompt.
This separation makes it easy to iterate on face-preservation instructions globally without regenerating every scene prompt. It also keeps individual scene prompts concise and focused.


What's Next

  • Pinterest strategy for organic reach
  • More scene themes (holiday, vintage, anime style)
  • Batch generation improvements

Check out the live product at bdayphoto.com — you get 10 free credits on signup (no credit card required).

Happy to answer any questions about the architecture in the comments!

