Dereje Getahun

Building an AI Recipe Generator with GPT-4o & Next.js (from prompt to plate)

“Choose ingredients, hit Generate, and watch GPT-4o cook.”

In this post I’ll break down how I wired Next.js 14, OpenAI, and MongoDB into a side-project that now serves 655 users.

1 · Why this project?

Confession: I don’t cook—at all. Yet I kept wishing there were a one-click way to turn leftover pantry items into a real recipe without learning to sauté. So I set out to build a tool that could:

  1. Transform any ingredient list into a full recipe in under 30 seconds.
  2. Show the dish & read the steps aloud (DALL·E image + TTS) so even non-cooks can follow along.
  3. Run on hobby-tier costs (≈ $50 / year for OpenAI + AWS at today’s usage).

That side project became Smart Recipe Generator—a Next.js 14 app that feeds ingredients to GPT-4o, generates a photo with DALL·E, and streams the recipe via TTS.

Today it’s fully open-source, with 655 signed-up users who’ve created 390 recipes (and counting).

(Costs scale with usage, so heavier cooking sprees will bump that $50/year, but it stays cheap at typical traffic.)

2 · Stack overview

| Layer | Tech |
| --- | --- |
| Front-end | React (Next.js 14), Tailwind |
| AI | OpenAI GPT-4o + DALL·E |
| Data | MongoDB Atlas, pgvector (future) |
| Hosting | Vercel + AWS S3/CloudFront |

3 · Prompt engineering 101

Smart Recipe Generator uses 7 tiny prompts that each do one job:

| Helper | What it asks GPT-4o to do |
| --- | --- |
| getRecipeGenerationPrompt | 3 diverse recipes from the ingredient list + diet prefs |
| getImageGenerationPrompt | Craft a photorealistic DALL·E prompt for the dish |
| getIngredientValidationPrompt | Check if “cuscus” is legit & suggest corrections |
| getRecipeNarrationPrompt | Turn the JSON recipe into a 60-90 s audio script |
| getRecipeTaggingPrompt | Spit out 10 SEO-friendly one-word tags |
| getChatAssistantSystemPrompt | Set boundaries for the recipe-specific chat |

(plus a minimal system prompt for ingredient-search autocomplete)

Below is the star of the show—the recipe-generation prompt (trimmed for readability):

```typescript
export const getRecipeGenerationPrompt = (
  ingredients: Ingredient[],
  dietaryPreferences: DietaryPreference[]
) => `
I have the following ingredients: ${JSON.stringify(ingredients)}
${dietaryPreferences.length ? `and dietary preferences: ${dietaryPreferences.join(',')}` : ''}.
Please provide **three** delicious and diverse recipes in **valid JSON**:

[
  {
    "name": "Recipe Name",
    "ingredients": [
      { "name": "...", "quantity": "..." }
    ],
    "instructions": ["Do this first.", "Then do this."],
    "dietaryPreference": ["vegan"],
    "additionalInformation": {
      "tips": "...",
      "variations": "...",
      "servingSuggestions": "...",
      "nutritionalInformation": "..."
    }
  }
]

*No extra text, markdown, or step numbers.*  
Ensure recipes differ in cuisine/type and respect diet prefs.  
Quantities must include units.
`;
```

Why this design works

  1. Exact JSON schema → zero post-processing headaches (see the parsing sketch below).

  2. Three recipes per call keeps users from hammering the button.

  3. Single prompt → cost ≈ $0.004 per generation at today’s rates.

  4. The other six prompts keep image, audio, and chat tasks isolated, so each subsystem can evolve independently.
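
To see why the strict schema matters in practice, here is a minimal sketch of the server-side call that consumes this prompt. The helper name `generateRecipes` and the inline client setup are illustrative, not the repo's exact code:

```typescript
// Illustrative sketch, not the repo's exact implementation.
// Ingredient and DietaryPreference are the app's shared types (imports omitted).
import OpenAI from 'openai';
import { getRecipeGenerationPrompt } from '@/lib/prompts';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function generateRecipes(
  ingredients: Ingredient[],
  dietaryPreferences: DietaryPreference[]
) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: getRecipeGenerationPrompt(ingredients, dietaryPreferences) }
    ]
  });

  // The prompt demands pure JSON, so a single JSON.parse is the only post-processing needed.
  return JSON.parse(completion.choices[0].message.content ?? '[]');
}
```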

4 · Generating images with DALL·E

Text-only recipes are helpful—but a hero shot sells the dish.

Every time a new recipe is saved, the backend fires one image call:

```typescript
// pages/api/save-recipes.ts (simplified)
import { getImageGenerationPrompt } from '@/lib/prompts';
import { s3 } from '@/lib/aws';                   // thin AWS SDK wrapper
// `openai` is the shared OpenAI client instance (import omitted in this simplified snippet)

export async function generateRecipeImage(recipe: ExtendedRecipe) {
  const prompt = getImageGenerationPrompt(recipe.name, recipe.ingredients);

  // 1️⃣ Call DALL·E 3
  const { data } = await openai.images.generate({
    model: 'dall-e-3',
    prompt,
    size: '1024x1024',
    n: 1,
    response_format: 'url'
  });

  // 2️⃣ Cache in S3 → Public CloudFront URL
  const imgBuffer = await fetch(data[0].url).then(r => r.arrayBuffer());
  const key = `recipes/${recipe._id}.png`;

  await s3.putObject({
    Bucket: process.env.AWS_S3_BUCKET,
    Key: key,
    Body: Buffer.from(imgBuffer),
    ContentType: 'image/png',
    ACL: 'public-read'
  });

  return `https://${process.env.CLOUDFRONT_DOMAIN}/${key}`;
}
```
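
The save route then only has to persist the returned CDN URL on the recipe document. A rough sketch, where the field name `imgLink` and the helper `attachHeroImage` are hypothetical:

```typescript
// Illustrative follow-up in the save-recipes handler; `imgLink` is a hypothetical field name.
import RecipeModel from '@/models/Recipe';

export async function attachHeroImage(recipe: ExtendedRecipe) {
  const imgLink = await generateRecipeImage(recipe);
  await RecipeModel.updateOne({ _id: recipe._id }, { $set: { imgLink } });
  return imgLink;
}
```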

The image prompt in plain English

“Create a high-resolution, photorealistic shot of ${recipeName} made from ${ingredient list}. Plate it on a clean white plate with natural lighting that highlights the key ingredients.”

That one sentence:

  1. Names the dish → DALL·E picks a plating style that matches the cuisine.
  2. Spells out every ingredient → improves visual accuracy (cilantro appears as garnish, etc.).
  3. Fixes environment (“clean white plate, natural lighting”) → keeps the gallery visually consistent.
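
In code, that helper can stay tiny. Here is a sketch, assuming each ingredient carries a `name` field; the repo's exact wording may differ:

```typescript
// Sketch of the image-prompt helper; the wording is approximate.
// Ingredient is the app's shared type (import omitted).
export const getImageGenerationPrompt = (recipeName: string, ingredients: Ingredient[]) => {
  const ingredientList = ingredients.map((i) => i.name).join(', ');
  return `Create a high-resolution, photorealistic shot of ${recipeName} made from ${ingredientList}. ` +
    `Plate it on a clean white plate with natural lighting that highlights the key ingredients.`;
};
```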

Cost & performance tricks

| Tweak | Impact |
| --- | --- |
| 1 image per recipe | Users only need one hero; cuts DALL·E cost by ⅔. |
| 512×512 on dev, 1024×1024 on prod | Speeds local tests; full-res in production. |
| S3 + CloudFront | First view pulls from S3; subsequent views are CDN-cached (≈ 30 ms). |
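
One hedged way to implement the dev/prod size split from the table above is a simple environment switch. Note that DALL·E 3 only accepts 1024 px and larger, so a 512×512 dev image implies falling back to DALL·E 2; the function name `createDishImage` is illustrative:

```typescript
// Hypothetical environment switch for cheaper dev images.
// `openai` is the shared client used in the snippets above.
export async function createDishImage(prompt: string) {
  const isProd = process.env.NODE_ENV === 'production';

  const { data } = await openai.images.generate({
    // DALL·E 3 only supports 1024px and up, so the smaller dev size uses DALL·E 2.
    model: isProd ? 'dall-e-3' : 'dall-e-2',
    size: isProd ? '1024x1024' : '512x512',
    prompt,
    n: 1,
    response_format: 'url',
  });

  return data[0].url;
}
```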

With the image URL safely stored in MongoDB, the front-end just feeds it to Next.js’ `<Image>` component and enjoys instant, CDN-cached dishes across the app.
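
Rendering that stored URL on the client is then plain Next.js. A minimal sketch, with illustrative component and prop names:

```tsx
// Illustrative hero-image component; field names are hypothetical.
import Image from 'next/image';

export function RecipeHero({ name, imgLink }: { name: string; imgLink: string }) {
  return (
    <Image
      src={imgLink}                 // CloudFront URL stored on the recipe document
      alt={name}
      width={1024}
      height={1024}
      priority                      // hero image, load eagerly
      className="rounded-xl object-cover"
    />
  );
}
```

Remote images also need the CloudFront domain allow-listed, e.g. under `images.remotePatterns` in next.config.js.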

5 · On-demand narration with OpenAI TTS 🎧

Images are nice, but hands-free cooking is better.

Every recipe card has a Play button; the first time any user clicks it we:

  1. Check whether the recipe already has an audio URL.
  2. If not, call OpenAI TTS, store the MP3 in S3, save the URL to MongoDB.
  3. On all future clicks, we stream that cached MP3 instantly.

```typescript
// RecipeDisplayModal.tsx  (simplified)
if (!recipe.audio) {
  await fetch(`/api/tts?recipeId=${recipe._id}`);   // ⬅️ generates + stores MP3 once
}
// on this very first click recipe.audio is still undefined, so fall back to the streaming route
audio.src = recipe.audio ?? `/api/stream-audio?id=${recipe._id}`;
audio.load();
audio.play();
```

The /api/tts route (Next.js API route)

```typescript
// pages/api/tts.ts (simplified)
import { getRecipeNarrationPrompt } from '@/lib/prompts';
import { s3 } from '@/lib/aws';
import RecipeModel from '@/models/Recipe';               // Mongoose model
// `openai` is the shared OpenAI client instance (import omitted in this simplified snippet)

export default async function handler(req, res) {
  const recipe = await RecipeModel.findById(req.query.recipeId).lean();
  if (!recipe) return res.status(404).end();
  if (recipe.audio) return res.status(200).json({ url: recipe.audio });   // already cached ✅

  // 1️⃣ Build narration script
  const prompt = getRecipeNarrationPrompt(recipe);

  // 2️⃣ Call OpenAI TTS → binary MP3
  const speech = await openai.audio.speech.create({
    model: 'tts-1-hd',
    voice: 'alloy',
    input: prompt,
    response_format: 'mp3'
  });
  const audioBuffer = Buffer.from(await speech.arrayBuffer());

  // 3️⃣ Persist once in S3
  const key = `audio/${recipe._id}.mp3`;
  await s3.putObject({
    Bucket: process.env.AWS_S3_BUCKET,
    Key: key,
    Body: audioBuffer,
    ContentType: 'audio/mpeg',
    ACL: 'public-read'
  });

  // 4️⃣ Update recipe doc with CDN URL
  const url = `https://${process.env.CLOUDFRONT_DOMAIN}/${key}`;
  await RecipeModel.updateOne({ _id: recipe._id }, { $set: { audio: url } });

  return res.status(200).json({ url });
}
```

Mobile playback quirks solved

| Fix | Why it matters |
| --- | --- |
| `audio.preload = 'auto'` + explicit `audio.load()` | iOS Safari ignores autoplay without it |
| Blob fallback (fetch → blob → ObjectURL) | Handles browsers that block CORS streams |
| `playsinline` attribute | Prevents full-screen takeover in iOS |
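
Put together, the playback path looks roughly like this; a simplified sketch with an illustrative `playNarration` helper, not the exact modal code:

```typescript
// Illustrative sketch of the iOS-friendly playback path.
async function playNarration(audio: HTMLAudioElement, url: string) {
  audio.preload = 'auto';
  audio.setAttribute('playsinline', '');        // keep playback inline on iOS

  try {
    audio.src = url;
    audio.load();
    await audio.play();
  } catch {
    // Fallback: fetch → Blob → ObjectURL for browsers that block the direct stream
    const blob = await fetch(url).then((r) => r.blob());
    audio.src = URL.createObjectURL(blob);
    audio.load();
    await audio.play();
  }
}
```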

Cost snapshot

  • One TTS call per unique recipe → no repeat charges
  • Avg. 90-sec script ≈ $0.002 using tts-1-hd
  • Total TTS spend to date: <$5 / year at current play volume

With text, image, and audio all CDN-cached, first-time load is sub-500 ms and repeat plays are instant.

6 · Zero-touch deployment on Vercel 🚀

One-click previews, automatic prod

Smart Recipe Generator ships the same way every time:

  1. Push → GitHub
  2. Vercel spins up a preview build for the PR
  3. GitHub Actions runs Cypress E2E in parallel
  4. If tests pass and the PR is merged into main, Vercel promotes that build to production

This flow gives you per-branch preview URLs (e.g. https://smart-recipe-generator-git-feat-tags-dereje1.vercel.app) while keeping a green-tested main branch.

Vercel project settings

| Setting | Value |
| --- | --- |
| Framework | Next.js 14 |
| Build command | `next build` (default) |
| Output | `.next` |
| Environment variables | OPENAI_API_KEY, MONGODB_URI, AWS_S3_BUCKET, etc. (added in Vercel Dashboard → Settings ▸ Environment Variables) |
| Custom domain | smart-recipe-generator.vercel.app + CNAME for your own domain |

GitHub Actions → Cypress E2E

All unit tests run inside Vercel’s build, but full-browser E2E lives in a separate CI job so a flaky UI test never blocks a deploy.

```yaml
# .github/workflows/e2e.yml
name: E2E Tests (Local Dev Server)

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  cypress:
    runs-on: ubuntu-latest
    env:
      NEXT_PUBLIC_API_BASE_URL: http://localhost:3000
      MONGO_URI: mongodb://127.0.0.1:27017/dummy
      E2E: 1
    steps:
      - uses: actions/checkout@v3

      - name: Install dependencies
        run: npm ci

      - name: Run Cypress E2E tests locally
        run: npm run test:e2e
```

Why separate?

  • Vercel deploys even if Cypress fails → you can still test fixes on the live preview.
  • E2E can spawn its own Mongo container without slowing Vercel builds.
  • Parallelism: build + test overlap, total CI time ≈ 4-5 min.

Zero-downtime rollbacks

vercel --prod promotes an immutable build.
Rolling back is one click in the Vercel dashboard (select an older build → Promote).

Because images, audio, and static assets live on S3 + CloudFront, switching builds never touches user files—old URLs stay valid indefinitely.

Cost at today’s scale

| Service | Plan | Annual cost |
| --- | --- | --- |
| Vercel | Hobby Free (100 GB-hrs build) | $0 |
| MongoDB Atlas | M0 shared | $0 |
| AWS S3 + CloudFront | ~2 GB storage + 6 GB egress/mo | $11 |
| OpenAI API | GPT-4o, DALL·E, TTS | $39 |
| **Total** | | **≈ $50 / year** |

Costs scale linearly with generations & traffic, but CDN caching keeps egress surprisingly low.


With CI/CD locked in, new features ship the moment tests pass—no manual FTP, no servers to patch. Next we’ll wrap up with lessons learned and what’s coming next. 🎯

7 · Lessons learned & what’s next 🎯

Shipped ✅

  • Recipe-specific Chat Assistant – every dish now has its own GPT-4o mini-chat for substitutions, timing tweaks, & dietary swaps.
  • One-tap audio & hero image caching – keeps first-time AI cost low, repeat views free.

On the horizon 🚧

| Idea | Why it matters | ETA |
| --- | --- | --- |
| Step-by-step video generation | Watching beats reading. As soon as Gen-AI video is both good and affordable, each recipe will auto-render a short vertical clip you can follow in the kitchen. | Waiting on next-gen video APIs & pricing |
| Vector search with pgvector | “Show me recipes similar to 🍝 Spicy Chickpea Pasta.” Semantic search > keyword matching. | Q3-2025 |
| Offline-first PWA | Cache recipes, images, & MP3 locally so the app still works when Wi-Fi drops. | Q4-2025 |

R&D rabbit holes I’m exploring 🧪

  • OpenAI’s new *Codex Agent* – automatic code refactors, test stubs & PR generation. Early tests already cut implementation time by ~50 %.
  • GPT-4o Vision – letting users snap a pantry photo and get instant recipe suggestions.
  • LangSmith + LangChain – tracing token usage to keep that $50 / yr bill from ballooning.

👋 Your turn:

Have a feature idea? Open an issue or PR.

Want to hack on AI & Next.js? Check the good-first-issue label—contributors always welcome!


Thanks for reading! If this post helped you—or you just want more AI-powered food hacks—give the repo a ⭐️ and follow me for the next installment.


Try it / Star it ⭐

🌐 Live demo  → https://smart-recipe-generator.vercel.app

⭐️ GitHub → https://github.com/Dereje1/smart-recipe-generator
