Dereje Getahun

Building an AI Recipe Generator with GPT-4o & Next.js (from prompt to plate)

“Choose ingredients, hit Generate, and watch GPT-4o cook.”

In this post I’ll break down how I wired Next.js 14, OpenAI, and MongoDB into a side-project that now serves 655 users.

1 · Why this project?

Confession: I don’t cook—at all. Yet I kept wishing there were a one-click way to turn leftover pantry items into a real recipe without learning to sauté. So I set out to build a tool that could:

  1. Transform any ingredient list into a full recipe in under 30 seconds.
  2. Show the dish & read the steps aloud (DALL·E image + TTS) so even non-cooks can follow along.
  3. Run on hobby-tier costs (≈ $50 / year for OpenAI + AWS at today’s usage).

That side project became Smart Recipe Generator—a Next.js 14 app that feeds ingredients to GPT-4o, generates a photo with DALL·E, and streams the recipe via TTS.

Today it’s fully open-source, with 655 signed-up users who’ve created 390 recipes (and counting).

(Costs scale with usage, so heavier cooking sprees will bump that $50/year, but it stays cheap at typical traffic.)

2 · Stack overview

| Layer | Tech |
| --- | --- |
| Front-end | React (Next.js 14), Tailwind |
| AI | OpenAI GPT-4o + DALL·E |
| Data | MongoDB Atlas, pgvector (future) |
| Hosting | Vercel + AWS S3/CloudFront |

3 · Prompt engineering 101

Smart Recipe Generator uses 7 tiny prompts that each do one job:

| Helper | What it asks GPT-4o to do |
| --- | --- |
| getRecipeGenerationPrompt | 3 diverse recipes from the ingredient list + diet prefs |
| getImageGenerationPrompt | Craft a photorealistic DALL·E prompt for the dish |
| getIngredientValidationPrompt | Check if “cuscus” is legit & suggest corrections |
| getRecipeNarrationPrompt | Turn the JSON recipe into a 60-90 s audio script |
| getRecipeTaggingPrompt | Spit out 10 SEO-friendly one-word tags |
| getChatAssistantSystemPrompt | Set boundaries for the recipe-specific chat |

(plus a minimal system prompt for ingredient-search autocomplete)

Below is the star of the show—the recipe-generation prompt (trimmed for readability):

```typescript
export const getRecipeGenerationPrompt = (
  ingredients: Ingredient[],
  dietaryPreferences: DietaryPreference[]
) => `
I have the following ingredients: ${JSON.stringify(ingredients)}
${dietaryPreferences.length ? `and dietary preferences: ${dietaryPreferences.join(',')}` : ''}.
Please provide **three** delicious and diverse recipes in **valid JSON**:

[
  {
    "name": "Recipe Name",
    "ingredients": [
      { "name": "...", "quantity": "..." }
    ],
    "instructions": ["Do this first.", "Then do this."],
    "dietaryPreference": ["vegan"],
    "additionalInformation": {
      "tips": "...",
      "variations": "...",
      "servingSuggestions": "...",
      "nutritionalInformation": "..."
    }
  }
]

*No extra text, markdown, or step numbers.*  
Ensure recipes differ in cuisine/type and respect diet prefs.  
Quantities must include units.
`;
```

Why this design works

  1. Exact JSON schema → zero post-processing headaches (see the parsing sketch below).

  2. Three recipes per call keeps users from hammering the button.

  3. Single prompt → cost ≈ $0.004 per generation at today’s rates.

  4. The other six prompts keep image, audio, and chat tasks isolated, so each subsystem can evolve independently.
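
To see why the strict schema matters in practice, here is a minimal sketch of the server-side call that consumes this prompt. The helper name `generateRecipes` and the inline client setup are illustrative, not the repo's exact code:

```typescript
// Illustrative sketch, not the repo's exact implementation.
// Ingredient and DietaryPreference are the app's shared types (imports omitted).
import OpenAI from 'openai';
import { getRecipeGenerationPrompt } from '@/lib/prompts';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function generateRecipes(
  ingredients: Ingredient[],
  dietaryPreferences: DietaryPreference[]
) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: getRecipeGenerationPrompt(ingredients, dietaryPreferences) }
    ]
  });

  // The prompt demands pure JSON, so a single JSON.parse is the only post-processing needed.
  return JSON.parse(completion.choices[0].message.content ?? '[]');
}
```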

4 · Generating images with DALL·E

Text-only recipes are helpful—but a hero shot sells the dish.

Every time a new recipe is saved, the backend fires one image call:

```typescript
// pages/api/save-recipes.ts (simplified)
import { getImageGenerationPrompt } from '@/lib/prompts';
import { s3 } from '@/lib/aws';                   // thin AWS SDK wrapper
// `openai` is the shared OpenAI client instance (import omitted in this simplified snippet)

export async function generateRecipeImage(recipe: ExtendedRecipe) {
  const prompt = getImageGenerationPrompt(recipe.name, recipe.ingredients);

  // 1️⃣ Call DALL·E 3
  const { data } = await openai.images.generate({
    model: 'dall-e-3',
    prompt,
    size: '1024x1024',
    n: 1,
    response_format: 'url'
  });

  // 2️⃣ Cache in S3 → Public CloudFront URL
  const imgBuffer = await fetch(data[0].url).then(r => r.arrayBuffer());
  const key = `recipes/${recipe._id}.png`;

  await s3.putObject({
    Bucket: process.env.AWS_S3_BUCKET,
    Key: key,
    Body: Buffer.from(imgBuffer),
    ContentType: 'image/png',
    ACL: 'public-read'
  });

  return `https://${process.env.CLOUDFRONT_DOMAIN}/${key}`;
}
```
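
The save route then only has to persist the returned CDN URL on the recipe document. A rough sketch, where the field name `imgLink` and the helper `attachHeroImage` are hypothetical:

```typescript
// Illustrative follow-up in the save-recipes handler; `imgLink` is a hypothetical field name.
import RecipeModel from '@/models/Recipe';

export async function attachHeroImage(recipe: ExtendedRecipe) {
  const imgLink = await generateRecipeImage(recipe);
  await RecipeModel.updateOne({ _id: recipe._id }, { $set: { imgLink } });
  return imgLink;
}
```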

The image prompt in plain English

“Create a high-resolution, photorealistic shot of ${recipeName} made from ${ingredient list}. Plate it on a clean white plate with natural lighting that highlights the key ingredients.”

That one sentence:

  1. Names the dish → DALL·E picks a plating style that matches the cuisine.
  2. Spells out every ingredient → improves visual accuracy (cilantro appears as garnish, etc.).
  3. Fixes environment (“clean white plate, natural lighting”) → keeps the gallery visually consistent.
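
In code, that helper can stay tiny. Here is a sketch, assuming each ingredient carries a `name` field; the repo's exact wording may differ:

```typescript
// Sketch of the image-prompt helper; the wording is approximate.
// Ingredient is the app's shared type (import omitted).
export const getImageGenerationPrompt = (recipeName: string, ingredients: Ingredient[]) => {
  const ingredientList = ingredients.map((i) => i.name).join(', ');
  return `Create a high-resolution, photorealistic shot of ${recipeName} made from ${ingredientList}. ` +
    `Plate it on a clean white plate with natural lighting that highlights the key ingredients.`;
};
```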

Cost & performance tricks

| Tweak | Impact |
| --- | --- |
| 1 image per recipe | Users only need one hero; cuts DALL·E cost by ⅔. |
| 512×512 on dev, 1024×1024 on prod | Speeds local tests; full-res in production. |
| S3 + CloudFront | First view pulls from S3; subsequent views are CDN-cached (≈ 30 ms). |
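
One hedged way to implement the dev/prod size split from the table above is a simple environment switch. Note that DALL·E 3 only accepts 1024 px and larger, so a 512×512 dev image implies falling back to DALL·E 2; the function name `createDishImage` is illustrative:

```typescript
// Hypothetical environment switch for cheaper dev images.
// `openai` is the shared client used in the snippets above.
export async function createDishImage(prompt: string) {
  const isProd = process.env.NODE_ENV === 'production';

  const { data } = await openai.images.generate({
    // DALL·E 3 only supports 1024px and up, so the smaller dev size uses DALL·E 2.
    model: isProd ? 'dall-e-3' : 'dall-e-2',
    size: isProd ? '1024x1024' : '512x512',
    prompt,
    n: 1,
    response_format: 'url',
  });

  return data[0].url;
}
```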

With the image URL safely stored in MongoDB, the front-end just feeds it to Next.js’ `<Image>` component and enjoys instant, CDN-cached dishes across the app.
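
Rendering that stored URL on the client is then plain Next.js. A minimal sketch, with illustrative component and prop names:

```tsx
// Illustrative hero-image component; field names are hypothetical.
import Image from 'next/image';

export function RecipeHero({ name, imgLink }: { name: string; imgLink: string }) {
  return (
    <Image
      src={imgLink}                 // CloudFront URL stored on the recipe document
      alt={name}
      width={1024}
      height={1024}
      priority                      // hero image, load eagerly
      className="rounded-xl object-cover"
    />
  );
}
```

Remote images also need the CloudFront domain allow-listed, e.g. under `images.remotePatterns` in next.config.js.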

5 · On-demand narration with OpenAI TTS 🎧

Images are nice, but hands-free cooking is better.

Every recipe card has a Play button; the first time any user clicks it we:

  1. Check whether the recipe already has an audio URL.
  2. If not, call OpenAI TTS, store the MP3 in S3, save the URL to MongoDB.
  3. On all future clicks, we stream that cached MP3 instantly.

```typescript
// RecipeDisplayModal.tsx  (simplified)
if (!recipe.audio) {
  await fetch(`/api/tts?recipeId=${recipe._id}`);   // ⬅️ generates + stores MP3 once
}
// on this very first click recipe.audio is still undefined, so fall back to the streaming route
audio.src = recipe.audio ?? `/api/stream-audio?id=${recipe._id}`;
audio.load();
audio.play();
```

The /api/tts route (Next.js API route)

```typescript
// pages/api/tts.ts (simplified)
import { getRecipeNarrationPrompt } from '@/lib/prompts';
import { s3 } from '@/lib/aws';
import RecipeModel from '@/models/Recipe';               // Mongoose model
// `openai` is the shared OpenAI client instance (import omitted in this simplified snippet)

export default async function handler(req, res) {
  const recipe = await RecipeModel.findById(req.query.recipeId).lean();
  if (!recipe) return res.status(404).end();
  if (recipe.audio) return res.status(200).json({ url: recipe.audio });   // already cached ✅

  // 1️⃣ Build narration script
  const prompt = getRecipeNarrationPrompt(recipe);

  // 2️⃣ Call OpenAI TTS → binary MP3
  const speech = await openai.audio.speech.create({
    model: 'tts-1-hd',
    voice: 'alloy',
    input: prompt,
    response_format: 'mp3'
  });
  const audioBuffer = Buffer.from(await speech.arrayBuffer());

  // 3️⃣ Persist once in S3
  const key = `audio/${recipe._id}.mp3`;
  await s3.putObject({
    Bucket: process.env.AWS_S3_BUCKET,
    Key: key,
    Body: audioBuffer,
    ContentType: 'audio/mpeg',
    ACL: 'public-read'
  });

  // 4️⃣ Update recipe doc with CDN URL
  const url = `https://${process.env.CLOUDFRONT_DOMAIN}/${key}`;
  await RecipeModel.updateOne({ _id: recipe._id }, { $set: { audio: url } });

  return res.status(200).json({ url });
}
```

Mobile playback quirks solved

| Fix | Why it matters |
| --- | --- |
| `audio.preload = 'auto'` + explicit `audio.load()` | iOS Safari ignores autoplay without it |
| Blob fallback (fetch → blob → ObjectURL) | Handles browsers that block CORS streams |
| `playsinline` attribute | Prevents full-screen takeover in iOS |
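
Put together, the playback path looks roughly like this; a simplified sketch with an illustrative `playNarration` helper, not the exact modal code:

```typescript
// Illustrative sketch of the iOS-friendly playback path.
async function playNarration(audio: HTMLAudioElement, url: string) {
  audio.preload = 'auto';
  audio.setAttribute('playsinline', '');        // keep playback inline on iOS

  try {
    audio.src = url;
    audio.load();
    await audio.play();
  } catch {
    // Fallback: fetch → Blob → ObjectURL for browsers that block the direct stream
    const blob = await fetch(url).then((r) => r.blob());
    audio.src = URL.createObjectURL(blob);
    audio.load();
    await audio.play();
  }
}
```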

Cost snapshot

  • One TTS call per unique recipe → no repeat charges
  • Avg. 90-sec script ≈ $0.002 using tts-1-hd
  • Total TTS spend to date: <$5 / year at current play volume

With text, image, and audio all CDN-cached, first-time load is sub-500 ms and repeat plays are instant.

6 · Zero-touch deployment on Vercel 🚀

One-click previews, automatic prod

Smart Recipe Generator ships the same way every time:

  1. Push → GitHub
  2. Vercel spins up a preview build for the PR
  3. GitHub Actions runs Cypress E2E in parallel
  4. If tests pass and the PR is merged into main, Vercel promotes that build to production

This flow gives you per-branch preview URLs (e.g. https://smart-recipe-generator-git-feat-tags-dereje1.vercel.app) while keeping a green-tested main branch.

Vercel project settings

| Setting | Value |
| --- | --- |
| Framework | Next.js 14 |
| Build command | `next build` (default) |
| Output | `.next` |
| Environment variables | OPENAI_API_KEY, MONGODB_URI, AWS_S3_BUCKET, etc. (added in Vercel Dashboard → Settings ▸ Environment Variables) |
| Custom domain | smart-recipe-generator.vercel.app + CNAME for your own domain |

GitHub Actions → Cypress E2E

All unit tests run inside Vercel’s build, but full-browser E2E lives in a separate CI job so a flaky UI test never blocks a deploy.

```yaml
# .github/workflows/e2e.yml
name: E2E Tests (Local Dev Server)

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  cypress:
    runs-on: ubuntu-latest
    env:
      NEXT_PUBLIC_API_BASE_URL: http://localhost:3000
      MONGO_URI: mongodb://127.0.0.1:27017/dummy
      E2E: 1
    steps:
      - uses: actions/checkout@v3

      - name: Install dependencies
        run: npm ci

      - name: Run Cypress E2E tests locally
        run: npm run test:e2e
```

Why separate?

  • Vercel deploys even if Cypress fails → you can still test fixes on the live preview.
  • E2E can spawn its own Mongo container without slowing Vercel builds.
  • Parallelism: build + test overlap, total CI time ≈ 4-5 min.

Zero-downtime rollbacks

vercel --prod promotes an immutable build.
Rolling back is one click in the Vercel dashboard (select an older build → Promote).

Because images, audio, and static assets live on S3 + CloudFront, switching builds never touches user files—old URLs stay valid indefinitely.

Cost at today’s scale

| Service | Plan | Annual cost |
| --- | --- | --- |
| Vercel | Hobby Free (100 GB-hrs build) | $0 |
| MongoDB Atlas | M0 shared | $0 |
| AWS S3 + CloudFront | ~2 GB storage + 6 GB egress/mo | $11 |
| OpenAI API | GPT-4o, DALL·E, TTS | $39 |
| **Total** | | **≈ $50 / year** |

Costs scale linearly with generations & traffic, but CDN caching keeps egress surprisingly low.


With CI/CD locked in, new features ship the moment tests pass—no manual FTP, no servers to patch. Next we’ll wrap up with lessons learned and what’s coming next. 🎯

7 · Lessons learned & what’s next 🎯

Shipped ✅

  • Recipe-specific Chat Assistant – every dish now has its own GPT-4o mini-chat for substitutions, timing tweaks, & dietary swaps.
  • One-tap audio & hero image caching – keeps first-time AI cost low, repeat views free.

On the horizon 🚧

| Idea | Why it matters | ETA |
| --- | --- | --- |
| Step-by-step video generation | Watching beats reading. As soon as Gen-AI video is both good and affordable, each recipe will auto-render a short vertical clip you can follow in the kitchen. | Waiting on next-gen video APIs & pricing |
| Vector search with pgvector | “Show me recipes similar to 🍝 Spicy Chickpea Pasta.” Semantic search > keyword matching. | Q3-2025 |
| Offline-first PWA | Cache recipes, images, & MP3 locally so the app still works when Wi-Fi drops. | Q4-2025 |

R&D rabbit holes I’m exploring 🧪

  • OpenAI’s new *Codex Agent* – automatic code refactors, test stubs & PR generation. Early tests already cut implementation time by ~50 %.
  • GPT-4o Vision – letting users snap a pantry photo and get instant recipe suggestions.
  • LangSmith + LangChain – tracing token usage to keep that $50 / yr bill from ballooning.

👋 Your turn:

Have a feature idea? Open an issue or PR.

Want to hack on AI & Next.js? Check the good-first-issue label—contributors always welcome!


Thanks for reading! If this post helped you—or you just want more AI-powered food hacks—give the repo a ⭐️ and follow me for the next installment.


Try it / Star it ⭐

🌐 Live demo  → https://smart-recipe-generator.vercel.app

⭐️ GitHub → https://github.com/Dereje1/smart-recipe-generator
