“Choose ingredients, hit Generate, and watch GPT-4o cook.”
In this post I’ll break down how I wired Next.js 14, OpenAI, and MongoDB into a side-project that now serves 655 users.
1 · Why this project?
Confession: I don’t cook—at all. Yet I kept wishing there were a one-click way to turn leftover pantry items into a real recipe without learning to sauté. So I set out to build a tool that could:
- Transform any ingredient list into a full recipe in under 30 seconds.
- Show the dish & read the steps aloud (DALL·E image + TTS) so even non-cooks can follow along.
- Run on hobby-tier costs (≈ $50 / year for OpenAI + AWS at today’s usage).
That side project became Smart Recipe Generator—a Next.js 14 app that feeds ingredients to GPT-4o, generates a photo with DALL·E, and streams the recipe via TTS.
Today it’s fully open-source, with 655 signed-up users who’ve created 390 recipes (and counting).
(Costs scale with usage, so heavier cooking sprees will bump that $50/year, but it stays cheap at typical traffic.)
2 · Stack overview
Layer | Tech |
---|---|
Front-end | React (Next.js 14), Tailwind |
AI | OpenAI GPT-4o + DALL·E |
Data | MongoDB Atlas, pgvector (future) |
Hosting | Vercel + AWS S3/CloudFront |
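Before diving into the code: the snippets below all reference an `openai` client. The real repo may instantiate it differently, but here is the minimal sketch I'll assume throughout (the `lib/openai.ts` path is my convention, not necessarily the project's):

```ts
// lib/openai.ts — minimal shared client (sketch; assumes OPENAI_API_KEY is set in the environment)
import OpenAI from 'openai';

export const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});
```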
3 · Prompt engineering 101
Smart Recipe Generator uses 7 tiny prompts that each do one job:
Helper | What it asks GPT-4o to do |
---|---|
`getRecipeGenerationPrompt` | 3 diverse recipes from the ingredient list + diet prefs |
`getImageGenerationPrompt` | Craft a photorealistic DALL·E prompt for the dish |
`getIngredientValidationPrompt` | Check if “cuscus” is legit & suggest corrections |
`getRecipeNarrationPrompt` | Turn the JSON recipe into a 60-90 s audio script |
`getRecipeTaggingPrompt` | Spit out 10 SEO-friendly one-word tags |
`getChatAssistantSystemPrompt` | Set boundaries for the recipe-specific chat |

(plus a minimal system prompt for ingredient-search autocomplete)
Below is the star of the show—the recipe-generation prompt (trimmed for readability):
export const getRecipeGenerationPrompt = (
  ingredients: Ingredient[],
  dietaryPreferences: DietaryPreference[]
) => `
I have the following ingredients: ${JSON.stringify(ingredients)}
${dietaryPreferences.length ? `and dietary preferences: ${dietaryPreferences.join(',')}` : ''}.
Please provide **three** delicious and diverse recipes in **valid JSON**:
[
  {
    "name": "Recipe Name",
    "ingredients": [
      { "name": "...", "quantity": "..." }
    ],
    "instructions": ["Do this first.", "Then do this."],
    "dietaryPreference": ["vegan"],
    "additionalInformation": {
      "tips": "...",
      "variations": "...",
      "servingSuggestions": "...",
      "nutritionalInformation": "..."
    }
  }
]
*No extra text, markdown, or step numbers.*
Ensure recipes differ in cuisine/type and respect diet prefs.
Quantities must include units.
`;
Why this design works

- Exact JSON schema → zero post-processing headaches.
- Three recipes per call keeps users from hammering the button.
- Single prompt → cost ≈ $0.004 per generation at today’s rates.

The other six prompts keep image, audio, and chat tasks isolated, so each subsystem can evolve independently.
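To make the “zero post-processing” point concrete, here is a hedged sketch of how a call site might consume this prompt. The wrapper name `generateRecipes` and the `Recipe` type are my placeholders, not the repo's exact code:

```ts
// Sketch: feed the prompt to GPT-4o and parse the structured reply.
import { getRecipeGenerationPrompt } from '@/lib/prompts';
import { openai } from '@/lib/openai'; // shared client (see the sketch in section 2)

export async function generateRecipes(
  ingredients: Ingredient[],
  dietaryPreferences: DietaryPreference[]
): Promise<Recipe[]> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: getRecipeGenerationPrompt(ingredients, dietaryPreferences) },
    ],
  });

  // The prompt demands valid JSON with no extra text, so a plain JSON.parse usually suffices.
  const raw = completion.choices[0].message.content ?? '[]';
  try {
    return JSON.parse(raw) as Recipe[];
  } catch {
    throw new Error('GPT-4o returned non-JSON output'); // surface it so the UI can retry
  }
}
```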
4 · Generating images with DALL·E
Text-only recipes are helpful—but a hero shot sells the dish.
Every time a new recipe is saved, the backend fires one image call:
// pages/api/save-recipes.ts (simplified)
import { getImageGenerationPrompt } from '@/lib/prompts';
import { openai } from '@/lib/openai'; // shared OpenAI client (see the sketch in section 2)
import { s3 } from '@/lib/aws';        // thin AWS SDK wrapper

export async function generateRecipeImage(recipe: ExtendedRecipe) {
  const prompt = getImageGenerationPrompt(recipe.name, recipe.ingredients);

  // 1️⃣ Call DALL·E 3
  const { data } = await openai.images.generate({
    model: 'dall-e-3',
    prompt,
    size: '1024x1024',
    n: 1,
    response_format: 'url'
  });

  // 2️⃣ Cache in S3 → Public CloudFront URL
  const imgBuffer = await fetch(data[0].url).then(r => r.arrayBuffer());
  const key = `recipes/${recipe._id}.png`;
  await s3.putObject({
    Bucket: process.env.AWS_S3_BUCKET,
    Key: key,
    Body: Buffer.from(imgBuffer),
    ContentType: 'image/png',
    ACL: 'public-read'
  });

  return `https://${process.env.CLOUDFRONT_DOMAIN}/${key}`;
}
The image prompt in plain English
“Create a high-resolution, photorealistic shot of \${recipeName} made from
\${ingredient list}. Plate it on a clean white plate with natural lighting that highlights the key ingredients.”
That one sentence:
- Names the dish → DALL·E picks plating style that matches the cuisine.
- Spells out every ingredient → improves visual accuracy (cilantro appears as garnish, etc.).
- Fixes environment (“clean white plate, natural lighting”) → keeps the gallery visually consistent.
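In code, that helper is roughly the template below; this is a sketch reconstructed from the sentence above, so the exact wording in the repo may differ slightly:

```ts
// Sketch of the image-prompt helper described above.
export const getImageGenerationPrompt = (recipeName: string, ingredients: Ingredient[]) => `
Create a high-resolution, photorealistic shot of ${recipeName} made from
${ingredients.map(i => i.name).join(', ')}.
Plate it on a clean white plate with natural lighting that highlights the key ingredients.
`.trim();
```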
Cost & performance tricks
Tweak | Impact |
---|---|
1 image per recipe | Users only need one hero; cuts DALL·E cost by ⅔. |
512×512 on dev, 1024×1024 on prod | Speeds local tests; full-res in production. |
S3 + CloudFront | First view pulls from S3; subsequent views are CDN-cached (≈ 30 ms). |
With the image URL safely stored in MongoDB, the front-end just feeds it to Next.js’ `<Image>` component and enjoys instant, CDN-cached dishes across the app.
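One detail worth calling out: `next/image` only optimizes remote images from whitelisted hosts, so the CloudFront domain has to be registered in the Next config. A minimal sketch, assuming a setup like this (the real config may differ):

```js
// next.config.js (sketch) — let next/image serve the CloudFront-hosted dishes
/** @type {import('next').NextConfig} */
const nextConfig = {
  images: {
    remotePatterns: [
      {
        protocol: 'https',
        hostname: process.env.CLOUDFRONT_DOMAIN, // e.g. dxxxxxxxx.cloudfront.net
        pathname: '/recipes/**',
      },
    ],
  },
};

module.exports = nextConfig;
```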
5 · On-demand narration with OpenAI TTS 🎧
Images are nice, but hands-free cooking is better.
Every recipe card has a Play button; the first time any user clicks it we:
- Check whether the recipe already has an `audio` URL.
- If not, call OpenAI TTS, store the MP3 in S3, and save the URL to MongoDB.
- On all future clicks, we stream that cached MP3 instantly.
// RecipeDisplayModal.tsx (simplified)
if (!recipe.audio) {
  // ⬅️ generates + stores the MP3 once, then returns its CDN URL
  const { url } = await fetch(`/api/tts?recipeId=${recipe._id}`).then(r => r.json());
  recipe.audio = url;
}
audio.src = recipe.audio ?? `/api/stream-audio?id=${recipe._id}`; // fallback: stream straight from the API
audio.load();
audio.play();
The `/api/tts` route (API handler)
// pages/api/tts.ts (simplified)
import type { NextApiRequest, NextApiResponse } from 'next';
import { getRecipeNarrationPrompt } from '@/lib/prompts';
import { openai } from '@/lib/openai';      // shared OpenAI client (see the sketch in section 2)
import { s3 } from '@/lib/aws';
import RecipeModel from '@/models/Recipe';  // Mongoose model

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  const recipe = await RecipeModel.findById(req.query.recipeId).lean();
  if (!recipe) return res.status(404).end();
  if (recipe.audio) return res.status(200).json({ url: recipe.audio }); // already cached ✅

  // 1️⃣ Build narration script
  const prompt = getRecipeNarrationPrompt(recipe);

  // 2️⃣ Call OpenAI TTS → binary MP3
  const speech = await openai.audio.speech.create({
    model: 'tts-1-hd',
    voice: 'alloy',
    input: prompt,
    response_format: 'mp3'
  });
  const audioBuffer = Buffer.from(await speech.arrayBuffer());

  // 3️⃣ Persist once in S3
  const key = `audio/${recipe._id}.mp3`;
  await s3.putObject({
    Bucket: process.env.AWS_S3_BUCKET,
    Key: key,
    Body: audioBuffer,
    ContentType: 'audio/mpeg',
    ACL: 'public-read'
  });

  // 4️⃣ Update recipe doc with CDN URL
  const url = `https://${process.env.CLOUDFRONT_DOMAIN}/${key}`;
  await RecipeModel.updateOne({ _id: recipe._id }, { $set: { audio: url } });

  return res.status(200).json({ url });
}
Mobile playback quirks solved
Fix | Why it matters |
---|---|
`audio.preload = 'auto'` + explicit `audio.load()` | iOS Safari ignores autoplay without it |
Blob fallback (fetch → blob → ObjectURL, sketched below) | Handles browsers that block CORS streams |
`playsinline` attribute | Prevents full-screen takeover in iOS |
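For the curious, the blob fallback boils down to something like the snippet below; this is a sketch, not the exact component code (the `playNarration` helper is my framing):

```ts
// Sketch of the CORS-safe blob fallback plus the iOS-friendly attributes from the table above.
async function playNarration(audio: HTMLAudioElement, url: string) {
  audio.preload = 'auto';
  audio.setAttribute('playsinline', 'true'); // keep playback inline on iOS

  try {
    audio.src = url;   // happy path: stream the CDN URL directly
    audio.load();
    await audio.play();
  } catch {
    // Fallback: fetch the MP3 ourselves and hand the browser a local object URL
    const blob = await fetch(url).then(r => r.blob());
    audio.src = URL.createObjectURL(blob);
    audio.load();
    await audio.play();
  }
}
```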
Cost snapshot
- One TTS call per unique recipe → no repeat charges
- Avg. 90-sec script ≈ \$0.002 using `tts-1-hd`
- Total TTS spend to date: <\$5 / year at current play volume
With text, image, and audio all CDN-cached, first-time load is sub-500 ms and repeat plays are instant.
6 · Zero-touch deployment on Vercel 🚀
One-click previews, automatic prod
Smart Recipe Generator ships the same way every time:
- Push → GitHub
- Vercel spins up a preview build for the PR
- GitHub Actions runs Cypress E2E in parallel
- If tests pass and the PR is merged into `main`, Vercel promotes that build to production

This flow gives you per-branch preview URLs (e.g. https://smart-recipe-generator-git-feat-tags-dereje1.vercel.app) while keeping a green-tested `main` branch.
Vercel project settings
Setting | Value |
---|---|
Framework | Next.js 14 |
Build command | `next build` (default) |
Output | `.next` |
Environment variables | `OPENAI_API_KEY`, `MONGODB_URI`, `AWS_S3_BUCKET`, etc. (added in Vercel Dashboard → Settings ▸ Environment Variables) |
Custom domain | `smart-recipe-generator.vercel.app` + CNAME for your own domain |
GitHub Actions → Cypress E2E
All unit tests run inside Vercel’s build, but full-browser E2E lives in a separate CI job so a flaky UI test never blocks a deploy.
# .github/workflows/e2e.yml
name: E2E Tests (Local Dev Server)

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  cypress:
    runs-on: ubuntu-latest
    env:
      NEXT_PUBLIC_API_BASE_URL: http://localhost:3000
      MONGO_URI: mongodb://127.0.0.1:27017/dummy
      E2E: 1
    steps:
      - uses: actions/checkout@v3
      - name: Install dependencies
        run: npm ci
      - name: Run Cypress E2E tests locally
        run: npm run test:e2e
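The workflow leans on an `npm run test:e2e` script that isn't shown here; conceptually it boots the app and points Cypress at it. A sketch of what that script might look like (`start-server-and-test` is my assumption, not necessarily what the repo uses):

```json
{
  "scripts": {
    "dev": "next dev",
    "cy:run": "cypress run",
    "test:e2e": "start-server-and-test dev http://localhost:3000 cy:run"
  }
}
```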
Why separate?
- Vercel deploys even if Cypress fails → you can still test fixes on the live preview.
- E2E can spawn its own Mongo container without slowing Vercel builds.
- Parallelism: build + test overlap, total CI time ≈ 4-5 min.
Zero-downtime rollbacks
`vercel --prod` promotes an immutable build.
Rolling back is one click in the Vercel dashboard (select an older build → Promote).
Because images, audio, and static assets live on S3 + CloudFront, switching builds never touches user files—old URLs stay valid indefinitely.
Cost at today’s scale
Service | Plan | Annual cost |
---|---|---|
Vercel Hobby | Free (100 GB-hrs build) | \$0 |
MongoDB Atlas | M0 shared | \$0 |
AWS S3 + CloudFront | ~2 GB storage + 6 GB egress/mo | \$11 |
OpenAI API | GPT-4o, DALL·E, TTS | \$39 |
Total | — | ≈ \$50 / year |
Costs scale linearly with generations & traffic, but CDN caching keeps egress surprisingly low.
With CI/CD locked in, new features ship the moment tests pass—no manual FTP, no servers to patch. Next we’ll wrap up with lessons learned and what’s coming next. 🎯
7 · Lessons learned & what’s next 🎯
Shipped ✅
- Recipe-specific Chat Assistant – every dish now has its own GPT-4o mini-chat for substitutions, timing tweaks, & dietary swaps (a rough sketch follows after this list).
- One-tap audio & hero image caching – keeps first-time AI cost low, repeat views free.
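Under the hood the chat assistant is just the system prompt from section 3 plus a standard chat-completion call. A rough sketch (route wiring omitted; the helper signature is an assumption):

```ts
// Sketch: recipe-scoped chat — the system prompt pins GPT-4o to this one recipe.
import { getChatAssistantSystemPrompt } from '@/lib/prompts';
import { openai } from '@/lib/openai';

export async function askRecipeAssistant(recipe: ExtendedRecipe, question: string) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'system', content: getChatAssistantSystemPrompt(recipe) },
      { role: 'user', content: question },
    ],
  });
  return completion.choices[0].message.content;
}
```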
On the horizon 🚧
Idea | Why it matters | ETA |
---|---|---|
Step-by-step video generation | Watching beats reading. As soon as Gen-AI video is both good and affordable, each recipe will auto-render a short vertical clip you can follow in the kitchen. | Waiting on next-gen video APIs & pricing |
Vector search with pgvector | “Show me recipes similar to 🍝 Spicy Chickpea Pasta.” Semantic search > keyword matching (sketch below). | Q3-2025 |
Offline-first PWA | Cache recipes, images, & MP3 locally so the app still works when Wi-Fi drops. | Q4-2025 |
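To make the pgvector row concrete, here's the kind of query I have in mind. Everything below is speculative since the feature doesn't exist yet: the table name, the embedding model, and the `pg` client are all placeholders:

```ts
// Speculative sketch: “recipes similar to X” via pgvector cosine distance.
import { Pool } from 'pg';
import { openai } from '@/lib/openai';

const pool = new Pool({ connectionString: process.env.POSTGRES_URI });

export async function findSimilarRecipes(query: string, limit = 5) {
  // 1. Embed the query text (model choice is a placeholder)
  const { data } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query,
  });
  const embedding = JSON.stringify(data[0].embedding); // pgvector accepts '[0.1, 0.2, ...]' text literals

  // 2. Nearest neighbours by cosine distance (<=>) against a hypothetical recipe_embeddings table
  const { rows } = await pool.query(
    `SELECT recipe_id, name
       FROM recipe_embeddings
      ORDER BY embedding <=> $1::vector
      LIMIT $2`,
    [embedding, limit]
  );
  return rows;
}
```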
R&D rabbit holes I’m exploring 🧪
- OpenAI’s new *Codex Agent* – automatic code refactors, test stubs & PR generation. Early tests already cut implementation time by ~50 %.
- GPT-4o Vision – letting users snap a pantry photo and get instant recipe suggestions.
- LangSmith + LangChain – tracing token usage to keep that \$50 / yr bill from ballooning.
👋 Your turn:
Have a feature idea? Open an issue or PR.
Want to hack on AI & Next.js? Check the `good-first-issue` label—contributors always welcome!
Thanks for reading! If this post helped you—or you just want more AI-powered food hacks—give the repo a ⭐️ and follow me for the next installment.
Try it / Star it ⭐
🌐 Live demo → https://smart-recipe-generator.vercel.app
⭐️ GitHub → https://github.com/Dereje1/smart-recipe-generator