This is a submission for the Google AI Studio Multimodal Challenge
What I Built
FoodSnap Tutor is a frontend-only app that turns any food photo into instant cooking guidance. Upload an image and the app:
- Detects whether the image is food
- Identifies the likely dish name (with alternatives when uncertain)
- Generates a step-by-step recipe
- Estimates nutrition per serving (calories, protein, carbs, fat)
- Suggests a healthier variation and friendly moderation tips
Tech stack: React 19, TypeScript, Vite 6, Tailwind (CDN), and @google/genai calling Gemini 2.5 Flash. Everything runs in the browser; no backend required.
Demo
Live demo: https://foodsnap-tutor.vercel.app
YouTube video demo: https://youtu.be/criJO6OhDY0
GitHub repository: https://github.com/longphanquangminh/foodsnap-tutor
Screenshots:
✨ Error screen:
How I Used Google AI Studio
I leveraged the @google/genai SDK to call Gemini 2.5 Flash with:
- An inline image part (base64-encoded upload).
- A structured prompt telling the model to reply in JSON.
- A response schema so the model constrains its reply to valid, predictable JSON.
```typescript
import { GoogleGenAI, Type } from "@google/genai";

// Strict JSON schema for the model's reply
const schema = {
  type: Type.OBJECT,
  properties: {
    isFood: { type: Type.BOOLEAN },
    dishName: { type: Type.STRING },
    recipe: {
      type: Type.OBJECT,
      properties: {
        ingredients: { type: Type.ARRAY, items: { type: Type.STRING } },
        steps: { type: Type.ARRAY, items: { type: Type.STRING } }
      },
      required: ["ingredients", "steps"]
    },
    nutrition: {
      type: Type.OBJECT,
      properties: {
        calories: { type: Type.STRING },
        protein: { type: Type.STRING },
        carbs: { type: Type.STRING },
        fat: { type: Type.STRING }
      },
      required: ["calories", "protein", "carbs", "fat"]
    },
    healthierVariation: { type: Type.STRING },
    friendlyAdvice: { type: Type.STRING }
  },
  required: ["isFood", "dishName", "recipe", "nutrition", "healthierVariation"]
};

// Convert an uploaded File into an inline image part
const fileToPart = async (file: File) => {
  const dataUrl = await new Promise<string>((res, rej) => {
    const r = new FileReader();
    r.readAsDataURL(file);
    r.onload = () => res(r.result as string);
    r.onerror = rej;
  });
  const [meta, data] = dataUrl.split(",");
  const mime = meta.match(/:(.*?);/)?.[1] ?? "image/jpeg";
  return { inlineData: { mimeType: mime, data } };
};

export async function analyzeFoodImage(image: File) {
  const ai = new GoogleGenAI({ apiKey: process.env.API_KEY! });
  const imagePart = await fileToPart(image);
  const prompt = "You are FoodSnap Tutor, an expert AI chef and nutritionist...";
  const res = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: { parts: [imagePart, { text: prompt }] },
    config: {
      responseMimeType: "application/json",
      responseSchema: schema
    }
  });
  // res.text can be undefined (e.g. blocked content); "" makes JSON.parse
  // throw a SyntaxError the caller can catch.
  return JSON.parse(res.text ?? "");
}
```
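Even with a response schema, the parsed JSON is worth guarding before it reaches the UI (a blocked or truncated reply can still produce an unexpected shape). A minimal sketch of such a guard, mirroring the schema above — `FoodAnalysis` and `isFoodAnalysis` are illustrative names, not code from the repo:

```typescript
// Shape of the JSON the schema above asks Gemini for
// (friendlyAdvice is optional, matching the schema's `required` list).
interface FoodAnalysis {
  isFood: boolean;
  dishName: string;
  recipe: { ingredients: string[]; steps: string[] };
  nutrition: { calories: string; protein: string; carbs: string; fat: string };
  healthierVariation: string;
  friendlyAdvice?: string;
}

// Runtime type guard: returns true only if the value matches FoodAnalysis.
function isFoodAnalysis(value: unknown): value is FoodAnalysis {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const recipe = v.recipe as Record<string, unknown> | undefined;
  const nutrition = v.nutrition as Record<string, unknown> | undefined;
  return (
    typeof v.isFood === "boolean" &&
    typeof v.dishName === "string" &&
    !!recipe &&
    Array.isArray(recipe.ingredients) &&
    Array.isArray(recipe.steps) &&
    !!nutrition &&
    ["calories", "protein", "carbs", "fat"].every(
      (k) => typeof nutrition[k] === "string"
    ) &&
    typeof v.healthierVariation === "string"
  );
}
```

Calling this on the result of `analyzeFoodImage` lets the UI fall back to its error state instead of rendering half-formed data.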
Environment wiring (Vite):
```typescript
// vite.config.ts
import { defineConfig, loadEnv } from "vite";

export default defineConfig(({ mode }) => {
  const env = loadEnv(mode, ".", "");
  return {
    define: {
      // Statically replaces process.env.API_KEY in the client bundle
      "process.env.API_KEY": JSON.stringify(env.GEMINI_API_KEY)
    }
  };
});
```
```bash
# .env.local
GEMINI_API_KEY=ai-xxxxxxxxxxxxxxxx
```
Multimodal Features
- Vision input: Users upload a dish photo that the SDK sends as inline image data.
- Structured output: Gemini returns validated JSON (recipe, nutrition, advice) for deterministic UI rendering.
- Single multimodal call: Image + text prompt → cohesive culinary analysis.
- UX touches: Drag-and-drop upload, instant preview, animated results, retry flow.
- Robustness: UI handles blocked content or JSON parse errors gracefully with friendly messages.
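The robustness point can be sketched as a small error-to-message mapper — `toFriendlyMessage` is an illustrative helper under assumed error shapes (the Gemini SDK surfacing safety blocks as errors mentioning "SAFETY" or "blocked", and `JSON.parse` failures as `SyntaxError`), not code from the repo:

```typescript
// Map raw failure modes to the friendly messages the UI shows.
function toFriendlyMessage(err: unknown): string {
  const msg = err instanceof Error ? err.message : String(err);
  // Assumed shape: blocked-content errors mention SAFETY / "blocked".
  if (msg.includes("SAFETY") || msg.toLowerCase().includes("blocked")) {
    return "That image couldn't be analyzed. Try a different food photo!";
  }
  // JSON.parse on a malformed reply throws SyntaxError.
  if (err instanceof SyntaxError) {
    return "We got a garbled answer from the chef. Please retry!";
  }
  return "Something went wrong. Please try again.";
}
```

A component would wrap the `analyzeFoodImage` call in `try/catch` and hand the caught value to this mapper before showing the retry flow.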
Notes
- Frontend-only: In production, restrict the API key to allowed origins or proxy requests through a lightweight backend.
- Built with React + Vite + Tailwind for fast iteration and static deployment.
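The proxy idea above can be sketched as the server-side piece that attaches the key, so it never ships in the bundle. This targets the public Gemini REST `generateContent` route with its `x-goog-api-key` header; `buildGeminiRequest` is a hypothetical helper, not part of this project:

```typescript
// Public REST route for the same model the app uses.
const GEMINI_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent";

// Build the upstream fetch() arguments for a request body forwarded from
// the browser, attaching the API key server-side.
function buildGeminiRequest(clientBody: string, apiKey: string) {
  return {
    url: GEMINI_URL,
    init: {
      method: "POST" as const,
      headers: {
        "Content-Type": "application/json",
        "x-goog-api-key": apiKey // key lives in server env, never in the bundle
      },
      body: clientBody
    }
  };
}
```

A route handler (e.g. a serverless function at `/api/analyze`) would call `fetch(url, init)` with these arguments and relay the JSON back to the browser.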
Thanks for reading! If you'd like to try it out or peek at the code, check the demo and repo links above.