This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
PantryLens — a free, installable Progressive Web App that turns a photo of your fridge or pantry into a complete, ready-to-cook recipe in seconds.
The idea is simple: food waste is a real problem, and one of its biggest causes is not knowing what to cook with what you already have. PantryLens removes that friction entirely. You open the app, snap up to 3 photos of your ingredients, tap Generate Recipe, and watch a full recipe stream to your screen — live, token by token — before you even put your phone down.
Key features:
- 📷 Camera capture, file upload, or drag-and-drop (up to 3 images)
- ⚡ Recipe streams live to the screen as Gemma 4 generates it
- 📱 Installable PWA — works on iOS and Android home screens, no App Store needed
- 🔒 No account, no login, no data stored: your photos are used only for the duration of the request and are never saved
- 🆓 Completely free to use
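Installability comes from a web app manifest. The sketch below assumes the Next.js App Router convention of an `app/manifest.ts` file, which Next.js serves as `/manifest.webmanifest`. The colors and icon paths are hypothetical placeholders, not taken from the PantryLens repo. In a real Next.js project you would type the return value as `MetadataRoute.Manifest` from `next`; a small local interface keeps this example self-contained.

```typescript
// app/manifest.ts (hypothetical contents; all values are placeholders)

// Minimal local stand-in for Next.js's MetadataRoute.Manifest type,
// covering only the fields used below.
interface WebAppManifest {
  name: string;
  short_name: string;
  description: string;
  start_url: string;
  display: "standalone" | "fullscreen" | "minimal-ui" | "browser";
  background_color: string;
  theme_color: string;
  icons: { src: string; sizes: string; type: string }[];
}

export default function manifest(): WebAppManifest {
  return {
    name: "PantryLens",
    short_name: "PantryLens",
    description: "Turn your fridge photos into recipes.",
    start_url: "/",
    // "standalone" opens without browser UI once pinned to the home screen
    display: "standalone",
    background_color: "#ffffff",
    theme_color: "#16a34a",
    // Hypothetical icon paths; Chrome's install criteria expect 192px and 512px icons
    icons: [
      { src: "/icons/icon-192.png", sizes: "192x192", type: "image/png" },
      { src: "/icons/icon-512.png", sizes: "512x512", type: "image/png" },
    ],
  };
}
```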
Demo
Live app: https://pantry-lens-one.vercel.app
Install on your mobile home screen: Open the web app in the browser and use the built-in "Add to Home Screen" feature
Code
klee1611/PantryLens: Turn your fridge photos into recipes, a PWA powered by Google Gemma 4 via OpenRouter. Built for the Google Gemma 4 Hackathon on DEV.
PantryLens
Snap a photo of your fridge or pantry — get a recipe in seconds.
PantryLens is a progressive web app (PWA) that uses AI vision (Google Gemma 4) to identify ingredients from photos and stream a complete recipe directly to the screen, token by token.
How it works
- Capture — take a photo with your camera, upload from your gallery, or drag and drop (up to 3 images)
- Compress — the browser Canvas API resizes each image client-side to ≤1024 px before upload
- Analyze — a Next.js Edge Function proxies the images to OpenRouter's vision model
- Stream — the recipe streams back token-by-token, rendered progressively as Markdown
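OpenRouter's chat completions API is OpenAI-compatible: with `stream: true` it returns Server-Sent Events whose `data:` lines carry JSON chunks, with new text in `choices[0].delta.content` and a final `data: [DONE]`. The post doesn't say whether PantryLens parses this on the server or in the browser, so the parser below is only a minimal sketch of turning that stream into plain Markdown text; the class name `SseDeltaParser` is hypothetical. It buffers partial lines because a network chunk can end mid-line, and skips SSE comment lines (those starting with `:`), which OpenRouter may send as keep-alives.

```typescript
// Hypothetical helper: extracts text deltas from an OpenAI-compatible SSE stream.
export class SseDeltaParser {
  private buffer = "";
  private finished = false;

  /** Feeds one decoded chunk; returns the text deltas from every complete line in it. */
  push(chunk: string): string {
    this.buffer += chunk;
    let out = "";
    let newline: number;
    // Handle complete lines only; a trailing partial line stays buffered.
    while ((newline = this.buffer.indexOf("\n")) !== -1) {
      const line = this.buffer.slice(0, newline).replace(/\r$/, "");
      this.buffer = this.buffer.slice(newline + 1);
      out += this.handleLine(line);
    }
    return out;
  }

  /** True once the `data: [DONE]` sentinel has been seen. */
  get done(): boolean {
    return this.finished;
  }

  private handleLine(line: string): string {
    // Blank separators, ":" comments and other SSE fields carry no text.
    if (!line.startsWith("data:")) return "";
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      this.finished = true;
      return "";
    }
    try {
      const json = JSON.parse(payload) as {
        choices?: { delta?: { content?: string | null } }[];
      };
      return json.choices?.[0]?.delta?.content ?? "";
    } catch {
      // Skip malformed payloads instead of aborting the whole stream.
      return "";
    }
  }
}
```

In a React client, each chunk from `response.body.getReader()` would be decoded with `new TextDecoder().decode(value, { stream: true })`, passed to `push`, and the returned text appended to the recipe state that react-markdown re-renders.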
Demo
[Demo video: demo.mp4. Figure: Pin PWA to mobile home screen]
Tech stack
| Layer | Technology |
|---|---|
| Framework | Next.js 16 (App Router, Edge Runtime) |
| UI | React 19, Tailwind CSS, react-markdown |
| AI | OpenRouter (google/gemma-4-26b-a4b-it) |
| Rate limiting | Upstash Redis (sliding window, 5 req/IP/hour) |
| Testing | Jest 29, Testing Library |
The project is fully open source under the MIT license. The stack:
| Layer | Technology |
|---|---|
| Framework | Next.js 16 (App Router, Edge Runtime) |
| UI | React 19, Tailwind CSS, react-markdown |
| AI model | Gemma 4 via OpenRouter |
| Rate limiting | Upstash Redis (sliding window, 5 req/IP/hour) |
| Deploy | Vercel |
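A sliding window matters here because a fixed hourly window would let a client send 5 requests at 12:59 and 5 more at 13:00. PantryLens does this in Upstash Redis; with the `@upstash/ratelimit` package that limit is configured as `Ratelimit.slidingWindow(5, "1 h")`. To illustrate the idea only, here is an in-memory sliding-window-log sketch. It is not Upstash's implementation, and because it keeps state in process memory it would not work across serverless instances; the class name is hypothetical.

```typescript
// Illustrative in-memory sliding-window-log limiter (hypothetical; not Upstash's algorithm).
export class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit; // max requests per window, e.g. 5
    this.windowMs = windowMs; // window length, e.g. one hour
    this.now = now; // injectable clock, useful in tests
  }

  /** Records a request for `key` (e.g. a client IP) and reports whether it is allowed. */
  check(key: string): boolean {
    const t = this.now();
    // Keep only timestamps that still fall inside the trailing window.
    const recent = (this.hits.get(key) ?? []).filter((ts) => t - ts < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // rejected requests are not recorded
    }
    recent.push(t);
    this.hits.set(key, recent);
    return true;
  }
}
```

In a route handler the key would typically be the client IP, for example taken from the `x-forwarded-for` request header.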
How I Used Gemma 4
The model: google/gemma-4-26b-a4b-it
I used Gemma 4 26B A4B: the Mixture-of-Experts model with 26B total parameters and about 4B active parameters per forward pass, in its instruction-tuned version. I accessed it through OpenRouter using the model ID google/gemma-4-26b-a4b-it.
The 26B A4B variant was the right fit for three reasons:
- Vision capability: PantryLens is fundamentally a vision task. The model needs to look at a photo of a messy fridge shelf and accurately identify eggs, half-used condiments, wilting vegetables, and leftover containers. Gemma 4's multimodal architecture handles this reliably, even for cluttered or poorly lit photos.
- Instruction following: generating a well-structured recipe means strictly following a Markdown template with headings, bullet lists, and numbered steps. The instruction-tuned model follows these formatting constraints consistently, which is critical for the progressive rendering to look correct as it streams.
- Speed vs. quality balance: the MoE architecture activates only about 4B parameters per token while drawing on the knowledge stored across all 26B parameters. In practice, this produces recipe quality close to the denser models at a latency that works well for a streaming UX.
Comparing the other Gemma 4 sizes for this use case:
Why not the small E2B model? E2B is excellent for offline, on-device execution, but its much smaller parameter count lacks the deep "world knowledge" needed to reliably turn a cluttered photo into a complete, well-structured recipe.
Why not the dense 31B model? A dense model runs all 31B parameters for every token, so it needs far more compute per token than the MoE variant. In a serverless architecture that shows up as a longer Time-to-First-Token (TTFT) and slower generation, which raises the risk of HTTP 504 gateway timeouts before the proxy has returned any data.
The optimal route (26B A4B): the Mixture-of-Experts variant hit the sweet spot. "A4B" refers to its roughly 4B active parameters per token, not to 4-bit quantization. The model keeps the knowledge of its 26B total parameters for complex visual extraction, while each token costs roughly the compute of a 4B model, which keeps token generation fast.
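A rough back-of-envelope estimate shows why active parameters matter for latency. It uses the common approximation that a transformer forward pass costs about $2N$ FLOPs per generated token, where $N$ is the number of parameters actually used for that token (4B active for the MoE model, all 31B for the dense one), and it ignores attention over the context and MoE routing overhead:

$$
\begin{aligned}
\text{26B A4B (MoE):}\quad & 2 \times 4\times10^{9} = 8\times10^{9}\ \text{FLOPs/token} \\
\text{31B (dense):}\quad & 2 \times 31\times10^{9} = 62\times10^{9}\ \text{FLOPs/token} \\
\text{ratio:}\quad & 62 / 8 \approx 7.8
\end{aligned}
$$

This is an estimate, not a measurement; real latency also depends on the provider's hardware, batching, and memory bandwidth. The saving is in compute, not memory: all 26B parameters still have to be loaded on the serving hardware.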
The architecture: an opaque streaming proxy
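The most interesting engineering decision was how the model is integrated. The frontend never touches the API key or sees the system prompt. The full flow: the browser compresses up to 3 photos and sends them to a Next.js API route; the route attaches the system prompt and the OpenRouter API key server-side and forwards the request to Gemma 4; the model's output streams back through the route to the browser, which renders it progressively as Markdown.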
The Next.js API route runs on the Edge Runtime, and this is essential. Standard serverless functions can time out after as little as 10 seconds (depending on plan and configuration), which isn't enough for a full recipe stream. An Edge Function only has to start sending its response promptly and can then keep streaming for much longer, so Gemma 4 can take the time it needs while every token pipes through immediately.
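A minimal sketch of such a proxy as a Next.js App Router route handler. It assumes the client sends JSON of the form `{ images: string[] }` containing base64 data URLs, an `OPENROUTER_API_KEY` environment variable, and a placeholder `SYSTEM_PROMPT`; these names and the file path are hypothetical, not taken from the PantryLens source. Because OpenRouter's endpoint is OpenAI-compatible, images go in as `image_url` content parts, and returning `upstream.body` directly passes the event stream through without buffering it.

```typescript
// app/api/recipe/route.ts (hypothetical path)
export const runtime = "edge"; // run on the Edge Runtime for long-lived streaming

// Placeholder: the real prompt holds the formatting rules and recipe template.
const SYSTEM_PROMPT = "Generate one complete recipe in Markdown from the ingredients in the photos.";
const MODEL = "google/gemma-4-26b-a4b-it";
const MAX_IMAGES = 3;

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

/** Builds an OpenAI-compatible chat request with each image as an image_url part. */
export function buildChatRequest(images: string[]) {
  const content: ContentPart[] = [
    { type: "text", text: "Here are photos of my ingredients." },
    ...images.slice(0, MAX_IMAGES).map((url): ContentPart => ({ type: "image_url", image_url: { url } })),
  ];
  return {
    model: MODEL,
    stream: true,
    messages: [
      { role: "system", content: SYSTEM_PROMPT }, // stays on the server
      { role: "user", content },
    ],
  };
}

export async function POST(req: Request): Promise<Response> {
  const { images } = (await req.json()) as { images?: unknown };
  if (!Array.isArray(images) || images.length === 0 || !images.every((i) => typeof i === "string")) {
    return new Response("Expected { images: string[] }", { status: 400 });
  }

  const upstream = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      // The key never reaches the browser; the client only talks to this route.
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildChatRequest(images)),
  });

  if (!upstream.ok || !upstream.body) {
    return new Response("Upstream error", { status: 502 });
  }

  // Pipe the SSE stream straight through so tokens reach the client as they arrive.
  return new Response(upstream.body, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}
```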
Client-side image compression was another non-obvious requirement. Vercel limits the request payload size of its functions, and raw iPhone photos are 4–12 MB each, so three of them would blow the limit before a single byte of AI processing begins. The solution: compress each image on-device using the browser's Canvas API (resize to ≤1024 px, JPEG at 75% quality) before the upload. A 10 MB photo becomes ~150 KB. This happens invisibly, in under a second on modern phones, including older iPhones.
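A minimal browser-side sketch of that step, assuming `createImageBitmap` and `<canvas>` are available (as in current mobile browsers) and that the 1024 px limit applies to the longer side. The function names are hypothetical. The sizing math lives in a pure helper so it can be unit-tested without a DOM.

```typescript
const MAX_EDGE = 1024; // longest side after resizing (assumed interpretation of "≤1024 px")
const JPEG_QUALITY = 0.75;

/** Scales (width, height) so the longer side is at most maxEdge; never upscales. */
export function fitWithin(width: number, height: number, maxEdge = MAX_EDGE): { width: number; height: number } {
  const scale = Math.min(1, maxEdge / Math.max(width, height));
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

/** Resizes an image on a canvas and re-encodes it as JPEG. Browser-only. */
export async function compressImage(file: Blob): Promise<Blob> {
  const bitmap = await createImageBitmap(file);
  const { width, height } = fitWithin(bitmap.width, bitmap.height);
  const canvas = document.createElement("canvas");
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext("2d");
  if (!ctx) throw new Error("2D canvas context unavailable");
  ctx.drawImage(bitmap, 0, 0, width, height);
  bitmap.close(); // free the decoded full-size image
  return new Promise<Blob>((resolve, reject) =>
    canvas.toBlob(
      (blob) => (blob ? resolve(blob) : reject(new Error("JPEG encoding failed"))),
      "image/jpeg",
      JPEG_QUALITY,
    ),
  );
}
```

For a 4032×3024 photo, `fitWithin` returns 1024×768, roughly one-fifteenth of the original pixel count, before JPEG compression reduces the size further.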
The system prompt
Getting Gemma 4 to output clean, consistently structured Markdown for streaming required careful prompt engineering. The key lessons:
- Explicit blank-line rules: without being told explicitly that every ### heading must have a blank line before and after it, the model occasionally merged headings and body text onto the same line, which broke the Markdown renderer.
- Numbered list enforcement: instructions would sometimes collapse into a single paragraph unless the prompt explicitly stated "each step must be on its own line — never merge steps into a paragraph."
- Pantry staples assumption: by default the model sometimes refused to generate a recipe if it couldn't see salt or oil in the photo. Telling it to assume common pantry staples are always on hand (even if not visible) fixed this.
- Single-response constraint: without explicit instruction to never ask follow-up questions, the model occasionally responded with clarifying questions ("Are these all the ingredients?") instead of generating the recipe immediately.
The final prompt includes a section of explicit formatting rules before the template:
FORMATTING RULES — follow exactly:
- The recipe title uses ## (two hashes). It must be alone on its own line.
- Each section header uses ### (three hashes). It must be alone on its own line.
- Every heading must have one blank line before it AND one blank line after it.
- The instructions section is a numbered list. Each step is on its own line. Never merge steps into a paragraph.
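Rules like these can also be checked automatically, for example in a Jest test over saved model outputs. The function below is a hypothetical sketch, not from the PantryLens repo, that flags `##` and `###` headings missing a blank line before or after them. It only checks spacing: a heading fused into the middle of a paragraph doesn't start a line, so it isn't caught.

```typescript
/** Returns 1-based line numbers of ## / ### headings lacking a blank line before or after. */
export function findCrampedHeadings(markdown: string): number[] {
  const lines = markdown.split(/\r?\n/);
  // Positions outside the document count as blank, so the first and last lines are exempt.
  const isBlank = (i: number) => i < 0 || i >= lines.length || lines[i].trim() === "";
  const bad: number[] = [];
  lines.forEach((line, i) => {
    if (!/^#{2,3}\s/.test(line)) return; // only ## and ### headings
    if (!isBlank(i - 1) || !isBlank(i + 1)) bad.push(i + 1);
  });
  return bad;
}
```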
Wrapping Up: The Application Design
The architecture here — Edge Function as opaque AI proxy, client-side compression, streaming passthrough — is a reusable pattern for any vision AI application that needs to:
- Keep API credentials out of the frontend
- Handle large image payloads within serverless limits
- Deliver a live "typing" UX without server timeouts
PantryLens is the simplest possible version of this pattern, which makes the code easy to read and adapt.