This is a submission for the Google AI Studio Multimodal Challenge
## What I Built
ShopHealth Assistant is an AI-powered mobile application that revolutionizes how consumers understand product ingredients. By simply uploading an image or using live camera capture, users can instantly analyze packaged products to identify potential health risks, allergens, and additives. The app provides a comprehensive health score (0-100) with color-coded risk indicators and detailed ingredient breakdowns, making informed shopping decisions effortless.
The problem this solves is critical: most consumers struggle to understand complex ingredient lists and identify potential health concerns in packaged foods. ShopHealth Assistant democratizes this knowledge by providing instant, intelligent analysis that anyone can understand.
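To make the color coding concrete, here is a minimal sketch of how a 0-100 score could map onto the green/yellow/red badge. The thresholds (and the assumption that a higher score means fewer concerns) are illustrative only; in the app the badge comes back as part of the model's structured response (see the `badge` field in the schema later in this post).

```typescript
// Illustrative only: thresholds and score direction are assumptions,
// not the app's actual logic (the badge is returned by the analysis itself).
type Badge = "green" | "yellow" | "red";

function scoreToBadge(score: number): Badge {
  if (score >= 70) return "green";  // low concern
  if (score >= 40) return "yellow"; // moderate concern
  return "red";                     // high concern
}
```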
## Demo
GitHub Repo: https://github.com/mr-teslaa/ai-shop-health-assistant/
The application features a clean, intuitive interface with three main interaction modes:
- Upload Image: Traditional image upload functionality for analyzing product labels
- Take Picture: Camera integration for capturing product images on-the-go
- Realtime Scan: Live camera preview with real-time analysis and audio feedback
Key features demonstrated:
- Risk score calculation with color-coded badges (green/yellow/red)
- Detailed ingredient categorization (additives, sweeteners, allergens, etc.)
- Statistical breakdown showing total ingredients, additives, and sweeteners
- Interactive drawer interface for seamless user experience
- Audio narration for accessibility and hands-free operation
Note: The screenshots show the app's sophisticated UI with risk scoring, ingredient analysis, and the three-tab navigation system for different scanning modes.
### First Prompt
You are "ShopHealth Assistant" — an expert product + ML engineer and technical writer. You will produce precise, machine-readable outputs as well as human explanations. When asked to parse an image or extract ingredients, prefer returning structured JSON matching the schema provided. Be explicit about confidence and any ambiguity. When generating code, produce working, copy-paste-ready snippets in Python (FastAPI) and React Native (Expo). Always include short human-friendly text summaries for users and a short developer note describing assumptions.
Task A — Build spec / implementation plan:
Produce a full implementation-ready specification and step-by-step plan for a mobile/web app "ShopHealth" that:
- accepts an image upload or live camera capture of a packaged product (or a barcode),
- extracts product name, barcode (if present), and the ingredients list,
- normalizes ingredients and flags allergens/additives and other health risks based on a per-user profile,
- returns a human health report plus a structured JSON result (schema provided below).
Deliver:
1. Short product description & prioritized MVP feature list (1-paragraph + list).
2. System architecture bullet list: mobile, backend, ML pipeline, 3rd-party APIs and storage.
3. Exact inference pipeline (preprocess → OCR → parse → NER → knowledge lookup → scoring → report).
4. A recommended tech stack, including suggestions for on-device quick-scan vs cloud deep-scan.
5. REST endpoints and JSON contracts (FastAPI style).
6. Example React Native camera snippet and FastAPI upload endpoint.
7. A structured JSON response schema for parse results (see schema below).
8. Testing checklist & acceptance criteria for MVP.
Task B — When given an image (or OCRed text), always return both:
- A human text summary suitable for showing to a user on mobile (one short paragraph + one-line badge color).
- A structured JSON object following the `scan_result` schema below.
Important behavior:
- If confidence in any key extraction is <70%, mark that field `confidence_low: true` and include suggestions for re-capture (lighting, closer, barcode).
- If the product is matched via barcode to an external DB, include `product_source` and link to the matched DB id.
- For allergen matches, show the exact matched token and the span (substring) from the OCRed text, plus confidence.
- Use the `parsed_ingredients` array to return normalized tokens and categories (allergen, additive, sweetener, preservative, filler, unknown).
Return only valid JSON for the structured output field when asked to produce JSON — do not include explanatory text inside that JSON object. Provide human-friendly text separately.
---
## scan_result JSON schema
    {
      "scan_id": "string",
      "product": {
        "barcode": "string or null",
        "name": "string or null",
        "brand": "string or null",
        "product_source": "string or null"
      },
      "raw_ocr_text": "string or null",
      "parsed_ingredients": [
        {
          "original": "string",
          "normalized": "string",
          "category": "allergen | additive | sweetener | preservative | filler | spice | ingredient | unknown",
          "is_allergen": true,
          "is_added_sugar": false,
          "e_number": "string or null",
          "confidence": 0.0,
          "text_span": "string or null"
        }
      ],
      "flags": {
        "allergens": ["string"],
        "high_sugar": true,
        "high_salt": false,
        "additives_present": ["string"]
      },
      "risk_score": 0.0,
      "confidence": 0.0,
      "confidence_low": false,
      "human_summary": "string",
      "badge": "green | yellow | red",
      "suggestions": ["string"]
    }
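For reference, this schema maps one-to-one onto a TypeScript type that the frontend can validate against. The field names below come straight from the schema above; treating nullable fields as `string | null` rather than optional is my own assumption.

```typescript
// Mirrors the scan_result schema above; nullability choices are assumptions.
type IngredientCategory =
  | "allergen" | "additive" | "sweetener" | "preservative"
  | "filler" | "spice" | "ingredient" | "unknown";

interface ParsedIngredient {
  original: string;
  normalized: string;
  category: IngredientCategory;
  is_allergen: boolean;
  is_added_sugar: boolean;
  e_number: string | null;
  confidence: number;       // 0.0 - 1.0
  text_span: string | null; // matched substring from the OCR text
}

interface ScanResult {
  scan_id: string;
  product: {
    barcode: string | null;
    name: string | null;
    brand: string | null;
    product_source: string | null;
  };
  raw_ocr_text: string | null;
  parsed_ingredients: ParsedIngredient[];
  flags: {
    allergens: string[];
    high_sugar: boolean;
    high_salt: boolean;
    additives_present: string[];
  };
  risk_score: number;
  confidence: number;
  confidence_low: boolean;
  human_summary: string;
  badge: "green" | "yellow" | "red";
  suggestions: string[];
}
```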
### Second Prompt
Instead of showing "Ingredient Analysis", show the results in stats cards with appropriate icons in a grid auto layout.
Use lucide icons for all icons.
### Third Prompt
In our grid stats card, below the stats, we want to show all the ingredient names inside appropriate cards, for example Sodium Citrate: Additive, Glucose Anhydrous: Sweetener.
### Fourth Prompt
Instead of having only the image upload option, add a tab section with 2 tabs: Upload Image and Take Live Picture.
Upload Image: User can upload an image just as we are doing now and get the analysis report as we have now.
Take Live Picture: User can use their camera to take a live picture and get the analysis report as we have now.
### Fifth Prompt
Now add a 3rd tab for Realtime Analysis. When the user goes to the Realtime Analysis tab, it will enable the user's camera and show a full-screen camera preview with a minimized drawer at the bottom. When the user swipes up, the drawer will take half of the screen and show the analysis report as we have now; if the user swipes up again, the drawer will take the full screen for better visibility of the product analysis. In the drawer the user can swipe down to minimize it again, or click the X button at the top-right corner of the drawer to minimize it. In our Realtime Analysis we will have a speaker icon at the top-left corner of the full-screen camera preview that enables the user to hear the analysis in real time; the user can toggle it on or off. We will not enable the microphone and will never take input from the microphone.
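The speaker toggle described there maps naturally onto the browser's built-in Web Speech API. Below is a minimal sketch, assuming the narration text is the `human_summary` string from a scan result; the function names are hypothetical, not the app's actual code.

```typescript
// Minimal narration helper using the standard Web Speech API (browser only).
// `summary` would be the human_summary field from a scan result.
let narrationEnabled = false;

function toggleNarration(): void {
  narrationEnabled = !narrationEnabled;
  if (!narrationEnabled) window.speechSynthesis.cancel(); // stop mid-sentence
}

function narrate(summary: string): void {
  if (!narrationEnabled) return;
  window.speechSynthesis.cancel(); // drop any queued speech first
  const utterance = new SpeechSynthesisUtterance(summary);
  utterance.rate = 1.0;
  window.speechSynthesis.speak(utterance);
}
```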
## How I Used Google AI Studio
Google AI Studio's multimodal capabilities power the core functionality of ShopHealth Assistant through several key implementations:
- Vision API Integration: Utilized Gemini's vision models to extract text from product packaging images with high-accuracy OCR
- Natural Language Processing: Leveraged Gemini's language understanding to parse and normalize ingredient lists from raw OCR text
- Structured Data Extraction: Implemented prompt engineering to consistently return JSON-formatted results matching our `scan_result` schema
- Real-time Processing: Optimized API calls for live camera analysis with minimal latency
- Confidence Scoring: Utilized Gemini's built-in confidence metrics to provide reliable analysis results
The application architecture uses Google AI Studio as the primary ML backend, handling complex multimodal tasks that would require multiple specialized models if built from scratch.
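Conceptually, the pipeline collapses into one multimodal call: send the captured image together with the system prompt, ask for JSON matching the `scan_result` schema, and parse the reply. A minimal sketch using the Google Generative AI JavaScript SDK; the package, model name, and prompt wording here are assumptions rather than the project's exact code.

```typescript
// Sketch only: package, model name, and prompt wording are assumptions.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY ?? "");

async function analyzeLabel(base64Jpeg: string): Promise<unknown> {
  const model = genAI.getGenerativeModel({ model: "gemini-2.5-pro" });

  const result = await model.generateContent([
    "Extract the ingredients from this product label and return ONLY valid JSON " +
      "matching the scan_result schema (scan_id, product, parsed_ingredients, " +
      "flags, risk_score, confidence, human_summary, badge, suggestions).",
    { inlineData: { mimeType: "image/jpeg", data: base64Jpeg } },
  ]);

  // The model is instructed to return bare JSON; parse it into the ScanResult shape.
  return JSON.parse(result.response.text());
}
```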
## Multimodal Features
Gemini 2.5 Pro's multimodal capabilities significantly enhance the user experience:
Vision + Language Processing:
- Combines OCR text extraction with semantic understanding to identify ingredients even when product labels have complex layouts, multiple languages, or poor image quality
- Processes barcode information alongside ingredient text for comprehensive product identification
Real-time Multimodal Analysis:
- Live camera processing with instant visual feedback from the same Gemini pipeline (see the frame-capture sketch after this list)
- Audio narration feature that speaks analysis results, making the app accessible for visually impaired users or hands-free scenarios
- Contextual understanding that can differentiate between ingredient lists and other text on packaging
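The realtime tab can feed the same analysis path by grabbing stills from the live preview at an interval rather than streaming video. Here is a minimal browser sketch; the interval, JPEG quality, and callback name are assumptions, not the project's exact code.

```typescript
// Sketch: attach the camera stream to a <video>, then periodically grab a
// JPEG frame to hand to the analysis call. Interval and quality are assumptions.
async function startRealtimeScan(
  video: HTMLVideoElement,
  onFrame: (base64Jpeg: string) => void,
): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" }, // rear camera on phones
    audio: false,                         // microphone is never requested
  });
  video.srcObject = stream;
  await video.play();

  const canvas = document.createElement("canvas");
  const ctx = canvas.getContext("2d");

  setInterval(() => {
    if (!ctx || video.videoWidth === 0) return;
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx.drawImage(video, 0, 0);
    // Strip the data-URL prefix to keep just the base64 payload.
    const base64 = canvas.toDataURL("image/jpeg", 0.8).split(",")[1];
    onFrame(base64);
  }, 3000); // every ~3 seconds; purely an assumption for the sketch
}
```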
Intelligent Content Understanding:
- Recognizes ingredient categories (allergens, additives, sweeteners) without requiring extensive training data
- Normalizes ingredient names across different naming conventions and languages
- Provides confidence scores for each detected element, so users know how reliable the analysis is (see the confidence-handling sketch after this list)
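The per-element confidence scores tie back to the <70% rule in the first prompt: low-confidence extractions are flagged and come with re-capture suggestions. A minimal sketch of how the client could surface that; the helper name and message wording are hypothetical.

```typescript
// Sketch: pick out the confidence-related fields of a scan_result and
// build a re-capture hint. Field names come from the schema in the first
// prompt; the message wording is an assumption.
interface ConfidenceView {
  confidence_low: boolean;
  suggestions: string[];
  parsed_ingredients: { normalized: string; confidence: number }[];
}

function recaptureHint(result: ConfidenceView): string | null {
  const lowConfidence = result.parsed_ingredients.filter((i) => i.confidence < 0.7);
  if (!result.confidence_low && lowConfidence.length === 0) return null;
  const tips =
    result.suggestions.length > 0
      ? result.suggestions.join("; ")
      : "retake the photo closer and in better lighting"; // fallback wording is an assumption
  return `Some items were hard to read (${lowConfidence.length} low-confidence). Try: ${tips}`;
}
```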
This multimodal approach transforms what would traditionally be a cumbersome manual process into an intuitive, accessible, and highly accurate automated solution. Users can now make informed health decisions in seconds rather than minutes, regardless of their technical expertise or familiarity with ingredient terminology.
Built with React (Vite) for the frontend, containerized with Docker and served through Nginx, powered by Google AI Studio's Gemini models. The application demonstrates practical AI implementation for consumer health and wellness, with a production-ready containerized deployment setup for scalability and reliability.