We’ve all been there: staring at a delicious plate of Pad Thai, wondering if it's 500 or 900 calories. Manually logging food is the ultimate productivity killer. But what if you could just snap a photo and have a multimodal AI instantly break down the ingredients, estimate the weight, and calculate the macronutrients with surgical precision? 🥑
In this tutorial, we are building a real-time nutritional analysis engine. We will leverage GPT-4o's vision capabilities for visual recognition, use Pydantic for rigorous data validation, and wrap it all in a high-performance FastAPI backend. Whether you are building a fitness app or a wellness dashboard, mastering Pydantic structured output with LLMs is a superpower you need in 2024.
🏗️ The Architecture: From Image to Structured Insights
The flow is simple but powerful. We take a raw image, pass it through a vision-capable LLM with a strict JSON schema, and validate the output before sending it back to our React Native frontend.
sequenceDiagram
    participant App as React Native Mobile
    participant API as FastAPI Backend
    participant AI as OpenAI GPT-4o
    participant DB as Persistence Layer
    App->>API: POST /analyze-meal (Base64 Image)
    API->>AI: Image + System Prompt (JSON Schema)
    Note over AI: Multi-modal Analysis &<br/>Macronutrient Estimation
    AI-->>API: Structured JSON Response
    API->>API: Pydantic Validation & Parsing
    API->>DB: Save Nutritional Log
    API-->>App: 200 OK (Calculated Macros)
🛠️ Prerequisites
To follow along, you'll need:
- OpenAI API Key (with GPT-4o access)
- Python 3.10+
- The following Python packages:
pip install fastapi uvicorn pydantic openai python-multipart
1. Defining the Schema with Pydantic
The biggest challenge with LLMs is "hallucination" and inconsistent formatting. By using Pydantic, we force GPT-4o to return data that fits our application's data model perfectly.
from pydantic import BaseModel, Field
from typing import List

class Ingredient(BaseModel):
    name: str = Field(description="Name of the food item")
    estimated_weight_g: float = Field(description="Estimated weight in grams")
    calories: int
    protein_g: float
    carbs_g: float
    fats_g: float

class MealAnalysis(BaseModel):
    meal_name: str = Field(description="A descriptive name for the dish")
    total_calories: int
    ingredients: List[Ingredient]
    confidence_score: float = Field(description="Value between 0 and 1")
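Before wiring these models into the API, it's worth seeing the validation guarantee in action. The payload below is a hypothetical example of what GPT-4o might return (the models are repeated so the snippet runs standalone); the malformed variant shows how Pydantic loudly rejects data that doesn't fit:

```python
from typing import List
from pydantic import BaseModel, Field, ValidationError

class Ingredient(BaseModel):
    name: str = Field(description="Name of the food item")
    estimated_weight_g: float = Field(description="Estimated weight in grams")
    calories: int
    protein_g: float
    carbs_g: float
    fats_g: float

class MealAnalysis(BaseModel):
    meal_name: str = Field(description="A descriptive name for the dish")
    total_calories: int
    ingredients: List[Ingredient]
    confidence_score: float = Field(description="Value between 0 and 1")

# Hypothetical payload, shaped like a GPT-4o structured response
raw = {
    "meal_name": "Pad Thai",
    "total_calories": 650,
    "confidence_score": 0.82,
    "ingredients": [
        {"name": "rice noodles", "estimated_weight_g": 180.0,
         "calories": 310, "protein_g": 5.0, "carbs_g": 68.0, "fats_g": 1.5},
    ],
}

meal = MealAnalysis.model_validate(raw)
print(meal.meal_name, meal.total_calories)  # → Pad Thai 650

# A malformed payload (calories as free text) never reaches your app code
try:
    MealAnalysis.model_validate({**raw, "total_calories": "lots"})
except ValidationError as e:
    print("rejected:", e.error_count(), "error(s)")  # → rejected: 1 error(s)
```

This is the whole point of the schema-first approach: any response that doesn't parse into `MealAnalysis` fails fast at the boundary instead of corrupting your nutrition logs downstream.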
2. Prompt Engineering for Nutritional Accuracy
Prompting for vision is different. We need the model to act as a professional nutritionist.
Pro Tip: For more production-ready prompt templates and advanced LLM patterns, check out the deep dives over at WellAlly Tech Blog, where we explore scaling AI-driven wellness solutions.
SYSTEM_PROMPT = """
You are a highly accurate nutritional analysis assistant.
Analyze the provided image and estimate the macronutrients.
Break down the dish into its individual ingredients.
Be realistic about portion sizes based on standard plate dimensions.
Return the data strictly in JSON format.
"""
3. The FastAPI Implementation
Now, let's tie it all together. We will use the openai Python SDK's latest "Structured Outputs" feature to ensure the response matches our MealAnalysis model.
import base64
import os

from fastapi import FastAPI, UploadFile, File
from openai import OpenAI

app = FastAPI()
# Read the key from the environment rather than hardcoding it in source
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def encode_image(file: bytes) -> str:
    return base64.b64encode(file).decode("utf-8")

@app.post("/analyze-meal", response_model=MealAnalysis)
async def analyze_meal(file: UploadFile = File(...)):
    contents = await file.read()
    base64_image = encode_image(contents)

    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is in this meal?"},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}},
                ],
            },
        ],
        response_format=MealAnalysis,  # 🚀 The magic happens here
    )
    return response.choices[0].message.parsed

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
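The encoding step is easy to get wrong, so here is a quick standalone round-trip check of the helper. The byte string is a stand-in for real JPEG data; in the endpoint those bytes come from the uploaded file:

```python
import base64

def encode_image(data: bytes) -> str:
    # Same helper as in the endpoint: raw bytes -> base64 text
    return base64.b64encode(data).decode("utf-8")

# Stand-in bytes; a real request would use actual JPEG content
fake_jpeg = b"\xff\xd8\xff\xe0 not a real photo"
b64 = encode_image(fake_jpeg)
data_url = f"data:image/jpeg;base64,{b64}"

# Decoding recovers the original bytes exactly
assert base64.b64decode(b64) == fake_jpeg
print(data_url[:22])  # → data:image/jpeg;base64
```

Once the server is running (e.g. `uvicorn main:app --reload`, assuming the file is named main.py), you can exercise the endpoint with any HTTP client, for example `curl -F "file=@photo.jpg" http://localhost:8000/analyze-meal`.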
4. Frontend Integration (React Native Snippet)
On the mobile side, you'll capture the image and send it to our /analyze-meal endpoint.
const uploadImage = async (uri) => {
  const formData = new FormData();
  formData.append('file', {
    uri,
    name: 'meal.jpg',
    type: 'image/jpeg',
  });

  // Don't set Content-Type manually: fetch adds the multipart
  // boundary to the header automatically, and overriding it breaks parsing.
  const response = await fetch('https://your-api.com/analyze-meal', {
    method: 'POST',
    body: formData,
  });

  const data = await response.json();
  console.log('Macros Detected:', data.total_calories);
};
🚀 Taking it Further
While this implementation is a fantastic start, production environments need to handle edge cases like low lighting, blurry images, and food the model simply can't identify.
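One lightweight defence is to gate on the confidence_score field we already ask the model for. Below is a minimal sketch of such a policy; the 0.5 cutoff is purely illustrative and should be tuned against real traffic:

```python
MIN_CONFIDENCE = 0.5  # illustrative cutoff, not a recommendation

def is_reliable(confidence_score: float, threshold: float = MIN_CONFIDENCE) -> bool:
    """Decide whether an analysis is trustworthy enough to log."""
    return confidence_score >= threshold

print(is_reliable(0.82))  # → True  (clear, well-lit plate)
print(is_reliable(0.31))  # → False (blurry late-night snack photo)
```

In the endpoint, you would call this on the parsed result's confidence_score and return an HTTP 422 with a "retake the photo" message when it fails, rather than logging a guess.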
If you're interested in learning how to implement RAG (Retrieval-Augmented Generation) for specific restaurant menus or how to handle multi-turn visual conversations, I highly recommend browsing the WellAlly Tech Blog. They have some excellent resources on building high-performance health-tech stacks and integrating AI into daily workflows.
Conclusion
We've just turned a simple photograph into a rich, structured dataset of proteins, carbs, and fats. By combining GPT-4o's vision with Pydantic's structure, we eliminate the unpredictability of AI and create a reliable foundation for any health application.
What are you building next? Drop a comment below or share your latest AI project! 🥑💻