DEV Community

albert nahas

Posted on • Originally published at leandine.hashnode.dev

Building a Menu Scanner with OCR and AI

Scanning a restaurant menu with your phone and instantly seeing structured nutritional data feels like magic—but with the right combination of OCR and AI, it’s a project well within reach for developers. As consumer demand for transparent nutrition information grows, building a menu scanner that bridges the gap between printed menus and actionable health data is both an exciting technical challenge and a genuinely useful tool.

Let’s explore how to turn images of menus into structured, actionable nutrition data using modern OCR (Optical Character Recognition) and food AI techniques.

The Problem: From Pixels to Nutrition Insights

Menus come in all shapes, sizes, and fonts. Lighting conditions in restaurants are rarely ideal. To deliver a seamless experience, a menu scanner must:

  • Capture menu images (from camera input or photos)
  • Extract textual data accurately (even under suboptimal conditions)
  • Parse and structure the data (dish names, descriptions, prices)
  • Recognize food items and map them to nutrition databases
  • Present clear, actionable nutritional info

This pipeline combines classic computer vision, NLP, and a dash of AI wizardry. Let’s break down each stage, with practical guidance and code samples for building your own menu recognition tool.
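The stages above can be sketched as a small typed pipeline before any real OCR or AI is wired in. This is a minimal sketch; the MenuItem and NutritionInfo shapes and the stage function names are illustrative, not a fixed schema:

```typescript
// Illustrative types for the scanner pipeline (names are assumptions).
interface MenuItem {
  name: string;
  description: string;
  price: number;
}

interface NutritionInfo {
  calories: number;
  proteinG: number;
  fatG: number;
  carbsG: number;
}

// Each stage is a plain function, so real OCR / AI backends can be swapped in later.
type OcrFn = (image: Blob | string) => Promise<string>;
type ParseFn = (rawText: string) => MenuItem[];
type NutritionFn = (item: MenuItem) => Promise<NutritionInfo>;

async function scanMenu(
  image: Blob | string,
  ocr: OcrFn,
  parse: ParseFn,
  lookup: NutritionFn
): Promise<Array<MenuItem & NutritionInfo>> {
  const text = await ocr(image);          // Step 2: OCR
  const items = parse(text);              // Step 3: structure the text
  return Promise.all(                     // Step 4: enrich each item
    items.map(async item => ({ ...item, ...(await lookup(item)) }))
  );
}
```

Keeping each stage behind a function type makes it easy to start with stubs and replace them one at a time as you build the sections below.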

Step 1: Capturing the Menu Image

Most modern web and mobile frameworks make it straightforward to access the camera or file system. In a browser, a file input with the capture attribute lets users snap a menu photo:

<input type="file" accept="image/*" capture="environment" id="menuPhoto">

For mobile apps, React Native’s react-native-image-picker or Flutter’s image_picker package make this step a breeze.

Pro Tip: Encourage users to take clear, flat photos with good lighting for best OCR results.
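Before handing the photo to OCR, it's worth rejecting files that will obviously fail. A quick validation sketch; the size thresholds and accepted MIME types here are assumptions to tune for your app:

```typescript
// Quick sanity checks before running OCR (thresholds are assumptions).
const MAX_BYTES = 10 * 1024 * 1024; // 10 MB upload cap
const ACCEPTED = ["image/jpeg", "image/png", "image/webp"];

function validateMenuPhoto(file: { type: string; size: number }): string | null {
  if (!ACCEPTED.includes(file.type)) return "Unsupported image format";
  if (file.size > MAX_BYTES) return "Image too large";
  if (file.size < 10 * 1024) return "Image looks too small to contain a readable menu";
  return null; // null means the file passed the checks
}
```

The function takes only the type and size fields, so it works on a browser File object or any server-side upload metadata.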

Step 2: Extracting Text with OCR

Once you have the menu image, the next job is to extract text. OCR (Optical Character Recognition) has seen huge advances thanks to deep learning.

Popular OCR Libraries

  • Tesseract.js: Pure JavaScript port of the popular Tesseract OCR engine—runs in-browser or Node.js.
  • Google Cloud Vision API: Commercial API, highly accurate, supports handwriting and multi-language.
  • Microsoft Azure Computer Vision: Another strong, cloud-based option.

Example: Using Tesseract.js in the Browser

import Tesseract from 'tesseract.js';

const image = document.getElementById('menuPhoto').files[0];

Tesseract.recognize(
  image,
  'eng',
  { logger: m => console.log(m) }
).then(({ data: { text } }) => {
  console.log('OCR Text:', text);
});

Tips for Better OCR:

  • Preprocess images: increase contrast, convert to grayscale, deskew.
  • Crop to menu area if possible.
  • Consider cloud OCR for tough cases (handwritten, unusual fonts).
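The first two tips come down to pixel math you would normally run through a canvas getImageData() call. A sketch of the core operation, grayscale conversion plus a linear contrast stretch, on a raw RGBA array:

```typescript
// Grayscale + linear contrast stretch on raw RGBA pixel data.
// In the browser, `data` would come from canvas getImageData().data.
function preprocessRGBA(data: Uint8ClampedArray): Uint8ClampedArray {
  const out = new Uint8ClampedArray(data.length);

  // Pass 1: convert each pixel to luminance and record the min/max.
  const gray: number[] = [];
  let min = 255;
  let max = 0;
  for (let i = 0; i < data.length; i += 4) {
    // Standard Rec. 601 luminance weights.
    const y = 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
    gray.push(y);
    if (y < min) min = y;
    if (y > max) max = y;
  }

  // Pass 2: stretch [min, max] to [0, 255] to boost contrast.
  const range = max - min || 1; // avoid division by zero on flat images
  for (let p = 0; p < gray.length; p++) {
    const v = Math.round(((gray[p] - min) / range) * 255);
    const i = p * 4;
    out[i] = out[i + 1] = out[i + 2] = v;
    out[i + 3] = data[i + 3]; // keep alpha
  }
  return out;
}
```

Writing the result back with putImageData before calling Tesseract often improves recognition noticeably on dim restaurant photos.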

Step 3: Structuring the OCR Output (Menu Parsing)

OCR gives you a block of text—often messy, with line breaks, inconsistent spacing, and sometimes errors. The next step is to convert this into a structured representation: typically, an array of menu items, each with a name, description, and price.

Parsing Strategies

  • Regex-based parsing: Useful when menus follow predictable patterns (e.g., "Dish Name .... $Price").
  • NLP techniques: Named Entity Recognition (NER) models can extract dish names, prices, and descriptions with more flexibility.
  • Custom heuristics: Combining rules, indentation, and font size (if available).

Example: Simple Regex Extraction

// Sample OCR output
const ocrText = `
Grilled Salmon .......... $15.99
A healthy portion of grilled salmon, served with steamed veggies.

Caesar Salad ............ $9.50
Romaine, parmesan, croutons, Caesar dressing.
`;

// Regex to extract "Dish Name .... $Price", anchored to each line
const dishRegex = /^([A-Za-z ]+?)[\s.]*\$(\d+\.\d{2})/gm;

const lines = ocrText.split('\n');
const menuItems = [];
let match;

while ((match = dishRegex.exec(ocrText)) !== null) {
  const name = match[1].trim();
  const price = parseFloat(match[2]);
  // Optionally, grab the next line as the description
  const idx = lines.findIndex(line => line.includes(name));
  const description = lines[idx + 1]?.trim() || '';
  menuItems.push({ name, price, description });
}

console.log(menuItems);
// Output: [{ name: "Grilled Salmon", price: 15.99, description: ... }, ...]

For complex or highly variable menus, consider using NLP libraries like spaCy (Python) or fine-tuning a transformer-based NER model (e.g., BERT) to identify dish names and prices.
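Before reaching for NER, a line-based heuristic often goes further than a single regex: treat any line ending in a price as a new item and attach the following non-price lines as its description. A sketch of that approach (the price pattern assumes US-style "$9.50" formatting):

```typescript
interface ParsedItem {
  name: string;
  price: number;
  description: string;
}

// Heuristic line-based parser: a line ending in a price starts a new item;
// subsequent non-price lines are appended to that item's description.
function parseMenuLines(ocrText: string): ParsedItem[] {
  const priceLine = /^(.*?)[\s.]*\$(\d+(?:\.\d{2})?)\s*$/;
  const items: ParsedItem[] = [];

  for (const raw of ocrText.split("\n")) {
    const line = raw.trim();
    if (!line) continue; // skip blank lines

    const m = line.match(priceLine);
    if (m && m[1].trim()) {
      items.push({ name: m[1].trim(), price: parseFloat(m[2]), description: "" });
    } else if (items.length > 0) {
      const last = items[items.length - 1];
      last.description = last.description ? `${last.description} ${line}` : line;
    }
  }
  return items;
}
```

Because it scans line by line, this handles multi-line descriptions and doesn't depend on the dot leaders being recognized correctly by the OCR engine.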

Step 4: Menu Recognition with Food AI

Now you have structured menu data, but you need to map each dish to real-world nutritional info. This is where food AI and menu recognition shine.

Challenges

  • Dishes may have creative names ("The Big Kahuna Burger")
  • Ingredients are often not listed
  • Portion sizes may be ambiguous

Approaches

  1. Database Lookup: Search for dish names in nutrition databases (e.g., USDA FoodData Central, Nutritionix API).
  2. AI-powered Mapping: Use NLP/AI models to infer dish type, ingredients, and likely nutrition profile from the dish name and description.
  3. Hybrid: Use AI to classify the dish, then fetch the closest match from a database.
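The database-lookup approach needs fuzzy matching, since OCR'd dish names rarely match database entries exactly. One simple scorer is token-set (Jaccard) overlap; the tiny in-memory table below is illustrative, standing in for a real USDA or Nutritionix query:

```typescript
// Tiny illustrative nutrition table; a real app would query USDA FDC or Nutritionix.
const NUTRITION_DB = [
  { name: "grilled atlantic salmon", calories: 467 },
  { name: "caesar salad with dressing", calories: 481 },
  { name: "cheeseburger", calories: 535 },
];

// Lowercase, split on non-word characters, drop empties.
const tokenize = (s: string) => new Set(s.toLowerCase().split(/\W+/).filter(Boolean));

// Jaccard similarity between token sets: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  const inter = [...a].filter(t => b.has(t)).length;
  return inter / (a.size + b.size - inter);
}

// Return the best-scoring entry, or null when nothing clears the threshold.
function bestMatch(dishName: string, minScore = 0.25) {
  const q = tokenize(dishName);
  let best: { name: string; calories: number } | null = null;
  let bestScore = 0;
  for (const entry of NUTRITION_DB) {
    const score = jaccard(q, tokenize(entry.name));
    if (score > bestScore) {
      bestScore = score;
      best = entry;
    }
  }
  return bestScore >= minScore ? best : null;
}
```

The minScore cutoff is a judgment call: too low and "The Big Kahuna Burger" matches anything containing "burger"; too high and legitimate variants get dropped to the AI fallback.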

Example: Using OpenAI’s GPT Model for Dish Classification

Suppose you want to classify dishes and estimate calories using an LLM (Large Language Model) API:

async function estimateNutrition(dish: { name: string, description: string }) {
  const prompt = `Given the dish "${dish.name}" described as "${dish.description}", estimate the main ingredients and calories per serving.`;
  // Replace with your favorite LLM API client
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer YOUR_API_KEY`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: prompt }]
    })
  });
  const data = await response.json();
  return data.choices[0].message.content;
}

estimateNutrition({ name: "Grilled Salmon", description: "A healthy portion..." })
  .then(console.log);

Note: For production, you’ll want to combine this with a nutritional database for accuracy and consistency.
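A free-text LLM reply is also awkward to consume programmatically. A common workaround is to ask the model to answer in JSON and parse defensively; this sketch assumes a prompt that requests the shape {"ingredients": [...], "calories": n}:

```typescript
// Defensive parse of an LLM reply that was asked to answer in JSON, e.g.
// {"ingredients": ["salmon", "vegetables"], "calories": 450}
// Models sometimes wrap JSON in prose or code fences, so extract the first {...} span.
interface NutritionEstimate {
  ingredients: string[];
  calories: number;
}

function parseNutritionReply(reply: string): NutritionEstimate | null {
  const jsonMatch = reply.match(/\{[\s\S]*\}/);
  if (!jsonMatch) return null;
  try {
    const parsed = JSON.parse(jsonMatch[0]);
    if (!Array.isArray(parsed.ingredients) || typeof parsed.calories !== "number") {
      return null; // shape check failed
    }
    return { ingredients: parsed.ingredients, calories: parsed.calories };
  } catch {
    return null; // malformed JSON: fall back to a database lookup or re-prompt
  }
}
```

Returning null on any failure gives the caller a clean signal to retry the prompt or fall through to the database approach.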

Tools and APIs

  • USDA FoodData Central API: Free, comprehensive, but may require fuzzy matching.
  • Nutritionix API: Commercial, with extensive branded foods.
  • Food AI platforms: Some commercial APIs use machine learning to recognize dishes from text and images.
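As an example of the first option, here is a sketch against the FoodData Central search endpoint. The field names (foods, foodNutrients, nutrientName, unitName) follow the FDC documentation, but verify them against the current API reference before relying on them:

```typescript
// Sketch of a USDA FoodData Central search; field names per the FDC docs,
// verify against the current API reference before shipping.
async function searchFdc(query: string, apiKey: string): Promise<any[]> {
  const url =
    `https://api.nal.usda.gov/fdc/v1/foods/search` +
    `?api_key=${apiKey}&query=${encodeURIComponent(query)}`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`FDC search failed: ${res.status}`);
  const data = await res.json();
  return data.foods ?? [];
}

// Pull calories ("Energy", in kcal) out of one FDC food record.
function extractCalories(food: {
  foodNutrients: Array<{ nutrientName: string; value: number; unitName: string }>;
}): number | null {
  const energy = food.foodNutrients.find(
    n => n.nutrientName === "Energy" && n.unitName.toUpperCase() === "KCAL"
  );
  return energy ? energy.value : null;
}
```

Pairing searchFdc with the fuzzy matcher from Step 4 (search broadly, then score the returned descriptions) usually beats trusting the API's first hit.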

Step 5: Presenting the Nutrition Data

Once you have nutrition info, the final step is to display it clearly to users. Consider:

  • Highlighting calories, macronutrients (protein, fat, carbs), and allergens
  • Providing healthier swap suggestions (optional)
  • Allowing users to filter by dietary preferences
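The last bullet can be a simple tag-plus-threshold filter over the enriched items. A sketch, with an illustrative tag vocabulary (your app's tags and options will differ):

```typescript
// Simple dietary filter over enriched menu items (tag vocabulary is illustrative).
interface TaggedItem {
  name: string;
  calories: number;
  tags: string[];
}

function filterByPreferences(
  items: TaggedItem[],
  opts: { maxCalories?: number; requiredTags?: string[]; excludedTags?: string[] }
): TaggedItem[] {
  return items.filter(item => {
    if (opts.maxCalories !== undefined && item.calories > opts.maxCalories) return false;
    if (opts.requiredTags?.some(t => !item.tags.includes(t))) return false;
    if (opts.excludedTags?.some(t => item.tags.includes(t))) return false;
    return true;
  });
}
```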

A simple UI could show menu items alongside their estimated nutritional breakdown:

<ul>
  {menuItems.map(item => (
    <li key={item.name}>
      <strong>{item.name}</strong> (${item.price.toFixed(2)})<br />
      <em>{item.description}</em><br />
      <span>Calories: {item.calories} | Protein: {item.protein}g | Fat: {item.fat}g | Carbs: {item.carbs}g</span>
    </li>
  ))}
</ul>

For mobile, native components or frameworks like React Native/Flutter will let you build rich, interactive experiences.

Bonus: Making It Smarter Over Time

  • User Feedback: Let users correct dish recognition or nutrition info, and use those corrections to improve your models.
  • Crowdsourcing: Aggregate menu data and nutrition profiles across users and restaurants.
  • Image Recognition (advanced): Recognize food items from plate photos, not just menu text.
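For the feedback loop, it helps to aggregate corrections before overwriting an estimate. One robust option is the median of submitted calorie corrections, which resists outliers better than a mean; the minimum-vote threshold below is an assumption:

```typescript
// Aggregate user-submitted calorie corrections with a median, which resists
// outliers (typos, trolls) better than a mean. minVotes is an assumption.
function medianCorrection(corrections: number[], minVotes = 3): number | null {
  if (corrections.length < minVotes) return null; // not enough signal yet
  const sorted = [...corrections].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}
```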

Key Takeaways

Building a robust menu scanner involves a series of well-defined steps: capturing images, applying OCR for text extraction, parsing that text into structured menu items, and finally using food AI to estimate nutrition. While off-the-shelf tools can help at each stage, combining them thoughtfully—and refining your approach with real-world data—unlocks a powerful solution for menu recognition and nutrition analysis.

As AI and OCR tech continue to improve, so do the possibilities for smarter, healthier dining tools. Whether you're building a personal project or a commercial app, the path from camera input to structured, actionable nutrition data is clearer than ever. With the right mix of OCR food extraction and menu recognition algorithms, you’ll empower users to make more informed choices—one scanned menu at a time.
