The idea of building a mobile menu scanner—an app that lets users point their phone at a restaurant menu and instantly get nutritional info or translations—is both exciting and practical. Thanks to modern tools like React Native and Expo Camera, creating such an app is more accessible than ever. In this post, I’ll walk you through the essential steps for building a camera-based menu scanner app, focusing on the React Native ecosystem, image capture with Expo Camera, and building an AI-powered text extraction and analysis pipeline.
Why Build a Menu Scanner App?
Menu scanner apps solve real-world problems: they empower users to make healthier eating choices, navigate foreign menus, or quickly compare dishes. These apps leverage mobile OCR (Optical Character Recognition) to turn photos into actionable data, blending computer vision, machine learning, and slick native interfaces.
Before diving in, let’s break down the core components:
- Camera integration: Capturing images of menus in real time.
- Image processing: Prepping images for OCR (cropping, enhancing, etc.).
- OCR: Extracting text from menu images reliably.
- AI analysis: Optionally, analyzing extracted text for insights (e.g., nutrition, translation).
Let’s see how you can stitch these together in a React Native menu scanner app.
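Before touching any UI code, it helps to see the whole flow as a few composable async steps. The type and function names below are illustrative, not from any library; the point is that each stage (capture, preprocess, OCR, analysis) can be swapped independently:

```typescript
// Illustrative types for the scan pipeline; names are hypothetical.
type ImageUri = string;

interface MenuScanResult {
  rawText: string;   // text returned by OCR
  items: string[];   // candidate menu lines after analysis
}

// Each stage is async so cloud and on-device implementations
// can be swapped without changing the pipeline itself.
type CaptureFn = () => Promise<ImageUri>;
type PreprocessFn = (uri: ImageUri) => Promise<ImageUri>;
type OcrFn = (uri: ImageUri) => Promise<string>;
type AnalyzeFn = (text: string) => MenuScanResult;

async function scanMenu(
  capture: CaptureFn,
  preprocess: PreprocessFn,
  ocr: OcrFn,
  analyze: AnalyzeFn
): Promise<MenuScanResult> {
  const photo = await capture();
  const processed = await preprocess(photo);
  const text = await ocr(processed);
  return analyze(text);
}
```

Structuring the app this way also makes each stage easy to stub out in tests.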
Setting Up React Native with Expo
First, create a new Expo project. Expo makes working with device APIs like the camera much smoother, and it bundles the Expo Camera library for easy integration.
```shell
npx create-expo-app MenuScanner
cd MenuScanner
npx expo install expo-camera expo-file-system
```
You’ll use expo-camera for camera access and expo-file-system to temporarily store images for processing.
Implementing the Camera UI with Expo Camera
Here’s a basic React Native component that displays the camera and lets users snap a photo:
```typescript
import React, { useEffect, useRef, useState } from 'react';
import { StyleSheet, View, TouchableOpacity, Text, Image } from 'react-native';
import { Camera, CameraType } from 'expo-camera';

export default function MenuCamera() {
  const cameraRef = useRef<Camera>(null);
  const [hasPermission, setHasPermission] = useState<boolean | null>(null);
  const [photoUri, setPhotoUri] = useState<string | null>(null);

  useEffect(() => {
    (async () => {
      const { status } = await Camera.requestCameraPermissionsAsync();
      setHasPermission(status === 'granted');
    })();
  }, []);

  const takePicture = async () => {
    if (cameraRef.current) {
      const photo = await cameraRef.current.takePictureAsync();
      setPhotoUri(photo.uri);
    }
  };

  if (hasPermission === null) {
    return <View />;
  }
  if (hasPermission === false) {
    return <Text>No access to camera</Text>;
  }

  return (
    <View style={styles.container}>
      {photoUri ? (
        <Image source={{ uri: photoUri }} style={styles.preview} />
      ) : (
        <Camera style={styles.camera} type={CameraType.back} ref={cameraRef} />
      )}
      <TouchableOpacity style={styles.button} onPress={takePicture}>
        <Text style={styles.buttonText}>Snap</Text>
      </TouchableOpacity>
    </View>
  );
}

const styles = StyleSheet.create({
  container: { flex: 1, justifyContent: 'center' },
  camera: { flex: 1 },
  preview: { flex: 1, resizeMode: 'contain' },
  button: {
    position: 'absolute', bottom: 40, alignSelf: 'center',
    backgroundColor: '#fff', padding: 16, borderRadius: 24,
  },
  buttonText: { fontSize: 18, fontWeight: 'bold' },
});
```
This snippet gives you a basic camera preview, a "Snap" button, and a photo preview after capturing.
Handling Image Processing for OCR
To maximize OCR accuracy, you might want to preprocess the image (cropping, converting to grayscale, or enhancing contrast). While React Native doesn’t have built-in image processing, you can use a library like expo-image-manipulator:

```shell
npx expo install expo-image-manipulator
```
Example: cropping and resizing the photo before OCR:
```typescript
import * as ImageManipulator from 'expo-image-manipulator';

// ... after taking the picture
const processed = await ImageManipulator.manipulateAsync(
  photo.uri,
  [{ resize: { width: 1000 } }], // Resize to a standard width for consistency
  { compress: 0.8, format: ImageManipulator.SaveFormat.JPEG }
);
setPhotoUri(processed.uri);
```
This step is optional but can improve mobile OCR performance, especially in low-light or skewed conditions.
Performing Mobile OCR: Extracting Text from Images
There are two main approaches for OCR in a React Native menu scanner app:
- On-device OCR: Fast, private, but limited by device capability.
- Cloud-based OCR: Offloads computation, usually more accurate, but requires Internet.
For on-device OCR, react-native-text-recognition works well, although it needs native code and therefore a custom development build rather than Expo Go. For Expo-managed apps, cloud-based APIs are often more practical.
Using a Cloud OCR API
Let’s use Google Cloud Vision as an example. After capturing and processing the image, send it to the API:
```typescript
import * as FileSystem from 'expo-file-system';

async function performOCR(imageUri: string): Promise<string> {
  // Convert the image to base64 so it can be embedded in the request
  const base64 = await FileSystem.readAsStringAsync(imageUri, {
    encoding: FileSystem.EncodingType.Base64,
  });

  // Prepare the Vision API payload
  const body = {
    requests: [{
      image: { content: base64 },
      features: [{ type: 'TEXT_DETECTION' }],
    }],
  };

  const response = await fetch(
    'https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY',
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    }
  );
  const result = await response.json();

  // Navigate to the extracted text, falling back to an empty string
  return result.responses[0]?.fullTextAnnotation?.text ?? '';
}
```
Don’t forget to secure your API keys and follow usage limits. Similar flows work for APIs like Microsoft Azure Computer Vision or open-source hosted Tesseract services.
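One practical way to keep the key off the device is to factor the request construction and response parsing into pure functions, so the same payload can be posted to your own backend proxy instead of directly to Google. The helper names below are my own, not part of any SDK:

```typescript
// Pure helpers split out of performOCR; names are illustrative.
interface VisionRequest {
  requests: {
    image: { content: string };
    features: { type: string }[];
  }[];
}

function buildVisionRequest(base64Image: string): VisionRequest {
  return {
    requests: [{
      image: { content: base64Image },
      features: [{ type: 'TEXT_DETECTION' }],
    }],
  };
}

// Defensive parsing: the text is nested several levels deep in the
// Vision response, and any level may be missing for an empty scan.
function parseVisionResponse(result: any): string {
  return result?.responses?.[0]?.fullTextAnnotation?.text ?? '';
}
```

Pure functions like these are also trivially unit-testable, unlike the fetch call itself.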
Building the AI Analysis Pipeline
Once you’ve extracted menu text, you can analyze it for insights. For basic use cases, regular expressions or simple keyword searches can help identify ingredients, calories, or allergens.
For deeper analysis—like dish classification, nutrition estimation, or translation—consider integrating with AI APIs or services. Some options:
- NLP APIs: Use OpenAI, Google Cloud Natural Language, or similar for entity extraction and categorization.
- Nutrition databases: Cross-reference dish names with APIs like USDA FoodData Central or Edamam.
- Translation: Use Google Translate API or Microsoft Translator for instant foreign menu translation.
Example: Basic allergen detection in extracted menu text:
```typescript
const allergens = ['peanut', 'milk', 'egg', 'soy', 'wheat', 'tree nut', 'shellfish', 'fish'];

function highlightAllergens(text: string) {
  const regex = new RegExp(`\\b(${allergens.join('|')})\\b`, 'gi');
  return text.replace(regex, match => `[ALLERGEN: ${match}]`);
}

const processedText = highlightAllergens(extractedText);
```
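The same pattern extends beyond allergens: a couple of regexes can pull calorie counts and prices out of each OCR line. The patterns below are illustrative guesses and would need tuning against real menus:

```typescript
// Sketch of regex-based extraction from OCR text; patterns are
// assumptions, since real menus vary widely in layout.
interface LineInfo {
  line: string;
  calories: number | null;
  price: number | null;
}

function extractLineInfo(text: string): LineInfo[] {
  const calorieRe = /(\d{2,4})\s*(?:kcal|cal(?:ories)?)\b/i;
  const priceRe = /[$€£]\s*(\d+(?:\.\d{2})?)/;
  return text
    .split('\n')
    .filter(line => line.trim().length > 0) // drop blank OCR lines
    .map(line => {
      const cal = line.match(calorieRe);
      const price = line.match(priceRe);
      return {
        line,
        calories: cal ? parseInt(cal[1], 10) : null,
        price: price ? parseFloat(price[1]) : null,
      };
    });
}
```

Returning `null` rather than a guessed value keeps downstream analysis honest about what the scan actually contained.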
For more advanced workflows, you might batch text analysis alongside image processing in a cloud function, returning structured dish data to the app.
UX Tips for a Menu Scanner App
- Guide the camera: Overlay a frame or instructions to help users align the menu.
- Feedback on OCR quality: Let users retake blurry or poorly-lit photos.
- Text selection: Allow users to highlight or select specific menu items for analysis.
- Privacy: Be transparent about cloud processing and image retention.
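For the retake prompt, even a crude heuristic on the OCR output can catch most blurry or badly lit shots. This is a sketch with made-up thresholds that you would tune against real scans:

```typescript
// Heuristic: if OCR returns very little text relative to what a menu
// photo should yield, suggest the user retake. Thresholds are guesses.
function shouldSuggestRetake(ocrText: string): boolean {
  const lines = ocrText.split('\n').filter(l => l.trim().length > 2);
  const letters = (ocrText.match(/[a-zA-Z]/g) ?? []).length;
  // Assumption: a readable menu yields at least a few lines
  // and a reasonable amount of alphabetic content.
  return lines.length < 3 || letters < 20;
}
```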
Beyond the Basics: Real-World Enhancements
Consider these extra features for a robust menu scanner:
- Batch scanning: Capture multiple images for long menus.
- Offline fallback: Cache basic OCR for offline use.
- Personalization: Let users set dietary preferences and highlight relevant items.
- Integration with platforms: For advanced menu insights, tools like LeanDine, Foodvisor, or MyFitnessPal offer APIs and analysis pipelines to enrich extracted menu data.
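Personalization can start as simply as matching scanned items against a user's avoid-list. This sketch uses plain keyword matching; a production version would back it with a nutrition database lookup:

```typescript
// Sketch of dietary-preference flagging; the Preferences shape
// is an assumption, not a standard schema.
interface Preferences {
  avoid: string[]; // e.g. allergens or disliked ingredients
}

function flagItems(
  items: string[],
  prefs: Preferences
): { item: string; flagged: boolean }[] {
  const re = prefs.avoid.length
    ? new RegExp(`\\b(${prefs.avoid.join('|')})\\b`, 'i')
    : null;
  return items.map(item => ({
    item,
    flagged: re ? re.test(item) : false,
  }));
}
```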
Key Takeaways
Building a menu scanner app with React Native and Expo Camera is now highly accessible. By combining live camera capture, basic image preprocessing, cloud-based mobile OCR, and lightweight AI analysis, you can empower users with instant access to menu information wherever they dine. The key is to start simple—focus on reliable image capture and text extraction—then iterate with smarter AI and richer UX. As cloud APIs and on-device ML continue to improve, so too will the possibilities for seamless, intelligent menu scanning experiences.