How to Turn a Recipe Screenshot into Structured JSON

#ai #api #tutorial #webdev

Anyone building a cooking, meal planning, or grocery app hits the same wall fast: people do not save recipes as clean data. They save them as pictures and scraps. A screenshot of a Pinterest pin. A frame grabbed from a TikTok. A photo of a page in a cookbook. A wall of text pasted into a note.

If your app needs ingredients, servings, and steps as real fields, that messy input is a genuine problem. Scraping a web page only works when there is a web page, and most recipes shared on social apps never have one. So you end up writing brittle parsers, bolting on OCR, and babysitting edge cases forever.

This is the exact gap I built an API to close.

The idea

Send a recipe in any common form (a photo, a screenshot, a web link, or pasted text) and get back clean structured JSON. Every ingredient comes split into a name, a quantity, and a unit, with the unit normalized to a fixed vocabulary. The recipe name, description, category, serving count, and instructions come back too.

The piece most tools skip is the picture. Reading a recipe out of an image needs vision, not HTML parsing. That part is built in here, so you never have to stand up your own OCR pipeline or stitch together a vision model yourself.

One endpoint, three inputs

You call a single endpoint and send one of three body shapes.

From pasted text:

{
  "type": "text",
  "content": "Garlic Butter Shrimp. 1 lb shrimp, 3 tbsp butter, 4 cloves garlic, 1/2 tsp salt. Cook 5 minutes."
}

From a web link:

{
  "type": "url",
  "content": "https://www.example.com/garlic-butter-shrimp"
}

From a screenshot or photo, pass the image as base64. You can send several photos of the same recipe in the images array, and they are combined into a single recipe:

{
  "type": "image",
  "images": [
    { "base64": "<base64 image data>", "mimeType": "image/jpeg" }
  ]
}
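If your photos start out as files or upload buffers, a small helper can build the image body for you. This is a minimal sketch, assuming Node.js (where Buffer is a global) and assuming the API wants raw base64 with no "data:" prefix, since the sample above passes the MIME type in a separate field. The function name buildImageBody is hypothetical and not part of the API.

```javascript
// Hypothetical helper (not part of the API): builds the "image" request body
// from one or more photos of the same recipe. Assumes Node.js, where Buffer
// is a global, and that the API expects raw base64 without a "data:" prefix.
function buildImageBody(photos) {
  if (!Array.isArray(photos) || photos.length === 0) {
    throw new Error("Pass at least one photo");
  }
  return {
    type: "image",
    // Each photo becomes one { base64, mimeType } entry in the images array.
    images: photos.map(({ data, mimeType }) => ({
      base64: Buffer.from(data).toString("base64"),
      mimeType,
    })),
  };
}

// Usage with files on disk (the paths are placeholders):
// const fs = require("node:fs");
// const body = buildImageBody([
//   { data: fs.readFileSync("page1.jpg"), mimeType: "image/jpeg" },
//   { data: fs.readFileSync("page2.png"), mimeType: "image/png" },
// ]);
```

Passing both pages of a two-page cookbook recipe in one call keeps them in one request, which matches the "several photos, one recipe" behavior described above.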

A full call

Here is a request in Node. Grab your key and the exact host string from the API page on RapidAPI, then drop them in.

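// Node 18+ ships a global fetch. Top-level await needs an ES module (.mjs or "type": "module").
const res = await fetch("https://recipe-extractor-screenshot-photo-and-url-to-json.p.rapidapi.com/extract", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-RapidAPI-Key": "YOUR_RAPIDAPI_KEY",
    "X-RapidAPI-Host": "recipe-extractor-screenshot-photo-and-url-to-json.p.rapidapi.com"
  },
  body: JSON.stringify({
    type: "text",
    content: "Garlic Butter Shrimp. 1 lb shrimp, 3 tbsp butter, 4 cloves garlic, 1/2 tsp salt. Cook 5 minutes."
  })
});

// Fail loudly on a non-2xx response instead of trying to read it as a recipe.
if (!res.ok) {
  throw new Error(`Extraction failed: ${res.status} ${await res.text()}`);
}

const data = await res.json();
console.log(data.recipe);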
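A cheap local check that a body matches one of the three shapes can catch malformed requests before they go out. This is a sketch under assumptions: validateRequestBody is a hypothetical name, the rules mirror only the samples shown above, and the "image/*" MIME check is a guess rather than the API's documented list of accepted formats.

```javascript
// Hypothetical client-side check against the three body shapes shown above.
// Returns a list of problems; an empty list means the body looks well formed.
// The API's real validation rules may differ; this mirrors the samples only.
function validateRequestBody(body) {
  if (body === null || typeof body !== "object") {
    return ["body must be an object"];
  }
  const errors = [];
  switch (body.type) {
    case "text":
      if (typeof body.content !== "string" || body.content.trim() === "") {
        errors.push("text bodies need a non-empty content string");
      }
      break;
    case "url":
      try {
        // new URL throws on relative or malformed input.
        const url = new URL(body.content);
        if (url.protocol !== "http:" && url.protocol !== "https:") {
          errors.push("url content must be an http(s) link");
        }
      } catch {
        errors.push("url content must be a valid absolute URL");
      }
      break;
    case "image":
      if (!Array.isArray(body.images) || body.images.length === 0) {
        errors.push("image bodies need a non-empty images array");
        break;
      }
      body.images.forEach((img, i) => {
        if (!img || typeof img.base64 !== "string" || img.base64 === "") {
          errors.push(`images[${i}].base64 must be a non-empty string`);
        }
        // Assumption: any image/* MIME type is worth sending.
        if (!img || typeof img.mimeType !== "string" || !img.mimeType.startsWith("image/")) {
          errors.push(`images[${i}].mimeType must be an image/* type`);
        }
      });
      break;
    default:
      errors.push('type must be "text", "url", or "image"');
  }
  return errors;
}
```

Before calling fetch, something like `const problems = validateRequestBody(body); if (problems.length) throw new Error(problems.join("; "));` stops a typo in `type` or an empty `images` array from ever reaching the endpoint.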

For that garlic butter shrimp request, the response looks like this:

{
  "recipe": {
    "name": "Garlic Butter Shrimp",
    "description": "A quick garlic butter shrimp skillet.",
    "category": "dinner",
    "servings": 2,
    "ingredients": [
      { "name": "shrimp", "quantity": "1", "unit": "lb" },
      { "name": "butter", "quantity": "3", "unit": "tbsp" },
      { "name": "garlic", "quantity": "4", "unit": "clove" },
      { "name": "salt", "quantity": "0.5", "unit": "tsp" }
    ],
    "instructions": "Cook for 5 minutes."
  }
}

Notice the ingredients arrive already broken into fields, ready to drop straight into a database or a shopping cart. No second parsing pass, no regex zoo.
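To make "straight into a database" concrete, here is a minimal sketch that flattens the response's ingredients into rows for an ingredients table. It assumes the response shape shown above, where quantity is a string such as "0.5". The recipeId argument, the toIngredientRows name, and the column names are all hypothetical choices for illustration, not part of the API.

```javascript
// Hypothetical mapper from recipe.ingredients (shape as in the sample
// response) to flat rows for an ingredients table. Column names are
// illustrative only.
function parseQuantity(q) {
  // The sample returns quantities as strings ("1", "0.5"); accept numbers too.
  if (typeof q === "number") return Number.isFinite(q) ? q : null;
  if (typeof q !== "string" || q.trim() === "") return null;
  const n = Number(q.trim());
  // Anything that is not a plain number ("a pinch", "1/2") becomes null.
  return Number.isFinite(n) ? n : null;
}

function toIngredientRows(recipeId, recipe) {
  return recipe.ingredients.map((ing, position) => ({
    recipe_id: recipeId,
    position, // preserves the original ingredient order
    name: ing.name,
    quantity: parseQuantity(ing.quantity),
    quantity_raw: ing.quantity, // keep the source text for display or review
    unit: ing.unit,
  }));
}
```

Storing both a numeric quantity and the raw string means arithmetic (scaling servings, summing a cart) works on the clean values, while anything the number parser rejects is still shown to the user as written.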

Where this fits

Meal planning and grocery apps can let users import a recipe by photo or link, then build a shopping list automatically from the structured ingredients. Nutrition trackers can read the ingredient list to estimate macros. Recipe organizers can digitize screenshots and cookbook photos into a searchable library. And if you are feeding a model, clean structured fields are a far better input than raw HTML.
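The shopping-list case can be sketched in a few lines. This is one possible approach, assuming the ingredient shape from the sample response: it merges ingredients across several extracted recipes and sums quantities only when the name (case-insensitive) and unit both match. buildShoppingList is a hypothetical name; it does no unit conversion (tsp is not folded into tbsp), and anything whose quantity is not a plain number goes into a separate list instead of being guessed at.

```javascript
// Hypothetical shopping-list builder over several extracted recipes.
// Sums quantities only when name (case-insensitive) and unit both match;
// performs no unit conversion between, say, tsp and tbsp.
function buildShoppingList(recipes) {
  const totals = new Map(); // "name|unit" -> { name, unit, quantity }
  const unparsed = []; // e.g. "to taste", which cannot be summed
  for (const recipe of recipes) {
    for (const { name, quantity, unit } of recipe.ingredients) {
      const n =
        typeof quantity === "string" && quantity.trim() !== ""
          ? Number(quantity)
          : NaN;
      if (!Number.isFinite(n)) {
        unparsed.push({ name, quantity, unit });
        continue;
      }
      const key = `${name.toLowerCase()}|${unit}`;
      const entry = totals.get(key) ?? { name: name.toLowerCase(), unit, quantity: 0 };
      entry.quantity += n;
      totals.set(key, entry);
    }
  }
  return { items: [...totals.values()], unparsed };
}
```

Rendering the result is then a one-liner such as `list.items.map((i) => `${i.quantity} ${i.unit} ${i.name}`)`, with `unparsed` shown as-is so nothing silently drops off the list.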

Try it

The API is on RapidAPI with a free tier, so you can test it in a minute. Find it here:

Recipe Extractor on RapidAPI

If you build something with it, I would love to hear what you made.

Top comments (1)

Theo Valmis • Jun 11

Recipe extraction is a deceptively good benchmark for structured output because the source fights back: fractions, ranges, 'a pinch', ingredients that only appear inside instruction steps. The schema carries most of the quality here. Loose fields like quantity-as-string push the ambiguity downstream to every consumer; strict fields force resolution once, at extraction time, while the surrounding context still exists to resolve it with. Same tradeoff as classic ETL, just with a model in the middle.