If you've ever tried to pull structured data out of a receipt or invoice, you know it's deceptively hard. A photo of a receipt is just pixels. Plain OCR turns it into a wall of text — but you still don't have fields. You want the merchant, the date, each line item with its price, the tax, and the total, in a clean shape your code can use.
This guide shows how to go from a receipt image (or PDF) to structured JSON in a single API call — with copy-paste examples in cURL, Node.js, and Python.
Why not just use raw OCR?
Traditional OCR (Tesseract and friends) gives you text, not data. You'd still have to write and maintain brittle parsing logic to find the total, separate the line items, and handle different receipt layouts, currencies, and languages. That parser breaks whenever a new format shows up.
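To make the brittleness concrete, here is a small sketch of the kind of regex parser you end up writing. The two OCR strings are made up for illustration, and `naive_total` is a hypothetical helper, not part of any library:

```python
import re

# A naive rule: the total is the amount on a line that starts with "TOTAL".
TOTAL_RE = re.compile(r"^TOTAL\s+\$?(\d+\.\d{2})$", re.MULTILINE)

def naive_total(ocr_text: str):
    """Return the total as a float, or None if the pattern does not match."""
    match = TOTAL_RE.search(ocr_text)
    return float(match.group(1)) if match else None

# Made-up OCR output for two receipts with different layouts.
receipt_a = "BLUE BOTTLE COFFEE\nSUBTOTAL 16.75\nTAX 1.51\nTIP 3.00\nTOTAL 21.26\n"
receipt_b = "Blue Bottle Coffee\nSubtotal: 16,75\nTax: 1,51\nAmount due: EUR 21,26\n"

print(naive_total(receipt_a))  # 21.26
print(naive_total(receipt_b))  # None
```

The second receipt uses a different label, a decimal comma, and a currency prefix, so the rule silently returns nothing. Each new layout means another pattern to add and test.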
A purpose-built receipt OCR API skips all of that: it returns a fixed, predictable JSON schema, so you don't have to write or maintain layout-specific parsing code.
The approach
We'll use the Receipt & Invoice OCR API on RapidAPI. You send an image URL, a base64 image, or a base64 PDF, and you get back structured JSON. There's a free tier (50 calls/month) so you can follow along at no cost.
Step 1 — Get an API key
- Go to the API listing on RapidAPI.
- Subscribe to the free Basic plan.
- Copy your X-RapidAPI-Key.
Step 2 — Your first call (cURL)
curl -X POST 'https://receipt-extraction-api.p.rapidapi.com/v1/extract' \
  -H 'content-type: application/json' \
  -H 'X-RapidAPI-Key: YOUR_KEY' \
  -H 'X-RapidAPI-Host: receipt-extraction-api.p.rapidapi.com' \
  -d '{"image_url": "https://receipt-extraction-api.onrender.com/sample.png"}'
Step 3 — Node.js
// Node 18+ (built-in fetch); run as an ES module so top-level await works.
const res = await fetch("https://receipt-extraction-api.p.rapidapi.com/v1/extract", {
  method: "POST",
  headers: {
    "content-type": "application/json",
    "X-RapidAPI-Key": process.env.RAPIDAPI_KEY,
    "X-RapidAPI-Host": "receipt-extraction-api.p.rapidapi.com",
  },
  body: JSON.stringify({
    image_url: "https://receipt-extraction-api.onrender.com/sample.png",
  }),
});
// Fail loudly on a non-2xx status instead of destructuring an error body.
if (!res.ok) throw new Error(`Request failed with HTTP ${res.status}`);
const { data } = await res.json();
console.log(data.merchant.name, data.total); // BLUE BOTTLE COFFEE 21.26
Step 4 — Python
import os

import requests

resp = requests.post(
    "https://receipt-extraction-api.p.rapidapi.com/v1/extract",
    headers={
        "content-type": "application/json",
        "X-RapidAPI-Key": os.environ["RAPIDAPI_KEY"],
        "X-RapidAPI-Host": "receipt-extraction-api.p.rapidapi.com",
    },
    json={"image_url": "https://receipt-extraction-api.onrender.com/sample.png"},
    timeout=30,  # seconds; avoids hanging forever on a stalled connection
)
resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
data = resp.json()["data"]
print(data["merchant"]["name"], data["total"])  # BLUE BOTTLE COFFEE 21.26
The response schema
The data object in every response has the same shape. Every field is always present, and missing values are null, so you never have to check whether a key exists (you only need to handle null values):
{
  "document_type": "receipt",
  "merchant": { "name": "...", "address": "...", "phone": "..." },
  "date": "2026-06-25", "time": "09:14", "currency": "USD",
  "line_items": [
    { "description": "Cappuccino", "quantity": 2, "unit_price": 3.50, "total": 7.00 }
  ],
  "subtotal": 16.75, "tax": 1.51, "tip": 3.00, "total": 21.26,
  "payment": { "method": "visa", "card_last4": "4242" }
}
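Because the keys are always there, you can index them directly and only deal with null. As a rough sketch, here is a sanity check that subtotal + tax + tip equals the total, using the numbers from the sample above. `money` and `totals_add_up` are hypothetical helpers, and the check assumes the receipt has no discounts or other adjustments, which won't hold for every document:

```python
from decimal import Decimal
from typing import Optional

def money(value: Optional[float]) -> Decimal:
    """Convert a JSON number to Decimal, treating null (None) as zero."""
    return Decimal("0") if value is None else Decimal(str(value))

def totals_add_up(data: dict) -> Optional[bool]:
    """True/False when subtotal + tax + tip can be compared to total, else None."""
    if data["subtotal"] is None or data["total"] is None:
        return None  # not enough information to check
    expected = money(data["subtotal"]) + money(data["tax"]) + money(data["tip"])
    return expected == money(data["total"])

# Only the fields the check reads; values taken from the sample response.
with_tip = {"subtotal": 16.75, "tax": 1.51, "tip": 3.00, "total": 21.26}
no_tip = {"subtotal": 16.75, "tax": 1.51, "tip": None, "total": 18.26}
no_total = {"subtotal": 16.75, "tax": 1.51, "tip": None, "total": None}

print(totals_add_up(with_tip))  # True
print(totals_add_up(no_tip))    # True
print(totals_add_up(no_total))  # None
```

Decimal is used here so that comparing sums of prices does not trip over binary floating-point rounding.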
Working with PDFs and uploaded files
Instead of image_url, you can send a base64 image (with media_type) or a base64 PDF:
{ "pdf_base64": "JVBERi0xLjQ..." }
{ "image_base64": "iVBORw0KGgo...", "media_type": "image/png" }
This is handy when the file lives on the user's device and isn't publicly reachable by URL.
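A minimal sketch of building either request body from a local file follows. `build_payload` is a hypothetical helper, and picking the media type from the file extension is my assumption; which image types the API accepts isn't covered here:

```python
import base64
import mimetypes
from pathlib import Path

def build_payload(path: str) -> dict:
    """Build the JSON request body for a local PDF or image file."""
    file_path = Path(path)
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    # Guess the media type from the file extension (e.g. .png -> image/png).
    media_type, _ = mimetypes.guess_type(file_path.name)
    if media_type == "application/pdf":
        return {"pdf_base64": encoded}
    if media_type and media_type.startswith("image/"):
        return {"image_base64": encoded, "media_type": media_type}
    raise ValueError(f"Unsupported file type: {file_path.name}")
```

You would then pass the result as the json= argument of the same requests.post call from Step 4, in place of the image_url body.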
Where this is useful
- Expense and reimbursement apps — auto-fill expense reports from a photo.
- Accounting and bookkeeping tools — turn receipts into ledger entries.
- Personal finance apps — categorize spending from receipts.
- Spend analytics — structured line-item data for reporting.
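For the spend-analytics case, a rough sketch of flattening one response into CSV rows (one row per line item) could look like this. `line_items_to_csv` and the column order are hypothetical choices, and the sketch assumes the merchant object itself is present even when its fields are null:

```python
import csv
import io

def line_items_to_csv(data: dict) -> str:
    """Flatten one receipt's line items into CSV text, one row per item."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "merchant", "description", "quantity",
                     "unit_price", "total", "currency"])
    for item in data["line_items"]:
        # csv.writer writes None as an empty field, so null values need no special case.
        writer.writerow([
            data["date"], data["merchant"]["name"], item["description"],
            item["quantity"], item["unit_price"], item["total"], data["currency"],
        ])
    return buffer.getvalue()

# Only the fields the function reads; values taken from the sample response.
sample = {
    "merchant": {"name": "BLUE BOTTLE COFFEE", "address": None, "phone": None},
    "date": "2026-06-25",
    "currency": "USD",
    "line_items": [
        {"description": "Cappuccino", "quantity": 2, "unit_price": 3.50, "total": 7.00},
    ],
}
print(line_items_to_csv(sample))
```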
Wrapping up
Extracting data from receipts used to mean training a model or babysitting an OCR-and-regex pipeline. With a dedicated receipt OCR API you make one request and get clean, typed JSON you can use immediately.
You can try it free here: Receipt & Invoice OCR API — and there's a live demo (image → JSON) at receipt-extraction-api.onrender.com.
If you build something with it, I'd love to hear about it — and what receipt edge cases I should handle next.