How to Parse Receipts with an API (Python + Node.js)

Alex Jay — Tue, 31 Mar 2026 10:15:29 +0000

Every expense management app has the same problem: users upload photos of receipts, and someone has to manually read the merchant name, total, tax, tip, and line items. It's tedious, error-prone, and doesn't scale.

I built aPapyr to solve this. It's an API that reads receipts (and invoices, tax forms, bank statements) and returns structured JSON — including per-field confidence scores so you know what to trust.

Here's how to use it in Python and Node.js.

## Python


```bash
pip install apapyr
```

```python
from apapyr import aPapyr

client = aPapyr("sk_live_your_key")
result = client.extract("receipt.jpg", document_type="receipt")

print(result.get_field("merchant_name"))  # "Starbucks Coffee"
print(result.get_field("total"))           # 15.57
print(result.get_field("tax"))             # 1.12
print(result.get_field("tip"))             # 2.00
print(result.get_field("payment_method"))  # "Visa ending 4242"
```

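## Node.js

```bash
npm install apapyr
```

```javascript
const { aPapyr } = require("apapyr");

// await is not allowed at the top level of a CommonJS module,
// so the calls are wrapped in an async function.
async function main() {
  const client = new aPapyr("sk_live_your_key");
  const result = await client.extract("receipt.jpg", {
    documentType: "receipt"
  });

  console.log(result.getField("merchant_name")); // "Starbucks Coffee"
  console.log(result.getField("total"));          // 15.57
  console.log(result.getField("tip"));            // 2.00
}

main().catch(console.error);
```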

## What You Get Back

The API doesn't just OCR the text; it understands the receipt. It knows which number is the total, which is the tax, and which is a line item price. Every field includes a confidence score:

```python
print(result.get_field_confidence("total"))          # 0.98 (very confident)
print(result.get_field_confidence("merchant_name"))  # 0.99
print(result.get_field_confidence("tip"))            # 0.91 (handwritten, slightly less sure)
```

This lets you build smart automation: auto-process anything above 0.95 confidence and flag the rest for human review.
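As a rough illustration of that rule, here is a minimal sketch that splits fields into "auto" and "review" lists. It works on a plain dict of confidence scores, which you would fill yourself from `get_field_confidence()`. The helper name `split_by_confidence` and the dict shape are assumptions for this example, not part of the SDK.

```python
def split_by_confidence(confidences, threshold=0.95):
    """Split field names into (auto, review) lists by confidence score."""
    auto, review = [], []
    for field, score in confidences.items():
        # Only scores strictly above the threshold are trusted;
        # everything else is queued for a person to check.
        (auto if score > threshold else review).append(field)
    return auto, review


# Confidence values copied from the example above.
scores = {"total": 0.98, "merchant_name": 0.99, "tip": 0.91}
auto, review = split_by_confidence(scores)
print("auto-process:", auto)
print("needs review:", review)
```

Keeping the threshold as a parameter makes it easy to tighten or loosen per field type once you have seen how the scores behave on your own documents.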

## Line Items

It pulls individual items too:

```python
for item in result.line_items:
    name = item.get("description", {}).get("value")
    price = item.get("amount", {}).get("value")
    print(f"  {name}: ${price}")

# Grande Caramel Macchiato: $5.95
# Blueberry Muffin: $3.50
# Bottled Water: $3.00
```

## Handles Messy Photos

Crumpled receipt from your pocket? Blurry phone photo? Faded thermal paper? It uses AI vision models (not old-school OCR), so it understands context. If the "5" in "$15.57" is smudged, it still knows the total because the line items, tax, and tip add up to it.
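You can run the same arithmetic check on your side as a sanity test. The sketch below is a client-side illustration only and says nothing about how aPapyr works internally. The helper name `total_is_consistent` and the one-cent tolerance are assumptions. The numbers are the ones from the Starbucks example in this post.

```python
from decimal import Decimal


def total_is_consistent(line_amounts, tax, tip, total, tolerance=Decimal("0.01")):
    """Return True if line items + tax + tip match the stated total."""
    subtotal = sum((Decimal(a) for a in line_amounts), Decimal("0"))
    expected = subtotal + Decimal(tax) + Decimal(tip)
    return abs(expected - Decimal(total)) <= tolerance


# Amounts are passed as strings so Decimal does not inherit
# binary floating-point rounding error.
items = ["5.95", "3.50", "3.00"]
print(total_is_consistent(items, tax="1.12", tip="2.00", total="15.57"))  # True
print(total_is_consistent(items, tax="1.12", tip="2.00", total="10.57"))  # False
```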

## Auto-Detect Document Type

Don't know if the user uploaded a receipt or an invoice? Let the API figure it out:

```python
result = client.extract("mystery_document.pdf")  # document_type defaults to "auto"
print(result.document_type)  # "receipt"
```
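Once the type is known, you will usually want to read different fields for each one. Here is a small sketch of that dispatch, using a plain dict as a stand-in for values you would read with `result.get_field(...)`. The `FIELDS_BY_TYPE` mapping and the `summarize` helper are hypothetical. The field names are the ones used in aPapyr's receipt and invoice examples.

```python
# Fields to read for each detected type. The mapping is illustrative;
# extend it with whatever fields your application cares about.
FIELDS_BY_TYPE = {
    "receipt": ["merchant_name", "total"],
    "invoice": ["vendor_name", "total", "due_date"],
}


def summarize(document_type, fields):
    """Return only the fields relevant to the detected document type."""
    wanted = FIELDS_BY_TYPE.get(document_type)
    if wanted is None:
        # Unknown type: hand back everything and let the caller decide.
        return dict(fields)
    return {name: fields.get(name) for name in wanted}


# Stand-in for result.document_type and the extracted field values.
detected = "receipt"
extracted = {"merchant_name": "Starbucks Coffee", "total": 15.57, "tip": 2.00}
print(summarize(detected, extracted))
```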

## Works With AI Agents

If you use Claude Code or Cursor, you can skip the SDK entirely:

```bash
claude mcp add apapyr -- npx apapyr-mcp-server
```

Then just say: "Parse this receipt and tell me the total." The AI agent calls aPapyr automatically.

## Try It

- https://apapyr.com/demo.html: see real extraction results with sample documents
- https://apapyr.com/free-tool.html: upload your own receipt, no signup needed
- https://apapyr.com/dashboard.html: 50 pages/month free
- https://apapyr.com/docs.html: full reference

The free tier is enough to build and test your integration. Paid plans start at $49/month for 1,000 pages.

---
aPapyr is on GitHub (https://github.com/AkilaJ?tab=repositories&q=apapyr), PyPI (https://pypi.org/project/apapyr/), and npm (https://www.npmjs.com/package/apapyr).

How to Extract Data from Invoices with Python (3 Lines of Code)

Alex Jay — Sun, 29 Mar 2026 11:42:35 +0000

If you've ever had to manually type invoice data into a spreadsheet — vendor names, totals, line items, due dates — you know how painfully slow and error-prone it is.

I needed to automate this for a project and couldn't find anything that didn't require training custom ML models or setting up heavy cloud infrastructure. So I built
aPapyr — a simple API that reads invoices (and receipts, tax forms, bank statements) and returns clean, structured JSON.

Here's how it works in Python.

## Install


```bash
pip install apapyr
```

## Extract an Invoice

```python
from apapyr import aPapyr

client = aPapyr("sk_live_your_key")
result = client.extract("invoice.pdf")

print(result.get_field("vendor_name"))  # "Acme Corp"
print(result.get_field("total"))         # 1250.00
print(result.get_field("due_date"))      # "2026-04-15"
```

That's it. Three lines after setup. Send a PDF or image, get structured data back.

## What You Get Back

Every field comes with a confidence score (0.0 to 1.0) so you know how reliable each value is:

```python
print(result.confidence)                     # 0.97 (overall)
print(result.get_field_confidence("total"))  # 0.98
print(result.get_field_confidence("notes"))  # 0.72 (handwritten, less certain)
```

You decide your automation threshold. Confidence above 0.95? Auto-process it. Below 0.8? Flag it for human review.
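Those two cut-offs leave a middle band between 0.8 and 0.95 that you still have to decide on. Here is a minimal sketch of a three-way triage over a single score such as `result.confidence`. The function name `triage`, the bucket labels, and the "spot_check" treatment of the middle band are all assumptions for illustration.

```python
def triage(confidence, auto_threshold=0.95, review_threshold=0.8):
    """Map a confidence score to one of three handling buckets."""
    if confidence > auto_threshold:
        return "auto_process"
    if confidence < review_threshold:
        return "human_review"
    # Nothing is prescribed for the band in between;
    # sampling it for spot checks is one possible policy.
    return "spot_check"


for score in (0.98, 0.72, 0.90):
    print(score, triage(score))
```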

## Line Items Too

It doesn't just pull header fields; it extracts every line item:

```python
for item in result.line_items:
    desc = item.get("description", {}).get("value")
    qty = item.get("quantity", {}).get("value")
    amt = item.get("amount", {}).get("value")
    print(f"{desc}: {qty} x ${amt}")

# Widget A: 50 x $25.00
# Widget B: 30 x $12.50
```

The API even cross-checks line item totals against the stated total and warns you if they don't add up.

## Flat Dictionary Output

If you just want a simple key-value dict without confidence scores (for piping into a database or CSV), call to_flat_dict():

```python
print(result.to_flat_dict())
# {"document_type": "invoice", "vendor_name": "Acme Corp", "total": 1250.00, "due_date": "2026-04-15", ...}
```
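To show the CSV case concretely, here is a sketch that writes a list of such flat dicts with the standard library's `csv.DictWriter`. The `rows` list is a stand-in for values you would collect from `result.to_flat_dict()`, truncated to the keys shown above. The helper name `rows_to_csv` is made up for this example.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of flat dicts to CSV text, one column per key."""
    # Collect column names in first-seen order so documents with
    # different fields can share one file.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval fills the cell when a row lacks one of the columns.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Stand-in for the output of result.to_flat_dict().
rows = [
    {"document_type": "invoice", "vendor_name": "Acme Corp",
     "total": 1250.00, "due_date": "2026-04-15"},
]
csv_text = rows_to_csv(rows)
print(csv_text)
```

To write to disk instead, pass a file opened with `newline=""` to `csv.DictWriter` in place of the `StringIO` buffer.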

## Supported Document Types

It's not just invoices. Pass document_type="auto" (the default) and it detects the type automatically:

- Invoices: vendor, total, tax, due date, line items
- Receipts: merchant, items, subtotal, tax, tip, payment method
- W-2 Tax Forms: employer, wages, withholdings
- Bank Statements: balances, transaction history
- Contracts: parties, dates, key terms

## Works With AI Agents Too

If you use Claude Code, Cursor, or any MCP-compatible AI assistant:

```bash
claude mcp add apapyr -- npx apapyr-mcp-server
```

Then just ask: "Extract the data from invoice.pdf" and your AI assistant handles the rest.

## Try It Free

- https://apapyr.com/free-tool.html: upload a document, no signup needed
- https://apapyr.com/dashboard.html: 50 pages/month free, no credit card
- https://apapyr.com/docs.html: full reference with examples in Python, Node.js, and cURL

The free tier is enough to test it on real documents. If you're processing thousands of invoices, paid plans start at $49/month.

---
aPapyr is open source on GitHub: https://github.com/AkilaJ?tab=repositories&q=apapyr. Star it if you find it useful.

  ---

DEV Community: Alex Jay
