If you've ever had to manually type invoice data into a spreadsheet — vendor names, totals, line items, due dates — you know how painfully slow and error-prone it is.
I needed to automate this for a project and couldn't find anything that didn't require training custom ML models or setting up heavy cloud infrastructure. So I built aPapyr, a simple API that reads invoices (and receipts, tax forms, bank statements) and returns clean, structured JSON.
Here's how it works in Python.
## Install

```bash
pip install apapyr
```
## Extract an Invoice

```python
from apapyr import aPapyr

client = aPapyr("sk_live_your_key")
result = client.extract("invoice.pdf")

print(result.get_field("vendor_name"))  # "Acme Corp"
print(result.get_field("total"))        # 1250.00
print(result.get_field("due_date"))     # "2026-04-15"
```
That's it. Three lines after setup. Send a PDF or image, get structured data back.
## What You Get Back
Every field comes with a confidence score (0.0 to 1.0) so you know how reliable each value is:
```python
print(result.confidence)                     # 0.97 (overall)
print(result.get_field_confidence("total"))  # 0.98
print(result.get_field_confidence("notes"))  # 0.72 (handwritten, less certain)
```
You decide your automation threshold. Confidence above 0.95? Auto-process it. Below 0.8? Flag it for human review.
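Here is a minimal sketch of that routing rule in plain Python. The function name `route` and the `"spot_check"` bucket for scores between the two thresholds are my own assumptions for illustration; they are not part of the aPapyr client, and you may prefer to send the middle band to human review as well.

```python
AUTO_THRESHOLD = 0.95    # above this, process without a human
REVIEW_THRESHOLD = 0.80  # below this, always flag for review


def route(confidence: float) -> str:
    """Map a confidence score (0.0 to 1.0) to a handling decision."""
    if confidence > AUTO_THRESHOLD:
        return "auto_process"
    if confidence < REVIEW_THRESHOLD:
        return "human_review"
    # Hypothetical middle band: the thresholds above leave this range open.
    return "spot_check"


if __name__ == "__main__":
    for score in (0.97, 0.90, 0.72):
        print(score, route(score))
```

You could call it with `result.confidence` for a whole document, or with `result.get_field_confidence("total")` for a single field you care about.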
## Line Items Too
It doesn't just pull header fields — it extracts every line item:
```python
for item in result.line_items:
    desc = item.get("description", {}).get("value")
    qty = item.get("quantity", {}).get("value")
    amt = item.get("amount", {}).get("value")
    print(f"{desc}: {qty} x ${amt}")

# Widget A: 50 x $25.00
# Widget B: 30 x $12.50
```
The API even cross-checks line item totals against the stated total and warns you if they don't add up.
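If you want to run a similar sanity check on your side, something like the sketch below would do it. This is not how the API implements its check. It assumes each line item is a dict shaped like the ones in the loop above, and that `amount` is a unit price to be multiplied by `quantity`; the helper name `line_items_match_total` and the one-cent tolerance are hypothetical.

```python
from decimal import Decimal


def line_items_match_total(line_items, stated_total, tolerance=Decimal("0.01")):
    """Return (matches, computed_sum) for quantity * amount over all items."""
    computed = sum(
        (
            Decimal(str(item["quantity"]["value"]))
            * Decimal(str(item["amount"]["value"]))
            for item in line_items
        ),
        Decimal("0"),
    )
    difference = abs(computed - Decimal(str(stated_total)))
    return difference <= tolerance, computed


if __name__ == "__main__":
    items = [
        {"quantity": {"value": 50}, "amount": {"value": "25.00"}},
        {"quantity": {"value": 30}, "amount": {"value": "12.50"}},
    ]
    print(line_items_match_total(items, "1625.00"))
```

`Decimal` is used instead of `float` so that currency sums don't pick up binary rounding noise.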
## Flat Dictionary Output
If you just want a simple key-value dict without confidence scores (for piping into a database or CSV):
```python
print(result.to_flat_dict())
# {"document_type": "invoice", "vendor_name": "Acme Corp", "total": 1250.00, "due_date": "2026-04-15", ...}
```
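For the CSV case, here is a rough sketch using only the standard library. It takes a list of plain dicts like the one `to_flat_dict()` prints above; the helper name `flat_dicts_to_csv` is made up for this post, and I'm assuming different documents may return different keys, so missing fields are left blank.

```python
import csv
import io


def flat_dicts_to_csv(rows):
    """Serialize a list of flat dicts to CSV text, one row per document."""
    # Collect column names in first-seen order across all rows.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        {"document_type": "invoice", "vendor_name": "Acme Corp", "total": 1250.00},
        {"document_type": "receipt", "total": 12.5, "tip": 2.0},
    ]
    print(flat_dicts_to_csv(rows))
```

In practice you would build `rows` by calling `result.to_flat_dict()` once per extracted document and write the returned text to a file.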
## Supported Document Types

It's not just invoices. Pass `document_type="auto"` (the default) and it detects the type automatically:
- Invoices — vendor, total, tax, due date, line items
- Receipts — merchant, items, subtotal, tax, tip, payment method
- W-2 Tax Forms — employer, wages, withholdings
- Bank Statements — balances, transaction history
- Contracts — parties, dates, key terms
## Works With AI Agents Too
If you use Claude Code, Cursor, or any MCP-compatible AI assistant:
```bash
claude mcp add apapyr -- npx apapyr-mcp-server
```
Then just ask: "Extract the data from invoice.pdf" — your AI handles everything.
## Try It Free
- https://apapyr.com/free-tool.html — upload a document, no signup needed
- https://apapyr.com/dashboard.html — 50 pages/month free, no credit card
- https://apapyr.com/docs.html — full reference with examples in Python, Node.js, and cURL
The free tier is enough to test it on real documents. If you're processing thousands of invoices, paid plans start at $49/month.
---
aPapyr is open source on GitHub: https://github.com/AkilaJ?tab=repositories&q=apapyr. Star it if you find it useful.