Accounts payable teams spend an average of 4.5 hours per week manually entering invoice data. That's 18 hours a month copying numbers from PDFs into accounting software.
With AI document extraction, you can reduce that to near zero, and it works even on scanned, handwritten, or poorly formatted invoices.
Here's how to build it in minutes.
What We're Building
A function that takes any invoice PDF (or image) and returns:
{
"invoiceNumber": "INV-2024-001",
"invoiceDate": "2024-03-15",
"dueDate": "2024-04-15",
"vendor": {
"name": "Acme Services Inc.",
"address": "123 Main St, Toronto, ON",
"taxNumber": "GST 123456789 RT0001"
},
"lineItems": [
{ "description": "Web development", "quantity": 40, "unitPrice": 150.00, "amount": 6000.00 }
],
"taxLines": [
{ "name": "GST", "rate": 5, "amount": 300.00 },
{ "name": "QST", "rate": 9.975, "amount": 598.50 }
],
"total": 6898.50,
"currency": "CAD"
}
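Before passing an extraction downstream, it can help to confirm that the numbers are internally consistent. The following is a minimal Python sketch, assuming the response has the shape shown above; `totals_match` and its `tolerance` parameter are hypothetical names for illustration, not part of the API.

```python
from decimal import Decimal


def totals_match(invoice: dict, tolerance: str = "0.01") -> bool:
    """Return True if line items plus tax lines add up to the stated total."""
    # Convert through str() so floats such as 598.5 become exact decimals
    subtotal = sum(
        (Decimal(str(item["amount"])) for item in invoice.get("lineItems", [])),
        Decimal("0"),
    )
    taxes = sum(
        (Decimal(str(tax["amount"])) for tax in invoice.get("taxLines", [])),
        Decimal("0"),
    )
    total = Decimal(str(invoice["total"]))
    return abs(subtotal + taxes - total) <= Decimal(tolerance)


# Sample values copied from the example response above
sample = {
    "lineItems": [
        {"description": "Web development", "quantity": 40, "unitPrice": 150.00, "amount": 6000.00}
    ],
    "taxLines": [
        {"name": "GST", "rate": 5, "amount": 300.00},
        {"name": "QST", "rate": 9.975, "amount": 598.50},
    ],
    "total": 6898.50,
}
print(totals_match(sample))  # True: 6000.00 + 300.00 + 598.50 = 6898.50
```

An invoice that fails this check is a reasonable candidate for manual review rather than automatic entry.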
The Code
Node.js / JavaScript
// Requires Node.js 18+ for the built-in fetch
const fs = require('fs');

async function extractInvoice(filePath) {
  // Read the file and encode it as base64
  const base64 = fs.readFileSync(filePath).toString('base64');

  // Detect MIME type from the extension (falls back to PDF)
  const ext = filePath.split('.').pop().toLowerCase();
  const mimeTypes = { pdf: 'application/pdf', jpg: 'image/jpeg', jpeg: 'image/jpeg', png: 'image/png' };
  const mimeType = mimeTypes[ext] || 'application/pdf';

  // Call the API
  const response = await fetch('https://docusense.stackapi.dev/api/v1/documents/invoice', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.DOCUSENSE_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ fileBase64: base64, mimeType, language: 'en' })
  });

  const result = await response.json();
  if (!response.ok || !result.success) {
    const error = new Error(result.error?.message || 'Extraction failed');
    error.status = response.status; // lets callers distinguish 4xx from 5xx
    throw error;
  }
  return result.data;
}

// Exported so other scripts can require('./invoice-extractor')
module.exports = { extractInvoice };

// Usage: CommonJS has no top-level await, so wrap the call in an async function
if (require.main === module) {
  (async () => {
    const invoice = await extractInvoice('./vendor-invoice.pdf');
    console.log(`Total: ${invoice.total} ${invoice.currency}`);
    console.log(`Due: ${invoice.dueDate}`);
  })().catch(console.error);
}
Python
import base64
import os

import httpx


def extract_invoice(file_path: str) -> dict:
    # Read the file and encode it as base64
    with open(file_path, "rb") as f:
        base64_content = base64.b64encode(f.read()).decode("utf-8")

    # Detect MIME type from the extension (falls back to PDF)
    ext = file_path.rsplit(".", 1)[-1].lower()
    mime_types = {
        "pdf": "application/pdf",
        "jpg": "image/jpeg",
        "jpeg": "image/jpeg",
        "png": "image/png",
    }
    mime_type = mime_types.get(ext, "application/pdf")

    # Call the API
    response = httpx.post(
        "https://docusense.stackapi.dev/api/v1/documents/invoice",
        headers={
            "Authorization": f"Bearer {os.environ['DOCUSENSE_API_KEY']}",
            "Content-Type": "application/json",
        },
        json={"fileBase64": base64_content, "mimeType": mime_type, "language": "en"},
        timeout=60.0,
    )
    result = response.json()
    if not result.get("success"):
        # Fall back to a generic message if the error payload is missing
        message = (result.get("error") or {}).get("message", "Extraction failed")
        raise ValueError(message)
    return result["data"]
# Usage
invoice = extract_invoice("./vendor-invoice.pdf")
print(f"Invoice #{invoice['invoiceNumber']} — Total: {invoice['total']} {invoice['currency']}")
Real-World Integration: QuickBooks Auto-Entry
Here's a complete example that extracts an invoice and creates a QuickBooks bill automatically:
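const fs = require('fs');
const QuickBooks = require('node-quickbooks');
const { extractInvoice } = require('./invoice-extractor');

// Authenticated client. The argument order follows the node-quickbooks README
// for OAuth 2.0; the environment variable names are placeholders.
const qbo = new QuickBooks(
  process.env.QBO_CLIENT_ID,
  process.env.QBO_CLIENT_SECRET,
  process.env.QBO_ACCESS_TOKEN,
  false, // no token secret with OAuth 2.0
  process.env.QBO_REALM_ID,
  false, // use the sandbox?
  false, // debug logging?
  null, // minor version (null = latest)
  '2.0', // OAuth version
  process.env.QBO_REFRESH_TOKEN
);

// Placeholder: the QuickBooks vendor these invoices belong to
const VENDOR_ID = process.env.QBO_VENDOR_ID;

async function processVendorInvoice(invoicePath, vendorId) {
  // Step 1: Extract invoice data
  const invoice = await extractInvoice(invoicePath);

  // Step 2: Map to QuickBooks bill format
  const bill = {
    VendorRef: { value: vendorId },
    TxnDate: invoice.invoiceDate,
    DueDate: invoice.dueDate,
    DocNumber: invoice.invoiceNumber,
    Line: invoice.lineItems.map((item, i) => ({
      Id: String(i + 1),
      Amount: item.amount,
      DetailType: 'AccountBasedExpenseLineDetail',
      AccountBasedExpenseLineDetail: {
        AccountRef: { value: '1' }, // Your expense account
        BillableStatus: 'NotBillable',
        UnitPrice: item.unitPrice,
        Qty: item.quantity
      },
      Description: item.description
    })),
    TotalAmt: invoice.total,
    CurrencyRef: { value: invoice.currency || 'CAD' }
  };

  // Step 3: Create in QuickBooks (wrap the callback API in a Promise)
  return new Promise((resolve, reject) => {
    qbo.createBill(bill, (err, result) => {
      if (err) reject(err);
      else resolve(result);
    });
  });
}

// Process all invoices in a folder
async function main() {
  // Make sure the destination folders exist before moving files into them
  fs.mkdirSync('./processed', { recursive: true });
  fs.mkdirSync('./errors', { recursive: true });

  const invoiceFiles = fs.readdirSync('./inbox').filter(f => f.toLowerCase().endsWith('.pdf'));
  for (const file of invoiceFiles) {
    try {
      const bill = await processVendorInvoice(`./inbox/${file}`, VENDOR_ID);
      console.log(`✓ Created bill ${bill.DocNumber} - $${bill.TotalAmt}`);
      fs.renameSync(`./inbox/${file}`, `./processed/${file}`);
    } catch (err) {
      console.error(`✗ Failed ${file}:`, err.message);
      fs.renameSync(`./inbox/${file}`, `./errors/${file}`);
    }
  }
}

main().catch(console.error);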
Handling Different Invoice Formats
The API handles all common formats, most of them without any configuration:
| Format | Supported | Notes |
|---|---|---|
| Digital PDF | ✅ | Best accuracy |
| Scanned PDF | ✅ | Works with most scan qualities |
| JPEG / PNG | ✅ | Phone photos work too |
| French invoices (TVA, TTC) | ✅ | Pass language: 'fr' |
| Canadian (GST/HST/QST) | ✅ | Auto-detected |
| US invoices | ✅ | |
| European (VAT) | ✅ | |
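To make the language option in the table concrete, here is a small Python sketch that builds the request body with a configurable language. It assumes the same three request fields used in the examples above (`fileBase64`, `mimeType`, `language`); `build_payload` is a hypothetical helper name, and raising on unknown extensions is a choice made for this sketch rather than documented API behavior.

```python
import base64
from pathlib import Path

# Extension-to-MIME mapping, mirroring the examples in this article
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def build_payload(file_path: str, language: str = "en") -> dict:
    """Build the JSON body for the invoice endpoint."""
    path = Path(file_path)
    suffix = path.suffix.lower()
    if suffix not in MIME_TYPES:
        # Fail early instead of silently sending an unknown file as a PDF
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return {
        "fileBase64": base64.b64encode(path.read_bytes()).decode("ascii"),
        "mimeType": MIME_TYPES[suffix],
        "language": language,
    }
```

For a French invoice you would call `build_payload("./facture.pdf", language="fr")` and send the result as the JSON body of the request.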
Processing a Folder of Invoices (Batch Approach)
const fs = require('fs');
const path = require('path');
const { extractInvoice } = require('./invoice-extractor');

// Quote a CSV field and escape embedded quotes; missing values become empty strings
const csvField = (value) => `"${String(value ?? '').replace(/"/g, '""')}"`;

const processInvoiceQueue = async (folderPath) => {
  const files = fs.readdirSync(folderPath)
    .filter(f => ['.pdf', '.jpg', '.jpeg', '.png'].includes(path.extname(f).toLowerCase()));
  console.log(`Processing ${files.length} invoices...`);

  // allSettled lets one bad file fail without aborting the rest of the batch
  const results = await Promise.allSettled(
    files.map(async (file) => {
      const data = await extractInvoice(path.join(folderPath, file));
      return { file, ...data };
    })
  );
  const succeeded = results.filter(r => r.status === 'fulfilled').map(r => r.value);
  const failed = results.filter(r => r.status === 'rejected');
  console.log(`✓ ${succeeded.length} extracted, ✗ ${failed.length} failed`);

  // Export to CSV
  const csv = [
    'File,InvoiceNumber,Date,Vendor,Total,Currency',
    ...succeeded.map(r =>
      [r.file, r.invoiceNumber, r.invoiceDate, r.vendor?.name, r.total, r.currency]
        .map(csvField)
        .join(',')
    )
  ].join('\n');
  fs.writeFileSync('./invoices-export.csv', csv);
  return succeeded;
};
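If you are following the Python version, the CSV export step can lean on the standard library's `csv` module, which quotes vendor names containing commas or double quotes automatically. This is a sketch that assumes each row is an extraction result with a `file` key added, as in the batch function above; `invoices_to_csv` is a hypothetical name.

```python
import csv
import io

FIELDS = ["File", "InvoiceNumber", "Date", "Vendor", "Total", "Currency"]


def invoices_to_csv(rows: list) -> str:
    """Render extracted invoices as CSV text with the same columns as the export above."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    for row in rows:
        vendor = row.get("vendor") or {}
        writer.writerow([
            row.get("file", ""),
            row.get("invoiceNumber", ""),
            row.get("invoiceDate", ""),
            vendor.get("name", ""),  # quoted automatically when it contains , or "
            row.get("total", ""),
            row.get("currency", ""),
        ])
    return buffer.getvalue()


rows = [{
    "file": "acme.pdf",
    "invoiceNumber": "INV-2024-001",
    "invoiceDate": "2024-03-15",
    "vendor": {"name": 'Acme "Pro" Services, Inc.'},
    "total": 6898.5,
    "currency": "CAD",
}]
print(invoices_to_csv(rows))
```

Missing fields are written as empty cells, so a partial extraction still produces a well-formed row.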
Error Handling Best Practices
async function extractInvoiceRobust(filePath, retries = 2) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await extractInvoice(filePath);
    } catch (err) {
      // Retry only on timeouts or server errors. err.status is defined only
      // if extractInvoice attaches the HTTP status to the errors it throws.
      const retryable = /timed out|timeout/i.test(err.message) || err.status >= 500;
      // Validation errors and exhausted retries fail immediately
      if (attempt === retries || !retryable) throw err;
      console.log(`Retry ${attempt + 1}/${retries} for ${filePath}`);
      // Linear backoff: wait 2s, then 4s
      await new Promise(r => setTimeout(r, 2000 * (attempt + 1)));
    }
  }
}
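The same retry policy can be sketched in Python. This is an illustration under stated assumptions, not the API client's own behavior: `ExtractionError`, `is_retryable`, and `with_retries` are hypothetical names, and the error is assumed to carry the HTTP status of the failed call. The `sleep` parameter is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Optional


class ExtractionError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, message: str, status: Optional[int] = None):
        super().__init__(message)
        self.status = status


def is_retryable(err: Exception) -> bool:
    """Retry timeouts and 5xx responses; treat everything else as permanent."""
    status = getattr(err, "status", None)
    return isinstance(err, TimeoutError) or (status is not None and status >= 500)


def with_retries(func, *args, retries: int = 2, base_delay: float = 2.0, sleep=time.sleep):
    """Call func(*args), retrying retryable failures with a linearly growing delay."""
    for attempt in range(retries + 1):
        try:
            return func(*args)
        except Exception as err:
            if attempt == retries or not is_retryable(err):
                raise
            sleep(base_delay * (attempt + 1))  # 2s, then 4s with the defaults


# Demo: a stand-in extractor that fails twice with a 502, then succeeds
calls = {"count": 0}


def flaky(path):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ExtractionError("Bad gateway", status=502)
    return {"file": path, "total": 6898.5}


result = with_retries(flaky, "./vendor-invoice.pdf", sleep=lambda seconds: None)
print(result, calls["count"])
```

If you use httpx as in the Python example, note that its timeout errors derive from `httpx.TimeoutException` rather than the built-in `TimeoutError`, so `is_retryable` would need to check for that class as well.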
Cost Breakdown
At scale, here's what this costs:
| Volume | Manual entry cost* | API cost |
|---|---|---|
| 100 invoices/month | ~$450 (5h × $90/h) | Free tier |
| 1,000 invoices/month | ~$4,500 | $19/month |
| 10,000 invoices/month | ~$45,000 | $49/month |
*Assuming $90/hour for an accounts payable specialist at 3 min/invoice.
The ROI is clear even at low volume: at 3 minutes each, 20 invoices take an hour of manual entry (about $90), which already exceeds the $19/month tier.
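The manual-entry column follows directly from the footnote's assumptions (3 minutes per invoice at $90/hour). Here is a short Python sketch of that arithmetic, so you can plug in your own rates; `manual_entry_cost` is a hypothetical helper name.

```python
def manual_entry_cost(invoices: int, minutes_per_invoice: float = 3, hourly_rate: float = 90) -> float:
    """Monthly cost of keying invoices by hand, using the table's assumptions."""
    hours = invoices * minutes_per_invoice / 60
    return hours * hourly_rate


for volume in (20, 100, 1_000, 10_000):
    print(f"{volume:>6} invoices/month -> ${manual_entry_cost(volume):,.0f}")
```

With the defaults, this gives $90, $450, $4,500, and $45,000 for the four volumes.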
Getting Started
- Get a free API key: RapidAPI — DocuSense API
- Free tier: 100 extractions/month
- No credit card required
# Quick test with curl
# tr strips the line breaks that GNU base64 inserts, which would otherwise break the JSON
curl -X POST https://docusense.stackapi.dev/api/v1/documents/invoice \
-H "Authorization: Bearer YOUR_KEY" \
-H "Content-Type: application/json" \
-d "{\"fileBase64\": \"$(base64 < your-invoice.pdf | tr -d '\n')\", \"mimeType\": \"application/pdf\"}"
What's Next?
Once you have invoice extraction working, you can extend it to:
- Bank statement analysis — categorize transactions, detect patterns
- Contract extraction — pull payment terms, dates, party names
- T4/W-2 parsing — income verification for lending workflows
All available via the same API.
What's your current invoice processing workflow? Are you still doing it manually, or have you automated it? Share in the comments.