DEV Community

Adda

Stop Manually Entering Invoices — AI Extraction in 5 Lines of Code

Accounts payable teams spend an average of 4.5 hours per week manually entering invoice data. That's 18 hours a month copying numbers from PDFs into accounting software.

With AI document extraction, you can cut that to near zero, and it works even on scanned, handwritten, or poorly formatted invoices.

Here's how to build it in minutes.


What We're Building

A function that takes any invoice PDF (or image) and returns:

{
  "invoiceNumber": "INV-2024-001",
  "invoiceDate": "2024-03-15",
  "dueDate": "2024-04-15",
  "vendor": {
    "name": "Acme Services Inc.",
    "address": "123 Main St, Toronto, ON",
    "taxNumber": "GST 123456789 RT0001"
  },
  "lineItems": [
    { "description": "Web development", "quantity": 40, "unitPrice": 150.00, "amount": 6000.00 }
  ],
  "taxLines": [
    { "name": "GST", "rate": 5, "amount": 300.00 },
    { "name": "QST", "rate": 9.975, "amount": 598.50 }
  ],
  "total": 6898.50,
  "currency": "CAD"
}
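Extraction can misread a digit, so it may be worth checking that the numbers agree before trusting them. The sketch below assumes the response has the shape shown above and that `total` equals the line item amounts plus the tax line amounts. `checkInvoiceTotals` and the one-cent tolerance are illustrative choices, not part of the API.

```javascript
// Hypothetical helper: checks that line items plus taxes add up to the total.
// Works in whole cents to avoid floating-point drift.
function checkInvoiceTotals(invoice, toleranceCents = 1) {
  const toCents = (value) => Math.round(Number(value || 0) * 100);
  const sumCents = (rows) => (rows || []).reduce((acc, row) => acc + toCents(row.amount), 0);

  const subtotalCents = sumCents(invoice.lineItems);
  const taxCents = sumCents(invoice.taxLines);
  const expectedCents = subtotalCents + taxCents;

  return {
    subtotal: subtotalCents / 100,
    tax: taxCents / 100,
    expected: expectedCents / 100,
    matches: Math.abs(expectedCents - toCents(invoice.total)) <= toleranceCents
  };
}

// Same figures as the sample response above
const sample = {
  lineItems: [{ description: 'Web development', quantity: 40, unitPrice: 150, amount: 6000 }],
  taxLines: [
    { name: 'GST', rate: 5, amount: 300 },
    { name: 'QST', rate: 9.975, amount: 598.5 }
  ],
  total: 6898.5
};

console.log(checkInvoiceTotals(sample));
```

An invoice where `matches` is false is a good candidate for manual review rather than automatic entry.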

The Code

Node.js / JavaScript

const fs = require('fs');

async function extractInvoice(filePath) {
  // Convert file to base64
  const fileBuffer = fs.readFileSync(filePath);
  const base64 = fileBuffer.toString('base64');

  // Detect MIME type from extension
  const ext = filePath.split('.').pop().toLowerCase();
  const mimeTypes = { pdf: 'application/pdf', jpg: 'image/jpeg', jpeg: 'image/jpeg', png: 'image/png' };
  const mimeType = mimeTypes[ext] || 'application/pdf';

  // Call the API
  const response = await fetch('https://docusense.stackapi.dev/api/v1/documents/invoice', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.DOCUSENSE_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ fileBase64: base64, mimeType, language: 'en' })
  });

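  const result = await response.json();

  if (!response.ok || !result.success) {
    const err = new Error(result.error?.message || 'Extraction failed');
    err.status = response.status; // lets callers tell client errors from server errors
    throw err;
  }

  return result.data;
}

module.exports = { extractInvoice };

// Usage (wrapped in an async function because top-level await is not available in CommonJS)
if (require.main === module) {
  (async () => {
    const invoice = await extractInvoice('./vendor-invoice.pdf');
    console.log(`Total: ${invoice.total} ${invoice.currency}`);
    console.log(`Due: ${invoice.dueDate}`);
  })().catch(console.error);
}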
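The extension lookup above falls back to `application/pdf` for anything it does not recognize, so a `.docx` or `.tiff` file would be sent as a PDF. A stricter variant, sketched here with a hypothetical `detectMimeType` helper, rejects unknown extensions up front instead of spending an API call on them.

```javascript
const path = require('path');

// The file types used in this article's examples
const MIME_TYPES = {
  '.pdf': 'application/pdf',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png'
};

// Hypothetical helper: returns the MIME type, or throws for unsupported files.
function detectMimeType(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  const mimeType = MIME_TYPES[ext];
  if (!mimeType) {
    throw new Error(`Unsupported file type "${ext || '(none)'}" for ${filePath}`);
  }
  return mimeType;
}

console.log(detectMimeType('./inbox/Scan_0042.JPG'));
```

Using `path.extname` also avoids a quirk of `split('.').pop()`, which returns the whole file name when there is no extension at all.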

Python

import base64
import httpx
import os

def extract_invoice(file_path: str) -> dict:
    with open(file_path, "rb") as f:
        base64_content = base64.b64encode(f.read()).decode("utf-8")

    ext = file_path.rsplit(".", 1)[-1].lower()
    mime_types = {"pdf": "application/pdf", "jpg": "image/jpeg", "jpeg": "image/jpeg", "png": "image/png"}
    mime_type = mime_types.get(ext, "application/pdf")

    response = httpx.post(
        "https://docusense.stackapi.dev/api/v1/documents/invoice",
        headers={
            "Authorization": f"Bearer {os.environ['DOCUSENSE_API_KEY']}",
            "Content-Type": "application/json"
        },
        json={"fileBase64": base64_content, "mimeType": mime_type, "language": "en"},
        timeout=60.0
    )

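    result = response.json()
    if not result.get("success"):
        # Fall back to a generic message if the error payload is missing
        raise ValueError((result.get("error") or {}).get("message", "Extraction failed"))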

    return result["data"]

# Usage
invoice = extract_invoice("./vendor-invoice.pdf")
print(f"Invoice #{invoice['invoiceNumber']} — Total: {invoice['total']} {invoice['currency']}")

Real-World Integration: QuickBooks Auto-Entry

Here's a complete example that extracts an invoice and creates a QuickBooks bill automatically:

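const fs = require('fs');
const QuickBooks = require('node-quickbooks');
const { extractInvoice } = require('./invoice-extractor');

// QuickBooks client. The environment variable names are placeholders; check the
// node-quickbooks README for the constructor arguments your OAuth setup needs.
const qbo = new QuickBooks(
  process.env.QBO_CLIENT_ID,
  process.env.QBO_CLIENT_SECRET,
  process.env.QBO_ACCESS_TOKEN,
  false,                        // no token secret with OAuth 2.0
  process.env.QBO_REALM_ID,
  true,                         // use the sandbox
  false,                        // debug logging off
  null,                         // minor version: library default
  '2.0',                        // OAuth version
  process.env.QBO_REFRESH_TOKEN
);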

async function processVendorInvoice(invoicePath, vendorId) {
  // Step 1: Extract invoice data
  const invoice = await extractInvoice(invoicePath);

  // Step 2: Map to QuickBooks bill format
  const bill = {
    VendorRef: { value: vendorId },
    TxnDate: invoice.invoiceDate,
    DueDate: invoice.dueDate,
    DocNumber: invoice.invoiceNumber,
    Line: invoice.lineItems.map((item, i) => ({
      Id: String(i + 1),
      Amount: item.amount,
      DetailType: 'AccountBasedExpenseLineDetail',
      AccountBasedExpenseLineDetail: {
        AccountRef: { value: '1' },  // Your expense account
        BillableStatus: 'NotBillable',
        UnitPrice: item.unitPrice,
        Qty: item.quantity
      },
      Description: item.description
    })),
    TotalAmt: invoice.total,
    CurrencyRef: { value: invoice.currency || 'CAD' }
  };

  // Step 3: Create in QuickBooks
  return new Promise((resolve, reject) => {
    qbo.createBill(bill, (err, result) => {
      if (err) reject(err);
      else resolve(result);
    });
  });
}

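// Process all invoices in a folder
// (wrapped in an async function because top-level await is not available in CommonJS)
async function processInbox(vendorId) {
  // Make sure the destination folders exist before moving files into them
  fs.mkdirSync('./processed', { recursive: true });
  fs.mkdirSync('./errors', { recursive: true });

  const invoiceFiles = fs.readdirSync('./inbox').filter(f => f.toLowerCase().endsWith('.pdf'));

  for (const file of invoiceFiles) {
    try {
      const bill = await processVendorInvoice(`./inbox/${file}`, vendorId);
      console.log(`✓ Created bill ${bill.DocNumber}: $${bill.TotalAmt}`);
      fs.renameSync(`./inbox/${file}`, `./processed/${file}`);
    } catch (err) {
      console.error(`✗ Failed ${file}:`, err.message);
      fs.renameSync(`./inbox/${file}`, `./errors/${file}`);
    }
  }
}

// Placeholder: the QuickBooks vendor ID these invoices belong to
const VENDOR_ID = process.env.QBO_VENDOR_ID;
processInbox(VENDOR_ID).catch(console.error);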
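A bill created from a bad extraction is harder to undo than a file left in the inbox, so it can help to check the extracted data before calling QuickBooks. The sketch below uses a hypothetical `findInvoiceProblems` helper and assumes the field names from the sample response. The specific rules are examples to adapt, not requirements of either API.

```javascript
// Hypothetical pre-flight check; field names follow the sample response.
// Returns a list of human-readable problems (empty when the invoice looks usable).
function findInvoiceProblems(invoice) {
  const problems = [];
  const isIsoDate = (s) => typeof s === 'string' && /^\d{4}-\d{2}-\d{2}$/.test(s);

  if (!invoice.invoiceNumber) problems.push('missing invoiceNumber');
  if (!isIsoDate(invoice.invoiceDate)) problems.push('invoiceDate is not YYYY-MM-DD');
  if (invoice.dueDate && !isIsoDate(invoice.dueDate)) problems.push('dueDate is not YYYY-MM-DD');
  // ISO dates compare correctly as strings
  if (isIsoDate(invoice.invoiceDate) && isIsoDate(invoice.dueDate) &&
      invoice.dueDate < invoice.invoiceDate) {
    problems.push('dueDate is before invoiceDate');
  }
  if (!Array.isArray(invoice.lineItems) || invoice.lineItems.length === 0) {
    problems.push('no line items');
  }
  if (!(Number(invoice.total) > 0)) problems.push('total is not a positive number');

  return problems;
}

// Possible use inside processVendorInvoice, right after extraction:
//   const problems = findInvoiceProblems(invoice);
//   if (problems.length > 0) throw new Error(`Needs review: ${problems.join('; ')}`);
```

Throwing here sends the file to the `./errors` folder through the existing `catch` block, where a person can look at it.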

Handling Different Invoice Formats

The API handles common formats with little or no configuration:

| Format | Supported | Notes |
| Digital PDF | Yes | Best accuracy |
| Scanned PDF | Yes | Works with most scan qualities |
| JPEG / PNG | Yes | Phone photos work too |
| French invoices (TVA, TTC) | Yes | Pass language: 'fr' |
| Canadian (GST/HST/QST) | Yes | Auto-detected |
| US invoices | Yes | |
| European (VAT) | Yes | |
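The `language: 'fr'` note corresponds to the `language` field of the request body used in the earlier code. A minimal sketch, assuming the same `fileBase64`, `mimeType`, and `language` fields; `buildInvoiceRequest` is a hypothetical helper, and only `'en'` and `'fr'` appear in this article, so other language codes are an untested assumption.

```javascript
// Hypothetical helper: builds the JSON body for the invoice endpoint.
// Field names match the request used earlier in this article.
function buildInvoiceRequest(fileBuffer, mimeType, language = 'en') {
  return {
    fileBase64: fileBuffer.toString('base64'),
    mimeType,
    language
  };
}

// A French invoice: only the language argument changes
const body = buildInvoiceRequest(Buffer.from('%PDF-1.4 sample'), 'application/pdf', 'fr');
console.log(JSON.stringify(body));
```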

Processing a Folder of Invoices (Batch Approach)

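const fs = require('fs');
const path = require('path');
const { extractInvoice } = require('./invoice-extractor');

const processInvoiceQueue = async (folderPath) => {
  const files = fs.readdirSync(folderPath)
    .filter(f => ['.pdf', '.jpg', '.jpeg', '.png'].includes(path.extname(f).toLowerCase()));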

  console.log(`Processing ${files.length} invoices...`);

  const results = await Promise.allSettled(
    files.map(async (file) => {
      const data = await extractInvoice(path.join(folderPath, file));
      return { file, ...data };
    })
  );

  const succeeded = results.filter(r => r.status === 'fulfilled').map(r => r.value);
  const failed = results.filter(r => r.status === 'rejected');

  console.log(`✓ ${succeeded.length} extracted, ✗ ${failed.length} failed`);

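  // Export to CSV (quote text fields and double any embedded quotes)
  const q = (v) => `"${String(v ?? '').replace(/"/g, '""')}"`;
  const csv = [
    'File,InvoiceNumber,Date,Vendor,Total,Currency',
    ...succeeded.map(r =>
      [q(r.file), q(r.invoiceNumber), r.invoiceDate ?? '', q(r.vendor?.name), r.total ?? '', r.currency ?? ''].join(',')
    )
  ].join('\n');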

  fs.writeFileSync('./invoices-export.csv', csv);
  return succeeded;
};
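`Promise.allSettled` over `files.map(...)` starts every request at once. That is fine for a dozen files but may run into rate limits or memory pressure on a large folder. One option is to work through the list in fixed-size chunks. The sketch below is a generic helper; `settleInChunks` and the chunk size of 5 are illustrative, and the API's actual rate limits are not stated in this article.

```javascript
// Hypothetical helper: runs `worker` over `items`, at most `chunkSize` at a time,
// and returns Promise.allSettled-style results in the original order.
async function settleInChunks(items, worker, chunkSize = 5) {
  const results = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    const chunk = items.slice(i, i + chunkSize);
    // Wait for the whole chunk to settle before starting the next one
    const settled = await Promise.allSettled(chunk.map((item) => worker(item)));
    results.push(...settled);
  }
  return results;
}

// Possible use inside processInvoiceQueue, in place of the single Promise.allSettled call:
//   const results = await settleInChunks(files, async (file) => {
//     const data = await extractInvoice(path.join(folderPath, file));
//     return { file, ...data };
//   });
```

Chunking is the simplest approach. A worker pool would keep the pipeline fuller, since here one slow file holds up its whole chunk, but it takes more code.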

Error Handling Best Practices

async function extractInvoiceRobust(filePath, retries = 2) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await extractInvoice(filePath);
    } catch (err) {
      if (attempt === retries) throw err;

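      // Retry on timeout or server error.
      // Note: err.status is only defined if extractInvoice attaches the HTTP status to its errors.
      if (err.name === 'TimeoutError' || err.message.includes('timed out') || err.status >= 500) {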
        console.log(`Retry ${attempt + 1}/${retries} for ${filePath}`);
        await new Promise(r => setTimeout(r, 2000 * (attempt + 1)));
        continue;
      }

      throw err; // Don't retry on validation errors
    }
  }
}
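Deciding what counts as retryable is easier to test when it lives in its own function. The sketch below assumes Node 18+ and that `extractInvoice` attaches the HTTP status to the errors it throws. `isRetryable` and `retryDelayMs` are hypothetical helpers, and treating 429 as retryable is a common convention rather than documented behavior of this API.

```javascript
// Hypothetical classifier for the retry loop: true for errors worth retrying.
function isRetryable(err) {
  if (!err) return false;
  // AbortSignal.timeout() makes fetch reject with an error named "TimeoutError".
  if (err.name === 'TimeoutError') return true;
  // Requires extractInvoice to attach the HTTP status to the error it throws.
  if (typeof err.status === 'number') {
    return err.status === 429 || err.status >= 500;
  }
  return false;
}

// Delay before the next attempt: 2s, 4s, 6s, ... (the same linear schedule as the loop above)
function retryDelayMs(attempt, baseMs = 2000) {
  return baseMs * (attempt + 1);
}

// fetch has no timeout by default. One way to add one inside extractInvoice:
//   fetch(url, { method: 'POST', headers, body, signal: AbortSignal.timeout(60_000) })
```

With these in place, the condition in the loop becomes `if (isRetryable(err))` and the wait becomes `setTimeout(r, retryDelayMs(attempt))`.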

Cost Breakdown

At scale, here's what this costs:

| Volume | Manual entry cost* | API cost |
| 100 invoices/month | ~$450 (5 h × $90/h) | Free tier |
| 1,000 invoices/month | ~$4,500 | $19/month |
| 10,000 invoices/month | ~$45,000 | $49/month |

*Assuming $90/hour for an accounts payable specialist at 3 min/invoice.

The ROI is obvious at any scale above ~20 invoices/month.
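The manual-entry column follows from the footnote's assumptions: invoices × 3 minutes ÷ 60 × $90. A small sketch that reproduces those figures, with the time per invoice and hourly rate as parameters so you can plug in your own numbers (the function name is illustrative):

```javascript
// Manual data-entry cost per month under the footnote's assumptions
function manualEntryCost(invoicesPerMonth, minutesPerInvoice = 3, hourlyRate = 90) {
  const hours = (invoicesPerMonth * minutesPerInvoice) / 60;
  return { hours, cost: hours * hourlyRate };
}

for (const volume of [100, 1000, 10000]) {
  const { hours, cost } = manualEntryCost(volume);
  console.log(`${volume} invoices/month: ${hours} h, $${cost}`);
}
// 100 invoices/month: 5 h, $450
// 1000 invoices/month: 50 h, $4500
// 10000 invoices/month: 500 h, $45000
```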


Getting Started

  1. Get a free API key: RapidAPI — DocuSense API
  2. Free tier: 100 extractions/month
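  3. No credit card required

# Quick test with curl. `base64 -i` is the macOS form; on Linux, use `base64 -w 0 your-invoice.pdf` to avoid line wraps that would break the JSON.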
curl -X POST https://docusense.stackapi.dev/api/v1/documents/invoice \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"fileBase64\": \"$(base64 -i your-invoice.pdf)\", \"mimeType\": \"application/pdf\"}"

What's Next?

Once you have invoice extraction working, you can extend it to:

  • Bank statement analysis — categorize transactions, detect patterns
  • Contract extraction — pull payment terms, dates, party names
  • T4/W-2 parsing — income verification for lending workflows

All available via the same API.


What's your current invoice processing workflow? Are you still doing it manually, or have you automated it? Share in the comments.
