
Ruan Muller


I Built an Invoice Parser API in One Day and Listed it on RapidAPI (Day 1 of 21)

I'm a self-taught developer from South Africa, currently studying for my Bachelor of Accounting. Today I started a challenge: build and publish a new API every day for 21 days straight.

Day 1 is done. Here's exactly what I built, how it works, and how you can use it.

The Problem:

Every accounting app, expense tracker, and bookkeeping tool has the same pain point: invoices come in as PDFs, images, or plain text, and somebody has to pull the structured data out of them.

Vendor name. Invoice number. Line items. Tax. Total. Due date.

It's repetitive, error-prone work. And every developer building a finance tool ends up writing the same messy regex logic to solve it.

So I built an API that does it for them.


What I Built:

The Invoice & Receipt Parser API — send it raw invoice text, get back clean structured JSON.

Input:

{
  "text": "Acme Corp\nInvoice No: INV-2024-0042\nDate: 15/03/2024\nDue: 15/04/2024\n\nWeb Design Services  2  $1500.00  $3000.00\nSEO Optimization     1   $800.00   $800.00\n\nSubtotal: $3800.00\nVAT 15%:   $570.00\nTotal Due: $4370.00"
}

Output:

{
  "success": true,
  "data": {
    "document_type": "invoice",
    "vendor_name": "Acme Corp",
    "invoice_number": "INV-2024-0042",
    "dates": {
      "invoice_date": "15/03/2024",
      "due_date": "15/04/2024"
    },
    "currency": "USD",
    "totals": {
      "subtotal": 3800,
      "tax_rate": 15,
      "tax_amount": 570,
      "discount": null,
      "shipping": null,
      "total": 4370
    },
    "line_items": [
      {
        "description": "Web Design Services",
        "quantity": 2,
        "unit_price": 1500,
        "amount": 3000
      },
      {
        "description": "SEO Optimization",
        "quantity": 1,
        "unit_price": 800,
        "amount": 800
      }
    ],
    "confidence": {
      "score": 100,
      "level": "high"
    }
  }
}
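
Because the response is plain numbers, a caller can sanity-check it before trusting it. The following is a minimal sketch, not part of the API: `reconcile` and its tolerance are hypothetical, and the field names are taken from the sample output above.

```javascript
// Hypothetical client-side check: do the parsed numbers agree with each other?
// Field names follow the sample response; the 0.01 tolerance is an assumption.
function reconcile(data, tolerance = 0.01) {
  const { subtotal, tax_rate, tax_amount, discount, shipping, total } = data.totals;
  const close = (a, b) => Math.abs(a - b) <= tolerance;

  // The line-item amounts should add up to the subtotal.
  const lineSum = data.line_items.reduce((sum, item) => sum + item.amount, 0);

  // Treat a null discount or shipping value as zero.
  const expectedTotal = subtotal + tax_amount + (shipping ?? 0) - (discount ?? 0);

  return {
    lineItemsMatchSubtotal: close(lineSum, subtotal),
    taxMatchesRate: close(subtotal * (tax_rate / 100), tax_amount),
    totalMatches: close(expectedTotal, total),
  };
}

const sample = {
  totals: { subtotal: 3800, tax_rate: 15, tax_amount: 570, discount: null, shipping: null, total: 4370 },
  line_items: [
    { description: "Web Design Services", quantity: 2, unit_price: 1500, amount: 3000 },
    { description: "SEO Optimization", quantity: 1, unit_price: 800, amount: 800 },
  ],
};

console.log(reconcile(sample));
```

For the sample invoice all three checks pass: 3000 + 800 = 3800, 15% of 3800 is 570, and 3800 + 570 = 4370.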

No AI costs and no third-party parsing services. The extraction is plain Node.js and regex, which keeps running costs near zero and response times under 100 ms.


The 6 Endpoints

| Method | Endpoint          | Description                         |
|--------|-------------------|-------------------------------------|
| GET    | /health           | Check API is online                 |
| POST   | /parse            | Full extraction (all fields)        |
| POST   | /parse/totals     | Financial totals only               |
| POST   | /parse/line-items | Line items only                     |
| POST   | /parse/vendor     | Vendor details only                 |
| POST   | /validate         | Completeness score + missing fields |
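
To make "completeness score + missing fields" concrete, here is one way such a check could work. This is a sketch under assumptions, not the API's actual logic: the required-field list, the function names, and the response shape are all hypothetical.

```javascript
// Hypothetical completeness check. The required-field list is an assumption.
const REQUIRED_FIELDS = ["vendor_name", "invoice_number", "dates.invoice_date", "totals.total"];

// Read a dotted path such as "totals.total" from a nested object.
function getPath(obj, path) {
  return path.split(".").reduce((value, key) => (value == null ? undefined : value[key]), obj);
}

function validateParsed(data) {
  // A field counts as missing when it is absent, null, or an empty string.
  const missing = REQUIRED_FIELDS.filter((field) => {
    const value = getPath(data, field);
    return value === undefined || value === null || value === "";
  });
  const present = REQUIRED_FIELDS.length - missing.length;
  return {
    completeness: Math.round((present / REQUIRED_FIELDS.length) * 100),
    missing_fields: missing,
  };
}

console.log(validateParsed({
  vendor_name: "Acme Corp",
  invoice_number: null,
  dates: { invoice_date: "15/03/2024" },
  totals: { total: 4370 },
}));
// → { completeness: 75, missing_fields: [ 'invoice_number' ] }
```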

The lightweight endpoints are useful for high-volume pipelines where you only need one piece of data and don't want to pay for a full parse every time.
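
Calling one of the lightweight endpoints could look roughly like this. Treat it as a sketch: the host below is a placeholder rather than the real listing host, `buildRequest` is a hypothetical helper, and the two `X-RapidAPI-*` headers are the ones RapidAPI conventionally expects.

```javascript
// ASSUMPTION: placeholder host. Copy the real one from the RapidAPI listing.
const HOST = "invoice-parser.example.invalid";

// Builds the URL and fetch options for one POST endpoint, e.g. "/parse/totals".
function buildRequest(endpoint, text, apiKey) {
  return {
    url: `https://${HOST}${endpoint}`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-RapidAPI-Key": apiKey,
        "X-RapidAPI-Host": HOST,
      },
      // Same request body shape as the Input example: { "text": "..." }
      body: JSON.stringify({ text }),
    },
  };
}

// Usage (Node 18+ ships a global fetch):
//   const { url, options } = buildRequest("/parse/totals", invoiceText, process.env.RAPIDAPI_KEY);
//   const result = await (await fetch(url, options)).json();
```

Keeping the request construction separate from the network call makes it easy to swap endpoints in a pipeline without touching the rest of the code.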


How It Works

The core logic is a chain of regex-based extractors, each one focused on one thing.

function fullParse(text) {
  // Run every extractor first, then score the combined result.
  const data = {
    document_type:  detectDocType(text),
    vendor_name:    extractVendor(text),
    invoice_number: extractInvoiceNumber(text),
    dates:          extractDates(text),
    currency:       extractCurrency(text),
    totals:         extractTotals(text),
    line_items:     extractLineItems(text),
    payment_info:   extractPaymentInfo(text),
    contact:        extractContact(text),
  };
  // Confidence depends on the extracted fields, so it is computed last.
  data.confidence = calculateConfidence(data);
  return data;
}
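
To give a feel for what a single extractor in that chain might look like, here is a rough sketch of a line-item extractor. It is not the production code: the regex, the helper names, and the assumption that columns are separated by at least two spaces are all illustrative.

```javascript
// Hypothetical pattern: description, 2+ spaces, quantity, unit price, amount.
// Assumes prices always carry two decimals, as in the sample invoice.
const LINE_ITEM = /^(.+?)\s{2,}(\d+(?:\.\d+)?)\s+[$£€]?([\d,]+\.\d{2})\s+[$£€]?([\d,]+\.\d{2})\s*$/;

// "1,500.00" -> 1500
const toNumber = (s) => Number(s.replace(/,/g, ""));

function extractLineItemsSketch(text) {
  const items = [];
  for (const line of text.split("\n")) {
    const m = LINE_ITEM.exec(line.trim());
    if (!m) continue; // header, totals, and blank lines fall through here
    items.push({
      description: m[1].trim(),
      quantity: toNumber(m[2]),
      unit_price: toNumber(m[3]),
      amount: toNumber(m[4]),
    });
  }
  return items;
}

const sampleText =
  "Acme Corp\nInvoice No: INV-2024-0042\nDate: 15/03/2024\nDue: 15/04/2024\n\n" +
  "Web Design Services  2  $1500.00  $3000.00\n" +
  "SEO Optimization     1   $800.00   $800.00\n\n" +
  "Subtotal: $3800.00\nVAT 15%:   $570.00\nTotal Due: $4370.00";

console.log(extractLineItemsSketch(sampleText));
```

On the sample invoice this picks up the two service lines and skips the subtotal, VAT, and total lines, because none of those contain a quantity column followed by two prices.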

Currency detection checks for 25+ currency codes and symbols:

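// Only a few of the 25+ codes are shown; the rest are elided.
const CURRENCY_CODES = ["USD", "EUR", "GBP", "ZAR", "INR" /* ... */];

function extractCurrency(text) {
  // Prefer an explicit ISO code such as "ZAR" over a symbol.
  for (const code of CURRENCY_CODES) {
    if (new RegExp(`\\b${code}\\b`).test(text)) return code;
  }
  // Fall back to the dollar sign, otherwise report "not found" explicitly.
  if (text.includes("$")) return "USD";
  return null;
}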
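
The symbol side of the detection could be handled with a small lookup table. A minimal sketch, assuming a hypothetical `SYMBOL_TO_CODE` table and only the five codes listed above. Ambiguous symbols such as `$` can only map to a default.

```javascript
const CODES = ["USD", "EUR", "GBP", "ZAR", "INR"]; // the codes named in the post

// Hypothetical symbol table. "$" is shared by many currencies, so USD is just a default.
const SYMBOL_TO_CODE = [
  ["€", "EUR"],
  ["£", "GBP"],
  ["₹", "INR"],
  ["$", "USD"],
];

function detectCurrency(text) {
  // 1. An explicit ISO code wins over any symbol.
  for (const code of CODES) {
    if (new RegExp(`\\b${code}\\b`).test(text)) return code;
  }
  // 2. Otherwise fall back to the first symbol found in the table.
  for (const [symbol, code] of SYMBOL_TO_CODE) {
    if (text.includes(symbol)) return code;
  }
  return null; // nothing recognisable
}

console.log(detectCurrency("Total: $500.00 ZAR")); // "ZAR"
console.log(detectCurrency("Total £20.00"));       // "GBP"
console.log(detectCurrency("Total Due: $4370.00")); // "USD"
```

Checking codes before symbols matters for invoices that print a `$` next to a non-US code.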

One bug I caught during testing: the tax amount extractor was matching 15 from VAT 15% instead of the actual amount, 570.00. The fix was requiring a decimal point in the match:

// BROKEN — matches "15" from "VAT 15%"
/(?:vat|tax)[\s:$]*([0-9,. ]+)/i

// FIXED — requires decimal format, skips percentages
/(?:vat|tax)[\s:%\d]*?[\s:$£€]+([0-9,]+\.[0-9]{2})/i
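
Both patterns can be run against the VAT line from the sample invoice to see the difference. The extra test strings are made up for illustration.

```javascript
const BROKEN = /(?:vat|tax)[\s:$]*([0-9,. ]+)/i;
const FIXED = /(?:vat|tax)[\s:%\d]*?[\s:$£€]+([0-9,]+\.[0-9]{2})/i;

const line = "VAT 15%:   $570.00";

console.log(line.match(BROKEN)[1]); // "15"     (the percentage, not the amount)
console.log(line.match(FIXED)[1]);  // "570.00" (the actual tax amount)

// The fixed pattern still handles a line with no percentage at all.
console.log("Tax: 42.50".match(FIXED)[1]); // "42.50"

// Trade-off: an amount written without decimals is no longer matched.
console.log(FIXED.exec("VAT 15% 570")); // null
```

The last case shows the cost of the fix: requiring two decimal places skips percentages, but it also skips whole-number amounts.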

Always test with real messy invoice text before shipping.


Tech Stack

  • Runtime: Node.js + Express
  • Hosting: Railway (free tier)
  • Marketplace: RapidAPI
  • Dependencies: express, cors, helmet, morgan, express-rate-limit

Zero paid APIs. Zero AI costs. The whole thing costs less than $5/month to run.


Pricing on RapidAPI

| Plan  | Price     | Requests    |
|-------|-----------|-------------|
| Free  | $0        | 10/month    |
| Basic | $9.99/mo  | 500/month   |
| Pro   | $29.99/mo | 5,000/month |

What I Extracted From This Build

My accounting background actually helped here. I knew exactly which fields matter on a real invoice: payment terms, VAT rates, SWIFT codes, IBAN numbers. That domain knowledge made the extractor more accurate than a generic solution would be.

It's a reminder that your background, whatever it is, is an advantage when building in the right niche.


Try It

The API is live on RapidAPI. Search for Invoice Receipt Parser, or find my profile at https://rapidapi.com/user/ruanmul04.

Free tier gives you 10 requests/month to test it with your own invoices.


What's Next

Day 2 tomorrow — Password Strength & Security Scorer API.

If you want to follow the 21-day build challenge, follow me here on dev.to. I'll be posting every day with the full breakdown of what I built, why, and how.

Drop a comment if you're building APIs too. I'm always keen to connect with other developers doing the same thing. 🇿🇦


Built in South Africa. Sold globally.
