DEV Community

Eeman Asghar

Automating Procure-to-Pay with Precision: How I Engineered an End-to-End Invoice Processing System in n8n

In enterprise operations, few workflows are as ripe for automation, yet as fragile to implement, as Procure-to-Pay (P2P). Mismatched invoices, missing approvals, buried purchase orders, and manual GRN reconciliation can grind any finance team to a halt. I built a production-grade P2P automation system in n8n to replace this fragmented mess with structured, auditable, and intelligent workflows.

This is not a prototype. It’s a complete automation backbone capable of ingesting vendor invoices from email, parsing and validating every detail, matching them against procurement records, and routing approvals or exceptions, all while maintaining a robust audit trail.

Why Automate This?

The traditional P2P process is loaded with friction:

  • Invoices arrive in unstructured formats, often scanned or image-based.
  • Manual extraction leads to errors and delays.
  • Validation steps — PO matching, GRN checks, approval routing — require constant coordination.
  • Finance teams spend hours triaging mismatches and searching shared drives.

My goal was to make this entire process deterministic, traceable, and modular, so each part could evolve independently without breaking the whole system.

What I Built

The architecture is composed of six connected n8n workflows, orchestrated through well-defined triggers, subflows, and handoffs. Together they cover seven stages:

  1. Gmail Trigger – Captures incoming invoice emails and attachments in real time.
  2. PDF Validation – Ensures only legitimate PDF files are processed (rejects mislabeled or corrupted formats).
  3. PDF-to-Image Conversion – Uses PDF.co to normalize documents into image form for parsing consistency.
  4. OCR Pipeline – Applies AI-driven OCR (LLM-based) to extract raw text from invoice images, regardless of layout.
  5. Entity Extraction – Uses OpenAI to structure the output: PO number, GRN, vendor, amount, currency.
  6. Reconciliation Engine – Compares extracted data against SharePoint-based PO/GRN logs and flags mismatches.
  7. Approval Subflows – Routes approval requests via Gmail for PO and GRN when required, with outcomes logged back into Excel 365.

Each node is crafted for resilience, and every junction point includes validation guards, error exits, and structured logging.

Key Engineering Decisions

Modular Subflows

Instead of building a monolithic pipeline, I split the system into distinct workflows: one each for invoice parsing, PO approval, GRN validation, and audit logging. Subflows are triggered via Execute Workflow nodes with dynamic input mappings, keeping everything loosely coupled.

File Validation Before OCR

Early on, I ran into MIME-type mismatches: files with .pdf extensions that weren’t actually PDFs. I added MIME-type and extension checks to block malformed inputs early, saving compute and preventing downstream errors.

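// n8n Code node, "Run Once for Each Item" mode; assumes the attachment is in the "data" binary property.
if ($binary.data.mimeType !== 'application/pdf') {
  throw new Error('Invalid file type: Must be a PDF');
}
return $input.item; // pass valid items through unchanged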
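
A declared MIME type can itself be wrong, so one further guard is to look at the file's first bytes. The following is a minimal sketch in plain Node.js, not the workflow's actual node: `isLikelyPdf` and its parameters are hypothetical names, and how the raw bytes are read inside n8n is left out. It relies on the fact that a PDF file normally begins with the ASCII signature `%PDF-`.

```javascript
// Sketch only: isLikelyPdf and its parameters are illustrative, not from the original workflow.
function isLikelyPdf(fileName, mimeType, bytes) {
  // Extension check (case-insensitive).
  const hasPdfExtension = /\.pdf$/i.test(fileName || '');
  // MIME type as declared by the sender or mail provider.
  const hasPdfMime = mimeType === 'application/pdf';
  // Signature check: compare the first five bytes with "%PDF-".
  const header = Buffer.from(bytes).subarray(0, 5).toString('latin1');
  const hasPdfSignature = header === '%PDF-';
  return hasPdfExtension && hasPdfMime && hasPdfSignature;
}

// A text file renamed to .pdf passes the first two checks but fails the third.
const fake = Buffer.from('Hello, this is not a PDF');
const real = Buffer.from('%PDF-1.7\n...');
console.log(isLikelyPdf('invoice.pdf', 'application/pdf', fake)); // false
console.log(isLikelyPdf('invoice.pdf', 'application/pdf', real)); // true
```

Requiring all three checks is a deliberately strict choice; a more lenient variant might accept a valid signature even when the extension is missing.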

PDF.co Limitations

PDF.co’s API only accepts hosted files. To handle this, I built a two-stage upload + convert process. Once I had the temporary URL, I passed it into the converter. The result was a structured array of image links, which I flattened using Set and Split nodes to emit one item per page.

{
  "url": "https://secure-temp-file.pdf",
  "pages": "1-ALL",
  "outputFormat": "jpg"
}
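
The flattening step was done with Set and Split nodes, but the same idea can be sketched as a small function of the kind a Code node might run. This is an illustration under assumptions: the `urls` field name is a guess at the shape of the converter response, `splitPages` is a hypothetical helper, and the example.com links are placeholders.

```javascript
// Sketch only: the "urls" field name is an assumption about the converter response.
function splitPages(response) {
  const urls = response && Array.isArray(response.urls) ? response.urls : [];
  if (urls.length === 0) {
    throw new Error('Conversion returned no page images');
  }
  // n8n items are objects with a "json" key; emit one item per page.
  return urls.map((imageUrl, index) => ({
    json: { page: index + 1, totalPages: urls.length, imageUrl },
  }));
}

const items = splitPages({
  urls: ['https://example.com/page-1.jpg', 'https://example.com/page-2.jpg'],
});
console.log(items.length);       // 2
console.log(items[1].json.page); // 2
```

Carrying `page` and `totalPages` on each item makes it possible to reassemble the OCR text in page order later.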

AI Parsing with Guardrails

OCR is handled by an OpenAI-powered LangChain node chain. I injected formatting instructions so the model returns its output as structured JSON. If the response is malformed or empty, the workflow halts with an actionable error.

{
  "po_number": "PO-12345",
  "vendor": "Acme Supplies",
  "amount": 4520.75,
  "currency": "USD",
  "grn": "GRN-67890"
}
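
The guardrail itself can be sketched as a validation function that runs on the model's raw text before anything downstream sees it. This is a minimal sketch, not the workflow's actual node: `parseInvoiceResponse` is a hypothetical name, and treating all five fields as required is an assumption based on the sample output above.

```javascript
// Assumption: all five fields from the sample output are mandatory.
const REQUIRED_FIELDS = ['po_number', 'vendor', 'amount', 'currency', 'grn'];

function parseInvoiceResponse(rawText) {
  if (typeof rawText !== 'string' || rawText.trim() === '') {
    throw new Error('Model returned an empty response');
  }
  let data;
  try {
    data = JSON.parse(rawText);
  } catch (err) {
    throw new Error(`Model response is not valid JSON: ${err.message}`);
  }
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    throw new Error('Model response must be a JSON object');
  }
  // Collect every missing field so the error names all of them at once.
  const missing = REQUIRED_FIELDS.filter(
    (field) => data[field] === undefined || data[field] === null || data[field] === '',
  );
  if (missing.length > 0) {
    throw new Error(`Missing fields: ${missing.join(', ')}`);
  }
  if (typeof data.amount !== 'number' || !Number.isFinite(data.amount)) {
    throw new Error('amount must be a finite number');
  }
  return data;
}

const parsed = parseInvoiceResponse(
  '{"po_number":"PO-12345","vendor":"Acme Supplies","amount":4520.75,"currency":"USD","grn":"GRN-67890"}',
);
console.log(parsed.po_number); // PO-12345
```

Naming the missing fields in the error message is what makes the failure actionable: whoever reviews the exception can see what the model failed to extract without rerunning the invoice.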

Matching Logic with Excel 365

I used Microsoft Excel 365 on SharePoint to simulate an ERP-like registry. Matching logic compares AI outputs against actual records, and flags mismatches for review. This offered just enough structure for automation without needing a full-blown database.
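
A rough sketch of what such matching logic might look like, once the PO and GRN rows have been read from the workbook. The column names (`po_number`, `vendor`, `amount`, `currency`, `grn`) and the `reconcile` helper are assumptions for illustration; the real sheet layout and rules may differ.

```javascript
// Sketch only: column names are assumed, not taken from the real workbook.
const normalize = (value) => String(value || '').trim().toLowerCase();

function reconcile(invoice, poRows, grnRows) {
  const issues = [];
  const po = poRows.find((row) => row.po_number === invoice.po_number);
  if (!po) {
    issues.push('PO not found');
  } else {
    if (normalize(po.vendor) !== normalize(invoice.vendor)) issues.push('Vendor mismatch');
    if (po.currency !== invoice.currency) issues.push('Currency mismatch');
    // Compare in minor units (cents) to sidestep floating-point noise.
    if (Math.round(po.amount * 100) !== Math.round(invoice.amount * 100)) {
      issues.push('Amount mismatch');
    }
  }
  const grn = grnRows.find((row) => row.grn === invoice.grn);
  if (!grn) {
    issues.push('GRN not found');
  } else if (grn.po_number !== invoice.po_number) {
    issues.push('GRN belongs to a different PO');
  }
  return { matched: issues.length === 0, issues };
}

const result = reconcile(
  { po_number: 'PO-12345', vendor: 'Acme Supplies', amount: 4520.75, currency: 'USD', grn: 'GRN-67890' },
  [{ po_number: 'PO-12345', vendor: 'ACME Supplies', amount: 4500, currency: 'USD' }],
  [{ grn: 'GRN-67890', po_number: 'PO-12345' }],
);
console.log(result.matched, result.issues); // false [ 'Amount mismatch' ]
```

Returning a list of issues rather than a single boolean lets the exception route tell the reviewer exactly which field disagreed.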

Audit Trail and Compliance

Each approval or exception is logged with a timestamp in Excel. This provides a transparent audit trail, something manual processes rarely achieve.
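
As an illustration of what one logged row could contain, here is a minimal sketch of a row builder. The field names and the `buildAuditRow` helper are hypothetical; the only detail taken from the workflow is that every approval or exception carries a timestamp.

```javascript
// Sketch only: field names are illustrative, not the real sheet's columns.
function buildAuditRow(invoice, outcome, reason, now = new Date()) {
  return {
    timestamp: now.toISOString(), // UTC, ISO 8601, sorts correctly as text
    po_number: invoice.po_number,
    grn: invoice.grn,
    vendor: invoice.vendor,
    amount: invoice.amount,
    currency: invoice.currency,
    outcome,                      // e.g. "approved", "rejected", "exception"
    reason: reason || '',
  };
}

const row = buildAuditRow(
  { po_number: 'PO-12345', grn: 'GRN-67890', vendor: 'Acme Supplies', amount: 4520.75, currency: 'USD' },
  'exception',
  'Amount mismatch',
  new Date('2024-01-15T10:30:00Z'),
);
console.log(row.timestamp); // 2024-01-15T10:30:00.000Z
```

Passing the clock in as a parameter keeps the function deterministic, which makes the logging step easy to test in isolation.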

Measurable Business Value

Once integrated into daily operations, I estimate this workflow can save 3–5 hours per week per finance team member, depending on invoice volume. It reduces human touchpoints, flags mismatches instantly, and ensures that only compliant invoices get through.

Just as importantly, it gives operations leaders peace of mind: every invoice is processed the same way, every time, with full visibility and accountability.

This project reinforced something I believe strongly: real-world automation isn’t about flashy AI wrappers. It’s about invisible reliability, quietly moving data across systems with precision, handling the edge cases, and giving humans time back where it matters.

I didn’t just automate P2P. I redesigned how it could work better, and made that design real, one node at a time.
