From PDF Invoices to Formatted Spreadsheets in Two API Calls

#api #programming

The Invoice Spreadsheet Problem

Every accounting team has the same workflow. Invoices arrive as PDFs — from vendors, contractors, service providers. Someone needs to get that data into a spreadsheet for reconciliation, approval, or import into accounting software.

The manual version: open the PDF, copy the line items, paste into an Excel spreadsheet, fix the formatting, repeat for the next invoice. The automated version usually involves an OCR service, a pile of regex to parse the output, and a separate process to generate the spreadsheet. Two different systems, two different failure modes, and a lot of glue code in between.

The core problem is the gap between extraction and generation. You get structured data out of a document, but then you need to transform it into a completely different format using a completely different tool. That's where the bugs live.

Extract Then Generate

The Document Extraction API and the Iteration Layer Sheet Generation API are designed to compose. The extraction output — structured JSON with typed fields — maps directly to sheet rows. No intermediate transformation layer, no format juggling.

Here's the full pipeline: PDF in, formatted XLSX out.

Step 1: Define the Extraction Schema

First, tell the Document Extraction API what fields to look for in your invoice. The schema describes the structure you expect.

{
  "fields": [
    {
      "name": "vendor_name",
      "type": "TEXT",
      "description": "Vendor legal name"
    },
    {
      "name": "invoice_number",
      "type": "TEXT",
      "description": "Invoice number"
    },
    {
      "name": "invoice_date",
      "type": "DATE",
      "description": "Invoice issue date"
    },
    {
      "name": "due_date",
      "type": "DATE",
      "description": "Payment due date"
    },
    {
      "name": "line_items",
      "type": "ARRAY",
      "description": "Invoice line items",
      "fields": [
        {
          "name": "description",
          "type": "TEXT",
          "description": "Line item description"
        },
        {
          "name": "quantity",
          "type": "DECIMAL",
          "description": "Line item quantity"
        },
        {
          "name": "unit_price",
          "type": "CURRENCY_AMOUNT",
          "description": "Unit price"
        },
        {
          "name": "total",
          "type": "CURRENCY_AMOUNT",
          "description": "Line item total"
        }
      ]
    },
    {
      "name": "subtotal",
      "type": "CURRENCY_AMOUNT",
      "description": "Invoice subtotal before tax"
    },
    {
      "name": "tax",
      "type": "CURRENCY_AMOUNT",
      "description": "Tax amount"
    },
    {
      "name": "total_due",
      "type": "CURRENCY_AMOUNT",
      "description": "Total amount due"
    }
  ]
}

Step 2: Extract the Invoice Data

Send the PDF to the extraction endpoint with your schema. The API returns structured JSON with typed values and confidence scores.

import { IterationLayer } from "iterationlayer";
const client = new IterationLayer({
  apiKey: "YOUR_API_KEY",
});

const extractionResult = await client.extractDocument({
  files: [
    {
      type: "url",
      name: "invoice.pdf",
      url: "https://example.com/invoice.pdf",
    },
  ],
  schema: {
    fields: [
      {
        name: "vendor_name",
        type: "TEXT",
        description: "Vendor legal name",
      },
      {
        name: "invoice_number",
        type: "TEXT",
        description: "Invoice number",
      },
      {
        name: "invoice_date",
        type: "DATE",
        description: "Invoice issue date",
      },
      {
        name: "due_date",
        type: "DATE",
        description: "Payment due date",
      },
      {
        name: "line_items",
        type: "ARRAY",
        description: "Invoice line items",
        fields: [
          {
            name: "description",
            type: "TEXT",
            description: "Line item description",
          },
          {
            name: "quantity",
            type: "DECIMAL",
            description: "Line item quantity",
          },
          {
            name: "unit_price",
            type: "CURRENCY_AMOUNT",
            description: "Unit price",
          },
          {
            name: "total",
            type: "CURRENCY_AMOUNT",
            description: "Line item total",
          },
        ],
      },
      {
        name: "subtotal",
        type: "CURRENCY_AMOUNT",
        description: "Invoice subtotal before tax",
      },
      {
        name: "tax",
        type: "CURRENCY_AMOUNT",
        description: "Tax amount",
      },
      {
        name: "total_due",
        type: "CURRENCY_AMOUNT",
        description: "Total amount due",
      },
    ],
  },
});

const fields = extractionResult;

{
  "success": true,
  "data": {
    "fields": {
      "vendor_name": {
        "value": "Muller Services GmbH",
        "confidence": 0.97
      },
      "invoice_number": {
        "value": "INV-2026-0042",
        "confidence": 0.99
      },
      "invoice_date": {
        "value": "2026-03-15",
        "confidence": 0.95
      },
      "due_date": {
        "value": "2026-04-14",
        "confidence": 0.94
      },
      "line_items": {
        "value": [
          {
            "description": {
              "value": "Consulting services",
              "confidence": 0.96
            },
            "quantity": {
              "value": 40,
              "confidence": 0.98
            },
            "unit_price": {
              "value": 150.00,
              "confidence": 0.97
            },
            "total": {
              "value": 6000.00,
              "confidence": 0.97
            }
          }
        ],
        "confidence": 0.96
      },
      "subtotal": {
        "value": 6000.00,
        "confidence": 0.97
      },
      "tax": {
        "value": 1140.00,
        "confidence": 0.95
      },
      "total_due": {
        "value": 61140.00,
        "confidence": 0.98
      }
    }
  }
}

The extraction result gives you typed fields: fields.vendor_name.value is a string, fields.line_items.value is an array, fields.total_due.value is a number. Each field also carries a confidence score so you can flag low-confidence extractions for review.

Step 3: Build the Spreadsheet

Map the extracted data directly into the sheet generation payload. The line items become rows. The invoice metadata goes into a header section.

import { IterationLayer } from "iterationlayer";
const client = new IterationLayer({
  apiKey: "YOUR_API_KEY",
});

const lineItemRows = fields.line_items.value.map((item) => [
  {
    value: item.description.value,
  },
  {
    value: item.quantity.value,
    format: "number",
  },
  {
    value: item.unit_price.value,
    format: "currency",
    currency_code: "EUR",
  },
  {
    value: item.total.value,
    format: "currency",
    currency_code: "EUR",
  },
]);

const summaryRows = [
  [
    {
      value: "Subtotal",
      styles: {
        is_bold: true,
      },
    },
    {
      value: null,
    },
    {
      value: null,
    },
    {
      value: fields.subtotal.value,
      format: "currency",
      currency_code: "EUR",
    },
  ],
  [
    {
      value: "Tax",
      styles: {
        is_bold: true,
      },
    },
    {
      value: null,
    },
    {
      value: null,
    },
    {
      value: fields.tax.value,
      format: "currency",
      currency_code: "EUR",
    },
  ],
  [
    {
      value: "Total Due",
      styles: {
        is_bold: true,
        font_size_in_pt: 12,
      },
    },
    {
      value: null,
    },
    {
      value: null,
    },
    {
      value: fields.total_due.value,
      format: "currency",
      currency_code: "EUR",
      styles: {
        is_bold: true,
        font_size_in_pt: 12,
      },
    },
  ],
];

const sheetResult = await client.generateSheet({
  format: "xlsx",
  styles: {
    header: {
      is_bold: true,
      background_color: "#2d3748",
      font_color: "#ffffff",
    },
    body: {
      font_size_in_pt: 10,
    },
  },
  sheets: [
    {
      name: `Invoice ${fields.invoice_number.value}`,
      columns: [
        {
          name: "Description",
          width: 35,
        },
        {
          name: "Quantity",
          width: 12,
        },
        {
          name: "Unit Price",
          width: 16,
        },
        {
          name: "Total",
          width: 16,
        },
      ],
      rows: [...lineItemRows, ...summaryRows],
    },
  ],
});

{
  "success": true,
  "data": {
    "buffer": "UEsDBBQAAAAIAA...",
    "mime_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  }
}

Two API calls. The PDF becomes structured data, and the structured data becomes a formatted spreadsheet. No local PDF parser, no XLSX library, no intermediate file formats.

Scaling to Multiple Invoices

The single-invoice pipeline extends naturally to batch processing. Extract all invoices, then consolidate them into one spreadsheet with multiple sheets — or one sheet with all line items.

import { IterationLayer } from "iterationlayer";
const client = new IterationLayer({
  apiKey: "YOUR_API_KEY",
});

const invoiceFiles = ["invoice-001.pdf", "invoice-002.pdf", "invoice-003.pdf"];

const sheets = await Promise.all(
  invoiceFiles.map(async (filename) => {
    const pdf = await Bun.file(filename).arrayBuffer();
    const pdfBase64 = Buffer.from(pdf).toString("base64");

    const extraction = await client.extractDocument({
      file: pdfBase64,
      schema: extractionSchema,
    });

    const invoiceFields = extraction.data.fields;

    return {
      name: `Invoice ${invoiceFields.invoice_number.value}`,
      columns: [
        {
          name: "Description",
          width: 35,
        },
        {
          name: "Quantity",
          width: 12,
        },
        {
          name: "Unit Price",
          width: 16,
        },
        {
          name: "Total",
          width: 16,
        },
      ],
      rows: invoiceFields.line_items.value.map((item) => [
        {
          value: item.description.value,
        },
        {
          value: item.quantity.value,
          format: "number",
        },
        {
          value: item.unit_price.value,
          format: "currency",
          currency_code: "EUR",
        },
        {
          value: item.total.value,
          format: "currency",
          currency_code: "EUR",
        },
      ]),
    };
  }),
);

const consolidatedResult = await client.generateSheet({
  format: "xlsx",
  styles: {
    header: {
      is_bold: true,
      background_color: "#2d3748",
      font_color: "#ffffff",
    },
  },
  sheets,
});

{
  "success": true,
  "data": {
    "buffer": "UEsDBBQAAAAIAA...",
    "mime_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  }
}

Each invoice gets its own sheet tab. One XLSX file, all invoices, properly formatted.

When to Use This Pipeline

This extract-then-generate pattern works for any document-to-spreadsheet workflow:

Invoices — Vendor invoices to accounts payable spreadsheets
Receipts — Expense receipts to expense reports
Purchase orders — PO documents to procurement tracking sheets
Contracts — Contract terms to comparison matrices
Bank statements — PDF statements to transaction logs

The key insight is that document structure maps cleanly to spreadsheet structure. A table in a PDF maps to rows and columns. Currency values map to formatted currency cells. Dates map to date-formatted cells. The extraction gives you typed data, and the sheet generation consumes typed data. No parsing, no regex, no format conversion in between.

Get Started

The Document Extraction docs cover schema design and field types. The Sheet Generation docs cover the full spreadsheet API including styles, formulas, and multi-sheet support.

Both APIs use the same authentication and the same response format. If you're already using one, adding the other takes minutes. Start the 7-day trial to try the pipeline yourself.