DEV Community

Iteration Layer
Iteration Layer

Posted on • Originally published at iterationlayer.com

From PDF Invoices to Formatted Spreadsheets in Two API Calls

The Invoice Spreadsheet Problem

Every accounting team has the same workflow. Invoices arrive as PDFs — from vendors, contractors, service providers. Someone needs to get that data into a spreadsheet for reconciliation, approval, or import into accounting software.

The manual version: open the PDF, copy the line items, paste into Excel, fix the formatting, repeat for the next invoice. The automated version usually involves an OCR service, a pile of regex to parse the output, and a separate process to generate the spreadsheet. Two different systems, two different failure modes, and a lot of glue code in between.

The core problem is the gap between extraction and generation. You get structured data out of a document, but then you need to transform it into a completely different format using a completely different tool. That's where the bugs live.

Extract Then Generate

The Document Extraction API and the Sheet Generation API are designed to compose. The extraction output — structured JSON with typed fields — maps directly to sheet rows. No intermediate transformation layer, no format juggling.

Here's the full pipeline: PDF in, formatted XLSX out.

Step 1: Define the Extraction Schema

First, tell the Document Extraction API what fields to look for in your invoice. The schema describes the structure you expect.

{
  "vendor_name": {
    "type": "text"
  },
  "invoice_number": {
    "type": "text"
  },
  "invoice_date": {
    "type": "date"
  },
  "due_date": {
    "type": "date"
  },
  "line_items": {
    "type": "array",
    "items": {
      "description": {
        "type": "text"
      },
      "quantity": {
        "type": "number"
      },
      "unit_price": {
        "type": "currency"
      },
      "total": {
        "type": "currency"
      }
    }
  },
  "subtotal": {
    "type": "currency"
  },
  "tax": {
    "type": "currency"
  },
  "total_due": {
    "type": "currency"
  }
}
Enter fullscreen mode Exit fullscreen mode

Step 2: Extract the Invoice Data

Send the PDF to the extraction endpoint with your schema. The API returns structured JSON with typed values and confidence scores.

import { IterationLayer } from "iterationlayer";
const client = new IterationLayer({
  apiKey: "YOUR_API_KEY",
});

const extractionResult = await client.extract({
  file: invoiceBase64,
  schema: {
    vendor_name: {
      type: "text",
    },
    invoice_number: {
      type: "text",
    },
    invoice_date: {
      type: "date",
    },
    due_date: {
      type: "date",
    },
    line_items: {
      type: "array",
      items: {
        description: {
          type: "text",
        },
        quantity: {
          type: "number",
        },
        unit_price: {
          type: "currency",
        },
        total: {
          type: "currency",
        },
      },
    },
    subtotal: {
      type: "currency",
    },
    tax: {
      type: "currency",
    },
    total_due: {
      type: "currency",
    },
  },
});

const fields = extractionResult.data.fields;
Enter fullscreen mode Exit fullscreen mode
{
  "success": true,
  "data": {
    "fields": {
      "vendor_name": {
        "value": "Acme Services LLC",
        "confidence": 0.97
      },
      "invoice_number": {
        "value": "INV-2026-0042",
        "confidence": 0.99
      },
      "invoice_date": {
        "value": "2026-03-15",
        "confidence": 0.95
      },
      "due_date": {
        "value": "2026-04-14",
        "confidence": 0.94
      },
      "line_items": {
        "value": [
          {
            "description": {
              "value": "Consulting services",
              "confidence": 0.96
            },
            "quantity": {
              "value": 40,
              "confidence": 0.98
            },
            "unit_price": {
              "value": 150.00,
              "confidence": 0.97
            },
            "total": {
              "value": 6000.00,
              "confidence": 0.97
            }
          }
        ],
        "confidence": 0.96
      },
      "subtotal": {
        "value": 6000.00,
        "confidence": 0.97
      },
      "tax": {
        "value": 495.00,
        "confidence": 0.95
      },
      "total_due": {
        "value": 6495.00,
        "confidence": 0.98
      }
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

The extraction result gives you typed fields: fields.vendor_name.value is a string, fields.line_items.value is an array, fields.total_due.value is a number. Each field also carries a confidence score so you can flag low-confidence extractions for review.

Step 3: Build the Spreadsheet

Map the extracted data directly into the sheet generation payload. The line items become rows. The invoice metadata goes into a header section.

import { IterationLayer } from "iterationlayer";
const client = new IterationLayer({
  apiKey: "YOUR_API_KEY",
});

const lineItemRows = fields.line_items.value.map((item) => [
  {
    value: item.description.value,
  },
  {
    value: item.quantity.value,
    format: "number",
  },
  {
    value: item.unit_price.value,
    format: "currency",
    currency_code: "USD",
  },
  {
    value: item.total.value,
    format: "currency",
    currency_code: "USD",
  },
]);

const summaryRows = [
  [
    {
      value: "Subtotal",
      styles: {
        is_bold: true,
      },
    },
    {
      value: null,
    },
    {
      value: null,
    },
    {
      value: fields.subtotal.value,
      format: "currency",
      currency_code: "USD",
    },
  ],
  [
    {
      value: "Tax",
      styles: {
        is_bold: true,
      },
    },
    {
      value: null,
    },
    {
      value: null,
    },
    {
      value: fields.tax.value,
      format: "currency",
      currency_code: "USD",
    },
  ],
  [
    {
      value: "Total Due",
      styles: {
        is_bold: true,
        font_size_in_pt: 12,
      },
    },
    {
      value: null,
    },
    {
      value: null,
    },
    {
      value: fields.total_due.value,
      format: "currency",
      currency_code: "USD",
      styles: {
        is_bold: true,
        font_size_in_pt: 12,
      },
    },
  ],
];

const sheetResult = await client.generateSheet({
  format: "xlsx",
  styles: {
    header: {
      is_bold: true,
      background_color: "#2d3748",
      font_color: "#ffffff",
    },
    body: {
      font_size_in_pt: 10,
    },
  },
  sheets: [
    {
      name: `Invoice ${fields.invoice_number.value}`,
      columns: [
        {
          name: "Description",
          width: 35,
        },
        {
          name: "Quantity",
          width: 12,
        },
        {
          name: "Unit Price",
          width: 16,
        },
        {
          name: "Total",
          width: 16,
        },
      ],
      rows: [...lineItemRows, ...summaryRows],
    },
  ],
});
Enter fullscreen mode Exit fullscreen mode
{
  "success": true,
  "data": {
    "buffer": "UEsDBBQAAAAIAA...",
    "mime_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  }
}
Enter fullscreen mode Exit fullscreen mode

Two API calls. The PDF becomes structured data, and the structured data becomes a formatted spreadsheet. No local PDF parser, no XLSX library, no intermediate file formats.

Scaling to Multiple Invoices

The single-invoice pipeline extends naturally to batch processing. Extract all invoices, then consolidate them into one spreadsheet with multiple sheets — or one sheet with all line items.

import { IterationLayer } from "iterationlayer";
const client = new IterationLayer({
  apiKey: "YOUR_API_KEY",
});

const invoiceFiles = ["invoice-001.pdf", "invoice-002.pdf", "invoice-003.pdf"];

const sheets = await Promise.all(
  invoiceFiles.map(async (filename) => {
    const pdf = await Bun.file(filename).arrayBuffer();
    const pdfBase64 = Buffer.from(pdf).toString("base64");

    const extraction = await client.extract({
      file: pdfBase64,
      schema: extractionSchema,
    });

    const invoiceFields = extraction.data.fields;

    return {
      name: `Invoice ${invoiceFields.invoice_number.value}`,
      columns: [
        {
          name: "Description",
          width: 35,
        },
        {
          name: "Quantity",
          width: 12,
        },
        {
          name: "Unit Price",
          width: 16,
        },
        {
          name: "Total",
          width: 16,
        },
      ],
      rows: invoiceFields.line_items.value.map((item) => [
        {
          value: item.description.value,
        },
        {
          value: item.quantity.value,
          format: "number",
        },
        {
          value: item.unit_price.value,
          format: "currency",
          currency_code: "USD",
        },
        {
          value: item.total.value,
          format: "currency",
          currency_code: "USD",
        },
      ]),
    };
  }),
);

const consolidatedResult = await client.generateSheet({
  format: "xlsx",
  styles: {
    header: {
      is_bold: true,
      background_color: "#2d3748",
      font_color: "#ffffff",
    },
  },
  sheets,
});
Enter fullscreen mode Exit fullscreen mode
{
  "success": true,
  "data": {
    "buffer": "UEsDBBQAAAAIAA...",
    "mime_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  }
}
Enter fullscreen mode Exit fullscreen mode

Each invoice gets its own sheet tab. One XLSX file, all invoices, properly formatted.

When to Use This Pipeline

This extract-then-generate pattern works for any document-to-spreadsheet workflow:

  • Invoices — Vendor invoices to accounts payable spreadsheets
  • Receipts — Expense receipts to expense reports
  • Purchase orders — PO documents to procurement tracking sheets
  • Contracts — Contract terms to comparison matrices
  • Bank statements — PDF statements to transaction logs

The key insight is that document structure maps cleanly to spreadsheet structure. A table in a PDF maps to rows and columns. Currency values map to formatted currency cells. Dates map to date-formatted cells. The extraction gives you typed data, and the sheet generation consumes typed data. No parsing, no regex, no format conversion in between.

Get Started

The Document Extraction docs cover schema design and field types. The Sheet Generation docs cover the full spreadsheet API including styles, formulas, and multi-sheet support.

Both APIs use the same authentication and the same response format. If you're already using one, adding the other takes minutes. Sign up for a free account to try the pipeline yourself.

Top comments (0)