DEV Community

Ranjan Dailata

Scanned Invoice Processing with the Large Language Model

Introduction

This blog post walks you through parsing a scanned invoice with the state-of-the-art "Gemini Pro Vision" large language model (LLM). Given a short set of prompt instructions, the model can read an invoice image and return its contents as structured information.

Hands on

Now for the exciting part: getting your hands dirty with invoice OCR. Follow the steps below.

  • In the prompt editor, enter the following set of prompt instructions for parsing the invoice image.
Prompt 1: Identify metadata like invoice number, date, currency 
Prompt 2: Extract supplier details like name, address, contact info
Prompt 3: Identify customer name and billing address
Prompt 4: Classify invoice type as products, services, rentals
Prompt 5: Parse out line items table from document
Prompt 6: Split line items into individual entries
Prompt 7: Extract item description from each line entry  
Prompt 8: Identify units or quantity billed per line item 
Prompt 9: Define rate/price per unit per line entry
Prompt 10: Calculate subtotal for each line item based on rate*quantity
Prompt 11: Sum all line item subtotals for grand total amount
Prompt 12: Extract total taxes for summed tax amounts
Prompt 13: Classify extracted information into schema

Convert the response to JSON format
  • Add the statement below so that the response is returned in JSON format.

Convert the response to JSON format
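
If you would rather keep the instructions in code than retype them in the editor, the prompt text above can be assembled programmatically. This is only a minimal sketch: the names `INSTRUCTIONS`, `JSON_DIRECTIVE`, and `build_prompt` are illustrative and not part of any Google tooling.

```python
# Illustrative helper (not from any SDK): rebuilds the prompt text shown above.
INSTRUCTIONS = [
    "Identify metadata like invoice number, date, currency",
    "Extract supplier details like name, address, contact info",
    "Identify customer name and billing address",
    "Classify invoice type as products, services, rentals",
    "Parse out line items table from document",
    "Split line items into individual entries",
    "Extract item description from each line entry",
    "Identify units or quantity billed per line item",
    "Define rate/price per unit per line entry",
    "Calculate subtotal for each line item based on rate*quantity",
    "Sum all line item subtotals for grand total amount",
    "Extract total taxes for summed tax amounts",
    "Classify extracted information into schema",
]

JSON_DIRECTIVE = "Convert the response to JSON format"


def build_prompt(instructions, directive=JSON_DIRECTIVE):
    """Number each instruction as 'Prompt N: ...' and append the JSON directive."""
    numbered = [
        f"Prompt {number}: {text}"
        for number, text in enumerate(instructions, start=1)
    ]
    # A blank line separates the instructions from the output-format directive.
    return "\n".join(numbered) + "\n\n" + directive


PROMPT = build_prompt(INSTRUCTIONS)
print(PROMPT)
```

Keeping the instructions in a list makes it easy to add, remove, or reorder a step without renumbering the prompt by hand.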

  • Paste the invoice image you wish to process into the Free Form Prompt editor, just below the "Convert the response to JSON format" statement.
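
The same step can also be attempted from code instead of the editor. The sketch below is an assumption-heavy illustration, not the method used in this post: it assumes the `google-generativeai` and `pillow` Python packages are installed, that you have an API key, and that the model is reachable under the name `gemini-pro-vision`. The function name `extract_invoice` and the file name in the usage comment are hypothetical.

```python
from pathlib import Path

# Assumed API identifier for "Gemini Pro Vision"; it may differ or be unavailable.
MODEL_NAME = "gemini-pro-vision"


def extract_invoice(image_path, prompt, api_key):
    """Send the prompt text and the invoice image to the model; return its raw text reply."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"Invoice image not found: {path}")

    # Imported lazily so the file check above works without these packages installed.
    import google.generativeai as genai  # pip install google-generativeai
    import PIL.Image  # pip install pillow

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(MODEL_NAME)
    # Text first, image second, mirroring the order used in the prompt editor.
    response = model.generate_content([prompt, PIL.Image.open(path)])
    return response.text


# Hypothetical usage (requires network access and a valid key):
# reply = extract_invoice("invoice.png", "...prompt text...", "YOUR_API_KEY")
# print(reply)
```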

  • Run the prompt to view the structured invoice information as JSON.
{
  "invoice_number": "52148",
  "invoice_date": "2020-01-02",
  "currency": "USD",
  "supplier_name": "Brand Name",
  "supplier_address": "24 Dummy Street Area, Location, Lorem Ipsum, 570x55x",
  "customer_name": "Dwayne Clark",
  "customer_address": "24 Dummy Street Area, Location, Lorem Ipsum, 570x55x",
  "invoice_type": "products",
  "line_items": [
    {
      "item_description": "Lorem Ipsum Dolor",
      "quantity": 1,
      "rate": 50.00,
      "subtotal": 50.00
    },
    {
      "item_description": "Pellentesque id neque ligula",
      "quantity": 3,
      "rate": 20.00,
      "subtotal": 60.00
    },
    {
      "item_description": "Interdum et malesuada fames",
      "quantity": 1,
      "rate": 10.00,
      "subtotal": 20.00
    },
    {
      "item_description": "Vivamus volutpat facibus",
      "quantity": 1,
      "rate": 90.00,
      "subtotal": 90.00
    }
  ],
  "subtotal": 220.00,
  "taxes": 0.00,
  "total": 220.00
}
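
Because the prompt asks the model to calculate subtotals and totals itself, it is worth re-checking the arithmetic before trusting the result. In the output above, the third line item reports a quantity of 1 and a rate of 10.00 but a subtotal of 20.00. The following is a minimal sketch of such a check; `parse_model_json` and `check_invoice` are illustrative names, and the sample is a trimmed copy of the JSON above. It assumes the reply may or may not arrive wrapped in a Markdown code fence.

```python
import json
from decimal import Decimal

FENCE = "`" * 3  # Markdown code-fence marker that sometimes wraps model replies


def parse_model_json(text):
    """Parse the model's reply as JSON, tolerating an optional Markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line and everything from the closing fence on.
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    # Decimal avoids floating-point rounding surprises when comparing money amounts.
    return json.loads(cleaned, parse_float=Decimal, parse_int=Decimal)


def check_invoice(invoice):
    """Return a list of human-readable arithmetic inconsistencies (empty if none)."""
    problems = []
    items = invoice.get("line_items", [])
    for index, item in enumerate(items, start=1):
        expected = item["quantity"] * item["rate"]
        if expected != item["subtotal"]:
            problems.append(
                f"line item {index}: {item['quantity']} x {item['rate']} = "
                f"{expected}, but subtotal is {item['subtotal']}"
            )
    items_sum = sum((item["subtotal"] for item in items), Decimal(0))
    if items_sum != invoice["subtotal"]:
        problems.append(f"line items sum to {items_sum}, but subtotal is {invoice['subtotal']}")
    if invoice["subtotal"] + invoice["taxes"] != invoice["total"]:
        problems.append("subtotal plus taxes does not equal total")
    return problems


# Trimmed copy of the JSON output shown above.
SAMPLE_REPLY = """
{
  "line_items": [
    {"item_description": "Lorem Ipsum Dolor", "quantity": 1, "rate": 50.00, "subtotal": 50.00},
    {"item_description": "Pellentesque id neque ligula", "quantity": 3, "rate": 20.00, "subtotal": 60.00},
    {"item_description": "Interdum et malesuada fames", "quantity": 1, "rate": 10.00, "subtotal": 20.00},
    {"item_description": "Vivamus volutpat facibus", "quantity": 1, "rate": 90.00, "subtotal": 90.00}
  ],
  "subtotal": 220.00,
  "taxes": 0.00,
  "total": 220.00
}
"""

problems = check_invoice(parse_model_json(SAMPLE_REPLY))
for problem in problems:
    print(problem)
```

A check like this cannot tell you which field the model misread (quantity, rate, or subtotal), only that the three disagree, so flagged items still need a look at the original invoice image.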

Conclusion

I hope this post has shown you how scanned invoices can be processed with a large language model and a set of carefully crafted prompt instructions. Please be careful when using LLMs. This blog post is for educational purposes only, so do not process sensitive documents containing personal information. Please also read and understand the Google Terms and Conditions before using MakerSuite.
