DEV Community

Cover image for Parsing Invoices with Mindee's API
Doug Sillars for Mindee

Posted on • Originally published at mindee.com

Parsing Invoices with Mindee's API

Parsing Invoices with Mindee’s API

 

 

Scanning and inputting data from invoices into accounting systems is a slow and tedious process.  Using Mindee’s invoice parsing API, all of the pertinent data can be automatically and accurately extracted from your invoice in seconds, allowing for fast and painless data entry.  

 

Interested in learning more? In this tutorial, we will walk through the steps to use Mindee’s Invoice API. 

 

API Prerequisites

 

  1. You’ll need Mindee account. Sign up for free.  Then confirm your email to login.
  2. An invoice.  Use a recently received invoice, or do a Google Image search for a invoice and download a few to test with.  

 

Setting up the API

 

Log into you Mindee account and access your Invoices API environment by clicking the Invoices card :

 

To activate the API, click the “Try for Free” button to access the free tier. You’ll land on the dashboard page - this page will show the usage of your account with the Invoice API - of course, it is empty now.  On the left navigation, there are links to “Documentation”, “Credentials” and “Live Interface”.  The docs tab has all of the technical details you’ll need to build for the Invoce API endpoint, and the Live Interface is a cool interactive demo. Rather than try out the demo, we want to build with the API,  so click on “Credentials” to create an API token.

 

Add a new token. In this example, I’ve named it “Tutorial”


 

Click “Add New Key” and you’ll be able to see your API token.

Now, we are ready to make an API call.  In this example, we’ll be using cURL.

 

curl -X POST \ https://api.mindee.net/products/invoices/v1/predict  
    -H 'X-Inferuser-Token: {apiToken}'  
    -H 'content-type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW'  
    -F file=@/path/to/your/file.png

 

 

Simply replace ```{apiToken}``` with your new API token and ```/path/to/your/file/png``` with the path to your invoice - wither an image or a pdf. 

 

NOTE: You can also copy this code right from the documentation tab of the API with your API token inserted for you.

Here’s the invoice I used - the roofer who fixed the gasket around our chimney this spring (I’ve obfuscated my home address in the image):

 

 

Pasting the cURL sample into my terminal, I hit enter and about a second later, I received a JSON response with the receipt details.  The full JSON can be accessed here (link to JSON file).  Since the response is quite verbose, we will walk through the various fields section by section.

 

Extracted fields

 

Summary & Documents section

 

The first two sections of the response contain information about the API call made:

    "call": {
        "endpoint": { "name": "invoices", "version": "1.0" },
        "finished_at": "2020-09-11T18:39:58+00:00",
        "id": "3b4ac75a-f800-466b-bfc4-56c8f8e75134",
        "n_documents": 1,
        "n_inputs": 1,
        "processing_time": 1.047,
        "started_at": "2020-09-11T18:39:57+00:00"
    },
    "documents": [
        {
            "id": "5d72b0c2-16dd-481d-9e8b-031c5d1ab33a",
            "name": "Screenshot 2020-09-11 at 19.39.13.png"
        }
    ],

 

The call section tells us that we ran on the v1 of the invoice endpoint, uploading one document that is one page long.  After a second, the file was processed, and the response transmitted back to me.  The documents section gives the Mindee id for the file, and the filename (and you can tell I used a screenshot of the original invoice).

 

Predictions: 

 

This is the fun part, where the API extraxcts pertinent details form teh invoice automatcilly, saving tedious manual data entry.  Every invoice is different, and will have different fields. This invoice, being sent from the UNited States will not have some common European features like and IBAN or Tax Id.  

Other peices are rpesent in this invoice, but not picked up by the API.  We are planning multiple additional releases to this API in 2020 that will continue to improve the accuracy and precision of the results.  

 

Company Number

In this case, there is no company number in the invoice:

"company_number": {
                "probability": 0,
                "segmentation": { "bounding_box": [] },
                "type": "N/A",
                "value": "N/A"
            },

 

Due Date

My invoice does have a due date, but it was not extracted. 

            "due_date": {
                "iso": "N/A",
                "probability": 0,
                "raw": "N/A",
                "segmentation": { "bounding_box": [] }
            },

 

Invoice Date

The work was done in late June of this year, and the API correctly extracts the invoice date, along with four [x,y] points boxing in the value:

            "invoice_date": {
                "iso": "2020-06-29",
                "probability": 0.99,
                "segmentation": {
                    "bounding_box": [
                        [ 0.766, 0.392 ],
                        [ 0.855, 0.392 ],
                        [ 0.855, 0.416 ],
                        [ 0.766, 0.416 ]
                    ]
                }
            },

 

Invoice Number

The invoice is numbered 1277, which is extracted by the API, along with the four [x,y] corrdinates denoting the location in the image.

            "invoice_number": {
                "probability": 0.99,
                "segmentation": {
                    "bounding_box": [
                        [ 0.766, 0.363 ],
                        [ 0.807, 0.363 ],
                        [ 0.807, 0.382 ],
                        [ 0.766, 0.382 ]
                    ]
                },
                "value": "1277"
            },

 

Locale

Based on the information on the invoice, the API is able to predict (with 82% confidence) that this bill originated in the United States, is in USD, and is in English.

"locale": { "currency": "USD", "language": "en", "probability": 0.82 },

 

Orientation

The invoice did not require rotation to be parsed.

"orientation": { "degrees": 0, "probability": 0.99 },

 

Payment Details

The invoice does not contain any payment details, so were not extracted.

"payment_details": {
                "iban": "N/A",
                "probability": 0,
                "segmentation": { "bounding_box": [] }
            },

 

Supplier

The API is only 50% confident in this result, but it does extract the name of the company (despite without any spaces in the name).  

            "supplier": {
                "probability": 0.5,
                "segmentation": {
                    "bounding_box": [
                        [ 0.446, 0.066 ],
                        [ 0.561, 0.066 ],
                        [ 0.561, 0.083 ],
                        [ 0.446, 0.083 ]
                    ]
                },
                "value": "SILVERHAMMER"
            },

 

Tax Id

Companies in the US generally do not list their Tax IDs on invoices, so this was not found.

            "tax_id": {
                "probability": 0,
                "segmentation": { "bounding_box": [] },
                "value": "N/A"
            },

 

Taxes

The API did not extract the texes, or the percentage tax rate (we have an update later this year t better extract US tax details.)

"taxes": [],

 

Total without Taxes

The API did not extract the taxes, but did get the cost before taxes with 99% confidence.  

    "total_excl": {
                "amount": 350,
                "probability": 0.99,
                "segmentation": {
                    "bounding_box": [
                        [ 0.874, 0.803 ],
                        [ 0.931, 0.803 ],
                        [ 0.931, 0.825 ],
                        [ 0.874, 0.825 ]
                    ]
                }
            },

 

Total with Taxes

The total with taxes was alos extracted accurately with 99% confidence, meaning that the tax value could easily be calculated with a simple difference.

"total_incl": {
                "amount": 380.45,
                "probability": 0.99,
                "segmentation": {
                    "bounding_box": [
                        [ 0.842, 0.897 ],
                        [ 0.933, 0.897 ],
                        [ 0.933, 0.93 ],
                        [ 0.842, 0.93 ]
                    ]
                }
            }

 

Conclusion

Manual entry of invoices is a time consuming task. Adding the Mindee API to your system will simplify the extraction and make your accounts paybale processes run more accurately and smoothly.  Do you have more questions? Click the chat button at the bottom right.

 

 

 

 

Top comments (0)