Eden AI

Posted on • Originally published at edenai.co

Empower your models with ready-to-use AI APIs on Dataiku

In this tutorial, we'll show you how to integrate Eden AI's Invoice parser API into your data processing workflow using Dataiku to help streamline your financial operations and free up time for more important tasks.

The same process applies if you want to include other features such as image tagging, explicit content detection, text analysis, and the many other AI APIs we offer.

Build AI on Dataiku with Eden AI

Eden AI is used by AI experts to quickly test, choose and integrate ready-to-use AI APIs. Managing multiple accounts for each app can be a tough job, but with Eden AI, you can connect and manage all your APIs on a single account.

Since some AI providers can be complex to implement, we wanted to simplify the integration to make AI APIs accessible as fast as possible.

Eden AI allows you to solve multiple AI tasks on Dataiku.

Another advantage of using Eden AI on Dataiku is the flexibility it gives you to select the best AI features and providers for a particular task, or even to combine multiple providers into a solution better suited to your use case.

Let's practice with Invoice parsing!

Just like receipt and resume parsing, invoice parsing combines OCR to extract and digitize meaningful data, computer vision to identify the structure of the document, and NLP techniques to pin down the fields. An invoice parser extracts key information from an invoice (.pdf, .png or .jpg format), such as the invoice ID, total amount due, invoice date, and customer name.


Invoice processing relies on software to automate the handling and management of invoices. It includes tasks such as capturing invoice data, validating it against purchase orders, and routing it for approval, payment and archiving. The goal of AI in invoice processing is to improve efficiency, accuracy, and speed in handling invoices without any human intervention.

‍Try Eden AI for FREE

How to execute invoice parsing in Dataiku?

If you're looking for an easier and faster way to run the invoice parsing API in Dataiku, skip the tutorial and watch the video below:

https://www.loom.com/share/1cae52725a80469399b9fb0bd7f7a089

The steps to extract information from invoices using Eden AI invoice parser in Dataiku are as follows:

  1. Get your API key and install Dataiku DSS.
  2. Create or open a Dataiku project.
  3. Create a folder dataset and upload your invoices.
  4. Create a new recipe and choose the type of recipe you want to create.
  5. Code the connection to the Eden AI invoice parser API and extract basic information from the invoice.
  6. Import the invoices from the folder dataset.
  7. Call the function defined in your code and write the dataframe response into the output dataset.

1. Get started with Dataiku

To use the Eden AI API with Dataiku, you’ll need the following:

  • Dataiku DSS installed and configured.
  • An Eden AI API key, which you can get for free:


Get your API key for FREE

2. Create a project in Dataiku

To begin with, you’ll need to create a new Dataiku project or open an existing one:

https://uploads-ssl.webflow.com/61e7d259b7746e3f63f0b6be/6400627a8a1590c0e705452e_dataiku%20create%20project.png

Once your project is open, click on "New Dataset" located on the right-hand side panel, then select the "Folder" option to create a folder dataset:

https://uploads-ssl.webflow.com/61e7d259b7746e3f63f0b6be/6400629d3bd77765b28fc380_create%20folder.png

https://uploads-ssl.webflow.com/61e7d259b7746e3f63f0b6be/640062924ae3955180c56bd1_files%20-%3E%20files%20in%20folder.png

3. Create your first code recipe in Dataiku

Next, you’ll need to upload your invoices to the folder as follows:

https://uploads-ssl.webflow.com/61e7d259b7746e3f63f0b6be/640062c894856a37d180c34b_your%20invoice.png

Once your invoices are imported into the folder, you’ll need to create a new recipe by clicking on the action button. Then, select the new code recipe:

https://uploads-ssl.webflow.com/61e7d259b7746e3f63f0b6be/640062db81f95a2aa5441b4f_choose%20recipe.png

You can choose the type of recipe you want to create, such as Python or Shell; this tutorial uses a Python recipe. You will also need to create an output dataset for the recipe and give it a name (the code in this tutorial uses invoice_output):

https://uploads-ssl.webflow.com/61e7d259b7746e3f63f0b6be/640063f27efc0c20dbb95802_output%20dataset.png

https://uploads-ssl.webflow.com/61e7d259b7746e3f63f0b6be/640062ea81f95a2bda441d32_invoice%20recipe.png

4. Start coding the connection to Eden AI

After creating the recipe, you can start coding the connection to Eden AI invoice parser. You’ll need to define the invoice parser endpoint that you want to connect to and call the API with your key:

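import requests
import pandas as pd  # used in step 5


def edenai_invoice(invoice, providers):
   # Eden AI invoice parser endpoint
   url = "https://api.edenai.run/v2/ocr/invoice_parser"
   # One list per extracted field, filled in step 5
   totals = []
   sub_totals = []
   customer_names = []
   customer_addresses = []
   headers = {
       "authorization": "Bearer Your API KEY"  # replace "Your API KEY" with your own key
   }
   # Providers are sent as a single comma-separated string
   data = {"providers": ','.join(providers), "language": "en"}
   # The invoice bytes are uploaded as a multipart file
   files = {"file": ("image.png", invoice, "application/octet-stream")}

   try:
       response = requests.post(url, data=data, files=files, headers=headers).json()
   except ValueError as e:
       # Raised when the response body is not valid JSON
       raise ValueError(str(e))

   # Surface the error message returned by the API, if any
   if 'error' in response:
       raise Exception(response['error']['message'])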
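
Hard-coding the key in the recipe, as the snippet above does with the "Bearer Your API KEY" placeholder, makes it easy to leak the key when the project is shared. As a minimal sketch, the header could instead be built from an environment variable. The variable name EDENAI_API_KEY and the helper build_headers are assumptions made for this example; they are not part of Eden AI or Dataiku.

```python
import os


def build_headers(env_var="EDENAI_API_KEY"):
    """Return the authorization header, reading the key from the environment.

    EDENAI_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty token
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"authorization": f"Bearer {api_key}"}
```

Inside edenai_invoice, `headers = build_headers()` would then replace the hard-coded dictionary.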

5. Put the response in a Pandas dataframe

Once you have retrieved the data from the API, you’ll need to put the response in a Pandas dataframe. In this example, we chose to extract some basic information from the invoice, such as total, subtotal, customer name, and customer address:

   # Still inside edenai_invoice: collect the fields returned by each provider
   for pro in providers:
       # First extracted invoice for this provider ({} if the provider returned nothing)
       extracted = (response.get(pro, {}).get('extracted_data') or [{}])[0]
       customer = extracted.get('customer_information') or {}
       totals.append(extracted.get('invoice_total'))
       sub_totals.append(extracted.get('invoice_subtotal'))
       customer_names.append(customer.get('customer_name'))
       customer_addresses.append(customer.get('customer_address'))

   # One row per provider, with the provider name as the first column
   df = pd.DataFrame(list(zip(totals, sub_totals, customer_names, customer_addresses)),columns =['total', 'sub_totals','customer_name','customer_address'])
   df.insert(loc=0, column='providers', value=providers)

   return df
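
To see what the extraction does without calling the API, the sketch below runs the same lookups on a hand-written dictionary. The sample is shaped after the keys used above (extracted_data, invoice_total, invoice_subtotal, customer_information); its values are made up, and the helper names first_invoice and to_rows are hypothetical. It returns plain rows rather than a dataframe so that it runs without pandas.

```python
def first_invoice(provider_result):
    """Return the first extracted invoice dict, or {} if nothing was extracted."""
    extracted = (provider_result or {}).get("extracted_data") or [{}]
    return extracted[0] or {}


def to_rows(response, providers):
    """Build one row per provider; missing fields become None."""
    rows = []
    for provider in providers:
        invoice = first_invoice(response.get(provider))
        customer = invoice.get("customer_information") or {}
        rows.append({
            "providers": provider,
            "total": invoice.get("invoice_total"),
            "sub_totals": invoice.get("invoice_subtotal"),
            "customer_name": customer.get("customer_name"),
            "customer_address": customer.get("customer_address"),
        })
    return rows


# Hand-written sample for illustration only (not real API output).
sample = {
    "mindee": {"extracted_data": [{
        "invoice_total": 120.0,
        "invoice_subtotal": 100.0,
        "customer_information": {
            "customer_name": "ACME",
            "customer_address": "1 Main St",
        },
    }]},
    "google": {"extracted_data": []},  # provider returned no invoice
}

# "amazon" is absent from the sample, so its row contains only None values
rows = to_rows(sample, ["mindee", "google", "amazon"])
```

Passing `rows` to `pd.DataFrame(rows)` should give the same columns as the dataframe built in this step.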

6. Import your invoices

Once you have coded your Eden AI invoice call and returned the data in a structured format (Pandas dataframe), you’ll need to import the invoices from the folder dataset.


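# `invoices` is the input folder handle created with dataiku.Folder(...) (see the full code sample).
# The loop visits every file in the folder; `data` keeps the content of the last file read.
# In our case the folder holds a single invoice.
for img in [item["fullPath"] for item in invoices.get_path_details()["children"]]:
   with invoices.get_download_stream(path=img) as stream:
       data = stream.read()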
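
The function from step 4 uploads every file under the name image.png with a generic content type, even when the invoice is a PDF. This tutorial does not say whether the API relies on either value, so treat the following as an optional tidy-up: a small sketch that skips files outside the formats mentioned earlier (.pdf, .png, .jpg) and derives the upload tuple from the real path. The helper names is_invoice_file and build_file_field are hypothetical.

```python
import mimetypes
import os

# Formats listed earlier in this tutorial
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg"}


def is_invoice_file(path):
    """True if the file extension is one of the supported invoice formats."""
    return os.path.splitext(path)[1].lower() in SUPPORTED_EXTENSIONS


def build_file_field(path, content):
    """Build the (filename, bytes, content type) tuple for a multipart upload."""
    filename = os.path.basename(path)
    # Fall back to a generic type when the extension is not recognised
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    return (filename, content, content_type)
```

The resulting tuple has the same shape as the value passed in `files = {"file": (...)}` in step 4.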

7. Call the function defined in your code

Finally, you’ll need to call the function defined earlier and apply it to the invoices with the providers that you want.

# Call the process function
output_df = edenai_invoice(data, ['mindee','google','microsoft','amazon','base64'])
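
The call above parses a single invoice: the content left in `data` after the loop in step 6. If the folder holds several invoices, one option is to parse each file in turn and tag every row with the file it came from. The sketch below shows that pattern with stand-in functions, so it runs without Dataiku or network access; parse_all, fake_folder and fake_parse are hypothetical names used only for illustration.

```python
def parse_all(paths, read_bytes, parse_one):
    """Run parse_one on every file and tag each resulting row with its path.

    read_bytes(path) -> bytes and parse_one(data) -> list of row dicts are
    supplied by the caller, so this helper has no Dataiku or network dependency.
    """
    all_rows = []
    for path in paths:
        for row in parse_one(read_bytes(path)):
            all_rows.append({"file": path, **row})
    return all_rows


# Stand-ins for the folder stream and the API call, for illustration only.
fake_folder = {"/a.pdf": b"AAA", "/b.png": b"BB"}


def fake_parse(data):
    # Pretends a single provider returned the byte length as the total
    return [{"providers": "mindee", "total": len(data)}]


rows = parse_all(sorted(fake_folder), fake_folder.get, fake_parse)
```

In the recipe, `read_bytes` would wrap `invoices.get_download_stream`, and `parse_one` could be `lambda data: edenai_invoice(data, providers).to_dict("records")`.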

Last but not least, don’t forget to write the dataframe response into the output dataset!

invoice_output_dataset = dataiku.Dataset("invoice_output") # handle to your output dataset
invoice_output_dataset.write_with_schema(output_df) # write the dataframe returned in step 7

By following these steps, you’ll be able to get the extracted information from the invoices in a structured format, as follows:

https://uploads-ssl.webflow.com/61e7d259b7746e3f63f0b6be/640062f79daa176bfe8cfce1_data_response.png

Congrats 🥳 You're all set and ready to automate your invoice processing with Dataiku!

You can access the full code sample for the recipe here:

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import requests


# Read recipe inputs ("OqrjsSjE" is the folder ID in our project; use your own)
invoices = dataiku.Folder("OqrjsSjE")
invoices_info = invoices.get_info()


# Compute recipe outputs
# ==============================================================================
# AUXILIARY FUNCTIONS
# ==============================================================================
def edenai_invoice(invoice, providers):
   url = "https://api.edenai.run/v2/ocr/invoice_parser"
   totals = []
   sub_totals = []
   customer_names = []
   customer_addresses = []
   headers = {
       "authorization": "Bearer Your API KEY"
   }
   data={"providers": ','.join(providers), "language":"en"}
   files = {"file": ("image.png", invoice, "application/octet-stream")}


   try:
       response = requests.post(url, data=data, files=files, headers=headers).json()
   except ValueError as e:
       raise ValueError(str(e))

   if 'error' in response:
       raise Exception(response['error']['message'])

   for pro in providers:
       # First extracted invoice for this provider ({} if the provider returned nothing)
       extracted = (response.get(pro, {}).get('extracted_data') or [{}])[0]
       customer = extracted.get('customer_information') or {}
       totals.append(extracted.get('invoice_total'))
       sub_totals.append(extracted.get('invoice_subtotal'))
       customer_names.append(customer.get('customer_name'))
       customer_addresses.append(customer.get('customer_address'))


   df = pd.DataFrame(list(zip(totals, sub_totals, customer_names, customer_addresses)),columns =['total', 'sub_totals','customer_name','customer_address'])
   df.insert(loc=0, column='providers', value=providers)

   return df

# The loop visits every file in the folder; `data` keeps the content of the last file read.
# In our case the folder holds a single invoice.
for img in [item["fullPath"] for item in invoices.get_path_details()["children"]]:
   with invoices.get_download_stream(path=img) as stream:
       data = stream.read()

# Call the process function
output_df = edenai_invoice(data, ['mindee','google','microsoft','amazon','base64'])


invoice_output_dataset = dataiku.Dataset("invoice_output") # handle to your output dataset
invoice_output_dataset.write_with_schema(output_df) # write the dataframe returned above

If you're interested in more low-code tools, have a look at our step-by-step tutorials on how to bring AI to your application with Power Apps, Google Apps Script, Retool, Make, IFTTT, n8n, Bubble, and Zapier.

Create your Account on Eden AI
