Extracting structured data from PDFs and other document types is a common challenge. Instead of relying on unreliable OCR solutions or copying and pasting data by hand, you can use the Documind API to automate the extraction.
In this guide, you’ll learn how to:
- Set up authentication to securely access the API
- Create an extraction job for processing
- Poll for job completion and retrieve the extracted data
To make this practical, we’ll extract key details from this invoice, including:
- Invoice number
- Invoice date
- Due date
- Supplier name
- Supplier address
- Supplier email
- Supplier phone
- Supplier VAT number
- Payment method
- Bank name
- IBAN
- SWIFT/BIC code
- Payment reference
- Subtotal
- Tax amount
- Total amount
- Item name
- Item SKU
- Item quantity
- Item unit price
- Item discount
- Item total price
Once the data is extracted, it can be stored in a database for tracking invoices and payments or sent directly to accounting platforms like Xero, QuickBooks, and other financial tools. This guide will take you through the complete process so you can integrate Documind’s API into your own applications.
Step 1: Setting Up Authentication
Before making requests to the Documind API, you'll need to install the necessary dependencies and obtain your API key for authentication.
1.1 Install Required Packages
Ensure you have Node.js installed, then install axios (for making HTTP requests) and dotenv (for securely storing API keys):
npm install axios dotenv
1.2 Get Your API Key
- Sign in to the Documind Dashboard
- Navigate to the Settings page
- Copy your Secret API Key
To keep your key secure, create a .env file in your project directory and store the key there:
DOCUMIND_API_KEY=your-secret-api-key
Documind is still in private beta; sign up here to get access.
1.3 Create an Axios Instance
Now, set up an axios instance with the base URL of the API and authentication headers:
import 'dotenv/config';
import axios from 'axios';

// Load API key from environment variables
const API_KEY = process.env.DOCUMIND_API_KEY;

const documindAPI = axios.create({
  baseURL: 'https://api.documind.xyz',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  }
});
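If the .env file is missing or the variable name is misspelled, API_KEY is undefined and every request goes out with an invalid Authorization header. A minimal sketch of a fail-fast check follows. The requireEnv helper is a hypothetical name introduced here for illustration; it is not part of dotenv or Documind.

```javascript
// Hypothetical helper (not part of dotenv or Documind): read a required
// environment variable and fail fast with a clear message if it is missing.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value.trim();
}

// Usage: could replace the direct process.env lookup before axios.create(...)
// const API_KEY = requireEnv('DOCUMIND_API_KEY');
```

Failing at startup this way tends to be easier to debug than a 401 response from the first API call.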
Step 2: Creating an Extraction Job
To extract data from the invoice, you need to create an extraction job. This involves sending the document's URL to Documind along with a schema that defines the structure of the data you want to extract.
2.1 Defining the Schema
The schema acts as a blueprint, specifying what information should be extracted from the document. It outlines the key fields and their expected data types.
For a detailed breakdown of schema definitions and best practices, check out this guide. You can also find additional schema examples in the documentation.
Here’s how you can define a schema for an invoice:
const schema = [
  {
    "name": "invoiceNumber",
    "type": "string",
    "description": "Unique identifier for the invoice"
  },
  {
    "name": "invoiceDate",
    "type": "string",
    "description": "Date when the invoice was issued"
  },
  {
    "name": "dueDate",
    "type": "string",
    "description": "Payment due date for the invoice"
  },
  {
    "name": "supplier",
    "type": "object",
    "description": "Details of the supplier issuing the invoice",
    "children": [
      {
        "name": "name",
        "type": "string",
        "description": "Supplier's name"
      },
      {
        "name": "email",
        "type": "string",
        "description": "Supplier's email address"
      },
      {
        "name": "vatNumber",
        "type": "string",
        "description": "Supplier's VAT or Tax ID"
      }
    ]
  },
  {
    "name": "payment",
    "type": "object",
    "description": "Payment details for the invoice",
    "children": [
      {
        "name": "paymentMethod",
        "type": "enum",
        "description": "Payment method",
        "values": ["Bank Transfer", "Credit Card", "Cheque"]
      },
      {
        "name": "bankDetails",
        "type": "object",
        "description": "Bank details for wire transfers",
        "children": [
          {
            "name": "bankName",
            "type": "string",
            "description": "Name of the bank"
          },
          {
            "name": "iban",
            "type": "string",
            "description": "International Bank Account Number (IBAN)"
          },
          {
            "name": "swift",
            "type": "string",
            "description": "SWIFT/BIC code for international transfers"
          },
          {
            "name": "reference",
            "type": "string",
            "description": "Reference text for the payment"
          }
        ]
      }
    ]
  },
  {
    "name": "financialSummary",
    "type": "object",
    "description": "Breakdown of financial details",
    "children": [
      {
        "name": "subtotal",
        "type": "number",
        "description": "Total amount before taxes"
      },
      {
        "name": "tax",
        "type": "number",
        "description": "Total tax amount applied"
      },
      {
        "name": "totalAmount",
        "type": "number",
        "description": "Final total amount due after taxes"
      }
    ]
  },
  {
    "name": "items",
    "type": "array",
    "description": "List of purchased items in the invoice",
    "children": [
      {
        "name": "name",
        "type": "string",
        "description": "Name of the item"
      },
      {
        "name": "sku",
        "type": "string",
        "description": "Stock Keeping Unit (SKU) identifier"
      },
      {
        "name": "quantity",
        "type": "number",
        "description": "Number of units purchased"
      },
      {
        "name": "unitPrice",
        "type": "number",
        "description": "Price per unit"
      },
      {
        "name": "discount",
        "type": "number",
        "description": "Discount applied per unit"
      },
      {
        "name": "totalPrice",
        "type": "number",
        "description": "Total price for the item"
      }
    ]
  }
];
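With several levels of nesting, it can be hard to see at a glance which fields the schema actually requests. The sketch below flattens a schema of this shape into dotted paths so it can be compared against the field list at the top of this guide. The listFieldPaths helper and the "[]" suffix for array fields are conventions invented here for illustration, not something the Documind API defines.

```javascript
// Hypothetical helper: flatten a schema of { name, type, children } nodes
// into dotted field paths. Children of array fields get a "[]" marker.
function listFieldPaths(fields, prefix = '') {
  const paths = [];
  for (const field of fields) {
    const path = prefix ? `${prefix}.${field.name}` : field.name;
    if (Array.isArray(field.children) && field.children.length > 0) {
      const childPrefix = field.type === 'array' ? `${path}[]` : path;
      paths.push(...listFieldPaths(field.children, childPrefix));
    } else {
      paths.push(path);
    }
  }
  return paths;
}

// Trimmed-down sample in the same shape as the invoice schema above
const sampleSchema = [
  { name: 'invoiceNumber', type: 'string', description: 'Unique identifier for the invoice' },
  {
    name: 'supplier', type: 'object', description: 'Supplier details',
    children: [{ name: 'name', type: 'string', description: "Supplier's name" }]
  },
  {
    name: 'items', type: 'array', description: 'Line items',
    children: [{ name: 'sku', type: 'string', description: 'SKU identifier' }]
  }
];

console.log(listFieldPaths(sampleSchema));
// [ 'invoiceNumber', 'supplier.name', 'items[].sku' ]
```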
2.2 Send the PDF to Documind
Now, let's submit the extraction job with the document URL and the schema above:
async function createJob(file) {
  try {
    const response = await documindAPI.post('/run-job', {
      file,
      schema
    });
    return response.data.id; // Store Job ID for polling
  } catch (error) {
    // On failure the function logs the error and returns undefined,
    // so the caller must check for a missing Job ID
    console.error('Error creating extraction job:', error.response ? error.response.data : error.message);
  }
}
What Happens Here?
- We send a POST request to /run-job
- The file URL points to the document to be processed
- The schema defines the expected structure of extracted data
- The API returns a Job ID, which we use to check the status and get the results
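Because createJob forwards whatever it is given, a typo in the file URL only surfaces as an API error. A minimal sketch of a local pre-flight check is shown below. The buildJobPayload name is hypothetical, and the restriction to http and https URLs is an assumption made for this example; check the Documind documentation for the inputs it actually accepts.

```javascript
// Hypothetical pre-flight check, not part of the Documind API: build the
// request body for /run-job and reject obviously bad input before sending it.
function buildJobPayload(file, schema) {
  let url;
  try {
    url = new URL(file);
  } catch {
    throw new Error(`Not a valid URL: ${file}`);
  }
  // Assumption for this sketch: only http(s) document URLs are allowed
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Unsupported URL scheme: ${url.protocol}`);
  }
  if (!Array.isArray(schema) || schema.length === 0) {
    throw new Error('Schema must be a non-empty array of field definitions');
  }
  return { file: url.href, schema };
}

// Usage inside createJob:
// const response = await documindAPI.post('/run-job', buildJobPayload(file, schema));
```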
Step 3: Polling for Job Completion
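Next, poll the API until the job is complete. The function below checks the job status up to maxRetries times and waits delay milliseconds between attempts.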
async function pollJob(jobId, maxRetries = 5, delay = 5000) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    let data;
    try {
      ({ data } = await documindAPI.get(`/job/${jobId}`));
    } catch (error) {
      // Request-level errors are logged and retried on the next attempt
      console.error("Error retrieving job status:", error.response?.data || error.message);
    }
    if (data?.status === "COMPLETED") {
      return data.result;
    }
    if (data?.status === "FAILED") {
      // A failed job will not recover, so stop polling immediately
      throw new Error(`Extraction failed for Job ID: ${jobId}`);
    }
    // Still processing (or the request failed): wait before the next attempt
    if (attempt < maxRetries) {
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  // Never return undefined silently when the job did not finish in time
  throw new Error(`Max retries reached. Job ID: ${jobId}`);
}
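With the defaults, pollJob waits a fixed 5 seconds between attempts, so it gives up after roughly 25 seconds. For larger documents, one common alternative is exponential backoff: start with short waits and lengthen them up to a cap. The sketch below only computes the wait schedule. The backoffDelays name and its default values are assumptions chosen for illustration, not Documind recommendations.

```javascript
// Hypothetical helper: compute the wait before each polling attempt using
// exponential backoff with an upper bound (no jitter, so it is deterministic).
function backoffDelays(maxRetries, baseMs = 1000, capMs = 15000) {
  const delays = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    // Double the wait on every attempt, but never exceed capMs
    delays.push(Math.min(capMs, baseMs * 2 ** attempt));
  }
  return delays;
}

console.log(backoffDelays(6));
// [ 1000, 2000, 4000, 8000, 15000, 15000 ]
```

Inside the polling loop, the wait for a given attempt could then come from this schedule in place of the constant delay.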
Step 4: Putting Everything Together
import fs from 'node:fs'; // Needed for writeFileSync; place it with the other imports

async function extractData(file) {
  const jobId = await createJob(file);
  if (!jobId) throw new Error("Failed to create extraction job.");
  const result = await pollJob(jobId);
  // You can save the extracted data to a JSON file to see the results
  fs.writeFileSync("invoice.json", JSON.stringify(result, null, 2));
}

// Usage
const file = "<Add your file URL here>";
extractData(file)
  .then(() => console.log("Extraction process completed."))
  .catch(error => console.error("Error:", error));
The Result
Once completed, you should receive structured JSON data like this:
{
  "items": [
    {
      "sku": "SRV-1001",
      "name": "Cloud Server Hosting",
      "discount": 0,
      "quantity": 1,
      "unitPrice": 3000,
      "totalPrice": 3000
    },
    {
      "sku": "LIC-4587",
      "name": "Software Licensing",
      "discount": 50,
      "quantity": 5,
      "unitPrice": 400,
      "totalPrice": 1750
    },
    {
      "sku": "CNS-2003",
      "name": "Consulting Services",
      "discount": 20,
      "quantity": 10,
      "unitPrice": 100,
      "totalPrice": 980
    }
  ],
  "dueDate": "March 1, 2024",
  "payment": {
    "bankDetails": {
      "iban": "GB29HBUK40127612345678",
      "swift": "HBUKGB4B",
      "bankName": "HSBC Bank",
      "reference": "Invoice #INV-2024-019"
    },
    "paymentMethod": "Bank Transfer"
  },
  "supplier": {
    "name": "Tech Solutions",
    "email": "accounts@techsolutions.com",
    "vatNumber": "GB123456789"
  },
  "invoiceDate": "February 1, 2024",
  "invoiceNumber": "INV-2024-019",
  "financialSummary": {
    "tax": 286.5,
    "subtotal": 5730,
    "totalAmount": 6016.5
  }
}
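Before passing extracted figures to an accounting system, it is worth checking that they are internally consistent. In the sample result, the item totals (3000 + 1750 + 980) add up to the subtotal of 5730, and 5730 + 286.5 equals the total of 6016.5. A minimal sketch of that check follows. The checkTotals name and the 0.01 tolerance are assumptions made for this example.

```javascript
// Hypothetical sanity check on the extracted JSON: do the line items add up
// to the subtotal, and does subtotal + tax equal the total amount?
function checkTotals(invoice, tolerance = 0.01) {
  const itemsTotal = invoice.items.reduce((sum, item) => sum + item.totalPrice, 0);
  const { subtotal, tax, totalAmount } = invoice.financialSummary;
  const problems = [];
  if (Math.abs(itemsTotal - subtotal) > tolerance) {
    problems.push(`Items sum to ${itemsTotal}, but subtotal is ${subtotal}`);
  }
  if (Math.abs(subtotal + tax - totalAmount) > tolerance) {
    problems.push(`subtotal + tax is ${subtotal + tax}, but totalAmount is ${totalAmount}`);
  }
  return problems; // An empty array means the figures are consistent
}

// Values taken from the sample result above
const sample = {
  items: [{ totalPrice: 3000 }, { totalPrice: 1750 }, { totalPrice: 980 }],
  financialSummary: { tax: 286.5, subtotal: 5730, totalAmount: 6016.5 }
};
console.log(checkTotals(sample)); // []
```

A non-empty result is a cue to route the invoice to manual review; it does not say which of the figures is wrong.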
What Next?
- Import the extracted data into databases like PostgreSQL, Firebase, or MongoDB
- Sync payments with accounting software like Xero or QuickBooks
- Generate financial reports
- Trigger payment processing workflows
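For the database option, the nested result usually has to be flattened first. The sketch below turns the items array into one flat row per line item, keyed by invoice number, which suits a relational table. The toItemRows name and the lineNumber field are hypothetical choices for this example; adapt the row shape to your own table definition.

```javascript
// Hypothetical mapping: one flat row per line item, tagged with the invoice
// number so the rows can be joined back to the invoice record.
function toItemRows(invoice) {
  return invoice.items.map((item, index) => ({
    invoiceNumber: invoice.invoiceNumber,
    lineNumber: index + 1,
    sku: item.sku,
    name: item.name,
    quantity: item.quantity,
    unitPrice: item.unitPrice,
    discount: item.discount,
    totalPrice: item.totalPrice
  }));
}

// Usage with the result returned by pollJob:
// const rows = toItemRows(result);
```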
Now that you've seen how to extract structured data from PDFs, you can test it out yourself! Try uploading an invoice in the playground to see how the extraction works in real time. When you're ready to integrate this into your own applications, sign up on Documind to start extracting structured data from PDFs and other documents.