The DocProcessor API provides intelligent document processing capabilities for extracting structured data from PDFs, invoices, contracts, and other document types. This API leverages machine learning to automatically identify and parse key information, eliminating manual data entry and reducing processing time.
What DocProcessor Does
DocProcessor analyzes uploaded documents and returns structured JSON data containing extracted text, metadata, and identified entities. The API handles document types including invoices, contracts, forms, and receipts; the examples in this article upload PDF files and, for receipts, a JPEG image. It automatically detects the document type and applies the matching parsing logic to extract relevant fields such as dates, amounts, addresses, and custom data points.
Key capabilities include:
- Text extraction with positional data
- Entity recognition (dates, amounts, addresses, names)
- Document classification
- Table and form field detection
- Multi-page document processing
Getting Started
First, create an account to obtain your API key:
curl -X POST https://api.aaido.dev/signup \
-H "Content-Type: application/json" \
-d '{
"email": "your-email@example.com",
"password": "your-secure-password"
}'
The signup response includes your API key:
{
"status": "success",
"api_key": "dp_12345abcdef...",
"user_id": "user_67890"
}
Store this API key securely as you'll need it for all subsequent requests.
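One way to keep the key out of source code is to read it from an environment variable at runtime. The sketch below assumes the variable is called DOCPROCESSOR_API_KEY (the name used in the CI examples in this article); the helper name auth_headers is hypothetical and not part of the API.

```python
import os

# Assumed variable name, matching the CI examples in this article.
API_KEY_ENV = "DOCPROCESSOR_API_KEY"


def auth_headers(environ=os.environ):
    """Build the Authorization header from an environment variable."""
    key = environ.get(API_KEY_ENV, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"Set {API_KEY_ENV} before calling the API")
    return {"Authorization": f"Bearer {key}"}
```

Passing the environment as a parameter makes the helper easy to exercise in tests without touching the real process environment.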
Basic Document Processing
Uploading and Processing a Document
The primary endpoint accepts document uploads via multipart form data:
curl -X POST https://api.aaido.dev/v1/products/docprocessor \
-H "Authorization: Bearer dp_12345abcdef..." \
-F "file=@invoice.pdf" \
-F "document_type=invoice" \
-F "extract_tables=true"
Parameters:
- file: The document file (required)
- document_type: Hint for processing type (optional: auto, invoice, contract, receipt, form)
- extract_tables: Boolean flag for table extraction (default: false)
- language: Document language code (default: auto)
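For readers calling the endpoint from Python rather than curl, the following is a minimal sketch of the same multipart upload using only the standard library. The endpoint URL and field names come from the curl example above; the function name build_upload_request is hypothetical, and the request is built but deliberately not sent.

```python
import mimetypes
import urllib.request
import uuid

# Endpoint taken from the curl example above.
ENDPOINT = "https://api.aaido.dev/v1/products/docprocessor"


def build_upload_request(filename, file_bytes, api_key, **fields):
    """Return an unsent urllib Request carrying the file plus form fields."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    # Plain form fields such as document_type or extract_tables.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The file part, sent under the field name "file" as in the curl example.
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n'
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        ENDPOINT,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

A call such as build_upload_request("invoice.pdf", pdf_bytes, key, document_type="invoice", extract_tables="true") mirrors the curl command; passing the result to urllib.request.urlopen would send it. Boolean flags are passed as the strings "true" or "false" to match what curl sends.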
Response Structure
The API returns a comprehensive JSON response:
{
  "document_id": "doc_abc123",
  "status": "completed",
  "document_type": "invoice",
  "pages": 2,
  "processing_time": 1247,
  "extracted_data": {
    "text": "Invoice #INV-2024-001\nDate: 2024-01-15\nAmount: $1,250.00...",
    "entities": {
      "invoice_number": "INV-2024-001",
      "date": "2024-01-15",
      "total_amount": 1250.00,
      "currency": "USD",
      "vendor_name": "Acme Corp",
      "vendor_address": "123 Main St, City, ST 12345"
    },
    "tables": [
      {
        "page": 1,
        "rows": [
          ["Item", "Quantity", "Price", "Total"],
          ["Web Development", "1", "$1000.00", "$1000.00"],
          ["Consulting", "5 hours", "$50.00", "$250.00"]
        ]
      }
    ],
    "confidence_scores": {
      "overall": 0.94,
      "entities": {
        "invoice_number": 0.98,
        "total_amount": 0.96,
        "date": 0.92
      }
    }
  }
}
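Because the response carries per-entity confidence scores, a client can decide which fields to trust automatically and which to send for review. The sketch below assumes scores lie between 0 and 1, as in the sample response, and treats any entity without a reported score as unverified; the 0.95 threshold and the function name split_by_confidence are illustrative choices, not API defaults.

```python
# Trimmed copy of the sample response above, used only for illustration.
SAMPLE = {
    "status": "completed",
    "extracted_data": {
        "entities": {
            "invoice_number": "INV-2024-001",
            "date": "2024-01-15",
            "total_amount": 1250.00,
            "currency": "USD",
        },
        "confidence_scores": {
            "overall": 0.94,
            "entities": {"invoice_number": 0.98, "total_amount": 0.96, "date": 0.92},
        },
    },
}


def split_by_confidence(response, threshold=0.95):
    """Split entities into (accepted, needs_review) by per-field score."""
    data = response["extracted_data"]
    scores = data.get("confidence_scores", {}).get("entities", {})
    accepted, review = {}, {}
    for field, value in data.get("entities", {}).items():
        # A field with no reported score counts as 0.0 and goes to review.
        target = accepted if scores.get(field, 0.0) >= threshold else review
        target[field] = value
    return accepted, review
```

With the sample data and the 0.95 threshold, invoice_number and total_amount are accepted, while date (0.92) and currency (no score) are flagged for review.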
Practical Use Cases
1. Invoice Processing Automation
Automate accounts payable by extracting key invoice data:
curl -X POST https://api.aaido.dev/v1/products/docprocessor \
-H "Authorization: Bearer dp_12345abcdef..." \
-F "file=@vendor_invoice.pdf" \
-F "document_type=invoice" \
-F "extract_tables=true" \
-F "extract_line_items=true"
This extracts vendor information, invoice numbers, dates, amounts, and line items for direct integration into accounting systems. The structured response allows automatic validation against purchase orders and streamlines approval workflows.
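As an illustration of that validation step, the sketch below checks that the extracted line items add up to the invoice total and that the total matches a purchase order amount. It assumes the response has the shape of the sample response shown earlier, that the first table row is a header containing a "Total" column, and that the purchase order amount is supplied by the caller; the function names are hypothetical.

```python
from decimal import Decimal


def parse_money(text):
    """Convert a string such as "$1,250.00" to a Decimal."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def line_items(table):
    """Turn a table's rows into dicts keyed by the header row."""
    header, *rows = table["rows"]
    return [dict(zip(header, row)) for row in rows]


def validate_invoice(response, po_total):
    """Return a list of problems; an empty list means the invoice checks out."""
    data = response["extracted_data"]
    total = Decimal(str(data["entities"]["total_amount"]))
    items = [item for table in data["tables"] for item in line_items(table)]
    line_sum = sum((parse_money(item["Total"]) for item in items), Decimal("0"))
    problems = []
    if line_sum != total:
        problems.append(f"line items sum to {line_sum}, invoice total is {total}")
    if total != Decimal(po_total):
        problems.append(f"invoice total {total} differs from purchase order {po_total}")
    return problems


# Data copied from the sample response earlier in this article.
SAMPLE_INVOICE = {
    "extracted_data": {
        "entities": {"total_amount": 1250.00},
        "tables": [
            {
                "page": 1,
                "rows": [
                    ["Item", "Quantity", "Price", "Total"],
                    ["Web Development", "1", "$1000.00", "$1000.00"],
                    ["Consulting", "5 hours", "$50.00", "$250.00"],
                ],
            }
        ],
    }
}
```

Decimal is used instead of float so that currency sums compare exactly. Calling validate_invoice(SAMPLE_INVOICE, "1250.00") returns an empty list, while a mismatched purchase order amount produces one problem entry.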
2. Contract Analysis and Data Extraction
Extract critical terms and dates from legal contracts:
curl -X POST https://api.aaido.dev/v1/products/docprocessor \
-H "Authorization: Bearer dp_12345abcdef..." \
-F "file=@service_agreement.pdf" \
-F "document_type=contract" \
-F "extract_clauses=true" \
-F "identify_parties=true"
The API identifies contract parties, effective dates, termination clauses, payment terms, and key obligations. This enables automated contract review processes and deadline tracking systems.
3. Receipt and Expense Management
Process expense receipts for automated reporting:
curl -X POST https://api.aaido.dev/v1/products/docprocessor \
-H "Authorization: Bearer dp_12345abcdef..." \
-F "file=@receipt.jpg" \
-F "document_type=receipt" \
-F "categorize_expenses=true"
This extracts merchant names, transaction amounts, dates, and expense categories for direct integration into expense management systems.
CI/CD Pipeline Integration
GitHub Actions Integration
Here's a practical example of integrating DocProcessor into a CI/CD pipeline for automated document processing:
name: Process Documents
on:
  push:
    paths: ['documents/**']
# The final step pushes a commit, so the workflow token needs write access
permissions:
  contents: write
jobs:
  process-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Process new documents
        env:
          DOCPROCESSOR_API_KEY: ${{ secrets.DOCPROCESSOR_API_KEY }}
        run: |
          # Make sure the output directory exists before writing to it
          mkdir -p processed
          for file in documents/*.pdf; do
            if [[ -f "$file" ]]; then
              echo "Processing $file"
              response=$(curl -s -X POST https://api.aaido.dev/v1/products/docprocessor \
                -H "Authorization: Bearer $DOCPROCESSOR_API_KEY" \
                -F "file=@$file" \
                -F "document_type=auto")
              # Validate processing success before saving anything
              status=$(echo "$response" | jq -r '.status')
              if [ "$status" != "completed" ]; then
                echo "Processing failed for $file"
                exit 1
              fi
              # Extract document ID for tracking
              doc_id=$(echo "$response" | jq -r '.document_id')
              # Save extracted data
              echo "$response" | jq '.extracted_data' > "processed/${doc_id}.json"
            fi
          done
      - name: Commit processed data
        run: |
          git config --local user.email "action@github.com"
          git config --local user.name "GitHub Action"
          git add processed/
          git commit -m "Add processed document data" || exit 0
          git push
Environment Variables Setup
Store your API key securely in your CI/CD environment:
# For GitHub Actions
# Add DOCPROCESSOR_API_KEY to repository secrets
# For Jenkins
export DOCPROCESSOR_API_KEY="dp_12345abcdef..."
# For Docker containers
docker run -e DOCPROCESSOR_API_KEY="dp_12345abcdef..." your-app
Error Handling
The API returns standard HTTP status codes with detailed error messages:
{
"error": "invalid_document_type",
"message": "Document type 'unknown' is not supported",
"supported_types": ["auto", "invoice", "contract", "receipt", "form"]
}
Common error codes:
- 400: Invalid request parameters
- 401: Invalid or missing API key
- 413: File size too large (max 10MB)
- 415: Unsupported file format
- 429: Rate limit exceeded
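A client can use these codes to decide whether a failed request is worth retrying. The sketch below treats 429 as retryable with exponential backoff and jitter; retrying 5xx responses is an added assumption, since the list above does not mention server errors. The helper names and the delay values are illustrative only.

```python
import random

# Meanings copied from the list of common error codes above.
ERRORS = {
    400: "Invalid request parameters",
    401: "Invalid or missing API key",
    413: "File size too large (max 10MB)",
    415: "Unsupported file format",
    429: "Rate limit exceeded",
}


def should_retry(status):
    """429 can succeed later; retrying 5xx is an assumption, not documented."""
    return status == 429 or 500 <= status < 600


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Exponential backoff with full jitter; attempt counts from 0."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def describe(status, body):
    """Combine the listed meaning with the API's own error message."""
    summary = ERRORS.get(status, "Unexpected status")
    detail = body.get("message", "")
    return f"{status} {summary}: {detail}" if detail else f"{status} {summary}"
```

The other listed codes (400, 401, 413, 415) indicate a problem with the request itself, so repeating the same request unchanged is unlikely to help.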
Best Practices
- File Size Optimization: Compress PDFs before upload to stay under the 10MB limit and reduce processing time
- Document Type Hints: Specify document_type when it is known, to improve accuracy
- Batch Processing: Process multiple documents in parallel for better throughput
- Confidence Validation: Check confidence scores before using extracted data, and route low-scoring fields to manual review
- Rate Limiting: Throttle requests on the client side and back off when the API returns 429
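The batch processing and rate limiting advice can be combined by running uploads through a bounded worker pool. The following is a minimal sketch: process_one stands for whatever function uploads a single document (it is a placeholder supplied by the caller, not part of the API), and max_workers=4 is an arbitrary value to tune against your actual rate limit.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_batch(paths, process_one, max_workers=4):
    """Run process_one over paths concurrently; return (results, errors) keyed by path."""
    results, errors = {}, {}
    # max_workers caps how many requests are in flight at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_one, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:
                # Record the failure and keep going; one bad file should not stop the batch.
                errors[path] = exc
    return results, errors
```

Threads suit this workload because each task spends most of its time waiting on the network. Collecting errors separately lets the caller retry or report failed files after the rest of the batch has finished.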
The DocProcessor API streamlines document-heavy workflows by automating data extraction and enabling seamless integration into existing systems. For complete API documentation and advanced features, visit https://api.aaido.dev/products/docprocessor.