Every developer has faced this: you have a PDF invoice, a scanned receipt, or a resume — and you need the data in JSON. The traditional approach involves OCR libraries, regex parsing, and lots of edge-case handling.
I built ScoutExtract to solve this with a single API call.
How It Works
Send a POST request with:
- Your document (text, PDF as base64, or image as base64)
- A schema describing the fields you want
Get back typed JSON with confidence scores for every field.
Quick Example — Invoice Parsing
python
import requests
response = requests.post(
"https://api.ramlabs.dev/v1/extract",
headers={"Authorization": "Bearer YOUR_API_KEY"},
json={
"document": """
INVOICE #2024-0892
Vendor: CloudStack Solutions Inc.
Date: March 15, 2024
API Integration 1 $2,500.00
Cloud Hosting 3 $199.00
Subtotal: $3,097.00
Tax (8.875%): $274.86
Total: $3,371.86
""",
"schema": "invoice"
}
)
data = response.json()["data"]
print(f"Invoice: {data['invoice_number']['value']}") # 2024-0892
print(f"Total: ${data['total']['value']}") # 3371.86
print(f"Confidence: {data['total']['confidence']}") # 0.99
Pre-built Schemas
ScoutExtract includes schemas for common document types:
Schema Use Case
invoice Invoices, bills, purchase orders
receipt Store receipts, transaction records
resume Resumes, CVs
contract Agreements, legal contracts
Custom Schemas
Don't see your document type? Define your own:
custom_schema = {
"product_name": {"type": "string"},
"price_usd": {"type": "number"},
"in_stock": {"type": "boolean"},
"features": {
"type": "array",
"items": {"type": "string"}
}
}
response = requests.post(
"https://api.ramlabs.dev/v1/extract",
headers={"Authorization": "Bearer YOUR_API_KEY"},
json={
"document": product_page_text,
"schema": custom_schema
}
)
PDF Support
Extract from PDF files by sending base64-encoded content:
import base64
with open("invoice.pdf", "rb") as f:
pdf_b64 = base64.b64encode(f.read()).decode()
response = requests.post(
"https://api.ramlabs.dev/v1/extract",
headers={"Authorization": "Bearer YOUR_API_KEY"},
json={
"document": pdf_b64,
"documentType": "pdf",
"schema": "invoice"
}
)
Confidence Scores
Every extracted field includes a confidence score (0.0 to 1.0). This enables smart automation:
data = response.json()["data"]
for field_name, field_data in data.items():
confidence = field_data["confidence"]
if confidence > 0.9:
save_to_database(field_name, field_data["value"])
elif confidence > 0.7:
queue_for_review(field_name, field_data["value"], confidence)
else:
assign_to_human(field_name, field_data)
Pricing
Free: 25 extractions/month (no credit card)
Starter: $49/mo — 1,000 extractions
Pro: $199/mo — 5,000 extractions
Scale: $499/mo — 25,000 extractions
Links
Website: extract.ramlabs.dev
Docs: extract.ramlabs.dev/docs
GitHub: github.com/ramlabsdev/scoutextract-sdk
Blog: extract.ramlabs.dev/blog
Would love to hear what document types you'd find most useful. Drop a comment!
Top comments (0)