
AI Engine

Posted on • Originally published at ai-engine.net

Image to Text API: Extract Text from Photos with Code Examples

Turning a document, receipt, or handwritten note into machine-readable text used to require heavy on-premises software. Today, an OCR API lets you extract text from an image with a single HTTP request.

Why Use an OCR API?

Open-source engines like Tesseract demand careful preprocessing — deskewing, binarization, language tuning — before they produce usable output. A cloud OCR API handles all of that behind the scenes.

  • No infrastructure — Skip GPU provisioning and model management
  • Multilingual — Dozens of languages and scripts out of the box
  • Handwriting recognition — Deep-learning models read cursive and messy handwriting
  • Structured output — Bounding boxes, line-level text, and confidence values

Code Example

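import requests

url = "https://ocr-wizard.p.rapidapi.com/ocr"
headers = {
    "x-rapidapi-host": "ocr-wizard.p.rapidapi.com",
    "x-rapidapi-key": "YOUR_API_KEY",  # replace with your own key
}

# Upload the image as multipart form data in the "image" field.
with open("document.jpg", "rb") as f:
    response = requests.post(
        url,
        headers=headers,
        files={"image": ("doc.jpg", f, "image/jpeg")},
        timeout=30,  # do not hang forever on a stalled connection
    )

# Fail loudly on 4xx/5xx instead of trying to parse an error body.
response.raise_for_status()

data = response.json()
print(f"Language: {data['body']['detectedLanguage']}")
print(f"Text: {data['body']['fullText']}")

# Each annotation is one recognized word with its bounding polygon.
for word in data["body"]["annotations"]:
    print(f"  '{word['text']}' at {word['boundingPoly']}")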

The API returns the full extracted text, detected language, and word-level bounding boxes — useful for building searchable PDFs or overlaying highlights.
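
To make the bounding boxes concrete, here is a minimal sketch of region-specific extraction: it keeps only the words whose boxes fall inside a given rectangle. The article does not show what `boundingPoly` looks like, so this assumes a dict with a `vertices` list of `{"x", "y"}` points; check a real response before relying on it. The helper names `bounding_rect` and `words_in_region` and the sample data are made up for illustration.

```python
def bounding_rect(annotation):
    """Return (left, top, right, bottom) for one word annotation.

    Assumes annotation["boundingPoly"]["vertices"] is a list of
    {"x": ..., "y": ...} dicts. That layout is an assumption, not
    something the API response above confirms.
    """
    vertices = annotation["boundingPoly"]["vertices"]
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def words_in_region(annotations, region):
    """Return the text of every word whose box lies fully inside region.

    region is (left, top, right, bottom) in image pixel coordinates.
    """
    r_left, r_top, r_right, r_bottom = region
    selected = []
    for ann in annotations:
        left, top, right, bottom = bounding_rect(ann)
        inside = (left >= r_left and top >= r_top
                  and right <= r_right and bottom <= r_bottom)
        if inside:
            selected.append(ann["text"])
    return selected


# Hand-written sample in the assumed shape (not real API output).
sample = [
    {"text": "Store", "boundingPoly": {"vertices": [
        {"x": 10, "y": 10}, {"x": 70, "y": 10},
        {"x": 70, "y": 30}, {"x": 10, "y": 30}]}},
    {"text": "TOTAL", "boundingPoly": {"vertices": [
        {"x": 10, "y": 200}, {"x": 80, "y": 200},
        {"x": 80, "y": 220}, {"x": 10, "y": 220}]}},
]

# Keep only words in the lower band of the image.
print(words_in_region(sample, (0, 180, 300, 240)))
```

With a real response you would pass `data["body"]["annotations"]` in place of `sample`.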

Use Cases

  • Receipt/invoice scanning — Parse totals, dates, and vendor names directly into accounting software
  • Document digitization — Convert scanned contracts or medical records into searchable text at scale
  • Handwriting-to-text — Students photograph handwritten homework and get a typed transcript
  • License plate/ID reading — Automate identity verification or parking management
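
For the receipt case, one possible approach is to run simple patterns over the extracted `fullText`. The sketch below is illustrative only: the regular expressions, the `parse_receipt` helper, and the sample text are assumptions, and real receipts vary enough by vendor and locale that you would need to tune or replace them.

```python
import re

# Illustrative patterns only. "\btotal\b" skips "Subtotal" because there
# is no word boundary between "b" and "t".
TOTAL_RE = re.compile(r"\btotal\b[^\d\n]*(\d+(?:[.,]\d{2})?)", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{2,4})\b")


def parse_receipt(full_text):
    """Pull the first total and first date out of OCR text, if present."""
    total_match = TOTAL_RE.search(full_text)
    date_match = DATE_RE.search(full_text)
    return {
        "total": total_match.group(1) if total_match else None,
        "date": date_match.group(1) if date_match else None,
    }


# Made-up text standing in for data["body"]["fullText"].
sample_text = "ACME MARKET\n2024-03-15\nCoffee 3.50\nSubtotal 3.50\nTOTAL: $3.78\n"
print(parse_receipt(sample_text))
```

Values come back as strings, so convert them to decimals or dates before sending them to accounting software.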

Best Practices

  1. Provide clear, well-lit images — shadows and glare degrade recognition quality
  2. Let the API detect the language — works reliably across dozens of languages
  3. Crop to the region of interest — reduces noise and bandwidth
  4. Use word-level bounding boxes for searchable PDFs or region-specific extraction
  5. Batch with concurrency (5-10 at a time) for large document sets
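
As a rough sketch of point 5, a thread pool caps how many requests are in flight at once. The `ocr_batch` helper below is hypothetical: it takes any callable that processes one file, so you could wrap the `requests.post` call from the code example in a function and pass it in. `fake_ocr` is a stand-in so the sketch runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def ocr_batch(paths, ocr_one, max_workers=5):
    """Run ocr_one(path) for every path, at most max_workers at a time.

    ocr_one is a caller-supplied function (hypothetical here). Returns a
    dict mapping each path to its result, or to the exception it raised,
    so one bad file does not abort the whole batch.
    """
    def safe_call(path):
        try:
            return ocr_one(path)
        except Exception as exc:  # record the failure and keep going
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        results = list(pool.map(safe_call, paths))
    return dict(zip(paths, results))


def fake_ocr(path):
    """Stand-in for a real API call; rejects non-image files."""
    if path.endswith(".txt"):
        raise ValueError(f"not an image: {path}")
    return {"fullText": f"text from {path}"}


batch = ocr_batch(["a.jpg", "b.jpg", "notes.txt"], fake_ocr, max_workers=5)
for path, result in batch.items():
    print(path, "->", result)
```

Threads suit this workload because each call spends most of its time waiting on the network. Keep `max_workers` within whatever rate limit your API plan allows.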

👉 Read the full guide with cURL, Python, and JavaScript examples
