Talal Bazerbachi
From Timeouts to Savings: How we optimized 24-page PDF parsing with Gemini & OpenRouter

I'm building Parsli, a document parser SaaS that's powered by Google Gemini for intelligent document processing.
Recently, a user hit a wall trying to process large, scanned PDFs. Here is the play-by-play of how we moved from a 4-minute timeout to a cost-effective, reliable pipeline.

The Problem: The Single-Pass Failure

Initially, we tried the "one big request" approach.

  • Step 1: Sent a 24-page scanned PDF as a single base64 blob to Gemini 2.5 Pro.
  • Result: 4+ minute hangs and serverless timeouts.
  • Step 2: Moved the job to a background worker with a 300-second timeout. It still failed.

Key Lesson: Large multi-page documents cannot be treated as a single context window item if you want reliability.
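For context, a minimal sketch of the "one big request" approach that timed out. The payload shape follows OpenRouter's chat-completions file-input format; the helper name and prompt text are ours, not from our production code.

```python
import base64

def build_single_pass_request(pdf_bytes: bytes) -> dict:
    """Build a single OpenRouter request carrying the whole PDF as base64.

    This is the approach that hung for 4+ minutes on a 24-page scan.
    """
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "model": "google/gemini-2.5-pro",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the contents of this document."},
                {"type": "file", "file": {
                    "filename": "scan.pdf",
                    "file_data": f"data:application/pdf;base64,{encoded}",
                }},
            ],
        }],
    }
```

With 24 scanned pages, this single payload forces the model to process the entire document before returning a single byte, which is exactly what blows past serverless limits.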

The Pivot: Per-Page Chunking

We decided to split the PDF into 24 individual pages and process them in parallel.
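The fan-out itself is simple. A sketch using a thread pool (helper names are ours; the actual page splitting can be done with a library such as pypdf):

```python
from concurrent.futures import ThreadPoolExecutor

def process_pages(pages: list, parse_page, max_workers: int = 8) -> list:
    """Run parse_page over each page concurrently, preserving page order.

    pages: one item per PDF page (e.g. per-page PDF bytes).
    parse_page: callable that sends one page to the model and returns its text.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in input order, so page 7 stays page 7.
        return list(pool.map(parse_page, pages))
```

Because the work is I/O-bound (waiting on the model API), threads are enough; capping `max_workers` also keeps you under per-provider rate limits.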

1. The Cost Trap (Structured JSON)

Asking for structured JSON per page worked, but:

  • Cost: ~$3.12 per document.
  • Token bloat: 19,000 output tokens for simple JSON.
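The cost math is worth making explicit. A sketch of the per-document arithmetic (the rates below are placeholders, not Gemini's actual pricing):

```python
def doc_cost(input_tokens: int, output_tokens: int,
             in_rate_per_m: float, out_rate_per_m: float) -> float:
    """Per-document cost given token counts and per-million-token rates."""
    return (input_tokens / 1e6) * in_rate_per_m + (output_tokens / 1e6) * out_rate_per_m
```

With 19,000 output tokens, every extra dollar-per-million on the output rate adds about two cents per document, so trimming output tokens pays off linearly at scale.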

2. Solving the "502 Bad Gateway"

Intermittent 502 Bad Gateway errors started appearing when OpenRouter routed our requests through Vertex.

  • Fix: Added provider routing to prefer Google AI Studio over Vertex. The errors vanished.
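OpenRouter exposes this via provider preferences on the request body. A sketch (the provider slugs below are assumptions; check OpenRouter's provider list for the exact names):

```python
def with_provider_routing(payload: dict) -> dict:
    """Ask OpenRouter to try Google AI Studio before Vertex.

    Uses OpenRouter's `provider.order` preference; slugs are assumed.
    """
    payload["provider"] = {
        "order": ["google-ai-studio", "google-vertex"],
        # Optionally disable falling back to any other provider:
        # "allow_fallbacks": False,
    }
    return payload
```

This keeps the model ID unchanged while pinning which upstream serves it, which is exactly the knob you want when one backend is flaky.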

3. The "Markdown" Breakthrough

We changed the prompt from "Extract JSON" to "Convert to Markdown."

  • Result: Output tokens dropped from 12,000 to 300 per page.
  • Accuracy: Verification showed the OCR quality remained high.
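To make the switch concrete, here is an illustrative reconstruction of the two prompt styles (our wording, not the exact production prompts):

```python
# Before: forces the model to emit verbose, schema-shaped output.
JSON_PROMPT = (
    "Extract every field from this page and return a structured JSON object."
)

# After: lets the model emit compact prose and native Markdown tables.
MARKDOWN_PROMPT = (
    "Convert this page to clean Markdown. Preserve headings, lists, and "
    "tables exactly as they appear; do not add commentary."
)
```

JSON output pays a steady tax in braces, quotes, and repeated keys; Markdown carries the same information with far fewer tokens, which is where the 40x output reduction comes from.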

Consolidating the Results

We tried using Gemini Flash to merge the 24 pages back into a single JSON. It failed to handle the volume.

Current Production Solution:
For "extract everything" requests, we now skip consolidation. The concatenated per-page Markdown is the output. It preserves layout and tables perfectly without the LLM overhead.
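The "skip consolidation" step is just string concatenation. A sketch (the page separator is our convention, not prescribed anywhere):

```python
def merge_pages(page_markdown: list[str]) -> str:
    """Concatenate per-page Markdown, separated by a horizontal rule."""
    return "\n\n---\n\n".join(md.strip() for md in page_markdown)
```

No model call, no token cost, no chance of the merge step hallucinating or dropping content.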

Research & What's Next

While our current stack uses Gemini, our research suggests other models might be even faster:

Benchmarks we are watching:
  • HunyuanOCR (0.9B): Reportedly beats Gemini Pro at OCR fidelity.
  • PaddleOCR-VL: Claims 253% faster throughput than competitors.
  • Mistral OCR 3: Competitive pricing on Vertex AI ($1-2/1k pages).

Future Routing Strategy

We are testing a routing logic based on input size:

  • <10K tokens: GPT-4o Nano
  • >10K tokens: Claude Haiku
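As a sketch, the routing logic is a one-liner; the model IDs below are placeholders for the candidates named above, and the 10K threshold is what we are testing, not a shipped configuration:

```python
# Assumed model IDs -- verify against your provider's catalog before use.
SMALL_MODEL = "openai/gpt-4o-nano"
LARGE_MODEL = "anthropic/claude-haiku"

def pick_model(input_tokens: int, threshold: int = 10_000) -> str:
    """Route small documents to the cheap model, large ones to the bigger one."""
    return SMALL_MODEL if input_tokens < threshold else LARGE_MODEL
```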

Are you building document parsers? I'd love to hear how you handle large file timeouts in the comments!

Check out Parsli.co
