I recently built a PDF processing pipeline inside my project Playground.
The goal was simple:
Users upload PDFs, the API responds immediately, and heavy processing happens in the background without blocking request latency.
That sounds straightforward until traffic increases.
A naive implementation works fine for a few users — but starts falling apart when many uploads arrive at once.
In this article, I’ll walk through the architecture, the async design decisions, and a few lessons I learned while building it.
Repository: Playground on GitHub
The Problem
A common first implementation looks something like this:
- receive upload
- parse PDF inside the request
- extract text
- store results
- return response
It works — until multiple users upload files concurrently.
The biggest issue is that PDF parsing is not I/O-bound work, so async alone doesn't help.
Libraries like pdfplumber perform CPU-heavy parsing. If that happens directly inside an async route, the event loop gets blocked.
That creates a chain reaction:
- slower response times
- reduced concurrency
- request pileups under load
In short:
Async helps with I/O. It does not automatically solve CPU-bound work.
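The difference is easy to demonstrate with a small self-contained sketch, using time.sleep as a stand-in for CPU-bound parsing (the function names here are illustrative, not code from the repository):

```python
import asyncio
import time


def fake_parse() -> str:
    """Stand-in for CPU-heavy PDF parsing (blocks for 0.2 s)."""
    time.sleep(0.2)
    return "parsed"


async def blocking_route() -> str:
    # Bad: runs the blocking work directly on the event loop.
    return fake_parse()


async def offloaded_route() -> str:
    # Better: moves the blocking work to a worker thread.
    return await asyncio.to_thread(fake_parse)


async def measure(route) -> float:
    """Time four concurrent 'requests' against the given route."""
    start = time.perf_counter()
    await asyncio.gather(*(route() for _ in range(4)))
    return time.perf_counter() - start
```

Four concurrent "requests" against the blocking version run one after another (roughly 0.8 s total), while the offloaded version overlaps them in threads (roughly 0.2 s): the event loop only scales when it is never held hostage by one request.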
My Solution
Instead of processing inside the request lifecycle, I split the pipeline into two stages.
Request path
The API only handles lightweight operations:
- validate file type
- perform idempotency checks
- upload raw file to S3
- create database record
- trigger background job
Then it returns immediately.
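As a sketch, the request path boils down to something like this. S3, the database, and the job queue are replaced here with hypothetical in-memory stand-ins, and the idempotency check from the section below is omitted for brevity; none of the names are the real Playground code:

```python
import asyncio
import uuid

# In-memory stand-ins for S3, the database, and the event queue.
object_store: dict[str, bytes] = {}
file_records: dict[str, str] = {}  # file_id -> status
event_queue: asyncio.Queue = asyncio.Queue()

ALLOWED_TYPES = {"application/pdf"}


async def handle_upload(content_type: str, data: bytes) -> dict:
    # 1. Validate file type (cheap, synchronous check).
    if content_type not in ALLOWED_TYPES:
        return {"error": "only PDF uploads are accepted"}

    # 2. Store the raw bytes (stands in for the S3 upload).
    file_id = str(uuid.uuid4())
    object_store[file_id] = data

    # 3. Create the database record in a "processing" state.
    file_records[file_id] = "processing"

    # 4. Trigger the background job, then return immediately.
    await event_queue.put({"name": "pdf/upload.requested", "file_id": file_id})
    return {"file_id": file_id, "status": "processing"}
```

The handler never touches the file contents beyond storing them; every expensive step happens on the other side of the queue.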
Background worker
The worker performs heavy tasks:
- download from S3
- scan file
- parse PDF
- extract structured data
- save final result
Architecture
Client
↓
FastAPI upload endpoint
↓
S3 object storage
↓
Inngest event
↓
Background worker
↓
Database
That separation keeps the API responsive even when parsing becomes expensive.
Why Idempotency Matters
One subtle issue in async pipelines is duplicate execution.
A user may retry the upload.
A network timeout may happen.
A background worker may retry after failure.
Without idempotency, the same file can be processed multiple times.
My upload endpoint checks for an existing idempotency key before doing any heavy work.
```python
result = await db.execute(
    select(IdempotencyKey).filter(IdempotencyKey.key == idempotency_key)
)
existing_key = result.scalars().first()

if existing_key and existing_key.response:
    return json.loads(existing_key.response)
```
That small check prevents duplicate parsing and duplicate storage.
In production systems, this becomes surprisingly important.
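A minimal, self-contained version of the same pattern looks like this, with sqlite3 standing in for the async ORM (table and column names are illustrative):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, response TEXT)")


def process_upload(idempotency_key: str, filename: str) -> dict:
    # Return the cached response if this key was already handled.
    row = db.execute(
        "SELECT response FROM idempotency_keys WHERE key = ?", (idempotency_key,)
    ).fetchone()
    if row and row[0]:
        return json.loads(row[0])

    # Otherwise do the (expensive) work once and record the response.
    response = {"file": filename, "status": "processing"}
    db.execute(
        "INSERT INTO idempotency_keys (key, response) VALUES (?, ?)",
        (idempotency_key, json.dumps(response)),
    )
    return response
```

Calling this twice with the same key returns the stored response instead of redoing the work, which is exactly the behavior a retrying client or worker needs.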
Upload Endpoint: Return Fast
The upload route stores the file and immediately schedules background processing.
```python
await inngest_client.send(
    inngest.Event(
        name="pdf/upload.requested",
        data=event_payload,
    )
)

return {
    "file_id": file_record.id,
    "status": FileStatus.PROCESSING.value,
    "message": "PDF uploaded. Processing in background.",
}
```
The client receives a response immediately and can poll later for status updates.
That means the API is optimized for fast acknowledgement, not heavy computation.
CPU-bound Parsing Needs a Thread Boundary
The worker downloads the file and parses it in the background.
The important part is this:
```python
file_bytes = await asyncio.to_thread(blocking_work)
pdf_data = await asyncio.to_thread(_parse_pdf_bytes, raw_bytes)
```
Why?
Because:
- S3 SDK calls are blocking
- PDF parsing is CPU-heavy
If those ran directly inside async functions, they would block the event loop.
asyncio.to_thread() moves that work off the event loop's thread. One caveat: because of CPython's GIL, a thread does not make pure-Python CPU work run in parallel, but it does keep the event loop responsive and it fully unblocks the S3 calls; for truly heavy parsing, a process pool is the natural next step.
That was one of the most useful lessons from building this pipeline.
Background Workflow
The worker itself follows a predictable sequence:
- Download file from S3
- Scan file
- Parse PDF pages
- Extract structured metadata
- Save results to database
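The sequence above can be sketched as a single async function, with every blocking step pushed behind asyncio.to_thread. The step functions are illustrative stubs, not the real worker code:

```python
import asyncio


def download_from_s3(file_id: str) -> bytes:
    return b"%PDF-1.4 ..."  # stub: blocking S3 SDK call


def scan_file(raw: bytes) -> None:
    assert raw.startswith(b"%PDF")  # stub: validity / malware scan


def parse_pdf_bytes(raw: bytes) -> dict:
    return {"pages": ["page 1 text"], "tables": []}  # stub: pdfplumber work


results: dict[str, dict] = {}  # stand-in for the database


async def process_file(file_id: str) -> dict:
    raw = await asyncio.to_thread(download_from_s3, file_id)
    await asyncio.to_thread(scan_file, raw)
    parsed = await asyncio.to_thread(parse_pdf_bytes, raw)
    results[file_id] = parsed  # save final result
    return parsed
```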
The parsing step returns:
- page-by-page text
- extracted tables
- full combined text
That makes the output usable for both search and downstream LLM enrichment.
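A small helper that merges per-page results into that document-level shape might look like this (the field names are my illustration, not the repository's exact schema):

```python
def combine_parse_results(pages: list[dict]) -> dict:
    """Merge per-page parse results into one document-level structure."""
    return {
        "pages": [p["text"] for p in pages],                 # page-by-page text
        "tables": [t for p in pages for t in p["tables"]],   # all extracted tables
        "full_text": "\n\n".join(p["text"] for p in pages),  # combined text
    }
```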
Error Handling and Audit Logging
One thing I cared about early was observability.
Every major step writes audit events:
- upload received
- validation failed
- S3 stored
- processing started
- worker completed
- worker failed
That made debugging much easier than relying only on exception traces.
When async systems grow, visibility becomes just as important as the code itself.
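An audit trail can start as simply as an append-only list of structured events keyed by file ID (a sketch; in the real pipeline these presumably land in a database table):

```python
import datetime

audit_log: list[dict] = []


def audit(file_id: str, event: str, detail: str = "") -> None:
    """Append one structured audit event for later debugging."""
    audit_log.append({
        "file_id": file_id,
        "event": event,
        "detail": detail,
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })


# Typical lifecycle for one file:
audit("f1", "upload_received")
audit("f1", "s3_stored")
audit("f1", "processing_started")
audit("f1", "worker_completed")
```

Because each event carries the file ID and a timestamp, the full history of any one upload can be reconstructed with a simple filter.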
Lessons Learned
1. Async is not magic
Async improves I/O concurrency.
It does not automatically make CPU-heavy workloads scalable.
2. Keep request handlers thin
Request handlers should do only what is necessary to accept work.
Heavy processing belongs in workers.
3. Idempotency is underrated
Retries are normal in distributed systems.
Designing for safe retries makes the whole pipeline more reliable.
4. Background jobs improve system stability
Moving parsing out of request-response flow improves:
- latency
- resilience
- operational predictability
What I’d Improve Next
A few things I’d like to add next:
- dedicated worker concurrency limits
- chunked parsing for very large PDFs
- metrics around parse duration and queue latency
- smarter retry classification for transient failures
Final Thoughts
This project was a useful reminder that scaling is often less about “making code faster” and more about putting work in the right place.
The biggest architectural decision was simple:
return fast, process later.
That one decision changed the entire behavior of the system.
Full repository:
Playground GitHub repository
