Mohamed Elsayed Ali
# How I Built a PDF Processing Pipeline Designed to Scale Toward 10k Users — FastAPI, Inngest & Smart Async Design

I recently built a PDF processing pipeline inside my project Playground.

The goal was simple:

Users upload PDFs, the API responds immediately, and heavy processing happens in the background without blocking request latency.

That sounds straightforward until traffic increases.

A naive implementation works fine for a few users — but starts falling apart when many uploads arrive at once.

In this article, I’ll walk through the architecture, the async design decisions, and a few lessons I learned while building it.

Repository: Playground on GitHub


The Problem

A common first implementation looks something like this:

  • receive upload
  • parse PDF inside the request
  • extract text
  • store results
  • return response

It works — until multiple users upload files concurrently.

The biggest issue is that PDF parsing is not the kind of I/O-bound work async is built for.

Libraries like pdfplumber do CPU-heavy parsing. If that runs directly inside an async route, it blocks the event loop.

That creates a chain reaction:

  • slower response times
  • reduced concurrency
  • request pileups under load

In short:

Async helps with I/O. It does not automatically solve CPU-bound work.
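Here is a minimal, self-contained sketch of that difference. `time.sleep` stands in for CPU-heavy parsing such as pdfplumber: two concurrent requests run back-to-back when the blocking call sits on the event loop, and overlap once it moves to a thread.

```python
import asyncio
import time

def cpu_heavy_parse() -> str:
    # Stand-in for CPU-bound work like PDF parsing;
    # time.sleep blocks the calling thread just as real parsing would.
    time.sleep(0.2)
    return "parsed"

async def blocking_handler() -> str:
    # Runs the blocking call directly on the event loop.
    return cpu_heavy_parse()

async def offloaded_handler() -> str:
    # Moves the blocking call to a worker thread.
    return await asyncio.to_thread(cpu_heavy_parse)

async def timed(handler) -> float:
    start = time.perf_counter()
    await asyncio.gather(handler(), handler())
    return time.perf_counter() - start

blocked = asyncio.run(timed(blocking_handler))    # ~0.4s: the two "requests" serialize
offloaded = asyncio.run(timed(offloaded_handler)) # ~0.2s: the threads overlap
print(f"blocking: {blocked:.2f}s, to_thread: {offloaded:.2f}s")
```

The numbers make the point: async only helps once the blocking work is off the loop.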


My Solution

Instead of processing inside the request lifecycle, I split the pipeline into two stages.

Request path

The API only handles lightweight operations:

  • validate file type
  • perform idempotency checks
  • upload raw file to S3
  • create database record
  • trigger background job

Then it returns immediately.
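The request path above can be sketched as plain functions. Everything here is an illustrative stand-in, not the repository's actual code: dicts replace S3 and the database, a list replaces the event queue, and the names are invented for the example.

```python
import hashlib
import uuid

ALLOWED_TYPES = {"application/pdf"}

def handle_upload(filename: str, content_type: str, raw: bytes,
                  seen_keys: dict, storage: dict, records: dict, queue: list) -> dict:
    """Lightweight request path: validate, dedupe, store, record, enqueue."""
    # 1. Validate file type before doing anything expensive.
    if content_type not in ALLOWED_TYPES:
        return {"status": "rejected", "message": "Only PDF uploads are accepted."}

    # 2. Idempotency check: a content hash makes a retried upload a no-op.
    key = hashlib.sha256(raw).hexdigest()
    if key in seen_keys:
        return seen_keys[key]

    # 3. Upload the raw file to object storage (dict stands in for S3).
    object_key = f"uploads/{uuid.uuid4()}/{filename}"
    storage[object_key] = raw

    # 4. Create a database record (dict stands in for the DB).
    file_id = str(uuid.uuid4())
    records[file_id] = {"object_key": object_key, "status": "PROCESSING"}

    # 5. Trigger the background job (list stands in for the event queue).
    queue.append({"name": "pdf/upload.requested", "data": {"file_id": file_id}})

    response = {"status": "PROCESSING", "file_id": file_id}
    seen_keys[key] = response
    return response
```

Every step is cheap, so the handler's latency stays flat no matter how expensive parsing gets.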

Background worker

The worker performs heavy tasks:

  • download from S3
  • scan file
  • parse PDF
  • extract structured data
  • save final result

Architecture

```
Client
  ↓
FastAPI upload endpoint
  ↓
S3 object storage
  ↓
Inngest event
  ↓
Background worker
  ↓
Database
```

PDF pipeline

That separation keeps the API responsive even when parsing becomes expensive.


Why Idempotency Matters

One subtle issue in async pipelines is duplicate execution.

  • a user may retry the upload
  • a network timeout may happen
  • a background worker may retry after failure

Without idempotency, the same file can be processed multiple times.

My upload endpoint checks for an existing idempotency key before doing any heavy work.

```python
result = await db.execute(
    select(IdempotencyKey).filter(IdempotencyKey.key == idempotency_key)
)
existing_key = result.scalars().first()

if existing_key and existing_key.response:
    return json.loads(existing_key.response)
```

That small check prevents duplicate parsing and duplicate storage.

In production systems, this becomes surprisingly important.
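The full pattern is check, run once, persist. A minimal sketch, using an in-memory dict in place of the IdempotencyKey table (a real version also needs a unique constraint on the key column so two concurrent requests cannot both miss the cache):

```python
import json

def run_idempotent(key: str, store: dict, work) -> dict:
    """Return the cached response for key, or run work once and cache it."""
    cached = store.get(key)
    if cached is not None:
        return json.loads(cached)    # replay the saved response
    result = work()                  # heavy work runs at most once per key
    store[key] = json.dumps(result)  # persist before returning
    return result
```

Calling it twice with the same key returns the same response and runs the work only once.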


Upload Endpoint: Return Fast

The upload route stores the file and immediately schedules background processing.

```python
await inngest_client.send(
    inngest.Event(
        name="pdf/upload.requested",
        data=event_payload,
    )
)

return {
    "file_id": file_record.id,
    "status": FileStatus.PROCESSING.value,
    "message": "PDF uploaded. Processing in background.",
}
```

The client receives a response immediately and can poll later for status updates.

That means the API is optimized for fast acknowledgement, not heavy computation.
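On the client side, polling can stay simple. A sketch, with illustrative names and status values:

```python
import time

def poll_status(fetch_status, file_id: str, interval: float = 0.05,
                timeout: float = 2.0) -> str:
    """Poll until the file leaves PROCESSING or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(file_id)
        if status != "PROCESSING":
            return status
        time.sleep(interval)
    return "TIMEOUT"
```

In a real client, `fetch_status` would be a GET against the file's status endpoint.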


CPU-bound Parsing Needs a Thread Boundary

The worker downloads the file and parses it in the background.

The important part is this:

```python
# Both calls block: the S3 SDK does synchronous network I/O,
# and PDF parsing is CPU-bound, so each runs in a worker thread.
file_bytes = await asyncio.to_thread(blocking_work)
pdf_data = await asyncio.to_thread(_parse_pdf_bytes, file_bytes)
```

Why?

Because:

  • S3 SDK calls are blocking
  • PDF parsing is CPU-heavy

If those ran directly inside async functions, they would block the event loop.

asyncio.to_thread() moves that work off the main async execution path.

That was one of the most useful lessons from building this pipeline.


Background Workflow

The worker itself follows a predictable sequence:

  1. Download file from S3
  2. Scan file
  3. Parse PDF pages
  4. Extract structured metadata
  5. Save results to database

The parsing step returns:

  • page-by-page text
  • extracted tables
  • full combined text

That makes the output usable for both search and downstream LLM enrichment.
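Assembling that output might look like the following sketch, where `pages` is whatever a per-page parsing loop yields. The field names are illustrative, not the repository's actual schema:

```python
def build_parse_result(pages):
    """Assemble the worker's output: per-page text, tables, combined text.

    `pages` is a list of (text, tables) tuples, one per PDF page,
    e.g. what a loop over pdfplumber pages might produce.
    """
    page_texts = [text for text, _ in pages]
    tables = [t for _, page_tables in pages for t in page_tables]
    return {
        "pages": page_texts,                 # page-by-page text, for precise lookup
        "tables": tables,                    # extracted tables, kept as row lists
        "full_text": "\n".join(page_texts),  # combined text for search / LLM input
    }
```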


Error Handling and Audit Logging

One thing I cared about early was observability.

Every major step writes audit events:

  • upload received
  • validation failed
  • S3 stored
  • processing started
  • worker completed
  • worker failed

That made debugging much easier than relying only on exception traces.

When async systems grow, visibility becomes just as important as the code itself.
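An audit writer can be this small. A sketch, with a plain list standing in for an audit table and event names following the list above:

```python
from datetime import datetime, timezone

def write_audit_event(log: list, file_id: str, event: str, detail: str = "") -> None:
    """Append one structured audit event; `log` stands in for an audit table."""
    log.append({
        "file_id": file_id,
        "event": event,        # e.g. "upload_received", "worker_failed"
        "detail": detail,
        "at": datetime.now(timezone.utc).isoformat(),
    })
```

The payoff is a per-file timeline you can read top to bottom when something goes wrong.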


Lessons Learned

1. Async is not magic

Async improves I/O concurrency.

It does not automatically make CPU-heavy workloads scalable.


2. Keep request handlers thin

Request handlers should do only what is necessary to accept work.

Heavy processing belongs in workers.


3. Idempotency is underrated

Retries are normal in distributed systems.

Designing for safe retries makes the whole pipeline more reliable.


4. Background jobs improve system stability

Moving parsing out of request-response flow improves:

  • latency
  • resilience
  • operational predictability

What I’d Improve Next

A few things I’d like to add next:

  • dedicated worker concurrency limits
  • chunked parsing for very large PDFs
  • metrics around parse duration and queue latency
  • smarter retry classification for transient failures

Final Thoughts

This project was a useful reminder that scaling is often less about “making code faster” and more about putting work in the right place.

The biggest architectural decision was simple:

return fast, process later.

That one decision changed the entire behavior of the system.


Full repository:

Playground GitHub repository
