How we built a production-ready PDF-to-audio conversion pipeline—async jobs, Redis queues, physical file storage, and summary-only mode.
If you like it, don't forget to star the HazelJS repository.
The Problem: Documents That Speak
Long PDFs—reports, manuals, articles—are hard to consume on the go. Reading on a phone is tedious; listening is natural. We wanted to turn any PDF into an audiobook with minimal setup and production-grade reliability.
The challenges:
- Scale — A 50-page document can mean hundreds of TTS API calls
- Reliability — Jobs must survive restarts and handle long runtimes (5+ minutes)
- Flexibility — Sometimes you want the full doc read aloud; sometimes just a summary
- Storage — Audio files should live on disk, not bloat Redis
We built @hazeljs/pdf-to-audio to solve these. Here's how it works and how to use it.
What We Built: PDF-to-Audio Pipeline
The pipeline:
- Extract — Pull text from the PDF via pdf-parse
- Summarize (optional) — GPT generates a 2–4 sentence intro; preserves source language
- Chunk — Split text into TTS-sized segments (≤4096 chars for OpenAI)
- Convert — Generate speech per chunk with OpenAI TTS
- Merge — Concatenate audio buffers into one MP3
- Store — Write the file to disk (default: ./data/pdf-to-audio/)
All of this runs in a BullMQ worker so long conversions don't block your API.
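To make the Chunk and Merge steps concrete, here is a minimal sketch. It is not the package's implementation (the package uses RecursiveTextSplitter from @hazeljs/rag); `chunkForTts` and `mergeAudio` are hypothetical helpers that only illustrate the idea of sentence-aware splitting under a character limit and in-order concatenation.

```typescript
// Hypothetical helpers for illustration; not part of @hazeljs/pdf-to-audio.
const MAX_TTS_CHARS = 4096; // per-request input limit cited above for OpenAI TTS

/** Split text into chunks of at most maxChars, preferring sentence boundaries. */
function chunkForTts(text: string, maxChars: number = MAX_TTS_CHARS): string[] {
  if (maxChars < 1) throw new Error("maxChars must be at least 1");
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > maxChars) {
    const head = rest.slice(0, maxChars);
    // Prefer the last sentence end in the window, then the last space, then a hard cut.
    const sentenceEnd = Math.max(
      head.lastIndexOf(". "),
      head.lastIndexOf("! "),
      head.lastIndexOf("? "),
    );
    const space = head.lastIndexOf(" ");
    const cut = sentenceEnd > 0 ? sentenceEnd + 1 : space > 0 ? space : maxChars;
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

/** Concatenate per-chunk audio bytes, in order, into a single buffer. */
function mergeAudio(parts: Uint8Array[]): Uint8Array {
  const total = parts.reduce((sum, part) => sum + part.length, 0);
  const merged = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    merged.set(part, offset);
    offset += part.length;
  }
  return merged;
}
```

Each chunk would be sent to the TTS API separately, and the returned audio bytes passed to `mergeAudio` in the same order as the chunks.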
Why HazelJS
HazelJS is a TypeScript-first Node.js framework built for AI-native apps. For PDF-to-audio we use:
| Package | Role |
|---|---|
| @hazeljs/core | Controllers, DI, file upload |
| @hazeljs/ai | OpenAI TTS + GPT for summaries |
| @hazeljs/queue | BullMQ job queue (Redis-backed) |
| @hazeljs/rag | RecursiveTextSplitter for chunking |
No custom queue wiring or manual chunking—it's all plug-and-play.
Architecture: Async Submit → Status → Download
The API uses a non-blocking flow:
POST /convert → 202 { jobId }
GET /status/:id → { status, progress, message }
GET /download/:id → MP3 file (when completed)
Submit returns immediately; clients poll status and download when ready. Long jobs (e.g. 33 chunks × ~7s each ≈ 4 min) run in the worker without stalling: we set lockDuration to 30 minutes and lockRenewTime to 30 seconds (BullMQ takes both values in milliseconds), so BullMQ doesn't mark them as stalled.
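The timing above can be checked with a little arithmetic. The sketch below assumes chunks are converted one after another and that the lock settings are handed to a BullMQ worker as milliseconds; `workerLockOptions` and `estimateJobMs` are illustrative names, and how @hazeljs/queue forwards these options is not shown here.

```typescript
// BullMQ expresses lockDuration and lockRenewTime in milliseconds.
const MINUTE_MS = 60 * 1000;

// Values from the text: a 30-minute lock, renewed every 30 seconds.
const workerLockOptions = {
  lockDuration: 30 * MINUTE_MS, // 1,800,000 ms
  lockRenewTime: 30 * 1000, // 30,000 ms
};

/** Rough serial runtime: number of chunks times average seconds per TTS call. */
function estimateJobMs(chunkCount: number, secondsPerChunk: number): number {
  return chunkCount * secondsPerChunk * 1000;
}

// 33 chunks at ~7 s each: 231,000 ms, i.e. about 3.85 minutes.
const exampleJobMs = estimateJobMs(33, 7);
const fitsInsideLock = exampleJobMs < workerLockOptions.lockDuration;
```

For this example the estimated runtime sits well inside the 30-minute lock, which is why the job is not reported as stalled.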
Physical file storage — Completed audio is written to outputDir (e.g. ./data/pdf-to-audio/{jobId}.mp3) instead of storing base64 in Redis. That reduces memory pressure and makes files easy to serve or archive.
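One way to see the memory cost of the base64 alternative: padded base64 turns every 3 bytes into 4 characters, so the encoded payload is about a third larger than the file itself. The file size below is a made-up figure for illustration, not a measurement from the pipeline.

```typescript
/** Length in characters of the padded base64 encoding of `byteLength` bytes. */
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const hypotheticalMp3Bytes = 3000000; // assumed 3 MB audio file
const encodedChars = base64Length(hypotheticalMp3Bytes); // 4,000,000 characters
```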
Modes: Full Document vs. Summary Only
| Mode | Behavior | Use case |
|---|---|---|
| Default | AI summary intro + full document read aloud | Audiobook of the whole doc |
| summaryOnly | Only the summary (2–4 sentences) | Quick overview in audio |
| includeSummary: false | Full document only | No intro, just the content |
The summary prompt tells the model to preserve the document's language—no implicit translation.
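The table can be read as a small decision function. This is a sketch of that logic, not the package's code: `partsToNarrate` is a hypothetical name, and letting summaryOnly win when both flags are set is an assumption the text does not confirm.

```typescript
type NarrationPart = "summary" | "document";

interface ModeFlags {
  includeSummary?: boolean; // default true, per the table
  summaryOnly?: boolean; // default false, per the table
}

/** Decide which parts are read aloud for a given combination of flags. */
function partsToNarrate(flags: ModeFlags = {}): NarrationPart[] {
  const { includeSummary = true, summaryOnly = false } = flags;
  // Assumption: summaryOnly takes precedence over includeSummary: false.
  if (summaryOnly) return ["summary"];
  return includeSummary ? ["summary", "document"] : ["document"];
}
```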
Quick Start
1. Install
npm install @hazeljs/pdf-to-audio @hazeljs/core @hazeljs/ai @hazeljs/queue @hazeljs/rag ioredis
Requires Redis for the job queue.
2. Add to Your App
import { HazelModule } from '@hazeljs/core';
import { PdfToAudioModule } from '@hazeljs/pdf-to-audio';

@HazelModule({
  imports: [
    PdfToAudioModule.forRoot({
      connection: {
        host: process.env.REDIS_HOST || 'localhost',
        port: parseInt(process.env.REDIS_PORT || '6379', 10),
      },
      outputDir: './data/pdf-to-audio',
    }),
  ],
})
export class AppModule {}
3. Use the API
# Submit
curl -X POST http://localhost:3000/api/pdf-to-audio/convert \
-F "file=@report.pdf" \
-F "voice=alloy"
# Response: {"jobId":"1"}
# Status
curl http://localhost:3000/api/pdf-to-audio/status/1
# Download (when status is "completed")
curl -o report.mp3 http://localhost:3000/api/pdf-to-audio/download/1
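From application code, the same submit, poll, download flow needs a polling loop. The sketch below is one possible shape: `waitForCompletion` and the `GetStatus` callback are hypothetical, and the "failed" status value is an assumption (the text only names "completed").

```typescript
// Shape of the /status/:id response as described above.
interface JobStatus {
  status: string;
  progress?: number;
  message?: string;
}

// Injected so the loop can be tested without a running server.
type GetStatus = (jobId: string) => Promise<JobStatus>;

/** Poll until the job reports "completed", or give up after maxAttempts polls. */
async function waitForCompletion(
  getStatus: GetStatus,
  jobId: string,
  intervalMs: number = 2000,
  maxAttempts: number = 900,
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const current = await getStatus(jobId);
    if (current.status === "completed") return current;
    // Assumption: a failed job reports status "failed".
    if (current.status === "failed") {
      throw new Error(current.message ?? "conversion failed");
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`job ${jobId} did not complete after ${maxAttempts} polls`);
}
```

In a real client, `getStatus` would wrap an HTTP GET against /api/pdf-to-audio/status/:id, and the download request would follow once the promise resolves.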
4. Or Use the CLI
# Full document with summary
hazel pdf-to-audio convert report.pdf --api-url http://localhost:3000 --wait -o report.mp3
# Summary only (2–4 sentences)
hazel pdf-to-audio convert report.pdf --api-url http://localhost:3000 --summary-only --wait -o summary.mp3
Options
| Option | Description | Default |
|---|---|---|
| voice | TTS voice (alloy, echo, fable, onyx, nova, shimmer) | alloy |
| model | TTS model (tts-1, tts-1-hd) | tts-1 |
| format | Output format (mp3, opus) | mp3 |
| includeSummary | Add AI summary at the start | true |
| summaryOnly | Output only the summary, skip full document | false |
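Because the options arrive as multipart form fields, they reach the server as strings. The following sketch shows how the defaults and allowed values from the table could be applied; `parseFormFields`, `oneOf`, and `flag` are hypothetical helpers, and the package's actual parsing and validation may differ.

```typescript
const VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"] as const;
const MODELS = ["tts-1", "tts-1-hd"] as const;
const FORMATS = ["mp3", "opus"] as const;

interface ConvertOptions {
  voice: (typeof VOICES)[number];
  model: (typeof MODELS)[number];
  format: (typeof FORMATS)[number];
  includeSummary: boolean;
  summaryOnly: boolean;
}

/** Return the value if it is in the allowed list, the fallback if absent, else throw. */
function oneOf<T extends string>(
  value: string | undefined,
  allowed: readonly T[],
  fallback: T,
  label: string,
): T {
  if (value === undefined || value === "") return fallback;
  const match = allowed.find((candidate) => candidate === value);
  if (match === undefined) {
    throw new Error(`${label} must be one of: ${allowed.join(", ")}`);
  }
  return match;
}

/** Parse a "true"/"false" form field, using the fallback when the field is absent. */
function flag(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value === "") return fallback;
  return value === "true";
}

/** Apply the defaults from the options table to raw string fields. */
function parseFormFields(fields: Record<string, string | undefined>): ConvertOptions {
  return {
    voice: oneOf(fields["voice"], VOICES, "alloy", "voice"),
    model: oneOf(fields["model"], MODELS, "tts-1", "model"),
    format: oneOf(fields["format"], FORMATS, "mp3", "format"),
    includeSummary: flag(fields["includeSummary"], true),
    summaryOnly: flag(fields["summaryOnly"], false),
  };
}
```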
Production Considerations
- Redis — Required for the queue; run it locally or in a managed service.
- OpenAI API key — Set OPENAI_API_KEY for TTS and summaries.
- File cleanup — Files stay in outputDir; add a cron job or TTL if you need automatic cleanup.
- Long jobs — The worker is tuned for a 30-minute lock duration with 30-second renewal; large PDFs are supported.
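For the file cleanup point, a scheduled task mostly needs to decide which files have outlived their TTL. This is a minimal sketch of that decision only; `StoredFile` and `selectExpired` are hypothetical, and the package does not ship a cleanup routine as far as this article describes.

```typescript
interface StoredFile {
  path: string;
  modifiedAtMs: number; // last-modified time as epoch milliseconds
}

/** Return the paths of .mp3 files whose age exceeds ttlMs. */
function selectExpired(files: StoredFile[], nowMs: number, ttlMs: number): string[] {
  return files
    .filter((file) => file.path.endsWith(".mp3") && nowMs - file.modifiedAtMs > ttlMs)
    .map((file) => file.path);
}
```

In a cron job, the input list could be built from Node's fs.readdir and fs.stat (using mtimeMs) on outputDir, and each returned path removed with fs.unlink.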
Summary
@hazeljs/pdf-to-audio turns PDFs into audiobooks with:
- Async job flow — Submit, poll, download
- Physical file storage — No Redis bloat
- Summary options — Full doc, summary only, or both
- Language preservation — Summaries stay in the document's language
- CLI + REST API — Use from the terminal or integrate into your app
If you're building document-to-audio features, give HazelJS and @hazeljs/pdf-to-audio a try. The example app includes a working setup—clone, run Redis, add your OPENAI_API_KEY, and you're set.