DEV Community

Muhammad Arslan


Turn PDFs into Audiobooks with HazelJS and OpenAI TTS

How we built a production-ready PDF-to-audio conversion pipeline—async jobs, Redis queues, physical file storage, and summary-only mode.

If you like it, don't forget to star the HazelJS repository.


The Problem: Documents That Speak

Long PDFs—reports, manuals, articles—are hard to consume on the go. Reading on a phone is tedious; listening is natural. We wanted to turn any PDF into an audiobook with minimal setup and production-grade reliability.

The challenges:

  • Scale — A 50-page document can mean hundreds of TTS API calls
  • Reliability — Jobs must survive restarts and handle long runtimes (5+ minutes)
  • Flexibility — Sometimes you want the full doc read aloud; sometimes just a summary
  • Storage — Audio files should live on disk, not bloat Redis

We built @hazeljs/pdf-to-audio to solve these. Here's how it works and how to use it.


What We Built: PDF-to-Audio Pipeline

The pipeline:

  1. Extract — Pull text from the PDF via pdf-parse
  2. Summarize (optional) — GPT generates a 2–4 sentence intro; preserves source language
  3. Chunk — Split text into TTS-sized segments (≤4096 chars for OpenAI)
  4. Convert — Generate speech per chunk with OpenAI TTS
  5. Merge — Concatenate audio buffers into one MP3
  6. Store — Write the file to disk (default: ./data/pdf-to-audio/)
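
Step 3 delegates chunking to RecursiveTextSplitter from @hazeljs/rag. Its exact options aren't covered here, so what follows is a simplified stand-in, not the library's code. It tries paragraph breaks first, then sentence ends, then spaces, and hard-cuts only as a last resort, so that every chunk stays within OpenAI's 4096-character TTS input limit. The function name splitForTts is hypothetical.

```typescript
// Simplified stand-in for recursive text splitting (NOT the @hazeljs/rag implementation).
// Tries paragraph breaks, then sentence ends, then spaces; hard-cuts only as a last resort.
const SEPARATORS = ["\n\n", ". ", " "];

function splitForTts(text: string, maxChars = 4096, level = 0): string[] {
  const trimmed = text.trim();
  if (trimmed.length === 0) return [];
  if (trimmed.length <= maxChars) return [trimmed];

  // No separators left: cut into fixed-size slices.
  if (level >= SEPARATORS.length) {
    const slices: string[] = [];
    for (let i = 0; i < trimmed.length; i += maxChars) {
      slices.push(trimmed.slice(i, i + maxChars));
    }
    return slices;
  }

  const sep = SEPARATORS[level];
  const pieces = trimmed.split(sep);
  const chunks: string[] = [];
  let current = "";

  for (let i = 0; i < pieces.length; i++) {
    // Re-attach the separator (except after the last piece) so sentences keep their punctuation.
    const piece = i < pieces.length - 1 ? pieces[i] + sep : pieces[i];
    if (piece.length > maxChars) {
      // Flush the buffer, then split the oversized piece with the next separator.
      if (current.trim()) chunks.push(current.trim());
      current = "";
      chunks.push(...splitForTts(piece, maxChars, level + 1));
    } else if ((current + piece).length > maxChars) {
      // Adding this piece would overflow: start a new chunk.
      if (current.trim()) chunks.push(current.trim());
      current = piece;
    } else {
      current += piece;
    }
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Preferring sentence boundaries over fixed-size cuts keeps each TTS call from ending mid-sentence, which would otherwise tend to be audible at the seam between chunks.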

All of this runs in a BullMQ worker so long conversions don't block your API.
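
The six steps compose into a single async function that a queue worker can call. The sketch below is an illustration, not the @hazeljs/pdf-to-audio internals: the PipelineSteps interface, runPipeline, and the progress callback are hypothetical names, and each step is injected so the ordering can be shown (and tested) without a PDF parser or an OpenAI key.

```typescript
// Hypothetical step interface for illustration: NOT the @hazeljs/pdf-to-audio API.
interface PipelineSteps {
  extractText(pdf: Uint8Array): Promise<string>;            // e.g. backed by pdf-parse
  summarize(text: string): Promise<string>;                 // e.g. a GPT chat completion
  chunk(text: string): string[];                            // e.g. a text splitter
  synthesize(chunk: string): Promise<Uint8Array>;           // e.g. one OpenAI TTS call
  store(jobId: string, audio: Uint8Array): Promise<string>; // returns the file path
}

interface PipelineOptions {
  includeSummary: boolean;
  summaryOnly: boolean;
  onProgress?: (percent: number) => void;
}

// Join the per-chunk audio buffers into one contiguous byte array.
function concatBytes(parts: Uint8Array[]): Uint8Array {
  const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

async function runPipeline(
  jobId: string,
  pdf: Uint8Array,
  steps: PipelineSteps,
  opts: PipelineOptions,
): Promise<string> {
  const text = await steps.extractText(pdf);             // 1. Extract
  const parts: string[] = [];
  if (opts.includeSummary || opts.summaryOnly) {
    parts.push(await steps.summarize(text));             // 2. Summarize (optional)
  }
  if (!opts.summaryOnly) parts.push(text);

  const chunks = parts.flatMap((p) => steps.chunk(p));  // 3. Chunk
  const audio: Uint8Array[] = [];
  for (let i = 0; i < chunks.length; i++) {
    audio.push(await steps.synthesize(chunks[i]));      // 4. Convert, one chunk at a time
    opts.onProgress?.(Math.round(((i + 1) / chunks.length) * 100));
  }
  return steps.store(jobId, concatBytes(audio));         // 5. Merge, then 6. Store
}
```

TTS calls run one at a time here, which keeps the request rate predictable and makes progress reporting simple; running a few chunks concurrently would shorten long jobs at the cost of more rate-limit pressure. The merge is plain byte concatenation, matching the "concatenate audio buffers" step listed above.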


Why HazelJS

HazelJS is a TypeScript-first Node.js framework built for AI-native apps. For PDF-to-audio we use:

| Package | Role |
| --- | --- |
| @hazeljs/core | Controllers, DI, file upload |
| @hazeljs/ai | OpenAI TTS + GPT for summaries |
| @hazeljs/queue | BullMQ job queue (Redis-backed) |
| @hazeljs/rag | RecursiveTextSplitter for chunking |

No custom queue wiring or manual chunking—it's all plug-and-play.


Architecture: Async Submit → Status → Download

The API uses a non-blocking flow (in the Quick Start examples, these routes are prefixed with /api/pdf-to-audio):

POST /convert     → 202 { jobId }
GET /status/:id   → { status, progress, message }
GET /download/:id → MP3 file (when completed)

Submit returns immediately; clients poll the status endpoint and download the file once the job has completed. Long jobs (e.g. 33 chunks × ~7 s each ≈ 231 s, roughly 4 min) run in the worker without blocking the API. The worker is configured with lockDuration: 30 min and lockRenewTime: 30 s, so BullMQ doesn't mark these long-running jobs as stalled.
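
On the client side, submit, poll, and download reduce to a loop around the status endpoint. This is a minimal sketch assuming the { status, progress, message } payload shown above; "completed" is the state the download step waits for, while treating "failed" as the error state is an assumption. waitForJob is a hypothetical helper, and the status fetcher is injected so the loop can run without a server.

```typescript
// Status payload as described above; treating "failed" as terminal is an assumption.
interface JobStatus {
  status: string;
  progress?: number;
  message?: string;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: poll until the job completes, fails, or the timeout elapses.
async function waitForJob(
  getStatus: () => Promise<JobStatus>,
  { intervalMs = 2000, timeoutMs = 35 * 60 * 1000 } = {},
): Promise<JobStatus> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const current = await getStatus();
    if (current.status === "completed") return current;
    if (current.status === "failed") {
      throw new Error(`Job failed: ${current.message ?? "no message"}`);
    }
    await sleep(intervalMs);
  }
  throw new Error(`Job did not finish within ${timeoutMs} ms`);
}

// Usage against the REST API (base URL and jobId are placeholders):
// const final = await waitForJob(async () =>
//   (await fetch(`http://localhost:3000/api/pdf-to-audio/status/${jobId}`)).json());
```

The 35-minute default timeout is an arbitrary choice a little above the worker's 30-minute lock duration; tune it to the largest job you expect.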

Physical file storage — Completed audio is written to outputDir (e.g. ./data/pdf-to-audio/{jobId}.mp3) instead of storing base64 in Redis. That reduces memory pressure and makes files easy to serve or archive.
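
Persisting to disk needs nothing beyond Node's fs/promises. In this sketch, audioPath, saveAudio, and loadAudio are hypothetical helpers that follow the {jobId}.mp3 naming above. Because the job id becomes a file name, the sketch also rejects ids containing anything other than letters, digits, _ and -, as a guard against path traversal.

```typescript
import { mkdir, readFile, writeFile } from "node:fs/promises";
import * as path from "node:path";

type AudioFormat = "mp3" | "opus";

// Hypothetical helper: {outputDir}/{jobId}.{format}, refusing ids that could escape outputDir.
function audioPath(outputDir: string, jobId: string, format: AudioFormat = "mp3"): string {
  if (!/^[A-Za-z0-9_-]+$/.test(jobId)) throw new Error(`Unsafe job id: ${jobId}`);
  return path.join(outputDir, `${jobId}.${format}`);
}

// Write the merged audio, creating outputDir on first use.
async function saveAudio(
  outputDir: string,
  jobId: string,
  audio: Uint8Array,
  format: AudioFormat = "mp3",
): Promise<string> {
  await mkdir(outputDir, { recursive: true });
  const file = audioPath(outputDir, jobId, format);
  await writeFile(file, audio);
  return file;
}

// Read a finished file back, e.g. to send it from a download handler.
async function loadAudio(outputDir: string, jobId: string, format: AudioFormat = "mp3"): Promise<Uint8Array> {
  return readFile(audioPath(outputDir, jobId, format));
}
```

For very large files, a download handler would more likely stream the file with fs.createReadStream than load it fully into memory.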


Modes: Full Document vs. Summary Only

| Mode | Behavior | Use case |
| --- | --- | --- |
| Default | AI summary intro + full document read aloud | Audiobook of the whole doc |
| summaryOnly | Only the summary (2–4 sentences) | Quick overview in audio |
| includeSummary: false | Full document only | No intro, just the content |

The summary prompt tells the model to preserve the document's language—no implicit translation.
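
The three modes reduce to two boolean flags. The sketch shows one way to decide which segments are spoken and to phrase a language-preserving summary request. planSegments and buildSummaryMessages are hypothetical, the prompt wording is illustrative rather than the package's actual prompt, and letting summaryOnly win when both flags are set is an assumption the table doesn't settle.

```typescript
interface ModeOptions {
  includeSummary?: boolean; // default true
  summaryOnly?: boolean;    // default false
}

type Segment = "summary" | "document";

// Hypothetical helper: the parts of the output that are read aloud, in order.
function planSegments({ includeSummary = true, summaryOnly = false }: ModeOptions = {}): Segment[] {
  if (summaryOnly) return ["summary"]; // assumption: summaryOnly wins over includeSummary: false
  return includeSummary ? ["summary", "document"] : ["document"];
}

// Illustrative chat messages for the summary request (not the package's actual prompt).
function buildSummaryMessages(text: string): { role: "system" | "user"; content: string }[] {
  return [
    {
      role: "system",
      content:
        "Summarize the following document in 2-4 sentences as a spoken introduction. " +
        "Write the summary in the same language as the document. Do not translate it.",
    },
    { role: "user", content: text },
  ];
}
```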


Quick Start

1. Install

npm install @hazeljs/pdf-to-audio @hazeljs/core @hazeljs/ai @hazeljs/queue @hazeljs/rag ioredis

Requires Redis for the job queue.

2. Add to Your App

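import { HazelModule } from '@hazeljs/core';
import { PdfToAudioModule } from '@hazeljs/pdf-to-audio';

// OPENAI_API_KEY must be set in the environment for TTS and summaries.
@HazelModule({
  imports: [
    PdfToAudioModule.forRoot({
      // Redis connection used by the BullMQ job queue
      connection: {
        host: process.env.REDIS_HOST || 'localhost',
        port: parseInt(process.env.REDIS_PORT || '6379', 10),
      },
      // Completed audio is written here as {jobId}.mp3
      outputDir: './data/pdf-to-audio',
    }),
  ],
})
export class AppModule {}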

3. Use the API

# Submit
curl -X POST http://localhost:3000/api/pdf-to-audio/convert \
  -F "file=@report.pdf" \
  -F "voice=alloy"

# Response: {"jobId":"1"}

# Status
curl http://localhost:3000/api/pdf-to-audio/status/1

# Download (when status is "completed")
curl -o report.mp3 http://localhost:3000/api/pdf-to-audio/download/1
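
The same submit and download calls can be made from TypeScript with the fetch, FormData, and Blob globals available in Node.js 18 and later. This sketch assumes the file and voice form fields and the {"jobId": ...} response shown in the curl example; submitPdf and downloadAudio are hypothetical names, polling between the two calls is left out, and fetch is a parameter so the functions can be tested against a fake.

```typescript
import { writeFile } from "node:fs/promises";

// Base URL from the curl example; adjust for your deployment.
const BASE_URL = "http://localhost:3000/api/pdf-to-audio";

// Submit a PDF; form field names mirror the curl example (file, voice).
async function submitPdf(
  pdf: Blob,
  voice = "alloy",
  fetchImpl: typeof fetch = fetch,
  baseUrl = BASE_URL,
): Promise<string> {
  const form = new FormData();
  form.append("file", pdf, "report.pdf");
  form.append("voice", voice);
  const res = await fetchImpl(`${baseUrl}/convert`, { method: "POST", body: form });
  if (!res.ok) throw new Error(`Submit failed: HTTP ${res.status}`); // expect 202 Accepted
  const { jobId } = (await res.json()) as { jobId: string };
  return jobId;
}

// Save the finished audio; call only once the status endpoint reports "completed".
async function downloadAudio(
  jobId: string,
  outFile: string,
  fetchImpl: typeof fetch = fetch,
  baseUrl = BASE_URL,
): Promise<void> {
  const res = await fetchImpl(`${baseUrl}/download/${encodeURIComponent(jobId)}`);
  if (!res.ok) throw new Error(`Download failed: HTTP ${res.status}`);
  await writeFile(outFile, new Uint8Array(await res.arrayBuffer()));
}
```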

4. Or Use the CLI

# Full document with summary
hazel pdf-to-audio convert report.pdf --api-url http://localhost:3000 --wait -o report.mp3

# Summary only (2–4 sentences)
hazel pdf-to-audio convert report.pdf --api-url http://localhost:3000 --summary-only --wait -o summary.mp3

Options

| Option | Description | Default |
| --- | --- | --- |
| voice | TTS voice (alloy, echo, fable, onyx, nova, shimmer) | alloy |
| model | TTS model (tts-1, tts-1-hd) | tts-1 |
| format | Output format (mp3, opus) | mp3 |
| includeSummary | Add AI summary at the start | true |
| summaryOnly | Output only the summary, skip full document | false |
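
The options table maps onto a typed object with defaults. In this sketch the allowed values are copied from the table, and ConvertOptions and resolveOptions are hypothetical names. Multipart form fields arrive as strings, so boolean flags are parsed from "true"/"false"; that parsing is an assumption about how the API receives them.

```typescript
// Allowed values copied from the options table.
const VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"] as const;
const MODELS = ["tts-1", "tts-1-hd"] as const;
const FORMATS = ["mp3", "opus"] as const;

interface ConvertOptions {
  voice: (typeof VOICES)[number];
  model: (typeof MODELS)[number];
  format: (typeof FORMATS)[number];
  includeSummary: boolean;
  summaryOnly: boolean;
}

const DEFAULTS: ConvertOptions = {
  voice: "alloy",
  model: "tts-1",
  format: "mp3",
  includeSummary: true,
  summaryOnly: false,
};

// Hypothetical helper: merge raw input (e.g. form fields) with defaults, rejecting unknown values.
function resolveOptions(input: Record<string, string | boolean | undefined>): ConvertOptions {
  const pick = <T extends string>(key: string, allowed: readonly T[], fallback: T): T => {
    const value = input[key];
    if (value === undefined) return fallback;
    if (typeof value === "string" && (allowed as readonly string[]).includes(value)) return value as T;
    throw new Error(`Invalid ${key}: ${String(value)}`);
  };
  const flag = (key: string, fallback: boolean): boolean => {
    const value = input[key];
    if (value === undefined) return fallback;
    return value === true || value === "true";
  };
  return {
    voice: pick("voice", VOICES, DEFAULTS.voice),
    model: pick("model", MODELS, DEFAULTS.model),
    format: pick("format", FORMATS, DEFAULTS.format),
    includeSummary: flag("includeSummary", DEFAULTS.includeSummary),
    summaryOnly: flag("summaryOnly", DEFAULTS.summaryOnly),
  };
}
```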

Production Considerations

  • Redis — Required for the queue; run it locally or in a managed service.
  • OpenAI API key — Set OPENAI_API_KEY for TTS and summaries.
  • File cleanup — Files stay in outputDir; add a cron job or TTL if you need automatic cleanup.
  • Long jobs — The worker is tuned for 30-minute lock duration with 30-second renewal; large PDFs are supported.
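
For the cleanup point, a small script run from cron (or on a timer inside the app) can delete audio files past a maximum age. cleanupOldAudio and its seven-day default are hypothetical; files are judged by modification time, and a job whose file has been deleted will no longer be downloadable.

```typescript
import { readdir, stat, unlink } from "node:fs/promises";
import * as path from "node:path";

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical cleanup: delete .mp3/.opus files older than maxAgeMs; returns the deleted paths.
async function cleanupOldAudio(outputDir: string, maxAgeMs = SEVEN_DAYS_MS, now = Date.now()): Promise<string[]> {
  const entries = await readdir(outputDir, { withFileTypes: true }).catch((err: { code?: string }) => {
    if (err.code === "ENOENT") return []; // outputDir not created yet: nothing to clean
    throw err;
  });
  const deleted: string[] = [];
  for (const entry of entries) {
    if (!entry.isFile() || !/\.(mp3|opus)$/.test(entry.name)) continue;
    const file = path.join(outputDir, entry.name);
    const { mtimeMs } = await stat(file);
    if (now - mtimeMs > maxAgeMs) {
      await unlink(file);
      deleted.push(file);
    }
  }
  return deleted;
}

// Example: run hourly inside the app process.
// setInterval(() => void cleanupOldAudio("./data/pdf-to-audio"), 60 * 60 * 1000);
```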

Summary

@hazeljs/pdf-to-audio turns PDFs into audiobooks with:

  • Async job flow — Submit, poll, download
  • Physical file storage — No Redis bloat
  • Summary options — Full doc, summary only, or both
  • Language preservation — Summaries stay in the document's language
  • CLI + REST API — Use from the terminal or integrate into your app

If you're building document-to-audio features, give HazelJS and @hazeljs/pdf-to-audio a try. The example app includes a working setup—clone, run Redis, add your OPENAI_API_KEY, and you're set.
