Muhammad Arslan

Posted on Feb 17 • Edited on Mar 14

Turn PDFs into Audiobooks with HazelJS and OpenAI TTS

#automation #javascript #openai #showdev

How we built a production-ready PDF-to-audio conversion pipeline—async jobs, Redis queues, physical file storage, and summary-only mode.

If you like it, don't forget to star the HazelJS repository.

The Problem: Documents That Speak

Long PDFs—reports, manuals, articles—are hard to consume on the go. Reading on a phone is tedious; listening is natural. We wanted to turn any PDF into an audiobook with minimal setup and production-grade reliability.

The challenges:

Scale — A 50-page document can mean hundreds of TTS API calls
Reliability — Jobs must survive restarts and handle long runtimes (5+ minutes)
Flexibility — Sometimes you want the full doc read aloud; sometimes just a summary
Storage — Audio files should live on disk, not bloat Redis

We built @hazeljs/pdf-to-audio to solve these. Here's how it works and how to use it.

What We Built: PDF-to-Audio Pipeline

The pipeline:

Extract — Pull text from the PDF via pdf-parse
Summarize (optional) — GPT generates a 2–4 sentence intro; preserves source language
Chunk — Split text into TTS-sized segments (≤4096 chars for OpenAI)
Convert — Generate speech per chunk with OpenAI TTS
Merge — Concatenate audio buffers into one MP3
Store — Write the file to disk (default: ./data/pdf-to-audio/)

All of this runs in a BullMQ worker so long conversions don't block your API.
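The Chunk and Merge steps above can be sketched in a few lines. This is a simplified illustration, not the package's implementation: chunkText is a hypothetical greedy splitter standing in for RecursiveTextSplitter, and mergeAudio is a hypothetical helper that concatenates per-chunk audio bytes in order.

```typescript
// Hypothetical helper: greedy splitter that keeps every chunk within maxChars.
// A simplified stand-in for RecursiveTextSplitter, not its actual algorithm.
function chunkText(text: string, maxChars = 4096): string[] {
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > maxChars) {
    const window = rest.slice(0, maxChars);
    // Prefer to cut after the last sentence end, then at the last space,
    // and fall back to a hard cut if the window has neither.
    const sentenceEnd = Math.max(
      window.lastIndexOf(". "),
      window.lastIndexOf("! "),
      window.lastIndexOf("? "),
    );
    const space = window.lastIndexOf(" ");
    const cut = sentenceEnd > 0 ? sentenceEnd + 1 : space > 0 ? space : maxChars;
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

// Hypothetical helper: concatenate per-chunk audio bytes into one buffer,
// preserving chunk order.
function mergeAudio(parts: Uint8Array[]): Uint8Array {
  const total = parts.reduce((sum, part) => sum + part.length, 0);
  const merged = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    merged.set(part, offset);
    offset += part.length;
  }
  return merged;
}
```

Cutting at sentence boundaries where possible keeps each TTS request from starting or ending mid-sentence, which tends to sound more natural at the seams between chunks.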

Why HazelJS

HazelJS is a TypeScript-first Node.js framework built for AI-native apps. For PDF-to-audio we use:

Package	Role
@hazeljs/core	Controllers, DI, file upload
@hazeljs/ai	OpenAI TTS + GPT for summaries
@hazeljs/queue	BullMQ job queue (Redis-backed)
@hazeljs/rag	RecursiveTextSplitter for chunking

No custom queue wiring or manual chunking—it's all plug-and-play.

Architecture: Async Submit → Status → Download

The API uses a non-blocking flow:

POST /convert     → 202 { jobId }
GET /status/:id   → { status, progress, message }
GET /download/:id → MP3 file (when completed)

Submit returns immediately; clients poll status and download when ready. Long jobs (e.g. 33 chunks × ~7s each ≈ 4 min) run in the worker without stalling: we set lockDuration to 30 minutes and lockRenewTime to 30 seconds so BullMQ doesn't mark them as stalled.
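On the client side, the poll step might look like the sketch below. The waitForJob helper, its interval and timeout defaults, and the "failed" status are assumptions for illustration; the article only confirms a "completed" status and the status, progress, and message fields.

```typescript
// Shape of the status response described above; "failed" is an assumed value.
type JobStatus = { status: string; progress?: number; message?: string };

// Hypothetical helper: call getStatus repeatedly until the job reaches a
// terminal state, or give up once the timeout would be exceeded.
async function waitForJob(
  getStatus: () => Promise<JobStatus>,
  intervalMs = 2000,
  timeoutMs = 30 * 60 * 1000,
): Promise<JobStatus> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const current = await getStatus();
    if (current.status === "completed" || current.status === "failed") {
      return current;
    }
    if (Date.now() + intervalMs > deadline) {
      throw new Error("Timed out waiting for job");
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage (assumes the routes from the Quick Start and a global fetch, Node 18+):
// const base = "http://localhost:3000/api/pdf-to-audio";
// const final = await waitForJob(() =>
//   fetch(`${base}/status/1`).then((res) => res.json()),
// );
```

Passing the status call in as a function keeps the loop independent of any HTTP client, so it can be exercised with a stub.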

Physical file storage — Completed audio is written to outputDir (e.g. ./data/pdf-to-audio/{jobId}.mp3) instead of storing base64 in Redis. That reduces memory pressure and makes files easy to serve or archive.
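As a rough sketch of that storage approach (not the package's actual code), the worker can write the merged audio to disk and keep only the resulting path in the job result. The helper names and the job ID validation rule are assumptions; the directory and file naming follow the example path above.

```typescript
import { mkdir, readFile, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Matches the default outputDir mentioned in the article.
const OUTPUT_DIR = "./data/pdf-to-audio";

// Hypothetical helper: map a job ID to its file path.
function audioPath(jobId: string, outputDir = OUTPUT_DIR): string {
  // Allow only simple IDs so a crafted jobId cannot escape outputDir.
  if (!/^[A-Za-z0-9_-]+$/.test(jobId)) {
    throw new Error(`Invalid job id: ${jobId}`);
  }
  return join(outputDir, `${jobId}.mp3`);
}

// Hypothetical helper: write the audio and return the path. The short path
// string is what would go into the job result, not the audio bytes.
async function saveAudio(
  jobId: string,
  audio: Uint8Array,
  outputDir = OUTPUT_DIR,
): Promise<string> {
  await mkdir(outputDir, { recursive: true });
  const path = audioPath(jobId, outputDir);
  await writeFile(path, audio);
  return path;
}

// Hypothetical helper: read the file back, e.g. for the download route.
async function loadAudio(jobId: string, outputDir = OUTPUT_DIR): Promise<Uint8Array> {
  return readFile(audioPath(jobId, outputDir));
}
```

Validating the job ID before building the path matters because the download route takes the ID from the URL.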

Modes: Full Document vs. Summary Only

Mode	Behavior	Use case
Default	AI summary intro + full document read aloud	Audiobook of the whole doc
summaryOnly	Only the summary (2–4 sentences)	Quick overview in audio
includeSummary: false	Full document only	No intro, just the content

The summary prompt tells the model to preserve the document's language—no implicit translation.
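One way to read the table is as a small decision function over the two flags. The sketch below is illustrative only: planSegments and the Segment type are hypothetical names, and letting summaryOnly take precedence when both flags are set is an assumption the article does not spell out.

```typescript
// The two flags from the options table; both are optional.
interface ConvertOptions {
  includeSummary?: boolean;
  summaryOnly?: boolean;
}

type Segment = "summary" | "document";

// Hypothetical helper: decide which parts get spoken, in order.
function planSegments(options: ConvertOptions = {}): Segment[] {
  // Defaults follow the options table: includeSummary true, summaryOnly false.
  const { includeSummary = true, summaryOnly = false } = options;
  // Assumption: summaryOnly wins even if includeSummary is false.
  if (summaryOnly) return ["summary"];
  return includeSummary ? ["summary", "document"] : ["document"];
}

console.log(planSegments());                          // [ 'summary', 'document' ]
console.log(planSegments({ summaryOnly: true }));     // [ 'summary' ]
console.log(planSegments({ includeSummary: false })); // [ 'document' ]
```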

Quick Start

1. Install

npm install @hazeljs/pdf-to-audio @hazeljs/core @hazeljs/ai @hazeljs/queue @hazeljs/rag ioredis

Requires Redis for the job queue.

2. Add to Your App

import { HazelModule } from '@hazeljs/core';
import { PdfToAudioModule } from '@hazeljs/pdf-to-audio';

@HazelModule({
  imports: [
    PdfToAudioModule.forRoot({
      connection: {
        host: process.env.REDIS_HOST || 'localhost',
        port: parseInt(process.env.REDIS_PORT || '6379', 10),
      },
      outputDir: './data/pdf-to-audio',
    }),
  ],
})
export class AppModule {}

3. Use the API

# Submit
curl -X POST http://localhost:3000/api/pdf-to-audio/convert \
  -F "file=@report.pdf" \
  -F "voice=alloy"

# Response: {"jobId":"1"}

# Status
curl http://localhost:3000/api/pdf-to-audio/status/1

# Download (when status is "completed")
curl http://localhost:3000/api/pdf-to-audio/download/1 -o report.mp3

4. Or Use the CLI

# Full document with summary
hazel pdf-to-audio convert report.pdf --api-url http://localhost:3000 --wait -o report.mp3

# Summary only (2–4 sentences)
hazel pdf-to-audio convert report.pdf --api-url http://localhost:3000 --summary-only --wait -o summary.mp3

Options

Option	Description	Default
voice	TTS voice (alloy, echo, fable, onyx, nova, shimmer)	alloy
model	TTS model (tts-1, tts-1-hd)	tts-1
format	Output format (mp3, opus)	mp3
includeSummary	Add AI summary at the start	true
summaryOnly	Output only the summary, skip full document	false

Production Considerations

Redis — Required for the queue; run it locally or in a managed service.
OpenAI API key — Set OPENAI_API_KEY for TTS and summaries.
File cleanup — Files stay in outputDir; add a cron job or TTL if you need automatic cleanup.
Long jobs — The worker is tuned for 30-minute lock duration with 30-second renewal; large PDFs are supported.
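To make the lock settings concrete: BullMQ's worker options take lockDuration and lockRenewTime in milliseconds. The sketch below expresses the values from this article in that unit and checks a rough runtime estimate against the lock window. The estimateJobMs helper and the queue name in the comment are hypothetical.

```typescript
const SECOND = 1000;
const MINUTE = 60 * SECOND;

// The lock settings described in the article, expressed in milliseconds.
const workerOptions = {
  lockDuration: 30 * MINUTE,  // 1_800_000 ms
  lockRenewTime: 30 * SECOND, // 30_000 ms
};

// Hypothetical helper: rough runtime estimate for sequential TTS calls.
function estimateJobMs(chunks: number, secondsPerChunk: number): number {
  return chunks * secondsPerChunk * SECOND;
}

// The example from the architecture section: 33 chunks at about 7 s each.
const estimate = estimateJobMs(33, 7);
console.log(estimate); // 231000 ms, a little under 4 minutes
console.log(estimate < workerOptions.lockDuration); // true

// With BullMQ these would be passed when constructing the worker, e.g.
// new Worker("pdf-to-audio", processor, { connection, ...workerOptions });
// (the queue name "pdf-to-audio" is an assumption).
```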

Summary

@hazeljs/pdf-to-audio turns PDFs into audiobooks with:

Async job flow — Submit, poll, download
Physical file storage — No Redis bloat
Summary options — Full doc, summary only, or both
Language preservation — Summaries stay in the document's language
CLI + REST API — Use from the terminal or integrate into your app

If you're building document-to-audio features, give HazelJS and @hazeljs/pdf-to-audio a try. The example app includes a working setup—clone, run Redis, add your OPENAI_API_KEY, and you're set.
