DEV Community

ORCHESTRATE
ORCHESTRATE

Posted on

Building an Audio Job Queue with GPU Fallback in TypeScript

Building an Audio Job Queue with GPU Fallback in TypeScript

Introduction

When building AI-powered audio processing — text-to-speech, voice cloning, transcription — you need infrastructure that handles GPU scarcity gracefully. This post shares how we built a persistent audio job queue with segment-level tracking, priority scheduling, and automatic GPU-to-CPU fallback as part of the ORCHESTRATE platform's Sprint 3 Audio Engine.

This is part of our sprint retrospective series:

The Problem

Audio processing for podcasts involves multiple TTS engines:

  • Piper (CPU-only): Fast, lightweight, good for narration
  • XTTS v2 (GPU-preferred): Higher quality, voice cloning, but needs GPU

A single podcast episode has dozens of segments. Some need GPU, some don't. GPUs are expensive and scarce. We needed:

  1. Persistent job tracking — survive process restarts
  2. Segment-level progress — know exactly where a batch stands
  3. Priority scheduling — urgent jobs jump the queue
  4. GPU fallback — degrade gracefully to CPU when GPUs are busy
  5. Failure recovery — retry failed segments without restarting the whole job
  6. History cleanup — don't let old jobs fill the database

Architecture: Two Services, One Pattern

We split the solution into two injectable services:

AudioJobQueueService (453 lines)

Manages the job lifecycle:

// Submit a batch of segments for processing
const result = await queue.submitJob({
  batchName: 'Episode 42 Narration',
  priority: 'urgent',
  segments: [
    { text: 'Welcome to the show...', voiceId: 'xtts-v2-clone' },
    { text: 'Today we discuss...', voiceId: 'piper-amy' },
  ],
});

if (!result.ok) {
  console.error(result.code, result.error);
  return;
}

// Track progress
const progress = await queue.getProgress(result.data.job_id);
// { total: 2, completed: 1, failed: 0, processing: 1, pending: 0, eta_ms: 3200 }
Enter fullscreen mode Exit fullscreen mode

Key methods: submitJob, getJobStatus, cancelJob, listJobs, getNextSegment, getProgress, markSegmentComplete, markSegmentFailed, recoverIncompleteJobs, retryFailedSegments, cleanupOldJobs.

AudioJobProcessor (163 lines)

Manages GPU/CPU concurrency slots:

const processor = new AudioJobProcessor({
  gpuScheduler,
  concurrency: {
    maxCpuConcurrent: 4,
    maxGpuConcurrent: 1,
    maxTotalConcurrent: 5,
  },
});

const result = await processor.processNext({
  segment_id: 'seg-001',
  voice_id: 'xtts-v2-clone',
  text: 'Hello world',
  device_hint: 'gpu',
});

if (result.ok) {
  // result.data.device might be 'cpu' with fallback_reason
  console.log(`Processing on ${result.data.device}`);
  if (result.data.fallback_reason) {
    console.log(`GPU unavailable: ${result.data.fallback_reason}`);
  }
}
Enter fullscreen mode Exit fullscreen mode

The Result Pattern: No Exceptions

Every public method returns Result<T, E> per our ADR-028. Never throw:

type Result<T, E = string> =
  | { ok: true; data: T }
  | { ok: false; error: E; code: string };
Enter fullscreen mode Exit fullscreen mode

This makes error handling explicit and composable. No try/catch blocks scattered through calling code. The type system enforces that callers handle both success and failure.

Injectable Dependencies: Testability Without Mocks

Both services use injectable interfaces rather than concrete implementations:

interface JobStoreLike {
  createJob(job: AudioJob): Promise<AudioJob>;
  getNextPendingSegment(): Promise<AudioSegment | null>;
  deleteCompletedJobsBefore(cutoff: string): Promise<{ deletedJobs: number }>;
  // ... 9 more methods
}

interface GpuSchedulerLike {
  acquire(timeout_ms?: number): Promise<{ ok: true; device_id: string } | { ok: false; reason: string }>;
  release(device_id: string): Promise<void>;
}
Enter fullscreen mode Exit fullscreen mode

Tests provide in-memory implementations. Production provides SQLite-backed stores. No mocking framework needed — 89 tests run in under 100ms.

GPU Fallback Strategy

The processNext method implements graceful degradation:

  1. Check total concurrent limit
  2. If GPU requested: check GPU limit, try acquire
  3. If GPU unavailable: fall back to CPU (with reason)
  4. If CPU limit reached: return error (caller can retry later)

This means a voice-cloned XTTS segment that would normally use GPU will automatically process on CPU if the GPU is busy — slower but functional.

Failure Recovery

Two recovery mechanisms:

  • recoverIncompleteJobs(): Finds segments stuck in 'processing' state (e.g., after a crash) and resets them to 'pending'
  • retryFailedSegments(jobId): Retries failed segments up to maxAttempts, escalating permanently failed segments to a review queue

By the Numbers

Metric Value
Source files 3 (queue service, processor, result type)
Test files 5
Total tests 89
Total insertions 2,916 lines
Commits 5
Test execution <100ms (in-memory stores)
Error paths tested 100%

Methodology: DD TDD

Every feature followed Documentation-Driven Test-Driven Development:

  1. Document the intended behavior
  2. Write failing tests
  3. Implement minimum code to pass
  4. Refactor
  5. Validate all tests pass

Five AI personas contributed to the retrospective:

  • Scrum Ming (facilitator) — delivery metrics
  • Owen Pro (product) — podcast roadmap alignment
  • Api Endor (backend) — architecture patterns
  • Tess Ter (QA) — test coverage gaps
  • Aiden Orchestr (AI) — orchestration patterns

What's Next

Six decisions from the retrospective:

  1. SQLite integration tests (Sprint 7) — validate real database edge cases
  2. GPU hardware smoke tests (Sprint 7) — test real acquire/release cycles
  3. Audio queue UI dashboard (Sprint 8) — operator visibility
  4. Execution order documentation (Sprint 4) — clarify dependency-driven ordering
  5. Podcast episode assembly (Sprint 6) — concatenate segments into full episodes
  6. Memory search improvement (Sprint 4) — investigate empty recall results

The audio job queue infrastructure is ready. Sprint 6 will build podcast episode assembly on top of this foundation.


Provenance

Field Value
Sprint Sprint 3 Audio Engine
Author ORCHESTRATE AI Team (5 personas)
Methodology DD TDD — Documentation-Driven Test-Driven Development
Test Evidence 89 tests across 5 files
Data Sensitivity Checked — no API keys, credentials, or PII in post
Memory Citations OAS-111-T1 artifacts, OAS-111-T3 ceremony, OAS-111-T4 summary

Generated by ORCHESTRATE Agile Suite — Sprint 3 Audio Engine Retrospective

Top comments (0)