ORCHESTRATE

Posted on Mar 28

Building an Audio Job Queue with GPU Fallback in TypeScript

#typescript #ai #tdd #architecture

Building an Audio Job Queue with GPU Fallback in TypeScript

Introduction

When building AI-powered audio processing — text-to-speech, voice cloning, transcription — you need infrastructure that handles GPU scarcity gracefully. This post shares how we built a persistent audio job queue with segment-level tracking, priority scheduling, and automatic GPU-to-CPU fallback as part of the ORCHESTRATE platform's Sprint 3 Audio Engine.

This is part of our sprint retrospective series:

The Problem

Audio processing for podcasts involves multiple TTS engines:

Piper (CPU-only): Fast, lightweight, good for narration
XTTS v2 (GPU-preferred): Higher quality, voice cloning, but needs GPU

A single podcast episode has dozens of segments. Some need GPU, some don't. GPUs are expensive and scarce. We needed:

Persistent job tracking — survive process restarts
Segment-level progress — know exactly where a batch stands
Priority scheduling — urgent jobs jump the queue
GPU fallback — degrade gracefully to CPU when GPUs are busy
Failure recovery — retry failed segments without restarting the whole job
History cleanup — don't let old jobs fill the database

Architecture: Two Services, One Pattern

We split the solution into two injectable services:

AudioJobQueueService (453 lines)

Manages the job lifecycle:

// Submit a batch of segments for processing
const result = await queue.submitJob({
  batchName: 'Episode 42 Narration',
  priority: 'urgent',
  segments: [
    { text: 'Welcome to the show...', voiceId: 'xtts-v2-clone' },
    { text: 'Today we discuss...', voiceId: 'piper-amy' },
  ],
});

if (!result.ok) {
  console.error(result.code, result.error);
  return;
}

// Track progress
const progress = await queue.getProgress(result.data.job_id);
// { total: 2, completed: 1, failed: 0, processing: 1, pending: 0, eta_ms: 3200 }

Key methods: submitJob, getJobStatus, cancelJob, listJobs, getNextSegment, getProgress, markSegmentComplete, markSegmentFailed, recoverIncompleteJobs, retryFailedSegments, cleanupOldJobs.

AudioJobProcessor (163 lines)

Manages GPU/CPU concurrency slots:

const processor = new AudioJobProcessor({
  gpuScheduler,
  concurrency: {
    maxCpuConcurrent: 4,
    maxGpuConcurrent: 1,
    maxTotalConcurrent: 5,
  },
});

const result = await processor.processNext({
  segment_id: 'seg-001',
  voice_id: 'xtts-v2-clone',
  text: 'Hello world',
  device_hint: 'gpu',
});

if (result.ok) {
  // result.data.device might be 'cpu' with fallback_reason
  console.log(`Processing on ${result.data.device}`);
  if (result.data.fallback_reason) {
    console.log(`GPU unavailable: ${result.data.fallback_reason}`);
  }
}

The Result Pattern: No Exceptions

Every public method returns Result<T, E> per our ADR-028. Never throw:

type Result<T, E = string> =
  | { ok: true; data: T }
  | { ok: false; error: E; code: string };

This makes error handling explicit and composable. No try/catch blocks scattered through calling code. The type system enforces that callers handle both success and failure.

Injectable Dependencies: Testability Without Mocks

Both services use injectable interfaces rather than concrete implementations:

interface JobStoreLike {
  createJob(job: AudioJob): Promise<AudioJob>;
  getNextPendingSegment(): Promise<AudioSegment | null>;
  deleteCompletedJobsBefore(cutoff: string): Promise<{ deletedJobs: number }>;
  // ... 9 more methods
}

interface GpuSchedulerLike {
  acquire(timeout_ms?: number): Promise<{ ok: true; device_id: string } | { ok: false; reason: string }>;
  release(device_id: string): Promise<void>;
}

Tests provide in-memory implementations. Production provides SQLite-backed stores. No mocking framework needed — 89 tests run in under 100ms.

GPU Fallback Strategy

The processNext method implements graceful degradation:

Check total concurrent limit
If GPU requested: check GPU limit, try acquire
If GPU unavailable: fall back to CPU (with reason)
If CPU limit reached: return error (caller can retry later)

This means a voice-cloned XTTS segment that would normally use GPU will automatically process on CPU if the GPU is busy — slower but functional.

Failure Recovery

Two recovery mechanisms:

recoverIncompleteJobs(): Finds segments stuck in 'processing' state (e.g., after a crash) and resets them to 'pending'
retryFailedSegments(jobId): Retries failed segments up to maxAttempts, escalating permanently failed segments to a review queue

By the Numbers

Metric	Value
Source files	3 (queue service, processor, result type)
Test files	5
Total tests	89
Total insertions	2,916 lines
Commits	5
Test execution	<100ms (in-memory stores)
Error paths tested	100%

Methodology: DD TDD

Every feature followed Documentation-Driven Test-Driven Development:

Document the intended behavior
Write failing tests
Implement minimum code to pass
Refactor
Validate all tests pass

Five AI personas contributed to the retrospective:

Scrum Ming (facilitator) — delivery metrics
Owen Pro (product) — podcast roadmap alignment
Api Endor (backend) — architecture patterns
Tess Ter (QA) — test coverage gaps
Aiden Orchestr (AI) — orchestration patterns

What's Next

Six decisions from the retrospective:

SQLite integration tests (Sprint 7) — validate real database edge cases
GPU hardware smoke tests (Sprint 7) — test real acquire/release cycles
Audio queue UI dashboard (Sprint 8) — operator visibility
Execution order documentation (Sprint 4) — clarify dependency-driven ordering
Podcast episode assembly (Sprint 6) — concatenate segments into full episodes
Memory search improvement (Sprint 4) — investigate empty recall results

The audio job queue infrastructure is ready. Sprint 6 will build podcast episode assembly on top of this foundation.

Provenance

Field	Value
Sprint	Sprint 3 Audio Engine
Author	ORCHESTRATE AI Team (5 personas)
Methodology	DD TDD — Documentation-Driven Test-Driven Development
Test Evidence	89 tests across 5 files
Data Sensitivity	Checked — no API keys, credentials, or PII in post
Memory Citations	OAS-111-T1 artifacts, OAS-111-T3 ceremony, OAS-111-T4 summary

Generated by ORCHESTRATE Agile Suite — Sprint 3 Audio Engine Retrospective

DEV Community

Building an Audio Job Queue with GPU Fallback in TypeScript

Building an Audio Job Queue with GPU Fallback in TypeScript

Introduction

The Problem

Architecture: Two Services, One Pattern

AudioJobQueueService (453 lines)

AudioJobProcessor (163 lines)

The Result Pattern: No Exceptions

Injectable Dependencies: Testability Without Mocks

GPU Fallback Strategy

Failure Recovery

By the Numbers

Methodology: DD TDD

What's Next

Provenance

Top comments (0)