Umesh Malik

Posted on Jun 12 • Edited on Jul 20 • Originally published at umesh-malik.com

How I Built a Full Audio/Video Streaming Microservice in One Day with Claude Fable 5 Auto Mode

#claudefable5 #claudecode #aicodingagents #aws

Last week I did something I would have called irresponsible a year ago: I handed an entire production microservice — a full audio/video streaming service with real-time delivery, AWS infrastructure, security, CI/CD, and data migrations — to Claude Fable 5 running in auto mode, and I shipped it the same day.

Not a prototype. Not a demo repo with a TODO: auth comment. A service that today sits in front of every video and audio file my other services used to serve as raw, full-size downloads — now delivered as adaptive HLS streams with signed playback, sub-second startup, and a live-streaming path.

My total contribution was a written brief and 17 design decisions. Fable wrote the PRD, the TRD, the high-level and low-level designs, the Terraform, the backend, the frontend player, the GitHub Actions pipelines, the migration scripts that transcoded my existing media library, and the runbooks my other services now use to integrate. If you searched for how to build a microservice with an AI agent, or you're wondering whether Claude Fable 5 auto mode is actually different from babysitting an autocomplete — this is the full, honest teardown.

TL;DR

Claude Fable 5 in auto mode built a complete HLS audio/video streaming microservice in under a day — AWS resources, security hardening, backend, frontend player, CI/CD, and integration runbooks included.
It wrote PRD → TRD → HLD → LLD before writing a single line of code, and paused only to ask me real design questions: HLS vs DASH, signed cookies vs signed URLs, IVS vs MediaLive for the realtime path.
The architecture it chose is boring in the best way: S3 + MediaConvert + CloudFront signed cookies for on-demand media, AWS IVS for real-time streams, a small Lambda control plane, all provisioned by Terraform.
It also wrote idempotent migration scripts that transcoded my entire back catalog of raw videos and audio files — with a DynamoDB state table, concurrency caps, and verification.
My job changed from typing code to making decisions and reviewing diffs. That's the actual story of agentic coding in 2026: the bottleneck moved from implementation to judgment.

What "Auto Mode" Means (and What It Doesn't)

Claude Fable 5 is Anthropic's newest model tier — the Mythos-class model that sits above Opus — and inside Claude Code, "auto mode" means the agent runs the full loop autonomously: it plans, edits files, runs commands, executes tests, deploys infrastructure, and keeps going until the task is done or it genuinely needs a human decision. You're not approving every tool call. You're not pasting code between windows. You set the destination and the constraints; the agent drives.

The crucial nuance: auto mode is not "no human input." It's minimal, high-leverage human input. Across nine hours, Fable stopped me 17 times — and every single stop was a question only I could answer: a product trade-off, a cost ceiling, a security posture choice. It never once asked me "how do I configure CloudFront?" It asked me "do you want playback URLs to be shareable, or locked to the session?" Those are very different questions, and the second one is the one that actually deserves my time.

💡 Key insight: The quality bar of an agentic build is set by the quality of the questions the agent asks you. Fable 5 asks product questions, not syntax questions — that's the generational difference.

The Starting Point: Raw Files Pretending to Be Streaming

Here's the embarrassing "before" picture, because every good case study needs one.

I run several services that handle user-facing media — course recordings, podcast-style audio, screen captures. The original implementation was the one every team ships first: media files uploaded to S3, served back through the app as raw, full-size files. A GET /files/{id} endpoint, a presigned URL, and an HTML <video> tag pointed at a 400 MB MP4.

It worked, in the way a tent works as a house. Here's the same media library, before and after the one-day build:

The fix is well understood: transcode to HLS (HTTP Live Streaming) with an adaptive bitrate ladder, serve segments from a CDN, sign playback, and use a managed service for live. The reason I hadn't done it: done properly, it's a solid two-to-three week project across infra, backend, frontend, and a scary migration of existing content. That estimate is what Fable 5 deleted.
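
To make "adaptive bitrate ladder" concrete, here is a minimal sketch of the master playlist an HLS player reads before it picks a rendition. The `Rendition` type, the `buildMasterPlaylist` helper, and every bitrate, codec string, and path in `exampleLadder` are illustrative assumptions, not values from the actual service.

```typescript
// One rendition in the adaptive ladder. All numbers below are illustrative.
interface Rendition {
  label: string;        // human-readable name, e.g. "720p"
  bandwidthBps: number; // peak bitrate advertised to the player
  width?: number;       // omitted for audio-only renditions
  height?: number;
  codecs: string;       // RFC 6381 codec string
  playlistUri: string;  // media playlist path, relative to the master playlist
}

// Build an HLS master playlist that lists every rendition in caller order.
function buildMasterPlaylist(ladder: Rendition[]): string {
  const lines: string[] = ["#EXTM3U", "#EXT-X-VERSION:3"];
  for (const r of ladder) {
    const attrs: string[] = [`BANDWIDTH=${r.bandwidthBps}`];
    if (r.width !== undefined && r.height !== undefined) {
      attrs.push(`RESOLUTION=${r.width}x${r.height}`);
    }
    attrs.push(`CODECS="${r.codecs}"`);
    lines.push(`#EXT-X-STREAM-INF:${attrs.join(",")}`);
    lines.push(r.playlistUri);
  }
  return lines.join("\n") + "\n";
}

// Hypothetical ladder: three video renditions plus an audio-only fallback.
const exampleLadder: Rendition[] = [
  { label: "1080p", bandwidthBps: 5000000, width: 1920, height: 1080, codecs: "avc1.640028,mp4a.40.2", playlistUri: "1080p/index.m3u8" },
  { label: "720p", bandwidthBps: 2800000, width: 1280, height: 720, codecs: "avc1.64001f,mp4a.40.2", playlistUri: "720p/index.m3u8" },
  { label: "480p", bandwidthBps: 1200000, width: 854, height: 480, codecs: "avc1.4d401e,mp4a.40.2", playlistUri: "480p/index.m3u8" },
  { label: "audio", bandwidthBps: 128000, codecs: "mp4a.40.2", playlistUri: "audio/index.m3u8" },
];
```

The player downloads this small text file first, then switches between the listed media playlists as bandwidth changes. That switching is what replaces the single 400 MB download.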

The One-Day Timeline

Here's how the day actually broke down. Times are approximate; the shape is exact.

Docs Before Code: The Part Everyone Skips

This is the section I most want you to steal, because it's the highest-leverage behavior in the whole workflow — and it costs nothing.

Before touching code, Fable wrote four documents into the repo, in order, and made me approve each one:

PRD (Product Requirements Document) — what the service does and for whom: VOD playback, live streams, integration contract for sibling services, explicit non-goals (no DRM in v1, no user-generated live streams).
TRD (Technical Requirements Document) — the measurable bar: time-to-first-frame under 1 second on broadband, live glass-to-glass latency under 5 seconds, playback URLs unusable after expiry, migration must be resumable and verifiable.
HLD (High-Level Design) — the architecture diagram, the AWS services chosen and the ones rejected (with reasons), data flow for upload, playback, live, and migration.
LLD (Low-Level Design) — DynamoDB key design, every API route with request/response shapes, IAM policy boundaries per function, error taxonomy, the exact MediaConvert ladder.
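
As an illustration of what a single-table key design in an LLD can look like, here is a minimal sketch. The `ASSET#`, `SESSION#`, and `MIGRATION#` prefixes and all helper names are hypothetical; the post does not show the actual key schema.

```typescript
// Partition key and sort key for one item in a single DynamoDB table.
interface TableKey {
  pk: string;
  sk: string;
}

// Hypothetical layout: asset metadata and its playback sessions share a partition.
function assetMetaKey(assetId: string): TableKey {
  return { pk: `ASSET#${assetId}`, sk: "META" };
}

function playbackSessionKey(assetId: string, sessionId: string): TableKey {
  return { pk: `ASSET#${assetId}`, sk: `SESSION#${sessionId}` };
}

// Migration state lives in the same table under its own prefix.
function migrationStateKey(legacyObjectKey: string): TableKey {
  return { pk: `MIGRATION#${legacyObjectKey}`, sk: "STATE" };
}

type EntityKind = "asset" | "session" | "migration" | "unknown";

// Recover the entity type from a key, e.g. when processing a table stream.
function entityKindOf(key: TableKey): EntityKind {
  if (key.pk.indexOf("MIGRATION#") === 0) return "migration";
  if (key.pk.indexOf("ASSET#") === 0) {
    if (key.sk === "META") return "asset";
    if (key.sk.indexOf("SESSION#") === 0) return "session";
  }
  return "unknown";
}
```

Writing the prefixes down at design time is what makes them reviewable: a key collision or a missing access pattern shows up in a short document rather than in production data.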

If you've read my piece on spec-driven development with AI agents, you know I'm already convinced specs are the steering wheel for agentic coding. What Fable 5 adds is that the agent now writes the spec itself and interrogates you against it. The TRD review is where I caught the one thing I'd have regretted: the first draft proposed signed URLs per segment. I pushed back — per-segment signing breaks CDN cache efficiency — and Fable switched the design to CloudFront signed cookies scoped to a playback session, then updated the LLD and the threat notes to match, unprompted.

💡 Key insight: Reviewing a 2-page TRD takes 10 minutes and catches architecture mistakes. Reviewing 4,000 lines of generated code to find the same mistake takes a day. Auto mode works because of the documents, not despite them.

The Architecture Fable Built

The short answer: a serverless control plane around managed media services. Fable's HLD argued — correctly — that in 2026 you should not be running your own transcoders or packagers for this workload class, and every component it picked is the boring, durable choice:

S3 (two private buckets) — a mezzanine bucket for original uploads and a packaged bucket for HLS output. Both with Block Public Access on, KMS encryption at rest, and lifecycle rules that expire failed multipart uploads.
AWS Elemental MediaConvert — VOD transcoding. Each video becomes an adaptive ladder (1080p / 720p / 480p / audio-only) of HLS segments; audio files become segmented HLS audio so podcasts get the same instant-seek behavior as video.
EventBridge — glue. ObjectCreated on the mezzanine bucket triggers the transcode orchestrator; MediaConvert job-state changes flow back to update asset status. No polling anywhere.
CloudFront with Origin Access Control + signed cookies — the only public face of media. The packaged bucket is unreachable except through the CDN, and the CDN only serves you with a valid short-lived cookie.
A small Lambda + API Gateway control plane — four routes: request an upload (presigned multipart), check asset status, create a playback session (the integration contract), and create a live channel.
AWS IVS (Interactive Video Service) — the real-time path. Managed RTMPS ingest, 2–5 second latency, an HLS-compatible playback URL that drops into the same player. Fable's HLD explicitly rejected MediaLive for v1 as cost- and ops-overkill, which matched my instinct exactly.
DynamoDB — asset and session metadata, single-table, with the migration state tracked in the same table under its own key prefix.
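
The "no polling" part of the EventBridge glue can be sketched as a pure function that turns a MediaConvert job state change event into an asset status update. The event fields shown (`source`, `detail-type`, `detail.status`, `detail.jobId`, `detail.userMetadata`) follow the shape MediaConvert publishes to EventBridge; the `assetId` metadata key, the `AssetStatus` values, and the function name are assumptions for illustration.

```typescript
// Minimal shape of the EventBridge event MediaConvert emits on job state changes.
interface MediaConvertJobEvent {
  source: "aws.mediaconvert";
  "detail-type": "MediaConvert Job State Change";
  detail: {
    status: string; // e.g. "PROGRESSING", "COMPLETE", "ERROR"
    jobId: string;
    // Assumption: the orchestrator attached assetId as user metadata at job creation.
    userMetadata?: Record<string, string>;
  };
}

// Hypothetical status values stored on the asset record.
type AssetStatus = "transcoding" | "ready" | "failed";

interface AssetUpdate {
  assetId: string;
  status: AssetStatus;
}

// Map a job event to an asset update, or null when there is nothing to record.
function toAssetUpdate(ev: MediaConvertJobEvent): AssetUpdate | null {
  const assetId = ev.detail.userMetadata?.assetId;
  if (!assetId) return null; // not a job this service created
  switch (ev.detail.status) {
    case "PROGRESSING":
      return { assetId, status: "transcoding" };
    case "COMPLETE":
      return { assetId, status: "ready" };
    case "ERROR":
    case "CANCELED":
      return { assetId, status: "failed" };
    default:
      return null; // informational statuses such as STATUS_UPDATE are ignored
  }
}
```

A Lambda subscribed to these events would call something like `toAssetUpdate` and write the result to the asset table, so asset status changes arrive as pushed events instead of being polled.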

The integration contract is the part my other services care about, and it's one endpoint:

// Before: every service hand-rolled raw file access
const url = await getPresignedUrl(fileId); // 400 MB MP4, good luck

// After: one call, any service, audio or video, VOD or live
const session = await streamSvc.createPlaybackSession({
  assetId,
  viewerId,            // bound to the session, not shareable
  expiresIn: 3600
});
// → { manifestUrl: "https://media.example.com/hls/{assetId}/master.m3u8",
//     cookies: { "CloudFront-Policy": "...", "CloudFront-Signature": "...",
//                "CloudFront-Key-Pair-Id": "..." } }

Raw-file serving didn't just get faster — it got deleted as a concept. There is no code path left that hands a full original file to a browser.
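
For readers wondering what sits behind the `CloudFront-Policy` cookie, here is a minimal sketch of the custom policy document a playback session could be scoped with. The JSON layout (`Statement`, `Resource`, `DateLessThan`, `AWS:EpochTime`) is CloudFront's custom policy format; the helper name, the input type, and the per-asset wildcard path are assumptions based on the manifest URL shown above. Signing and cookie encoding are deliberately left out.

```typescript
// Inputs for one playback session. Names are hypothetical.
interface PlaybackPolicyInput {
  cdnDomain: string;        // e.g. "media.example.com"
  assetId: string;
  nowEpochSeconds: number;  // injected so the function stays deterministic
  expiresInSeconds: number; // e.g. 3600
}

// Build the CloudFront custom policy that covers every HLS object of one asset.
// One policy (and one signature) is valid for the manifest and all segments,
// so individual segment URLs never need their own signature.
function buildCustomPolicy(input: PlaybackPolicyInput): string {
  const policy = {
    Statement: [
      {
        Resource: `https://${input.cdnDomain}/hls/${input.assetId}/*`,
        Condition: {
          DateLessThan: {
            "AWS:EpochTime": input.nowEpochSeconds + input.expiresInSeconds,
          },
        },
      },
    ],
  };
  // JSON.stringify emits no whitespace, which is what gets signed.
  return JSON.stringify(policy);
}
```

In a real control plane this string would be signed with the CloudFront key pair and returned alongside the signature and key pair ID cookies. A CloudFront policy can restrict by time and IP address but knows nothing about a `viewerId`, so any viewer binding would have to be enforced by the service that issues the session.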

The security work I didn't have to ask for

I gave Fable one sentence of security direction: "private by default, no long-lived credentials, signed playback." Here's what it derived from that sentence:

I still ran my own review pass and an automated code review over the diff — trust but verify is doing heavy lifting in this workflow — and the review came back with style nits, not security findings.

The Migration: Transcoding an Existing Library Without Fear

New architectures are easy; old data is where projects go to die. I had years of raw media sitting in the legacy bucket, all of it needing transcoding into HLS, none of it allowed to break while users were actively consuming it.

Fable's migration design treated the transcode of the back catalog as a resumable, verifiable batch job, not a script you run and pray over:

The detail that sold me: Fable added a --dry-run flag and a cost estimate to the queue stage before I asked, because the TRD it had written contained a cost ceiling — so it treated "don't surprise me on the bill" as a requirement to implement, not a vibe. Eleven assets failed verification on the first pass (corrupted sources, one mislabeled codec). They were exactly the kind of thing a hand-rolled for loop over aws s3 ls would have silently butchered.
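
A dry run with a cost ceiling can be as small as the sketch below. It assumes a single flat price per output minute, which is a simplification of real transcoding pricing; the function name, the types, and every number passed in are hypothetical.

```typescript
// One item from the migration inventory.
interface QueuedAsset {
  assetId: string;
  durationMinutes: number;
}

interface DryRunReport {
  assetCount: number;
  totalOutputMinutes: number;
  estimatedCostUsd: number;
  withinCeiling: boolean;
}

// Estimate transcode cost without submitting any jobs.
// Assumption: each asset produces `renditionsPerAsset` outputs of equal duration,
// all billed at the same flat `usdPerOutputMinute` rate.
function dryRunEstimate(
  assets: QueuedAsset[],
  renditionsPerAsset: number,
  usdPerOutputMinute: number,
  ceilingUsd: number
): DryRunReport {
  let totalOutputMinutes = 0;
  for (const asset of assets) {
    totalOutputMinutes += asset.durationMinutes * renditionsPerAsset;
  }
  const estimatedCostUsd = totalOutputMinutes * usdPerOutputMinute;
  return {
    assetCount: assets.length,
    totalOutputMinutes,
    estimatedCostUsd,
    withinCeiling: estimatedCostUsd <= ceilingUsd,
  };
}
```

The useful property is that the queue stage can refuse to start when `withinCeiling` is false, which turns the cost ceiling into an enforced check.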

Steal this pattern even without an AI agent
Inventory → capped queue → verify → dual-read cutover, with state in a table instead of your terminal scrollback. It is the difference between a migration and an incident.
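
The pattern above can be sketched with a plain object standing in for the state table. This is a minimal illustration under stated assumptions, not the migration scripts from the build: the state names, function names, and in-memory `StateTable` are all hypothetical, and a real version would use conditional writes against DynamoDB.

```typescript
type MigrationState = "pending" | "in_flight" | "verified" | "failed";

// In-memory stand-in for the state table, keyed by legacy object key.
type StateTable = Record<string, MigrationState>;

// Inventory: register unseen objects only, so re-running never resets progress.
function inventory(table: StateTable, sourceKeys: string[]): void {
  for (const key of sourceKeys) {
    if (!Object.prototype.hasOwnProperty.call(table, key)) {
      table[key] = "pending";
    }
  }
}

// Capped queue: claim pending items without exceeding the in-flight limit.
function claimNextBatch(table: StateTable, maxInFlight: number): string[] {
  const keys = Object.keys(table);
  let inFlight = keys.filter((k) => table[k] === "in_flight").length;
  const claimed: string[] = [];
  for (const key of keys) {
    if (inFlight >= maxInFlight) break;
    if (table[key] === "pending") {
      table[key] = "in_flight";
      claimed.push(key);
      inFlight += 1;
    }
  }
  return claimed;
}

// Verify: record the outcome of checking the transcoded output.
function recordVerification(table: StateTable, key: string, passed: boolean): void {
  table[key] = passed ? "verified" : "failed";
}

// Dual-read cutover: serve HLS only for verified assets, legacy otherwise.
function playbackSource(table: StateTable, key: string): "hls" | "legacy" {
  return table[key] === "verified" ? "hls" : "legacy";
}
```

Because every stage reads and writes the table, the job can be stopped and restarted at any point, and a failed asset stays visible as `failed` until someone requeues it.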

CI/CD and Runbooks: The Unsexy 20% That Makes It Real

A microservice without a pipeline is a liability with good intentions. Fable shipped both halves of operability the same afternoon:

GitHub Actions: on PR — lint, type-check, unit tests, terraform plan posted as a PR comment; on merge — gated terraform apply, Lambda deploy, and a canary that creates a real playback session against production and fails the deploy if time-to-first-byte on the manifest regresses.
Runbooks in the repo (/runbooks): Onboarding a new service (the playback-session contract, with copy-paste client code), Live stream operations (create channel, rotate stream key, end-of-stream archive), Transcode failure triage (where MediaConvert errors land, how to requeue one asset), and Cost monitoring (the CloudWatch dashboard + budget alarms it provisioned).
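
The canary's pass/fail decision can be sketched as a comparison of measured manifest time-to-first-byte against a stored baseline. The median-based rule, the tolerance parameter, and all names here are assumptions; the post does not describe how the real canary defines a regression.

```typescript
// Median of a non-empty list of samples.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("need at least one sample");
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? (sorted[mid] as number)
    : ((sorted[mid - 1] as number) + (sorted[mid] as number)) / 2;
}

interface CanaryVerdict {
  medianMs: number;
  limitMs: number;
  pass: boolean;
}

// Fail the deploy when median TTFB exceeds the baseline by more than `tolerance`
// (a fraction, e.g. 0.2 allows a 20% slowdown before failing).
function canaryVerdict(samplesMs: number[], baselineMs: number, tolerance: number): CanaryVerdict {
  const medianMs = median(samplesMs);
  const limitMs = baselineMs * (1 + tolerance);
  return { medianMs, limitMs, pass: medianMs <= limitMs };
}
```

In a pipeline, the samples would come from timing real manifest requests against a fresh playback session, and a failing verdict would exit non-zero to stop the deploy.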

The runbooks are why integrating my first two services took an hour instead of a week of Slack archaeology. The third service was onboarded by a teammate without talking to me at all — they read the runbook Fable wrote, called one endpoint, and shipped. That's the real productivity story: the agent didn't just write code, it wrote down how to use the code, which is the part humans chronically skip.

What I Actually Did All Day

Let's be precise about the human role, because this is where most coverage of agentic coding gets hand-wavy. Every one of my 17 inputs was a product trade-off, a cost ceiling, or a risk-tolerance call — never a technical how-to. These five carried the most weight:

Plus the review passes: each design doc, the Terraform plan before the first apply, the IAM policies line by line, and the final diff. Call it two hours of genuine attention across the nine.

Notice what's not on the list: I never wrote a handler, never created a resource in the AWS console (my single console visit was confirming the budget alarm fired during a test), never debugged a YAML indentation error. Every hour I spent was on decisions that needed my context — which is exactly the division of labor I've been arguing the agent landscape was heading toward. It also confirmed something smaller but important: a well-maintained CLAUDE.md with your conventions is the cheapest force multiplier in this whole setup — Fable followed my repo conventions because they were written down where it looks.

Where It Stumbled (Because Nothing Is Magic)

Honesty section. Three real friction points:

First MediaConvert ladder was over-provisioned. The draft included a 4K rendition my content doesn't have sources for. Caught in LLD review — a 30-second fix at doc stage, but it would have quietly doubled transcode costs if I'd rubber-stamped it.
One quota assumption. Fable assumed default MediaConvert concurrent-job quotas; my account had a lower legacy limit. The migration's capped queue absorbed it gracefully (jobs just drained slower), and Fable filed the quota-increase request when the throttling showed up in logs — but it discovered the limit by hitting it, not by checking first.
It's only as good as your brief. I forgot to mention that some legacy audio used a deprecated codec. The verification stage caught every affected file, but a better inventory in my brief would have saved a requeue cycle.

None of these are "AI wrote bad code" stories. They're the same integration realities any senior engineer hits — the difference is the system was designed (by the agent, in the TRD) to surface them loudly instead of corrupting silently.

Best Practices: How to Run an Auto-Mode Build

FAQ

Final Take

The headline isn't "AI wrote a lot of code fast." Code generation has been cheap for two years. The headline is that Claude Fable 5 ran an engineering *process* — requirements, design review, infrastructure, security posture, migration safety, operations docs — and the process is what made one-day delivery survivable instead of reckless.

My role didn't shrink; it concentrated. Seventeen decisions and two hours of review were the entire human footprint, but they were the seventeen decisions that determine whether this service is still standing in two years. That's the trade every senior engineer should want.

If you're going to try this, start with the playbook above and a service you actually need — not a toy. The toys don't force the migration, the security review, or the runbooks, and those are exactly where auto mode earns its keep.

If you found this useful, read spec-driven development with AI agents next — it's the methodology that makes builds like this one repeatable instead of lucky.


