Despite how much I post online, I'm pretty deliberate about where my data actually lives. Not in a tinfoil-hat way; I use cloud services and I have accounts everywhere. But there's a difference between choosing to share something and having no other option.
I started a podcast. Suddenly I needed to think about marketing, which means Shorts, Reels, vertical cuts, captions. The tools that do this are good. Genuinely good. But they all run on the same model: your files live on their servers, your workflow lives in their interface, and your access lives on their billing page. Cancel your subscription and you're starting over. Want to export your project? Hope they support that. Want to know what they're doing with your footage? Read the terms of service and good luck.
I'm not interested in that trade. I want to own my pipeline the same way I own my code. So I built a POC called DarkRoom.
Local-First Isn't Just a Preference, It's an Architecture
There's a growing conversation in software about local-first design, the idea that your data should live on your device by default and only sync or share when you explicitly choose to. Tools like Obsidian, Excalidraw, and a wave of newer apps are built around this. Your device is the source of truth, not some server you're renting space on.
The content creation space hasn't caught up. Almost everything assumes the cloud is the primary location and your device is just a thin client. That's great for collaboration. But it also means:
- You're paying rent on your own data
- Your workflow can disappear when a company pivots, gets acquired, or raises prices
- You have no real visibility into how your content is used after upload
DarkRoom is an attempt to apply local-first thinking to podcast production. The processing happens on your machine. The files are yours. The workflow is just a Python server and some JSON. You can read every artifact it produces in a text editor, put it in git, move it to another machine, or delete it. No account required. No export wizard. No "your project will be deleted in 30 days."
The only external dependency is optional: one Claude API call per episode, text in and edit decisions out, which you pay for directly at fractions of a cent per use.
What is DarkRoom?
DarkRoom is a local-first, multi-camera podcast editor. You give it your raw camera files, it transcribes them on your machine, an AI generates an edit decision list, you review and tweak it in the browser, and FFmpeg renders the final outputs. Nothing leaves your machine except that one, optional, text-only API call.
The Stack
| Layer | Tech |
|---|---|
| Backend | Python + Flask |
| Transcription | OpenAI Whisper (runs locally) |
| AI editing decisions | Claude (claude-sonnet-4-6) |
| Rendering | FFmpeg |
| Frontend | Vanilla HTML/JS/CSS, one file |
| State | JSON in a projects/ folder |
No database. No Docker. No SaaS. No monthly fee. Just a Python server you run on your own machine, using open tools that have been around for years.
How It Actually Works
Step 1 - Upload your cameras
You record with 2 to 4 cameras, one per speaker, already synced. Upload each file and tag it with a speaker name.
Step 2 - Local transcription via Whisper
Whisper runs locally on each camera's audio. Because you're uploading one file per speaker, attribution is free. Whisper doesn't need to guess who said what. The result is a merged, time-coded transcript where every line already knows which camera it came from.
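For flavor, here's roughly what that step looks like in code. This is a sketch, not DarkRoom's actual implementation: it assumes the `openai-whisper` package, and the function names are mine.

```python
def merge_transcripts(per_camera):
    """Interleave per-camera segment lists into one time-ordered transcript.

    per_camera: {"Alice": [{"start": ..., "end": ..., "text": ...}, ...], ...}
    Attribution is free because each camera file belongs to exactly one speaker.
    """
    merged = [dict(seg, speaker=spk)
              for spk, segs in per_camera.items()
              for seg in segs]
    return sorted(merged, key=lambda s: s["start"])

def transcribe_cameras(files_by_speaker, model_name="base"):
    """Run Whisper locally on each camera file, then merge by start time."""
    import whisper  # imported here so merge_transcripts works without it installed
    model = whisper.load_model(model_name)  # downloads once, then runs fully locally
    per_camera = {spk: model.transcribe(path)["segments"]
                  for spk, path in files_by_speaker.items()}
    return merge_transcripts(per_camera)
```

The merge is the interesting part: because every segment arrives pre-tagged with its speaker, a plain sort by start time produces the attributed, time-coded transcript.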
Step 3 - AI generates the Edit Decision List
This is the only moment anything leaves your machine. The transcript text gets sent to Claude and Claude returns a JSON Edit Decision List (EDL):
```json
{
  "segments": [
    { "id": "seg_001", "start": 0.0, "end": 12.4, "keep": true, "camera": "A", "layout": "single" },
    { "id": "seg_002", "start": 12.4, "end": 15.1, "keep": false, "camera": "A", "reason": "filler words" }
  ],
  "clips": [
    { "id": "clip_001", "label": "Best 60-90s clip for Shorts", "start": 4.2, "end": 94.2, "reason": "Strong hook, complete thought" }
  ]
}
```
It tells FFmpeg what to keep, what to cut, which camera to show when, and which chunk would make the best Short. Plain JSON. Read it, edit it by hand, version control it, diff it between episodes.
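And because it's just JSON, you can poke at it with a few lines of Python. A hypothetical helper (mine, not part of DarkRoom) that reports what a cut keeps:

```python
import json

def load_edl(path):
    """An EDL file is plain JSON on disk; loading it is one call."""
    with open(path) as f:
        return json.load(f)

def edl_summary(edl):
    """Summarize an EDL dict: kept/cut runtime and which cameras appear."""
    kept = sum(s["end"] - s["start"] for s in edl["segments"] if s["keep"])
    cut = sum(s["end"] - s["start"] for s in edl["segments"] if not s["keep"])
    cameras = sorted({s["camera"] for s in edl["segments"] if s["keep"]})
    return {
        "kept_seconds": round(kept, 2),
        "cut_seconds": round(cut, 2),
        "cameras": cameras,
    }
```

That's the whole point of the format: any script, CI job, or one-off shell session can inspect an episode's edit without going through the app.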
Step 4 - Review in the browser
A UI served from localhost shows camera previews, per-speaker timeline tracks, and a transcript panel synced to the timeline. Toggle cuts, swap cameras, change layouts. The EDL is just a file on disk so nothing is locked away.
Step 5 - Render
Pick your targets: 16:9 full edit, 9:16 vertical, or the best 60 to 90 second Short. FFmpeg renders them into projects/{id}/output/. They're yours.
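Under the hood, this kind of render reduces to trimming the kept segments and concatenating them. Here's a simplified sketch of building that FFmpeg command. The flags are standard FFmpeg (`-ss`/`-to` input trimming plus the `concat` filter), but DarkRoom's real pipeline may assemble its filter graph differently:

```python
def segment_args(src, seg):
    """Input-trim one kept segment: seek to start, stop at end, read from src."""
    return ["-ss", str(seg["start"]), "-to", str(seg["end"]), "-i", src]

def build_cut_command(sources, edl, out="output/full_16x9.mp4"):
    """Turn an EDL into one ffmpeg invocation.

    sources: {"A": "cam_a.mp4", ...} maps camera labels to files.
    Each kept segment becomes a trimmed input; the concat filter joins them.
    """
    cmd, n = ["ffmpeg", "-y"], 0
    for seg in edl["segments"]:
        if not seg["keep"]:
            continue
        cmd += segment_args(sources[seg["camera"]], seg)
        n += 1
    pads = "".join(f"[{i}:v][{i}:a]" for i in range(n))
    cmd += ["-filter_complex", f"{pads}concat=n={n}:v=1:a=1[v][a]",
            "-map", "[v]", "-map", "[a]", out]
    return cmd
```

Cut segments simply never become inputs, so the render only touches the footage that survives the edit.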
The Ownership Model
- Your files stay local. Whisper runs on your machine, video never goes anywhere.
- Your data is readable. Every project is a folder of MP4s and JSON files.
- Your workflow is portable. No account, no export flow, no "download before you cancel."
- Your cost is marginal. One Claude API call per episode, typically under $0.10. Pay per use, not per month.
- Your stack is replaceable. Want to swap Claude for a local LLM? That's one file. Want to parse the EDL yourself? It's JSON. You're not locked into anything.
Running It
```shell
git clone https://github.com/you/darkroom
cd darkroom
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# add ANTHROPIC_API_KEY to .env
python app.py
```
Open http://localhost:5000.
What's Next
A few things I want to explore:
Replace Claude with a local LLM. The API call is already isolated in one file (editor.py). Plugging in Ollama or any OpenAI-compatible endpoint would make this fully air-gapped. The tradeoff is edit quality, but for someone who wants zero external dependencies, it's a straightforward swap.
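As a sketch of how small that swap is: Ollama exposes an OpenAI-compatible chat endpoint, so the request is mostly a different URL and model name. The endpoint, model, and prompt below are my assumptions, not something DarkRoom ships:

```python
import json
import urllib.request

def build_edl_request(transcript_text, model="llama3.1",
                      url="http://localhost:11434/v1/chat/completions"):
    """Build an OpenAI-style chat request asking a local model for an EDL.

    Assumes Ollama's OpenAI-compatible endpoint on its default port.
    """
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Return an EDL as JSON with 'segments' and 'clips'."},
            {"role": "user", "content": transcript_text},
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# To actually call it (requires Ollama running locally):
# response = json.load(urllib.request.urlopen(build_edl_request(transcript)))
```

Nothing else in the pipeline cares where the JSON came from, which is what makes the model layer replaceable.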
Auto-captions baked in. Whisper already produces the transcript. Burning subtitles into the vertical cut is a small FFmpeg step and probably the highest-value addition for short-form content.
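Whisper's segments map almost directly onto SRT, so most of that work is timestamp formatting. A sketch (these helpers are mine, not DarkRoom code); the burn-in step at the end uses FFmpeg's standard subtitles filter:

```python
def srt_time(t):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(t * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def to_srt(segments):
    """Convert Whisper-style segments (start/end/text) into an SRT string."""
    blocks = []
    for i, seg in enumerate(segments, 1):
        blocks.append(
            f"{i}\n{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Then burn the captions into the vertical cut, e.g.:
# ffmpeg -i vertical.mp4 -vf "subtitles=episode.srt" vertical_captioned.mp4
```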
Smarter clip selection. Right now Claude picks the best 60 to 90 second window. With a bit more prompt work you could get multiple ranked clip candidates per episode.
A proper timeline UI. The current review interface is functional but minimal. A drag-and-drop timeline where you can scrub through cuts before rendering would make the whole review step a lot nicer.
What This Is (and Isn't)
This is a proof of concept. The AI cuts are a starting point and you still review everything before anything renders. It is not Descript. It will not auto-post to Instagram.
What it is: a working pipeline from raw multi-camera footage to a full edit, a vertical cut, and a Short, on your own hardware, for less than a dime an episode, where every file is something you own and every dependency is something you can replace.
The tools already existed. Whisper, FFmpeg, Claude's API, none of this is new. DarkRoom is just the glue that connects them in a way that keeps you in control.
Repo: https://github.com/chaotictoejam/darkroom
Stack: Python · Flask · Whisper · Claude · FFmpeg
