I built a free macOS app that removes silence from videos and auto-generates subtitles

#opensource #macos #python #swift

I edit travel vlogs in Final Cut Pro. The most tedious part was removing silence — scrubbing through hours of footage to cut dead air, frame by
frame.

I looked for tools to automate this, but the ones I tried all split audio at fixed time windows, so words get chopped in half: "Restaur-" on one clip,
"-ant" on the next.

So I built Silenci — a macOS app that removes silence and generates word-level subtitles without ever cutting mid-word.

How it works

Most tools chop audio into fixed-length chunks and run ASR on each chunk separately, so any word that straddles a chunk boundary gets split.

Silenci uses a 2-pass approach:

Pass 1: Silero VAD detects speech → Qwen3-ASR transcribes → ForcedAligner produces word-level timestamps
Pass 2: Split only at word end_time boundaries → never cuts mid-word
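
To make Pass 2 concrete, here is a minimal sketch of the idea, not Silenci's actual code. It assumes Pass 1 yields a list of words with start and end times in seconds; the `Word` type, the `keep_segments` name, and the 0.5 s `min_silence` threshold are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds, from the forced aligner
    end: float    # seconds, from the forced aligner


def keep_segments(words, min_silence=0.5):
    """Return (start, end) ranges to keep, cutting only in gaps between words.

    A cut is placed at a word's end time when the pause before the next
    word is at least `min_silence` seconds (a hypothetical threshold).
    """
    segments = []
    for word in words:
        if segments and word.start - segments[-1][1] < min_silence:
            # Pause is too short to count as silence: extend the current segment.
            segments[-1][1] = word.end
        else:
            # First word, or a long pause: start a new segment at this word.
            segments.append([word.start, word.end])
    return [(start, end) for start, end in segments]


words = [
    Word("We", 0.00, 0.20),
    Word("found", 0.25, 0.60),
    Word("a", 0.62, 0.70),
    Word("restaurant", 0.72, 1.40),
    Word("It", 2.90, 3.05),
    Word("was", 3.10, 3.30),
    Word("great", 3.32, 3.80),
]
print(keep_segments(words))  # [(0.0, 1.4), (2.9, 3.8)]
```

Because every segment edge is some word's start or end time, "restaurant" can only ever be kept whole or dropped whole. A real implementation would probably also add a little padding around each segment.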

The output is an FCPXML file you import directly into Final Cut Pro — silence removed, subtitles already embedded.

Tech stack

Frontend: SwiftUI native macOS app
Backend: Python subprocess (Silero VAD + Qwen3-ASR + ForcedAligner)
Communication: JSON-RPC 2.0 over stdin/stdout pipes
ML runtime: MLX with 8-bit quantized models, optimized for Apple Silicon
No cloud, no API keys — everything runs locally on your Mac

Why Swift + Python?

I wanted a native macOS feel (drag & drop, real-time preview) but the ML ecosystem lives in Python. Instead of FFI bindings, I used a separate
Python process communicating via JSON-RPC.


Swift App  →  {"method":"analyze", "params":{...}}     →  Python
Swift App  ←  {"method":"progress", "params":{"percent":45}}  ←  Python
Swift App  ←  {"result": {"segments":[...]}}            ←  Python

If the Python process crashes, the UI stays alive. Cancellation is instant — just kill the subprocess.
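
For readers who want to try the pattern, the Python side of such a pipe can be sketched roughly like this. It is an illustration under assumptions, not the app's real backend: it assumes one JSON message per line, and the `serve`, `notify`, and placeholder `analyze` functions are hypothetical names. The `"jsonrpc": "2.0"` and `"id"` fields come from the JSON-RPC 2.0 spec and are omitted in the diagram above for brevity.

```python
import json
import sys


def notify(out, method, params):
    """Send a JSON-RPC 2.0 notification (no "id", so no reply is expected)."""
    out.write(json.dumps({"jsonrpc": "2.0", "method": method, "params": params}) + "\n")
    out.flush()  # flush so the UI sees progress immediately


def analyze(params, out):
    """Placeholder for the real pipeline: report progress, return segments."""
    for percent in (0, 50, 100):
        notify(out, "progress", {"percent": percent})
    return {"segments": []}


HANDLERS = {"analyze": analyze}


def serve(inp=None, out=None):
    """Read newline-delimited requests and write one response per request."""
    inp = sys.stdin if inp is None else inp
    out = sys.stdout if out is None else out
    for line in inp:
        line = line.strip()
        if not line:
            continue
        request = json.loads(line)
        handler = HANDLERS.get(request.get("method"))
        if handler is None:
            # -32601 is the JSON-RPC 2.0 code for "Method not found".
            reply = {"error": {"code": -32601, "message": "Method not found"}}
        else:
            reply = {"result": handler(request.get("params", {}), out)}
        out.write(json.dumps({"jsonrpc": "2.0", "id": request.get("id"), **reply}) + "\n")
        out.flush()
```

Calling `serve()` with no arguments reads from stdin and writes to stdout, which is what the parent process would attach its pipes to. Anything else the backend prints to stdout would corrupt the stream, so logs are best sent to stderr.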

Features

🔇 Silence removal with Silero VAD
🗣️ Word-level subtitles (Korean, English, Japanese, Chinese)
✂️ Smart subtitle splitting at sentence endings and punctuation
📤 Export as FCPXML (with inline iTT captions), SRT, or iTT
🔄 Import edited FCPXML back from FCP to re-transcribe subtitles
🌐 In-app language selector (4 languages)
💻 CLI for scripting and automation
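
As a rough illustration of the subtitle splitting and SRT export, here is a small sketch. It is not the app's implementation: the punctuation sets, the 42-character limit, and the function names are assumptions, and joining words with spaces only suits languages like English (Japanese and Chinese would need different joining).

```python
SENTENCE_END = (".", "?", "!", "。", "？", "！")
SOFT_BREAK = (",", ";", ":", "、", "，")


def split_subtitles(words, max_chars=42):
    """Group (text, start, end) words into cues, breaking at punctuation."""
    cues, current = [], []
    for word in words:
        current.append(word)
        text = " ".join(w[0] for w in current)
        ends_sentence = word[0].endswith(SENTENCE_END)
        # Only break on a comma once the cue is reasonably long.
        soft_break = word[0].endswith(SOFT_BREAK) and len(text) >= max_chars // 2
        if ends_sentence or soft_break or len(text) >= max_chars:
            cues.append((text, current[0][1], current[-1][2]))
            current = []
    if current:  # flush trailing words that never hit a break
        cues.append((" ".join(w[0] for w in current), current[0][1], current[-1][2]))
    return cues


def srt_time(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues):
    """Render cues as SRT: index, time range, text, blank line."""
    blocks = [
        f"{i}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n"
        for i, (text, start, end) in enumerate(cues, 1)
    ]
    return "\n".join(blocks)
```

Each cue starts at its first word's start time and ends at its last word's end time, so subtitle boundaries line up with word boundaries in the same way the cuts do.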

Auto-install

On first launch, the app sets up a Python venv and installs all dependencies automatically. ASR models download on first analysis with progress
tracking. No manual setup needed.
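
A first-launch bootstrap along these lines can be sketched with Python's standard `venv` module. This is only an assumed shape, not how Silenci does it: the `ensure_venv` and `pip_install_command` helpers and the requirements-file approach are hypothetical.

```python
import sys
import venv
from pathlib import Path


def ensure_venv(venv_dir, with_pip=True):
    """Create the venv if it is missing; return the path to its interpreter."""
    venv_dir = Path(venv_dir)
    if not (venv_dir / "pyvenv.cfg").exists():
        # pyvenv.cfg is written by venv, so its absence means "not set up yet".
        venv.create(venv_dir, with_pip=with_pip)
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return venv_dir / bin_dir / exe


def pip_install_command(python, requirements_file):
    """Build the argv for installing dependencies into that venv."""
    return [str(python), "-m", "pip", "install", "-r", str(requirements_file)]
```

The returned command list can be passed to `subprocess.run(..., check=True)`; running pip through the venv's own interpreter keeps the install isolated from the system Python.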

Output

The FCPXML includes both title text overlays and iTT inline captions. Import into FCP and you get the silence-removed timeline with subtitles
ready to edit.
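
One detail worth knowing if you generate FCPXML yourself: times are written as rational numbers of seconds (for example `1001/30000s`) and should land on frame boundaries. The sketch below shows how kept segments could be laid end to end on a gapless timeline. It is an illustration rather than the app's exporter; the function names and the 29.97 fps default are assumptions, while `offset`, `start`, and `duration` are the standard FCPXML clip attributes.

```python
from fractions import Fraction


def fcp_time(frames, fps):
    """Format a frame count as an FCPXML rational time string."""
    if frames == 0:
        return "0s"
    # One frame lasts fps.denominator / fps.numerator seconds.
    return f"{frames * fps.denominator}/{fps.numerator}s"


def timeline_clips(segments, fps=Fraction(30000, 1001)):
    """Place kept (start, end) segments back to back, snapped to whole frames.

    offset   = where the clip sits on the new timeline
    start    = where the clip begins in the source media
    duration = clip length
    """
    clips, offset = [], 0
    for start, end in segments:
        first = round(Fraction(start) * fps)
        last = round(Fraction(end) * fps)
        if last <= first:
            continue  # shorter than one frame after snapping
        clips.append({
            "offset": fcp_time(offset, fps),
            "start": fcp_time(first, fps),
            "duration": fcp_time(last - first, fps),
        })
        offset += last - first
    return clips


for clip in timeline_clips([(0.0, 1.4), (2.9, 3.8)]):
    print(clip)
```

Doing the arithmetic in whole frames with `Fraction` avoids floating-point drift accumulating across hundreds of cuts.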
