DEV Community: imper

영상 하나 던지면 번역된 자막이 생기는 오픈소스 앱 WhisperSubTranslate v2.0 (whisper.cpp + 로컬 LLM)

imper — Sat, 23 May 2026 14:09:21 +0000

외국어 영상 볼 때 자막 없으면 답답하죠. 저도 그랬습니다.

그래서 WhisperSubTranslate 만들었습니다. 영상 파일 하나 던지면 whisper.cpp가 음성 추출하고, 번역 엔진이 SRT 자막으로 만들어줍니다. 클라우드 업로드 없고, 계정도 필요 없습니다.

엔진 선택 가능:

MyMemory (무료, 키 불필요, 일일 제한 있음)
DeepL / OpenAI / Gemini (본인 키 사용)
로컬 LLM (HY-MT GGUF 모델, 완전 오프라인) — v2.0 주요 기능

실제로 잘 되는 것들:

긴 영상도 배치 큐로 처리
폴더 드래그 앤 드롭
GPU 있으면 CUDA 가속, 없으면 CPU 폴백
SRT 출력 → VLC, mpv, Plex 등에서 바로 사용

안 되는 것 (솔직히):

영상에 박힌 자막 추출은 안 됨. 음성에서 새로 만드는 방식입니다
실시간 마이크 입력은 아직 지원 안 함

한국어 포함 14개 언어, whisper.cpp 기준 100+ 소스 언어 지원합니다. Windows 포터블 빌드 + Linux/macOS 소스 빌드.

오픈소스 (GPL-3.0): https://github.com/Blue-B/WhisperSubTranslate

/disconnect for opencode — a tiny TUI plugin I wish existed before I made it

imper — Mon, 18 May 2026 17:23:30 +0000

I use opencode as my daily TUI coding agent. It's good. But there's one thing that kept biting me.

When you want to remove a single provider — say you rotated a key and want the old entry gone — opencode doesn't ship a clean way to do it. The choices are:

Re-auth every provider from scratch
Open ~/.local/share/opencode/auth.json in vim and pray the trailing comma survives

I did option 2 maybe four times before I got annoyed enough to write a plugin.

What I built

opencode plugin opencode-tui-utils

Then inside the TUI:

/disconnect

It opens opencode's own provider picker — the same dialog component the rest of the TUI already uses — so it doesn't feel like a bolted-on script. You pick the provider you want gone, confirm, done.

Token values are never printed. Only provider names and auth types are shown, so you can run it in a recorded session without paranoia.

The other commands

Since I was writing the plugin loader anyway, I added the three other slash commands I kept wanting:

/lsp-toggle — flip LSP on/off without restarting the TUI
/websearch-toggle — same for web search
/tool-status — print what's currently enabled

The toggle commands update your shell profile and prompt for a restart, which matches opencode's existing pattern for launch-gated flags.

Why a plugin instead of a fork

opencode's plugin system is just JSON: drop the package into ~/.config/opencode/tui.json under "plugin", restart, the slash command shows up in the palette next to the built-ins. No fork to maintain, no rebase pain when opencode updates.

This also means adding command #5 is mechanical — there's a shared API wrapper and the loader handles registration. If you've got a slash command you keep missing in opencode, there's a "command ideas up for grabs" section in CONTRIBUTING.md.

Where to find it

Repo: https://github.com/Blue-B/opencode-tui-utils
npm: opencode-tui-utils
License: MIT
Inspired by opencode issue #10494, where someone had asked for the disconnect flow

Built by @Blue-B — happy to take feedback, especially from anyone using opencode in a workflow I haven't thought of.

One install command wires 8 CLI coding agents to a shared stealth Chromium

imper — Mon, 18 May 2026 16:58:37 +0000

I use a bunch of CLI coding agents — Claude Code, Codex CLI, pi, opencode, Gemini CLI, Kiro, Amp, Crush — and every one of them shipped its own headless Chromium that gets blocked by basic bot checks (navigator.webdriver, missing plugins, the usual headless fingerprints).

The fix itself isn't novel. Patched stealth Chromium has existed for years. The annoying part is wiring it into each agent — every one wants its config in a different file:

Claude Code → CLAUDE.md
Codex CLI / opencode → AGENTS.md
Gemini CLI → GEMINI.md
Kiro → .kiro/steering/
Amp / Crush → their own dotfiles

…and all of them need to point at the same browser binary.

So I spent a weekend writing a one-command installer that does the wiring:

git clone https://github.com/Blue-B/browser-harness-kit
cd browser-harness-kit
bash install.sh

It detects which of the 8 agents you have installed, writes the right rule file for each, and creates a shared ~/.playwright/cli.config.json so they all hit the same stealth Chromium through pi-playwright as the CLI runner.

Result

Every agent drives a real headed Chromium that passes:

navigator.webdriver check
plugin / mimeType length checks
window.chrome shape
full bot.sannysoft.com matrix

Screenshot of the pass matrix and verify output are in the README.

What it isn't

This repo is integration glue only. It does NOT bundle or redistribute:

[CloakBrowser]https://www.reddit.com/r/SideProject/comments/1tgrmo3(https://github.com/CloakHQ/CloakBrowser) — the actual stealth Chromium build (linked, not bundled)
Playwright + pi-playwright — the runner
The 8 agents themselves — each lives at its own home

Just the wiring between them, so I don't have to redo it on every fresh box.

Why I'm sharing it

I wrote this for my own setup. After the third time pasting the same config snippets into a new machine I figured someone else would save a weekend.

Repo: https://github.com/Blue-B/browser-harness-kit
Tested on WSL2 + Ubuntu 24.04 + Node 22
License: MIT
Reddit discussion: https://www.reddit.com/r/SideProject/comments/1tgrmo3

Happy to take feedback, or PRs adding adapters for CLI agents I haven't covered. If your agent reads from yet another rule file, drop an issue.

Built by @Blue-B — find more side projects on my GitHub.

Building an offline subtitle extractor with whisper.cpp and Electron

imper — Sun, 08 Mar 2026 02:37:46 +0000

I watch a lot of foreign language content - anime, K-dramas, tech talks - and getting subtitles was always a pain. Upload to a random website, hit the daily limit, try another one, or install Python and figure out whisper's CLI.

So over the past few months I've been building a desktop app that handles the whole pipeline locally.

What it does

You drop a video file in, pick a whisper model size, and it spits out an SRT subtitle file. Optionally you can translate the subtitles using one of several engines.

The speech-to-text runs via whisper.cpp so everything stays on your machine. No uploads, no API calls for the transcription part. If you have an NVIDIA GPU it automatically uses CUDA, otherwise it falls back to CPU - this was one of the trickier parts to get right.

Tech decisions

I went with Electron + Node.js because:

Cross-platform (though I'm mainly targeting Windows right now)
Easy to bundle whisper.cpp binaries and ffmpeg
The UI is just HTML/CSS/JS so iteration is fast

For whisper.cpp integration, the app spawns it as a child process with the right flags depending on whether CUDA is available. Model files (GGML format) auto-download on first run into a local _models/ folder.

Translation engines

Translation is optional. Currently supported:

MyMemory - free, no API key, ~50K chars/day
DeepL - free tier 500K chars/month, needs API key
OpenAI GPT - paid, good quality for nuanced translations
Gemini - Google's API, generous free tier

The app chunks subtitle text and sends it in batches to avoid rate limits. Each engine has its own quirks with language codes so there's a mapping layer.

v1.4.0 changes

Just pushed the latest update which adds:

Automatic GPU/CPU fallback detection
Bundled ffprobe-static (no more separate ffmpeg install)
Better DeepL language code mapping

Try it out

It's packaged as a portable .exe - no install needed, just extract and run.

GitHub: WhisperSubTranslate
License: GPL-3.0

If you're working on something similar or have suggestions, I'd love to hear about it.