For a 10-minute English-language video on a modern laptop, transcription takes roughly 45–90 seconds. Accuracy is near production quality for single-speaker content and degrades gracefully for multi-speaker or accented speech. That is an honest assessment, and it is consistent with what publicly available whisper.cpp benchmarks show.
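As a rough sanity check on those figures, the quoted range works out to somewhere between about 6.7x and 13.3x faster than real time. Here is a minimal sketch of that arithmetic; the function name `realtime_factor` is purely illustrative and not part of AetherCut or whisper.cpp.

```python
def realtime_factor(media_seconds: float, processing_seconds: float) -> float:
    """Seconds of media processed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return media_seconds / processing_seconds


# The 10-minute video and the 45-90 second range come from the text above.
MEDIA_SECONDS = 10 * 60
slowest = realtime_factor(MEDIA_SECONDS, 90)  # lower bound of the speed-up
fastest = realtime_factor(MEDIA_SECONDS, 45)  # upper bound of the speed-up
print(f"{slowest:.1f}x to {fastest:.1f}x real time")
```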
Verify It Yourself — DevTools Network Tab
You don't have to take any of this on faith. Here's how to confirm zero outbound media requests during an AetherCut edit session.
Step 1. Open aethercut.app in Chrome or Edge. No account required.
Step 2. Open DevTools: F12 on Windows/Linux, Cmd + Option + I on macOS.
Step 3. Click the Network tab. Check Preserve log. To narrow the list, filter by Fetch/XHR or Media.
Step 4. Drop a video file into the editor. Trim a clip. Apply captions. Export.
Step 5. Review the network log.
You will see requests for static assets (JS bundles, WASM files, fonts). You will not see any request carrying your video frames, audio data, or rendered output. The media bytes never leave the tab.
This is the test we invite every skeptical developer to run. It takes two minutes and produces a definitive answer.
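If you would rather not eyeball the log, the same check can be roughly automated. Chrome and Edge can export the Network log as a HAR file, and the sketch below scans that file for requests with large bodies. It assumes the standard HAR 1.2 layout (`log.entries[].request`). The 64 KiB threshold and the function names are arbitrary choices for illustration, not anything AetherCut documents. It only looks at HTTP request bodies, so treat it as a supplement to reading the log rather than a replacement.

```python
import json

# Assumption: bodies at or above this size deserve a manual look.
# The value is arbitrary; tune it to taste.
SUSPICIOUS_BODY_BYTES = 64 * 1024


def load_har(path: str) -> dict:
    """Read a HAR file exported from the DevTools Network tab."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_suspicious_uploads(har: dict, threshold: int = SUSPICIOUS_BODY_BYTES) -> list:
    """Return a summary of every request whose body is at least `threshold` bytes."""
    flagged = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        # HAR 1.2 records bodySize as -1 when the size is unknown.
        body_size = request.get("bodySize", -1)
        if body_size < 0:
            # Fall back to the recorded payload text, if any was captured.
            text = (request.get("postData") or {}).get("text") or ""
            body_size = len(text.encode("utf-8"))
        if body_size >= threshold:
            flagged.append({
                "method": request.get("method", "?"),
                "url": request.get("url", "?"),
                "body_bytes": body_size,
            })
    return flagged
```

To use it, export the log from the Network tab, then call `find_suspicious_uploads(load_har("session.har"))`, where `session.har` is a placeholder for whatever file name you chose. An empty list means no request in the capture carried a body large enough to plausibly contain media.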