<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Boran Oktay DABAK</title>
    <description>The latest articles on DEV Community by Boran Oktay DABAK (@oktaydbk54).</description>
    <link>https://dev.to/oktaydbk54</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3981514%2Fb978679e-cb9e-4b6e-ae0f-8465661b62f4.jpeg</url>
      <title>DEV Community: Boran Oktay DABAK</title>
      <link>https://dev.to/oktaydbk54</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/oktaydbk54"/>
    <language>en</language>
    <item>
      <title>How I built an open-source AI video editor you control by chatting</title>
      <dc:creator>Boran Oktay DABAK</dc:creator>
      <pubDate>Fri, 12 Jun 2026 15:23:31 +0000</pubDate>
      <link>https://dev.to/oktaydbk54/how-i-built-an-open-source-ai-video-editor-you-control-by-chatting-1k37</link>
      <guid>https://dev.to/oktaydbk54/how-i-built-an-open-source-ai-video-editor-you-control-by-chatting-1k37</guid>
      <description>&lt;p&gt;A few months ago I had a folder full of long videos — podcast recordings, talks, screen recordings — and one simple goal: turn them into short vertical clips with captions. Every tool I tried wanted me to either drag clips around a timeline for an hour, or upload my raw footage to someone's cloud and pay per export.&lt;/p&gt;

&lt;p&gt;So I built my own, and I just open-sourced it. It's called VibeClip, and the core idea is simple: you edit by &lt;em&gt;typing what you want&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;What it does&lt;/h2&gt;

&lt;p&gt;You drop in a long video and it cuts it into vertical 9:16 shorts with word-synced captions. Then you refine each clip by chatting:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cut the silences
make it mrbeast style
split it and add gameplay underneath
make the captions bigger and yellow
undo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No timeline scrubbing. You describe the edit, it happens.&lt;/p&gt;

&lt;h2&gt;The architecture&lt;/h2&gt;

&lt;p&gt;Three pieces:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Transcription — faster-whisper (local).&lt;/strong&gt; Everything starts with an accurate word-level transcript. faster-whisper runs on your own machine, needs no API key, and gives per-word timestamps that are the backbone of both caption sync and silence detection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The "brain" — an LLM (your key).&lt;/strong&gt; This is the only thing that touches the network. The model does two jobs: scoring the strongest moments in the transcript (hook / flow / value, not a keyword scan), and turning a plain-language request like "make it punchier" into a sequence of concrete edit actions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Rendering — ffmpeg (local).&lt;/strong&gt; Every cut, 9:16 reframe, caption burn, zoom and audio duck is an ffmpeg operation. No cloud render farm.&lt;/p&gt;
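
&lt;p&gt;To make the rendering step concrete, here is a minimal sketch of one cut-and-reframe command assembled in Python. It is an illustration under assumptions, not VibeClip's actual code: the helper name &lt;code&gt;clip_command&lt;/code&gt;, the plain center crop, and the 1080x1920 output size are hypothetical, and the crop expression assumes a landscape source.&lt;/p&gt;

```python
def clip_command(src, dst, start, end, width=1080, height=1920):
    """Build an ffmpeg argument list that cuts [start, end] and reframes to 9:16.

    Hypothetical helper for illustration only; a plain center crop is assumed.
    """
    # crop=w:h is centered by default; ih*9/16 is the 9:16 width for the input height.
    video_filter = f"crop=ih*9/16:ih,scale={width}:{height}"
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",  # seek to the start of the clip (seconds)
        "-to", f"{end:.3f}",    # stop reading at the end of the clip (seconds)
        "-i", src,
        "-vf", video_filter,
        "-c:a", "aac",
        dst,
    ]


command = clip_command("talk.mp4", "short_01.mp4", 12.5, 42.0)
print(" ".join(command))
```

&lt;p&gt;The list can be handed to &lt;code&gt;subprocess.run(command, check=True)&lt;/code&gt; on a machine with ffmpeg installed. Building a list instead of a shell string avoids quoting problems with file names.&lt;/p&gt;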

&lt;p&gt;The chat layer is a tool-calling agent. When you type "cut the silences", the model never touches the video. It calls a tool that operates on the transcript timestamps, and the actual pixels are produced by replaying cached intermediates through ffmpeg. One &lt;strong&gt;undo&lt;/strong&gt; reverts a whole multi-step plan, because every mutation is stored as a snapshot.&lt;/p&gt;
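
&lt;p&gt;The idea of tools that edit transcript-derived state, with one snapshot per plan, can be sketched as follows. This is a simplified illustration, not the project's real code: the state layout, the 0.6 second pause threshold, and the names &lt;code&gt;cut_silences&lt;/code&gt;, &lt;code&gt;set_caption_color&lt;/code&gt; and &lt;code&gt;EditHistory&lt;/code&gt; are all assumptions.&lt;/p&gt;

```python
import copy


def cut_silences(max_gap=0.6):
    """Hypothetical tool: rebuild the keep-list from word timestamps."""
    def action(state):
        spans = []
        for word in state["words"]:
            if spans and max_gap >= word["start"] - spans[-1][1]:
                spans[-1][1] = word["end"]  # short pause: extend the current span
            else:
                spans.append([word["start"], word["end"]])  # long pause: new span
        state["keep"] = [tuple(span) for span in spans]
        return state
    return action


def set_caption_color(color):
    """Hypothetical tool: change one caption style field."""
    def action(state):
        state["caption_color"] = color
        return state
    return action


class EditHistory:
    """Stores one full snapshot of the clip state per applied plan."""

    def __init__(self, initial_state):
        self._snapshots = [copy.deepcopy(initial_state)]

    @property
    def current(self):
        return self._snapshots[-1]

    def apply_plan(self, actions):
        # Run every action on a private copy and commit once,
        # so a multi-step plan adds exactly one snapshot.
        state = copy.deepcopy(self.current)
        for action in actions:
            state = action(state)
        self._snapshots.append(state)
        return state

    def undo(self):
        # Drop the latest snapshot; the initial state is always kept.
        if len(self._snapshots) > 1:
            self._snapshots.pop()
        return self.current


clip = {
    "words": [
        {"text": "hello", "start": 0.0, "end": 0.4},
        {"text": "world", "start": 0.5, "end": 0.9},
        {"text": "again", "start": 2.5, "end": 3.0},
    ],
    "keep": [(0.0, 3.0)],
    "caption_color": "white",
}
history = EditHistory(clip)
history.apply_plan([cut_silences(), set_caption_color("yellow")])
print(history.current["keep"])
history.undo()
print(history.current["keep"])
```

&lt;p&gt;Nothing here reads or writes video. The tools only change timestamps and style fields, and a renderer would turn the current state into ffmpeg calls afterwards.&lt;/p&gt;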

&lt;h2&gt;Decisions I'd make again&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Bring-your-own-key.&lt;/strong&gt; VibeClip never ships with an API key. You plug in OpenAI, Gemini, Claude, DeepSeek, or point it at any OpenAI-compatible endpoint (Ollama, LM Studio). I'm not a proxy taking a cut of every render, and there's no lock-in: when a cheaper model drops, you change one line in &lt;code&gt;.env&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local-first.&lt;/strong&gt; A video tool sees your unreleased footage. "Trust me, we delete it" isn't a great pitch. Transcription and rendering both run on your machine; the only thing that leaves is the transcript text you send to your own LLM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AGPL, not MIT.&lt;/strong&gt; If someone runs a modified version as a hosted service, the improvements have to come back to the commons. A paid hosted version may come later for people who don't want to self-host, but the open tool is the real product, not a teaser.&lt;/p&gt;

&lt;h2&gt;Things that were harder than expected&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Caption sync.&lt;/strong&gt; Whisper word timestamps are good but not perfect; landing captions on the right frame after silence-removal cuts meant treating the transcript timeline as the single source of truth and deriving the video from it, never the other way around.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-provider JSON.&lt;/strong&gt; OpenAI's &lt;code&gt;json_object&lt;/code&gt; response format is silently ignored by some providers and rejected with an error by others. The fix: only send it where it's supported, and tolerantly parse fenced or prose-wrapped JSON everywhere else.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No libass on some ffmpeg builds&lt;/strong&gt;, so captions are rendered with Pillow and overlaid instead of relying on ffmpeg's &lt;code&gt;subtitles&lt;/code&gt; filter.&lt;/li&gt;
&lt;/ul&gt;
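
&lt;p&gt;The multi-provider JSON fix from the list above could look roughly like this. It is a sketch under assumptions rather than the code in the repo: &lt;code&gt;request_options&lt;/code&gt; and &lt;code&gt;parse_json_reply&lt;/code&gt; are hypothetical names, and which providers accept &lt;code&gt;response_format&lt;/code&gt; is left to the caller.&lt;/p&gt;

```python
import json


def request_options(supports_json_mode):
    """Extra request fields; JSON mode is sent only where the caller says it works."""
    if supports_json_mode:
        return {"response_format": {"type": "json_object"}}
    return {}


def parse_json_reply(text):
    """Return the first JSON object found in a reply, ignoring fences and prose."""
    decoder = json.JSONDecoder()
    position = text.find("{")
    while position != -1:
        try:
            # raw_decode parses one JSON value starting at position
            # and tolerates any trailing text after it.
            value, _ = decoder.raw_decode(text, position)
            if isinstance(value, dict):
                return value
        except json.JSONDecodeError:
            pass  # not valid JSON from this brace; try the next one
        position = text.find("{", position + 1)
    raise ValueError("no JSON object found in reply")


fence = "`" * 3  # a Markdown code fence, built at runtime
reply = (
    "Sure, here is the plan:\n"
    + fence + "json\n"
    + '{"actions": ["cut_silences", "bigger_captions"]}\n'
    + fence + "\nLet me know if you want changes."
)
print(parse_json_reply(reply))
```

&lt;p&gt;Scanning for the first decodable object means the same parser works whether the provider returns bare JSON, a fenced block, or JSON buried in a sentence.&lt;/p&gt;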

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;It's one &lt;code&gt;docker compose up&lt;/code&gt; away, or you can run it with uv or pip. The only thing you need to add is one LLM key.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/oktaydbk54/vibeclip" rel="noopener noreferrer"&gt;https://github.com/oktaydbk54/vibeclip&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Site: &lt;a href="https://vibeclip.dev" rel="noopener noreferrer"&gt;https://vibeclip.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's early and rough in places. If you self-host AI tools or make short-form content, I'd genuinely love your feedback — and PRs and issues are very welcome.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>python</category>
      <category>ffmpeg</category>
    </item>
  </channel>
</rss>
