<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Garry Williams</title>
    <description>The latest articles on DEV Community by Garry Williams (@garrywilliamss).</description>
    <link>https://dev.to/garrywilliamss</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3930136%2Ff5b0ef49-ba09-4809-a900-02034f768dce.webp</url>
      <title>DEV Community: Garry Williams</title>
      <link>https://dev.to/garrywilliamss</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/garrywilliamss"/>
    <language>en</language>
    <item>
      <title>Level Up Your Video Workflow: Introducing the Lumiclip.ai API &amp; MCP Server</title>
      <dc:creator>Garry Williams</dc:creator>
      <pubDate>Thu, 14 May 2026 22:48:01 +0000</pubDate>
      <link>https://dev.to/garrywilliamss/level-up-your-video-workflow-introducing-the-lumiclipai-api-mcp-server-3i9l</link>
      <guid>https://dev.to/garrywilliamss/level-up-your-video-workflow-introducing-the-lumiclipai-api-mcp-server-3i9l</guid>
      <description>&lt;p&gt;As developers, we're constantly seeking tools that streamline our workflows and unlock new possibilities. For anyone working with video content, especially in the age of short-form virality, the process of extracting compelling clips from longer videos can be a significant bottleneck. Enter the Lumiclip.ai API and its accompanying Model Context Protocol (MCP) server – a powerful solution designed to programmatically transform YouTube URLs into ready-to-post, AI-generated video clips.&lt;/p&gt;

&lt;p&gt;The Lumiclip.ai API: A Developer's Gateway to AI Video Clipping&lt;/p&gt;

&lt;p&gt;The Lumiclip.ai API exposes the core AI clipping engine behind the popular Lumiclip.ai consumer product. This means you can integrate advanced video analysis, intelligent clip selection, and automated editing directly into your applications, platforms, or custom workflows. Whether you're building a content management system, a social media scheduler, or an AI-powered video assistant, this API provides the programmatic control you need.&lt;/p&gt;

&lt;p&gt;Core API Functionality:&lt;/p&gt;

&lt;p&gt;The API is designed for simplicity and efficiency, focusing on a few key endpoints to manage the entire clip generation lifecycle.&lt;/p&gt;

&lt;p&gt;POST /api/v1/clips/generate&lt;br&gt;
Initiates AI-driven clip generation from a YouTube URL. Accepts optional start_time, end_time, and callback_url for webhook notifications. Returns a project_id for status tracking.&lt;/p&gt;

&lt;p&gt;GET /api/v1/projects/{id}&lt;br&gt;
Polls for the status of a specific project, including processing steps and a list of generated clips (sorted by AI score).&lt;/p&gt;

&lt;p&gt;Base URL: &lt;a href="https://api.lumiclip.ai" rel="noopener noreferrer"&gt;https://api.lumiclip.ai&lt;/a&gt;&lt;br&gt;
Authentication: Authorization: Bearer sk_live_... (API key)&lt;/p&gt;

&lt;p&gt;How it Works Under the Hood:&lt;/p&gt;

&lt;p&gt;1. Submission: You send a POST request to /api/v1/clips/generate with a YouTube URL.&lt;/p&gt;

&lt;p&gt;2. Processing: Lumiclip.ai downloads the video, transcribes it, and its AI engine analyzes the content to identify high-retention moments. These moments are then cut, reframed to a 9:16 aspect ratio with active-speaker tracking, and styled subtitles are burned in.&lt;/p&gt;

&lt;p&gt;3. Asynchronous Results: The API responds immediately with a 202 Accepted status and a project_id. You can then poll the /api/v1/projects/{id} endpoint or configure a callback_url to receive a webhook when your clips are ready.&lt;/p&gt;

&lt;p&gt;4. Retrieval: Once complete, you can fetch clip metadata and download URLs to integrate the generated clips into your application.&lt;/p&gt;
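
&lt;p&gt;The submit-then-poll flow above can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not the official client: the youtube_url field name, the status key, and its completed and failed values are hypothetical and should be checked against the API documentation. Only the base URL, the two endpoint paths, the Bearer header, and the callback_url parameter come from the description above.&lt;/p&gt;

```python
import json
import time
import urllib.request

BASE_URL = "https://api.lumiclip.ai"  # base URL given in the article


def build_generate_request(api_key, youtube_url, callback_url=None):
    """Build (but do not send) the POST /api/v1/clips/generate request."""
    # ASSUMPTION: "youtube_url" is a hypothetical JSON field name.
    payload = {"youtube_url": youtube_url}
    if callback_url is not None:
        # callback_url is the optional webhook parameter the article mentions.
        payload["callback_url"] = callback_url
    return urllib.request.Request(
        BASE_URL + "/api/v1/clips/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_project(api_key, project_id):
    """GET /api/v1/projects/{id} and decode the JSON body."""
    request = urllib.request.Request(
        BASE_URL + "/api/v1/projects/" + project_id,
        headers={"Authorization": "Bearer " + api_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def poll_until_done(fetch, project_id, interval_s=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch(project_id) until the project reaches a terminal status."""
    # ASSUMPTION: the "status" key and its terminal values are hypothetical.
    for _ in range(max_attempts):
        project = fetch(project_id)
        if project.get("status") in ("completed", "failed"):
            return project
        sleep(interval_s)
    raise TimeoutError("project " + project_id + " did not finish in time")
```

&lt;p&gt;In real use you would send the built request with urllib.request.urlopen, read the project_id from the 202 response, and pass a small wrapper around fetch_project as the fetch argument. Taking fetch and sleep as parameters keeps the polling loop testable without network access.&lt;/p&gt;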

&lt;p&gt;Seamless AI Agent Integration with the Lumiclip.ai MCP Server&lt;/p&gt;

&lt;p&gt;For those leveraging AI agents and LLMs in their development stack, &lt;a href="//lumiclip.ai/api"&gt;Lumiclip.ai&lt;/a&gt; takes integration a step further with its Model Context Protocol (MCP) server. The @lumiclip/mcp-server package provides a set of five typed tools that AI assistants can directly invoke as tool-use actions.&lt;/p&gt;

&lt;p&gt;This means your AI agents (e.g., in Claude Desktop, Cursor, or other MCP-compatible environments) can programmatically request video clipping without needing to manage raw HTTP requests. The MCP server uses the same API key and credit pool, ensuring consistent access and usage.&lt;/p&gt;

&lt;p&gt;MCP Tools for AI Agents:&lt;/p&gt;

&lt;p&gt;• generate_clips: Initiate clip generation from a YouTube URL.&lt;/p&gt;

&lt;p&gt;• get_project_status: Query the progress and retrieve clips for a given project.&lt;/p&gt;

&lt;p&gt;• list_projects: Get an overview of recent projects.&lt;/p&gt;

&lt;p&gt;• get_clip: Retrieve detailed information for a specific clip.&lt;/p&gt;

&lt;p&gt;• check_usage: Monitor API usage and credit balance.&lt;/p&gt;

&lt;p&gt;For remote MCP clients, a streamable HTTP endpoint is available at &lt;a href="https://lumiclip.ai/api" rel="noopener noreferrer"&gt;https://lumiclip.ai/api&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Get Started and Build Something Amazing!&lt;/p&gt;

&lt;p&gt;Lumiclip.ai offers a compelling solution for developers looking to automate and enhance their video content workflows. With a free hour of processing credits available (no credit card required), there's no better time to explore its capabilities.&lt;/p&gt;

&lt;p&gt;Dive into the documentation and start building: &lt;a href="//lumiclip.ai/api"&gt;lumiclip.ai/api&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>mcp</category>
      <category>showdev</category>
    </item>
    <item>
      <title>How LumiClip Finds the Best Moments in Your Video and Reframes Them for Mobile</title>
      <dc:creator>Garry Williams</dc:creator>
      <pubDate>Wed, 13 May 2026 21:49:53 +0000</pubDate>
      <link>https://dev.to/garrywilliamss/how-lumiclip-finds-the-best-moments-in-your-video-and-reframes-them-for-mobile-2hlh</link>
      <guid>https://dev.to/garrywilliamss/how-lumiclip-finds-the-best-moments-in-your-video-and-reframes-them-for-mobile-2hlh</guid>
      <description>&lt;p&gt;When someone uploads an hour-long podcast or a Twitch VOD to &lt;a href="https://lumiclip.ai" rel="noopener noreferrer"&gt;LumiClip&lt;/a&gt;, they expect ten short, vertical, ready-to-post clips back. Two pipelines do the heavy lifting: a highlight finder that decides what's worth clipping, and a reframer that turns landscape footage into something that looks native on a phone screen.&lt;br&gt;
Here's how each one actually works under the hood.&lt;/p&gt;

&lt;p&gt;The core problem with asking one model to do everything&lt;br&gt;
The first thing we tried was the obvious thing: prompt a capable LLM with the transcript and ask it to find the best clips. The signal-to-noise ratio was terrible. A model looking at a raw hour-long transcript has no spatial sense of the video, no understanding of energy or pacing, and no way to know that two candidate clips are basically the same moment from different angles.&lt;br&gt;
So we scrapped that and built a small assembly line instead. Each step is cheap, focused, and only passes its survivors to the next stage. By the time the most capable model runs, it's looking at a curated shortlist rather than raw noise.&lt;/p&gt;

&lt;p&gt;The Highlight Pipeline&lt;br&gt;
Step 1 — Transcribe with Deepgram Nova-3&lt;br&gt;
Word-level timestamps, speaker diarization, and utterance boundaries are the substrate for everything downstream. Long sources get split into chunks, transcribed in parallel, then merged back into a single timeline. Nova-3 is fast enough that this doesn't become the bottleneck even on 3-hour VODs.&lt;br&gt;
Step 2 — Classify the video type&lt;br&gt;
Seven evenly spaced frames go to a multimodal classifier (a small, fast vision model) that returns one of four buckets: dialogue, screenshare, gaming, or action. This decision changes everything downstream. A podcast doesn't need the same clip-selection heuristics as a Call of Duty stream. A screenshare tutorial has completely different "good moment" criteria from a two-person interview.&lt;br&gt;
This single classification step rules out the wrong heuristics before any expensive processing runs.&lt;br&gt;
Step 3 — Topic-segment the transcript&lt;br&gt;
A second LLM call walks the merged transcript and breaks it into topic blocks — coherent runs of related speech. We score each segment on three axes: how self-contained it is, how hooky the opening is, and how emotionally salient the content is.&lt;br&gt;
This is where most of the junk gets filtered. A five-minute tangent that goes nowhere scores poorly on self-containment. A mid-sentence cut scores poorly on hooks. Only segments that clear all three thresholds move forward.&lt;br&gt;
Step 4 — Score candidate highlights&lt;br&gt;
A scoring model evaluates each candidate against criteria like: opens strong, has tension, has payoff, would survive being seen with no setup. Anything below a hard quality floor gets dropped before the next step ever sees it.&lt;br&gt;
This is the most expensive step in the pipeline. The reason we can afford to run a capable model here is that by this point we've gone from hours of raw content to maybe 15-20 candidate segments. The classifier and topic segmenter did the cheap filtering work so this step can do the quality work.&lt;br&gt;
Step 5 — Final selection&lt;br&gt;
A final pass picks the best non-overlapping clips, respects per-tier caps (we ship ten clips per project), and assigns each a viral-score hint that surfaces in the dashboard. The non-overlapping constraint is important — without it you get five clips that are all variations of the same thirty-second moment.&lt;br&gt;
Step 6 — Generate the hook&lt;br&gt;
Each clip gets a three-to-seven-word punch line generated by a model that has seen only that clip's transcript. Short, declarative, optimized for the first second of attention. This runs last because you want the hook to reflect what the clip actually is, not what you hoped it would be.&lt;/p&gt;
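
&lt;p&gt;The non-overlapping constraint in Step 5 can be illustrated with a short greedy selection. This is a sketch under assumptions, not LumiClip's actual selector: the Candidate fields are hypothetical, and greedy picking by score is a simple stand-in that is not guaranteed to maximize total score.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate clip; field names are illustrative only."""
    start: float  # seconds from the start of the source video
    end: float
    score: float  # higher is better


def overlaps(a, b):
    """True when two time ranges share any span of time."""
    return min(a.end, b.end) > max(a.start, b.start)


def select_clips(candidates, max_clips=10):
    """Greedily keep the highest-scoring candidates that do not overlap."""
    chosen = []
    for candidate in sorted(candidates, key=lambda c: c.score, reverse=True):
        if len(chosen) == max_clips:
            break
        if not any(overlaps(candidate, kept) for kept in chosen):
            chosen.append(candidate)
    # Return the survivors in timeline order.
    return sorted(chosen, key=lambda c: c.start)
```

&lt;p&gt;Because candidates are visited in descending score order, a lower-scoring variation of an already chosen moment is always rejected, which is the behavior the constraint is meant to enforce.&lt;/p&gt;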

&lt;p&gt;Why layering matters for cost and quality&lt;br&gt;
The reason the pipeline is structured this way is cost containment without quality sacrifice.&lt;br&gt;
A small classifier rules out 90% of the work for a screenshare video instantly. A topic segmenter narrows hours of speech to tens of candidates cheaply. Only the survivors get the expensive scoring pass. Running a capable model on a raw hour-long transcript for every upload would be both slow and expensive. Running it on a pre-filtered shortlist of 15 candidates is fast and affordable.&lt;br&gt;
The quality benefit follows from the same filtering: a model looking at 15 curated candidates makes much better decisions than a model drowning in 200 possible segments.&lt;/p&gt;
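
&lt;p&gt;The cost argument can be made concrete with a toy funnel that counts how many items each stage has to look at. The stage names and the item fields below are hypothetical; the sketch only shows that the expensive stage runs on the survivors of the cheap one.&lt;/p&gt;

```python
def run_funnel(items, stages):
    """Apply (name, keep) stages in order.

    Returns the surviving items and, per stage, how many items it was
    called on. A stage's call count is a rough proxy for its cost.
    """
    calls = {}
    survivors = list(items)
    for name, keep in stages:
        calls[name] = len(survivors)
        survivors = [item for item in survivors if keep(item)]
    return survivors, calls


# Hypothetical data: 200 raw segments, one in ten passes the cheap filter.
segments = [
    {"id": i, "passes_cheap_filter": i % 10 == 0, "quality": i / 200}
    for i in range(200)
]
stages = [
    ("topic_filter", lambda s: s["passes_cheap_filter"]),  # cheap stage
    ("scoring_model", lambda s: s["quality"] >= 0.5),      # expensive stage
]
survivors, calls = run_funnel(segments, stages)
```

&lt;p&gt;With this toy data the cheap stage sees all 200 segments while the expensive stage is called only on the 20 that survive it.&lt;/p&gt;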

&lt;p&gt;The Reframing Pipeline&lt;br&gt;
Landscape video on a 9:16 phone screen has a brutal math problem: a full-height 9:16 crop keeps only about 32% of a 16:9 frame's width, so roughly two-thirds of the pixels are now off-canvas. The naive fix, a centered static crop, works for a stationary podcaster in the middle of the frame. It fails the moment anyone moves, looks at a side monitor, or sits apart from a second person.&lt;br&gt;
This was the hardest problem to solve well and the one most other tools get wrong consistently.&lt;br&gt;
Step 1 — Face detection on keyframes&lt;br&gt;
We run InsightFace's buffalo_l model on sampled keyframes to get bounding boxes plus a face embedding per detection. Sampling keyframes rather than every frame keeps this fast without losing tracking fidelity — faces don't teleport between adjacent frames.&lt;br&gt;
Step 2 — Identify the active speaker&lt;br&gt;
Face embeddings let us cluster detections into persistent identities across the clip. We combine this with the diarization data from Deepgram to know not just where faces are, but which face is currently speaking. The active speaker gets priority in the crop target calculation.&lt;br&gt;
This is the step that makes two-person interview reframing work. Without speaker identification, the crop just averages the two face positions and ends up centered between them — which means neither person is properly in frame. With it, the crop follows whoever is actually talking.&lt;br&gt;
Step 3 — Smooth the crop path&lt;br&gt;
Raw frame-by-frame crop targets are jittery. If you apply them directly the video looks like it's having a seizure. We run the crop coordinates through a smoothing pass that respects the natural movement of the speaker while eliminating micro-jitter. The goal is motion that feels like a camera operator following the subject, not a bounding box chasing pixels.&lt;br&gt;
Step 4 — Handle edge cases&lt;br&gt;
Some frames have no detectable face — the speaker looked down, the camera cut away, there's a B-roll insert. We hold the last known crop position through short gaps and interpolate smoothly back when the face reappears. For longer gaps we fall back to a content-aware center crop rather than freezing awkwardly on the last known position.&lt;/p&gt;
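
&lt;p&gt;The crop geometry, smoothing, and gap handling above can be sketched as follows. The post does not say which smoothing method LumiClip uses, so this sketch swaps in an exponential moving average as an assumption. It also covers only the hold-last-position case for missing faces, not the content-aware fallback for longer gaps. All function and parameter names are hypothetical.&lt;/p&gt;

```python
def crop_width(frame_height, aspect_w=9, aspect_h=16):
    """Width in pixels of a full-height crop with the given aspect ratio."""
    return frame_height * aspect_w / aspect_h


def smooth_crop_centers(centers, frame_width, crop_w, alpha=0.2):
    """Turn per-frame face x coordinates into a smoothed crop-center path.

    centers holds one x coordinate per frame, or None when no face was
    detected. alpha is the moving-average weight: smaller values give
    slower, steadier motion.
    """
    half = crop_w / 2
    path = []
    current = None
    for target in centers:
        if target is None:
            # Hold the last position through a gap; before any detection,
            # fall back to the center of the frame.
            target = current if current is not None else frame_width / 2
        if current is None:
            current = target
        else:
            current = current + alpha * (target - current)
        # Clamp so the crop window never leaves the frame.
        current = max(half, min(frame_width - half, current))
        path.append(current)
    return path
```

&lt;p&gt;For a 1920x1080 source, crop_width(1080) gives 607.5 pixels, so the crop center can travel between 303.75 and 1616.25 along the x axis.&lt;/p&gt;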

&lt;p&gt;What we got wrong first&lt;br&gt;
The first version of the reframer used a single static crop calculated from the average face position across the whole clip. It looked fine in demos with a stationary speaker and broke on everything else.&lt;br&gt;
The first version of the highlight pipeline had no video type classification. It applied the same dialogue heuristics to a gaming stream and generated clips of someone quietly farming resources with no audio spike. The scoring model had no idea those clips were bad because it was optimized for spoken content.&lt;br&gt;
Both mistakes came from the same root cause: building for the clean case and being surprised by real footage. Real footage is messy, unpredictable, and almost never looks like your test videos.&lt;/p&gt;

&lt;p&gt;What's next&lt;br&gt;
The video type classifier currently handles four buckets. We're working on expanding it and adding per-type scoring models that understand what "good" actually means for each format — a good gaming clip has completely different properties than a good podcast clip, and a single scoring model trying to handle both will always be compromised.&lt;br&gt;
The reframer handles single-camera footage well. Multi-camera footage with cuts is the next hard problem.&lt;br&gt;
If you've built something similar or have thoughts on the pipeline, we'd love to hear about it in the comments. And if you want to see what the output actually looks like on your own content: &lt;a href="https://lumiclip.ai" rel="noopener noreferrer"&gt;lumiclip.ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>llm</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
