<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Koki Koki</title>
    <description>The latest articles on DEV Community by Koki Koki (@koki_oki).</description>
    <link>https://dev.to/koki_oki</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3481140%2Fe5d04684-5983-4716-b5e6-920f42dc2972.png</url>
      <title>DEV Community: Koki Koki</title>
      <link>https://dev.to/koki_oki</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/koki_oki"/>
    <language>en</language>
    <item>
      <title>Director's Cut AI: A Multimodal Storytelling Toolkit</title>
      <dc:creator>Koki Koki</dc:creator>
      <pubDate>Sun, 14 Sep 2025 20:34:59 +0000</pubDate>
      <link>https://dev.to/koki_oki/directors-cut-ai-a-multimodal-storytelling-toolkit-22d9</link>
      <guid>https://dev.to/koki_oki/directors-cut-ai-a-multimodal-storytelling-toolkit-22d9</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-ai-studio-2025-09-03"&gt;Google AI Studio Multimodal Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;What I Built&lt;/h2&gt;

&lt;p&gt;I built Director's Cut AI, an all-in-one web tool that transforms a user's creative spark into a complete, multi-stage cinematic production plan. It acts as a creative co-pilot, guiding the user from a simple idea to finished video scenes through a seamless, six-step process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Inspiration&lt;/strong&gt;: Users upload three images to set the mood and select a genre and length for their project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Narrative&lt;/strong&gt;: The AI analyzes the images and prompts to generate a compelling short story.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storyboard&lt;/strong&gt;: The narrative is automatically broken down into a detailed, scene-by-scene storyboard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Style Frame&lt;/strong&gt;: Users select key shots and a visual style (e.g., "Cinematic," "Anime," "Film Noir") to generate high-quality still images that define the project's aesthetic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blueprint&lt;/strong&gt;: The AI synthesizes all the creative decisions into a machine-readable JSON blueprint, ready for video generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Videos&lt;/strong&gt;: Finally, the blueprint is used to generate dynamic, 8-second video clips for each scene in the story.&lt;/li&gt;
&lt;/ol&gt;
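
&lt;p&gt;To make step 5 concrete, a blueprint of this kind might look like the following. The field names here are illustrative guesses, not the app's actual schema:&lt;/p&gt;

```json
{
  "project": { "genre": "Film Noir", "style": "Cinematic" },
  "scenes": [
    {
      "id": 1,
      "duration_seconds": 8,
      "shot": "wide establishing shot",
      "camera": "slow dolly in",
      "action": "A lone figure crosses a rain-slicked street at night.",
      "video_prompt": "Cinematic wide establishing shot, slow dolly in: a lone figure crosses a rain-slicked street at night."
    }
  ]
}
```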

&lt;p&gt;This applet solves the problem of the "blank page" for creators by providing a structured yet flexible workflow to develop a visual story from the ground up, leveraging Google's powerful multimodal AI at every step.&lt;/p&gt;

&lt;h2&gt;Demo&lt;/h2&gt;

&lt;p&gt;Here is a link to the deployed applet: &lt;br&gt;
&lt;a href="https://dev.tourl"&gt;&lt;/a&gt;&lt;a href="https://director-s-cut-ai-30971010556.us-west1.run.app" rel="noopener noreferrer"&gt;https://director-s-cut-ai-30971010556.us-west1.run.app&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;How I Used Google AI Studio&lt;/h2&gt;

&lt;p&gt;Google AI Studio was instrumental in prototyping and refining the prompts for this project. I used it extensively to test the interactions between different modalities and to structure the expected outputs.&lt;/p&gt;

&lt;p&gt;The applet integrates several Google AI models:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini 2.5 Flash&lt;/strong&gt; (gemini-2.5-flash): This was the workhorse for all language and data-structuring tasks. I used it for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generating the initial narrative from a combination of images and text.&lt;/li&gt;
&lt;li&gt;Parsing the narrative into a structured JSON storyboard using Gemini's JSON mode with a defined responseSchema.&lt;/li&gt;
&lt;li&gt;Creating the final, machine-readable JSON blueprint that drives the video generation. The reliability of the JSON output was critical for the app's pipeline.&lt;/li&gt;
&lt;/ul&gt;
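
&lt;p&gt;The storyboard-parsing request can be sketched roughly like this in Python. The schema fields and prompt wording are my illustrative assumptions, not the app's exact ones:&lt;/p&gt;

```python
# Sketch of a JSON-mode request for storyboard parsing, assuming the
# google-genai Python SDK. Schema fields (scene, shot, camera, action)
# are hypothetical examples of a storyboard entry.

STORYBOARD_SCHEMA = {
    "type": "ARRAY",
    "items": {
        "type": "OBJECT",
        "properties": {
            "scene": {"type": "INTEGER"},
            "shot": {"type": "STRING"},
            "camera": {"type": "STRING"},
            "action": {"type": "STRING"},
        },
        "required": ["scene", "shot", "camera", "action"],
    },
}

def build_storyboard_request(narrative: str) -> dict:
    """Assemble keyword arguments for a generate_content() call."""
    return {
        "model": "gemini-2.5-flash",
        "contents": "Break this story into a scene-by-scene storyboard:\n"
                    + narrative,
        "config": {
            # Forces the model to return JSON matching the schema.
            "response_mime_type": "application/json",
            "response_schema": STORYBOARD_SCHEMA,
        },
    }

request = build_storyboard_request("A detective steps into the rain.")
```

&lt;p&gt;With the google-genai SDK, the actual call would then be roughly &lt;code&gt;genai.Client().models.generate_content(**request)&lt;/code&gt;, and the parsed scenes come back as JSON conforming to the schema.&lt;/p&gt;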

&lt;p&gt;&lt;strong&gt;Imagen 4&lt;/strong&gt; (imagen-4.0-generate-001): This model was used to generate the cinematic style frames. The prompt was engineered to combine the storyboard action with a specific artistic style and even render text overlays, providing a true preview of the final look.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Veo 2&lt;/strong&gt; (veo-2.0-generate-001): The final step of the process uses Veo 2 to bring the story to life. The detailed prompts from the JSON blueprint are fed to the model to generate high-quality, coherent video scenes.&lt;/p&gt;
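
&lt;p&gt;A minimal sketch of how such a prompt could be assembled from a storyboard entry and the chosen style (the field names and wording are hypothetical):&lt;/p&gt;

```python
# Hypothetical prompt assembly: merge one storyboard entry with the
# user's chosen visual style into a single text prompt for image or
# video generation.

def compose_scene_prompt(scene: dict, style: str) -> str:
    """Combine style, shot, camera move, and action into one prompt."""
    return (
        f"{style} style. {scene['shot']}, {scene['camera']}: "
        f"{scene['action']}"
    )

scene = {
    "shot": "wide establishing shot",
    "camera": "slow dolly in",
    "action": "a lone figure crosses a rain-slicked street at night",
}
prompt = compose_scene_prompt(scene, "Film Noir")
```

&lt;p&gt;In the google-genai SDK, a string like this would then be passed to the video model via something like &lt;code&gt;client.models.generate_videos(model="veo-2.0-generate-001", prompt=prompt)&lt;/code&gt;.&lt;/p&gt;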

&lt;h2&gt;Multimodal Features&lt;/h2&gt;

&lt;p&gt;Director's Cut AI is fundamentally multimodal, creating a chain of transformations from one media type to another.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Image + Text to Text (Narrative Generation)&lt;/strong&gt;: The app's starting point is a multimodal prompt. It combines the visual information and mood from user-uploaded images with textual instructions (genre, length) to produce a cohesive narrative. This grounds the AI's creativity in the user's specific vision, making the resulting story feel personal and relevant.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Text to Image (Style Frame Visualization)&lt;/strong&gt;: The app translates descriptive text from the storyboard (shot type, camera angle, action) and the user's chosen aesthetic into rich, detailed still images. This is a crucial feedback loop that allows the user to visually confirm the creative direction before committing to the more time-intensive video generation step, bridging the gap between imagination and a concrete visual.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Text to Video (Cinematic Scene Generation)&lt;/strong&gt;: The culmination of the user's journey is the text-to-video feature. The structured text prompts from the JSON blueprint are converted into dynamic video clips. This powerful feature completes the creative process, transforming the entire plan into a tangible cinematic product.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
  </channel>
</rss>
