<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Bizzi Cole87</title>
    <description>The latest articles on DEV Community by Bizzi Cole87 (@bizzi_cole87_26ec228487d6).</description>
    <link>https://dev.to/bizzi_cole87_26ec228487d6</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2687687%2Fbd9673ad-be06-4578-aa51-60cb0a7320eb.png</url>
      <title>DEV Community: Bizzi Cole87</title>
      <link>https://dev.to/bizzi_cole87_26ec228487d6</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bizzi_cole87_26ec228487d6"/>
    <language>en</language>
    <item>
      <title>I Built a Voice-First AI Photo &amp; Document Editor with the Gemini Live API — Here's How</title>
      <dc:creator>Bizzi Cole87</dc:creator>
      <pubDate>Sun, 15 Mar 2026 22:04:07 +0000</pubDate>
      <link>https://dev.to/bizzi_cole87_26ec228487d6/i-built-a-voice-first-ai-photo-document-editor-with-gemini-live-heres-how-dab</link>
      <guid>https://dev.to/bizzi_cole87_26ec228487d6/i-built-a-voice-first-ai-photo-document-editor-with-gemini-live-heres-how-dab</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was created for the purposes of entering the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;There's a version of photo editing where you don't touch a single slider. You click the part of the image you want to change, say what you want out loud, and watch it happen. That's Say Edit.&lt;/p&gt;

&lt;p&gt;Over the past few weeks I built Say Edit — a voice-first AI workspace that lets you edit images and navigate documents entirely by speaking. It's powered by the &lt;strong&gt;Gemini Live API&lt;/strong&gt; and &lt;strong&gt;Gemini image generation&lt;/strong&gt;, and deployed on &lt;strong&gt;Google Cloud Run&lt;/strong&gt;. This article is a behind-the-scenes look at how I built it, what broke, and what surprised me.&lt;/p&gt;




&lt;h2&gt;The Core Idea&lt;/h2&gt;

&lt;p&gt;Most AI tools make you type. You open a chat window, describe what you want, wait for a response, copy it somewhere, repeat. I wanted to eliminate every one of those steps for two use cases I found genuinely painful:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Editing a photo&lt;/strong&gt; — you know exactly what you want to change, but you have to hunt through menus, masks, and sliders to get there.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reading a dense document&lt;/strong&gt; — you have a question, but finding the exact passage means scrolling through 80 pages.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The answer to both is the same: a persistent voice session that listens continuously, understands your intent, and acts immediately.&lt;/p&gt;




&lt;h2&gt;The Voice Loop — Gemini Live API&lt;/h2&gt;

&lt;p&gt;The foundation of Say Edit is a bi-directional WebSocket with the &lt;strong&gt;Gemini Live API&lt;/strong&gt; (&lt;code&gt;gemini-2.5-flash-native-audio-preview-12-2025&lt;/code&gt;). This is not a push-to-talk button. The model listens the entire time you're in the workspace.&lt;/p&gt;
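
&lt;p&gt;For reference, here's roughly how such a session gets opened with the &lt;code&gt;@google/genai&lt;/code&gt; SDK — a minimal sketch of the setup, not Say Edit's exact code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { GoogleGenAI, Modality } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: import.meta.env.VITE_GEMINI_API_KEY });

const session = await ai.live.connect({
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  config: { responseModalities: [Modality.AUDIO] },
  callbacks: {
    onmessage: (msg) =&amp;gt; { /* audio chunks, tool calls, interruption signals */ },
    onerror: (e) =&amp;gt; console.error(e),
    onclose: () =&amp;gt; { /* surface a reconnect prompt */ },
  },
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;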

&lt;p&gt;Here's what the audio pipeline looks like on the browser side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;audioCtx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createMediaStreamSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;audioCtx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createScriptProcessor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;audioCtx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onaudioprocess&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;float32&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;inputBuffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getChannelData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// Float32 → Int16 → base64&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;int16&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Int16Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;int16&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;32768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32767&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;32768&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendRealtimeInput&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;media&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;btoa&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromCharCode&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Uint8Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;int16&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;))),&lt;/span&gt; &lt;span class="na"&gt;mimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;audio/pcm;rate=16000&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;16kHz PCM, streamed continuously. The model responds with audio at 24kHz, which I queue through &lt;code&gt;AudioBufferSourceNode&lt;/code&gt; for gapless playback.&lt;/p&gt;
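
&lt;p&gt;The playback side is worth sketching too (the variable names here are mine): convert each 24kHz chunk back to float samples, wrap it in an &lt;code&gt;AudioBuffer&lt;/code&gt;, and schedule it at a running cursor so chunks play back-to-back:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;function pcm16ToFloat32(int16: Int16Array): Float32Array {
  return Float32Array.from(int16, (s) =&amp;gt; s / 32768);
}

let playCursor = 0; // AudioContext time at which the next chunk should start

function enqueuePcm24k(int16: Int16Array, ctx: AudioContext) {
  const float32 = pcm16ToFloat32(int16);
  const buffer = ctx.createBuffer(1, float32.length, 24000); // 24kHz mono
  buffer.copyToChannel(float32, 0);
  const src = ctx.createBufferSource();
  src.buffer = buffer;
  src.connect(ctx.destination);
  playCursor = Math.max(playCursor, ctx.currentTime);
  src.start(playCursor); // each chunk starts exactly where the last one ends
  playCursor += buffer.duration;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;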

&lt;p&gt;The part that changed how I think about this kind of app: &lt;strong&gt;interruption&lt;/strong&gt;. When you speak while the model is responding, the Live API sends &lt;code&gt;serverContent.interrupted&lt;/code&gt;. I clear the audio queue instantly and the model re-listens. No waiting. That single behavior makes the whole thing feel like talking to a person rather than waiting on a response.&lt;/p&gt;
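
&lt;p&gt;The handler is only a few lines — a sketch of the shape (&lt;code&gt;activeSources&lt;/code&gt; is my name for the list of scheduled chunks):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;const activeSources: { stop(): void }[] = []; // every scheduled audio chunk
let playCursor = 0;

function handleServerMessage(message: any) {
  if (message.serverContent?.interrupted) {
    // The user spoke over the model: flush the queue instantly.
    for (const src of activeSources) {
      try { src.stop(); } catch { /* node may have already ended */ }
    }
    activeSources.length = 0;
    playCursor = 0; // the next response plays immediately
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;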




&lt;h2&gt;Image Editing — Hotspot + Voice + Gemini Image Generation&lt;/h2&gt;

&lt;p&gt;The image editor works in three steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Click&lt;/strong&gt; anywhere on the image. The frontend records the pixel coordinates &lt;code&gt;(x, y)&lt;/code&gt; as a "hotspot" and renders a crosshair.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speak&lt;/strong&gt; your edit. The Gemini Live session calls the &lt;code&gt;edit_image_region&lt;/code&gt; tool with your spoken instruction and the hotspot coordinates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch&lt;/strong&gt; the edit apply. The current image is sent as base64 to &lt;code&gt;gemini-3.1-flash-image-preview&lt;/code&gt; alongside the prompt and coordinates. The result is pushed onto the history stack as the new current frame.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Tool definition registered with the Live session&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;edit_image_region&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Edit a specific region of the current image.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OBJECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nl"&gt;edit_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;STRING&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="nx"&gt;hotspot_x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NUMBER&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="nx"&gt;hotspot_y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NUMBER&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="nx"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;edit_prompt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hotspot_x&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hotspot_y&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The edit call to Gemini image generation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateContent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gemini-3.1-flash-image-preview&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;inlineData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;mimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;base64&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Edit: "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;editPrompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;". Focus around pixel (x: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, y: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;y&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;). Return ONLY the edited image.`&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;responseModalities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;IMAGE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TEXT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every edit is non-destructive — pushed onto a history stack with full undo/redo and a hold-to-compare button that fades back to the original.&lt;/p&gt;
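
&lt;p&gt;The stack logic itself is tiny — this is a sketch of the idea, not the app's literal code. The key property: an edit after an undo discards the redo branch, exactly like a text editor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;function pushEdit&amp;lt;T&amp;gt;(history: T[], index: number, edited: T): [T[], number] {
  const next = [...history.slice(0, index + 1), edited]; // drop any redo branch
  return [next, next.length - 1];
}

const undo = (index: number) =&amp;gt; Math.max(0, index - 1);
const redo = (history: unknown[], index: number) =&amp;gt;
  Math.min(history.length - 1, index + 1);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;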

&lt;h3&gt;The Stale Closure Problem&lt;/h3&gt;

&lt;p&gt;This one burned me for a while. The Live API's &lt;code&gt;onmessage&lt;/code&gt; callback is registered once at session start, so any React state it closes over is frozen at its value from that first render. In practice, &lt;code&gt;history[historyIndex]&lt;/code&gt; always pointed to the original image, no matter how many edits had been applied.&lt;/p&gt;

&lt;p&gt;The fix was to maintain refs that mirror the state and always read from those inside the callback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;historyRef&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;useRef&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;File&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nx"&gt;initialFile&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;historyIndexRef&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;useRef&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;historyRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// kept in sync on every render&lt;/span&gt;
&lt;span class="nx"&gt;historyIndexRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;historyIndex&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Inside the async tool handler — always gets the live value&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;imageFile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;historyRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;historyIndexRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is now a fixture of how I build anything on top of the Live API.&lt;/p&gt;




&lt;h2&gt;Document Navigation — Spatial Search + Live Highlights&lt;/h2&gt;

&lt;p&gt;The document workspace is a different beast. When you ask a question, the AI shouldn't just answer — it should show you exactly where in the document the answer lives.&lt;/p&gt;

&lt;p&gt;The pipeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Ingestion (NestJS backend on Google Cloud Run)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you upload a PDF, the backend runs it through &lt;code&gt;pdfjs-dist&lt;/code&gt;, extracts text items with their transform matrices, groups words into lines by Y-coordinate proximity, and merges lines into sentence-level chunks. Every chunk stores a tight bounding box &lt;code&gt;[x, y, width, height]&lt;/code&gt; alongside the text.&lt;/p&gt;
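
&lt;p&gt;The line-grouping step looks roughly like this — a simplified sketch of the idea, with my own names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;type Item = { str: string; x: number; y: number };

// Items whose Y coordinates sit within a small tolerance belong to the same
// visual line; sort top-to-bottom, then left-to-right within a line.
function groupIntoLines(items: Item[], tolerance = 3): Item[][] {
  const sorted = [...items].sort((a, b) =&amp;gt; a.y - b.y || a.x - b.x);
  const lines: Item[][] = [];
  for (const item of sorted) {
    const line = lines[lines.length - 1];
    if (line &amp;amp;&amp;amp; Math.abs(line[0].y - item.y) &amp;lt; tolerance) line.push(item);
    else lines.push([item]);
  }
  return lines;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;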

&lt;p&gt;One gotcha: PDF coordinates use bottom-left origin, but the React PDF viewer uses top-left. Every Y coordinate gets flipped during ingestion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pageHeight&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;height&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each chunk is then embedded with &lt;code&gt;gemini-embedding-001&lt;/code&gt; and stored in &lt;strong&gt;Supabase pgvector&lt;/strong&gt; with an HNSW index for fast cosine-similarity search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Search during the Live session&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you ask a question, the Live session calls &lt;code&gt;search_document&lt;/code&gt;. The frontend hits the backend's &lt;code&gt;/query&lt;/code&gt; endpoint, which embeds your query and runs a cosine similarity search against the document's chunks. The top results come back with page numbers and bounding boxes.&lt;/p&gt;
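
&lt;p&gt;In sketch form — the &lt;code&gt;match_chunks&lt;/code&gt; RPC name and the surrounding variables are illustrative, not the project's real identifiers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Embed the spoken question with the same model used at ingestion time.
const { embeddings } = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: query,
});

// Cosine-similarity search over this document's chunks in pgvector.
const { data: matches } = await supabase.rpc('match_chunks', {
  query_embedding: embeddings[0].values,
  match_count: 5,
  doc_id: documentId,
});
// Each match carries { text, page, bbox: [x, y, width, height], similarity }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;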

&lt;p&gt;&lt;strong&gt;3. Spatial highlight&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model then calls &lt;code&gt;focus_document_section&lt;/code&gt; with the page and an array of bounding box coordinates. The frontend jumps the PDF viewer to the right page and renders yellow highlight overlays at the exact pixel positions of the relevant sentences:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Convert stored PDF-point bboxes to percentage overlays&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;highlight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;pageIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;left&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bx&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;pageWidthPx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;top&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;by&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;pageHeightPx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bw&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;pageWidthPx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bh&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;pageHeightPx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You hear the answer and see it highlighted on the page at the same moment.&lt;/p&gt;




&lt;h2&gt;The Compose Studio&lt;/h2&gt;

&lt;p&gt;Beyond single-image editing, Say Edit has a Compose Studio that merges two photos by voice:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Dress the person from A in the outfit from B."&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"Put the product on the background from B."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Both images are encoded as base64 and sent to Gemini image generation together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateContent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gemini-3.1-flash-image-preview&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;inlineData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;mimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;imgA&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;base64A&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;inlineData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;mimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;imgB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;base64B&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Composition request: "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;". Return ONLY the composed image.`&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;responseModalities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;IMAGE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TEXT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result lands in the same history stack as regular edits — so you can compose two images together and then keep refining the result by voice.&lt;/p&gt;




&lt;h2&gt;Infrastructure — Google Cloud Run&lt;/h2&gt;

&lt;p&gt;Both the frontend (React + Vite) and the backend (NestJS) are containerized and deployed to &lt;strong&gt;Google Cloud Run&lt;/strong&gt;. The Dockerfile for the frontend bakes the Vite environment variables at build time using &lt;code&gt;ARG&lt;/code&gt; injection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; VITE_GEMINI_API_KEY&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; VITE_BACKEND_URL&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"VITE_GEMINI_API_KEY=&lt;/span&gt;&lt;span class="nv"&gt;$VITE_GEMINI_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .env
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"VITE_BACKEND_URL=&lt;/span&gt;&lt;span class="nv"&gt;$VITE_BACKEND_URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .env
&lt;span class="k"&gt;RUN &lt;/span&gt;npx vite build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploy with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run deploy say-edit &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allow-unauthenticated&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set-build-env-vars&lt;/span&gt; &lt;span class="s2"&gt;"VITE_GEMINI_API_KEY=...,VITE_BACKEND_URL=..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Live API's interruption model is the whole game.&lt;/strong&gt; Every other voice interface I've used makes you wait. The ability to interrupt mid-sentence — backed by a proper server-side signal rather than a client-side hack — is what makes Say Edit feel like a real conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool design is UX design.&lt;/strong&gt; The &lt;code&gt;get_current_hotspot&lt;/code&gt; tool exists purely so users can say "edit this" without re-stating coordinates. Getting the tool schema right — what the model calls, when, and with what arguments — determines the quality of the interaction more than any UI element.&lt;/p&gt;
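
&lt;p&gt;For illustration, that hotspot tool can be as small as this (a sketch: the declaration shape mirrors the &lt;code&gt;edit_image_region&lt;/code&gt; tool earlier, and the handler names are mine):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;const getCurrentHotspotTool = {
  name: 'get_current_hotspot',
  description: 'Returns the (x, y) pixel the user last clicked on the image.',
  parameters: { type: 'OBJECT' as const, properties: {} }, // takes no arguments
};

// Client-side handler: the model calls this when the user says "edit this".
function handleGetCurrentHotspot(hotspot: { x: number; y: number } | null) {
  return hotspot
    ? { x: hotspot.x, y: hotspot.y }
    : { error: 'No hotspot selected. Ask the user to click the image first.' };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;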

&lt;p&gt;&lt;strong&gt;Refs are mandatory in Live API callbacks.&lt;/strong&gt; Any async callback registered at session start captures stale React state. The ref-mirror pattern is non-negotiable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PDF extraction is harder than it looks.&lt;/strong&gt; &lt;code&gt;pdfjs-dist&lt;/code&gt; gives you transform matrices, not sentences. The grouping and chunking pipeline is the most underestimated part of the whole system.&lt;/p&gt;




&lt;h2&gt;Try It&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://glyph-client-229364276486.us-central1.run.app/" rel="noopener noreferrer"&gt;https://glyph-client-229364276486.us-central1.run.app/&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Client repo:&lt;/strong&gt; &lt;a href="https://github.com/greatsage-raphael/say_edit" rel="noopener noreferrer"&gt;https://github.com/greatsage-raphael/say_edit&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Server repo:&lt;/strong&gt; &lt;a href="https://github.com/greatsage-raphael/say_edit_server" rel="noopener noreferrer"&gt;https://github.com/greatsage-raphael/say_edit_server&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Built with the Gemini Live API, Nano Banana, Google Cloud Run, and Supabase.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was created for the purposes of entering the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge&lt;/em&gt;&lt;/p&gt;

</description>
      <category>gemini</category>
      <category>ai</category>
      <category>gcp</category>
      <category>nanobanana</category>
    </item>
    <item>
      <title>Business Portfolio Generator</title>
      <dc:creator>Bizzi Cole87</dc:creator>
      <pubDate>Sun, 26 Jan 2025 22:15:12 +0000</pubDate>
      <link>https://dev.to/bizzi_cole87_26ec228487d6/business-portfolio-generator-1p3i</link>
      <guid>https://dev.to/bizzi_cole87_26ec228487d6/business-portfolio-generator-1p3i</guid>
      <description>&lt;p&gt;&lt;strong&gt;Business Portfolio Generator&lt;/strong&gt; 💼🌐&lt;br&gt;
This is a submission for the Agent.ai Challenge: Assembly of Agents&lt;br&gt;
What I Built&lt;br&gt;
An automated portfolio website generator combining Vercel's prompt generation with Code Genie's development capabilities. Users input basic business details to receive a professional website instantly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr75bvae4rf2k9agx4dl3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr75bvae4rf2k9agx4dl3.png" alt="Image description" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;div&gt;
  &lt;iframe src="https://loom.com/embed/6bf09dda292845a2baf8f4ea2f2ec8f2"&gt;
  &lt;/iframe&gt;
&lt;/div&gt;


&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Try it here: &lt;a href="https://agent.ai/agent/portfolio" rel="noopener noreferrer"&gt;https://agent.ai/agent/portfolio&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Process Flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User provides business name, industry, and description&lt;/li&gt;
&lt;li&gt;Vercel agent generates optimized prompts&lt;/li&gt;
&lt;li&gt;Code Genie transforms prompts into website code&lt;/li&gt;
&lt;li&gt;Complete portfolio website delivered in HTML&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Agent.ai Experience
&lt;/h2&gt;

&lt;p&gt;The development highlighted agent collaboration's potential for business automation. Combining prompt engineering with code generation created a streamlined website creation process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Achievements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Seamless integration between prompt and code generation&lt;/li&gt;
&lt;li&gt;Automated business-specific customization&lt;/li&gt;
&lt;li&gt;Rapid portfolio deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project demonstrates how agent assembly can simplify complex business tasks through AI collaboration.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>agentaichallenge</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Career Launcher: Improve your LinkedIn today</title>
      <dc:creator>Bizzi Cole87</dc:creator>
      <pubDate>Sun, 26 Jan 2025 20:02:55 +0000</pubDate>
      <link>https://dev.to/bizzi_cole87_26ec228487d6/career-launcher-improve-your-linked-in-today-4k45</link>
      <guid>https://dev.to/bizzi_cole87_26ec228487d6/career-launcher-improve-your-linked-in-today-4k45</guid>
      <description>&lt;h1&gt;
  
  
  Professional Development Hub 🚀👔
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;This is a submission for the Agent.ai Challenge: Assembly of Agents&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I created a powerful professional development agent that leverages multiple specialized agents to generate a comprehensive career enhancement package. By combining CareerGPS, Personal Portfolio, and LinkedIn Post Generator, this agent transforms a simple LinkedIn profile ID into a holistic career growth toolkit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzpbsenjdlgnlag1e3kx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzpbsenjdlgnlag1e3kx.png" alt="Image description" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;div&gt;
  &lt;iframe src="https://loom.com/embed/0e89c9cb4f62415092102f89f87e0de7"&gt;
  &lt;/iframe&gt;
&lt;/div&gt;


&lt;p&gt;Key features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Personalized career advice generation&lt;/li&gt;
&lt;li&gt;Automated personal portfolio website creation&lt;/li&gt;
&lt;li&gt;Tailored LinkedIn post drafting&lt;/li&gt;
&lt;li&gt;One-click professional branding solution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent addresses a critical need: helping professionals efficiently showcase their skills, receive targeted career guidance, and maintain a strong online presence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;You can experience the Professional Development Hub agent here: &lt;a href="https://agent.ai/agent/careerLauncher" rel="noopener noreferrer"&gt;https://agent.ai/agent/careerLauncher&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workflow Breakdown:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User enters LinkedIn Profile ID&lt;/li&gt;
&lt;li&gt;Select professional development themes&lt;/li&gt;
&lt;li&gt;CareerGPS generates personalized career advice&lt;/li&gt;
&lt;li&gt;Personal Portfolio agent creates a website&lt;/li&gt;
&lt;li&gt;LinkedIn Post Generator crafts an engaging post&lt;/li&gt;
&lt;li&gt;Outputs are generated in multiple formats&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Agent.ai Experience
&lt;/h2&gt;

&lt;p&gt;Building this agent was an exploration of AI's collaborative potential. The most exciting aspect was watching different specialized agents seamlessly work together to create a comprehensive solution.&lt;/p&gt;

&lt;p&gt;Delightful moments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Seeing how different agents complement each other&lt;/li&gt;
&lt;li&gt;Witnessing the transformation of a simple profile ID into multi-format professional content&lt;/li&gt;
&lt;li&gt;Realizing the power of agent assembly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Challenges included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensuring smooth communication between agents&lt;/li&gt;
&lt;li&gt;Maintaining consistency across different generated outputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key breakthrough was understanding that specialized agents, when strategically combined, can create solutions far more powerful than individual agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Professional Development Hub demonstrates how agent assembly can simplify complex professional branding and career development processes. It's not just a tool, but a career acceleration platform powered by AI collaboration.&lt;/p&gt;

&lt;p&gt;Thank you for the opportunity to showcase the potential of assembled agents! 🙏&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>agentaichallenge</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Podcaster: Your Daily News podcast.</title>
      <dc:creator>Bizzi Cole87</dc:creator>
      <pubDate>Sat, 25 Jan 2025 10:50:18 +0000</pubDate>
      <link>https://dev.to/bizzi_cole87_26ec228487d6/podcaster-38dd</link>
      <guid>https://dev.to/bizzi_cole87_26ec228487d6/podcaster-38dd</guid>
      <description>&lt;h1&gt;
  
  
  Daily News Podcast 🎙️📰
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://agent.ai" rel="noopener noreferrer"&gt;Agent.ai&lt;/a&gt; Challenge: Productivity-Pro Agent&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I created an AI-powered Daily News Podcast agent that automatically generates a concise, up-to-date audio podcast on any global topic within minutes. The agent transforms the complex world of current events into an easily digestible audio experience, making staying informed effortless and convenient.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmuj113ky9jlwkq413mgf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmuj113ky9jlwkq413mgf.png" alt="Image description" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;div&gt;
  &lt;iframe src="https://loom.com/embed/05867f6f02674830aec731ad5311c59a"&gt;
  &lt;/iframe&gt;
&lt;/div&gt;


&lt;p&gt;Key features of the Daily News Podcast agent include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamic topic selection based on user input&lt;/li&gt;
&lt;li&gt;Real-time news aggregation from Google News&lt;/li&gt;
&lt;li&gt;AI-powered script generation using the latest news data&lt;/li&gt;
&lt;li&gt;Automatic audio podcast creation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why did I build this? In our fast-paced world, staying informed is crucial, but finding time to read extensive news articles can be challenging. This agent provides a quick, accessible way to consume current events during commutes, workouts, or any spare moment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;You can experience the Daily News Podcast agent here: &lt;a href="https://agent.ai/agent/dailynewspodcast" rel="noopener noreferrer"&gt;https://agent.ai/agent/dailynewspodcast&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workflow Breakdown:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User selects or inputs a topic (default: "Rapidly changing global alliances")&lt;/li&gt;
&lt;li&gt;Agent retrieves latest Google News results for the topic&lt;/li&gt;
&lt;li&gt;GPT-4o generates a professional podcast script from the news data&lt;/li&gt;
&lt;li&gt;Audio is generated from the script&lt;/li&gt;
&lt;li&gt;Podcast is output in an automated format&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Agent.ai Experience
&lt;/h2&gt;

&lt;p&gt;Building this agent on the Agent.ai platform was an exciting journey of creativity and technological exploration. The platform's intuitive interface made complex automation feel surprisingly straightforward.&lt;/p&gt;

&lt;p&gt;The most delightful moments were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Seeing the agent seamlessly pull real-time news&lt;/li&gt;
&lt;li&gt;Watching an AI transform raw news data into a coherent podcast script&lt;/li&gt;
&lt;li&gt;Hearing the generated audio podcast come to life&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Challenges included fine-tuning the script generation to maintain a natural, engaging tone and ensuring the audio output sounds conversational.&lt;/p&gt;

&lt;p&gt;The key to success was breaking down the podcast generation into clear, modular steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Input gathering&lt;/li&gt;
&lt;li&gt;News retrieval&lt;/li&gt;
&lt;li&gt;Script generation&lt;/li&gt;
&lt;li&gt;Audio creation&lt;/li&gt;
&lt;li&gt;Output formatting&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Daily News Podcast agent represents a small but meaningful step towards making information consumption more accessible, personalized, and convenient. It showcases how AI can transform how we interact with news and stay informed.&lt;/p&gt;

&lt;p&gt;Thank you for the opportunity to participate in this challenge! 🙏&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>agentaichallenge</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Check out my new app. Leave a like if you love it</title>
      <dc:creator>Bizzi Cole87</dc:creator>
      <pubDate>Tue, 21 Jan 2025 14:43:43 +0000</pubDate>
      <link>https://dev.to/bizzi_cole87_26ec228487d6/check-out-my-new-app-leave-a-like-if-you-love-it-4pbf</link>
      <guid>https://dev.to/bizzi_cole87_26ec228487d6/check-out-my-new-app-leave-a-like-if-you-love-it-4pbf</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/bizzi_cole87_26ec228487d6" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2687687%2Fbd9673ad-be06-4578-aa51-60cb0a7320eb.png" alt="bizzi_cole87_26ec228487d6"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/bizzi_cole87_26ec228487d6/habits-a-modern-habit-tracking-application-3cea" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Habits: A Modern Habit Tracking Application&lt;/h2&gt;
      &lt;h3&gt;Bizzi Cole87 ・ Jan 16&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#devchallenge&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#githubchallenge&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#webdev&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>webdev</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Habits: A Modern Habit Tracking Application</title>
      <dc:creator>Bizzi Cole87</dc:creator>
      <pubDate>Thu, 16 Jan 2025 10:00:05 +0000</pubDate>
      <link>https://dev.to/bizzi_cole87_26ec228487d6/habits-a-modern-habit-tracking-application-3cea</link>
      <guid>https://dev.to/bizzi_cole87_26ec228487d6/habits-a-modern-habit-tracking-application-3cea</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github"&gt;GitHub Copilot Challenge&lt;/a&gt;: New Beginnings&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I created &lt;a href="https://habits-ivory.vercel.app/" rel="noopener noreferrer"&gt;"Habits"&lt;/a&gt;, a minimalist yet powerful habit tracking application that helps users build and maintain positive habits throughout the year. The application features a clean, intuitive interface that allows users to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track multiple habits across a 52-week timeline&lt;/li&gt;
&lt;li&gt;Break down habits into actionable subtasks with progress tracking&lt;/li&gt;
&lt;li&gt;Celebrate completion with a delightful confetti animation&lt;/li&gt;
&lt;li&gt;Customize habits with different colors for visual organization&lt;/li&gt;
&lt;li&gt;Mark daily completions with an elegant checkbox system&lt;/li&gt;
&lt;li&gt;Navigate through weeks effortlessly&lt;/li&gt;
&lt;li&gt;Add and remove habits dynamically&lt;/li&gt;
&lt;li&gt;Store progress persistently using Supabase&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project uses React with TypeScript, integrating modern UI components from shadcn/ui, and features responsive design principles for a seamless experience across devices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc1ssz739fgq7ychumdyp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc1ssz739fgq7ychumdyp.png" alt="A user finishing the tasks for gym" width="800" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Repo
&lt;/h2&gt;

&lt;p&gt;Check out the source code on GitHub: &lt;a href="https://github.com/greatsage-raphael/habits" rel="noopener noreferrer"&gt;https://github.com/greatsage-raphael/habits&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How GitHub Copilot Made It All Possible
&lt;/h2&gt;

&lt;p&gt;In this project, GitHub Copilot was more than just a tool. It was like having a pair programmer with me throughout the day. Here's how Copilot made a difference:&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Writing &amp;amp; Suggestions
&lt;/h3&gt;

&lt;p&gt;Copilot was instrumental in accelerating development by suggesting smart, concise code snippets. Some key areas where it excelled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Type definitions for Habit and HabitLog interfaces&lt;/li&gt;
&lt;li&gt;State management setup with useState hooks&lt;/li&gt;
&lt;li&gt;Database integration with Supabase&lt;/li&gt;
&lt;li&gt;Complex UI grid layouts with Tailwind CSS&lt;/li&gt;
&lt;/ul&gt;
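&lt;p&gt;For a rough sense of those shapes, here is what the interfaces and the toggle update might look like (the field names are assumptions based on the features described above, not the repo's actual types):&lt;/p&gt;

```typescript
// Hypothetical sketch of the Habit and HabitLog shapes Copilot helped
// scaffold; field names are illustrative, not the repo's real types.
interface Habit {
  id: string;
  name: string;
  color: string;      // per-habit color for visual organization
  subtasks: string[]; // actionable subtasks under the habit
}

interface HabitLog {
  habitId: string;
  week: number;       // position on the 52-week timeline
  day: number;        // 0..6 within the week
  completed: boolean;
}

// Toggling a day's completion is a pure update over the log list.
function toggleLog(logs: HabitLog[], habitId: string, week: number, day: number): HabitLog[] {
  const matches = (l: HabitLog) =>
    [l.habitId, l.week, l.day].join(":") === [habitId, week, day].join(":");
  const existing = logs.find(matches);
  if (existing) {
    return logs.map((l) => (matches(l) ? { ...l, completed: !l.completed } : l));
  }
  return [...logs, { habitId, week, day, completed: true }];
}
```

&lt;p&gt;Keeping the update pure like this makes it trivial to mirror each toggle into Supabase afterwards.&lt;/p&gt;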

&lt;h3&gt;
  
  
  Debugging Assistance
&lt;/h3&gt;

&lt;p&gt;Copilot helped identify and resolve several potential issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Suggested proper error handling in database operations&lt;/li&gt;
&lt;li&gt;Helped implement proper TypeScript types&lt;/li&gt;
&lt;li&gt;Provided solutions for state management edge cases&lt;/li&gt;
&lt;li&gt;Assisted with component lifecycle management&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Faster Development
&lt;/h3&gt;

&lt;p&gt;Thanks to Copilot's suggestions, I was able to develop the website quickly and efficiently. Some notable time-savings came from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auto-completing repetitive UI patterns&lt;/li&gt;
&lt;li&gt;Generating consistent styling patterns&lt;/li&gt;
&lt;li&gt;Suggesting optimal data structures&lt;/li&gt;
&lt;li&gt;Creating efficient database queries&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Inline Suggestions ⚡
&lt;/h3&gt;

&lt;p&gt;Copilot provided valuable inline suggestions for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Error handling patterns in database operations&lt;/li&gt;
&lt;li&gt;Consistent variable naming conventions&lt;/li&gt;
&lt;li&gt;Type definitions and interfaces&lt;/li&gt;
&lt;li&gt;UI component structure&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Model Switching 🔄
&lt;/h3&gt;

&lt;p&gt;I leveraged different AI models for specific tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code Completion: GitHub Copilot&lt;/li&gt;
&lt;li&gt;Documentation: Claude&lt;/li&gt;
&lt;li&gt;Debugging: GPT-4&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Common Prompts Used 🎯
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Function implementation
/implement habit toggle logic
/suggest state management pattern
/optimize database queries

# UI Components
/create responsive grid layout
/implement week navigation
/design habit row component
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Code Edits ✏️
&lt;/h3&gt;

&lt;p&gt;Copilot helped with several important code improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Refactored habit management functions&lt;/li&gt;
&lt;li&gt;Added proper TypeScript types&lt;/li&gt;
&lt;li&gt;Improved error handling&lt;/li&gt;
&lt;li&gt;Enhanced component documentation&lt;/li&gt;
&lt;li&gt;Optimized database queries&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  GitHub Models
&lt;/h2&gt;

&lt;p&gt;The project extensively used GitHub Copilot's code completion and suggestions. The model was particularly helpful in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generating TypeScript interfaces&lt;/li&gt;
&lt;li&gt;Suggesting optimal React hooks usage&lt;/li&gt;
&lt;li&gt;Creating efficient database queries&lt;/li&gt;
&lt;li&gt;Implementing proper error handling&lt;/li&gt;
&lt;li&gt;Structuring components effectively&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Technical Implementation
&lt;/h2&gt;

&lt;p&gt;The application is built using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Next.js with TypeScript for the frontend&lt;/li&gt;
&lt;li&gt;Supabase for data persistence&lt;/li&gt;
&lt;li&gt;shadcn/ui for UI components&lt;/li&gt;
&lt;li&gt;Tailwind CSS for styling&lt;/li&gt;
&lt;li&gt;Clerk for user authentication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time habit tracking&lt;/li&gt;
&lt;li&gt;Persistent storage&lt;/li&gt;
&lt;li&gt;Responsive design&lt;/li&gt;
&lt;li&gt;User authentication&lt;/li&gt;
&lt;li&gt;Custom color coding for habits&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building Habits for the New Beginnings challenge with GitHub Copilot demonstrated the power of AI-assisted development. The tool not only accelerated the development process but also helped maintain high code quality and consistency throughout the project.&lt;/p&gt;

&lt;p&gt;The habit tracker serves its purpose of helping users build positive habits while showcasing modern web development practices. The combination of TypeScript, React, and Supabase, aided by Copilot's suggestions, resulted in a robust and maintainable application.&lt;/p&gt;

&lt;p&gt;Future enhancements could include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data visualization for habit streaks&lt;/li&gt;
&lt;li&gt;Mobile applications&lt;/li&gt;
&lt;li&gt;Social features for accountability&lt;/li&gt;
&lt;li&gt;Custom reminder systems&lt;/li&gt;
&lt;li&gt;Progress sharing capabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project showcases how AI tools like GitHub Copilot can significantly enhance developer productivity while maintaining code quality and best practices.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>webdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>Evolution By Sound</title>
      <dc:creator>Bizzi Cole87</dc:creator>
      <pubDate>Wed, 15 Jan 2025 09:46:53 +0000</pubDate>
      <link>https://dev.to/bizzi_cole87_26ec228487d6/evolution-by-sound-453f</link>
      <guid>https://dev.to/bizzi_cole87_26ec228487d6/evolution-by-sound-453f</guid>
      <description>&lt;h2&gt;
  
  
  Evolution by Sound: Transitions and Transformations
&lt;/h2&gt;

&lt;p&gt;This project, &lt;a href="https://evolution-by-sound.vercel.app/" rel="noopener noreferrer"&gt;evolution by sound&lt;/a&gt;, is a dynamic web application that lets users upload audio files and experience their music in a new visual dimension: an interactive, visually captivating audio spectrum rendered with Three.js. It combines creative coding, modern web development, and AI assistance to deliver an engaging user experience.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Evolution by Sound&lt;/strong&gt; leverages Three.js and React to create a dynamic audio visualization application. Users can upload an audio file, and the app analyzes the audio's frequency data to drive a real-time, interactive visual experience. Customization options for colors, intensity, and animation speed further enhance the user’s connection to the music.&lt;/p&gt;
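&lt;p&gt;The mapping step can be sketched independently of Three.js: each frame, the browser's AnalyserNode yields a buffer of 0-255 frequency bins, and a normalized band average drives a shader uniform (the band boundaries a caller picks are illustrative, not the app's actual values):&lt;/p&gt;

```typescript
// Pure sketch of mapping analyser frequency data to a 0..1 intensity.
// AnalyserNode.getByteFrequencyData fills a Uint8Array of 0-255 bins;
// averaging a band and normalizing gives a value a uniform can consume.
function bandIntensity(bins: Uint8Array, start: number, end: number): number {
  let sum = 0;
  for (let i = start; i !== end; i++) sum += bins[i];
  return sum / ((end - start) * 255); // average bin level, normalized
}
```

&lt;p&gt;A render loop might then do something like &lt;code&gt;uniforms.uBass.value = bandIntensity(data, 0, 8)&lt;/code&gt; before each draw.&lt;/p&gt;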




&lt;h2&gt;
  
  
  Inspiration
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Abiogenesis&lt;/strong&gt;: The origin of life from non-living matter.&lt;/p&gt;

&lt;p&gt;Some studies suggest that acoustic forces may have played a role in prebiotic chemistry and the organization of materials that could have led to the first protocells. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Acoustic waves can create small droplets and concentrate molecules, potentially helping to form membrane-like structures.&lt;/li&gt;
&lt;li&gt;Sound waves can drive chemical reactions through cavitation (the formation and collapse of bubbles), which creates localized areas of high temperature and pressure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This inspired the concept of &lt;strong&gt;Evolution by Sound&lt;/strong&gt;, where the initially inactive, slow-moving patterns transform into an interactive audio spectrum visualization reminiscent of abiogenesis through sound. Once a user uploads a song, the visualizations evolve dynamically, symbolizing life emerging through the transformative power of sound.&lt;/p&gt;




&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Interactive Audio Spectrum Visualization&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A real-time visual representation of the uploaded audio, using Three.js shaders.&lt;/li&gt;
&lt;li&gt;Frequency data is analyzed and mapped to create an immersive visual experience.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;User Customization&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RGB sliders allow users to adjust the color palette.&lt;/li&gt;
&lt;li&gt;Intensity and speed controls provide further customization for the visualization.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Audio Upload and Playback&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users can upload an audio file, which is processed via an API to provide playback and analysis.&lt;/li&gt;
&lt;li&gt;A built-in audio player allows users to control playback seamlessly.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Clean and Responsive UI&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Components like &lt;code&gt;Button&lt;/code&gt; and &lt;code&gt;Slider&lt;/code&gt; make interactions intuitive.&lt;/li&gt;
&lt;li&gt;The app is optimized for responsiveness and usability across devices.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Live app: &lt;a href="https://evolution-by-sound.vercel.app/" rel="noopener noreferrer"&gt;Evolution by Sound&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Screenshots/GIFs
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgkw0vnceji1lwo0xam29.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgkw0vnceji1lwo0xam29.png" alt="Screenshot 1" width="800" height="397"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6gb8z6ppyk3c4heeshz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6gb8z6ppyk3c4heeshz7.png" alt="Screenshot 2" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj9pzpjtlz5x8tgu3c2hx.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj9pzpjtlz5x8tgu3c2hx.gif" alt="Gif 1" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Repo
&lt;/h2&gt;

&lt;p&gt;GitHub Repository: &lt;a href="https://github.com/greatsage-raphael/evolutionBySound" rel="noopener noreferrer"&gt;https://github.com/greatsage-raphael/evolutionBySound&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Copilot Experience
&lt;/h2&gt;

&lt;p&gt;This was my first time using Three.js, and GitHub Copilot played a pivotal role in developing this project. From conceptualization to execution, Copilot assisted in:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Component Development&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copilot generated much of the foundational functionality for both the &lt;code&gt;Cell&lt;/code&gt; and &lt;code&gt;InteractiveMusicWave&lt;/code&gt; components based on my prompts.&lt;/li&gt;
&lt;li&gt;By providing clear context and intent, I could rely on Copilot’s autocomplete to implement Three.js logic, which would have otherwise required significant manual learning.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Iterative Improvements&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copilot offered suggestions for optimizing shader logic and connecting audio analysis to the visuals.&lt;/li&gt;
&lt;li&gt;Edits and adjustments to prompts allowed me to refine functionality without needing to switch contexts frequently.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Learning and Experimentation&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copilot served as a guide while exploring Three.js, offering concise examples and snippets I could adapt for my use case.&lt;/li&gt;
&lt;li&gt;I used the chat feature and model switcher to refine more complex interactions, such as connecting audio data to shader uniforms.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  GitHub Models
&lt;/h2&gt;

&lt;p&gt;I utilized GitHub Copilot’s autocomplete and context-aware coding suggestions throughout the development process. While this submission didn’t directly involve prototyping LLM capabilities, it’s evident that Copilot’s integration into the workflow streamlined my progress and enhanced productivity.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Working on &lt;strong&gt;Evolution by Sound&lt;/strong&gt; has been an enriching experience that showcased the creative possibilities of combining AI, music, and visuals. GitHub Copilot significantly lowered the barrier to entry for using Three.js and enabled me to focus on creativity rather than being bogged down by the technical details.&lt;/p&gt;

&lt;p&gt;This project highlights how AI tools can empower developers to explore new domains, even with limited prior knowledge. I look forward to expanding this app further, perhaps by adding support for live audio streams or multiplayer audio-visual experiences.&lt;/p&gt;




&lt;p&gt;This project was developed independently.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>webdev</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
