<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ryven</title>
    <description>The latest articles on DEV Community by Ryven (@ryven).</description>
    <link>https://dev.to/ryven</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3824450%2Fb8cb52e6-41d0-4b91-9d8e-ad5523be3e90.png</url>
      <title>DEV Community: Ryven</title>
      <link>https://dev.to/ryven</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ryven"/>
    <language>en</language>
    <item>
      <title>Turn your impressions into software</title>
      <dc:creator>Ryven</dc:creator>
      <pubDate>Mon, 16 Mar 2026 16:06:52 +0000</pubDate>
      <link>https://dev.to/ryven/turn-your-impressions-into-software-1nfl</link>
      <guid>https://dev.to/ryven/turn-your-impressions-into-software-1nfl</guid>
      <description>&lt;h1&gt;
  
  
  I Stopped Typing Text Prompts and Started Talking and Sketching to My Code Editor
&lt;/h1&gt;

&lt;p&gt;Right now, the hottest way to build an app with AI is... typing.&lt;/p&gt;

&lt;p&gt;You open a chat window. Write a five-paragraph essay describing what you want. Hit enter. Pray.&lt;/p&gt;

&lt;p&gt;We went from writing code to writing &lt;em&gt;about&lt;/em&gt; code.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;Monet&lt;/strong&gt;, a real-time canvas where you &lt;strong&gt;talk&lt;/strong&gt; and &lt;strong&gt;draw&lt;/strong&gt;, and it builds actual, working React apps live in front of you. No typing. No prompts. Just your voice and a sketch.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This post was created as part of my participation in the &lt;a href="https://geminiliveagentchallenge.devpost.com/" rel="noopener noreferrer"&gt;Gemini Live Agent Challenge&lt;/a&gt;. #GeminiLiveAgentChallenge&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The "Wait, What?" Demo
&lt;/h2&gt;

&lt;p&gt;Before I explain anything, let me just show you.&lt;/p&gt;

&lt;p&gt;My nephew's birthday was coming up. He likes dinosaurs. Instead of buying him a book, I opened Monet and said:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Hey Monet, let's build an interactive storybook about a little dinosaur named Boba who goes on an adventure to find a golden egg."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then I drew a rough dinosaur on the canvas.&lt;/p&gt;

&lt;p&gt;Monet generated a polished cartoon character from my terrible sketch, built a multi-page storybook UI in React, and I clicked through the pages, all while having a voice conversation about what to change next.&lt;/p&gt;

&lt;p&gt;One minute. Done.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7r38ea3ggztfirrh5i49.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7r38ea3ggztfirrh5i49.png" alt=" " width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then my friend said he was bored in class, so I made him a space shooter game.&lt;/p&gt;

&lt;p&gt;Drew a spaceship. Said "make the aliens zigzag." Interrupted Monet mid-sentence to add explosions.&lt;/p&gt;

&lt;p&gt;Because explosions are essential.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5pyacc733gmj42ztlv79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5pyacc733gmj42ztlv79.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Playable. In the browser. Built from a conversation and some doodles.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Voice + Sketch?
&lt;/h2&gt;

&lt;p&gt;Text prompts are &lt;em&gt;lossy&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Try describing a layout in words:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Put the image on the left, the text on the right, with a card below that spans the full width, but not on mobile where it should stack."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By the time you've typed that, you could've just... drawn it. In two seconds. With a squiggly line.&lt;/p&gt;

&lt;p&gt;And voice? Voice is how humans naturally explain things. You don't type instructions to a colleague at a whiteboard. You &lt;em&gt;talk&lt;/em&gt; and &lt;em&gt;point&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Monet combines both. You speak your ideas while sketching on a canvas, and the AI sees everything at once: your voice, your drawings, your reference images.&lt;/p&gt;

&lt;p&gt;Multimodal context turns out to be way more powerful than any single input alone.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fan94zhuyp20y008rwf9c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fan94zhuyp20y008rwf9c.png" alt=" " width="800" height="729"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent 1: The Orchestrator
&lt;/h3&gt;

&lt;p&gt;The brain. Runs on &lt;strong&gt;Gemini Live 2.5 Flash&lt;/strong&gt; with native audio, using BIDI streaming for real-time voice I/O.&lt;/p&gt;

&lt;p&gt;It listens to you speak, sees your canvas, processes uploaded images, and decides what to do. It follows a &lt;strong&gt;plan-then-approve&lt;/strong&gt; workflow: it always tells you what it's about to build and waits for your "go ahead."&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent 2: The Code Builder
&lt;/h3&gt;

&lt;p&gt;Powered by &lt;strong&gt;Gemini 3 Flash&lt;/strong&gt;. Generates and edits React + TypeScript + Tailwind files using ADK tool calls: &lt;code&gt;write_file&lt;/code&gt;, &lt;code&gt;edit_file&lt;/code&gt;, &lt;code&gt;read_file&lt;/code&gt;, &lt;code&gt;list_files&lt;/code&gt;, &lt;code&gt;delete_file&lt;/code&gt;.&lt;/p&gt;
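Those five tools map naturally onto a workspace abstraction. Here's a minimal in-memory sketch of the shape of that surface (these are not the ADK tool signatures, just the idea):

```python
class Workspace:
    """In-memory file store exposing the five operations the Code Builder calls."""

    def __init__(self):
        self.files: dict[str, str] = {}

    def write_file(self, path: str, content: str) -> None:
        self.files[path] = content

    def edit_file(self, path: str, old: str, new: str) -> None:
        # Targeted find-and-replace edit: much cheaper than regenerating a whole file.
        self.files[path] = self.files[path].replace(old, new, 1)

    def read_file(self, path: str) -> str:
        return self.files[path]

    def list_files(self) -> list[str]:
        return sorted(self.files)

    def delete_file(self, path: str) -> None:
        del self.files[path]
```

The split between `write_file` and `edit_file` is what makes the fast mode viable: small targeted edits are far fewer output tokens than rewriting files.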

&lt;p&gt;Has a fast mode using &lt;strong&gt;Gemini 3.1 Flash Lite&lt;/strong&gt; for simpler changes. Speed matters when someone is literally watching.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent 3: The Image Artist
&lt;/h3&gt;

&lt;p&gt;Draw rough. Get polished. The Image Agent (running &lt;strong&gt;Gemini 3.1 Flash Image&lt;/strong&gt;) treats your sketch as a &lt;em&gt;compositional guide&lt;/em&gt;, not a literal blueprint. Your terrible stick figure becomes a beautiful illustration.&lt;/p&gt;




&lt;h2&gt;
  
  
  Streaming Tools: The Secret Sauce
&lt;/h2&gt;

&lt;p&gt;Nobody warns you about this when building voice agents:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool calls block the conversation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agent calls a tool. Voice goes silent. User stares at a spinner. 10-20 seconds. In a voice-first experience, that silence is &lt;em&gt;death&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Our fix: &lt;strong&gt;streaming tools&lt;/strong&gt;. The orchestrator delegates to sub-agents, but the conversation keeps going. Monet narrates what it's doing, acknowledges your input, takes new instructions, all while code generates in the background.&lt;/p&gt;
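The core of the pattern is just concurrency: the long-running sub-agent emits progress events instead of blocking, and the orchestrator narrates while it waits. A stdlib asyncio sketch of that idea (function names are illustrative, and `asyncio.sleep(0)` stands in for real model calls):

```python
import asyncio

async def generate_code(spec: str, events: asyncio.Queue) -> None:
    """Long-running sub-agent: emits progress events instead of blocking the voice loop."""
    for step in ("scaffolding", "writing components", "wiring state"):
        await asyncio.sleep(0)  # stands in for real model/tool calls
        await events.put(f"[code] {step}")
    await events.put("[code] done")

async def orchestrate(spec: str) -> list[str]:
    events: asyncio.Queue = asyncio.Queue()
    # Fire-and-monitor: the tool call does NOT block the conversation.
    task = asyncio.create_task(generate_code(spec, events))
    transcript = ["(voice) Okay, building that now. Keep talking to me."]
    while True:
        msg = await events.get()
        transcript.append(msg)  # the orchestrator can narrate each progress event
        if msg == "[code] done":
            break
    await task
    return transcript
```

In the real system the "narration" is live audio and the queue carries tool events, but the structure is the same: conversation and generation are concurrent tasks, not sequential steps.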

&lt;p&gt;This is the difference between a collaborator and a chatbot.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;[Diagram: the streaming tool flow. The orchestrator speaks to the user while the Code Agent generates in parallel. The conversation never stops.]&lt;/em&gt;&lt;/p&gt;





&lt;h2&gt;
  
  
  The Canvas is the Interface
&lt;/h2&gt;

&lt;p&gt;We use &lt;strong&gt;tldraw&lt;/strong&gt; for the freehand canvas. Blue pen annotations become spatial context for the AI.&lt;/p&gt;

&lt;p&gt;This unlocks interactions that are impossible with text:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Circle&lt;/strong&gt; an element and say "make this bigger"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Draw&lt;/strong&gt; a rough 3-column layout and say "put cards here"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sketch&lt;/strong&gt; a rough scene and say "generate this as an image"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drop in&lt;/strong&gt; a screenshot and say "make it look like this"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Vague gestures become precise spatial context.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwx2tmailxt05vuwjv1k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwx2tmailxt05vuwjv1k.png" alt=" " width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Things That Broke
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Voice UX is unforgiving.&lt;/strong&gt; Text prompts let you edit and rephrase. Voice is real-time. &lt;em&gt;"Make it, uh, like... bigger? The header part"&lt;/em&gt; needs to just work. The plan-then-approve workflow saved us.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Silence during generation.&lt;/strong&gt; Streaming tools fixed it, but required rethinking the entire tool execution model. The default "call tool, wait, resume" pattern doesn't work for voice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multimodal state coordination.&lt;/strong&gt; Voice, canvas, and images arrive simultaneously. Getting fresh canvas state to the code agent at execution time required a custom live state registry outside ADK's default state management.&lt;/p&gt;
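The registry idea is simple: state gets published continuously, and tools take a snapshot at execution time rather than capturing it when the tool call was issued. A minimal thread-safe sketch (class and method names are illustrative, not ADK APIs):

```python
import threading

class LiveStateRegistry:
    """Holds the latest canvas/image state so a tool reads it at execution time,
    not at the (possibly stale) moment the tool call was issued."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state: dict[str, object] = {}

    def publish(self, key: str, value: object) -> None:
        with self._lock:
            self._state[key] = value

    def snapshot(self, key: str):
        with self._lock:
            return self._state.get(key)

registry = LiveStateRegistry()
registry.publish("canvas", {"shapes": 3})
# The user keeps drawing while the code agent spins up...
registry.publish("canvas", {"shapes": 4})
# ...and a tool running seconds later still sees the freshest canvas.
```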




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Sketching is underrated.&lt;/strong&gt; The moment you circle something and say "change this," you realize text was never the right interface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multimodal &amp;gt; sum of parts.&lt;/strong&gt; Voice alone is vague. Canvas alone is silent. Together, the AI gets far richer understanding than either alone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ADK handles the plumbing.&lt;/strong&gt; &lt;code&gt;Runner&lt;/code&gt; and &lt;code&gt;LiveRequestQueue&lt;/code&gt; managed concurrent tools, session state, and streaming. I focused on the product.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streaming is non-negotiable for voice.&lt;/strong&gt; BIDI streaming + binary WebSocket frames keep latency low enough for natural conversation. Any buffering breaks the experience.&lt;/p&gt;
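Binary frames matter because base64-in-JSON adds both bytes and encode/decode latency on every audio chunk. The frame layout below is purely illustrative (a 4-byte sequence number plus 4-byte payload length prefixing raw 16-bit PCM), not the actual wire format:

```python
import struct

def pack_audio_frame(seq: int, pcm: bytes) -> bytes:
    """Prefix a raw PCM chunk with a tiny binary header for a WebSocket binary frame."""
    return struct.pack("<II", seq, len(pcm)) + pcm

def unpack_audio_frame(frame: bytes) -> tuple[int, bytes]:
    seq, n = struct.unpack_from("<II", frame, 0)
    return seq, frame[8:8 + n]
```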




&lt;p&gt;If you're building voice-first AI, I hope this gives you useful patterns. The biggest lesson?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stop thinking in text boxes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2rxpubqrml5g2y2f3jr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2rxpubqrml5g2y2f3jr.png" alt="Turn your impressions into software" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built for the &lt;a href="https://geminiliveagentchallenge.devpost.com/" rel="noopener noreferrer"&gt;Gemini Live Agent Challenge&lt;/a&gt;. #GeminiLiveAgentChallenge&lt;/em&gt;&lt;/p&gt;

</description>
      <category>geminiliveagentchallenge</category>
      <category>vibecoding</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
