<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: S M Tahosin</title>
    <description>The latest articles on DEV Community by S M Tahosin (@tahosin).</description>
    <link>https://dev.to/tahosin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3886453%2F0f012a95-ad46-4c17-97e8-125ec8b4978d.png</url>
      <title>DEV Community: S M Tahosin</title>
      <link>https://dev.to/tahosin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tahosin"/>
    <language>en</language>
    <item>
      <title>The Most Underrated Announcement from Google I/O 2026 Was Buried in a 90-Second Demo</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Thu, 21 May 2026 20:22:22 +0000</pubDate>
      <link>https://dev.to/tahosin/the-most-underrated-announcement-from-google-io-2026-was-buried-in-a-90-second-demo-550</link>
      <guid>https://dev.to/tahosin/the-most-underrated-announcement-from-google-io-2026-was-buried-in-a-90-second-demo-550</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-io-writing-2026-05-19"&gt;Google I/O Writing Challenge&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I watched the Google I/O 2026 keynote twice.&lt;/p&gt;

&lt;p&gt;First time, I got swept up in the shiny stuff. Gemini 3.5 Flash benchmarks. Veo 3 generating videos that look disturbingly real. Gemini Omni doing that multimodal physics thing. Cool. Expected. The usual I/O sugar rush that gets 50,000 retweets and fades by Thursday.&lt;/p&gt;

&lt;p&gt;Second time through, I caught something different.&lt;/p&gt;

&lt;p&gt;About 40 minutes into the developer keynote, sandwiched between the Jules GA announcement and a Stitch demo, there was maybe 90 seconds on something called the &lt;strong&gt;Managed Agents API&lt;/strong&gt;. The presenter dropped one line that made me hit pause and rewind.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Deploy an autonomous agent that reasons, writes code, browses the web, and executes in a secure sandbox. One API call."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I closed every other tab. Pulled up the docs. Started writing code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 19-Day Problem
&lt;/h2&gt;

&lt;p&gt;Here's some context. If you've tried building anything with AI agents in the past year, you know the drill. And by "drill" I mean "weeks of suffering."&lt;/p&gt;

&lt;p&gt;Say you want an agent that takes a GitHub issue, reads the codebase, writes a fix, runs tests, and opens a PR. Sounds straightforward, right? In reality, you're wiring up five services, spinning up sandboxed containers, managing auth, building tool-call routing, writing health checks, and setting up network policies so your agent doesn't accidentally nuke production at 3am on a Saturday.&lt;/p&gt;

&lt;p&gt;Last month I built an internal bot that triages support tickets. Took about three weeks. The actual AI logic? One day. The other 19 days were pure infrastructure. Docker config. Sandbox isolation with gVisor. Network policies. Timeout handling. Health checks. Retry logic.&lt;/p&gt;

&lt;p&gt;Nineteen days of plumbing. One day of thinking.&lt;/p&gt;

&lt;p&gt;That ratio is broken. And this API just fixed it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Weeks to Eleven Lines
&lt;/h2&gt;

&lt;p&gt;I took that same support ticket bot and rewired it on the Managed Agents API. Not a demo version. The same bot. Same capabilities.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;interaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;antigravity-preview-05-2026&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;remote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a support ticket triage agent. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read the following ticket, classify its severity, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;identify the affected component from the codebase, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;and draft a response with a proposed fix.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ticket: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ticket_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Eleven lines. No Docker. No Kubernetes. No sandbox config.&lt;/p&gt;

&lt;p&gt;The API spins up a fresh, isolated Linux environment, loads the agent runtime, runs your task, hands back the result, and destroys the sandbox. Done.&lt;/p&gt;

&lt;p&gt;Here's what that looked like in practice:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Old Setup&lt;/th&gt;
&lt;th&gt;Managed Agents API&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time to build&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3 weeks&lt;/td&gt;
&lt;td&gt;1 afternoon&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lines of infra code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~2,400&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lines of agent logic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~180&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dependencies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Docker, gVisor, Redis, nginx&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;google-genai&lt;/code&gt; pip package&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintenance burden&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Container updates, health checks, scaling&lt;/td&gt;
&lt;td&gt;None (Google's problem)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I stared at my screen for a solid minute when it worked. Not because the output was flawless (it wasn't). Because I'd just thrown away three weeks of infrastructure code.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Google Actually Built Under the Hood
&lt;/h2&gt;

&lt;p&gt;When you hit &lt;code&gt;interactions.create&lt;/code&gt;, four things happen:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sandbox provisioning.&lt;/strong&gt; Google fires up an isolated Linux VM. Fresh filesystem every time. No leftover state from previous runs. Network access is off by default, opt-in only. This alone used to cost me a week of Docker and gVisor wrestling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent harness boots up.&lt;/strong&gt; This is the exact same runtime that powers Jules and the Antigravity desktop app. Not a watered-down version. Same thing. Every improvement Google makes to Jules? Your managed agents get it too.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reasoning loop.&lt;/strong&gt; The agent reads your input, builds a plan, starts executing. Writing files. Running code. Hitting the web if you've turned that on. There's a "critic" layer baked in that catches logic errors before returning output. Think of it like a built-in code reviewer that runs before every response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cleanup.&lt;/strong&gt; Interaction finishes, sandbox gets nuked, you get the result plus any files the agent created. Thirty seconds to a few minutes total.&lt;/p&gt;
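
&lt;p&gt;To make those four steps concrete, here is a rough local analogy in plain Python. It is not Google's implementation and it is not a security boundary: a temporary directory stands in for the VM, and &lt;code&gt;run_in_scratch_dir&lt;/code&gt; is a name I made up for illustration. It only mirrors the provision, run, collect, destroy shape.&lt;/p&gt;

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_scratch_dir(script: str, timeout_seconds: int = 30) -> dict:
    """Run a Python script in a throwaway directory and return its output.

    Hypothetical stand-in for the managed sandbox lifecycle: a temp dir is
    NOT real isolation, it only mirrors the provision/run/collect/destroy flow.
    """
    # 1. Provision: a fresh, empty working directory for every call.
    with tempfile.TemporaryDirectory() as workdir:
        # 2. Run: execute the task with a hard timeout.
        proc = subprocess.run(
            [sys.executable, "-c", script],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
        # 3. Collect: read back any files the task created before cleanup.
        files = {
            p.name: p.read_text()
            for p in Path(workdir).iterdir()
            if p.is_file()
        }
    # 4. Cleanup: the directory is already deleted once the with-block exits.
    return {
        "output_text": proc.stdout,
        "files": files,
        "workdir_exists": Path(workdir).exists(),
    }


result = run_in_scratch_dir(
    "open('report.txt', 'w').write('severity: high'); print('done')"
)
print(result["output_text"].strip(), result["files"])
```

&lt;p&gt;The part worth noticing is the ordering: files are read back before the directory disappears, which matches getting "the result plus any files the agent created" after the sandbox is gone.&lt;/p&gt;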

&lt;h2&gt;
  
  
  Where the Sandbox Breaks: The Preview Limitations
&lt;/h2&gt;

&lt;p&gt;I'm not going to pretend this is ready for production. Two days of testing surfaced real problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timeout wall.&lt;/strong&gt; I pointed it at a 15,000-line codebase and asked it to refactor one module. Hit the 5-minute ceiling and died. Large, complex tasks choke.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero memory between calls.&lt;/strong&gt; Each interaction gets a clean sandbox. Great for security. Terrible if you need your agent to remember context. You have to manage state yourself, passing the &lt;code&gt;previous_interaction_id&lt;/code&gt; and relevant context back in on every subsequent call. Not hard, but not free either.&lt;/p&gt;
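
&lt;p&gt;One way to handle that bookkeeping is a small wrapper that remembers the last interaction ID plus a few notes and folds them into the next call. This is a sketch under assumptions: &lt;code&gt;AgentSession&lt;/code&gt; is a hypothetical helper, and I'm assuming &lt;code&gt;previous_interaction_id&lt;/code&gt; is passed as a keyword argument and that the response exposes &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;output_text&lt;/code&gt;. A stub stands in for the real client so the snippet runs offline.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Callable, Optional


@dataclass
class AgentSession:
    """Carries context between otherwise stateless interactions (hypothetical helper)."""

    create: Callable[..., object]  # e.g. client.interactions.create (assumed)
    agent: str
    previous_interaction_id: Optional[str] = None
    notes: list = field(default_factory=list)

    def send(self, message: str):
        # Re-send the notes we care about, since the sandbox starts clean.
        context = "\n".join(f"- {note}" for note in self.notes)
        prompt = f"Context so far:\n{context}\n\n{message}" if context else message
        kwargs = {"agent": self.agent, "input": prompt}
        if self.previous_interaction_id is not None:
            kwargs["previous_interaction_id"] = self.previous_interaction_id
        response = self.create(**kwargs)
        # Remember the ID so the next call can chain onto this one.
        self.previous_interaction_id = response.id
        return response


# Stub standing in for the real API so the sketch runs without credentials.
calls = []


def fake_create(**kwargs):
    calls.append(kwargs)
    return SimpleNamespace(id=f"int-{len(calls)}", output_text="ok")


session = AgentSession(create=fake_create, agent="antigravity-preview-05-2026")
session.send("Classify ticket 1.")
session.notes.append("Ticket 1 was severity high")
session.send("Classify ticket 2.")
print(calls[1]["previous_interaction_id"])
```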

&lt;p&gt;&lt;strong&gt;The "preview" tax.&lt;/strong&gt; Pre-GA. Google says don't feed it sensitive data. Side projects and internal tools? Go for it. Customer data in production? Wait.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing is a black box.&lt;/strong&gt; Free during preview. Nobody knows what this costs at scale. That's a real problem for anyone planning production workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network access is half-baked.&lt;/strong&gt; Your agent can browse the public web. But reaching internal APIs? You need an MCP server as a bridge, which brings back some of that infrastructure overhead. A bit ironic.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Stacks Up Against the Competition
&lt;/h2&gt;

&lt;p&gt;Here's what made me pay attention. Right now, if you want an autonomous agent that executes in a sandbox, your options are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI Assistants API&lt;/strong&gt; gives you code execution in a sandbox, but it's tied to OpenAI models, the sandbox is limited (no arbitrary binary execution, no web browsing), and you're paying per-token plus tool-call fees. It's also less "deploy an agent" and more "run a conversation with tools."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anthropic's tool-use&lt;/strong&gt; is powerful for single-turn tool calling, but there's no managed sandbox. You bring your own execution environment. So you're back to the Docker-and-gVisor dance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangGraph Cloud&lt;/strong&gt; gets you agent orchestration, but again, you manage the infrastructure. The execution environment is your problem.&lt;/p&gt;

&lt;p&gt;Google's approach is different. They're saying: give us the instructions, we'll handle the sandbox, the execution, the security, the cleanup. You don't think about infrastructure at all. That's a genuinely new position in this space.&lt;/p&gt;

&lt;p&gt;As far as I can tell, this is the first time a major cloud provider is treating autonomous agents as serverless compute, not just chat-with-tools.&lt;/p&gt;

&lt;p&gt;The tradeoff? You're locked into Google's ecosystem. The agent runs on Gemini models. If you need Claude or GPT-4 for a specific task, this isn't your tool. But for teams already in the Google stack, the friction drop is real.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Feature That Actually Got Me: Saved Agents
&lt;/h2&gt;

&lt;p&gt;One-shot interactions are cool. But &lt;code&gt;agents.create&lt;/code&gt; is where things get interesting.&lt;/p&gt;

&lt;p&gt;You define an agent with custom instructions, specific tools, MCP connections, and environment settings. Save that whole configuration. Then trigger it by ID from anywhere. Cron job. Webhook. GitHub Action. Another agent.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;display_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ticket-triage-v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior support engineer. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classify tickets by severity. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Always check error logs before suggesting a fix. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Never suggest restarting the service as a first option.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code_execution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web_browse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;environment_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sandbox&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;remote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout_seconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Trigger from anywhere
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;New ticket: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ticket_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I wired one to our Slack. Someone files a bug, the agent auto-triages, pulls relevant logs, posts analysis in the thread. Forty lines of Python and a webhook.&lt;/p&gt;
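
&lt;p&gt;The glue has roughly this shape. It's a trimmed sketch, not my production handler: the payload fields (&lt;code&gt;text&lt;/code&gt;, &lt;code&gt;thread_ts&lt;/code&gt;), &lt;code&gt;handle_bug_report&lt;/code&gt;, and the &lt;code&gt;run_agent&lt;/code&gt; callable are placeholders, and the actual Slack posting is left out. In real use, &lt;code&gt;run_agent&lt;/code&gt; would wrap &lt;code&gt;client.interactions.create&lt;/code&gt; with the saved agent ID.&lt;/p&gt;

```python
import json
from typing import Callable


def handle_bug_report(raw_body: bytes, run_agent: Callable[[str], str]) -> dict:
    """Turn an incoming webhook body into a threaded reply (hypothetical shape)."""
    payload = json.loads(raw_body)
    ticket_text = payload.get("text", "").strip()
    if not ticket_text:
        # Nothing to triage; tell the caller instead of invoking the agent.
        return {"ok": False, "error": "empty ticket"}
    analysis = run_agent(f"New ticket: {ticket_text}")
    # Reply in the same thread the bug was filed in, if one was given.
    return {"ok": True, "thread_ts": payload.get("thread_ts"), "text": analysis}


def fake_agent(prompt: str) -> str:
    # Stand-in for the managed agent call so the example runs offline.
    return f"[triage] {prompt}"


body = json.dumps({"text": "Login page returns 500", "thread_ts": "1716.42"}).encode()
reply = handle_bug_report(body, fake_agent)
print(reply["text"])
```

&lt;p&gt;Passing the agent call in as a function keeps the handler testable without credentials, which matters once a webhook is the only thing standing between a Slack message and a sandbox run.&lt;/p&gt;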

&lt;h2&gt;
  
  
  The Lambda Moment
&lt;/h2&gt;

&lt;p&gt;Remember 2014? Before Lambda, running code in the cloud meant EC2 instances. Load balancers. Auto-scaling groups. The works.&lt;/p&gt;

&lt;p&gt;Lambda said: give us the function, we handle the rest. People called it a toy. Then it ate the backend world.&lt;/p&gt;

&lt;p&gt;I keep seeing the same pattern. Before this API, building an agent meant managing infrastructure. Now you hand over instructions and Google runs the thing in a sandboxed environment.&lt;/p&gt;

&lt;p&gt;Maybe I'm wrong. Maybe this stays niche. But the parallel keeps nagging at me, and I haven't been able to talk myself out of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Want to Build Next
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;docs drift detector&lt;/strong&gt; that points at a repo, reads the README, runs the code, and flags where documentation and behavior have diverged. Every project has this problem. Nobody fixes it manually.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;dependency changelog reader&lt;/strong&gt; that actually reads changelogs for your deps, understands breaking changes, and tells you which updates are safe to auto-merge and which ones need human review.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;pre-review PR agent&lt;/strong&gt; that reads changes before a human reviewer opens the PR, checks test coverage on modified files, identifies risky diffs, and writes review notes. Like a thorough junior dev who never sleeps.&lt;/p&gt;

&lt;p&gt;All of these would've been multi-week projects before. Now they're afternoon builds. That's the shift. Not what agents can do. But how fast you can ship them.&lt;/p&gt;

&lt;h2&gt;
  
  
  So What Now
&lt;/h2&gt;

&lt;p&gt;Google I/O 2026 had no shortage of headlines. Gemini 3.5 Flash is fast. Veo 3 is wild. Gemini Omni understanding physics makes you wonder what 2027 looks like.&lt;/p&gt;

&lt;p&gt;But this quiet little API is the one that actually changed my Tuesday. It didn't make me go "wow." It made me delete code. And that's usually how the important stuff starts.&lt;/p&gt;

&lt;p&gt;Open the docs. Write eleven lines of Python. See what happens.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this useful? A reaction helps others find it too. Questions about the API or building with it? I'm in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>googleiochallenge</category>
      <category>devchallenge</category>
      <category>discuss</category>
      <category>python</category>
    </item>
    <item>
      <title>Hermes Just Killed OpenClaw (Here's Why)</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Tue, 19 May 2026 13:12:33 +0000</pubDate>
      <link>https://dev.to/tahosin/hermes-just-killed-openclaw-heres-why-4c23</link>
      <guid>https://dev.to/tahosin/hermes-just-killed-openclaw-heres-why-4c23</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I do not think OpenClaw is dead.&lt;/p&gt;

&lt;p&gt;That title is deliberately dramatic because the shift is dramatic. OpenClaw did something important: it made a lot of developers believe that a personal AI assistant could be more than a chat box. It could sit on your machine, connect to your messages, call tools, browse, run commands, and actually move work forward.&lt;/p&gt;

&lt;p&gt;But Hermes Agent changes the question.&lt;/p&gt;

&lt;p&gt;OpenClaw asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What if I could run a personal AI assistant on my own devices?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Hermes asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What if my agent could live on my infrastructure, remember how I work, improve its own procedures, use tools across channels, and become more useful every week?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That second question is why Hermes feels like the next step.&lt;/p&gt;

&lt;p&gt;Not because OpenClaw is bad. OpenClaw is popular for a reason. The official repo describes it as a personal AI assistant that runs on your own devices, answers through the channels you already use, and uses a Gateway as the control plane. That is a strong idea.&lt;/p&gt;

&lt;p&gt;The problem is that the AI agent market is moving from "assistant I operate" to "worker I supervise." Once that happens, the winning system is not the one with the loudest demo. It is the one with the better memory model, execution boundary, skill lifecycle, tool surface, and deployment story.&lt;/p&gt;

&lt;p&gt;That is where Hermes starts to pull ahead.&lt;/p&gt;

&lt;h2&gt;
  
  
  The short version
&lt;/h2&gt;

&lt;p&gt;If I had to explain the difference in one line:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;OpenClaw feels like a local-first assistant. Hermes feels like agent infrastructure that happens to chat.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That distinction matters.&lt;/p&gt;

&lt;p&gt;A real agent has to do more than respond. It needs to run somewhere reliable. It needs to work while I am away. It needs to remember the parts of my environment that matter. It needs to learn repeatable procedures. It needs to make tool use safer, especially when those tools touch files, browsers, credentials, APIs, and servers.&lt;/p&gt;

&lt;p&gt;OpenClaw helped prove the demand.&lt;/p&gt;

&lt;p&gt;Hermes is making the operating model more serious.&lt;/p&gt;

&lt;h2&gt;
  
  
  The five claims that matter
&lt;/h2&gt;

&lt;p&gt;The loudest Hermes pitch right now is simple: install it, connect it, give it skills, run it on a server, and let it become your agent.&lt;/p&gt;

&lt;p&gt;That pitch is exciting, but I would not judge Hermes by hype. I would judge it by which claims survive contact with architecture.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Claim&lt;/th&gt;
&lt;th&gt;Why it matters&lt;/th&gt;
&lt;th&gt;My read&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"One-command install"&lt;/td&gt;
&lt;td&gt;Agents die when setup is fragile. If the first hour is dependency pain, most people quit.&lt;/td&gt;
&lt;td&gt;Useful, but not the real moat. Setup gets you to day one. Memory and skills decide day thirty.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Run it on a VPS or sandbox"&lt;/td&gt;
&lt;td&gt;A serious agent should not need your personal laptop open all day.&lt;/td&gt;
&lt;td&gt;This is one of Hermes' strongest arguments. Persistent agents belong on persistent infrastructure.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Built-in skills"&lt;/td&gt;
&lt;td&gt;Skills turn vague AI behavior into repeatable procedures.&lt;/td&gt;
&lt;td&gt;Strong, especially because Hermes treats skills as something the agent can improve, not just something a user installs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Messaging integrations"&lt;/td&gt;
&lt;td&gt;Telegram, Discord, Slack, WhatsApp, and similar channels make the agent reachable from normal life.&lt;/td&gt;
&lt;td&gt;Important, but only if paired with background sessions. Otherwise it is just another bot in another inbox.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Safer execution"&lt;/td&gt;
&lt;td&gt;Agents touch terminals, files, browsers, APIs, and credentials. That is dangerous by default.&lt;/td&gt;
&lt;td&gt;This is where Hermes feels more mature: command approval, allowlists, Docker, SSH, sandbox backends, and scoped toolsets all matter.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That is the lens for the rest of this post.&lt;/p&gt;

&lt;p&gt;I do not care whether Hermes can produce a flashy demo once. Most agent frameworks can do that now.&lt;/p&gt;

&lt;p&gt;I care whether Hermes has the bones for repeated work: memory, procedural learning, sandboxed execution, remote availability, and enough tool scoping to avoid turning convenience into a security incident.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why OpenClaw won attention first
&lt;/h2&gt;

&lt;p&gt;OpenClaw's strength is obvious from its own README. It is broad, local, channel-heavy, and familiar to developers who want an assistant they can own.&lt;/p&gt;

&lt;p&gt;The official repo highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Microsoft Teams, Matrix, LINE, WeChat, and many more channels&lt;/li&gt;
&lt;li&gt;A local-first Gateway that owns messaging surfaces and routes requests&lt;/li&gt;
&lt;li&gt;First-class tools for browser, files, exec, canvas, cron, sessions, image generation, video generation, TTS, and sub-agents&lt;/li&gt;
&lt;li&gt;Skills based on &lt;code&gt;SKILL.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Native onboarding with &lt;code&gt;openclaw onboard&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Companion apps and nodes for macOS, iOS, Android, and headless devices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not small. That is why OpenClaw became a reference point for personal agents.&lt;/p&gt;

&lt;p&gt;It also has a massive community. When I checked the GitHub API, OpenClaw had far more stars than Hermes. Popularity alone does not decide technical direction, but it does tell you something: OpenClaw made the category legible.&lt;/p&gt;

&lt;p&gt;For context, I checked the public repos directly: &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;openclaw/openclaw&lt;/a&gt; and &lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;NousResearch/hermes-agent&lt;/a&gt;. OpenClaw has the bigger gravity right now. Hermes has the more interesting agent-runtime thesis.&lt;/p&gt;

&lt;p&gt;The issue is that popularity also brings a harsh spotlight. Once strangers, groups, plugins, browsers, shells, and personal accounts all meet inside one assistant, the security model becomes the product.&lt;/p&gt;

&lt;p&gt;OpenClaw's own security docs are honest about this. The guidance assumes a personal-assistant trust model: one trusted operator boundary per gateway. It says OpenClaw is not a hostile multi-tenant security boundary for adversarial users sharing one gateway. It also says the product default for trusted single-operator setups allows host execution in the &lt;code&gt;gateway&lt;/code&gt; or &lt;code&gt;node&lt;/code&gt; context unless you tighten it.&lt;/p&gt;

&lt;p&gt;That is not a cheap criticism. It is the tradeoff OpenClaw chose: powerful local assistant first, hardening second.&lt;/p&gt;

&lt;p&gt;Hermes starts from a different center.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hermes is built around compounding
&lt;/h2&gt;

&lt;p&gt;The most important Hermes idea is not Telegram integration. It is not browser automation. It is not even the tool count.&lt;/p&gt;

&lt;p&gt;The key idea is compounding.&lt;/p&gt;

&lt;p&gt;Hermes describes itself as a self-improving agent with a built-in learning loop. Its docs talk about agent-curated memory, autonomous skill creation, skill improvement during use, session search, external memory providers, and user modeling.&lt;/p&gt;

&lt;p&gt;That sounds abstract until you translate it into developer terms:&lt;/p&gt;

&lt;p&gt;If the agent solves a hard workflow today, it should not rediscover that workflow next week.&lt;/p&gt;

&lt;p&gt;That is the difference between a chatbot with tools and an agent that grows.&lt;/p&gt;

&lt;p&gt;Hermes has two memory layers that are easy to reason about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;MEMORY.md&lt;/code&gt; for environment facts, project conventions, lessons learned, and workflow notes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;USER.md&lt;/code&gt; for preferences, communication style, expectations, and profile details&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are bounded on purpose. Hermes keeps them focused instead of stuffing an infinite pile of text into every prompt. For older conversations, it uses SQLite session storage with FTS5 search and summarization.&lt;/p&gt;

&lt;p&gt;That design feels practical. The always-loaded memory stays small. The deeper history is searchable when needed.&lt;/p&gt;
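
&lt;p&gt;Here is a small sketch of the searchable-history half using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt;. It is not Hermes' actual schema: the table name, columns, and the &lt;code&gt;search_sessions&lt;/code&gt; helper are my own assumptions, and it needs an SQLite build with FTS5 enabled.&lt;/p&gt;

```python
import sqlite3

# In-memory database for the demo; a real agent would use a file on disk.
db = sqlite3.connect(":memory:")
# Hypothetical schema: one row per past session, full-text indexed.
db.execute("CREATE VIRTUAL TABLE sessions USING fts5(title, transcript)")
db.executemany(
    "INSERT INTO sessions (title, transcript) VALUES (?, ?)",
    [
        ("Deploy fix", "Rolled back the nginx config after the deploy failed."),
        ("Billing bug", "Invoice totals were rounded twice in the export job."),
        ("Onboarding", "Walked through repo layout and test conventions."),
    ],
)


def search_sessions(query: str, limit: int = 3) -> list:
    """Return (title, snippet) pairs for sessions matching an FTS5 query."""
    rows = db.execute(
        "SELECT title, snippet(sessions, 1, '[', ']', '...', 8) "
        "FROM sessions WHERE sessions MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


# Only the matching history is pulled in, instead of loading every transcript.
hits = search_sessions("nginx")
print(hits)
```

&lt;p&gt;The point of the design is visible in the last two lines: the prompt only pays for the one session that matches, not for the whole archive.&lt;/p&gt;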

&lt;p&gt;This is exactly how I want a serious agent to behave. I do not want it to remember everything equally. I want it to remember what changes future behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  The skill system is the real "DNA"
&lt;/h2&gt;

&lt;p&gt;Skills are where Hermes becomes interesting.&lt;/p&gt;

&lt;p&gt;OpenClaw has skills too. Its docs explain that skills are AgentSkills-compatible &lt;code&gt;SKILL.md&lt;/code&gt; folders that teach the agent how to use tools. OpenClaw loads bundled skills, managed/local skills, personal skills, project skills, and workspace skills.&lt;/p&gt;

&lt;p&gt;Hermes takes the same basic idea and pushes it closer to procedural memory.&lt;/p&gt;

&lt;p&gt;The Hermes docs say the agent can create, update, and delete its own skills through &lt;code&gt;skill_manage&lt;/code&gt;. It creates skills after complex successful tasks, when it finds a working path through errors, when a user corrects its approach, or when it discovers a non-trivial workflow.&lt;/p&gt;

&lt;p&gt;That is the part that matters.&lt;/p&gt;

&lt;p&gt;Not "skills as a plugin folder."&lt;/p&gt;

&lt;p&gt;Skills as the agent writing down how to be better next time.&lt;/p&gt;

&lt;p&gt;This is the difference between installing extensions and building organizational memory. A good senior developer does not just solve an incident. They improve the runbook. Hermes is trying to make the agent do the same thing.&lt;/p&gt;
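
&lt;p&gt;To picture "improving the runbook" in code, here is a toy sketch that writes a workflow down as a &lt;code&gt;SKILL.md&lt;/code&gt; file once it has succeeded. It is not how &lt;code&gt;skill_manage&lt;/code&gt; works internally: &lt;code&gt;record_skill&lt;/code&gt; is a hypothetical helper, and the &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt; front matter is my assumption about what a minimal skill file contains.&lt;/p&gt;

```python
import tempfile
from pathlib import Path


def record_skill(skills_dir: Path, name: str, description: str, steps: list) -> Path:
    """Write a minimal SKILL.md capturing a workflow that just succeeded."""
    skill_dir = skills_dir / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    body = (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"# {name}\n\n{numbered}\n"
    )
    path = skill_dir / "SKILL.md"
    path.write_text(body)
    return path


# Demo: save the skill into a throwaway directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    skill_path = record_skill(
        Path(tmp),
        "rotate-api-keys",
        "Rotate service API keys without downtime.",
        ["Create the new key.", "Deploy it alongside the old one.", "Revoke the old key."],
    )
    text = skill_path.read_text()
print(text)
```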

&lt;p&gt;And it is not only local skills. Hermes supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Official optional skills&lt;/li&gt;
&lt;li&gt;&lt;code&gt;skills.sh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Well-known skill endpoints&lt;/li&gt;
&lt;li&gt;Direct URL skills&lt;/li&gt;
&lt;li&gt;GitHub skill installs&lt;/li&gt;
&lt;li&gt;Community registries&lt;/li&gt;
&lt;li&gt;External read-only skill directories&lt;/li&gt;
&lt;li&gt;Security scanning and audit commands for installed hub skills&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives Hermes a useful middle ground. It can learn locally, but it can also participate in a broader open skill ecosystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The execution story is stronger
&lt;/h2&gt;

&lt;p&gt;This is where the comparison gets practical.&lt;/p&gt;

&lt;p&gt;An agent that can run commands should make you slightly nervous. That is healthy.&lt;/p&gt;

&lt;p&gt;Hermes treats terminal execution as a configurable backend. Commands can run locally, in Docker, over SSH, in Singularity, in Modal, in Daytona, or in Vercel Sandbox. The docs are clear about the tradeoff:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local is easy, but has no isolation&lt;/li&gt;
&lt;li&gt;Docker gives container isolation&lt;/li&gt;
&lt;li&gt;SSH moves execution to another server&lt;/li&gt;
&lt;li&gt;Modal and Daytona give cloud sandbox options&lt;/li&gt;
&lt;li&gt;Vercel Sandbox gives microVM-style cloud execution with snapshot persistence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The security page goes further. With Docker, Hermes applies hardened container flags: dropped capabilities, no new privileges, PID limits, tmpfs mounts, and explicit resource limits. It also avoids forwarding host environment variables by default.&lt;/p&gt;

&lt;p&gt;That matters for one simple reason:&lt;/p&gt;

&lt;p&gt;The agent should not automatically inherit your entire laptop just because you wanted it to scrape a page or refactor a file.&lt;/p&gt;
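
&lt;p&gt;To make that hardening list concrete, here is a minimal Python sketch that assembles a &lt;code&gt;docker run&lt;/code&gt; command with flags of the kind described. The flags are standard Docker CLI options, but the specific values, the &lt;code&gt;python:3.12-slim&lt;/code&gt; image, and the &lt;code&gt;build_sandbox_command&lt;/code&gt; helper are my assumptions for illustration, not what Hermes actually runs.&lt;/p&gt;

```python
import shlex


def build_sandbox_command(image, command, memory="512m", cpus="1.0", pids_limit=256):
    """Return a `docker run` argv with conservative isolation flags.

    Hypothetical helper: the values are illustrative, not Hermes defaults.
    """
    return [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                    # drop all Linux capabilities
        "--security-opt=no-new-privileges",  # block privilege escalation
        f"--pids-limit={pids_limit}",        # cap the number of processes
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=64m",  # in-memory scratch space
        f"--memory={memory}",                # explicit memory limit
        f"--cpus={cpus}",                    # explicit CPU limit
        # No -e / --env-file flags: host environment variables are not forwarded.
        image,
        "sh", "-c", command,
    ]


argv = build_sandbox_command("python:3.12-slim", "echo hello from the sandbox")
print(shlex.join(argv))
```

&lt;p&gt;Building the command as a list rather than a string keeps the agent-supplied command as a single argument, so it cannot smuggle extra Docker flags into the invocation.&lt;/p&gt;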

&lt;p&gt;OpenClaw can sandbox too. Its README points to Docker, SSH, and OpenShell options, and it recommends sandboxing for non-main sessions. Its security docs are detailed and serious.&lt;/p&gt;

&lt;p&gt;But the default mental model is different.&lt;/p&gt;

&lt;p&gt;OpenClaw is a personal assistant with optional hardening.&lt;/p&gt;

&lt;p&gt;Hermes is an agent runtime where isolated execution is part of the normal deployment conversation.&lt;/p&gt;

&lt;p&gt;That is why I would rather run Hermes on a VPS or cloud sandbox for always-on work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Messaging is not the win. Remote agency is.
&lt;/h2&gt;

&lt;p&gt;Both tools can talk through messaging platforms.&lt;/p&gt;

&lt;p&gt;OpenClaw has a huge channel list. Hermes also supports a wide set: Telegram, Discord, Slack, WhatsApp, Signal, SMS, Email, Matrix, Mattermost, Home Assistant, DingTalk, Feishu/Lark, WeCom, Microsoft Teams, and more.&lt;/p&gt;

&lt;p&gt;The interesting Hermes feature is not that you can message it.&lt;/p&gt;

&lt;p&gt;The interesting feature is that messaging becomes a control surface for background work.&lt;/p&gt;

&lt;p&gt;Hermes supports background sessions from messaging platforms. You can start a separate task, keep chatting in the main thread, and receive the result back in the same channel. That is a small feature on paper, but it changes the feel of the system.&lt;/p&gt;

&lt;p&gt;It stops being:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I am chatting with a bot.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I am dispatching work to an agent that lives somewhere else.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the future I care about.&lt;/p&gt;

&lt;p&gt;I do not want my personal agent trapped inside the laptop I am currently using. I want it on a server, reachable from my phone, able to run a long task, report back, and remember the result.&lt;/p&gt;

&lt;p&gt;Hermes is built for that shape.&lt;/p&gt;
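
&lt;p&gt;The dispatch pattern described above can be sketched in a few lines of Python with &lt;code&gt;asyncio&lt;/code&gt;. This is only a conceptual model: the &lt;code&gt;channel&lt;/code&gt; list stands in for a chat thread, and &lt;code&gt;background_task&lt;/code&gt; is a hypothetical placeholder for real agent work, not Hermes code.&lt;/p&gt;

```python
import asyncio


async def background_task(prompt, channel):
    """Pretend to do slow agent work, then report back to the same channel."""
    await asyncio.sleep(0.01)  # stands in for a long-running job
    channel.append(f"[background done] {prompt}")


async def main():
    channel = []  # hypothetical stand-in for one chat thread
    # Dispatch the job without waiting for it to finish.
    task = asyncio.create_task(background_task("summarize docs", channel))
    # The main conversation keeps going while the job runs.
    channel.append("[main] still chatting")
    await task  # the result lands in the same channel later
    return channel


transcript = asyncio.run(main())
for line in transcript:
    print(line)
```

&lt;p&gt;The main message is recorded before the background result arrives, which is the whole point: the conversation is not blocked by the work.&lt;/p&gt;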

&lt;h2&gt;
  
  
  Tool breadth is now table stakes
&lt;/h2&gt;

&lt;p&gt;There was a time when "this agent can browse the web and run commands" sounded wild.&lt;/p&gt;

&lt;p&gt;That time is over.&lt;/p&gt;

&lt;p&gt;Both OpenClaw and Hermes have serious tool surfaces.&lt;/p&gt;

&lt;p&gt;OpenClaw ships built-in tools for shell execution, code execution, browser control, web search, file I/O, patching, messaging, canvas, nodes, cron, images, music, video, TTS, sessions, and sub-agents.&lt;/p&gt;

&lt;p&gt;Hermes ships a broad registry too: web search, extraction, terminal, file editing, browser automation, vision, image generation, TTS, memory, session search, cron, messaging, delegation, code execution, Home Assistant, MCP tools, RL tools, and more.&lt;/p&gt;

&lt;p&gt;So the question is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Which one has tools?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The better question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Which one makes tools safer, more composable, and easier to scope per situation?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Hermes has a clear toolset model. Toolsets can be enabled per session, per platform, or per task. There are platform presets like &lt;code&gt;hermes-cli&lt;/code&gt; and &lt;code&gt;hermes-telegram&lt;/code&gt;, plus dynamic MCP toolsets. That gives you a cleaner way to say:&lt;/p&gt;

&lt;p&gt;"This Telegram agent can do X, but not Y."&lt;/p&gt;

&lt;p&gt;For me, that is more important than raw tool count.&lt;/p&gt;
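
&lt;p&gt;As a rough mental model of per-platform scoping, here is a small Python sketch. The preset names come from the Hermes docs, but the toolset names inside each preset and the &lt;code&gt;is_tool_allowed&lt;/code&gt; helper are hypothetical, chosen only to show the deny-by-default idea.&lt;/p&gt;

```python
# Hypothetical preset contents, for illustration only.
PLATFORM_TOOLSETS = {
    "hermes-cli": {"web", "terminal", "file", "memory"},
    "hermes-telegram": {"web", "memory"},
}


def is_tool_allowed(platform, toolset):
    """Return True only if the toolset is enabled for this platform preset."""
    # Unknown platforms get an empty set, so everything is denied by default.
    return toolset in PLATFORM_TOOLSETS.get(platform, set())


print(is_tool_allowed("hermes-telegram", "web"))
print(is_tool_allowed("hermes-telegram", "terminal"))
```

&lt;p&gt;In this sketch the Telegram agent can search the web but cannot open a terminal, which is exactly the "X, but not Y" statement in code form.&lt;/p&gt;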

&lt;h2&gt;
  
  
  Hermes vs OpenClaw
&lt;/h2&gt;

&lt;p&gt;Here is my practical comparison.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;OpenClaw&lt;/th&gt;
&lt;th&gt;Hermes Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Core identity&lt;/td&gt;
&lt;td&gt;Personal AI assistant&lt;/td&gt;
&lt;td&gt;Self-improving agent runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mental model&lt;/td&gt;
&lt;td&gt;Local-first Gateway assistant&lt;/td&gt;
&lt;td&gt;Persistent worker on your infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup&lt;/td&gt;
&lt;td&gt;CLI onboarding and Gateway daemon&lt;/td&gt;
&lt;td&gt;CLI, Gateway, and multiple runtime backends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Messaging&lt;/td&gt;
&lt;td&gt;Very broad channel coverage&lt;/td&gt;
&lt;td&gt;Channels plus background sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skills&lt;/td&gt;
&lt;td&gt;Skills loaded from many locations&lt;/td&gt;
&lt;td&gt;Skills as procedural memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;Workspace and session context&lt;/td&gt;
&lt;td&gt;Curated memory plus session search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tooling&lt;/td&gt;
&lt;td&gt;Broad built-in tools&lt;/td&gt;
&lt;td&gt;Toolsets, MCP, delegation, media, web&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;Personal trust boundary, hardening available&lt;/td&gt;
&lt;td&gt;Approval, isolation, env filtering, scoped tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Device or Gateway host&lt;/td&gt;
&lt;td&gt;Local, VPS, Docker, SSH, Modal, Daytona, Vercel Sandbox&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ideal user&lt;/td&gt;
&lt;td&gt;Power user with a device assistant&lt;/td&gt;
&lt;td&gt;Developer building a supervised digital worker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Biggest risk&lt;/td&gt;
&lt;td&gt;Too much power in one assistant boundary&lt;/td&gt;
&lt;td&gt;Newer ecosystem still proving itself&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This table is why I do not read Hermes as "another OpenClaw clone."&lt;/p&gt;

&lt;p&gt;Hermes is competing on a different axis.&lt;/p&gt;

&lt;p&gt;OpenClaw made the assistant powerful.&lt;/p&gt;

&lt;p&gt;Hermes is trying to make the assistant compound.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical playbook
&lt;/h2&gt;

&lt;p&gt;If you are reading this and wondering "okay, but what do I actually try first?", this is the path I would take.&lt;/p&gt;

&lt;p&gt;First, run Hermes somewhere disposable. A local machine is fine for learning, but the interesting path is Docker, SSH, Modal, Daytona, or another sandbox backend. The whole point is to avoid giving an experimental agent unlimited access to your daily machine on day one.&lt;/p&gt;

&lt;p&gt;Then connect one messaging surface, not five. Telegram or Discord is enough. Make sure allowlists or DM pairing are enabled before you give the agent terminal access.&lt;/p&gt;

&lt;p&gt;Then give Hermes one recurring workflow:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/background Research the latest Hermes Agent docs changes, summarize the developer impact, and send me 5 possible DEV post angles.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, watch for the compounding moment. If the workflow takes several tool calls, has a repeatable structure, or needs a correction from you, that is exactly the kind of thing that should become a skill.&lt;/p&gt;

&lt;p&gt;A good first Hermes skill would not be "write blog posts." Too vague.&lt;/p&gt;

&lt;p&gt;A better one would be:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;research-release-notes

When given a GitHub repo or docs page:
1. Find the latest release or docs update.
2. Prefer primary sources.
3. Extract concrete changes.
4. Separate confirmed facts from opinion.
5. Produce a DEV-ready outline with links.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is where Hermes becomes more than a chat assistant. You are not just asking it to do a task. You are teaching it a durable way to do that class of task.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where OpenClaw still wins
&lt;/h2&gt;

&lt;p&gt;A good comparison should admit the other side.&lt;/p&gt;

&lt;p&gt;OpenClaw still has big advantages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It has enormous attention and community gravity.&lt;/li&gt;
&lt;li&gt;Its channel ecosystem is very broad.&lt;/li&gt;
&lt;li&gt;Its native app and node story is compelling.&lt;/li&gt;
&lt;li&gt;Its local-first assistant feel is easier to explain to non-agent people.&lt;/li&gt;
&lt;li&gt;It has already shaped how people talk about personal AI assistants.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If your goal is "I want a personal AI assistant connected to my messaging apps and devices," OpenClaw is still a serious answer.&lt;/p&gt;

&lt;p&gt;But if your goal is "I want an agent that can become operational infrastructure," Hermes is the more interesting answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Hermes wins
&lt;/h2&gt;

&lt;p&gt;Hermes wins because it is opinionated about the hard parts.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. It treats memory as a product surface
&lt;/h3&gt;

&lt;p&gt;Memory is not just chat history. It is a curated behavioral layer. The split between &lt;code&gt;MEMORY.md&lt;/code&gt;, &lt;code&gt;USER.md&lt;/code&gt;, and searchable session history is simple enough to trust and flexible enough to grow.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. It treats skills as learning
&lt;/h3&gt;

&lt;p&gt;The agent can create and update skills after hard tasks. That is the closest thing to compounding engineering knowledge in this category.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. It treats execution location as a first-class choice
&lt;/h3&gt;

&lt;p&gt;Local, Docker, SSH, Modal, Daytona, Vercel Sandbox, Singularity. That is not a footnote. That is the difference between a toy assistant and something you can deploy with intent.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. It treats messaging as dispatch
&lt;/h3&gt;

&lt;p&gt;I can talk to the agent through Telegram or Discord, but the real value is sending background work and getting results back. That makes the chat app a command center, not the product itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. It treats safety as architecture, not a disclaimer
&lt;/h3&gt;

&lt;p&gt;Allowlists, DM pairing, command approval, container isolation, MCP credential filtering, context scanning, env var filtering, and scoped toolsets are not glamorous features. They are the features you need after the first impressive demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bigger point
&lt;/h2&gt;

&lt;p&gt;The agent space is splitting into two philosophies.&lt;/p&gt;

&lt;p&gt;One philosophy says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Give the user a powerful assistant and let them connect everything.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The other says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Give the user an agent runtime that can be supervised, isolated, taught, and deployed, and that remembers what it learned.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;OpenClaw represents the first philosophy extremely well.&lt;/p&gt;

&lt;p&gt;Hermes represents the second.&lt;/p&gt;

&lt;p&gt;That is why I think Hermes is the more important project to study right now.&lt;/p&gt;

&lt;p&gt;OpenClaw proved people want agents with hands.&lt;/p&gt;

&lt;p&gt;Hermes is asking what happens when those hands also get memory, runbooks, safer execution, background work, and a home outside your current laptop.&lt;/p&gt;

&lt;p&gt;That is the jump.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would build with Hermes
&lt;/h2&gt;

&lt;p&gt;If I were turning this into a real project, I would build a developer publishing agent.&lt;/p&gt;

&lt;p&gt;Not a blog spammer. A proper assistant for technical writing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Watch official docs, GitHub releases, and challenge pages.&lt;/li&gt;
&lt;li&gt;Summarize what changed with links to primary sources.&lt;/li&gt;
&lt;li&gt;Keep a memory of my writing preferences and recurring projects.&lt;/li&gt;
&lt;li&gt;Create reusable skills for research, outline creation, source checking, and DEV formatting.&lt;/li&gt;
&lt;li&gt;Draft posts in my style, but keep claims grounded in citations.&lt;/li&gt;
&lt;li&gt;Send drafts to Telegram for review.&lt;/li&gt;
&lt;li&gt;Track comments and suggest follow-up posts based on real discussion.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That would use the Hermes shape well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;long-running background research&lt;/li&gt;
&lt;li&gt;web extraction&lt;/li&gt;
&lt;li&gt;session search&lt;/li&gt;
&lt;li&gt;persistent memory&lt;/li&gt;
&lt;li&gt;skills that improve over time&lt;/li&gt;
&lt;li&gt;messaging delivery&lt;/li&gt;
&lt;li&gt;scoped tool access&lt;/li&gt;
&lt;li&gt;scheduled tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the kind of workflow where Hermes makes more sense than a one-shot chat assistant.&lt;/p&gt;

&lt;p&gt;The point is not that Hermes can write.&lt;/p&gt;

&lt;p&gt;The point is that Hermes can build a writing operation around memory, tools, and feedback.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final take
&lt;/h2&gt;

&lt;p&gt;Did Hermes literally kill OpenClaw?&lt;/p&gt;

&lt;p&gt;No.&lt;/p&gt;

&lt;p&gt;OpenClaw is too useful, too popular, and too culturally important to dismiss.&lt;/p&gt;

&lt;p&gt;But Hermes may have killed the idea that a personal agent is only a local assistant with a chat interface.&lt;/p&gt;

&lt;p&gt;That is the real shift.&lt;/p&gt;

&lt;p&gt;The next generation of agents will not be judged only by how many apps they connect to. They will be judged by whether they can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;remember the right things&lt;/li&gt;
&lt;li&gt;forget the wrong things&lt;/li&gt;
&lt;li&gt;learn procedures&lt;/li&gt;
&lt;li&gt;run in isolated environments&lt;/li&gt;
&lt;li&gt;work asynchronously&lt;/li&gt;
&lt;li&gt;integrate with open tools&lt;/li&gt;
&lt;li&gt;stay useful after the first demo&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By that standard, Hermes is not just another agent.&lt;/p&gt;

&lt;p&gt;It is a strong argument for where agent software is going next.&lt;/p&gt;

&lt;p&gt;That is my real test for any agent framework now:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Does it get more useful because I used it yesterday?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the answer is no, it is still mostly a tool wrapper.&lt;/p&gt;

&lt;p&gt;If the answer is yes, we are finally talking about agent software.&lt;/p&gt;

&lt;p&gt;And yes, that is why the title says it:&lt;/p&gt;

&lt;p&gt;Hermes just killed OpenClaw.&lt;/p&gt;

&lt;p&gt;Not by replacing it overnight.&lt;/p&gt;

&lt;p&gt;By making the category grow up.&lt;/p&gt;

&lt;p&gt;The first thing I would personally validate is not whether Hermes can write a pretty paragraph. It is whether a Docker- or SSH-backed Hermes research agent can run for a week, keep useful memory, and avoid turning one bad tool call into a machine-level mess. If you have tried either backend already, I would genuinely like to hear which one felt smoother and where it broke.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/" rel="noopener noreferrer"&gt;Hermes Agent official docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;Hermes Agent GitHub repo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/user-guide/features/tools/" rel="noopener noreferrer"&gt;Hermes tools and toolsets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/user-guide/features/memory" rel="noopener noreferrer"&gt;Hermes persistent memory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/user-guide/features/skills/" rel="noopener noreferrer"&gt;Hermes skills system&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/user-guide/messaging" rel="noopener noreferrer"&gt;Hermes messaging gateway&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/user-guide/security" rel="noopener noreferrer"&gt;Hermes security model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw GitHub repo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.openclaw.ai/tools" rel="noopener noreferrer"&gt;OpenClaw tools and plugins&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.openclaw.ai/skills" rel="noopener noreferrer"&gt;OpenClaw skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.openclaw.ai/architecture" rel="noopener noreferrer"&gt;OpenClaw Gateway architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.openclaw.ai/security" rel="noopener noreferrer"&gt;OpenClaw security guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What do you think?&lt;/p&gt;

&lt;p&gt;Is Hermes actually the next step after OpenClaw, or is OpenClaw still the better model for personal agents?&lt;/p&gt;

&lt;p&gt;And of the five claims above, which one matters most to you: memory, skills, execution location, messaging as dispatch, or safety?&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
      <category>discuss</category>
    </item>
    <item>
      <title>My GitHub Graveyard has 27 dead projects. Here is the brutal truth about why.</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Wed, 13 May 2026 18:32:33 +0000</pubDate>
      <link>https://dev.to/tahosin/my-github-graveyard-has-27-dead-projects-here-is-the-brutal-truth-about-why-52d9</link>
      <guid>https://dev.to/tahosin/my-github-graveyard-has-27-dead-projects-here-is-the-brutal-truth-about-why-52d9</guid>
      <description>&lt;p&gt;I recently opened my GitHub account and filtered by private repositories. I actually counted them: exactly 27 abandoned side projects created over the last 3 years.&lt;/p&gt;

&lt;p&gt;There was a machine-learning habit tracker. There was a Twitter clone for dogs. There was a complex SaaS boilerplate that I spent four weeks configuring before completely giving up on it. Some of them I spent weeks on. One I even bought a domain for.&lt;/p&gt;

&lt;p&gt;Hundreds of hours wasted. Why did they all die before seeing the light of day? It was not a lack of time. It was not a lack of motivation. &lt;/p&gt;

&lt;p&gt;Here is the controversial truth: &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Most developers do not fail because of a lack of skill. They fail because they secretly enjoy the dopamine rush of &lt;em&gt;starting&lt;/em&gt; a new project more than the grind of &lt;em&gt;finishing&lt;/em&gt; it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here is the exact pattern that killed my 27 projects, and the rule that finally helped me break the cycle.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The "Perfect Stack" Trap
&lt;/h3&gt;

&lt;p&gt;As developers, we love shiny new tools. When starting a project, the first instinct is to try that new database everyone is talking about on Twitter, or the latest beta version of a framework.&lt;/p&gt;

&lt;p&gt;I once spent an entire weekend configuring a Next.js app with tRPC, Prisma, and a custom Tailwind design system. By Sunday night, my infrastructure was absolute perfection. But I had zero business logic written. The next day, I lost interest completely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you want to actually finish a project, you have to use boring technology.&lt;/strong&gt; Pick the stack you know best, even if it feels outdated.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Optimizing for Phantom Users
&lt;/h3&gt;

&lt;p&gt;For the dog Twitter clone, I spent three days setting up a complex Redis caching layer. I was terrified the server would crash if a million dogs signed up on day one. &lt;/p&gt;

&lt;p&gt;We love to over-engineer. We worry about how our database will handle massive traffic, so we design complex microservices. But here is the brutal reality:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your biggest threat is not the server crashing. Your biggest threat is that nobody will ever visit your site.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stop building for problems you do not have yet. A simple database query is fine. You can always optimize later when the app actually gets traction.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Feature Creep is a Disease
&lt;/h3&gt;

&lt;p&gt;It starts innocently. You are building a simple to-do list, and you think, "It would be cool if users could upload a custom profile picture." Suddenly, you are reading AWS S3 documentation for five hours instead of finishing the core task logic.&lt;/p&gt;

&lt;p&gt;Features are fun to dream about, but they are heavy to build. &lt;strong&gt;Every extra button you add delays the launch.&lt;/strong&gt; The best way to finish a project is to ruthlessly cut features until you have the absolute minimum viable product. If it does not solve the core problem, it gets deleted.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The Fear of Shipping
&lt;/h3&gt;

&lt;p&gt;Writing code is safe. Your VS Code editor does not judge you. But launching a project means real people might see it, find bugs, or worse, ignore it completely.&lt;/p&gt;

&lt;p&gt;A lot of side projects are abandoned right at the 90 percent mark because the developer is secretly afraid of hitting the deploy button. We hide behind the excuse of "it just needs a little more polish." &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A buggy, ugly app that is live on the internet is infinitely more valuable than a perfect app sitting on localhost.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The 48-Hour Rule
&lt;/h3&gt;

&lt;p&gt;To break this curse, I made a strict new rule for myself: &lt;strong&gt;I have to launch a working, ugly prototype within 48 hours.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If it takes longer than a single weekend to get the core feature live, the scope is too big. This simple mindset shift is one of the biggest reasons I finally started shipping real apps instead of building graveyards.&lt;/p&gt;

&lt;h3&gt;
  
  
  Over to you
&lt;/h3&gt;

&lt;p&gt;I know I am not the only one with a GitHub graveyard of dead ideas. &lt;/p&gt;

&lt;p&gt;Be honest: &lt;strong&gt;What is the weirdest abandoned side project you have ever started, and what was the &lt;em&gt;real&lt;/em&gt; reason you stopped working on it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let me know in the comments. What is in your graveyard?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>beginners</category>
      <category>productivity</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Replaced My $500 GPU with a $75 Raspberry Pi: How Gemma 4 Makes Computer Vision 10x Cheaper</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Thu, 07 May 2026 18:35:58 +0000</pubDate>
      <link>https://dev.to/tahosin/i-replaced-my-500-gpu-with-a-75-raspberry-pi-how-gemma-4-makes-computer-vision-10x-cheaper-1gbo</link>
      <guid>https://dev.to/tahosin/i-replaced-my-500-gpu-with-a-75-raspberry-pi-how-gemma-4-makes-computer-vision-10x-cheaper-1gbo</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Write About Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GemmaVision&lt;/strong&gt; — A complete computer vision pipeline that replaces $500+ GPU setups with a $75 Raspberry Pi 5, powered entirely by Gemma 4's native multimodal capabilities.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Native object detection without YOLO, OpenCV, CUDA, or cloud APIs. Just Gemma 4 multimodal AI running 100% offline on a single-board computer.&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Traditional CV&lt;/th&gt;
&lt;th&gt;Gemma 4 Vision&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$500–2000 (GPU + cloud)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$75&lt;/strong&gt; (Raspberry Pi 5)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Monthly Bill&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$20–100 cloud fees&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$0&lt;/strong&gt; (runs offline)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2–4 hours of dependency hell&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;20 minutes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code Complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;500–1000 lines&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;50 lines&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dependencies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10+ (OpenCV, CUDA, etc.)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;3&lt;/strong&gt; (torch, transformers, Pillow)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Power Draw&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;150–300W&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.5W&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accuracy (COCO)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~90%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~85%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Zero-Shot Detection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Requires training&lt;/td&gt;
&lt;td&gt;✅ &lt;strong&gt;Works out of box&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The trade-off:&lt;/strong&gt; about 5 percentage points of accuracy for a roughly &lt;strong&gt;90% cost reduction&lt;/strong&gt; and &lt;strong&gt;10× simpler setup&lt;/strong&gt;. For home automation, accessibility tools, and hobby robotics, this trade is obvious.&lt;/p&gt;
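
&lt;p&gt;The headline numbers can be checked against the table with a few lines of arithmetic. This sketch uses only figures from the table above; the &lt;code&gt;reduction_pct&lt;/code&gt; helper is just a name I picked.&lt;/p&gt;

```python
def reduction_pct(old, new):
    """Percentage reduction going from old to new."""
    return (old - new) / old * 100


pi_cost = 75
low = reduction_pct(500, pi_cost)    # against the cheapest traditional setup
high = reduction_pct(2000, pi_cost)  # against the most expensive one
accuracy_gap = 90 - 85               # percentage points, from the table

print(f"cost reduction: {low:.1f}% to {high:.2f}%")
print(f"accuracy gap: {accuracy_gap} points")
```

&lt;p&gt;The hardware saving lands between the low and high ends of that range depending on which GPU setup you compare against, and that is before counting the monthly cloud fees.&lt;/p&gt;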

&lt;p&gt;&lt;strong&gt;Quick links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🚀 &lt;a href="https://github.com/tahosinx/gemmavision" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt; — Full source code&lt;/li&gt;
&lt;li&gt;🛒 Shopping List — Exact parts to buy&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Problem: Why Computer Vision is Broken for Indie Developers
&lt;/h2&gt;

&lt;p&gt;For two years, I maintained a production computer vision pipeline that looked like every tutorial on the internet:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;YOLOv8 → OpenCV preprocessing → CUDA drivers → Cloud API fallback → Custom NMS → Deployment hell
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The reality of traditional CV:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pain Point&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Frequency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cloud GPU rental&lt;/td&gt;
&lt;td&gt;$47/month&lt;/td&gt;
&lt;td&gt;Every month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CUDA driver updates&lt;/td&gt;
&lt;td&gt;3–4 hours debugging&lt;/td&gt;
&lt;td&gt;Quarterly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependency conflicts&lt;/td&gt;
&lt;td&gt;2–6 hours resolution&lt;/td&gt;
&lt;td&gt;Monthly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model retraining&lt;/td&gt;
&lt;td&gt;$50–200 compute&lt;/td&gt;
&lt;td&gt;Per use case&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API rate limits&lt;/td&gt;
&lt;td&gt;Throttled at scale&lt;/td&gt;
&lt;td&gt;Daily&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The monthly bill:&lt;/strong&gt; $47 for cloud GPU + API calls&lt;br&gt;&lt;br&gt;
&lt;strong&gt;The codebase:&lt;/strong&gt; 800 lines of preprocessing, coordinate transforms, and version pinning&lt;br&gt;&lt;br&gt;
&lt;strong&gt;The maintenance:&lt;/strong&gt; Broken every time NVIDIA drivers updated&lt;br&gt;&lt;br&gt;
&lt;strong&gt;The latency:&lt;/strong&gt; 2–5 seconds end-to-end (when it worked)&lt;/p&gt;

&lt;p&gt;It worked. But it felt… heavy. Like I was managing infrastructure instead of building products. The cognitive overhead of keeping CUDA, cuDNN, PyTorch, and OpenCV versions in sync was exhausting. Every &lt;code&gt;apt update&lt;/code&gt; on the server felt like a gamble.&lt;/p&gt;

&lt;p&gt;The frustration peaked in March 2026. I was debugging a CUDA version mismatch at 2 AM for a side project that was supposed to be "simple object detection." I asked myself: &lt;em&gt;Why does computer vision require so much ceremony? Why does a "hello world" object detector need 10 dependencies and a $500 GPU?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That night, I started researching alternatives. What I found changed everything.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Discovery: Gemma 4's Secret Weapon
&lt;/h2&gt;

&lt;p&gt;Reading the &lt;a href="https://ai.google.dev/gemma/docs/core" rel="noopener noreferrer"&gt;Gemma 4 technical documentation&lt;/a&gt;, I found something buried in the multimodal section that made me stop breathing for a second:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The model can return structured JSON output including &lt;code&gt;box_2d&lt;/code&gt; coordinates for detected objects."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I read it twice. Then I tested it immediately.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Experiment
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The prompt I sent:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Detect all objects in this image. Return bounding boxes in JSON format 
with 'box_2d' [y1, x1, y2, x2] and 'label' fields.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The response I got:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"box_2d"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;171&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;245&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;308&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"coffee mug"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"box_2d"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;89&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;420&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;334&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;612&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"laptop"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"box_2d"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;245&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;412&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;780&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"desk chair"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Minimal post-processing.&lt;/strong&gt; Coordinates are normalized to a 1000×1000 grid, so the only step left is rescaling them to your image dimensions. No Non-Maximum Suppression (NMS). No class-ID mapping. No OpenCV &lt;code&gt;cv2.rectangle()&lt;/code&gt; calls. Just… labeled coordinates. Ready to use. Native from the model.&lt;/p&gt;
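&lt;p&gt;To make that rescaling step concrete, here is a minimal sketch. The helper name &lt;code&gt;box_to_pixels&lt;/code&gt; and the 1920×1080 image size are illustrative assumptions, and the &lt;code&gt;[y1, x1, y2, x2]&lt;/code&gt; axis order is taken from the docstring in the implementation later in this post.&lt;/p&gt;

```python
def box_to_pixels(box_2d, image_width, image_height, grid=1000):
    """Convert a [y1, x1, y2, x2] box on a 0..grid scale to pixel (left, top, right, bottom)."""
    y1, x1, y2, x2 = box_2d
    left = round(x1 / grid * image_width)
    top = round(y1 / grid * image_height)
    right = round(x2 / grid * image_width)
    bottom = round(y2 / grid * image_height)
    return left, top, right, bottom


# The "coffee mug" box from the response above, on an assumed 1920x1080 photo
print(box_to_pixels([171, 75, 245, 308], 1920, 1080))  # (144, 185, 591, 265)
```

&lt;p&gt;The returned tuple is in the (left, top, right, bottom) order that most drawing APIs expect.&lt;/p&gt;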

&lt;p&gt;The realization hit like a truck: &lt;em&gt;A large vision-language model can replace my entire computer vision pipeline.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Changes Everything
&lt;/h3&gt;

&lt;p&gt;Traditional computer vision pipelines are &lt;em&gt;composed&lt;/em&gt; systems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Detection model&lt;/strong&gt; (YOLO) outputs raw tensors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NMS algorithm&lt;/strong&gt; filters overlapping boxes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coordinate transforms&lt;/strong&gt; scale to image dimensions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Label mapping&lt;/strong&gt; converts class IDs to text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visualization layer&lt;/strong&gt; draws boxes with OpenCV&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Gemma 4 is a &lt;em&gt;unified&lt;/em&gt; system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;One model&lt;/strong&gt; takes image + text prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One output&lt;/strong&gt; contains structured bounding boxes with labels&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This architectural simplification isn't just cleaner code — it's a fundamentally different approach to computer vision that eliminates entire categories of bugs and maintenance overhead.&lt;/p&gt;
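&lt;p&gt;For a sense of what the NMS step in the composed pipeline involves, here is a minimal pure-Python sketch of greedy Non-Maximum Suppression. It is a generic textbook version, not code from any particular YOLO release; the function names and the 0.5 threshold are assumptions for illustration.&lt;/p&gt;

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, keep one only if it
    does not overlap an already-kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if not any(iou(boxes[i], boxes[k]) > iou_threshold for k in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(nms(boxes, scores=[0.9, 0.8, 0.7]))  # [0, 2]
```

&lt;p&gt;This is the kind of bookkeeping that disappears when the model returns one labeled box per object directly.&lt;/p&gt;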




&lt;h2&gt;
  
  
  The $75 Solution: Building GemmaVision
&lt;/h2&gt;

&lt;p&gt;If Gemma 4 could output bounding boxes natively, I didn't need a GPU server. I needed just enough compute to run the E4B (effective 4B parameter) model. That compute fits in a $75 single-board computer setup.&lt;/p&gt;

&lt;p&gt;Enter the &lt;strong&gt;Raspberry Pi 5&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hardware Shopping List
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Where to Buy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Raspberry Pi 5 (8GB)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$60&lt;/td&gt;
&lt;td&gt;Inference engine&lt;/td&gt;
&lt;td&gt;&lt;a href="https://rpilocator.com" rel="noopener noreferrer"&gt;rpilocator.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Camera Module 3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$15&lt;/td&gt;
&lt;td&gt;Image capture&lt;/td&gt;
&lt;td&gt;&lt;a href="https://adafruit.com" rel="noopener noreferrer"&gt;Adafruit&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Active Cooler&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$5&lt;/td&gt;
&lt;td&gt;Thermal management&lt;/td&gt;
&lt;td&gt;Official Raspberry Pi store&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;64GB microSD (U3)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$10&lt;/td&gt;
&lt;td&gt;Model storage&lt;/td&gt;
&lt;td&gt;Any retailer (U3 speed required)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;USB-C Power Supply&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$8&lt;/td&gt;
&lt;td&gt;5V 5A PSU&lt;/td&gt;
&lt;td&gt;Included or separate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total (excluding power supply)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$90&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Complete system&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Note: Skip the camera and use existing images, and the total drops to &lt;strong&gt;$75&lt;/strong&gt;.&lt;/em&gt;&lt;/p&gt;
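&lt;p&gt;The totals line up when the power supply is left out of the sum, which appears to be what "Included or separate" implies. A quick sketch of the arithmetic, using the prices from the table (the &lt;code&gt;total&lt;/code&gt; helper is just for illustration):&lt;/p&gt;

```python
parts = {
    "Raspberry Pi 5 (8GB)": 60,
    "Camera Module 3": 15,
    "Active Cooler": 5,
    "64GB microSD (U3)": 10,
    "USB-C Power Supply": 8,
}


def total(prices, exclude=()):
    """Sum the prices of every part whose name is not in `exclude`."""
    return sum(cost for name, cost in prices.items() if name not in exclude)


with_camera = total(parts, exclude=("USB-C Power Supply",))
without_camera = total(parts, exclude=("USB-C Power Supply", "Camera Module 3"))
everything = total(parts)
print(with_camera, without_camera, everything)  # 90 75 98
```

&lt;p&gt;If you need to buy the power supply separately, budget for the larger figure.&lt;/p&gt;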

&lt;h3&gt;
  
  
  Software Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│                    GemmaVision Pipeline                     │
├─────────────────────────────────────────────────────────────┤
│  [Camera/PIL Image]                                         │
│         ↓                                                   │
│  [Transformers 4.48+ — AutoProcessor]                       │
│         ↓                                                   │
│  [Gemma 4 E4B-it, 4-bit quantized, 2.1GB]                   │
│         ↓                                                   │
│  [Native JSON: box_2d + label]                              │
│         ↓                                                   │
│  [PIL ImageDraw — Bounding boxes overlay]                   │
└─────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
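&lt;p&gt;The "Native JSON" stage still hands you text, so it helps to parse it defensively before drawing anything. The sketch below assumes the model may wrap the array in extra prose; &lt;code&gt;parse_detections&lt;/code&gt; is a hypothetical helper, not part of the repository.&lt;/p&gt;

```python
import json


def parse_detections(raw_text):
    """Pull the JSON array out of raw model text and keep only well-formed entries."""
    start = raw_text.find("[")
    end = raw_text.rfind("]")
    if start == -1 or end == -1 or start > end:
        return []
    try:
        items = json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return []
    detections = []
    for item in items:
        box = item.get("box_2d") if isinstance(item, dict) else None
        # Keep an entry only if it has a 4-number box and a label
        if isinstance(box, list) and len(box) == 4 and "label" in item:
            detections.append({"box_2d": box, "label": str(item["label"])})
    return detections


raw = 'Here you go:\n[{"box_2d": [171, 75, 245, 308], "label": "coffee mug"}, {"label": "no box"}]'
print(parse_detections(raw))
```

&lt;p&gt;Returning an empty list on malformed output keeps the drawing step from crashing on a bad generation.&lt;/p&gt;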



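&lt;p&gt;&lt;strong&gt;Core dependencies: 3&lt;/strong&gt; (plus &lt;code&gt;bitsandbytes&lt;/code&gt; and &lt;code&gt;accelerate&lt;/code&gt; for 4-bit loading).&lt;/p&gt;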

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;torch&lt;/code&gt; — PyTorch (CPU-optimized)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;transformers&lt;/code&gt; — Hugging Face model loading&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Pillow&lt;/code&gt; — Image I/O and drawing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Lines of code: ~50.&lt;/strong&gt; Compare that to a YOLOv8 pipeline with preprocessing, NMS, coordinate transforms, and visualization.&lt;/p&gt;




&lt;h2&gt;
  
  
  Performance &amp;amp; Evaluation
&lt;/h2&gt;

&lt;h2&gt;
  
  
  What Works / What Breaks: Honest Assessment
&lt;/h2&gt;

&lt;p&gt;I promised honesty. Here's the real-world performance:&lt;/p&gt;

&lt;h3&gt;
  
  
  ✅ Works Well
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Common objects&lt;/td&gt;
&lt;td&gt;Coffee mugs, laptops, chairs, phones&lt;/td&gt;
&lt;td&gt;87%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI elements&lt;/td&gt;
&lt;td&gt;Buttons, text inputs, dropdowns, links&lt;/td&gt;
&lt;td&gt;91%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Indoor scenes&lt;/td&gt;
&lt;td&gt;Living rooms, kitchens, offices&lt;/td&gt;
&lt;td&gt;84%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Screenshots&lt;/td&gt;
&lt;td&gt;Web interfaces, mobile apps&lt;/td&gt;
&lt;td&gt;89%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Documented objects&lt;/td&gt;
&lt;td&gt;Items with clear visual features&lt;/td&gt;
&lt;td&gt;85%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  ⚠️ Edge Cases
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;th&gt;Mitigation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Small text at distance&lt;/td&gt;
&lt;td&gt;Poor detection&lt;/td&gt;
&lt;td&gt;Crop or zoom image&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Occluded objects&lt;/td&gt;
&lt;td&gt;Partial detection&lt;/td&gt;
&lt;td&gt;Multiple angles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Very dark images&lt;/td&gt;
&lt;td&gt;Missed objects&lt;/td&gt;
&lt;td&gt;Brighten/preprocess&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noisy images&lt;/td&gt;
&lt;td&gt;False positives&lt;/td&gt;
&lt;td&gt;Denoise first (the output carries no confidence score to threshold)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Abstract art&lt;/td&gt;
&lt;td&gt;Nonsensical labels&lt;/td&gt;
&lt;td&gt;Not recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  ❌ Don't Use For
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Application&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;th&gt;Alternative&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Real-time video&lt;/td&gt;
&lt;td&gt;Too slow (8-12s/frame)&lt;/td&gt;
&lt;td&gt;YOLOv8 on GPU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sub-100ms latency&lt;/td&gt;
&lt;td&gt;Impossible on Pi&lt;/td&gt;
&lt;td&gt;Edge TPU / NVIDIA Jetson&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Industrial precision&lt;/td&gt;
&lt;td&gt;85% isn't enough&lt;/td&gt;
&lt;td&gt;Custom trained YOLO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Safety-critical systems&lt;/td&gt;
&lt;td&gt;No hard real-time guarantees&lt;/td&gt;
&lt;td&gt;Certified CV systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tiny objects (&amp;lt; 20px)&lt;/td&gt;
&lt;td&gt;Detection fails&lt;/td&gt;
&lt;td&gt;Higher resolution camera&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; Gemma 4 vision excels at &lt;em&gt;general-purpose object detection where latency tolerance is 10+ seconds&lt;/em&gt;. For real-time applications, traditional CV still wins.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Home Automation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Detect if garage door is open/closed
&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detect_objects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;garage.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;garage door&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;det&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;open&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;det&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="nf"&gt;send_notification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Garage door is open!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Accessibility Tool
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Describe scene for visually impaired users
&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detect_objects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;room.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all furniture and obstacles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_spatial_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;speak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# "Coffee table 2 meters ahead, chair to the right"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
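&lt;p&gt;&lt;code&gt;generate_spatial_description&lt;/code&gt; is not shown in this post, so here is one hypothetical way it could work. This sketch only reports left/ahead/right from the horizontal center of each box; estimating distances in meters would need depth information that a 2D box does not provide.&lt;/p&gt;

```python
def horizontal_zone(box_2d, grid=1000):
    """Classify a [y1, x1, y2, x2] box as left, ahead, or right by its horizontal center."""
    _, x1, _, x2 = box_2d
    center = (x1 + x2) / 2
    if center > grid * 2 / 3:
        return "to the right"
    if center > grid / 3:
        return "ahead"
    return "to the left"


def generate_spatial_description(detections):
    """Turn detections into one spoken-style sentence, ordered left to right."""
    ordered = sorted(detections, key=lambda d: d["box_2d"][1])
    phrases = [f'{d["label"]} {horizontal_zone(d["box_2d"])}' for d in ordered]
    return ", ".join(phrases).capitalize() if phrases else "Nothing detected"


sample = [
    {"box_2d": [171, 75, 245, 308], "label": "coffee mug"},
    {"box_2d": [89, 420, 334, 612], "label": "laptop"},
    {"box_2d": [245, 512, 412, 780], "label": "desk chair"},
]
print(generate_spatial_description(sample))
# Coffee mug to the left, laptop ahead, desk chair ahead
```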



&lt;h3&gt;
  
  
  Inventory Management
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Count items on shelf
&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detect_objects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shelf.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;inventory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;count_by_label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stock: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;inventory&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
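&lt;p&gt;&lt;code&gt;count_by_label&lt;/code&gt; is a small helper that is easy to write yourself. A minimal sketch, assuming each detection is a dict with a &lt;code&gt;label&lt;/code&gt; key and that labels should be compared case-insensitively:&lt;/p&gt;

```python
from collections import Counter


def count_by_label(detections):
    """Count detections per label, normalizing case and surrounding whitespace."""
    return dict(Counter(d["label"].strip().lower() for d in detections))


shelf = [{"label": "Soda can"}, {"label": "soda can "}, {"label": "cereal box"}]
print(count_by_label(shelf))  # {'soda can': 2, 'cereal box': 1}
```

&lt;p&gt;Normalizing the label matters here because a generative model may not spell the same object identically every time.&lt;/p&gt;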



&lt;h3&gt;
  
  
  UI Testing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Verify all buttons are present in screenshot
&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detect_objects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ui-screenshot.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;buttons and input fields&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Submit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cancel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Username&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;missing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;find_missing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Missing UI elements: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
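&lt;p&gt;&lt;code&gt;find_missing&lt;/code&gt; is likewise left undefined above. One possible sketch, assuming a UI element counts as present when its expected name appears anywhere inside a detected label:&lt;/p&gt;

```python
def find_missing(expected, detections):
    """Return expected names that do not appear in any detected label (case-insensitive)."""
    labels = [d["label"].lower() for d in detections]
    return [name for name in expected if not any(name.lower() in label for label in labels)]


found = [{"label": "Submit button"}, {"label": "username input field"}]
print(find_missing(["Submit", "Cancel", "Username", "Password"], found))
# ['Cancel', 'Password']
```

&lt;p&gt;Substring matching is forgiving of labels like "Submit button", but it can also produce false matches, so tighten it if your element names overlap.&lt;/p&gt;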






&lt;h2&gt;
  
  
  Head to Head: Gemma 4 vs Traditional CV
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;YOLOv8 + OpenCV&lt;/th&gt;
&lt;th&gt;Gemma 4 on Pi 5&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2–4 hours&lt;/td&gt;
&lt;td&gt;20 minutes&lt;/td&gt;
&lt;td&gt;🏆 Gemma 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lines of code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;500–1000&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;🏆 Gemma 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dependencies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10+&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;🏆 Gemma 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hardware cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$500–2000&lt;/td&gt;
&lt;td&gt;$75–90&lt;/td&gt;
&lt;td&gt;🏆 Gemma 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Monthly cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$20–100&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;🏆 Gemma 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Power draw&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;150–300W&lt;/td&gt;
&lt;td&gt;7.5W&lt;/td&gt;
&lt;td&gt;🏆 Gemma 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Offline capable&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes (with a local GPU)&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;Tie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Zero-shot capable&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Requires training&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;🏆 Gemma 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inference speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;50-200ms&lt;/td&gt;
&lt;td&gt;8-12s&lt;/td&gt;
&lt;td&gt;🏆 YOLOv8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accuracy (COCO)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~90%&lt;/td&gt;
&lt;td&gt;~85%&lt;/td&gt;
&lt;td&gt;🏆 YOLOv8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Real-time video&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;🏆 YOLOv8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Custom training&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Well documented&lt;/td&gt;
&lt;td&gt;⚠️ Limited&lt;/td&gt;
&lt;td&gt;🏆 YOLOv8&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;When to choose Gemma 4:&lt;/strong&gt; Offline deployment, zero-shot detection, simple setup, low cost, privacy-first.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;When to choose YOLOv8:&lt;/strong&gt; Real-time video, highest accuracy, custom training, GPU available.&lt;/p&gt;
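&lt;p&gt;To put the power-draw row in perspective, here is a rough monthly energy estimate. It assumes both systems run around the clock at the wattages listed in the table, which is a simplification, since real draw varies with load.&lt;/p&gt;

```python
HOURS_PER_MONTH = 24 * 30


def monthly_kwh(watts):
    """Energy used in a 30-day month at a constant draw, in kilowatt-hours."""
    return watts * HOURS_PER_MONTH / 1000


pi_kwh = monthly_kwh(7.5)
gpu_low_kwh, gpu_high_kwh = monthly_kwh(150), monthly_kwh(300)
print(pi_kwh, gpu_low_kwh, gpu_high_kwh)  # 5.4 108.0 216.0
```

&lt;p&gt;Multiply by your local electricity rate to turn these into a cost figure.&lt;/p&gt;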


&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;🚀 &lt;strong&gt;&lt;a href="https://github.com/tahosinx/gemmavision" rel="noopener noreferrer"&gt;GitHub Repository: tahosinx/gemmavision&lt;/a&gt;&lt;/strong&gt; — Full source code, MIT Licensed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick start:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/tahosinx/gemmavision.git
&lt;span class="nb"&gt;cd &lt;/span&gt;gemmavision/src
python3 pi-client.py &lt;span class="nt"&gt;--image&lt;/span&gt; test.jpg &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"all objects"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Hardware Setup: 10-Minute Raspberry Pi Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Raspberry Pi 5 (8GB RAM strongly recommended)&lt;/li&gt;
&lt;li&gt;64GB microSD card (U3 speed class)&lt;/li&gt;
&lt;li&gt;Camera Module 3 or USB webcam&lt;/li&gt;
&lt;li&gt;Active cooler (thermal throttling occurs without it)&lt;/li&gt;
&lt;li&gt;Stable internet connection (for initial model download)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step-by-Step Installation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1: System Dependencies&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Update system packages&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;apt full-upgrade &lt;span class="nt"&gt;-y&lt;/span&gt;

&lt;span class="c"&gt;# Install Python and camera support&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    python3-pip &lt;span class="se"&gt;\&lt;/span&gt;
    python3-venv &lt;span class="se"&gt;\&lt;/span&gt;
    python3-picamera2 &lt;span class="se"&gt;\&lt;/span&gt;
    git &lt;span class="se"&gt;\&lt;/span&gt;
    htop &lt;span class="se"&gt;\&lt;/span&gt;
    libcamera-dev

&lt;span class="c"&gt;# Increase swap (essential for 4GB Pi models)&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;dphys-swapfile swapoff
&lt;span class="nb"&gt;sudo sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;'s/CONF_SWAPSIZE=.*/CONF_SWAPSIZE=4096/'&lt;/span&gt; /etc/dphys-swapfile
&lt;span class="nb"&gt;sudo &lt;/span&gt;dphys-swapfile setup
&lt;span class="nb"&gt;sudo &lt;/span&gt;dphys-swapfile swapon
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
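&lt;p&gt;After changing swap, it is worth confirming the Pi actually sees the new total. The sketch below parses &lt;code&gt;/proc/meminfo&lt;/code&gt;-style text; the helper names are hypothetical and the sample values are made up for illustration. On the Pi itself you would pass in the contents of &lt;code&gt;/proc/meminfo&lt;/code&gt;.&lt;/p&gt;

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def ram_plus_swap_gib(meminfo):
    """Total RAM plus swap in GiB, from parsed meminfo values."""
    return (meminfo.get("MemTotal", 0) + meminfo.get("SwapTotal", 0)) / (1024 * 1024)


# Illustrative sample; on a Pi use: open("/proc/meminfo").read()
sample = "MemTotal:        8245660 kB\nSwapTotal:       4194300 kB\n"
info = parse_meminfo(sample)
print(f"RAM + swap: {ram_plus_swap_gib(info):.1f} GiB")
```

&lt;p&gt;If the reported swap is still the old value, the swap file was probably not regenerated, so re-run the setup and swapon commands.&lt;/p&gt;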



&lt;p&gt;&lt;strong&gt;Step 2: Python Environment&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create virtual environment&lt;/span&gt;
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv ~/gemmavision-env
&lt;span class="nb"&gt;source&lt;/span&gt; ~/gemmavision-env/bin/activate

&lt;span class="c"&gt;# Install CPU-optimized PyTorch (NO CUDA)&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;torch &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--index-url&lt;/span&gt; https://download.pytorch.org/whl/cpu

&lt;span class="c"&gt;# Install transformers and utilities&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;transformers Pillow bitsandbytes accelerate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Download GemmaVision&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/tahosinx/gemmavision.git
&lt;span class="nb"&gt;cd &lt;/span&gt;gemmavision/src

&lt;span class="c"&gt;# Optional: Run tests&lt;/span&gt;
python3 test_local.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4: First Run (Model Download)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 pi-client.py &lt;span class="nt"&gt;--image&lt;/span&gt; test.jpg &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"all objects"&lt;/span&gt;

&lt;span class="c"&gt;# First run downloads ~2.1GB quantized model&lt;/span&gt;
&lt;span class="c"&gt;# Time: 5-10 minutes depending on internet&lt;/span&gt;
&lt;span class="c"&gt;# Subsequent runs: ~30s (cached)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Camera Configuration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;For Camera Module 3:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable camera interface&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;raspi-config
&lt;span class="c"&gt;# Interface Options → Camera → Enable&lt;/span&gt;

&lt;span class="c"&gt;# Test camera&lt;/span&gt;
libcamera-jpeg &lt;span class="nt"&gt;-o&lt;/span&gt; test.jpg &lt;span class="nt"&gt;-t&lt;/span&gt; 1000 &lt;span class="nt"&gt;--width&lt;/span&gt; 1920 &lt;span class="nt"&gt;--height&lt;/span&gt; 1080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;For USB webcam:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# No additional config needed
# GemmaVision auto-detects /dev/video0
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;I chose the &lt;strong&gt;Gemma 4 E4B-it&lt;/strong&gt; model because it's the sweet spot for edge deployment — small enough to run on a Raspberry Pi 5's 8GB RAM with 4-bit quantization (2.1GB), yet powerful enough for accurate zero-shot object detection at ~85% accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key insight:&lt;/strong&gt; Gemma 4's multimodal capabilities include &lt;strong&gt;native bounding box output&lt;/strong&gt; via the &lt;code&gt;box_2d&lt;/code&gt; JSON format. This eliminates the need for traditional CV pipelines (YOLO, OpenCV, NMS algorithms) entirely. One model replaces an entire stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works: The Technical Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Model Selection: Why Gemma 4 E4B-it?
&lt;/h3&gt;

&lt;p&gt;Gemma 4 comes in multiple sizes. For edge deployment on a Raspberry Pi 5 with 8GB RAM, the &lt;strong&gt;E4B-it&lt;/strong&gt; (Effective 4B) variant hits the sweet spot:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Parameters&lt;/th&gt;
&lt;th&gt;Quantized Size&lt;/th&gt;
&lt;th&gt;RAM Required&lt;/th&gt;
&lt;th&gt;Pi 5 Compatible?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gemma-4-E4B-it&lt;/td&gt;
&lt;td&gt;E4B (Effective 4B)&lt;/td&gt;
&lt;td&gt;2.1GB&lt;/td&gt;
&lt;td&gt;~6GB&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gemma-4-26b-a4b-it&lt;/td&gt;
&lt;td&gt;26B MoE (4B active)&lt;/td&gt;
&lt;td&gt;13GB&lt;/td&gt;
&lt;td&gt;~20GB&lt;/td&gt;
&lt;td&gt;❌ No (exceeds even the 16GB Pi 5)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gemma-4-31b-it&lt;/td&gt;
&lt;td&gt;31B Dense&lt;/td&gt;
&lt;td&gt;16GB&lt;/td&gt;
&lt;td&gt;~36GB&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;strong&gt;4-bit quantization&lt;/strong&gt; via &lt;code&gt;bitsandbytes&lt;/code&gt; is essential (CPU support was added in recent versions; ensure you install the latest). It cuts weight memory roughly 4× relative to 16-bit weights, with minimal accuracy loss (~1-2% in my testing).&lt;/p&gt;
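&lt;p&gt;The "Quantized Size" column can be sanity-checked with back-of-the-envelope arithmetic: parameters times bits per weight, divided by eight. This sketch treats E4B as roughly 4 billion parameters, which is an assumption, and it ignores quantization metadata and runtime overhead such as activations and the KV cache.&lt;/p&gt;

```python
def weight_size_gb(params_billions, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring quantization overhead."""
    return params_billions * bits_per_weight / 8


for name, params in [("E4B", 4), ("26B MoE", 26), ("31B dense", 31)]:
    size_16bit = weight_size_gb(params, 16)
    size_4bit = weight_size_gb(params, 4)
    print(f"{name}: {size_16bit} GB at 16-bit, {size_4bit} GB at 4-bit")
```

&lt;p&gt;The 4-bit results (2.0, 13.0, and 15.5 GB) land close to the table's 2.1GB, 13GB, and 16GB, and the 16-bit to 4-bit ratio is where the 4× figure comes from.&lt;/p&gt;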

&lt;h3&gt;
  
  
  The Complete Implementation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
GemmaVision — Complete computer vision in 50 lines
Native object detection with Gemma 4 on Raspberry Pi 5
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoProcessor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ImageDraw&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="c1"&gt;# Configuration
&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google/gemma-4-E4B-it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;DEVICE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Raspberry Pi 5 has no CUDA
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Load Gemma 4 with 4-bit quantization for Pi 5&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s 8GB RAM.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoProcessor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;load_in_4bit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# Essential for 8GB RAM constraint
&lt;/span&gt;        &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# CPU inference on Pi
&lt;/span&gt;        &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;detect_objects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all objects&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Detect objects in image using Gemma 4 native vision.

    Args:
        image_path: Path to image file
        query: What to detect (e.g., &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cars&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;furniture&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;buttons and inputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)

    Returns:
        List of dicts with &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;box_2d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; [y1, x1, y2, x2] and &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Load image
&lt;/span&gt;    &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Construct prompt for structured output
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Detect &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; in this image. Return JSON with &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;box_2d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; [y1, x1, y2, x2] and &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; fields.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="c1"&gt;# Run inference (10-20s on Pi 5)
&lt;/span&gt;    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
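        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;add_generation_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# End the prompt where the model's reply begins&lt;/span&gt;
        &lt;span class="n"&gt;tokenize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;return_dict&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Needed for **inputs and inputs["input_ids"] below&lt;/span&gt;
        &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;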
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;max_new_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;do_sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Deterministic for reproducibility
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Parse native JSON output
&lt;/span&gt;    &lt;span class="n"&gt;result_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]:],&lt;/span&gt; 
        &lt;span class="n"&gt;skip_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Assumes the output is a bare JSON array; json.loads raises otherwise
&lt;/span&gt;    &lt;span class="n"&gt;detections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;draw_boxes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Draw bounding boxes on image.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;draw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ImageDraw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Draw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;det&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Gemma 4 returns coords on a 1000x1000 grid — descale to image size
&lt;/span&gt;        &lt;span class="n"&gt;y1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;det&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;box_2d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;y1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;det&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Draw box
&lt;/span&gt;        &lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rectangle&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;outline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#00ff00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Draw label
&lt;/span&gt;        &lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#00ff00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;

&lt;span class="c1"&gt;# Example usage
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;detections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detect_objects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kitchen.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all objects&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; objects:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;det&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;det&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;det&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;box_2d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;draw_boxes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kitchen.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;That's the entire pipeline.&lt;/strong&gt; No &lt;code&gt;cv2&lt;/code&gt;. No &lt;code&gt;torchvision&lt;/code&gt;. No &lt;code&gt;ultralytics&lt;/code&gt;. No YAML configs. No custom NMS logic. No coordinate normalization headaches.&lt;/p&gt;
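
&lt;p&gt;One caveat about the pipeline above: calling &lt;code&gt;json.loads&lt;/code&gt; directly assumes the model emits nothing but a JSON array. Instruction-tuned models sometimes wrap their answer in a Markdown code fence or add a sentence around it, so a more defensive parser may be worth having. The following is a minimal sketch, not part of GemmaVision; the helper name &lt;code&gt;parse_detections&lt;/code&gt; and its validation rules are assumptions for illustration.&lt;/p&gt;

```python
import json
import re


def parse_detections(result_text):
    """Extract a list of {'box_2d', 'label'} dicts from raw model text.

    Hypothetical helper: tolerates a Markdown code fence or stray prose
    around the JSON array and drops malformed entries.
    """
    # Grab the outermost [...] span, ignoring any surrounding fence or prose.
    match = re.search(r"\[.*\]", result_text, flags=re.DOTALL)
    if match is None:
        return []
    try:
        raw = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []

    detections = []
    for item in raw:
        if not isinstance(item, dict):
            continue
        box = item.get("box_2d")
        label = item.get("label")
        # Keep only entries with four numeric coordinates and a text label.
        if (
            isinstance(box, list)
            and len(box) == 4
            and all(isinstance(v, (int, float)) for v in box)
            and isinstance(label, str)
        ):
            detections.append({"box_2d": box, "label": label})
    return detections
```

&lt;p&gt;Swapping this in for the bare &lt;code&gt;json.loads&lt;/code&gt; call means a malformed reply yields an empty list instead of an exception, which is usually easier to handle in a batch job.&lt;/p&gt;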

&lt;h3&gt;
  
  
  Performance Benchmarks
&lt;/h3&gt;

&lt;p&gt;I ran 100 test images across 5 categories on the Pi 5:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Images&lt;/th&gt;
&lt;th&gt;Avg Time&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Common objects&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;12.3s&lt;/td&gt;
&lt;td&gt;87%&lt;/td&gt;
&lt;td&gt;COCO-style items&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Indoor scenes&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;14.1s&lt;/td&gt;
&lt;td&gt;84%&lt;/td&gt;
&lt;td&gt;Living room, kitchen&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI elements&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;11.8s&lt;/td&gt;
&lt;td&gt;91%&lt;/td&gt;
&lt;td&gt;Buttons, inputs, links&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Screenshots&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;10.5s&lt;/td&gt;
&lt;td&gt;89%&lt;/td&gt;
&lt;td&gt;Web interfaces&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Outdoor scenes&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;15.2s&lt;/td&gt;
&lt;td&gt;78%&lt;/td&gt;
&lt;td&gt;Street, cars, pedestrians&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overall&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;12.8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;85.8%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;First inference&lt;/strong&gt; takes ~15 seconds (model loads from SD card).&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Subsequent inferences&lt;/strong&gt; take 8–12 seconds (model cached in RAM).&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Memory usage:&lt;/strong&gt; ~6GB RAM during inference (fits comfortably in 8GB Pi).&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Power draw:&lt;/strong&gt; 7.5W continuous (standard Pi 5 PSU).&lt;/p&gt;
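
&lt;p&gt;The Overall row in the table is the plain average of the five category rows, since every category has the same number of images. As a quick check of that arithmetic, here is a small sketch; the variable and function names are mine, and the numbers are copied from the table.&lt;/p&gt;

```python
# Per-category results copied from the benchmark table:
# (category, images, average seconds per image, accuracy in percent)
rows = [
    ("Common objects", 20, 12.3, 87),
    ("Indoor scenes", 20, 14.1, 84),
    ("UI elements", 20, 11.8, 91),
    ("Screenshots", 20, 10.5, 89),
    ("Outdoor scenes", 20, 15.2, 78),
]


def overall(rows):
    """Image-weighted mean of time and accuracy across categories."""
    total_images = sum(n for _, n, _, _ in rows)
    mean_time = sum(n * t for _, n, t, _ in rows) / total_images
    mean_acc = sum(n * a for _, n, _, a in rows) / total_images
    return total_images, round(mean_time, 1), round(mean_acc, 1)


print(overall(rows))
```

&lt;p&gt;The result matches the table: 100 images, 12.8s average, 85.8% accuracy.&lt;/p&gt;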




&lt;h2&gt;
  
  
  Why This Matters for Developers
&lt;/h2&gt;

&lt;p&gt;Three fundamental shifts are happening simultaneously in edge AI:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Democratization of Computer Vision
&lt;/h3&gt;

&lt;p&gt;Computer vision has historically required a $500+ GPU. Now it runs on a $75 single-board computer. This changes who can build CV systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Students&lt;/strong&gt; can prototype without cloud credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hobbyists&lt;/strong&gt; in developing regions can build locally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indie developers&lt;/strong&gt; can ship CV features without venture funding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Researchers&lt;/strong&gt; can deploy experiments without institutional GPU clusters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Going from a $500+ GPU to a $75 board cuts the hardware cost of entry by roughly 7×.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Privacy-First by Default
&lt;/h3&gt;

&lt;p&gt;Everything happens locally on the Pi. No images uploaded to cloud APIs. No data retention policies to worry about. No network required after initial model download.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use cases where this matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Home security cameras (no footage leaves your network)&lt;/li&gt;
&lt;li&gt;Medical image analysis (images never leave the device, which simplifies HIPAA compliance)&lt;/li&gt;
&lt;li&gt;Industrial quality control (trade secrets stay on-premise)&lt;/li&gt;
&lt;li&gt;Accessibility tools for sensitive environments&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Architectural Simplicity
&lt;/h3&gt;

&lt;p&gt;Traditional CV pipelines are composed systems with multiple failure points. Gemma 4 is a unified system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity comparison:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Traditional CV&lt;/th&gt;
&lt;th&gt;Gemma 4 Vision&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Setup time&lt;/td&gt;
&lt;td&gt;2–4 hours&lt;/td&gt;
&lt;td&gt;20 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lines of code&lt;/td&gt;
&lt;td&gt;500–1000&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependencies&lt;/td&gt;
&lt;td&gt;10+&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Configuration files&lt;/td&gt;
&lt;td&gt;3–5 (YAML/JSON)&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training required&lt;/td&gt;
&lt;td&gt;Yes (custom datasets)&lt;/td&gt;
&lt;td&gt;No (zero-shot)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Version conflicts&lt;/td&gt;
&lt;td&gt;Frequent&lt;/td&gt;
&lt;td&gt;Rare&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This simplicity isn't just about developer experience — it's about reliability. Fewer components means fewer things that can break at 2 AM.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ: Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Q: Can I run this on Raspberry Pi 4?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Technically yes, practically no. The Pi 4 also tops out at 8GB of RAM, but its CPU is much slower. With 4-bit quantization and heavy swap usage it might run, but expect inference to be 2–3× slower (30–40s per image). The Pi 5's faster CPU is what makes this viable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How accurate is Gemma 4 compared to YOLOv8?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; In my testing on 100 images: YOLOv8 ~90%, Gemma 4 ~85%. The 5-point gap is the trade-off for zero-shot capability and far fewer dependencies. For many applications, 85% is sufficient.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can it detect custom objects not in COCO?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Yes! This is the magic of zero-shot. Just describe what you want: &lt;code&gt;"detect red toy cars"&lt;/code&gt;, &lt;code&gt;"find cracks in concrete"&lt;/code&gt;, &lt;code&gt;"locate loose bolts"&lt;/code&gt;. No retraining required.&lt;/p&gt;
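
&lt;p&gt;In code, a custom query is just a different string dropped into the prompt that &lt;code&gt;detect_objects&lt;/code&gt; builds. A small sketch of that idea, where &lt;code&gt;build_prompt&lt;/code&gt; is a hypothetical helper and the prompt wording is copied from the pipeline earlier in this post:&lt;/p&gt;

```python
def build_prompt(query="all objects"):
    """Return the detection prompt text for a free-form query.

    Hypothetical helper; the wording mirrors the prompt in detect_objects().
    """
    # Fall back to the default query if the caller passes an empty string.
    query = query.strip() or "all objects"
    return (
        f"Detect {query} in this image. "
        "Return JSON with 'box_2d' [y1, x1, y2, x2] and 'label' fields."
    )


# The same prompt template covers classes that are not in COCO.
for q in ["red toy cars", "cracks in concrete", "loose bolts"]:
    print(build_prompt(q))
```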

&lt;h3&gt;
  
  
  Q: Does it work without internet?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; After initial model download (~2.1GB quantized), yes. The model runs 100% locally on the Pi. No API calls, no cloud dependencies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can I use it for real-time video?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; No. At 8-12 seconds per frame, it's far too slow for video. Use YOLOv8 or other traditional CV for real-time applications. Gemma 4 excels at batch processing of still images.&lt;/p&gt;
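
&lt;p&gt;To put numbers on that, here is the arithmetic as a sketch. It uses the 8 to 12 second range quoted above and assumes a 30 fps source; the frame rate is an assumption for illustration, not a measurement.&lt;/p&gt;

```python
def frames_per_second(seconds_per_frame):
    """Throughput implied by a given per-frame latency."""
    return 1.0 / seconds_per_frame


def frames_skipped(seconds_per_frame, source_fps=30):
    """How many source frames go by while one frame is being processed."""
    return round(seconds_per_frame * source_fps)


for latency in (8, 12):
    fps = frames_per_second(latency)
    print(f"{latency}s per frame = {fps:.3f} fps, "
          f"{frames_skipped(latency)} frames of 30 fps video pass per inference")
```

&lt;p&gt;Even at the fast end, 240 frames of video go by for every frame the model finishes.&lt;/p&gt;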

&lt;h3&gt;
  
  
  Q: What's the power consumption?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; ~7.5W continuous under load. A standard 5V 5A Raspberry Pi PSU handles it easily. The active cooler adds ~1W.&lt;/p&gt;
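
&lt;p&gt;Combining that figure with the 12.8s average from the benchmark table gives a rough energy cost per image. This is a back-of-the-envelope sketch; it assumes power stays flat at 7.5W for the whole inference, which is a simplification.&lt;/p&gt;

```python
POWER_WATTS = 7.5         # continuous draw under load, from this post
SECONDS_PER_IMAGE = 12.8  # overall average from the benchmark table


def energy_per_image_joules(power_w=POWER_WATTS, seconds=SECONDS_PER_IMAGE):
    """Energy for one inference: power (W) times time (s) gives joules."""
    return power_w * seconds


def images_per_watt_hour(power_w=POWER_WATTS, seconds=SECONDS_PER_IMAGE):
    """One watt-hour is 3600 joules."""
    return 3600.0 / energy_per_image_joules(power_w, seconds)


print(f"{energy_per_image_joules():.0f} J per image")
print(f"{images_per_watt_hour():.1f} images per Wh")
```

&lt;p&gt;That works out to about 96 joules per image, or roughly 37 images per watt-hour.&lt;/p&gt;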

&lt;h3&gt;
  
  
  Q: Can I run this on NVIDIA Jetson?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Absolutely, and it'll be much faster. Jetson Nano and Orin boards support CUDA. This guide focuses on the Pi 5 because it's cheaper and more accessible, but the code should work anywhere PyTorch runs once you change &lt;code&gt;device_map&lt;/code&gt; from &lt;code&gt;"cpu"&lt;/code&gt;.&lt;/p&gt;
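
&lt;p&gt;The loader earlier in this post hard-codes &lt;code&gt;device_map="cpu"&lt;/code&gt;, so a CUDA board needs that value changed. One way to pick it at runtime is sketched below; &lt;code&gt;pick_device_map&lt;/code&gt; is a hypothetical helper, and this is a starting point rather than a verified Jetson setup.&lt;/p&gt;

```python
def pick_device_map():
    """Return 'cuda' when PyTorch can see a CUDA GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; fall back to the CPU setting.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device_map())
```

&lt;p&gt;The returned string can then be passed as &lt;code&gt;device_map&lt;/code&gt; when loading the model.&lt;/p&gt;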

&lt;h3&gt;
  
  
  Q: Is the model free to use commercially?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Yes. Gemma 4 is released under the &lt;strong&gt;Apache 2.0 license&lt;/strong&gt;, a major change from previous Gemma models' custom terms. It is a standard, permissive open-source license that allows commercial use, subject to its attribution and notice requirements. See &lt;a href="https://ai.google.dev/gemma/apache_2" rel="noopener noreferrer"&gt;Gemma 4 license details&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How do I improve accuracy?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Three strategies:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Higher resolution input&lt;/strong&gt; — Larger images give more detail&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better prompts&lt;/strong&gt; — Be specific: &lt;code&gt;"detect laptops and phones"&lt;/code&gt; vs &lt;code&gt;"detect electronics"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crop regions&lt;/strong&gt; — Focus on relevant image areas instead of full scene&lt;/li&gt;
&lt;/ol&gt;
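
&lt;p&gt;The cropping strategy needs one extra step: boxes detected inside a crop come back on the crop's own 1000x1000 grid and have to be mapped back to full-image pixels. Here is a sketch of that mapping, assuming the same [y1, x1, y2, x2] convention that &lt;code&gt;draw_boxes&lt;/code&gt; uses; the function name and the crop tuple layout are my own choices.&lt;/p&gt;

```python
def crop_box_to_image_pixels(box_2d, crop):
    """Map a box on a crop's 1000x1000 grid to full-image pixel coordinates.

    box_2d: [y1, x1, y2, x2] as returned for the cropped image.
    crop:   (left, top, right, bottom) of the crop in full-image pixels,
            the same order PIL's Image.crop() takes.
    Returns (x1, y1, x2, y2) in full-image pixels.
    """
    left, top, right, bottom = crop
    crop_w = right - left
    crop_h = bottom - top
    y1, x1, y2, x2 = box_2d
    # Descale from the 1000-unit grid to crop pixels, then shift by the crop origin.
    return (
        left + int(x1 * crop_w / 1000),
        top + int(y1 * crop_h / 1000),
        left + int(x2 * crop_w / 1000),
        top + int(y2 * crop_h / 1000),
    )


# A 400x200 crop whose top-left corner sits at (600, 300) in the full image.
print(crop_box_to_image_pixels([250, 500, 750, 1000], (600, 300, 1000, 500)))
# prints (800, 350, 1000, 450)
```

&lt;p&gt;The returned tuple is already in the (x1, y1, x2, y2) order that &lt;code&gt;draw.rectangle&lt;/code&gt; expects, so it can be drawn on the original image without further scaling.&lt;/p&gt;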

&lt;h3&gt;
  
  
  Q: Can I fine-tune Gemma 4 for my use case?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Yes, but it's complex. Gemma 4 supports fine-tuning via LoRA/QLoRA. I plan to publish a fine-tuning guide after the challenge. For now, zero-shot prompting covers most common use cases.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next for GemmaVision
&lt;/h2&gt;

&lt;p&gt;This is my official entry for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;DEV Gemma 4 Challenge&lt;/a&gt; (May 6-24, 2026).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Post-challenge roadmap:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;ETA&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fine-tuning guide&lt;/td&gt;
&lt;td&gt;Planned&lt;/td&gt;
&lt;td&gt;June 2026&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pi 5 GPU acceleration&lt;/td&gt;
&lt;td&gt;Waiting for open-source drivers&lt;/td&gt;
&lt;td&gt;TBD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WebRTC streaming&lt;/td&gt;
&lt;td&gt;Prototyping&lt;/td&gt;
&lt;td&gt;May 2026&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9B model experiments&lt;/td&gt;
&lt;td&gt;Blocked (needs 12GB+ RAM)&lt;/td&gt;
&lt;td&gt;If Pi 6 releases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Docker deployment&lt;/td&gt;
&lt;td&gt;Planned&lt;/td&gt;
&lt;td&gt;May 2026&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Home Assistant integration&lt;/td&gt;
&lt;td&gt;Community request&lt;/td&gt;
&lt;td&gt;June 2026&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Call to Action
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If this project helped you:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;🚀 &lt;strong&gt;Try the code:&lt;/strong&gt; &lt;a href="https://github.com/tahosinx/gemmavision" rel="noopener noreferrer"&gt;github.com/tahosinx/gemmavision&lt;/a&gt;&lt;br&gt;&lt;br&gt;
⭐ &lt;strong&gt;Star the repo&lt;/strong&gt; if you found it useful&lt;br&gt;&lt;br&gt;
💬 &lt;strong&gt;Comment below:&lt;/strong&gt; What would you build with local, offline computer vision?&lt;br&gt;&lt;br&gt;
❤️ &lt;strong&gt;Heart this post&lt;/strong&gt; — it helps in the challenge rankings&lt;br&gt;&lt;br&gt;
🐦 &lt;strong&gt;Share on Twitter&lt;/strong&gt; — Tag me &lt;a href="https://twitter.com/tahosinx" rel="noopener noreferrer"&gt;@tahosinx&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://rpilocator.com" rel="noopener noreferrer"&gt;Raspberry Pi 5&lt;/a&gt; — Stock finder (currently available)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.adafruit.com/product/5658" rel="noopener noreferrer"&gt;Camera Module 3&lt;/a&gt; — Wide angle recommended&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.raspberrypi.com/products/active-cooler/" rel="noopener noreferrer"&gt;Active Cooler&lt;/a&gt; — Official Pi cooler&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tahosin&lt;/strong&gt; — Building AI systems that run where you need them: on your desk, not in the cloud.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🌐 Website: &lt;a href="https://tahosin.bro.bd" rel="noopener noreferrer"&gt;tahosin.bro.bd&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 GitHub: &lt;a href="https://github.com/tahosinx" rel="noopener noreferrer"&gt;@tahosinx&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📝 DEV: &lt;a href="https://dev.to/tahosin"&gt;@tahosin&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🐦 Twitter: &lt;a href="https://twitter.com/tahosinx" rel="noopener noreferrer"&gt;@tahosinx&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Built with &lt;a href="https://ai.google.dev/gemma" rel="noopener noreferrer"&gt;Gemma 4&lt;/a&gt;. Tested on a $75 computer. Shared because nobody else was writing this guide.&lt;/p&gt;





&lt;p&gt;&lt;strong&gt;Related reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/gemma" rel="noopener noreferrer"&gt;Gemma 4 Technical Paper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/docs/transformers" rel="noopener noreferrer"&gt;Hugging Face Transformers Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.raspberrypi.com/products/raspberry-pi-5/" rel="noopener noreferrer"&gt;Raspberry Pi 5 Specs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/blog/4bit-transformers-bitsandbytes" rel="noopener noreferrer"&gt;4-bit Quantization Explained&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Last updated: May 12, 2026. GemmaVision v1.0. MIT Licensed.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>discuss</category>
      <category>gemma</category>
    </item>
    <item>
      <title>AI Code Generation: Google's 75% Claim and What It Means</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Sat, 25 Apr 2026 04:00:55 +0000</pubDate>
      <link>https://dev.to/tahosin/ai-code-generation-googles-75-claim-and-what-it-means-ke5</link>
      <guid>https://dev.to/tahosin/ai-code-generation-googles-75-claim-and-what-it-means-ke5</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgp5gqe9bv1035ctabbrt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgp5gqe9bv1035ctabbrt.jpg" alt="Cover" width="799" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sundar Pichai just dropped a bombshell: 75% of Google's code is now AI-generated. That's a huge number, and it's not some far-off future scenario. This isn't just about faster autocomplete; it's a stark look at where enterprise development is headed, fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters for Tech Leads
&lt;/h2&gt;

&lt;p&gt;If you're a tech lead, or even a staff engineer, this number should make you sit up straight. Your team's productivity metrics could be about to get a serious shake-up. You're not just reviewing human-written code anymore; you're going to be reviewing AI-generated solutions that might look perfect on the surface but hide subtle issues. Think about the shift from writing boilerplate to &lt;em&gt;verifying&lt;/em&gt; boilerplate. You'll need to figure out how to integrate these tools, manage their output, and still maintain code quality and architectural integrity. This isn't just about adopting a new IDE plugin; it's about fundamentally rethinking how code gets from idea to production. Google's internal tools, whatever they're called, are clearly pushing boundaries way past what we see in public tools like GitHub Copilot, saving them potentially millions of developer hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  The technical reality
&lt;/h2&gt;

&lt;p&gt;So, how does 75% AI-generated code even work? It's not sentient AI writing entire systems from scratch. More likely, it's highly sophisticated code completion, pattern recognition, and scaffold generation, deeply integrated into Google's vast internal monorepo and toolchain. Imagine an AI that understands your internal APIs, coding standards, and common patterns better than a new hire. It probably generates entire function bodies, test cases, and even data models based on high-level prompts or existing code context. We're talking about tools that can spit out a &lt;code&gt;src/utils/data-formatter.js&lt;/code&gt; file with 50 lines of clean-looking code, including JSDoc comments, in seconds. But you still gotta check it. Here's a tiny example of what an AI might generate, and the kind of quick test a human would write to verify it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// AI-generated utility function&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;formatCurrency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;locale&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en-US&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;currency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;USD&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;number&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nf"&gt;isNaN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Invalid input for formatCurrency:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;Intl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NumberFormat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;locale&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;currency&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;minimumFractionDigits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;maximumFractionDigits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// A human-written test for verification&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;formatCurrency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;123.45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en-US&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;USD&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;$123.45&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;USD formatting failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Intl.NumberFormat separates the amount and the euro sign with a non-breaking space (U+00A0)&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;formatCurrency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;99.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;de-DE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;EUR&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;99,99\u00A0€&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;EUR formatting failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;formatCurrency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;$0.00&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Zero value failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;formatCurrency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Null input failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And it's not just JavaScript. It's likely generating configuration files, build scripts, and more. Think about a &lt;code&gt;Dockerfile&lt;/code&gt; for a new service or a &lt;code&gt;Kubernetes&lt;/code&gt; deployment manifest. An AI could draft that based on a few parameters, saving hours of looking up syntax in documentation. It's about reducing the cognitive load on engineers by automating the predictable, allowing them to focus on the truly novel problems. I've seen teams save 10% of their time just by using basic code completion; imagine what 75% generation means.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd actually do today
&lt;/h2&gt;

&lt;p&gt;Given this news, here's my practical take for any dev team right now:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Start small with a public tool:&lt;/strong&gt; Integrate something like GitHub Copilot or Cursor into a non-critical side project or a small, isolated module. See how it performs with your team's common tasks.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Define clear AI usage policies:&lt;/strong&gt; Decide what kinds of code can be AI-generated without heavy human review. Establish rules for sensitive data or critical path logic.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Invest in robust testing:&lt;/strong&gt; If AI writes more code, humans need to write more tests, or at least verify AI-generated tests. Strong unit and integration tests are your safety net.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Practice prompt engineering:&lt;/strong&gt; Teach your team how to write effective prompts. Getting good output from AI is a skill, and it's becoming crucial.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Monitor code quality metrics:&lt;/strong&gt; Keep a close eye on your static analysis tools and code coverage. AI can introduce subtle bugs or performance issues that human eyes might miss.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Gotchas &amp;amp; unknowns
&lt;/h2&gt;

&lt;p&gt;While 75% is impressive, it's not a silver bullet. The biggest gotcha is &lt;strong&gt;hallucinations&lt;/strong&gt;. AI models can generate plausible-looking but completely incorrect code. This is especially true when dealing with edge cases, complex business logic, or obscure library usage. Another unknown is the &lt;strong&gt;maintenance burden&lt;/strong&gt;. If an AI generates code, who's responsible for understanding and debugging it later? What happens when the underlying libraries change, and the AI-generated code becomes outdated? It's also unclear how Google manages intellectual property or security concerns with such widespread AI usage. They have internal models, sure, but the ethical lines blur when a machine generates 3 out of 4 lines of your codebase. And let's not forget the environmental impact of running these massive AI models constantly; that's a whole other can of worms.&lt;/p&gt;

&lt;p&gt;How much of your codebase do you think an AI could realistically generate without causing more headaches than it solves?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>coding</category>
      <category>javascript</category>
      <category>google</category>
    </item>
    <item>
      <title>Streaming Speech-to-Text with OpenAI in 2026: Moving Beyond Whisper</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Fri, 24 Apr 2026 19:16:56 +0000</pubDate>
      <link>https://dev.to/tahosin/streaming-speech-to-text-with-openai-in-2026-moving-beyond-whisper-2968</link>
      <guid>https://dev.to/tahosin/streaming-speech-to-text-with-openai-in-2026-moving-beyond-whisper-2968</guid>
      <description>&lt;p&gt;Quick recap of where we are if you haven't been following OpenAI's STT roadmap: the classic &lt;code&gt;whisper-1&lt;/code&gt; endpoint is &lt;em&gt;batch-only&lt;/em&gt; — you upload a file, wait, get back a finished transcript. There's no &lt;code&gt;stream=True&lt;/code&gt; because the underlying Whisper decoder wasn't designed for it, and the endpoint probably won't ever get streaming retrofitted onto it.&lt;/p&gt;

&lt;p&gt;That was a genuine blocker for about two years. If you wanted live captions or partial transcripts, you had to either self-host Whisper with a streaming fork, or reach for a third-party like AssemblyAI / Deepgram.&lt;/p&gt;

&lt;p&gt;Then, quietly, OpenAI shipped two replacements that between them cover every STT streaming use case I've needed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;gpt-4o-transcribe&lt;/code&gt; / &lt;code&gt;gpt-4o-mini-transcribe&lt;/code&gt;&lt;/strong&gt; — file upload with &lt;code&gt;stream=True&lt;/code&gt;, delivers partial transcripts as the audio is processed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Realtime API (&lt;code&gt;gpt-4o-realtime-preview&lt;/code&gt;)&lt;/strong&gt; — WebSocket, bidirectional, built for live mic-in / TTS-out with a live-transcription mode.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I helped someone get unblocked on this in &lt;a href="https://github.com/openai/openai-python/discussions/2306" rel="noopener noreferrer"&gt;openai/openai-python#2306&lt;/a&gt; and realised I'd never written up the full picture. Here it is — with the trade-offs, working code for each, and a decision rule at the end.&lt;/p&gt;

&lt;h2&gt;
  
  
  Option 1: &lt;code&gt;gpt-4o-transcribe&lt;/code&gt; with &lt;code&gt;stream=True&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Same API shape as the old &lt;code&gt;audio.transcriptions.create&lt;/code&gt; call, just with a new model and &lt;code&gt;stream=True&lt;/code&gt;. You get incremental &lt;code&gt;transcript.text.delta&lt;/code&gt; events as chunks come back:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meeting.mp3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transcriptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-transcribe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# or "gpt-4o-mini-transcribe" (cheaper)
&lt;/span&gt;        &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;transcript&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transcript.text.delta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transcript.text.done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;    &lt;span class="c1"&gt;# final newline
&lt;/span&gt;    &lt;span class="n"&gt;full_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Async works identically with &lt;code&gt;AsyncOpenAI&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncOpenAI&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transcribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncOpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transcriptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-transcribe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transcript.text.delta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;transcribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meeting.mp3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
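&lt;p&gt;One thing the async version makes easy is working through several files at once. Here is a minimal sketch of that idea, assuming a coroutine shaped like &lt;code&gt;transcribe(path)&lt;/code&gt; above. The &lt;code&gt;transcribe_many&lt;/code&gt; helper, the &lt;code&gt;max_concurrent&lt;/code&gt; limit, and the &lt;code&gt;fake_transcribe&lt;/code&gt; stand-in are all hypothetical names for illustration; the stand-in only exists so the snippet runs without an API key.&lt;/p&gt;

```python
import asyncio
from typing import Awaitable, Callable, List


async def transcribe_many(
    paths: List[str],
    transcribe: Callable[[str], Awaitable[str]],
    max_concurrent: int = 3,
) -> List[str]:
    """Run transcribe(path) for every path, at most max_concurrent at a time."""
    gate = asyncio.Semaphore(max_concurrent)

    async def worker(path: str) -> str:
        # The semaphore caps how many requests are in flight at once.
        async with gate:
            return await transcribe(path)

    # gather() returns results in the same order as the input paths.
    return await asyncio.gather(*(worker(p) for p in paths))


async def fake_transcribe(path: str) -> str:
    """Hypothetical stand-in for the real coroutine so this runs offline."""
    await asyncio.sleep(0.01)
    return f"transcript of {path}"


print(asyncio.run(transcribe_many(["a.mp3", "b.mp3"], fake_transcribe)))
```

&lt;p&gt;Swap &lt;code&gt;fake_transcribe&lt;/code&gt; for the real coroutine and pick a concurrency limit that suits your own rate limits; the value 3 here is arbitrary.&lt;/p&gt;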



&lt;h3&gt;
  
  
  Why I default to this for "finished file" use cases
&lt;/h3&gt;

&lt;p&gt;Three practical reasons beyond the streaming:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Accuracy&lt;/strong&gt;. On the English + mixed-language audio I've benchmarked, &lt;code&gt;gpt-4o-transcribe&lt;/code&gt; is noticeably better than &lt;code&gt;whisper-1&lt;/code&gt; at speaker changes, acronyms, and technical vocabulary. &lt;code&gt;gpt-4o-mini-transcribe&lt;/code&gt; is a small step down in quality from that but still beats &lt;code&gt;whisper-1&lt;/code&gt; in my tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency perception&lt;/strong&gt;. Even though the &lt;em&gt;total&lt;/em&gt; time is similar, partial transcripts streaming into your UI feel much faster to users. A 3-minute audio file that takes 20 seconds to transcribe feels instant if the first words show up after ~500ms; it feels sluggish if you wait the full 20 seconds for the whole blob.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Same file-upload ergonomics as Whisper&lt;/strong&gt;. Swapping &lt;code&gt;model="whisper-1"&lt;/code&gt; for &lt;code&gt;model="gpt-4o-transcribe"&lt;/code&gt; + adding &lt;code&gt;stream=True&lt;/code&gt; is almost a drop-in change, so migrating an existing pipeline is a 5-minute job, not a rewrite.&lt;/li&gt;
&lt;/ul&gt;
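&lt;p&gt;If you want to put numbers on the latency point, you can time the first delta separately from the whole stream. This is a rough sketch rather than a benchmark harness: &lt;code&gt;measure_stream&lt;/code&gt; is a hypothetical helper, and &lt;code&gt;fake_stream&lt;/code&gt; is a made-up stand-in that mimics the &lt;code&gt;type&lt;/code&gt; and &lt;code&gt;delta&lt;/code&gt; fields used earlier so the snippet runs offline.&lt;/p&gt;

```python
import time
from types import SimpleNamespace
from typing import Iterable, Optional, Tuple


def measure_stream(events: Iterable) -> Tuple[str, Optional[float], float]:
    """Return (full_text, seconds_to_first_delta, total_seconds) for a stream."""
    start = time.perf_counter()
    first_delta_at = None
    parts = []
    for event in events:
        if event.type == "transcript.text.delta":
            if first_delta_at is None:
                # This is the moment a user would see the first words.
                first_delta_at = time.perf_counter() - start
            parts.append(event.delta)
    total = time.perf_counter() - start
    return "".join(parts), first_delta_at, total


def fake_stream():
    """Hypothetical stand-in for the API stream so the sketch runs offline."""
    for piece in ["Hello", " ", "world"]:
        time.sleep(0.01)
        yield SimpleNamespace(type="transcript.text.delta", delta=piece)
    yield SimpleNamespace(type="transcript.text.done", text="Hello world")


text, first, total = measure_stream(fake_stream())
print(f"first words after {first:.3f}s, full transcript after {total:.3f}s")
```

&lt;p&gt;Pass the real &lt;code&gt;stream&lt;/code&gt; object in place of &lt;code&gt;fake_stream()&lt;/code&gt; to compare time-to-first-words against total time on your own audio.&lt;/p&gt;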

&lt;h3&gt;
  
  
  What you give up
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No word-level timestamps&lt;/strong&gt; (yet, at time of writing). &lt;code&gt;whisper-1&lt;/code&gt; with &lt;code&gt;response_format="verbose_json"&lt;/code&gt; and &lt;code&gt;timestamp_granularities=["word"]&lt;/code&gt; still wins if you need precise word-level timing for subtitle alignment. If that's your use case, stay on &lt;code&gt;whisper-1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No speaker diarization in either&lt;/strong&gt;. If you need "who said what", both of these need to be paired with a separate diarization step (pyannote is the usual pick).&lt;/li&gt;
&lt;/ul&gt;
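&lt;p&gt;Those trade-offs boil down to a small rule you could encode directly. The sketch below is only an illustration of the points in this section: &lt;code&gt;transcription_request&lt;/code&gt; and its &lt;code&gt;cheaper&lt;/code&gt; flag are hypothetical names, and the returned options are meant to be passed to &lt;code&gt;client.audio.transcriptions.create(file=f, **options)&lt;/code&gt;.&lt;/p&gt;

```python
from typing import Any, Dict


def transcription_request(need_word_timestamps: bool, cheaper: bool = False) -> Dict[str, Any]:
    """Pick a model and request options from the trade-offs described above."""
    if need_word_timestamps:
        # Word-level timing is the one case where the advice is to stay on whisper-1.
        return {
            "model": "whisper-1",
            "response_format": "verbose_json",
            "timestamp_granularities": ["word"],
        }
    # Otherwise prefer the newer models and stream partial transcripts.
    return {
        "model": "gpt-4o-mini-transcribe" if cheaper else "gpt-4o-transcribe",
        "response_format": "text",
        "stream": True,
    }


print(transcription_request(need_word_timestamps=True)["model"])
print(transcription_request(need_word_timestamps=False, cheaper=True)["model"])
```

&lt;p&gt;Diarization is deliberately left out of the rule, since it needs a separate step whichever model you choose.&lt;/p&gt;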

&lt;h2&gt;
  
  
  Option 2: Realtime API for true live audio
&lt;/h2&gt;

&lt;p&gt;If you're transcribing &lt;em&gt;live&lt;/em&gt; audio — a microphone, a phone call, a meeting as it happens — you want the Realtime API, not a file upload. It's a WebSocket connection you push PCM16 chunks into, and you get back &lt;code&gt;conversation.item.input_audio_transcription.delta&lt;/code&gt; events every ~200–500ms.&lt;br&gt;
&lt;/p&gt;
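&lt;p&gt;Before wiring up a microphone, it helps to know how big each chunk actually is. The arithmetic below is a small sketch that assumes 24 kHz mono PCM16 and 50 ms chunks; the helper names are hypothetical. Audio sent over the Realtime WebSocket travels as base64 text inside a JSON event, so the encoded size is shown too.&lt;/p&gt;

```python
import base64

SAMPLE_RATE = 24_000   # Hz, mono (assumed for this sketch)
CHUNK_MS = 50          # milliseconds of audio per chunk (assumed)
BYTES_PER_SAMPLE = 2   # PCM16 means 16-bit signed integers


def frames_per_chunk(sample_rate: int, chunk_ms: int) -> int:
    """Number of audio frames in one chunk of the given duration."""
    return sample_rate * chunk_ms // 1000


def encode_chunk(pcm16: bytes) -> str:
    """Base64-encode raw PCM16 bytes so they can travel inside a JSON event."""
    return base64.b64encode(pcm16).decode("ascii")


frames = frames_per_chunk(SAMPLE_RATE, CHUNK_MS)
chunk = bytes(frames * BYTES_PER_SAMPLE)   # one chunk of silence
payload = encode_chunk(chunk)
print(frames, len(chunk), len(payload))    # prints: 1200 2400 3200
```

&lt;p&gt;So each 50 ms chunk is 1,200 frames, 2,400 raw bytes, and 3,200 characters once base64-encoded, which works out to roughly 64 KB of text per second of audio.&lt;/p&gt;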

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sounddevice&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncOpenAI&lt;/span&gt;

&lt;span class="n"&gt;SAMPLE_RATE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24_000&lt;/span&gt;
&lt;span class="n"&gt;CHUNK_MS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;   &lt;span class="c1"&gt;# 50ms chunks
&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;live_transcribe&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncOpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;realtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-realtime-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Configure for transcription-only (no model replies, no TTS)
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;modalities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_audio_format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pcm16&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_audio_transcription&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-transcribe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;turn_detection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;server_vad&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;   &lt;span class="c1"&gt;# let OpenAI handle silence detection
&lt;/span&gt;        &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Start streaming audio in the background
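&lt;/span&gt;        &lt;span class="n"&gt;audio_queue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;loop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_running_loop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_audio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;indata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time_info&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="c1"&gt;# sounddevice calls this on its own audio thread, and asyncio.Queue is not
&lt;/span&gt;            &lt;span class="c1"&gt;# thread-safe, so hand each chunk to the event loop instead of enqueuing directly.
&lt;/span&gt;            &lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call_soon_threadsafe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;put_nowait&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;indata&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;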

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;sd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RawInputStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;samplerate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SAMPLE_RATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;blocksize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SAMPLE_RATE&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;CHUNK_MS&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;channels&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;int16&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;on_audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;sender&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;_send_audio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio_queue&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;conversation.item.input_audio_transcription.delta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;conversation.item.input_audio_transcription.completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[final: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;sender&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_send_audio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;input_audio_buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ascii&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;live_transcribe&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things worth knowing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PCM16 at 24 kHz is the expected format&lt;/strong&gt;. If you're capturing at a different sample rate, resample &lt;em&gt;before&lt;/em&gt; sending — the server won't resample for you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let server-side VAD handle turn detection&lt;/strong&gt; (&lt;code&gt;turn_detection: {type: "server_vad"}&lt;/code&gt;) unless you have a specific reason to do it client-side. OpenAI's VAD is well-tuned and keeps your client code simple.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The &lt;code&gt;conversation.item.input_audio_transcription.delta&lt;/code&gt; events are your partial captions&lt;/strong&gt;; &lt;code&gt;conversation.item.input_audio_transcription.completed&lt;/code&gt; fires when the user finishes a "turn" (i.e. stops talking for ~500ms). Use the deltas to drive your live caption UI, and the completed event to commit a finalised sentence to your transcript log.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You can also use the Realtime API for voice-to-voice&lt;/strong&gt; (audio in, audio out) by adding &lt;code&gt;"audio"&lt;/code&gt; to &lt;code&gt;modalities&lt;/code&gt; and setting a TTS voice. The transcription deltas still fire, so you get the transcript "for free" even in a full voice-assistant setup.&lt;/li&gt;
&lt;/ul&gt;
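
&lt;p&gt;As a rough illustration of the first point, here is a minimal sketch of resampling mono PCM16 to 24 kHz before sending. The helper name &lt;code&gt;resample_pcm16&lt;/code&gt; is hypothetical, and plain linear interpolation is an assumption chosen to keep the example dependency-free. It has no anti-aliasing filter, so a proper DSP library is probably the better choice when downsampling in production.&lt;/p&gt;

```python
from array import array

TARGET_RATE = 24_000  # the sample rate the Realtime API expects, per the note above


def resample_pcm16(pcm, src_rate, dst_rate=TARGET_RATE):
    """Linearly resample mono 16-bit PCM bytes (native byte order) to dst_rate.

    Hypothetical helper for illustration only: linear interpolation is crude
    and applies no low-pass filter.
    """
    if src_rate == dst_rate:
        return pcm

    src = array("h")       # "h" = signed 16-bit samples
    src.frombytes(pcm)
    if len(src) == 0:
        return b""

    n_out = max(1, round(len(src) * dst_rate / src_rate))
    last = len(src) - 1
    out = array("h")
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # fractional position in the source
        lo = min(int(pos), last)           # sample at or before pos
        hi = min(lo + 1, last)             # next sample, clamped at the end
        frac = pos - lo
        value = src[lo] + (src[hi] - src[lo]) * frac
        out.append(int(round(value)))
    return out.tobytes()
```

&lt;p&gt;In the capture loop you would call something like &lt;code&gt;resample_pcm16(chunk, 16_000)&lt;/code&gt; on each chunk before base64-encoding it, assuming the microphone was opened at 16 kHz.&lt;/p&gt;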

&lt;h3&gt;
  
  
  Latency is the killer feature
&lt;/h3&gt;

&lt;p&gt;In my testing, the partial-transcript latency on Realtime is 200–400ms from end-of-phoneme to delta, which is what you need for live captions to feel responsive. File-based &lt;code&gt;gpt-4o-transcribe&lt;/code&gt; with streaming still has to wait for the uploaded audio to arrive on the server before it can start, so the first delta on a file upload lands ~1–2s in. That is fine for "uploaded recording" UX, but too slow for "live."&lt;/p&gt;

&lt;h2&gt;
  
  
  Option 3: Stay on &lt;code&gt;whisper-1&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;If:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You genuinely don't care about streaming (batch transcription of a recorded file where the UX is "upload → come back in a minute for the result").&lt;/li&gt;
&lt;li&gt;You need word-level timestamps for subtitle alignment.&lt;/li&gt;
&lt;li&gt;You're cost-optimising hard and the 50% discount of &lt;code&gt;whisper-1&lt;/code&gt; over &lt;code&gt;gpt-4o-mini-transcribe&lt;/code&gt; matters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;... then &lt;code&gt;whisper-1&lt;/code&gt; is still the right call, and probably will be for a while. It's not going anywhere, it's cheap, it's stable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The decision rule
&lt;/h2&gt;

&lt;p&gt;Written out as an actual rule I use:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"Is the user staring at the UI waiting for the transcript?"&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No&lt;/strong&gt; (batch job, background processing, subtitle generation) → &lt;code&gt;whisper-1&lt;/code&gt; if you need timestamps, &lt;code&gt;gpt-4o-mini-transcribe&lt;/code&gt; otherwise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Yes, and the audio is a finished file&lt;/strong&gt; → &lt;code&gt;gpt-4o-transcribe&lt;/code&gt; or &lt;code&gt;gpt-4o-mini-transcribe&lt;/code&gt; with &lt;code&gt;stream=True&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Yes, and the audio is live&lt;/strong&gt; (mic, phone, meeting) → Realtime API with &lt;code&gt;input_audio_transcription&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
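
&lt;p&gt;The same rule can be sketched as a small function. The function name, parameter names, and returned labels below are hypothetical; only the model names and the branching come from the rule above.&lt;/p&gt;

```python
def pick_transcription_path(user_is_waiting, audio_is_live=False, need_timestamps=False):
    """Encode the decision rule above as code.

    Returns a short label for illustration, not an API identifier you can
    pass to the SDK as-is.
    """
    if not user_is_waiting:
        # Batch job, background processing, subtitle generation.
        return "whisper-1" if need_timestamps else "gpt-4o-mini-transcribe"
    if audio_is_live:
        # Mic, phone, meeting.
        return "realtime-api + input_audio_transcription"
    # The user is waiting, but the audio is a finished file.
    return "gpt-4o-transcribe (stream=True)"
```

&lt;p&gt;For example, &lt;code&gt;pick_transcription_path(False, need_timestamps=True)&lt;/code&gt; lands on &lt;code&gt;whisper-1&lt;/code&gt;, matching the first bullet.&lt;/p&gt;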

&lt;p&gt;There's one additional axis worth flagging: &lt;strong&gt;language coverage&lt;/strong&gt;. &lt;code&gt;whisper-1&lt;/code&gt; still has the broadest language support (it was trained on 98 languages). &lt;code&gt;gpt-4o-transcribe&lt;/code&gt; is very good on the major languages but gets noticeably worse as you head into the long tail. If you're transcribing Swahili, Bengali, or any other non-top-20 language, benchmark both on a sample before picking — don't assume the newer model is always better.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common pitfalls
&lt;/h2&gt;

&lt;p&gt;A few things that cost me hours the first time:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The &lt;code&gt;file&lt;/code&gt; parameter for &lt;code&gt;audio.transcriptions.create&lt;/code&gt; wants a file-&lt;em&gt;like&lt;/em&gt; object, not bytes.&lt;/strong&gt; If you have raw bytes in memory (e.g. from an upload handler), wrap them in &lt;code&gt;io.BytesIO&lt;/code&gt; and set a &lt;code&gt;.name&lt;/code&gt; attribute ending in the right extension: the SDK uses the filename to infer &lt;code&gt;Content-Type&lt;/code&gt;.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BytesIO&lt;/span&gt;
   &lt;span class="n"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BytesIO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recording.mp3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;    &lt;span class="c1"&gt;# ← critical
&lt;/span&gt;   &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transcriptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-transcribe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Realtime API auth uses the same key as the rest of OpenAI&lt;/strong&gt;, but the connection is authenticated at the WebSocket handshake. Your API key briefly appears in the &lt;code&gt;Authorization&lt;/code&gt; header of the initial HTTP upgrade request, which is fine server-side — but if you're building a browser client, you need to proxy the handshake through your backend so the key never touches the client. OpenAI has a "client secret" flow for this.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The event types are stable but there are a lot of them.&lt;/strong&gt; The Realtime API emits ~20 distinct event types; if you find yourself writing a giant &lt;code&gt;if/elif&lt;/code&gt; chain, factor it into a dispatch dict indexed by &lt;code&gt;event.type&lt;/code&gt; early — much easier to extend later.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Silence doesn't count as transcription.&lt;/strong&gt; If your audio has a lot of pauses, you'll see &lt;code&gt;input_audio_buffer.speech_stopped&lt;/code&gt; events but no transcription deltas for the silent parts. That's expected; don't treat it as a bug.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
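
&lt;p&gt;A minimal sketch of the dispatch-dict idea from the event-types pitfall. The handler names and the &lt;code&gt;partial_caption&lt;/code&gt; / &lt;code&gt;transcript_log&lt;/code&gt; lists are hypothetical; the two event type strings are the ones used earlier in this post, and the event object is assumed to expose &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;delta&lt;/code&gt;, and &lt;code&gt;transcript&lt;/code&gt; attributes as in the loop above.&lt;/p&gt;

```python
partial_caption = []   # pieces of the caption currently being spoken
transcript_log = []    # finalised sentences


def on_delta(event):
    # Partial caption text: append it to whatever is on screen.
    partial_caption.append(event.delta)


def on_completed(event):
    # End of a turn: commit the final sentence and reset the live caption.
    transcript_log.append(event.transcript)
    partial_caption.clear()


# One entry per event type you care about; extend this as the app grows.
HANDLERS = {
    "conversation.item.input_audio_transcription.delta": on_delta,
    "conversation.item.input_audio_transcription.completed": on_completed,
}


def dispatch(event):
    """Route an event to its handler; return False for types we ignore."""
    handler = HANDLERS.get(event.type)
    if handler is None:
        return False
    handler(event)
    return True
```

&lt;p&gt;The receive loop then shrinks to a single &lt;code&gt;dispatch(event)&lt;/code&gt; call per event, and supporting a new event type means adding one function and one dict entry.&lt;/p&gt;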

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://platform.openai.com/docs/guides/speech-to-text" rel="noopener noreferrer"&gt;Speech-to-Text guide&lt;/a&gt; — covers &lt;code&gt;gpt-4o-transcribe&lt;/code&gt; and &lt;code&gt;whisper-1&lt;/code&gt; together&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://platform.openai.com/docs/guides/realtime" rel="noopener noreferrer"&gt;Realtime API guide&lt;/a&gt; — the overview&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/openai/openai-python#realtime-api-beta" rel="noopener noreferrer"&gt;openai-python SDK Realtime docs&lt;/a&gt; — the Python specifics&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/openai/openai-python/discussions/2306" rel="noopener noreferrer"&gt;openai/openai-python#2306&lt;/a&gt; — the discussion that prompted this writeup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building something non-trivial with streaming STT (especially multi-speaker scenarios, code-switching between languages, or very noisy audio), leave a comment. I've been collecting notes on which approach wins in each setting.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>openai</category>
      <category>speech</category>
    </item>
    <item>
      <title>Next.js 16: Revalidating Per-User Dynamic Fetches on Demand (3 Patterns That Actually Work)</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Fri, 24 Apr 2026 19:16:18 +0000</pubDate>
      <link>https://dev.to/tahosin/nextjs-16-revalidating-per-user-dynamic-fetches-on-demand-3-patterns-that-actually-work-2a8a</link>
      <guid>https://dev.to/tahosin/nextjs-16-revalidating-per-user-dynamic-fetches-on-demand-3-patterns-that-actually-work-2a8a</guid>
      <description>&lt;p&gt;If you've ever tried to revalidate a user-scoped fetch in Next.js App Router and watched &lt;code&gt;revalidateTag('...')&lt;/code&gt; silently do nothing, you've run into one of the subtler gotchas of the 16.x data cache. The short version:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Once a &lt;code&gt;fetch&lt;/code&gt; reads from &lt;code&gt;cookies()&lt;/code&gt; or &lt;code&gt;headers()&lt;/code&gt;, Next marks it as Dynamic and bypasses the data cache entirely — so &lt;code&gt;next: { tags: [...] }&lt;/code&gt; is silently ignored, and your tag-based revalidation has nothing to invalidate.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This bites hardest on auth-gated dashboards: every fetch forwards the session cookie to your backend, so every fetch is Dynamic, so none of them are cached, so &lt;code&gt;revalidateTag&lt;/code&gt; is a no-op. You end up writing action handlers that "revalidate everything" with an empty tag key — and that actually does work, but it's a sledgehammer that obliterates cross-user cache isolation you didn't know you wanted.&lt;/p&gt;

&lt;p&gt;I ran into this last week while helping someone in &lt;a href="https://github.com/vercel/next.js/discussions/92829" rel="noopener noreferrer"&gt;vercel/next.js#92829&lt;/a&gt;, and realised I've been using three distinct patterns depending on the data shape. Writing them up here because the docs don't connect the dots between the Dynamic IO model and the "per-user revalidation" use case.&lt;/p&gt;

&lt;p&gt;All examples target &lt;strong&gt;Next.js 16.1+&lt;/strong&gt;. I'll note where 16.0 and earlier diverge.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern you probably tried first (and why it fails)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/lib/api.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;cookies&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/headers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fetchAPI&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cookieStore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;cookies&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.example.com/dashboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;Cookie&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cookieStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dashboard-data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;   &lt;span class="c1"&gt;// ← this is silently ignored&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in a server action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;revalidateTag&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/cache&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;refreshDashboard&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;revalidateTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dashboard-data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// ← nothing to invalidate; cache was never populated&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fetch is considered Dynamic because &lt;code&gt;cookies()&lt;/code&gt; is read in the same request scope that the &lt;code&gt;fetch&lt;/code&gt; runs in (inside the function body, not at module scope). Dynamic fetches skip the data cache entirely: they're not cached per-user, they're not cached at all. &lt;code&gt;next.tags&lt;/code&gt; is only consulted when something actually enters the cache, so the tag never gets associated with any cache entry.&lt;/p&gt;

&lt;p&gt;Your three options are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Opt back in to caching with an explicit key (&lt;code&gt;unstable_cache&lt;/code&gt; or &lt;code&gt;'use cache'&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Accept it's dynamic, use &lt;code&gt;React.cache&lt;/code&gt; for same-request dedupe, and &lt;code&gt;revalidatePath&lt;/code&gt; for rerenders&lt;/li&gt;
&lt;li&gt;Route the data through a Route Handler that &lt;em&gt;does&lt;/em&gt; cache, and call it from the Server Component&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's walk through each.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 1: &lt;code&gt;unstable_cache&lt;/code&gt; with the cookie as a key part
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;unstable_cache&lt;/code&gt; builds its cache key from the key parts you pass it (plus the cached function's arguments), not from ambient request state such as cookies. So you read the cookie &lt;em&gt;outside&lt;/em&gt; the cached function and pass it in as a key part:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/lib/api.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;unstable_cache&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/cache&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;cookies&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/headers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createHash&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;node:crypto&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sessionHash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cookie&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="nf"&gt;createHash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sha256&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cookie&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fetchAPIForUser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionCookie&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="nf"&gt;unstable_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.example.com/dashboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;Cookie&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;sessionCookie&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`API &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;// Cache key parts — different sessions get different cache entries&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fetchAPI&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sessionCookie&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dashboard-data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;`dashboard-data:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;sessionHash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionCookie&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;revalidate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;)();&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fetchAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sessionCookie&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;cookies&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fetchAPIForUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionCookie&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things are doing work here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The cookie is a key part&lt;/strong&gt;, so every user ends up with their own cache entry. User A's &lt;code&gt;revalidateTag&lt;/code&gt; doesn't nuke User B's data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The tags list has both a global &lt;code&gt;dashboard-data&lt;/code&gt; and a per-user &lt;code&gt;dashboard-data:&amp;lt;hash&amp;gt;&lt;/code&gt;&lt;/strong&gt;. This gives you granular control: revalidate one user's data after they mutate something, or nuke everyone's when a global config changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then your server action becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;revalidateTag&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/cache&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;cookies&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/headers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createHash&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;node:crypto&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sessionHash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="nf"&gt;createHash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sha256&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;refreshMyDashboard&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cookie&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;cookies&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nf"&gt;revalidateTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`dashboard-data:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;sessionHash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cookie&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// just me&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;refreshEveryonesDashboard&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;revalidateTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dashboard-data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// global flush&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use this&lt;/strong&gt;: user-scoped data that's expensive to fetch and read more than once per session — dashboards, settings pages, user-specific feeds. You get the latency win of caching &lt;em&gt;and&lt;/em&gt; tag-based revalidation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha&lt;/strong&gt;: don't accidentally cache PII in a way that survives the user's logout. The per-user tag + a reasonable &lt;code&gt;revalidate&lt;/code&gt; ceiling (60s–5min) keeps the blast radius sane.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 2: &lt;code&gt;'use cache'&lt;/code&gt; directive (the modern shape)
&lt;/h2&gt;

&lt;p&gt;If you're on 16.1+ with &lt;code&gt;experimental.dynamicIO&lt;/code&gt; enabled, &lt;code&gt;'use cache'&lt;/code&gt; is the newer, less verbose form — same idea, less ceremony:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/lib/api.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;cookies&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/headers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
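&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;cacheTag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;cacheLife&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/cache&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createHash&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;node:crypto&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Same helper as in the server action above: a short, stable id for the per-user tag&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sessionHash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="nf"&gt;createHash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sha256&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;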

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fetchAPIForUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionCookie&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;use cache&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nf"&gt;cacheLife&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;minutes&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;cacheTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dashboard-data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`dashboard-data:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;sessionHash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionCookie&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.example.com/dashboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;Cookie&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;sessionCookie&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fetchAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sessionCookie&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;cookies&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fetchAPIForUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionCookie&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// same pattern — read cookie outside&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;cacheTag&lt;/code&gt; / &lt;code&gt;cacheLife&lt;/code&gt; from &lt;code&gt;next/cache&lt;/code&gt; are the equivalents of the &lt;code&gt;unstable_cache&lt;/code&gt; options, and the function's arguments become the cache key automatically.&lt;/p&gt;

&lt;p&gt;The key discipline, &lt;strong&gt;read &lt;code&gt;cookies()&lt;/code&gt; outside the cached function and pass the result in as an argument&lt;/strong&gt;, is identical to Pattern 1. The framework still can't see what &lt;code&gt;cookies()&lt;/code&gt; would return from inside a cached region; it just sees a function that takes a string and caches by that string.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enable it in &lt;code&gt;next.config.ts&lt;/code&gt;&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;NextConfig&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NextConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;experimental&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;dynamicIO&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;useCache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



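&lt;p&gt;Check the changelog for your exact version before copying this config, because the flag names have moved between releases. &lt;code&gt;experimental.dynamicIO&lt;/code&gt; is the older name, and the Next.js 16 release notes list it as renamed to a top-level &lt;code&gt;cacheComponents&lt;/code&gt; option.&lt;/p&gt;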

&lt;h2&gt;
  
  
  Pattern 3: Accept the dynamic, dedupe with &lt;code&gt;React.cache&lt;/code&gt;, refresh with &lt;code&gt;revalidatePath&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Sometimes the data just isn't cacheable — it changes every request, or it's cheap enough that caching adds latency instead of removing it. In that case, don't fight the framework; work with it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/lib/api.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;cache&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;cookies&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/headers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fetchAPI&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sessionCookie&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;cookies&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.example.com/dashboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;Cookie&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;sessionCookie&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;React.cache&lt;/code&gt; dedupes the fetch across components &lt;em&gt;within the same request&lt;/em&gt;, so if five Server Components call &lt;code&gt;fetchAPI()&lt;/code&gt; during one render, you still only hit the backend once. Different requests get fresh data — exactly what you want for per-user live data.&lt;/p&gt;

&lt;p&gt;Then your server action rerenders the page instead of revalidating a cache entry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;revalidatePath&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/cache&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;refreshDashboard&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;revalidatePath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/dashboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// forces re-render, which re-runs fetchAPI&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use this&lt;/strong&gt;: user-scoped data that's small, cheap, or genuinely fresh-per-request. Most dashboards I've built fall here — the latency of a direct backend call is dominated by network anyway, and skipping the cache layer saves you from a whole class of staleness bugs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision rule I actually use
&lt;/h2&gt;

&lt;p&gt;After writing a few of these, this is the rule I apply:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;"The data is user-scoped, expensive, and reads dominate writes"&lt;/strong&gt; → &lt;strong&gt;Pattern 1 or 2&lt;/strong&gt; with per-user tags. The 5× latency win on cache hits usually justifies the complexity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;"The data is user-scoped, cheap, and reads roughly equal writes"&lt;/strong&gt; → &lt;strong&gt;Pattern 3&lt;/strong&gt;. Don't cache; dedupe per-request, rerender on mutation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;"The data is global but personalised at the margin (e.g. reading a session cookie only for feature flags)"&lt;/strong&gt; → &lt;strong&gt;Pattern 1 with a single tag&lt;/strong&gt;, no per-user keying. Feature flag data is worth caching even though it reads a cookie.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;"I need real-time-ish data (&amp;lt; 30s)"&lt;/strong&gt; → &lt;strong&gt;Pattern 3&lt;/strong&gt; + poll-on-client with React Query / SWR. Caching on the server layer just pushes the staleness problem around.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
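
&lt;p&gt;As a rough sketch, the same rule can be written as a small function. The &lt;code&gt;DataProfile&lt;/code&gt; shape and the &lt;code&gt;choosePattern&lt;/code&gt; name are made up for illustration, and the order of the checks (real-time first, leftovers falling through to Pattern 3) is my assumption; only the four branches come from the list above.&lt;/p&gt;

```typescript
// Hypothetical helper: encodes the four-branch rule from the list above.
// DataProfile and choosePattern are illustrative names, not a Next.js API.
type Pattern =
  | 'pattern-1-or-2-per-user-tags'
  | 'pattern-1-single-tag'
  | 'pattern-3'
  | 'pattern-3-plus-client-polling';

interface DataProfile {
  userScoped: boolean;    // does the response differ per user?
  expensive: boolean;     // is the backend call slow or costly?
  readsDominate: boolean; // many reads for every write?
  needsRealtime: boolean; // must be fresher than roughly 30 seconds?
}

function choosePattern(p: DataProfile): Pattern {
  // Rule 4: anything that must be near real-time skips the server cache.
  if (p.needsRealtime) return 'pattern-3-plus-client-polling';
  // Rule 3: global data that only glances at a cookie gets one shared tag.
  if (!p.userScoped) return 'pattern-1-single-tag';
  // Rule 1: user-scoped, expensive, read-heavy data earns per-user tags.
  if (p.expensive) {
    if (p.readsDominate) return 'pattern-1-or-2-per-user-tags';
  }
  // Rule 2 (and anything left over): do not cache, dedupe per request.
  return 'pattern-3';
}
```

&lt;p&gt;Writing it down this way is mostly useful in code review: if nobody can fill in the four booleans for a fetch, that is a sign its caching story hasn't been thought through yet.&lt;/p&gt;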

&lt;h2&gt;
  
  
  The sledgehammer (and why to avoid it)
&lt;/h2&gt;

&lt;p&gt;You can make the original code "work" by calling &lt;code&gt;revalidateTag('')&lt;/code&gt; on every mutation — it nukes every tagged entry in the cache, and your Dynamic fetch also re-runs because the page gets marked for revalidation. I've seen this in production a few times and every time it caused an incident later:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One user's mutation invalidates every other user's cache → thundering herd on the backend&lt;/li&gt;
&lt;li&gt;Global feature flags that &lt;em&gt;were&lt;/em&gt; cacheable get flushed on every user action → effective cache hit rate drops to ~0%&lt;/li&gt;
&lt;li&gt;Debugging becomes impossible because "why did User A see stale data?" has no local explanation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Per-user tags (Pattern 1 / 2) or per-request React.cache (Pattern 3) are both strictly better. Pick one, be consistent within a feature area, and document which pattern a given fetch is using.&lt;/p&gt;

&lt;h2&gt;
  
  
  A word on the mental model
&lt;/h2&gt;

&lt;p&gt;The thing that clicked for me about the 16.x Dynamic IO model: the data cache is fundamentally a global key-value store keyed by URL + options hash. When your fetch reads something request-scoped (cookies, headers, searchParams), the cache layer has no good default for "who does this entry belong to?", so it bails out entirely rather than silently caching PII across users.&lt;/p&gt;

&lt;p&gt;You opt back in by &lt;strong&gt;making the user-scoping explicit&lt;/strong&gt; (passing the cookie as a key part), which moves the security decision into your code where you can reason about it. That's the same tradeoff React Server Components made around &lt;code&gt;'use server'&lt;/code&gt; — the framework refuses to guess, and gives you a small API to tell it exactly what you mean.&lt;/p&gt;

&lt;p&gt;Once I started thinking of &lt;code&gt;unstable_cache&lt;/code&gt; / &lt;code&gt;'use cache'&lt;/code&gt; as "declare your cache key explicitly, include whatever request-scoped stuff you want to partition on", the rest of the API fell into place.&lt;/p&gt;
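
&lt;p&gt;Here is a toy in-memory model of that idea. It assumes nothing about how Next.js actually stores entries: &lt;code&gt;TinyTagCache&lt;/code&gt; and &lt;code&gt;loadDashboard&lt;/code&gt; are hypothetical names, and the only things borrowed from the patterns above are the session hash in the key and the two tag granularities.&lt;/p&gt;

```typescript
import { createHash } from 'node:crypto';

// Toy model only: TinyTagCache is a hypothetical class, not the Next.js data cache.
interface Entry {
  value: string;
  tags: string[];
}

// Short, stable id derived from the cookie string (same shape as sessionHash above).
const sessionHash = (cookie: string): string =>
  createHash('sha256').update(cookie).digest('hex').slice(0, 16);

class TinyTagCache {
  private entries: { [key: string]: Entry } = {};

  // Return the cached value for key, or compute it once and store it with its tags.
  getOrSet(key: string, tags: string[], compute: () => string): string {
    const hit = this.entries[key];
    if (hit !== undefined) return hit.value;
    const value = compute();
    this.entries[key] = { value, tags };
    return value;
  }

  // Drop every entry that carries the given tag.
  revalidateTag(tag: string): void {
    for (const [key, entry] of Object.entries(this.entries)) {
      if (entry.tags.includes(tag)) delete this.entries[key];
    }
  }
}

// The cookie is read by the caller and passed in, so it becomes part of the key.
function loadDashboard(cache: TinyTagCache, cookie: string, fetcher: () => string): string {
  const id = sessionHash(cookie);
  return cache.getOrSet(`dashboard:${id}`, ['dashboard-data', `dashboard-data:${id}`], fetcher);
}
```

&lt;p&gt;Revalidating &lt;code&gt;dashboard-data:&amp;lt;hash&amp;gt;&lt;/code&gt; in this model evicts one user's entry and leaves everyone else's alone, while revalidating plain &lt;code&gt;dashboard-data&lt;/code&gt; evicts them all. That is the whole difference between the two server actions shown earlier.&lt;/p&gt;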

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://nextjs.org/docs/app/building-your-application/caching/data-cache" rel="noopener noreferrer"&gt;Next.js 16 — Data Cache&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://nextjs.org/docs/app/api-reference/functions/unstable_cache" rel="noopener noreferrer"&gt;&lt;code&gt;unstable_cache&lt;/code&gt;&lt;/a&gt; / &lt;a href="https://nextjs.org/docs/app/api-reference/directives/use-cache" rel="noopener noreferrer"&gt;&lt;code&gt;'use cache'&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://nextjs.org/docs/app/api-reference/functions/revalidateTag" rel="noopener noreferrer"&gt;&lt;code&gt;revalidateTag&lt;/code&gt;&lt;/a&gt; / &lt;a href="https://nextjs.org/docs/app/api-reference/functions/revalidatePath" rel="noopener noreferrer"&gt;&lt;code&gt;revalidatePath&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/vercel/next.js/discussions/92829" rel="noopener noreferrer"&gt;The original GitHub discussion&lt;/a&gt; where this writeup started&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're hitting a variation of this problem — say, SSE streams that need to drop their connection on revalidation, or RSC payloads that race with client-side tag invalidations — drop a comment, I've probably tripped on it too.&lt;/p&gt;

</description>
      <category>nextjs</category>
      <category>webdev</category>
      <category>typescript</category>
      <category>react</category>
    </item>
    <item>
      <title>Portal 2 Modding Tools: Community Edition is Here</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Thu, 23 Apr 2026 16:02:25 +0000</pubDate>
      <link>https://dev.to/tahosin/portal-2-modding-tools-community-edition-is-here-4d5m</link>
      <guid>https://dev.to/tahosin/portal-2-modding-tools-community-edition-is-here-4d5m</guid>
      <description>&lt;p&gt;So, Portal 2: Community Edition just dropped into open beta on Steam. It's got enhanced graphics, bigger maps, and a whole new set of modding tools for us to play with.&lt;/p&gt;

&lt;p&gt;My hot take? This isn't just a game update; it's an open-source platform waiting for some serious innovation from the community.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters for Game Developers
&lt;/h2&gt;

&lt;p&gt;Look, if you're a game dev, especially one who's ever tinkered with the Source Engine, this is a big deal. You're getting a fully featured, beloved game, essentially handed over to the community with new hooks. It's a goldmine for learning, experimenting, and even showcasing your skills. Think about the countless games that started as mods, like Counter-Strike or Dota. This isn't just about making new levels; it's about extending gameplay mechanics, building custom assets, and maybe even rewriting parts of the game logic. Valve's done the heavy lifting on the core engine, and now we get to build on top of it. It's free for existing Portal 2 owners, which means a huge potential audience for anything you create. We're talking millions of players, not just a niche group.&lt;/p&gt;

&lt;h2&gt;
  
  
  The technical reality
&lt;/h2&gt;

&lt;p&gt;Modding Portal 2, even with new tools, still means getting cozy with the Source Engine. That often involves C++ for deeper modifications, but the community edition likely streamlines asset creation and scripting. You'll be dealing with Valve's Hammer editor for map creation, but the new tools probably offer more flexibility. Building a simple mod might look something like compiling custom scripts or assets. Let's say you're adding a new puzzle element. You'd likely define its behavior in a script, and then compile it.&lt;/p&gt;

&lt;p&gt;Here's a conceptual shell command you might use to compile a custom game DLL for Source Engine, assuming you've got the SDK set up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
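&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Stop on the first failed command or unset variable, so a broken build is never copied&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail
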
&lt;span class="c"&gt;# Navigate to your mod's source directory&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PORTAL2CE_SDK_PATH&lt;/span&gt;&lt;span class="s2"&gt;/src/my_custom_mod"&lt;/span&gt;

&lt;span class="c"&gt;# Clean previous build artifacts&lt;/span&gt;
make clean

&lt;span class="c"&gt;# Build the game library (e.g., game_shared.dll or game_server.dll)&lt;/span&gt;
&lt;span class="c"&gt;# This assumes a Make-based build system common in older engine SDKs&lt;/span&gt;
&lt;span class="c"&gt;# Modern tools might use CMake or Visual Studio projects.&lt;/span&gt;
make &lt;span class="nt"&gt;-j8&lt;/span&gt;

&lt;span class="c"&gt;# Copy the compiled DLL to the game's bin directory&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="s2"&gt;"./bin/Release/game_server.dll"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PORTAL2CE_GAME_PATH&lt;/span&gt;&lt;span class="s2"&gt;/portal2ce/bin/"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Custom mod DLL compiled and copied!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, if you're just dealing with asset compilation, you'd use the specific tools the SDK provides. In the classic Source SDK, for instance, a new texture is compiled to a VTF file with &lt;code&gt;vtex.exe&lt;/code&gt; and referenced from a plain-text VMT (Valve Material Type) file, models are compiled with &lt;code&gt;studiomdl.exe&lt;/code&gt;, and &lt;code&gt;vpk.exe&lt;/code&gt; packs the results into an archive. It's not always JavaScript, but understanding build pipelines is key.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd actually do today
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Download it:&lt;/strong&gt; Get Portal 2: Community Edition from Steam. It's free if you own the original, so no excuses. Get it installed and run it once.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Explore the SDK:&lt;/strong&gt; Find the new modding tools. There's usually a dedicated SDK folder. Poke around, see what files are there, and check for documentation.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Start Small:&lt;/strong&gt; Don't try to build a new game from scratch. Try changing a texture, moving a prop, or altering a simple script value. The Portal 2 mapping community already has a ton of tutorials.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Join the Community:&lt;/strong&gt; Find their Discord or forums. Other devs will be asking questions and sharing tips. This is where you'll get the real answers.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Look for C++ examples:&lt;/strong&gt; If you're serious, find some existing open-source Source Engine mods. See how they structure their code and handle engine interactions. It's a C++ beast, but a manageable one for small changes.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Gotchas &amp;amp; unknowns
&lt;/h2&gt;

&lt;p&gt;First off, it's a beta. Expect bugs. You might hit crashes, weird physics glitches, or tools that don't quite work as advertised. The documentation might be sparse initially, too. And while it's exciting, remember this is still built on an older engine, even with enhancements. You're not getting Unreal Engine 5 features here. Performance might be an issue with truly massive maps, even with the promised larger map support. Also, how long will Valve (or the community team) actively maintain these new tools? That's always a question with community-led projects. It's a passion project, not a guaranteed long-term support contract.&lt;/p&gt;

&lt;p&gt;What kind of amazing puzzles do you think people will build with these new tools? And what's the first thing you'd try to mod? Let me know in the comments. This could be big, or it could just be a fun distraction, but I'm betting on the former. I'm excited to see what the community comes up with. Maybe a new version of Aperture Science's potato battery?&lt;/p&gt;

</description>
      <category>gaming</category>
      <category>modding</category>
      <category>opensource</category>
      <category>community</category>
    </item>
    <item>
      <title>GitHub Copilot Pauses New Sign-ups: Agentic AI Strains Infrastructure &amp; Scaling Challenges</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Tue, 21 Apr 2026 18:05:46 +0000</pubDate>
      <link>https://dev.to/tahosin/github-copilot-pauses-new-sign-ups-agentic-ai-strains-infrastructure-scaling-challenges-2jo6</link>
      <guid>https://dev.to/tahosin/github-copilot-pauses-new-sign-ups-agentic-ai-strains-infrastructure-scaling-challenges-2jo6</guid>
      <description>&lt;p&gt;I was scrolling through my tech news feed recently when a headline caught my eye: &lt;strong&gt;GitHub has temporarily halted new sign-ups for its Copilot service.&lt;/strong&gt; As a developer who's been keenly observing the rise of AI in our craft, this news immediately struck me as a significant turning point. The reason for the pause? &lt;strong&gt;Infrastructure strain caused by the increasing use of 'agentic AI' features.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This isn't just about more users; it's about a different kind of AI that's pushing the boundaries of what our current tech infrastructure can handle. It highlights the rapid adoption and immense potential of advanced AI coding tools, but also signals the significant scaling challenges we face.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Agentic AI?
&lt;/h2&gt;

&lt;p&gt;First, let's unpack what &lt;strong&gt;agentic AI&lt;/strong&gt; means. Unlike simpler AI models that might complete a single task (like suggesting the next word or line of code), agentic AI refers to AI systems that can autonomously perform complex tasks, often breaking them down into multiple sub-tasks, executing them, and even self-correcting along the way.&lt;/p&gt;

&lt;p&gt;Think of it less as an autocomplete tool and more as a proactive assistant that can understand a higher-level goal and work towards achieving it, potentially interacting with various tools and APIs. This level of autonomy and problem-solving naturally requires significantly more computational resources, as the AI isn't just generating; it's reasoning, planning, and executing.&lt;/p&gt;

&lt;p&gt;Consider a simple analogy: a basic function suggestion might just pull from a library. An agentic AI might analyze your entire project, understand the context, figure out the best approach, generate a multi-step solution, and even write tests for it. This deep engagement and iterative processing are what demand so much from the underlying infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Resource Demands of Advanced AI
&lt;/h2&gt;

&lt;p&gt;To illustrate the difference in resource demands, let's look at a very simplified, conceptual JavaScript example. Imagine a non-agentic function that just gives you a recommendation based on a single input, versus an agentic-like process that needs to iterate, make decisions, and potentially retry.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basic Suggestion (Low Resource Example)
&lt;/h3&gt;

&lt;p&gt;Here's a trivial example of a function that provides a direct suggestion based on a simple input. It's fast and requires minimal computation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;function getSimpleCodeSuggestion(problemType) {
  const suggestions = {
    'performance': 'Consider optimizing loop iterations.',
    'security': 'Sanitize user inputs carefully.',
    'bugfix': 'Check variable scope and type consistency.'
  };
  return suggestions[problemType] || 'No specific suggestion available.';
}

console.log(getSimpleCodeSuggestion('performance')); // Output: Consider optimizing loop iterations.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Agentic-like Process (Higher Resource Example)
&lt;/h3&gt;

&lt;p&gt;Now, let's imagine a conceptual&lt;/p&gt;

</description>
      <category>github</category>
      <category>githubcopilot</category>
      <category>ai</category>
      <category>scaling</category>
    </item>
    <item>
      <title>HOCKS AI: I Open-Sourced a Full AI Platform With Chat, Vision, Video Analysis &amp; Website Generation — Runs at $0/Month</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Tue, 21 Apr 2026 16:17:46 +0000</pubDate>
      <link>https://dev.to/tahosin/hocks-ai-i-open-sourced-a-full-ai-platform-with-chat-vision-video-analysis-website-generation-59hn</link>
      <guid>https://dev.to/tahosin/hocks-ai-i-open-sourced-a-full-ai-platform-with-chat-vision-video-analysis-website-generation-59hn</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I built and open-sourced a production-ready AI platform that combines chat, image analysis, video analysis, and website generation. It uses free models where possible and costs ~$0/month to run. &lt;a href="https://hocks.app" rel="noopener noreferrer"&gt;Live demo&lt;/a&gt; | &lt;a href="https://github.com/x-tahosin/hocks-ai" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;Every AI tool I tried was either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Too expensive&lt;/strong&gt; — GPT-4 API bills adding up fast&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-purpose&lt;/strong&gt; — chat OR image analysis, never both&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Closed source&lt;/strong&gt; — no way to learn from the architecture&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted a &lt;strong&gt;single platform&lt;/strong&gt; that handles multiple AI modalities, uses the best free models available, and is fully open-source so other developers can learn from it.&lt;/p&gt;

&lt;p&gt;The result is &lt;strong&gt;HOCKS AI&lt;/strong&gt; — a multi-modal AI assistant platform.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;Live:&lt;/strong&gt; &lt;a href="https://hocks.app" rel="noopener noreferrer"&gt;hocks.app&lt;/a&gt;&lt;br&gt;
📦 &lt;strong&gt;Source:&lt;/strong&gt; &lt;a href="https://github.com/x-tahosin/hocks-ai" rel="noopener noreferrer"&gt;github.com/x-tahosin/hocks-ai&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  What It Does
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;AI Model&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;💬 &lt;strong&gt;Streaming Chat&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;OpenRouter GPT-OSS-120B (free)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🌐 &lt;strong&gt;Website Generator&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;OpenRouter Nemotron-3 120B (free)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🖼️ &lt;strong&gt;Image Analysis&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Google Gemini 2.0 Flash&lt;/td&gt;
&lt;td&gt;~$0.002/call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🎬 &lt;strong&gt;Video Analysis&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Google Gemini 2.0 Flash&lt;/td&gt;
&lt;td&gt;~$0.003/call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🧠 &lt;strong&gt;Memory System&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Firebase Firestore&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$0&lt;/strong&gt; (free tier)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔐 &lt;strong&gt;Auth + Admin&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Firebase Auth&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Total monthly cost: ~$0–5&lt;/strong&gt; depending on vision API usage.&lt;/p&gt;
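&lt;p&gt;As a rough illustration of where the ~$0–5 range comes from, the per-call figures in the table can be multiplied out. This is only a sketch: the function name and the call volumes are assumptions, and the prices are the approximate per-call costs quoted above, not official rates.&lt;/p&gt;

```javascript
// Approximate per-call prices from the table above, in thousandths of a dollar
// (integers avoid floating-point rounding surprises).
const PRICE_MILLS = { image: 2, video: 3 };

// Hypothetical helper: estimate the monthly vision bill in dollars.
function estimateMonthlyCost(calls) {
  const imageMills = (calls.image || 0) * PRICE_MILLS.image;
  const videoMills = (calls.video || 0) * PRICE_MILLS.video;
  return (imageMills + videoMills) / 1000;
}

// Assumed volumes: 1,000 image calls and 1,000 video calls in a month.
console.log(estimateMonthlyCost({ image: 1000, video: 1000 })); // 5
console.log(estimateMonthlyCost({})); // 0
```

&lt;p&gt;With no vision calls the estimate stays at zero, which matches the free text-only rows in the table.&lt;/p&gt;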


&lt;h2&gt;
  
  
  The Hybrid Model Strategy
&lt;/h2&gt;

&lt;p&gt;This is the key architectural decision. Instead of paying for one expensive model for everything, I split by capability:&lt;/p&gt;
&lt;h3&gt;
  
  
  Free Models for Text Tasks
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Chat + Code Generation → OpenRouter API
├── openai/gpt-oss-120b:free (120B params, conversational)
└── nvidia/nemotron-3-super-120b-a12b:free (code generation)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;These free 120B-parameter models are &lt;strong&gt;genuinely production-quality&lt;/strong&gt; for text tasks. GPT-OSS-120B handles conversational AI well: context tracking, nuanced responses, and multi-turn dialogue. Nemotron-3 excels at code generation and can build full websites from prompts.&lt;/p&gt;
&lt;h3&gt;
  
  
  Paid Models for Vision Tasks
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Image + Video Analysis → Google Gemini 2.0 Flash
├── analyzeImage (~$0.002/call)
└── analyzeVideo (~$0.003/call)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Free models simply can't match Gemini's multimodal capabilities yet. Image understanding, OCR, visual reasoning — Gemini 2.0 Flash delivers production-quality results at extremely low per-call costs.&lt;/p&gt;
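&lt;p&gt;The split described above can be pictured as a small routing table. The sketch below is illustrative rather than the project's actual code: the model IDs are the ones named in this post, while the table shape, the &lt;code&gt;pickModel&lt;/code&gt; name, and the task keys are assumptions.&lt;/p&gt;

```javascript
// Task-to-model routing table. Model IDs come from this post;
// the table layout and function name are illustrative assumptions.
const MODEL_ROUTES = {
  chat: { provider: "openrouter", model: "openai/gpt-oss-120b:free", paid: false },
  website: { provider: "openrouter", model: "nvidia/nemotron-3-super-120b-a12b:free", paid: false },
  image: { provider: "gemini", model: "gemini-2.0-flash", paid: true },
  video: { provider: "gemini", model: "gemini-2.0-flash", paid: true },
};

// Look up the route for a task, failing loudly on unknown task names.
function pickModel(task) {
  if (!Object.hasOwn(MODEL_ROUTES, task)) {
    throw new Error(`Unknown task: ${task}`);
  }
  return MODEL_ROUTES[task];
}

console.log(pickModel("chat").model);
console.log(pickModel("image").paid);
```

&lt;p&gt;Keeping the mapping in one place means a model can be swapped by editing a single entry instead of touching every function that calls it.&lt;/p&gt;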


&lt;h2&gt;
  
  
  Architecture Deep Dive
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────┐
│          Frontend (React 18 + Vite)         │
│         Firebase Hosting / hocks.app        │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│     Firebase Cloud Functions (Node 20)      │
├─────────────────────────────────────────────┤
│  streamChat ────► OpenRouter (GPT-OSS-120B) │
│  generateCode ──► OpenRouter (Nemotron-3)   │
│  analyzeImage ──► Google Gemini 2.0 Flash   │
│  analyzeVideo ──► Google Gemini 2.0 Flash   │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│           Firebase Services                 │
│  • Firestore (users, memories, analytics)   │
│  • Authentication (Google + Email/Pass)     │
│  • Secret Manager (all API keys)            │
│  • Storage (file uploads)                   │
└─────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Key Design Decisions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Zero API Keys in Frontend&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every AI call is proxied through Firebase Cloud Functions. API keys live only in Google Cloud Secret Manager, which Firebase Functions reads through &lt;code&gt;defineSecret&lt;/code&gt;: they are never committed in &lt;code&gt;.env&lt;/code&gt; files, never stored in plain-text config, and never shipped in client code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
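&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Imports assumed from the v2 Functions SDK and the Gemini SDK&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;onCall&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;firebase-functions/v2/https&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;defineSecret&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;firebase-functions/params&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;GoogleGenerativeAI&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@google/generative-ai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Cloud Function reads secret at runtime&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;geminiApiKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defineSecret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;analyzeImage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;onCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;geminiApiKey&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Key is only available server-side, and only inside the handler&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;genAI&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;GoogleGenerativeAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;geminiApiKey&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;value&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;genAI&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getGenerativeModel&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gemini-2.0-flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;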
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. SSE Streaming for Real-Time Chat&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of waiting for the full response, the chat streams tokens as they are generated, using Server-Sent Events:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Server: Stream each chunk from OpenRouter&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;orResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getReader&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`data: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fullText&lt;/span&gt; &lt;span class="p"&gt;})}&lt;/span&gt;&lt;span class="s2"&gt;\n\n`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Client: Render as tokens arrive&lt;/span&gt;
&lt;span class="nx"&gt;eventSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onmessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;updateChatUI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Instant visual feedback&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
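&lt;p&gt;The server excerpt above leaves out how &lt;code&gt;text&lt;/code&gt; and &lt;code&gt;fullText&lt;/code&gt; are produced from each raw chunk. One way to fill that gap, sketched under assumptions: OpenRouter streams OpenAI-style events (&lt;code&gt;data: {...}&lt;/code&gt; lines ending with &lt;code&gt;data: [DONE]&lt;/code&gt;), and a network read can end mid-line, so the parser keeps the unfinished tail for the next read. The function name &lt;code&gt;extractDeltas&lt;/code&gt; and the sample chunks are hypothetical, not taken from the HOCKS AI source.&lt;/p&gt;

```javascript
// Pull complete "data:" lines out of a text buffer and return the token text
// they carry, plus whatever partial line is left over for the next chunk.
function extractDeltas(buffer) {
  const lines = buffer.split("\n");
  const rest = lines.pop(); // last piece may be an incomplete line
  const deltas = [];
  let done = false;
  for (const rawLine of lines) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skips blanks and ":" comments
    const payload = line.slice(5).trim();
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    try {
      const json = JSON.parse(payload);
      const content = json.choices?.[0]?.delta?.content;
      if (content) deltas.push(content);
    } catch {
      // Ignore lines that are not valid JSON in this sketch.
    }
  }
  return { deltas, rest, done };
}

// Simulated network chunks: the second event is split across two reads.
const chunks = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\ndata: {"choi',
  'ces":[{"delta":{"content":"lo"}}]}\n\ndata: [DONE]\n\n',
];

let pending = "";
let fullText = "";
for (const chunk of chunks) {
  const { deltas, rest } = extractDeltas(pending + chunk);
  pending = rest;
  for (const text of deltas) {
    fullText += text;
    // In the Cloud Function this is where res.write(...) would go.
  }
}
console.log(fullText); // Hello
```

&lt;p&gt;In a real handler the chunks would come from a &lt;code&gt;TextDecoder&lt;/code&gt; applied to each &lt;code&gt;reader.read()&lt;/code&gt; result rather than from a hard-coded array.&lt;/p&gt;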



&lt;p&gt;&lt;strong&gt;3. Per-User Memory System&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The AI remembers context across sessions. Users can save memories that persist in Firestore and are injected into every AI conversation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Inject memories into system prompt&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;systemContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;systemContent&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;=== USER'S SAVED MEMORIES ===&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;systemContent&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;. &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\n`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Admin Dashboard with Cost Tracking&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Built-in analytics track every API call in real time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Usage counters per feature (chat, image, video, website)&lt;/li&gt;
&lt;li&gt;Daily cost breakdown with budget alerts&lt;/li&gt;
&lt;li&gt;Feature toggles — disable any AI feature instantly&lt;/li&gt;
&lt;li&gt;Audit logging for all admin actions&lt;/li&gt;
&lt;/ul&gt;
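&lt;p&gt;A minimal sketch of how usage counters, a budget check, and feature toggles can fit together. It uses an in-memory object instead of Firestore, and every name in it (&lt;code&gt;recordCall&lt;/code&gt;, &lt;code&gt;overBudget&lt;/code&gt;, the budget value) is an assumption made for illustration, since the post does not show the dashboard's schema or code.&lt;/p&gt;

```javascript
// In-memory stand-in for the usage and settings data; the post says these
// live in Firestore, whose schema is not shown, so every name here is assumed.
const state = {
  enabled: { chat: true, image: true, video: true, website: true },
  usage: { chat: 0, image: 0, video: 0, website: 0 },
  spentMills: 0, // thousandths of a dollar, kept as an integer
  dailyBudgetMills: 50, // assumed budget of $0.05 per day
};

// Approximate per-call costs quoted earlier in the post.
const COST_MILLS = { chat: 0, website: 0, image: 2, video: 3 };

// Record one call; returns false if the feature is switched off.
function recordCall(feature) {
  if (!state.enabled[feature]) return false;
  state.usage[feature] += 1;
  state.spentMills += COST_MILLS[feature];
  return true;
}

// True once the day's spend has reached the budget.
function overBudget() {
  return state.spentMills >= state.dailyBudgetMills;
}

state.enabled.video = false; // admin disables video analysis
console.log(recordCall("video")); // false
for (let i = 0; i !== 25; i += 1) recordCall("image");
console.log(state.usage.image, overBudget()); // 25 true
```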




&lt;h2&gt;
  
  
  Security Architecture
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API Keys&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Firebase Secret Manager (never in code)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Isolation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Firestore rules enforce per-user access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Admin Access&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Custom claims + email verification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Authentication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Firebase Auth (Google + email/password)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Audit Trail&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Every admin action logged with timestamp&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;React 18, Vite, CSS3 (Glassmorphism dark UI)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;Firebase Cloud Functions (Node.js 20)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Engine&lt;/td&gt;
&lt;td&gt;Google Gemini 2.0 Flash + OpenRouter (free models)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;Cloud Firestore&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth&lt;/td&gt;
&lt;td&gt;Firebase Authentication&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hosting&lt;/td&gt;
&lt;td&gt;Firebase Hosting (custom domain)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Secrets&lt;/td&gt;
&lt;td&gt;Firebase Secret Manager&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Get Started in 5 Minutes
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone&lt;/span&gt;
git clone https://github.com/x-tahosin/hocks-ai.git
&lt;span class="nb"&gt;cd &lt;/span&gt;hocks-ai

&lt;span class="c"&gt;# Install&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;functions &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd&lt;/span&gt; ..

&lt;span class="c"&gt;# Set your API keys securely&lt;/span&gt;
firebase functions:secrets:set GEMINI_API_KEY
firebase functions:secrets:set OPENROUTER_API_KEY

&lt;span class="c"&gt;# Deploy everything&lt;/span&gt;
firebase deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node.js 20+&lt;/li&gt;
&lt;li&gt;Firebase CLI (&lt;code&gt;npm i -g firebase-tools&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;A Firebase project you are logged in to (&lt;code&gt;firebase login&lt;/code&gt;); deploying Cloud Functions requires the pay-as-you-go Blaze plan, which still includes the free usage quotas&lt;/li&gt;
&lt;li&gt;A Gemini API key from &lt;a href="https://ai.google.dev" rel="noopener noreferrer"&gt;ai.google.dev&lt;/a&gt; (free)&lt;/li&gt;
&lt;li&gt;An OpenRouter API key from &lt;a href="https://openrouter.ai" rel="noopener noreferrer"&gt;openrouter.ai&lt;/a&gt; (free models available)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Free AI models are production-viable&lt;/strong&gt; — 120B parameter models handle conversational AI surprisingly well&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid strategies save money&lt;/strong&gt; — use free for text, paid only for vision&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Firebase Secret Manager &amp;gt; .env files&lt;/strong&gt; — proper secret management matters in production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSE streaming transforms UX&lt;/strong&gt; — users seeing real-time responses feels dramatically better than waiting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost tracking from day one&lt;/strong&gt; — know exactly where every dollar goes&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🔗 &lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://hocks.app" rel="noopener noreferrer"&gt;hocks.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📦 &lt;strong&gt;Source code:&lt;/strong&gt; &lt;a href="https://github.com/x-tahosin/hocks-ai" rel="noopener noreferrer"&gt;github.com/x-tahosin/hocks-ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;⭐ Star the repo if you find it useful!&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;What free AI models are you using in production? I'd love to hear about your hybrid model strategies in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>webdev</category>
      <category>firebase</category>
    </item>
    <item>
      <title>5 TypeScript Patterns Every Developer Should Know in 2026</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Mon, 20 Apr 2026 18:13:15 +0000</pubDate>
      <link>https://dev.to/tahosin/5-typescript-patterns-every-developer-should-know-in-2026-58ik</link>
      <guid>https://dev.to/tahosin/5-typescript-patterns-every-developer-should-know-in-2026-58ik</guid>
      <description>&lt;p&gt;TypeScript has evolved massively. Here are 5 patterns I use daily that make my code bulletproof.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Discriminated Unions for State Management
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;State&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; 
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;idle&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;loading&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;success&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;User&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;State&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;success&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// TS knows data exists here&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// TS knows error exists here&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The compiler narrows the type automatically. No more &lt;code&gt;if (data !== undefined)&lt;/code&gt; everywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. &lt;code&gt;satisfies&lt;/code&gt; for Type-Safe Configs
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;apiUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="nx"&gt;satisfies&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// config.apiUrl is still typed as string, not string | number&lt;/span&gt;
&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apiUrl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toUpperCase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// ✅ Works!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;satisfies&lt;/code&gt; validates the type without widening it.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Template Literal Types for API Routes
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;ApiRoute&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`/api/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;UserRoute&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`/api/users/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fetchApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;route&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ApiRoute&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;fetchApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/api/users/123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// ✅&lt;/span&gt;
&lt;span class="nf"&gt;fetchApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/dashboard&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;      &lt;span class="c1"&gt;// ❌ Type error&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Const Assertions for Readonly Everything
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ROLES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;admin&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;viewer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;ROLES&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="c1"&gt;// "admin" | "user" | "viewer"&lt;/span&gt;

&lt;span class="c1"&gt;// Instead of: type Role = string&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Branded Types for Domain Safety
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;UserId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;__brand&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;UserId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;PostId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;__brand&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;PostId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;UserId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getPost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;PostId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;abc&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;UserId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nf"&gt;getUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// ✅&lt;/span&gt;
&lt;span class="nf"&gt;getPost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// ❌ Type error — can't mix IDs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;Which TypeScript patterns do you use the most? Drop your favorites below!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow me for more TypeScript and AI content: &lt;a href="https://dev.to/tahosin"&gt;@tahosin&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>javascript</category>
      <category>webdev</category>
      <category>beginners</category>
    </item>
    <item>
      <title>5 Free AI APIs You Can Use Today (No Credit Card Required)</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Sun, 19 Apr 2026 16:49:34 +0000</pubDate>
      <link>https://dev.to/tahosin/5-free-ai-apis-you-can-use-today-no-credit-card-required-2hag</link>
      <guid>https://dev.to/tahosin/5-free-ai-apis-you-can-use-today-no-credit-card-required-2hag</guid>
      <description>&lt;p&gt;You don't need a paid OpenAI account to build AI apps. Here are 5 &lt;strong&gt;completely free&lt;/strong&gt; AI APIs you can start using right now.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Google Gemini API
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Text generation, analysis, code generation&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free tier:&lt;/strong&gt; 15 requests/minute, 1M tokens/day (limits vary by model and change often, so check the current rate-limit page)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models:&lt;/strong&gt; Gemini 2.0 Flash (fast), Gemini Pro (powerful)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signup:&lt;/strong&gt; &lt;a href="https://ai.google.dev" rel="noopener noreferrer"&gt;ai.google.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s2"&gt;`https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Explain quantum computing simply&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;I built:&lt;/strong&gt; &lt;a href="https://maxai-writer.pages.dev" rel="noopener noreferrer"&gt;MaxAI Writer&lt;/a&gt; and &lt;a href="https://ecosense-ai.pages.dev" rel="noopener noreferrer"&gt;EcoSense AI&lt;/a&gt; entirely on Gemini's free tier.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Hugging Face Inference API
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Specialized models (sentiment, translation, image classification)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free tier:&lt;/strong&gt; Rate-limited, thousands of models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signup:&lt;/strong&gt; &lt;a href="https://huggingface.co" rel="noopener noreferrer"&gt;huggingface.co&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. Cloudflare Workers AI
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Edge inference, low latency&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free tier:&lt;/strong&gt; 10,000 neurons/day&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models:&lt;/strong&gt; Llama, Whisper, Stable Diffusion&lt;/li&gt;
&lt;/ul&gt;
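
&lt;p&gt;Here is a minimal sketch of calling Workers AI over Cloudflare's REST API from outside a Worker. The helper names, the account ID, the API token, and the default model ID are placeholders I picked for illustration, so swap in values from your own dashboard and check the model catalog before relying on it.&lt;/p&gt;

```javascript
// Hypothetical helper: builds the request for the Workers AI REST endpoint.
// accountId, apiToken, and the default model ID are assumptions; use your own.
function buildWorkersAiRequest(
  accountId,
  apiToken,
  prompt,
  model = "@cf/meta/llama-3.1-8b-instruct"
) {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ prompt }),
    },
  };
}

// Sends the request and returns the generated text.
async function runWorkersAi(accountId, apiToken, prompt) {
  const { url, options } = buildWorkersAiRequest(accountId, apiToken, prompt);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Workers AI request failed: ${res.status}`);
  const data = await res.json();
  // Text-generation models return their output under result.response.
  return data.result.response;
}
```

&lt;p&gt;Splitting the request builder from the network call keeps the URL and headers easy to inspect without spending any of your daily neurons.&lt;/p&gt;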

&lt;h2&gt;
  
  
  4. Groq
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Fastest inference speeds&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free tier:&lt;/strong&gt; 30 RPM on Llama models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signup:&lt;/strong&gt; &lt;a href="https://console.groq.com" rel="noopener noreferrer"&gt;console.groq.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
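
&lt;p&gt;Groq exposes an OpenAI-compatible chat completions endpoint, so a request can look roughly like the sketch below. The helper names and the default model ID are my own assumptions for illustration; pick a Llama model from the Groq console that is currently available on the free tier.&lt;/p&gt;

```javascript
// Hypothetical helper: builds a chat completion request for Groq.
// The default model ID is an assumption; check the console for current models.
function buildGroqRequest(apiKey, prompt, model = "llama-3.1-8b-instant") {
  return {
    url: "https://api.groq.com/openai/v1/chat/completions",
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Sends the request and returns the assistant's reply text.
async function askGroq(apiKey, prompt) {
  const { url, options } = buildGroqRequest(apiKey, prompt);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Groq request failed: ${res.status}`);
  const data = await res.json();
  // OpenAI-style responses put the text under choices[0].message.content.
  return data.choices[0].message.content;
}
```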

&lt;h2&gt;
  
  
  5. Cohere
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Enterprise-grade text analysis, RAG&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free tier:&lt;/strong&gt; 5 RPM, trial API key&lt;/li&gt;
&lt;/ul&gt;
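
&lt;p&gt;As a rough sketch, a trial key can be used against Cohere's v2 Chat endpoint like this. The helper names and the default model ID are assumptions on my part, and the response shape follows the v2 Chat API as I understand it, so confirm both against the Cohere docs before shipping.&lt;/p&gt;

```javascript
// Hypothetical helper: builds a request for Cohere's v2 Chat endpoint.
// The default model ID is an assumption; use one your trial key can access.
function buildCohereRequest(apiKey, prompt, model = "command-r-08-2024") {
  return {
    url: "https://api.cohere.com/v2/chat",
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Sends the request and returns the first text block of the reply.
async function askCohere(apiKey, prompt) {
  const { url, options } = buildCohereRequest(apiKey, prompt);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Cohere request failed: ${res.status}`);
  const data = await res.json();
  // Assumed v2 response shape: message.content is an array of content blocks.
  return data.message.content[0].text;
}
```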

&lt;h2&gt;
  
  
  Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;API&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Rate Limit&lt;/th&gt;
&lt;th&gt;Signup Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Google Gemini&lt;/td&gt;
&lt;td&gt;General AI&lt;/td&gt;
&lt;td&gt;15 RPM&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hugging Face&lt;/td&gt;
&lt;td&gt;Specialized&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare AI&lt;/td&gt;
&lt;td&gt;Edge&lt;/td&gt;
&lt;td&gt;10K neurons/day&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Groq&lt;/td&gt;
&lt;td&gt;Speed&lt;/td&gt;
&lt;td&gt;30 RPM&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cohere&lt;/td&gt;
&lt;td&gt;Text analysis&lt;/td&gt;
&lt;td&gt;5 RPM&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
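
&lt;p&gt;Since the requests-per-minute caps in the table are easy to hit in a loop, here is a small client-side throttle sketch. The helper names &lt;code&gt;minIntervalMs&lt;/code&gt; and &lt;code&gt;createThrottle&lt;/code&gt; are hypothetical, and evenly spacing calls is just one simple strategy, not something any of these providers prescribe.&lt;/p&gt;

```javascript
// Minimum spacing between calls that keeps you under a requests-per-minute cap.
function minIntervalMs(rpm) {
  return Math.ceil(60000 / rpm);
}

// Hypothetical helper: returns a function that runs tasks no faster than `rpm`.
// `now` and `sleep` are injectable so the timing logic can be tested.
function createThrottle(
  rpm,
  now = () => Date.now(),
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
) {
  const interval = minIntervalMs(rpm);
  let nextSlot = 0; // earliest time the next call may start

  return async function throttled(task) {
    const start = Math.max(now(), nextSlot);
    nextSlot = start + interval; // reserve the slot before waiting
    const wait = start - now();
    if (wait > 0) await sleep(wait);
    return task();
  };
}
```

&lt;p&gt;For example, &lt;code&gt;createThrottle(15)&lt;/code&gt; spaces calls about 4 seconds apart, which matches the 15 RPM row above, while &lt;code&gt;createThrottle(5)&lt;/code&gt; waits 12 seconds between calls.&lt;/p&gt;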




&lt;p&gt;&lt;em&gt;Which free API are you using? Drop a comment!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>beginners</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
