<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Straightly</title>
    <description>The latest articles on DEV Community by Straightly (@straightly).</description>
    <link>https://dev.to/straightly</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F126286%2F3c66f4b1-9c7f-478c-a256-5ffd4034c16d.png</url>
      <title>DEV Community: Straightly</title>
      <link>https://dev.to/straightly</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/straightly"/>
    <language>en</language>
    <item>
      <title>OIC: From a Working Toast Watcher to a General "Watch It for Me" Agent</title>
      <dc:creator>Straightly</dc:creator>
      <pubDate>Mon, 25 May 2026 03:46:29 +0000</pubDate>
      <link>https://dev.to/straightly/oic-from-a-working-toast-watcher-to-a-general-watch-it-for-me-agent-2njm</link>
      <guid>https://dev.to/straightly/oic-from-a-working-toast-watcher-to-a-general-watch-it-for-me-agent-2njm</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Write About Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned Trying to Put Gemma 4 Into a Local iPhone Watcher
&lt;/h2&gt;

&lt;p&gt;I already had a small iPhone app called &lt;strong&gt;OIC&lt;/strong&gt;, short for &lt;em&gt;"Oh, I See."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In its current working form, OIC can watch my toaster and tell me when my toast is ready. It uses a custom, hand-rolled local vision model that serves the same architectural role in OIC that I hoped Gemma 4 could serve more generally.&lt;/p&gt;

&lt;p&gt;What I wanted next was to see whether OIC could become something broader: a general "watch it for me" agent. Instead of writing a separate detector for every situation, I wanted to find out whether a small multimodal Gemma 4 model running locally on the phone could let the same watcher loop handle different tasks through instruction.&lt;/p&gt;

&lt;p&gt;The use case I wanted to add was tracking my cat.&lt;/p&gt;

&lt;p&gt;My cat likes to go outside, but it is not trained to come home on its own. Sometimes I have to go find it. Sometimes it has already come back, and I waste time worrying and looking for it when I do not need to. That felt like a very good use case for a local visual watcher:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;watch the back door&lt;/li&gt;
&lt;li&gt;detect whether the cat went out or came home&lt;/li&gt;
&lt;li&gt;keep a record of the state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I did not get that full Gemma-powered cat-door watcher working in time for this challenge. But I did get far enough to learn something important about local multimodal models, iPhone deployment, and what it really takes to turn a narrow watcher into a general one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Concept: The Watcher Architecture
&lt;/h2&gt;

&lt;p&gt;The core loop for a watcher is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;watch a scene&lt;/li&gt;
&lt;li&gt;interpret what matters in that scene&lt;/li&gt;
&lt;li&gt;decide whether an event happened&lt;/li&gt;
&lt;li&gt;update state and notify the user if needed&lt;/li&gt;
&lt;/ol&gt;
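
&lt;p&gt;As a rough illustration, the four steps above might be sketched in Swift like this. Every name here (&lt;code&gt;Frame&lt;/code&gt;, &lt;code&gt;WatchEvent&lt;/code&gt;, &lt;code&gt;WatcherLoop&lt;/code&gt;) is a hypothetical placeholder for this post, and the "model" is just a closure standing in for whatever does the visual interpretation.&lt;/p&gt;

```swift
import Foundation

// Hypothetical types for illustration only.
struct Frame {
    let description: String  // stand-in for real pixel data
}

enum WatchEvent: Equatable {
    case nothing
    case detected(label: String)
}

struct WatcherLoop {
    // The label that counts as an event for this watcher.
    let eventLabel: String
    // Step 2: turn a frame into a label. A real app would call a vision model here.
    let interpret: (Frame) -> String
    // Step 4: the last label seen, kept as state between frames.
    private(set) var lastLabel: String? = nil

    init(eventLabel: String, interpret: @escaping (Frame) -> String) {
        self.eventLabel = eventLabel
        self.interpret = interpret
    }

    // Runs one pass of the loop. Step 1 is the caller supplying the frame.
    mutating func step(_ frame: Frame) -> WatchEvent {
        let label = interpret(frame)
        var event = WatchEvent.nothing
        // Step 3: fire only on the transition into the event label,
        // so the user is not notified again on every later frame.
        if label == eventLabel, lastLabel != eventLabel {
            event = .detected(label: label)
        }
        lastLabel = label
        return event
    }
}

// Usage with a fake interpreter that "sees" text instead of pixels.
var toast = WatcherLoop(eventLabel: "ready") { frame in
    frame.description.contains("brown") ? "ready" : "not_ready"
}
let first = toast.step(Frame(description: "pale bread"))
let second = toast.step(Frame(description: "brown toast"))
let third = toast.step(Frame(description: "brown toast"))
```

&lt;p&gt;In this sketch only the second call reports an event. The third frame still reads as ready, but the state has not changed, so nothing new is reported.&lt;/p&gt;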

&lt;p&gt;What changes from one watcher to another is not the loop. What changes is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what the user wants watched&lt;/li&gt;
&lt;li&gt;what events matter&lt;/li&gt;
&lt;li&gt;what labels the watcher should return&lt;/li&gt;
&lt;li&gt;what visual reasoning is needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why Gemma 4 interested me.&lt;/p&gt;

&lt;p&gt;If a small multimodal model could run locally on the phone and follow instructions well enough, OIC could become a more general visual watcher:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Watch the toaster and tell me when the toast is ready."&lt;/li&gt;
&lt;li&gt;"Watch the back door and tell me whether my cat is outside or back home."&lt;/li&gt;
&lt;li&gt;"Watch other scenes that can be described simply enough."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same loop. Different watched target. Different instruction.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Was Already Working
&lt;/h2&gt;

&lt;p&gt;Before I tried Gemma, OIC already had one working watcher: toast.&lt;/p&gt;

&lt;p&gt;I was not starting from a blank AI demo. I already had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an iPhone app&lt;/li&gt;
&lt;li&gt;a camera loop&lt;/li&gt;
&lt;li&gt;a working toast-monitoring path&lt;/li&gt;
&lt;li&gt;an alert flow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The toast watcher is still the cleanest baseline in the project. It is narrow, controlled, and useful. It also made the Gemma experiment more interesting, because I was not asking whether AI could solve a toy problem. I was asking whether a working narrow watcher could be extended into a more general one without losing its local-first character.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually Accomplished
&lt;/h2&gt;

&lt;p&gt;I did not finish the cat-door watcher, but I did accomplish several things that matter.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. I refactored OIC from a toast-specific app toward a watcher architecture
&lt;/h3&gt;

&lt;p&gt;That included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;watcher specifications&lt;/li&gt;
&lt;li&gt;watcher labels&lt;/li&gt;
&lt;li&gt;watcher selection in the app&lt;/li&gt;
&lt;li&gt;a path for multiple watcher types&lt;/li&gt;
&lt;/ul&gt;
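
&lt;p&gt;To make that concrete, here is a minimal sketch of what a watcher specification could look like. The field names, the ids, the label strings, and the prompt wording are all assumptions made for illustration.&lt;/p&gt;

```swift
import Foundation

// Hypothetical shape of a watcher specification.
struct WatcherSpec {
    let id: String
    let instruction: String  // what the user wants watched
    let labels: [String]     // the only answers the watcher may return

    // Builds the text instruction that would accompany a frame.
    var prompt: String {
        let options = labels.joined(separator: ", ")
        return "\(instruction) Answer with exactly one of: \(options)."
    }
}

// Two example specs: same structure, different target and labels.
let specs = [
    WatcherSpec(id: "toast",
                instruction: "Look at the toaster and judge the toast.",
                labels: ["not_ready", "ready"]),
    WatcherSpec(id: "cat_door",
                instruction: "Look at the back door and find the cat.",
                labels: ["cat_outside", "cat_inside", "no_cat"]),
]

// Watcher selection: look a spec up by its id.
func selectWatcher(_ id: String) -> WatcherSpec? {
    specs.first(where: { $0.id == id })
}
```

&lt;p&gt;The point of the design is that adding a watcher means adding data, not a new code path.&lt;/p&gt;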

&lt;h3&gt;
  
  
  2. I kept the existing toast watcher working while extending the architecture
&lt;/h3&gt;

&lt;p&gt;The toast watcher is not just a demo. It is the working baseline.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. I got a Gemma GGUF runtime working locally on the iPhone
&lt;/h3&gt;

&lt;p&gt;This was a concrete milestone.&lt;/p&gt;

&lt;p&gt;I integrated a local &lt;code&gt;llama.cpp&lt;/code&gt; iOS XCFramework path, set up app-local model handling, and got the app to load the Gemma GGUF model on-device.&lt;/p&gt;

&lt;p&gt;That did not mean the full watcher worked. It did show that OIC could host a Gemma runtime locally on the phone instead of depending on a cloud loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. I built the plumbing that a local watcher actually needs
&lt;/h3&gt;

&lt;p&gt;A lot of the work ended up being operational:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local model file placement&lt;/li&gt;
&lt;li&gt;app-managed model directories&lt;/li&gt;
&lt;li&gt;separating model transfer from app installation&lt;/li&gt;
&lt;li&gt;avoiding accidental app bloat from bundling giant GGUF files&lt;/li&gt;
&lt;li&gt;watcher session tracing&lt;/li&gt;
&lt;li&gt;result recording&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This was not the glamorous part, but it was necessary. Local AI on mobile is not just about the model. It is also about packaging, transfer, storage, and runtime discipline.&lt;/p&gt;
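
&lt;p&gt;A minimal sketch of the "app-managed model directory" idea, using only &lt;code&gt;FileManager&lt;/code&gt;. The &lt;code&gt;Models&lt;/code&gt; folder name, the file name, and the &lt;code&gt;ModelLookup&lt;/code&gt; type are assumptions for illustration. On a phone the base URL would typically come from &lt;code&gt;FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)&lt;/code&gt;; the example uses a temporary directory so it can run anywhere.&lt;/p&gt;

```swift
import Foundation

// Assumed layout: model files live in a "Models" folder under a base directory,
// separate from the app bundle, so the app itself stays small.
func modelsDirectory(base: URL) throws -> URL {
    let dir = base.appendingPathComponent("Models", isDirectory: true)
    try FileManager.default.createDirectory(at: dir, withIntermediateDirectories: true)
    return dir
}

enum ModelLookup: Equatable {
    case found(path: String, bytes: Int)
    case missing(expectedPath: String)
}

// Reports whether the model file is where the app expects it, and how big it is.
func locateModel(named fileName: String, in base: URL) throws -> ModelLookup {
    let url = try modelsDirectory(base: base).appendingPathComponent(fileName)
    guard FileManager.default.fileExists(atPath: url.path) else {
        // Returning the expected path makes a transfer mistake easy to diagnose.
        return .missing(expectedPath: url.path)
    }
    let attrs = try FileManager.default.attributesOfItem(atPath: url.path)
    let bytes = (attrs[.size] as? NSNumber)?.intValue ?? 0
    return .found(path: url.path, bytes: bytes)
}

// Usage: "example.gguf" is a placeholder file name.
let base = FileManager.default.temporaryDirectory
    .appendingPathComponent("oic-demo-" + UUID().uuidString)
let lookup = try locateModel(named: "example.gguf", in: base)
```

&lt;p&gt;Because nothing has been copied into the folder yet, &lt;code&gt;lookup&lt;/code&gt; comes back as &lt;code&gt;.missing&lt;/code&gt; along with the exact path the app was expecting.&lt;/p&gt;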

&lt;h3&gt;
  
  
  5. I learned that one must be able to verify every step
&lt;/h3&gt;

&lt;p&gt;That is the biggest lesson I learned from this attempt.&lt;/p&gt;

&lt;p&gt;I had to add traces to tell me exactly where the app was in the pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the camera opened&lt;/li&gt;
&lt;li&gt;the model loaded&lt;/li&gt;
&lt;li&gt;the first frame was captured&lt;/li&gt;
&lt;li&gt;the first frame actually reached the model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are not the same milestone.&lt;/p&gt;

&lt;p&gt;Before I had that level of tracing, I made a few false starts down the wrong paths. Without it, I could easily have mistaken activity in the app for progress in the inference loop.&lt;/p&gt;
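
&lt;p&gt;One way to keep those milestones separate is to record each one explicitly and ask which is the first that has not happened. This is only a sketch: &lt;code&gt;Milestone&lt;/code&gt; and &lt;code&gt;WatcherTrace&lt;/code&gt; are hypothetical names that mirror the four milestones listed above.&lt;/p&gt;

```swift
import Foundation

// Hypothetical milestone names, declared in pipeline order.
enum Milestone: Int, CaseIterable {
    case cameraOpened, modelLoaded, frameCaptured, frameReachedModel
}

struct WatcherTrace {
    private(set) var reached: [Milestone: Date] = [:]

    // Records the first time a milestone was reached; later marks are ignored.
    mutating func mark(_ milestone: Milestone, at time: Date = Date()) {
        if reached[milestone] == nil { reached[milestone] = time }
    }

    // The earliest milestone in pipeline order that has not been reached yet.
    var firstMissing: Milestone? {
        Milestone.allCases.first(where: { reached[$0] == nil })
    }
}

// Usage: three milestones are marked, the fourth never is.
var trace = WatcherTrace()
trace.mark(.cameraOpened)
trace.mark(.modelLoaded)
trace.mark(.frameCaptured)
let stuckAt = trace.firstMissing
```

&lt;p&gt;Here &lt;code&gt;stuckAt&lt;/code&gt; is &lt;code&gt;.frameReachedModel&lt;/code&gt;: the app looks busy, but the trace says the frame never got to the model.&lt;/p&gt;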

&lt;h2&gt;
  
  
  What Did Not Work
&lt;/h2&gt;

&lt;p&gt;I did not get to the point where I could show that a camera frame from the cat-door watcher was successfully handed into Gemma for image-conditioned inference and returned a usable result.&lt;/p&gt;

&lt;p&gt;That is the missing milestone.&lt;/p&gt;

&lt;p&gt;More specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I got local GGUF runtime startup working on the phone.&lt;/li&gt;
&lt;li&gt;I got the cat-door watcher path into the app.&lt;/li&gt;
&lt;li&gt;I got camera start and frame capture traces.&lt;/li&gt;
&lt;li&gt;I did &lt;strong&gt;not&lt;/strong&gt; get a verified end-to-end multimodal first-frame Gemma inference result.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That turned out to be the line between "interesting prototype" and "working general watcher."&lt;/p&gt;

&lt;h2&gt;
  
  
  Two More Technical Lessons
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Loading a model is not the same as having a working watcher
&lt;/h3&gt;

&lt;p&gt;Model startup was only the beginning.&lt;/p&gt;

&lt;p&gt;A watcher still needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a camera frame&lt;/li&gt;
&lt;li&gt;conversion into the right image representation&lt;/li&gt;
&lt;li&gt;a multimodal call path&lt;/li&gt;
&lt;li&gt;structured output&lt;/li&gt;
&lt;li&gt;traceable timing&lt;/li&gt;
&lt;li&gt;behavior stable enough to repeat&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Getting the model to load was progress, but it was not proof that the watcher loop worked.&lt;/p&gt;
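
&lt;p&gt;The "structured output" item is a good example of work that sits entirely outside the model. A minimal sketch, assuming the model replies with free-form text that should contain exactly one of the watcher's labels (the function name and label strings are made up for illustration):&lt;/p&gt;

```swift
import Foundation

// Maps free-form model text onto one of the watcher's allowed labels.
// Returns nil when the reply matches no label or more than one, so the
// caller can record the raw reply in a trace instead of guessing.
func parseLabel(from reply: String, allowed: [String]) -> String? {
    // Split on anything that is not a letter, digit, or underscore, so that
    // "ready" is not accidentally matched inside "not_ready".
    let wordChars = CharacterSet.alphanumerics.union(CharacterSet(charactersIn: "_"))
    let tokens = reply.lowercased()
        .components(separatedBy: wordChars.inverted)
        .filter { !$0.isEmpty }
    let matches = allowed.filter { tokens.contains($0.lowercased()) }
    return matches.count == 1 ? matches[0] : nil
}

// Usage with hypothetical replies.
let labels = ["not_ready", "ready"]
let clean = parseLabel(from: "Answer: NOT_READY.", allowed: labels)
let ambiguous = parseLabel(from: "It is either ready or not_ready.", allowed: labels)
let offTopic = parseLabel(from: "I see a kitchen counter.", allowed: labels)
```

&lt;p&gt;Only the first reply yields a label. The other two return &lt;code&gt;nil&lt;/code&gt;, which is the honest answer when the output is not usable.&lt;/p&gt;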

&lt;h3&gt;
  
  
  Local AI on mobile is also a deployment problem
&lt;/h3&gt;

&lt;p&gt;Some of the hardest issues had nothing to do with model intelligence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;app size exploding when model files were bundled into the app&lt;/li&gt;
&lt;li&gt;iPhone storage pressure&lt;/li&gt;
&lt;li&gt;Finder and file-sharing friction&lt;/li&gt;
&lt;li&gt;making sure the phone could actually see the model files where the app expected them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That was a good reminder that on-device AI is not just about whether a model can run. It is also about whether the whole system can be deployed, managed, and repeated cleanly on the device.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Still Think Gemma 4 Is a Strong Fit
&lt;/h2&gt;

&lt;p&gt;Even though I did not finish the cat-door watcher, I still think Gemma 4 is the right kind of model family for this project.&lt;/p&gt;

&lt;p&gt;OIC is not trying to be a chat app. It is trying to be a local, focused, scene-aware watcher.&lt;/p&gt;

&lt;p&gt;That means I care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;on-device inference&lt;/li&gt;
&lt;li&gt;a narrow control loop&lt;/li&gt;
&lt;li&gt;promptable behavior&lt;/li&gt;
&lt;li&gt;multimodal reasoning&lt;/li&gt;
&lt;li&gt;reusing one product loop across different watcher tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why Gemma still feels like such a good fit for the idea. If a local model can be instructed well enough, then OIC may not need a separate hand-built algorithm for every scene it watches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Baseline
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Target model path:&lt;/strong&gt; Gemma 4 GGUF running through &lt;code&gt;llama.cpp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local host:&lt;/strong&gt; iPhone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime engine:&lt;/strong&gt; &lt;code&gt;llama.cpp&lt;/code&gt; iOS XCFramework integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Camera pipeline:&lt;/strong&gt; &lt;code&gt;AVFoundation&lt;/code&gt; capturing real-time frames&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Current status:&lt;/strong&gt; runtime startup works locally; verified first-frame multimodal inference is still the missing step&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I Would Do Next
&lt;/h2&gt;

&lt;p&gt;The next step is not "add more AI."&lt;/p&gt;

&lt;p&gt;It is narrower and more technical:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;verify that a GGUF-compatible multimodal iOS path exists for the current model assets&lt;/li&gt;
&lt;li&gt;get one camera frame into that path&lt;/li&gt;
&lt;li&gt;record the exact result in a watcher trace&lt;/li&gt;
&lt;li&gt;only then measure latency, cadence, and whether the watcher is practical&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That sequence follows directly from the biggest lesson in this project: verify the result of each step before moving on to the next one.&lt;/p&gt;
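
&lt;p&gt;That rule can be written down as a tiny gate that runs checks in order and stops at the first one that fails. This is a sketch only: &lt;code&gt;StepCheck&lt;/code&gt; and &lt;code&gt;firstUnverifiedStep&lt;/code&gt; are hypothetical names, and the closures return fixed values where a real check would inspect the trace.&lt;/p&gt;

```swift
import Foundation

// A named check that reports whether one step produced a verified result.
struct StepCheck {
    let name: String
    let verify: () -> Bool
}

// Runs checks in order and stops at the first one that fails.
// Returns the name of the failing step, or nil when every step verified.
func firstUnverifiedStep(_ checks: [StepCheck]) -> String? {
    for check in checks {
        if !check.verify() { return check.name }
    }
    return nil
}

// Usage: placeholder results standing in for real verification logic.
let blocked = firstUnverifiedStep([
    StepCheck(name: "multimodal path exists", verify: { true }),
    StepCheck(name: "one frame reaches the model", verify: { false }),
    StepCheck(name: "result recorded in trace", verify: { true }),
])
```

&lt;p&gt;Later checks never run once an earlier one fails, which is exactly the discipline the list above asks for.&lt;/p&gt;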

&lt;p&gt;If that works, OIC gets much closer to what I originally wanted: a local watcher loop that can be retargeted by instruction instead of rebuilt from scratch for every new task.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Take
&lt;/h2&gt;

&lt;p&gt;This attempt has not yet produced a finished Gemma-powered cat-door watcher.&lt;/p&gt;

&lt;p&gt;What it did produce was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a working toast watcher as the baseline&lt;/li&gt;
&lt;li&gt;a first refactor of that watcher toward a general watcher architecture&lt;/li&gt;
&lt;li&gt;an on-device Gemma runtime path on iPhone&lt;/li&gt;
&lt;li&gt;a clearer understanding of what local multimodal product work actually demands&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I started by hoping Gemma 4 would let me turn OIC into a general "watch it for me" agent.&lt;/p&gt;

&lt;p&gt;Trying to do it showed me exactly where the next barrier is.&lt;/p&gt;

&lt;p&gt;For OIC, that barrier is the first verified multimodal frame.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
  </channel>
</rss>
