<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Humphreysun98</title>
    <description>The latest articles on DEV Community by Humphreysun98 (@humphreysun98).</description>
    <link>https://dev.to/humphreysun98</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4011681%2Fce3db7bf-dd16-48dc-a7c3-2579dcdda910.jpg</url>
      <title>DEV Community: Humphreysun98</title>
      <link>https://dev.to/humphreysun98</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/humphreysun98"/>
    <language>en</language>
    <item>
      <title>Designing an AI agent for the factory floor (model reasons, code never decides)</title>
      <dc:creator>Humphreysun98</dc:creator>
      <pubDate>Thu, 02 Jul 2026 05:32:27 +0000</pubDate>
      <link>https://dev.to/humphreysun98/safetycommander-an-ai-safety-officer-where-the-model-reasons-and-the-code-never-decides-4765</link>
      <guid>https://dev.to/humphreysun98/safetycommander-an-ai-safety-officer-where-the-model-reasons-and-the-code-never-decides-4765</guid>
      <description>&lt;p&gt;Every factory floor already has cameras. The problem is that nobody is watching them. A safety officer can't be on every camera, on every shift, on every floor — so the footage gets recorded and reviewed &lt;em&gt;after&lt;/em&gt; someone is already hurt.&lt;/p&gt;

&lt;p&gt;The obvious fix — "detect no hardhat → fire an alarm" — is worse than it sounds. It's hardcoded, it can't read &lt;em&gt;your&lt;/em&gt; site's rules, and it drowns the ops team in false alarms until they mute it. What's actually missing is something that &lt;strong&gt;reasons about risk the way a human EHS officer does — and can prove why.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;SafetyCommander&lt;/strong&gt;: an autonomous EHS agent that watches the floor, reasons about hazards from your written policy, grounds every alert in OSHA law, routes it to the right worker, and hands off a report. It started at the Zapdos Labs × Antler hackathon (&lt;em&gt;AI Agents for the American Industrial Revolution&lt;/em&gt;) and I've kept building on it since.&lt;/p&gt;

&lt;p&gt;This post is about the one design decision that made it work, and two engineering war-stories that were more interesting than I expected.&lt;/p&gt;




&lt;h2&gt;
  
  
  The one decision that matters: risk lives in exactly one module
&lt;/h2&gt;

&lt;p&gt;The hardest question in this whole space — and the one the hackathon judged on — is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Is the &lt;strong&gt;model&lt;/strong&gt; doing the reasoning, or is the &lt;strong&gt;developer&lt;/strong&gt; hardcoding it? Rules don't count.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It's an easy thing to &lt;em&gt;claim&lt;/em&gt; and a hard thing to &lt;em&gt;build&lt;/em&gt;, because the temptation to sneak a rule in is everywhere. The moment you write &lt;code&gt;if hazard == "no_hardhat": risk = "high"&lt;/code&gt;, you no longer have an agent. You have a rulebook with a language model bolted on top, and it breaks the instant the site's policy differs from your assumptions.&lt;/p&gt;

&lt;p&gt;So I made one rule for myself: &lt;strong&gt;risk is decided in exactly one place.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;THINK    vlm_judge.py   ← reads safety_policy.txt + the footage together,
                          returns a verdict AND the clause it relied on
ACT      actions.py     ← dispatch() only ROUTES the risk level to actions
GROUND   perception.py  ← YOLO measures facts (distance, counts) — no risk
         rag.py         ← retrieves the OSHA clause to cite — no risk
REPORT   shift_report.py, kpi_report.py, planner.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every module except &lt;code&gt;vlm_judge.py&lt;/code&gt; is deliberately &lt;strong&gt;decision-free&lt;/strong&gt;. &lt;code&gt;dispatch()&lt;/code&gt; takes the model's &lt;code&gt;risk_level&lt;/code&gt; and maps it to actions (log, notify, corrective ticket, escalate). The perception layer &lt;em&gt;measures&lt;/em&gt; — it will tell you a forklift is 2.1 m from a person, but it will never tell you that's dangerous. The retriever &lt;em&gt;cites&lt;/em&gt; — it hands the model OSHA 1910.178, but it doesn't decide anything. Risk is the model's job, and only the model's.&lt;/p&gt;
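
&lt;p&gt;A minimal sketch of what a decision-free &lt;code&gt;dispatch()&lt;/code&gt; could look like, assuming a plain lookup table from risk level to action names. The table contents, the &lt;code&gt;"low"&lt;/code&gt; level and the log-only fallback are illustrative assumptions, not the repository's actual code. The point is that the function never looks at the hazard itself.&lt;/p&gt;

```python
# Hypothetical routing table: risk level (chosen by the model) -> action names.
ROUTES = {
    "low": ["log"],
    "medium": ["log", "notify"],
    "high": ["log", "notify", "corrective_ticket"],
    "critical": ["log", "notify", "corrective_ticket", "escalate"],
}


def dispatch(verdict: dict) -> list[str]:
    """Route the model's risk_level to actions without re-judging the hazard."""
    level = str(verdict.get("risk_level", "")).lower()
    # Unknown levels are only logged; the code does not invent a risk of its own.
    return list(ROUTES.get(level, ["log"]))


print(dispatch({"hazard_type": "forklift_pedestrian_proximity", "risk_level": "high"}))
```

&lt;p&gt;Nothing in this sketch branches on &lt;code&gt;hazard_type&lt;/code&gt;, so a policy change can only reach the actions through the model's verdict.&lt;/p&gt;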

&lt;p&gt;The payoff is a property you can &lt;em&gt;demonstrate live&lt;/em&gt;: &lt;strong&gt;edit one line of the policy and the verdict flips&lt;/strong&gt;, with zero code changes. A hardcoded system physically cannot do that. More on the demo below.&lt;/p&gt;

&lt;p&gt;The verdict the model returns is structured, and it has to cite its work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"observation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A forklift moves with a raised load while a worker stands in the aisle..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hazard_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"forklift_pedestrian_proximity"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"risk_level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"policy_clause"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2.1 A minimum 3-meter separation must be kept between a moving forklift and any pedestrian."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Section 2.1 requires 3 m; perception measured 2.1 m with the load raised → per Section 8 this is HIGH."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"recommended_actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Sound horn"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Stop forklift"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Open corrective action"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the &lt;code&gt;policy_clause&lt;/code&gt; field. If the model can't point to the rule it applied, the alert isn't trustworthy — and an ops manager will (correctly) ignore it.&lt;/p&gt;
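
&lt;p&gt;One way to enforce that is to refuse any verdict that arrives without a citation. The sketch below assumes the model replies with the JSON shape shown above. &lt;code&gt;parse_verdict&lt;/code&gt; and &lt;code&gt;REQUIRED_FIELDS&lt;/code&gt; are hypothetical names, and the check validates structure only; it never assigns or changes a risk level.&lt;/p&gt;

```python
import json

# Fields a verdict must carry before it is allowed to become an alert (assumed list).
REQUIRED_FIELDS = ("observation", "hazard_type", "risk_level", "policy_clause", "reasoning")


def parse_verdict(raw: str) -> dict:
    """Parse the model's JSON reply and reject verdicts with empty required fields."""
    verdict = json.loads(raw)
    if not isinstance(verdict, dict):
        raise ValueError("verdict must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if not str(verdict.get(name) or "").strip()]
    if missing:
        raise ValueError(f"verdict rejected, missing fields: {missing}")
    return verdict


reply = json.dumps({
    "observation": "A forklift moves with a raised load while a worker stands in the aisle...",
    "hazard_type": "forklift_pedestrian_proximity",
    "risk_level": "high",
    "policy_clause": "2.1 A minimum 3-meter separation must be kept between a moving forklift and any pedestrian.",
    "reasoning": "Section 2.1 requires 3 m; perception measured 2.1 m.",
})
print(parse_verdict(reply)["policy_clause"])
```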




&lt;h2&gt;
  
  
  The loop: Watch → Decide → Act → Report
&lt;/h2&gt;

&lt;p&gt;The agent runs a plain, auditable control loop over video — not single stills. It slides over each clip in short temporal windows and sends the frames in a window to the VLM &lt;em&gt;together&lt;/em&gt;, so it reasons about &lt;strong&gt;behaviour over time&lt;/strong&gt;: a pedestrian &lt;em&gt;entering&lt;/em&gt; a forklift's path is a developing near-miss; a single frame misses it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Watch&lt;/strong&gt; — an on-prem YOLO detector measures people, forklifts, and the distance between them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decide&lt;/strong&gt; — Qwen3-VL (served on vLLM) reads the policy, the frames, and those facts, and returns the verdict above.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Act&lt;/strong&gt; — &lt;code&gt;dispatch()&lt;/code&gt; routes the risk level to guarded actions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Report&lt;/strong&gt; — events accumulate into a written shift-handoff for the next crew, and roll up into weekly plans and monthly KPIs.&lt;/li&gt;
&lt;/ul&gt;
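
&lt;p&gt;The temporal windowing can be sketched in a few lines. The window size and stride below are assumed values for illustration, and &lt;code&gt;sliding_windows&lt;/code&gt; is a hypothetical helper rather than the project's function.&lt;/p&gt;

```python
def sliding_windows(frames: list, size: int, stride: int) -> list[list]:
    """Split a clip into windows of `size` consecutive frames, advancing by `stride`.

    Trailing frames that cannot fill a whole window are dropped.
    """
    if not (size >= 1 and stride >= 1):
        raise ValueError("size and stride must be positive")
    last_start = len(frames) - size
    return [frames[i:i + size] for i in range(0, last_start + 1, stride)]


# Ten sampled frames, four per window, advancing three frames each time.
for window in sliding_windows(list(range(10)), size=4, stride=3):
    print(window)  # each window would go to the VLM as a single request
```

&lt;p&gt;Because consecutive windows share a frame here, a pedestrian who steps into the aisle near a window boundary still appears in both.&lt;/p&gt;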

&lt;p&gt;Deliberately boring. Boring is auditable, and auditable is what a safety tool needs.&lt;/p&gt;




&lt;h2&gt;
  
  
  War-story #1: teaching YOLO to ignore a press machine
&lt;/h2&gt;

&lt;p&gt;The perception layer exists for one honest reason: &lt;strong&gt;a VLM can't reliably measure distance on low-res, wide-angle CCTV.&lt;/strong&gt; So YOLO supplies hard numbers &lt;em&gt;before&lt;/em&gt; the model ever sees the frame — and outputs facts only.&lt;/p&gt;

&lt;p&gt;Getting the forklift detector right was the first place reality bit back. Off-the-shelf forklift models &lt;strong&gt;false-fired on our press machines&lt;/strong&gt; — grey factory CCTV looks nothing like the construction-site data those models were trained on.&lt;/p&gt;

&lt;p&gt;The instinct is "just fine-tune it." I tried. A plain fine-tune fixed the false alarms and then went &lt;em&gt;blind to real forklifts&lt;/em&gt; — a classic domain overfit. I tried negative mining (oversampling confirmed forklift-free press frames as background). It didn't ship either; it kept costing recall on the money-shot frames.&lt;/p&gt;

&lt;p&gt;What actually shipped was much less glamorous and much more effective: &lt;strong&gt;use the pretrained model and raise the confidence threshold for the forklift class to 0.8.&lt;/strong&gt; The presses fire at 0.65–0.77; real forklifts fire at 0.83–0.92. A single threshold cleanly separates them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;LABEL_CONF&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;forklift&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FORKLIFT_CONF&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Honest result: &lt;strong&gt;18/21 precision&lt;/strong&gt;, clean on the demo's money-shot frames, with the remaining false positives isolated to one camera's viewing angle. The lesson I keep relearning: the boring knob often beats the clever retrain.&lt;/p&gt;
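
&lt;p&gt;For context, here is a rough sketch of how a per-class threshold like that might be applied to raw detections. The detection dict shape, &lt;code&gt;DEFAULT_CONF&lt;/code&gt; and &lt;code&gt;keep_detections&lt;/code&gt; are assumptions for illustration; only &lt;code&gt;LABEL_CONF&lt;/code&gt; and the 0.8 default come from the snippet above.&lt;/p&gt;

```python
import os

# Per-class thresholds; labels not listed fall back to DEFAULT_CONF (assumed value).
LABEL_CONF = {"forklift": float(os.getenv("FORKLIFT_CONF", "0.8"))}
DEFAULT_CONF = 0.25


def keep_detections(detections: list[dict], label_conf: dict = LABEL_CONF) -> list[dict]:
    """Keep detections whose confidence meets the threshold for their label."""
    return [d for d in detections if d["conf"] >= label_conf.get(d["label"], DEFAULT_CONF)]


raw = [
    {"label": "forklift", "conf": 0.71},  # in the range where presses fired
    {"label": "forklift", "conf": 0.88},  # in the range where real forklifts fired
    {"label": "person", "conf": 0.40},
]
print(keep_detections(raw, {"forklift": 0.8}))
```

&lt;p&gt;The filter only removes low-confidence boxes. It says nothing about whether what remains is risky.&lt;/p&gt;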




&lt;h2&gt;
  
  
  Grounded in real regulation (RAG)
&lt;/h2&gt;

&lt;p&gt;An alert is only as good as its justification. Behind the editable house rules (&lt;code&gt;safety_policy.txt&lt;/code&gt;) sits a corpus of the &lt;em&gt;real&lt;/em&gt; references — OSHA 1910 standards, plant SOPs, chemical safety sheets. For each scene the agent retrieves the relevant regulation, and the model cites the specific standard in its verdict.&lt;/p&gt;

&lt;p&gt;The nice part: retrieval supplies &lt;em&gt;knowledge&lt;/em&gt;, not &lt;em&gt;decisions&lt;/em&gt;. A forklift whose load blocks the operator's view, which your house policy happens to pass, gets flagged the moment OSHA 1910.178 (obstructed view) is retrieved. Even then, it's still the model that decides, and cites, the risk.&lt;/p&gt;
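
&lt;p&gt;The stack uses TF-IDF for retrieval. Below is a dependency-free sketch of that idea, assuming a tiny in-memory corpus. The document ids and texts are placeholder summaries I made up for the example, not OSHA wording, and &lt;code&gt;retrieve&lt;/code&gt; is a hypothetical function rather than the contents of &lt;code&gt;rag.py&lt;/code&gt;.&lt;/p&gt;

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, corpus: dict[str, str], top_k: int = 1) -> list[str]:
    """Rank documents by cosine similarity of TF-IDF vectors and return the best ids."""
    docs = {doc_id: Counter(tokenize(text)) for doc_id, text in corpus.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for counts in docs.values() for term in counts)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

    def vector(counts: Counter) -> dict:
        # Terms never seen in the corpus carry no weight.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a: dict, b: dict) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    q = vector(Counter(tokenize(query)))
    ranked = sorted(docs, key=lambda d: cosine(q, vector(docs[d])), reverse=True)
    return ranked[:top_k]


# Placeholder corpus: ids and texts are illustrative only.
corpus = {
    "forklift_view": "forklift load blocks operator view travel with load trailing",
    "head_protection": "head protection hardhat required where falling objects",
    "chemical_storage": "chemical safety data sheet storage labeling",
}
print(retrieve("forklift carrying tall load blocking the operator view", corpus))
```

&lt;p&gt;The function returns document ids and nothing else, which keeps the retriever on the "knowledge" side of the line.&lt;/p&gt;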




&lt;h2&gt;
  
  
  The killer demo: edit one line, watch the verdict flip
&lt;/h2&gt;

&lt;p&gt;This is the whole thesis in twenty seconds.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A forklift carries a tall, view-blocking load. Under the current policy, the agent watches it and says: &lt;strong&gt;no violation&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;I add &lt;strong&gt;one line&lt;/strong&gt; to &lt;code&gt;safety_policy.txt&lt;/code&gt; — clause 2.6: &lt;em&gt;a load must not block the operator's view.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Re-run. The &lt;strong&gt;same footage&lt;/strong&gt; is now flagged &lt;strong&gt;MEDIUM&lt;/strong&gt;, and the model cites &lt;strong&gt;2.6&lt;/strong&gt; — the clause I just wrote.&lt;/li&gt;
&lt;li&gt;No code changed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's the difference between an agent and a rulebook. The behaviour came from the policy, routed through the model's reasoning — not from a branch in my code.&lt;/p&gt;
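
&lt;p&gt;Mechanically, the flip is possible because the policy text is fed to the model on every run instead of being compiled into code. A minimal sketch, assuming the prompt is assembled from the policy contents plus the measured facts; the prompt wording and &lt;code&gt;build_prompt&lt;/code&gt; are hypothetical, and a real run would read the text from &lt;code&gt;safety_policy.txt&lt;/code&gt;.&lt;/p&gt;

```python
def build_prompt(policy_text: str, facts: dict) -> str:
    """Assemble the judge prompt from the current policy text and measured facts."""
    fact_lines = "\n".join(f"- {name}: {value}" for name, value in sorted(facts.items()))
    return (
        "Apply ONLY the site policy below and cite the clause you relied on.\n\n"
        f"SITE POLICY:\n{policy_text.strip()}\n\n"
        f"MEASURED FACTS:\n{fact_lines}\n"
    )


policy = "2.1 A minimum 3-meter separation must be kept between a moving forklift and any pedestrian."
facts = {"forklift_count": 1, "person_count": 0}

before = build_prompt(policy, facts)
after = build_prompt(policy + "\n2.6 A load must not block the operator's view.", facts)

# Same function, same facts: only the policy text differs between the two prompts.
print("2.6" in before, "2.6" in after)
```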




&lt;h2&gt;
  
  
  War-story #2: making the boxes actually track the video
&lt;/h2&gt;

&lt;p&gt;The live dashboard plays the CCTV clip and draws YOLO boxes on it. My first version ran detection once per analysis window (~every 3 s) and painted those boxes on the playing video. It looked wrong immediately: the video plays at 25 fps, but the boxes stayed frozen at the moment of analysis, so they lagged the people by seconds. Boxes that don't track are worse than no boxes, because they read as a broken product.&lt;/p&gt;

&lt;p&gt;True per-frame detection over a network tunnel isn't feasible at video framerate. The fix was to flip &lt;em&gt;who&lt;/em&gt; drives detection: instead of the server pushing stale boxes, &lt;strong&gt;the browser samples the frame it is currently showing&lt;/strong&gt;, sends it to a &lt;code&gt;/api/detect&lt;/code&gt; endpoint (proxied to the GPU), and draws the returned boxes. Now the boxes line up with what's on screen, within one round-trip, and refresh a few times a second.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// grab the frame the &amp;lt;video&amp;gt; is showing right now, detect, draw&lt;/span&gt;
&lt;span class="nx"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2d&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;drawImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;video&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;height&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;boxes&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/detect&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;image_b64&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toDataURL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;image/jpeg&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nf"&gt;drawBoxes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;boxes&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The insight that generalizes: when you can't make the producer fast enough, move the sampling to the consumer that already knows the ground truth (here, the exact frame on screen).&lt;/p&gt;
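
&lt;p&gt;For completeness, here is a sketch of the server half of that exchange, written as a plain function so it runs without a web framework. It assumes the request body carries &lt;code&gt;image_b64&lt;/code&gt; and the response carries &lt;code&gt;boxes&lt;/code&gt;, as in the browser snippet above. &lt;code&gt;handle_detect&lt;/code&gt;, &lt;code&gt;fake_detector&lt;/code&gt; and the box fields are hypothetical stand-ins, and the Flask wiring is left out.&lt;/p&gt;

```python
import base64
import json


def handle_detect(body: bytes, detector) -> dict:
    """Decode the posted frame and return whatever boxes the detector reports."""
    payload = json.loads(body)
    jpeg_bytes = base64.b64decode(payload["image_b64"])
    return {"boxes": detector(jpeg_bytes)}


def fake_detector(jpeg_bytes: bytes) -> list[dict]:
    # Stand-in for the real YOLO call; returns one fixed box for illustration.
    return [{"label": "person", "conf": 0.9, "xyxy": [10, 20, 110, 220]}]


request_body = json.dumps(
    {"image_b64": base64.b64encode(b"not-a-real-jpeg").decode("ascii")}
).encode("utf-8")
print(handle_detect(request_body, fake_detector))
```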




&lt;h2&gt;
  
  
  From an agent to a product
&lt;/h2&gt;

&lt;p&gt;An alert nobody receives is useless, so SafetyCommander isn't just the loop — it's a small web app for the two people who actually use it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Workers&lt;/strong&gt; get a Slack/DingTalk-style &lt;strong&gt;inbox&lt;/strong&gt;: alerts in &lt;em&gt;their&lt;/em&gt; zone are pushed to them with Acknowledge / Resolve / Escalate actions, and their scheduled inspections and training arrive as tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managers&lt;/strong&gt; get a console: KPIs, hazard and zone charts, the corrective-action backlog, the AI weekly plan, and an alert-delivery table that closes the loop — who got each alert, and who acted on it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And it works across three horizons from the same data: &lt;strong&gt;DAY&lt;/strong&gt; (live monitor + handoff), &lt;strong&gt;WEEK&lt;/strong&gt; (an AI-proposed inspection/training plan grounded in retrieved industry cadence), and &lt;strong&gt;MONTH&lt;/strong&gt; (a KPI roll-up). The agent proposes; humans confirm.&lt;/p&gt;




&lt;h2&gt;
  
  
  Honest numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning:&lt;/strong&gt; verdicts cite the controlling clause; flipping a policy line flips the verdict.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detection:&lt;/strong&gt; 18/21 forklift precision, 3/4 recall, and a measured near-miss distance of 2.1 m. (About half of person↔forklift pairs read 0.0 m because their 2-D boxes overlap; this is reported, not hidden.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Robustness:&lt;/strong&gt; in temporal (video) mode, &lt;strong&gt;0 false criticals&lt;/strong&gt; across the demo clips; the grounding prompt plus real distances stop the model inventing hazards it can't see.&lt;/li&gt;
&lt;/ul&gt;
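
&lt;p&gt;The 0.0 m readings are easier to see with a sketch. This is one plausible way to compute a 2-D box distance, assuming the gap between axis-aligned boxes is scaled by a per-camera &lt;code&gt;meters_per_pixel&lt;/code&gt; constant. Both the function and the scale value are hypothetical, and the project's actual measurement may differ.&lt;/p&gt;

```python
import math


def box_gap_m(a: tuple, b: tuple, meters_per_pixel: float) -> float:
    """Shortest gap between two axis-aligned boxes (x1, y1, x2, y2), in meters."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)  # horizontal gap, 0 if the boxes overlap in x
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)  # vertical gap, 0 if the boxes overlap in y
    return math.hypot(dx, dy) * meters_per_pixel


person = (100, 100, 150, 250)
forklift_apart = (200, 120, 400, 300)
forklift_overlapping = (140, 120, 340, 300)

print(round(box_gap_m(person, forklift_apart, 0.042), 2))
print(box_gap_m(person, forklift_overlapping, 0.042))  # overlapping boxes give 0.0
```

&lt;p&gt;Any pair whose boxes overlap in the image collapses to 0.0 under this kind of measure, even when the person is standing well behind the forklift in depth.&lt;/p&gt;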

&lt;p&gt;None of these are "100%." Ops managers respect an honest number far more than a suspicious one.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;This started as a hackathon build and it's still moving: a served remote-perception layer (the GPU can live anywhere), a local-VLM fallback for air-gapped sites, and richer per-worker routing. But the core won't change — &lt;strong&gt;the model reasons, and the code never decides.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Code: &lt;a href="https://github.com/HumphreySun98/safety-commander-agent" rel="noopener noreferrer"&gt;github.com/HumphreySun98/safety-commander-agent&lt;/a&gt;&lt;br&gt;
Stack: Qwen3-VL on vLLM · YOLO · TF-IDF RAG over an OSHA/SOP corpus · Flask.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Built at the Zapdos Labs × Antler hackathon; in active development since.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>factory</category>
      <category>agents</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
