<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ryan Hsu</title>
    <description>The latest articles on DEV Community by Ryan Hsu (@ryan_hsu_wearedge).</description>
    <link>https://dev.to/ryan_hsu_wearedge</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3956029%2Fa0ab9f2d-13d3-4155-83a1-a27534b01118.jpg</url>
      <title>DEV Community: Ryan Hsu</title>
      <link>https://dev.to/ryan_hsu_wearedge</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ryan_hsu_wearedge"/>
    <language>en</language>
    <item>
      <title>I Ran Five Small Multimodal Models on a Jetson. The Fastest One Was Not the Best Baseline.</title>
      <dc:creator>Ryan Hsu</dc:creator>
      <pubDate>Thu, 18 Jun 2026 02:45:16 +0000</pubDate>
      <link>https://dev.to/ryan_hsu_wearedge/i-ran-five-small-multimodal-models-on-a-jetson-the-fastest-one-was-not-the-best-baseline-3de1</link>
      <guid>https://dev.to/ryan_hsu_wearedge/i-ran-five-small-multimodal-models-on-a-jetson-the-fastest-one-was-not-the-best-baseline-3de1</guid>
      <description>&lt;p&gt;I have been building WearEdge Pro, a wearable industrial edge AI runtime. Think of a frontline operator wearing a smart-glasses device, capturing a first-person image of a machine, and getting back a structured action card from a local Jetson box.&lt;/p&gt;

&lt;p&gt;The key phrase is "structured action card." This is not a chat demo. In a factory setting, an answer needs an audit trail, a mode boundary, a human-confirmation gate, and a way to hand off to maintenance, quality, EHS, or work-instruction workflows.&lt;/p&gt;

&lt;p&gt;I recently tested five compact multimodal models on the same Jetson path:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gemma 4 E2B&lt;/li&gt;
&lt;li&gt;Qwen2.5-VL-3B&lt;/li&gt;
&lt;li&gt;SmolVLM2-2.2B&lt;/li&gt;
&lt;li&gt;InternVL3-2B&lt;/li&gt;
&lt;li&gt;Qwen2.5-Omni-3B&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal was not to crown a universal benchmark champion. I wanted to know which model was the best current baseline for an industrial edge Agent runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Harness
&lt;/h2&gt;

&lt;p&gt;Every model was exposed through a local OpenAI-compatible llama.cpp endpoint on the Jetson. Each model got the same five prompts and images:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;maintenance&lt;/li&gt;
&lt;li&gt;quality inspection&lt;/li&gt;
&lt;li&gt;changeover&lt;/li&gt;
&lt;li&gt;work instruction&lt;/li&gt;
&lt;li&gt;hazard review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The main run used 560 image tokens, which matches the current WearEdge gateway budget. Qwen2.5-VL also got a 1024-image-token pass because grounding can improve with more visual tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Completion&lt;/th&gt;
&lt;th&gt;Avg latency&lt;/th&gt;
&lt;th&gt;Takeaway&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 E2B&lt;/td&gt;
&lt;td&gt;5/5&lt;/td&gt;
&lt;td&gt;37.51s raw&lt;/td&gt;
&lt;td&gt;Best product baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen2.5-VL-3B&lt;/td&gt;
&lt;td&gt;5/5&lt;/td&gt;
&lt;td&gt;39.72s&lt;/td&gt;
&lt;td&gt;Best OCR challenger&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SmolVLM2-2.2B&lt;/td&gt;
&lt;td&gt;5/5&lt;/td&gt;
&lt;td&gt;12.84s&lt;/td&gt;
&lt;td&gt;Fastest, but weak grounding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;InternVL3-2B&lt;/td&gt;
&lt;td&gt;5/5 only after ctx4096&lt;/td&gt;
&lt;td&gt;80.35s&lt;/td&gt;
&lt;td&gt;Too slow/risky for baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen2.5-Omni-3B&lt;/td&gt;
&lt;td&gt;5/5&lt;/td&gt;
&lt;td&gt;50.09s&lt;/td&gt;
&lt;td&gt;Interesting future audio/video branch&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;SmolVLM2 was the speed star. But the answers were often too generic for real operator guidance. In changeover and work-instruction tasks, it returned fields that looked more like placeholders than grounded industrial guidance.&lt;/p&gt;

&lt;p&gt;Qwen2.5-VL was the most impressive challenger. It nailed a changeover OCR task with &lt;code&gt;LABELER-FL1&lt;/code&gt; and &lt;code&gt;SKU-C500&lt;/code&gt;, where Gemma had a machine-label typo. It also produced useful IQC defect scores. If I were building a pure OCR or visual inspection assistant, I would take Qwen very seriously.&lt;/p&gt;

&lt;p&gt;InternVL3 reminded me that token speed is not the whole story. At 2048 context it failed three of five tasks with context errors. At 4096 context it finished, but the latency was high and one raw IQC answer had unsafe release-style wording.&lt;/p&gt;

&lt;p&gt;Qwen2.5-Omni ran cleanly, but its strongest value is probably a future audio/video workflow rather than this current image+text industrial baseline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Gemma Still Won
&lt;/h2&gt;

&lt;p&gt;Gemma 4 E2B did not win every micro-test. It stayed the baseline because it fit the product runtime:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local Jetson deployment&lt;/li&gt;
&lt;li&gt;structured multimodal prompts&lt;/li&gt;
&lt;li&gt;long-context workflow design&lt;/li&gt;
&lt;li&gt;function-calling-oriented architecture&lt;/li&gt;
&lt;li&gt;deterministic guards&lt;/li&gt;
&lt;li&gt;human confirmation&lt;/li&gt;
&lt;li&gt;action cards&lt;/li&gt;
&lt;li&gt;audit logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In an industrial setting, "fast and fluent" is not enough. The model has to behave inside a system that can say: this came from this image, this route, this required field, this action boundary, and this audit record.&lt;/p&gt;

&lt;p&gt;That is why Gemma remained the WearEdge baseline, while Qwen2.5-VL became the serious A/B challenger for OCR-heavy branches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson Learned
&lt;/h2&gt;

&lt;p&gt;Edge AI model selection is not just a leaderboard exercise. The right question is:&lt;/p&gt;

&lt;p&gt;Can this model run locally, understand the evidence, obey the workflow boundary, and produce an action that the system can audit?&lt;/p&gt;

&lt;p&gt;For WearEdge Pro today, the answer is Gemma 4 E2B as the baseline, Qwen2.5-VL as the next challenger, and a clear path to keep testing without pretending every benchmark cell means the same thing.&lt;/p&gt;

&lt;p&gt;Public artifact link: Benchmark results and public discussion: &lt;a href="https://www.hackster.io/ryanon2008/wearedge-pro-jetson-edge-ai-agent-50ec35" rel="noopener noreferrer"&gt;https://www.hackster.io/ryanon2008/wearedge-pro-jetson-edge-ai-agent-50ec35&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>llm</category>
      <category>computervision</category>
    </item>
    <item>
      <title>I Ran Five Small Multimodal Models on a Jetson. The Fastest One Was Not the Best Baseline.</title>
      <dc:creator>Ryan Hsu</dc:creator>
      <pubDate>Wed, 17 Jun 2026 11:55:59 +0000</pubDate>
      <link>https://dev.to/ryan_hsu_wearedge/i-ran-five-small-multimodal-models-on-a-jetson-the-fastest-one-was-not-the-best-baseline-41fm</link>
      <guid>https://dev.to/ryan_hsu_wearedge/i-ran-five-small-multimodal-models-on-a-jetson-the-fastest-one-was-not-the-best-baseline-41fm</guid>
      <description>&lt;p&gt;I have been building WearEdge Pro, a wearable industrial edge AI runtime. Think of a frontline operator wearing a smart-glasses device, capturing a first-person image of a machine, and getting back a structured action card from a local Jetson box.&lt;/p&gt;

&lt;p&gt;The key phrase is "structured action card." This is not a chat demo. In a factory setting, an answer needs an audit trail, a mode boundary, a human-confirmation gate, and a way to hand off to maintenance, quality, EHS, or work-instruction workflows.&lt;/p&gt;

&lt;p&gt;I recently tested five compact multimodal models on the same Jetson path:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gemma 4 E2B&lt;/li&gt;
&lt;li&gt;Qwen2.5-VL-3B&lt;/li&gt;
&lt;li&gt;SmolVLM2-2.2B&lt;/li&gt;
&lt;li&gt;InternVL3-2B&lt;/li&gt;
&lt;li&gt;Qwen2.5-Omni-3B&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal was not to crown a universal benchmark champion. I wanted to know which model was the best current baseline for an industrial edge Agent runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Harness
&lt;/h2&gt;

&lt;p&gt;Every model was exposed through a local OpenAI-compatible llama.cpp endpoint on the Jetson. Each model got the same five prompts and images:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;maintenance&lt;/li&gt;
&lt;li&gt;quality inspection&lt;/li&gt;
&lt;li&gt;changeover&lt;/li&gt;
&lt;li&gt;work instruction&lt;/li&gt;
&lt;li&gt;hazard review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The main run used 560 image tokens, which matches the current WearEdge gateway budget. Qwen2.5-VL also got a 1024-image-token pass because grounding can improve with more visual tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Completion&lt;/th&gt;
&lt;th&gt;Avg latency&lt;/th&gt;
&lt;th&gt;Takeaway&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 E2B&lt;/td&gt;
&lt;td&gt;5/5&lt;/td&gt;
&lt;td&gt;37.51s raw&lt;/td&gt;
&lt;td&gt;Best product baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen2.5-VL-3B&lt;/td&gt;
&lt;td&gt;5/5&lt;/td&gt;
&lt;td&gt;39.72s&lt;/td&gt;
&lt;td&gt;Best OCR challenger&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SmolVLM2-2.2B&lt;/td&gt;
&lt;td&gt;5/5&lt;/td&gt;
&lt;td&gt;12.84s&lt;/td&gt;
&lt;td&gt;Fastest, but weak grounding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;InternVL3-2B&lt;/td&gt;
&lt;td&gt;5/5 only after ctx4096&lt;/td&gt;
&lt;td&gt;80.35s&lt;/td&gt;
&lt;td&gt;Too slow/risky for baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen2.5-Omni-3B&lt;/td&gt;
&lt;td&gt;5/5&lt;/td&gt;
&lt;td&gt;50.09s&lt;/td&gt;
&lt;td&gt;Interesting future audio/video branch&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;SmolVLM2 was the speed star. But the answers were often too generic for real operator guidance. In changeover and work-instruction tasks, it returned fields that looked more like placeholders than grounded industrial guidance.&lt;/p&gt;

&lt;p&gt;Qwen2.5-VL was the most impressive challenger. It nailed a changeover OCR task with &lt;code&gt;LABELER-FL1&lt;/code&gt; and &lt;code&gt;SKU-C500&lt;/code&gt;, where Gemma had a machine-label typo. It also produced useful IQC defect scores. If I were building a pure OCR or visual inspection assistant, I would take Qwen very seriously.&lt;/p&gt;

&lt;p&gt;InternVL3 reminded me that token speed is not the whole story. At 2048 context it failed three of five tasks with context errors. At 4096 context it finished, but the latency was high and one raw IQC answer had unsafe release-style wording.&lt;/p&gt;

&lt;p&gt;Qwen2.5-Omni ran cleanly, but its strongest value is probably a future audio/video workflow rather than this current image+text industrial baseline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Gemma Still Won
&lt;/h2&gt;

&lt;p&gt;Gemma 4 E2B did not win every micro-test. It stayed the baseline because it fit the product runtime:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local Jetson deployment&lt;/li&gt;
&lt;li&gt;structured multimodal prompts&lt;/li&gt;
&lt;li&gt;long-context workflow design&lt;/li&gt;
&lt;li&gt;function-calling-oriented architecture&lt;/li&gt;
&lt;li&gt;deterministic guards&lt;/li&gt;
&lt;li&gt;human confirmation&lt;/li&gt;
&lt;li&gt;action cards&lt;/li&gt;
&lt;li&gt;audit logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In an industrial setting, "fast and fluent" is not enough. The model has to behave inside a system that can say: this came from this image, this route, this required field, this action boundary, and this audit record.&lt;/p&gt;

&lt;p&gt;That is why Gemma remained the WearEdge baseline, while Qwen2.5-VL became the serious A/B challenger for OCR-heavy branches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson Learned
&lt;/h2&gt;

&lt;p&gt;Edge AI model selection is not just a leaderboard exercise. The right question is:&lt;/p&gt;

&lt;p&gt;Can this model run locally, understand the evidence, obey the workflow boundary, and produce an action that the system can audit?&lt;/p&gt;

&lt;p&gt;For WearEdge Pro today, the answer is Gemma 4 E2B as the baseline, Qwen2.5-VL as the next challenger, and a clear path to keep testing without pretending every benchmark cell means the same thing.&lt;/p&gt;

&lt;p&gt;Public artifact link: Benchmark results and public discussion: &lt;a href="https://www.hackster.io/ryanon2008/wearedge-pro-jetson-edge-ai-agent-50ec35" rel="noopener noreferrer"&gt;https://www.hackster.io/ryanon2008/wearedge-pro-jetson-edge-ai-agent-50ec35&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aiedgeaijetsonllm</category>
    </item>
    <item>
      <title>WearEdge Pro: An OPEA Manufacturing Five-Agent Suite for Frontline Operators</title>
      <dc:creator>Ryan Hsu</dc:creator>
      <pubDate>Thu, 28 May 2026 07:49:53 +0000</pubDate>
      <link>https://dev.to/ryan_hsu_wearedge/wearedge-pro-an-opea-manufacturing-five-agent-suite-for-frontline-operators-5afh</link>
      <guid>https://dev.to/ryan_hsu_wearedge/wearedge-pro-an-opea-manufacturing-five-agent-suite-for-frontline-operators-5afh</guid>
      <description>&lt;p&gt;Manufacturing operators often see early warning signs before enterprise systems&lt;br&gt;
do: an unusual gearbox sound, a quality defect, a label changeover mismatch, a&lt;br&gt;
work-instruction question, missing PPE, or a blocked walkway. These observations&lt;br&gt;
are valuable, but they often stay trapped in verbal handoffs.&lt;/p&gt;

&lt;p&gt;WearEdge Pro packages that frontline evidence into an OPEA-aligned&lt;br&gt;
Manufacturing Agent Suite. The submitted competition artifact is not an&lt;br&gt;
Android-only application. It is a Docker-runnable Web/API package with a&lt;br&gt;
browser demo console, five agent routes, Qdrant-backed RAG, official OPEA TEI&lt;br&gt;
embedding profile, guardrails, and evaluation evidence.&lt;/p&gt;

&lt;p&gt;The five agent routes are:&lt;/p&gt;

&lt;p&gt;Agent   Workflow    Target&lt;br&gt;
maintenance Predictive maintenance from M400 evidence   maintenance_work_order&lt;br&gt;
iqc Incoming and in-process quality checks  qms_quality_event&lt;br&gt;
changeover  SKU setup and first-piece verification  changeover_checklist&lt;br&gt;
wi  Released work-instruction guidance  wi_reference&lt;br&gt;
hazard  PPE, moving-parts, and walkway observations ehs_case&lt;br&gt;
The architecture follows an OPEA-style path:&lt;/p&gt;

&lt;p&gt;M400 / API evidence&lt;br&gt;
  -&amp;gt; Gateway&lt;br&gt;
  -&amp;gt; Manufacturing Megaservice&lt;br&gt;
  -&amp;gt; route registry&lt;br&gt;
  -&amp;gt; Dataprep&lt;br&gt;
  -&amp;gt; RAG / Retriever&lt;br&gt;
  -&amp;gt; Qdrant Vector DB&lt;br&gt;
  -&amp;gt; OPEA-compatible embedding service or official OPEA TEI profile&lt;br&gt;
  -&amp;gt; LLM adapter or deterministic demo path&lt;br&gt;
  -&amp;gt; deterministic evaluator&lt;br&gt;
  -&amp;gt; guardrails&lt;br&gt;
  -&amp;gt; bounded action card&lt;br&gt;
The most important design decision is route isolation. Maintenance must not&lt;br&gt;
issue safety clearance. Hazard observations must not invent final root cause.&lt;br&gt;
Quality must not release a lot. Changeover must not grant restart permission.&lt;br&gt;
Work-instruction guidance must stay tied to released source evidence.&lt;/p&gt;

&lt;p&gt;For OPEA evidence, the repository includes:&lt;/p&gt;

&lt;p&gt;Docker Compose base profile with Qdrant and the Manufacturing Gateway;&lt;br&gt;
OPEA-compatible /v1/embeddings profile;&lt;br&gt;
official OPEA TEI profile using Hugging Face TEI, opea/embedding:latest,&lt;br&gt;
TEI_EMBEDDING_ENDPOINT, and OPEA_TEI_EMBEDDING;&lt;br&gt;
OpenAI/OPEA-compatible LLM adapter boundary;&lt;br&gt;
GenAIEval-compatible route evaluation package;&lt;br&gt;
upstream OPEA RFC, comments, and a CI-green GenAIExamples PR.&lt;br&gt;
The evaluation package includes 15 cases across the five routes and verifies:&lt;/p&gt;

&lt;p&gt;action-card contract;&lt;br&gt;
integration target correctness;&lt;br&gt;
channel correctness;&lt;br&gt;
risk-level correctness;&lt;br&gt;
human gate correctness;&lt;br&gt;
guardrail pass;&lt;br&gt;
RAG source match;&lt;br&gt;
route isolation.&lt;br&gt;
The hardware evidence was captured on Google Cloud C3 c3-standard-4: a&lt;br&gt;
single-node, 4-vCPU, 16-GiB-RAM, no-GPU Intel Xeon host exposing AVX-512 and AMX&lt;br&gt;
flags. On that class of host, WearEdge validated the deterministic five-agent&lt;br&gt;
route benchmark, Docker/Qdrant E2E, OPEA-compatible embedding profile E2E, and&lt;br&gt;
official OPEA TEI profile E2E.&lt;/p&gt;

&lt;p&gt;The public repository is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/davidmillerak2026-sys/wearedge-opea-manufacturing" rel="noopener noreferrer"&gt;https://github.com/davidmillerak2026-sys/wearedge-opea-manufacturing&lt;/a&gt;&lt;br&gt;
The upstream OPEA PR is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/opea-project/GenAIExamples/pull/2462" rel="noopener noreferrer"&gt;https://github.com/opea-project/GenAIExamples/pull/2462&lt;/a&gt;&lt;br&gt;
WearEdge is still a prototype, not a certified safety or release controller.&lt;br&gt;
The important point is the platform pattern: one OPEA-aligned manufacturing&lt;br&gt;
suite can convert frontline evidence into bounded, auditable action cards&lt;br&gt;
across maintenance, quality, changeover, work instructions, and safety.&lt;/p&gt;

</description>
      <category>opea</category>
      <category>manufacturing</category>
      <category>rag</category>
      <category>edgeai</category>
    </item>
  </channel>
</rss>
