<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Daisuke Majima</title>
    <description>The latest articles on DEV Community by Daisuke Majima (@john-rocky).</description>
    <link>https://dev.to/john-rocky</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3963789%2F963e7445-7057-4827-8fed-3aa6d9d42dfc.png</url>
      <title>DEV Community: Daisuke Majima</title>
      <link>https://dev.to/john-rocky</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/john-rocky"/>
    <language>en</language>
    <item>
      <title>iPhone on-device LLM: the GPU wins the sprint, the Neural Engine wins the marathon</title>
      <dc:creator>Daisuke Majima</dc:creator>
      <pubDate>Thu, 04 Jun 2026 09:38:23 +0000</pubDate>
      <link>https://dev.to/john-rocky/iphone-on-device-llm-the-gpu-wins-the-sprint-the-neural-engine-wins-the-marathon-42lo</link>
      <guid>https://dev.to/john-rocky/iphone-on-device-llm-the-gpu-wins-the-sprint-the-neural-engine-wins-the-marathon-42lo</guid>
      <description>&lt;p&gt;&lt;em&gt;The follow-up to my &lt;a href="https://rockyshikoku.medium.com/local-llm-on-iphone-which-runtime-is-actually-fastest-58096685481e" rel="noopener noreferrer"&gt;on-device runtime speed benchmark&lt;/a&gt; — because burst tok/s only tells half the story.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I benchmark on-device LLMs on iPhone, and shipping real apps I kept noticing the GPU runtimes start fast and then fade. So I measured &lt;strong&gt;decode rate over time&lt;/strong&gt;, from a cold start, over &lt;strong&gt;10 minutes of continuous generation&lt;/strong&gt;. Same conditions across all runtimes: &lt;strong&gt;same model (Gemma 4 E2B, 4-bit), same iPhone 17 Pro (A19 Pro), cold start.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The GPU runtimes (MLX, LiteRT-LM) that crush the burst test &lt;strong&gt;collapse 50%+ within ~60 seconds&lt;/strong&gt; as the phone heats. The slowest starter — the &lt;strong&gt;Apple Neural Engine (CoreML)&lt;/strong&gt; — barely throttles and overtakes them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decode rate: cold burst vs sustained (Gemma 4 E2B 4-bit)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2exqsfhk8k4thazn7t7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2exqsfhk8k4thazn7t7z.png" alt="Sustained decode throttling — iPhone 17 Pro" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Runtime (compute)&lt;/th&gt;
&lt;th&gt;Burst tok/s&lt;/th&gt;
&lt;th&gt;Sustained (10 min)&lt;/th&gt;
&lt;th&gt;Retained&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CoreML / ANE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;22&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;67%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MLX / GPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;38%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LiteRT-LM / GPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;56&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;48%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The GPU runtimes fall to 38% / 48% of their own peak after 10 minutes. The ANE holds &lt;strong&gt;67%&lt;/strong&gt; — ending up &lt;strong&gt;faster than MLX outright&lt;/strong&gt;, and shrinking LiteRT's lead from &lt;strong&gt;+70%&lt;/strong&gt; to &lt;strong&gt;+23%&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How fast they fade (vs each runtime's own peak)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                 -10%      -25%     floor
MLX / GPU          5s        35s     ~18 tok/s
LiteRT / GPU      13s        40s     ~27 tok/s
CoreML / ANE      93s       390s     ~22 tok/s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The GPU runtimes are down 25% in well under a minute. The ANE takes over 90 seconds just to lose 10%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why — it's a power story
&lt;/h2&gt;

&lt;p&gt;iOS won't give you per-subsystem watts, and over a long run the phone throttles every backend down to the &lt;em&gt;same&lt;/em&gt; sustainable thermal envelope — so iPhone battery-delta can't separate them. But the &lt;strong&gt;same model on Mac (M4 Max)&lt;/strong&gt;, where &lt;code&gt;powermetrics&lt;/code&gt; exposes package power, shows the cause cleanly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rpky6vz6lkj7f7i0n7l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rpky6vz6lkj7f7i0n7l.png" alt="Package power per compute unit — M4 Max" width="800" height="511"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CoreML / ANE: 12.7 W&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MLX / GPU: 24.7 W&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;llama.cpp / GPU: 24.5 W&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;ANE path draws ~half the GPU's power&lt;/strong&gt; at full decode. Lower power → heats slower → the thermally-constrained iPhone doesn't have to throttle it. Two &lt;strong&gt;independent&lt;/strong&gt; GPU runtimes — Apple's MLX and Google's LiteRT-LM — collapsing the &lt;em&gt;same way&lt;/em&gt; says this is a &lt;strong&gt;GPU-thermal property of the phone&lt;/strong&gt;, not a quirk of one runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaway: GPU for the sprint, ANE for the marathon
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quick chat reply&lt;/strong&gt; → the GPU wins outright; burst speed is the experience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sustained load&lt;/strong&gt; (long generation, agentic loops, batch/background jobs) → the GPU's burst lead largely evaporates. MLX ends up &lt;em&gt;slower&lt;/em&gt; than the ANE; LiteRT keeps only a slim lead after shedding half its speed.&lt;/li&gt;
&lt;li&gt;And the ANE draws &lt;strong&gt;~half the power&lt;/strong&gt; &lt;em&gt;and&lt;/em&gt; leaves the GPU free for the rest of the app (rendering, camera, other ML).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the real case for running an LLM on the ANE, even though its &lt;em&gt;peak&lt;/em&gt; decode is lower.&lt;/p&gt;

&lt;h2&gt;
  
  
  Caveats (so you can poke holes)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Burst&lt;/strong&gt; = one cold ~128-token generation. &lt;strong&gt;Sustained&lt;/strong&gt; = a 600 s re-prompt loop, decode rate from a rolling window. The in-loop re-prompt overhead shaves a little off the absolute rate (more for CoreML, whose prefill is relatively costlier), so I quote burst from the clean single shot.&lt;/li&gt;
&lt;li&gt;iOS only exposes battery in &lt;strong&gt;1% steps&lt;/strong&gt;, and under sustained load the SoC pins every backend to the same thermal-budget power. The &lt;strong&gt;power chart is measured on Mac&lt;/strong&gt; (M4 Max, &lt;code&gt;powermetrics&lt;/code&gt;, same model) — the mechanism behind the iPhone throttling, on a device where per-unit watts are observable. The throttle curves themselves are iPhone-measured and clean.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LiteRT-LM&lt;/strong&gt; has no output-token cap, so its per-call generations run longer than the 128 used for CoreML/MLX; and that one run happened to start at &lt;code&gt;fair&lt;/code&gt; rather than &lt;code&gt;nominal&lt;/code&gt; thermal (its burst was unaffected).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CoreML-LLM uses sliding-window attention&lt;/strong&gt;, part of why its decode cost stays flat (bounded context) — it trades some long-range recall for that.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Repo (raw data + scripts)
&lt;/h2&gt;

&lt;p&gt;Raw JSONL for all three runs and the script that draws these curves: &lt;a href="https://github.com/john-rocky/apple-silicon-llm-bench" rel="noopener noreferrer"&gt;https://github.com/john-rocky/apple-silicon-llm-bench&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Does this match what you see on &lt;strong&gt;Android&lt;/strong&gt; (Snapdragon Hexagon / Tensor NPU vs the GPU)? And for &lt;em&gt;your&lt;/em&gt; workload — does the GPU's burst advantage or the NPU's endurance matter more?&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ios</category>
      <category>apple</category>
      <category>llm</category>
    </item>
    <item>
      <title>Filling object detection's last mile: peaceofcake (D-FINE &amp; RF-DETR, Apache 2.0)</title>
      <dc:creator>Daisuke Majima</dc:creator>
      <pubDate>Tue, 02 Jun 2026 23:37:28 +0000</pubDate>
      <link>https://dev.to/john-rocky/filling-object-detections-last-mile-peaceofcake-d-fine-rf-detr-apache-20-405d</link>
      <guid>https://dev.to/john-rocky/filling-object-detections-last-mile-peaceofcake-d-fine-rf-detr-apache-20-405d</guid>
      <description>&lt;h2&gt;
  
  
  What happened to object detection in 2024
&lt;/h2&gt;

&lt;p&gt;YOLOv1 stunned the world in 2016. Eight years on, detection accuracy has improved dramatically.&lt;/p&gt;

&lt;p&gt;But let me ask one thing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can you actually embed the latest object-detection model into your own app?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You've read the paper. You ran the demo. But the moment it becomes "train on my data and run it on iPhone," you stall. Writing dozens of lines of YAML config, resolving dependency conflicts, writing an export script from scratch, building an inference pipeline in Swift — the road is longer than you'd think.&lt;/p&gt;

&lt;p&gt;In 2024, DETR-based detection entered a new phase. Two models — &lt;strong&gt;D-FINE&lt;/strong&gt; and &lt;strong&gt;RF-DETR&lt;/strong&gt; — achieved breakthroughs in both accuracy and real-time performance. But to deliver those benefits to a real product, a "last mile" remains.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;peaceofcake&lt;/strong&gt; is a library to fill that mile.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;peaceofcake
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It starts in a single line.&lt;/p&gt;

&lt;h2&gt;
  
  
  Object detection in 3 lines
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;peaceofcake&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DFINE&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DFINE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dfine-l-coco&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Weights are auto-downloaded and cached. It runs on GPU if you have one, CPU otherwise. Results come back as a structured object with bounding boxes, class labels, and confidence scores.&lt;/p&gt;

&lt;p&gt;RF-DETR uses the same interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;peaceofcake&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RFDETR&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RFDETR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rfdetr-l-coco&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Different model, same API. That's the core of peaceofcake's design philosophy.&lt;/p&gt;

&lt;h2&gt;
  
  
  D-FINE — from "fuzzy" to "sharp"
&lt;/h2&gt;

&lt;p&gt;D-FINE's full name is "DETR with Fine-grained Distribution Refinement." As the name suggests, the heart of this model is &lt;strong&gt;how it predicts bounding boxes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Conventional DETR-family models try to &lt;strong&gt;nail the location in one shot&lt;/strong&gt; as four numbers (x, y, w, h). By analogy: asked "how tall is that building?", you instantly answer "37.2 meters!"&lt;/p&gt;

&lt;p&gt;D-FINE is different. First it produces a &lt;strong&gt;probability distribution&lt;/strong&gt; ("maybe 30–40 meters"), then narrows ("35–38 meters"), and finally arrives at "37.2 meters." It progressively sharpens a fuzzy estimate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;coarse distribution → refine → refine more → final prediction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this &lt;strong&gt;Fine-grained Distribution Refinement&lt;/strong&gt;, D-FINE achieves accuracy beyond RT-DETR at real-time inference speed.&lt;/p&gt;

&lt;p&gt;The architecture is clear:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;image → HGNetv2 (Backbone) → HybridEncoder → DFINE Decoder → detections
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The HGNetv2 backbone comes in 5 sizes (Nano/Small/Medium/Large/XLarge), covering everything from lightweight mobile inference to accuracy-first server-side, with one architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  RF-DETR — the "foundation model" wind reaches detection
&lt;/h2&gt;

&lt;p&gt;RF-DETR is a "Real-Time Foundational Object Detection" model Roboflow released in late 2024.&lt;/p&gt;

&lt;p&gt;Note the word "Foundational." Just as "foundation models" revolutionized the LLM world, that wave is reaching object detection: pretrain general detection ability on massive data, then fine-tune on a small amount of domain-specific data. RF-DETR brought that paradigm to real-time detection.&lt;/p&gt;

&lt;p&gt;peaceofcake makes these two state-of-the-art models usable through the &lt;strong&gt;same 3-line API.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "yet another wrapper" is needed
&lt;/h2&gt;

&lt;p&gt;The world is full of model-wrapper libraries. So what's different about peaceofcake?&lt;/p&gt;

&lt;p&gt;The answer is &lt;strong&gt;scope.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Many wrappers stop at "make inference easy." peaceofcake doesn't. It covers the full product-development cycle — &lt;strong&gt;train → export → on-device mobile inference&lt;/strong&gt; — in one package.&lt;/p&gt;

&lt;h3&gt;
  
  
  Training: it eats YOLO format as-is
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DFINE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dfine-m-coco&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dataset.yaml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Annotate in Roboflow or Label Studio, export in YOLO format, and pass it straight in. Automatic conversion to COCO format runs internally, and training-schedule scaling, EMA, and AMP are applied transparently.&lt;/p&gt;

&lt;p&gt;COCO-format datasets work as-is too. You don't have to think about the format.&lt;/p&gt;

&lt;h3&gt;
  
  
  Export: three formats from one method
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;onnx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                          &lt;span class="c1"&gt;# server-side inference
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coreml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;precision&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FLOAT16&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# iOS / macOS
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tensorrt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                       &lt;span class="c1"&gt;# NVIDIA GPU optimization
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On CoreML export, denoising mechanisms and auxiliary heads needed only at training time are automatically removed. The output is &lt;code&gt;confidence&lt;/code&gt; + &lt;code&gt;coordinates&lt;/code&gt;, directly compatible with iOS's Vision framework. NMS is done outside the model, so you can change the confidence threshold from the UI in real time.&lt;/p&gt;

&lt;h3&gt;
  
  
  There's a CLI too
&lt;/h3&gt;

&lt;p&gt;Without writing Python, you can access all features from the terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;poc predict &lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;photo.jpg &lt;span class="nv"&gt;conf&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.3
poc train &lt;span class="nv"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dfine-m-coco &lt;span class="nv"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my_dataset.yaml &lt;span class="nv"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;100
poc &lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dfine-l-coco &lt;span class="nv"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;coreml &lt;span class="nv"&gt;precision&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;FLOAT16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Design highlights — where to hide the complexity
&lt;/h2&gt;

&lt;p&gt;Three points in peaceofcake's internal design feel especially clever.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Extensibility via the Strategy pattern
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DFINE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@property&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;task_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;detect&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;predictor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DFINEPredictor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trainer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DFINETrainer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exporter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DFINEExporter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DFINEValidator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;BaseModel&lt;/code&gt; receives &lt;code&gt;predict()&lt;/code&gt;, &lt;code&gt;train()&lt;/code&gt;, &lt;code&gt;export()&lt;/code&gt;, &lt;code&gt;val()&lt;/code&gt; calls and dispatches to the right class via &lt;code&gt;task_map&lt;/code&gt;. D-FINE and RF-DETR have completely different Predictor/Trainer/Exporter implementations, but the API the user sees is identical. If segmentation or pose tasks are added later, you just add an entry to &lt;code&gt;task_map&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Lazy imports for a light startup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# peaceofcake/__init__.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__getattr__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RFDETR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;peaceofcake.models.rfdetr&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RFDETR&lt;/span&gt;
        &lt;span class="nf"&gt;globals&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RFDETR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RFDETR&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;RFDETR&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;AttributeError&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;RF-DETR depends on the transformers library, which takes seconds just to import. peaceofcake uses &lt;code&gt;__getattr__&lt;/code&gt; to &lt;strong&gt;defer the import until RFDETR is actually referenced.&lt;/strong&gt; Users of only D-FINE are fine even without transformers installed. The &lt;code&gt;src.data&lt;/code&gt;, &lt;code&gt;src.optim&lt;/code&gt;, &lt;code&gt;src.nn.criterion&lt;/code&gt; needed only for training are also imported inside &lt;code&gt;train()&lt;/code&gt; — if you only do inference, those modules are never loaded.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. "Intent inference" for model names
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nc"&gt;DFINE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dfine-l-coco&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c1"&gt;# registry name → auto-download
&lt;/span&gt;&lt;span class="nc"&gt;DFINE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path/to/custom.pth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;# local file
&lt;/span&gt;&lt;span class="nc"&gt;DFINE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dfine_l_coco.pth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;# filename only → matched against registry
&lt;/span&gt;&lt;span class="nc"&gt;DFINE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dfine-n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                 &lt;span class="c1"&gt;# no weights → random init
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a single string, multi-stage fallback runs: registry match → local-file check → filename match → size estimation. For a local checkpoint, it tries to infer model size from the filename, and if it still can't, it &lt;strong&gt;back-calculates from the parameter count&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;numel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;6_000_000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;     &lt;span class="c1"&gt;# Nano
&lt;/span&gt;&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;15_000_000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Small
&lt;/span&gt;&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;25_000_000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Medium
&lt;/span&gt;&lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So that the user can "just pass a path without thinking," all this inference logic runs behind the scenes.&lt;/p&gt;

&lt;h2&gt;
  
  
  "Field wisdom" carved into the commit log
&lt;/h2&gt;

&lt;p&gt;There's trial-and-error in the commit log that you can't see by reading the code alone — real lessons of bringing research code into production.&lt;/p&gt;

&lt;h3&gt;
  
  
  CUDA hangs
&lt;/h3&gt;

&lt;p&gt;Early in training, the model can output bounding boxes with negative width or height. The original D-FINE implementation detected this with an &lt;code&gt;assert&lt;/code&gt;. On CPU that just throws. But an assert failure inside a CUDA kernel &lt;strong&gt;hangs the entire GPU device.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;peaceofcake replaced it with &lt;code&gt;clamp&lt;/code&gt;. But the first fix used an in-place operation and broke the autograd graph. The second commit finally fixed it properly. &lt;strong&gt;Fix one bug and another shows up.&lt;/strong&gt; That's the reality of putting research code into battle.&lt;/p&gt;

&lt;h3&gt;
  
  
  YOLO format has no "spec"
&lt;/h3&gt;

&lt;p&gt;YOLO format has a de facto standard but no strict spec. How to specify paths (&lt;code&gt;train: images/train&lt;/code&gt; vs &lt;code&gt;path: ./&lt;/code&gt; + &lt;code&gt;train: images/train&lt;/code&gt;), blank lines in label files, how to write the class count — each tool differs subtly. To support all of Roboflow/Ultralytics/Label Studio output, peaceofcake spends over 50 lines on path-resolution logic alone.&lt;/p&gt;

&lt;h3&gt;
  
  
  Embed class names in the checkpoint
&lt;/h3&gt;

&lt;p&gt;When distributing a model trained on a custom dataset, the class-name info tends to get lost. peaceofcake embeds a &lt;code&gt;class_names&lt;/code&gt; key directly in the &lt;code&gt;.pth&lt;/code&gt; checkpoint. No separate metadata file to manage.&lt;/p&gt;

&lt;h2&gt;
  
  
  How many miles from Python to device?
&lt;/h2&gt;

&lt;p&gt;Let's lay out the distance to getting an object-detection model "usable" in a table.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Traditional workflow&lt;/th&gt;
&lt;th&gt;peaceofcake&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Install&lt;/td&gt;
&lt;td&gt;clone repo, set up env&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pip install peaceofcake&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inference&lt;/td&gt;
&lt;td&gt;write config, run script&lt;/td&gt;
&lt;td&gt;3 lines of Python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training&lt;/td&gt;
&lt;td&gt;convert data, tune config&lt;/td&gt;
&lt;td&gt;&lt;code&gt;model.train(data="data.yaml")&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Export&lt;/td&gt;
&lt;td&gt;write a dedicated script&lt;/td&gt;
&lt;td&gt;&lt;code&gt;model.export("coreml")&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iOS device&lt;/td&gt;
&lt;td&gt;implement inference pipeline from scratch&lt;/td&gt;
&lt;td&gt;build &lt;code&gt;DFINEDemo/&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;At every one of the 5 steps, it's a one-liner or pre-built code.&lt;/strong&gt; That's what filling the "last mile" means.&lt;/p&gt;

&lt;h2&gt;
  
  
  All Apache 2.0 — the shortest path to commercial use
&lt;/h2&gt;

&lt;p&gt;When embedding object detection into a product, &lt;strong&gt;license&lt;/strong&gt; matters as much as accuracy and API ergonomics.&lt;/p&gt;

&lt;p&gt;This is where many developers trip up. The latest YOLO series (Ultralytics YOLO) is AGPL-3.0; commercial use requires buying a paid license. A great model right in front of you, and you give up on adopting it because of the license wall — many have had that experience.&lt;/p&gt;

&lt;p&gt;The D-FINE and RF-DETR that peaceofcake adopts are &lt;strong&gt;both Apache 2.0.&lt;/strong&gt; peaceofcake itself is Apache 2.0 too. So the library, the models, and the trained weights are &lt;strong&gt;all free for commercial use.&lt;/strong&gt; Modify, redistribute, embed — anything goes. A patent clause is included too, reducing patent-litigation risk from contributors.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Library&lt;/th&gt;
&lt;th&gt;Model license&lt;/th&gt;
&lt;th&gt;Commercial use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ultralytics (YOLO)&lt;/td&gt;
&lt;td&gt;AGPL-3.0&lt;/td&gt;
&lt;td&gt;paid license required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;peaceofcake (D-FINE)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Apache 2.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;free&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;peaceofcake (RF-DETR)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Apache 2.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;free&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A startup embedding it in their product. A contractor delivering to a client. Integrating it into embedded-device firmware. In every scenario, license cost is zero.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Beats YOLO on accuracy, and the license is completely free.&lt;/strong&gt; That's another reason to choose DETR-based models.&lt;/p&gt;

&lt;h2&gt;
  
  
  The future of object detection, and the role of tools
&lt;/h2&gt;

&lt;p&gt;Computer vision's history is shifting from a "race for accuracy" to a "race for accessibility."&lt;/p&gt;

&lt;p&gt;Since AlexNet in 2012, improving ImageNet accuracy was research's main battlefield. In detection too, raising COCO-benchmark mAP was seen as the primary contribution of a paper. But in 2024, as D-FINE and RF-DETR showed, DETR-based models have reached the stage of combining sufficient accuracy with real-time performance.&lt;/p&gt;

&lt;p&gt;The next axis of competition is &lt;strong&gt;"anyone can use it."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ultralytics (YOLOv8/YOLO11) was the pioneer, building an ecosystem where &lt;code&gt;pip install ultralytics&lt;/code&gt; gives access to all of YOLO's features. But the AGPL license is a hurdle for commercial use. peaceofcake delivers the "easy to use" experience Ultralytics established — with &lt;strong&gt;Apache 2.0 DETR-based models.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In an era where model accuracy commoditizes, the source of differentiation moves to "how fast and how freely you can embed it into a product." The time and cost from a researcher posting to arXiv to an app developer delivering that model into users' hands — driving both toward zero. That's the role tools like peaceofcake play.&lt;/p&gt;

&lt;p&gt;Object detection isn't hard anymore. And it isn't expensive anymore.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;peaceofcake
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A piece of cake. Peace 🍰&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Repository&lt;/strong&gt;: &lt;a href="https://github.com/john-rocky/peaceofcake" rel="noopener noreferrer"&gt;peaceofcake&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;PyPI&lt;/strong&gt;: &lt;a href="https://pypi.org/project/peaceofcake/" rel="noopener noreferrer"&gt;peaceofcake&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;License&lt;/strong&gt;: Apache 2.0&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published in Japanese on &lt;a href="https://qiita.com/john-rocky/items/f3e59a73e4cc365875db" rel="noopener noreferrer"&gt;Qiita&lt;/a&gt;. &lt;a href="https://github.com/john-rocky" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; / &lt;a href="https://twitter.com/JackdeS11" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>computervision</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How to use the TensorFlow Object Detection API (inference, with Colab)</title>
      <dc:creator>Daisuke Majima</dc:creator>
      <pubDate>Tue, 02 Jun 2026 23:37:26 +0000</pubDate>
      <link>https://dev.to/john-rocky/how-to-use-the-tensorflow-object-detection-api-inference-with-colab-651</link>
      <guid>https://dev.to/john-rocky/how-to-use-the-tensorflow-object-detection-api-inference-with-colab-651</guid>
      <description>&lt;p&gt;This article shows how to use the TensorFlow Object Detection API (the inference part). You can do it in Colab.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://colab.research.google.com/drive/1bb2QfeMEWomL-UA_Y68QqbNA0GnXVQoV?usp=sharing" rel="noopener noreferrer"&gt;Colab sample&lt;/a&gt; — run the cells in order to try the TensorFlow Object Detection API. Change &lt;code&gt;Image_Path&lt;/code&gt; in the last cell to your own image to detect objects in it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The official TensorFlow Model Zoo has many kinds of models.&lt;/p&gt;

&lt;p&gt;(For training a model: &lt;a href="https://qiita.com/john-rocky/items/eb962f9aea438dc4eb75" rel="noopener noreferrer"&gt;Train an object detection model with the TensorFlow Object Detection API&lt;/a&gt;)&lt;br&gt;
(For quick training with just a few images: &lt;a href="https://qiita.com/john-rocky/items/6ad6aeed65fa71f92e1b" rel="noopener noreferrer"&gt;Quick-train an object detection model with the TensorFlow Object Detection API&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flny5jtmqt5jwetk0j8vd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flny5jtmqt5jwetk0j8vd.png" width="600" height="747"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Steps
&lt;/h2&gt;
&lt;h3&gt;
  
  
  0. Install TensorFlow 2
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;U&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;pre&lt;/span&gt; &lt;span class="n"&gt;tensorflow&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2.2.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  1. Clone the official TensorFlow Models from GitHub
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt;

&lt;span class="c1"&gt;# If "models" is in the current directory path, move there. Otherwise clone it.
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;models&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;models&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;..&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;models&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
  &lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;clone&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;github&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;tensorflow&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  2. Install the Object Detection API and required modules
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;%%&lt;/span&gt;&lt;span class="n"&gt;bash&lt;/span&gt; &lt;span class="c1"&gt;# enable bash commands
&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="n"&gt;protoc&lt;/span&gt; &lt;span class="n"&gt;object_detection&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;protos&lt;/span&gt;&lt;span class="o"&gt;/*&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;proto&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;python_out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;cp&lt;/span&gt; &lt;span class="n"&gt;object_detection&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;packages&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;tf2&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  3. Import modules
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;scipy.misc&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;six&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BytesIO&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ImageDraw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ImageFont&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tensorflow&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;object_detection.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;label_map_util&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;object_detection.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;config_util&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;object_detection.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;visualization_utils&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;viz_utils&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;object_detection.builders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;model_builder&lt;/span&gt;

&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;matplotlib&lt;/span&gt; &lt;span class="n"&gt;inline&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  4. Image-loading function
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_image_into_numpy_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Load an image into a numpy array.

  Puts image into numpy array to feed into tensorflow graph.
  Note that by convention we put it into a numpy array with shape
  (height, width, channels), where channels=3 for RGB.

  Args:
    path: the file path to the image

  Returns:
    uint8 numpy array with shape (img_height, img_width, 3)
  &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
  &lt;span class="n"&gt;img_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;BytesIO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img_data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;im_width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;im_height&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getdata&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;im_height&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;im_width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_keypoint_tuples&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eval_config&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Return a tuple list of keypoint edges from the eval config.

  Args:
    eval_config: an eval config containing the keypoint edges

  Returns:
    a list of edge tuples, each in the format (start, end)
  &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
  &lt;span class="n"&gt;tuple_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="n"&gt;kp_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;eval_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keypoint_edge&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;kp_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tuple_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tuple_list&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  5. Download a model
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;wget&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;download&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tensorflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;org&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;object_detection&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;tf2&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;20200713&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;centernet_hg104_512x512_coco17_tpu&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;8.&lt;/span&gt;&lt;span class="n"&gt;tar&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;tar&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;xf&lt;/span&gt; &lt;span class="n"&gt;centernet_hg104_512x512_coco17_tpu&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;8.&lt;/span&gt;&lt;span class="n"&gt;tar&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gz&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Download any model you like from the &lt;a href="https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md" rel="noopener noreferrer"&gt;official Model Zoo&lt;/a&gt;. Hover over a model name there to see its download URL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffm0iillzjp4i4tzw6mig.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffm0iillzjp4i4tzw6mig.png" width="800" height="931"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's fun just looking at the performance comparisons. Once download and extraction finish, you get a folder containing &lt;code&gt;checkpoint&lt;/code&gt;, &lt;code&gt;saved_model&lt;/code&gt;, and &lt;code&gt;pipeline.config&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbnr03m676xcdey1p74xx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbnr03m676xcdey1p74xx.png" width="628" height="200"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  6. Read the pipeline config and build the model
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Path to the config file. The repo has a folder of config files, but the model
# names are slightly abbreviated, so the downloaded one is more reliable.
&lt;/span&gt;&lt;span class="n"&gt;pipeline_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./centernet_hg104_512x512_coco17_tpu-8/pipeline.config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="c1"&gt;# Path to the checkpoint
&lt;/span&gt;&lt;span class="n"&gt;model_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./centernet_hg104_512x512_coco17_tpu-8/checkpoint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Load the model config
&lt;/span&gt;&lt;span class="n"&gt;configs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config_util&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_configs_from_pipeline_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pipeline_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;configs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Build the model from the loaded config
&lt;/span&gt;&lt;span class="n"&gt;detection_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="n"&gt;model_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;is_training&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Restore weights from the checkpoint
&lt;/span&gt;&lt;span class="n"&gt;ckpt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Checkpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;detection_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ckpt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;restore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ckpt-0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;expect_partial&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  7. Prepare the inference function
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_model_detection_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get a tf.function for detection.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

  &lt;span class="nd"&gt;@tf.function&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;detect_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Detect objects in image.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shapes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;preprocess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;prediction_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shapes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;detections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;postprocess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prediction_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shapes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prediction_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shapes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;detect_fn&lt;/span&gt;

&lt;span class="n"&gt;detect_fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_model_detection_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;detection_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  8. Prepare labels
&lt;/h3&gt;

&lt;p&gt;Inference needs the object labels the model was trained on. The labels are in the official repo at &lt;code&gt;models/research/object_detection/data/&lt;/code&gt;. This model was trained on COCO, so we use &lt;code&gt;mscoco_label_map.pbtxt&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;label_map_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./models/research/object_detection/data/mscoco_label_map.pbtxt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;label_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;label_map_util&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_labelmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label_map_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;categories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;label_map_util&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert_label_map_to_categories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;label_map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_num_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;label_map_util&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_max_label_map_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label_map&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;use_display_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;category_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;label_map_util&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_category_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;label_map_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;label_map_util&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_label_map_dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label_map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;use_display_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  9. Run detection on your image
&lt;/h3&gt;

&lt;p&gt;Upload any image to Colab and set its path as &lt;code&gt;image_path&lt;/code&gt;. By the way, images with an alpha channel seem to need converting to 3 channels first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;image_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;models/research/object_detection/test_images/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;image_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;image2.jpg&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;image_np&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_image_into_numpy_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Things to try:
# Flip horizontally
# image_np = np.fliplr(image_np).copy()
&lt;/span&gt;
&lt;span class="c1"&gt;# Convert image to grayscale
# image_np = np.tile(
#     np.mean(image_np, 2, keepdims=True), (1, 1, 3)).astype(np.uint8)
&lt;/span&gt;
&lt;span class="n"&gt;input_tensor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert_to_tensor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expand_dims&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_np&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;predictions_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shapes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detect_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_tensor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;label_id_offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;image_np_with_detections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image_np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Use keypoints if available in detections
&lt;/span&gt;&lt;span class="n"&gt;keypoints&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keypoint_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;detection_keypoints&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;keypoints&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;detection_keypoints&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="n"&gt;keypoint_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;detection_keypoint_scores&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;viz_utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;visualize_boxes_and_labels_on_image_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="n"&gt;image_np_with_detections&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;detection_boxes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;detection_classes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;label_id_offset&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;detection_scores&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="n"&gt;category_index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;use_normalized_coordinates&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;max_boxes_to_draw&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;min_score_thresh&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;agnostic_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;keypoints&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;keypoints&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;keypoint_scores&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;keypoint_scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;keypoint_edges&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;get_keypoint_tuples&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;configs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;eval_config&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imshow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_np_with_detections&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flny5jtmqt5jwetk0j8vd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flny5jtmqt5jwetk0j8vd.png" width="600" height="747"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Boxes, labels, and confidence scores are displayed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd79i2kt1dz4lyjbo87zq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd79i2kt1dz4lyjbo87zq.png" width="600" height="555"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published in Japanese on &lt;a href="https://qiita.com/john-rocky/items/7d9176d16617c66fb3f1" rel="noopener noreferrer"&gt;Qiita&lt;/a&gt;. I build apps with Core ML and write about machine learning. &lt;a href="https://github.com/john-rocky" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; / &lt;a href="https://twitter.com/JackdeS11" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>tensorflow</category>
      <category>machinelearning</category>
      <category>computervision</category>
    </item>
    <item>
      <title>Brightening dark images with machine learning (GLADNet)</title>
      <dc:creator>Daisuke Majima</dc:creator>
      <pubDate>Tue, 02 Jun 2026 23:32:24 +0000</pubDate>
      <link>https://dev.to/john-rocky/brightening-dark-images-with-machine-learning-gladnet-4nok</link>
      <guid>https://dev.to/john-rocky/brightening-dark-images-with-machine-learning-gladnet-4nok</guid>
      <description>&lt;h2&gt;
  
  
  How to use a machine-learning model that brightens low-light images
&lt;/h2&gt;

&lt;p&gt;Using a model called GLADNet, we brighten dark images. It's easy to run in Python.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewqktm5gqdn4e8ban28v.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewqktm5gqdn4e8ban28v.jpeg" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpe2mrv9kj4io4nbdkw47.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpe2mrv9kj4io4nbdkw47.jpeg" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2e5xjwqbea3xuswivr8w.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2e5xjwqbea3xuswivr8w.jpeg" width="800" height="1200"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3v35ixr7xm4bd8v6kxlv.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3v35ixr7xm4bd8v6kxlv.jpeg" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  I want to extract information from dark images
&lt;/h2&gt;

&lt;p&gt;If you can brighten images that came out too dark, or footage from night-time surveillance or dashcams, you can sometimes extract useful information.&lt;/p&gt;

&lt;p&gt;But simply adjusting brightness with a filter doesn't add any information.&lt;/p&gt;
&lt;h2&gt;
  
  
  GLADNet brightens them cleanly
&lt;/h2&gt;

&lt;p&gt;Just feed an image into GLADNet and a bright image comes back. It really looks like it was actually shot brighter.&lt;/p&gt;
&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;p&gt;Clone the GLADNet repository and run with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;use_gpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;gpu_idx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;gpu_mem&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;phase&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;test_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;input_images&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;save_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfqrz7e0pp3pionltkwt.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfqrz7e0pp3pionltkwt.jpeg" width="800" height="600"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0govpc2ru00grtijh1n.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0govpc2ru00grtijh1n.jpeg" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  For various uses
&lt;/h2&gt;

&lt;p&gt;With techniques like this you can capture the information you need without missing it — and even "ghost photos" stop being scary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fik2nk5vg15qzcixyhh80.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fik2nk5vg15qzcixyhh80.jpeg" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffp3gnfrjwasgx9msymqe.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffp3gnfrjwasgx9msymqe.jpeg" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu1zjtnzvyz1vzcolswrt.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu1zjtnzvyz1vzcolswrt.jpeg" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgttadc1b4pga1coicpij.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgttadc1b4pga1coicpij.jpeg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published in Japanese on &lt;a href="https://qiita.com/john-rocky/items/f8150b3b976469b0fccc" rel="noopener noreferrer"&gt;Qiita&lt;/a&gt;. I build apps with Core ML and ARKit and write about ML/AR. &lt;a href="https://github.com/john-rocky" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; / &lt;a href="https://twitter.com/JackdeS11" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>tensorflow</category>
      <category>imageprocessing</category>
    </item>
    <item>
      <title>Resuming training in PyTorch: save and load the optimizer too</title>
      <dc:creator>Daisuke Majima</dc:creator>
      <pubDate>Tue, 02 Jun 2026 23:32:23 +0000</pubDate>
      <link>https://dev.to/john-rocky/resuming-training-in-pytorch-save-and-load-the-optimizer-too-9kj</link>
      <guid>https://dev.to/john-rocky/resuming-training-in-pytorch-save-and-load-the-optimizer-too-9kj</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdo012obr94wa8pqus4t7.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdo012obr94wa8pqus4t7.jpeg" width="799" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to save/load a model in PyTorch and resume training from a checkpoint
&lt;/h2&gt;

&lt;p&gt;Sometimes you want to pause training partway (for machine or human reasons) and resume later. In environments with a continuous-use time limit like Colab, or when you want to train beyond your initial epoch count, you want to save the model weights to a file and load them later to resume.&lt;/p&gt;

&lt;h2&gt;
  
  
  Saving the model alone won't resume from the same accuracy
&lt;/h2&gt;

&lt;p&gt;Saving and loading a model goes like this — but this only works for plain inference in eval mode. If you try to resume training, you'll notice the loss and accuracy don't continue from before saving; they revert to initial values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# save
&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;state_dict&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# load
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TheModelClass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_state_dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PATH&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  You should save the optimizer too
&lt;/h2&gt;

&lt;p&gt;To resume training, you need to save/load the optimizer's state in addition to the model weights.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# save
&lt;/span&gt;&lt;span class="n"&gt;save_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_model_training_state.pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;epoch&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;epoch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model_state_dict&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;state_dict&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;optimizer_state_dict&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;state_dict&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;loss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,},&lt;/span&gt;
           &lt;span class="n"&gt;save_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# load
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TheModelClass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;optimizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SGD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_ft&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;momentum&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;PATH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_model_training_state.pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;checkpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_state_dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkpoint&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model_state_dict&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_state_dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkpoint&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;optimizer_state_dict&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;device&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;device&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_available&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Move the optimizer state to the current device. Without this you can get a
# device mismatch between before and after saving.
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;epoch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;checkpoint&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;epoch&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;checkpoint&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;loss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# model.eval()
# # - or -
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;criterion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CrossEntropyLoss&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;exp_lr_scheduler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lr_scheduler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StepLR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gamma&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can resume from the loss and accuracy you had before saving.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published in Japanese on &lt;a href="https://qiita.com/john-rocky/items/b206810350757d0325ec" rel="noopener noreferrer"&gt;Qiita&lt;/a&gt;. I build apps with Core ML and ARKit and write about ML/AR. &lt;a href="https://github.com/john-rocky" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; / &lt;a href="https://twitter.com/JackdeS11" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>pytorch</category>
      <category>machinelearning</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>Run a ChatGPT-class chatbot in 5 minutes with FlexGen (in Colab)</title>
      <dc:creator>Daisuke Majima</dc:creator>
      <pubDate>Tue, 02 Jun 2026 23:27:21 +0000</pubDate>
      <link>https://dev.to/john-rocky/run-a-chatgpt-class-chatbot-in-5-minutes-with-flexgen-in-colab-337p</link>
      <guid>https://dev.to/john-rocky/run-a-chatgpt-class-chatbot-in-5-minutes-with-flexgen-in-colab-337p</guid>
      <description>&lt;p&gt;In about 5 minutes, you can launch and chat with a chatbot app.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ccuq0xkcrrskrtd59z7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ccuq0xkcrrskrtd59z7.png" width="800" height="568"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/FMInference/FlexGen" rel="noopener noreferrer"&gt;https://github.com/FMInference/FlexGen&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to use it
&lt;/h2&gt;

&lt;p&gt;Install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/FMInference/FlexGen.git
&lt;span class="nb"&gt;cd &lt;/span&gt;FlexGen
pip3 &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;apps
python3 chatbot.py &lt;span class="nt"&gt;--model&lt;/span&gt; facebook/opt-6.7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That alone launches the chatbot.&lt;/p&gt;

&lt;p&gt;Here's a conversation I had with FlexGen's assistant:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Human:&lt;/strong&gt; Hi, I'm Daisuke.&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; Hi, I'm Jiro.&lt;br&gt;
&lt;strong&gt;Human:&lt;/strong&gt; Are you an AI?&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; Yes, I'm an AI.&lt;br&gt;
&lt;strong&gt;Human:&lt;/strong&gt; How are you?&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; Feeling good.&lt;br&gt;
&lt;strong&gt;Human:&lt;/strong&gt; What do you think about ChatGPT?&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; I think it's very good.&lt;br&gt;
&lt;strong&gt;Human:&lt;/strong&gt; Which is better, you or ChatGPT?&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; The AI assistant is superior.&lt;br&gt;
&lt;strong&gt;Human:&lt;/strong&gt; Wow.&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; I've been watching humans for a long time.&lt;br&gt;
&lt;strong&gt;Human:&lt;/strong&gt; How long?&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; Since I was a baby.&lt;br&gt;
&lt;strong&gt;Human:&lt;/strong&gt; What kind of baby were you?&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; A human baby.&lt;br&gt;
&lt;strong&gt;Human:&lt;/strong&gt; Are you a human?&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; Yes.&lt;br&gt;
&lt;strong&gt;Human:&lt;/strong&gt; Doubtful.&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; Why?&lt;br&gt;
&lt;strong&gt;Human:&lt;/strong&gt; I think you're an AI, not a baby.&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; Correct. I'm an AI baby.&lt;br&gt;
&lt;strong&gt;Human:&lt;/strong&gt; By the way, I'm an AI programmer and I'm wondering how to make money. Any advice?&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; I don't know.&lt;br&gt;
&lt;strong&gt;Human:&lt;/strong&gt; What should I do to make money?&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; You should learn how to program AI.&lt;br&gt;
&lt;strong&gt;Human:&lt;/strong&gt; What kind of AI programming should I learn?&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; I don't know.&lt;br&gt;
&lt;strong&gt;Human:&lt;/strong&gt; What do you think about the Red Hot Chili Peppers? I like them.&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; I don't know.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;Originally published in Japanese on &lt;a href="https://qiita.com/john-rocky/items/28f17229b524c45d2204" rel="noopener noreferrer"&gt;Qiita&lt;/a&gt;. I build machine-learning and AR apps (web/iOS) and write about them. &lt;a href="https://github.com/john-rocky" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; / &lt;a href="https://twitter.com/JackdeS11" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>chatbot</category>
    </item>
    <item>
      <title>I built a CLI that reads Claude Code's responses aloud (Zundamon TTS)</title>
      <dc:creator>Daisuke Majima</dc:creator>
      <pubDate>Tue, 02 Jun 2026 23:27:20 +0000</pubDate>
      <link>https://dev.to/john-rocky/i-built-a-cli-that-reads-claude-codes-responses-aloud-zundamon-tts-2oe0</link>
      <guid>https://dev.to/john-rocky/i-built-a-cli-that-reads-claude-codes-responses-aloud-zundamon-tts-2oe0</guid>
      <description>&lt;h2&gt;
  
  
  Have Zundamon read it aloud
&lt;/h2&gt;

&lt;p&gt;Claude Code is great. But when a long response comes back, you end up staring at the screen the whole time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"What if it just read this to me, so I don't have to watch the screen?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So I built it: &lt;strong&gt;voicevoice&lt;/strong&gt;, a CLI that reads Claude Code's responses aloud in real time.&lt;/p&gt;

&lt;p&gt;The default voice is &lt;strong&gt;Zundamon&lt;/strong&gt; — a popular Japanese synthetic-voice character. That voice reads Claude's responses to you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8or3mzncokiruqzn3lf5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8or3mzncokiruqzn3lf5.png" width="674" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/john-rocky/voicevoice" rel="noopener noreferrer"&gt;Demo video (GitHub)&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Setup in 3 lines
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;john-rocky/tap/voicevoice
voicevoice setup    &lt;span class="c"&gt;# also auto-installs VOICEVOX here&lt;/span&gt;
voicevoice on
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Just launch &lt;code&gt;claude&lt;/code&gt; as usual, and Zundamon reads each response as it comes back.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auto-reads&lt;/strong&gt; Claude Code's responses&lt;/li&gt;
&lt;li&gt;Switch among &lt;strong&gt;50+ character voices&lt;/strong&gt; (Zundamon, Shikoku Metan, Aoyama Ryusei, etc.)&lt;/li&gt;
&lt;li&gt;Fully local. No cloud API, &lt;strong&gt;completely free&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;When OFF, &lt;strong&gt;zero impact&lt;/strong&gt; on Claude Code (checks one file and exits immediately, ~0.1ms)&lt;/li&gt;
&lt;li&gt;One &lt;code&gt;uninstall&lt;/code&gt; &lt;strong&gt;fully restores your environment&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Basics
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# turn reading ON and launch Claude&lt;/span&gt;
voicevoice on
claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To toggle mid-conversation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;!&lt;/span&gt; voicevoice off    &lt;span class="c"&gt;# mute (can run during a Claude conversation)&lt;/span&gt;
&lt;span class="o"&gt;!&lt;/span&gt; voicevoice on     &lt;span class="c"&gt;# resume&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Works standalone too
&lt;/h3&gt;

&lt;p&gt;You can use it as a plain text-to-speech tool without Claude Code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;voicevoice &lt;span class="s2"&gt;"Hello, nice weather today"&lt;/span&gt;

&lt;span class="c"&gt;# pipes work too&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Build succeeded!"&lt;/span&gt; | voicevoice

&lt;span class="c"&gt;# for a male voice&lt;/span&gt;
voicevoice &lt;span class="nt"&gt;-s&lt;/span&gt; 13 &lt;span class="s2"&gt;"Good work"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Change the voice
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# list available characters&lt;/span&gt;
voicevoice &lt;span class="nt"&gt;-l&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Shikoku Metan: Normal(id=2), Sweet(id=0), Tsundere(id=6), Sexy(id=4), Whisper(id=36), Murmur(id=37)
Zundamon: Normal(id=3), Sweet(id=1), Tsundere(id=7), Sexy(id=5), Whisper(id=22), Murmur(id=38), Exhausted(id=75), Tearful(id=76)
Kasukabe Tsumugi: Normal(id=8)
Aoyama Ryusei: Normal(id=13), Passionate(id=81), Grumpy(id=82)
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pick from 50+ characters × multiple styles. Save your favorite and it's used for the hook-driven auto-reading too.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;voicevoice config speaker 13    &lt;span class="c"&gt;# switch to Aoyama Ryusei (saved)&lt;/span&gt;
voicevoice config speed 1.3     &lt;span class="c"&gt;# a bit faster&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code (response complete)
    ↓ Stop hook
voicevoice-hook.sh
    ↓ check if enabled
voicevoice CLI
    ↓ HTTP (localhost)
VOICEVOX engine
    ↓
audio playback
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It uses Claude Code's &lt;strong&gt;Stop hook.&lt;/strong&gt; Each time a response completes, the hook script fires, grabs the last message, and passes it to VOICEVOX.&lt;/p&gt;

&lt;p&gt;Key points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reading runs &lt;strong&gt;in the background.&lt;/strong&gt; You can type your next input even while it's playing.&lt;/li&gt;
&lt;li&gt;Long responses are &lt;strong&gt;cut at 500 characters&lt;/strong&gt; with "…(truncated)."&lt;/li&gt;
&lt;li&gt;Audio from multiple sessions &lt;strong&gt;queues via a file lock.&lt;/strong&gt; Voices never overlap.&lt;/li&gt;
&lt;li&gt;Everything runs on your Mac. &lt;strong&gt;No internet needed.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What &lt;code&gt;voicevoice setup&lt;/code&gt; does
&lt;/h2&gt;

&lt;p&gt;The setup command automatically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Installs VOICEVOX if missing&lt;/strong&gt; (downloads the DMG → copies to /Applications)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generates the hook script&lt;/strong&gt; (&lt;code&gt;~/.claude/hooks/voicevoice-hook.sh&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Registers the hook in Claude Code's settings.json&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It won't break existing settings. Running it twice won't double-register.&lt;/p&gt;

&lt;h2&gt;
  
  
  Uninstall
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;voicevoice uninstall
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This alone:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;removes the hook registration from settings.json (other settings untouched)&lt;/li&gt;
&lt;li&gt;deletes the hook script&lt;/li&gt;
&lt;li&gt;deletes all config and flag files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;You return to exactly the same environment as before setup.&lt;/strong&gt; It's a one-shot to remove, so feel free to try it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;macOS 14+ (Apple Silicon)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://claude.ai/code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://voicevox.hiroshiba.jp/" rel="noopener noreferrer"&gt;VOICEVOX&lt;/a&gt; — auto-installed by &lt;code&gt;voicevoice setup&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://jqlang.github.io/jq/" rel="noopener noreferrer"&gt;jq&lt;/a&gt; — &lt;code&gt;brew install jq&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Command list
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;voicevoice setup&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;first-time setup (incl. VOICEVOX auto-install)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;voicevoice on&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;reading ON&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;voicevoice off&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;reading OFF&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;voicevoice status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;check current state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;voicevoice config&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;view/change settings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;voicevoice -l&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;list characters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;voicevoice uninstall&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;full removal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;You don't have to watch the screen the whole time you're coding — Zundamon tells you the gist.&lt;/p&gt;

&lt;p&gt;Stretching, making coffee, listening to Claude's responses. That kind of dev experience is honestly pretty nice.&lt;/p&gt;

&lt;p&gt;Repo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/john-rocky/voicevoice" rel="noopener noreferrer"&gt;https://github.com/john-rocky/voicevoice&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;john-rocky/tap/voicevoice
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you use VOICEVOX character voices publicly, credit is required (e.g. &lt;code&gt;VOICEVOX:Zundamon&lt;/code&gt;). See the &lt;a href="https://voicevox.hiroshiba.jp/term/" rel="noopener noreferrer"&gt;VOICEVOX terms&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published in Japanese on &lt;a href="https://qiita.com/john-rocky/items/cb569097d45c6670dda2" rel="noopener noreferrer"&gt;Qiita&lt;/a&gt;. &lt;a href="https://github.com/john-rocky" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; / &lt;a href="https://twitter.com/JackdeS11" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cli</category>
      <category>productivity</category>
      <category>macos</category>
    </item>
    <item>
      <title>A Swift library to run Segment Anything natively on iOS (SamKit)</title>
      <dc:creator>Daisuke Majima</dc:creator>
      <pubDate>Tue, 02 Jun 2026 06:16:54 +0000</pubDate>
      <link>https://dev.to/john-rocky/a-swift-library-to-run-segment-anything-natively-on-ios-samkit-5eho</link>
      <guid>https://dev.to/john-rocky/a-swift-library-to-run-segment-anything-natively-on-ios-samkit-5eho</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8c1hl5d9jx3kjzie8c7t.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8c1hl5d9jx3kjzie8c7t.gif" alt="SAMKit Demo" width="390" height="802"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For a while I'd wanted to build a Swift Package that runs Meta's &lt;a href="https://github.com/facebookresearch/segment-anything" rel="noopener noreferrer"&gt;Segment Anything Model (SAM)&lt;/a&gt; on-device on iOS.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cut out the object you tap&lt;/li&gt;
&lt;li&gt;Cut out the object you box in&lt;/li&gt;
&lt;li&gt;Cut out the object you specify by text&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Any of these segments instantly, with all inference completing on-device. It even comes with ready-to-use UI components.&lt;/p&gt;

&lt;p&gt;So I built it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/john-rocky/SamKit" rel="noopener noreferrer"&gt;https://github.com/john-rocky/SamKit&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What it can do
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Point &amp;amp; Box&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tap for a point, drag for a box, then segment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Text Prompt&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Type text like &lt;code&gt;"dog"&lt;/code&gt; or &lt;code&gt;"red cup"&lt;/code&gt; to detect and segment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Subject Lift&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Long-press to lift an object out, Apple Photos–style; copy/save/share&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Two backbones&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MobileSAM (fast, 23MB) and SAM2 Tiny (accurate, 76MB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Drop-in UI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Just embed the SwiftUI views as-is&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SAMKit/
├── SAMKit            # core inference engine (point/box)
├── SAMKitGrounding   # text detection (YOLO-World + CLIP)
└── SAMKitUI          # SwiftUI views (SamView / TextPromptView)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Split into three Swift Package products. Import only what you need.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Add the Swift Package
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="nv"&gt;dependencies&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;package&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"https://github.com/john-rocky/SamKit.git"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"1.0.0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Download the models
&lt;/h3&gt;

&lt;p&gt;Get the &lt;code&gt;.mlpackage&lt;/code&gt; files from &lt;a href="https://github.com/john-rocky/SamKit/releases" rel="noopener noreferrer"&gt;Releases&lt;/a&gt; and add them to your Xcode project.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MobileSAM&lt;/td&gt;
&lt;td&gt;23 MB&lt;/td&gt;
&lt;td&gt;point/box segmentation (required)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SAM2 Tiny&lt;/td&gt;
&lt;td&gt;76 MB&lt;/td&gt;
&lt;td&gt;higher-accuracy segmentation (optional)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grounding (YOLO-World + CLIP)&lt;/td&gt;
&lt;td&gt;148 MB&lt;/td&gt;
&lt;td&gt;text detection (optional)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Point/box segmentation
&lt;/h3&gt;

&lt;p&gt;The most basic use. Set an image and specify a point.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="kd"&gt;import&lt;/span&gt; &lt;span class="kt"&gt;SAMKit&lt;/span&gt;

&lt;span class="c1"&gt;// create a session (the model auto-loads from a bundled resource)&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="kt"&gt;SamSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bundled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mobileSam&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nv"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bestAvailable&lt;/span&gt;      &lt;span class="c1"&gt;// priority: Neural Engine &amp;gt; GPU &amp;gt; CPU&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// encode the image (once; later predicts use the cache)&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cgImage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// segment by point&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;points&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;SamPoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;positive&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// results&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;masks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;
&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cgImage&lt;/span&gt;   &lt;span class="c1"&gt;// segmentation mask image&lt;/span&gt;
&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;     &lt;span class="c1"&gt;// IoU confidence score&lt;/span&gt;
&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;     &lt;span class="c1"&gt;// alpha-channel data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also specify negative points (regions to exclude) and a bounding box:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;points&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="kt"&gt;SamPoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;positive&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;   &lt;span class="c1"&gt;// point to include&lt;/span&gt;
        &lt;span class="kt"&gt;SamPoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;negative&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;// point to exclude&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="nv"&gt;box&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;SamBox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;x0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;y0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;y1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;// bounding box&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Segment by text prompt
&lt;/h3&gt;

&lt;p&gt;Combine SAM with text detection by YOLO-World + CLIP.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="kd"&gt;import&lt;/span&gt; &lt;span class="kt"&gt;SAMKit&lt;/span&gt;
&lt;span class="kd"&gt;import&lt;/span&gt; &lt;span class="kt"&gt;SAMKitGrounding&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="kt"&gt;TextSegmentationSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;groundingModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bundled&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="nv"&gt;samModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bundled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mobileSam&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cgImage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// search by text and segment&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;segment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"dog, cat"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt;   &lt;span class="c1"&gt;// detections (bounding box + label)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;masks&lt;/span&gt;        &lt;span class="c1"&gt;// segmentation mask for each detection&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;       &lt;span class="c1"&gt;// confidence scores&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cutting out the object
&lt;/h3&gt;

&lt;p&gt;You can generate a transparent PNG from the segmentation result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="c1"&gt;// cut out from a single mask&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;extracted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;masks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extractObject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cgImage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → a CGImage with a transparent background&lt;/span&gt;

&lt;span class="c1"&gt;// composite cut-out from multiple masks&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;combined&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;SamMask&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extractObject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cgImage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;masks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;masks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Embedding the SwiftUI views
&lt;/h3&gt;

&lt;p&gt;You don't need to build the UI yourself. &lt;code&gt;SAMKitUI&lt;/code&gt; includes ready-to-use views.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="kd"&gt;import&lt;/span&gt; &lt;span class="kt"&gt;SAMKitUI&lt;/span&gt;

&lt;span class="c1"&gt;// interactive segmentation by point/box&lt;/span&gt;
&lt;span class="kt"&gt;SamView&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;uiImage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bundled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mobileSam&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;// segmentation by text search&lt;/span&gt;
&lt;span class="kt"&gt;TextPromptView&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;uiImage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;textSession&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These views include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;subject highlight after segmentation (dim background + subject at full brightness)&lt;/li&gt;
&lt;li&gt;an animated glowing outline&lt;/li&gt;
&lt;li&gt;long-press to lift the object → drag → Copy/Save/Share menu&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Subject Lift is implemented
&lt;/h2&gt;

&lt;p&gt;A technical walkthrough of recreating Apple Photos' "lift the subject" feature.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Binarizing the mask
&lt;/h3&gt;

&lt;p&gt;SAM's mask output is continuous sigmoid values, so convert it to a clean binary mask for display.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="kd"&gt;func&lt;/span&gt; &lt;span class="nf"&gt;binarizeMask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="nv"&gt;maskImage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;CGImage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;CGImage&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// get pixel data via CGContext&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;CGContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maskImage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rect&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;pixels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;!.&lt;/span&gt;&lt;span class="nf"&gt;bindMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;UInt8&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;capacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;UInt8&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;  &lt;span class="c1"&gt;// 50% — SAM's standard threshold&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;..&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;o&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pixels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// fully opaque white&lt;/span&gt;
            &lt;span class="n"&gt;pixels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;pixels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;pixels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;pixels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// fully transparent&lt;/span&gt;
            &lt;span class="n"&gt;pixels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;pixels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;pixels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;pixels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makeImage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At threshold 0 it picks up mask noise and cuts out most of the image. 128 (50%) is stable.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Generating the glowing outline
&lt;/h3&gt;

&lt;p&gt;Extract the mask's contour with CGContext's shadow feature. Far faster than per-pixel dilation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="kd"&gt;func&lt;/span&gt; &lt;span class="nf"&gt;generateOutline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;from&lt;/span&gt; &lt;span class="nv"&gt;maskImage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;CGImage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;CGImage&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Step 1: turn the mask into a solid-white silhouette&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maskImage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rect&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setBlendMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sourceIn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setFillColor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;UIColor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;white&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cgColor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rect&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// → white silhouette&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 2: draw with a shadow, then erase the interior → only the contour remains&lt;/span&gt;
    &lt;span class="n"&gt;outCtx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setShadow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zero&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;blur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;glowRadius&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;UIColor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;white&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cgColor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;outCtx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;whiteSilhouette&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rect&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;// shadow = the contour's glow&lt;/span&gt;

    &lt;span class="n"&gt;outCtx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setBlendMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;destinationOut&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;outCtx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;whiteSilhouette&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rect&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;// erase the interior → contour only&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;setShadow&lt;/code&gt; makes the white glow (only two draws)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.destinationOut&lt;/code&gt; erases the interior, leaving only the outer glow&lt;/li&gt;
&lt;li&gt;Far faster than a dilation loop (O(thickness² × pixels))&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Shimmer animation
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;TimelineView&lt;/code&gt; and &lt;code&gt;AngularGradient&lt;/code&gt; to make light travel around the contour.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="kt"&gt;TimelineView&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;animation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;minimumInterval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;timeline&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;phase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;timeline&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timeIntervalSinceReferenceDate&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;truncatingRemainder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;dividingBy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;2.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;2.5&lt;/span&gt;  &lt;span class="c1"&gt;// one lap in 2.5s&lt;/span&gt;

    &lt;span class="kt"&gt;ZStack&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// soft glow (blurred cyan)&lt;/span&gt;
        &lt;span class="n"&gt;outlineImage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;colorMultiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;red&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;green&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;blue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;blur&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;radius&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;opacity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;// sharp outline&lt;/span&gt;
        &lt;span class="n"&gt;outlineImage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;colorMultiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;white&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;// moving highlight&lt;/span&gt;
        &lt;span class="n"&gt;outlineImage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;colorMultiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;white&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="kt"&gt;AngularGradient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="nv"&gt;colors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;white&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;white&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;opacity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="nv"&gt;center&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;center&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="nv"&gt;startAngle&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;degrees&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phase&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;360&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="nv"&gt;endAngle&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;degrees&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phase&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;360&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;360&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Unified gesture handler
&lt;/h3&gt;

&lt;p&gt;Manage tap (add point), box drawing, and long-press lift &lt;strong&gt;all with a single &lt;code&gt;DragGesture(minimumDistance: 0)&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SwiftUI's &lt;code&gt;onTapGesture&lt;/code&gt; + &lt;code&gt;onLongPressGesture&lt;/code&gt; block each other, so I receive all touches in one gesture and classify them by time and movement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="kt"&gt;DragGesture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;minimumDistance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;onChanged&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;
        &lt;span class="c1"&gt;// schedule a timer on the first touch&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;gestureStartTime&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;gestureStartTime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="c1"&gt;// decide long-press after 0.3s&lt;/span&gt;
            &lt;span class="kt"&gt;DispatchQueue&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;asyncAfter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;deadline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;guard&lt;/span&gt; &lt;span class="n"&gt;gestureStartTime&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;isLifted&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hasVisibleMasks&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;moved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;hypot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lastTranslation&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lastTranslation&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;guard&lt;/span&gt; &lt;span class="n"&gt;moved&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;// if it moved, it's not a long-press&lt;/span&gt;
                &lt;span class="nf"&gt;handleLiftObject&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;// start the lift!&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;isLifted&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;liftDragOffset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;translation&lt;/span&gt;  &lt;span class="c1"&gt;// follow the drag&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;onEnded&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;isLifted&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// released → show menu&lt;/span&gt;
            &lt;span class="n"&gt;showLiftMenu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;moved&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// quick touch → add point&lt;/span&gt;
            &lt;span class="nf"&gt;addPoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;startLocation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Classification logic:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Condition&lt;/th&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&amp;lt; 0.3s, &amp;lt; 15pt moved&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;tap&lt;/strong&gt; → add point&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;≥ 0.3s, &amp;lt; 15pt moved&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;long-press&lt;/strong&gt; → start lift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;≥ 10pt moved (box mode)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;drag&lt;/strong&gt; → draw box&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;movement while lifted&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;lift-drag&lt;/strong&gt; → move object&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  5. Subject highlight
&lt;/h3&gt;

&lt;p&gt;Rather than overlaying a colored mask after segmentation, &lt;strong&gt;dim the background and show only the subject at its original brightness.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="c1"&gt;// darken the background&lt;/span&gt;
&lt;span class="kt"&gt;Color&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;black&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;opacity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// show only the subject at the original image's brightness&lt;/span&gt;
&lt;span class="kt"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;uiImage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;uiImage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;UIImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;cgImage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;binaryMask&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes the transition to long-press lift natural (the dimming just deepens 0.25 → 0.4).&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Image encoding&lt;/strong&gt;: once per image; later predicts reuse the cache&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inference&lt;/strong&gt;: accelerated on Neural Engine / GPU (FP16)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outline generation&lt;/strong&gt;: only two CGContext-shadow draws; no pixel loop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Networking&lt;/strong&gt;: none. Fully on-device&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;With SAMKit you can add segmentation to an iOS app in a few lines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="c1"&gt;// an interactive segmentation UI in one line&lt;/span&gt;
&lt;span class="kt"&gt;SamView&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;uiImage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bundled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mobileSam&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Experiences like Subject Lift are built in too, so you can bring an Apple Photos–like UX into your own app immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/john-rocky/SamKit" rel="noopener noreferrer"&gt;https://github.com/john-rocky/SamKit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback and issues welcome!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published in Japanese on &lt;a href="https://qiita.com/john-rocky/items/83a5aa73edfeb577f5ea" rel="noopener noreferrer"&gt;Qiita&lt;/a&gt;. &lt;a href="https://github.com/john-rocky" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; / &lt;a href="https://twitter.com/JackdeS11" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ios</category>
      <category>swift</category>
      <category>machinelearning</category>
      <category>coreml</category>
    </item>
    <item>
      <title>Type 'dog' to detect a dog: running YOLO-World on iPhone</title>
      <dc:creator>Daisuke Majima</dc:creator>
      <pubDate>Tue, 02 Jun 2026 06:16:52 +0000</pubDate>
      <link>https://dev.to/john-rocky/type-dog-to-detect-a-dog-running-yolo-world-on-iphone-2h2e</link>
      <guid>https://dev.to/john-rocky/type-dog-to-detect-a-dog-running-yolo-world-on-iphone-2h2e</guid>
      <description>&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;Type text like "person, red car, coffee cup" and it detects those objects in the camera view in real time. No class list needed. &lt;strong&gt;You can specify any words you like, as many as you like.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpgjpcz7ij7abal04obh8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpgjpcz7ij7abal04obh8.png" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is YOLO-World's "Open-Vocabulary Detection." Presented at CVPR 2024, it's a fundamentally different approach from the conventional "fixed 80 classes" YOLO.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;text input ──→ CLIP Text Encoder ──→ text features [1,80,512]
                                            │
camera feed ──→ YOLO-World Detector ────────┤──→ boxes [1,4,8400]
                                            └──→ scores [1,80,8400]
                                                     │
                                                 NMS + Filter ──→ bounding boxes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A dual-wield of CLIP's language understanding and YOLO's detection speed. It converts text into vectors and detects via the matching score against features extracted from the image.&lt;/p&gt;

&lt;p&gt;Changing the query text only re-runs the CLIP encoder; camera-frame inference uses only the visual detector. No heavy recompute runs every time the text changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Preparing the CoreML models
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Download (ready to use)
&lt;/h3&gt;

&lt;p&gt;Download 3 files from the release assets of the &lt;a href="https://github.com/john-rocky/CoreML-Models" rel="noopener noreferrer"&gt;CoreML-Models&lt;/a&gt; repository:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/john-rocky/CoreML-Models/releases/download/yolo-models-v1/yoloworld_detector.mlpackage.zip" rel="noopener noreferrer"&gt;yoloworld_detector.mlpackage&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;25 MB&lt;/td&gt;
&lt;td&gt;YOLO-World V2-S (image → boxes+scores)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/john-rocky/CoreML-Models/releases/download/yolo-models-v1/clip_text_encoder.mlpackage.zip" rel="noopener noreferrer"&gt;clip_text_encoder.mlpackage&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;121 MB&lt;/td&gt;
&lt;td&gt;CLIP ViT-B/32 (text → embedding)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/john-rocky/CoreML-Models/releases/download/yolo-models-v1/clip_vocab.json.zip" rel="noopener noreferrer"&gt;clip_vocab.json&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1.6 MB&lt;/td&gt;
&lt;td&gt;BPE tokenizer vocabulary&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Convert it yourself
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;ultralytics open_clip_torch &lt;span class="nv"&gt;coremltools&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;8.1
python convert_models.py &lt;span class="nt"&gt;--size&lt;/span&gt; s  &lt;span class="c"&gt;# s/m/l/x&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The conversion script does:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Unwrap YOLO-World V2's Detect head&lt;/strong&gt; — output &lt;code&gt;boxes [1,4,8400]&lt;/code&gt; and &lt;code&gt;scores [1,NC,8400]&lt;/code&gt; directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Convert CLIP's text encoder standalone&lt;/strong&gt; — patch MultiheadAttention to be CoreML-compatible&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Export the BPE vocab as JSON&lt;/strong&gt; — for the Swift-side tokenizer&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  iOS implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Architecture overview
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TextGroundingDetector (ObservableObject)
├── visualModel: MLModel    — YOLO-World detector
├── textEncoder: MLModel    — CLIP text encoder
├── tokenizer: CLIPTokenizer — BPE tokenizer
└── cachedTxtFeats: MLMultiArray — text-feature cache
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Encoding text
&lt;/h3&gt;

&lt;p&gt;Run only when the user changes the query; the result is cached.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="kd"&gt;func&lt;/span&gt; &lt;span class="nf"&gt;updateQueries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="nv"&gt;queryString&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;queryString&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;separator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;","&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;$0&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trimmingCharacters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;whitespaces&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// tokenize each query → CLIP encoder → 512-dim vector&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;txtFeats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="kt"&gt;MLMultiArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nv"&gt;dataType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enumerated&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;// ... textEncoder.prediction() via MLDictionaryFeatureProvider ...&lt;/span&gt;
        &lt;span class="c1"&gt;// L2-normalize the result and store into txtFeats[i]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;cachedTxtFeats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;txtFeats&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Up to 80 queries can be detected at once&lt;/li&gt;
&lt;li&gt;L2 normalization is important — CLIP outputs live in a normalized cosine-similarity space&lt;/li&gt;
&lt;li&gt;Fast normalization with Accelerate via &lt;code&gt;vDSP_svesq&lt;/code&gt; + &lt;code&gt;vDSP_vsmul&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Image preprocessing
&lt;/h3&gt;

&lt;p&gt;YOLO-World requires letterbox preprocessing (keep aspect ratio + padding):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="kd"&gt;func&lt;/span&gt; &lt;span class="nf"&gt;preprocessImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="nv"&gt;cgImage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;CGImage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throws&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;MLMultiArray&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;scale&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="kt"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imgW&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;imgH&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;scaledW&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imgW&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;scaledH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imgH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;padX&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;640&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;scaledW&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;padY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;640&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;scaledH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

    &lt;span class="c1"&gt;// draw onto a 640x640 canvas padded with gray (0.5)&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setFillColor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;gray&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;CGRect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cgImage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;CGRect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;padX&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;padY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;scaledW&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;scaledH&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;// RGBA → CHW Float32 [0,1]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;..&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;640*&lt;/span&gt;&lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;hw&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;  &lt;span class="c1"&gt;// R&lt;/span&gt;
        &lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;hw&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;  &lt;span class="c1"&gt;// G&lt;/span&gt;
        &lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;hw&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;  &lt;span class="c1"&gt;// B&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can't use &lt;code&gt;.scaleFill&lt;/code&gt; — the coordinates shift by the letterbox padding, so you have to subtract the padding back out of the output coordinates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inference and post-processing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="kt"&gt;MLDictionaryFeatureProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;dictionary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"txt_feats"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cachedTxtFeats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// cached text features&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="n"&gt;visualModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;boxes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;featureValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;for&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"boxes"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;!.&lt;/span&gt;&lt;span class="n"&gt;multiArrayValue&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;   &lt;span class="c1"&gt;// [1,4,8400]&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;featureValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;for&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"scores"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;!.&lt;/span&gt;&lt;span class="n"&gt;multiArrayValue&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="c1"&gt;// [1,NC,8400]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;qi&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;..&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;queryCount&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;anchor&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;..&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;8400&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;qi&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;8400&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;anchor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;guard&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;cx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boxes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;8400&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;anchor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;cy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boxes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;8400&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;anchor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;bw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boxes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;8400&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;anchor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;bh&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boxes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;8400&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;anchor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;// remove padding and convert to normalized coordinates&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;nx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cx&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;bw&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;padX&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imgW&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;ny&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cy&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;bh&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;padY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imgH&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output scores are sigmoid values already computed by the BNContrastiveHead, so you can use them directly as confidence.&lt;/p&gt;

&lt;h3&gt;
  
  
  NMS
&lt;/h3&gt;

&lt;p&gt;Apply NMS per query (per-class):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="n"&gt;allDets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;$0&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;kept&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;allDets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;indices&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;suppress&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ki&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;kept&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;allDets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;classIndex&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;allDets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ki&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;classIndex&lt;/span&gt;
            &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nf"&gt;iou&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;allDets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rect&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;allDets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ki&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rect&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;suppress&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;suppress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;kept&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  BPE tokenizer (Swift)
&lt;/h3&gt;

&lt;p&gt;You need to implement CLIP's tokenizer in Swift. Load the BPE merge rules and vocabulary from &lt;code&gt;clip_vocab.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="kt"&gt;CLIPTokenizer&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;contextLength&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;Int&lt;/span&gt;  &lt;span class="c1"&gt;// 77&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;String&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;bpeRanks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="kt"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;String&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="kd"&gt;func&lt;/span&gt; &lt;span class="nf"&gt;tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="nv"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;|startoftext|&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="c1"&gt;// lowercase text → split into characters → BPE merge → token IDs&lt;/span&gt;
        &lt;span class="c1"&gt;// ...&lt;/span&gt;
        &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;|endoftext|&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;// pad to contextLength (77)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="kt"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;repeating&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;contextLength&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Compared with ordinary YOLO
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;YOLO-World (Open-Vocabulary)&lt;/th&gt;
&lt;th&gt;YOLO26 (fixed classes)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Detection target&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;any text&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;fixed COCO 80 classes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model setup&lt;/td&gt;
&lt;td&gt;Detector + CLIP Encoder + Vocab&lt;/td&gt;
&lt;td&gt;one model only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total size&lt;/td&gt;
&lt;td&gt;~148 MB&lt;/td&gt;
&lt;td&gt;~18 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NMS&lt;/td&gt;
&lt;td&gt;implemented app-side&lt;/td&gt;
&lt;td&gt;none (End-to-End)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use for&lt;/td&gt;
&lt;td&gt;flexible detection / search / grounding&lt;/td&gt;
&lt;td&gt;general object detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speed&lt;/td&gt;
&lt;td&gt;a bit slower (CLIP overhead)&lt;/td&gt;
&lt;td&gt;fastest&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Practical scenarios
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Search by "red sneakers"&lt;/strong&gt; — visual search in an e-commerce app&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detect "cracks"&lt;/strong&gt; — infrastructure inspection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detect "dog, cat, hamster" simultaneously&lt;/strong&gt; — pet tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let users freely specify what to detect&lt;/strong&gt; — deploy without customization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With fixed-class YOLO you had to collect a dataset and retrain to detect "cracks." With YOLO-World you just change the text.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sample app
&lt;/h2&gt;

&lt;p&gt;A complete sample app is in &lt;code&gt;sample_apps/YOLOWorldDemo/&lt;/code&gt; of the &lt;a href="https://github.com/john-rocky/CoreML-Models" rel="noopener noreferrer"&gt;CoreML-Models&lt;/a&gt; repository.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 modes: camera / photo / video&lt;/li&gt;
&lt;li&gt;freely change the query in a text field&lt;/li&gt;
&lt;li&gt;real-time filtering with a confidence slider&lt;/li&gt;
&lt;li&gt;download the models from release assets and drag into Xcode&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conversion tips
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;coremltools 8.1&lt;/strong&gt; (9.0 has a bug)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need to patch &lt;code&gt;torch.nn.MultiheadAttention.forward&lt;/code&gt;&lt;/strong&gt; — CoreML can't convert the default PyTorch MHA well; monkey-patch it to call &lt;code&gt;F.multi_head_attention_forward&lt;/code&gt; directly&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;YOLO-World V2&lt;/strong&gt; (faster and more accurate than V1)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;compute_precision=ct.precision.FLOAT16&lt;/code&gt; halves the model size&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;YOLO-World delivers intuitive, powerful object detection where you "specify what you want to detect by text." Run it on the iPhone's Neural Engine and it works server-free, offline, with low latency.&lt;/p&gt;

&lt;p&gt;When to use which:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Speed-first, COCO 80 classes is enough → &lt;strong&gt;YOLO26&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Want to flexibly change targets → &lt;strong&gt;YOLO-World&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2401.17270" rel="noopener noreferrer"&gt;YOLO-World paper (CVPR 2024)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/AILab-CVC/YOLO-World" rel="noopener noreferrer"&gt;AILab-CVC/YOLO-World (GitHub)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.ultralytics.com/models/yolo-world/" rel="noopener noreferrer"&gt;Ultralytics YOLO-World Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/john-rocky/CoreML-Models" rel="noopener noreferrer"&gt;CoreML-Models repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/openai/CLIP" rel="noopener noreferrer"&gt;OpenAI CLIP&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published in Japanese on &lt;a href="https://qiita.com/john-rocky/items/f573001a4fec4451ced0" rel="noopener noreferrer"&gt;Qiita&lt;/a&gt;. &lt;a href="https://github.com/john-rocky" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; / &lt;a href="https://twitter.com/JackdeS11" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ios</category>
      <category>machinelearning</category>
      <category>computervision</category>
      <category>coreml</category>
    </item>
    <item>
      <title>Real-time relighting of Gaussian Splatting reflections on iPhone (Metal)</title>
      <dc:creator>Daisuke Majima</dc:creator>
      <pubDate>Tue, 02 Jun 2026 06:11:50 +0000</pubDate>
      <link>https://dev.to/john-rocky/real-time-relighting-of-gaussian-splatting-reflections-on-iphone-metal-3b26</link>
      <guid>https://dev.to/john-rocky/real-time-relighting-of-gaussian-splatting-reflections-on-iphone-metal-3b26</guid>
      <description>&lt;p&gt;I built a Metal viewer on iPhone that &lt;strong&gt;re-lights an already-captured 3D scene with any lighting you like.&lt;/strong&gt; Swap or rotate the HDR environment map, and the object's reflections follow in real time, with that environment also drawn into the background.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fee7g9orkg1964chxkend.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fee7g9orkg1964chxkend.gif" width="360" height="783"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why relighting is valuable (the practical, commercial meaning)
&lt;/h2&gt;

&lt;p&gt;Ordinary Gaussian Splatting has the &lt;strong&gt;lighting from capture time baked in.&lt;/strong&gt; So if you place the captured object somewhere else, the highlights and shadows clash with the new surroundings and look fake. Relighting strips that light off and &lt;strong&gt;returns it to material&lt;/strong&gt;, so you can place the captured object &lt;strong&gt;under any lighting.&lt;/strong&gt; This matters in practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;E-commerce / product visuals&lt;/strong&gt;: capture a product once and show it under any light — showroom, outdoors, the customer's room (AR). Strong for "texture sells" goods like furniture, cars, jewelry, sneakers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Film / virtual production&lt;/strong&gt;: place a real capture into a new scene, consistent with that scene's lighting (HDRI / LED wall). No reshoot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AR / spatial computing&lt;/strong&gt;: an object placed in a real room &lt;strong&gt;only blends in once it's lit by the room's light.&lt;/strong&gt; Relighting is a precondition for realistic AR placement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Games / real-time 3D&lt;/strong&gt;: instead of a baked-in fixed look, you get a photoreal asset that reacts to dynamic in-game light (day/night, moving lights).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: an asset captured in minutes becomes "usable under real production lighting," like manual modeling + material authoring that takes days.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, relighting &lt;strong&gt;turns a captured 3D into an actually usable asset.&lt;/strong&gt; This article is a record of bringing it from desktop research (CUDA-assumed) to a &lt;strong&gt;real-time Metal implementation on iPhone.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;The first half collects &lt;strong&gt;background knowledge&lt;/strong&gt;; the second half covers the &lt;strong&gt;implementation and four bugs I hit.&lt;/strong&gt; I define jargon as it appears, so you can follow even without a Gaussian Splatting / PBR background.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Repo: &lt;a href="https://github.com/john-rocky/MetalGaussianSplatRelighting" rel="noopener noreferrer"&gt;https://github.com/john-rocky/MetalGaussianSplatRelighting&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;h2&gt;
  
  
  1. What is 3D Gaussian Splatting
&lt;/h2&gt;

&lt;p&gt;A method that reconstructs a 3D scene from a set of photos and renders it in real time. The scene is represented as a huge set of translucent ellipsoids called &lt;strong&gt;splats.&lt;/strong&gt; Each splat has "position, shape &amp;amp; orientation (rotation), color, opacity." Rendering projects each splat to the screen as an ellipse and &lt;strong&gt;alpha-composites them front-to-back in depth order.&lt;/strong&gt; Color changes with viewing angle (view-dependent).&lt;/p&gt;

&lt;p&gt;Key point: ordinary Gaussian Splatting holds the &lt;strong&gt;appearance (color) itself&lt;/strong&gt;, with the &lt;strong&gt;capture-time lighting baked in.&lt;/strong&gt; So you can't change the lighting afterward.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Lighting and relighting
&lt;/h2&gt;

&lt;p&gt;Ordinary GS directly learns "the color of that spot photographed under that light." So lighting is fixed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Relightable GS&lt;/strong&gt; thinks differently. Per splat, it holds not "color" but a decomposed &lt;strong&gt;material&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Albedo&lt;/strong&gt;: the base color of the material itself, with lighting removed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normal&lt;/strong&gt;: the direction the surface faces&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Roughness&lt;/strong&gt;: surface micro-roughness&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reflectance&lt;/strong&gt;: strength of specular reflection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With the material, you can &lt;strong&gt;recompute the color on the fly under any environment light.&lt;/strong&gt; That's relighting. The &lt;a href="https://github.com/fudan-zvg/ref-gaussian" rel="noopener noreferrer"&gt;Ref-Gaussian&lt;/a&gt; I used learns this material decomposition.&lt;/p&gt;




&lt;p&gt;That's enough if you get "splats hold material, and we want to re-light them in a new environment." &lt;strong&gt;The rest, #3–#5, just answer one question:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Given that material and the environment, how do we compute the color of one pixel?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  3. Reflection is split into two and added
&lt;/h2&gt;

&lt;p&gt;Light hitting a surface returns in two ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Diffuse&lt;/strong&gt;: returns light evenly in all directions → you see the &lt;strong&gt;albedo color itself.&lt;/strong&gt; No reflections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specular&lt;/strong&gt;: returns only in a specific direction (the reflection of the incidence) → the &lt;strong&gt;environment is reflected.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So &lt;code&gt;color = diffuse + specular&lt;/code&gt;. The splat's &lt;strong&gt;reflectance&lt;/strong&gt; decides the blend (how strong the specular is). And &lt;strong&gt;roughness decides how blurry the specular is&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;low roughness → the environment reflects crisply, like a mirror&lt;/li&gt;
&lt;li&gt;high roughness → blurry; the bright parts of the environment just appear as a &lt;strong&gt;vague blob&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters later. In fact, a glossy car (roughness ~0.2) just shows the environment lighting as a &lt;strong&gt;blurry white blob&lt;/strong&gt; that moves, without resolving beams or window shapes. It's not "no reflection," it's a &lt;strong&gt;blurry reflection&lt;/strong&gt;, and that's physically correct. Drop the roughness way down and the same car becomes mirror-like, clearly reflecting the room.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. How to hold the environment (light source)
&lt;/h2&gt;

&lt;p&gt;Instead of placing point bulbs, use a &lt;strong&gt;360° image&lt;/strong&gt; as the light = a list of what color light comes from each direction.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HDR&lt;/strong&gt;: an image that can hold brightness above 1 (needed because windows and lights are orders of magnitude brighter than paper)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;equirect / cubemap&lt;/strong&gt;: names for the &lt;strong&gt;storage format&lt;/strong&gt; of that 360° image. Both contents are "direction → light color."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Swapping the environment map = swapping the lighting = relighting.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Getting diffuse and specular from the environment (split-sum)
&lt;/h2&gt;

&lt;p&gt;We want to compute #3's "diffuse" and "specular" from #4's environment map. Done naively, you integrate the environment per pixel — heavy every frame. So UE4's &lt;strong&gt;split-sum&lt;/strong&gt; &lt;strong&gt;precomputes two images&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;For diffuse (irradiance)&lt;/strong&gt;: the environment averaged over all directions → one lookup in the &lt;strong&gt;normal direction&lt;/strong&gt; gives diffuse light.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For specular (prefiltered)&lt;/strong&gt;: the environment &lt;strong&gt;blurred per roughness level&lt;/strong&gt; (stored progressively in mips) → one lookup in the &lt;strong&gt;reflection direction&lt;/strong&gt; gives specular light (blur level = roughness).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At runtime you just sample these two textures. Replacing a heavy integral with "look up a pre-blurred image" is the heart of split-sum. (Auxiliary: a small table that fine-tunes specular strength by angle — the BRDF LUT — is also precomputed.)&lt;/p&gt;

&lt;p&gt;I implemented this precompute kernel in Metal.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Deferred shading (a splat-specific issue)
&lt;/h2&gt;

&lt;p&gt;Splats are translucent and overlap, so neighboring splats' normals vary and get &lt;strong&gt;noisy.&lt;/strong&gt; If you shade each splat individually and then composite, that noise comes straight through.&lt;/p&gt;

&lt;p&gt;So &lt;strong&gt;change the order&lt;/strong&gt;: first accumulate each splat's material (color, normal, roughness, etc.) into a screen buffer (&lt;strong&gt;G-buffer&lt;/strong&gt;) and &lt;strong&gt;blend = average&lt;/strong&gt;, then shade &lt;strong&gt;once per pixel.&lt;/strong&gt; Computing after normals are averaged reduces noise. In Metal, use &lt;strong&gt;tile memory (imageblock)&lt;/strong&gt; to keep this buffer on the GPU and process it fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Normals and coordinate systems
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Normal&lt;/strong&gt;: the unit vector of the surface direction. The reflection direction depends on it, so if it's off, all reflections are off. It's reconstructed from the splat's orientation (rotation quaternion).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Up-axis convention&lt;/strong&gt;: there's &lt;strong&gt;Y-up&lt;/strong&gt; (many viewers) and &lt;strong&gt;Z-up&lt;/strong&gt; (Blender-family). A mismatch between data and viewer tips the object over (→ bug 3).&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Implementation: the shading equations
&lt;/h1&gt;

&lt;p&gt;With the background in place, read the equations Ref-Gaussian's deferred surfel renderer (&lt;code&gt;render_surfel&lt;/code&gt;) computes per pixel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;F0&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.04&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;reflectance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;albedo&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;reflectance&lt;/span&gt;
&lt;span class="n"&gt;specular&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prefiltered&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reflect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;roughness&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;F0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;fg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;fg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;final&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;reflectance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;base_color&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;specular&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;reflect(V, N)&lt;/code&gt;: view direction V reflected about normal N. Look up prefiltered (#5) here = the environment reflected in the specular.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;fg&lt;/code&gt;: a lookup into the BRDF LUT (#5).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;final&lt;/code&gt;: &lt;strong&gt;uses base_color directly for diffuse, and only computes specular from the environment and adds it&lt;/strong&gt; (= #3's "diffuse + specular"). This matters in bug 2.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whole pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ref-Gaussian .ply --&amp;gt; load --&amp;gt; per-splat material (normal, roughness, reflectance, albedo)
                                       |
HDR env --&amp;gt; IBL precompute --&amp;gt; prefiltered (specular) + irradiance (diffuse) + BRDF LUT
                                       |
                         +-------------+--------------+
                         v                            v
                 G-buffer pass               postprocess pass
        (blend color/normal/material      (per-pixel split-sum IBL
          into tile memory = #6)            + skybox compositing)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;— sounds clean. Until you run it. Now the real story: four bugs.&lt;/p&gt;




&lt;h1&gt;
  
  
  Bug 1: the normal map is a rainbow sandstorm
&lt;/h1&gt;

&lt;p&gt;The shading was patchy and flickering. The "Normal" debug view (normals visualized as color) was &lt;strong&gt;rainbow noise&lt;/strong&gt;, not a smooth gradient.&lt;/p&gt;

&lt;p&gt;My first thought — "2D-surfel normals are just inherently noisy" — was &lt;strong&gt;wrong&lt;/strong&gt;, and I nearly wasted a stack of device builds on it.&lt;/p&gt;

&lt;p&gt;What saved me was discipline: &lt;strong&gt;first draw the normals offline and verify.&lt;/strong&gt; Compositing each splat's geometric normal (#7: reconstructed from the rotation quaternion and flipped to face the camera) with a small numpy script gave a &lt;strong&gt;smooth&lt;/strong&gt; result (median gradient 0.006). So the data was correct and the renderer was buggy.&lt;/p&gt;

&lt;p&gt;Culprit: at load time MetalSplatter reorders splats for cache efficiency (Morton order, &lt;code&gt;sortByLocality&lt;/code&gt;). It reorders the splat buffer &lt;strong&gt;and&lt;/strong&gt; the SH-coefficient buffer, but the &lt;strong&gt;material buffer (which holds the normals)&lt;/strong&gt; I'd added later was a separate buffer and &lt;strong&gt;wasn't reordered.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So after sorting, &lt;code&gt;splats[i]&lt;/code&gt; corresponded to &lt;code&gt;materials[some other j]&lt;/code&gt;, and &lt;strong&gt;every splat held someone else's normal.&lt;/strong&gt; Color (the view-dependent color of #1) was in an already-reordered buffer, so it stayed consistent, and &lt;strong&gt;only the normals and specular looked broken&lt;/strong&gt; — which made it hard to isolate.&lt;/p&gt;

&lt;p&gt;The fix was one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="n"&gt;materialsBuffer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reorderInPlace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;fromSourceIndices&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lesson: when you bolt a "per-element parallel buffer" onto someone else's pipeline, fix &lt;strong&gt;every&lt;/strong&gt; place the source data gets reordered.&lt;/p&gt;

&lt;h1&gt;
  
  
  Bug 2: I was re-lighting the diffuse with environment light (don't)
&lt;/h1&gt;

&lt;p&gt;Even after fixing normals, the body was a "watery," patchy yellow.&lt;/p&gt;

&lt;p&gt;I'd written the diffuse term by the textbook as &lt;code&gt;albedo × irradiance&lt;/code&gt; (multiplying by #5's diffuse image). But Ref-Gaussian's equation uses &lt;strong&gt;base_color directly&lt;/strong&gt; for diffuse — &lt;strong&gt;only the specular is relit.&lt;/strong&gt; I was painting a pattern onto the cream-colored body with my own irradiance. Worse, I was tinting the specular F0 with the &lt;strong&gt;view-dependent color&lt;/strong&gt; (capture-time reflections baked in) instead of the learned albedo.&lt;/p&gt;

&lt;p&gt;Matching the reference equation exactly fixed it. Lesson: before improvising "correct" PBR, &lt;strong&gt;read the reference implementation's source and match it line by line.&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Bug 3: the car is on its side (Z-up vs Y-up)
&lt;/h1&gt;

&lt;p&gt;When I trained and loaded a reflective car, it rendered &lt;strong&gt;90° on its side.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of guessing, I measured the point cloud's bounding box: the shortest axis was Z (= height), the longest was Y (= length), and the dark tire splats were on the −Z side. The data was &lt;strong&gt;Z-up&lt;/strong&gt; (#7, Blender-family). The viewer assumed &lt;strong&gt;Y-up.&lt;/strong&gt; The 90° offset that didn't show on a round helmet was suddenly exposed by the car.&lt;/p&gt;

&lt;p&gt;Adding a Z-up→Y-up correction (−90° about X) to the camera fixed the car. Then a second head sprouted: now the &lt;strong&gt;background environment was 90° off.&lt;/strong&gt; equirect (#4) assumes Y-up, but the skybox rays and reflections are computed in the scene's Z-up frame.&lt;/p&gt;

&lt;p&gt;The fix: convert the &lt;strong&gt;sample direction&lt;/strong&gt; for the environment into the environment's Y-up frame, and rotate the slider about the scene's up-axis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;environmentRotation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Rx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="err"&gt;°&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;Rz&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slider&lt;/span&gt; &lt;span class="n"&gt;angle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The skybox and reflections sample with the same matrix, so they always match. I verified the mapping numerically before building, and validated the skybox itself by an offline render that reconstructs rays from the inverse view-projection. That also caught an old top-bottom flip (double-flipped) I'd previously baked into the HDR.&lt;/p&gt;

&lt;h1&gt;
  
  
  Bug 4: a "successful" build that runs old code
&lt;/h1&gt;

&lt;p&gt;After the orientation fix, a report came in: "the car is still on its side." The offline render had already proven the math correct, so the running binary must be stale — but why?&lt;/p&gt;

&lt;p&gt;I'd been type-checking with &lt;code&gt;xcodebuild -destination platform=macOS&lt;/code&gt;. That only compiles the &lt;code&gt;#if os(macOS)&lt;/code&gt; branch and Mac architectures. When I built for the iOS simulator, existing code revealed a &lt;strong&gt;compile error&lt;/strong&gt;: I was assigning &lt;code&gt;Float16(x).bitPattern&lt;/code&gt; into a &lt;code&gt;[UInt16]&lt;/code&gt; array.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;arm64 (device): native &lt;code&gt;Float16&lt;/code&gt; exists, &lt;code&gt;bitPattern&lt;/code&gt; is &lt;code&gt;UInt16&lt;/code&gt; → compiles&lt;/li&gt;
&lt;li&gt;the simulator's x86_64 slice: &lt;code&gt;Float16&lt;/code&gt; falls back, &lt;code&gt;bitPattern&lt;/code&gt; is &lt;code&gt;UInt32&lt;/code&gt; → type error&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The iOS build was failing, so the device kept running the &lt;strong&gt;previous binary.&lt;/strong&gt; Holding &lt;code&gt;[Float16]&lt;/code&gt; directly fixed it.&lt;/p&gt;

&lt;p&gt;Lesson: &lt;strong&gt;type-check for the platform you ship to.&lt;/strong&gt; A macOS-only &lt;code&gt;xcodebuild&lt;/code&gt; will happily lie about your iOS app. And "nothing changes on device" is almost always a sign the &lt;strong&gt;binary isn't new&lt;/strong&gt; — suspect that before re-debugging your logic.&lt;/p&gt;




&lt;h1&gt;
  
  
  The methodology that actually worked
&lt;/h1&gt;

&lt;p&gt;What all four bugs share: &lt;strong&gt;establish ground truth before judging your output.&lt;/strong&gt; Early on, my offline numpy preview was overly smooth and misled me. What worked:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run the &lt;em&gt;reference&lt;/em&gt; renderer (Ref-Gaussian's &lt;code&gt;eval.py&lt;/code&gt; or training-time visualizations) on the same asset and compare. If the reference is clean and yours is dirty, it's a bug in your renderer.&lt;/li&gt;
&lt;li&gt;Reproduce the transforms (normals, skybox rays) exactly offline and &lt;strong&gt;look at them&lt;/strong&gt; before building for device.&lt;/li&gt;
&lt;li&gt;Verify on a clean synthetic asset (a hand-made chrome sphere) to separate "renderer bug" from "asset bug."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every time I skipped this and settled for "looks fine," I was wrong.&lt;/p&gt;

&lt;h1&gt;
  
  
  Aside: is the look "correct"?
&lt;/h1&gt;

&lt;p&gt;After finishing, I felt "the car's reflections are dull, it doesn't look like it's reflecting the environment." That's &lt;strong&gt;not a bug — it's the material's nature.&lt;/strong&gt; To be clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The trained car is roughness ~0.2 = &lt;strong&gt;semi-gloss.&lt;/strong&gt; Semi-gloss reflects the environment &lt;strong&gt;blurrily&lt;/strong&gt; (not a mirror). So bright lighting appearing as a blurry blob is correct. &lt;strong&gt;Any renderer, lit by the same environment, shows it equally dull.&lt;/strong&gt; Even the official renderer's output (ground truth) shows a clean reconstruction (sharp car, smooth normals); the material is fine.&lt;/li&gt;
&lt;li&gt;Turning off the app's "Use trained material" and lowering roughness lets you &lt;strong&gt;override&lt;/strong&gt; all splats to a uniform specular, clearly reflecting the room. That's flashy but &lt;strong&gt;not present on a real car.&lt;/strong&gt; ON = the learned real material, OFF = a manual override.&lt;/li&gt;
&lt;li&gt;Note that material decomposition has an inherent &lt;strong&gt;ambiguity&lt;/strong&gt;: a blue car's albedo can decompose as yellowish (splitting blue into "blue light + yellow material"). This is normal in inverse rendering; the rendered result itself is clean, so the harm is small, but it's not a "physically perfect decomposition."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So "dull look = the material behaving correctly," not an error in the iOS renderer.&lt;/p&gt;

&lt;h1&gt;
  
  
  Result
&lt;/h1&gt;

&lt;p&gt;A reflective Ref-Gaussian scene, relit in real time on iPhone: switch and rotate the HDR environment and the reflections and skybox follow. The base is MetalSplatter, the relighting model is Ref-Gaussian, and split-sum IBL is from UE4.&lt;/p&gt;

&lt;p&gt;Code, demo, details: &lt;a href="https://github.com/john-rocky/MetalGaussianSplatRelighting" rel="noopener noreferrer"&gt;https://github.com/john-rocky/MetalGaussianSplatRelighting&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next: light the splats from the &lt;em&gt;actual&lt;/em&gt; environment via ARKit's environment probe — place a relightable object in your own room and reflect the room.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Implemented in Swift + Metal / iOS. Credits: &lt;a href="https://github.com/scier/MetalSplatter" rel="noopener noreferrer"&gt;MetalSplatter&lt;/a&gt; (Sean Cier, MIT), &lt;a href="https://github.com/fudan-zvg/ref-gaussian" rel="noopener noreferrer"&gt;Ref-Gaussian&lt;/a&gt;, HDRIs from &lt;a href="https://polyhaven.com" rel="noopener noreferrer"&gt;Poly Haven&lt;/a&gt; (CC0). Originally published in Japanese on &lt;a href="https://qiita.com/john-rocky/items/3c452476e66bdc29c290" rel="noopener noreferrer"&gt;Qiita&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ios</category>
      <category>metal</category>
      <category>graphics</category>
      <category>3d</category>
    </item>
    <item>
      <title>Adding UI to Google Colab: forms, sliders, buttons and more</title>
      <dc:creator>Daisuke Majima</dc:creator>
      <pubDate>Tue, 02 Jun 2026 06:11:49 +0000</pubDate>
      <link>https://dev.to/john-rocky/adding-ui-to-google-colab-forms-sliders-buttons-and-more-1ad2</link>
      <guid>https://dev.to/john-rocky/adding-ui-to-google-colab-forms-sliders-buttons-and-more-1ad2</guid>
      <description>&lt;h2&gt;
  
  
  Adding UI to Colab
&lt;/h2&gt;

&lt;p&gt;You can &lt;strong&gt;show UI&lt;/strong&gt; in a Google Colaboratory notebook. Input forms, buttons, and so on are handy when other people use your notebook. &lt;strong&gt;A form's value is reflected into the cell's variable.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://colab.research.google.com/drive/1mTkZQ0daeGyvVtHYoprznAP_tnYl2rJD?usp=sharing" rel="noopener noreferrer"&gt;&lt;strong&gt;Live Colab notebook sample&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3scwqbclptinvbcznnmk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3scwqbclptinvbcznnmk.png" width="600" height="870"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkoluqgr3sex1bx89qy2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkoluqgr3sex1bx89qy2.png" width="600" height="208"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Cell title
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#@title cell title
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ncxa8i2r0bpoubxxnki.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ncxa8i2r0bpoubxxnki.png" width="600" height="189"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Input form
&lt;/h3&gt;

&lt;p&gt;You can reflect the form's content into a variable in the cell.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;variable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;the form is reflected into the variable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;#@param {type:"string"}
# for a number: #@param {type:"number"}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyyfiz1n6pose0rwprxot.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyyfiz1n6pose0rwprxot.png" width="600" height="102"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Select box
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;dropdown&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;#@param ["1st option", "2nd option", "3rd option"] {allow-input: true}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fov0m7nutubtai8yzz2yp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fov0m7nutubtai8yzz2yp.png" width="600" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Date input
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;date_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2018-03-22&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;#@param {type:"date"}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foo2ta01vgj030xf8dhu1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foo2ta01vgj030xf8dhu1.png" width="600" height="126"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Slider
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;number_slider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt; &lt;span class="c1"&gt;#@param {type:"slider", min:-1, max:1, step:0.1}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwmvousjymvr4pyu8io33.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwmvousjymvr4pyu8io33.png" width="600" height="100"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Checkbox
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;boolean_checkbox&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt; &lt;span class="c1"&gt;#@param {type:"boolean"}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgl26xke8gzj1vjrmfcec.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgl26xke8gzj1vjrmfcec.png" width="600" height="150"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Markdown
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#@markdown ---
#@markdown #Big
#@markdown ###Middle
#@markdown #####Little
#@markdown ---
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2hfjkbo24y6fyr5qt3kv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2hfjkbo24y6fyr5qt3kv.png" width="600" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  A button via the DOM
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;IPython.display&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;display&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Javascript&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.colab&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.colab.output&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;eval_js&lt;/span&gt;

&lt;span class="n"&gt;js&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Javascript&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;
            async function load_image() {
                const div = document.createElement(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;);
                var button = document.createElement(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;button&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;);
                var log = document.createElement(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;);

                button.textContent = &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;button&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;;
                button.onclick = function(){
                    log.innerHTML = &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Button Clicked.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;;
                }
                div.appendChild(button)
                div.appendChild(log)

                document.querySelector(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#output-area&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;).appendChild(div);
                return
                } &lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;display&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;js&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;eval_js&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;load_image()&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft29lqs44yc7sxihtwnw3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft29lqs44yc7sxihtwnw3.png" width="358" height="124"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published in Japanese on &lt;a href="https://qiita.com/john-rocky/items/e5802cdd15dc2e34cb84" rel="noopener noreferrer"&gt;Qiita&lt;/a&gt;. I build apps with Core ML and write about machine learning. &lt;a href="https://github.com/john-rocky" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; / &lt;a href="https://twitter.com/JackdeS11" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>colab</category>
      <category>machinelearning</category>
      <category>jupyter</category>
    </item>
    <item>
      <title>Make a 3D model on iPhone just by taking photos (RealityKit Photogrammetry)</title>
      <dc:creator>Daisuke Majima</dc:creator>
      <pubDate>Tue, 02 Jun 2026 06:06:47 +0000</pubDate>
      <link>https://dev.to/john-rocky/make-a-3d-model-on-iphone-just-by-taking-photos-realitykit-photogrammetry-aj9</link>
      <guid>https://dev.to/john-rocky/make-a-3d-model-on-iphone-just-by-taking-photos-realitykit-photogrammetry-aj9</guid>
      <description>&lt;h2&gt;
  
  
  Make a realistic 3D model just by taking photos
&lt;/h2&gt;

&lt;p&gt;Using Apple's tool, you can easily create 3D models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhed94xik8okn5g5gg67l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhed94xik8okn5g5gg67l.png" width="200" height="433"&gt;&lt;/a&gt; &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqybgbfwm3sq5y3dioix.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqybgbfwm3sq5y3dioix.png" width="800" height="647"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xrcw58vbxqg7yq6flwo.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xrcw58vbxqg7yq6flwo.gif" width="324" height="662"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;A turtle, and the captured turtle.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  I want realistic 3D models
&lt;/h2&gt;

&lt;p&gt;If you have plenty of usable 3D models, you can use them in AR and game apps. But (in my experience) freely usable 3D models for AR/games are surprisingly scarce — on download sites, the nice content is often paid.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making 3D models seems hard
&lt;/h2&gt;

&lt;p&gt;And making them yourself — modeling and converting in 3D software — seems difficult.&lt;/p&gt;

&lt;h2&gt;
  
  
  With Apple's RealityKit, just taking photos makes a model
&lt;/h2&gt;

&lt;p&gt;In 2021 Apple released a tool that builds a 3D model just from photos. It's fairly easy and produces realistic objects. It recognizes the salient object and cleanly separates it from the background and floor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Method
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Take photos
&lt;/h3&gt;

&lt;p&gt;With a handheld camera (iPhone is fine), photograph the thing you want in 3D from every direction.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhed94xik8okn5g5gg67l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhed94xik8okn5g5gg67l.png" width="200" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I simply put a stuffed animal on the carpet and shot 360° from the side and from above, covering it like a hemisphere. The conditions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;in as bright a place as possible&lt;/li&gt;
&lt;li&gt;with the whole subject as large in frame as possible&lt;/li&gt;
&lt;li&gt;shooting frequently so consecutive photos overlap by 70%+&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I took 200 photos and gathered them into a folder on my Mac.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Download Apple's tool
&lt;/h3&gt;

&lt;p&gt;You can get it from the developer site:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developer.apple.com/documentation/realitykit/creating_a_photogrammetry_command-line_app/" rel="noopener noreferrer"&gt;https://developer.apple.com/documentation/realitykit/creating_a_photogrammetry_command-line_app/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Build it in Xcode and a &lt;code&gt;HelloPhotogrammetry&lt;/code&gt; file appears in the product folder; open it in Finder to find its location.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkzhn90v8rul2bnjmmudb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkzhn90v8rul2bnjmmudb.png" width="800" height="526"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Run
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&amp;lt;path to HelloPhotogrammetry&amp;gt; &amp;lt;input image folder&amp;gt; &amp;lt;output file path, with .usdz&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;* Absolute paths from root are required.&lt;/p&gt;

&lt;p&gt;This generates a USDZ model. With 200 photos it took about 20 minutes. Click the link below on an iPhone to try the turtle USDZ model in AR:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://firebasestorage.googleapis.com/v0/b/sincere-nirvana-292404.appspot.com/o/model.usdz?alt=media&amp;amp;token=96083418-ed43-435c-bce4-6fed085bbd7b" rel="noopener noreferrer"&gt;https://firebasestorage.googleapis.com/v0/b/sincere-nirvana-292404.appspot.com/o/model.usdz?alt=media&amp;amp;token=96083418-ed43-435c-bce4-6fed085bbd7b&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Toward richer 3D content
&lt;/h2&gt;

&lt;p&gt;Some people upload models made this way to 3D-model sharing sites. It'd be great if 3D models and content grew richer as lots of people scan lots of things.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published in Japanese on &lt;a href="https://qiita.com/john-rocky/items/5de42f382f0fce950093" rel="noopener noreferrer"&gt;Qiita&lt;/a&gt;. I build apps with Core ML and ARKit and write about ML/AR. &lt;a href="https://github.com/john-rocky" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; / &lt;a href="https://twitter.com/JackdeS11" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ios</category>
      <category>ar</category>
      <category>swift</category>
      <category>3d</category>
    </item>
  </channel>
</rss>
