<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vishal Gupta</title>
    <description>The latest articles on DEV Community by Vishal Gupta (@vgupta98).</description>
    <link>https://dev.to/vgupta98</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4006559%2Fcec49528-ff86-4d1c-8a36-62b4deb565d5.jpg</url>
      <title>DEV Community: Vishal Gupta</title>
      <link>https://dev.to/vgupta98</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vgupta98"/>
    <language>en</language>
    <item>
      <title>I built an open-source, on-device photo culler for macOS in Kotlin</title>
      <dc:creator>Vishal Gupta</dc:creator>
      <pubDate>Sun, 28 Jun 2026 16:11:24 +0000</pubDate>
      <link>https://dev.to/vgupta98/i-built-an-open-source-on-device-photo-culler-for-macos-in-kotlin-473j</link>
      <guid>https://dev.to/vgupta98/i-built-an-open-source-on-device-photo-culler-for-macos-in-kotlin-473j</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Flxe3muvfsikxkie50yq6.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Flxe3muvfsikxkie50yq6.gif" alt="A preview of the app" width="600" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you have ever shot an event, the painful part is not the shoot. It is sitting
down afterwards with three thousand near-identical frames and deciding, one arrow
key at a time, which single frame of each burst is the keeper.&lt;/p&gt;

&lt;p&gt;I am a Kotlin engineer, not a photographer. But I kept watching people I know
grind through this with tools that either cost a subscription or quietly upload
their clients' photos to some cloud. So I built &lt;strong&gt;Rhenium&lt;/strong&gt; - a free,
open-source, &lt;strong&gt;100% on-device&lt;/strong&gt; photo culler for macOS. No account, no cloud, no
telemetry. The photos never leave your machine.&lt;/p&gt;

&lt;p&gt;It is &lt;a href="https://github.com/vgupta98/rhenium" rel="noopener noreferrer"&gt;MIT-licensed on GitHub&lt;/a&gt; and built
entirely in &lt;strong&gt;Kotlin + Compose for Desktop&lt;/strong&gt;. This post is the engineering side
of it - the decisions that worked, and the ones that bit me.&lt;/p&gt;

&lt;h2&gt;The core idea: one decision per moment, not per frame&lt;/h2&gt;

&lt;p&gt;A burst of 8 frames of the same moment should not be 8 decisions. Rhenium groups
near-identical frames into a single tile, suggests the sharpest one as the pick,
and lets you cull the whole moment with one keystroke. Everything is
keyboard-first, because culling is a flow state and reaching for the mouse breaks
it.&lt;/p&gt;

&lt;p&gt;There are two grouping lenses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time&lt;/strong&gt; - groups by capture time and camera (a classic burst detector).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Similarity&lt;/strong&gt; - groups by what the frames actually look like, using an
on-device vision model.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;On-device similarity, for real&lt;/h2&gt;

&lt;p&gt;The Similarity lens runs a &lt;strong&gt;MobileNetV3-Small&lt;/strong&gt; backbone (classifier stripped)
through &lt;strong&gt;ONNX Runtime&lt;/strong&gt;, entirely locally. The model is ~6 MB and ships inside
the app. For each photo I compute an embedding, cache it to disk (keyed by
content and model id, so edits and model swaps invalidate it), and group adjacent
frames whose embeddings are close.&lt;/p&gt;

&lt;p&gt;Three things I learned the hard way here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A fixed similarity threshold is bad.&lt;/strong&gt; A clean event with distinct shots and a
rapid-fire burst have completely different cosine-distance distributions. A single
global cutoff over-groups one and splits the other. So the threshold is adaptive
per event - derived from each contiguous run's own adjacent-pair distance spread,
behind a small &lt;code&gt;ThresholdRule&lt;/code&gt; seam. On a labelled real-wedding set this moved F1
from 0.61 to 0.70, with both precision and recall improving, and it stays
unsupervised (it reads only the cosine spread, never the labels).&lt;/p&gt;
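&lt;p&gt;To make the idea concrete, here is a minimal sketch of a spread-derived cut. It is not the code in the repo: the post only names the &lt;code&gt;ThresholdRule&lt;/code&gt; seam, so the method shape, the &lt;code&gt;SpreadThresholdRule&lt;/code&gt; class, the &lt;code&gt;k&lt;/code&gt; multiplier and the "median plus k times the median absolute deviation" formula are all assumptions chosen for illustration.&lt;/p&gt;

```kotlin
import kotlin.math.abs
import kotlin.math.sqrt

/** Cosine distance (1 - cosine similarity) between two embeddings of equal length. */
fun cosineDistance(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "embeddings must have the same length" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    val denom = sqrt(normA) * sqrt(normB)
    // A zero vector has no direction; treat the pair as unrelated.
    if (denom == 0.0) return 1.0
    return (1.0 - dot / denom).coerceIn(0.0, 2.0)
}

/** Median of a non-empty array (mean of the two middle values for even sizes). */
fun median(values: DoubleArray): Double {
    require(values.isNotEmpty()) { "median of an empty array is undefined" }
    val sorted = values.sortedArray()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2.0
}

/** Assumed shape of the seam: one run's adjacent-pair distances in, one cut out. */
fun interface ThresholdRule {
    fun cutFor(adjacentDistances: DoubleArray): Double
}

/** Assumed rule: median + k * MAD, so the cut follows each run's own spread. */
class SpreadThresholdRule(private val k: Double = 1.0) : ThresholdRule {
    override fun cutFor(adjacentDistances: DoubleArray): Double {
        val center = median(adjacentDistances)
        val deviations = DoubleArray(adjacentDistances.size) { abs(adjacentDistances[it] - center) }
        return center + k * median(deviations)
    }
}
```

&lt;p&gt;Because the rule only ever sees the distances of the run it is cutting, a tight burst and a spread-out event each get their own cut, and no labels are involved.&lt;/p&gt;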

&lt;p&gt;&lt;strong&gt;Capture time is good corroborating evidence, but a bad primary signal.&lt;/strong&gt; I added
a &lt;code&gt;JoinRule&lt;/code&gt; that joins two frames when they clear the visual cut, or when they were
shot within ~3 seconds of each other and clear a relaxed floor. That recovers same-moment
bursts whose embeddings drifted (a zoom or framing shift) and that the visual cut alone
would split. The 3s window is deliberately tight - a wider one regressed the clean
events in leave-one-event-out validation. And time can only ever add a join, never
block one, so a photo with no EXIF time just falls back to visual-only.&lt;/p&gt;
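&lt;p&gt;The "time can add a join but never block one" property is easiest to see in code. This is a sketch under assumptions, not the real class: &lt;code&gt;PairEvidence&lt;/code&gt;, its field names, and expressing the relaxed floor as a looser cosine-distance cut (&lt;code&gt;relaxedCut&lt;/code&gt;) are my own illustration.&lt;/p&gt;

```kotlin
/** Illustrative evidence for one adjacent pair of frames; names are assumptions. */
data class PairEvidence(
    val visualDistance: Double, // cosine distance between the two embeddings
    val secondsApart: Double?,  // absolute capture-time gap, null if either frame lacks EXIF time
)

/** Sketch of a join rule where capture time can only add a join. */
class JoinRule(
    private val visualCut: Double,
    private val relaxedCut: Double, // looser cut, used only when time corroborates
    private val windowSeconds: Double = 3.0,
) {
    init {
        require(visualCut in 0.0..relaxedCut) { "relaxedCut must be at least visualCut" }
    }

    fun shouldJoin(pair: PairEvidence): Boolean {
        // The visual decision alone is always enough to join.
        if (pair.visualDistance in 0.0..visualCut) return true
        // No capture time: nothing can corroborate, so stay visual-only.
        val gap = pair.secondsApart ?: return false
        if (gap !in 0.0..windowSeconds) return false
        // Close in time: accept a somewhat larger visual distance.
        return pair.visualDistance in 0.0..relaxedCut
    }
}
```

&lt;p&gt;Every path that consults the time gap starts from a pair the visual cut already rejected, so a missing or distant timestamp can never undo a visual join.&lt;/p&gt;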

&lt;p&gt;&lt;strong&gt;Do not reuse the embedding decode for sharpness.&lt;/strong&gt; Variance-of-Laplacian (the
sharpness metric) is per-pixel, so scoring it on the tiny 224px embedding decode
hid real focus differences. Sharpness gets its own 768px canvas. Sharing the
decode was a "clever" optimisation that silently picked blurry keepers.&lt;/p&gt;
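&lt;p&gt;For reference, this is roughly what variance-of-Laplacian looks like on a grayscale buffer. It is a generic sketch using the standard 4-neighbour kernel; the function name and the row-major &lt;code&gt;FloatArray&lt;/code&gt; input are assumptions, not Rhenium's actual signature.&lt;/p&gt;

```kotlin
/**
 * Variance of the 4-neighbour Laplacian over a row-major grayscale image.
 * Higher values mean more high-frequency detail, which usually means a sharper frame.
 */
fun varianceOfLaplacian(gray: FloatArray, width: Int, height: Int): Double {
    require(gray.size == width * height) { "gray must hold width * height samples" }
    var sum = 0.0
    var sumSq = 0.0
    var n = 0
    // Skip the one-pixel border so every sample has all four neighbours.
    for (y in 1 until height - 1) {
        for (x in 1 until width - 1) {
            val i = y * width + x
            val lap = gray[i - 1] + gray[i + 1] + gray[i - width] + gray[i + width] - 4f * gray[i]
            sum += lap
            sumSq += lap.toDouble() * lap
            n++
        }
    }
    if (n == 0) return 0.0 // image too small to have interior pixels
    val mean = sum / n
    return sumSq / n - mean * mean
}
```

&lt;p&gt;Every term is a difference between neighbouring pixels, which is exactly the detail a heavy downscale averages away. That is why the score needs a larger canvas than the embedding does.&lt;/p&gt;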

&lt;h2&gt;The platform reality: HEIC, RAW, and a JNA bridge&lt;/h2&gt;

&lt;p&gt;Here is one that surprised me: &lt;strong&gt;skiko (Compose's Skia binding) cannot decode
HEIC/HEIF.&lt;/strong&gt; I verified it by probe - &lt;code&gt;Image.makeFromEncoded&lt;/code&gt; just throws. And
there is no maintained, cross-platform JVM HEIC library on Maven that I could find
(the commonly cited &lt;code&gt;org.bytedeco:libheif&lt;/code&gt; does not exist; FFmpeg was rejected for
DMG bloat).&lt;/p&gt;

&lt;p&gt;So HEIC, and camera RAW, decode through a &lt;strong&gt;JNA bridge into macOS's own ImageIO
frameworks&lt;/strong&gt;. It sits behind a &lt;code&gt;PhotoDecoder&lt;/code&gt; interface and is registered only on
macOS, so a future Windows build slots its own decoder in without touching
callers. One trap I had to document in the code: RAW must decode by file path, not
from a byte buffer. Hand Sony ARW bytes and you get an empty image; hand Nikon NEF
bytes and it silently downgrades to the embedded thumbnail.&lt;/p&gt;
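&lt;p&gt;The shape of that seam, sketched under assumptions: only the &lt;code&gt;PhotoDecoder&lt;/code&gt; name comes from the project. The method signatures, &lt;code&gt;DecodedImage&lt;/code&gt;, &lt;code&gt;DecoderRegistry&lt;/code&gt;, &lt;code&gt;buildRegistry&lt;/code&gt; and the stub decoder (which stands in for the real JNA/ImageIO bridge and calls no native code) are hypothetical.&lt;/p&gt;

```kotlin
import java.nio.file.Path

/** Decoded pixels; a stand-in for whatever bitmap type the app really uses. */
class DecodedImage(val width: Int, val height: Int, val argb: IntArray)

/** Decoders take a file path, never a byte buffer, so RAW files decode at full size. */
interface PhotoDecoder {
    fun canDecode(path: Path): Boolean
    fun decode(path: Path): DecodedImage
}

/** Hypothetical registry: the first decoder that claims the file wins. */
class DecoderRegistry(private vararg val decoders: PhotoDecoder) {
    fun decode(path: Path): DecodedImage? =
        decoders.firstOrNull { it.canDecode(path) }?.decode(path)
}

/** Stub for the macOS bridge; returns a 1x1 image instead of calling ImageIO. */
class StubMacDecoder : PhotoDecoder {
    private val extensions = setOf("heic", "heif", "arw", "nef")

    override fun canDecode(path: Path): Boolean {
        val name = path.fileName?.toString() ?: ""
        return name.substringAfterLast('.', "").lowercase() in extensions
    }

    override fun decode(path: Path): DecodedImage = DecodedImage(1, 1, intArrayOf(0))
}

/** Registers the macOS decoder only when the JVM reports a macOS host. */
fun buildRegistry(osName: String = System.getProperty("os.name")): DecoderRegistry =
    if (osName.lowercase().startsWith("mac")) DecoderRegistry(StubMacDecoder()) else DecoderRegistry()
```

&lt;p&gt;Callers only ever see the registry, so adding a Windows decoder later means one more branch in &lt;code&gt;buildRegistry&lt;/code&gt; rather than changes at every call site.&lt;/p&gt;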

&lt;p&gt;This is also why Rhenium is macOS-only today. The decoders lean on Apple
frameworks. Windows support is on the roadmap, but it is real work, not a flag.&lt;/p&gt;

&lt;h2&gt;Packaging a JVM desktop app without it being 200 MB&lt;/h2&gt;

&lt;p&gt;Shipping a JVM app as a real native bundle is &lt;code&gt;jpackage&lt;/code&gt; plus a trimmed &lt;code&gt;jlink&lt;/code&gt;
runtime. Three things that cost me time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A module you only reach via reflection must be in the &lt;code&gt;jlink&lt;/code&gt; module list.&lt;/strong&gt; The
update checker uses &lt;code&gt;HttpClient&lt;/code&gt; (&lt;code&gt;java.net.http&lt;/code&gt;). It worked under &lt;code&gt;gradle run&lt;/code&gt;
and threw &lt;code&gt;ClassNotFoundException&lt;/code&gt; in the packaged app, because the trimmed runtime
did not include the module. Invisible until you test the packaged app.&lt;/p&gt;
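&lt;p&gt;If the build uses the Compose Multiplatform Gradle plugin (an assumption here, since the post only says &lt;code&gt;jpackage&lt;/code&gt; plus &lt;code&gt;jlink&lt;/code&gt;), the module list lives in the native distribution block. A minimal &lt;code&gt;build.gradle.kts&lt;/code&gt; fragment:&lt;/p&gt;

```kotlin
// build.gradle.kts fragment, assuming the Compose Multiplatform Gradle plugin is applied
compose.desktop {
    application {
        nativeDistributions {
            // Modules the trimmed runtime must contain, in addition to the plugin's
            // defaults. Anything resolved only at runtime has to be listed by hand.
            modules("java.net.http")
        }
    }
}
```

&lt;p&gt;The plugin's &lt;code&gt;suggestRuntimeModules&lt;/code&gt; task can propose a starting list, but it is worth treating that as a hint and still launching the packaged app to confirm.&lt;/p&gt;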

&lt;p&gt;&lt;strong&gt;ProGuard keep-rules are load-bearing.&lt;/strong&gt; The release DMG is minified, which
tree-shakes the whole classpath. Anything reached only via reflection, JNI or
codegen - ONNX's native bindings, the JNA bridge, kotlinx.serialization's generated
serializers - survives only because a keep rule says so. A missing keep builds
clean and breaks at runtime. ONNX even fails silently (it falls back to a classical
embedder), so I validate keeps against the packaged app, not &lt;code&gt;gradle run&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The ONNX Runtime jar is fat.&lt;/strong&gt; The published artifact bundles every platform's
natives plus debug symbols. I added a Gradle task that repackages it down to just
the macOS dylibs before it goes into the DMG, which is a big size win.&lt;/p&gt;

&lt;h2&gt;A few architecture notes&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Clean architecture, single module.&lt;/strong&gt; &lt;code&gt;domain&lt;/code&gt; (entities and use cases, no
framework deps) -&amp;gt; &lt;code&gt;data&lt;/code&gt; (decoders, repositories, the ML pipeline) -&amp;gt;
&lt;code&gt;presentation&lt;/code&gt; (Compose and view models). Manual DI, no framework, one
&lt;code&gt;AppContainer&lt;/code&gt; wires everything.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An Atomic-Design system&lt;/strong&gt; (atoms/molecules/organisms plus design tokens) so
screens compose from shared pieces instead of inlining literals.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Value classes for index spaces.&lt;/strong&gt; The grid juggles flat photo indices
(navigation and persistence) and tile indices (focus and selection). Mixing them
up was a recurring bug, so they are now distinct &lt;code&gt;@JvmInline value class&lt;/code&gt;es -
&lt;code&gt;FlatIndex&lt;/code&gt; and &lt;code&gt;TileIndex&lt;/code&gt; - and mixing them is a compile error.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Headless screenshot tests.&lt;/strong&gt; There is no Layout Inspector on desktop, so UI
changes are verified by rendering Compose to a bitmap and eyeballing the PNG. It
catches &lt;code&gt;ContentScale&lt;/code&gt; and bitmap-conversion bugs that pixel asserts on
intermediate buffers miss entirely.&lt;/li&gt;
&lt;/ul&gt;
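&lt;p&gt;The index-space point from the list above, as a small sketch. &lt;code&gt;FlatIndex&lt;/code&gt; and &lt;code&gt;TileIndex&lt;/code&gt; are the names used in the project; the &lt;code&gt;TileLayout&lt;/code&gt; class and its methods are hypothetical, written only to show the two spaces meeting.&lt;/p&gt;

```kotlin
/** Position of a photo in the flat, ungrouped list. */
@JvmInline
value class FlatIndex(val value: Int)

/** Position of a tile (one group of photos) in the grid. */
@JvmInline
value class TileIndex(val value: Int)

/**
 * Hypothetical layout: tileStarts[t] is the flat index of the first photo in tile t,
 * in ascending order and starting at 0.
 */
class TileLayout(private val tileStarts: IntArray, private val photoCount: Int) {
    fun firstPhotoOf(tile: TileIndex): FlatIndex = FlatIndex(tileStarts[tile.value])

    fun tileOf(photo: FlatIndex): TileIndex {
        require(photo.value in 0 until photoCount) { "photo index out of range" }
        // The owning tile is the last one whose start is at or before the photo.
        return TileIndex(tileStarts.indexOfLast { it in 0..photo.value })
    }
}
```

&lt;p&gt;With this in place, passing a &lt;code&gt;FlatIndex&lt;/code&gt; to &lt;code&gt;firstPhotoOf&lt;/code&gt; is a type mismatch at compile time, while at runtime both wrappers are normally represented as plain &lt;code&gt;Int&lt;/code&gt; values.&lt;/p&gt;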

&lt;h2&gt;The honest part&lt;/h2&gt;

&lt;p&gt;Two caveats I would rather you hear from me.&lt;/p&gt;

&lt;p&gt;It is &lt;strong&gt;macOS-only for now&lt;/strong&gt; (the HEIC/RAW decoders above).&lt;/p&gt;

&lt;p&gt;It is &lt;strong&gt;not notarized yet.&lt;/strong&gt; I do not have an Apple Developer account ($99/year),
so on first launch macOS will block it. You right-click -&amp;gt; Open, or the Homebrew
cask handles it more smoothly. I have put up GitHub Sponsors specifically to fund
notarization so the next release installs in one click. If that is useful to you,
it directly buys everyone a cleaner install.&lt;/p&gt;

&lt;h2&gt;Try it, or tear it apart&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Site and download:&lt;/strong&gt; &lt;a href="https://vgupta98.github.io/rhenium/" rel="noopener noreferrer"&gt;https://vgupta98.github.io/rhenium/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Source (MIT):&lt;/strong&gt; &lt;a href="https://github.com/vgupta98/rhenium" rel="noopener noreferrer"&gt;https://github.com/vgupta98/rhenium&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install via Homebrew:&lt;/strong&gt; &lt;code&gt;brew install --cask vgupta98/tap/rhenium&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is my first real launch, so I would genuinely like feedback - on the culling
UX, the grouping quality, the on-device ML choices, or the packaging. The repo is
open. Issues and stars are both welcome, and I am happy to answer anything in the
comments.&lt;/p&gt;

</description>
      <category>kotlin</category>
      <category>compose</category>
      <category>opensource</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
