I built an open-source, on-device photo culler for macOS in Kotlin

#kotlin #compose #opensource #machinelearning

If you have ever shot an event, the painful part is not the shoot. It is sitting
down afterwards with three thousand near-identical frames and deciding, one arrow
key at a time, which single frame of each burst is the keeper.

I am a Kotlin engineer, not a photographer. But I kept watching people I know
grind through this with tools that either cost a subscription or quietly upload
their clients' photos to some cloud. So I built Rhenium - a free,
open-source, 100% on-device photo culler for macOS. No account, no cloud, no
telemetry. The photos never leave your machine.

It is MIT-licensed on GitHub and built
entirely in Kotlin + Compose for Desktop. This post is the engineering side
of it - the decisions that worked, and the ones that bit me.

The core idea: one decision per moment, not per frame

A burst of 8 frames of the same moment should not be 8 decisions. Rhenium groups
near-identical frames into a single tile, suggests the sharpest one as the pick,
and lets you cull the whole moment with one keystroke. Everything is
keyboard-first, because culling is a flow state and reaching for the mouse breaks
it.

There are two grouping lenses:

Time - groups by capture time and camera (a classic burst detector).
Similarity - groups by what the frames actually look like, using an on-device vision model.
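To make the Time lens concrete, here is a minimal sketch of a classic burst detector. It assumes a hypothetical `Shot` type and a hypothetical 2-second gap; it illustrates the technique and is not Rhenium's code.

```kotlin
import java.time.Duration
import java.time.Instant

// Hypothetical photo model for this sketch; not Rhenium's actual type.
data class Shot(val id: String, val camera: String, val capturedAt: Instant)

/**
 * Time lens sketch: consecutive frames from the same camera whose capture
 * times are at most [maxGap] apart are treated as one burst.
 */
fun groupByTime(shots: List<Shot>, maxGap: Duration = Duration.ofSeconds(2)): List<List<Shot>> {
    val groups = mutableListOf<MutableList<Shot>>()
    // Sort so each camera's frames are contiguous and in capture order.
    val sorted = shots.sortedWith(compareBy<Shot>({ it.camera }, { it.capturedAt }))
    for (shot in sorted) {
        val current = groups.lastOrNull()
        val prev = current?.last()
        val sameBurst = prev != null &&
            prev.camera == shot.camera &&
            Duration.between(prev.capturedAt, shot.capturedAt) <= maxGap
        // Extend the open burst, or start a new one on a camera change or a long gap.
        if (sameBurst && current != null) current.add(shot) else groups.add(mutableListOf(shot))
    }
    return groups
}
```

Sorting by camera first keeps two shooters' interleaved bursts apart, at the cost of the groups no longer being in global capture order.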

On-device similarity, for real

The Similarity lens runs a MobileNetV3-Small backbone (classifier stripped)
through ONNX Runtime, entirely locally. The model is ~6 MB and ships inside
the app. For each photo I compute an embedding, cache it to disk (keyed by
content and model id, so edits and model swaps invalidate it), and group adjacent
frames whose embeddings are close.
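The embedding side can be sketched as follows, assuming embeddings arrive as `FloatArray`s in capture order and that "keyed by content" means a hash of the file bytes. `cosineDistance`, `groupAdjacent` and `embeddingCacheKey` are hypothetical names, and the fixed `threshold` parameter is only a placeholder for the adaptive rule described next.

```kotlin
import java.security.MessageDigest
import kotlin.math.sqrt

/** Cosine distance (1 - cosine similarity); a zero vector is treated as unrelated. */
fun cosineDistance(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "embedding sizes differ: ${a.size} vs ${b.size}" }
    var dot = 0.0
    var na = 0.0
    var nb = 0.0
    for (i in a.indices) {
        dot += a[i].toDouble() * b[i]
        na += a[i].toDouble() * a[i]
        nb += b[i].toDouble() * b[i]
    }
    if (na == 0.0 || nb == 0.0) return 1.0
    return 1.0 - dot / (sqrt(na) * sqrt(nb))
}

/** Starts a new group whenever two adjacent frames are farther apart than [threshold]. */
fun groupAdjacent(embeddings: List<FloatArray>, threshold: Double): List<List<Int>> {
    if (embeddings.isEmpty()) return emptyList()
    val groups = mutableListOf(mutableListOf(0))
    for (i in 1 until embeddings.size) {
        if (cosineDistance(embeddings[i - 1], embeddings[i]) <= threshold) groups.last().add(i)
        else groups.add(mutableListOf(i))
    }
    return groups
}

/** Hypothetical cache key: SHA-256 over file bytes plus model id, so an edit or a model swap changes it. */
fun embeddingCacheKey(content: ByteArray, modelId: String): String {
    val md = MessageDigest.getInstance("SHA-256")
    md.update(content)
    md.update(0.toByte()) // separator between the content and the model id
    md.update(modelId.toByteArray(Charsets.UTF_8))
    return md.digest().joinToString("") { "%02x".format(it) }
}
```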

Three things I learned the hard way here.

A single fixed similarity threshold does not work. A clean event with distinct shots and a
rapid-fire burst have completely different cosine-distance distributions, so one
global cutoff over-groups one and splits the other. The threshold is therefore adaptive
per event: it is derived from each contiguous run's own adjacent-pair distance spread,
behind a small ThresholdRule seam. On a labelled set from a real wedding, this raised F1
from 0.61 to 0.70, improving both precision and recall, and it stays unsupervised (it
reads only the cosine spread, never the labels).
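The exact formula behind ThresholdRule is not spelled out here, so the rule below is a swapped-in example of the idea: a robust outlier cut at the median plus k times the median absolute deviation (MAD) of one run's adjacent-pair distances. `MedianMadRule` and its `k`, `floor` and `ceiling` defaults are hypothetical.

```kotlin
import kotlin.math.abs

/** Seam: derives a cut threshold from one contiguous run's adjacent-pair distances. */
fun interface ThresholdRule {
    fun thresholdFor(adjacentDistances: List<Double>): Double
}

private fun median(xs: List<Double>): Double {
    val s = xs.sorted()
    val mid = s.size / 2
    return if (s.size % 2 == 1) s[mid] else (s[mid - 1] + s[mid]) / 2.0
}

/**
 * Hypothetical rule: a distance counts as a cut when it is an outlier relative
 * to the run's own spread (median + k * MAD), clamped to [floor, ceiling].
 */
class MedianMadRule(
    private val k: Double = 3.0,
    private val floor: Double = 0.05,
    private val ceiling: Double = 0.5,
) : ThresholdRule {
    override fun thresholdFor(adjacentDistances: List<Double>): Double {
        // A single-frame run has no pairs, so there is nothing to split.
        if (adjacentDistances.isEmpty()) return ceiling
        val med = median(adjacentDistances)
        val mad = median(adjacentDistances.map { abs(it - med) })
        return (med + k * mad).coerceIn(floor, ceiling)
    }
}
```

A tight burst yields a low cut and a loosely spaced event a higher one, which is the per-event adaptivity described above; because the rule only ever sees distances, it stays unsupervised.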

Capture time is good corroborating evidence but a bad primary signal. I added
a JoinRule that joins two frames when the visual cut clears, or when they were
shot within ~3 seconds of each other and clear a relaxed floor. That recovers same-moment
bursts whose embeddings drifted (a zoom or a framing shift) and that the visual cut alone
would split. The 3s window is deliberately tight: a wider one regressed the clean
events in leave-one-event-out validation. And time can only ever add a join, never
block one, so a photo with no EXIF time just falls back to visual-only.
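A sketch of that join logic, with hypothetical `PairEvidence` and `TimeCorroboratedJoin` types. It assumes "clearing a relaxed floor" means passing a looser distance cutoff, and models a missing EXIF time as `null`.

```kotlin
import java.time.Duration
import java.time.Instant

/** Evidence for one adjacent pair; a null time means the frame had no EXIF capture time. */
data class PairEvidence(val distance: Double, val timeA: Instant?, val timeB: Instant?)

fun interface JoinRule {
    fun shouldJoin(pair: PairEvidence, visualCut: Double): Boolean
}

/**
 * Joins on the visual cut alone, OR on a looser [relaxedCut] when both frames
 * were shot within [window] of each other.
 */
class TimeCorroboratedJoin(
    private val relaxedCut: Double,
    private val window: Duration = Duration.ofSeconds(3),
) : JoinRule {
    override fun shouldJoin(pair: PairEvidence, visualCut: Double): Boolean {
        if (pair.distance <= visualCut) return true
        // Without both timestamps, fall back to the visual decision above.
        val a = pair.timeA ?: return false
        val b = pair.timeB ?: return false
        val closeInTime = Duration.between(a, b).abs() <= window
        return closeInTime && pair.distance <= relaxedCut
    }
}
```

The early returns on missing timestamps are what make time additive: it can turn a "no" into a "yes", never the reverse.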

Do not reuse the embedding decode for sharpness. Variance-of-Laplacian (the
sharpness metric) is per-pixel, so scoring it on the tiny 224px embedding decode
hid real focus differences. Sharpness gets its own 768px canvas. Sharing the
decode was a "clever" optimisation that silently picked blurry keepers.
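Variance-of-Laplacian is short enough to show in full. This sketch assumes a row-major grayscale buffer already decoded at sharpness resolution and uses the common 4-neighbour kernel; it is a textbook version, not Rhenium's implementation.

```kotlin
/**
 * Variance of the 4-neighbour Laplacian over a row-major grayscale image.
 * Higher values mean more high-frequency detail, i.e. sharper focus.
 */
fun laplacianVariance(gray: DoubleArray, width: Int, height: Int): Double {
    require(gray.size == width * height) { "expected ${width * height} pixels, got ${gray.size}" }
    if (width < 3 || height < 3) return 0.0
    val n = (width - 2) * (height - 2)
    var sum = 0.0
    var sumSq = 0.0
    // Border pixels are skipped so every pixel visited has four neighbours.
    for (y in 1 until height - 1) {
        for (x in 1 until width - 1) {
            val i = y * width + x
            // Kernel [0 1 0; 1 -4 1; 0 1 0]
            val lap = gray[i - 1] + gray[i + 1] + gray[i - width] + gray[i + width] - 4 * gray[i]
            sum += lap
            sumSq += lap * lap
        }
    }
    val mean = sum / n
    return sumSq / n - mean * mean
}
```

Downscaling averages neighbouring pixels, acting as a low-pass filter that removes exactly the high-frequency energy this metric measures. That is why a 224px decode can make a soft frame and a sharp one score alike.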

The platform reality: HEIC, RAW, and a JNA bridge

Here is one that surprised me: skiko (Compose's Skia binding) cannot decode
HEIC/HEIF. I verified it with a probe: Image.makeFromEncoded just throws. And
there is no maintained, cross-platform JVM HEIC library on Maven that I could find
(the commonly cited org.bytedeco:libheif does not exist; FFmpeg was rejected for
DMG bloat).

So HEIC, and camera RAW, decode through a JNA bridge into macOS's own ImageIO
frameworks. It sits behind a PhotoDecoder interface and is registered only on
macOS, so a future Windows build slots its own decoder in without touching
callers. One trap I had to document in the code: RAW must decode by file path, not
from a byte buffer. Hand Sony ARW bytes and you get an empty image; hand Nikon NEF
bytes and it silently downgrades to the embedded thumbnail.
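The decoder seam might look roughly like this. `DecodedImage`, `DecoderRegistry` and `buildRegistry` are hypothetical, and the actual JNA calls into ImageIO are omitted. The one detail carried over from the text is that `decode` takes a `Path`, never bytes.

```kotlin
import java.nio.file.Path

/** Hypothetical decoded-image handle for this sketch. */
class DecodedImage(val width: Int, val height: Int)

/** Seam between callers and platform-specific decoders. */
interface PhotoDecoder {
    fun canDecode(path: Path): Boolean

    // Takes a file path, not a byte buffer: RAW formats must decode from the path.
    fun decode(path: Path): DecodedImage?
}

/** Callers ask the registry; the first decoder that accepts the file wins. */
class DecoderRegistry(private val decoders: List<PhotoDecoder>) {
    fun decoderFor(path: Path): PhotoDecoder? = decoders.firstOrNull { it.canDecode(path) }
}

/** Registers the macOS-only decoder only when running on macOS. */
fun buildRegistry(
    mac: PhotoDecoder,
    portable: PhotoDecoder,
    osName: String = System.getProperty("os.name") ?: "",
): DecoderRegistry {
    val isMac = osName.startsWith("Mac", ignoreCase = true)
    return DecoderRegistry(if (isMac) listOf(mac, portable) else listOf(portable))
}
```

Because callers only ever go through the registry, a Windows decoder would be one more list entry rather than a change at every call site.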

This is also why Rhenium is macOS-only today. The decoders lean on Apple
frameworks. Windows support is on the roadmap, but it is real work, not a flag.

Packaging a JVM desktop app without it being 200 MB

Shipping a JVM app as a real native bundle is jpackage plus a trimmed jlink
runtime. Three things that cost me time.

A module you only reach via reflection must be in the jlink module list. The
update checker uses HttpClient (java.net.http). It worked under gradle run
and threw ClassNotFoundException in the packaged app, because the trimmed runtime
did not include the module. Invisible until you test the packaged app.
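In Compose for Desktop the jlink list lives in `nativeDistributions { modules(...) }`, so the fix itself is a line like `modules("java.net.http")`. A cheap guard against the next missing module is a startup check such as the sketch below; `missingModules` is a hypothetical helper built on the standard `ModuleLayer` API.

```kotlin
/**
 * Returns the JDK modules from [required] that are absent from the running
 * (possibly jlink-trimmed) runtime's boot layer.
 */
fun missingModules(required: List<String>): List<String> =
    required.filterNot { ModuleLayer.boot().findModule(it).isPresent }
```

Run inside the packaged app (for example behind a debug flag), it names a trimmed-away module up front instead of waiting for the first `ClassNotFoundException`.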

ProGuard keep-rules are load-bearing. The release DMG is minified, which
tree-shakes the whole classpath. Anything reached only via reflection, JNI or
codegen - ONNX's native bindings, the JNA bridge, kotlinx.serialization's generated
serializers - survives only because a keep rule says so. A missing keep rule builds
clean and breaks at runtime. The ONNX path even fails silently (it falls back to a classical
embedder), so I validate keep rules against the packaged app, not gradle run.
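One way to make those silent failures loud is a smoke check, run inside the packaged app, that tries to load the classes you expect to survive minification. `missingClasses` is a hypothetical helper; which class names matter depends on your own keep rules.

```kotlin
/**
 * Returns the class names from [names] that cannot be loaded, e.g. because
 * the minifier removed them. Classes are not initialised by the probe.
 */
fun missingClasses(
    names: List<String>,
    loader: ClassLoader = ClassLoader.getSystemClassLoader(),
): List<String> =
    names.filter { name ->
        // runCatching also covers LinkageErrors such as NoClassDefFoundError.
        runCatching { Class.forName(name, false, loader) }.isFailure
    }
```

For example, `missingClasses(listOf("ai.onnxruntime.OrtEnvironment", "com.sun.jna.Native"))` should return an empty list in a correctly kept build. It only proves the classes survived, not their members, so it complements exercising the real code path rather than replacing it.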

The ONNX Runtime jar is fat. The published artifact bundles every platform's
natives plus debug symbols. I added a Gradle task that repackages it down to just
the macOS dylibs before it goes into the DMG, which is a big size win.
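A sketch of the repackaging step, assuming the ONNX Runtime jar stores its natives under `ai/onnxruntime/native/<os>-<arch>/` with macOS directories prefixed `osx`; check that against the jar version you ship. `keepEntry` and `slimJar` are hypothetical, the Gradle task wiring is omitted, and this version filters by platform only, so it does not strip debug symbols inside the macOS directories.

```kotlin
import java.io.File
import java.util.zip.ZipEntry
import java.util.zip.ZipFile
import java.util.zip.ZipOutputStream

private const val NATIVE_PREFIX = "ai/onnxruntime/native/"

/** Keeps every non-native entry, plus natives whose platform directory starts with "osx". */
fun keepEntry(name: String): Boolean {
    if (!name.startsWith(NATIVE_PREFIX)) return true
    val platform = name.removePrefix(NATIVE_PREFIX).substringBefore('/')
    // The bare "native/" directory entry itself has an empty platform segment.
    return platform.isEmpty() || platform.startsWith("osx")
}

/** Copies [input] to [output], dropping entries that [keepEntry] rejects. */
fun slimJar(input: File, output: File) {
    ZipFile(input).use { zip ->
        ZipOutputStream(output.outputStream()).use { out ->
            for (entry in zip.entries()) {
                if (!keepEntry(entry.name)) continue
                out.putNextEntry(ZipEntry(entry.name))
                zip.getInputStream(entry).use { it.copyTo(out) }
                out.closeEntry()
            }
        }
    }
}
```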

A few architecture notes

Clean architecture, single module. domain (entities and use cases, no framework deps) -> data (decoders, repositories, the ML pipeline) -> presentation (Compose and view models). Manual DI, no framework, one AppContainer wires everything.
An Atomic Design system (atoms/molecules/organisms plus design tokens) so screens compose from shared pieces instead of inlining literals.
Value classes for index spaces. The grid juggles flat photo indices (navigation and persistence) and tile indices (focus and selection). Mixing them up was a recurring bug, so they are now distinct @JvmInline value classes - FlatIndex and TileIndex - and mixing them is a compile error.
Headless screenshot tests. There is no Layout Inspector on desktop, so UI changes are verified by rendering Compose to a bitmap and eyeballing the PNG. It catches ContentScale and bitmap-conversion bugs that pixel asserts on intermediate buffers miss entirely.
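A minimal sketch of the index-space split, with a hypothetical `Grid` that maps between the two spaces. It assumes tiles cover contiguous runs of flat indices, recorded as sorted start offsets.

```kotlin
/** Index into the flat, ungrouped photo list (navigation and persistence). */
@JvmInline
value class FlatIndex(val value: Int)

/** Index into the grouped tile list (focus and selection). */
@JvmInline
value class TileIndex(val value: Int)

/** Hypothetical grid: tile t covers flat indices from tileStarts[t] up to the next start. */
class Grid(private val tileStarts: IntArray, private val photoCount: Int) {
    init {
        require(tileStarts.isNotEmpty() && tileStarts[0] == 0) { "first tile must start at photo 0" }
    }

    fun tileOf(photo: FlatIndex): TileIndex {
        require(photo.value in 0 until photoCount) { "photo index out of range: ${photo.value}" }
        // binarySearch returns the match, or -(insertionPoint) - 1 when absent;
        // the owning tile is the last one whose start is <= the photo index.
        val r = tileStarts.binarySearch(photo.value)
        return TileIndex(if (r >= 0) r else -r - 2)
    }

    fun firstPhotoOf(tile: TileIndex): FlatIndex = FlatIndex(tileStarts[tile.value])
}
```

Passing the wrong space, as in `grid.firstPhotoOf(FlatIndex(1))`, no longer compiles, while `@JvmInline` keeps both types as plain `Int`s wherever the compiler can unbox them (they are boxed in generic or nullable positions).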

The honest part

Two caveats I would rather you hear from me.

It is macOS-only for now (the HEIC/RAW decoders above).

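It is not notarized yet. I do not have a paid Apple Developer Program membership ($99/year),
so on first launch macOS will block it. Right-click the app and choose Open (on macOS 15
and later, approve it under System Settings -> Privacy & Security instead), or install via
the Homebrew cask, which handles it more smoothly. I have put up GitHub Sponsors specifically to fund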
notarization so the next release installs in one click. If that is useful to you,
it directly buys everyone a cleaner install.

Try it, or tear it apart

Site and download: https://vgupta98.github.io/rhenium/
Source (MIT): https://github.com/vgupta98/rhenium
Install via Homebrew: brew install --cask vgupta98/tap/rhenium

This is my first real launch, so I would genuinely like feedback - on the culling
UX, the grouping quality, the on-device ML choices, or the packaging. The repo is
open. Issues and stars are both welcome, and I am happy to answer anything in the
comments.