
ARmedia


Building a Smart Refrigerator with a $15 IoT Camera and SAM 3: Solving the "Warm Water" Problem

I've been battling with AI (Claude) for 14 hours a day. Couldn't be happier.

— Akio Shiki (@ar_akio) October 20, 2025

Black Friday Haul and the Warm Water Problem

Hi, I'm Akio, an engineer at an AI startup.

So, how did everyone spend Amazon Black Friday this year? As usual, I bulk-ordered sparkling water and bottled drinks purely because they were "on sale," and now my entryway is buried under a tower of cardboard boxes.

And then comes the inevitable: "I forgot to move them to the fridge, and now I'm stuck drinking warm water."

"Can technology solve this?" ...Half joking, half serious.

This time, as a PoC (Proof of Concept) with an eye toward future business applications, I built an object recognition system combining the ultra-cheap IoT microcontroller "ESP32" with Meta's latest model "SAM 3 (Segment Anything Model 3)."

Mountains of bottled water still sitting in cardboard boxes at the office

Why "water bottles in a refrigerator"? Because hidden within this seemingly trivial problem are technical challenges that any AI engineer can appreciate.

Why "Refrigerator × Water Bottles" Is the Perfect PoC

This might seem like a joke topic, but from a computer vision (CV) perspective, the inside of a refrigerator is an S-rank difficulty dungeon.

Here are the three reasons I chose this as my PoC target. These same challenges apply directly to industrial robotics and autonomous vehicles:

  1. The Difficulty of Transparent/Translucent Objects: Water bottles are transparent. Not only does the background show through, but the interior lighting also creates complex reflections on them. Traditional CNN-based object detection often fails to capture their contours and treats the bottles as part of the background.

  2. The "Transparent Shelf" Trap: Modern refrigerators have transparent plastic or glass shelves. Even powerful segmentation models like SAM can misidentify shelf edges as "object boundaries," or detect objects on lower shelves by seeing through the transparent shelf above.

  3. Challenging Lighting Conditions: The back of the fridge is dark; the front is bright. On top of the extreme contrast, bottle shadows fall on transparent shelves, and the system might mistake those "shadows" for actual objects.

The actual inside of the refrigerator

In other words: "If we can accurately segment water bottles in this harsh environment, most object recognition tasks in offices or factories will be a breeze."

System Architecture: $15 Edge Device Meets State-of-the-Art AI

The setup is extremely simple and low-cost:

  • Edge (Eyes): ESP32S3-CAM
  • Brain: Local PC server (Ryzen AI MAX+ 395) running Meta SAM 3
  • Network: WiFi (HTTP POST)

The ESP32 itself doesn't have the horsepower to run SAM 3. It functions purely as a capture device, sending images to the server. The server handles inference and returns results (inventory count, mask images, etc.).
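To make the division of labor concrete, here is a minimal sketch of what the server side could look like, using only Python's standard library. It is not the code from this PoC: the names `build_response`, `placeholder_segmenter`, and `CaptureHandler`, the port 8000, the JSON field `bottle_count`, and the list-of-rows mask format are all assumptions made for illustration, and the SAM 3 call is deliberately left as a stub because the article does not show it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, List

# Hypothetical mask format: one mask = rows of 0/1 values.
Mask = List[List[int]]
Segmenter = Callable[[bytes], List[Mask]]


def placeholder_segmenter(jpeg_bytes: bytes) -> List[Mask]:
    """Stand-in for SAM 3 inference (not shown in the article).

    Replace with a real model call; this stub finds no objects.
    """
    return []


def build_response(jpeg_bytes: bytes, segmenter: Segmenter) -> dict:
    """Segment one captured frame and summarize it as an inventory count."""
    if not jpeg_bytes:
        return {"ok": False, "error": "empty body"}
    masks = segmenter(jpeg_bytes)
    return {"ok": True, "bottle_count": len(masks)}


class CaptureHandler(BaseHTTPRequestHandler):
    """Accepts a raw JPEG in the POST body and answers with JSON."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        result = build_response(body, placeholder_segmenter)
        payload = json.dumps(result).encode("utf-8")
        self.send_response(200 if result["ok"] else 400)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def run_server(host: str = "0.0.0.0", port: int = 8000) -> None:
    """Block forever, serving capture requests from the camera."""
    HTTPServer((host, port), CaptureHandler).serve_forever()
```

Calling `run_server()` starts the listener; the camera (or any test script) would then POST a JPEG body to that port. Keeping the segmentation behind a plain function argument means the HTTP plumbing can be tested with a fake segmenter before the real model is wired in.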

System architecture diagram

The Zero-Tuning Revelation: Fully AI-Driven Results

The achievement I want to emphasize most from this PoC isn't the recognition accuracy itself. It's that we made zero environment-specific customizations.

Typically, for demos like this, there's a temptation to "cheat" (or "optimize") by adjusting lighting or fixing the camera at an angle that's easy to recognize. For this system, we eliminated all of that.

  • Refrigerator-agnostic: Zero calibration for specific refrigerator models or shelf arrangements.
  • Object-agnostic: No pre-training on specific water bottle brands or shapes (round vs. square).
  • Camera position-free: No precise adjustment of camera placement or angle.

This means the transparent object recognition was achieved entirely through SAM 3's inference capabilities and automatic adjustments.

In traditional image processing development, you'd need tedious parameter tuning (heuristic craftsmanship) like "for this refrigerator's lighting conditions, the binarization threshold should be around X..."
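For readers who haven't lived through that kind of tuning, here is a toy illustration of why a hand-picked binarization threshold is tied to one environment. This is a sketch with made-up pixel values, not anything measured in this PoC; the function names and the threshold of 128 are assumptions chosen for the example.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def binarize(image: Image, threshold: int) -> Image:
    """Mark pixels brighter than `threshold` as foreground (1)."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


def foreground_pixels(image: Image, threshold: int) -> int:
    """Count how many pixels survive the threshold."""
    return sum(sum(row) for row in binarize(image, threshold))


# Synthetic data: the same 2x2 bright "object" on a darker background,
# once in a brightly lit fridge and once in a dim one.
bright_fridge = [
    [90, 90, 90, 90],
    [90, 200, 200, 90],
    [90, 200, 200, 90],
    [90, 90, 90, 90],
]
dim_fridge = [[px // 2 for px in row] for row in bright_fridge]

THRESHOLD = 128  # hand-tuned for the bright fridge only

print(foreground_pixels(bright_fridge, THRESHOLD))  # 4
print(foreground_pixels(dim_fridge, THRESHOLD))     # 0
```

The threshold that isolates the object under bright lighting finds nothing once the scene is half as bright, so it would have to be re-tuned for every fridge.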

But this time, we completely eliminated that human "overfitting to the environment." The fact that a pure foundation model demonstrated this level of environmental adaptability has huge implications for reducing deployment costs and accelerating rollout speed.

Results and Future Outlook

The result: water bottles inside the refrigerator were segmented with remarkable accuracy, unfazed by transparent shelves or reflections.

Water bottles in the refrigerator with clean segmentation masks (color-coded)

Now I might finally avoid the tragedy of having a packed fridge when I don't want a drink, and an empty one when I do... maybe.

Through this PoC, we confirmed that foundation models like SAM 3 can be a powerful solution to the classic challenge of "transparent object recognition."

As a startup, we're searching for the seeds of society-changing innovation through the accumulation of experiments like this—experiments that are relatable (and admittedly a bit ridiculous).

If you have thoughts or feedback on this article, or if you're an engineer thinking "I want to try this with my fridge!"—drop a comment below!
