
SAM 3 Is Here: Meta's Latest Vision AI Can Now Understand Your Words


Hi there! I'm Akio, an engineer at an AI development startup.

In November 2025, Meta quietly dropped SAM 3 (Segment Anything Model 3). Have you had a chance to try it yet?

"Wait, didn't SAM 2 just come out?" "What's new this time?"

If you're asking these questions, you're not alone. But here's the thing—SAM 3 isn't just an incremental update or minor accuracy improvement. It represents a fundamental leap toward true multimodal segmentation, making the old "click to segment" workflow feel like ancient history.

In this article, I'll break down what makes SAM 3 so impressive based on the official GitHub and Hugging Face releases. And in the second half, I'll give you a sneak peek at our successful local implementation using the AMD Ryzen AI Max+ 395—complete with screenshots from our dev environment.

A Quick Refresher: What Is SAM?

Let's briefly look back at the Segment Anything Model lineage:

  • SAM 1 (2023): Introduced as a zero-shot model that could segment any object in an image with just clicks or bounding boxes. It revolutionized segmentation tasks overnight.

  • SAM 2 (2024): Extended capabilities to video, enabling object tracking across frames. This opened up new possibilities for video editing and analysis.

And now, we have SAM 3.

What Makes SAM 3 Revolutionary: 3 Key Advancements

After diving into the official repository (facebookresearch/sam3) and demos, the direction of evolution is crystal clear.

1. Just Tell It What You Want: Open Vocabulary Segmentation

This is the headline feature—and as an engineer, it's what excites me most.

Previous SAM versions required you to specify where to segment (via clicks or bounding boxes). SAM 3 natively understands text prompts.

For example, given street footage:

  • Type "red car" and it detects and masks every red car in the frame.
  • Say "yellow school bus" and it instantly identifies and tracks it.

This means detection, segmentation, and tracking are now fully unified. You no longer need to tell the AI where something is—it understands what you're describing and connects that to the visual information automatically.

Type "impala" and it segments only the impalas

2. A Unified Vision Foundation Across Images, Video, and 3D

SAM 3 completely breaks down the barrier between still images and video.

Using a shared vision backbone, it performs object detection on individual frames while maintaining consistent tracking across the temporal axis.

Even more exciting is the 3D reconstruction capability. Sometimes called "SAM 3D," this feature enables not just 2D segmentation but also estimation of an object's three-dimensional shape from images or video. This opens up real possibilities for XR (AR/VR) development and robotics applications.

SAM 3's architecture integrating image, video, and prompt processing
(Source: Meta AI GitHub)
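
As a rough mental model, the same prompt-once idea extends to video along the lines of the sketch below. The video predictor, its state object, and the propagate loop are names I'm assuming purely for illustration, not the published interface; the real API lives in the facebookresearch/sam3 repo.

```python
# Hypothetical sketch: prompt once with text, then track the matching instances
# across a whole clip. All names here are placeholders, not the official API.
from sam3 import build_sam3_video_predictor  # placeholder import

predictor = build_sam3_video_predictor(checkpoint="sam3_base.pt")  # placeholder
state = predictor.init_state("traffic.mp4")             # placeholder: load the clip
predictor.add_text_prompt(state, "yellow school bus")   # placeholder: prompt once

# Each yielded frame carries stable per-object IDs, so the same bus keeps the
# same ID from frame to frame instead of being re-detected independently.
for frame_idx, object_ids, masks in predictor.propagate(state):  # placeholder
    print(frame_idx, object_ids)
```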

3. Impressive Efficiency and Optimization

More capability usually means a heavier model, but SAM 3 bucks this trend with optimized inference efficiency. Benchmarks on Meta's SA-Co dataset show it outperforming previous models in accuracy while being designed with edge device deployment in mind.

The Startup Perspective: Why SAM 3 Matters

For AI startups like ours, SAM 3 means dramatically faster development cycles.

Previously, detecting specific objects—say, a particular component in a factory or a specific crop variety—required collecting massive labeled datasets and fine-tuning dedicated models (like YOLO variants).

With SAM 3's open vocabulary capability, you can simply prompt "damaged component" or "ripe tomato" and get high-accuracy detection and segmentation with zero-shot inference—no additional training required.

This has the potential to compress PoC timelines from months to days. For startups where speed-to-proposal is everything, this is an incredibly powerful advantage.

Sneak Peek: Our Local Implementation

Now for the technical deep-dive. Our lab has already completed a local implementation of SAM 3.

Running on-premises rather than through cloud APIs is crucial for security, latency, and cost considerations.

Our hardware of choice: the beast that is the AMD Ryzen AI Max+ 395.

  • CPU: 16-core Zen 5 (Strix Halo)
  • Memory: 128GB LPDDR5x (8000MT/s)
  • AI performance: up to 126 TOPS (total)

Conventional wisdom says running massive models like SAM 3 requires expensive GPU servers like the H100. However, by leveraging the Ryzen AI's unified memory architecture with its generous memory capacity, we've successfully run SAM 3 smoothly in a local environment without sending any data to the cloud.

That said, the Ryzen AI Max+ 395's real strength is its unified memory architecture, which makes it ideal for memory-hungry workloads like running gpt-oss-120b locally at low cost. For a use case like this one, which needs a small memory footprint and raw speed rather than sheer capacity, an NVIDIA GPU (including consumer-grade options) is probably the better fit.

Running SAM 3 locally on the Ryzen AI Max+ 395. Inference is impressively fast.
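
For anyone wanting to reproduce a similar setup, the harness below is the kind of minimal device-selection and timing check we start from. It uses standard PyTorch calls only; the load_model() function is a stand-in for whatever SAM 3 entry point you actually load, and the 1024x1024 input size is just an assumed SAM-style resolution.

```python
# Minimal device-selection and timing harness for local inference.
# Note: torch.cuda.is_available() also returns True on ROCm builds of PyTorch,
# which is how an AMD GPU can show up; otherwise we fall back to CPU.
import time

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"running on: {device}")

def load_model():
    # Placeholder: swap in the actual SAM 3 builder from facebookresearch/sam3.
    return torch.nn.Conv2d(3, 16, kernel_size=3, padding=1).to(device).eval()

model = load_model()
dummy = torch.randn(1, 3, 1024, 1024, device=device)  # assumed SAM-style input size

with torch.inference_mode():
    start = time.perf_counter()
    _ = model(dummy)
    elapsed = time.perf_counter() - start

print(f"single forward pass: {elapsed * 1000:.1f} ms")
```

Swapping the placeholder for the real model gives a quick apples-to-apples latency number before wiring anything into a larger pipeline.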

We're currently developing a system that integrates this setup with IoT devices (edge cameras) for real-time "segment by description" detection.

The detailed implementation guide for this SAM 3 × Ryzen AI Max+ 395 × IoT stack—including code and benchmark results—will be covered in an upcoming article. Follow along to stay updated!

Conclusion: Vision AI Enters the "Understanding" Phase

SAM 3 isn't just a segmentation tool. It's a model that serves as "eyes" capable of understanding the world through language and perceiving it spatially.

I encourage you to try the official demo yourself. Prepare to be impressed by the accuracy. AI is evolving faster than most of us realize.

If you have thoughts, feedback, or specific requests like "I'd love to know more about X aspect of the Ryzen AI implementation!"—drop a comment below. I'll take your input into account for the next article!


We're hiring! We're looking for engineers who want to tackle real-world AI implementation with cutting-edge technology. Interested? Check out the link in my profile!
