Author: Prerna Dhareshwar (Machine Learning / Customer Success at Voxel51)
Segment Anything 2 (SAM 2), released on July 29th, 2024, represents a major leap forward in segmentation technology, offering cutting-edge performance on both images and videos. Building on the foundation of the original Segment Anything, which Meta released in April 2023, SAM 2 not only enhances image segmentation but also introduces advanced video capabilities. With SAM 2, users can achieve precise segmentation and tracking in video sequences using simple prompts, like bounding boxes or points, from a single frame. This enhanced functionality opens up exciting new possibilities for a wide array of video applications.
In this post, you will see how to load and apply SAM 2 models to both images and videos in FiftyOne.
Using SAM 2 in FiftyOne for Images
FiftyOne makes it easy for AI builders to work with visual data. With SAM 2 in FiftyOne, you can now seamlessly generate segmentation labels and visualize them on your datasets. With just a few simple commands, you can download SAM 2 models directly from the FiftyOne Model Zoo, a collection of pretrained models, and run inference on your FiftyOne datasets.
To get started, ensure that you have FiftyOne installed:
pip install fiftyone
You also need to install SAM 2 by following the instructions in the segment-anything-2 GitHub repository.
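At the time of writing, the repository README has you install from source; the steps below are a sketch of that process and may change as the repository evolves:
git clone https://github.com/facebookresearch/segment-anything-2.git
cd segment-anything-2
pip install -e .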
The following code snippet demonstrates how to load a dataset in FiftyOne and provide bounding box prompts to a SAM 2 model to generate segmentations.
import fiftyone as fo
import fiftyone.zoo as foz
dataset = foz.load_zoo_dataset(
    "quickstart", max_samples=25, shuffle=True, seed=51
)
model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")
# Prompt with boxes
dataset.apply_model(
    model,
    label_field="segmentations",
    prompt_field="ground_truth",  # existing detections used as box prompts
)
We can now look at our data with the segmentation labels created by SAM 2.
session = fo.launch_app(dataset)
We can see that the predictions of the SAM 2 model prompted with bounding box detections are stored under the segmentations field.
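If you'd rather verify the results programmatically, a quick sanity check is to count the masks that were generated; this is a minimal sketch using FiftyOne's aggregation API:
# Each input box should have produced one instance mask
print(dataset.count("segmentations.detections"))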
You can also prompt with keypoints instead of bounding boxes. To do this, we first filter the images in the quickstart dataset that contain the label person.
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F
dataset = foz.load_zoo_dataset("quickstart")
dataset = dataset.filter_labels("ground_truth", F("label") == "person")
Next we need to generate keypoints on this dataset. We can use another FiftyOne Zoo model to generate these keypoints.
# Generate some keypoints
model = foz.load_zoo_model("keypoint-rcnn-resnet50-fpn-coco-torch")
dataset.default_skeleton = model.skeleton
dataset.apply_model(model, label_field="gt")  # keypoints are stored in gt_keypoints
Let's look at this dataset and the keypoints that were generated.
session = fo.launch_app(dataset)
Now we can run a SAM 2 model on this dataset using the keypoints field gt_keypoints to prompt the model.
model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")
# Prompt with keypoints
dataset.apply_model(
    model,
    label_field="segmentations",
    prompt_field="gt_keypoints",
)
session = fo.launch_app(dataset)
You can also use SAM 2 to automatically generate masks for the whole image without any prompts!
import fiftyone as fo
import fiftyone.zoo as foz
dataset = foz.load_zoo_dataset(
    "quickstart", max_samples=5, shuffle=True, seed=51
)
model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")
# Automatic segmentation
dataset.apply_model(model, label_field="auto")
session = fo.launch_app(dataset)
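To spot-check the automatic results outside the App, you can print the label field on a sample; a minimal sketch:
# Inspect the automatically generated masks on the first sample
sample = dataset.first()
print(sample["auto"])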
Using SAM 2 in FiftyOne for Video
SAM 2’s video segmentation and tracking capabilities make the process of propagating masks from one frame to another seamless. Let’s load a video dataset and only retain the bounding boxes for the first frame.
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F
dataset = foz.load_zoo_dataset("quickstart-video", max_samples=2)
# Only retain detections on the first frame of each video
(
    dataset
    .match_frames(F("frame_number") > 1)
    .set_field("frames.detections", None)
    .save()
)
session = fo.launch_app(dataset)
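You can also confirm this programmatically by counting the frames that still contain detections; this is a minimal sketch using the same view expressions as above:
# Frames that still have detections (should equal the number of videos)
print(dataset.match_frames(F("detections") != None).count("frames"))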
We see that only the first frame has annotations retained. So now we can use this prompt to generate segmentations with SAM 2 for the first frame and propagate them to all frames of the video. It is as simple as calling apply_model on the dataset.
model = foz.load_zoo_model("segment-anything-2-hiera-tiny-video-torch")
# Prompt with boxes
dataset.apply_model(
    model,
    label_field="segmentations",
    prompt_field="frames.detections",  # can be a Detections or Keypoints field
)
session = fo.launch_app(dataset)
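To verify that the masks were propagated beyond the prompted frame, you can inspect a later frame of one of the videos; this is a minimal sketch assuming the frames.segmentations output field from above:
# Frames are 1-indexed; check a frame beyond the first
sample = dataset.first()
print(sample.frames[10]["segmentations"])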
SAM 2’s segmentation and tracking capabilities in videos are very powerful. In this tutorial, we used the sam2_hiera_tiny model, but you can use any of the following models now available in the FiftyOne Model Zoo:
Image models:
segment-anything-2-hiera-tiny-image-torch
segment-anything-2-hiera-small-image-torch
segment-anything-2-hiera-base-plus-image-torch
segment-anything-2-hiera-large-image-torch
Video models:
segment-anything-2-hiera-tiny-video-torch
segment-anything-2-hiera-small-video-torch
segment-anything-2-hiera-base-plus-video-torch
segment-anything-2-hiera-large-video-torch
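Swapping in one of these variants is just a matter of changing the zoo model name. For example, to trade speed for accuracy with the largest image model:
model = foz.load_zoo_model("segment-anything-2-hiera-large-image-torch")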
Conclusion & Next Steps
In this tutorial, we showed how, with just a few commands, you can download SAM 2 models and run inference on your FiftyOne image or video datasets. If you'd like to learn more, here are a few ways to get started:
- Join the 3000+ AI builders in the FiftyOne Community Slack. This is the place to ask questions and get answers from fellow developers and scientists working on Visual AI in production.
- Attend one of our Getting Started Workshops that cover all the topics you need to get up and running with FiftyOne and your datasets and models.
- Hit up the FiftyOne GitHub repo to find everything you need to use FiftyOne for your Visual AI projects.