Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
The FiftyOne community is open source and open to all. This means everyone is welcome to ask questions, and everyone is welcome to answer them. Continue reading to see the latest questions asked and answers provided!
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community. We’re always happy to help.
Ok, let’s dive into this week’s tips and tricks!
Performing semantic video search
Community Slack member Yashovardhan asked:
How can I do an embedding search on videos that are groups of image sequences?
Check out the Semantic Video Search plugin for FiftyOne that streamlines this process for you. With a single prompt, you can find exactly what you are looking for across every frame in your dataset!
Check out dozens of other plugins to help streamline your workflows.
Working with grouped datasets
Community Slack member Gantugs asked:
Is there a way to show images with five channels? For example microscopic images?
Yes! The way to accomplish this is to create a grouped dataset. In FiftyOne, grouped datasets contain multiple slices of samples of possibly different modalities (image, video, or point cloud) that are organized into groups. These grouped datasets can be used to represent multiview scenes, where data for multiple perspectives of the same scene can be stored, visualized, and queried in ways that respect the relationships between the slices of data.
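As a rough sketch of what this could look like for a five-channel microscopy image, the snippet below assumes each channel has been exported as its own single-channel image and stores one slice per channel. The channel names, the `<stem>_<channel>.png` naming convention, and the helper names are all hypothetical; only `fo.Dataset`, `fo.Group`, `group.element()`, `fo.Sample`, and `add_samples()` come from FiftyOne itself.

```python
from pathlib import Path

# Hypothetical channel names for a five-channel microscopy image
CHANNELS = ("dapi", "gfp", "rfp", "cy5", "brightfield")


def channel_paths(image_dir, stem, channels=CHANNELS):
    """Return {slice_name: filepath} for one multi-channel image.

    Assumes (hypothetically) that each channel was saved separately as
    <image_dir>/<stem>_<channel>.png.
    """
    return {ch: str(Path(image_dir) / f"{stem}_{ch}.png") for ch in channels}


def build_grouped_dataset(name, image_dir, stems):
    """Create a grouped dataset with one group per image and one slice per channel."""
    import fiftyone as fo  # imported lazily; requires a FiftyOne installation

    dataset = fo.Dataset(name)
    samples = []
    for stem in stems:
        group = fo.Group()  # one group ties together all channels of one image
        for slice_name, filepath in channel_paths(image_dir, stem).items():
            samples.append(
                fo.Sample(filepath=filepath, group=group.element(slice_name))
            )

    dataset.add_samples(samples)
    return dataset
```

Calling something like `build_grouped_dataset("microscopy-demo", "/data/scans", ["cell_001", "cell_002"])` would then give you a dataset whose slices you can switch between in the App.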
Uniqueness of sample ids
Community Slack member Edward asked:
Is a sample_id unique for each sample regardless of which dataset or FiftyOne version the sample is from? For example, if I have multiple datasets with millions of samples, will there be a chance that a sample_id will be the same for two samples from different datasets, or will they always be unique?
FiftyOne’s sample_id is not a UUID in the strict sense: it is a MongoDB ObjectId, rendered as a 24-character hexadecimal string. FiftyOne ensures that these IDs are unique within a given dataset. However, be aware that samples in different datasets can share the same ID. For example, if you clone a dataset:
print(dataset.first().id)
# 6594a24a6fdac5bcf12b5ce2
print(dataset.clone().first().id)
# 6594a24a6fdac5bcf12b5ce2
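If you need a key that stays unique across datasets, one option is to pair the ID with the dataset name. The helpers below are a minimal sketch of that idea, not a FiftyOne feature: `global_sample_key` and `shared_ids` are hypothetical names, and the ID lists are assumed to come from something like `dataset.values("id")`.

```python
def global_sample_key(dataset_name, sample_id):
    """Build a key that stays unique across datasets.

    Assumes dataset names are unique and sample IDs are unique within a dataset.
    """
    return (dataset_name, sample_id)


def shared_ids(ids_a, ids_b):
    """Return the set of sample IDs that appear in both collections."""
    return set(ids_a) & set(ids_b)
```

For example, `shared_ids(dataset.values("id"), other_dataset.values("id"))` would tell you which IDs two datasets have in common.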
Specifying a color scheme for your dataset
Community Slack member Nadu asked:
I'm trying to visualize images and their segmentation masks. How can I specify that each class corresponds to a specific color?
You can configure the color scheme used by the FiftyOne App to render content by clicking on the color palette icon above the sample grid. The GIF below demonstrates how to:
- Configure a custom color pool from which to draw colors for otherwise unspecified fields/values
- Configure the colors assigned to specific fields in color by field mode
- Configure the colors used to render specific annotations based on their attributes in color by value mode
- Save the customized color scheme as the default for the dataset
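If you would rather set this up in code, the same kind of configuration can be sketched in Python. The snippet below is a hedged example: the helper names are hypothetical, and the dictionary keys (`path`, `maskTargetsColors`, `intTarget`, `color`) follow the color scheme format as we understand it, so double-check them against the docs for your installed FiftyOne version.

```python
def mask_target_colors(class_colors):
    """Convert {pixel_value: color} into a list of per-class color entries.

    The "intTarget"/"color" key names are an assumption about the App's
    color scheme format; verify them for your FiftyOne version.
    """
    return [
        {"intTarget": int(target), "color": color}
        for target, color in sorted(class_colors.items())
    ]


def apply_color_scheme(dataset, field, class_colors, color_pool):
    """Save a per-class color scheme as the default for a dataset."""
    import fiftyone as fo  # imported lazily; requires a FiftyOne installation

    dataset.app_config.color_scheme = fo.ColorScheme(
        color_pool=color_pool,  # fallback colors for unspecified fields/values
        fields=[
            {
                "path": field,  # e.g. the name of your segmentation field
                "maskTargetsColors": mask_target_colors(class_colors),
            }
        ],
    )
    dataset.save()  # persist the scheme as the dataset's default
```

Here `class_colors` might look like `{1: "#ff0000", 2: "#00ff00"}`, mapping each mask pixel value to the color you want for that class.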
Configuring persistent dataset options
Community Slack member Gantugs asked:
When using persistent datasets, where are they saved by default? Is there a way to configure that?
Recall that FiftyOne stores all of its data in a MongoDB backend. By default, the database files live in ~/.fiftyone/var/lib/mongo.
Check out the “Configuring FiftyOne” section of the Docs, where you will find the available configuration options. You can change database_dir to a different drive or location, or set database_uri to connect to your own MongoDB instance.
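As a small illustration, here is one way you might update these options from Python by editing FiftyOne's JSON config file (normally ~/.fiftyone/config.json). The helper name `write_fiftyone_config` and the example paths are hypothetical; the `database_dir` and `database_uri` keys are the options mentioned above. Setting the `FIFTYONE_DATABASE_DIR` or `FIFTYONE_DATABASE_URI` environment variables is an alternative to editing the file.

```python
import json
from pathlib import Path


def write_fiftyone_config(config_path, database_dir=None, database_uri=None):
    """Merge database settings into a FiftyOne JSON config file.

    Existing keys in the file are preserved; only the provided options change.
    """
    config_path = Path(config_path).expanduser()
    config = json.loads(config_path.read_text()) if config_path.exists() else {}

    if database_dir is not None:
        config["database_dir"] = str(database_dir)
    if database_uri is not None:
        config["database_uri"] = database_uri

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=4))
    return config
```

For instance, `write_fiftyone_config("~/.fiftyone/config.json", database_dir="/mnt/bigdrive/fiftyone/mongo")` would point the database at a different drive. The change should be made before FiftyOne starts its database, so restart any running Python sessions afterwards.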