Table of Contents
Introduction
What is a problem?
Diving to the SAM model structure
Export SAM to ONNX - the right way
Export the i...
Thanks so much for sharing this!
Thank you so much!
Hi, I tried to export the ONNX file for the vit_h model by modifying the line:
sam = sam_model_registry["vit_b"](checkpoint="./sam_vit_b_01ec64.pth")
to
sam = sam_model_registry["vit_h"](checkpoint="./checkpoint/sam_vit_h_4b8939.pth")
It then generated 455 files. Some of them are:
And the encoder ONNX file is only about 1 MB in size (vs. 350 MB for vit_b).
Did I miss changing anything in the script?
I encountered the same issue; it turns out it's an ONNX limitation: models over 2 GB are exported like this, with the weights split across external data files. It's still usable, however; I followed the rest of the tutorial and everything worked. As a workaround, I quantized the split model and got a nice 600 MB encoder in ONNX format; as far as I know, the quality loss should be minimal. You can give this a try:
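For example, a minimal sketch using onnxruntime's dynamic quantization (the file names are placeholders, so adjust them to your exported model):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Weight-only dynamic quantization of the exported encoder.
# "vit_h_encoder.onnx" / "vit_h_encoder_quantized.onnx" are placeholder names;
# for a model split into external data files, point to the main .onnx file.
quantize_dynamic(
    model_input="vit_h_encoder.onnx",
    model_output="vit_h_encoder_quantized.onnx",
    weight_type=QuantType.QUInt8,
)
```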
Or simply use one of the smaller SAM models.
Thank you!
I have a problem and I wonder if you can help me solve it?
Hi, could you provide more context? If you share your code and the input image that resulted in this error, then I can say more.
At first glance it looks like the input tensor for the SAM decoder (the image embeddings returned by the encoder) has an incorrect shape. It should be (1, 256, 64, 64), but I can confirm this only after seeing the code.
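In the meantime, a quick way to check this shape yourself (a minimal sketch; the encoder file name is a placeholder for the one exported in this article):

```python
import numpy as np
import onnxruntime as ort

# Placeholder file name -- use the encoder ONNX file you exported
encoder = ort.InferenceSession("vit_b_encoder.onnx")

# Dummy preprocessed image tensor of the shape the encoder expects
input_tensor = np.zeros((1, 3, 1024, 1024), dtype=np.float32)

input_name = encoder.get_inputs()[0].name
embeddings = encoder.run(None, {input_name: input_tensor})[0]

# The decoder expects image embeddings of shape (1, 256, 64, 64)
print(embeddings.shape)
```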
Hello, I have solved this problem. May I ask how SAM supports the input of multiple bounding boxes?
As far as I know, it's impossible now due to the SAM model limitations. You can read some workarounds here: github.com/facebookresearch/segmen...
However, you can partially emulate this by specifying 4 points that belong to the object and are located close to its corners. So, you can specify 8 points to define two boxes. It's not the same as two real bounding boxes, but it's better than nothing.
If you follow the sample provided in this article to detect both the dog and the cat on the image, using the points of the top-left and bottom-right corners of these objects, you can use the following code:
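(A minimal sketch: the corner coordinates below are hypothetical placeholders, to be replaced with the real pixel coordinates of the dog and the cat on your image.)

```python
import numpy as np

# Corner points for two objects; the coordinates are placeholders --
# replace them with the real pixel coordinates on your image
input_points = np.array([
    [321, 230],  # dog: top-left corner
    [710, 530],  # dog: bottom-right corner
    [65, 250],   # cat: top-left corner
    [370, 580],  # cat: bottom-right corner
], dtype=np.float32)

# Label 2 marks a top-left box corner, label 3 marks a bottom-right box corner
input_labels = np.array([2, 3, 2, 3], dtype=np.float32)
```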
If you run this in the sam_onnx_inference.ipynb notebook, the output should look like this:
This is great! Thank you!
One question. In the section Encode the prompt, what does the line
input_labels = np.array([2,3])
mean when the input is a bounding box? In the official instructions, I didn't see any label required for box input.

Each coordinate (x,y) should have a label, so it means that the top-left corner of the bounding box has label 2 and the bottom-right corner has label 3.
Thanks!
Great work, thanks.
Can the automatic mask generator be exported to ONNX, though?
Hello, thank you.
The automatic mask generator is not a different model that can be exported to ONNX. It's a Python class that uses the same model many times for different points of the image and combines the resulting masks.
Is there a guide on how to use that in the best way?
github.com/facebookresearch/segmen...
Also, the source code of the SamAutomaticMaskGenerator class can help you understand how exactly it works: github.com/facebookresearch/segmen...
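For reference, the generator is used roughly like this in Python (a minimal sketch based on the official README; the image path is a placeholder, and the checkpoint is the vit_b one used earlier in this thread):

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load the same vit_b checkpoint used earlier in this thread
sam = sam_model_registry["vit_b"](checkpoint="./sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

# The generator expects an RGB image as a (H, W, 3) numpy array
image = cv2.cvtColor(cv2.imread("image.png"), cv2.COLOR_BGR2RGB)

# Internally this runs the model on a grid of points and merges the results;
# each entry is a dict with "segmentation", "area", "bbox" and other fields
masks = mask_generator.generate(image)
print(len(masks))
```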
Yes, I saw that, but how is it done using the ONNX files? Maybe you can make a guide on that too?
Is it possible to optimize the encoder for GPU?
I wanted to ask if the SAM model can take tiled images (like OpenSeadragon tiled images). If yes, can you provide some resources or references on how to apply that?
Thanks,
SAM does not take images; it takes tensors of size 1024x1024x3. The image should be converted to a tensor before being passed to the SAM model. I am not familiar with OpenSeadragon images, but if they can be exported to standard PNG or JPG and then converted to tensors, as described in this article, then yes.
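For illustration, a minimal preprocessing sketch, assuming the simple approach from this article (resize to 1024x1024 and normalize with SAM's pixel mean/std; the file name is a placeholder):

```python
import numpy as np
from PIL import Image

# Placeholder file name -- any image exported to PNG/JPG
img = Image.open("image.png").convert("RGB").resize((1024, 1024))
arr = np.array(img).astype(np.float32)

# SAM's pixel mean/std, taken from the original model configuration
mean = np.array([123.675, 116.28, 103.53])
std = np.array([58.395, 57.12, 57.375])
arr = (arr - mean) / std

# Reorder to the (1, 3, 1024, 1024) tensor the encoder expects
input_tensor = arr.transpose(2, 0, 1)[None, :, :, :].astype(np.float32)
print(input_tensor.shape)  # (1, 3, 1024, 1024)
```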
I think it is time to try the new SAM2, we need your guidance! <3