Encountered the same issue, turns out its an onnx limitation, models over 2Gb will be exported like this. It's still usable however, I followed the rest of the tutorial and all worked. As a workaround I went and quantized the split model and received a nice 600mb encoder in onnx format, as far as I know the quality loss should be minimal. You can give this a try:
import onnxruntime
from onnxruntime.quantization import QuantType
from onnxruntime.quantization.quantize import quantize_dynamic
out_onnx_model_quantized_path = "vit_h_encoder_quantized.onnx"
input_onnx_encoder_path = "vit_h_encoder.onnx" # Path to the encoder onnx file, all the other blocks files must be in the same dir!
quantize_dynamic(
model_input="input_onnx_encoder_path ",
model_output=onnx_model_quantized_path,
optimize_model=True,
per_channel=False,
reduce_range=False,
weight_type=QuantType.QUInt8,
)
Encountered the same issue, turns out its an onnx limitation, models over 2Gb will be exported like this. It's still usable however, I followed the rest of the tutorial and all worked. As a workaround I went and quantized the split model and received a nice 600mb encoder in onnx format, as far as I know the quality loss should be minimal. You can give this a try:
Or simply use one of the smaller SAM models.
Thank you!