DEV Community

Jambo

Posted on • Originally published at techcommunity.microsoft.com
Running Phi-3-vision via ONNX on Jetson Platform

This article shows how to run the quantized Phi-3-vision model in ONNX format on the Jetson platform and perform inference for image+text dialogue tasks.

What is Jetson?

Jetson is a series of small arm64 devices launched by NVIDIA, equipped with powerful GPU computing capabilities. It is designed for edge computing and AI applications. Running on a Linux system, Jetson can handle complex computing tasks with low power consumption, making it ideal for developing embedded AI and machine learning projects.

Why ONNX Runtime?

ONNX Runtime is a high-performance inference engine for executing ONNX (Open Neural Network Exchange) models. It provides a simple way to run large language models like Llama, Phi, Gemma, and Mistral via the onnxruntime-genai API.
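To make that concrete, here is a minimal sketch of what an onnxruntime-genai inference loop for Phi-3-vision can look like. The model directory and image path are placeholders, and the exact generator calls (e.g. `compute_logits`) vary between onnxruntime-genai versions, so treat this as an outline rather than a drop-in script:

```python
def build_prompt(user_text: str) -> str:
    """Phi-3-vision chat template with a single image placeholder."""
    return f"<|user|>\n<|image_1|>\n{user_text}<|end|>\n<|assistant|>\n"

def run_vision_chat(model_dir: str, image_path: str, user_text: str) -> None:
    # Imported lazily so the prompt helper works without the runtime installed.
    import onnxruntime_genai as og  # pip install onnxruntime-genai

    model = og.Model(model_dir)                      # folder with the int4 ONNX export
    processor = model.create_multimodal_processor()  # tokenizes text + encodes the image
    stream = processor.create_stream()

    inputs = processor(build_prompt(user_text), images=og.Images.open(image_path))
    params = og.GeneratorParams(model)
    params.set_inputs(inputs)
    params.set_search_options(max_length=3072)

    generator = og.Generator(model, params)
    while not generator.is_done():
        generator.compute_logits()
        generator.generate_next_token()
        # Stream tokens as they are produced
        print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)

# Example (paths are placeholders for your local model export and image):
# run_vision_chat("cuda-int4-rtn-block-32", "table.png",
#                 "Convert this image to markdown format")
```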

Running the Phi-3-vision Model

In this project, I ran the Phi-3-vision model on the Jetson platform using ONNX Runtime. Here’s a sneak peek at the results.

Inference Speed and Resource Utilization

Running the Int4-quantized Phi-3-vision model in ONNX format on a Jetson Orin Nano, I observed the Python process using 5.4 GB of GPU memory while keeping CPU load minimal and driving the GPU to near-full utilization.
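The 5.4 GB figure is plausible from a back-of-the-envelope estimate. Assuming Phi-3-vision's roughly 4.2B parameters (an approximation; the exact count and runtime overheads differ), the int4 weights alone account for about 2.1 GB:

```python
# Rough memory estimate for the int4-quantized weights.
params = 4.2e9        # approx. parameter count of Phi-3-vision (assumption)
int4_bytes = 0.5      # 4 bits per weight
weights_gb = params * int4_bytes / 1e9
print(f"int4 weights alone: ~{weights_gb:.1f} GB")  # ~2.1 GB
```

The remaining ~3 GB of the observed footprint would then be the KV cache, activations, vision-encoder buffers, and CUDA/ONNX Runtime workspace.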

Resource Utilization

The inference speed was impressively fast, even on a device with a 15 W power budget.

Inference Speed

Example Output

Here’s an example of running the Phi-3-vision model on the Jetson platform. Given an image and a prompt, the model successfully converted the image to markdown format:

  • Input Image:

    table.png

  • Prompt:

    Convert this image to markdown format
    
  • Output:

    | Product             | Qtr 1      | Qtr 2     | Grand Total |
    |---------------------|------------|-----------|-------------|
    | Chocolade           | $744.60    | $162.56   | $907.16     |
    | Gummibarchen        | $5,079.60  | $1,249.20 | $6,328.80   |
    | Scottish Longbreads | $1,267.50  | $1,062.50 | $2,330.00   |
    | Sir Rodney's Scones | $1,418.00  | $756.00   | $2,174.00   |
    | Tarte au sucre      | $4,728.00  | $4,547.92 | $9,275.92   |
    | Chocolate Biscuits  | $943.89    | $349.60   | $1,293.49   |
    | Total               | $14,181.59 | $8,127.78 | $22,309.37  |
    

    The table lists various products along with their sales figures for Qtr 1, Qtr 2, and the Grand Total. The products include Chocolade, Gummibarchen, Scottish Longbreads, Sir Rodney's Scones, Tarte au sucre, and Chocolate Biscuits. The Grand Total column sums up the sales for each product across the two quarters.
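Vision models sometimes transcribe numbers incorrectly, so it is worth sanity-checking the extracted table. A small script (with the row values copied from the output above) confirms that every Grand Total equals Qtr 1 + Qtr 2 and that the Total row matches the column sums:

```python
# Rows copied from the model's markdown output: (Qtr 1, Qtr 2, Grand Total).
rows = {
    "Chocolade":           (744.60, 162.56, 907.16),
    "Gummibarchen":        (5079.60, 1249.20, 6328.80),
    "Scottish Longbreads": (1267.50, 1062.50, 2330.00),
    "Sir Rodney's Scones": (1418.00, 756.00, 2174.00),
    "Tarte au sucre":      (4728.00, 4547.92, 9275.92),
    "Chocolate Biscuits":  (943.89, 349.60, 1293.49),
}

# Each product's Grand Total should equal Qtr 1 + Qtr 2.
for name, (q1, q2, total) in rows.items():
    assert abs(q1 + q2 - total) < 0.005, name

# The "Total" row should be the sum of each column.
q1_sum = sum(q1 for q1, _, _ in rows.values())
q2_sum = sum(q2 for _, q2, _ in rows.values())
assert abs(q1_sum - 14181.59) < 0.005
assert abs(q2_sum - 8127.78) < 0.005
print("all totals consistent")
```

In this case every figure checks out, which is a good sign for the quality of the transcription.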

Conclusion

Running the Phi-3-vision model in ONNX format on the Jetson platform demonstrates the incredible potential of combining powerful AI models with efficient edge computing devices. The results are impressive, and the resource utilization is optimized for low-power devices.

👉 For a detailed step-by-step guide on setting up and running the Phi-3-vision model on Jetson, including preparation and installation, please visit the complete article here: Running Phi-3-vision via ONNX on Jetson Platform

If you're interested in AI and edge computing, don't miss out on this comprehensive tutorial!

Top comments (1)

Fosberg-codex

Great. I am even using the ONNX Runtime for a Node project. I think it's underrated.
