<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: NodeShift</title>
    <description>The latest articles on DEV Community by NodeShift (@nodeshiftcloud).</description>
    <link>https://dev.to/nodeshiftcloud</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F9342%2F593abc8f-fa38-4bf2-a081-fa5996a536d5.png</url>
      <title>DEV Community: NodeShift</title>
      <link>https://dev.to/nodeshiftcloud</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nodeshiftcloud"/>
    <language>en</language>
    <item>
      <title>A Step-by-Step Guide to Install Qwen3-Next 80B</title>
      <dc:creator>Aditi Bindal</dc:creator>
      <pubDate>Mon, 22 Sep 2025 07:15:58 +0000</pubDate>
      <link>https://dev.to/nodeshiftcloud/a-step-by-step-guide-to-install-qwen3-next-80b-3dho</link>
      <guid>https://dev.to/nodeshiftcloud/a-step-by-step-guide-to-install-qwen3-next-80b-3dho</guid>
<description>&lt;p&gt;If you're relentlessly following AI advancements, one thing is clear: the trend has been to go bigger. The new Qwen3-Next-80B series, however, challenges this paradigm by focusing on groundbreaking efficiency rather than raw scale. The model represents a monumental leap forward, delivering the performance of a much larger model at a fraction of the computational cost. At its core is a Hybrid Attention mechanism built to process ultra-long contexts, natively supporting 262,144 tokens and extensible to over a million. This is paired with a High-Sparsity Mixture-of-Experts (MoE) architecture that keeps a staggering 80 billion total parameters on tap while activating only 3 billion at any given time. The result? Drastically reduced computational load, with inference speeds up to 10 times faster than its predecessors on long-context tasks. With additional enhancements like Multi-Token Prediction for accelerated performance and advanced stability optimizations, Qwen3-Next-80B proves its worth by outperforming models like Qwen3-32B at only 10% of the training cost, and it performs on par with much larger models on key reasoning, coding, and alignment benchmarks.&lt;/p&gt;

&lt;p&gt;In this article, we'll walk through a simple and straightforward installation, setup, and usage of this model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;The minimum system requirements for running this model are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;GPU: 2x H200s or 4x H100s&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage: 1TB+ (preferred)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;VRAM: at least 160GB&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://nodeshift.com/blog/set-up-anaconda-on-ubuntu-22-04-in-minutes-simplify-your-ai-workflow" rel="noopener noreferrer"&gt;Anaconda installed&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
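
&lt;p&gt;As a rough sanity check on these numbers (our own back-of-envelope arithmetic, not an official sizing guide): just holding 80 billion parameters in bf16 takes about 160GB, which is where the VRAM figure above comes from; activations and KV cache need extra headroom, hence the 2x H200 or 4x H100 recommendation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Back-of-envelope VRAM estimate (our own arithmetic, not from the model card)
total_params = 80e9      # total parameters (80B)
active_params = 3e9      # parameters activated per token (3B)
bytes_per_param = 2      # bf16 weights

print(f"Weights alone in bf16: ~{total_params * bytes_per_param / 1e9:.0f} GB")  # ~160 GB
print(f"Active fraction per token: {active_params / total_params:.1%}")          # ~3.8%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;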

&lt;h2&gt;
  
  
  Step-by-Step Process to Install Qwen3-Next-80B-A3B Locally
&lt;/h2&gt;

&lt;p&gt;For the purpose of this tutorial, we’ll use a GPU-powered Virtual Machine by NodeShift, which provides high-compute VMs at a very affordable cost, at a scale that meets GDPR, SOC2, and ISO27001 requirements. It also offers an intuitive, user-friendly interface, making it easier for beginners to get started with cloud deployments. However, feel free to use any cloud provider of your choice and follow the same steps for the rest of the tutorial.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Setting up a NodeShift Account
&lt;/h3&gt;

&lt;p&gt;Visit &lt;a href="https://app.nodeshift.com/sign-up" rel="noopener noreferrer"&gt;app.nodeshift.com&lt;/a&gt; and create an account by filling in basic details, or continue signing up with your Google/GitHub account.&lt;/p&gt;

&lt;p&gt;If you already have an account, &lt;a href="http://app.nodeshift.com" rel="noopener noreferrer"&gt;login&lt;/a&gt; straight to your dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3p61u5r46mrb6vcsiqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3p61u5r46mrb6vcsiqr.png" alt="Image-step1-1" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Create a GPU Node
&lt;/h3&gt;

&lt;p&gt;After accessing your account, you should see a dashboard (see image), now:&lt;/p&gt;

&lt;p&gt;1) Navigate to the menu on the left side.&lt;/p&gt;

&lt;p&gt;2) Click on the &lt;strong&gt;GPU Nodes&lt;/strong&gt; option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokdraa5tkg40fzgkn7fo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokdraa5tkg40fzgkn7fo.png" alt="Image-step2-1" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) Click on &lt;strong&gt;Start&lt;/strong&gt; to start creating your very first GPU node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfhk9s2i1dfe211zgfev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfhk9s2i1dfe211zgfev.png" alt="Image-step2-2" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These GPU nodes are GPU-powered virtual machines by NodeShift. They are highly customizable and let you control the configuration of GPUs (ranging from H100s to A100s), CPUs, RAM, and storage according to your needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Selecting configuration for GPU (model, region, storage)
&lt;/h3&gt;

&lt;p&gt;1) For this tutorial, we’ll be using a 2x H200 GPU node; however, you can choose any GPU that meets the prerequisites.&lt;/p&gt;

&lt;p&gt;2) Similarly, we’ll opt for 5 TB storage by sliding the bar. You can also select the region where you want your GPU to reside from the available ones.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F78ex5301m0jmnmar6gbe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F78ex5301m0jmnmar6gbe.png" alt="Image-step3-1" width="800" height="271"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Choose GPU Configuration and Authentication method
&lt;/h3&gt;

&lt;p&gt;1) After selecting your required configuration options, you’ll see the available GPU nodes in your region that match (or come very close to) your configuration. In our case, we’ll choose a 2x H200 140GB GPU node with 192vCPUs/504GB RAM/5TB SSD.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3cbh4j66eg49118de0mi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3cbh4j66eg49118de0mi.png" alt="Image-step4-1" width="800" height="560"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Next, you'll need to select an authentication method. Two methods are available: Password and SSH Key. We recommend using SSH keys, as they are a more secure option. To create one, head over to our &lt;a href="https://docs.nodeshift.com/gpus/create-gpu-deployment" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchyrp5ijzlmevkc7puaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchyrp5ijzlmevkc7puaf.png" alt="Image-step4-2" width="800" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Choose an Image
&lt;/h3&gt;

&lt;p&gt;The final step is to choose an image for the VM, which in our case is &lt;strong&gt;NVIDIA CUDA&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm3gwe0tprkoeqnx5x51.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm3gwe0tprkoeqnx5x51.png" alt="Image-step5-1" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's it! You are now ready to deploy the node. Finalize the configuration summary, and if it looks good, click &lt;strong&gt;Create&lt;/strong&gt; to deploy the node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F647pyrcdxwtp6gz0tieb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F647pyrcdxwtp6gz0tieb.png" alt="Image-step5-2" width="800" height="107"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk810i78g0piq7z2jxu8j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk810i78g0piq7z2jxu8j.png" alt="Image-step5-3" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Connect to active Compute Node using SSH
&lt;/h3&gt;

&lt;p&gt;1) As soon as you create the node, it will be deployed in a few seconds to a minute. Once deployed, you will see the status &lt;strong&gt;Running&lt;/strong&gt; in green, meaning that your Compute node is ready to use!&lt;/p&gt;

&lt;p&gt;2) Once your GPU shows this status, navigate to the three dots on the right, click on &lt;strong&gt;Connect with SSH&lt;/strong&gt;, and copy the SSH details that appear.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvw5wtv572xfkv01hoyp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvw5wtv572xfkv01hoyp.png" alt="Image-step6-1" width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you’ve copied the details, follow the steps below to connect to the running GPU VM via SSH:&lt;/p&gt;

&lt;p&gt;1) Open your terminal, paste the SSH command, and run it.&lt;/p&gt;

&lt;p&gt;2) In some cases, your terminal may ask for your consent before connecting. Enter ‘yes’.&lt;/p&gt;

&lt;p&gt;3) A prompt will request a password. Type the SSH password, and you should be connected.&lt;/p&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7307nybljxnshe9dm4p2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7307nybljxnshe9dm4p2.png" alt="Image-step6-2" width="800" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, if you want to check the GPU details, run the following command in the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 7: Set up the project environment with dependencies
&lt;/h3&gt;

&lt;p&gt;1) Create a virtual environment using &lt;a href="https://nodeshift.com/blog/set-up-anaconda-on-ubuntu-22-04-in-minutes-simplify-your-ai-workflow" rel="noopener noreferrer"&gt;Anaconda&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conda create -n qwen python=3.11 -y &amp;amp;&amp;amp; conda activate qwen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flriuy01pg5znv68e8o5c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flriuy01pg5znv68e8o5c.png" alt="Image-step7-1" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Install required dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 
pip install git+https://github.com/huggingface/transformers.git@main
pip install git+https://github.com/huggingface/accelerate
pip install huggingface_hub
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7x6myqz7f0exmlawxik.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7x6myqz7f0exmlawxik.png" alt="Image-step7-2" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) Log in to Hugging Face with your HF READ token.&lt;/p&gt;

&lt;p&gt;This is a gated model, so make sure you have been granted access from the model card before downloading.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hf auth login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq8ft3orv1ku3qro1jdpn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq8ft3orv1ku3qro1jdpn.png" alt="Image-step7-3" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;4) Install and run Jupyter Notebook.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conda install -c conda-forge --override-channels notebook -y
conda install -c conda-forge --override-channels ipywidgets -y
jupyter notebook --allow-root
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;5) If you’re on a remote machine (e.g., NodeShift GPU), you’ll need to set up SSH port forwarding to access the Jupyter Notebook session in your local browser.&lt;/p&gt;

&lt;p&gt;Run the following command in your local terminal after replacing:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;YOUR_SERVER_PORT&amp;gt;&lt;/code&gt; with the PORT allotted to your remote server (For the NodeShift server – you can find it in the deployed GPU details on the dashboard).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;PATH_TO_SSH_KEY&amp;gt;&lt;/code&gt; with the path to the location where your SSH key is stored.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;YOUR_SERVER_IP&amp;gt;&lt;/code&gt; with the IP address of your remote server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -L 8888:localhost:8888 -p &amp;lt;YOUR_SERVER_PORT&amp;gt; -i &amp;lt;PATH_TO_SSH_KEY&amp;gt; root@&amp;lt;YOUR_SERVER_IP&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
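
&lt;p&gt;For example, with placeholder values filled in (port 12345, a key at ~/.ssh/id_ed25519, and server IP 203.0.113.7, all hypothetical), the command would look like: &lt;code&gt;ssh -L 8888:localhost:8888 -p 12345 -i ~/.ssh/id_ed25519 root@203.0.113.7&lt;/code&gt;&lt;/p&gt;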



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnb4ojg5ic1gib6uigxtb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnb4ojg5ic1gib6uigxtb.png" alt="Image-step7-4" width="800" height="231"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After this, copy the URL shown in your remote server’s terminal:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz74xo2mne7xx5flisla6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz74xo2mne7xx5flisla6.png" alt="Image-step7-5" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then paste it into your local browser to access the Jupyter Notebook session.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 8: Download and Run the model
&lt;/h3&gt;

&lt;p&gt;1) Open a Python notebook inside Jupyter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjeb8u2ttf96pxi3enag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjeb8u2ttf96pxi3enag.png" alt="Image-step8-1" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Download the model checkpoints.&lt;/p&gt;

&lt;p&gt;To download the thinking model, just replace the model_name value with &lt;code&gt;"Qwen/Qwen3-Next-80B-A3B-Thinking"&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-Next-80B-A3B-Instruct"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="auto",
    device_map="auto",
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgsnbi60mg9ihs7qyy6k2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgsnbi60mg9ihs7qyy6k2.png" alt="Image-step8-2" width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) Run the model for inference.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For inference with the thinking model, use the following snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parsing thinking content
try:
    # rindex finding 151668 (&amp;lt;/think&amp;gt;)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content) # no opening &amp;lt;think&amp;gt; tag
print("content:", content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx2tngy25r69t5d6tkd11.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx2tngy25r69t5d6tkd11.png" alt="Image-step8-3" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Qwen3-Next-80B model represents a shift in AI development, prioritizing efficiency over raw scale through its Hybrid Attention and High-Sparsity Mixture-of-Experts (MoE) architecture. This allows it to achieve high performance with a fraction of the computational load, enabling it to handle massive context lengths and accelerate inference speeds. NodeShift Cloud plays a crucial role in making this advanced technology accessible and practical by providing a cost-effective, secure platform for deploying and running such compute-intensive models. By offering affordable GPU resources, NodeShift Cloud democratizes access to state-of-the-art AI, allowing developers and businesses to leverage the power of models like Qwen3-Next-80B without the prohibitive costs and infrastructure management typically associated with large-scale AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For more information about NodeShift:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/company/nodeshift/?%0Aref=blog.nodeshift.com" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://x.com/nodeshiftai?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.gg/4dHNxnW7p7?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://app.daily.dev/nodeshift?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;daily.dev&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>opensource</category>
      <category>qwen</category>
    </item>
    <item>
      <title>How to Install &amp; Run EmbeddingGemma-300m Locally?</title>
      <dc:creator>Ayush kumar</dc:creator>
      <pubDate>Mon, 08 Sep 2025 09:32:42 +0000</pubDate>
      <link>https://dev.to/nodeshiftcloud/how-to-install-run-embeddinggemma-300m-locally-223a</link>
      <guid>https://dev.to/nodeshiftcloud/how-to-install-run-embeddinggemma-300m-locally-223a</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2s8jvagxzyu2c0p4fo5i.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2s8jvagxzyu2c0p4fo5i.webp" alt=" " width="640" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;EmbeddingGemma-300M is Google DeepMind’s lightweight, multilingual (100+ languages) embedding model built on Gemma 3/T5Gemma foundations. It outputs 768-dim vectors (with Matryoshka down-projections to 512/256/128) optimized for retrieval, classification, clustering, semantic similarity, QA, and code retrieval. It’s designed for low-resource / on-device use, loads via SentenceTransformers, and does not support float16—use FP32 or bfloat16.&lt;/p&gt;
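
&lt;p&gt;Because the model is trained with Matryoshka representations, Sentence Transformers can truncate its 768-dim output at load time via the &lt;code&gt;truncate_dim&lt;/code&gt; argument. A minimal sketch (the 256-dim choice here is ours; 512 and 128 also work):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sentence_transformers import SentenceTransformer

# Load EmbeddingGemma and down-project embeddings to 256 dims
model = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=256)

emb = model.encode("The quick brown fox")
print(emb.shape)  # (256,)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;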

&lt;h3&gt;
  
  
  Evaluation
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Benchmark Results
&lt;/h4&gt;

&lt;p&gt;The model was evaluated against a large collection of different datasets and metrics to cover different aspects of text understanding.&lt;/p&gt;

&lt;h4&gt;
  
  
  Full Precision Checkpoint
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flowbkwbpttefnyxmx696.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flowbkwbpttefnyxmx696.png" alt=" " width="728" height="804"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  QAT Checkpoints
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmolve0cxa1t19thcxcga.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmolve0cxa1t19thcxcga.png" alt=" " width="727" height="673"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: QAT models are evaluated after quantization&lt;/p&gt;

&lt;p&gt;Mixed Precision refers to per-channel quantization with int4 for embeddings, feedforward, and projection layers, and int8 for attention (e4_a8_f4_p4).&lt;/p&gt;

&lt;h3&gt;
  
  
  GPU/CPU Configuration Table
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftop29mw9z5evjqlj4uox.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftop29mw9z5evjqlj4uox.png" alt=" " width="730" height="781"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use the following prompts based on your use case and input data type. These may already be available in the EmbeddingGemma configuration in your modeling framework of choice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F267y5bqmbmpp76kliwoj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F267y5bqmbmpp76kliwoj.png" alt=" " width="730" height="1082"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;p&gt;Link: &lt;a href="https://huggingface.co/google/embeddinggemma-300m" rel="noopener noreferrer"&gt;https://huggingface.co/google/embeddinggemma-300m&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step-by-Step Process to Install &amp;amp; Run EmbeddingGemma-300m Locally
&lt;/h3&gt;

&lt;p&gt;For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Sign Up and Set Up a NodeShift Cloud Account
&lt;/h3&gt;

&lt;p&gt;Visit the &lt;a href="https://app.nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;NodeShift Platform&lt;/a&gt; and create an account. Once you’ve signed up, log into your account.&lt;/p&gt;

&lt;p&gt;Follow the account setup process and provide the necessary details and information.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1q7rsaawzyhravi6r02x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1q7rsaawzyhravi6r02x.png" alt=" " width="640" height="393"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 2: Create a GPU Node (Virtual Machine)
&lt;/h3&gt;

&lt;p&gt;GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopxgo5fjs9g7oico94jk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopxgo5fjs9g7oico94jk.png" alt=" " width="640" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4m3dhq1wr33a49ihvkl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4m3dhq1wr33a49ihvkl.png" alt=" " width="640" height="399"&gt;&lt;/a&gt;&lt;br&gt;
Navigate to the menu on the left side, select the GPU Nodes option, click the Create GPU Node button, and deploy your first Virtual Machine.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 3: Select a Model, Region, and Storage
&lt;/h3&gt;

&lt;p&gt;In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffx1a2dn42bsv6umr30ae.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffx1a2dn42bsv6umr30ae.png" alt=" " width="640" height="312"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkoav4839vsrf8qgdksqq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkoav4839vsrf8qgdksqq.png" alt=" " width="640" height="312"&gt;&lt;/a&gt;&lt;br&gt;
We will use 1 x RTX A6000 GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 4: Select Authentication Method
&lt;/h3&gt;

&lt;p&gt;There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our &lt;a href="https://docs.nodeshift.com/gpus/create-gpu-deployment?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 5: Choose an Image
&lt;/h3&gt;

&lt;p&gt;In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running EmbeddingGemma-300m, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.&lt;/p&gt;

&lt;p&gt;We chose the following image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia/cuda:12.1.1-devel-ubuntu22.04

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This image is essential because it includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full CUDA toolkit (including nvcc)&lt;/li&gt;
&lt;li&gt;Proper support for building and running GPU-based applications like EmbeddingGemma-300m&lt;/li&gt;
&lt;li&gt;Compatibility with CUDA 12.1.1 required by certain model operations&lt;/li&gt;
&lt;/ul&gt;
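
&lt;p&gt;Once the VM is up, you can confirm the toolkit is actually present by running &lt;code&gt;nvcc --version&lt;/code&gt; inside the environment; it should report release 12.1.&lt;/p&gt;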

&lt;h3&gt;
  
  
  Launch Mode
&lt;/h3&gt;

&lt;p&gt;We selected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Interactive shell server

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching models like EmbeddingGemma-300m.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docker Repository Authentication
&lt;/h3&gt;

&lt;p&gt;We left all fields empty here.&lt;/p&gt;

&lt;p&gt;Since the Docker image is publicly available on Docker Hub, no login credentials are required.&lt;/p&gt;

&lt;h3&gt;
  
  
  Identification
&lt;/h3&gt;

&lt;p&gt;Template Name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia/cuda:12.1.1-devel-ubuntu22.04

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are the CUDA and cuDNN images from gitlab.com/nvidia/cuda; the devel variant contains the full CUDA toolkit, including nvcc.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rmzf8k4qok26mm8izj0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rmzf8k4qok26mm8izj0.png" alt=" " width="640" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9lf7i43zo43xkp3ea12y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9lf7i43zo43xkp3ea12y.png" alt=" " width="640" height="314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This setup ensures that the EmbeddingGemma-300m runs in a GPU-enabled environment with proper CUDA access and high compute performance.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0j197wuier91oyk5vji0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0j197wuier91oyk5vji0.png" alt=" " width="640" height="294"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxk4hb2qls0ihfd36ra6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxk4hb2qls0ihfd36ra6.png" alt=" " width="640" height="293"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 6: Virtual Machine Successfully Deployed
&lt;/h3&gt;

&lt;p&gt;You will get visual confirmation that your node is up and running.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg1jhmusnpsjq9u8nx21u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg1jhmusnpsjq9u8nx21u.png" alt=" " width="640" height="249"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 7: Connect to GPUs using SSH
&lt;/h3&gt;

&lt;p&gt;NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.&lt;/p&gt;

&lt;p&gt;Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sofyd8mrz7v0hp6m2kq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sofyd8mrz7v0hp6m2kq.png" alt=" " width="640" height="275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F18zn8nuu0bwybljt5fdi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F18zn8nuu0bwybljt5fdi.png" alt=" " width="640" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now open your terminal and paste the proxy SSH IP or direct SSH IP.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4dx438rmhgsrjx9p4koj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4dx438rmhgsrjx9p4koj.png" alt=" " width="640" height="302"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, if you want to check the GPU details, run the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia-smi

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7blq7r6dybq8cq0n38t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7blq7r6dybq8cq0n38t.png" alt=" " width="640" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 8: Verify Python Version &amp;amp; Install pip (if not present)
&lt;/h3&gt;

&lt;p&gt;Since Python 3.10 is already installed, we’ll confirm its version and ensure pip is available for package installation.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 8.1: Check Python Version
&lt;/h4&gt;

&lt;p&gt;Run the following command to verify Python 3.10 is installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 --version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see output like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Python 3.10.12

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 8.2: Install pip (if not already installed)
&lt;/h3&gt;

&lt;p&gt;Even if Python is installed, pip might not be available.&lt;/p&gt;

&lt;p&gt;Check if pip exists:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip3 --version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you get an error like &lt;code&gt;command not found&lt;/code&gt;, install pip manually.&lt;/p&gt;

&lt;p&gt;Install pip via get-pip.py:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -O https://bootstrap.pypa.io/get-pip.py
python3 get-pip.py

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will download and install pip into your system.&lt;/p&gt;

&lt;p&gt;You may see a warning about running as root — that’s okay for now.&lt;/p&gt;

&lt;p&gt;After installation, verify:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip3 --version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip 25.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now pip is ready to install packages like transformers, torch, etc.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcxf69324kayruz15xsm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcxf69324kayruz15xsm.png" alt=" " width="640" height="341"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 9: Create and Activate a Python 3.10 Virtual Environment
&lt;/h3&gt;

&lt;p&gt;Run the following commands to create and activate a Python 3.10 virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apt update &amp;amp;&amp;amp; apt install -y python3.10-venv git wget
python3.10 -m venv gemma
source gemma/bin/activate

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzfq8jzynessrjrogeeu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzfq8jzynessrjrogeeu.png" alt=" " width="640" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 10: Install Dependencies
&lt;/h3&gt;

&lt;p&gt;Run the following command to install dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -U sentence-transformers faiss-cpu

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu81nkj0if1309ewak5vn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu81nkj0if1309ewak5vn.png" alt=" " width="640" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 11: Install Hugging Face Hub
&lt;/h3&gt;

&lt;p&gt;Run the following command to install huggingface_hub:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -U huggingface_hub

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filjf3m01qa2mkz6dje37.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filjf3m01qa2mkz6dje37.png" alt=" " width="640" height="342"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 12: Log in to Hugging Face (CLI)
&lt;/h3&gt;

&lt;p&gt;Run the following command to log in to Hugging Face:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;huggingface-cli login

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When prompted, paste your HF token (from &lt;a href="https://huggingface.co/settings/tokens" rel="noopener noreferrer"&gt;https://huggingface.co/settings/tokens&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;For “Add token as git credential? (Y/n)”:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Y if you plan to git clone models/repos.&lt;/li&gt;
&lt;li&gt;n if you only use huggingface_hub downloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You should see: “Token is valid… saved to /root/.cache/huggingface/stored_tokens”.&lt;/p&gt;

&lt;p&gt;The red line “Cannot authenticate through git-credential…” just means no Git credential helper is set. It’s safe to ignore.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5qikxgm86loiip6zaee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5qikxgm86loiip6zaee.png" alt=" " width="640" height="345"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 13: Connect to Your GPU VM with a Code Editor
&lt;/h3&gt;

&lt;p&gt;Before you start running scripts with the EmbeddingGemma-300m model, it’s a good idea to connect your GPU virtual machine (VM) to a code editor of your choice. This makes writing, editing, and running code much easier.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can use popular editors like VS Code, Cursor, or any other IDE that supports SSH remote connections.&lt;/li&gt;
&lt;li&gt;In this example, we’re using the Cursor code editor.&lt;/li&gt;
&lt;li&gt;Once connected, you’ll be able to browse files, edit scripts, and run commands directly on your remote server, just like working locally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why do this?&lt;br&gt;
Connecting your VM to a code editor gives you a powerful, streamlined workflow for Python development, allowing you to easily manage your code, install dependencies, and experiment with large models.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fydkt41s289q3bjvnm6j4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fydkt41s289q3bjvnm6j4.png" alt=" " width="640" height="350"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 14: Create app.py and Add the Following Code
&lt;/h3&gt;

&lt;p&gt;Create the file&lt;br&gt;
From your VM terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nano app.py

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in VS Code, click New File → name it app.py.&lt;/p&gt;

&lt;p&gt;Paste this code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sentence_transformers import SentenceTransformer
import numpy as np

# Load the EmbeddingGemma-300M model (Google’s open embedding model)
model = SentenceTransformer("google/embeddinggemma-300m")  # auto device (CPU/GPU)

# A sample query
query = "Which planet is known as the Red Planet?"

# A small list of candidate documents
docs = [
    "Venus is often called Earth's twin.",
    "Mars, with its reddish hue, is the Red Planet.",
    "Jupiter is the largest planet.",
    "Saturn has iconic rings."
]

# Encode the query → vector representation optimized for search
q = model.encode_query(query)

# Encode the documents → vector representations optimized for retrieval
D = model.encode_document(docs)

# Compute similarity between the query vector and each document vector
scores = model.similarity(q, D).squeeze().tolist()

# Pair each score with its document and sort (highest similarity first)
ranked = sorted(zip(scores, docs), reverse=True)

# Print top 3 results
print(ranked[:3])

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  What this file does (detailed)
&lt;/h4&gt;

&lt;p&gt;Imports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SentenceTransformer loads the EmbeddingGemma-300M model.&lt;/li&gt;
&lt;li&gt;numpy is imported for vector math (this minimal demo doesn’t use it directly).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Model load:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loads the Google EmbeddingGemma-300M embedding model, which converts text into vectors (embeddings).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Query + documents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Defines one query ("Which planet is known as the Red Planet?") and a small set of candidate sentences (our mini “document corpus”).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Encoding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model.encode_query(query) → creates a vector representation of the query.&lt;/li&gt;
&lt;li&gt;model.encode_document(docs) → creates vector representations of the candidate docs.&lt;/li&gt;
&lt;li&gt;Using separate methods ensures query/document embeddings are tuned for retrieval (see the sketch after this list).&lt;/li&gt;
&lt;/ul&gt;
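
&lt;p&gt;Under the hood, these methods apply model-specific instruction prefixes to the raw text before embedding it. If you are curious which prompts your installed checkpoint defines, Sentence Transformers exposes them on the model object (exact keys and strings depend on the checkpoint's configuration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Inspect the prompt templates bundled with the checkpoint
print(model.prompts)              # dict of prompt name -&amp;gt; prefix string
print(model.default_prompt_name)  # prompt used when none is specified
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;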

&lt;p&gt;Similarity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model.similarity(q, D) computes how close each doc is to the query in vector space.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ranking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sorts docs by similarity score (highest first). The result shows which document best answers the query.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prints the top 3 results. You should see “Mars…” ranked highest, since it matches the Red Planet question.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short:&lt;br&gt;
app.py is a minimal semantic search demo using EmbeddingGemma. It shows how to encode queries &amp;amp; docs, compute similarity, and rank results — the basic workflow behind search engines, chatbots, and RAG systems.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjgv7jrq0h05wyxvabijn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjgv7jrq0h05wyxvabijn.png" alt=" " width="640" height="326"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 15: Run the Script
&lt;/h3&gt;

&lt;p&gt;Run the script with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 app.py

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will download the model and print the response in the terminal.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokisiub1bayg6h2lggn7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokisiub1bayg6h2lggn7.png" alt=" " width="640" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtquv8ccogdva01cj30f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtquv8ccogdva01cj30f.png" alt=" " width="640" height="336"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 16: Create build_index.py and add the following code
&lt;/h3&gt;

&lt;p&gt;Create the file&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nano build_index.py

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in VS Code → New File → name it build_index.py.&lt;/p&gt;

&lt;p&gt;Paste the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os, json, argparse, numpy as np
from pathlib import Path
from sentence_transformers import SentenceTransformer
import faiss

def read_corpus(folder):
    paths = []
    texts = []
    for p in Path(folder).rglob("*"):
        if p.suffix.lower() in {".txt", ".md"} and p.stat().st_size &amp;gt; 0:
            paths.append(str(p))
            texts.append(p.read_text(encoding="utf-8", errors="ignore"))
    return paths, texts

def mrl_truncate_and_norm(X, k):
    X = X[:, :k]
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X.astype("float32")

def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--data_dir", required=True, help="Folder with .txt/.md")
    ap.add_argument("--dim", type=int, default=768, choices=[768,512,256,128])
    ap.add_argument("--out_dir", default="index")
    args = ap.parse_args()

    os.makedirs(args.out_dir, exist_ok=True)

    print("Loading model…")
    model = SentenceTransformer("google/embeddinggemma-300m")  # fp32/bf16 only

    print("Reading corpus…")
    paths, texts = read_corpus(args.data_dir)
    assert texts, "No .txt/.md files found"

    print(f"Encoding {len(texts)} docs…")
    D = model.encode_document(texts, batch_size=64, convert_to_numpy=True)
    # L2-normalize (cosine sim via inner product)
    D = D / np.linalg.norm(D, axis=1, keepdims=True)

    if args.dim &amp;lt; 768:
        print(f"Applying Matryoshka truncation to {args.dim}…")
        D = mrl_truncate_and_norm(D, args.dim)

    index = faiss.IndexFlatIP(D.shape[1])
    index.add(D)

    faiss.write_index(index, f"{args.out_dir}/faiss_{args.dim}.index")
    np.save(f"{args.out_dir}/embeddings_{args.dim}.npy", D)
    with open(f"{args.out_dir}/mapping.json", "w") as f:
        json.dump(paths, f, indent=2)

    print(f"Saved index to {args.out_dir} (dim={args.dim}, N={len(texts)})")

if __name__ == "__main__":
    main()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  What this script does
&lt;/h4&gt;

&lt;p&gt;read_corpus(folder):&lt;br&gt;
Reads all .txt and .md files in the given folder. Returns two lists:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;paths → file paths&lt;/li&gt;
&lt;li&gt;texts → file contents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;mrl_truncate_and_norm(X, k):&lt;br&gt;
Implements Matryoshka Representation Learning.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Takes embeddings of size 768.&lt;/li&gt;
&lt;li&gt;Truncates to smaller dimension (512, 256, or 128).&lt;/li&gt;
&lt;li&gt;Re-normalizes them for cosine similarity search (see the sketch after this list).&lt;/li&gt;
&lt;/ul&gt;
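&lt;p&gt;For intuition, here is a minimal numpy sketch (using random stand-in vectors, not real embeddings) of the truncate-and-renormalize step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

# Random stand-ins for three 768-dim embeddings, L2-normalized
X = np.random.randn(3, 768)
X = X / np.linalg.norm(X, axis=1, keepdims=True)

# Keep only the first 256 dims, then re-normalize so cosine similarity still works
X_small = X[:, :256]
X_small = X_small / np.linalg.norm(X_small, axis=1, keepdims=True)

print(X_small.shape)                    # (3, 256)
print(np.linalg.norm(X_small, axis=1))  # all ~1.0

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
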

&lt;p&gt;main():&lt;br&gt;
Parses arguments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;--data_dir → where your text files are.&lt;/li&gt;
&lt;li&gt;--dim → embedding size (default 768).&lt;/li&gt;
&lt;li&gt;--out_dir → where to save the index (default index/).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Load the EmbeddingGemma-300M model.&lt;br&gt;
Read all docs from your folder.&lt;br&gt;
Encode them with model.encode_document().&lt;br&gt;
Normalize vectors.&lt;br&gt;
Optionally shrink with MRL.&lt;br&gt;
Create a FAISS index (cosine similarity using IndexFlatIP).&lt;/p&gt;

&lt;p&gt;Save:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;faiss_{dim}.index → the FAISS index file.&lt;/li&gt;
&lt;li&gt;embeddings_{dim}.npy → the numpy array of embeddings.&lt;/li&gt;
&lt;li&gt;mapping.json → maps index rows back to file paths.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfvf4dl5olpot3vfest4.png" alt=" " width="640" height="330"&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  How to run it
&lt;/h4&gt;

&lt;p&gt;Create some docs (if you don’t have any yet):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir docs
echo "Mars is the Red Planet." &amp;gt; docs/mars.txt
echo "Venus is Earth's twin." &amp;gt; docs/venus.txt
echo "Jupiter is the largest planet." &amp;gt; docs/jupiter.txt

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7jvvdtb4v4xof85kdnkh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7jvvdtb4v4xof85kdnkh.png" alt=" " width="640" height="176"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 build_index.py --data_dir ./docs

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read your .txt files in docs/&lt;/li&gt;
&lt;li&gt;Encode them with EmbeddingGemma-300M&lt;/li&gt;
&lt;li&gt;Save an index under ./index/&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Output example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Loading model…
Reading corpus…
Encoding 3 docs…
Saved index to index (dim=768, N=3)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsqvphtaudele3w0oxp0o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsqvphtaudele3w0oxp0o.png" alt=" " width="640" height="137"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  What you get after running
&lt;/h4&gt;

&lt;p&gt;Inside the index/ folder:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;faiss_768.index → FAISS index file&lt;/li&gt;
&lt;li&gt;embeddings_768.npy → stored embeddings&lt;/li&gt;
&lt;li&gt;mapping.json → JSON list of the indexed file paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short: build_index.py prepares your text files into a searchable embedding index using EmbeddingGemma + FAISS.&lt;/p&gt;
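&lt;p&gt;As a natural next step, here is a minimal search sketch that loads the artifacts build_index.py just saved (assuming the default index/ folder and dim=768) and retrieves the top matches for a query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")
index = faiss.read_index("index/faiss_768.index")
with open("index/mapping.json") as f:
    paths = json.load(f)

# Encode and L2-normalize the query so inner product == cosine similarity
q = model.encode_query("Which planet is known as the Red Planet?")
q = (q / np.linalg.norm(q)).reshape(1, -1).astype("float32")

scores, ids = index.search(q, 3)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {paths[i]}")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
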

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;EmbeddingGemma-300M is a powerful yet lightweight open embedding model from Google DeepMind, designed for retrieval, semantic similarity, classification, clustering, and more — all while being efficient enough to run on laptops, desktops, or modest GPUs. In this guide, we walked through setting up a NodeShift GPU VM, installing dependencies, and building two core scripts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;app.py for a quick semantic search demo using queries and documents.&lt;/li&gt;
&lt;li&gt;build_index.py for preparing and indexing your own text corpus with FAISS, ready for scalable search.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With these steps, you now have everything you need to integrate EmbeddingGemma into search pipelines, recommendation systems, or retrieval-augmented applications. Whether on-device or in the cloud, EmbeddingGemma-300M provides a practical and cost-effective foundation for embedding-based workflows.&lt;/p&gt;

</description>
      <category>gemma</category>
      <category>opensource</category>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>How to Install &amp; Run Microsoft Kosmos-2.5 Locally?</title>
      <dc:creator>Ayush kumar</dc:creator>
      <pubDate>Mon, 08 Sep 2025 08:24:43 +0000</pubDate>
      <link>https://dev.to/nodeshiftcloud/how-to-install-run-microsoft-kosmos-25-locally-l5a</link>
      <guid>https://dev.to/nodeshiftcloud/how-to-install-run-microsoft-kosmos-25-locally-l5a</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvuu8jvqu8jbo4ho3ulrr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvuu8jvqu8jbo4ho3ulrr.png" alt=" " width="800" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kosmos-2.5 is Microsoft’s multimodal “literate” model for reading text-heavy images (receipts, invoices, forms, docs). It does two things out of the box using task prompts: (a) OCR with spatially-aware text blocks (text + bounding boxes) via the &amp;lt;ocr&amp;gt; prompt, and (b) image→Markdown conversion via the &amp;lt;md&amp;gt; prompt. It’s implemented in Transformers (supported from v4.56+) with ready-to-run Python snippets, and the paper details the shared decoder-only architecture and doc-understanding focus.&lt;/p&gt;

&lt;h3&gt;
  
  
  GPU Configuration (What Actually Works)
&lt;/h3&gt;

&lt;p&gt;Ballpark VRAM, based on the 1.3B-param model running in bfloat16 with image patches; add headroom for long outputs / larger pages.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw1hog1w5xqt0nz4vvxwo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw1hog1w5xqt0nz4vvxwo.png" alt=" " width="738" height="571"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;p&gt;Link: &lt;a href="https://huggingface.co/microsoft/kosmos-2.5" rel="noopener noreferrer"&gt;https://huggingface.co/microsoft/kosmos-2.5&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step-by-Step Process to Install &amp;amp; Run Microsoft Kosmos-2.5 Locally
&lt;/h3&gt;

&lt;p&gt;For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: Sign Up and Set Up a NodeShift Cloud Account
&lt;/h3&gt;

&lt;p&gt;Visit the &lt;a href="https://app.nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;NodeShift Platform&lt;/a&gt; and create an account. Once you’ve signed up, log into your account.&lt;/p&gt;

&lt;p&gt;Follow the account setup process and provide the necessary details and information.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8e06ybc9bl8jmcg81gh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8e06ybc9bl8jmcg81gh.png" alt=" " width="640" height="386"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 2: Create a GPU Node (Virtual Machine)
&lt;/h3&gt;

&lt;p&gt;GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2oycbf1l536gbkrsynq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2oycbf1l536gbkrsynq.png" alt=" " width="640" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu73a1bdseer5kzzn9enj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu73a1bdseer5kzzn9enj.png" alt=" " width="640" height="390"&gt;&lt;/a&gt;&lt;br&gt;
Navigate to the menu on the left side. Select the GPU Nodes option, create a GPU Node in the Dashboard, click the Create GPU Node button, and deploy your first Virtual Machine.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 3: Select a Model, Region, and Storage
&lt;/h3&gt;

&lt;p&gt;In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl1el6nd3ybk4p0uxghue.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl1el6nd3ybk4p0uxghue.png" alt=" " width="640" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4t8yafbfs07z15pxzl3m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4t8yafbfs07z15pxzl3m.png" alt=" " width="640" height="369"&gt;&lt;/a&gt;&lt;br&gt;
We will use 1 x RTX A6000 GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 4: Select Authentication Method
&lt;/h3&gt;

&lt;p&gt;There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our &lt;a href="https://docs.nodeshift.com/gpus/create-gpu-deployment?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5l2fe0el4zbqq1c7h39.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5l2fe0el4zbqq1c7h39.png" alt=" " width="640" height="176"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 5: Choose an Image
&lt;/h3&gt;

&lt;p&gt;In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running Microsoft Kosmos-2.5, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.&lt;/p&gt;

&lt;p&gt;We chose the following image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia/cuda:12.1.1-devel-ubuntu22.04

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This image is essential because it includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full CUDA toolkit (including nvcc)&lt;/li&gt;
&lt;li&gt;Proper support for building and running GPU-based applications like Microsoft Kosmos-2.5&lt;/li&gt;
&lt;li&gt;Compatibility with CUDA 12.1.1 required by certain model operations&lt;/li&gt;
&lt;/ul&gt;
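&lt;p&gt;Once the VM is up (see Step 7), you can quickly confirm the toolkit is actually present:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvcc --version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
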

&lt;h3&gt;
  
  
  Launch Mode
&lt;/h3&gt;

&lt;p&gt;We selected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Interactive shell server

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching tools like Microsoft Kosmos-2.5.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docker Repository Authentication
&lt;/h3&gt;

&lt;p&gt;We left all fields empty here.&lt;/p&gt;

&lt;p&gt;Since the Docker image is publicly available on Docker Hub, no login credentials are required.&lt;/p&gt;

&lt;h3&gt;
  
  
  Identification
&lt;/h3&gt;

&lt;p&gt;Template Name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia/cuda:12.1.1-devel-ubuntu22.04

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CUDA and cuDNN images from gitlab.com/nvidia/cuda. The devel variant contains the full CUDA toolkit, including nvcc.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb9mzpd3x3j7e1pfp0rha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb9mzpd3x3j7e1pfp0rha.png" alt=" " width="640" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1wegse6qk0918lzixzt5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1wegse6qk0918lzixzt5.png" alt=" " width="640" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This setup ensures that Microsoft Kosmos-2.5 runs in a GPU-enabled environment with proper CUDA access and high compute performance.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9jjdsztl4zdcryc6cwn4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9jjdsztl4zdcryc6cwn4.png" alt=" " width="640" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwjvikx8vc9q24g3vpo3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwjvikx8vc9q24g3vpo3.png" alt=" " width="640" height="317"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 7: Connect to GPUs using SSH
&lt;/h3&gt;

&lt;p&gt;NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.&lt;/p&gt;

&lt;p&gt;Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zntu345gnwicehiu5uh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zntu345gnwicehiu5uh.png" alt=" " width="640" height="290"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0f2vrxt4w6menx7ny5g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0f2vrxt4w6menx7ny5g.png" alt=" " width="640" height="304"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now open your terminal and paste the proxy SSH IP or direct SSH IP.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4011a9majttmj69ked9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4011a9majttmj69ked9.png" alt=" " width="640" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, If you want to check the GPU details, run the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia-smi

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flf2yrbjv7jvo9ptgqbol.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flf2yrbjv7jvo9ptgqbol.png" alt=" " width="640" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 8: Verify Python Version &amp;amp; Install pip (if not present)
&lt;/h3&gt;

&lt;p&gt;Since Python 3.10 is already installed, we’ll confirm its version and ensure pip is available for package installation.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 8.1: Check Python Version
&lt;/h4&gt;

&lt;p&gt;Run the following command to verify Python 3.10 is installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 --version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see output like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Python 3.10.12

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 8.2: Install pip (if not already installed)
&lt;/h3&gt;

&lt;p&gt;Even if Python is installed, pip might not be available.&lt;/p&gt;

&lt;p&gt;Check if pip exists:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip3 --version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you get an error like command not found, then install pip manually.&lt;/p&gt;

&lt;p&gt;Install pip via get-pip.py:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -O https://bootstrap.pypa.io/get-pip.py
python3 get-pip.py

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will download and install pip into your system.&lt;/p&gt;

&lt;p&gt;You may see a warning about running as root — that’s okay for now.&lt;/p&gt;

&lt;p&gt;After installation, verify:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip3 --version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip 25.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now pip is ready to install packages like transformers, torch, etc.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpp7tikoe370l4sir6glm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpp7tikoe370l4sir6glm.png" alt=" " width="640" height="411"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
Step 9: Create and Activate a Python 3.10 Virtual Environment
&lt;/h3&gt;

&lt;p&gt;Run the following commands to create and activate a Python 3.10 virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apt update &amp;amp;&amp;amp; apt install -y python3.10-venv git wget
python3.10 -m venv kosmos
source kosmos/bin/activate

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdu71stjheh20nkesfgcu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdu71stjheh20nkesfgcu.png" alt=" " width="640" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 10: Install PyTorch
&lt;/h3&gt;

&lt;p&gt;Run the following command to install PyTorch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwhxyq96g0fzo0q2trnx6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwhxyq96g0fzo0q2trnx6.png" alt=" " width="640" height="408"&gt;&lt;/a&gt;&lt;/p&gt;
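&lt;p&gt;Optionally, verify that this PyTorch build can actually see the GPU before moving on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
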

&lt;h3&gt;
  
  
  Step 11: Install Model Dependencies
&lt;/h3&gt;

&lt;p&gt;Run the following command to install model dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install "transformers&amp;gt;=4.56" accelerate pillow requests

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Transformers ≥4.56 is required.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F85iq1bempki2g23kg8s4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F85iq1bempki2g23kg8s4.png" alt=" " width="640" height="411"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 12: Install Wheel &amp;amp; Flash Attn
&lt;/h3&gt;

&lt;p&gt;Run the following command to install wheel &amp;amp; flash-attn:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install wheel
pip install flash-attn --no-build-isolation

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2btkzojw7sh4bs16icq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2btkzojw7sh4bs16icq.png" alt=" " width="640" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 13: Connect to Your GPU VM with a Code Editor
&lt;/h3&gt;

&lt;p&gt;Before you start running scripts with the Microsoft Kosmos-2.5 model, it’s a good idea to connect your GPU virtual machine (VM) to a code editor of your choice. This makes writing, editing, and running code much easier.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can use popular editors like VS Code, Cursor, or any other IDE that supports SSH remote connections.&lt;/li&gt;
&lt;li&gt;In this example, we’re using the Cursor code editor.&lt;/li&gt;
&lt;li&gt;Once connected, you’ll be able to browse files, edit scripts, and run commands directly on your remote server, just like working locally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why do this?&lt;br&gt;
Connecting your VM to a code editor gives you a powerful, streamlined workflow for Python development, allowing you to easily manage your code, install dependencies, and experiment with large models.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5oia2dz8u3zcddnyp1nu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5oia2dz8u3zcddnyp1nu.png" alt=" " width="640" height="471"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 14: Smoke Test: Markdown Extraction
&lt;/h3&gt;

&lt;p&gt;Create kosmos25_md.py and add the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import torch, requests
from PIL import Image
from transformers import AutoProcessor, Kosmos2_5ForConditionalGeneration

repo = "microsoft/kosmos-2.5"
device = "cuda:0"
dtype = torch.bfloat16

model = Kosmos2_5ForConditionalGeneration.from_pretrained(
    repo,
    device_map=device,
    torch_dtype=dtype,
    # If you installed flash-attn, uncomment the next line
    # attn_implementation="flash_attention_2",
)
processor = AutoProcessor.from_pretrained(repo)

# Sample image from the model card
url = "https://huggingface.co/microsoft/kosmos-2.5/resolve/main/receipt_00008.png"
image = Image.open(requests.get(url, stream=True).raw)

prompt = "&amp;lt;md&amp;gt;"
inputs = processor(text=prompt, images=image, return_tensors="pt")
# Keep &amp;amp; use the scaled dimensions from the model card example
height, width = inputs.pop("height"), inputs.pop("width")

inputs = {k: (v.to(device) if v is not None else None) for k, v in inputs.items()}
inputs["flattened_patches"] = inputs["flattened_patches"].to(dtype)

out_ids = model.generate(**inputs, max_new_tokens=1024)
text = processor.batch_decode(out_ids, skip_special_tokens=True)[0]
print(text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg8896qqc4sdouvlgh4z1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg8896qqc4sdouvlgh4z1.png" alt=" " width="640" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run the script with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 kosmos25_md.py

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  What kosmos25_md.py does
&lt;/h4&gt;

&lt;p&gt;Imports libraries&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;torch: for running the model on GPU/CPU.&lt;/li&gt;
&lt;li&gt;requests: to download a sample image from the Hugging Face repo.&lt;/li&gt;
&lt;li&gt;PIL.Image: to load and process that image.&lt;/li&gt;
&lt;li&gt;transformers: provides the AutoProcessor (for preprocessing text+images) and Kosmos2_5ForConditionalGeneration (the actual model).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Defines model + device setup&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chooses repo = “microsoft/kosmos-2.5”.&lt;/li&gt;
&lt;li&gt;Sets device = "cuda:0" (so it uses your first GPU).&lt;/li&gt;
&lt;li&gt;Uses dtype = torch.bfloat16 (lighter precision for efficiency).&lt;/li&gt;
&lt;li&gt;Loads the model weights from Hugging Face into GPU memory.&lt;/li&gt;
&lt;li&gt;Loads the paired processor, which knows how to tokenize text and convert images into patches.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fetches a sample image&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Downloads a receipt image (receipt_00008.png) directly from the Hugging Face repo.&lt;/li&gt;
&lt;li&gt;Opens it with PIL so it’s ready to feed to the model.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prepares the task prompt&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sets prompt = "".&lt;/li&gt;
&lt;li&gt;This tells Kosmos-2.5 you want Markdown transcription (not OCR bounding boxes).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Processes input into tensors&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calls the processor with the text prompt (&amp;lt;md&amp;gt;) + image.&lt;/li&gt;
&lt;li&gt;Returns model-ready tensors (pixel_values, input_ids, flattened_patches, height, width).&lt;/li&gt;
&lt;li&gt;Keeps track of height and width (for scaling purposes).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Moves data to GPU&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Iterates over input tensors and sends them to the CUDA device.&lt;/li&gt;
&lt;li&gt;Ensures flattened_patches are stored in bfloat16 for efficiency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Runs generation with the model&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calls model.generate() with inputs.&lt;/li&gt;
&lt;li&gt;max_new_tokens=1024 → allows up to 1024 tokens of output.&lt;/li&gt;
&lt;li&gt;The model produces a sequence representing Markdown text.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Decodes the output&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses processor.batch_decode() to convert model IDs back into text.&lt;/li&gt;
&lt;li&gt;Skips special tokens so that only the readable text remains.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prints result to terminal&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Displays the generated Markdown string representing the document layout.&lt;/li&gt;
&lt;li&gt;Example: headings, tables, or text blocks reflecting the receipt’s content.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0cjfw7apajxjl723bxh.png" alt=" " width="640" height="409"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Summary
&lt;/h4&gt;

&lt;p&gt;When you run python3 kosmos25_md.py, the script:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loads Kosmos-2.5 on GPU in bf16.&lt;/li&gt;
&lt;li&gt;Downloads a sample receipt image.&lt;/li&gt;
&lt;li&gt;Sends the &amp;lt;md&amp;gt; prompt + image through the model.&lt;/li&gt;
&lt;li&gt;Generates structured Markdown output of the document.&lt;/li&gt;
&lt;li&gt;Prints the Markdown text to your terminal.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa4wki15wcxzkz9uywkr1.png" alt=" " width="640" height="408"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 15: OCR with bounding boxes
&lt;/h3&gt;

&lt;p&gt;Create kosmos25_ocr.py and add the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re, torch, requests
from PIL import Image, ImageDraw
from transformers import AutoProcessor, Kosmos2_5ForConditionalGeneration

repo = "microsoft/kosmos-2.5"
device = "cuda:0"; dtype = torch.bfloat16

model = Kosmos2_5ForConditionalGeneration.from_pretrained(
    repo,
    device_map=device,
    torch_dtype=dtype,
    # attn_implementation="flash_attention_2",
)
processor = AutoProcessor.from_pretrained(repo)

url = "https://huggingface.co/microsoft/kosmos-2.5/resolve/main/receipt_00008.png"
image = Image.open(requests.get(url, stream=True).raw)

prompt = "&amp;lt;ocr&amp;gt;"
inputs = processor(text=prompt, images=image, return_tensors="pt")
height, width = inputs.pop("height"), inputs.pop("width")
raw_width, raw_height = image.size
scale_h = raw_height / height
scale_w = raw_width / width

inputs = {k: (v.to(device) if v is not None else None) for k, v in inputs.items()}
inputs["flattened_patches"] = inputs["flattened_patches"].to(dtype)

out_ids = model.generate(**inputs, max_new_tokens=1024)
y = processor.batch_decode(out_ids, skip_special_tokens=True)[0]

# Post-process (from model card example)
pattern = r"&amp;lt;bbox&amp;gt;&amp;lt;x_\\d+&amp;gt;&amp;lt;y_\\d+&amp;gt;&amp;lt;x_\\d+&amp;gt;&amp;lt;y_\\d+&amp;gt;&amp;lt;/bbox&amp;gt;"
boxes_raw = re.findall(pattern, y)
lines = re.split(pattern, y)[1:]
boxes = [[int(j) for j in re.findall(r"\\d+", i)] for i in boxes_raw]

draw = ImageDraw.Draw(image)
for i, line in enumerate(lines):
    x0,y0,x1,y1 = boxes[i]
    if x0 &amp;lt; x1 and y0 &amp;lt; y1:
        x0,y0,x1,y1 = int(x0*scale_w), int(y0*scale_h), int(x1*scale_w), int(y1*scale_h)
        draw.polygon([x0,y0, x1,y0, x1,y1, x0,y1], outline="red")
image.save("ocr_output.png")
print("Saved ocr_output.png")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowi8zi007jp6n9rxfjr6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowi8zi007jp6n9rxfjr6.png" alt=" " width="640" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run the script with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 kosmos25_ocr.py

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  What kosmos25_ocr.py does
&lt;/h4&gt;

&lt;p&gt;Imports libraries&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Same as the Markdown script: torch, requests, PIL.Image, and transformers.&lt;/li&gt;
&lt;li&gt;Adds re (regular expressions) to parse bounding box tags in the model’s output.&lt;/li&gt;
&lt;li&gt;Adds ImageDraw from PIL to draw boxes on the image.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Defines model + device setup&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loads the Kosmos-2.5 model (microsoft/kosmos-2.5) into GPU memory.&lt;/li&gt;
&lt;li&gt;Uses device = "cuda:0" and dtype = torch.bfloat16 for GPU execution.&lt;/li&gt;
&lt;li&gt;Loads the paired processor for tokenization and image preprocessing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fetches the sample image&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Downloads the same receipt image (receipt_00008.png) from Hugging Face.&lt;/li&gt;
&lt;li&gt;Opens it using PIL.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prepares the task prompt&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sets prompt = "".&lt;/li&gt;
&lt;li&gt;This tells Kosmos-2.5 to generate text with bounding box coordinates for each block of text it detects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Processes input into tensors&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calls the processor with the text prompt (&amp;lt;ocr&amp;gt;) + image.&lt;/li&gt;
&lt;li&gt;Extracts height and width from the processed input for scaling.&lt;/li&gt;
&lt;li&gt;Keeps track of raw image dimensions (raw_width, raw_height).&lt;/li&gt;
&lt;li&gt;Computes scaling factors (scale_h, scale_w) so that bounding boxes from the model can be mapped correctly to the real image size (see the numeric sketch after this list).&lt;/li&gt;
&lt;/ul&gt;
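&lt;p&gt;For intuition, here is the same scaling math with hypothetical numbers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical sizes: the processor worked on a 1024x1024 canvas,
# while the raw receipt image is 500x1500 pixels
raw_width, raw_height = 500, 1500
width, height = 1024, 1024
scale_w = raw_width / width    # ~0.49
scale_h = raw_height / height  # ~1.46

# A box the model emits as (x0, y0, x1, y1) maps back to raw-image pixels
x0, y0, x1, y1 = 100, 200, 400, 260
print(int(x0*scale_w), int(y0*scale_h), int(x1*scale_w), int(y1*scale_h))
# prints: 48 292 195 380

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
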

&lt;p&gt;Moves data to GPU&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Just like in the Markdown script, pushes tensors to the GPU.&lt;/li&gt;
&lt;li&gt;Converts flattened_patches to bfloat16.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Runs generation with the model&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calls model.generate() with max 1024 tokens.&lt;/li&gt;
&lt;li&gt;Output contains both text and bounding box tags (e.g., &amp;lt;bbox&amp;gt;...&amp;lt;/bbox&amp;gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Post-processes the output&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Decodes the model output back to text.&lt;/li&gt;
&lt;li&gt;Removes the &amp;lt;ocr&amp;gt; prompt from the result.&lt;/li&gt;
&lt;li&gt;Uses regex to extract bounding box coordinates.&lt;/li&gt;
&lt;li&gt;Splits the text into lines associated with those bounding boxes (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;Scales the bounding boxes to match the original image resolution.&lt;/li&gt;
&lt;/ul&gt;
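&lt;p&gt;To see the bbox parsing in isolation, here is a minimal sketch run on a short hypothetical decoded string (real model output follows the same tag format):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re

# Hypothetical decoded output: each text line is preceded by a bbox tag
y = "&amp;lt;bbox&amp;gt;&amp;lt;x_10&amp;gt;&amp;lt;y_20&amp;gt;&amp;lt;x_110&amp;gt;&amp;lt;y_40&amp;gt;&amp;lt;/bbox&amp;gt;TOTAL\n&amp;lt;bbox&amp;gt;&amp;lt;x_10&amp;gt;&amp;lt;y_50&amp;gt;&amp;lt;x_90&amp;gt;&amp;lt;y_70&amp;gt;&amp;lt;/bbox&amp;gt;$5.00\n"
pattern = r"&amp;lt;bbox&amp;gt;&amp;lt;x_\d+&amp;gt;&amp;lt;y_\d+&amp;gt;&amp;lt;x_\d+&amp;gt;&amp;lt;y_\d+&amp;gt;&amp;lt;/bbox&amp;gt;"

boxes_raw = re.findall(pattern, y)
lines = re.split(pattern, y)[1:]
boxes = [[int(j) for j in re.findall(r"\d+", i)] for i in boxes_raw]

print(boxes)  # [[10, 20, 110, 40], [10, 50, 90, 70]]
print(lines)  # ['TOTAL\n', '$5.00\n']

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
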

&lt;p&gt;Overlays bounding boxes on the image&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses PIL’s ImageDraw.Draw to draw red polygons around detected text regions.&lt;/li&gt;
&lt;li&gt;Associates each bounding box with its recognized text.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Saves + prints results&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Saves a new image (ocr_output.png) with the bounding boxes drawn.&lt;/li&gt;
&lt;li&gt;Prints a confirmation ("Saved ocr_output.png") in the terminal.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdgmfsf18py9cvub76pc.png" alt=" " width="640" height="408"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Key Difference vs Markdown script
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Markdown script (kosmos25_md.py) → Converts the entire document into structured Markdown text (no spatial layout).&lt;/li&gt;
&lt;li&gt;OCR script (kosmos25_ocr.py) → Extracts text with spatial coordinates and draws bounding boxes directly onto the image.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqh0u4qb2gq4vuwbmq9h1.png" alt=" " width="640" height="409"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run Markdown mode when you want a neat Markdown document version of your image.&lt;/li&gt;
&lt;li&gt;Run OCR mode when you want raw text + bounding boxes for further analysis or visualization.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy3t8hc1b1cy1oziip03c.png" alt=" " width="640" height="508"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 16: Install Streamlit
&lt;/h3&gt;

&lt;p&gt;Run the following command to install streamlit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install streamlit

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkcyx7u4sltfw3y8bz71r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkcyx7u4sltfw3y8bz71r.png" alt=" " width="640" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
Step 17: Create an app.py
&lt;/h3&gt;

&lt;p&gt;Create a file (e.g., app.py) and add the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import streamlit as st
import torch, requests, re
from PIL import Image, ImageDraw
from transformers import AutoProcessor, Kosmos2_5ForConditionalGeneration

# Load once at startup
repo = "microsoft/kosmos-2.5"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if "cuda" in device else torch.float32

@st.cache_resource
def load_model():
    model = Kosmos2_5ForConditionalGeneration.from_pretrained(
        repo,
        device_map=device,
        torch_dtype=dtype,
    )
    processor = AutoProcessor.from_pretrained(repo)
    return model, processor

model, processor = load_model()

st.title("Kosmos-2.5 WebUI (OCR + Markdown)")
mode = st.radio("Choose task:", ["Markdown (&amp;lt;md&amp;gt;)", "OCR (&amp;lt;ocr&amp;gt;)"])
uploaded = st.file_uploader("Upload an image", type=["png","jpg","jpeg"])

if uploaded:
    image = Image.open(uploaded).convert("RGB")
    st.image(image, caption="Uploaded Image", use_column_width=True)

    if st.button("Run Kosmos-2.5"):
        prompt = "&amp;lt;md&amp;gt;" if mode.startswith("Markdown") else "&amp;lt;ocr&amp;gt;"
        inputs = processor(text=prompt, images=image, return_tensors="pt")
        height, width = inputs.pop("height"), inputs.pop("width")
        raw_w, raw_h = image.size
        scale_h, scale_w = raw_h/height, raw_w/width

        inputs = {k: (v.to(device) if v is not None else None) for k,v in inputs.items()}
        inputs["flattened_patches"] = inputs["flattened_patches"].to(dtype)

        with torch.no_grad():
            out_ids = model.generate(**inputs, max_new_tokens=1024)
        text = processor.batch_decode(out_ids, skip_special_tokens=True)[0]

        if mode.startswith("Markdown"):
            st.subheader("Markdown Output")
            st.code(text, language="markdown")
        else:
            # Post-process OCR boxes
            pattern = r"&amp;lt;bbox&amp;gt;&amp;lt;x_\d+&amp;gt;&amp;lt;y_\d+&amp;gt;&amp;lt;x_\d+&amp;gt;&amp;lt;y_\d+&amp;gt;&amp;lt;/bbox&amp;gt;"
            boxes_raw = re.findall(pattern, text)
            lines = re.split(pattern, text)[1:]
            boxes = [[int(j) for j in re.findall(r"\d+", i)] for i in boxes_raw]

            draw = ImageDraw.Draw(image)
            for i, line in enumerate(lines):
                x0,y0,x1,y1 = boxes[i]
                if x0 &amp;lt; x1 and y0 &amp;lt; y1:
                    x0,y0,x1,y1 = int(x0*scale_w), int(y0*scale_h), int(x1*scale_w), int(y1*scale_h)
                    draw.polygon([x0,y0, x1,y0, x1,y1, x0,y1], outline="red")
            st.subheader("OCR with Bounding Boxes")
            st.image(image)
            st.text_area("OCR Text", "\n".join(lines), height=200)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu2tn9ayk4vhw1267t5vu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu2tn9ayk4vhw1267t5vu.png" alt=" " width="640" height="572"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 18: Launch Streamlit
&lt;/h3&gt;

&lt;p&gt;Run the following command to launch streamlit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;streamlit run app.py

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxoksxgt88xd74uxib5aj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxoksxgt88xd74uxib5aj.png" alt=" " width="640" height="185"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 19: Access the WebUI in Your Browser
&lt;/h3&gt;

&lt;p&gt;Once Streamlit is running, it will display three links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local URL → &lt;a href="http://localhost:8501" rel="noopener noreferrer"&gt;http://localhost:8501&lt;/a&gt; (works if you’re running on your own machine).&lt;/li&gt;
&lt;li&gt;Network URL → http://&amp;lt;internal-ip&amp;gt;:8501 (for internal access inside your VM network).&lt;/li&gt;
&lt;li&gt;External URL → http://&amp;lt;external-ip&amp;gt;:8501 (use this to open from your laptop/PC browser).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open the External URL in your browser.&lt;br&gt;
Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://38.29.145.10:8501

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Kosmos-2.5 WebUI will load with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A task selector (Markdown &amp;lt;md&amp;gt; or OCR &amp;lt;ocr&amp;gt;).&lt;/li&gt;
&lt;li&gt;An upload box to drag &amp;amp; drop or browse images.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upload any PNG/JPG/JPEG image (e.g., receipts, invoices, documents).&lt;/p&gt;

&lt;p&gt;Click Run and view:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Markdown Mode → a structured Markdown transcription of the document.&lt;/li&gt;
&lt;li&gt;OCR Mode → text + bounding boxes drawn directly on your image.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tip: If your VM is remote (e.g., NodeShift), ensure port 8501 is open in firewall/security settings, or use SSH port forwarding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -L 8501:localhost:8501 root@&amp;lt;your-vm-ip&amp;gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flioss8i1oiikl1c2fg31.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flioss8i1oiikl1c2fg31.png" alt=" " width="640" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 20: Upload and Process Documents
&lt;/h3&gt;

&lt;p&gt;In the WebUI, click Browse files (or drag &amp;amp; drop) to upload an image.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supported formats: PNG, JPG, JPEG&lt;/li&gt;
&lt;li&gt;File size limit: 200 MB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once uploaded, the file name will appear below the upload box (e.g., receipt_00008.png).&lt;/p&gt;

&lt;p&gt;Choose the task mode:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Markdown (&amp;lt;md&amp;gt;) → generates a structured Markdown transcription.&lt;/li&gt;
&lt;li&gt;OCR (&amp;lt;ocr&amp;gt;) → extracts text with bounding boxes overlaid on the uploaded image.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model will process the image and show results below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In Markdown Mode → you’ll see neatly formatted text output.&lt;/li&gt;
&lt;li&gt;In OCR Mode → the uploaded image will be re-rendered with red bounding boxes drawn around detected text regions, along with extracted text output.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tip: If you see a warning about use_column_width being deprecated, you can safely ignore it — it’s a Streamlit UI message and doesn’t affect the model’s output.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focs9l3xvuvki7tu7jy81.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focs9l3xvuvki7tu7jy81.png" alt=" " width="640" height="590"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7rava6bpalnc4j2o893b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7rava6bpalnc4j2o893b.png" alt=" " width="640" height="591"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 21: View OCR Results
&lt;/h3&gt;

&lt;p&gt;Switch the task selector to OCR (&amp;lt;ocr&amp;gt;).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This tells Kosmos-2.5 to extract text + bounding box coordinates instead of Markdown.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After uploading the image (e.g., receipt_00008.png), the model will process it and return:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Annotated Image → your uploaded image will now display with red bounding boxes drawn around detected text areas.&lt;/li&gt;
&lt;li&gt;OCR Text Output → the recognized text lines will appear below the image (or in a text box), showing exactly what was extracted from each bounding box.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use this mode when you need precise localization of text in documents (e.g., invoices, receipts, forms).&lt;/p&gt;

&lt;p&gt;Tip: If you want to save the annotated output, you can extend app.py with Streamlit download buttons for both the Markdown text and the OCR image; a sketch follows below.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29leqenm9awmbcyus7va.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29leqenm9awmbcyus7va.png" alt=" " width="640" height="549"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixm5sromrufg8vykaw49.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixm5sromrufg8vykaw49.png" alt=" " width="640" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsiwz9p6ucgklgx7afe1q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsiwz9p6ucgklgx7afe1q.png" alt=" " width="640" height="483"&gt;&lt;/a&gt;&lt;/p&gt;
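&lt;p&gt;A minimal sketch of those download buttons (reusing the text and image variables from app.py; the labels and file names here are just examples):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import io

# In the Markdown branch: offer the transcription as a .md file
st.download_button("Download Markdown", data=text,
                   file_name="output.md", mime="text/markdown")

# In the OCR branch: offer the annotated image as a .png file
buf = io.BytesIO()
image.save(buf, format="PNG")
st.download_button("Download annotated image", data=buf.getvalue(),
                   file_name="ocr_output.png", mime="image/png")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
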

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Kosmos-2.5 makes working with text-heavy images simple — whether you need clean Markdown transcriptions or OCR with bounding boxes. By setting it up on a GPU-powered NodeShift VM and integrating it with a Streamlit WebUI, you now have an efficient, browser-based workflow for document understanding at scale.&lt;/p&gt;

</description>
      <category>microsoft</category>
      <category>ai</category>
      <category>llm</category>
      <category>kosmos</category>
    </item>
    <item>
      <title>Generate Expressive, Long Form Multi-Speaker Audios &amp; Podcasts with Microsoft's VibeVoice</title>
      <dc:creator>Aditi Bindal</dc:creator>
      <pubDate>Wed, 03 Sep 2025 16:48:39 +0000</pubDate>
      <link>https://dev.to/nodeshiftcloud/generate-expressive-long-form-multi-speaker-audios-podcasts-with-microsofts-vibevoice-4hfp</link>
      <guid>https://dev.to/nodeshiftcloud/generate-expressive-long-form-multi-speaker-audios-podcasts-with-microsofts-vibevoice-4hfp</guid>
      <description>&lt;p&gt;If you're looking for an open-source text-to-speech system that can generate podcasts, audiobooks, or multi-speaker conversations that actually sound real, Microsoft’s VibeVoice is a model you’ll want to try. Unlike traditional TTS systems that often feel robotic, inconsistent, or restricted to short clips, VibeVoice is designed from the ground up to produce expressive, long-form, multi-speaker audio with remarkable naturalness and flow. It can synthesize speech lasting up to 90 minutes and seamlessly handle up to four distinct speakers, an impressive upgrade over most existing models that struggle to maintain quality beyond a few minutes or across more than two voices. What makes this possible is its continuous speech tokenizers (acoustic and semantic) that operate at a very low frame rate (7.5 Hz), preserving audio richness while drastically reducing computation. On top of this, the model uses a next-token diffusion framework, powered by a Qwen2.5-based LLM, to understand dialogue context and generate nuanced turn-taking, while a lightweight diffusion head ensures high-fidelity acoustic detail. The result: smooth, consistent, and lifelike conversations that feel like they were recorded, not generated.&lt;/p&gt;

&lt;p&gt;In this guide, we have covered a simple and step-by-step walkthrough of how to get this model up and running locally or in GPU-accelerated environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;The minimum system requirements for running this model are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;GPU: 1x RTX 4090 or 1x RTX A6000&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage: 50GB (preferable)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;VRAM: at least 16GB&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://nodeshift.com/blog/set-up-anaconda-on-ubuntu-22-04-in-minutes-simplify-your-ai-workflow" rel="noopener noreferrer"&gt;Anaconda installed&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step-by-step process to install and run VibeVoice
&lt;/h2&gt;

&lt;p&gt;For the purpose of this tutorial, we’ll use a GPU-powered Virtual Machine by &lt;a href="https://nodeshift.com" rel="noopener noreferrer"&gt;NodeShift&lt;/a&gt; since it provides high compute Virtual Machines at a very affordable cost on a scale that meets GDPR, SOC2, and ISO27001 requirements. Also, it offers an intuitive and user-friendly interface, making it easier for beginners to get started with Cloud deployments. However, feel free to use any cloud provider of your choice and follow the same steps for the rest of the tutorial.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Setting up a NodeShift Account
&lt;/h3&gt;

&lt;p&gt;Visit &lt;a href="https://app.nodeshift.com/sign-up" rel="noopener noreferrer"&gt;app.nodeshift.com&lt;/a&gt; and create an account by filling in basic details, or continue signing up with your Google/GitHub account.&lt;/p&gt;

&lt;p&gt;If you already have an account, &lt;a href="http://app.nodeshift.com" rel="noopener noreferrer"&gt;login&lt;/a&gt; straight to your dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3p61u5r46mrb6vcsiqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3p61u5r46mrb6vcsiqr.png" alt="Image-step1-1" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Create a GPU Node
&lt;/h3&gt;

&lt;p&gt;After accessing your account, you should see a dashboard (see image). Now:&lt;/p&gt;

&lt;p&gt;1) Navigate to the menu on the left side.&lt;/p&gt;

&lt;p&gt;2) Click on the &lt;strong&gt;GPU Nodes&lt;/strong&gt; option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokdraa5tkg40fzgkn7fo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokdraa5tkg40fzgkn7fo.png" alt="Image-step2-1" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) Click on &lt;strong&gt;Start&lt;/strong&gt; to start creating your very first GPU node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfhk9s2i1dfe211zgfev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfhk9s2i1dfe211zgfev.png" alt="Image-step2-2" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These GPU nodes are GPU-powered virtual machines by NodeShift. They are highly customizable and let you configure everything from the GPU (ranging from H100s to A100s) to CPUs, RAM, and storage, according to your needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Selecting configuration for GPU (model, region, storage)
&lt;/h3&gt;

&lt;p&gt;1) For this tutorial, we’ll be using a 1x A100 SXM4 GPU; however, you can choose any GPU that meets the prerequisites.&lt;/p&gt;

&lt;p&gt;2) Similarly, we’ll opt for 100GB storage by sliding the bar. You can also select the region where you want your GPU to reside from the available ones.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dbmr98w4nv70agy6bq7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dbmr98w4nv70agy6bq7.png" alt="Image-step3-1" width="800" height="227"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Choose GPU Configuration and Authentication method
&lt;/h3&gt;

&lt;p&gt;1) After selecting your required configuration options, you’ll see the available GPU nodes in your region and according to (or very close to) your configuration. In our case, we’ll choose a 1x RTX A6000 GPU node with 64 vCPUs/63GB RAM/200GB SSD.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs3xs9imbvtkd43jv002.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs3xs9imbvtkd43jv002.png" alt="Image-step4-1" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Next, you'll need to select an authentication method. Two methods are available: Password and SSH Key. We recommend using SSH keys, as they are a more secure option. To create one, head over to our &lt;a href="https://docs.nodeshift.com/gpus/create-gpu-deployment" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchyrp5ijzlmevkc7puaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchyrp5ijzlmevkc7puaf.png" alt="Image-step4-2" width="800" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Choose an Image
&lt;/h3&gt;

&lt;p&gt;The final step is to choose an image for the VM, which in our case is &lt;strong&gt;Nvidia Cuda&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm3gwe0tprkoeqnx5x51.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm3gwe0tprkoeqnx5x51.png" alt="Image-step5-1" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running a CUDA-dependent application like VibeVoice, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.&lt;/p&gt;

&lt;p&gt;We chose the following image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia/cuda:12.1.1-devel-ubuntu22.04
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This image is essential because it includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full CUDA toolkit (including nvcc)&lt;/li&gt;
&lt;li&gt;Proper support for building and running GPU-based applications&lt;/li&gt;
&lt;li&gt;Compatibility with CUDA 12.1.1 required by certain model operations&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Launch Mode
&lt;/h4&gt;

&lt;p&gt;We selected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Interactive shell server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frhol4bqjn2f6zi2sv4ab.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frhol4bqjn2f6zi2sv4ab.png" alt="Image-step5-2" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Docker Repository Authentication
&lt;/h4&gt;

&lt;p&gt;We left all fields &lt;strong&gt;empty&lt;/strong&gt; here.&lt;/p&gt;

&lt;p&gt;Since the Docker image is publicly available on Docker Hub, no login credentials are required.&lt;/p&gt;

&lt;h4&gt;
  
  
  Identification
&lt;/h4&gt;

&lt;p&gt;Template Name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia/cuda:12.1.1-devel-ubuntu22.04
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwt7p0dw5c44p7u3axrs9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwt7p0dw5c44p7u3axrs9.png" alt="Image-step5-3" width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s it! You are now ready to deploy the node. Finalize the configuration summary, and if it looks good, click &lt;strong&gt;Create&lt;/strong&gt; to deploy the node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F647pyrcdxwtp6gz0tieb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F647pyrcdxwtp6gz0tieb.png" alt="Image-step5-4" width="800" height="107"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbg0gd8w65m7lkncav5p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbg0gd8w65m7lkncav5p.png" alt="Image-step5-5" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Connect to active Compute Node using SSH
&lt;/h3&gt;

&lt;p&gt;1) As soon as you create the node, it will be deployed in a few seconds or a minute. Once deployed, you will see the status &lt;strong&gt;Running&lt;/strong&gt; in green, meaning that your compute node is ready to use!&lt;/p&gt;

&lt;p&gt;2) Once your GPU shows this status, navigate to the three dots on the right, click on &lt;strong&gt;Connect with SSH&lt;/strong&gt;, and copy the SSH details that appear.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqwto9145nkb27vd3d71.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqwto9145nkb27vd3d71.png" alt="Image-step6-1" width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After copying the details, follow the steps below to connect to the running GPU VM via SSH:&lt;/p&gt;

&lt;p&gt;1) Open your terminal, paste the SSH command, and run it.&lt;/p&gt;

&lt;p&gt;2) In some cases, your terminal may ask for your consent before connecting. Enter ‘yes’.&lt;/p&gt;

&lt;p&gt;3) A prompt will request a password. Type the SSH password, and you should be connected.&lt;/p&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7307nybljxnshe9dm4p2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7307nybljxnshe9dm4p2.png" alt="Image-step6-2" width="800" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, if you want to check the GPU details, run the following command in the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
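
&lt;p&gt;Since we launched the VM from the CUDA devel image, it’s also worth confirming that the CUDA compiler is available; &lt;code&gt;flash-attn&lt;/code&gt;, installed in a later step, needs it to build. A quick sanity check, assuming the stock image layout:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# should print the CUDA 12.1 toolkit version bundled with the image
nvcc --version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;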



&lt;h3&gt;
  
  
  Step 7: Set up the project environment with dependencies
&lt;/h3&gt;

&lt;p&gt;1) Create a virtual environment using &lt;a href="https://nodeshift.com/blog/set-up-anaconda-on-ubuntu-22-04-in-minutes-simplify-your-ai-workflow" rel="noopener noreferrer"&gt;Anaconda&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conda create -n vibe python=3.11 -y &amp;amp;&amp;amp; conda activate vibe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa44mazey6p4kary0n39p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa44mazey6p4kary0n39p.png" alt="Image-step7-1" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Clone the official repository and move inside the project directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/microsoft/VibeVoice.git
cd VibeVoice
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguwl3vseqa9vadrmvjkg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguwl3vseqa9vadrmvjkg.png" alt="Image-step7-2" width="800" height="153"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) Install required dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -e .
pip install flash-attn --no-build-isolation
apt update &amp;amp;&amp;amp; apt install ffmpeg -y
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyux491lk91d70ve2hqir.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyux491lk91d70ve2hqir.png" alt="Image-step7-3" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;4) Launch the Gradio demo. This will automatically download the model checkpoints as well.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python demo/gradio_demo.py --model_path microsoft/VibeVoice-1.5B --share
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fib4ai6y8vtdm1v9nfz9u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fib4ai6y8vtdm1v9nfz9u.png" alt="Image-step7-4" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flhf2qf7pmjiw0eu590fy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flhf2qf7pmjiw0eu590fy.png" alt="Image-step7-5" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;5) If you're on a remote machine (e.g., a NodeShift GPU), you'll need to set up SSH port forwarding to access the Gradio session in your local browser.&lt;/p&gt;

&lt;p&gt;Run the following command in your local terminal after replacing:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;YOUR_SERVER_PORT&amp;gt;&lt;/code&gt; with the port allotted to your remote server (for a NodeShift server, you can find it in the deployed GPU details on the dashboard).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;PATH_TO_SSH_KEY&amp;gt;&lt;/code&gt; with the path to the location where your SSH key is stored.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;YOUR_SERVER_IP&amp;gt;&lt;/code&gt; with the IP address of your remote server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -L 7860:localhost:7860 -p &amp;lt;YOUR_SERVER_PORT&amp;gt; -i &amp;lt;PATH_TO_SSH_KEY&amp;gt; root@&amp;lt;YOUR_SERVER_IP&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
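
&lt;p&gt;For example, with hypothetical placeholder values filled in (the port, key path, and IP below are illustrative only):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# forwards the VM's port 7860 to localhost:7860 on your machine
ssh -L 7860:localhost:7860 -p 11000 -i ~/.ssh/id_ed25519 root@203.0.113.42
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;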



&lt;p&gt;After this, copy the URL printed in your remote server’s terminal, e.g. &lt;a href="http://0.0.0.0:7860" rel="noopener noreferrer"&gt;http://0.0.0.0:7860&lt;/a&gt;, and paste it into your local browser to access the Gradio session.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 8: Run the model
&lt;/h3&gt;

&lt;p&gt;1) Once you access the Gradio interface, it will look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv99jk5854w87o84sfqiw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv99jk5854w87o84sfqiw.png" alt="Image-step8-1" width="800" height="484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Generate a podcast from a script.&lt;/p&gt;

&lt;p&gt;Paste any script of your choice; we’re using one of the &lt;a href="https://github.com/microsoft/VibeVoice/blob/main/demo/text_examples/4p_climate_45min.txt" rel="noopener noreferrer"&gt;example scripts&lt;/a&gt; given in the official repo.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffgm3ovx013se97c9o6nl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffgm3ovx013se97c9o6nl.png" alt="Image-step8-2" width="800" height="554"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) The app will start streaming the generated podcast audio in real time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pit993yejnqhe0y45kj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pit993yejnqhe0y45kj.png" alt="Image-step8-3" width="800" height="690"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;VibeVoice stands out as a groundbreaking open-source TTS framework that combines continuous speech tokenizers, a Qwen2.5-powered LLM, and a diffusion head to deliver expressive, long-form, multi-speaker audio that feels astonishingly real. Its ability to generate up to 90 minutes of consistent, multi-voice speech makes it a powerful tool for creators and researchers alike. And while running it locally is a great way to get started, NodeShift makes the experience even smoother by providing GPU-accelerated environments, simplified deployment, and scalability out of the box, so you can focus on exploring and scaling with the model’s capabilities without worrying about complex infrastructure setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For more information about NodeShift:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/company/nodeshift/?%0Aref=blog.nodeshift.com" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://x.com/nodeshiftai?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.gg/4dHNxnW7p7?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://app.daily.dev/nodeshift?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;daily.dev&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>microsoft</category>
      <category>podcast</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>A Step-by-Step Guide to Install DeepSeek V3.1</title>
      <dc:creator>Aditi Bindal</dc:creator>
      <pubDate>Mon, 01 Sep 2025 18:21:46 +0000</pubDate>
      <link>https://dev.to/nodeshiftcloud/a-step-by-step-guide-to-install-deepseek-v31-3dgg</link>
      <guid>https://dev.to/nodeshiftcloud/a-step-by-step-guide-to-install-deepseek-v31-3dgg</guid>
      <description>&lt;p&gt;DeepSeek has once again pushed the boundaries of what’s possible in open-source AI with the release of DeepSeek-V3.1, a next-generation hybrid model that seamlessly supports both thinking and non-thinking modes. Building on the foundation of its powerful V3 base checkpoint, this version introduces smarter tool calling, faster reasoning efficiency, and a more versatile chat template design that adapts effortlessly to different use cases. Its post-training optimization dramatically boosts performance in agent tasks and tool usage, making it a strong choice for developers working on automation, research assistance, and coding agents. Moreover, the model’s ability to process extended contexts has been expanded through a two-phase long context extension approach: a massive 10x increase in the 32K token phase to 630B tokens and a 3.3x increase in the 128K token phase to 209B tokens. Combined with training on the cutting-edge UE8M0 FP8 data format, DeepSeek-V3.1 not only ensures efficiency and scalability but also guarantees compatibility with modern microscaling data pipelines.&lt;/p&gt;

&lt;p&gt;Deploying a model of this caliber locally might seem daunting at first due to its substantial 671 billion parameters. However, Unsloth has made it entirely feasible. Unsloth has used selective quantization techniques to reduce the model's size without any significant loss of accuracy by targeting specific layers, such as the Mixture-of-Experts (MoE) layers, while preserving the precision of attention and other critical layers.&lt;/p&gt;

&lt;p&gt;In the following guide, we'll walk you through the step-by-step process of installing and running DeepSeek-V3.1 locally using LLaMA.cpp and Unsloth's dynamic quants, ensuring you can access its full potential efficiently and effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;The system requirements for running DeepSeek-V3.1 are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;GPU: Multiple H100s or H200s (the count varies with the quantization bit-width)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage: 1TB+ (preferable)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nvidia Cuda installed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://nodeshift.com/blog/set-up-anaconda-on-ubuntu-22-04-in-minutes-simplify-your-ai-workflow" rel="noopener noreferrer"&gt;Anaconda installed&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Disk Space requirements depending on the type of model are as follows:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7rtfv01ia0jnqggux4nb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7rtfv01ia0jnqggux4nb.png" alt="Image-prequisites" width="800" height="426"&gt;&lt;/a&gt;&lt;br&gt;
Source: Unsloth&lt;/p&gt;

&lt;p&gt;We recommend taking a screenshot of this chart and saving it somewhere, so you can quickly look up the disk-space requirements before trying a specific bit-quantized version.&lt;/p&gt;

&lt;p&gt;For this article, we’ll download the 2.71-bit version (recommended).&lt;/p&gt;
&lt;h2&gt;
  
  
  Step-by-step process to install DeepSeek-V3.1 Locally
&lt;/h2&gt;

&lt;p&gt;For the purpose of this tutorial, we’ll use a GPU-powered Virtual Machine by NodeShift since it provides high compute Virtual Machines at a very affordable cost on a scale that meets GDPR, SOC2, and ISO27001 requirements. Also, it offers an intuitive and user-friendly interface, making it easier for beginners to get started with Cloud deployments. However, feel free to use any cloud provider of your choice and follow the same steps for the rest of the tutorial.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: Setting up a NodeShift Account
&lt;/h3&gt;

&lt;p&gt;Visit &lt;a href="https://app.nodeshift.com/sign-up" rel="noopener noreferrer"&gt;app.nodeshift.com&lt;/a&gt; and create an account by filling in basic details, or continue signing up with your Google/GitHub account.&lt;/p&gt;

&lt;p&gt;If you already have an account, &lt;a href="http://app.nodeshift.com" rel="noopener noreferrer"&gt;login&lt;/a&gt; straight to your dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3p61u5r46mrb6vcsiqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3p61u5r46mrb6vcsiqr.png" alt="Image-step1-1" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 2: Create a GPU Node
&lt;/h3&gt;

&lt;p&gt;After accessing your account, you should see a dashboard (see image). Now:&lt;/p&gt;

&lt;p&gt;1) Navigate to the menu on the left side.&lt;/p&gt;

&lt;p&gt;2) Click on the &lt;strong&gt;GPU Nodes&lt;/strong&gt; option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokdraa5tkg40fzgkn7fo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokdraa5tkg40fzgkn7fo.png" alt="Image-step2-1" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) Click on &lt;strong&gt;Start&lt;/strong&gt; to start creating your very first GPU node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfhk9s2i1dfe211zgfev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfhk9s2i1dfe211zgfev.png" alt="Image-step2-2" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These GPU nodes are GPU-powered virtual machines by NodeShift. They are highly customizable and let you configure everything from the GPU (ranging from H100s to A100s) to CPUs, RAM, and storage, according to your needs.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 3: Selecting configuration for GPU (model, region, storage)
&lt;/h3&gt;

&lt;p&gt;1) For this tutorial, we’ll be using a 1x H200 GPU; however, you can choose any GPU that meets the prerequisites.&lt;/p&gt;

&lt;p&gt;2) Similarly, we’ll opt for 200 GB storage by sliding the bar. You can also select the region where you want your GPU to reside from the available ones.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F78ex5301m0jmnmar6gbe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F78ex5301m0jmnmar6gbe.png" alt="Image-step3-1" width="800" height="271"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 4: Choose GPU Configuration and Authentication method
&lt;/h3&gt;

&lt;p&gt;1) After selecting your required configuration options, you’ll see the available GPU nodes in your region and according to (or very close to) your configuration. In our case, we’ll choose a 1x H100 SXM 80GB GPU node with 192vCPUs/80GB RAM/200GB SSD.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffs8a0r7ibcqiyeu9v2oy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffs8a0r7ibcqiyeu9v2oy.png" alt="Image-step4-1" width="800" height="560"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Next, you'll need to select an authentication method. Two methods are available: Password and SSH Key. We recommend using SSH keys, as they are a more secure option. To create one, head over to our &lt;a href="https://docs.nodeshift.com/gpus/create-gpu-deployment" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchyrp5ijzlmevkc7puaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchyrp5ijzlmevkc7puaf.png" alt="Image-step4-2" width="800" height="278"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 5: Choose an Image
&lt;/h3&gt;

&lt;p&gt;The final step is to choose an image for the VM, which in our case is &lt;strong&gt;Nvidia Cuda&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm3gwe0tprkoeqnx5x51.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm3gwe0tprkoeqnx5x51.png" alt="Image-step5-1" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's it! You are now ready to deploy the node. Finalize the configuration summary, and if it looks good, click &lt;strong&gt;Create&lt;/strong&gt; to deploy the node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F647pyrcdxwtp6gz0tieb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F647pyrcdxwtp6gz0tieb.png" alt="Image-step5-2" width="800" height="107"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk810i78g0piq7z2jxu8j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk810i78g0piq7z2jxu8j.png" alt="Image-step5-3" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 6: Connect to active Compute Node using SSH
&lt;/h3&gt;

&lt;p&gt;1) As soon as you create the node, it will be deployed in a few seconds or a minute. Once deployed, you will see the status &lt;strong&gt;Running&lt;/strong&gt; in green, meaning that your compute node is ready to use!&lt;/p&gt;

&lt;p&gt;2) Once your GPU shows this status, navigate to the three dots on the right, click on &lt;strong&gt;Connect with SSH&lt;/strong&gt;, and copy the SSH details that appear.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8bldddlk57tshry6ngf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8bldddlk57tshry6ngf.png" alt="Image-step6-1" width="800" height="332"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After copying the details, follow the steps below to connect to the running GPU VM via SSH:&lt;/p&gt;

&lt;p&gt;1) Open your terminal, paste the SSH command, and run it.&lt;/p&gt;

&lt;p&gt;2) In some cases, your terminal may ask for your consent before connecting. Enter ‘yes’.&lt;/p&gt;

&lt;p&gt;3) A prompt will request a password. Type the SSH password, and you should be connected.&lt;/p&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7307nybljxnshe9dm4p2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7307nybljxnshe9dm4p2.png" alt="Image-step6-2" width="800" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, if you want to check the GPU details, run the following command in the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 7: Install and build LLaMA.cpp
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;llama.cpp&lt;/code&gt; is a C++ library for running LLaMA and other large language models efficiently on GPUs, CPUs and edge devices.&lt;/p&gt;

&lt;p&gt;We’ll first install &lt;code&gt;llama.cpp&lt;/code&gt;, as we’ll use it to download and run DeepSeek-V3.1.&lt;/p&gt;

&lt;p&gt;1) Start by creating a virtual environment using &lt;a href="https://nodeshift.com/blog/set-up-anaconda-on-ubuntu-22-04-in-minutes-simplify-your-ai-workflow" rel="noopener noreferrer"&gt;Anaconda&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conda create -n deepseek python=3.11 -y &amp;amp;&amp;amp; conda activate deepseek
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5hwf3q01mgtlwhxavyue.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5hwf3q01mgtlwhxavyue.png" alt="Image-step7-1" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Once inside the environment, update the Ubuntu package lists to fetch the latest repository updates and patches.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apt-get update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;3) Install dependencies for llama.cpp.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgotwcz9o08de7837bj33.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgotwcz9o08de7837bj33.png" alt="Image-step7-2" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;4) Clone the official repository of llama.cpp.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/ggml-org/llama.cpp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frpg8mmhvabkqtol1ldex.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frpg8mmhvabkqtol1ldex.png" alt="Image-step7-3" width="800" height="167"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;5) Compile &lt;code&gt;llama.cpp&lt;/code&gt;‘s build files.&lt;/p&gt;

&lt;p&gt;In the command below, &lt;code&gt;-DGGML_CUDA=OFF&lt;/code&gt; disables the CUDA backend. Keep it OFF on a CPU-only system, and it is often worth keeping it OFF even on a GPU machine: skipping the CUDA kernels makes compilation considerably faster, and CUDA-enabled builds can occasionally throw unwanted errors. The trade-off is that a CPU-only build cannot offload layers to the GPU, so GPU-related flags in later commands will simply have no effect.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=OFF -DLLAMA_CURL=ON
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpobvl0sy0x4vp1ivg5h1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpobvl0sy0x4vp1ivg5h1.png" alt="Image-step7-4" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;6) Build &lt;code&gt;llama.cpp&lt;/code&gt; from the build directory.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cmake --build llama.cpp/build --config Release -j --clean-first --target llama-quantize llama-cli llama-gguf-split
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fym4abt71d49jjhwb13s6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fym4abt71d49jjhwb13s6.png" alt="Image-step7-5" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;7) Finally, we’ll copy all the executables from llama.cpp/build/bin/ that start with llama- into the llama.cpp directory.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cp llama.cpp/build/bin/llama-* llama.cpp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
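
&lt;p&gt;As a quick sanity check that the binaries were built and copied correctly, you can print the build info (a sketch; the &lt;code&gt;--version&lt;/code&gt; flag is available in recent llama.cpp builds):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# prints the llama.cpp version and build configuration
./llama.cpp/llama-cli --version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;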

&lt;h3&gt;
  
  
  Step 8: Download the Model Files
&lt;/h3&gt;

&lt;p&gt;We’ll download the model files from Hugging Face using a Python script.&lt;/p&gt;

&lt;p&gt;1) To do that, let’s first install the Hugging Face Python packages.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install huggingface_hub hf_transfer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;huggingface_hub&lt;/code&gt;&lt;/strong&gt; – Provides an interface to interact with the Hugging Face Hub, allowing you to download, upload, and manage models, datasets, and other resources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;hf_transfer&lt;/code&gt;&lt;/strong&gt; – A tool optimized for faster uploads and downloads of large files (e.g., LLaMA, DeepSeek models) from the Hugging Face Hub using a more efficient transfer protocol.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgdd0nqckkule24fogxo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgdd0nqckkule24fogxo.png" alt="Image-step8-1" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Run the model download script with Python.&lt;/p&gt;

&lt;p&gt;The one-liner below downloads every checkpoint shard of the chosen quant (UD-Q2_K_XL) from &lt;a href="https://huggingface.co/unsloth/DeepSeek-V3.1" rel="noopener noreferrer"&gt;unsloth/DeepSeek-V3.1&lt;/a&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -c "import os; os.environ['HF_HUB_ENABLE_HF_TRANSFER']='0'; from huggingface_hub import snapshot_download; snapshot_download(repo_id='unsloth/DeepSeek-V3.1-GGUF', local_dir='unsloth/DeepSeek-V3.1-GGUF', allow_patterns=['*UD-Q2_K_XL*'])"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd8jthrsf9njkvdi9zyb3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd8jthrsf9njkvdi9zyb3.png" alt="Image-step8-2" width="800" height="242"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Depending on your network bandwidth, the download can be slow and take some time. The download might also seem stuck at some points, which is normal, so do not interrupt or kill the process in between.&lt;/p&gt;
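
&lt;p&gt;If you prefer a CLI over the Python one-liner, recent versions of &lt;code&gt;huggingface_hub&lt;/code&gt; ship an equivalent command (a sketch, assuming &lt;code&gt;huggingface-cli&lt;/code&gt; landed on your PATH with the earlier pip install):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# download only the UD-Q2_K_XL shards into the same local directory
huggingface-cli download unsloth/DeepSeek-V3.1-GGUF \
    --include "*UD-Q2_K_XL*" \
    --local-dir unsloth/DeepSeek-V3.1-GGUF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;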

&lt;h3&gt;
  
  
  Step 9: Run the model for Inference
&lt;/h3&gt;

&lt;p&gt;Finally, once all checkpoints are downloaded, we can proceed to the inference part.&lt;/p&gt;

&lt;p&gt;In the command below, we run the model through llama.cpp’s &lt;code&gt;llama-cli&lt;/code&gt; tool, with the prompt wrapped in the model’s chat template. The prompt asks the model to create a complete Flappy Bird game in Python, including the interface, logic, and controls.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./llama.cpp/llama-cli \
    --model unsloth/DeepSeek-V3.1-GGUF/UD-Q2_K_XL/DeepSeek-V3.1-UD-Q2_K_XL-00001-of-00006.gguf \
    --cache-type-k q4_0 \
    --threads -1 \
    --n-gpu-layers 99 \
    --prio 3 \
    --temp 0.6 \
    --top_p 0.95 \
    --min_p 0.01 \
    --ctx-size 16384 \
    --seed 3407 \
    -ot ".ffn_.*_exps.=CPU" \
    -no-cnv \
    --prompt "&amp;lt;｜User｜&amp;gt;Create a Flappy Bird game in Python. You must include these things:\n1. You must use pygame.\n2. The background color should be randomly chosen and is a light shade. Start with a light blue color.\n3. Pressing SPACE multiple times will accelerate the bird.\n4. The bird's shape should be randomly chosen as a square, circle or triangle. The color should be randomly chosen as a dark color.\n5. Place on the bottom some land colored as dark brown or yellow chosen randomly.\n6. Make a score shown on the top right side. Increment if you pass pipes and don't hit them.\n7. Make randomly spaced pipes with enough space. Color them randomly as dark green or light brown or a dark gray shade.\n8. When you lose, show the best score. Make the text inside the screen. Pressing q or Esc will quit the game. Restarting is pressing SPACE again.\nThe final game should be inside a markdown section in Python. Check your code for errors and fix them before the final markdown section.&amp;lt;｜Assistant｜&amp;gt;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;The model has started generating the code as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flb15xhpd0wel4j1kt2mx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flb15xhpd0wel4j1kt2mx.png" alt="Image-step9-1" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once generation is complete, the output may end like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ma3y5j4w9b5fxm6nh1r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ma3y5j4w9b5fxm6nh1r.png" alt="Image-step9-2" width="800" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Running the Flappy Bird code generated by DeepSeek-V3.1 in the VS Code editor opens a game panel as shown below (note: install pygame in your environment before running the code):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvduyozxu1wf5th8n9ak.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvduyozxu1wf5th8n9ak.png" alt=" " width="655" height="1024"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see the live demonstration of the game in the video attached on the original article &lt;a href="https://nodeshift.cloud/blog/a-step-by-step-guide-to-install-deepseek-v3-1?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=content_share" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
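
&lt;p&gt;Note that the run above is a one-shot generation because of the &lt;code&gt;-no-cnv&lt;/code&gt; flag and the inline prompt. To chat with the model interactively instead, you can drop both and let &lt;code&gt;llama-cli&lt;/code&gt; use the model’s built-in chat template (a sketch; exact behavior may vary across llama.cpp versions):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# without -no-cnv and --prompt, llama-cli starts an interactive chat session
./llama.cpp/llama-cli \
    --model unsloth/DeepSeek-V3.1-GGUF/UD-Q2_K_XL/DeepSeek-V3.1-UD-Q2_K_XL-00001-of-00006.gguf \
    --ctx-size 16384 \
    --n-gpu-layers 99 \
    -ot ".ffn_.*_exps.=CPU"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;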

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this guide, we explored how DeepSeek-V3.1 elevates open-source AI with its hybrid thinking modes, smarter tool calling, faster reasoning, and extended long-context capabilities, all supported by efficient training techniques like FP8 scaling and Unsloth’s dynamic quantization. While deploying such a massive model locally with LLaMA.cpp is now more accessible, it still demands considerable compute resources. This is where NodeShift Cloud steps in, offering a seamless alternative with scalable, cost-effective GPU and compute infrastructure. By offloading deployment to NodeShift’s intuitive cloud platform, developers can unlock the full potential of DeepSeek-V3.1 without the burden of managing heavy local infrastructure, making experimentation, scaling, and production use both faster and simpler.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For more information about NodeShift:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/company/nodeshift/?%0Aref=blog.nodeshift.com" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://x.com/nodeshiftai?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.gg/4dHNxnW7p7?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://app.daily.dev/nodeshift?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;daily.dev&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>deepseek</category>
      <category>ai</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
    <item>
      <title>A Complete Setup Guide to Powerful AI Image Editing with Qwen-Image-Edit</title>
      <dc:creator>Aditi Bindal</dc:creator>
      <pubDate>Mon, 01 Sep 2025 16:52:19 +0000</pubDate>
      <link>https://dev.to/nodeshiftcloud/a-complete-setup-guide-to-powerful-ai-image-editing-with-qwen-image-edit-3kg1</link>
      <guid>https://dev.to/nodeshiftcloud/a-complete-setup-guide-to-powerful-ai-image-editing-with-qwen-image-edit-3kg1</guid>
      <description>&lt;p&gt;Image editing has always required a delicate balance between precision and creativity, and that’s exactly what Qwen-Image-Edit delivers. Built on the robust 20B Qwen-Image model, this cutting-edge tool takes image editing to the next level by combining semantic control (powered by Qwen2.5-VL) with appearance control (via its VAE Encoder). This dual-system approach allows users to seamlessly perform both low-level edits, like adding or removing objects while keeping the rest of the image untouched, and high-level transformations, such as rotating objects, transferring artistic styles, or even creating new concepts entirely. What truly sets Qwen-Image-Edit apart, however, is its precise text editing capability, enabling direct modification of text in English and Chinese while preserving the original font, size, and style. &lt;/p&gt;

&lt;p&gt;If you’re looking for an image editing model that’s powerful, versatile, and incredibly easy to use, Qwen-Image-Edit is a must-try. Let's see how to get it up and running on your machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;The minimum system requirements for running this model are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;GPU: 1x H100&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage: 50 GB (preferable)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;VRAM: at least 64 GB&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://nodeshift.com/blog/set-up-anaconda-on-ubuntu-22-04-in-minutes-simplify-your-ai-workflow" rel="noopener noreferrer"&gt;Anaconda installed&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step-by-step process to install and run Qwen Image Edit
&lt;/h2&gt;

&lt;p&gt;For the purpose of this tutorial, we’ll use a GPU-powered Virtual Machine by &lt;a href="https://nodeshift.com" rel="noopener noreferrer"&gt;NodeShift&lt;/a&gt; since it provides high compute Virtual Machines at a very affordable cost on a scale that meets GDPR, SOC2, and ISO27001 requirements. Also, it offers an intuitive and user-friendly interface, making it easier for beginners to get started with Cloud deployments. However, feel free to use any cloud provider of your choice and follow the same steps for the rest of the tutorial.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Setting up a NodeShift Account
&lt;/h3&gt;

&lt;p&gt;Visit &lt;a href="https://app.nodeshift.com/sign-up" rel="noopener noreferrer"&gt;app.nodeshift.com&lt;/a&gt; and create an account by filling in basic details, or continue signing up with your Google/GitHub account.&lt;/p&gt;

&lt;p&gt;If you already have an account, &lt;a href="http://app.nodeshift.com" rel="noopener noreferrer"&gt;login&lt;/a&gt; straight to your dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3p61u5r46mrb6vcsiqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3p61u5r46mrb6vcsiqr.png" alt="Image-step1-1" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Create a GPU Node
&lt;/h3&gt;

&lt;p&gt;After accessing your account, you should see a dashboard (see image). Now:&lt;/p&gt;

&lt;p&gt;1) Navigate to the menu on the left side.&lt;/p&gt;

&lt;p&gt;2) Click on the &lt;strong&gt;GPU Nodes&lt;/strong&gt; option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokdraa5tkg40fzgkn7fo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokdraa5tkg40fzgkn7fo.png" alt="Image-step2-1" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) Click on &lt;strong&gt;Start&lt;/strong&gt; to begin creating your very first GPU node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfhk9s2i1dfe211zgfev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfhk9s2i1dfe211zgfev.png" alt="Image-step2-2" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These GPU nodes are GPU-powered virtual machines by NodeShift. They are highly customizable and let you control different environmental configurations for GPUs ranging from H100s to A100s, CPUs, RAM, and storage, according to your needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Selecting configuration for GPU (model, region, storage)
&lt;/h3&gt;

&lt;p&gt;1) For this tutorial, we’ll be using 1x H100 GPU; however, you can choose any GPU as per the prerequisites.&lt;/p&gt;

&lt;p&gt;2) Similarly, we’ll opt for 200 GB storage by sliding the bar. You can also select the region where you want your GPU to reside from the available ones.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2nwqblqu9dtn5vbnnpvm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2nwqblqu9dtn5vbnnpvm.png" alt="Image-step3-1" width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Choose GPU Configuration and Authentication method
&lt;/h3&gt;

&lt;p&gt;1) After selecting your required configuration options, you’ll see the available GPU nodes in your region that match (or come very close to) your configuration. In our case, we’ll choose a 1x H100 SXM 80GB GPU node with 192vCPUs/80GB RAM/200GB SSD.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F33ctuz0kf0n28kilc7zj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F33ctuz0kf0n28kilc7zj.png" alt="Image-step4-1" width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Next, you'll need to select an authentication method. Two methods are available: Password and SSH Key. We recommend using SSH keys, as they are a more secure option. To create one, head over to our &lt;a href="https://docs.nodeshift.com/gpus/create-gpu-deployment" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchyrp5ijzlmevkc7puaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchyrp5ijzlmevkc7puaf.png" alt="Image-step4-2" width="800" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Choose an Image
&lt;/h3&gt;

&lt;p&gt;The final step is to choose an image for the VM, which in our case is &lt;strong&gt;Nvidia Cuda&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm3gwe0tprkoeqnx5x51.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm3gwe0tprkoeqnx5x51.png" alt="Image-step5-1" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's it! You are now ready to deploy the node. Finalize the configuration summary, and if it looks good, click &lt;strong&gt;Create&lt;/strong&gt; to deploy the node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F647pyrcdxwtp6gz0tieb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F647pyrcdxwtp6gz0tieb.png" alt="Image-step5-2" width="800" height="107"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filngygf82xfgk2o7lyxv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filngygf82xfgk2o7lyxv.png" alt="Image-step5-3" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Connect to active Compute Node using SSH
&lt;/h3&gt;

&lt;p&gt;1) As soon as you create the node, it will be deployed in a few seconds or a minute. Once deployed, you will see the status &lt;strong&gt;Running&lt;/strong&gt; in green, meaning that your compute node is ready to use!&lt;/p&gt;

&lt;p&gt;2) Once your GPU shows this status, navigate to the three dots on the right, click on &lt;strong&gt;Connect with SSH&lt;/strong&gt;, and copy the SSH details that appear.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqckok7vzis7m6g0pecxw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqckok7vzis7m6g0pecxw.png" alt="Image-step6-1" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you have copied the details, follow the steps below to connect to the running GPU VM via SSH:&lt;/p&gt;

&lt;p&gt;1) Open your terminal, paste the SSH command, and run it.&lt;/p&gt;

&lt;p&gt;2) In some cases, your terminal may ask for your consent before connecting. Enter ‘yes’.&lt;/p&gt;

&lt;p&gt;3) A prompt will request a password. Type the SSH password, and you should be connected.&lt;/p&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7307nybljxnshe9dm4p2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7307nybljxnshe9dm4p2.png" alt="Image-step6-2" width="800" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, if you want to check the GPU details, run the following command in the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 7: Set up the project environment with dependencies
&lt;/h3&gt;

&lt;p&gt;1) Create a virtual environment using &lt;a href="https://nodeshift.com/blog/set-up-anaconda-on-ubuntu-22-04-in-minutes-simplify-your-ai-workflow" rel="noopener noreferrer"&gt;Anaconda&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conda create -n qwen python=3.11 -y &amp;amp;&amp;amp; conda activate qwen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffytezgl8rh29jhqc9dk8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffytezgl8rh29jhqc9dk8.png" alt="Image-step7-1" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Install required dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install git+https://github.com/huggingface/diffusers
pip install transformers accelerate gradio pillow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fluaqs2mc3pgm6b65qusv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fluaqs2mc3pgm6b65qusv.png" alt="Image-step7-2" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;
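
&lt;p&gt;(Optional) Before moving on, you can quickly verify that the CUDA build of PyTorch installed correctly and can see the GPU with a one-line check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_name(0))"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If this prints &lt;code&gt;True&lt;/code&gt; along with your GPU’s name, the environment is ready for the next steps.&lt;/p&gt;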

&lt;p&gt;3) Install and run jupyter notebook.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conda install -c conda-forge --override-channels notebook -y
conda install -c conda-forge --override-channels ipywidgets -y
jupyter notebook --allow-root
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;4) If you’re on a remote machine (e.g., a NodeShift GPU), you’ll need to set up SSH port forwarding to access the Jupyter Notebook session in your local browser.&lt;/p&gt;

&lt;p&gt;Run the following command in your local terminal after replacing:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;YOUR_SERVER_PORT&amp;gt;&lt;/code&gt; with the PORT allotted to your remote server (For the NodeShift server – you can find it in the deployed GPU details on the dashboard).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;PATH_TO_SSH_KEY&amp;gt;&lt;/code&gt; with the path to the location where your SSH key is stored.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;YOUR_SERVER_IP&amp;gt;&lt;/code&gt; with the IP address of your remote server.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -L 8888:localhost:8888 -p &amp;lt;YOUR_SERVER_PORT&amp;gt; -i &amp;lt;PATH_TO_SSH_KEY&amp;gt; root@&amp;lt;YOUR_SERVER_IP&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r8j1owgltt9aq3dt6yq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r8j1owgltt9aq3dt6yq.png" alt="Image-step7-3" width="800" height="231"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After this, copy the URL shown in your remote server’s terminal:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfvxbys411rncwon4x0s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfvxbys411rncwon4x0s.png" alt="Image-step7-4" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Paste it into your local browser to access the Jupyter Notebook session.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 8: Download and Run the model
&lt;/h3&gt;

&lt;p&gt;1) Open a Python notebook inside Jupyter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5tiohhqac3l7fo2zgy0n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5tiohhqac3l7fo2zgy0n.png" alt="Image-step8-1" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Download the model checkpoints.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
from PIL import Image
import torch

from diffusers import QwenImageEditPipeline

pipeline = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit")
print("pipeline loaded")
pipeline.to(torch.bfloat16)
pipeline.to("cuda")
pipeline.set_progress_bar_config(disable=None)
image = Image.open("./cat.jpg").convert("RGB")
prompt = "Add a pillow under cat's head and cover it with a blanket."
inputs = {
    "image": image,
    "prompt": prompt,
    "generator": torch.manual_seed(0),
    "true_cfg_scale": 4.0,
    "negative_prompt": " ",
    "num_inference_steps": 50,
}

with torch.inference_mode():
    output = pipeline(**inputs)
    output_image = output.images[0]
    output_image.save("output_image_edit.png")
    print("image saved at", os.path.abspath("output_image_edit.png"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F021cu0h7kfedcuvtv2pe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F021cu0h7kfedcuvtv2pe.png" alt="Image-step8-2" width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Original Image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8o8vgqmw74kjunq1i7n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8o8vgqmw74kjunq1i7n.png" alt="Image-step8-3" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Edited Image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk26adm8z6p71vx7tn68a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk26adm8z6p71vx7tn68a.png" alt="Image-step8-4" width="786" height="524"&gt;&lt;/a&gt;&lt;/p&gt;
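
&lt;p&gt;Since gradio was installed alongside the other dependencies, you can optionally wrap the pipeline in a minimal web UI to try different edit instructions interactively. The snippet below is a rough sketch rather than an official interface, and it assumes the &lt;code&gt;pipeline&lt;/code&gt; object loaded in the previous step is still in memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import gradio as gr
import torch

def edit_image(image, prompt):
    # Reuse the QwenImageEditPipeline loaded earlier in the notebook
    inputs = {
        "image": image.convert("RGB"),
        "prompt": prompt,
        "generator": torch.manual_seed(0),
        "true_cfg_scale": 4.0,
        "negative_prompt": " ",
        "num_inference_steps": 50,
    }
    with torch.inference_mode():
        return pipeline(**inputs).images[0]

demo = gr.Interface(
    fn=edit_image,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Edit instruction")],
    outputs=gr.Image(type="pil"),
    title="Qwen-Image-Edit",
)
demo.launch(server_name="0.0.0.0", server_port=7860)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;As with the Jupyter session, you would forward port 7860 over SSH to reach the UI from your local browser.&lt;/p&gt;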

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Qwen-Image-Edit stands out as a next-generation image editing model, seamlessly blending semantic intelligence with appearance precision to enable everything from subtle object adjustments to bold creative transformations, all while offering unmatched text editing capabilities. By running it on NodeShift Cloud, you gain a frictionless way to harness this power, eliminating complex setup hurdles and ensuring a smooth, scalable environment for experimentation. Together, Qwen-Image-Edit and NodeShift Cloud make advanced image editing not just possible, but practical and accessible for creators, developers, and enterprises alike.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For more information about NodeShift:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/company/nodeshift/?%0Aref=blog.nodeshift.com" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://x.com/nodeshiftai?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.gg/4dHNxnW7p7?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://app.daily.dev/nodeshift?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;daily.dev&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>qwen</category>
      <category>opensource</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Get Started with MiniCPM-v4: The Next-Gen Multimodal AI Model by OpenBMB</title>
      <dc:creator>Aditi Bindal</dc:creator>
      <pubDate>Mon, 01 Sep 2025 16:24:49 +0000</pubDate>
      <link>https://dev.to/nodeshiftcloud/get-started-with-minicpm-v4-the-next-gen-multimodal-ai-model-by-openbmb-4occ</link>
      <guid>https://dev.to/nodeshiftcloud/get-started-with-minicpm-v4-the-next-gen-multimodal-ai-model-by-openbmb-4occ</guid>
      <description>&lt;p&gt;Multimodal AI is rapidly evolving, MiniCPM-V 4.0 by OpenBMB emerges as a game-changer, combining cutting-edge visual understanding with unprecedented efficiency. Built on SigLIP2-400M and MiniCPM4-3B, this compact yet powerful model packs 4.1B parameters, but consistently punches above its weight. It not only inherits the strong single-image, multi-image, and video comprehension capabilities of its predecessor (MiniCPM-V 2.6), but also surpasses it with remarkable efficiency. Benchmark results on OpenCompass demonstrate this leap. MiniCPM-V 4.0 achieves a 69.0 average score, outperforming models like GPT-4.1-mini-20250414, MiniCPM-V 2.6 (8.1B), and Qwen2.5-VL-3B-Instruct, proving that smaller can indeed be smarter. What makes it even more exciting is its real-world usability: the model runs seamlessly on end devices, delivering under 2s first-token delay and over 17 tokens/s decoding on iPhone 16 Pro Max, all without heating issues, making on-device multimodal AI finally practical. With easy integration across frameworks like llama.cpp, Ollama, vLLM, SGLang, LLaMA-Factory, and even a native iOS app, MiniCPM-V 4.0 isn’t just another AI model, it’s a versatile, efficient, and deployment-ready multimodal powerhouse.&lt;/p&gt;

&lt;p&gt;In this article, we're going to see a step-by-step process to install and run this model locally or in GPU-accelerated environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;The minimum system requirements for running this model are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;GPU: 1x RTX 4090 or 1x RTX A6000&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage: 50GB (preferable)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;VRAM: at least 16GB&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://nodeshift.com/blog/set-up-anaconda-on-ubuntu-22-04-in-minutes-simplify-your-ai-workflow" rel="noopener noreferrer"&gt;Anaconda installed&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step-by-step process to install and run MiniCPM-v4
&lt;/h2&gt;

&lt;p&gt;For the purpose of this tutorial, we’ll use a GPU-powered Virtual Machine by &lt;a href="https://nodeshift.com" rel="noopener noreferrer"&gt;NodeShift&lt;/a&gt; since it provides high compute Virtual Machines at a very affordable cost on a scale that meets GDPR, SOC2, and ISO27001 requirements. Also, it offers an intuitive and user-friendly interface, making it easier for beginners to get started with Cloud deployments. However, feel free to use any cloud provider of your choice and follow the same steps for the rest of the tutorial.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Setting up a NodeShift Account
&lt;/h3&gt;

&lt;p&gt;Visit &lt;a href="https://app.nodeshift.com/sign-up" rel="noopener noreferrer"&gt;app.nodeshift.com&lt;/a&gt; and create an account by filling in basic details, or continue signing up with your Google/GitHub account.&lt;/p&gt;

&lt;p&gt;If you already have an account, &lt;a href="http://app.nodeshift.com" rel="noopener noreferrer"&gt;login&lt;/a&gt; straight to your dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3p61u5r46mrb6vcsiqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3p61u5r46mrb6vcsiqr.png" alt="Image-step1-1" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Create a GPU Node
&lt;/h3&gt;

&lt;p&gt;After accessing your account, you should see a dashboard (see image). Now:&lt;/p&gt;

&lt;p&gt;1) Navigate to the menu on the left side.&lt;/p&gt;

&lt;p&gt;2) Click on the &lt;strong&gt;GPU Nodes&lt;/strong&gt; option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokdraa5tkg40fzgkn7fo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokdraa5tkg40fzgkn7fo.png" alt="Image-step2-1" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) Click on &lt;strong&gt;Start&lt;/strong&gt; to begin creating your very first GPU node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfhk9s2i1dfe211zgfev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfhk9s2i1dfe211zgfev.png" alt="Image-step2-2" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These GPU nodes are GPU-powered virtual machines by NodeShift. They are highly customizable and let you control different environmental configurations for GPUs ranging from H100s to A100s, CPUs, RAM, and storage, according to your needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Selecting configuration for GPU (model, region, storage)
&lt;/h3&gt;

&lt;p&gt;1) For this tutorial, we’ll be using 1x A100 SXM4 GPU; however, you can choose any GPU as per the prerequisites.&lt;/p&gt;

&lt;p&gt;2) Similarly, we’ll opt for 100GB storage by sliding the bar. You can also select the region where you want your GPU to reside from the available ones.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dbmr98w4nv70agy6bq7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dbmr98w4nv70agy6bq7.png" alt="Image-step3-1" width="800" height="227"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Choose GPU Configuration and Authentication method
&lt;/h3&gt;

&lt;p&gt;1) After selecting your required configuration options, you’ll see the available GPU nodes in your region that match (or come very close to) your configuration. In our case, we’ll choose a 1x RTXA6000 GPU node with 64vCPUs/63GB RAM/200GB SSD.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs3xs9imbvtkd43jv002.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs3xs9imbvtkd43jv002.png" alt="Image-step4-1" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Next, you'll need to select an authentication method. Two methods are available: Password and SSH Key. We recommend using SSH keys, as they are a more secure option. To create one, head over to our &lt;a href="https://docs.nodeshift.com/gpus/create-gpu-deployment" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchyrp5ijzlmevkc7puaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchyrp5ijzlmevkc7puaf.png" alt="Image-step4-2" width="800" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Choose an Image
&lt;/h3&gt;

&lt;p&gt;The final step is to choose an image for the VM, which in our case is &lt;strong&gt;Nvidia Cuda&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm3gwe0tprkoeqnx5x51.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm3gwe0tprkoeqnx5x51.png" alt="Image-step5-1" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's it! You are now ready to deploy the node. Finalize the configuration summary, and if it looks good, click &lt;strong&gt;Create&lt;/strong&gt; to deploy the node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F647pyrcdxwtp6gz0tieb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F647pyrcdxwtp6gz0tieb.png" alt="Image-step5-2" width="800" height="107"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbg0gd8w65m7lkncav5p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbg0gd8w65m7lkncav5p.png" alt="Image-step5-3" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Connect to active Compute Node using SSH
&lt;/h3&gt;

&lt;p&gt;1) As soon as you create the node, it will be deployed in a few seconds or a minute. Once deployed, you will see the status &lt;strong&gt;Running&lt;/strong&gt; in green, meaning that your compute node is ready to use!&lt;/p&gt;

&lt;p&gt;2) Once your GPU shows this status, navigate to the three dots on the right, click on &lt;strong&gt;Connect with SSH&lt;/strong&gt;, and copy the SSH details that appear.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zxg4ejw14rh5vwg7z7e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zxg4ejw14rh5vwg7z7e.png" alt="Image-step6-1" width="800" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you have copied the details, follow the steps below to connect to the running GPU VM via SSH:&lt;/p&gt;

&lt;p&gt;1) Open your terminal, paste the SSH command, and run it.&lt;/p&gt;

&lt;p&gt;2) In some cases, your terminal may ask for your consent before connecting. Enter ‘yes’.&lt;/p&gt;

&lt;p&gt;3) A prompt will request a password. Type the SSH password, and you should be connected.&lt;/p&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7307nybljxnshe9dm4p2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7307nybljxnshe9dm4p2.png" alt="Image-step6-2" width="800" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, if you want to check the GPU details, run the following command in the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 7: Set up the project environment with dependencies
&lt;/h3&gt;

&lt;p&gt;1) Create a virtual environment using &lt;a href="https://nodeshift.com/blog/set-up-anaconda-on-ubuntu-22-04-in-minutes-simplify-your-ai-workflow" rel="noopener noreferrer"&gt;Anaconda&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conda create -n minicpm python=3.11 -y &amp;amp;&amp;amp; conda activate minicpm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1g2ubh3ivrkyj8f4pyo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1g2ubh3ivrkyj8f4pyo.png" alt="Image-step7-1" width="800" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Install required dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 
pip install einops timm pillow
pip install git+https://github.com/huggingface/transformers
pip install git+https://github.com/huggingface/accelerate
pip install git+https://github.com/huggingface/diffusers
pip install huggingface_hub
pip install sentencepiece bitsandbytes protobuf decord numpy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Falkb0c52i63ff1t01k0d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Falkb0c52i63ff1t01k0d.png" alt="Image-step7-2" width="800" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) Install and run jupyter notebook.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conda install -c conda-forge --override-channels notebook -y
conda install -c conda-forge --override-channels ipywidgets -y
jupyter notebook --allow-root
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;4) If you're on a remote machine (e.g., a NodeShift GPU), you'll need to set up SSH port forwarding to access the Jupyter Notebook session in your local browser.&lt;/p&gt;

&lt;p&gt;Run the following command in your local terminal after replacing:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;YOUR_SERVER_PORT&amp;gt;&lt;/code&gt; with the PORT allotted to your remote server (For the NodeShift server - you can find it in the deployed GPU details on the dashboard).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;PATH_TO_SSH_KEY&amp;gt;&lt;/code&gt; with the path to the location where your SSH key is stored.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;YOUR_SERVER_IP&amp;gt;&lt;/code&gt; with the IP address of your remote server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -L 8888:localhost:8888 -p &amp;lt;YOUR_SERVER_PORT&amp;gt; -i &amp;lt;PATH_TO_SSH_KEY&amp;gt; root@&amp;lt;YOUR_SERVER_IP&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1vq35fq4cbfchpj5wc90.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1vq35fq4cbfchpj5wc90.png" alt="Image-step7-3" width="800" height="231"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After this, copy the URL shown in your remote server’s terminal:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyz3eexinsdfsvredbkcd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyz3eexinsdfsvredbkcd.png" alt="Image-step7-4" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Paste it into your local browser to access the Jupyter Notebook session.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 8: Download and Run the model
&lt;/h3&gt;

&lt;p&gt;1) Open a Python notebook inside Jupyter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jb500tp8lgtfj4pmh1m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jb500tp8lgtfj4pmh1m.png" alt="Image-step8-1" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Download the model checkpoints.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from PIL import Image
import torch
from transformers import AutoModel, AutoTokenizer

model_path = 'openbmb/MiniCPM-V-4'
model = AutoModel.from_pretrained(model_path, trust_remote_code=True,
                                  # sdpa or flash_attention_2, no eager
                                  attn_implementation='sdpa', torch_dtype=torch.bfloat16)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(
    model_path, trust_remote_code=True)



image = Image.open('./landform.jpg').convert('RGB')

# First round chat 
question = "What is the landform in the picture?"
msgs = [{'role': 'user', 'content': [image, question]}]

answer = model.chat(
    msgs=msgs,
    image=image,
    tokenizer=tokenizer
)
print(answer)


# Second round chat, pass history context of multi-turn conversation
msgs.append({"role": "assistant", "content": [answer]})
msgs.append({"role": "user", "content": [
            "What should I pay attention to when traveling here?"]})

answer = model.chat(
    msgs=msgs,
    image=None,
    tokenizer=tokenizer
)
print(answer)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4wbgse6j4bjxhe2vidk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4wbgse6j4bjxhe2vidk.png" alt="Image-step8-2" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here’s the image we used to test the model:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqmmtouc957t3wd0uycp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqmmtouc957t3wd0uycp.png" alt="Image-step8-3" width="200" height="300"&gt;&lt;/a&gt;&lt;br&gt;
Picsum ID: 866&lt;/p&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpgztxszybzjktbxkxa6a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpgztxszybzjktbxkxa6a.png" alt="Image-step8-4" width="800" height="183"&gt;&lt;/a&gt;&lt;/p&gt;
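
&lt;p&gt;If you don’t have a local test image handy, you can also fetch one directly from Lorem Picsum (the source of the sample above) and pass it to the same chat call. This is a small convenience sketch that assumes the &lt;code&gt;model&lt;/code&gt; and &lt;code&gt;tokenizer&lt;/code&gt; objects from the previous step are still loaded; &lt;code&gt;requests&lt;/code&gt; comes in as a dependency of huggingface_hub:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import io
import requests
from PIL import Image

# Fetch the same Picsum test image (ID 866) used above
url = "https://picsum.photos/id/866/200/300"
image = Image.open(io.BytesIO(requests.get(url, timeout=30).content)).convert("RGB")

msgs = [{"role": "user", "content": [image, "Describe this image in one sentence."]}]
answer = model.chat(msgs=msgs, image=image, tokenizer=tokenizer)
print(answer)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;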

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;To wrap up, MiniCPM-V 4.0 clearly demonstrates how multimodal AI is becoming more efficient, accessible, and deployment-ready, setting a new benchmark in balancing compact design with powerful visual and reasoning capabilities. From its ability to outperform larger models on benchmarks to its seamless real-world usability on devices like the iPhone 16 Pro Max, it proves that high performance no longer requires massive scale. At the same time, NodeShift Cloud makes experimenting with and deploying such state-of-the-art models far more practical, offering GPU-accelerated environments, simple setup workflows, and flexible scaling to match your needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For more information about NodeShift:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/company/nodeshift/?%0Aref=blog.nodeshift.com" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://x.com/nodeshiftai?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.gg/4dHNxnW7p7?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://app.daily.dev/nodeshift?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;daily.dev&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>machinelearning</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Install &amp; Run Qwen Image</title>
      <dc:creator>Aditi Bindal</dc:creator>
      <pubDate>Mon, 01 Sep 2025 16:10:21 +0000</pubDate>
      <link>https://dev.to/nodeshiftcloud/how-to-install-run-qwen-image-1pjd</link>
      <guid>https://dev.to/nodeshiftcloud/how-to-install-run-qwen-image-1pjd</guid>
      <description>&lt;p&gt;Imagine transforming a simple text prompt into a high-quality image with just a few lines of code. Qwen-Image makes this possible by combining advanced image generation with precise text rendering, whether you’re working in English or Chinese. It handles everything from photorealistic scenes and impressionist-style paintings to clean, minimalist designs, adapting its output to your needs. On top of that, Qwen-Image offers powerful editing features: you can insert or remove objects, fine-tune colours and details, edit text directly within an image, and even adjust human poses—all through clear, natural-language commands. Behind the scenes, it also performs tasks like object detection, semantic segmentation, depth estimation and super-resolution, giving you a complete toolkit for creating and refining images with ease.&lt;/p&gt;

&lt;p&gt;Getting started is simple. In the next section, you’ll see exactly how to install Qwen-Image and run your first prompt in minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;The minimum system requirements for running this model are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;GPU: 1x H100&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage: 50 GB (preferable)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;VRAM: at least 64 GB&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://nodeshift.com/blog/set-up-anaconda-on-ubuntu-22-04-in-minutes-simplify-your-ai-workflow" rel="noopener noreferrer"&gt;Anaconda installed&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step-by-step process to install and run Qwen Image
&lt;/h2&gt;

&lt;p&gt;For the purpose of this tutorial, we’ll use a GPU-powered Virtual Machine by &lt;a href="https://nodeshift.com" rel="noopener noreferrer"&gt;NodeShift&lt;/a&gt; since it provides high compute Virtual Machines at a very affordable cost on a scale that meets GDPR, SOC2, and ISO27001 requirements. Also, it offers an intuitive and user-friendly interface, making it easier for beginners to get started with Cloud deployments. However, feel free to use any cloud provider of your choice and follow the same steps for the rest of the tutorial.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Setting up a NodeShift Account
&lt;/h3&gt;

&lt;p&gt;Visit &lt;a href="https://app.nodeshift.com/sign-up" rel="noopener noreferrer"&gt;app.nodeshift.com&lt;/a&gt; and create an account by filling in basic details, or continue signing up with your Google/GitHub account.&lt;/p&gt;

&lt;p&gt;If you already have an account, &lt;a href="http://app.nodeshift.com" rel="noopener noreferrer"&gt;login&lt;/a&gt; straight to your dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3p61u5r46mrb6vcsiqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3p61u5r46mrb6vcsiqr.png" alt="Image-step1-1" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Create a GPU Node
&lt;/h3&gt;

&lt;p&gt;After accessing your account, you should see a dashboard (see image). Now:&lt;/p&gt;

&lt;p&gt;1) Navigate to the menu on the left side.&lt;/p&gt;

&lt;p&gt;2) Click on the &lt;strong&gt;GPU Nodes&lt;/strong&gt; option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokdraa5tkg40fzgkn7fo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokdraa5tkg40fzgkn7fo.png" alt="Image-step2-1" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) Click on &lt;strong&gt;Start&lt;/strong&gt; to begin creating your very first GPU node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfhk9s2i1dfe211zgfev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfhk9s2i1dfe211zgfev.png" alt="Image-step2-2" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These GPU nodes are GPU-powered virtual machines by NodeShift. They are highly customizable and let you control different environmental configurations for GPUs ranging from H100s to A100s, CPUs, RAM, and storage, according to your needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Selecting configuration for GPU (model, region, storage)
&lt;/h3&gt;

&lt;p&gt;1) For this tutorial, we’ll be using 1x H200 GPU; however, you can choose any GPU as per the prerequisites.&lt;/p&gt;

&lt;p&gt;2) Similarly, we’ll opt for 200 GB storage by sliding the bar. You can also select the region where you want your GPU to reside from the available ones.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2nwqblqu9dtn5vbnnpvm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2nwqblqu9dtn5vbnnpvm.png" alt="Image-step3-1" width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Choose GPU Configuration and Authentication method
&lt;/h3&gt;

&lt;p&gt;1) After selecting your required configuration options, you’ll see the available GPU nodes in your region that match (or come very close to) your configuration. In our case, we’ll choose a 1x H100 SXM 80GB GPU node with 192vCPUs/80GB RAM/200GB SSD.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F33ctuz0kf0n28kilc7zj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F33ctuz0kf0n28kilc7zj.png" alt="Image-step4-1" width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Next, you'll need to select an authentication method. Two methods are available: Password and SSH Key. We recommend using SSH keys, as they are a more secure option. To create one, head over to our &lt;a href="https://docs.nodeshift.com/gpus/create-gpu-deployment" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchyrp5ijzlmevkc7puaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchyrp5ijzlmevkc7puaf.png" alt="Image-step4-2" width="800" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Choose an Image
&lt;/h3&gt;

&lt;p&gt;The final step is to choose an image for the VM, which in our case is &lt;strong&gt;Nvidia Cuda&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm3gwe0tprkoeqnx5x51.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm3gwe0tprkoeqnx5x51.png" alt="Image-step5-1" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's it! You are now ready to deploy the node. Finalize the configuration summary, and if it looks good, click &lt;strong&gt;Create&lt;/strong&gt; to deploy the node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F647pyrcdxwtp6gz0tieb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F647pyrcdxwtp6gz0tieb.png" alt="Image-step5-2" width="800" height="107"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filngygf82xfgk2o7lyxv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filngygf82xfgk2o7lyxv.png" alt="Image-step5-3" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Connect to active Compute Node using SSH
&lt;/h3&gt;

&lt;p&gt;1) As soon as you create the node, it will be deployed in a few seconds or a minute. Once deployed, you will see the status &lt;strong&gt;Running&lt;/strong&gt; in green, meaning that your compute node is ready to use!&lt;/p&gt;

&lt;p&gt;2) Once your GPU shows this status, navigate to the three dots on the right, click on &lt;strong&gt;Connect with SSH&lt;/strong&gt;, and copy the SSH details that appear.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqckok7vzis7m6g0pecxw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqckok7vzis7m6g0pecxw.png" alt="Image-step6-1" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you have copied the details, follow the steps below to connect to the running GPU VM via SSH:&lt;/p&gt;

&lt;p&gt;1) Open your terminal, paste the SSH command, and run it.&lt;/p&gt;

&lt;p&gt;2) In some cases, your terminal may ask for your consent before connecting. Enter ‘yes’.&lt;/p&gt;

&lt;p&gt;3) A prompt will request a password. Type the SSH password, and you should be connected.&lt;/p&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7307nybljxnshe9dm4p2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7307nybljxnshe9dm4p2.png" alt="Image-step6-2" width="800" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, if you want to check the GPU details, run the following command in the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 7: Set up the project environment with dependencies
&lt;/h3&gt;

&lt;p&gt;1) Create a virtual environment using &lt;a href="https://nodeshift.com/blog/set-up-anaconda-on-ubuntu-22-04-in-minutes-simplify-your-ai-workflow" rel="noopener noreferrer"&gt;Anaconda&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conda create -n qwen-img python=3.11 -y &amp;amp;&amp;amp; conda activate qwen-img
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zy0g2h5aopcr1jenciq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zy0g2h5aopcr1jenciq.png" alt="Image-step7-1" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Install required dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 
pip install einops timm pillow
pip install git+https://github.com/huggingface/transformers
pip install git+https://github.com/huggingface/accelerate
pip install git+https://github.com/huggingface/diffusers
pip install huggingface_hub
pip install sentencepiece bitsandbytes protobuf decord numpy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq57nt1dltxdp38396921.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq57nt1dltxdp38396921.png" alt="Image-step7-2" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) Install and run jupyter notebook.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conda install -c conda-forge --override-channels notebook -y
conda install -c conda-forge --override-channels ipywidgets -y
jupyter notebook --allow-root
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;4) If you’re on a remote machine (e.g., a NodeShift GPU), you’ll need to set up SSH port forwarding to access the Jupyter Notebook session in your local browser.&lt;/p&gt;

&lt;p&gt;Run the following command in your local terminal after replacing:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;YOUR_SERVER_PORT&amp;gt;&lt;/code&gt; with the PORT allotted to your remote server (For the NodeShift server – you can find it in the deployed GPU details on the dashboard).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;PATH_TO_SSH_KEY&amp;gt;&lt;/code&gt; with the path to the location where your SSH key is stored.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;YOUR_SERVER_IP&amp;gt;&lt;/code&gt; with the IP address of your remote server.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -L 8888:localhost:8888 -p &amp;lt;YOUR_SERVER_PORT&amp;gt; -i &amp;lt;PATH_TO_SSH_KEY&amp;gt; root@&amp;lt;YOUR_SERVER_IP&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r8j1owgltt9aq3dt6yq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r8j1owgltt9aq3dt6yq.png" alt="Image-step7-3" width="800" height="231"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After this, copy the URL shown in your remote server’s terminal:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfvxbys411rncwon4x0s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfvxbys411rncwon4x0s.png" alt="Image-step7-4" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And paste this on your local browser to access the Jupyter Notebook session.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 8: Download and Run the model
&lt;/h3&gt;

&lt;p&gt;1) Open a Python notebook inside Jupyter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5tiohhqac3l7fo2zgy0n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5tiohhqac3l7fo2zgy0n.png" alt="Image-step8-1" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Download the model checkpoints and run your first image generation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from diffusers import DiffusionPipeline
import torch

model_name = "Qwen/Qwen-Image"

# Load the pipeline
if torch.cuda.is_available():
    torch_dtype = torch.bfloat16
    device = "cuda"
else:
    torch_dtype = torch.float32
    device = "cpu"

pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
pipe = pipe.to(device)

positive_magic = {
    "en": "Ultra HD, 4K, cinematic composition.",  # for English prompts
    "zh": "超清，4K，电影级构图",  # for Chinese prompts
}

# Generate image
prompt = '''A coffee shop entrance features a chalkboard sign reading "Qwen Coffee 😊 $2 per cup," with a neon light beside it displaying "通义千问". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is written "π≈3.1415926-53589793-23846264-33832795-02384197". Ultra HD, 4K, cinematic composition'''

negative_prompt = " "  # use an empty string if there is no specific concept to remove

# Generate with different aspect ratios
aspect_ratios = {
    "1:1": (1328, 1328),
    "16:9": (1664, 928),
    "9:16": (928, 1664),
    "4:3": (1472, 1104),
    "3:4": (1104, 1472),
    "3:2": (1584, 1056),
    "2:3": (1056, 1584),
}

width, height = aspect_ratios["16:9"]

image = pipe(
    prompt=prompt + positive_magic["en"],
    negative_prompt=negative_prompt,
    width=width,
    height=height,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    generator=torch.Generator(device=device).manual_seed(42)
).images[0]

image.save("example.png")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F704sfgamozztlieecl2r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F704sfgamozztlieecl2r.png" alt="Image-step8-2" width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71h42kbar10oiyz8qz71.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71h42kbar10oiyz8qz71.png" alt="Image-step8-3" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;You’ve seen how Qwen-Image turns simple text prompts into stunning, high-fidelity images, whether photorealistic, painterly, or minimal, and offers intuitive editing for objects, colour, text, and even human poses, all backed by robust image-understanding capabilities like segmentation and super-resolution. Getting up and running is equally straightforward: a few commands install the model via diffusers, and within minutes you’re generating your first visuals. By pairing Qwen-Image with NodeShift Cloud, you gain instant access to scalable GPU instances, automated deployment of your inference pipeline, and managed versioning, so you can focus on creativity while NodeShift ensures performance, reliability, and easy integration into your existing workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For more information about NodeShift:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/company/nodeshift/?%0Aref=blog.nodeshift.com" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://x.com/nodeshiftai?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.gg/4dHNxnW7p7?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://app.daily.dev/nodeshift?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;daily.dev&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>qwen</category>
      <category>ai</category>
      <category>genai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How to Install &amp; Run Gemma-3-270m, GGUF &amp; Instruct Locally?</title>
      <dc:creator>Ayush kumar</dc:creator>
      <pubDate>Fri, 22 Aug 2025 07:56:27 +0000</pubDate>
      <link>https://dev.to/nodeshiftcloud/how-to-install-run-gemma-3-270m-gguf-instruct-locally-4nka</link>
      <guid>https://dev.to/nodeshiftcloud/how-to-install-run-gemma-3-270m-gguf-instruct-locally-4nka</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6s5ktq191wnz7m0g5sdx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6s5ktq191wnz7m0g5sdx.jpg" alt=" " width="800" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;google/gemma-3-270m (Pre-trained)&lt;br&gt;
A lightweight, open language model from Google DeepMind. Unlike the larger Gemma 3 variants, the 270M model is text-only. With a 32K context window, it’s suitable for general-purpose text generation, summarization, and reasoning. Trained on diverse multilingual, code, and math datasets, it offers strong performance in resource-constrained environments like laptops or small cloud VMs.&lt;/p&gt;

&lt;p&gt;google/gemma-3-270m-it (Instruction-Tuned)&lt;br&gt;
An instruction-optimized variant of Gemma 3-270M that’s fine-tuned to follow user prompts more accurately. It keeps the same lightweight footprint as the base model but excels in conversational AI, question answering, and structured output tasks, making it more user-friendly for chatbots, assistants, and guided content generation.&lt;/p&gt;

&lt;p&gt;unsloth/gemma-3-270m-it-GGUF&lt;br&gt;
A GGUF-format, instruction-tuned Gemma 3-270M released by Unsloth AI for efficient local inference with llama.cpp and similar tools. It’s optimized for faster performance and lower memory usage while retaining the base model’s capabilities, making it ideal for on-device or low-resource deployment scenarios.&lt;/p&gt;
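
&lt;p&gt;As a quick taste of that llama.cpp path (separate from the walkthrough below), here’s a minimal sketch using the llama-cpp-python bindings to pull a quantized file straight from the Hugging Face repo; the Q4_K_M filename glob is an assumption based on Unsloth’s usual quant naming:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Minimal sketch: run the GGUF build locally with llama-cpp-python
# (pip install llama-cpp-python huggingface_hub)
from llama_cpp import Llama

# Downloads a quantized file from the repo; the filename glob is an
# assumption based on Unsloth's usual naming for Q4_K_M quants.
llm = Llama.from_pretrained(
    repo_id="unsloth/gemma-3-270m-it-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=8192,  # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize GGUF in one sentence."}],
    max_tokens=128,
    temperature=0.6,
)
print(out["choices"][0]["message"]["content"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
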
&lt;h3&gt;
  
  
  Gemma 3 270M
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7nnc7qhgdxdfaapcan8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7nnc7qhgdxdfaapcan8.png" alt=" " width="740" height="664"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  GPU Configuration Table for Gemma-3-270m, GGUF &amp;amp; Instruct Models
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3rekr6p6c95byubfqnwx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3rekr6p6c95byubfqnwx.png" alt=" " width="752" height="400"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Notes:
&lt;/h3&gt;

&lt;p&gt;The GGUF version is much lighter because it uses quantization, so it can run even on lower-end GPUs or CPUs.&lt;br&gt;
The pre-trained (PT) and instruction-tuned (IT) models from Google will require more VRAM if used in FP16 or BF16 formats.&lt;br&gt;
If you use CPU inference with GGUF, you should have at least 8–16 GB of system RAM for smooth execution.&lt;/p&gt;
&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;p&gt;Link 1: &lt;a href="https://huggingface.co/google/gemma-3-270m" rel="noopener noreferrer"&gt;https://huggingface.co/google/gemma-3-270m&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Link 2: &lt;a href="https://huggingface.co/google/gemma-3-270m-it" rel="noopener noreferrer"&gt;https://huggingface.co/google/gemma-3-270m-it&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Link 3: &lt;a href="https://huggingface.co/unsloth/gemma-3-270m-it-GGUF" rel="noopener noreferrer"&gt;https://huggingface.co/unsloth/gemma-3-270m-it-GGUF&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step-by-Step Process to Install &amp;amp; Run Gemma-3-270m, GGUF &amp;amp; Instruct Locally
&lt;/h3&gt;

&lt;p&gt;For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: Sign Up and Set Up a NodeShift Cloud Account
&lt;/h3&gt;

&lt;p&gt;Visit the &lt;a href="https://app.nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;NodeShift Platform&lt;/a&gt; and create an account. Once you’ve signed up, log into your account.&lt;/p&gt;

&lt;p&gt;Follow the account setup process and provide the necessary details and information.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdr3yc1k41r8zsn2wlgki.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdr3yc1k41r8zsn2wlgki.png" alt=" " width="640" height="396"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 2: Create a GPU Node (Virtual Machine)
&lt;/h3&gt;

&lt;p&gt;GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo9pbmixbvn8afjslbavp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo9pbmixbvn8afjslbavp.png" alt=" " width="640" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5piva8ejsqy4zim9x1z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5piva8ejsqy4zim9x1z.png" alt=" " width="640" height="399"&gt;&lt;/a&gt;&lt;br&gt;
Navigate to the menu on the left side. Select the GPU Nodes option, create a GPU Node in the Dashboard, click the Create GPU Node button, and deploy your first Virtual Machine.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 3: Select a Model, Region, and Storage
&lt;/h3&gt;

&lt;p&gt;In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe52345dxx9fqevz9mkrf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe52345dxx9fqevz9mkrf.png" alt=" " width="640" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6bk92gan3fqftpbi048p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6bk92gan3fqftpbi048p.png" alt=" " width="640" height="335"&gt;&lt;/a&gt;&lt;br&gt;
We will use 1 x RTX A6000 GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 4: Select Authentication Method
&lt;/h3&gt;

&lt;p&gt;There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our &lt;a href="https://docs.nodeshift.com/gpus/create-gpu-deployment?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg8g4h76hysmaqmvd62xi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg8g4h76hysmaqmvd62xi.png" alt=" " width="640" height="189"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 5: Choose an Image
&lt;/h3&gt;

&lt;p&gt;In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running Gemma-3-270m &amp;amp; Instruct, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.&lt;/p&gt;

&lt;p&gt;We chose the following image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia/cuda:12.1.1-devel-ubuntu22.04

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This image is essential because it includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full CUDA toolkit (including nvcc)&lt;/li&gt;
&lt;li&gt;Proper support for building and running GPU-based applications like Gemma-3-270m &amp;amp; Instruct&lt;/li&gt;
&lt;li&gt;Compatibility with CUDA 12.1.1 required by certain model operations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Launch Mode
&lt;/h3&gt;

&lt;p&gt;We selected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Interactive shell server

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching models like Gemma-3-270m &amp;amp; Instruct.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docker Repository Authentication
&lt;/h3&gt;

&lt;p&gt;We left all fields empty here.&lt;/p&gt;

&lt;p&gt;Since the Docker image is publicly available on Docker Hub, no login credentials are required.&lt;/p&gt;

&lt;h3&gt;
  
  
  Identification
&lt;/h3&gt;

&lt;p&gt;Template Name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia/cuda:12.1.1-devel-ubuntu22.04

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CUDA and cuDNN images from gitlab.com/nvidia/cuda. Devel version contains full cuda toolkit with nvcc.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F23pyqw7fmvetq7z751xd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F23pyqw7fmvetq7z751xd.png" alt=" " width="640" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4sy6wwh77x0qxb49eaj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4sy6wwh77x0qxb49eaj.png" alt=" " width="640" height="387"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This setup ensures that the Gemma-3-270m &amp;amp; Instruct models run in a GPU-enabled environment with proper CUDA access and high compute performance.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sazpjemi472wgkzrd1i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sazpjemi472wgkzrd1i.png" alt=" " width="640" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxaswq35oyrrbaj9mkla.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxaswq35oyrrbaj9mkla.png" alt=" " width="640" height="334"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 6: Virtual Machine Successfully Deployed
&lt;/h3&gt;

&lt;p&gt;You will get visual confirmation that your node is up and running.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gizh1t0tp1ymzkyp6v6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gizh1t0tp1ymzkyp6v6.png" alt=" " width="640" height="321"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 7: Connect to GPUs using SSH
&lt;/h3&gt;

&lt;p&gt;NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.&lt;/p&gt;

&lt;p&gt;Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuhcatazzsggdhlg6ecn4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuhcatazzsggdhlg6ecn4.png" alt=" " width="640" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft7zkmqkpxybjl11wvzur.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft7zkmqkpxybjl11wvzur.png" alt=" " width="640" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now open your terminal and paste the proxy SSH IP or direct SSH IP.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbsn5uszp609kurj6i3y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbsn5uszp609kurj6i3y.png" alt=" " width="640" height="307"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, if you want to check the GPU details, run the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia-smi

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpwhj2efowsnegl61ys1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpwhj2efowsnegl61ys1.png" alt=" " width="640" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 8: Check the Available Python version and Install the new version
&lt;/h3&gt;

&lt;p&gt;Run the following command to check the available Python version:&lt;/p&gt;
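
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 --version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;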

&lt;p&gt;If you check the Python version, you’ll see that the system has Python 3.8.1 available by default. To install a higher version of Python, you’ll need to use the deadsnakes PPA.&lt;/p&gt;

&lt;p&gt;Run the following commands to add the deadsnakes PPA:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02vj084cooz94ya7e4jk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02vj084cooz94ya7e4jk.png" alt=" " width="640" height="359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 9: Install Python 3.11
&lt;/h3&gt;

&lt;p&gt;Now, run the following command to install Python 3.11 or another desired version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt install -y python3.11 python3.11-venv python3.11-dev

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17yuu799qp5sxpevgwhy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17yuu799qp5sxpevgwhy.png" alt=" " width="640" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 10: Update the Default Python3 Version
&lt;/h3&gt;

&lt;p&gt;Now, run the following command to link the new Python version as the default python3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2
sudo update-alternatives --config python3

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, run the following command to verify that the new Python version is active:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 --version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqfpjf9ntq0ghgik2nb8s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqfpjf9ntq0ghgik2nb8s.png" alt=" " width="640" height="216"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 11: Install and Update Pip
&lt;/h3&gt;

&lt;p&gt;Run the following command to install and update pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -O https://bootstrap.pypa.io/get-pip.py
python3.11 get-pip.py

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, run the following command to check the version of pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip --version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3rsmq4n5hl8o0ohjnw1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3rsmq4n5hl8o0ohjnw1.png" alt=" " width="640" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
Step 12: Create and Activate a Python 3.11 Virtual Environment
&lt;/h3&gt;

&lt;p&gt;Run the following commands to create and activate a Python 3.11 virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apt update &amp;amp;&amp;amp; apt install -y python3.11-venv git wget
python3.11 -m venv openwebui
source openwebui/bin/activate

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F99t9a3ynqtultuvixcn7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F99t9a3ynqtultuvixcn7.png" alt=" " width="640" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 13: Install Open-WebUI
&lt;/h3&gt;

&lt;p&gt;Run the following command to install open-webui:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install open-webui

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxo7chd2zumh0r9asee1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxo7chd2zumh0r9asee1.png" alt=" " width="640" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 14: Serve Open-WebUI
&lt;/h3&gt;

&lt;p&gt;In your activated Python environment, start the Open-WebUI server by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;open-webui serve

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhi3mpn6o9w3tz4m5vyce.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhi3mpn6o9w3tz4m5vyce.png" alt=" " width="640" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wait for the server to complete all database migrations and set up initial files. You’ll see a series of INFO logs and a large “OPEN WEBUI” banner in the terminal.&lt;/li&gt;
&lt;li&gt;When setup is complete, the WebUI will be available and ready for you to access via your browser.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxavrj0hyi6lts0tulyl0.png" alt=" " width="640" height="381"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcl4jy899dsn9u2tdlgyo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcl4jy899dsn9u2tdlgyo.png" alt=" " width="640" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 15: Set up SSH port forwarding from your local machine
&lt;/h3&gt;

&lt;p&gt;On your local machine (Mac/Windows/Linux), open a terminal and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -L 8080:localhost:8080 -p 40128 root@38.29.145.10

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This forwards:&lt;/p&gt;

&lt;p&gt;Local localhost:8080 → Remote VM 127.0.0.1:8080&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8t72eu2r91x8x98nctod.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8t72eu2r91x8x98nctod.png" alt=" " width="640" height="229"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 16: Access Open-WebUI in Your Browser
&lt;/h3&gt;

&lt;p&gt;Go to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:8080

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;You should see the Open-WebUI login or setup page.&lt;/li&gt;
&lt;li&gt;Log in or create a new account if this is your first time.&lt;/li&gt;
&lt;li&gt;You’re now ready to use Open-WebUI to interact with your models!
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk2l3v1nf5j7cnbxiu2zg.png" alt=" " width="640" height="397"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 17: Install Ollama
&lt;/h3&gt;

&lt;p&gt;After connecting to the terminal via SSH, it’s now time to install Ollama from the official Ollama website.&lt;/p&gt;

&lt;p&gt;Website Link: &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;https://ollama.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run the following command to install Ollama:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -fsSL https://ollama.com/install.sh | sh

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffaie54x9ehoqassbq9r4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffaie54x9ehoqassbq9r4.png" alt=" " width="640" height="218"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 18: Serve Ollama
&lt;/h3&gt;

&lt;p&gt;Run the following command to start the Ollama server so models can be pulled and accessed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama serve

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcesqxii9jbhc87k0m60a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcesqxii9jbhc87k0m60a.png" alt=" " width="640" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 19: Pull the Gemma3:270M Model
&lt;/h3&gt;

&lt;p&gt;Run this command to pull the gemma3:270m model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama pull gemma3:270m

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqna8j5gnab52eb464qfk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqna8j5gnab52eb464qfk.png" alt=" " width="640" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 20: Run the Gemma3:270M Model for Inference
&lt;/h3&gt;

&lt;p&gt;Now that your models are installed, you can start running them and interacting directly from the terminal.&lt;/p&gt;

&lt;p&gt;To run the gemma3:270m model, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama run gemma3:270m

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rpouiiyzmupq6ptiimj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rpouiiyzmupq6ptiimj.png" alt=" " width="640" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 21 — Chat with Gemma-3-270M in Open WebUI (auto-detected from Ollama)
&lt;/h3&gt;

&lt;p&gt;You’ve already tested the model in the terminal with Ollama and installed Open WebUI earlier. Now we’ll use the Web UI to chat with the same local model.&lt;/p&gt;

&lt;p&gt;Make sure Ollama is running&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you’re in a VM, keep the Ollama service up.&lt;/li&gt;
&lt;li&gt;Quick check:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama pull gemma3:270m   # if not pulled yet
curl http://localhost:11434/api/tags | jq . # should list gemma3:270m

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open the Web UI&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visit your Open WebUI URL (e.g., http://localhost:8080).&lt;/li&gt;
&lt;li&gt;Click the model dropdown at the top (“Select a model”).&lt;/li&gt;
&lt;li&gt;Pick the model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You should see gemma3:270m under Local. Select it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;That’s it—Open WebUI automatically detects any model you’ve pulled with Ollama and shows it in the list.&lt;/li&gt;
&lt;li&gt;(Your screen should look like the screenshot: gemma3:270m visible in the model picker.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Start chatting&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Type your prompt in the chat box and send.&lt;/li&gt;
&lt;li&gt;Use the gear icon (if available) to tweak temperature, max tokens, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the model doesn’t appear&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click the refresh icon next to the model list, or go to Settings → Providers → Ollama and confirm the Base URL (usually &lt;a href="http://localhost:11434" rel="noopener noreferrer"&gt;http://localhost:11434&lt;/a&gt;), then Save and Sync Models.&lt;/li&gt;
&lt;li&gt;If Ollama runs on another machine, set the Base URL to that host (make sure the port is reachable).
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7q2phky1venihnk9yxxm.png" alt=" " width="640" height="231"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 22 — Stress-test the model in Open WebUI (tune settings + quick rubric)
&lt;/h3&gt;

&lt;p&gt;Now that gemma3:270m shows up in Open WebUI and you can chat, do a fast quality check and tune generation so it behaves well.&lt;/p&gt;

&lt;p&gt;Open a new chat → pick gemma3:270m&lt;/p&gt;

&lt;p&gt;Click the gear (generation settings) and start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Temperature: 0.6&lt;/li&gt;
&lt;li&gt;Top-p: 0.9&lt;/li&gt;
&lt;li&gt;Max new tokens: 512&lt;/li&gt;
&lt;li&gt;Repeat penalty: 1.1&lt;/li&gt;
&lt;li&gt;(Optional) Seed: 42 for reproducible runs&lt;/li&gt;
&lt;/ul&gt;
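
&lt;p&gt;To make these settings reproducible outside the UI, the same knobs map onto Ollama’s generation options. Here’s a minimal sketch using the official &lt;code&gt;ollama&lt;/code&gt; Python client (&lt;code&gt;pip install ollama&lt;/code&gt;); the prompt is borrowed from the test list below:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Minimal sketch: the same generation settings via the official
# `ollama` Python client (pip install ollama)
import ollama

response = ollama.generate(
    model="gemma3:270m",
    prompt="If five painters take five hours to paint five walls, "
           "how long would 100 painters take to paint 100 walls?",
    options={
        "temperature": 0.6,
        "top_p": 0.9,
        "num_predict": 512,    # max new tokens
        "repeat_penalty": 1.1,
        "seed": 42,            # optional, for reproducible runs
    },
)
print(response["response"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;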

&lt;p&gt;Paste 3 single-line “hard” prompts to probe reasoning &amp;amp; constraints&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If five painters take five hours to paint five walls, how long would 100 painters take to paint 100 walls? Explain without skipping steps.&lt;/li&gt;
&lt;li&gt;Summarize the book “The Little Prince” in exactly 7 words, keeping its emotional tone intact.&lt;/li&gt;
&lt;li&gt;Translate “La vie est belle” into English, reverse each word, and then write a haiku using the reversed words as the first line.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Grade quickly with a mini-rubric (write notes in the chat or a doc)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Correctness (math/logic right?)&lt;/li&gt;
&lt;li&gt;Constraint keeping (exact word count, formatting, “no synonyms” rules)&lt;/li&gt;
&lt;li&gt;Clarity (step-by-step, no hand-waving)&lt;/li&gt;
&lt;li&gt;Latency (tokens/sec acceptable?)&lt;/li&gt;
&lt;li&gt;Determinism (does it change across retries? if yes, lower temp)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If it struggles, tweak and retry&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reasoning tasks: lower Temperature → 0.2–0.4.&lt;/li&gt;
&lt;li&gt;Short answers cut off: raise Max new tokens.&lt;/li&gt;
&lt;li&gt;Add a System message like: “Follow constraints strictly. Show numbered steps.”
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyvg43qdpst6tjrfxpdxu.png" alt=" " width="640" height="438"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jojw85t5wjxpix9anfo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jojw85t5wjxpix9anfo.png" alt=" " width="640" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzsqg5kg3bbqv6gtyogve.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzsqg5kg3bbqv6gtyogve.png" alt=" " width="640" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Up to here, we’ve been interacting with google/gemma-3-270m via Ollama in the terminal and through Open WebUI in the browser (Open WebUI auto-detected the Ollama model, so chatting worked in both places). Now we’ll install the lightweight GGUF variant of this model directly from Hugging Face inside Open WebUI’s Manage Models panel, so you can run the llama.cpp-style build with lower memory usage and switch between the Ollama and GGUF versions from the same model dropdown.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 23 — Pull the GGUF build from Hugging Face (Unsloth)
&lt;/h3&gt;

&lt;p&gt;Unsloth publishes a ready-to-run GGUF pack for this model: unsloth/gemma-3-270m-it-GGUF.&lt;br&gt;
In Open WebUI → Settings → Models → Manage Models, paste this repo path into “Pull a model from Ollama.com” (it accepts hf.co/... too):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hf.co/unsloth/gemma-3-270m-it-GGUF

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Click the download icon. When file choices appear, I recommend starting with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;gemma-3-270m-it.Q4_K_M.gguf (best speed/quality balance)&lt;/li&gt;
&lt;li&gt;Lighter options if RAM/VRAM is tiny: IQ2_XXS / IQ3_XXS&lt;/li&gt;
&lt;li&gt;Higher quality: Q8_0 (or F16 if you want full precision)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After the download finishes, the GGUF model will show up in your model selector alongside the Ollama one, and you can chat with either version directly in Open WebUI.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs35gnf8xaw55n45nwd1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs35gnf8xaw55n45nwd1.png" alt=" " width="640" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc3khwx9wx902tx0whkuy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc3khwx9wx902tx0whkuy.png" alt=" " width="640" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8bkm9wf6yziasybsfkhl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8bkm9wf6yziasybsfkhl.png" alt=" " width="640" height="281"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 24 — Chat with the GGUF model in Open WebUI (verify + tune)
&lt;/h3&gt;

&lt;p&gt;Select the GGUF build&lt;br&gt;
Open a new chat and pick hf.co/unsloth/gemma-3-270m-it-GGUF:latest from the model dropdown (you’ll see the full HF path in the header, as in the screenshots below).&lt;/p&gt;

&lt;p&gt;Use the same stress prompts&lt;br&gt;
Paste the same three single-line tests from Step 22 (the painters puzzle, the 7-word “The Little Prince” summary, and the reversed-word haiku). This makes A/B comparison with the Ollama version straightforward.&lt;/p&gt;

&lt;p&gt;Tune generation for GGUF&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Temperature 0.4–0.6 (start 0.5)&lt;/li&gt;
&lt;li&gt;Top-p 0.9&lt;/li&gt;
&lt;li&gt;Max new tokens 512&lt;/li&gt;
&lt;li&gt;Repeat penalty 1.1&lt;/li&gt;
&lt;li&gt;Context/window: 8192 (you can go higher if your RAM allows)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compare vs. Ollama run&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Correctness: does it keep constraints (exact word counts, banned words)?&lt;/li&gt;
&lt;li&gt;Coherence: fewer/random jumps → nudge temp down to 0.3–0.4.&lt;/li&gt;
&lt;li&gt;Latency: if slow on CPU, try a lighter quant (IQ3_XXS) or shorter max tokens. If quality feels thin, bump to Q6_K or Q8_0.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Optional: save a preset&lt;br&gt;
Click … → Save as preset (e.g., “Gemma3-270m-GGUF-Q4KM”) so future chats load your tuned settings instantly.&lt;/p&gt;

&lt;p&gt;If something’s off&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model not loading: re-open Settings → Models → Manage Models → Sync/Refresh.&lt;/li&gt;
&lt;li&gt;Quality too low: switch the file to a higher quant (Q6_K / Q8_0).&lt;/li&gt;
&lt;li&gt;Memory tight: keep quant at Q4_K_M and reduce context or max tokens.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now you can flip between Ollama (gemma3:270m) and GGUF (hf.co/unsloth/…) in the same UI and capture side-by-side behavior for your write-up.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5adxl96tz1go62yxa8v9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5adxl96tz1go62yxa8v9.png" alt=" " width="640" height="366"&gt;&lt;/a&gt;&lt;br&gt;
Up to this point, we’ve been chatting with google/gemma-3-270m, google/gemma-3-270m-it, and the unsloth/gemma-3-270m-it-GGUF build via Ollama in the terminal and Open WebUI in the browser (which auto-detected our Ollama pulls). Now we’ll move beyond the UI and run the original Hugging Face models google/gemma-3-270m (pretrained) and google/gemma-3-270m-it (instruction-tuned) directly via script—downloading them with Transformers using your HF token, so we can control settings programmatically, batch tests, and log clean benchmarks.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 25 — Install Torch
&lt;/h3&gt;

&lt;p&gt;Run the following command to install torch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid7lmakbl1w9fq1j06cq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid7lmakbl1w9fq1j06cq.png" alt=" " width="640" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 26: Install Python Dependencies
&lt;/h3&gt;

&lt;p&gt;Run the following command to install python dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -m pip install -U "transformers&amp;gt;=4.53" accelerate sentencepiece

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0fs9wa1kkh1vjsy3g4n4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0fs9wa1kkh1vjsy3g4n4.png" alt=" " width="640" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 27 — Install/Verify Hugging Face Hub (CLI + token)
&lt;/h3&gt;

&lt;p&gt;Install (or update) the Hub tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -U huggingface_hub "transformers&amp;gt;=4.53"
huggingface-cli --version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcm6ridv2ndbscvxh0yy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcm6ridv2ndbscvxh0yy.png" alt=" " width="640" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Authenticate (same account that accepted Gemma access):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;huggingface-cli login            # paste HF_xxx token with read scope
# optional env var so scripts/daemons inherit it
export HF_TOKEN=HF_xxx
echo 'export HF_TOKEN=HF_xxx' &amp;gt;&amp;gt; ~/.bashrc

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuowvuz04tfwh0vd7zr2i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuowvuz04tfwh0vd7zr2i.png" alt=" " width="640" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 28: Connect to Your GPU VM with a Code Editor
&lt;/h3&gt;

&lt;p&gt;Before you start running Python scripts with the Gemma-3-270m &amp;amp; Instruct models and Transformers, it’s a good idea to connect your GPU virtual machine (VM) to a code editor of your choice. This makes writing, editing, and running code much easier.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can use popular editors like VS Code, Cursor, or any other IDE that supports SSH remote connections.&lt;/li&gt;
&lt;li&gt;In this example, we’re using the Cursor code editor.&lt;/li&gt;
&lt;li&gt;Once connected, you’ll be able to browse files, edit scripts, and run commands directly on your remote server, just like working locally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why do this?&lt;br&gt;
Connecting your VM to a code editor gives you a powerful, streamlined workflow for Python development, allowing you to easily manage your code, install dependencies, and experiment with large models.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv1dkzj289b5zwgjg6mae.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv1dkzj289b5zwgjg6mae.png" alt=" " width="640" height="397"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 29: Run Gemma-3-270M Models with Transformers in Python
&lt;/h3&gt;

&lt;p&gt;Now you’re ready to interact with Gemma-3-270M directly in your own Python scripts using the Transformers library.&lt;/p&gt;

&lt;p&gt;Here’s an example script (gemma3_run.py) you can use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
import torch

model_id = "google/gemma-3-270m-it"  # or "google/gemma-3-27m" for base PT

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",   # GPU if present, else CPU
    attn_implementation="sdpa"  # good default in recent PyTorch
)

streamer = TextStreamer(tok)
inputs = tok("Explain Rust ownership like I'm 12:", return_tensors="pt").to(model.device)
_ = model.generate(**inputs, max_new_tokens=200, streamer=streamer)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9ure15ucmclltas7hxs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9ure15ucmclltas7hxs.png" alt=" " width="640" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 30: Run the script and generate a response
&lt;/h3&gt;

&lt;p&gt;Run the script with the following command to load google/gemma-3-270m-it and generate a response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 gemma3_run.py

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjv8gltsekn4mannr4w63.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjv8gltsekn4mannr4w63.png" alt=" " width="640" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fha9rmrzfqtbvwy8r0m3b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fha9rmrzfqtbvwy8r0m3b.png" alt=" " width="640" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 31: Run the Base Gemma-3-270M Model with Transformers in Python
&lt;/h3&gt;

&lt;p&gt;Next, we’ll run the base (pre-trained) Gemma-3-270M model the same way; only the model ID changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
import torch

model_id = "google/gemma-3-270m"  # or "google/gemma-3-270m-it" for the instruct model

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",   # GPU if present, else CPU
    attn_implementation="sdpa"  # good default in recent PyTorch
)

streamer = TextStreamer(tok)
inputs = tok("Explain Rust ownership like I'm 12:", return_tensors="pt").to(model.device)
_ = model.generate(**inputs, max_new_tokens=200, streamer=streamer)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6c4sfkrvdjuhn0hac6q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6c4sfkrvdjuhn0hac6q.png" alt=" " width="640" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 32: Run the script and generate a response
&lt;/h3&gt;

&lt;p&gt;Run the script with the following command to load google/gemma-3-270m and generate a response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 gemma3_run.py

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2nsnbq4j7q2og4b9hhi1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2nsnbq4j7q2og4b9hhi1.png" alt=" " width="640" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Gemma-3-270M is a perfect example of how cutting-edge AI can be scaled down without losing its versatility. Whether you’re experimenting with the pre-trained variant for raw, general-purpose tasks, the instruction-tuned version for natural conversations, or the GGUF build for low-resource deployments, you get a model that’s fast, flexible, and surprisingly capable for its size.&lt;/p&gt;

&lt;p&gt;With this guide, you’ve learned how to set up a GPU-powered environment, run Gemma models through Ollama, Open WebUI, and Transformers, and even optimize them for speed and memory efficiency. You can now seamlessly switch between interactive browser-based chats, terminal sessions, and custom Python scripts, all while taking advantage of the model’s compact footprint and fast inference.&lt;/p&gt;

&lt;p&gt;Whether you’re building a chatbot, testing reasoning skills, summarizing content, or just exploring model behavior, Gemma-3-270M gives you the freedom to run it your way—from high-end GPUs to modest local machines. Now, it’s your turn to put it to the test, push its limits, and see what’s possible when big ideas meet small but mighty AI.&lt;/p&gt;

</description>
      <category>google</category>
      <category>gemma3</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
    <item>
      <title>The OCR Model That Outranks GPT-4o</title>
      <dc:creator>Ayush kumar</dc:creator>
      <pubDate>Fri, 22 Aug 2025 06:28:33 +0000</pubDate>
      <link>https://dev.to/nodeshiftcloud/the-ocr-model-that-outranks-gpt-4o-586b</link>
      <guid>https://dev.to/nodeshiftcloud/the-ocr-model-that-outranks-gpt-4o-586b</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj69kxosssvz0e0lnt6fd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj69kxosssvz0e0lnt6fd.png" alt=" " width="800" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;NuMarkdown-8B-Thinking is a reasoning-powered OCR Vision-Language Model (VLM) built to transform documents into clean, structured Markdown. Fine-tuned from Qwen2.5-VL-7B, it introduces thinking tokens that help the model analyze complex layouts, tables, and unusual document structures before generating output. This makes it especially useful for RAG pipelines, document extraction, and knowledge organization. With its reasoning-first approach, NuMarkdown-8B-Thinking consistently outperforms generic OCR and even rivals large closed-source reasoning models in accuracy and layout understanding.&lt;/p&gt;

&lt;p&gt;Arena ranking against popular alternatives (using the TrueSkill-2 ranking system, with around 500 model-anonymized votes):&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rmloqrur3gwd5ynej03.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rmloqrur3gwd5ynej03.png" alt=" " width="738" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Win/draw/lose rate against other models&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3tr6lsdjthdpstjmhbht.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3tr6lsdjthdpstjmhbht.png" alt=" " width="732" height="321"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  GPU Configuration Table – NuMarkdown-8B-Thinking
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqhm4lfucoq0cvtufw56.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqhm4lfucoq0cvtufw56.png" alt=" " width="734" height="539"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step-by-Step Process to Install &amp;amp; Run NuMarkdown-8B-Thinking Locally
&lt;/h3&gt;

&lt;p&gt;For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: Sign Up and Set Up a NodeShift Cloud Account
&lt;/h3&gt;

&lt;p&gt;Visit the &lt;a href="https://app.nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;NodeShift Platform&lt;/a&gt; and create an account. Once you’ve signed up, log into your account.&lt;/p&gt;

&lt;p&gt;Follow the account setup process and provide the necessary details and information.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7pl0k2h5ne22f94had3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7pl0k2h5ne22f94had3.png" alt=" " width="640" height="396"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 2: Create a GPU Node (Virtual Machine)
&lt;/h3&gt;

&lt;p&gt;GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwas758krgsifp4ms94b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwas758krgsifp4ms94b.png" alt=" " width="640" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffc9j75dudu8d27881wlf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffc9j75dudu8d27881wlf.png" alt=" " width="640" height="345"&gt;&lt;/a&gt;&lt;br&gt;
Navigate to the menu on the left side, select the GPU Nodes option, click the Create GPU Node button on the Dashboard, and deploy your first Virtual Machine.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 3: Select a Model, Region, and Storage
&lt;/h3&gt;

&lt;p&gt;In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F342r3scgupbda6v431zy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F342r3scgupbda6v431zy.png" alt=" " width="640" height="403"&gt;&lt;/a&gt;&lt;br&gt;
We will use 1 x H100 SXM GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 4: Select Authentication Method
&lt;/h3&gt;

&lt;p&gt;There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our &lt;a href="https://docs.nodeshift.com/gpus/create-gpu-deployment?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9z4djtierpw4f7yhvi3c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9z4djtierpw4f7yhvi3c.png" alt=" " width="640" height="223"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 5: Choose an Image
&lt;/h3&gt;

&lt;p&gt;In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running NuMarkdown-8B-Thinking, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.&lt;/p&gt;

&lt;p&gt;We chose the following image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia/cuda:12.1.1-devel-ubuntu22.04

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This image is essential because it includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full CUDA toolkit (including nvcc; see the quick check below)&lt;/li&gt;
&lt;li&gt;Proper support for building and running GPU-based applications like NuMarkdown-8B-Thinking&lt;/li&gt;
&lt;li&gt;Compatibility with CUDA 12.1.1 required by certain model operations&lt;/li&gt;
&lt;/ul&gt;
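
&lt;p&gt;Once the VM is running, you can confirm that the devel toolkit is actually present with a quick check of the compiler version:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvcc --version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If this prints a CUDA 12.1 release string, the image is set up as expected; runtime-only images would fail here because they ship without nvcc.&lt;/p&gt;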

&lt;h3&gt;
  
  
  Launch Mode
&lt;/h3&gt;

&lt;p&gt;We selected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Interactive shell server

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching models like NuMarkdown-8B-Thinking.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docker Repository Authentication
&lt;/h3&gt;

&lt;p&gt;We left all fields empty here.&lt;/p&gt;

&lt;p&gt;Since the Docker image is publicly available on Docker Hub, no login credentials are required.&lt;/p&gt;

&lt;h3&gt;
  
  
  Identification
&lt;/h3&gt;

&lt;p&gt;Template Name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia/cuda:12.1.1-devel-ubuntu22.04

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CUDA and cuDNN images from gitlab.com/nvidia/cuda; the devel variant contains the full CUDA toolkit, including nvcc.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72ly5rg2j4egz6ypclm0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72ly5rg2j4egz6ypclm0.png" alt=" " width="640" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fme9ccqm7m5p9rcmjhc51.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fme9ccqm7m5p9rcmjhc51.png" alt=" " width="640" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This setup ensures that NuMarkdown-8B-Thinking runs in a GPU-enabled environment with proper CUDA access and high compute performance.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8e11y90eopm053tp7q8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8e11y90eopm053tp7q8.png" alt=" " width="640" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foptfgkhjzsl4m644kh26.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foptfgkhjzsl4m644kh26.png" alt=" " width="640" height="346"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 6: Virtual Machine Successfully Deployed
&lt;/h3&gt;

&lt;p&gt;You will get visual confirmation that your node is up and running.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmgke1fbd00363p3nbp3c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmgke1fbd00363p3nbp3c.png" alt=" " width="640" height="259"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 7: Connect to GPUs using SSH
&lt;/h3&gt;

&lt;p&gt;NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.&lt;/p&gt;

&lt;p&gt;Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbw3fiq6o3mkcuy5wva7u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbw3fiq6o3mkcuy5wva7u.png" alt=" " width="640" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvga8mbrmri96vm2v970t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvga8mbrmri96vm2v970t.png" alt=" " width="640" height="303"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now open your terminal and paste the proxy SSH IP or direct SSH IP.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft6n329idbds5hl5xd6uo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft6n329idbds5hl5xd6uo.png" alt=" " width="640" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, if you want to check the GPU details, run the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia-smi

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwmlww3pfai7ny74qljgw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwmlww3pfai7ny74qljgw.png" alt=" " width="640" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 8: Check the Available Python Version and Install a New Version
&lt;/h3&gt;

&lt;p&gt;Run the following command to check which Python version is currently available:&lt;/p&gt;
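
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 --version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;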

&lt;p&gt;On this image, Python 3.8.1 is available by default. To install a higher version of Python, you’ll need to use the deadsnakes PPA.&lt;/p&gt;

&lt;p&gt;Run the following commands to add the deadsnakes PPA:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02vj084cooz94ya7e4jk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02vj084cooz94ya7e4jk.png" alt=" " width="640" height="359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 9: Install Python 3.11
&lt;/h3&gt;

&lt;p&gt;Now, run the following command to install Python 3.11 or another desired version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt install -y python3.11 python3.11-venv python3.11-dev

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17yuu799qp5sxpevgwhy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17yuu799qp5sxpevgwhy.png" alt=" " width="640" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 10: Update the Default Python3 Version
&lt;/h3&gt;

&lt;p&gt;Now, run the following command to link the new Python version as the default python3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2
sudo update-alternatives --config python3

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, run the following command to verify that the new Python version is active:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 --version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqfpjf9ntq0ghgik2nb8s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqfpjf9ntq0ghgik2nb8s.png" alt=" " width="640" height="216"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 11: Install and Update Pip
&lt;/h3&gt;

&lt;p&gt;Run the following commands to install and update pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -O https://bootstrap.pypa.io/get-pip.py
python3.11 get-pip.py

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, run the following command to check the version of pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip --version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3rsmq4n5hl8o0ohjnw1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3rsmq4n5hl8o0ohjnw1.png" alt=" " width="640" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 12: Create and Activate a Python 3.11 Virtual Environment
&lt;/h3&gt;

&lt;p&gt;Run the following commands to create and activate a Python 3.11 virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apt update &amp;amp;&amp;amp; apt install -y python3.11-venv git wget
python3.11 -m venv numarkdown
source numarkdown/bin/activate

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fey5pxc2rb93liy16aqww.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fey5pxc2rb93liy16aqww.png" alt=" " width="640" height="354"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 13: Install Torch
&lt;/h3&gt;

&lt;p&gt;Run the following command to install PyTorch; pinning torchvision to the cu121 build pulls in a matching CUDA-enabled torch wheel as a dependency:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install "torchvision==0.18.1+cu121" --index-url https://download.pytorch.org/whl/cu121

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
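
&lt;p&gt;Optionally, you can sanity-check that the CUDA build of PyTorch was pulled in correctly with a one-liner:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You should see a +cu121 version string and True for CUDA availability.&lt;/p&gt;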



&lt;h3&gt;
  
  
  Step 14: Install Dependencies
&lt;/h3&gt;

&lt;p&gt;Run the following command to install dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -U pillow transformers accelerate

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76ckl1v1h6nbw9ce14gy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76ckl1v1h6nbw9ce14gy.png" alt=" " width="640" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 15: Connect to your GPU VM using Remote SSH
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Open VS Code, Cursor, or your code editor of choice on your Mac.&lt;/li&gt;
&lt;li&gt;Press Cmd + Shift + P, then choose Remote-SSH: Connect to Host.&lt;/li&gt;
&lt;li&gt;Select your configured host.&lt;/li&gt;
&lt;li&gt;Once connected, you’ll see SSH: 149.7.4.3 (your VM IP) in the bottom-left status bar (like in the image).
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zj8bf1d4x73zyg0a8g2.png" alt=" " width="640" height="450"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 16: Create a New Python Script (numarkdown.py) and Add the Following Code
&lt;/h3&gt;

&lt;p&gt;Create a new Python script (for example, numarkdown.py) and add the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# --- Force stable attention backend (avoid FlashAttention-2) ---
os.environ["TRANSFORMERS_ATTENTION_IMPLEMENTATION"] = "sdpa"
os.environ["HF_USE_FLASH_ATTENTION_2"] = "0"

# --- Model &amp;amp; processor setup ---
model_id = "numind/NuMarkdown-8B-Thinking"

# Use slow processor to silence "fast vs slow" warnings (optional)
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
    use_fast=False,  # keep legacy processor
    min_pixels=100 * 28 * 28,
    max_pixels=5000 * 28 * 28
)

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype="bfloat16",        # efficient on modern GPUs
    device_map="auto",             # auto-GPU placement
    trust_remote_code=True,
    attn_implementation="sdpa",    # force PyTorch SDPA attention
)

# --- Input image (replace with your doc image) ---
img = Image.open("sample.png").convert("RGB")

# Optional downscale: keep under ~3–4 MP to save VRAM
MAX_SIDE = 2200
img.thumbnail((MAX_SIDE, MAX_SIDE))

# --- Prompt &amp;amp; inputs ---
messages = [{"role": "user", "content": [{"type": "image"}]}]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)

# --- Run inference ---
with torch.no_grad():
    out = model.generate(
        **inputs,
        temperature=1e-5,
        max_new_tokens=2000  # adjust if you need longer markdown
    )

result = processor.decode(out[0])

# --- Extract &amp;lt;answer&amp;gt; cleanly ---
def between(s, a, b):
    i = s.find(a)
    j = s.find(b, i + len(a))
    return s[i + len(a):j] if i != -1 and j != -1 else s

answer = between(result, "&amp;lt;answer&amp;gt;", "&amp;lt;/answer&amp;gt;")
print(answer)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8igvgad9lgpw7hfo7rhq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8igvgad9lgpw7hfo7rhq.png" alt=" " width="640" height="502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 17: Upload Image via the Editor &amp;amp; Run the Script
&lt;/h3&gt;

&lt;h4&gt;
  
  
  17.1 Open the VM workspace in your editor
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;In VS Code: Remote Explorer → SSH Targets → connect to your VM → open /root (or your chosen project folder).&lt;/li&gt;
&lt;li&gt;You should see your project files (numarkdown.py, etc.) in the left Explorer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  17.2 Upload your local image to the VM (drag &amp;amp; drop)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;In VS Code Explorer (connected to the VM), right-click the folder where numarkdown.py lives (e.g., /root) and choose “Reveal in File Explorer” (optional) just to confirm location.&lt;/li&gt;
&lt;li&gt;Drag your local image file (e.g., sample.png or myscan.jpg) from your laptop’s file manager into the VS Code Explorer for the VM workspace.&lt;/li&gt;
&lt;li&gt;Confirm the upload when prompted. You should now see the image in the remote file list (e.g., /root/sample.png).&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  17.3 (Optional) Rename the file to match the script
&lt;/h4&gt;

&lt;p&gt;If your script expects image.png:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In VS Code Explorer: right-click the uploaded file → Rename → image.png.
(Or skip this if your script accepts a CLI argument.)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  17.4 Activate the venv in the editor’s terminal (remote)
&lt;/h4&gt;

&lt;p&gt;In VS Code, open a terminal (Terminal → New Terminal). It’s already running on the VM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source ~/numarkdown/bin/activate
cd ~

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  17.5 Run the extractor
&lt;/h4&gt;

&lt;p&gt;If your script expects image.png:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 numarkdown.py

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your script accepts a filename:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 numarkdown.py sample.png

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll see the Markdown printed in the terminal.&lt;/p&gt;
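
&lt;p&gt;Note that the numarkdown.py from Step 16 hardcodes sample.png, so the filename-argument route needs a small tweak. Here’s a minimal sketch (this sys.argv handling is an assumption, not part of the original script) to replace the hardcoded Image.open line:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sys

# Hypothetical tweak: use the first CLI argument, else fall back to sample.png
image_path = sys.argv[1] if len(sys.argv) &amp;gt; 1 else "sample.png"
img = Image.open(image_path).convert("RGB")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;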

&lt;h4&gt;
  
  
  17.6 Save the Markdown to a file (so you can open it in the editor)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# image.png route
python3 numarkdown.py &amp;gt; output.md

# argument route
python3 numarkdown.py sample.png &amp;gt; output.md

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In VS Code Explorer, click output.md to preview the formatted result right in your editor.&lt;/p&gt;

&lt;h4&gt;
  
  
  17.7 Quick checks &amp;amp; common fixes
&lt;/h4&gt;

&lt;p&gt;Don’t see the image in VS Code on the VM? You likely uploaded to a different folder. Check the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pwd &amp;amp;&amp;amp; ls -lh

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure the image sits next to numarkdown.py (or pass its full path).&lt;/p&gt;

&lt;p&gt;FileNotFoundError: 'image.png'&lt;br&gt;
Rename your uploaded file to image.png, or pass the real filename as an argument (e.g., python3 numarkdown.py sample.png).&lt;/p&gt;

&lt;p&gt;Large scans / VRAM: If you hit OOM, downscale locally before upload, or let the script handle it (our script already thumbnails to ~3–4 MP).&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjslq6o4hizywnd3r6m2a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjslq6o4hizywnd3r6m2a.png" alt=" " width="640" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhoa2iogtsi7waan93yuh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhoa2iogtsi7waan93yuh.png" alt=" " width="640" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1hu0nrrfg7pkap5bdoh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1hu0nrrfg7pkap5bdoh.png" alt=" " width="640" height="465"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokr5yca3skbyv3ikb2an.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokr5yca3skbyv3ikb2an.png" alt=" " width="640" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2sf6s2sdq2l4jxbhcz5j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2sf6s2sdq2l4jxbhcz5j.png" alt=" " width="640" height="409"&gt;&lt;/a&gt;&lt;br&gt;
Up until now, we’ve been running and interacting with our model directly from the terminal. That worked fine for quick tests, but now let’s make things smoother and more user-friendly by running it inside a browser interface. For that, we’ll use Streamlit, a lightweight Python framework that lets us build interactive web apps in just a few lines of code.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 18: Install Required Libraries for Browser App
&lt;/h3&gt;

&lt;p&gt;First, install Streamlit along with a few other helper libraries we’ll need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install streamlit pillow pdf2image pypdf transformers accelerate timm

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command installs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;streamlit → run the browser app&lt;/li&gt;
&lt;li&gt;pillow → handle image processing&lt;/li&gt;
&lt;li&gt;pdf2image &amp;amp; pypdf → process PDFs&lt;/li&gt;
&lt;li&gt;transformers, accelerate, timm → load and run the model efficiently
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdf7mp6e0ce5uvey0ga9y.png" alt=" " width="640" height="408"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 19: Fix APT Sources, Update, and Install Poppler Utils
&lt;/h3&gt;

&lt;p&gt;We’ll switch the Ubuntu mirror to the official archive, clean out stale apt lists, update the package indexes with retries enabled, and finally install poppler-utils (which provides pdftoppm/pdftocairo), all in one command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo sed -i 's|http://mirror.serverion.com/ubuntu|http://archive.ubuntu.com/ubuntu|g' /etc/apt/sources.list &amp;amp;&amp;amp; \
sudo apt-get clean &amp;amp;&amp;amp; \
sudo rm -rf /var/lib/apt/lists/* &amp;amp;&amp;amp; \
sudo apt-get update -o Acquire::Retries=3 --fix-missing &amp;amp;&amp;amp; \
sudo apt-get install -y poppler-utils

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftd9vxkmx1q9fjcxec33m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftd9vxkmx1q9fjcxec33m.png" alt=" " width="640" height="408"&gt;&lt;/a&gt;&lt;/p&gt;
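
&lt;p&gt;To verify that Poppler landed correctly, check the version of pdftoppm (the tool pdf2image calls under the hood):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pdftoppm -v

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;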

&lt;h3&gt;
  
  
  Step 20: Create the Streamlit App Script (app.py)
&lt;/h3&gt;

&lt;p&gt;We’ll write a full Streamlit UI that lets you upload an image or PDF, runs NuMarkdown-8B-Thinking, and returns clean Markdown (with an option to view the raw output that contains the &amp;lt;think&amp;gt; reasoning trace).&lt;/p&gt;

&lt;p&gt;Create app.py in your VM (inside your project folder) and add the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import io
import time
from typing import List, Tuple

import streamlit as st
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# --- Force stable attention backend (avoid FlashAttention-2) ---
os.environ["TRANSFORMERS_ATTENTION_IMPLEMENTATION"] = "sdpa"
os.environ["HF_USE_FLASH_ATTENTION_2"] = "0"

MODEL_ID = "numind/NuMarkdown-8B-Thinking"
MAX_SIDE = 2200                           # ~3–4MP safety
MIN_PIXELS = 100 * 28 * 28               # model hint
MAX_PIXELS = 5000 * 28 * 28              # model hint
DEFAULT_MAX_NEW_TOKENS = 2000

st.set_page_config(page_title="NuMarkdown-8B-Thinking UI", layout="wide")

@st.cache_resource(show_spinner=True)
def load_model_and_processor():
    processor = AutoProcessor.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,
        use_fast=False,          # quiet warnings, stable behavior
        min_pixels=MIN_PIXELS,
        max_pixels=MAX_PIXELS,
    )
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
        attn_implementation="sdpa",
    )
    model.eval()
    return processor, model

def pil_from_upload(file) -&amp;gt; Image.Image:
    img = Image.open(file).convert("RGB")
    img.thumbnail((MAX_SIDE, MAX_SIDE))
    return img

def pdf_to_images(file_bytes: bytes, dpi: int = 200) -&amp;gt; List[Image.Image]:
    # Convert PDF bytes to a list of PIL images (requires poppler-utils)
    try:
        from pdf2image import convert_from_bytes
    except Exception as e:
        raise RuntimeError(
            "pdf2image is not available or Poppler is missing. "
            "Install with `pip install pdf2image` and `sudo apt-get install poppler-utils`."
        ) from e
    images = convert_from_bytes(file_bytes, dpi=dpi)
    # downscale each page to ~3–4MP max
    for i in range(len(images)):
        images[i] = images[i].convert("RGB")
        images[i].thumbnail((MAX_SIDE, MAX_SIDE))
    return images

def between(s: str, a: str, b: str) -&amp;gt; str:
    i = s.find(a)
    j = s.find(b, i + len(a))
    return s[i + len(a):j] if i != -1 and j != -1 else s

@torch.inference_mode()
def run_single_image(processor, model, img: Image.Image, temperature: float, max_new_tokens: int) -&amp;gt; Tuple[str, str]:
    messages = [{"role": "user", "content": [{"type": "image"}]}]
    prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)

    out = model.generate(
        **inputs,
        temperature=max(temperature, 1e-5),  # must be &amp;gt; 0 in recent transformers
        max_new_tokens=max_new_tokens,
    )
    text = processor.decode(out[0])
    answer = between(text, "&amp;lt;answer&amp;gt;", "&amp;lt;/answer&amp;gt;")
    return answer, text  # (markdown, raw_with_think)

def concat_markdown(pages_md: List[str]) -&amp;gt; str:
    # Add page separators for clarity
    parts = []
    for i, md in enumerate(pages_md, 1):
        parts.append(f"\n\n---\n\n&amp;lt;!-- Page {i} --&amp;gt;\n\n{md.strip()}\n")
    return "".join(parts).strip()

# ----------------- UI -----------------

st.title("🧠 NuMarkdown-8B-Thinking — Document → Markdown")
st.caption("Upload a scanned page (PNG/JPG) or a PDF. The model reasons about layout, tables, etc., then returns clean Markdown.")

col_left, col_right = st.columns([2, 1])

with col_right:
    st.subheader("Settings")
    temperature = st.number_input("Temperature", value=0.00001, min_value=0.00001, max_value=2.0, step=0.00001, format="%.5f")
    max_new_tokens = st.number_input("Max new tokens", value=DEFAULT_MAX_NEW_TOKENS, min_value=200, max_value=6000, step=100)
    show_think = st.toggle("Show &amp;lt;think&amp;gt; (reasoning) raw output", value=False)
    run_button = st.button("Run Extraction", type="primary", use_container_width=True)

with col_left:
    upload = st.file_uploader("Upload an image or a PDF", type=["png", "jpg", "jpeg", "pdf"])

st.divider()

if run_button:
    if not upload:
        st.error("Please upload a PNG/JPG or PDF first.")
        st.stop()

    processor, model = load_model_and_processor()

    filetype = (upload.type or "").lower()
    start_time = time.time()

    if "pdf" in filetype or upload.name.lower().endswith(".pdf"):
        # PDF → images
        with st.status("Converting PDF to images…", expanded=False):
            pdf_bytes = upload.read()
            images = pdf_to_images(pdf_bytes, dpi=200)
        st.success(f"PDF pages: {len(images)}")

        pages_md = []
        progress = st.progress(0, text="Running model on pages…")
        for i, img in enumerate(images, 1):
            md, raw = run_single_image(processor, model, img, temperature, max_new_tokens)
            pages_md.append(md)
            progress.progress(i / len(images), text=f"Processed page {i}/{len(images)}")

            if show_think:
                with st.expander(f"Raw output (page {i})"):
                    st.code(raw)

        markdown_all = concat_markdown(pages_md)
        dur = time.time() - start_time

        st.subheader("📄 Markdown (all pages)")
        st.code(markdown_all, language="markdown")
        st.download_button("Download Markdown", data=markdown_all.encode("utf-8"),
                           file_name=f"{upload.name.rsplit('.',1)[0]}_extracted.md", mime="text/markdown")
        st.caption(f"Done in {dur:.1f}s")

    else:
        # Single image
        img = pil_from_upload(upload)
        st.image(img, caption="Input image", use_column_width=True)

        with st.status("Running model…", expanded=False):
            md, raw = run_single_image(processor, model, img, temperature, max_new_tokens)
        dur = time.time() - start_time

        st.subheader("📝 Markdown")
        st.code(md, language="markdown")
        st.download_button("Download Markdown", data=md.encode("utf-8"),
                           file_name=f"{upload.name.rsplit('.',1)[0]}_extracted.md", mime="text/markdown")

        if show_think:
            st.subheader("🧩 Raw output (with &amp;lt;think&amp;gt;)")
            st.code(raw)

        st.caption(f"Done in {dur:.1f}s")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffi02sukhps95cwzbe7f4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffi02sukhps95cwzbe7f4.png" alt=" " width="640" height="551"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 21: Launch the Streamlit App
&lt;/h3&gt;

&lt;p&gt;Now that we’ve written our app.py Streamlit script, the next step is to launch the app from the terminal.&lt;/p&gt;

&lt;p&gt;Run the following command inside your VM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;streamlit run app.py --server.port 7860 --server.address 0.0.0.0

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;--server.port 7860 → Runs the app on port 7860 (you can change it if needed).&lt;/li&gt;
&lt;li&gt;--server.address 0.0.0.0 → Ensures the app is accessible externally (not just inside the VM); if only SSH is exposed, see the tunnel sketch below.&lt;/li&gt;
&lt;/ul&gt;
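
&lt;p&gt;If your VM doesn’t expose port 7860 publicly and you only have SSH access, a common workaround is to tunnel the port to your laptop; the user and IP below are placeholders for your own VM’s SSH details:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Placeholder user/IP: forward the VM's port 7860 to your local machine
ssh -L 7860:localhost:7860 root@149.7.4.3

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the tunnel open, the app is reachable at http://localhost:7860 in your local browser.&lt;/p&gt;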

&lt;p&gt;Once executed, Streamlit will start the web server and you’ll see a message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You can now view your Streamlit app in your browser.

URL: http://0.0.0.0:7860

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvwhqfjzqegk4yogotsd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvwhqfjzqegk4yogotsd.png" alt=" " width="640" height="164"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 22: Access the Streamlit App in Browser
&lt;/h3&gt;

&lt;p&gt;After launching the app, you’ll see the interface in your browser.&lt;/p&gt;

&lt;p&gt;Go to the following URL (replace 0.0.0.0 with your VM’s public IP, or use the SSH tunnel from Step 21):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://0.0.0.0:7860/

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuvs550n4bd256i6d8mfw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuvs550n4bd256i6d8mfw.png" alt=" " width="640" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 23: Upload and Extract Documents
&lt;/h3&gt;

&lt;p&gt;Use the Drag and Drop or Browse files button to upload a scanned image (.jpg/.png) or a PDF.&lt;br&gt;
Adjust Settings on the right:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Temperature → Controls randomness (keep it very low, e.g., 0.00001, for OCR).&lt;/li&gt;
&lt;li&gt;Max new tokens → Length of the output (default: 2000).&lt;/li&gt;
&lt;li&gt;Show &amp;lt;think&amp;gt; reasoning → Optional; shows the model’s raw reasoning output.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Click Run Extraction.&lt;/p&gt;

&lt;p&gt;The model will process your input file, convert images/PDF pages into clean Markdown output, and display it below. You can copy or download this Markdown directly.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwtv2kernnip7t5brtvy6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwtv2kernnip7t5brtvy6.png" alt=" " width="640" height="369"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---

&amp;lt;!-- Page 1 --&amp;gt;

# Ayush Kumar

+91-998-4219-294 | ayushknj3@gmail.com | linktr.ee/Ayush7614
[in] ayush-kumar-984443191 | [Chat] Ayush7614 | [Twitter] @AyushKu38757918
Noida, Uttar Pradesh, India

### Objective
Developer Relations Engineer and Full-Stack Developer with deep expertise in open-source, cloud, LLMs, AI/ML, DevOps, and technical community building. Adept at creating large-scale developer education content and tools that empower engineers globally.

### Education
* ABES Engineering College
  * B.Tech in Electronics and Communication Engineering
  * – GPA: 7.7 / 10
  * – Courses: Operating Systems, Data Structures, Algorithms, AI, ML, Networking, Databases
  * July 2019 – August 2023
  * Ghaziabad, India

### Experience
* NodeShift AI Cloud
  * Lead Developer Relations Engineer
  * – Authored 150+ blogs on AI, LLMs, MCP, APIs, Web3, Gaming, Cloud, and TAK Server.
  * – Worked on the Dubai UAE Government’s TAK Server deployment project using NodeShift GPU and compute VMs.
  * – Designed and implemented marketing strategies to enhance brand visibility and audience engagement.
  * – Created developer-focused content in multiple formats (blogs, guides, videos) to educate and captivate our global community.
  * – Actively engaged with users across platforms to increase awareness and adoption of NodeShift services.
  * – Explored and initiated sponsorship and partnership opportunities across technical and developer communities.
  * – Reviewed customer feedback and usage patterns to refine developer experience and improve product documentation.
  * – Led efforts to improve and expand technical documentation to ensure a smoother onboarding experience and increased retention.
  * July 2024 – Present
  * Remote
* Techlatest.net
  * DevRel Engineer Consultant
  * – Content Lead – Developed strategy for AI/ML, DevOps, and GUI-based content.
  * – Authored 150+ blogs and tutorials across Cloud, Linux, Stable Diffusion, Flowise, Superset, etc.
  * – Built GUI Linux (Ubuntu, Kali, Rocky, Tails), Redash, VSCode, RStudio-based developer VMs.
  * – Created newsletters, video courses, and product documentation.
  * – Lead social media presence and SEO optimization; grow Discord and Twitter community.
  * – Worked across AWS, GCP, and Azure ecosystems for product testing and publishing.
  * March 2023 – July 2024
  * Estonia, Remote
* DEVs Dungeon
  * DevRel Engineer, Community Work (Part Time)
  * – Writing blogs for the DEVs Dungeon Community blog.
  * – Organizing Meetups and Hackathons in my Region.
  * – Participating in Events to Represent DEVs Dungeon.
  * – Social media marketing for DEVs Dungeon.
  * – Creating Content on GitHub, Twitter, and LinkedIn.
  * – Building and managing the community.
  * March 2023 – December 2023
  * Remote
* Google Summer of Code - Fossology
  * Student Developer
  * – Built REST APIs using ReactJs and improved legacy APIs.
  * – Created new endpoints with PHP and Slim Framework.
  * – Updated documentation using YAML files for API clarity.
  * May 2022 – August 2022
  * Remote


---

&amp;lt;!-- Page 2 --&amp;gt;

* **Humalect**
  * **DevRel Engineer (Intern)**
    – Content Lead for Humalect on social platforms.
    – Wrote blogs, newsletters, and planned podcasts.
    – Represented Humalect at events and built community.
  December 2022 – January 2023
  Remote

* **QwikSkills**
  * **Community Manager (Intern)**
    – Onboarded 300+ community members, hosted online events.
    – Managed Discord/Telegram and wrote community blogs.
    – Designed campaigns and handled technical support.
  August 2022 – January 2023
  Remote

* **NimbleEdge**
  * **Community Manager (Intern)**
    – Engaged OSS community and hosted global events.
    – Managed dev communities across GitHub, Discord, Meetup.
    – Created support content, handled social media and code issues.
  September 2022 – November 2022
  Remote

* **Keploy**
  * **Open Source Engineer (Intern)**
    – Set up CI/CD pipelines using GitHub Actions.
    – Built UI for Keploy website with ReactJs.
    – Contributed to the main platform.
  May 2022 – August 2022
  Remote

* **Keploy**
  * **DevRel Engineer (Intern)**
    – Provided API guidance and SDK support.
    – Built demo apps and participated in technical forums.
  April 2022 – July 2022
  Remote

* **CryptoCapable**
  * **DevRel Engineer (Intern)**
    – Promoted Web3, Crypto, Blockchain technologies.
    – Delivered talks and guided developer onboarding.
  February 2022 – April 2022
  Remote

* **Hyathi Technologies**
  * **Full Stack Developer (Intern)**
    – Built website MVP with React, Tailwind, NodeJS, MongoDB.
    – Implemented CI/CD using GitHub Actions.
  December 2021 – January 2022
  Remote

* **OneGo**
  * **Full Stack Developer (Intern)**
    – Developed startup site using HTML, CSS, Bootstrap.
    – Integrated Firebase backend, deployed via GitHub Actions.
  September 2021 – November 2021
  Ghaziabad, India

## Projects

* **Paanch-Editor**
  * **Responsive image editing tool using JS, HTML/CSS with 5+ effects**
    – Allows users to apply effects and download edited images directly in-browser.
  Remote

* **Etihaas Chrome Extension**
  * **Displays 'On this day' historical facts using public APIs**
    – Chrome extension shows history events for today’s date from API.
  Remote

* **Foody-Moody**
  * **Fusion food recipe site using React, Node, MongoDB**
    – Dynamic full-stack web app offering unique cuisine recipes.
  Remote

* **Tutorhuntz (Freelance)**
  * **Platform connecting tutors and students in 100+ subjects**
    – Built with React, Node.js, Express.js, Minimal UI, designed for academic support.
  Remote

* **Zipify**
  * **File compression web app built in Node.js**
    – Compress files into ZIPs using jszip and Express server.
  Remote

* **Women-Help Tracker**
  * **Health tracking web app for menstrual wellness**
    – Developed using HTML/CSS, Node.js, Python to support women’s wellness.
  Remote


---

&amp;lt;!-- Page 3 --&amp;gt;

## Honors and Awards

*   Winner – Smart India Hackathon 2022, led team of 5 to national victory.
*   First in college to become GitHub Campus Expert and GSoC contributor.
*   AWS Machine Learning and SUSE Cloud Native Scholarship by Udacity.
*   Top ranks: 3rd in KWOC, 5th SWOC, 17th JWOC, 81st DWOC, 6th CWOC.
*   Best Mentor Award – HSSOC, PSOC, DevicePT open source programs.

## Volunteer Experience

*   Founder – Nexus What The Hack: national-level hackathon community.
*   GitHub Campus Expert – Conducted 20+ technical events, meetups, and hackathons.
*   Auth0 Ambassador – Delivered tech sessions, supported community growth.
*   Mentor – SigmaHacks, CalHacks, Hack This November, HackVolunteer, Garuda Hacks.
*   Organized 15+ community bootcamps and mentored 2000+ budding OSS contributors.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;NuMarkdown-8B-Thinking brings reasoning into OCR like never before. By combining the power of Qwen2.5-VL with fine-tuned thinking tokens, it doesn’t just extract text — it understands layouts, tables, and complex structures before producing clean Markdown. This reasoning-first approach makes it a strong choice for document extraction, RAG pipelines, and knowledge organization, often rivaling even closed-source models in accuracy.&lt;/p&gt;

&lt;p&gt;With the setup steps we walked through — from provisioning a GPU VM to running the model inside an intuitive Streamlit interface — you now have a complete end-to-end workflow. You can upload PDFs or images, watch them convert into structured Markdown in real time, and immediately use that output in your own applications.&lt;/p&gt;

&lt;p&gt;Whether you’re a researcher, developer, or enterprise team, NuMarkdown-8B-Thinking offers a practical, open, and high-performing solution for document intelligence. Try it on your own documents, plug it into your pipelines, and experience what reasoning-powered OCR can unlock.&lt;/p&gt;

</description>
      <category>openai</category>
      <category>chatgpt</category>
      <category>ocr</category>
      <category>ai</category>
    </item>
    <item>
      <title>The Open-Source App Builder That Ate SaaS: Dyad + Ollama Setup</title>
      <dc:creator>Ayush kumar</dc:creator>
      <pubDate>Fri, 22 Aug 2025 05:50:28 +0000</pubDate>
      <link>https://dev.to/nodeshiftcloud/the-open-source-app-builder-that-ate-saas-dyad-ollama-setup-47o2</link>
      <guid>https://dev.to/nodeshiftcloud/the-open-source-app-builder-that-ate-saas-dyad-ollama-setup-47o2</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frgim8q96vapz070ndpw6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frgim8q96vapz070ndpw6.jpg" alt=" " width="800" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Dyad is a free, local, and open-source app builder that lets you create AI-powered apps without writing code. It’s a privacy-friendly alternative to platforms like Lovable, v0, Bolt, and Replit—designed to run entirely on your computer, with no lock-in or vendor dependency. With built-in Supabase integration, support for any AI model (including local ones via Ollama), and seamless connection to your existing tools, Dyad makes it easy to launch full-stack apps quickly. Fast, intuitive, and open-source, Dyad is built for makers who want control, speed, and limitless creativity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;p&gt;Website&lt;/p&gt;

&lt;p&gt;Link: &lt;a href="https://www.dyad.sh/" rel="noopener noreferrer"&gt;https://www.dyad.sh/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub&lt;/p&gt;

&lt;p&gt;Link: &lt;a href="https://github.com/dyad-sh/dyad" rel="noopener noreferrer"&gt;https://github.com/dyad-sh/dyad&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step-by-Step Process to Setup Dyad + Ollama
&lt;/h3&gt;

&lt;p&gt;For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Sign Up and Set Up a NodeShift Cloud Account
&lt;/h3&gt;

&lt;p&gt;Visit the &lt;a href="https://app.nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;NodeShift Platform&lt;/a&gt; and create an account. Once you’ve signed up, log into your account.&lt;/p&gt;

&lt;p&gt;Follow the account setup process and provide the necessary details and information.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq99yae9g02o17tqnz4o7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq99yae9g02o17tqnz4o7.png" alt=" " width="640" height="365"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 2: Create a GPU Node (Virtual Machine)
&lt;/h3&gt;

&lt;p&gt;GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fagcwrm6k2vb6ee8sti6d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fagcwrm6k2vb6ee8sti6d.png" alt=" " width="640" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5bxe0couy017d9ou2pti.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5bxe0couy017d9ou2pti.png" alt=" " width="640" height="391"&gt;&lt;/a&gt;&lt;br&gt;
Navigate to the menu on the left side, select the GPU Nodes option, click the Create GPU Node button in the Dashboard, and deploy your first Virtual Machine.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 3: Select a Model, Region, and Storage
&lt;/h3&gt;

&lt;p&gt;In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwnhdzbb7yqnbc3bhe73.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwnhdzbb7yqnbc3bhe73.png" alt=" " width="640" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2u29y8ecetgul6alh9x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2u29y8ecetgul6alh9x.png" alt=" " width="640" height="319"&gt;&lt;/a&gt;&lt;br&gt;
We will use 1 x H100 SXM GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 4: Select Authentication Method
&lt;/h3&gt;

&lt;p&gt;There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our &lt;a href="https://docs.nodeshift.com/gpus/create-gpu-deployment?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxidb6wc1ardza9s1dc7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxidb6wc1ardza9s1dc7.png" alt=" " width="640" height="198"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 5: Choose an Image
&lt;/h3&gt;

&lt;p&gt;Next, you will need to choose an image for your Virtual Machine. We will deploy Ollama on an NVIDIA CUDA Virtual Machine. CUDA, NVIDIA’s proprietary parallel computing platform, is what lets Ollama use the GPU on your Node.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frlev6zehrr58cf16hj4l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frlev6zehrr58cf16hj4l.png" alt=" " width="640" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvuzxy64s9iqeoz5fepmt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvuzxy64s9iqeoz5fepmt.png" alt=" " width="640" height="353"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 6: Virtual Machine Successfully Deployed
&lt;/h3&gt;

&lt;p&gt;You will get visual confirmation that your node is up and running.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiekwby2h60ch0non5730.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiekwby2h60ch0non5730.png" alt=" " width="640" height="286"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 7: Connect to GPUs using SSH
&lt;/h3&gt;

&lt;p&gt;NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.&lt;/p&gt;

&lt;p&gt;Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtllnxpf0ajwhewol79a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtllnxpf0ajwhewol79a.png" alt=" " width="640" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fleobtnt5surebbep2rfy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fleobtnt5surebbep2rfy.png" alt=" " width="640" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now open your terminal and paste the proxy SSH IP or direct SSH IP.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqnrrao74a2dg0mgg880.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqnrrao74a2dg0mgg880.png" alt=" " width="640" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, if you want to check the GPU details, run the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nvidia-smi

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08srbft3zdl39elgeeid.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08srbft3zdl39elgeeid.png" alt=" " width="640" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 8: Install Ollama
&lt;/h3&gt;

&lt;p&gt;After connecting to the terminal via SSH, it’s now time to install Ollama from the official Ollama website.&lt;/p&gt;

&lt;p&gt;Website Link: &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;https://ollama.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run the following command to install Ollama:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -fsSL https://ollama.com/install.sh | sh

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0bvyk9trbohbh3o0wvs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0bvyk9trbohbh3o0wvs.png" alt=" " width="640" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 9: Serve Ollama
&lt;/h3&gt;

&lt;p&gt;Run the following command to serve Ollama so it can be reached from other machines (binding to 0.0.0.0 exposes the API on all network interfaces):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OLLAMA_HOST=0.0.0.0:11434 ollama serve

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhaoet91c7j34hlmvaqh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhaoet91c7j34hlmvaqh.png" alt=" " width="640" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 10: Pull the GPT OSS 120B Model
&lt;/h3&gt;

&lt;p&gt;Run the following command to pull the GPT OSS 120B Model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama pull gpt-oss:120b

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait for the download and extraction to finish until the output reports success.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz0ff89ert40qojksulbd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz0ff89ert40qojksulbd.png" alt=" " width="640" height="273"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 11: Verify Downloaded Models
&lt;/h3&gt;

&lt;p&gt;After pulling the GPT-OSS models, you can check that they’ve been successfully downloaded and are available on your system.&lt;/p&gt;

&lt;p&gt;Just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama list

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see output like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME           ID              SIZE   MODIFIED
gpt-oss:120b   735371f916a9    65 GB  50 seconds ago

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmgnhcrt94zempg6hvbit.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmgnhcrt94zempg6hvbit.png" alt=" " width="640" height="111"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 12: Set Up SSH Port Forwarding (For Remote Models Like Ollama on a GPU VM)
&lt;/h3&gt;

&lt;p&gt;If you’re running a model like Ollama on a remote GPU Virtual Machine (e.g. via NodeShift, AWS, or your own server), you’ll need to port forward the Ollama server to your local machine so Dyad can connect to it.&lt;/p&gt;

&lt;p&gt;Here’s how to do it:&lt;/p&gt;

&lt;p&gt;Example (Mac/Linux Terminal):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -L 11434:localhost:11434 root@&amp;lt;your-vm-ip&amp;gt; -p &amp;lt;your-ssh-port&amp;gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once connected, your local machine will treat &lt;a href="http://localhost:11434" rel="noopener noreferrer"&gt;http://localhost:11434&lt;/a&gt; as if Ollama is running locally.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Replace &lt;code&gt;&amp;lt;your-vm-ip&amp;gt;&lt;/code&gt; with your VM’s IP address&lt;/li&gt;
&lt;li&gt;Replace &lt;code&gt;&amp;lt;your-ssh-port&amp;gt;&lt;/code&gt; with your custom SSH port (e.g. 19257)&lt;/li&gt;
&lt;/ul&gt;
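&lt;p&gt;Once the tunnel is up, you can verify it end-to-end by listing models through the forwarded port (assuming the tunnel terminates at Ollama’s default port on the VM):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# run this on your LOCAL machine; the request travels through the SSH tunnel
# and should list the models pulled on the VM (e.g., gpt-oss:120b)
curl http://localhost:11434/api/tags

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;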

&lt;p&gt;On Windows:&lt;br&gt;
Use a tool like &lt;a href="https://www.putty.org/" rel="noopener noreferrer"&gt;PuTTY&lt;/a&gt; or ssh from WSL/PowerShell with similar port forwarding.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F78pzow1l1ilpojnm4q7o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F78pzow1l1ilpojnm4q7o.png" alt=" " width="640" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’re running large language models (like GPT-OSS 120b) on a remote GPU Virtual Machine, you’ll want Dyad on your local machine to talk to that remote Ollama instance.&lt;/p&gt;

&lt;p&gt;But since the model is running on the VM — not on your laptop — we need to bridge the gap.&lt;/p&gt;

&lt;p&gt;That’s where SSH port forwarding comes in.&lt;/p&gt;

&lt;p&gt;Why use a GPU VM?&lt;br&gt;
Large models require serious compute power. Your laptop might struggle or overheat trying to run them. So we spin up a GPU-powered VM in the cloud — it gives us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster responses&lt;/li&gt;
&lt;li&gt;Support for large models (7B, 13B, even 120B!)&lt;/li&gt;
&lt;li&gt;More RAM + VRAM for smoother inference&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Step 13: Download Dyad
&lt;/h3&gt;

&lt;p&gt;To get started with Dyad, you’ll need to download the installer from the official website:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open your web browser (Google Chrome, Safari, Firefox, or Edge).&lt;/li&gt;
&lt;li&gt;In the search bar, type “Dyad app” and press Enter.&lt;/li&gt;
&lt;li&gt;From the search results, click on the link to the official Dyad website (look for the domain that says it’s the official site).&lt;/li&gt;
&lt;li&gt;On the homepage, locate the “Download Dyad” button at the top right or center of the page.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select the correct version for your operating system:&lt;br&gt;
macOS (Apple Silicon or Intel)&lt;br&gt;
Windows&lt;br&gt;
Linux (if available)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click the button to start the download. The file will automatically save to your computer’s default download folder.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the download is complete, you’re ready to move on to installation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tip: Dyad is free, open-source, and works without vendor lock-in. It supports building full-stack AI apps with Supabase integration and can connect with popular models like Gemini, GPT, and Claude.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fttsnam2bas5fhxtyw7uw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fttsnam2bas5fhxtyw7uw.png" alt=" " width="640" height="428"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 14: Set Up Dyad for the First Time
&lt;/h3&gt;

&lt;p&gt;Once Dyad is installed and launched, you’ll see a setup screen that helps you prepare your environment for building apps. Follow these steps carefully:&lt;/p&gt;

&lt;p&gt;Install Node.js (App Runtime)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dyad requires Node.js to run your applications locally.&lt;/li&gt;
&lt;li&gt;If Node.js is already installed on your machine, Dyad will detect it automatically and mark this step as complete (green check).&lt;/li&gt;
&lt;li&gt;If not, you’ll be prompted to download and install Node.js. Simply follow the link provided, install the latest LTS version, and restart Dyad (you can verify the install with the commands below).&lt;/li&gt;
&lt;/ul&gt;
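&lt;p&gt;If you’re unsure whether Node.js is already installed, a quick terminal check looks like this (any recent LTS version should work):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;node --version   # prints something like v20.x.x if Node.js is installed
npm --version    # the package manager ships with Node.js

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;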

&lt;p&gt;Setup AI Model Access&lt;br&gt;
To generate and run apps, Dyad needs access to AI providers. You can connect one or multiple providers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google Gemini – Click “Setup Google Gemini API Key” to use Gemini for free. You’ll be redirected to create or retrieve your API key, then paste it back into Dyad.&lt;/li&gt;
&lt;li&gt;Other AI Providers – If you want more options, click “Setup other AI providers.” Dyad supports OpenAI, Anthropic, OpenRouter, and more. Enter the corresponding API keys in the fields provided.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Import or Start a New App&lt;br&gt;
Once setup is complete, you can either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click “Import App” to load an existing Dyad project.&lt;/li&gt;
&lt;li&gt;Or, type your idea directly in the “Ask Dyad to build…” box. For example, enter “Build a To-Do List App” or “Build a Recipe Finder App.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choose from Starter Templates (Optional)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dyad also provides quick templates such as To-Do List App, Virtual Avatar Builder, Recipe Finder &amp;amp; Meal Planner, AI Image Generator, or 3D Portfolio Viewer.&lt;/li&gt;
&lt;li&gt;Select one to quickly spin up a project and start experimenting.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tip: You can always switch between models (Auto/Pro) based on your needs and API access. Auto uses free/available models, while Pro unlocks premium capabilities.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fws0mprnacf3f6n154ffy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fws0mprnacf3f6n154ffy.png" alt=" " width="640" height="447"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 15: Configure AI Providers in Dyad
&lt;/h3&gt;

&lt;p&gt;To enable Dyad to build and run apps, you need to connect it with one or more AI providers. This allows Dyad to generate code using different models.&lt;/p&gt;

&lt;p&gt;Open Settings → AI → Model Providers&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On the left sidebar, click Settings, then select AI &amp;gt; Model Providers.&lt;/li&gt;
&lt;li&gt;You’ll see a list of supported providers: OpenAI, Anthropic, Google (Gemini), OpenRouter, Dyad, and an option to add a custom provider.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choose Your Provider&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google (Gemini) – Offers a free tier. Click Setup and follow the link to get your API key. Paste it into the input field in Dyad.&lt;/li&gt;
&lt;li&gt;OpenAI – If you have an API key, click Setup, then paste your key to enable GPT models.&lt;/li&gt;
&lt;li&gt;Anthropic – Enter your Claude API key if you use Anthropic.&lt;/li&gt;
&lt;li&gt;OpenRouter – Supports multiple models with a free tier. Setup is similar — retrieve your key from OpenRouter and paste it.&lt;/li&gt;
&lt;li&gt;Dyad – If you prefer, you can set up Dyad’s native model.&lt;/li&gt;
&lt;li&gt;Custom Provider – Advanced users can connect any LLM endpoint by clicking Add custom provider and entering endpoint details + API key.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Enable Telemetry (Optional)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Telemetry is enabled by default to anonymously record usage data and improve Dyad. You can toggle it ON or OFF based on your preference.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Enable Native Git (Optional)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Under Experiments, you can enable Native Git for faster version control. This requires installing Git on your system if not already installed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Save &amp;amp; Verify&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Once you enter API keys, Dyad will validate them.&lt;/li&gt;
&lt;li&gt;If successful, the status will change from “Needs Setup” to Active.&lt;/li&gt;
&lt;li&gt;You’re now ready to start building apps with your chosen AI models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tip: You can set up multiple providers and switch between them depending on which model you want to use for a project.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbggepfqo1m2gteg471uv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbggepfqo1m2gteg471uv.png" alt=" " width="640" height="478"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 16: Add a Custom AI Provider
&lt;/h3&gt;

&lt;p&gt;If you want Dyad to use a language model that isn’t listed (e.g., a self-hosted model, private API, or enterprise endpoint), you can configure it as a Custom Provider.&lt;/p&gt;

&lt;p&gt;Click “Add Custom Provider”&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the AI Providers section of the Settings menu, select Add Custom Provider.&lt;/li&gt;
&lt;li&gt;A setup form will appear (like in the screenshot).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fill Out Provider Details&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provider ID – A unique identifier without spaces (e.g., my-provider).&lt;/li&gt;
&lt;li&gt;Display Name – The friendly name you want to appear in Dyad’s interface (e.g., My Enterprise LLM).&lt;/li&gt;
&lt;li&gt;API Base URL – The root URL of the model’s API (e.g., &lt;a href="https://api.example.com/v1" rel="noopener noreferrer"&gt;https://api.example.com/v1&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Environment Variable (Optional) – If you want Dyad to reference a stored API key, enter its environment variable name here (e.g., MY_PROVIDER_API_KEY).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Authentication&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Make sure the API key or token required by the provider is properly stored in your system’s environment variables (see the example after this list).&lt;/li&gt;
&lt;li&gt;If not using environment variables, Dyad may prompt you to input the key directly when connecting.&lt;/li&gt;
&lt;/ul&gt;
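&lt;p&gt;For example, on macOS/Linux you could export the key before launching Dyad. MY_PROVIDER_API_KEY is just the illustrative variable name from the form above; use whatever name your provider setup expects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# set the key for the current shell session (placeholder value)
export MY_PROVIDER_API_KEY="sk-your-key-here"

# optionally persist it so future sessions (and Dyad launched from them) see it
echo 'export MY_PROVIDER_API_KEY="sk-your-key-here"' &amp;gt;&amp;gt; ~/.bashrc

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;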

&lt;p&gt;Save the Provider&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Once all fields are complete, click Add Provider.&lt;/li&gt;
&lt;li&gt;The provider will appear alongside OpenAI, Anthropic, Google, and others in your Model Providers list.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Test the Connection&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After adding, Dyad will validate the provider by making a test API call.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tip: This feature is powerful if you’re hosting open-source models locally, using private APIs like vLLM, or experimenting with custom endpoints. It gives you full flexibility without vendor lock-in.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi823bb2o751w92x1ois3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi823bb2o751w92x1ois3.png" alt=" " width="640" height="544"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 17: Connect Dyad with Ollama
&lt;/h3&gt;

&lt;p&gt;Now that you’ve filled out the Add Custom Provider form for Ollama:&lt;/p&gt;

&lt;p&gt;Enter Provider Details&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provider ID: ollama&lt;/li&gt;
&lt;li&gt;Display Name: ollama (or any friendly name you prefer).&lt;/li&gt;
&lt;li&gt;API Base URL: &lt;a href="http://localhost:11434/v1" rel="noopener noreferrer"&gt;http://localhost:11434/v1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;This points Dyad to the local Ollama server that runs on port 11434.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Save the Provider&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click Add Provider to save the configuration.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You should now see Ollama listed as an active provider in your Dyad AI Providers panel.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run Ollama Locally&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make sure Ollama is running on your machine. Start the Ollama server by opening a terminal and running:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama serve

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;This ensures Dyad can connect to the Ollama API at localhost:11434 (a quick check follows below).&lt;/li&gt;
&lt;/ul&gt;
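&lt;p&gt;A quick way to confirm that the endpoint Dyad will call is live (assuming Ollama’s default port and its OpenAI-compatible API) is to list the available models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# should return a JSON list that includes gpt-oss:120b once it has been pulled
curl http://localhost:11434/v1/models

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;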

&lt;p&gt;Test the Connection&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In Dyad, try generating a simple app idea (e.g., “Build a To-Do List app”).&lt;/li&gt;
&lt;li&gt;If the connection is successful, Dyad will use Ollama to generate the project code.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisocigyv7rpczek2zfw1.png" alt=" " width="640" height="541"&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Step 18: Add Ollama Models in Dyad (and verify)
&lt;/h3&gt;

&lt;p&gt;Now that the Configure ollama panel shows Setup Complete, make the actual models available to Dyad.&lt;/p&gt;

&lt;p&gt;Make sure Ollama is running&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama serve

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register a model in Dyad&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In Settings → AI → Model Providers → ollama → Models, click Add Custom Model.&lt;/li&gt;
&lt;li&gt;Fill in:&lt;br&gt;
Model ID: the exact Ollama model name (e.g., llama3:8b).&lt;br&gt;
Display Name: anything friendly (e.g., Llama 3 (8B)).&lt;br&gt;
Context Window: optional (set it if you know it; otherwise leave blank).&lt;br&gt;
Max Output Tokens: optional (e.g., 1024).&lt;/li&gt;
&lt;li&gt;Save. Repeat for any other Ollama models you want exposed.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcg76fp6nmghnxxdu5e3c.png" alt=" " width="640" height="544"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5oejlt0hvoor06u8n1jv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5oejlt0hvoor06u8n1jv.png" alt=" " width="640" height="564"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 19: Add and Register a Custom Model in Dyad
&lt;/h3&gt;

&lt;p&gt;Fill Out the Model Details&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Model ID: gpt-oss:120b&lt;br&gt;
This must exactly match the model name available in your Ollama installation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Name: gpt-oss (this is the display name that will appear in Dyad).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Description (Optional): You can write something like “Open-source GPT OSS 120B model via Ollama”.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Max Output Tokens (Optional): e.g., 4096 (or adjust based on model capability).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Context Window (Optional): e.g., 8192.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Save the Model&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click Add Model.&lt;/li&gt;
&lt;li&gt;The model will now appear under Models in the Ollama provider section.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fga2oe3k7fty0kr0c6e3x.png" alt=" " width="640" height="563"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 20: Build your first Dyad app with gpt-oss (Ollama)
&lt;/h3&gt;

&lt;p&gt;Now that gpt-oss:120b shows up under Models and Ollama is Setup Complete, let’s generate an app.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fteq5kkeetzry7vzrz9a2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fteq5kkeetzry7vzrz9a2.png" alt=" " width="640" height="552"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 21: Select ollama → gpt-oss in the Builder and generate
&lt;/h3&gt;

&lt;p&gt;Open the model picker&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the build screen (the bar above “Ask Dyad to build…”), click the Model dropdown.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choose the local provider&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to Local models → ollama (or directly ollama in the list).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pick your model&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select gpt-oss (the one you registered as gpt-oss:120b).&lt;/li&gt;
&lt;li&gt;Optional: switch Auto → Pro if you want Dyad to always use your chosen model without auto-switching.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Set generation options (optional)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click the small settings/gear near the prompt bar:&lt;/li&gt;
&lt;li&gt;Max output tokens: 2048–4096 (for long code generations).&lt;/li&gt;
&lt;li&gt;Temperature: 0.2–0.5 for reliable code; raise for creativity.&lt;/li&gt;
&lt;li&gt;Context window / system prompt: leave default unless you need custom guardrails.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prompt Dyad to build&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In “Ask Dyad to build…”, paste a concrete request, e.g.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build a Newsletter Creator:
- Tech stack: React + Vite + Tailwind
- Features: editor with markdown preview, save drafts to localStorage, export to HTML/Markdown, simple dark UI, keyboard shortcuts
- Include README with setup &amp;amp; run steps

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Hit Send (paper-plane). Review the plan → Accept.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run and iterate&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When scaffolding completes, click Run (or open a terminal) and follow the start script (usually npm install &amp;amp;&amp;amp; npm run dev; see the example below).&lt;/li&gt;
&lt;li&gt;Iterate with follow-up prompts: “add image upload”, “add tags &amp;amp; search”, “deploy-ready build script”, etc.&lt;/li&gt;
&lt;/ul&gt;
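&lt;p&gt;For a React + Vite scaffold like the one above, the start sequence typically looks like this (the folder name is hypothetical; use the one Dyad actually generated):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd newsletter-creator   # hypothetical project folder created by Dyad
npm install             # install the generated dependencies
npm run dev             # start the Vite dev server (usually http://localhost:5173)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;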

&lt;p&gt;If the model dropdown doesn’t show ollama/gpt-oss:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensure ollama serve is running and the model exists (ollama list).&lt;/li&gt;
&lt;li&gt;Recheck the base URL &lt;a href="http://localhost:11434/v1" rel="noopener noreferrer"&gt;http://localhost:11434/v1&lt;/a&gt; in Settings → AI → Model Providers → ollama.&lt;/li&gt;
&lt;li&gt;If using a remote VM, use &lt;code&gt;http://&amp;lt;your-vm-ip&amp;gt;:11434/v1&lt;/code&gt; or tunnel via SSH: &lt;code&gt;ssh -L 11434:localhost:11434 user@&amp;lt;your-vm-ip&amp;gt;&lt;/code&gt;.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyvy8daprtkl01tavczuz.png" alt=" " width="640" height="532"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl0xrk5kj2v9upzvdr85t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl0xrk5kj2v9upzvdr85t.png" alt=" " width="640" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this video, I walk through the entire process of setting up and using Dyad with Ollama as the custom AI provider. Starting from downloading and installing Dyad, I show how to configure Node.js, connect API providers, and register a custom model inside Ollama (gpt-oss:120b). The video captures each step clearly—adding the API base URL, activating Ollama, registering the model in Dyad, and finally selecting it from the model picker. To demonstrate the workflow, I use Dyad’s builder interface to generate a project, including an AI Image Generator app, showing how prompts translate into scaffolded code in real time. By the end, viewers can see a complete pipeline: from local model setup → integration in Dyad → running their first functional AI app without vendor lock-in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/z4kaIEPcIEc" rel="noopener noreferrer"&gt;https://youtu.be/z4kaIEPcIEc&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Dyad makes building AI-powered apps simple, fast, and completely under your control. By combining it with Ollama on a GPU-powered VM, you unlock the ability to run powerful open-source models locally or remotely—without vendor lock-in. Whether you’re a developer, a tinkerer, or someone exploring no-code AI tools, Dyad gives you the flexibility to prototype, build, and scale apps in minutes. With this setup, you now have a private, efficient, and future-proof way to turn your ideas into fully functional apps.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>saas</category>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>A Step-By-Step Guide to Install Qwen3 30B Locally</title>
      <dc:creator>Aditi Bindal</dc:creator>
      <pubDate>Mon, 11 Aug 2025 13:43:54 +0000</pubDate>
      <link>https://dev.to/nodeshiftcloud/a-step-by-step-guide-to-install-qwen3-30b-locally-o7j</link>
      <guid>https://dev.to/nodeshiftcloud/a-step-by-step-guide-to-install-qwen3-30b-locally-o7j</guid>
      <description>&lt;p&gt;The Qwen3-30B-A3B-Instruct-2507 is an advanced iteration of the Qwen3 series, marking a significant leap forward in the landscape of causal language models. Boasting an impressive 30.5 billion parameters with 3.3 billion actively engaged, this model excels across a diverse array of capabilities such as instruction following, complex logical reasoning, text comprehension, mathematics, and science. Its robust coding proficiency, demonstrated by high scores in benchmarks such as MultiPL-E and LiveCodeBench, makes it particularly attractive to developers and researchers. The model also excels in multilingual contexts and handles extensive 256K token contexts effortlessly, making it ideal for intricate, lengthy tasks. Furthermore, its refined alignment with user preferences in subjective and open-ended scenarios ensures that interactions feel natural, intuitive, and highly personalised.&lt;/p&gt;

&lt;p&gt;In this article, we guide you step-by-step through installing Qwen3-30B locally or in a GPU-accelerated environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;The minimum system requirements for running this model are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;GPU: 1x H200&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage: 50 GB (preferable)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;VRAM: at least 64 GB&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://nodeshift.com/blog/set-up-anaconda-on-ubuntu-22-04-in-minutes-simplify-your-ai-workflow" rel="noopener noreferrer"&gt;Anaconda installed&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step-by-step process to install and run Qwen3-30B
&lt;/h2&gt;

&lt;p&gt;For the purpose of this tutorial, we’ll use a GPU-powered Virtual Machine by NodeShift since it provides high compute Virtual Machines at a very affordable cost on a scale that meets GDPR, SOC2, and ISO27001 requirements. Also, it offers an intuitive and user-friendly interface, making it easier for beginners to get started with Cloud deployments. However, feel free to use any cloud provider of your choice and follow the same steps for the rest of the tutorial.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Setting up a NodeShift Account
&lt;/h3&gt;

&lt;p&gt;Visit &lt;a href="https://app.nodeshift.com/sign-up" rel="noopener noreferrer"&gt;app.nodeshift.com&lt;/a&gt; and create an account by filling in basic details, or continue signing up with your Google/GitHub account.&lt;/p&gt;

&lt;p&gt;If you already have an account, &lt;a href="http://app.nodeshift.com" rel="noopener noreferrer"&gt;log in&lt;/a&gt; straight to your dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3p61u5r46mrb6vcsiqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3p61u5r46mrb6vcsiqr.png" alt="Image-step1-1" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Create a GPU Node
&lt;/h3&gt;

&lt;p&gt;After accessing your account, you should see a dashboard (see image), now:&lt;/p&gt;

&lt;p&gt;1) Navigate to the menu on the left side.&lt;/p&gt;

&lt;p&gt;2) Click on the &lt;strong&gt;GPU Nodes&lt;/strong&gt; option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokdraa5tkg40fzgkn7fo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokdraa5tkg40fzgkn7fo.png" alt="Image-step2-1" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) Click on &lt;strong&gt;Start&lt;/strong&gt; to start creating your very first GPU node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfhk9s2i1dfe211zgfev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfhk9s2i1dfe211zgfev.png" alt="Image-step2-2" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These GPU nodes are GPU-powered virtual machines by NodeShift. These nodes are highly customizable and let you control different environmental configurations for GPUs ranging from H100s to A100s, CPUs, RAM, and storage, according to your needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Selecting configuration for GPU (model, region, storage)
&lt;/h3&gt;

&lt;p&gt;1) For this tutorial, we’ll be using 1x H200 GPU; however, you can choose any GPU that meets the prerequisites.&lt;/p&gt;

&lt;p&gt;2) Similarly, we’ll opt for 200 GB storage by sliding the bar. You can also select the region where you want your GPU to reside from the available ones.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4eg1srvuvt289uey0baa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4eg1srvuvt289uey0baa.png" alt="Image-step3-1" width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Choose GPU Configuration and Authentication method
&lt;/h3&gt;

&lt;p&gt;1) After selecting your required configuration options, you’ll see the available GPU nodes in your region and according to (or very close to) your configuration. In our case, we’ll choose a 1x H100 SXM 80GB GPU node with 192vCPUs/80GB RAM/200GB SSD.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffuuphp4rmseb1dr3c6qt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffuuphp4rmseb1dr3c6qt.png" alt="Image-step4-1" width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Next, you'll need to select an authentication method. Two methods are available: Password and SSH Key. We recommend using SSH keys, as they are a more secure option. To create one, head over to our &lt;a href="https://docs.nodeshift.com/gpus/create-gpu-deployment" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchyrp5ijzlmevkc7puaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchyrp5ijzlmevkc7puaf.png" alt="Image-step4-2" width="800" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Choose an Image
&lt;/h3&gt;

&lt;p&gt;The final step is to choose an image for the VM, which in our case is &lt;strong&gt;Nvidia Cuda&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm3gwe0tprkoeqnx5x51.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm3gwe0tprkoeqnx5x51.png" alt="Image-step5-1" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's it! You are now ready to deploy the node. Finalize the configuration summary, and if it looks good, click &lt;strong&gt;Create&lt;/strong&gt; to deploy the node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F647pyrcdxwtp6gz0tieb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F647pyrcdxwtp6gz0tieb.png" alt="Image-step5-2" width="800" height="107"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk810i78g0piq7z2jxu8j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk810i78g0piq7z2jxu8j.png" alt="Image-step5-3" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Connect to active Compute Node using SSH
&lt;/h3&gt;

&lt;p&gt;1) As soon as you create the node, it will be deployed in a few seconds or a minute. Once deployed, you will see a status &lt;strong&gt;Running&lt;/strong&gt; in green, meaning that your Compute node is ready to use!&lt;/p&gt;

&lt;p&gt;2) Once your GPU shows this status, navigate to the three dots on the right, click on &lt;strong&gt;Connect with SSH&lt;/strong&gt;, and copy the SSH details that appear.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F15qhftsr1k75orubyj15.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F15qhftsr1k75orubyj15.png" alt="Image-step6-1" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After copying the details, follow the steps below to connect to the running GPU VM via SSH:&lt;/p&gt;

&lt;p&gt;1) Open your terminal, paste the SSH command, and run it.&lt;/p&gt;

&lt;p&gt;2) In some cases, your terminal may ask for your consent before connecting. Enter ‘yes’.&lt;/p&gt;

&lt;p&gt;3) A prompt will request a password. Type the SSH password, and you should be connected.&lt;/p&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7307nybljxnshe9dm4p2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7307nybljxnshe9dm4p2.png" alt="Image-step6-2" width="800" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, if you want to check the GPU details, run the following command in the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 7: Set up the project environment with dependencies
&lt;/h3&gt;

&lt;p&gt;1) Create a virtual environment using &lt;a href="https://nodeshift.com/blog/set-up-anaconda-on-ubuntu-22-04-in-minutes-simplify-your-ai-workflow" rel="noopener noreferrer"&gt;Anaconda&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conda create -n qwen python=3.11 -y &amp;amp;&amp;amp; conda activate qwen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5hwf3q01mgtlwhxavyue.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5hwf3q01mgtlwhxavyue.png" alt="Image-step7-1" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) Once you’re inside the environment, install vLLM along with its dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install --upgrade vllm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8l3uska8dmuptv4jyb96.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8l3uska8dmuptv4jyb96.png" alt="Image-step7-2" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) Also, open a second terminal, connect to the remote server via SSH, and install Open WebUI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install open-webui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 8: Download and Run the model
&lt;/h3&gt;

&lt;p&gt;1) Download the model with vLLM and host the endpoint on port 8000.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vllm serve Qwen/Qwen3-30B-A3B-Instruct-2507 --max-model-len 32768 --gpu-memory-utilization 0.95
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9b7fry6idimqoi1sl2ur.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9b7fry6idimqoi1sl2ur.png" alt="Image-step8-1" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;
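
&lt;p&gt;Once the server reports that it’s ready, you can sanity-check the endpoint directly on the server before wiring up the frontend. vLLM exposes an OpenAI-compatible API, so a minimal request looks like this (the model name must match the one passed to &lt;code&gt;vllm serve&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-Next-80B-A3B-Instruct",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;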

&lt;p&gt;2) In the second terminal (connected to the GPU host over SSH), serve the Open WebUI frontend on port 3000.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;open-webui serve --port 3000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhszbpe7ynff5vyqgx36w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhszbpe7ynff5vyqgx36w.png" alt="Image-step8-2" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhk910xjcstcgte8buly.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhk910xjcstcgte8buly.png" alt="Image-step8-3" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) Forward both ports and tunnel them so you can access them in your local browser.&lt;/p&gt;

&lt;p&gt;If you’re on a remote machine (e.g., a NodeShift GPU), you’ll need to set up SSH port forwarding to access both the vLLM and Open WebUI sessions in your local browser.&lt;/p&gt;

&lt;p&gt;Run the following command in your local terminal after replacing:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;YOUR_SERVER_PORT&amp;gt;&lt;/code&gt; with the port allotted to your remote server (for a NodeShift server, you can find it in the deployed GPU details on the dashboard).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;PATH_TO_SSH_KEY&amp;gt;&lt;/code&gt; with the path to the location where your SSH key is stored.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;YOUR_SERVER_IP&amp;gt;&lt;/code&gt; with the IP address of your remote server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -L 3000:localhost:3000 -p &amp;lt;YOUR_SERVER_PORT&amp;gt; -i &amp;lt;PATH_TO_SSH_KEY&amp;gt; root@&amp;lt;YOUR_SERVER_IP&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In another local terminal, forward the port for the vLLM endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -L 8000:localhost:8000 -p &amp;lt;YOUR_SERVER_PORT&amp;gt; -i &amp;lt;PATH_TO_SSH_KEY&amp;gt; root@&amp;lt;YOUR_SERVER_IP&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
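
&lt;p&gt;Alternatively, since &lt;code&gt;ssh&lt;/code&gt; accepts multiple &lt;code&gt;-L&lt;/code&gt; flags, both forwards can be combined into a single tunnel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -L 3000:localhost:3000 -L 8000:localhost:8000 -p &amp;lt;YOUR_SERVER_PORT&amp;gt; -i &amp;lt;PATH_TO_SSH_KEY&amp;gt; root@&amp;lt;YOUR_SERVER_IP&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;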



&lt;h3&gt;
  
  
  Step 9: Run the model via Open WebUI Interface
&lt;/h3&gt;

&lt;p&gt;Once the ports are forwarded, you can access the model via the Open WebUI interface and chat with it.&lt;/p&gt;

&lt;p&gt;1) Before running the model, connect Open WebUI to the vLLM API endpoint in the settings. vLLM serves an OpenAI-compatible API, so add &lt;code&gt;http://localhost:8000/v1&lt;/code&gt; as an OpenAI-compatible connection in Open WebUI’s connection settings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2aoj18gjzcu8pvrori0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2aoj18gjzcu8pvrori0.png" alt="Image-step9-1" width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;
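
&lt;p&gt;If the model doesn’t appear in the connection settings, confirm that the forwarded vLLM endpoint is reachable from your local machine by listing the served models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl http://localhost:8000/v1/models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;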

&lt;p&gt;2) Select the Qwen3-Next-80B model on the chat page and run a prompt.&lt;/p&gt;

&lt;p&gt;For example, we’re testing the following prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Summarize the following passage in 3 bullet points.
2. Then, extract 3 key insights and explain their implications.
3. Finally, write a Python function that could analyze similar passages for sentiment.

---
Passage:

"The rapid advancement of AI technologies has transformed industries across the globe. In healthcare, AI models are diagnosing diseases earlier and more accurately. In finance, algorithmic trading and risk modeling are becoming more sophisticated. Yet, as AI grows more powerful, ethical questions around bias, privacy, and job displacement remain urgent. Policymakers and technologists must collaborate to create guardrails that ensure innovation benefits society as a whole."
---

Give your response in clearly separated sections.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frtl4170rm5axn0mcm3ek.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frtl4170rm5axn0mcm3ek.png" alt="Image-step9-2" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdxdp9zrtimcx7ageygde.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdxdp9zrtimcx7ageygde.png" alt="Image-step9-3" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskm5ssf4mesaqw432140.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskm5ssf4mesaqw432140.png" alt="Image-step9-4" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Installing Qwen3-Next-80B-A3B locally equips developers and researchers with a cutting-edge language model, renowned for its powerful reasoning, extensive multilingual support, and exceptional handling of long-context tasks. Pairing it with NodeShift GPUs further enhances this experience, providing streamlined deployment, efficient resource management, and scalable infrastructure. Together, these tools empower users to harness advanced AI capabilities effectively, bridging innovation with accessibility and performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For more information about NodeShift:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.nodeshift.com/?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/company/nodeshift/?%0Aref=blog.nodeshift.com" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://x.com/nodeshiftai?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.gg/4dHNxnW7p7?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://app.daily.dev/nodeshift?ref=blog.nodeshift.com" rel="noopener noreferrer"&gt;daily.dev&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>qwen</category>
      <category>ai</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
