<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: GPUStack</title>
    <description>The latest articles on DEV Community by GPUStack (@gpustack).</description>
    <link>https://dev.to/gpustack</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1673709%2F6dfb7108-ed8e-4105-99f3-aaeb9ca11abd.png</url>
      <title>DEV Community: GPUStack</title>
      <link>https://dev.to/gpustack</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gpustack"/>
    <language>en</language>
    <item>
      <title>GPUStack × MaxKB: Build a Powerful and Easy-to-Use Open-Source Enterprise AI Agent Platform</title>
      <dc:creator>GPUStack</dc:creator>
      <pubDate>Tue, 10 Mar 2026 02:54:50 +0000</pubDate>
      <link>https://dev.to/gpustack/gpustack-x-maxkb-build-a-powerful-and-easy-to-use-open-source-enterprise-ai-agent-platform-1mb8</link>
      <guid>https://dev.to/gpustack/gpustack-x-maxkb-build-a-powerful-and-easy-to-use-open-source-enterprise-ai-agent-platform-1mb8</guid>
      <description>&lt;h1&gt;
  
  
  GPUStack × MaxKB: Build a Powerful and Easy-to-Use Open-Source Enterprise AI Agent Platform
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;: By leveraging GPUStack for efficient model deployment and management, and connecting those models to MaxKB, you can easily build an AI assistant with &lt;strong&gt;knowledge base retrieval + intelligent Q&amp;amp;A&lt;/strong&gt; capabilities.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As AI applications become increasingly common within organizations, more teams are beginning to focus on two core challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;How to efficiently manage and deploy local large models&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How to quickly build enterprise knowledge bases and AI Agents&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are looking for solutions to both problems, the combination of &lt;strong&gt;GPUStack + MaxKB&lt;/strong&gt; is well worth exploring.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPUStack&lt;/strong&gt;: Focuses on GPU resource management and model deployment, supporting multi-node clusters and multi-model services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MaxKB&lt;/strong&gt;: An open-source enterprise knowledge base and AI application platform that enables rapid development of knowledge-based Q&amp;amp;A systems and AI Agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By connecting &lt;strong&gt;GPUStack-provided model services to MaxKB&lt;/strong&gt;, you can easily build a &lt;strong&gt;practical enterprise AI knowledge assistant&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article walks through the entire process from scratch: installing GPUStack, deploying models, deploying MaxKB, and wiring the two together.&lt;/p&gt;

&lt;h1&gt;
  
  
  📌 What You'll Learn
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;Deploy the latest &lt;strong&gt;GPUStack v2.1.0&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Deploy required models in GPUStack&lt;/li&gt;
&lt;li&gt;Obtain GPUStack model connection information&lt;/li&gt;
&lt;li&gt;Deploy &lt;strong&gt;MaxKB&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Connect GPUStack models in MaxKB&lt;/li&gt;
&lt;li&gt;Practical example: Build a &lt;strong&gt;GPUStack documentation knowledge base&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Install GPUStack v2.1.0
&lt;/h1&gt;

&lt;h2&gt;
  
  
  1. Install GPUStack Server
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; gpustack-server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--restart&lt;/span&gt; unless-stopped &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 80:80 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; gpustack-data:/var/lib/gpustack &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; /data/gpustack_cache:/var/lib/gpustack/cache &lt;span class="se"&gt;\&lt;/span&gt;
  gpustack/gpustack:v2.1.0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bootstrap-password&lt;/span&gt; &lt;span class="s2"&gt;"123"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--debug&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
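
&lt;p&gt;Optionally, before opening the UI, you can confirm the container started cleanly. These are standard Docker commands, using the container name from the command above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Check that the container is running, then inspect the startup logs
sudo docker ps --filter name=gpustack-server
sudo docker logs --tail 50 gpustack-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;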



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9h0e1tc39w4ywj2lxpz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9h0e1tc39w4ywj2lxpz.png" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After running the command above, open your browser and visit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;http://your_host_ip
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will enter the &lt;strong&gt;GPUStack UI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Default login credentials:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;admin / 123
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpkr2u7axddu3iztbcu9t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpkr2u7axddu3iztbcu9t.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Create a Cluster
&lt;/h2&gt;

&lt;p&gt;GPUStack manages worker nodes in units called &lt;strong&gt;Clusters&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When deploying GPUStack Server for the first time, you will be prompted to create your first cluster. Click:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create Your First Cluster&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Follow the UI instructions to complete the setup.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can also go to the &lt;strong&gt;Clusters&lt;/strong&gt; page from the sidebar and click &lt;strong&gt;Add Cluster&lt;/strong&gt; to create one manually.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gtyibvn40543z3jxov3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gtyibvn40543z3jxov3.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3vkvxvmorhy10g8asyoa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3vkvxvmorhy10g8asyoa.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3axbbhehoqre4jx66ob3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3axbbhehoqre4jx66ob3.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Add a Worker
&lt;/h2&gt;

&lt;p&gt;After creating a cluster, the system will prompt you to &lt;strong&gt;Add Worker&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Follow the instructions in the UI.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can also add one manually via the &lt;strong&gt;Workers&lt;/strong&gt; page in the sidebar.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawy5nphxjgt67h3goj0a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawy5nphxjgt67h3goj0a.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fabpe92adno4a57t0ez0z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fabpe92adno4a57t0ez0z.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5v8saioaklguhlaxwnxp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5v8saioaklguhlaxwnxp.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run the diagnostic command provided in the guide interface.&lt;/p&gt;

&lt;p&gt;If the drivers and container runtime are correctly installed, you will see two &lt;strong&gt;OK&lt;/strong&gt; messages.&lt;/p&gt;

&lt;p&gt;If &lt;strong&gt;not configured&lt;/strong&gt; appears, follow the provided links to check dependency documentation and install the missing components according to your environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filo1thxijfqzoolpx3ai.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Filo1thxijfqzoolpx3ai.png" width="800" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6k60htirgwlwxb77cj7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6k60htirgwlwxb77cj7.png" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Model Cache Volume Mount&lt;/strong&gt;: Mount this directory to the model cache directory &lt;code&gt;/var/lib/gpustack/cache&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPUStack Data Volume&lt;/strong&gt;: Mount this directory to the data directory &lt;code&gt;/var/lib/gpustack&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F07bslxly0i1ap7s8byfc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F07bslxly0i1ap7s8byfc.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then run the Worker startup command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; gpustack-worker &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"GPUSTACK_RUNTIME_DEPLOY_MIRRORED_NAME=gpustack-worker"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"GPUSTACK_TOKEN=gpustack_7b42996d3f5571d5_8181f986537c100369eaa2dfcf6d6359"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;unless-stopped &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--privileged&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--network&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--volume&lt;/span&gt; /var/run/docker.sock:/var/run/docker.sock &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--volume&lt;/span&gt; gpustack-worker-data:/var/lib/gpustack &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--volume&lt;/span&gt; /data/gpustack_cache:/var/lib/gpustack/cache &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--runtime&lt;/span&gt; nvidia &lt;span class="se"&gt;\&lt;/span&gt;
   gpustack/gpustack:v2.1.0 &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--server-url&lt;/span&gt; http://192.168.50.14 &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--worker-ip&lt;/span&gt; 192.168.50.14
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Deploy Models in GPUStack
&lt;/h1&gt;

&lt;p&gt;Click &lt;strong&gt;Deployments&lt;/strong&gt; in the sidebar to open the model deployment page.&lt;/p&gt;

&lt;p&gt;If no models are currently deployed, you will see a &lt;strong&gt;Deploy Now&lt;/strong&gt; button in the center of the page.&lt;/p&gt;

&lt;p&gt;Click it to enter the &lt;strong&gt;Model Catalog&lt;/strong&gt;, select the desired model, and follow the prompts to deploy it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx10dblgvn4h3pij5z3ij.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx10dblgvn4h3pij5z3ij.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Additional deployment methods are available under the &lt;strong&gt;Deploy Model&lt;/strong&gt; menu in the top-right corner.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For this tutorial, we deploy the following three models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Qwen3-Reranker-4B&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Qwen3-Embedding-4B&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Qwen3.5-35B-A3B&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;GPU memory allocation can be adjusted according to your environment.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Deploy Qwen3-Reranker-4B
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjibvn5koxi1ct74qsd5z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjibvn5koxi1ct74qsd5z.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3n12cp4z72r2cl1ov9p2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3n12cp4z72r2cl1ov9p2.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After deployment, you can test it in the &lt;strong&gt;Playground&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx51lidduddb6adlj1fin.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx51lidduddb6adlj1fin.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;
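
&lt;p&gt;Besides the Playground, you can exercise the reranker over HTTP. The request below is only a sketch: it assumes a Jina-style &lt;code&gt;/v1/rerank&lt;/code&gt; endpoint, and the host and API key are placeholders to replace with the connection details of your own deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical example: score two documents against a query
curl http://192.168.50.14/v1/rerank \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer gpustack_xxxxxxxxxxxxxxxxx" \
  -d '{
    "model": "qwen3-reranker-4b",
    "query": "How do I add a worker to a GPUStack cluster?",
    "documents": [
      "Workers are added from the Workers page in the sidebar.",
      "MaxKB supports one-command Docker deployment."
    ]
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;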

&lt;h2&gt;
  
  
  Deploy Qwen3-Embedding-4B
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5k5av21dql29mtfcaemx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5k5av21dql29mtfcaemx.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsau5d2d7fxy6ro5g97gt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsau5d2d7fxy6ro5g97gt.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After deployment, test it in the Playground.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgac5xvjqrp9730t2q3sr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgac5xvjqrp9730t2q3sr.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;
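
&lt;p&gt;The embedding model can also be tested with an OpenAI-compatible request. This is a sketch; the host and API key are placeholders to replace with your own values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Request an embedding vector for a short text
curl http://192.168.50.14/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer gpustack_xxxxxxxxxxxxxxxxx" \
  -d '{"model": "qwen3-embedding-4b", "input": "GPUStack documentation"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;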

&lt;h2&gt;
  
  
  Deploy Qwen3.5-35B-A3B
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Here we additionally set the &lt;strong&gt;PYPI_PACKAGES_INSTALL&lt;/strong&gt; environment variable to upgrade the &lt;code&gt;transformers&lt;/code&gt; library.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyobzc1l0h70325hxz3ab.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyobzc1l0h70325hxz3ab.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xaluy944pfnelo9ze5e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xaluy944pfnelo9ze5e.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After deployment, test it in the &lt;strong&gt;Playground&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi55cpue7wxypd7qgctol.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi55cpue7wxypd7qgctol.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;
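
&lt;p&gt;You can likewise test the chat model from the command line with an OpenAI-compatible request. As before, this is a sketch with a placeholder host and API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Send a single chat message to the deployed model
curl http://192.168.50.14/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer gpustack_xxxxxxxxxxxxxxxxx" \
  -d '{
    "model": "qwen3.5-35b-a3b",
    "messages": [{"role": "user", "content": "What is GPUStack?"}]
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;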

&lt;h1&gt;
  
  
  Obtain GPUStack Model Access Information
&lt;/h1&gt;

&lt;p&gt;Open the &lt;strong&gt;Routes&lt;/strong&gt; page from the sidebar.&lt;/p&gt;

&lt;p&gt;Click the three-dot menu next to the &lt;strong&gt;Route&lt;/strong&gt;, then select:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API Access Info&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgko0djpn61kr0az0w5cj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgko0djpn61kr0az0w5cj.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Record the following information:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Base URL
Model Name
API Key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Base URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://192.168.50.14/v1&lt;/span&gt;

&lt;span class="na"&gt;Model Name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="s"&gt;qwen3.5-35b-a3b&lt;/span&gt;
&lt;span class="s"&gt;qwen3-reranker-4b&lt;/span&gt;
&lt;span class="s"&gt;qwen3-embedding-4b&lt;/span&gt;

&lt;span class="na"&gt;API Key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="s"&gt;gpustack_xxxxxxxxxxxxxxxxx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;You can create an API Key following the instructions in the UI.&lt;/p&gt;
&lt;/blockquote&gt;
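
&lt;p&gt;Before moving on to MaxKB, it can help to verify the recorded connection information with a quick request. Assuming the API is OpenAI-compatible, listing the available models should return the three deployed above (replace the host and key with your own API Access Info):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Sanity check: list the models this API key can reach
curl http://192.168.50.14/v1/models \
  -H "Authorization: Bearer gpustack_xxxxxxxxxxxxxxxxx"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;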

&lt;h1&gt;
  
  
  Deploy MaxKB
&lt;/h1&gt;

&lt;p&gt;MaxKB supports one-command Docker deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;maxkb &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;always &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 &lt;span class="nt"&gt;-v&lt;/span&gt; ~/.maxkb:/opt/maxkb 1panel/maxkb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Default credentials:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;admin / MaxKB@123..
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foa0b22cjnz5xnqbpwppz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foa0b22cjnz5xnqbpwppz.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Upon first login, you will be prompted to change the password.&lt;/p&gt;

&lt;p&gt;Follow the instructions to update it.&lt;/p&gt;

&lt;h1&gt;
  
  
  Connect GPUStack Models in MaxKB
&lt;/h1&gt;

&lt;p&gt;In the top navigation bar of MaxKB, select &lt;strong&gt;Model&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faw5legtdwqovc6ew4ot9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faw5legtdwqovc6ew4ot9.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click &lt;strong&gt;Add Model&lt;/strong&gt; in the upper-right corner.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgf9gjxldkjccok6rrgnn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgf9gjxldkjccok6rrgnn.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2qt03j6jobfzic92lrkg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2qt03j6jobfzic92lrkg.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fza2zo78gz8nzhnmfjtya.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fza2zo78gz8nzhnmfjtya.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;API URL&lt;/code&gt; and &lt;code&gt;API Key&lt;/code&gt; will only appear &lt;strong&gt;after entering the Base Model and pressing Enter&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Add the following models in the same way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;qwen3-reranker-4b&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;qwen3-embedding-4b&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For &lt;strong&gt;qwen3-reranker-4b&lt;/strong&gt;, you must enable &lt;strong&gt;Generic Proxy&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fop06mr01gctdps0cn8c4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fop06mr01gctdps0cn8c4.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is required because MaxKB sends rerank requests to the endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/v2/rerank
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fenpohberig3rmu39qpte.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fenpohberig3rmu39qpte.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77amye6pqwm8aynpeoz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77amye6pqwm8aynpeoz7.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After configuration, it should look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgn82lhbdr3397lmpboj0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgn82lhbdr3397lmpboj0.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Practical Example: Build a GPUStack Documentation Knowledge Base
&lt;/h1&gt;

&lt;p&gt;Open the &lt;strong&gt;Knowledge&lt;/strong&gt; page at the top and click &lt;strong&gt;Create&lt;/strong&gt; to create a knowledge base.&lt;/p&gt;

&lt;p&gt;Select &lt;strong&gt;Web Knowledge&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fni7ihbj48zje2qipvzjm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fni7ihbj48zje2qipvzjm.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Enter the GPUStack documentation URL.&lt;/p&gt;

&lt;p&gt;MaxKB will automatically crawl and parse the page content.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsg3xdydqa7skt7a2etz2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsg3xdydqa7skt7a2etz2.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After crawling is complete:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr970a2psqyfex4cmyw2a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr970a2psqyfex4cmyw2a.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Create an AI Agent
&lt;/h2&gt;

&lt;p&gt;Go to the &lt;strong&gt;Agent&lt;/strong&gt; page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1z9av15o7qwzinsrl17u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1z9av15o7qwzinsrl17u.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click &lt;strong&gt;Create&lt;/strong&gt; to create a new Agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgs1echpbcfc4n3coi806.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgs1echpbcfc4n3coi806.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After completing the configuration, click &lt;strong&gt;Publish&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Once published successfully, you can start chatting with the agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxtxx29184mfdv3r0lar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxtxx29184mfdv3r0lar.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Chat Demo
&lt;/h2&gt;

&lt;p&gt;Open the chat interface:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F333brb860bzvlckvkeaj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F333brb860bzvlckvkeaj.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71gkas76tl05mx7m33q5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71gkas76tl05mx7m33q5.png" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  🙌 Join the GPUStack Community
&lt;/h1&gt;

&lt;p&gt;If you have already started using GPUStack,&lt;br&gt;
or are exploring &lt;strong&gt;local large models / GPU resource management / AI infrastructure&lt;/strong&gt;,&lt;br&gt;
you are welcome to join our community group to exchange practical experience, pitfalls, and best practices.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://discord.gg/QAzGncGs" rel="noopener noreferrer"&gt;https://discord.gg/QAzGncGs&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
    <item>
      <title>No More Token Anxiety: Build an “Unlimited-Use” Local AI Assistant with GPUStack + OpenClaw</title>
      <dc:creator>GPUStack</dc:creator>
      <pubDate>Fri, 06 Mar 2026 02:32:29 +0000</pubDate>
      <link>https://dev.to/gpustack/no-more-token-anxiety-build-an-unlimited-use-local-ai-assistant-with-gpustack-openclaw-5de6</link>
      <guid>https://dev.to/gpustack/no-more-token-anxiety-build-an-unlimited-use-local-ai-assistant-with-gpustack-openclaw-5de6</guid>
      <description>&lt;p&gt;Over the past two years, more and more teams have integrated AI into their daily workflows.&lt;br&gt;
But soon, a practical issue emerged:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The more the model is used, the faster Tokens are consumed, and both costs and psychological pressure rise accordingly.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Many people rely on AI to improve efficiency, yet at the same time have to “use it sparingly” and “let it think less.”&lt;br&gt;
In the end, AI becomes a carefully budgeted consumable.&lt;/p&gt;

&lt;p&gt;If AI can run on your own GPU,&lt;br&gt;
&lt;strong&gt;without being billed by Token, available for conversation at any time, and running long-term inside collaboration tools,&lt;/strong&gt;&lt;br&gt;
then it truly feels like a real “work assistant.”&lt;/p&gt;

&lt;p&gt;Building on the local model capabilities provided by GPUStack, combined with &lt;strong&gt;OpenClaw (which supports collaboration platforms such as WhatsApp, Telegram, Discord, Slack, and Lark)&lt;/strong&gt; using Telegram as the example channel,&lt;br&gt;
this article walks through, step by step, how to build a &lt;strong&gt;truly usable, sustainably running, and almost Token-worry-free&lt;/strong&gt; local AI assistant.&lt;/p&gt;
&lt;h2&gt;
  
  
  📌 What This Article Covers
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Deploying a model with GPUStack&lt;/li&gt;
&lt;li&gt;Creating a Telegram bot application and configuring permissions&lt;/li&gt;
&lt;li&gt;Installing and configuring OpenClaw, with key considerations&lt;/li&gt;
&lt;li&gt;First-time authorization and connectivity testing on the Telegram side&lt;/li&gt;
&lt;li&gt;Practical example: Let the assistant star the GPUStack project&lt;/li&gt;
&lt;li&gt;Built-in assistant commands&lt;/li&gt;
&lt;li&gt;Useful OpenClaw commands and resource links&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  I. Deploy a Model with GPUStack and Prepare Access Information
&lt;/h2&gt;

&lt;p&gt;Before connecting OpenClaw, we need to complete model deployment in &lt;strong&gt;GPUStack&lt;/strong&gt; and obtain the model service access information.&lt;/p&gt;

&lt;p&gt;This section will use &lt;strong&gt;Qwen3.5-35B-A3B&lt;/strong&gt; as an example to demonstrate the complete process from&lt;br&gt;
&lt;strong&gt;Custom inference backend → Deploy model → Obtain access information&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Environment Preparation and Version Information
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;GPUStack version: &lt;strong&gt;v2.0.3&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Custom inference backend image:
&lt;code&gt;vllm/vllm-openai:qwen3_5&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Model weights: &lt;strong&gt;Qwen/Qwen3.5-35B-A3B&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ OpenClaw has requirements for the model context window:&lt;br&gt;
&lt;strong&gt;Minimum 16K, recommended 128K or above&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
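&lt;p&gt;If the default context length of a deployment falls short of this requirement, it can usually be raised through the vLLM startup parameters. A sketch (assuming the model itself supports a 128K context; &lt;code&gt;--max-model-len&lt;/code&gt; is a standard vLLM flag):&lt;/p&gt;

```plaintext
--max-model-len 131072
```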
&lt;h3&gt;
  
  
  2. Configure Custom Inference Backend (vLLM)
&lt;/h3&gt;

&lt;p&gt;In the GPUStack console, go to:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“Inference Backends” → “Edit vLLM” → “Add Version”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4n50a6nmswapd37g55ys.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4n50a6nmswapd37g55ys.png" width="800" height="508"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Deploy the Qwen3.5-35B-A3B Model
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ckvjnb5qu8z5ln5dc9l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ckvjnb5qu8z5ln5dc9l.png" width="793" height="219"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F926wc9o61drenbfjx5hs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F926wc9o61drenbfjx5hs.png" width="770" height="1053"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example parameters:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--tensor-parallel-size=2
--mm-encoder-tp-mode data
--mm-processor-cache-type shm
--reasoning-parser qwen3
--enable-auto-tool-choice
--tool-call-parser qwen3_coder
--speculative-config '{"method": "mtp", "num_speculative_tokens": 1}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;If you encounter:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error 803: system has unsupported display driver / cuda driver combination
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You can try adding the environment variable:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/usr/lib/x86_64-linux-gnu
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  4. Obtain GPUStack Model Access Information
&lt;/h3&gt;

&lt;p&gt;Record the following three items:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;API Base URL&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model ID&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Key&lt;/strong&gt; (create it in GPUStack)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9r1i3skvqvc9rlf983qz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9r1i3skvqvc9rlf983qz.png" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;
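&lt;p&gt;Before moving on, the three values can be sanity-checked from any terminal. The sketch below only composes the verification command; the URL, key, and model ID are placeholders to be replaced with the values recorded from your own deployment:&lt;/p&gt;

```shell
# Placeholders -- substitute the API Base URL, API Key, and Model ID
# recorded from your own GPUStack deployment.
GPUSTACK_BASE_URL="http://your-gpustack-server/v1"
GPUSTACK_API_KEY="gpustack_xxxxxxxx"
MODEL_ID="qwen3.5-35b-a3b"

# Compose the verification request as text so it can be reviewed first;
# run the printed command once the model is up to confirm connectivity.
REQUEST="curl ${GPUSTACK_BASE_URL}/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer ${GPUSTACK_API_KEY}' \
  -d '{\"model\": \"${MODEL_ID}\", \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}]}'"
echo "$REQUEST"
```

&lt;p&gt;A successful response confirms the endpoint, key, and model ID before they are entered into OpenClaw.&lt;/p&gt;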

&lt;h2&gt;
  
  
  II. Create a Telegram Bot
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Open Telegram and search for &lt;strong&gt;BotFather&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Open the &lt;strong&gt;BotFather&lt;/strong&gt; app&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fausk5tydr18bj791dt5y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fausk5tydr18bj791dt5y.png" width="584" height="938"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Create a new Bot and fill in the basic information&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fzp5sjbum5wsfr569me.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fzp5sjbum5wsfr569me.png" width="602" height="1068"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5arofy5sjknqcdbtbmhe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5arofy5sjknqcdbtbmhe.png" width="602" height="1068"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Copy the &lt;strong&gt;Bot Token&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqz02jt86iacxqefqbmij.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqz02jt86iacxqefqbmij.png" width="602" height="1068"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For details, please refer to: &lt;a href="https://docs.openclaw.ai/channels/telegram" rel="noopener noreferrer"&gt;https://docs.openclaw.ai/channels/telegram&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  III. Install and Configure OpenClaw
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Demo environment: Ubuntu 24.04&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  1. One-Click Installation
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -fsSL https://openclaw.ai/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The script will automatically install dependencies such as Node and Git.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rwhwbw5sb7lipqwfx2c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rwhwbw5sb7lipqwfx2c.png" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Interactive Configuration Wizard
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model/Auth Provider&lt;/strong&gt;
Select &lt;code&gt;Custom Provider (Any OpenAI or Anthropic compatible endpoint)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlvp9qjyo0us9mqdr4uv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlvp9qjyo0us9mqdr4uv.png" width="800" height="522"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enter the GPUStack &lt;strong&gt;API Base URL / API Key&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5mnlmvja8d5ymthuawt9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5mnlmvja8d5ymthuawt9.png" width="800" height="670"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select &lt;code&gt;Telegram&lt;/code&gt; for &lt;strong&gt;Channel&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5q3ms06a38jcg00jhxfu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5q3ms06a38jcg00jhxfu.png" width="800" height="522"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paste the &lt;strong&gt;Bot Token&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawtd0sp933q1vb14zf0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawtd0sp933q1vb14zf0w.png" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  IV. First-Time Authorization and Testing
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Send a message to the bot in Telegram&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On first use, it will prompt for &lt;strong&gt;Pairing authorization&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fno3ttqkzm8zuauz1yckv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fno3ttqkzm8zuauz1yckv.png" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;On the server, run:&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;openclaw pairing approve telegram &amp;lt;Pairing-Code&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fka2vy065y262m9kdb82k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fka2vy065y262m9kdb82k.png" width="800" height="306"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  V. Practical Example: Let the Bot Star the GPUStack Project
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Prepare a GitHub PAT
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;Tokens (classic)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Check the &lt;code&gt;repo&lt;/code&gt; scope&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq51fy4dnqb4b2fb0bal5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq51fy4dnqb4b2fb0bal5.png" alt="GitHub PAT" width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Write to Environment Variables
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vim ~/.openclaw/.env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwkl5b2char6p6t3xz6z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwkl5b2char6p6t3xz6z.png" width="800" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Restart:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;openclaw gateway restart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  3. Send a Command to the Bot
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwg9cysmqnb18lq44nklr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwg9cysmqnb18lq44nklr.png" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiiy9x4dn3ino6r2xtlyh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiiy9x4dn3ino6r2xtlyh.png" width="800" height="724"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  VI. Common Commands
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/new&lt;/code&gt;: Start a new session&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/status&lt;/code&gt;: Check bot status&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/reset&lt;/code&gt;: Reset context&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/model&lt;/code&gt;: View / switch model&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  VII. Useful OpenClaw Commands and Resources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Common CLI Commands
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;openclaw logs --follow
openclaw doctor
openclaw gateway --help
openclaw dashboard
openclaw tui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Documentation and Ecosystem
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;📘 &lt;a href="https://docs.openclaw.ai" rel="noopener noreferrer"&gt;https://docs.openclaw.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🌐 &lt;a href="https://clawhub.ai" rel="noopener noreferrer"&gt;https://clawhub.ai&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion: When AI Becomes Infrastructure, Not a Consumable
&lt;/h2&gt;

&lt;p&gt;Looking back, &lt;strong&gt;the essence of Token anxiety is not that models are expensive, but that AI is treated as an “external consumable resource.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When models run in the cloud and capabilities are controlled by others,&lt;br&gt;
we become accustomed to careful budgeting, limiting usage, and controlling call frequency.&lt;/p&gt;

&lt;p&gt;But when the model truly runs on your own GPU,&lt;br&gt;
when inference capability, context, and tool calls all become part of your infrastructure,&lt;br&gt;
the role of AI changes accordingly—&lt;/p&gt;

&lt;p&gt;It is no longer a paid API call each time,&lt;br&gt;
but a &lt;strong&gt;readily available, long-term online, continuously evolving work assistant&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is exactly the significance of combining GPUStack and OpenClaw:&lt;br&gt;
&lt;strong&gt;Let AI return from a “cost item” to “productivity.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you already have GPU resources,&lt;br&gt;
you might as well try it yourself and truly integrate AI into your daily workflow.&lt;/p&gt;

&lt;p&gt;When you no longer worry about Tokens,&lt;br&gt;
you will truly begin to make good use of AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  🙌 Join the GPUStack Community
&lt;/h2&gt;

&lt;p&gt;If you have already started using GPUStack,&lt;br&gt;
or are exploring &lt;strong&gt;local large models / GPU resource management / AI Infra&lt;/strong&gt;,&lt;br&gt;
you are welcome to join our community group to exchange practical experience, pitfalls, and best practices together.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://discord.gg/QAzGncGs" rel="noopener noreferrer"&gt;https://discord.gg/QAzGncGs&lt;/a&gt;&lt;/p&gt;

</description>
      <category>gpustack</category>
      <category>openclaw</category>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>Building Your Private ChatGPT and Knowledge Base with AnythingLLM and GPUStack</title>
      <dc:creator>GPUStack</dc:creator>
      <pubDate>Tue, 12 Nov 2024 05:00:48 +0000</pubDate>
      <link>https://dev.to/gpustack/building-your-private-chatgpt-and-knowledge-base-with-anythingllm-and-gpustack-dgi</link>
      <guid>https://dev.to/gpustack/building-your-private-chatgpt-and-knowledge-base-with-anythingllm-and-gpustack-dgi</guid>
      <description>&lt;p&gt;&lt;strong&gt;AnythingLLM&lt;/strong&gt; [&lt;a href="https://github.com/Mintplex-Labs/anything-llm" rel="noopener noreferrer"&gt;https://github.com/Mintplex-Labs/anything-llm&lt;/a&gt;] is an all-in-one AI application that runs on Mac, Windows, and Linux. Its goal is to enable the local creation of a &lt;strong&gt;personal ChatGPT&lt;/strong&gt; using either commercial or open-source LLMs along with vector database solutions. AnythingLLM goes beyond being a simple chatbot by including Retrieval-Augmented Generation (RAG) and Agent capabilities. These features allow it to perform a variety of tasks, such as fetching website information, generating charts, summarizing documents, and more.&lt;/p&gt;

&lt;p&gt;AnythingLLM can integrate various types of documents into different workspaces, enabling users to reference document content during chats. This provides an easy way to organize workspaces for different tasks and documents.&lt;/p&gt;

&lt;p&gt;In this article, we will introduce how to build a personal ChatGPT with knowledge base using &lt;strong&gt;AnythingLLM + GPUStack&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Run models with GPUStack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GPUStack is an open-source GPU cluster manager for running large language models (LLMs)&lt;/strong&gt;. It enables you to create a unified cluster from GPUs across various platforms, including Apple MacBooks, Windows PCs, and Linux servers. Administrators can deploy LLMs from popular repositories like Hugging Face, allowing developers to access these models as easily as they would access public LLM services from providers such as OpenAI or Microsoft Azure.&lt;/p&gt;

&lt;p&gt;Unlike Ollama, &lt;strong&gt;GPUStack&lt;/strong&gt; is a cluster solution designed to aggregate GPU resources from multiple devices to run models.&lt;/p&gt;

&lt;p&gt;To deploy the &lt;strong&gt;Chat Model&lt;/strong&gt; and &lt;strong&gt;Embedding Model&lt;/strong&gt; on &lt;strong&gt;GPUStack&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Chat Model&lt;/strong&gt;: &lt;strong&gt;llama3.1&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Embedding Model&lt;/strong&gt;: &lt;strong&gt;bge-m3&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ja76tpzv298cm4rbmbg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ja76tpzv298cm4rbmbg.png" alt="image-20241105171908268" width="800" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You also need to create an API key. This key will be used by &lt;strong&gt;AnythingLLM&lt;/strong&gt; to authenticate when accessing the model APIs deployed on &lt;strong&gt;GPUStack&lt;/strong&gt;.&lt;/p&gt;
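&lt;p&gt;A quick way to confirm the key works is to call the embeddings endpoint directly before configuring AnythingLLM. A minimal sketch (the base URL and key below are placeholders for your own deployment):&lt;/p&gt;

```shell
# Placeholders -- replace with your GPUStack server URL and API key.
GPUSTACK_BASE_URL="http://your-gpustack-server/v1"
GPUSTACK_API_KEY="gpustack_xxxxxxxx"

# Compose the request as text for review; running the printed command
# against the bge-m3 deployment confirms the key is accepted.
REQUEST="curl ${GPUSTACK_BASE_URL}/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer ${GPUSTACK_API_KEY}' \
  -d '{\"model\": \"bge-m3\", \"input\": \"hello\"}'"
echo "$REQUEST"
```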

&lt;h2&gt;
  
  
  Install and configure AnythingLLM
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;AnythingLLM&lt;/strong&gt; offers packages for &lt;strong&gt;Mac, Windows, and Linux&lt;/strong&gt;, you can download from &lt;a href="https://anythingllm.com/download" rel="noopener noreferrer"&gt;https://anythingllm.com/download&lt;/a&gt;. After installation, open AnythingLLM to begin the setup process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configure LLM Provider
&lt;/h3&gt;

&lt;p&gt;First, configure the chat model. Search for &lt;strong&gt;OpenAI&lt;/strong&gt;, select &lt;strong&gt;Generic OpenAI&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3nwdo25q7yjkjt1cfgvf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3nwdo25q7yjkjt1cfgvf.png" alt="image-20241105163235972" width="800" height="316"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And fill in the details for the model deployed on &lt;strong&gt;GPUStack&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F56l0jenlvsbsjcrt3f3j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F56l0jenlvsbsjcrt3f3j.png" alt="image-20241105163253668" width="800" height="312"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Save the settings, then configure the embedding model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configure Embedding Provider
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AnythingLLM&lt;/strong&gt; includes a lightweight embedding model, &lt;strong&gt;all-MiniLM-L6-v2&lt;/strong&gt;, which offers limited performance and context length. For more powerful embedding capabilities, you can either opt for public embedding services or run open-source embedding models. Here, we’ll configure the embedding model &lt;strong&gt;bge-m3&lt;/strong&gt;, which is running on &lt;strong&gt;GPUStack&lt;/strong&gt;. Set the embedding provider to &lt;strong&gt;Generic OpenAI&lt;/strong&gt; and fill in the relevant configuration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpk4o2vigwew0gx9j62m8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpk4o2vigwew0gx9j62m8.png" alt="image-20241105162753929" width="800" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then create a workspace; once it is ready, you can start using AnythingLLM.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use AnythingLLM
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Chat with LLM
&lt;/h3&gt;

&lt;p&gt;Select a workspace, create a new thread, and send your question to the LLM:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb7elc5gv56ax1c58nu6c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb7elc5gv56ax1c58nu6c.png" alt="image-20241105163657917" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Fetch website content
&lt;/h3&gt;

&lt;p&gt;Click the upload button next to the workspace, enter the website URL in the &lt;strong&gt;Fetch website&lt;/strong&gt; box, and fetch the website content.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fglm84zqsx96x94ismm4d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fglm84zqsx96x94ismm4d.png" alt="image-20241105164159767" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The fetched website content will be sent to the embedding model for vectorization and then stored in the vector database.&lt;/p&gt;
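&lt;p&gt;Conceptually, the flow is: split the content into chunks, embed each chunk, store the vectors, then retrieve the most similar chunk at query time. The toy sketch below illustrates this pipeline; the bag-of-words "embedding" is a stand-in for a real call to the embedding model (e.g. bge-m3 on GPUStack), not what AnythingLLM actually uses internally:&lt;/p&gt;

```python
import math
from collections import Counter

def toy_embed(text):
    # Stand-in for the real embedding model: a simple bag-of-words vector,
    # just to illustrate the chunk-embed-store-retrieve pipeline.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Split the fetched content into chunks.
chunks = ["GPUStack manages GPU clusters", "AnythingLLM builds RAG apps"]
# 2. Vectorize each chunk and store it in the "vector database" (a list here).
store = [(chunk, toy_embed(chunk)) for chunk in chunks]
# 3. At query time, embed the question and retrieve the most similar chunk.
query = toy_embed("which tool manages gpu clusters")
best = max(store, key=lambda item: cosine(query, item[1]))
```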

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6uhgfy1a5j2xtatyeo0q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6uhgfy1a5j2xtatyeo0q.png" alt="image-20241105164252415" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check the content fetched from the website:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff4h5iuarbzfsvbv6powo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff4h5iuarbzfsvbv6powo.png" alt="image-20241105164801193" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Document embedding
&lt;/h3&gt;

&lt;p&gt;Click the upload button next to the workspace, then click the upload box and upload a document. The document will be sent to the embedding model for vectorization and then stored in the vector database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptocyq1wx46igsbpb621.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptocyq1wx46igsbpb621.png" alt="image-20241105164914343" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check the content of embedded documents:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhx706cbhdrtmwdtde4sr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhx706cbhdrtmwdtde4sr.png" alt="image-20241105165047935" width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For more information, please read the &lt;code&gt;AnythingLLM&lt;/code&gt; documentation: &lt;a href="https://docs.anythingllm.com/" rel="noopener noreferrer"&gt;https://docs.anythingllm.com/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this tutorial, we have introduced how to use &lt;code&gt;AnythingLLM + GPUStack&lt;/code&gt; to aggregate GPUs across multiple devices and build an all-in-one AI application for RAG and AI Agents.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;GPUStack&lt;/code&gt; provides a standard OpenAI-compatible API that integrates quickly and smoothly with components across the LLM ecosystem. Want to give it a go? Integrate your tools, frameworks, and software with &lt;code&gt;GPUStack&lt;/code&gt; and share your experience with us!&lt;/p&gt;

&lt;p&gt;If you encounter any issues while integrating GPUStack with third parties, feel free to join &lt;a href="https://discord.gg/VXYJzuaqwD" rel="noopener noreferrer"&gt;GPUStack Discord Community&lt;/a&gt; and get support from our engineers.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Building Free GitHub Copilot Alternative with Continue + GPUStack</title>
      <dc:creator>GPUStack</dc:creator>
      <pubDate>Fri, 23 Aug 2024 17:00:00 +0000</pubDate>
      <link>https://dev.to/gpustack/building-free-github-copilot-alternative-with-continue-gpustack-2l37</link>
      <guid>https://dev.to/gpustack/building-free-github-copilot-alternative-with-continue-gpustack-2l37</guid>
      <description>&lt;p&gt;&lt;a href="https://seal.io/building-free-github-copilot-alternative-with-continue-and-gpustack/" rel="noopener noreferrer"&gt;Click here to read original post&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/continuedev/continue" rel="noopener noreferrer"&gt;&lt;code&gt;Continue&lt;/code&gt;&lt;/a&gt; is an open-source alternative to &lt;code&gt;GitHub Copilot&lt;/code&gt;, this is an open-source AI coding assistant that allows to connect various large language models(LLMs) within &lt;code&gt;VS Code&lt;/code&gt; and &lt;code&gt;JetBrains&lt;/code&gt; to build custom code autocompletion and chat capabilities. It supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code parsing&lt;/li&gt;
&lt;li&gt;Code autocompletion&lt;/li&gt;
&lt;li&gt;Code optimization suggestions&lt;/li&gt;
&lt;li&gt;Code refactoring&lt;/li&gt;
&lt;li&gt;Inquiring about code implementations&lt;/li&gt;
&lt;li&gt;Online documentation search&lt;/li&gt;
&lt;li&gt;Terminal error parsing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and more, helping developers code faster and work more efficiently.&lt;/p&gt;

&lt;p&gt;In this tutorial, we are going to use &lt;strong&gt;&lt;code&gt;Continue + GPUStack&lt;/code&gt;&lt;/strong&gt; to build a free, local GitHub Copilot alternative, providing developers with an AI pair-programming experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running Models with GPUStack
&lt;/h2&gt;

&lt;p&gt;First, we will deploy the models on &lt;code&gt;GPUStack&lt;/code&gt;. There are three model types recommended by &lt;code&gt;Continue&lt;/code&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chat model&lt;/strong&gt;: select &lt;code&gt;llama3.1&lt;/code&gt;, the latest open-source model from Meta.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autocompletion model&lt;/strong&gt;: select &lt;code&gt;starcoder2:3b&lt;/code&gt;, a state-of-the-art code completion model from the BigCode project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding model&lt;/strong&gt;: select &lt;code&gt;nomic-embed-text&lt;/code&gt;, which supports a context length of 8192 tokens and outperforms OpenAI's ada-002 and text-embedding-3-small models on both short- and long-context tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
    &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lJmdCjxo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822143650047.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lJmdCjxo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822143650047.png" alt="image 1" width="800" height="353"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;After deploying the models, you also need to create an &lt;code&gt;API key&lt;/code&gt; in the API Keys section; &lt;code&gt;Continue&lt;/code&gt; uses it for authentication when accessing the models deployed on &lt;code&gt;GPUStack&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing and Configuring Continue
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Continue&lt;/code&gt; provides extensions for both &lt;code&gt;VS Code&lt;/code&gt; and &lt;code&gt;JetBrains&lt;/code&gt;. In this article, we will use &lt;code&gt;VS Code&lt;/code&gt; as an example. Install &lt;code&gt;Continue&lt;/code&gt; from the &lt;code&gt;VS Code&lt;/code&gt; extension store:&lt;/p&gt;

&lt;p&gt;
    &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7YAwG3bw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822144006940.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7YAwG3bw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822144006940.png" alt="image 2" width="800" height="393"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Once installed, drag the &lt;code&gt;Continue&lt;/code&gt; extension to the right panel to avoid conflict with the file explorer:&lt;/p&gt;

&lt;p&gt;
    &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--V5RTjRFc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822143946949.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--V5RTjRFc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822143946949.png" alt="image 3" width="800" height="423"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Then, click the settings button in the bottom-right corner to edit &lt;code&gt;Continue&lt;/code&gt;'s configuration and connect to the models deployed on &lt;code&gt;GPUStack&lt;/code&gt;. Update the &lt;code&gt;"models"&lt;/code&gt;, &lt;code&gt;"tabAutocompleteModel"&lt;/code&gt;, and &lt;code&gt;"embeddingsProvider"&lt;/code&gt; sections, filling in your GPUStack server URL and your own GPUStack-generated API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Llama 3.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llama3.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://192.168.50.4/v1-openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpustack_f58451c1c04d8f14_c7e8fb2213af93062b4e87fa3c319005"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tabAutocompleteModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Starcoder 2 3b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"starcoder2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://192.168.50.4/v1-openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpustack_f58451c1c04d8f14_c7e8fb2213af93062b4e87fa3c319005"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"embeddingsProvider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nomic-embed-text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://192.168.50.4/v1-openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpustack_f58451c1c04d8f14_c7e8fb2213af93062b4e87fa3c319005"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
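&lt;p&gt;A quick sanity check can catch typos before reloading the extension: all three sections should carry a provider, model, API base, and API key, and normally point at the same GPUStack server. The helper below is illustrative (not part of Continue); the API base mirrors the config above, and the key is a placeholder:&lt;/p&gt;

```python
# The same three sections shown above (the API key is a placeholder here).
config = {
    "models": [{"title": "Llama 3.1", "provider": "openai", "model": "llama3.1",
                "apiBase": "http://192.168.50.4/v1-openai", "apiKey": "YOUR_API_KEY"}],
    "tabAutocompleteModel": {"title": "Starcoder 2 3b", "provider": "openai",
                             "model": "starcoder2",
                             "apiBase": "http://192.168.50.4/v1-openai",
                             "apiKey": "YOUR_API_KEY"},
    "embeddingsProvider": {"provider": "openai", "model": "nomic-embed-text",
                           "apiBase": "http://192.168.50.4/v1-openai",
                           "apiKey": "YOUR_API_KEY"},
}

def check_continue_config(cfg):
    """Return a list of problems; an empty list means the sections look consistent."""
    problems = []
    sections = list(cfg.get("models", []))
    sections.append(cfg.get("tabAutocompleteModel", {}))
    sections.append(cfg.get("embeddingsProvider", {}))
    for section in sections:
        for key in ("provider", "model", "apiBase", "apiKey"):
            if not section.get(key):
                problems.append("missing " + key + " in " + section.get("model", "?"))
    bases = {s.get("apiBase") for s in sections}
    if len(bases) != 1:
        problems.append("sections point at different apiBase values: " + str(bases))
    return problems
```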



&lt;p&gt;
    &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PaSNQp7S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822144033667.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PaSNQp7S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822144033667.png" alt="image 4" width="800" height="440"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
    &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4fWKwaFO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822144055057.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4fWKwaFO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822144055057.png" alt="image 5" width="800" height="439"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Continue
&lt;/h2&gt;

&lt;p&gt;After configuring &lt;code&gt;Continue&lt;/code&gt; to connect to the GPUStack-deployed models, go to the top-right corner of the &lt;code&gt;Continue&lt;/code&gt; plugin interface and select the &lt;code&gt;Llama 3.1&lt;/code&gt; model. Now you can use the features mentioned at the beginning of this tutorial:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Code Parsing&lt;/strong&gt;: Select the code, press &lt;code&gt;Cmd/Ctrl + L&lt;/code&gt;, and enter a prompt to let the local LLM parse the code:  &lt;/p&gt;

&lt;p&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JFkUTpoF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822145951464.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JFkUTpoF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822145951464.png" alt="image 6" width="800" height="430"&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Code Autocompletion&lt;/strong&gt;: While coding, press &lt;code&gt;Tab&lt;/code&gt; to let the local LLM attempt to autocomplete the code:  &lt;/p&gt;

&lt;p&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uxNMqP1I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822144132354.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uxNMqP1I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822144132354.png" alt="image 7" width="800" height="500"&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Code Refactoring&lt;/strong&gt;: Select the code, press &lt;code&gt;Cmd/Ctrl + I&lt;/code&gt;, and enter a prompt to let the local LLM attempt to optimize the code:  &lt;/p&gt;

&lt;p&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Y_5V-xQ4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822145544825.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Y_5V-xQ4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822145544825.png" alt="image 8" width="800" height="429"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;The LLM will provide suggestions, and you can decide whether to accept or reject them:  &lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
    &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Tcmwjwj2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822144207805.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Tcmwjwj2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822144207805.png" alt="image 9" width="800" height="549"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Inquire About Code Implementation&lt;/strong&gt;: You can try &lt;code&gt;@Codebase&lt;/code&gt; to ask questions about the codebase, such as how a certain feature is implemented:  &lt;/p&gt;

&lt;p&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8Jxf_qzk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822151421841.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8Jxf_qzk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822151421841.png" alt="image 10" width="800" height="429"&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Documentation Search&lt;/strong&gt;: Use &lt;code&gt;@Docs&lt;/code&gt;, select the documentation site you wish to search, and ask your question to find the results you need:&lt;/p&gt;

&lt;p&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jJADXV4A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822144718627.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jJADXV4A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://gpustack-blogs.oss-cn-hongkong.aliyuncs.com/undefinedimage-20240822144718627.png" alt="image 11" width="800" height="428"&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more information, please read the official &lt;code&gt;Continue&lt;/code&gt; documentation: &lt;a href="https://docs.continue.dev/how-to-use-continue" rel="noopener noreferrer"&gt;https://docs.continue.dev/how-to-use-continue&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this tutorial, we have introduced how to use &lt;code&gt;Continue + GPUStack&lt;/code&gt; to build a free, local GitHub Copilot alternative, offering developers AI pair-programming capabilities at no cost.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;GPUStack&lt;/code&gt; provides a standard OpenAI-compatible API that integrates quickly and smoothly with components across the LLM ecosystem. Want to give it a go? Integrate your tools, frameworks, and software with &lt;code&gt;GPUStack&lt;/code&gt; and share your experience with us!&lt;/p&gt;

&lt;p&gt;If you encounter any issues while integrating GPUStack with third parties, feel free to join &lt;a href="https://discord.gg/VXYJzuaqwD" rel="noopener noreferrer"&gt;GPUStack Discord Community&lt;/a&gt; and get support from our engineers.&lt;/p&gt;

</description>
      <category>githubcopilot</category>
      <category>gpustack</category>
      <category>ai</category>
    </item>
    <item>
      <title>Introducing GPUStack: An open-source GPU cluster manager for running LLMs</title>
      <dc:creator>GPUStack</dc:creator>
      <pubDate>Thu, 25 Jul 2024 17:00:54 +0000</pubDate>
      <link>https://dev.to/gpustack/introducing-gpustack-an-open-source-gpu-cluster-manager-for-running-llms-5dmj</link>
      <guid>https://dev.to/gpustack/introducing-gpustack-an-open-source-gpu-cluster-manager-for-running-llms-5dmj</guid>
      <description>&lt;h2&gt;
  
  
  What is GPUStack?
&lt;/h2&gt;

&lt;p&gt;We are thrilled to launch GPUStack, an open-source GPU cluster manager for running Large Language Models (LLMs). Even though LLMs are widely available as public cloud services, organizations cannot easily host their own LLM deployments for private use. They need to install and manage complex clustering software such as Kubernetes and then figure out how to install and manage the AI tool stack on top. Popular ways to run LLMs locally, such as LM Studio and LocalAI, work only on a single machine.&lt;/p&gt;

&lt;p&gt;GPUStack allows you to create a unified cluster from any brand of GPUs in Apple MacBooks, Windows PCs, and Linux servers. Administrators can deploy LLMs from popular repositories such as Hugging Face. Developers can then access LLMs just as easily as accessing public LLM services from vendors like OpenAI or Microsoft Azure.&lt;/p&gt;

&lt;p&gt;For more details about GPUStack, visit:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;GitHub repo: &lt;a href="https://github.com/gpustack/gpustack" rel="noopener noreferrer"&gt;https://github.com/gpustack/gpustack&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;User guide: &lt;a href="https://docs.gpustack.ai" rel="noopener noreferrer"&gt;https://docs.gpustack.ai&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why GPUStack?
&lt;/h2&gt;

&lt;p&gt;Today, organizations that want to host LLMs on a cluster of GPU servers have to do a lot of work to integrate a complex software stack. By using GPUStack, organizations no longer need to worry about cluster management, GPU optimization, LLM inference engines, usage metering, user management, API access, or the dashboard UI. GPUStack is a complete software platform for building your own LLM-as-a-Service (LLMaaS).&lt;/p&gt;

&lt;p&gt;As the following figure illustrates, the admin deploys models into GPUStack from a repository like HuggingFace, and then developers can connect to GPUStack to use these models in their applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgpustack.ai%2Fwp-content%2Fuploads%2F2024%2F07%2Fllmaas.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgpustack.ai%2Fwp-content%2Fuploads%2F2024%2F07%2Fllmaas.png" alt="img"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key features of GPUStack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GPU cluster setup and resource aggregation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GPUStack aggregates all GPU resources within a cluster. It is designed to support all GPU vendors, including Nvidia, Apple, AMD, Intel, Qualcomm, and others. GPUStack is compatible with laptops, desktops, workstations, and servers running macOS, Windows, and Linux.&lt;/p&gt;

&lt;p&gt;The initial release of GPUStack supports Windows PCs and Linux servers with Nvidia graphics cards, and Apple Macs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment and Inference for Models&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GPUStack supports distributed deployment and inference of LLMs across a cluster of GPU machines.&lt;/p&gt;

&lt;p&gt;GPUStack selects the best inference engine for running the given LLM on the given GPU. The first LLM inference engine supported by GPUStack is LLaMA.cpp, which allows GPUStack to support GGUF models from Hugging Face and all models listed in the ollama library (&lt;a href="https://ollama.com/library" rel="noopener noreferrer"&gt;ollama.com/library&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;You can run any model on GPUStack by first converting it to GGUF format and uploading it to Hugging Face or the Ollama library.&lt;/p&gt;

&lt;p&gt;Support of other inference engines, such as vLLM, is on our roadmap and will be provided in the future.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; GPUStack will automatically schedule the model you select to run on machines with appropriate resources, relieving you of manual intervention. If you want to assess the resource consumption of your chosen model, you can use our GGUF Parser project: &lt;a href="https://github.com/gpustack/gguf-parser-go" rel="noopener noreferrer"&gt;https://github.com/gpustack/gguf-parser-go&lt;/a&gt;. We intend to provide more detailed tutorials in the future.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Although GPU acceleration is recommended for inference, we also support CPU inference, though the performance isn't as good as on a GPU. Alternatively, using a mix of GPU and CPU for inference can maximize resource utilization, which is particularly useful in edge or resource-constrained environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Easy integration with your applications&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GPUStack offers OpenAI-compatible APIs and provides an LLM playground along with API keys. The playground enables AI developers to experiment with and customize LLMs, and to seamlessly integrate them into AI-enabled applications.&lt;/p&gt;
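&lt;p&gt;Because the API follows the OpenAI schema, any OpenAI-style client can talk to GPUStack. The minimal sketch below shows the request body for a chat completion and how an OpenAI-style response is parsed; the model name follows the examples in this post, and the endpoint path and key are placeholders for your own deployment:&lt;/p&gt;

```python
import json

def chat_payload(model, user_message):
    # Body for POST {server}/v1-openai/chat/completions, sent with an
    # "Authorization: Bearer YOUR_API_KEY" header; field names follow the OpenAI spec.
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }).encode("utf-8")

def first_reply(response_json):
    # Extract the assistant message from an OpenAI-style response.
    return response_json["choices"][0]["message"]["content"]

payload = chat_payload("llama3.1", "Hello!")
# A server response has this shape (sample data, not a real completion):
sample = {"choices": [{"message": {"role": "assistant", "content": "Hi there!"}}]}
```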

&lt;p&gt;Additionally, you can use the metrics GPUStack provides to understand how your AI applications utilize various LLMs. This helps administrators manage GPU resource consumption effectively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability metrics for GPUs and LLMs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GPUStack provides comprehensive monitoring of performance, utilization, and status metrics.&lt;/p&gt;

&lt;p&gt;For GPUs, administrators can use GPUStack to monitor real-time resource utilization and system status. Based on these metrics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Administrators can perform scaling, optimization, and other maintenance operations.&lt;/li&gt;
&lt;li&gt;GPUStack adjusts its model scheduling algorithm.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For LLMs, developers can use GPUStack to access metrics like token throughput, token usage, and API request throughput. These metrics help developers evaluate model performance and optimize their applications. GPUStack plans to support auto-scaling based on these inference performance metrics in future releases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication and access control&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GPUStack also provides authentication and role-based access control (RBAC) for enterprises. Users on the platform can have either admin or regular user roles. This guarantees that only authorized administrators can deploy and manage LLMs and that only authorized developers can utilize them.&lt;/p&gt;

&lt;h2&gt;
  
  
  GPUStack Use Cases
&lt;/h2&gt;

&lt;p&gt;GPUStack unlocks a world of possibilities for running LLMs on GPUs from any vendor. Here are just a few examples of what you can achieve with GPUStack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aggregate existing MacBooks, Windows PCs, and other GPU resources to offer a low-cost LLMaaS for a development team.&lt;/li&gt;
&lt;li&gt;In limited resource environments, aggregate multiple edge nodes to provide LLMaaS on CPU resources.&lt;/li&gt;
&lt;li&gt;Create your own enterprise-wide LLMaaS in your own data center for highly sensitive workloads that cannot be hosted in a cloud.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started with GPUStack
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Linux or macOS&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GPUStack provides a script to install it as a service on systemd or launchd based systems. To install GPUStack using this method, execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sfL&lt;/span&gt; https://get.gpustack.ai | sh -
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have deployed and started the GPUStack server, which also serves as the first worker node. You can access the GPUStack UI via &lt;a href="http://myserver" rel="noopener noreferrer"&gt;http://myserver&lt;/a&gt; (replace &lt;code&gt;myserver&lt;/code&gt; with the IP address or domain of the host where you installed GPUStack).&lt;/p&gt;

&lt;p&gt;Log in to GPUStack with username &lt;code&gt;admin&lt;/code&gt; and the default password. You can run the following command to get the password for the default setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /var/lib/gpustack/initial_admin_password
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To add additional worker nodes and form a GPUStack cluster, please run the following command on each worker node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sfL&lt;/span&gt; https://get.gpustack.ai | sh - &lt;span class="nt"&gt;--server-url&lt;/span&gt; http://myserver &lt;span class="nt"&gt;--token&lt;/span&gt; mytoken
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;strong&gt;&lt;code&gt;http://myserver&lt;/code&gt;&lt;/strong&gt; with your GPUStack server URL and &lt;strong&gt;&lt;code&gt;mytoken&lt;/code&gt;&lt;/strong&gt; with your secret token for adding workers. To retrieve the token in the default setup from the GPUStack server, use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /var/lib/gpustack/token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or follow the instructions on GPUStack to add workers:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgpustack.ai%2Fwp-content%2Fuploads%2F2024%2F07%2Fadd-worker.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgpustack.ai%2Fwp-content%2Fuploads%2F2024%2F07%2Fadd-worker.png" alt="img"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windows&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run PowerShell as administrator, then run the following command to install GPUStack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Invoke-Expression&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Invoke-WebRequest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Uri&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://get.gpustack.ai"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-UseBasicParsing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Content&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can access the GPUStack UI at &lt;a href="http://myserver" rel="noopener noreferrer"&gt;http://myserver&lt;/a&gt; (replace &lt;code&gt;myserver&lt;/code&gt; with the IP address or domain of the host where you installed GPUStack).&lt;/p&gt;

&lt;p&gt;Log in to GPUStack with username &lt;code&gt;admin&lt;/code&gt; and the default password. You can run the following command to get the password for the default setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Get-Content&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Join-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;APPDATA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ChildPath&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpustack\initial_admin_password"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Raw&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Optionally, you can add extra workers to form a GPUStack cluster by running the following command on other nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Invoke-Expression&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;amp; { &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Invoke-WebRequest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Uri&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://get.gpustack.ai"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-UseBasicParsing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;.Content) } -ServerURL http://myserver -Token mytoken"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the default setup, you can run the following to get the token used for adding workers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Get-Content&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Join-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;APPDATA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ChildPath&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpustack\token"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Raw&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For other installation scenarios, please refer to our installation documentation at: &lt;a href="https://gpustack.github.io/docs/quickstart" rel="noopener noreferrer"&gt;https://gpustack.github.io/docs/quickstart&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Serving LLMs
&lt;/h3&gt;

&lt;p&gt;As an LLM administrator, you can log in to GPUStack as the default system admin, navigate to &lt;strong&gt;&lt;code&gt;Resources&lt;/code&gt;&lt;/strong&gt; to monitor your GPU status and capacity, and then go to &lt;strong&gt;&lt;code&gt;Models&lt;/code&gt;&lt;/strong&gt; to deploy any open-source LLM to the GPUStack cluster. This lets you provide these LLMs to regular users for integration into their applications, helping you efficiently utilize your existing resources and deliver stable LLM services for a variety of needs and scenarios.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access GPUStack to deploy the LLMs you need. Choose models from Hugging Face (only the GGUF format is currently supported) or the Ollama Library, download them to your local environment, and run them:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgpustack.ai%2Fwp-content%2Fuploads%2F2024%2F07%2Fdeploy-model.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgpustack.ai%2Fwp-content%2Fuploads%2F2024%2F07%2Fdeploy-model.png" alt="img"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPUStack will automatically schedule the model to run on the appropriate Worker:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgpustack.ai%2Fwp-content%2Fuploads%2F2024%2F07%2Fmodel-list.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgpustack.ai%2Fwp-content%2Fuploads%2F2024%2F07%2Fmodel-list.png" alt="img"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can manage and maintain LLMs by monitoring API requests, token consumption, token throughput, resource utilization, and more. This helps you decide whether to scale up or upgrade LLMs to keep the service stable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgpustack.ai%2Fwp-content%2Fuploads%2F2024%2F07%2Fdashboard.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgpustack.ai%2Fwp-content%2Fuploads%2F2024%2F07%2Fdashboard.png" alt="img"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Integrating with your applications
&lt;/h3&gt;

&lt;p&gt;As an AI application developer, you can log in to GPUStack as a regular user and navigate to &lt;strong&gt;&lt;code&gt;Playground&lt;/code&gt;&lt;/strong&gt; from the menu. Here, you can interact with the LLM using the UI playground.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgpustack.ai%2Fwp-content%2Fuploads%2F2024%2F07%2Fplayground.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgpustack.ai%2Fwp-content%2Fuploads%2F2024%2F07%2Fplayground.png" alt="img"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, visit &lt;strong&gt;&lt;code&gt;API Keys&lt;/code&gt;&lt;/strong&gt; to generate and save your API key. Return to &lt;strong&gt;&lt;code&gt;Playground&lt;/code&gt;&lt;/strong&gt; to customize your LLM by adjusting the system prompt, adding few-shot examples, or tuning the prompt parameters. When you're done, click &lt;strong&gt;&lt;code&gt;View Code&lt;/code&gt;&lt;/strong&gt; and select your preferred code format (curl, Python, Node.js) along with the API key. Use this code in your applications to communicate with your private LLMs.&lt;/p&gt;

&lt;p&gt;You can now access the OpenAI-compatible API. For example, using curl:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GPUSTACK_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;myapikey
curl http://myserver/v1-openai/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$GPUSTACK_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "llama3",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "stream": true
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
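&lt;p&gt;The same request can be sketched in Python. The snippet below is a minimal sketch that only builds the JSON body mirroring the curl example above; the server URL and the &lt;code&gt;llama3&lt;/code&gt; model name are placeholders, so substitute whatever model you actually deployed in GPUStack:&lt;/p&gt;

```python
import json

# Sketch only: builds the same JSON body as the curl example above.
# "http://myserver" and "llama3" are placeholders -- substitute your
# GPUStack server URL and the model name you actually deployed.
GPUSTACK_ENDPOINT = "http://myserver/v1-openai/chat/completions"

def build_chat_request(model, user_message, stream=True):
    """Build the request body for GPUStack's OpenAI-compatible chat API."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "stream": stream,
    }

payload = build_chat_request("llama3", "Hello!")
body = json.dumps(payload, indent=2)
print(body)
```

&lt;p&gt;POST this body to the endpoint with any HTTP client, sending your API key in an &lt;code&gt;Authorization: Bearer&lt;/code&gt; header exactly as in the curl example.&lt;/p&gt;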



&lt;h2&gt;
  
  
  Join Our Community
&lt;/h2&gt;

&lt;p&gt;Please find more information about GPUStack at: &lt;a href="https://gpustack.ai" rel="noopener noreferrer"&gt;https://gpustack.ai&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you encounter any issues or have suggestions for GPUStack, feel free to join our &lt;a href="https://discord.gg/VXYJzuaqwD" rel="noopener noreferrer"&gt;Community&lt;/a&gt; for support from the GPUStack team and to connect with fellow users globally.&lt;/p&gt;

&lt;p&gt;We are actively enhancing the GPUStack project and plan to introduce new features in the near future, including support for multimodal models, additional accelerators like AMD ROCm or Intel oneAPI, and more inference engines. Before getting started, we encourage you to follow and star our project on GitHub at &lt;a href="https://github.com/gpustack/gpustack" rel="noopener noreferrer"&gt;gpustack/gpustack&lt;/a&gt; to receive instant notifications about all future releases. We welcome your contributions to the project.&lt;/p&gt;

&lt;h2&gt;
  
  
  About Us
&lt;/h2&gt;

&lt;p&gt;GPUStack is brought to you by Seal, Inc., a team dedicated to making AI accessible to all. Our mission is to help enterprises put AI to work in their business, and GPUStack is a significant step toward that goal.&lt;/p&gt;

&lt;p&gt;Quickly build your own LLMaaS platform with GPUStack! Start experiencing the ease of creating GPU clusters locally, running and using LLMs, and integrating them into your applications.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>opensource</category>
      <category>news</category>
    </item>
  </channel>
</rss>
