DEV Community

GPUStack
GPUStack × MaxKB: Build a Powerful and Easy-to-Use Open-Source Enterprise AI Agent Platform

Summary: By leveraging GPUStack for efficient model deployment and management, and connecting those models to MaxKB, you can easily build an AI assistant with knowledge base retrieval + intelligent Q&A capabilities.

As AI applications become increasingly common within organizations, more teams are beginning to focus on two core challenges:

  • How to efficiently manage and deploy local large models
  • How to quickly build enterprise knowledge bases and AI Agents

If you are looking for solutions to both problems, the combination of GPUStack + MaxKB is well worth exploring.

  • GPUStack: Focuses on GPU resource management and model deployment, supporting multi-node clusters and multi-model services.
  • MaxKB: An open-source enterprise knowledge base and AI application platform that enables rapid development of knowledge-based Q&A systems and AI Agents.

By connecting GPUStack-provided model services to MaxKB, you can easily build a practical enterprise AI knowledge assistant.

This article will walk through the entire process from scratch.

📌 What You'll Learn

  1. Deploy the latest GPUStack v2.1.0
  2. Deploy required models in GPUStack
  3. Obtain GPUStack model connection information
  4. Deploy MaxKB
  5. Connect GPUStack models in MaxKB
  6. Practical example: Build a GPUStack documentation knowledge base

Install GPUStack v2.1.0

1. Install GPUStack Server

sudo docker run -d --name gpustack-server \
  --restart unless-stopped \
  -p 80:80 \
  -v gpustack-data:/var/lib/gpustack \
  -v /data/gpustack_cache:/var/lib/gpustack/cache \
  gpustack/gpustack:v2.1.0 \
  --bootstrap-password "123" \
  --debug

After running the command above, open your browser and visit:

http://your_host_ip

You will enter the GPUStack UI.

Default login credentials:

admin / 123

2. Create a Cluster

GPUStack manages worker nodes in units called Clusters.

When deploying GPUStack Server for the first time, you will be prompted to create your first cluster. Click:

Create Your First Cluster

Follow the UI instructions to complete the setup.

You can also go to the Clusters page from the sidebar and click Add Cluster to create one manually.

3. Add a Worker

After creating a cluster, the system will prompt you to Add Worker.

Follow the instructions in the UI.

You can also add one manually via the Workers page in the sidebar.

Run the diagnostic command provided in the guide interface.

If the drivers and container runtime are correctly installed, you will see two OK messages.

If any check reports not configured, follow the provided links to the dependency documentation and install the missing components for your environment.

When starting the Worker, two volume mounts matter:

  1. Model Cache Volume: mount your host cache directory (e.g. /data/gpustack_cache) to the model cache directory /var/lib/gpustack/cache.
  2. GPUStack Data Volume: mount a volume to the data directory /var/lib/gpustack.

Then run the Worker startup command:

sudo docker run -d --name gpustack-worker \
   -e "GPUSTACK_RUNTIME_DEPLOY_MIRRORED_NAME=gpustack-worker" \
   -e "GPUSTACK_TOKEN=gpustack_7b42996d3f5571d5_8181f986537c100369eaa2dfcf6d6359" \
   --restart=unless-stopped \
   --privileged \
   --network=host \
   --volume /var/run/docker.sock:/var/run/docker.sock \
   --volume gpustack-worker-data:/var/lib/gpustack \
   --volume /data/gpustack_cache:/var/lib/gpustack/cache \
   --runtime nvidia \
   gpustack/gpustack:v2.1.0 \
   --server-url http://192.168.50.14 \
   --worker-ip 192.168.50.14

Deploy Models in GPUStack

Click Deployments in the sidebar to open the model deployment page.

If no models are currently deployed, you will see a Deploy Now button in the center of the page.

Click it to enter the Model Catalog, select the desired model, and follow the prompts to deploy it.

Additional deployment methods are available under the Deploy Model menu in the top-right corner.

For this tutorial, we deploy the following three models:

  • Qwen3-Reranker-4B
  • Qwen3-Embedding-4B
  • Qwen3.5-35B-A3B

GPU memory allocation can be adjusted according to your environment.

Deploy Qwen3-Reranker-4B

After deployment, you can test it in the Playground.
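Conceptually, a reranker scores each candidate document against the query and returns the candidates best-first; retrieval pipelines use it to reorder the rough matches an embedding search returns. A toy sketch of that behavior (the word-overlap scorer below is an illustrative stand-in, not the actual Qwen3-Reranker model, which is a cross-encoder):

```python
def rerank(query, documents, score_fn):
    """Score each document against the query and return them best-first."""
    scored = [(score_fn(query, doc), doc) for doc in documents]
    return [doc for score, doc in sorted(scored, key=lambda p: p[0], reverse=True)]

def word_overlap(query, doc):
    # Toy scorer: count shared words. A real reranker model scores
    # each (query, document) pair jointly and far more accurately.
    return len(set(query.lower().split()) & set(doc.lower().split()))

docs = [
    "GPUStack manages GPU clusters",
    "MaxKB builds knowledge bases",
    "GPUStack deploys models on GPU workers",
]
print(rerank("deploy models with GPUStack", docs, word_overlap)[0])
# → "GPUStack deploys models on GPU workers"
```

The Playground lets you run the same experiment against the real model: give it a query and a few documents and check that the most relevant one gets the highest score.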

Deploy Qwen3-Embedding-4B

After deployment, test it in the Playground.
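An embedding model maps text to a numeric vector, and retrieval compares those vectors, typically by cosine similarity. A minimal sketch of the comparison step (the three-dimensional vectors are made up for illustration; a real embedding model returns vectors with thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for embedding-model output.
v_query = [0.1, 0.9, 0.2]
v_doc_related = [0.2, 0.8, 0.1]
v_doc_unrelated = [0.9, 0.0, 0.4]

# The document pointing the same direction as the query scores higher.
print(cosine_similarity(v_query, v_doc_related) > cosine_similarity(v_query, v_doc_unrelated))  # → True
```

This is the operation a knowledge base performs for every stored chunk when answering a question: embed the question, then rank chunks by similarity.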

Deploy Qwen3.5-35B-A3B

Here we additionally set the PYPI_PACKAGES_INSTALL environment variable to upgrade the transformers library.

After deployment, test it in the Playground.

Obtain GPUStack Model Access Information

Open the Routes page from the sidebar.

Click the three-dot menu next to the Route, then select:

API Access Info

Record the following information:

Base URL
Model Name
API Key

Example:

Base URL: http://192.168.50.14/v1

Model Name:
qwen3.5-35b-a3b
qwen3-reranker-4b
qwen3-embedding-4b

API Key:
gpustack_xxxxxxxxxxxxxxxxx

You can create an API Key following the instructions in the UI.
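GPUStack exposes an OpenAI-compatible API, so the Base URL, model name, and API Key above are all a client needs. A minimal sketch that builds a chat-completions request with only the standard library (the host IP, model name, and key below are the placeholder values from this article — substitute your own):

```python
import json
import urllib.request

BASE_URL = "http://192.168.50.14/v1"        # Base URL from the Routes page
API_KEY = "gpustack_xxxxxxxxxxxxxxxxx"      # placeholder API Key
MODEL = "qwen3.5-35b-a3b"                   # model name recorded above

def build_chat_request(prompt):
    """Build an OpenAI-style chat-completions request for the GPUStack endpoint."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )

req = build_chat_request("What is GPUStack?")
print(req.full_url)  # → http://192.168.50.14/v1/chat/completions
# To actually send it (requires a reachable GPUStack server):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any OpenAI-compatible SDK works the same way: point its base URL at the GPUStack endpoint and pass the API Key as the bearer token.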

Deploy MaxKB

MaxKB supports one-command Docker deployment:

docker run -d --name=maxkb --restart=always -p 8080:8080 -v ~/.maxkb:/opt/maxkb 1panel/maxkb

Default credentials:

admin / MaxKB@123..

Upon first login, you will be prompted to change the password.

Follow the instructions to update it.

Connect GPUStack Models in MaxKB

In the top navigation bar of MaxKB, select Model.

Click Add Model in the upper-right corner.

Note: The API URL and API Key fields only appear after you enter the Base Model name and press Enter.

Add the following models in the same way:

  • qwen3-reranker-4b
  • qwen3-embedding-4b

For qwen3-reranker-4b, you must enable Generic Proxy, because MaxKB calls the rerank endpoint at /v2/rerank.

After configuration, all three GPUStack models appear in the MaxKB model list.

Practical Example: Build a GPUStack Documentation Knowledge Base

Open the Knowledge page at the top and click Create to create a knowledge base.

Select Web Knowledge.

Enter the GPUStack documentation URL.

MaxKB will automatically crawl and parse the page content.
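Under the hood, crawled pages are split into chunks before being embedded and indexed. A toy illustration of the idea (this fixed-size splitter with overlap is not MaxKB's actual segmentation logic, which is configurable in the UI, but it shows why overlap matters: sentences that straddle a boundary still appear whole in some chunk):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping fixed-size chunks for embedding/indexing."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars of context
    return chunks

doc = "GPUStack manages GPU resources and model deployment. " * 20
pieces = chunk_text(doc)
print(len(pieces), len(pieces[0]))  # → 8 200
```

Each resulting chunk is embedded with the embedding model and stored; at question time the most similar chunks are retrieved, reranked, and fed to the chat model.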

After crawling is complete, the parsed documents appear in the knowledge base and are ready for retrieval.

Create an AI Agent

Go to the Agent page.

Click Create to create a new Agent.

After completing the configuration, click Publish.

Once published successfully, you can start chatting with the agent.

Chat Demo

Open the chat interface and ask the agent a question about GPUStack; it answers using content retrieved from the documentation knowledge base.

🙌 Join the GPUStack Community

If you have already started using GPUStack,
or are exploring local large models / GPU resource management / AI infrastructure,
you are welcome to join our community group to exchange practical experience, pitfalls, and best practices.

https://discord.gg/QAzGncGs
