Everyone is building AI applications right now. But if you’ve ever deployed a RAG (Retrieval-Augmented Generation) app using managed cloud vector databases like Pinecone or Weaviate Cloud, you’ve likely run into two massive walls: cost and data privacy.
As your dataset grows from thousands to millions of vectors, those cloud bills start looking like a mortgage payment. Plus, do you really want to send your sensitive company data, financial records, or proprietary code to a public cloud API?
The solution is simple: Bring it home.
In this guide, I’m going to walk you through hosting Milvus, the world’s most advanced open-source vector database, right on a dedicated server. We are going to build a high-performance, private, and cost-effective infrastructure for your AI.
Let’s get technical.
Why Bare Metal for Vector Search?
Before we type a single command, you need to understand why we are doing this. Vector searches are computationally expensive. They require:
- Massive RAM: Vector indexes (like HNSW) live in memory for speed.
- Fast Storage: When the working set exceeds RAM, you need NVMe SSDs so disk reads don't become the bottleneck.
- Dedicated CPU Cycles: Indexing millions of vectors will choke a shared vCPU on a standard VPS.
A dedicated server gives you raw, unshared power. No "noisy neighbors" slowing down your AI's response time.
The Hardware You Need
For a production-ready Milvus setup, don't skimp on RAM. Here is my recommended baseline:
- CPU: At least 8 Cores (Intel Xeon or AMD EPYC ideally).
- RAM: 32GB minimum (64GB+ recommended for datasets over 10M vectors).
- Storage: Enterprise NVMe SSD (Avoid HDDs; they are too slow for vector retrieval).
- OS: Ubuntu 24.04 LTS or Debian 12.
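To sanity-check the RAM numbers above, you can estimate the footprint of the raw vectors alone; HNSW and similar in-memory indexes add graph overhead (very roughly 1.5-2x) on top of this. The 768-dim figure below is just a common embedding size, not something this setup requires:

```python
# Rough RAM estimate for raw float32 vectors.
# In-memory indexes like HNSW add graph overhead on top of this.
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_float: int = 4) -> int:
    return num_vectors * dim * bytes_per_float

gib = raw_vector_bytes(10_000_000, 768) / 2**30
print(f"10M x 768-dim float32 vectors: {gib:.1f} GiB before index overhead")
```

Ten million 768-dim vectors already eat close to 29 GiB before any index overhead, which is why 32GB is the floor, not the target.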
Pro Tip: If you are looking for a server that handles this workload without breaking the bank, check out the High-RAM dedicated servers at BytesRack. We tune our hardware specifically for high-throughput IO tasks like this.
Step 1: Preparing the Environment
We will use Docker Compose to deploy Milvus. It’s the cleanest way to manage the database along with its dependencies (etcd for metadata and MinIO for object storage) without polluting your host OS.
First, SSH into your server and update your package lists.
sudo apt update && sudo apt upgrade -y
Now, let's install the Docker engine. (If you already have Docker installed, you can skip this).
# Install required certificates
sudo apt install -y ca-certificates curl gnupg
# Add Docker's official GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
# Add the repository
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker and Compose
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Verify that Docker is running:
sudo docker ps
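Every `docker` command in this guide is prefixed with `sudo`. If you'd rather drop the prefix, add your user to the docker group (you'll need to log out and back in for the change to take effect):

```shell
# Add the current user to the docker group (re-login required)
sudo usermod -aG docker "$USER"
```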
Step 2: Configuring Milvus (Standalone Mode)
Milvus runs in two modes: Standalone (everything in one container) and Cluster (distributed across multiple nodes). For 99% of use cases—including serving RAG apps to thousands of users—Standalone mode on a powerful dedicated server is more than enough.
Create a directory for your project:
mkdir milvus-docker && cd milvus-docker
Now, download the official Docker Compose configuration file for Milvus.
wget https://github.com/milvus-io/milvus/releases/download/v2.4.0/milvus-standalone-docker-compose.yml -O docker-compose.yml
The Secret Sauce: Optimization 🌶️
Don't just run the default file. The stock configuration persists data to a ./volumes directory next to the compose file; we want to make sure that data lands on your fastest disk and survives container restarts.
Open the file:
nano docker-compose.yml
Check the volumes section. Each service (Milvus, etcd, MinIO) maps a container data path to a directory on the host. On a BytesRack server, if you have a secondary NVMe drive mounted (e.g., at /mnt/nvme), point the host side of each mapping there for maximum speed.
Example configuration:
volumes:
- /mnt/nvme/milvus/db:/var/lib/milvus
- /mnt/nvme/milvus/etcd:/etcd
- /mnt/nvme/milvus/minio:/minio_data
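Before launching, pre-create the host directories so Docker doesn't create them as root-owned surprises at startup (the paths assume the /mnt/nvme layout from the example above):

```shell
# Create the host-side data directories for all three services
sudo mkdir -p /mnt/nvme/milvus/db /mnt/nvme/milvus/etcd /mnt/nvme/milvus/minio
```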
Step 3: Launching the Vector Database
This is the easy part. Spin it up.
sudo docker compose up -d
Docker will pull the images and start three containers:
- milvus-standalone: The core vector engine.
- milvus-etcd: Stores metadata and coordinates processes.
- milvus-minio: Stores the actual data logs and index files.
Check if everything is healthy:
sudo docker compose ps
Step 4: Installing "Attu" (The Management GUI)
Managing a vector DB via command line is a pain. Attu is an amazing open-source administration GUI for Milvus. Let's add it to our stack.
Run this command to start Attu on port 8000:
sudo docker run -d --name attu \
-p 8000:3000 \
-e MILVUS_URL=YOUR_SERVER_IP:19530 \
zilliz/attu:latest
(Replace YOUR_SERVER_IP with your actual server IP).
Now, open your browser and go to http://<your-server-ip>:8000. You will see a dashboard where you can view collections, check vector counts, and monitor query performance.
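One caveat: neither Milvus nor Attu ships with authentication enabled out of the box, and both ports are now listening on a public IP. If ufw is your firewall (the IP below is a placeholder for your own app server's address), a minimal lockdown looks like this:

```shell
# Allow SSH first so you don't lock yourself out
sudo ufw allow 22/tcp
# Restrict Milvus (19530) and Attu (8000) to a trusted IP
# (203.0.113.10 is a placeholder -- substitute your own address)
sudo ufw allow from 203.0.113.10 to any port 19530 proto tcp
sudo ufw allow from 203.0.113.10 to any port 8000 proto tcp
sudo ufw enable
```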
Step 5: Testing the Connection (The "Hello World" of AI)
Let's prove this works. We will use a simple Python script to connect to your new server, create a collection, and insert some random vectors.
First, install the Python SDK on your local machine (not the server):
pip install pymilvus
Create a file named test_milvus.py:
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
import random
# 1. Connect to your BytesRack Server
# Replace YOUR_SERVER_IP with your actual IP
connections.connect("default", host="YOUR_SERVER_IP", port="19530")
# 2. Define a schema
fields = [
FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=False),
FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=128)
]
schema = CollectionSchema(fields, "Hello BytesRack AI")
# 3. Create collection
hello_milvus = Collection("hello_milvus", schema)
# 4. Insert dummy data
entities = [
[i for i in range(1000)], # pk
[[random.random() for _ in range(128)] for _ in range(1000)] # vectors
]
insert_result = hello_milvus.insert(entities)
hello_milvus.flush()
print(f"Success! Inserted {hello_milvus.num_entities} vectors into your private server.")
Run it. If you see the success message, congratulations! You just bypassed the cloud giants and built your own AI infrastructure.
Why This Matters for Your Business
By moving to a dedicated server, you have achieved three things:
- Data Sovereignty: Your data never leaves a server you control.
- Predictable Billing: Whether you run 10 queries or 10 million, your infrastructure cost stays the same.
- Latency Reduction: Local network speeds on bare metal will always beat shared cloud API latency.
Ready to Scale?
If you are serious about AI, you need hardware that can keep up.
At BytesRack, we specialize in high-performance dedicated servers tailored for AI workloads. Whether you need massive RAM for vector storage or GPU power for inference, we have the metal you need to build the future.