Milvus Has a Free API: The Vector Database Powering AI Search at Scale

#milvus #vectordatabase #ai #embeddings

Milvus is an open-source vector database built for AI applications. It stores, indexes, and searches billions of embedding vectors with millisecond latency. It is used by more than 1,000 organizations, including Salesforce, PayPal, and Shopee.

Why Milvus?

Purpose-built — designed for vector search from the ground up
Billion-scale — handles 1B+ vectors with consistent performance
Multi-index — IVF, HNSW, DiskANN, GPU indexes
Hybrid search — combine vector similarity with scalar filtering
Cloud-native — Kubernetes-native, scales horizontally
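To make "vector similarity with scalar filtering" concrete, here is a brute-force sketch in plain Python. It is only a conceptual illustration, not how Milvus works internally (Milvus relies on approximate indexes such as the ones listed above). The function names and the two-dimensional toy vectors are made up for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def filtered_search(rows, query, category=None, limit=5):
    """Apply the scalar filter first, then rank the survivors by similarity."""
    candidates = [r for r in rows if category is None or r["category"] == category]
    scored = [(cosine_similarity(r["vector"], query), r) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [(round(score, 4), r["title"]) for score, r in scored[:limit]]

# Toy data: real embeddings have hundreds of dimensions
rows = [
    {"title": "Introduction to RAG", "category": "AI", "vector": [1.0, 0.0]},
    {"title": "Vector Databases Explained", "category": "Database", "vector": [0.9, 0.1]},
    {"title": "Building Search with Milvus", "category": "AI", "vector": [0.0, 1.0]},
]

top_ai = filtered_search(rows, query=[1.0, 0.0], category="AI", limit=2)
top_all = filtered_search(rows, query=[1.0, 0.0], limit=2)
print(top_ai)
print(top_all)
```

A scan like this costs one comparison per stored vector, which is why a dedicated index matters once a collection grows past a few hundred thousand rows.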

Quick Start

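# Docker (standalone, embedded etcd, local storage)
# Settings mirror the official standalone_embed.sh script; check the install docs for your version
docker run -d --name milvus \
  -e ETCD_USE_EMBED=true \
  -e ETCD_DATA_DIR=/var/lib/milvus/etcd \
  -e COMMON_STORAGETYPE=local \
  -p 19530:19530 -p 9091:9091 \
  milvusdb/milvus:latest milvus run standalone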

# Or Milvus Lite (embedded, for dev)
pip install pymilvus
# Uses SQLite-based local storage: pass a file path instead of a server URI,
# e.g. MilvusClient("./milvus_demo.db")

Python SDK

from pymilvus import MilvusClient
import numpy as np

# Connect (or use Milvus Lite for local dev)
client = MilvusClient(uri="http://localhost:19530")

# Create collection
client.create_collection(
    collection_name="articles",
    dimension=768,  # Match your embedding model
)

# Insert vectors
data = [
    {"id": 1, "vector": np.random.rand(768).tolist(), "title": "Introduction to RAG", "category": "AI"},
    {"id": 2, "vector": np.random.rand(768).tolist(), "title": "Vector Databases Explained", "category": "Database"},
    {"id": 3, "vector": np.random.rand(768).tolist(), "title": "Building Search with Milvus", "category": "AI"},
]
client.insert(collection_name="articles", data=data)

# Similarity search
query_vector = np.random.rand(768).tolist()
results = client.search(
    collection_name="articles",
    data=[query_vector],
    limit=5,
    output_fields=["title", "category"],
)

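# One list of hits per query vector; 'distance' holds the metric value
# (with the quick-setup default, COSINE, higher means more similar)
for hits in results: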
    for hit in hits:
        print(f"{hit['entity']['title']} — score: {hit['distance']:.4f}")

# Filtered search (hybrid)
results = client.search(
    collection_name="articles",
    data=[query_vector],
    filter='category == "AI"',
    limit=5,
    output_fields=["title"],
)
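Filter expressions such as 'category == "AI"' are plain strings, so quoting mistakes are easy to make when values come from user input. The helpers below are a small sketch for composing them. The names eq, any_of, and all_of are hypothetical and not part of pymilvus, and the sketch assumes that JSON-style quoting of string values is acceptable to the Milvus expression parser.

```python
import json

def eq(field, value):
    """field == value, with strings quoted and escaped JSON-style."""
    return f"{field} == {json.dumps(value, ensure_ascii=False)}"

def any_of(field, values):
    """field in [v1, v2, ...]"""
    return f"{field} in {json.dumps(list(values), ensure_ascii=False)}"

def all_of(*clauses):
    """Join clauses with 'and', parenthesizing each one."""
    return " and ".join(f"({clause})" for clause in clauses)

expr = all_of(eq("category", "AI"), any_of("id", [1, 3]))
print(expr)
```

The resulting string can be passed as the filter argument of client.search in the example above.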

REST API

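# RESTful API v2 (recent Milvus releases), served on the same port as the SDK (19530).
# Port 9091 only exposes health and metrics endpoints. All calls are POST.
BASE="http://localhost:19530/v2/vectordb"

# List collections
curl -X POST "$BASE/collections/list" \
  -H 'Content-Type: application/json' \
  -d '{}'

# Collection info
curl -X POST "$BASE/collections/describe" \
  -H 'Content-Type: application/json' \
  -d '{"collectionName": "articles"}'

# Insert
curl -X POST "$BASE/entities/insert" \
  -H 'Content-Type: application/json' \
  -d '{
    "collectionName": "articles",
    "data": [{"id": 4, "vector": [...], "title": "New Article"}]
  }'

# Search ("data" is a list of query vectors)
curl -X POST "$BASE/entities/search" \
  -H 'Content-Type: application/json' \
  -d '{
    "collectionName": "articles",
    "data": [[...]],
    "limit": 5,
    "outputFields": ["title", "category"]
  }'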
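The same search call can be issued from Python with only the standard library. This is a sketch that assumes the v2 RESTful endpoints (/v2/vectordb/...) on port 19530; the helper name build_search_request is hypothetical. The request is built but not sent, so the snippet runs without a server.

```python
import json
import urllib.request

def build_search_request(base_url, collection, vector, limit=5, output_fields=()):
    """Build (but do not send) a POST request for a single-vector search."""
    payload = {
        "collectionName": collection,
        "data": [vector],  # a list of query vectors; here just one
        "limit": limit,
        "outputFields": list(output_fields),
    }
    return urllib.request.Request(
        url=f"{base_url}/entities/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request(
    "http://localhost:19530/v2/vectordb",
    "articles",
    [0.1, 0.2, 0.3],
    output_fields=["title", "category"],
)
print(req.get_method(), req.full_url)

# To actually send it (requires a running Milvus server):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```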

RAG Pipeline Example

from openai import OpenAI
from pymilvus import MilvusClient

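openai = OpenAI()  # Reads OPENAI_API_KEY from the environment
milvus = MilvusClient(uri="http://localhost:19530")
# Assumes a "knowledge_base" collection already exists with dimension=1536
# (the default output size of text-embedding-3-small) and "text"/"source" fields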

def embed(text: str) -> list[float]:
    response = openai.embeddings.create(input=text, model="text-embedding-3-small")
    return response.data[0].embedding

def search(query: str, top_k: int = 5) -> list[dict]:
    query_vec = embed(query)
    results = milvus.search(
        collection_name="knowledge_base",
        data=[query_vec],
        limit=top_k,
        output_fields=["text", "source"],
    )
    return [hit['entity'] for hit in results[0]]

def answer(question: str) -> str:
    context = search(question)
    context_text = "\n".join([doc['text'] for doc in context])

    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context_text}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
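The pipeline above assumes the knowledge_base collection is already populated. Below is a minimal ingestion sketch: it splits a document into overlapping character chunks and builds rows in the shape used for inserts earlier in this post. The helper names, the chunk sizes, and the sequential integer ids are assumptions made for illustration. The embedding function is passed in as a parameter, so the example runs offline with a stand-in.

```python
from typing import Callable

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to `size` characters, each sharing
    `overlap` characters with the previous chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

def build_rows(text: str, source: str, embed_fn: Callable[[str], list[float]],
               size: int = 500, overlap: int = 50, start_id: int = 0) -> list[dict]:
    """Turn one document into rows ready for insertion."""
    return [
        {"id": start_id + i, "vector": embed_fn(chunk), "text": chunk, "source": source}
        for i, chunk in enumerate(chunk_text(text, size, overlap))
    ]

# Stand-in embedding for the demo; swap in a real embedding function in practice
fake_embed = lambda chunk: [float(len(chunk))]

rows = build_rows("abcdefghij", "doc.txt", fake_embed, size=4, overlap=2)
print([row["text"] for row in rows])
```

With a running server, the rows could then be stored with milvus.insert(collection_name="knowledge_base", data=build_rows(document, "handbook.md", embed)), where document and the file name are placeholders.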

Key Features

Feature	Details
Scale	Billions of vectors
Indexes	IVF, HNSW, DiskANN, GPU
Search	ANN, range, hybrid
Filtering	Scalar + vector combined
Storage	Memory, disk, tiered
Deployment	Standalone, cluster, cloud

