Just merged and released the Infinity support PR in KubeAI, adding Infinity as an embedding engine. You can now get embeddings running on your Kubernetes cluster behind an OpenAI-compatible API.
Infinity is a high-performance, low-latency embedding engine: https://github.com/michaelfeil/infinity
KubeAI is a Kubernetes Operator for running OSS ML serving engines: https://github.com/substratusai/kubeai
How to use this?
Install KubeAI on any K8s cluster:
helm repo add kubeai https://www.kubeai.org
helm install kubeai kubeai/kubeai --wait --timeout 10m
cat > model-values.yaml << EOF
catalog:
  bge-embed-text-cpu:
    enabled: true
    features: ["TextEmbedding"]
    owner: baai
    url: "hf://BAAI/bge-small-en-v1.5"
    engine: Infinity
    resourceProfile: cpu:1
    minReplicas: 1
EOF
helm install kubeai-models kubeai/models -f ./model-values.yaml
Forward the kubeai service to localhost:
kubectl port-forward svc/kubeai 8000:80
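Before requesting embeddings, you can sanity-check the deployment by listing models through the same API. A minimal sketch, assuming the port-forward above is running and that KubeAI exposes the standard OpenAI-compatible model listing under /openai/v1:

from openai import OpenAI

# Assumes port-forward of kubeai service to localhost:8000.
client = OpenAI(api_key="ignored", base_url="http://localhost:8000/openai/v1")

# List the models KubeAI currently serves; bge-embed-text-cpu
# should appear once the catalog entry above has been reconciled.
for model in client.models.list():
    print(model.id)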
You can then use the OpenAI Python client to get embeddings:
from openai import OpenAI
# Assumes port-forward of kubeai service to localhost:8000.
client = OpenAI(api_key="ignored", base_url="http://localhost:8000/openai/v1")
response = client.embeddings.create(
input="Your text goes here.",
model="bge-embed-text-cpu"
)
print(response)
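As a quick usage example, here is a sketch that embeds two strings and compares them with cosine similarity. It reuses the model name from the catalog entry above; the helper functions and sample sentences are illustrative, and the math is plain Python with no extra dependencies:

import math
from openai import OpenAI

# Assumes port-forward of kubeai service to localhost:8000.
client = OpenAI(api_key="ignored", base_url="http://localhost:8000/openai/v1")

def embed(text: str) -> list[float]:
    # Request a single embedding from the bge model deployed above.
    response = client.embeddings.create(input=text, model="bge-embed-text-cpu")
    return response.data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

a = embed("Kubernetes is a container orchestrator.")
b = embed("K8s schedules and runs containers.")
print(f"similarity: {cosine_similarity(a, b):.3f}")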
What’s next?
- Support for autoscaling based on metrics reported by Infinity.