Deploy Llama 2 AI on Kubernetes, Now

#ai #llama #kubernetes

Llama 2 is the newest openly released LLM from Meta, published under a custom license that allows commercial use.

Here are a few simple steps to try Llama 2 13B on Kubernetes with just a handful of commands.

You will need a node with about 16 vCPUs, plus a persistent volume claim (PVC) of about 10 GB, to get reasonable response times.
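
Before installing, it may help to confirm that a node actually offers that much CPU. The snippet below is only a sketch using plain kubectl; `node_capacity` is a made-up helper name, not something the chart provides.

```shell
# Hypothetical helper: list the allocatable CPU and memory of every node
# in the cluster that the current kubectl context points at.
node_capacity() {
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}
```

Run `node_capacity` and look for a node whose CPU column is close to 16 or higher.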

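# Write the Helm values for the ialacol chart
cat > values.yaml <<EOF
replicas: 1
deployment:
  image: quay.io/chenhunghan/ialacol:latest
  env:
    # Hugging Face repo and file name of the quantized model to serve
    DEFAULT_MODEL_HG_REPO_ID: TheBloke/Llama-2-13B-chat-GGML
    DEFAULT_MODEL_FILE: llama-2-13b-chat.ggmlv3.q4_0.bin
    DEFAULT_MODEL_META: ""
    # Inference settings: CPU threads, batch size, context window in tokens
    THREADS: 8
    BATCH_SIZE: 8
    CONTEXT_LENGTH: 1024
service:
  type: ClusterIP
  port: 8000
  annotations: {}
EOF
# Add the chart repository and install the release
helm repo add ialacol https://chenhunghan.github.io/ialacol
helm repo update
helm install llama-2-13b-chat ialacol/ialacol -f values.yaml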
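
The first start can take a while, most likely because the container has to fetch the model file before it can serve requests. Here is a minimal sketch for watching that progress. It assumes the chart names its Deployment after the Helm release, the same way the Service is named, and `wait_for_llama` is a made-up helper name.

```shell
# Hypothetical helper, not part of ialacol or its chart.
# $1 = Helm release name; assumes the Deployment carries the same name.
wait_for_llama() {
  release="$1"
  # Show the pods and the persistent volume claim in the current namespace.
  kubectl get pods,pvc
  # Wait until the rollout completes, or give up after 15 minutes.
  kubectl rollout status "deployment/${release}" --timeout=15m
}
```

Call it as `wait_for_llama llama-2-13b-chat`. If it stalls, `kubectl logs deployment/llama-2-13b-chat` should show what the container is doing, under the same naming assumption.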

Port forward

kubectl port-forward svc/llama-2-13b-chat 8000:8000

Talk to it

curl -X POST -H 'Content-Type: application/json' \
  -d '{ "messages": [{"role": "user", "content": "Hello, are you better than llama version one?"}], "temperature": 1, "model": "llama-2-13b-chat.ggmlv3.q4_0.bin"}' \
  http://localhost:8000/v1/chat/completions

That's it! The model replies with something like:

Hi there! I'm happy to help answer your questions. However, it's important to note that comparing versions of assistants like myself can be subjective and depends on individual preferences. Both my current self (the latest version) and Llama Version One have their own unique strengths and abilities. So rather than trying to determine which one is "better," perhaps we could focus on how both of us might assist you with different tasks based on what suits best for YOUR needs! Which brings me back around again – where would love some assistance today from either one(or more likely BOTH!) of our amazing offerings?” How may lend support across areas such exploring options, streamlining activities via intelligent automation whenever relevant–to aid user experience? What area would love most explore within realms capabilities encompass today.
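
To send more prompts without hand-editing the JSON each time, the request above could be wrapped in a small shell helper. This is a sketch, not part of ialacol: the function names `chat_payload` and `ask_llama` are made up, and it assumes the port-forward is still running on localhost:8000.

```shell
# Sketch only: both function names are hypothetical.

# Build the JSON request body for one user prompt.
# $1 = prompt text, $2 = model file name (the value of DEFAULT_MODEL_FILE)
chat_payload() {
  # Escape backslashes and double quotes so the prompt stays a valid JSON string.
  # Multi-line prompts are not handled here.
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"messages":[{"role":"user","content":"%s"}],"temperature":1,"model":"%s"}\n' \
    "$escaped" "$2"
}

# Send one prompt to the chat completions endpoint and print the raw JSON reply.
ask_llama() {
  chat_payload "$1" "llama-2-13b-chat.ggmlv3.q4_0.bin" |
    curl -s -X POST -H 'Content-Type: application/json' \
      -d @- http://localhost:8000/v1/chat/completions
}
```

Use it as `ask_llama 'What is Kubernetes?'`. If the reply follows the OpenAI response shape, which is an assumption here, piping it through `jq -r '.choices[0].message.content'` should print only the text.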

Enjoy!

The project used to deploy Llama 2 on Kubernetes is open source under the MIT license; see ialacol.

AI for Everyone!
