DEV Community

chh

Deploy Llama 2 AI on Kubernetes, Now

Llama 2 is Meta's newest openly released LLM, distributed under a custom license that permits commercial use.

Here are a few simple steps to try Llama 2 13B on Kubernetes with just a handful of commands.

You will need a node with about 16 vCPUs and a PVC of about 10GB to get a reasonable response time.

cat > values.yaml <<EOF
replicas: 1
deployment:
  image: quay.io/chenhunghan/ialacol:latest
  env:
    DEFAULT_MODEL_HG_REPO_ID: TheBloke/Llama-2-13B-chat-GGML
    DEFAULT_MODEL_FILE: llama-2-13b-chat.ggmlv3.q4_0.bin
    DEFAULT_MODEL_META: ""
    THREADS: 8
    BATCH_SIZE: 8
    CONTEXT_LENGTH: 1024
service:
  type: ClusterIP
  port: 8000
  annotations: {}
EOF
helm repo add ialacol https://chenhunghan.github.io/ialacol
helm repo update
helm install llama-2-13b-chat ialacol/ialacol -f values.yaml
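
After the install, the pod will likely need some time before it is ready, since the model file has to be available on the PVC first. A minimal sketch for checking on it, assuming the release went into the current namespace; `check_release` is a hypothetical helper name, not part of ialacol:

```shell
# Show the Helm release status, then the pods and volume claims
# in the current namespace.
# Assumption: the release lives in the current namespace and is
# named as in the install command above.
check_release() {
  release=${1:-llama-2-13b-chat}
  helm status "$release" && kubectl get pods,pvc
}
```

Run `check_release` and look for the pod to reach `Running` and the claim to reach `Bound` before moving on.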

Port forward

kubectl port-forward svc/llama-2-13b-chat 8000:8000
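
`kubectl port-forward` stays in the foreground, so the next step goes in a second terminal. As a sketch, a small polling helper can confirm that the forwarded port accepts connections before you send a request; `wait_until` is a hypothetical name, and the attempt count and delay below are arbitrary:

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
# Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
  wu_attempts=$1
  wu_delay=$2
  shift 2
  wu_i=1
  while [ "$wu_i" -le "$wu_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$wu_delay"
    wu_i=$((wu_i + 1))
  done
  return 1
}
```

For example, `wait_until 30 2 curl -s -o /dev/null http://localhost:8000/` tries every 2 seconds, up to 30 times. Without `-f`, curl exits 0 as soon as the server answers at all, so this only checks that something is listening, not that the model is loaded.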

Talk to it

curl -X POST -H 'Content-Type: application/json' \
  -d '{ "messages": [{"role": "user", "content": "Hello, are you better than llama version one?"}], "temperature":"1", "model": "llama-2-13b-chat.ggmlv3.q4_0.bin"}' \
  http://localhost:8000/v1/chat/completions
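
If you want to try more prompts, the curl call above can be wrapped in a small function. This is only a sketch: `json_escape`, `build_payload` and `ask` are hypothetical helper names, and the escaping assumes a single-line prompt without control characters.

```shell
# Escape backslashes and double quotes so the prompt can sit inside
# a JSON string. Assumption: single-line prompt, no control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the same request body as the curl call above for any prompt.
# The optional second argument overrides the model file name.
build_payload() {
  printf '{ "messages": [{"role": "user", "content": "%s"}], "temperature":"1", "model": "%s"}' \
    "$(json_escape "$1")" "${2:-llama-2-13b-chat.ggmlv3.q4_0.bin}"
}

# Send a prompt to the port-forwarded service.
ask() {
  curl -X POST -H 'Content-Type: application/json' \
    -d "$(build_payload "$1")" \
    http://localhost:8000/v1/chat/completions
}
```

Then `ask 'What is Kubernetes?'` sends a new question without editing the JSON by hand.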

That's it! The model answers with something like this:

Hi there! I'm happy to help answer your questions. However, it's important to note that comparing versions of assistants like myself can be subjective and depends on individual preferences. Both my current self (the latest version) and Llama Version One have their own unique strengths and abilities. So rather than trying to determine which one is \"better,\" perhaps we could focus on how both of us might assist you with different tasks based on what suits best for YOUR needs! Which brings me back around again – where would love some assistance today from either one(or more likely BOTH!) of our amazing offerings?” How may lend support across areas such exploring options, streamlining activities via intelligent automation whenever relevant–to aid user experience? What area would love most explore within realms capabilities encompass today.

Enjoy!

The project used to deploy Llama 2 on k8s is open source under the MIT license, see ialacol.

AI for Everyone!
