DEV Community

Karim

Posted on • Originally published at deep75.Medium on

AIOps: Debugging Your Kubernetes Cluster with Generative Artificial Intelligence via…

Advances in generative artificial intelligence, and the tools that make it simple to apply, are profoundly transforming the management of Kubernetes clusters through AIOps (Artificial Intelligence for IT Operations), a concept coined by Gartner describing the use of artificial intelligence (AI) techniques to maintain, for example, an infrastructure. One area where DevOps engineers and novice cluster operators frequently struggle is identifying, understanding, and resolving problems inside a Kubernetes cluster…

In this article, we will explore how to set up and use K8sGPT, an open-source tool built on generative AI, together with Ollama and the Falcon3 model, to identify and resolve problems in a Kubernetes cluster.

To do this, I start from an Ubuntu 24.04 LTS instance on DigitalOcean:

First, the Docker engine is installed locally…

(base) root@k8sgpt:~# curl -fsSL https://get.docker.com | sh -
# Executing docker install script, commit: 4c94a56999e10efcf48c5b8e3f6afea464f9108e
+ sh -c apt-get -qq update >/dev/null
+ sh -c DEBIAN_FRONTEND=noninteractive apt-get -y -qq install ca-certificates curl >/dev/null
Scanning processes...                                                                                                                                                                         
Scanning candidates...                                                                                                                                                                        
Scanning linux images...                                                                                                                                                                      
+ sh -c install -m 0755 -d /etc/apt/keyrings
+ sh -c curl -fsSL "https://download.docker.com/linux/ubuntu/gpg" -o /etc/apt/keyrings/docker.asc
+ sh -c chmod a+r /etc/apt/keyrings/docker.asc
+ sh -c echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu noble stable" > /etc/apt/sources.list.d/docker.list
+ sh -c apt-get -qq update >/dev/null
+ sh -c DEBIAN_FRONTEND=noninteractive apt-get -y -qq install docker-ce docker-ce-cli containerd.io docker-compose-plugin docker-ce-rootless-extras docker-buildx-plugin >/dev/null
Scanning processes...                                                                                                                                                                         
Scanning candidates...                                                                                                                                                                        
Scanning linux images...                                                                                                                                                                      
+ sh -c docker version
Client: Docker Engine - Community
 Version: 27.4.1
 API version: 1.47
 Go version: go1.22.10
 Git commit: b9d17ea
 Built: Tue Dec 17 15:45:46 2024
 OS/Arch: linux/amd64
 Context: default

Server: Docker Engine - Community
 Engine:
  Version: 27.4.1
  API version: 1.47 (minimum version 1.24)
  Go version: go1.22.10
  Git commit: c710b88
  Built: Tue Dec 17 15:45:46 2024
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.7.24
  GitCommit: 88bf19b2105c8b17560993bee28a01ddc2f97182
 runc:
  Version: 1.2.2
  GitCommit: v1.2.2-0-g7cb3632
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0

================================================================================

To run Docker as a non-privileged user, consider setting up the
Docker daemon in rootless mode for your user:

    dockerd-rootless-setuptool.sh install

Visit https://docs.docker.com/go/rootless/ to learn about rootless mode.

To run the Docker daemon as a fully privileged service, but granting non-root
users access, refer to https://docs.docker.com/go/daemon-access/

WARNING: Access to the remote API on a privileged Docker daemon is equivalent
         to root access on the host. Refer to the 'Docker daemon attack surface'
         documentation for details: https://docs.docker.com/go/attack-surface/

================================================================================

I can then launch Ollama directly via its official image to run large language models (LLMs) locally, using the premium Intel CPUs available in the Ubuntu instance:

Ollama is now available as an official Docker image · Ollama Blog


(base) root@k8sgpt:~# docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:latest
Unable to find image 'ollama/ollama:latest' locally
latest: Pulling from ollama/ollama
6414378b6477: Pull complete 
9423a26b200c: Pull complete 
629da9618c4f: Pull complete 
00b71e3f044c: Pull complete 
Digest: sha256:18bfb1d605604fd53dcad20d0556df4c781e560ebebcd923454d627c994a0e37
Status: Downloaded newer image for ollama/ollama:latest
7b09d9fcdacff4319e553c41f741a15266eb5a5ec745959363e7754c53a203ef

(base) root@k8sgpt:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7b09d9fcdacf ollama/ollama:latest "/bin/ollama serve" About a minute ago Up About a minute 0.0.0.0:11434->11434/tcp, :::11434->11434/tcp ollama


(base) root@k8sgpt:~# netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name    
tcp 0 0 0.0.0.0:11434 0.0.0.0:* LISTEN 46227/docker-proxy  
tcp 0 0 127.0.0.54:53 0.0.0.0:* LISTEN 744/systemd-resolve 
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 744/systemd-resolve 
tcp6 0 0 :::22 :::* LISTEN 1/init              
tcp6 0 0 :::11434 :::* LISTEN 46235/docker-proxy  
udp 0 0 127.0.0.54:53 0.0.0.0:* 744/systemd-resolve 
udp 0 0 127.0.0.53:53 0.0.0.0:* 744/systemd-resolve

Then I load Falcon 3, a large model developed by the Technology Innovation Institute (TII) in Abu Dhabi and available in Ollama's model library:

(base) root@k8sgpt:~# docker exec -it ollama ollama pull falcon3
pulling manifest 
pulling 3717a52b7aea... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 4.6 GB                         
pulling 803b5adc3448... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 218 B                         
pulling 58f83c52a4e3... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 13 KB                         
pulling 35e31ed4c388... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 101 B                         
pulling acb75345e14b... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 487 B                         
verifying sha256 digest 
writing manifest 
success 
(base) root@k8sgpt:~# docker exec -it ollama ollama list
NAME ID SIZE MODIFIED       
falcon3:latest 472ea1c89f64 4.6 GB 11 seconds ago
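Before wiring K8sGPT to this backend, the model can be smoke-tested directly against Ollama's REST API. A quick sketch, assuming the container above is still listening on port 11434 (the prompt is arbitrary, and `falcon3` is simply the model pulled above):

```shell
# One-off, non-streaming generation request against the local Ollama server;
# with "stream": false the reply comes back as a single JSON object whose
# "response" field holds the generated text.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "falcon3", "prompt": "What is a Kubernetes Pod?", "stream": false}'
```

If this returns an answer, the CPU-only inference path is working end to end.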

At this point I can install K8sGPT, a tool for scanning your Kubernetes clusters and diagnosing and triaging issues in plain English. It has SRE experience codified into its analyzers and helps pull out the most relevant information, enriching it with generative AI.

K8sGPT works in three steps:

  • Extraction: retrieve the configuration details of all workloads deployed in the cluster.
  • Filtering: a component called an "analyzer" filters out the relevant data.
  • Generation: the filtered data is processed to generate insights and reports in plain English.

K8sGPT

(base) root@k8sgpt:~# curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.3.48/k8sgpt_amd64.deb
  % Total % Received % Xferd Average Speed Time Time Time Current
                                 Dload Upload Total Spent Left Speed
  0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 34.5M 100 34.5M 0 0 30.5M 0 0:00:01 0:00:01 --:--:-- 30.5M
(base) root@k8sgpt:~# dpkg -i k8sgpt_amd64.deb 
(Reading database ... 74447 files and directories currently installed.)
Preparing to unpack k8sgpt_amd64.deb ...
Unpacking k8sgpt (0.3.48) over (0.3.48) ...
Setting up k8sgpt (0.3.48) ...

(base) root@k8sgpt:~# k8sgpt
Kubernetes debugging powered by AI

Usage:
  k8sgpt [command]

Available Commands:
  analyze This command will find problems within your Kubernetes cluster
  auth Authenticate with your chosen backend
  cache For working with the cache the results of an analysis
  completion Generate the autocompletion script for the specified shell
  custom-analyzer Manage a custom analyzer
  dump Creates a dumpfile for debugging issues with K8sGPT
  filters Manage filters for analyzing Kubernetes resources
  generate Generate Key for your chosen backend (opens browser)
  help Help about any command
  integration Integrate another tool into K8sGPT
  serve Runs k8sgpt as a server
  version Print the version number of k8sgpt

Flags:
      --config string Default config file (/root/.config/k8sgpt/k8sgpt.yaml)
  -h, --help help for k8sgpt
      --kubeconfig string Path to a kubeconfig. Only required if out-of-cluster.
      --kubecontext string Kubernetes context to use. Only required if out-of-cluster.

Use "k8sgpt [command] --help" for more information about a command.

The command line is then available to check which providers can be used locally on this instance:

Installation


(base) root@k8sgpt:~# k8sgpt auth list
Default: 
> openai
Active: 
Unused: 
> openai
> localai
> ollama
> azureopenai
> cohere
> amazonbedrock
> amazonsagemaker
> google
> noopai
> huggingface
> googlevertexai
> oci
> ibmwatsonxai

Ollama will serve as the AI backend provider for K8sGPT through LocalAI (which acts as a drop-in REST API compatible with the OpenAI API specification for local inference).

Here is the command to configure K8sGPT with Ollama and the Falcon3 model:

(base) root@k8sgpt:~# k8sgpt auth add --backend localai --model falcon3 --baseurl http://localhost:11434/v1
localai added to the AI backend provider list

I then spin up a managed Kubernetes cluster on DigitalOcean via DigitalOcean Kubernetes (DOKS):

DigitalOcean Managed Kubernetes | Starting at $12/mo.

Next, I retrieve the kubeconfig file from this cluster and place it locally on the Ubuntu instance for use with the kubectl client:


(base) root@k8sgpt:~# curl -LO https://dl.k8s.io/release/v1.32.0/bin/linux/amd64/kubectl && chmod +x ./kubectl && mv kubectl /usr/local/bin/ && kubectl
  % Total % Received % Xferd Average Speed Time Time Time Current
                                 Dload Upload Total Spent Left Speed
100 138 100 138 0 0 1000 0 --:--:-- --:--:-- --:--:-- 1007
100 54.6M 100 54.6M 0 0 120M 0 --:--:-- --:--:-- --:--:-- 120M
kubectl controls the Kubernetes cluster manager.

 Find more information at: https://kubernetes.io/docs/reference/kubectl/

Basic Commands (Beginner):
  create Create a resource from a file or from stdin
  expose Take a replication controller, service, deployment or pod and expose it as a new Kubernetes service
  run Run a particular image on the cluster
  set Set specific features on objects

Basic Commands (Intermediate):
  explain Get documentation for a resource
  get Display one or many resources
  edit Edit a resource on the server
  delete Delete resources by file names, stdin, resources and names, or by resources and label selector

Deploy Commands:
  rollout Manage the rollout of a resource
  scale Set a new size for a deployment, replica set, or replication controller
  autoscale Auto-scale a deployment, replica set, stateful set, or replication controller

Cluster Management Commands:
  certificate Modify certificate resources
  cluster-info Display cluster information
  top Display resource (CPU/memory) usage
  cordon Mark node as unschedulable
  uncordon Mark node as schedulable
  drain Drain node in preparation for maintenance
  taint Update the taints on one or more nodes

Troubleshooting and Debugging Commands:
  describe Show details of a specific resource or group of resources
  logs Print the logs for a container in a pod
  attach Attach to a running container
  exec Execute a command in a container
  port-forward Forward one or more local ports to a pod
  proxy Run a proxy to the Kubernetes API server
  cp Copy files and directories to and from containers
  auth Inspect authorization
  debug Create debugging sessions for troubleshooting workloads and nodes
  events List events

Advanced Commands:
  diff Diff the live version against a would-be applied version
  apply Apply a configuration to a resource by file name or stdin
  patch Update fields of a resource
  replace Replace a resource by file name or stdin
  wait Experimental: Wait for a specific condition on one or many resources
  kustomize Build a kustomization target from a directory or URL

Settings Commands:
  label Update the labels on a resource
  annotate Update the annotations on a resource
  completion Output shell completion code for the specified shell (bash, zsh, fish, or powershell)

Subcommands provided by plugins:

Other Commands:
  api-resources Print the supported API resources on the server
  api-versions Print the supported API versions on the server, in the form of "group/version"
  config Modify kubeconfig files
  plugin Provides utilities for interacting with plugins
  version Print the client and server version information

Usage:
  kubectl [flags] [options]

Use "kubectl <command> --help" for more information about a given command.
Use "kubectl options" for a list of global command-line options (applies to all commands).

(base) root@k8sgpt:~# kubectl cluster-info
Kubernetes control plane is running at https://738af175-32d4-43e9-9e31-b7ae3058be3e.k8s.ondigitalocean.com
CoreDNS is running at https://738af175-32d4-43e9-9e31-b7ae3058be3e.k8s.ondigitalocean.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
(base) root@k8sgpt:~# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
pool-kuaxj3k47-ejvl3 Ready <none> 6m40s v1.31.1 10.110.0.3 159.65.200.52 Debian GNU/Linux 12 (bookworm) 6.1.0-27-amd64 containerd://1.6.31
pool-kuaxj3k47-ejvl8 Ready <none> 6m38s v1.31.1 10.110.0.2 164.92.147.176 Debian GNU/Linux 12 (bookworm) 6.1.0-27-amd64 containerd://1.6.31
(base) root@k8sgpt:~# kubectl get po,svc -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/cilium-4gxht 1/1 Running 0 6m46s
kube-system pod/cilium-lw8gd 1/1 Running 0 6m48s
kube-system pod/coredns-c5c6457c-bnzfc 0/1 Running 0 36s
kube-system pod/coredns-c5c6457c-nz6gr 0/1 Running 0 36s
kube-system pod/cpc-bridge-proxy-ebpf-7ncbq 1/1 Running 0 55s
kube-system pod/cpc-bridge-proxy-ebpf-qth8w 1/1 Running 0 55s
kube-system pod/hubble-relay-67597fb8-kmlw5 1/1 Running 1 (51s ago) 8m40s
kube-system pod/hubble-ui-79957d9f7b-4n9kj 2/2 Running 0 74s
kube-system pod/konnectivity-agent-7ml7p 1/1 Running 0 61s
kube-system pod/konnectivity-agent-tnf8j 1/1 Running 0 61s
kube-system pod/kube-proxy-ebpf-4gt2z 1/1 Running 0 6m48s
kube-system pod/kube-proxy-ebpf-ztjql 1/1 Running 0 6m46s

NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.108.32.1 <none> 443/TCP 9m54s
kube-system service/hubble-peer ClusterIP 10.108.54.62 <none> 443/TCP 8m40s
kube-system service/hubble-relay ClusterIP 10.108.41.20 <none> 80/TCP 8m40s
kube-system service/hubble-ui ClusterIP 10.108.52.164 <none> 80/TCP 8m40s
kube-system service/kube-dns ClusterIP 10.108.32.10 <none> 53/UDP,53/TCP,9153/TCP 36s

Next comes the installation of Headlamp, an easy-to-use and extensible web UI that replaces the traditional Kubernetes dashboard.

Headlamp was created to combine the traditional features of other web UIs and dashboards (namely, listing and viewing resources) with additional functionality.

(base) root@k8sgpt:~# kubectl apply -f https://raw.githubusercontent.com/kinvolk/headlamp/main/kubernetes-headlamp.yaml
service/headlamp created
deployment.apps/headlamp created
secret/headlamp-admin created
(base) root@k8sgpt:~# kubectl get po,svc -n kube-system
NAME READY STATUS RESTARTS AGE
pod/cilium-4gxht 1/1 Running 0 11m
pod/cilium-lw8gd 1/1 Running 0 11m
pod/coredns-c5c6457c-bnzfc 1/1 Running 0 5m7s
pod/coredns-c5c6457c-nz6gr 1/1 Running 0 5m7s
pod/cpc-bridge-proxy-ebpf-7ncbq 1/1 Running 0 5m26s
pod/cpc-bridge-proxy-ebpf-qth8w 1/1 Running 0 5m26s
pod/csi-do-node-gxlmb 2/2 Running 0 4m24s
pod/csi-do-node-swqfv 2/2 Running 0 4m24s
pod/do-node-agent-7bgsh 1/1 Running 0 4m11s
pod/do-node-agent-hwt6l 1/1 Running 0 4m11s
pod/headlamp-7dfd97b98b-wmn66 1/1 Running 0 48s
pod/hubble-relay-67597fb8-kmlw5 1/1 Running 1 (5m22s ago) 13m
pod/hubble-ui-79957d9f7b-4n9kj 2/2 Running 0 5m45s
pod/konnectivity-agent-7ml7p 1/1 Running 0 5m32s
pod/konnectivity-agent-tnf8j 1/1 Running 0 5m32s
pod/kube-proxy-ebpf-4gt2z 1/1 Running 0 11m
pod/kube-proxy-ebpf-ztjql 1/1 Running 0 11m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/headlamp ClusterIP 10.108.38.247 <none> 80/TCP 48s
service/hubble-peer ClusterIP 10.108.54.62 <none> 443/TCP 13m
service/hubble-relay ClusterIP 10.108.41.20 <none> 80/TCP 13m
service/hubble-ui ClusterIP 10.108.52.164 <none> 80/TCP 13m
service/kube-dns ClusterIP 10.108.32.10 <none> 53/UDP,53/TCP,9153/TCP 5m7s

I expose it locally and retrieve the token needed to access its web interface:

(base) root@k8sgpt:~# nohup kubectl port-forward -n kube-system service/headlamp 8080:80 &
(base) root@k8sgpt:~# cat nohup.out 
Forwarding from 127.0.0.1:8080 -> 4466
Forwarding from [::1]:8080 -> 4466

(base) root@k8sgpt:~# kubectl -n kube-system create serviceaccount headlamp-admin
serviceaccount/headlamp-admin created

(base) root@k8sgpt:~# kubectl create clusterrolebinding headlamp-admin --serviceaccount=kube-system:headlamp-admin --clusterrole=cluster-admin
clusterrolebinding.rbac.authorization.k8s.io/headlamp-admin created

(base) root@k8sgpt:~# kubectl create token headlamp-admin -n kube-system
eyJhbGciOiJSUzI1NiIsImtpZCI6Ikd0UHltNHV6c1liSkY0VkhNRWFZMXJKYklqY1R6ckZrMDZQS281dEg3dUUifQ.eyJhdWQiOlsic3lzdGVtOmtvbm5lY3Rpdml0eS1zZXJ2ZXIiXSwiZXhwIjoxNzM1MzkyNDEwLCJpYXQiOjE3MzUzODg4MTAsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwianRpIjoiNTVkNjI1ZTItNjA2Yi00MTNhLTk2OTgtODFmYjdjZDU4MWY4Iiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJoZWFkbGFtcC1hZG1pbiIsInVpZCI6IjdmOGQwMTU0LWRiYWQtNGU2MS04NTUzLWU1NWI3ZWU0ZjhlOSJ9fSwibmJmIjoxNzM1Mzg4ODEwLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06aGVhZGxhbXAtYWRtaW4ifQ.jbrtuXS7uMP6HfwR3CbIbnpRTq4CDaacq0okwm_4tvmJNNcExi9-Dti3cGj1J3tteszpxVzurWPhrWgFlL4UkEacY9fD1TRH4GAZDCFldJ_jvyeaclzGeymrjEGAZ9TbBdoyuXtLeIVhApdICF1KNM-s8mfr1oOREDwlR9HzzrhoECozYxVS9uM1WIEZpum4FwMEl6cKPqOyNx1Rn5MtKPcc87JyK0FxuXzg9WC-cPSNOxu_rUFrZYHyrVapCDpl_XLymD3pFUUuB8XPVidVXcVOthH1Djwm8TRE6aAD4XlkHTcyTYchvN_CpOI2JQ6DVY60unSU8nq2pxfqLC6G2Q

Before running an analysis with K8sGPT, let's introduce a problem into the Kubernetes cluster to simulate a real-world situation. You can use sample broken deployments available in repositories such as Robusta's:

GitHub - robusta-dev/kubernetes-demos: YAMLs for creating Kubernetes errors and other scenarios

For example, deploying this broken Pod:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processing-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: payment-processing-worker
  template:
    metadata:
      labels:
        app: payment-processing-worker
    spec:
      containers:
      - name: payment-processing-container
        image: bash
        command: ["/bin/sh"]
        args: ["-c", "if [[-z \"${DEPLOY_ENV}\"]]; then echo Environment variable DEPLOY_ENV is undefined ; else while true; do echo hello; sleep 10;done; fi"]

(base) root@k8sgpt:~# kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/broken.yaml
deployment.apps/payment-processing-worker created

(base) root@k8sgpt:~# kubectl get po
NAME READY STATUS RESTARTS AGE
payment-processing-worker-747ccfb9db-dzjqx 0/1 CrashLoopBackOff 1 (11s ago) 17s

(base) root@k8sgpt:~# kubectl logs po/payment-processing-worker-747ccfb9db-dzjqx
Environment variable DEPLOY_ENV is undefined

And indeed it crashed…
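The crash is easy to reproduce outside the cluster: with `DEPLOY_ENV` unset, the script prints its message and exits, so the container terminates and Kubernetes keeps restarting it, producing the `CrashLoopBackOff`. A minimal local sketch, using the POSIX `[ -z … ]` test (the manifest's `[[-z …]]`, with its missing spaces, is part of what makes the Pod broken):

```shell
# Simulate the container's entrypoint with DEPLOY_ENV undefined: the script
# takes the "undefined" branch and exits instead of entering the while loop,
# so the container never stays up.
unset DEPLOY_ENV
/bin/sh -c 'if [ -z "${DEPLOY_ENV}" ]; then
              echo "Environment variable DEPLOY_ENV is undefined"
            else
              while true; do echo hello; sleep 10; done
            fi'
```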

Let's launch the analysis with JSON output:

(base) root@k8sgpt:~# k8sgpt analyze -o json --explain --filter=Pod --backend localai | jq .

{
  "provider": "localai",
  "errors": null,
  "status": "ProblemDetected",
  "problems": 1,
  "results": [
    {
      "kind": "Pod",
      "name": "default/payment-processing-worker-747ccfb9db-dzjqx",
      "error": [
        {
          "Text": "the last termination reason is Completed container=payment-processing-container pod=payment-processing-worker-747ccfb9db-dzjqx",
          "KubernetesDoc": "",
          "Sensitive": []
        }
      ],
      "details": "Error: The pod \"payment-processing-worker-747ccfb9db-dzjqx\" has completed its execution with a \"Completed\" termination reason, indicating the container \"payment-processing-container\" has finished successfully.\n\nSolution: Verify the logs for the container to ensure data integrity, then check related services for expected outcomes; if successful, mark the pod as ready in the cluster.",
      "parentObject": "Deployment/payment-processing-worker"
    }
  ]
}
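Because the report is plain JSON, it is straightforward to post-process. A small sketch (the heredoc is just a trimmed copy of the report above; jq is the same tool already used in the `| jq .` pipeline):

```shell
# Save a trimmed copy of the analysis report, then extract the affected
# object and its parent controller with a jq string-interpolation filter.
cat <<'EOF' > /tmp/k8sgpt-report.json
{
  "status": "ProblemDetected",
  "problems": 1,
  "results": [
    {
      "kind": "Pod",
      "name": "default/payment-processing-worker-747ccfb9db-dzjqx",
      "parentObject": "Deployment/payment-processing-worker"
    }
  ]
}
EOF
jq -r '.results[] | "\(.kind) \(.name) (\(.parentObject))"' /tmp/k8sgpt-report.json
```

This kind of one-liner makes it easy to feed K8sGPT findings into alerting or ticketing pipelines.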

Or in text mode:

(base) root@k8sgpt:~# k8sgpt analyze --explain --backend localai --with-doc
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (1/1, 7960 it/s)        
AI Provider: localai

0: Pod default/payment-processing-worker-747ccfb9db-dzjqx(Deployment/payment-processing-worker)
- Error: the last termination reason is Completed container=payment-processing-container pod=payment-processing-worker-747ccfb9db-dzjqx
Error: The pod "payment-processing-worker-747ccfb9db-dzjqx" has completed its execution with a "Completed" termination reason, indicating the container "payment-processing-container" has finished successfully.

Solution: Verify the logs for the container to ensure data integrity, then check related services for expected outcomes; if successful, mark the pod as ready in the cluster.

As this output shows, K8sGPT identified and reported the problematic pod, and also provided guidance on possible steps to understand and resolve the issue.

Another example, with this problematic Nginx Pod:

apiVersion: v1
kind: Pod
metadata:
  name: inventory-management-api
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    command:
      - wge
      - "-O"
      - "/work-dir/index.html"
      - https://home.robusta.dev

(base) root@k8sgpt:~# kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/refs/heads/main/crashloop_backoff/create_crashloop_backoff.yaml
pod/inventory-management-api created

(base) root@k8sgpt:~# kubectl get po
NAME READY STATUS RESTARTS AGE
inventory-management-api 0/1 ContainerCreating 0 5s

(base) root@k8sgpt:~# kubectl get po
NAME READY STATUS RESTARTS AGE
inventory-management-api 0/1 RunContainerError 1 (1s ago) 10s
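Here the manifest's entrypoint is `wge`, a typo for `wget`: the binary does not exist in the image, so the runtime fails to start the container, which is what surfaces as `RunContainerError` and, in the analysis below, as `StartError`. The failure mode can be sketched locally, since any POSIX shell reports an unknown command with exit status 127:

```shell
# Run the misspelled command the way the container would: the shell cannot
# find "wge" and exits with status 127 ("command not found"), analogous to
# the runtime's "executable file not found" start error.
/bin/sh -c 'wge -O /tmp/index.html https://home.robusta.dev'
echo "exit code: $?"
```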

And a new analysis pinpoints the problematic Pod…

(base) root@k8sgpt:~# k8sgpt analyze --explain --backend localai --with-doc
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (1/1, 2 it/min)         
AI Provider: localai

0: Pod default/inventory-management-api()
- Error: the last termination reason is StartError container=nginx pod=inventory-management-api
Error: The Kubernetes error indicates that there was a StartError issue with the nginx container for the pod named inventory-management-api.

Solution: 
1. Check the nginx configuration file for syntax errors.
2. Ensure all required resources and permissions are correctly set.
3. Verify network accessibility within the pod.
4. Confirm proper image pull secrets if using Docker images.
5. Review any recent changes to the deployment or service configurations.

In this other example, K8sGPT may fail to detect a problem, as in this false-positive simulation using busybox:

apiVersion: batch/v1
kind: Job
metadata:
  name: java-api-checker
spec:
  template:
    spec:
      containers:
      - name: java-beans
        image: busybox
        command: ["/bin/sh", "-c"]
        args: ["echo 'Java Network Exception: \nAll host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256 \nAll host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256 \nAll host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256 \nAll host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256'; sleep 60; exit 1"]
      restartPolicy: Never
  backoffLimit: 1

(base) root@k8sgpt:~# kubectl delete -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/refs/heads/main/crashloop_backoff/create_crashloop_backoff.yaml
pod "inventory-management-api" deleted
(base) root@k8sgpt:~# kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/refs/heads/main/job_failure/job_crash.yaml
job.batch/java-api-checker created

(base) root@k8sgpt:~# kubectl get po
NAME READY STATUS RESTARTS AGE
java-api-checker-5s6dc 1/1 Running 0 7s
(base) root@k8sgpt:~# kubectl logs po/java-api-checker-5s6dc
Java Network Exception: 
All host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256 
All host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256 
All host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256 
All host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256


(base) root@k8sgpt:~# k8sgpt analyze --explain --backend localai --with-doc
AI Provider: localai

No problems detected
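This is expected: while the container is sleeping, the pod's status is `Running`, so the Pod analyzer has nothing to flag; the `exit 1` only fails the Job a minute later. The job's behaviour can be sketched locally without the sleep (the echoed text stands in for the fake stack trace from the manifest); re-running the analysis after the Job has actually failed, or activating additional analyzers via `k8sgpt filters` (check `k8sgpt filters list` for what your version ships), gives the problem a chance to surface:

```shell
# The container prints its fake error log, then terminates with a non-zero
# exit code -- which is what eventually marks the Job as failed.
/bin/sh -c "echo 'All host(s) tried for db query failed (tried: prod-db:3333)'; exit 1"
echo "exit code: $?"
```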

For a more complete integration, you can install the K8sGPT operator in your Kubernetes cluster. The operator continuously monitors the cluster for problems and generates insights that you can view by querying the operator's custom resource (CR).

GitHub - k8sgpt-ai/k8sgpt-operator: Automatic SRE Superpowers within your Kubernetes cluster

$ helm repo add k8sgpt https://charts.k8sgpt.ai/
$ helm repo update
$ helm install release k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --create-namespace

# Install the K8sGPT operator
$ kubectl apply -n k8sgpt-operator-system -f - << EOF
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-ollama
spec:
  ai:
    enabled: true
    model: falcon3
    backend: localai
    baseUrl: http://localhost:11434/v1
  noCache: false
  filters: ["Pod"]
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.48
EOF
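Once the operator is running, the insights it generates are stored as `Result` custom resources, so they can be queried like any other Kubernetes object. A sketch based on the operator's Result CRD (the actual result names depend on what has been analyzed in your cluster):

```shell
# List the analysis results produced by the K8sGPT operator, then dump them
# as JSON to read the generated explanations in full.
kubectl get results -n k8sgpt-operator-system
kubectl get results -n k8sgpt-operator-system -o json | jq .
```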

Debugging your Rancher Kubernetes Cluster the GenAI Way w...

It is also possible to analyze several Kubernetes clusters by specifying the path to the relevant kubeconfig file:

$ k8sgpt analyze --explain --backend localai --with-doc --kubeconfig <path to the kubeconfig file>

The operator will look for problems in the cluster and generate analysis results. Depending on the power of your machine (GPU resources are needed to speed up Ollama's response times), it takes the operator a while to call the LLM and produce the insights…

To conclude, K8sGPT combined with Ollama offers a powerful solution for debugging and managing Kubernetes clusters efficiently. This integration uses artificial intelligence to provide clear insights and recommendations for resolving problems, making life easier for cluster operators. By following these steps, you can set up an automated, AI-based diagnostic solution for your Kubernetes environment…

To be continued!
