<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yash Panchal</title>
    <description>The latest articles on DEV Community by Yash Panchal (@panchalhimself).</description>
    <link>https://dev.to/panchalhimself</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F269371%2F2f3f3b78-a8ba-4173-93ae-9a5a7db77850.png</url>
      <title>DEV Community: Yash Panchal</title>
      <link>https://dev.to/panchalhimself</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/panchalhimself"/>
    <language>en</language>
    <item>
      <title>Understanding the significance of the CPU in the age of GPU</title>
      <dc:creator>Yash Panchal</dc:creator>
      <pubDate>Tue, 03 Mar 2026 15:15:47 +0000</pubDate>
      <link>https://dev.to/panchalhimself/understanding-the-significance-of-the-cpu-in-the-age-of-gpu-5ah5</link>
      <guid>https://dev.to/panchalhimself/understanding-the-significance-of-the-cpu-in-the-age-of-gpu-5ah5</guid>
      <description>&lt;h2&gt;
  
  
  Why does the CPU matter?
&lt;/h2&gt;

&lt;p&gt;With the exponential growth of AI models, the &lt;strong&gt;primary compute component in focus&lt;/strong&gt; right now is the &lt;strong&gt;GPU&lt;/strong&gt;. Most people are out there chasing the latest GPU benchmarks and talking about how to use GPUs. However, amid this hype, the significance of the CPU is being undermined.&lt;/p&gt;

&lt;p&gt;Most new folks (students and new engineers) emphasise GPU benchmarks more than understanding the system as a whole.&lt;/p&gt;

&lt;p&gt;While GPU capacity matters a lot, we are reaching a point where &lt;strong&gt;compute is cheap while loading and unloading data is becoming expensive in terms of time&lt;/strong&gt;.&lt;/p&gt;
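&lt;p&gt;To put rough numbers on this, here is a back-of-envelope sketch. The model size and bandwidth figures are assumptions (loosely based on public NVMe and PCIe Gen4 specs), not measurements:&lt;/p&gt;

```python
# Back-of-envelope cost of moving model data before any compute happens.
# All figures below are assumed round numbers, not measurements.

MODEL_GB = 140.0     # e.g. a 70B-parameter model in FP16 (2 bytes per param)
NVME_GBPS = 7.0      # assumed NVMe read bandwidth (disk to CPU memory)
PCIE_GBPS = 32.0     # assumed PCIe Gen4 x16 bandwidth (CPU to GPU memory)

disk_to_cpu_s = MODEL_GB / NVME_GBPS   # time reading weights off the disk
cpu_to_gpu_s = MODEL_GB / PCIE_GBPS    # time copying weights over PCIe

print(f"disk to CPU memory: {disk_to_cpu_s:.1f} s")   # 20.0 s
print(f"CPU to GPU memory: {cpu_to_gpu_s:.1f} s")     # 4.4 s
```

&lt;p&gt;Tens of seconds of pure data movement before a single useful GPU cycle runs, which is why the CPU-side data path matters.&lt;/p&gt;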

&lt;h3&gt;
  
  
  CPUs are still a critical component
&lt;/h3&gt;

&lt;p&gt;Let us understand why the CPU is essential to any AI workload.&lt;/p&gt;

&lt;p&gt;Understanding the flow of an open-source model inference:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Downloading the model from the internet (it gets stored on the hard drive)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Running the AI workload:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The stored model is moved from the hard drive to &lt;strong&gt;CPU&lt;/strong&gt; memory first&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;CPU&lt;/strong&gt; then moves the model from &lt;strong&gt;CPU&lt;/strong&gt; memory to GPU memory&lt;/li&gt;
&lt;li&gt;The model needs input parameters; these are transferred from &lt;strong&gt;CPU&lt;/strong&gt; memory to GPU memory.&lt;/li&gt;
&lt;li&gt;Execution of the workload (the &lt;strong&gt;CPU&lt;/strong&gt; tells the GPU to use the stored model and start execution)&lt;/li&gt;
&lt;li&gt;After execution of the workload, the output resides in GPU memory.&lt;/li&gt;
&lt;li&gt;This output is moved from GPU memory back to &lt;strong&gt;CPU&lt;/strong&gt; memory.&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;Steps 1 and 2 are one-time steps, while steps 3 to 6 happen on every inference request.&lt;/p&gt;
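&lt;p&gt;The steps above can be sketched as a toy bookkeeping exercise (plain Python, no real GPU involved), which makes the one-time vs per-request split explicit:&lt;/p&gt;

```python
# Toy sketch of the inference flow above: steps 1 and 2 (weight loading)
# happen once, steps 3 to 6 (inputs in, kernel launch, outputs back)
# happen on every request. Pure bookkeeping, no real GPU is touched.

transfers = []

def copy(src, dst, what):
    transfers.append((src, dst, what))   # record each move the CPU drives

# One-time setup, orchestrated by the CPU
copy("disk", "cpu_mem", "weights")       # step 1: hard drive to CPU memory
copy("cpu_mem", "gpu_mem", "weights")    # step 2: CPU memory to GPU memory

# Steady state: every request repeats steps 3 to 6
for request in range(3):
    copy("cpu_mem", "gpu_mem", "inputs")   # step 3: inputs to GPU memory
    # step 4: CPU launches the kernel; step 5: output lands in GPU memory
    copy("gpu_mem", "cpu_mem", "outputs")  # step 6: outputs back to CPU memory

weight_moves = sum(1 for t in transfers if t[2] == "weights")
per_request_moves = len(transfers) - weight_moves
print(weight_moves, per_request_moves)   # prints: 2 6
```

&lt;p&gt;The weight transfers happen twice in total, while the input/output transfers scale with the number of requests; every one of these moves is driven by the CPU.&lt;/p&gt;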

&lt;p&gt;In the entire flow above, the &lt;strong&gt;CPU&lt;/strong&gt; calls the shots.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>machinelearning</category>
      <category>performance</category>
    </item>
    <item>
      <title>Beyond nvidia-smi part — 1</title>
      <dc:creator>Yash Panchal</dc:creator>
      <pubDate>Thu, 19 Feb 2026 18:02:43 +0000</pubDate>
      <link>https://dev.to/panchalhimself/beyond-nvidia-smi-part-1-26le</link>
      <guid>https://dev.to/panchalhimself/beyond-nvidia-smi-part-1-26le</guid>
      <description>&lt;h2&gt;
  
  
  Common pitfalls and methods to measure GPU efficiency.
&lt;/h2&gt;

&lt;p&gt;This will be a two-part series on monitoring a GPU.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;GPUs are much simpler compared to CPUs&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;A GPU is a bunch of simple, compute-intensive units in massive proportion relative to the memory and decoder components within the same chipset.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;GPU = Compute (Massive portion of the chip) + Memory + Decoders&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Compute is commonly referred to as Streaming Multiprocessors (SMs) in the case of NVIDIA GPUs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Monitoring normal servers vs GPUs:
&lt;/h3&gt;

&lt;p&gt;One can monitor a normal Linux server using utilities like htop/top, or using exporters like node-exporter, to get a basic usage picture: CPU %, RAM, number of vCPUs, etc.&lt;/p&gt;

&lt;p&gt;However, GPUs are not that straightforward to monitor.&lt;/p&gt;

&lt;p&gt;A common bad habit is to use the nvidia-smi utility just like we use htop/top.&lt;/p&gt;

&lt;p&gt;Though it will work for identifying whether the GPU is doing any work at all, it won't give you any idea of the efficiency of your GPU.&lt;/p&gt;

&lt;h3&gt;
  
  
  So what are these common mistakes?
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Relying solely on nvidia-smi
&lt;/h4&gt;

&lt;p&gt;nvidia-smi is a nice utility; however, when it comes to monitoring your GPU metrics, it is myopic.&lt;/p&gt;

&lt;p&gt;The GPU-Util % that you see in nvidia-smi tells you whether any CUDA kernel is active at a point in time. It will not help you identify the efficiency of your GPU.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Furtblva5m8y8a8iaq0od.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Furtblva5m8y8a8iaq0od.png" alt="nvidia-smi showing 100%" width="800" height="318"&gt;&lt;/a&gt;&lt;br&gt;
Figure A taken from my FOSDEM 26 talk: Beyond nvidia-smi: Tools for Real GPU Performance Metrics &lt;a href="https://fosdem.org/2026/schedule/event/BBYZLU-gpu-performance-monitoring/" rel="noopener noreferrer"&gt;https://fosdem.org/2026/schedule/event/BBYZLU-gpu-performance-monitoring/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even if you were doing a simple matrix multiplication on an H100 GPU, you would still see GPU-Util at 100%.&lt;/p&gt;

&lt;p&gt;This might lead a newbie to conclude that the GPU is being fully utilized.&lt;/p&gt;

&lt;p&gt;One such example is running the Wan2.2 TI2V 5B-parameter video generation model on an 80GB H100 GPU: you might see 23GB of VRAM usage, but the 100% GPU-Util in nvidia-smi will make you question the efficiency of your GPU compute.&lt;/p&gt;

&lt;p&gt;Case of FP16 vs FP32 performance:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmonnmj31ydacpbrsmg2j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmonnmj31ydacpbrsmg2j.png" alt="nvidia-smi @ 100% for fp32 vs fp16" width="800" height="455"&gt;&lt;/a&gt;&lt;br&gt;
Figures B &amp;amp; C taken from my FOSDEM 26 talk: Beyond nvidia-smi: Tools for Real GPU Performance Metrics &lt;a href="https://fosdem.org/2026/schedule/event/BBYZLU-gpu-performance-monitoring/" rel="noopener noreferrer"&gt;https://fosdem.org/2026/schedule/event/BBYZLU-gpu-performance-monitoring/&lt;/a&gt;. Figures B and C both show 100% GPU-Util for simple FP16 and FP32 matrix multiplication Python scripts.&lt;/p&gt;

&lt;p&gt;Both are showing 100%. But wait! What exactly is being used on our GPU?&lt;/p&gt;
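&lt;p&gt;A quick way to see the gap is to compare achieved FLOP/s against the chip's peak. The numbers below are assumptions for illustration: the peak figure is taken roughly from the H100 whitepaper, and the elapsed time is hypothetical, not a measurement:&lt;/p&gt;

```python
# 100% GPU-Util does not mean 100% of the chip's math throughput is used.
# Assumed numbers: the peak is roughly the H100 SXM dense FP16 Tensor Core
# figure from the whitepaper; the elapsed time is hypothetical.

N = 8192
matmul_flops = 2 * N ** 3            # FLOPs in one N x N by N x N matmul

elapsed_s = 0.10                     # hypothetical measured kernel wall time
achieved_tflops = matmul_flops / elapsed_s / 1e12

PEAK_TFLOPS = 989.0                  # assumed H100 dense FP16 Tensor Core peak
utilization = achieved_tflops / PEAK_TFLOPS

print(f"achieved: {achieved_tflops:.1f} TFLOP/s")       # 11.0 TFLOP/s
print(f"compute utilization: {utilization:.1%}")        # 1.1%
```

&lt;p&gt;So nvidia-smi can report 100% GPU-Util while the chip delivers around 1% of its peak math throughput: GPU-Util measures activity, not efficiency.&lt;/p&gt;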




&lt;h4&gt;
  
  
  2. Not identifying workload relevant metrics to monitor
&lt;/h4&gt;

&lt;p&gt;Identifying what your GPU supports for your workload is essential to speeding up your inference.&lt;/p&gt;

&lt;p&gt;If you are aware that your GPU supports Tensor Cores, and your workload can be modified to utilize them, it can significantly speed up your inference.&lt;/p&gt;

&lt;p&gt;Before we get into identifying the relevant metrics to monitor, we first need to understand what our GPU actually supports.&lt;/p&gt;

&lt;p&gt;Basic performance specs of a GPU are generally available in the vendor's whitepaper.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9m9oul08q6ywavlngh46.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9m9oul08q6ywavlngh46.png" alt="H100 GTC22 Whitepaper" width="800" height="480"&gt;&lt;/a&gt;&lt;br&gt;
Table 1 taken from the H100 GTC22 whitepaper&lt;/p&gt;

&lt;p&gt;As we can see from Table 1 for our H100,&lt;br&gt;
there are two divisions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tensor Cores Performance (Explicitly mentioned as Tensor Core)&lt;/li&gt;
&lt;li&gt;CUDA Cores Performance&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Tensor Cores are significantly faster (~8–10x) than CUDA Cores.&lt;/p&gt;

&lt;p&gt;So, ideally, we should use Tensor Cores whenever possible; doing so will significantly improve inference performance.&lt;/p&gt;

&lt;p&gt;In our case, the FP16 workload should use Tensor Cores and no CUDA Cores, while the FP32 workload should use CUDA Cores and no Tensor Cores.&lt;/p&gt;

&lt;p&gt;While nvidia-smi shows us 100% in both cases, we would like to know whether our workload is using the Tensor Cores.&lt;br&gt;
nvidia-smi is not able to show us that info, so what option do we have now?&lt;/p&gt;

&lt;p&gt;&lt;code&gt;DCGM is our saviour !&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;DCGM stands for Data Center GPU Manager; it is a suite of utilities provided by NVIDIA for monitoring GPUs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvsl9c667iqdsl51x2bsq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvsl9c667iqdsl51x2bsq.png" alt="DCGM GPU monitoring" width="800" height="449"&gt;&lt;/a&gt;&lt;br&gt;
Figure B &amp;amp; C monitored using DCGM&lt;/p&gt;

&lt;p&gt;We can now see that Tensor Cores are not used in the FP32 workload, while the FP16 workload shows Tensor Core usage.&lt;/p&gt;

&lt;p&gt;This is what helps us understand our workload in depth.&lt;/p&gt;
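&lt;p&gt;As a sketch of how you might consume this in practice: DCGM's dcgmi dmon can stream profiling fields such as SM active and Tensor pipe active, and a few lines of Python can flag an idle Tensor pipe. The sample text and column layout below are illustrative assumptions, not captured output:&lt;/p&gt;

```python
# Sketch: reading Tensor Core activity from DCGM dmon-style output.
# The sample below is hypothetical; real output would come from something
# like "dcgmi dmon -e 1002,1004" (SM active / Tensor pipe active fields).

SAMPLE = """\
#Entity  SMACT  TENSO
GPU 0    0.98   0.00
GPU 0    0.97   0.00
"""

def tensor_active(dmon_text):
    """Average the TENSO column; near zero means Tensor Cores sit idle."""
    values = []
    for line in dmon_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "GPU":
            values.append(float(parts[-1]))   # last column is TENSO here
    return sum(values) / len(values)

print(tensor_active(SAMPLE))   # prints: 0.0
```

&lt;p&gt;A busy SM column with a flat Tensor column is exactly the 'FP32 at 100% GPU-Util' case above: the chip is active, but its fastest units are idle.&lt;/p&gt;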

&lt;p&gt;Using nvidia-smi, we were not able to monitor whether Tensor Cores were used or not.&lt;br&gt;
I recently gave a talk at FOSDEM 2026 regarding this: &lt;a href="https://fosdem.org/2026/schedule/event/BBYZLU-gpu-performance-monitoring/" rel="noopener noreferrer"&gt;https://fosdem.org/2026/schedule/event/BBYZLU-gpu-performance-monitoring/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>gpu</category>
      <category>ai</category>
      <category>performance</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Kubernetes components and their usage</title>
      <dc:creator>Yash Panchal</dc:creator>
      <pubDate>Mon, 19 Jun 2023 05:58:54 +0000</pubDate>
      <link>https://dev.to/panchalhimself/kubernetes-components-and-their-usage-1kf8</link>
      <guid>https://dev.to/panchalhimself/kubernetes-components-and-their-usage-1kf8</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Basic Kubernetes components and their utility.&lt;/strong&gt;
&lt;/h2&gt;




&lt;h2&gt;
  
  
  &lt;em&gt;K8s Components&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;k8s-control-plane:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Responsible for managing the cluster.&lt;/li&gt;
&lt;li&gt;Can be on a single machine or span multiple servers (usually across dedicated controller machines).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Kubernetes Node (Control Plane Node)
&lt;/h2&gt;

&lt;p&gt;The control plane node is the backbone of the entire Kubernetes cluster.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Following are the 5 main control plane components:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;kube-api-server:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Serves the K8s API&lt;/li&gt;
&lt;li&gt;The primary interface for interacting with the k8s cluster and a central component.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;etcd:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Highly available (HA) backend data store for the k8s cluster.&lt;/li&gt;
&lt;li&gt;The data stored relates to the state of the cluster.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;kube-scheduler&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handles scheduling: selecting the node on which to run containers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;kube-controller-manager&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single process&lt;/li&gt;
&lt;li&gt;Runs a collection of controller utilities.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;controllers:&lt;/strong&gt; execute various automation-related tasks inside the cluster.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
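&lt;p&gt;The controller idea run by kube-controller-manager can be sketched as a reconciliation loop: observe actual state, compare it with desired state, and act to converge the two. This is illustrative Python, not real Kubernetes API objects:&lt;/p&gt;

```python
# Toy sketch of the controller pattern: compare desired state with actual
# state and act to converge them. Names here are made up for illustration.

desired_replicas = 3
running_pods = ["nginx-a"]           # actual state, observed via the API server

def reconcile():
    diff = desired_replicas - len(running_pods)
    for i in range(diff):            # too few pods: start the missing ones
        running_pods.append(f"nginx-new-{i}")
    for _ in range(-diff):           # too many pods: stop the extras
        running_pods.pop()

reconcile()
print(len(running_pods))   # prints: 3, actual state now matches desired state
```

&lt;p&gt;Real controllers run this loop continuously against the state stored in etcd, which is how the cluster self-heals after a pod or node failure.&lt;/p&gt;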

&lt;p&gt;&lt;strong&gt;cloud-controller-manager&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provides interfaces between the cluster and cloud providers.&lt;/li&gt;
&lt;li&gt;Useful only when you need to interact with cloud providers from the cluster.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These components can be on the same server or spread across multiple servers.&lt;/p&gt;

&lt;p&gt;For HA requirements, these components can have more than a single instance running simultaneously across multiple servers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Kubernetes Node (Worker Nodes)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Containers managed by the cluster run here.&lt;/li&gt;
&lt;li&gt;A cluster can have any number of worker nodes.&lt;/li&gt;
&lt;li&gt;Has node components on the server that communicate with the control plane via the &lt;em&gt;kube-api-server&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Node components:&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;kubelet&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The k8s agent&lt;/li&gt;
&lt;li&gt;Ensures that containers are run on its node.&lt;/li&gt;
&lt;li&gt;Communicates with the control plane and follows its commands.&lt;/li&gt;
&lt;li&gt;Reports node status and various info about the running containers to the control plane.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;container-runtime&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The container software for the actual container processes.&lt;/li&gt;
&lt;li&gt;e.g. Docker and containerd&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;kube-proxy&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A network proxy&lt;/li&gt;
&lt;li&gt;Runs on each node&lt;/li&gt;
&lt;li&gt;Handles networking between containers and servers in the cluster&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>containers</category>
      <category>kubectl</category>
    </item>
    <item>
      <title>local k8s setup using vagrant.</title>
      <dc:creator>Yash Panchal</dc:creator>
      <pubDate>Mon, 12 Jun 2023 16:52:47 +0000</pubDate>
      <link>https://dev.to/panchalhimself/local-k8s-setup-using-vagrant-2f1p</link>
      <guid>https://dev.to/panchalhimself/local-k8s-setup-using-vagrant-2f1p</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;Steps to initialize k8s cluster&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Initialize the k8s cluster on node1&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;       kubeadm init --apiserver-advertise-address=192.168.1.101 --pod-network-cidr=192.168.1.0/24 --v=5


       IPADDR="192.168.1.101"
       NODENAME=$(hostname -s)
       POD_CIDR="192.168.1.0/16"

       sudo kubeadm init --apiserver-advertise-address=$IPADDR  --apiserver-cert-extra-sans=$IPADDR  --pod-network-cidr=$POD_CIDR --node-name $NODENAME --ignore-preflight-errors Swap

    To start using your cluster, you need to run the following as a regular user:

       mkdir -p $HOME/.kube
       sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
       sudo chown $(id -u):$(id -g) $HOME/.kube/config

    Alternatively, if you are the root user, you can run:

       export KUBECONFIG=/etc/kubernetes/admin.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Setup Calico/WEAVE Net networking interface&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use any one of the following CNI setups; I used Weave Net.&lt;/p&gt;

&lt;p&gt;Calico&lt;br&gt;
&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;       kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.0/manifests/calico.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Weavenet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;       kubectl apply -f https://github.com/weaveworks/weave/releases/download/v2.8.1/weave-daemonset-k8s.yaml

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Join the other nodes to node1&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On each of the other nodes, paste the join command output by node 1.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;&lt;em&gt;Vagrantfile and Scripts for setup&lt;/em&gt;&lt;/strong&gt;
&lt;/h2&gt;




&lt;p&gt;Ensure that the Vagrantfile, setup-k8s.sh, master.sh, and worker.sh are in the same folder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vagrantfile&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
    Vagrant.configure("2") do |config|
      config.vm.define "db1" do |db1|
        db1.vm.box = "ubuntu/focal64"
        db1.vm.hostname = 'db1'
        db1.vm.box_url = "ubuntu/focal64"
        db1.vm.provision "shell", path: "setup-k8s.sh"
        db1.vm.provision "shell", path: "master.sh"
        db1.vm.network "public_network", ip: '192.168.1.101', bridge: 'enp11s0'
        db1.vm.provider :virtualbox do |v|
          v.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
          v.customize ["modifyvm", :id, "--memory", 2000]
          v.customize ["modifyvm", :id, "--name", "db1"]
        end
      end

      config.vm.define "db2" do |db2|
        db2.vm.box = "ubuntu/focal64"
        db2.vm.hostname = 'db2'
        db2.vm.box_url = "ubuntu/focal64"
        db2.vm.provision "shell", path: "setup-k8s.sh"
        db2.vm.provision "shell", path: "worker.sh"
        db2.vm.network "public_network", ip: '192.168.1.102', bridge: 'enp11s0'
        db2.vm.provider :virtualbox do |v|
          v.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
          v.customize ["modifyvm", :id, "--memory", 2000]
          v.customize ["modifyvm", :id, "--name", "db2"]

        end
      end

      config.vm.define "db3" do |db3|
        db3.vm.box = "ubuntu/focal64"
        db3.vm.hostname = 'db3'
        db3.vm.box_url = "ubuntu/focal64"
        db3.vm.provision "shell", path: "setup-k8s.sh"
        db3.vm.provision "shell", path: "worker.sh"
        db3.vm.network "public_network", ip: '192.168.1.103', bridge: 'enp11s0'
        db3.vm.provider :virtualbox do |v|
          v.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
          v.customize ["modifyvm", :id, "--memory", 2000]
          v.customize ["modifyvm", :id, "--name", "db3"]

        end
      end
    end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;setup-k8s.sh&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    #!/bin/bash

    set +xe

    yes y | ssh-keygen -q -t rsa -N '' &amp;gt;/dev/null

    echo "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC4ns0dEv0sJV+rMDftaaTDwsj2y0hf0/vOsPepy+YJzFW4B8dgTa75bN12uexH78Xcth06MkOCiB3iOuIkoxEcQx8JMUiUCiIpNSWTTTjxu4zhx6k68Fw6eczbbBoXenNO6i7lCB1rXsd2NO4JgOEMobi6IzdkOXINV3LX5Pu3zrbxOKSeTIKnVEt3kK0/yrvCEKAg8lyGIuZ6Xh6zOLkbhQGpWDNexQa8kx4K/2QN98dNWAFktihcy1UOZJ4ha17MEsDRxyNb5lixWurv23/BpjbaiywpQbmZ+hAfS3wN2hxMSuP4pwkoCiRBvQjT7fD5jeMJ3YiYVv56VBbf0TAAcLentCowfzEdwPYyExma0J0PXmregNPlaw38KcmlSmUfXn77XRIgJ70aAcq3MscsqlKpIN7AYYbTBuDj/7ENpI8dsJarNWmeHMlfoi0mwI9izPnJim3XODdGWAZlV0CXvG2NpmzASxuKYrf8occNtyjjrD/Fn5DBHuD6PbJn8KE= yash@yash-ThinkPad-P15-Gen-2i" &amp;gt;&amp;gt; ~/.ssh/authorized_keys

    cat &amp;lt;&amp;lt;EOF &amp;gt;&amp;gt; /etc/hosts
    192.168.1.101 db1
    192.168.1.102 db2
    192.168.1.103 db3
    EOF
    sudo swapoff -a
    sudo apt-get install -y apt-transport-https 
    cat &amp;lt;&amp;lt;EOF | sudo tee /etc/modules-load.d/k8s.conf
    overlay
    br_netfilter
    EOF
    sudo modprobe overlay
    sudo modprobe br_netfilter
    # sysctl params required by setup, params persist across reboots
    cat &amp;lt;&amp;lt;EOF | sudo tee /etc/sysctl.d/k8s.conf
    net.bridge.bridge-nf-call-iptables  = 1
    net.bridge.bridge-nf-call-ip6tables = 1
    net.ipv4.ip_forward                 = 1
    EOF
    # Apply sysctl params without reboot
    sudo sysctl --system
    sudo apt-get update

    for pkg in docker.io docker-doc docker-compose podman-docker containerd runc; do sudo apt-get remove $pkg; done

    sudo apt-get update
    sudo apt-get install ca-certificates curl gnupg -y
    sudo install -m 0755 -d /etc/apt/keyrings
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
    sudo chmod a+r /etc/apt/keyrings/docker.gpg
    echo \
    "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
    "$(. /etc/os-release &amp;amp;&amp;amp; echo "$VERSION_CODENAME")" stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list &amp;gt; /dev/null

    sudo apt-get update

    sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y



    curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
    cat &amp;lt;&amp;lt; EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
    deb https://apt.kubernetes.io/ kubernetes-xenial main
    EOF
    sudo apt-get update
    sudo apt-get install -y kubelet=1.26.0-00 kubeadm=1.26.0-00 kubectl=1.26.0-00
    sudo apt-mark hold kubelet kubeadm kubectl

    sudo rm /etc/containerd/config.toml
    #https://stackoverflow.com/questions/72504257/i-encountered-when-executing-kubeadm-init-error-issue
    sudo systemctl restart containerd

    apt-get install net-tools -y
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;master.sh&lt;/strong&gt; (Setup to provision and initialize the k8s control node)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    #!/bin/bash

    IPADDR="192.168.1.101"
    NODENAME=$(hostname -s)
    POD_CIDR="192.168.1.0/16"

    sudo kubeadm init --apiserver-advertise-address=$IPADDR  --apiserver-cert-extra-sans=$IPADDR  --pod-network-cidr=$POD_CIDR --node-name $NODENAME --ignore-preflight-errors Swap


    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config


    echo "Setting up Weaveworks Network"

    kubectl apply -f https://github.com/weaveworks/weave/releases/download/v2.8.1/weave-daemonset-k8s.yaml

    kubectl get nodes
    kubectl get pods --all-namespaces
    kubectl cluster-info

    kubeadm token create --print-join-command &amp;gt; index.html

    docker run -dit -p 80:80 -v ./index.html:/usr/share/nginx/html/index.html nginx

    cat ~/.kube/config &amp;gt; index-config.html

    docker run -dit -p 8080:80 -v ./index-config.html:/usr/share/nginx/html/index.html nginx



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;worker.sh&lt;/strong&gt; (Setup to provision the worker nodes)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

    #!/bin/bash

    JOIN_COMMAND=$(curl db1:80)

    echo "Following is the Join Command:- $JOIN_COMMAND"

    $JOIN_COMMAND

    curl db1:8080 &amp;gt; /etc/kubernetes/kubelet-admin.conf

    echo "KUBECONFIG=/etc/kubernetes/kubelet-admin.conf" &amp;gt;&amp;gt; /etc/environment

    source /etc/environment

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Start the VMs and provision them using setup-k8s.sh with the following command:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    vagrant up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;SSH into the VMs using the following command:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    vagrant ssh db1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; You might need to replace &lt;strong&gt;enp11s0&lt;/strong&gt; with your own network interface, along with the CIDR ranges (here: 192.168.1.0/24).&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;&lt;em&gt;Executing kubectl commands from worker nodes&lt;/em&gt;&lt;/strong&gt; 
&lt;/h2&gt;

&lt;p&gt;By default, the kubeconfig file of the control node is not present on the worker nodes.&lt;br&gt;
This means that you can only administer the cluster from the control node.&lt;/p&gt;

&lt;p&gt;In order to administer the cluster from the worker nodes as well, you will need to do the following configuration:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Ensure that the kubectl config file from the control plane node is copied to the worker nodes.&lt;/p&gt;

&lt;p&gt;It is usually present at ~/.kube/config on the controller node.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copy the control node kubeconfig file data and paste it into a config file on the worker node.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Set the path using export KUBECONFIG=PATH/TO/YOUR/CONFIG (the file can be located anywhere)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In my case, I overrode the /etc/kubernetes/kubelet.conf file with the kubeconfig of the control node.&lt;/p&gt;


&lt;h2&gt;
  
  
  &lt;strong&gt;&lt;em&gt;Exposing the service (sample nginx application)&lt;/em&gt;&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;First, download the sample manifest of the nginx application created by nonanom.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   curl -lo nginx.yaml "https://gist.githubusercontent.com/nonanom/498b913a69cede7037d55e28bb00344e/raw"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;nginx.yaml&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    kind: Service
    apiVersion: v1
    metadata:
    name: nginx
    labels:
       app: nginx
    spec:
    selector:
       app: nginx
    ports:
    - port: 80
       protocol: TCP
       targetPort: 80
    type: ClusterIP
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
    name: nginx
    labels:
       app: nginx
    spec:
    replicas: 1
    selector:
       matchLabels:
          app: nginx
    template:
       metadata:
          labels:
          app: nginx
       spec:
          containers:
          - name: nginx
          image: nginx:latest
          imagePullPolicy: Always
          ports:
          - containerPort: 80
             protocol: TCP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Deploy the manifest in the cluster&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl apply --filename nginx.yaml&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Expose the service (Portforward the service)&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl port-forward service/nginx --address 0.0.0.0 8080:80&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; the node from which you execute the port-forward command is the node from which you can access the application service.&lt;/p&gt;

&lt;p&gt;Reference: &lt;a href="https://nonanom.medium.com/run-nginx-on-kubernetes-ee6ea937bc99" rel="noopener noreferrer"&gt;https://nonanom.medium.com/run-nginx-on-kubernetes-ee6ea937bc99&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;&lt;em&gt;Github repo for reference&lt;/em&gt;&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/panchal-yash/vagrant-k8s-setup" rel="noopener noreferrer"&gt;https://github.com/panchal-yash/vagrant-k8s-setup&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>tutorial</category>
      <category>beginners</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
