If you've ever seen an error like "CUDA error: no kernel image is available for execution on the device" or "minimum required Cuda capability is 3.5" — you've run into Compute Capability issues. Here's everything you need to know.
## What Is Compute Capability?
CUDA Compute Capability (CC) is a version number assigned to every NVIDIA GPU that identifies its architecture and supported feature set. It's NOT a performance score.
Format: Major.Minor
- Major = GPU architecture generation
- Minor = incremental improvements within that generation
- GeForce GTX 1080 → CC 6.1 (Pascal)
- GeForce RTX 3090 → CC 8.6 (Ampere)
- GeForce RTX 4090 → CC 8.9 (Ada Lovelace)
- H100 → CC 9.0 (Hopper)
- B200 → CC 10.0 (Blackwell)
- RTX 5090 → CC 12.0 (consumer Blackwell)
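The major digit alone tells you the architecture family; the minor digit disambiguates within a generation. A throwaway sketch of that mapping (covers only the generations discussed in this post):

```python
# Architecture family from a CC string, per the examples above.
# The minor version matters within a generation:
# 7.0 = Volta vs 7.5 = Turing; 8.0/8.6 = Ampere vs 8.9 = Ada.
def arch_name(cc: str) -> str:
    major, minor = (int(x) for x in cc.split("."))
    if major == 7:
        return "Turing" if minor >= 5 else "Volta"
    if major == 8:
        return "Ada Lovelace" if minor >= 9 else "Ampere"
    return {5: "Maxwell", 6: "Pascal", 9: "Hopper",
            10: "Blackwell", 12: "Blackwell"}.get(major, "unknown")
```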
## Why It Matters
### 1. Framework compatibility
Modern ML frameworks have minimum CC requirements:
| Framework | Minimum CC | What's excluded |
|---|---|---|
| PyTorch 2.x | 3.7 (prebuilt wheels: 5.0) | Most Kepler; K80 only via source build |
| TensorFlow 2.15+ | 5.0 | All Kepler |
| JAX (recent) | 5.2 | Kepler and first-gen Maxwell (5.0) |
| Flash Attention 2 | 8.0 | Everything before Ampere |
If your GPU's CC is below the minimum, the framework will not use it — you'll silently fall back to CPU or get a hard error.
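A quick preflight check against these minimums can save a silent CPU fallback. A minimal sketch (the `MINIMUM_CC` values mirror the table above; the function names are my own, not a framework API):

```python
# Compare CC versions numerically: "10.0" must sort above "9.0",
# so compare (major, minor) integer tuples, never raw strings.
def parse_cc(cc: str) -> tuple:
    major, minor = cc.split(".")
    return int(major), int(minor)

MINIMUM_CC = {            # mirrors the table above
    "pytorch-2.x": "3.7",
    "tensorflow-2.15": "5.0",
    "jax": "5.2",
    "flash-attention-2": "8.0",
}

def supported(device_cc: str, framework: str) -> bool:
    return parse_cc(device_cc) >= parse_cc(MINIMUM_CC[framework])
```

For example, `supported("8.6", "flash-attention-2")` is `True` for an RTX 3090, while a Turing card at 7.5 fails the same check.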
### 2. Feature availability
Each CC level unlocks hardware features:
| CC | Architecture | Key ML Features |
|---|---|---|
| 5.0-5.2 | Maxwell | Basic CUDA, cuDNN |
| 6.0-6.1 | Pascal | FP16 compute, unified memory |
| 7.0 | Volta | Tensor Cores (1st gen), WMMA |
| 7.5 | Turing | INT8/INT4 Tensor Cores, mixed precision |
| 8.0 | Ampere | BF16, TF32, sparse Tensor Cores, 3rd gen |
| 8.6 | Ampere (consumer) | Same features, fewer SMs |
| 8.9 | Ada Lovelace | FP8, 4th gen Tensor Cores |
| 9.0 | Hopper | Transformer Engine, FP8 matmul, DPX |
| 10.0 | Blackwell (datacenter) | 5th gen Tensor Cores, FP4 |
| 12.0 | Blackwell (consumer) | Same generation, RTX 50 series |
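In practice, this table boils down to picking the fastest dtype your hardware supports. A hedged sketch of that decision using the cutoffs above (illustrative only; real frameworks probe the device at runtime, and FP8 training additionally needs library support such as Transformer Engine):

```python
def fastest_training_dtype(major: int, minor: int) -> str:
    """Pick a reasonable mixed-precision dtype from the CC feature table.
    Illustrative only -- not how any framework actually decides."""
    cc = (major, minor)
    if cc >= (8, 9):
        return "fp8"    # Ada / Hopper / Blackwell: FP8 Tensor Cores
    if cc >= (8, 0):
        return "bf16"   # Ampere: BF16 + TF32
    if cc >= (7, 0):
        return "fp16"   # Volta / Turing: FP16 Tensor Cores
    return "fp32"       # Maxwell / Pascal: full precision is safest
```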
### 3. Compilation targets
When you compile CUDA code (or when PyTorch ships prebuilt binaries), it targets specific CC versions:
```shell
# Compile for multiple architectures
nvcc -gencode arch=compute_80,code=sm_80 \
     -gencode arch=compute_86,code=sm_86 \
     -gencode arch=compute_89,code=sm_89 \
     my_kernel.cu
```
PyTorch wheels on PyPI typically include CC 5.0, 6.0, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0. If your GPU isn't covered, you may need to build from source.
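Whether a prebuilt binary runs on your GPU comes down to whether its embedded arch list covers your CC. A simplified sketch of that check, using the `sm_XX`/`compute_XX` naming from the `nvcc` flags above (the rules here are deliberately simplified: a `sm_XX` cubin needs an exact match, while `compute_XX` PTX can be JIT-compiled for that CC or newer):

```python
def cc_to_num(cc: str) -> int:
    """'8.6' -> 86, matching the sm_86 / compute_86 naming."""
    major, minor = cc.split(".")
    return int(major) * 10 + int(minor)

def binary_covers(device_cc: str, arch_list: list) -> bool:
    """Simplified coverage check: sm_XX needs an exact match;
    compute_XX (embedded PTX) can JIT for an equal or newer CC."""
    num = cc_to_num(device_cc)
    for arch in arch_list:
        kind, _, code = arch.partition("_")
        if kind == "sm" and int(code) == num:
            return True
        if kind == "compute" and int(code) <= num:
            return True
    return False
```

On a real install, `torch.cuda.get_arch_list()` reports this list for your PyTorch build, in exactly this format.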
## How to Check Your GPU's CC
### nvidia-smi (easiest, no CUDA toolkit needed)

```shell
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
# Output: 8.6
```
### Python (PyTorch)

```python
import torch
major, minor = torch.cuda.get_device_capability()
print(f"Compute Capability: {major}.{minor}")
```
### Python (TensorFlow)

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    details = tf.config.experimental.get_device_details(gpu)
    print(details.get('compute_capability'))
```
### C++ (CUDA Runtime)

```cpp
#include <cstdio>
#include <cuda_runtime.h>

cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);
printf("CC: %d.%d\n", prop.major, prop.minor);
```
### Lookup table
Don't have the GPU installed yet? The CUDA Compute Capability table on gpuark.com covers every NVIDIA GPU from Kepler to Blackwell.
## Common CC-Related Errors and Fixes
### "no kernel image is available for execution on the device"
Your PyTorch/TensorFlow binary wasn't compiled for your GPU's CC. Fix:
```shell
# Install PyTorch with the right CUDA version
pip install torch --index-url https://download.pytorch.org/whl/cu124
```
Or build from source targeting your exact CC (PyTorch's documented source-build path, since no source distribution ships on PyPI):

```shell
git clone --recursive https://github.com/pytorch/pytorch && cd pytorch
TORCH_CUDA_ARCH_LIST="8.6" python setup.py install
```
### "minimum required Cuda capability is X.X"
Your GPU is too old for the framework version. Options:
- Use an older framework version
- Upgrade your GPU
- Use CPU mode:
```shell
CUDA_VISIBLE_DEVICES="" python train.py
```
### Flash Attention requires CC ≥ 8.0
Flash Attention 2 only works on Ampere (RTX 3000) and newer. For older GPUs:
```shell
# Use xformers instead (supports CC ≥ 6.0)
pip install xformers
```

```python
# Or use PyTorch's built-in SDPA, which picks a kernel your GPU supports
from torch.nn.functional import scaled_dot_product_attention
```
## Practical Advice for GPU Shopping
When buying a GPU for ML:
- Minimum CC 7.5 (Turing) for mixed precision training — gives you Tensor Cores
- CC 8.0+ (Ampere) strongly recommended — BF16, Flash Attention, much better ML performance
- CC 8.9 (Ada) for bleeding-edge features like FP8 quantization-aware training
- VRAM matters more than CC in most cases — a 3090 (CC 8.6, 24GB) beats a 4070 (CC 8.9, 12GB) for LLMs
CC tells you what features your GPU supports. VRAM tells you how big a model fits. Both matter, but for LLM inference, VRAM is usually the bottleneck.
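The VRAM side of that tradeoff is easy to sanity-check with arithmetic: weights take params × bytes-per-param, plus headroom for KV cache and activations. A rough sketch (the 1.2× overhead factor is my own ballpark assumption, not a measured number):

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}

def fits_in_vram(n_params_billions: float, dtype: str,
                 vram_gb: float, overhead: float = 1.2) -> bool:
    """Rough inference-fit check: weight size * overhead vs. VRAM.
    The 1.2x overhead for KV cache/activations is a ballpark guess."""
    weights_gb = n_params_billions * BYTES_PER_PARAM[dtype]
    return weights_gb * overhead <= vram_gb

# A 7B model in fp16 is ~14 GB of weights: fits a 24 GB RTX 3090,
# not a 12 GB RTX 4070.
```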
What GPU are you running your ML workloads on? Have you hit CC compatibility issues? Let me know in the comments!