DEV Community

Max Vyaznikov
CUDA Compute Capability: What It Is and Why It Matters for ML Engineers

If you've ever seen an error like "CUDA error: no kernel image is available for execution on the device" or "minimum required Cuda capability is 3.5" — you've run into Compute Capability issues. Here's everything you need to know.

What Is Compute Capability?

CUDA Compute Capability (CC) is a version number assigned to every NVIDIA GPU that identifies its architecture and supported feature set. It's NOT a performance score.

Format: Major.Minor

  • Major = GPU architecture generation
  • Minor = incremental improvements within that generation
GeForce GTX 1080  → CC 6.1 (Pascal)
GeForce RTX 3090  → CC 8.6 (Ampere)
GeForce RTX 4090  → CC 8.9 (Ada Lovelace)
H100              → CC 9.0 (Hopper)
RTX 5090          → CC 12.0 (Blackwell)
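Because the version is Major.Minor, compare it as a pair of integers, never as a string: lexicographic string comparison breaks as soon as two-digit major versions appear. A minimal sketch (the `parse_cc` helper is illustrative, not a library API):

```python
def parse_cc(cc: str) -> tuple[int, int]:
    """Parse a compute capability string like '8.6' into (major, minor)."""
    major, minor = cc.split(".")
    return int(major), int(minor)

# Tuple comparison orders versions correctly:
print(parse_cc("10.0") > parse_cc("9.0"))  # True
# String comparison does not, because '1' sorts before '9':
print("10.0" > "9.0")                      # False
```

This is why most tooling (including PyTorch's `get_device_capability`) hands you a tuple rather than a float or a string.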

Why It Matters

1. Framework compatibility

Modern ML frameworks have minimum CC requirements:

Framework           Minimum CC   What's excluded
PyTorch 2.x         3.7          Kepler below the K80 (the K80 itself is CC 3.7)
TensorFlow 2.15+    5.0          All Kepler
JAX (latest)        5.2          Kepler and CC 5.0 Maxwell
Flash Attention 2   8.0          Everything before Ampere

If your GPU's CC is below the minimum, the framework will not use it: depending on the setup, you either silently fall back to CPU or hit a hard error at startup.
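The minimums above can be encoded as a quick preflight check before launching a training job. A hedged sketch, where the dictionary mirrors the table and the function name is mine, not any framework's API:

```python
# Minimum compute capability per framework, as (major, minor) tuples
FRAMEWORK_MIN_CC = {
    "pytorch_2x": (3, 7),
    "tensorflow_2.15": (5, 0),
    "jax": (5, 2),
    "flash_attention_2": (8, 0),
}

def is_supported(framework: str, cc: tuple[int, int]) -> bool:
    """Return True if a GPU with the given CC meets the framework's minimum."""
    return cc >= FRAMEWORK_MIN_CC[framework]
```

For example, an RTX 3090 (CC 8.6) passes the Flash Attention 2 check, while a GTX 1080 (CC 6.1) does not.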

2. Feature availability

Each CC level unlocks hardware features:

CC        Architecture             Key ML Features
5.0-5.3   Maxwell                  Basic CUDA, cuDNN
6.0-6.1   Pascal                   FP16 compute, unified memory
7.0       Volta                    Tensor Cores (1st gen), WMMA
7.5       Turing                   INT8/INT4 Tensor Cores, mixed precision
8.0       Ampere                   BF16, TF32, sparse Tensor Cores (3rd gen)
8.6       Ampere (consumer)        Same features, fewer SMs
8.9       Ada Lovelace             FP8, 4th gen Tensor Cores
9.0       Hopper                   Transformer Engine, FP8 matmul, DPX
10.0      Blackwell (datacenter)   5th gen Tensor Cores, FP4
12.0      Blackwell (consumer)     5th gen Tensor Cores, FP4
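Because each feature has a CC threshold, a feature check reduces to a tuple comparison. A minimal sketch of the idea, with an illustrative subset of the table (the function and feature names are mine):

```python
def ml_features(cc: tuple[int, int]) -> set[str]:
    """Return hardware features unlocked at or above each CC threshold.

    Covers a subset of the feature table above; thresholds are (major, minor).
    """
    gates = {
        (7, 0): "tensor_cores",
        (8, 0): "bf16_tf32",
        (8, 9): "fp8",
    }
    return {name for threshold, name in gates.items() if cc >= threshold}
```

So an RTX 3090 (CC 8.6) gets Tensor Cores and BF16/TF32 but not FP8, and a GTX 1080 (CC 6.1) gets none of them.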

3. Compilation targets

When you compile CUDA code (or when PyTorch ships prebuilt binaries), it targets specific CC versions:

# Compile for multiple architectures
nvcc -gencode arch=compute_80,code=sm_80 \
     -gencode arch=compute_86,code=sm_86 \
     -gencode arch=compute_89,code=sm_89 \
     my_kernel.cu

PyTorch wheels on PyPI typically include CC 5.0, 6.0, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0. If your GPU isn't covered, you may need to build from source.
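On a CUDA build of PyTorch you can list the compiled-in targets with `torch.cuda.get_arch_list()`, which returns entries like `'sm_86'`. A rough coverage check can then be a string lookup; note this sketch ignores PTX forward compatibility (a `compute_XY` entry can still JIT-compile for newer GPUs), so it may report false negatives:

```python
def wheel_covers_gpu(arch_list: list[str], cc: tuple[int, int]) -> bool:
    """Check whether a binary ships a native kernel for this compute capability.

    arch_list is shaped like torch.cuda.get_arch_list() output, e.g.
    ['sm_80', 'sm_86', 'sm_90']. PTX forward compatibility is ignored.
    """
    major, minor = cc
    return f"sm_{major}{minor}" in arch_list
```

With the typical wheel list above, an RTX 3090 (8.6) is covered natively, while a hypothetical CC outside the list would need a source build.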

How to Check Your GPU's CC

nvidia-smi (easiest, no CUDA toolkit needed)

nvidia-smi --query-gpu=compute_cap --format=csv,noheader
# Output: 8.6

Python (PyTorch)

import torch
major, minor = torch.cuda.get_device_capability()
print(f"Compute Capability: {major}.{minor}")

Python (TensorFlow)

import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    details = tf.config.experimental.get_device_details(gpu)
    print(details.get('compute_capability'))

C++ (CUDA Runtime)

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("CC: %d.%d\n", prop.major, prop.minor);
}

Lookup table

Don't have the GPU installed yet? The CUDA Compute Capability table on gpuark.com covers every NVIDIA GPU from Kepler to Blackwell.

Common CC-Related Errors and Fixes

"no kernel image is available for execution on the device"

Your PyTorch/TensorFlow binary wasn't compiled for your GPU's CC. Fix:

# Install PyTorch with the right CUDA version
pip install torch --index-url https://download.pytorch.org/whl/cu124

Or build from source with your CC:

TORCH_CUDA_ARCH_LIST="8.6" pip install torch --no-binary torch

"minimum required Cuda capability is X.X"

Your GPU is too old for the framework version. Options:

  1. Use an older framework version
  2. Upgrade your GPU
  3. Use CPU mode: CUDA_VISIBLE_DEVICES="" python train.py

Flash Attention requires CC ≥ 8.0

Flash Attention 2 only works on Ampere (RTX 30-series) and newer GPUs. For older GPUs:

# Use xformers instead (supports CC ≥ 6.0)
pip install xformers
# Or use PyTorch's built-in SDPA
from torch.nn.functional import scaled_dot_product_attention
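The fallback chain above can be expressed as a small backend picker keyed on compute capability. A sketch using the thresholds quoted in this section; the function and backend names are mine, not a real dispatch API:

```python
def pick_attention_backend(cc: tuple[int, int]) -> str:
    """Choose an attention implementation based on compute capability."""
    if cc >= (8, 0):
        return "flash_attention_2"  # Ampere and newer
    if cc >= (6, 0):
        return "xformers"           # Pascal through Turing
    return "sdpa_math"              # PyTorch's built-in math fallback
```

Libraries like Hugging Face Transformers apply essentially this logic when you request an attention implementation that the hardware cannot run.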

Practical Advice for GPU Shopping

When buying a GPU for ML:

  1. Minimum CC 7.5 (Turing) for mixed precision training — gives you Tensor Cores
  2. CC 8.0+ (Ampere) strongly recommended — BF16, Flash Attention, much better ML performance
  3. CC 8.9 (Ada) for bleeding-edge features like FP8 quantization-aware training
  4. VRAM matters more than CC in most cases — a 3090 (CC 8.6, 24GB) beats a 4070 (CC 8.9, 12GB) for LLMs

CC tells you what features your GPU supports. VRAM tells you how big a model fits. Both matter, but for LLM inference, VRAM is usually the bottleneck.
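For the "how big a model fits" question, a back-of-the-envelope rule: model weights alone take parameters × bytes per parameter (2 bytes for FP16/BF16), before KV cache, activations, and framework overhead. A rough sketch:

```python
def weights_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate VRAM for model weights alone (FP16/BF16 = 2 bytes/param).

    Excludes KV cache, activations, and framework overhead, so treat
    the result as a lower bound.
    """
    return n_params * bytes_per_param / 2**30
```

A 7B-parameter model in FP16 needs roughly 13 GiB just for weights, which is already more than a 12 GB 4070 can hold and comfortably inside a 24 GB 3090.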


What GPU are you running your ML workloads on? Have you hit CC compatibility issues? Let me know in the comments!
