If you've ever seen an error like "CUDA error: no kernel image is available for execution on the device" or "minimum required Cuda capability is 3.5" — you've run into Compute Capability issues. Here's everything you need to know.
## What Is Compute Capability?
CUDA Compute Capability (CC) is a version number assigned to every NVIDIA GPU that identifies its architecture and supported feature set. It's NOT a performance score.
Format: Major.Minor
- Major = GPU architecture generation
- Minor = incremental improvements within that generation
- GeForce GTX 1080 → CC 6.1 (Pascal)
- GeForce RTX 3090 → CC 8.6 (Ampere)
- GeForce RTX 4090 → CC 8.9 (Ada Lovelace)
- H100 → CC 9.0 (Hopper)
- B200 → CC 10.0 (Blackwell)
- RTX 5090 → CC 12.0 (consumer Blackwell)
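The major digit alone tells you the architecture family; the minor digit disambiguates within a generation. A throwaway sketch of that mapping (covers only the generations discussed in this post):

```python
# Architecture family from a CC string, per the examples above.
# The minor version matters within a generation:
# 7.0 = Volta vs 7.5 = Turing; 8.0/8.6 = Ampere vs 8.9 = Ada.
def arch_name(cc: str) -> str:
    major, minor = (int(x) for x in cc.split("."))
    if major == 7:
        return "Turing" if minor >= 5 else "Volta"
    if major == 8:
        return "Ada Lovelace" if minor >= 9 else "Ampere"
    return {5: "Maxwell", 6: "Pascal", 9: "Hopper",
            10: "Blackwell", 12: "Blackwell"}.get(major, "unknown")
```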
## Why It Matters
### 1. Framework compatibility
Modern ML frameworks have minimum CC requirements:
| Framework | Minimum CC | What's excluded |
|---|---|---|
| PyTorch 2.x | 3.7 (prebuilt wheels: 5.0) | Most Kepler; K80 only via source build |
| TensorFlow 2.15+ | 5.0 | All Kepler |
| JAX (recent) | 5.2 | Kepler and first-gen Maxwell (5.0) |
| Flash Attention 2 | 8.0 | Everything before Ampere |
If your GPU's CC is below the minimum, the framework will not use it — you'll silently fall back to CPU or get a hard error.
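A quick preflight check against these minimums can save a silent CPU fallback. A minimal sketch (the `MINIMUM_CC` values mirror the table above; the function names are my own, not a framework API):

```python
# Compare CC versions numerically: "10.0" must sort above "9.0",
# so compare (major, minor) integer tuples, never raw strings.
def parse_cc(cc: str) -> tuple:
    major, minor = cc.split(".")
    return int(major), int(minor)

MINIMUM_CC = {            # mirrors the table above
    "pytorch-2.x": "3.7",
    "tensorflow-2.15": "5.0",
    "jax": "5.2",
    "flash-attention-2": "8.0",
}

def supported(device_cc: str, framework: str) -> bool:
    return parse_cc(device_cc) >= parse_cc(MINIMUM_CC[framework])
```

For example, `supported("8.6", "flash-attention-2")` is `True` for an RTX 3090, while a Turing card at 7.5 fails the same check.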
### 2. Feature availability
Each CC level unlocks hardware features:
| CC | Architecture | Key ML Features |
|---|---|---|
| 5.0-5.2 | Maxwell | Basic CUDA, cuDNN |
| 6.0-6.1 | Pascal | FP16 compute, unified memory |
| 7.0 | Volta | Tensor Cores (1st gen), WMMA |
| 7.5 | Turing | INT8/INT4 Tensor Cores, mixed precision |
| 8.0 | Ampere | BF16, TF32, sparse Tensor Cores, 3rd gen |
| 8.6 | Ampere (consumer) | Same features, fewer SMs |
| 8.9 | Ada Lovelace | FP8, 4th gen Tensor Cores |
| 9.0 | Hopper | Transformer Engine, FP8 matmul, DPX |
| 10.0 | Blackwell (datacenter) | 5th gen Tensor Cores, FP4 |
| 12.0 | Blackwell (consumer) | Same generation, RTX 50 series |
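In practice, this table boils down to picking the fastest dtype your hardware supports. A hedged sketch of that decision using the cutoffs above (illustrative only; real frameworks probe the device at runtime, and FP8 training additionally needs library support such as Transformer Engine):

```python
def fastest_training_dtype(major: int, minor: int) -> str:
    """Pick a reasonable mixed-precision dtype from the CC feature table.
    Illustrative only -- not how any framework actually decides."""
    cc = (major, minor)
    if cc >= (8, 9):
        return "fp8"    # Ada / Hopper / Blackwell: FP8 Tensor Cores
    if cc >= (8, 0):
        return "bf16"   # Ampere: BF16 + TF32
    if cc >= (7, 0):
        return "fp16"   # Volta / Turing: FP16 Tensor Cores
    return "fp32"       # Maxwell / Pascal: full precision is safest
```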
### 3. Compilation targets
When you compile CUDA code (or when PyTorch ships prebuilt binaries), it targets specific CC versions:
```shell
# Compile for multiple architectures
nvcc -gencode arch=compute_80,code=sm_80 \
     -gencode arch=compute_86,code=sm_86 \
     -gencode arch=compute_89,code=sm_89 \
     my_kernel.cu
```
PyTorch wheels on PyPI typically include CC 5.0, 6.0, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0. If your GPU isn't covered, you may need to build from source.
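Whether a prebuilt binary runs on your GPU comes down to whether its embedded arch list covers your CC. A simplified sketch of that check, using the `sm_XX`/`compute_XX` naming from the `nvcc` flags above (the rules here are deliberately simplified: a `sm_XX` cubin needs an exact match, while `compute_XX` PTX can be JIT-compiled for that CC or newer):

```python
def cc_to_num(cc: str) -> int:
    """'8.6' -> 86, matching the sm_86 / compute_86 naming."""
    major, minor = cc.split(".")
    return int(major) * 10 + int(minor)

def binary_covers(device_cc: str, arch_list: list) -> bool:
    """Simplified coverage check: sm_XX needs an exact match;
    compute_XX (embedded PTX) can JIT for an equal or newer CC."""
    num = cc_to_num(device_cc)
    for arch in arch_list:
        kind, _, code = arch.partition("_")
        if kind == "sm" and int(code) == num:
            return True
        if kind == "compute" and int(code) <= num:
            return True
    return False
```

On a real install, `torch.cuda.get_arch_list()` reports this list for your PyTorch build, in exactly this format.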
## How to Check Your GPU's CC
### nvidia-smi (easiest, no CUDA toolkit needed)

```shell
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
# Output: 8.6
```
### Python (PyTorch)

```python
import torch
major, minor = torch.cuda.get_device_capability()
print(f"Compute Capability: {major}.{minor}")
```
### Python (TensorFlow)

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    details = tf.config.experimental.get_device_details(gpu)
    print(details.get('compute_capability'))
```
### C++ (CUDA Runtime)

```cpp
#include <cstdio>
#include <cuda_runtime.h>

cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);
printf("CC: %d.%d\n", prop.major, prop.minor);
```
### Lookup table
Don't have the GPU installed yet? The CUDA Compute Capability table on gpuark.com covers every NVIDIA GPU from Kepler to Blackwell.
## Common CC-Related Errors and Fixes
### "no kernel image is available for execution on the device"
Your PyTorch/TensorFlow binary wasn't compiled for your GPU's CC. Fix:
```shell
# Install PyTorch with the right CUDA version
pip install torch --index-url https://download.pytorch.org/whl/cu124
```
Or build from source targeting your exact CC (PyTorch's documented source-build path, since no source distribution ships on PyPI):

```shell
git clone --recursive https://github.com/pytorch/pytorch && cd pytorch
TORCH_CUDA_ARCH_LIST="8.6" python setup.py install
```
### "minimum required Cuda capability is X.X"
Your GPU is too old for the framework version. Options:
- Use an older framework version
- Upgrade your GPU
- Use CPU mode:
```shell
CUDA_VISIBLE_DEVICES="" python train.py
```
### Flash Attention requires CC ≥ 8.0
Flash Attention 2 only works on Ampere (RTX 3000) and newer. For older GPUs:
```shell
# Use xformers instead (supports CC ≥ 6.0)
pip install xformers
```

```python
# Or use PyTorch's built-in SDPA, which picks a kernel your GPU supports
from torch.nn.functional import scaled_dot_product_attention
```
## Practical Advice for GPU Shopping
When buying a GPU for ML:
- Minimum CC 7.5 (Turing) for mixed precision training — gives you Tensor Cores
- CC 8.0+ (Ampere) strongly recommended — BF16, Flash Attention, much better ML performance
- CC 8.9 (Ada) for bleeding-edge features like FP8 quantization-aware training
- VRAM matters more than CC in most cases — a 3090 (CC 8.6, 24GB) beats a 4070 (CC 8.9, 12GB) for LLMs
CC tells you what features your GPU supports. VRAM tells you how big a model fits. Both matter, but for LLM inference, VRAM is usually the bottleneck.
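The VRAM side of that tradeoff is easy to sanity-check with arithmetic: weights take params × bytes-per-param, plus headroom for KV cache and activations. A rough sketch (the 1.2× overhead factor is my own ballpark assumption, not a measured number):

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}

def fits_in_vram(n_params_billions: float, dtype: str,
                 vram_gb: float, overhead: float = 1.2) -> bool:
    """Rough inference-fit check: weight size * overhead vs. VRAM.
    The 1.2x overhead for KV cache/activations is a ballpark guess."""
    weights_gb = n_params_billions * BYTES_PER_PARAM[dtype]
    return weights_gb * overhead <= vram_gb

# A 7B model in fp16 is ~14 GB of weights: fits a 24 GB RTX 3090,
# not a 12 GB RTX 4070.
```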
What GPU are you running your ML workloads on? Have you hit CC compatibility issues? Let me know in the comments!