TL;DR: We built fastrad, a PyTorch-native Python library that extracts all 8 IBSI-standardised radiomic feature classes from medical images, 25× faster than PyRadiomics on GPU and with numerically identical results. It's open-source, pip-installable, and a drop-in replacement.
```bash
pip install fastrad
```
The Problem: Radiomics is Slow
Radiomics — the extraction of quantitative features from CT and MRI scans — is increasingly central to oncology research. Radiomic signatures have been used to predict treatment response, prognosis, and tumour phenotype across lung, head-and-neck, and many other cancer types.
The standard tool for this is PyRadiomics, developed at Dana-Farber / Brigham and Women's Hospital. It's robust, well-validated, and widely adopted. But it has one significant limitation: it runs entirely on CPU, and it's slow.
On a modern 32-thread workstation, PyRadiomics takes ~3 seconds per scan. That might sound fine — until you're processing thousands of scans for a multi-cohort clinical study, or iterating rapidly over radiomic feature spaces in an ML pipeline. At that scale, extraction time becomes the bottleneck.
Introducing fastrad
fastrad is a GPU-native Python library that reimplements the full PyRadiomics feature set as native PyTorch tensor operations. Everything — from DICOM ingestion to feature output — runs on torch.Tensor objects, with transparent auto, cuda, and cpu device routing.
```python
from fastrad import RadiomicsFeatureExtractor

extractor = RadiomicsFeatureExtractor(device="auto")  # uses GPU if available
features = extractor.execute(image_path, mask_path)
```
The API is intentionally familiar. If you've used PyRadiomics, there's nothing new to learn.
What's Covered
fastrad implements all 8 IBSI-standardised feature classes:
| Feature Class | Features | Description |
|---|---|---|
| First-order statistics | 18 | Intensity distribution: mean, entropy, kurtosis, etc. |
| Shape (3D) | 14 | Volume, surface area, sphericity, compactness |
| Shape (2D) | — | Per-slice axial shape descriptors |
| GLCM | 24 | Grey-Level Co-occurrence Matrix |
| GLRLM | 16 | Grey-Level Run-Length Matrix |
| GLSZM | 16 | Grey-Level Size-Zone Matrix |
| GLDM | 14 | Grey-Level Dependence Matrix |
| NGTDM | 5 | Neighbourhood Grey-Tone Difference Matrix |
This is the complete PyRadiomics feature set — not just the easy classes. Prior GPU-accelerated alternatives covered at most 2 of these 8 classes.
Performance
Benchmarked on an NVIDIA RTX 4070 Ti against PyRadiomics on a real NSCLC CT from the TCIA dataset:
| Configuration | Time (s) | Speedup |
|---|---|---|
| PyRadiomics (1 thread) | 2.90 | 1× |
| PyRadiomics (32 threads) | 2.90 | 1× |
| fastrad CPU (1 thread) | 1.10 | 2.6× |
| fastrad GPU | 0.116 | 25× |
PyRadiomics does not benefit from multi-threading at the feature computation level — 32 threads gives essentially no speedup over 1. fastrad single-thread CPU already outperforms it by 2.6×, and GPU extraction is 25× faster end-to-end.
Per-class GPU speedups range from 12.9× (GLRLM) to 49.3× (first-order):
| Class | PyRadiomics (s) | fastrad GPU (s) | GPU speedup |
|---|---|---|---|
| First-order | 0.408 | 0.008 | 49.3× |
| Shape | 0.411 | 0.012 | 35.0× |
| GLCM | 0.418 | 0.021 | 19.9× |
| GLRLM | 0.414 | 0.032 | 12.9× |
| GLSZM | 0.413 | 0.018 | 22.5× |
| GLDM | 0.421 | 0.011 | 37.2× |
| NGTDM | 0.412 | 0.013 | 31.7× |
At 0.116 s per scan, a single RTX 4070 Ti can process roughly 500 scans per minute, enough to run a multi-site trial cohort in minutes rather than hours.
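To put the end-to-end numbers in cohort terms, here is a quick back-of-the-envelope calculation using the per-scan timings from the benchmark table (the 10,000-scan cohort size is hypothetical):

```python
# Cohort-time arithmetic from the per-scan timings in the benchmark table.
PYRADIOMICS_S = 2.90    # seconds per scan (CPU)
FASTRAD_GPU_S = 0.116   # seconds per scan (RTX 4070 Ti)
COHORT = 10_000         # hypothetical multi-site cohort size

cpu_hours = COHORT * PYRADIOMICS_S / 3600
gpu_minutes = COHORT * FASTRAD_GPU_S / 60
print(f"PyRadiomics: {cpu_hours:.1f} h, fastrad GPU: {gpu_minutes:.1f} min")
```

At these timings, a cohort that would tie up a CPU workstation for a full working day finishes on a single consumer GPU in under half an hour.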
Apple Silicon
On an M3 MacBook Air (CPU-only), fastrad is 3.56× faster than 8-thread PyRadiomics, thanks to PyTorch's ARM NEON vectorisation.
ROI size scaling
Speedup is maintained across all clinically relevant nodule sizes:
| Radius | Voxels | Speedup |
|---|---|---|
| 5 mm | 199 | 25.9× |
| 15 mm | 8,263 | 18.9× |
| 30 mm | 67,461 | 9.7× |
Even at 30 mm — representative of large solid pulmonary nodules — fastrad GPU retains a 9.7× advantage.
Numerical Validation
Speed means nothing if the numbers are wrong. Radiomic features go into clinical research and ML models, so numerical correctness is non-negotiable.
IBSI Phase 1 compliance
fastrad was validated against the Image Biomarker Standardisation Initiative (IBSI) Phase 1 digital phantom — the gold-standard compliance benchmark for radiomics tools. Across all 105 reference features:
- Maximum absolute relative deviation: 3.20 × 10⁻¹⁴% (machine epsilon)
- 0 features outside the 1% compliance threshold
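Concretely, a maximum-relative-deviation check of this kind takes only a few lines (an illustrative helper, not fastrad's internal validation code):

```python
def max_relative_deviation(computed, reference):
    """Largest |computed - reference| / |reference| over paired feature values."""
    return max(abs(c - r) / abs(r) for c, r in zip(computed, reference))

# Toy values: agreement at ~1e-15 sits far inside a 1% (0.01) threshold.
reference = [1.0, 42.0, -3.5]
computed = [1.0 + 1e-15, 42.0, -3.5]
assert max_relative_deviation(computed, reference) < 0.01
```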
PyRadiomics parity
On a real NSCLC CT from the TCIA dataset, fastrad was compared feature-by-feature against PyRadiomics:
- All 105 features agree to within 10⁻¹¹
- The tolerance threshold is 10⁻⁴ — fastrad is 7 orders of magnitude better
- 0 features outside tolerance across all 7 feature classes
This means models trained on PyRadiomics features can be applied directly to fastrad outputs without recalibration or retraining.
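A drop-in parity check of that sort can be sketched as a dictionary comparison (the helper and feature values below are illustrative, not part of fastrad):

```python
def parity_mismatches(ours, theirs, tol=1e-4):
    """Names of features whose absolute difference exceeds the tolerance."""
    return sorted(k for k in ours if abs(ours[k] - theirs[k]) > tol)

pyrad_feats   = {"firstorder_Mean": 12.345678901234, "glcm_Contrast": 0.987654321098}
fastrad_feats = {"firstorder_Mean": 12.345678901233, "glcm_Contrast": 0.987654321098}
assert parity_mismatches(pyrad_feats, fastrad_feats) == []  # agreement ~1e-12
```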
Scan-rescan reproducibility
Reproducibility was assessed on the RIDER Lung CT scan-rescan dataset (n=32 subjects, same-day repeat scans). ICC distributions were compared between fastrad and PyRadiomics via paired Wilcoxon signed-rank test:
- W = 647, p = 0.411 — no statistically significant difference
- fastrad does not introduce additional scan-rescan variability
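For readers who want to run the same kind of comparison, the test is a one-liner with SciPy. The ICC values below are synthetic, purely for illustration; the real analysis used the RIDER-derived ICC distributions:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
icc_tool_a = rng.uniform(0.7, 1.0, size=32)             # ICCs from one extractor
icc_tool_b = icc_tool_a + rng.normal(0, 1e-3, size=32)  # nearly identical pairs

res = wilcoxon(icc_tool_a, icc_tool_b)  # paired, two-sided by default
print(f"W = {res.statistic:.0f}, p = {res.pvalue:.3f}")
```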
Architecture Highlights
Everything is a tensor
All computation in fastrad operates on torch.Tensor objects. There is no NumPy roundtrip before your model — features stay on the GPU and can be passed directly into downstream PyTorch pipelines.
Device routing
```python
# Automatic: uses GPU if available, silently falls back to CPU
extractor = RadiomicsFeatureExtractor(device="auto")

# Explicit GPU: raises RuntimeError if CUDA unavailable
extractor = RadiomicsFeatureExtractor(device="cuda")

# CPU-only
extractor = RadiomicsFeatureExtractor(device="cpu")
```
Device resolution happens once at initialisation. Individual feature modules are entirely device-agnostic.
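The routing rules above can be sketched as a small resolution helper (illustrative, not fastrad's actual code; torch is imported defensively so the sketch also runs on machines without PyTorch):

```python
def resolve_device(requested: str = "auto") -> str:
    """Resolve a requested device string once, at initialisation time."""
    try:
        import torch
        cuda_ok = torch.cuda.is_available()
    except ImportError:  # no PyTorch present: behave as CPU-only
        cuda_ok = False

    if requested == "auto":
        return "cuda" if cuda_ok else "cpu"  # silent fallback
    if requested == "cuda" and not cuda_ok:
        raise RuntimeError("CUDA requested but unavailable")
    return requested

assert resolve_device("cpu") == "cpu"
assert resolve_device("auto") in ("cuda", "cpu")
```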
GLSZM: an algorithmic win
The GLSZM class achieves its speedup through an algorithmic improvement rather than parallelisation alone. PyRadiomics passes the full image volume to scipy.ndimage.label before discarding background labels; fastrad performs connected-component labelling on the bounding-box-cropped ROI only, reducing the labelled volume by roughly three orders of magnitude for typical clinical nodule sizes. The result: a 23.3× speedup on GLSZM even on CPU, exceeding the GPU speedups of several other feature classes.
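The bounding-box idea is easy to illustrate with NumPy (a sketch of the principle, not fastrad's implementation; the toy ROI here is smaller than a typical clinical nodule, so the reduction is even larger than the ~3 orders of magnitude quoted above):

```python
import numpy as np

def bbox_slices(mask: np.ndarray) -> tuple:
    """Tight bounding-box slices around the nonzero voxels of a mask."""
    coords = np.argwhere(mask)
    lo, hi = coords.min(axis=0), coords.max(axis=0) + 1
    return tuple(slice(a, b) for a, b in zip(lo, hi))

# A 512x512x256 scan with a 10x10x10 ROI: connected-component labelling on
# the crop touches tens of thousands of times fewer voxels than the full volume.
mask = np.zeros((512, 512, 256), dtype=bool)
mask[100:110, 200:210, 50:60] = True
cropped = mask[bbox_slices(mask)]
assert cropped.shape == (10, 10, 10)
print(mask.size // cropped.size)  # → 67108
```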
Memory
Peak VRAM for the full pipeline is 654.78 MB — within the capacity of any consumer GPU with ≥1 GB VRAM.
Note on CPU RAM: fastrad materialises full intermediate tensor representations throughout the pipeline, resulting in higher CPU RAM usage than PyRadiomics for large ROIs (up to 11.4× at 30 mm). For typical clinical nodule sizes this is not a practical concern; a lazy-evaluation mode to address memory-constrained CPU deployments is planned.
Current Limitations
We believe in being upfront about what fastrad doesn't yet do:
- DICOM only: NIfTI and MetaImage formats are not currently supported; `nibabel` integration is planned.
- CPU RAM: higher peak RAM than PyRadiomics for large ROIs under CPU-only execution (see above).
- IBSI Phase 2: Convolutional filter features (wavelets, LoG) are not yet implemented.
Installation
CPU + GPU:

```bash
pip install fastrad[cuda]
```

CPU only:

```bash
pip install fastrad
```
Requires Python ≥ 3.11. CUDA extras pin PyTorch to the CUDA 12.x index and add cucim for GPU-accelerated connected-component labelling.
Reproducibility
All benchmarks are fully reproducible. A Zenodo-archived reproducibility package containing the exact environment specification, benchmark scripts, and data retrieval instructions is deposited alongside the paper.
Continuous integration runs the full validation test suite on CPU on every pull request via GitHub Actions.
Links
- 📦 PyPI: `pip install fastrad`
- 💻 GitHub: helloerikaaa/fastrad
- 📄 Paper: [link to preprint]
- 🗄️ Reproducibility archive: Zenodo [DOI to be assigned]
- 📜 License: Apache 2.0
Citation
If you use fastrad in your research, please cite:
```bibtex
@misc{sanchez-femat2025fastrad,
  title        = {fastrad: Complete, IBSI-Validated GPU Acceleration of the Full PyRadiomics Feature Set},
  author       = {S{\'a}nchez-Femat, Erika and Celaya-Padilla, Jos{\'e}-Mar{\'i}a and Galvan-Tejada, Carlos Eric},
  year         = {2025},
  howpublished = {SSRN},
  note         = {Available at SSRN: \url{https://ssrn.com/abstract=6436486}},
  doi          = {10.2139/ssrn.6436486},
  url          = {https://dx.doi.org/10.2139/ssrn.6436486}
}
```
Contributions welcome — especially for NIfTI support, lazy-evaluation mode, and IBSI Phase 2 filter features. Open an issue or PR on GitHub.