TL;DR: We built fastrad, a PyTorch-native Python library that extracts all 8 IBSI-standardised radiomic feature classes from medical images, 25× faster than PyRadiomics on GPU and with numerically identical results. It's open-source, pip-installable, and a drop-in replacement.
```bash
pip install fastrad
```
The Problem: Radiomics is Slow
Radiomics — the extraction of quantitative features from CT and MRI scans — is increasingly central to oncology research. Radiomic signatures have been used to predict treatment response, prognosis, and tumour phenotype across lung, head-and-neck, and many other cancer types.
The standard tool for this is PyRadiomics, developed at Dana-Farber / Brigham and Women's Hospital. It's robust, well-validated, and widely adopted. But it has one significant limitation: it runs entirely on CPU, and it's slow.
On a modern 32-thread workstation, PyRadiomics takes ~3 seconds per scan. That might sound fine — until you're processing thousands of scans for a multi-cohort clinical study, or iterating rapidly over radiomic feature spaces in an ML pipeline. At that scale, extraction time becomes the bottleneck.
Introducing fastrad
fastrad is a GPU-native Python library that reimplements the full PyRadiomics feature set as native PyTorch tensor operations. Everything — from DICOM ingestion to feature output — runs on torch.Tensor objects, with transparent auto, cuda, and cpu device routing.
```python
from fastrad import RadiomicsFeatureExtractor

extractor = RadiomicsFeatureExtractor(device="auto")  # uses GPU if available
features = extractor.execute(image_path, mask_path)
```
The API is intentionally familiar. If you've used PyRadiomics, there's nothing new to learn.
What's Covered
fastrad implements all 8 IBSI-standardised feature classes:
| Feature Class | Features | Description |
|---|---|---|
| First-order statistics | 18 | Intensity distribution: mean, entropy, kurtosis, etc. |
| Shape (3D) | 14 | Volume, surface area, sphericity, compactness |
| Shape (2D) | — | Per-slice axial shape descriptors |
| GLCM | 24 | Grey-Level Co-occurrence Matrix |
| GLRLM | 16 | Grey-Level Run-Length Matrix |
| GLSZM | 16 | Grey-Level Size-Zone Matrix |
| GLDM | 14 | Grey-Level Dependence Matrix |
| NGTDM | 5 | Neighbourhood Grey-Tone Difference Matrix |
This is the complete PyRadiomics feature set — not just the easy classes. Prior GPU-accelerated alternatives covered at most 2 of these 8 classes.
Performance
Benchmarked on an NVIDIA RTX 4070 Ti against PyRadiomics on a real NSCLC CT from the TCIA dataset:
| Configuration | Time (s) | Speedup |
|---|---|---|
| PyRadiomics (1 thread) | 2.90 | 1× |
| PyRadiomics (32 threads) | 2.90 | 1× |
| fastrad CPU (1 thread) | 1.10 | 2.6× |
| fastrad GPU | 0.116 | 25× |
PyRadiomics does not benefit from multi-threading at the feature computation level — 32 threads gives essentially no speedup over 1. fastrad single-thread CPU already outperforms it by 2.6×, and GPU extraction is 25× faster end-to-end.
Per-class GPU speedups range from 12.9× (GLRLM) to 49.3× (first-order):
| Class | PyRadiomics (s) | fastrad GPU (s) | GPU speedup |
|---|---|---|---|
| First-order | 0.408 | 0.008 | 49.3× |
| Shape | 0.411 | 0.012 | 35.0× |
| GLCM | 0.418 | 0.021 | 19.9× |
| GLRLM | 0.414 | 0.032 | 12.9× |
| GLSZM | 0.413 | 0.018 | 22.5× |
| GLDM | 0.421 | 0.011 | 37.2× |
| NGTDM | 0.412 | 0.013 | 31.7× |
At 0.116 s per scan, a single RTX 4070 Ti can process roughly 500 scans per minute, enough to run a multi-site trial cohort in minutes rather than hours.
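To put the end-to-end numbers in cohort terms, here is a quick back-of-the-envelope calculation using the per-scan timings from the benchmark table (the 10,000-scan cohort size is hypothetical):

```python
# Cohort-time arithmetic from the per-scan timings in the benchmark table.
PYRADIOMICS_S = 2.90    # seconds per scan (CPU)
FASTRAD_GPU_S = 0.116   # seconds per scan (RTX 4070 Ti)
COHORT = 10_000         # hypothetical multi-site cohort size

cpu_hours = COHORT * PYRADIOMICS_S / 3600
gpu_minutes = COHORT * FASTRAD_GPU_S / 60
print(f"PyRadiomics: {cpu_hours:.1f} h, fastrad GPU: {gpu_minutes:.1f} min")
```

At these timings, a cohort that would tie up a CPU workstation for a full working day finishes on a single consumer GPU in under half an hour.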
Apple Silicon
On an M3 MacBook Air (CPU-only), fastrad is 3.56× faster than 8-thread PyRadiomics, thanks to PyTorch's ARM NEON vectorisation.
ROI size scaling
Speedup is maintained across all clinically relevant nodule sizes:
| Radius | Voxels | Speedup |
|---|---|---|
| 5 mm | 199 | 25.9× |
| 15 mm | 8,263 | 18.9× |
| 30 mm | 67,461 | 9.7× |
Even at 30 mm — representative of large solid pulmonary nodules — fastrad GPU retains a 9.7× advantage.
Numerical Validation
Speed means nothing if the numbers are wrong. Radiomic features go into clinical research and ML models, so numerical correctness is non-negotiable.
IBSI Phase 1 compliance
fastrad was validated against the Image Biomarker Standardisation Initiative (IBSI) Phase 1 digital phantom — the gold-standard compliance benchmark for radiomics tools. Across all 105 reference features:
- Maximum absolute relative deviation: 3.20 × 10⁻¹⁴% (machine epsilon)
- 0 features outside the 1% compliance threshold
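Concretely, a maximum-relative-deviation check of this kind takes only a few lines (an illustrative helper, not fastrad's internal validation code):

```python
def max_relative_deviation(computed, reference):
    """Largest |computed - reference| / |reference| over paired feature values."""
    return max(abs(c - r) / abs(r) for c, r in zip(computed, reference))

# Toy values: agreement at ~1e-15 sits far inside a 1% (0.01) threshold.
reference = [1.0, 42.0, -3.5]
computed = [1.0 + 1e-15, 42.0, -3.5]
assert max_relative_deviation(computed, reference) < 0.01
```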
PyRadiomics parity
On a real NSCLC CT from the TCIA dataset, fastrad was compared feature-by-feature against PyRadiomics:
- All 105 features agree to within 10⁻¹¹
- The tolerance threshold is 10⁻⁴ — fastrad is 7 orders of magnitude better
- 0 features outside tolerance across all 7 feature classes
This means models trained on PyRadiomics features can be applied directly to fastrad outputs without recalibration or retraining.
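A drop-in parity check of that sort can be sketched as a dictionary comparison (the helper and feature values below are illustrative, not part of fastrad):

```python
def parity_mismatches(ours, theirs, tol=1e-4):
    """Names of features whose absolute difference exceeds the tolerance."""
    return sorted(k for k in ours if abs(ours[k] - theirs[k]) > tol)

pyrad_feats   = {"firstorder_Mean": 12.345678901234, "glcm_Contrast": 0.987654321098}
fastrad_feats = {"firstorder_Mean": 12.345678901233, "glcm_Contrast": 0.987654321098}
assert parity_mismatches(pyrad_feats, fastrad_feats) == []  # agreement ~1e-12
```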
Scan-rescan reproducibility
Reproducibility was assessed on the RIDER Lung CT scan-rescan dataset (n=32 subjects, same-day repeat scans). ICC distributions were compared between fastrad and PyRadiomics via paired Wilcoxon signed-rank test:
- W = 647, p = 0.411 — no statistically significant difference
- fastrad does not introduce additional scan-rescan variability
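For readers who want to run the same kind of comparison, the test is a one-liner with SciPy. The ICC values below are synthetic, purely for illustration; the real analysis used the RIDER-derived ICC distributions:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
icc_tool_a = rng.uniform(0.7, 1.0, size=32)             # ICCs from one extractor
icc_tool_b = icc_tool_a + rng.normal(0, 1e-3, size=32)  # nearly identical pairs

res = wilcoxon(icc_tool_a, icc_tool_b)  # paired, two-sided by default
print(f"W = {res.statistic:.0f}, p = {res.pvalue:.3f}")
```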
Architecture Highlights
Everything is a tensor
All computation in fastrad operates on torch.Tensor objects. There is no NumPy roundtrip before your model — features stay on the GPU and can be passed directly into downstream PyTorch pipelines.
Device routing
```python
# Automatic: uses GPU if available, silently falls back to CPU
extractor = RadiomicsFeatureExtractor(device="auto")

# Explicit GPU: raises RuntimeError if CUDA unavailable
extractor = RadiomicsFeatureExtractor(device="cuda")

# CPU-only
extractor = RadiomicsFeatureExtractor(device="cpu")
```
Device resolution happens once at initialisation. Individual feature modules are entirely device-agnostic.
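The routing rules above can be sketched as a small resolution helper (illustrative, not fastrad's actual code; torch is imported defensively so the sketch also runs on machines without PyTorch):

```python
def resolve_device(requested: str = "auto") -> str:
    """Resolve a requested device string once, at initialisation time."""
    try:
        import torch
        cuda_ok = torch.cuda.is_available()
    except ImportError:  # no PyTorch present: behave as CPU-only
        cuda_ok = False

    if requested == "auto":
        return "cuda" if cuda_ok else "cpu"  # silent fallback
    if requested == "cuda" and not cuda_ok:
        raise RuntimeError("CUDA requested but unavailable")
    return requested

assert resolve_device("cpu") == "cpu"
assert resolve_device("auto") in ("cuda", "cpu")
```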
GLSZM: an algorithmic win
The GLSZM class achieves its speedup through an algorithmic improvement rather than parallelisation alone. PyRadiomics passes the full image volume to scipy.ndimage.label before discarding background labels; fastrad performs connected-component labelling on the bounding-box-cropped ROI only, reducing the labelled volume by roughly three orders of magnitude for typical clinical nodule sizes. The result: a 23.3× speedup on GLSZM even on CPU, exceeding the GPU speedups of several other feature classes.
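The bounding-box idea is easy to illustrate with NumPy (a sketch of the principle, not fastrad's implementation; the toy ROI here is smaller than a typical clinical nodule, so the reduction is even larger than the ~3 orders of magnitude quoted above):

```python
import numpy as np

def bbox_slices(mask: np.ndarray) -> tuple:
    """Tight bounding-box slices around the nonzero voxels of a mask."""
    coords = np.argwhere(mask)
    lo, hi = coords.min(axis=0), coords.max(axis=0) + 1
    return tuple(slice(a, b) for a, b in zip(lo, hi))

# A 512x512x256 scan with a 10x10x10 ROI: connected-component labelling on
# the crop touches tens of thousands of times fewer voxels than the full volume.
mask = np.zeros((512, 512, 256), dtype=bool)
mask[100:110, 200:210, 50:60] = True
cropped = mask[bbox_slices(mask)]
assert cropped.shape == (10, 10, 10)
print(mask.size // cropped.size)  # → 67108
```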
Memory
Peak VRAM for the full pipeline is 654.78 MB — within the capacity of any consumer GPU with ≥1 GB VRAM.
Note on CPU RAM: fastrad materialises full intermediate tensor representations throughout the pipeline, resulting in higher CPU RAM usage than PyRadiomics for large ROIs (up to 11.4× at 30 mm). For typical clinical nodule sizes this is not a practical concern; a lazy-evaluation mode to address memory-constrained CPU deployments is planned.
Current Limitations
We believe in being upfront about what fastrad doesn't yet do:
- DICOM only: NIfTI and MetaImage formats are not currently supported; `nibabel` integration is planned.
- CPU RAM: higher peak RAM than PyRadiomics for large ROIs under CPU-only execution (see above).
- IBSI Phase 2: Convolutional filter features (wavelets, LoG) are not yet implemented.
Installation
CPU + GPU:

```bash
pip install fastrad[cuda]
```

CPU only:

```bash
pip install fastrad
```
Requires Python ≥ 3.11. CUDA extras pin PyTorch to the CUDA 12.x index and add cucim for GPU-accelerated connected-component labelling.
Reproducibility
All benchmarks are fully reproducible. A Zenodo-archived reproducibility package containing the exact environment specification, benchmark scripts, and data retrieval instructions is deposited alongside the paper.
Continuous integration runs the full validation test suite on CPU on every pull request via GitHub Actions.
Links
- 📦 PyPI: `pip install fastrad`
- 💻 GitHub: helloerikaaa/fastrad
- 📄 Paper: [link to preprint]
- 🗄️ Reproducibility archive: Zenodo [DOI to be assigned]
- 📜 License: Apache 2.0
Citation
If you use fastrad in your research, please cite:
```bibtex
@misc{sanchez-femat2025fastrad,
  title        = {fastrad: Complete, IBSI-Validated GPU Acceleration of the Full PyRadiomics Feature Set},
  author       = {S{\'a}nchez-Femat, Erika and Celaya-Padilla, Jos{\'e}-Mar{\'i}a and Galvan-Tejada, Carlos Eric},
  year         = {2025},
  howpublished = {SSRN},
  note         = {Available at SSRN: \url{https://ssrn.com/abstract=6436486}},
  doi          = {10.2139/ssrn.6436486},
  url          = {https://dx.doi.org/10.2139/ssrn.6436486}
}
```
Contributions welcome — especially for NIfTI support, lazy-evaluation mode, and IBSI Phase 2 filter features. Open an issue or PR on GitHub.