<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Erika Sánchez-Femat</title>
    <description>The latest articles on DEV Community by Erika Sánchez-Femat (@helloerika__).</description>
    <link>https://dev.to/helloerika__</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3536078%2F625eaa80-80c3-470f-b57e-74356e5f529a.jpeg</url>
      <title>DEV Community: Erika Sánchez-Femat</title>
      <link>https://dev.to/helloerika__</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/helloerika__"/>
    <language>en</language>
    <item>
      <title>fastrad: GPU-Native Radiomics at 25x the Speed of PyRadiomics</title>
      <dc:creator>Erika Sánchez-Femat</dc:creator>
      <pubDate>Mon, 30 Mar 2026 23:43:36 +0000</pubDate>
      <link>https://dev.to/helloerika__/fastrad-gpu-native-radiomics-at-25x-the-speed-of-pyradiomics-3ha4</link>
      <guid>https://dev.to/helloerika__/fastrad-gpu-native-radiomics-at-25x-the-speed-of-pyradiomics-3ha4</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; We built &lt;code&gt;fastrad&lt;/code&gt;, a PyTorch-native Python library that extracts all 8 IBSI-standardised radiomic features from medical images 25× faster than PyRadiomics on GPU — with numerically identical results. It's open-source, pip-installable, and a drop-in replacement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastrad
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Problem: Radiomics is Slow
&lt;/h2&gt;

&lt;p&gt;Radiomics — the extraction of quantitative features from CT and MRI scans — is increasingly central to oncology research. Radiomic signatures have been used to predict treatment response, prognosis, and tumour phenotype across lung, head-and-neck, and many other cancer types.&lt;/p&gt;

&lt;p&gt;The standard tool for this is &lt;a href="https://github.com/AIM-Harvard/pyradiomics" rel="noopener noreferrer"&gt;PyRadiomics&lt;/a&gt;, developed at Dana-Farber / Brigham and Women's Hospital. It's robust, well-validated, and widely adopted. But it has one significant limitation: it runs entirely on CPU, and it's slow.&lt;/p&gt;

&lt;p&gt;On a modern 32-thread workstation, PyRadiomics takes &lt;strong&gt;~3 seconds per scan&lt;/strong&gt;. That might sound fine — until you're processing thousands of scans for a multi-cohort clinical study, or iterating rapidly over radiomic feature spaces in an ML pipeline. At that scale, extraction time becomes the bottleneck.&lt;/p&gt;
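&lt;p&gt;The arithmetic behind that bottleneck is easy to sketch (the 5,000-scan cohort size below is purely illustrative):&lt;/p&gt;

```python
# Back-of-the-envelope wall-clock cost of CPU extraction for a
# hypothetical 5,000-scan cohort at ~3 seconds per scan.
n_scans = 5_000
seconds_per_scan = 3.0
hours = n_scans * seconds_per_scan / 3600
print(round(hours, 1))  # → 4.2
```

&lt;p&gt;Over four hours of pure extraction time, before any modelling begins, and it scales linearly with every repeated run of the pipeline.&lt;/p&gt;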




&lt;h2&gt;
  
  
  Introducing fastrad
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;fastrad&lt;/code&gt; is a GPU-native Python library that reimplements the full PyRadiomics feature set as native PyTorch tensor operations. Everything — from DICOM ingestion to feature output — runs on &lt;code&gt;torch.Tensor&lt;/code&gt; objects, with transparent &lt;code&gt;auto&lt;/code&gt;, &lt;code&gt;cuda&lt;/code&gt;, and &lt;code&gt;cpu&lt;/code&gt; device routing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastrad&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RadiomicsFeatureExtractor&lt;/span&gt;

&lt;span class="n"&gt;extractor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RadiomicsFeatureExtractor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# uses GPU if available
&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;extractor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mask_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API is intentionally familiar. If you've used PyRadiomics, there's nothing new to learn.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Covered
&lt;/h2&gt;

&lt;p&gt;fastrad implements all &lt;strong&gt;8 IBSI-standardised feature classes&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature Class&lt;/th&gt;
&lt;th&gt;Features&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;First-order statistics&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Intensity distribution: mean, entropy, kurtosis, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shape (3D)&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Volume, surface area, sphericity, compactness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shape (2D)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Per-slice axial shape descriptors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GLCM&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;Grey-Level Co-occurrence Matrix&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GLRLM&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;Grey-Level Run-Length Matrix&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GLSZM&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;Grey-Level Size-Zone Matrix&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GLDM&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Grey-Level Dependence Matrix&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NGTDM&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Neighbourhood Grey-Tone Difference Matrix&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is the &lt;strong&gt;complete&lt;/strong&gt; PyRadiomics feature set — not just the easy classes. Prior GPU-accelerated alternatives covered at most 2 of these 8 classes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;Benchmarked on an NVIDIA RTX 4070 Ti against PyRadiomics on a real NSCLC CT from the TCIA dataset:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;Time (s)&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PyRadiomics (1 thread)&lt;/td&gt;
&lt;td&gt;2.90&lt;/td&gt;
&lt;td&gt;1×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyRadiomics (32 threads)&lt;/td&gt;
&lt;td&gt;2.90&lt;/td&gt;
&lt;td&gt;1×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fastrad CPU (1 thread)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.10&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.6×&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fastrad GPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.116&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;25×&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;PyRadiomics does not benefit from multi-threading at the feature computation level — 32 threads gives essentially no speedup over 1. fastrad single-thread CPU already outperforms it by 2.6×, and GPU extraction is 25× faster end-to-end.&lt;/p&gt;

&lt;p&gt;Per-class GPU speedups range from &lt;strong&gt;12.9× (GLRLM)&lt;/strong&gt; to &lt;strong&gt;49.3× (first-order)&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Class&lt;/th&gt;
&lt;th&gt;PyRadiomics (s)&lt;/th&gt;
&lt;th&gt;fastrad GPU (s)&lt;/th&gt;
&lt;th&gt;GPU speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;First-order&lt;/td&gt;
&lt;td&gt;0.408&lt;/td&gt;
&lt;td&gt;0.008&lt;/td&gt;
&lt;td&gt;49.3×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shape&lt;/td&gt;
&lt;td&gt;0.411&lt;/td&gt;
&lt;td&gt;0.012&lt;/td&gt;
&lt;td&gt;35.0×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GLCM&lt;/td&gt;
&lt;td&gt;0.418&lt;/td&gt;
&lt;td&gt;0.021&lt;/td&gt;
&lt;td&gt;19.9×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GLRLM&lt;/td&gt;
&lt;td&gt;0.414&lt;/td&gt;
&lt;td&gt;0.032&lt;/td&gt;
&lt;td&gt;12.9×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GLSZM&lt;/td&gt;
&lt;td&gt;0.413&lt;/td&gt;
&lt;td&gt;0.018&lt;/td&gt;
&lt;td&gt;22.5×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GLDM&lt;/td&gt;
&lt;td&gt;0.421&lt;/td&gt;
&lt;td&gt;0.011&lt;/td&gt;
&lt;td&gt;37.2×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NGTDM&lt;/td&gt;
&lt;td&gt;0.412&lt;/td&gt;
&lt;td&gt;0.013&lt;/td&gt;
&lt;td&gt;31.7×&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At 0.116s per scan, a single RTX 4070 Ti can process approximately &lt;strong&gt;500 scans per minute&lt;/strong&gt; — enough to run a multi-site trial cohort in minutes rather than hours.&lt;/p&gt;

&lt;h3&gt;
  
  
  Apple Silicon
&lt;/h3&gt;

&lt;p&gt;On an M3 MacBook Air (CPU-only), fastrad is &lt;strong&gt;3.56× faster&lt;/strong&gt; than PyRadiomics running with 8 threads, thanks to PyTorch's ARM NEON vectorisation.&lt;/p&gt;

&lt;h3&gt;
  
  
  ROI size scaling
&lt;/h3&gt;

&lt;p&gt;Speedup is maintained across all clinically relevant nodule sizes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Radius&lt;/th&gt;
&lt;th&gt;Voxels&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5 mm&lt;/td&gt;
&lt;td&gt;199&lt;/td&gt;
&lt;td&gt;25.9×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15 mm&lt;/td&gt;
&lt;td&gt;8,263&lt;/td&gt;
&lt;td&gt;18.9×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;30 mm&lt;/td&gt;
&lt;td&gt;67,461&lt;/td&gt;
&lt;td&gt;9.7×&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Even at 30 mm — representative of large solid pulmonary nodules — fastrad GPU retains a &lt;strong&gt;9.7× advantage&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Numerical Validation
&lt;/h2&gt;

&lt;p&gt;Speed means nothing if the numbers are wrong. Radiomic features go into clinical research and ML models, so numerical correctness is non-negotiable.&lt;/p&gt;

&lt;h3&gt;
  
  
  IBSI Phase 1 compliance
&lt;/h3&gt;

&lt;p&gt;fastrad was validated against the Image Biomarker Standardisation Initiative (IBSI) Phase 1 digital phantom — the gold-standard compliance benchmark for radiomics tools. Across all 105 reference features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Maximum absolute relative deviation: 3.20 × 10⁻¹⁴%&lt;/strong&gt; (machine epsilon)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;0 features outside the 1% compliance threshold&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  PyRadiomics parity
&lt;/h3&gt;

&lt;p&gt;On a real NSCLC CT from the TCIA dataset, fastrad was compared feature-by-feature against PyRadiomics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;All 105 features agree to within 10⁻¹¹&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;The tolerance threshold is 10⁻⁴ — fastrad is 7 orders of magnitude better&lt;/li&gt;
&lt;li&gt;0 features outside tolerance across all 7 feature classes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means models trained on PyRadiomics features can be applied directly to fastrad outputs &lt;strong&gt;without recalibration or retraining&lt;/strong&gt;.&lt;/p&gt;
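&lt;p&gt;A minimal sketch of this kind of feature-by-feature parity check; the feature names, values, and helper function are invented for illustration and are not part of either library:&lt;/p&gt;

```python
# Sketch of a feature-by-feature parity check between two extractors.
# The feature dicts below hold illustrative values, not real outputs.
ref = {"firstorder_Mean": 42.1573, "glcm_Contrast": 0.8812}
new = {"firstorder_Mean": 42.1573, "glcm_Contrast": 0.8812}

TOLERANCE = 1e-4  # acceptance threshold used in the comparison

def max_abs_deviation(a, b):
    """Largest absolute difference over the shared feature names."""
    return max(abs(a[k] - b[k]) for k in a)

worst = max_abs_deviation(ref, new)
assert TOLERANCE >= worst, f"parity violated: {worst}"
```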

&lt;h3&gt;
  
  
  Scan-rescan reproducibility
&lt;/h3&gt;

&lt;p&gt;Reproducibility was assessed on the RIDER Lung CT scan-rescan dataset (n=32 subjects, same-day repeat scans). ICC distributions were compared between fastrad and PyRadiomics via paired Wilcoxon signed-rank test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;W = 647, p = 0.411&lt;/strong&gt; — no statistically significant difference&lt;/li&gt;
&lt;li&gt;fastrad does not introduce additional scan-rescan variability&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Architecture Highlights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Everything is a tensor
&lt;/h3&gt;

&lt;p&gt;All computation in fastrad operates on &lt;code&gt;torch.Tensor&lt;/code&gt; objects. There is no NumPy roundtrip before your model — features stay on the GPU and can be passed directly into downstream PyTorch pipelines.&lt;/p&gt;

&lt;h3&gt;
  
  
  Device routing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Automatic: uses GPU if available, silently falls back to CPU
&lt;/span&gt;&lt;span class="n"&gt;extractor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RadiomicsFeatureExtractor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Explicit GPU: raises RuntimeError if CUDA unavailable
&lt;/span&gt;&lt;span class="n"&gt;extractor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RadiomicsFeatureExtractor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# CPU-only
&lt;/span&gt;&lt;span class="n"&gt;extractor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RadiomicsFeatureExtractor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Device resolution happens once at initialisation. Individual feature modules are entirely device-agnostic.&lt;/p&gt;
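&lt;p&gt;The routing rule can be sketched in plain Python; &lt;code&gt;resolve_device&lt;/code&gt; is a hypothetical helper standing in for fastrad's internal logic, with &lt;code&gt;cuda_available&lt;/code&gt; playing the role of &lt;code&gt;torch.cuda.is_available()&lt;/code&gt;:&lt;/p&gt;

```python
# Pure-Python sketch of the device-routing rule described above.
# resolve_device is a hypothetical helper, not fastrad's actual internals.
def resolve_device(spec, cuda_available):
    if spec == "auto":
        # silent fallback: GPU when present, CPU otherwise
        return "cuda" if cuda_available else "cpu"
    if spec == "cuda" and not cuda_available:
        # explicit GPU request fails loudly
        raise RuntimeError("CUDA requested but unavailable")
    return spec

print(resolve_device("auto", cuda_available=False))  # cpu
```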

&lt;h3&gt;
  
  
  GLSZM: an algorithmic win
&lt;/h3&gt;

&lt;p&gt;The GLSZM class achieves its speedup through an algorithmic improvement rather than parallelisation alone. PyRadiomics passes the full image volume to &lt;code&gt;scipy.ndimage.label&lt;/code&gt; before discarding background labels. fastrad performs connected-component labelling on the &lt;strong&gt;bounding-box-cropped ROI only&lt;/strong&gt; — reducing the labelled volume by ~3 orders of magnitude for typical clinical nodule sizes. The result: 23.3× CPU speedup on GLSZM, exceeding several GPU-exclusive classes.&lt;/p&gt;
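&lt;p&gt;The volume reduction is easy to see on synthetic data. This numpy-only sketch crops a small nodule mask to its bounding box before any labelling would run (the grid and nodule sizes are illustrative, and the connected-component step itself is left out):&lt;/p&gt;

```python
import numpy as np

# Illustrates the bounding-box idea behind the GLSZM speedup: label only
# the cropped ROI instead of the full volume. Synthetic data; the real
# pipeline would run connected-component labelling on `cropped`.
volume_shape = (512, 512, 128)          # a typical CT grid
mask = np.zeros(volume_shape, dtype=bool)

# Hypothetical small nodule: a radius-10-voxel sphere
zz, yy, xx = np.ogrid[0:21, 0:21, 0:21]
d2 = (zz - 10) ** 2 + (yy - 10) ** 2 + (xx - 10) ** 2
mask[250:271, 250:271, 50:71] = 100 >= d2

# Bounding box of the ROI: min/max nonzero index along each axis
nz = np.nonzero(mask)
slices = tuple(slice(idx.min(), idx.max() + 1) for idx in nz)
cropped = mask[slices]

full_voxels = mask.size    # voxels a full-volume labelling would see
crop_voxels = cropped.size # voxels actually needing labelling
print(full_voxels // crop_voxels)  # → 3623, about 3.5 orders of magnitude
```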

&lt;h3&gt;
  
  
  Memory
&lt;/h3&gt;

&lt;p&gt;Peak VRAM for the full pipeline is &lt;strong&gt;654.78 MB&lt;/strong&gt; — within the capacity of any consumer GPU with ≥1 GB VRAM.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note on CPU RAM:&lt;/strong&gt; fastrad materialises full intermediate tensor representations throughout the pipeline, resulting in higher CPU RAM usage than PyRadiomics for large ROIs (up to 11.4× at 30 mm). For typical clinical nodule sizes this is not a practical concern; a lazy-evaluation mode to address memory-constrained CPU deployments is planned.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Current Limitations
&lt;/h2&gt;

&lt;p&gt;We believe in being upfront about what fastrad doesn't yet do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DICOM only&lt;/strong&gt;: NIfTI and MetaImage formats are not currently supported. &lt;code&gt;nibabel&lt;/code&gt; integration is planned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU RAM&lt;/strong&gt;: Higher peak RAM than PyRadiomics for large ROIs under CPU-only execution (see above).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IBSI Phase 2&lt;/strong&gt;: Convolutional filter features (wavelets, LoG) are not yet implemented.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;CPU + GPU:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastrad[cuda]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CPU only:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastrad
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Requires Python ≥ 3.11. CUDA extras pin PyTorch to the CUDA 12.x index and add &lt;code&gt;cucim&lt;/code&gt; for GPU-accelerated connected-component labelling.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reproducibility
&lt;/h2&gt;

&lt;p&gt;All benchmarks are fully reproducible. A Zenodo-archived reproducibility package containing the exact environment specification, benchmark scripts, and data retrieval instructions is deposited alongside the paper.&lt;/p&gt;

&lt;p&gt;Continuous integration runs the full validation test suite on CPU on every pull request via GitHub Actions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;📦 &lt;strong&gt;PyPI&lt;/strong&gt;: &lt;code&gt;pip install fastrad&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;💻 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/helloerikaaa/fastrad" rel="noopener noreferrer"&gt;helloerikaaa/fastrad&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📄 &lt;strong&gt;Paper&lt;/strong&gt;: [link to preprint]&lt;/li&gt;
&lt;li&gt;🗄️ &lt;strong&gt;Reproducibility archive&lt;/strong&gt;: Zenodo [DOI to be assigned]&lt;/li&gt;
&lt;li&gt;📜 &lt;strong&gt;License&lt;/strong&gt;: Apache 2.0&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Citation
&lt;/h2&gt;

&lt;p&gt;If you use fastrad in your research, please cite:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight bibtex"&gt;&lt;code&gt;&lt;span class="nc"&gt;@misc&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;sanchez-femat2025fastrad&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;{fastrad: Complete, IBSI-Validated GPU Acceleration of the Full PyRadiomics Feature Set}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;author&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;{S{\'a}nchez-Femat, Erika and Celaya-Padilla, Jos{\'e}-Mar{\'i}a and Galvan-Tejada, Carlos Eric}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;year&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;{2025}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;howpublished&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;{SSRN}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;note&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;{Available at SSRN: \url{https://ssrn.com/abstract=6436486}}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;doi&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;{10.2139/ssrn.6436486}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;url&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;{https://dx.doi.org/10.2139/ssrn.6436486}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;em&gt;Contributions welcome — especially for NIfTI support, lazy-evaluation mode, and IBSI Phase 2 filter features. Open an issue or PR on GitHub.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>python</category>
      <category>showdev</category>
      <category>api</category>
    </item>
    <item>
      <title>Radiomics in Breast Cancer – Part 1: Exploring the CBIS-DDSM Dataset</title>
      <dc:creator>Erika Sánchez-Femat</dc:creator>
      <pubDate>Tue, 30 Sep 2025 00:33:22 +0000</pubDate>
      <link>https://dev.to/helloerika__/radiomics-in-breast-cancer-part-1-exploring-the-cbis-ddsm-dataset-1ece</link>
      <guid>https://dev.to/helloerika__/radiomics-in-breast-cancer-part-1-exploring-the-cbis-ddsm-dataset-1ece</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;1. Introduction&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This article marks the &lt;strong&gt;first entry in a blog series&lt;/strong&gt; presenting the main projects from my PhD research on &lt;strong&gt;radiomics and breast cancer imaging&lt;/strong&gt;. The purpose is to disseminate my work in an accessible format while promoting &lt;strong&gt;open science&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Over the coming posts, I will outline the progression of my research: &lt;/p&gt;

&lt;p&gt;&lt;em&gt;dataset exploration → preprocessing → radiomics feature extraction → feature selection → ML benchmarking → interpretability.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This first post focuses on &lt;strong&gt;dataset exploration&lt;/strong&gt;, specifically the &lt;strong&gt;Curated Breast Imaging Subset of DDSM (CBIS-DDSM)&lt;/strong&gt;, widely regarded as a benchmark dataset for breast cancer imaging research.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Background on CBIS-DDSM&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;Digital Database for Screening Mammography (DDSM)&lt;/strong&gt;, developed in the 1990s, was among the first large, publicly available collections of digitized mammograms. Its original structure posed limitations for contemporary Machine Learning applications.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;CBIS-DDSM&lt;/strong&gt;, released by the &lt;strong&gt;Cancer Imaging Archive (TCIA)&lt;/strong&gt;, is a curated and standardized subset with the following data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1,566 patients&lt;/li&gt;
&lt;li&gt;2,620 mammography images&lt;/li&gt;
&lt;li&gt;Lesion annotations: &lt;em&gt;masses&lt;/em&gt; and &lt;em&gt;calcifications&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Two standard views per breast: &lt;em&gt;CC&lt;/em&gt; and &lt;em&gt;MLO&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Pathology labels: &lt;em&gt;Malignant, Benign, and Benign with Callback&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  &lt;strong&gt;3. Objectives of the Dataset Exploration&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The goal was to systematically assess:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Metadata&lt;/strong&gt; – patient age, lesion type, pathology, image view.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image characteristics&lt;/strong&gt; – resolution, contrast, file size.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Class distribution&lt;/strong&gt; – balance between benign and malignant cases and between lesion types.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This step was essential to design a robust preprocessing and analysis pipeline.&lt;/p&gt;
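&lt;p&gt;The tabulation itself needs nothing beyond the standard library. A toy sketch with invented records (not actual CBIS-DDSM rows):&lt;/p&gt;

```python
from collections import Counter

# Toy sketch of the metadata tabulation step; the records below are
# invented examples, not actual CBIS-DDSM entries.
records = [
    {"view": "CC",  "lesion": "mass",          "pathology": "BENIGN"},
    {"view": "MLO", "lesion": "calcification", "pathology": "MALIGNANT"},
    {"view": "MLO", "lesion": "mass",          "pathology": "BENIGN"},
    {"view": "CC",  "lesion": "calcification", "pathology": "BENIGN"},
]

# Class distribution and per-view breakdown in a few lines
by_pathology = Counter(r["pathology"] for r in records)
by_view_lesion = Counter((r["view"], r["lesion"]) for r in records)

print(by_pathology)  # e.g. Counter({'BENIGN': 3, 'MALIGNANT': 1})
print(by_view_lesion)
```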

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Findings&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Through systematic exploration of CBIS-DDSM, several critical insights emerged, each with direct implications for radiomics analysis and machine learning model development.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Lesion Types and Distribution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The dataset includes two primary lesion types: &lt;strong&gt;masses&lt;/strong&gt; and &lt;strong&gt;calcifications&lt;/strong&gt;. Masses are larger, localized abnormalities, while calcifications are tiny deposits of calcium that may indicate malignancy. Understanding this distribution is essential because each lesion type may require different preprocessing and feature extraction approaches.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqkxa8pui47qmuh0cghus.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqkxa8pui47qmuh0cghus.png" alt="Lesion Type Distribution Bar Chart" width="800" height="616"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 1: Lesion Type Distribution Bar Chart&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The chart highlights that calcifications represent the majority of annotated lesions, meaning models may naturally perform better on calcification detection unless strategies are implemented to balance the contribution of the two lesion types.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Pathology Labels: Benign vs Malignant&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Malignant lesions are substantially underrepresented compared to benign ones. This imbalance is critical because it can bias machine learning models toward overpredicting benign outcomes if not properly addressed.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgix5qlzje4x51btewut9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgix5qlzje4x51btewut9.png" alt="Class Distribution Bar Chart" width="800" height="791"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 2: Class Distribution Bar Chart&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The chart clearly demonstrates the imbalance, where &lt;strong&gt;benign cases (including both “benign” and “benign with callback”) significantly outnumber malignant ones&lt;/strong&gt;. The imbalance is accentuated by merging the two benign categories into a single class, a simplification adopted for this analysis. It also means that models may be biased toward predicting benign outcomes. To account for this, evaluation metrics such as &lt;strong&gt;ROC-AUC and sensitivity&lt;/strong&gt; are more appropriate than accuracy, since they better capture model performance on the underrepresented malignant cases.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sample Mammograms&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Examining actual images is critical to understand variability in imaging quality, lesion size, and annotation precision. Sample images also help communicate the nature of the dataset to readers who are less familiar with medical imaging.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5hp2hd7ak16ftssd5x99.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5hp2hd7ak16ftssd5x99.png" alt="Random sample of the mammograms" width="800" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 3: Random sample of the mammograms&lt;/em&gt;&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Distribution Across Metadata Variables&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Beyond lesion type and pathology labels, it is critical to examine how cases are distributed across &lt;strong&gt;key imaging metadata variables&lt;/strong&gt;, including &lt;strong&gt;mammography view (CC vs MLO)&lt;/strong&gt;, &lt;strong&gt;laterality (left vs right breast)&lt;/strong&gt;, and &lt;strong&gt;lesion type (mass vs calcification)&lt;/strong&gt;. These variables reflect both the technical aspects of image acquisition and the biological characteristics of the breast. An unbalanced representation across them can introduce hidden biases into machine learning models, which may reduce their clinical applicability.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxp59xrwuqriersr2i76.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxp59xrwuqriersr2i76.png" alt="Class Distribution in different key metadata variable" width="800" height="237"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 4: Class Distribution in different key metadata variables&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To examine this, &lt;em&gt;Figure 4&lt;/em&gt; presents three complementary charts that summarize how pathology labels are distributed across these variables.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;(Left) Pathology Distribution by Mammography View (CC vs MLO)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first chart compares benign and malignant cases across the two standard mammography projections: &lt;strong&gt;craniocaudal (CC)&lt;/strong&gt; and &lt;strong&gt;mediolateral oblique (MLO)&lt;/strong&gt;. While both views are routinely acquired in screening, the dataset shows a mild but notable imbalance: malignant cases are not equally represented in CC and MLO views.&lt;br&gt;
This finding is significant for two reasons. First, models may become inadvertently sensitive to projection-dependent features rather than lesion-specific characteristics, leading to overfitting on technical differences. Second, when evaluating algorithm performance, results may vary depending on whether the test set contains a higher proportion of CC or MLO images. Explicitly reporting view distribution is therefore essential for transparency and reproducibility in radiomics-based studies.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;(Center) Lesion Type Distribution by View&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The second chart investigates how &lt;strong&gt;lesion type (mass vs calcification)&lt;/strong&gt; is distributed across CC and MLO projections. This combined perspective is particularly relevant because it highlights subgroups that may be underrepresented in the dataset. For example, while benign masses are well represented in both CC and MLO views, certain subcategories—such as &lt;strong&gt;malignant calcifications in CC view&lt;/strong&gt;—are comparatively rare.&lt;br&gt;
This observation has critical implications. Models trained on such data may underperform in detecting rare but clinically important subgroups, not because the pathology is intrinsically more difficult to classify, but because of limited training samples. Furthermore, reporting global performance metrics without subgroup analysis could mask these deficiencies. Explicitly documenting subgroup imbalance encourages a more responsible interpretation of model results and highlights the need for either &lt;strong&gt;data augmentation&lt;/strong&gt; or &lt;strong&gt;specialized evaluation strategies&lt;/strong&gt; for minority subgroups.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;(Right) Pathology Distribution by Breast Side (Left vs Right)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The third chart examines how benign and malignant cases are distributed across &lt;strong&gt;left and right breasts&lt;/strong&gt;. As expected, the dataset appears relatively balanced with respect to laterality, given that mammography protocols acquire both breasts in each exam. However, the malignant class remains underrepresented on both sides.&lt;br&gt;
Although laterality is not inherently expected to influence the biological likelihood of disease, it is worth noting that subtle technical differences (e.g., positioning, compression, or radiographer practice) could vary between sides. A balanced distribution minimizes the risk that models inadvertently learn from such laterality-related artifacts. Nevertheless, the overarching problem of &lt;strong&gt;class imbalance&lt;/strong&gt; persists across both sides, reinforcing the importance of prioritizing evaluation metrics such as ROC-AUC, sensitivity, and specificity over raw accuracy.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5. Challenges Identified&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The findings above naturally lead to several &lt;strong&gt;critical challenges&lt;/strong&gt;, which must be considered when designing ML pipelines or radiomics feature extraction protocols:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Class Imbalance&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Evidence:&lt;/strong&gt; Figure 2 illustrates a predominance of benign lesions over malignant ones.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implication:&lt;/strong&gt; Standard accuracy metrics are insufficient. Models must be evaluated with metrics sensitive to class imbalance (e.g., ROC-AUC, F1-score, sensitivity). Techniques such as resampling or class weighting may be necessary.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lesion Type Variation&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Evidence:&lt;/strong&gt; Figure 1 shows uneven distribution of masses versus calcifications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implication:&lt;/strong&gt; Feature extraction and ML models may require tailored approaches for each lesion type. For example, texture-based radiomics features may perform differently on masses compared to calcifications.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
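&lt;p&gt;The first challenge is easy to demonstrate with a toy calculation (made-up counts, plain Python): on a test set with 95 benign and 5 malignant cases, a degenerate classifier that predicts "benign" for everything reaches 95% accuracy while detecting zero malignancies.&lt;/p&gt;

```python
# Imbalanced toy test set: 95 benign (label 0), 5 malignant (label 1).
y_true = [0] * 95 + [1] * 5
# Majority-class classifier: predicts benign for every case.
y_pred = [0] * 100

# Confusion-matrix cells, counted directly.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)   # 0.95 -- looks excellent
sensitivity = tp / (tp + fn)         # 0.0  -- clinically useless

print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}")
```

&lt;p&gt;This is exactly why imbalance-aware metrics (ROC-AUC, F1, sensitivity) must accompany, or replace, raw accuracy in this setting.&lt;/p&gt;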

&lt;h3&gt;
  
  
  &lt;strong&gt;6. Relevance for Radiomics and Machine Learning&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The exploration of CBIS-DDSM is not merely a preliminary step; it establishes the foundation for the entire radiomics and machine learning workflow. Each insight gained informs subsequent decisions and ensures that the models and features extracted are both &lt;strong&gt;robust and clinically meaningful&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Class Imbalance Awareness&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;The observed predominance of benign lesions (Figure 2) directly impacts model training. Without addressing this imbalance, ML models are likely to bias toward the majority class, producing inflated accuracy but poor detection of malignant lesions.&lt;/li&gt;
&lt;li&gt;This insight informed the decision to incorporate &lt;strong&gt;class weighting&lt;/strong&gt;, &lt;strong&gt;resampling techniques&lt;/strong&gt;, and &lt;strong&gt;sensitive evaluation metrics&lt;/strong&gt; (ROC-AUC, F1-score, sensitivity), ensuring that the model’s predictive performance reflects clinical relevance rather than statistical bias.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lesion Type Considerations&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Figure 1 demonstrates the uneven distribution of masses versus calcifications. Each lesion type presents distinct textural and morphological characteristics.&lt;/li&gt;
&lt;li&gt;Consequently, the feature extraction process (radiomics) must account for these differences. Certain features, such as texture or shape descriptors, may be more informative for one lesion type than another. This consideration guides both &lt;strong&gt;feature selection&lt;/strong&gt; and &lt;strong&gt;model interpretability&lt;/strong&gt;, ensuring that extracted radiomics features correspond to meaningful clinical phenomena.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implications for Feature Extraction and ML Model Design&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;A thorough understanding of these dataset characteristics allows for &lt;strong&gt;tailored preprocessing pipelines&lt;/strong&gt;, informed &lt;strong&gt;feature selection&lt;/strong&gt;, and appropriate &lt;strong&gt;model evaluation strategies&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Without this exploration, radiomics features could be biased, unrepresentative, or noisy, leading to suboptimal ML performance and reduced clinical interpretability.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
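&lt;p&gt;The class weighting mentioned above can be sketched with the standard inverse-frequency heuristic (the same formula behind scikit-learn's &lt;code&gt;class_weight="balanced"&lt;/code&gt;): each class receives weight n_samples / (n_classes * n_class). The counts below are illustrative, not the actual CBIS-DDSM figures.&lt;/p&gt;

```python
# Illustrative label counts (not the real CBIS-DDSM numbers).
class_counts = {"benign": 800, "malignant": 200}

n_samples = sum(class_counts.values())   # 1000
n_classes = len(class_counts)            # 2

# Inverse-frequency weights: the rare class gets a proportionally larger
# weight, so both classes contribute equally to the training loss.
weights = {c: n_samples / (n_classes * n) for c, n in class_counts.items()}

print(weights)  # {'benign': 0.625, 'malignant': 2.5}
```

&lt;p&gt;These weights plug directly into most loss functions or estimator constructors, which keeps the fix at the training stage rather than requiring the data itself to be resampled.&lt;/p&gt;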

&lt;p&gt;In summary, the exploration stage bridges the gap between &lt;strong&gt;raw clinical data&lt;/strong&gt; and &lt;strong&gt;quantitative, analyzable features&lt;/strong&gt;. It ensures that all subsequent steps — from radiomics extraction to model training and interpretation — are grounded in a &lt;strong&gt;well-characterized, reliable dataset&lt;/strong&gt;, enhancing both &lt;strong&gt;scientific rigor&lt;/strong&gt; and &lt;strong&gt;clinical applicability&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;7. Conclusion&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The exploration of CBIS-DDSM underscores the critical importance of &lt;strong&gt;systematic dataset characterization&lt;/strong&gt; in radiomics and machine learning research. Key lessons include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dataset richness and limitations&lt;/strong&gt;: CBIS-DDSM offers a valuable resource with thousands of annotated mammograms, yet presents challenges such as class imbalance, lesion variability, and image heterogeneity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Impact on downstream analysis&lt;/strong&gt;: Each observed feature of the dataset informs preprocessing, feature extraction, model design, and evaluation. Ignoring these factors can compromise both predictive performance and clinical relevance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Foundation for reproducible research&lt;/strong&gt;: By carefully documenting dataset characteristics and exploration steps, other researchers can reproduce the pipeline and validate findings, in alignment with &lt;strong&gt;open science principles&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Next Steps in the Series&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This first post establishes a comprehensive understanding of the data that underpins all subsequent research. In &lt;strong&gt;Part 2&lt;/strong&gt;, I will detail &lt;strong&gt;preprocessing mammograms for radiomics analysis&lt;/strong&gt;, including steps for cleaning, normalizing, and preparing images for feature extraction.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>computervision</category>
      <category>ai</category>
      <category>science</category>
    </item>
  </channel>
</rss>
