Vladimir Iglovikov

Albumentations in Life Sciences: Who Actually Uses It

Albumentations is shared image-augmentation infrastructure for life-sciences AI. It shows up in radiology, histopathology, microscopy, endoscopy, ophthalmology, infectious-disease imaging, neuroscience imaging, and cell-analysis workflows.

This post is the receipts: how many life-sciences papers cite it, which OSS library declares it as a direct dependency, which named organizations import it in public repositories, and where it appears in public Hugging Face model and dataset cards.

All numbers below come from an internal evidence pipeline over public sources: citation metadata, GitHub Code Search, the Hugging Face Hub, and root-level packaging files (requirements.txt, pyproject.toml, etc.) in each OSS repo. The derived CSVs used for this audit are not published with the blog post, so treat the tables as an evidence brief rather than a fully self-contained replication package. The org-scoped GitHub query is org:<name> "import albumentations".

Headline

  • 563 life-sciences papers cite Albumentations
  • 1 OSS life-sciences library declares it as a direct dependency
  • 12 public repositories across 3 named life-sciences organizations import it
  • 33 Hugging Face artifacts in the life-sciences / biomedical-imaging tag space reference it

"Albumentations" here means the project stewarded by Albumentations LLC: the legacy MIT albumentations package (archived June 2025) plus the maintained successor albumentationsx (AGPL-3.0 + Commercial), which preserves API compatibility. See the dual-licensing post for context.

This is broader than the earlier medical-imaging audit. "Life sciences" includes clinical imaging, but also bioimage analysis, microscopy, cell biology, infectious-disease imaging, neuroscience imaging, and high-content screening.

Why Life Sciences Pulls in an Augmentation Library at All

Life-sciences image data is messy in a very specific way. It is not just "photos, but harder." A training sample might be a pathology tile, a fluorescence microscopy stack, a phase-contrast video frame, an OCT slice, a retinal image, a CT patch, a bacterial colony image, a cell mask, a polyp box, a landmarked organ view, or a multichannel assay plate.

Three details make augmentation infrastructure matter:

  1. Labels and images have to move together. Masks for nuclei, organs, lesions, cells, plaques, cysts, vessels, and tissue regions have to stay pixel-aligned with the image. The same is true for bounding boxes and keypoints. Albumentations is built around Compose over (image, mask, bboxes, keypoints), which is why it appears in segmentation, detection, and measurement pipelines rather than only in image-classification scripts.
  2. The valid invariances are biological and clinical, not generic. A square symmetry can be reasonable for histology tiles or microscopy crops. A horizontal flip can be wrong for laterality-sensitive radiology, ophthalmology, or surgical-orientation tasks. Brightness and contrast jitter may model staining, illumination, or scanner variation, but it is not a substitute for physics-aware acquisition modeling. The library gives you the mechanism; the domain decides what variation preserves the label.
  3. Multichannel throughput matters. Life-sciences data often goes beyond RGB: fluorescence channels, CT window stacks, multispectral microscopy, derived masks, and auxiliary channels. Augmentation usually runs CPU-side inside a data loader and has to feed the GPU. In the current 9-channel CPU benchmark, AlbumentationsX is fastest on 30 of 42 transforms, with pairwise wins on 33 of 41 transforms vs Kornia and 15 of 23 transforms vs Torchvision. That benchmark is not a biomedical benchmark by itself, but the arbitrary-channel constraint is directly relevant to life-sciences workflows.

Concretely, a conservative microscopy or pathology segmentation pipeline can look like this:

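import albumentations as A
import numpy as np

# Image and mask must share height and width (at least 512 px per side for the crop below).
image = np.load("microscopy_tile.npy")
mask = np.load("cell_mask.npy")

transform = A.Compose([
    # Fixed-size training tile.
    A.RandomCrop(height=512, width=512),
    # One of the 8 rotations/reflections of the square; assumes no canonical "up" direction.
    A.SquareSymmetry(p=1.0),
    # Mild geometric jitter.
    A.Affine(
        scale=(0.9, 1.1),
        translate_percent=(-0.03, 0.03),
        rotate=(-10, 10),
        shear=(-3, 3),
        p=0.5,
    ),
    # Small photometric jitter for illumination or staining variation (image only).
    A.RandomBrightnessContrast(
        brightness_range=(-0.08, 0.08),
        contrast_range=(-0.08, 0.08),
        p=0.4,
    ),
    # Low-amplitude Gaussian noise (image only).
    A.GaussNoise(std_range=(0.01, 0.04), p=0.2),
])

# Spatial transforms use the same sampled parameters for image and mask.
out = transform(image=image, mask=mask)
tile, label = out["image"], out["mask"]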

In order, that pipeline is RandomCrop -> SquareSymmetry -> Affine -> RandomBrightnessContrast -> GaussNoise. For tissue patches, microscopy tiles, or cell-imaging crops, square symmetries are often defensible because there is no canonical camera-up direction. For chest X-ray, retinal laterality, surgical views, or acquisition-protocol-sensitive tasks, the same transform can be a bug.
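
To make the "labels and images move together" point concrete without any library, here is a minimal dependency-free sketch of square symmetries on nested lists. The helper names (rot90, hflip, square_symmetry, augment_pair, is_aligned) are mine for illustration and are not the Albumentations implementation. The toy mask is derived from the image so alignment can be checked after the transform.

```python
import random


def rot90(grid):
    """Rotate a 2D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def hflip(grid):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in grid]


def square_symmetry(grid, quarter_turns, flip):
    """Apply one of the 8 symmetries of the square: optional mirror, then k quarter turns."""
    out = hflip(grid) if flip else [list(row) for row in grid]
    for _ in range(quarter_turns % 4):
        out = rot90(out)
    return out


def augment_pair(image, mask, rng):
    """Sample the symmetry once, then apply it to both image and mask."""
    quarter_turns = rng.randrange(4)
    flip = rng.random() < 0.5
    return (
        square_symmetry(image, quarter_turns, flip),
        square_symmetry(mask, quarter_turns, flip),
    )


def is_aligned(image, mask, threshold=50):
    """True if every mask pixel still matches the rule that produced it."""
    return all(
        (value >= threshold) == bool(label)
        for image_row, mask_row in zip(image, mask)
        for value, label in zip(image_row, mask_row)
    )


# Toy 3x3 "tile"; the mask marks pixels with intensity >= 50.
image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
mask = [[1 if value >= 50 else 0 for value in row] for row in image]

rng = random.Random(0)
aug_image, aug_mask = augment_pair(image, mask, rng)
aligned = is_aligned(aug_image, aug_mask)
```

The part that matters is sampling the parameters once and applying them to both arrays. Sampling independently for image and mask would silently break the labels.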

The same Compose pipeline would also accept bboxes=... and keypoints=... and keep them aligned.
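
What "keep them aligned" means for a box can be sketched in a few lines. This is an illustration in plain Python, assuming boxes in (x_min, y_min, x_max, y_max) pixel-edge coordinates; the function names are hypothetical and this is not the library's internal code. The check is that flipping the box gives the same result as flipping the mask and recomputing the box.

```python
def hflip_mask(mask):
    """Mirror a binary mask left to right."""
    return [row[::-1] for row in mask]


def hflip_box(box, image_width):
    """Mirror an (x_min, y_min, x_max, y_max) box; the left and right edges swap roles."""
    x_min, y_min, x_max, y_max = box
    return (image_width - x_max, y_min, image_width - x_min, y_max)


def box_from_mask(mask):
    """Tight box around nonzero pixels, using pixel-edge coordinates."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


# Toy 3x6 mask with one object in the upper left.
mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
image_width = len(mask[0])

box = box_from_mask(mask)
flipped_box = hflip_box(box, image_width)
box_of_flipped_mask = box_from_mask(hflip_mask(mask))
```

If the two results ever disagree, the box transform and the pixel transform are using different conventions, which is exactly the class of bug a shared pipeline is meant to prevent.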

OSS Life-Sciences Libraries That Depend on Albumentations

These are repository-rooted facts. The dependency is declared in packaging files, not inferred from a citation graph or README mention.

Of 18 verified life-sciences OSS projects, 1 project declares albumentations as a direct dependency:

| Library | Org | Evidence file(s) | Repo |
|---|---|---|---|
| TIAToolbox | Tissue Image Analytics Centre | requirements/requirements.txt | TissueImageAnalytics/tiatoolbox |

TIAToolbox matters because it is a reusable pathology toolkit, not a one-off experiment repository. Direct dependency counts are conservative by design. They miss internal pharmaceutical, hospital, biotechnology, and research pipelines, plus public repositories that import Albumentations in training scripts without packaging it as a reusable library.
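
A "declared in packaging files" check can be sketched as follows. This is a simplified illustration, not the audit's actual script: the function names are mine, the sample requirements text is invented (it is not TIAToolbox's real file), and the parser handles only plain requirement lines, not every form pip accepts.

```python
import re

# Package names that count as the Albumentations project in this post.
TARGETS = {"albumentations", "albumentationsx"}


def normalize(name):
    """Normalize a package name: lowercase, runs of - _ . collapsed to a hyphen."""
    return re.sub(r"[-_.]+", "-", name).lower()


def direct_dependencies(requirements_text):
    """Collect package names from simple requirements.txt-style lines."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and options such as -r
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


# Hypothetical file contents, for illustration only.
sample_requirements = """\
numpy>=1.23
albumentations>=1.3.0  # augmentation
-r base.txt
# a comment line
scikit_image
"""

names = direct_dependencies(sample_requirements)
declares_albumentations = bool(names & TARGETS)
```

A check like this only sees what is declared at the repository root, which is why the resulting count is conservative.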

Named Life-Sciences Organizations Using It

Org-scoped GitHub Code Search (org:<name> "import albumentations") found import albumentations in 12 repositories across 3 organizations from a hand-curated tier-1 life-sciences list: medical AI toolkits, bioimage-analysis projects, microscopy and pathology tooling, clinical-imaging OSS, and life-science research labs.

| Organization | Repos | Type |
|---|---|---|
| MIC-DKFZ | 9 | Organization |
| bowang-lab | 2 | Organization |
| TissueImageAnalytics | 1 | Organization |

MIC-DKFZ is the largest public-code cluster in this audit. TissueImageAnalytics is the clearest reusable-library signal because TIAToolbox declares Albumentations as a dependency and imports it in stain-augmentation tooling. bowang-lab contributes public medical and biological imaging training code where Albumentations appears in the data pipeline.
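
For reference, the org-scoped query strings for the three organizations above can be built like this. It is a small sketch: the helper name org_query is mine, and the URL-encoding step only shows how such a query would be passed as a q parameter, not how the internal pipeline calls GitHub.

```python
from urllib.parse import urlencode

# Organizations from the table above.
ORGS = ["MIC-DKFZ", "bowang-lab", "TissueImageAnalytics"]


def org_query(org, needle="import albumentations"):
    """Build the org-scoped code-search query used in this post."""
    return f'org:{org} "{needle}"'


queries = [org_query(org) for org in ORGS]

# URL-encoded form of each query, for example as a search query string.
encoded = [urlencode({"q": query, "type": "code"}) for query in queries]
```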

A representative path list from the search:

Repo File
MIC-DKFZ/AGGC2022 data/test_augs.py
MIC-DKFZ/BodyPartRegression bpreg/preprocessing/nrrd2npy.py
MIC-DKFZ/diabetes-xai feature_extraction/extract_features_fp_timm.py
MIC-DKFZ/generalized_yolov5 utils/augmentations.py
MIC-DKFZ/help_a_hematologist_out_challenge augmentation/policies/cifar.py
MIC-DKFZ/image_classification src/glovita/augmentation/policies/dataset_specific/aid.py
MIC-DKFZ/perovskite-xai data/augmentations/perov_2d.py
MIC-DKFZ/radioactive src/radioa/model/SAMMed2D.py
MIC-DKFZ/semantic_segmentation src/semantic_segmentation/datasets/base_dataset.py
TissueImageAnalytics/tiatoolbox tiatoolbox/tools/stainaugment.py
bowang-lab/EchoJEPA data/batch_depth_attenuation.py
bowang-lab/MedSAMSlicer MedSAMLite/Resources/server_essentials/medsam_interface/engines/src/data/medsam_datamodule.py

Academic Citations

Albumentations is cited by 563 unique life-sciences / biomedical-imaging papers. The count is filtered from an internal citation export containing 2,470 unique citing papers and 12,371 author-paper-affiliation rows.

The citation data is deduplicated by paper URL, with paper title as fallback. That detail matters because the raw citation export contains one row per (paper × author × affiliation), so counting rows would overstate adoption.
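
The deduplication rule can be sketched like this. The field names (paper_url, title, author, affiliation), the sample rows, and the normalization choices (lowercasing, collapsing whitespace) are assumptions for illustration; the internal export may use a different schema.

```python
# Hypothetical export rows: one row per (paper, author, affiliation).
rows = [
    {"paper_url": "https://example.org/p1", "title": "Nuclei segmentation", "author": "A", "affiliation": "X"},
    {"paper_url": "https://example.org/p1", "title": "Nuclei segmentation", "author": "B", "affiliation": "Y"},
    {"paper_url": "", "title": "Polyp  detection ", "author": "C", "affiliation": "Z"},
    {"paper_url": None, "title": "polyp detection", "author": "D", "affiliation": "Z"},
    {"paper_url": "https://example.org/p3", "title": "OCT layer analysis", "author": "E", "affiliation": "X"},
]


def paper_key(row):
    """Identify a paper by URL when present, otherwise by normalized title."""
    url = (row.get("paper_url") or "").strip().lower()
    if url:
        return ("url", url)
    title = " ".join((row.get("title") or "").lower().split())
    return ("title", title)


unique_papers = {paper_key(row) for row in rows}
row_count = len(rows)
paper_count = len(unique_papers)
```

In this toy export, five rows collapse to three papers, which is the same effect that separates the 12,371 rows from the 2,470 unique citing papers.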

Year-over-Year Growth

Year Life-sciences papers citing Albumentations
2020 18
2021 40
2022 78
2023 97
2024 113
2025 148
2026 (through May 12) 69

The visible pattern is steady growth through 2025, and the partial 2026 count already stands at 69 papers as of May 12. The conservative interpretation is simple: life-sciences ML papers increasingly publish code, increasingly use standard augmentation libraries instead of local one-off transforms, and increasingly cite the tooling that sits in the training pipeline.
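
The arithmetic behind the table is easy to reproduce. The sketch below uses the counts from the table, checks that they add up to the headline total, and computes year-over-year change for full years only, since 2026 is a partial year. The variable names are mine.

```python
# Counts copied from the table above; 2026 covers only part of the year.
papers_by_year = {2020: 18, 2021: 40, 2022: 78, 2023: 97, 2024: 113, 2025: 148, 2026: 69}
PARTIAL_YEAR = 2026

total = sum(papers_by_year.values())

# Compare each full year with the year before it.
full_years = [year for year in sorted(papers_by_year) if year != PARTIAL_YEAR]
growth_pct = {
    year: round(
        (papers_by_year[year] - papers_by_year[year - 1]) / papers_by_year[year - 1] * 100,
        1,
    )
    for year in full_years[1:]
}

for year, pct in growth_pct.items():
    print(f"{year}: {pct:+.1f}% vs {year - 1}")
print(f"total: {total}")
```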

Top-Cited Life-Sciences Papers (Sample)

The truncated titles are exactly what the citation export returned in this audit. The point of the table is not bibliographic polish; it is a concrete sample of life-sciences papers where Albumentations appears in the citation trail.

Top Affiliations

Affiliations with at least three life-sciences papers in the filtered citation set:

Affiliation Papers
Radboud University Medical Center 6
University of Electronic Science and Technology of China 6
University College London 5
University of Pennsylvania 5
Memorial Sloan Kettering Cancer Center 4
Technical University of Munich 4
University of Oxford 4
University of Ulsan College of Medicine, Seoul 4
Affiliated Hospital of Hubei University of Arts and Science 3
Beihang University 3
Case Western Reserve University 3
Chinese Academy of Sciences, Shenzhen 3
Chulalongkorn University 3
Concordia University 3
First Affiliated Hospital of Jinan University 3

Hugging Face Ecosystem

Across Hugging Face Hub artifacts tagged medical / medical-imaging / radiology / histopathology / microscopy / healthcare / biology / bioimage / cell-segmentation / drug-discovery, 33 artifacts reference Albumentations in their model or dataset card: 32 models and 1 dataset.

The absolute download counts are small for most of these cards, which is normal for specialized biomedical artifacts on Hugging Face. The useful signal is not popularity ranking. The useful signal is that Albumentations appears in public training recipes across radiology, histopathology, endoscopy, pressure-sore classification, polyp segmentation, cell segmentation, and related biomedical tasks. The table below lists 20 of the 33 artifacts.

| Kind | ID | Downloads | Likes | Tags |
|---|---|---|---|---|
| model | Snarcy/RedDino-large | 423 | 1 | medical-imaging |
| dataset | LosHuesitos9-9/Huesitos | 66 | 1 | medical |
| model | Lab-Rasool/PRIMER | 9 | 1 | radiology |
| model | ibrahim313/ducknet-polyp-segmentation | 4 | 1 | medical-imaging |
| model | RuthvikBandari/DiaFootAI | 4 | 0 | medical-imaging |
| model | Thiyaga158/Custom_CNN_For_Pneumonia_Detection_Using_Check_X-Ray | 0 | 0 | healthcare; medical-imaging |
| model | dheeren-tejani/DiabeticRetinpathyClassifier | 0 | 0 | medical-imaging |
| model | adelelsayed1991/chexpert-mae-densenet-fpn | 0 | 0 | healthcare; medical-imaging |
| model | ayanahmedkhan/VIT-gi-endoscopy-classifier | 0 | 0 | medical-imaging |
| model | RuthvikBandari/DiaFoot.AI-v2 | 0 | 0 | medical-imaging |
| model | tanishq74/retinasense-vit | 0 | 1 | medical-imaging |
| model | MrCzaro/Pressure_sore_cascade_classifier_Torch | 0 | 0 | medical-imaging |
| model | csmp-hub/cellpose-histo-hgsc-nuc-v1 | 0 | 0 | histopathology |
| model | csmp-hub/hovernet-histo-hgsc-nuc-v1 | 0 | 0 | histopathology |
| model | csmp-hub/stardist-histo-hgsc-nuc-v1 | 0 | 0 | histopathology |
| model | csmp-hub/cellvit-histo-hgsc-nuc-v1 | 0 | 0 | histopathology |
| model | csmp-hub/cppnet-histo-hgsc-nuc-v1 | 0 | 0 | histopathology |
| model | histolytics-hub/hovernet-histo-hgsc-pan-v1 | 0 | 0 | histopathology |
| model | histolytics-hub/cellpose-histo-hgsc-pan-v1 | 0 | 0 | histopathology |
| model | histolytics-hub/stardist-histo-hgsc-pan-v1 | 0 | 0 | histopathology |

Life-Sciences Subcategory Rollup

A single paper or repository can match more than one subcategory, so these are evidence rollups rather than mutually exclusive totals.
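
Why the rollups are not mutually exclusive is easiest to see with a toy tagger. The keyword lists and paper titles below are invented for illustration and are not the audit's real matching rules; the point is only that one paper can increment more than one subcategory.

```python
import re
from collections import Counter

# Hypothetical keyword sets; the real audit's rules are not published.
SUBCATEGORY_KEYWORDS = {
    "Radiology and clinical imaging": {"ct", "mri", "x-ray"},
    "Histopathology and digital pathology": {"histopathology", "whole-slide"},
    "Microscopy and bioimage analysis": {"microscopy", "fluorescence"},
}

# Invented titles for illustration.
papers = [
    "Deep learning for CT lesion detection",
    "Fluorescence microscopy of histopathology sections",
    "MRI and histopathology correlation in glioma",
]


def subcategories(title):
    """Return every subcategory whose keywords appear as whole tokens in the title."""
    tokens = set(re.findall(r"[a-z0-9-]+", title.lower()))
    return [name for name, keywords in SUBCATEGORY_KEYWORDS.items() if tokens & keywords]


counts = Counter(name for title in papers for name in subcategories(title))
total_matches = sum(counts.values())
```

Here three papers produce five subcategory matches, so the per-subcategory counts should be read as coverage, not as a partition of the paper set.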

Academic Papers

Subcategory Count
Radiology and clinical imaging 259
Biomedical imaging 75
Microscopy and bioimage analysis 56
Histopathology and digital pathology 44
Infectious disease and immunology imaging 43
Neuroscience imaging 29
Cell and developmental biology imaging 6
Therapeutics discovery and high-content screening 2

Public Repositories

Subcategory Count
Histopathology and digital pathology 1

Hugging Face Artifacts

Subcategory Count
Histopathology and digital pathology 20
Microscopy and bioimage analysis 9
Radiology and clinical imaging 2

What This Means

Life-sciences image workflows depend on label-preserving transforms: microscopy channels, histopathology tiles, radiology slices, endoscopy frames, cell masks, organ masks, boxes, landmarks, and metadata all have to stay aligned. The public evidence above shows the Albumentations ecosystem acting as shared infrastructure across those workflows, not as a single-purpose medical-imaging script.

The most important caveat is that biological and clinical augmentation is less forgiving than generic computer vision. A transform can be technically correct and scientifically wrong. HorizontalFlip can be harmless for many tissue patches and harmful for laterality-sensitive tasks. RandomBrightnessContrast can model nuisance variation in illumination or staining, but it does not replace scanner or assay physics. ElasticTransform can help in some microscopy and histology segmentation settings and can destroy morphology in others.

Every organization named in the organization table above is a current, public-code user. Because TIAToolbox declares Albumentations as a direct dependency, anyone who installs TIAToolbox gets Albumentations installed as well. The 563-paper citation count is a lower bound because it only counts papers whose metadata explicitly contains life-sciences or biomedical-imaging keywords. It does not attempt to count private clinical, pharmaceutical, biotechnology, or research usage.

If you maintain a life-sciences OSS project, foundation model, or training pipeline and want to be added to or removed from this evidence set, ping me. The audit is scripted internally and can be rerun on request.


This brief is generated from an internal evidence pipeline over public APIs and public repository files. The derived artifacts are not published with this post. Last regenerated 2026-05-12.

Hero image: cropped and resized from An Image of Microorganisms by turek on Pexels.
