Computer Vision Toolkit
A production-ready collection of image classification, object detection, and segmentation pipelines built on PyTorch. This toolkit provides battle-tested data augmentation strategies, evaluation frameworks with per-class metrics, and modular architectures that let you swap backbones, heads, and loss functions without rewriting training loops. Whether you're prototyping a classifier on a small dataset or scaling object detection across multiple GPUs, this toolkit gives you the scaffolding to move fast.
Key Features
- Modular Model Factory — Instantiate ResNet, EfficientNet, or Vision Transformer backbones with a single config change. Supports pretrained weights from torchvision and timm.
- Augmentation Pipeline — Albumentations-based transforms with preset profiles for medical imaging, satellite imagery, and general photography.
- Multi-Task Heads — Classification, detection (anchor-based and anchor-free), and semantic segmentation heads that attach to any backbone.
- Evaluation Dashboard — Computes accuracy, mAP, IoU, precision-recall curves, and confusion matrices. Exports results as JSON and PNG.
- Dataset Adapters — Plug-and-play loaders for COCO, VOC, ImageNet-style folders, and custom CSV-based annotation formats.
- Mixed Precision & DDP — Built-in support for torch.cuda.amp and DistributedDataParallel with zero code changes.
- Experiment Logging — Native integration points for MLflow, Weights & Biases, and TensorBoard.
- Export Pipeline — Convert trained models to ONNX and TorchScript for deployment.
Quick Start
```shell
# Extract and install dependencies
unzip computer-vision-toolkit.zip && cd computer-vision-toolkit
pip install -r requirements.txt

# Run image classification training
python src/computer_vision_toolkit/core.py --config config.example.yaml
```
```yaml
# config.example.yaml
task: classification

backbone:
  name: resnet50
  pretrained: true
  freeze_layers: 4

data:
  train_dir: ./data/train/
  val_dir: ./data/val/
  image_size: 224
  batch_size: 32
  num_workers: 4
  augmentation_profile: standard  # standard | medical | satellite

training:
  epochs: 50
  optimizer: adamw
  learning_rate: 0.001
  weight_decay: 0.01
  scheduler: cosine_annealing
  warmup_epochs: 5
  mixed_precision: true

evaluation:
  metrics: [accuracy, f1, precision, recall]
  save_confusion_matrix: true
  save_pr_curves: true
```
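The nested config above is addressed elsewhere in this document with dotted keys such as `backbone.name`. A minimal sketch of how such dotted lookups could resolve against the nested structure — the `get` helper and the inline dict are illustrative, not part of the toolkit's API:

```python
# Hypothetical dotted-key lookup over a nested config dict mirroring
# config.example.yaml; not the toolkit's actual config loader.
config = {
    "task": "classification",
    "backbone": {"name": "resnet50", "pretrained": True, "freeze_layers": 4},
    "training": {"learning_rate": 0.001, "scheduler": "cosine_annealing"},
}

def get(cfg, dotted_key, default=None):
    """Walk a nested dict using a dotted path, e.g. 'backbone.name'."""
    node = cfg
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node

print(get(config, "backbone.name"))       # resnet50
print(get(config, "training.scheduler"))  # cosine_annealing
```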
Architecture
```
┌─────────────┐      ┌──────────────┐      ┌───────────────┐
│ DataLoader  │─────>│   Backbone   │─────>│   Task Head   │
│ + Augment   │      │ (ResNet/ViT) │      │ (Cls/Det/Seg) │
└─────────────┘      └──────────────┘      └───────┬───────┘
                                                   │
              ┌──────────────┐             ┌────────▼───────┐
              │   Exporter   │<────────────│   Evaluator    │
              │ (ONNX/Script)│             │  (mAP/IoU/F1)  │
              └──────────────┘             └────────────────┘
```
The toolkit follows a pipeline pattern: each stage (data loading, augmentation, model forward pass, loss computation, evaluation) is a swappable component registered in a factory. You configure which components to use in YAML; the runner assembles them at startup.
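The registry-plus-factory idea described above can be sketched in a few lines. This is a minimal illustration of the pattern, with invented names (`REGISTRY`, `register`, `build`) — not the toolkit's actual internals:

```python
# Minimal component registry: classes register themselves under a
# (kind, name) key, and a factory builds them from config strings.
REGISTRY = {}

def register(kind, name):
    """Decorator that records a component class in the registry."""
    def wrap(cls):
        REGISTRY[(kind, name)] = cls
        return cls
    return wrap

@register("backbone", "resnet50")
class ResNet50Backbone:
    def __init__(self, pretrained=True):
        self.pretrained = pretrained

def build(kind, name, **kwargs):
    """Look up a registered component and instantiate it."""
    return REGISTRY[(kind, name)](**kwargs)

# The runner would call this with values read from the YAML config:
backbone = build("backbone", "resnet50", pretrained=False)
```

Swapping a component then means changing one string in YAML rather than editing the training loop.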
Usage Examples
Image Classification with Custom Dataset
```python
from computer_vision_toolkit.core import ModelFactory, Trainer
from computer_vision_toolkit.utils import build_dataloaders

# Build model from config
model = ModelFactory.create(
    backbone="efficientnet_b3",
    task="classification",
    num_classes=10,
    pretrained=True,
)

# Create dataloaders with augmentation
train_loader, val_loader = build_dataloaders(
    train_dir="./data/train",
    val_dir="./data/val",
    image_size=300,
    batch_size=16,
    augmentation_profile="standard",
)

# Train with mixed precision
trainer = Trainer(model, train_loader, val_loader, mixed_precision=True)
trainer.fit(epochs=30, lr=1e-3)
trainer.export_onnx("model.onnx")
```
Object Detection Evaluation
```python
from computer_vision_toolkit.core import Evaluator

evaluator = Evaluator(task="detection")
results = evaluator.run(
    model=model,
    dataloader=test_loader,
    iou_thresholds=[0.5, 0.75],
)

print(f"mAP@0.5: {results['mAP_50']:.4f}")
print(f"mAP@0.75: {results['mAP_75']:.4f}")

evaluator.save_report("./results/detection_report.json")
```
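The `iou_thresholds` above control when a predicted box counts as a true positive: a detection matches a ground-truth box if their intersection-over-union meets the threshold. A sketch of that underlying IoU computation (this mirrors the standard metric, not the Evaluator's actual code; boxes are assumed to be `(x1, y1, x2, y2)` tuples):

```python
# Intersection-over-union for axis-aligned boxes in (x1, y1, x2, y2) form.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap is clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# A prediction is a true positive at threshold t when iou(...) >= t.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333... — misses mAP@0.5
```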
Configuration Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| task | str | classification | Task type: classification, detection, or segmentation |
| backbone.name | str | resnet50 | Backbone architecture name |
| backbone.pretrained | bool | true | Use pretrained ImageNet weights |
| data.image_size | int | 224 | Input image resolution |
| training.mixed_precision | bool | true | Enable FP16 automatic mixed precision |
| training.scheduler | str | cosine_annealing | LR scheduler: step, cosine_annealing, or one_cycle |
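Combining the `cosine_annealing` scheduler with the `warmup_epochs` key from the example config gives a schedule like the sketch below: linear warmup, then cosine decay. The formula is the standard one; the toolkit's exact implementation may differ in details such as the minimum LR:

```python
import math

# Linear warmup followed by cosine annealing, using the example config's
# values (learning_rate: 0.001, warmup_epochs: 5, epochs: 50).
def lr_at(epoch, base_lr=1e-3, warmup_epochs=5, total_epochs=50, min_lr=0.0):
    if epoch < warmup_epochs:
        # Ramp linearly from base_lr/warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

for epoch in (0, 4, 5, 25, 49):
    print(epoch, lr_at(epoch))
```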
Best Practices
- Start with a pretrained backbone — Transfer learning beats training from scratch on datasets under 100K images. Freeze early layers and fine-tune later layers first.
- Use progressive resizing — Train at 128px first, then 224px, then 384px. Each stage converges faster and regularizes the model.
- Profile your data pipeline — Use torch.utils.data.DataLoader with pin_memory=True and tune num_workers to saturate GPU utilization.
- Log everything — Track augmentation parameters, learning rate schedules, and validation metrics per epoch. Reproducibility saves debugging time.
- Validate augmentation visually — Always plot augmented samples before training. Aggressive augmentation can destroy label-relevant features.
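The progressive-resizing practice above amounts to a staged schedule of image sizes per epoch. A minimal sketch, with illustrative stage boundaries (the 10/25/15 split is an assumption, not a toolkit default):

```python
# Progressive resizing: expand (image_size, epochs) stages into a
# per-epoch size list that a training loop could consult.
stages = [(128, 10), (224, 25), (384, 15)]  # hypothetical 50-epoch split

def schedule(stages):
    """Flatten (size, epochs) stages into one image size per epoch."""
    return [size for size, epochs in stages for _ in range(epochs)]

sizes = schedule(stages)
print(len(sizes))                       # 50 epochs total
print(sizes[0], sizes[10], sizes[35])   # 128 224 384
```

Each stage starts from the previous stage's weights, so the low-resolution epochs act as cheap pretraining for the high-resolution ones.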
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| CUDA out of memory | Batch size too large for GPU VRAM | Reduce batch_size, enable mixed_precision, or use gradient accumulation |
| Val accuracy plateaus early | Learning rate too high or no warmup | Add warmup_epochs: 5 and reduce initial LR by 10x |
| Data loader bottleneck | CPU can't prepare batches fast enough | Increase num_workers, enable pin_memory, use SSD storage |
| ONNX export fails | Unsupported ops in model | Replace dynamic control flow with torch.jit.script compatible patterns |
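The gradient-accumulation fix mentioned for CUDA OOM works by running several small micro-batches and taking one optimizer step on their averaged gradients. A pure-Python stand-in for the idea — in a real PyTorch loop the scaled loss would call `backward()` per micro-batch and `optimizer.step()` would fire at the boundary:

```python
# Gradient accumulation sketch: scale each micro-batch gradient by
# 1/accum_steps so the accumulated sum equals the full-batch mean,
# then "step" once every accum_steps micro-batches.
def accumulate(micro_batch_grads, accum_steps):
    steps = []
    buffer = 0.0
    for i, g in enumerate(micro_batch_grads, start=1):
        buffer += g / accum_steps      # scale so the sum is the mean
        if i % accum_steps == 0:
            steps.append(buffer)       # optimizer.step() would fire here
            buffer = 0.0               # then optimizer.zero_grad()
    return steps

# Four micro-batches with accum_steps=4 behave like one large batch:
print(accumulate([1.0, 2.0, 3.0, 2.0], 4))  # [2.0]
```

This keeps the effective batch size (micro-batch size × accum_steps) constant while VRAM only has to hold one micro-batch at a time.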
This is 1 of 11 resources in the ML Engineer Toolkit. Get the complete Computer Vision Toolkit with all files, templates, and documentation for $39.
Or grab the entire ML Engineer Toolkit bundle (11 products) for $149 — save 30%.