DEV Community

Thesius Code

Posted on • Originally published at datanest-stores.pages.dev

Computer Vision Toolkit

A production-ready collection of image classification, object detection, and segmentation pipelines built on PyTorch. This toolkit provides battle-tested data augmentation strategies, evaluation frameworks with per-class metrics, and modular architectures that let you swap backbones, heads, and loss functions without rewriting training loops. Whether you're prototyping a classifier on a small dataset or scaling object detection across multiple GPUs, this toolkit gives you the scaffolding to move fast.

Key Features

  • Modular Model Factory — Instantiate ResNet, EfficientNet, or Vision Transformer backbones with a single config change. Supports pretrained weights from torchvision and timm.
  • Augmentation Pipeline — Albumentations-based transforms with preset profiles for medical imaging, satellite imagery, and general photography.
  • Multi-Task Heads — Classification, detection (anchor-based and anchor-free), and semantic segmentation heads that attach to any backbone.
  • Evaluation Dashboard — Computes accuracy, mAP, IoU, precision-recall curves, and confusion matrices. Exports results as JSON and PNG.
  • Dataset Adapters — Plug-and-play loaders for COCO, VOC, ImageNet-style folders, and custom CSV-based annotation formats.
  • Mixed Precision & DDP — Built-in support for torch.cuda.amp and DistributedDataParallel with zero code changes.
  • Experiment Logging — Native integration points for MLflow, Weights & Biases, and TensorBoard.
  • Export Pipeline — Convert trained models to ONNX and TorchScript for deployment.

Quick Start

# Extract and install dependencies
unzip computer-vision-toolkit.zip && cd computer-vision-toolkit
pip install -r requirements.txt

# Run image classification training
python src/computer_vision_toolkit/core.py --config config.example.yaml
# config.example.yaml
task: classification
backbone:
  name: resnet50
  pretrained: true
  freeze_layers: 4

data:
  train_dir: ./data/train/
  val_dir: ./data/val/
  image_size: 224
  batch_size: 32
  num_workers: 4
  augmentation_profile: standard  # standard | medical | satellite

training:
  epochs: 50
  optimizer: adamw
  learning_rate: 0.001
  weight_decay: 0.01
  scheduler: cosine_annealing
  warmup_epochs: 5
  mixed_precision: true

evaluation:
  metrics: [accuracy, f1, precision, recall]
  save_confusion_matrix: true
  save_pr_curves: true
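At startup the runner presumably loads this YAML into nested dictionaries. As a rough sketch of how the training code might read values out of it — the `get_cfg` helper below is illustrative, not part of the toolkit's API — dot-path access with a fallback default keeps call sites tidy:

```python
# Minimal sketch of dot-path config access, assuming the YAML above has
# already been parsed into a nested dict (e.g. with yaml.safe_load).

def get_cfg(cfg: dict, path: str, default=None):
    """Fetch a nested value by dot-separated path, e.g. 'backbone.name'."""
    node = cfg
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node

# A subset of config.example.yaml, as it would look after parsing
config = {
    "task": "classification",
    "backbone": {"name": "resnet50", "pretrained": True, "freeze_layers": 4},
    "training": {"epochs": 50, "learning_rate": 0.001},
}

print(get_cfg(config, "backbone.name"))        # resnet50
print(get_cfg(config, "data.batch_size", 32))  # key absent, falls back to 32
```

Missing keys fall back to the supplied default instead of raising, which mirrors the "Default" column in the configuration reference below.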

Architecture

┌──────────────┐     ┌──────────────┐     ┌───────────────┐
│  DataLoader  │────>│   Backbone   │────>│   Task Head   │
│  + Augment   │     │ (ResNet/ViT) │     │ (Cls/Det/Seg) │
└──────────────┘     └──────────────┘     └───────┬───────┘
                                                  │
                     ┌──────────────┐     ┌───────▼───────┐
                     │   Exporter   │<────│   Evaluator   │
                     │ (ONNX/Script)│     │ (mAP/IoU/F1)  │
                     └──────────────┘     └───────────────┘

The toolkit follows a pipeline pattern: each stage (data loading, augmentation, model forward pass, loss computation, evaluation) is a swappable component registered in a factory. You configure which components to use in YAML; the runner assembles them at startup.
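The registry-plus-factory idea can be sketched in a few lines. The names below (`COMPONENT_REGISTRY`, `register`, `build`) are hypothetical stand-ins for the toolkit's internals, but the pattern is the standard one:

```python
# Illustrative sketch of a component registry: classes register themselves
# under a string key, and a factory builds them from config at startup.

COMPONENT_REGISTRY: dict = {}

def register(name: str):
    """Class decorator that adds a component to the registry."""
    def wrap(cls):
        COMPONENT_REGISTRY[name] = cls
        return cls
    return wrap

@register("resnet50")
class ResNet50Backbone:
    def __init__(self, pretrained: bool = True):
        self.pretrained = pretrained

def build(name: str, **kwargs):
    """Assemble a registered component by the name given in YAML."""
    return COMPONENT_REGISTRY[name](**kwargs)

backbone = build("resnet50", pretrained=True)
print(type(backbone).__name__)  # ResNet50Backbone
```

Because the YAML only carries string keys and keyword arguments, swapping `resnet50` for another registered backbone is a one-line config change, exactly as the feature list promises.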

Usage Examples

Image Classification with Custom Dataset

from computer_vision_toolkit.core import ModelFactory, Trainer
from computer_vision_toolkit.utils import build_dataloaders

# Build model from config
model = ModelFactory.create(
    backbone="efficientnet_b3",
    task="classification",
    num_classes=10,
    pretrained=True,
)

# Create dataloaders with augmentation
train_loader, val_loader = build_dataloaders(
    train_dir="./data/train",
    val_dir="./data/val",
    image_size=300,
    batch_size=16,
    augmentation_profile="standard",
)

# Train with mixed precision
trainer = Trainer(model, train_loader, val_loader, mixed_precision=True)
trainer.fit(epochs=30, lr=1e-3)
trainer.export_onnx("model.onnx")

Object Detection Evaluation

from computer_vision_toolkit.core import Evaluator

evaluator = Evaluator(task="detection")
results = evaluator.run(
    model=model,
    dataloader=test_loader,
    iou_thresholds=[0.5, 0.75],
)
print(f"mAP@0.5: {results['mAP_50']:.4f}")
print(f"mAP@0.75: {results['mAP_75']:.4f}")
evaluator.save_report("./results/detection_report.json")
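The `iou_thresholds` argument controls how strictly a predicted box must overlap a ground-truth box to count as a match. For readers new to detection metrics, here is a self-contained IoU computation for axis-aligned boxes in `(x1, y1, x2, y2)` format (independent of the toolkit's own evaluator):

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes overlapping by half their width: IoU = 50 / 150
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))  # ~0.333
```

At a threshold of 0.5 the pair above would not match; at 0.25 it would — which is why mAP@0.75 is always at or below mAP@0.5.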

Configuration Reference

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `task` | str | `classification` | Task type: `classification`, `detection`, `segmentation` |
| `backbone.name` | str | `resnet50` | Backbone architecture name |
| `backbone.pretrained` | bool | `true` | Use pretrained ImageNet weights |
| `data.image_size` | int | `224` | Input image resolution |
| `training.mixed_precision` | bool | `true` | Enable FP16 automatic mixed precision |
| `training.scheduler` | str | `cosine_annealing` | LR scheduler: `step`, `cosine_annealing`, `one_cycle` |
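The "Default" column implies that user config is overlaid onto built-in defaults rather than replacing them wholesale. A common way to do that — sketched here with a hypothetical `merge_config` helper, not the toolkit's actual code — is a recursive dictionary merge:

```python
def merge_config(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay user overrides onto default settings."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {
    "task": "classification",
    "backbone": {"name": "resnet50", "pretrained": True},
    "training": {"mixed_precision": True, "scheduler": "cosine_annealing"},
}
user = {"backbone": {"name": "efficientnet_b3"},
        "training": {"scheduler": "one_cycle"}}

cfg = merge_config(defaults, user)
print(cfg["backbone"])  # {'name': 'efficientnet_b3', 'pretrained': True}
print(cfg["training"]["mixed_precision"])  # True
```

Untouched defaults (like `pretrained` and `mixed_precision` above) survive the merge, so a user config only needs to list the keys it changes.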

Best Practices

  1. Start with a pretrained backbone — Transfer learning beats training from scratch on datasets under 100K images. Freeze the early layers and fine-tune only the later layers before unfreezing everything.
  2. Use progressive resizing — Train at 128px first, then 224px, then 384px. Each stage converges faster and regularizes the model.
  3. Profile your data pipeline — Use torch.utils.data.DataLoader with pin_memory=True and tune num_workers to saturate GPU utilization.
  4. Log everything — Track augmentation parameters, learning rate schedules, and validation metrics per epoch. Reproducibility saves debugging time.
  5. Validate augmentation visually — Always plot augmented samples before training. Aggressive augmentation can destroy label-relevant features.
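Progressive resizing (practice #2) is easy to drive from a small schedule helper. The function below is a hypothetical sketch, not a toolkit API: it splits the epoch budget evenly across the stages and gives any remainder to the final, full-resolution stage:

```python
def resize_schedule(total_epochs: int, sizes=(128, 224, 384)):
    """Map each epoch index to an input resolution, splitting epochs
    evenly across stages; the remainder goes to the final stage."""
    per_stage = total_epochs // len(sizes)
    schedule = []
    for i, size in enumerate(sizes):
        if i < len(sizes) - 1:
            count = per_stage
        else:
            count = total_epochs - per_stage * (len(sizes) - 1)
        schedule.extend([size] * count)
    return schedule

sched = resize_schedule(10)
print(sched)  # [128, 128, 128, 224, 224, 224, 384, 384, 384, 384]
```

At the start of each epoch you would rebuild (or reconfigure) the dataloaders with `image_size=sched[epoch]`, keeping the later, slower high-resolution epochs to a minimum.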

Troubleshooting

| Issue | Cause | Fix |
| --- | --- | --- |
| CUDA out of memory | Batch size too large for GPU VRAM | Reduce `batch_size`, enable `mixed_precision`, or use gradient accumulation |
| Val accuracy plateaus early | Learning rate too high or no warmup | Add `warmup_epochs: 5` and reduce the initial LR by 10x |
| Data loader bottleneck | CPU can't prepare batches fast enough | Increase `num_workers`, enable `pin_memory`, use SSD storage |
| ONNX export fails | Unsupported ops in model | Replace dynamic control flow with `torch.jit.script`-compatible patterns |
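Gradient accumulation, mentioned in the out-of-memory row, trades memory for time: you run several small micro-batches, summing gradients, and only step the optimizer once the effective batch size is reached. A minimal helper for sizing it (illustrative, not a toolkit function):

```python
import math

def accumulation_steps(target_batch: int, micro_batch: int) -> int:
    """Number of micro-batches to accumulate so that
    micro_batch * steps >= the target effective batch size."""
    return math.ceil(target_batch / micro_batch)

# Want an effective batch of 64, but only 10 samples fit in GPU memory:
steps = accumulation_steps(64, 10)
print(steps)  # 7
```

In the training loop you would call `loss.backward()` on every micro-batch (dividing the loss by `steps` to keep gradient magnitudes comparable) and call `optimizer.step()` plus `optimizer.zero_grad()` only every `steps` batches.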

This is 1 of 11 resources in the ML Engineer Toolkit. Get the complete Computer Vision Toolkit with all files, templates, and documentation for $39.

Get the Full Kit →

Or grab the entire ML Engineer Toolkit bundle (11 products) for $149 — save 30%.

Get the Complete Bundle →

