Computer Vision Toolkit
A production-ready collection of image classification, object detection, and segmentation pipelines built on PyTorch. This toolkit provides battle-tested data augmentation strategies, evaluation frameworks with per-class metrics, and modular architectures that let you swap backbones, heads, and loss functions without rewriting training loops. Whether you're prototyping a classifier on a small dataset or scaling object detection across multiple GPUs, this toolkit gives you the scaffolding to move fast.
Key Features
- Modular Model Factory — Instantiate ResNet, EfficientNet, or Vision Transformer backbones with a single config change. Supports pretrained weights from torchvision and timm.
- Augmentation Pipeline — Albumentations-based transforms with preset profiles for medical imaging, satellite imagery, and general photography.
- Multi-Task Heads — Classification, detection (anchor-based and anchor-free), and semantic segmentation heads that attach to any backbone.
- Evaluation Dashboard — Computes accuracy, mAP, IoU, precision-recall curves, and confusion matrices. Exports results as JSON and PNG.
- Dataset Adapters — Plug-and-play loaders for COCO, VOC, ImageNet-style folders, and custom CSV-based annotation formats.
- Mixed Precision & DDP — Built-in support for torch.cuda.amp and DistributedDataParallel with zero code changes.
- Experiment Logging — Native integration points for MLflow, Weights & Biases, and TensorBoard.
- Export Pipeline — Convert trained models to ONNX and TorchScript for deployment.
Quick Start
```shell
# Extract and install dependencies
unzip computer-vision-toolkit.zip && cd computer-vision-toolkit
pip install -r requirements.txt

# Run image classification training
python src/computer_vision_toolkit/core.py --config config.example.yaml
```
```yaml
# config.example.yaml
task: classification

backbone:
  name: resnet50
  pretrained: true
  freeze_layers: 4

data:
  train_dir: ./data/train/
  val_dir: ./data/val/
  image_size: 224
  batch_size: 32
  num_workers: 4
  augmentation_profile: standard  # standard | medical | satellite

training:
  epochs: 50
  optimizer: adamw
  learning_rate: 0.001
  weight_decay: 0.01
  scheduler: cosine_annealing
  warmup_epochs: 5
  mixed_precision: true

evaluation:
  metrics: [accuracy, f1, precision, recall]
  save_confusion_matrix: true
  save_pr_curves: true
```
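The nested config above is addressed elsewhere in this document with dotted keys such as `backbone.name`. A minimal sketch of how such dotted lookups could resolve against the nested structure — the `get` helper and the inline dict are illustrative, not part of the toolkit's API:

```python
# Hypothetical dotted-key lookup over a nested config dict mirroring
# config.example.yaml; not the toolkit's actual config loader.
config = {
    "task": "classification",
    "backbone": {"name": "resnet50", "pretrained": True, "freeze_layers": 4},
    "training": {"learning_rate": 0.001, "scheduler": "cosine_annealing"},
}

def get(cfg, dotted_key, default=None):
    """Walk a nested dict using a dotted path, e.g. 'backbone.name'."""
    node = cfg
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node

print(get(config, "backbone.name"))       # resnet50
print(get(config, "training.scheduler"))  # cosine_annealing
```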
Architecture
```
┌─────────────┐      ┌──────────────┐      ┌───────────────┐
│ DataLoader  │─────>│   Backbone   │─────>│   Task Head   │
│ + Augment   │      │ (ResNet/ViT) │      │ (Cls/Det/Seg) │
└─────────────┘      └──────────────┘      └───────┬───────┘
                                                   │
              ┌──────────────┐             ┌────────▼───────┐
              │   Exporter   │<────────────│   Evaluator    │
              │ (ONNX/Script)│             │  (mAP/IoU/F1)  │
              └──────────────┘             └────────────────┘
```
The toolkit follows a pipeline pattern: each stage (data loading, augmentation, model forward pass, loss computation, evaluation) is a swappable component registered in a factory. You configure which components to use in YAML; the runner assembles them at startup.
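The registry-plus-factory idea described above can be sketched in a few lines. This is a minimal illustration of the pattern, with invented names (`REGISTRY`, `register`, `build`) — not the toolkit's actual internals:

```python
# Minimal component registry: classes register themselves under a
# (kind, name) key, and a factory builds them from config strings.
REGISTRY = {}

def register(kind, name):
    """Decorator that records a component class in the registry."""
    def wrap(cls):
        REGISTRY[(kind, name)] = cls
        return cls
    return wrap

@register("backbone", "resnet50")
class ResNet50Backbone:
    def __init__(self, pretrained=True):
        self.pretrained = pretrained

def build(kind, name, **kwargs):
    """Look up a registered component and instantiate it."""
    return REGISTRY[(kind, name)](**kwargs)

# The runner would call this with values read from the YAML config:
backbone = build("backbone", "resnet50", pretrained=False)
```

Swapping a component then means changing one string in YAML rather than editing the training loop.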
Usage Examples
Image Classification with Custom Dataset
```python
from computer_vision_toolkit.core import ModelFactory, Trainer
from computer_vision_toolkit.utils import build_dataloaders

# Build model from config
model = ModelFactory.create(
    backbone="efficientnet_b3",
    task="classification",
    num_classes=10,
    pretrained=True,
)

# Create dataloaders with augmentation
train_loader, val_loader = build_dataloaders(
    train_dir="./data/train",
    val_dir="./data/val",
    image_size=300,
    batch_size=16,
    augmentation_profile="standard",
)

# Train with mixed precision
trainer = Trainer(model, train_loader, val_loader, mixed_precision=True)
trainer.fit(epochs=30, lr=1e-3)
trainer.export_onnx("model.onnx")
```
Object Detection Evaluation
```python
from computer_vision_toolkit.core import Evaluator

evaluator = Evaluator(task="detection")
results = evaluator.run(
    model=model,
    dataloader=test_loader,
    iou_thresholds=[0.5, 0.75],
)

print(f"mAP@0.5: {results['mAP_50']:.4f}")
print(f"mAP@0.75: {results['mAP_75']:.4f}")

evaluator.save_report("./results/detection_report.json")
```
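The `iou_thresholds` above control when a predicted box counts as a true positive: a detection matches a ground-truth box if their intersection-over-union meets the threshold. A sketch of that underlying IoU computation (this mirrors the standard metric, not the Evaluator's actual code; boxes are assumed to be `(x1, y1, x2, y2)` tuples):

```python
# Intersection-over-union for axis-aligned boxes in (x1, y1, x2, y2) form.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap is clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# A prediction is a true positive at threshold t when iou(...) >= t.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333... — misses mAP@0.5
```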
Configuration Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| task | str | classification | Task type: classification, detection, or segmentation |
| backbone.name | str | resnet50 | Backbone architecture name |
| backbone.pretrained | bool | true | Use pretrained ImageNet weights |
| data.image_size | int | 224 | Input image resolution |
| training.mixed_precision | bool | true | Enable FP16 automatic mixed precision |
| training.scheduler | str | cosine_annealing | LR scheduler: step, cosine_annealing, or one_cycle |
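Combining the `cosine_annealing` scheduler with the `warmup_epochs` key from the example config gives a schedule like the sketch below: linear warmup, then cosine decay. The formula is the standard one; the toolkit's exact implementation may differ in details such as the minimum LR:

```python
import math

# Linear warmup followed by cosine annealing, using the example config's
# values (learning_rate: 0.001, warmup_epochs: 5, epochs: 50).
def lr_at(epoch, base_lr=1e-3, warmup_epochs=5, total_epochs=50, min_lr=0.0):
    if epoch < warmup_epochs:
        # Ramp linearly from base_lr/warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

for epoch in (0, 4, 5, 25, 49):
    print(epoch, lr_at(epoch))
```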
Best Practices
- Start with a pretrained backbone — Transfer learning beats training from scratch on datasets under 100K images. Freeze early layers and fine-tune later layers first.
- Use progressive resizing — Train at 128px first, then 224px, then 384px. Each stage converges faster and regularizes the model.
- Profile your data pipeline — Use torch.utils.data.DataLoader with pin_memory=True and tune num_workers to saturate GPU utilization.
- Log everything — Track augmentation parameters, learning rate schedules, and validation metrics per epoch. Reproducibility saves debugging time.
- Validate augmentation visually — Always plot augmented samples before training. Aggressive augmentation can destroy label-relevant features.
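The progressive-resizing practice above amounts to a staged schedule of image sizes per epoch. A minimal sketch, with illustrative stage boundaries (the 10/25/15 split is an assumption, not a toolkit default):

```python
# Progressive resizing: expand (image_size, epochs) stages into a
# per-epoch size list that a training loop could consult.
stages = [(128, 10), (224, 25), (384, 15)]  # hypothetical 50-epoch split

def schedule(stages):
    """Flatten (size, epochs) stages into one image size per epoch."""
    return [size for size, epochs in stages for _ in range(epochs)]

sizes = schedule(stages)
print(len(sizes))                       # 50 epochs total
print(sizes[0], sizes[10], sizes[35])   # 128 224 384
```

Each stage starts from the previous stage's weights, so the low-resolution epochs act as cheap pretraining for the high-resolution ones.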
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| CUDA out of memory | Batch size too large for GPU VRAM | Reduce batch_size, enable mixed_precision, or use gradient accumulation |
| Val accuracy plateaus early | Learning rate too high or no warmup | Add warmup_epochs: 5 and reduce initial LR by 10x |
| Data loader bottleneck | CPU can't prepare batches fast enough | Increase num_workers, enable pin_memory, use SSD storage |
| ONNX export fails | Unsupported ops in model | Replace dynamic control flow with torch.jit.script compatible patterns |
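The gradient-accumulation fix mentioned for CUDA OOM works by running several small micro-batches and taking one optimizer step on their averaged gradients. A pure-Python stand-in for the idea — in a real PyTorch loop the scaled loss would call `backward()` per micro-batch and `optimizer.step()` would fire at the boundary:

```python
# Gradient accumulation sketch: scale each micro-batch gradient by
# 1/accum_steps so the accumulated sum equals the full-batch mean,
# then "step" once every accum_steps micro-batches.
def accumulate(micro_batch_grads, accum_steps):
    steps = []
    buffer = 0.0
    for i, g in enumerate(micro_batch_grads, start=1):
        buffer += g / accum_steps      # scale so the sum is the mean
        if i % accum_steps == 0:
            steps.append(buffer)       # optimizer.step() would fire here
            buffer = 0.0               # then optimizer.zero_grad()
    return steps

# Four micro-batches with accum_steps=4 behave like one large batch:
print(accumulate([1.0, 2.0, 3.0, 2.0], 4))  # [2.0]
```

This keeps the effective batch size (micro-batch size × accum_steps) constant while VRAM only has to hold one micro-batch at a time.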
This is 1 of 11 resources in the ML Engineer Toolkit. Get the complete Computer Vision Toolkit with all files, templates, and documentation for $39.
Or grab the entire ML Engineer Toolkit bundle (11 products) for $149 — save 30%.