# What is Weights & Biases?
Weights & Biases (W&B) is one of the most popular ML experiment tracking platforms. It logs your model training metrics, hyperparameters, code, and artifacts, giving you a complete, reproducible history of every experiment.
## Why W&B?

- **Free forever**: unlimited experiments for individuals and academics
- **Automatic logging**: two lines of code to track everything
- **Beautiful dashboards**: interactive charts with zero config
- **Artifact versioning**: track datasets, models, and pipelines
- **Sweeps**: automated hyperparameter optimization
- **LLM monitoring**: track prompts, completions, and costs for AI apps
## Quick Start

```bash
pip install wandb
wandb login  # create a free account at wandb.ai
```
```python
import wandb

# Initialize the experiment
run = wandb.init(
    project="my-ml-project",
    config={
        "learning_rate": 0.001,
        "epochs": 50,
        "batch_size": 32,
        "architecture": "ResNet50",
        "dataset": "CIFAR-10",
    },
)

# Training loop: log metrics each epoch
for epoch in range(run.config.epochs):
    train_loss = train(model)
    val_loss, val_acc = evaluate(model)
    wandb.log({
        "epoch": epoch,
        "train/loss": train_loss,
        "val/loss": val_loss,
        "val/accuracy": val_acc,
        # get_last_lr() is the current PyTorch API; get_lr() is deprecated here
        "learning_rate": scheduler.get_last_lr()[0],
    })

run.finish()
```
## PyTorch Integration

```python
import torch
import wandb

wandb.init(project="pytorch-demo")

# Log gradients and parameter histograms every 100 batches
wandb.watch(model, log="all", log_freq=100)

# Save a checkpoint and upload it with the run
torch.save(model.state_dict(), "model.pth")
wandb.save("model.pth")
```
## Hyperparameter Sweeps

```yaml
# sweep.yaml
program: train.py
method: bayes
metric:
  name: val/accuracy
  goal: maximize
parameters:
  learning_rate:
    min: 0.0001
    max: 0.1
    distribution: log_uniform_values
  batch_size:
    values: [16, 32, 64, 128]
  optimizer:
    values: [adam, sgd, adamw]
  dropout:
    min: 0.1
    max: 0.5
```

```bash
wandb sweep sweep.yaml
wandb agent your-entity/your-project/sweep-id
```
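For intuition, `log_uniform_values` draws values uniformly in log space, so each decade of the range gets roughly equal probability. A minimal pure-Python sketch of that behavior (not W&B's actual sampler):

```python
import math
import random

def sample_log_uniform(low: float, high: float, rng=random) -> float:
    """Draw a value uniformly in log space between low and high."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

# Learning rates between 1e-4 and 1e-1: each decade is equally likely,
# whereas a plain uniform draw would almost always land above 1e-2.
samples = [sample_log_uniform(1e-4, 1e-1) for _ in range(10_000)]
```

That is why `log_uniform_values` is the usual choice for learning rates: it explores 0.0001 and 0.001 as thoroughly as 0.01 and 0.1.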
## LLM Tracking with Traces

```python
import openai
import wandb
from wandb.integration.openai import autolog

autolog({"project": "my-llm-app"})

# All OpenAI calls are now tracked automatically:
# - prompts and completions
# - token usage and costs
# - latency
# - model version
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain Kubernetes"}],
)
# Check the wandb.ai dashboard for the full trace
```
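The cost figures in those traces can be sanity-checked by hand from the token counts. A rough estimator, where the per-1K-token prices are illustrative placeholders rather than current OpenAI pricing:

```python
# Illustrative per-1K-token prices; placeholders, NOT current OpenAI pricing
PRICES = {"gpt-4": {"prompt": 0.03, "completion": 0.06}}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Rough USD cost from token counts and a per-1K-token price table."""
    p = PRICES[model]
    return (prompt_tokens * p["prompt"] + completion_tokens * p["completion"]) / 1000

cost = estimate_cost("gpt-4", prompt_tokens=120, completion_tokens=380)  # ~$0.0264
```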
## Artifact Versioning

```python
# Version a dataset
artifact = wandb.Artifact("training-data", type="dataset")
artifact.add_dir("data/processed/")
run.log_artifact(artifact)  # creates v0, v1, v2, ...

# Version a model, attaching evaluation metadata
model_artifact = wandb.Artifact("best-model", type="model")
model_artifact.add_file("model.pth")
model_artifact.metadata = {"accuracy": 0.95, "f1": 0.93}
run.log_artifact(model_artifact)
```
## W&B vs Alternatives
| Feature | W&B | MLflow | TensorBoard | Neptune |
|---|---|---|---|---|
| Free tier | Unlimited | Self-host | Local only | 200h/mo |
| Cloud hosted | Yes | Databricks | No | Yes |
| LLM tracking | Yes | Limited | No | Limited |
| Sweeps | Built-in | None | None | Built-in |
| Artifacts | Built-in | Built-in | None | Built-in |
| Team features | Free | Enterprise | None | Paid |
## Real-World Impact
An ML team ran 500 experiments over three months in Jupyter notebooks. When they found a good model, they could not reproduce it: the notebook had been modified 200 times since. After adopting W&B, every experiment was tracked automatically, and the best model from six months earlier could be reproduced with a single command. They shipped to production in two days instead of two weeks.
Building ML systems that need proper experiment tracking? I help teams set up reproducible ML pipelines. Contact spinov001@gmail.com or explore my data tools on Apify.