# What is Weights & Biases?
Weights & Biases (W&B) is one of the most popular ML experiment tracking platforms. It logs your model training metrics, hyperparameters, code, and artifacts, giving you a complete, reproducible history of every experiment.
## Why W&B?

- **Free forever**: unlimited experiments for individuals and academics
- **Automatic logging**: two lines of code to track everything
- **Beautiful dashboards**: interactive charts with zero config
- **Artifact versioning**: track datasets, models, and pipelines
- **Sweeps**: automated hyperparameter optimization
- **LLM monitoring**: track prompts, completions, and costs for AI apps
## Quick Start

```bash
pip install wandb
wandb login  # create a free account at wandb.ai
```
```python
import wandb

# Initialize the experiment
run = wandb.init(
    project="my-ml-project",
    config={
        "learning_rate": 0.001,
        "epochs": 50,
        "batch_size": 32,
        "architecture": "ResNet50",
        "dataset": "CIFAR-10",
    },
)

# Training loop: log metrics each epoch
for epoch in range(run.config.epochs):
    train_loss = train(model)
    val_loss, val_acc = evaluate(model)
    wandb.log({
        "epoch": epoch,
        "train/loss": train_loss,
        "val/loss": val_loss,
        "val/accuracy": val_acc,
        # get_last_lr() is the current PyTorch API; get_lr() is deprecated here
        "learning_rate": scheduler.get_last_lr()[0],
    })

run.finish()
```
## PyTorch Integration

```python
import torch
import wandb

wandb.init(project="pytorch-demo")

# Log gradients and parameter histograms every 100 batches
wandb.watch(model, log="all", log_freq=100)

# Save a checkpoint and upload it with the run
torch.save(model.state_dict(), "model.pth")
wandb.save("model.pth")
```
## Hyperparameter Sweeps

```yaml
# sweep.yaml
program: train.py
method: bayes
metric:
  name: val/accuracy
  goal: maximize
parameters:
  learning_rate:
    min: 0.0001
    max: 0.1
    distribution: log_uniform_values
  batch_size:
    values: [16, 32, 64, 128]
  optimizer:
    values: [adam, sgd, adamw]
  dropout:
    min: 0.1
    max: 0.5
```

```bash
wandb sweep sweep.yaml
wandb agent your-entity/your-project/sweep-id
```
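For intuition, `log_uniform_values` draws values uniformly in log space, so each decade of the range gets roughly equal probability. A minimal pure-Python sketch of that behavior (not W&B's actual sampler):

```python
import math
import random

def sample_log_uniform(low: float, high: float, rng=random) -> float:
    """Draw a value uniformly in log space between low and high."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

# Learning rates between 1e-4 and 1e-1: each decade is equally likely,
# whereas a plain uniform draw would almost always land above 1e-2.
samples = [sample_log_uniform(1e-4, 1e-1) for _ in range(10_000)]
```

That is why `log_uniform_values` is the usual choice for learning rates: it explores 0.0001 and 0.001 as thoroughly as 0.01 and 0.1.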
## LLM Tracking with Traces

```python
import openai
import wandb
from wandb.integration.openai import autolog

autolog({"project": "my-llm-app"})

# All OpenAI calls are now tracked automatically:
# - prompts and completions
# - token usage and costs
# - latency
# - model version
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain Kubernetes"}],
)
# Check the wandb.ai dashboard for the full trace
```
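The cost figures in those traces can be sanity-checked by hand from the token counts. A rough estimator, where the per-1K-token prices are illustrative placeholders rather than current OpenAI pricing:

```python
# Illustrative per-1K-token prices; placeholders, NOT current OpenAI pricing
PRICES = {"gpt-4": {"prompt": 0.03, "completion": 0.06}}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Rough USD cost from token counts and a per-1K-token price table."""
    p = PRICES[model]
    return (prompt_tokens * p["prompt"] + completion_tokens * p["completion"]) / 1000

cost = estimate_cost("gpt-4", prompt_tokens=120, completion_tokens=380)  # ~$0.0264
```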
## Artifact Versioning

```python
# Version a dataset
artifact = wandb.Artifact("training-data", type="dataset")
artifact.add_dir("data/processed/")
run.log_artifact(artifact)  # creates v0, v1, v2, ...

# Version a model, attaching evaluation metadata
model_artifact = wandb.Artifact("best-model", type="model")
model_artifact.add_file("model.pth")
model_artifact.metadata = {"accuracy": 0.95, "f1": 0.93}
run.log_artifact(model_artifact)
```
## W&B vs Alternatives
| Feature | W&B | MLflow | TensorBoard | Neptune |
|---|---|---|---|---|
| Free tier | Unlimited | Self-host | Local only | 200h/mo |
| Cloud hosted | Yes | Databricks | No | Yes |
| LLM tracking | Yes | Limited | No | Limited |
| Sweeps | Built-in | None | None | Built-in |
| Artifacts | Built-in | Built-in | None | Built-in |
| Team features | Free | Enterprise | None | Paid |
## Real-World Impact
An ML team ran 500 experiments over three months in Jupyter notebooks. When they found a good model, they could not reproduce it: the notebook had been modified 200 times since. After adopting W&B, every experiment was tracked automatically, and the best model from six months earlier could be reproduced with a single command. They shipped to production in two days instead of two weeks.
Building ML systems that need proper experiment tracking? I help teams set up reproducible ML pipelines. Contact spinov001@gmail.com or explore my data tools on Apify.