How to Use Weights & Biases 0.17 and Hugging Face Transformers 4.40 to Track LLM Training
Training large language models (LLMs) requires careful monitoring of metrics like loss, perplexity, learning rate, and GPU utilization to debug issues, compare experiments, and optimize performance. This guide walks through integrating Weights & Biases (W&B) 0.17 with Hugging Face Transformers 4.40 to streamline LLM training tracking.
Prerequisites
- Python 3.8+ installed
- W&B account (free tier available at wandb.ai)
- Basic familiarity with Hugging Face Transformers training pipelines
- GPU access (optional but recommended for LLM training)
Step 1: Install Required Dependencies
First, install the exact versions of W&B and Hugging Face Transformers specified to avoid compatibility issues:
pip install wandb==0.17.0 transformers==4.40.0 datasets tokenizers accelerate
Log in to your W&B account to enable experiment logging:
wandb login
Follow the prompt to paste your W&B API key from your account settings.
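For non-interactive environments (CI jobs, remote training servers), W&B also reads the key from the WANDB_API_KEY environment variable, skipping the login prompt. A minimal sketch, with a placeholder key:

```python
import os

# Placeholder value: substitute your real W&B API key (and never commit it)
os.environ["WANDB_API_KEY"] = "your-api-key-here"

# Any subsequent wandb call picks the key up from the environment
print(os.environ["WANDB_API_KEY"])
```

In practice the key is usually injected by your CI secret store rather than set in code.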
Step 2: Initialize a W&B Run
Before starting training, initialize a W&B run to create a unique experiment container. You can set custom project names, run names, and tags to organize experiments:
import wandb
wandb.init(
    project="llm-training-tracking",
    name="gpt2-finetune-2024",
    tags=["gpt2", "finetune", "huggingface"],
    config={
        "model_name": "gpt2",
        "learning_rate": 5e-5,
        "batch_size": 8,
        "epochs": 3,
    },
)
The config parameter automatically logs all hyperparameters to W&B for easy comparison across runs.
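Because the config is a plain key/value mapping, downstream code can read values back and derive quantities worth comparing across runs. A small sketch using an ordinary dict as a stand-in for wandb.config, with a hypothetical dataset size chosen for illustration:

```python
# Stand-in dict mirroring the wandb.init config above
config = {
    "model_name": "gpt2",
    "learning_rate": 5e-5,
    "batch_size": 8,
    "epochs": 3,
}

# Derived quantity: total optimizer steps for an assumed 10,000-example dataset
dataset_size = 10_000  # assumption for illustration only
steps_per_epoch = dataset_size // config["batch_size"]
total_steps = steps_per_epoch * config["epochs"]
print(total_steps)  # 3750
```

Logging derived values like this alongside the raw hyperparameters makes runs easier to compare in the dashboard.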
Step 3: Configure Hugging Face TrainingArguments with W&B Integration
Hugging Face Transformers 4.40 includes native support for W&B via the report_to parameter in TrainingArguments. Set report_to="wandb" to automatically log default training metrics (loss, learning rate, epoch progress) to your W&B dashboard:
from transformers import TrainingArguments
training_args = TrainingArguments(
    output_dir="./gpt2-finetuned",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    learning_rate=5e-5,
    evaluation_strategy="steps",
    eval_steps=500,
    save_steps=500,
    report_to="wandb",  # Enable W&B logging
    logging_steps=100,
    fp16=True,  # Enable mixed precision if using a GPU
)
Note: Transformers 4.40 automatically syncs W&B logging with the active wandb.init run, so no additional setup is needed.
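To sanity-check how often metrics will reach the dashboard, you can estimate the number of logging and evaluation events from the arguments above. The dataset size here is an assumption for illustration; match the other values to your own TrainingArguments:

```python
import math

# Assumed for illustration; align these with your TrainingArguments
dataset_size = 10_000
batch_size = 8
epochs = 3
logging_steps = 100
eval_steps = 500

steps_per_epoch = math.ceil(dataset_size / batch_size)  # 1250
total_steps = steps_per_epoch * epochs                  # 3750

print(total_steps // logging_steps)  # 37 logging events
print(total_steps // eval_steps)     # 7 evaluation events
```

If logging overhead becomes noticeable, raising logging_steps reduces the number of sync points without affecting training itself.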
Step 4: Log Custom Metrics and Artifacts
Beyond default metrics, you can log custom values (e.g., perplexity, generation samples) and model artifacts to W&B. Use the wandb.log method inside your training loop, or extend the Hugging Face Trainer with a custom callback:
from transformers import Trainer, TrainerCallback
import math
import wandb

class WandbMetricsCallback(TrainerCallback):
    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        if metrics and "eval_loss" in metrics:
            # Perplexity is the exponential of the cross-entropy loss
            wandb.log({"eval/perplexity": math.exp(metrics["eval_loss"])})
        # To log generation samples, build and log a wandb.Table, e.g.
        # wandb.Table(columns=["Prompt", "Generated Text"], data=sample_rows)
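As a quick sanity check on the perplexity metric: perplexity is the exponential of the mean per-token cross-entropy loss, so a loss of 2.0 corresponds to a perplexity of about 7.39:

```python
import math

# Perplexity = exp(cross-entropy loss), with loss averaged per token
eval_loss = 2.0  # example value
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # 7.39
```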
# Initialize the Trainer with the custom callback
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[WandbMetricsCallback()],
)

trainer.train()  # Metrics stream to W&B as training runs
wandb.finish()   # Mark the run as complete
To log model artifacts (e.g., final model weights) to W&B as versioned artifacts, save the model locally and upload the directory:
trainer.save_model("./gpt2-finetuned")
artifact = wandb.Artifact("gpt2-finetuned", type="model")
artifact.add_dir("./gpt2-finetuned")
wandb.log_artifact(artifact)
Step 5: Visualize and Compare Results
Once training starts, W&B automatically streams metrics to your dashboard. Key features to use:
- Charts: Compare loss, learning rate, and custom metrics across runs
- Parallel Coordinates: Identify hyperparameter combinations that yield the best performance
- Artifacts: Version and download model checkpoints, datasets, and logs
- Reports: Create shareable summaries of experiment results
Best Practices for W&B 0.17 + Transformers 4.40
- Always pin dependency versions (wandb==0.17.0, transformers==4.40.0) to avoid breaking changes
- Log all hyperparameters to the W&B config to enable experiment reproducibility
- Use W&B tags to filter runs by model type, dataset, or experiment goal
- Set logging_steps in TrainingArguments to control metric logging frequency and reduce overhead
Conclusion
Integrating W&B 0.17 with Hugging Face Transformers 4.40 simplifies LLM training tracking with minimal code changes. By automating metric logging, centralizing experiment data, and enabling rich visualizations, you can iterate faster and build better-performing LLMs. Get started today by signing up for a free W&B account and testing the integration with your next LLM training run.