How to Use Weights & Biases 0.17 and Hugging Face Transformers 4.40 to Track LLM Training
Training large language models (LLMs) requires careful monitoring of metrics like loss, perplexity, learning rate, and GPU utilization to debug issues, compare experiments, and optimize performance. This guide walks through integrating Weights & Biases (W&B) 0.17 with Hugging Face Transformers 4.40 to streamline LLM training tracking.
Prerequisites
- Python 3.8+ installed
- W&B account (free tier available at wandb.ai)
- Basic familiarity with Hugging Face Transformers training pipelines
- GPU access (optional but recommended for LLM training)
Step 1: Install Required Dependencies
First, install the exact versions of W&B and Hugging Face Transformers specified to avoid compatibility issues:
pip install wandb==0.17.0 transformers==4.40.0 datasets tokenizers accelerate
Log in to your W&B account to enable experiment logging:
wandb login
Follow the prompt to paste your W&B API key from your account settings.
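For non-interactive environments (CI jobs, remote training servers), W&B also reads the key from the WANDB_API_KEY environment variable, skipping the login prompt. A minimal sketch, with a placeholder key:

```python
import os

# Placeholder value: substitute your real W&B API key (and never commit it)
os.environ["WANDB_API_KEY"] = "your-api-key-here"

# Any subsequent wandb call picks the key up from the environment
print(os.environ["WANDB_API_KEY"])
```

In practice the key is usually injected by your CI secret store rather than set in code.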
Step 2: Initialize a W&B Run
Before starting training, initialize a W&B run to create a unique experiment container. You can set custom project names, run names, and tags to organize experiments:
import wandb
wandb.init(
    project="llm-training-tracking",
    name="gpt2-finetune-2024",
    tags=["gpt2", "finetune", "huggingface"],
    config={
        "model_name": "gpt2",
        "learning_rate": 5e-5,
        "batch_size": 8,
        "epochs": 3,
    },
)
The config parameter automatically logs all hyperparameters to W&B for easy comparison across runs.
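Because the config is a plain key/value mapping, downstream code can read values back and derive quantities worth comparing across runs. A small sketch using an ordinary dict as a stand-in for wandb.config, with a hypothetical dataset size chosen for illustration:

```python
# Stand-in dict mirroring the wandb.init config above
config = {
    "model_name": "gpt2",
    "learning_rate": 5e-5,
    "batch_size": 8,
    "epochs": 3,
}

# Derived quantity: total optimizer steps for an assumed 10,000-example dataset
dataset_size = 10_000  # assumption for illustration only
steps_per_epoch = dataset_size // config["batch_size"]
total_steps = steps_per_epoch * config["epochs"]
print(total_steps)  # 3750
```

Logging derived values like this alongside the raw hyperparameters makes runs easier to compare in the dashboard.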
Step 3: Configure Hugging Face TrainingArguments with W&B Integration
Hugging Face Transformers 4.40 includes native support for W&B via the report_to parameter in TrainingArguments. Set report_to="wandb" to automatically log default training metrics (loss, learning rate, epoch progress) to your W&B dashboard:
from transformers import TrainingArguments
training_args = TrainingArguments(
    output_dir="./gpt2-finetuned",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    learning_rate=5e-5,
    evaluation_strategy="steps",
    eval_steps=500,
    save_steps=500,
    report_to="wandb",  # Enable W&B logging
    logging_steps=100,
    fp16=True,  # Enable mixed precision if using a GPU
)
Note: Transformers 4.40 automatically syncs W&B logging with the active wandb.init run, so no additional setup is needed.
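To sanity-check how often metrics will reach the dashboard, you can estimate the number of logging and evaluation events from the arguments above. The dataset size here is an assumption for illustration; match the other values to your own TrainingArguments:

```python
import math

# Assumed for illustration; align these with your TrainingArguments
dataset_size = 10_000
batch_size = 8
epochs = 3
logging_steps = 100
eval_steps = 500

steps_per_epoch = math.ceil(dataset_size / batch_size)  # 1250
total_steps = steps_per_epoch * epochs                  # 3750

print(total_steps // logging_steps)  # 37 logging events
print(total_steps // eval_steps)     # 7 evaluation events
```

If logging overhead becomes noticeable, raising logging_steps reduces the number of sync points without affecting training itself.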
Step 4: Log Custom Metrics and Artifacts
Beyond default metrics, you can log custom values (e.g., perplexity, generation samples) and model artifacts to W&B. Use the wandb.log method inside your training loop, or extend the Hugging Face Trainer with a custom callback:
from transformers import Trainer, TrainerCallback
import math
import wandb

class WandbMetricsCallback(TrainerCallback):
    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        if metrics and "eval_loss" in metrics:
            # Perplexity is the exponential of the cross-entropy loss
            wandb.log({"eval/perplexity": math.exp(metrics["eval_loss"])})
        # To log generation samples, build and log a wandb.Table, e.g.
        # wandb.Table(columns=["Prompt", "Generated Text"], data=sample_rows)
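As a quick sanity check on the perplexity metric: perplexity is the exponential of the mean per-token cross-entropy loss, so a loss of 2.0 corresponds to a perplexity of about 7.39:

```python
import math

# Perplexity = exp(cross-entropy loss), with loss averaged per token
eval_loss = 2.0  # example value
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # 7.39
```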
# Initialize the Trainer with the custom callback
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[WandbMetricsCallback()],
)

trainer.train()  # Metrics stream to W&B as training runs
wandb.finish()   # Mark the run as complete
To log model artifacts (e.g., final model weights) to W&B as versioned artifacts, save the model locally and upload the directory:
trainer.save_model("./gpt2-finetuned")
artifact = wandb.Artifact("gpt2-finetuned", type="model")
artifact.add_dir("./gpt2-finetuned")
wandb.log_artifact(artifact)
Step 5: Visualize and Compare Results
Once training starts, W&B automatically streams metrics to your dashboard. Key features to use:
- Charts: Compare loss, learning rate, and custom metrics across runs
- Parallel Coordinates: Identify hyperparameter combinations that yield the best performance
- Artifacts: Version and download model checkpoints, datasets, and logs
- Reports: Create shareable summaries of experiment results
Best Practices for W&B 0.17 + Transformers 4.40
- Always pin dependency versions (wandb==0.17.0, transformers==4.40.0) to avoid breaking changes
- Log all hyperparameters to the W&B config to enable experiment reproducibility
- Use W&B tags to filter runs by model type, dataset, or experiment goal
- Set logging_steps in TrainingArguments to control metric logging frequency and reduce overhead
Conclusion
Integrating W&B 0.17 with Hugging Face Transformers 4.40 simplifies LLM training tracking with minimal code changes. By automating metric logging, centralizing experiment data, and enabling rich visualizations, you can iterate faster and build better-performing LLMs. Get started today by signing up for a free W&B account and testing the integration with your next LLM training run.