
Ankush Choudhary Johal

Originally published at johal.in

How to Use Weights & Biases 0.17 and Hugging Face Transformers 4.40 to Track LLM Training

Training large language models (LLMs) requires careful monitoring of metrics like loss, perplexity, learning rate, and GPU utilization to debug issues, compare experiments, and optimize performance. This guide walks through integrating Weights & Biases (W&B) 0.17 with Hugging Face Transformers 4.40 to streamline LLM training tracking.

Prerequisites

  • Python 3.8+ installed
  • W&B account (free tier available at wandb.ai)
  • Basic familiarity with Hugging Face Transformers training pipelines
  • GPU access (optional but recommended for LLM training)

Step 1: Install Required Dependencies

First, install the exact versions of W&B and Hugging Face Transformers covered in this guide, along with their supporting libraries, to avoid compatibility issues:

pip install wandb==0.17.0 transformers==4.40.0 datasets tokenizers accelerate

Log in to your W&B account to enable experiment logging:

wandb login

Follow the prompt to paste your W&B API key from your account settings.
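
In non-interactive environments (CI jobs, remote training pods) you can skip the prompt and authenticate in code instead. A minimal sketch, assuming the key is exported as the WANDB_API_KEY environment variable:

import os
import wandb

# Authenticate without an interactive prompt by reading the key
# from the environment
wandb.login(key=os.environ["WANDB_API_KEY"])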

Step 2: Initialize a W&B Run

Before starting training, initialize a W&B run to create a unique experiment container. You can set custom project names, run names, and tags to organize experiments:

import wandb

wandb.init(
    project="llm-training-tracking",
    name="gpt2-finetune-2024",
    tags=["gpt2", "finetune", "huggingface"],
    config={
        "model_name": "gpt2",
        "learning_rate": 5e-5,
        "batch_size": 8,
        "epochs": 3
    }
)

The config parameter automatically logs all hyperparameters to W&B for easy comparison across runs.
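
Values stored in config can also be read back through attribute access, so the hyperparameters W&B records and the ones your code uses cannot drift apart:

config = wandb.config

# Attribute access returns the values recorded above
assert config.model_name == "gpt2"
learning_rate = config.learning_rate  # 5e-05
batch_size = config.batch_size        # 8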

Step 3: Configure Hugging Face TrainingArguments with W&B Integration

Hugging Face Transformers 4.40 includes native support for W&B via the report_to parameter in TrainingArguments. Set report_to="wandb" to automatically log default training metrics (loss, learning rate, epoch progress) to your W&B dashboard:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./gpt2-finetuned",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    learning_rate=5e-5,
    evaluation_strategy="steps",
    eval_steps=500,
    save_steps=500,
    report_to="wandb",  # Enable W&B logging
    logging_steps=100,
    fp16=True,  # Enable mixed precision if using GPU
)

Note: Transformers 4.40 automatically syncs W&B logging with the active wandb.init run, so no additional setup is needed.
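
The Trainer in the next step expects a model and tokenized datasets, which this guide does not otherwise define. A minimal sketch, assuming the public wikitext dataset as a stand-in for your own corpus:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

raw = load_dataset("wikitext", "wikitext-2-raw-v1")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = (
    raw.filter(lambda ex: len(ex["text"].strip()) > 0)  # drop blank lines
       .map(tokenize, batched=True, remove_columns=["text"])
)
train_dataset = tokenized["train"]
eval_dataset = tokenized["validation"]

For causal-LM fine-tuning, a data collator (attached to the Trainer in the next step) then copies input IDs into labels automatically.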

Step 4: Log Custom Metrics and Artifacts

Beyond default metrics, you can log custom values (e.g., perplexity, generation samples) and model artifacts to W&B. Use the wandb.log method inside your training loop, or extend the Hugging Face Trainer with a custom callback:

import math

from transformers import DataCollatorForLanguageModeling, Trainer, TrainerCallback
import wandb

class WandbMetricsCallback(TrainerCallback):
    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        # Log custom evaluation metrics alongside the defaults
        if metrics is None or "eval_loss" not in metrics:
            return
        # Perplexity is the exponential of the mean cross-entropy loss
        wandb.log({"eval/perplexity": math.exp(metrics["eval_loss"])})

# Initialize Trainer with the custom callback
# (model, tokenizer, and datasets as prepared in Step 3)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    # Copies input IDs into labels so the model can compute a causal-LM loss
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    callbacks=[WandbMetricsCallback()],
)
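
Generation samples can be logged as a wandb.Table. A minimal sketch, assuming the model and tokenizer from earlier and a couple of hypothetical prompts:

prompts = ["The meaning of life is", "In a distant galaxy"]
rows = []
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True)
    rows.append([prompt, tokenizer.decode(output_ids[0], skip_special_tokens=True)])

# A Table renders as a sortable grid of prompts and completions in the dashboard
wandb.log({"eval/samples": wandb.Table(columns=["Prompt", "Generated Text"], data=rows)})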

To log model weights as versioned W&B Artifacts, set the WANDB_LOG_MODEL environment variable before the Trainer is created; the built-in integration then uploads checkpoints automatically (for ad-hoc files, wandb.save works as well):

import os

# "end" uploads the final model; "checkpoint" uploads every saved checkpoint
os.environ["WANDB_LOG_MODEL"] = "end"
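
With the callback and artifact logging in place, launch training and close the run once it completes:

trainer.train()

# Flush any buffered metrics and mark the run as finished
wandb.finish()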

Step 5: Visualize and Compare Results

Once training starts, W&B automatically streams metrics to your dashboard; the same data can also be queried programmatically, as sketched after the list below. Key features to use:

  • Charts: Compare loss, learning rate, and custom metrics across runs
  • Parallel Coordinates: Identify hyperparameter combinations that yield the best performance
  • Artifacts: Version and download model checkpoints, datasets, and logs
  • Reports: Create shareable summaries of experiment results
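
For scripted comparisons, the W&B public API exposes every run's summary metrics. A minimal sketch, assuming the project name from Step 2 and a placeholder entity:

import wandb

api = wandb.Api()
runs = api.runs("your-entity/llm-training-tracking")
# Rank runs by their final eval loss, lowest first
for run in sorted(runs, key=lambda r: r.summary.get("eval/loss", float("inf"))):
    print(run.name, run.summary.get("eval/loss"))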

Best Practices for W&B 0.17 + Transformers 4.40

  • Always pin dependency versions (wandb==0.17.0, transformers==4.40.0) to avoid breaking changes; a requirements sketch follows this list
  • Log all hyperparameters to the W&B config to enable experiment reproducibility
  • Use W&B tags to filter runs by model type, dataset, or experiment goal
  • Enable logging_steps in TrainingArguments to control metric logging frequency and reduce overhead
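
One way to keep those pins reproducible is to commit them to a requirements.txt matching the install command from Step 1:

wandb==0.17.0
transformers==4.40.0
datasets
tokenizers
accelerate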

Conclusion

Integrating W&B 0.17 with Hugging Face Transformers 4.40 simplifies LLM training tracking with minimal code changes. By automating metric logging, centralizing experiment data, and enabling rich visualizations, you can iterate faster and build better-performing LLMs. Get started today by signing up for a free W&B account and testing the integration with your next LLM training run.
