
Thesius Code

Posted on • Originally published at datanest-stores.pages.dev

Fine-Tuning Pipeline

Everything you need to fine-tune open-source LLMs on your own data — from dataset preparation through training to deployment. This pipeline handles LoRA/QLoRA configuration, training data formatting, hyperparameter management, experiment tracking, model merging, and quantized deployment. Designed for teams running fine-tuning on single GPUs or small clusters without deep ML infrastructure expertise.

Key Features

  • LoRA & QLoRA Training — Parameter-efficient fine-tuning scripts with automatic rank selection, target module detection, and 4-bit quantization support
  • Dataset Preparation — Convert raw data (CSV, JSON, conversations) into training-ready formats with deduplication, filtering, and train/val/test splits
  • Hyperparameter Management — Predefined configs for common base models (Llama, Mistral, Phi) with recommended learning rates, batch sizes, and schedules
  • Training Monitoring — Real-time loss curves, gradient norms, learning rate schedules, and GPU utilization tracking with automatic early stopping
  • Model Merging — Merge LoRA adapters back into base models with TIES, DARE, and linear merge strategies
  • Evaluation Suite — Run benchmarks against your test set automatically after training to validate improvement
  • Deployment Export — Export merged models in GGUF, ONNX, or SafeTensors format for inference serving

Quick Start

from fine_tuning import Pipeline, DatasetConfig, TrainingConfig

# 1. Prepare dataset
dataset = DatasetConfig(
    source="data/customer_support_conversations.jsonl",
    format="sharegpt",           # sharegpt | alpaca | completion
    train_split=0.9,
    val_split=0.1,
    max_length=2048,
    filter_empty=True,
    dedup_threshold=0.95,        # Remove near-duplicate examples
)

# 2. Configure training
training = TrainingConfig(
    base_model="meta-llama/Llama-3.1-8B",
    method="qlora",              # lora | qlora | full
    lora_rank=64,
    lora_alpha=128,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    learning_rate=2e-4,
    batch_size=4,
    gradient_accumulation=4,     # Effective batch size = 16
    epochs=3,
    warmup_ratio=0.05,
    output_dir="outputs/support-bot-v1",
)

# 3. Run pipeline
pipeline = Pipeline(dataset=dataset, training=training)
result = pipeline.run()
print(f"Final val loss: {result.best_val_loss:.4f}")
print(f"Checkpoint: {result.best_checkpoint}")
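The `dedup_threshold` above drops near-duplicate examples. The library's actual similarity measure isn't documented here, but the idea can be sketched with the stdlib `difflib` ratio (`near_dedup` is a hypothetical helper, not part of the pipeline's API):

```python
from difflib import SequenceMatcher

def near_dedup(examples, threshold=0.95):
    """Keep each example unless it is >= threshold similar to one already kept.

    O(n^2) pairwise comparison -- fine for illustration, but real pipelines
    use MinHash/LSH to scale to large corpora.
    """
    kept = []
    for text in examples:
        if all(SequenceMatcher(None, text, k).ratio() < threshold for k in kept):
            kept.append(text)
    return kept

examples = [
    "How do I reset my password?",
    "How do I reset my password??",   # near-duplicate, gets dropped
    "What are your support hours?",
]
print(near_dedup(examples))
```

At `threshold=0.95` the second example is filtered as a near-duplicate of the first, leaving two training examples.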

Architecture

Raw Data (CSV/JSON/JSONL)
         │
         ▼
┌─────────────────┐
│ Dataset Prep    │──── Format, clean, deduplicate, split
└────────┬────────┘
         ▼
┌─────────────────┐
│  Tokenization   │──── Tokenize + pad/truncate to max_length
└────────┬────────┘
         ▼
┌─────────────────┐
│ Training Loop   │──── LoRA/QLoRA with monitoring + checkpoints
│                 │     ↳ Early stopping on val loss plateau
└────────┬────────┘
         ▼
┌─────────────────┐
│  Evaluation     │──── Run test set benchmarks
└────────┬────────┘
         ▼
┌─────────────────┐
│  Model Merge    │──── Merge adapter into base model
└────────┬────────┘
         ▼
┌─────────────────┐
│  Export         │──── GGUF / ONNX / SafeTensors
└─────────────────┘
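The stages above run strictly in sequence, each consuming the previous stage's output. That composition can be sketched as a chain of stage functions (the stage names here are illustrative, not the library's internals):

```python
def run_stages(data, stages):
    """Thread data through an ordered list of stage functions."""
    for stage in stages:
        data = stage(data)
    return data

# Toy stand-ins for the prep -> tokenization steps of the diagram
prep = lambda examples: [x.strip() for x in examples if x.strip()]
tokenize = lambda examples: [x.split() for x in examples]

result = run_stages(["  hello world ", "", "fine tune"], [prep, tokenize])
print(result)  # [['hello', 'world'], ['fine', 'tune']]
```

Because each stage only sees the previous stage's output, individual stages (e.g. a different tokenizer) can be swapped without touching the rest of the chain.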

Usage Examples

Dataset Format Conversion

from fine_tuning.data import DatasetConverter

# Convert from CSV with question/answer columns to ShareGPT format
converter = DatasetConverter()
converter.from_csv(
    path="data/faq.csv",
    user_column="question",
    assistant_column="answer",
    system_prompt="You are a helpful customer support agent.",
    output_path="data/faq_sharegpt.jsonl",
)

# Convert from raw text completions to instruction format
converter.from_completions(
    path="data/code_samples.txt",
    instruction_template="Complete the following code:",
    output_path="data/code_alpaca.jsonl",
)
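For reference, a record produced by the CSV conversion above would typically look like the following, assuming the common ShareGPT `conversations` schema (role tags vary slightly between tools):

```python
import json

record = {
    "conversations": [
        {"from": "system", "value": "You are a helpful customer support agent."},
        {"from": "human", "value": "How do I return an item?"},
        {"from": "gpt", "value": "You can start a return from your orders page."},
    ]
}

# Each line of the output .jsonl file is one such record
line = json.dumps(record)
parsed = json.loads(line)
print([turn["from"] for turn in parsed["conversations"]])
```

Spot-checking a few converted lines this way is a cheap guard against silently training on malformed turns.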

Hyperparameter Presets for Common Models

from fine_tuning.presets import get_preset

# Load optimized defaults for Llama 3.1 8B on a single A100
config = get_preset(
    model="llama-3.1-8b",
    gpu="a100-40gb",
    task="chat",           # chat | instruction | completion
)
print(config.learning_rate)   # 2e-4
print(config.batch_size)      # 4
print(config.lora_rank)       # 64

Model Merging Strategies

from fine_tuning.merge import ModelMerger

merger = ModelMerger(
    base_model="meta-llama/Llama-3.1-8B",
    adapter_path="outputs/support-bot-v1/best-checkpoint",
    merge_strategy="ties",     # linear | ties | dare
    output_path="models/support-bot-v1-merged",
    output_format="safetensors",
)
merger.merge()

Configuration

# fine_tuning_config.yaml
dataset:
  source: "data/training_data.jsonl"
  format: "sharegpt"
  max_length: 2048
  train_split: 0.9
  val_split: 0.08
  test_split: 0.02
  preprocessing:
    remove_duplicates: true
    min_length: 10              # Skip very short examples
    max_length: 4096            # Skip very long examples
    filter_language: "en"

training:
  base_model: "meta-llama/Llama-3.1-8B"
  method: "qlora"
  quantization_bits: 4
  lora:
    rank: 64
    alpha: 128
    dropout: 0.05
    target_modules: ["q_proj", "v_proj", "k_proj", "o_proj"]
  optimizer: "adamw_8bit"
  learning_rate: 2e-4
  lr_scheduler: "cosine"
  batch_size: 4
  gradient_accumulation_steps: 4
  epochs: 3
  warmup_ratio: 0.05
  max_grad_norm: 1.0
  save_steps: 100
  eval_steps: 50
  early_stopping_patience: 5

merge:
  strategy: "linear"            # linear | ties | dare
  output_format: "safetensors"  # safetensors | gguf | onnx

monitoring:
  log_to: "tensorboard"         # tensorboard | wandb | csv
  log_dir: "logs/"
  track_gpu_utilization: true
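One easy mistake in a config like this is splits that don't sum to 1.0. A quick standalone sanity check (values taken from the config above) catches it before a long training run:

```python
splits = {"train": 0.90, "val": 0.08, "test": 0.02}

total = sum(splits.values())
# Compare with a tolerance -- floats rarely sum exactly
assert abs(total - 1.0) < 1e-9, f"splits sum to {total}, expected 1.0"
print("splits ok:", splits)
```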

Best Practices

  1. Start with QLoRA on small rank — Begin with rank=16, evaluate, then increase to 32/64 if quality is insufficient. Higher rank = more parameters = slower training.
  2. Clean your data ruthlessly — Fine-tuning amplifies data quality issues. Remove duplicates, fix formatting, and validate every example.
  3. Use a validation set — Always hold out 10% for validation. Watch val loss — if it diverges from train loss, you're overfitting.
  4. Match the chat template — Use the exact chat template (system/user/assistant tags) that the base model was trained with.
  5. Don't over-train — 1-3 epochs is usually sufficient for LoRA. Training for more epochs often leads to catastrophic forgetting of base model capabilities.
  6. Evaluate on real tasks — Loss numbers alone don't tell the full story. Test the fine-tuned model on actual use cases before deploying.
  7. Version your datasets — Hash your training data and log it with each experiment. Reproducibility requires knowing exactly what data was used.
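Point 7 can be done with a plain content hash. A minimal sketch using stdlib `hashlib` (log the digest alongside each experiment's config):

```python
import hashlib

def dataset_hash(path, chunk_size=1 << 20):
    """SHA-256 of a dataset file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example: hash a tiny throwaway dataset file
with open("tiny.jsonl", "w") as f:
    f.write('{"text": "example"}\n')
print(dataset_hash("tiny.jsonl"))
```

If two runs log the same digest, they trained on byte-identical data; any edit to the file, however small, changes the hash.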

Troubleshooting

  • Training loss doesn't decrease. Cause: learning rate too low or data format mismatch. Fix: try a 5e-4 learning rate; verify the chat template matches the base model.
  • CUDA out of memory. Cause: batch size too large for GPU VRAM. Fix: reduce batch_size to 1 and increase gradient_accumulation_steps.
  • Fine-tuned model gives worse results than base. Cause: overfitting or bad training data. Fix: reduce epochs to 1, increase dataset size, check data quality.
  • Merged model produces garbage output. Cause: wrong merge strategy or base model version mismatch. Fix: verify the exact base model version matches training; try a linear merge.
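The out-of-memory fix above keeps the effective batch size constant: what you give up in per-device batch size you regain in accumulation steps, so training dynamics stay comparable. The arithmetic:

```python
# Effective batch = per-device batch size * gradient accumulation steps
before = 4 * 4    # batch_size=4, gradient_accumulation_steps=4  -> 16
after = 1 * 16    # batch_size=1, gradient_accumulation_steps=16 -> still 16
assert before == after == 16
print("effective batch size preserved:", after)
```

Only peak VRAM changes; each optimizer step still averages gradients over the same number of examples.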

This is 1 of 11 resources in the AI Builder Pro toolkit. Get the complete Fine-Tuning Pipeline with all files, templates, and documentation for $59.

Get the Full Kit →

Or grab the entire AI Builder Pro bundle (11 products) for $169 — save 30%.

Get the Complete Bundle →

