
Thesius Code

Posted on • Originally published at datanest-stores.pages.dev

Fine-Tuning Pipeline

Everything you need to fine-tune open-source LLMs on your own data — from dataset preparation through training to deployment. This pipeline handles LoRA/QLoRA configuration, training data formatting, hyperparameter management, experiment tracking, model merging, and quantized deployment. Designed for teams running fine-tuning on single GPUs or small clusters without deep ML infrastructure expertise.

Key Features

  • LoRA & QLoRA Training — Parameter-efficient fine-tuning scripts with automatic rank selection, target module detection, and 4-bit quantization support
  • Dataset Preparation — Convert raw data (CSV, JSON, conversations) into training-ready formats with deduplication, filtering, and train/val/test splits
  • Hyperparameter Management — Predefined configs for common base models (Llama, Mistral, Phi) with recommended learning rates, batch sizes, and schedules
  • Training Monitoring — Real-time loss curves, gradient norms, learning rate schedules, and GPU utilization tracking with automatic early stopping
  • Model Merging — Merge LoRA adapters back into base models with TIES, DARE, and linear merge strategies
  • Evaluation Suite — Run benchmarks against your test set automatically after training to validate improvement
  • Deployment Export — Export merged models in GGUF, ONNX, or SafeTensors format for inference serving

Quick Start

from fine_tuning import Pipeline, DatasetConfig, TrainingConfig

# 1. Prepare dataset
dataset = DatasetConfig(
    source="data/customer_support_conversations.jsonl",
    format="sharegpt",           # sharegpt | alpaca | completion
    train_split=0.9,
    val_split=0.1,
    max_length=2048,
    filter_empty=True,
    dedup_threshold=0.95,        # Remove near-duplicate examples
)

# 2. Configure training
training = TrainingConfig(
    base_model="meta-llama/Llama-3.1-8B",
    method="qlora",              # lora | qlora | full
    lora_rank=64,
    lora_alpha=128,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    learning_rate=2e-4,
    batch_size=4,
    gradient_accumulation=4,     # Effective batch size = 16
    epochs=3,
    warmup_ratio=0.05,
    output_dir="outputs/support-bot-v1",
)

# 3. Run pipeline
pipeline = Pipeline(dataset=dataset, training=training)
result = pipeline.run()
print(f"Final val loss: {result.best_val_loss:.4f}")
print(f"Checkpoint: {result.best_checkpoint}")
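The `dedup_threshold` above drops near-duplicate examples. The library's actual similarity measure isn't documented here, but the idea can be sketched with the stdlib `difflib` ratio (`near_dedup` is a hypothetical helper, not part of the pipeline's API):

```python
from difflib import SequenceMatcher

def near_dedup(examples, threshold=0.95):
    """Keep each example unless it is >= threshold similar to one already kept.

    O(n^2) pairwise comparison -- fine for illustration, but real pipelines
    use MinHash/LSH to scale to large corpora.
    """
    kept = []
    for text in examples:
        if all(SequenceMatcher(None, text, k).ratio() < threshold for k in kept):
            kept.append(text)
    return kept

examples = [
    "How do I reset my password?",
    "How do I reset my password??",   # near-duplicate, gets dropped
    "What are your support hours?",
]
print(near_dedup(examples))
```

At `threshold=0.95` the second example is filtered as a near-duplicate of the first, leaving two training examples.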

Architecture

Raw Data (CSV/JSON/JSONL)
         │
         ▼
┌─────────────────┐
│ Dataset Prep    │──── Format, clean, deduplicate, split
└────────┬────────┘
         ▼
┌─────────────────┐
│  Tokenization   │──── Tokenize + pad/truncate to max_length
└────────┬────────┘
         ▼
┌─────────────────┐
│ Training Loop   │──── LoRA/QLoRA with monitoring + checkpoints
│                 │     ↳ Early stopping on val loss plateau
└────────┬────────┘
         ▼
┌─────────────────┐
│  Evaluation     │──── Run test set benchmarks
└────────┬────────┘
         ▼
┌─────────────────┐
│  Model Merge    │──── Merge adapter into base model
└────────┬────────┘
         ▼
┌─────────────────┐
│  Export         │──── GGUF / ONNX / SafeTensors
└─────────────────┘
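The stages above run strictly in sequence, each consuming the previous stage's output. That composition can be sketched as a chain of stage functions (the stage names here are illustrative, not the library's internals):

```python
def run_stages(data, stages):
    """Thread data through an ordered list of stage functions."""
    for stage in stages:
        data = stage(data)
    return data

# Toy stand-ins for the prep -> tokenization steps of the diagram
prep = lambda examples: [x.strip() for x in examples if x.strip()]
tokenize = lambda examples: [x.split() for x in examples]

result = run_stages(["  hello world ", "", "fine tune"], [prep, tokenize])
print(result)  # [['hello', 'world'], ['fine', 'tune']]
```

Because each stage only sees the previous stage's output, individual stages (e.g. a different tokenizer) can be swapped without touching the rest of the chain.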

Usage Examples

Dataset Format Conversion

from fine_tuning.data import DatasetConverter

# Convert from CSV with question/answer columns to ShareGPT format
converter = DatasetConverter()
converter.from_csv(
    path="data/faq.csv",
    user_column="question",
    assistant_column="answer",
    system_prompt="You are a helpful customer support agent.",
    output_path="data/faq_sharegpt.jsonl",
)

# Convert from raw text completions to instruction format
converter.from_completions(
    path="data/code_samples.txt",
    instruction_template="Complete the following code:",
    output_path="data/code_alpaca.jsonl",
)
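For reference, a record produced by the CSV conversion above would typically look like the following, assuming the common ShareGPT `conversations` schema (role tags vary slightly between tools):

```python
import json

record = {
    "conversations": [
        {"from": "system", "value": "You are a helpful customer support agent."},
        {"from": "human", "value": "How do I return an item?"},
        {"from": "gpt", "value": "You can start a return from your orders page."},
    ]
}

# Each line of the output .jsonl file is one such record
line = json.dumps(record)
parsed = json.loads(line)
print([turn["from"] for turn in parsed["conversations"]])
```

Spot-checking a few converted lines this way is a cheap guard against silently training on malformed turns.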

Hyperparameter Presets for Common Models

from fine_tuning.presets import get_preset

# Load optimized defaults for Llama 3.1 8B on a single A100
config = get_preset(
    model="llama-3.1-8b",
    gpu="a100-40gb",
    task="chat",           # chat | instruction | completion
)
print(config.learning_rate)   # 2e-4
print(config.batch_size)      # 4
print(config.lora_rank)       # 64

Model Merging Strategies

from fine_tuning.merge import ModelMerger

merger = ModelMerger(
    base_model="meta-llama/Llama-3.1-8B",
    adapter_path="outputs/support-bot-v1/best-checkpoint",
    merge_strategy="ties",     # linear | ties | dare
    output_path="models/support-bot-v1-merged",
    output_format="safetensors",
)
merger.merge()

Configuration

# fine_tuning_config.yaml
dataset:
  source: "data/training_data.jsonl"
  format: "sharegpt"
  max_length: 2048
  train_split: 0.9
  val_split: 0.08
  test_split: 0.02
  preprocessing:
    remove_duplicates: true
    min_length: 10              # Skip very short examples
    max_length: 4096            # Skip very long examples
    filter_language: "en"

training:
  base_model: "meta-llama/Llama-3.1-8B"
  method: "qlora"
  quantization_bits: 4
  lora:
    rank: 64
    alpha: 128
    dropout: 0.05
    target_modules: ["q_proj", "v_proj", "k_proj", "o_proj"]
  optimizer: "adamw_8bit"
  learning_rate: 2e-4
  lr_scheduler: "cosine"
  batch_size: 4
  gradient_accumulation_steps: 4
  epochs: 3
  warmup_ratio: 0.05
  max_grad_norm: 1.0
  save_steps: 100
  eval_steps: 50
  early_stopping_patience: 5

merge:
  strategy: "linear"            # linear | ties | dare
  output_format: "safetensors"  # safetensors | gguf | onnx

monitoring:
  log_to: "tensorboard"         # tensorboard | wandb | csv
  log_dir: "logs/"
  track_gpu_utilization: true
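One easy mistake in a config like this is splits that don't sum to 1.0. A quick standalone sanity check (values taken from the config above) catches it before a long training run:

```python
splits = {"train": 0.90, "val": 0.08, "test": 0.02}

total = sum(splits.values())
# Compare with a tolerance -- floats rarely sum exactly
assert abs(total - 1.0) < 1e-9, f"splits sum to {total}, expected 1.0"
print("splits ok:", splits)
```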

Best Practices

  1. Start with QLoRA on small rank — Begin with rank=16, evaluate, then increase to 32/64 if quality is insufficient. Higher rank = more parameters = slower training.
  2. Clean your data ruthlessly — Fine-tuning amplifies data quality issues. Remove duplicates, fix formatting, and validate every example.
  3. Use a validation set — Always hold out 10% for validation. Watch val loss — if it diverges from train loss, you're overfitting.
  4. Match the chat template — Use the exact chat template (system/user/assistant tags) that the base model was trained with.
  5. Don't over-train — 1-3 epochs is usually sufficient for LoRA. Training for more epochs often leads to catastrophic forgetting of base model capabilities.
  6. Evaluate on real tasks — Loss numbers alone don't tell the full story. Test the fine-tuned model on actual use cases before deploying.
  7. Version your datasets — Hash your training data and log it with each experiment. Reproducibility requires knowing exactly what data was used.
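Point 7 can be done with a plain content hash. A minimal sketch using stdlib `hashlib` (log the digest alongside each experiment's config):

```python
import hashlib

def dataset_hash(path, chunk_size=1 << 20):
    """SHA-256 of a dataset file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example: hash a tiny throwaway dataset file
with open("tiny.jsonl", "w") as f:
    f.write('{"text": "example"}\n')
print(dataset_hash("tiny.jsonl"))
```

If two runs log the same digest, they trained on byte-identical data; any edit to the file, however small, changes the hash.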

Troubleshooting

  • Training loss doesn't decrease. Cause: learning rate too low or data format mismatch. Fix: try a 5e-4 learning rate; verify the chat template matches the base model.
  • CUDA out of memory. Cause: batch size too large for GPU VRAM. Fix: reduce batch_size to 1 and increase gradient_accumulation_steps.
  • Fine-tuned model gives worse results than base. Cause: overfitting or bad training data. Fix: reduce epochs to 1, increase dataset size, check data quality.
  • Merged model produces garbage output. Cause: wrong merge strategy or base model version mismatch. Fix: verify the exact base model version matches training; try a linear merge.
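The out-of-memory fix above keeps the effective batch size constant: what you give up in per-device batch size you regain in accumulation steps, so training dynamics stay comparable. The arithmetic:

```python
# Effective batch = per-device batch size * gradient accumulation steps
before = 4 * 4    # batch_size=4, gradient_accumulation_steps=4  -> 16
after = 1 * 16    # batch_size=1, gradient_accumulation_steps=16 -> still 16
assert before == after == 16
print("effective batch size preserved:", after)
```

Only peak VRAM changes; each optimizer step still averages gradients over the same number of examples.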

This is 1 of 11 resources in the AI Builder Pro toolkit. Get the complete Fine-Tuning Pipeline with all files, templates, and documentation for $59.

Get the Full Kit →

Or grab the entire AI Builder Pro bundle (11 products) for $169 — save 30%.

Get the Complete Bundle →

