# Fine-Tuning Pipeline
Everything you need to fine-tune open-source LLMs on your own data — from dataset preparation through training to deployment. This pipeline handles LoRA/QLoRA configuration, training data formatting, hyperparameter management, experiment tracking, model merging, and quantized deployment. Designed for teams running fine-tuning on single GPUs or small clusters without deep ML infrastructure expertise.
## Key Features
- LoRA & QLoRA Training — Parameter-efficient fine-tuning scripts with automatic rank selection, target module detection, and 4-bit quantization support
- Dataset Preparation — Convert raw data (CSV, JSON, conversations) into training-ready formats with deduplication, filtering, and train/val/test splits
- Hyperparameter Management — Predefined configs for common base models (Llama, Mistral, Phi) with recommended learning rates, batch sizes, and schedules
- Training Monitoring — Real-time loss curves, gradient norms, learning rate schedules, and GPU utilization tracking with automatic early stopping
- Model Merging — Merge LoRA adapters back into base models with TIES, DARE, and linear merge strategies
- Evaluation Suite — Run benchmarks against your test set automatically after training to validate improvement
- Deployment Export — Export merged models in GGUF, ONNX, or SafeTensors format for inference serving
## Quick Start

```python
from fine_tuning import Pipeline, DatasetConfig, TrainingConfig

# 1. Prepare dataset
dataset = DatasetConfig(
    source="data/customer_support_conversations.jsonl",
    format="sharegpt",        # sharegpt | alpaca | completion
    train_split=0.9,
    val_split=0.1,
    max_length=2048,
    filter_empty=True,
    dedup_threshold=0.95,     # Remove near-duplicate examples
)

# 2. Configure training
training = TrainingConfig(
    base_model="meta-llama/Llama-3.1-8B",
    method="qlora",           # lora | qlora | full
    lora_rank=64,
    lora_alpha=128,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    learning_rate=2e-4,
    batch_size=4,
    gradient_accumulation=4,  # Effective batch size = 16
    epochs=3,
    warmup_ratio=0.05,
    output_dir="outputs/support-bot-v1",
)

# 3. Run pipeline
pipeline = Pipeline(dataset=dataset, training=training)
result = pipeline.run()
print(f"Final val loss: {result.best_val_loss:.4f}")
print(f"Checkpoint: {result.best_checkpoint}")
```
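The `Effective batch size = 16` comment can be verified with simple arithmetic. A standalone sketch (not part of the `fine_tuning` package) that derives the effective batch size and the total optimizer steps implied by the config above:

```python
# Gradients are accumulated over gradient_accumulation micro-batches
# before each optimizer step, so the effective batch is the product.
def effective_batch_size(batch_size: int, grad_accum: int) -> int:
    return batch_size * grad_accum

def optimizer_steps(num_examples: int, batch_size: int,
                    grad_accum: int, epochs: int) -> int:
    # Optimizer steps per epoch, then scaled by the number of epochs.
    steps_per_epoch = num_examples // effective_batch_size(batch_size, grad_accum)
    return steps_per_epoch * epochs

print(effective_batch_size(4, 4))        # 16, matching the config comment
print(optimizer_steps(10_000, 4, 4, 3))  # 1875 steps for a 10k-example dataset
```

Keeping the effective batch size fixed while trading `batch_size` against `gradient_accumulation` is also the standard way to fit training into less VRAM.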
## Architecture

```
Raw Data (CSV/JSON/JSONL)
         │
         ▼
┌─────────────────┐
│  Dataset Prep   │──── Format, clean, deduplicate, split
└────────┬────────┘
         ▼
┌─────────────────┐
│  Tokenization   │──── Tokenize + pad/truncate to max_length
└────────┬────────┘
         ▼
┌─────────────────┐
│  Training Loop  │──── LoRA/QLoRA with monitoring + checkpoints
│                 │       ↳ Early stopping on val loss plateau
└────────┬────────┘
         ▼
┌─────────────────┐
│   Evaluation    │──── Run test set benchmarks
└────────┬────────┘
         ▼
┌─────────────────┐
│   Model Merge   │──── Merge adapter into base model
└────────┬────────┘
         ▼
┌─────────────────┐
│     Export      │──── GGUF / ONNX / SafeTensors
└─────────────────┘
```
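The deduplication step in Dataset Prep drops examples whose similarity to an already-kept example exceeds `dedup_threshold`. A minimal sketch of that idea, using `difflib` ratio as a stand-in for whatever similarity metric the pipeline actually uses:

```python
# Sketch of near-duplicate removal: keep an example only if it is not
# too similar (>= threshold) to anything already kept.
from difflib import SequenceMatcher

def dedup(examples: list[str], threshold: float = 0.95) -> list[str]:
    kept: list[str] = []
    for ex in examples:
        if all(SequenceMatcher(None, ex, k).ratio() < threshold for k in kept):
            kept.append(ex)
    return kept

data = [
    "How do I reset my password?",
    "How do I reset my password??",      # near-duplicate, dropped
    "How do I cancel my subscription?",
]
print(dedup(data))  # keeps 2 of the 3 examples
```

Note the quadratic pairwise comparison; real pipelines typically use hashing (e.g. MinHash) to scale this to large datasets.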
## Usage Examples
### Dataset Format Conversion

```python
from fine_tuning.data import DatasetConverter

# Convert from CSV with question/answer columns to ShareGPT format
converter = DatasetConverter()
converter.from_csv(
    path="data/faq.csv",
    user_column="question",
    assistant_column="answer",
    system_prompt="You are a helpful customer support agent.",
    output_path="data/faq_sharegpt.jsonl",
)

# Convert from raw text completions to instruction format
converter.from_completions(
    path="data/code_samples.txt",
    instruction_template="Complete the following code:",
    output_path="data/code_alpaca.jsonl",
)
```
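For reference, a `from_csv` call like the one above is assumed to emit one ShareGPT-style record per CSV row; the `from`/`value` keys and `human`/`gpt` role names follow the common ShareGPT convention, though the exact keys `DatasetConverter` writes may differ. A hypothetical sketch of the mapping:

```python
# Sketch (assumed format): map one question/answer row to a ShareGPT record.
import json

def row_to_sharegpt(question: str, answer: str, system_prompt: str) -> dict:
    return {
        "conversations": [
            {"from": "system", "value": system_prompt},
            {"from": "human", "value": question},
            {"from": "gpt", "value": answer},
        ]
    }

record = row_to_sharegpt(
    "How do I reset my password?",
    "Click 'Forgot password' on the login page.",
    "You are a helpful customer support agent.",
)
print(json.dumps(record))  # one JSONL line per CSV row
```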
### Hyperparameter Presets for Common Models

```python
from fine_tuning.presets import get_preset

# Load optimized defaults for Llama 3.1 8B on a single A100
config = get_preset(
    model="llama-3.1-8b",
    gpu="a100-40gb",
    task="chat",  # chat | instruction | completion
)

print(config.learning_rate)  # 2e-4
print(config.batch_size)     # 4
print(config.lora_rank)      # 64
```
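Conceptually, a preset is just a lookup keyed on (model, GPU, task). A self-contained sketch of that structure, with values mirroring the example above (the real `presets` module covers many more combinations and fields):

```python
# Sketch: presets as a table of defaults keyed on (model, gpu, task).
from dataclasses import dataclass

@dataclass
class Preset:
    learning_rate: float
    batch_size: int
    lora_rank: int

_PRESETS = {
    ("llama-3.1-8b", "a100-40gb", "chat"): Preset(2e-4, 4, 64),
}

def get_preset(model: str, gpu: str, task: str) -> Preset:
    # KeyError on an unknown combination tells the caller no preset exists.
    return _PRESETS[(model, gpu, task)]

p = get_preset("llama-3.1-8b", "a100-40gb", "chat")
print(p.learning_rate, p.batch_size, p.lora_rank)  # 0.0002 4 64
```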
### Model Merging Strategies

```python
from fine_tuning.merge import ModelMerger

merger = ModelMerger(
    base_model="meta-llama/Llama-3.1-8B",
    adapter_path="outputs/support-bot-v1/best-checkpoint",
    merge_strategy="ties",  # linear | ties | dare
    output_path="models/support-bot-v1-merged",
    output_format="safetensors",
)
merger.merge()
```
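For the linear strategy, the underlying arithmetic is the standard LoRA update `W' = W + (alpha / rank) * B @ A`, where `A` and `B` are the trained low-rank factors. A tiny pure-Python sketch of that update (toy 2×2 shapes for illustration; real merges apply this per target module across full model tensors):

```python
# Sketch of a linear LoRA merge: W' = W + (alpha / rank) * B @ A.
def matmul(B, A):
    # Plain nested-list matrix multiply (no numpy needed at this scale).
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def merge_linear(W, A, B, alpha: float, rank: int):
    scale = alpha / rank
    delta = matmul(B, A)  # low-rank weight update
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]  # base weight (2x2)
A = [[1.0, 0.0]]              # rank-1 down-projection (1x2)
B = [[0.5], [0.5]]            # rank-1 up-projection (2x1)
print(merge_linear(W, A, B, alpha=2.0, rank=1))  # [[2.0, 0.0], [1.0, 1.0]]
```

TIES and DARE extend this by trimming/sign-resolving or randomly dropping and rescaling the delta before it is added, which matters mainly when combining several adapters.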
## Configuration

```yaml
# fine_tuning_config.yaml
dataset:
  source: "data/training_data.jsonl"
  format: "sharegpt"
  max_length: 2048
  train_split: 0.9
  val_split: 0.08
  test_split: 0.02
  preprocessing:
    remove_duplicates: true
    min_length: 10    # Skip very short examples
    max_length: 4096  # Skip very long examples
    filter_language: "en"

training:
  base_model: "meta-llama/Llama-3.1-8B"
  method: "qlora"
  quantization_bits: 4
  lora:
    rank: 64
    alpha: 128
    dropout: 0.05
    target_modules: ["q_proj", "v_proj", "k_proj", "o_proj"]
  optimizer: "adamw_8bit"
  learning_rate: 2e-4
  lr_scheduler: "cosine"
  batch_size: 4
  gradient_accumulation_steps: 4
  epochs: 3
  warmup_ratio: 0.05
  max_grad_norm: 1.0
  save_steps: 100
  eval_steps: 50
  early_stopping_patience: 5

merge:
  strategy: "linear"            # linear | ties | dare
  output_format: "safetensors"  # safetensors | gguf | onnx

monitoring:
  log_to: "tensorboard"         # tensorboard | wandb | csv
  log_dir: "logs/"
  track_gpu_utilization: true
```
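Note that the three split fractions must sum to 1.0 (0.9 + 0.08 + 0.02 above). A sketch of the sanity check the pipeline is assumed to run on the `dataset` section, using a float-tolerant comparison:

```python
# Sketch: validate that train/val/test fractions cover the whole dataset.
import math

def validate_splits(cfg: dict) -> bool:
    total = cfg["train_split"] + cfg["val_split"] + cfg["test_split"]
    # isclose absorbs float rounding, e.g. 0.9 + 0.08 + 0.02 != exactly 1.0
    return math.isclose(total, 1.0)

print(validate_splits({"train_split": 0.9, "val_split": 0.08, "test_split": 0.02}))  # True
```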
## Best Practices

- Start with QLoRA at a small rank — Begin with rank=16, evaluate, then increase to 32/64 if quality is insufficient. A higher rank means more trainable parameters and slower training.
- Clean your data ruthlessly — Fine-tuning amplifies data quality issues. Remove duplicates, fix formatting, and validate every example.
- Use a validation set — Always hold out 10% for validation. Watch val loss — if it diverges from train loss, you're overfitting.
- Match the chat template — Use the exact chat template (system/user/assistant tags) that the base model was trained with.
- Don't over-train — 1-3 epochs is usually sufficient for LoRA. More epochs often lead to catastrophic forgetting of base model capabilities.
- Evaluate on real tasks — Loss numbers alone don't tell the full story. Test the fine-tuned model on actual use cases before deploying.
- Version your datasets — Hash your training data and log it with each experiment. Reproducibility requires knowing exactly what data was used.
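Dataset versioning can be as simple as a streaming SHA-256 over the training file, logged with each run (a sketch; the pipeline's own experiment tracking may record this differently):

```python
# Sketch of dataset versioning: a stable SHA-256 over the training file,
# logged alongside each experiment so you know exactly what data was used.
import hashlib

def dataset_hash(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in 1 MiB chunks so large JSONL files aren't loaded whole.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```

Logging even a short prefix of the digest (e.g. `dataset_hash(path)[:12]`) next to each checkpoint is enough to tell two dataset versions apart.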
## Troubleshooting

| Problem | Cause | Fix |
|---|---|---|
| Training loss doesn't decrease | Learning rate too low or data format mismatch | Try 5e-4 learning rate; verify chat template matches base model |
| CUDA out of memory | Batch size too large for GPU VRAM | Reduce batch_size to 1 and increase gradient_accumulation_steps |
| Fine-tuned model gives worse results than base | Overfitting or bad training data | Reduce epochs to 1, increase dataset size, check data quality |
| Merged model produces garbage output | Wrong merge strategy or base model version mismatch | Verify exact base model version matches training; try linear merge |
This is 1 of 11 resources in the AI Builder Pro toolkit. Get the complete Fine-Tuning Pipeline with all files, templates, and documentation for $59.
Or grab the entire AI Builder Pro bundle (11 products) for $169 — save 30%.