as1as
Fine-Tuning AI for Free — Kaggle + QLoRA Hands-On Guide

I wanted to fine-tune an AI model to sound more human.

Not the usual stiff AI tone — something closer to how people actually write
on Reddit. Natural, direct, sometimes blunt. So I decided to fine-tune
Qwen3-8B on Reddit-style data.

The problem was my local PC. Not enough VRAM. So I went looking for a free
GPU solution and found Kaggle.

Fair warning: I made quite a few mistakes along the way.
That's what this post is really about.


Why Kaggle

Kaggle is known as a data science competition platform, but the key thing is:

It gives you free GPU.

  • NVIDIA Tesla T4 (15.6 GB VRAM)
  • 30 hours of GPU time per week
  • Completely free

One thing to know — Kaggle defaults to internet OFF. You can switch it
on under Settings → Internet, and turning it on costs nothing extra.

I worked with internet OFF, which led to my first mistake.


The Full Flow

1. Prepare a public dataset from Hugging Face
2. Connect the model + dataset in Kaggle
3. Run QLoRA fine-tuning
4. Save the adapter and evaluate

Step 1 — Data Preparation

I wanted Reddit-style data to get that natural, human-sounding tone.

To be clear: I didn't scrape Reddit. Hugging Face already has publicly
available Reddit-based datasets with proper licensing. That's the right
approach.

Some options:

  • webis/tldr-17 — Reddit posts + summaries
  • reddit — based on the public Reddit archive
  • sentence-transformers/reddit-title-body — title/body pairs

The data format I used was ChatML-style JSONL:

{
  "messages": [
    {"role": "system", "content": "You are a helpful member of r/SideProject."},
    {"role": "user", "content": "Just shipped my side project. Nobody's using it."},
    {"role": "assistant", "content": "Congrats on shipping. Seriously..."}
  ]
}

Upload this to Hugging Face as a Dataset and you can connect it directly
in Kaggle.
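Before uploading, it's worth checking that every line parses and matches the ChatML schema, since one malformed line will otherwise surface mid-training. A stdlib-only sketch (the `sample.jsonl` filename and `validate_chatml_jsonl` helper are just illustrations, not part of any library):

```python
import json

KNOWN_ROLES = {"system", "user", "assistant"}

def validate_chatml_jsonl(path: str) -> int:
    """Return the number of valid records; raise on the first bad line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            record = json.loads(line)          # raises on malformed JSON
            messages = record["messages"]      # raises if the key is missing
            assert messages, f"line {lineno}: empty messages list"
            for msg in messages:
                assert msg["role"] in KNOWN_ROLES, f"line {lineno}: unknown role"
                assert isinstance(msg["content"], str), f"line {lineno}: non-string content"
            # loss is computed only on the final assistant turn,
            # so every conversation should end with one
            assert messages[-1]["role"] == "assistant", f"line {lineno}: must end with assistant"
            count += 1
    return count

# quick self-check against a one-record sample file
sample = {"messages": [
    {"role": "system", "content": "You are a helpful member of r/SideProject."},
    {"role": "user", "content": "Just shipped my side project."},
    {"role": "assistant", "content": "Congrats on shipping."},
]}
with open("sample.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(sample) + "\n")
print(validate_chatml_jsonl("sample.jsonl"))  # 1
```

Point it at your real file before pushing to Hugging Face.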


Step 2 — Kaggle Setup

Connecting the model and dataset

After creating a Kaggle Notebook:

  • Settings → Accelerator → GPU T4 x2
  • Right panel → Input → Models → search Qwen3-8B → Add
  • Right panel → Input → Datasets → add your dataset

Mistake #1: The model path

I connected the model, then wrote this:

MODEL = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)

Result:

OSError: Can't load the configuration of 'Qwen/Qwen3-8B'.
'[Errno -3] Temporary failure in name resolution'

With internet OFF, it tried to reach HuggingFace and failed. Even though
I had connected the model in Kaggle, passing "Qwen/Qwen3-8B" still
sends a request to the HuggingFace server instead of using the local copy.

Fix: find the real path with glob

Kaggle doesn't clearly tell you where your connected model actually lives.
You have to find it yourself.

import glob

# Find the dataset
matches = glob.glob("/kaggle/input/**/*.jsonl", recursive=True)
assert matches, "No jsonl file found. Check that your dataset is added."
DATASET_PATH = matches[0]
print(f"Dataset: {DATASET_PATH}")
# Dataset: /kaggle/input/datasets/jisungyeom/datafinetune-dataset/finetune_dataset.jsonl

Do the same for the model path, then use that actual path in MODEL.
Skipping this step means hitting FileNotFoundError or the OSError above.
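For the model, the same trick works by searching for config.json, which every Hugging Face model snapshot contains. A small sketch (the `find_model_dir` helper is my own wrapper, not a Kaggle API):

```python
import glob
import os

def find_model_dir(root: str) -> str:
    """Locate a Hugging Face model directory under root via its config.json."""
    matches = glob.glob(os.path.join(root, "**", "config.json"), recursive=True)
    assert matches, f"No config.json under {root}. Check that the model is attached."
    return os.path.dirname(matches[0])

# In the Kaggle notebook: MODEL = find_model_dir("/kaggle/input")
```

Passing `local_files_only=True` to `from_pretrained` is an extra belt-and-suspenders way to make sure nothing tries to reach the Hub.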


Step 3 — QLoRA Fine-Tuning

Check the environment

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
print(f"PyTorch: {torch.__version__}")
# GPU: Tesla T4
# VRAM: 15.6 GB
# PyTorch: 2.9.0+cu126

Mistake #2: pip install with internet OFF

ERROR: Could not find a version that satisfies the requirement bitsandbytes>=0.46.1
Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

With internet OFF, pip can't reach PyPI. Either turn internet ON first,
or check if the package is already installed in the Kaggle environment.
In my case, the default version worked fine.
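A quick way to see what the Kaggle image already ships, without touching the network, is `importlib.metadata` from the standard library:

```python
from importlib.metadata import version, PackageNotFoundError

# Check preinstalled versions before reaching for pip
for pkg in ["transformers", "peft", "bitsandbytes", "accelerate"]:
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```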

Load the model with 4-bit quantization

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

MODEL = "/kaggle/input/your-actual-path-found-with-glob"
MAX_SEQ_LENGTH = 1024  # Reduced from 2048 to save memory

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

LoRA config

lora_config = LoraConfig(
    r=8,           # Reduced from 16 to save memory
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

LoRA updates only a small fraction of the total parameters (well under 1% of them
are trainable at r=8). That's how training fits in a T4's 15 GB of VRAM.
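A back-of-envelope count shows why: each adapted linear layer of shape d_out x d_in gains two low-rank matrices, r x d_in and d_out x r. The dimensions below are assumptions for a Qwen3-8B-style config; read the model's config.json for the real values:

```python
# Back-of-envelope LoRA parameter count at r=8.
# hidden/mlp/layers/kv_dim are assumed, illustrative dimensions.
hidden, mlp, layers = 4096, 12288, 36
kv_dim = 8 * 128          # 8 KV heads x head_dim 128 (grouped-query attention)
r = 8

def lora_params(d_in, d_out, rank):
    # LoRA adds A (rank x d_in) and B (d_out x rank)
    return rank * (d_in + d_out)

per_layer = (
    lora_params(hidden, hidden, r)      # q_proj
    + lora_params(hidden, kv_dim, r)    # k_proj
    + lora_params(hidden, kv_dim, r)    # v_proj
    + lora_params(hidden, hidden, r)    # o_proj
    + lora_params(hidden, mlp, r)       # gate_proj
    + lora_params(hidden, mlp, r)       # up_proj
    + lora_params(mlp, hidden, r)       # down_proj
)
total_lora = per_layer * layers
print(f"LoRA params: {total_lora / 1e6:.1f}M")           # 21.8M
print(f"Fraction of 8B: {total_lora / 8e9 * 100:.2f}%")  # 0.27%
```

Roughly the same numbers should come out of model.print_trainable_parameters(), give or take the exact config.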

Dataset class

Only the assistant turn contributes to the loss. The prefix is masked with -100:

from torch.utils.data import Dataset as TorchDataset

class ChatMLDataset(TorchDataset):
    def __init__(self, samples, tokenizer, max_length):
        self.data = []
        for sample in samples:
            messages = sample["messages"]

            # Everything up to the assistant turn, ending with the generation prompt
            prefix = tokenizer.apply_chat_template(
                messages[:-1], tokenize=False, add_generation_prompt=True
            )
            full = tokenizer.apply_chat_template(
                messages, tokenize=False, add_generation_prompt=False
            )

            prefix_ids = tokenizer(prefix, add_special_tokens=False)["input_ids"]
            full_enc = tokenizer(
                full, add_special_tokens=False,
                max_length=max_length, truncation=True,
            )
            full_ids = full_enc["input_ids"]

            # Mask the prefix with -100 so only assistant tokens contribute to the loss
            labels = [-100] * len(prefix_ids) + full_ids[len(prefix_ids):]
            labels = labels[:max_length]

            self.data.append({
                "input_ids": full_ids,
                "attention_mask": full_enc["attention_mask"],
                "labels": labels,
            })

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

Training

from transformers import Trainer, TrainingArguments, DataCollatorForSeq2Seq

# Pads input_ids/attention_mask to the batch max, and labels with -100
data_collator = DataCollatorForSeq2Seq(tokenizer, padding=True, label_pad_token_id=-100)

training_args = TrainingArguments(
    output_dir="/kaggle/working/qwen3-reddit-ft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    warmup_steps=10,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_strategy="steps",
    save_steps=100,
    eval_strategy="steps",
    eval_steps=100,
    optim="adamw_8bit",
    gradient_checkpointing=True,
    dataloader_pin_memory=False,
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
    data_collator=data_collator,
)

trainer.train()
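For orientation, here's the effective batch size and step count these settings imply (assuming a single effective data-parallel worker; with device_map="auto" and 4-bit loading, the second T4 holds model shards rather than a second batch):

```python
import math

per_device_batch = 1
grad_accum = 16
train_samples = 466   # the train split used in this post
epochs = 3

effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs
print(effective_batch, steps_per_epoch, total_steps)  # 16 30 90
```

So the whole run is only about 90 optimizer steps, which is why it finishes well within the weekly GPU quota.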

Step 4 — Save and Evaluate

# Save LoRA adapter
OUTPUT_DIR = "/kaggle/working/qwen3-reddit-ft"
trainer.save_model(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)

# Fuse LoRA into the base model
FINAL_DIR = "/kaggle/working/qwen3-reddit-final"
merged_model = model.merge_and_unload()
merged_model.save_pretrained(FINAL_DIR)
tokenizer.save_pretrained(FINAL_DIR)

Evaluation prompts

PROMPTS = [
    ("r/SideProject", "Just shipped my side project after 6 months. Nobody's using it."),
    ("r/artificial", "GPT-5 was just released and it's apparently 10x better than Claude"),
    ("r/webdev", "Should I learn React or just stick with vanilla JS in 2025?"),
    ("r/LocalLLaMA", "Running a 70B model locally on a 4090, is it worth it?"),
]

for sub, prompt in PROMPTS:
    messages = [
        {"role": "system", "content": f"You are a helpful member of {sub}."},
        {"role": "user", "content": prompt},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens, not the prompt
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))

What I Actually Learned

518 samples (466 train, 52 validation) turned out to be enough to shift the tone.
Even with a dataset that small, the response style changed noticeably.

Mistakes summary:

  • Model path — don't use "Qwen/Qwen3-8B" directly. Use glob to find the real path first
  • Internet is OFF by default — pip won't work. Turn it on (free) or use pre-installed packages
  • VRAM limits — set MAX_SEQ_LENGTH=1024, batch_size=1, r=8 to fit on T4
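The VRAM bullet is easier to see with rough numbers. The counts below are estimates, not measurements:

```python
# Rough VRAM budget for QLoRA on a 15.6 GB T4 (all numbers approximate)
base_params = 8.2e9        # assumed total parameter count
lora_trainable = 22e6      # estimated trainable LoRA params at r=8

weights_4bit = base_params * 0.5 / 1e9     # NF4: ~0.5 bytes per param
lora_fp16    = lora_trainable * 2 / 1e9    # adapters kept in fp16
grads_fp16   = lora_trainable * 2 / 1e9    # gradients only for LoRA params
adam_8bit    = lora_trainable * 2 / 1e9    # two 1-byte moments per trainable param

fixed = weights_4bit + lora_fp16 + grads_fp16 + adam_8bit
print(f"~{fixed:.1f} GB before activations")  # ~4.2 GB
# Activations are the variable part, which is exactly why
# MAX_SEQ_LENGTH=1024, batch_size=1, and gradient checkpointing matter.
```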

Final Thoughts

Fine-tuning isn't that hard. The setup is where the friction is.

Free Kaggle GPU + open model + public dataset = zero cost to get started.
If you've been curious about fine-tuning but assumed you needed expensive
hardware, this stack removes that excuse.
