I wanted to fine-tune an AI model to sound more human.
Not the usual stiff AI tone — something closer to how people actually write
on Reddit. Natural, direct, sometimes blunt. So I decided to fine-tune
Qwen3-8B on Reddit-style data.
The problem was my local PC. Not enough VRAM. So I went looking for a free
GPU solution and found Kaggle.
Fair warning: I made quite a few mistakes along the way.
That's what this post is really about.
Why Kaggle
Kaggle is known as a data science competition platform, but the key thing here is that it gives you free GPU time:
- NVIDIA Tesla T4 (15.6GB VRAM)
- 30 hours of GPU per week
- Completely free
One thing to know — Kaggle defaults to internet OFF. You can switch it
on under Settings → Internet, and turning it on costs nothing extra.
I worked with internet OFF, which led to my first mistake.
The Full Flow
1. Prepare a public dataset from Hugging Face
2. Connect the model + dataset in Kaggle
3. Run QLoRA fine-tuning
4. Save the adapter and evaluate
Step 1 — Data Preparation
I wanted Reddit-style data to get that natural, human-sounding tone.
To be clear: I didn't scrape Reddit. Hugging Face already has publicly
available Reddit-based datasets with proper licensing. That's the right
approach.
Some options:
- webis/tldr-17 — Reddit posts + summaries
- reddit — based on the public Reddit archive
- sentence-transformers/reddit-title-body — title/body pairs
The data format I used was ChatML-style JSONL:
{
  "messages": [
    {"role": "system", "content": "You are a helpful member of r/SideProject."},
    {"role": "user", "content": "Just shipped my side project. Nobody's using it."},
    {"role": "assistant", "content": "Congrats on shipping. Seriously..."}
  ]
}
Upload this to Hugging Face as a Dataset and you can connect it directly
in Kaggle.
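Before uploading, it's worth sanity-checking that every line of the JSONL actually matches the ChatML schema — a malformed record will only surface much later, deep inside training. Here's a minimal validation sketch (the helper name and the sample record are mine, not from the post):

```python
import json

# Quick JSONL sanity check before uploading to Hugging Face.
# Verifies each line is valid JSON with a system/user/assistant "messages" list.
def validate_chatml_line(line):
    record = json.loads(line)
    roles = [m["role"] for m in record["messages"]]
    assert roles == ["system", "user", "assistant"], f"unexpected roles: {roles}"
    assert all(m["content"] for m in record["messages"]), "empty content field"
    return record

sample = '{"messages": [{"role": "system", "content": "You are a helpful member of r/SideProject."}, {"role": "user", "content": "Just shipped my side project."}, {"role": "assistant", "content": "Congrats on shipping."}]}'
record = validate_chatml_line(sample)
print(len(record["messages"]))  # → 3
```

Run it over every line of the file once before pushing; it costs seconds and saves a failed training run.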
Step 2 — Kaggle Setup
Connecting the model and dataset
After creating a Kaggle Notebook:
- Settings → Accelerator → GPU T4 x2
- Right panel → Input → Models → search Qwen3-8B → Add
- Right panel → Input → Datasets → add your dataset
Mistake #1: The model path
I connected the model, then wrote this:
MODEL = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
Result:
OSError: Can't load the configuration of 'Qwen/Qwen3-8B'.
'[Errno -3] Temporary failure in name resolution'
With internet OFF, it tried to reach HuggingFace and failed. Even though
I had connected the model in Kaggle, passing "Qwen/Qwen3-8B" still
sends a request to the HuggingFace server instead of using the local copy.
Fix: find the real path with glob
Kaggle doesn't clearly tell you where your connected model actually lives.
You have to find it yourself.
import glob
# Find the dataset
matches = glob.glob("/kaggle/input/**/*.jsonl", recursive=True)
assert matches, "No jsonl file found. Check that your dataset is added."
DATASET_PATH = matches[0]
print(f"Dataset: {DATASET_PATH}")
# Dataset: /kaggle/input/datasets/jisungyeom/datafinetune-dataset/finetune_dataset.jsonl
Do the same for the model path, then use that actual path in MODEL.
Skipping this step means hitting FileNotFoundError or the OSError above.
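For the model, the same glob trick works by searching for its config.json. A small sketch of what that might look like (the helper is mine; the root is a parameter so it also runs outside Kaggle):

```python
import glob
import os

# Locate the connected model by searching for its config.json under /kaggle/input.
def find_model_dir(root="/kaggle/input"):
    matches = glob.glob(os.path.join(root, "**", "config.json"), recursive=True)
    assert matches, "No config.json found. Check that the model is added as an input."
    return os.path.dirname(matches[0])

# In the notebook:
# MODEL = find_model_dir()
```

If you've attached more than one model, print all the matches and pick the right one instead of taking `matches[0]` blindly.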
Step 3 — QLoRA Fine-Tuning
Check the environment
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
import torch
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
print(f"PyTorch: {torch.__version__}")
# GPU: Tesla T4
# VRAM: 15.6 GB
# PyTorch: 2.9.0+cu126
Mistake #2: pip install with internet OFF
ERROR: Could not find a version that satisfies the requirement bitsandbytes>=0.46.1
Failed to establish a new connection: [Errno -3] Temporary failure in name resolution
With internet OFF, pip can't reach PyPI. Either turn internet ON first,
or check if the package is already installed in the Kaggle environment.
In my case, the default version worked fine.
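Checking what's already installed needs no network at all. A small sketch using the standard library (the helper names are mine):

```python
import importlib.util
from importlib.metadata import version, PackageNotFoundError

# Check installed packages without touching the network
# (useful when Kaggle internet is OFF and pip can't reach PyPI).
def installed_version(package):
    try:
        return version(package)
    except PackageNotFoundError:
        return None

def is_importable(module):
    return importlib.util.find_spec(module) is not None

for pkg in ["transformers", "peft", "bitsandbytes"]:
    print(f"{pkg}: {installed_version(pkg)}")
```

If the versions printed here are recent enough, you can skip pip entirely and stay offline.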
Load the model with 4-bit quantization
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
MODEL = "/kaggle/input/your-actual-path-found-with-glob"
MAX_SEQ_LENGTH = 1024 # Reduced from 2048 to save memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)
LoRA config
lora_config = LoraConfig(
    r=8,  # Reduced from 16 to save memory
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
LoRA only updates roughly 1% of the total parameters, and the base weights stay frozen in 4-bit. Together, that's how an 8B model fits in a T4's 15GB VRAM.
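The "roughly 1%" figure is easy to sanity-check on the back of an envelope: a LoRA adapter on a linear layer adds two small matrices, A (r × d_in) and B (d_out × r), i.e. r·(d_in + d_out) extra weights. The 4096 below is an illustrative hidden size, not a claim about Qwen3-8B's exact dimensions:

```python
# LoRA adds A (r x d_in) and B (d_out x r) per adapted linear layer,
# i.e. r * (d_in + d_out) trainable weights.
def lora_params(d_in, d_out, r):
    return r * (d_in + d_out)

# Illustrative numbers only: one 4096x4096 projection at rank 8.
print(lora_params(4096, 4096, 8))  # → 65536
```

65K trainable weights against the layer's 16.7M frozen ones — summed over all targeted projections in all layers, that's how the trainable fraction lands around 1%.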
Dataset class
Only the assistant turn contributes to the loss. The prefix is masked with -100:
from torch.utils.data import Dataset as TorchDataset

class ChatMLDataset(TorchDataset):
    def __init__(self, samples, tokenizer, max_length):
        self.data = []
        for sample in samples:
            messages = sample["messages"]
            # Prompt without the assistant turn, with the generation prompt appended
            prefix = tokenizer.apply_chat_template(
                messages[:-1], tokenize=False, add_generation_prompt=True
            )
            full = tokenizer.apply_chat_template(
                messages, tokenize=False, add_generation_prompt=False
            )
            prefix_ids = tokenizer(prefix, add_special_tokens=False)["input_ids"]
            full_enc = tokenizer(
                full, add_special_tokens=False,
                max_length=max_length, truncation=True,
            )
            full_ids = full_enc["input_ids"]
            # Mask the prompt with -100 so only the assistant turn contributes to the loss
            labels = [-100] * len(prefix_ids) + full_ids[len(prefix_ids):]
            labels = labels[:len(full_ids)]  # keep labels aligned with input_ids after truncation
            self.data.append({
                "input_ids": full_ids,
                "attention_mask": full_enc["attention_mask"],
                "labels": labels,
            })

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]
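The Trainer call below also references a data_collator that the post doesn't show. A minimal padding collator might look like this sketch — it pads input_ids with the pad token, attention_mask with 0, and labels with -100 so padded positions are ignored by the loss. Shown with plain lists for clarity; wrap each padded list in torch.tensor(...) before returning when handing it to the Trainer:

```python
# Minimal right-padding collator sketch for variable-length ChatML samples.
def pad_batch(batch, pad_token_id):
    max_len = max(len(item["input_ids"]) for item in batch)
    def pad(seq, value):
        # Right-pad seq to max_len with the given fill value
        return seq + [value] * (max_len - len(seq))
    return {
        "input_ids": [pad(item["input_ids"], pad_token_id) for item in batch],
        "attention_mask": [pad(item["attention_mask"], 0) for item in batch],
        "labels": [pad(item["labels"], -100) for item in batch],
    }
```

The -100 fill matters: that's the index PyTorch's cross-entropy ignores, so padding never pulls the loss.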
Training
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="/kaggle/working/qwen3-reddit-ft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    warmup_steps=10,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_strategy="steps",
    save_steps=100,
    eval_strategy="steps",
    eval_steps=100,
    optim="adamw_8bit",
    gradient_checkpointing=True,
    dataloader_pin_memory=False,
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
    data_collator=data_collator,
)
trainer.train()
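It helps to know how many optimizer steps this config actually produces. With per-device batch 1 and 16 gradient-accumulation steps, the effective batch is 16; using the post's own split of 466 training samples, the arithmetic works out to:

```python
import math

# Effective batch size and optimizer steps, from the post's own numbers:
# 466 training samples, per-device batch 1, gradient accumulation 16, 3 epochs.
effective_batch = 1 * 16
steps_per_epoch = math.ceil(466 / effective_batch)
total_steps = steps_per_epoch * 3
print(effective_batch, steps_per_epoch, total_steps)  # → 16 30 90
```

Roughly 90 optimizer steps total — which is why save_steps=100 and eval_steps=100 here only fire at the very end; smaller values make more sense for a dataset this size.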
Step 4 — Save and Evaluate
# Save LoRA adapter
OUTPUT_DIR = "/kaggle/working/qwen3-reddit-ft"
trainer.save_model(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
# Fuse LoRA into the base model
# (merging into a 4-bit-quantized base dequantizes the weights; for best
# quality, reload the base model in fp16 first and merge into that)
FINAL_DIR = "/kaggle/working/qwen3-reddit-final"
merged_model = model.merge_and_unload()
merged_model.save_pretrained(FINAL_DIR)
tokenizer.save_pretrained(FINAL_DIR)
Evaluation prompts
PROMPTS = [
("r/SideProject", "Just shipped my side project after 6 months. Nobody's using it."),
("r/artificial", "GPT-5 was just released and it's apparently 10x better than Claude"),
("r/webdev", "Should I learn React or just stick with vanilla JS in 2025?"),
("r/LocalLLaMA", "Running a 70B model locally on a 4090, is it worth it?"),
]
for sub, prompt in PROMPTS:
messages = [
{"role": "system", "content": f"You are a helpful member of {sub}."},
{"role": "user", "content": prompt},
]
# generate and print response
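The loop above elides the actual generation step. A hedged sketch of what it might look like, assuming the model and tokenizer from the earlier steps (build_messages and generate_reply are just helper names I chose, not from the post):

```python
def build_messages(sub, prompt):
    # Same system/user structure used in the training data
    return [
        {"role": "system", "content": f"You are a helpful member of {sub}."},
        {"role": "user", "content": prompt},
    ]

def generate_reply(model, tokenizer, sub, prompt, max_new_tokens=256):
    # Format with the chat template, generate, and decode only the new tokens
    text = tokenizer.apply_chat_template(
        build_messages(sub, prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.7
    )
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

In the notebook, the loop body becomes `print(generate_reply(model, tokenizer, sub, prompt))` — comparing outputs before and after fine-tuning on the same prompts is the quickest tone check.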
What I Actually Learned
518 samples was enough to shift the tone. Train: 466, Valid: 52.
Even with a small dataset, the response style changed noticeably.
Mistakes summary:
- Model path — don't use "Qwen/Qwen3-8B" directly. Use glob to find the real path first
- Internet is OFF by default — pip won't work. Turn it on (free) or use pre-installed packages
- VRAM limits — set MAX_SEQ_LENGTH=1024, batch_size=1, r=8 to fit on a T4
Final Thoughts
Fine-tuning isn't that hard. The setup is where the friction is.
Free Kaggle GPU + open model + public dataset = zero cost to get started.
If you've been curious about fine-tuning but assumed you needed expensive
hardware, this stack removes that excuse.