Akhilesh

I trained my own LLM and published it on HuggingFace

This is the post where things got real. Training an actual language model, watching the loss go down, pushing it to HuggingFace with my name on it.

The plan

I couldn't afford to train from scratch — that takes thousands of GPU hours and costs thousands of dollars. Instead I used fine-tuning: take an existing pre-trained model and train it further on my medical data.

The model I chose: facebook/opt-1.3b — 1.3 billion parameters, open source, no access restrictions.

The technique: LoRA (Low-Rank Adaptation). Instead of updating all 1.3 billion parameters, LoRA freezes the base model and adds small trainable matrices on top, and only those get trained. You go from training 1.3 billion parameters to training a few million (the exact count depends on the rank and which layers you target). Nearly the same quality, at a fraction of the cost.
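
To see where the savings come from, here's a back-of-the-envelope sketch. The numbers assume OPT-1.3B's hidden size of 2048 and 24 decoder layers, and the total depends on which modules you adapt:

# Rough LoRA parameter math for OPT-1.3B (hidden size 2048, 24 decoder layers)
hidden = 2048
layers = 24
r = 8  # LoRA rank

full_per_matrix = hidden * hidden               # one frozen attention projection: ~4.2M params
lora_per_matrix = (hidden * r) + (r * hidden)   # its LoRA adapter (A and B matrices): ~33K params

# Adapting q_proj and v_proj in every layer:
print(lora_per_matrix * 2 * layers)             # ~1.6M trainable parameters vs 1.3B frozen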

Why Google Colab

My laptop has no GPU. Training even a small LLM on CPU takes days. Google Colab gives you a free Tesla T4 GPU with 15GB of memory. You get 30 hours per week for free. This is what I used.
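
If you go this route, it's worth confirming the GPU is actually attached before you start training. A quick check (PyTorch comes preinstalled on Colab):

import torch

print(torch.cuda.is_available())        # should be True
print(torch.cuda.get_device_name(0))    # should say something like "Tesla T4"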

The training code

The key parts:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer, SFTConfig

# Load the base model and its tokenizer
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")

# Add LoRA adapters: small trainable matrices on the attention
# projections, while the 1.3B base weights stay frozen
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only the adapters should be trainable

# Train on the medical dataset
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    args=SFTConfig(
        output_dir="outputs",
        num_train_epochs=3,
        learning_rate=2e-4
    )
)
trainer.train()
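
One thing the snippet glosses over is train_dataset: the medical Q&A data mentioned above. In recent trl versions SFTTrainer looks for a text column by default, so the dataset can be as simple as a list of formatted strings. A minimal sketch, with the example rows and the prompt format entirely made up:

from datasets import Dataset

# Hypothetical rows -- the real medical Q&A pairs come from the data-prep step
rows = [
    {"text": "Question: What are common symptoms of anemia?\nAnswer: Fatigue, pale skin, shortness of breath..."},
    {"text": "Question: What does a CBC test measure?\nAnswer: Red cells, white cells, hemoglobin, platelets..."},
]
train_dataset = Dataset.from_list(rows)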

The results

Training took 1.5 hours on the free T4 GPU. Here's what the loss looked like:

Step 100:  Loss 1.163
Step 500:  Loss 0.994
Step 1000: Loss 0.967
Step 1700: Loss 0.944  ← training complete

Loss going down means the model is learning. Training and validation loss decreased together, which suggests the model was generalizing rather than just memorizing the training set.
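
The code above only shows the training set; to get a validation curve you hand the trainer a held-out split and an evaluation schedule. Roughly like this (values are illustrative, and older transformers versions spell the argument evaluation_strategy):

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,          # held-out medical examples
    args=SFTConfig(
        output_dir="outputs",
        num_train_epochs=3,
        learning_rate=2e-4,
        eval_strategy="steps",          # run evaluation every eval_steps
        eval_steps=100
    )
)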

Publishing to HuggingFace

model.push_to_hub("Yakhilesh/medmind-opt-medical")
tokenizer.push_to_hub("Yakhilesh/medmind-opt-medical")
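
This assumes you're already authenticated with HuggingFace. In a Colab notebook that's one extra cell, where you paste a write-access token from your account settings:

from huggingface_hub import notebook_login

notebook_login()  # prompts for a HuggingFace access token with write permission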

That's it. My model is now publicly available at:
huggingface.co/Yakhilesh/medmind-opt-medical

Anyone can download and use it. The adapter weights are only 12.6MB — small because LoRA only saves the adapter, not the entire base model.
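
Using it means loading the base model first and applying the adapter on top. A minimal sketch with PEFT (the prompt format should match however the training data was formatted):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
model = PeftModel.from_pretrained(base, "Yakhilesh/medmind-opt-medical")
tokenizer = AutoTokenizer.from_pretrained("Yakhilesh/medmind-opt-medical")

prompt = "Question: What does a CBC test measure?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(output[0], skip_special_tokens=True))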

What I actually learned

Fine-tuning is more about data quality than model architecture. My 1.3B model, trained for 1.5 hours on a free GPU, picked up genuine medical patterns, and the loss numbers back that up.
