Mudasir Habib
A Minimal ~9M Parameter Transformer LLM Trained from Scratch

SelfLM — Building a Tiny LLM from Scratch (End-to-End)

LLMs are complex, but not magical — once you break them into components, everything becomes understandable.

It started with a simple question:
“How do models like GPT actually work?”

So I decided to build a smaller version myself — step by step — from dataset generation to tokenization, training, and deployment. Everything is fully open-source.


What This Project Covers

Instead of treating models as black boxes, this project focuses on the entire pipeline:

  • Synthetic dataset generation (~60K samples)
  • Tokenization & preprocessing
  • Transformer architecture (from scratch)
  • Training pipeline
  • Inference & deployment
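To make "Transformer architecture (from scratch)" concrete: the core operation of a decoder-only transformer is causal self-attention. This is not the project's actual code, just a minimal single-head NumPy sketch of the idea, where the causal mask keeps each token from attending to positions after it:

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over a (seq_len, d_model) input."""
    T, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d)
    # Causal mask: position t may only attend to positions <= t
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf
    # Numerically stable softmax over each row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.normal(size=(T, d))
w = [rng.normal(size=(d, d)) for _ in range(3)]
out = causal_self_attention(x, *w)
print(out.shape)  # (4, 8)
```

A real transformer block stacks this with multiple heads, a feed-forward layer, residual connections, and layer norm, but the masked-softmax-weighted sum above is the essential mechanism.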

Highlights

  • Trained in ~5 minutes (Colab T4 GPU)
  • Fully custom LLM (~9M parameters)
  • Hugging Face model + dataset + live Space
  • Serverless deployment using ONNX on Vercel (free tier)
  • Lightweight, browser-friendly inference
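Lightweight inference over the exported ONNX model ultimately reduces to a decoding loop over the model's next-token logits. As a rough sketch (with a hypothetical `next_logits` stub standing in for the real ONNX session, which is not shown here), greedy decoding looks like this:

```python
def greedy_decode(next_logits, prompt_ids, eos_id, max_new=16):
    """Repeatedly pick the highest-scoring next token until EOS or the budget runs out."""
    ids = list(prompt_ids)
    for _ in range(max_new):
        logits = next_logits(ids)
        nxt = max(range(len(logits)), key=logits.__getitem__)
        if nxt == eos_id:
            break
        ids.append(nxt)
    return ids

# Stub "model" for illustration: always predicts (last token + 1) mod vocab
def stub(ids, vocab=10):
    return [1.0 if i == (ids[-1] + 1) % vocab else 0.0 for i in range(vocab)]

print(greedy_decode(stub, [3], eos_id=9, max_new=8))  # [3, 4, 5, 6, 7, 8]
```

In the deployed version, `next_logits` would be a call into the ONNX runtime session; everything else is plain token bookkeeping, which is why the inference path stays lightweight enough for a free serverless tier.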

Live Demo

https://selflm.vercel.app/docs


Hugging Face Space

https://huggingface.co/spaces/Mudasir-Habib/selflm-demo


Colab Notebook

https://colab.research.google.com/drive/1EyR5mFuHupJWdnJWazvdjU1Bre2rF2RD?usp=sharing


GitHub Repository

https://github.com/Mudasirhabib123/selflm


Customization Feature

One of the most interesting parts: you can retrain the model on your own data by simply:

  • Editing the first cell in the Colab notebook
  • OR modifying src/dataset/data.py

Add your own context, retrain, and instantly get a personalized LLM.
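The shape of such a customization might look like the following sketch. The field names, file name, and format here are hypothetical, not the actual contents of src/dataset/data.py; the real file may structure its samples differently:

```python
import json

# Hypothetical example pairs -- replace with your own context
CUSTOM_PAIRS = [
    ("What is SelfLM?", "A ~9M parameter transformer trained from scratch."),
    ("Who built SelfLM?", "It is an open-source learning project."),
]

def write_dataset(pairs, path="train.jsonl"):
    """Serialize (prompt, response) pairs into a JSONL training file."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, response in pairs:
            f.write(json.dumps({"prompt": prompt, "response": response}) + "\n")
    return path

write_dataset(CUSTOM_PAIRS)
```

Whatever the exact format, the principle is the same: swap in your own samples, rerun training, and the resulting checkpoint reflects your data.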


Goal

This project is built for:

  • Learning how LLMs actually work
  • Experimentation with small-scale models
  • Understanding the full pipeline end-to-end

Open Source

The project is fully open-source and designed to make LLMs accessible, transparent, and understandable.


If you find it useful, consider giving it a star on GitHub.
