<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mudasir Habib</title>
    <description>The latest articles on DEV Community by Mudasir Habib (@mudasirhabib123).</description>
    <link>https://dev.to/mudasirhabib123</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1446387%2F5c2834aa-8c5c-4be1-a134-c3c9ada65ded.jpeg</url>
      <title>DEV Community: Mudasir Habib</title>
      <link>https://dev.to/mudasirhabib123</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mudasirhabib123"/>
    <language>en</language>
    <item>
      <title>A Minimal ~9M Parameter Transformer LLM Trained from Scratch</title>
      <dc:creator>Mudasir Habib</dc:creator>
      <pubDate>Thu, 23 Apr 2026 14:30:13 +0000</pubDate>
      <link>https://dev.to/mudasirhabib123/a-minimal-9m-parameter-transformer-llm-trained-from-scratch-1p21</link>
      <guid>https://dev.to/mudasirhabib123/a-minimal-9m-parameter-transformer-llm-trained-from-scratch-1p21</guid>
<description>&lt;h2&gt;SelfLM — Building a Tiny LLM from Scratch (End-to-End)&lt;/h2&gt;

&lt;p&gt;LLMs are complex, but not magical — once you break them into components, everything becomes understandable.&lt;/p&gt;

&lt;p&gt;It started with a simple question:&lt;br&gt;
&lt;strong&gt;“How do models like GPT actually work?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So I decided to build a smaller version myself — step by step — from dataset generation to tokenization, training, and deployment. Everything is fully open-source.&lt;/p&gt;




&lt;h2&gt;What This Project Covers&lt;/h2&gt;

&lt;p&gt;Instead of treating models as black boxes, this project focuses on the &lt;strong&gt;entire pipeline&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Synthetic dataset generation (~60K samples)&lt;/li&gt;
&lt;li&gt;Tokenization &amp;amp; preprocessing&lt;/li&gt;
&lt;li&gt;Transformer architecture (from scratch)&lt;/li&gt;
&lt;li&gt;Training pipeline&lt;/li&gt;
&lt;li&gt;Inference &amp;amp; deployment&lt;/li&gt;
&lt;/ul&gt;
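
&lt;p&gt;To make the "from scratch" part concrete, here is a minimal sketch of the core building block of any decoder-only transformer: single-head causal self-attention, in plain NumPy. This is an illustrative toy, not SelfLM's actual code (which is multi-head and written for training); names like &lt;code&gt;causal_self_attention&lt;/code&gt; are mine.&lt;/p&gt;

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention: each position attends
    only to itself and earlier positions."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(d)
    # Causal mask: zero out (via -inf-like score) attention to future tokens.
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -1e9
    return softmax(scores) @ v

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

&lt;p&gt;Because of the mask, the first token's output depends only on itself — stacking a few of these blocks with feed-forward layers and embeddings is essentially the whole architecture.&lt;/p&gt;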




&lt;h2&gt;Highlights&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Trained in ~5 minutes (Colab T4 GPU)&lt;/li&gt;
&lt;li&gt;Fully custom LLM (~9M parameters)&lt;/li&gt;
&lt;li&gt;Hugging Face model + dataset + live Space&lt;/li&gt;
&lt;li&gt;Serverless deployment using ONNX on Vercel (free tier)&lt;/li&gt;
&lt;li&gt;Lightweight, browser-friendly inference&lt;/li&gt;
&lt;/ul&gt;
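
&lt;p&gt;Where does a figure like "~9M parameters" come from? A quick back-of-the-envelope count shows how small config choices land in that range. The config below is a hypothetical example, not SelfLM's actual one (which is in the repo); the formula ignores biases and layer norms and assumes tied input/output embeddings.&lt;/p&gt;

```python
def transformer_param_count(vocab_size, d_model, n_layers, d_ff):
    """Rough decoder-only parameter count (weight matrices only,
    tied embeddings, biases and norms ignored)."""
    embed = vocab_size * d_model          # token embedding, shared with output head
    per_layer = (
        4 * d_model * d_model             # Q, K, V and output projections
        + 2 * d_model * d_ff              # feed-forward up- and down-projections
    )
    return embed + n_layers * per_layer

# Hypothetical config in the single-digit-millions range.
n = transformer_param_count(vocab_size=16384, d_model=256, n_layers=6, d_ff=1024)
print(f"{n/1e6:.1f}M parameters")  # 8.9M parameters
```

&lt;p&gt;At this scale the embedding table dominates, which is why tiny LLMs usually pair a small vocabulary with a small hidden size.&lt;/p&gt;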




&lt;h2&gt;Live Demo&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://selflm.vercel.app/docs" rel="noopener noreferrer"&gt;https://selflm.vercel.app/docs&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;Hugging Face Space&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://huggingface.co/spaces/Mudasir-Habib/selflm-demo" rel="noopener noreferrer"&gt;https://huggingface.co/spaces/Mudasir-Habib/selflm-demo&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;Colab Notebook&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://colab.research.google.com/drive/1EyR5mFuHupJWdnJWazvdjU1Bre2rF2RD?usp=sharing" rel="noopener noreferrer"&gt;https://colab.research.google.com/drive/1EyR5mFuHupJWdnJWazvdjU1Bre2rF2RD?usp=sharing&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;GitHub Repository&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/Mudasirhabib123/selflm" rel="noopener noreferrer"&gt;https://github.com/Mudasirhabib123/selflm&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;Customization Feature&lt;/h2&gt;

&lt;p&gt;One of the most interesting parts: you can &lt;strong&gt;customize the model with your own data&lt;/strong&gt; by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Editing the first cell in the Colab notebook, or&lt;/li&gt;
&lt;li&gt;Modifying &lt;code&gt;src/dataset/data.py&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Add your own context, retrain, and instantly get a personalized LLM.&lt;/p&gt;
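
&lt;p&gt;As a rough sketch of what "add your own context" can look like, here is one common pattern for synthetic Q&amp;amp;A generation: a small fact table expanded through templates. This is an assumption about the approach, not the actual contents of &lt;code&gt;src/dataset/data.py&lt;/code&gt; — check the repo for the real logic.&lt;/p&gt;

```python
import json
import random

# Hypothetical template-based generator; the real implementation
# lives in src/dataset/data.py in the SelfLM repo.
FACTS = {
    "SelfLM": "a tiny transformer language model trained from scratch",
    "ONNX": "a portable model format used here for serverless inference",
}
TEMPLATES = [
    ("What is {topic}?", "{topic} is {fact}."),
    ("Tell me about {topic}.", "{topic} is {fact}."),
]

def generate_samples(n, seed=0):
    """Expand the fact table into n prompt/response training pairs."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        topic, fact = rng.choice(list(FACTS.items()))
        q_tpl, a_tpl = rng.choice(TEMPLATES)
        samples.append({
            "prompt": q_tpl.format(topic=topic),
            "response": a_tpl.format(topic=topic, fact=fact),
        })
    return samples

data = generate_samples(4)
print(json.dumps(data[0]))
```

&lt;p&gt;Swap in facts about yourself or your project, rerun the notebook, and the retrained model answers from your data.&lt;/p&gt;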




&lt;h2&gt;Goal&lt;/h2&gt;

&lt;p&gt;This project is built for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Learning how LLMs actually work&lt;/li&gt;
&lt;li&gt;Experimentation with small-scale models&lt;/li&gt;
&lt;li&gt;Understanding the full pipeline end-to-end&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Open Source&lt;/h2&gt;

&lt;p&gt;Fully open-source and designed to make LLMs &lt;strong&gt;accessible, transparent, and understandable&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;If you find it useful, consider giving it a star on GitHub.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>llm</category>
      <category>challenge</category>
    </item>
  </channel>
</rss>
