After working for several years on state-of-the-art models and deploying them in real-world applications, I wanted to revisit the fundamentals.
So I built a GPT-like model completely from scratch, including the tokenizer and transformer architecture, using pure PyTorch.
This post walks through my approach, architecture, training, and inference pipeline using a custom Shakespeare dataset.
The goal: Understand how GPTs really work under the hood.
Highlights
Trained on a cleaned corpus of Shakespeare plays
Built a Byte-Pair Encoding (BPE) tokenizer from scratch
Implemented a transformer model using PyTorch (no HF/Transformers); see the sketch after this list
Achieved a steadily decreasing loss curve through hyperparameter tuning and debugging
Built an end-to-end training + inference pipeline
Hosted the model + tokenizer on Hugging Face for public use
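To give a concrete flavour of what "no HF/Transformers" means here, below is a minimal sketch of a GPT-style decoder block in plain PyTorch: causal (masked) self-attention followed by an MLP, each wrapped in a residual connection. The class names, pre-norm layout, and dimensions are illustrative assumptions, not a copy of the code in the repo.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, max_len: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        # Lower-triangular mask so each position only attends to earlier tokens.
        mask = torch.tril(torch.ones(max_len, max_len)).view(1, 1, max_len, max_len)
        self.register_buffer("mask", mask)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        q = q.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(self.head_dim)
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        out = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)

class Block(nn.Module):
    """Pre-norm transformer block: attention + MLP, each with a residual."""
    def __init__(self, d_model: int, n_heads: int, max_len: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = CausalSelfAttention(d_model, n_heads, max_len)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        x = x + self.mlp(self.ln2(x))
        return x
```

The full model stacks several of these blocks on top of token and position embeddings, with a linear head projecting back to the vocabulary.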
Why Build from Scratch?
While Hugging Face and pretrained models are excellent for real-world use, understanding the nuts and bolts of how LLMs work is essential for:
Customizing architectures
Optimizing memory/performance
Working on low-resource or domain-specific tasks
Research and experimentation
Training Loss Graph
The model was trained for ~15 epochs. You can clearly see how the loss drops, especially after tuning the hyperparameters.
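For reference, the loop behind that curve is the standard next-token objective: feed windows of token ids, shift them by one position for the targets, and minimise cross-entropy. A minimal sketch, assuming `model` is the full GPT (embeddings, decoder blocks, LM head) and `train_loader` yields already-shifted `(inputs, targets)` batches; the optimizer and learning rate are placeholders, not the repo's actual settings.

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for epoch in range(15):                      # ~15 epochs, as in the loss plot
    total_loss = 0.0
    for inputs, targets in train_loader:     # shapes: (B, T) each
        logits = model(inputs)               # (B, T, vocab_size)
        loss = F.cross_entropy(
            logits.view(-1, logits.size(-1)),
            targets.view(-1),
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"epoch {epoch}: mean loss {total_loss / len(train_loader):.4f}")
```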
Code & Resources
Full Article on Medium - includes deep dives on each part
GitHub Repo - notebooks, training script, model loading, etc.
How to Use
Option 1: Direct Python Script (model download + inference)
python saved_models/load_model.py
This downloads the model and tokenizer from Hugging Face, loads them into memory, and leaves everything ready for predictions.
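Under the hood, a loader like that only needs to fetch the checkpoint and tokenizer files from the Hub and deserialise them. A rough sketch of the idea using huggingface_hub; the repo id and filenames below are placeholders, not the actual ones used by load_model.py.

```python
import torch
from huggingface_hub import hf_hub_download

# Placeholder repo id and filenames -- substitute the actual Hugging Face repo.
REPO_ID = "your-username/custom-gpt-shakespeare"

ckpt_path = hf_hub_download(repo_id=REPO_ID, filename="model.pt")
tok_path = hf_hub_download(repo_id=REPO_ID, filename="tokenizer.json")

state_dict = torch.load(ckpt_path, map_location="cpu")
# The custom GPT class and BPE tokenizer are then rebuilt from these files
# and are ready for inference.
```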
Option 2: Notebook Execution
Use the end_to_end folder:
1_train_custom_gpt.ipynb - training pipeline
2_predict_with_trained_gpt.ipynb - inference and generation
Example Output
Input: ROMEO:
Generated: What hast thou done? My love is gone too soon...
The output retains Shakespearean style thanks to custom training.
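Generation itself is plain autoregressive sampling: encode the prompt, take the logits at the last position, sample the next token, append it, and repeat. A minimal sketch, assuming the custom model returns logits and the tokenizer exposes encode/decode (the names and defaults here are illustrative, not the repo's exact API).

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, tokenizer, prompt: str,
             max_new_tokens: int = 100, temperature: float = 0.8) -> str:
    """Sample text from the trained model, one token at a time."""
    model.eval()
    ids = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)
    for _ in range(max_new_tokens):
        # In practice you would also crop ids to the model's context length here.
        logits = model(ids)                       # (1, T, vocab_size)
        logits = logits[:, -1, :] / temperature   # keep only the last position
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)
    return tokenizer.decode(ids[0].tolist())

# print(generate(model, tokenizer, "ROMEO:"))
```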
Let's Connect!
If you're working on LLMs, transformers, or AI engineering, I'd love to connect and collaborate.
Drop your thoughts or questions in the comments!