TL;DR
- A very practical book to learn the general concepts of LLMs
- Saves tons of time compared with searching for materials on many topics separately
- Very informative but somewhat intensive; set aside some time to follow the book while getting your hands dirty with Python code
Overall Review
In the era of LLMs, there seems to be strong demand for understanding them. However, as in other "hot" fields, there is so much material about LLMs that it is not easy to find a solid resource. Either it focuses on how to use LLMs (yes, you need `OPENAI_API_KEY=your_own_openai_key`!) or fills entire pages with pure text and diagrams explaining what the transformer architecture is.
I know, the field is literally overwhelming. Developing and serving LLMs is one of the most fast-moving fields both in academia and industry, and there are so many parts to cover, from bare-metal hardware to user experiences on edge devices. Considering tons of papers and codebases out there, I doubted anyone would dare to touch the topic so thoroughly, such as building an LLM from scratch.
But here we are: Sebastian Raschka, PhD, has accomplished such tremendous work in his book Build a Large Language Model (From Scratch).
I know there are already thousands, if not millions, of great reviews of the book, so I would like to briefly share my takeaways from reading it over the last few days.
- With only a basic understanding of ML and Python coding, you can grasp the general concepts of how an LLM works from this book. The code examples are not overly technical and are easy to understand.
- The explanations are very friendly and thorough. As you work through the book and type out the Python code yourself, you can tell whether you're going in the right direction. However, because the explanations are so detailed, reading through the book may take more time than you expect, given the book's length.
- The book covers the core concepts of modern LLMs: token embeddings, attention and the transformer architecture, deep learning and neural networks, and pre- and post-training (fine-tuning). After finishing the book, I feel more ready to understand articles and posts about topics like attention or the KV cache (see the attention sketch after this list).
- Personally, it was a very good review of deep learning and neural networks. For people who have no idea what PyTorch is, one of the appendices is dedicated to introducing PyTorch. I highly recommend reading it before diving into chapter 1.
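To give a flavor of what the attention chapters build toward, here is a minimal sketch of scaled dot-product self-attention in PyTorch. This is my own simplified illustration, not the book's code; the toy sizes (`batch_size`, `seq_len`, `embed_dim`) are arbitrary, and the book adds causal masking, dropout, and multiple heads on top of this.

```python
import torch

torch.manual_seed(0)

batch_size, seq_len, embed_dim = 1, 4, 8  # toy sizes for illustration
x = torch.randn(batch_size, seq_len, embed_dim)  # stand-in token embeddings

# Learnable projections for queries, keys, and values
W_q = torch.nn.Linear(embed_dim, embed_dim, bias=False)
W_k = torch.nn.Linear(embed_dim, embed_dim, bias=False)
W_v = torch.nn.Linear(embed_dim, embed_dim, bias=False)

q, k, v = W_q(x), W_k(x), W_v(x)  # each: (batch, seq_len, embed_dim)

# Attention scores, scaled by sqrt(d_k) so the softmax stays well-behaved
scores = q @ k.transpose(-2, -1) / embed_dim**0.5  # (batch, seq_len, seq_len)
weights = torch.softmax(scores, dim=-1)

context = weights @ v  # (batch, seq_len, embed_dim)
print(context.shape)  # torch.Size([1, 4, 8])
```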
My 2 Cents for Those Who Haven't Read the Book Yet
- My opinion on the prerequisites:
- Python: I don't think you need to worry too much about your proficiency with the language. Reading a general introductory book on the language is sufficient. The book doesn't use very sophisticated Python techniques, such as writing your own context managers or decorators.
- Math, Deep Learning, and PyTorch: In addition to basic knowledge of linear algebra, I feel hands-on experience with matrix computation is essential. Especially when it comes to constructing your own attention mechanism, the complex tensor multiplications can be somewhat confusing, and you have to keep track of the dimensions of each of the `nn.Linear` layers (see the shape-tracking sketch after this list).
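To illustrate what I mean by keeping track of dimensions, here is a toy shape-tracking sketch of my own (not the book's code). Splitting a projected tensor into multiple heads is the step where it is easiest to lose track of which axis is which.

```python
import torch

batch, seq_len, d_model, num_heads = 2, 6, 16, 4
head_dim = d_model // num_heads  # 16 / 4 = 4

x = torch.randn(batch, seq_len, d_model)
proj = torch.nn.Linear(d_model, d_model)

# (batch, seq_len, d_model) -> (batch, num_heads, seq_len, head_dim)
q = proj(x).view(batch, seq_len, num_heads, head_dim).transpose(1, 2)
print(q.shape)  # torch.Size([2, 4, 6, 4])

# Per-head attention scores: (batch, num_heads, seq_len, seq_len)
scores = q @ q.transpose(-2, -1) / head_dim**0.5
print(scores.shape)  # torch.Size([2, 4, 6, 6])
```

Printing (or asserting) shapes after each step like this is a cheap way to catch dimension mistakes early.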
Still, running the code on a local Apple machine is not ideal. I used my Mac M3 Pro with the MPS backend, but a few computation results differed from those shown in the book. I would try Google Colab or a similar service on a second read.
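For context, a common device-selection pattern looks like the sketch below (a generic pattern, not the book's exact setup). Note that even with a fixed seed, floating-point results can differ slightly across CPU, CUDA, and MPS backends, which is consistent with what I saw.

```python
import torch

# Pick the best available backend
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")  # Apple Silicon GPU
else:
    device = torch.device("cpu")

# Seeding helps reproducibility, but does not guarantee
# bit-identical results across different backends
torch.manual_seed(123)

model = torch.nn.Linear(8, 2).to(device)
x = torch.randn(4, 8, device=device)
print(model(x).shape)  # torch.Size([4, 2])
```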
The appendices are not negligible. I think the one about LoRA is a must-read (a minimal sketch of the idea follows below). I also find the bonus materials in the book's GitHub repo extremely useful for keeping up with recent LLM research and developments.
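To sketch the core idea of LoRA: the pretrained weights stay frozen, and training only updates a pair of low-rank matrices whose product is added to the layer's output. The implementation below is my own minimal simplification, not the book's code; the class name, `rank`, and `alpha` defaults are arbitrary choices.

```python
import torch

class LoRALinear(torch.nn.Module):
    """Frozen base linear layer plus a trainable low-rank update."""

    def __init__(self, base: torch.nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights

        in_f, out_f = base.in_features, base.out_features
        # Low-rank factors: A projects down to `rank`, B projects back up.
        # B starts at zero, so the layer initially behaves like the base layer.
        self.A = torch.nn.Parameter(torch.randn(in_f, rank) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(rank, out_f))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(torch.nn.Linear(16, 16))
print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 16])
```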
That's it! This is my short review of the book, Build a Large Language Model (From Scratch). I hope you enjoy reading it as much as I did!