I suspect no one reading this is unaware of how much hype Large Language Models (LLMs) have generated lately. In a field where thousands of people are simultaneously discussing, developing, and running training courses, ignoring the topic isn’t an option for someone like me who prefers not to be left behind. And if, like me, you’re a bit anxious, missing a development in artificial intelligence can set dozens of questions swirling in your mind: “Have I fallen too far behind in the world of AI? Have my competitors surged ahead?” So what’s the solution? Since learning styles and pace differ from person to person, I find it best to keep up with the latest developments using the method that works most reliably for me: current books.
For me, reading a book feels like absorbing, in just a few days, knowledge an author spent years accumulating, knowledge that was in turn distilled from decades of other sources. At least, that’s my take. We mentioned books; why not videos? As I said, it comes down to individual differences. While watching an educational video, I feel like I’m undergoing Chinese water torture, constantly fast-forwarding to get it over with and hoping it will eventually cover what it promised. The result is that I understand nothing. For someone else the experience may be entirely different; for me, at least, this is how it goes.
Moreover, I unfortunately can’t take seriously the creators who go to great lengths to make their video thumbnails more attention-grabbing in the name of collecting likes, with titles that completely contradict the content itself, such as “RAG is dying,” “Is the Transformer dying?,” or “PHP is dying.”
I’ve dragged the introduction on for too long. I’m a software developer, working as a full-stack developer in the ERP industry. Alongside my backend and frontend work, I research artificial intelligence as much as I can in order to bring smart scenarios and solutions to ERP. I’m also in the thesis phase of my master’s degree and aim to strengthen my academic side further by pursuing a PhD. Perhaps it’s this academic curiosity that steers me towards books and articles rather than videos, but I’m not sure.
Without further ado: as I mentioned at the beginning, for those curious about how Large Language Models (LLMs) work and what goes on behind the scenes, there are hundreds of blog posts, educational documents, and videos available online. In this post, I want to talk about Super Study Guide: Transformers & Large Language Models by Afshine Amidi and Shervine Amidi, which I finished reading yesterday. Since the book is still fresh in my mind, I find it useful to relay its contents now, without delving into technical details.
First of all, when quickly flipping through the book’s pages (I mean scrolling down the screen :)), it initially struck me as odd not to see any code examples. Instead of code, the book offers almost a visual feast: clear, explanatory sentences supporting the visuals rather than lengthy texts or unnecessary details. Looking at its structure, the book consists of five parts. The first part covers deep learning fundamentals such as performance evaluation metrics, bias, variance, and optimizers. The second part covers the concept of embeddings extensively, discussing pre-transformer approaches such as RNNs and LSTMs and their respective disadvantages. The third part dives deep into the transformer architecture that underlies LLMs, explaining not only the attention mechanism but also the encoder-decoder structure with wonderful visuals. If the concepts of Query, Key, and Value, and the mathematical operations that make attention a more effective approach than RNNs and LSTMs, have not yet become concrete in your head, it’s definitely worth reading this part a few times.
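As a side note (this isn’t from the book, just a minimal NumPy sketch I find helpful), the core of the attention mechanism boils down to a few matrix operations: each query is compared against all keys, the resulting scores are turned into weights with a softmax, and those weights are used to mix the values.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention: weight each value by how well
    its key matches the query, scaled by sqrt(d_k) for stability."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                          # weighted sum of values

# Toy example: 3 tokens, embedding dimension 4
np.random.seed(0)
Q, K, V = (np.random.randn(3, 4) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```

In a real transformer this runs per head and per layer, with learned projection matrices producing Q, K, and V from the token embeddings; the book’s visuals walk through exactly that.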
When examining the book’s structure and narration, I found it hard to draw a clear line between the 4th and 5th parts, as both discuss the structure and practical applications of LLMs. I must admit my ignorance here: I wasn’t aware that, after the pretraining and fine-tuning phases, there is also a preference tuning phase. This section explains how a model is further refined after fine-tuning using feedback labeled as correct or incorrect.
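To make the idea a bit more concrete (this is my own toy illustration, not the book’s formulation), many preference tuning methods reduce to a pairwise loss: given one response labeled as correct and one labeled as incorrect, the model is nudged to score the preferred response higher.

```python
import numpy as np

def pairwise_preference_loss(score_chosen, score_rejected):
    """Bradley-Terry style loss: push the score of the preferred
    ('correct') response above the rejected ('incorrect') one."""
    return -np.log(1.0 / (1.0 + np.exp(-(score_chosen - score_rejected))))

# Toy scores a reward model might assign to two candidate answers
print(pairwise_preference_loss(2.1, 0.3))  # small loss: preference already respected
print(pairwise_preference_loss(0.3, 2.1))  # large loss: the model must adjust
```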
Additionally, this part covers PEFT (Parameter-Efficient Fine-Tuning) techniques such as LoRA, QLoRA, and Adapters, which make it possible to fine-tune LLMs even on personal computers. It also discusses Retrieval-Augmented Generation (RAG), which lets a model pull information from external data sources rather than relying solely on its training data. The book also touches on practical topics frequently needed in real-world projects, such as model compression and reducing model size.
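For a feel of why LoRA is so parameter-efficient, here is a rough NumPy sketch (my own illustration with hypothetical names, not code from the book): the pretrained weight matrix stays frozen, and only a small low-rank update, expressed as the product of two narrow matrices, is learned on top of it.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """LoRA idea in one line: keep the pretrained weight W frozen and
    learn only the low-rank update B @ A, so far fewer parameters
    need to be trained and stored during fine-tuning."""
    return x @ (W + alpha * (B @ A)).T

# Toy shapes: a 32x64 pretrained weight with a rank-4 update
d_in, d_out, r = 64, 32, 4
W = np.random.randn(d_out, d_in)      # frozen pretrained weight
A = np.random.randn(r, d_in) * 0.01   # trainable
B = np.zeros((d_out, r))              # trainable, zero-initialized so the update starts as a no-op
x = np.random.randn(1, d_in)
print(lora_forward(x, W, A, B).shape)  # (1, 32)
```

With these toy shapes the trainable update has 4 × (64 + 32) = 384 parameters instead of the 2,048 in W itself, which is the whole point of the method.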
To summarize, this book thoroughly explains methods and technologies from the ground up, ensuring that readers don’t feel lost when grasping new concepts. As I mentioned, since it doesn’t delve too deeply into mathematical and theoretical details, I didn’t find it overwhelming. However, I do think it’s beneficial to take notes while reading and research certain concepts in more depth.
Additionally, it would have been great if the book had included a section on agent structures in LLMs and practical usage examples (I know, I’m asking for a lot!).
I highly recommend this book to anyone curious about artificial intelligence, especially those working with natural language processing. Of course, having some fundamental knowledge of deep learning wouldn’t hurt either.
In my next post, I will be reviewing Building LLMs for Production by Louis-François Bouchard and Louie Peters, which I have just started reading.
Thank you for taking the time to read!