How ChatGPT and other LLMs work - A 1000 ft view

Hi, I am Karan, back with a new article on how ChatGPT and other LLMs work. This article will give you a very high-level understanding of how it all works, without going too deep into the concepts, because I am also learning all of this and writing is my way of documenting my learnings throughout the journey. There will be a series of articles on this topic in the future, in which I will go deeper into many concepts related to AI and ML.

Let's start with the basics: what actually is AI, and where do ChatGPT-like LLMs fit in? AI is short for Artificial Intelligence, a field of computer science focused on creating systems that perform tasks which usually require human intelligence. AI is not new; it has been around for decades, with Machine Learning and Computer Vision powering recommendations, search engines, classification, etc.

Nowadays, the hot topic in AI is GenAI, or Generative AI, which generates new content rather than finding or classifying existing content. GenAI covers text generation, image generation, audio generation, video generation, etc. Yes, the G in GPT stands for Generative.
An AI model that mainly deals with text generation is called an LLM, short for Large Language Model. ChatGPT is one of the LLMs on the internet you may have used; there are also others like Google Bard, Anthropic Claude, etc. Today we will take a high-level look into LLMs.
So, an LLM consists of three things, which we can express as

LLM = data + architecture + training

Let's assume we are creating a ChatGPT-grade LLM named Chiti. We will go step by step.

1. Data - Chiti's Knowledge Source 💡

First, we need a massive amount of text data. For example, ChatGPT was pre-trained (the P in GPT) on about 570 GB of text data (1 GB ≈ 178 million words), a good chunk of the internet. Pre-training means initially training a model on a large corpus of general data before it is fine-tuned or adapted for specific tasks. Pre-training helps the model learn language patterns, grammar, facts, and some level of reasoning.
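
To make that concrete, here is a minimal sketch of assembling a raw text corpus. The raw_data/ folder is a hypothetical stand-in for content scraped from websites, articles, and books:

```python
# Minimal sketch: gather local text files into one corpus.
# "raw_data/" is a hypothetical folder, and the words-per-GB figure
# is the rough estimate mentioned above, not an exact measurement.
from pathlib import Path

corpus = []
for path in Path("raw_data").glob("*.txt"):
    corpus.append(path.read_text(encoding="utf-8"))

text = "\n".join(corpus)
size_gb = len(text.encode("utf-8")) / 1e9
print(f"Corpus size: {size_gb:.2f} GB, ~{size_gb * 178e6:,.0f} words")
```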

Let's say we collected sufficient data by extracting content from websites, articles, books, etc., and our dataset is ready. Now comes the most important part: creating an architecture.

2. Architecture - Chiti's Brain 🧠

Here comes the concept of Neural Networks (NNs); they are like the neurons in a human brain. They help LLMs understand, learn, and respond. When we talk about training a model, we are really talking about training the NN. NNs do not understand text, so we have to convert the data: first into tokens (words or subwords) and then into embeddings (numbers). This process is called encoding (it includes more steps than this). The encoded data is fed to the NN, and the output we get back is decoded and shown as the response.
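
Here is a toy sketch of that text → tokens → embeddings flow. The tiny vocabulary and the random embedding table are made up for illustration; real LLMs use learned subword tokenizers (like BPE) and embedding tables trained on billions of tokens:

```python
# Toy illustration of text -> tokens -> embeddings.
import numpy as np

vocab = {"taj": 0, "##mahal": 1, "is": 2, "in": 3, "agra": 4}
embed_dim = 4
rng = np.random.default_rng(0)
# In a real model this table is learned during training, not random.
embedding_table = rng.normal(size=(len(vocab), embed_dim))

tokens = ["taj", "##mahal", "is", "in"]   # subword tokenization (done by hand here)
token_ids = [vocab[t] for t in tokens]    # tokens -> integer IDs
embeddings = embedding_table[token_ids]   # IDs -> vectors the NN can process

print(token_ids)          # [0, 1, 2, 3]
print(embeddings.shape)   # (4, 4)
```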

There are various types of NNs, each with its own purpose. Modern LLMs use Transformers (the T in GPT). The Transformer is a very core and complex concept in LLMs; my next article will be dedicated to explaining it.
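
Just to give you a taste before that article: the core operation inside a Transformer is self-attention, where every token looks at every other token and decides how much to borrow from each. Here is a bare-bones NumPy sketch; real models add learned projections, multiple heads, masking, and many stacked layers:

```python
# Bare-bones self-attention for intuition only.
import numpy as np

def self_attention(x):
    """x: (seq_len, d) token embeddings. Queries, keys, and values are
    all x itself here; in practice they come from learned projections."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                  # every token scored against every other
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability for softmax
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ x                             # each output mixes all tokens

x = np.random.default_rng(0).normal(size=(4, 8))   # 4 tokens, 8-dim embeddings
print(self_attention(x).shape)  # (4, 8)
```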

3. Training - Chiti's Education 🏫

Model training is the most crucial and expensive part of LLM development; it determines the quality of an LLM's responses. In this step we split the dataset we created in the first step into two parts (90%-10%): one part for training and the other for testing. The training data contains both the input and the answer, for example Input: "Tajmahal is in" and Answer: "Agra", so that the Transformer NN can learn patterns and assign weights and biases to each parameter in the NN (I will explain this in the next article).
Note: the training data is encoded before being sent to the Transformer NN.
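
A minimal sketch of that 90%-10% split, on a tiny made-up dataset of (input, answer) pairs:

```python
# Shuffle the dataset, then hold out 10% for testing.
import random

dataset = [
    ("Tajmahal is in", "Agra"),
    ("The capital of France is", "Paris"),
    ("Water boils at", "100°C"),
    # ... in reality, millions of examples
]

random.seed(42)
random.shuffle(dataset)
split = int(0.9 * len(dataset))
train_data, test_data = dataset[:split], dataset[split:]
print(len(train_data), "training examples,", len(test_data), "test examples")
```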

After training is completed, our LLM Chiti will be tested with the test data.
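
Testing could look something like this; chiti_generate is a hypothetical placeholder for whatever inference function the trained model exposes:

```python
# Sketch of testing on held-out data. `chiti_generate` is a stand-in:
# in reality it would run the trained Transformer on the encoded input.
def chiti_generate(prompt):
    return "Agra" if "Tajmahal" in prompt else "?"  # dummy model for illustration

test_data = [("Tajmahal is in", "Agra")]
correct = sum(chiti_generate(q) == a for q, a in test_data)
print(f"Accuracy: {correct}/{len(test_data)}")  # Accuracy: 1/1
```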

This was a very sky-level view of how LLMs work and are created. Trust me, there are many concepts I did not mention here, like tensors, backpropagation, positional encodings, layers, etc. I will learn and explain all of this in future articles. If you have any questions, do let me know.

Note: Neural Networks require a good understanding of maths, especially matrices, so if you are in college, please don't bunk your maths lectures.

Thank You,
Karan R Singh
