Parth Sarthi Bissa

Posted on Jun 6

Large language models, explained simply 🚀

#ai #mcp #llm #machinelearning

You’ve used ChatGPT. You’ve heard people call it ‘just autocomplete.’ But if that’s true, why does it write better than most people you know? The secret is the LLM, or Large Language Model. It might sound like a fancy term for something in a data centre, but it is the core of Generative AI systems. Most text-based Generative AI services are built on LLMs. This post introduces LLMs in the simplest way possible.

First of all, you don’t need to be an engineer to understand this. In fact, most engineers can’t explain it simply either. But here’s why you should care: the people who understand how this works are already making better decisions with it than everyone else.

LLMs appear in your daily life whenever you use Gmail’s Smart Compose, Grammarly’s suggestions, or Google Search’s AI‑generated answers. You also meet them in chatbots on websites, voice assistants, and tools like Microsoft Copilot that help write emails, documents, or code inside familiar apps.

A Large Language Model (LLM) is a type of AI model that can process natural‑language inputs and generate human‑like responses.
In simpler words: an LLM can read, understand, and generate human language.

It “reads” text (like a sentence or a question) and converts it into a form the computer can work with.
It “understands” the meaning and context enough to answer questions, explain ideas, or continue a conversation.
Then it “writes” back in natural language, choosing the most likely next words to form a coherent reply.

What actually happens when you hit send?

The first thing the model does is tear your sentence apart…

More precisely, it breaks your sentence down into smaller pieces called tokens, a step known as tokenization. Depending on the underlying algorithm, the input text is split into words, subwords, or punctuation units, which the model can then process numerically.

Let’s say you ask a Generative AI tool: Hi, How are you?

This input will be torn apart (tokenized) as:

“Hi” — “,” — “ How” — “ are” — “ you” — “?”

You see how a simple message turned into 6 tokens. This is just one example of how tokens are created from your input; the exact split depends on the tokenizer.
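If you want to see the splitting step in code, here is a minimal sketch in Python. It is a toy: the pattern and the name `toy_tokenize` are made up for illustration, and real tokenizers rely on learned subword vocabularies rather than a simple rule like this one.

```python
import re

# Toy rule: a word with an optional leading space, or a single punctuation mark.
# This is an illustrative assumption, not how any production tokenizer works.
TOKEN_PATTERN = re.compile(r" ?\w+|[^\w\s]")


def toy_tokenize(text):
    """Split text into word and punctuation pieces (illustration only)."""
    return TOKEN_PATTERN.findall(text)


tokens = toy_tokenize("Hi, How are you?")
print(tokens)       # ['Hi', ',', ' How', ' are', ' you', '?']
print(len(tokens))  # 6
```

Notice that the space travels with the word that follows it, which is why " How" and "How" would count as different pieces.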

So now the model has your sentence split into pieces. But here’s the problem — computers don’t understand pieces of text. They only understand numbers. So how does your ‘Hi’ or ‘How’ become something a machine can actually work with?
That’s where embeddings play their part . . .

Each token is now converted into a list of numbers (a vector) that captures its meaning and its relationships to other tokens. These vectors are called embeddings. Instead of dealing with raw text such as “king” or “apple”, the model works with these vectors.

Before a token gets its embedding, it is first mapped to a numeric ID from the tokenizer’s vocabulary. If we revisit the tokens we had, their IDs look like this:

“Hi” → 12194
“,” → 11
“ How” → 3253
“ are” → 553
“ you” → 481
“?” → 30

These IDs are not the embeddings themselves. Each ID is an index that looks up one embedding vector, often a list of hundreds or thousands of numbers. The vectors typically start out as random values and are adjusted during training until they capture useful patterns. Both the IDs and the vectors depend entirely on the specific model, its tokenizer, and its training process, so different models will assign different numbers to the same token.
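To make the lookup idea concrete, here is a minimal sketch in Python. It assumes each number above is treated as an ID that picks one row out of a table of vectors. The 4-number vectors and the name `build_embedding_table` are made up for illustration, and the random starting values stand in for what training would later adjust.

```python
import random

EMBEDDING_DIM = 4  # assumed toy size; real models use far larger vectors


def build_embedding_table(token_ids, dim=EMBEDDING_DIM, seed=0):
    """Give every distinct token ID a random starting vector (hypothetical helper)."""
    rng = random.Random(seed)
    return {
        token_id: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for token_id in sorted(set(token_ids))
    }


token_ids = [12194, 11, 3253, 553, 481, 30]
table = build_embedding_table(token_ids)

# Embedding a sentence is just one table lookup per token ID.
embedded = [table[token_id] for token_id in token_ids]

for token_id, vector in zip(token_ids, embedded):
    print(token_id, [round(x, 2) for x in vector])
```

The same ID always returns the same vector, which is what lets training gradually push related tokens closer together.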

So now the model has a long list of numbers. But numbers alone mean nothing without context. ‘How’, ‘are’, and ‘you’ are three separate vectors, but the model needs to know they belong together, and that together they mean something specific. How does it figure that out?

Here comes the role of Transformers

Think of it like a room full of people where everyone is talking to everyone else simultaneously. Each word in your sentence is "asking" every other word — how relevant are you to me? The word "bank" asks "river" and "money" and "account" — and based on your full sentence, decides which one matters. This mechanism is called self-attention, and it's what makes modern LLMs dramatically better than everything that came before them.

By the time your sentence has passed through the Transformer, each token is no longer just an isolated vector. Its vector has been updated to reflect its relationship to every other token in the sentence.
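If you like seeing ideas as code, the “everyone asks everyone” step can be sketched in a few lines of Python. This is a stripped-down version of self-attention under loud assumptions: the tiny 2-number vectors are invented, and the token vectors are used directly as queries, keys, and values, whereas real Transformers first pass them through learned projections and stack many such layers.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product attention with no learned projections (simplified)."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # "How relevant are you to me?" is a dot product with every token.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each token's new vector is a weighted blend of all token vectors.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Invented toy vectors: the first two point in similar directions.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(vectors)

for row in weights:
    print([round(w, 2) for w in row])
```

In this toy run, the first token gives more weight to the second token than to the third, simply because their vectors are more alike.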

This context-aware output is what gets handed to the next stage — where the model finally generates a response.

So now the model has your sentence — fully broken down, converted to numbers, and loaded with context. You’d expect it to now “think” of an answer and write it out. But that’s not what happens.

It predicts one word at a time.

The model looks at everything you’ve written, asks “what word most likely comes next?” — picks one, adds it to the sequence, then repeats the exact same process. Over and over, word by word, until the response is complete. There’s no plan. No outline. No destination in mind. Just one prediction at a time, each one informed by everything that came before it.
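The loop itself is simple enough to sketch in Python. Everything in the table below is hypothetical: the tokens and probabilities are invented, and this toy only looks at the last token, while a real model computes its probabilities with a neural network that considers the whole sequence so far.

```python
# Invented next-token probabilities, keyed by the most recent token only.
NEXT_TOKEN_PROBS = {
    "<start>": {"I": 0.6, "You": 0.4},
    "I": {"am": 0.7, "was": 0.3},
    "am": {"fine": 0.8, "here": 0.2},
    "fine": {"<end>": 0.9, "today": 0.1},
}


def generate(max_tokens=10):
    """Pick the most likely next token, append it, and repeat."""
    sequence = ["<start>"]
    for _ in range(max_tokens):
        candidates = NEXT_TOKEN_PROBS.get(sequence[-1])
        if not candidates:
            break  # the toy table has nothing to say about this token
        next_token = max(candidates, key=candidates.get)
        if next_token == "<end>":
            break  # the "model" decided the reply is complete
        sequence.append(next_token)
    return sequence[1:]  # drop the <start> marker


print(generate())  # ['I', 'am', 'fine']
```

There is no step where the whole reply is drafted first. Each pass through the loop only answers one question: what comes next?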

There’s actually a setting behind this called temperature. Low temperature means the model plays it safe, almost always picking the most statistically likely word. High temperature means it takes risks, picking less obvious words more often, which makes responses feel more creative but also less reliable. If a response sounds unusually poetic, a higher temperature may be part of the reason.
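Here is a small Python sketch of what that dial does to the numbers. The three scores are invented, and the function name `softmax_with_temperature` is just a label for this example; the idea is that the model’s raw scores are divided by the temperature before being turned into probabilities.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Invented raw scores for three candidate next words.
scores = [2.0, 1.0, 0.5]

cold = softmax_with_temperature(scores, 0.2)  # low temperature
hot = softmax_with_temperature(scores, 2.0)   # high temperature

print("low temperature: ", [round(p, 3) for p in cold])
print("high temperature:", [round(p, 3) for p in hot])
```

At the low setting almost all of the probability lands on the top word. At the high setting the other words get a real chance of being picked, which is where the variety comes from.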

But here’s the part that changes how you should think about these tools forever:

This is exactly why LLMs make things up. They’re not lying. They’re not guessing randomly. They’re doing precisely what they were built to do — predicting the most statistically likely next word. Sometimes that prediction is a fact. Sometimes it’s a very confident-sounding fiction. The model cannot tell the difference between the two.

Here’s what’s wild — the model that writes your cover letter and the one that passed the bar exam are doing the exact same thing. Predicting the next token. Nothing more, nothing less.

So now we can bust some myths about LLMs

It is not searching the internet. Unless explicitly connected to a search tool, an LLM has no idea what happened yesterday.
It is not thinking. There’s no reasoning happening in the way your brain reasons.
It is not remembering you. Unless the product adds a memory feature, every new conversation starts completely blank.

Last but not least,

My Opinion

I find it more exciting, not less, knowing that there’s no magic here. Just math, scale, and a lot of clever engineering. That’s somehow more impressive than magic.

And a question for you:

Now that you know it’s predicting words, not thinking — will you use it differently?

DEV Community
