In the first article of this AI series, Why AI struggles with “no” and what that teaches us about ourselves, we explored how language models often misinterpret negation and how those mistakes mirror some very human learning patterns.
In this new piece, we’re going a step further, trying to draw a clear line between human and machine intelligence while answering the following:
Are today’s language models really intelligent in the way we are? Do they reason, reflect, and create from scratch?
The short answer? No.
Language models can sound just as inspired as we do, but their creativity is different. They don’t create from meaning; they predict patterns. When you see an impressive AI-generated paragraph, what you’re reading is the result of billions of statistical predictions that mimic the structure of human expression. The illusion feels real because prediction and creativity share a similar surface: both produce something new from something known.
That difference between meaning and mimicry is what we’ll explore next. The goal is to help you feel more confident using these incredible tools.
Human intelligence vs. machine prediction
Human intelligence is semantic (we interpret, connect, and assign meaning), while machine intelligence is syntactic; it manipulates form rather than meaning.
When humans create, we draw from more than stored words or patterns. We rely on a lifetime of experiences, emotions, and associations that shape how we express ideas. Our creativity often comes from combining unrelated thoughts or memories into something entirely new.
Psychologists describe this reflective process as System 2 thinking: slow, deliberate reasoning guided by awareness and meaning. It’s what we use when solving problems, making ethical judgments, or forming original connections between ideas.
At least for now, machines don’t do that.
Large Language Models (LLMs) like ChatGPT (by OpenAI), Claude (by Anthropic), Gemini (by Google DeepMind), and Llama (by Meta AI) operate through System 1-like behavior: fast, automatic, and predictive. In a nutshell, they don’t think; they predict what comes next based on patterns found in massive amounts of training data.
Every response you read from an LLM is the result of statistical pattern-matching. It looks at billions of text examples and estimates which token (a fragment of a word or symbol) is most likely to follow your prompt. That’s why these models can sound fluent, even brilliant, without truly understanding what they’re saying.
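To make that idea concrete, here is a deliberately tiny, hypothetical sketch of the final step of next-token prediction: the model assigns a score to each candidate token, turns those scores into probabilities, and then picks (or samples) one. The candidate tokens and numbers below are invented purely for illustration.

```python
import math

# Invented scores ("logits") a model might assign to candidate next tokens
# after the prompt "The sky is". Real models score tens of thousands of tokens.
logits = {" blue": 9.1, " clear": 7.4, " falling": 3.2, " a": 1.0}

# Softmax turns raw scores into a probability distribution.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok!r}: {p:.3f}")

# Pick the most likely token (real models often sample rather than always
# choosing the top one), append it to the text, and repeat.
print("chosen:", max(probs, key=probs.get))
```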
A study from MIT CSAIL (2024) showed that reasoning skills in large language models are often overestimated. They perform well on familiar tasks but fail when the scenario changes.
Similarly, From System 1 to System 2: A Survey of Reasoning Large Language Models (2025) explains that while models can imitate human logic through learned patterns, they rarely achieve the deliberate, abstract reasoning that defines human thought.
Another paper, Evidence of interrelated cognitive-like capabilities in large language models (2023), reinforces this: the models exhibit behaviors that look intelligent but are still driven by correlation, not comprehension.
Now that we’ve differentiated how humans and LLMs reason, let’s look at how these remarkable systems actually process information and how understanding that can help us get the best out of them.
Short-term memory convos
Every interaction with an AI model is built on tokens, tiny pieces of language that represent words, parts of words, or punctuation. Tokens are how the model processes and predicts what comes next in a given sentence.
For example, the sentence “This is how LLMs break up language” contains nine tokens according to the OpenAI GPT-4 tokenizer. As a rule of thumb, one token corresponds to roughly ¾ of an English word, so 100 tokens work out to about 75 words.
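If you want to check token counts yourself, OpenAI’s open-source tiktoken library exposes the tokenizer used by GPT-4-era models. A minimal sketch (the exact count can differ slightly from the web tokenizer, depending on the encoding):

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models
tokens = enc.encode("This is how LLMs break up language")

print(len(tokens))                        # number of tokens
print([enc.decode([t]) for t in tokens])  # the individual token pieces
```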
Now, each model has a limit on the number of tokens it can “see” at once. This limit is called the context window, and it works as the model’s short-term memory.
- GPT-3.5 handled around 4,000 tokens.
- GPT-4 Turbo and Claude 3 Opus expanded that to roughly 128,000.
- GPT-4.1 can now manage up to one million tokens, and GPT-5 works with a window of hundreds of thousands.
This expansion allows for long conversations, uploaded documents, and even book-length inputs, but it’s not true memory. Once the window fills up, earlier information is compressed or replaced. The model can’t recall those details unless they’re reintroduced, which explains why AI sometimes contradicts itself or loses track of context: it has simply run out of space to “see” what came before.
Complex instructions, such as those involving negation (“don’t do X unless Y”), also depend on this limited window. If earlier details fall out of view, the model might miss conditions or reverse logic.
The tip here is: when writing prompts, keep instructions focused and reintroduce context when needed. It’s like talking to someone who is taking notes in real time. If you want accuracy, repeat the key points.
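To see why older turns drop out, here is a minimal, hypothetical sketch of the kind of history trimming a chat application might do before each request. The count_tokens helper and the budget values are assumptions for illustration; a real application would use the model’s tokenizer and its actual context limit.

```python
def count_tokens(text: str) -> int:
    # Rough approximation: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep only the most recent messages that fit inside the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):            # walk from newest to oldest
        cost = count_tokens(msg["content"])
        if used + cost > budget:
            break                             # older messages fall out of view
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [
    {"role": "user", "content": "Please never use bullet points."},
    {"role": "user", "content": "Summarize our plan."},
]

# With a tiny budget, the earlier instruction silently disappears,
# which is exactly why key points are worth repeating.
print(trim_history(history, budget=5))
```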
Chunking and Retrieval-Augmented Generation (RAG)
Behind the scenes, when an AI system works with a long piece of text, the model usually doesn’t receive it all at once. Instead, the surrounding pipeline breaks the text into smaller pieces called chunks, each one fitting comfortably within the model’s context window. This preprocessing step helps manage information efficiently without losing track of meaning.
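A simplified sketch of what chunking can look like. The sizes are arbitrary, and real pipelines usually split on sentences or paragraphs and measure length in tokens rather than characters; the overlap keeps meaning from being cut mid-thought.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping, roughly equal-sized pieces."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap
    return chunks

long_document = "Lorem ipsum dolor sit amet. " * 100  # stand-in for a long document
pieces = chunk_text(long_document)
print(len(pieces), "chunks, first one:", pieces[0][:50], "...")
```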
However, when you provide external data, such as uploading a PDF, website links, or connecting to a database, the model needs a way to access it. That’s where Retrieval-Augmented Generation (RAG) comes in.
RAG is an external method that allows a model to retrieve relevant information rather than rely solely on what fits within its window. When you ask a question, the system searches your documents for related sections (chunks) and gives that content back to the model to generate a grounded, more accurate answer.
In short, chunking is how long inputs are broken down to fit inside the window, while RAG helps the model reach outside its memory limits to access knowledge it doesn’t currently hold. Together, they balance speed, scale, and accuracy.
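A bare-bones sketch of the retrieval step, using toy word-overlap scoring in place of the embedding-based similarity search real RAG systems rely on. The documents and scoring below are invented for illustration; the point is the flow: find the most relevant chunks, then place them in the prompt so the model can ground its answer.

```python
import re

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def score(question: str, chunk: str) -> int:
    # Toy relevance score: shared words between question and chunk.
    # Real RAG systems use vector embeddings and similarity search instead.
    return len(words(question) & words(chunk))

def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most relevant to the question."""
    return sorted(chunks, key=lambda c: score(question, c), reverse=True)[:top_k]

chunks = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "To request a refund, email support with your order number.",
]
question = "How do I request a refund?"
context = "\n".join(retrieve(question, chunks))

# The retrieved chunks are placed into the prompt to ground the answer.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```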
You can make both processes more effective by preparing your data clearly. Use short sections, descriptive titles, and simple summaries when sharing documents. The better organized your material, the easier it is for the model to retrieve what matters.
Wrapping up
LLMs have changed how we write, learn, and create, but understanding how they work changes how we use them.
They don’t think like humans; they predict patterns based on probability, and when we understand how those patterns are built (through tokens, context windows, chunking, and retrieval), we can guide them more intentionally.
We could say, then, that:
- Clarity helps them stay logical: the model only works with what it can “see.” Clear prompts reduce confusion and keep its short-term memory focused.
- Structure helps them stay consistent: chunking and retrieval rely on well-organized information to produce coherent results.
- Our intent gives their output meaning: while models predict patterns, only humans can add context, emotion, and purpose.
In the end, AI doesn’t replace human creativity or intelligence; it extends it.