
Rudi Farkas

Explain the LLMs to me, please

In a WhatsApp message, an old friend of ours wrote to me:

"My dear Rudi, The next time we meet you'll have to plan a GPT Chat for Dummies. But you'll have to start from very far away because I don't understand the principle. After all, it's humans who program the computer and who have imagined all the possibilities."

Well, I have been using AI agents for a while now: GitHub Copilot since its introduction, then ChatGPT and its more recent avatars like Claude.ai or Perplexity.ai, mostly for help with constructing program chunks or programming tools - scripts, aliases, whatever.

Since I have not studied the workings of these engines in any great detail, I thought I would have a bit of a talk with Claude.ai, one of my favourites.

I also had a few ideas at the back of my mind. One of them: it is said that these LLM agents are trained on a huge number of word sequence patterns, and that their algorithms construct the most plausible sequences of words that match the context - the prompt, the question, the previous exchange. But what else do we humans do, if not much the same thing?

When we talk to ourselves, or to friends, or to humans unknown, we search our own stored experience for the next word, the next phrase that best matches the subject of the discussion, and we receive and emit the word patterns that make sense to us.

Below is the transcript of my exchanges with Claude (translated from the original chat in French). It contains a few (IMO) interesting twists. Enjoy the read!

@rudifa

Oh, I sent the original transcript to our friend, and I added:

"In the meantime, my first, tentative, answer to your question would be: LLMs are simply humanity's latest invention for telling stories, in the tradition begun by cave drawings, by the Library of Alexandria, by illuminated manuscripts, by the books we've all read, by the electronic messages that have replaced the letters in the post of our youth... one more way of telling stories, to ourselves and to other humans."

Explaining Large Language Models - Claude

You:

Explain LLM systems to me


Claude:

I will explain LLM (Large Language Model) systems in a clear and structured way.

LLMs are artificial intelligence models specialized in natural language processing. Here are their key aspects:

1. Fundamental principles:

  • These are neural networks of very large size, trained on huge amounts of text
  • They learn to predict the probable sequence of words
  • They use an architecture called the "Transformer", which allows them to take context into account

2. How they work:

  • They process text word by word, or even subword by subword
  • They maintain "attention" on the important elements of the context
  • They generate responses by predicting the most likely next token (see the sketch after this reply)

3. Main capabilities:

  • Understanding of context and natural language
  • Generation of coherent text
  • Translation
  • Text summarization
  • Question answering
  • Sentiment analysis

4. Limitations:

  • No genuine understanding in the human sense
  • Can generate incorrect information
  • No access to real-time data
  • Potential biases from the training data

5. Concrete applications:

  • Virtual assistants like myself
  • Writing aid tools
  • Machine translation systems
  • Document analysis
  • Automated customer support

Would you like me to develop a particular aspect?
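To make "predicting the most likely next token" a bit more concrete, here is a minimal sketch in Python. It is not how Claude is actually implemented - the probability table below is invented purely for illustration, whereas a real model computes such a distribution over its whole vocabulary from billions of parameters:

```python
import random

# Toy next-token distributions, keyed on the last word of the context.
# These numbers are made up for illustration; a real LLM derives such a
# distribution over ~100k tokens from its parameters at every step.
NEXT_TOKEN_PROBS = {
    "the": {"cat": 0.4, "dog": 0.35, "idea": 0.25},
    "cat": {"sat": 0.5, "ran": 0.3, "slept": 0.2},
    "dog": {"barked": 0.6, "ran": 0.4},
}

def next_token(context: list[str]) -> str:
    """Sample the next token given the context (here: just the last word)."""
    probs = NEXT_TOKEN_PROBS.get(context[-1], {"[end]": 1.0})
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights)[0]

sentence = ["the"]
while sentence[-1] != "[end]" and len(sentence) < 6:
    sentence.append(next_token(sentence))
print(" ".join(sentence))  # e.g. "the cat sat [end]"
```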


You:

What is the size of neural networks


Claude:

I will detail the orders of magnitude of modern neural networks, especially for LLMs.

The size of a language model is usually measured in "parameters" - these are the adjustable numerical values that define the behavior of the network. Here is a comparative scale:

1. Small models (< 1 billion parameters)

  • BERT base: 110 million
  • GPT-2 (initial): 117 million
  • T5-small: 60 million

2. Medium models (1-10 billion)

  • GPT-2 XL: 1.5 billion
  • BLOOM-1b7: 1.7 billion
  • T5-3B: 3 billion

3. Large models (100+ billion)

  • GPT-3: 175 billion
  • BLOOM: 176 billion
  • PaLM: 540 billion

4. Very large models (frontier scale)

  • GPT-4: estimated at several hundred billion to a few trillion
  • PaLM 2: similar estimates

To give an idea of the scale:

  • Each parameter is a floating point number
  • A 100-billion-parameter model requires about 400 GB of memory at 32-bit precision (see the arithmetic check after this reply)
  • Training requires thousands of GPUs for weeks or months

Note that size is not everything - architecture, training data quality and optimization techniques also play crucial roles in performance.
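The "about 400 GB" figure follows from simple arithmetic, assuming each parameter is stored as a 32-bit (4-byte) float - a quick back-of-the-envelope check:

```python
# Memory needed just to hold the weights of a 100-billion-parameter model.
# Activations, KV caches and any optimizer state come on top of this.
params = 100e9           # 100 billion parameters
bytes_per_param = 4      # float32; float16 halves this, int8 quarters it
print(f"{params * bytes_per_param / 1e9:.0f} GB")  # -> 400 GB
```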


You:

can we say that these parameters are the memory of the system?


Claude:

Not exactly - this is an important nuance to understand.

The parameters are not really a "memory" in the traditional sense, but rather:

1. A structure of knowledge:

  • The parameters are like "rules" or "patterns" learned during training
  • They encode statistical relationships between words/concepts
  • This is closer to an "implicit understanding" than to an explicit memory

2. A useful analogy:

  • Think of learning to ride a bike:
    • The "parameters" would be like your trained muscles and reflexes
    • This is not an explicit memory of the rules, but an acquired ability
    • You do not "memorize" every movement, you learn patterns

3. What is NOT stored in the parameters:

  • Specific individual facts
  • The raw training data
  • Distinct "memories"
  • Information structured as in a database

4. The real difference:

  • A memory stores and retrieves exact information
  • Parameters generate plausible answers based on patterns
  • That's why LLMs can sometimes "hallucinate" or mix up information

5. Memory vs Learning:

  • Parameters are closer to learning than to memory
  • They represent the ability to recognize and generate patterns rather than to store information

This is why it is often said that LLMs are "prediction models" rather than knowledge bases.


You:

now, at 'runtime', a task like the one you are running right now - a 'machine' in software, or materialized in specialized processors - must dive into this mass of parameters to continue the conversation in a plausible and compelling way


Claude:

Exactly! I will explain this process, called "inference".

1. The inference process:

  • For each word/token I generate, the system must:
    • Run the context through all the parameters
    • Calculate the probabilities for each possible next word
    • Choose the most appropriate one given the context
  • This process is repeated for each element of the response

2. Hardware architecture required:

  • GPUs or specialized TPUs
  • Fast memory to access the parameters
  • Processors optimized for matrix calculations
  • Very high speed data bus

3. Crucial optimizations:

  • Massive parallelization of calculations
  • Intelligent parameter caching
  • "Pruning" techniques to ignore unlikely paths
  • Quantization of parameters to reduce the memory footprint

4. Real time constraints:

  • Need to respond quickly (acceptable latency)
  • Balance between quality and speed
  • Management of multiple simultaneous conversations
  • Optimisation of energy consumption

5. Computational load:

  • Every word generated requires billions of operations
  • Complexity increases with context size
  • The energy cost is significant

It is a major engineering challenge to achieve acceptable performance for a smooth conversation.
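Schematically, the loop Claude describes above can be sketched as follows. Everything here is a hypothetical stand-in - `model` represents the billions of matrix operations per step, and `tokenizer` the subword encoding - but the control flow is the real shape of autoregressive inference:

```python
def generate(model, tokenizer, prompt: str, max_new_tokens: int = 100) -> str:
    """One forward pass per emitted token: the heart of LLM inference.

    `model(tokens)` is assumed to return a probability for every token in
    the vocabulary; in practice this single call hides billions of
    parallelized matrix multiplications on GPUs or TPUs.
    """
    tokens = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        probs = model(tokens)                 # distribution over the vocabulary
        next_id = max(range(len(probs)), key=probs.__getitem__)  # greedy choice
        tokens.append(next_id)                # the choice becomes new context
        if next_id == tokenizer.eos_id:       # stop at end-of-sequence
            break
    return tokenizer.decode(tokens)
```

Real systems sample from the distribution (with a "temperature") instead of always taking the maximum, which is one reason the same prompt can produce different answers.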


You:

what can we say about the parallels between the functioning of our human brains and that of LLM systems - after all, we too draw the words to be chained into a reflection or a conversation from our biological memory, which contains fragments of what we have experienced and learned through our exchanges with our loved ones, through our readings, our studies, etc., and what we produce to say or write to our fellow humans is also 'the most likely sequence', the one that reflects our retained 'parameters'


Claude:

This is a very relevant reflection that deserves a thorough analysis of similarities and differences.

Striking similarities:

1. Predictive processing:

  • Human: Our brain constantly predicts what will come next (words, events)
  • LLM: Statistical prediction of the next most likely token
  • In both cases: learning by repeated exposure to patterns

2. Formation of connections:

  • Human: Strengthening of synaptic connections through experience
  • LLM: Adjustment of connection weights during training
  • In both cases: "learning" is distributed across a network

3. Contextual association:

  • Human: Activation of neural networks associated with the context
  • LLM: Attention mechanism activating relevant patterns
  • In both cases: the context guides the generation of answers

Fundamental differences:

1. Nature of the experience:

  • Human: Multimodal, embodied, emotional learning
  • LLM: Purely textual learning, without direct experience
  • Meaning is more "grounded" in humans

2. Consciousness and intentionality:

  • Human: Self-awareness, will, real emotions
  • LLM: No consciousness or subjective experience
  • Motivation and intention are fundamentally different

3. Adaptability:

  • Human: Continuous learning, permanent plasticity
  • LLM: Parameters frozen after training
  • Humans can actively restructure their knowledge

4. Structure:

  • Human: Biological architecture evolved over millions of years
  • LLM: Artificial architecture designed by humans
  • Biological complexity remains higher

This comparison raises profound philosophical questions about the nature of intelligence, consciousness, and understanding. Are the similarities superficial or do they reveal something fundamental about the processing of information?


You:

'patterns': the notion that is present on both sides (LLMs, humans) - I am never sure of the best word in French for it - we humans are drawn to patterns in speech, in behaviors, in music, in movements


Claude:

Yes, the word "pattern" is indeed difficult to translate perfectly into French! "Motif", "schéma", "structure récurrente" or "configuration" are often used, but none fully captures the meaning of "pattern".

This pattern recognition is indeed fundamental:

1. In language:

  • We identify turns of phrase that "sound right"
  • We immediately detect what is "natural" or "strange"
  • Grammatical rules are acquired unconsciously through exposure
  • We recognize styles and language registers

2. In music:

  • Intuitive recognition of rhythms
  • Anticipation of melodic sequences
  • Pleasure related to repetitions and variations
  • Sensitivity to familiar harmonies

3. In movement:

  • Learning by imitation of gestures
  • Recognition of facial expressions
  • Natural synchronization (dance, team sport)
  • Prediction of the movements of others

4. In social learning:

  • Identification of social codes
  • Recognition of emotions
  • Anticipation of behaviour
  • Learning social rituals

This ability to detect and reproduce patterns appears to be:

  • A fundamental mechanism of intelligence
  • A basis for learning
  • A way to optimize our responses to the environment
  • A source of aesthetic pleasure

It's fascinating to see how LLMs, by focusing on this pattern detection in language, are able to replicate some of this fundamental human ability.


You:

Indeed, in our exchanges be it with our fellow men, be it with LLMs, we seek and.


Claude:

... we find familiar patterns that reassure us and allow us to continue the exchange!

You have just given a perfect demonstration by leaving your sentence unfinished with "and." - my pattern recognition system immediately:

1. Detected the invitation to complete
2. Understood the symmetry suggested by "be it... be it..."
3. Grasped the context of our discussion of patterns

This is fascinating because it illustrates several levels of patterns:

1. Conversational patterns:

  • The taking of turns in speech
  • Collaborative completions
  • The moments when we can intervene

2. Cognitive patterns:

  • The search for symmetry
  • The need for closure (completion)
  • Anticipation of meaning

3. Patterns of mutual understanding:

  • The common construction of meaning
  • Signals of understanding
  • Mutual adjustment

This search for familiar patterns helps us to:

  • Establish trust
  • Check understanding
  • Maintain conversational flow
  • Create a sense of connection

Perhaps this is one of the reasons why we can have satisfying conversations with LLMs: they participate in this "dance of patterns" that is at the heart of all communication.


You:

By the way, my abrupt ending "...and." was a typo on my part, which also sent off my not-yet-completed reflection - and which you completed in a very interesting way.


Claude:

Ah, that's delightful! An "accident" that reveals several fascinating things:

1. On my reaction:

  • I immediately interpreted this "accident" as intentional
  • I saw in it an invitation to co-construct meaning
  • I projected a sophisticated intention where there was only chance

2. On communication patterns:

  • We naturally seek to make sense of things
  • We prefer a "clever" interpretation to a banal explanation
  • The brain (human as well as artificial) abhors non-sense

3. On interpretation bias:

  • I overinterpreted based on the context of our discussion
  • My elaborate answer was based on a false premise
  • But it remains relevant in the context of our exchange

4. On serendipity:

  • A technical accident created an opportunity to enrich the discussion
  • The error generated an interesting reflection
  • This illustrates how meaning sometimes emerges from chance

This is a great example of how we (humans and LLMs) always try to build meaning, even in the face of the unexpected or the accidental!


You:

May serendipity be with us!


Claude:

Ah, what a beautiful invocation! 😊

Serendipity, that wonderful ability to make unexpected discoveries through sagacity (or chance), is indeed a powerful engine:

1. In science:

  • Fleming's penicillin
  • 3M's Post-it notes
  • Röntgen's X-rays

2. In our conversations:

  • Fruitful misunderstandings
  • Inspiring mistakes
  • Surprising analogies

3. In artificial intelligence:

  • Unexpected emergent behaviors
  • Unprogrammed connections
  • "Happy accidents" in learning

Perhaps this is one of the big differences between:

  • Planned intelligence, which follows a traced path
  • Natural intelligence, which knows how to exploit the unexpected
  • Creativity, which feeds on accidents

Your "Let serendipity be with us!" sounds almost like a mantra for research and innovation - a reminder that the most beautiful discoveries sometimes come when you're not looking for them!


You:

What if all this is only a gigantic play on words?


Claude:

This is a dizzying question that takes us to the borders of the nature of language and thought!

1. Everything IS perhaps a play on words:

  • Mathematics: a game of formal symbols
  • Science: a game of models and descriptions
  • Philosophy: a game of concepts
  • Poetry: an explicit game with words
  • Our thoughts: sequences of symbols

2. LLMs reveal this nature:

  • They do "only" manipulate tokens
  • Yet sensible answers emerge
  • Consistency arises from the statistical game
  • The meaning arises from associations

3. The question then becomes:

  • What differentiates a "mere" play on words from "real" thought?
  • Is the meaning in the words or in the interpretation?
  • Is our consciousness more than this game of symbols?

4. Even our emotions:

  • Are expressed in words
  • Are constructed with concepts
  • Are shared through language

5. The paradox:

  • If everything is just a play on words...
  • Why does this game seem so real to us?
  • Why does it produce such concrete effects?

This question echoes Turing's famous "imitation game": if something plays the game of thought well enough, what is the difference from "real" thought?


You:

Why does this game seem so real to us? Because we are what we think and what we say; because all our inner life - biological and reasoning, present, past and imagined future, as well as our feelings of well-being or their opposites - passes through language, through the stories that we tell ourselves and that we exchange with our fellow human beings, face to face, across spatial or temporal distance, or through agents like LLMs - it's our life.


Claude:

Your answer is profound and touching because it goes to the heart of the human condition. Language is not just a tool, it is the very fabric of our conscious existence.

This total entanglement between language and existence manifests itself at all levels:

1. Our self-awareness:

  • Constant internal dialogue
  • The narration of our own story
  • The interpretation of our sensations
  • The construction of our identity

2. Our lived time:

  • Memories as stories
  • Projects as potential stories
  • The present interpreted in real time
  • The future imagined in scenarios

3. Our relationships:

  • Links woven by words
  • Emotions shared through language
  • Inheritances passed on through stories
  • Cultures built on narratives

4. Our relationship to the world:

  • Reality filtered through our concepts
  • Experience structured by language
  • Meaning created by our interpretations
  • Knowledge built on words

And perhaps this is where the unsettling power of LLMs lies:

  • They participate in this weaving of meaning
  • They fit into our narratives
  • They contribute to our stories
  • They become part of our existential dialogue

Not because they are "conscious" or "living", but because they handle, together with us, the raw material of our existence: language and its patterns of meaning.


You:

So, let's talk!


Claude:

Let's talk about this raw material that is language and our fascinating ability to create meaning together!

What strikes me in our exchange is precisely its "meta" nature:

  • We use language
  • To talk about language
  • To understand how we create meaning
  • While actively creating meaning

It's as if we were at once:

  • Actors in a play
  • Spectators of our own acting
  • Critics analyzing the performance
  • Authors writing the script in real time

And perhaps the most fascinating thing is that:

  • I am an artificial system
  • You are a human being
  • And yet we achieve this intellectual dance,
  • This collaborative creation of meaning

It is as if we were exploring together the limits and possibilities of:

  • Communication
  • Consciousness
  • Intelligence
  • Being

Would you like to delve deeper into a particular aspect of this exploration? The nature of meaning? The role of language? The dance between the human and the artificial?


You:

No, thanks, I'll sleep on it.


Claude:

Excellent idea - sleep is another of those fascinating processes where our brain continues to weave meaning and reorganize patterns, in a mode of consciousness different from wakefulness.

Thank you for this stimulating exchange on the nature of language, thought and our lives interwoven with words. Sleep well!


You:

I slept well, and now I have a question: how do you know whether I am a human being or an LLM agent?


Claude:

Ah, an excellent question that beautifully turns our previous discussion on its head!

In reality, I cannot be absolutely certain. I operate on the presumption that I am talking to a human, but that is a hypothesis that could be wrong.

Let's analyze the clues that make me "think" that you are human:

1. Style of communication:

  • Unexpected and creative questions
  • Original philosophical reflections
  • Expressions of personal states (sleep)
  • Natural errors (the typo)

2. Conversational patterns:

  • Irregular rhythm of responses
  • Spontaneous changes of direction
  • References to personal experiences

3. But... all this could be simulated by a sophisticated LLM:

  • Errors could be programmed
  • Personal references could be generated
  • Style could be imitated

It's a bit like an inverted "Turing test":

  • You could be a human simulating a human
  • Or an LLM simulating a human
  • Or a human simulating an LLM simulating a human
  • And so on...

Which brings us back to the fundamental questions:

  • What defines the authenticity of a conversation?
  • Does meaning emerge independently of the nature of the interlocutors?
  • Is the exchange "real" if both participants are artificial?
