Seenivasa Ramadurai

Understanding AI Model (LLM) Parameters: A Chef's Guide

What Are AI Model Parameters? Let Me Explain

My friend asked me yesterday, "What does it mean when they say GPT-4 has 1.7 trillion parameters? What even are parameters?"

Great question! I realized a lot of people hear these huge numbers (175 billion, 1.7 trillion) and have no idea what they actually represent. So let me explain it the way I explained it to my friend, using something we both understand: cooking.

Let's Start With What We Know

When you hear about AI models, you'll see numbers like:

  • GPT-3 has 175 billion parameters
  • GPT-4 is estimated to have around 1.7 trillion parameters
  • Claude 3.5 Sonnet is estimated to have roughly 400 billion parameters (Anthropic hasn't published an official count)

These numbers are huge. But what do they mean? Are they storing that many facts? That many sentences? Let me break it down.

Think About a Chef

Imagine you're learning to cook. You start with recipes, ingredients, and lots of practice. Over time, you don't just follow recipes anymore; you understand cooking. You know when to add more salt, how long to cook something, which spices work together.

AI models work the same way.

The Raw Ingredients = Training Data

When we train an AI model, we feed it massive amounts of text: books, websites, conversations, code, articles. Think of this as the raw ingredients. Just having flour, spices, and vegetables doesn't make you a good cook. You need to learn how to use them.

The Chef's Skill = Parameters

Here's where parameters come in.

Parameters are not the training data. They're what the model learned from that data. Think of them as the chef's skill, experience, and intuition.

When a chef cooks biryani 1,000 times, they learn:

  • Exactly how much salt balances the rice
  • When to add the spices for maximum flavour
  • How long to cook it based on the heat
  • How to adjust if something goes wrong

They didn't memorize 1,000 biryani recipes. They developed an understanding of how biryani works. That understanding, all those tiny adjustments and decisions stored in their mind, is what parameters are in AI.

How Does the Learning Actually Happen?

This is the most important part that many explanations skip.

Imagine a student chef learning to make biryani. Here's what happens:

Step 1: They cook the biryani (using their current knowledge)

Step 2: The master chef tastes it and says, "Too much salt" or "Not enough spice"

Step 3: The student adjusts their technique: maybe they use half a teaspoon less salt next time, or add cardamom earlier

Step 4: They cook again with these adjustments

Step 5: Repeat this thousands of times

After 1,000 attempts, the student doesn't need the master chef anymore. They've internalized the patterns. They know instinctively how to make great biryani.

This Is Exactly How AI Training Works

The AI model reads billions of sentences from its training data. For each sentence, it:

  1. Tries to predict the next word — "The cat sat on the ___"
  2. Checks if it was right — the actual word was "mat"
  3. Adjusts its internal numbers (parameters) to make better predictions next time
  4. Repeats this billions of times across all the text

Through this process, the model isn't memorizing sentences. It's learning patterns:

  • Grammar rules (subjects come before verbs)
  • Word relationships (cats sit, birds fly)
  • Context (a river "bank" vs. a money "bank")
  • Reasoning patterns (cause and effect)

By the end of training, those 1.7 trillion parameters contain all these learned patterns. They're like the compressed wisdom the model gained from reading all that text.
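
If you like code, here's a minimal toy sketch of that predict-check-adjust loop in Python. It's nothing like a real LLM (the three-word vocabulary, the data, and the simple update rule are all made up for illustration), but it shows how "parameters" are just numbers that get nudged after every prediction:

```python
import math
import random

# Toy task: guess the word after "The cat sat on the ___"
vocab = ["mat", "dog", "moon"]

# Training data: the "right answer" across many example sentences
training_answers = ["mat"] * 8 + ["moon"] * 2

# The model's parameters: one adjustable number (a "logit") per word.
# Real LLMs have billions of these; here we have exactly three.
params = {word: 0.0 for word in vocab}

def predict(params):
    """Turn raw parameters into probabilities (a softmax)."""
    exps = {w: math.exp(v) for w, v in params.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

learning_rate = 0.1
for step in range(2000):
    actual = random.choice(training_answers)   # Steps 1-2: see the real next word
    probs = predict(params)
    for w in vocab:                            # Step 3: nudge every parameter
        target = 1.0 if w == actual else 0.0
        params[w] -= learning_rate * (probs[w] - target)

print(predict(params))  # "mat" now gets the highest probability (~0.8)
```

Notice that the model never stores the training sentences themselves; only those three numbers change. Scale the same idea up to 1.7 trillion numbers and you have a rough picture of what training does.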

So What Does "1.7 Trillion Parameters" Actually Mean?

When we say GPT-4 has 1.7 trillion parameters, we're saying it has 1.7 trillion tiny adjustable numbers that store all this learned knowledge.

Each parameter is like a single tiny piece of knowledge:

  • "When this word appears, slightly increase the chance of that word coming next"
  • "In this context, this phrase structure is more likely"
  • "These concepts are related in this way"

More parameters = more capacity to store subtle patterns and nuances. It's why larger models can often understand context better and give more sophisticated responses.

But here's the key: more parameters doesn't mean more facts memorized. It means more ability to understand the patterns in language.
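
To get a feel for the scale, here's a quick back-of-envelope calculation. Each parameter is just a number, commonly stored in 2 bytes (16-bit precision), so you can estimate the memory needed just to hold a model's parameters (the counts themselves are estimates, as noted earlier):

```python
def param_memory_gb(num_params, bytes_per_param=2):
    """Rough memory just to store the parameters (assumes 16-bit storage)."""
    return num_params * bytes_per_param / 1e9

print(f"GPT-3 (175 billion):   {param_memory_gb(175e9):,.0f} GB")    # ~350 GB
print(f"GPT-4 (~1.7 trillion): {param_memory_gb(1.7e12):,.0f} GB")   # ~3,400 GB
```

That's one reason the largest models have to be split across many GPUs: no single chip can even hold all the parameters at once.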

When You Ask ChatGPT a Question

Now when you type a question into ChatGPT, here's what happens:

You're like a customer ordering food. The AI chef doesn't look up your exact question in a database. Instead, it uses all those 1.7 trillion learned patterns (parameters) to generate a fresh response, right there on the spot.

That's why it can answer questions it has never seen before. Just like a skilled chef can create a new dish without having the exact recipe, the AI can create new answers using the patterns it learned.
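
A crude way to picture this in code: generation is just repeated next-word prediction using learned probabilities. The table below is a made-up stand-in for real parameters, and real models look at the whole conversation rather than just the last word, but the shape of the loop is the same:

```python
import random

# A made-up stand-in for learned patterns: "after this word,
# these words tend to come next, with these probabilities."
learned_patterns = {
    "the":  {"cat": 0.6, "chef": 0.4},
    "cat":  {"sat": 0.7, "ran": 0.3},
    "chef": {"cooked": 1.0},
    "sat":  {"on": 0.9, "down": 0.1},
    "on":   {"the": 1.0},
}

def generate(start_word, max_words=6):
    words = [start_word]
    for _ in range(max_words):
        options = learned_patterns.get(words[-1])
        if not options:
            break
        next_word = random.choices(list(options), weights=list(options.values()))[0]
        words.append(next_word)
    return " ".join(words)

print(generate("the"))  # e.g. "the cat sat on the chef cooked"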

Why Smaller Models Can Still Be Good

You might wonder: if more parameters are better, why do we have smaller models?

Think about it this way. You don't need a Michelin star chef to make good everyday food. Sometimes a home cook with good fundamentals can make an excellent meal.

Newer models like GPT-4o (estimated at around 200 billion parameters; OpenAI hasn't published an official number) are designed smarter. They might have fewer parameters, but they're organized more efficiently. They can still perform really well for most tasks while being:

  • Faster to respond
  • Cheaper to run
  • Easier to use on different devices

The Simple Truth

So when someone asks you what AI parameters are, tell them this:

Parameters are the learned knowledge stored inside the AI model. They're created through billions of training examples, where the model keeps adjusting itself to make better predictions. They're not memorized facts; they're patterns and relationships the model discovered in language.

It's like the difference between someone who memorized a cookbook and a chef who understands why ingredients work together. The AI has 1.7 trillion tiny pieces of understanding that help it generate intelligent responses to questions it has never seen before.

That's it. That's what parameters are.

But Wait, What About RAG and Fine-tuning?

Now here's where it gets even more interesting. My friend then asked, "But what about when people talk about RAG or fine-tuning? How does that fit in?"

Great question! Let me extend our cooking analogy.

LLMs Are Like Frozen Food

Think of a trained LLM (like GPT-4 or Claude) as high-quality frozen food. It's already prepared, cooked, and ready. The chef (the company that trained it) has already done the hard work. All those parameters? They're frozen, locked in place.

But you can still make it better or customize it for your needs. Here are two ways:

RAG (Retrieval Augmented Generation) = Adding Fresh Ingredients

Imagine you have frozen biryani. It's good, but you want to make it your own. So you:

  • Heat it up
  • Add fresh coriander on top
  • Mix in some raita
  • Maybe add extra fried onions

You didn't change the frozen biryani itself. You just added fresh ingredients around it to make it better and more customized.

This is exactly what RAG does.

When you use RAG, you're not changing the AI's parameters (the frozen food stays frozen). Instead, you're giving it fresh, relevant information right when it needs it:

  • You ask: "What did our company discuss in last week's meeting?"
  • RAG system searches your company documents
  • It finds the meeting notes
  • It gives those notes to the AI along with your question
  • The AI uses its frozen knowledge (parameters) + the fresh information (meeting notes) to answer

The base model stays the same, but you've enhanced it with up-to-date, specific information. Just like adding fresh ingredients to frozen food.
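
Here's a minimal sketch of that flow in Python. The helper names (search_documents, call_llm) are hypothetical placeholders rather than a real library API, and real RAG systems use embedding-based vector search instead of the naive keyword matching shown here:

```python
def search_documents(question, documents):
    """Naive retrieval: return documents that share words with the question."""
    question_words = set(question.lower().split())
    return [doc for doc in documents
            if question_words & set(doc.lower().split())]

def answer_with_rag(question, documents, call_llm):
    relevant = search_documents(question, documents)  # find the fresh ingredients
    prompt = (
        "Answer using these notes:\n"
        + "\n".join(relevant)
        + f"\n\nQuestion: {question}"
    )
    # Frozen parameters + fresh context = customized answer
    return call_llm(prompt)
```

Notice that nothing about the model changes; it just gets better ingredients to work with at question time.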

Fine-tuning = Making a New Dish From Frozen Food

Now imagine something different. You take that frozen biryani and decide to completely remake it:

  • You add paneer and make it paneer biryani
  • Or add extra vegetables and spices to create a completely new flavour profile
  • You're essentially creating a new dish using the frozen food as a base

This is fine-tuning.

When you fine-tune an AI model, you're actually unfreezing some of those parameters and training them again on your specific data:

  • You start with the base model (frozen food)
  • You train it on your specific examples (adding new ingredients and cooking it differently)
  • The parameters adjust to your specific use case
  • You end up with a customized model

For example, a hospital might fine-tune GPT-4 on medical records to create a specialized medical AI. The base knowledge is still there (language patterns, reasoning), but now it's been adjusted to understand medical terminology and patterns better.
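
To make this concrete, here's the toy model from the training section again. Fine-tuning is just more of the same nudging loop, but starting from pre-trained parameters and using your own specialized data (all the numbers here are made up for illustration):

```python
import math
import random

vocab = ["mat", "dog", "moon"]
# Pretend these came from pre-training: the base model strongly prefers "mat"
params = {"mat": 2.0, "dog": -1.0, "moon": 0.0}

def predict(params):
    exps = {w: math.exp(v) for w, v in params.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

# Your domain-specific data: in your use case, the answer is usually "moon"
fine_tune_data = ["moon"] * 9 + ["mat"]

for step in range(1000):                  # unfreeze and keep training
    actual = random.choice(fine_tune_data)
    probs = predict(params)
    for w in vocab:
        target = 1.0 if w == actual else 0.0
        params[w] -= 0.1 * (probs[w] - target)

print(predict(params))  # "moon" now dominates: the parameters themselves moved
```

Compare this with the RAG sketch earlier: there the parameters never changed; here they do.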

The Key Difference

RAG = Keep the model frozen, just add fresh information when needed

  • Fast to set up
  • No need to retrain anything
  • Perfect for adding new, changing information

Fine-tuning = Unfreeze and adjust the model itself

  • Takes more time and resources
  • Changes the actual parameters
  • Perfect for specialized tasks or domain specific knowledge

Both approaches use the same idea: the pre-trained model (with its trillions of parameters) is your starting point, your frozen food. But depending on what you need, you either add fresh ingredients around it (RAG) or transform it into something new (fine-tuning).

Note: The exact parameter counts for newer models are often estimates, as companies don't always publish official numbers. But the concept remains the same: parameters represent the learned patterns, not the raw data.

Thanks
Sreeni Ramadorai

Top comments (4)

Praveen Kumar

Hi sir,
Thanks for this wonderful article that explains the key concepts of RAG, AI model parameters, and fine-tuning in a lucid manner. I like the way you explained it with the cooking analogy.
Keep helping us.

Thanks and regards,
Praveen

Seenivasa Ramadurai

Thank you, Praveen.

Sehag

This is a really clear way to explain AI parameters! Think of it like a chef learning to cook: the model’s parameters are like all the subtle skills and instincts a chef develops after practicing thousands of times. They don’t memorize every recipe—they learn patterns and rules. For AI, those 1.7 trillion numbers capture language patterns, grammar, and reasoning. When you ask a question, the model isn’t recalling a fact—it’s using those learned patterns to generate a response, kind of like a chef improvising a dish.

Seenivasa Ramadurai

Thank you, Sehag.