We hear a lot about parameters in modern AI models. Every time a highly performant model is released, the next thing everyone wants to know is how many parameters it has. 600 billion? 700 billion? 😋
It's almost as if the more parameters there are, the better the AI model performs (and to some extent, that's true).
Parametric and Nonparametric learning 🥶
To understand what parameters are and how they're used in modern LLMs like ChatGPT and Claude, we first need to know how machines learn. AI parameters are closely tied to two of the major ways we train AI models today: parametric and nonparametric techniques. Understanding these two techniques is our first step toward understanding a model's parameters.
Parametric learning 🧪
When an AI model uses parametric learning during training, it tries to predict an outcome based on a dataset (which forms its base knowledge). If the dataset is very large, that first prediction will likely be incorrect. That's where parameters come in: parameters are settings that influence the predictions of AI models. If a prediction is incorrect, the parameters are adjusted, and the next prediction is made with the new parameter settings, until we get a correct prediction. Note that this happens during training, not in production.
For example: during training, the LLM takes an input asking whether the next movie starring a female actress will win an Oscar. The LLM then refers to its base knowledge and makes a prediction based on its current parameter settings.
If the LLM responds with “Yes, the next movie starring a female actress will win an Oscar”, and the movie ends up not making it to the Oscars, the next prediction will be made with different parameter settings, until the predictions start to become more accurate.
It's like telling the model when it’s wrong and then giving it another try with different parameter settings over and over.
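That predict-compare-adjust loop can be sketched in a few lines. Here's a minimal toy version using a one-parameter linear model trained by gradient descent — the dataset, learning rate, and step count are all invented for illustration, and real LLMs do this with billions of parameters rather than one:

```python
# A minimal sketch of parametric learning: a one-parameter linear model
# that nudges its parameter a little after every wrong prediction.
# The toy data and learning rate below are made up for illustration.

def train(xs, ys, steps=200, lr=0.01):
    w = 0.0  # the model's single parameter; its count is fixed before training
    for _ in range(steps):
        for x, y in zip(xs, ys):
            pred = w * x         # predict with the current parameter setting
            error = pred - y     # how wrong was the prediction?
            w -= lr * error * x  # adjust the parameter, try again next pass
    return w

# Toy dataset where the "true" relationship is y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = train(xs, ys)
print(round(w, 2))  # → 2.0, the parameter setting that makes predictions correct
```

Notice that no matter how many data points we feed in, the model still has exactly one parameter — that's the "fixed number of parameters" property of the parametric approach.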
Keep in mind that the number of parameters is fixed and does not depend on the amount of training data. By the way, ChatGPT and Claude use a parametric approach.
This is a high-level overview that aims to help you understand the inner workings of LLMs.
Nonparametric learning 😮
The funny thing about nonparametric learning is that it doesn’t mean there are no parameters involved! We still use parameters, but their number isn’t fixed in advance — it can grow with the amount of training data.
Nonparametric learning refers to a class of machine learning methods where the model structure is not fixed in advance. That is, the model does not assume a predetermined number of parameters. Instead, it learns patterns directly from the data, and its complexity can grow as more data becomes available.
So what does that mean? How do we make sure our predictions are correct?
This time, since it relies more heavily on the data it’s provided with, the model takes a more analytical approach: it observes the data and adapts to the structure and size of the training set.
Example: with all successful movies (Oscar winners included) starring female actresses as its dataset, the LLM relies on the shape of the data. The structure of the data might expose properties like the popularity of the actress, how much revenue she generates on average in the movies she was cast in, and so on.
This way, the LLM might notice that in every Oscar-winning movie starring a female actress, the actress had a popularity score above 70% and, on average, generated more than $100 million in revenue for the movies she had previously starred in.
It will then evaluate and filter movies based on those features and try to make more accurate predictions.
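A classic nonparametric method that works exactly this way is k-nearest neighbors: the "model" is just the stored training examples, so it grows as the dataset grows. Here's a toy sketch in the spirit of the movie example — the movies, feature values, and labels below are all invented:

```python
# A sketch of nonparametric learning via k-nearest neighbors: the "model"
# is the stored training data itself, so its size grows with the dataset.
# Features are (popularity score, average revenue in $ millions); all
# values and labels below are invented for illustration.

def knn_predict(train_data, query, k=3):
    # Rank stored examples by squared distance to the query,
    # then vote among the k closest ones.
    by_distance = sorted(
        train_data,
        key=lambda item: (item[0][0] - query[0]) ** 2
        + (item[0][1] - query[1]) ** 2,
    )
    votes = [won_oscar for _, won_oscar in by_distance[:k]]
    return votes.count(True) > k // 2

movies = [
    ((85, 150), True),   # high popularity, high revenue -> won an Oscar
    ((90, 200), True),
    ((75, 120), True),
    ((40, 30), False),   # low popularity, low revenue -> did not win
    ((55, 60), False),
    ((30, 20), False),
]
print(knn_predict(movies, (80, 130)))  # → True
print(knn_predict(movies, (45, 40)))   # → False
```

Add more movies to `movies` and the model literally gets bigger — no fixed parameter count was decided up front, which is the defining trait of the nonparametric approach.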
What are parameters? 🤔
They are settings that influence an AI model's predictions during training; the better those settings, the better the predictions.
More parameters don't automatically mean a better AI model — what if those settings don't lead to better predictions? The quality of the settings determines the quality of the training and of the model itself. And even then, we still need a great number of parameters, so finding the right balance is key!
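To make those parameter counts less abstract, here's a back-of-the-envelope sketch of where they come from in a neural network: a fully connected layer with n inputs and m outputs has n×m weights plus m biases, each one a tunable setting. The layer sizes below are invented toy values:

```python
# Where do parameter counts come from? A fully connected layer with
# n_in inputs and n_out outputs has n_in * n_out weights plus n_out
# biases — every one of them a setting adjusted during training.
# The layer sizes below are made-up toy values.

def dense_layer_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases

# A tiny toy network: 512 -> 1024 -> 1024 -> 256
sizes = [512, 1024, 1024, 256]
total = sum(dense_layer_params(a, b) for a, b in zip(sizes, sizes[1:]))
print(total)  # → 1837312, already ~1.8 million parameters for a tiny network
```

Scale those layer widths up and stack many more layers, and you quickly see how modern LLMs reach hundreds of billions of parameters.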
That's it! You made it this far! Follow me if you want more content like this in your feed! 😋
Here’s my Twitter/X 😉