When people talk about AI models, the focus is usually on training, such as how much data was used, how big the model is, or what architecture it uses.
That's where most of the attention goes.
But Chapter 2 of AI Engineering made me focus on something that has a direct impact on how models behave in practice: **sampling**.
Sampling is how a model selects one output from many possible options. It might seem like a small detail, but it explains a lot of what we see in real-world usage, especially inconsistency and hallucinations.
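To make that idea concrete, here is a minimal sketch of temperature-based sampling. The token names and scores in the `logits` dict are invented for illustration; real models produce scores over tens of thousands of tokens, but the selection step works the same way:

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float = 1.0) -> str:
    """Pick one token from the model's raw scores.

    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more varied, less predictable).
    """
    # Softmax with temperature: convert raw scores into probabilities.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_s = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Weighted random choice: the same input can yield different tokens.
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Toy scores for the token following "The sky is"
logits = {"blue": 5.0, "clear": 3.0, "falling": 0.5}
print(sample_next_token(logits, temperature=0.7))
```

Running this twice can print different tokens, which is exactly the inconsistency the chapter talks about.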
**Training Isn't Just About More Data**
It's easy to assume that more data leads to better performance, but that assumption breaks down quickly.
A model trained on a smaller amount of high-quality data can outperform a larger model trained on low-quality data.
What matters is finding the right balance between quantity, quality, and diversity. The model needs enough exposure to learn patterns, the data needs to be reliable, and it needs enough variety to generalize well.
This aligns closely with how we think about data in backend systems. Clean and well-structured inputs tend to produce more reliable outputs.
**Compute Forces Tradeoffs**
AI systems don't scale in isolation. They scale within constraints.
Larger models and datasets require more compute, and compute directly translates to cost.
In practice, teams don't start with the biggest possible model. They start with a budget and design within that limit.
This is where the **Chinchilla scaling law** becomes useful.
Before jumping into large numbers, it helps to understand one term:
A **parameter** is a learned value inside the model that helps it make decisions. More parameters mean the model can learn more patterns, but it also needs more data to train properly.
Now, think of the scaling rule like this:
1 parameter → ~20 tokens
Then scale it up:
1B parameters → ~20B tokens
3B parameters → ~60B tokens
The pattern stays consistent. As the model grows, the data needs to grow with it. Otherwise, you end up with a larger model that isn't fully trained and doesn't use compute efficiently.
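As a quick sanity check, the rule of thumb above can be written as a one-line helper. The function name and the fixed ratio of ~20 tokens per parameter are just the heuristic from the chapter, not an exact law:

```python
def chinchilla_optimal_tokens(n_params: int, tokens_per_param: int = 20) -> int:
    """Rule-of-thumb data budget: ~20 training tokens per parameter."""
    return n_params * tokens_per_param

# Reproduce the scaling examples above.
for n in (1_000_000_000, 3_000_000_000):
    print(f"{n / 1e9:.0f}B params -> ~{chinchilla_optimal_tokens(n) / 1e9:.0f}B tokens")
# 1B params -> ~20B tokens
# 3B params -> ~60B tokens
```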
This matters because it directly affects how we make decisions when working with AI systems.
In practice, we are constantly choosing between models, deciding whether to fine-tune, and balancing cost with performance. This concept gives a way to reason about those choices.
For example, a larger model isn't automatically better if it wasn't trained with enough data. That explains why smaller, well-trained models can sometimes outperform larger ones.
It also applies when fine-tuning. Adding complexity or expecting better results won't help unless there is enough high-quality data to support it.
Even when using APIs, this changes the mindset. Instead of defaulting to the biggest model, the focus shifts to whether the model was trained efficiently and whether it fits the use case.
So this is not just a scaling rule. It becomes a way to guide model selection, fine-tuning decisions, and cost vs performance tradeoffs.
**From Pre-Trained Models to Real Systems**
A pre-trained model is not ready for production use.
It is optimized for predicting the next token, not for producing useful, safe, or aligned responses.
That's where post-training comes in.
**Supervised fine-tuning** teaches the model how to respond using structured examples. However, that alone is not enough.
**RLHF (Reinforcement Learning from Human Feedback)** introduces a feedback loop that improves alignment.
A **reward model** is trained to evaluate how good a response is. Instead of relying on absolute scoring, models often learn from comparing multiple responses, which helps reduce inconsistency.
Then RLHF uses that feedback process. The model generates responses, the reward model scores them, and the model is updated to favor better responses over time.
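That loop can be sketched with stub components. Both `generate_responses` and `reward_model` below are hypothetical stand-ins (real RLHF updates the model's weights via an optimization step rather than just picking the best candidate), but the shape of the feedback cycle is the same:

```python
import random

def generate_responses(prompt: str, n: int = 2) -> list[str]:
    """Stub policy model: returns n candidate responses (hypothetical)."""
    return [f"response-{i} to {prompt!r}" for i in range(n)]

def reward_model(prompt: str, response: str) -> float:
    """Stub reward model: scores a response (here, randomly)."""
    return random.random()

def rlhf_step(prompt: str) -> str:
    # 1. The model generates several candidate responses.
    candidates = generate_responses(prompt)
    # 2. The reward model scores each candidate.
    scored = [(reward_model(prompt, r), r) for r in candidates]
    # 3. Training would then update the model to favor higher-scored
    #    responses; here we just return the best one to show the signal.
    _best_score, best = max(scored)
    return best

print(rlhf_step("Explain sampling in one sentence."))
```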
This process helps align models with human expectations, not just in correctness but also in tone, safety, and usefulness.
**A Hidden Risk: The Quality of the Internet Itself**
Models are trained on internet-scale data. That means whatever exists online, whether accurate or misleading, can influence how models behave.
As more AI-generated content is published, there is a growing risk that future models will be trained on synthetic or incorrect information.
It is also possible for bad actors to intentionally introduce misleading content into the internet so that future models learn from it.
This turns into a data integrity problem, not just a modeling problem.
As engineers, this means we need to be more mindful. Not all data sources are equally reliable, and blindly trusting model outputs becomes riskier over time.
**Why AI Feels Inconsistent**
One of the most important ideas in this chapter is that AI models are **probabilistic systems**.
That means the same input can produce different outputs, and even a small change in input can lead to a noticeably different response.
This behavior is driven by **sampling**.
It also explains **hallucinations**, where the model generates responses that sound correct but are not grounded in fact.
**Designing Around Probabilistic Systems**
This chapter didn't introduce completely new ideas to me, but it helped connect things more clearly.
In backend systems, I'm used to building deterministic workflows where the same input leads to the same output. This chapter reinforced that AI systems don't behave that way.
Instead, AI systems need to be designed with their probabilistic nature in mind.
That shows up in practice. Outputs need validation instead of blind trust. Prompting and constraints act as control mechanisms. Fine-tuning becomes a tool for consistency, not just improvement.
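For the validation point, the check can be as simple as enforcing a contract on the raw output before anything downstream touches it. The `answer` field schema here is an invented example of such a contract:

```python
import json

def parse_model_output(raw: str) -> dict:
    """Validate a model response instead of trusting it blindly.

    The contract here (a hypothetical one): output must be JSON with a
    string "answer" field. Anything else is rejected so downstream code
    never consumes malformed output.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("answer"), str):
        raise ValueError("model output missing required 'answer' string field")
    return data

print(parse_model_output('{"answer": "42"}'))  # passes validation
# parse_model_output("The answer is 42")       # would raise ValueError
```

Wrapping model calls this way turns a probabilistic component into something the rest of a deterministic backend can safely consume.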
AI systems are shaped by the data they are trained on, the compute used during training, the post-training process, and the sampling strategy that generates outputs.
Sampling is what makes models flexible and useful, but it is also what introduces variability.
Understanding that tradeoff is what makes AI engineering more practical.