What is an LLM actually doing when it's "thinking"?

#gemini #llm #ai #machinelearning

Ever wondered what an LLM is doing when it's "thinking"?

In this episode of Release Notes Explained, we cover the fundamentals of how thinking and reasoning models work, including concepts like:

Scaling laws
Test-time compute
Reinforcement learning from verifiable rewards

Hope you enjoy! 🩵

Questions? Leave them down below.

Top comments (8)

Archit Mittal • Apr 15

The shift from pretraining scaling laws (bigger model = better) to test-time compute (let the model think longer on hard problems) is one of the most underappreciated changes in recent AI. It fundamentally changes the cost calculus: instead of throwing more parameters at everything, you can allocate compute dynamically based on problem difficulty. Reinforcement learning from verifiable rewards (RLVR) is especially interesting because it creates a natural feedback loop in domains where correctness can be checked programmatically, such as math and code. The open question is how well this transfers to fuzzier domains where verification itself is subjective.
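A minimal Python sketch of the two ideas in this comment, under stated assumptions. `verifiable_reward` is a binary reward of the kind RLVR relies on for math: it checks a model's final answer against a known reference with no human judgment involved. `answer_with_adaptive_compute` approximates difficulty-based compute allocation with self-consistency voting plus early stopping (a swapped-in technique, not necessarily what the video describes): it keeps sampling only while the answers disagree. The `Answer:` output format, every function name, and the `sample` callable standing in for a model call are hypothetical.

```python
import re
from collections import Counter
from fractions import Fraction
from typing import Callable, Optional

# Hypothetical convention: the model ends its completion with a line "Answer: <value>".
ANSWER_RE = re.compile(r"Answer:\s*([^\n]+)")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last 'Answer: ...' value in a completion, or None if there is none."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary RLVR-style reward: 1.0 if the final answer equals the reference numerically.

    Fraction makes "1/2", "0.5", and "2/4" compare equal; anything unparseable scores 0.0.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    try:
        return 1.0 if Fraction(answer) == Fraction(reference) else 0.0
    except (ValueError, ZeroDivisionError):
        return 0.0


def answer_with_adaptive_compute(
    sample: Callable[[str], str],  # hypothetical stand-in for one model call
    prompt: str,
    min_samples: int = 3,
    max_samples: int = 16,
    agreement: float = 0.8,
) -> Optional[str]:
    """Self-consistency voting that stops early once enough samples agree.

    Disagreement between samples serves as a rough proxy for difficulty:
    easy prompts stop at min_samples, hard ones use up to max_samples.
    """
    votes: Counter = Counter()
    for drawn in range(1, max_samples + 1):
        answer = extract_final_answer(sample(prompt))
        if answer is not None:
            votes[answer] += 1
        if drawn >= min_samples and votes:
            top, count = votes.most_common(1)[0]
            if count / drawn >= agreement:
                return top
    # Budget exhausted: fall back to the plurality answer, if any.
    return votes.most_common(1)[0][0] if votes else None
```

On an easy question where every sample agrees, the loop stops after `min_samples` calls; on a hard one where samples scatter, it spends the full `max_samples` budget. In actual RLVR training, rewards like `verifiable_reward` feed a policy-gradient update, which this sketch does not show.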

Michael • May 2

Great explanation! Practical tip: give models "permission to think" by structuring prompts like this:

"Think step-by-step:
1. What's being asked?
2. Reasoning process
3. Self-check
4. Final answer"

This leverages the video's core insight: more reasoning tokens tend to produce better results on hard problems. The structure keeps those tokens focused rather than wasted.
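One way to make this tip repeatable is to generate the scaffold in code and split the reply at a final-answer marker, so the reasoning tokens can be logged or hidden separately. This is a sketch under assumptions: `build_structured_prompt`, `split_reasoning`, and the `Final answer:` label are hypothetical, and no particular model API is implied. A model may not follow the label, so the parser falls back to treating the whole reply as reasoning.

```python
from typing import Tuple

# The scaffold from the tip, numbered so the model can follow it in order.
STEPS = ("What's being asked?", "Reasoning process", "Self-check", "Final answer")


def build_structured_prompt(question: str) -> str:
    """Wrap a question in the step-by-step scaffold (hypothetical helper)."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(STEPS, start=1))
    return (
        f"{question}\n\n"
        f"Think step-by-step:\n{numbered}\n\n"
        "Begin the last section with 'Final answer:'."
    )


def split_reasoning(completion: str) -> Tuple[str, str]:
    """Split a reply into (reasoning, final answer) at the last 'Final answer:' marker.

    If the model ignored the marker, everything is treated as reasoning.
    """
    reasoning, marker, answer = completion.rpartition("Final answer:")
    if not marker:
        return completion.strip(), ""
    return reasoning.strip(), answer.strip()
```

Splitting on the last marker rather than the first means a model that mentions "Final answer:" mid-reasoning still has its actual conclusion extracted.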