Ever wondered what an LLM is doing when it's "thinking"?
In this episode of Release Notes Explained, we cover the fundamentals of how thinking and reasoning models work, including concepts like:
- Scaling laws
- Test-time compute
- Reinforcement learning from verifiable rewards
Hope you enjoy! 🩵
Questions? Leave them down below.
The shift from scaling laws (bigger model = better) to test-time compute (let the model think longer on hard problems) is one of the most underappreciated changes in recent AI. It fundamentally changes the cost calculus - instead of throwing more parameters at everything, you can allocate compute dynamically based on problem difficulty. The reinforcement learning from verifiable rewards angle is especially interesting because it creates a natural feedback loop for domains where you can programmatically check correctness, like math and code. The open question is how well this transfers to fuzzier domains where verification itself is subjective.
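The verifiable-rewards idea in the comment above can be sketched in a few lines: when correctness is programmatically checkable (math, code), the reward signal needs no human labels. This is a minimal illustration, not any particular lab's implementation; the `verifiable_reward` function and the problem format are hypothetical.

```python
def verifiable_reward(problem: dict, model_answer: str) -> float:
    """Return 1.0 if the model's answer passes a programmatic check, else 0.0.

    Hypothetical sketch: a real RLVR setup would wrap checks like this
    around sampled completions and feed the scores into an RL update.
    """
    try:
        # For a simple math problem, compare the parsed answer to ground truth.
        return 1.0 if int(model_answer.strip()) == problem["answer"] else 0.0
    except ValueError:
        # Unparseable answers earn no reward.
        return 0.0

# Model outputs are scored with no human in the loop:
problem = {"question": "What is 12 * 7?", "answer": 84}
print(verifiable_reward(problem, "84"))     # 1.0
print(verifiable_reward(problem, "85"))     # 0.0
print(verifiable_reward(problem, "dunno"))  # 0.0
```

For fuzzier domains the checker itself becomes the hard part, which is exactly the open question the comment raises.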
nice!