Nikita Namjoshi for Google AI

What is an LLM actually doing when it's "thinking"?

Ever wondered what an LLM is doing when it's "thinking"?

In this episode of Release Notes Explained, we cover the fundamentals of how thinking and reasoning models work, including concepts like:

  • Scaling laws
  • Test-time compute
  • Reinforcement learning from verifiable rewards

Hope you enjoy! 🩵

Questions? Leave them down below.

Top comments (4)

Archit Mittal

The shift from scaling laws (bigger model = better) to test-time compute (let the model think longer on hard problems) is one of the most underappreciated changes in recent AI. It fundamentally changes the cost calculus: instead of throwing more parameters at everything, you can allocate compute dynamically based on problem difficulty. The reinforcement learning from verifiable rewards angle is especially interesting because it creates a natural feedback loop for domains where you can programmatically check correctness, like math and code. The open question is how well this transfers to fuzzier domains where verification itself is subjective.
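To make the "programmatically check correctness" point concrete, here's a toy sketch (not from the video; the function name and sample answers are invented for illustration) of a verifiable reward for a math-style problem: the reward signal is computed by a simple exact-match check against a known solution, which is what makes domains like math and code amenable to this kind of RL.

```python
def verifiable_reward(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 if the model's final answer matches the known
    solution after trimming whitespace, else 0.0. Real systems use
    more robust answer extraction and normalization."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

# Hypothetical rollout: score several sampled completions for one
# problem whose verified answer is "42". Each reward can then feed
# a policy-gradient update, with no human labeler in the loop.
samples = ["42", " 42 ", "41"]
rewards = [verifiable_reward(s, "42") for s in samples]
print(rewards)  # → [1.0, 1.0, 0.0]
```

The key property is that the checker is cheap and objective, so the feedback loop scales with compute rather than with human annotation; for fuzzier domains you'd need a learned or subjective verifier, which is exactly the open question above.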

Benjamin Nguyen

nice!
