The number that broke my mental model
13 parameters. That's all researchers at Meta needed to add to a 7-billion-parameter model to push its math accuracy from 76% to 91%.
Not 13 million. Not 13 thousand. Thirteen. Stored in 26 bytes of bf16.
The paper is TinyLoRA (Morris et al., Meta, 2026). They took standard LoRA fine-tuning, pushed rank reduction to the extreme — fixed random tensor projections, aggressive weight tying — until the entire trainable component collapsed to one scalar parameter per layer. Thirteen layers, thirteen parameters.
And it recovered 90% of the improvement from full fine-tuning.
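The paper's exact construction isn't reproduced here, but the shape of the idea can be sketched: freeze everything, including random low-rank projections, and train exactly one scalar per layer. A minimal numpy sketch, with illustrative names and dimensions (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

class ScalarLoRALayer:
    """A linear layer with a frozen base weight and a frozen random
    low-rank update, scaled by a single trainable scalar (alpha)."""

    def __init__(self, d_in, d_out, rank=4):
        # Frozen pretrained weight (stand-in for the base model).
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # Frozen random projections -- never trained.
        self.A = rng.standard_normal((rank, d_in)) / np.sqrt(d_in)
        self.B = rng.standard_normal((d_out, rank)) / np.sqrt(rank)
        # The ONLY trainable parameter in this layer.
        self.alpha = 0.0

    def __call__(self, x):
        # W x + alpha * (B A) x  -- alpha scales a fixed random direction.
        return x @ self.W.T + self.alpha * (x @ self.A.T @ self.B.T)

layers = [ScalarLoRALayer(64, 64) for _ in range(13)]
trainable = len(layers)  # one scalar per layer
print(trainable)         # 13 trainable parameters
print(trainable * 2)     # 26 bytes at 2 bytes per bf16 scalar
```

The point of the sketch: the gradient can only move 13 numbers, so training can only rescale directions that already exist, not write new procedures into the weights.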
The 1,000x gap you should care about
Here's where it gets interesting. The paper compares two training signals:
Supervised Fine-Tuning (SFT): "Here are correct reasoning steps. Copy them."
Reinforcement Learning (RL): "Get the right answer. I don't care how."
With billions of trainable parameters, both work fine. But under extreme constraint — 13 parameters — RL outperforms SFT by 1,000x in parameter efficiency.
Think about why. With 13 parameters, you can't store a reasoning procedure. There isn't room. You literally cannot fit a chain of thought into 26 bytes.
But you can store a steering signal — a nudge that activates reasoning circuits already inside the model.
SFT tries to teach the model how to think. RL tells the model that it should think, and lets existing capabilities handle the rest.
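The contrast can be made concrete with a toy REINFORCE loop: the training signal is only "was the answer right," and a single steering knob learns to activate a behavior that already works. Everything here (the two-behavior policy, the success rates) is an illustrative assumption, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy policy over two behaviors: 0 = answer directly, 1 = reason first.
# Assumption (mirroring the article's framing): the reasoning circuit
# already exists and works -- 90% correct vs 30% for direct answers.
P_CORRECT = np.array([0.3, 0.9])

theta = np.zeros(2)      # the entire "trainable component": a steering knob
lr, baseline = 0.2, 0.5  # REINFORCE with a fixed baseline

for _ in range(3000):
    p = softmax(theta)
    a = rng.choice(2, p=p)
    reward = float(rng.random() < P_CORRECT[a])  # outcome signal only
    grad_logp = np.eye(2)[a] - p                 # d log p(a) / d theta
    theta += lr * (reward - baseline) * grad_logp

p_final = softmax(theta)
print(p_final)  # probability mass shifts toward "reason first"
```

Nothing in the reward says how to reason; it only says whether the outcome was right. The knob converges on "activate the circuit that gets answers right," which is the steering-signal story in miniature.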
What this means for your fine-tuning
If you're fine-tuning models for production, this should change how you think about it.
1. Your model probably already knows how.
A 7B model trained on internet text has seen millions of math problems. The reasoning patterns exist in its weights. The problem isn't missing knowledge — it's that the model doesn't reliably activate the right circuits. Fine-tuning often works not because it teaches new capabilities, but because it adjusts activation patterns.
2. How you specify "correct" matters more than how much data you provide.
SFT says "do it exactly like this." RL says "achieve this outcome." Under constraint, the outcome-specified approach wins by three orders of magnitude.
This generalizes beyond training. When writing prompts, specifying outcomes ("ensure the function handles edge cases") tends to outperform specifying procedures ("first check for null, then validate the type, then..."). The 1,000x gap is the same phenomenon at a different scale.
3. More parameters ≠ better results.
The paper shows that most of what full fine-tuning achieves is reachable with 13 parameters. The other 6,999,999,987 trainable parameters are mostly redundant.
This doesn't mean you should fine-tune with 13 parameters in production. But it should make you ask: do I need that 70B model, or would a well-steered 7B do?
Why constraints reveal structure
This result isn't isolated. The same pattern appears across fields:
CERN's LHC processes particle collisions in 50 nanoseconds using lookup tables — crystallized inference. The extreme time constraint forced a design simpler and more reliable than any neural network could be.
A transformer constrained to 32KB (PDP-11-class hardware) worked equally well on three different number formats. The memory constraint revealed a structural property invisible under normal conditions.
Synthetic pre-training data (pure mathematical structure, zero natural language) produced an LLM that outperformed models trained on 10× more real text.
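The lookup-table idea from the LHC example can be sketched in a few lines: pay the full model cost once, offline, by precomputing outputs over a quantized input grid, then serve lookups with bounded latency. This is just the shape of the idea, nothing like the actual trigger tables:

```python
import numpy as np

# An "expensive" model we want out of the inference path.
def slow_model(x):
    return np.tanh(3.0 * x) * 0.5 + 0.5

# Crystallize it: precompute outputs over a quantized input grid once.
LO, HI, BINS = -1.0, 1.0, 1024
grid = np.linspace(LO, HI, BINS)
table = slow_model(grid)  # one-off cost, paid before deployment

def fast_model(x):
    # Inference is now a single constant-time array lookup.
    i = int((x - LO) / (HI - LO) * (BINS - 1))
    return table[min(max(i, 0), BINS - 1)]

# Worst-case quantization error over a dense probe of the input range.
err = max(abs(fast_model(x) - slow_model(x)) for x in np.linspace(LO, HI, 777))
print(err)
```

The trade is explicit: a small, measurable quantization error in exchange for a latency that cannot vary, which is exactly what a hard real-time budget demands.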
The pattern: extreme constraint doesn't just limit what you can do — it shows you what was always there but hidden when resources were abundant.
With unlimited parameters, SFT and RL look equally effective. The 1,000x gap is invisible. It took 13 parameters to see it.
The takeaway
Next time you reach for more data, more parameters, more compute — pause. Ask yourself: does my model already know how to do this? Would a nudge work better than a lecture?
26 bytes says it probably would.
Source: Morris, Mireshghallah, Ibrahim, Mahloujifar. "TinyLoRA: Learning to Reason in 13 Parameters." Meta, 2026.