Sai Srinivas

Posted on Jan 1

The LLM Control Stack: From Words to Weights

#deeplearning #llm #promptengineering #ai

Most people think they are controlling LLMs.

They tweak prompts.

They add examples.

They play with temperature.

It works.

Until one day it doesn’t.

Then the confusion starts.

“The same prompt worked yesterday. Why is the model ignoring me now?”

Short answer:

You were never controlling the model. You were negotiating with it.

This blog is about understanding how deep different control methods go, and why some always beat others.

The core idea

LLM behavior control is not a switch.

It is a depth ladder.

Each technique touches a different part of the model:

some touch words,
some touch probabilities,
some touch hidden thinking,
some touch weights.

Deeper you go, stronger the control.

Also harder to undo.

Two hard buckets (this matters)

Everything falls into one of these. No exceptions.

Training-time control

Gradients flow
The model actually learns
Behavior sticks across sessions

Inference-time control

No gradients
The model only interprets instructions
Behavior disappears when context ends

If you remember only one thing from this blog, remember this split.

The control stack (strongest to weakest)

Training-time

Pretraining
Supervised Fine-Tuning (SFT)
RLHF and Preference Optimization (DPO, etc.)
PEFT like LoRA and adapters

Inference-time

Soft prompts (learned vectors)
Steering (logit bias, decoding tricks)
System prompt
Few-shot examples

This order is not opinion.

It is enforced by how transformers work.

Image: The LLM Control Stack

[IMAGE PLACEHOLDER: A vertical stack diagram showing the LLM Control Stack.

Top labeled “Deep control (hard to undo)”, bottom labeled “Shallow control (easy to undo)”.

Training-time methods at the top, inference-time methods at the bottom, with a clear horizontal divider labeled “Training-time vs Inference-time boundary”.

Side note: “When control methods conflict, higher layers dominate lower ones.”]

Keep this image in mind.

Every blog in this series zooms into one layer of this stack.

The heatmap (the map you will keep coming back to)

This is the anchor for the entire series.

Read it top to bottom, not left to right.

Control method	When applied	Control strength	How long it lasts	Easy to undo	Cost	Common way it fails
Pretraining	Training	Very high	Permanent	No	Extreme	Locked-in biases
SFT	Training	High	Long	Very hard	High	Silent behavior drift
RLHF / DPO	Training	High	Long	Very hard	High	Over-alignment
LoRA / PEFT	Training	Medium-high	Medium	Yes	Medium	Overfitting to task
Soft prompts	Inference	Medium	Session-level	Yes	Medium	Ignored under conflict
Steering	Inference	Low	Per request	Instant	Low	Weird phrasing
System prompt	Inference	Very low	Per request	Instant	Very low	Context erosion
Few-shot	Inference	Very low	Per request	Instant	Very low	Pattern collapse

Important note:

Control strength here does not mean formatting, verbosity, or tone control.

It means one thing only:

Can this behavior survive long context, paraphrasing, user pushback, and conflicting instructions?

Why this ordering exists (simple reasons)

1. What state you are touching

Weights change defaults
Adapters add skills
Hidden states shape thinking
Logits shape words
Text only suggests behavior

Earlier state always wins.

This is why higher rows dominate lower ones in the heatmap.

2. Gradients vs no gradients

If gradients flow, the model changes.

If they don’t, the model decides whether to listen.

That is why you cannot prompt away safety rules.

You are asking, not changing.

3. Reversibility

The easier it is to undo, the weaker it is.

If you can delete it with one line of code or one prompt change, it does not have deep control.

This is why the bottom half of the heatmap is cheap and fragile.

The hidden rule: conflict resolution

When signals disagree, the model resolves them roughly like this:

What it was trained to do
What attached weights say (LoRA, adapters)
Learned soft prompts
System instructions
Examples
User wording

This single rule explains a lot:

why jailbreaks exist,
why prompt engineering plateaus,
why LoRA beats clever prompting every time.

Keep this rule in mind. We will come back to it often.

Why people reach for the wrong lever

Most people start at the bottom of the stack.

Why?

Prompts are cheap
Prompts feel powerful
Prompts require no commitment

Fine-tuning feels scary.

LoRA sounds complex.

Training sounds expensive.

So people keep pushing prompts harder and harder, even when the problem clearly needs deeper control.

This is not stupidity.

It is rational behavior under uncertainty.

The problem is that prompts fail silently.

They work… until they don’t.

A simple failure example

You write:

“You are a legal analyst. Be formal.”

It works.

Then:

the context gets longer,
the user starts talking casually,
examples contradict tone.

The formality fades.

Nothing broke.

You were just operating too shallow for the behavior you wanted.

Look at the heatmap again.

You were using one of the weakest levers.

What this series will do

Each blog in this series will focus on one layer:

what it really controls,
why it works,
where it breaks,
and how to try it yourself with code when that makes sense.

The goal is not to use the strongest tool.

The goal is to answer this question correctly:

What is the weakest lever that will not fail for my use case?

That is real control.

What’s next

Next, we go to the bottom of the stack.

Why prompts feel powerful.

Why they are fragile.

And why everyone hits the same wall.

Instructions are not control.

DEV Community