DEV Community

Cover image for The LLM Control Stack: From Words to Weights
Sai Srinivas
Sai Srinivas

Posted on

The LLM Control Stack: From Words to Weights

Most people think they are controlling LLMs.

They tweak prompts.

They add examples.

They play with temperature.

It works.

Until one day it doesn’t.

Then the confusion starts.

“The same prompt worked yesterday. Why is the model ignoring me now?”

Short answer:

You were never controlling the model. You were negotiating with it.

This blog is about understanding how deep different control methods go, and why some always beat others.


The core idea

LLM behavior control is not a switch.

It is a depth ladder.

Each technique touches a different part of the model:

  • some touch words,
  • some touch probabilities,
  • some touch hidden thinking,
  • some touch weights.

Deeper you go, stronger the control.

Also harder to undo.


Two hard buckets (this matters)

Everything falls into one of these. No exceptions.

Training-time control

  • Gradients flow
  • The model actually learns
  • Behavior sticks across sessions

Inference-time control

  • No gradients
  • The model only interprets instructions
  • Behavior disappears when context ends

If you remember only one thing from this blog, remember this split.


The control stack (strongest to weakest)

Training-time

  1. Pretraining
  2. Supervised Fine-Tuning (SFT)
  3. RLHF and Preference Optimization (DPO, etc.)
  4. PEFT like LoRA and adapters

Inference-time

  1. Soft prompts (learned vectors)
  2. Steering (logit bias, decoding tricks)
  3. System prompt
  4. Few-shot examples

This order is not opinion.

It is enforced by how transformers work.


Image: The LLM Control Stack

[IMAGE PLACEHOLDER: A vertical stack diagram showing the LLM Control Stack.

Top labeled “Deep control (hard to undo)”, bottom labeled “Shallow control (easy to undo)”.

Training-time methods at the top, inference-time methods at the bottom, with a clear horizontal divider labeled “Training-time vs Inference-time boundary”.

Side note: “When control methods conflict, higher layers dominate lower ones.”]

Keep this image in mind.

Every blog in this series zooms into one layer of this stack.


The heatmap (the map you will keep coming back to)

This is the anchor for the entire series.

Read it top to bottom, not left to right.

Control method When applied Control strength How long it lasts Easy to undo Cost Common way it fails
Pretraining Training Very high Permanent No Extreme Locked-in biases
SFT Training High Long Very hard High Silent behavior drift
RLHF / DPO Training High Long Very hard High Over-alignment
LoRA / PEFT Training Medium-high Medium Yes Medium Overfitting to task
Soft prompts Inference Medium Session-level Yes Medium Ignored under conflict
Steering Inference Low Per request Instant Low Weird phrasing
System prompt Inference Very low Per request Instant Very low Context erosion
Few-shot Inference Very low Per request Instant Very low Pattern collapse

Important note:

Control strength here does not mean formatting, verbosity, or tone control.

It means one thing only:

Can this behavior survive long context, paraphrasing, user pushback, and conflicting instructions?


Why this ordering exists (simple reasons)

1. What state you are touching

  • Weights change defaults
  • Adapters add skills
  • Hidden states shape thinking
  • Logits shape words
  • Text only suggests behavior

Earlier state always wins.

This is why higher rows dominate lower ones in the heatmap.


2. Gradients vs no gradients

If gradients flow, the model changes.

If they don’t, the model decides whether to listen.

That is why you cannot prompt away safety rules.

You are asking, not changing.


3. Reversibility

The easier it is to undo, the weaker it is.

If you can delete it with one line of code or one prompt change, it does not have deep control.

This is why the bottom half of the heatmap is cheap and fragile.


The hidden rule: conflict resolution

When signals disagree, the model resolves them roughly like this:

  1. What it was trained to do
  2. What attached weights say (LoRA, adapters)
  3. Learned soft prompts
  4. System instructions
  5. Examples
  6. User wording

This single rule explains a lot:

  • why jailbreaks exist,
  • why prompt engineering plateaus,
  • why LoRA beats clever prompting every time.

Keep this rule in mind. We will come back to it often.


Why people reach for the wrong lever

Most people start at the bottom of the stack.

Why?

  • Prompts are cheap
  • Prompts feel powerful
  • Prompts require no commitment

Fine-tuning feels scary.

LoRA sounds complex.

Training sounds expensive.

So people keep pushing prompts harder and harder, even when the problem clearly needs deeper control.

This is not stupidity.

It is rational behavior under uncertainty.

The problem is that prompts fail silently.

They work… until they don’t.


A simple failure example

You write:

“You are a legal analyst. Be formal.”

It works.

Then:

  • the context gets longer,
  • the user starts talking casually,
  • examples contradict tone.

The formality fades.

Nothing broke.

You were just operating too shallow for the behavior you wanted.

Look at the heatmap again.

You were using one of the weakest levers.


What this series will do

Each blog in this series will focus on one layer:

  • what it really controls,
  • why it works,
  • where it breaks,
  • and how to try it yourself with code when that makes sense.

The goal is not to use the strongest tool.

The goal is to answer this question correctly:

What is the weakest lever that will not fail for my use case?

That is real control.


What’s next

Next, we go to the bottom of the stack.

Why prompts feel powerful.

Why they are fragile.

And why everyone hits the same wall.

Instructions are not control.

Top comments (0)