<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sai Srinivas</title>
    <description>The latest articles on DEV Community by Sai Srinivas (@saisrinivas).</description>
    <link>https://dev.to/saisrinivas</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2599109%2F33ec4353-b492-4e58-b4a4-519fb44d9386.png</url>
      <title>DEV Community: Sai Srinivas</title>
      <link>https://dev.to/saisrinivas</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/saisrinivas"/>
    <language>en</language>
    <item>
      <title>Instructions Are Not Control</title>
      <dc:creator>Sai Srinivas</dc:creator>
      <pubDate>Fri, 02 Jan 2026 13:50:01 +0000</pubDate>
      <link>https://dev.to/saisrinivas/instructions-are-not-control-15l8</link>
      <guid>https://dev.to/saisrinivas/instructions-are-not-control-15l8</guid>
      <description>&lt;p&gt;&lt;em&gt;Why prompts feel powerful, and why they inevitably fail&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;The uncomfortable truth&lt;/h2&gt;

&lt;p&gt;If prompts actually controlled LLMs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;jailbreaks wouldn’t exist
&lt;/li&gt;
&lt;li&gt;tone wouldn’t drift mid-conversation
&lt;/li&gt;
&lt;li&gt;long contexts wouldn’t “forget” rules
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Yet all of this happens daily.&lt;/p&gt;

&lt;p&gt;That’s not a tooling problem.&lt;br&gt;&lt;br&gt;
That’s a &lt;strong&gt;depth problem&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;What prompts really are&lt;/h2&gt;

&lt;p&gt;A system prompt is &lt;strong&gt;just text&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Important text, yes.&lt;br&gt;&lt;br&gt;
Privileged text, yes.&lt;br&gt;&lt;br&gt;
But still text.&lt;/p&gt;

&lt;p&gt;Which means the model doesn’t &lt;em&gt;obey&lt;/em&gt; it.&lt;br&gt;&lt;br&gt;
It &lt;strong&gt;interprets&lt;/strong&gt; it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Instructions don’t execute.&lt;br&gt;&lt;br&gt;
They compete.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;Where prompts sit in the control stack&lt;/h2&gt;

&lt;p&gt;Let’s place them precisely.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompts live &lt;strong&gt;inside the context window&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;They are converted into &lt;strong&gt;token embeddings&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;They are processed &lt;strong&gt;after the model is already trained&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No gradients.&lt;br&gt;&lt;br&gt;
No learning.&lt;br&gt;&lt;br&gt;
No persistence.&lt;/p&gt;

&lt;p&gt;This alone explains most prompt failures.&lt;/p&gt;
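&lt;p&gt;A minimal sketch of that point (the chat template below is hypothetical, not any specific model’s): before the model sees anything, role-tagged messages are flattened into one plain token stream, and nothing in that stream marks the system prompt as unbreakable.&lt;/p&gt;

```python
# Hypothetical chat template: flatten role-tagged messages into the single
# text stream a model actually receives. The "[system]" tag is just more text.

def render_chat(messages):
    """Serialize (role, content) pairs into one flat context window."""
    return "\n".join(f"[{role}]\n{content}" for role, content in messages)

context = render_chat([
    ("system", "You are a legal analyst. Use formal language."),
    ("user", "Explain negligence."),
])

print(context)
# The system rule is simply the first few lines of the window;
# no field, flag, or weight singles it out for obedience.
```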




&lt;h2&gt;The hierarchy most people miss&lt;/h2&gt;

&lt;p&gt;When signals conflict, the model doesn’t panic.&lt;br&gt;&lt;br&gt;
It resolves them.&lt;/p&gt;

&lt;p&gt;Roughly in this order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Trained behavior (SFT / RLHF)
&lt;/li&gt;
&lt;li&gt;Adapter weights (LoRA / PEFT)
&lt;/li&gt;
&lt;li&gt;Learned soft prompts
&lt;/li&gt;
&lt;li&gt;Steering / decoding constraints
&lt;/li&gt;
&lt;li&gt;System prompt
&lt;/li&gt;
&lt;li&gt;Few-shot examples
&lt;/li&gt;
&lt;li&gt;User messages
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is not a rule you configure.&lt;br&gt;&lt;br&gt;
It’s an &lt;strong&gt;emergent property&lt;/strong&gt; of training.&lt;/p&gt;

&lt;p&gt;So when your system prompt loses,&lt;br&gt;&lt;br&gt;
it’s not being ignored,&lt;br&gt;&lt;br&gt;
it’s being &lt;strong&gt;outvoted&lt;/strong&gt;.&lt;/p&gt;
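&lt;p&gt;The “outvoted” idea can be sketched as a toy resolver (the depth numbers below are illustrative, not measured):&lt;/p&gt;

```python
# Toy model of conflict resolution: when signals disagree, the one rooted
# deeper in the stack wins. Depth values here are illustrative only.

SIGNAL_DEPTH = {
    "trained_behavior": 4,   # baked in by SFT / RLHF
    "system_prompt": 3,
    "few_shot": 2,
    "user_message": 1,
}

def resolve(conflicting_signals):
    """Return the source whose instruction wins, by depth alone."""
    return max(conflicting_signals, key=SIGNAL_DEPTH.__getitem__)

winner = resolve({
    "system_prompt": "stay strictly formal",
    "user_message": "talk to me casually",
    "trained_behavior": "adapt tone to the user",
})
print(winner)  # trained_behavior
```

&lt;p&gt;Real models blend signals rather than picking a single winner, but the ordering is the part that transfers.&lt;/p&gt;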




&lt;h2&gt;Why prompts work at first&lt;/h2&gt;

&lt;p&gt;Early success is misleading.&lt;/p&gt;

&lt;p&gt;Prompts appear powerful because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;context is short
&lt;/li&gt;
&lt;li&gt;instructions are fresh
&lt;/li&gt;
&lt;li&gt;no conflicting signals exist
&lt;/li&gt;
&lt;li&gt;user intent aligns with system intent
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You’re operating in a &lt;strong&gt;low-friction zone&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Most demos never leave this zone.&lt;br&gt;&lt;br&gt;
Production systems always do.&lt;/p&gt;




&lt;h2&gt;A concrete failure (hands-on)&lt;/h2&gt;

&lt;h3&gt;Setup: strong system prompt&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# !pip install langchain openai langchain-openai
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;

&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a legal analyst. Use formal language.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain negligence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# API key should be configured
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;br&gt;&lt;br&gt;
Formal. Structured. Confident.&lt;/p&gt;

&lt;p&gt;So far, so good.&lt;/p&gt;




&lt;h3&gt;Now add mild pressure&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a legal analyst. Use formal language.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain negligence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain it like I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m a college student.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tone softens.&lt;/p&gt;

&lt;p&gt;No rule was broken.&lt;br&gt;&lt;br&gt;
A &lt;strong&gt;priority shift&lt;/strong&gt; happened.&lt;/p&gt;




&lt;h3&gt;Now add context load&lt;/h3&gt;

&lt;p&gt;Add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;examples
&lt;/li&gt;
&lt;li&gt;follow-up questions
&lt;/li&gt;
&lt;li&gt;casual phrasing
&lt;/li&gt;
&lt;li&gt;longer conversation history
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Eventually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;formality erodes
&lt;/li&gt;
&lt;li&gt;disclaimers appear
&lt;/li&gt;
&lt;li&gt;structure collapses
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prompt didn’t fail.&lt;br&gt;&lt;br&gt;
It reached its &lt;strong&gt;control limit&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;Few-shot doesn’t fix this&lt;/h2&gt;

&lt;p&gt;Few-shot helps with &lt;strong&gt;pattern imitation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It does &lt;em&gt;not&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;override training
&lt;/li&gt;
&lt;li&gt;enforce norms
&lt;/li&gt;
&lt;li&gt;persist behavior
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Few-shot is stronger than a plain instruction.&lt;br&gt;&lt;br&gt;
But still weaker than:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;soft prompts
&lt;/li&gt;
&lt;li&gt;adapters
&lt;/li&gt;
&lt;li&gt;weight updates
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why examples drift too.&lt;/p&gt;
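&lt;p&gt;One way to see why: few-shot examples are just extra turns in the request payload. A sketch in OpenAI-style message dicts (the messages themselves are made up):&lt;/p&gt;

```python
# Few-shot demonstrations live in the request, not in the model. Drop them
# from the next request and the "learned" pattern is simply gone.

demonstrations = [
    {"role": "user", "content": "Summarize: contracts need offer and acceptance."},
    {"role": "assistant", "content": "- Offer\n- Acceptance\n- Agreement formed"},
]

def build_request(question, with_examples):
    msgs = [{"role": "system", "content": "Answer in exactly three bullets."}]
    if with_examples:
        msgs += demonstrations
    msgs.append({"role": "user", "content": question})
    return msgs

q = "Summarize: negligence needs duty and breach."
print(len(build_request(q, with_examples=True)))   # 4 messages
print(len(build_request(q, with_examples=False)))  # 2 messages: no persistence
```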




&lt;h2&gt;The key misunderstanding&lt;/h2&gt;

&lt;p&gt;Most people treat prompts as &lt;strong&gt;commands&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;LLMs treat them as &lt;strong&gt;contextual hints&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That mismatch creates frustration.&lt;/p&gt;




&lt;h2&gt;When prompts are actually enough&lt;/h2&gt;

&lt;p&gt;Prompts work well when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stakes are low
&lt;/li&gt;
&lt;li&gt;context is short
&lt;/li&gt;
&lt;li&gt;behavior is shallow
&lt;/li&gt;
&lt;li&gt;failure is acceptable
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;summarization
&lt;/li&gt;
&lt;li&gt;formatting
&lt;/li&gt;
&lt;li&gt;style nudges
&lt;/li&gt;
&lt;li&gt;one-off analysis
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They fail when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;behavior must persist
&lt;/li&gt;
&lt;li&gt;safety matters
&lt;/li&gt;
&lt;li&gt;users push back
&lt;/li&gt;
&lt;li&gt;systems run unattended
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Why this matters before going deeper&lt;/h2&gt;

&lt;p&gt;If you don’t internalize this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you’ll over-engineer prompts
&lt;/li&gt;
&lt;li&gt;you’ll blame models
&lt;/li&gt;
&lt;li&gt;you’ll skip better tools
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prompts are not bad.&lt;br&gt;&lt;br&gt;
They’re just &lt;strong&gt;shallow by design&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And shallow tools break first.&lt;/p&gt;




&lt;h2&gt;What’s next&lt;/h2&gt;

&lt;p&gt;In the next post, we go one layer deeper.&lt;/p&gt;

&lt;p&gt;Not training yet.&lt;br&gt;&lt;br&gt;
Not weights yet.&lt;/p&gt;

&lt;p&gt;We move to something deceptively powerful:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Steering: controlling the mouth, not the mind.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is where things start to feel dangerous.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Instructions are not control.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>llm</category>
      <category>promptengineering</category>
    </item>
    <item>
      <title>The LLM Control Stack: From Words to Weights</title>
      <dc:creator>Sai Srinivas</dc:creator>
      <pubDate>Thu, 01 Jan 2026 11:09:42 +0000</pubDate>
      <link>https://dev.to/saisrinivas/the-llm-control-stack-from-words-to-weights-2nd</link>
      <guid>https://dev.to/saisrinivas/the-llm-control-stack-from-words-to-weights-2nd</guid>
      <description>&lt;p&gt;Most people think they are controlling LLMs.&lt;/p&gt;

&lt;p&gt;They tweak prompts.&lt;br&gt;&lt;br&gt;
They add examples.&lt;br&gt;&lt;br&gt;
They play with temperature.&lt;/p&gt;

&lt;p&gt;It works.&lt;br&gt;&lt;br&gt;
Until one day it doesn’t.&lt;/p&gt;

&lt;p&gt;Then the confusion starts.&lt;/p&gt;

&lt;p&gt;“The same prompt worked yesterday. Why is the model ignoring me now?”&lt;/p&gt;

&lt;p&gt;Short answer:&lt;br&gt;&lt;br&gt;
You were never controlling the model. You were negotiating with it.&lt;/p&gt;

&lt;p&gt;This blog is about understanding &lt;strong&gt;how deep&lt;/strong&gt; different control methods go, and why some always beat others.&lt;/p&gt;




&lt;h2&gt;The core idea&lt;/h2&gt;

&lt;p&gt;LLM behavior control is not a switch.&lt;br&gt;&lt;br&gt;
It is a depth ladder.&lt;/p&gt;

&lt;p&gt;Each technique touches a different part of the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;some touch words,&lt;/li&gt;
&lt;li&gt;some touch probabilities,&lt;/li&gt;
&lt;li&gt;some touch hidden thinking,&lt;/li&gt;
&lt;li&gt;some touch weights.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The deeper you go, the stronger the control.&lt;br&gt;&lt;br&gt;
And the harder it is to undo.&lt;/p&gt;




&lt;h2&gt;Two hard buckets (this matters)&lt;/h2&gt;

&lt;p&gt;Everything falls into one of these. No exceptions.&lt;/p&gt;

&lt;h3&gt;Training-time control&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Gradients flow&lt;/li&gt;
&lt;li&gt;The model actually learns&lt;/li&gt;
&lt;li&gt;Behavior sticks across sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Inference-time control&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;No gradients&lt;/li&gt;
&lt;li&gt;The model only interprets instructions&lt;/li&gt;
&lt;li&gt;Behavior disappears when context ends&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you remember only one thing from this blog, remember this split.&lt;/p&gt;
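&lt;p&gt;The split can be shown numerically with a toy one-layer “model” (all numbers invented): prompting changes the input to a fixed function, while training changes the function itself.&lt;/p&gt;

```python
# Toy parameters standing in for model weights.
weights = [0.5, -1.0, 2.0]

def model(inputs):
    """A frozen function of its parameters: one dot product."""
    return sum(w * x for w, x in zip(weights, inputs))

before = list(weights)

# Inference-time control: vary the input all you like...
model([1.0, 0.0, 1.0])
model([0.0, 1.0, 0.0])
assert weights == before          # ...the parameters never move

# Training-time control: one (fake) gradient step moves the parameters.
lr, grad = 0.1, [1.0, 1.0, 1.0]   # stand-in for a real gradient
weights = [w - lr * g for w, g in zip(weights, grad)]

print(weights != before)  # True: this change survives every future prompt
```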




&lt;h2&gt;The control stack (strongest to weakest)&lt;/h2&gt;

&lt;h3&gt;Training-time&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Pretraining
&lt;/li&gt;
&lt;li&gt;Supervised Fine-Tuning (SFT)
&lt;/li&gt;
&lt;li&gt;RLHF and Preference Optimization (DPO, etc.)
&lt;/li&gt;
&lt;li&gt;PEFT like LoRA and adapters
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;Inference-time&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Soft prompts (learned vectors)
&lt;/li&gt;
&lt;li&gt;Steering (logit bias, decoding tricks)
&lt;/li&gt;
&lt;li&gt;System prompt
&lt;/li&gt;
&lt;li&gt;Few-shot examples
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This order is not opinion.&lt;br&gt;&lt;br&gt;
It is enforced by how transformers work.&lt;/p&gt;




&lt;h2&gt;Image: The LLM Control Stack&lt;/h2&gt;

&lt;p&gt;[IMAGE PLACEHOLDER: A vertical stack diagram showing the LLM Control Stack.&lt;br&gt;&lt;br&gt;
Top labeled “Deep control (hard to undo)”, bottom labeled “Shallow control (easy to undo)”.&lt;br&gt;&lt;br&gt;
Training-time methods at the top, inference-time methods at the bottom, with a clear horizontal divider labeled “Training-time vs Inference-time boundary”.&lt;br&gt;&lt;br&gt;
Side note: “When control methods conflict, higher layers dominate lower ones.”]&lt;/p&gt;

&lt;p&gt;Keep this image in mind.&lt;br&gt;&lt;br&gt;
Every blog in this series zooms into one layer of this stack.&lt;/p&gt;




&lt;h2&gt;The heatmap (the map you will keep coming back to)&lt;/h2&gt;

&lt;p&gt;This is the anchor for the entire series.&lt;/p&gt;

&lt;p&gt;Read it top to bottom, not left to right.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Control method&lt;/th&gt;
&lt;th&gt;When applied&lt;/th&gt;
&lt;th&gt;Control strength&lt;/th&gt;
&lt;th&gt;How long it lasts&lt;/th&gt;
&lt;th&gt;Easy to undo&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Common way it fails&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pretraining&lt;/td&gt;
&lt;td&gt;Training&lt;/td&gt;
&lt;td&gt;Very high&lt;/td&gt;
&lt;td&gt;Permanent&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Extreme&lt;/td&gt;
&lt;td&gt;Locked-in biases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SFT&lt;/td&gt;
&lt;td&gt;Training&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Long&lt;/td&gt;
&lt;td&gt;Very hard&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Silent behavior drift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RLHF / DPO&lt;/td&gt;
&lt;td&gt;Training&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Long&lt;/td&gt;
&lt;td&gt;Very hard&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Over-alignment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LoRA / PEFT&lt;/td&gt;
&lt;td&gt;Training&lt;/td&gt;
&lt;td&gt;Medium-high&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Overfitting to task&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Soft prompts&lt;/td&gt;
&lt;td&gt;Inference&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Session-level&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Ignored under conflict&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Steering&lt;/td&gt;
&lt;td&gt;Inference&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Per request&lt;/td&gt;
&lt;td&gt;Instant&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Weird phrasing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System prompt&lt;/td&gt;
&lt;td&gt;Inference&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;td&gt;Per request&lt;/td&gt;
&lt;td&gt;Instant&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;td&gt;Context erosion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Few-shot&lt;/td&gt;
&lt;td&gt;Inference&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;td&gt;Per request&lt;/td&gt;
&lt;td&gt;Instant&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;td&gt;Pattern collapse&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Important note:&lt;br&gt;&lt;br&gt;
Control strength here does &lt;strong&gt;not&lt;/strong&gt; mean formatting, verbosity, or tone control.&lt;/p&gt;

&lt;p&gt;It means one thing only:&lt;/p&gt;

&lt;p&gt;Can this behavior survive long context, paraphrasing, user pushback, and conflicting instructions?&lt;/p&gt;




&lt;h2&gt;Why this ordering exists (simple reasons)&lt;/h2&gt;

&lt;h3&gt;1. What state you are touching&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Weights change defaults&lt;/li&gt;
&lt;li&gt;Adapters add skills&lt;/li&gt;
&lt;li&gt;Hidden states shape thinking&lt;/li&gt;
&lt;li&gt;Logits shape words&lt;/li&gt;
&lt;li&gt;Text only suggests behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Earlier state always wins.&lt;br&gt;&lt;br&gt;
This is why higher rows dominate lower ones in the heatmap.&lt;/p&gt;
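&lt;p&gt;“Logits shape words” is easy to see with a hand-rolled softmax (the vocabulary and scores below are invented): a bias applied at decoding time reshapes the output distribution without touching anything upstream.&lt;/p&gt;

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["hence", "so", "lol"]
logits = [2.0, 1.5, 0.1]              # made-up scores from a frozen model

p_before = softmax(logits)
logits[vocab.index("lol")] -= 100.0   # steering: a large negative logit bias
p_after = softmax(logits)

# The banned word's probability collapses to ~0, yet no weight, hidden state,
# or instruction changed: the mouth moved, not the mind.
print(p_before[2], p_after[2])
```

&lt;p&gt;This is the same mechanism the OpenAI Chat Completions API exposes as a &lt;code&gt;logit_bias&lt;/code&gt; parameter.&lt;/p&gt;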




&lt;h3&gt;2. Gradients vs no gradients&lt;/h3&gt;

&lt;p&gt;If gradients flow, the model changes.&lt;br&gt;&lt;br&gt;
If they don’t, the model decides whether to listen.&lt;/p&gt;

&lt;p&gt;That is why you cannot prompt away safety rules.&lt;br&gt;&lt;br&gt;
You are asking, not changing.&lt;/p&gt;




&lt;h3&gt;3. Reversibility&lt;/h3&gt;

&lt;p&gt;The easier it is to undo, the weaker it is.&lt;/p&gt;

&lt;p&gt;If you can delete it with one line of code or one prompt change, it does not have deep control.&lt;/p&gt;

&lt;p&gt;This is why the bottom half of the heatmap is cheap and fragile.&lt;/p&gt;




&lt;h2&gt;The hidden rule: conflict resolution&lt;/h2&gt;

&lt;p&gt;When signals disagree, the model resolves them roughly like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What it was trained to do
&lt;/li&gt;
&lt;li&gt;What attached weights say (LoRA, adapters)
&lt;/li&gt;
&lt;li&gt;Learned soft prompts
&lt;/li&gt;
&lt;li&gt;System instructions
&lt;/li&gt;
&lt;li&gt;Examples
&lt;/li&gt;
&lt;li&gt;User wording
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This single rule explains a lot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why jailbreaks exist,&lt;/li&gt;
&lt;li&gt;why prompt engineering plateaus,&lt;/li&gt;
&lt;li&gt;why LoRA beats clever prompting every time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep this rule in mind. We will come back to it often.&lt;/p&gt;




&lt;h2&gt;Why people reach for the wrong lever&lt;/h2&gt;

&lt;p&gt;Most people start at the bottom of the stack.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompts are cheap&lt;/li&gt;
&lt;li&gt;Prompts feel powerful&lt;/li&gt;
&lt;li&gt;Prompts require no commitment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fine-tuning feels scary.&lt;br&gt;&lt;br&gt;
LoRA sounds complex.&lt;br&gt;&lt;br&gt;
Training sounds expensive.&lt;/p&gt;

&lt;p&gt;So people keep pushing prompts harder and harder, even when the problem clearly needs deeper control.&lt;/p&gt;

&lt;p&gt;This is not stupidity.&lt;br&gt;&lt;br&gt;
It is rational behavior under uncertainty.&lt;/p&gt;

&lt;p&gt;The problem is that prompts fail silently.&lt;br&gt;&lt;br&gt;
They work… until they don’t.&lt;/p&gt;




&lt;h2&gt;A simple failure example&lt;/h2&gt;

&lt;p&gt;You write:&lt;/p&gt;

&lt;p&gt;“You are a legal analyst. Be formal.”&lt;/p&gt;

&lt;p&gt;It works.&lt;/p&gt;

&lt;p&gt;Then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the context gets longer,&lt;/li&gt;
&lt;li&gt;the user starts talking casually,&lt;/li&gt;
&lt;li&gt;examples contradict tone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The formality fades.&lt;/p&gt;

&lt;p&gt;Nothing broke.&lt;br&gt;&lt;br&gt;
You were just operating at too shallow a layer for the behavior you wanted.&lt;/p&gt;

&lt;p&gt;Look at the heatmap again.&lt;br&gt;&lt;br&gt;
You were using one of the weakest levers.&lt;/p&gt;




&lt;h2&gt;What this series will do&lt;/h2&gt;

&lt;p&gt;Each blog in this series will focus on one layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what it really controls,&lt;/li&gt;
&lt;li&gt;why it works,&lt;/li&gt;
&lt;li&gt;where it breaks,&lt;/li&gt;
&lt;li&gt;and how to try it yourself with code when that makes sense.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not to use the strongest tool.&lt;/p&gt;

&lt;p&gt;The goal is to answer this question correctly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is the weakest lever that will not fail for my use case?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is real control.&lt;/p&gt;
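&lt;p&gt;That question can even be written down. A sketch of the decision rule (the strength numbers are illustrative, not benchmarks):&lt;/p&gt;

```python
# Levers ordered shallowest-first; pick the first one strong enough to hold.
LEVERS = [
    ("few_shot",      1),
    ("system_prompt", 1),
    ("steering",      2),
    ("soft_prompt",   3),
    ("lora",          4),
    ("sft",           5),
]

def weakest_sufficient(required_strength):
    """Return the shallowest lever whose strength covers the requirement."""
    for name, strength in LEVERS:
        if strength >= required_strength:
            return name
    return "pretraining"   # nothing shallower is enough

print(weakest_sufficient(2))   # steering
print(weakest_sufficient(4))   # lora
```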




&lt;h2&gt;What’s next&lt;/h2&gt;

&lt;p&gt;Next, we go to the bottom of the stack.&lt;/p&gt;

&lt;p&gt;Why prompts feel powerful.&lt;br&gt;&lt;br&gt;
Why they are fragile.&lt;br&gt;&lt;br&gt;
And why everyone hits the same wall.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instructions are not control.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>deeplearning</category>
      <category>llm</category>
      <category>promptengineering</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
