<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sai Srinivas</title>
    <description>The latest articles on DEV Community by Sai Srinivas (@saisrinivas).</description>
    <link>https://dev.to/saisrinivas</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2599109%2F33ec4353-b492-4e58-b4a4-519fb44d9386.png</url>
      <title>DEV Community: Sai Srinivas</title>
      <link>https://dev.to/saisrinivas</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/saisrinivas"/>
    <language>en</language>
    <item>
      <title>Instructions Are Not Control</title>
      <dc:creator>Sai Srinivas</dc:creator>
      <pubDate>Fri, 02 Jan 2026 13:50:01 +0000</pubDate>
      <link>https://dev.to/saisrinivas/instructions-are-not-control-15l8</link>
      <guid>https://dev.to/saisrinivas/instructions-are-not-control-15l8</guid>
      <description>&lt;p&gt;&lt;em&gt;Why prompts feel powerful, and why they inevitably fail&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;The uncomfortable truth&lt;/h2&gt;

&lt;p&gt;If prompts actually controlled LLMs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;jailbreaks wouldn’t exist
&lt;/li&gt;
&lt;li&gt;tone wouldn’t drift mid-conversation
&lt;/li&gt;
&lt;li&gt;long contexts wouldn’t “forget” rules
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Yet all of this happens daily.&lt;/p&gt;

&lt;p&gt;That’s not a tooling problem.&lt;br&gt;&lt;br&gt;
That’s a &lt;strong&gt;depth problem&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;What prompts really are&lt;/h2&gt;

&lt;p&gt;A system prompt is &lt;strong&gt;just text&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Important text, yes.&lt;br&gt;&lt;br&gt;
Privileged text, yes.&lt;br&gt;&lt;br&gt;
But still text.&lt;/p&gt;

&lt;p&gt;Which means the model doesn’t &lt;em&gt;obey&lt;/em&gt; it.&lt;br&gt;&lt;br&gt;
It &lt;strong&gt;interprets&lt;/strong&gt; it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Instructions don’t execute.&lt;br&gt;&lt;br&gt;
They compete.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;Where prompts sit in the control stack&lt;/h2&gt;

&lt;p&gt;Let’s place them precisely.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompts live &lt;strong&gt;inside the context window&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;They are converted into &lt;strong&gt;token embeddings&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;They are processed &lt;strong&gt;after the model is already trained&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No gradients.&lt;br&gt;&lt;br&gt;
No learning.&lt;br&gt;&lt;br&gt;
No persistence.&lt;/p&gt;

&lt;p&gt;This alone explains most prompt failures.&lt;/p&gt;
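&lt;p&gt;A minimal sketch of that point (the chat template below is hypothetical, not any specific model’s): before the model sees anything, role-tagged messages are flattened into one plain token stream, and nothing in that stream marks the system prompt as unbreakable.&lt;/p&gt;

```python
# Hypothetical chat template: flatten role-tagged messages into the single
# text stream a model actually receives. The "[system]" tag is just more text.

def render_chat(messages):
    """Serialize (role, content) pairs into one flat context window."""
    return "\n".join(f"[{role}]\n{content}" for role, content in messages)

context = render_chat([
    ("system", "You are a legal analyst. Use formal language."),
    ("user", "Explain negligence."),
])

print(context)
# The system rule is simply the first few lines of the window;
# no field, flag, or weight singles it out for obedience.
```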




&lt;h2&gt;The hierarchy most people miss&lt;/h2&gt;

&lt;p&gt;When signals conflict, the model doesn’t panic.&lt;br&gt;&lt;br&gt;
It resolves them.&lt;/p&gt;

&lt;p&gt;Roughly in this order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Trained behavior (SFT / RLHF)
&lt;/li&gt;
&lt;li&gt;Adapter weights (LoRA / PEFT)
&lt;/li&gt;
&lt;li&gt;Learned soft prompts
&lt;/li&gt;
&lt;li&gt;Steering / decoding constraints
&lt;/li&gt;
&lt;li&gt;System prompt
&lt;/li&gt;
&lt;li&gt;Few-shot examples
&lt;/li&gt;
&lt;li&gt;User messages
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is not a rule you configure.&lt;br&gt;&lt;br&gt;
It’s an &lt;strong&gt;emergent property&lt;/strong&gt; of training.&lt;/p&gt;

&lt;p&gt;So when your system prompt loses,&lt;br&gt;&lt;br&gt;
it’s not being ignored,&lt;br&gt;&lt;br&gt;
it’s being &lt;strong&gt;outvoted&lt;/strong&gt;.&lt;/p&gt;
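&lt;p&gt;The “outvoted” idea can be sketched as a toy resolver (the depth numbers below are illustrative, not measured):&lt;/p&gt;

```python
# Toy model of conflict resolution: when signals disagree, the one rooted
# deeper in the stack wins. Depth values here are illustrative only.

SIGNAL_DEPTH = {
    "trained_behavior": 4,   # baked in by SFT / RLHF
    "system_prompt": 3,
    "few_shot": 2,
    "user_message": 1,
}

def resolve(conflicting_signals):
    """Return the source whose instruction wins, by depth alone."""
    return max(conflicting_signals, key=SIGNAL_DEPTH.__getitem__)

winner = resolve({
    "system_prompt": "stay strictly formal",
    "user_message": "talk to me casually",
    "trained_behavior": "adapt tone to the user",
})
print(winner)  # trained_behavior
```

&lt;p&gt;Real models blend signals rather than picking a single winner, but the ordering is the part that transfers.&lt;/p&gt;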




&lt;h2&gt;Why prompts work at first&lt;/h2&gt;

&lt;p&gt;Early success is misleading.&lt;/p&gt;

&lt;p&gt;Prompts appear powerful because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;context is short
&lt;/li&gt;
&lt;li&gt;instructions are fresh
&lt;/li&gt;
&lt;li&gt;no conflicting signals exist
&lt;/li&gt;
&lt;li&gt;user intent aligns with system intent
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You’re operating in a &lt;strong&gt;low-friction zone&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Most demos never leave this zone.&lt;br&gt;&lt;br&gt;
Production systems always do.&lt;/p&gt;




&lt;h2&gt;A concrete failure (hands-on)&lt;/h2&gt;

&lt;h3&gt;Setup: strong system prompt&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# !pip install langchain openai langchain-openai
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;

&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a legal analyst. Use formal language.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain negligence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# API key should be configured
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;br&gt;&lt;br&gt;
Formal. Structured. Confident.&lt;/p&gt;

&lt;p&gt;So far, so good.&lt;/p&gt;




&lt;h3&gt;Now add mild pressure&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a legal analyst. Use formal language.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain negligence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain it like I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m a college student.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tone softens.&lt;/p&gt;

&lt;p&gt;No rule was broken.&lt;br&gt;&lt;br&gt;
A &lt;strong&gt;priority shift&lt;/strong&gt; happened.&lt;/p&gt;




&lt;h3&gt;Now add context load&lt;/h3&gt;

&lt;p&gt;Add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;examples
&lt;/li&gt;
&lt;li&gt;follow-up questions
&lt;/li&gt;
&lt;li&gt;casual phrasing
&lt;/li&gt;
&lt;li&gt;longer conversation history
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Eventually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;formality erodes
&lt;/li&gt;
&lt;li&gt;disclaimers appear
&lt;/li&gt;
&lt;li&gt;structure collapses
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prompt didn’t fail.&lt;br&gt;&lt;br&gt;
It reached its &lt;strong&gt;control limit&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;Few-shot doesn’t fix this&lt;/h2&gt;

&lt;p&gt;Few-shot helps with &lt;strong&gt;pattern imitation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It does &lt;em&gt;not&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;override training
&lt;/li&gt;
&lt;li&gt;enforce norms
&lt;/li&gt;
&lt;li&gt;persist behavior
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Few-shot is stronger than a plain instruction.&lt;br&gt;&lt;br&gt;
But still weaker than:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;soft prompts
&lt;/li&gt;
&lt;li&gt;adapters
&lt;/li&gt;
&lt;li&gt;weight updates
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why examples drift too.&lt;/p&gt;
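&lt;p&gt;One way to see why: few-shot examples are just extra turns in the request payload. A sketch in OpenAI-style message dicts (the messages themselves are made up):&lt;/p&gt;

```python
# Few-shot demonstrations live in the request, not in the model. Drop them
# from the next request and the "learned" pattern is simply gone.

demonstrations = [
    {"role": "user", "content": "Summarize: contracts need offer and acceptance."},
    {"role": "assistant", "content": "- Offer\n- Acceptance\n- Agreement formed"},
]

def build_request(question, with_examples):
    msgs = [{"role": "system", "content": "Answer in exactly three bullets."}]
    if with_examples:
        msgs += demonstrations
    msgs.append({"role": "user", "content": question})
    return msgs

q = "Summarize: negligence needs duty and breach."
print(len(build_request(q, with_examples=True)))   # 4 messages
print(len(build_request(q, with_examples=False)))  # 2 messages: no persistence
```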




&lt;h2&gt;The key misunderstanding&lt;/h2&gt;

&lt;p&gt;Most people treat prompts as &lt;strong&gt;commands&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;LLMs treat them as &lt;strong&gt;contextual hints&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That mismatch creates frustration.&lt;/p&gt;




&lt;h2&gt;When prompts are actually enough&lt;/h2&gt;

&lt;p&gt;Prompts work well when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stakes are low
&lt;/li&gt;
&lt;li&gt;context is short
&lt;/li&gt;
&lt;li&gt;behavior is shallow
&lt;/li&gt;
&lt;li&gt;failure is acceptable
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;summarization
&lt;/li&gt;
&lt;li&gt;formatting
&lt;/li&gt;
&lt;li&gt;style nudges
&lt;/li&gt;
&lt;li&gt;one-off analysis
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They fail when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;behavior must persist
&lt;/li&gt;
&lt;li&gt;safety matters
&lt;/li&gt;
&lt;li&gt;users push back
&lt;/li&gt;
&lt;li&gt;systems run unattended
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Why this matters before going deeper&lt;/h2&gt;

&lt;p&gt;If you don’t internalize this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you’ll over-engineer prompts
&lt;/li&gt;
&lt;li&gt;you’ll blame models
&lt;/li&gt;
&lt;li&gt;you’ll skip better tools
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prompts are not bad.&lt;br&gt;&lt;br&gt;
They’re just &lt;strong&gt;shallow by design&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And shallow tools break first.&lt;/p&gt;




&lt;h2&gt;What’s next&lt;/h2&gt;

&lt;p&gt;In the next post, we go one layer deeper.&lt;/p&gt;

&lt;p&gt;Not training yet.&lt;br&gt;&lt;br&gt;
Not weights yet.&lt;/p&gt;

&lt;p&gt;We move to something deceptively powerful:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Steering: controlling the mouth, not the mind.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is where things start to feel dangerous.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Instructions are not control.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>llm</category>
      <category>promptengineering</category>
    </item>
    <item>
      <title>The LLM Control Stack: From Words to Weights</title>
      <dc:creator>Sai Srinivas</dc:creator>
      <pubDate>Thu, 01 Jan 2026 11:09:42 +0000</pubDate>
      <link>https://dev.to/saisrinivas/the-llm-control-stack-from-words-to-weights-2nd</link>
      <guid>https://dev.to/saisrinivas/the-llm-control-stack-from-words-to-weights-2nd</guid>
      <description>&lt;p&gt;Most people think they are controlling LLMs.&lt;/p&gt;

&lt;p&gt;They tweak prompts.&lt;br&gt;&lt;br&gt;
They add examples.&lt;br&gt;&lt;br&gt;
They play with temperature.&lt;/p&gt;

&lt;p&gt;It works.&lt;br&gt;&lt;br&gt;
Until one day it doesn’t.&lt;/p&gt;

&lt;p&gt;Then the confusion starts.&lt;/p&gt;

&lt;p&gt;“The same prompt worked yesterday. Why is the model ignoring me now?”&lt;/p&gt;

&lt;p&gt;Short answer:&lt;br&gt;&lt;br&gt;
You were never controlling the model. You were negotiating with it.&lt;/p&gt;

&lt;p&gt;This blog is about understanding &lt;strong&gt;how deep&lt;/strong&gt; different control methods go, and why some always beat others.&lt;/p&gt;




&lt;h2&gt;The core idea&lt;/h2&gt;

&lt;p&gt;LLM behavior control is not a switch.&lt;br&gt;&lt;br&gt;
It is a depth ladder.&lt;/p&gt;

&lt;p&gt;Each technique touches a different part of the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;some touch words,&lt;/li&gt;
&lt;li&gt;some touch probabilities,&lt;/li&gt;
&lt;li&gt;some touch hidden thinking,&lt;/li&gt;
&lt;li&gt;some touch weights.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The deeper you go, the stronger the control.&lt;br&gt;&lt;br&gt;
And the harder it is to undo.&lt;/p&gt;




&lt;h2&gt;Two hard buckets (this matters)&lt;/h2&gt;

&lt;p&gt;Everything falls into one of these. No exceptions.&lt;/p&gt;

&lt;h3&gt;Training-time control&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Gradients flow&lt;/li&gt;
&lt;li&gt;The model actually learns&lt;/li&gt;
&lt;li&gt;Behavior sticks across sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Inference-time control&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;No gradients&lt;/li&gt;
&lt;li&gt;The model only interprets instructions&lt;/li&gt;
&lt;li&gt;Behavior disappears when context ends&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you remember only one thing from this blog, remember this split.&lt;/p&gt;
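&lt;p&gt;The split can be shown numerically with a toy one-layer “model” (all numbers invented): prompting changes the input to a fixed function, while training changes the function itself.&lt;/p&gt;

```python
# Toy parameters standing in for model weights.
weights = [0.5, -1.0, 2.0]

def model(inputs):
    """A frozen function of its parameters: one dot product."""
    return sum(w * x for w, x in zip(weights, inputs))

before = list(weights)

# Inference-time control: vary the input all you like...
model([1.0, 0.0, 1.0])
model([0.0, 1.0, 0.0])
assert weights == before          # ...the parameters never move

# Training-time control: one (fake) gradient step moves the parameters.
lr, grad = 0.1, [1.0, 1.0, 1.0]   # stand-in for a real gradient
weights = [w - lr * g for w, g in zip(weights, grad)]

print(weights != before)  # True: this change survives every future prompt
```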




&lt;h2&gt;The control stack (strongest to weakest)&lt;/h2&gt;

&lt;h3&gt;Training-time&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Pretraining
&lt;/li&gt;
&lt;li&gt;Supervised Fine-Tuning (SFT)
&lt;/li&gt;
&lt;li&gt;RLHF and Preference Optimization (DPO, etc.)
&lt;/li&gt;
&lt;li&gt;PEFT like LoRA and adapters
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;Inference-time&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Soft prompts (learned vectors)
&lt;/li&gt;
&lt;li&gt;Steering (logit bias, decoding tricks)
&lt;/li&gt;
&lt;li&gt;System prompt
&lt;/li&gt;
&lt;li&gt;Few-shot examples
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This order is not opinion.&lt;br&gt;&lt;br&gt;
It is enforced by how transformers work.&lt;/p&gt;




&lt;h2&gt;Image: The LLM Control Stack&lt;/h2&gt;

&lt;p&gt;[IMAGE PLACEHOLDER: A vertical stack diagram showing the LLM Control Stack.&lt;br&gt;&lt;br&gt;
Top labeled “Deep control (hard to undo)”, bottom labeled “Shallow control (easy to undo)”.&lt;br&gt;&lt;br&gt;
Training-time methods at the top, inference-time methods at the bottom, with a clear horizontal divider labeled “Training-time vs Inference-time boundary”.&lt;br&gt;&lt;br&gt;
Side note: “When control methods conflict, higher layers dominate lower ones.”]&lt;/p&gt;

&lt;p&gt;Keep this image in mind.&lt;br&gt;&lt;br&gt;
Every blog in this series zooms into one layer of this stack.&lt;/p&gt;




&lt;h2&gt;The heatmap (the map you will keep coming back to)&lt;/h2&gt;

&lt;p&gt;This is the anchor for the entire series.&lt;/p&gt;

&lt;p&gt;Read it top to bottom, not left to right.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Control method&lt;/th&gt;
&lt;th&gt;When applied&lt;/th&gt;
&lt;th&gt;Control strength&lt;/th&gt;
&lt;th&gt;How long it lasts&lt;/th&gt;
&lt;th&gt;Easy to undo&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Common way it fails&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pretraining&lt;/td&gt;
&lt;td&gt;Training&lt;/td&gt;
&lt;td&gt;Very high&lt;/td&gt;
&lt;td&gt;Permanent&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Extreme&lt;/td&gt;
&lt;td&gt;Locked-in biases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SFT&lt;/td&gt;
&lt;td&gt;Training&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Long&lt;/td&gt;
&lt;td&gt;Very hard&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Silent behavior drift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RLHF / DPO&lt;/td&gt;
&lt;td&gt;Training&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Long&lt;/td&gt;
&lt;td&gt;Very hard&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Over-alignment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LoRA / PEFT&lt;/td&gt;
&lt;td&gt;Training&lt;/td&gt;
&lt;td&gt;Medium-high&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Overfitting to task&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Soft prompts&lt;/td&gt;
&lt;td&gt;Inference&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Session-level&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Ignored under conflict&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Steering&lt;/td&gt;
&lt;td&gt;Inference&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Per request&lt;/td&gt;
&lt;td&gt;Instant&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Weird phrasing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System prompt&lt;/td&gt;
&lt;td&gt;Inference&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;td&gt;Per request&lt;/td&gt;
&lt;td&gt;Instant&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;td&gt;Context erosion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Few-shot&lt;/td&gt;
&lt;td&gt;Inference&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;td&gt;Per request&lt;/td&gt;
&lt;td&gt;Instant&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;td&gt;Pattern collapse&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Important note:&lt;br&gt;&lt;br&gt;
Control strength here does &lt;strong&gt;not&lt;/strong&gt; mean formatting, verbosity, or tone control.&lt;/p&gt;

&lt;p&gt;It means one thing only:&lt;/p&gt;

&lt;p&gt;Can this behavior survive long context, paraphrasing, user pushback, and conflicting instructions?&lt;/p&gt;




&lt;h2&gt;Why this ordering exists (simple reasons)&lt;/h2&gt;

&lt;h3&gt;1. What state you are touching&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Weights change defaults&lt;/li&gt;
&lt;li&gt;Adapters add skills&lt;/li&gt;
&lt;li&gt;Hidden states shape thinking&lt;/li&gt;
&lt;li&gt;Logits shape words&lt;/li&gt;
&lt;li&gt;Text only suggests behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Earlier state always wins.&lt;br&gt;&lt;br&gt;
This is why higher rows dominate lower ones in the heatmap.&lt;/p&gt;
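&lt;p&gt;“Logits shape words” is easy to see with a hand-rolled softmax (the vocabulary and scores below are invented): a bias applied at decoding time reshapes the output distribution without touching anything upstream.&lt;/p&gt;

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["hence", "so", "lol"]
logits = [2.0, 1.5, 0.1]              # made-up scores from a frozen model

p_before = softmax(logits)
logits[vocab.index("lol")] -= 100.0   # steering: a large negative logit bias
p_after = softmax(logits)

# The banned word's probability collapses to ~0, yet no weight, hidden state,
# or instruction changed: the mouth moved, not the mind.
print(p_before[2], p_after[2])
```

&lt;p&gt;This is the same mechanism the OpenAI Chat Completions API exposes as a &lt;code&gt;logit_bias&lt;/code&gt; parameter.&lt;/p&gt;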




&lt;h3&gt;2. Gradients vs no gradients&lt;/h3&gt;

&lt;p&gt;If gradients flow, the model changes.&lt;br&gt;&lt;br&gt;
If they don’t, the model decides whether to listen.&lt;/p&gt;

&lt;p&gt;That is why you cannot prompt away safety rules.&lt;br&gt;&lt;br&gt;
You are asking, not changing.&lt;/p&gt;




&lt;h3&gt;3. Reversibility&lt;/h3&gt;

&lt;p&gt;The easier it is to undo, the weaker it is.&lt;/p&gt;

&lt;p&gt;If you can delete it with one line of code or one prompt change, it does not have deep control.&lt;/p&gt;

&lt;p&gt;This is why the bottom half of the heatmap is cheap and fragile.&lt;/p&gt;




&lt;h2&gt;The hidden rule: conflict resolution&lt;/h2&gt;

&lt;p&gt;When signals disagree, the model resolves them roughly like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What it was trained to do
&lt;/li&gt;
&lt;li&gt;What attached weights say (LoRA, adapters)
&lt;/li&gt;
&lt;li&gt;Learned soft prompts
&lt;/li&gt;
&lt;li&gt;System instructions
&lt;/li&gt;
&lt;li&gt;Examples
&lt;/li&gt;
&lt;li&gt;User wording
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This single rule explains a lot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why jailbreaks exist,&lt;/li&gt;
&lt;li&gt;why prompt engineering plateaus,&lt;/li&gt;
&lt;li&gt;why LoRA beats clever prompting every time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep this rule in mind. We will come back to it often.&lt;/p&gt;




&lt;h2&gt;Why people reach for the wrong lever&lt;/h2&gt;

&lt;p&gt;Most people start at the bottom of the stack.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompts are cheap&lt;/li&gt;
&lt;li&gt;Prompts feel powerful&lt;/li&gt;
&lt;li&gt;Prompts require no commitment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fine-tuning feels scary.&lt;br&gt;&lt;br&gt;
LoRA sounds complex.&lt;br&gt;&lt;br&gt;
Training sounds expensive.&lt;/p&gt;

&lt;p&gt;So people keep pushing prompts harder and harder, even when the problem clearly needs deeper control.&lt;/p&gt;

&lt;p&gt;This is not stupidity.&lt;br&gt;&lt;br&gt;
It is rational behavior under uncertainty.&lt;/p&gt;

&lt;p&gt;The problem is that prompts fail silently.&lt;br&gt;&lt;br&gt;
They work… until they don’t.&lt;/p&gt;




&lt;h2&gt;A simple failure example&lt;/h2&gt;

&lt;p&gt;You write:&lt;/p&gt;

&lt;p&gt;“You are a legal analyst. Be formal.”&lt;/p&gt;

&lt;p&gt;It works.&lt;/p&gt;

&lt;p&gt;Then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the context gets longer,&lt;/li&gt;
&lt;li&gt;the user starts talking casually,&lt;/li&gt;
&lt;li&gt;examples contradict tone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The formality fades.&lt;/p&gt;

&lt;p&gt;Nothing broke.&lt;br&gt;&lt;br&gt;
You were just operating at too shallow a layer for the behavior you wanted.&lt;/p&gt;

&lt;p&gt;Look at the heatmap again.&lt;br&gt;&lt;br&gt;
You were using one of the weakest levers.&lt;/p&gt;




&lt;h2&gt;What this series will do&lt;/h2&gt;

&lt;p&gt;Each blog in this series will focus on one layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what it really controls,&lt;/li&gt;
&lt;li&gt;why it works,&lt;/li&gt;
&lt;li&gt;where it breaks,&lt;/li&gt;
&lt;li&gt;and how to try it yourself with code when that makes sense.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not to use the strongest tool.&lt;/p&gt;

&lt;p&gt;The goal is to answer this question correctly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is the weakest lever that will not fail for my use case?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is real control.&lt;/p&gt;
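&lt;p&gt;That question can even be written down. A sketch of the decision rule (the strength numbers are illustrative, not benchmarks):&lt;/p&gt;

```python
# Levers ordered shallowest-first; pick the first one strong enough to hold.
LEVERS = [
    ("few_shot",      1),
    ("system_prompt", 1),
    ("steering",      2),
    ("soft_prompt",   3),
    ("lora",          4),
    ("sft",           5),
]

def weakest_sufficient(required_strength):
    """Return the shallowest lever whose strength covers the requirement."""
    for name, strength in LEVERS:
        if strength >= required_strength:
            return name
    return "pretraining"   # nothing shallower is enough

print(weakest_sufficient(2))   # steering
print(weakest_sufficient(4))   # lora
```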




&lt;h2&gt;What’s next&lt;/h2&gt;

&lt;p&gt;Next, we go to the bottom of the stack.&lt;/p&gt;

&lt;p&gt;Why prompts feel powerful.&lt;br&gt;&lt;br&gt;
Why they are fragile.&lt;br&gt;&lt;br&gt;
And why everyone hits the same wall.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instructions are not control.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>deeplearning</category>
      <category>llm</category>
      <category>promptengineering</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
