Brendan

Posted on May 20

Prompt Physics: Building a Cognitive Steering Layer for Gemma 4

#devchallenge #gemmachallenge #gemma #gemma4


This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

explaind is a local-first cognitive steering layer for Gemma 4. It is not a chatbot wrapper, not an agent system, and not a RAG tool. It is a structured prompt-assembly harness that shapes how Gemma 4 reasons — not just what it says.

The core thesis: for instruction-tuned models like Gemma 4, prompt structure is part of the system design. The harness, the template, and the injection positions are not presentation — they are the engineering. explaind makes that engineering explicit, inspectable, and testable.

The system is built around Gemma 4's documented failure modes rather than pretending they do not exist. Weak system prompt adherence, overconfidence, preference for parametric knowledge over injected context, and stochastic output variance are not problems to work around — they are the design brief.

What it does:

| Feature | What it does |
| --- | --- |
| 8 reasoning abilities | Structured bias vectors grounded in cognitive science: `skeptical`, `causal`, `compressive`, `exploratory`, `calibrator`, `devil`, `updater`, `balanced` |
| `--compare` | Same question through multiple abilities side by side |
| `--honest` | Two-pass self-critique: balanced first, then skeptical audit |
| `--chain` | Sequential ability pipeline — each pass transforms the previous output |
| `--consensus N` | Self-consistency aggregation based on Wang et al. (2022) |
| `--scaffold` | Persistent JSON reasoning state across chain passes |
| Three-position BIAS FIELD | Primacy, periodic, and recency steering signals designed around transformer attention position effects |
| 359 passing tests | Prompt geometry is locked down and verified |
| Terminal UX | Rich Markdown rendering, color-coded ability headers, timed spinner, `--dry-run` and `--trace` for iterative inspection — designed for local use |
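The `--chain` row describes a fold over abilities, where each pass consumes the previous pass's output. The following is a minimal sketch of that data flow. It assumes a hypothetical `generate(ability, text)` callable that stands in for one full prompt assembly plus model call. The function names and the wording used to hand output between passes are illustrative and are not taken from explaind.

```python
from typing import Callable, List

# Hypothetical stand-in for one full prompt assembly + model call:
# (ability name, input text) -> model output.
Generate = Callable[[str, str], str]


def run_chain(question: str, abilities: List[str], generate: Generate) -> List[str]:
    """Run abilities in order, feeding each pass the previous pass's output.

    Returns every intermediate output so each pass can be inspected.
    """
    outputs: List[str] = []
    for ability in abilities:
        if not outputs:
            pass_input = question  # the first pass sees only the question
        else:
            # Later passes see the original question plus the prior output,
            # so the chain transforms the previous answer without losing the question.
            pass_input = (
                f"Original question:\n{question}\n\n"
                f"Previous pass output:\n{outputs[-1]}"
            )
        outputs.append(generate(ability, pass_input))
    return outputs


if __name__ == "__main__":
    # Stub model that tags the last input line, making the data flow visible.
    def fake_generate(ability: str, text: str) -> str:
        return f"[{ability}] {text.splitlines()[-1]}"

    for output in run_chain("Was the 2008 crisis preventable?", ["causal", "skeptical"], fake_generate):
        print(output)
```

With the stub model, the second line printed is `[skeptical] [causal] Was the 2008 crisis preventable?`. This shows that the skeptical pass operated on the causal pass's output rather than on the bare question.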

Every design decision maps to a documented Gemma 4 failure mode:

| Gemma 4 Failure Mode | explaind Response |
| --- | --- |
| Weak system prompt adherence | Three-position BIAS FIELD (primacy + periodic + recency) |
| Overconfidence / shallow elaboration | `calibrator` ability + `--honest` two-pass critique |
| Prefers parametric knowledge over input | `--scratchpad` and `--context` injection with semantic headers |
| Stochastic output variance | `--consensus N` self-consistency aggregation (Wang et al. 2022) |
| Reasoning collapse without harness | Cognitive scaffold with persistent JSON state across `--chain` |
| "Sounds smarter than it is" | `devil` adversarial pressure + structured ability specifications |

Demo

Run `explaind --full-demo` to see the full walkthrough live.

Demo 1 — Same question, three reasoning trajectories

"Was the 2008 financial crisis preventable?"

Skeptical interrogates the question's framing before engaging its content:

Surfaced Assumptions Embedded in the Question's Framing:

1. Assumption of Actionability: The term "preventable" implies 
   that there exists a clear, identifiable intervention point...
2. Assumption of Linear Causality: The framing suggests a simple 
   cause-and-effect relationship...
3. Assumption of Moral Culpability: The question implicitly seeks 
   a judgment on whether actors should have acted differently...

Null Hypothesis Test: The null hypothesis which the skeptical 
analysis must test is that the crisis was inevitable...

Causal traces the mechanism backward from outcome to root condition:

Chain Trace (Working Backward):
  Proximate Cause <- Failure of Liquidity/Solvency
  Failure of Liquidity <- Excessive Leverage and Under-Capitalization
  Excessive Leverage <- Lax Risk Management and Regulatory Arbitrage
  Lax Risk Management <- Structural Flaws (Root Conditions)

Trigger vs. Root Separation:
  Root Conditions: deregulation, complex financial instruments,
  failure of regulatory bodies to enforce adequate capital requirements
  Triggering Condition: collapse in the U.S. subprime mortgage market

Devil constructs the strongest opposing case:

The strongest genuine counterargument is that the 2008 crisis was 
not preventable — an inevitable systemic consequence arising from 
inherent structural flaws, complexity, and interconnectedness of 
the global financial architecture.

The strongest version of the opposing case: The 2008 crisis was 
not preventable because it was an emergent property of a highly 
complex, interconnected, and inadequately regulated system.

Three clearly distinct reasoning trajectories. Same model. Same question. Different prompt physics.

Demo 2 — Self-critique (honest mode)

"AI will eliminate most jobs within 10 years."

Initial response (balanced ability) acknowledges the claim and preserves uncertainty. Self-critique (skeptical audit) then interrogates it:

Surfaced Assumptions Embedded in the Claim:

1. Linearity of Technological Trajectory: The claim assumes the 
   current pace of AI will continue without inflection points...
2. Negligible Adaptation Rate: It assumes workforce capacity for 
   reskilling will be insufficient...
3. Stable Definition of "Job": The claim implicitly assumes the 
   concept of a "job" remains relatively stable...

Evidence Gap Analysis:
- No evidence detailing mechanisms by which AI leads to elimination 
  rather than transformation
- No longitudinal data on workforce adaptation programs
- "Most jobs" is undefined — the claim cannot be empirically tested

The self-critique is substantively different from the initial response: not a restatement, but a genuine adversarial audit.

Demo 3 — Calibrated epistemic reasoning

"Is the scientific consensus on climate change settled?"

[HIGH confidence] A broad scientific consensus exists regarding 
the fundamental physics of the greenhouse effect and the role of 
anthropogenic emissions in driving current global warming trends.

[MEDIUM confidence] The consensus is robust regarding the existence 
of human-caused warming, but not entirely "settled" regarding all 
future projections or the precise magnitude of future impacts.

Falsification Conditions:
1. The consensus would be overturned if high-quality independent 
   research definitively demonstrated that primary drivers of 
   warming are not anthropogenic...

Unknown Inventory (What is NOT known):
1. The precise, non-linear tipping points for climate feedback loops
2. The exact socio-economic consequences of various climate scenarios
3. The precise weighting of uncertainty across scientific disciplines

Explicit confidence markers. Named assumptions. Falsification conditions. Unknown inventory. This is calibrated reasoning, not performative hedging.
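The markers follow a fixed bracketed form, so calibrated output can be checked mechanically rather than by eye. The sketch below is minimal. It assumes markers always look exactly like `[HIGH confidence]`, `[MEDIUM confidence]`, or `[LOW confidence]`, and that each claim ends at a blank line or at the next marker. `extract_confidence_claims` is a hypothetical helper and is not part of explaind's test suite.

```python
import re
from typing import List, Tuple

# Assumed marker format: "[LEVEL confidence]" followed by the claim text.
# A claim ends at a blank line, at the next marker, or at the end of the text.
MARKER = re.compile(
    r"\[(HIGH|MEDIUM|LOW) confidence\]\s*(.+?)"
    r"(?=\n\s*\n|\[(?:HIGH|MEDIUM|LOW) confidence\]|\Z)",
    re.DOTALL,
)


def extract_confidence_claims(text: str) -> List[Tuple[str, str]]:
    """Return (level, claim) pairs, with each claim's whitespace collapsed to single spaces."""
    return [(level, " ".join(body.split())) for level, body in MARKER.findall(text)]


if __name__ == "__main__":
    sample = (
        "[HIGH confidence] A broad scientific consensus exists regarding\n"
        "the fundamental physics of the greenhouse effect.\n\n"
        "[MEDIUM confidence] The consensus is not entirely settled regarding\n"
        "all future projections.\n\n"
        "Falsification Conditions:\n1. ..."
    )
    for level, claim in extract_confidence_claims(sample):
        print(level, "->", claim)
```

A test built on this can assert that a calibrator response contains at least one marked claim, or that no claim above MEDIUM appears without a matching falsification condition.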

Code

Full source code and README: brendanddev/explaind

```shell
python3.11 -m venv .venv && source .venv/bin/activate
pip install -e .
ollama pull gemma4-e2b_q4_k_m:latest
explaind --full-demo   # full narrative walkthrough
explaind --demo        # three curated live demos
```

Research Backing

- Wang et al. (2022) — Self-consistency sampling → `--consensus N`
- Liu et al. (2023) — "Lost in the Middle", position effects in long contexts → three-position BIAS FIELD
- Sclar et al. (2024) — Prompt format sensitivity → structured ability file format
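The self-consistency idea from Wang et al. (2022) works in three steps. First, sample several independent completions at a nonzero temperature. Next, reduce each completion to a final answer. Finally, keep the answer that most samples agree on. The sketch below is minimal and assumes each response can be reduced to a short answer string. `extract_answer` (last non-empty line, lowercased) is a hypothetical normalizer, and explaind's `--consensus N` aggregation of free-text reasoning may work differently.

```python
from collections import Counter
from typing import Callable, List, Tuple


def extract_answer(response: str) -> str:
    """Hypothetical normalizer: treat the last non-empty line as the final answer."""
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    return lines[-1].lower() if lines else ""


def self_consistency(sample: Callable[[], str], n: int) -> Tuple[str, float, List[str]]:
    """Draw n samples and majority-vote over their extracted final answers.

    Returns (winning answer, agreement ratio, all extracted answers).
    Ties go to the answer seen first, because Counter.most_common orders
    equal counts by first insertion.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    answers = [extract_answer(sample()) for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n, answers


if __name__ == "__main__":
    # Stub sampler returning canned responses. A real sampler would call the
    # model at a nonzero temperature so the reasoning paths actually differ.
    canned = iter(["...\nPreventable", "...\nNot preventable", "...\npreventable"])
    print(self_consistency(lambda: next(canned), 3))
```

The agreement ratio is also useful as a signal in its own right. A low ratio means the model's answers diverged across samples, which is worth surfacing instead of hiding behind the winning answer.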

How I Used Gemma 4

Model choice: gemma4-e2b_q4_k_m (quantized E2B)

I chose the E2B variant specifically for the edge deployment story: it runs within 8 GB of unified memory, so the entire system works on a MacBook Air with no cloud dependency. E2B is small enough to iterate on quickly but capable enough to show real reasoning differentiation across abilities.

More importantly, E2B's sensitivity to structured prompts is what makes the prompt physics approach work. A less instruction-sensitive model would ignore the BIAS FIELD. A model with perfect instruction following wouldn't need it.

The architecture:

The assembled prompt follows a strict layer order:

```
SYSTEM PROMPT        <- primacy anchor injected here
GEMMA.md             <- universal invariant layer
                     <- periodic refresh #1
ABILITY              <- structured bias vector
                     <- periodic refresh #2
CONTEXT WINDOW       <- scratchpad + context injection
COGNITIVE SCAFFOLD   <- optional, --chain --scaffold only
BIAS FIELD           <- recency position, strongest signal
<user_input>
```

Every layer has a job. The BIAS FIELD appears in three positions because long-context research (Liu et al., 2023) shows that a model uses information more or less reliably depending on where it sits in the prompt, with the start and end of the context favored over the middle. The primacy anchor sets the initial interpretive frame. The periodic refreshes counter drift in long prompts. The recency field is the final and most forceful instruction before user input.
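The layer order above can be sketched as plain string assembly. This is a minimal sketch, not explaind's implementation. The following are all assumptions: the `PromptLayers` fields, the bracketed section labels, the `<user_input>` wrapper tags, and repeating the same BIAS FIELD text verbatim at every position.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PromptLayers:
    system: str
    gemma_md: str                   # universal invariant layer
    ability: str                    # structured bias vector for the chosen ability
    bias_field: str                 # steering text repeated at each position
    context: str = ""               # scratchpad + --context injection
    scaffold: Optional[str] = None  # only used with --chain --scaffold
    user_input: str = ""


def assemble(layers: PromptLayers) -> str:
    """Join layers in a fixed order, placing the bias field at the primacy,
    periodic (twice), and recency positions."""
    parts: List[str] = [
        layers.system,
        f"[PRIMACY ANCHOR]\n{layers.bias_field}",       # primacy: with the system prompt
        layers.gemma_md,
        f"[PERIODIC REFRESH 1]\n{layers.bias_field}",
        layers.ability,
        f"[PERIODIC REFRESH 2]\n{layers.bias_field}",
    ]
    if layers.context:
        parts.append(f"[CONTEXT]\n{layers.context}")
    if layers.scaffold is not None:
        parts.append(f"[COGNITIVE SCAFFOLD]\n{layers.scaffold}")
    parts.append(f"[BIAS FIELD]\n{layers.bias_field}")  # recency: last before input
    parts.append(f"<user_input>\n{layers.user_input}\n</user_input>")
    return "\n\n".join(parts)
```

Keeping the order in one function makes the prompt geometry testable. For example, a test can assert that the recency copy of the bias field is the last steering text before the user input.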

What Gemma 4 specifically enabled:

The `<|think|>` token is Gemma 4-specific. When injected at the correct position, as the first token of the `<start_of_turn>model` turn, it activates Gemma 4's native thinking mode. This was a key discovery during development: in my testing, placing it anywhere else in the prompt had no effect.
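Here is a sketch of that placement when the prompt is hand-assembled. It assumes a Gemma-style template built from `<start_of_turn>` and `<end_of_turn>` markers, with the steering layers placed inside the user turn, as Gemma 2 and 3 templates do. Check the exact Gemma 4 template against the model's published chat template before relying on this. `build_raw_prompt` is a hypothetical helper.

```python
def build_raw_prompt(steering_layers: str, user_text: str, think: bool = True) -> str:
    """Hand-assemble a single-turn Gemma-style prompt.

    Assumption: <start_of_turn>/<end_of_turn> markers, with the assembled
    steering layers prepended to the user turn. With think=True, <|think|>
    is emitted as the first token of the model turn, the placement reported
    above as the only one that works.
    """
    user_turn = f"{steering_layers}\n\n{user_text}" if steering_layers else user_text
    prompt = (
        f"<start_of_turn>user\n{user_turn}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )
    if think:
        prompt += "<|think|>"
    return prompt
```

The returned string is only meaningful if the serving layer passes it through untouched. Otherwise the model never sees `<|think|>` in that position.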

What I discovered empirically:

Building against a real running model produced findings that no static analysis would have caught:

1. The implicit thinking channel finding. At `temperature=0.0`, Gemma 4 routes the analytical abilities' reasoning entirely through implicit `<|channel>thought` blocks, even without `<|think|>`. The model reasons correctly, but that reasoning never reaches the visible answer; only the input question echoes back. Explicit `<|think|>` activation makes the reasoning visible.

2. The Ollama double-wrapping issue. Without `raw: true` in the Ollama API call, hand-assembled chat template markers get wrapped inside Ollama's own template, producing a nested, broken turn structure. Every claim about the chat template's effectiveness was contingent on this flag being set correctly.

3. The scaffold drift pattern. The cognitive scaffold works when the model complies with the JSON update instruction. Passes 2 and 3 of a chain showed turn-token contamination (`</start_of_turn>`) bleeding into the JSON output and corrupting the parse. Graceful degradation handles this (`drift_detected` is set and the output is preserved), but it is a real failure mode.
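Findings 2 and 3 sit at the two ends of a single model call: the raw prompt going out and the scaffold JSON coming back. The sketch below covers both using only the standard library. It assumes Ollama's default local endpoint and its `/api/generate` fields (`raw`, `stream`, `options`). The names `build_payload`, `generate`, and `parse_scaffold` are hypothetical, as is the list of stripped turn tokens, and explaind's own client and parser may differ.

```python
import json
import re
import urllib.request
from dataclasses import dataclass
from typing import Any, Dict, Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(model: str, prompt: str, temperature: float = 0.0) -> Dict[str, Any]:
    """Request body for a hand-templated prompt (finding 2).

    raw=True tells Ollama to skip its own chat template, so the hand-assembled
    turn markers are sent once instead of being wrapped a second time.
    """
    return {
        "model": model,
        "prompt": prompt,
        "raw": True,
        "stream": False,  # return one JSON object rather than streamed chunks
        "options": {"temperature": temperature},
    }


def generate(model: str, prompt: str, temperature: float = 0.0) -> str:
    """POST the payload to a running Ollama server and return the generated text."""
    body = json.dumps(build_payload(model, prompt, temperature)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


# Turn markers that can leak into generated text (finding 3); this list is an assumption.
TURN_TOKENS = re.compile(r"</?start_of_turn>|<end_of_turn>")


@dataclass
class ScaffoldUpdate:
    state: Dict[str, Any]
    raw_output: str
    drift_detected: bool = False


def parse_scaffold(raw_output: str, previous_state: Dict[str, Any]) -> ScaffoldUpdate:
    """Parse a JSON scaffold update, degrading gracefully instead of raising.

    Leaked turn tokens are stripped and flagged as drift. If no JSON object
    can be recovered, the previous state is kept and drift_detected is set.
    The raw output is preserved either way.
    """
    contaminated = TURN_TOKENS.search(raw_output) is not None
    cleaned = TURN_TOKENS.sub("", raw_output)
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start != -1 and end > start:
        state: Optional[Any]
        try:
            state = json.loads(cleaned[start : end + 1])
        except json.JSONDecodeError:
            state = None
        if isinstance(state, dict):
            return ScaffoldUpdate(state, raw_output, drift_detected=contaminated)
    return ScaffoldUpdate(dict(previous_state), raw_output, drift_detected=True)
```

This sketch sets `drift_detected` even when stripping the leaked token rescues the JSON. That is a deliberate choice: contamination stays visible rather than being silently repaired.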

The failure documentation:

The README has an explicit "Where explaind Fails" section. Ability steering changes reasoning style more reliably than factual accuracy. `--consensus` with N > 3 is slow on 8 GB hardware. The three-position BIAS FIELD improves consistency but cannot guarantee instruction following. These are not marketing-friendly statements. They are part of the project.
