This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
explaind is a local-first cognitive steering layer for Gemma 4. It is not a chatbot wrapper, not an agent system, and not a RAG tool. It is a structured prompt-assembly harness that shapes how Gemma 4 reasons — not just what it says.
The core thesis: for instruction-tuned models like Gemma 4, prompt structure is part of the system design. The harness, the template, and the injection positions are not presentation — they are the engineering. explaind makes that engineering explicit, inspectable, and testable.
The system is built around Gemma 4's documented failure modes rather than pretending they do not exist. Weak system prompt adherence, overconfidence, preference for parametric knowledge over injected context, and stochastic output variance are not problems to work around — they are the design brief.
What it does:
| Feature | What it does |
|---|---|
| 8 reasoning abilities | Structured bias vectors grounded in cognitive science: skeptical, causal, compressive, exploratory, calibrator, devil, updater, balanced |
| --compare | Same question through multiple abilities side by side |
| --honest | Two-pass self-critique: balanced first, then skeptical audit |
| --chain | Sequential ability pipeline; each pass transforms the previous output |
| --consensus N | Self-consistency aggregation based on Wang et al. (2022) |
| --scaffold | Persistent JSON reasoning state across chain passes |
| Three-position BIAS FIELD | Primacy, periodic, and recency steering signals designed around transformer attention position effects |
| 359 passing tests | Prompt geometry is locked down and verified |
| Terminal UX | Rich Markdown rendering, color-coded ability headers, timed spinner, --dry-run and --trace for iterative inspection — designed for local use |
Every design decision maps to a documented Gemma 4 failure mode:
| Gemma 4 Failure Mode | explaind Response |
|---|---|
| Weak system prompt adherence | Three-position BIAS FIELD (primacy + periodic + recency) |
| Overconfidence / shallow elaboration | calibrator ability + --honest two-pass critique |
| Prefers parametric knowledge over input | --scratchpad and --context injection with semantic headers |
| Stochastic output variance | --consensus N self-consistency aggregation (Wang et al. 2022) |
| Reasoning collapse without harness | Cognitive scaffold with persistent JSON state across --chain |
| "Sounds smarter than it is" | devil adversarial pressure + structured ability specifications |
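The --chain, --honest, and --compare modes in the tables above can be pictured as small compositions over a single model call. This is a minimal sketch, not explaind's actual code: the `generate(ability, text)` callable, the function names, and the choice to pass only the previous output into the next pass are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (ability_name, input_text) -> model output.
Generate = Callable[[str, str], str]


def run_chain(generate: Generate, abilities: List[str], question: str) -> List[str]:
    """Run abilities in order; each pass receives the previous pass's output."""
    outputs: List[str] = []
    current = question
    for ability in abilities:
        current = generate(ability, current)
        outputs.append(current)
    return outputs


def run_honest(generate: Generate, question: str) -> List[str]:
    """Two passes: a balanced answer, then a skeptical audit of that answer."""
    return run_chain(generate, ["balanced", "skeptical"], question)


def run_compare(generate: Generate, abilities: List[str], question: str) -> Dict[str, str]:
    """Send the same question through each ability independently."""
    return {ability: generate(ability, question) for ability in abilities}


def fake_generate(ability: str, text: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return f"[{ability}] {text}"


if __name__ == "__main__":
    for step in run_honest(fake_generate, "AI will eliminate most jobs within 10 years."):
        print(step)
```

Under these assumptions, --honest is just a two-element chain, which is why the self-critique pass sees the first answer rather than the raw question.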
Demo
Run explaind --full-demo to see the full walkthrough live.
Demo 1 — Same question, three reasoning trajectories
"Was the 2008 financial crisis preventable?"
Skeptical interrogates the question's framing before engaging its content:
Surfaced Assumptions Embedded in the Question's Framing:
1. Assumption of Actionability: The term "preventable" implies
that there exists a clear, identifiable intervention point...
2. Assumption of Linear Causality: The framing suggests a simple
cause-and-effect relationship...
3. Assumption of Moral Culpability: The question implicitly seeks
a judgment on whether actors should have acted differently...
Null Hypothesis Test: The null hypothesis which the skeptical
analysis must test is that the crisis was inevitable...
Causal traces the mechanism backward from outcome to root condition:
Chain Trace (Working Backward):
Proximate Cause <- Failure of Liquidity/Solvency
Failure of Liquidity <- Excessive Leverage and Under-Capitalization
Excessive Leverage <- Lax Risk Management and Regulatory Arbitrage
Lax Risk Management <- Structural Flaws (Root Conditions)
Trigger vs. Root Separation:
Root Conditions: deregulation, complex financial instruments,
failure of regulatory bodies to enforce adequate capital requirements
Triggering Condition: collapse in the U.S. subprime mortgage market
Devil constructs the strongest opposing case:
The strongest genuine counterargument is that the 2008 crisis was
not preventable — an inevitable systemic consequence arising from
inherent structural flaws, complexity, and interconnectedness of
the global financial architecture.
The strongest version of the opposing case: The 2008 crisis was
not preventable because it was an emergent property of a highly
complex, interconnected, and inadequately regulated system.
Three clearly distinct reasoning trajectories. Same model. Same question. Different prompt physics.
Demo 2 — Self-critique (honest mode)
"AI will eliminate most jobs within 10 years."
Initial response (balanced ability) acknowledges the claim and preserves uncertainty. Self-critique (skeptical audit) then interrogates it:
Surfaced Assumptions Embedded in the Claim:
1. Linearity of Technological Trajectory: The claim assumes the
current pace of AI will continue without inflection points...
2. Negligible Adaptation Rate: It assumes workforce capacity for
reskilling will be insufficient...
3. Stable Definition of "Job": The claim implicitly assumes the
concept of a "job" remains relatively stable...
Evidence Gap Analysis:
- No evidence detailing mechanisms by which AI leads to elimination
rather than transformation
- No longitudinal data on workforce adaptation programs
- "Most jobs" is undefined — the claim cannot be empirically tested
The self-critique is substantively different from the initial response — not a restatement, a genuine adversarial audit.
Demo 3 — Calibrated epistemic reasoning
"Is the scientific consensus on climate change settled?"
[HIGH confidence] A broad scientific consensus exists regarding
the fundamental physics of the greenhouse effect and the role of
anthropogenic emissions in driving current global warming trends.
[MEDIUM confidence] The consensus is robust regarding the existence
of human-caused warming, but not entirely "settled" regarding all
future projections or the precise magnitude of future impacts.
Falsification Conditions:
1. The consensus would be overturned if high-quality independent
research definitively demonstrated that primary drivers of
warming are not anthropogenic...
Unknown Inventory (What is NOT known):
1. The precise, non-linear tipping points for climate feedback loops
2. The exact socio-economic consequences of various climate scenarios
3. The precise weighting of uncertainty across scientific disciplines
Explicit confidence markers. Named assumptions. Falsification conditions. Unknown inventory. This is calibrated reasoning, not performative hedging.
Code
Full source code and README: brendanddev/explaind
python3.11 -m venv .venv && source .venv/bin/activate
pip install -e .
ollama pull gemma4-e2b_q4_k_m:latest
explaind --full-demo # full narrative walkthrough
explaind --demo # three curated live demos
Research Backing
- Wang et al. (2022): self-consistency sampling → --consensus N
- Liu et al. (2023): Lost in the Middle, position effects in long contexts → three-position BIAS FIELD
- Sclar et al. (2024): prompt format sensitivity → structured ability file format
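To make the self-consistency idea behind --consensus N concrete, here is a minimal sketch of majority voting over N sampled answers. It assumes each sample can be reduced to a short comparable answer string; how explaind actually aggregates free-form outputs is not described here, so `normalize` and `consensus` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Tuple


def normalize(answer: str) -> str:
    """Collapse case and whitespace so trivially different answers match."""
    return " ".join(answer.lower().split())


def consensus(sample: Callable[[], str], n: int) -> Tuple[str, float]:
    """Draw n samples and return the most common answer with its vote share."""
    if n < 1:
        raise ValueError("n must be at least 1")
    answers = [normalize(sample()) for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n


if __name__ == "__main__":
    # Canned answers stand in for stochastic model samples.
    canned = iter(["Yes", "no", "yes ", "YES", "no"])
    print(consensus(lambda: next(canned), 5))
```

The vote share is a rough agreement signal: a low share suggests the samples disagree and the winning answer deserves less trust.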
How I Used Gemma 4
Model choice: gemma4-e2b_q4_k_m (quantized E2B)
I chose the E2B variant specifically for the edge deployment story — it runs on 8GB unified memory, which means the entire system works on a MacBook Air with no cloud dependency. The E2B is small enough to iterate with but capable enough to show real reasoning differentiation across abilities.
More importantly, E2B's sensitivity to structured prompts is what makes the prompt physics approach work. A less instruction-sensitive model would ignore the BIAS FIELD. A model with perfect instruction following wouldn't need it.
The architecture:
The assembled prompt follows a strict layer order:
SYSTEM PROMPT <- primacy anchor injected here
GEMMA.md <- universal invariant layer
<- periodic refresh #1
ABILITY <- structured bias vector
<- periodic refresh #2
CONTEXT WINDOW <- scratchpad + context injection
COGNITIVE SCAFFOLD <- optional, --chain --scaffold only
BIAS FIELD <- recency position, strongest signal
<user_input>
Every layer has a job. The BIAS FIELD appears in three positions because work on position effects in long contexts (Liu et al. 2023) suggests that material at the start and end of a prompt is used more reliably than material in the middle. The primacy anchor sets the initial interpretive frame. The periodic refreshes fight drift in long prompts. The recency field is the final, most forceful instruction before user input.
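The layer order above can be sketched as a single assembly function. This is an illustration under stated assumptions rather than the project's code: the function name, its parameters, the blank-line separator, and reusing one bias string for the primacy, periodic, and recency slots are all hypothetical.

```python
from typing import List, Optional


def assemble_prompt(
    system_prompt: str,
    gemma_md: str,
    ability: str,
    bias_field: str,
    user_input: str,
    context: str = "",
    scaffold: Optional[str] = None,
) -> str:
    """Join the layers in the documented order, repeating the bias signal."""
    layers: List[str] = [
        f"{system_prompt}\n{bias_field}",  # system prompt with primacy anchor
        gemma_md,                          # universal invariant layer
        bias_field,                        # periodic refresh #1
        ability,                           # structured bias vector
        bias_field,                        # periodic refresh #2
    ]
    if context:
        layers.append(context)             # scratchpad + context injection
    if scaffold:
        layers.append(scaffold)            # only with --chain --scaffold
    layers.append(bias_field)              # recency position
    # Assumption: the user text follows a bare <user_input> marker.
    layers.append(f"<user_input>\n{user_input}")
    return "\n\n".join(layers)


if __name__ == "__main__":
    print(assemble_prompt("SYS", "GEMMA", "ABILITY", "BIAS", "QUESTION"))
```

Keeping assembly in one pure function is what makes the prompt geometry testable: the order of layers can be asserted without calling the model.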
What Gemma 4 specifically enabled:
The <|think|> token is Gemma 4-specific. When injected at the correct position — as the first token of <start_of_turn>model — it activates Gemma 4's native thinking mode. This was a key discovery during development: placing it anywhere else in the prompt has no effect.
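A hand-assembled turn with the thinking token in that position might look like the sketch below. The `<start_of_turn>` and `<end_of_turn>` markers follow the chat template of earlier Gemma releases and are assumed to carry over; placing `<|think|>` directly after the `model` line break is likewise an assumption. The payload uses fields from Ollama's `/api/generate` endpoint, with `raw` set so Ollama does not apply its own template on top. `build_turn` and `build_payload` are hypothetical helper names.

```python
import json


def build_turn(prompt: str, think: bool = True) -> str:
    """Wrap an assembled prompt in turn markers, optionally opening with <|think|>."""
    model_open = "<start_of_turn>model\n"
    if think:
        # Assumed placement: first token of the model turn.
        model_open += "<|think|>"
    return f"<start_of_turn>user\n{prompt}<end_of_turn>\n{model_open}"


def build_payload(model: str, raw_prompt: str, temperature: float = 0.0) -> dict:
    """Build a request body for Ollama's /api/generate with templating disabled."""
    return {
        "model": model,
        "prompt": raw_prompt,
        "raw": True,      # send the hand-assembled markers untouched
        "stream": False,
        "options": {"temperature": temperature},
    }


if __name__ == "__main__":
    turn = build_turn("Was the 2008 financial crisis preventable?")
    payload = build_payload("gemma4-e2b_q4_k_m:latest", turn)
    # This body would be POSTed to http://localhost:11434/api/generate.
    print(json.dumps(payload, indent=2))
```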
What I discovered empirically:
Building against a real running model produced findings that no static analysis would have caught:
1. The implicit thinking channel finding. At temperature=0.0, Gemma 4's analytical abilities route reasoning entirely through implicit <|channel>thought blocks even without <|think|>. The model reasons correctly but the output layer suppresses it — only the input question echoes back. Explicit <|think|> activation makes that reasoning visible.
2. The Ollama double-wrapping issue. Without raw: true in the Ollama API call, hand-assembled chat template markers get wrapped inside Ollama's own template application — producing nested broken turn structure. Every claim about the chat template's effectiveness was contingent on this flag being set correctly.
3. The scaffold drift pattern. The cognitive scaffold works when the model complies with the JSON update instruction. Passes 2-3 of a chain showed turn token contamination (</start_of_turn>) bleeding into the JSON output and corrupting the parse. Graceful degradation handles this — drift_detected is set and output is preserved — but it is a real failure mode.
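The graceful degradation described in the third finding can be sketched as follows. Only the `drift_detected` flag and the idea of preserving the output come from the description above; the `ScaffoldState` class, its other fields, and the `update_scaffold` function are hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ScaffoldState:
    """Reasoning state carried between chain passes (illustrative shape)."""
    data: dict = field(default_factory=dict)
    drift_detected: bool = False
    raw_output: str = ""


def update_scaffold(previous: ScaffoldState, model_output: str) -> ScaffoldState:
    """Parse the model's JSON update; on failure keep the old state and flag drift."""
    try:
        parsed = json.loads(model_output.strip())
        if not isinstance(parsed, dict):
            raise ValueError("scaffold update must be a JSON object")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return ScaffoldState(
            data=dict(previous.data),   # last good state survives
            drift_detected=True,
            raw_output=model_output,    # output is preserved, not discarded
        )
    return ScaffoldState(data=parsed, drift_detected=False, raw_output=model_output)


if __name__ == "__main__":
    state = update_scaffold(ScaffoldState(), '{"claims": ["a"]}')
    state = update_scaffold(state, '{"claims": ["a", "b"]}</start_of_turn>')
    print(state.drift_detected, state.data)
```

A trailing turn token after otherwise valid JSON is enough to fail the parse, which matches the contamination pattern seen in passes 2-3.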
The failure documentation:
The README has an explicit "Where explaind Fails" section. Ability steering changes reasoning style more reliably than factual accuracy. Consensus at N>3 is slow on 8GB hardware. The three-position BIAS FIELD improves consistency but cannot guarantee instruction following. These are not marketing-friendly statements. They are part of the project.




