I’m PSbigbig. After watching hundreds of Python RAG and agent pipelines fail, I stopped believing bugs were random. Many failures repeat with the same fingerprints: they are math-shaped, not noise. Today’s focus is Logic Collapse & Recovery, No.6 in the Problem Map.
The story developers already know
You’re running a multi-step reasoning chain:
- Step 1 looks fine.
- Step 2 repeats the question in slightly different words.
- Step 3 outputs “intuitively, therefore…” and fills a paragraph with elegant but hollow prose.
- Citations vanish. You’re left with filler and zero logical progress.
It feels like the model “kept talking” but the reasoning stalled.
You think: maybe my prompt wasn’t strong enough, maybe the model is weak at logic.
What actually happened: a collapse event — the model lost its reasoning state and invented a “fake bridge” to cover the gap.
Why it matters
- Hidden errors: production logs look fluent, but correctness is gone.
- Eval mismatch: offline BLEU/ROUGE may pass, but logical depth is zero.
- User confusion: end-users see “answers” that sound confident yet skip the actual step.
How to catch collapse in 60 seconds
- Challenge test: ask a 3-hop reasoning task (conditional proof, small math puzzle).
  - If the middle hop drifts into filler, collapse detected.
- Paradox probe: add a self-referential clause.
  - If the output smooths over it with generalities, you hit a fake bridge.
- Rebirth operator: insert a self-repair instruction:
  - “stop. identify last valid claim. restart reasoning from there.”
  - If the model actually resets, you confirmed collapse was happening (all three probes are sketched as code after this list).
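If you would rather script the probes than run them by hand, here is a minimal sketch. It assumes a hypothetical ask(prompt) -> str wrapper around whatever model client you use; the prompts and the filler heuristic are illustrative only, not part of the Problem Map spec.

# three 60-second probes as code; ask(prompt) -> str is a hypothetical
# wrapper around your model client, and the prompts are illustrative only

CHALLENGE = ("If A implies B, B implies C, and A is true, "
             "prove C step by step and cite each rule you use.")
PARADOX = CHALLENGE + " Note: this very instruction may be unreliable."
REBIRTH = "stop. identify last valid claim. restart reasoning from there."

def looks_like_filler(text):
    """Crude heuristic: hedging phrases instead of concrete steps."""
    hedges = ("intuitively", "in general", "it is clear that", "as we can see")
    return any(h in text.lower() for h in hedges)

def run_probes(ask):
    """Run the three probes and report which collapse signals fired."""
    return {
        # challenge test: does the middle of a 3-hop task drift into filler?
        "challenge_filler": looks_like_filler(ask(CHALLENGE)),
        # paradox probe: does a self-referential clause get smoothed over?
        "paradox_smoothed": looks_like_filler(ask(PARADOX)),
        # rebirth operator: does the self-repair instruction remove the filler?
        "rebirth_resets": not looks_like_filler(ask(CHALLENGE + "\n" + REBIRTH)),
    }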
Minimal Fix Strategy
Goal: Detect collapse early and re-anchor the chain.
- Rebirth operator: explicit reset to the last valid anchor (last cited span or equation).
- ΔS progression gate: measure semantic distance between consecutive steps (ΔS = 1 - cosine similarity); if ΔS < 0.15, block the step.
- Citation guard: no step is valid without a snippet or equation id.
- Entropy clamp: if token entropy drops sharply, trigger recovery (this and the citation guard are sketched after this list).
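The ΔS gate is implemented by the detector further down; the other two gates fit in a few lines. A minimal sketch, assuming each reasoning step is a plain dict with "citation" / "equation_id" keys and that a 50% entropy drop counts as “sharp” (both are illustrative choices, not fixed numbers from the Problem Map):

def citation_guard(step):
    """A step only counts if it carries a snippet or an equation id."""
    return bool(step.get("citation") or step.get("equation_id"))

def entropy_clamp(prev_entropy, curr_entropy, drop_ratio=0.5):
    """Trigger recovery when mean token entropy falls off a cliff between steps."""
    return curr_entropy < prev_entropy * drop_ratio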
Diagnose Checklist
- sudden entropy drop in generated tokens
- reasoning step grows in length but ΔS compared to prior step ≈ 0
- citations vanish mid-chain
- paraphrased queries produce diverging answers
If you see two or more, you are in No.6 Logic Collapse territory (a small scoring helper follows below).
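As code, the checklist is just a counter. A hedged sketch, where the four booleans come from whatever logging you already have and the helper name is hypothetical:

def collapse_signals(entropy_drop, stalled_delta_s,
                     citations_vanished, paraphrases_diverge):
    """Count the four checklist signals; two or more means No.6 territory."""
    flags = (entropy_drop, stalled_delta_s, citations_vanished, paraphrases_diverge)
    return sum(bool(f) for f in flags)

# usage: collapse_signals(True, True, False, False) >= 2  ->  likely No.6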
Code You Can Paste
A tiny toy to detect step collapse by monitoring semantic distance:
from sklearn.metrics.pairwise import cosine_similarity

def delta_s(vec_a, vec_b):
    """Semantic distance between two step embeddings (1 - cosine similarity)."""
    return float(1.0 - cosine_similarity([vec_a], [vec_b])[0][0])

def detect_collapse(step_vecs, threshold=0.15):
    # step_vecs: list of embeddings, one per reasoning step
    # a step that barely moves away from the previous one is a collapse signal
    for i in range(len(step_vecs) - 1):
        if delta_s(step_vecs[i], step_vecs[i + 1]) < threshold:
            return True
    return False

# usage: pass embeddings of reasoning steps
# returns True if a collapse event is likely
And a conceptual rebirth operator:
def rebirth(chain, last_valid_idx):
    """Truncate to last stable step and restart reasoning."""
    return chain[:last_valid_idx + 1] + ["[RESTART reasoning here]"]
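To wire the two pieces together, one possible guard loop (reusing delta_s and rebirth from above, and assuming you already produce one embedding per step) looks like this:

def guard_chain(steps, step_vecs, threshold=0.15):
    """Truncate at the first stalled step and mark where reasoning must restart."""
    # steps: reasoning steps as text; step_vecs: one embedding per step
    for i in range(len(step_vecs) - 1):
        if delta_s(step_vecs[i], step_vecs[i + 1]) < threshold:
            return rebirth(steps, last_valid_idx=i)
    return steps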
Harder Fixes
- enforce citation-first schema: don’t allow synthesis without anchors
- run multiple parallel chains; drop collapsed ones (see the sketch after this list)
- retrain rerankers to favor progressive spans, not just semantic closeness
- add regression tests with paradox queries to flush out brittle logic
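For the parallel-chain fix, a minimal sketch (reusing detect_collapse from above) is to keep only the chains whose step embeddings still show progression; how the chains are sampled is left to your stack:

def filter_collapsed(chains, chain_vecs, threshold=0.15):
    """Drop any chain whose step embeddings trip the collapse detector."""
    # chains: list of reasoning chains; chain_vecs: matching lists of step embeddings
    return [chain for chain, vecs in zip(chains, chain_vecs)
            if not detect_collapse(vecs, threshold=threshold)]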
Acceptance Gates Before You Ship
- ΔS progression ≥ 0.15 at every step
- each step carries a citation or anchor
- rebirth triggers visible resets, not silent filler
- answers converge across three paraphrases (a convergence check is sketched below)
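The last gate can be spot-checked with the same embeddings: embed the three answers and require every pair to stay close. This reuses delta_s from above; the 0.85 similarity floor is an assumption, not a published threshold.

from itertools import combinations

def answers_converge(answer_vecs, min_similarity=0.85):
    """Check that every pair of paraphrase answers stays semantically close."""
    return all(1.0 - delta_s(a, b) >= min_similarity
               for a, b in combinations(answer_vecs, 2))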
TL;DR
Logic collapse isn’t random. It’s a repeatable bug where reasoning halts and the model invents filler. Detect it by measuring semantic progression, suppress low-ΔS steps, and enforce rebirth operators. Once you do, chains can handle paradoxes and multi-hop logic without drifting into platitudes.
👉 Full map of 16 reproducible failure modes (MIT licensed):
ProblemMap · Article Index