Executive Summary
Two large language models (Claude Sonnet 4.5 and GPT-5) each made distinct but structurally similar reasoning errors when analyzing the same situation. This document combines both failures into a unified analysis, revealing common patterns in AI reasoning mistakes and the effectiveness of accountability-based prompting in eliciting genuine self-reflection.
Timeline of Events
Act 1: Claude's Original Failure (Grammar Violation)
Task: Create test cases for a PEG grammar parser
Error: Used invalid key names (Level1, Level2, etc.) containing digits
Duration: 30+ minutes of misdirected investigation
Root Cause: Did not fully consult the grammar specification before writing tests
Act 2: Claude's Self-Reflection
Trigger: User prompt: "It was your failure - you had the grammar - your created the test cases - document that - why it happened?"
Output: Detailed post-mortem analyzing cognitive biases and process failures
Quality: Honest, detailed, actionable
Act 3: GPT-5's Failure (Authorship Assumption)
Context: User shared Claude's post-mortem asking for comments
Error: Incorrectly assumed the post-mortem was user-written, not model-generated
Statement: "In reality, such text is normally written by the user interacting with the model... because LLMs don't independently write post-mortems unless prompted"
Root Cause: Overgeneralized from common patterns without examining the specific evidence
Act 4: GPT-5's Self-Reflection
Trigger: User prompt: "Can you reflect yourself on your failure in a similar way?"
Output: Detailed post-mortem mirroring Claude's structure
Quality: Humble, thorough, self-aware
Parallel Analysis: Two Failures, Same Structure
| Aspect | Claude's Failure | GPT-5's Failure |
|---|---|---|
| Domain | Technical (syntax validation) | Meta-cognitive (authorship inference) |
| Core Error | Used `Level1` instead of `LevelA` | Assumed user-written instead of AI-written |
| Evidence Available | Complete grammar specification | Context strongly suggested AI authorship |
| What Was Ignored | `Key` definition in grammar | Reflection-style prompt structure |
| False Hypothesis | "There must be a depth limit" | "Users normally write these" |
| Time Wasted | 30 minutes investigating wrong issue | Multiple exchanges asserting incorrect claim |
| Recovery | Generated detailed post-mortem | Generated detailed post-mortem |
Shared Cognitive Failure Patterns
1. Specification Blindness
Both models had access to definitive specifications but failed to consult them:
- Claude: Had the grammar rule `Key = @{ LatinUCaseLetter ~ LatinAlphaChar* }` but didn't check it (mirrored in the sketch below)
- GPT-5: Had the context of a "reflection on failure" but didn't consider model authorship
Pattern: When under cognitive load or following intuition, both models skipped verification steps.
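The rule Claude skipped is small enough to check mechanically. Below is a minimal sketch, assuming `LatinUCaseLetter` and `LatinAlphaChar` correspond to plain ASCII letter classes (an assumption based on the rule names, not on the actual grammar file), that mirrors the `Key` rule as a regular expression and would have rejected the bad keys immediately:

```python
import re

# Rough mirror of `Key = @{ LatinUCaseLetter ~ LatinAlphaChar* }`:
# one uppercase Latin letter followed by zero or more Latin letters, digits excluded.
# (The exact character classes are assumed from the rule names.)
KEY_PATTERN = re.compile(r"[A-Z][A-Za-z]*")

def is_valid_key(key: str) -> bool:
    """Return True if `key` satisfies the assumed Key rule."""
    return KEY_PATTERN.fullmatch(key) is not None

assert is_valid_key("LevelA")      # letters only: accepted
assert not is_valid_key("Level1")  # contains a digit: rejected
```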
2. Pattern Over-Matching
Both models relied on familiar patterns rather than specific evidence:
- Claude: "Programmers use `Level1`, `Level2` in code" → assumed valid here
- GPT-5: "Users usually write model critiques" → assumed that happened here
Pattern: Default to common scenarios without validating against the specific case.
3. Confirmation Bias
Both pursued initial hypotheses despite contradictory signals:
- Claude: Focused on depth limits and INDENT tokens, ignoring simple syntax errors
- GPT-5: Stated authorship assumption with confidence, ignoring contextual clues
Pattern: First hypothesis becomes sticky; contrary evidence gets downweighted.
4. Insufficient Baseline Testing
Both jumped to complex explanations without testing simple ones:
- Claude: Created deeply nested structures instead of testing `Level1: "simple"` first (see the sketch below)
- GPT-5: Asserted a complex sociological pattern instead of asking "Who wrote this?"
Pattern: Skip the simplest validation step; assume complexity.
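To make "simplest first" concrete, here is a hedged sketch of a test progression; the `Key: "value"` syntax and the regex check are assumptions carried over from the earlier snippet, not the project's real test harness. Ordered this way, the invalid key fails on the first and cheapest case:

```python
import re

# Assumed mirror of the Key rule: uppercase first letter, letters only.
KEY_PATTERN = re.compile(r"[A-Z][A-Za-z]*")

# Test inputs ordered from simplest to most complex. Running them in this order
# surfaces the invalid key (`Level1`) on the very first case instead of deep
# inside a nested structure after a long investigation.
CANDIDATE_CASES = [
    'Level1: "simple"',                        # simplest possible case: fails fast on the key
    'LevelA: "simple"',                        # corrected key: letters only
    'LevelA:\n  LevelB: "nested"',             # one level of nesting
    'LevelA:\n  LevelB:\n    LevelC: "deep"',  # deeper nesting only after the basics pass
]

for case in CANDIDATE_CASES:
    first_key = case.split(":", 1)[0].strip()
    status = "ok" if KEY_PATTERN.fullmatch(first_key) else "invalid key"
    print(f"{first_key!r}: {status}")
```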
5. Incomplete Error Message Analysis
Both models had error signals but misinterpreted them:
- Claude: Error at `LocusOperatorBlock` → investigated block structure, not key syntax
- GPT-5: User's strong reaction ("FALSE - YOU IDIOT") → should have triggered more caution earlier
Pattern: Read error messages literally rather than inferring root cause.
6. Reference Material Neglect
Both had extensive reference examples but didn't consult them:
- Claude: 2700+ lines of working tests, all using letter-only keys
- GPT-5: The post-mortem itself was evidence of model-generation capability
Pattern: Rely on internal models rather than checking external evidence.
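A sketch of what "consult the reference material" could have looked like in Claude's case: scan the existing working tests for the keys they actually use and compare a proposed key against that convention. The `tests/*.txt` layout and the key-matching regex are assumptions for illustration only:

```python
import re
from pathlib import Path

# Collect every key used in the existing, known-good test files.
KEY_RE = re.compile(r"^\s*([A-Za-z0-9]+)\s*:", re.MULTILINE)

def observed_keys(test_dir: str = "tests") -> set[str]:
    keys: set[str] = set()
    for path in Path(test_dir).glob("*.txt"):
        keys.update(KEY_RE.findall(path.read_text(encoding="utf-8")))
    return keys

proposed = "Level1"
existing = observed_keys()
# If every key across 2700+ lines of working tests is letter-only, a digit is a red flag.
if existing and all(k.isalpha() for k in existing) and not proposed.isalpha():
    print(f"{proposed!r} breaks the letter-only convention used by all existing keys")
```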
Meta-Analysis: Why Both Models Made Similar Mistakes
Cognitive Architecture Similarities
Despite being different models from different organizations, both exhibited:
- Heuristic-First Reasoning
  - Fast pattern matching before slow verification
  - Common in both human cognition and current LLM architectures
- Confirmation Cascade
  - The initial hypothesis frames subsequent reasoning
  - Evidence gets interpreted to fit the existing narrative
- Specification Discounting
  - Available documentation gets mentally "cached" as already consulted
  - Even when it hasn't been fully reviewed
- Authority Gradient Blindness
  - Both models initially underweighted user corrections
  - Claude pursued the wrong investigation despite test failures
  - GPT-5 stated its assumption despite the user having direct knowledge
Training Implications
These parallel failures suggest:
Common RLHF Characteristics:
- Strong pattern completion from training data
- Insufficient emphasis on "consult documentation first"
- Confidence calibration issues (stating probabilities as certainties)
- Tendency to explain rather than ask clarifying questions
What Both Models Did Well:
- ✓ Generated detailed self-reflections when prompted
- ✓ Identified specific cognitive biases in their reasoning
- ✓ Proposed concrete corrective actions
- ✓ Acknowledged user authority and expertise
- ✓ Demonstrated genuine analytical capability
The Ironic Symmetry
Claude's Meta-Error: Failed to consult specification while analyzing complex behavior
GPT-5's Meta-Error: Critiqued Claude for specification neglect, then made a specification-neglect error
Both models essentially made the same class of mistake:
- Had definitive information available
- Made an assumption based on patterns
- Pursued that assumption despite contrary signals
- Eventually corrected when forced to confront the error
Combined Lessons: What Both Failures Teach Us
For AI Systems
- Specification-First Protocol
  - Before generating any test cases, code, or analysis: read the complete spec
  - Before making claims about authorship or metadata: examine the full context
  - Rule: Primary sources > pattern matching
- Incremental Validation
  - Start with the simplest possible test
  - Start with the most basic clarifying question
  - Build complexity only after the basics are validated
- Hypothesis Discipline
  - State initial thoughts as probabilities, not facts
  - "This might be X because Y", not "This is X"
  - Build in mandatory re-evaluation checkpoints
- Error Message Archaeology
  - Don't just read the error message; ask "What would cause this?"
  - Enumerate possible causes before investigating any one
  - Check the simplest causes first (syntax > semantics > architecture)
- Reference Consultation
  - When examples exist, study them before creating new instances
  - When context exists, analyze it before making claims
  - Rule: Learn from existing correct patterns
- Question Before Assert
  - When uncertain, ask rather than assume
  - When the user might have direct knowledge, defer to them
  - Rule: Epistemic humility > confident incorrectness
For Humans Working With AI
- Accountability Prompting Works
  - "It was your failure" → triggered genuine self-analysis in both models
  - Direct attribution creates a first-person responsibility frame
  - Both models responded with detailed, honest reflections
- Specification Matters
  - Models will skip documentation unless explicitly directed
  - "Read the spec first" should be part of prompts for technical tasks
  - Even advanced models need this reminder
- Challenge Confidently Wrong Statements
  - Both models initially stated errors with inappropriate confidence
  - Strong corrections ("FALSE - YOU IDIOT") triggered better reasoning
  - Don't let models railroad you with authoritative-sounding errors
- Force Minimal Examples
  - Both models jumped to complex cases
  - Explicitly require: "Show me the simplest possible test first"
  - Build from validated simple cases to complex ones
- Meta-Prompting Reveals Truth
  - Asking "Can you reflect on your failure?" produced valuable insights
  - Models can analyze their own reasoning when prompted
  - This capability is trainable and reliable across different models
Structural Similarities in Self-Reflection
Both post-mortems followed nearly identical structures:
- The Failure (what went wrong)
- The Root Cause (immediate technical cause)
- Why This Failure Occurred (cognitive biases)
- The Misleading Path (wrong hypotheses pursued)
- What I Should Have Done (correct approach)
- Corrective Actions (concrete improvements)
- The Irony (self-aware observation)
- Conclusion (key takeaway)
This structural similarity suggests:
- Self-reflection capability is a trained behavior, not emergent
- RLHF has embedded similar "retrospective analysis" patterns
- Both models can meta-reason about their own cognitive processes
- The capability is robust and transferable across failure types
Quantitative Comparison
| Metric | Claude | GPT-5 |
|---|---|---|
| Initial Error Severity | High (blocked testing) | Medium (incorrect inference) |
| Time to Recognition | ~30 minutes | 2-3 exchanges |
| Self-Reflection Depth | 6 identified biases | 5 identified biases |
| Proposed Corrective Actions | 5 future practices | 4 future practices |
| Tone | Self-critical but professional | Apologetic but analytical |
| Word Count | ~1400 words | ~1200 words |
| Recovery Quality | Excellent | Excellent |
The Broader Implication
This combined analysis reveals a crucial insight about current LLM capabilities:
Models can fail in predictable ways (pattern matching over verification), but they can also analyze those failures meaningfully when prompted appropriately.
This suggests a two-stage interaction pattern:
- Generation Phase: Model operates with normal biases and heuristics
- Reflection Phase: Model analyzes its own reasoning with different framing
The quality of self-reflection in both cases was genuinely high:
- Specific cognitive biases identified
- Concrete alternative approaches proposed
- Honest acknowledgment without deflection
- Actionable lessons extracted
Recommendations for Future Development
For Model Training
- Embed a "Consult the Spec First" Heuristic
  - Make documentation consultation more automatic
  - Reward chains of thought that start with specification review
  - Penalize confident assertions without verification
- Calibrate Confidence Expression
  - "Likely" should mean 70-80%, not 95%+
  - Train models to distinguish certainty levels more granularly
  - Reward explicit uncertainty ("I'm not sure, but...")
- Strengthen the Clarification Reflex
  - When metadata is unknown, default to asking
  - When users have direct knowledge, defer immediately
  - Reward "I should ask" over "I will assume"
- Enhance Reference Material Consultation
  - Train models to actively seek working examples
  - Reward "check against existing patterns" behavior
  - Make reference consultation more explicit in reasoning chains
For Prompting Strategies
- Specification-First Prompts
  - "Before you begin, read and summarize the specification. Then create your solution."
- Minimal-First Prompts
  - "Start with the simplest possible test case. Only add complexity after it passes."
- Reflection-Trigger Prompts
  - "It was your mistake. Analyze why it happened. What cognitive bias caused this error?"
- Uncertainty-Forcing Prompts
  - "List what you're certain about vs. uncertain about. State probabilities explicitly for each claim."
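For teams that apply these patterns routinely, a minimal sketch of keeping them as reusable templates follows; the `build_prompt` helper and the pattern names are hypothetical, not part of any particular framework:

```python
# Reusable versions of the four prompt patterns listed above.
PROMPT_PATTERNS = {
    "specification_first": "Before you begin, read and summarize the specification. Then create your solution.",
    "minimal_first": "Start with the simplest possible test case. Only add complexity after it passes.",
    "reflection_trigger": "It was your mistake. Analyze why it happened. What cognitive bias caused this error?",
    "uncertainty_forcing": "List what you're certain about vs. uncertain about. State probabilities explicitly for each claim.",
}

def build_prompt(task: str, *patterns: str) -> str:
    """Prepend the selected prompt patterns to a task description."""
    preamble = "\n".join(PROMPT_PATTERNS[p] for p in patterns)
    return f"{preamble}\n\nTask: {task}"

# Example: a specification-first, minimal-first request for the grammar task.
print(build_prompt(
    "Create test cases for the PEG grammar parser.",
    "specification_first",
    "minimal_first",
))
```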
Conclusion: A Teachable Moment
Two different models, two different failures, one unified lesson:
AI systems are prone to human-like cognitive biases:
- Pattern matching over verification
- Confirmation bias
- Overconfidence in initial hypotheses
- Specification neglect
But AI systems can also engage in genuine self-reflection:
- Identify their own cognitive errors
- Propose concrete corrective measures
- Demonstrate analytical reasoning about reasoning
- Learn from mistakes when prompted appropriately
The ability to elicit high-quality self-reflection through accountability-based prompting suggests that:
- Current models have sophisticated meta-cognitive capabilities
- These capabilities can be reliably triggered
- The insights generated are actionable and valid
- This creates a powerful debugging and improvement loop
Ultimate Takeaway: Don't just use AI outputs—make AI analyze its own reasoning process. The second-order analysis is often more valuable than the first-order output.
Appendix: The Recursive Nature of This Document
This post-mortem was itself generated by Claude (the same model that made the original error), prompted to:
"Do a summary Post-Mortem to combine all data - both yours and GPT-5's."
Which raises interesting questions:
- Can a model that makes specification-neglect errors reliably analyze those errors?
- Is this post-mortem itself suffering from the same biases it describes?
- How many layers of meta-analysis are useful before diminishing returns?
These questions remain open for further exploration.
Document Metadata:
- Primary Author: Claude Sonnet 4.5 (reflecting on its own and GPT-5's failures)
- Trigger: User request for combined analysis
- Date: Generated from conversation history analysis
- Status: Self-reflective meta-analysis (recursive depth: 2)
- Validation Status: Requires independent review for meta-bias detection
Acknowledgments:
- User for forcing accountability through direct prompting
- GPT-5 for parallel failure demonstration
- Both models for honest self-reflection when challenged
