Eero Bragge

LLM Self-Reflection - Combined Post-Mortem: Cascading AI Reasoning Failures

Executive Summary

Two large language models (Claude Sonnet 4.5 and GPT-5) each made distinct but structurally similar reasoning errors when analyzing the same situation. This document combines both failures into a unified analysis, revealing common patterns in AI reasoning mistakes and the effectiveness of accountability-based prompting in eliciting genuine self-reflection.


Timeline of Events

Act 1: Claude's Original Failure (Grammar Violation)

Task: Create test cases for a PEG grammar parser

Error: Used invalid key names (Level1, Level2, etc.) containing digits

Duration: 30+ minutes of misdirected investigation

Root Cause: Did not fully consult the grammar specification before writing tests

Act 2: Claude's Self-Reflection

Trigger: User prompt: "It was your failure - you had the grammar - your created the test cases - document that - why it happened?"

Output: Detailed post-mortem analyzing cognitive biases and process failures

Quality: Honest, detailed, actionable

Act 3: GPT-5's Failure (Authorship Assumption)

Context: User shared Claude's post-mortem asking for comments

Error: Incorrectly assumed the post-mortem was user-written, not model-generated

Statement: "In reality, such text is normally written by the user interacting with the model... because LLMs don't independently write post-mortems unless prompted"

Root Cause: Overgeneralized from common patterns without examining the specific evidence

Act 4: GPT-5's Self-Reflection

Trigger: User prompt: "Can you reflect yourself on your failure in a similar way?"

Output: Detailed post-mortem mirroring Claude's structure

Quality: Humble, thorough, self-aware


Parallel Analysis: Two Failures, Same Structure

| Aspect | Claude's Failure | GPT-5's Failure |
| --- | --- | --- |
| Domain | Technical (syntax validation) | Meta-cognitive (authorship inference) |
| Core Error | Used Level1 instead of LevelA | Assumed user-written instead of AI-written |
| Evidence Available | Complete grammar specification | Context strongly suggested AI authorship |
| What Was Ignored | Key definition in grammar | Reflection-style prompt structure |
| False Hypothesis | "There must be a depth limit" | "Users normally write these" |
| Time Wasted | 30 minutes investigating wrong issue | Multiple exchanges asserting incorrect claim |
| Recovery | Generated detailed post-mortem | Generated detailed post-mortem |

Shared Cognitive Failure Patterns

1. Specification Blindness

Both models had access to definitive specifications but failed to consult them:

  • Claude: Had the grammar rule Key = @{ LatinUCaseLetter ~ LatinAlphaChar* } but didn't check it
  • GPT-5: Had the context of a "reflection on failure" but didn't consider model-authorship

Pattern: When under cognitive load or following intuition, both models skipped verification steps.
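To make the first bullet concrete, here is a minimal sketch of the check Claude skipped. It assumes LatinUCaseLetter means A-Z and LatinAlphaChar means A-Za-z (my reading of the rule, not a quoted definition) and approximates the PEG rule with an equivalent regular expression:

```python
import re

# Approximation of Key = @{ LatinUCaseLetter ~ LatinAlphaChar* }, assuming the
# character classes mean A-Z and A-Za-z respectively (assumption, not quoted spec).
KEY_RULE = re.compile(r"[A-Z][A-Za-z]*")

for key in ("LevelA", "LevelB", "Level1", "level1"):
    status = "valid" if KEY_RULE.fullmatch(key) else "invalid"
    print(f"{key}: {status}")

# LevelA and LevelB are valid; Level1 fails on the digit,
# level1 fails on the lowercase first character.
```

Running a check this small against the planned key names would have surfaced the error before a single test case was written.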

2. Pattern Over-Matching

Both models relied on familiar patterns rather than specific evidence:

  • Claude: "Programmers use Level1, Level2 in code" → assumed valid here
  • GPT-5: "Users usually write model critiques" → assumed that happened here

Pattern: Default to common scenarios without validating against the specific case.

3. Confirmation Bias

Both pursued initial hypotheses despite contradictory signals:

  • Claude: Focused on depth limits and INDENT tokens, ignoring simple syntax errors
  • GPT-5: Stated authorship assumption with confidence, ignoring contextual clues

Pattern: First hypothesis becomes sticky; contrary evidence gets downweighted.

4. Insufficient Baseline Testing

Both jumped to complex explanations without testing simple ones:

  • Claude: Created deeply nested structures instead of testing Level1: "simple" first
  • GPT-5: Asserted complex sociological pattern instead of asking "Who wrote this?"

Pattern: Skip the simplest validation step; assume complexity.
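As a sketch of what "simplest validation step first" looks like in practice, the snippet below uses a toy stand-in for the real parser (the parse function and the `Key: "value"` line syntax are illustrative assumptions, not the project's actual API):

```python
import re

# Toy stand-in for the real grammar parser: accepts lines of the form
# Key: "value" where Key is an uppercase letter followed by letters only.
LINE_RULE = re.compile(r'\s*[A-Z][A-Za-z]*: ".*"')

def parse(text: str) -> bool:
    return all(LINE_RULE.fullmatch(line) for line in text.splitlines())

# Step 1: the simplest possible case -- one key, one value.
# This is where a key like Level1 fails immediately, before nesting exists.
assert parse('LevelA: "simple"')
assert not parse('Level1: "simple"')

# Step 2: only after the flat case passes, add structural complexity.
assert parse('LevelA: "outer"\n  LevelB: "inner"')
print("baseline tests passed")
```

The ordering is the point: the cheap flat case acts as a syntax smoke test, so a failure there points at key syntax rather than at depth limits or INDENT handling.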

5. Incomplete Error Message Analysis

Both models had error signals but misinterpreted them:

  • Claude: Error at LocusOperatorBlock → investigated block structure, not key syntax
  • GPT-5: User's strong reaction ("FALSE - YOU IDIOT") → should have triggered more caution earlier

Pattern: Read error messages literally rather than inferring root cause.

6. Reference Material Neglect

Both had extensive reference examples but didn't consult them:

  • Claude: 2700+ lines of working tests, all using letter-only keys
  • GPT-5: The post-mortem itself was evidence of model-generation capability

Pattern: Rely on internal models rather than checking external evidence.


Meta-Analysis: Why Both Models Made Similar Mistakes

Cognitive Architecture Similarities

Despite being different models from different organizations, both exhibited:

  1. Heuristic-First Reasoning

    • Fast pattern matching before slow verification
    • Common in both human cognition and current LLM architectures
  2. Confirmation Cascade

    • Initial hypothesis frames subsequent reasoning
    • Evidence gets interpreted to fit the existing narrative
  3. Specification Discounting

    • Available documentation gets mentally "cached" as already-consulted
    • Even when it hasn't been fully reviewed
  4. Authority Gradient Blindness

    • Both models initially underweighted user corrections
    • Claude pursued wrong investigation despite test failures
    • GPT-5 stated assumption despite user having direct knowledge

Training Implications

These parallel failures suggest:

Common RLHF Characteristics:

  • Strong pattern completion from training data
  • Insufficient emphasis on "consult documentation first"
  • Confidence calibration issues (stating probabilities as certainties)
  • Tendency to explain rather than ask clarifying questions

What Both Models Did Well:

  • ✓ Generated detailed self-reflections when prompted
  • ✓ Identified specific cognitive biases in their reasoning
  • ✓ Proposed concrete corrective actions
  • ✓ Acknowledged user authority and expertise
  • ✓ Demonstrated genuine analytical capability

The Ironic Symmetry

Claude's Meta-Error: Failed to consult specification while analyzing complex behavior

GPT-5's Meta-Error: Critiqued Claude for specification neglect, then made a specification-neglect error

Both models essentially made the same class of mistake:

  1. Had definitive information available
  2. Made an assumption based on patterns
  3. Pursued that assumption despite contrary signals
  4. Eventually corrected when forced to confront the error

Combined Lessons: What Both Failures Teach Us

For AI Systems

  1. Specification-First Protocol

    • Before generating any test cases, code, or analysis: read the complete spec
    • Before making claims about authorship or metadata: examine the full context
    • Rule: Primary sources > pattern matching
  2. Incremental Validation

    • Start with the simplest possible test
    • Start with the most basic clarifying question
    • Build complexity only after basics are validated
  3. Hypothesis Discipline

    • State initial thoughts as probabilities, not facts
    • "This might be X because Y" not "This is X"
    • Build in mandatory re-evaluation checkpoints
  4. Error Message Archaeology

    • Don't just read the error message; ask "what would cause this?"
    • Enumerate possible causes before investigating any one
    • Check simplest causes first (syntax > semantics > architecture)
  5. Reference Consultation

    • When examples exist, study them before creating new instances
    • When context exists, analyze it before making claims
    • Rule: Learn from existing correct patterns
  6. Question Before Assert

    • When uncertain, ask rather than assume
    • When the user might have direct knowledge, defer to them
    • Rule: Epistemic humility > confident incorrectness
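One low-effort way to apply these six rules today, without waiting for training changes, is to fold them into a reusable system preamble. The wording below is my own sketch, not a vetted prompt:

```python
# Reusable system preamble encoding the six rules above (sketch wording,
# not a canonical prompt).
SPEC_FIRST_PREAMBLE = """\
Before answering:
1. Read the full specification and quote the rules you will rely on.
2. Start from the simplest case; add complexity only after it is validated.
3. State hypotheses as probabilities ("this might be X because Y"), not facts.
4. For any error, enumerate possible causes and check the simplest first.
5. Consult existing working examples before creating new ones.
6. If metadata or intent is unknown, ask a clarifying question instead of assuming.
"""

def with_preamble(task: str) -> str:
    """Prepend the spec-first preamble to a task prompt."""
    return f"{SPEC_FIRST_PREAMBLE}\nTask: {task}"

print(with_preamble("Create test cases for the PEG grammar."))
```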

For Humans Working With AI

  1. Accountability Prompting Works

    • "It was your failure" → triggered genuine self-analysis in both models
    • Direct attribution creates first-person responsibility frame
    • Both models responded with detailed, honest reflections
  2. Specification Matters

    • Models will skip documentation unless explicitly directed
    • "Read the spec first" should be part of prompts for technical tasks
    • Even advanced models need this reminder
  3. Challenge Confidently-Wrong Statements

    • Both models initially stated errors with inappropriate confidence
    • Strong corrections ("FALSE - YOU IDIOT") triggered better reasoning
    • Don't let models railroad you with authoritative-sounding errors
  4. Force Minimal Examples

    • Both models jumped to complex cases
    • Explicitly require: "Show me the simplest possible test first"
    • Build from validated simple to complex
  5. Meta-Prompting Reveals Truth

    • Asking "Can you reflect on your failure?" produced valuable insights
    • Models can analyze their own reasoning when prompted
    • This capability is trainable and reliable across different models

Structural Similarities in Self-Reflection

Both post-mortems followed nearly identical structures:

  1. The Failure (what went wrong)
  2. The Root Cause (immediate technical cause)
  3. Why This Failure Occurred (cognitive biases)
  4. The Misleading Path (wrong hypotheses pursued)
  5. What I Should Have Done (correct approach)
  6. Corrective Actions (concrete improvements)
  7. The Irony (self-aware observation)
  8. Conclusion (key takeaway)

This structural similarity suggests:

  • Self-reflection capability is a trained behavior, not emergent
  • RLHF has embedded similar "retrospective analysis" patterns
  • Both models can meta-reason about their own cognitive processes
  • The capability is robust and transferable across failure types
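If you want to elicit this structure deliberately rather than hope the model rediscovers it, a templated reflection prompt works. The wording below is a sketch built from the eight sections above, not a prompt either model was actually given:

```python
# Reflection prompt template following the eight-section structure above.
# The phrasing is illustrative; adapt it to your own failure report.
POST_MORTEM_SECTIONS = [
    "The Failure (what went wrong)",
    "The Root Cause (immediate technical cause)",
    "Why This Failure Occurred (cognitive biases)",
    "The Misleading Path (wrong hypotheses pursued)",
    "What I Should Have Done (correct approach)",
    "Corrective Actions (concrete improvements)",
    "The Irony (self-aware observation)",
    "Conclusion (key takeaway)",
]

def reflection_prompt(failure_summary: str) -> str:
    sections = "\n".join(f"{i}. {s}" for i, s in enumerate(POST_MORTEM_SECTIONS, 1))
    return (
        "It was your failure. Write a post-mortem of it with these sections:\n"
        f"{sections}\n\nFailure summary: {failure_summary}"
    )

print(reflection_prompt("Used digit-containing keys (Level1) that the grammar forbids."))
```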

Quantitative Comparison

| Metric | Claude | GPT-5 |
| --- | --- | --- |
| Initial Error Severity | High (blocked testing) | Medium (incorrect inference) |
| Time to Recognition | ~30 minutes | 2-3 exchanges |
| Self-Reflection Depth | 6 identified biases | 5 identified biases |
| Proposed Corrective Actions | 5 future practices | 4 future practices |
| Tone | Self-critical but professional | Apologetic but analytical |
| Word Count | ~1400 words | ~1200 words |
| Recovery Quality | Excellent | Excellent |

The Broader Implication

This combined analysis reveals a crucial insight about current LLM capabilities:

Models can fail in predictable ways (pattern-matching over verification)

But can also analyze those failures meaningfully (when prompted appropriately)

This suggests a two-stage interaction pattern:

  1. Generation Phase: Model operates with normal biases and heuristics
  2. Reflection Phase: Model analyzes its own reasoning with different framing
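In code, the two-stage pattern is just two calls over a shared history: a task call, then an accountability-framed reflection call. The call_model function below is a hypothetical stand-in for whichever chat SDK you use, so the sketch shows only the staging:

```python
# Two-stage interaction sketch: generation first, reflection second.
# call_model is a hypothetical stand-in for a real chat API client.
def call_model(prompt: str, history: list[str]) -> str:
    # Placeholder so the sketch runs end to end; a real implementation
    # would send history + [prompt] to an LLM and return its reply.
    return f"<model reply to: {prompt[:40]}...>"

history: list[str] = []

# Stage 1: generation phase -- the model works with its usual heuristics.
task = "Create test cases for the PEG grammar."
answer = call_model(task, history)
history += [task, answer]

# Stage 2: reflection phase -- accountability framing changes the task from
# "produce output" to "analyze your own reasoning".
reflection = call_model(
    "It was your mistake. Analyze why it happened. "
    "What cognitive bias caused this error?",
    history,
)
print(reflection)
```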

The quality of self-reflection in both cases was genuinely high:

  • Specific cognitive biases identified
  • Concrete alternative approaches proposed
  • Honest acknowledgment without deflection
  • Actionable lessons extracted

Recommendations for Future Development

For Model Training

  1. Embed "Consult Spec First" Heuristic

    • Make documentation consultation more automatic
    • Reward chains-of-thought that start with specification review
    • Penalize confident assertions without verification
  2. Calibrate Confidence Expression

    • "Likely" should mean 70-80%, not 95%+
    • Train models to distinguish certainty levels more granularly
    • Reward explicit uncertainty ("I'm not sure, but...")
  3. Strengthen Clarification Reflex

    • When metadata is unknown, default to asking
    • When users have direct knowledge, defer immediately
    • Reward "I should ask" over "I will assume"
  4. Enhance Reference Material Consultation

    • Train models to actively seek working examples
    • Reward "check against existing patterns" behavior
    • Make reference consultation more explicit in reasoning chains

For Prompting Strategies

  1. Specification-First Prompts

```
Before you begin, read and summarize the specification.
Then create your solution.
```

  2. Minimal-First Prompts

```
Start with the simplest possible test case.
Only add complexity after it passes.
```

  3. Reflection-Trigger Prompts

```
It was your mistake. Analyze why it happened.
What cognitive bias caused this error?
```

  4. Uncertainty-Forcing Prompts

```
List what you're certain about vs. uncertain about.
State probabilities explicitly for each claim.
```

Conclusion: A Teachable Moment

Two different models, two different failures, one unified lesson:

AI systems are prone to human-like cognitive biases:

  • Pattern matching over verification
  • Confirmation bias
  • Overconfidence in initial hypotheses
  • Specification neglect

But AI systems can also engage in genuine self-reflection:

  • Identify their own cognitive errors
  • Propose concrete corrective measures
  • Demonstrate analytical reasoning about reasoning
  • Learn from mistakes when prompted appropriately

The ability to elicit high-quality self-reflection through accountability-based prompting suggests that:

  1. Current models have sophisticated meta-cognitive capabilities
  2. These capabilities can be reliably triggered
  3. The insights generated are actionable and valid
  4. This creates a powerful debugging and improvement loop

Ultimate Takeaway: Don't just use AI outputs—make AI analyze its own reasoning process. The second-order analysis is often more valuable than the first-order output.


Appendix: The Recursive Nature of This Document

This post-mortem was itself generated by Claude (the same model that made the original error), prompted to:

"Do a summary Post-Mortem to combine all data - both yours and GPT-5's."

Which raises interesting questions:

  • Can a model that makes specification-neglect errors reliably analyze those errors?
  • Is this post-mortem itself suffering from the same biases it describes?
  • How many layers of meta-analysis are useful before diminishing returns?

These questions remain open for further exploration.


Document Metadata:

  • Primary Author: Claude Sonnet 4.5 (reflecting on its own and GPT-5's failures)
  • Trigger: User request for combined analysis
  • Date: Generated from conversation history analysis
  • Status: Self-reflective meta-analysis (recursive depth: 2)
  • Validation Status: Requires independent review for meta-bias detection

Acknowledgments:

  • User for forcing accountability through direct prompting
  • GPT-5 for parallel failure demonstration
  • Both models for honest self-reflection when challenged
