> [!NOTE]
> This content is translated by LLM. Original text can be found here.
This project attempts to solve two long-standing LLM problems, getting lost in multi-turn conversations and single-conversation length limits, by simulating human dialogue patterns.
Based on the research paper LLMs Get Lost In Multi-Turn Conversation.
The example provides TUI modes for both the traditional and the new architecture for testing: Quick Jump to TUI Example
Paper Problem Analysis
Common LLM Issues in Long Conversations
Research on 15 LLMs across 200,000+ conversations shows:
- Problem: Multi-turn conversation performance drops 39%
- Cause: LLMs use "complete memory" models instead of human-like selective memory
Four Key Issues
| LLM Problem | Human Behavior |
| --- | --- |
| Premature Solutions | Ask clarifying questions first |
| Information Hoarding | Forget irrelevant details |
| Linear Replay | Maintain "current state" |
| Verbose Spirals | Stay focused |
Filtering Trumps Memory
Cognitive Burden of Perfect Memory
Research shows that exceptional memory ability does not equal an intelligence advantage. Solomon Shereshevsky, a classic neuropsychological case, could remember arbitrary details from decades earlier, including meaningless number sequences and vocabulary lists, but his "perfect memory" actually created a cognitive burden: he couldn't distinguish important from unimportant information, which led to difficulties in abstract thinking and everyday decision-making.
Design Insights for LLMs
Traditional LLMs that use a complete-memory model may effectively be simulating a cognitive impairment: they demand ever bigger, more powerful hardware without proportional performance gains.
- Selective attention > Complete memory
- Abstract summarization > Detail preservation
- Dynamic adaptation > Fixed replay
Real Conversation Process
Continuously Updating Mental Summary
- Humans don't repeatedly go through entire conversation history in their minds
- Instead, they maintain a dynamic "current understanding" and update conclusions based on new information
- Past details fade or disappear, but key conclusions and constraints persist
Keyword-Triggered Recall
- When someone says "that thing we discussed earlier"
- We perform fuzzy searches of recent memory for relevant information
- Only retrieve specific details when triggered by reference keywords
Implementation Plan
| Human conversation process | Engineering implementation |
| --- | --- |
| Mental summary | A small model generates a structured summary after each turn |
| Content recall | Automatic fuzzy search of the conversation history for each question (relevance scoring) |
| New conversation | Latest summary + relevant history fragments + new question |
Human Conversation Simulation
Simulating imperfect memory: instead of designing bigger and better information-retrieval support, this system processes information the way humans do.
Humans Are Inherently Poor at Complete Memory
- We forget irrelevant details
- We remember key decisions
- We learn from mistakes
- We have internal measures
- We maintain current conversation focus
- We actively associate relevant past content
Combining Machine Advantages
This approach explores combining human cognitive advantages with machine computational advantages:
- Simulate human mechanisms: the default state uses only structured summaries, preventing historical information from overwhelming the current conversation
- Machine enhancement: the complete conversation record is still preserved; when retrieval is triggered, it can provide more precise detail recall than humans can
This keeps the natural focus of human conversation while leveraging the machine's advantage in precise retrieval: the complete history is not used during conversation, and detailed retrieval is activated only under specific trigger conditions.
Engineering Simulation Focus
- Exclude unnecessary information: remove it from the key summaries
- Maintain focus: use structured summaries, similar to a rough mental overview
- Active recall: automatically retrieve relevant historical content for each question
- State updates: continuous summarization, similar to mental understanding of events
- Don't replay complete content but use summaries to simulate human rough overviews
- Summarize into new overviews to adjust conversation direction, simulating human internal perspectives
- Actively retrieve relevant history to simulate human associative memory
Implementation
- Continuous mental perspective updates → Automatic summary updates (relevant information retained rather than complete history). After each conversation turn, humans unconsciously update their current summary of the conversation based on new information and carry that updated perspective into the next turn (a sketch follows this list).
- Active associative memory → Fuzzy search system (automatic memory retrieval). For each new question, relevant content is automatically searched for in the conversation history, simulating how humans actively associate past discussions.
- Current state focus → Fixed context structure (structured summaries). The current conversation direction is adjusted dynamically without re-reading the entire conversation history.
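As a rough sketch of the first mapping, assuming the small model is hidden behind a `Summarizer` function type (the package, function names, and prompt wording here are illustrative assumptions, not the project's actual code; the project performs this step with gpt-4o-mini):

```go
package cim

import (
	"context"
	"fmt"
)

// Summarizer wraps whatever small model rewrites the structured summary.
type Summarizer func(ctx context.Context, prompt string) (string, error)

// UpdateSummary sends only the previous summary plus the latest turn to the
// small model, so the per-update cost stays roughly constant instead of
// growing with the length of the conversation.
func UpdateSummary(ctx context.Context, summarize Summarizer, prev, question, answer string) (string, error) {
	prompt := fmt.Sprintf(
		"Previous summary:\n%s\n\nNew turn:\nQ: %s\nA: %s\n\n"+
			"Update the summary: keep confirmed requirements, constraints, excluded options, and key conclusions; drop details that are no longer relevant.",
		prev, question, answer,
	)
	return summarize(ctx, prompt)
}
```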
Comparison
| Cognitive Mode | Human Behavior | Traditional LLM | Simulation Implementation |
| --- | --- | --- | --- |
| Memory Management | Selective retention | Perfect recall | Structured forgetting |
| Error Learning | Avoid known failures | Repeat mistakes | Excluded options tracking |
| Focus Maintenance | Current state oriented | Historical drowning | Summary-based context |
| Memory Retrieval | Active associative triggering | Passive complete memory | Automatic fuzzy search |
Memory Architecture
LLM "Complete Memory" (Non-human conversation method)
Flowchart
```mermaid
graph TB
    T1["Turn 1 Conversation"] --> T1_Store["Store: [Q1] + [R1]"]
    T1_Store --> T2["Turn 2 Conversation"]
    T2 --> T2_Store["Store: [Q1] + [R1] + [Q2] + [R2]"]
    T2_Store --> T3["Turn 3 Conversation"]
    T3 --> T3_Store["Store: [Q1] + [R1] + [Q2] + [R2] + [Q3] + [R3]"]
    T3_Store --> TN["Turn N Conversation"]
    TN --> TN_Store["Store: Complete conversation"]
```
```
Turn 1: [question 1] + [response 1]
Turn 2: [question 1] + [response 1] + [question 2] + [response 2]
Turn 3: [question 1] + [response 1] + [question 2] + [response 2] + [question 3] + [response 3]
...
Turn N: [Complete verbatim conversation record]
```
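For contrast, a minimal sketch (not taken from the project) of this "complete memory" pattern, where every turn re-sends an ever-growing message list:

```go
package main

import "fmt"

// Message is one turn's question or response in the traditional
// "complete memory" architecture.
type Message struct {
	Role    string // "user" or "assistant"
	Content string
}

func main() {
	var history []Message // grows without bound

	ask := func(question, answer string) {
		// Every turn re-sends the entire history plus the new pair.
		history = append(history, Message{"user", question}, Message{"assistant", answer})
		fmt.Printf("turn context now holds %d messages\n", len(history))
	}

	ask("question 1", "response 1")
	ask("question 2", "response 2")
	ask("question 3", "response 3")
}
```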
- Humans don't completely recall all prior content
- Old, irrelevant information interferes with generating the current response; humans simply exclude it
- There is no mechanism for learning from mistakes, and long conversations are disrupted by irrelevant information
- Linear token growth leads to conversation length limits; humans don't break off a conversation just because it has gone on too long
Human Real Conversation Method Study
Flowchart
```mermaid
graph TB
    H_Input["New question input"] --> H_Fuzzy["Fuzzy search history"]
    H_Fuzzy --> H_Components["Context composition"]
    H_Components --> H_Summary["Structured summary"]
    H_Components --> H_Relevant["Relevant history fragments"]
    H_Components --> H_Question["New question"]
    H_Summary --> H_LLM["LLM response"]
    H_Relevant --> H_LLM
    H_Question --> H_LLM
    H_LLM --> H_Response["Generate answer"]
    H_Response --> H_NewSummary["Update structured summary"]
    H_NewSummary --> H_Store["Store to memory"]
```
Each turn: [Structured current state] + [Relevant history fragments] + [New question]
Conversation Summary Design
- Core topic of the current discussion
- Accumulated retention of all confirmed requirements
- Accumulated retention of all constraint conditions
- Excluded options + reasons
- Accumulated retention of all important data, facts, and conclusions
- Current topic-related questions to clarify
- All important historical discussion points
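A plausible Go shape for this structured summary (field names are illustrative; the project's actual type may differ, though an `ExcludedOptions` field is referenced later in this README):

```go
package cim

// ConversationSummary is the structured "mental overview" carried between
// turns instead of the full message history.
type ConversationSummary struct {
	CoreTopic       string   // core topic of the current discussion
	Requirements    []string // all confirmed requirements, accumulated
	Constraints     []string // all constraint conditions, accumulated
	ExcludedOptions []string // rejected options together with the reasons
	KeyFacts        []string // important data, facts, and conclusions
	OpenQuestions   []string // topic-related questions still to clarify
	KeyPoints       []string // important historical discussion points
}
```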
Fuzzy Retrieval Algorithm
Human memory retrieval is typically triggered by keywords, such as: "what we mentioned earlier..."
This component scores the similarity between the latest question and the conversation history so that highly relevant entries can be supplied as supplementary reference material, simulating natural memory-trigger mechanisms:
- Keyword triggering: Immediately associate relevant content upon hearing specific keywords
- Semantic Similarity: Comprehend content with similar meaning but different wording
- Time Weight: Recent conversations are more easily recalled
Multi-dimensional Scoring Mechanism
Total score = Keyword overlap (40%) + Semantic similarity (40%) + Time weight (20%)
Keyword triggering
- Use Jaccard similarity to calculate vocabulary matching degree
- Support partial matching and inclusion relationships
Semantic Similarity
- Simplified cosine similarity, calculating common vocabulary proportion
- Suitable for Chinese-English mixed text processing
Time Weight
- Linear decay within 24 hours: recent=1.0, 24 hours ago=0.7
- Fixed score of 0.7 after 24 hours (suitable for long-term continuous conversations)
Retrieval Control Mechanism
- Relevance threshold: Default 0.3, filters irrelevant content
- Result quantity limit: Return maximum 5 most relevant records
- Keyword extraction: Automatically filter stop words, retain meaningful vocabulary
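Putting the scoring and filtering together, a minimal Go sketch, assuming whitespace-delimited keywords and no stop-word list (the project's actual tokenization, especially for mixed Chinese-English text, will differ):

```go
package cim

import (
	"sort"
	"strings"
	"time"
)

// Entry is one stored conversation turn.
type Entry struct {
	Text string
	When time.Time
}

// jaccard measures keyword overlap between two word sets.
func jaccard(a, b map[string]bool) float64 {
	if len(a) == 0 || len(b) == 0 {
		return 0
	}
	inter := 0
	for w := range a {
		if b[w] {
			inter++
		}
	}
	return float64(inter) / float64(len(a)+len(b)-inter)
}

// overlapRatio is a simplified "semantic" score: shared words over query size.
func overlapRatio(query, text map[string]bool) float64 {
	if len(query) == 0 {
		return 0
	}
	shared := 0
	for w := range query {
		if text[w] {
			shared++
		}
	}
	return float64(shared) / float64(len(query))
}

// timeWeight decays linearly from 1.0 to 0.7 over 24 hours, then stays at 0.7.
func timeWeight(when, now time.Time) float64 {
	age := now.Sub(when).Hours()
	if age < 0 {
		age = 0
	}
	if age >= 24 {
		return 0.7
	}
	return 1.0 - 0.3*(age/24)
}

// words lowercases the text and splits it into a set of keywords.
func words(s string) map[string]bool {
	set := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(s)) {
		set[w] = true
	}
	return set
}

// Retrieve returns up to five entries scoring above the 0.3 threshold, using
// 40% keyword overlap + 40% semantic similarity + 20% time weight.
func Retrieve(question string, history []Entry, now time.Time) []Entry {
	q := words(question)
	type scored struct {
		e Entry
		s float64
	}
	var hits []scored
	for _, e := range history {
		t := words(e.Text)
		score := 0.4*jaccard(q, t) + 0.4*overlapRatio(q, t) + 0.2*timeWeight(e.When, now)
		if score >= 0.3 {
			hits = append(hits, scored{e, score})
		}
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].s > hits[j].s })
	if len(hits) > 5 {
		hits = hits[:5]
	}
	out := make([]Entry, len(hits))
	for i, h := range hits {
		out[i] = h.e
	}
	return out
}
```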
Context Combination Strategy
Each turn conversation context = [Structured summary] + [Relevant historical conversation] + [New question]
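A sketch of how that per-turn context could be assembled (the section labels and function name are illustrative assumptions, not the project's actual prompt format):

```go
package cim

import (
	"fmt"
	"strings"
)

// BuildContext composes the prompt for one turn:
// structured summary + relevant history fragments + the new question.
func BuildContext(summary string, relevant []string, question string) string {
	var b strings.Builder
	b.WriteString("## Current summary\n")
	b.WriteString(summary)
	b.WriteString("\n\n## Relevant history\n")
	for i, fragment := range relevant {
		fmt.Fprintf(&b, "%d. %s\n", i+1, fragment)
	}
	b.WriteString("\n## New question\n")
	b.WriteString(question)
	return b.String()
}
```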
Implemented
- [x] Structured summary system: Simulate human mental rough summaries
- [x] State update mechanism: Automatically update cognitive state after each conversation turn (gpt-4o-mini)
- [x] Error learning system: Avoid repeated mistakes through `ExcludedOptions`
- [x] Token efficiency optimization: Fixed transmission of summaries and new content, no longer passing complete message streams
- [x] Fuzzy retrieval mechanism: Automatically retrieve relevant historical conversations as reference
- [x] Multi-dimensional scoring algorithm: Comprehensive relevance assessment of keywords+semantics+time
- [x] Long conversation optimization: Time weight design suitable for continuous conversation scenarios
To Be Implemented
- [ ] Semantic understanding enhancement: Integrate more precise semantic similarity algorithms
- [ ] Keyword extraction optimization: More intelligent vocabulary extraction and weight allocation
- [ ] Dynamic threshold adjustment: Automatically adjust relevance thresholds based on conversation content
- [ ] Conversation type identification: Optimize memory strategies for different conversation scenarios
- [ ] Multi-model support: Support more LLM providers (Claude, Gemini, etc.)
TUI Example Usage
Environment Requirements
- Go 1.20 or higher
- OpenAI API key
Installation Steps
- Clone the project

  ```bash
  git clone https://github.com/pardnchiu/cim-prototype
  cd cim-prototype
  ```

- Configure the API key

  Create an `OPENAI_API_KEY` file and put your OpenAI API key in it:

  ```bash
  echo "your-openai-api-key-here" > OPENAI_API_KEY
  ```

  Or set an environment variable:

  ```bash
  export OPENAI_API_KEY="your-openai-api-key-here"
  ```

- Run the program

  ```bash
  ./cimp
  ./cimp --old # Run traditional memory mode
  ```

  or

  ```bash
  go run main.go
  go run main.go --old # Run traditional memory mode
  ```
API Key Configuration
The program will look for the OpenAI API key in the following order:
- Environment variable `OPENAI_API_KEY`
- `OPENAI_API_KEY` file in the current directory
- `OPENAI_API_KEY` file in the executable's directory
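A minimal sketch of that lookup order, assuming the fallbacks are plain text files named `OPENAI_API_KEY` (illustrative only, not necessarily the project's actual code):

```go
package cim

import (
	"os"
	"path/filepath"
	"strings"
)

// LoadAPIKey checks the environment first, then an OPENAI_API_KEY file in the
// working directory, then one next to the executable. Returns "" if none found.
func LoadAPIKey() string {
	if key := os.Getenv("OPENAI_API_KEY"); key != "" {
		return key
	}
	if data, err := os.ReadFile("OPENAI_API_KEY"); err == nil {
		return strings.TrimSpace(string(data))
	}
	if exe, err := os.Executable(); err == nil {
		if data, err := os.ReadFile(filepath.Join(filepath.Dir(exe), "OPENAI_API_KEY")); err == nil {
			return strings.TrimSpace(string(data))
		}
	}
	return ""
}
```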
Instruction File Configuration
`INSTRUCTION_CONVERSATION`
- Defines the system instructions for the main conversation model (GPT-4o)
- Affects the AI assistant's response style and behavior
- If the file doesn't exist, blank instructions are used

`INSTRUCTION_SUMMARY`
- Defines the system instructions for the summary generation model (GPT-4o-mini)
- Affects the conversation summary update logic and format
- If the file doesn't exist, blank instructions are used
Usage
- Start the program: after execution, a three-panel interface is displayed
  - Left: conversation history
  - Top right: conversation summary
  - Bottom right: question input field
- Basic operations:
  - `Enter`: submit a question
  - `Tab`: switch panel focus
  - `Ctrl+C`: exit the program
- Conversation flow:
  - After a question is entered, the system automatically retrieves relevant historical conversations
  - The AI answers based on the summary and the relevant history
  - The system automatically updates the conversation summary to maintain its memory state (wait for the summary update to finish before continuing the conversation)
License
This source code project is licensed under the MIT license.
©️ 2025 邱敬幃 Pardn Chiu