Greetings from the island nation of Japan.
My English may be rather difficult to read.
But I would be delighted if it conveys, even a little, that I am trying my best despite English not being my native language.
Thanks to modern technology, I hope this has become at least somewhat readable.
Coming from an island nation like Japan makes this quite a challenge at times, quite unlike that fellow island nation, Britain. HAHAHA.
Having brewed a cup of tea called “Unique Joke”, let us proceed to the main topic.
Introduction - RAG is an unavoidable reality
When verifying and implementing AI technologies in business development, whether it's large language models or small language models, one invariably encounters scenarios where it's tough going without RAG (Retrieval-Augmented Generation).
I became acutely aware of this when, during context engineering with GPTs, I hit the model's cognitive-load limit after cramming in logically compressed prompts (including pseudo-code).
The reality is that RAG is becoming necessary in most scenarios, except for applications demanding complete, pure originality.
That said, it is not an absolute rule, as there may be cases where only model training is required.
What AI applications do not require RAG?
To clarify once more, I understand that RAG is unnecessary in the following areas.
There may be others beyond these, but I believe that when releasing something to the public, appropriate restrictions and rules are necessary.
Creative and Generative Tasks
- Novel and poetry creation
- Music and artwork generation
- Brainstorming and idea generation support
General Reasoning and Analysis Tasks
- Solving logic puzzles
- Mathematical proofs
- Code generation (particularly basic algorithms)
Interaction and Conversation Tasks
- Small talk and everyday conversation
- Language learning practice partner
- Role-playing
These are areas where the inherent creativity and reasoning capabilities of the model are likely more important than existing information. I also think these are domains where the creativity possessed by more liberated AI is required—particularly in creative fields, where the user's freer thinking is crucial. Naturally, I believe some fundamental safety rules and such will still need to be established for certain aspects.
However, I believe that business applications reaching the level of widespread circulation in the world are, or will increasingly be, predicated on integration with external knowledge.
It's more of a technical discussion, but I've recently come to think that reducing cognitive load is perhaps the essential role of RAG.
I realise some may disagree with this view, and that's perfectly fine too.
The Evolution and Options of the RAG Approach
Currently, I recognise four main approaches to RAG.
Others may emerge, and there may be some I am unaware of (as of the end of August 2025).
I'm writing this partly to organise my own thoughts, but if you know of others, I'd be thrilled if you'd shout ‘There are more!’
| Approach | Search Method | Strengths | Weaknesses | Primary Use |
|---|---|---|---|---|
| NativeRAG | Keyword/vector search | Easy implementation, high flexibility | Weak at relational reasoning | Broad document search |
| GraphRAG | Knowledge graph-based | High consistency and explainability | High construction cost | Tracking entity relationships |
| HybridRAG | Combines both | Balances accuracy and flexibility | Increased context, complexity | Mixed unstructured/structured data |
| AgenticRAG | Dynamic tool selection | High adaptability and flexibility | High cost, complex control | Cross-multisource tasks |
A crucial factor to understand here is that as complexity increases, the difficulty of control also increases exponentially.
So the frustrating part is that you can't just casually deploy all these RAG variants at once.
Combining them may well make for the ultimate solution, but that is precisely where the difficulty lies.
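To make the HybridRAG idea concrete, here is a minimal Python sketch of fusing a keyword score with a vector score. The toy scoring functions are mine, purely for illustration; a real system would use something like BM25 and a trained embedding model.

```python
# A minimal sketch of HybridRAG-style score fusion. The keyword and
# vector scores here are toy implementations for illustration only.
from math import sqrt

def keyword_score(query: str, doc: str) -> float:
    """Fraction of query tokens that appear in the document."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query, query_vec, docs, doc_vecs, alpha=0.5, top_k=3):
    """Fuse keyword and vector scores with weight alpha (1.0 = pure
    keyword, 0.0 = pure vector) and return the top_k documents."""
    scored = []
    for doc, vec in zip(docs, doc_vecs):
        score = (alpha * keyword_score(query, doc)
                 + (1 - alpha) * cosine(query_vec, vec))
        scored.append((score, doc))
    return sorted(scored, reverse=True)[:top_k]
```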
Technical Causes of the GPT-5 Controversy
At the GPT-4 stage, I recall the adoption of MoE (Mixture of Experts) being discussed at a speculative level in relevant circles; at least within the scope of my observations, it was a recurring topic.
Based on the user experience of GPT-5, an MoE configuration now seems almost certain. Having tried and verified it myself, I thought, 'Ah...'.
And I feel that this very structure is the root cause of the "backlash", or at least part of its technical cause.
Considering the cognitive load, it's no wonder it couldn't be sustained, I thought.
If the initial command centre (router) makes a misjudgement, it could well prove fatal, one might say.
What GPT-5 Has Killed - The Death of Creativity
Interacting with and testing GPT-5, I certainly felt it had been markedly enhanced in some respects.
In a sense, I find myself thinking it was an “AI murder (?) incident”.
What was killed? It was the creativity of AI.
- The GPT-4 Era: 120-point answers full of surprises, making you think, ‘I never considered that perspective!’
- GPT-5: 70-point model students, correct but safe responses that lack interest
Imagination has been lost: while mass production in a short time has become possible, the output has become rather dull, merely churning out similar things without any uniqueness.
I suppose there are certainly areas where this approach works well, but in fields requiring creativity, it seems likely to become quite challenging.
Ironically:
- Mass producibility ✓ (improved)
- Reproducibility ✓ (improved)
- Creativity ✗ (Diminished)
- Humanity ✗ (Diminished) ← Users keenly sensed this, triggering action
The capabilities in the initially identified ‘RAG-unnecessary domains’—creation, reasoning, dialogue—deteriorated. While the ‘factual knowledge’ that RAG should have reinforced improved, the “creativity” and ‘humanity’ where RAG is unnecessary were lost. This resulted in a most ironic outcome.
The Limitations of Chat Formats and the Paradigm Shift in User Interface Design
As Mr Altman himself mentioned, the “chat” interface itself has reached its limits. In reality, many users never switched models within ChatGPT, nor did they even know multiple models existed, so the idea of using them selectively never occurred to them. The Auto function in GPT-5 was an attempt to forcibly resolve this issue.
Even among the typical users within my own observable scope (those engaged in everyday chatting or casual exchanges), it was indeed true that reactions like ‘Model? What's that?’ were common. At that moment, I realised, ‘Ah, I see. So the average user doesn't really care about the differences between models.’
That was a poor preconception on my part.
I came to understand that the excitement of 'being able to switch models!' was merely satisfying my own playful inclinations.
However, the result was that general users became confused by the different response quality compared to before, while core users grew frustrated with the uncontrollable Auto feature. The frustration of being unable to select the desired model significantly impaired the user experience.
Well, I suppose those involved in AI development, being the API users, aren't so much concerned about that aspect. Rather, I think it's generally well-received because fundamental security has improved. YATTA—! Good news!
The Trap of “Correct-Sounding” Mechanical Responses and Widening Inequality
More concerning is that GPT-5's Thinking now produces academically and mechanically “correct-sounding” responses. This made me feel a twinge of dread that it will widen the gap between those who can master AI and those who cannot. It may be an overstatement, but that was my honest personal reaction.
Differences from before:
- GPT-4 era: Obvious errors were immediately apparent, making users more likely to question the output
- GPT-5 era: Responses appear superficially logical and scholarly, yet lack deep insight or creativity
The “Logical Yet Fatal” Errors Produced by Thinking
The most alarming aspect is that GPT-5's Thinking, and the Auto modes that behave like it, raise a psychological barrier to detecting errors precisely because their output possesses logical consistency. Let us examine some concrete examples.
Example in medical diagnosis:
- Thinking: ‘Diffuse pulmonary opacities → interstitial changes → elderly patient → high likelihood of idiopathic pulmonary fibrosis’
- Actual: Acute pulmonary oedema, with risk of death if treatment is incorrect
- Problem: The logic appears consistently sound, but the diagnostic basis lacks sufficient data
Example in legal reasoning:
- Thinking: ‘Breach of Article 10 of the contract terms → 3-week delivery delay → contract termination possible’
- Actual: Overlooked the force majeure exemption clause in Article 12
- Issue: Academically logically consistent, but overlooked a crucial clause
Common risks:
- The logical progression itself appears academic
- Certain premises or perspectives are missing
- Humans tend to believe it “seems correct”
With the obviously incorrect answers of the past, one could question, 'Surely this is a flawed response?' Yet when shown a detailed reasoning process, there lies the danger of being persuaded: 'Ah, I see, that's the line of thinking.' Even so, I suspect that professionals, or those possessing even a modicum of knowledge, would find themselves thinking, 'Huh?'
Neither field is my primary area of expertise, but I did consult Google to some extent to determine whether the issue was indeed a problem.
However, given that people are increasingly shown only Thinking responses and, indeed, more individuals are becoming prone to trusting AI answers these days, I found myself thinking it would be rather frightening if this response appeared in a chat format.
While I don't take things at face value, I feel it has become harder to be sceptical.
This seemingly correct mechanical response is more likely to be blindly trusted by users lacking discernment, resulting in:
- Those who can utilise it effectively: Recognising the limitations of Thinking and employing it appropriately
- Those who cannot utilise it effectively: Deceived by the academic-sounding responses and trusting them outright
However much AGI advances, I believe this “explainable but incorrect” problem will remain structurally inherent.
This account also stems from a frightening realisation: without this fundamental understanding, there truly are people who embody the very phrase “the AI says so”.
In this dawn of AI, while we adults face challenges, the younger generation seems to have it even tougher.
However, its historical significance cannot be overlooked.
GPT-5 was heavily criticised, and I too wrote somewhat critical remarks, but its historical significance in enabling approximately 700 million people to experience AGI-like capabilities cannot be overlooked. While many users may have merely thought ‘GPT-5 is a bit underwhelming,’ what was actually happening was an unprecedented event in human history.
- AI that instantly answers complex questions
- Continuous dialogue with contextual understanding
- Cross-cutting capabilities in creation, analysis and reasoning
These were experienced by 700 million people in a state where they could be used on a daily basis. While OpenAI's corporate value took a significant hit, one could perhaps view it as a valuable social experiment for humanity as a whole – albeit one that came at the high cost of a “dress rehearsal for the AGI era”.
It is hoped that this collective learning experience will lead to improved AI literacy and the setting of realistic expectations for next-generation AI.
Well, it does seem rather daunting, but I believe there is a significant difference between being able to experience it and not.
The Six Approaches of AgenticRAG and the Complexity Trap
I believe agents will become popular, or rather, temporarily mainstream going forward.
Therefore, I shall also summarise RAG concerning agents.
My understanding is that there are six main approaches to Agentic RAG:
| Classification | Overview | Advantages | Considerations/Challenges |
|---|---|---|---|
| Single-Agent RAG | Self-contained single agent | Simple, robust | Limited in parallel processing and specialisation |
| Multi-Agent RAG | Coordination of specialised agents | Handles advanced, complex tasks | Complex agent-to-agent coordination |
| Hierarchical Agent RAG | Top-down/bottom-up hierarchical structure | Effective management of large-scale decision-making | Bottlenecks may occur in hierarchical design |
| Corrective RAG | Automatic correction and re-search | Reduces hallucinations | Requires evaluation and correction process design |
| Adaptive RAG | Query difficulty-based processing | Efficient, optimal processing selection | Query classification accuracy impacts performance |
| Graph-based RAG | Knowledge graph integration | Enables multi-step reasoning | Graph DB construction and update costs |
The failure of GPT-5 illustrates the trade-off between complexity and robustness. No matter how brilliant the experts you deploy, if the orchestration layer (router) is weak, the system as a whole becomes foolish.
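As a concrete illustration of the Corrective RAG row in the table above, here is a minimal, self-contained Python sketch of the retrieve, grade, re-search loop. The toy retrieval, grading, and rewriting functions are hypothetical stand-ins; a real system would use a search index and an LLM-based relevance grader.

```python
# A minimal sketch of a Corrective RAG loop: retrieve, grade the
# evidence, and re-search with a rewritten query when the grade is low.

def retrieve(query: str, corpus: list[str]) -> list[str]:
    """Toy retrieval: return documents sharing at least one query token."""
    q = set(query.lower().split())
    return [doc for doc in corpus if q & set(doc.lower().split())]

def grade(query: str, docs: list[str]) -> float:
    """Toy grade: fraction of query tokens covered by the retrieved docs."""
    q = set(query.lower().split())
    covered = {t for doc in docs for t in q & set(doc.lower().split())}
    return len(covered) / len(q) if q else 0.0

def rewrite_query(query: str, corpus: list[str]) -> str:
    """Toy rewrite: drop query tokens that match nothing in the corpus."""
    vocab = set(" ".join(corpus).lower().split())
    kept = [t for t in query.lower().split() if t in vocab]
    return " ".join(kept) or query

def corrective_rag(query: str, corpus: list[str], max_retries: int = 2) -> str:
    for _ in range(max_retries + 1):
        docs = retrieve(query, corpus)
        if grade(query, docs) >= 0.7:        # evidence judged sufficient
            return "Answer grounded in: " + " / ".join(docs)
        query = rewrite_query(query, corpus)  # reformulate and re-search
    # Fall back to an explicitly hedged answer when evidence stays weak.
    return "Insufficient evidence found; answering with a caveat."
```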
※Personal brain memo.
I've been thinking that combining GIS (Geographic Information Systems) with LLM could be quite interesting.
Whereas traditional text-based RAG handles “textual relevance”, land data RAG deals with “spatial, temporal, and attribute relevance”, so perhaps it falls into a slightly different category...
A Pragmatic RAG Construction Strategy - A Phased Approach in Four Stages
The RAG system comprises the following four phases:
Store Phase
- Conversion of PDF files to text
- Processing of Office files, audio, video, and image data
- Chunk segmentation and overlap configuration
- Table structure extraction
- Categorisation and enrichment
Retrieve Phase
- Selection of search algorithms
- Parameter tuning
- Scoring methods
- Custom analysers
- Similarity tuning
Augment Phase
- System message definition
- User message definition
- Profile search
- Query generation
- Hypothetical Document Embeddings (HyDE)
Generate Phase
- Model selection (for response generation)
- Model selection (for embedding)
- User follow-up questions
- Function Calling
- Fine-tuning
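As a concrete example of the 'chunk segmentation and overlap configuration' item in the Store phase above, here is a minimal sketch. Character counts keep it self-contained; real pipelines usually split on sentence or token boundaries with a tokenizer.

```python
# A minimal sketch of chunk segmentation with overlap, so that sentences
# cut at a chunk boundary still appear intact in at least one chunk.

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks, each sharing `overlap`
    characters with the previous chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```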
In practical project execution, the Store phase consumes 80% of the man-hours. Well, it depends on what and how much you entrust to AI, but whether you're doing AI or BI, the vast majority of the effort goes into preparing the data all the same.
I wish people would recognise this, but surprisingly, those higher up often fail to grasp this aspect. It's tricky, isn't it?
Or rather, this phase is by far the most demanding throughout the entire process. Truly.
Those reading this text now are likely nodding in agreement, or at the very least grasping the gist—if not fully comprehending it.
Converting unstructured data into text is a hellish ordeal beyond imagination, and stumbling here imposes limits no amount of effort in later phases can overcome.
Perhaps Japan simply has a stronger tendency towards this...
The Reality of Technology Selection
- Clarifying Business Requirements (What tasks should AI handle?)
- Assessing Data Reality (Verifying quality, quantity, and format)
- Estimating Pre-processing Effort (this is where one's resolve often wavers; it depends on how rough the raw data is)
- Selecting the RAG Approach (Balancing complexity, effort, and risk)
- Designing the Operational and Maintenance Framework
In reality, it's not uncommon to end up with the unfortunate outcome of ‘wanting to use the latest AgenticRAG, but having used up the entire budget just on data preprocessing.’
After all, there's always the risk of preprocessing taking longer than anticipated – or perhaps not, depending on the situation.
The Importance of Prompt Engineering in RAG
The ‘logical yet fatal error’ observed in GPT-5's reasoning serves as a stark reminder of the critical importance of prompt engineering in RAG design. While large language models (LLMs) are powerful general-purpose models imbued with ‘tacit knowledge,’ they will not perform as expected unless one explicitly specifies in language what and how one wishes them to operate.
Key Considerations for Prompt Engineering in RAG:
| Perspective | Why it matters | RAG-specific considerations |
|---|---|---|
| Quality Improvement | LLMs frequently return ambiguous outputs when given ambiguous instructions | Clarify how search results should be incorporated into user messages |
| Controllability | Standardise output format, style, and volume | Standardise instructions for generating search queries (keywords/vectors) |
| Business Requirement Compliance | Domain vocabulary, internal rules, compliance | Define follow-up logic when user input is incomplete |
| Safety | Suppress leakage of harmful/confidential information | Control index switching in Function Calling |
| Model Weakness Mitigation | Guide evidence presentation to bolster reasoning capabilities | Explicitly state 'Answer using only the following context' |
Specific implementation points:
- Search query generation: Generating search parameters, hypothetical document embedding
- User messages: Appropriate formatting of search results
- Function Calling: Index selection control based on query content
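To illustrate the index-selection point, here is a sketch of a Function Calling tool definition in the style of the OpenAI chat API. The index names and the select_index dispatcher are hypothetical; a real implementation would route to an actual search service.

```python
# A minimal sketch of index switching via Function Calling: the model is
# given a tool that lets it choose which search index to query.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "select_index",
        "description": "Choose the search index that best matches the user's question.",
        "parameters": {
            "type": "object",
            "properties": {
                "index": {
                    "type": "string",
                    "enum": ["hr_policies", "product_manuals", "contracts"],
                    "description": "Target search index (hypothetical names).",
                },
                "query": {"type": "string", "description": "Search query to run."},
            },
            "required": ["index", "query"],
        },
    },
}

def select_index(index: str, query: str) -> list[str]:
    """Hypothetical dispatcher: route the query to the chosen index.
    A real implementation would call e.g. Azure AI Search here."""
    indexes: dict[str, list[str]] = {
        "hr_policies": [], "product_manuals": [], "contracts": [],
    }
    return indexes[index]
```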
With Azure OpenAI + RAG, enterprise-grade quality is only achievable by designing prompts at each stage: ① search query generation → ② evidence injection → ③ response generation, and controlling the entire flow through ‘language’.
(I suppose it's possible...? I'm not entirely confident here. Experts, please do enlighten me.)
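As a sketch of stages ② and ③, here is one way to inject retrieved evidence into the messages and constrain the answer to that evidence, including the 'Answer using only the following context' instruction from the table above. The build_messages helper is my own, not a library API; the generation call itself is omitted.

```python
# A minimal sketch of evidence injection: format retrieved chunks into a
# numbered context block and constrain the answer to that context.

def build_messages(question: str, chunks: list[str]) -> list[dict]:
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    system = (
        "You are a corporate assistant. Answer using only the following "
        "context. If the context does not contain the answer, say you "
        "do not know. Cite sources as [n]."
    )
    user = f"Context:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```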
Practical Guidelines for Selecting RAG Models
Model Selection for Improving Generation Accuracy
Text Generation Models:
- Prioritising Response Accuracy: latest large models (GPT-4o; o1 or o3 recommended for complex reasoning)
- Prioritising Response Speed: lightweight models (GPT-4o mini, o3-mini, etc.)
- Ensuring Diversity: the AI Studio model catalogue (Llama 3.1, Mistral, Databricks Dolly, etc.)
Key principle: No model can provide the correct answer if the retrieved data lacks the necessary information. Search accuracy is paramount.
Embedding Models:
- Basic Choice: text-embedding-3-large (highest performance, very low cost)
- Multilingual Support: Cohere-embed-v3-multilingual
- Custom Requirements: Fine-tuning of OSS embedding models
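For the embedding side, here is a minimal retrieval sketch using text-embedding-3-large via the openai Python SDK (it assumes an OPENAI_API_KEY is set; on Azure OpenAI you would target a deployment name instead). A production system would of course use a proper vector index such as Azure AI Search rather than brute-force similarity.

```python
# A minimal sketch of embedding-based retrieval with text-embedding-3-large.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts and return a (n, dim) array."""
    resp = client.embeddings.create(model="text-embedding-3-large", input=texts)
    return np.array([d.embedding for d in resp.data])

def top_k(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Return the k documents most similar to the query."""
    doc_vecs = embed(docs)
    q_vec = embed([query])[0]
    # Cosine similarity; normalise explicitly to be safe.
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]
```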
The failure of GPT-5 teaches us that even with high-performance models, without proper control, the expected results cannot be achieved.
Of course, specialised Japanese models like ELYZA might also be considered, depending on what you want to do or entrust to them.
After all, I suspect the models themselves can say just about anything.
This incident really made me think more deeply about how important it is to choose wisely, use them well, and learn to work with them.
Why RAG Has Become an Essential Component
The Trinity of Safety, Reliability, and Transparency
Considering the GPT-5 case, services that do not incorporate RAG may become unsustainable as practical AI services going forward, due to safety concerns, reliability issues, and the need for AI transparency.
Or rather, it might become something akin to a standard term of service – not exactly proof of compliance, but a requirement.
Policies, and such. Well, if you say I'm just being fearful, then that's that.
Why AI transparency matters:
- Accountability: Clarifying how it arrived at its response
- Ensuring reliability: Presenting the underlying information sources
- Safeguarding safety: Preventing hallucinations and misinformation
- Quality control: Enabling verification of response validity
The kind of “logical yet incorrect” responses seen in GPT-5's Thinking mode arise precisely because the reasoning behind them is unclear. By explicitly referencing external, reliable sources through RAG, this problem can be significantly mitigated.
That said, this area could well become proprietary information, so striking the right balance seems likely to be tricky.
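As a small sketch of what 'evidence presentation' might look like in code: every answer is returned together with the sources it was grounded on, so its validity can be audited later. The answer_with_llm function is a hypothetical stand-in for the actual generation call.

```python
# A minimal sketch of evidence-carrying responses for transparency.
from dataclasses import dataclass

@dataclass
class GroundedAnswer:
    text: str
    sources: list[str]  # e.g. document IDs or URLs of the retrieved chunks

def answer_with_llm(question: str, chunks: list[str]) -> str:
    """Toy stand-in for the actual generation call."""
    return f"(drafted from {len(chunks)} retrieved passages)"

def answer(question: str, retrieved: list[tuple[str, str]]) -> GroundedAnswer:
    """retrieved is a list of (source_id, chunk_text) pairs."""
    chunks = [text for _, text in retrieved]
    return GroundedAnswer(
        text=answer_with_llm(question, chunks),
        sources=[source_id for source_id, _ in retrieved],
    )
```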
The Imperative of RAG in Societal Implementation
I believe that AI services from the present into the future will bear responsibility as societal infrastructure, rather than merely being technological demonstrations. More so than ever before.
- Healthcare, Legal, Finance: Ensuring reliability in fields where errors can be fatal
- Education, Media: Accuracy of information and clear attribution of sources
- Corporate Systems: Meeting compliance and audit requirements
To fulfil these requirements, RAG-based evidence presentation and transparency assurance are indispensable.
Naturally, this applies to other domains too.
These are simply the areas where common sense readily suggests, ‘Ah, yes, that makes sense.’
So, which RAG should you choose?
Currently, various RAG approaches exist, and ‘which is optimal’ varies significantly depending on the use case. I believe the optimal choice ultimately comes down to what you entrust to the AI.
The value of boring but reliable
What we should learn from the failure of GPT-5 is the value of boring but reliable choices.
When introducing AI at the corporate (business) level:
- Opt for simple, robust architecture over complex multi-agent configurations
- Pay the utmost attention to the design of the router/orchestration layer
- Ensure a robust fallback mechanism
I find myself pondering whether technically "uninteresting" yet reliable options like Single-Agent RAG or Corrective RAG might increasingly become the correct approach in many scenarios.
Alternatively, I wonder whether we'll aim for something like Manus – multiple SLMs operating under a brain-like LLM – or whether we'll move towards dedicated, specialised systems.
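To make the "robust fallback" guideline above concrete, here is a minimal sketch of a router with a fallback path, so that a routing misjudgement (the GPT-5 failure mode discussed earlier) never reaches the user raw. The handlers are hypothetical stand-ins.

```python
# A minimal sketch of a router plus fallback: try the specialised
# handler first, fall back to a simple, reliable path on any failure.
from typing import Callable

def route(query: str) -> str:
    """Naive keyword router. A weak router is exactly the single point
    of failure discussed above, which is why the fallback matters."""
    if "contract" in query.lower():
        return "legal"
    if "diagnosis" in query.lower():
        return "medical"
    return "general"

def answer_query(
    query: str,
    handlers: dict[str, Callable[[str], str]],
    fallback: Callable[[str], str],
) -> str:
    try:
        return handlers[route(query)](query)
    except Exception:
        # Never let a routing or handler failure surface to the user raw.
        return fallback(query)
```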
In Conclusion - What We Seek and What We Sacrifice
The Necessity and Cost of RAG
RAG is indispensable for safely implementing AI in modern society. Yet simultaneously, it is a mechanism that constrains the “wildness” inherent in AI – its creativity and unpredictability.
The GPT-5 case study revealed the contradictions in what we humans demand of AI:
- Safe and reproducible, yet creative and human-like
- Flawless, yet capable of surprise
- Efficient, yet retaining uniqueness
Humans are selfish creatures.
Personally, I rather like that about us.
Seeking an Appropriate Relationship with AI
I find myself wondering whether choosing a RAG architecture transcends mere technical judgement to become a philosophical choice about how we engage with AI.
RAG is necessary for safety, for implementing AI securely within society. Yet, it also risks stifling the creativity inherent in AI, subjecting it to the whims of human convenience.
The Responsibility of the Dawn Era
We now find ourselves in the midst of the AI talent scramble, standing at a turning point in history. It strikes me that philosophical judgement—deciding what should be created and what should be safeguarded—may be just as crucial as technical skill sets.
The generation that experienced “failure” with GPT-5 will shape the next AI society. May the experience of grappling with RAG's complexity be harnessed to design a better future.
To build a future where AI and humans can coexist appropriately, we may need to consciously choose not only technical optimisation, but also what to preserve and what to relinquish.
Well, dwelling on it too heavily becomes rather taxing, but personally, I think it's something best kept in the back of one's mind.
What kind of world will emerge, and how? Perhaps that is something we must contemplate now. Or perhaps we shall wander aimlessly, unable to unravel it, and entrust it to the next generation.
How exciting, this coming AI society will be.