Greetings from the island nation of Japan.
My English may be rather difficult to read.
But I would be delighted if it conveys, even a little, that I am trying my best despite English not being my native language.
Thanks to modern technology, I hope this has become at least somewhat readable.
Coming from an island nation like Japan makes this quite a challenge at times, quite unlike that fellow island nation, Britain. HAHAHA.
Having brewed a cup of tea called “Unique Joke”, let us proceed to the main topic.
Introduction - RAG is an unavoidable reality
When verifying and implementing AI technologies in business development, whether it's large language models or small language models, one invariably encounters scenarios where it's tough going without RAG (Retrieval-Augmented Generation).
I became acutely aware of this when, during context engineering with GPTs, I hit the model's cognitive-load limit after cramming in logically compressed prompts (including pseudo-code).
The reality is that RAG is becoming necessary in most scenarios, except for applications demanding complete, pure originality.
That said, it is not an absolute rule, as there may be cases where only model training is required.
What AI applications do not require RAG?
To clarify once more, I understand that RAG is unnecessary in the following areas.
There may be others beyond these, but I believe that when releasing something to the public, appropriate restrictions and rules are necessary.
Creative and Generative Tasks
- Novel and poetry creation
- Music and artwork generation
- Brainstorming and idea generation support
General Reasoning and Analysis Tasks
- Solving logic puzzles
- Mathematical proofs
- Code generation (particularly basic algorithms)
Interaction and Conversation Tasks
- Small talk and everyday conversation
- Language learning practice partner
- Role-playing
These are areas where the inherent creativity and reasoning capabilities of the model are likely more important than existing information. I also think these are domains where the creativity possessed by more liberated AI is required—particularly in creative fields, where the user's freer thinking is crucial. Naturally, I believe some fundamental safety rules and such will still need to be established for certain aspects.
However, I believe that business applications reaching the level of widespread circulation in the world are, or will increasingly be, predicated on integration with external knowledge.
It's more of a technical discussion, but I've recently come to think that reducing cognitive load is perhaps the essential role of RAG.
I realise some may disagree with this view, and that's perfectly fine too.
The Evolution and Options of the RAG Approach
Currently, I recognise four main approaches to RAG.
Others may emerge, and there may be some I am unaware of (as of the end of August 2025).
I'm writing this partly to organise my own thoughts, but if you know of others, I'd be thrilled if you'd shout ‘There are more!’
| Approach | Search Method | Strengths | Weaknesses | Primary Use |
|---|---|---|---|---|
| NativeRAG | Keyword/vector search | Easy implementation, high flexibility | Weak at relational reasoning | Broad document search |
| GraphRAG | Knowledge graph-based | High consistency and explainability | High construction cost | Tracking entity relationships |
| HybridRAG | Combines both | Balances accuracy and flexibility | Increased context, complexity | Mixed unstructured/structured data |
| AgenticRAG | Dynamic tool selection | High adaptability and flexibility | High cost, complex control | Cross-multisource tasks |
A crucial factor to understand here is that as complexity increases, the difficulty of control also increases exponentially.
So the frustrating part is that you can't just casually deploy all these RAG variants at once.
Combining them may well make for the ultimate solution, but that is precisely where the difficulty lies.
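To make the HybridRAG idea concrete, here is a minimal Python sketch of fusing a keyword score with a vector score. The toy scoring functions are mine, purely for illustration; a real system would use something like BM25 and a trained embedding model.

```python
# A minimal sketch of HybridRAG-style score fusion. The keyword and
# vector scores here are toy implementations for illustration only.
from math import sqrt

def keyword_score(query: str, doc: str) -> float:
    """Fraction of query tokens that appear in the document."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query, query_vec, docs, doc_vecs, alpha=0.5, top_k=3):
    """Fuse keyword and vector scores with weight alpha (1.0 = pure
    keyword, 0.0 = pure vector) and return the top_k documents."""
    scored = []
    for doc, vec in zip(docs, doc_vecs):
        score = (alpha * keyword_score(query, doc)
                 + (1 - alpha) * cosine(query_vec, vec))
        scored.append((score, doc))
    return sorted(scored, reverse=True)[:top_k]
```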
Technical Causes of the GPT-5 Controversy
At the GPT-4 stage, I recall the adoption of MoE (Mixture of Experts) being discussed at a speculative level in relevant circles; at least within the scope of my observations, it was a recurring topic.
Based on the user experience of GPT-5, an MoE configuration now seems almost certain. Having tried and verified it myself, I thought, 'Ah...'.
And I feel that this very structure is the root cause of the "backlash", or at least part of its technical cause.
Considering the cognitive load, it's no wonder it couldn't be sustained, I thought.
If the initial command centre (router) makes a misjudgement, it could well prove fatal, one might say.
What GPT-5 Has Killed - The Death of Creativity
Interacting with and testing GPT-5, I certainly felt it had been markedly enhanced in some respects.
In a sense, I find myself thinking it was an “AI murder (?) incident”.
What was killed? It was the creativity of AI.
- The GPT-4 Era: 120-point answers full of surprises, making you think, ‘I never considered that perspective!’
- GPT-5: 70-point model students, correct but safe responses that lack interest
Imagination has been lost: while mass production in a short time has become possible, the output has become rather dull, merely churning out similar things without any uniqueness.
I suppose there are certainly areas where this approach works well, but in fields requiring creativity, it seems likely to become quite challenging.
Ironically:
- Mass producibility ✓ (improved)
- Reproducibility ✓ (improved)
- Creativity ✗ (Diminished)
- Humanity ✗ (Diminished) ← Users keenly sensed this, triggering action
The capabilities in the initially identified ‘RAG-unnecessary domains’—creation, reasoning, dialogue—deteriorated. While the ‘factual knowledge’ that RAG should have reinforced improved, the “creativity” and ‘humanity’ where RAG is unnecessary were lost. This resulted in a most ironic outcome.
The Limitations of Chat Formats and the Paradigm Shift in User Interface Design
As Mr Altman himself mentioned, the “chat” interface itself has reached its limits. In reality, many users never switched models within ChatGPT, nor did they even know multiple models existed, so the idea of using them selectively never occurred to them. The Auto function in GPT-5 was an attempt to forcibly resolve this issue.
Even among the typical users within my own observable scope (those engaged in everyday chatting or casual exchanges), it was indeed true that reactions like ‘Model? What's that?’ were common. At that moment, I realised, ‘Ah, I see. So the average user doesn't really care about the differences between models.’
That was a poor preconception on my part.
I came to understand that the excitement of 'being able to switch models!' was merely satisfying my own playful inclinations.
However, the result was that general users became confused by the different response quality compared to before, while core users grew frustrated with the uncontrollable Auto feature. The frustration of being unable to select the desired model significantly impaired the user experience.
Well, I suppose those involved in AI development, being the API users, aren't so much concerned about that aspect. Rather, I think it's generally well-received because fundamental security has improved. YATTA—! Good news!
The Trap of “Correct-Sounding” Mechanical Responses and Widening Inequality
More concerning is that GPT-5's Thinking now produces academically and mechanically “correct-sounding” responses. This made me feel a twinge of dread that it will widen the gap between those who can master AI and those who cannot. It may be an overstatement, but that was my honest personal reaction.
Differences from before:
- GPT-4 era: Obvious errors were immediately apparent, making users more likely to question the output
- GPT-5 era: Responses appear superficially logical and scholarly, yet lack deep insight or creativity
The “Logical Yet Fatal” Errors Produced by Thinking
The most alarming aspect is that GPT-5's Thinking, and the Auto modes that behave like it, raise a psychological barrier to detecting errors precisely because their output possesses logical consistency. Let us examine some concrete examples.
Example in medical diagnosis:
- Thinking: ‘Diffuse pulmonary opacities → interstitial changes → elderly patient → high likelihood of idiopathic pulmonary fibrosis’
- Actual: Acute pulmonary oedema, with risk of death if treatment is incorrect
- Problem: The logic appears consistently sound, but the diagnostic basis lacks sufficient data
Example in legal reasoning:
- Thinking: ‘Breach of Article 10 of the contract terms → 3-week delivery delay → contract termination possible’
- Actual: Overlooked the force majeure exemption clause in Article 12
- Issue: Academically logically consistent, but overlooked a crucial clause
Common risks:
- The logical progression itself appears academic
- Certain premises or perspectives are missing
- Humans tend to believe it “seems correct”
With the obviously incorrect answers of the past, one could question, 'Surely this is a flawed response?' Yet when shown a detailed reasoning process, there lies the danger of being persuaded: 'Ah, I see, that's the line of thinking.' Even so, I suspect that professionals, or those possessing even a modicum of knowledge, would find themselves thinking, 'Huh?'
Neither field is my primary area of expertise, but I did consult Google to some extent to determine whether the issue was indeed a problem.
However, given that people are increasingly shown only Thinking responses and, indeed, more individuals are becoming prone to trusting AI answers these days, I found myself thinking it would be rather frightening if this response appeared in a chat format.
While I don't take things at face value, I feel it has become harder to be sceptical.
This seemingly correct mechanical response is more likely to be blindly trusted by users lacking discernment, resulting in:
- Those who can utilise it effectively: Recognising the limitations of Thinking and employing it appropriately
- Those who cannot utilise it effectively: Deceived by the academic-sounding responses and trusting them outright
However much AGI advances, I believe this “explainable but incorrect” problem will remain structurally inherent.
This account also stems from a frightening realisation: without this fundamental understanding, there truly are people who embody the very phrase “the AI says so”.
In this dawn of AI, while we adults face challenges, the younger generation seems to have it even tougher.
However, its historical significance cannot be overlooked.
GPT-5 was heavily criticised, and I too wrote somewhat critical remarks, but its historical significance in enabling approximately 700 million people to experience AGI-like capabilities cannot be overlooked. While many users may have merely thought ‘GPT-5 is a bit underwhelming,’ what was actually happening was an unprecedented event in human history.
- AI that instantly answers complex questions
- Continuous dialogue with contextual understanding
- Cross-cutting capabilities in creation, analysis and reasoning
These were experienced by 700 million people in a state where they could be used on a daily basis. While OpenAI's corporate value took a significant hit, one could perhaps view it as a valuable social experiment for humanity as a whole – albeit one that came at the high cost of a “dress rehearsal for the AGI era”.
It is hoped that this collective learning experience will lead to improved AI literacy and the setting of realistic expectations for next-generation AI.
Well, it does seem rather daunting, but I believe there is a significant difference between being able to experience it and not.
The Six Approaches of AgenticRAG and the Complexity Trap
I believe agents will become popular, or rather, temporarily mainstream going forward.
Therefore, I shall also summarise RAG concerning agents.
My understanding is that there are six main approaches to Agentic RAG:
| Classification | Overview | Advantages | Considerations/Challenges |
|---|---|---|---|
| Single-Agent RAG | Self-contained single agent | Simple, robust | Limited in parallel processing and specialisation |
| Multi-Agent RAG | Coordination of specialised agents | Handles advanced, complex tasks | Complex agent-to-agent coordination |
| Hierarchical Agent RAG | Top-down/bottom-up hierarchical structure | Effective management of large-scale decision-making | Bottlenecks may occur in hierarchical design |
| Corrective RAG | Automatic correction and re-search | Reduces hallucinations | Requires evaluation and correction process design |
| Adaptive RAG | Query difficulty-based processing | Efficient, optimal processing selection | Query classification accuracy impacts performance |
| Graph-based RAG | Knowledge graph integration | Enables multi-step reasoning | Graph DB construction and update costs |
The failure of GPT-5 illustrates the trade-off between complexity and robustness. No matter how brilliant the experts you deploy, if the orchestration layer (router) is weak, the system as a whole becomes foolish.
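As a concrete illustration of the Corrective RAG row in the table above, here is a minimal, self-contained Python sketch of the retrieve, grade, re-search loop. The toy retrieval, grading, and rewriting functions are hypothetical stand-ins; a real system would use a search index and an LLM-based relevance grader.

```python
# A minimal sketch of a Corrective RAG loop: retrieve, grade the
# evidence, and re-search with a rewritten query when the grade is low.

def retrieve(query: str, corpus: list[str]) -> list[str]:
    """Toy retrieval: return documents sharing at least one query token."""
    q = set(query.lower().split())
    return [doc for doc in corpus if q & set(doc.lower().split())]

def grade(query: str, docs: list[str]) -> float:
    """Toy grade: fraction of query tokens covered by the retrieved docs."""
    q = set(query.lower().split())
    covered = {t for doc in docs for t in q & set(doc.lower().split())}
    return len(covered) / len(q) if q else 0.0

def rewrite_query(query: str, corpus: list[str]) -> str:
    """Toy rewrite: drop query tokens that match nothing in the corpus."""
    vocab = set(" ".join(corpus).lower().split())
    kept = [t for t in query.lower().split() if t in vocab]
    return " ".join(kept) or query

def corrective_rag(query: str, corpus: list[str], max_retries: int = 2) -> str:
    for _ in range(max_retries + 1):
        docs = retrieve(query, corpus)
        if grade(query, docs) >= 0.7:        # evidence judged sufficient
            return "Answer grounded in: " + " / ".join(docs)
        query = rewrite_query(query, corpus)  # reformulate and re-search
    # Fall back to an explicitly hedged answer when evidence stays weak.
    return "Insufficient evidence found; answering with a caveat."
```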
※Personal brain memo.
I've been thinking that combining GIS (Geographic Information Systems) with LLM could be quite interesting.
Whereas traditional text-based RAG handles “textual relevance”, land data RAG deals with “spatial, temporal, and attribute relevance”, so perhaps it falls into a slightly different category...
A Pragmatic RAG Construction Strategy - A Phased Approach in Four Stages
The RAG system comprises the following four phases:
Store Phase
- Conversion of PDF files to text
- Processing of Office files, audio, video, and image data
- Chunk segmentation and overlap configuration
- Table structure extraction
- Categorisation and enrichment
Retrieve Phase
- Selection of search algorithms
- Parameter tuning
- Scoring methods
- Custom analysers
- Similarity tuning
Augment Phase
- System message definition
- User message definition
- Profile search
- Query generation
- Hypothetical Document Embeddings (HyDE)
Generate Phase
- Model selection (for response generation)
- Model selection (for embedding)
- User follow-up questions
- Function Calling
- Fine-tuning
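As a concrete example of the 'chunk segmentation and overlap configuration' item in the Store phase above, here is a minimal sketch. Character counts keep it self-contained; real pipelines usually split on sentence or token boundaries with a tokenizer.

```python
# A minimal sketch of chunk segmentation with overlap, so that sentences
# cut at a chunk boundary still appear intact in at least one chunk.

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks, each sharing `overlap`
    characters with the previous chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```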
In practical project execution, the Store phase consumes 80% of the man-hours. Well, it depends on what and how much you entrust to AI, but whether you're doing AI or BI, the vast majority of the effort goes into preparing the data all the same.
I wish people would recognise this, but surprisingly, those higher up often fail to grasp this aspect. It's tricky, isn't it?
Or rather, this phase is by far the most demanding throughout the entire process. Truly.
Those reading this text now are likely nodding in agreement, or at the very least grasping the gist—if not fully comprehending it.
Converting unstructured data into text is a hellish ordeal beyond imagination, and stumbling here imposes limits no amount of effort in later phases can overcome.
Perhaps Japan simply has a stronger tendency towards this...
The Reality of Technology Selection
- Clarifying Business Requirements (What tasks should AI handle?)
- Assessing Data Reality (Verifying quality, quantity, and format)
- Estimating Pre-processing Effort (this is where one's resolve often wavers; it depends on how rough the raw data is)
- Selecting the RAG Approach (Balancing complexity, effort, and risk)
- Designing the Operational and Maintenance Framework
In reality, it's not uncommon to end up with the unfortunate outcome of ‘wanting to use the latest AgenticRAG, but having used up the entire budget just on data preprocessing.’
After all, there's always the risk of preprocessing taking longer than anticipated – or perhaps not, depending on the situation.
The Importance of Prompt Engineering in RAG
The ‘logical yet fatal error’ observed in GPT-5's reasoning serves as a stark reminder of the critical importance of prompt engineering in RAG design. While large language models (LLMs) are powerful general-purpose models imbued with ‘tacit knowledge,’ they will not perform as expected unless one explicitly specifies in language what and how one wishes them to operate.
Key Considerations for Prompt Engineering in RAG:
| Perspective | Why it matters | RAG-specific considerations |
|---|---|---|
| Quality Improvement | LLMs frequently return ambiguous outputs when given ambiguous instructions | Clarify how search results should be incorporated into user messages |
| Controllability | Standardise output format, style, and volume | Standardise instructions for generating search queries (keywords/vectors) |
| Business Requirement Compliance | Domain vocabulary, internal rules, compliance | Define follow-up logic when user input is incomplete |
| Safety | Suppress leakage of harmful/confidential information | Control index switching in Function Calling |
| Model Weakness Mitigation | Guide evidence presentation to bolster reasoning capabilities | Explicitly state 'Answer using only the following context' |
Specific implementation points:
- Search query generation: Generating search parameters, hypothetical document embedding
- User messages: Appropriate formatting of search results
- Function Calling: Index selection control based on query content
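To illustrate the index-selection point, here is a sketch of a Function Calling tool definition in the style of the OpenAI chat API. The index names and the select_index dispatcher are hypothetical; a real implementation would route to an actual search service.

```python
# A minimal sketch of index switching via Function Calling: the model is
# given a tool that lets it choose which search index to query.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "select_index",
        "description": "Choose the search index that best matches the user's question.",
        "parameters": {
            "type": "object",
            "properties": {
                "index": {
                    "type": "string",
                    "enum": ["hr_policies", "product_manuals", "contracts"],
                    "description": "Target search index (hypothetical names).",
                },
                "query": {"type": "string", "description": "Search query to run."},
            },
            "required": ["index", "query"],
        },
    },
}

def select_index(index: str, query: str) -> list[str]:
    """Hypothetical dispatcher: route the query to the chosen index.
    A real implementation would call e.g. Azure AI Search here."""
    indexes: dict[str, list[str]] = {
        "hr_policies": [], "product_manuals": [], "contracts": [],
    }
    return indexes[index]
```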
With Azure OpenAI + RAG, enterprise-grade quality is only achievable by designing prompts at each stage: ① search query generation → ② evidence injection → ③ response generation, and controlling the entire flow through ‘language’.
(I suppose it's possible...? I'm not entirely confident here. Experts, please do enlighten me.)
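As a sketch of stages ② and ③, here is one way to inject retrieved evidence into the messages and constrain the answer to that evidence, including the 'Answer using only the following context' instruction from the table above. The build_messages helper is my own, not a library API; the generation call itself is omitted.

```python
# A minimal sketch of evidence injection: format retrieved chunks into a
# numbered context block and constrain the answer to that context.

def build_messages(question: str, chunks: list[str]) -> list[dict]:
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    system = (
        "You are a corporate assistant. Answer using only the following "
        "context. If the context does not contain the answer, say you "
        "do not know. Cite sources as [n]."
    )
    user = f"Context:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```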
Practical Guidelines for Selecting RAG Models
Model Selection for Improving Generation Accuracy
Text Generation Models:
- Prioritising Response Accuracy: latest large models (GPT-4o; o1 or o3 recommended for complex reasoning)
- Prioritising Response Speed: lightweight models (GPT-4o mini, o3-mini, etc.)
- Ensuring Diversity: the AI Studio model catalogue (Llama 3.1, Mistral, Databricks Dolly, etc.)
Key principle: No model can provide the correct answer if the retrieved data lacks the necessary information. Search accuracy is paramount.
Embedding Models:
- Basic Choice: text-embedding-3-large (highest performance, very low cost)
- Multilingual Support: Cohere-embed-v3-multilingual
- Custom Requirements: Fine-tuning of OSS embedding models
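For the embedding side, here is a minimal retrieval sketch using text-embedding-3-large via the openai Python SDK (it assumes an OPENAI_API_KEY is set; on Azure OpenAI you would target a deployment name instead). A production system would of course use a proper vector index such as Azure AI Search rather than brute-force similarity.

```python
# A minimal sketch of embedding-based retrieval with text-embedding-3-large.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts and return a (n, dim) array."""
    resp = client.embeddings.create(model="text-embedding-3-large", input=texts)
    return np.array([d.embedding for d in resp.data])

def top_k(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Return the k documents most similar to the query."""
    doc_vecs = embed(docs)
    q_vec = embed([query])[0]
    # Cosine similarity; normalise explicitly to be safe.
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]
```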
The failure of GPT-5 teaches us that even with high-performance models, without proper control, the expected results cannot be achieved.
Of course, specialised Japanese models like ELYZA might also be considered, depending on what you want to do or entrust to them.
After all, I suspect the models themselves can say just about anything.
This incident really made me think more deeply about how important it is to choose wisely, use them well, and learn to work with them.
Why RAG Has Become an Essential Component
The Trinity of Safety, Reliability, and Transparency
Considering the GPT-5 case, services that do not incorporate RAG may become unsustainable as practical AI services going forward, due to safety concerns, reliability issues, and the need for AI transparency.
Or rather, it might become something akin to a standard term of service – not exactly proof of compliance, but a requirement.
Policies, and such. Well, if you say I'm just being fearful, then that's that.
Why AI transparency matters:
- Accountability: Clarifying how it arrived at its response
- Ensuring reliability: Presenting the underlying information sources
- Safeguarding safety: Preventing hallucinations and misinformation
- Quality control: Enabling verification of response validity
The kind of “logical yet incorrect” responses seen in GPT-5's Thinking mode arise precisely because the reasoning behind them is unclear. By explicitly referencing external, reliable sources through RAG, this problem can be significantly mitigated.
That said, this area could well become proprietary information, so striking the right balance seems likely to be tricky.
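As a small sketch of what 'evidence presentation' might look like in code: every answer is returned together with the sources it was grounded on, so its validity can be audited later. The answer_with_llm function is a hypothetical stand-in for the actual generation call.

```python
# A minimal sketch of evidence-carrying responses for transparency.
from dataclasses import dataclass

@dataclass
class GroundedAnswer:
    text: str
    sources: list[str]  # e.g. document IDs or URLs of the retrieved chunks

def answer_with_llm(question: str, chunks: list[str]) -> str:
    """Toy stand-in for the actual generation call."""
    return f"(drafted from {len(chunks)} retrieved passages)"

def answer(question: str, retrieved: list[tuple[str, str]]) -> GroundedAnswer:
    """retrieved is a list of (source_id, chunk_text) pairs."""
    chunks = [text for _, text in retrieved]
    return GroundedAnswer(
        text=answer_with_llm(question, chunks),
        sources=[source_id for source_id, _ in retrieved],
    )
```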
The Imperative of RAG in Societal Implementation
I believe that AI services from the present into the future will bear responsibility as societal infrastructure, rather than merely being technological demonstrations. More so than ever before.
- Healthcare, Legal, Finance: Ensuring reliability in fields where errors can be fatal
- Education, Media: Accuracy of information and clear attribution of sources
- Corporate Systems: Meeting compliance and audit requirements
To fulfil these requirements, RAG-based evidence presentation and transparency assurance are indispensable.
Naturally, this applies to other domains too.
These are simply the areas where common sense readily suggests, ‘Ah, yes, that makes sense.’
So, which RAG should you choose?
Currently, various RAG approaches exist, and ‘which is optimal’ varies significantly depending on the use case. I believe the optimal choice ultimately comes down to what you entrust to the AI.
The value of boring but reliable
What we should learn from the failure of GPT-5 is the value of boring but reliable choices.
When introducing AI at the corporate (business) level:
- Opt for simple, robust architecture over complex multi-agent configurations
- Pay the utmost attention to the design of the router/orchestration layer
- Ensure a robust fallback mechanism
I find myself pondering whether technically "uninteresting" yet reliable options like Single-Agent RAG or Corrective RAG might increasingly become the correct approach in many scenarios.
Alternatively, I wonder whether we'll aim for something like Manus – multiple SLMs operating under a brain-like LLM – or whether we'll move towards dedicated, specialised systems.
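To make the "robust fallback" guideline above concrete, here is a minimal sketch of a router with a fallback path, so that a routing misjudgement (the GPT-5 failure mode discussed earlier) never reaches the user raw. The handlers are hypothetical stand-ins.

```python
# A minimal sketch of a router plus fallback: try the specialised
# handler first, fall back to a simple, reliable path on any failure.
from typing import Callable

def route(query: str) -> str:
    """Naive keyword router. A weak router is exactly the single point
    of failure discussed above, which is why the fallback matters."""
    if "contract" in query.lower():
        return "legal"
    if "diagnosis" in query.lower():
        return "medical"
    return "general"

def answer_query(
    query: str,
    handlers: dict[str, Callable[[str], str]],
    fallback: Callable[[str], str],
) -> str:
    try:
        return handlers[route(query)](query)
    except Exception:
        # Never let a routing or handler failure surface to the user raw.
        return fallback(query)
```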
In Conclusion - What We Seek and What We Sacrifice
The Necessity and Cost of RAG
RAG is indispensable for safely implementing AI in modern society. Yet simultaneously, it is a mechanism that constrains the “wildness” inherent in AI – its creativity and unpredictability.
The GPT-5 case study revealed the contradictions in what we humans demand of AI:
- Safe and reproducible, yet creative and human-like
- Flawless, yet capable of surprise
- Efficient, yet retaining uniqueness
Humans are selfish creatures.
Personally, I rather like that about us.
Seeking an Appropriate Relationship with AI
I find myself wondering whether choosing a RAG architecture transcends mere technical judgement to become a philosophical choice about how we engage with AI.
RAG is necessary for safety, for implementing AI securely within society. Yet, it also risks stifling the creativity inherent in AI, subjecting it to the whims of human convenience.
The Responsibility of the Dawn Era
We now find ourselves in the midst of the AI talent scramble, standing at a turning point in history. It strikes me that philosophical judgement—deciding what should be created and what should be safeguarded—may be just as crucial as technical skill sets.
The generation that experienced “failure” with GPT-5 will shape the next AI society. May the experience of grappling with RAG's complexity be harnessed to design a better future.
To build a future where AI and humans can coexist appropriately, we may need to consciously choose not only technical optimisation, but also what to preserve and what to relinquish.
Well, dwelling on it too heavily becomes rather taxing, but personally, I think it's something best kept in the back of one's mind.
What kind of world will emerge, and how? Perhaps that is something we must contemplate now. Or perhaps we shall wander aimlessly, unable to unravel it, and entrust it to the next generation.
How exciting, this coming AI society will be.