The "Yes Man" Problem If you've built a RAG application, you've seen it: you ask a leading question with a false premise, and the LLM happily hallucinates evidence to support you. This is called Sycophancy, and it's a silent killer for trust in AI.
I wanted to build a system that wasn't just a "search engine wrapper" but a rigorous investigator—something that could verify claims even if I, the user, was wrong.
Enter FailSafe.
The Architecture: Defense in Depth

FailSafe treats verification like a cybersecurity problem. It uses multiple layers of filters to ensure only high-quality facts survive.
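To make the layering concrete, here is a minimal sketch of what a short-circuiting filter pipeline could look like. The `Verdict` and `Layer` names are hypothetical illustrations, not FailSafe's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch -- FailSafe's real layer interfaces may differ.
@dataclass
class Verdict:
    accepted: bool
    reason: str

# Each layer inspects a claim and either passes it through or rejects it.
Layer = Callable[[str], Verdict]

def run_pipeline(claim: str, layers: list[Layer]) -> Verdict:
    """Apply filters in order; the first rejection short-circuits the pipeline."""
    for layer in layers:
        verdict = layer(claim)
        if not verdict.accepted:
            return verdict
    return Verdict(accepted=True, reason="passed all layers")
```

The point of the short-circuit is cost: cheap layers run first, so expensive LLM calls only ever see inputs that survived everything before them.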
The Statistical Firewall (Layer 0)

Why waste tokens on garbage? We use Shannon Entropy and lexical analysis to reject low-quality inputs instantly. It’s a "zero-cost early exit" strategy.
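As an illustration, a firewall like this fits in a few lines of pure Python. The thresholds below are made up for the example; they are not FailSafe's tuned values.

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def passes_firewall(text: str,
                    min_entropy: float = 2.5,      # illustrative thresholds only
                    min_unique_ratio: float = 0.3,
                    min_length: int = 20) -> bool:
    """Cheap statistical checks that reject degenerate inputs before any LLM call."""
    if len(text) < min_length:
        return False
    if shannon_entropy(text) < min_entropy:
        return False  # e.g. "aaaaaaaa..." or keyboard mashing
    words = text.lower().split()
    unique_ratio = len(set(words)) / max(len(words), 1)
    return unique_ratio >= min_unique_ratio  # reject heavy word repetition
```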
Specialized Small Models (SLMs)

We don't need GPT-5 for everything. FailSafe offloads tasks like Coreference Resolution ("He said...") to specialized models like FastCoref. It’s faster, cheaper, and often more accurate for these narrow linguistic tasks.
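For reference, delegating coreference resolution to a small model looks roughly like this. The snippet follows the fastcoref project's published usage examples, so treat the exact API (`FCoref`, `predict`, `get_clusters`) as something to verify against the library's current docs.

```python
# Sketch of offloading coreference resolution to a small specialized model,
# instead of burning large-LLM tokens on a narrow linguistic task.
from fastcoref import FCoref

model = FCoref(device="cpu")  # small coref model; no frontier LLM involved

text = "The CEO denied the report. He said the numbers were wrong."
preds = model.predict(texts=[text])

# Clusters group mentions that refer to the same entity,
# e.g. ["The CEO", "He"] in the sentence above.
print(preds[0].get_clusters())
```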
The Council: Managing Cognitive Conflict

This is the core. Instead of a single agent, FailSafe employs a "Council" of three distinct agents:
The Logician: Detects formal fallacies in reasoning.
The Skeptic: Designed to suppress "H-Neurons" (hallucination patterns) by being intentionally contrarian.
The Researcher: The only agent allowed to fetch external data.
They operate in a forced-debate loop. If they reach a consensus too easily, the system assumes something is wrong (combating the "Lazy Consensus" phenomenon found in recent Google DeepMind research).
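A minimal sketch of such a debate loop is shown below, assuming a `respond` callable that wraps a role-prompted LLM call. The agent names, round counts, and the "too-easy consensus" rule are illustrative, not FailSafe's exact logic.

```python
from dataclasses import dataclass

@dataclass
class Position:
    agent: str
    verdict: str   # e.g. "supported", "refuted", "unverifiable"
    argument: str

def council_debate(claim, respond,
                   agents=("Logician", "Skeptic", "Researcher"),
                   min_rounds=2, max_rounds=5):
    """respond(agent, claim, transcript) -> Position, backed by a role-prompted
    LLM call. In this design, the Researcher would be the only role wired to
    retrieval tools."""
    transcript = []
    for round_no in range(1, max_rounds + 1):
        positions = [respond(agent, claim, transcript) for agent in agents]
        transcript.extend(positions)
        verdicts = {p.verdict for p in positions}
        # Unanimity reached before a minimum number of rounds looks like
        # "lazy consensus": keep debating instead of accepting the answer.
        if len(verdicts) == 1 and round_no >= min_rounds:
            return verdicts.pop(), transcript
    return "unresolved", transcript
```

Requiring a minimum number of rounds is one simple way to force genuine disagreement; each agent must respond with the full transcript in view, so later rounds are critiques rather than restatements.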
Conclusion

We call FailSafe an "Epistemic Engine" because it prioritizes the integrity of knowledge over conversational fluency. It’s open source, and we’re looking for contributors to help push the boundaries of autonomous verification.
Check out the code and the technical whitepaper here: https://github.com/Amin7410/FailSafe-AI-Powered-Fact-Checking-System