A journalist writes: "Intermittent fasting reverses ageing at the cellular level."
Is that true? Partially true? Based on one mouse study or fifty human trials?
Finding out takes hours. You have to search PubMed, read abstracts, weigh study quality, check for retractions, and notice who funded the research. Most people don't have those hours. They either trust the headline blindly or dismiss it entirely. Neither is good.
I didn’t want another AI summarizer that hallucinates citations or treats a blog post as equal to a meta-analysis. I wanted a reasoning engine—a system that could think like a scientist.
So I built SciVerify: a multi-agent system that retrieves, evaluates, and synthesizes scientific evidence with full transparency, using the Elastic Stack to enforce rigorous logic.
## The Blind Spot in Modern RAG
Standard RAG (Retrieval-Augmented Generation) fails at science because it treats all text chunks as epistemically equal.
To a vector database, a sentence from a high-quality systematic review looks identical to a sentence from a retracted pilot study. Semantic similarity ≠ scientific truth.
If an AI says "Studies show X," I need to know:
- Which studies?
- Are they Randomized Controlled Trials (RCTs)?
- Was the sample size adequate?
- Was the paper retracted?
To solve this, I couldn't rely on vector search alone. I needed structure.
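What "structure" means in practice is an index that pairs a `semantic_text` field for conceptual retrieval with keyword and numeric fields for deterministic filtering. A minimal sketch of such a mapping (field names here are illustrative assumptions, not the actual SciVerify schema):

```python
# Illustrative index mapping: semantic retrieval AND structured validity fields.
# Field names are assumptions for this sketch, not the actual SciVerify schema.
mapping = {
    "mappings": {
        "properties": {
            "abstract": {"type": "semantic_text"},   # conceptual search
            "study_type": {"type": "keyword"},       # e.g. "rct", "meta-analysis"
            "year": {"type": "integer"},
            "citation_count": {"type": "integer"},
            "sample_size": {"type": "integer"},
            "retracted": {"type": "boolean"},
            "funding_source": {"type": "keyword"},
        }
    }
}

# With the official Python client this would be applied along the lines of:
# es.indices.create(index="sciverify-papers", **mapping)
```

The point is that validity signals (study type, citations, retraction status) live in fields a query can filter on exactly, not in prose a model has to interpret.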
## The Architecture: Semantic Search + Deterministic Logic
SciVerify uses Elastic Agent Builder to orchestrate a 5-step verification workflow. It combines two powerful retrieval strategies that usually don't talk to each other:
- Semantic Search (semantic_text): Finds papers that are conceptually about the claim, even if they use different keywords (e.g., matching "intermittent fasting" with "time-restricted feeding").
- ES|QL Analytics: Uses Elasticsearch Query Language to run rigorous, deterministic filters.
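The two strategies can also meet in a single search request: a `semantic` query for topical relevance inside a `bool` query whose filters enforce structural validity. A sketch of such a query body (field names are assumptions carried over from an illustrative schema, not the real one):

```python
# Sketch: semantic relevance + deterministic filters in one query body.
# Field names ("abstract", "study_type", ...) are illustrative assumptions.
query = {
    "query": {
        "bool": {
            "must": [
                # Matches "time-restricted feeding" papers for a fasting claim.
                {"semantic": {
                    "field": "abstract",
                    "query": "intermittent fasting and inflammation",
                }}
            ],
            "filter": [
                {"terms": {"study_type": [
                    "rct", "systematic-review", "meta-analysis"]}},
                {"term": {"retracted": False}},
            ],
        }
    }
}

# Executed via the Python client as something like:
# es.search(index="sciverify-papers", **query)
```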
Instead of letting the LLM guess complex SQL, I built custom tools like find_high_quality_evidence. When the agent runs this, it executes a precise ES|QL query:
```sql
FROM sciverify-papers
| WHERE study_type IN ("meta-analysis", "systematic-review", "rct")
| WHERE citation_count > 50
| SORT year DESC
| LIMIT 10
```
This guarantees that when the agent says "I found high-quality evidence," it isn't hallucinating—it's mathematically true.
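The value of a custom tool is that the agent only chooses parameters; the query shape is fixed. A hypothetical Python sketch of how `find_high_quality_evidence` could template that ES|QL (the function signature and defaults are illustrative, not the actual tool definition):

```python
def find_high_quality_evidence(min_citations: int = 50, limit: int = 10) -> str:
    """Build the fixed-shape ES|QL query; only numeric thresholds vary.

    The LLM never writes the query text itself -- it can only call this tool,
    so the study-type and citation filters are guaranteed to be applied.
    """
    return (
        "FROM sciverify-papers\n"
        '| WHERE study_type IN ("meta-analysis", "systematic-review", "rct")\n'
        f"| WHERE citation_count > {int(min_citations)}\n"
        "| SORT year DESC\n"
        f"| LIMIT {int(limit)}"
    )

# The resulting string would then be run against the ES|QL endpoint, e.g.:
# es.esql.query(query=find_high_quality_evidence())
```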
The "Wow" Factor: Adversarial Peer Review
The coolest part of SciVerify isn't just that it answers questions—it's that it checks its own work.
I implemented a Multi-Agent System architecture:
- The SciVerify Agent: Decomposes the claim, finds evidence using the ES|QL tools, and drafts a calibrated verdict.
- The BiasDetector Agent: This is a second, separate agent instructed to be a "hostile peer reviewer."
The BiasDetector reads the first agent's draft and critiques it: Did you cherry-pick that study? Did you notice the funding source? Did you mention the small sample size?
This setup forces Epistemic Humility. The system is designed to admit what it doesn't know, rather than confidently lying to you.
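The review loop itself is plain orchestration: draft, critique, revise, repeat until the reviewer runs out of objections. A stubbed sketch of the pattern (the two agent functions are placeholders, not the Agent Builder API):

```python
# Adversarial review loop with both agents stubbed out for illustration.
# Real calls would go through Elastic Agent Builder; these are placeholders.

def sciverify_draft(claim: str) -> str:
    """Placeholder for the SciVerify agent: decompose, retrieve, draft."""
    return f"Verdict on '{claim}': supported by 3 studies."

def bias_detector(draft: str) -> list:
    """Placeholder for the BiasDetector agent: objections, empty if satisfied."""
    if "Caveats" in draft:
        return []
    return ["Funding sources and sample sizes not examined."]

def verify(claim: str, max_rounds: int = 3) -> str:
    """Loop: critique -> revise until the hostile reviewer has no objections."""
    draft = sciverify_draft(claim)
    for _ in range(max_rounds):
        objections = bias_detector(draft)
        if not objections:
            break
        # Fold the critique back into the next revision of the draft.
        draft += " Caveats: " + " ".join(objections)
    return draft

print(verify("intermittent fasting reduces inflammation"))
```

The `max_rounds` cap matters: without it, two disagreeing agents can argue forever.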
## See It In Action (30 Seconds)
Claim: "Does intermittent fasting reduce inflammation?"
Result:
- Step 1: Decomposes claim into Subject (fasting), Outcome (inflammation), Population (adults).
- Step 2: Finds 5 papers. Rejects 2 for sample size < 20. Keeps 2 RCTs and 1 Systematic Review.
- Step 3: Flags one paper for conflicting interest (industry funding).
- Final Pulse: "Moderate Confidence. Evidence supports reduction in specific markers (CRP), but long-term data is limited."
## Core Principles for Scientific AI
Building this taught me three key lessons for anyone designing agents for high-stakes domains:
- Context is Structure, Not Just Text: Vectors find the topic, but structured fields (Year, Citations, Study Type) find the validity. You need both.
- Tools Create Accountability: Giving the agent specific, deterministic tools (like ES|QL filters) prevents it from inventing statistics.
- Adversarial Feedback Loops: Two agents with opposing goals (Builder vs. Reviewer) produce significantly higher quality output than one agent aimed at "pleasing" the user.
## Limitations & Future Work
SciVerify is a reasoning aid, not a replacement for expert judgment.
- Methodology Extraction: Currently uses regex heuristics to identify study types. This needs to move to a specialized ML model.
- Data Coverage: We rely on Semantic Scholar. If a paper isn't there (or is paywalled), we can't see it.
- Retraction Lag: We depend on metadata updates. A paper retracted yesterday might still look valid today.
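To make the first limitation concrete, here is a toy version of regex-based study-type extraction (the patterns are illustrative, not the ones SciVerify uses), along with the kind of phrasing it misses:

```python
import re

# Toy study-type classifier using regex heuristics, roughly the approach
# described above. Patterns are illustrative and deliberately simplistic.
STUDY_TYPE_PATTERNS = [
    ("meta-analysis", re.compile(r"\bmeta-?analys[ie]s\b", re.I)),
    ("systematic-review", re.compile(r"\bsystematic review\b", re.I)),
    ("rct", re.compile(r"\brandomi[sz]ed controlled trial\b|\bRCT\b", re.I)),
    ("observational", re.compile(r"\bcohort\b|\bcase-control\b", re.I)),
]

def classify_study(abstract: str) -> str:
    """Return the first matching study-type label, or 'unknown'."""
    for label, pattern in STUDY_TYPE_PATTERNS:
        if pattern.search(abstract):
            return label
    return "unknown"

# Works on the easy cases:
print(classify_study("We conducted a randomized controlled trial in 120 adults."))
# ...but misses paraphrases like "participants were allocated at random",
# which is exactly why this should move to a trained classifier.
```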
## The Future of Trustworthy AI
As AI becomes integral to scientific workflows—from literature triage to experimental design—the community needs tooling that reasons about evidence structure, not just compresses content.
SciVerify is a step towards that infrastructure: an epistemic layer for trustworthy, AI-assisted science.
🔗 View the Code on GitHub - Link to GitHub Repository
