Contradiction Mining in Scientific Literature: How RUMI Finds Conflicts Across Papers
One of the hardest problems in scientific research is identifying contradictions across papers. Two studies might claim opposite things about the same mechanism, and unless you read both carefully — and remember both — you'll never notice the conflict.
RUMI automates this. Here's the technical approach.
The Problem
Scientific literature is growing exponentially. PubMed adds ~4,000 papers per day. No human can read, remember, and cross-reference everything. This leads to:
- Unresolved contradictions: Paper A says mechanism X causes outcome Y. Paper B says mechanism X prevents outcome Y. Neither cites the other.
- Hidden consensus: 5 papers independently confirm the same finding, but nobody has connected them.
- Novel findings hiding in plain sight: A new mechanism described in one paper is actually the missing piece for a puzzle described in another.
RUMI's Contradiction Mining Pipeline
Stage 1: Entity Normalization
Before you can find contradictions, you need to know when two papers are talking about the same thing. RUMI normalizes entities using multiple strategies:
- Gene/protein names: Maps aliases to canonical names (e.g., "BRAF" = "B-Raf" = "v-Raf murine sarcoma viral oncogene homolog B")
- Drug names: Maps brand names to generic (e.g., "Lumakras" = "sotorasib" = "AMG 510")
- Pathway names: Uses KEGG and Reactome IDs to normalize pathway references
- Disease names: Maps to MeSH terms and OMIM IDs
Without normalization, "sotorasib" and "AMG 510" look like different entities. With it, RUMI can connect findings across papers that use different nomenclature.
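A minimal sketch of the alias-lookup idea, assuming a small hand-written table. The `ALIASES` dictionary and both function names are hypothetical and not taken from RUMI's code; a real system would load aliases from curated vocabularies rather than hard-code them.

```python
from typing import Optional

# Hypothetical alias table: lowercase alias -> canonical name.
# The entries reuse the examples above; everything else here is an assumption.
ALIASES = {
    "lumakras": "sotorasib",
    "amg 510": "sotorasib",
    "sotorasib": "sotorasib",
    "braf": "BRAF",
    "b-raf": "BRAF",
    "v-raf murine sarcoma viral oncogene homolog b": "BRAF",
}


def normalize_entity(mention: str) -> Optional[str]:
    """Return the canonical name for a mention, or None if it is unknown."""
    key = " ".join(mention.lower().split())  # case-fold and collapse whitespace
    return ALIASES.get(key)


def same_entity(a: str, b: str) -> bool:
    """True only when both mentions resolve to the same known canonical name."""
    canon_a, canon_b = normalize_entity(a), normalize_entity(b)
    return canon_a is not None and canon_a == canon_b
```

Returning `None` for unknown mentions, instead of guessing, keeps unrecognized names from being merged by accident.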
Stage 2: Claim Extraction
RUMI extracts structured claims from each paper using LLM-assisted parsing:
from dataclasses import dataclass

# Entity is the normalized entity type from Stage 1 (its definition is not shown here).
@dataclass
class ScientificClaim:
    subject: Entity        # What is being discussed
    predicate: str         # What relationship is claimed
    object: Entity         # What it's related to
    direction: str         # positive / negative / neutral
    confidence: float      # Extraction confidence
    evidence_type: str     # experimental / observational / computational
    paper_id: str          # Source paper
    sentence: str          # Original text
Example extraction:
- Subject: KRAS G12C, Predicate: activates, Object: MAPK signaling, Direction: positive
- Subject: Sotorasib, Predicate: inhibits, Object: KRAS G12C, Direction: negative
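To make the extraction step concrete, here is a hedged sketch of turning one JSON record into a validated claim. It assumes the LLM returns a JSON object with these keys and that confidence lies in [0, 1]; neither is confirmed by RUMI's code. `Claim` is a simplified stand-in for `ScientificClaim` with entities as plain strings, and `parse_claim` is a hypothetical helper.

```python
import json
from dataclasses import dataclass

VALID_DIRECTIONS = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class Claim:
    """Simplified stand-in for ScientificClaim, with entities as plain strings."""
    subject: str
    predicate: str
    object: str
    direction: str
    confidence: float
    paper_id: str


def parse_claim(raw_json: str, paper_id: str) -> Claim:
    """Turn one JSON object (assumed LLM output) into a validated Claim."""
    data = json.loads(raw_json)
    direction = str(data["direction"]).strip().lower()
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"unexpected direction: {direction!r}")
    # Assumption: confidence is a probability-like score in [0, 1].
    confidence = float(data.get("confidence", 0.0))
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return Claim(
        subject=data["subject"].strip(),
        predicate=data["predicate"].strip().lower(),
        object=data["object"].strip(),
        direction=direction,
        confidence=confidence,
        paper_id=paper_id,
    )
```

Rejecting malformed records at this point means later stages only ever compare the three known direction values.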
Stage 3: Contradiction Detection
Two claims contradict when they have the same subject and object but opposite directions, or when one paper claims A causes B while another claims A prevents B.
RUMI uses three detection methods:
Direct contradiction: Same entities, opposite directions.
Paper 1: "AURKA promotes KRAS inhibitor resistance"
Paper 2: "AURKA inhibition does not sensitize KRAS-mutant cells"
→ Direct contradiction on AURKA's role
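The direct case can be sketched as a group-and-compare pass. This is an illustration under stated assumptions, not RUMI's actual detector: it assumes entities are already normalized to strings, uses a trimmed-down `Claim` tuple, and `find_direct_contradictions` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Claim(NamedTuple):
    subject: str
    object: str
    direction: str  # positive / negative / neutral
    paper_id: str


OPPOSITE = {("positive", "negative"), ("negative", "positive")}


def find_direct_contradictions(claims):
    """Return pairs of claims sharing (subject, object) with opposite directions."""
    by_pair = defaultdict(list)
    for claim in claims:
        by_pair[(claim.subject, claim.object)].append(claim)

    conflicts = []
    for group in by_pair.values():
        for a, b in combinations(group, 2):
            # Only cross-paper conflicts count; neutral claims never conflict.
            if a.paper_id != b.paper_id and (a.direction, b.direction) in OPPOSITE:
                conflicts.append((a, b))
    return conflicts
```

Grouping by the entity pair first keeps the pairwise comparison limited to claims that could plausibly conflict.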
Contextual contradiction: Same relationship, different conditions.
Paper 1: "MET amplification drives resistance in early treatment"
Paper 2: "MET amplification is rare in acquired resistance"
→ Contextual: timing-dependent
Implicit contradiction: Different mechanisms proposed for the same phenomenon.
Paper 1: "Resistance is primarily driven by MAPK reactivation"
Paper 2: "Resistance is primarily driven by PI3K/AKT activation"
→ Implicit: competing models
Stage 4: Resolution Analysis
Not all contradictions are real. Some are:
- Methodological: Different cell lines, different doses, different timepoints
- Temporal: The field's understanding evolved between publication dates
- Definitional: Same term used with different meanings
RUMI classifies each contradiction and suggests resolution strategies:
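from dataclasses import dataclass

# ContradictionType is an enumeration of the three detection types (definition not shown here).
@dataclass
class Contradiction:
    claim_a: ScientificClaim
    claim_b: ScientificClaim
    type: ContradictionType      # direct, contextual, implicit
    resolution_strategy: str     # methodological, temporal, definitional, genuine
    suggested_experiment: str    # What experiment would resolve it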
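One way to picture the classification step is a rule-based first pass over study metadata. Everything in this sketch is an assumption: the `ClaimContext` fields, the order of the checks, and the five-year threshold are illustrative choices, not RUMI's logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClaimContext:
    """Hypothetical per-claim metadata; these fields are not defined in the post."""
    year: int
    cell_line: Optional[str] = None
    dose: Optional[str] = None
    timepoint: Optional[str] = None
    term_definition: Optional[str] = None


def differs(x, y) -> bool:
    """True when both values are known and not equal."""
    return x is not None and y is not None and x != y


def suggest_resolution(a: ClaimContext, b: ClaimContext, max_year_gap: int = 5) -> str:
    """Pick the first explanation that applies; fall back to 'genuine'."""
    if differs(a.term_definition, b.term_definition):
        return "definitional"
    if (differs(a.cell_line, b.cell_line) or differs(a.dose, b.dose)
            or differs(a.timepoint, b.timepoint)):
        return "methodological"
    # Arbitrary threshold: a large publication gap hints at an evolving field.
    if abs(a.year - b.year) > max_year_gap:
        return "temporal"
    return "genuine"
```

Missing metadata is treated as "unknown" and never as a difference, so sparse records fall through to `genuine` and still get flagged for review.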
Real Example: The AURKA Paradox
In the KRAS G12C analysis, RUMI surfaced an apparent contradiction:
- Paper A (2026): AURKA is upregulated in sotorasib-resistant cells and stabilizes PHB2, activating PI3K/AKT
- Paper B (2026): AURKA inhibition alone does not restore sotorasib sensitivity in resistant lines
RUMI classified this as a contextual contradiction: AURKA upregulation is a real resistance mechanism, but it's part of a positive feedback loop (AURKA→PHB2→PI3K/AKT) that requires combined inhibition to break. Single-agent AURKA inhibition fails because the loop has redundancy.
This resolution led to the hypothesis that dual AURKA + PI3K inhibition might be more effective — a testable prediction that neither paper explicitly made.
The Knowledge Graph Approach
All of this is powered by RUMI's knowledge graph. Each node represents an entity (gene, protein, drug, disease, pathway). Each edge represents a relationship with:
- Direction: activation, inhibition, association
- Evidence strength: number of supporting papers
- Confidence: based on extraction quality and paper count
- Temporal context: when the finding was published
Contradictions appear as negative-weight edges between the same nodes. The graph makes it visually and computationally obvious where the scientific literature disagrees.
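As a rough illustration of how a graph exposes disagreement, the sketch below aggregates supporting papers per node pair and flags pairs with evidence in both directions. This is one possible reading of the idea, not RUMI's graph representation; the tuple layout, the `SIGN` table, and all function names are assumptions.

```python
from collections import defaultdict

# Signed contribution of each relationship type; "association" carries no sign.
SIGN = {"activation": 1, "inhibition": -1, "association": 0}


def build_graph(edges):
    """Aggregate (source, target, relation, paper_id) tuples per node pair."""
    graph = defaultdict(lambda: {name: set() for name in SIGN})
    for source, target, relation, paper_id in edges:
        graph[(source, target)][relation].add(paper_id)
    return graph


def contested_edges(graph):
    """Return node pairs supported by both activation and inhibition papers."""
    return sorted(pair for pair, rel in graph.items()
                  if rel["activation"] and rel["inhibition"])


def net_weight(graph, pair):
    """Activation paper count minus inhibition paper count for one node pair."""
    rel = graph[pair]
    return sum(SIGN[name] * len(papers) for name, papers in rel.items())
```

Storing paper IDs in sets means a paper that repeats the same claim is counted once toward evidence strength.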
Limitations
This system is still early:
- Claim extraction depends on LLM quality — complex claims with multiple qualifications are often oversimplified
- Some "contradictions" are actually nuanced positions that require expert interpretation
- The system can't evaluate experimental quality — a poorly designed study gets equal weight
- Publication bias means the literature itself may be contradictory for structural reasons
Try It
git clone https://github.com/subhansh-dev/Rumi
cd Rumi
pip install -e .
playwright install chromium
rumi
Run /discover on a topic with active debate and see what contradictions RUMI surfaces.
Links
- GitHub: https://github.com/subhansh-dev/Rumi
- Portfolio: https://subhanshh.vercel.app
If you work in systematic reviews, meta-analyses, or evidence synthesis, I'd love to know: what would make a tool like this actually useful in your workflow? What's the biggest gap?
— Subhansh