subhansh

Posted on May 28

Contradiction Mining in Scientific Literature: How RUMI Finds Conflicts Across Papers

#ai #research #python #datascience


One of the hardest problems in scientific research is identifying contradictions across papers. Two studies might claim opposite things about the same mechanism, and unless you read both carefully — and remember both — you'll never notice the conflict.

RUMI automates this. Here's the technical approach.

The Problem

Scientific literature is growing exponentially. PubMed adds ~4,000 papers per day. No human can read, remember, and cross-reference everything. This leads to:

Unresolved contradictions: Paper A says mechanism X causes outcome Y. Paper B says mechanism X prevents outcome Y. Neither cites the other.
Hidden consensus: 5 papers independently confirm the same finding, but nobody has connected them.
Novel findings hiding in plain sight: A new mechanism described in one paper is actually the missing piece for a puzzle described in another.

RUMI's Contradiction Mining Pipeline

Stage 1: Entity Normalization

Before you can find contradictions, you need to know when two papers are talking about the same thing. RUMI normalizes entities using multiple strategies:

Gene/protein names: Maps aliases to canonical names (e.g., "BRAF" = "B-Raf" = "v-Raf murine sarcoma viral oncogene homolog B")
Drug names: Maps brand names to generic (e.g., "Lumakras" = "sotorasib" = "AMG 510")
Pathway names: Uses KEGG and Reactome IDs to normalize pathway references
Disease names: Maps to MeSH terms and OMIM IDs

Without normalization, "sotorasib" and "AMG 510" look like different entities. With it, RUMI can connect findings across papers that use different nomenclature.
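A minimal sketch of the alias-mapping idea, assuming a small hand-written table. The names `ALIASES`, `LOOKUP`, and `normalize` are hypothetical and not taken from RUMI's code. A real system would load aliases from curated resources rather than hard-coding them.

```python
from typing import Optional

# Hypothetical alias table using the examples from this post.
ALIASES = {
    "sotorasib": ["Lumakras", "AMG 510"],
    "BRAF": ["B-Raf", "v-Raf murine sarcoma viral oncogene homolog B"],
}

def _key(name: str) -> str:
    """Case-fold and drop spaces and hyphens so spelling variants collide."""
    return "".join(ch for ch in name.casefold() if ch not in " -")

# Reverse index: every alias (and the canonical name itself) -> canonical name.
LOOKUP = {
    _key(alias): canonical
    for canonical, aliases in ALIASES.items()
    for alias in [canonical, *aliases]
}

def normalize(mention: str) -> Optional[str]:
    """Return the canonical name for a mention, or None if it is unknown."""
    return LOOKUP.get(_key(mention))
```

With this table, `normalize("AMG-510")` and `normalize("Lumakras")` both resolve to `"sotorasib"`, while an unknown mention returns `None` so the caller can decide how to handle it.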

Stage 2: Claim Extraction

RUMI extracts structured claims from each paper using LLM-assisted parsing:

from __future__ import annotations  # Entity (the normalized entity type) is not shown here
from dataclasses import dataclass

@dataclass
class ScientificClaim:
    subject: Entity        # What is being discussed
    predicate: str         # What relationship is claimed
    object: Entity         # What it's related to
    direction: str         # positive / negative / neutral
    confidence: float      # Extraction confidence
    evidence_type: str     # experimental / observational / computational
    paper_id: str          # Source paper
    sentence: str          # Original text

Example extraction:

Subject: KRAS G12C, Predicate: activates, Object: MAPK signaling, Direction: positive
Subject: Sotorasib, Predicate: inhibits, Object: KRAS G12C, Direction: negative
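LLM output is not guaranteed to be well-formed, so it helps to validate it before it reaches the rest of the pipeline. Here is a rough sketch, assuming the model is asked to return a JSON array of objects with `subject`, `predicate`, `object`, and `direction` keys. That response format, the `ParsedClaim` type, and `parse_claims` are all assumptions for illustration, not RUMI's actual interface.

```python
import json
from dataclasses import dataclass

ALLOWED_DIRECTIONS = {"positive", "negative", "neutral"}

@dataclass(frozen=True)
class ParsedClaim:
    subject: str
    predicate: str
    object: str
    direction: str

def parse_claims(raw: str) -> list[ParsedClaim]:
    """Parse a JSON array of claim objects, skipping malformed entries."""
    claims = []
    for item in json.loads(raw):
        try:
            claim = ParsedClaim(
                subject=item["subject"].strip(),
                predicate=item["predicate"].strip().lower(),
                object=item["object"].strip(),
                direction=item["direction"].strip().lower(),
            )
        except (KeyError, AttributeError, TypeError):
            continue  # missing field, non-string value, or non-object entry
        if claim.direction in ALLOWED_DIRECTIONS:
            claims.append(claim)
    return claims

# Hypothetical model output: two well-formed claims and one with an invalid direction.
raw = json.dumps([
    {"subject": "KRAS G12C", "predicate": "activates", "object": "MAPK signaling", "direction": "positive"},
    {"subject": "Sotorasib", "predicate": "inhibits", "object": "KRAS G12C", "direction": "negative"},
    {"subject": "Sotorasib", "predicate": "inhibits", "object": "KRAS G12C", "direction": "sideways"},
])
claims = parse_claims(raw)
```

Dropping invalid entries silently keeps the sketch short. In practice you would probably log them, since a high rejection rate is a useful signal about prompt or model quality.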

Stage 3: Contradiction Detection

Two claims contradict when they share the same subject and object but assert opposite directions, or when their predicates oppose each other (one paper claims A causes B while another claims A prevents B).

RUMI uses three detection methods:

Direct contradiction: Same entities, opposite directions.

Paper 1: "AURKA promotes KRAS inhibitor resistance"
Paper 2: "AURKA inhibition does not sensitize KRAS-mutant cells"
→ Direct contradiction on AURKA's role

Contextual contradiction: Same relationship, different conditions.

Paper 1: "MET amplification drives resistance in early treatment"
Paper 2: "MET amplification is rare in acquired resistance"
→ Contextual: timing-dependent

Implicit contradiction: Different mechanisms proposed for the same phenomenon.

Paper 1: "Resistance is primarily driven by MAPK reactivation"
Paper 2: "Resistance is primarily driven by PI3K/AKT activation"
→ Implicit: competing models
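Of the three methods, direct contradiction is the easiest to sketch: group claims by their normalized (subject, object) pair and look for opposite directions. The version below is a simplified illustration with a cut-down `Claim` type and string entities. The rule that the two claims must come from different papers is my assumption. Contextual and implicit contradictions need more than this kind of exact matching.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Claim:
    subject: str
    object: str
    direction: str   # "positive", "negative", or "neutral"
    paper_id: str

OPPOSITE = {("positive", "negative"), ("negative", "positive")}

def direct_contradictions(claims):
    """Return pairs of claims on the same (subject, object) with opposite directions."""
    by_pair = defaultdict(list)
    for claim in claims:
        by_pair[(claim.subject, claim.object)].append(claim)

    found = []
    for group in by_pair.values():
        for a, b in combinations(group, 2):
            # Assumption: only conflicts between different papers are of interest.
            if a.paper_id != b.paper_id and (a.direction, b.direction) in OPPOSITE:
                found.append((a, b))
    return found

# Placeholder data mirroring the "mechanism X / outcome Y" example from The Problem.
claims = [
    Claim("mechanism X", "outcome Y", "positive", "paper-A"),
    Claim("mechanism X", "outcome Y", "negative", "paper-B"),
    Claim("mechanism Z", "outcome Y", "positive", "paper-A"),
]
pairs = direct_contradictions(claims)
```

Here `pairs` contains a single conflict, between paper-A and paper-B. Grouping first means only claims about the same entity pair are compared, which avoids comparing every claim against every other claim.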

Stage 4: Resolution Analysis

Not all contradictions are real. Some are:

Methodological: Different cell lines, different doses, different timepoints
Temporal: The field's understanding evolved between publication dates
Definitional: Same term used with different meanings

RUMI classifies each contradiction and suggests resolution strategies:

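from dataclasses import dataclass
from enum import Enum

ContradictionType = Enum("ContradictionType", "DIRECT CONTEXTUAL IMPLICIT")

@dataclass
class Contradiction:
    claim_a: ScientificClaim
    claim_b: ScientificClaim
    type: ContradictionType    # direct, contextual, implicit
    resolution_strategy: str   # methodological, temporal, definitional, genuine
    suggested_experiment: str  # What experiment would resolve it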
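To make the classification step concrete, here is a deliberately crude heuristic sketch. It assumes each claim carries extra metadata (publication year, cell line, dose) that is not part of the `ScientificClaim` schema above. The `ClaimContext` type, the `classify_resolution` function, and the five-year threshold are all hypothetical. Definitional conflicts are left out because they cannot be detected from metadata like this.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClaimContext:
    """Hypothetical per-claim metadata; not part of the schema shown in this post."""
    year: int
    cell_line: str
    dose_um: float

def classify_resolution(a: ClaimContext, b: ClaimContext, max_year_gap: int = 5) -> str:
    """Return a coarse label for why two conflicting claims might disagree."""
    # Different experimental setups are the first thing to rule out.
    if a.cell_line != b.cell_line or a.dose_um != b.dose_um:
        return "methodological"
    # Same setup, but published far apart: the field may have moved on.
    if abs(a.year - b.year) > max_year_gap:
        return "temporal"
    # Same setup, similar dates: treat as a genuine disagreement.
    return "genuine"

label = classify_resolution(
    ClaimContext(year=2024, cell_line="line-A", dose_um=1.0),
    ClaimContext(year=2024, cell_line="line-B", dose_um=1.0),
)
```

The order of the checks encodes a judgment call: a methodological difference is treated as the more likely explanation even when the papers are also far apart in time.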

Real Example: The AURKA Paradox

In the KRAS G12C analysis, RUMI flagged what looked like a genuine contradiction:

Paper A (2026): AURKA is upregulated in sotorasib-resistant cells and stabilizes PHB2, activating PI3K/AKT
Paper B (2026): AURKA inhibition alone does not restore sotorasib sensitivity in resistant lines

RUMI classified this as a contextual contradiction: AURKA upregulation is a real resistance mechanism, but it's part of a positive feedback loop (AURKA→PHB2→PI3K/AKT) that requires combined inhibition to break. Single-agent AURKA inhibition fails because the loop has redundancy.

This resolution led to the hypothesis that dual AURKA + PI3K inhibition might be more effective — a testable prediction that neither paper explicitly made.

The Knowledge Graph Approach

All of this is powered by RUMI's knowledge graph. Each node represents an entity (gene, protein, drug, disease, pathway). Each edge represents a relationship with:

Direction: activation, inhibition, association
Evidence strength: number of supporting papers
Confidence: based on extraction quality and paper count
Temporal context: when the finding was published

Contradictions appear as negative-weight edges between the same nodes. The graph makes it visually and computationally obvious where the scientific literature disagrees.
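One way to picture this is to store signed evidence on each edge and mark an edge as contested when papers disagree on the sign. The sketch below uses only the standard library. The +1/-1 encoding, the `evidence` store, and the `edge_summary` function are my assumptions for illustration and may not match how RUMI represents its graph internally.

```python
from collections import defaultdict

# (source, target) -> list of (paper_id, sign); sign is +1 for activation,
# -1 for inhibition. This encoding is an assumption for illustration.
evidence = defaultdict(list)

def add_evidence(source, target, paper_id, sign):
    """Record that one paper supports a signed relationship between two nodes."""
    evidence[(source, target)].append((paper_id, sign))

def edge_summary(source, target):
    """Summarize one edge: supporting paper count, net sign, and disagreement flag."""
    records = evidence.get((source, target), [])
    signs = {sign for _, sign in records}
    return {
        "papers": len({paper for paper, _ in records}),
        "net_weight": sum(sign for _, sign in records),
        "contested": {1, -1} <= signs,  # both signs present on the same edge
    }

# Placeholder data: one paper reports activation, two report inhibition.
add_evidence("mechanism X", "outcome Y", "paper-A", +1)
add_evidence("mechanism X", "outcome Y", "paper-B", -1)
add_evidence("mechanism X", "outcome Y", "paper-C", -1)
summary = edge_summary("mechanism X", "outcome Y")
```

For this data the edge is contested, with three papers and a net weight of -1. Keeping the per-paper records rather than only the net weight matters, because a net weight of zero could mean either no evidence or perfectly balanced disagreement.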

Limitations

This system is still early:

Claim extraction depends on LLM quality — complex claims with multiple qualifications are often oversimplified
Some "contradictions" are actually nuanced positions that require expert interpretation
The system can't evaluate experimental quality, so a poorly designed study gets the same weight as a rigorous one
Publication bias means the literature itself may be contradictory for structural reasons

Try It

git clone https://github.com/subhansh-dev/Rumi
cd Rumi
pip install -e .
playwright install chromium
rumi

Run /discover on a topic with active debate and see what contradictions RUMI surfaces.
