Stop Building Stale RAG: Meet Sentinel, the "Self-Healing" Knowledge Graph

#ai #python #opensource #rag

We all know the dirty secret of RAG (Retrieval-Augmented Generation) applications: They are great on Day 1, and broken on Day 30.

Why? Data Staleness.

You scrape your documentation, embed it into a Vector DB, and build a chatbot. It works perfectly. But two weeks later, the documentation changes. A price updates. A policy is rewritten.

Your Vector DB doesn't know. It happily retrieves the old chunks, and your LLM confidently gives a wrong answer grounded in outdated facts.

Re-indexing everything is expensive and slow. Building custom "update scripts" is boring.

I got tired of this problem, so I built Sentinel.

🛡️ What is Sentinel?

Sentinel is an open-source, autonomous ETL pipeline that treats your RAG data as a Living Knowledge Graph.

Instead of a "snapshot" vector store, Sentinel:

- Watches your source URLs for changes.
- Detects changes via byte-level content hashing.
- Heals the graph by extracting only the new facts using LLMs.
- Maintains history using "Time Travel" edges.

It is pip-installable, model-agnostic (works with Ollama, OpenAI, Anthropic), and runs locally or in the cloud.

⚡ Real-World Example: SaaS Pricing Update

Imagine you are tracking a competitor's pricing page.

Day 1: The page says "Pro Plan is $29/mo".

Sentinel Graph: (Pro Plan) --[COSTS {valid_from: Day 1}]--> ($29)

Day 15: They silently raise the price to $49/mo.

Sentinel Graph:
- Sentinel detects the hash change.
- It retires the old edge: (Pro Plan) --[COSTS {valid_to: Day 15}]--> ($29)
- It creates a new edge: (Pro Plan) --[COSTS {valid_from: Day 15}]--> ($49)

The Result:

Standard RAG: Returns both $29 and $49, confusing the user.
Sentinel: Knows exactly which price is current, AND knows the price history.
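To make the retire-and-create step concrete, here is a minimal in-memory sketch in Python. The `TemporalEdge` class and `upsert_fact` function are hypothetical names for illustration, not Sentinel's API, and the dates simply stand in for Day 1 and Day 15. Sentinel itself keeps these edges in Neo4j.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical sketch, not Sentinel's actual API.


@dataclass
class TemporalEdge:
    """One fact, e.g. (Pro Plan)-[COSTS]->($29), plus its validity interval."""
    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the fact is still current


def upsert_fact(edges: List[TemporalEdge], subject: str, relation: str,
                new_obj: str, as_of: date) -> List[TemporalEdge]:
    """Retire the current edge for (subject, relation) and append the new one."""
    for edge in edges:
        if edge.valid_to is None and edge.subject == subject and edge.relation == relation:
            if edge.obj == new_obj:
                return edges  # same fact as before: nothing to heal
            edge.valid_to = as_of  # close the old interval instead of deleting it
    edges.append(TemporalEdge(subject, relation, new_obj, valid_from=as_of))
    return edges


graph: List[TemporalEdge] = []
upsert_fact(graph, "Pro Plan", "COSTS", "$29", date(2024, 1, 1))   # Day 1
upsert_fact(graph, "Pro Plan", "COSTS", "$49", date(2024, 1, 15))  # Day 15
for edge in graph:
    print(edge)
```

Because the `$29` edge is closed with `valid_to` rather than deleted, the old price stays available for history queries, and re-running the update with an unchanged price is a no-op.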

⏳ The Killer Feature: "Time Travel"

Most RAG systems overwrite old data. Sentinel uses Bitemporal Versioning in Neo4j. This unlocks a whole new class of questions your AI can answer:

"How has the pricing structure changed since last month?"
"What were the safety guidelines before the 2024 update?"
"Show me the evolution of this compliance policy."
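Those questions boil down to three temporal lookups: "what was true on date X?", "what changed after date X?", and "list every version". The following is a hedged in-memory sketch of all three over edges shaped like the pricing example; the function names are hypothetical, and in Neo4j the same filters would likely be `WHERE` clauses on the relationship's `valid_from`/`valid_to` properties. It treats validity intervals as half-open, so on the day of an update the new value wins.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical sketch, not Sentinel's query API.


@dataclass
class TemporalEdge:
    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means still current


def _matching(edges: List[TemporalEdge], subject: str, relation: str) -> List[TemporalEdge]:
    return [e for e in edges if e.subject == subject and e.relation == relation]


def value_as_of(edges: List[TemporalEdge], subject: str, relation: str,
                when: date) -> Optional[str]:
    """'What was it before the update?': the value valid on `when`."""
    for e in _matching(edges, subject, relation):
        # Half-open interval [valid_from, valid_to)
        if e.valid_from <= when and (e.valid_to is None or when < e.valid_to):
            return e.obj
    return None


def changed_since(edges: List[TemporalEdge], subject: str, relation: str,
                  since: date) -> List[TemporalEdge]:
    """'How has it changed since last month?': versions starting on or after `since`."""
    return [e for e in _matching(edges, subject, relation) if e.valid_from >= since]


def history(edges: List[TemporalEdge], subject: str, relation: str) -> List[TemporalEdge]:
    """'Show me the evolution': every version, oldest first."""
    return sorted(_matching(edges, subject, relation), key=lambda e: e.valid_from)


edges = [
    TemporalEdge("Pro Plan", "COSTS", "$29", date(2024, 1, 1), date(2024, 1, 15)),
    TemporalEdge("Pro Plan", "COSTS", "$49", date(2024, 1, 15)),
]
print(value_as_of(edges, "Pro Plan", "COSTS", date(2024, 1, 10)))  # prints $29
```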

🎯 Top Use Cases

We built Sentinel for developers who need high-accuracy retrieval over changing data.

1. Legal & Compliance Tech

Laws and company policies change constantly. Once the source page is updated, Sentinel retires the old facts, so your bot stops citing a repealed law or an outdated HR policy.

2. Market Intelligence

Track competitor websites, earnings reports, or news feeds. Sentinel builds a timeline of events automatically, allowing you to query "What happened to Competitor X in Q3?"

3. Developer Documentation Bots

APIs change. If a library deprecates a function, Sentinel updates the graph so your bot stops recommending broken code.
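For the Market Intelligence case, a question like "What happened to Competitor X in Q3?" can be answered by listing the facts about that entity that started or ended inside the quarter. This is a hypothetical sketch with made-up dates and a made-up `PRO_PLAN_PRICE` relation, not Sentinel's query API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical sketch with made-up data, not Sentinel's query API.


@dataclass
class TemporalEdge:
    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means still current


def timeline(edges: List[TemporalEdge], subject: str, start: date,
             end: date) -> List[Tuple[date, str, TemporalEdge]]:
    """Facts about `subject` added or retired in [start, end), sorted by day."""
    events = []
    for e in edges:
        if e.subject != subject:
            continue
        if start <= e.valid_from < end:
            events.append((e.valid_from, "added", e))
        if e.valid_to is not None and start <= e.valid_to < end:
            events.append((e.valid_to, "retired", e))
    # sorted() is stable, so same-day events keep their insertion order
    return sorted(events, key=lambda event: event[0])


edges = [
    TemporalEdge("Competitor X", "PRO_PLAN_PRICE", "$29", date(2024, 1, 1), date(2024, 8, 15)),
    TemporalEdge("Competitor X", "PRO_PLAN_PRICE", "$49", date(2024, 8, 15)),
]
q3_2024 = (date(2024, 7, 1), date(2024, 10, 1))
for day, kind, edge in timeline(edges, "Competitor X", *q3_2024):
    print(day, kind, edge.relation, edge.obj)
```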

⚙️ How It Works (The Loop)

Sentinel runs an autonomous "Healing Loop" in the background:

1. Monitor: It checks the content hash of watched URLs. If the hash matches the one stored in the database, it sleeps. No LLM call is made, so the LLM cost is $0.
2. Diff: If the hash changes, it scrapes the new content.
3. Extract: It uses an LLM (via LiteLLM + Instructor) to extract nodes and relationships.
4. Upsert: It updates the graph database, handling the temporal logic automatically.
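Here is a minimal asyncio sketch of one pass of that loop. It assumes injected `fetch`, `extract`, and `upsert` coroutines, which are hypothetical hooks standing in for the scraper, the LiteLLM + Instructor extraction, and the Neo4j write. It uses SHA-256 as the byte-level hash and keeps known hashes in a dict where Sentinel compares against its database; unlike the Diff step described above, it reuses the bytes it already fetched for hashing instead of scraping a second time.

```python
import asyncio
import hashlib
from typing import Any, Awaitable, Callable, Dict, List

# Hypothetical hook signatures; Sentinel's real internals may differ.
Fetch = Callable[[str], Awaitable[bytes]]
Extract = Callable[[bytes], Awaitable[List[Any]]]
Upsert = Callable[[str, List[Any]], Awaitable[None]]


def content_hash(raw: bytes) -> str:
    """Byte-level fingerprint: changing any byte changes the digest."""
    return hashlib.sha256(raw).hexdigest()


async def heal_once(url: str, fetch: Fetch, extract: Extract, upsert: Upsert,
                    known_hashes: Dict[str, str]) -> bool:
    """Monitor -> Diff -> Extract -> Upsert for one URL; True if the graph was updated."""
    raw = await fetch(url)                 # Monitor: download the page
    digest = content_hash(raw)
    if known_hashes.get(url) == digest:    # unchanged: skip, no LLM call
        return False
    facts = await extract(raw)             # Extract: the only LLM-backed step
    await upsert(url, facts)               # Upsert: apply the temporal edge logic
    known_hashes[url] = digest             # record the hash only after a successful write
    return True


async def healing_loop(urls: List[str], fetch: Fetch, extract: Extract,
                       upsert: Upsert, interval_hours: float = 24) -> None:
    """Re-check every URL, then sleep; runs until cancelled."""
    known_hashes: Dict[str, str] = {}
    while True:
        for url in urls:
            await heal_once(url, fetch, extract, upsert, known_hashes)
        await asyncio.sleep(interval_hours * 3600)
```

The only LLM call sits behind the hash check, which is what keeps unchanged pages free to monitor in LLM terms. Storing the hash after the upsert means a failed write is retried on the next pass.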

🚀 Quick Start

You can add this to your existing Python project in minutes.

```bash
pip install sentinel-core
```

```python
import asyncio

from sentinel_core import Sentinel


async def main() -> None:
    # Initialize (uses standard env vars for Neo4j & LLM)
    sentinel = Sentinel()

    # Start watching a URL
    # Sentinel will scrape, extract, and build the initial graph
    await sentinel.process_url("https://docs.example.com/pricing")

    # Run the autonomous healing loop
    # It will check for updates every 24 hours
    await sentinel.run_healing_loop(interval_hours=24)


# Top-level `await` only works in notebooks or async REPLs;
# in a plain script, wrap the calls in asyncio.run()
asyncio.run(main())
```

That's it. You now have a self-updating knowledge graph.

🤝 Open Source & Roadmap

I built the core engine, but there is so much potential here. I am looking for contributors to help with:

- Entity Resolution: Smarter merging of duplicate nodes (e.g., "Tesla" vs "Tesla Inc").
- UI Dashboard: We have a basic API, but a visualization of the graph "healing" in real time would be epic.
- More Scrapers: Adding support for Playwright or Selenium.
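As a possible starting point for the Entity Resolution item, one naive baseline is to normalize names before merging: lowercase them, drop punctuation, and strip trailing legal-form suffixes. This hypothetical sketch (the suffix list and function names are assumptions) merges "Tesla" and "Tesla Inc" but would not catch aliases like "Tesla Motors"; a real resolver would likely need fuzzy matching or embeddings on top.

```python
import re
from typing import Dict, List

# Hypothetical suffix list; extend it for your domain.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "ltd", "llc", "plc", "gmbh"}


def normalize_entity(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal-form suffixes."""
    tokens = re.findall(r"\w+", name.lower())
    # Keep at least one token so a name like "Co" is not reduced to ""
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_duplicates(names: List[str]) -> Dict[str, List[str]]:
    """Group raw names that share a normalized key (candidates for merging)."""
    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(normalize_entity(name), []).append(name)
    return groups


print(group_duplicates(["Tesla", "Tesla Inc", "Tesla, Inc.", "Tesla Motors"]))
# {'tesla': ['Tesla', 'Tesla Inc', 'Tesla, Inc.'], 'tesla motors': ['Tesla Motors']}
```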

🌟 Support the Project

If you think "Self-Healing RAG" is a cool concept, please consider starring the repo! It helps us gain visibility and attracts more contributors to make the tool better for everyone.

👉 Star Sentinel on GitHub

I'm active in the comments—let me know what you think about the "Temporal Graph" approach vs standard Vector Stores!