We all know the dirty secret of RAG (Retrieval-Augmented Generation) applications: They are great on Day 1, and broken on Day 30.
Why? Data Staleness.
You scrape your documentation, embed it into a Vector DB, and build a chatbot. It works perfectly. But two weeks later, the documentation changes. A price updates. A policy is rewritten.
Your Vector DB doesn't know. It happily retrieves the old chunks, and your LLM confidently answers from outdated facts.
Re-indexing everything is expensive and slow. Building custom "update scripts" is boring.
I got tired of this problem, so I built Sentinel.
🛡️ What is Sentinel?
Sentinel is an open-source, autonomous ETL pipeline that treats your RAG data as a Living Knowledge Graph.
Instead of a "snapshot" vector store, Sentinel:
- Watches your source URLs for changes.
- Detects differences (byte-level hashing).
- Heals the graph by extracting only the new facts using LLMs.
- Maintains History using "Time Travel" edges.
It is pip-installable, model-agnostic (works with Ollama, OpenAI, Anthropic), and runs locally or in the cloud.
⚡ Real-World Example: SaaS Pricing Update
Imagine you are tracking a competitor's pricing page.
Day 1: The page says "Pro Plan is $29/mo".
- Sentinel Graph:
(Pro Plan) --[COSTS {valid_from: Day 1}]--> ($29)
Day 15: They silently raise the price to $49/mo.
- Sentinel Graph:
- Sentinel detects the hash change.
- It retires the old edge:
(Pro Plan) --[COSTS {valid_to: Day 15}]--> ($29)
- It creates a new edge:
(Pro Plan) --[COSTS {valid_from: Day 15}]--> ($49)
The Result:
- Standard RAG: Returns both $29 and $49, confusing the user.
- Sentinel: Knows exactly which price is current, AND knows the price history.
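To make the retire-and-create idea concrete, here is a minimal in-memory sketch in plain Python. It is not Sentinel's actual code and does not touch Neo4j: the `Edge` class, `upsert_fact`, `as_of`, and the calendar dates standing in for "Day 1" and "Day 15" are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Edge:
    source: str
    relation: str
    target: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the fact is still current


def upsert_fact(edges: List[Edge], source: str, relation: str, target: str, today: date) -> None:
    """Retire the open edge for (source, relation) if its target changed, then add the new one."""
    for edge in edges:
        if edge.source == source and edge.relation == relation and edge.valid_to is None:
            if edge.target == target:
                return  # fact unchanged, nothing to do
            edge.valid_to = today  # retire the old edge instead of deleting it
    edges.append(Edge(source, relation, target, valid_from=today))


def as_of(edges: List[Edge], source: str, relation: str, when: date) -> List[str]:
    """Targets whose validity interval [valid_from, valid_to) contains `when`."""
    return [
        e.target
        for e in edges
        if e.source == source and e.relation == relation
        and e.valid_from <= when
        and (e.valid_to is None or when < e.valid_to)
    ]


# Placeholder dates standing in for "Day 1" and "Day 15" (assumed, not from Sentinel)
day_1, day_15 = date(2025, 1, 1), date(2025, 1, 15)
edges: List[Edge] = []
upsert_fact(edges, "Pro Plan", "COSTS", "$29", day_1)
upsert_fact(edges, "Pro Plan", "COSTS", "$49", day_15)

current_price = as_of(edges, "Pro Plan", "COSTS", date(2025, 2, 1))     # ["$49"]
price_on_day_10 = as_of(edges, "Pro Plan", "COSTS", date(2025, 1, 10))  # ["$29"]
```

Because the old edge is closed rather than deleted, the same list answers both "what is the price now?" and "what was the price on a given day?".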
⏳ The Killer Feature: "Time Travel"
Most RAG systems overwrite old data. Sentinel uses Bitemporal Versioning in Neo4j. This unlocks a whole new class of questions your AI can answer:
"How has the pricing structure changed since last month?"
"What were the safety guidelines before the 2024 update?"
"Show me the evolution of this compliance policy."
🎯 Top Use Cases
We built Sentinel for developers who need high-accuracy retrieval over changing data.
1. Legal & Compliance Tech
Laws and company policies change constantly. Sentinel ensures your bot never cites a repealed law or an outdated HR policy.
2. Market Intelligence
Track competitor websites, earnings reports, or news feeds. Sentinel builds a timeline of events automatically, allowing you to query "What happened to Competitor X in Q3?"
3. Developer Documentation Bots
APIs change. If a library deprecates a function, Sentinel updates the graph so your bot stops recommending broken code.
⚙️ How It Works (The Loop)
Sentinel runs an autonomous "Healing Loop" in the background:
- Monitor: It checks the content hash of watched URLs. If the hash matches the one stored in the database, it sleeps. No LLM calls are made, so the cost is $0.
- Diff: If the hash changes, it scrapes the new content.
- Extract: It uses an LLM (via LiteLLM + Instructor) to extract nodes and relationships.
- Upsert: It updates the Graph Database, handling the temporal logic automatically.
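The Monitor step is the part that keeps the loop cheap, so here is a rough sketch of what a hash check can look like. This is an illustration only, not Sentinel's internals: SHA-256, the `content_hash` and `has_changed` helpers, and the plain dict standing in for the database are all assumptions.

```python
import hashlib
from typing import Dict


def content_hash(body: bytes) -> str:
    """Hash the raw bytes of a page (SHA-256 is an assumed choice)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(url: str, body: bytes, stored_hashes: Dict[str, str]) -> bool:
    """Return True and record the new hash if `body` differs from the last one seen for `url`."""
    new_hash = content_hash(body)
    if stored_hashes.get(url) == new_hash:
        return False  # nothing changed: skip scraping and LLM extraction
    stored_hashes[url] = new_hash
    return True  # first visit or changed content: run Diff, Extract, Upsert


# A dict stands in for the hash column in the database (hypothetical)
stored: Dict[str, str] = {}
url = "https://docs.example.com/pricing"
first_visit = has_changed(url, b"Pro Plan is $29/mo", stored)   # True
same_content = has_changed(url, b"Pro Plan is $29/mo", stored)  # False
price_update = has_changed(url, b"Pro Plan is $49/mo", stored)  # True
```

One caveat with hashing raw bytes: any change at all, including a rotating timestamp or tracking token in the page, produces a new hash, so a real pipeline would probably want to hash the cleaned main content rather than the full response.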
🚀 Quick Start
You can add this to your existing Python project in minutes.
pip install sentinel-core
import asyncio
from sentinel_core import Sentinel

async def main():
    # Initialize (uses standard env vars for Neo4j & LLM)
    sentinel = Sentinel()
    # Start watching a URL
    # Sentinel will scrape, extract, and build the initial graph
    await sentinel.process_url("https://docs.example.com/pricing")
    # Run the autonomous healing loop
    # It will check for updates every 24 hours
    await sentinel.run_healing_loop(interval_hours=24)

# `await` only works inside a coroutine, so run everything through asyncio
asyncio.run(main())
That's it. You now have a self-updating knowledge graph.
🤝 Open Source & Roadmap
I built the core engine, but there is so much potential here. I am looking for contributors to help with:
- Entity Resolution: Smarter merging of duplicate nodes (e.g., "Tesla" vs "Tesla Inc").
- UI Dashboard: We have a basic API, but a visualization of the graph "healing" in real-time would be epic.
- More Scrapers: Adding support for Playwright or Selenium.
🌟 Support the Project
If you think "Self-Healing RAG" is a cool concept, please consider starring the repo! It helps us gain visibility and attracts more contributors to make the tool better for everyone.
I'm active in the comments—let me know what you think about the "Temporal Graph" approach vs standard Vector Stores!