TechsphereX AI

Posted on Jun 9

Building an Enterprise RAG & Knowledge Graph Engine with Governed AI Workflows

#wiki #agents #architecture #rag

As large language models (LLMs) take over the enterprise landscape, organizations face a massive challenge: How do we make fragmented corporate knowledge searchable and actionable without sacrificing security, auditability, and data relationships?

Standard RAG (Retrieval-Augmented Generation) patterns often fall short. They treat documents as isolated text chunks, lose the rich semantic connections between entities, and frequently lack enterprise-grade data governance boundaries.

To solve this, I’ve been working on Nexus-KB — an open-source reference architecture designed for building secure, production-grade Enterprise RAG and Knowledge Graph platforms.

Here is a deep dive into how it bridges the gap between raw vector search and governed enterprise AI.

🏗️ The Architecture At A Glance

Nexus-KB addresses fragmented enterprise knowledge by making documents searchable, reviewable, auditable, and graph-aware. Instead of a single monolithic pipeline, it separates ingestion, human-in-the-loop validation, semantic indexing, and graph construction into decoupled layers.

flowchart TD
    Source[Local files / Obsidian / MCP source] --> Parser[Document Parser Worker]
    Parser --> Review{Review policy}
    Review -->|Approved / direct commit| Metadata[(PostgreSQL)]
    Review -->|Approved / direct commit| Vector[(Qdrant)]
    Review -->|Low confidence| Queue[Review Queue]
    Queue -->|Approve / modify| Metadata
    Queue -->|Approve / modify| Vector
    Metadata --> Graph[Graph Builder Worker]
    Graph --> GraphTables[(Relational graph tables)]
    API[FastAPI] --> Metadata
    API --> Vector
    API --> GraphTables
    API --> Audit[(Audit logs)]

🚀 Key Architectural Pillars

1. Hybrid Storage: PostgreSQL + Qdrant

Pure vector databases are great for semantic search but struggle with complex transactional metadata, role-based access control (RBAC) filtering, and structured auditing.

Qdrant (3-node HA cluster) stores and serves the vector embeddings, which are generated with the BAAI/bge-m3 model.
PostgreSQL (v16) acts as the relational source-of-truth, storing strict document metadata, ingestion runs, immutable audit logs, human review items, and structured graph records.
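
To make the division of labor concrete, here is a minimal sketch of how vector hits could be filtered against relational metadata before they reach the caller. It uses in-memory stand-ins rather than real Qdrant or PostgreSQL clients, and the names DocumentMeta, VectorHit, allowed_roles, and governed_results are assumptions for illustration, not Nexus-KB's actual schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentMeta:
    """Stand-in for a PostgreSQL metadata row (field names are assumptions)."""
    doc_id: str
    title: str
    allowed_roles: frozenset


@dataclass(frozen=True)
class VectorHit:
    """Stand-in for a vector search hit: a chunk, its parent document, a score."""
    chunk_id: str
    doc_id: str
    score: float


def governed_results(hits, metadata, user_roles, limit=5):
    """Return the best-scoring hits whose parent document the caller may read."""
    visible = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        meta = metadata.get(hit.doc_id)
        # Hits without a metadata row are dropped: the relational store is
        # treated as the source of truth, not the vector index.
        if meta is None or not (meta.allowed_roles & set(user_roles)):
            continue
        visible.append(
            {"chunk_id": hit.chunk_id, "title": meta.title, "score": hit.score}
        )
        if len(visible) == limit:
            break
    return visible
```

In a real deployment the role check would more likely be pushed down into the vector query as a payload filter, so restricted chunks never leave the index. The post-filter form above is just the easiest way to show the idea.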

2. Obsidian & Markdown Intelligence

Corporate knowledge isn't just plain text; it has hierarchy and connections. The document parser natively handles Markdown and Obsidian vaults, automatically extracting:

YAML frontmatter and custom tags.
Wiki-style cross-links ([[WikiLink]]).
Section paths based on header hierarchy for smarter chunking.
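
The three extraction steps above can be sketched with the standard library alone. This is an illustrative approximation, not the actual parser worker: parse_note is a hypothetical name, the frontmatter is returned as raw text (a real parser would hand it to a YAML library), and headings inside fenced code blocks are not special-cased.

```python
import re

FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
# Captures the link target and ignores an optional "#section" or "|alias" part.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def parse_note(text):
    """Split a note into raw frontmatter, wiki-link targets, and sections."""
    match = FRONTMATTER_RE.match(text)
    frontmatter = match.group(1) if match else ""
    body = text[match.end():] if match else text

    links = [target.strip() for target in WIKILINK_RE.findall(body)]

    sections = []  # each entry: {"path": [...heading trail...], "lines": [...]}
    stack = []     # (level, title) pairs for the current heading trail
    current = {"path": [], "lines": []}
    for line in body.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            if current["lines"]:
                sections.append(current)
            level = len(heading.group(1))
            # Drop headings at the same or a deeper level before pushing.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, heading.group(2)))
            current = {"path": [title for _, title in stack], "lines": []}
        elif line.strip():
            current["lines"].append(line)
    if current["lines"]:
        sections.append(current)
    return {"frontmatter": frontmatter, "links": links, "sections": sections}
```

Each section carries its full heading trail (for example ["Guide", "Setup"]), which a chunker can attach to every chunk so that retrieval results keep their position in the document hierarchy.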

3. Human-in-the-Loop Review & Hardened Auditing

Enterprise AI requires high precision. Nexus-KB includes a Review Queue workflow: if an ingestion run or an AI extraction produces a low confidence score, the affected chunks are routed to a human review queue that supports approve/reject/modify flows, currently gated by a mock reviewer RBAC layer.

Every single operation — from ingestion and document reads to search queries and review actions — generates immutable audit events.
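
As a rough sketch of that flow, the snippet below routes chunks by confidence and records an audit event for every decision. The threshold value, the event names, and the AuditLog class are assumptions made for illustration. In Nexus-KB the audit trail lives in PostgreSQL, while this version only models the append-only behavior in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

REVIEW_THRESHOLD = 0.8  # hypothetical cut-off; the real policy may differ


@dataclass(frozen=True)
class AuditEvent:
    """Frozen so a recorded event cannot be changed after the fact."""
    action: str
    actor: str
    target_id: str
    at: datetime


class AuditLog:
    """Append-only log: events can be added and read, never edited."""

    def __init__(self):
        self._events = []

    def record(self, action, actor, target_id):
        event = AuditEvent(action, actor, target_id, datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def events(self):
        return tuple(self._events)  # read-only snapshot


def route_chunk(chunk_id, confidence, audit, threshold=REVIEW_THRESHOLD):
    """Commit confident chunks directly; queue the rest for human review."""
    decision = "commit" if confidence >= threshold else "review"
    audit.record(f"ingest.{decision}", "parser-worker", chunk_id)
    return decision


def apply_review(chunk_id, action, reviewer, audit):
    """Record a reviewer decision; return True if the chunk should be indexed."""
    if action not in {"approve", "reject", "modify"}:
        raise ValueError(f"unsupported review action: {action}")
    audit.record(f"review.{action}", reviewer, chunk_id)
    return action != "reject"
```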

4. Model Context Protocol (MCP) Boundaries

To ingest content from third-party enterprise platforms securely, Nexus-KB leverages an MCP Source Connector Scaffold (specifically a confluence-bridge). It enforces:

Strict user-context authorization.
Disabled mutating tools by default.
Redacted error messages to prevent internal leakage.
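
Here is a minimal sketch of what those three rules could look like as a guard in front of tool calls. It deliberately does not use any MCP SDK. The tool names, the call_tool function, and the ToolCallDenied exception are hypothetical and only illustrate the checks.

```python
class ToolCallDenied(Exception):
    """Raised with a message that is safe to show to the caller."""


READ_ONLY_TOOLS = {"search_pages", "get_page"}   # hypothetical tool names
MUTATING_TOOLS = {"create_page", "update_page"}  # hypothetical tool names


def call_tool(name, args, user, handlers, allow_mutations=False):
    """Run a tool handler only if the three boundary checks pass."""
    # 1. Strict user context: no anonymous or service-wide calls.
    if not user or not user.get("id"):
        raise ToolCallDenied("user context required")
    # 2. Mutating tools stay off unless explicitly enabled.
    if name in MUTATING_TOOLS and not allow_mutations:
        raise ToolCallDenied("mutating tools are disabled")
    if name not in READ_ONLY_TOOLS | MUTATING_TOOLS or name not in handlers:
        raise ToolCallDenied("unknown tool")
    try:
        return handlers[name](user=user, **args)
    except Exception:
        # 3. Redact: drop the internal message (hosts, paths, SQL) and
        # suppress the chained traceback before the error leaves the bridge.
        raise ToolCallDenied("upstream request failed") from None
```

A real bridge would also want to log the original exception server-side before redacting it, so operators can still debug failures.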

5. Knowledge Graph Construction

The asynchronous workers/graph-builder worker extracts entities and relationships from approved chunks, merges duplicates, and stores confidence and provenance data for each record. It then injects graph context fields directly into the hybrid search results, giving the LLM a richer context window.
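
The merge step can be sketched as follows, assuming entity mentions have already been extracted upstream. The dictionary shape, the normalization rule (case and whitespace folding), and the function names are assumptions for illustration. The real graph builder may well use a different matching strategy.

```python
def normalize(name):
    """Fold case and whitespace so trivially different spellings collide."""
    return " ".join(name.lower().split())


def merge_entities(mentions):
    """Merge mentions (dicts with name, type, confidence, chunk_id) into entities."""
    merged = {}
    for mention in mentions:
        key = (normalize(mention["name"]), mention["type"])
        entity = merged.setdefault(
            key,
            {
                "name": mention["name"].strip(),  # first spelling seen wins
                "type": mention["type"],
                "confidence": 0.0,
                "provenance": [],
            },
        )
        # Keep the strongest evidence and remember every source chunk.
        entity["confidence"] = max(entity["confidence"], mention["confidence"])
        if mention["chunk_id"] not in entity["provenance"]:
            entity["provenance"].append(mention["chunk_id"])
    return merged


def graph_context(chunk_id, entities):
    """Names of entities whose provenance includes the given chunk."""
    return sorted(
        e["name"] for e in entities.values() if chunk_id in e["provenance"]
    )
```

A search layer could call graph_context for each hit and attach the result as an extra field, which is one simple way to read "graph context fields" in the description above.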

🛠️ The Tech Stack

Nexus-KB is built on a modern Python stack:

Backend Framework: FastAPI (0.115)
ORM & Migrations: SQLAlchemy (2.0) & Alembic (1.14)
Vector Engine: Qdrant Client (v1.13)
Testing: Pytest (8.3) with support for full Docker-backed live integration tests.

⚡ Getting Started (Local Quickstart)

Want to explore the codebase or test it locally? Here’s the fast track using a local Docker-backed stack:

1. Spin Up Local Infrastructure

# After cloning the repository, boot PostgreSQL + Qdrant from its root
docker compose up -d

2. Run Database Migrations

alembic -c infrastructure/alembic.ini upgrade head

3. Launch the FastAPI Gateway

export PYTHONPATH="packages/shared-contracts:packages/vector-client:workers/document-parser:workers/graph-builder:services/nexus-api"
uvicorn nexus_api.main:app --reload

Head over to http://127.0.0.1:8000/docs to explore the interactive Swagger documentation.

4. Execute a Governed Hybrid Search

curl -X POST "http://127.0.0.1:8000/api/v1/search" \
  -H "Content-Type: application/json" \
  -d '{"query":"governed retrieval","limit":5,"tags":["rag"]}'
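
If you would rather call the endpoint from Python, the same request can be built with the standard library. This is a sketch that mirrors the curl command above: build_search_request and run_search are hypothetical helper names, and because the response schema is not documented here, the result is returned as parsed JSON without assuming any fields.

```python
import json
import urllib.request


def build_search_request(query, limit=5, tags=None, base_url="http://127.0.0.1:8000"):
    """Build the same POST request as the curl command, without sending it."""
    payload = {"query": query, "limit": limit, "tags": tags or []}
    return urllib.request.Request(
        f"{base_url}/api/v1/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(query, **kwargs):
    """Send the request to a running local stack and return the parsed JSON."""
    request = build_search_request(query, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the local stack running, run_search("governed retrieval", tags=["rag"]) should be equivalent to the curl call.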

🗺️ What’s Next on the Roadmap?

The core architecture is solid, but there's always more to build. Future work includes:

[ ] Production-ready authentication/OIDC adapters.
[ ] A full Web Admin console for managing review queues and graph entities visually.
[ ] Enterprise connector adapters for live Confluence and SharePoint environments.
[ ] Production API Gateway hardening and OpenTelemetry export setups.

🤝 Open Source & Contributions

Nexus-KB is licensed under the MIT License and is fully open-source.

If you are passionate about AI Engineering, Knowledge Graphs, and RAG architectures, I'd love to hear your thoughts on this design! Check out the project layout, drop a comment below, or let's connect to discuss how you approach data governance in your LLM workflows.

View all: https://khaitrang1995.github.io/nexus-kb/

Happy coding! If you find this architecture interesting, don't forget to ❤️ and bookmark this post!
