
Image by Author with recraft.ai
1. Architecture Evolution: From Prototype to Modular Platform
⚠️ Important Note: This project is currently in alpha. Although the described features are fully implemented, errors or unexpected behavior may occur. Test thoroughly in a development environment before using it in production.
Multi-Layer Architecture with GraphRAG-Core
The main evolution over the previous version is the separation of the architecture into layers, which transforms the project from a single implementation into a complete modular platform. The heart of the system is graphrag-core, an independent Rust library that exposes all fundamental functionality through trait-based abstractions.
This architectural refactoring has enabled the creation of four distinct user interfaces sharing the same underlying engine:
- graphrag-cli: Command-line interface for batch processing and automation
- graphrag-server: REST API for integration into distributed systems and microservices
- graphrag-wasm: Web application for browser-based usage without backend
- graphrag-tui: Interactive text interface for monitoring and debugging
This clear separation between core logic and presentation layer guarantees superior reusability, testability, and maintainability. Each interface can be compiled independently, including only the necessary features, drastically reducing the final footprint.
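To make this concrete, here is a minimal sketch of what a trait-based core abstraction can look like. The trait and type names are illustrative, not the actual graphrag-core API:

use std::error::Error;

pub struct Entity {
    pub name: String,
    pub entity_type: String,
    pub confidence: f32,
}

// Hypothetical abstraction: any extractor, algorithmic or LLM-based,
// implements the same trait, so interfaces depend only on this contract.
pub trait EntityExtractor {
    fn extract(&self, chunk: &str) -> Result<Vec<Entity>, Box<dyn Error>>;
}

pub struct PatternExtractor;

impl EntityExtractor for PatternExtractor {
    fn extract(&self, chunk: &str) -> Result<Vec<Entity>, Box<dyn Error>> {
        // Toy stand-in: treat capitalized words as candidate entities.
        Ok(chunk
            .split_whitespace()
            .filter(|w| w.chars().next().is_some_and(char::is_uppercase))
            .map(|w| Entity {
                name: w.trim_matches(|c: char| !c.is_alphanumeric()).to_string(),
                entity_type: "UNKNOWN".into(),
                confidence: 0.5,
            })
            .collect())
    }
}

Because every interface codes against the trait rather than a concrete type, swapping the algorithmic extractor for an LLM-based one requires no changes in the CLI, server, WASM, or TUI layers.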
Dual Configuration System: TOML and JSON5
A significant improvement is the introduction of JSON5 support alongside traditional TOML. JSON5 offers comments, trailing commas, and more flexible syntax, making configuration files more readable and maintainable for complex projects.
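As a small illustration, a JSON5 fragment with comments and trailing commas can be deserialized in Rust via the json5 crate; the field names below are illustrative, not the actual template schema:

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct TextConfig {
    chunk_size: usize,
    chunk_overlap: usize,
}

fn main() {
    let raw = r#"{
        // JSON5 allows comments...
        chunk_size: 1000,
        chunk_overlap: 200, // ...trailing commas, and unquoted keys
    }"#;
    let cfg: TextConfig = json5::from_str(raw).expect("valid JSON5");
    println!("{cfg:?}");
}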
The template configuration system has been expanded to 9 predefined configurations, including:
- semantic_pipeline.graphrag.json5: LLM-based extraction for maximum quality
- algorithmic_pipeline.graphrag.json5: Completely algorithmic zero-cost approach
- hybrid_pipeline.graphrag.json5: Balance between performance and quality
- symposium_with_llm.graphrag.json5: Complete configuration with Ollama integration
- symposium_zero_cost.graphrag.json5: Maximum efficiency without LLM dependencies
Each template is documented, validated, and optimized for a specific use case, letting you start from preconfigured best practices.
Workspace Persistence: Persistent State and Incremental Updates
Workspace persistence is a completely new feature that solves a critical problem of the previous version: the need to reprocess everything at each execution. Now the system supports:
- Automatic saving of pipeline state after each phase
- Incremental loading of existing graphs for multiple queries
- Multi-format support: JSON (human-readable), Parquet (columnar), Lance (vector-optimized)
- Metadata tracking with creation statistics and used configuration
- Configurable auto-save every N entities for resilience against interruptions
This enables realistic workflows where the graph is built once and hundreds of queries are executed without reprocessing overhead, reducing times from minutes to seconds.
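A minimal sketch of the resulting build-once/query-many pattern, assuming a serde-serializable graph state (the types, paths, and use of the JSON format here are illustrative):

use std::{fs, path::Path};

#[derive(serde::Serialize, serde::Deserialize)]
struct Graph {
    entities: Vec<String>,
}

fn build(doc: &str) -> Graph {
    // Stand-in for the full pipeline (chunking, extraction, embeddings).
    Graph { entities: doc.split_whitespace().map(str::to_string).collect() }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let path = Path::new("./output/symposium/graph.json");
    let graph: Graph = if path.exists() {
        // Subsequent runs: load the persisted state instead of rebuilding.
        serde_json::from_str(&fs::read_to_string(path)?)?
    } else {
        // First run: build once, then persist for all later queries.
        let g = build("Socrates speaks of love");
        fs::create_dir_all(path.parent().unwrap())?;
        fs::write(path, serde_json::to_string(&g)?)?;
        g
    };
    println!("{} entities ready for querying", graph.entities.len());
    Ok(())
}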
2. GraphRAG-Core Pipeline: An 8-Domain Configurable Architecture

Graphrag-core organizational chart (WSP)
Overview: Config-Driven Architecture
The graphrag-core pipeline is organized around 8 configuration domains that control both the creation phase of the graph and the query/retrieval phase. Each domain offers multiple strategies allowing you to choose between high performance (algorithmic approaches) and high quality (LLM-based approaches).
Creation Phase               Query Phase
──────────────               ───────────
Text Processing         →    Query Analysis
Entity Extraction       →    Entity Matching
Graph Construction      →    Graph Traversal
Embedding Generation    →    Vector Similarity
Summarization           →    Hierarchical Search
        ↓                            ↓
Persistent Storage      ←    Multi-Strategy Fusion
This dual-phase architecture allows clear separation of responsibilities: creation builds and enriches the graph, while retrieval optimizes search strategies for different query types.
Domain 1: Text Configuration — Intelligent Chunking
The first step of the pipeline is the segmentation of the document into processable chunks. The configuration allows you to control:
- chunk_size: 1000 tokens (default), balance between context and granularity
- chunk_overlap: 200 tokens to preserve semantic continuity
- boundary_detection: Respects paragraph and sentence boundaries to avoid artificial breaks
The TextProcessor produces a vector of text chunks, where each chunk receives:
- ChunkId unique for tracking
- DocumentId for provenance
- Metadata with position and statistics
Query Phase: The query text undergoes the same preprocessing for normalization and keyword extraction used in BM25 retrieval.
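A minimal sketch of overlap-based chunking over whitespace tokens; the real TextProcessor additionally respects sentence and paragraph boundaries:

fn chunk(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    let tokens: Vec<&str> = text.split_whitespace().collect();
    let step = chunk_size.saturating_sub(overlap).max(1);
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + chunk_size).min(tokens.len());
        chunks.push(tokens[start..end].join(" "));
        if end == tokens.len() {
            break;
        }
        start += step; // each window starts `overlap` tokens before the previous one ends
    }
    chunks
}

fn main() {
    // Tiny values just to show the windows; the defaults are 1000/200.
    for c in chunk("a b c d e f g h", 4, 2) {
        println!("{c}"); // "a b c d", "c d e f", "e f g h"
    }
}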
Domain 2: Embeddings Configuration — Multi-Backend Vector Representations
The embedding system has been completely redesigned to support 7 different backends, each optimized for different trade-offs:
Zero-Cost Backend (Maximum Performance):
- Hash Embedding: FNV-1a algorithm for deterministic embeddings without ML
- Similarity threshold: 0.35 empirically calibrated
- Latency: <1ms, ideal for resource-constrained systems
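For illustration, a deterministic hash embedding can be built by hashing each token with FNV-1a into a fixed-size bucket vector and L2-normalizing; the dimension and bucketing scheme here are assumptions, not the exact implementation:

fn fnv1a(token: &str) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325; // FNV offset basis
    for b in token.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3); // FNV prime
    }
    h
}

fn hash_embed(text: &str, dim: usize) -> Vec<f32> {
    let mut v = vec![0.0f32; dim];
    for tok in text.split_whitespace() {
        v[(fnv1a(tok) as usize) % dim] += 1.0; // bucket count per token
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        v.iter_mut().for_each(|x| *x /= norm); // L2-normalize
    }
    v
}

fn main() {
    let e = hash_embed("Socrates philosopher Athens", 128);
    println!("{} active buckets", e.iter().filter(|x| **x != 0.0).count());
}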
Local LLM Backends (Quality/Privacy):
- Ollama: Integration with local models like nomic-embed-text
- HuggingFace: Support for sentence-transformers/all-MiniLM-L6-v2, BAAI/bge-small-en-v1.5
- Automatic fallback to hash if service is unavailable
Cloud API Backends (Maximum Quality):
- OpenAI: text-embedding-3-small (1536 dimensions)
- Voyage: voyage-2 optimized for retrieval
- Cohere: embed-english-v3.0 with compression
- Jina: jina-embeddings-v2-base multilingual
Creation Phase: Generates embeddings for:
- Each chunk of processed text
- Each extracted entity (format: “{name} {type}”)
- Re-indexing of the graph with vectors for fast lookup
Query Phase: Converts the query into a vector for cosine similarity search in the vector index.
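The query-side similarity is plain cosine similarity; for L2-normalized embeddings it reduces to the dot product:

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}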
Domain 3: Entity Configuration — Adaptive Multi-Strategy Extraction
Entity extraction offers 3 strategies with increasing levels of quality and cost:
Strategy A: Algorithmic (Pattern-Based — Zero Cost)
The EntityExtractor uses linguistic rules to identify:
- PERSON/CHARACTER: Capitalization + titles (Dr., Mr., Mrs., Professor)
- ORGANIZATION: Suffixes (Inc, Corp, LLC, University, Institute, Foundation)
- LOCATION: Matching with dictionary of known places
- CONCEPT: Abstract terms (Theory, Principle, Philosophy, Methodology)
- EVENT: Temporal keywords (meeting, war, conference, revolution)
- OBJECT: Tool and artifact detection
Confidence scoring is based on:
- Pattern strength: exact vs fuzzy matching
- Context indicators: co-occurrences with known entities
- Capitalization consistency: uppercase maintenance
Entities are automatically deduplicated on (name, type) pairs, with mention counts merged.
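A toy excerpt of the pattern-based strategy using the regex crate, covering the PERSON title rule and the ORGANIZATION suffix rule (the patterns are simplified for illustration):

use regex::Regex;

fn main() {
    let text = "Dr. Alice Smith joined Acme Corp last year.";
    // Title followed by a capitalized name → PERSON
    let person = Regex::new(
        r"\b(?:Dr|Mr|Mrs|Professor)\.?\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)",
    ).unwrap();
    // Capitalized name ending in a corporate suffix → ORGANIZATION
    let org = Regex::new(
        r"\b([A-Z][A-Za-z]+(?:\s+[A-Z][A-Za-z]+)*\s+(?:Inc|Corp|LLC|University|Institute|Foundation))\b",
    ).unwrap();

    for cap in person.captures_iter(text) {
        println!("PERSON: {}", &cap[1]); // Alice Smith
    }
    for cap in org.captures_iter(text) {
        println!("ORGANIZATION: {}", &cap[1]); // Acme Corp
    }
}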
Strategy B: Semantic (LLM-Based Gleaning — Maximum Quality)
The GleaningEntityExtractor implements iterative extraction in 3 rounds:
- Round 1: Initial extraction with structured prompt
- Round 2: Review and identification of missing entities
- Round 3: Final validation and confidence refinement
Gleaning configuration:
- max_gleaning_rounds: 3 (iterative refinement)
- confidence_threshold: 0.8 to filter false positives
- temperature: 0.1 for deterministic output
- max_tokens: 1500 per chunk
Requires config.ollama.enabled = true and returns both entities and relationships directly from the LLM.
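A skeleton of the gleaning loop, with a stub in place of the actual Ollama call (the names and prompt structure are illustrative):

fn glean(chunk: &str, max_rounds: u32, confidence_threshold: f32) -> Vec<(String, f32)> {
    let mut entities: Vec<(String, f32)> = Vec::new();
    for round in 1..=max_rounds {
        // Round 1 asks for an initial extraction; later rounds show the
        // model what was already found and ask only for missed entities.
        let prompt = format!(
            "Round {round}: extract entities from: {chunk}\nAlready found: {entities:?}"
        );
        entities.extend(call_llm_stub(&prompt));
    }
    entities.sort_by(|a, b| a.0.cmp(&b.0));
    entities.dedup_by(|a, b| a.0 == b.0);
    entities.retain(|(_, conf)| *conf >= confidence_threshold); // filter false positives
    entities
}

fn call_llm_stub(_prompt: &str) -> Vec<(String, f32)> {
    vec![("Socrates".to_string(), 0.92)] // fixed stub in place of the LLM
}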
Strategy C: Hybrid (Cost/Quality Balance)
Combines approaches:
- First pass algorithmic for obvious entities (high confidence)
- Second pass LLM only for ambiguous cases (confidence < threshold)
- Cost reduction up to 70% compared to full-LLM
Query Phase: Extracted entities are matched against the query for entity-focused retrieval with boosting of relevant results.
Domain 4: Graph Configuration — Construction and Traversal
Graph construction supports automatic relationship extraction with granular configuration:
Construction parameters:
- max_connections: 10 edges per node to limit complexity
- similarity_threshold: 0.8 for edge creation based on embedding
- relationship_confidence_threshold: 0.5 to filter weak relationships
Pattern-Based Relationship Extraction (Algorithmic):
When extract_relationships = true, co-occurrence analysis generates typed relationships:
- PERSON-ORGANIZATION: WORKS_FOR, LEADS, ASSOCIATED_WITH
- PERSON-LOCATION: BORN_IN, LOCATED_IN, LIVES_IN
- ORGANIZATION-LOCATION: HEADQUARTERED_IN, OPERATES_IN
- PERSON-PERSON: MARRIED_TO, COLLEAGUE_OF, KNOWS, FRIEND_OF
- CONCEPT-CONCEPT: RELATED_TO, DERIVES_FROM, IMPLEMENTS
LLM Extraction (Gleaning):
The GleaningExtractor returns structured relationships directly, with:
- Source and target entity IDs
- Semantically accurate relationship type
- Explicit confidence score
- Supporting context snippet
Graph Building:
- Nodes: Adding entities with attributes (type, confidence, mentions)
- Edges: Creating edges with weights based on confidence
- Enrichment algorithms:
  - PageRank: Calculating entity importance (used in retrieval)
  - Leiden Community Detection: Clustering for topic discovery
  - Centrality Measures: Betweenness and closeness for entity ranking
Query Phase: Configurable traversal with:
- max_depth: 3 levels of expansion
- max_paths: 10 alternative paths for path queries
- expansion_penalty: Decay 1.0 → 0.8 → 0.64 for successive hops
- Advanced traversal for complex queries (complexity > 0.7)
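The expansion penalty can be expressed as score = similarity × decay^hop × graph_weight, which reproduces the 1.0 → 0.8 → 0.64 progression:

fn hop_score(similarity: f32, hop: u32, graph_weight: f32) -> f32 {
    const DECAY: f32 = 0.8;
    similarity * DECAY.powi(hop as i32) * graph_weight
}

fn main() {
    for hop in 0..3 {
        println!("hop {hop}: {:.2}", hop_score(1.0, hop, 1.0)); // 1.00, 0.80, 0.64
    }
}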
Domain 5: Retrieval Configuration — Multi-Strategy Fusion Pipeline
The retrieval pipeline is the most significant improvement compared to the previous version, implementing LightRAG-inspired multi-strategy fusion.
Step 1: Query Analysis (Automatic Classification)
The system analyzes the query to determine:
- Key entities: Explicitly mentioned entities
- Concepts: Abstract terms and themes
- Query type: EntityFocused, Relationship, Conceptual, Exploratory, Factual
- User intent: Overview, Detailed, Comparative, Causal, Temporal
- Complexity score: 0.0–1.0 based on length and structure
Step 2: Strategy Weight Selection (Adaptive Weighting)
Based on query type + intent, the system assigns weights to 3 strategies:
EntityFocused: Vector(0.5) + Graph(0.4) + Hierarchical(0.1)
Relationship: Vector(0.3) + Graph(0.6) + Hierarchical(0.1)
Conceptual+Overview: Vector(0.2) + Graph(0.2) + Hierarchical(0.6)
Exploratory: Vector(0.4) + Graph(0.4) + Hierarchical(0.2)
Factual: Vector(0.6) + Graph(0.3) + Hierarchical(0.1)
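A sketch of the weight selection as a plain match; variant names are illustrative, and the Conceptual arm corresponds to the Conceptual+Overview combination above:

enum QueryType { EntityFocused, Relationship, Conceptual, Exploratory, Factual }

struct Weights { vector: f32, graph: f32, hierarchical: f32 }

fn weights_for(q: &QueryType) -> Weights {
    match q {
        QueryType::EntityFocused => Weights { vector: 0.5, graph: 0.4, hierarchical: 0.1 },
        QueryType::Relationship  => Weights { vector: 0.3, graph: 0.6, hierarchical: 0.1 },
        QueryType::Conceptual    => Weights { vector: 0.2, graph: 0.2, hierarchical: 0.6 },
        QueryType::Exploratory   => Weights { vector: 0.4, graph: 0.4, hierarchical: 0.2 },
        QueryType::Factual       => Weights { vector: 0.6, graph: 0.3, hierarchical: 0.1 },
    }
}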
Step 3: Parallel Retrieval Execution
Vector Similarity Search:
- Cosine similarity in embedding space
- Search on HNSW index for sub-linear complexity
- Return top_k × 2 candidates as a buffer for fusion
- Threshold filtering to eliminate weak matches
- Score: similarity × vector_weight
Graph-Based Search:
- Entity matching via embedding similarity
- Relationship expansion up to max_depth
- Neighbor traversal with decay penalty
- Score: similarity × decay^hop × graph_weight
Hierarchical Search:
- Query on document trees (if summarization is enabled)
- Adaptive result count per intent: Overview → 3 high-level summaries, Detailed → 8 granular sections, otherwise 5 balanced results
- Hierarchy bonus: +0.3 for overview, +0.2 for detailed
- Return HierarchicalSummary with context path
BM25 (Optional):
- Traditional term frequency analysis
- IDF weighting for term importance
- Useful for exact keyword matches
PageRank Retrieval (Optional):
- Boost results containing high-PageRank entities
- Favors central concepts in the corpus
Step 4: Cross-Strategy Fusion
Content similarity grouping:
- Groups similar results from different strategies
- Fusion boost: +0.2 × (num_strategies - 1)
- Results confirmed by multiple strategies rise in ranking
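The boost itself is simple to express; a sketch, where the base score is the strategy-weighted score from Step 3:

fn fused_score(base: f32, num_strategies: usize) -> f32 {
    // +0.2 for each additional strategy that returned the same content
    base + 0.2 * num_strategies.saturating_sub(1) as f32
}

For example, a result found by all three strategies with base score 0.7 is promoted to 0.7 + 0.2 × 2 = 1.1, lifting it above single-strategy hits.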
Step 5: Adaptive Ranking (Query-Specific Adjustments)
Contextual boosting:
- Entity queries: +20% for ResultType::Entity
- Conceptual queries: +10% for ResultType::Hierarchical
- Relationship queries: +15% for multi-entity results
- Query entity matching: +10% if contains query entities
Step 6: Deduplication & Diversity
- Content signature: hash of first 50 chars + length
- Type counting: tracking per ResultType
- Limit per type to ensure diversity
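A sketch of the signature-based deduplication, using std's DefaultHasher for illustration:

use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

fn signature(content: &str) -> u64 {
    // Hash of the first 50 chars plus the total length
    let prefix: String = content.chars().take(50).collect();
    let mut h = DefaultHasher::new();
    (prefix, content.len()).hash(&mut h);
    h.finish()
}

fn dedup(results: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    results.into_iter().filter(|r| seen.insert(signature(r))).collect()
}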
Step 7: Return Top-K
Results sorted by final score with rich metadata:
- Source (chunk/entity/summary)
- Score breakdown per strategy
- List of mentioned entities
- Aggregated confidence
Optional: LLM Answer Generation
If config.ollama.enabled = true:
- Top 5 results as context
- RAG prompt construction with context injection
- LLM generation with controlled temperature
- Post-processing and citation extraction
3. Multi-Level Interfaces: From Script to Web App
GraphRAG-CLI: Automation and Batch Processing
The CLI interface is designed for pipeline automation and scripting:
Main features:
- Auto-detection of state: identifies if workspace already exists
- Intelligent directory naming: “Tom Sawyer.txt” → ./output/tom_sawyer/
- Configuration validation with detailed error reporting
- Incremental processing: uses existing workspace if available
- Batch mode: processes multiple documents in sequence
Typical workflow:
# First execution: build + query
graphrag-cli --config symposium.json5 --query "Who is Socrates?"
# Subsequent executions: use cached workspace
graphrag-cli --config symposium.json5 --query "What is love?"
Advantages:
- Zero overhead for subsequent queries (workspace loading <100ms)
- Scriptable: integration into CI/CD and automation
- Granular logging for debugging
GraphRAG-Server: REST API for Microservices
The HTTP server exposes RESTful endpoints for distributed integration:
Main endpoints:
- POST /api/v1/create: Create workspace from document
- POST /api/v1/query: Execute query on existing workspace
- GET /api/v1/workspaces: List available workspaces
- GET /api/v1/workspaces/:id/stats: Statistics and metadata
- DELETE /api/v1/workspaces/:id: Cleanup workspace
Features:
- Async processing with Tokio runtime
- Connection pooling for database/LLM clients
- Configurable rate limiting
- OpenAPI/Swagger documentation
- Health checks and metrics endpoint
Deployment:
- Ready-to-use Docker container
- Horizontal scaling via stateless design
- Load balancing ready
GraphRAG-WASM: Zero-Install Browser Experience
The WebAssembly application brings GraphRAG directly to the browser:
Architecture:
- Yew framework for reactive UI
- wasm-bindgen for JS interop
- Web Workers for background processing
- IndexedDB for local persistence
Capabilities:
- File upload via drag-and-drop
- Real-time processing with progress bar
- Interactive graph visualization (D3.js integration)
- Local-first: all data stays in the browser
Limitations:
- No access to Ollama (CORS restrictions)
- Hash embeddings only for now
- Browser-dependent memory constraints
Use cases:
- Demos and presentations without setup
- Privacy-sensitive documents (no server upload)
- Educational tool to explore GraphRAG
GraphRAG-TUI: Interactive Terminal Interface
The text interface (Terminal User Interface) offers real-time monitoring:
Features:
- Live stats during processing
- Progress bars for each phase
- Interactive entity/relationship explorer
- Query REPL with history
- Log viewer with filtering
Technologies:
- Ratatui for TUI rendering
- Crossterm for input handling
- Real-time updates via channels
Advantages:
- Visual feedback during long-running operations
- Easier debugging through state inspection
- SSH-friendly: works on remote terminals
4. Performance vs Quality Alternatives: Architectural Decisions
Embeddings Level: From Zero-Cost to State-of-the-Art
Option 1: Hash Embeddings (Maximum Performance)
- Latency: <1ms per embedding
- Memory: ~4KB per 1000 entities
- Quality: Sufficient for entities with distinctive names
- Use case: Prototyping, resource-constrained, privacy-first
Option 2: Local LLM (Balanced)
- Latency: ~50ms per embedding (Ollama)
- Quality: High for semantic similarity
- Costs: Zero (self-hosted)
- Use case: Production without cloud dependencies
Option 3: Cloud APIs (Maximum Quality)
- Latency: ~100–200ms (network overhead)
- Quality: State-of-the-art (OpenAI, Voyage)
- Costs: Pay-per-use
- Use case: High-stakes applications, benchmark quality
Storage Level: JSON vs Parquet vs Lance
JSON (Developer-Friendly)
- Write speed: Moderate
- Read speed: Slow for large datasets
- Compression: Low
- Debugging: Excellent (human-readable)
- Use case: Development, small datasets (<10K entities)
Parquet (Analytics-Optimized)
- Write speed: Fast (columnar batching)
- Read speed: Very fast for columnar queries
- Compression: High (Snappy/ZSTD)
- Integration: Excellent with Pandas, Polars, DuckDB
- Use case: Data analysis, large corpora
Lance (Vector-Optimized)
- Write speed: Optimized for append
- Read speed: Excellent for vector similarity
- Compression: High with vector quantization
- Features: Versioning, time-travel queries
- Use case: Production vector search, ML pipelines
Retrieval Level: Single-Strategy vs Multi-Strategy
Vector-Only (Simple)
- Latency: ~5–10ms
- Quality: Good for semantic similarity
- Limitation: Misses relational information
Graph-Only (Relationship-Focused)
- Latency: ~20–50ms (traversal cost)
- Quality: Excellent for entity-centric queries
- Limitation: Weak for conceptual queries
Multi-Strategy Fusion (Production)
- Latency: ~50–100ms (parallel execution)
- Quality: Superior thanks to strategy diversity
- Robustness: Graceful degradation if one strategy fails
- Complexity: Requires weight tuning
5. Roadmap and Future Considerations
Stability and Testing (Alpha → Beta Phase)
As an alpha project, the current priority is:
- Comprehensive testing: Expansion from 168 to 500+ test cases
- Edge case handling: Identification and fixing of corner cases
- Performance profiling: Optimization of identified hot paths
- API stabilization: Locking of public interfaces for semantic versioning
Planned Features
Short-term (Q1 2025):
- Streaming responses: Incremental result delivery for large queries
- Cache layer: Redis integration for multi-instance deployments
- Monitoring dashboard: Grafana/Prometheus metrics
- Multi-document queries: Cross-corpus search
Medium-term (Q2-Q3 2025):
- GraphQL API: Alternative to REST
- Real-time updates: Incremental graph updates without rebuild
- Distributed processing: Ray/Dask integration for petabyte-scale
- Fine-tuning support: Custom entity/relation extractors
Community Contributions
The project is open-source and welcomes:
- Detailed bug reports with minimal reproducible examples
- Feature requests with use case descriptions
- Pull requests with test coverage
- Documentation improvements: Tutorials, examples, translations
Conclusions: GraphRAG-RS as a Production Platform
GraphRAG-RS has evolved from a research prototype to a production-ready platform through:
- Modular architecture with core/interfaces separation
- Multiple strategies for each level (performance vs quality trade-offs)
- Diversified interfaces (CLI, Server, WASM, TUI) for every workflow
- Configuration-driven design for maximum flexibility
- Workspace persistence for production efficiency
Key improvements over the previous version:
- Query latency reduction: from minutes to seconds (workspace caching)
- Backend expansion: from 1 to 7 embedding providers
- Advanced retrieval: single-strategy → multi-strategy fusion
- Deployment options: from CLI-only to 4 complete interfaces
- Configuration: from TOML-only to TOML + JSON5 with 9 templates
The 8-domain configurable architecture enables granular tuning for:
- Rapid prototyping: Zero-cost pipeline in <5 minutes
- Production deployment: Multi-strategy fusion with monitoring
- Research: Custom extractors and retrieval strategies
Despite the alpha state, the project demonstrates that Rust is the ideal language for AI infrastructure: type safety, predictable performance, and mature ecosystem for async processing, HTTP servers, and WASM compilation.
The roadmap towards beta and v1.0 continues with a focus on stability, testing, and community feedback. GraphRAG-RS aims to become the reference implementation for knowledge graph construction in Rust, combining cutting-edge research with robust engineering.
Resources and Links
- Repository: github.com/automataIA/graphrag-rs
- Documentation: graphrag-core/README.md and graphrag-core/ENTITY_EXTRACTION.md
- Configuration Templates: config/templates/ directory
- Examples: graphrag-core/examples/ for usage patterns