
rerere L.

Originally published at blog.gopenai.com

GraphRAG-RS: Production-Ready Knowledge Graph Platform with Multi-Interface Architecture


Image by Author with recraft.ai

1. Architecture Evolution: From Prototype to Modular Platform

⚠️ Important Note: This project is currently in alpha. Although the features described here are fully implemented, errors or unexpected behaviors may occur. Test thoroughly in a development environment before using it in production.

Multi-Layer Architecture with GraphRAG-Core

The main evolution compared to the previous version is the architectural layer separation that transforms the project from a single implementation into a complete modular platform. The heart of the system is graphrag-core, an independent Rust library that exposes all fundamental functionalities through trait-based abstractions.

This architectural refactoring has enabled the creation of four distinct user interfaces sharing the same underlying engine:

  • graphrag-cli: Command-line interface for batch processing and automation
  • graphrag-server: REST API for integration into distributed systems and microservices
  • graphrag-wasm: Web application for browser-based usage without backend
  • graphrag-tui: Interactive text interface for monitoring and debugging

This clear separation between core logic and presentation layer guarantees superior reusability, testability, and maintainability. Each interface can be compiled independently, including only the necessary features, drastically reducing the final footprint.
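
To make the separation concrete, here is a minimal sketch of how a trait-based core can hide backend choices from the front-ends. The trait and type names (EmbeddingBackend, Retriever, QueryResult) are illustrative assumptions, not the crate's actual API, and the sketch assumes the anyhow crate:

```rust
/// Illustrative trait seam between graphrag-core and its front-ends.
pub trait EmbeddingBackend {
    /// Produce a vector representation of `text`.
    fn embed(&self, text: &str) -> anyhow::Result<Vec<f32>>;
}

pub struct QueryResult {
    pub content: String,
    pub score: f32,
}

pub trait Retriever {
    /// Return the top-k results for `query`.
    fn retrieve(&self, query: &str, top_k: usize) -> anyhow::Result<Vec<QueryResult>>;
}

/// A front-end (CLI, server, WASM, TUI) sees only trait objects, so
/// backends can be swapped without touching presentation code.
pub fn answer(retriever: &dyn Retriever, query: &str) -> anyhow::Result<Vec<QueryResult>> {
    retriever.retrieve(query, 5)
}
```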

Dual Configuration System: TOML and JSON5

A significant improvement is the introduction of JSON5 support alongside traditional TOML. JSON5 offers comments, trailing commas, and more flexible syntax, making configuration files more readable and maintainable for complex projects.
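
As a minimal sketch of what JSON5 buys you, the snippet below parses a commented, trailing-comma config with the json5 and serde (derive feature) crates. The TextConfig struct mirrors only the two chunking options described later, not the crate's full schema:

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct TextConfig {
    chunk_size: usize,
    chunk_overlap: usize,
}

fn main() -> Result<(), json5::Error> {
    // JSON5 allows comments, unquoted keys, and trailing commas,
    // none of which plain JSON accepts.
    let raw = r#"{
        // tokens per chunk
        chunk_size: 1000,
        chunk_overlap: 200,
    }"#;
    let cfg: TextConfig = json5::from_str(raw)?;
    println!("{cfg:?}");
    Ok(())
}
```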

The template configuration system has been expanded to 9 predefined configurations, including:

  • semantic_pipeline.graphrag.json5: LLM-based extraction for maximum quality
  • algorithmic_pipeline.graphrag.json5: Completely algorithmic zero-cost approach
  • hybrid_pipeline.graphrag.json5: Balance between performance and quality
  • symposium_with_llm.graphrag.json5: Complete configuration with Ollama integration
  • symposium_zero_cost.graphrag.json5: Maximum efficiency without LLM dependencies

Each template is documented, validated, and optimized for specific use cases, allowing you to start with already configured best practices.

Workspace Persistence: Durable State and Incremental Updates

Workspace persistence is a completely new feature that solves a critical problem of the previous version: the need to reprocess everything at each execution. Now the system supports:

  • Automatic saving of pipeline state after each phase
  • Incremental loading of existing graphs for multiple queries
  • Multi-format support: JSON (human-readable), Parquet (columnar), Lance (vector-optimized)
  • Metadata tracking with creation statistics and used configuration
  • Configurable auto-save every N entities for resilience against interruptions

This enables realistic workflows where the graph is built once and hundreds of queries are executed without reprocessing overhead, reducing times from minutes to seconds.
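
A sketch of the build-once/query-many pattern this enables is below. The Workspace type here is a stand-in stub written for illustration; the real crate persists the full graph, embeddings, and metadata in JSON, Parquet, or Lance:

```rust
use std::{fs, path::Path};

// Stand-in stub: the real workspace holds the graph, vectors, and metadata.
struct Workspace { state: String }

impl Workspace {
    fn load(dir: &Path) -> std::io::Result<Self> {
        Ok(Self { state: fs::read_to_string(dir.join("state.json"))? })
    }
    fn build_from_file(doc: &Path) -> std::io::Result<Self> {
        Ok(Self { state: fs::read_to_string(doc)? }) // stands in for the full pipeline
    }
    fn save(&self, dir: &Path) -> std::io::Result<()> {
        fs::create_dir_all(dir)?;
        fs::write(dir.join("state.json"), &self.state)
    }
    fn query(&self, q: &str) -> String {
        format!("answer to '{q}' from cached workspace")
    }
}

fn main() -> std::io::Result<()> {
    let dir = Path::new("./output/tom_sawyer");
    let ws = if dir.exists() {
        Workspace::load(dir)?                                  // reuse: no reprocessing
    } else {
        let ws = Workspace::build_from_file(Path::new("Tom Sawyer.txt"))?;
        ws.save(dir)?;                                         // persist for later runs
        ws
    };
    println!("{}", ws.query("Who is Tom?"));
    Ok(())
}
```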

2. GraphRAG-Core Pipeline: An 8-Domain Configurable Architecture


Graphrag-core Organizational Chart (WSP)

Overview: Config-Driven Architecture

The graphrag-core pipeline is organized around 8 configuration domains that control both the creation phase of the graph and the query/retrieval phase. Each domain offers multiple strategies allowing you to choose between high performance (algorithmic approaches) and high quality (LLM-based approaches).

```
Creation Phase             Query Phase
──────────────             ─────────────
Text Processing       →    Query Analysis
Entity Extraction     →    Entity Matching
Graph Construction    →    Graph Traversal
Embedding Gen.        →    Vector Similarity
Summarization         →    Hierarchical Search
        ↓                          ↓
Persistent Storage    ←    Multi-Strategy Fusion
```

This dual-phase architecture allows clear separation of responsibilities: creation builds and enriches the graph, while retrieval optimizes search strategies for different query types.

Domain 1: Text Configuration — Intelligent Chunking

The first step of the pipeline is the segmentation of the document into processable chunks. The configuration allows you to control:

  • chunk_size: 1000 tokens (default), balance between context and granularity
  • chunk_overlap: 200 tokens to preserve semantic continuity
  • boundary_detection: Respect for paragraphs and sentences to avoid artificial breaks

The TextProcessor generates a Vec of chunks, where each chunk receives:

  • ChunkId unique for tracking
  • DocumentId for provenance
  • Metadata with position and statistics

Query Phase: The query text undergoes the same preprocessing for normalization and keyword extraction used in BM25 retrieval.
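
A minimal sketch of fixed-size chunking with overlap, using whitespace tokens as a stand-in for the real tokenizer and ignoring the boundary-detection step:

```rust
fn chunk(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < chunk_size);
    let tokens: Vec<&str> = text.split_whitespace().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + chunk_size).min(tokens.len());
        chunks.push(tokens[start..end].join(" "));
        if end == tokens.len() { break; }
        start += step; // each chunk shares `overlap` tokens with the previous one
    }
    chunks
}
```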

Domain 2: Embeddings Configuration — Multi-Backend Vector Representations

The embedding system has been completely redesigned to support 7 different backends, each optimized for different trade-offs:

Zero-Cost Backend (Maximum Performance):

  • Hash Embedding: FNV-1a algorithm for deterministic embeddings without ML
  • Similarity threshold: 0.35 empirically calibrated
  • Latency: <1ms, ideal for resource-constrained systems

Local LLM Backends (Quality/Privacy):

  • Ollama: Integration with local models like nomic-embed-text
  • HuggingFace: Support for sentence-transformers/all-MiniLM-L6-v2, BAAI/bge-small-en-v1.5
  • Automatic fallback to hash if service is unavailable

Cloud API Backends (Maximum Quality):

  • OpenAI: text-embedding-3-small (1536 dimensions)
  • Voyage: voyage-2 optimized for retrieval
  • Cohere: embed-english-v3.0 with compression
  • Jina: jina-embeddings-v2-base multilingual

Creation Phase: Generates embeddings for:

  • Each chunk of processed text
  • Each extracted entity (format: “{name} {type}”)
  • Re-indexing of the graph with vectors for fast lookup

Query Phase: Converts the query into a vector for cosine similarity search in the vector index.
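
The zero-cost backend is easy to sketch: hash each token with FNV-1a, scatter it into a fixed-size vector, and L2-normalize so cosine similarity reduces to a dot product. The bucketing and sign trick below are illustrative assumptions; the crate's exact construction may differ:

```rust
const DIM: usize = 256;

// 64-bit FNV-1a: deterministic, fast, no ML dependency.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

fn hash_embed(text: &str) -> Vec<f32> {
    let mut v = vec![0f32; DIM];
    for tok in text.split_whitespace() {
        let h = fnv1a(tok.to_lowercase().as_bytes());
        let idx = (h % DIM as u64) as usize;
        let sign = if h >> 63 == 0 { 1.0 } else { -1.0 }; // pseudo-random sign
        v[idx] += sign;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt().max(1e-12);
    v.iter().map(|x| x / norm).collect()
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum() // vectors are already unit-length
}
```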

Domain 3: Entity Configuration — Adaptive Multi-Strategy Extraction

Entity extraction offers 3 strategies with increasing levels of quality and cost:

Strategy A: Algorithmic (Pattern-Based — Zero Cost)

The EntityExtractor uses linguistic rules to identify:

  • PERSON/CHARACTER: Capitalization + titles (Dr., Mr., Mrs., Professor)
  • ORGANIZATION: Suffixes (Inc, Corp, LLC, University, Institute, Foundation)
  • LOCATION: Matching with dictionary of known places
  • CONCEPT: Abstract terms (Theory, Principle, Philosophy, Methodology)
  • EVENT: Temporal keywords (meeting, war, conference, revolution)
  • OBJECT: Tool and artifact detection

Confidence scoring is based on:

  • Pattern strength: exact vs fuzzy matching
  • Context indicators: co-occurrences with known entities
  • Capitalization consistency: uppercase maintenance

Entities are automatically deduplicated on (name, type) pairs, merging their mention counts.
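
As an illustration of the pattern-based strategy, the sketch below promotes title-prefixed capitalized names to PERSON and deduplicates on (name, type) while merging mention counts; the real extractor covers all six entity types with richer rules:

```rust
use std::collections::HashMap;

fn extract_persons(text: &str) -> HashMap<(String, String), u32> {
    const TITLES: [&str; 4] = ["Dr.", "Mr.", "Mrs.", "Professor"];
    let mut entities: HashMap<(String, String), u32> = HashMap::new();
    let words: Vec<&str> = text.split_whitespace().collect();
    for pair in words.windows(2) {
        let (title, name) = (pair[0], pair[1]);
        let capitalized = name.chars().next().map_or(false, |c| c.is_uppercase());
        if TITLES.contains(&title) && capitalized {
            let key = (format!("{title} {name}"), "PERSON".to_string());
            *entities.entry(key).or_insert(0) += 1; // merge mention counts
        }
    }
    entities
}
```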

Strategy B: Semantic (LLM-Based Gleaning — Maximum Quality)

The GleaningEntityExtractor implements iterative extraction in 3 rounds:

  1. Round 1: Initial extraction with structured prompt
  2. Round 2: Review and identification of missing entities
  3. Round 3: Final validation and confidence refinement

Gleaning configuration:

  • max_gleaning_rounds: 3 (iterative refinement)
  • confidence_threshold: 0.8 to filter false positives
  • temperature: 0.1 for deterministic output
  • max_tokens: 1500 per chunk

Requires config.ollama.enabled = true and returns both entities and relationships directly from the LLM.
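
Conceptually, gleaning is a loop: each round asks the LLM only for what earlier rounds missed, filters by confidence, and stops early on convergence. In this sketch llm_extract is a placeholder for the actual Ollama call:

```rust
struct Entity { name: String, confidence: f32 }

fn glean(
    chunk: &str,
    max_rounds: usize,
    confidence_threshold: f32,
    llm_extract: impl Fn(&str, &[Entity]) -> Vec<Entity>,
) -> Vec<Entity> {
    let mut found: Vec<Entity> = Vec::new();
    for _round in 0..max_rounds {
        // the prompt includes what was already found, so the model gleans the rest
        let new: Vec<Entity> = llm_extract(chunk, &found)
            .into_iter()
            .filter(|e| e.confidence >= confidence_threshold)
            .filter(|e| !found.iter().any(|f| f.name == e.name))
            .collect();
        if new.is_empty() { break; } // converged early
        found.extend(new);
    }
    found
}
```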

Strategy C: Hybrid (Cost/Quality Balance)

Combines approaches:

  • First pass algorithmic for obvious entities (high confidence)
  • Second pass LLM only for ambiguous cases (confidence < threshold)
  • Cost reduction up to 70% compared to full-LLM

Query Phase: Extracted entities are matched against the query for entity-focused retrieval with boosting of relevant results.

Domain 4: Graph Configuration — Construction and Traversal

Graph construction supports automatic relationship extraction with granular configuration:

Construction parameters:

  • max_connections: 10 edges per node to limit complexity
  • similarity_threshold: 0.8 for edge creation based on embedding
  • relationship_confidence_threshold: 0.5 to filter weak relationships

Pattern-Based Relationship Extraction (Algorithmic):

When extract_relationships = true, co-occurrence analysis generates typed relationships:

  • PERSON-ORGANIZATION: WORKS_FOR, LEADS, ASSOCIATED_WITH
  • PERSON-LOCATION: BORN_IN, LOCATED_IN, LIVES_IN
  • ORGANIZATION-LOCATION: HEADQUARTERED_IN, OPERATES_IN
  • PERSON-PERSON: MARRIED_TO, COLLEAGUE_OF, KNOWS, FRIEND_OF
  • CONCEPT-CONCEPT: RELATED_TO, DERIVES_FROM, IMPLEMENTS

LLM Extraction (Gleaning):

The GleaningExtractor directly returns structured relationships with:

  • Source and target entity IDs
  • Semantically accurate relationship type
  • Explicit confidence score
  • Supporting context snippet

Graph Building:

  1. Nodes: Adding entities with attributes (type, confidence, mentions)
  2. Edges: Creating edges with weights based on confidence
  3. Enrichment algorithms:
  • PageRank: Calculating entity importance (used in retrieval)
  • Leiden Community Detection: Clustering for topic discovery
  • Centrality Measures: Betweenness and closeness for entity ranking

Query Phase: Configurable traversal (sketched after this list) with:

  • max_depth: 3 levels of expansion
  • max_paths: 10 alternative paths for path queries
  • expansion_penalty: Decay 1.0 → 0.8 → 0.64 for successive hops
  • Advanced traversal for complex queries (complexity > 0.7)
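
A sketch of the scored expansion, assuming a plain adjacency-list graph: breadth-first traversal from a matched entity, multiplying the score by the decay factor at each hop (1.0 → 0.8 → 0.64 for decay = 0.8) and stopping at max_depth:

```rust
use std::collections::{HashMap, VecDeque};

fn expand(
    adjacency: &HashMap<u32, Vec<u32>>,
    start: u32,
    max_depth: usize,
    decay: f32,
) -> HashMap<u32, f32> {
    let mut scores = HashMap::from([(start, 1.0f32)]);
    let mut queue = VecDeque::from([(start, 0usize)]);
    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth { continue; }
        let score = scores[&node] * decay; // expansion penalty per hop
        for &next in adjacency.get(&node).into_iter().flatten() {
            if !scores.contains_key(&next) {
                scores.insert(next, score);
                queue.push_back((next, depth + 1));
            }
        }
    }
    scores
}
```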

Domain 5: Retrieval Configuration — Multi-Strategy Fusion Pipeline

The retrieval pipeline is the most significant improvement compared to the previous version, implementing LightRAG-inspired multi-strategy fusion.

Step 1: Query Analysis (Automatic Classification)

The system analyzes the query to determine:

  • Key entities: Explicitly mentioned entities
  • Concepts: Abstract terms and themes
  • Query type: EntityFocused, Relationship, Conceptual, Exploratory, Factual
  • User intent: Overview, Detailed, Comparative, Causal, Temporal
  • Complexity score: 0.0–1.0 based on length and structure

Step 2: Strategy Weight Selection (Adaptive Weighting)

Based on query type + intent, the system assigns weights to 3 strategies:

```
EntityFocused:       Vector(0.5) + Graph(0.4) + Hierarchical(0.1)
Relationship:        Vector(0.3) + Graph(0.6) + Hierarchical(0.1)
Conceptual+Overview: Vector(0.2) + Graph(0.2) + Hierarchical(0.6)
Exploratory:         Vector(0.4) + Graph(0.4) + Hierarchical(0.2)
Factual:             Vector(0.6) + Graph(0.3) + Hierarchical(0.1)
```
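
Expressed as code, the selection is a straightforward match on (query type, intent); the fallback for conceptual queries with non-overview intents is an assumption here:

```rust
enum QueryType { EntityFocused, Relationship, Conceptual, Exploratory, Factual }
enum Intent { Overview, Detailed, Comparative, Causal, Temporal }

/// Returns (vector, graph, hierarchical) weights, as tabulated above.
fn strategy_weights(qt: QueryType, intent: Intent) -> (f32, f32, f32) {
    match (qt, intent) {
        (QueryType::Conceptual, Intent::Overview) => (0.2, 0.2, 0.6),
        (QueryType::EntityFocused, _) => (0.5, 0.4, 0.1),
        (QueryType::Relationship, _) => (0.3, 0.6, 0.1),
        (QueryType::Exploratory, _) => (0.4, 0.4, 0.2),
        (QueryType::Factual, _) => (0.6, 0.3, 0.1),
        (QueryType::Conceptual, _) => (0.2, 0.2, 0.6), // assumed same as overview
    }
}
```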

Step 3: Parallel Retrieval Execution

Vector Similarity Search:

  • Cosine similarity in embedding space
  • Search on HNSW index for sub-linear complexity
  • Returns top_k × 2 candidates as a buffer for fusion
  • Threshold filtering to eliminate weak matches
  • Score: similarity × vector_weight

Graph-Based Search:

  • Entity matching via embedding similarity
  • Relationship expansion up to max_depth
  • Neighbor traversal with decay penalty
  • Score: similarity × decay^hop × graph_weight

Hierarchical Search:

  • Query on document trees (if summarization enabled)
  • Adaptive result count per intent: Overview → 3 high-level summaries; Detailed → 8 granular sections; otherwise 5 balanced results
  • Hierarchy bonus: +0.3 for overview, +0.2 for detailed
  • Returns a HierarchicalSummary with its context path

BM25 (Optional):

  • Traditional term frequency analysis
  • IDF weighting for term importance
  • Useful for exact keyword matches

PageRank Retrieval (Optional):

  • Boost results containing high-PageRank entities
  • Favors central concepts in the corpus

Step 4: Cross-Strategy Fusion

Content similarity grouping (see the sketch after this list):

  • Groups similar results from different strategies
  • Fusion boost: +0.2 × (num_strategies - 1)
  • Results confirmed by multiple strategies rise in ranking
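
A sketch of the fusion step under these rules: results whose content signatures collide are merged, keeping the best base score and adding +0.2 per extra confirming strategy:

```rust
use std::collections::HashMap;

struct Hit { signature: u64, strategy: &'static str, score: f32 }

fn fuse(hits: Vec<Hit>) -> Vec<(u64, f32)> {
    let mut groups: HashMap<u64, (f32, Vec<&'static str>)> = HashMap::new();
    for h in hits {
        let entry = groups.entry(h.signature).or_insert((0.0, Vec::new()));
        entry.0 = entry.0.max(h.score); // keep the best base score
        if !entry.1.contains(&h.strategy) {
            entry.1.push(h.strategy); // count each strategy once
        }
    }
    groups
        .into_iter()
        .map(|(sig, (score, strategies))| {
            let boost = 0.2 * (strategies.len() as f32 - 1.0); // fusion boost
            (sig, score + boost)
        })
        .collect()
}
```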

Step 5: Adaptive Ranking (Query-Specific Adjustments)

Contextual boosting:

  • Entity queries: +20% for ResultType::Entity
  • Conceptual queries: +10% for ResultType::Hierarchical
  • Relationship queries: +15% for multi-entity results
  • Query entity matching: +10% if contains query entities

Step 6: Deduplication & Diversity

  • Content signature: hash of first 50 chars + length
  • Type counting: tracking per ResultType
  • Limit per type to ensure diversity

Step 7: Return Top-K

Results sorted by final score with rich metadata:

  • Source (chunk/entity/summary)
  • Score breakdown per strategy
  • List of entities mentioned
  • Aggregated confidence

Optional: LLM Answer Generation

If config.ollama.enabled = true (a request sketch follows the list):

  1. Top 5 results as context
  2. RAG prompt construction with context injection
  3. LLM generation with controlled temperature
  4. Post-processing and citation extraction
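
For step 3, a request sketch against Ollama's REST API (POST /api/generate) is shown below; the prompt layout and model name are illustrative, and it assumes the reqwest (blocking and json features) and serde_json crates:

```rust
fn generate_answer(context: &[String], question: &str) -> reqwest::Result<String> {
    // Inject the top results as context into a simple RAG prompt.
    let prompt = format!(
        "Answer using only this context:\n{}\n\nQuestion: {question}",
        context.join("\n---\n")
    );
    let body = serde_json::json!({
        "model": "llama3",
        "prompt": prompt,
        "stream": false,
        "options": { "temperature": 0.1 } // controlled temperature, as in the config
    });
    let resp: serde_json::Value = reqwest::blocking::Client::new()
        .post("http://localhost:11434/api/generate")
        .json(&body)
        .send()?
        .json()?;
    Ok(resp["response"].as_str().unwrap_or_default().to_string())
}
```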

3. Multi-Level Interfaces: From Script to Web App

GraphRAG-CLI: Automation and Batch Processing

The CLI interface is designed for pipeline automation and scripting:

Main features:

  • Auto-detection of state: identifies if workspace already exists
  • Intelligent directory naming: “Tom Sawyer.txt” → ./output/tom_sawyer/
  • Configuration validation with detailed error reporting
  • Incremental processing: uses existing workspace if available
  • Batch mode: processes multiple documents in sequence

Typical workflow:

```bash
# First execution: build + query
graphrag-cli --config symposium.json5 --query "Who is Socrates?"

# Subsequent executions: use the cached workspace
graphrag-cli --config symposium.json5 --query "What is love?"
```

Advantages:

  • Zero overhead for subsequent queries (workspace loading <100ms)
  • Scriptable: integration into CI/CD and automation
  • Granular logging for debugging

GraphRAG-Server: REST API for Microservices

The HTTP server exposes RESTful endpoints for distributed integration:

Main endpoints (an example client call follows the list):

  • POST /api/v1/create: Create workspace from document
  • POST /api/v1/query: Execute query on existing workspace
  • GET /api/v1/workspaces: List available workspaces
  • GET /api/v1/workspaces/:id/stats: Statistics and metadata
  • DELETE /api/v1/workspaces/:id: Cleanup workspace
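
A hypothetical client call for the query endpoint, using reqwest and serde_json; the request and response shapes here are assumptions, not the server's documented schema:

```rust
fn query_server(workspace: &str, query: &str) -> reqwest::Result<serde_json::Value> {
    // POST /api/v1/query with an assumed JSON payload; adjust to the real schema.
    reqwest::blocking::Client::new()
        .post("http://localhost:8080/api/v1/query")
        .json(&serde_json::json!({ "workspace": workspace, "query": query }))
        .send()?
        .json()
}
```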

Features:

  • Async processing with Tokio runtime
  • Connection pooling for database/LLM clients
  • Configurable rate limiting
  • OpenAPI/Swagger documentation
  • Health checks and metrics endpoint

Deployment:

  • Ready-to-use Docker container
  • Horizontal scaling via stateless design
  • Load balancing ready

GraphRAG-WASM: Zero-Install Browser Experience

The WebAssembly application brings GraphRAG directly to the browser:

Architecture:

  • Yew framework for reactive UI
  • wasm-bindgen for JS interop
  • Web Workers for background processing
  • IndexedDB for local persistence

Capabilities:

  • File upload via drag-and-drop
  • Real-time processing with progress bar
  • Interactive graph visualization (D3.js integration)
  • Local-first: all data stays in the browser

Limitations:

  • No access to Ollama (CORS restrictions)
  • Hash embeddings only for now
  • Browser-dependent memory constraints

Use cases:

  • Demos and presentations without setup
  • Privacy-sensitive documents (no server upload)
  • Educational tool to explore GraphRAG

GraphRAG-TUI: Interactive Terminal Interface

The text interface (Terminal User Interface) offers real-time monitoring:

Features:

  • Live stats during processing
  • Progress bars for each phase
  • Interactive entity/relationship explorer
  • Query REPL with history
  • Log viewer with filtering

Technologies:

  • Ratatui for TUI rendering
  • Crossterm for input handling
  • Real-time updates via channels

Advantages:

  • Visual feedback during long-running operations
  • Easier debugging through state inspection
  • SSH-friendly: works on remote terminals

4. Performance vs Quality Alternatives: Architectural Decisions

Embeddings Level: From Zero-Cost to State-of-the-Art

Option 1: Hash Embeddings (Maximum Performance)

  • Latency: <1ms per embedding
  • Memory: ~4KB per 1000 entities
  • Quality: Sufficient for entities with distinctive names
  • Use case: Prototyping, resource-constrained, privacy-first

Option 2: Local LLM (Balanced)

  • Latency: ~50ms per embedding (Ollama)
  • Quality: High for semantic similarity
  • Costs: Zero (self-hosted)
  • Use case: Production without cloud dependencies

Option 3: Cloud APIs (Maximum Quality)

  • Latency: ~100–200ms (network overhead)
  • Quality: State-of-the-art (OpenAI, Voyage)
  • Costs: Pay-per-use
  • Use case: High-stakes applications, benchmark quality

Storage Level: JSON vs Parquet vs Lance

JSON (Developer-Friendly)

  • Write speed: Moderate
  • Read speed: Slow for large datasets
  • Compression: Low
  • Debugging: Excellent (human-readable)
  • Use case: Development, small datasets (<10K entities)

Parquet (Analytics-Optimized)

  • Write speed: Fast (columnar batching)
  • Read speed: Very fast for columnar queries
  • Compression: High (Snappy/ZSTD)
  • Integration: Excellent with Pandas, Polars, DuckDB
  • Use case: Data analysis, large corpora

Lance (Vector-Optimized)

  • Write speed: Optimized for append
  • Read speed: Excellent for vector similarity
  • Compression: High with vector quantization
  • Features: Versioning, time-travel queries
  • Use case: Production vector search, ML pipelines

Retrieval Level: Single-Strategy vs Multi-Strategy

Vector-Only (Simple)

  • Latency: ~5–10ms
  • Quality: Good for semantic similarity
  • Limitation: Misses relational information

Graph-Only (Relationship-Focused)

  • Latency: ~20–50ms (traversal cost)
  • Quality: Excellent for entity-centric queries
  • Limitation: Weak for conceptual queries

Multi-Strategy Fusion (Production)

  • Latency: ~50–100ms (parallel execution)
  • Quality: Superior thanks to strategy diversity
  • Robustness: Graceful degradation if one strategy fails
  • Complexity: Requires weight tuning

5. Roadmap and Future Considerations

Stability and Testing (Alpha → Beta Phase)

As an alpha project, the current priority is:

  • Comprehensive testing: Expansion from 168 to 500+ test cases
  • Edge case handling: Identification and fixing of corner cases
  • Performance profiling: Optimization of identified hotpaths
  • API stabilization: Locking of public interfaces for semantic versioning

Planned Features

Short-term (Q1 2025):

  • Streaming responses: Incremental result delivery for large queries
  • Cache layer: Redis integration for multi-instance deployments
  • Monitoring dashboard: Grafana/Prometheus metrics
  • Multi-document queries: Cross-corpus search

Medium-term (Q2-Q3 2025):

  • GraphQL API: Alternative to REST
  • Real-time updates: Incremental graph updates without rebuild
  • Distributed processing: Ray/Dask integration for petabyte-scale
  • Fine-tuning support: Custom entity/relation extractors

Community Contributions

The project is open-source and welcomes:

  • Detailed bug reports with minimal reproducible examples
  • Feature requests with use case descriptions
  • Pull requests with test coverage
  • Documentation improvements: Tutorials, examples, translations

Conclusions: GraphRAG-RS as a Production Platform

GraphRAG-RS has evolved from a research prototype to a production-ready platform through:

  1. Modular architecture with core/interfaces separation
  2. Multiple strategies for each level (performance vs quality trade-offs)
  3. Diversified interfaces (CLI, Server, WASM, TUI) for every workflow
  4. Configuration-driven design for maximum flexibility
  5. Workspace persistence for production efficiency

Key improvements over the previous version:

  • Query latency reduction: from minutes to seconds (workspace caching)
  • Backend expansion: from 1 to 7 embedding providers
  • Advanced retrieval: single-strategy → multi-strategy fusion
  • Deployment options: from CLI-only to 4 complete interfaces
  • Configuration: from TOML-only to TOML + JSON5 with 9 templates

The 8-domain configurable architecture enables granular tuning for:

  • Rapid prototyping: Zero-cost pipeline in <5 minutes
  • Production deployment: Multi-strategy fusion with monitoring
  • Research: Custom extractors and retrieval strategies

Despite the alpha state, the project demonstrates that Rust is the ideal language for AI infrastructure: type safety, predictable performance, and mature ecosystem for async processing, HTTP servers, and WASM compilation.

The roadmap towards beta and v1.0 continues with focus on stability, testing, and community feedback. GraphRAG-RS aims to become the reference implementation for knowledge graph construction in Rust, combining cutting-edge research with robust engineering.

Resources and Links

  • Repository: github.com/automataIA/graphrag-rs
  • Documentation: graphrag-core/README.md and graphrag-core/ENTITY_EXTRACTION.md
  • Configuration Templates: config/templates/ directory
  • Examples: graphrag-core/examples/ for usage patterns
