
Image by Author with recraft.ai
1. Architecture Evolution: From Prototype to Modular Platform
⚠️ Important Note: This project is currently in alpha. Although the described features are fully implemented, errors or unexpected behavior may occur. Test thoroughly in a development environment before using it in production.
Multi-Layer Architecture with GraphRAG-Core
The main evolution over the previous version is the separation of the architecture into layers, which transforms the project from a single implementation into a complete modular platform. The heart of the system is graphrag-core, an independent Rust library that exposes all fundamental functionality through trait-based abstractions.
This architectural refactoring has enabled the creation of four distinct user interfaces sharing the same underlying engine:
- graphrag-cli: Command-line interface for batch processing and automation
- graphrag-server: REST API for integration into distributed systems and microservices
- graphrag-wasm: Web application for browser-based usage without backend
- graphrag-tui: Interactive text interface for monitoring and debugging
This clear separation between core logic and presentation layer guarantees superior reusability, testability, and maintainability. Each interface can be compiled independently, including only the necessary features, drastically reducing the final footprint.
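To make this concrete, here is a minimal sketch of what a trait-based core abstraction can look like. The trait and type names are illustrative, not the actual graphrag-core API:

use std::error::Error;

pub struct Entity {
    pub name: String,
    pub entity_type: String,
    pub confidence: f32,
}

// Hypothetical abstraction: any extractor, algorithmic or LLM-based,
// implements the same trait, so interfaces depend only on this contract.
pub trait EntityExtractor {
    fn extract(&self, chunk: &str) -> Result<Vec<Entity>, Box<dyn Error>>;
}

pub struct PatternExtractor;

impl EntityExtractor for PatternExtractor {
    fn extract(&self, chunk: &str) -> Result<Vec<Entity>, Box<dyn Error>> {
        // Toy stand-in: treat capitalized words as candidate entities.
        Ok(chunk
            .split_whitespace()
            .filter(|w| w.chars().next().is_some_and(char::is_uppercase))
            .map(|w| Entity {
                name: w.trim_matches(|c: char| !c.is_alphanumeric()).to_string(),
                entity_type: "UNKNOWN".into(),
                confidence: 0.5,
            })
            .collect())
    }
}

Because every interface codes against the trait rather than a concrete type, swapping the algorithmic extractor for an LLM-based one requires no changes in the CLI, server, WASM, or TUI layers.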
Dual Configuration System: TOML and JSON5
A significant improvement is the introduction of JSON5 support alongside traditional TOML. JSON5 offers comments, trailing commas, and more flexible syntax, making configuration files more readable and maintainable for complex projects.
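As a small illustration, a JSON5 fragment with comments and trailing commas can be deserialized in Rust via the json5 crate; the field names below are illustrative, not the actual template schema:

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct TextConfig {
    chunk_size: usize,
    chunk_overlap: usize,
}

fn main() {
    let raw = r#"{
        // JSON5 allows comments...
        chunk_size: 1000,
        chunk_overlap: 200, // ...trailing commas, and unquoted keys
    }"#;
    let cfg: TextConfig = json5::from_str(raw).expect("valid JSON5");
    println!("{cfg:?}");
}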
The template configuration system has been expanded to 9 predefined configurations, including:
- semantic_pipeline.graphrag.json5: LLM-based extraction for maximum quality
- algorithmic_pipeline.graphrag.json5: Completely algorithmic zero-cost approach
- hybrid_pipeline.graphrag.json5: Balance between performance and quality
- symposium_with_llm.graphrag.json5: Complete configuration with Ollama integration
- symposium_zero_cost.graphrag.json5: Maximum efficiency without LLM dependencies
Each template is documented, validated, and optimized for a specific use case, letting you start from preconfigured best practices.
Workspace Persistence: Persistent State and Incremental Updates
Workspace persistence is a completely new feature that solves a critical problem of the previous version: the need to reprocess everything at each execution. Now the system supports:
- Automatic saving of pipeline state after each phase
- Incremental loading of existing graphs for multiple queries
- Multi-format support: JSON (human-readable), Parquet (columnar), Lance (vector-optimized)
- Metadata tracking with creation statistics and used configuration
- Configurable auto-save every N entities for resilience against interruptions
This enables realistic workflows where the graph is built once and hundreds of queries are executed without reprocessing overhead, reducing times from minutes to seconds.
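A minimal sketch of the resulting build-once/query-many pattern, assuming a serde-serializable graph state (the types, paths, and use of the JSON format here are illustrative):

use std::{fs, path::Path};

#[derive(serde::Serialize, serde::Deserialize)]
struct Graph {
    entities: Vec<String>,
}

fn build(doc: &str) -> Graph {
    // Stand-in for the full pipeline (chunking, extraction, embeddings).
    Graph { entities: doc.split_whitespace().map(str::to_string).collect() }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let path = Path::new("./output/symposium/graph.json");
    let graph: Graph = if path.exists() {
        // Subsequent runs: load the persisted state instead of rebuilding.
        serde_json::from_str(&fs::read_to_string(path)?)?
    } else {
        // First run: build once, then persist for all later queries.
        let g = build("Socrates speaks of love");
        fs::create_dir_all(path.parent().unwrap())?;
        fs::write(path, serde_json::to_string(&g)?)?;
        g
    };
    println!("{} entities ready for querying", graph.entities.len());
    Ok(())
}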
2. GraphRAG-Core Pipeline: An 8-Domain Configurable Architecture

Graphrag-core organizational chart (WSP)
Overview: Config-Driven Architecture
The graphrag-core pipeline is organized around 8 configuration domains that control both the creation phase of the graph and the query/retrieval phase. Each domain offers multiple strategies allowing you to choose between high performance (algorithmic approaches) and high quality (LLM-based approaches).
Creation Phase               Query Phase
──────────────               ───────────
Text Processing         →    Query Analysis
Entity Extraction       →    Entity Matching
Graph Construction      →    Graph Traversal
Embedding Generation    →    Vector Similarity
Summarization           →    Hierarchical Search
        ↓                            ↓
Persistent Storage      ←    Multi-Strategy Fusion
This dual-phase architecture allows clear separation of responsibilities: creation builds and enriches the graph, while retrieval optimizes search strategies for different query types.
Domain 1: Text Configuration — Intelligent Chunking
The first step of the pipeline is the segmentation of the document into processable chunks. The configuration allows you to control:
- chunk_size: 1000 tokens (default), balance between context and granularity
- chunk_overlap: 200 tokens to preserve semantic continuity
- boundary_detection: Respects paragraph and sentence boundaries to avoid artificial breaks
The TextProcessor produces a vector of text chunks, where each chunk receives:
- ChunkId unique for tracking
- DocumentId for provenance
- Metadata with position and statistics
Query Phase: The query text undergoes the same preprocessing for normalization and keyword extraction used in BM25 retrieval.
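A minimal sketch of overlap-based chunking over whitespace tokens; the real TextProcessor additionally respects sentence and paragraph boundaries:

fn chunk(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    let tokens: Vec<&str> = text.split_whitespace().collect();
    let step = chunk_size.saturating_sub(overlap).max(1);
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + chunk_size).min(tokens.len());
        chunks.push(tokens[start..end].join(" "));
        if end == tokens.len() {
            break;
        }
        start += step; // each window starts `overlap` tokens before the previous one ends
    }
    chunks
}

fn main() {
    // Tiny values just to show the windows; the defaults are 1000/200.
    for c in chunk("a b c d e f g h", 4, 2) {
        println!("{c}"); // "a b c d", "c d e f", "e f g h"
    }
}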
Domain 2: Embeddings Configuration — Multi-Backend Vector Representations
The embedding system has been completely redesigned to support 7 different backends, each optimized for different trade-offs:
Zero-Cost Backend (Maximum Performance):
- Hash Embedding: FNV-1a algorithm for deterministic embeddings without ML
- Similarity threshold: 0.35 empirically calibrated
- Latency: <1ms, ideal for resource-constrained systems
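For illustration, a deterministic hash embedding can be built by hashing each token with FNV-1a into a fixed-size bucket vector and L2-normalizing; the dimension and bucketing scheme here are assumptions, not the exact implementation:

fn fnv1a(token: &str) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325; // FNV offset basis
    for b in token.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3); // FNV prime
    }
    h
}

fn hash_embed(text: &str, dim: usize) -> Vec<f32> {
    let mut v = vec![0.0f32; dim];
    for tok in text.split_whitespace() {
        v[(fnv1a(tok) as usize) % dim] += 1.0; // bucket count per token
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        v.iter_mut().for_each(|x| *x /= norm); // L2-normalize
    }
    v
}

fn main() {
    let e = hash_embed("Socrates philosopher Athens", 128);
    println!("{} active buckets", e.iter().filter(|x| **x != 0.0).count());
}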
Local LLM Backends (Quality/Privacy):
- Ollama: Integration with local models like nomic-embed-text
- HuggingFace: Support for sentence-transformers/all-MiniLM-L6-v2, BAAI/bge-small-en-v1.5
- Automatic fallback to hash if service is unavailable
Cloud API Backends (Maximum Quality):
- OpenAI: text-embedding-3-small (1536 dimensions)
- Voyage: voyage-2 optimized for retrieval
- Cohere: embed-english-v3.0 with compression
- Jina: jina-embeddings-v2-base multilingual
Creation Phase: Generates embeddings for:
- Each chunk of processed text
- Each extracted entity (format: “{name} {type}”)
- Re-indexing of the graph with vectors for fast lookup
Query Phase: Converts the query into a vector for cosine similarity search in the vector index.
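The query-side similarity is plain cosine similarity; for L2-normalized embeddings it reduces to the dot product:

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}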
Domain 3: Entity Configuration — Adaptive Multi-Strategy Extraction
Entity extraction offers 3 strategies with increasing levels of quality and cost:
Strategy A: Algorithmic (Pattern-Based — Zero Cost)
The EntityExtractor uses linguistic rules to identify:
- PERSON/CHARACTER: Capitalization + titles (Dr., Mr., Mrs., Professor)
- ORGANIZATION: Suffixes (Inc, Corp, LLC, University, Institute, Foundation)
- LOCATION: Matching with dictionary of known places
- CONCEPT: Abstract terms (Theory, Principle, Philosophy, Methodology)
- EVENT: Temporal keywords (meeting, war, conference, revolution)
- OBJECT: Tool and artifact detection
Confidence scoring is based on:
- Pattern strength: exact vs fuzzy matching
- Context indicators: co-occurrences with known entities
- Capitalization consistency: uppercase maintenance
Entities are automatically deduplicated on (name, type) pairs, with mention counts merged.
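A toy excerpt of the pattern-based strategy using the regex crate, covering the PERSON title rule and the ORGANIZATION suffix rule (the patterns are simplified for illustration):

use regex::Regex;

fn main() {
    let text = "Dr. Alice Smith joined Acme Corp last year.";
    // Title followed by a capitalized name → PERSON
    let person = Regex::new(
        r"\b(?:Dr|Mr|Mrs|Professor)\.?\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)",
    ).unwrap();
    // Capitalized name ending in a corporate suffix → ORGANIZATION
    let org = Regex::new(
        r"\b([A-Z][A-Za-z]+(?:\s+[A-Z][A-Za-z]+)*\s+(?:Inc|Corp|LLC|University|Institute|Foundation))\b",
    ).unwrap();

    for cap in person.captures_iter(text) {
        println!("PERSON: {}", &cap[1]); // Alice Smith
    }
    for cap in org.captures_iter(text) {
        println!("ORGANIZATION: {}", &cap[1]); // Acme Corp
    }
}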
Strategy B: Semantic (LLM-Based Gleaning — Maximum Quality)
The GleaningEntityExtractor implements iterative extraction in 3 rounds:
- Round 1: Initial extraction with structured prompt
- Round 2: Review and identification of missing entities
- Round 3: Final validation and confidence refinement
Gleaning configuration:
- max_gleaning_rounds: 3 (iterative refinement)
- confidence_threshold: 0.8 to filter false positives
- temperature: 0.1 for deterministic output
- max_tokens: 1500 per chunk
Requires config.ollama.enabled = true and returns both entities and relationships directly from the LLM.
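A skeleton of the gleaning loop, with a stub in place of the actual Ollama call (the names and prompt structure are illustrative):

fn glean(chunk: &str, max_rounds: u32, confidence_threshold: f32) -> Vec<(String, f32)> {
    let mut entities: Vec<(String, f32)> = Vec::new();
    for round in 1..=max_rounds {
        // Round 1 asks for an initial extraction; later rounds show the
        // model what was already found and ask only for missed entities.
        let prompt = format!(
            "Round {round}: extract entities from: {chunk}\nAlready found: {entities:?}"
        );
        entities.extend(call_llm_stub(&prompt));
    }
    entities.sort_by(|a, b| a.0.cmp(&b.0));
    entities.dedup_by(|a, b| a.0 == b.0);
    entities.retain(|(_, conf)| *conf >= confidence_threshold); // filter false positives
    entities
}

fn call_llm_stub(_prompt: &str) -> Vec<(String, f32)> {
    vec![("Socrates".to_string(), 0.92)] // fixed stub in place of the LLM
}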
Strategy C: Hybrid (Cost/Quality Balance)
Combines approaches:
- First pass algorithmic for obvious entities (high confidence)
- Second pass LLM only for ambiguous cases (confidence < threshold)
- Cost reduction up to 70% compared to full-LLM
Query Phase: Extracted entities are matched against the query for entity-focused retrieval with boosting of relevant results.
Domain 4: Graph Configuration — Construction and Traversal
Graph construction supports automatic relationship extraction with granular configuration:
Construction parameters:
- max_connections: 10 edges per node to limit complexity
- similarity_threshold: 0.8 for edge creation based on embedding
- relationship_confidence_threshold: 0.5 to filter weak relationships
Pattern-Based Relationship Extraction (Algorithmic):
When extract_relationships = true, co-occurrence analysis generates typed relationships:
- PERSON-ORGANIZATION: WORKS_FOR, LEADS, ASSOCIATED_WITH
- PERSON-LOCATION: BORN_IN, LOCATED_IN, LIVES_IN
- ORGANIZATION-LOCATION: HEADQUARTERED_IN, OPERATES_IN
- PERSON-PERSON: MARRIED_TO, COLLEAGUE_OF, KNOWS, FRIEND_OF
- CONCEPT-CONCEPT: RELATED_TO, DERIVES_FROM, IMPLEMENTS
LLM Extraction (Gleaning):
The GleaningExtractor returns structured relationships directly, with:
- Source and target entity IDs
- Semantically accurate relationship type
- Explicit confidence score
- Supporting context snippet
Graph Building:
- Nodes: Adding entities with attributes (type, confidence, mentions)
- Edges: Creating edges with weights based on confidence
- Enrichment algorithms:
  - PageRank: Calculating entity importance (used in retrieval)
  - Leiden Community Detection: Clustering for topic discovery
  - Centrality Measures: Betweenness and closeness for entity ranking
Query Phase: Configurable traversal with:
- max_depth: 3 levels of expansion
- max_paths: 10 alternative paths for path queries
- expansion_penalty: Decay 1.0 → 0.8 → 0.64 for successive hops
- Advanced traversal for complex queries (complexity > 0.7)
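The expansion penalty can be expressed as score = similarity × decay^hop × graph_weight, which reproduces the 1.0 → 0.8 → 0.64 progression:

fn hop_score(similarity: f32, hop: u32, graph_weight: f32) -> f32 {
    const DECAY: f32 = 0.8;
    similarity * DECAY.powi(hop as i32) * graph_weight
}

fn main() {
    for hop in 0..3 {
        println!("hop {hop}: {:.2}", hop_score(1.0, hop, 1.0)); // 1.00, 0.80, 0.64
    }
}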
Domain 5: Retrieval Configuration — Multi-Strategy Fusion Pipeline
The retrieval pipeline is the most significant improvement compared to the previous version, implementing LightRAG-inspired multi-strategy fusion.
Step 1: Query Analysis (Automatic Classification)
The system analyzes the query to determine:
- Key entities: Explicitly mentioned entities
- Concepts: Abstract terms and themes
- Query type: EntityFocused, Relationship, Conceptual, Exploratory, Factual
- User intent: Overview, Detailed, Comparative, Causal, Temporal
- Complexity score: 0.0–1.0 based on length and structure
Step 2: Strategy Weight Selection (Adaptive Weighting)
Based on query type + intent, the system assigns weights to 3 strategies:
EntityFocused: Vector(0.5) + Graph(0.4) + Hierarchical(0.1)
Relationship: Vector(0.3) + Graph(0.6) + Hierarchical(0.1)
Conceptual+Overview: Vector(0.2) + Graph(0.2) + Hierarchical(0.6)
Exploratory: Vector(0.4) + Graph(0.4) + Hierarchical(0.2)
Factual: Vector(0.6) + Graph(0.3) + Hierarchical(0.1)
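A sketch of the weight selection as a plain match; variant names are illustrative, and the Conceptual arm corresponds to the Conceptual+Overview combination above:

enum QueryType { EntityFocused, Relationship, Conceptual, Exploratory, Factual }

struct Weights { vector: f32, graph: f32, hierarchical: f32 }

fn weights_for(q: &QueryType) -> Weights {
    match q {
        QueryType::EntityFocused => Weights { vector: 0.5, graph: 0.4, hierarchical: 0.1 },
        QueryType::Relationship  => Weights { vector: 0.3, graph: 0.6, hierarchical: 0.1 },
        QueryType::Conceptual    => Weights { vector: 0.2, graph: 0.2, hierarchical: 0.6 },
        QueryType::Exploratory   => Weights { vector: 0.4, graph: 0.4, hierarchical: 0.2 },
        QueryType::Factual       => Weights { vector: 0.6, graph: 0.3, hierarchical: 0.1 },
    }
}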
Step 3: Parallel Retrieval Execution
Vector Similarity Search:
- Cosine similarity in embedding space
- Search on HNSW index for sub-linear complexity
- Return top_k × 2 candidates as a buffer for fusion
- Threshold filtering to eliminate weak matches
- Score: similarity × vector_weight
Graph-Based Search:
- Entity matching via embedding similarity
- Relationship expansion up to max_depth
- Neighbor traversal with decay penalty
- Score: similarity × decay^hop × graph_weight
Hierarchical Search:
- Query on document trees (if summarization is enabled)
- Adaptive result count per intent: Overview → 3 high-level summaries, Detailed → 8 granular sections, otherwise 5 balanced results
- Hierarchy bonus: +0.3 for overview, +0.2 for detailed
- Return HierarchicalSummary with context path
BM25 (Optional):
- Traditional term frequency analysis
- IDF weighting for term importance
- Useful for exact keyword matches
PageRank Retrieval (Optional):
- Boost results containing high-PageRank entities
- Favors central concepts in the corpus
Step 4: Cross-Strategy Fusion
Content similarity grouping:
- Groups similar results from different strategies
- Fusion boost: +0.2 × (num_strategies - 1)
- Results confirmed by multiple strategies rise in ranking
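The boost itself is simple to express; a sketch, where the base score is the strategy-weighted score from Step 3:

fn fused_score(base: f32, num_strategies: usize) -> f32 {
    // +0.2 for each additional strategy that returned the same content
    base + 0.2 * num_strategies.saturating_sub(1) as f32
}

For example, a result found by all three strategies with base score 0.7 is promoted to 0.7 + 0.2 × 2 = 1.1, lifting it above single-strategy hits.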
Step 5: Adaptive Ranking (Query-Specific Adjustments)
Contextual boosting:
- Entity queries: +20% for ResultType::Entity
- Conceptual queries: +10% for ResultType::Hierarchical
- Relationship queries: +15% for multi-entity results
- Query entity matching: +10% if contains query entities
Step 6: Deduplication & Diversity
- Content signature: hash of first 50 chars + length
- Type counting: tracking per ResultType
- Limit per type to ensure diversity
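A sketch of the signature-based deduplication, using std's DefaultHasher for illustration:

use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

fn signature(content: &str) -> u64 {
    // Hash of the first 50 chars plus the total length
    let prefix: String = content.chars().take(50).collect();
    let mut h = DefaultHasher::new();
    (prefix, content.len()).hash(&mut h);
    h.finish()
}

fn dedup(results: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    results.into_iter().filter(|r| seen.insert(signature(r))).collect()
}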
Step 7: Return Top-K
Results sorted by final score with rich metadata:
- Source (chunk/entity/summary)
- Score breakdown per strategy
- List of mentioned entities
- Aggregated confidence
Optional: LLM Answer Generation
If config.ollama.enabled = true:
- Top 5 results as context
- RAG prompt construction with context injection
- LLM generation with controlled temperature
- Post-processing and citation extraction
3. Multi-Level Interfaces: From Script to Web App
GraphRAG-CLI: Automation and Batch Processing
The CLI interface is designed for pipeline automation and scripting:
Main features:
- Auto-detection of state: identifies if workspace already exists
- Intelligent directory naming: “Tom Sawyer.txt” → ./output/tom_sawyer/
- Configuration validation with detailed error reporting
- Incremental processing: uses existing workspace if available
- Batch mode: processes multiple documents in sequence
Typical workflow:
# First execution: build + query
graphrag-cli --config symposium.json5 --query "Who is Socrates?"
# Subsequent executions: use cached workspace
graphrag-cli --config symposium.json5 --query "What is love?"
Advantages:
- Zero overhead for subsequent queries (workspace loading <100ms)
- Scriptable: integration into CI/CD and automation
- Granular logging for debugging
GraphRAG-Server: REST API for Microservices
The HTTP server exposes RESTful endpoints for distributed integration:
Main endpoints:
- POST /api/v1/create: Create workspace from document
- POST /api/v1/query: Execute query on existing workspace
- GET /api/v1/workspaces: List available workspaces
- GET /api/v1/workspaces/:id/stats: Statistics and metadata
- DELETE /api/v1/workspaces/:id: Cleanup workspace
Features:
- Async processing with Tokio runtime
- Connection pooling for database/LLM clients
- Configurable rate limiting
- OpenAPI/Swagger documentation
- Health checks and metrics endpoint
Deployment:
- Ready-to-use Docker container
- Horizontal scaling via stateless design
- Load balancing ready
GraphRAG-WASM: Zero-Install Browser Experience
The WebAssembly application brings GraphRAG directly to the browser:
Architecture:
- Yew framework for reactive UI
- wasm-bindgen for JS interop
- Web Workers for background processing
- IndexedDB for local persistence
Capabilities:
- File upload via drag-and-drop
- Real-time processing with progress bar
- Interactive graph visualization (D3.js integration)
- Local-first: all data stays in the browser
Limitations:
- No access to Ollama (CORS restrictions)
- Hash embeddings only for now
- Browser-dependent memory constraints
Use cases:
- Demos and presentations without setup
- Privacy-sensitive documents (no server upload)
- Educational tool to explore GraphRAG
GraphRAG-TUI: Interactive Terminal Interface
The text interface (Terminal User Interface) offers real-time monitoring:
Features:
- Live stats during processing
- Progress bars for each phase
- Interactive entity/relationship explorer
- Query REPL with history
- Log viewer with filtering
Technologies:
- Ratatui for TUI rendering
- Crossterm for input handling
- Real-time updates via channels
Advantages:
- Visual feedback during long-running operations
- Easier debugging through state inspection
- SSH-friendly: works on remote terminals
4. Performance vs Quality Alternatives: Architectural Decisions
Embeddings Level: From Zero-Cost to State-of-the-Art
Option 1: Hash Embeddings (Maximum Performance)
- Latency: <1ms per embedding
- Memory: ~4KB per 1000 entities
- Quality: Sufficient for entities with distinctive names
- Use case: Prototyping, resource-constrained, privacy-first
Option 2: Local LLM (Balanced)
- Latency: ~50ms per embedding (Ollama)
- Quality: High for semantic similarity
- Costs: Zero (self-hosted)
- Use case: Production without cloud dependencies
Option 3: Cloud APIs (Maximum Quality)
- Latency: ~100–200ms (network overhead)
- Quality: State-of-the-art (OpenAI, Voyage)
- Costs: Pay-per-use
- Use case: High-stakes applications, benchmark quality
Storage Level: JSON vs Parquet vs Lance
JSON (Developer-Friendly)
- Write speed: Moderate
- Read speed: Slow for large datasets
- Compression: Low
- Debugging: Excellent (human-readable)
- Use case: Development, small datasets (<10K entities)
Parquet (Analytics-Optimized)
- Write speed: Fast (columnar batching)
- Read speed: Very fast for columnar queries
- Compression: High (Snappy/ZSTD)
- Integration: Excellent with Pandas, Polars, DuckDB
- Use case: Data analysis, large corpora
Lance (Vector-Optimized)
- Write speed: Optimized for append
- Read speed: Excellent for vector similarity
- Compression: High with vector quantization
- Features: Versioning, time-travel queries
- Use case: Production vector search, ML pipelines
Retrieval Level: Single-Strategy vs Multi-Strategy
Vector-Only (Simple)
- Latency: ~5–10ms
- Quality: Good for semantic similarity
- Limitation: Misses relational information
Graph-Only (Relationship-Focused)
- Latency: ~20–50ms (traversal cost)
- Quality: Excellent for entity-centric queries
- Limitation: Weak for conceptual queries
Multi-Strategy Fusion (Production)
- Latency: ~50–100ms (parallel execution)
- Quality: Superior thanks to strategy diversity
- Robustness: Graceful degradation if one strategy fails
- Complexity: Requires weight tuning
5. Roadmap and Future Considerations
Stability and Testing (Alpha → Beta Phase)
As an alpha project, the current priority is:
- Comprehensive testing: Expansion from 168 to 500+ test cases
- Edge case handling: Identification and fixing of corner cases
- Performance profiling: Optimization of identified hot paths
- API stabilization: Locking of public interfaces for semantic versioning
Planned Features
Short-term (Q1 2025):
- Streaming responses: Incremental result delivery for large queries
- Cache layer: Redis integration for multi-instance deployments
- Monitoring dashboard: Grafana/Prometheus metrics
- Multi-document queries: Cross-corpus search
Medium-term (Q2-Q3 2025):
- GraphQL API: Alternative to REST
- Real-time updates: Incremental graph updates without rebuild
- Distributed processing: Ray/Dask integration for petabyte-scale
- Fine-tuning support: Custom entity/relation extractors
Community Contributions
The project is open-source and welcomes:
- Detailed bug reports with minimal reproducible examples
- Feature requests with use case descriptions
- Pull requests with test coverage
- Documentation improvements: Tutorials, examples, translations
Conclusions: GraphRAG-RS as a Production Platform
GraphRAG-RS has evolved from a research prototype to a production-ready platform through:
- Modular architecture with core/interfaces separation
- Multiple strategies for each level (performance vs quality trade-offs)
- Diversified interfaces (CLI, Server, WASM, TUI) for every workflow
- Configuration-driven design for maximum flexibility
- Workspace persistence for production efficiency
Key improvements over the previous version:
- Query latency reduction: from minutes to seconds (workspace caching)
- Backend expansion: from 1 to 7 embedding providers
- Advanced retrieval: single-strategy → multi-strategy fusion
- Deployment options: from CLI-only to 4 complete interfaces
- Configuration: from TOML-only to TOML + JSON5 with 9 templates
The 8-domain configurable architecture enables granular tuning for:
- Rapid prototyping: Zero-cost pipeline in <5 minutes
- Production deployment: Multi-strategy fusion with monitoring
- Research: Custom extractors and retrieval strategies
Despite the alpha state, the project demonstrates that Rust is the ideal language for AI infrastructure: type safety, predictable performance, and mature ecosystem for async processing, HTTP servers, and WASM compilation.
The roadmap towards beta and v1.0 continues with a focus on stability, testing, and community feedback. GraphRAG-RS aims to become the reference implementation for knowledge graph construction in Rust, combining cutting-edge research with robust engineering.
Resources and Links
- Repository: github.com/automataIA/graphrag-rs
- Documentation: graphrag-core/README.md and graphrag-core/ENTITY_EXTRACTION.md
- Configuration Templates: config/templates/ directory
- Examples: graphrag-core/examples/ for usage patterns