Mark Gyles for SurrealDB

Originally published at surrealdb.com

From Knowledge Graph Generation to RAG for Stablecoin Regulatory Intelligence

We’re excited to share this community-written deep dive by Sugi Venugeethan into Stablebridge, a project tackling the complex world of stablecoin regulation. This article explores how knowledge graphs, RAG systems, and SurrealDB can be combined to connect it all together. It’s a practical journey from knowledge graph generation to advanced retrieval methodologies, showcasing both challenges and breakthroughs along the way.

Stablebridge: From Knowledge Graph Generation to RAG for Stablecoin Regulatory Intelligence

Preamble: The Stablebridge Vision

Stablebridge represents our ambitious mission to create comprehensive regulatory intelligence systems for the rapidly evolving stablecoin landscape. Our vision encompasses the systematic analysis of all major regulatory frameworks across US and EU jurisdictions - from Congressional bills like the GENIUS Act to European MiCA regulations, Federal Reserve guidance, Treasury Department rulings, and emerging state-level legislation.

The complexity of stablecoin regulation spans multiple jurisdictions, regulatory bodies, and constantly evolving compliance requirements. Traditional approaches to regulatory analysis fall short when dealing with:

  • Cross-jurisdictional compliance mapping between US federal, state, and EU regulatory frameworks
  • Dynamic regulatory landscapes with frequent updates and amendments
  • Multi-stakeholder requirements affecting issuers, custodians, exchanges, and users
  • Technical specification analysis covering blockchain protocols, reserve requirements, and audit standards

Stablebridge aims to bridge these gaps through advanced knowledge graph technologies and intelligent retrieval systems that can navigate the intricate web of stablecoin regulations with precision and speed.

Abstract

This blog post documents our comprehensive exploration of Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) systems within the Stablebridge project, from initial knowledge graph generation using kg-gen to final performance evaluation against traditional RAG approaches. We detail the technical challenges, limitations discovered, solutions implemented, and comparative analysis results across different retrieval methodologies for stablecoin regulatory intelligence.

1. Introduction: The Challenge of Stablecoin Regulatory Complexity

Stablecoin regulatory documents, particularly comprehensive legislation like the GENIUS Act, present unique challenges for information retrieval systems within the broader Stablebridge mission:

  • Complex cross-references: Stablecoin regulations frequently reference other sections, creating intricate dependency networks across multiple regulatory frameworks
  • Technical terminology: Domain-specific language covering blockchain technology, monetary policy, and financial compliance
  • Multi-hop reasoning: Regulatory questions often require connecting information across different jurisdictions and regulatory bodies
  • Structured relationships: Hierarchical organization with dependencies between federal, state, and international compliance requirements

These challenges motivated our exploration of knowledge graph-based approaches versus traditional vector-based retrieval systems, specifically tailored for the comprehensive stablecoin regulatory landscape that Stablebridge aims to navigate.

2. Knowledge Graph Generation with kg-gen

2.1 Tool Overview

We utilized the kg-gen library, a Python-based knowledge graph generation tool that extracts entities and relationships from unstructured text using Large Language Models (LLMs).

Key Features:

  • LLM-powered entity extraction
  • Relationship identification and classification
  • Configurable chunking strategies
  • JSON output format for downstream processing

2.2 Implementation Process

Our implementation focused on the GENIUS Act as our initial target within the broader Stablebridge regulatory corpus, representing a critical piece of US federal stablecoin legislation:

# Basic kg-gen configuration for Stablebridge regulatory analysis
from kg_gen import KnowledgeGraphGenerator

config = {
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "entity_types": ["regulation", "requirement", "entity", "process", "stablecoin_provision"],
    "domain_focus": "stablecoin_regulation",
    "output_format": "json"
}

kg_generator = KnowledgeGraphGenerator(config)
knowledge_graph = kg_generator.process_document("genius_act.pdf")

2.3 Results and Output

The kg-gen process produced a knowledge graph with:

  • 170 unique entities extracted from the GENIUS Act
  • 283 relationships connecting these entities
  • Hierarchical structure preserving regulatory document organization

Sample Entity Structure:

{
    "entities": [
        "Digital Asset Market Structure",
        "Segregated Account Requirements",
        "Stablecoin Regulatory Compliance Framework",
        "Consumer Protection Measures for Digital Assets"
    ],
    "relationships": [
        {
            "source": "Digital Asset Market Structure",
            "target": "Segregated Account Requirements",
            "type": "implements",
            "context": "Market structure regulations implement segregated
                account requirements for stablecoin consumer protection
                under the GENIUS Act framework"
        }
    ]
}

2.4 Limitations Discovered

During implementation, we identified several critical limitations of kg-gen within the Stablebridge context:

  1. Entity Extraction Inconsistency: The tool occasionally missed important stablecoin regulatory concepts or extracted overly granular entities that didn’t align with regulatory structure
  2. Relationship Quality Variance: Some relationships were semantically weak or incorrectly classified, particularly for complex cross-jurisdictional references
  3. Context Loss: Long-range dependencies across document sections were sometimes missed, critical for understanding regulatory compliance chains
  4. Processing Speed: Large regulatory documents required significant processing time and computational resources, limiting scalability for the full Stablebridge corpus

3. Graph Database Implementation with SurrealDB

3.1 Technology Choice: Rust-Based SurrealDB

We selected SurrealDB as our graph database solution for several reasons:

  • Multi-model capabilities: Support for both document and graph data models
  • Performance: Rust-based implementation offering high-speed operations
  • Flexible querying: SQL-like syntax with graph traversal capabilities
  • REST API: Easy integration with Python applications

3.2 Knowledge Graph Loading Process

The generated JSON knowledge graph was loaded into SurrealDB using a structured approach:

import requests
import json

class SurrealDBLoader:
    def __init__(self, db_url="http://localhost:8000"):
        self.db_url = db_url
        # Auth and namespace/database headers omitted for brevity
        self.headers = {"Accept": "application/json"}

    def load_entities(self, entities):
        """Load entities into the database"""
        for entity in entities:
            query = f"""
            CREATE entity SET
                name = "{entity}",
                type = "regulatory_concept",
                created_at = time::now()
            """
            self._execute_query(query)

    def load_relationships(self, relationships):
        """Load relationships between entities"""
        for rel in relationships:
            query = f"""
            RELATE (SELECT * FROM entity WHERE name = "{rel['source']}")
            -> {rel['type']}
            -> (SELECT * FROM entity WHERE name = "{rel['target']}")
            SET context = "{rel['context']}"
            """
            self._execute_query(query)

    def _execute_query(self, query):
        """POST a SurrealQL statement to SurrealDB's HTTP /sql endpoint"""
        response = requests.post(f"{self.db_url}/sql", data=query, headers=self.headers)
        response.raise_for_status()
        return response.json()

3.3 Database Schema Design

Our SurrealDB schema was designed to optimize for stablecoin regulatory queries within the Stablebridge framework:

-- Entity table for stablecoin regulatory concepts
DEFINE TABLE entity SCHEMAFULL;
DEFINE FIELD name ON entity TYPE string;
DEFINE FIELD type ON entity TYPE string;
DEFINE FIELD content ON entity TYPE string;
DEFINE FIELD jurisdiction ON entity TYPE string; -- US, EU, State-level
DEFINE FIELD regulatory_body ON entity TYPE string; -- Fed, Treasury, SEC, etc.
DEFINE FIELD created_at ON entity TYPE datetime;

-- Relationship edges with regulatory context information
DEFINE TABLE relationship SCHEMAFULL;
DEFINE FIELD in ON relationship TYPE record;
DEFINE FIELD out ON relationship TYPE record;
DEFINE FIELD type ON relationship TYPE string;
DEFINE FIELD context ON relationship TYPE string;
DEFINE FIELD confidence ON relationship TYPE float;
DEFINE FIELD jurisdiction_scope ON relationship TYPE string;

4. KG-RAG Implementation and Challenges

4.1 Initial KG-RAG Architecture

Our knowledge graph-based RAG system was designed with the following components:

from sentence_transformers import SentenceTransformer

class KGRAGSystem:
    def __init__(self, surrealdb_client, llm_client):
        self.db = surrealdb_client
        self.llm = llm_client
        self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

    def entity_retrieval(self, query):
        """Retrieve relevant entities based on query"""
        query_embedding = self.embedding_model.encode(query)

        # Find semantically similar entities (embedding passed as a plain list)
        entities = self.db.query("""
            SELECT * FROM entity
            WHERE vector::similarity::cosine(embedding, $query_embedding) > 0.7
        """, {"query_embedding": query_embedding.tolist()})

        return entities

    def multi_hop_reasoning(self, entities, max_hops=2):
        """Perform multi-hop traversal for comprehensive context"""
        context_entities = set(entities)

        for hop in range(max_hops):
            new_entities = self.db.query("""
                SELECT * FROM entity WHERE id IN (
                    SELECT ->relationship->entity FROM $current_entities
                    UNION
                    SELECT <-relationship<-entity FROM $current_entities
                )
            """, {"current_entities": list(context_entities)})

            context_entities.update(new_entities)

        return list(context_entities)

4.2 The Confidence Crisis: 0.0% Results

A critical issue emerged during initial testing: our KG-RAG system consistently returned 0.0% confidence scores across all queries. Investigation revealed several root causes:

4.2.1 Semantic Gap Issues

  • Entity granularity mismatch: Extracted entities were either too specific or too general for typical user queries (for example, a question about "holding customer funds" only indirectly matches the formal entity "Segregated Account Requirements")
  • Terminology disconnect: Natural language queries didn’t align well with formal regulatory terminology in the KG

4.2.2 Relationship Quality Problems

# Example of a problematic relationship
{
    "source": "Section 401(b)(2)",
    "target": "Compliance Framework",
    "type": "references",
    "context": "Section references compliance framework"  # Too generic
}

4.2.3 Embedding Space Misalignment

  • Query embeddings and entity embeddings existed in different semantic spaces
  • Limited training data for regulatory domain-specific embeddings

4.3 Attempted Solutions and Iterations

We implemented several approaches to address the confidence issues:

  1. Query Expansion: Expanded user queries with domain-specific terminology
  2. Fuzzy Matching: Implemented approximate string matching for entity retrieval
  3. Hybrid Retrieval: Combined vector similarity with keyword matching
  4. Context Enrichment: Added more contextual information to entity representations
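
For concreteness, here is a minimal sketch of approaches 1 and 3 combined: the query is expanded with domain vocabulary, then candidates are scored by blending vector similarity with keyword overlap. The synonym map and the 0.6/0.4 weights are illustrative assumptions, not tuned values.

import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical expansion map from colloquial phrasing to regulatory vocabulary
DOMAIN_SYNONYMS = {
    "customer funds": ["segregated account requirements", "custodial assets"],
    "issuer": ["permitted payment stablecoin issuer"],
}

model = SentenceTransformer('all-MiniLM-L6-v2')

def expand_query(query):
    """Append domain-specific terminology to narrow the semantic gap"""
    extras = [term for key, terms in DOMAIN_SYNONYMS.items()
              if key in query.lower() for term in terms]
    return query + " " + " ".join(extras)

def hybrid_score(query, entity_name, entity_embedding):
    """Blend embedding similarity with keyword overlap (weights are illustrative)"""
    expanded = expand_query(query)
    q_emb = model.encode(expanded)
    cosine = np.dot(q_emb, entity_embedding) / (
        np.linalg.norm(q_emb) * np.linalg.norm(entity_embedding))
    q_tokens = set(expanded.lower().split())
    e_tokens = set(entity_name.lower().split())
    keyword = len(q_tokens & e_tokens) / max(len(e_tokens), 1)
    return 0.6 * cosine + 0.4 * keyword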

Despite these efforts, the fundamental semantic alignment issues persisted.

5. Traditional RAG Implementation: MUVERA-Inspired Approach

5.1 Motivation for Traditional RAG

Given the challenges with pure KG-RAG, we implemented a traditional vector-based RAG system inspired by the MUVERA (Multi-Vector Retrieval Algorithm) approach to establish performance baselines.

5.2 Hybrid Retrieval Pipeline

Our traditional RAG system employed a two-stage retrieval process:

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

class MUVERAInspiredRAG:
    def __init__(self):
        # Stage 1: Fast candidate selection
        self.fast_encoder = SentenceTransformer('all-MiniLM-L6-v2')
        # Stage 2: Precision reranking
        self.precision_encoder = SentenceTransformer('all-mpnet-base-v2')

        self.vector_store = faiss.IndexFlatL2(384)  # MiniLM embedding dimension
        self.chunks = []

    def build_index(self, documents):
        """Build FAISS index from document chunks"""
        # chunk_documents applies the sentence-aware chunking from Section 8.2
        chunks = self.chunk_documents(documents)
        embeddings = self.fast_encoder.encode(chunks)

        self.vector_store.add(embeddings)
        self.chunks = chunks

    def retrieve(self, query, k1=20, k2=5):
        """Two-stage retrieval process"""
        # Stage 1: Fast candidate retrieval
        query_embedding = self.fast_encoder.encode([query])
        scores, indices = self.vector_store.search(query_embedding, k1)

        candidates = [self.chunks[i] for i in indices[0]]

        # Stage 2: Precision reranking
        candidate_embeddings = self.precision_encoder.encode(candidates)
        query_embedding_precise = self.precision_encoder.encode([query])

        # Compute cosine similarities for reranking
        similarities = cosine_similarity(query_embedding_precise, candidate_embeddings)[0]

        # Return top k2 chunks
        top_indices = np.argsort(similarities)[-k2:][::-1]
        return [candidates[i] for i in top_indices]
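
A minimal usage sketch, assuming chunk_documents is wired to the Section 8.2 chunker and the GENIUS Act text has been exported to plain text (the path is hypothetical):

# Hypothetical end-to-end usage of the two-stage retriever
rag = MUVERAInspiredRAG()

with open("genius_act.txt") as f:
    rag.build_index([f.read()])

top_chunks = rag.retrieve(
    "What are the segregated account requirements for stablecoin issuers?",
    k1=20, k2=5,
)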

5.3 KG Text Extraction for Fair Comparison

To ensure fair comparison, we extracted textual content from our knowledge graph:

class KGTextExtractor:
    def __init__(self, kg_data):
        self.entities = kg_data.get('entities', [])
        self.relationships = kg_data.get('relationships', [])

    def extract_text_chunks(self):
        """Extract meaningful text chunks from KG structure"""
        chunks = []

        # Extract entity-based chunks
        for entity in self.entities:
            if isinstance(entity, str) and len(entity) > 20:
                chunks.append(f"Entity: {entity}")

        # Extract relationship-based chunks
        for rel in self.relationships:
            if isinstance(rel, dict) and 'context' in rel:
                context = rel['context']
                if len(context) > 50:
                    source = rel.get('source', 'Unknown')
                    target = rel.get('target', 'Unknown')
                    rel_type = rel.get('type', 'related_to')

                    chunk = f"{source} {rel_type} {target}. {context}"
                    chunks.append(chunk)

        return chunks
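
A short usage sketch connecting the extractor to the retriever above (the kg.json filename is an assumption about how the kg-gen output was saved):

import json

# Hypothetical: load the kg-gen output and index its textual content
with open("kg.json") as f:
    kg_data = json.load(f)

kg_chunks = KGTextExtractor(kg_data).extract_text_chunks()
print(f"Extracted {len(kg_chunks)} text chunks from the knowledge graph")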

6. Evaluation Framework and Methodology

6.1 Test Question Development

We developed a comprehensive set of stablecoin regulatory questions targeting different complexity levels within the Stablebridge domain:

stablecoin_regulatory_questions = [
    "What are the segregated account requirements for digital asset market "
    "participants under the GENIUS Act?",

    "How does the GENIUS Act define qualified digital asset custodians for "
    "stablecoin operations?",

    "What compliance frameworks must stablecoin exchanges implement according "
    "to federal regulation?",

    "What are the consumer protection measures outlined in the GENIUS Act for "
    "stablecoin users?",

    "How are conflicts of interest addressed in digital asset custody "
    "arrangements for stablecoins?"
]

6.2 Performance Metrics

Our evaluation framework measured:

  1. Response Time: Average time to generate answers
  2. Retrieval Quality: Relevance of retrieved chunks/entities
  3. Answer Accuracy: Manual assessment of response correctness
  4. System Reliability: Consistency across multiple runs

6.3 Testing Infrastructure

import time

class RAGEvaluator:
    def __init__(self, kg_rag_system, traditional_rag_system):
        self.kg_rag = kg_rag_system
        self.traditional_rag = traditional_rag_system
        self.results = {"kg_rag": [], "traditional_rag": []}

    def evaluate_question(self, question):
        """Evaluate both systems on a single question"""

        # Test KG-RAG
        start_time = time.time()
        kg_answer = self.kg_rag.answer_question(question)
        kg_time = time.time() - start_time

        # Test Traditional RAG
        start_time = time.time()
        trad_answer = self.traditional_rag.answer_question(question)
        trad_time = time.time() - start_time

        return {
            "question": question,
            "kg_rag": {"answer": kg_answer, "time": kg_time},
            "traditional_rag": {"answer": trad_answer, "time": trad_time}
        }
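
Running the full question set and aggregating the first metric (average response time) then looks like this; the sketch assumes both systems expose the answer_question interface used above:

evaluator = RAGEvaluator(kg_rag_system, traditional_rag_system)

runs = [evaluator.evaluate_question(q) for q in stablecoin_regulatory_questions]

# Aggregate average response time per system
avg_kg = sum(r["kg_rag"]["time"] for r in runs) / len(runs)
avg_trad = sum(r["traditional_rag"]["time"] for r in runs) / len(runs)
print(f"KG-RAG: {avg_kg:.1f}s avg, Traditional RAG: {avg_trad:.1f}s avg")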

7. Results and Comparative Analysis

7.1 Performance Comparison

Our comprehensive evaluation revealed significant performance differences:

| Metric | KG-RAG | Traditional RAG | Difference |
| --- | --- | --- | --- |
| Average Response Time | 7.0s | 13.2s | KG-RAG 46.9% faster |
| Successful Retrievals | 4/5 (80%) | 5/5 (100%) | Traditional RAG more reliable |
| Average Chunks Retrieved | 3-4 entities | 6 chunks | Different retrieval granularity |
| Answer Quality | High precision, lower coverage | Broader coverage, good precision | Complementary strengths |

7.2 Detailed Performance Analysis

7.2.1 KG-RAG Strengths

  • Speed Advantage: Significantly faster due to structured data access
  • Precise Reasoning: When working correctly, provided highly targeted answers
  • Multi-hop Capability: Natural support for relationship traversal
  • Structured Output: Answers maintained logical organization

7.2.2 KG-RAG Limitations

  • Brittleness: Sensitive to entity extraction quality
  • Coverage Gaps: Some queries failed due to missing entities or relationships
  • Setup Complexity: Required extensive preprocessing and database configuration

7.2.3 Traditional RAG Strengths

  • Reliability: Consistent performance across all test questions
  • Broader Coverage: Vector similarity captured semantic relationships missed by KG
  • Flexibility: Adaptable to various query types without structural requirements
  • Easier Implementation: Straightforward setup and maintenance

7.2.4 Traditional RAG Limitations

  • Slower Performance: Higher computational overhead for similarity calculations
  • Less Structured Reasoning: Difficulty with multi-hop logical connections
  • Context Dilution: Large chunks sometimes contained irrelevant information

7.3 Use Case Recommendations

Based on our analysis, we recommend:

Choose KG-RAG when:

  • Working with well-structured, relationship-rich stablecoin regulatory documents
  • Speed is critical for real-time regulatory compliance queries
  • Multi-hop reasoning is essential for cross-jurisdictional analysis
  • High-quality entity extraction is achievable for specific regulatory domains

Choose Traditional RAG when:

  • Dealing with diverse, unstructured regulatory content across multiple jurisdictions
  • Reliability and coverage are paramount for comprehensive Stablebridge analysis
  • Setup simplicity is important for rapid deployment across new regulatory documents
  • Working with evolving document collections from multiple regulatory bodies

8. Technical Implementation Insights

8.1 Embedding Model Selection

Our experimentation with different embedding models revealed:

# Performance comparison of embedding models
embedding_models = {
    'all-MiniLM-L6-v2': {
        'speed': 'fast',
        'quality': 'good',
        'dimension': 384,
        'use_case': 'Stage 1 retrieval'
    },
    'all-mpnet-base-v2': {
        'speed': 'moderate',
        'quality': 'excellent',
        'dimension': 768,
        'use_case': 'Stage 2 reranking'
    },
    'sentence-t5-base': {
        'speed': 'slow',
        'quality': 'excellent',
        'dimension': 768,
        'use_case': 'High-precision applications'
    }
}

8.2 Chunking Strategies

Optimal chunking proved crucial for both approaches:

from nltk.tokenize import sent_tokenize  # requires the nltk 'punkt' tokenizer data

def intelligent_chunking(text, chunk_size=1000, overlap=200):
    """
    Implement intelligent chunking that preserves sentence boundaries
    and maintains contextual coherence via sentence-level overlap
    """
    sentences = sent_tokenize(text)
    chunks = []
    current = []       # sentences in the chunk being built
    current_len = 0

    for sentence in sentences:
        if current and current_len + len(sentence) > chunk_size:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward so consecutive chunks overlap
            carried, carried_len = [], 0
            for prev in reversed(current):
                if carried_len + len(prev) > overlap:
                    break
                carried.insert(0, prev)
                carried_len += len(prev)
            current, current_len = carried, carried_len
        current.append(sentence)
        current_len += len(sentence) + 1  # +1 for the joining space

    if current:
        chunks.append(" ".join(current))

    return chunks

8.3 Database Optimization Insights

SurrealDB configuration optimizations that improved KG-RAG performance:

-- Index creation for faster entity retrieval
DEFINE INDEX entity_name_idx ON entity FIELDS name;
DEFINE INDEX entity_embedding_idx ON entity FIELDS embedding;

-- Relationship traversal optimization
DEFINE INDEX rel_source_idx ON relationship FIELDS in;
DEFINE INDEX rel_target_idx ON relationship FIELDS out;
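
Note that a plain field index on the embedding field accelerates exact lookups, not the vector::similarity::cosine scan used in Section 4.1; SurrealDB also provides dedicated vector (MTREE) indexes, declared with DEFINE INDEX ... MTREE DIMENSION 384, which would be a natural next optimization to evaluate.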

9. Future Research Directions

9.1 Hybrid Architecture Development

Our findings suggest potential for hybrid systems combining both approaches:

class HybridRAGSystem:
    def __init__(self, kg_rag, traditional_rag):
        self.kg_rag = kg_rag
        self.traditional_rag = traditional_rag
        self.routing_model = QueryRouter()

    def answer_question(self, question):
        """Route questions to optimal system based on characteristics"""
        query_type = self.routing_model.classify(question)

        if query_type == "structured_reasoning":
            return self.kg_rag.answer_question(question)
        elif query_type == "broad_search":
            return self.traditional_rag.answer_question(question)
        else:
            # Ensemble approach
            kg_answer = self.kg_rag.answer_question(question)
            trad_answer = self.traditional_rag.answer_question(question)
            return self.merge_answers(kg_answer, trad_answer)
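
The QueryRouter above is left abstract; a minimal keyword-heuristic stand-in might look like the following (the cue phrases and length threshold are assumptions, and a learned classifier could replace them):

class QueryRouter:
    """Hypothetical heuristic router for the hybrid system"""

    STRUCTURED_CUES = (
        "relationship between", "depends on", "which sections",
        "cross-reference", "connected to",
    )

    def classify(self, question):
        q = question.lower()
        # Multi-hop phrasing suggests the KG's structured traversal
        if any(cue in q for cue in self.STRUCTURED_CUES):
            return "structured_reasoning"
        # Short single-fact questions suit broad vector search
        if len(q.split()) <= 12:
            return "broad_search"
        # Otherwise fall back to the ensemble path
        return "ensemble"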

9.2 Enhanced Entity Extraction

Improving kg-gen output quality through:

  • Domain-specific training data
  • Active learning approaches
  • Human-in-the-loop validation
  • Multi-model ensemble extraction

9.3 Dynamic System Selection

Implementing intelligent routing based on:

  • Query complexity analysis
  • Real-time performance monitoring
  • User feedback integration
  • Context-aware decision making

10. Conclusion

Our comprehensive Stablebridge journey from knowledge graph generation to comparative RAG evaluation has revealed the nuanced trade-offs between structured and unstructured approaches to stablecoin regulatory intelligence. While KG-RAG demonstrated superior speed and reasoning capabilities when functioning correctly, Traditional RAG provided more reliable and comprehensive coverage across diverse regulatory queries.

This research directly supports the Stablebridge mission of creating robust regulatory intelligence systems capable of navigating the complex landscape of US and EU stablecoin regulations, from Congressional legislation to Federal Reserve guidance and European MiCA frameworks.

Key Takeaways:

  1. No Universal Solution: Both approaches have distinct strengths suitable for different regulatory analysis scenarios
  2. Quality Dependencies: KG-RAG success heavily depends on upstream knowledge graph quality, critical for regulatory precision
  3. Implementation Complexity: Traditional RAG offers simpler setup and maintenance for diverse regulatory corpus
  4. Performance Trade-offs: Speed vs. reliability represents a fundamental design choice for real-time regulatory compliance
  5. Future Potential: Hybrid approaches may combine the best of both worlds for comprehensive Stablebridge coverage

Technical Contributions to Stablebridge:

  • Comprehensive evaluation framework for comparing KG-RAG vs Traditional RAG in regulatory contexts
  • MUVERA-inspired hybrid retrieval implementation optimized for stablecoin regulations
  • SurrealDB-based knowledge graph infrastructure with regulatory-specific schema design
  • Performance optimization insights for both approaches in financial regulatory domains
  • Real-world regulatory document processing pipeline ready for expansion across US/EU frameworks

This research provides a foundation for informed decision-making when selecting RAG architectures for complex stablecoin regulatory analysis tasks, directly supporting Stablebridge’s goal of comprehensive regulatory intelligence across all major jurisdictions.

See the original blog and other blogs from the same author.

References

  1. kg-gen Research Paper: Liao, J., et al. (2025). “KG-Gen: Scalable Knowledge Graph Generation from Unstructured Text using Large Language Models.” arXiv preprint arXiv:2502.09956. Available at: https://arxiv.org/pdf/2502.09956
  2. kg-gen Implementation: STAIR Lab. “kg-gen: Knowledge Graph Generation Tool.” GitHub repository. Available at: https://github.com/stair-lab/kg-gen
  3. KG-RAG Framework: Vector Institute. “KG-RAG: Knowledge Graph Retrieval Augmented Generation.” GitHub repository. Available at: https://github.com/VectorInstitute/kg-rag
  4. SurrealDB: SurrealDB Team. “SurrealDB: A scalable, distributed, collaborative, document-graph database.” GitHub repository. Available at: https://github.com/surrealdb/surrealdb
