
James Lee


Hybrid Knowledge Retrieval: Combining Neo4j Graph Queries, GraphRAG and Vector Search for Enterprise AI Customer Service

1. Introduction: The Blind Spots of Single-Retrieval Approaches and Defining Full-Stack Capability Closure

This is Part 6 of the series "8 Weeks from Zero to One: Full-Stack Engineering Practice for a Production-Grade LLM Customer Service System." In the first five parts, we completed the MVP architecture, multimodal data pipeline, GraphRAG service wrapping, multi-agent workflow design, and end-to-end safety guardrail system. This article completes the final piece of the system's core capability puzzle, a hybrid knowledge retrieval system, achieving full-stack capability closure from user input to compliant output.

We define production-grade capability closure as: any legitimate customer service query from a user can be fully processed within the system through an automated pipeline of "intent recognition → task decomposition → precise retrieval → safety validation → result output", with no manual intervention and no cross-system handoffs, while meeting production-grade requirements for compliance, stability, and low latency.

Enterprise customer service queries are never "single-type." Some users ask "Where is the shipping for Order #123?" (structured query), others ask "How do I connect this smart bulb to WiFi?" (unstructured knowledge query), and others ask "What is the after-sales policy for the product in Order #123?" (complex hybrid query). Relying on a single retrieval approach creates obvious capability blind spots that make true closure impossible:

| Retrieval Approach | Strengths | Core Limitations |
| --- | --- | --- |
| Neo4j Text2Cypher | Precise structured data queries (orders/inventory/customers), fast response, high accuracy | Requires strict permission control, vulnerable to injection attacks, cannot cover unstructured knowledge queries |
| GraphRAG | Knowledge graph multi-hop reasoning, cross-chapter semantic queries in long documents (e.g., "full product line after-sales policy") | Low efficiency for pure FAQ and short-text fuzzy matching; heavily dependent on graph construction quality |
| Vector Search | Fuzzy semantic matching, unstructured short-text/FAQ queries (e.g., "What is the return process?") | Cannot handle structured relational queries, no multi-hop reasoning support, long-document context easily lost |

No single retrieval approach can satisfy all scenarios — it either fails on structured queries, struggles with unstructured knowledge, or introduces security risks. We must therefore build a hybrid knowledge base system that coordinates Neo4j structured queries + GraphRAG knowledge graph retrieval + vector semantic search, letting each retrieval capability do what it does best, achieving a "1+1+1>3" effect and ultimately delivering full-stack capability closure.


2. Hybrid Retrieval Architecture: Coordinating Three Retrieval Capabilities with End-to-End Governance

The core of our hybrid knowledge base is a full pipeline of "task decomposition → intelligent routing → parallel retrieval → safety validation → result fusion", letting each retrieval approach handle its specialty while providing a unified invocation interface to the upper-layer Agent system, with safety guardrails embedded throughout to ensure production-grade stability and compliance.

2.1 Architecture Overview

┌──────────────────────────────────────────────────────────────────────┐
│                     User Input + User Identity Info                  │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │
┌───────────────────────────────────▼──────────────────────────────────┐
│           Planner (Complex Query Decomposition + Intent Recognition) │
│  Example: "What is the after-sales policy for the product in         │
│            Order #123?"                                              │
│  → Decomposed into: ["Query product info for Order #123",           │
│                       "Query after-sales policy for that product"]  │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │ Subtask list
┌───────────────────────────────────▼──────────────────────────────────┐
│         Tool Selector (Intelligent Routing + Pre-Safety Validation)  │
│   Subtask 1 → Text2Cypher (structured order query)                  │
│   Subtask 2 → GraphRAG (unstructured after-sales policy query)      │
└──────┬──────────────────────────────┬──────────────────┬────────────┘
       │                              │                  │
┌──────▼──────────┐   ┌──────────────▼──┐   ┌──────────▼──────────────┐
│  Text2Cypher    │   │  Vector Search   │   │       GraphRAG          │
│  Orders /       │   │  Fuzzy semantic/ │   │  Knowledge graph        │
│  Inventory /    │   │  FAQ matching    │   │  multi-hop reasoning /  │
│  Structured     │   │                  │   │  long-doc unstructured  │
└──────┬──────────┘   └────────┬─────────┘   └──────────┬─────────────┘
       └────────────────────────┴──────────────────────────┘
                                    │ Retrieval results from all paths
┌───────────────────────────────────▼──────────────────────────────────┐
│       Result Fusion → Factual Consistency Check → Final Answer       │
└──────────────────────────────────────────────────────────────────────┘

Diagram note: User input is first decomposed into independent subtasks by the Planner, then routed to the corresponding retrieval tool by the Tool Selector. Safety validation is embedded throughout. Results are fused to generate a compliant final answer, achieving full-stack capability closure.

2.1.1 Complex Query Decomposition (Planner)

A task decomposition prompt framework customized for e-commerce scenarios breaks multi-intent, mixed-type queries into independent, dependency-free subtasks, eliminating the blind spots that arise when a single retrieval approach cannot cover all cases:

PLANNER_SYSTEM_PROMPT = """
You are the task planning component of an e-commerce intelligent customer service system.
Your responsibility is to analyze user queries and decompose them into independent,
executable subtasks.

Core rules:
1. Simple single-intent queries do not need decomposition — return the original query directly.
2. Multi-intent mixed queries MUST be decomposed into independent subtasks with no
   dependencies or overlaps between them.
3. Key information such as user identity, order numbers, and product names MUST be
   preserved in each subtask.
4. Return ONLY the subtask list. Do NOT output any other content.
"""

Example: "What beverages does Northwind Trading carry, and what are their after-sales policies?" is decomposed into ["What beverage products does Northwind Trading carry?", "What are the after-sales policies for Northwind Trading's beverage products?"]. The first subtask is routed to a structured query and the second to unstructured retrieval.
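The planner prompt asks for "ONLY the subtask list" but does not fix an output format. The following is a minimal sketch of how the raw planner output might be parsed, assuming the model is instructed to emit a JSON array of strings. The function name `parse_subtasks` and the JSON convention are illustrative assumptions, not part of the original system.

```python
import json


def parse_subtasks(raw_output: str, original_query: str) -> list[str]:
    """Parse the planner's raw text into a clean list of subtasks.

    Assumption: the planner was asked to return a JSON array of strings.
    If the output cannot be parsed, fall back to the original query so the
    pipeline still has one task to route.
    """
    try:
        parsed = json.loads(raw_output.strip())
    except json.JSONDecodeError:
        return [original_query]

    if not isinstance(parsed, list):
        return [original_query]

    # Keep non-empty strings only, preserving order and dropping exact duplicates.
    seen: set[str] = set()
    subtasks: list[str] = []
    for item in parsed:
        if isinstance(item, str):
            text = item.strip()
            if text and text not in seen:
                seen.add(text)
                subtasks.append(text)

    return subtasks or [original_query]
```

Falling back to the original query mirrors rule 1 of the prompt: when nothing usable comes back, the query is treated as a single-intent task.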

2.1.2 Intelligent Routing Rules (Tool Selector)

We define e-commerce-specific routing rules for each subtask. The Tool Selector assigns a tool according to business priority and completes pre-flight safety validation before any retrieval starts:

TOOL_SELECTION_SYSTEM_PROMPT = """
You are the tool selection component of an e-commerce intelligent customer service system.
Your responsibility is to select the most appropriate retrieval tool for each subtask.

Tool selection priority and rules:
1. [HIGHEST PRIORITY] Structured data queries (orders, products, inventory, customers,
   logistics, pricing, suppliers, etc.):
   - High-frequency fixed scenarios: use predefined_cypher (pre-built Cypher templates)
   - Complex dynamic queries: use cypher_query (dynamically generated Cypher)

2. Unstructured long-document / cross-chapter knowledge queries (after-sales policies,
   warranty terms, product manuals, troubleshooting guides, etc.):
   Use microsoft_graphrag_query (GraphRAG knowledge graph retrieval)

3. Short-text FAQ / fuzzy semantic matching / similar question lookup:
   Use vector_search (vector semantic search)

Output ONLY the tool name. Do NOT output any other content:
predefined_cypher / cypher_query / microsoft_graphrag_query / vector_search
"""

| Subtask Type | Routed Tool | Example Scenarios |
| --- | --- | --- |
| Structured data queries (products/orders/customers/inventory) | predefined_cypher / cypher_query | "Check shipping for Order #123" / "How much inventory is left for this product?" |
| Unstructured long-doc knowledge queries (after-sales/manuals/troubleshooting) | microsoft_graphrag_query | "What is the return policy?" / "How do I connect the smart bulb to WiFi?" |
| Short-text FAQ / fuzzy semantic matching | vector_search | "Is there anything like a '7-day no-questions-asked' return policy?" |
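Because the selector is an LLM, its output can drift from the four allowed tool names (extra words, quotes, or two names at once). Below is a minimal sketch of a guard that normalizes the raw output. The helper `normalize_tool_choice` and the choice of `vector_search` as the default are assumptions made for illustration.

```python
ALLOWED_TOOLS = (
    "predefined_cypher",
    "cypher_query",
    "microsoft_graphrag_query",
    "vector_search",
)

# Assumed default: the article describes vector search as the fallback capability.
DEFAULT_TOOL = "vector_search"


def normalize_tool_choice(raw_output: str) -> str:
    """Map the selector's raw text to exactly one known tool name."""
    cleaned = raw_output.strip().strip("`'\"").lower()
    if cleaned in ALLOWED_TOOLS:
        return cleaned

    # Tolerate extra words around the name, but only if exactly one tool is mentioned.
    mentioned = [tool for tool in ALLOWED_TOOLS if tool in cleaned]
    if len(mentioned) == 1:
        return mentioned[0]

    # Ambiguous or unknown output: use the default instead of guessing.
    return DEFAULT_TOOL
```

Rejecting ambiguous output instead of picking the first match keeps a malformed response from silently routing a query to a structured data tool.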

2.2 Production-Grade Implementation and Security Governance for All Three Retrieval Capabilities

2.2.1 Text2Cypher Structured Queries: Security as the Top Priority

Structured queries directly touch core enterprise business data — security design is the foundational prerequisite for production deployment. We implement three layers of compliance and security, fully inheriting the safety guardrail system from Part 5:

  1. Strong identity binding: All queries must carry the current logged-in user's user_id; only that user's own orders and personal information may be queried. Cross-user order lookups are blocked at the syntax level;
  2. Predefined templates first: 80% of high-frequency queries are encapsulated as predefined_cypher templates — no dynamic Cypher generation required, just parameter substitution and execution, eliminating injection risk at the root;
  3. Triple validation for dynamic generation: For the minority of complex dynamic queries, a three-stage validation pipeline is enforced — syntax validation → operation permission check → input sanitization. Only MATCH/RETURN read operations are permitted; all write operations are blocked; sensitive characters are filtered to prevent injection.
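The three layers above can be sketched roughly as follows. This is a deliberately conservative illustration, not the system's actual validator: the template text, the graph schema (`User`, `Order`, `PLACED`), and the helper names are hypothetical, and the keyword check is a simple denylist that may reject some harmless queries (for example, a string literal containing the word "set").

```python
import re

# Hypothetical template registry; the schema and query text are illustrative only.
PREDEFINED_CYPHER = {
    "order_shipping": (
        "MATCH (u:User {id: $user_id})-[:PLACED]->(o:Order {id: $order_id}) "
        "RETURN o.shipping_status AS shipping_status"
    ),
}

# Clauses that write data or run procedures; any occurrence is treated as a violation.
WRITE_OR_PROCEDURE = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|CALL|FOREACH|LOAD\s+CSV)\b",
    re.IGNORECASE,
)


def find_cypher_violations(query: str) -> list[str]:
    """Return the list of rule violations; an empty list means the query may run."""
    statement = query.strip().rstrip(";").strip()
    violations = []
    if ";" in statement:
        violations.append("multiple statements")
    if not re.match(r"(?i)(OPTIONAL\s+)?MATCH\b", statement):
        violations.append("must start with MATCH")
    if not re.search(r"(?i)\bRETURN\b", statement):
        violations.append("missing RETURN")
    if WRITE_OR_PROCEDURE.search(statement):
        violations.append("write or procedure clause")
    if "$user_id" not in statement:
        violations.append("not scoped by $user_id")
    return violations


def build_params(user_id: str, **extra: str) -> dict[str, str]:
    """Bind values as query parameters so user input is never concatenated into Cypher."""
    return {"user_id": user_id, **extra}
```

Passing `user_id` as a bound parameter, together with the requirement that every statement references `$user_id`, is one way to express the identity binding described in the first layer.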

2.2.2 GraphRAG Unstructured Knowledge Retrieval: End-to-End Data Consistency

Building on the GraphRAG service capabilities from Parts 2 and 3, this layer handles long-document and cross-chapter unstructured knowledge queries, with an added incremental index synchronization mechanism:

  • When unstructured data such as product manuals or after-sales policies is updated, an incremental index build is automatically triggered — no full rebuild required;
  • Indexes for different data sources are isolated by directory, preventing interference and ensuring consistency and stability during data updates.
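To make the incremental trigger concrete, here is a minimal sketch of how changed documents might be detected before an incremental build. It assumes a manifest of content digests recorded at the last build. The function names and the use of SHA-256 digests are illustrative assumptions; the actual change detection and GraphRAG indexing steps are not shown.

```python
import hashlib


def content_digest(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_incremental_update(
    documents: dict[str, str], manifest: dict[str, str]
) -> dict[str, list[str]]:
    """Compare current documents with the digests recorded at the last index build.

    documents: doc_id -> current text
    manifest:  doc_id -> digest stored when the index was last built
    """
    added = sorted(doc_id for doc_id in documents if doc_id not in manifest)
    changed = sorted(
        doc_id
        for doc_id, text in documents.items()
        if doc_id in manifest and manifest[doc_id] != content_digest(text)
    )
    removed = sorted(doc_id for doc_id in manifest if doc_id not in documents)
    return {"added": added, "changed": changed, "removed": removed}
```

Only the documents listed under `added` and `changed` would need to be re-parsed and re-indexed; unchanged documents are left alone.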

2.2.3 Vector Search: Supplementary Fallback for Short-Text FAQ Scenarios

As the supplementary fallback capability of the hybrid retrieval system, vector search is optimized for high-frequency short-text FAQ scenarios:

  • A vector store is built using a BGE-zh model fine-tuned on e-commerce data, covering the Top 200 high-frequency customer service FAQs;
  • Retrieval results are deduplicated and fused with GraphRAG results to avoid information redundancy.
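The deduplication step can be illustrated with a small sketch. It uses token-level Jaccard similarity as a stand-in for whatever similarity measure the production system applies; the function names and the 0.8 threshold are assumptions. Note that the `\w+` tokenizer suits whitespace-delimited text, so Chinese content would need a proper segmenter or an embedding-based comparison.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased word tokens of a passage."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two passages, in the range 0.0 to 1.0."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def merge_passages(
    graphrag_hits: list[str], vector_hits: list[str], threshold: float = 0.8
) -> list[str]:
    """Keep GraphRAG passages first, then add vector hits that are not near-duplicates."""
    merged: list[str] = []
    for passage in graphrag_hits + vector_hits:
        if all(jaccard(passage, kept) < threshold for kept in merged):
            merged.append(passage)
    return merged
```

Listing the GraphRAG hits first means that when two passages overlap, the GraphRAG version is the one that survives.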

2.3 Production-Grade Core Capabilities

2.3.1 Hybrid Retrieval Fallback and Degradation Strategy

To ensure 24/7 availability in production, we designed a three-tier degradation strategy so that any single retrieval path failure does not affect the overall service:

  1. Tier 1 degradation (single tool failure): When one retrieval tool times out or becomes unavailable, traffic is automatically rerouted to a backup tool. Example: GraphRAG service timeout → unstructured queries automatically fall back to vector search;
  2. Tier 2 degradation (multiple tool failures): When multiple retrieval paths fail, the system automatically switches to a predefined FAQ fallback library to maintain basic consultation capability;
  3. Tier 3 degradation (full pipeline failure): When core services are unavailable, a standardized fallback response is returned immediately, directing users to contact a human agent — preventing service collapse.

All degradation events are recorded in the audit log and trigger alerting notifications, enabling operations teams to quickly locate and resolve issues.
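A minimal sketch of the three-tier chain follows. All names here are hypothetical, the retrievers are plain callables that raise on failure (including on timeout), and the `events` list stands in for the audit log and alerting hooks mentioned above.

```python
from typing import Callable

Retriever = Callable[[str], str]

STATIC_FALLBACK = "Sorry, we cannot answer this right now. Please contact a human agent."


def execute_with_degradation(
    query: str,
    primary: Retriever,
    backup: Retriever,
    faq_lookup: Retriever,
    events: list[str],
) -> str:
    """Try the primary tool, then the backup tool, then the FAQ library.

    If every tier fails, return the standardized fallback response (Tier 3).
    """
    tiers = [("primary", primary), ("tier1_backup", backup), ("tier2_faq", faq_lookup)]
    for name, retriever in tiers:
        try:
            return retriever(query)
        except Exception as exc:  # includes TimeoutError raised by a retriever
            events.append(f"{name} failed: {type(exc).__name__}")
    events.append("tier3_static_response")
    return STATIC_FALLBACK


# Stand-in retrievers for demonstration only.
def graphrag_timeout_stub(query: str) -> str:
    raise TimeoutError("simulated GraphRAG timeout")


def vector_search_stub(query: str) -> str:
    return f"vector result for: {query}"
```

With `graphrag_timeout_stub` as the primary and `vector_search_stub` as the backup, the call returns the vector result and records one failure event, which corresponds to the Tier 1 example above.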

2.3.2 Multi-Source Index Synchronization

Data consistency is the foundational prerequisite of the hybrid retrieval system. We designed two synchronization mechanisms, one for structured data and one for unstructured data, to keep the retrieval layer consistent with the source systems in near real time:

  1. Structured data sync: Business data such as orders, products, and inventory is monitored via Binlog. Data changes are automatically synced to the Neo4j graph database with latency < 1s, ensuring query results are fully consistent with the business system;
  2. Unstructured data sync: Documents such as product manuals and after-sales policies are managed by version number. When a document is added or modified, a MinerU parsing → incremental index build pipeline is automatically triggered. The live index is hot-swapped upon completion with zero service interruption.
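The hot-swap idea can be sketched as follows: the next index version is built on a copy, and the live reference is replaced in a single assignment so readers never see a half-updated index. The class name and the in-memory dictionary are illustrative assumptions, and the sketch assumes a single writer (concurrent writers would need additional coordination).

```python
import threading
from typing import Optional


class HotSwappableIndex:
    """Serve lookups from the live index while a replacement is built off to the side."""

    def __init__(self, initial: dict[str, str], version: int = 1) -> None:
        self._lock = threading.Lock()
        self._live = dict(initial)
        self.version = version

    def lookup(self, doc_id: str) -> Optional[str]:
        # Grab the current reference under the lock, then read without holding it.
        with self._lock:
            live = self._live
        return live.get(doc_id)

    def apply_update(self, changed: dict[str, str], removed: list[str]) -> int:
        """Build the next version on a copy, then swap it in. Returns the new version."""
        with self._lock:
            base = self._live
        candidate = dict(base)
        candidate.update(changed)
        for doc_id in removed:
            candidate.pop(doc_id, None)
        # The swap is a single reference assignment under the lock.
        with self._lock:
            self._live = candidate
            self.version += 1
            return self.version
```

Because the old dictionary is never mutated, a lookup that started before the swap finishes against the previous version, and the next lookup sees the new one.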

2.3.3 Result Fusion Prompt Design and Engineering Logic

The quality of multi-path retrieval result fusion directly determines the quality of the final answer. Our core design principle is "aggregate by business logic category + factual consistency validation + customer service language standards". Core prompt framework:

RESULT_FUSION_SYSTEM_PROMPT = """
You are the result fusion component of an e-commerce intelligent customer service system.
Your responsibility is to integrate results returned by multiple retrieval tools into a
single, logically coherent response that conforms to customer service language standards.

Core rules:
1. Generate answers STRICTLY based on retrieval results. Do NOT fabricate any information
   not present in the retrieved content.
2. Present structured query results first, then unstructured query results, in clear
   logical order.
3. If different retrieval results contain conflicting information, the structured business
   database result takes precedence.
4. Language must be friendly, concise, and conform to e-commerce customer service standards.
5. Do NOT expose any information about retrieval tools or technical implementation details.
"""

At the engineering level, a secondary factual consistency check is also applied: the generated final answer is re-matched against the original retrieval results so that unsupported claims and false commitments are caught before the answer is returned. This reuses the hallucination validation capability from Part 5.
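As a rough illustration of re-matching an answer against its evidence, the sketch below flags numeric values in the answer (order numbers, day counts, amounts) that appear in none of the retrieved texts. This is a naive stand-in chosen for brevity, not the validation method from Part 5; the function name is hypothetical.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_numbers(answer: str, retrieved_texts: list[str]) -> list[str]:
    """Return numeric tokens in the answer that occur in none of the retrieved texts."""
    evidence_numbers = set(NUMBER.findall(" ".join(retrieved_texts)))
    missing: list[str] = []
    for number in NUMBER.findall(answer):
        if number not in evidence_numbers and number not in missing:
            missing.append(number)
    return missing
```

A non-empty result would be a signal to regenerate the answer or fall back, since a figure such as a return window that is absent from the evidence is a likely false commitment.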

2.3.4 Unified Interface: Enabling "Transparent Invocation" for Upper-Layer Agents

We provide the upper-layer multi-agent system with a unified knowledge base interface that abstracts away the underlying retrieval methods, security validations, and degradation strategies, significantly reducing system complexity:

def query_hybrid_kb(user_query: str, user_id: str) -> dict:
    """Unified hybrid knowledge base query interface

    Args:
        user_query: Raw user query text
        user_id: Current logged-in user ID, used for permission validation

    Returns:
        Fused final answer and retrieval provenance information
    """
    # 1. Complex query decomposition
    sub_tasks = planner_decompose(user_query)
    task_results = []

    # 2. Subtask routing and retrieval (shown sequentially for clarity;
    #    independent subtasks can be dispatched concurrently)
    for task in sub_tasks:
        # Tool selection with pre-flight safety validation
        selected_tool = tool_selector(task)
        # Execute retrieval with built-in timeout control and fallback logic
        result = execute_tool_with_fallback(selected_tool, task, user_id)
        task_results.append({"task": task, "tool": selected_tool, "result": result})

    # 3. Result fusion and factual validation
    final_answer = fuse_and_validate_results(task_results)

    # 4. End-to-end audit log
    record_audit_log(user_id, user_query, task_results)

    return {
        "answer": final_answer,
        "source": task_results
    }

Core value: Upper-layer Agents only need to call this single interface — no need to know which retrieval method was used or what security validations were applied. True capability encapsulation and reuse.
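Since the subtasks produced by the Planner are meant to be independent, they can be dispatched concurrently. Below is a minimal sketch using a thread pool, assuming the routing and execution functions are passed in as callables with the same argument order as `tool_selector` and `execute_tool_with_fallback` above. The helper `run_subtasks_in_parallel` and the `max_workers` default are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_subtasks_in_parallel(
    sub_tasks: list[str],
    user_id: str,
    select_tool: Callable[[str], str],
    execute_tool: Callable[[str, str, str], str],
    max_workers: int = 4,
) -> list[dict]:
    """Route and execute each subtask on a worker thread, keeping input order."""

    def run_one(task: str) -> dict:
        tool = select_tool(task)
        return {"task": task, "tool": tool, "result": execute_tool(tool, task, user_id)}

    if not sub_tasks:
        return []

    with ThreadPoolExecutor(max_workers=min(max_workers, len(sub_tasks))) as pool:
        # Executor.map yields results in the order of the inputs, not completion order.
        return list(pool.map(run_one, sub_tasks))
```

Threads fit here because retrieval calls are I/O-bound (database and service requests), and keeping the input order makes the result list line up with the Planner's subtask list for fusion and audit logging.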


3. End-to-End Validation: Hybrid Knowledge Base vs. Single-Retrieval Approaches

We sampled 1,000 user queries from real e-commerce customer service logs (400 structured, 400 unstructured, 200 complex hybrid), manually annotated by 3 customer service domain experts using the standard "semantically consistent with the reference answer, no false information, conforms to business rules." Core metrics comparison across four approaches:

| Metric | Text2Cypher Only | GraphRAG Only | Vector Search Only | Hybrid KB (This Article) |
| --- | --- | --- | --- | --- |
| Answer accuracy | 85% | 78% | 82% | 94% |
| Full-scenario coverage | 55% | 65% | 60% | 98% |
| Avg. response time (s) | 0.8 | 1.3 | 1.1 | 1.2 (see note) |
| Security violation rate | 2% | 0% | 0% | 0% |
| Complex hybrid query resolution | 30% | 65% | 40% | 92% |

Note on response time: The hybrid knowledge base's 1.2s average is 0.1s slower than vector search alone and 0.4s slower than Text2Cypher alone, though 0.1s faster than GraphRAG alone. This is a deliberate trade-off to achieve 98% scenario coverage and 94% accuracy, and it stays within the < 2s real-time response requirement for e-commerce customer service.

Key Conclusions

  1. Accuracy and coverage leap: The hybrid knowledge base covers 98% of customer service scenarios with an overall answer accuracy of 94%. Complex hybrid query resolution rose from the best single-approach result of 65% (GraphRAG only) to 92%, addressing the two core pain points of "can't answer structured questions" and "weak reasoning on unstructured knowledge";
  2. Fully controlled performance: Average response time of 1.2s comfortably meets the < 2s real-time response requirement for e-commerce customer service;
  3. Security and compliance baseline: Through end-to-end permission validation + predefined templates + injection protection, the security violation rate dropped from 2% (Text2Cypher only) to 0% on this test set, satisfying enterprise-grade data security and compliance requirements.

4. Differentiation Analysis: Our Production-Grade Advantages

Compared to general-purpose RAG solutions such as OpenAI's retrieval tooling and LlamaIndex, our hybrid knowledge base offers four core advantages in enterprise customer service scenarios:

| Dimension | General-Purpose RAG | This Hybrid KB Solution |
| --- | --- | --- |
| Security design | Basic permission control only; business-layer adaptation required; no industry-specific security templates | End-to-end permission validation + injection protection + fallback; e-commerce enterprise compliance out of the box |
| Complex query handling | Oriented toward single-intent queries by default; multi-intent decomposition and routing must be configured and tuned per application | Planner + Tool Selector deeply customized; native support for multi-intent decomposition and parallel retrieval |
| Full-stack closure | Retrieval module only; Agent integration, security system, and business system connections must be built separately | Complete production-grade closure from data pipeline → GraphRAG service → multi-agent → safety guardrails → hybrid KB, seamlessly integrated with business systems |
| Scenario fit | General-purpose; no industry customization | Deeply adapted to e-commerce customer service; 80% of high-frequency business queries pre-templated; out of the box |

Core value: Our solution is not a "toy-grade retrieval module stack" — it is a complete enterprise-grade solution directly deployable to production, genuinely solving the core requirements of "deployable, secure, and full-scenario coverage."


5. Production Outcomes and Extensibility Roadmap

5.1 Production Deployment Results

After full-stack integration, our intelligent customer service system v1.0 achieved:

  • Full-scenario coverage: From order queries to after-sales consultation, from product instructions to troubleshooting, 98% of user questions are answered automatically;
  • High stability: Supports 1,000 QPS concurrent load, 24/7 stable operation, zero downtime or data leakage incidents;
  • Low human intervention: Human agent escalation rate reduced from 40% to 10%, significantly lowering enterprise operational costs;
  • Compliance met: Satisfies requirements of China's Personal Information Protection Law and equivalent regulations; zero sensitive information leakage incidents.

5.2 Future Extensibility Directions

  1. Multimodal retrieval expansion: Add image/video retrieval capabilities to support scenarios such as "send a photo to diagnose a fault" or "scan a barcode to look up a product," further lowering user interaction friction;
  2. Self-optimizing intelligent routing: Introduce reinforcement learning to let the system automatically learn "which query type fits which retrieval method" based on user feedback and business outcomes, continuously improving routing accuracy;
  3. Streaming response optimization: Integrate LLM streaming output with KV Cache optimization to bring user-perceived time-to-first-token (TTFT) under 500ms, compared with the current 1.2s average full-response time, further improving conversational experience;
  4. A/B testing framework: Establish an A/B testing mechanism for different retrieval strategies and fusion prompts, using real business data to drive continuous iterative optimization of the hybrid knowledge base.
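For the A/B testing direction, one common building block is deterministic bucketing, so that a given user always sees the same retrieval strategy or fusion prompt within an experiment. The sketch below is a generic illustration under that assumption; the function name and the experiment and variant labels are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants: list[str]) -> str:
    """Deterministically map a user to one variant of an experiment.

    Hashing the experiment name together with the user ID keeps assignments
    stable across requests and independent between experiments.
    """
    if not variants:
        raise ValueError("at least one variant is required")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]
```

A call such as `assign_variant("u42", "fusion_prompt_v2", ["control", "treatment"])` returns the same label every time for that user, which lets outcome metrics be attributed to a single variant.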

6. Deployment Boundaries and Series Continuity

6.1 Deployment Boundaries

This hybrid knowledge base system is deeply adapted for e-commerce intelligent customer service scenarios. Highly regulated industries such as healthcare and finance will need to adjust permission control rules, data synchronization mechanisms, and retrieval strategies to align with their respective compliance requirements. Full production deployment requires customized interface integration and data adaptation with the target business system.

6.2 Series Continuity

  • GitHub repository: Link TBD
  • Backward reference: Builds on all five preceding parts — MVP architecture, data pipeline, GraphRAG service wrapping, multi-agent architecture, and safety guardrail system — completing the system's core capability closure.
  • Next up: Part 7 will focus on production-grade optimization, providing a complete breakdown of LLM inference cost and performance control strategies, upgrading the system from "functional" to "efficient and cost-effective." Stay tuned.
  • Series finale: Part 8 will provide a complete retrospective of all architecture decisions, engineering pitfalls, and quantifiable outcomes from MVP to production-grade system, forming a full end-to-end engineering practice record.
