1. Introduction: Four Core Pain Points of Single-Agent Architecture in Customer Service
In e-commerce customer service scenarios, user requests are often complex and multi-dimensional. A typical user message might look like this:
"Check the shipping status of Order #123, look up the after-sales warranty policy for this product, and update my delivery address."
This single message contains three independent intents, requires two different data sources, and demands coordinated execution. Single-agent architecture exposes four unavoidable pain points in scenarios like this:
- No complex task decomposition: A single agent cannot break down composite requests into executable subtasks — it either handles only one intent or produces a confused, incomplete response;
- Poor tool call robustness: When an external tool fails (Neo4j timeout, GraphRAG service unavailable), a single agent falls into an infinite retry loop with no circuit-breaking mechanism, blocking the entire service;
- Fragmented multi-source retrieval: Structured order data (Neo4j) and unstructured product documentation (GraphRAG) require completely different retrieval strategies — a single agent cannot coordinate both within a single response;
- No end-to-end governance: Without a unified safety control node, there is no way to implement circuit breaking, content compliance checks, or permission management — failing to meet enterprise-grade compliance requirements.
This article builds on the technical foundations from the first three parts (MinerU multimodal parsing, Neo4j knowledge graph, GraphRAG service wrapping) to present a complete walkthrough of building an enterprise-grade multi-agent system with LangGraph — solving all four pain points through a layered decoupled architecture, precise intent routing, and end-to-end safety governance.
2. Full-Stack System Architecture
The system adopts a three-tier macro architecture with six decoupled sub-layers, fully isolating the underlying infrastructure from the upper-layer business application.
```
┌─────────────────────────────────────────────────────────┐
│              LLM Application Architecture Layer          │
│                                                          │
│ Application: User Service │ Session Service │ KB Service │
│                                                          │
│ Function: Multi-Agent │ Safety Guardrails │ Hybrid KB    │
│           Retrieval │ Offline/Online Index Build │       │
│           Text2Cypher Debug                              │
├─────────────────────────────────────────────────────────┤
│               LLM Technical Architecture Layer           │
│                                                          │
│ Core:      Agent │ RAG │ Workflow                        │
│ Framework: LangChain / LangGraph / Microsoft GraphRAG    │
│ Interface: Vue / FastAPI / SSE / Open API                │
├─────────────────────────────────────────────────────────┤
│               LLM Platform Architecture Layer            │
│                                                          │
│ Model: DeepSeek Online │ vLLM Private Deployment         │
│ Data:  MySQL │ Redis │ Neo4j │ LanceDB │ Local Disk      │
│ Infra: Cloud Server │ GPU Server │ Docker Platform       │
└─────────────────────────────────────────────────────────┘
```
2.1 LLM Application Architecture Layer
The application layer faces users and the frontend directly, comprising three core modules:
- User Service: Login, registration, identity verification, and permission management;
- Session Service: Conversation lifecycle management, context storage, and session state synchronization;
- Knowledge Base Service: Upload, parsing, and index management for product manuals and after-sales policies, integrated with the MinerU multimodal parsing capability from Part 2.
The function layer is the core business capability carrier of the multi-agent system:
- Multi-Agent Architecture: End-to-end coordination covering intent routing, task decomposition, tool execution, and result aggregation;
- Safety Guardrails: Circuit breaking, timeout control, content compliance checks, and request rate limiting;
- Hybrid Knowledge Base Retrieval: Unified query entry point integrating Neo4j structured retrieval and GraphRAG unstructured retrieval;
- Offline/Online Index Construction: Supports batch offline full indexing and real-time incremental updates for streaming data;
- Text2Cypher Debugging: Natural language to Cypher generation, syntax validation, and logic correction.
2.2 LLM Technical Architecture Layer
This layer provides standardized technical capabilities to the upper business layers:
- Core capability layer: Three capability units — Agent scheduling, RAG retrieval augmentation, and Workflow orchestration;
- Framework layer: LangChain/LangGraph for multi-agent workflow orchestration; Microsoft GraphRAG for unstructured knowledge base retrieval;
- Interface layer: Vue frontend, FastAPI backend, SSE streaming responses, and Open API standardized integration.
2.3 LLM Platform Architecture Layer
This layer provides compute, storage, and model capabilities:
- Model layer: Dual-model strategy — DeepSeek online model for general conversation and intent recognition; vLLM private deployment for sensitive business data processing;
- Data layer: Hybrid storage — MySQL for structured business data, Redis for session state caching, Neo4J for the business knowledge graph, LanceDB for vector data;
- Infrastructure layer: Cloud server + GPU server compute foundation with Docker-based containerized deployment.
3. Multi-Agent Workflow: End-to-End Design
Based on LangGraph's StateGraph, the multi-agent collaboration process is abstracted into an observable, governable, and traceable state machine.
```
          ┌─────────┐
          │  Start  │
          └────┬────┘
               │
  ┌────────────▼────────────┐
  │ analyze_and_route_query │
  └────────────┬────────────┘
               │
  ┌────────────▼────────────┐
  │       route_query       │
  └──┬──────┬───────┬────┬──┘
     │      │       │    │
 General Clarify  Query Image
     │      │       │    │
     │      │   ┌───▼───┐│
     │      │   │Planner││
     │      │   └───┬───┘│
     │      │   ┌───┼───┐│
     │      │   │   │   ││
     │      │ Tool1 Tool2 Tool3
     │      │   │   │   ││
     │      │   └───┼───┘│
     │      │       │    │
     └──────┴───────▼────┘
                │
          ┌─────▼─────┐
          │  Summary  │
          └─────┬─────┘
                │
        ┌───────▼──────┐
        │ Final Answer │
        └───────┬──────┘
                │
            ┌───▼───┐
            │  End  │
            └───────┘
```
3.1 Entry Node: analyze_and_route_query
The sole entry point for all user requests. Core responsibilities: receive user input, inject context, and trigger intent classification.
Design decision: Analysis and routing are merged into a single node rather than split into two. The reason is that intent analysis depends on the result of context injection — merging eliminates one state read/write cycle and reduces latency.
3.2 Core Decision Node: route_query
This is the "brain" of the entire workflow. It uses an LLM to perform precise intent classification and routes user requests to one of four processing branches.
The core challenge in classification design is defining clear boundaries between categories to prevent classification drift in ambiguous scenarios. Our approach: define classification boundaries using positive/negative sample contrast. After multiple iterations, classification accuracy improved from 78% in the initial version to 94%.
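Even with well-defined boundaries, the classifier's raw output still needs defensive handling before routing. The following is a minimal sketch of one way to normalize the LLM's label into a branch name; the label strings and the fall-back-to-clarification choice are illustrative assumptions, not the system's actual implementation.

```python
# Normalize the raw label returned by the classification LLM into one of the
# four routing branches; anything unrecognized falls back to "clarify" so the
# system asks the user rather than guessing. Label names are assumptions.
VALID_ROUTES = {"general", "clarify", "query", "image"}

def normalize_route(raw_label: str) -> str:
    label = raw_label.strip().lower()
    # Tolerate common LLM formatting noise: quotes, backticks, trailing periods
    label = label.strip('"\'` .')
    if label in VALID_ROUTES:
        return label
    # Ambiguous or invalid output: route to clarification instead of guessing
    return "clarify"
```

Treating every unrecognized label as a clarification request is what keeps classification drift from silently sending a request down the wrong branch.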
3.3 Four Branch Processing Logic
Branch 1: General Q&A
No external tools required. A response is generated directly via Prompt + LLM.
Use cases: Small talk, greetings, simple rule-based Q&A.
Branch 2: Clarification Required
Core design: Before prompting the user to provide more information, a business relevance check is performed first.
- Relevance check passes → Generate a guided response prompting the user to supply the required parameters;
- Relevance check fails → Return a fallback response directing the user to contact a human agent.
Design decision: The relevance check is anchored to the Neo4j Schema definition and business scope description — not left to free-form LLM judgment. This binds the check result to explicit business boundaries and prevents the LLM from over-generalizing.
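A schema-anchored relevance check can be as simple as testing whether the query touches any vocabulary the graph actually defines. This is a minimal sketch under that assumption; the schema field names (`node_labels`, `relationship_types`, `properties`) and the single-term-overlap heuristic are illustrative, not the production rule.

```python
# Relevance check anchored to the Neo4j schema rather than free-form LLM
# judgment: the query is in scope only if it mentions a schema-defined term.
# Schema key names and the overlap heuristic are illustrative assumptions.
def is_business_relevant(query: str, schema: dict) -> bool:
    vocabulary = set()
    for key in ("node_labels", "relationship_types", "properties"):
        vocabulary.update(term.lower() for term in schema.get(key, []))
    words = query.lower().split()
    # In scope if at least one word matches the graph's declared vocabulary
    return any(word.strip(",.?!") in vocabulary for word in words)
```

Because the vocabulary is derived from the schema, widening or narrowing the business boundary is a data change, not a prompt change.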
Branch 3: Image Q&A
A multimodal LLM parses the image content, extracts key information, and generates the corresponding response.
Use cases: Users uploading screenshots of products, orders, or shipping information to ask questions.
Branch 4: Query Q&A (Core Branch)
This is the system's core processing branch, integrating all technical outputs from the first three parts. It consists of three sub-steps:
Step 1: Planner — Task Decomposition
Decomposes the user's complex query into multiple subtasks that can be executed in parallel or in sequence, specifying the goal, required tool, and execution order for each subtask.
Design decision: The Planner's output format is strictly defined as structured JSON. Core field design:
| Field | Description |
|---|---|
| `task_id` | Unique subtask identifier for ordered result aggregation |
| `task_type` | Subtask type identifier for routing to the corresponding tool |
| `tool` | The tool type required for the subtask |
| `dependencies` | Dependency relationships controlling parallel/sequential execution order |
Enforcing structured output ensures that the downstream tool selection node can parse results unambiguously, eliminating the uncertainty introduced by natural language descriptions.
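For the composite request from the introduction, a plan in this format might look like the sketch below, together with a validator that rejects malformed output before it reaches tool selection. The task names, tool identifiers, and validator are illustrative assumptions built only on the four fields above.

```python
# A hypothetical Planner output for "check shipping status, look up the
# warranty policy, update my delivery address", using the four fields above.
plan = [
    {"task_id": "t1", "task_type": "order_status",
     "tool": "predefined_cypher", "dependencies": []},
    {"task_id": "t2", "task_type": "policy_lookup",
     "tool": "graphrag_query", "dependencies": []},
    {"task_id": "t3", "task_type": "address_update",
     "tool": "generate_cypher", "dependencies": ["t1"]},
]

REQUIRED_FIELDS = {"task_id", "task_type", "tool", "dependencies"}

def validate_plan(tasks: list) -> bool:
    """Reject plans with missing fields or dangling dependency references."""
    ids = {t.get("task_id") for t in tasks}
    for t in tasks:
        if not REQUIRED_FIELDS <= t.keys():
            return False
        # Every dependency must point at a task_id that exists in the plan
        if any(dep not in ids for dep in t["dependencies"]):
            return False
    return True
```

Validating the plan at this boundary means a malformed LLM response triggers regeneration instead of a downstream tool-selection failure.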
Step 2: Tool Selection and Execution
Based on subtask type, requests are automatically routed to one of three tools:
Tool 1: GraphRAG Query
- Use cases: Unstructured data queries (product specifications, after-sales policies, product manuals);
- Integrates the GraphRAG RESTful API wrapped in Part 3, supporting Local / Global / Drift / Basic retrieval modes;
- Tool selection logic: Retrieval mode is automatically selected based on query scope and depth — Local Search for precise local queries, Global Search for broad conceptual queries.
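The mode-selection logic described above could be sketched as a simple heuristic; the keyword list and the default to Local Search are assumptions for illustration, and a production system would more likely classify with the LLM itself.

```python
# Heuristic sketch: pick a GraphRAG retrieval mode from query wording.
# Keyword hints and the Local-Search default are illustrative assumptions.
GLOBAL_HINTS = ("overall", "compare", "summary", "across", "trend")

def select_graphrag_mode(query: str) -> str:
    q = query.lower()
    # Broad conceptual questions span many documents -> Global Search
    if any(hint in q for hint in GLOBAL_HINTS):
        return "global"
    # Precise, entity-anchored questions -> Local Search
    return "local"
```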
Tool 2: Generate Cypher
- Use cases: Custom queries on structured business data (order status, shipping information, delivery address);
- Integrates the Neo4j knowledge graph from Part 2, converting natural language to Cypher via a "Schema injection → LLM generation → syntax validation → execution" pipeline;
- Key design: Generated Cypher is mandatorily validated for syntax and logic. On failure, it is sent back to the LLM for regeneration, with a maximum of 2 retries before falling back to a predefined result.
Tool 3: Predefined Cypher
- Use cases: High-frequency, fixed structured queries (list all orders, check product inventory);
- Matches the user query against predefined requirement descriptions by similarity, then directly fills in parameters and executes — no dynamic LLM generation required;
- Design value: Covers approximately 80% of high-frequency query scenarios, pushing accuracy for this segment to near 100% while significantly reducing latency and token consumption.
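The similarity matching step can be sketched with `difflib` from the standard library; the template descriptions, Cypher statements, and the 0.6 threshold below are illustrative assumptions, not the system's actual templates.

```python
import difflib

# Sketch: match the user query against predefined requirement descriptions by
# similarity; below the threshold we return None and the caller falls through
# to dynamic generation. Templates and threshold are illustrative.
PREDEFINED = {
    "list all orders for a user":
        "MATCH (u:User {id: $uid})-[:PLACED]->(o:Order) RETURN o",
    "check product inventory":
        "MATCH (p:Product {sku: $sku}) RETURN p.stock",
}

def match_predefined(query: str, threshold: float = 0.6):
    best_desc, best_score = None, 0.0
    for desc in PREDEFINED:
        score = difflib.SequenceMatcher(None, query.lower(), desc).ratio()
        if score > best_score:
            best_desc, best_score = desc, score
    if best_score < threshold:
        return None  # no confident match: fall back to dynamic generation
    return PREDEFINED[best_desc]
```

Because matched queries skip the LLM entirely, this path trades generality for deterministic accuracy and low latency, which is exactly the right trade for the high-frequency head of the query distribution.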
Step 3: Safety Governance
Safety guardrails are active throughout the entire tool execution lifecycle:
- Pre-execution: Validate parameter legality, user permissions, and call frequency;
- During execution: Timeout control (configurable threshold per tool call) and circuit breaking (configurable maximum tool calls per conversation turn);
- Post-execution: Validate relevance and compliance of returned results; filter sensitive information.
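The during-execution timeout control above can be sketched with `asyncio.wait_for`, returning a fallback result instead of blocking the conversation turn. The timeout value, result shape, and the two stand-in tools are illustrative assumptions.

```python
import asyncio

# Sketch: per-call timeout guard around tool execution. A timed-out call
# degrades to a fallback result instead of blocking the whole turn.
async def guarded_tool_call(tool, payload, timeout_s: float = 5.0):
    try:
        result = await asyncio.wait_for(tool(payload), timeout=timeout_s)
        return {"status": "success", "result_data": result}
    except asyncio.TimeoutError:
        return {"status": "fallback", "result_data": None}

# Stand-in tools for illustration only
async def fast_tool(payload):
    return payload

async def slow_tool(payload):
    await asyncio.sleep(10)  # simulates a hung Neo4j / GraphRAG call
    return payload
```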
3.4 Result Aggregation: Summary Node
Collects execution results from all branches and subtasks, performs semantic-level fusion, resolves information conflicts, and organizes the output into logically coherent content that conforms to customer service language standards.
Design decision: Sequential tasks are merged in dependency order; parallel tasks are merged by business logic category. The two merge strategies are handled separately to prevent result ordering issues.
4. Production-Grade Core Capabilities
4.1 LangGraph-Based State Persistence and Session Management
Using LangGraph's native Checkpointer mechanism, we implement full-lifecycle session state persistence:
- Checkpointer: Uses `RedisSaver` as the backend; after each node completes, the State snapshot is automatically saved to Redis;
- Hot/cold storage separation: Active session state is stored in Redis (hot data); upon session end, data is automatically synced to MySQL (cold data);
- Seamless session recovery: When a user resumes an interrupted conversation, the state snapshot is loaded directly from the Checkpointer, restoring execution to the interrupted node;
- Long-conversation memory compression: When a conversation exceeds 10 turns, the LLM is automatically invoked to summarize and compress the conversation history, reducing token consumption while preserving core semantics.
```python
from langgraph.checkpoint.redis import RedisSaver

# Initialize Redis Checkpointer
checkpointer = RedisSaver.from_conn_string("redis://localhost:6379")

# Inject Checkpointer when compiling the workflow
app = workflow.compile(checkpointer=checkpointer)

# Carry thread_id on each call for session isolation
config = {"configurable": {"thread_id": session_id}}
result = await app.ainvoke(state, config=config)
```
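The 10-turn compression trigger described above can be sketched as a windowing function; the summarizer is stubbed out here (the real system invokes the LLM), and the number of verbatim recent turns kept is an illustrative assumption.

```python
# Sketch of the long-conversation compression trigger: beyond MAX_TURNS,
# older turns are collapsed into a single summary message. The summarize
# stub stands in for the LLM call; KEEP_RECENT is an assumed setting.
MAX_TURNS = 10
KEEP_RECENT = 4  # how many recent turns survive verbatim (illustrative)

def compress_history(turns: list,
                     summarize=lambda ts: {"role": "system",
                         "content": f"[summary of {len(ts)} earlier turns]"}):
    if len(turns) <= MAX_TURNS:
        return turns  # under the threshold: keep full history
    older, recent = turns[:-KEEP_RECENT], turns[-KEEP_RECENT:]
    # Replace older turns with one summary message, preserving core semantics
    return [summarize(older)] + recent
```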
4.2 Hybrid Knowledge Base Collaborative Retrieval
This is the system's core competitive moat, fully integrating the technical outputs of Parts 2 and 3:
- Automatic routing: The Planner automatically routes to Neo4j structured retrieval or GraphRAG unstructured retrieval based on subtask type; complex tasks invoke both pipelines in parallel;
- Result fusion: The Summary module performs semantic-level fusion of results from both pipelines, resolving information conflicts;
- Fallback isolation: The two retrieval pipelines are fully isolated — a failure in one does not affect the other;
- Index synchronization: When structured business data is updated, the GraphRAG incremental index update API is automatically triggered to ensure data consistency.
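The parallel invocation with fallback isolation described above can be sketched with `asyncio.gather(..., return_exceptions=True)`, so a failure in one pipeline surfaces as a value rather than cancelling the other. The two pipeline functions below are stand-ins, with the GraphRAG one deliberately failing to show the isolation.

```python
import asyncio

# Stand-in retrieval pipelines; graphrag_search fails on purpose to
# demonstrate that the Neo4j result still comes back intact.
async def neo4j_search(query):
    return {"source": "neo4j", "rows": ["order #123: shipped"]}

async def graphrag_search(query):
    raise RuntimeError("GraphRAG service unavailable")

async def hybrid_retrieve(query):
    results = await asyncio.gather(
        neo4j_search(query),
        graphrag_search(query),
        return_exceptions=True,  # failures become values, not cancellations
    )
    # Keep successful results; a failed pipeline simply contributes nothing
    return [r for r in results if not isinstance(r, Exception)]
```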
4.3 End-to-End Observability
Designed for enterprise production operations requirements:
- Distributed tracing: Full-pipeline instrumentation based on OpenTelemetry, enabling end-to-end latency and status tracking from intent routing to final output;
- Core metrics monitoring: Intent classification accuracy, Agent execution success rate, tool call latency/failure rate, average response latency;
- Anomaly alerting: Automated alerts for scenarios such as execution failure rate exceeding threshold or response latency breaching SLA.
5. Production Pitfalls and Solutions
5.1 Agent Tool Call Infinite Loop
Symptom: When a tool call returns an unexpected result, the Agent repeatedly retries the same tool, entering an infinite loop and blocking the entire service for a single user request.
Root cause: Single-agent architecture has no global call counter — each retry is an independent decision, and the Agent has no awareness of how many retries have already occurred.
Solution:
```python
from typing import TypedDict

# Maintain a global tool call counter in State
class AgentState(TypedDict):
    messages: list
    tool_call_count: int  # Global call counter
    max_tool_calls: int   # Configurable threshold based on your SLA

# Add circuit breaker check before the tool execution node
def check_circuit_breaker(state: AgentState):
    if state["tool_call_count"] >= state["max_tool_calls"]:
        return "fallback"      # Route to fallback node
    return "execute_tool"      # Proceed with normal execution

# Increment counter after each tool call
def execute_tool(state: AgentState):
    result = call_tool(state)
    return {
        **state,
        "tool_call_count": state["tool_call_count"] + 1,
        "tool_result": result,
    }
```
By maintaining a global call counter in State and combining it with LangGraph's conditional routing, the infinite loop problem is resolved at the framework level.
5.2 Low Text2Cypher Generation Accuracy
Symptom: Dynamically generated Cypher statements contain syntax errors or logical deviations, causing Neo4j queries to fail or return incorrect results.
Root cause: The LLM has an imprecise understanding of Neo4j's property graph model and tends to hallucinate non-existent node types or relationship types.
Solution:
```python
async def generate_and_validate_cypher(
    query: str,
    schema: dict,
    max_retries: int = 3,
) -> str:
    for attempt in range(max_retries):
        # Inject full Schema to anchor the business model
        cypher = await llm.generate_cypher(query, schema)
        # Syntax validation
        if not validate_cypher_syntax(cypher):
            continue
        # Logic validation: check node/relationship types exist in Schema
        if not validate_against_schema(cypher, schema):
            continue
        return cypher
    # Exceeded retry threshold — fall back to Predefined Cypher matching
    return await match_predefined_cypher(query)
```
Additionally, Cypher templates are predefined for the 80% of high-frequency query scenarios, pushing accuracy for this segment to near 100%.
5.3 Disordered Result Merging in Parallel Multi-Agent Tasks
Symptom: Results returned by multiple tools executing in parallel have inconsistent formats, preventing the Summary module from effectively integrating them and causing logical incoherence in the final response.
Solution: Define a unified tool output schema. All tool return results are required to conform to the same structure:
| Field | Description |
|---|---|
| `task_id` | Corresponds to the subtask ID generated by the Planner |
| `task_type` | Tool type identifier |
| `status` | Execution status (success / failure / fallback) |
| `result_data` | Actual result data |
| `error_msg` | Error information on failure |
| `latency_ms` | Execution latency for performance monitoring |
The Summary node performs ordered aggregation based on task_id: sequential tasks are merged in dependency order; parallel tasks are merged by business logic category.
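Dependency-ordered merging can be sketched with `graphlib` from the standard library. This assumes each result carries the Planner's `dependencies` alongside its `task_id` (a field not in the unified output schema above, added here for illustration).

```python
from graphlib import TopologicalSorter

# Sketch: order sequential task results by their dependency graph before
# merging. Assumes each result carries task_id plus the Planner's
# dependencies list (an illustrative extension of the output schema).
def order_results(results: list) -> list:
    deps = {r["task_id"]: set(r.get("dependencies", [])) for r in results}
    order = list(TopologicalSorter(deps).static_order())
    by_id = {r["task_id"]: r for r in results}
    return [by_id[tid] for tid in order if tid in by_id]
```

Parallel tasks (no dependencies between them) come out in a stable arbitrary order and can then be grouped by business category before the final LLM fusion pass.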
6. Production Results
The following data is based on a manually annotated test set of 100 real complex e-commerce customer service queries (annotated by 3 customer service domain experts; inter-annotator agreement Cohen's Kappa = 0.87) and validated through 1,000-round concurrent load testing:
| Metric | Single-Agent | Multi-Agent | Improvement |
|---|---|---|---|
| Complex query resolution rate | 70% | 92% | ↑ 22 pp |
| Average conversation turns | 8 | 4.5 | ↓ 43.75% |
| Tool call failure rate | 15% | 4% | ↓ 73.3% |
| Session recovery success rate | 60% | 96% | ↑ 36 pp |
| Average response latency | 3.5s | 1.1s | ↓ 68.6% |
Core business impact:
- Human agent escalation rate reduced by 42%, significantly lowering operational costs;
- User satisfaction score improved to 4.8 / 5;
- System availability reached 99.9%, meeting 24/7 enterprise-grade service requirements.
7. Deployment Boundaries and Series Continuity
7.1 Deployment Boundaries
This multi-agent architecture is optimized for complex task handling in e-commerce scenarios. Domains such as healthcare and finance will need to adjust intent classification boundaries and safety policies to fit their own business requirements. Production-grade iteration should supplement additional safety guardrails and disaster recovery mechanisms.
7.2 Series Continuity
- GitHub repository: llm-customer-service (Tag: `v1.0.0-multi-agent`)
- Backward reference: Builds on Part 3 GraphRAG Service Wrapping, addressing the four core pain points of single-agent architecture.
- Next up: Part 5 will focus on the production-grade LLM application safety guardrail system, covering Prompt injection defense, privilege escalation interception, hallucination validation, and more. Stay tuned.
- Series finale: Part 8 will provide a complete retrospective of all architecture decisions, engineering pitfalls, and quantifiable outcomes from MVP to production-grade system, forming a full end-to-end engineering practice record.