James Lee
Building an Enterprise-Grade Multi-Agent Customer Service System with LangGraph

1. Why Single-Agent Architectures Break in Real Customer Service

In e-commerce customer service, user requests are rarely simple. A typical message might look like this:

"Can you check the shipping status for order #123, tell me about the warranty policy for that product, and update my delivery address?"

This single message contains three distinct intents, requires two different data sources, and demands coordinated execution. Single-agent architectures fail here in four predictable ways:

  1. No task decomposition: A single agent cannot break compound requests into executable subtasks — it either handles one intent and ignores the rest, or produces confused, incomplete answers.
  2. Fragile tool execution: When an external tool call fails (Neo4j timeout, GraphRAG service unavailable), a single agent enters retry loops with no circuit-breaking mechanism, eventually blocking the entire service.
  3. Siloed retrieval: Structured order data (Neo4j) and unstructured product documentation (GraphRAG) require fundamentally different retrieval strategies. A single agent cannot coordinate both within one coherent response.
  4. No governance layer: Without a dedicated safety node, there is no unified point for rate limiting, content compliance checks, or permission control — none of which are optional in enterprise deployments.

This article builds on the technical foundations from Parts 1–3 (MinerU multimodal parsing, Neo4j knowledge graph construction, GraphRAG service encapsulation) and walks through how we designed a production-grade multi-agent system using LangGraph to solve all four problems above.


2. System Architecture: Three Layers, Six Sub-Layers

The system follows an enterprise-grade layered architecture that fully decouples business logic, technical infrastructure, and platform services.

```
┌─────────────────────────────────────────────────────────────┐
│                  LLM Application Architecture               │
│                                                             │
│  Application Layer:                                         │
│    User Service │ Session Service │ Knowledge Base Service  │
│                                                             │
│  Feature Layer:                                             │
│    Multi-Agent System │ Safety Guardrails                   │
│    Hybrid Knowledge Retrieval │ Offline/Online Index Build  │
│    Text2Cypher Debug                                        │
├─────────────────────────────────────────────────────────────┤
│                  LLM Technical Architecture                 │
│                                                             │
│  Core Capabilities:  Agent │ RAG │ Workflow                 │
│  Framework Layer:    LangChain / LangGraph / GraphRAG       │
│  Interface Layer:    Vue / FastAPI / SSE / Open API         │
├─────────────────────────────────────────────────────────────┤
│                  LLM Platform Architecture                  │
│                                                             │
│  Model Layer:   DeepSeek Online │ vLLM Self-Hosted          │
│  Data Layer:   MySQL │ Redis │ Neo4J │ LanceDB │ Local Disk │
│  Infra Layer:   Cloud Server │ GPU Server │ Docker          │
└─────────────────────────────────────────────────────────────┘
```

2.1 LLM Application Architecture Layer

The Application Layer serves users and frontend clients directly:

  • User Service: Login, registration, identity verification, permission management;
  • Session Service: Conversation lifecycle management, context storage, session state synchronization;
  • Knowledge Base Service: Upload, parsing, and index management for product manuals and after-sales policies — powered by the MinerU multimodal pipeline from Part 2.

The Feature Layer hosts the core business capabilities of the multi-agent system:

  • Multi-Agent System: Full-pipeline agent collaboration covering intent routing, task decomposition, tool execution, and result aggregation;
  • Safety Guardrails: Circuit breaking, timeout control, content compliance checks, and rate limiting;
  • Hybrid Knowledge Retrieval: Unified query interface integrating Neo4j structured retrieval and GraphRAG unstructured retrieval;
  • Offline/Online Index Build: Supports both batch full-index builds and incremental real-time index updates;
  • Text2Cypher Debug: Natural language to Cypher generation with syntax validation and logic correction.

2.2 LLM Technical Architecture Layer

This layer provides standardized technical capabilities for the layers above:

  • Core Capabilities: Agent scheduling, RAG retrieval augmentation, Workflow orchestration;
  • Framework Layer: LangChain/LangGraph for multi-agent workflow orchestration; Microsoft GraphRAG for unstructured knowledge retrieval;
  • Interface Layer: Vue for frontend interaction, FastAPI for backend REST APIs, SSE for streaming responses, Open API for third-party integration.

2.3 LLM Platform Architecture Layer

The infrastructure foundation:

  • Model Layer: Dual-model strategy — DeepSeek online model for general dialogue and intent recognition; vLLM self-hosted deployment for sensitive business data processing;
  • Data Layer: Hybrid storage — MySQL for structured business data, Redis for session state caching, Neo4J for the business knowledge graph, LanceDB for vector data from GraphRAG;
  • Infra Layer: Cloud + GPU servers for compute, Docker for containerized deployment and elastic scaling.

3. Multi-Agent Workflow: Full Pipeline Design

We model the entire multi-agent collaboration as an observable, controllable, and replayable state machine using LangGraph's StateGraph.

```
                      ┌─────────┐
                      │  start  │
                      └────┬────┘
                           │
              ┌────────────▼────────────┐
              │ analyze_and_route_query │
              └────────────┬────────────┘
                           │
              ┌────────────▼────────────┐
              │       route_query       │
              └──┬──────┬──────┬─────┬──┘
                 │      │      │     │
            General  Clarify  Query  Image
                 │      │      │     │
                 │      │  ┌───▼───┐ │
                 │      │  │Planner│ │
                 │      │  └───┬───┘ │
                 │      │  ┌───┼───┐ │
                 │      │  │   │   │ │
                 │      │ T1   T2  T3│
                 │      │  │   │   │ │
                 │      │  └───┼───┘ │
                 └──────┴──────▼─────┘
                               │
                      ┌────────▼────────┐
                      │     Summary     │
                      └────────┬────────┘
                               │
                      ┌────────▼────────┐
                      │  Final Answer   │
                      └────────┬────────┘
                               │
                           ┌───▼───┐
                           │  End  │
                           └───────┘
```

3.1 Entry Node: analyze_and_route_query

The single entry point for all user requests. Responsibilities: receive user input, inject session context, and trigger intent classification.

Design decision: We merged analysis and routing into a single node rather than splitting them. Intent analysis depends on the injected context, so merging eliminates one redundant state read/write cycle and reduces latency.

3.2 Core Decision Node: route_query

The "brain" of the workflow. Uses an LLM to classify user intent into one of four routing branches.

The core challenge in classification design is defining clear boundaries between categories to prevent ambiguous inputs from drifting into the wrong branch. Our approach:

  • Define positive examples for each category (typical expressions that clearly belong);
  • Define negative examples for high-confusion category pairs (inputs that look like A but are actually B);
  • Require the LLM to output only the category name with no explanation, eliminating format parsing failures.

After several iterations, classification accuracy improved from 78% in the initial version to 94%. The biggest gains came from sharpening the boundary between the "clarification" and "knowledge query" categories.
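The three prompt-design rules above can be pinned down in code. Below is a minimal sketch of the classifier contract; the category names, prompt wording, and the choice of "clarify" as the safe fallback are assumptions for illustration, not the production prompt:

```python
# Illustrative route_query classifier contract (category names are assumed).
ROUTES = {"general", "clarify", "knowledge", "image"}

ROUTING_PROMPT = """Classify the user message into exactly one category.
Categories: general, clarify, knowledge, image.
Positive examples:
  general:   "hi", "do you ship internationally?"
  clarify:   "check my order" (no order ID given)
  knowledge: "what is the warranty on order #123's product?"
Negative example (looks like clarify, is knowledge):
  "what info do you need to track a package?" -> knowledge
Output ONLY the category name, nothing else.
User message: {message}"""

def parse_route(raw: str) -> str:
    """Normalize the LLM's raw output; anything unexpected falls back to 'clarify'."""
    label = raw.strip().lower()
    return label if label in ROUTES else "clarify"
```

Because the model is told to emit only the bare category name, `parse_route` is a one-line normalization rather than a fragile JSON or free-text parser.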

3.3 Four Routing Branches

Branch 1: General Query

No external tool calls required. Handled directly via Prompt + LLM.

Use cases: Greetings, small talk, simple rule-based Q&A.

Branch 2: Clarification Query

Core design: Before asking the user for more information, we first run a business relevance check.

  • Relevance check passes → Generate a guided response prompting the user to provide the missing parameter;
  • Relevance check fails → Return a fallback response directing the user to human support.

Design decision: The relevance check is anchored to the Neo4j Schema definition and business scope description — not left to the LLM's free judgment. This keeps the check within well-defined business boundaries and prevents the LLM from over-generalizing.
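To make that anchoring concrete, here is a deliberately simple sketch of a schema-bounded relevance gate. The vocabulary set below is a hypothetical stand-in for terms extracted from the Neo4j schema and business scope description; production code would derive it from the schema rather than hard-code it:

```python
# Hypothetical in-scope vocabulary, assumed to be derived from the Neo4j
# schema labels and the business scope description (not hand-maintained).
SCHEMA_VOCAB = {
    "order", "shipping", "delivery", "warranty", "product",
    "inventory", "refund", "address",
}

def is_business_relevant(query: str) -> bool:
    """True if the query touches at least one in-scope business concept."""
    words = {w.strip(".,?!").lower() for w in query.split()}
    return bool(words & SCHEMA_VOCAB)
```

A query like "Where is my order?" passes the gate and gets a guided prompt for the missing order ID; "Tell me a joke" fails it and is routed to the human-support fallback.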

Branch 3: Image Query

Uses a multimodal model to parse the uploaded image, extract key information, and generate a response.

Use cases: Users uploading product screenshots, order screenshots, or shipping label photos.

Branch 4: Knowledge Query (Core Branch)

The primary processing branch, integrating all technical outputs from Parts 1–3. It runs through three sub-steps:

Sub-step 1: Planner — Task Decomposition

Breaks the user's complex query into multiple subtasks with explicit goals, required tools, and execution order (parallel or sequential).

Design decision: Planner output is enforced as structured JSON with four fields: task_id, task_type, tool, and dependencies. Forcing structured output ensures the downstream tool selection node can parse results unambiguously, eliminating the uncertainty of natural language descriptions.
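A minimal sketch of that four-field contract using Pydantic (which the system already uses for tool results); the specific `task_type` and `tool` values are assumptions for illustration:

```python
from typing import Literal
from pydantic import BaseModel, Field

class SubTask(BaseModel):
    """One planner subtask; the four fields follow the contract described above."""
    task_id: str
    task_type: Literal["structured", "unstructured"]   # assumed value set
    tool: Literal["graphrag_query", "generate_cypher", "predefined_cypher"]
    dependencies: list[str] = Field(default_factory=list)  # empty = parallelizable

class Plan(BaseModel):
    tasks: list[SubTask]
```

Validating the LLM's JSON against this model turns a malformed plan into an immediate, retryable error instead of a silent downstream failure.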

Sub-step 2: Tool Selection and Execution

Based on subtask type, the system routes to one of three tools:

Tool 1 — GraphRAG Query

  • Use cases: Unstructured data retrieval (product specs, warranty policies, product manuals);
  • Connects to the GraphRAG RESTful API encapsulated in Part 3, supporting Local / Global / Drift / Basic search modes;
  • Selection logic: Local Search for precise scoped queries; Global Search for broad conceptual queries.

Tool 2 — Generate Cypher

  • Use cases: Custom structured business data queries (order status, shipping info, delivery address);
  • Connects to the Neo4j knowledge graph from Part 2 via a "Schema injection → LLM generation → syntax validation → execution" pipeline;
  • Key design: Every generated Cypher statement goes through syntax and schema validation before execution. On failure, it retries up to 2 times; beyond that, it falls back to Predefined Cypher matching.

Tool 3 — Predefined Cypher

  • Use cases: High-frequency, fixed structured queries (list all orders, check product inventory);
  • Matches user queries against pre-defined requirement descriptions via similarity search, then fills in parameters and executes directly — no LLM generation required;
  • Design value: Covers approximately 80% of high-frequency query scenarios, pushing accuracy for this segment to near 100% while significantly reducing latency and token consumption.
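The matching step can be sketched as follows. For a self-contained example this uses string similarity from the standard library; the production system would more plausibly match against the pre-defined requirement descriptions with embedding similarity, and the registry contents here are hypothetical:

```python
from difflib import SequenceMatcher

# Hypothetical predefined-query registry: description -> parameterized Cypher.
PREDEFINED = {
    "list all orders for a user":
        "MATCH (u:User {id: $user_id})-[:PLACED]->(o:Order) RETURN o",
    "check product inventory":
        "MATCH (p:Product {sku: $sku}) RETURN p.stock",
}

def match_predefined(query: str, threshold: float = 0.55):
    """Return the best-matching Cypher template, or None to fall through
    to LLM generation. Threshold is an illustrative value."""
    best_desc, best_score = None, 0.0
    for desc in PREDEFINED:
        score = SequenceMatcher(None, query.lower(), desc).ratio()
        if score > best_score:
            best_desc, best_score = desc, score
    return PREDEFINED[best_desc] if best_score >= threshold else None
```

Because the matched template only needs parameter filling, this path skips LLM generation entirely, which is where the latency and token savings come from.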

Sub-step 3: Safety Guardrails

The safety layer intervenes at every stage of tool execution:

  • Pre-execution: Parameter validation, permission checks, call frequency limits;
  • During execution: 10-second timeout per tool call; circuit breaker threshold of 3 tool calls per conversation turn;
  • Post-execution: Result relevance and compliance checks; sensitive information filtering.
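The pre-execution and in-flight checks can be composed into a single wrapper around every tool call. This is a sketch under the stated limits (10-second timeout, 3 calls per turn); the tuple-based return shape is an assumption, not the production interface:

```python
import asyncio

TOOL_TIMEOUT_S = 10      # per-call timeout from the guardrail spec above
MAX_CALLS_PER_TURN = 3   # circuit-breaker threshold per conversation turn

async def guarded_call(tool, args: dict, calls_this_turn: int):
    """Wrap a tool coroutine with frequency limiting and a hard timeout."""
    if calls_this_turn >= MAX_CALLS_PER_TURN:   # pre-execution: frequency limit
        return "circuit_open", None
    try:                                        # during execution: hard timeout
        result = await asyncio.wait_for(tool(**args), timeout=TOOL_TIMEOUT_S)
        return "ok", result
    except asyncio.TimeoutError:
        return "timeout", None
```

A "circuit_open" or "timeout" status routes the subtask to the fallback branch instead of letting a slow dependency block the whole turn.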

3.4 Result Aggregation: Summary Node

Collects results from all branches and subtasks, performs semantic-level merging to resolve conflicts, and formats the output into a coherent, customer-service-appropriate response.

Design decision: Sequential task results are merged in dependency order; parallel task results are merged by business category. Keeping these two strategies separate prevents result ordering from becoming scrambled.
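The two merge strategies can be sketched in a few lines. The dict-shaped result records and the `category` field are assumptions for illustration:

```python
def merge_results(results: list[dict], dependency_order: list[str]) -> list[dict]:
    """Sequential results keep dependency order; parallel ones group by category."""
    sequential = [r for r in results if r.get("dependencies")]
    parallel = [r for r in results if not r.get("dependencies")]
    sequential.sort(key=lambda r: dependency_order.index(r["task_id"]))
    parallel.sort(key=lambda r: r.get("category", ""))
    return sequential + parallel
```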


4. Production-Grade Core Capabilities

4.1 State Persistence and Session Management with LangGraph

We use LangGraph's native Checkpointer mechanism for full-pipeline session state persistence:

  1. Checkpointer: RedisSaver as the backend — after each node executes, the State snapshot is automatically saved to Redis;
  2. Hot/cold storage split: Active session state lives in Redis (hot); on session end or expiry, state syncs to MySQL (cold);
  3. Seamless session recovery: When a user resumes an interrupted conversation, the workflow loads the last State snapshot and continues from the interrupted node;
  4. Long-conversation memory compression: After 10 conversation turns, the LLM automatically summarizes historical dialogue, replacing raw context with a compressed version to reduce token consumption while preserving key semantics.
```python
from langgraph.checkpoint.redis import RedisSaver

# Initialize Redis Checkpointer
checkpointer = RedisSaver.from_conn_string("redis://localhost:6379")

# Inject Checkpointer at workflow compile time
app = workflow.compile(checkpointer=checkpointer)

# Each invocation carries a thread_id for session isolation
config = {"configurable": {"thread_id": session_id}}
result = await app.ainvoke(state, config=config)
```
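The long-conversation compression trigger (point 4) can be sketched as pure logic around the LLM call. The `summarize` callable stands in for the actual summarization prompt, and keeping the last two turns verbatim is an illustrative choice:

```python
COMPRESS_AFTER_TURNS = 10  # threshold from the design above

def maybe_compress(history: list[str], summarize) -> list[str]:
    """Past the threshold, replace all but the most recent turns with a summary."""
    if len(history) <= COMPRESS_AFTER_TURNS:
        return history
    head, tail = history[:-2], history[-2:]
    return [f"[summary] {summarize(head)}"] + tail
```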

4.2 Hybrid Knowledge Retrieval

The core competitive moat of the system, fully building on Parts 2 and 3:

  1. Auto-routing: Planner routes subtasks to Neo4j structured retrieval or GraphRAG unstructured retrieval based on task type; complex tasks run both pipelines in parallel;
  2. Result fusion: The Summary node performs semantic-level merging of results from both pipelines, resolving conflicts before generating the final response;
  3. Graceful degradation: The two retrieval pipelines are fully isolated — a failure in one does not affect the other;
  4. Index synchronization: When structured business data is updated, the system automatically triggers GraphRAG's incremental index update API to keep both knowledge bases consistent.
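The parallel-run-with-isolation behavior (points 1 and 3) maps naturally onto `asyncio.gather` with `return_exceptions=True`, which is one way to guarantee a failed pipeline cannot sink the other. A sketch, assuming each pipeline is exposed as an awaitable:

```python
import asyncio

async def hybrid_retrieve(neo4j_task, graphrag_task):
    """Run both retrieval pipelines in parallel; failures are isolated
    per pipeline rather than propagated."""
    structured, unstructured = await asyncio.gather(
        neo4j_task, graphrag_task, return_exceptions=True
    )
    return {
        "structured": None if isinstance(structured, Exception) else structured,
        "unstructured": None if isinstance(unstructured, Exception) else unstructured,
    }
```

The Summary node then merges whichever side returned data and notes the degraded source instead of failing the whole request.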

4.3 Full-Pipeline Observability

For enterprise production operations:

  1. Distributed tracing: OpenTelemetry instrumentation across the full pipeline — every request is traceable from intent routing through tool execution to final output, with per-node latency and status;
  2. Core metrics: Intent classification accuracy, agent execution success rate, tool call latency/failure rate, average response latency;
  3. Automated alerting: Alerts trigger on tool failure rate exceeding threshold, response latency spikes, and intent classification error rate anomalies.
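For a feel of the per-node latency/status capture, here is a dependency-free stand-in; in production these records would be emitted as OpenTelemetry spans rather than kept in memory:

```python
import time
from collections import defaultdict

class NodeTracer:
    """Minimal per-node latency and status recorder (illustrative stand-in
    for OpenTelemetry instrumentation)."""

    def __init__(self):
        self.records = defaultdict(list)  # node name -> [(latency_ms, status)]

    def record(self, node: str, fn, *args):
        start = time.perf_counter()
        try:
            out = fn(*args)
            status = "ok"
            return out
        except Exception:
            status = "error"
            raise
        finally:
            ms = (time.perf_counter() - start) * 1000
            self.records[node].append((round(ms, 2), status))
```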

5. Production Pitfalls and Solutions

5.1 Agent Tool Call Infinite Loop

Symptom: When a tool call returns an unexpected result, the agent retries the same tool indefinitely with no circuit-breaking mechanism, eventually blocking the entire service.

Root cause: Single-agent architectures have no global call counter. Each retry is an independent decision — the agent has no awareness of how many times it has already retried.

Solution:

```python
from typing import TypedDict

# Maintain a global tool call counter in State
class AgentState(TypedDict):
    messages: list
    tool_call_count: int      # Global call counter
    max_tool_calls: int       # Maximum call threshold

# Add circuit breaker check before tool execution node
def check_circuit_breaker(state: AgentState):
    if state["tool_call_count"] >= state["max_tool_calls"]:
        return "fallback"     # Route to fallback node
    return "execute_tool"     # Proceed normally

# Increment counter after each tool call
def execute_tool(state: AgentState):
    result = call_tool(state)  # call_tool: the project's tool dispatcher
    return {
        **state,
        "tool_call_count": state["tool_call_count"] + 1,
        "tool_result": result
    }
```

By maintaining a global call counter in State and using LangGraph's conditional routing, we eliminate infinite loops at the framework level.

5.2 Low Text2Cypher Generation Accuracy

Symptom: Dynamically generated Cypher statements contain syntax errors or logical deviations, causing Neo4j query failures or incorrect results.

Root cause: The LLM lacks precise understanding of the property graph model and tends to hallucinate node types or relationship types that don't exist in the schema.

Solution:

```python
MAX_RETRY = 2  # retry budget before falling back to Predefined Cypher

async def generate_and_validate_cypher(query: str, schema: dict) -> str:
    for attempt in range(MAX_RETRY):
        # Inject full Schema to anchor generation to the business model
        cypher = await llm.generate_cypher(query, schema)

        # Syntax validation
        if not validate_cypher_syntax(cypher):
            continue

        # Schema validation: check node/relationship types exist
        if not validate_against_schema(cypher, schema):
            continue

        return cypher

    # Exceeded retry threshold: fall back to Predefined Cypher matching
    return await match_predefined_cypher(query)
```

Combined with pre-defined Cypher templates covering ~80% of high-frequency query scenarios, this brings accuracy for that segment to near 100%.

5.3 Chaotic Result Merging in Parallel Multi-Agent Tasks

Symptom: When multiple tools execute in parallel, their results come back in inconsistent formats. The Summary node cannot merge them effectively, producing incoherent or incomplete final answers.

Solution: Define a unified tool output schema that all tools are required to follow:

```python
from typing import Literal, Optional
from pydantic import BaseModel

class ToolResult(BaseModel):
    task_id: str                     # Maps to the subtask ID from Planner
    task_type: str                   # Tool type identifier
    status: Literal["success", "failed", "fallback"]
    result_data: dict                # Actual result payload
    error_msg: Optional[str] = None  # Error details on failure
    latency_ms: int                  # Execution time for performance monitoring
```

The Summary node aggregates results by task_id. Sequential tasks are merged in dependency order; parallel tasks are merged by business category.


6. Results

The following metrics are based on a manually annotated test set of 100 real e-commerce customer service queries, validated under a 1,000-request concurrent load test:

| Metric | Single-Agent | Multi-Agent | Improvement |
| --- | --- | --- | --- |
| Complex query resolution rate | 70% | 92% | ↑ 22 pts |
| Average conversation turns | 8 | 4.5 | ↓ 43.75% |
| Tool call failure rate | 15% | 4% | ↓ 73.3% |
| Session recovery success rate | 60% | 96% | ↑ 36 pts |
| Average response latency | 3.5 s | 1.1 s | ↓ 68.6% |

Business impact:

  • Human escalation rate reduced by 42%, significantly lowering operational costs;
  • User satisfaction score improved to 4.8 / 5;
  • System availability at 99.9%, meeting 24/7 enterprise service requirements.

7. Summary and What's Next

7.1 What We Built

This article delivered a complete enterprise-grade multi-agent customer service system:

  1. Decoupled full-stack architecture: Three-layer, six-sub-layer design with full separation of business logic, technical infrastructure, and platform services;
  2. End-to-end multi-agent workflow: Complete pipeline covering intent routing, task decomposition, tool execution, and result aggregation;
  3. Full series integration: MinerU multimodal parsing (Part 2), Neo4j knowledge graph (Part 2), GraphRAG service encapsulation (Part 3) — all connected end-to-end;
  4. Production-grade reliability: Circuit breaking, full-pipeline safety guardrails, and observability.

GitHub: [link pending] — Tag: v0.7.0-multi-agent-system
