How We Built AI Customer Service Into Our WMS Using Claude and Gemini

Cymone Rabbani | Founder & CEO, WMS 360


Why We Added AI to a Warehouse Management System

Most warehouse management systems stop at inventory and shipping. Ours didn't.

At WMS 360, we run a multichannel Node.js SaaS architecture that handles orders from eBay, Amazon, Shopify, and direct webstores. When you're processing thousands of parcels a day, customer service enquiries pile up fast. "Where's my order?" "Can I change my address?" "I want a refund." The same questions, over and over, across half a dozen channels.

We were already sitting on all the data needed to answer these — live tracking, order history, return windows, product specs. The missing piece was an AI layer that could read that data, reason about it, and actually take action. Not a chatbot with canned responses. A genuine AI agent with the ability to modify records.

So we built one. Here is how.

The Architecture

Our stack is Node.js end to end. The core WMS runs on Express with MongoDB, Redis for caching, and RabbitMQ for async job processing. The AI customer service module slots in as a standalone service consuming messages from a shared queue.

The high-level flow:

  1. An inbound message arrives (eBay, Shopify, email) and lands on a RabbitMQ queue.
  2. The AI service picks it up, hydrates the message with order context from MongoDB, and builds a prompt.
  3. The prompt is sent to either Anthropic's Claude API or Google's Gemini API.
  4. The model's response — including any tool calls — is parsed and executed.
  5. The reply is routed back through the originating channel.

We keep the AI service stateless. It pulls everything it needs per request from the database and cache layer, so we can scale horizontally without session affinity. If the service crashes, RabbitMQ simply redelivers the message to another instance.
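The stateless handler can be sketched roughly as follows, with every piece of I/O injected as a dependency so the function holds no session state. The function and field names here are illustrative, not our actual internals:

```javascript
// Minimal sketch of the stateless worker described above. The DB lookup,
// model call, and outbound reply are all injected, so any instance can
// process any message and a crash just means RabbitMQ redelivers it.
async function processInbound(message, deps) {
  const { loadOrderContext, generateReply, sendReply } = deps;

  // Step 2: hydrate the message with order context from the database.
  const context = await loadOrderContext(message);

  // Steps 3-4: build a prompt from that context and get a model response.
  const reply = await generateReply(context, message.body);

  // Step 5: route the reply back through the originating channel.
  await sendReply(message.channel, message.sender, reply);
  return reply;
}
```

Because the dependencies are plain functions, the whole pipeline can be exercised in tests with stubs, no broker or model API required.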

The Anthropic Claude SDK and the Google Generative AI client library are both straightforward to integrate. We wrap them behind a unified provider interface so the rest of the codebase never knows which model is handling a given request.

How the AI Knows About Each Customer

This is where most AI customer service implementations fall short. They bolt a language model onto a generic FAQ and hope for the best. We took a different approach: every prompt is assembled dynamically with live data specific to that customer and order.

When a message comes in, we extract identifiers — order numbers, email addresses, eBay usernames — and pull the relevant records. The system prompt then includes:

  • Order details: items, quantities, prices, dispatch date, carrier, tracking number, current tracking status.
  • The return policy applicable to that order (these vary by channel and product category).
  • Product-specific information — dimensions, variants, common issues.
  • Message history for conversational context.

We format this as labelled sections rather than raw JSON, because both Claude and Gemini produce more accurate responses when context is human-readable rather than deeply nested objects.
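A simplified sketch of that assembly step, with illustrative field names. Note how the tracking number is only ever copied from the database record; if it is absent, the prompt says so explicitly rather than leaving a gap the model might fill:

```javascript
// Render live order data as labelled, human-readable sections rather
// than raw JSON. Field names are illustrative, not our actual schema.
function buildOrderSection(order) {
  const lines = [
    '## Order details',
    `Items: ${order.items.map((i) => `${i.qty}x ${i.name}`).join(', ')}`,
    `Dispatch date: ${order.dispatchDate || 'not yet dispatched'}`,
    `Carrier: ${order.carrier || 'n/a'}`,
  ];
  // Hallucination guard: inject the tracking number from the DB or omit it.
  if (order.trackingNumber) {
    lines.push(`Tracking number: ${order.trackingNumber}`);
    lines.push(`Tracking status: ${order.trackingStatus || 'unknown'}`);
  } else {
    lines.push('Tracking number: NOT AVAILABLE. Do not provide one to the customer.');
  }
  return lines.join('\n');
}

function buildSystemPrompt(order, returnPolicy, history) {
  return [
    buildOrderSection(order),
    '## Return policy',
    returnPolicy,
    '## Recent messages',
    ...history.map((m) => `${m.from}: ${m.text}`),
  ].join('\n\n');
}
```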

One critical design decision: we never let the model guess a tracking number. Tracking numbers are injected from the database or not included at all. This eliminates one of the most dangerous hallucination vectors in ecommerce AI — fabricating a plausible-looking reference that leads nowhere.

Multi-Provider Support: Claude and Gemini

Our Claude and Gemini integrations run side by side in production. This was not an ideological choice. It was pragmatic.

Redundancy. API outages happen. When one provider is down, we fail over automatically. Our provider interface checks response latency and error rates on a rolling window and switches if a threshold is breached.

Cost. For straightforward "where is my order" queries, Gemini's lighter models are cost-effective. For nuanced disputes or complex return scenarios, Claude's reasoning produces better outcomes. We route based on message complexity, estimated from token count and the presence of certain keywords (e.g., "refund", "damaged", "legal").

Speed. Gemini's flash models return faster on simple queries. Claude is more deliberate but handles multi-step reasoning with fewer errors. For a customer asking about a delayed international parcel with a customs hold, Claude's structured thinking produces a more accurate and empathetic response.

The provider abstraction is thin — roughly 200 lines. Each module exposes the same interface: generateResponse(systemPrompt, messages, tools). The routing logic sits above it and is easy to adjust as pricing and capabilities evolve.
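The routing heuristic might look something like this. The thresholds, keyword list, and the token estimate are assumptions for the sake of the example; the real logic also factors in the rolling latency and error-rate window described above:

```javascript
// Illustrative sketch of the routing layer that sits above the provider
// abstraction. Values here are assumptions, not our production tuning.
const ESCALATION_KEYWORDS = ['refund', 'damaged', 'legal', 'dispute'];

function estimateTokens(text) {
  // Rough heuristic: about four characters per token for English text.
  return Math.ceil(text.length / 4);
}

function pickProvider(messageText, health = { claude: true, gemini: true }) {
  const complex =
    estimateTokens(messageText) > 150 ||
    ESCALATION_KEYWORDS.some((k) => messageText.toLowerCase().includes(k));

  // Prefer Claude for complex reasoning and Gemini's lighter models for
  // simple queries, but fail over if the preferred provider is unhealthy.
  const preferred = complex ? 'claude' : 'gemini';
  const fallback = complex ? 'gemini' : 'claude';
  return health[preferred] ? preferred : fallback;
}
```

Keeping this as a pure function over the message text and a health snapshot makes it trivial to retune as pricing and model capabilities shift.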

Tool Calling: AI That Actually Does Things

This is what transforms the system from a smart auto-responder into a genuine AI agent. Both Claude and Gemini support tool calling, where the model can request that your application execute a specific function.

We expose a curated set of tools:

  • updateShippingAddress — modifies the delivery address on an undispatched order.
  • initiateReturn — creates a return authorisation and generates a return label.
  • issueRefund — processes a partial or full refund within policy limits.
  • escalateToHuman — flags the conversation for manual review.
  • lookupAlternativeProduct — searches stock for a similar item if the original is unavailable.

Each tool has strict parameter validation and business rule checks server-side. The AI cannot issue a refund exceeding the order value. It cannot change an address on a dispatched parcel. These guardrails are enforced in code, not in the prompt. Prompts can be jailbroken; server-side validation cannot.
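A condensed sketch of that validation layer, covering two of the rules mentioned above. Tool and field names mirror the list above but the check logic is a simplification of the real business rules:

```javascript
// Server-side guardrails on tool calls. A jailbroken prompt can make the
// model *request* anything; this validator decides what actually executes.
function validateToolCall(call, order) {
  switch (call.name) {
    case 'issueRefund': {
      const amount = Number(call.args.amount);
      if (!Number.isFinite(amount) || amount <= 0) {
        return { ok: false, reason: 'Refund amount must be a positive number' };
      }
      if (amount > order.total) {
        return { ok: false, reason: 'Refund exceeds order value' };
      }
      return { ok: true };
    }
    case 'updateShippingAddress':
      if (order.dispatched) {
        return { ok: false, reason: 'Cannot change address on a dispatched parcel' };
      }
      return { ok: true };
    case 'escalateToHuman':
      return { ok: true }; // escalation is always allowed
    default:
      return { ok: false, reason: `Unknown tool: ${call.name}` };
  }
}
```

Rejections are returned to the model as tool results, so it can explain the policy to the customer or escalate instead of silently failing.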

The tool-calling pattern also creates a clean audit trail. Every action is logged with the full tool call payload, the model's reasoning, and the outcome.

Channel Integration: One Pipeline, Many Channels

Multichannel is core to WMS 360. Our users sell on eBay, Amazon, Shopify, WooCommerce, and their own webstores. Messages arrive through eBay's messaging API, Shopify's chat widget, and email via Zoho Mail.

Each channel has an inbound adapter that normalises messages into a common format: sender identity, message body, associated order, and channel metadata. This normalised message is what hits the RabbitMQ queue.

On the outbound side, each channel has a reply adapter. eBay has strict rules about message content (no external links, no contact details). Shopify chat expects short, conversational replies. Email allows longer responses. The AI generates a single response, and the channel adapter reformats it as needed.

Adding a new channel means writing two adapters — inbound and outbound — without touching the AI logic at all.
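The outbound side can be sketched as a map of per-channel formatters. The specific constraints shown (link stripping for eBay, a length cap for chat) are illustrative simplifications of each platform's actual rules:

```javascript
// One AI response in, one channel-appropriate reply out. Each adapter
// encodes a channel's formatting rules; adding a channel means adding
// an entry here plus its inbound counterpart.
const outboundAdapters = {
  ebay(reply) {
    // eBay restricts external links in messages; strip them defensively.
    return reply.replace(/https?:\/\/\S+/g, '[link removed]');
  },
  shopifyChat(reply) {
    // Chat expects short, conversational replies; truncate long ones.
    return reply.length > 300 ? reply.slice(0, 297) + '...' : reply;
  },
  email(reply) {
    // Email tolerates full-length responses unchanged.
    return reply;
  },
};

function formatForChannel(channel, reply) {
  const adapter = outboundAdapters[channel];
  if (!adapter) throw new Error(`No outbound adapter for channel: ${channel}`);
  return adapter(reply);
}
```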

What We Learned

Prompt engineering for ecommerce is its own discipline. Generic "be helpful" instructions produce generic responses. We spent weeks refining prompts that handle the specific tensions of ecommerce: frustrated customers, policies that seem unfair, carriers that lose parcels. The system prompt includes guidance on tone, escalation triggers, and when to apologise versus explain.

Edge cases are where the value is. Any system can answer "where is my order" when tracking shows delivery. The hard cases are partial deliveries, items marked delivered but not received, and international orders stuck in customs. We maintain a growing library of edge case examples in our prompt templates.

Preventing hallucination requires architecture, not just prompting. Telling the model "do not make up tracking numbers" helps. But withholding data it shouldn't fabricate and validating every tool call server-side is what makes the system production-safe.

Temperature matters more than you think. We run at 0.2-0.3 for customer service. Creative responses are the last thing you want when someone is asking about a missing parcel.

Results

After six months in production:

  • 93%+ of inbound messages resolved without human intervention. The remaining 7% are escalated for judgement calls beyond policy.
  • Median response time under 30 seconds, compared to 4-8 hours when handled manually.
  • Customer satisfaction scores up 18% across channels with AI responses active.
  • Support staff now focus on complex cases and proactive outreach.

The system handles roughly 12,000 messages per week, and the combined API cost is a fraction of equivalent human staffing.

Wrapping Up

AI customer service is not about replacing humans. It is about making sure a customer asking "where is my parcel" at 2 AM on a Sunday gets an accurate answer in 30 seconds rather than waiting until Monday morning.

If you are building something similar, the key takeaways: invest in your data layer before your AI layer, enforce business rules in code rather than prompts, and treat multi-provider support as insurance you will eventually need.

We are continuing to develop these capabilities at WMS 360. If you have questions about the architecture or want to discuss AI customer service patterns, I am happy to chat in the comments.


Cymone Rabbani -- Founder of WMS 360. Building multichannel warehouse management with AI. Node.js, Redis, RabbitMQ, Claude AI, Gemini.
