P VIKRAM KISHORE

Posted on Jun 30

Building an Agentic AI Customer Support Platform with LangGraph, RAG, and Gemini

#agents #ai #gemini #rag

When most people think about AI applications, they imagine a simple workflow:

User
   ↓
LLM
   ↓
Answer

That works well for demonstrations.

It doesn't work well for production systems.

Real AI applications need memory, retrieval, guardrails, tool calling, validation, observability, and graceful failure handling.

I wanted to understand how those pieces fit together.

So I built Cloudob Security, an end-to-end agentic AI customer support platform designed to resemble a production enterprise system rather than a chatbot demo.

The Goal

The objective wasn't simply to answer customer questions.

The objective was to design an AI system capable of:

Understanding user intent
Retrieving relevant documentation
Calling external business tools
Validating responses
Detecting unsafe inputs
Escalating conversations when appropriate
Maintaining conversation state

The Architecture

The application consists of several layers.

User
    │
    ▼
Next.js Enterprise Console
    │
    ▼
FastAPI Backend
    │
    ▼
LangGraph Workflow
    │
    ├── Input Validation
    ├── Guardrails
    ├── Intent Classification
    ├── Hybrid RAG Retrieval
    ├── Tool Calling
    ├── Response Validation
    └── Human Escalation

Each node has a single responsibility. The model's output can still vary, but the routing between nodes is explicit, which makes the workflow predictable, observable, and easier to maintain.

Why LangGraph?

Traditional prompt chains are linear.

Customer support rarely is.

A conversation may require:

Multiple retrieval steps
External tool calls
Retry logic
Conditional routing
Human escalation

LangGraph provided a natural way to model these workflows as a stateful graph instead of a sequence of prompts.
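To make the idea of a stateful graph concrete, here is a minimal sketch in plain Python. It deliberately does not use the LangGraph API. The node names, the keyword-based intent rule, the `force_fail` flag, and `MAX_ATTEMPTS` are all assumptions made for illustration, not the project's actual code. Each node takes the shared state and returns an updated copy, and a routing function picks the next node based on that state.

```python
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]
MAX_ATTEMPTS = 2  # assumed retry budget before handing off to a human


def classify_intent(state: State) -> State:
    # Hypothetical keyword rule standing in for an LLM-based classifier.
    text = state["message"].lower()
    return {**state, "intent": "billing" if "invoice" in text else "docs"}


def retrieve_docs(state: State) -> State:
    # Placeholder for a retrieval step.
    return {**state, "context": ["(retrieved passage)"]}


def call_billing_tool(state: State) -> State:
    # Placeholder for an external business tool call.
    return {**state, "context": ["(invoice record)"]}


def validate(state: State) -> State:
    # `force_fail` simulates a response that fails validation.
    grounded = bool(state.get("context")) and not state["force_fail"]
    return {**state, "attempts": state.get("attempts", 0) + 1, "grounded": grounded}


def escalate(state: State) -> State:
    return {**state, "escalated": True}


NODES: Dict[str, Callable[[State], State]] = {
    "classify_intent": classify_intent,
    "retrieve_docs": retrieve_docs,
    "call_billing_tool": call_billing_tool,
    "validate": validate,
    "escalate": escalate,
}


def route(name: str, state: State) -> Optional[str]:
    """Conditional edges: return the next node name, or None to stop."""
    if name == "classify_intent":
        return "call_billing_tool" if state["intent"] == "billing" else "retrieve_docs"
    if name in ("retrieve_docs", "call_billing_tool"):
        return "validate"
    if name == "validate" and not state["grounded"]:
        return "retrieve_docs" if state["attempts"] < MAX_ATTEMPTS else "escalate"
    return None


def run(message: str, force_fail: bool = False) -> State:
    state: State = {"message": message, "force_fail": force_fail, "path": []}
    name: Optional[str] = "classify_intent"
    while name is not None:
        state = NODES[name](state)
        state["path"].append(name)  # record the visited nodes for observability
        name = route(name, state)
    return state
```

Calling `run("Where is my invoice?")` follows the tool branch and stops after validation, while `run("How do I reset MFA?", force_fail=True)` loops back to retrieval once and then escalates. A linear prompt chain cannot express that loop or the branch without extra glue code, which is the gap a graph abstraction fills.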

Building Reliable RAG

One thing I learned quickly is that retrieval quality matters as much as model quality.

Instead of simple semantic search, the project implements:

Hybrid search
Parent-child chunking
Contextual compression
Metadata filtering
Configurable vector stores

Together, these techniques are meant to improve the relevance of the retrieved context before generation begins.
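As a rough illustration of hybrid search, the sketch below ranks documents twice, once by keyword overlap and once by vector similarity, and merges the two rankings with reciprocal rank fusion (RRF). The post does not say how the project combines its results, so RRF is an assumption here. The sample documents are invented, and the bag-of-words cosine is only a stand-in for a real embedding model.

```python
import math
from collections import Counter
from typing import Dict, List

# Invented sample corpus, for illustration only.
DOCS: Dict[str, str] = {
    "d1": "reset your password from the account settings page",
    "d2": "configure single sign-on with your identity provider",
    "d3": "password policy requires rotation every ninety days",
}


def tokens(text: str) -> List[str]:
    return text.lower().split()


def keyword_rank(query: str, docs: Dict[str, str]) -> List[str]:
    """Rank documents by the number of query terms they contain."""
    q = set(tokens(query))
    scores = {doc_id: len(q & set(tokens(text))) for doc_id, text in docs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vector_rank(query: str, docs: Dict[str, str]) -> List[str]:
    """Rank documents by cosine similarity of term-count vectors
    (a stand-in for similarity between real embeddings)."""
    qv = Counter(tokens(query))
    scores = {doc_id: cosine(qv, Counter(tokens(text))) for doc_id, text in docs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


def rrf(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: each ranking contributes 1 / (k + rank) per document."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


def hybrid_search(query: str, docs: Dict[str, str], top_k: int = 2) -> List[str]:
    return rrf([keyword_rank(query, docs), vector_rank(query, docs)])[:top_k]
```

RRF only looks at rank positions, so the keyword and vector scores never need to be normalized onto a common scale. That makes it a convenient default when the two retrievers produce scores in very different ranges.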

Guardrails

A production AI system must defend itself.

The platform includes deterministic checks for:

Prompt injection
Jailbreak attempts
SQL injection patterns
Sensitive requests
Personally identifiable information
Toxicity
Groundedness

If a response isn't sufficiently grounded in the retrieved documents, the system retries or escalates to a human instead of returning potentially misleading information.
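The sketch below shows what a few deterministic checks and a groundedness gate could look like. The regular expressions, the token-overlap score, the 0.7 threshold, and the retry budget are illustrative assumptions, not the project's actual rules. Real guardrails need far broader patterns, and checks such as toxicity are not covered here.

```python
import re
from typing import List

# Illustrative patterns only; a production rule set would be much larger.
INPUT_RULES = {
    "prompt_injection": re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    "sql_injection": re.compile(r"'\s*or\s+1\s*=\s*1|;\s*drop\s+table|union\s+select", re.I),
    "pii_email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def check_input(text: str) -> List[str]:
    """Return the sorted names of every rule the input triggers."""
    return sorted(name for name, pattern in INPUT_RULES.items() if pattern.search(text))


def groundedness(answer: str, passages: List[str]) -> float:
    """Fraction of distinct answer terms that also appear in the retrieved passages."""
    answer_terms = set(re.findall(r"[a-z0-9]+", answer.lower()))
    context_terms = set(re.findall(r"[a-z0-9]+", " ".join(passages).lower()))
    if not answer_terms:
        return 0.0
    return len(answer_terms & context_terms) / len(answer_terms)


def decide(answer: str, passages: List[str], attempt: int,
           max_attempts: int = 2, threshold: float = 0.7) -> str:
    """Respond when grounded; otherwise retry until the budget runs out, then escalate."""
    if groundedness(answer, passages) >= threshold:
        return "respond"
    return "retry" if attempt < max_attempts else "escalate"
```

Because every check is a pure function of its inputs, the same message always produces the same verdict, which keeps these guardrails easy to unit test and audit. Token overlap is a crude proxy for groundedness, so a real system would likely swap in a stronger scorer behind the same `decide` interface.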

Building Beyond the Model

Another design goal was provider independence.

Models, embeddings, and vector databases are abstracted behind factories.

Changing providers becomes a configuration change rather than a rewrite.

The same principle applies to business tools: because they sit behind a common interface, they should be straightforward to expose later through the Model Context Protocol (MCP) or a similar protocol.
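One way to put providers behind a factory is a small registry keyed by provider name. The sketch below is an assumption about the general shape rather than the project's code: `ChatModel`, `register_model`, `create_model`, and the two stub providers are hypothetical names, and the stubs stand in for real clients such as a Gemini wrapper.

```python
from typing import Any, Callable, Dict, Protocol


class ChatModel(Protocol):
    """Interface every provider must satisfy (hypothetical)."""

    def generate(self, prompt: str) -> str: ...


_MODEL_REGISTRY: Dict[str, Callable[..., ChatModel]] = {}


def register_model(name: str):
    """Class decorator that registers a provider under a config name."""
    def decorator(factory: Callable[..., ChatModel]) -> Callable[..., ChatModel]:
        _MODEL_REGISTRY[name] = factory
        return factory
    return decorator


@register_model("echo")
class EchoModel:
    # Stub provider: returns the prompt with a configurable prefix.
    def __init__(self, prefix: str = "echo") -> None:
        self.prefix = prefix

    def generate(self, prompt: str) -> str:
        return f"[{self.prefix}] {prompt}"


@register_model("upper")
class UpperModel:
    # Second stub provider, to show that swapping is config-only.
    def generate(self, prompt: str) -> str:
        return prompt.upper()


def create_model(config: Dict[str, Any]) -> ChatModel:
    """Build a model from config such as {"provider": "echo", "options": {...}}."""
    provider = config["provider"]
    try:
        factory = _MODEL_REGISTRY[provider]
    except KeyError:
        raise ValueError(f"Unknown model provider: {provider!r}") from None
    return factory(**config.get("options", {}))
```

Switching from `{"provider": "echo"}` to `{"provider": "upper"}` changes the behavior without touching any calling code. The same registry pattern can be repeated for embeddings, vector stores, and business tools.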

What I Learned

This project changed how I think about AI engineering.

The LLM isn't the application.

It's one component within a much larger system.

The engineering around the model determines whether an AI application is reliable enough for production.

That includes:

Retrieval
Guardrails
Validation
Tool orchestration
Observability
Evaluation
Error handling
User experience

Building Cloudob Security gave me practical experience designing stateful AI systems rather than simple chatbot interfaces.

What's Next?

There are still several improvements I'd like to make, including:

Live telemetry dashboards
Full MCP server integration
Additional enterprise tools
Expanded evaluation pipelines
Production deployment with Pinecone and LangSmith

The project reinforced an idea that has shaped much of my recent work:

The future of AI engineering isn't about writing better prompts.

It's about building better systems around language models.

DEV Community