When most people think about AI applications, they imagine a simple workflow:
User
↓
LLM
↓
Answer
That works well for demonstrations.
It doesn't work well for production systems.
Real AI applications need memory, retrieval, guardrails, tool calling, validation, observability, and graceful failure handling.
I wanted to understand how those pieces fit together.
So I built Cloudob Security, an end-to-end agentic AI customer support platform designed to resemble a production enterprise system rather than a chatbot demo.
The Goal
The objective wasn't simply to answer customer questions.
The objective was to design an AI system capable of:
- Understanding user intent
- Retrieving relevant documentation
- Calling external business tools
- Validating responses
- Detecting unsafe inputs
- Escalating conversations when appropriate
- Maintaining conversation state
The Architecture
The application consists of several layers.
User
│
▼
Next.js Enterprise Console
│
▼
FastAPI Backend
│
▼
LangGraph Workflow
│
├── Input Validation
├── Guardrails
├── Intent Classification
├── Hybrid RAG Retrieval
├── Tool Calling
├── Response Validation
└── Human Escalation
Each node has a single responsibility. The model's outputs still vary, but the routing between nodes stays deterministic, which makes the workflow observable and easier to maintain.
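To illustrate the one-responsibility-per-node idea, here is a minimal plain-Python sketch (not LangGraph code). Each node is a function that takes the shared conversation state and returns an updated copy, and the loop stops as soon as a node blocks the request. The state fields, the keyword table standing in for an LLM-based intent classifier, and the 4,000-character limit are all hypothetical.

```python
from typing import TypedDict


class SupportState(TypedDict, total=False):
    """Hypothetical shared state that every node reads and updates."""
    message: str
    blocked: bool
    intent: str


# Hypothetical keyword table standing in for an LLM-based intent classifier.
INTENT_KEYWORDS = {
    "billing": ("invoice", "refund", "charge"),
    "account": ("password", "login", "locked"),
    "incident": ("breach", "malware", "compromised"),
}


def validate_input(state: SupportState) -> SupportState:
    """Input Validation node: reject empty or oversized messages."""
    text = state.get("message", "").strip()
    return {**state, "message": text, "blocked": not 0 < len(text) <= 4000}


def classify_intent(state: SupportState) -> SupportState:
    """Intent Classification node: the first intent whose keyword appears wins."""
    text = state["message"].lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return {**state, "intent": intent}
    return {**state, "intent": "general"}


def run_nodes(message: str) -> SupportState:
    """Run nodes in order, stopping as soon as one blocks the request."""
    state: SupportState = {"message": message}
    for node in (validate_input, classify_intent):
        state = node(state)
        if state.get("blocked"):
            break
    return state
```

Because every node returns a new state instead of mutating shared globals, each step can be logged and tested in isolation.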
Why LangGraph?
Traditional prompt chains are linear.
Customer support rarely is.
A conversation may require:
- Multiple retrieval steps
- External tool calls
- Retry logic
- Conditional routing
- Human escalation
LangGraph provided a natural way to model these workflows as a stateful graph instead of a sequence of prompts.
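LangGraph expresses this with `StateGraph`, `add_node`, and `add_conditional_edges`. The sketch below is a dependency-free toy that mimics the idea rather than LangGraph's actual API. It shows what a linear chain cannot do: a conditional edge after validation either finishes, loops back to retrieval for one retry, or escalates to a human. The knowledge base, the retry limit of 2, and the "broader match" retry strategy are hypothetical.

```python
from typing import Callable, Dict

END = "__end__"
State = dict

# Hypothetical toy knowledge base.
KB = {"reset password": "Reset your password from the login page."}


class MiniGraph:
    """Toy state graph: nodes transform state, routers choose the next node."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[State], State]] = {}
        self.routes: Dict[str, Callable[[State], str]] = {}

    def add_node(self, name: str, fn, route) -> None:
        self.nodes[name] = fn
        self.routes[name] = route

    def run(self, start: str, state: State, max_steps: int = 20) -> State:
        current = start
        for _ in range(max_steps):
            state = self.nodes[current](state)
            # Record which nodes ran, for observability.
            state = {**state, "trace": state.get("trace", []) + [current]}
            current = self.routes[current](state)
            if current == END:
                return state
        raise RuntimeError("graph did not reach END")


def retrieve(state: State) -> State:
    attempts = state.get("attempts", 0) + 1
    query = state["query"].lower().strip(" ?")
    if attempts == 1:
        docs = [KB[query]] if query in KB else []
    else:
        # Retry with a broader match: any entry sharing a word with the query.
        words = set(query.split())
        docs = [text for key, text in KB.items() if words & set(key.split())]
    return {**state, "attempts": attempts, "docs": docs}


def generate(state: State) -> State:
    # Stand-in for an LLM call: answer from the top document, if any.
    answer = state["docs"][0] if state["docs"] else "I'm not sure."
    return {**state, "answer": answer}


def validate(state: State) -> State:
    return {**state, "grounded": bool(state["docs"])}


def escalate(state: State) -> State:
    return {**state, "answer": "Connecting you with a human agent."}


def after_validate(state: State) -> str:
    """Conditional edge: finish, loop back to retrieval, or escalate."""
    if state["grounded"]:
        return END
    return "retrieve" if state["attempts"] < 2 else "escalate"


graph = MiniGraph()
graph.add_node("retrieve", retrieve, lambda s: "generate")
graph.add_node("generate", generate, lambda s: "validate")
graph.add_node("validate", validate, after_validate)
graph.add_node("escalate", escalate, lambda s: END)
```

For example, `graph.run("retrieve", {"query": "How do I reset my password?"})` misses on the exact lookup, loops back once, and succeeds on the broader retry; a query with no matching document ends at the `escalate` node.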
Building Reliable RAG
One thing I learned quickly is that retrieval quality matters as much as model quality.
Instead of simple semantic search, the project implements:
- Hybrid search
- Parent-child chunking
- Contextual compression
- Metadata filtering
- Configurable vector stores
Together, these techniques improve the relevance of the context the model receives, before generation even begins.
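The following is a rough sketch of three of these ideas working together: metadata filtering, hybrid search, and parent-child chunking. It assumes Reciprocal Rank Fusion (RRF, with the commonly used constant k = 60) as the way to merge keyword and dense rankings; the post does not say which fusion method the project uses. Term overlap stands in for BM25 and character-trigram cosine stands in for embedding similarity. The corpus, field names, and scoring functions are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional

# Hypothetical corpus: small child chunks point at larger parent sections.
PARENTS = {
    "p1": "Account security guide: passwords, two-factor authentication, lockouts.",
    "p2": "Billing guide: invoices, refunds, and plan changes.",
}
CHILDREN = [
    {"id": "c1", "parent": "p1", "product": "console",
     "text": "reset a forgotten password from the login page"},
    {"id": "c2", "parent": "p1", "product": "console",
     "text": "enable two-factor authentication for admins"},
    {"id": "c3", "parent": "p2", "product": "billing",
     "text": "request a refund for a duplicate charge"},
]


def keyword_score(query: str, text: str) -> int:
    """Stand-in for BM25: number of shared lowercase terms."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def trigrams(text: str) -> Counter:
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def dense_score(query: str, text: str) -> float:
    """Stand-in for embedding similarity: cosine over character trigrams."""
    a, b = trigrams(query), trigrams(text)
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Score each id by the sum of 1 / (k + rank) across all rankings."""
    scores: dict = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def hybrid_search(query: str, product: Optional[str] = None, top_k: int = 2) -> List[str]:
    # Metadata filtering happens before either retriever runs.
    pool = [c for c in CHILDREN if product is None or c["product"] == product]
    keyword_hits = [c for c in pool if keyword_score(query, c["text"]) > 0]
    by_keyword = [c["id"] for c in sorted(
        keyword_hits, key=lambda c: keyword_score(query, c["text"]), reverse=True)]
    by_dense = [c["id"] for c in sorted(
        pool, key=lambda c: dense_score(query, c["text"]), reverse=True)]
    fused = reciprocal_rank_fusion([by_keyword, by_dense])[:top_k]
    # Parent-child chunking: match small chunks, return their deduplicated parents.
    parent_of = {c["id"]: c["parent"] for c in pool}
    parents = list(dict.fromkeys(parent_of[cid] for cid in fused))
    return [PARENTS[pid] for pid in parents]
```

A query like `"forgot my passwords"` shares no exact terms with any chunk, yet the fuzzy dense stand-in still ranks the password-reset chunk first, and the caller receives the full account-security parent section rather than a one-line fragment.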
Guardrails
A production AI system must defend itself.
The platform includes deterministic checks for:
- Prompt injection
- Jailbreak attempts
- SQL injection patterns
- Sensitive requests
- Personally identifiable information
- Toxicity
- Groundedness
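Several of the input-side checks above can be approximated with plain regular expressions before any model is called. The sketch below is a deliberately small, hypothetical pattern set that covers prompt injection, SQL injection, and two kinds of PII. It is not the project's actual rule set, and real coverage would need far more patterns or a trained classifier.

```python
import re
from typing import List

# Hypothetical, deliberately small pattern set for illustration only.
GUARDRAIL_PATTERNS = {
    "prompt_injection": re.compile(
        r"ignore (all )?(previous|prior) instructions|reveal (the |your )?system prompt", re.I),
    "sql_injection": re.compile(
        r"\bunion\s+select\b|;\s*drop\s+table\b|'\s*or\s+'1'\s*=\s*'1", re.I),
    "pii_email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "pii_card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def check_input(text: str) -> List[str]:
    """Return the names of every guardrail pattern the text triggers."""
    return [name for name, pattern in GUARDRAIL_PATTERNS.items() if pattern.search(text)]
```

An empty list means the message may proceed; any flag can route the conversation to a refusal or escalation node instead of the model.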
If the system determines that a response isn't sufficiently grounded in retrieved documents, it retries or escalates instead of generating potentially misleading information.
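As a crude illustration of the retry-or-escalate decision, the sketch below scores groundedness as the fraction of an answer's content words that also appear in the retrieved documents. The word-overlap metric, stopword list, 0.6 threshold, and two-attempt limit are all hypothetical; production systems more often use an LLM judge or an entailment model for this check.

```python
import re
from typing import List, Set

# Hypothetical stopword list; only content words count toward support.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "or", "is", "are",
             "in", "on", "for", "your", "you", "it", "from"}


def content_words(text: str) -> Set[str]:
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def groundedness(answer: str, docs: List[str]) -> float:
    """Fraction of the answer's content words found in any retrieved document."""
    answer_words = content_words(answer)
    if not answer_words:
        return 0.0
    doc_words = set().union(*(content_words(d) for d in docs))
    return len(answer_words & doc_words) / len(answer_words)


def decide(answer: str, docs: List[str], attempt: int,
           threshold: float = 0.6, max_attempts: int = 2) -> str:
    """Respond if grounded enough, otherwise retry until attempts run out, then escalate."""
    if groundedness(answer, docs) >= threshold:
        return "respond"
    return "retry" if attempt < max_attempts else "escalate"
```

For example, with the document "Reset your password from the login page.", an answer that repeats it scores 1.0 and is sent, while "Your account will be deleted in 30 days." scores 0.0 and is retried once before escalating.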
Building Beyond the Model
Another design goal was provider independence.
Models, embeddings, and vector databases are abstracted behind factories.
Changing providers becomes a configuration change rather than a rewrite.
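One common way to get that property is a name-keyed registry of factories behind a shared interface. The minimal sketch below assumes a `ChatModel` protocol with a single `generate` method and two hypothetical offline providers; none of these names come from the project itself.

```python
from typing import Callable, Dict, Protocol


class ChatModel(Protocol):
    def generate(self, prompt: str) -> str: ...


# Provider name -> zero-argument factory.
_PROVIDERS: Dict[str, Callable[[], ChatModel]] = {}


def register_provider(name: str):
    """Class decorator that makes a provider selectable by name."""
    def decorator(factory):
        _PROVIDERS[name] = factory
        return factory
    return decorator


@register_provider("echo")
class EchoModel:
    """Hypothetical offline provider, useful in tests."""
    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


@register_provider("shout")
class ShoutModel:
    """Second hypothetical provider, to show swapping by configuration."""
    def generate(self, prompt: str) -> str:
        return prompt.upper()


def create_chat_model(config: dict) -> ChatModel:
    """Build whichever provider the configuration names."""
    name = config["llm_provider"]
    factory = _PROVIDERS.get(name)
    if factory is None:
        raise ValueError(f"unknown provider {name!r}; known: {sorted(_PROVIDERS)}")
    return factory()
```

Calling code depends only on `ChatModel`, so switching from `{"llm_provider": "echo"}` to `{"llm_provider": "shout"}` changes behavior without touching it. The same pattern extends to embeddings and vector stores.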
The same principle applies to business tools, which makes them straightforward to expose later through the Model Context Protocol (MCP) or a similar protocol.
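As a sketch of what "protocol-ready" tools might look like, the hypothetical registry below derives a JSON-Schema-style input description from each function's type hints. MCP tool listings likewise pair a name and description with a JSON Schema for inputs, but the field names here are illustrative rather than the MCP wire format. Both tools are hypothetical stubs.

```python
import inspect
from typing import Callable, Dict, get_type_hints

JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}
TOOLS: Dict[str, dict] = {}


def tool(fn: Callable) -> Callable:
    """Register fn with a schema built from its signature and type hints."""
    hints = get_type_hints(fn)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    TOOLS[fn.__name__] = {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "input_schema": {"type": "object", "properties": properties, "required": required},
        "handler": fn,
    }
    return fn


@tool
def lookup_ticket(ticket_id: str) -> str:
    """Return the status of a support ticket."""
    return f"Ticket {ticket_id}: open"  # hypothetical stub


@tool
def list_alerts(severity: str = "high", limit: int = 5) -> str:
    """List recent security alerts."""
    return f"{limit} {severity} alerts"  # hypothetical stub


def call_tool(name: str, arguments: dict) -> str:
    """Dispatch a model-issued tool call to the registered handler."""
    return TOOLS[name]["handler"](**arguments)
```

Because each entry already carries a name, a description, and an input schema, an adapter for MCP or any other tool-calling protocol only needs to translate field names.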
What I Learned
This project changed how I think about AI engineering.
The LLM isn't the application.
It's one component within a much larger system.
The engineering around the model determines whether an AI application is reliable enough for production.
That includes:
- Retrieval
- Guardrails
- Validation
- Tool orchestration
- Observability
- Evaluation
- Error handling
- User experience
Building Cloudob Security gave me practical experience designing stateful AI systems rather than simple chatbot interfaces.
What's Next?
There are still several improvements I'd like to make, including:
- Live telemetry dashboards
- Full MCP server integration
- Additional enterprise tools
- Expanded evaluation pipelines
- Production deployment with Pinecone and LangSmith
The project reinforced an idea that has shaped many of my recent projects:
The future of AI engineering isn't about writing better prompts.
It's about building better systems around language models.