Customer support is one of the few places where RAG and agents earn their keep immediately: the questions are real, the knowledge changes constantly, and a wrong answer has a cost. I built an open-source agentic RAG platform for support automation, and the design choice I keep coming back to is that almost everything should be configuration, not code.
Repo: https://github.com/ahmet-ozel/agentic-rag-customer-support
Why config-driven
A support assistant is never "done." You add a new product, a new escalation rule, a new data source, a new tone of voice. If each of those changes means editing Python and redeploying, the system rots. So the agent behavior, the tools it can call, the data sources, and the routing rules all live in configuration. Adding a knowledge source or a new tool is an edit to config, not a code change.
This also makes the system easier to reason about. You can read one config file and know what the agent is allowed to do, where it gets its knowledge, and how it decides what to answer.
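To make the idea concrete, here is a minimal sketch of what "behavior as configuration" could look like. The keys, intent names, and the `load_routes` helper are hypothetical and chosen for illustration; they are not the repository's actual schema. The tool names come from the component list in this post.

```python
from dataclasses import dataclass

# Hypothetical config: keys and values are illustrative, not the repo's real schema.
CONFIG = {
    "tools": ["postgres", "qdrant", "docling", "paddleocr"],
    "sources": [{"name": "product_docs", "collection": "docs_v1"}],
    "routes": {
        "faq": {"tools": ["qdrant"], "use_agent": False},
        "billing": {"tools": ["postgres", "qdrant"], "use_agent": True},
    },
}


@dataclass(frozen=True)
class Route:
    tools: tuple  # names of the tools this intent may call
    use_agent: bool  # whether the request goes into the full agent loop


def load_routes(config: dict) -> dict:
    """Validate the routing section against the declared tools."""
    known = set(config["tools"])
    routes = {}
    for intent, spec in config["routes"].items():
        unknown = set(spec["tools"]) - known
        if unknown:
            raise ValueError(
                f"route {intent!r} references unknown tools: {sorted(unknown)}"
            )
        routes[intent] = Route(tuple(spec["tools"]), bool(spec["use_agent"]))
    return routes


ROUTES = load_routes(CONFIG)
```

Validating at load time is one way to keep the "read one file and know what the agent can do" property honest: a route that names a tool the config never declares fails at startup instead of at request time.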
The pieces
The platform wires together a few components behind a FastAPI server:
- An LLM as the reasoning core
- MCP servers as the tool layer (postgres, qdrant, docling, paddleocr), so the agent can query a database, search a vector store, parse documents, and run OCR through a uniform tool interface
- A vector database (Qdrant) for retrieval
- A document pipeline that ingests and processes the knowledge base
- An intent router that decides what kind of request came in
- An agent loop that plans, calls tools, checks results, and answers
The intent router matters more than the model
The instinct is to send everything to one big agent and let it figure things out. In practice, a lightweight intent router in front of the agent does much of the sorting up front: a simple FAQ lookup does not need a multi-step agent, and a billing question needs different tools than a how-to question. Routing first keeps cost down and latency predictable, and sends only the genuinely hard requests into the full agent loop.
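As a rough illustration of routing before the agent, here is a keyword-based sketch. The post does not say how the real router classifies requests, so keyword matching is a stand-in; the intents, keyword sets, and the choice to send unrecognized requests to the agent are all assumptions.

```python
import re

# Hypothetical intents and keywords; a real router might use a classifier or an LLM.
INTENT_KEYWORDS = {
    "billing": {"invoice", "refund", "charge", "payment"},
    "how_to": {"how", "setup", "install", "configure"},
}

# Assumption: which intents are hard enough to need the full agent loop.
AGENT_INTENTS = {"billing"}


def route(message: str) -> tuple:
    """Return (intent, needs_agent) for an incoming message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    scores = {
        intent: len(words & keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        # Nothing matched: treat it as a hard request and let the agent handle it.
        return "unknown", True
    return best, best in AGENT_INTENTS
```

With these assumed keywords, `route("How do I install the app?")` resolves to the `how_to` intent without invoking the agent, while a refund request is routed to `billing` and into the agent loop.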
The agent loop
For the requests that do need it, the agent runs an iterative tool-calling loop: read the request, decide which tool to use (retrieve from the vector store, query postgres, parse a document), evaluate whether the result is sufficient, and either answer or take another step. MCP is what keeps this clean: the agent reasons about which tool to call, not about how each backend works.
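The loop described above can be sketched in a few lines. This is not the platform's implementation: the tools are plain Python functions standing in for MCP tool calls, the `plan` function is a hard-coded stub where an LLM would make the decision, and the canned return values are invented for the example.

```python
from typing import Callable, Dict, List, Optional


# Stand-ins for MCP tools: each takes the request text and returns a result string.
def search_vector_store(query: str) -> str:
    if "password" in query.lower():
        return "Reset your password from Settings > Security."
    return ""  # nothing relevant retrieved


def query_postgres(query: str) -> str:
    return "order 1042: shipped"  # canned row for illustration


TOOLS: Dict[str, Callable[[str], str]] = {
    "qdrant": search_vector_store,
    "postgres": query_postgres,
}


def plan(request: str, observations: List[str]) -> Optional[str]:
    """Stub planner: return the next tool name, or None when ready to answer."""
    if any(observations):  # some tool already returned a non-empty result
        return None
    order = ["qdrant", "postgres"]
    tried = len(observations)
    return order[tried] if tried < len(order) else None


def run_agent(request: str, max_steps: int = 4) -> str:
    observations: List[str] = []
    for _ in range(max_steps):  # hard cap so the loop always terminates
        tool_name = plan(request, observations)
        if tool_name is None:
            break
        observations.append(TOOLS[tool_name](request))
    useful = [o for o in observations if o]
    if useful:
        return useful[-1]
    return "I could not find an answer; escalating to a human."
```

The loop only ever sees tool names and string results, which is the property the uniform tool interface buys: swapping a backend changes an entry in `TOOLS`, not the loop.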
What I would do differently
The biggest lesson was to invest in evaluation early. It is easy to demo a support agent that answers three questions well. It is hard to know whether a config change made it better or worse across a hundred real questions. If I started over, I would build the eval harness before the second feature.
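A first version of such a harness can be very small. The sketch below is an assumption about what "eval harness" could mean here, not something taken from the repo: the cases are invented, and substring matching is a crude stand-in for a real grading method.

```python
from typing import Callable, List, Tuple

# Hypothetical eval cases: a question plus a phrase the answer must contain.
CASES: List[Tuple[str, str]] = [
    ("How do I reset my password?", "settings"),
    ("What is your refund window?", "30 days"),
    ("Do you ship internationally?", "yes"),
]


def evaluate(answer_fn: Callable[[str], str], cases=CASES) -> float:
    """Return the fraction of cases whose answer contains the expected phrase."""
    passed = sum(
        1 for question, expected in cases
        if expected.lower() in answer_fn(question).lower()
    )
    return passed / len(cases)


# Two toy "configurations" of the assistant, to compare before and after a change.
def baseline(question: str) -> str:
    if "password" in question.lower():
        return "Go to Settings > Security."
    return "I don't know."


def candidate(question: str) -> str:
    if "refund" in question.lower():
        return "Refunds are accepted within 30 days."
    return baseline(question)
```

Running `evaluate(baseline)` and `evaluate(candidate)` on the same cases turns "did that config change help?" into a comparison of two numbers, which is the point of building this before the feature list grows.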
Repo and setup: https://github.com/ahmet-ozel/agentic-rag-customer-support
If you have built support automation with RAG, I would like to hear how you handle routing and escalation to a human. Where do you draw the line on letting the agent answer versus handing off?