Gunasekaran


I built a Multi-Agent Support System in 5 Days (And Learned Why "God Agents" Don't Work)

This is a submission for the Google AI Agents Writing Challenge: Learning Reflections

The "Aha" Moment

I’ve been playing with LLMs for a while, mostly just prompting them to write code or answer questions. But I kept hitting a wall: the models were smart, but they were trapped in a text box. They couldn't do anything.

Last month, I joined the Google & Kaggle 5-Day AI Agents Intensive, and it clicked. The future isn't just better chatbots; it's Agentic AI—systems that can reason, use tools, and actually execute tasks.

I wanted to prove this to myself, so for my capstone, I built AutoSupport—a system that doesn't just chat, but actually triages and attempts to resolve customer support tickets using a team of specialized agents. Here is how I built it and what I learned along the way.


The Shift: From Monoliths to Teams

The biggest takeaway from the course wasn't a specific line of code, but an architectural shift. In the beginning (Day 1), I was tempted to build one giant agent with a massive prompt to handle everything.

By Day 2, I realized why that approach fails. A "God Agent" gets confused. It hallucinates. It tries to fix billing issues with technical solutions.

Instead, I adopted the Orchestrator Pattern. Think of it like a real office: you don't have the receptionist fix the server. You have a receptionist (Triage Agent) who figures out what’s wrong and sends you to the IT Guy (Technical Agent) or the Accountant (Billing Agent).


Building "AutoSupport": My Capstone

I decided to build a system that could ingest a raw customer complaint, figure out what it was actually about, and route it to a specialist.

The Architecture

I used the google-adk library to structure my team. I needed four distinct roles:

  1. Triage Specialist: The gatekeeper.
  2. Billing Specialist: Handles money, refunds, and subscriptions.
  3. Technical Specialist: Handles API errors and bugs.
  4. Account Specialist: Handles logins and passwords.

Here is the flow I designed:

```mermaid
graph TD
    User[User Query] --> Orchestrator[Support Orchestrator]
    Orchestrator --> Triage[Triage Agent]
    Triage --> Router{Routing Logic}
    Router -->|Money Issue| Billing[Billing Specialist]
    Router -->|Bug/API| Tech[Technical Specialist]
    Router -->|Login| Account[Account Specialist]

    Billing --> Search[Google Search Tool]
    Tech --> Search

    Billing --> Response[Final Answer]
    Tech --> Response
```
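For context, here's roughly how a team like this gets declared with google-adk. Treat it as a sketch rather than my exact code: the model name, descriptions, and instructions below are placeholders, and depending on your ADK version the built-in google_search tool may have restrictions on how it can be attached to sub-agents.

```python
from google.adk.agents import Agent
from google.adk.tools import google_search

# Specialist agents: each gets a narrow description and instruction
billing_agent = Agent(
    name="billing_specialist",
    model="gemini-2.0-flash",  # placeholder model name
    description="Handles billing, refunds, charges, and subscription questions.",
    instruction="You are an empathetic billing specialist. Acknowledge the "
                "customer's frustration before proposing a resolution.",
    tools=[google_search],
)

technical_agent = Agent(
    name="technical_specialist",
    model="gemini-2.0-flash",
    description="Handles API errors, bugs, and integration issues.",
    instruction="You are a direct technical specialist. Ask for error codes and "
                "suggest concrete troubleshooting steps.",
    tools=[google_search],
)

account_agent = Agent(
    name="account_specialist",
    model="gemini-2.0-flash",
    description="Handles logins, passwords, and account security.",
    instruction="You help customers regain access to their accounts safely.",
)

# The orchestrator triages the request and delegates to one of its sub-agents
support_orchestrator = Agent(
    name="support_orchestrator",
    model="gemini-2.0-flash",
    instruction="Read the customer's message, decide whether it is a billing, "
                "technical, or account issue, and transfer to that specialist.",
    sub_agents=[billing_agent, technical_agent, account_agent],
)
```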

The Code Logic

The most critical part wasn't the LLM itself, but the Python logic around it. In my SupportOrchestrator class, I didn't just trust the LLM to route things magically. I implemented a keyword-based routing layer to assist the decision-making.

Here is a snippet from my actual code. I defined specific trigger words for each domain:

```python
def route_to_specialist(self, customer_message: str, triage_response: str):
    # ...setup...

    # Combine the raw message and the triage agent's reasoning for keyword checks
    message_lower = f"{customer_message} {triage_response}".lower()

    # I used strictly defined keywords to catch specific issues
    billing_keywords = ['billing', 'payment', 'refund', 'charge', 'invoice']
    technical_keywords = ['api', 'error', 'technical', 'bug', '401', '403']
    account_keywords = ['account', 'login', 'password', 'access', 'security']

    # Check if the message or the triage agent's thoughts contain these keywords
    if any(keyword in message_lower for keyword in billing_keywords):
        return "billing"
    elif any(keyword in message_lower for keyword in technical_keywords):
        return "technical"
    # ... logic continues ...
```

This hybrid approach—using the LLM to understand the sentiment and Python to enforce the routing—made the system much more reliable than an LLM alone.
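As a quick sanity check, the routing layer can be exercised on its own. This is a hypothetical call pattern, assuming the SupportOrchestrator constructor needs no extra arguments:

```python
# Hypothetical sanity check of the keyword router
orchestrator = SupportOrchestrator()

print(orchestrator.route_to_specialist(
    "I was charged twice for my subscription", triage_response=""
))  # expected: "billing"

print(orchestrator.route_to_specialist(
    "Getting 401 unauthorized errors on every API call", triage_response=""
))  # expected: "technical"
```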

What Actually Happened (The Testing Phase)

I ran the system through a few scenarios to see if it would actually work.

Test 1: The API Error
I fed it: "I'm getting 401 unauthorized errors on all my API calls." The Triage agent correctly flagged it as Technical. The router spun up the technical_specialist, which immediately offered troubleshooting steps for API keys and headers. It worked perfectly.

Test 2: The Double Charge
Input: "I was charged twice for my subscription." The system routed this to the Billing Specialist. What I liked was the tone shift—the billing agent was prompted to be empathetic ("I understand how frustrating it must be") while the technical agent was more direct.

The "Gotcha" Moment It wasn't all smooth sailing. During my validation run, I noticed that a query about "logging in" initially risked being misrouted to technical support because it contained the word "error." This taught me that simple keyword matching has limits. In a V2, I’d probably implement a semantic router (using embeddings) rather than hard-coded lists.

My Key Takeaways from the Course

Tools are everything: An agent without tools is just a chatty encyclopedia. Giving my agents access to Google Search (Day 2 lab) completely changed the utility of the responses.

State is hard: Managing conversation history (Day 3) is tricky. You can't just dump the whole chat log into the context window or you'll burn through tokens. I learned about "Context Compaction"—summarizing the history to keep the agent focused (there's a rough sketch of the idea right after these takeaways).

Evaluations save you: You can't fix what you don't measure. The Day 4 labs on observability showed me that I need to log why an agent made a decision, not just the final output.
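Here's the rough sketch of context compaction I mentioned above. It assumes a simple list-of-dicts history format and uses the google-genai client for the summarization call; the threshold, prompt, and message format are placeholders, not what the course labs use.

```python
from google import genai

client = genai.Client()  # assumes GOOGLE_API_KEY is set in the environment

MAX_RECENT_TURNS = 10  # keep only the newest turns verbatim

def compact_history(history: list[dict]) -> list[dict]:
    """Summarize older turns into one message and keep recent turns verbatim.

    `history` is assumed to be a list of {"role": ..., "text": ...} dicts.
    """
    if len(history) <= MAX_RECENT_TURNS:
        return history

    older, recent = history[:-MAX_RECENT_TURNS], history[-MAX_RECENT_TURNS:]
    transcript = "\n".join(f"{turn['role']}: {turn['text']}" for turn in older)

    summary = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Summarize this support conversation in a few bullet points, "
                 "keeping ticket details, error codes, and decisions:\n\n" + transcript,
    ).text

    # Replace the old turns with one compact summary message
    return [{"role": "user", "text": f"Conversation so far:\n{summary}"}] + recent
```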

What's Next?

This course laid the foundation. Next, I want to take AutoSupport and deploy it using Vertex AI so it’s not just running in a notebook. I also want to add a "Human in the Loop" feature for when the confidence score is low—because sometimes, you really just need to talk to a person.

Thanks to the Google and Kaggle team for this wonderful crash course on Agentic AI!
