This article was created as part of my official submission to the GKE Turns 10 Hackathon.
The challenge was simple but profound: take a standard microservices application, the Online Boutique, and "supercharge" it with AI without touching a single line of the original code. The goal was to build an external, containerized "brain" that could add a new layer of intelligence.
My answer? I built a fully autonomous, multi-agent AI team on Google Kubernetes Engine (GKE). Here's how I did it, and how GKE's features were fundamental to making this complex architecture work.
The Foundation: GKE and Service Discovery
The entire project relies on multiple new microservices (agents, servers, UIs) communicating with each other and with the original Online Boutique app. GKE's built-in service discovery made this incredibly simple.
By defining a Kubernetes Service, I could give each component a stable DNS name. For example, the Marketing Agent's Service looks like this:
apiVersion: v1
kind: Service
metadata:
  name: marketing-campaigner-service
spec:
  selector:
    app: marketing-campaigner-agent
  ports:
  - port: 8080
    targetPort: 8080
This simple YAML meant that my Business Analyst agent could reliably send A2A messages just by calling http://marketing-campaigner-service:8080/goal. GKE handled all the complex networking behind the scenes.
The "Senses": An MCP Server with a Secure Sidecar
The first component I built was an MCP Server to act as the system's "senses." It ingests real-time "add to cart" events from the app's frontend-proxy. To connect securely to the Cloud SQL database, I used the Cloud SQL Auth Proxy running as a sidecar container, a best practice made easy by GKE's pod architecture.
spec:
  serviceAccountName: mcp-toolbox-sa
  containers:
  - name: server  # My Python App
    image: ...
    env:
    - name: DB_HOST
      value: "127.0.0.1"
  - name: cloud-sql-proxy  # The Sidecar
    image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0
    args: ["..."]
The "Strategist": A Scheduled Agent with CronJob
My Business Analyst agent doesn't need to run 24/7; it just needs to wake up periodically to check for trends. A GKE CronJob was the perfect tool for this. With a simple schedule expression, I could define this "sense-think-act" loop to run automatically.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: business-analyst-agent
spec:
  # Run every 5 minutes
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: business-analyst-agent
            image: ...
After deploying, GKE takes care of the rest, spinning up and shutting down pods on schedule. Watching the pods transition to Completed status was proof the autonomous loop was working.
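For context, the container the CronJob runs is just a plain script that exits when it's done. Here's a simplified sketch of that entrypoint; BusinessAnalyst and find_trending_product() are illustrative stand-ins for the real implementation.

# Simplified sketch of the Business Analyst's "sense-think-act" entrypoint.
# The CronJob runs this script to completion; the pod shows Completed once
# main() returns. BusinessAnalyst and find_trending_product() are
# illustrative stand-ins for the real implementation.

def main():
    analyst = BusinessAnalyst()

    # Sense: check recent "add to cart" events (via the MCP server) for a trend
    trending_product = analyst.find_trending_product()

    # Think + Act: if something is trending, delegate to the Marketing agent
    if trending_product:
        analyst.send_a2a_goal(trending_product)
    # Otherwise exit cleanly; the next scheduled run will check again

if __name__ == "__main__":
    main()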
The "Executor": Agent Collaboration via A2A
When the Analyst agent finds a trend, it delegates a task to the Marketing agent. This Agent2Agent (A2A) communication is just a simple, decoupled POST request, made possible by the GKE Service we defined earlier.
import requests

def send_a2a_goal(self, product_name):
    # GKE DNS resolves 'marketing-campaigner-service' to the Service's stable cluster IP
    self.marketing_agent_url = "http://marketing-campaigner-service:8080/goal"
    a2a_message = {
        "goal": "create_promotional_content",
        "payload": {"product_name": product_name}
    }
    requests.post(self.marketing_agent_url, json=a2a_message)
The Marketing agent, running in its own Deployment, simply listens for these incoming requests and executes its own logic.
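For illustration, that receiving side can be as small as a single HTTP handler. Here's a minimal sketch assuming the agent uses Flask; the actual campaign-generation step is elided.

# Minimal sketch of the Marketing agent's A2A endpoint (assuming Flask).
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/goal", methods=["POST"])
def receive_goal():
    message = request.get_json()
    if message.get("goal") == "create_promotional_content":
        product_name = message["payload"]["product_name"]
        # ... generate promotional content for product_name with Gemini ...
        return jsonify({"status": "accepted", "product": product_name}), 202
    return jsonify({"status": "unknown_goal"}), 400

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)  # matches the Service's targetPort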
Agent 3: The AI Specialist (A2A & MCP Client)
This was the most ambitious step. I replaced the original recommendationservice with my own AI-powered Recommendation Agent. This agent is a showcase of advanced agent design, using both A2A collaboration and the MCP pattern. It acts as a gRPC server, and when it receives a request from the frontend:
- It makes an A2A call to the Marketing agent to ask which product is currently being promoted, demonstrating real-time agent collaboration.
- It acts as an MCP Client, using the application's own productcatalogservice as a direct "tool" (via gRPC) to get rich, grounded context on the user's cart items.
- It uses Gemini to generate intelligent recommendations based on all this combined context.
# recommendation-agent/agent.py
def ListRecommendations(self, request, context):
    # A2A Collaboration: Ask another agent for its state
    promoted_product = requests.get(MARKETING_AGENT_URL).json().get('product_name')

    # MCP Client Action (Tool Use): Get data from the application
    cart_product_names = [self.get_product_name_from_id(pid) for pid in request.product_ids]

    # Think with Gemini using the rich context
    prompt = f"User's Cart: {cart_product_names}, Promoted Product: '{promoted_product}'..."
    response = model.generate_content(prompt)
    # ...
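For reference, the get_product_name_from_id helper behind that list comprehension is just a thin gRPC client around the boutique's own catalog. The sketch below assumes the stubs generated from Online Boutique's demo.proto (demo_pb2 / demo_pb2_grpc) and the catalog's default port 3550; error handling is omitted.

# Sketch of the MCP "tool use" helper: querying productcatalogservice over
# gRPC. Assumes stubs generated from Online Boutique's demo.proto.
import grpc
import demo_pb2
import demo_pb2_grpc

CATALOG_ADDR = "productcatalogservice:3550"  # in-cluster DNS name

def get_product_name_from_id(product_id: str) -> str:
    with grpc.insecure_channel(CATALOG_ADDR) as channel:
        stub = demo_pb2_grpc.ProductCatalogServiceStub(channel)
        product = stub.GetProduct(demo_pb2.GetProductRequest(id=product_id))
        return product.name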
Agent 4: The Conversational Assistant (Grounded MCP Client)
To add an interactive element, I built a Customer Support Agent. This agent is a prime example of an MCP Client that is designed to be user-facing and does not use A2A. Its sole focus is to help the user. It uses the application's productcatalogservice as a tool (via gRPC) to get real-time data. Crucially, it is explicitly "grounded"—instructed to only answer questions based on the data it retrieves. This prevents the AI from hallucinating and builds user trust.
final_prompt = f"""
Answer the user's question based ONLY on the product data provided. Do not make up information.
User's Question: "{question}"
Product Data: "{product_context}"
"""
final_response = model.generate_content(final_prompt)
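Where does product_context come from? It's the output of the same catalog tool. A rough sketch of that grounding step follows, again assuming the demo.proto stubs; using the raw question as the search query is a simplification of the real keyword handling.

# Sketch of the grounding step: retrieve real catalog data before prompting,
# so the model can only restate facts it was actually given. Assumes the
# demo.proto stubs; the naive query is illustrative.
import grpc
import demo_pb2
import demo_pb2_grpc

CATALOG_ADDR = "productcatalogservice:3550"

def build_product_context(question: str) -> str:
    with grpc.insecure_channel(CATALOG_ADDR) as channel:
        stub = demo_pb2_grpc.ProductCatalogServiceStub(channel)
        results = stub.SearchProducts(
            demo_pb2.SearchProductsRequest(query=question)
        ).results
    return "\n".join(f"- {p.name}: {p.description}" for p in results)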
The "Wow" Factor: Public UIs with LoadBalancer
For the final submission, I needed public URLs for the judges. GKE makes this incredibly simple. By changing one line in my dashboard's Service definition to type: LoadBalancer, GKE automatically provisioned a public IP address.
apiVersion: v1
kind: Service
metadata:
  name: dashboard-ui-service
spec:
  type: LoadBalancer
  selector:
    app: dashboard-ui
  ports:
  - port: 80
    targetPort: 8080
This allowed me to build and expose two user-facing components: our Mission Control Dashboard and an interactive Customer Support chatbot.
(Screenshot: the Mission Control Dashboard)
(Screenshot: the Customer Support chat)
Why GKE Was the Perfect Choice
Building this complex, multi-component system in such a short time would not have been possible without GKE. It eased the process by providing:
- Declarative Infrastructure: I could define my entire system in a few YAML files.
- Seamless Service Discovery: GKE's internal DNS let my agents find and talk to each other effortlessly.
- The Right Tool for the Job: Using CronJobs for scheduled tasks and Deployments for 24/7 services was trivial.
- Built-in Resilience: GKE ensures that if a pod crashes, it's automatically restarted.
- Effortless Scalability: If my agents were under heavy load, scaling up would be as simple as changing replicas: 1 to replicas: 5.
GKE truly is the ultimate platform for building and running the next generation of AI workloads.
The Final Architecture
🚀 Live Demos & Code Repository
Code Repository: https://github.com/ki3ani/thanos
Live Demo: Mission Control Dashboard: http://34.9.104.63/
Live Demo: AI Customer Support Chatbot: http://34.123.10.111/