Built for the AWS AI Agent Global Hackathon
Introduction
After building a serverless data analytics pipeline for customer churn, I had clean, query-ready customer data sitting in Amazon Athena. The next logical step was to make that data actionable — not just for analysts, but for customers themselves.
That's where the Customer Retention Agent comes in. This is a fully autonomous AI agent built on AWS Bedrock AgentCore that identifies at-risk customers and proactively offers them personalized retention deals through natural conversation. I built this as part of the AWS AI Agent Global Hackathon, and it's a natural continuation of my previous project.
Before diving into the build, I spent time going through the Amazon Bedrock AgentCore Samples repository. The tutorials there were incredibly helpful for getting up to speed with AgentCore concepts — from Runtime and Gateway to Memory and Identity. If you're new to AgentCore, I highly recommend starting there.
The goal was simple: What if customers could talk to an AI agent that knows their churn risk and can instantly generate personalized discount codes? No forms, no waiting for customer service — just a conversation that might save their subscription.
Architecture
Here's the high-level design:
Core Components:
- Amazon Bedrock AgentCore (Runtime, Gateway, Memory) — The brain of the system. Runtime hosts the agent, Gateway connects to external tools, and Memory persists conversation context.
- Claude 3.7 Sonnet — Powers autonomous reasoning and multi-step decision-making.
- Next.js Frontend — Chat interface deployed on Vercel with streaming responses.
- AWS Lambda (3 functions) — Churn Data Query, Retention Offer, and Web Search, each exposed via the MCP protocol.
- Amazon Athena — Queries the Telco customer churn dataset (from my previous project).
- Amazon Cognito — Dual authentication: web client for users, M2M client for agent-to-Gateway communication.
- Bedrock Knowledge Base — RAG implementation with company policies and troubleshooting guides.
- Amazon S3 — Stores customer data and knowledge base documents.
You can find the full implementation here: https://github.com/ajithmanmu/customer-retention-agent
Demo Video
https://www.youtube.com/watch?v=nt2-iE_qBIw
URL: https://customer-retention-agent.vercel.app/
Demo showing the agent in action - analyzing churn risk and generating discount codes
Walkthrough
1. The User Journey
When a customer logs into the chat interface:
- Authentication: Frontend authenticates via Cognito, receives JWT token
- JWT Mapping: Token contains the Cognito user ID (UUID), which gets mapped to the actual customer ID in the dataset (e.g., "3916-NRPAP")
- Conversation Starts: User sends a message, AgentCore Runtime receives the request with the JWT
- Memory Retrieval: Before responding, agent pulls customer context from Memory
- Agent Reasoning: Claude 3.7 Sonnet decides which tools to call (if any)
- Tool Execution: Agent calls Lambda functions via Gateway for data/actions
- Response Generation: Claude synthesizes response with retrieved data
- Memory Saving: Interaction gets saved to Memory for future conversations
2. Dual Authentication Architecture
This was one of the trickier parts. The system needs two separate authentication flows:
Web Client (User → Runtime):
- User logs in with username/password
- Cognito returns JWT token
- Frontend includes JWT in every request to AgentCore Runtime
- Token contains a sub field with the user ID
M2M Client (Agent → Gateway):
- Agent needs to call Lambda functions via Gateway
- Uses the OAuth 2.0 client credentials flow (sketched after this list)
- Confidential client with client secret stored in SSM
- Access token is validated at the Gateway before tool calls are allowed
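For reference, the client credentials exchange against Cognito's token endpoint looks roughly like this. This is a minimal sketch: the Cognito domain, scope name, and SSM parameter names are placeholders, not the project's actual values.

import boto3
import requests

# Hypothetical parameter names under the project's SSM prefix
ssm = boto3.client("ssm", region_name="us-east-1")
client_id = ssm.get_parameter(
    Name="/customer-retention-agent/m2m-client-id"
)["Parameter"]["Value"]
client_secret = ssm.get_parameter(
    Name="/customer-retention-agent/m2m-client-secret", WithDecryption=True
)["Parameter"]["Value"]

# Cognito's OAuth 2.0 token endpoint on the user pool's hosted domain (placeholder domain)
token_url = "https://<your-cognito-domain>.auth.us-east-1.amazoncognito.com/oauth2/token"

resp = requests.post(
    token_url,
    data={"grant_type": "client_credentials", "scope": "gateway/invoke"},  # scope name is illustrative
    auth=(client_id, client_secret),  # confidential client: HTTP Basic auth with the client secret
)
access_token = resp.json()["access_token"]  # sent as a Bearer token on calls to the Gateway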
Working with Cognito was more complicated than I expected — configuring two different clients, getting the OAuth flows right, and debugging token scopes took several iterations. But it was a valuable learning experience in production authentication patterns.
3. The Agent's Brain: AgentCore Runtime + Memory
The agent runs on AgentCore Runtime, which is a fully managed, serverless platform for hosting AI agents. No servers to manage, auto-scaling built-in.
Memory Integration is what makes this agent truly conversational:
import boto3

class CustomerRetentionMemoryHooks:
    def __init__(self, memory_id, customer_id, session_id, region):
        self.memory_client = boto3.client('bedrock-agent-runtime', region_name=region)
        self.memory_id = memory_id
        self.actor_id = customer_id    # Maps to the customer ID in the dataset
        self.session_id = session_id   # Scopes memories to the current conversation
Three memory strategies work together:
- USER_PREFERENCE: Stores explicit preferences ("I prefer email contact")
- SEMANTIC: Vector-based semantic memory for conversation context
- SUMMARIZATION: Condensed conversation summaries
This means if a customer says "My customer ID is 3916-NRPAP" in one session, the agent remembers it in future conversations.
4. Tools Layer: Lambda Functions via Gateway
I created three Lambda functions, each with a specific purpose:
Churn Data Query Lambda:
# Queries Athena with SQL
query = f"""
SELECT customerid, churn_risk_score, tenure, contract, monthlycharges
FROM telco_augmented_vw
WHERE customerid = '{customer_id}'
"""
This function:
- Hits Amazon Athena (the data from my previous pipeline project!) - see the handler sketch below
- Returns customer profile, churn risk score, and usage patterns
- Uses the cancel_intent field as our "synthetic churn model", so no separate ML training is needed
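A condensed sketch of how that Lambda might execute the query with boto3. The handler shape, database name, and results bucket are illustrative, not the exact implementation:

import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

def lambda_handler(event, context):
    customer_id = event["customer_id"]
    # The demo interpolates the ID directly; production code should use parameterized queries
    query = f"""
        SELECT customerid, churn_risk_score, tenure, contract, monthlycharges
        FROM telco_augmented_vw
        WHERE customerid = '{customer_id}'
    """
    execution = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": "telco_db"},  # illustrative database name
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # illustrative bucket
    )
    query_id = execution["QueryExecutionId"]

    # Poll until Athena finishes (fine for a demo; consider async patterns in production)
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state != "SUCCEEDED":
        return {"state": state, "rows": []}
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    return {"state": state, "rows": rows}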
Retention Offer Lambda:
- Generates personalized discount codes based on risk level (a sketch of the tier logic follows this list)
- High risk (>70%): 20-30% off for 3 months (code: SAVE25)
- Medium risk (40-70%): 15-25% off for 2 months
- Low risk (<40%): Service upgrades and add-ons
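The tier logic above is simple enough to sketch in a few lines; only SAVE25 comes from the project, the other codes and payloads are made up for illustration:

def generate_retention_offer(churn_risk_score: float) -> dict:
    """Map a churn risk score (0-100) to a retention offer tier."""
    if churn_risk_score > 70:
        return {"code": "SAVE25", "discount": "25% off", "duration_months": 3}
    elif churn_risk_score >= 40:
        return {"code": "SAVE20", "discount": "20% off", "duration_months": 2}  # illustrative code
    else:
        return {"code": None, "offer": "service upgrade or add-on"}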
Web Search Lambda:
- DuckDuckGo API for real-time information (a minimal sketch follows this list)
- Helps the agent answer general retention strategy questions
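A minimal version of that lookup using DuckDuckGo's Instant Answer API; the wrapper function is a sketch, not the project's exact implementation:

import requests

def web_search(query: str) -> str:
    """Fetch a quick answer or abstract from the DuckDuckGo Instant Answer API."""
    resp = requests.get(
        "https://api.duckduckgo.com/",
        params={"q": query, "format": "json", "no_html": 1},
        timeout=10,
    )
    data = resp.json()
    return data.get("AbstractText") or data.get("Answer") or "No result found"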
Internal Tool: Product Catalog
In addition to the three external Lambda functions, the agent has an internal tool that runs directly within the AgentCore Runtime, so no external API call is needed. The get_product_catalog() tool provides real-time information about available telecom plans, pricing, add-on services, and retention offers, which covers questions like "What plans do you offer?" or "Tell me about your premium features." Keeping it internal means faster responses and lower latency for these common queries.
from strands import tool  # assumes the Strands Agents SDK used in the AgentCore samples

@tool
def get_product_catalog() -> str:
    """Get information about available telecom plans and services."""
    # Returns plan details, pricing, features, and retention offers
    # (the real catalog string is built inside the Runtime; placeholder shown here)
    formatted_catalog_info = "Plans: ... | Pricing: ... | Add-ons: ... | Retention offers: ..."
    return formatted_catalog_info
This demonstrates a key architectural pattern: use internal tools for static/reference data that doesn't require external systems, and use external tools (via Gateway) for dynamic data queries or actions that need database access.
All three functions are exposed via AgentCore Gateway using the MCP (Model Context Protocol). The Gateway handles authentication, request routing, and response formatting.
5. The Autonomous Reasoning Flow
Here's what happens when a customer asks: "Can you give me a discount code?"
- Agent Receives Request: Claude reads the prompt and system instructions
- Decision Making: Agent decides it needs customer churn data first
- Tool Call #1: Calls churn_data_query via Gateway → Lambda → Athena
- Risk Analysis: Receives churn risk score (e.g., 85%, HIGH risk)
- Decision Making: Agent decides to generate a retention offer
- Tool Call #2: Calls retention_offer with customer data
- Offer Generation: Lambda generates the SAVE25 discount code (25% off)
- Response: Agent synthesizes a natural response with the discount code
The agent makes all these decisions autonomously — I didn't hardcode the workflow. The system prompt guides the agent, but Claude decides when and how to use tools.
6. RAG with Bedrock Knowledge Base
The Knowledge Base stores:
- Company policies
- Troubleshooting guides
- FAQ documents
RAG Flow:
User Query → Agent → Knowledge Base → Retrieved Context → Enhanced Response
Using Amazon Titan Embeddings, documents get vectorized for semantic search. When a customer asks about policies, the agent retrieves relevant sections and includes them in the response.
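Under the hood, that retrieval step is a single boto3 call. A minimal sketch, with a placeholder knowledge base ID:

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

def retrieve_policy_context(question: str, kb_id: str = "YOUR_KB_ID") -> list[str]:
    """Semantic search over the Knowledge Base, returning the top matching chunks."""
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": question},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 3}},
    )
    return [result["content"]["text"] for result in response["retrievalResults"]]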
7. Data Connection: From Previous Project
The customer data comes from my previous serverless pipeline project. That pipeline:
- Ingested the Kaggle Telco dataset
- Converted CSV to Parquet with Glue ETL
- Partitioned data in S3
- Made it queryable via Athena
This agent project is the natural next step — taking that clean, query-ready data and making it accessible through conversational AI.
Key Technical Decisions
Why AgentCore Over DIY?
I could have built this with raw Lambda functions and LangChain, but AgentCore provided:
- Built-in Memory: No need to build my own vector database
- Gateway with MCP: Standardized protocol for tool integration
- Managed Runtime: No ECS clusters or container management
- Observability: CloudWatch integration out of the box
Why Dual Cognito Architecture?
- Security: Separates user authentication from agent-to-service authentication
- Scalability: M2M tokens can be cached and reused
- Best Practice: Follows OAuth 2.0 patterns for service-to-service communication
Why Synthetic Churn Model?
The dataset includes a cancel_intent field, which acts as our "pretend ML model." For a hackathon demo, this works perfectly without needing to train and deploy a separate ML model. In production, you'd integrate with SageMaker for real churn predictions.
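Swapping the synthetic model for a real one would mostly mean calling a SageMaker endpoint from the churn Lambda instead of reading cancel_intent. A hedged sketch, with a hypothetical endpoint name and feature payload:

import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

def predict_churn(features_csv: str) -> float:
    """Call a deployed churn model endpoint and return a churn probability."""
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName="churn-xgboost-endpoint",  # hypothetical endpoint name
        ContentType="text/csv",
        Body=features_csv,  # e.g. "12,Month-to-month,70.35,..."
    )
    return float(response["Body"].read())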
Security
Even for a hackathon project, I applied production security practices:
- IAM Roles: Least-privilege access for Lambda, Runtime, and Gateway
- JWT Authentication: Secure token-based auth with Cognito
- SSM Parameter Store: All secrets and config stored securely
- S3 Encryption: SSE-S3 for data at rest
- Private Lambda (TODO): Current Lambdas are public; production would use VPC
Challenges & Learnings
1. Cognito Complexity
Setting up dual authentication was harder than expected. Key lessons:
- USER_PASSWORD_AUTH flow must be explicitly enabled
- M2M clients need proper scopes configured
- Discovery URLs must be exact (.well-known/openid-configuration)
- Token decoding requires proper base64 padding (see the sketch below)
Working with Cognito was more complicated than I anticipated, but it forced me to deeply understand OAuth 2.0 flows and JWT token structure.
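That last lesson in code form: a JWT's payload segment is base64url-encoded with its padding stripped, so you have to restore it before decoding. A minimal sketch, not the project's actual helper:

import base64
import json

def decode_jwt_payload(token: str) -> dict:
    """Decode (without verifying) the payload segment of a JWT."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# claims = decode_jwt_payload(id_token)
# user_id = claims["sub"]  # the Cognito user ID later mapped to a dataset customer ID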
2. Cold Start Problem
The first request to the agent often timed out. Classic serverless cold start:
- AgentCore Runtime takes time to spin up
- Solution: Better error handling and retry logic (a rough sketch follows this list)
- Future: Consider provisioned concurrency for production
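The retry wrapper is nothing fancy; something along these lines (a sketch, not the project's exact code):

import time

def invoke_with_retry(invoke_fn, payload, max_attempts: int = 3, base_delay: float = 2.0):
    """Retry an agent invocation with exponential backoff to ride out cold starts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke_fn(payload)
        except Exception:  # in practice, catch timeout/throttling errors specifically
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # waits 2s, 4s, 8s, ...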
3. Multi-Step Tool Calling
Getting Claude to call churn_data_query first, then pass that data to retention_offer, required explicit prompt engineering:
SYSTEM_PROMPT = """
IMPORTANT: When customers ask for discount codes, you MUST:
1. First call the churn_data_query tool to get customer data
2. Then call the retention_offer tool with the complete churn_data
"""
Learning: LLMs need very explicit instructions for sequential workflows.
4. SSM Parameter Store Permissions
The auto-created Runtime execution role didn't include SSM permissions. Quick fix:
{
  "Effect": "Allow",
  "Action": ["ssm:GetParameter"],
  "Resource": "arn:aws:ssm:*:*:parameter/customer-retention-agent/*"
}
Learning: Always verify IAM permissions when integrating AWS services.
5. Local Development Setup
Testing locally before deploying was crucial:
- Used agentcore invoke --local to simulate Runtime
- Created an automated test suite (test_invoke_local.py)
- Tested with real AWS services (Lambda, Athena, Memory)
Learning: Local-first development saves time and AWS costs.
6. On-Demand Throughput Not Supported
Discovered that not all Bedrock models support on-demand throughput. Had to adjust model selection.
Learning: Read the AWS documentation carefully for service limitations.
7. Boto3 Sessions
Lambda functions need proper boto3 session management:
athena_client = boto3.client('athena', region_name='us-east-1')
Learning: Always specify region explicitly in Lambda functions.
What I Learned
Technical:
- AgentCore primitives (Runtime, Gateway, Memory) work incredibly well together
- MCP protocol standardizes tool integration
- Memory strategies: USER_PREFERENCE for explicit data, SEMANTIC for context
- JWT token structure and OAuth 2.0 flows
- RAG implementation with Bedrock Knowledge Base
- Serverless cold starts are real — plan accordingly
Architectural:
- Dual authentication is complex but necessary for production systems
- Tool design matters: focused, single-responsibility functions compose well
- Explicit prompt engineering is crucial for multi-step workflows
- Local testing infrastructure saves time and money
Data:
- Synthetic data (like cancel_intent) works great for demos
- Previous data pipeline projects can be extended with AI layers
- Parquet + Athena = fast, cost-effective queries
Next Steps
If I continue this project:
- Security Enhancements:
  - Make Lambdas private
  - Set up VPC and subnets
  - Add a Web Application Firewall (WAF)
- Responsible AI:
  - Content moderation with Bedrock Guardrails
  - Human oversight for high-value offers
  - Policy checks before generating discounts
- Production Features:
  - Real-time alerts when high-risk customers are detected
  - A/B testing for retention strategies
  - Analytics dashboard for offer effectiveness
  - Sentiment analysis for conversation tone
- Integration:
  - Connect to Confluence for live policy updates (Bedrock KB supports this!)
  - Integrate with CRM (Salesforce/HubSpot)
  - Multi-channel support (SMS, email, phone)
Conclusion
Building the Customer Retention Agent taught me that autonomous AI agents are production-ready today. With AWS Bedrock AgentCore, I went from idea to working demo faster than expected.
The hardest parts weren't the AI — they were the authentication, cold starts, and getting all the AWS services to work together. But that's the reality of building production systems.
This project is a natural continuation of my data pipeline work. The pipeline gave me clean data in Athena; the agent makes that data actionable through conversation. Together, they demonstrate how serverless + AI can solve real business problems.
Key takeaway: Modern cloud platforms make it possible to build sophisticated AI agents without managing infrastructure. The future of customer service is autonomous, personalized, and conversational.
Thanks to AWS for building Bedrock AgentCore and to Devpost for hosting the AI Agent Global Hackathon. Building with these tools has been an incredible learning experience! 🚀
Resources
- GitHub Repository: https://github.com/ajithmanmu/customer-retention-agent
- Demo Video: https://www.youtube.com/watch?v=nt2-iE_qBIw
- AWS AI Agent Hackathon: https://devpost.com/software/customer-retention-agent?ref_content=user-portfolio&ref_feature=in_progress
- Previous Project: Serverless Data Pipeline
- AWS Bedrock AgentCore Docs: https://aws.amazon.com/bedrock/agentcore/
- AgentCore Samples & Tutorials: https://github.com/awslabs/amazon-bedrock-agentcore-samples (Highly recommended for learning AgentCore!)