Abraham Arellano Tavara

Posted on Oct 19 • Originally published at myitbasics.com on Oct 18

After Asana's AI Breach: What It Takes to Deploy Production AI Agents Securely

#aws #machinelearning #security #cloudcomputing

When Asana's Model Context Protocol server leaked data from ~1,000 organizations due to a session isolation flaw in May 2025, it crystallized a question I hear constantly from enterprise CTOs: "Can we actually deploy AI agents without creating the next security incident?"

After spending the past year deploying Amazon Bedrock AgentCore with customers across Europe—from 18-year-old SAP systems to regulated financial services—I've learned that moving AI agents from prototype to production isn't a framework problem. It's an infrastructure problem that most teams discover too late.

The 3-Month vs 6-Month Gap

Here's the uncomfortable pattern I see repeatedly:

3 months: Build an impressive AI agent demo
6 months: Solve infrastructure problems you didn't know existed

The gap isn't about choosing LangChain vs CrewAI or Claude vs GPT. It's about infrastructure challenges that traditional application architectures never had to solve.

The 4 Infrastructure Problems That Kill Production Deployments

1. Session Isolation (The Asana Problem)

The issue: Traditional stateless functions terminate after each request. AI agents maintain complex state across multiple interactions—conversation history, tool permissions, intermediate computations.

Real-world impact: Cross-tenant data contamination when one user's agent context bleeds into another's session.

Production solution: Each user session requires its own dedicated microVM with isolated compute, memory, and filesystem resources. The microVM is terminated completely once the session ends.

# What actually happens in production
# Illustrative sketch: `Runtime` is a stand-in for the AgentCore runtime client, not a real import
import json

runtime = Runtime.create(
    name="customer-agent",
    container_image="agent:latest",
    protocol="AGENT_CORE_RPC",
    memory_size_mb=4096,
    vcpus=2
)

# Each session gets its own isolated microVM
response = runtime.invoke(
    runtime_session_id="user-12345",  # Isolated session
    payload=json.dumps({"prompt": "Analyze Q4 financials"}).encode()  # JSON body sent as bytes
)
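One detail worth making explicit: the session ID is what scopes the isolation, so passing a bare user ID would route all of that user's conversations into the same session. A minimal sketch, assuming you want one isolated session per conversation. `new_session_id` and its `<user>-<uuid4 hex>` format are hypothetical, not part of any AWS SDK, and any length or character constraints on session IDs should be checked against the current API reference:

```python
import uuid


def new_session_id(user_id: str) -> str:
    """Return a fresh, hard-to-guess session ID for one conversation.

    Hypothetical helper: the "<user>-<uuid4 hex>" format is an assumption,
    not an AgentCore requirement.
    """
    return f"{user_id}-{uuid.uuid4().hex}"


first = new_session_id("user-12345")
second = new_session_id("user-12345")

# Same user, two conversations, two distinct session IDs.
print(first != second)
```

The user ID prefix keeps sessions attributable in logs, while the random suffix prevents two conversations from sharing state by accident.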

2. Long-Running Workflows

The issue: Research agents analyzing competitive intelligence or processing regulatory documents can't complete in Lambda's 15-minute window.

Real-world example: A financial services agent analyzing SEC filings needs to:

Fetch documents (5-10 min)
Parse and extract data (15-20 min)
Cross-reference with historical data (10-15 min)
Generate compliance report (5-10 min)

Total time: 35-55 minutes

What you need: Agent sessions lasting up to 8 hours for multi-step agentic workflows.
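The arithmetic behind that total is easy to check. The sketch below sums the step durations from the SEC filing example and compares them with Lambda's 15-minute limit; the variable names are illustrative only:

```python
LAMBDA_LIMIT_MIN = 15  # Lambda's maximum execution time, in minutes

# (step, shortest duration, longest duration) in minutes, from the example above
steps = [
    ("Fetch documents", 5, 10),
    ("Parse and extract data", 15, 20),
    ("Cross-reference with historical data", 10, 15),
    ("Generate compliance report", 5, 10),
]

best_case = sum(shortest for _, shortest, _ in steps)
worst_case = sum(longest for _, _, longest in steps)

print(f"Total: {best_case}-{worst_case} minutes")
print("Fits in one Lambda invocation:", worst_case <= LAMBDA_LIMIT_MIN)
```

Even the best case is more than double the Lambda limit, and the parsing step alone can reach it.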

3. Identity Complexity

The issue: A single agent invocation might require:

OAuth authentication from the user
IAM roles for AWS resources
API keys for third-party services
All while maintaining proper permission boundaries

The gotcha I see constantly: OAuth token expiration during long-running sessions manifests as tool invocation failures after 60-90 minutes.

Production fix: Implement token refresh logic in your middleware rather than relying on cached credentials.
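A minimal sketch of that refresh logic, assuming your identity provider exposes some way to obtain a new access token and its lifetime. `RefreshingToken` and `fetch_token` are hypothetical names, not part of AgentCore or any OAuth library, and the 60-second safety margin is an arbitrary choice:

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class RefreshingToken:
    """Hands out an access token, refreshing it shortly before it expires."""

    fetch_token: Callable[[], Tuple[str, float]]  # returns (token, lifetime in seconds)
    skew_seconds: float = 60.0                    # refresh this long before expiry
    clock: Callable[[], float] = time.monotonic   # injectable for testing
    _token: Optional[str] = field(default=None, init=False, repr=False)
    _expires_at: float = field(default=0.0, init=False, repr=False)

    def get(self) -> str:
        now = self.clock()
        if self._token is None or now >= self._expires_at - self.skew_seconds:
            token, lifetime = self.fetch_token()
            self._token = token
            self._expires_at = now + lifetime
        return self._token


# Demo with a fake clock and a fake identity provider.
fake_now = [0.0]
issued = []


def fake_fetch() -> Tuple[str, float]:
    issued.append(f"token-{len(issued) + 1}")
    return issued[-1], 3600.0  # one-hour lifetime


tokens = RefreshingToken(fetch_token=fake_fetch, clock=lambda: fake_now[0])
a = tokens.get()       # first call fetches token-1
fake_now[0] = 1800.0
b = tokens.get()       # 30 minutes in: token-1 is still valid
fake_now[0] = 3590.0
c = tokens.get()       # inside the 60-second margin: refreshed to token-2
print(a, b, c)
```

Calling `get()` before every tool invocation, rather than caching the token string at session start, is what keeps a multi-hour session from failing when the first token expires.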

4. Observability for Non-Deterministic Systems

The challenge: When an agent produces unexpected results, you need to trace not just what happened, but why the foundation model made specific reasoning decisions across potentially dozens of tool invocations.

Traditional APM tools don't capture this level of detail.
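To illustrate the kind of record that is needed, here is a minimal sketch that captures one structured span per tool call under a shared trace ID. It is not the AgentCore Observability API: `traced_tool`, the in-memory `TRACE` list, and the stubbed tool are all hypothetical. It covers only the "what happened" half; the model's intermediate reasoning would need to be logged under the same trace ID to answer the "why".

```python
import functools
import json
import time
import uuid

TRACE = []  # in-memory sink; a real deployment would ship spans to a trace backend


def traced_tool(fn):
    """Record one structured span per tool call: inputs, outcome, duration."""

    @functools.wraps(fn)
    def wrapper(*args, trace_id, **kwargs):
        span = {"trace_id": trace_id, "tool": fn.__name__,
                "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            span["status"] = "ok"
            span["result"] = result
            return result
        except Exception as exc:
            span["status"] = "error"
            span["error"] = repr(exc)
            raise
        finally:
            # Runs on success and on failure, so every call leaves a span.
            span["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
            TRACE.append(span)

    return wrapper


@traced_tool
def check_order_status(order_number):
    # Stubbed result standing in for a real backend call.
    return {"order_number": order_number, "status": "SHIPPED"}


trace_id = uuid.uuid4().hex  # one ID shared by every call in an agent invocation
check_order_status("4500001234", trace_id=trace_id)
print(json.dumps(TRACE, indent=2, default=str))
```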

The SAP Integration Reality

Here's a question from a recent architecture review: "Can AgentCore connect to our SAP ECC 6.0 system?"

The system: 18 years old, custom ABAP code, no REST APIs.

This is enterprise reality. Most production systems weren't designed for modern API consumption.

The pattern that works:

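# AgentCore Gateway + Lambda middleware pattern
# Illustrative import; check the current SDK for the actual module and call names
from agentcore import Gateway

sap_order_tool = Gateway.create_tool(
    name="check_order_status",
    description="Retrieve SAP order status using order number",
    # Placeholder ARN: substitute your own account ID and function name
    lambda_function_arn="arn:aws:lambda:eu-central-1:123456:function:sap-rfc-connector",
    # JSON Schema describing the arguments the agent must supply
    input_schema={
        "type": "object",
        "properties": {
            "order_number": {"type": "string"}
        },
        "required": ["order_number"]
    }
)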

The Lambda function becomes your translation layer between the agent's expectations and SAP's proprietary RFC/BAPI protocols.

What actually fails in production:

Network timeouts between Lambda and on-premises SAP
OAuth token refresh during long sessions
SAP-specific error codes that agents can't interpret
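The last failure mode is the easiest to address inside the translation layer. A minimal sketch of a Lambda-style handler that validates input and turns SAP errors into messages an agent can act on. Everything here is an assumption for illustration: the event is assumed to carry the tool's input fields directly, the error codes in `SAP_ERRORS` are invented, and `call_sap` is an injected stub rather than a real RFC/BAPI call:

```python
from typing import Callable, Dict

# Hypothetical mapping from SAP error codes to text an agent can act on.
SAP_ERRORS: Dict[str, str] = {
    "ORDER_NOT_FOUND": "No order exists with that number. Ask the user to re-check it.",
    "NOT_AUTHORIZED": "The service account is not allowed to read this order.",
}


class SapError(Exception):
    """Raised by the SAP connector with a machine-readable code."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def make_handler(call_sap: Callable[[str], dict]):
    """Build a Lambda-style handler around an injected SAP call."""

    def handler(event, context=None):
        order_number = str(event.get("order_number", "")).strip()
        if not order_number:
            return {"ok": False, "error": "order_number is required"}
        try:
            return {"ok": True, "order": call_sap(order_number)}
        except SapError as exc:
            message = SAP_ERRORS.get(exc.code, f"SAP returned error code {exc.code}")
            return {"ok": False, "error": message}
        except TimeoutError:
            return {"ok": False, "error": "SAP did not respond in time; retry later"}

    return handler


def fake_sap(order_number: str) -> dict:
    # Stub standing in for the real RFC/BAPI connector.
    if order_number == "404":
        raise SapError("ORDER_NOT_FOUND")
    return {"order_number": order_number, "status": "SHIPPED"}


handler = make_handler(fake_sap)
print(handler({"order_number": "4500001234"}))
print(handler({"order_number": "404"}))
```

Returning a structured error instead of raising lets the agent explain the problem to the user or retry, rather than seeing an opaque tool failure.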

Cost Reality Check

When a customer asked about costs for 1,000 conversations daily (5 messages each, 3 tool calls per message), here's what it looked like in the Frankfurt region:

Runtime (2 vCPU, 4GB, 8-min avg sessions): ~$4,200/month
Gateway (15,000 tool calls daily): ~$225/month
Memory (5,000 events daily): ~$375/month
Observability (CloudWatch): ~$100/month

Total: ~$4,900/month
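The volumes and the total can be reproduced from the stated workload. In the sketch below, the line-item prices are the estimates quoted above, not values computed from a price list, and reading the 5,000 daily memory events as one event per message is an inference from the numbers:

```python
conversations_per_day = 1_000
messages_per_conversation = 5
tool_calls_per_message = 3

messages_per_day = conversations_per_day * messages_per_conversation
tool_calls_per_day = messages_per_day * tool_calls_per_message

# Monthly estimates in USD, as quoted in the breakdown above
monthly_usd = {
    "Runtime": 4_200,
    "Gateway": 225,
    "Memory": 375,
    "Observability": 100,
}
total = sum(monthly_usd.values())

print(tool_calls_per_day)  # 15000 Gateway tool calls per day
print(messages_per_day)    # 5000, matching the daily memory events
print(total)               # 4900
```

Runtime accounts for roughly 86% of the total, so session length and instance sizing are the first levers to examine.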

The comparison that matters: Building equivalent infrastructure in-house requires a senior engineer (€90K annually = €7,500/month) for 3+ months of development, plus ongoing maintenance.

Break-even point: 3 months

When AgentCore Makes Sense

✅ Yes, use it when:

Multi-tenant applications where session isolation is critical
Regulated industries with audit requirements (finance, healthcare)
Complex integrations across SAP, Salesforce, ServiceNow
OAuth identity requirements where agents act on behalf of users

❌ No, don't use it when:

High-frequency, sub-100ms latency requirements
Simple automation tasks (single database queries)
Budget constraints below $3-5K monthly
You need complete infrastructure control

The Architecture Insight That Changed Everything

AgentCore isn't competing with LangChain, CrewAI, or LlamaIndex.

AgentCore is the infrastructure those frameworks run on. Think Kubernetes for AI agents—you bring your framework and model, AgentCore provides production-grade runtime, security, and operational tooling.

GDPR Reality for European Markets

The critical gotcha I've seen catch multiple organizations:

AgentCore Memory supports both short-term event retention and long-term storage. By default, long-term memory persists indefinitely.

You must configure time-to-live policies to satisfy GDPR's storage-limitation principle, and you need explicit deletion workflows to honor its right to erasure.

My recommendation:

90-day retention for short-term memory
Explicit deletion workflows for long-term storage
Deploy in the Frankfurt region (eu-central-1) for data residency
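The first two recommendations boil down to two separate operations: a time-based purge and a per-user erasure. A minimal sketch over an in-memory list, assuming each record carries a user ID and a creation timestamp. `MemoryRecord`, `purge_expired`, and `erase_user` are hypothetical and do not reflect the AgentCore Memory API:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

RETENTION = timedelta(days=90)  # the 90-day window recommended above


@dataclass
class MemoryRecord:
    user_id: str
    created_at: datetime
    content: str


def purge_expired(records: List[MemoryRecord], now: datetime) -> List[MemoryRecord]:
    """Keep only records younger than the retention window."""
    return [r for r in records if now - r.created_at < RETENTION]


def erase_user(records: List[MemoryRecord], user_id: str) -> List[MemoryRecord]:
    """Handle an erasure request by dropping everything tied to one user."""
    return [r for r in records if r.user_id != user_id]


now = datetime(2025, 10, 18, tzinfo=timezone.utc)
store = [
    MemoryRecord("alice", now - timedelta(days=120), "old preference"),
    MemoryRecord("alice", now - timedelta(days=10), "recent preference"),
    MemoryRecord("bob", now - timedelta(days=5), "recent order"),
]

store = purge_expired(store, now)   # drops the 120-day-old record
store = erase_user(store, "alice")  # drops alice's remaining record
print([r.user_id for r in store])   # ['bob']
```

A retention window alone does not cover erasure requests: the second function is needed because a user can ask for deletion well before the 90 days are up.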

The "Start Simple" Pattern That Works

Based on successful deployments:

Week 1-3: Prototype in free tier (until Sep 16, 2025)

Build agent using your preferred framework
Deploy to AgentCore Runtime
Integrate 2-3 tools through Gateway

Week 4-10: Pilot with 100-500 users

Monitor costs and observability
Refine tool integrations
Gather user feedback

Week 11+: Production rollout

Start with one use case
Expand based on ROI
Implement memory strategies

The teams that struggled tried to migrate entire application portfolios at once without understanding the cost implications.

Key Takeaways

Session isolation isn't optional for multi-tenant agents. The Asana incident demonstrated what happens when isolation fails.
Integration complexity compounds quickly. Every additional backend system adds authentication layers, error handling, and monitoring. Gateway's automatic conversion of existing APIs and Lambda functions into agent tools can eliminate months of work.
Production agents require production infrastructure. Memory management, observability, and identity controls aren't features you add later—they're foundational.

Discussion Questions

I'd love to hear your perspective:

Have you deployed AI agents in production? What infrastructure challenges surprised you?
For those running SAP or legacy systems—how are you handling integration?
What's your biggest concern: security, cost, or complexity?

Full technical deep-dive (with code examples, architecture diagrams, and cost breakdowns):
👉 Read the complete guide on MyITBasics

This covers:

Complete SAP integration architecture with authentication flows
Regional deployment strategies for GDPR compliance
Debugging common production issues
Implementation quickstart with Dockerfile
All 7 AgentCore services explained in detail

Abraham Arellano Tavara | Senior Strategic Solutions Architect, AWS Munich
