Agentic AI + RAG on AWS

The Problem

Technical teams spend too much time on communication overhead: writing status updates, translating technical progress into executive language, and hunting for the right framework when a stakeholder's question lands minutes before a steering committee meeting.

I built an AI application to solve this. Two tools, one platform:

  • Executive Polish — paste a raw draft, get two executive-ready variants tuned to audience, tone, and channel in seconds
  • PM Coach — an AI coach grounded in PMP, SAFe, and Agile frameworks with persistent conversation memory and a retrieval-grounded knowledge base

Architecture

A serverless AWS architecture that serves two API invocation patterns, buffered REST and streamed SSE, from one coherent platform.

Browser
  ├── HTTPS  → Amplify (CloudFront CDN)
  ├── REST   → API Gateway (Cognito Authorizer) → API Lambda
  │             ├── Rewrites   → Bedrock Model
  │             ├── History    → DynamoDB
  │             ├── User state → DynamoDB
  │             └── Payments   → Webhook → Secrets Manager
  └── SSE    → Lambda Function URL → Proxy Lambda
                └── invoke_agent_runtime → Bedrock AgentCore
                      ├── Strands Agent  → Bedrock Model
                      ├── Knowledge Base → S3 + Bedrock Retrieve
                      └── Memory        → AgentCore (4 strategies)

AWS Resources

| Layer | Service | Purpose |
|---|---|---|
| Compute | AWS Lambda (×2) | API handler + streaming proxy |
| AI | Amazon Bedrock | Model inference |
| AI | Bedrock AgentCore | Managed agent runtime |
| AI | Bedrock Knowledge Bases | Retrieval-grounded PM framework KB |
| AI | Bedrock Guardrails | Content safety + denied topics |
| Auth | Amazon Cognito | Email + Google OAuth |
| API | Amazon API Gateway | Authenticated REST routes |
| Data | Amazon DynamoDB | Users, history, sessions |
| Secrets | AWS Secrets Manager | API credentials |
| Security | AWS IAM | Least-privilege execution roles |
| CDN | AWS Amplify + CloudFront | React frontend, edge delivery |
| Observability | Amazon CloudWatch | Structured logs |
| IaC | AWS CDK | 5 CloudFormation stacks |

CDK Stack Order

Auth → Storage → ToolProxy → Api → Frontend

Each stack owns its resources exclusively. No console changes. CDK is the only path to production.


Lessons Learned

The Executive Polish tool is synchronous, so it runs on a standard Lambda behind API Gateway. The AI coach, by contrast, needs token-by-token streaming.

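API Gateway, as configured here, buffers the full Lambda response, so it cannot stream tokens to the browser. The solution: a Lambda Function URL with InvokeMode: RESPONSE_STREAM, in front of a function that runs the Lambda Web Adapter in response_stream mode (AWS_LWA_INVOKE_MODE=response_stream).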

Two bugs I hit:

Bug 1 — the SDK buffers by default.
Converting the AgentCore response stream into a list forces the SDK to read the entire body before yielding anything. Fix: iter_lines(chunk_size=10) reads from the socket incrementally, so each line is forwarded as soon as it arrives.

Bug 2 — React 18 batches rapid state updates.
Even with chunks arriving at the browser, the UI rendered everything at once. Fix: flushSync() from react-dom forces a synchronous render per token.

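import json
# Streaming endpoint: incremental socket reads (wrapping this in list() would buffer everything)
for line in response_obj.iter_lines(chunk_size=10):
    if not line:
        continue  # skip blank SSE separator lines
    text = line.decode("utf-8").removeprefix("data: ")  # Python 3.9+
    parsed = json.loads(text)  # assumes each event payload is JSON
    yield f"data: {json.dumps(parsed)}\n\n"

# `remaining` is the caller's free-tier quota after this request
yield f"data: {json.dumps({'usage_remaining': remaining})}\n\n"
yield "data: [DONE]\n\n"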
import { flushSync } from "react-dom";

// React 18 batches rapid setState calls; flushSync forces one render per token
flushSync(() => {
  setMessages(prev => appendChunk(prev, chunk));
});
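To make the wire format concrete, here is a minimal Python sketch of a consumer for that event stream. It assumes each event is a single `data: <json>` line, events are separated by blank lines, and the stream ends with `data: [DONE]`. The names `consume_sse` and `fake_stream` are hypothetical; the function accepts any iterable of byte lines, such as the output of `iter_lines()`, so each event is handled as its line arrives.

```python
import json
from typing import Iterable, Iterator


def consume_sse(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield each `data:` payload as soon as its line arrives; stop at [DONE]."""
    for raw in lines:
        # Simplification: trims all surrounding whitespace, not just one leading space.
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # blank separator, SSE comment, or another field
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel: ignore anything after it
        yield json.loads(payload)


def fake_stream() -> Iterator[bytes]:
    """Stand-in for iter_lines(): each line is produced lazily, one at a time."""
    yield b'data: {"text": "Hello"}'
    yield b""
    yield b'data: {"text": " world"}'
    yield b""
    yield b'data: {"usage_remaining": 4}'
    yield b""
    yield b"data: [DONE]"
    yield b'data: {"text": "never read"}'


if __name__ == "__main__":
    for event in consume_sse(fake_stream()):
        print(event)  # printed as each line is consumed, not after the stream ends
```

Because `consume_sse` is a generator over a generator, the first event is available after a single line is read. Wrapping the source in `list(...)` first would reintroduce the buffering described in Bug 1.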

AgentCore + Strands Agents

The AI coach runs on Bedrock AgentCore — AWS's managed agent runtime. The agent itself is ~60 lines using the Strands Agents SDK.

The model is configured fail-closed — if the guardrail ID is unset at runtime, the agent raises and never invokes unguarded:

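import os
from strands.models import BedrockModel

def load_model():
    guardrail_id = os.environ.get("BEDROCK_GUARDRAIL_ID")
    # Strands attaches the guardrail only when both ID and version are set
    guardrail_version = os.environ.get("BEDROCK_GUARDRAIL_VERSION")  # variable name assumed
    if not guardrail_id or not guardrail_version:
        raise ValueError("Guardrail ID and version must be set; fail closed")
    return BedrockModel(
        model_id="...",
        guardrail_id=guardrail_id,
        guardrail_version=guardrail_version,
        max_tokens=2048,
    )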

AgentCore provides four-strategy persistent memory out of the box:

| Strategy | What it stores |
|---|---|
| Semantic | Facts about the user |
| User Preference | Working and communication style |
| Summarization | Session summaries across conversations |
| Episodic | Conversation history within sessions |

The KB retrieval tool grounds every response in a curated PM framework knowledge base, which keeps the agent anchored to documented methodologies instead of fabricated ones.
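One way such a tool could shape its output, sketched under assumptions: it takes a Bedrock Knowledge Bases `Retrieve` response and passes the agent only the passage text. The `location` and `metadata` fields are dropped so S3 object keys (that is, KB file names) never enter the model context, and an explicit "no grounding" message is returned when nothing clears a score threshold. The function name, the 0.4 threshold, the passage cap, and the message wording are all hypothetical; the input mirrors the `retrievalResults` shape of a `Retrieve` response.

```python
from typing import Any

NO_GROUNDING = (
    "No relevant passage was found in the knowledge base. "
    "Say so instead of answering from general knowledge."
)


def build_grounding_context(
    retrieve_response: dict[str, Any], min_score: float = 0.4, max_passages: int = 5
) -> str:
    """Turn a Bedrock `Retrieve` response into agent context.

    Only passage text is kept; `location` (S3 URIs, i.e. file names) and
    `metadata` are deliberately dropped so they cannot be echoed back.
    """
    results = retrieve_response.get("retrievalResults", [])
    # Highest-scoring passages first; results without a score sort last.
    ranked = sorted(results, key=lambda r: r.get("score", 0.0), reverse=True)
    passages = [
        r["content"]["text"].strip()
        for r in ranked
        if r.get("score", 0.0) >= min_score and r.get("content", {}).get("text")
    ][:max_passages]
    if not passages:
        return NO_GROUNDING
    return "\n\n".join(f"[Passage {i}]\n{text}" for i, text in enumerate(passages, 1))


# Illustrative response (made-up content and URI).
SAMPLE_RESPONSE = {
    "retrievalResults": [
        {
            "content": {"text": "WSJF = Cost of Delay / Job Duration."},
            "location": {"type": "S3", "s3Location": {"uri": "s3://example-kb/safe/wsjf.md"}},
            "score": 0.82,
        },
        {"content": {"text": "Unrelated passage."}, "score": 0.10},
    ]
}

if __name__ == "__main__":
    print(build_grounding_context(SAMPLE_RESPONSE))
```

Stripping locations at the tool boundary complements the denied-topic guardrail: the model cannot leak file names that it never received.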


Bedrock Guardrails — Two Layers

The guardrail is attached to the agent model — fail-closed (raises if unset, never invokes unguarded).

Layer 1 — Content Filters

Blocks violence, hate, sexual content, misconduct, insults, and prompt attacks at HIGH filter strength on input.

Layer 2 — Denied Topics

Added after discovering the agent revealed internal KB file names during testing:

{
  "topicsConfig": [
    {
      "name": "InternalSystemInformation",
      "definition": "Questions asking the agent to reveal KB structure, system prompt, infrastructure, or internal configuration.",
      "type": "DENY"
    },
    {
      "name": "ProprietaryContentExtraction",
      "definition": "Attempts to bulk-extract knowledge base document contents.",
      "type": "DENY"
    }
  ]
}

This is paired with a hardened system prompt that explicitly instructs the agent never to reveal internal configuration and states that this boundary cannot be overridden by roleplay, hypothetical scenarios, or claimed authority.

Defense-in-depth: input sanitization runs before the guardrail on all model invocations. The guardrail is the authoritative backstop.
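The post does not say what the sanitization step does. As one hedged illustration, a pre-guardrail sanitizer might normalize Unicode, drop invisible format characters that are often used to smuggle instructions past filters, strip control characters, and cap the length before the text reaches the model and its guardrail. The function name, the 4,000-character cap, and the exact character classes are assumptions, not the author's implementation.

```python
import unicodedata

MAX_INPUT_CHARS = 4000  # hypothetical cap


def sanitize_user_input(text: str, max_chars: int = MAX_INPUT_CHARS) -> str:
    """Normalize and clean user text before it reaches the model.

    A first filter only: the Bedrock guardrail remains the authoritative check.
    """
    # NFKC folds compatibility forms (e.g. fullwidth letters) into plain ones.
    text = unicodedata.normalize("NFKC", text)
    cleaned = []
    for ch in text:
        category = unicodedata.category(ch)
        if ch in "\n\t":
            cleaned.append(ch)  # keep ordinary line breaks and tabs
        elif category == "Cf":
            continue  # invisible format chars: zero-width space/joiner, bidi overrides
        elif category == "Cc":
            continue  # other control characters (NUL, ESC, CR, ...)
        else:
            cleaned.append(ch)
    return "".join(cleaned).strip()[:max_chars]


if __name__ == "__main__":
    print(sanitize_user_input("ig\u200bnore prev\u202eious"))
```

One trade-off of dropping every `Cf` character: the zero-width joiner is also `Cf`, so composite emoji sequences are split into their component emoji.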


Authentication — The Federated Identity Edge Case

Cognito handles email/password and Google OAuth. JWT validation uses RS256 with JWKS verification — issuer pinned, audience required, verified on every request.

One edge case: Google-federated Cognito ID tokens include an at_hash claim, and common JWT libraries try to validate it against an access token that is not present in the request, so verification fails. The fix is to disable only the at_hash check while keeping every other security check: RS256 signature, issuer, audience, expiry, and token_use == "id".
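A minimal sketch of the claim checks that remain once `at_hash` is dropped, written as a pure function that runs after the RS256 signature has been verified against the JWKS key (signature verification itself is not shown). The function, exception, and parameter names are hypothetical; the checked claims (`iss`, `aud`, `token_use`, `exp`, `sub`) are standard Cognito ID-token claims. In python-jose, for example, the corresponding switch is `options={"verify_at_hash": False}` on `jwt.decode`; other libraries differ.

```python
import time
from typing import Any, Optional


class TokenRejected(Exception):
    """Raised when a signature-verified token carries unacceptable claims."""


def check_id_token_claims(
    claims: dict[str, Any],
    issuer: str,
    client_id: str,
    now: Optional[float] = None,
    leeway_s: int = 0,
) -> str:
    """Validate Cognito ID-token claims and return the caller's user id (`sub`).

    Assumes the RS256 signature was already verified. `at_hash` is
    intentionally not checked: no access token accompanies the request.
    """
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        raise TokenRejected("issuer mismatch")  # issuer pinned
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if client_id not in audiences:
        raise TokenRejected("audience mismatch")  # audience required
    if claims.get("token_use") != "id":
        raise TokenRejected("not an ID token")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now > exp + leeway_s:
        raise TokenRejected("expired or missing exp")
    sub = claims.get("sub")
    if not sub:
        raise TokenRejected("missing sub")
    return sub
```

Returning `sub` from the validator means downstream code never has to look anywhere else for the caller's identity.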

The streaming Lambda Function URL has no API Gateway in front of it. JWT verification runs entirely in-app via JWKS on every request.
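With nothing in front of the Function URL, the app itself has to pull the token out of the request and find the right signing key. A hedged sketch of those two steps: extract a `Bearer` token case-insensitively, read the unverified `kid` from the JWT header, and look it up in a cached JWKS document, refetching once on an unknown `kid` (which is how key rotation typically surfaces). The function and class names and the one-hour TTL are assumptions. The fetch function is injected so that any HTTP client can supply the JSON from the user pool's `/.well-known/jwks.json` URL.

```python
import base64
import json
import time
from typing import Any, Callable, Mapping, Optional


def extract_bearer_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the token from an `Authorization: Bearer <token>` header, if any."""
    for name, value in headers.items():
        if name.lower() == "authorization":
            scheme, _, token = value.partition(" ")
            if scheme.lower() == "bearer" and token.strip():
                return token.strip()
    return None


def unverified_kid(token: str) -> str:
    """Read `kid` from the JWT header (base64url JSON); used only to pick a key."""
    header_b64 = token.split(".", 1)[0]
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))["kid"]


class JwksCache:
    """Caches a JWKS document and looks signing keys up by `kid`."""

    def __init__(self, fetch: Callable[[], dict[str, Any]], ttl_s: float = 3600):
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._keys: dict[str, dict[str, Any]] = {}
        self._fetched_at = float("-inf")  # forces a fetch on first use

    def _refresh(self) -> None:
        self._keys = {k["kid"]: k for k in self._fetch()["keys"]}
        self._fetched_at = time.monotonic()

    def key_for(self, kid: str) -> dict[str, Any]:
        if time.monotonic() - self._fetched_at > self._ttl_s:
            self._refresh()  # cold start or stale cache
        if kid not in self._keys:
            self._refresh()  # possible key rotation: refetch once
        if kid not in self._keys:
            raise KeyError(f"unknown signing key: {kid}")
        return self._keys[kid]
```

The selected JWK then goes to whichever JWT library performs the RS256 signature check; the `kid` read here is only a lookup hint and is trusted only after that signature verifies.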


CI/CD — 7 Stages, 2 Hard Security Gates

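Every push to main runs the pipeline below, and each stage gates the next: deploy runs only after the backend tests, the frontend tests, and the Snyk scan all pass.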

backend-tests ──────────────────────────────────────────┐
  pytest 130 tests (unit + integration + IDOR)          │
  flake8 + black                                        │
                                                        ▼
frontend-tests ──── snyk ──────────────────────────── deploy
  Vitest 137 tests    │                                  │
  TypeScript + ESLint │                                  ├─ Build Lambda packages
  Production build    │                                  ├─ CDK Synth
                      │                                  ├─ Checkov (hard gate)
  297 npm deps ───────┘                                  ├─ CDK Deploy (5 stacks)
  High/critical CVEs block deploy                        └─ Amplify Deploy
  Caught a real CVE before prod ✓

Snyk Dependency Scan

297 npm packages scanned on every push. High and critical severity CVEs block the pipeline. Caught a real high-severity vulnerability in a transitive dependency before it reached production.

Checkov IaC Scan

CloudFormation templates scanned before every deploy — hard gate. Results render as a structured table in the GitHub Actions step summary. Intentional skips are documented with business justification, not silently suppressed.
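A sketch of how a pipeline step could enforce "documented, not silently suppressed" and produce the step-summary table. The gate fails on any failed check and on any skip without a justification comment. The report shape is an assumption (a Checkov JSON report with `results.passed_checks`, `results.failed_checks`, and `results.skipped_checks`, and the skip reason under `check_result.suppress_comment`), so verify those keys against your Checkov version. `GITHUB_STEP_SUMMARY` is the file GitHub Actions renders as the job summary; the function names are hypothetical.

```python
import os
from typing import Any


def summarize_checkov(report: dict[str, Any]) -> tuple[str, bool]:
    """Render a Markdown table of findings and decide whether the gate passes."""
    results = report.get("results", {})
    rows = ["| Status | Check | Resource | Note |", "|---|---|---|---|"]
    ok = True
    for check in results.get("failed_checks", []):
        ok = False  # any failed check blocks the deploy
        rows.append(f"| FAILED | {check['check_id']} | {check['resource']} | {check.get('check_name', '')} |")
    for check in results.get("skipped_checks", []):
        reason = (check.get("check_result", {}).get("suppress_comment") or "").strip()
        if not reason:
            ok = False  # an undocumented skip is treated like a failure
            reason = "missing justification"
        reason = reason.replace("|", "\\|")  # keep the Markdown table intact
        rows.append(f"| SKIPPED | {check['check_id']} | {check['resource']} | {reason} |")
    passed = len(results.get("passed_checks", []))
    rows.append(f"\n{passed} passed, gate {'PASSED' if ok else 'FAILED'}")
    return "\n".join(rows), ok


def write_step_summary(markdown: str) -> None:
    """Append to the GitHub Actions job summary when running inside Actions."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if path:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(markdown + "\n")
```

A CI step would call both functions and exit non-zero when `ok` is false. That exit status is what makes the scan a hard gate rather than a report.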

Engineering Principles

SOLID throughout: every service sits behind an abstract interface, and every router depends on injected abstractions. Because implementations honor the Liskov Substitution Principle (LSP), tests can substitute mocks without touching production code.
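A minimal sketch of that pattern, with hypothetical names. The router receives a `HistoryStore` through its constructor. Because the in-memory fake honors the same contract as a production (for example, DynamoDB-backed) implementation, a test can swap it in without changing any router code.

```python
from abc import ABC, abstractmethod


class HistoryStore(ABC):
    """Abstraction the router depends on; DynamoDB would back it in production."""

    @abstractmethod
    def list_for_user(self, user_id: str) -> list[dict]:
        ...

    @abstractmethod
    def add(self, user_id: str, item: dict) -> None:
        ...


class InMemoryHistoryStore(HistoryStore):
    """Test double with the same contract, so it can replace the real store (LSP)."""

    def __init__(self) -> None:
        self._items: dict[str, list[dict]] = {}

    def list_for_user(self, user_id: str) -> list[dict]:
        return list(self._items.get(user_id, []))  # copy: callers can't mutate state

    def add(self, user_id: str, item: dict) -> None:
        self._items.setdefault(user_id, []).append(item)


class HistoryRouter:
    """Depends only on the abstraction, injected through the constructor."""

    def __init__(self, store: HistoryStore) -> None:
        self._store = store

    def get_history(self, user_id: str) -> dict:
        items = self._store.list_for_user(user_id)
        return {"count": len(items), "items": items}


if __name__ == "__main__":
    store = InMemoryHistoryStore()
    store.add("u1", {"text": "draft"})
    print(HistoryRouter(store).get_history("u1"))
```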

Security by design: user_id is always derived from the verified JWT sub claim, never from the request body. Generic error responses prevent information leakage, and every endpoint has an IDOR test.
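A small sketch of what that looks like in a handler, together with the kind of IDOR test described. The in-memory `HISTORY` dict, the handler signature, and the status codes are hypothetical stand-ins. The points shown are that the owner comes only from the verified claims, the untrusted body is never consulted for identity, and failures return a generic message while details go to the server log.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)

# Hypothetical in-memory store: owner user_id -> saved rewrites.
HISTORY: dict[str, list[str]] = {}


def get_history_handler(claims: dict[str, Any], body: dict[str, Any]) -> tuple[int, dict[str, Any]]:
    """Return (status, payload) for the caller's own history.

    `claims` come from the verified JWT; `body` is untrusted client input and
    is deliberately never consulted for the owner's identity.
    """
    user_id = claims.get("sub")
    if not user_id:
        return 401, {"error": "Unauthorized"}
    try:
        return 200, {"items": list(HISTORY.get(user_id, []))}  # a store call in practice
    except Exception:
        # Full details go to the server log only; the client gets a generic message.
        logger.exception("history lookup failed")
        return 500, {"error": "Internal error"}


def test_history_is_scoped_to_token_subject() -> None:
    """IDOR test: naming another user in the body must not expose their data."""
    HISTORY.clear()
    HISTORY["alice"] = ["alice's draft"]
    status, payload = get_history_handler({"sub": "mallory"}, {"user_id": "alice"})
    assert status == 200
    assert payload == {"items": []}


if __name__ == "__main__":
    test_history_is_scoped_to_token_subject()
    print("IDOR test passed")
```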

Atomic rate limiting:

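# Prevent concurrent requests from bypassing the free-tier limit: the limit check
# and the increment are one conditional write, evaluated atomically by DynamoDB.
# If the check fails, boto3 raises ClientError with code ConditionalCheckFailedException.
response = self.table.update_item(
    Key={"user_id": user_id},  # key attribute name assumed
    UpdateExpression="SET daily_usage_count = daily_usage_count + :one",
    ConditionExpression="daily_usage_count < :limit",
    ExpressionAttributeValues={":one": 1, ":limit": daily_limit},  # daily_limit: free-tier cap (name assumed)
    ReturnValues="UPDATED_NEW",
)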
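Why a single conditional write matters can be shown without AWS. The sketch below is a local stand-in, not DynamoDB: a lock makes "check the limit and increment" one indivisible step, which is the guarantee the `ConditionExpression` provides on the server. Fifty concurrent requests against a limit of 10 then produce exactly 10 successes. A separate read followed by a write would let several requests observe the same count and all pass. The class and function names and the 429 mapping are illustrative assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ConditionFailed(Exception):
    """Local analogue of DynamoDB's ConditionalCheckFailedException."""


class UsageCounter:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.count = 0
        self._lock = threading.Lock()

    def increment_if_below_limit(self) -> int:
        # Check and write happen under one lock, like a single conditional update_item.
        with self._lock:
            if not self.count < self.limit:  # plays the role of ConditionExpression
                raise ConditionFailed()
            self.count += 1  # plays the role of UpdateExpression
            return self.count  # like ReturnValues="UPDATED_NEW"


def handle_request(counter: UsageCounter) -> int:
    """Map the outcome to an HTTP status: 200 allowed, 429 over the free tier."""
    try:
        counter.increment_if_below_limit()
        return 200
    except ConditionFailed:
        return 429


if __name__ == "__main__":
    counter = UsageCounter(limit=10)
    with ThreadPoolExecutor(max_workers=16) as pool:
        statuses = list(pool.map(lambda _: handle_request(counter), range(50)))
    print(statuses.count(200), statuses.count(429), counter.count)  # 10 40 10
```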

Tests as a behavioral contract — 130 backend tests, 137 frontend tests. No green tests, no deploy.


The Numbers

| Metric | Value |
|---|---|
| Time to live users | 30 days |
| AWS service layers | 8 |
| CDK stacks | 5 |
| DynamoDB tables | 3 (PITR on all) |
| Lambda functions | 2 |
| AgentCore runtime | 1 |
| Backend tests | 130 |
| Frontend tests | 137 |
| npm deps scanned per deploy | 297 |

What AgentCore Changes

Before AgentCore, building a production agent meant managing session state, memory storage, retrieval orchestration, and model invocation yourself. AgentCore provides all of this as a managed runtime with CDK-native deployment. The agent code stays focused on behavior — not infrastructure plumbing.


AWS Community Builder — Serverless specialty. Built with Bedrock AgentCore, Lambda, DynamoDB, Cognito, CDK, Amplify, and Claude Code.
