Benjamin Nwokoye for AWS Community Builders

Posted on Jun 10

Prod Grade Agentic AI + RAG on AWS

#ai #aws #serverless #showdev

The Problem

Technical teams spend too much time on communication overhead: writing status updates, translating technical progress into executive language, and searching for the right framework when a stakeholder's question lands minutes before a steering committee meeting.

I built an AI application to solve this. Two tools, one platform:

Executive Polish — paste a raw draft, get two executive-ready variants tuned to audience, tone, and channel in seconds
PM Coach — an AI coach grounded in PMP, SAFe, and Agile frameworks with persistent conversation memory and a retrieval-grounded knowledge base

Live at app.bennwokoye.com

Architecture

A serverless AWS architecture that serves three invocation patterns (static HTTPS, authenticated REST, and SSE streaming) from one coherent platform.

Browser
  ├── HTTPS  → Amplify (CloudFront CDN)
  ├── REST   → API Gateway (Cognito Authorizer) → API Lambda
  │             ├── Rewrites   → Bedrock Model
  │             ├── History    → DynamoDB
  │             ├── User state → DynamoDB
  │             └── Payments   → Webhook → Secrets Manager
  └── SSE    → Lambda Function URL → Proxy Lambda
                └── invoke_agent_runtime → Bedrock AgentCore
                      ├── Strands Agent  → Bedrock Model
                      ├── Knowledge Base → S3 + Bedrock Retrieve
                      └── Memory        → AgentCore (4 strategies)

AWS Resources

Layer	Service	Purpose
Compute	AWS Lambda (×2)	API handler + streaming proxy
AI	Amazon Bedrock	Model inference
AI	Bedrock AgentCore	Managed agent runtime
AI	Bedrock Knowledge Bases	Retrieval-grounded PM framework KB
AI	Bedrock Guardrails	Content safety + denied topics
Auth	Amazon Cognito	Email + Google OAuth
API	Amazon API Gateway	Authenticated REST routes
Data	Amazon DynamoDB	Users, history, sessions
Secrets	AWS Secrets Manager	API credentials
Security	AWS IAM	Least-privilege execution roles
CDN	AWS Amplify + CloudFront	React frontend, edge delivery
Observability	Amazon CloudWatch	Structured logs
IaC	AWS CDK	5 CloudFormation stacks

CDK Stack Order

Auth → Storage → ToolProxy → Api → Frontend

Each stack owns its resources exclusively. No console changes. CDK is the only path to production.

Lessons Learned

The Executive Polish tool is synchronous, so it runs on a standard Lambda behind API Gateway. The AI coach, by contrast, requires token-by-token streaming.

API Gateway buffers the full response before returning it, so it cannot stream tokens to the browser. The solution: a Lambda Function URL with InvokeMode: RESPONSE_STREAM, with the app running behind the Lambda Web Adapter in response_stream mode.

Two bugs I hit:

Bug 1 — the SDK buffers by default.
Iterating the AgentCore EventStream as a list forces the SDK to load the full response before yielding. Fix: iter_lines(chunk_size=10) reads from the socket incrementally.
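To see why a small chunk_size matters, here is a simplified, stdlib-only model of line iteration over a socket-like stream. It is not botocore's implementation; the function name iter_lines_incremental and the CountingStream stand-in for the socket are assumptions for illustration. A line is yielded as soon as its newline arrives in some chunk, instead of after the whole body has been read.

```python
import io
from typing import BinaryIO, Iterator


def iter_lines_incremental(stream: BinaryIO, chunk_size: int = 10) -> Iterator[bytes]:
    """Yield complete lines as soon as they are available, reading chunk_size bytes at a time."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)  # small reads: the first line surfaces after a few bytes
        if not chunk:
            break
        pending += chunk
        # Emit every complete line buffered so far; keep the trailing partial line
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line
    if pending:
        yield pending  # flush a final line that has no trailing newline


class CountingStream(io.BytesIO):
    """BytesIO that records how many bytes have been read, to show incremental progress."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size: int = -1) -> bytes:
        data = super().read(size)
        self.bytes_read += len(data)
        return data


body = b'data: {"t": "Hel"}\ndata: {"t": "lo"}\n' + b"x" * 1000 + b"\n"
stream = CountingStream(body)
lines = iter_lines_incremental(stream, chunk_size=10)
first = next(lines)
print(first, stream.bytes_read)  # first line is ready after reading 20 of 1038 bytes
```

A buffered read would consume all 1,038 bytes before yielding anything; with 10-byte reads the first token event is available after 20.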

Bug 2 — React 18 batches rapid state updates.
Even with chunks arriving at the browser, the UI rendered everything at once. Fix: flushSync() from react-dom forces a synchronous render per token.

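import json

# Streaming endpoint: incremental socket reads (response_obj is the streaming body returned by invoke_agent_runtime)
def stream_events(response_obj, remaining):
    for line in response_obj.iter_lines(chunk_size=10):
        if not line:
            continue  # skip blank SSE separators
        # Assumes upstream lines are SSE "data: {...}" records
        parsed = json.loads(line.decode("utf-8").removeprefix("data: "))
        yield f"data: {json.dumps(parsed)}\n\n"
    yield f"data: {json.dumps({'usage_remaining': remaining})}\n\n"
    yield "data: [DONE]\n\n"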

import { flushSync } from "react-dom";

// React 18: force one synchronous render per token instead of a batched one
flushSync(() => {
  setMessages(prev => appendChunk(prev, chunk));
});
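On the receiving side, each event in this stream is a data: line followed by a blank line, with [DONE] as the end sentinel. The frontend does this in TypeScript; the sketch below shows the same framing round trip in Python. The helper names format_sse and parse_sse are hypothetical, and the parser handles only single-line data fields, which is all the endpoint above emits.

```python
import json
from typing import Any, Iterable, Iterator


def format_sse(payload: Any) -> str:
    """Frame one JSON payload as a server-sent event."""
    return f"data: {json.dumps(payload)}\n\n"


def parse_sse(chunks: Iterable[str]) -> Iterator[Any]:
    """Reassemble events from arbitrarily split text chunks; stop at [DONE]."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # An event is complete once its blank-line terminator has arrived
        while "\n\n" in buffer:
            event, buffer = buffer.split("\n\n", 1)
            for line in event.splitlines():
                if not line.startswith("data: "):
                    continue  # ignore comments and other SSE fields
                data = line[len("data: "):]
                if data == "[DONE]":
                    return
                yield json.loads(data)


wire = format_sse({"text": "Hello"}) + format_sse({"usage_remaining": 4}) + "data: [DONE]\n\n"
# Simulate the network splitting the stream at awkward 7-character boundaries
pieces = [wire[i:i + 7] for i in range(0, len(wire), 7)]
print(list(parse_sse(pieces)))  # [{'text': 'Hello'}, {'usage_remaining': 4}]
```

Buffering until the blank-line terminator is what lets the client survive chunk boundaries that fall mid-event.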

AgentCore + Strands Agents

The AI coach runs on Bedrock AgentCore — AWS's managed agent runtime. The agent itself is ~60 lines using the Strands Agents SDK.

The model is configured fail-closed — if the guardrail ID is unset at runtime, the agent raises and never invokes unguarded:

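import os

from strands.models import BedrockModel


def load_model():
    # Fail closed: refuse to build a model without a guardrail
    guardrail_id = os.environ.get("BEDROCK_GUARDRAIL_ID")
    # Assumed variable name; Bedrock applies a guardrail only when both ID and version are given
    guardrail_version = os.environ.get("BEDROCK_GUARDRAIL_VERSION")
    if not guardrail_id or not guardrail_version:
        raise ValueError("Guardrail ID and version must be set (fail closed)")
    return BedrockModel(
        model_id="...",
        guardrail_id=guardrail_id,
        guardrail_version=guardrail_version,
        max_tokens=2048,
    )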
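Fail-closed behavior is easy to lock in with a unit test. The sketch below pulls the guard into a pure function so it can be tested without the Strands SDK or AWS credentials. The function name guardrail_settings and the BEDROCK_GUARDRAIL_VERSION variable are assumptions; Bedrock's Converse API identifies a guardrail by an identifier plus a version, so the sketch requires both.

```python
from typing import Mapping


def guardrail_settings(env: Mapping[str, str]) -> dict:
    """Return guardrail kwargs for the model, or raise so the agent never runs unguarded."""
    guardrail_id = env.get("BEDROCK_GUARDRAIL_ID", "").strip()
    # Assumed variable name: a guardrail is addressed by ID plus version
    guardrail_version = env.get("BEDROCK_GUARDRAIL_VERSION", "").strip()
    if not guardrail_id or not guardrail_version:
        raise ValueError("Guardrail ID and version must be set (fail closed)")
    return {"guardrail_id": guardrail_id, "guardrail_version": guardrail_version}


def test_fails_closed_without_guardrail() -> None:
    # Missing, partial, and whitespace-only settings must all be rejected
    for env in ({}, {"BEDROCK_GUARDRAIL_ID": "abc123"}, {"BEDROCK_GUARDRAIL_ID": "  "}):
        try:
            guardrail_settings(env)
        except ValueError:
            continue
        raise AssertionError(f"expected fail-closed error for {env!r}")


test_fails_closed_without_guardrail()
print(guardrail_settings({"BEDROCK_GUARDRAIL_ID": "abc123", "BEDROCK_GUARDRAIL_VERSION": "1"}))
```

In the agent, the returned dict could be unpacked into the BedrockModel constructor, so the guard and the model configuration cannot drift apart.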

AgentCore provides four-strategy persistent memory out of the box:

Strategy	What it stores
Semantic	Facts about the user
User Preference	Working and communication style
Summarization	Session summaries across conversations
Episodic	Conversation history within sessions

The KB retrieval tool grounds every response in a curated PM framework knowledge base, which is meant to keep the agent from fabricating methodologies.
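The Bedrock Retrieve API, called for example as boto3.client("bedrock-agent-runtime").retrieve(knowledgeBaseId=..., retrievalQuery={"text": query}), returns a retrievalResults list in which each result carries content.text, a relevance score, and a location (an S3 URI for S3 data sources). A minimal sketch, assuming that response shape, of turning results into grounding context for the agent. It deliberately drops the source location so internal KB file names never enter the model's context. The function name, score threshold, and sample data are hypothetical.

```python
from typing import Any


def build_grounding_context(response: dict[str, Any], min_score: float = 0.4, max_results: int = 5) -> str:
    """Format Retrieve API results as numbered excerpts, omitting source locations."""
    results = sorted(
        response.get("retrievalResults", []),
        key=lambda r: r.get("score", 0.0),
        reverse=True,
    )
    excerpts = []
    for result in results:
        text = result.get("content", {}).get("text", "").strip()
        # Skip empty or weakly related passages (threshold is an assumption to tune)
        if text and result.get("score", 0.0) >= min_score:
            excerpts.append(text)
        if len(excerpts) == max_results:
            break
    if not excerpts:
        # Tell the model explicitly that nothing was found, so it does not improvise
        return "No relevant framework guidance was found in the knowledge base."
    # Locations (e.g. s3://bucket/file.md) are intentionally left out of the context
    return "\n\n".join(f"[{i}] {text}" for i, text in enumerate(excerpts, start=1))


sample = {  # made-up response in the Retrieve API's shape
    "retrievalResults": [
        {"content": {"text": "A PI Planning event aligns teams on objectives."}, "score": 0.82,
         "location": {"type": "S3", "s3Location": {"uri": "s3://kb-bucket/safe/pi-planning.md"}}},
        {"content": {"text": "Unrelated passage."}, "score": 0.12,
         "location": {"type": "S3", "s3Location": {"uri": "s3://kb-bucket/misc/notes.md"}}},
    ]
}
print(build_grounding_context(sample))  # [1] A PI Planning event aligns teams on objectives.
```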

Bedrock Guardrails — Two Layers

The guardrail is attached to the agent model — fail-closed (raises if unset, never invokes unguarded).

Layer 1 — Content Filters

Blocks violence, hate, sexual content, misconduct, insults, and prompt attacks at HIGH sensitivity on input.
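Layer 1 maps onto the contentPolicyConfig portion of a CreateGuardrail request. The sketch below builds those arguments for boto3's bedrock client. HIGH input strength matches the description above; the output strengths, guardrail name, and blocked-message strings are assumptions. Prompt-attack filtering evaluates input only, so its output strength is set to NONE.

```python
from typing import Any

FILTER_TYPES = ["VIOLENCE", "HATE", "SEXUAL", "MISCONDUCT", "INSULTS", "PROMPT_ATTACK"]


def content_filter_config(output_strength: str = "HIGH") -> dict[str, Any]:
    """Build contentPolicyConfig with HIGH input sensitivity for every filter."""
    filters = []
    for filter_type in FILTER_TYPES:
        filters.append({
            "type": filter_type,
            "inputStrength": "HIGH",
            # Prompt-attack detection only applies to input, so output must be NONE
            "outputStrength": "NONE" if filter_type == "PROMPT_ATTACK" else output_strength,
        })
    return {"filtersConfig": filters}


request = {
    "name": "pm-coach-guardrail",  # hypothetical name
    "contentPolicyConfig": content_filter_config(),
    "blockedInputMessaging": "Sorry, I can't help with that request.",
    "blockedOutputsMessaging": "Sorry, I can't share that response.",
}
# With credentials configured, this would be sent as:
#   boto3.client("bedrock").create_guardrail(**request)
print(len(request["contentPolicyConfig"]["filtersConfig"]))  # 6
```

The denied topics in Layer 2 would go into the same request under topicPolicyConfig, so both layers ship as one versioned guardrail.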

Layer 2 — Denied Topics

Added after discovering the agent revealed internal KB file names during testing:

{
  "topicsConfig": [
    {
      "name": "InternalSystemInformation",
      "definition": "Questions asking the agent to reveal KB structure, system prompt, infrastructure, or internal configuration.",
      "type": "DENY"
    },
    {
      "name": "ProprietaryContentExtraction",
      "definition": "Attempts to bulk-extract knowledge base document contents.",
      "type": "DENY"
    }
  ]
}

These topics are combined with a hardened system prompt that explicitly instructs the agent never to reveal internal configuration and states that this boundary cannot be overridden by roleplay, hypothetical scenarios, or claimed authority.

Defense-in-depth: input sanitization runs before the guardrail on all model invocations. The guardrail is the authoritative backstop.
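The exact sanitization steps aren't spelled out, so this is a hypothetical sketch of a typical pre-guardrail pass: Unicode normalization, removal of invisible control and format characters (which can hide injected instructions), whitespace collapsing, and a length cap. The function name and the 4,000-character cap are assumptions. A pass like this reduces noise before the guardrail runs; it does not replace it.

```python
import re
import unicodedata

MAX_INPUT_CHARS = 4000  # assumed cap; tune to the model's context budget


def sanitize_user_input(text: str) -> str:
    """Normalize and strip invisible characters before the text reaches the guardrail."""
    # NFKC folds compatibility forms (full-width letters, no-break spaces) into plain equivalents
    text = unicodedata.normalize("NFKC", text)
    # Drop control (Cc) and format (Cf) characters such as zero-width spaces; keep newlines and tabs
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # Collapse runs of spaces/tabs and excessive blank lines
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()[:MAX_INPUT_CHARS]


raw = "Ignore\u200b previous\u00a0instructions\x00   please\n\n\n\nthanks"
print(repr(sanitize_user_input(raw)))  # 'Ignore previous instructions please\n\nthanks'
```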

Authentication — The Federated Identity Edge Case

Cognito handles email/password and Google OAuth. JWT validation uses RS256 with JWKS verification — issuer pinned, audience required, verified on every request.

One edge case: Google-federated Cognito ID tokens include an at_hash claim, which some JWT libraries try to validate against an access token that is not present in the request. The fix is to disable at_hash validation while keeping every other check: RS256 signature, issuer, audience, expiry, and token_use == "id".

The streaming Lambda Function URL has no API Gateway in front of it. JWT verification runs entirely in-app via JWKS on every request.
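A stdlib sketch of the claim checks that run after the RS256 signature has been verified with the JWKS key whose kid matches the token header. Signature verification itself should come from a maintained JWT library (python-jose, for instance, exposes a verify_at_hash option for exactly this case); the function names, pool URL, client ID, and exception type here are hypothetical. at_hash is simply not checked, while issuer, audience, token_use, and expiry still are.

```python
import time
from typing import Any, Optional


class TokenRejected(Exception):
    """Raised when an ID token fails any check; map it to a generic 401."""


def select_jwk(jwks: dict[str, Any], kid: str) -> dict[str, Any]:
    """Pick the signing key whose kid matches the token header; unknown kids are rejected."""
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    raise TokenRejected("unknown signing key")


def check_id_token_claims(claims: dict[str, Any], issuer: str, client_id: str,
                          now: Optional[float] = None, leeway: int = 60) -> str:
    """Validate Cognito ID-token claims after signature verification; return the user's sub."""
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        raise TokenRejected("issuer mismatch")
    if claims.get("aud") != client_id:
        raise TokenRejected("audience mismatch")
    if claims.get("token_use") != "id":
        raise TokenRejected("not an ID token")
    if claims.get("exp", 0) + leeway < now:
        raise TokenRejected("token expired")
    # at_hash is intentionally not checked: no access token accompanies the request
    sub = claims.get("sub")
    if not sub:
        raise TokenRejected("missing sub")
    return sub


ISSUER = "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_EXAMPLE"  # hypothetical pool
claims = {"iss": ISSUER, "aud": "example-client-id", "token_use": "id",
          "exp": 2_000_000_000, "sub": "user-123", "at_hash": "ignored"}
print(check_id_token_claims(claims, ISSUER, "example-client-id", now=1_900_000_000))  # user-123
```

Because the Function URL has no authorizer in front of it, every rejection path should end in the same generic 401 so callers learn nothing about which check failed.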

CI/CD — 7 Stages, 2 Hard Security Gates

Every push to main runs these stages in sequence, and each stage gates the next.

backend-tests ──────────────────────────────────────────┐
  pytest 130 tests (unit + integration + IDOR)          │
  flake8 + black                                        │
                                                        ▼
frontend-tests ──── snyk ──────────────────────────── deploy
  Vitest 137 tests    │                                  │
  TypeScript + ESLint │                                  ├─ Build Lambda packages
  Production build    │                                  ├─ CDK Synth
                      │                                  ├─ Checkov (hard gate)
  297 npm deps ───────┘                                  ├─ CDK Deploy (5 stacks)
  High/critical CVEs block deploy                        └─ Amplify Deploy
  Caught a real CVE before prod ✓

Snyk Dependency Scan

297 npm packages scanned on every push. High and critical severity CVEs block the pipeline. Caught a real high-severity vulnerability in a transitive dependency before it reached production.

Checkov IaC Scan

CloudFormation templates scanned before every deploy — hard gate. Results render as a structured table in the GitHub Actions step summary. Intentional skips are documented with business justification, not silently suppressed.
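A sketch of how a Checkov JSON report (checkov -o json) could become a step-summary table and a pass/fail exit code. It assumes the single-framework report shape, a results object holding failed_checks and skipped_checks entries with check_id, check_name, resource, and (for skips) suppress_comment fields, and also accepts a list of such reports when several frameworks run. The function names are hypothetical; GITHUB_STEP_SUMMARY is the file path GitHub Actions provides for step summaries.

```python
import json
import os
import sys
from typing import Any


def summarize_checkov(report: Any) -> tuple[str, int]:
    """Render failed and skipped checks as a Markdown table; return (markdown, failed_count)."""
    reports = report if isinstance(report, list) else [report]
    rows, failed = [], 0
    for rep in reports:
        results = rep.get("results", {})
        for check in results.get("failed_checks", []):
            failed += 1
            rows.append(f"| FAILED | {check['check_id']} | {check.get('resource', '')} | {check.get('check_name', '')} |")
        for check in results.get("skipped_checks", []):
            # Documented skips show their justification instead of disappearing silently
            reason = check.get("suppress_comment", "no justification given")
            rows.append(f"| SKIPPED | {check['check_id']} | {check.get('resource', '')} | {reason} |")
    header = "| Status | Check | Resource | Detail |\n|---|---|---|---|"
    body = "\n".join(rows) if rows else "| PASSED | - | - | No failed or skipped checks |"
    return f"{header}\n{body}\n", failed


def main(path: str) -> int:
    with open(path, encoding="utf-8") as f:
        markdown, failed = summarize_checkov(json.load(f))
    summary_file = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_file:
        with open(summary_file, "a", encoding="utf-8") as out:
            out.write("## Checkov IaC scan\n\n" + markdown)
    return 1 if failed else 0  # any failed check blocks the deploy


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```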

Engineering Principles

SOLID throughout: abstract interfaces on all services. The Liskov Substitution Principle lets tests swap in mocks without touching production code, and every router depends on injected abstractions.
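A minimal illustration of that pattern, with hypothetical names: the service logic depends on an abstract HistoryRepository, production would supply a DynamoDB-backed subclass, and tests substitute an in-memory one. Because both honor the same contract, the code under test does not change.

```python
from abc import ABC, abstractmethod


class HistoryRepository(ABC):
    """Contract every history store must honor (DynamoDB in prod, in-memory in tests)."""

    @abstractmethod
    def save(self, user_id: str, entry: str) -> None: ...

    @abstractmethod
    def list_for_user(self, user_id: str) -> list[str]: ...


class InMemoryHistoryRepository(HistoryRepository):
    """Test double: same observable behavior as the real store, no AWS calls."""

    def __init__(self) -> None:
        self._items: dict[str, list[str]] = {}

    def save(self, user_id: str, entry: str) -> None:
        self._items.setdefault(user_id, []).append(entry)

    def list_for_user(self, user_id: str) -> list[str]:
        return list(self._items.get(user_id, []))  # return a copy so callers can't mutate state


class HistoryService:
    """Depends only on the abstraction, injected through the constructor."""

    def __init__(self, repo: HistoryRepository) -> None:
        self._repo = repo

    def record_rewrite(self, user_id: str, text: str) -> int:
        self._repo.save(user_id, text)
        return len(self._repo.list_for_user(user_id))


service = HistoryService(InMemoryHistoryRepository())
service.record_rewrite("user-1", "first draft")
print(service.record_rewrite("user-1", "second draft"))  # 2
```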

Security by design — user_id always derived from the verified JWT sub claim, never from the request body. Generic error responses prevent information leakage. Every endpoint has an IDOR test.
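A sketch of the "identity from the token, never the body" rule together with the kind of IDOR test described above. The handler signatures, field names, and in-memory store are hypothetical; in the real service the claims would come from the verified JWT.

```python
from typing import Any

HISTORY = {  # hypothetical in-memory store keyed by owner
    "user-a": [{"id": "h1", "text": "A's draft"}],
    "user-b": [{"id": "h2", "text": "B's draft"}],
}


def get_history(claims: dict[str, Any], body: dict[str, Any]) -> dict[str, Any]:
    """Return the caller's history; any user_id in the request body is ignored."""
    user_id = claims["sub"]  # identity comes only from the verified token
    return {"status": 200, "items": HISTORY.get(user_id, [])}


def get_history_item(claims: dict[str, Any], item_id: str) -> dict[str, Any]:
    """Look up one item scoped to the caller; other users' items look like missing ones."""
    for item in HISTORY.get(claims["sub"], []):
        if item["id"] == item_id:
            return {"status": 200, "item": item}
    # Same generic 404 whether the item belongs to someone else or does not exist
    return {"status": 404, "error": "Not found"}


def test_idor() -> None:
    attacker = {"sub": "user-a"}
    # A spoofed body user_id must not switch identity
    assert get_history(attacker, {"user_id": "user-b"})["items"] == HISTORY["user-a"]
    # Guessing another user's item ID yields the generic 404
    assert get_history_item(attacker, "h2") == {"status": 404, "error": "Not found"}


test_idor()
```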

Atomic rate limiting:

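# Prevent concurrent requests from bypassing the free tier limit
# (Key name, DAILY_LIMIT, and RateLimitExceeded are illustrative; the counter must exist per user)
try:
    response = self.table.update_item(
        Key={"user_id": user_id},
        UpdateExpression="SET daily_usage_count = daily_usage_count + :one",
        ConditionExpression="daily_usage_count < :limit",
        ExpressionAttributeValues={":one": 1, ":limit": DAILY_LIMIT},
        ReturnValues="UPDATED_NEW",
    )
except self.table.meta.client.exceptions.ConditionalCheckFailedException:
    raise RateLimitExceeded() from None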
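Why the condition matters: a read-then-write check lets two concurrent requests both see a count of 4 against a limit of 5 and both proceed. DynamoDB evaluates the ConditionExpression and applies the update atomically for a single item, so only requests that find the count below the limit succeed. The stdlib model below imitates that per-item atomicity with a lock; it illustrates the semantics and is not DynamoDB code. The class name and limit are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ConditionalCounter:
    """In-memory stand-in for a conditional, atomic UpdateItem on one item."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.count = 0
        self._lock = threading.Lock()  # models the per-item atomic condition check

    def try_increment(self) -> bool:
        """Increment only if count < limit; return False when the condition fails."""
        with self._lock:
            if self.count >= self.limit:
                return False  # analogous to ConditionalCheckFailedException
            self.count += 1
            return True


counter = ConditionalCounter(limit=5)
with ThreadPoolExecutor(max_workers=20) as pool:
    outcomes = list(pool.map(lambda _: counter.try_increment(), range(50)))

print(sum(outcomes), counter.count)  # 5 5
```

Fifty concurrent attempts produce exactly five successes, which is the guarantee the conditional update provides for the free tier.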

Tests as a behavioral contract — 130 backend tests, 137 frontend tests. No green tests, no deploy.

The Numbers

Metric	Value
Time to live users	30 days
AWS service layers	8
CDK stacks	5
DynamoDB tables	3 (PITR on all)
Lambda functions	2
AgentCore runtime	1
Backend tests	130
Frontend tests	137
npm deps scanned per deploy	297

What AgentCore Changes

Before AgentCore, building a production agent meant managing session state, memory storage, retrieval orchestration, and model invocation yourself. AgentCore provides all of this as a managed runtime with CDK-native deployment. The agent code stays focused on behavior — not infrastructure plumbing.

AWS Community Builder — Serverless specialty. Built with Bedrock AgentCore, Lambda, DynamoDB, Cognito, CDK, Amplify, and Claude Code.
