<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mohsin Sheikhani</title>
    <description>The latest articles on DEV Community by Mohsin Sheikhani (@mohsinsheikhani).</description>
    <link>https://dev.to/mohsinsheikhani</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3205660%2F86f13820-8822-4969-8c94-661bd5847686.png</url>
      <title>DEV Community: Mohsin Sheikhani</title>
      <link>https://dev.to/mohsinsheikhani</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mohsinsheikhani"/>
    <language>en</language>
    <item>
      <title>How to Use Claude Code with Qwen models for Free (Linux)</title>
      <dc:creator>Mohsin Sheikhani</dc:creator>
      <pubDate>Sat, 03 Jan 2026 20:27:44 +0000</pubDate>
      <link>https://dev.to/mohsinsheikhani/how-to-use-claude-code-with-qwen-models-for-free-linux-1fc4</link>
      <guid>https://dev.to/mohsinsheikhani/how-to-use-claude-code-with-qwen-models-for-free-linux-1fc4</guid>
      <description>&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Qwen CLI installed and authenticated&lt;/li&gt;
&lt;li&gt;Node.js v18+ installed&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1: Install Claude, Claude Code Router and Qwen Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install -g @qwen-code/qwen-code@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install -g @anthropic-ai/claude-code @musistudio/claude-code-router
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Extract Your Access Token
&lt;/h3&gt;

&lt;p&gt;Replace LINUX_USER with your Linux username.&lt;/p&gt;

&lt;p&gt;Open &lt;code&gt;/home/LINUX_USER/.qwen/oauth_creds.json&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;It should look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "access_token": "YOUR_QWEN_ACCESS_TOKEN_HERE",
  "token_type": "Bearer",
  "refresh_token": "YOUR_QWEN_REFRESH_TOKEN_HERE",
  "resource_url": "portal.qwen.ai",
  "expiry_date": 1764876220290
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy the &lt;code&gt;access_token&lt;/code&gt; value.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Create router config
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt; ~/.claude-code-router/config.json &amp;lt;&amp;lt; 'EOF'
{
  "LOG": true,
  "LOG_LEVEL": "info",
  "HOST": "127.0.0.1",
  "PORT": 3456,
  "API_TIMEOUT_MS": 600000,
  "Providers": [
    {
      "name": "qwen",
      "api_base_url": "https://portal.qwen.ai/v1/chat/completions",
      "api_key": "$QWEN_ACCESS_TOKEN",
      "models": [
        "qwen3-coder-plus"
      ]
    }
  ],
  "Router": {
    "default": "qwen,qwen3-coder-plus",
    "background": "qwen,qwen3-coder-plus",
    "think": "qwen,qwen3-coder-plus",
    "longContext": "qwen,qwen3-coder-plus",
    "longContextThreshold": 60000,
    "webSearch": "qwen,qwen3-coder-plus"
  }
}
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Verify file was created
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat ~/.claude-code-router/config.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Set your Access Token
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo 'export QWEN_ACCESS_TOKEN="YOUR_QWEN_ACCESS_TOKEN_HERE"' &amp;gt;&amp;gt; ~/.zshrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source ~/.zshrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Verify Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude --version        # Should show: Claude Code v2.x.x
ccr version             # Should show version number
echo $QWEN_ACCESS_TOKEN # Should show your token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 6: Start Using
&lt;/h3&gt;

&lt;p&gt;Restart the router server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ccr restart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run Claude Code with Qwen models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ccr code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; hi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Token Refresh (When you get 401 errors)
&lt;/h3&gt;

&lt;p&gt;Your OAuth token expires. Refresh it by:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Re-authenticating your Qwen Code CLI: If already logged in and the access_token matches in both &lt;code&gt;config.json&lt;/code&gt; and &lt;code&gt;oauth_creds.json&lt;/code&gt;, delete the &lt;code&gt;oauth_creds.json&lt;/code&gt; file and run &lt;code&gt;qwen&lt;/code&gt; to initiate re-authentication.&lt;/li&gt;
&lt;li&gt;Update the api_key in your config.json with the new access_token: &lt;code&gt;nano ~/.claude-code-router/config.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Restart: &lt;code&gt;ccr restart&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Hopefully&lt;/strong&gt; this will help you learn Claude Code for Free 💖&lt;/p&gt;

</description>
      <category>linux</category>
      <category>llm</category>
      <category>node</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Building Production Multi-Agent Systems: My Experience with Amazon Bedrock AgentCore, and AWS Strands Agents</title>
      <dc:creator>Mohsin Sheikhani</dc:creator>
      <pubDate>Mon, 29 Sep 2025 18:34:47 +0000</pubDate>
      <link>https://dev.to/mohsinsheikhani/building-production-multi-agent-systems-my-experience-with-amazon-bedrock-agentcore-and-aws-41h2</link>
      <guid>https://dev.to/mohsinsheikhani/building-production-multi-agent-systems-my-experience-with-amazon-bedrock-agentcore-and-aws-41h2</guid>
      <description>&lt;h3&gt;
  
  
  The 2 AM Hotel Booking Nightmare
&lt;/h3&gt;

&lt;p&gt;Picture this: It's 2 AM, you're frantically booking a hotel room for a business trip that got moved up by three days. You find the perfect room, book it, then realize six hours later that your flight got cancelled.&lt;br&gt;
Now you need to cancel the booking, but wait, what's the cancellation policy? Will there be fees? Can you modify instead of cancel? And why is customer service only available during business hours when your life operates on chaos-time?&lt;/p&gt;

&lt;p&gt;If this sounds familiar, you've experienced the fundamental problem with traditional hotel booking systems: they treat complex, multi-step workflows as isolated transactions. But real life isn't transactional, it's conversational, contextual, and full of "what-ifs."&lt;/p&gt;

&lt;p&gt;This is the story of how I built an AI system that doesn't just book hotels, it thinks like a seasoned concierge who remembers your preferences, understands hotel policies, and can handle the messy reality of travel planning.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Traditional Hotel Booking Systems Miss the Mark
&lt;/h2&gt;

&lt;p&gt;Most hotel booking platforms treat you like you're ordering a pizza: pick your toppings (dates, location, price range), pay, and done. But hotel booking isn't pizza ordering, it's relationship management.&lt;/p&gt;

&lt;p&gt;Think about what a great hotel concierge does:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Understands complex policies&lt;/strong&gt; and explains them in plain English
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handles changes gracefully&lt;/strong&gt; without making you start over&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coordinates multiple services&lt;/strong&gt; (booking, modifications, cancellations, notifications)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remembers your preferences&lt;/strong&gt; from previous stays&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provides proactive guidance&lt;/strong&gt; based on your specific situation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional systems fail because they're built around databases and forms, not conversations and context. They force users to navigate separate interfaces for search, booking, modification, and support, each requiring you to re-explain your situation.&lt;/p&gt;

&lt;p&gt;The result? Frustrated customers, abandoned bookings, and support teams drowning in "simple" requests that require human intelligence to resolve.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Multi-Agent Vision
&lt;/h2&gt;

&lt;p&gt;What if instead of building another booking form, I built a team of AI specialists that work together like a hotel's back-office staff?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Search Specialist&lt;/strong&gt; who knows every hotel's availability and can compare options intelligently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Booking Specialist&lt;/strong&gt; who understands reservation lifecycles and policy implications
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy Advisor&lt;/strong&gt; who can explain complex terms and check compliance before actions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Communications Specialist&lt;/strong&gt; who handles confirmations and keeps everyone informed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supervisor&lt;/strong&gt; who orchestrates the team and maintains conversation context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gnk621dmj064c33vbjh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gnk621dmj064c33vbjh.png" alt="Multi-Agent Architecture, Amazon Bedrock AgentCore, AWS Strands Agents, AWS Lambda - Mohsin Sheikhani" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This isn't just a technical architecture, it's a business model that scales human-like service.&lt;/p&gt;
&lt;h2&gt;
  
  
  Enter Amazon Bedrock AgentCore: The Foundation for Intelligent Orchestration
&lt;/h2&gt;

&lt;p&gt;Here's where the vision meets reality. Amazon Bedrock AgentCore isn't just another AI service, it's the infrastructure that makes multi-agent systems production-ready.&lt;/p&gt;

&lt;p&gt;The challenge with building agent teams isn't creating individual agents (that's the easy part). The real complexity lies in maintaining shared context across agents and sessions, giving agents secure access to external tools, and running everything reliably in production.&lt;/p&gt;

&lt;p&gt;AgentCore solves these with three key services that work together:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory Service&lt;/strong&gt;: Persistent conversation context that survives sessions. Your booking agent remembers you mentioned you prefer ground floor rooms, even if you come back next week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gateway Service&lt;/strong&gt;: Secure, managed access to external tools and APIs. No more wrestling with authentication, rate limiting, or connection management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Runtime Service&lt;/strong&gt;: Scalable agent execution with built-in observability. Your agents run reliably in production without you managing infrastructure.&lt;/p&gt;

&lt;p&gt;But here's what makes it powerful: these services are designed to work together. Memory informs decision-making, Gateway enables action-taking, and Runtime orchestrates it all.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Architecture That Changes Everything
&lt;/h2&gt;

&lt;p&gt;Instead of building a monolithic booking system, I created a supervisor-agent architecture where specialized agents collaborate through AgentCore's components:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2bal3of52xrbdxx7vxiq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2bal3of52xrbdxx7vxiq.png" alt="Multi-Agent Architecture, Amazon Bedrock AgentCore, AgentCore Identity, AWS Strands Agents, MCP, A2A, AWS Lambda - Mohsin Sheikhani" width="800" height="176"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The supervisor doesn't just route requests—it maintains context, orchestrates multi-step workflows, and ensures policy compliance before any action is taken.&lt;/p&gt;
&lt;h2&gt;
  
  
  Building the Agent Team: From Search to Confirmation
&lt;/h2&gt;

&lt;p&gt;Each agent in the system is a specialist, but they're not working in isolation. Here's how the team comes together:&lt;/p&gt;
&lt;h3&gt;
  
  
  The Search &amp;amp; Discovery Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Role&lt;/strong&gt;: The research specialist who knows every hotel's availability and pricing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/mohsinsheikhani/multi-agent-hotel-assistant/blob/main/app/src/agents/search_discovery.py" rel="noopener noreferrer"&gt;Link To Repo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vy0wps1etvw2ta0jcuk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vy0wps1etvw2ta0jcuk.png" alt="Amazon Bedrock AgentCore, AgentCore Identity, AgentCore Gateway, AWS Strands Agents, MCP, A2A, AWS Lambda - Mohsin Sheikhani" width="800" height="174"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This agent doesn't just return search results, it understands context. When you ask for "hotels near the conference center," it factors in your previous preferences, compares pricing across dates, and highlights amenities that matter to business travelers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SearchDiscoveryAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseAgent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Agent responsible for hotel search and discovery operations&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_agent_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SearchDiscoveryAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_agent_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Handles hotel search, availability checking, and price comparisons using AgentCore Lambda tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Reservation Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Role&lt;/strong&gt;: The booking specialist who manages the entire reservation lifecycle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/mohsinsheikhani/multi-agent-hotel-assistant/blob/main/app/src/agents/reservation.py" rel="noopener noreferrer"&gt;Link To Repo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxlhaqrj1zfhyboe1usuf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxlhaqrj1zfhyboe1usuf.png" alt="Amazon Bedrock AgentCore, AgentCore Identity, AgentCore Gateway, AWS Strands Agents, MCP, A2A, AWS Lambda - Mohsin Sheikhani" width="800" height="177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But here's where it gets interesting, this agent is policy-aware. Before making any booking, modification, or cancellation, it checks with the Guest Advisory Agent to understand implications.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ReservationAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseAgent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Agent responsible for managing hotel reservations&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9002&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_agent_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ReservationAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_agent_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Handles the full lifecycle of hotel room reservations, including booking, updating, canceling, and fetching past reservations for a guest.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Guest Advisory Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Role&lt;/strong&gt;: The policy expert who bridges business rules and user actions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/mohsinsheikhani/multi-agent-hotel-assistant/blob/main/app/src/agents/guest_advisory.py" rel="noopener noreferrer"&gt;Link To Repo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtvfiwl71vcqvwapei2w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtvfiwl71vcqvwapei2w.png" alt="Amazon Bedrock AgentCore, AgentCore Identity, AgentCore Gateway, AWS Strands Agents, MCP, A2A, AWS Lambda, Amazon Bedrock Knowledge Base - Mohsin Sheikhani" width="800" height="174"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This agent has access to a knowledge base of hotel policies, cancellation terms, and booking rules. It doesn't just recite policies—it explains them in context and calculates real implications.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GuestAdvisoryAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseAgent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Agent responsible for providing hotel policies and advisory information&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9003&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_agent_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GuestAdvisoryAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_agent_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Provides hotel policies, rules, and advisory information including cancellation policies, check-in/out procedures, and general hotel guidelines.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Notification Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Role&lt;/strong&gt;: The communications specialist who keeps everyone informed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/mohsinsheikhani/multi-agent-hotel-assistant/blob/main/app/src/agents/notification.py" rel="noopener noreferrer"&gt;Link To Repo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5wf0u31pp4bq5ejnyc60.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5wf0u31pp4bq5ejnyc60.png" alt="Amazon Bedrock AgentCore, AWS Strands Agents, Strands Agents Tools, MCP, A2A, AWS Lambda - Mohsin Sheikhani" width="800" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every booking, modification, or cancellation triggers appropriate notifications. But it's not just email templates, it's contextual communication that includes next steps and relevant details.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NotificationAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseAgent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Agent responsible for handling booking notifications and communications&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9004&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_agent_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NotificationAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_agent_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Handles booking confirmations, modifications, cancellations, and other communication needs for hotel reservations.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Supervisor's Secret Weapon: Memory
&lt;/h2&gt;

&lt;p&gt;Here's what makes this system truly intelligent—the supervisor agent uses AgentCore's Memory service to maintain conversation context across sessions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/mohsinsheikhani/multi-agent-hotel-assistant/blob/main/app/src/core/memory.py" rel="noopener noreferrer"&gt;Link To Repo&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryManager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Manages Bedrock AgentCore memory operations&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_name&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;initialize_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Initialize or retrieve existing memory&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryHookProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HookProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Provides memory hooks for agent lifecycle events&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_client&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_id&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_agent_initialized&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentInitializedEvent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Load recent conversation history when agent starts&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="bp"&gt;...&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Memory load error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_message_added&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MessageAddedEvent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Store messages in memory&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="bp"&gt;...&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Memory save error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register_hooks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HookRegistry&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Register memory hooks&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MessageAddedEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_message_added&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AgentInitializedEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;on_agent_initialized&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Conversation context persists across sessions
# Agents remember preferences, previous bookings, ongoing requests
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you return to continue a booking conversation from yesterday, the system doesn't just remember what you said—it remembers the context, your preferences, and where you left off in the process.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Magic of Agent Orchestration: How Complex Workflows Become Simple Conversations
&lt;/h2&gt;

&lt;p&gt;Here's where the supervisor agent earns its keep. It doesn't just route requests—it orchestrates intelligent workflows that would normally require multiple customer service interactions.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Real Workflow in Action
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;User&lt;/strong&gt;: "I need to cancel my booking for next week, but I'm worried about fees."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional System&lt;/strong&gt;: Navigate to "My Bookings" → Find booking → Click cancel → Read policy wall of text → Call customer service for clarification → Wait on hold → Explain situation → Get transferred → Explain again...&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Our Multi-Agent System&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Supervisor&lt;/strong&gt; identifies this as a policy-sensitive cancellation request&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Guest Advisory Agent&lt;/strong&gt; retrieves specific cancellation policy for this booking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reservation Agent&lt;/strong&gt; calculates exact fees and alternatives&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supervisor&lt;/strong&gt; presents clear options: "Cancelling now incurs a $50 fee, but modifying dates is free until tomorrow"&lt;/li&gt;
&lt;li&gt;User chooses, &lt;strong&gt;Reservation Agent&lt;/strong&gt; executes, &lt;strong&gt;Notification Agent&lt;/strong&gt; confirms&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All in one conversation. No navigation, no transfers, no re-explaining.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Orchestration Code That Makes It Possible
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Supervisor's intelligent routing with context awareness
&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are the Supervisor Agent for a multi-agent hotel booking system.

Policy-aware behavior:
- Before performing any booking, modification, or cancellation, check relevant hotel policies (using the Guest Advisory Agent or Knowledge Base) to determine if there are penalties, restrictions, or special conditions.
- Present the user with a summary of what will happen and the relevant policy (e.g., &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cancelling within 24 hours will incur a 20% fee&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;).
- Ask for explicit confirmation before taking action.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="c1"&gt;# The supervisor maintains context through AgentCore Memory
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bedrock_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# A2A communication tools
&lt;/span&gt;    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_get_system_prompt&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;hooks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;MemoryHookProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;actor_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/mohsinsheikhani/multi-agent-hotel-assistant/blob/main/app/src/core/supervisor.py" rel="noopener noreferrer"&gt;Link To Repo&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Architecture Scales
&lt;/h3&gt;

&lt;p&gt;Each agent runs independently on its own port, communicating through Agent-to-Agent (A2A) protocols. This means:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkbkuobqvqtwdn1safiy2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkbkuobqvqtwdn1safiy2.png" alt="Amazon Bedrock AgentCore, AgentCore Identity, AgentCore Gateway, AWS Strands Agents, MCP, A2A, AWS Lambda - Mohsin Sheikhani" width="800" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fault isolation&lt;/strong&gt;: One agent's issues don't cascade&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Independent scaling&lt;/strong&gt;: High-demand agents can scale separately&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy updates&lt;/strong&gt;: Modify one agent without touching others&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clear responsibilities&lt;/strong&gt;: Each agent has a single, well-defined purpose&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Production Reality: Memory, Monitoring, and the Details That Matter
&lt;/h2&gt;

&lt;p&gt;Building a demo is one thing. Building something that handles real customer conversations at 3 AM when your infrastructure is under load? That's where AgentCore's production features become essential.&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory That Actually Works
&lt;/h3&gt;

&lt;p&gt;The breakthrough isn't just that the system remembers—it's how it remembers intelligently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Memory hooks that maintain context across sessions
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryHookProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HookProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_agent_initialized&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentInitializedEvent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Load last 10 conversation turns when agent starts
&lt;/span&gt;        &lt;span class="n"&gt;recent_turns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_last_k_turns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Inject context into agent's system prompt
&lt;/span&gt;        &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Recent conversation:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a customer returns after a week to modify their booking, the system doesn't just remember the booking ID, it remembers they mentioned preferring ground floor rooms, that they're traveling for a conference, and that they were concerned about cancellation policies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Error Handling That Prevents Disasters
&lt;/h3&gt;

&lt;p&gt;In production, things fail. APIs timeout, agents crash, networks hiccup. The difference between a good system and a great one is graceful degradation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/mohsinsheikhani/multi-agent-hotel-assistant/blob/main/app/src/core/supervisor.py" rel="noopener noreferrer"&gt;Link To Repo&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Supervisor handles agent failures gracefully
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process a user request through the supervisor agent&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processing request: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request processed successfully&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error processing request: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Observability That Tells the Real Story
&lt;/h3&gt;

&lt;p&gt;AgentCore's built-in observability means you can see exactly what's happening:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which agent handled each request&lt;/li&gt;
&lt;li&gt;How long each step took&lt;/li&gt;
&lt;li&gt;Where conversations get stuck&lt;/li&gt;
&lt;li&gt;Which policies cause the most confusion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't just monitoring—it's business intelligence about how customers actually interact with your system.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Authentication Reality
&lt;/h3&gt;

&lt;p&gt;Real systems need real security. AgentCore handles the complexity:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/mohsinsheikhani/multi-agent-hotel-assistant/blob/main/app/src/utils/auth.py" rel="noopener noreferrer"&gt;Link To Repo&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Cognito integration for secure gateway access
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TokenManager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_fresh_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Client credentials flow with proper scoping
&lt;/span&gt;        &lt;span class="c1"&gt;# Automatic token refresh and error handling
&lt;/span&gt;        &lt;span class="c1"&gt;# No security vulnerabilities from DIY auth
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Business Impact: What This Actually Means for Hotels and Customers
&lt;/h2&gt;

&lt;p&gt;The technical architecture is impressive, but let's talk about what really matters: business outcomes.&lt;/p&gt;

&lt;h3&gt;
  
  
  For Hotels: Operational Efficiency at Scale
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before&lt;/strong&gt;: Customer calls about cancellation policy → Agent looks up booking → Checks policy document → Calculates fees → Explains options → Customer decides → Agent processes → Sends confirmation.&lt;br&gt;
&lt;strong&gt;Average handling time: 8-12 minutes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After&lt;/strong&gt;: Customer asks about cancellation → System instantly knows booking details, policy implications, and alternatives → Presents clear options → Customer decides → Action executed automatically.&lt;br&gt;
&lt;strong&gt;Average handling time: 1-2 minutes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's not just efficiency—that's a 4x capacity increase with the same support team.&lt;/p&gt;
&lt;h3&gt;
  
  
  For Customers: Intelligence That Feels Personal
&lt;/h3&gt;

&lt;p&gt;The system doesn't just remember your booking—it remembers your story:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"I see you're traveling for the same conference as last year. Would you like the same room type?"&lt;/li&gt;
&lt;li&gt;"Based on your previous concern about cancellation fees, I've found options with flexible policies."&lt;/li&gt;
&lt;li&gt;"Your flight was delayed last time, should I book a late check-in for this trip?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't marketing personalization—it's operational intelligence that makes every interaction feel like talking to someone who actually knows you.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Competitive Advantage
&lt;/h3&gt;

&lt;p&gt;While competitors are still building better search interfaces, this system is solving the real problem: &lt;strong&gt;complex travel decisions require conversation, not just transactions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hotels using this approach can offer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;24/7 intelligent support&lt;/strong&gt; without 24/7 staffing costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proactive policy guidance&lt;/strong&gt; that prevents booking mistakes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless modification workflows&lt;/strong&gt; that retain customers instead of losing them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contextual upselling&lt;/strong&gt; based on actual preferences, not generic algorithms&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  ROI That Actually Matters
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Support cost reduction&lt;/strong&gt;: 4x efficiency gain on routine inquiries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Booking completion rates&lt;/strong&gt;: Fewer abandoned bookings due to policy confusion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer retention&lt;/strong&gt;: Seamless modification experience vs. starting over elsewhere&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upselling effectiveness&lt;/strong&gt;: Context-aware recommendations vs. generic offers&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  The Future of Conversational Commerce: Where This Goes Next
&lt;/h2&gt;

&lt;p&gt;This hotel booking system is just the beginning. The patterns we've established—supervisor orchestration, policy-aware workflows, persistent memory—apply to any complex business process that currently requires human intervention.&lt;/p&gt;
&lt;h3&gt;
  
  
  Beyond Hotel Booking
&lt;/h3&gt;

&lt;p&gt;Imagine applying this architecture to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Insurance claims processing&lt;/strong&gt; with policy specialists and damage assessors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Financial planning&lt;/strong&gt; with investment advisors and compliance experts
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Healthcare coordination&lt;/strong&gt; with specialists who understand your medical history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise procurement&lt;/strong&gt; with budget analysts and vendor specialists&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each domain gets a team of AI specialists who collaborate intelligently, remember context, and handle complexity gracefully.&lt;/p&gt;
&lt;h1&gt;
  
  
  The AgentCore Foundation: Why This Changes Everything
&lt;/h1&gt;

&lt;p&gt;Amazon Bedrock AgentCore solved the infrastructure problems that kill most multi-agent projects. Here's how each component transformed this hotel booking system from concept to reality.&lt;/p&gt;
&lt;h2&gt;
  
  
  AgentCore Gateway: Your Lambda Functions Become Agent Tools
&lt;/h2&gt;

&lt;p&gt;The breakthrough moment came when I realized I could keep all my business logic in familiar AWS Lambda functions while making them accessible to agents through the Model Context Protocol (MCP).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;: Agents need access to real business systems, hotel inventory databases, booking APIs, policy engines. Traditionally, this means complex authentication, API management, and security concerns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AgentCore Gateway's Solution&lt;/strong&gt;: Your Lambda functions become agent tools automatically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F585b4f3ky38qdwyv3jem.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F585b4f3ky38qdwyv3jem.png" alt="Amazon Bedrock AgentCore, AgentCore Identity, AgentCore Gateway, AWS Strands Agents, MCP, A2A, AWS Lambda - Mohsin Sheikhani" width="800" height="281"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;APIGatewayProxyEvent&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;APIGatewayProxyResult&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Received event:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queryResp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;dynamo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;QueryCommand&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
          &lt;span class="na"&gt;TableName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;KeyConditionExpression&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt;
          &lt;span class="na"&gt;ExpressionAttributeValues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="p"&gt;...&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scanResp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;dynamo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ScanCommand&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
          &lt;span class="na"&gt;TableName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Error:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Internal Server Error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Gateway handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Authentication and authorization&lt;/strong&gt; with fine-grained access control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limiting and throttling&lt;/strong&gt; to protect your backend systems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request/response transformation&lt;/strong&gt; between agent protocols and your APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP protocol abstraction&lt;/strong&gt; so your Lambdas don't need to know about agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means your hotel search, booking, and policy engines remain standard AWS Lambda functions, but agents can invoke them as naturally as calling any other tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  AgentCore Identity: Security That Scales
&lt;/h2&gt;

&lt;p&gt;Multi-agent systems create complex security challenges. When the Supervisor Agent needs to call the Reservation Agent, which then calls the Policy Agent, how do you maintain security context across the entire chain?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional Approach&lt;/strong&gt;: Custom authentication between every agent pair, token management nightmares, and security vulnerabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AgentCore Identity&lt;/strong&gt;: Centralized identity management with automatic token handling.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Agents authenticate once, communicate securely forever
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TokenManager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scope_string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resource_server_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/gateway:read &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resource_server_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/gateway:write&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_fresh_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="c1"&gt;# AgentCore handles the OAuth2 client credentials flow
&lt;/span&gt;        &lt;span class="c1"&gt;# Automatic token refresh and scope validation
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Identity service provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OAuth2 client credentials flow&lt;/strong&gt; with automatic token refresh&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-grained scopes&lt;/strong&gt; for different agent capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Centralized policy management&lt;/strong&gt; across your entire agent ecosystem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit trails&lt;/strong&gt; for every agent interaction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flv5fpsredzczqgqyousp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flv5fpsredzczqgqyousp.png" alt="Amazon Bedrock AgentCore, AgentCore Identity, AgentCore Gateway, AWS Strands Agents, MCP, A2A, AWS Lambda - Mohsin Sheikhani" width="800" height="214"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  AgentCore Memory: Context That Persists and Scales
&lt;/h2&gt;

&lt;p&gt;Here's where the magic happens. Traditional chatbots lose context between sessions. AgentCore Memory makes agents truly intelligent by maintaining conversation context that survives sessions, scales across millions of users, and enables sophisticated reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architecture&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fax0uk34w5xd1bueps3cf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fax0uk34w5xd1bueps3cf.png" alt="Amazon Bedrock AgentCore, AgentCore Identity, AgentCore Gateway, AgentCore Memory, AWS Strands Agents, MCP, A2A, AWS Lambda - Mohsin Sheikhani" width="703" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What This Enables&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cross-session continuity&lt;/strong&gt;: Customer returns next week, agent remembers their preferences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent context sharing&lt;/strong&gt;: Reservation Agent knows what Search Agent discovered&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intelligent reasoning&lt;/strong&gt;: "Based on your previous concern about cancellation fees..."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalable storage&lt;/strong&gt;: Millions of conversations with configurable retention policies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Business Impact&lt;/strong&gt;: Customers don't repeat themselves. Agents make contextual decisions. Conversations feel natural, not transactional.&lt;/p&gt;

&lt;h2&gt;
  
  
  AgentCore Runtime: Production-Ready Agent Execution
&lt;/h2&gt;

&lt;p&gt;Running one agent in development is easy. Running a team of agents in production, handling failures gracefully, scaling under load, and maintaining performance—that's where most multi-agent projects die.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AgentCore Runtime's Production Features&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockAgentCoreApp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Agents that handle production realities
&lt;/span&gt;&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Main entry point for the hotel booking system&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No question provided&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;supervisor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to process request: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to process request: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Runtime Capabilities&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automatic scaling&lt;/strong&gt; based on demand&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health monitoring&lt;/strong&gt; with automatic recovery&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource management&lt;/strong&gt; to prevent runaway processes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load balancing&lt;/strong&gt; across agent instances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graceful degradation&lt;/strong&gt; when components fail&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feklmzz6v4t1ieiaf3vv1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feklmzz6v4t1ieiaf3vv1.png" alt="Amazon Bedrock AgentCore, AgentCore Identity, AgentCore Gateway, AWS Strands Agents, MCP, A2A, AWS Lambda - Mohsin Sheikhani" width="703" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Result&lt;/strong&gt;: Your agents run reliably in production without you managing infrastructure complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  AgentCore Observability: Intelligence You Can See
&lt;/h2&gt;

&lt;p&gt;The most sophisticated multi-agent system is useless if you can't understand what's happening inside it. AgentCore Observability provides deep insights into agent behavior, performance, and business impact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What You Can See&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent interaction flows&lt;/strong&gt;: Which agent handled each step of a complex workflow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance metrics&lt;/strong&gt;: Response times, success rates, resource utilization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business intelligence&lt;/strong&gt;: Which policies cause confusion, where customers get stuck&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error patterns&lt;/strong&gt;: Systematic issues before they become customer problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fos4pghwdjbsdzkc9gwlc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fos4pghwdjbsdzkc9gwlc.png" alt="Amazon Bedrock AgentCore, AgentCore Observability, AWS Strands Agents, MCP, A2A, AWS Lambda - Mohsin Sheikhani" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Business Value&lt;/strong&gt;: You don't just run agents, you optimize them based on real usage patterns and customer behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Architecture Scales
&lt;/h2&gt;

&lt;p&gt;Each AgentCore service solves a specific production challenge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gateway&lt;/strong&gt;: Tool access without security complexity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identity&lt;/strong&gt;: Authentication without custom code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt;: Context without storage management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime&lt;/strong&gt;: Reliability without infrastructure management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt;: Insights without custom monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, they create a foundation where you focus on business logic while AgentCore handles the production complexity that typically kills multi-agent projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Your Own Agent Team
&lt;/h2&gt;

&lt;p&gt;Ready to experiment? The architecture is surprisingly approachable:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with the supervisor pattern&lt;/strong&gt; - One agent that orchestrates others&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add memory integration&lt;/strong&gt; - Context that persists across sessions
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build specialized agents&lt;/strong&gt; - Each with a single, clear responsibility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use A2A communication&lt;/strong&gt; - Agents that collaborate, not compete&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy on AgentCore&lt;/strong&gt; - Production-ready from day one&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The complete implementation is available on GitHub, including deployment scripts and documentation for getting started with your own multi-agent system.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What complex workflow in your domain could benefit from conversational intelligence? The tools are ready, the question is what you'll build with them.&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  GitHub Repo
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://github.com/mohsinsheikhani/multi-agent-hotel-assistant" rel="noopener noreferrer"&gt;Multi-Agent Hotel Assistant&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This project was built for "AWS AI Engineering Month: Building with Agents". The combination of Amazon Bedrock AgentCore's Memory, Gateway, and Runtime services with the Strands Agents framework creates a powerful foundation for production multi-agent systems.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>agentaichallenge</category>
      <category>bedrock</category>
      <category>ai</category>
    </item>
    <item>
      <title>A Practical Guide to MLOps on AWS: Demand Forecasting with Amazon Bedrock and Automated EC2 Pipelines (Phase 03)</title>
      <dc:creator>Mohsin Sheikhani</dc:creator>
      <pubDate>Thu, 10 Jul 2025 17:19:58 +0000</pubDate>
      <link>https://dev.to/mohsinsheikhani/a-practical-guide-to-mlops-on-aws-demand-forecasting-with-amazon-bedrock-and-automated-ec2-3c4c</link>
      <guid>https://dev.to/mohsinsheikhani/a-practical-guide-to-mlops-on-aws-demand-forecasting-with-amazon-bedrock-and-automated-ec2-3c4c</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/mohsinsheikhani/a-practical-guide-to-mlops-on-aws-transforming-raw-data-into-ai-ready-datasets-with-aws-glue-4lc9"&gt;Phase 02&lt;/a&gt;, we transformed raw user interaction events into structured, enriched datasets, organized across bronze, silver, and gold zones in S3, and made them query able through Glue + Athena.&lt;/p&gt;

&lt;p&gt;Now in Phase 03, we shift from preparing the data to putting it to work.&lt;/p&gt;

&lt;p&gt;This is where AI meets infrastructure:&lt;br&gt;
We’ll use Amazon Bedrock to predict product demand based on historical sales, and architect it the way real systems do.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fev7hspysppg0gacyu0cp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fev7hspysppg0gacyu0cp.png" alt="Demand Forecasting with Amazon Bedrock and Automated EC2 Pipelines" width="800" height="565"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Why does this matter?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Forecasting is a batch job, not a real-time interaction.&lt;/li&gt;
&lt;li&gt;It needs compute, but we don’t want to keep EC2 running 24/7.&lt;/li&gt;
&lt;li&gt;So we’ll spin up an EC2 instance nightly, run a forecasting script that:

&lt;ul&gt;
&lt;li&gt;Reads gold-zone data&lt;/li&gt;
&lt;li&gt;Sends it to Bedrock&lt;/li&gt;
&lt;li&gt;Updates DynamoDB with new demand forecasts&lt;/li&gt;
&lt;li&gt;Shuts itself down to save cost&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this is orchestrated via EventBridge and Lambda, forming a complete, automated, cost-efficient forecasting pipeline.&lt;/p&gt;

&lt;p&gt;Now, the question here is: why did we choose EC2 to run the forecasting job?&lt;/p&gt;

&lt;p&gt;Because in real-world ML systems, long-running batch jobs like forecasting are often too heavy for Lambda, and may require more memory, longer runtimes, or even GPU-based instances. Using EC2 allows us to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run larger forecasting workloads&lt;/li&gt;
&lt;li&gt;Use GPU-based instances (if needed)&lt;/li&gt;
&lt;li&gt;Keep costs low by shutting down after completion&lt;/li&gt;
&lt;li&gt;Full control over compute resources&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  How the Orchestration Works
&lt;/h3&gt;

&lt;p&gt;We’ve designed this system to be automated and cost-optimized:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Amazon EventBridge triggers a Lambda function nightly (e.g. every 24 hours)&lt;/li&gt;
&lt;li&gt;Lambda starts up an EC2 instance&lt;/li&gt;
&lt;li&gt;EC2 pulls cleaned sales data from S3 and runs a Python script&lt;/li&gt;
&lt;li&gt;The script:

&lt;ul&gt;
&lt;li&gt;Sends data to Amazon Bedrock to forecast the next 7 days&lt;/li&gt;
&lt;li&gt;Updates DynamoDB with the &lt;code&gt;forecasted_demand&lt;/code&gt; per product&lt;/li&gt;
&lt;li&gt;Shuts down the EC2 instance when the task is done&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Step 1 - Provisioning EC2 Forecasting Instance
&lt;/h2&gt;

&lt;p&gt;In this step, we set up an Amazon EC2 instance that will run our demand forecasting script.&lt;/p&gt;

&lt;p&gt;We’re using AWS CDK to provision this instance with the following characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pulls aggregated sales data from the S3 Gold Zone (&lt;code&gt;forecast_ready/&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Sends this data to Amazon Bedrock to forecast product demand&lt;/li&gt;
&lt;li&gt;Updates the &lt;code&gt;forecasted_demand&lt;/code&gt; field in the DynamoDB Inventory Table&lt;/li&gt;
&lt;li&gt;Shuts itself down after the job is completed to avoid unnecessary costs&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Configuration Highlights
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Launched in a VPC with a public subnet (since it needs internet access for Bedrock and S3, we'll move it to private subnet in upcoming phase)&lt;/li&gt;
&lt;li&gt;Attached to an IAM role that allows:

&lt;ul&gt;
&lt;li&gt;Invoking Bedrock models (&lt;code&gt;bedrock:InvokeModel&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Reading from S3&lt;/li&gt;
&lt;li&gt;Writing to DynamoDB&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Bootstrapped via a user data script that:

&lt;ul&gt;
&lt;li&gt;Installs required dependencies (aws cli, etc.)&lt;/li&gt;
&lt;li&gt;Downloads the &lt;code&gt;inventory_forecaster.py&lt;/code&gt; script from S3&lt;/li&gt;
&lt;li&gt;Runs the script&lt;/li&gt;
&lt;li&gt;Terminates the instance once done&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Create a Dedicated File for EC2 Forecasting
&lt;/h4&gt;

&lt;p&gt;Follow the same pattern by organizing this inside a new folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir -p lib/constructs/common/compute/
touch lib/constructs/common/compute/forecast-instance.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then paste the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Construct } from "constructs";
import {
  Instance,
  InstanceClass,
  InstanceSize,
  InstanceType,
  MachineImage,
  Vpc,
  SecurityGroup,
  Peer,
  Port,
} from "aws-cdk-lib/aws-ec2";
import {
  Role,
  ServicePrincipal,
  ManagedPolicy,
  PolicyStatement,
} from "aws-cdk-lib/aws-iam";
import { Bucket } from "aws-cdk-lib/aws-s3";
import { Table } from "aws-cdk-lib/aws-dynamodb";
import { aws_ec2 as ec2 } from "aws-cdk-lib";

interface ForecastEc2Props {
  vpc: Vpc;
  goldBucket: Bucket;
  dataAssetsBucket: Bucket;
  forecastTable: Table;
}

export class ForecastEc2Instance extends Construct {
  public readonly instance: Instance;

  constructor(scope: Construct, id: string, props: ForecastEc2Props) {
    super(scope, id);

    const { vpc, goldBucket, dataAssetsBucket, forecastTable } = props;

    const role = new Role(this, "ForecastEC2Role", {
      assumedBy: new ServicePrincipal("ec2.amazonaws.com"),
      managedPolicies: [
        ManagedPolicy.fromAwsManagedPolicyName("CloudWatchAgentServerPolicy"),
        ManagedPolicy.fromAwsManagedPolicyName("AmazonS3ReadOnlyAccess"),
        ManagedPolicy.fromAwsManagedPolicyName("AmazonDynamoDBFullAccess"),
      ],
    });

    role.addToPolicy(
      new PolicyStatement({
        actions: ["bedrock:InvokeModel"],
        resources: ["*"],
      })
    );

    role.addToPolicy(
      new PolicyStatement({
        actions: ["ec2:TerminateInstances"],
        resources: ["*"],
        conditions: {
          StringEquals: {
            "ec2:ResourceTag/Name": "ForecastEC2",
          },
        },
      })
    );

    const securityGroup = new SecurityGroup(this, "ForecastEC2SG", {
      vpc,
      description: "Allow EC2 to access S3/Bedrock/DynamoDB",
      allowAllOutbound: true,
    });

    securityGroup.addIngressRule(
      Peer.anyIpv4(),
      Port.tcp(22),
      "Allow SSH from anywhere"
    );

    const userData = ec2.UserData.forLinux();
    userData.addCommands(
      "sudo yum update -y",
      "sudo yum install -y python3 pip -y",
      "pip3 install boto3 pandas pyarrow",

      "cd /home/ec2-user",

      `curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"`,
      "unzip awscliv2.zip",
      "sudo ./aws/install --update",

      // Export environment variables to .bashrc or directly
      `echo 'export GOLD_BUCKET=${goldBucket.bucketName}' &amp;gt;&amp;gt; /etc/profile`,
      `echo 'export FORECAST_TABLE=${forecastTable.tableName}' &amp;gt;&amp;gt; /etc/profile`,
      "source /etc/profile",

      `aws s3 cp s3://${dataAssetsBucket.bucketName}/scripts/inventory_forecaster.py .`,
      "python3 ./inventory_forecaster.py",

      "shutdown now -h"
    );

    this.instance = new Instance(this, "ForecastEC2", {
      instanceName: "ForecastEC2",
      instanceType: InstanceType.of(InstanceClass.T3, InstanceSize.MEDIUM),
      machineImage: MachineImage.latestAmazonLinux2023(),
      vpc,
      securityGroup,
      role,
      userData,
      associatePublicIpAddress: true,
      vpcSubnets: {
        subnetType: ec2.SubnetType.PUBLIC,
      },
    });
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In your &lt;code&gt;retail-ai-insights-stack.ts&lt;/code&gt;, import the construct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { ForecastEc2Instance } from "./constructs/common/compute/forecast-instance";
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And instantiate it like this (as you already had):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const forecastInstance = new ForecastEc2Instance(this, "ForecastingEc2", {
  vpc: vpc,
  goldBucket,
  dataAssetsBucket,
  forecastTable: dynamoConstruct.inventoryTable,
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Deploying the Forecasting Infrastructure
&lt;/h4&gt;

&lt;p&gt;Once the &lt;code&gt;ForecastEc2Instance&lt;/code&gt; construct is in place, don't do &lt;code&gt;cdk deploy&lt;/code&gt; for now.&lt;/p&gt;

&lt;p&gt;Go to the AWS Console and search for Amazon Bedrock, then click on Foundation Models &amp;gt; Model Catalog. On the Providers tab, select Anthropic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghy454cj62re9so5zrd0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghy454cj62re9so5zrd0.png" alt="Amazon Bedrock Foundational Model Catalog - AWS Console" width="800" height="213"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Look for the &lt;code&gt;Claude 3.7 Sonnet&lt;/code&gt;, in my case it's on the first row, third column, click on it&lt;/p&gt;

&lt;p&gt;In the next screen, you'll see &lt;code&gt;Available to request&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhn63u3e6tjvpgocb7x6j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhn63u3e6tjvpgocb7x6j.png" alt="Request model access for Claude 3.7 Sonnet" width="800" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on, &lt;code&gt;Request model access&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fysw5ur84ox5kbgpt4vme.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fysw5ur84ox5kbgpt4vme.png" alt="Request model access for Claude 3.7 Sonnet" width="800" height="130"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, click for Enable specific models, and search for &lt;code&gt;Claude 3.7 Sonnet&lt;/code&gt;, check mark it, and at the very bottom click on &lt;code&gt;Next&lt;/code&gt;, and provide random details&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbzbr4i9ncm8dxy7clflk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbzbr4i9ncm8dxy7clflk.png" alt="Request model access for Claude 3.7 Sonnet - Put in details" width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After a little while, you should see Access Granted&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03p84x0ljv1izqz4oz83.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03p84x0ljv1izqz4oz83.png" alt="Request access granted for Claude 3.7 Sonnet" width="800" height="95"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we have access to the Bedrock model on our account, we should be good to deploy our resources with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cdk deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Verifying the Results
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Wait for the EC2 instance status to become "running" in the EC2 console.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx61qce1alxxlmrpj8bdb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx61qce1alxxlmrpj8bdb.png" alt="Amazon EC2 instances list" width="800" height="120"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Then watch it auto-terminate after the script completes execution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4zf1zdaof1ik67k7b3qg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4zf1zdaof1ik67k7b3qg.png" alt="Amazon EC2 instances list - Status terminated" width="800" height="120"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Now head over to DynamoDB &amp;gt; Explore Items, and check your table.&lt;/li&gt;
&lt;li&gt;You’ll see that the &lt;code&gt;forecasted_demand&lt;/code&gt; field has been updated for four products (to save cost by avoiding extra calls to Bedrock).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flgtaouvg89i9vxiopref.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flgtaouvg89i9vxiopref.png" alt="Amazon DynamoDB - Explore items" width="800" height="284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2 - Automating Forecasting with EventBridge &amp;amp; Lambda
&lt;/h2&gt;

&lt;p&gt;In a real-world scenario, you wouldn’t manually trigger forecasting jobs. Instead, you’d want these predictions to run nightly, every 24 hours, and only spin up compute when needed to save cost.&lt;/p&gt;

&lt;p&gt;To do that, we’ll use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon EventBridge to define a scheduled rule (runs every night at 1:00 AM)&lt;/li&gt;
&lt;li&gt;AWS Lambda to start our EC2 forecasting instance&lt;/li&gt;
&lt;li&gt;EC2 itself terminates automatically after completing the prediction job&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Make a new file&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir -p lib/constructs/events
touch lib/constructs/events/schedule-ec2-task.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paste the code that provisions the Lambda, gives it permission to start the EC2 instance, and wires it into the EventBridge rule.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Construct } from "constructs";
import * as cdk from "aws-cdk-lib";
import { aws_lambda as lambda, Duration } from "aws-cdk-lib";
import { Rule, Schedule } from "aws-cdk-lib/aws-events";
import { LambdaFunction } from "aws-cdk-lib/aws-events-targets";
import { PolicyStatement } from "aws-cdk-lib/aws-iam";
import {
  NodejsFunction,
  NodejsFunctionProps,
} from "aws-cdk-lib/aws-lambda-nodejs";
import path from "path";

export class ScheduleForecastTask extends Construct {
  constructor(scope: Construct, id: string, instanceId: string) {
    super(scope, id);

    const startInstanceLambdaProps: NodejsFunctionProps = {
      functionName: "StartInstanceLambda",
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: "handler",
      memorySize: 128,
      entry: path.join(__dirname, "../../../lambda/start-instance/index.js"),
      timeout: cdk.Duration.seconds(10),
      environment: {
        INSTANCE_ID: instanceId,
      },
    };

    const startInstanceLambda = new NodejsFunction(
      this,
      "StartInstanceLambda",
      {
        ...startInstanceLambdaProps,
      }
    );

    startInstanceLambda.addToRolePolicy(
      new PolicyStatement({
        actions: ["ec2:StartInstances"],
        resources: [`arn:aws:ec2:*:*:instance/${instanceId}`],
      })
    );

    new Rule(this, "StartInstanceSchedule", {
      schedule: Schedule.cron({ minute: "0", hour: "1" }),
      targets: [new LambdaFunction(startInstanceLambda)],
    });
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's set up the Lambda function&lt;/p&gt;

&lt;p&gt;Create a new file&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir -p lambda/start-instance &amp;amp;&amp;amp; touch lambda/start-instance/index.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And paste the following code to start the EC2 instance&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { EC2Client, StartInstancesCommand } from "@aws-sdk/client-ec2";

const ec2 = new EC2Client({});

export const handler = async (event) =&amp;gt; {
  console.info("Start Instance Lambda event", JSON.stringify(event, null, 2));

  const instanceId = process.env.INSTANCE_ID;

  try {
    const command = new StartInstancesCommand({
      InstanceIds: [instanceId],
    });

    await ec2.send(command);

    return { message: "Booting Instance command initiated" };
  } catch (error) {
    console.error("Failed to boot up the instance:", error);
    throw error;
  }
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, back in your &lt;code&gt;retail-ai-insights-stack.ts&lt;/code&gt;, import and call the construct like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new ScheduleForecastTask(
  this,
  "ScheduleForecastTask",
  forecastInstance.instance.instanceId
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once that's done, deploy via&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cdk deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once deployed, your forecast job will run automatically every night, keeping your product demand predictions fresh and your compute costs optimized.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wrapping Up Phase 3 – Scalable Forecasting, Zero Waste
&lt;/h3&gt;

&lt;p&gt;With this phase complete, we’ve automated a core business process, demand forecasting, in a way that’s:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI-driven: Leveraging Amazon Bedrock for predictive insights.&lt;/li&gt;
&lt;li&gt;Cost-conscious: EC2 only runs when needed, then shuts down.&lt;/li&gt;
&lt;li&gt;Fully automated: Triggered nightly via EventBridge with no manual intervention.&lt;/li&gt;
&lt;li&gt;Production-ready: Clean orchestration, secure roles, real-time updates to DynamoDB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn’t just about running a script. It’s about combining compute, AI, storage, and automation into a solution that mirrors how real companies make stocking decisions, every single day, with no human in the loop.&lt;/p&gt;

&lt;p&gt;Next up: Let’s use that same user interaction data to drive real-time product recommendations with Amazon Personalize.&lt;/p&gt;

&lt;h2&gt;
  
  
  Complete Code for the Third Phase
&lt;/h2&gt;

&lt;p&gt;To view the full code for the third phase, &lt;a href="https://github.com/mohsinsheikhani/retail-ai-insights/tree/main/03-demand-forecasting-with-bedrock" rel="noopener noreferrer"&gt;checkout the repository on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🚀 &lt;strong&gt;Follow me on &lt;a href="https://www.linkedin.com/in/mohsin-sheikhani/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; for more AWS content!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>mlops</category>
      <category>bedrock</category>
    </item>
    <item>
      <title>Hands-On with Amazon Bedrock Agents: Hotel Booking Assistant with Action Groups and Knowledge Bases</title>
      <dc:creator>Mohsin Sheikhani</dc:creator>
      <pubDate>Thu, 26 Jun 2025 13:06:25 +0000</pubDate>
      <link>https://dev.to/mohsinsheikhani/hands-on-with-amazon-bedrock-agents-hotel-booking-assistant-with-action-groups-and-knowledge-bases-1j</link>
      <guid>https://dev.to/mohsinsheikhani/hands-on-with-amazon-bedrock-agents-hotel-booking-assistant-with-action-groups-and-knowledge-bases-1j</guid>
      <description>&lt;p&gt;In this guide, I’ll walk through how I built a hotel room booking assistant using Amazon Bedrock Agents and AWS Lambda, combining large language models with real business logic.&lt;/p&gt;

&lt;p&gt;The goal? Let customers ask questions about hotel rooms, check availability, and book a room, all through natural conversation.&lt;/p&gt;

&lt;p&gt;This isn’t just another chatbot. Behind the scenes, it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pulls room descriptions from a knowledge base (S3 PDF)&lt;/li&gt;
&lt;li&gt;Checks real-time room availability via DynamoDB&lt;/li&gt;
&lt;li&gt;Books reservations using a serverless API&lt;/li&gt;
&lt;li&gt;And coordinates all of this using Bedrock Agents + Action Groups&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why this project matters
&lt;/h2&gt;

&lt;p&gt;If you're working with hotels, resorts, or any customer-facing business, conversational interfaces are becoming more than a nice-to-have, they're a competitive advantage. Instead of just answering FAQs, this agent can actually act: check, query, and write to your backend systems.&lt;/p&gt;

&lt;p&gt;This project is meant to show exactly how that’s possible, and how far Bedrock Agents have come.&lt;/p&gt;

&lt;h2&gt;
  
  
  We’ll create:
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsuu81b12dhj64twrkvxg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsuu81b12dhj64twrkvxg.png" alt="Hotel Booking Amazon Bedrock Agent Architecture" width="800" height="484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An Amazon Bedrock Agent powered by Claude 3.5 Sonnet&lt;/li&gt;
&lt;li&gt;A Knowledge Base (PDF stored in S3) describing room types&lt;/li&gt;
&lt;li&gt;Two Action Groups (backed by Lambda + OpenAPI):

&lt;ul&gt;
&lt;li&gt;One to check room availability&lt;/li&gt;
&lt;li&gt;Another to book a reservation&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;A couple of DynamoDB tables for storing room and booking data&lt;/li&gt;

&lt;li&gt;And a simple walkthrough to connect it all together&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Let’s get into the build.
&lt;/h2&gt;

&lt;p&gt;Head over to the Amazon Bedrock console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftclvoosj8wsfszcqs57i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftclvoosj8wsfszcqs57i.png" alt="Amazon Bedrock Agent Console" width="800" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Start by creating a new agent, give it a name and a short description, like so:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9yl5xn3kovxfes9h3fgs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9yl5xn3kovxfes9h3fgs.png" alt="Creating an Amazon Bedrock Agent" width="620" height="484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the agent is created, it's time to choose the foundation model it will use to generate responses. Click on &lt;code&gt;Select Model&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9m665pz0fb8r4dof10z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9m665pz0fb8r4dof10z.png" alt="Agent Builder" width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Choose &lt;code&gt;Anthropic&lt;/code&gt; as the provider and select the &lt;code&gt;Claude 3.5 Sonnet&lt;/code&gt; model, then click Apply.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you don’t have access to this model yet, go to Model Access in the Bedrock console and request access for Claude 3.5 Sonnet.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fotb8ztsvm7ua8yir1yau.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fotb8ztsvm7ua8yir1yau.png" alt="Claude 3.5 Sonnet model selection" width="800" height="694"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You should now see the selected model listed in the Agent Builder:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6vza3kv6b8ynff5t7ck7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6vza3kv6b8ynff5t7ck7.png" alt="Agent Builder" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, time for us to fill the &lt;code&gt;Instruction for Agent&lt;/code&gt; input box. Scroll down to the Instruction for Agent section, copy the instruction from this &lt;a href="https://github.com/mohsinsheikhani/bedrock-hotel-agent/blob/main/resources/AgentInstructions.txt" rel="noopener noreferrer"&gt;GitHub Link&lt;/a&gt;, and paste it into the instruction input box:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frwqg3pvkso96d3myshl2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frwqg3pvkso96d3myshl2.png" alt="Instruction for Amazon Bedrock Agent" width="800" height="545"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, expand &lt;code&gt;Additional Settings&lt;/code&gt; and make sure &lt;code&gt;User Input&lt;/code&gt; is enabled; this way the agent can ask clarification questions from the user to make correct decisions when needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyajeli3gj378j2yzwbnj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyajeli3gj378j2yzwbnj.png" alt="User Input enable for Amazon Bedrock Agent" width="800" height="189"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now scroll back to the top, click &lt;code&gt;Save&lt;/code&gt;, and then &lt;code&gt;Prepare&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fki8uz6gco5ehfnu2p5nr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fki8uz6gco5ehfnu2p5nr.png" alt="Edit in Agent Builder" width="800" height="228"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Time for a quick test:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1soladi5vmbawrj1emxh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1soladi5vmbawrj1emxh.png" alt="Initial test for Amazon Bedrock Agent" width="686" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice how the agent uses the instructions to guide its answers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Clone the CDK Repository
&lt;/h3&gt;

&lt;p&gt;Now let’s set up the backend infrastructure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/mohsinsheikhani/bedrock-hotel-agent
cd bedrock-hotel-agent
npm install
cdk deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will provision:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2 Lambda functions (check &amp;amp; book availability)&lt;/li&gt;
&lt;li&gt;2 DynamoDB tables&lt;/li&gt;
&lt;li&gt;1 S3 bucket (to store the knowledge base PDF)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To add sample room availability to the &lt;code&gt;HotelRoomAvailabilityTable&lt;/code&gt;, run the following script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 ./scripts/insert-to-room-availability.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To verify, go to the DynamoDB Console, open the &lt;code&gt;HotelRoomAvailabilityTable&lt;/code&gt;, and click Explore Items:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbxh5nflikojl9s3wnwy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbxh5nflikojl9s3wnwy.png" alt="Explore Items on HotelRoomAvailabilityTable DynamoDB Table" width="529" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Add Domain Knowledge Using Bedrock Knowledge Base
&lt;/h3&gt;

&lt;p&gt;Right now, the agent knows how to help, thanks to the instructions we gave it, but it still doesn’t know what types of rooms exist or what amenities each offers.&lt;/p&gt;

&lt;p&gt;Let’s fix that.&lt;/p&gt;

&lt;p&gt;We’ll give the agent real hotel knowledge using Amazon Bedrock Knowledge Bases, backed by an S3-hosted PDF.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is where Retrieval-Augmented Generation (RAG) comes into play. Instead of stuffing everything into the prompt, the model can now pull specific answers directly from documents, structured or unstructured, at runtime.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Start by uploading the &lt;code&gt;Hilton-Portfolio.pdf&lt;/code&gt; file (&lt;a href="https://github.com/mohsinsheikhani/bedrock-hotel-agent/blob/main/resources/Hilton-Portfolio.pdf" rel="noopener noreferrer"&gt;included in the repo&lt;/a&gt;) to the S3 bucket we provisioned with CDK: &lt;code&gt;agent-kb-assets&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwklka8judbc3831470o0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwklka8judbc3831470o0.png" alt="Uploaded file on S3 bucket for Amazon Bedrock Knowledge Bases" width="800" height="167"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we’ll create a knowledge base and wire it to our agent so it can pull context from this PDF when answering questions.&lt;/p&gt;

&lt;p&gt;Head to the Amazon Bedrock Console, Look for &lt;code&gt;Knowledge Bases&lt;/code&gt; underneath &lt;code&gt;Builder tools&lt;/code&gt;, click on it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2qavj5maf5dyn9c6caxr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2qavj5maf5dyn9c6caxr.png" alt="Knowledge Bases on Amazon Bedrock console" width="800" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click &lt;code&gt;Create&lt;/code&gt; and choose the &lt;code&gt;Knowledge base with vector store&lt;/code&gt; option:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36ct7e4a1y0nq5c220jv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36ct7e4a1y0nq5c220jv.png" alt="Knowledge Base with vector store, Amazon Bedrock Agent" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fill up the Knowledge Base details with a name, and choose S3 as the Data Source&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm76hj9x227hdab8ssj8j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm76hj9x227hdab8ssj8j.png" alt="Knowledge Base Details" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next step is to configure the Data Source and point to the S3 path where your PDF lives:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6vvkltnnk4046x4feokt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6vvkltnnk4046x4feokt.png" alt="Configure Knowledge Base Data Source for Amazon Bedrock Agent" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For Embeddings, select &lt;code&gt;Amazon Titan&lt;/code&gt;. For Vector Store, choose &lt;code&gt;Amazon OpenSearch Serverless&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This setup gives you serverless RAG with native AWS services:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszxbhxc9bxuvth7fxjnw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszxbhxc9bxuvth7fxjnw.png" alt="Configure data storage and processing, using Amazon Titan as the embedding model and Amazon OpenSearch Serverless as the vector data store" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click Next, review your configuration, then click &lt;code&gt;Create&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0igjiutdkhqpqgd6a9td.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0igjiutdkhqpqgd6a9td.png" alt="Amazon Bedrock Knowledge Base creation in-progress" width="800" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once it’s created, open your Knowledge Base and click &lt;code&gt;Sync&lt;/code&gt; to begin parsing the document and storing the chunks for semantic search:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ohwl83s5wc1vrsmzjuj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ohwl83s5wc1vrsmzjuj.png" alt="Syncing Knowledge Base with Amazon S3 Data source" width="800" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Return to the &lt;code&gt;Agent Builder&lt;/code&gt;, scroll down to the &lt;code&gt;Knowledge bases&lt;/code&gt; section:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvczo1kpoojxzz1sumwt3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvczo1kpoojxzz1sumwt3.png" alt="Knowledge Bases, Agent Builder" width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click Add knowledge base, select the one you just created, and fill in the Instruction box with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;As an agent route any question by the user related to room type, room amenities, room description, hotel location to the knowledge bases.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5cjvxvqhle98e4qothbd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5cjvxvqhle98e4qothbd.png" alt="Add Knowledge Base" width="800" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click &lt;code&gt;Add&lt;/code&gt;, and you’ll see it appear in the list:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F737exxyx8nhlhh9hxq86.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F737exxyx8nhlhh9hxq86.png" alt="Added Knowledge Base to Amazon Bedrock Agent" width="800" height="141"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, click Save, and Prepare your agent:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fki8uz6gco5ehfnu2p5nr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fki8uz6gco5ehfnu2p5nr.png" alt="Edit in Agent Builder" width="800" height="228"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now give the agent a prompt like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What amenities are included in Embassy Suites by Hilton?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response should include real answers from the document you uploaded.&lt;br&gt;
But what’s even cooler?&lt;br&gt;
Scroll down and open the Orchestration trace. You’ll see the agent actively calling the Knowledge Base behind the scenes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02wc6l7km56h1tgsqsaz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02wc6l7km56h1tgsqsaz.png" alt="Orchestration and Knowledge Base" width="800" height="720"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This confirms our RAG setup is working, the agent is now retrieval-aware and can ground its responses in real business content.&lt;/p&gt;
&lt;h4&gt;
  
  
  Clean Up the Knowledge Base (To Avoid Charges)
&lt;/h4&gt;

&lt;p&gt;Amazon Bedrock Knowledge Bases can incur ongoing charges, especially due to the underlying OpenSearch collection.&lt;br&gt;
At this point, if you're just experimenting or done testing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go back to Agent Builder and remove the Knowledge Base from your agent.&lt;/li&gt;
&lt;li&gt;Then, head to Builder Tools → Knowledge Bases and delete the Knowledge Base itself.&lt;/li&gt;
&lt;li&gt;Finally, check the OpenSearch Service dashboard. If there's a collection still running, delete it too.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This will ensure you’re not billed unnecessarily going forward.&lt;/p&gt;
&lt;h3&gt;
  
  
  Action Group 01 - Room Availability Checks
&lt;/h3&gt;

&lt;p&gt;Now let’s move beyond answering questions and into actions.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Action Groups are just backend utilities which the Bedrock Agent uses to call external code on our behalf.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We'll start with a simple but critical one:&lt;br&gt;
Checking if a specific room is available for a customer’s requested dates.&lt;/p&gt;

&lt;p&gt;Go to your Agent Builder and scroll to Action groups.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftdqkr6zzz4700s1jje6e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftdqkr6zzz4700s1jje6e.png" alt="Action Group, Amazon Bedrock Agent" width="800" height="186"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click Add, give your Action Group a name, and configure it as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Action group type: Define with API schema&lt;/li&gt;
&lt;li&gt;Lambda function: Select the existing one: &lt;code&gt;RoomAvailabilityHandler&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faitg8jmvmj3q57iaqk7r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faitg8jmvmj3q57iaqk7r.png" alt="Create Action Group, Amazon Bedrock Agent" width="800" height="482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the API Schema, choose &lt;code&gt;Define via in-line schema editor&lt;/code&gt;.&lt;br&gt;
Then paste the OpenAPI schema from this &lt;a href="https://github.com/mohsinsheikhani/bedrock-hotel-agent/blob/main/resources/HotelRoomAvailability_OpenAPISchema.yaml" rel="noopener noreferrer"&gt;GitHub Link&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg3k2kqkdt8abcxix3rrw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg3k2kqkdt8abcxix3rrw.png" alt="Action Group Schema, Amazon Bedrock Agent" width="800" height="482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click Create, and you’ll see it appear in your Action Groups list:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0meskcs77ej7jdszbigt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0meskcs77ej7jdszbigt.png" alt="Action Group List, Amazon Bedrock Agent" width="800" height="138"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Give the Agent Permission to Call the Lambda
&lt;/h4&gt;

&lt;p&gt;By default, the agent can't invoke your Lambda function, we need to explicitly allow it.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to the &lt;code&gt;RoomAvailabilityHandler&lt;/code&gt; function in Lambda console.&lt;/li&gt;
&lt;li&gt;In the Configuration tab, scroll to Permissions &amp;gt; Resource-based policy statements&lt;/li&gt;
&lt;li&gt;Click Add permissions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fxkom10cbn8emwjvtc7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fxkom10cbn8emwjvtc7.png" alt=" " width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fill it out like so:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ra3xt103x911x5e3muc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ra3xt103x911x5e3muc.png" alt="Assigning Permissions to Amazon Bedrock Agents to invoke a lambda function" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Head back to Agent Builder, and click Save and then Prepare:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fki8uz6gco5ehfnu2p5nr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fki8uz6gco5ehfnu2p5nr.png" alt="Edit in Agent Builder" width="800" height="228"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Try testing it out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Can you check the room availability for 2025-12-25?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll see the agent reason through the prompt and call your Lambda:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0n44xmplcywqtav6v6wr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0n44xmplcywqtav6v6wr.png" alt="Testing Amazon Bedrock Agent" width="707" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And in the tracing panel, you'll notice that the agent invoked your Action Group exactly when it needed to:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Facfzzwfukdf96t5egrmk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Facfzzwfukdf96t5egrmk.png" alt="Orchestration and Knowledge Base" width="800" height="742"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That wraps up our first Action Group, i.e., Room Availability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Action Group 02 - Room Booking
&lt;/h3&gt;

&lt;p&gt;Now that our agent can check availability, it’s time to let it book a room when the user is ready.&lt;/p&gt;

&lt;p&gt;We’ll create a second Action Group that invokes a Lambda function to store booking details in DynamoDB.&lt;/p&gt;

&lt;p&gt;Create another Action Group by following the same steps as before:&lt;/p&gt;

&lt;p&gt;Just like before, go to Agent Builder &amp;gt; Action groups and click Add.&lt;/p&gt;

&lt;p&gt;Configure it with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Action group type: Define with API schema&lt;/li&gt;
&lt;li&gt;Lambda function: &lt;code&gt;HotelRoomBookingHandler&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;API Schema: Select in-line schema editor and paste the OpenAPI schema from this &lt;a href="https://github.com/mohsinsheikhani/bedrock-hotel-agent/blob/main/resources/HotelRoomBooking_OpenAPISchema.yaml" rel="noopener noreferrer"&gt;GitHub Link&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5pe0ltzdab64o1uxwwo7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5pe0ltzdab64o1uxwwo7.png" alt="Action Group Schema, Amazon Bedrock Agent" width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2crw04kqdngoq6iqrhgp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2crw04kqdngoq6iqrhgp.png" alt="Action Group Schema, Amazon Bedrock Agent" width="800" height="566"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click Create, and you’ll see it appear in your Action Groups list:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fesjsb9aqme300lr26dpe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fesjsb9aqme300lr26dpe.png" alt="Action Group List, Amazon Bedrock Agent" width="800" height="183"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Give the Agent Permission to Call the Lambda
&lt;/h4&gt;

&lt;p&gt;Your agent can’t call this function until you give it explicit permission.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to the &lt;code&gt;HotelRoomBookingHandler&lt;/code&gt; function in Lambda console.&lt;/li&gt;
&lt;li&gt;In the Configuration tab, scroll to Permissions &amp;gt; Resource-based policy statements&lt;/li&gt;
&lt;li&gt;Click Add permissions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fofiu5vlwyy9z4xtb89lc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fofiu5vlwyy9z4xtb89lc.png" alt="Lambda Resource-based policy statements" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6azobavz764vj9m04fee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6azobavz764vj9m04fee.png" alt="Lambda Resource-based policy statements" width="800" height="177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use the same settings as shown:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnanm813zdu2pdm0z7j1y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnanm813zdu2pdm0z7j1y.png" alt="Assigning Permissions to Amazon Bedrock Agents to invoke a lambda function" width="800" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmksd00523kthxo49m8g2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmksd00523kthxo49m8g2.png" alt="Lambda Resource-based policy statements" width="800" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Back in Agent Builder, click Save and then Prepare to update the agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test the Full Booking Flow
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tlrkrldnpy9naa9fz7r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tlrkrldnpy9naa9fz7r.png" alt="Amazon Bedrock Agents Demo" width="667" height="652"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkr4v9prb52463vu4l4af.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkr4v9prb52463vu4l4af.png" alt="Amazon Bedrock Agents Demo" width="662" height="618"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fae8y4flskl2nuaknb7m1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fae8y4flskl2nuaknb7m1.png" alt="Amazon Bedrock Agents Demo" width="662" height="618"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the booking is successful, verify it by going to DynamoDB, clicking Explore Items, and choosing the &lt;code&gt;HotelRoomBookingTable&lt;/code&gt;; you should see the entry created via the Bedrock Agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bl4myjaowuds7zmw8rg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bl4myjaowuds7zmw8rg.png" alt="DynamoDB table for booking made by Amazon Bedrock Agent" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And that's it, you’ve now wired up an LLM agent that can check room availability and book hotel stays, powered by real APIs.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>bedrockagents</category>
      <category>agentaichallenge</category>
    </item>
    <item>
      <title>A Practical Guide to MLOps on AWS: Transforming Raw Data into AI-Ready Datasets with AWS Glue (Phase 02)</title>
      <dc:creator>Mohsin Sheikhani</dc:creator>
      <pubDate>Thu, 12 Jun 2025 11:36:42 +0000</pubDate>
      <link>https://dev.to/mohsinsheikhani/a-practical-guide-to-mlops-on-aws-transforming-raw-data-into-ai-ready-datasets-with-aws-glue-4lc9</link>
      <guid>https://dev.to/mohsinsheikhani/a-practical-guide-to-mlops-on-aws-transforming-raw-data-into-ai-ready-datasets-with-aws-glue-4lc9</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/mohsinsheikhani/a-practical-guide-to-mlops-on-aws-streaming-data-ingestion-with-kinesis-firehose-phase-01-1bi7"&gt;Phase 01&lt;/a&gt;, we built the ingestion layer of our Retail AI Insights system. We streamed historical product interaction data into Amazon S3 (Bronze zone) and stored key product metadata with inventory information in DynamoDB.&lt;/p&gt;

&lt;p&gt;Now that we have raw data arriving reliably, it's time to clean, enrich, and organize it for downstream AI workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Objective
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Transform raw event data from the Bronze zone into:&lt;/li&gt;
&lt;li&gt;Cleaned, analysis-ready Parquet files in the Silver zone&lt;/li&gt;
&lt;li&gt;Forecast-specific feature sets in the Gold zone under &lt;code&gt;/forecast_ready/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Recommendation-ready CSV files under &lt;code&gt;/recommendations_ready/&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvcmvkoqz04t60li98az.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvcmvkoqz04t60li98az.png" alt="Transforming Raw Data into AI-Ready Datasets with AWS Glue Architecture Diagram" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This will power:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Demand forecasting via Amazon Bedrock&lt;/li&gt;
&lt;li&gt;Personalized product recommendations using Amazon Personalize&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What We'll Build in This Phase
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AWS Glue Jobs: Python scripts to clean, transform, and write data to the appropriate S3 zone&lt;/li&gt;
&lt;li&gt;AWS Glue Crawlers: Catalog metadata from S3 into tables for Athena &amp;amp; further processing&lt;/li&gt;
&lt;li&gt;AWS CDK Stack: Provisions all jobs, buckets, and crawlers&lt;/li&gt;
&lt;li&gt;Athena Queries: Run sanity checks on the transformed data&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Directory &amp;amp; Bucket Layout
&lt;/h2&gt;

&lt;p&gt;We'll now be working with the following S3 zones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;retail-ai-bronze-zone/&lt;/code&gt; → Raw JSON from Firehose&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;retail-ai-silver-zone/cleaned_data/&lt;/code&gt; → Cleaned Parquet&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;retail-ai-gold-zone/forecast_ready/&lt;/code&gt; → Aggregated features for forecasting&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;retail-ai-gold-zone/recommendations_ready/&lt;/code&gt; → CSV with item metadata for Personalize&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You'll also notice a fourth bucket: &lt;code&gt;retail-ai-zone-assets/&lt;/code&gt;; this stores the scripts and the training dataset.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1 - Creating Glue Resources via CDK
&lt;/h2&gt;

&lt;p&gt;Now that we've set up our storage zones and uploaded the required ETL scripts and datasets, it's time to define the Glue resources with AWS CDK.&lt;/p&gt;

&lt;p&gt;We'll create:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 Glue Jobs

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DataCleaningETLJob&lt;/strong&gt; → Cleans raw JSON into structured Parquet for the Silver Zone.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ForecastGoldETLJob&lt;/strong&gt; → Transforms cleaned data with features for demand prediction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RecommendationGoldETLJob&lt;/strong&gt; → Prepares item metadata CSV for Amazon Personalize.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Four Crawlers&lt;/li&gt;

&lt;li&gt;Validate everything with Athena&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;From the project root, generate the construct file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir -p lib/constructs/analytics &amp;amp;&amp;amp; touch lib/constructs/analytics/glue-resources.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure your local scripts/ and dataset/ directories are present, then upload them to your S3 assets bucket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws s3 cp ./scripts/sales_etl_script.py s3://retail-ai-zone-assets/scripts/
aws s3 cp ./scripts/forecast_gold_etl_script.py s3://retail-ai-zone-assets/scripts/
aws s3 cp ./scripts/user_interaction_etl_script.py s3://retail-ai-zone-assets/scripts/
aws s3 cp ./dataset/events_with_metadata.csv s3://retail-ai-zone-assets/dataset/
aws s3 cp ./scripts/inventory_forecaster.py s3://retail-ai-zone-assets/scripts/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Define Glue Jobs &amp;amp; Crawlers in CDK
&lt;/h3&gt;

&lt;p&gt;Now, open the &lt;code&gt;lib/constructs/analytics/glue-resources.ts&lt;/code&gt; file and define the full CDK logic to create:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Glue job role with required permissions&lt;/li&gt;
&lt;li&gt;The three ETL jobs with their respective scripts&lt;/li&gt;
&lt;li&gt;Four crawlers with S3 targets pointing to Bronze, Silver, Forecast, and Recommendation zones&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open the &lt;code&gt;lib/constructs/analytics/glue-resources.ts&lt;/code&gt; file, and add the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Construct } from "constructs";
import * as cdk from "aws-cdk-lib";

import { Bucket } from "aws-cdk-lib/aws-s3";
import { CfnCrawler, CfnJob, CfnDatabase } from "aws-cdk-lib/aws-glue";
import {
  Role,
  ServicePrincipal,
  ManagedPolicy,
  PolicyStatement,
} from "aws-cdk-lib/aws-iam";

interface GlueProps {
  bronzeBucket: Bucket;
  silverBucket: Bucket;
  goldBucket: Bucket;
  dataAssetsBucket: Bucket;
}

export class GlueResources extends Construct {
  constructor(scope: Construct, id: string, props: GlueProps) {
    super(scope, id);

    const { bronzeBucket, silverBucket, goldBucket, dataAssetsBucket } = props;

    // Glue Database
    const glueDatabase = new CfnDatabase(this, "SalesDatabase", {
      catalogId: cdk.Stack.of(this).account,
      databaseInput: {
        name: "sales_data_db",
      },
    });

    // Create IAM Role for Glue
    const glueRole = new Role(this, "GlueServiceRole", {
      assumedBy: new ServicePrincipal("glue.amazonaws.com"),
    });

    bronzeBucket.grantRead(glueRole);
    silverBucket.grantReadWrite(glueRole);
    goldBucket.grantReadWrite(glueRole);

    glueRole.addToPolicy(
      new PolicyStatement({
        actions: ["s3:GetObject"],
        resources: [`${dataAssetsBucket.bucketArn}/*`],
      })
    );

    glueRole.addManagedPolicy(
      ManagedPolicy.fromAwsManagedPolicyName("service-role/AWSGlueServiceRole")
    );

    // Glue Crawler (for Bronze Bucket)
    new CfnCrawler(this, "DataCrawlerBronze", {
      name: "DataCrawlerBronze",
      role: glueRole.roleArn,
      databaseName: glueDatabase.ref,
      targets: {
        s3Targets: [{ path: bronzeBucket.s3UrlForObject() }],
      },
      tablePrefix: "bronze_",
    });

    // Glue ETL Job
    new CfnJob(this, "DataCleaningETLJob", {
      name: "DataCleaningETLJob",
      role: glueRole.roleArn,
      command: {
        name: "glueetl",
        pythonVersion: "3",
        scriptLocation: dataAssetsBucket.s3UrlForObject(
          "scripts/sales_etl_script.py"
        ),
      },
      defaultArguments: {
        "--TempDir": silverBucket.s3UrlForObject("temp/"),
        "--job-language": "python",
        "--bronze_bucket": bronzeBucket.bucketName,
        "--silver_bucket": silverBucket.bucketName,
      },
      glueVersion: "3.0",
      maxRetries: 0,
      timeout: 10,
      workerType: "Standard",
      numberOfWorkers: 2,
    });

    // Glue Crawler (for Silver Bucket)
    new CfnCrawler(this, "DataCrawlerSilver", {
      name: "DataCrawlerSilver",
      role: glueRole.roleArn,
      databaseName: glueDatabase.ref,
      targets: {
        s3Targets: [
          {
            path: `${silverBucket.s3UrlForObject()}/cleaned_data/`,
          },
        ],
      },
      tablePrefix: "silver_",
    });

    // Glue Crawler (for Gold Bucket)
    new CfnCrawler(this, "DataCrawlerForecast", {
      name: "DataCrawlerForecast",
      role: glueRole.roleArn,
      databaseName: glueDatabase.ref,
      targets: {
        s3Targets: [{ path: `${goldBucket.s3UrlForObject()}/forecast_ready/` }],
      },
      tablePrefix: "gold_",
    });

    // Glue Crawler (for Gold Bucket)
    new CfnCrawler(this, "DataCrawlerRecommendations", {
      name: "DataCrawlerRecommendations",
      role: glueRole.roleArn,
      databaseName: glueDatabase.ref,
      targets: {
        s3Targets: [
          { path: `${goldBucket.s3UrlForObject()}/recommendations_ready/` },
        ],
      },
      tablePrefix: "gold_",
    });

    // Glue ETL Job to output forecast ready dataset
    new CfnJob(this, "ForecastGoldETLJob", {
      name: "ForecastGoldETLJob",
      role: glueRole.roleArn,
      command: {
        name: "glueetl",
        pythonVersion: "3",
        scriptLocation: dataAssetsBucket.s3UrlForObject(
          "scripts/forecast_gold_etl_script.py"
        ),
      },
      defaultArguments: {
        "--TempDir": silverBucket.s3UrlForObject("temp/"),
        "--job-language": "python",
        "--silver_bucket": silverBucket.bucketName,
        "--gold_bucket": goldBucket.bucketName,
      },
      glueVersion: "3.0",
      maxRetries: 0,
      timeout: 10,
      workerType: "Standard",
      numberOfWorkers: 2,
    });

    // Glue ETL Job to output recommendation ready dataset
    new CfnJob(this, "RecommendationGoldETLJob", {
      name: "RecommendationGoldETLJob",
      role: glueRole.roleArn,
      command: {
        name: "glueetl",
        pythonVersion: "3",
        scriptLocation: dataAssetsBucket.s3UrlForObject(
          "scripts/user_interaction_etl_script.py"
        ),
      },
      defaultArguments: {
        "--TempDir": silverBucket.s3UrlForObject("temp/"),
        "--job-language": "python",
        "--silver_bucket": silverBucket.bucketName,
        "--gold_bucket": goldBucket.bucketName,
      },
      glueVersion: "3.0",
      maxRetries: 0,
      timeout: 10,
      workerType: "Standard",
      numberOfWorkers: 2,
    });
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wire it up on the &lt;code&gt;retail-ai-insights-stack.ts&lt;/code&gt; file&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/**
 * Glue ETL Resources
 **/
new GlueResources(this, "GlueResources", {
  bronzeBucket,
  silverBucket,
  goldBucket,
  dataAssetsBucket,
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once deployed via &lt;code&gt;cdk deploy&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to AWS Glue &amp;gt; ETL Jobs - You should see:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcviikq2qlizdk6vul2u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcviikq2qlizdk6vul2u.png" alt="AWS Glue Studio" width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to AWS Glue &amp;gt; Data Catalog &amp;gt; Crawlers – Ensure four crawlers exist:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwntiecqwoefxdjt7bwjw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwntiecqwoefxdjt7bwjw.png" alt="AWS Glue Crawlers" width="800" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2 - Run Glue Jobs to Transform Raw Data
&lt;/h3&gt;

&lt;p&gt;Now that our Glue jobs and crawlers are deployed, let’s walk through how we run the ETL flow across the Bronze, Silver, and Gold zones.&lt;/p&gt;

&lt;h4&gt;
  
  
  Locate Raw Data in Bronze Bucket
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Go to the Amazon S3 Console, open the &lt;code&gt;retail-ai-bronze-zone bucket&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Drill down through the directories until you see the file, and note the tree structure; in my case it's &lt;code&gt;dataset/2025/05/26/20&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Copy this full prefix path.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Update the ETL Script Input Path
&lt;/h4&gt;

&lt;p&gt;Open the &lt;code&gt;sales_etl_script.py&lt;/code&gt; inside VSCode.&lt;br&gt;
On line 36, update the input_path variable to reflect the directory path you just copied:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;input_path = f"s3://{bronze_bucket}/dataset/2025/05/26/20/"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Re-upload the modified script to your S3 data-assets bucket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws s3 cp ./scripts/sales_etl_script.py s3://retail-ai-zone-assets/scripts/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because versioning is enabled on the bucket, this will replace the previous file while preserving version history.&lt;/p&gt;

&lt;h4&gt;
  
  
  Run the ETL Jobs
&lt;/h4&gt;

&lt;p&gt;Now let’s kick off the transformation pipeline:&lt;/p&gt;

&lt;p&gt;Run &lt;code&gt;DataCleaningETLJob&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to AWS Glue Console &amp;gt; ETL Jobs.&lt;/li&gt;
&lt;li&gt;Select the &lt;code&gt;DataCleaningETLJob&lt;/code&gt; and click Run Job.&lt;/li&gt;
&lt;li&gt;This job will:

&lt;ul&gt;
&lt;li&gt;Read raw JSON data from the Bronze bucket.&lt;/li&gt;
&lt;li&gt;Clean, cast, and convert it to Parquet.&lt;/li&gt;
&lt;li&gt;Store the results in the &lt;code&gt;retail-ai-silver-zone&lt;/code&gt; bucket under &lt;code&gt;cleaned_data/&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F214xah13td2wnigk6hp2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F214xah13td2wnigk6hp2.png" alt="Running AWS Glue Job" width="800" height="151"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once successful, navigate to the &lt;code&gt;retail-ai-silver-zone&lt;/code&gt; bucket and confirm:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa7f7ksmgz4thzmvnu3sg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa7f7ksmgz4thzmvnu3sg.png" alt="S3 Bucket for Silver Zone" width="800" height="327"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run &lt;code&gt;ForecastGoldETLJob&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to AWS Glue Console &amp;gt; ETL Jobs.&lt;/li&gt;
&lt;li&gt;Select the &lt;code&gt;ForecastGoldETLJob&lt;/code&gt; and click Run Job.&lt;/li&gt;
&lt;li&gt;This job will:

&lt;ul&gt;
&lt;li&gt;Read the cleaned data from &lt;code&gt;retail-ai-silver-zone/cleaned_data/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Aggregate daily sales&lt;/li&gt;
&lt;li&gt;Output the transformed data to &lt;code&gt;retail-ai-gold-zone/forecast_ready/&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcts5otxav3gnk2x4i2u2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcts5otxav3gnk2x4i2u2.png" alt="Running AWS Glue Job" width="800" height="153"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once completed, visit the Gold bucket and confirm the forecast files are present in that directory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flecq3inlpp3c7413tt6u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flecq3inlpp3c7413tt6u.png" alt="S3 Bucket for Gold Zone" width="800" height="327"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run &lt;code&gt;RecommendationGoldETLJob&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to AWS Glue Console &amp;gt; ETL Jobs.&lt;/li&gt;
&lt;li&gt;Select the &lt;code&gt;RecommendationGoldETLJob&lt;/code&gt; and click Run Job.&lt;/li&gt;
&lt;li&gt;This job will:

&lt;ul&gt;
&lt;li&gt;Read cleaned product data from the Silver zone&lt;/li&gt;
&lt;li&gt;Output only the required item metadata in CSV format&lt;/li&gt;
&lt;li&gt;Save to &lt;code&gt;retail-ai-gold-zone/recommendations_ready/&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk0y556rnah8jkdqg94ya.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk0y556rnah8jkdqg94ya.png" alt="Running AWS Glue Job" width="800" height="147"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After the job runs successfully, go to the Gold bucket and verify the structure and CSV file.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdk94dhj62uvj4ph5c1k5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdk94dhj62uvj4ph5c1k5.png" alt="S3 Bucket for Gold Zone" width="800" height="184"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Run All Glue Crawlers
&lt;/h3&gt;

&lt;p&gt;Once the Glue crawlers are deployed, you’ll see four of them listed in the Glue Console &amp;gt; Data Catalog &amp;gt; Crawlers:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcg3is7rhqu9sfwwou6l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcg3is7rhqu9sfwwou6l.png" alt="AWS Glue Crawlers" width="800" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Select all four crawlers.&lt;/li&gt;
&lt;li&gt;Click Run.&lt;/li&gt;
&lt;li&gt;Once completed, look at the "Table changes on the last run" column — each should say "1 created".&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vz0bys6gie08m6hmykd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vz0bys6gie08m6hmykd.png" alt="AWS Glue Crawlers" width="800" height="188"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Validate Table Creation
&lt;/h4&gt;

&lt;p&gt;Navigate to Glue Console &amp;gt; Data Catalog &amp;gt; Databases &amp;gt; Tables. You should now see four new tables, each corresponding to a specific zone:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqzc3a1xqyuvn466xk748.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqzc3a1xqyuvn466xk748.png" alt="AWS Glue Data Catalog Tables" width="800" height="192"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each table has an automatically inferred schema, including columns like &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;event_type&lt;/code&gt;, &lt;code&gt;timestamp&lt;/code&gt;, &lt;code&gt;price&lt;/code&gt;, &lt;code&gt;product_name&lt;/code&gt;, and more.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query with Amazon Athena
&lt;/h3&gt;

&lt;p&gt;Now let’s run SQL queries against these tables:&lt;/p&gt;

&lt;p&gt;Open the Amazon Athena Console.&lt;/p&gt;

&lt;p&gt;If it's your first time, you’ll see a pop-up:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhe67mv82kgmzf12pg96g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhe67mv82kgmzf12pg96g.png" alt="AWS Athena, Output bucket configuration" width="800" height="35"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Choose your &lt;code&gt;retail-ai-zone-assets&lt;/code&gt; bucket.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7numhwljm8hrm3krrfb8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7numhwljm8hrm3krrfb8.png" alt="AWS Athena, Output bucket configuration" width="800" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click Save.&lt;/p&gt;

&lt;h4&gt;
  
  
  Sample Athena Query
&lt;/h4&gt;

&lt;p&gt;In the query editor, try running simple SQL queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Select * from sales_data_db.&amp;lt;TABLE_NAME&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Try this query on the &lt;code&gt;bronze_retail_ai_bronze_zone&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Select * from sales_data_db.bronze_retail_ai_bronze_zone
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fysz517su15r5shxscjif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fysz517su15r5shxscjif.png" alt="AWS Athena query result" width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Try this query on the &lt;code&gt;silver_cleaned_data&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Select * from sales_data_db.silver_cleaned_data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxhy0vivi5050ugxvx0t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxhy0vivi5050ugxvx0t.png" alt="AWS Athena query result" width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Try this query on the &lt;code&gt;gold_forecast_ready&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Select * from sales_data_db.gold_forecast_ready
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0crx8codkaccaybup1l0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0crx8codkaccaybup1l0.png" alt="AWS Athena query result" width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Try this query on the &lt;code&gt;gold_recommendations_ready&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Select * from sales_data_db.gold_recommendations_ready
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdrwgi0cej9p3lmhq6jk3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdrwgi0cej9p3lmhq6jk3.png" alt="AWS Athena query result" width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What You’ve Just Built
&lt;/h2&gt;

&lt;p&gt;In this phase, you've gone beyond basic ETL. You’ve engineered a production-grade data lake with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-zone architecture (Bronze, Silver, Gold)&lt;/li&gt;
&lt;li&gt;Automated ETL pipelines using AWS Glue&lt;/li&gt;
&lt;li&gt;Schema discovery and validation through Crawlers&lt;/li&gt;
&lt;li&gt;Interactive querying via Amazon Athena&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this was done infrastructure-as-code first using AWS CDK, with clean separation of storage, processing, and access layers, exactly how real-world cloud data platforms are designed.&lt;/p&gt;

&lt;p&gt;But this isn’t just about organizing data. You’re now sitting on a foundation that’s:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI-ready&lt;/li&gt;
&lt;li&gt;Model-friendly&lt;/li&gt;
&lt;li&gt;Cost-efficient&lt;/li&gt;
&lt;li&gt;And built for scale&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What’s Next?
&lt;/h3&gt;

&lt;p&gt;In Phase 3, we’ll unlock this data’s real potential, using Amazon Bedrock to power AI-based demand forecasting, running nightly on an EC2 instance and storing predictions back into our pipeline.&lt;/p&gt;

&lt;p&gt;You’ve built the rails, now it’s time to run intelligence through them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Complete Code for the Second Phase
&lt;/h2&gt;

&lt;p&gt;To view the full code for the second phase, &lt;a href="https://github.com/mohsinsheikhani/retail-ai-insights/tree/main/02-data-preparation" rel="noopener noreferrer"&gt;checkout the repository on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🚀 &lt;strong&gt;Follow me on &lt;a href="https://www.linkedin.com/in/mohsin-sheikhani/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; for more AWS content!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>mlops</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>A Practical Guide to MLOps on AWS: Streaming Data Ingestion with Kinesis Firehose (Phase 01)</title>
      <dc:creator>Mohsin Sheikhani</dc:creator>
      <pubDate>Sat, 07 Jun 2025 15:25:55 +0000</pubDate>
      <link>https://dev.to/mohsinsheikhani/a-practical-guide-to-mlops-on-aws-streaming-data-ingestion-with-kinesis-firehose-phase-01-1bi7</link>
      <guid>https://dev.to/mohsinsheikhani/a-practical-guide-to-mlops-on-aws-streaming-data-ingestion-with-kinesis-firehose-phase-01-1bi7</guid>
      <description>&lt;h2&gt;
  
  
  The Big Picture: Why This Project Matters
&lt;/h2&gt;

&lt;p&gt;In the age of AI-driven decisions, retail businesses are sitting on a goldmine of customer interaction data, but most are struggling to use it effectively.&lt;/p&gt;

&lt;p&gt;Imagine this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A customer browses your store but leaves without buying.&lt;/li&gt;
&lt;li&gt;You don't know what caught their eye.&lt;/li&gt;
&lt;li&gt;You don’t know what’s likely to sell tomorrow.&lt;/li&gt;
&lt;li&gt;You’re restocking based on gut feel, not data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the reality for many retailers.&lt;/p&gt;

&lt;p&gt;The goal of this project is to build a cloud-native, AI-enhanced retail analytics platform that solves two critical business problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What do you think we should stock next? → Predict demand using historical data and forecast which products need restocking.&lt;/li&gt;
&lt;li&gt;What should we recommend? → Use customer behavior to serve personalized product suggestions at runtime.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And we want to achieve this without maintaining complex infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Real-Time, AI-Driven Retail Intelligence
&lt;/h2&gt;

&lt;p&gt;We'll build a modern data pipeline with these pillars:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Lake Architecture:&lt;/strong&gt; S3-based Bronze → Silver → Gold zones for raw, cleaned, and model-ready data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI at the Core:&lt;/strong&gt; Bedrock for forecasting, Personalize for recommendations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-Time &amp;amp; Batch:&lt;/strong&gt; Lambda for on-demand actions, EC2 for nightly forecasting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MLOps Foundation:&lt;/strong&gt; Glue for ETL, EventBridge for orchestration, DynamoDB for fast lookup.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdq9p0itcq74iwovji8di.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdq9p0itcq74iwovji8di.png" alt="Cloud-Native AI Project on AWS for Product Forecasting and Product Recommendations" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a full-stack, cloud-native project that covers everything from data ingestion to AI inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 01: Ingesting Events Like a Real-Time System
&lt;/h2&gt;

&lt;p&gt;Before anything else, we need &lt;strong&gt;data&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this phase, we simulate a real-time ingestion pipeline by streaming historical customer interaction data into our system using Kinesis Firehose and Python.&lt;/p&gt;

&lt;h3&gt;
  
  
  What We’re Solving in Phase 1
&lt;/h3&gt;

&lt;p&gt;To build useful AI models, we first need user behavior data. This includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product views&lt;/li&gt;
&lt;li&gt;Add-to-cart actions&lt;/li&gt;
&lt;li&gt;Purchases&lt;/li&gt;
&lt;li&gt;Timestamps and metadata&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But unlike traditional ETL pipelines, we want to simulate a real-time data flow, so that our system behaves like it would in production, even while testing locally.&lt;/p&gt;

&lt;h3&gt;
  
  
  How We’re Building It
&lt;/h3&gt;

&lt;p&gt;Here’s what we’re doing in this phase:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stream JSON events to Kinesis Firehose, which writes them to an S3 Bronze bucket.&lt;/li&gt;
&lt;li&gt;Simultaneously write selected fields to DynamoDB to store product metadata and inventory.&lt;/li&gt;
&lt;li&gt;Set the foundation for AI-driven recommendations and forecasting later on.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03oot0mxy3z92yinzlr8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03oot0mxy3z92yinzlr8.png" alt="Data Ingestion with Kinesis Firehose" width="800" height="303"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Let’s Build It
&lt;/h3&gt;

&lt;p&gt;We’ll now walk through building this ingestion pipeline using AWS CDK, starting with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS CDK project setup&lt;/li&gt;
&lt;li&gt;Creating a DynamoDB table to store product info and initial inventory&lt;/li&gt;
&lt;li&gt;Creating the Base VPC&lt;/li&gt;
&lt;li&gt;Creating the S3 Buckets (bronze, silver, gold zones)&lt;/li&gt;
&lt;li&gt;Setting up Kinesis Firehose to deliver raw data to S3&lt;/li&gt;
&lt;li&gt;Writing Python scripts to simulate streaming events and populating DynamoDB&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1 - Setting Up the AWS CDK Project
&lt;/h3&gt;

&lt;p&gt;To build this infrastructure in a clean, scalable way, we’re using AWS CDK (Cloud Development Kit). It allows us to define cloud infrastructure using familiar programming languages, in our case, TypeScript.&lt;/p&gt;

&lt;p&gt;We’ll organize everything using a modular folder structure that separates shared resources, analytics components, and common storage logic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Initialize the CDK Project
&lt;/h3&gt;

&lt;p&gt;We start by creating and bootstrapping a CDK app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir retail-ai-insights &amp;amp;&amp;amp; cd retail-ai-insights
cdk init app --language=typescript
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a basic CDK project with boilerplate files like &lt;code&gt;cdk.json&lt;/code&gt;, &lt;code&gt;tsconfig.json&lt;/code&gt;, and a &lt;code&gt;lib/&lt;/code&gt; directory to organize stacks.&lt;/p&gt;

&lt;h4&gt;
  
  
  Organize Your Constructs
&lt;/h4&gt;

&lt;p&gt;Let’s build a modular file structure right from the start. Inside the &lt;code&gt;lib/&lt;/code&gt; directory, run the following one by one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd lib/

# Shared Networking (VPC, Subnets, etc.)
mkdir -p constructs/shared/networking &amp;amp;&amp;amp; touch constructs/shared/networking/vpc.ts

# Analytics Pipeline (Firehose, Glue, etc.)
mkdir -p constructs/analytics &amp;amp;&amp;amp; touch constructs/analytics/firehose-stream.ts

# Common Storage (S3 Buckets, DynamoDB)
mkdir -p constructs/common/storage
touch constructs/common/storage/dynamodb-inventory.ts
touch constructs/common/storage/s3-bucket-factory.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once done, your project should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lib/
├── constructs/
│   ├── shared/
│   │   └── networking/
│   │       └── vpc.ts
│   ├── analytics/
│   │   └── firehose-stream.ts
│   └── common/
│       └── storage/
│           ├── dynamodb-inventory.ts
│           └── s3-bucket-factory.ts
└── retail-ai-insights-stack.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2 - Creating a DynamoDB Table for Product Inventory
&lt;/h3&gt;

&lt;p&gt;Now that our CDK project is set up, let’s provision a DynamoDB table that will store:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product metadata (product_id, product_name, etc.)&lt;/li&gt;
&lt;li&gt;Current stock levels&lt;/li&gt;
&lt;li&gt;Forecasted demand (to be updated later)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This table will power real-time lookups during both recommendation generation and inventory management phases.&lt;/p&gt;

&lt;p&gt;Open &lt;code&gt;lib/constructs/common/storage/dynamodb-inventory.ts&lt;/code&gt; and add the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Construct } from "constructs";

import * as dynamodb from "aws-cdk-lib/aws-dynamodb";
import * as cdk from "aws-cdk-lib";

export class DynamoDBInventory extends Construct {
  public readonly inventoryTable: dynamodb.Table;

  constructor(scope: Construct, id: string) {
    super(scope, id);

    this.inventoryTable = new dynamodb.Table(this, "RetailInventoryTable", {
      tableName: "RetailInventoryTable",
      partitionKey: { name: "product_id", type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      encryption: dynamodb.TableEncryption.AWS_MANAGED,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now open &lt;code&gt;lib/retail-ai-insights-stack.ts&lt;/code&gt; and use the construct like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/**
* Retail Inventory Table
**/
const dynamoConstruct = new DynamoDBInventory(this, "DynamoDBInventory");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To see the output, run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cdk deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigate to the DynamoDB console&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tco3fniticjt8jr6zik.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tco3fniticjt8jr6zik.png" alt="List of DynamoDB" width="800" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This completes our DynamoDB setup, giving us a real-time-accessible source of truth for product stock, prices, and metadata.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 - Creating the Base VPC
&lt;/h3&gt;

&lt;p&gt;We’ll keep it minimal and efficient by using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 Availability Zone&lt;/li&gt;
&lt;li&gt;Public Subnets (for instance bootstrap, like downloading packages)&lt;/li&gt;
&lt;li&gt;Private Subnets with Egress (for EC2 forecasting to access Bedrock or S3 securely)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open &lt;code&gt;lib/constructs/shared/networking/vpc.ts&lt;/code&gt; and add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Construct } from "constructs";
import { Vpc, SubnetType, NatProvider } from "aws-cdk-lib/aws-ec2";
import { StackProps } from "aws-cdk-lib";

export interface VpcResourceProps extends StackProps {
  maxAzs?: number;
}

export class VpcResource extends Construct {
  public readonly vpc: Vpc;

  constructor(scope: Construct, id: string, props: VpcResourceProps) {
    super(scope, id);

    this.vpc = new Vpc(this, "RetailForecastVpc", {
      vpcName: "RetailAIVPC",
      maxAzs: props.maxAzs ?? 1,
      natGatewayProvider: NatProvider.gateway(),
      natGateways: 0,
      subnetConfiguration: [
        {
          name: "PublicSubnet",
          subnetType: SubnetType.PUBLIC,
          cidrMask: 24,
        },
        {
          name: "PrivateSubnet",
          subnetType: SubnetType.PRIVATE_WITH_EGRESS,
          cidrMask: 24,
        },
      ],
    });
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In &lt;code&gt;lib/retail-ai-insights-stack.ts&lt;/code&gt;, add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/**
* VPC Setup
**/
const { vpc } = new VpcResource(this, "RetailVpc", {});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's run the deploy command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cdk deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once done, navigate to the VPC console&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzse80641jbbje0qlql4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzse80641jbbje0qlql4.png" alt="List of VPC" width="800" height="118"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3n35doxuw9ojbge9e3we.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3n35doxuw9ojbge9e3we.png" alt="List of Subnets within a VPC" width="800" height="133"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9f6eh3vygjqo0btpkg8z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9f6eh3vygjqo0btpkg8z.png" alt="Route tables associated with a Subnet" width="800" height="143"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this, we now have an isolated network environment to run our compute workloads with secure access to AWS services (via VPC endpoints, later on).&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4 - Creating the Storage Foundation (S3 Buckets)
&lt;/h3&gt;

&lt;p&gt;A production-grade data lake architecture often follows a multi-zone strategy to maintain a clean separation of data states:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Bronze:&lt;/strong&gt;  Raw data as ingested (e.g., event streams)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silver:&lt;/strong&gt; Cleaned, filtered, and enriched data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gold:&lt;/strong&gt;    Aggregated or transformed data that's ready for ML/AI consumption&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In our case, we'll also include a &lt;code&gt;zone-assets&lt;/code&gt; bucket to store static datasets and job scripts that need to be referenced during ETL jobs.&lt;/p&gt;

&lt;p&gt;We’re creating multiple S3 buckets with similar configurations (like versioning, encryption, SSL-only access, and auto-deletion in dev environments), rather than duplicating logic, we’ll use a factory construct that makes it reusable and DRY.&lt;/p&gt;

&lt;p&gt;Open the file:&lt;br&gt;
&lt;code&gt;lib/constructs/common/storage/s3-bucket-factory.ts&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;And update it with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Construct } from "constructs";
import {
  Bucket,
  BlockPublicAccess,
  BucketEncryption,
} from "aws-cdk-lib/aws-s3";
import * as cdk from "aws-cdk-lib";

interface CustomS3BucketProps {
  bucketName: string;
}

export class S3BucketFactory extends Construct {
  public readonly bucket: Bucket;

  constructor(scope: Construct, id: string, props: CustomS3BucketProps) {
    super(scope, id);

    const { bucketName } = props;

    this.bucket = new Bucket(this, "S3Bucket", {
      bucketName,
      versioned: true,
      enforceSSL: true,
      autoDeleteObjects: true,
      blockPublicAccess: BlockPublicAccess.BLOCK_ALL,
      encryption: BucketEncryption.S3_MANAGED,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now in your &lt;code&gt;retail-ai-insights-stack.ts&lt;/code&gt;, instantiate the factory construct like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/**
* Multi Zone Bucket
**/
const { bucket: bronzeBucket } = new S3BucketFactory(
  this,
  "BronzeDataLakeBucket",
  {
    bucketName: "retail-ai-bronze-zone",
  }
);

const { bucket: silverBucket } = new S3BucketFactory(
  this,
  "SilverDataLakeBucket",
  {
    bucketName: "retail-ai-silver-zone",
  }
);

const { bucket: goldBucket } = new S3BucketFactory(this, "GoldDataBucket", {
  bucketName: "retail-ai-gold-zone",
});

const { bucket: dataAssetsBucket } = new S3BucketFactory(
  this,
  "DataAssets",
  {
    bucketName: "retail-ai-zone-assets",
  }
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, let's run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cdk deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify the output by going to the S3 Console, where you should see the four buckets.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8tgvp1ythvnguae9t16a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8tgvp1ythvnguae9t16a.png" alt="General purpose S3 Buckets" width="800" height="202"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This sets up the core storage foundation that the rest of our ETL, forecasting, and recommendation workflows will depend on.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5 - Setting up Kinesis Firehose to deliver raw data to S3
&lt;/h3&gt;

&lt;p&gt;With our Bronze bucket in place, we’re now ready to stream raw event data into it using Amazon Kinesis Data Firehose. Firehose is a fully managed service for delivering real-time streaming data directly to destinations like Amazon S3.&lt;/p&gt;

&lt;p&gt;In our architecture, it enables our ingestion pipeline by capturing event streams and persisting them as raw JSON files in the Bronze zone.&lt;/p&gt;

&lt;p&gt;Open the file:&lt;br&gt;
&lt;code&gt;lib/constructs/analytics/firehose-stream.ts&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Add the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Construct } from "constructs";

import { CfnDeliveryStream } from "aws-cdk-lib/aws-kinesisfirehose";
import { PolicyStatement, Role, ServicePrincipal } from "aws-cdk-lib/aws-iam";
import { Bucket } from "aws-cdk-lib/aws-s3";
import { LogGroup, LogStream } from "aws-cdk-lib/aws-logs";

interface FirehoseProps {
  destinationBucket: Bucket;
}

export class FirehoseToS3 extends Construct {
  public readonly deliveryStream: CfnDeliveryStream;

  constructor(scope: Construct, id: string, props: FirehoseProps) {
    super(scope, id);

    const logGroup = new LogGroup(this, "FirehoseLogGroup");
    const logStream = new LogStream(this, "FirehoseLogStream", {
      logGroup,
    });

    // IAM Role for Firehose to access S3
    const firehoseRole = new Role(this, "FirehoseRole", {
      assumedBy: new ServicePrincipal("firehose.amazonaws.com"),
    });

    props.destinationBucket.grantWrite(firehoseRole);

    firehoseRole.addToPolicy(
      new PolicyStatement({
        actions: [
          "logs:PutLogEvents",
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
        ],
        resources: [logGroup.logGroupArn],
      })
    );

    // Firehose Delivery Stream
    this.deliveryStream = new CfnDeliveryStream(this, "DatasetFirehose", {
      deliveryStreamName: "firehose-to-s3",
      deliveryStreamType: "DirectPut",
      s3DestinationConfiguration: {
        bucketArn: props.destinationBucket.bucketArn,
        roleArn: firehoseRole.roleArn,
        prefix: "dataset/",
        bufferingHints: {
          intervalInSeconds: 60,
          sizeInMBs: 5,
        },
        compressionFormat: "UNCOMPRESSED",
        cloudWatchLoggingOptions: {
          enabled: true,
          logGroupName: logGroup.logGroupName,
          logStreamName: logStream.logStreamName,
        },
      },
    });
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;lib/retail-ai-insights-stack.ts&lt;/code&gt; and wire it up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/**
* Firehose Stream
**/
new FirehoseToS3(this, "FirehoseToS3", {
  destinationBucket: bronzeBucket,
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploy and verify the output; this time, go to the Kinesis Firehose console&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cdk deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmum7iqthmafymz1i7zk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmum7iqthmafymz1i7zk.png" alt="Kinesis Firehose Streams" width="800" height="125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This completes our infrastructure deployment for the first phase.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6 - Simulating Real-Time Events Using Python Scripts
&lt;/h3&gt;

&lt;p&gt;Now that the infrastructure is in place, let’s simulate user activity by streaming mock sales data into our Firehose delivery stream and storing essential product metadata in DynamoDB.&lt;/p&gt;

&lt;h4&gt;
  
  
  What We’re Doing
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Sending user interaction events (like purchases, product views) into the Bronze zone via Firehose.&lt;/li&gt;
&lt;li&gt;Writing product-level information (with inventory and forecasted demand fields) directly into DynamoDB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives us two distinct but complementary data flows: one for historical processing (S3), and one for operational lookups (DynamoDB).&lt;/p&gt;

&lt;h4&gt;
  
  
  Running the Simulation Scripts
&lt;/h4&gt;

&lt;p&gt;Clone the scripts directory directly into your project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/mohsinsheikhani/retail-ai-insights/tree/main/scripts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  1. Stream to Firehose
&lt;/h4&gt;

&lt;p&gt;This script sends historical user events to Firehose in batches (simulating real-time behavior):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 ./scripts/stream_to_firehose.py --stream-name firehose-to-s3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once it’s running:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open the Kinesis Firehose Console&lt;/li&gt;
&lt;li&gt;Select the &lt;code&gt;firehose-to-s3&lt;/code&gt; stream&lt;/li&gt;
&lt;li&gt;Scroll down to the monitoring tab&lt;/li&gt;
&lt;li&gt;You’ll start to see metrics update (incoming bytes, delivery success, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvtov7je2q5ttlrkpalef.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvtov7je2q5ttlrkpalef.png" alt="Firehose Stream Metrics" width="800" height="328"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then check your Bronze S3 bucket via the S3 Console:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate into the bucket named &lt;code&gt;retail-ai-bronze-data&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;You should see a new folder under dataset/ containing your streamed JSON records.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frz0q15u23sjcmtt3rocm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frz0q15u23sjcmtt3rocm.png" alt="Bronze zone S3 Bucket for raw data" width="800" height="207"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Populate DynamoDB
&lt;/h4&gt;

&lt;p&gt;This script writes a subset of product info to DynamoDB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 ./scripts/write_to_dynamodb.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It sends fields like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;product_id, product_name, category, price, and rating —
and auto-generates:&lt;/li&gt;
&lt;li&gt;current_stock (random between 30–70)&lt;/li&gt;
&lt;li&gt;forecasted_demand (initially set to 0)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the script runs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open DynamoDB Console&lt;/li&gt;
&lt;li&gt;Click Explore Items&lt;/li&gt;
&lt;li&gt;You’ll see all your ingested product records appear in the &lt;code&gt;RetailInventoryTable&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuib4ztdt5z8ssbh174e7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuib4ztdt5z8ssbh174e7.png" alt="List of items within a DynamoDB Table" width="800" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This completes our &lt;strong&gt;Phase 1: Ingestion Layer setup&lt;/strong&gt;, where we now have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Realistic product interaction data flowing into S3&lt;/li&gt;
&lt;li&gt;Fast-access inventory data in DynamoDB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the next phase, we’ll build our ETL pipeline using AWS Glue to clean and enrich this data. Stay tuned!&lt;/p&gt;

&lt;h2&gt;
  
  
  Complete Code for the First Phase
&lt;/h2&gt;

&lt;p&gt;To view the full code for the first phase, &lt;a href="https://github.com/mohsinsheikhani/retail-ai-insights/tree/main/01-data-ingestion" rel="noopener noreferrer"&gt;checkout the repository on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🚀 &lt;strong&gt;Follow me on &lt;a href="https://www.linkedin.com/in/mohsin-sheikhani/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; for more AWS content!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>mlops</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
