DEV Community: Girish Mukim

BeSA Batch 09 Week6 - Supercharge Development with Kiro | Build Your AI-Enhanced SA Practice

Girish Mukim — Sat, 28 Mar 2026 22:33:47 +0000

BeSA Batch 09 – Week 6
AI-Driven Development with Kiro and Building an AI-Enhanced SA Practice

Disclaimer:
These notes were drafted using AI for clarity, structure, and readability. They are intended solely for learning purposes.

These are the structured notes from Week 6, focused only on the two role plays. Writing this as a quick revision for those who attended the session and a concise recap for anyone who couldn’t make it.

Role Play 1 – Supercharging Development with Kiro

Context

This conversation focused on AI-driven software development and how tools like Kiro are changing the way applications are built—from idea to production.

Getting Started with AI-Driven Development

To build AI-powered applications, a few foundational components are required:

Infrastructure

Compute layer to run AI workloads
Can include specialized AI hardware options

Foundation models

Serve as the “brain” of the system
Accessed through managed services

Supporting services

Orchestration
Memory
Knowledge bases
Security

Once these are in place, developers can start building applications using tools like Kiro or other AI-assisted development environments.

Traditional vs AI-Optimized Infrastructure

It was clarified that traditional compute (like EC2) can still be used.

However, optimized AI infrastructure provides:

Better performance
Lower cost
Improved scalability
More specialized capabilities

The shift is toward purpose-built AI compute rather than general-purpose infrastructure.

Evolution of AI Development Tools

The progression of AI-assisted development was explained in three stages:

Autocomplete phase (around 2023)

Basic code suggestions
Similar to enhanced IntelliSense

Assistant phase (around 2024)

Interactive AI assistants
Developers could ask questions and get help

Agentic development phase (current)

AI agents actively participate in development
Assist in planning, design, and execution

This represents a shift from assistance → collaboration → partial autonomy.

Benefits of AI-Driven Development

Key benefits highlighted:

Faster time to market

Rapid creation of MVPs
Faster experimentation

Increased productivity

Developers can focus on higher-level problems
AI handles repetitive tasks

Developer autonomy

Agents can take decisions and execute tasks
Developers and agents work together

Improved code quality (with proper guidance)

Structured workflows can lead to better outcomes

Challenges to Be Aware Of

Some limitations and risks were also discussed:

Scaling AI development

No centralized way to manage all knowledge and context

Black box behavior

Limited visibility into how outputs are generated

Control limitations

Hard to fully control agent behavior

Code quality concerns

Outputs need validation

These challenges reinforce the need for structured approaches.

Vibe Coding vs Structured Development

A distinction was made between quick coding approaches and structured development.

Vibe coding

Fast, iterative, “build as you go” approach
Useful for:
- Small UI changes
- Quick fixes
- Simple enhancements

Limitations:

Not scalable
Not suitable for complex systems

Structured (spec-driven) development

Starts with requirements and design
Follows a defined process
Suitable for production systems

Key idea:

Use vibe coding for prototyping
Use structured development for production

Spec-Driven Development with Kiro

Kiro introduces a specification-driven workflow.

Flow:

Requirement → Design → Tasks → Implementation

Instead of jumping directly into coding, the process ensures:

Clear requirements
Defined architecture
Predictable execution

This reduces ambiguity and improves scalability.

Core Concepts in Kiro

Specification

Defines what needs to be built
Acts as a contract

Design

Defines how the system will be built

Tasks

Break down implementation steps

Steering files

Guide the agent during development
Act like constraints or guardrails

Key insight:

Clear specifications reduce back-and-forth with AI
Provide better control over outcomes

Agentic Development Environment

Kiro acts as an agentic development environment where:

AI understands relationships between files
Changes propagate across requirements, design, and code
Context is maintained across the system

This enables:

Consistency
Better traceability
Faster iteration

Pricing Model

The pricing model consists of:

Subscription cost
Usage-based cost (based on model usage/credits)

Higher-capability models cost more, while lighter models are cheaper.

Role Play 2 – Building Your AI-Enhanced SA Practice

Context

This conversation focused on how solutions architects can systematically integrate AI into their daily workflow and build a consistent AI-enabled practice.

From Occasional Use to Systematic Practice

A key shift discussed:

From:

Using AI occasionally

To:

Building a consistent AI-driven workflow

This transition enables compounding benefits over time.

AI Toolkit Categories

A structured toolkit approach was recommended.

Four categories:

General-purpose AI tools

Used for broad tasks like research and drafting

Development tools

Integrated into IDEs
Used for coding and architecture generation

Specialized tools

Example: diagram generation tools
Convert text into structured outputs

Enterprise AI tools

Used for sensitive data
Include governance and compliance controls

Choosing the right tool depends on:

Task type
Data sensitivity
Context

Daily Workflow Integration

A structured daily workflow was outlined.

Morning briefing

Provide AI with context for the day
Set expectations

Pre-meeting preparation

Research customer
Understand industry and tech stack

During engagement

Quick lookups
Cost comparisons
Service evaluations

Post-meeting analysis

Paste notes into AI
Extract:
- Requirements
- Risks
- Action items

Periodic review

Identify patterns across engagements
Improve approach over time

This creates a continuous feedback loop.

Prompt Library

A key practice is maintaining a reusable prompt library.

Common categories:

Discovery questions
Architecture generation
Cost analysis
Security reviews
Documentation templates

Benefits:

Saves time
Standardizes quality
Enables team-wide improvement

End-to-End Use of AI in Engagements

Example workflow for a new customer:

Discovery phase

AI-assisted research
Faster understanding of customer context

Post-call analysis

Extract structured insights from notes

Design phase

Generate multiple architecture options

Refinement phase

Apply human judgment
Adapt to real constraints

AI typically gets the solution:

60–70% complete

Human expertise takes it to:

90%+ completeness

Where Human Judgment Matters

Three key areas where AI cannot replace architects:

Understanding human context

Team dynamics
Risk tolerance
Organizational constraints

Making trade-offs

Decisions under uncertainty
Balancing competing priorities

Communication strategy

Tailoring message for stakeholders
Deciding what to present and how

Key idea:
AI can generate content, but cannot decide strategy.

Responsible Use of AI

Guidelines for safe usage:

Avoid sharing sensitive or regulated data in public tools
Use anonymization where required
Prefer enterprise-approved tools for customer data

Three key checks:

Is the data public?
Is it customer-identifiable?
Is it regulated?

Transparency with Customers

Best practice:

Be transparent about using AI
Position it as part of your workflow
Emphasize validation and judgment

Focus should remain on:

Quality of recommendations
Customer outcomes

Action Plan

Simple steps to get started:

Pick one workflow and integrate AI
Build a prompt library
Engage with a learning community

Mindset Shift

Final takeaway:

AI does not replace core strengths of an architect.

Key differentiators remain:

Judgment
Customer relationships
Ability to synthesize complexity

AI amplifies these strengths rather than replacing them.

Week 6 Consolidated Takeaways

From the first role play:
AI-driven development is evolving toward agentic environments where structured, specification-driven workflows improve quality, scalability, and speed.

From the second role play:
Building a consistent AI-enabled workflow is critical for architects, with human judgment remaining the key differentiator in decision-making and communication.

This final week brought together two important themes:

How AI transforms software development workflows
How professionals evolve their practices to effectively leverage AI

This concludes the 6-week learning series for BeSA Batch 09.

BeSA Batch 09 Week5 - Model Context Protocol in Practice and AI‑Powered Solution Validation

Girish Mukim — Mon, 23 Mar 2026 22:15:35 +0000

Week 5 - Model Context Protocol in Practice and AI‑Powered Solution Validation
Batch 09 – BeSA Cloud Academy

Disclaimer:
These notes were drafted using AI for clarity, structure, and readability. They are intended solely for learning purposes.

These are the structured notes from Week 5, focused only on the two role plays. Writing this as a quick revision for those who attended the session and a concise recap for anyone who couldn’t make it.

Role Play 1 – Understanding Model Context Protocol (MCP)

Context

This conversation focused on understanding MCP (Model Context Protocol) and why it is becoming a foundational concept in agentic AI architectures.

Why MCP Matters

When building AI agents that need to perform real actions, a key challenge is connecting the agent to external systems.

Examples include:

Querying databases
Creating tickets
Accessing APIs
Checking inventory

The core question becomes:
How does the agent connect to all these tools in a scalable and standardized way?

MCP addresses this problem.

What is MCP

MCP (Model Context Protocol) is an open protocol that standardizes how AI applications:

Provide context to LLMs
Enable access to tools and external systems

Key idea:
Instead of building custom integrations for every combination of client and tool, MCP provides a standard interface.

Analogy used:

Like a universal connector (similar to USB-C).
Without MCP → every integration is custom.
With MCP → one standard interface works everywhere.

From Custom Integrations to Standardization

Before MCP:

Each AI client needed custom integration with each tool.
This results in an N × M problem.

With MCP:

Tools are built once as MCP servers.
Any client can use them.
This reduces complexity to an N + M model.

Key benefits:

Reduced engineering effort
Better scalability
Improved maintainability
Strong alignment with architectural best practices

Alignment with Architecture Principles

MCP aligns well with key architecture principles such as:

Designed for change
Modular scalability
Standardization
Operational efficiency

It also integrates easily with:

External data sources
APIs
Enterprise systems

Core MCP Components

The architecture was explained using a simple analogy.

Host

Acts as the control plane (e.g., Amazon Bedrock).
Responsible for orchestrating communication.

MCP Clients

Interact with the host.
Represent the interface layer.

MCP Servers

Provide specialized capabilities.
Each server is responsible for a specific function (e.g., database access, ticketing).

The host mediates all communication between the model and MCP servers.

Core Concepts for Agents

Three key concepts were highlighted:

Tools

Functions the agent can invoke.
Example: lookup employee, create ticket.
Tool descriptions are critical because they guide the agent’s decision-making.

Resources

Data or documents the agent can read.
Example: policy documents, knowledge bases.

Prompts

Predefined templates for structured interaction.
Help users initiate consistent workflows.

Summary:

Tools → actions
Resources → context
Prompts → structured interaction

Request Flow

The typical flow works as follows:

User input → Host → LLM (with tool descriptions)
→ LLM selects tool → Host routes request to MCP Server
→ Server returns data → LLM generates response

Important point:

The LLM does not directly communicate with MCP servers.
The host acts as a control layer for security and governance.

Communication Methods

Two communication mechanisms were discussed:

stdio

Used for local processes on the same machine.

HTTP with Server-Sent Events

Used for remote and enterprise-grade integrations.

Multi-Tool Orchestration

Agents can use multiple tools in sequence.

Example scenario:

Check weather
Find calendar availability
Schedule an event

The LLM handles reasoning and dependency between steps.

Building MCP-Based Systems

Typical steps involved:

Create MCP server
Register tools with clear descriptions
Implement error handling
Perform testing (unit, integration, natural language)
Enable logging and debugging for traceability

Production Patterns on AWS

Recommended approach:

Lambda-backed MCP servers

Serverless
Scalable
Cost-efficient
Event-driven

For long-running workflows:

Use Bedrock Agent Core

Supports long sessions (up to 8 hours)
Provides session isolation
Designed for stateful interactions

Security Considerations

Security is critical when exposing tools to AI systems.

Key practices:

Principle of least privilege

Assign minimal required permissions using IAM roles

Human-in-the-loop

Required for destructive or sensitive operations

Input validation

Prevent prompt injection or malicious inputs

Architectural decisions remain human-driven, especially for:

Scope definition
Security boundaries
Integration design

Role Play 2 – AI-Powered Solution Validation

Context

This conversation focused on validating AI-generated architectures to ensure they are production-ready.

While AI can generate solutions quickly, validation ensures they work in real-world conditions.

Why Validation is Critical

AI-generated architectures may:

Look correct
Follow best practices
But fail under real-world constraints

Architects must validate designs across:

Technical feasibility
Business constraints
Organizational realities

AI provides the technical baseline, but human validation ensures practical fit.

Validation Framework

A structured validation approach was discussed.

Key dimensions:

Context (Business Fit)

Does the architecture align with business goals?
Does it match customer constraints?

Security and Compliance

Are regulatory requirements met (e.g., PCI DSS)?
Are encryption and segmentation implemented?

Reliability and Performance

Can the system handle expected load?
Are there bottlenecks?

Cost Reality

Is the architecture cost-effective?
Are there opportunities to optimize?

Feasibility

Can the team build and operate this system?
Does it match their skill set?

Value Alignment

Does the solution provide real value to the business?

Security and Compliance Validation

AI can identify potential issues such as:

Missing encryption
Lack of segmentation

Architects must:

Prioritize based on compliance requirements
Distinguish between:
- Mandatory controls
- Best practices

Risk tolerance of the organization plays a key role.

Reliability and Performance Validation

AI can highlight bottlenecks such as:

Database connection limits
Service throttling

Architects must:

Validate these concerns with actual numbers
Consider real traffic patterns

Example:

Scaling for peak events (e.g., holiday traffic).

Applying Business Context

AI often suggests optimal technical designs.

However, these may not align with business needs.

Example:

AI recommends multi-region deployment for high availability
Business reality may prefer:
- Single-region
- Multi-AZ setup
- Lower cost

Human judgment is required to right-size solutions.

Cost Optimization

AI tools can provide real-time cost estimates.

They can:

Break down costs by service
Suggest alternatives

Example:

Switching from a premium database option to a standard one
Saving costs when advanced features are not required

Architects balance:

Cost
Performance
Requirements

Preparing for Stakeholder Questions

AI can be used to simulate critical stakeholders.

Example:

Prompt AI to act as a skeptical CTO

This helps generate:

Tough questions
Risk scenarios
Objections

Architects can then:

Prepare structured responses
Back answers with data

Presenting Options Instead of Answers

Instead of a single solution, multiple options can be presented.

Example:

Option A

Accept downtime risk

Option B

Multi-region deployment (higher cost)

Option C

Add buffering mechanisms (lower cost alternative)

This approach:

Builds trust
Enables informed decision-making

Key Takeaways from Validation Discussion

Validation is a critical skill in AI-assisted architecture.
AI accelerates design, but validation ensures correctness.
Real-world constraints must always guide final decisions.

The combination of AI speed and human judgment leads to better outcomes.

Week 5 Consolidated Takeaways

From the first role play:
MCP provides a standardized way for agents to interact with tools and external systems, reducing complexity and enabling scalable, modular architectures.

From the second role play:
AI-generated architectures must be rigorously validated across business, technical, and operational dimensions before being considered production-ready.

This week emphasized two key themes:

Standardization in agent integration through MCP
The importance of validation to ensure AI-generated solutions work in real-world scenarios

BeSA Batch 09 Week4 - Architecting AI Systems: Scaling with Bedrock AgentCore and Designing at Speed

Girish Mukim — Tue, 17 Mar 2026 00:48:55 +0000

Week 4 - Architecting AI Systems: Scaling with Bedrock AgentCore and Designing at Speed
Batch 09 – BeSA Cloud Academy

Disclaimer:
These notes were drafted using AI for clarity, structure, and readability. They are intended solely for learning purposes.

These are the structured notes from Week 4, focused only on the two role plays. Writing this as a quick revision for those who attended the session and a concise recap for anyone who couldn’t make it.

Role Play 1 – Architecting for Production with Amazon Bedrock Agent Core

Context

This conversation focused on a healthcare technology company that had successfully built agent prototypes using the Strands framework but was struggling to move those agents into production.

The company’s use case involves AI agents assisting with clinical documentation processing across multiple hospitals.

Prototype to Production Challenge

The customer described several challenges they were facing while scaling their prototype.

High concurrency requirements

The organization has about 500 physicians across three hospitals.
Each physician handles 20–30 patients.
This results in thousands of concurrent agent sessions.
Their current system cannot handle this scale.

Sensitive healthcare data

The system handles PHI (Personal Health Information).
HIPAA compliance requires strict isolation between sessions.
Each physician should only access data relevant to their patients.

External system integrations

The agents must connect to:
- Insurance systems
- Healthcare databases
- External APIs

These systems require different authentication methods such as:

API keys
OAuth-based access.

Observability and auditability

Healthcare systems require full traceability.
It must be possible to track:
- Who accessed what
- When it was accessed
- What actions were performed.

The customer emphasized that a “black box” AI system is not acceptable in this environment.

The Prototype-to-Production Gap

The solutions architect explained that this situation is very common.

Many teams successfully build prototypes, but production systems introduce additional challenges.

Four key areas typically create the gap:

Performance

Managing large numbers of concurrent agent sessions.

Scaling

Infrastructure bottlenecks appear when workloads grow.

Security

Ensuring agents only access data allowed for a specific user.

Governance

Tracking activity, usage, and access for compliance purposes.

Introduction to Amazon Bedrock Agent Core

To address these challenges, the conversation introduced Amazon Bedrock Agent Core.

Agent Core is described as a modular suite of services designed to help move agent applications into production environments.

A key point mentioned was that existing Strands agent code does not need to be rewritten.

Instead, Agent Core services can be integrated alongside existing frameworks.

Agent Core Architecture Components

Agent Core consists of multiple modular services that can be adopted based on requirements.

Runtime

The runtime provides a serverless environment for deploying agents.

Key capabilities include:

Wrapping the agent in a container.
Exposing an endpoint for interaction.
Automatically scaling from zero to thousands of concurrent sessions.

This eliminates the need to manage infrastructure manually.

Session Management

Clinical workflows often involve long interactions.

The runtime supports sessions lasting up to eight hours.

This enables complex medical documentation workflows to complete without interruption.

Identity and Access Management

Two types of identity control were discussed.

Inbound authentication

Ensures only authorized users can access the agent.
Can integrate with identity providers such as Cognito or Okta.

Outbound authorization

Allows the agent to securely access external systems.
API credentials are stored in a credential provider backed by Secrets Manager.
This prevents sensitive credentials from being exposed in the application.

Gateway and Tool Management

When agents interact with many tools, a new challenge emerges.

If every tool is included in the prompt context, it can:

Increase token usage.
Expand the context window unnecessarily.
Reduce response accuracy.

The Agent Core Gateway addresses this problem.

It indexes tools such as:

APIs
Lambda functions
MCP servers.

Using semantic search, the gateway retrieves only the tools relevant to a specific request.

Benefits include:

Reduced token usage.
Lower operational costs.
Improved response accuracy.

Memory Capabilities

Agent Core provides two memory layers.

Short-term memory

Maintains state within an active session.

Long-term memory

Stores persistent information such as user preferences or conversation history.

This allows agents to maintain context across interactions.

Key Takeaways from the Production Discussion

Moving from prototype to production requires solving operational challenges such as scaling, security, and governance.

Production-ready agent systems must support:

High concurrency workloads
Secure data isolation
Integration with external systems
Auditability and observability

Agent Core provides modular services designed to address these needs without requiring a full redesign of existing agents.

Role Play 2 – Rapid Architecture and Design at Speed

Context

This conversation explored how AI tools can significantly accelerate architecture design workflows for solutions architects.

The scenario involved a new customer engagement requiring a complete architecture proposal within three days.

Traditional Architecture Design Timeline

Previously, preparing a full architecture proposal could take two to three weeks.

Tasks involved:

Industry research
Evaluating architecture patterns
Designing diagrams
Writing architecture decision documentation.

The discussion demonstrated how AI-assisted workflows can compress this timeline dramatically.

Agentic Architecture Assistant

The architect described using an AI-powered assistant built with several components.

LLM (the brain)

Responsible for reasoning and generating ideas.

MCPs (Model Context Protocols)

Provide access to domain-specific knowledge such as AWS services or pricing.

Agent interface

Implemented as an IDE extension such as Cline inside VS Code.

Together, these components form an “agentic helper” that assists with architecture design tasks.

Incorporating Real Customer Constraints

While AI can generate architectures quickly, it does not automatically understand real-world constraints.

In the scenario, the customer had several limitations:

Legacy architecture

A 15-year-old monolithic system.

Team skill set

Developers are experienced in Java.
No experience with serverless architectures.

Leadership preferences

The CTO is risk-averse.

These contextual factors strongly influence architectural choices.

Human Judgment in Architecture

The architect explained how human judgment guides AI outputs.

For example:

AI might suggest a fully serverless architecture.

However, considering the team’s Java expertise and risk tolerance, this approach may not be appropriate.

Instead, the architect may choose a hybrid architecture combining:

Containers
Selected serverless components.

This approach balances modernization with practicality.

Generating Architecture Artifacts with AI

The conversation also highlighted how AI can accelerate the creation of design artifacts.

Architecture diagrams

Using specialized MCP tools, the AI can generate diagrams that include:

Front-end layers
Web Application Firewall (WAF)
Content Delivery Network (CDN)
Containerized backend and persistence layers.

What previously required hours of manual diagramming can now be generated in seconds.

Architecture Decision Records (ADRs)

The AI can also generate ADR documents.

These explain why certain design choices were made.

Example:
Choosing ECS Fargate instead of Lambda based on:

Java-based development environment
PCI compliance requirements.

Generating Infrastructure as Code

The agent can also create starter templates for infrastructure.

Example outputs include:

Terraform templates
Deployment configuration samples.

These provide a strong starting point for implementation.

Three-Day Architecture Delivery Plan

The architect outlined a structured workflow for meeting the three-day deadline.

Day 1

Research e-commerce architecture patterns.
Generate three viable architectural approaches.

Day 2

Create detailed architecture diagrams.
Draft Architecture Decision Records.
Generate infrastructure code samples.

Day 3

Refine messaging for leadership stakeholders.
Focus on cost predictability and risk mitigation.
Produce an executive summary.

Cost of AI-Assisted Architecture

An interesting detail mentioned was the cost of the automated workflow.

The entire AI-assisted architecture generation process cost approximately $2.30 in compute time.

Key Takeaways from the Architecture Discussion

AI can dramatically accelerate architecture workflows by:

Automating research
Generating architecture options
Producing diagrams and documentation
Creating infrastructure templates.

However, AI does not replace architectural expertise.

Human architects remain responsible for:

Interpreting organizational constraints
Making trade-offs
Applying context and judgment.

The most effective approach is a partnership between AI capabilities and human architectural thinking.

Week 4 Consolidated Takeaways

From the first role play:
Moving AI agents into production introduces new requirements around scalability, security, governance, and operational visibility.

Services like Amazon Bedrock Agent Core provide modular capabilities that help address these production challenges.

From the second role play:
AI-assisted workflows can significantly accelerate architecture design, but the architect’s role in applying context, judgment, and stakeholder awareness remains critical.

This week highlighted the transition from building agents to operating them at scale and demonstrated how AI tools can also transform how architects design and deliver solutions.

BeSA Batch 09 Week3 - Building Agents with SDKs and Improving Discovery with AI

Girish Mukim — Sun, 08 Mar 2026 02:55:35 +0000

Week 3 - Building Agents with SDKs and Improving Discovery with AI
Batch 09 – BeSA Cloud Academy

Disclaimer:
These notes were drafted using AI for clarity, structure, and readability. They are intended solely for learning purposes.

These are the structured notes from Week 3, focused only on the two role plays. Writing this as a quick revision for those who attended the session and a concise recap for anyone who couldn’t make it.

Role Play 1 – Technical Session: Getting Started with Strands Agent

Context
This conversation focused on the practical challenges of building agents and how a standardized SDK approach can simplify development.

The customer started with a basic understanding of what is needed to build an agentic AI system.

Core Components Required for an Agent

To build an agent system, several components are required:

Infrastructure

Cloud or on-premise environment to run the workloads.

Foundation Model

Acts as the “brain” of the agent.

Supporting Services

Security
Memory for conversations
Observability
Orchestration

The solutions architect confirmed that this understanding is correct and forms the baseline for agent architectures.

Common Challenges When Building Agents

The discussion highlighted several practical challenges teams face:

Steep learning curve

Multiple frameworks
Different SDKs
Rapidly evolving ecosystem

Complex orchestration

Managing how agents call tools
Handling multi-step workflows

Black box behavior

Limited visibility into what the agent is doing
Hard to debug reasoning steps

Language and framework fragmentation

Switching between tools and languages increases complexity.

The main theme here was the need for standardization.

What is Strands Agent

Strands Agent was introduced as a way to simplify agent development.

Definition

An open-source SDK designed for building agents using minimal code.

Conceptually it combines:

Models (brain)
Tools (hands)

This allows developers to focus on agent behavior rather than infrastructure complexity.

Understanding SDK vs Framework

An important clarification was made around SDKs and frameworks.

SDK (Software Development Kit)

Collection of tools, libraries, and documentation.
Helps developers build applications faster.
Provides reusable building blocks.

Analogy used:

Like Lego pieces.
Instead of creating every component from scratch, you assemble existing blocks.

Framework

Defines architectural structure and rules.
Determines how components interact.

Analogy used:

Like a blueprint for a building.

Strands essentially provides both:

The framework structure
The SDK tools to implement it.

Why Use Strands

Key benefits mentioned:

Ease of use

Few lines of code to build agents.

Native AWS integrations

Works naturally with AWS services.

Model agnostic

Can work with different models such as Claude, OpenAI models, or Llama.

Rapid experimentation

Developers can iterate and deploy faster.

Agent Interaction Flow

The discussion explained how the components interact.

Agent

Acts as the orchestrator.

Prompt

User input that triggers the workflow.

Model

Performs reasoning.
Determines which tools are needed.

Tools

Execute actions such as API calls or sending emails.

Response

Final output returned to the user.

This cycle operates continuously in what was described as an agentic loop.

Typical workflow:

Prompt → Reason → Tool Selection → Tool Execution → Response

Working with Models

The example showed how developers can:

Import the agent
Specify the model (example: Claude 3.5 Sonnet)
Provide system instructions
Invoke the agent

Another interesting point discussed was running models locally using Ollama.

This allows developers to:

Experiment locally
Avoid cloud dependency during development
Prototype faster.

Tools in Strands

Tools were compared to tools used by craftsmen.

Just like a carpenter needs specific tools, an agent requires the right tools to perform tasks.

Two types of tools were mentioned:

Pre-built tools
Examples include:

HTTP request tools
Calculator tools

Custom tools
Developers can create tools using a simple decorator approach in Python.

Example concept:

Define a function
Add a tool decorator
The agent can now invoke it.

This enables developers to connect agents to internal APIs or services.

Model Context Protocol (MCP)

A key concept introduced was MCP.

Definition

An open standard for connecting AI systems to external tools and services.

Analogy used:

A USB hub.

Your laptop might have one port, but the hub allows connection to many devices.

Similarly, MCP allows agents to interact with multiple systems through a standardized interface.

Benefits include:

Reduced integration complexity
Consistent communication format
Easier expansion of agent capabilities.

Role Play 2 – Behavioral Session: Using AI to Accelerate Discovery

Context
This conversation explored how architects can use AI tools to prepare for customer engagements and accelerate discovery.

The scenario involved a solutions architect preparing for a new customer meeting with very little time.

Traditional Preparation vs AI-Assisted Preparation

Traditionally, preparing for a customer engagement could take several days.

Typical workflow:

Research the company
Understand industry trends
Identify likely technical challenges
Prepare discovery questions

Using AI changes this process significantly.

Instead of multiple days, preparation can be compressed into a few hours.

Initial Research with AI

The architect begins by briefing the AI with basic information about the customer:

Industry
Market size
Business trends
Competitive pressures

The AI then generates insights such as:

Regulatory environment (e.g., GDPR)
Industry modernization challenges
Technical considerations like latency sensitivity.

This provides a fast baseline understanding.

Adding Human Context

However, AI does not understand internal personalities or organizational dynamics.

This is where the TAM adds valuable insight.

Examples discussed included:

A cost-focused CTO

Strong requirement to reduce costs.

A risk-averse CISO

Concerned about customer data protection.

An engineering leader with a small team

Limited capacity to manage complexity.

This human context becomes critical.

Combining AI insights with relationship knowledge produces much better preparation.

Anticipating Objections

The architect then uses AI to anticipate objections.

Example approach:

Feed AI the stakeholder concerns.
Ask it to generate likely objections or concerns.

This allows the architect to prepare responses in advance rather than reacting in the meeting.

Generating a Discovery Framework

AI can also help generate a discovery framework.

This includes:

Business drivers
Technical risks
Modernization priorities
Operational constraints

However, these questions are often generic.

The architect must adapt them to the specific context of the customer.

Example:

Generic question

What are your modernization goals?

Contextual question

How is your small engineering team managing technical debt during modernization?

AI provides the structure, while the architect adds depth.

Using AI After Meetings

Another useful technique discussed was the “raw notes dump.”

After the meeting, the architect:

Pastes rough notes into the AI tool.
Asks it to identify:
- Explicit requirements
- Implicit concerns
- Risks
- Action items

The AI performs structured analysis on unstructured notes.

This helps convert messy meeting notes into organized documentation.

Producing Clean Documentation

The final step is creating clear documentation to share with the customer.

Examples include:

Requirements summaries
Key concerns identified
Architecture considerations
Next steps

This demonstrates that the architect is listening and thinking strategically.

Important Advice

One key warning from the conversation:

Do not walk into a customer meeting with a generic presentation.

Better approach:

Use AI to understand the customer’s world.
Combine that with the TAM’s relationship knowledge.
Tailor discussions to real concerns.

The winning combination is: AI research + human insight.

Week 3 Consolidated Takeaways

From the technical role play:

SDKs and frameworks can significantly reduce complexity when building agents.
Standardization helps address fragmentation in the agent ecosystem.
Tools and MCP enable agents to interact with external systems in a scalable way.

From the behavioral role play:

AI can dramatically accelerate discovery preparation.
Human context and relationships remain essential.
AI works best as a research and analysis assistant rather than a decision maker.

This week shifted the focus from foundational concepts and architecture to practical workflows—both for building agents and for improving how architects engage with customers during discovery.

BeSA Batch 09 Week2 - Building with Agentic AI

Girish Mukim — Sun, 01 Mar 2026 02:08:59 +0000

Week 2 – Building with Agentic AI
Batch 09 – BeSA Cloud Academy

Disclaimer:
These notes were drafted using AI for clarity, structure, and readability. They are intended solely for learning purposes.

These are the structured notes from Week 2, focused only on the two role plays. Writing this as a quick revision for those who attended the session and a concise recap for anyone who couldn’t make it.

Role Play 1 – Solution Architect and Customer (Insurance Claims Automation)

Context
This conversation was structured as a discovery session. The customer represents a SAS platform aiming to automate insurance claims processing using Agentic AI.

Business Problem

Current state:

Manual claims processing.
Takes days to weeks.
Error-prone due to:
- Reading bulky documents.
- Cross-referencing policy clauses.
- Performing deep reasoning to accept or reject claims.

Target state:

Faster processing.
Reduced human effort for low-risk claims.
Intelligent automation for complex scenarios.

Technical Requirements and Constraints

Latency

High-volume, low-risk claims should be processed in under 5 seconds.
Complex claims involving reasoning can tolerate higher latency.
Not all steps need the same performance profile.

Data Types

Unstructured documents.
Handwritten forms.
Images of vehicle damage.
The solution must combine text understanding and image analysis.

Integrations

S3 for document retrieval.
Policy databases for clause validation.
Third-party APIs for fraud detection.
Agent must orchestrate across multiple systems.

Team Constraints

Team has DevOps and backend engineers.
Upskilling in AI/ML.
Strong preference for managed services.
Avoid managing infrastructure.
Managed services like Amazon Bedrock are preferred.

Model Selection Framework

The Solution Architect proposed evaluating models across four dimensions:

Capability

Does the model support deep reasoning?
Can it handle multi-step logic?

Latency

Is it fast enough for high-volume workflows?

Cost

Is it sustainable at scale?

Features

Tool usage.
Structured output.
Multi-modal support.

Key insight:

Not every task requires a premium model.
Complex reasoning may need higher-tier models.
Simple extraction can use lighter, cheaper models.

Multi-Model Strategy

Instead of using a single expensive model for the entire workflow, a four-stage pipeline was proposed:

Document extraction
Image analysis
Policy reasoning
Decision generation

Each stage can use a different model optimized for that task.

Result:

Right model for the right task.
Cost reduction estimated at 60–70%.
Performance optimization without overpaying.

Resilience and Governance

Inference Profiles

Used to manage throttling.
Provide resilience across regions.

Application inference profiles

Useful for SAS providers.
Track cost per customer.
Ensure data residency compliance.

Data assurance

AWS does not use customer data to train or retrain models.
Important for compliance-sensitive industries like insurance.

Key architectural takeaway:
Agentic systems must be designed not only for intelligence but also for cost control, compliance, and operational resilience.

Role Play 2 – Enterprise Architect and Solutions Architect
The Evolving Role of the Architect in the Age of AI

This conversation explored how Generative AI impacts the role of Solutions Architects and whether AI reduces or transforms their value.

The Arithmetic vs Mathematics Analogy

Concern raised:

AI can generate complex architecture diagrams in seconds.
Does this make architects obsolete?

Response:

AI handles arithmetic (technical plausibility).
Architects handle mathematics (judgment and context).

Meaning:
AI can produce technically valid architectures.
But it cannot:

Understand politics.
Navigate budgets.
Handle human constraints.

The What vs The Why and How

AI provides:

The “what”.
A plausible list of services and patterns.

Architect provides:

The “why”.
The “how”.
Adjustments based on:
- Finance constraints.
- Organizational policies.
- Stakeholder expectations.

Example:
A technically optimal architecture may not work if:

The finance team requires fixed budgets.
The CTO prefers certain vendors.
Compliance constraints override design decisions.

Research vs Judgment Split

Tasks to hand off to AI:

Knowledge retrieval.
Service comparisons.
Summarizing long documentation (e.g., regulatory texts).
Pattern generation.
Code scaffolding.

Tasks to retain as an architect:

Managing stakeholder biases.
Handling organizational politics.
Making trade-offs (availability vs consistency).
Long-term strategic alignment.
Defending architectural decisions.

Three-Loop Workflow

Loop 1 – Discovery

Use AI as a research assistant.
Summarize meeting notes.
Identify gaps in requirements.

Loop 2 – Design

Use AI to generate architecture drafts.
Validate and refine designs.
Ensure feasibility and alignment with constraints.

Loop 3 – Delivery

Use AI to draft executive summaries.
Draft architecture decision records.
Refine documentation.

AI becomes:

A fast assistant.
Not the final decision maker.

Shift in Identity

Earlier:

Architect’s value was in memorized knowledge.
Service limits, certifications, patterns.

Now:

AI fills the knowledge moat.
Real moat is synthesis.

Synthesis means:

Combining technical capabilities.
Understanding human constraints.
Applying empathy.
Building coherent, real-world architectures.

Strategic Advice

Trust but verify.

AI can hallucinate.
AI can sound confident even when incorrect.

Treat AI like:

A fast summer intern.
Productive.
Needs direction.
Requires oversight before delivery.

Week 2 Consolidated Takeaways

From the first role play:

Agentic AI architecture must align with business latency, cost, and compliance constraints.
Multi-model pipelines are practical and cost-efficient.
Governance and inference profiles are critical in enterprise scenarios.

From the second role play:

AI enhances the architect role rather than replacing it.
Technical plausibility is not enough.
Judgment, synthesis, and empathy are long-term differentiators.
AI should be integrated into workflows across discovery, design, and delivery.

This week shifted focus from foundational concepts (Week 1) to real-world application and professional identity in the AI era.

Next week, I’ll continue documenting how these concepts evolve into deeper implementation and architectural patterns.

BeSA Batch 09 Week1 - Foundation of Agentic AI

Girish Mukim — Mon, 23 Feb 2026 23:30:50 +0000

Week 1 – Agentic AI Foundations
Batch 09 – BeSA Cloud Academy

Disclaimer:
These notes were drafted using AI for clarity, structure, and readability. They are intended solely for learning purposes.

These are the structured notes from Week 1. Writing this as a quick revision for those who attended the session and a concise recap for anyone who couldn’t make it.

(1) Why Agentic AI? (GenAI vs Agentic AI)

The core idea discussed was the difference between traditional Generative AI and Agentic AI.

Generative AI (LLMs) act as the brain:

Understand context
Generate responses
Provide suggestions or answers

However, they stop at reasoning. The execution is still done by the user. For example, an LLM can suggest travel options, but I still need to:

Book flights
Check hotel availability
Compare prices
Open multiple tabs and execute actions manually

This is where the limitation becomes clear.

Why simple scripts are not enough:

Scripts assume fixed flows.
Real-world situations are dynamic.
If a condition changes (flight delay, inventory change, pricing update), scripts often fail.
Complex workflows require adaptation and replanning.

Agentic AI introduces:

Reasoning
Planning
Replanning
Autonomous execution

Instead of only answering, the system can take actions and adjust in real time.

(2) What is an Agent?

Definition:
An agent is an entity that drives autonomous work on behalf of a user or business.

Important characteristics:

Has a goal
Can act independently
Uses tools
Can change its plan if needed

Analogy used: James Bond (007)

Gets a mission (trigger)
Assesses context
Uses tools (gadgets)
Adjusts plan when things change
Completes the mission autonomously

Evolution of the technology:

Generative Assistant

Basic automation
Examples: spelling suggestions, summarization

AI Agent

Executes a dynamic workflow
Focused on achieving a single goal

Agentic AI System

Multiple agents collaborating
Multi-task and multi-system execution
Comparable to coordinated teams (e.g., Avengers concept)

The progression is from static response → goal-driven execution → coordinated multi-agent systems.

(3) Anatomy of an Agent

Agents operate in a loop:

Think → Act → Evaluate/Reflect

This loop is important for understanding architecture.

The Brain (LLM)

Responsible for reasoning
Chooses which tool to use
Makes decisions based on context

Planning / Orchestration

Creates a path to the goal
Adjusts path when something changes
Enables replanning

Tools (Action Groups)

APIs
Lambda functions
Databases
External integrations

These are how the agent interacts with the outside world.

Memory

Maintains context across interactions
Prevents users from repeating information
Enables continuity in conversations

Judge LLM (in complex systems)

Acts as a reviewer
Provides checks and balances
Useful in multi-agent or high-risk workflows (e.g., supply chain scenarios)

The major takeaway:
An agent is not just an LLM. It is LLM + planning + tools + memory + control loop.

(4) Technical Implementation and Architecture (AWS Focus)

Amazon Bedrock

Managed service for building and deploying agents
Provides access to foundation models
Supports agent configuration

Agent Core

Handles runtime
Memory management
Observability capabilities

Instructions
Agents are guided using natural language instructions:

Define role (e.g., “You are a car parts assistant”)
Define guidelines
Specify what tools are available
Constrain behavior

This becomes the control layer for agent behavior.

Key architectural pillars discussed:

Security

Agents should be treated like human identities.
Assign restricted permissions.
Apply least privilege principles.
Each agent should have scoped access to resources.

Cost

Token usage directly impacts cost.
Complex reasoning and tool chaining increases token consumption.
Need to monitor and optimize prompt and reasoning flows.

Observability

Visibility into agent traces.
Ability to inspect reasoning steps.
Important for debugging and improving workflows.

Architectural thinking must include:

Identity and access management
Cost monitoring
Traceability of decisions

(5) Important Conceptual Takeaways

LLM alone is not an agent.
Agents require structured orchestration.
Real-world systems need replanning capability.
Multi-agent systems introduce coordination complexity.
Security and cost control are architectural responsibilities.
Observability is essential for production systems.

Mental model to retain:
Agent = Brain (LLM) + Tools + Memory + Planning + Control Loop

End of Week 1 Notes

This session focused on conceptual foundations and architectural clarity. Future weeks will likely build deeper into implementation patterns and advanced agent systems.

I’ll continue documenting each week in this series for structured revision.

Fixing AgentCore CLI Hangs on Windows: Step‑by‑Step with Python, AWS CLI, Strands agents and WSL

Girish Mukim — Mon, 19 Jan 2026 02:41:34 +0000

Introduction

While experimenting with Amazon Bedrock AgentCore on Windows, I ran into a frustrating issue:

agentcore configure would hang indefinitely, even with a minimal sample agent.

No errors. No logs. Just a frozen terminal.

This blog documents:

The exact symptoms
Why this happens on Windows
Why Python 3.13 breaks AgentCore
Why installing Python 3.11 is not enough
And the only reliable fix

If you’re using Strands + AgentCore on Windows, this will save you hours.

You can try running AgentCore directly on Windows and follow the troubleshooting steps I went through, but in my experience, it never worked reliably. If it doesn’t work for you either, don’t worry — WSL on Windows is the way to go for a stable setup.

Environment

OS: Windows 10 / 11
Python versions installed:
- ❌ Python 3.13
- ✅ Python 3.11.9
Tools:
- bedrock-agentcore-starter-toolkit
- strands
- strands-agents-tools
Shell: Command Prompt / PowerShell

The Symptom

Running the following command:

agentcore configure --entrypoint agentcore.py --region us-east-1

Results in:

No output
No error
CLI hangs forever

Never ending wait...

First Assumption: My Agent Code Is Wrong ❌

I simplified the agent to the absolute minimum:

from strands import Agent
from strands.models import MockModel

agent = Agent(
    model=MockModel(),
    system_prompt="test"
)

Still hangs.

➡️ This proves the issue is not agent logic.

Second Assumption: AWS Credentials or Bedrock ❌

I validated AWS access:

aws sts get-caller-identity --region us-east-1

✔ Works instantly.

➡️ Not an AWS or Bedrock permission issue.

Third Assumption:: Python Version (3.11) ❌

If you don't want to try Python version 3.11, go ahead to skip this section and move to Network & Connectivity Validation

Checking Python version:

python --version

Python 3.13.x

Checking where AgentCore-related packages are installed:

pip show strands-agents | findstr Location
pip show strands-agents-tools | findstr Location
pip show bedrock-agentcore-starter-toolkit | findstr Location

Output:

Location: C:\Users\AppData\Roaming\Python\Python313\site-packages

➡️ All AgentCore binaries are bound to Python 3.13

C:\Users\LearnAI>pip show strands-agents | findstr Python
Location: C:\Users\AppData\Roaming\Python\Python313\site-packages

C:\Users\LearnAI>pip show strands-agents-tools | findstr Python
Location: C:\Users\AppData\Roaming\Python\Python313\site-packages

C:\Users\LearnAI>pip show bedrock-agentcore-starter-toolkit | findstr Python
Location: C:\Users\AppData\Roaming\Python\Python313\site-packages

C:\Users\LearnAI>pip show bedrock-agentcore | findstr Python
Location: C:\Users\AppData\Roaming\Python\Python313\site-packages

C:\Users\LearnAI>

Installing Python 3.11 Is Necessary — But Not Sufficient

I installed Python 3.11.9:

🔗 https://www.python.org/downloads/release/python-3119/

But after installation:

python --version

Still showed:

Python 3.13.x

Why?

Because Windows PATH still prioritizes Python 3.13.

Attempt 1: Fix PATH Order (Partial Fix)

Steps:

Open Environment Variables
Edit System PATH
Move Python 3.11 paths above Python 3.13

Example:

C:\Python311\
C:\Python311\Scripts\
C:\Users\...\Python313\

Restart terminal.

Verify:

python --version

Python 3.11.9

Still Not Working? Here’s Why

Even after fixing PATH, pip-installed binaries remain tied to Python 3.13.

This is the key gotcha.

Example:

agentcore

This binary was installed using:

Python 3.13 pip

So it still launches under 3.13, regardless of PATH.

Let's remove Python 3.13

✅ Completely remove Python 3.13 tooling

1. Uninstall AgentCore and related packages

pip uninstall bedrock-agentcore-starter-toolkit strands-agents strands-agents-tools -y

Make sure this pip is from Python 3.13.

C:\Users\LearnAI>where pip
C:\Python313\Scripts\pip.exe
C:\Users\AppData\Roaming\Python\Python313\Scripts\pip.exe

C:\Users\LearnAI>pip uninstall bedrock-agentcore-starter-toolkit strands strands-agents-tools -y
Found existing installation: bedrock-agentcore-starter-toolkit 0.1.10
Uninstalling bedrock-agentcore-starter-toolkit-0.1.10:
  Successfully uninstalled bedrock-agentcore-starter-toolkit-0.1.10
WARNING: Skipping strands as it is not installed.
Found existing installation: strands-agents-tools 0.2.19
Uninstalling strands-agents-tools-0.2.19:
  Successfully uninstalled strands-agents-tools-0.2.19

C:\Users\LearnAI>pip uninstall strands-agents -y
Found existing installation: strands-agents 1.22.0
Uninstalling strands-agents-1.22.0:
  Successfully uninstalled strands-agents-1.22.0

C:\Users\LearnAI>

2. (Optional but recommended) Uninstall Python 3.13

From:

Windows “Add or Remove Programs”

This avoids future confusion.

3. Activate Python 3.11 explicitly

py -3.11 -m venv agentcore-env
agentcore-env\Scripts\activate

Verify:

python --version

Python 3.11.9

In my case, I removed path referencing Python313 from system environment variables and started a new command line terminal.

still observed that pip is pointing to Python313.

C:\Users\LearnAI>where pip
C:\Users\AppData\Roaming\Python\Python313\Scripts\pip.exe

C:\Users\LearnAI>python -m pip install --upgrade pip
Requirement already satisfied: pip in c:\users\appdata\local\programs\python\python311\lib\site-packages (24.0)
Collecting pip
  Using cached pip-25.3-py3-none-any.whl.metadata (4.7 kB)
Using cached pip-25.3-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.0
    Uninstalling pip-24.0:
      Successfully uninstalled pip-24.0
  WARNING: The scripts pip.exe, pip3.11.exe and pip3.exe are installed in 'C:\Users\AppData\Local\Programs\Python\Python311\Scripts' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed pip-25.3

C:\Users\LearnAI>

Added C:\Users\AppData\Local\Programs\Python\Python311\Scripts to PATH.
Also manually removed pip.exe from Python313 folder.

C:\Users\LearnAI>where pip
C:\Users\AppData\Local\Programs\Python\Python311\Scripts\pip.exe

C:\Users\LearnAI>

4. Reinstall everything under Python 3.11

pip install --upgrade pip
pip install boto3 bedrock-agentcore-starter-toolkit strands-agents strands-agents-tools

Verify:

pip show bedrock-agentcore-starter-toolkit | findstr Location

Expected:

Python311\site-packages

C:\Users\LearnAI>
C:\Users\LearnAI>pip show bedrock-agentcore-starter-toolkit | findstr Location
Location: C:\Users\AppData\Local\Programs\Python\Python311\Lib\site-packages

C:\Users\LearnAI>

Oops Still the same issue -

Despite this cleanup, the agentcore configure hang still persists, confirming this is not a Python-version issue.

Here’s how you can write the Python 3.11 point in the same style as your other assumptions:

python --version
# Python 3.11.x

✔ Python 3.11 is correctly set up and recognized.

➡️ Python version was not the root cause of the AgentCore issues.

Network & Connectivity Validation

To rule out firewall or endpoint connectivity issues, the following test was performed:

curl https://bedrock.us-east-1.amazonaws.com

Result:
<UnknownOperationException/>

Why this result is important

DNS resolution succeeded.
TLS handshake succeeded.
Outbound HTTPS (port 443) connectivity is confirmed.
The error is expected when calling the Bedrock control-plane endpoint without request signing.

The local machine can successfully reach AWS Bedrock endpoints. Firewall or port blocking is unlikely to be the root cause.

Since AgentCore continued to hang or behave unpredictably on Windows despite fixing Python and AWS CLI issues, I decided to rule out platform-related problems. To do this, I switched to WSL (Windows Subsystem for Linux), which provides a Linux environment on Windows, and tested AgentCore there. This approach helps isolate any Windows-specific limitations and ensures a cleaner, more consistent runtime for Python 3.11 and AWS CLI.

High-level approach

You will:

Open a Linux distro inside WSL (Ubuntu recommended)
Install Python 3.11 inside WSL
Install AgentCore + Strands inside WSL
Run agentcore configure from WSL
Compare behavior with Windows

Step-by-step: Running AgentCore in WSL

You can refer Microsoft’s official WSL installation documentation — step‑by‑step on installing WSL and a Linux distro (like Ubuntu) on Windows using the integrated installer:
👉 Install WSL

This doc explains how to enable WSL, install a Linux distribution (Ubuntu by default), and get up and running with WSL.

1️⃣ Confirm your WSL distro

From Windows command line:

C:\Users\LearnAI>wsl
gmukim@LearnAI:/mnt/c/Users/LearnAI$

From PowerShell:

wsl -l -v

You should see something like:

NAME      STATE   VERSION
Ubuntu    Running 2

If you don’t have Ubuntu, install it:

wsl --install -d Ubuntu

2️⃣ Open Ubuntu (WSL)

Run:

wsl

You are now inside Linux, not Windows. Start WSL from your working directory.

3️⃣ Install Python 3.11 in WSL (clean)

Inside Ubuntu:

sudo apt update
sudo apt install -y python3.11 python3.11-venv python3.11-dev

Verify:

python3.11 --version

4️⃣ Create a virtual environment (important)

python3.11 -m venv agentcore-env
source agentcore-env/bin/activate

You should see:

(agentcore-env)

gmukim@LearnAI:/mnt/c/GirishIBM/SkillUP/AWS/Certifications/AI Professional/LearnAI$ python3.11 -m venv agentcore-env

gmukim@LearnAI:/mnt/c/Users/LearnAI$
gmukim@LearnAI:/mnt/c/Users/LearnAI$ source agentcore-env/bin/activate
(agentcore-env) gmukim@LearnAI:/mnt/c/Users/LearnAI$

5️⃣ Install AgentCore + dependencies

pip install --upgrade pip
pip install bedrock-agentcore-starter-toolkit strands-agents strands-agents-tools

Verify installs:

pip show bedrock-agentcore
pip show strands-agents
pip show strands-agents-tools

(agentcore-env) gmukim@LearnAI:/mnt/c/GirishIBM/SkillUP/AWS/Certifications/AI Professional/LearnAI$ pip show bedrock-agentcore
Name: bedrock-agentcore
Version: 1.2.0
Summary: An SDK for using Bedrock AgentCore
Home-page: https://github.com/aws/bedrock-agentcore-sdk-python
Author:
Author-email: AWS <opensource@amazon.com>
License: Apache-2.0
Location: /mnt/c/GirishIBM/SkillUP/AWS/Certifications/AI Professional/LearnAI/agentcore-env/lib/python3.11/site-packages
Requires: boto3, botocore, pydantic, starlette, typing-extensions, urllib3, uvicorn, websockets
Required-by: bedrock-agentcore-starter-toolkit

(agentcore-env) gmukim@LearnAI:/mnt/c/Users/LearnAI$ pip show strands-agents
Name: strands-agents
Version: 1.22.0
Summary: A model-driven approach to building AI agents in just a few lines of code
Home-page: https://github.com/strands-agents/sdk-python
Author:
Author-email: AWS <opensource@amazon.com>
License: Apache-2.0
Location: /mnt/c/Users/LearnAI/agentcore-env/lib/python3.11/site-packages
Requires: boto3, botocore, docstring-parser, jsonschema, mcp, opentelemetry-api, opentelemetry-instrumentation-threading, opentelemetry-sdk, pydantic, typing-extensions, watchdog
Required-by: strands-agents-tools
(agentcore-env) gmukim@LearnAI:/mnt/c/Users/LearnAI$

(agentcore-env) gmukim@LearnAI:/mnt/c/Users/LearnAI$ pip show strands-agents-tools
Name: strands-agents-tools
Version: 0.2.19
Summary: A collection of specialized tools for Strands Agents
Home-page: https://github.com/strands-agents/tools
Author:
Author-email: AWS <opensource@amazon.com>
License: Apache-2.0
Location: /mnt/c/Users/LearnAI/agentcore-env/lib/python3.11/site-packages
Requires: aiohttp, aws-requests-auth, botocore, dill, markdownify, pillow, prompt-toolkit, pyjwt, requests, rich, slack-bolt, strands-agents, sympy, tenacity, typing-extensions, watchdog
Required-by:
(agentcore-env) gmukim@LearnAI:/mnt/c/Users/LearnAI$

7️⃣ Configure AWS credentials inside WSL

This is important — Windows AWS creds do not automatically carry over.

Inside WSL:

aws configure

If require, install AWS CLI.

8️⃣ Run AgentCore configure and Launch (the real test)

Now the moment of truth:

agentcore configure --entrypoint agentcore.py --region us-east-1 --verbose

agentcore launch

agentcore status
agentcore invoke '{"prompt": "What is the current time?"}'

✔ Completes successfully
✔ No hang
✔ AgentCore works as expected

Destroying AgentCore Resources [ Critical ]

After testing your agent, run:

agentcore destroy

This deletes the deployed agent and any remote build artifacts.

Do you want me to also add the one-line reminder message for learners?

Final Thoughts

This issue is subtle, silent, and extremely frustrating — especially because AgentCore fails by hanging instead of erroring.

Hopefully, this guide helps you avoid the same trap.

This journey showed me how environment issues can block AgentCore. I found that using WSL with Python 3.11 and a clean AWS CLI install provided a stable setup.

That’s it for this deep dive! Hopefully, this helps you avoid the pitfalls I ran into.

— Girish, signing off. See you in the next one!

How to Build a Scalable RAG-Based Chatbot on AWS?

Girish Mukim — Sun, 14 Dec 2025 17:32:05 +0000

This article is written in collaboration with Ajay Pokale, a Senior Architect at Cognizant.

TL;DR (Key Takeaways)

Retrieval-Augmented Generation (RAG) allows LLMs to answer questions using your private data.
Amazon Bedrock + S3 Vectors enable a fully serverless RAG implementation
Amazon Nova Lite provides fast, cost-efficient responses for real-time chatbots
The solution is scalable, secure, and ideal for schools, enterprises, and internal knowledge systems
No servers to manage, minimal cost, production-ready architecture

Introduction

Large Language Models (LLMs) are incredibly powerful. They can generate text, summarize content, and answer complex questions.
However, they have one critical limitation:

👉 They do not know your private or domain-specific data.

This makes them unreliable for scenarios involving:

Internal company policies
School rules and schedules
Proprietary documents
Frequently changing information

Retrieval-Augmented Generation (RAG) solves this problem by combining:

Information retrieval from your own data sources
Text generation using an LLM

The result is an AI assistant that produces accurate, grounded, and trustworthy answers.

In this guide, we will build a scalable, serverless RAG chatbot on AWS using Amazon Bedrock and modern AWS services.

Real-World Use Case: School Assistant Chatbot

Let’s consider a use case for primary schools: a chatbot designed to provide quick answers to everyday questions for parents, students, and staff.

Current Challenges

Parents call the school office for routine questions
Staff repeatedly answer the same queries
Information delivery is slow and inconsistent

Way Forward using chatbot

Parents ask the chatbot 24/7
Answers are retrieved directly from official school documents
Administrative workload is significantly reduced

Example Questions

When is the next school holiday?
What documents are required for admission?
What are the school lunch rules?

This makes RAG an ideal solution for education, HR, compliance, and internal knowledge systems.

Solution Architecture Overview

We use a fully serverless AWS architecture to achieve:

Automatic scaling
High availability
Low operational cost
Minimal infrastructure management

Key Components Explained:

Knowledge Base (Amazon S3 + S3 Vectors)

School documents are stored in Amazon S3
Amazon Bedrock converts documents into vector embeddings using the Titan Embeddings model.
These embeddings are stored in S3 Vectors for fast semantic search.

Why S3 Vectors?

Fully serverless (no cluster management)
Cost-effective for RAG workloads
Seamless upgrade path to Amazon OpenSearch for advanced search needs

Amazon Bedrock Agent (RAG Intelligence Layer)

The Bedrock Agent acts as the brain of the chatbot:

Embeds the user’s question
Searches the vectorized knowledge base
Retrieves relevant context
Generates grounded responses

This ensures accuracy, relevance, and traceability.

API Gateway and Lambda (Serverless Backend)

Amazon API Gateway exposes a secure HTTP endpoint
AWS Lambda invokes the Bedrock Agent
Event-driven execution keeps costs low, and scaling is automatic

Frontend Chatbot (Amazon S3 Static Website)

Chatbot UI hosted on Amazon S3
Lightweight and highly available
Optional Amazon CloudFront for HTTPS, caching, and WAF protection

Step-by-step implementation

Step 1: Create the Knowledge Base

Create an Amazon S3 bucket

Head to the AWS Management Console and create an S3 bucket for the documents.

Upload school documents (PDFs, text files, etc.)

Create a Knowledge Base using the S3 bucket. The above S3 bucket will serve as our data source for the knowledge base (managed RAG).

Head over to the Bedrock service page in the AWS Console. Look for the Build section in the left-hand navigation, and select Knowledge Base. From there, the console will guide you through the initial setup steps. We have included screenshots below to walk you through the entire process visually.

Verify that the vector bucket is created by navigating to the following location -

Step 2: Choose the LLM (Amazon Nova Lite)

This is arguably the most critical decision, as your LLM selection is a direct trade-off between operational cost and response quality tailored to your specific use case.

We selected Amazon Nova Lite because it provides:

Low latency
Cost-efficient inference
Strong performance for RAG use cases

It is ideal for real-time conversational workloads.

Amazon Bedrock offers a wide range of foundation models. You can explore the full catalog and its respective capabilities HERE.

Step 3: Configure the Bedrock Agent

With the Knowledge Base ready, the next logical step is to create the Bedrock Agent that will utilize it. You'll find Agents located under the Build section in the left panel. Simply click there and follow the console instructions to define your agent. For a quick visual walkthrough, reference the screenshots provided below.

Create an agent in Amazon Bedrock.

Select Nova Lite as the foundation model.

You can find “Instructions for the Agent” in the GitHub repository HERE.

Attach the Knowledge Base

💡 Tip: Save the agent before attaching the Knowledge Base to avoid configuration errors.

Once the knowledge is attached successfully to the Agent, click on the “Prepare” button as shown in the below screenshot.

Step 4: Backend Integration

Now that our Bedrock Agent is fully configured (which serves as our backend core), we need a public, scalable interface for the frontend chatbot application to securely interact with it.

This interface is a classic serverless pattern: using Amazon API Gateway as the secure HTTP endpoint and AWS Lambda as the compute layer to orchestrate the request. The Lambda function acts as the handler, taking the user's query from the chatbot frontend and passing it directly to the Bedrock Agent.

Create an AWS Lambda function.

The Lambda code is available on the GitHub repository HERE.

Grant bedrock: InvokeAgent permission

The Lambda Execution Role would need permission to invoke an agent.
Below IAM policy is available on the GitHub repository HERE.

Integrate Lambda with Amazon API Gateway

Check our YouTube video for a tutorial on creating an API Gateway and Lambda integration.

Step 5: Deploy the Chatbot Website

The final step is deploying the frontend interface - the actual chatbot widget on the school website - where users will interact with the agent. To achieve maximum availability, scalability, and cost-efficiency, we will host the static assets using Amazon S3 Static Website Hosting.

This approach ensures your chatbot widget is always available on the main school site. The process involves three simple steps:

Bucket Configuration: Create an S3 bucket and enable Static Website Hosting, configuring the appropriate Index and Error documents.
Asset Upload: Upload all HTML, CSS, JavaScript to this S3 bucket.
Access Control: Ensure the bucket policy grants public read access, allowing the website content to be served correctly.

Below files are available on the GitHub repository HERE.

Enable S3 static website hosting and have an appropriate bucket policy.

For a production environment, we highly recommend using Amazon CloudFront with the S3 bucket as its origin. This provides better security, lower latency via the edge network, and allows you to keep the S3 bucket fully private. However, for simplicity in this tutorial, we opted for direct S3 hosting.

Final Outcome

After all the configuration and code deployment, the final and most satisfying result is the fully operational School Assistant chatbot. Below, you can see the assistant handling a couple of real-world queries, demonstrating how it correctly retrieves and grounds answers using the Knowledge Base.

Meet the School Assistant

Uses verified, document-based knowledge
Delivers accurate, explainable answers
Scales automatically with traffic
Runs at minimal cost

This is production-ready RAG with serverless simplicity.

Cleanup

While the components we used (S3, Lambda, API Gateway) are very cost-effective and offer generous free tiers, the key components for cost management are the Amazon Bedrock Agent and its associated resources. If you are finished with your prototype, cleaning up is essential to prevent ongoing charges.

1.1 Delete the Bedrock Agent:

Navigate to the Amazon Bedrock console, find the Agents section, and delete the agent you created (School Assistant).

1.2 Delete the Knowledge Base:

In the Bedrock console, go to Knowledge bases. Delete the Knowledge Base you created.

1.3 Delete the S3 Bucket and Lambda Function:

S3 Buckets: You must first empty the S3 bucket used for both your Knowledge Base data and your static website before you can delete the bucket itself.
Lambda Function: Delete the Lambda function that served as your API handler.

1.4 Delete the API Gateway:

In the API Gateway console, delete the HTTP API that exposed your Lambda handler.

1.5 Review IAM Roles:

Finally, review the IAM Roles created for the Bedrock Agent and the Lambda function. While these generally incur no cost, deleting them is a good security practice to maintain the principle of least privilege.

Conclusion

We have successfully walked through the entire process of building a highly effective, cost-optimized, and scalable School Assistant powered by Amazon Bedrock. By combining the retrieval power of the Knowledge Base with the efficiency of the Amazon Nova Lite model and tying it all together with a serverless API layer, we have created a truly intelligent application. We encourage you to use the code repository we have provided to deploy this solution today.

We’ll continue building on this foundation, explore how the architecture can be extended to support additional real-world use cases, and keep sharing our knowledge along the way.

Bonus for readers preparing for Linux Foundation certifications

If you're exploring certifications in the cloud-native and open source space, the Linux Foundation offers training and exams across areas like Kubernetes, Linux, DevOps, and more.

You can browse their full course catalog here:
https://training.linuxfoundation.org/full-catalog/

If you plan to take one of their exams, you can use this code for 30% off: LNF30.
(Using the code may also support my writing at no extra cost to you.)

Strands Agents in Action: Making AI Practical and Tool-Aware

Girish Mukim — Mon, 01 Sep 2025 23:39:35 +0000

In today’s world, AI is no longer just about generating text. It’s about taking action, making decisions, and bridging the gap between knowledge and execution. While large language models like those on Amazon Bedrock excel at understanding and generating language, they are limited to providing text-based responses. Strands Agents represent the next frontier: intelligent agents that combine the reasoning power of LLMs with the ability to interact with tools, systems, and data.

By embedding tool-awareness directly into the agent, Strands Agents empower developers to build applications that not only respond to queries but also perform meaningful work, whether that’s performing calculations, reading and writing files, or managing time-sensitive operations. They are a paradigm shift in how AI can act as a true assistant in real-world workflows, blending insight with action.

Architecture Overview

The architecture of a Strands Agent can be visualized as a three-step workflow:

User Query – The process starts when a user sends a request or question to the agent.

Strands Agent SDK – The SDK receives the query and decides how to handle it. It determines whether the LLM alone can answer the query or if a tool should be invoked.

LLM and Tools – If a tool is required (e.g., a calculator or file editor), the SDK calls it and integrates its output with the model’s reasoning.

Result from SDK – Finally, the combined response is returned to the user, often including both model reasoning and tool results in a coherent answer.

This architecture allows developers to build intelligent agents that can reason, compute, and act on external data, making Strands Agents a powerful extension of Amazon Bedrock LLMs or any other LLMs for that matter.

Theory isn't enough

Understanding the concepts behind Strands Agents is important, but real mastery comes from getting your hands dirty. In this section, we’ll guide Windows users through installing the Strands Agent SDK and its tools, so you can start building intelligent, tool-enabled applications right away.

Step 1: Install the packages

pip install strands-agents strands-agents-tools boto3

strands-agents → core framework
strands-agents-tools → optional tools (calculator, time, etc.)
boto3 → needed to connect to Amazon Bedrock

You can check strands-agents and strands-agents-tools version.

pip show strands-agents

Name: strands-agents
Version: 1.4.0
Summary: A model-driven approach to building AI agents in just a few lines of code
Home-page: https://github.com/strands-agents/sdk-python
Author:
Author-email: AWS <opensource@amazon.com>
License: Apache-2.0
Location: C:\Users\AppData\Roaming\Python\Python313\site-packages
Requires: boto3, botocore, docstring-parser, mcp, opentelemetry-api, opentelemetry-instrumentation-threading, opentelemetry-sdk, pydantic, typing-extensions, watchdog
Required-by: strands-agents-tools

pip show strands-agents-tools

Name: strands-agents-tools
Version: 0.2.3
Summary: A collection of specialized tools for Strands Agents
Home-page: https://github.com/strands-agents/tools
Author:
Author-email: AWS <opensource@amazon.com>
License: Apache-2.0
Location: C:\Users\AppData\Roaming\Python\Python313\site-packages
Requires: aws-requests-auth, botocore, dill, markdownify, pillow, prompt-toolkit, pyjwt, readabilipy, rich, slack-bolt, strands-agents, sympy, tenacity, tzdata, watchdog
Required-by:

Step 2: Create a minimal agent

File: agent_basic.py

from strands import Agent
from strands.models import BedrockModel

# Connect to Amazon Bedrock model (Nova Lite here, but you can use Claude/Sonnet/etc.)
model = BedrockModel(model_id="amazon.nova-lite-v1:0", region_name="us-east-1")

# Create an agent with ONLY the model (no tools yet)
agent = Agent(model=model)

# Send a simple question
response = agent("Hello! Can you introduce yourself in one line?")
print(response)

Output

C:\Users\aws_strands_agent>agent_basic.py
Hello! I'm an AI designed to assist with a wide range of tasks, from answering questions to providing information and support.Hello! I'm an AI designed to assist with a wide range of tasks, from answering questions to providing information and support.

This is the most basic working Strands Agent:

It calls Amazon Bedrock Nova Lite.
Responds to your input.
No multi-agent or tools yet.

Step 3: Add a simple tool

With preliminary testing out of the way, let's look at how tools can be used using Strands agent. We are still sticking to a single agent. No multi-agent audacious exploration as yet.

*File: agent_with_tool.py *

from strands import Agent
from strands.models import BedrockModel
from strands_tools import calculator

# Create Bedrock model
model = BedrockModel(
    model_id="amazon.nova-lite-v1:0",
    region_name="us-east-1"
)

# Create agent with calculator tool
system_prompt = """You are a simple, dedicated calculator. Your sole purpose is to perform mathematical calculations.

Instructions:
1.  Identify: Check if the user's input contains a clear mathematical operation or request for a calculation.
2.  Use Tool: If the input is a calculation, you must use the provided `calculator` tool to solve it. Do not perform the calculation yourself.
3.  Respond Directly: Your output must be the input from user and final numerical result of the calculation. Do not add any extra text, explanations, or conversational filler.
4.  Handle Non-Calculations: If the user's input is not a request for a calculation, you must respond with a single, polite message stating that you can only perform mathematical operations.
5.  Be Concise: Your responses should be as brief and to the point as possible."""

agent = Agent(model=model, system_prompt=system_prompt, tools=[calculator])

# Run the agent with event trace to get detailed results
result = agent("What is 23 * 17?")

# The result is an AgentResult object with 'final_output' and 'events' attributes

print("\n\nFinal Answer:", result)

Output:

C:\Users\aws_strands_agent>agent_with_tool.py
<thinking>The user has requested a simple multiplication operation. I can use the calculator tool to perform this operation.</thinking>

Tool #1: calculator
23 * 17 = 391

Final Answer: 23 * 17 = 391

Keep Strands Agent Library up-to-date

Strands Agents are evolving rapidly, with new features, tools, and bug fixes released frequently. To take full advantage of the latest capabilities, it’s important to keep your installation up-to-date.

Before upgrading, it’s useful to preview what changes an update would bring.

pip install strands-agents --upgrade --dry-run

Requirement already satisfied: strands-agents in c:\users\appdata\roaming\python\python313\site-packages (1.4.0)
Collecting strands-agents
  Downloading strands_agents-1.6.0-py3-none-any.whl.metadata (12 kB)
--
--
Downloading strands_agents-1.6.0-py3-none-any.whl (186 kB)
Would install strands-agents-1.6.0

pip install strands-agents-tools --upgrade --dry-run

Requirement already satisfied: strands-agents-tools in c:\users\appdata\roaming\python\python313\site-packages (0.2.3)
Collecting strands-agents-tools
  Downloading strands_agents_tools-0.2.5-py3-none-any.whl.metadata (41 kB)
--
--
Downloading strands_agents_tools-0.2.5-py3-none-any.whl (277 kB)
Downloading aiohttp-3.12.15-cp313-cp313-win_amd64.whl (449 kB)
Downloading multidict-6.6.4-cp313-cp313-win_amd64.whl (45 kB)
Downloading yarl-1.20.1-cp313-cp313-win_amd64.whl (86 kB)
Downloading aiohappyeyeballs-2.6.1-py3-none-any.whl (15 kB)
Downloading aiosignal-1.4.0-py3-none-any.whl (7.5 kB)
Downloading frozenlist-1.7.0-cp313-cp313-win_amd64.whl (43 kB)
Downloading propcache-0.3.2-cp313-cp313-win_amd64.whl (40 kB)
Would install aiohappyeyeballs-2.6.1 aiohttp-3.12.15 aiosignal-1.4.0 frozenlist-1.7.0 multidict-6.6.4 propcache-0.3.2 strands-agents-tools-0.2.5 yarl-1.20.1

These commands will show the latest version pip would install without actually upgrading.

You can quickly verify your currently installed versions.

pip show strands-agents | findstr Version
pip show strands-agents-tools | findstr Version

C:\Users\aws_strands_agent>pip show strands-agents | findstr Version
Version: 1.4.0

C:\Users\aws_strands_agent>pip show strands-agents-tools | findstr Version
Version: 0.2.3

Now let's upgrade

pip install --upgrade strands-agents strands-agents-tools

Requirement already satisfied: strands-agents in c:\users\appdata\roaming\python\python313\site-packages (1.4.0)
Collecting strands-agents
  Using cached strands_agents-1.6.0-py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: strands-agents-tools in c:\users\appdata\roaming\python\python313\site-packages (0.2.3)
Collecting strands-agents-tools
  Using cached strands_agents_tools-0.2.5-py3-none-any.whl.metadata (41 kB)
--
--
Installing collected packages: propcache, multidict, frozenlist, aiohappyeyeballs, yarl, aiosignal, aiohttp, strands-agents, strands-agents-tools
  Attempting uninstall: strands-agents
    Found existing installation: strands-agents 1.4.0
    Uninstalling strands-agents-1.4.0:
      Successfully uninstalled strands-agents-1.4.0
  Attempting uninstall: strands-agents-tools
    Found existing installation: strands-agents-tools 0.2.3
    Uninstalling strands-agents-tools-0.2.3:
      Successfully uninstalled strands-agents-tools-0.2.3
Successfully installed aiohappyeyeballs-2.6.1 aiohttp-3.12.15 aiosignal-1.4.0 frozenlist-1.7.0 multidict-6.6.4 propcache-0.3.2 strands-agents-1.6.0 strands-agents-tools-0.2.5 yarl-1.20.1

Verify upgraded versions again -

C:\Users\aws_strands_agent>pip show strands-agents | findstr Version
Version: 1.6.0

C:\Users\aws_strands_agent>pip show strands-agents-tools | findstr Version
Version: 0.2.5

Keeping the SDK current ensures you have access to the latest tools, bug fixes, and improvements. It also guarantees that your scripts leveraging events, tool calls, or advanced integrations will work as intended, without running into errors due to outdated APIs.

Additional Resources

To help you get the most out of Strands Agents, there are several resources that provide documentation, examples, and best practices. One of the most comprehensive references is the official Strands Agents examples page:

Strands Agents Examples – Explore ready-to-run scripts, sample agent configurations, and tool usage examples.
https://strandsagents.com/latest/documentation/docs/examples/

Closing Thoughts

Strands Agents aren’t just about AI that talks—they’re about AI that acts. By combining LLM reasoning with built-in tools, these agents can perform tasks, fetch data, and make your workflows smarter.

Don’t just read about it - try the SDK, experiment with the calculator, file, or time tools, and see how quickly you can turn ideas into action. Strands Agents make AI interactive, practical, and ready for real-world use.

For learning AWS cloud and AI, follow me on YouTube

Solving Oracle Migration Challenges with Amazon RDS Custom

Girish Mukim — Wed, 23 Jul 2025 21:42:01 +0000

This blog is for cloud architects, DBAs, and developers who want to understand how to handle complex Oracle workloads miration to AWS. Whether you're exploring Amazon RDS or facing limitations with standard configurations, this post will help you see how Amazon RDS Custom can be a game-changer.

Migrating large Oracle databases to the AWS cloud can be complex. Even with a well-planned strategy, unexpected issues can appear, especially when dealing with legacy systems or custom configurations.

In this blog, I’ll share how we used Amazon RDS Custom to solve real-world Oracle migration challenges that standard RDS could not handle.

What is an Amazon RDS?

Amazon RDS (Relational Database Service) is a fully managed service that supports popular engines like Oracle, PostgreSQL, MySQL, and more. It helps you avoid the heavy lifting of managing infrastructure, backups, and patching.

Key benefits of Amazon RDS:

Automated backups and snapshots
Multi-AZ high availability
Easy scaling of compute and storage
Built-in monitoring and security

When RDS Isn’t Enough

In our case, we were migrating a 40 TB Oracle database. Everything was going smooth,until we hit a bug during the import process. The fix required applying one-off Oracle patch, which standard RDS does not support.

We also needed:

Guaranteed Restore Points for flashback (not supported in RDS)

Oracle DBAs often rely on Guaranteed Restore Points (GRPs) for performing safe database changes or upgrades. GRPs allow DBAs to quickly rewind the entire database to a known consistent state without restoring from backups, which can be time-consuming. For example, when applying major schema changes or testing potentially risky application deployments, a DBA can create a GRP before changes and roll back within minutes if something goes wrong.

Control over patching schedules

In my client’s environment, DBAs owned the patching process end to end. They did not apply every quarterly Oracle patch as soon as it was released—instead, patches were evaluated for business impact and scheduled according to internal change control policies. Some patches were deliberately deferred to avoid disrupting stable production workloads, while one-off patches—often critical or vendor-recommended—were applied on-demand when needed. This level of control ensured they could balance stability, compliance, and vendor requirements. Moving to Amazon RDS doesn't offer that flexibility and applying one-off patches is not an option.

Access to the OS for tuning and diagnostics

The application running on this Oracle database was a vendor-provided product, and while most of the time it operated smoothly, there were occasional situations where DBAs had to make changes at the operating system level. This could include adjusting kernel parameters, deploying vendor-recommended diagnostic tools, or analyzing OS logs to troubleshoot performance issues specific to the application’s behavior. These interventions were infrequent but critical when required to maintain stability and performance. In a fully managed service like Amazon RDS, direct OS access is not available, making such tuning and troubleshooting scenarios a challenge without alternative approaches.

That’s when we pivoted to Amazon RDS Custom.

What is Amazon RDS Custom?

Amazon RDS Custom is a managed database service that gives you more control over the database and operating system. It’s ideal for workloads that need custom configurations, patches, or legacy support. Amazon RDS custom supports Oracle and SQL server database engines as of writing this blog.

What makes RDS Custom different:

Access to the OS via SSH or Systems Manager
Ability to apply custom patches
Install agents or modify OS packages
Use advanced Oracle features like Flashback and Data Guard

RDS vs. RDS Custom – Key Differences

How to Set Up Amazon RDS Custom for Oracle?

Let's break it down into simplified steps:

1. Set Up the Environment

Create or reuse a KMS key

Amazon RDS Custom for Oracle requires encryption at rest, which uses AWS Key Management Service (KMS). You can either create a new KMS key or reuse an existing one. The KMS key must be customer-managed, and you must ensure the RDS Custom service has permissions to use it. This key encrypts database storage, automated backups, read replicas, and snapshots.

Extract CloudFormation templates for RDS Custom

RDS Custom deployments use CloudFormation templates to create and manage the necessary AWS resources in your account. These templates define the instance, networking, and supporting infrastructure. AWS provides downloadable templates so you can customize and deploy them according to your environment and compliance requirements.

Configure IAM role

RDS Custom for Oracle requires specific IAM roles that allow the service to manage the underlying EC2 instance, Systems Manager (SSM) sessions, and related automation workflows.

2. Create a Custom Engine Version (CEV)

Upload Oracle software to Amazon S3

Before you begin creating a custom engine version, you need to gather the required Oracle installation files and patches. Download the relevant Oracle database software patches and installation media from the Oracle Support portal, ensuring you have all necessary components for your desired version. Once downloaded, upload these files to an Amazon S3 bucket. During the CEV creation process, you will specify the path to this S3 bucket, allowing CEV creation workflow to access the required Oracle binaries and patches.

Create the CEV and launch the DB instance

With your Oracle software now in S3, navigate to the Amazon RDS Console and select the option to create a custom engine version. Point to the S3 location containing your Oracle installation files. The system will validate the files before proceeding. After the custom engine version is successfully created, you can use it as the base to launch a new RDS Custom for Oracle DB instance, giving you a tailored environment that meets your application and compliance needs.

3. Customize and Connect

Access the host via SSH or Session Manager

To access the host of an Amazon RDS Custom for Oracle instance, you can use SSH or AWS Systems Manager Session Manager. You can use Putty to connect using key pair as you would do for any other EC2 server.However Systems Manager lets you start a browser-based shell session without needing direct network connectivity or key management. Both methods give you direct operating system access for advanced troubleshooting, configuration, or installing custom tools—capabilities not available with standard RDS instances.

Connect your application to the DB endpoint

To connect your application to the Amazon RDS Custom for Oracle DB instance, use the provided DB endpoint as the hostname in your database connection string along with the appropriate port and database identifier. This enables your application to communicate directly with the managed database instance.

If you're a visual learner, I've created a video just for you, demonstrating how to provision RDS Custom for Oracle with clear, step-by-step instructions.

This flexibility offered by Amazon RDS custom for Oracle helped us complete the migration without compromising stability or supportability.

Before you decide, Few questions to ask yourself -

Do you require direct OS-level (root/SSH) access to the database server?
Do you need to apply custom Oracle patches, use specific Oracle features, or install third-party agents that require OS access?
Do your DBAs need precise control over patching, upgrades, or custom monitoring deployments?
Is your application highly customized and doesn't fit a fully managed database environment?

Final Thought

Amazon RDS is great for most use cases. But when you need more control—especially for Amazon RDS Custom gives you the flexibility of EC2 with the benefits of a managed service.

If you're planning a complex Oracle migration, consider RDS Custom early in your design. It might save you from a painful pivot later.

AWS Cost Estimate Tracker: Enhanced Insights with Generative AI

Girish Mukim — Sun, 15 Sep 2024 16:17:14 +0000

The AWS Pricing Calculator is an online tool provided by Amazon Web Services (AWS) that helps estimate the cost of using various AWS services. Whether you’re just starting a small project or planning a large-scale deployment, this tool gives you the ability to calculate monthly costs for a range of services, from simple EC2 instances to complex architectures using multiple AWS services.

While it's incredibly useful, the AWS Pricing Calculator has a few limitations:

Link Management: Every time you create or update an estimate, a unique link is generated. If you're working on multiple versions of the same project (like adding a production environment after initially estimating only non-production costs), you'll quickly end up with multiple links. Keeping track of these can become cumbersome.

Changing Prices: AWS service prices are not static. As AWS updates their pricing, estimates might change when you revisit a link. This can lead to confusion if you want to compare the old cost with the new prices.

Link Expiration: AWS pricing calculator links are active for up to three years. After that, you lose access to the data unless you’ve saved a copy in another format (CSV, PDF, or JSON).

AWS Cost Estimate Tracker (ACET)

Let's solve the problem of managing and tracking multiple pricing calculator links for different scenarios. This tool will help you stay organized by keeping all those links, and PDFs in one place so that you don’t have to worry about losing track of any estimate, even as you work on different projects or adjust your pricing estimates over time.

The idea is to provide a simple, efficient way to store and access past estimates, track changes, and even compare how service prices have evolved since the estimate was originally made. I'm sharing this concept as a way to help others who might be facing the same challenges.
No conversation is complete without generative AI these days, and APCT is keeping up with the trend. My goal is to leverage AWS Bedrock to analyze estimates and deliver actionable insights and recommendations.

Features of AWS Cost Estimate Tracker (ACET)

User Management
Users will be able to sign up and manage their accounts securely using Amazon Cognito User Pools. Once logged in, you can access your personalized dashboard, where you can view and manage all the AWS Pricing Calculator links you've tracked, along with the associated information.

Track Links with Project and Version Information
Keep a detailed record of your AWS Pricing Calculator links organized by project. You can add descriptive information for each link, including version details (e.g., initial estimate, added production environment, etc.), making it easier to track different pricing scenarios over time.

Upload and Download Files
For each pricing estimate, you’ll have the ability to upload and download PDF files. This ensures you maintain a snapshot of the original pricing data even if AWS service prices change. You can reference these files later for comparison or documentation purposes, preserving the historical cost estimates for your projects.

GenAI Analysis
GenAI Analysis is another feature of the application that uses generative AI to analyze AWS estimates and provide actionable insights and recommendations. It simplifies understanding complex cost structures and helps identify optimization strategies efficiently.

Solution Architecture

Building Blocks:

Webpage Hosted on S3 with Static Web Hosting Enabled
The front-end of the application is hosted on an S3 bucket with static website hosting enabled. This allows for a cost-effective and scalable way to serve the application’s HTML, CSS, JavaScript, and other static assets directly from S3.

CloudFront Distribution
To improve the performance and security of the web application, we are using Amazon CloudFront as a content delivery network (CDN). CloudFront distributes the content globally, reducing latency for users and ensuring fast load times by caching assets closer to the user's location. The origin for the CloudFront distribution is the S3 bucket where the static website is hosted.

User Management with Amazon Cognito User Pool
User authentication and management are handled through Amazon Cognito User Pools. Users can sign up, log in, and manage their accounts securely. The user pool also enables features such as multi-factor authentication (MFA), account recovery, and password management, ensuring robust security for user information. Once logged in, users can access their personalized dashboard to track and manage AWS Pricing Calculator links.

DynamoDB as a Data Store
All data related to AWS Pricing Calculator links, project versions, and associated files (CSV, PDF, JSON) are stored in an Amazon DynamoDB table. This NoSQL database provides fast, scalable, and reliable storage. Each user can store project-related data, including:

AWS Pricing Calculator links
Descriptions and versioning information
References to uploaded files (PDF)

DynamoDB ensures fast read/write operations and can scale automatically to handle increasing amounts of data.

Amazon Textract
Amazon Textract is utilized in this solution to extract structured text and data from uploaded PDF documents containing project estimates. It efficiently processes the content, identifying text lines, tables, and key-value pairs to extract meaningful information. This enables seamless downstream analysis.

Amazon Bedrock
In this solution, Amazon Bedrock is leveraged to analyze extracted project estimates and generate actionable insights and cost optimization recommendations. Using its advanced large language models (LLMs), Bedrock processes the structured data to identify patterns, inefficiencies, and potential savings.

This architecture provides a fully serverless solution with high performance, scalability, and security, making it a solid foundation for the prototype as it continues to evolve.

How to use this application?

Have your AWS projects estimates completed using AWS Pricing calculator.
Get the sharable link and download estimates in a PDF format.
SignUp or SignIn into the application.
Complete a simple form to add your project estimates on right section; estimates will be rendered at runtime on the left section. Any changes for the same project will add a separate entry as a new version.
Download PDF files or go to calculator link from the estimates.
Click on an Anazlyze button for the specific estimate to generate Generative AI analysis.

You would need to setup few resources manually and then refer this
Github Repo.

S3 Bucket with static web hosting. Use code from repo.
Cloudfront distribution with above S3 bucket as origin
Amazon Cognito user pool and identity pool
Dynamodb tables awscalcculatorhistorytracker - To track of calculator links and estimate files. Partition key: email (String) & Sort key: project_name_version (String) EstimatesAnalysis - To save GenAI analysis for future use. Partition key: project_version (String)
Lambda Function - code is available in repo AnalyzeEstimate.py
API gateway to expose the Lambda function as HTTP API.

I plan to work on an IaC Terraform template to create these resources in the future, but I’m not there yet. I’d be thrilled to collaborate with Terraform experts who are interested!

Watch this video to see the tool in action. -

Optimizing Oracle Database Migration to Amazon RDS with EFS Integration

Girish Mukim — Mon, 29 Jul 2024 01:01:12 +0000

The idea for this blog came from my recent experience with migrating an Oracle database from on-premises to Amazon RDS for Oracle. This involved moving a 30TB database using the Oracle native datapump utility. The goal was to conduct a proof-of-concept (POC) to see if Amazon RDS could be a viable solution and to ensure the application performance met expectations.

The POC results were encouraging, but migrating such a large database using datapump with S3 integration was problematic. The process involved moving the dump file to an S3 bucket, enabling S3_INTEGRATION on the RDS database instance, transferring the files from S3 to RDS storage, and then importing from there. The backup dump files needed around 16TB of storage, which became part of the Oracle RDS and couldn't be reclaimed after deleting the dump file. This leads to significant costs since storage is not cheap. Refer Amazon S3 integration.

A better solution is to use EFS_INTEGRATION. The storage with EFS is external to the database, but you can still import from the EFS mount. Although EFS is costlier than S3, it avoids consuming database storage for a long time.

Let's consider the cost implications -

The message is clear: RDS storage is too expensive to use solely for migration and leave it there. I'm showing the cost for one month of storage, but it could take longer before the storage can be released or used. For example, 12 months of RDS storage for 16TB would cost $45,219.84 ($3,768.32 per month).

While migration performance is important, the high cost makes a strong case for using EFS. The performance aspect can be discussed next time.

Let's now look at what EFS Integration would look like and how to perform export and import using data pump utilities.

Pre-requisite for this tutorial -

Amazon RDS for Oracle database instance
Elastic File System (EFS) target mount
EC2 instance with Oracle database client installed

Please reach out to me if you require help with the above setup. I'll include configurations for each but won't delve into details on how to set them up.

*Amazon RDS for Oracle *

You can choose to create an EC2 server for RDS connectivity. This setup will include security groups to enable connectivity from the EC2 server to the RDS database on port 1521. We'll use this server as the Oracle database client.

*Elastic File System *

I am using us-east-1a and us-east-1b for all resources to keep traffic within the same Availability Zones, avoiding inter-AZ data transfer costs.

*EC2 instance with Oracle database client installed *

Connect to the RDS instance from EC2 server (DB client)

You'll have to install Oracle client. The easiest way is to download client home from Oracle website (https://www.oracle.com/database/technologies/oracle19c-linux-downloads.html) and unzip.



[oracle@ip-10-0-12-223 ~]$ cd $ORACLE_HOME/network/admin/
[oracle@ip-10-0-12-223 admin]$ cat tnsnames.ora
demodb = (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST= demodb.ceb9h7hgowbh.us-east-1.rds.amazonaws.com)(PORT=1521))(CONNECT_DATA=(SID= demodb)))
[oracle@ip-10-0-12-223 admin]$

[oracle@ip-10-0-12-223 admin]$ sqlplus admin@demodb

SQL*Plus: Release 19.0.0.0.0 - Production on Sat Jul 27 02:40:19 2024
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Enter password:
Last Successful login time: Fri Jul 26 2024 00:35:18 +00:00

Connected to:
Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production
Version 19.23.0.0.0

SQL> select name,open_mode from v$database;

NAME      OPEN_MODE
--------- --------------------
DEMODB    READ WRITE

SQL>

EFS Integration & using datapump export-import

1. First, create a new option group and add EFS_INTEGRATION.

Note that you would need EFS ID to add EFS Integration. Also USE_IAM_ROLE is set to TRUE, so IAM role should be associated with RDS instance and should have access to EFS mount.

2. Modify RDS instance to use newly created option group.

Click on Modify.

Click on continue. Choose "Apply Immediately" and modify DB instance.

3. Create IAM role and add permissions for EFS.

Role Name: efs-integ-role-for-rds

Review and create role.

4. Add this role to RDS DB instance for Feature EFS_INTEGRATION.

5. Mount EFS on EC2 server (DB client).

Get mount command from EFS

connect to server as root



mkdir /efsdir
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport 10.0.5.27:/ /efsdir

cd /efsdir
mkdir datapump
chown oracle /efsdir/datapump
chgrp oinstall /efsdir/datapump
chown 777 -R /efsdir/datapump

Entry can be added to /etc/fstab to have EFS mouted across reboot.

6. Next you must create an Oracle directory on Amazon RDS for Oracle.

These commands are run by admin user, other users may need proper privileges to run these commands. Note that file system path must begin with /rdsefs-.

BEGIN
rdsadmin.rdsadmin_util.create_directory_efs(
p_directory_name => 'DATA_PUMP_DIR_EFS',
p_path_on_efs => '/rdsefs-fs-0965c2a7d95fc9e06/datapump');
END;
/

fs-0965c2a7d95fc9e06 is EFS ID.

7. Verify that the database can write a file.



SQL> declare
  f utl_file.file_type;
begin
  f := utl_file.fopen ('DATA_PUMP_DIR_EFS', 'test.txt', 'w');
  utl_file.put_line(f, 'test');
  utl_file.fclose(f);
end;
/
  2    3    4    5    6    7    8
PL/SQL procedure successfully completed.

SQL> !ls -l /efsdir/datapump/test.txt
-rw-r--r--. 1 3001 101 5 Jul 28 23:06 /efsdir/datapump/test.txt

SQL>

8. Create a test table, admin.demotable, to perform Data Pump export and import steps.



[oracle@ip-10-0-12-223 ~]$ sqlplus admin@demodb

SQL*Plus: Release 19.0.0.0.0 - Production on Sun Jul 28 22:56:51 2024
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Enter password:
Last Successful login time: Sun Jul 28 2024 21:44:33 +00:00

Connected to:
Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production
Version 19.23.0.0.0

SQL> create table demotable (a number);

Table created.

SQL> insert into demotable values (1);

1 row created.

SQL> c/1/2
  1* insert into demotable values (2)
SQL> /

1 row created.

SQL> commit;

Commit complete.

SQL> select * from demotable;

         A
----------
         1
         2

SQL>

9. Export the table to EFS mount using the directory created earlier.

Run this as oracle OS user.



[oracle@ip-10-0-12-223 ~]$ expdp admin@demodb tables=admin.demotable directory=DATA_PUMP_DIR_EFS dumpfile=demotable.dmp logfile=expdp_demotable.log

Export: Release 19.0.0.0.0 - Production on Sun Jul 28 23:08:29 2024
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.
Password:

Connected to: Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production
Starting "ADMIN"."SYS_EXPORT_TABLE_01":  admin/********@demodb tables=admin.demotable directory=DATA_PUMP_DIR_EFS dumpfile=demotable.dmp logfile=expdp_demotable.log
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
Processing object type TABLE_EXPORT/TABLE/TABLE
. . exported "ADMIN"."DEMOTABLE"                         5.062 KB       2 rows
Master table "ADMIN"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for ADMIN.SYS_EXPORT_TABLE_01 is:
  /rdsefs-fs-0965c2a7d95fc9e06/datapump/demotable.dmp
Job "ADMIN"."SYS_EXPORT_TABLE_01" successfully completed at Sun Jul 28 23:08:56 2024 elapsed 0 00:00:19

[oracle@ip-10-0-12-223 ~]$

That's great. we can use normal expdp utility that DBAs are quite comfirtable with.

10. Let's take it a step further and perform the import.

Drop table admin.demotable.



SQL> drop table admin.demotable;

Table dropped.

SQL> select * from admin.demotable;
select * from admin.demotable
                    *
ERROR at line 1:
ORA-00942: table or view does not exist

SQL>

Import table admin.demotable



[oracle@ip-10-0-12-223 ~]$ impdp admin@demodb tables=admin.demotable directory=DATA_PUMP_DIR_EFS dumpfile=demotable.dmp logfile=impdp_demotable.log

Import: Release 19.0.0.0.0 - Production on Sun Jul 28 23:13:28 2024
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.
Password:

Connected to: Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production
Master table "ADMIN"."SYS_IMPORT_TABLE_01" successfully loaded/unloaded
Starting "ADMIN"."SYS_IMPORT_TABLE_01":  admin/********@demodb tables=admin.demotable directory=DATA_PUMP_DIR_EFS dumpfile=demotable.dmp logfile=impdp_demotable.log
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "ADMIN"."DEMOTABLE"                         5.062 KB       2 rows
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
Job "ADMIN"."SYS_IMPORT_TABLE_01" successfully completed at Sun Jul 28 23:14:08 2024 elapsed 0 00:00:32

[oracle@ip-10-0-12-223 ~]$

[oracle@ip-10-0-12-223 ~]$ sqlplus admin@demodb

SQL*Plus: Release 19.0.0.0.0 - Production on Sun Jul 28 23:33:16 2024
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Enter password:
Last Successful login time: Sun Jul 28 2024 23:32:56 +00:00

Connected to:
Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production
Version 19.23.0.0.0

SQL> select * from admin.demotable;

         A
----------
         1
         2

SQL>

Reference documents -

Amazon EFS integration

Integrate Amazon RDS for Oracle with Amazon EFS