DEV Community: Volodymyr Marynychev

Building Persistent Memory for Kiro with Bash Hooks

Volodymyr Marynychev — Sat, 31 Jan 2026 22:00:01 +0000

Making Kiro learn from every session

The Problem

A few months into using Kiro and Kiro CLI daily, I wanted to review my week.

What problems did I solve? What approaches worked? What did I learn?

Kiro has conversation persistence: you can resume chats, save and load sessions. That's useful. But it's not what I was looking for.

I didn't want to scroll through old conversations. I wanted the insights extracted. The patterns. The solutions that worked and why.

Tuesday's debugging session taught me something about Terraform state locks. But that lesson was buried in a 200-message conversation. Next time I hit the same issue, would I remember to search for it? Would I even remember it happened?

Kiro gives you great infrastructure: agents, hooks, steering files. But no system for capturing what you learn and surfacing it when relevant.

I started asking: What if I built that layer on top of what Kiro already provides?

The Idea

I found Daniel Miessler's PAI project - Personal AI Infrastructure for Claude. It clicked.

The idea: an AI assistant that learns from your work and applies that knowledge to future tasks.

What if I could build something similar for Kiro? A layer that:

Captures learnings as they happen
Knows my projects and priorities
Follows a consistent problem-solving approach

Not a smarter model. A system that learns.

Kiro already has the building blocks: hooks that run at key moments, steering files for persistent context, agent configurations. I just needed to wire them together.

What I Built

PILOT is an agent for Kiro that adds three things:

1. The Algorithm

Every task follows seven phases:

OBSERVE → THINK → PLAN → BUILD → EXECUTE → VERIFY → LEARN

Not revolutionary. It's how experienced engineers already work. But making it explicit means the AI follows it consistently.

The key insight: Define success criteria before executing. Most people skip this. They try something, vaguely check if it worked, move on.

2. Memory

PILOT captures solutions, but only after verification confirms they work.

~/.pilot/learnings/
├── 2026-01-15_terraform-state-lock-fix.md
├── 2026-01-14_lambda-timeout-optimization.md
└── 2026-01-12_git-merge-conflict-pattern.md

Next time you hit a similar problem, PILOT surfaces the past solution.

For semantic search over learnings, PILOT uses Kiro's /knowledge feature. Without it, PILOT still works but uses keyword matching instead.

Unverified solutions are guesses. Verified solutions are knowledge.
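To make the verified-only rule concrete, here is a minimal Python sketch of the capture step. PILOT itself is written in bash, so this illustrates the idea rather than its actual code. The names `save_learning` and `slugify` and the note layout are assumptions; only the `date_slug.md` naming comes from the listing above.

```python
import re
from datetime import date
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def save_learning(root, title, body, verified, today=None):
    """Write a learning note only if the solution was verified.

    Returns the path of the new note, or None when nothing was saved.
    """
    if not verified:
        return None  # unverified solutions are guesses, so skip them
    today = today or date.today()
    path = Path(root) / f"{today.isoformat()}_{slugify(title)}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"# {title}\n\n{body}\n", encoding="utf-8")
    return path
```

The gate sits at the top of the function on purpose: nothing touches the disk until verification has passed.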

3. Identity

Optional markdown files that give PILOT context about you:

~/.pilot/identity/
├── MISSION.md       # What you're building
├── GOALS.md         # Current objectives
├── PROJECTS.md      # Active work
└── STRATEGIES.md    # Approaches that work for you

You don't have to fill these manually. PILOT observes your work and gradually populates them. The more you use it, the more context it captures: projects you work on, challenges you face, strategies that work for you.

Without identity: "Here's how to fix this Lambda timeout"

With identity: "Given your focus on cost optimization, consider this approach you used last month..."

How It Works

PILOT uses four Kiro features: hooks, resources, agents, and the experimental knowledge base.

Resources load context into the agent. Kiro supports three types:

file:// resources load directly at startup (identity files like MISSION.md, GOALS.md)
skill:// resources load on-demand when relevant (the algorithm, principles)
knowledgeBase resources enable semantic search over indexed content

PILOT indexes your learnings folder as a knowledge base. When you ask about something, it can find related past solutions even if the keywords don't match exactly.

This keeps context lean. The agent sees your mission and goals immediately, but only loads the full algorithm documentation when it needs to reference it.
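When the knowledge base is not enabled, the fallback mentioned earlier is keyword matching. A rough Python sketch of that kind of lookup follows. It is an illustration only, not PILOT's bash implementation, and `tokenize` and `search_learnings` are hypothetical names.

```python
import re
from pathlib import Path


def tokenize(text: str) -> set:
    """Split text into a set of lowercase words of three or more characters."""
    return set(re.findall(r"[a-z0-9]{3,}", text.lower()))


def search_learnings(root, query, limit=3):
    """Rank learning notes by how many query words they share."""
    wanted = tokenize(query)
    scored = []
    for path in sorted(Path(root).glob("*.md")):
        words = tokenize(path.stem) | tokenize(path.read_text(encoding="utf-8"))
        overlap = len(wanted & words)
        if overlap:
            scored.append((overlap, path.name))
    # Highest overlap first; ties fall back to reverse filename order,
    # which puts the most recent date prefix first
    scored.sort(reverse=True)
    return [name for _, name in scored[:limit]]
```

Exact word overlap is the weakness here: a note about "state locks" will not match a query about "concurrent apply errors", which is the gap semantic search closes.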

Hooks run scripts at key moments. Kiro provides the trigger points, PILOT provides the scripts:

┌─────────────────────────────────────────────────────────────┐
│                      Kiro Session                           │
│                                                             │
│   Kiro Event              PILOT Script                      │
│   ──────────              ────────────                      │
│                                                             │
│   agentSpawn        →     agent-spawn.sh                    │
│                           Load identity, past learnings     │
│                                                             │
│   userPromptSubmit  →     user-prompt-submit.sh             │
│                           Search relevant patterns          │
│                                                             │
│   preToolUse        →     pre-tool-use.sh                   │
│                           Validate before execution         │
│                                                             │
│   [Tool executes]                                           │
│                                                             │
│   postToolUse       →     post-tool-use.sh                  │
│                           Capture results                   │
│                                                             │
│   stop              →     stop.sh                           │
│                           Archive session, save learnings   │
│                                                             │
└─────────────────────────────────────────────────────────────┘

 Kiro provides:          PILOT provides:
 • Hook trigger points   • Scripts that run at each hook
 • Agent system          • Learning capture logic
 • Steering files        • Memory management
                         • Identity context

Agents tie it together. The pilot.json configuration defines:

Which hooks to run and when
Which resources to load
The system prompt with the algorithm phases
Tool permissions and safety rules

Hooks must exit 0. A crashed hook breaks the session.
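The exit-0 rule can be enforced with a small wrapper around the hook logic. The sketch below shows the idea in Python, although PILOT's own hooks are shell scripts. It assumes the event arrives as JSON text, and the `prompt` field and `on_prompt` handler are hypothetical.

```python
import json
import sys
import traceback


def run_hook(handler, raw_event: str) -> int:
    """Run a hook handler and always report success to the caller.

    Any failure is logged to stderr instead of propagating, so a bug in
    the hook cannot take the session down with it.
    """
    try:
        event = json.loads(raw_event) if raw_event.strip() else {}
        output = handler(event)
        if output:
            print(output)  # stdout is what the hook hands back
    except Exception:
        traceback.print_exc(file=sys.stderr)
    return 0


def on_prompt(event):
    """Hypothetical handler: echo the prompt field if one is present."""
    return f"prompt seen: {event.get('prompt', '')}"
```

A script built this way would end with `sys.exit(run_hook(on_prompt, sys.stdin.read()))`, so even malformed input still produces exit code 0.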

It's all shell scripts. No TypeScript, no build tools, no dependencies.

Why bash? Three reasons:

No runtime startup cost
Works on macOS and Linux out of the box
Plain text scripts - open any file and see exactly what it does

What Changed

Persistent learning compounds.

Early on, the benefit is small: a few captured solutions, some context loaded at startup. But each session adds to the knowledge base. Each verified fix becomes available for next time.

After a few weeks, the difference becomes noticeable. Not because any single feature is transformative, but because the system remembers what you've already figured out.

The Tradeoffs

What works well:

Learning capture is automatic
Past solutions surface when relevant
The algorithm keeps work structured
Bash is fast and debuggable

What's limited:

Each project has isolated memory (no cross-project patterns yet)
Learning detection is keyword-based, not ML
Semantic search requires enabling Kiro's /knowledge feature

What's missing:

Team collaboration (your learnings stay yours)
Visual dashboard
Cross-project pattern sharing

This is an MVP. Try it and see if it fits your workflow.

Try It

git clone https://github.com/requix/pilot
cd pilot
./install.sh

For semantic search over learnings (optional but recommended):

kiro-cli settings chat.enableKnowledge true

Then: kiro-cli --agent pilot

The code is simple. The impact compounds over time.

Questions? Open an issue. The system improves through use.

Why an AWS Architect Built Azure Powers for Kiro (And What I Learned)

Volodymyr Marynychev — Mon, 26 Jan 2026 20:15:38 +0000

How I used Kiro powers to bridge my cloud platform knowledge gap

Quick Start: Just Try It

Want to skip the story and experiment with the powers?

Update: all three powers are now available in the Kiro Community Library.

You can install them directly from the library, or follow the step below to install them manually from the GitHub repository.

Open Kiro IDE → Powers panel → Add power from GitHub → Enter the URL for the power you want:

Power	URL
azure-architect	`https://github.com/requix/azure-kiro-powers/tree/main/azure-architect`
azure-operations	`https://github.com/requix/azure-kiro-powers/tree/main/azure-operations`
azure-monitoring	`https://github.com/requix/azure-kiro-powers/tree/main/azure-monitoring`

Click Install.

⚠️ Before installing: These are third-party powers, not official Kiro or Microsoft tools. Review the repository and code before installing.

No authentication needed for azure-architect. Start with: "What are the best practices for Azure storage account security?"

Have fun.

If you're interested in building Kiro powers yourself, the rest of this post walks through the development process, design decisions, and what I learned.

The Context

You never know what the next project brings you.

I'm an AWS cloud architect. Have been for years. Then I joined a project running entirely on Azure. Different services, different naming conventions, same deadline pressure.

Here's the irony: I reached for Kiro - an AWS IDE - to help me learn Azure. An AWS architect using an AWS tool to work with Microsoft's cloud. The cloud world is strange sometimes.

But it made sense. I didn't have months to follow the classic learning path - certifications, documentation deep-dives, sandbox experiments. I needed to ship. I needed Azure knowledge in context, at the moment of need, without constantly switching between documentation tabs and my development environment.

So I built first, learned along the way. Three Kiro powers for Azure. Each design choice taught me how Azure actually works.

What Kiro Powers Are

Powers are Kiro's extension system. They solve two problems that traditional MCP setups create:

Without framework context, agents guess. Your agent can call Azure APIs, but does it know the right patterns? Without built-in expertise, you're both manually reading documentation and refining approaches until the output is right.

With too much context, agents slow down. Connect five MCP servers and your agent loads 100+ tool definitions before writing a single line of code. Five servers might consume 50,000+ tokens - 40% of your context window - before your first prompt. More tools should mean better results, but unstructured context overwhelms the agent.

Powers fix this through dynamic loading. Instead of loading all MCP tools at once, powers activate based on keywords in your conversation. Mention "Azure architecture" and the azure-architect power loads. Switch to deployment topics and azure-operations activates.
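Keyword activation can be pictured as a simple lookup. This post does not describe Kiro's actual matching logic, so treat the following Python as a toy model: the keyword sets are illustrative, not the real frontmatter of these powers, and `powers_to_activate` is a hypothetical name.

```python
# Illustrative keyword sets, one per power
POWER_KEYWORDS = {
    "azure-architect": {"architecture", "best practices", "bicep", "design"},
    "azure-operations": {"storage", "key vault", "rbac", "aks", "deploy"},
    "azure-monitoring": {"log analytics", "metrics", "alerts", "resource health"},
}


def powers_to_activate(message: str) -> list:
    """Return the powers whose keywords appear in the message."""
    text = message.lower()
    # Plain substring matching: short keywords can also match inside longer words
    return sorted(
        name
        for name, keywords in POWER_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    )
```

The point of the model is the cost profile: only the matched power's tools and steering files enter the context, and everything else stays unloaded.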

A power consists of:

POWER.md - Required. Contains frontmatter (metadata, keywords for activation) and instructions (onboarding steps, steering guidance)
mcp.json - Optional. MCP server configuration for tool integrations
steering/ - Optional. Workflow-specific guidance files
hooks/ - Optional. Automated tasks that run on IDE events or via slash commands

My Azure powers use the first three. Hooks are useful for validation workflows or automated setup tasks - something to explore in future iterations.

Three Powers, Not One: The Design Decision

The official Azure MCP Server has dozens of namespaces. Loading everything at once would defeat the purpose of powers - you'd be back to context overload.

I split it into three powers based on workflow phases:

┌─────────────────┐     ┌──────────────────┐     ┌───────────────────┐
│ azure-architect │ ──▶ │ azure-operations │ ──▶ │ azure-monitoring  │
│                 │     │                  │     │                   │
│ "Design it"     │     │ "Build & run it" │     │ "Watch & fix it"  │
│ Design tools    │     │ Resource mgmt    │     │ Observability     │
└─────────────────┘     └──────────────────┘     └───────────────────┘

azure-architect: Best practices, architecture guidance, documentation search, schema references. Design-time namespaces only.

azure-operations: Storage, databases, RBAC, Key Vault, AKS management. Resource management namespaces.

azure-monitoring: Log Analytics, metrics, alerts, resource health. Observability namespaces.

The primary benefit is focus. Each power loads only the tools relevant to that workflow phase. When you're designing infrastructure, you don't need monitoring tools consuming context. When you're debugging production, you don't need architecture best practices.

When you install all three powers, Kiro automatically selects the right one based on your request. Ask about Azure best practices, it uses architect. Query storage accounts, it switches to operations. Check resource health, it activates monitoring.

A secondary benefit is authentication separation. The architect power works without az login - useful for design work on a fresh machine. The operations and monitoring powers require authentication, with monitoring limited to read-only namespaces.

⚠️ Note on permissions: Your Azure permissions come from az login. If you authenticate with write access, that access exists regardless of which power is active. The powers organize workflows; your Azure RBAC controls what's actually permitted.

Building azure-architect: The MCP Configuration

The mcp.json file defines which MCP servers and namespaces a power uses:

{
  "mcpServers": {
    "microsoft-docs": {
      "type": "http",
      "url": "https://learn.microsoft.com/api/mcp"
    },
    "azure-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "@azure/mcp@latest",
        "server",
        "start",
        "--namespace", "documentation",
        "--namespace", "bicepschema",
        "--namespace", "cloudarchitect",
        "--namespace", "bestpractices"
      ],
      "env": {
        "AZURE_MCP_COLLECT_TELEMETRY": "false"
      }
    }
  }
}

Two MCP servers. Microsoft Learn for documentation search. Azure MCP for design tools.

The --namespace flags are where token efficiency happens. Without them, the Azure MCP server loads all namespaces. By specifying only design-time namespaces, this power stays focused - loading exactly what's needed for architecture work, nothing more.
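Because each power differs mainly in its namespace list, the configuration can be generated rather than hand-edited. The Python sketch below rebuilds the `azure-mcp` entry shown above from a list of namespaces; `azure_mcp_config` is a hypothetical helper, not part of the powers repository.

```python
import json


def azure_mcp_config(namespaces):
    """Build an mcp.json-style dict that starts Azure MCP with only the given namespaces."""
    args = ["-y", "@azure/mcp@latest", "server", "start"]
    for namespace in namespaces:
        args += ["--namespace", namespace]
    return {
        "mcpServers": {
            "azure-mcp": {
                "command": "npx",
                "args": args,
                "env": {"AZURE_MCP_COLLECT_TELEMETRY": "false"},
            }
        }
    }


# The design-time namespaces used by azure-architect
design_time = ["documentation", "bicepschema", "cloudarchitect", "bestpractices"]
print(json.dumps(azure_mcp_config(design_time), indent=2))
```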

Steering Files: Teaching Kiro How to Think

MCP connections give Kiro access to tools. Steering files teach it when and how to use them.

Here's a section from the naming conventions steering file:

## Azure Resource Naming Pattern

{resource-type}-{workload}-{environment}-{region}-{instance}

Examples:
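- stpaymentsprodwesteu001 (storage account; hyphens dropped because storage account names allow only lowercase letters and digits, 3 to 24 characters)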
- kv-payments-prod-westeu-001 (key vault)
- aks-payments-prod-westeu-001 (kubernetes cluster)

When user asks to create a resource:
1. Ask for workload name if not provided
2. Infer environment from context or ask
3. Apply naming pattern automatically

This isn't just reference material. It's encoded behavior. When I ask Kiro to create a storage account, it doesn't just generate code - it asks clarifying questions and applies the naming pattern automatically.
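The pattern itself is mechanical enough to express as a function. Here is a small Python sketch with hypothetical helper names. Storage accounts are a special case: Azure accepts only 3 to 24 lowercase letters and digits there, so the sketch drops the hyphens for that type.

```python
import re


def resource_name(resource_type, workload, environment, region, instance):
    """Apply {resource-type}-{workload}-{environment}-{region}-{instance}."""
    parts = [resource_type, workload, environment, region]
    if not all(parts):
        raise ValueError("every naming component is required")
    # The instance number is zero-padded to three digits, as in the examples
    return "-".join(p.lower() for p in parts) + f"-{instance:03d}"


def storage_account_name(workload, environment, region, instance):
    """Same components, but without hyphens and within the 24-character limit."""
    name = resource_name("st", workload, environment, region, instance).replace("-", "")
    if not re.fullmatch(r"[a-z0-9]{3,24}", name):
        raise ValueError(f"invalid storage account name: {name}")
    return name
```

The missing-component check mirrors the steering rule: when a piece is absent, the right move is to ask for it, not to guess.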

The steering files also include ready-to-use patterns. From the KQL patterns file:

// Error Rate Calculation
AppServiceHTTPLogs
| where TimeGenerated > ago(1h)
| summarize 
    TotalRequests = count(),
    ErrorCount = countif(ScStatus >= 500),
    ErrorRate = round(countif(ScStatus >= 500) * 100.0 / count(), 2)

// Response Time Percentiles
AppServiceHTTPLogs
| where TimeGenerated > ago(6h)
| summarize 
    p50 = percentile(TimeTaken, 50),
    p95 = percentile(TimeTaken, 95),
    p99 = percentile(TimeTaken, 99)
  by bin(TimeGenerated, 15m)

These aren't just examples. They're templates Kiro adapts to specific queries. When I ask "show me slow requests from the last hour," Kiro modifies the percentile pattern with my timeframe.
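Adapting a template mostly means swapping the time window and bin size. The Python sketch below shows that substitution for the percentile query. It only models the text replacement, not how Kiro works internally, and `render_percentile_query` is a hypothetical helper.

```python
import re

PERCENTILE_TEMPLATE = """AppServiceHTTPLogs
| where TimeGenerated > ago({lookback})
| summarize
    p50 = percentile(TimeTaken, 50),
    p95 = percentile(TimeTaken, 95),
    p99 = percentile(TimeTaken, 99)
  by bin(TimeGenerated, {bin_size})"""

# Accept simple timespan literals such as 1h, 15m, 7d, 30s
TIMESPAN = re.compile(r"\d+[dhms]")


def render_percentile_query(lookback="6h", bin_size="15m"):
    """Fill the template, rejecting anything that is not a plain timespan."""
    for value in (lookback, bin_size):
        if not TIMESPAN.fullmatch(value):
            raise ValueError(f"not a timespan literal: {value!r}")
    return PERCENTILE_TEMPLATE.format(lookback=lookback, bin_size=bin_size)
```

For "slow requests from the last hour", the call would be something like `render_percentile_query("1h", "5m")`.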

Real Usage: What Actually Gets Used

After completing an IaC authoring task with the azure-architect power, I asked Kiro to analyze its own tool usage. Here's the honest breakdown:

Tool	Used	Value
`microsoft_docs_search`	✅	High - Found exact configuration patterns and integration details
`get_azure_bestpractices`	✅	Medium - General Azure coding guidelines
`azureterraformbestpractices`	✅	Medium - Validation workflow patterns
`bicepschema`	✅	Discovered it exists, didn't use for this task

The verdict from Kiro: "Documentation search was the killer feature. Worth having for the documentation search alone."

What made the documentation search valuable wasn't generic information - it was concrete implementation details: exact API endpoint formats, available metrics and thresholds, configuration patterns for specific integrations.

The best practices tools confirmed patterns but didn't provide service-specific guidance. Moderately useful, not transformative.

What this reveals about the three-power design:

Kiro noted that operational tools (subscription queries, live resource inspection) weren't needed for this IaC authoring task. Those capabilities "would shine more in a 'diagnose my existing infrastructure' scenario rather than 'author new IaC.'"

This is exactly why the powers are separated. The architect power handles design-time work with minimal context overhead. The operations and monitoring powers exist for when you need to interact with live resources.

Different workflows. Different tools. Focused context.

Security Steering: Making Best Practices Unavoidable

The azure-operations power includes security guidelines that encode least privilege into every workflow:

## Least Privilege Patterns

### Pattern 1: Application Access to Storage

**Instead of:** Contributor role on storage account
**Use:** Storage Blob Data Contributor on specific container

### Pattern 2: Application Access to Key Vault

**Instead of:** Key Vault Contributor
**Use:** Key Vault Secrets User (read-only) or Key Vault Secrets Officer (read/write)

### Pattern 3: CI/CD Pipeline Access

**Instead of:** Contributor on subscription
**Use:** Contributor on specific resource groups + specific data plane roles

When I ask about access management, Kiro frames answers in terms of principals, definitions, and scopes. The steering file made it harder to give overly permissive advice.
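A role assignment is the combination of those three parts: a principal, a role definition, and a scope. The Python sketch below models that and flags the Pattern 3 anti-pattern, a broad role granted on a whole subscription. The class, the helper names, and the `BROAD_ROLES` set are assumptions for illustration, not part of the steering file.

```python
from dataclasses import dataclass

# Roles treated as broad in this sketch
BROAD_ROLES = {"Owner", "Contributor"}


@dataclass(frozen=True)
class RoleAssignment:
    principal: str        # who gets access
    role_definition: str  # what they may do
    scope: str            # where it applies, as a resource ID path


def scope_level(scope: str) -> str:
    """Classify a scope path as subscription, resource-group, resource, or other."""
    parts = scope.strip("/").split("/")
    if parts[0].lower() != "subscriptions":
        return "other"
    if len(parts) == 2:
        return "subscription"
    if len(parts) == 4 and parts[2].lower() == "resourcegroups":
        return "resource-group"
    return "resource"


def too_broad(assignment: RoleAssignment) -> bool:
    """True when a broad role is granted on an entire subscription."""
    return (assignment.role_definition in BROAD_ROLES
            and scope_level(assignment.scope) == "subscription")
```

Narrowing the scope to a resource group clears the flag without changing the role, which is the fix Pattern 3 recommends.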

The Development Process: Using Kiro to Build Kiro Powers

Here's the meta part: I used Kiro to build these powers.

The Kiro team maintains a power-builder power specifically for creating new powers. Install it, and Kiro becomes your power development assistant.

Spec Mode for Requirements

Kiro's Spec mode generates structured plans from descriptions. I described what I wanted - three workflow-aligned powers with namespace separation - and Spec mode produced:

Requirements documents for each power
File structure recommendations
Task lists for implementation

The Iteration Loop

1. Define MCP configuration - Which namespaces to include
2. Write steering files - Patterns, workflows, decision trees
3. Install power locally - Powers panel → Add power from Local Path
4. Test with real queries - "List my storage accounts"
5. Check tool availability - Verify expected tools load
6. Refine based on gaps - Fix tool names, add missing patterns

Step 5 caught several issues. The Azure MCP uses azmcp_ prefixes for tool names. My early documentation referenced incorrect names. Testing revealed the mismatch.
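The tool-availability check boils down to comparing the names written in the documentation with the names the server exposes. Here is a Python sketch of that comparison. `missing_tools` is a hypothetical helper, and tool names passed to it are placeholders rather than a verified list of Azure MCP tools.

```python
def missing_tools(documented, available, prefix="azmcp_"):
    """Map each documented-but-missing tool name to a prefixed match, if any.

    A value of None means no tool with that name exists even with the prefix.
    """
    available = set(available)
    report = {}
    for name in documented:
        if name in available:
            continue  # documented correctly, nothing to fix
        candidate = prefix + name
        report[name] = candidate if candidate in available else None
    return report
```

An empty result means the steering files and the loaded tools agree. Any entry with a suggestion is a quick rename, and a `None` entry needs a manual look.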

Local Installation

Installing during development:

Open Kiro's Powers panel
Click Add power from Local Path
Select the power directory containing POWER.md
Click Install

No build step. No packaging. Direct folder reference. Change a steering file, reload the power, test immediately.

Once ready, push to a public GitHub repository and others can install via Add power from GitHub using the URL to the specific power folder.

What Actually Matters

Powers aren't just a packaging format. They're a model for how AI agents should acquire expertise.

The old approach: stuff everything into context upfront. Hope the agent figures out what's relevant. Watch token costs climb while response quality drops.

The new approach: agents learn what they need, when they need it. Expertise flows in on demand. Context stays focused. The agent expands its capabilities as the tools around it evolve.

This matters beyond my Azure learning curve. HashiCorp built their Terraform power in days after learning about the format. Stripe, Supabase, Datadog - all shipping domain expertise as installable packages. The pattern scales.

For tool providers: Write one POWER.md, and your expertise reaches every developer using powers. No maintaining separate integrations for each AI tool.

For teams: Package internal knowledge - your design system, your deployment patterns, your security policies - as powers. Every developer's agent knows how to use them correctly.

For individuals: Install the expertise you need today. Uninstall when you're done. Your agent's capabilities match your current project, not some generic average.

This is what separates useful AI assistance from the "chat with docs" experience. Not just answering questions. Bringing the right context at the right moment, then getting out of the way.

Documentation teaches concepts. Powers teach workflows. The difference is action.

The Future: Cross-Tool Compatibility

Today, powers work in Kiro IDE. The team is building toward a future where powers work across any AI development tool - Kiro CLI, Cursor, Claude Code, and beyond.

The Model Context Protocol provides a standard for tool communication. Powers extend this with standards for packaging, activation, and knowledge transfer.

This matters for the ecosystem. Tool providers don't want to maintain separate integrations for each AI tool. Write one POWER.md, use it anywhere.

I'm particularly interested in Kiro CLI support. Running these powers from a terminal would match my actual workflow better than the IDE interface.

Tradeoffs

Cognitive Distance from the Platform

Using powers means interacting with Azure through an abstraction layer. For learning fundamentals, this might hide important details.

MCP Server Dependency

These powers depend on Microsoft maintaining the Azure MCP Server. Version updates have already changed tool naming conventions, requiring documentation updates across all three powers.

Steering File Maintenance

Steering files encode current best practices. Azure evolves. The files need periodic updates to stay relevant.

What I'd Change

Better Tool Discovery

You need to know what tools exist to use them effectively. A more systematic discovery mechanism would help - something that surfaces available capabilities based on what you're trying to accomplish.

Add Hooks for Validation

The powers currently don't use hooks. Adding automated validation, like checking Terraform syntax before deployment or verifying RBAC configurations, would make the workflow tighter.

What I Learned

Building these powers taught me things that reading documentation wouldn't have:

About Azure: Working with the MCP namespaces forced me to understand how Azure organizes its services. The separation between control plane and data plane operations became obvious when I had to decide which namespaces each power needed. You learn a platform's structure by building tools for it.

About Powers: The format is more accessible than I expected. My three Azure powers took a weekend of focused work. The barrier isn't technical complexity - it's knowing what workflows to optimize for.

About Context Efficiency: Before this project, I would have connected every MCP server and hoped for the best. Now I think in terms of focused context. What does this specific task need? What's consuming tokens without adding value?

About Learning Paths: Sometimes building tools is the learning path. The classic route - docs, tutorials, certifications - works when you have time. When you don't, building forces understanding faster. Every decision about what to include in a power required me to understand what Azure actually offers.

The unexpected part: an AWS architect, using an AWS IDE, building Azure tooling. But that's the point of powers. They're platform-agnostic expertise packages. The tool doesn't care which cloud you're learning. It just loads the right context when you need it.

Try It

Repository: github.com/requix/azure-kiro-powers

Each power installs independently. Start with azure-architect - no authentication required.

Building AI-powered development tools? The interesting work happens at the edges, where people try things.

Your move.

Building a Serverless AI Chatbot: Integrating OpenAI with Telegram on AWS

Volodymyr Marynychev — Thu, 30 Jan 2025 21:29:37 +0000

Introduction

Let me share how I built an AI chatbot using AWS, OpenAI, and Telegram. The main goal was to create a smart, cost-effective chatbot without dealing with server maintenance. A serverless approach was a perfect fit for this task.

The project needed to solve these main challenges:

Create an intelligent chatbot using OpenAI
Keep running costs low with serverless architecture
Ensure secure handling of sensitive data
Guarantee reliable message delivery

Serverless architecture was chosen for its:

Pay-per-use pricing model
Automatic scaling capabilities
Minimal maintenance overhead
Built-in high availability

The tech stack includes:

AWS services (Lambda, API Gateway, SQS, DynamoDB, KMS)
OpenAI's GPT-4 for message processing
Telegram as a messaging platform
Terraform for infrastructure setup
AWS Lambda Powertools for monitoring

Architecture Overview

How It Works

The system processes messages in a simple flow:

User sends a message to the Telegram bot
Telegram forwards it to AWS API Gateway
Message goes through processing pipeline
User receives response from OpenAI

Here's the visual representation: [architecture diagram]

Core Components

Each component has a specific role in the system:

API Gateway serves as an entry point:

module "api_gateway" {
  name          = "${var.app_name}-webhook"
  protocol_type = "HTTP"
  integrations = {
    "ANY /" = {
      integration_type = "AWS_PROXY"
      integration_subtype = "SQS-SendMessage"
    }
  }
}

SQS Queue handles message buffering:

resource "aws_sqs_queue" "inbound" {
  name_prefix = "${var.app_name}-inbound-queue"
  visibility_timeout_seconds = 360
  message_retention_seconds = 86400
}

Lambda Function processes messages:

module "lambda_function" {
  function_name = "${var.app_name}-messages-processing"
  handler       = "index.handler"
  runtime       = "python3.12"
  timeout       = 60
}

DynamoDB stores conversation state:

resource "aws_dynamodb_table" "threads" {
  name      = "${var.app_name}-threads"
  hash_key  = "chat_id"
  range_key = "thread_id"
}

Each component was designed with scalability and reliability in mind. The system can handle multiple conversations simultaneously while maintaining message order and conversation context.

Deep Dive: Implementation Details

Message Flow Implementation

Let's break down how messages move through the system. This section covers the actual implementation of each component.

Setting Up Telegram Webhook

First, we need to connect Telegram to our AWS endpoint. Here's a simple script that handles this:

TELEGRAM_TOKEN="your-bot-token"
ENDPOINT="your-api-gateway-url"

curl -X "POST" "https://api.telegram.org/bot${TELEGRAM_TOKEN}/setWebhook" \
    -d "{\"url\": \"${ENDPOINT}\"}" \
    -H "Content-Type: application/json"

Message Processing Pipeline

The Lambda function processes messages in several steps. Here's the main handler:

import json

import telebot

def handler(event, _context):
    """
    Main entry point for processing messages.
    Receives events from SQS, processes them, and sends responses.
    """
    try:
        # An SQS event can carry several records, so handle each one
        for record in event['Records']:
            request_body = json.loads(record['body'])
            update = telebot.types.Update.de_json(request_body)

            # Skip updates without a message; process only allowed users
            if update.message and update.message.chat.id in ALLOWED_USERS:
                process_message(update.message)

        return {"statusCode": 200}
    except Exception as e:
        # Returning normally marks the batch as handled, so SQS will not retry it
        logger.error(f"Error processing message: {str(e)}")
        return {"statusCode": 500}

OpenAI Integration

The OpenAI integration is handled through a dedicated function:

import time

def ask_openai_threads(chat_id, question):
    """
    Sends user message to OpenAI and manages conversation threads.
    """
    # Get or create assistant
    assistant_id = get_stored_assistant_id()
    if not assistant_id:
        assistant_id = create_assistant()
        save_assistant(assistant_id)

    # Get or create thread
    thread_id = get_stored_thread_id(chat_id)
    if not thread_id:
        thread = openai_client.beta.threads.create()
        thread_id = thread.id
        save_thread(chat_id, thread_id)

    # Add message and run assistant
    openai_client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=question
    )

    run = openai_client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id
    )

    # Wait for response, pausing between polls so the loop does not spin
    while run.status in ('queued', 'in_progress', 'cancelling'):
        time.sleep(1)
        run = openai_client.beta.threads.runs.retrieve(
            thread_id=thread_id,
            run_id=run.id
        )

    # Stop on failed, cancelled, or expired runs instead of waiting forever
    if run.status != 'completed':
        raise RuntimeError(f"OpenAI run ended with status: {run.status}")

    # Get and return assistant's response
    messages = openai_client.beta.threads.messages.list(thread_id=thread_id)
    return messages.data[0].content[0].text.value

State Management

The project uses DynamoDB to keep track of conversations and assistant configuration.

Thread Storage

Here's how we store and retrieve conversation threads:

def save_thread(chat_id, thread_id):
    """
    Saves new thread to DynamoDB.
    """
    item = {
        'chat_id': {'N': str(chat_id)},
        'thread_id': {'S': thread_id},
        'thread_status': {'S': 'ACTIVE'},
        'created_at': {'S': datetime.now().isoformat()}
    }

    dynamodb_client.put_item(
        TableName=THREADS_TABLE_NAME,
        Item=item
    )

def get_stored_thread_id(chat_id):
    """
    Retrieves active thread for a chat.
    """
    response = threads_table.query(
        IndexName='UserStatusIndex',
        KeyConditionExpression=Key('chat_id').eq(chat_id) & 
                             Key('thread_status').eq('ACTIVE'),
        Limit=1
    )
    return response['Items'][0]['thread_id'] if response['Items'] else None

Security Implementation

Secret Management

We use AWS Systems Manager Parameter Store to keep API tokens and other secrets safe.

import boto3

# Read tokens at runtime instead of hardcoding them:
ssm = boto3.client('ssm')

# Get Telegram token
TELEGRAM_TOKEN = ssm.get_parameter(
    Name=TELEGRAM_TOKEN_PARAM_NAME, 
    WithDecryption=True
)['Parameter']['Value']

# Get OpenAI token
OPENAI_TOKEN = ssm.get_parameter(
    Name=OPENAI_TOKEN_PARAM_NAME, 
    WithDecryption=True
)['Parameter']['Value']

The parameters are created using Terraform:

resource "aws_ssm_parameter" "bot-token" {
  name   = "${var.app_name}-bot-token"
  type   = "SecureString"
  key_id = "alias/aws/ssm"
  value  = "CHANGE-ME"  # Changed manually after deployment

  # Keep later applies from resetting the manually set value
  lifecycle {
    ignore_changes = [value]
  }
}

Encryption

We use KMS for encrypting data at rest. Here's how we set it up:

resource "aws_kms_key" "dynamo-encryption-key" {
  description             = "Key for DynamoDB encryption"
  deletion_window_in_days = 10
  enable_key_rotation     = true

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "Enable IAM User Permissions"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::${local.account_id}:root"
        }
        Action   = "kms:*"
        Resource = "*"
      }
    ]
  })
}

Access Control

We limit who can use the bot with a simple check:

ALLOWED_USERS = list(map(int, allowed_users_ids.split(',')))

@bot.message_handler(func=lambda message: message.chat.id not in ALLOWED_USERS)
def decline_strangers(message):
    response = (
        f"Access denied.\n"
        f"Your user ID: {message.chat.id}"
    )
    bot.reply_to(message, response)
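One fragile spot is the allowlist parsing: `int('')` raises `ValueError`, so an empty setting or a trailing comma would fail at import time. A more forgiving parser could look like the sketch below, where `parse_allowed_users` is a hypothetical replacement and not the project's code.

```python
def parse_allowed_users(raw):
    """Parse a comma-separated list of Telegram chat IDs into a set of ints.

    Blank entries and surrounding whitespace are ignored; a missing
    setting yields an empty set, which denies everyone.
    """
    ids = set()
    for chunk in (raw or "").split(","):
        chunk = chunk.strip()
        if chunk:
            ids.add(int(chunk))
    return ids
```

A set also makes the `in ALLOWED_USERS` check a constant-time lookup, and non-numeric entries still fail loudly instead of being skipped.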

IAM Roles

Lambda needs specific permissions to access other services. Here's the IAM configuration:

module "lambda_function" {
  # ... other configuration ...

  attach_policy_statements = true
  policy_statements = {
    sqs = {
      effect    = "Allow",
      actions   = ["sqs:SendMessage"],
      resources = [aws_sqs_queue.inbound.arn]
    },
    ssm = {
      effect    = "Allow",
      actions   = ["ssm:GetParameter"],
      resources = [
        aws_ssm_parameter.bot-token.arn,
        aws_ssm_parameter.openai-token.arn,
      ]
    },
    dynamodb = {
      effect = "Allow",
      actions = [
        "dynamodb:PutItem",
        "dynamodb:Query",
        "dynamodb:Scan"
      ],
      resources = ["*"]
    }
  }
}

Security Best Practices

Some key security measures we implemented:

Network Security:
- API Gateway uses HTTPS only
- Lambda functions can run in a private VPC (optional)
Data Security:
- All sensitive data encrypted at rest
- Secrets stored in Parameter Store
- DynamoDB encryption enabled
Access Security:
- Minimal IAM permissions
- User allowlist
- API key rotation enabled

Monitoring and Operations

CloudWatch Integration

We use AWS Lambda Powertools to make monitoring easier. Here's how we set it up:

from aws_lambda_powertools import Tracer, Metrics, Logger

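tracer = Tracer()
metrics = Metrics()  # Namespace is read from POWERTOOLS_METRICS_NAMESPACE
logger = Logger()    # Service name is read from POWERTOOLS_SERVICE_NAME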

@tracer.capture_lambda_handler
@metrics.log_metrics
@logger.inject_lambda_context
def handler(event, _context):
    """
    Main handler with full observability.
    """
    try:
        process_event(event)
        metrics.add_metric(name="SuccessfulProcessing", value=1, unit="Count")
    except Exception:
        metrics.add_metric(name="FailedProcessing", value=1, unit="Count")
        raise

Logging Strategy

We use structured logging to make debugging easier:

def process_event(event):
    """
    Process events with structured logging.
    """
    logger.info("Processing new event", extra={
        "event_type": "message_received",
        "timestamp": datetime.now().isoformat(),
        "source": "telegram"
    })
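To make it concrete what "structured" means here, the sketch below produces one JSON object per log line using only the standard library. It is an illustration of the output shape, not how Lambda Powertools is implemented; `JsonFormatter` and `build_logger` are names we made up for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def __init__(self, extra_keys):
        super().__init__()
        self._extra_keys = extra_keys

    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        # Values passed via `extra=` become attributes on the record
        for key in self._extra_keys:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


def build_logger(stream, extra_keys=("event_type", "source")):
    """Create a logger that writes JSON lines to the given stream."""
    demo_logger = logging.getLogger("structured-demo")
    demo_logger.handlers.clear()
    demo_logger.setLevel(logging.INFO)
    demo_logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(extra_keys))
    demo_logger.addHandler(handler)
    return demo_logger
```

Because every line is valid JSON, fields such as `event_type` can be filtered directly in CloudWatch Logs Insights instead of being matched with regular expressions.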

Conclusion

What We Built

We created a serverless AI chatbot that combines:

AWS serverless infrastructure
OpenAI's powerful language models
Telegram's messaging platform

The system handles:

Secure message processing
Reliable conversation management
Cost-effective scaling
Comprehensive monitoring

Key Takeaways

Serverless architecture reduces operational overhead
Queue-based design ensures message reliability
DynamoDB provides flexible state management
KMS encryption protects sensitive data

Lessons Learned

What Worked Well

Serverless architecture scaled smoothly
SQS prevented message loss
Lambda Powertools improved observability

What Could Be Better

Cold starts need optimization
OpenAI API costs need monitoring
Error handling could be more robust

Final Thoughts

Building a serverless AI chatbot taught us that:

Simple architecture can handle complex tasks
AWS services work well together
Proper monitoring is crucial
Cost management needs constant attention

Getting Started

Want to try it yourself? Here's a quick start:

Clone the repository
Set up AWS credentials
Deploy with Terraform
Update SSM parameters with your API keys
Set up the Telegram webhook

See the repository for detailed deployment instructions.

The code is open source and available on GitHub: https://github.com/requix/aws-telegram-ai-module
Feel free to contribute or adapt it for your needs.

This project shows how modern cloud services and AI can work together to create practical, scalable applications. While there's always room for improvement, this architecture provides a solid foundation for building AI-powered chatbots.

Orchestrating AI: Dynamic LLM Routing based on AWS Step Functions

Volodymyr Marynychev — Wed, 29 Jan 2025 22:11:39 +0000

By expanding this simple architectural pattern, you can significantly reduce your LLM costs while maintaining high-quality responses across different use cases.

🚨 Important Disclaimer: Proof of Concept 🚨

This project is a demonstration of the dynamic AI model routing concept and should NOT be considered a production-ready solution.

Key Limitations:

Experimental architecture
Prototype-level implementation
Minimal error handling
Requires significant enhancement for enterprise use

Use at Your Own Risk

Not recommended for mission-critical applications
Potential unexpected behaviors
May incur unexpected cloud service costs

The goal of this project is to demonstrate a technical concept and provide a starting point for building intelligent, cost-effective AI routing systems. It's an educational resource and a blueprint for building more sophisticated solutions.

The Evolution of LLM Usage

The landscape of Large Language Models (LLMs) has evolved dramatically over the past few years. What started with GPT-3 has expanded into a diverse ecosystem of models, each with its own strengths and cost structures. You now have access to various options:

OpenAI's GPT-4 and GPT-3.5
Anthropic's Claude series
Open-source models like Llama 2
Cloud provider solutions like Amazon Bedrock
Budget DeepSeek models

This diversity brings both opportunities and challenges. While having multiple options provides flexibility, it also complicates the decision-making process. How do you choose the right model for each specific use case? How do you balance cost against performance? These questions become increasingly important as you scale your AI implementations.

Core Components and Resources

Complexity Analyzer

The first step in our routing system is analyzing the complexity of incoming queries. For this demonstration, we've implemented a simple classifier that categorizes inputs based on their characteristics. While we're using Claude 3 Sonnet in this example, you could easily swap it for a more cost-effective model like GPT-3.5 or DeepSeek-R1, or even a simpler rule-based system, depending on your specific needs and budget constraints.

The complexity analyzer categorizes inputs into three basic levels, which helps determine the most appropriate model for handling each request:

import json

import boto3

def analyze_complexity(input_text):
    # Note: This is a demonstration using Claude 3 Sonnet
    # Consider using more cost-effective alternatives like DeepSeek
    # or implementing a custom rule-based classifier for production
    bedrock_client = boto3.client('bedrock-runtime')

    prompt = f"""
Analyze the complexity of the following input:
"{input_text}"

Classify it into one of these categories:
1. SIMPLE: Basic questions, straightforward tasks
2. CALCULATION: Mathematical operations, data analysis
3. COMPLEX: Multi-step reasoning, creative problem-solving

Return ONLY the classification (SIMPLE/CALCULATION/COMPLEX)
"""

    response = bedrock_client.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps({
            # Required by the Anthropic Messages API on Bedrock
            "anthropic_version": "bedrock-2023-05-31",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 10
        })
    )

    # The body is a stream; the label is in the first content block
    result = json.loads(response["body"].read())
    return result["content"][0]["text"].strip()
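For comparison, here is what the rule-based alternative mentioned above might look like. This is a rough sketch under our own assumptions: the keyword lists, the 60-word threshold, and the name `classify_by_rules` are illustrative and not part of the project. It returns the same three labels, so it could stand in for the LLM call at zero per-request cost.

```python
import re

# Arithmetic expressions such as "12 * 7" or calculation keywords
CALCULATION_PATTERN = re.compile(
    r"\d+\s*[-+*/^%]\s*\d+|\b(calculate|compute|sum|average|percent|percentage)\b",
    re.IGNORECASE,
)
# Keywords that hint at multi-step reasoning
COMPLEX_PATTERN = re.compile(
    r"\b(design|compare|explain why|trade-?offs?|step by step|strategy|analy[sz]e)\b",
    re.IGNORECASE,
)


def classify_by_rules(input_text, long_input_words=60):
    """Return SIMPLE, CALCULATION, or COMPLEX using cheap heuristics."""
    if CALCULATION_PATTERN.search(input_text):
        return "CALCULATION"
    # Long inputs are treated as complex even without a keyword match
    if COMPLEX_PATTERN.search(input_text) or len(input_text.split()) > long_input_words:
        return "COMPLEX"
    return "SIMPLE"
```

Heuristics like these will misclassify some inputs (a date such as 2024-01-15 looks like arithmetic, for example), so they are best treated as a first pass, with the LLM classifier reserved for cases where accuracy matters more than cost.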

AWS Step Functions State Machine

Orchestrates the entire workflow
Handles model selection logic based on complexity analysis
Manages error handling and retries
Integrates with various AWS services and external APIs
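In the project this selection is done inside the state machine itself, but the logic is easier to read in Python. The sketch below assumes Claude Instant for SIMPLE and Claude Sonnet for COMPLEX, as described in the Lambda section. Sending CALCULATION to the OpenAI model and falling back to Sonnet for unexpected labels are our assumptions, and `select_model` is a hypothetical name.

```python
ROUTES = {
    "SIMPLE": "claude-instant",
    "CALCULATION": "gpt-3.5",   # assumption: not confirmed by the article
    "COMPLEX": "claude-sonnet",
}

# Assumption: prefer the most capable model when the label is unrecognized
DEFAULT_ROUTE = "claude-sonnet"


def select_model(classification):
    """Map the analyzer's label to a model name, tolerating stray whitespace."""
    label = classification.strip().upper()
    return ROUTES.get(label, DEFAULT_ROUTE)
```

Normalizing the label matters because an LLM classifier may return extra whitespace or lowercase text, and a strict string comparison would then fall through to the default branch.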

Lambda Functions

Complexity Analyzer Lambda

Uses Amazon Bedrock (Claude 3 Sonnet) to analyze input complexity
Classifies inputs into three categories: SIMPLE, CALCULATION, COMPLEX
Helps in optimal model selection

Bedrock Lambda (Instant & Sonnet)

Handles requests to Amazon Bedrock models
Claude Instant for simple queries
Claude Sonnet for complex analysis

Cost Calculator Lambda

Triggered by DynamoDB streams
Calculates precise costs for each model invocation
Updates cost information in DynamoDB
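A stream-triggered handler receives batches of records in DynamoDB's typed attribute format. The sketch below shows how such a batch could be turned into per-execution costs. It assumes the stream is configured to include new images and that items carry `execution_id`, `model_used`, and `tokens_used` attributes. The function name and the `price_per_1k` argument are hypothetical.

```python
def costs_from_stream_event(event, price_per_1k):
    """Return (execution_id, cost) pairs for newly inserted items in a batch."""
    results = []
    for record in event.get("Records", []):
        # Skip MODIFY/REMOVE: writing the cost back to the same table
        # emits a MODIFY record, which would otherwise re-trigger the work
        if record.get("eventName") != "INSERT":
            continue
        image = record["dynamodb"].get("NewImage", {})
        model = image.get("model_used", {}).get("S")
        tokens = image.get("tokens_used", {}).get("N")
        if model not in price_per_1k or tokens is None:
            continue  # unknown model or incomplete item
        cost = int(tokens) * price_per_1k[model] / 1000
        results.append((image["execution_id"]["S"], cost))
    return results
```

Filtering on `INSERT` is the important design choice here: without it, a Lambda that updates the item it was triggered by can end up invoking itself in a loop.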

Storage and Database

DynamoDB Table

Stores execution results and metadata
Uses stream processing for cost calculations
Encrypted at rest using KMS

Security Components

KMS (Key Management Service)

Manages encryption keys for sensitive data
Used for DynamoDB encryption
Secures CloudWatch logs

SSM Parameter Store

Securely stores API keys
Manages configuration values
Encrypted using KMS

Access Control

Fine-grained IAM permissions
Service-to-service authentication
Secure parameter management

Integration Points

EventBridge API Destination

Manages OpenAI API integration
Handles API key authentication
Provides secure HTTP endpoints

The Cost-Effectiveness Dilemma: Is Dynamic Routing Worth It? 🤔

One of the most critical questions when designing any sophisticated system is: "Does the complexity come with a meaningful benefit?" In our dynamic AI model routing approach, we need to carefully analyze whether the overhead of complexity analysis justifies the potential cost savings.

The Hidden Cost of Complexity Analysis

Let's break down the economics of our approach:

# Complexity Analysis Cost Calculation
def complexity_cost(tokens):
    return 0.015 / 1000 * tokens  # Using Claude Sonnet as analyzer

model_costs = {
    "gpt-3.5": 0.000002 / 1000,       # Cheapest model
    "claude-instant": 0.0003 / 1000,  # Mid-range model
    "claude-sonnet": 0.015 / 1000     # Most expensive model
}

def is_routing_cost_effective(input_text):
    # Complexity check uses ~50-100 tokens
    complexity_check_cost = complexity_cost(100)  # ~$0.0015

    # Potential savings by choosing optimal model
    potential_savings = calculate_model_cost_difference(input_text)

    return potential_savings > complexity_check_cost
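To put a number on that comparison, we can ask how many tokens a request needs before routing it to a cheaper model pays for the complexity check. The sketch below uses the per-1K prices from the snippet above. The function names and the "share of requests routed down" parameter are our own framing, not part of the project.

```python
def break_even_tokens(expensive_per_1k, cheap_per_1k, check_cost):
    """Tokens per request at which the routing saving equals the check cost."""
    saving_per_token = (expensive_per_1k - cheap_per_1k) / 1000
    if saving_per_token <= 0:
        return float("inf")  # no price gap, so routing can never pay off
    return check_cost / saving_per_token


def expected_saving_per_request(tokens, share_routed_down,
                                expensive_per_1k, cheap_per_1k, check_cost):
    """Average saving per request when only a share goes to the cheap model."""
    saving_if_routed = (expensive_per_1k - cheap_per_1k) / 1000 * tokens
    # Every request pays for the check; only some of them save money
    return share_routed_down * saving_if_routed - check_cost
```

With Sonnet at 0.015 and Instant at 0.0003 per 1K tokens and a check cost of $0.0015, `break_even_tokens(0.015, 0.0003, 0.0015)` comes out at roughly 102 tokens. If only half of the requests can be routed down, the break-even length doubles, which is why uniform or short inputs rarely justify the extra step.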

When Dynamic Routing Makes Sense

Dynamic model routing is most beneficial in scenarios with:

High-volume systems (1000+ daily requests)
Significant cost variation between models
Diverse input complexity
Large token count differences

When to Reconsider

You might want to skip complexity analysis if:

Your system has low request volume
Input complexity is relatively uniform
Model pricing is similar
You have strict latency requirements

Cost Comparison

Our solution doesn't just route requests - it meticulously tracks and calculates the cost of every single AI interaction. We've implemented a dedicated cost calculator Lambda function that processes each request's details and stores comprehensive cost information in DynamoDB. This approach allows for:

Granular cost tracking per request
Historical cost analysis
Insights into model usage patterns

import uuid
from datetime import datetime

import boto3

# Low-level client, matching the typed attribute format used below
dynamodb = boto3.client('dynamodb')

def calculate_cost(model_used, tokens):
    MODEL_COSTS = {
        "gpt-3.5-turbo-1106": 0.000002,  # Average cost per token
        "bedrock-instant": 0.0003,
        "bedrock-sonnet": 0.015
    }

    # Calculate cost based on tokens used (unknown models cost 0)
    cost = (tokens * MODEL_COSTS.get(model_used, 0)) / 1000

    # Store detailed cost information in DynamoDB
    dynamodb.put_item(
        TableName='ai-usage-costs',
        Item={
            'execution_id': {'S': str(uuid.uuid4())},
            'model_used': {'S': model_used},
            'tokens_used': {'N': str(tokens)},
            'calculated_cost': {'N': str(cost)},
            'timestamp': {'S': datetime.now().isoformat()}
        }
    )

    return cost
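Once the records are in DynamoDB, the historical analysis mentioned above is mostly aggregation. Here is a small sketch that totals requests, tokens, and cost per model from items in the same typed format that `calculate_cost` writes. The function name is hypothetical, and fetching the items (via a scan or query) is left out.

```python
from collections import defaultdict
from decimal import Decimal


def summarize_costs(items):
    """Aggregate requests, tokens, and cost per model from typed items."""
    totals = defaultdict(
        lambda: {"requests": 0, "tokens": 0, "cost": Decimal("0")}
    )
    for item in items:
        bucket = totals[item["model_used"]["S"]]
        bucket["requests"] += 1
        bucket["tokens"] += int(item["tokens_used"]["N"])
        # Decimal avoids float rounding drift when summing many tiny costs
        bucket["cost"] += Decimal(item["calculated_cost"]["N"])
    return dict(totals)
```

Using `Decimal` here is a deliberate choice: DynamoDB returns numbers as strings, and parsing them as decimals keeps sums of very small per-request costs exact.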

Terraform: Infrastructure as Code 🏗️

The entire solution is implemented as a modular Terraform project, making it easy to deploy and customize:

Supports multiple AWS regions
Easily configurable through variables
Manages all AWS resources declaratively
Includes security best practices
- KMS encryption
- IAM least-privilege roles
- Secure parameter management

Getting Started 🚀

Want to try it out? Here's how:

Prerequisites:

# Install Terraform
brew install terraform  # macOS
# or
sudo apt-get install terraform  # Linux (after adding the HashiCorp apt repository)

# Install AWS CLI
pip install awscli

# Configure AWS credentials
aws configure

Clone the Repository:

git clone https://github.com/requix/aws-step-functions-ai-orchestration.git
cd aws-step-functions-ai-orchestration/terraform

Set Up OpenAI API Key:

aws ssm put-parameter \
    --name "/ai-orchestration/openai-api-key" \
    --type "SecureString" \
    --value "your-openai-api-key"

Deploy Infrastructure:

terraform init
terraform plan
terraform apply

Run Your First Execution:

# Use the output from terraform apply
aws stepfunctions start-execution \
    --state-machine-arn YOUR_STATE_MACHINE_ARN \
    --input '{"input": "What is the capital of France?"}'
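The CLI call returns as soon as the execution starts, so you still have to look up the result. If you prefer to do that from Python, a sketch along these lines starts the execution and polls until it finishes. `run_and_wait` is a hypothetical helper; `sfn_client` is assumed to be `boto3.client("stepfunctions")` or anything with the same `start_execution` and `describe_execution` methods.

```python
import json
import time

# Statuses after which a standard execution no longer changes
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def run_and_wait(sfn_client, state_machine_arn, payload,
                 poll_seconds=2.0, max_polls=60):
    """Start an execution and poll until it reaches a terminal status."""
    started = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),
    )
    execution_arn = started["executionArn"]
    for _ in range(max_polls):
        description = sfn_client.describe_execution(executionArn=execution_arn)
        if description["status"] in TERMINAL_STATES:
            return description
        time.sleep(poll_seconds)
    raise TimeoutError(f"{execution_arn} still running after {max_polls} polls")
```

On success, the returned description includes an `output` field containing the state machine's result as a JSON string.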

Open Source and Community 🌐

The entire project is open-source and available on GitHub:
🔗 https://github.com/requix/aws-step-functions-ai-orchestration

We welcome contributions, issue reports, and feature suggestions!

Conclusion

By expanding this architectural pattern, you can create an intelligent, cost-effective AI routing system that adapts to different use cases. The key is flexibility, continuous monitoring, and a willingness to iterate.

Remember:

This is a proof of concept
Always test thoroughly
Monitor and optimize continuously

Happy routing! 🤖✨