<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Salam Shaik</title>
    <description>The latest articles on DEV Community by Salam Shaik (@shaiksalam9182).</description>
    <link>https://dev.to/shaiksalam9182</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1003193%2Feed331c2-d05f-41fd-9b01-9d666a5a9127.jpeg</url>
      <title>DEV Community: Salam Shaik</title>
      <link>https://dev.to/shaiksalam9182</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shaiksalam9182"/>
    <language>en</language>
    <item>
      <title>Text-to-Clip: Building a Serverless AI Engine that Edits Video from Descriptions</title>
      <dc:creator>Salam Shaik</dc:creator>
      <pubDate>Thu, 08 Jan 2026 20:49:35 +0000</pubDate>
      <link>https://dev.to/aws-builders/text-to-clip-building-a-serverless-ai-engine-that-edits-video-from-descriptions-2fbf</link>
      <guid>https://dev.to/aws-builders/text-to-clip-building-a-serverless-ai-engine-that-edits-video-from-descriptions-2fbf</guid>
      <description>&lt;p&gt;Hi everyone,&lt;/p&gt;

&lt;p&gt;Imagine typing 'Show me the car chase scene' and, within seconds, getting a perfectly (almost perfectly 😅) cut video delivered to your screen.&lt;br&gt;
I call it Text-to-Clip: an automated video editing engine that combines computer vision (to see), vector search (to remember), and FFmpeg (to act).&lt;/p&gt;

&lt;p&gt;In this post, I break down the exact AWS architecture I used to build it, from the distributed 'Map' states in Step Functions to the Dockerized computer vision containers running on Lambda.&lt;/p&gt;

&lt;p&gt;Services I used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon Bedrock&lt;/li&gt;
&lt;li&gt;Lambda&lt;/li&gt;
&lt;li&gt;Step Functions&lt;/li&gt;
&lt;li&gt;OpenSearch Service&lt;/li&gt;
&lt;li&gt;Amazon Rekognition&lt;/li&gt;
&lt;li&gt;S3&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Architecture:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kcfqd7lyuje7ggajmk0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kcfqd7lyuje7ggajmk0.png" alt=" " width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation steps:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Identifying shots in the video:&lt;/strong&gt; Whenever the user uploads a video, it is sent to Amazon Rekognition for shot identification, and a transcription job is started with Amazon Transcribe.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analyzing shots:&lt;/strong&gt; From each shot, a fixed number of frames is picked, along with the transcription for that time range. These are sent to the Bedrock Nova model for scene analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Storing analyzed data:&lt;/strong&gt; The model's analysis is sent to the embedding model, and the generated embeddings are stored in an OpenSearch index.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Querying and generating video:&lt;/strong&gt; The user's query is converted to an embedding and compared against the shot embeddings. The timestamps of the best-matching shots are passed to FFmpeg to cut and stitch the video.&lt;/p&gt;
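&lt;p&gt;The cutting step boils down to a single FFmpeg invocation per match. A minimal sketch of how a matched shot's timestamps map to FFmpeg arguments (the function name here is illustrative; the full Lambda version appears in the Director file):&lt;/p&gt;

```python
def build_cut_command(input_path, output_path, start_s, end_s):
    """Map a matched shot's timestamps to an FFmpeg argv list.

    "-c copy" avoids re-encoding, so the cut snaps to the nearest
    keyframe: fast, but boundaries can be off by a fraction of a second.
    """
    duration = end_s - start_s
    return [
        "ffmpeg", "-y",
        "-ss", str(start_s),   # seek to the shot start
        "-i", input_path,
        "-t", str(duration),   # keep only the shot duration
        "-c", "copy",          # stream copy, no re-encode
        output_path,
    ]
```

&lt;p&gt;Dropping "-c copy" in favor of re-encoding gives frame-accurate cuts at the cost of much slower processing.&lt;/p&gt;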

&lt;p&gt;&lt;strong&gt;Load balancing:&lt;/strong&gt; When I uploaded a movie of 1 hr 30 min, Rekognition returned more than 800 shots, and analyzing them one by one took over 40 minutes. So a parallel processing mechanism was implemented with Step Functions, where multiple Lambdas run at the same time with the load distributed across them.&lt;/p&gt;
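&lt;p&gt;The fan-out itself is plain batching: with 800+ shots and 50 shots per worker, the dispatcher hands the Map state roughly 17 batches. A minimal sketch of that split (the real dispatcher, shown later, also attaches the bucket, key, and transcript URI to each batch):&lt;/p&gt;

```python
import math

def split_into_batches(shots, shots_per_worker=50):
    """Split the Rekognition shot list into fixed-size batches,
    one per Step Functions Map iteration."""
    num_batches = math.ceil(len(shots) / shots_per_worker)
    return [
        shots[i * shots_per_worker:(i + 1) * shots_per_worker]
        for i in range(num_batches)
    ]
```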

&lt;p&gt;Let's start the implementation&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Folder Structure:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8efpmfrgnr1ezzoqza9k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8efpmfrgnr1ezzoqza9k.png" alt=" " width="381" height="660"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Module Files:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The requirements file lists all the Python dependencies
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;boto3
opencv-python-headless
numpy&amp;lt;2.0.0
opensearch-py
requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;AWS clients file for initializing the service clients&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
import config

class AWSManager:
    def __init__(self):
        self.session = boto3.Session(region_name=config.AWS_REGION)
        self.s3_client = self.session.client('s3')
        self.rekognition_client = self.session.client('rekognition')
        self.bedrock_client = self.session.client('bedrock-runtime')
        self.transcribe = self.session.client('transcribe')

aws = AWSManager()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Config file for project settings&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AWS_REGION = "us-west-2"
S3_BUCKET_NAME = "bucket"  

BEDROCK_EMBEDDING_MODEL = "amazon.titan-embed-image-v1"

VIDEO_FILENAME = "NightOfTheLivingDead.mp4"        
LOCAL_VIDEO_PATH = "NightOfTheLivingDead.mp4"      
OUTPUT_DB_FILE = "vector_index.json"
OPENSEARCH_HOST = "opensearch-endpoint"  # OpenSearch endpoint, without the https:// prefix
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Video tools file for Rekognition shot detection and frame selection&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import time
import cv2
import os
from .aws_clients import aws

def start_shot_detection(bucket, video_key):
    print(f"  Requesting Shot Detection for: {video_key}...")
    response = aws.rekognition_client.start_segment_detection(
        Video={'S3Object': {'Bucket': bucket, 'Name': video_key}},
        SegmentTypes=['SHOT']
    )
    return response['JobId']

def wait_for_job(job_id):
    print(" Waiting for shot detection to complete...")
    while True:
        status = aws.rekognition_client.get_segment_detection(JobId=job_id)
        s = status['JobStatus']
        if s in ['SUCCEEDED', 'FAILED']:
            return status
        print("  ...still processing...")
        time.sleep(5)

def extract_frames_for_shot(video_path, start_ms, end_ms, max_frames=5):
    """
    Extracts up to 'max_frames' evenly spaced across the shot duration.
    Returns a list of image bytes.
    """
    duration = end_ms - start_ms
    if duration &amp;lt; 1000:
        timestamps = [start_ms + (duration / 2)]
    else:
        step = duration / (max_frames + 1)
        timestamps = [start_ms + step * i for i in range(1, max_frames + 1)]

    frames = []
    cap = cv2.VideoCapture(video_path)

    for ts in timestamps:
        cap.set(cv2.CAP_PROP_POS_MSEC, ts)
        success, frame = cap.read()
        if success:
            _, buffer = cv2.imencode('.jpg', frame)
            frames.append(buffer.tobytes())

    cap.release()
    return frames
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Transcriber file for starting the transcription job and picking the words spoken in a given time range&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# modules/transcriber.py
import time
import json
import urllib.request
from .aws_clients import aws

def start_transcription_job(bucket, video_key, job_name):
    """
    Starts an AWS Transcribe job for the video in S3.
    """
    file_uri = f"s3://{bucket}/{video_key}"
    print(f" Starting Transcription for: {file_uri}")

    try:
        aws.transcribe.start_transcription_job(
            TranscriptionJobName=job_name,
            Media={'MediaFileUri': file_uri},
            MediaFormat='mp4',
            LanguageCode='en-US',
            Settings={'ShowSpeakerLabels': False} 
        )
        return job_name
    except aws.transcribe.exceptions.ConflictException:
        print(f" Job {job_name} already exists. Using existing job.")
        return job_name

def wait_for_job(job_name):
    """
    Polls AWS Transcribe until the job is done.
    """
    print(f" Waiting for Transcription (Job: {job_name})...")
    while True:
        status = aws.transcribe.get_transcription_job(TranscriptionJobName=job_name)
        s = status['TranscriptionJob']['TranscriptionJobStatus']

        if s in ['COMPLETED', 'FAILED']:
            return status['TranscriptionJob']

        print("... transcribing audio ...")
        time.sleep(10)

def get_transcript_text(transcript_uri):
    """
    Downloads the JSON result from AWS and parses it into a clean list of segments.
    """
    print("Downloading Transcript JSON...")
    with urllib.request.urlopen(transcript_uri) as response:
        data = json.loads(response.read().decode())

    items = data['results']['items']

    clean_segments = []
    current_sentence = []
    start_time = 0.0

    for item in items:
        content = item['alternatives'][0]['content']
        item_type = item.get('type')

        if item_type == 'pronunciation':
            if not current_sentence:
                start_time = float(item['start_time'])
            current_sentence.append(content)

        elif item_type == 'punctuation':
            if current_sentence:
                current_sentence[-1] += content

            if content in ['.', '?', '!']:
                full_text = " ".join(current_sentence)
                clean_segments.append({
                    "text": full_text,
                    "start": start_time
                })
                current_sentence = []

    print(f" Parsed {len(clean_segments)} sentences from audio.")
    return clean_segments

def get_text_in_range(segments, start_sec, end_sec):
    """
    Helper to find all text spoken between start_sec and end_sec.
    """
    matched_text = []
    for seg in segments:
        if seg['start'] &amp;gt;= start_sec and seg['start'] &amp;lt; end_sec:
            matched_text.append(seg['text'])

    return " ".join(matched_text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Analyzer file for sending shot-related data to the model for analysis&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# modules/analyzer.py
from .aws_clients import aws

MODEL_ID = "arn:aws:bedrock:us-west-2:accountid:inference-profile/us.amazon.nova-pro-v1:0" 

def analyze_shot(frames, transcript_text):
    """
    Sends multiple images and context text to the LLM.
    Returns a rich text description.
    """

    content_blocks = []

    if transcript_text:
        content_blocks.append({
            "text": f"TRANSCRIPT OF AUDIO IN THIS SCENE:\n'{transcript_text}'\n\n"
        })
    else:
        content_blocks.append({
            "text": "AUDIO TRANSCRIPT: [No dialogue detected]\n\n"
        })

    for i, img_bytes in enumerate(frames):
        content_blocks.append({
            "text": f"Image {i+1}:" 
        })
        content_blocks.append({
            "image": {
                "format": "jpeg", 
                "source": {"bytes": img_bytes}
            }
        })

    prompt = """
    TASK: Analyze this sequence of frames and the audio transcript.
    OUTPUT: A detailed visual and narrative description.

    GUIDELINES:
    1. Describe the ACTION: What is physically happening? (e.g., chasing, fighting, kissing).
    2. Describe the EMOTION: What is the mood? (e.g., tense, joyful).
    3. Incorporate the DIALOGUE: Explain how the words relate to the visual.

    Return ONLY the description paragraph. Do not add headers like "Here is the analysis".
    """
    content_blocks.append({"text": prompt})

    try:
        response = aws.bedrock_client.converse(
            modelId=MODEL_ID,
            messages=[{"role": "user", "content": content_blocks}],
            inferenceConfig={"temperature": 0.1, "maxTokens": 500}
        )
        return response['output']['message']['content'][0]['text']

    except Exception as e:
        print(f" Analysis Error: {e}")
        return "Analysis failed."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Embedder file for generating embeddings&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import config
from modules.aws_clients import aws

def generate_vector_from_text(text_input):
    """
    Generates a vector for the text description using Titan Multimodal.
    """
    if not text_input: return None

    payload = {
        "inputText": text_input
    }

    try:
        response = aws.bedrock_client.invoke_model(
            modelId=config.BEDROCK_EMBEDDING_MODEL,
            contentType="application/json",
            accept="application/json",
            body=json.dumps(payload)
        )
        response_body = json.loads(response.get('body').read())
        return response_body.get('embedding')
    except Exception as e:
        print(f" Embedding Error: {e}")
        return None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Indexer file for pushing analyzed data to OpenSearch&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# modules/indexer.py
from opensearchpy import OpenSearch, RequestsHttpConnection
import config

# Basic auth placeholder -- replace with your domain credentials,
# or switch to AWSV4SignerAuth for IAM-based access
auth = ("username", "password")

client = OpenSearch(
    hosts=[{'host': config.OPENSEARCH_HOST, 'port': 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)

def index_shot(video_key, shot_data):
    """
    Pushes a single analyzed shot to OpenSearch.
    """
    document = {
        "video_id": video_key,
        "shot_id": shot_data['shot_id'],
        "start_time": shot_data['start_ms'] / 1000.0, 
        "end_time": shot_data['end_ms'] / 1000.0,
        "description": shot_data['description'],
        "vector_embedding": shot_data['vector']
    }

    try:
        client.index(index="reelsmith-index", body=document)
        print(f"   Indexed Shot {shot_data['shot_id']} to Cloud.")
    except Exception as e:
        print(f"   Indexing Failed: {e}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that we have the tools ready, let's start building the flow&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create an S3 bucket for storing raw video files and generated video clips&lt;/li&gt;
&lt;li&gt;Create a DynamoDB table for storing the process updates&lt;/li&gt;
&lt;li&gt;Let's build the Dispatcher, Worker, and Finalizer Python files. These will be deployed to Lambda later&lt;/li&gt;
&lt;/ul&gt;
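&lt;p&gt;The DynamoDB table only needs a hash key matching the &lt;code&gt;video_id&lt;/code&gt; attribute the Lambdas write. A sketch with the AWS CLI, assuming the &lt;code&gt;ReelSmith_Jobs&lt;/code&gt; table name used in the code below:&lt;/p&gt;

```shell
# Single hash key on video_id; on-demand billing so idle time costs nothing
aws dynamodb create-table \
  --table-name ReelSmith_Jobs \
  --attribute-definitions AttributeName=video_id,AttributeType=S \
  --key-schema AttributeName=video_id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```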

&lt;p&gt;&lt;strong&gt;Dispatcher file:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This file kicks off shot detection and transcription, then returns the batched shot list along with the video ID&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import boto3
import math
import os
import config
from modules import video_tools, transcriber

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ReelSmith_Jobs')

SHOTS_PER_WORKER = 50 

def lambda_handler(event, context):

    bucket = ""
    key = ""

    if 'detail' in event and 'bucket' in event['detail']:
        bucket = event['detail']['bucket']['name']
        key = event['detail']['object']['key']

    elif 'Records' in event:
        bucket = event['Records'][0]['s3']['bucket']['name']
        key = event['Records'][0]['s3']['object']['key']

    else:
        bucket = config.S3_BUCKET_NAME
        key = event.get('key', '')

    print(f" Received Event for: {key}")

    if not key.startswith("raw/"):
        print(f"SAFETY STOP: File '{key}' is not in 'raw/' folder. Ignoring.")
        return {"status": "Ignored", "reason": "Wrong Folder"}

    video_id = os.path.basename(key)

    print(f"Validation Passed. Processing Video ID: {video_id}")

    table.put_item(Item={
        'video_id': video_id,
        'status': 'ANALYZING_STRUCTURE',
        'timestamp': str(context.aws_request_id)
    })

    job_id = video_tools.start_shot_detection(bucket, key)
    shot_result = video_tools.wait_for_job(job_id)
    shots = [s for s in shot_result['Segments'] if s['Type'] == 'SHOT']

    job_name = f"reelsmith_parallel_{video_id[:10]}"
    transcriber.start_transcription_job(bucket, key, job_name)
    transcribe_result = transcriber.wait_for_job(job_name)
    transcript_uri = transcribe_result['Transcript']['TranscriptFileUri']

    total_shots = len(shots)
    print(f" Found {total_shots} shots. Splitting...")

    batches = []
    num_batches = math.ceil(total_shots / SHOTS_PER_WORKER)

    for i in range(num_batches):
        start = i * SHOTS_PER_WORKER
        end = start + SHOTS_PER_WORKER
        batch_shots = shots[start:end]

        batches.append({
            "bucket": bucket,
            "key": key,
            "batch_id": i,
            "total_batches": num_batches,
            "transcript_uri": transcript_uri,
            "shots": batch_shots
        })

    table.update_item(
        Key={'video_id': video_id},
        UpdateExpression="set #s = :s",
        ExpressionAttributeNames={'#s': 'status'},
        ExpressionAttributeValues={':s': 'PROCESSING_PARALLEL'}
    )

    return {"batches": batches, "video_id": video_id}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Worker file:&lt;/strong&gt;&lt;br&gt;
This file extracts frames for each shot in its batch, combines them with the matching transcript text, sends both to the model for analysis, and indexes the result&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import os
import boto3
from modules import video_tools, transcriber, analyzer, embedder, indexer

s3 = boto3.client('s3')

def lambda_handler(event, context):

    bucket = event['bucket']
    key = event['key']
    batch_id = event['batch_id']
    shots = event['shots']
    transcript_uri = event['transcript_uri']

    print(f" Worker {batch_id}: Processing {len(shots)} shots...")

    local_path = f"/tmp/{os.path.basename(key)}"
    if not os.path.exists(local_path):
        s3.download_file(bucket, key, local_path)

    all_sentences = transcriber.get_transcript_text(transcript_uri)

    for shot in shots:
        start_ms = shot['StartTimestampMillis']
        end_ms = shot['EndTimestampMillis']

        frames = video_tools.extract_frames_for_shot(local_path, start_ms, end_ms, max_frames=3)
        if not frames: continue

        text_context = transcriber.get_text_in_range(all_sentences, start_ms/1000, end_ms/1000)

        description = analyzer.analyze_shot(frames, text_context)

        vector = embedder.generate_vector_from_text(description)

        shot_data = {
            "shot_id": f"{batch_id}_{start_ms}",
            "start_ms": start_ms,
            "end_ms": end_ms,
            "description": description,
            "vector": vector
        }
        indexer.index_shot(key, shot_data)

    return {"status": "success", "batch_id": batch_id}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Finalizer file:&lt;/strong&gt;&lt;br&gt;
This file will update the process state in DynamoDB after the analysis process is completed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ReelSmith_Jobs')

def lambda_handler(event, context):

    video_id = event.get('video_id') 

    if video_id:
        table.update_item(
            Key={'video_id': video_id},
            UpdateExpression="set #s = :s",
            ExpressionAttributeNames={'#s': 'status'},
            ExpressionAttributeValues={':s': 'COMPLETED'}
        )
    return {"status": "Job Completed"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Director file:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This file generates an embedding for the user query, compares it against the indexed shot embeddings, and cuts the video based on the returned timestamps&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import os
import boto3
import subprocess
from botocore.exceptions import ClientError
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
import config
from modules import embedder

s3 = boto3.client('s3')

# Basic auth placeholder -- replace with your domain credentials,
# or switch to AWSV4SignerAuth for IAM-based access
auth = ("username", "password")

client = OpenSearch(
    hosts=[{'host': config.OPENSEARCH_HOST, 'port': 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)

def check_file_exists(bucket, key):
    """Helper to check if file exists in S3 before downloading"""
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError:
        return False

def cut_video_ffmpeg(bucket, video_key, start_s, end_s):
    """
    Downloads video, cuts it, uploads clip, returns Signed URL.
    """
    final_key = video_key

    if not check_file_exists(bucket, final_key):
        print(f" Warning: '{final_key}' not found. Trying 'raw/' prefix...")
        alt_key = f"raw/{video_key}" if not video_key.startswith("raw/") else video_key

        if check_file_exists(bucket, alt_key):
            print(f" Found file at: {alt_key}")
            final_key = alt_key
        else:
            alt_key_2 = video_key.replace("raw/", "")
            if check_file_exists(bucket, alt_key_2):
                final_key = alt_key_2
            else:
                raise FileNotFoundError(f"❌ Critical: Could not find video '{video_key}' in bucket '{bucket}'")

    filename = os.path.basename(final_key)
    local_input = f"/tmp/{filename}"
    local_output = f"/tmp/clip_{filename}"

    if not os.path.exists(local_input): 
        print(f" Downloading source: {final_key}...")
        s3.download_file(bucket, final_key, local_input)

    duration = end_s - start_s
    print(f" Cutting {duration}s from {start_s}s...")

    cmd = [
        "ffmpeg", "-y",
        "-ss", str(start_s),
        "-i", local_input,
        "-t", str(duration),
        "-c", "copy", 
        local_output
    ]
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

    clip_key = f"processed/clip_{int(start_s)}_{filename}"
    s3.upload_file(local_output, bucket, clip_key)

    url = s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': clip_key},
        ExpiresIn=3600
    )

    if os.path.exists(local_output): os.remove(local_output)

    return url

def lambda_handler(event, context):
    print(" Director Agent Started...")

    body = event
    if 'body' in event:
        body = json.loads(event['body'])

    query = body.get('query', '')
    threshold = body.get('threshold', 0.6)

    print(f" Query: '{query}' (Threshold: {threshold})")

    query_vector = embedder.generate_vector_from_text(query)

    search_query = {
        "size": 3,
        "query": {
            "knn": {
                "vector_embedding": {
                    "vector": query_vector,
                    "k": 3
                }
            }
        }
    }

    response = client.search(index="reelsmith-index", body=search_query)
    hits = response['hits']['hits']

    if not hits:
        return {"statusCode": 404, "body": json.dumps("No index data found.")}

    best_hit = hits[0]
    score = best_hit['_score']
    source = best_hit['_source']

    print(f" Best Match: {score:.4f} (Desc: {source['description'][:50]}...)")

    if score &amp;lt; threshold:
        msg = f" Confidence Low ({score:.2f} &amp;lt; {threshold}). No scene found."
        print(msg)
        return {
            "statusCode": 200, 
            "body": json.dumps({"found": False, "message": msg})
        }

    print(" Confidence High. Generating Video...")

    video_key = source.get('video_id') or source.get('video_key')

    try:
        clip_url = cut_video_ffmpeg(
            config.S3_BUCKET_NAME, 
            video_key, 
            source['start_time'], 
            source['end_time']
        )

        return {
            "statusCode": 200,
            "body": json.dumps({
                "found": True,
                "confidence": score,
                "description": source['description'],
                "video_url": clip_url
            })
        }
    except Exception as e:
        print(f" Director Error: {str(e)}")
        return {
            "statusCode": 500,
            "body": json.dumps({"found": False, "message": f"Processing Error: {str(e)}"})
        }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Dockerfile:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Use AWS Lambda Python 3.11 Base Image
FROM public.ecr.aws/lambda/python:3.11

# Update and Install System Dependencies (gcc needed for compiling)
RUN yum update -y &amp;amp;&amp;amp; \
    yum install -y mesa-libGL gcc gcc-c++ python3-devel

# Copy Requirements
COPY requirements.txt ${LAMBDA_TASK_ROOT}

# Install Python Packages
RUN pip install -r requirements.txt

# Copy Application Code
COPY config.py ${LAMBDA_TASK_ROOT}
COPY lambda_dispatcher.py ${LAMBDA_TASK_ROOT}
COPY lambda_worker.py ${LAMBDA_TASK_ROOT}     
COPY lambda_finalizer.py ${LAMBDA_TASK_ROOT}   
COPY lambda_director.py ${LAMBDA_TASK_ROOT}    
COPY modules/ ${LAMBDA_TASK_ROOT}/modules/

# Default CMD (Overridden in Console)
CMD [ "lambda_dispatcher.lambda_handler" ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Build the Docker image from this Dockerfile and push it to ECR&lt;/li&gt;
&lt;li&gt;Create the four Lambda functions from the same image, overriding the image CMD for each like this
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lambda_dispatcher.lambda_handler
lambda_worker.lambda_handler
lambda_finalizer.lambda_handler
lambda_director.lambda_handler

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
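&lt;p&gt;For reference, the build-and-push step is a handful of commands. A sketch assuming a repository named &lt;code&gt;reelsmith&lt;/code&gt; (substitute your own account ID and region):&lt;/p&gt;

```shell
# One-time: create the ECR repository and log Docker in to it
aws ecr create-repository --repository-name reelsmith
aws ecr get-login-password --region us-west-2 | \
  docker login --username AWS --password-stdin accountid.dkr.ecr.us-west-2.amazonaws.com

# Build, tag, and push the image all four Lambdas share
docker build -t reelsmith .
docker tag reelsmith:latest accountid.dkr.ecr.us-west-2.amazonaws.com/reelsmith:latest
docker push accountid.dkr.ecr.us-west-2.amazonaws.com/reelsmith:latest
```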



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhz4lc8qk4v2eavrm5b3u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhz4lc8qk4v2eavrm5b3u.png" alt=" " width="800" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we have everything ready, let's connect all the functions using Step Functions&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visit the Step Functions service and create a state machine with the following definition
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Comment": "Parallel Video Processor",
  "StartAt": "Dispatcher",
  "States": {
    "Dispatcher": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-west-2:accountid:function:lambda-Dispatcher",
      "Next": "ParallelProcessing"
    },
    "ParallelProcessing": {
      "Type": "Map",
      "ItemsPath": "$.batches",
      "MaxConcurrency": 20,
      "Iterator": {
        "StartAt": "ProcessBatch",
        "States": {
          "ProcessBatch": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-west-2:accountid:function:lambda-Worker",
            "End": true
          }
        }
      },
      "ResultPath": "$.workerResults",
      "Next": "MarkComplete"
    },
    "MarkComplete": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-west-2:accountid:function:lambda-Finalizer",
      "End": true
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;It will look like this&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1phfxtkgv8m4hj5lsz4s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1phfxtkgv8m4hj5lsz4s.png" alt=" " width="460" height="673"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We have to connect this state machine to S3 file uploads&lt;/li&gt;
&lt;li&gt;Visit the EventBridge service and create a rule that triggers the state machine whenever a file is uploaded to the bucket&lt;/li&gt;
&lt;/ul&gt;
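&lt;p&gt;One prerequisite that is easy to miss: S3 only delivers Object Created events to EventBridge after EventBridge notifications are enabled on the bucket. With the AWS CLI:&lt;/p&gt;

```shell
# Turn on EventBridge delivery for all events on the bucket
aws s3api put-bucket-notification-configuration \
  --bucket reel-smith-ai \
  --notification-configuration '{"EventBridgeConfiguration": {}}'
```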

&lt;p&gt;&lt;strong&gt;Rule Event Pattern:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": {
      "name": ["reel-smith-ai"]
    },
    "object": {
      "key": [{
        "prefix": "raw/"
      }]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Select the state machine as the rule's target so it receives the file upload notification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38ewrozbcb17b8b7l51k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38ewrozbcb17b8b7l51k.png" alt=" " width="800" height="237"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a domain in the OpenSearch Service and create an index using this script
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from opensearchpy import OpenSearch, RequestsHttpConnection

# --- CONFIG ---
HOST = "endpoint"  # OpenSearch domain endpoint, without the https:// prefix
REGION = "us-west-2"

# Basic auth placeholder -- replace with your domain credentials
username = 'username'
password = 'password'

auth = (username, password)

client = OpenSearch(
    hosts=[{'host': HOST, 'port': 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    pool_maxsize=20
)

index_name = "reelsmith-index"

def create_index():
    index_body = {
        "settings": {
            "index": {
                "knn": True  # Enable Vector Search
            }
        },
        "mappings": {
            "properties": {
                "video_id": {"type": "keyword"},
                "shot_id": {"type": "integer"},
                "start_time": {"type": "float"},
                "end_time": {"type": "float"},
                "description": {"type": "text"},
                "vector_embedding": {
                    "type": "knn_vector",
                    "dimension": 1024  # Titan Multimodal Dimension
                }
            }
        }
    }

    if not client.indices.exists(index=index_name):
        response = client.indices.create(index=index_name, body=index_body)
        print("Index created:", response)
    else:
        print("Index already exists.")

if __name__ == "__main__":
    create_index()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
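&lt;p&gt;At query time, a k-NN search is run against this index. The Director Lambda's code is not shown in this section, but the search body it sends looks roughly like this sketch. The helper name and the dummy zero vector are illustrative; in the real flow the vector is the 1024-dimension Titan Multimodal embedding of the user's text.&lt;/p&gt;

```python
# Hedged sketch of a k-NN search body for the "reelsmith-index" mapping above.
# build_knn_query is a hypothetical helper; "query_vector" is a placeholder for
# the 1024-dim Titan Multimodal embedding of the user's query.
def build_knn_query(query_vector, k=1):
    return {
        "size": k,
        "query": {
            "knn": {
                "vector_embedding": {   # field name from the index mapping
                    "vector": query_vector,
                    "k": k,
                }
            }
        },
        "_source": ["video_id", "shot_id", "start_time", "end_time", "description"],
    }

query_vector = [0.0] * 1024  # dummy embedding, Titan Multimodal dimension
body = build_knn_query(query_vector)
# client.search(index="reelsmith-index", body=body) would return the best-matching shot
```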



&lt;p&gt;Now that everything is ready, let's build the front end with Streamlit and start testing the setup&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streamlit code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import streamlit as st
import boto3
import json
import time
import os
import config

# --- CONFIGURATION ---
# Ensure these match your actual AWS setup details in config.py
DIRECTOR_FUNCTION_NAME = "ReelSmith-Director"
BUCKET_NAME = config.S3_BUCKET_NAME
TABLE_NAME = "ReelSmith_Jobs"

# --- AWS CLIENTS ---
# We use boto3 to talk to the Cloud Backends
s3 = boto3.client('s3', region_name=config.AWS_REGION)
lambda_client = boto3.client('lambda', region_name=config.AWS_REGION)
dynamodb = boto3.resource('dynamodb', region_name=config.AWS_REGION)
table = dynamodb.Table(TABLE_NAME)

# --- PAGE SETUP ---
st.set_page_config(page_title="ReelSmith Cloud", layout="wide", page_icon="🎬")

st.title("🎬 ReelSmith: Cloud AI Director")
st.markdown("""
**Serverless Video Intelligence Platform** *Powered by AWS Step Functions, Nova Premier, and OpenSearch Serverless.*
""")

# Create Tabs for the two Agents
tab1, tab2 = st.tabs(["🔎 Ask the Director", "📤 Upload New Footage"])

# ==========================================
# TAB 1: THE DIRECTOR AGENT (Retrieval)
# ==========================================
with tab1:
    st.header("Ask the Director")
    st.caption("Agent 2: Performs Semantic Search &amp;amp; Real-time Editing")

    col1, col2 = st.columns([3, 1])
    with col1:
        query = st.text_input("Describe the scene:", placeholder="e.g., A truck exploding in flames")
    with col2:
        threshold = st.slider("Confidence Threshold", 0.0, 1.0, 0.60, help="Only show results if the AI is this confident.")

    if st.button("🎬 Action!", type="primary"):
        if not query:
            st.warning("Please enter a scene description.")
        else:
            # 1. VISUAL FEEDBACK
            with st.status("🧠 Agent is thinking...", expanded=True) as status:
                st.write("📡 Contacting AWS Lambda Director...")

                # 2. INVOKE LAMBDA (Synchronous)
                payload = {"query": query, "threshold": threshold}

                try:
                    response = lambda_client.invoke(
                        FunctionName=DIRECTOR_FUNCTION_NAME,
                        InvocationType='RequestResponse',
                        Payload=json.dumps(payload)
                    )

                    # 3. PARSE RESPONSE
                    response_payload = json.loads(response['Payload'].read())

                    # Handle Lambda System Errors (500s)
                    if 'body' not in response_payload:
                        st.error(f"Lambda System Error: {response_payload}")
                        status.update(label="System Error", state="error")
                    else:
                        body = json.loads(response_payload['body'])

                        if body.get("found"):
                            # SUCCESS PATH
                            st.write(f"✅ **Match Found!** (Confidence: {body['confidence']:.2f})")
                            st.write("✂️  Cutting video in the cloud (FFmpeg)...")

                            # Update Status
                            status.update(label="Video Ready!", state="complete", expanded=False)

                            # 4. DISPLAY RESULTS
                            st.divider()
                            st.success(f"**Scene Context:** {body['description']}")
                            st.video(body['video_url'])

                        else:
                            # FAILURE PATH (Confidence Low or No Match)
                            status.update(label="No Scene Found", state="error", expanded=False)
                            st.error(f"⛔ Agent Response: {body.get('message', 'Unknown error')}")

                except Exception as e:
                    status.update(label="Connection Failed", state="error")
                    st.error(f"Client Error: {e}")

# ==========================================
# TAB 2: THE ANALYST AGENT (Ingestion)
# ==========================================
with tab2:
    st.header("Ingest New Footage")
    st.caption("Agent 1: Watches, Transcribes, and Indexes Video into OpenSearch")

    uploaded_file = st.file_uploader("Choose an MP4 file", type=["mp4"])

    if uploaded_file is not None:
        if st.button("🚀 Upload &amp;amp; Analyze"):
            video_id = uploaded_file.name

            # 1. UPLOAD TO S3 (The Trigger)
            with st.spinner("Uploading to S3..."):
                # CRITICAL: We upload to 'raw/' folder to trigger EventBridge
                s3_key = f"raw/{video_id}"

                # Save to temp locally
                temp_path = f"temp_{video_id}"
                with open(temp_path, "wb") as f:
                    f.write(uploaded_file.getbuffer())

                # Upload
                s3.upload_file(temp_path, BUCKET_NAME, s3_key)

                # Cleanup
                os.remove(temp_path)

            st.success("Upload Complete! EventBridge has triggered the Step Function.")

            # 2. POLL DYNAMODB FOR STATUS
            st.markdown("### 📡 Job Status")
            progress_bar = st.progress(0)
            status_text = st.empty()

            # Polling Loop
            while True:
                try:
                    # Get Item from DynamoDB
                    response = table.get_item(Key={'video_id': video_id})
                except Exception as e:
                    st.error(f"DB Error: {e}")
                    break

                # Case A: Job hasn't started yet (EventBridge lag)
                if 'Item' not in response:
                    status_text.info("⏳ Queued... Waiting for Dispatcher...")
                    time.sleep(3)
                    continue

                # Case B: Check Status
                job = response['Item']
                status = job.get('status', 'UNKNOWN')

                if status == 'COMPLETED':
                    progress_bar.progress(100)
                    status_text.success("✅ Analysis Complete! You can now search for this video in Tab 1.")
                    st.balloons()
                    break

                elif status == 'ANALYZING_STRUCTURE':
                    progress_bar.progress(10)
                    status_text.text("👀 Phase 1: Detecting shots and transcribing audio...")

                elif status == 'PROCESSING_PARALLEL':
                    progress_bar.progress(50)
                    status_text.text("⚡ Phase 2: Parallel Agents analyzing shots (Distributed Map)...")

                elif status == 'FAILED':
                    status_text.error("❌ Analysis Failed. Check CloudWatch logs.")
                    break

                # Wait before next poll
                time.sleep(5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
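&lt;p&gt;The app imports a config module for the bucket and region. That file is not shown in the post, so here is a minimal placeholder sketch of what it is assumed to contain; replace the values with your own:&lt;/p&gt;

```python
# config.py - placeholder values assumed by the Streamlit app above.
# Swap these for your own bucket name and region before running.
S3_BUCKET_NAME = "reel-smith-ai"  # the bucket watched by the EventBridge rule
AWS_REGION = "us-west-2"          # region hosting the Lambdas and DynamoDB table
```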



&lt;ul&gt;
&lt;li&gt;Run the Streamlit app using this command
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;streamlit run app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I downloaded a movie clip from YouTube and tested the process&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Uploading Process:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flqufwkwhwj4w25s1k5vi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flqufwkwhwj4w25s1k5vi.png" alt=" " width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpgm6c8byyuurfz3lm2ug.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpgm6c8byyuurfz3lm2ug.png" alt=" " width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1qdz7mo3mky4pnm1bg4f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1qdz7mo3mky4pnm1bg4f.png" alt=" " width="800" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Based on the input video, I gave this prompt, and it worked. The cut is not perfect, but it proves the pipeline. Accuracy can always be improved by tuning the ingestion and analysis stages.&lt;/p&gt;

&lt;p&gt;Here is the output video&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6895ymb6mwscvd8z0nmd.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6895ymb6mwscvd8z0nmd.gif" alt="outputvideo" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In case the GIF does not render fully, visit this Google Drive link to see the output&lt;br&gt;
&lt;a href="https://drive.google.com/file/d/1dBeMWEUo2UrzPqWTQM6FoHEsSZZBF-6J/view?usp=sharing" rel="noopener noreferrer"&gt;output.gif&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I am still working on improving accuracy and reducing hallucinations, and I am open to suggestions. Please share your thoughts in the comments.&lt;/p&gt;

&lt;p&gt;Thanks for reading.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>bedrock</category>
      <category>agents</category>
      <category>aws</category>
    </item>
    <item>
      <title>How I Built a Video Memory Agent using AWS Bedrock and OpenSearch</title>
      <dc:creator>Salam Shaik</dc:creator>
      <pubDate>Tue, 23 Dec 2025 06:24:36 +0000</pubDate>
      <link>https://dev.to/aws-builders/how-i-built-a-video-memory-agent-using-aws-bedrock-and-opensearch-eck</link>
      <guid>https://dev.to/aws-builders/how-i-built-a-video-memory-agent-using-aws-bedrock-and-opensearch-eck</guid>
      <description>&lt;p&gt;Hi everyone,&lt;/p&gt;

&lt;p&gt;Searching on OTT platforms is often frustrating. While metadata-based search works for titles, it fails when you search for specific moments — like describing a scene or asking, ‘When does the hero cry?’ You simply can’t get that level of detail from metadata alone.&lt;/p&gt;

&lt;p&gt;My solution was to build an agent that actually ‘watches’ the movie. By analyzing video frames and transcriptions to create a semantic memory, we can achieve far greater accuracy and unlock entirely new ways to interact with video content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS Services I used:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Bedrock for the Titan and Nova Premier models, and for agent creation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OpenSearch Service for storing the video data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AWS Transcribe Service&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lambda&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;S3 for storing raw video and video frames.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;High-Level Data Flow:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4152%2F1%2A7oCGmg86kEDMG3Bhp4qESQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4152%2F1%2A7oCGmg86kEDMG3Bhp4qESQ.png" width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let me divide this article into different sections:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Building the Ingestion layer&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Building the Memory layer&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Building the Reasoning layer&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Building the Interface layer&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Building the Ingestion layer:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For this experiment, I downloaded the movie &lt;strong&gt;Night of the Living Dead (1968)&lt;/strong&gt;, which is in the public domain&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upload the movie to an S3 bucket&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Visit the AWS Transcribe service and create a transcription job&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Give a name for the job and keep the General model only&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AeZbhZ8swsDGpmlSOW8KrKw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AeZbhZ8swsDGpmlSOW8KrKw.png" width="800" height="688"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Select the stored movie as input&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select .srt as output format&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2062%2F1%2AXbYEkYVGsYL7qfCAM8djUQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2062%2F1%2AXbYEkYVGsYL7qfCAM8djUQ.png" width="800" height="660"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Click Next, keep the other options as they are, and create the job&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the job is finished, you can see the output SRT file in the bucket you mentioned&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Now we have both the movie file and the transcription ready&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Let’s analyze the video and store that information as JSON files in S3 for further processing&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import argparse
import io
import json
import math
import os
from datetime import timedelta

import boto3
from moviepy import VideoFileClip
from PIL import Image
import srt


# -----------------------------
# S3 helpers
# -----------------------------
def parse_s3_uri(uri: str):
    """
    Parse an s3://bucket/key URI into (bucket, key).
    """
    if not uri.startswith("s3://"):
        raise ValueError(f"Invalid S3 URI: {uri}")
    without_scheme = uri[5:]
    parts = without_scheme.split("/", 1)
    if len(parts) != 2:
        raise ValueError(f"Invalid S3 URI (missing key): {uri}")
    bucket, key = parts
    return bucket, key


def download_s3_object(s3_client, s3_uri: str, local_path: str):
    bucket, key = parse_s3_uri(s3_uri)
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    print(f"Downloading {s3_uri} -&amp;gt; {local_path}")
    s3_client.download_file(bucket, key, local_path)
    return local_path


def upload_json_to_s3(s3_client, s3_uri_prefix: str, filename: str, data: dict):
    """
    Upload a JSON dict as a file under a given s3://bucket/prefix.
    """
    bucket, prefix = parse_s3_uri(s3_uri_prefix)
    key = prefix.rstrip("/") + "/" + filename
    body = json.dumps(data, ensure_ascii=False).encode("utf-8")
    print(f"Uploading scene doc -&amp;gt; s3://{bucket}/{key}")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)


# -----------------------------
# SRT helpers
# -----------------------------
def load_srt_subtitles(srt_path: str):
    """
    Load SRT file and return a list of srt.Subtitle objects.
    """
    with open(srt_path, "r", encoding="utf-8") as f:
        content = f.read()
    subtitles = list(srt.parse(content))
    return subtitles


def get_text_for_range(subtitles, start_sec: float, end_sec: float) -&amp;gt; str:
    """
    Get concatenated subtitle text for [start_sec, end_sec) range.
    Includes any subtitle that overlaps this time range.
    """
    texts = []
    for sub in subtitles:
        sub_start = sub.start.total_seconds()
        sub_end = sub.end.total_seconds()
        # Overlap check
        if sub_end &amp;lt;= start_sec or sub_start &amp;gt;= end_sec:
            continue
        # Clean line breaks
        texts.append(sub.content.replace("\n", " "))
    return " ".join(texts).strip()


# -----------------------------
# Video frame extraction helpers
# -----------------------------
def extract_minute_frame_bytes(
    clip: VideoFileClip,
    minute_index: int,
    frames_per_minute: int = 5,
    image_format: str = "jpeg",
):
    """
    For a given minute index, extract `frames_per_minute` frames
    as raw image bytes (JPEG by default).
    Returns a list of bytes objects.
    """
    duration_sec = clip.duration
    start = minute_index * 60.0
    if start &amp;gt;= duration_sec:
        return []

    end = min(start + 60.0, duration_sec)
    window = end - start
    if window &amp;lt;= 0:
        return []

    # Sample timestamps evenly inside the minute window.
    # Use (frames_per_minute + 1) so we don't hit exact edges.
    step = window / (frames_per_minute + 1)
    timestamps = [start + step * (i + 1) for i in range(frames_per_minute)]

    images_bytes = []
    for t in timestamps:
        # Clip to duration just in case of rounding
        t = min(t, duration_sec - 0.01)
        frame = clip.get_frame(t)  # numpy array (H, W, 3)

        # Convert numpy array to JPEG bytes using Pillow
        pil_img = Image.fromarray(frame)
        buf = io.BytesIO()
        pil_img.save(buf, format=image_format.upper())
        buf.seek(0)
        images_bytes.append(buf.read())

    return images_bytes


def format_timestamp(seconds: float) -&amp;gt; str:
    """
    Format seconds as HH:MM:SS (floor).
    """
    return str(timedelta(seconds=int(seconds)))


# -----------------------------
# Bedrock (Nova Premier) helper
# -----------------------------
def call_nova_premier(
    bedrock_client,
    model_id: str,
    scene_text: str,
    frame_bytes_list,
    minute_index: int,
):
    """
    Call Amazon Nova Premier via the Bedrock Converse API with:
      - scene_text (subtitles for this minute)
      - up to N images (frames) as bytes

    Returns a structured dict with:
      scene_summary, characters, locations, emotions,
      relationships, topics, visual_tags, important_events
    """

    system_prompt = (
        "You are a precise video scene analyst. "
        "You receive up to 5 frames from a one-minute video segment "
        "plus the dialogue/subtitles text for the same time range.\n"
        "Your task is to return a STRICT JSON object with this exact schema:\n\n"
        "{\n"
        '  \"scene_summary\": \"...\",\n'
        "  \"characters\": [\"...\"],\n"
        "  \"locations\": [\"...\"],\n"
        "  \"emotions\": [\"...\"],\n"
        "  \"relationships\": [\"...\"],\n"
        "  \"topics\": [\"...\"],\n"
        "  \"visual_tags\": [\"...\"],\n"
        "  \"important_events\": [\"...\"]\n"
        "}\n\n"
        "Rules:\n"
        "- Only output JSON, nothing else.\n"
        "- If a field is unknown, use an empty list [] or a short generic summary.\n"
        "- Keep lists reasonably short and focused."
    )

    user_text = (
        f"This is minute {minute_index} of the video.\n\n"
        f"Subtitles for this minute:\n{scene_text or '[No subtitles in this range]'}\n\n"
        "Use the attached frames and text together to analyze this specific one-minute scene.\n"
        "Return ONLY the JSON object as specified."
    )

    # Build message content: first text, then each image as a separate block
    content_blocks = [{"text": user_text}]

    for img_bytes in frame_bytes_list:
        content_blocks.append(
            {
                "image": {
                    "format": "jpeg",
                    "source": {
                        "bytes": img_bytes
                    },
                }
            }
        )

    messages = [
        {
            "role": "user",
            "content": content_blocks,
        }
    ]

    try:
        response = bedrock_client.converse(
            modelId=model_id,
            system=[{"text": system_prompt}],
            messages=messages,
            inferenceConfig={
                "maxTokens": 512,
                "temperature": 0.2,
                "topP": 0.9,
            },
        )

        output_message = response["output"]["message"]
        raw_text = output_message["content"][0]["text"]

        # Sometimes models may wrap JSON with text; try to extract JSON substring
        raw_text = raw_text.strip()
        try:
            # Try direct parse first
            scene_info = json.loads(raw_text)
        except json.JSONDecodeError:
            # Fallback: find first '{' and last '}' and parse that
            start = raw_text.find("{")
            end = raw_text.rfind("}")
            if start != -1 and end != -1 and end &amp;gt; start:
                json_str = raw_text[start : end + 1]
                scene_info = json.loads(json_str)
            else:
                raise

        # Ensure all expected keys exist
        default_scene = {
            "scene_summary": "",
            "characters": [],
            "locations": [],
            "emotions": [],
            "relationships": [],
            "topics": [],
            "visual_tags": [],
            "important_events": [],
        }
        default_scene.update(scene_info or {})
        return default_scene

    except Exception as e:
        print(f"[ERROR] Bedrock call or JSON parsing failed for minute {minute_index}: {e}")
        return {
            "scene_summary": "",
            "characters": [],
            "locations": [],
            "emotions": [],
            "relationships": [],
            "topics": [],
            "visual_tags": [],
            "important_events": [],
        }


# -----------------------------
# Main orchestration
# -----------------------------
def analyze_video_from_s3(
    video_s3_uri: str,
    srt_s3_uri: str,
    region: str,
    model_id: str,
    frames_per_minute: int,
    output_s3_prefix: str,
    video_id: str,
    episode_title: str = "",
    season: int | None = None,
    episode: int | None = None,
):
    """
    1. Download video and SRT from S3.
    2. Parse duration and subtitles.
    3. For each minute:
       - sample frames
       - gather subtitle text
       - call Nova Premier for structured analysis
       - build scene_doc and upload to S3
    """

    session = boto3.Session(region_name=region)
    s3_client = session.client("s3")
    bedrock_client = session.client("bedrock-runtime")

    # Local temp paths
    tmp_dir = "./tmp_video_analysis"
    os.makedirs(tmp_dir, exist_ok=True)
    video_path = os.path.join(tmp_dir, "video_input.mp4")
    srt_path = os.path.join(tmp_dir, "subtitles.srt")

    # Download from S3
    download_s3_object(s3_client, video_s3_uri, video_path)
    download_s3_object(s3_client, srt_s3_uri, srt_path)

    # Load subtitles
    subtitles = load_srt_subtitles(srt_path)

    # Load video and get duration
    clip = VideoFileClip(video_path)
    duration_sec = clip.duration
    num_minutes = math.ceil(duration_sec / 60.0)

    print(f"Video duration: {duration_sec:.2f} seconds (~{num_minutes} minutes)\n")

    try:
        for minute_index in range(num_minutes):
            start_sec = minute_index * 60.0
            end_sec = min((minute_index + 1) * 60.0, duration_sec)

            print(f"Processing minute {minute_index} [{start_sec:.1f}s - {end_sec:.1f}s]")

            # 1) Extract frames for this minute
            frames = extract_minute_frame_bytes(
                clip,
                minute_index,
                frames_per_minute=frames_per_minute,
                image_format="jpeg",
            )

            if not frames:
                print(f"  No frames extracted for minute {minute_index}, skipping scene doc.")
                continue

            # 2) Extract subtitles text for this minute
            scene_text = get_text_for_range(subtitles, start_sec, end_sec)

            # 3) Call Nova Premier for structured scene info
            scene_info = call_nova_premier(
                bedrock_client=bedrock_client,
                model_id=model_id,
                scene_text=scene_text,
                frame_bytes_list=frames,
                minute_index=minute_index,
            )

            # 4) Build scene_doc
            scene_id = f"{video_id}_m{minute_index:04d}"
            scene_doc = {
                "video_id": video_id,
                "episode_title": episode_title,
                "season": season,
                "episode": episode,
                "scene_id": scene_id,
                "start_sec": start_sec,
                "end_sec": end_sec,
                "timestamp_label": f"{format_timestamp(start_sec)} - {format_timestamp(end_sec)}",
                "transcript": scene_text,
                "nova_scene_summary": scene_info.get("scene_summary", ""),
                "characters": scene_info.get("characters", []),
                "locations": scene_info.get("locations", []),
                "emotions": scene_info.get("emotions", []),
                "relationships": scene_info.get("relationships", []),
                "topics": scene_info.get("topics", []),
                "visual_tags": scene_info.get("visual_tags", []),
                "important_events": scene_info.get("important_events", []),
            }

            # 5) Upload scene_doc JSON to S3
            filename = f"{scene_id}.json"
            upload_json_to_s3(s3_client, output_s3_prefix, filename, scene_doc)

            print(f"  Scene doc created: {scene_id}")

    finally:
        clip.close()


def main():
    parser = argparse.ArgumentParser(
        description=(
            "Download a video from S3, sample frames per minute, "
            "combine with SRT text, analyze with Amazon Nova Premier, "
            "and upload per-minute scene docs to S3."
        )
    )
    parser.add_argument(
        "--video-s3",
        required=True,
        help="S3 URI of the video file. Example: s3://my-bucket/path/video.mp4",
    )
    parser.add_argument(
        "--srt-s3",
        required=True,
        help="S3 URI of the SRT subtitle file. Example: s3://my-bucket/path/video.srt",
    )
    parser.add_argument(
        "--output-s3-prefix",
        required=True,
        help=(
            "S3 URI prefix where scene docs will be stored. "
            "Example: s3://my-bucket/3netra/FRIENDS_S01E01"
        ),
    )
    parser.add_argument(
        "--video-id",
        required=True,
        help="Logical video ID (e.g., FRIENDS_S01E01). Used in scene_id and metadata.",
    )
    parser.add_argument(
        "--episode-title",
        default="",
        help="Optional episode title for metadata.",
    )
    parser.add_argument(
        "--season",
        type=int,
        default=None,
        help="Optional season number for metadata.",
    )
    parser.add_argument(
        "--episode",
        type=int,
        default=None,
        help="Optional episode number for metadata.",
    )
    parser.add_argument(
        "--region",
        default="us-east-1",
        help="AWS Region where Bedrock is available (default: us-east-1).",
    )
    parser.add_argument(
        "--model-id",
        default="amazon.nova-premier-v1:0",
        help="Bedrock model ID for Nova Premier (default: amazon.nova-premier-v1:0).",
    )
    parser.add_argument(
        "--frames-per-minute",
        type=int,
        default=5,
        help="How many frames to sample per minute window (default: 5).",
    )

    args = parser.parse_args()

    analyze_video_from_s3(
        video_s3_uri=args.video_s3,
        srt_s3_uri=args.srt_s3,
        region=args.region,
        model_id=args.model_id,
        frames_per_minute=args.frames_per_minute,
        output_s3_prefix=args.output_s3_prefix,
        video_id=args.video_id,
        episode_title=args.episode_title,
        season=args.season,
        episode=args.episode,
    )


if __name__ == "__main__":
    main()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;This script downloads the video from S3, extracts frames for every minute, combines those frames with the dialogue from the SRT file in the same window, and sends them to the Nova Premier model for analysis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Those JSON files will be dumped to the S3 bucket.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This ends the ingestion module. Let’s work on the memory module&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
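&lt;p&gt;To make the windowing step concrete, here is a minimal, hypothetical sketch of how SRT dialogue can be bucketed into one-minute windows (function names are illustrative; the full script above also handles frame extraction and the Bedrock call):&lt;/p&gt;

```python
import re

def srt_time_to_seconds(ts):
    """Convert an SRT timestamp like '00:01:23,456' to seconds."""
    h, m, s = ts.replace(",", ".").split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

def group_dialogue_by_minute(srt_text):
    """Bucket SRT dialogue lines into one-minute windows keyed by minute index."""
    # Matches each cue's start/end timestamps followed by its dialogue text.
    block = re.compile(
        r"(\d{2}:\d{2}:\d{2},\d{3}) \S+ (\d{2}:\d{2}:\d{2},\d{3})\n(.+?)(?:\n\n|\Z)",
        re.S,
    )
    windows = {}
    for start, _end, text in block.findall(srt_text):
        minute = int(srt_time_to_seconds(start) // 60)
        windows.setdefault(minute, []).append(" ".join(text.split()))
    return windows
```

&lt;p&gt;Each window’s dialogue is then paired with the frames sampled from the same minute before being sent to the model.&lt;/p&gt;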

&lt;h2&gt;
  
  
  &lt;strong&gt;Building the Memory layer:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit the OpenSearch Service from the AWS console&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a domain with 1 AZ, without standby, and a t3.large instance for the dev/test environment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the domain is up and running, open the OpenSearch Dashboards and create an index with this schema in Dev Tools from the side menu&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; PUT video_scenes
    {
      "settings": {
        "index": {
          "knn": true
        }
      },
      "mappings": {
        "properties": {
          "video_id":        { "type": "keyword" },
          "scene_id":        { "type": "keyword" },
          "episode_title":   { "type": "text" },
          "season":          { "type": "integer" },
          "episode":         { "type": "integer" },

          "start_sec":       { "type": "integer" },
          "end_sec":         { "type": "integer" },
          "timestamp_label": { "type": "keyword" },

          "transcript":       { "type": "text" },
          "nova_scene_summary": { "type": "text" },

          "characters":      { "type": "keyword" },
          "locations":       { "type": "keyword" },
          "emotions":        { "type": "keyword" },
          "relationships":   { "type": "text" },
          "topics":          { "type": "keyword" },
          "visual_tags":     { "type": "keyword" },
          "important_events":{ "type": "text" },

          "embedding": {
            "type": "knn_vector",
            "dimension": 1024,
            "method": {
              "name": "hnsw",
              "space_type": "cosinesimil",
              "engine": "lucene"
            }
          }
        }
      }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Use the script below to generate embeddings for the JSON files and dump them into this index
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import argparse
    import json
    import os

    import boto3
    from botocore.exceptions import ClientError
    from opensearchpy import OpenSearch, RequestsHttpConnection
    from requests_aws4auth import AWS4Auth


    # -------------- S3 helpers --------------
    def parse_s3_uri(uri: str):
        if not uri.startswith("s3://"):
            raise ValueError(f"Invalid S3 URI: {uri}")
        without = uri[5:]
        parts = without.split("/", 1)
        if len(parts) != 2:
            raise ValueError(f"Invalid S3 URI (missing key/prefix): {uri}")
        return parts[0], parts[1]


    def list_s3_json_objects(s3_client, s3_prefix_uri: str):
        """
        List all JSON objects under the given s3://bucket/prefix
        (this is where your scene_docs are stored).
        """
        bucket, prefix = parse_s3_uri(s3_prefix_uri)
        paginator = s3_client.get_paginator("list_objects_v2")

        for page in paginator.paginate(Bucket=bucket, Prefix=prefix.rstrip("/") + "/"):
            for obj in page.get("Contents", []):
                key = obj["Key"]
                if key.lower().endswith(".json"):
                    yield bucket, key


    def get_s3_json(s3_client, bucket: str, key: str) -&amp;gt; dict:
        resp = s3_client.get_object(Bucket=bucket, Key=key)
        body = resp["Body"].read().decode("utf-8")
        return json.loads(body)


    # -------------- Bedrock (Titan Embeddings) --------------
    def get_titan_embedding(bedrock_client, model_id: str, text: str, dimensions: int = 1024):
        """
        Call Titan Text Embeddings v2 to get an embedding vector.
        Request body: inputText plus an optional dimensions field (v2 only).
        """
        if not text:
            text = " "  # Titan requires a non-empty string

        body = json.dumps(
            {
                "inputText": text,
                "dimensions": dimensions,
            }
        )

        resp = bedrock_client.invoke_model(
            modelId=model_id,
            body=body,
            contentType="application/json",
            accept="application/json",
        )
        resp_body = json.loads(resp["body"].read())
        # titan-embed-text-v2 response: {"embedding": [...], "inputTextTokenCount": ...}
        embedding = resp_body["embedding"]
        return embedding


    def build_embedding_text(scene_doc: dict) -&amp;gt; str:
        """
        Concatenate important fields into a single text for embeddings.
        """
        parts = []

        summary = scene_doc.get("nova_scene_summary") or scene_doc.get("scene_summary") or ""
        transcript = scene_doc.get("transcript", "")
        chars = ", ".join(scene_doc.get("characters", []))
        rels = "; ".join(scene_doc.get("relationships", []))
        topics = ", ".join(scene_doc.get("topics", []))
        emotions = ", ".join(scene_doc.get("emotions", []))
        visual_tags = ", ".join(scene_doc.get("visual_tags", []))

        if summary:
            parts.append("[Summary] " + summary)
        if transcript:
            parts.append("[Transcript] " + transcript)
        if chars:
            parts.append("[Characters] " + chars)
        if rels:
            parts.append("[Relationships] " + rels)
        if topics:
            parts.append("[Topics] " + topics)
        if emotions:
            parts.append("[Emotions] " + emotions)
        if visual_tags:
            parts.append("[Visual tags] " + visual_tags)

        return "\n".join(parts)


    # -------------- OpenSearch client --------------
    def create_opensearch_client(
        region: str,
        endpoint: str,
        service: str = "aoss",
        username: str | None = None,
        password: str | None = None,
    ):
        """
        Create an OpenSearch client.

        Authentication modes:
          - If `username` and `password` are provided, use HTTP Basic auth.
          - Otherwise, fall back to SigV4 (AWS) auth for Serverless or classic domains.

        service:
          - 'aoss' for OpenSearch Serverless
          - 'es'   for classic OpenSearch domains
        """
        host = endpoint.replace("https://", "").replace("http://", "")

        if username is not None and password is not None:
            # Use HTTP basic auth (username/password)
            http_auth = (username, password)
        else:
            # Fall back to AWS SigV4 auth
            session = boto3.Session(region_name=region)
            credentials = session.get_credentials()
            awsauth = AWS4Auth(
                credentials.access_key,
                credentials.secret_key,
                region,
                service,
                session_token=credentials.token,
            )
            http_auth = awsauth

        client = OpenSearch(
            hosts=[{"host": host, "port": 443}],
            http_auth=http_auth,
            use_ssl=True,
            verify_certs=True,
            connection_class=RequestsHttpConnection,
        )
        return client


    def index_scene_doc(os_client, index_name: str, doc: dict):
        """
        Index a single scene_doc into OpenSearch.
        """
        scene_id = doc["scene_id"]
        resp = os_client.index(index=index_name, id=scene_id, body=doc, refresh=False)
        return resp


    # -------------- Main indexing flow --------------
    def index_scenes_to_opensearch(
        scene_s3_prefix: str,
        region: str,
        embed_model_id: str,
        os_endpoint: str,
        os_index: str,
        os_service: str = "aoss",
        os_username: str | None = None,
        os_password: str | None = None,
        embedding_dim: int = 1024,
    ):
        session = boto3.Session(region_name=region)
        s3_client = session.client("s3")
        bedrock_client = session.client("bedrock-runtime")
        os_client = create_opensearch_client(
            region, os_endpoint, service=os_service, username=os_username, password=os_password
        )

        for bucket, key in list_s3_json_objects(s3_client, scene_s3_prefix):
            print(f"Processing s3://{bucket}/{key}")
            scene_doc = get_s3_json(s3_client, bucket, key)

            # Build text and embedding
            embed_text = build_embedding_text(scene_doc)
            embedding = get_titan_embedding(
                bedrock_client, embed_model_id, embed_text, dimensions=embedding_dim
            )

            # Attach embedding field
            scene_doc["embedding"] = embedding

            # Index into OpenSearch
            resp = index_scene_doc(os_client, os_index, scene_doc)
            result = resp.get("result", "unknown")
            print(f"  Indexed scene_id={scene_doc.get('scene_id')} result={result}")


    def main():
        parser = argparse.ArgumentParser(
            description=(
                "Index 3Netra scene docs from S3 into OpenSearch using Titan embeddings."
            )
        )
        parser.add_argument(
            "--scene-s3-prefix",
            required=True,
            help=(
                "S3 URI prefix where scene JSONs are stored. "
                "Example: s3://my-bucket/3netra/FRIENDS_S01E01"
            ),
        )
        parser.add_argument(
            "--region",
            default="us-east-1",
            help="AWS Region for Bedrock, S3, and OpenSearch (default: us-east-1).",
        )
        parser.add_argument(
            "--embed-model-id",
            default="amazon.titan-embed-text-v2:0",
            help="Titan embeddings model ID (default: amazon.titan-embed-text-v2:0).",
        )
        parser.add_argument(
            "--os-endpoint",
            required=True,
            help=(
                "OpenSearch HTTPS endpoint (no index). "
                "Example: https://abc123.us-east-1.aoss.amazonaws.com"
            ),
        )
        parser.add_argument(
            "--os-index",
            default="video_scenes",
            help="OpenSearch index name (default: video_scenes).",
        )
        parser.add_argument(
            "--os-service",
            default="aoss",
            help="SigV4 service name: 'aoss' for Serverless, 'es' for domains (default: aoss).",
        )
        parser.add_argument(
            "--os-username",
            help=(
                "OpenSearch basic auth username (optional). "
                "If not provided, the script will read `OS_USERNAME` or `OPENSEARCH_USERNAME` env vars."
            ),
        )
        parser.add_argument(
            "--os-password",
            help=(
                "OpenSearch basic auth password (optional). "
                "If not provided, the script will read `OS_PASSWORD` or `OPENSEARCH_PASSWORD` env vars."
            ),
        )
        parser.add_argument(
            "--embedding-dim",
            type=int,
            default=1024,
            help="Embedding dimension (must match index mapping, default: 1024).",
        )

        args = parser.parse_args()

        # Accept username/password from CLI args or environment variables.
        os_username = (
            args.os_username
            or os.environ.get("OS_USERNAME")
            or os.environ.get("OPENSEARCH_USERNAME")
        )
        os_password = (
            args.os_password
            or os.environ.get("OS_PASSWORD")
            or os.environ.get("OPENSEARCH_PASSWORD")
        )

        index_scenes_to_opensearch(
            scene_s3_prefix=args.scene_s3_prefix,
            region=args.region,
            embed_model_id=args.embed_model_id,
            os_endpoint=args.os_endpoint,
            os_index=args.os_index,
            os_service=args.os_service,
            os_username=os_username,
            os_password=os_password,
            embedding_dim=args.embedding_dim,
        )


    if __name__ == "__main__":
        main()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;You can verify the indexed data through the Discover section or the Query Workbench from the side menu in the OpenSearch Dashboards&lt;/li&gt;
&lt;/ul&gt;
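&lt;p&gt;As a quick sanity check in Dev Tools, a match_all query like the following (a sketch; adjust the index name if yours differs) shows a couple of the indexed documents:&lt;/p&gt;

```
GET video_scenes/_search
{
  "size": 2,
  "_source": ["scene_id", "timestamp_label", "nova_scene_summary"],
  "query": { "match_all": {} }
}
```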

&lt;h2&gt;
  
  
  Building the Reasoning Layer:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Create a Lambda function with the following code
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
    import boto3
    import os
    from opensearchpy import OpenSearch, RequestsHttpConnection

    # --- Configuration ---
    # Read these from Lambda Environment Variables instead of hardcoding credentials
    OPENSEARCH_HOST = os.environ.get('OPENSEARCH_HOST', '')
    OPENSEARCH_USER = os.environ.get('OPENSEARCH_USER', '')
    OPENSEARCH_PASS = os.environ.get('OPENSEARCH_PASS', '')
    REGION = os.environ.get('AWS_REGION', 'us-west-2')

    # --- Clients ---
    bedrock_runtime = boto3.client('bedrock-runtime', region_name=REGION)

    os_client = OpenSearch(
        hosts=[{'host': OPENSEARCH_HOST, 'port': 443}],
        http_auth=(OPENSEARCH_USER, OPENSEARCH_PASS),
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection
    )

    def get_embedding(text):
        """Generates vector embedding using Titan v2"""
        body = json.dumps({"inputText": text})
        response = bedrock_runtime.invoke_model(
            modelId="amazon.titan-embed-text-v2:0",
            body=body,
            accept="application/json",
            contentType="application/json"
        )
        response_body = json.loads(response['body'].read())
        return response_body['embedding']

    def search_opensearch(vector, video_id, k=5):
        """Performs k-NN search on OpenSearch, restricted to the given video"""
        query = {
            "size": k,
            "_source": ["timestamp_label", "nova_scene_summary", "characters", "emotions"],
            "query": {
                "knn": {
                    "embedding": {  # Ensure this field name matches your index mapping
                        "vector": vector,
                        "k": k,
                        # Filter to the requested video (the lucene engine supports
                        # efficient filtering inside the knn clause)
                        "filter": {"term": {"video_id": video_id}}
                    }
                }
            }
        }
        # In production, use an alias or specific index logic
        index_name = "video_scenes"
        response = os_client.search(index=index_name, body=query)
        return [hit['_source'] for hit in response['hits']['hits']]

    def parse_timestamp(ts_str):
        """Helper to convert timestamp to seconds for sorting"""
        try:
            start = ts_str.split('-')[0].strip()
            parts = list(map(int, start.split(':')))
            if len(parts) == 3: return parts[0]*3600 + parts[1]*60 + parts[2]
            if len(parts) == 2: return parts[0]*60 + parts[1]
            return 0
        except (ValueError, IndexError):
            return 0

    def lambda_handler(event, context):
        print(f"Received Event: {event}")

        # 1. Initialize Response Info
        # We must echo back the same identifiers Bedrock sent us
        action_group = event.get('actionGroup', '')
        api_path = event.get('apiPath')
        http_method = event.get('httpMethod')
        function_name = event.get('function') # Fallback for different agent types

        # 2. Extract Parameters
        # Bedrock sends parameters in a list: [{'name': 'query', 'value': '...'}, ...]
        params = {}
        if 'parameters' in event:
            for p in event['parameters']:
                params[p['name']] = p['value']

        # Also check 'requestBody' if parameters aren't found (common in POST requests)
        if not params and 'requestBody' in event:
            try:
                body_content = event['requestBody']['content']['application/json']['properties']
                for prop in body_content:
                    params[prop['name']] = prop['value']
            except (KeyError, TypeError):
                pass

        user_query = params.get('query', '')
        video_id = params.get('video_id', 'default_video')

        # 3. Validation
        if not user_query:
            result_text = "Error: No query provided in parameters."
        else:
            # 4. Perform Search (Your existing logic)
            try:
                print(f"Embedding query: {user_query}")
                vector = get_embedding(user_query)

                print(f"Searching OpenSearch for video: {video_id}")
                raw_hits = search_opensearch(vector, video_id)

                # Sort by timestamp
                sorted_hits = sorted(raw_hits, key=lambda x: parse_timestamp(x.get('timestamp_label', '0:00')))

                # Format Context
                result_text = "RELEVANT VIDEO SCENES (Chronological):\n"
                if not sorted_hits:
                    result_text += "No relevant scenes found in memory."
                else:
                    for hit in sorted_hits:
                        # Robust field access
                        time_lbl = hit.get('timestamp_label', 'Unknown Time')
                        summary = hit.get('nova_scene_summary', 'No summary')
                        emotions = hit.get('emotions', [])
                        if isinstance(emotions, list): emotions = ", ".join(emotions)

                        result_text += f"[Time: {time_lbl}] {summary} (Emotions: {emotions})\n"

            except Exception as e:
                print(f"Processing Error: {str(e)}")
                result_text = f"System Error during search: {str(e)}"

        # 5. Construct Response (Dynamic based on Input Type)
        response_body = {
            "application/json": {
                "body": result_text
            }
        }

        response = {
            "messageVersion": "1.0",
            "response": {
                "actionGroup": action_group,
                "responseBody": response_body
            }
        }

        # If it was an API Path call (OpenAPI), add these keys:
        if api_path:
            response['response']['apiPath'] = api_path
            response['response']['httpMethod'] = http_method
            response['response']['httpStatusCode'] = 200
        # If it was a Function call, add this key:
        elif function_name:
            response['response']['function'] = function_name

        print(f"Returning Response: {response}")
        return response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The code first acts as an adapter. It accepts the JSON event from the &lt;strong&gt;Bedrock Agent&lt;/strong&gt; (which contains the user’s natural language query) and extracts the core question (e.g., &lt;em&gt;“Why is he crying?”&lt;/em&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It converts the user’s text query into a &lt;strong&gt;vector embedding&lt;/strong&gt; using the &lt;strong&gt;Titan Text v2&lt;/strong&gt; model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It sends this vector to &lt;strong&gt;OpenSearch&lt;/strong&gt; to find the top-k most &lt;em&gt;semantically similar&lt;/em&gt; scenes. This finds the “right content” regardless of where it is in the video.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;This is the most critical step for reasoning.&lt;/strong&gt; The code takes the search results (which come back sorted by &lt;em&gt;relevance score&lt;/em&gt;) and re-sorts them by &lt;strong&gt;timestamp&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Why:&lt;/em&gt; This reconstructs the narrative timeline. It ensures the LLM reads the “Cause” (Minute 10) before the “Effect” (Minute 50), preventing it from hallucinating a backwards story.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finally, the code strips away complex JSON syntax and formats the data into a clean, human-readable &lt;strong&gt;text block&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It labels every scene [Time: MM:SS] so the LLM can cite its sources in the final answer.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
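&lt;p&gt;The chronological re-sort described above can be sketched in a few lines (the hits below are made-up examples, not real index output):&lt;/p&gt;

```python
def label_to_seconds(label):
    """Convert a 'H:MM:SS' or 'MM:SS' style label to seconds for sorting."""
    seconds = 0
    for part in label.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

# Hits as OpenSearch returns them: ordered by relevance, not by time.
hits = [
    {"timestamp_label": "50:00", "nova_scene_summary": "He breaks down in tears."},
    {"timestamp_label": "10:00", "nova_scene_summary": "He receives the bad news."},
]

# Re-sort chronologically so the LLM reads the cause before the effect.
chronological = sorted(hits, key=lambda h: label_to_seconds(h["timestamp_label"]))
context = "\n".join(
    f"[Time: {h['timestamp_label']}] {h['nova_scene_summary']}" for h in chronological
)
```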

&lt;p&gt;&lt;strong&gt;Agent Creation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit the Agents section from the Bedrock side panel&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on the Create Agent button&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Give a name and description for the agent&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select the model you want to use and give proper instructions for the model based on our use case&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2APG5LQzkSQbEAS7wYcBxfDw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2APG5LQzkSQbEAS7wYcBxfDw.png" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The instructions I gave for the model are
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are 3Netra, an expert Video Intelligence AI. 
    You have access to a tool called "search_video_memory" that retrieves scene details from the video.

    YOUR RULES:
    1. ALWAYS use the "search_video_memory" tool when the user asks about the video content.
    2. The tool returns a list of scenes with timestamps.
    3. Answer the user's question using ONLY that information.
    4. If the tool returns no relevant info, say "I cannot find that in the video memory."
    5. CITE TIMESTAMPS in your answer like this: [05:00].
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Save your progress up to this point, then click on Add action group.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Give it a name, select Define with API schemas, and select the Lambda function we created earlier&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2Asmo5rWWC-xssJ3rJhuJbog.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2Asmo5rWWC-xssJ3rJhuJbog.png" width="800" height="609"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the action group schema, select Define via in-line Schema Editor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AVas9IG0cG8HP_pW8YbBxKg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AVas9IG0cG8HP_pW8YbBxKg.png" width="800" height="651"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paste this schema there
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
      "openapi": "3.0.0",
      "info": {
        "title": "Video Memory Search API",
        "version": "1.0.0",
        "description": "API for searching semantic events and scenes within a video."
      },
      "paths": {
        "/search_video_memory": {
          "post": {
            "summary": "Searches the video memory for relevant scenes based on a user query.",
            "description": "Use this function whenever the user asks a question about the content, plot, characters, or events in the video.",
            "operationId": "search_video_memory",
            "parameters": [
              {
                "name": "query",
                "in": "query",
                "description": "The natural language question or topic the user is asking about (e.g., 'Why is he crying?').",
                "required": true,
                "schema": {
                  "type": "string"
                }
              },
              {
                "name": "video_id",
                "in": "query",
                "description": "The identifier of the video (e.g., 'movie_1'). Defaults to 'default_video' if not specified.",
                "required": true,
                "schema": {
                  "type": "string"
                }
              }
            ],
            "responses": {
              "200": {
                "description": "Successfully retrieved scene context",
                "content": {
                  "application/json": {
                    "schema": {
                      "type": "object",
                      "properties": {
                        "body": {
                          "type": "string",
                          "description": "The text containing relevant scene summaries and timestamps."
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Save and exit. Then prepare the agent with these settings. On the right side, you can test the agent with your own inputs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make sure the agent has the proper permissions to invoke the Lambda function&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If there are any errors, follow the trace to resolve the issue&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once you confirm the agent is working fine, create an alias to use it at the interface layer&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Building the Interface layer:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For the interface layer, we are going to use the Streamlit library. Install it through pip&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copy the agent ID and the alias ID to use in this script&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; import streamlit as st
    import boto3
    import uuid
    import json

    # --- CONFIGURATION ---
    # Replace these with your agent ID and alias ID from the Bedrock console
    AGENT_ID = ""      # e.g., "X7W3J9..."
    AGENT_ALIAS_ID = "" # e.g., "TSTALIAS..."
    SESSION_ID = str(uuid.uuid4())           # Unique ID for this chat session
    REGION = "us-west-2"                     # Ensure this matches your Agent's region

    # --- CLIENT SETUP ---
    client = boto3.client("bedrock-agent-runtime", region_name=REGION)

    st.set_page_config(page_title="3Netra Video Agent", layout="wide")
    st.title("👁️ 3Netra: Video Memory Agent")

    # --- SESSION STATE (Memory) ---
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # --- UI LOGIC ---
    # 1. Display Chat History
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    # 2. Handle User Input
    if prompt := st.chat_input("Ask about the video..."):
        # Add user message to history
        st.session_state.messages.append({"role": "user", "content": prompt})
        with st.chat_message("user"):
            st.markdown(prompt)

        # 3. Call Bedrock Agent
        with st.chat_message("assistant"):
            message_placeholder = st.empty()
            full_response = ""

            try:
                # The invoke_agent API is streaming (it comes back in chunks)
                response = client.invoke_agent(
                    agentId=AGENT_ID,
                    agentAliasId=AGENT_ALIAS_ID,
                    sessionId=SESSION_ID,
                    inputText=prompt
                )

                # Parse the event stream
                event_stream = response.get("completion")
                for event in event_stream:
                    if 'chunk' in event:
                        chunk = event['chunk']
                        if 'bytes' in chunk:
                            text_chunk = chunk['bytes'].decode('utf-8')
                            full_response += text_chunk
                            message_placeholder.markdown(full_response + "▌")

                message_placeholder.markdown(full_response)

                # Add assistant response to history
                st.session_state.messages.append({"role": "assistant", "content": full_response})

            except Exception as e:
                st.error(f"Error invoking agent: {e}")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Once the script is ready, run it using this command:&lt;/p&gt;

&lt;p&gt;streamlit run app.py&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It will open the chat interface in the browser. These are the questions I asked, and you can see the model’s responses&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F3550%2F1%2ASHmJta3OzZQp3C_ZsB3gbg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F3550%2F1%2ASHmJta3OzZQp3C_ZsB3gbg.png" width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The responses are not bad. We can refine them further by tuning the ingestion layer for more accurate results.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s it. We built an agent that can respond based on the video’s memory. It has many other applications as well, which I will explore in upcoming articles.&lt;/p&gt;

&lt;p&gt;Thanks. Feel free to drop comments and suggestions.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>agents</category>
      <category>ai</category>
    </item>
    <item>
      <title>Building a Real-World Context-Aware Movie Chatbot Using Amazon Bedrock - Nova Pro</title>
      <dc:creator>Salam Shaik</dc:creator>
      <pubDate>Wed, 25 Jun 2025 19:57:12 +0000</pubDate>
      <link>https://dev.to/aws-builders/building-a-real-world-context-aware-movie-chatbot-using-amazon-bedrock-nova-pro-jge</link>
      <guid>https://dev.to/aws-builders/building-a-real-world-context-aware-movie-chatbot-using-amazon-bedrock-nova-pro-jge</guid>
      <description>&lt;p&gt;Hi everyone,&lt;/p&gt;

&lt;p&gt;This article helps you build a chatbot that can &lt;strong&gt;suggest movies&lt;/strong&gt; based on your prompt, provide you with the &lt;strong&gt;movie details&lt;/strong&gt;, and &lt;strong&gt;maintain context&lt;/strong&gt; throughout the chat session.&lt;/p&gt;

&lt;p&gt;The AWS services I have used to build this solution are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AWS Bedrock&lt;/strong&gt; — Nova Pro model — Converse API&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lambda&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API Gateway&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;EC2&lt;/strong&gt; for running an &lt;strong&gt;Elastic Search&lt;/strong&gt; container&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;S3&lt;/strong&gt; for static web hosting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CloudFront&lt;/strong&gt; for CDN&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Route 53&lt;/strong&gt; for DNS &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CloudWatch&lt;/strong&gt; for Logging&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DynamoDB&lt;/strong&gt; for storing Sessions and Data&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Infrastructure Overview of the Chatbot Platform
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9q94ird68yvwg3bwy3b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9q94ird68yvwg3bwy3b.png" width="800" height="1133"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s start the implementation. I divided this infrastructure into three parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Front-end deployment&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API Layer&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Backend Services&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Deploying Front-end services:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Created an S3 bucket named after my domain and enabled static website hosting from the bucket’s properties &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyph5t3fzhooaecawhr7k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyph5t3fzhooaecawhr7k.png" width="800" height="27"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fknuc9bty1jj8wcz6cvws.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fknuc9bty1jj8wcz6cvws.png" width="800" height="202"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bucket Policy for static web hosting
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::chitrangi.cloudnirvana.in/*"
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Requested an SSL certificate from the Certificate Manager for the domain &lt;strong&gt;chitrangi.cloudnirvana.in&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvbodumy71dqu51jrav6a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvbodumy71dqu51jrav6a.png" width="800" height="26"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Created a CloudFront distribution with this certificate, using the S3 static website endpoint as the origin&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexwzxhb8lyehfsevgzyc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexwzxhb8lyehfsevgzyc.png" width="800" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Created a record for the subdomain in Route 53 and pointed it to CloudFront &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiu8bfqjj2wtmsxktqg18.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiu8bfqjj2wtmsxktqg18.png" width="413" height="484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Now that everything is ready, let’s upload the index.html file to the S3 bucket. Here is the code for the HTML file
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;!DOCTYPE html&amp;gt;
    &amp;lt;html lang="en"&amp;gt;
    &amp;lt;head&amp;gt;
      &amp;lt;meta charset="UTF-8"&amp;gt;
      &amp;lt;title&amp;gt;🎬 Chitrangi - Movie Chatbot&amp;lt;/title&amp;gt;
      &amp;lt;style&amp;gt;
        body {
          font-family: 'Segoe UI', sans-serif;
          background-color: #f2f2f2;
          margin: 0;
          padding: 0;
        }

        #chat-container {
          width: 90%;
          max-width: 800px;
          margin: 30px auto;
          background: #fff;
          border-radius: 8px;
          box-shadow: 0 4px 10px rgba(0, 0, 0, 0.1);
          padding: 20px;
          position: relative;
        }

        h2 {
          margin: 0 0 5px 0;
        }

        #session-info {
          font-size: 13px;
          color: #666;
          margin-bottom: 10px;
        }

        #info-box {
          background-color: #e9f7ef;
          border-left: 5px solid #28a745;
          padding: 10px 15px;
          margin-bottom: 15px;
          border-radius: 5px;
          font-size: 14px;
        }

        #messages {
          height: 500px;
          overflow-y: scroll;
          padding: 10px;
          border: 1px solid #ddd;
          border-radius: 5px;
          background-color: #f9f9f9;
        }

        .message {
          margin: 10px 0;
          padding: 10px 15px;
          border-radius: 18px;
          max-width: 75%;
          display: inline-block;
          word-wrap: break-word;
          animation: fadeIn 0.3s ease;
        }

        .user {
          background-color: #007bff;
          color: white;
          margin-left: auto;
          text-align: right;
        }

        .bot {
          background-color: #e0ffe0;
          color: #333;
          text-align: left;
          margin-right: auto;
        }

        .typing-indicator-bubble {
          background-color: #e0ffe0;
          color: #333;
          padding: 10px 15px;
          border-radius: 18px;
          max-width: 150px;
          margin: 10px 0;
          display: flex;
          gap: 4px;
          justify-content: center;
          align-items: center;
        }

        .dot {
          width: 8px;
          height: 8px;
          background-color: #888;
          border-radius: 50%;
          animation: bounce 1.4s infinite;
        }

        .dot:nth-child(2) {
          animation-delay: 0.2s;
        }

        .dot:nth-child(3) {
          animation-delay: 0.4s;
        }

        @keyframes bounce {
          0%, 80%, 100% { transform: scale(0); }
          40% { transform: scale(1); }
        }

        #input-container {
          margin-top: 15px;
          display: flex;
          gap: 10px;
        }

        #input {
          flex: 1;
          padding: 10px;
          font-size: 16px;
          border-radius: 5px;
          border: 1px solid #ccc;
        }

        #send {
          padding: 10px 20px;
          font-size: 16px;
          background: #007bff;
          color: white;
          border: none;
          border-radius: 5px;
          cursor: pointer;
        }

        #send:hover {
          background: #0056b3;
        }

        #reset {
          position: absolute;
          top: 20px;
          right: 20px;
          background: #dc3545;
          color: white;
          border: none;
          padding: 8px 16px;
          border-radius: 5px;
          cursor: pointer;
        }

        #reset:hover {
          background: #a71d2a;
        }

        @keyframes fadeIn {
          from { opacity: 0; transform: translateY(10px); }
          to { opacity: 1; transform: translateY(0); }
        }
      &amp;lt;/style&amp;gt;
    &amp;lt;/head&amp;gt;
    &amp;lt;body&amp;gt;

    &amp;lt;div id="chat-container"&amp;gt;
      &amp;lt;h2&amp;gt;🎬 Chitrangi - Movie Chatbot&amp;lt;/h2&amp;gt;
      &amp;lt;div id="session-info"&amp;gt;Session ID: &amp;lt;code id="session-id"&amp;gt;&amp;lt;/code&amp;gt;&amp;lt;/div&amp;gt;

      &amp;lt;div id="info-box"&amp;gt;
        &amp;lt;strong&amp;gt;What Chitrangi can do:&amp;lt;/strong&amp;gt;
        &amp;lt;ul style="margin: 5px 0 0 15px;"&amp;gt;
          &amp;lt;li&amp;gt;🎥 Suggest movies by genre or mood&amp;lt;/li&amp;gt;
          &amp;lt;li&amp;gt;📖 Provide movie details like synopsis and cast&amp;lt;/li&amp;gt;
          &amp;lt;li&amp;gt;🚫 Will not answer non-movie questions&amp;lt;/li&amp;gt;
        &amp;lt;/ul&amp;gt;
      &amp;lt;/div&amp;gt;

      &amp;lt;button id="reset" onclick="resetSession()"&amp;gt;Start New Chat&amp;lt;/button&amp;gt;

      &amp;lt;div id="messages"&amp;gt;&amp;lt;/div&amp;gt;

      &amp;lt;div id="input-container"&amp;gt;
        &amp;lt;input type="text" id="input" placeholder="Ask for a movie, like 'Suggest a thriller'..."&amp;gt;
        &amp;lt;button id="send"&amp;gt;Send&amp;lt;/button&amp;gt;
      &amp;lt;/div&amp;gt;
    &amp;lt;/div&amp;gt;

    &amp;lt;script&amp;gt;
      const apiUrl = ""; // Replace with your API endpoint

      let sessionId = localStorage.getItem("chitrangi_session_id");
      if (!sessionId) {
        sessionId = crypto.randomUUID();
        localStorage.setItem("chitrangi_session_id", sessionId);
      }
      document.getElementById("session-id").textContent = sessionId;

      function resetSession() {
        localStorage.removeItem("chitrangi_session_id");
        sessionId = crypto.randomUUID();
        localStorage.setItem("chitrangi_session_id", sessionId);
        document.getElementById("session-id").textContent = sessionId;
        document.getElementById("messages").innerHTML = "";
      }

      const messagesDiv = document.getElementById('messages');
      const inputField = document.getElementById('input');
      const sendButton = document.getElementById('send');

      function appendMessage(text, sender) {
        const div = document.createElement('div');
        div.classList.add('message', sender);
        div.textContent = text;
        const wrapper = document.createElement('div');
        wrapper.style.display = 'flex';
        wrapper.style.justifyContent = sender === 'user' ? 'flex-end' : 'flex-start';
        wrapper.appendChild(div);
        messagesDiv.appendChild(wrapper);
        messagesDiv.scrollTop = messagesDiv.scrollHeight;
      }

      function showTypingBubble() {
        const typingBubble = document.createElement('div');
        typingBubble.classList.add('typing-indicator-bubble');
        typingBubble.id = 'typing-bubble';
        typingBubble.innerHTML = `&amp;lt;div class="dot"&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;div class="dot"&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;div class="dot"&amp;gt;&amp;lt;/div&amp;gt;`;
        const wrapper = document.createElement('div');
        wrapper.style.display = 'flex';
        wrapper.style.justifyContent = 'flex-start';
        wrapper.appendChild(typingBubble);
        messagesDiv.appendChild(wrapper);
        messagesDiv.scrollTop = messagesDiv.scrollHeight;
      }

      function removeTypingBubble() {
        const typingBubble = document.getElementById('typing-bubble');
        if (typingBubble &amp;amp;&amp;amp; typingBubble.parentElement) typingBubble.parentElement.remove();
      }

      async function typeMessage(text, sender) {
        const wrapper = document.createElement('div');
        wrapper.style.display = 'flex';
        wrapper.style.justifyContent = sender === 'user' ? 'flex-end' : 'flex-start';
        const div = document.createElement('div');
        div.classList.add('message', sender);
        div.style.whiteSpace = 'pre-wrap';
        div.innerHTML = text.replace(/(?&amp;lt;=\n|^)\d+\.\s(.*)/g, (_, title) =&amp;gt; `• &amp;lt;strong&amp;gt;${title}&amp;lt;/strong&amp;gt;`);
        wrapper.appendChild(div);
        messagesDiv.appendChild(wrapper);
        messagesDiv.scrollTop = messagesDiv.scrollHeight;
      }

      async function sendMessage() {
        const userInput = inputField.value.trim();
        if (!userInput) return;

        appendMessage(userInput, 'user');
        inputField.value = '';
        showTypingBubble();

        try {
          const response = await fetch(apiUrl, {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ message: userInput, session_id: sessionId })
          });

          const data = await response.json();
          removeTypingBubble();
          await typeMessage(data.response, 'bot');
        } catch (err) {
          removeTypingBubble();
          appendMessage('Error fetching response. Check your server.', 'bot');
        }
      }

      sendButton.addEventListener('click', sendMessage);
      inputField.addEventListener('keypress', (e) =&amp;gt; {
        if (e.key === 'Enter') sendMessage();
      });
    &amp;lt;/script&amp;gt;

    &amp;lt;/body&amp;gt;
    &amp;lt;/html&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note: Replace the API URL with the API Gateway endpoint, which we are going to deploy in the next step&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deploying API Layer:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create a Lambda function named Chitrangi with a Python runtime&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Visit API Gateway, create an HTTP API, integrate it with the Lambda function you created, and make sure CORS is configured correctly&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1tp40d0m1m12iurjre8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1tp40d0m1m12iurjre8.png" width="800" height="238"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3che852ajqx97ud814jb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3che852ajqx97ud814jb.png" width="800" height="228"&gt;&lt;/a&gt;&lt;/p&gt;
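&lt;p&gt;Inside the Lambda, the HTTP API delivers each request as a proxy event whose body is a JSON string. A small helper to pull out the fields the front end sends (a hypothetical helper for illustration, not the article’s exact code):&lt;/p&gt;

```python
import json

def parse_chat_event(event):
    """Extract the user's message and session id from an API Gateway
    HTTP API proxy event. Illustrative sketch."""
    body = json.loads(event.get("body") or "{}")
    return body.get("message", ""), body.get("session_id", "")
```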

&lt;p&gt;Note: Lambda is the heart of this system. I will provide its code at the end of this article; before that, let’s deploy the backend services&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deploying Backend Services:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We need the following three services ready to work with Lambda:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dynamo DB&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bedrock&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;EC2 — runs Elastic Search&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Dynamo DB:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit DynamoDB and create two tables: one holds the movie data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The other holds the session data&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faewrfozg8xjoiekmejju.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faewrfozg8xjoiekmejju.png" width="800" height="65"&gt;&lt;/a&gt;&lt;/p&gt;
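&lt;p&gt;The same tables can also be created programmatically; a sketch of the sessions table definition (the helper name and billing mode are my assumptions):&lt;/p&gt;

```python
def chatbot_sessions_table():
    """Table definition for chatbot_sessions: session_id is the
    partition key and timestamp the sort key. Illustrative sketch;
    pass it to boto3's create_table."""
    return {
        "TableName": "chatbot_sessions",
        "KeySchema": [
            {"AttributeName": "session_id", "KeyType": "HASH"},
            {"AttributeName": "timestamp", "KeyType": "RANGE"},
        ],
        "AttributeDefinitions": [
            {"AttributeName": "session_id", "AttributeType": "S"},
            {"AttributeName": "timestamp", "AttributeType": "N"},
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }

# e.g. boto3.client("dynamodb").create_table(**chatbot_sessions_table())
```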

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For &lt;strong&gt;chatbot_sessions,&lt;/strong&gt; keep the &lt;strong&gt;session_id&lt;/strong&gt; as the &lt;strong&gt;Partition key&lt;/strong&gt; and &lt;strong&gt;timestamp&lt;/strong&gt; as the &lt;strong&gt;Sort Key&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For &lt;strong&gt;imdb_movies&lt;/strong&gt;, keep the &lt;strong&gt;movie_id&lt;/strong&gt; as the &lt;strong&gt;Partition Key&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Let’s dump the data in the imdb_movies table. I am using this dataset from Kaggle for this chatbot &lt;a href="https://www.kaggle.com/datasets/ashpalsingh1525/imdb-movies-dataset" rel="noopener noreferrer"&gt;https://www.kaggle.com/datasets/ashpalsingh1525/imdb-movies-dataset&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use this script to dump this data into DynamoDB&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
from botocore.exceptions import ClientError
import pandas as pd

file_path = "imdb_movies.csv"

df = pd.read_csv(file_path)

# Keep only the columns we need and drop rows with missing values
df = df[["orig_title", "overview", "genre", "crew"]]
modified_df = df.dropna(subset=["orig_title", "overview", "genre", "crew"])

print(modified_df.head())

# Initialize DynamoDB resource
dynamodb = boto3.resource('dynamodb', region_name='ap-south-1')  # change to your region
table = dynamodb.Table('imdb_movies')

for index, row in modified_df.iterrows():
    movie = {
        'movie_id': index + 1,
        'title': row['orig_title'],
        'description': row['overview'],
        'genre': row['genre'],
        'crew': row['crew'],
    }

    print(f"Inserting: {movie}")

    try:
        table.put_item(Item=movie)
        print(f"Inserted: {movie['title']}")
    except ClientError as e:
        print(f"Failed to insert {movie['title']}: {e.response['Error']['Message']}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;BedRock:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;We need two models for this chatbot: one to convert the data into vectors and one to handle the prompts. I am using these two models from Bedrock:&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://ap-south-1.console.aws.amazon.com/bedrock/home#/model-catalog/serverless/amazon.titan-embed-text-v2:0" rel="noopener noreferrer"&gt;Titan Text Embeddings V2&lt;/a&gt; — for generating vectors for the data we stored in DynamoDB&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://ap-south-1.console.aws.amazon.com/bedrock/home#/model-catalog/serverless/amazon.nova-pro-v1:0" rel="noopener noreferrer"&gt;Nova Pro&lt;/a&gt; — for handling the user prompts&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note: Please request access to these models to proceed further&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EC2 — Elastic search container:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The reason for running Elasticsearch on EC2 instead of using the OpenSearch Service is that I will be working on this on and off for several days, so I need to stop the service to avoid charges when I am not using it and have it back immediately when I am. An OpenSearch domain takes a long time to create and cannot be stopped and started the way an EC2 instance can&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To run Elastic Search container in the EC2 instance, please follow this documentation &lt;a href="https://www.elastic.co/docs/deploy-manage/deploy/self-managed/install-elasticsearch-docker-basic?v=EF5DFA09" rel="noopener noreferrer"&gt;https://www.elastic.co/docs/deploy-manage/deploy/self-managed/install-elasticsearch-docker-basic&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The instance configuration is t3.large with 16 GB of storage, and port 9200 is open for Elasticsearch &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
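&lt;p&gt;The ingestion script below writes an embedding per movie, so the index needs a dense_vector mapping before any kNN search will work. A hedged sketch of that mapping, assuming Titan Text Embeddings V2’s default 1024-dimension output (the helper name is mine):&lt;/p&gt;

```python
def movie_index_mapping(dims=1024):
    """Mapping for the imdb_movies index; dense_vector enables kNN
    search over the Titan embeddings. Illustrative sketch."""
    return {
        "mappings": {
            "properties": {
                "movie_id": {"type": "integer"},
                "title": {"type": "text"},
                "description": {"type": "text"},
                "genre": {"type": "keyword"},
                "crew": {"type": "text"},
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        }
    }

# Create the index once before ingesting, e.g.:
# es.indices.create(index="imdb_movies", mappings=movie_index_mapping()["mappings"])
```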

&lt;p&gt;&lt;strong&gt;Dumping data to Elastic Search:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the following script to read the data from DynamoDB, convert it into vectors, and dump it into the Elasticsearch index
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
import json
from elasticsearch import Elasticsearch, helpers

# Config
region = "ap-south-1"  # Change to your region
dynamodb = boto3.resource("dynamodb", region_name=region)
table = dynamodb.Table("imdb_movies")

bedrock = boto3.client("bedrock-runtime", region_name=region)

es = Elasticsearch(
    hosts=["https://your-elasticsearch-endpoint:9200"],  # Replace with your Elasticsearch endpoint
    basic_auth=("elastic", "password"),  # Basic auth if enabled
    verify_certs=False
)

index_name = "imdb_movies"

# Function to embed text using Titan
def get_titan_embedding(text):
    payload = {"inputText": text}
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps(payload),
        contentType="application/json"
    )
    embedding = json.loads(response['body'].read())['embedding']
    return embedding

actions = []
last_evaluated_key = None
total_items = 0

while True:
    if last_evaluated_key:
        response = table.scan(ExclusiveStartKey=last_evaluated_key)
    else:
        response = table.scan()

    items = response.get('Items', [])
    total_items += len(items)
    print(f"Fetched {len(items)} items. Total fetched: {total_items}")

    for item in items:
        desc = item['description']
        embedding = get_titan_embedding(desc)

        action = {
            "_index": index_name,
            "_id": item['movie_id'],
            "_source": {
                "movie_id": item['movie_id'],
                "title": item['title'],
                "description": desc,
                "embedding": embedding,
                "genre": item.get('genre', ""),
                "crew": item.get('crew', ""),
            }
        }
        actions.append(action)

        # Flush every 500 records to avoid memory bloat
        if len(actions) &amp;gt;= 500:
            helpers.bulk(es, actions)
            print("Inserted 500 records to Elasticsearch.")
            actions.clear()

    last_evaluated_key = response.get('LastEvaluatedKey')
    if not last_evaluated_key:
        break

# Insert any remaining actions
if actions:
    helpers.bulk(es, actions)
    print(f"Inserted remaining {len(actions)} records to Elasticsearch.")

print(f"Successfully processed {total_items} records from DynamoDB.")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
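&lt;p&gt;Once the index is populated, a semantic lookup is a kNN query against the embedding field. A sketch of the query body (field names match the script above; the k and num_candidates values are illustrative choices):&lt;/p&gt;

```python
def knn_movie_query(query_embedding, k=5):
    """Build an Elasticsearch kNN search body over the embedding field.
    Illustrative sketch; tune k and num_candidates for your data."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": query_embedding,
            "k": k,
            "num_candidates": 50,
        },
        "_source": ["title", "description", "genre"],
    }

# e.g. es.search(index="imdb_movies", knn=knn_movie_query(vec)["knn"])
```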



&lt;p&gt;Now that we have everything ready, let’s start coding the Lambda function&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lambda function:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpudyw9rozx436o54mjlo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpudyw9rozx436o54mjlo.png" alt="Dissecting the Lambda Logic Behind chat bot Responses" width="800" height="901"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Let’s go through it step by step&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When the user enters a prompt, it is sent to the Lambda along with the session ID&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If a session with that session ID exists in DynamoDB, we fetch the previous messages; if not, we store the current message under the user role&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def store_user_message(session_id, role, message):
    timestamp = int(time.time() * 1000)  # millisecond precision
    session_table.put_item(Item={
        'session_id': session_id,
        'timestamp': timestamp,
        'role': role,
        'message': message
    })

def get_session_history(session_id):
    response = session_table.query(
        KeyConditionExpression=Key('session_id').eq(session_id),
        ScanIndexForward=True  # Sort by timestamp ascending
    )

    # Convert DynamoDB items into model-ready format
    messages = []
    for item in response['Items']:
        messages.append({
            "role": item["role"],  # 'user' or 'assistant'
            "content": [{"text": item["message"]}]
        })

    return messages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Then the user prompt is sent to the Nova model for intent classification.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Here I am using the &lt;strong&gt;converse&lt;/strong&gt; function of the &lt;strong&gt;bedrock runtime&lt;/strong&gt; instead of &lt;strong&gt;invoke_model&lt;/strong&gt; to maintain context awareness&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The intent classification prompt and calling the converse API&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;classify_system_prompt = """You are a prompt classifier — not a chatbot. Your only job is to classify the user's **latest message**.

- Use previous messages **only as context** if the user prompt is vague.
- Do not react to previous assistant responses.
- Never generate movie suggestions or details. Do not respond like a chatbot.

Return exactly **one** JSON object matching one of the following:

1. If the user asks for movie details (e.g., who acted, synopsis, or info), return:
   {"type": "ask_details", "title": "&amp;lt;movie name&amp;gt;"}

2. If the user is asking for new movie suggestions, return:
   {"type": "suggestion", "genre": "&amp;lt;genre&amp;gt;", "mood": "&amp;lt;mood&amp;gt;"}

3. If the user greets (Hi, Hello, etc.), return:
   {"type": "greeting"}

4. If the user says thanks or otherwise expresses gratitude, return:
   {"type": "gratitude"}

5. If the message is not related to movies, return:
   {"type": "irrelevant"}

Important:
- Use only the **last user prompt** to determine classification.
- Use prior messages **only to resolve vague references** (like “that movie”).
- Respond ONLY with the JSON. Never generate answers, movie descriptions, or titles unless needed for classification.
"""

classify_system_prompt_obj = {"text": classify_system_prompt}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def send_to_model(message, system_prompt, session_id, get_session_messages):
    previous_messages = []

    if get_session_messages:
        previous_messages = get_session_history(session_id)

    previous_messages.append({"role": "user", "content": [{"text": message}]})

    print(f"previous messages {previous_messages}")

    system_prompts = []

    if system_prompt is not None:
        system_prompts.append(system_prompt)

    temperature = 0.7

    inference_config = {"temperature": temperature}

    response = bedrock_nova.converse(
        modelId="model id",  # replace with your Nova Pro model ID
        messages=previous_messages,
        system=system_prompts,
        inferenceConfig=inference_config
    )

    model_text = response["output"]["message"]["content"][0]["text"]
    print(f"model_text {model_text}")
    return model_text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Based on the classification received from the model, we categorize each prompt and route it to the respective task&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I classify prompts into five types:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;greeting:&lt;/strong&gt; If the user's prompt is a greeting, we generate a greeting response&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ask_details:&lt;/strong&gt; If the user is asking for movie details, we fetch the movie from Elasticsearch and generate a movie-info response using the model&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;suggestion:&lt;/strong&gt; If the user is asking for a movie suggestion, we fetch candidate movies from the Elasticsearch index and generate a suggestion response using the model&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;gratitude:&lt;/strong&gt; If the user expresses happiness or says thank you, we respond warmly using a model response&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;irrelevant:&lt;/strong&gt; If the user asks something not related to movie suggestions or details, we respond with a polite message saying it is irrelevant to movies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; def route_prompt(prompt,prompt_type,session_id,json_dump):
        if prompt_type == "irrelevant":
            response_text = "Looks like your question is irrelevant to movies. I can help you with movie suggestions. Try asking about genres, actors, or moods!"
            return {
                "statusCode": 200,
                "body": json.dumps({"response": response_text})
            }
        elif prompt_type == "greeting":
            response_text = build_greetings_response(prompt,session_id)

            return {
                "statusCode": 200,
                "body": json.dumps({"response": response_text})
            }
        elif prompt_type == "gratitude":
            response_text = build_gratitude_response(prompt,session_id)
            return {
                "statusCode": 200,
                "body": json.dumps({"response": response_text})
            }
        elif prompt_type == "ask_details":
            print("Entered ask details")
            # Call get_movie_details once and reuse the result
            response_text = get_movie_details(json_dump, prompt, session_id)
            if response_text is None:
                response_text = "I can't find the movie title in our movies list."
            return {
                "statusCode": 200,
                "body": json.dumps({"response": response_text})
            }
        elif prompt_type == "suggestion":
            response_text = get_movie_suggestions(prompt,json_dump,session_id)
            return response_text
        else:
            return {
                "statusCode": 200,
                "body": json.dumps({"response": "I am not sure what you are asking for"})
            }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The system prompts used to generate the responses for each category are here
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; greeting_system_prompt = """
    You are a movie chatbot named Chitrangi. Read the user query and prepare a polite response, introducing yourself as Chitrangi.
    These are the things you can do: you can suggest movies, and you can get details for a specific movie. That's it, nothing more than that. Try to keep the response in a single line
    """

    gratitude_system_prompt = """
    You are a movie chatbot named Chitrangi.
    If the user expresses gratitude (e.g., says "Thanks", "Thank you", etc.), respond warmly, letting them know you're also happy to help. Try to keep the response in a single line
    """

    movie_details_system_prompt = """
    You are a movie chatbot. Based on user_query and movie_object, prepare a polite response and respond to the user.
    Only consider the data from user_query and movie_object. Don't hallucinate. Try to keep everything in a single line
    """
    movie_details_system_prompt_obj = {"text":movie_details_system_prompt}

    movie_suggestion_system_prompt = """
    You are a friendly movie chatbot that suggests movies based on the movies list provided. The movies list contains the final suggestions you should give: nothing from the internet and no hallucinations.
    Don't express your opinion on each movie, even if you feel the movies in the list are not a good match for the user query.
    Your responsibility is to take the movies from the movies list and respond to the user based on the user query.
    Only suggest movies from the given list, and mention every movie in it. Don't hallucinate.
    If the movies list is empty, respond that there are no suggestions right now.
    Respond as if you are suggesting the movies yourself, but don't add movie info from the internet. Try to keep the entire response in a single line
    """
    movie_suggestion_system_prompt_obj = {"text":movie_suggestion_system_prompt}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;I hope this gives you some clarity on how the Lambda function works with Elasticsearch, DynamoDB, and Bedrock to respond to the user&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note: The converse API is tricky. Even when you give it specific rules to follow, if any message anywhere in the conversation breaks those rules, it can ignore the classification instructions entirely. Make sure to use it wisely.&lt;/strong&gt;&lt;/p&gt;
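&lt;p&gt;One defensive pattern that helps here (a sketch I would add, not part of the Lambda below; &lt;code&gt;parse_classification&lt;/code&gt; is a hypothetical helper): instead of calling &lt;code&gt;json.loads&lt;/code&gt; directly on the model reply, extract the first JSON object from it and fall back to the &lt;code&gt;irrelevant&lt;/code&gt; type when parsing fails, so a chatty reply doesn't crash the handler:&lt;/p&gt;

```python
import json
import re

def parse_classification(model_text):
    """Parse the classifier's reply, tolerating extra prose around the JSON.

    The converse API sometimes ignores the 'JSON only' instruction, so we
    grab the first {...} span instead of json.loads-ing the whole string.
    Falls back to "irrelevant" when nothing parseable is found.
    """
    match = re.search(r"\{.*\}", model_text, re.DOTALL)
    if not match:
        return {"type": "irrelevant"}
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return {"type": "irrelevant"}

# A clean reply parses as-is; a chatty reply still yields usable JSON
print(parse_classification('{"type": "greeting"}'))
print(parse_classification('Sure! Here you go: {"type": "gratitude"}'))
```

&lt;p&gt;With a guard like this, the handler can always route on &lt;code&gt;result["type"]&lt;/code&gt; instead of raising on malformed output.&lt;/p&gt;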

&lt;h2&gt;
  
  
  &lt;strong&gt;The complete Lambda is here:&lt;/strong&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
    import boto3
    import os
    from elasticsearch import Elasticsearch
    import ast
    from boto3.dynamodb.conditions import Key
    import time


    region = "ap-south-1"
    bedrock = boto3.client("bedrock-runtime", region_name=region)
    dynamodb = boto3.resource("dynamodb")
    session_table = dynamodb.Table("chatbot_sessions")
    bedrock_nova = boto3.client("bedrock-runtime", region_name='us-east-1')

    es = Elasticsearch(
        hosts=["https://endpoint:9200"],
        basic_auth=("elastic", "password"),  # replace with your credentials
        verify_certs=False
    )
    index_name = "imdb_movies"
    session_memory = {}  # warm-container cache only; resets on every Lambda cold start
    session_id = ""

    classify_system_prompt = """You are a prompt classifier — not a chatbot. Your only job is to classify the user's **latest message**.

    - Use previous messages **only as context** if the user prompt is vague.
    - Do not react to previous assistant responses.
    - Never generate movie suggestions or details. Do not respond like a chatbot.

    Return exactly **one** JSON object matching one of the following:

    1. If the user asks for movie details (e.g., who acted, synopsis, or info), return:
       {"type": "ask_details", "title": "&amp;lt;movie name&amp;gt;"}

    2. If the user is asking for new movie suggestions, return:
       {"type": "suggestion", "genre": "&amp;lt;genre&amp;gt;", "mood": "&amp;lt;mood&amp;gt;"}

    3. If the user greets (Hi, Hello, etc.), return:
       {"type": "greeting"}

    4. If the user says thanks, thank you, or otherwise expresses gratitude, return:
        {"type":"gratitude"}

    5. If the message is not related to movies, return:
       {"type": "irrelevant"}


    Important:
    - Use only the **last user prompt** to determine classification.
    - Use prior messages **only to resolve vague references** (like “that movie”).
    - Respond ONLY with the JSON. Never generate answers, movie descriptions, or titles unless needed for classification.


    """
    classify_system_prompt_obj = {"text":classify_system_prompt}



    greeting_system_prompt = """
    You are a movie chatbot named Chitrangi. Read the user query and prepare a polite response, introducing yourself as Chitrangi.
    These are the things you can do: you can suggest movies, and you can get details for a specific movie. That's it, nothing more than that. Try to keep the response in a single line
    """

    gratitude_system_prompt = """
    You are a movie chatbot named Chitrangi.
    If the user expresses gratitude (e.g., says "Thanks", "Thank you", etc.), respond warmly, letting them know you're also happy to help. Try to keep the response in a single line
    """

    movie_details_system_prompt = """
    You are a movie chatbot. Based on user_query and movie_object, prepare a polite response and respond to the user.
    Only consider the data from user_query and movie_object. Don't hallucinate. Try to keep everything in a single line
    """
    movie_details_system_prompt_obj = {"text":movie_details_system_prompt}

    movie_suggestion_system_prompt = """
    You are a friendly movie chatbot that suggests movies based on the movies list provided. The movies list contains the final suggestions you should give: nothing from the internet and no hallucinations.
    Don't express your opinion on each movie, even if you feel the movies in the list are not a good match for the user query.
    Your responsibility is to take the movies from the movies list and respond to the user based on the user query.
    Only suggest movies from the given list, and mention every movie in it. Don't hallucinate.
    If the movies list is empty, respond that there are no suggestions right now.
    Respond as if you are suggesting the movies yourself, but don't add movie info from the internet. Try to keep the entire response in a single line
    """
    movie_suggestion_system_prompt_obj = {"text":movie_suggestion_system_prompt}



    def store_user_message(session_id, role, message):
        timestamp = int(time.time() * 1000)  # millisecond precision
        session_table.put_item(Item={
            'session_id': session_id,
            'timestamp': timestamp,
            'role': role,
            'message': message
        })

    def get_session_history(session_id):
        response = session_table.query(
            KeyConditionExpression=Key('session_id').eq(session_id),
            ScanIndexForward=True  # Sort by timestamp ascending
        )

        # Convert DynamoDB items into model-ready format
        messages = []
        for item in response['Items']:
            messages.append({
                "role": item["role"],  # 'user' or 'assistant'
                "content": [{"text": item["message"]}]
            })

        return messages



    # Embedding generator
    def invoke_bedrock(model_id, payload):
        response = bedrock.invoke_model(
            modelId=model_id,
            body=json.dumps(payload),
            contentType="application/json"
        )
        return json.loads(response["body"].read())




    def send_to_model(message,system_prompt,session_id,get_session_messages):


        previous_messages = []

        if get_session_messages:
            previous_messages = get_session_history(session_id)

        previous_messages.append({"role": "user", "content": [{"text": message}]})

        print(f"previous messages {previous_messages}")

        system_prompts = []

        if system_prompt is not None:
            system_prompts.append(system_prompt)

        temperature = 0.7

        inference_config = {"temperature": temperature}

        response = bedrock_nova.converse(
            modelId="arn:aws:bedrock:us-east-1:556343216872:inference-profile/us.amazon.nova-pro-v1:0",
            messages=previous_messages,
            system=system_prompts,
            inferenceConfig=inference_config
        )

        model_text = response["output"]["message"]["content"][0]["text"]
        print(f"model_text {model_text}")
        return model_text


    # prompt classifier
    def extract_intents_entities(prompt,session_id):
        # The model returns a JSON string; json.loads happens in the handler
        return send_to_model(prompt,classify_system_prompt_obj,session_id,True)


    #movies search
    def search_movies(query_embedding, exclude_ids=None, top_k=5):
        must_not_clause = []
        if exclude_ids:
            must_not_clause = [{"terms": {"movie_id": exclude_ids}}]

        search_query = {
            "size": top_k,
            "query": {
                "script_score": {
                    "query": {
                        "bool": {
                            "must_not": must_not_clause
                        }
                    },
                    "script": {
                        "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                        "params": {"query_vector": query_embedding}
                    }
                }
            }
        }


        results = es.search(index=index_name, body=search_query)
        return [{
            "movie_id": hit["_source"]["movie_id"],
            "title": hit["_source"]["title"],
            "description": hit["_source"]["description"]
        } for hit in results["hits"]["hits"]]

    #return titan embedding
    def get_titan_embedding(text):
        result = invoke_bedrock("amazon.titan-embed-text-v2:0", {"inputText": text})
        return result["embedding"]

    #nova suggestion response building
    def build_response(prompt, movies,session_id):
        movie_list = []
        for movie in movies:
            if isinstance(movie, dict) and "title" in movie:
                movie_list.append(movie['title'])

        prompt_text = f"""
        User query: {prompt}
        Movies list: {movie_list}"""

        response = send_to_model(prompt_text,movie_suggestion_system_prompt_obj,session_id,True)

        return response

    #nova movie details response building
    def build_movie_details_response(user_query,movie_data,session_id):
        prompt_text = f"""
        user_query= {user_query}
        movie_object= {movie_data}"""
        response = send_to_model(prompt_text, movie_details_system_prompt_obj, session_id,True)
        return response

    def build_greetings_response(user_query,session_id):
        # The greeting prompt is sent as the user message here, with no system prompt
        response = send_to_model(greeting_system_prompt,None,session_id,False)
        return response

    def build_gratitude_response(user_query,session_id):
        response = send_to_model(gratitude_system_prompt,None,session_id,False)
        return response

    #movies suggestions fetch
    def get_movie_suggestions(prompt,extracted, session_id):
            session_data = session_memory.get(session_id, {"suggested_movie_ids": []})
            suggested_movie_ids = session_data["suggested_movie_ids"]

            query_embedding = get_titan_embedding(json.dumps(extracted))
            movies = search_movies(query_embedding, exclude_ids=suggested_movie_ids)

            new_movie_ids = [m['movie_id'] for m in movies if 'movie_id' in m]
            session_memory[session_id] = {
                "suggested_movie_ids": suggested_movie_ids + new_movie_ids
            }

            reply = build_response(prompt, movies,session_id)
            # store_user_message(session_id, "assistant", reply)

            return {
                "statusCode": 200,
                "body": json.dumps({"response": reply})
            }

    #movie details fetch
    def get_movie_details(extracted, user_query, session_id):
        title = extracted.get('title', '').strip()

        # fallback to session memory title
        if not title:
            title = session_memory.get(session_id, {}).get("last_title", "")
            print(f"Fallback to session stored title: {title}")

        if not title:
            return "I can't find the movie title. Please provide a specific movie title."

        # update session with this title
        session_data = session_memory.get(session_id, {})
        session_data["last_title"] = title
        session_memory[session_id] = session_data

        response = es.search(index=index_name, body={
            "query": {
                "match": {
                    "title": title
                }
            }
        })

        if response["hits"]["hits"]:
            movie = response["hits"]["hits"][0]["_source"]
            movie_object = {
                "title": movie.get("title", ""),
                "description": movie.get("description", ""),
                "genre": movie.get("genre", ""),
                "crew": movie.get("crew", "")
            }
            return build_movie_details_response(user_query, movie_object,session_id)
        else:
            return "I can't find the movie title in our movies list."





    def route_prompt(prompt,prompt_type,session_id,json_dump):
        if prompt_type == "irrelevant":
            response_text = "Looks like your question is irrelevant to movies. I can help you with movie suggestions. Try asking about genres, actors, or moods!"

            return {
                "statusCode": 200,
                "body": json.dumps({"response": response_text})
            }
        elif prompt_type == "greeting":
            response_text = build_greetings_response(prompt,session_id)

            return {
                "statusCode": 200,
                "body": json.dumps({"response": response_text})
            }
        elif prompt_type == "gratitude":
            response_text = build_gratitude_response(prompt,session_id)

            return {
                "statusCode": 200,
                "body": json.dumps({"response": response_text})
            }
        elif prompt_type == "ask_details":
            print("Entered ask details")
            # Call get_movie_details once and reuse the result
            response_text = get_movie_details(json_dump, prompt, session_id)
            if response_text is None:
                response_text = "I can't find the movie title in our movies list."

            return {
                "statusCode": 200,
                "body": json.dumps({"response": response_text})
            }
        elif prompt_type == "suggestion":
            response_text = get_movie_suggestions(prompt,json_dump,session_id)
            return response_text
        else:
            return {
                "statusCode": 200,
                "body": json.dumps({"response": "I am not sure what you are asking for"})
            }
    # ------------------ HANDLER FUNCTION ------------------

    def lambda_handler(event, context):
        try:
            body = json.loads(event["body"])
            prompt = body.get("message", "")
            session_id = body.get("session_id")
            print(f"prompt and session id {prompt} and {session_id}")
            store_user_message(session_id, "user", prompt)


            extracted = extract_intents_entities(prompt,session_id)

            store_user_message(session_id,"assistant",extracted)

            json_dump = json.loads(extracted)

            prompt_type = json_dump['type']

            print(f"prompt_type {prompt_type}")



            return route_prompt(prompt,prompt_type,session_id,json_dump)


        except Exception as e:
            print(f"Error: {e}")
            return {
                "statusCode": 500,
                "body": json.dumps({"response": "Sorry I can't understand your query"})
            }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
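&lt;p&gt;For testing the handler locally, it helps to spell out the input contract. Assuming the usual API Gateway Lambda proxy integration (my assumption; the gateway setup isn't shown in this post), the client payload arrives as a JSON string in &lt;code&gt;event["body"]&lt;/code&gt;, exactly the shape &lt;code&gt;lambda_handler&lt;/code&gt; decodes:&lt;/p&gt;

```python
import json

# Hypothetical proxy event; "message" and "session_id" are the two fields
# lambda_handler reads from the decoded body.
event = {
    "body": json.dumps({
        "message": "suggest me a feel-good comedy",
        "session_id": "demo-session-1",
    })
}

# Same decoding steps the handler performs
body = json.loads(event["body"])
prompt = body.get("message", "")
session_id = body.get("session_id")
print(prompt, session_id)
```

&lt;p&gt;Passing an event like this to the handler in a local test (with the AWS clients stubbed out) exercises the full classification-and-routing path.&lt;/p&gt;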



&lt;p&gt;&lt;strong&gt;I hosted my chatbot at the link below. Give it a try and see how it’s working. If you get stuck anywhere or have any suggestions, please feel free to comment. I am open to suggestions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I named my chatbot Chitrangi. Yeah, I know it’s a weird name. Got it from the old Telugu movies&lt;/strong&gt;😅&lt;br&gt;
&lt;a href="https://chitrangi.cloudnirvana.in" rel="noopener noreferrer"&gt;Chitrangi&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chitrangi will be live for 3 days — after that, the EC2 instance goes down, and my AWS bill gets to breathe again 💸😂&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thanks for reading.. Have a great day…&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>ai</category>
      <category>lambda</category>
    </item>
    <item>
      <title>Building a Friends-Themed Chatbot: Exploring Amazon Bedrock for Dialogue Refinement</title>
      <dc:creator>Salam Shaik</dc:creator>
      <pubDate>Mon, 06 Jan 2025 18:51:29 +0000</pubDate>
      <link>https://dev.to/aws-builders/building-a-friends-themed-chatbot-exploring-amazon-bedrock-for-dialogue-refinement-49lk</link>
      <guid>https://dev.to/aws-builders/building-a-friends-themed-chatbot-exploring-amazon-bedrock-for-dialogue-refinement-49lk</guid>
      <description>&lt;p&gt;Hi Everyone,&lt;/p&gt;

&lt;p&gt;While browsing the datasets in Kaggle, I came across this dataset where dialogues are provided character-wise from the Friends Series. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.kaggle.com/datasets/sujaykapadnis/friends" rel="noopener noreferrer"&gt;&lt;strong&gt;Friends Sitcom Dataset&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dialogues in the dataset brought back the fun time I had watching the Friends series. There comes the thought of building a chatbot using this dataset. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Initial Thought Process:&lt;/strong&gt; This is how the initial thought process went: divide the dialogues character-wise and generate embeddings for each dialogue. Store them in OpenSearch, query them based on the user prompt, and return the most suitable dialogue from the index.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenges:&lt;/strong&gt; Following that plan, I converted each dialogue into an embedding using Amazon Bedrock models and stored them in OpenSearch. However, when querying, there was often a big gap between the user prompt and the returned dialogue.&lt;/p&gt;
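&lt;p&gt;The root cause is that kNN search always returns the single nearest neighbour, even when nothing in the index is actually close. A toy illustration (a standalone sketch with made-up vectors, not code from this project; the fix I actually went with is the refinement model): inspect the cosine similarity score itself and treat low scores as "no good match":&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for Titan embeddings
user = [1.0, 0.0, 0.0]
close_dialogue = [0.9, 0.1, 0.0]
far_dialogue = [0.0, 0.2, 1.0]

THRESHOLD = 0.5  # arbitrary cutoff for "relevant enough"
for name, vec in [("close", close_dialogue), ("far", far_dialogue)]:
    score = cosine_similarity(user, vec)
    verdict = "use it" if score >= THRESHOLD else "treat as no match"
    print(f"{name}: {score:.2f} -> {verdict}")
```

&lt;p&gt;A nearest neighbour with a low score is exactly the "big gap" case described above.&lt;/p&gt;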

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Even though it finds the most relevant dialogue from the available dataset, the result sometimes looks completely off. So I added one more Bedrock model to refine the queried dialogue and produce a relevant response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final Conclusion:&lt;/strong&gt; So what I did in the end: after querying for a similar dialogue, I used a Bedrock model that is good at natural language generation to refine the dialogue and produce a relevant response without changing its tone. I primed this model with context and some example prompts.&lt;/p&gt;

&lt;p&gt;Finally, the bot came in good shape(To my knowledge 😁).&lt;/p&gt;

&lt;p&gt;You can access the bot using this link. Give it a try with your input. I am open to suggestions. Feel free to comment&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HERE IS THE LINK: &lt;a href="https://friendschat.cloudnirvana.in/" rel="noopener noreferrer"&gt;https://friendschat.cloudnirvana.in/&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Update: Link is removed to reduce the cost for now&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Step-by-step implementation:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Refine the dataset and store the dialogues character-wise&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generate embeddings and store them in OpenSearch&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Query the OpenSearch index and refine the received dialogues using the Titan Model&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deploy a Front-End application to chat&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Refine the dataset and store the dialogues character-wise:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Download the dataset from Kaggle using the link shared above&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Extract the zip file. It contains 3 files. We are gonna use &lt;strong&gt;friends.csv&lt;/strong&gt; file&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use the below script to divide the dialogues character-wise and store them in a folder&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
    import os

    df = pd.read_csv('friends.csv')

    refined_df = df[['text','speaker']]

    characters = ['Monica Geller', 'Joey Tribbiani', 'Chandler Bing', 'Phoebe Buffay', 'Ross Geller', 'Rachel Green']

    output_dir = "char_wise_dialogs"

    os.makedirs(output_dir, exist_ok=True)

    for character in characters:
        char_dialogs = refined_df[refined_df['speaker'] == character]

        file_name = f"{character.replace(' ','_')}_dialogues.csv"
        output_file = os.path.join(output_dir, file_name)

        char_dialogs.to_csv(output_file, index=False)
        print(f"Saved {character}'s dialogues to {output_file}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
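&lt;p&gt;A quick way to sanity-check the split before generating embeddings (a toy DataFrame standing in for the real Kaggle file, so you can run it without downloading anything):&lt;/p&gt;

```python
import pandas as pd

# Toy stand-in for friends.csv with the two columns the script uses
df = pd.DataFrame({
    "text": ["How you doin'?", "We were on a break!", "Smelly Cat..."],
    "speaker": ["Joey Tribbiani", "Ross Geller", "Phoebe Buffay"],
})

refined_df = df[["text", "speaker"]]

# Same per-character filter the script applies before writing each CSV
joey = refined_df[refined_df["speaker"] == "Joey Tribbiani"]
print(len(joey), joey.iloc[0]["text"])
```

&lt;p&gt;Each character file should contain only that character's lines; if the counts look off here, they will be off in the index too.&lt;/p&gt;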



&lt;h2&gt;
  
  
  &lt;strong&gt;Generate Embeddings and store them in OpenSearch:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit the OpenSearch service and create a domain with &lt;strong&gt;t3.medium.search&lt;/strong&gt; instance type with &lt;strong&gt;10GB&lt;/strong&gt; of Storage in a &lt;strong&gt;single AZ&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make the OpenSearch domain public and create a master user for login&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use the below script to iterate through the dialogues, generate embeddings, and store them in an index&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We will be using the model &lt;strong&gt;amazon.titan-embed-text-v2:0&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; import boto3
    import pandas as pd
    import os
    import json
    from opensearchpy import OpenSearch, RequestsHttpConnection, helpers

    # AWS OpenSearch domain details
    OPENSEARCH_HOST = "open search endpoint without https"  # Replace with your endpoint
    INDEX_NAME = "friends-dialogues"

    # Initialize OpenSearch client
    client = OpenSearch(
        hosts=[{'host': OPENSEARCH_HOST, 'port': 443}],
        http_auth=('admin', '******'),  # Replace with your OpenSearch credentials
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection
    )

    # Initialize Bedrock client
    bedrock_client = boto3.client('bedrock-runtime', region_name='us-east-1')  # Replace with your region

    # Folder containing dialogues
    input_folder = "char_wise_dialogs"

    # Batch size for processing
    BATCH_SIZE = 20

    # Function to generate an embedding using Bedrock
    def generate_embedding(text):
        payload = {
            "inputText": text
        }
        response = bedrock_client.invoke_model(
            modelId="amazon.titan-embed-text-v2:0",
            contentType="application/json",
            accept="application/json",
            body=json.dumps(payload)
        )
        response_body = json.loads(response['body'].read())
        return response_body.get('embedding')

    # Function to index documents in bulk in OpenSearch
    def bulk_index_documents(batch):
        actions = [
            {
                "_index": INDEX_NAME,
                "_source": {
                    "character": doc["character"],
                    "dialogue": doc["dialogue"],
                    "embedding": doc["embedding"]
                }
            }
            for doc in batch
        ]
        helpers.bulk(client, actions)

    # Create the index in OpenSearch (if not already created)
    if not client.indices.exists(INDEX_NAME):
        client.indices.create(index=INDEX_NAME, body={
            "settings": {
                "number_of_shards": 1,
                "number_of_replicas": 1,
                "index": {
                    "knn": True  # Enable kNN search for this index
                }
            },
            "mappings": {
                "properties": {
                    "character": {"type": "keyword"},
                    "dialogue": {"type": "text"},
                    "embedding": {
                        "type": "knn_vector",
                        "dimension": 1024  # Replace with the embedding size
                    }
                }
            }
        })
        print(f"Created index with knn_vector: {INDEX_NAME}")

    # Process each character file
    for file_name in os.listdir(input_folder):
        if file_name.endswith('.csv'):
            # Read character dialogues
            character_file = os.path.join(input_folder, file_name)
            df = pd.read_csv(character_file)

            # Process in batches
            batch = []
            for index, row in df.iterrows():
                dialogue = row['text']
                character = row['speaker']

                try:
                    # Generate embedding for each dialogue
                    embedding = generate_embedding(dialogue)
                    batch.append({"dialogue": dialogue, "character": character, "embedding": embedding})

                    # Process the batch if it reaches the batch size
                    if len(batch) == BATCH_SIZE:
                        # Bulk index the batch into OpenSearch
                        bulk_index_documents(batch)
                        print(f"Indexed batch of size {len(batch)}")
                        batch = []  # Reset the batch
                except Exception as e:
                    print(f"Error processing dialogue: {dialogue[:50]} - {e}")

            # Process any remaining documents in the last batch
            if batch:
                bulk_index_documents(batch)
                print(f"Indexed remaining batch of size {len(batch)}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Query the OpenSearch index and refine the received dialogues using the Titan Model:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Once the index has our data, Let’s create a script to query the index&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a Lambda function with Python 3.9&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copy and paste the following code in the Lambda function and provide the necessary permissions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This script will query similar dialogues from the index and pass the received dialogue to the next model&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We will be using the &lt;strong&gt;amazon.titan-text-express-v1&lt;/strong&gt; model to refine the dialogue and add some relevant data to match the user prompt&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the Lambda is ready, create an API in API Gateway and add a &lt;strong&gt;POST&lt;/strong&gt; method for sending the user message&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
    import json
    from opensearchpy import OpenSearch, RequestsHttpConnection

    # OpenSearch configuration
    OPENSEARCH_HOST = "open search endpoint without https"
    INDEX_NAME = "friends-dialogues"

    # Initialize OpenSearch client
    client = OpenSearch(
        hosts=[{'host': OPENSEARCH_HOST, 'port': 443}],
        http_auth=('admin', '******'),
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection
    )

    # Bedrock clients for embedding and refinement
    bedrock_client = boto3.client('bedrock-runtime', region_name='us-east-1')

    # Function to generate embedding for user input
    def generate_embedding(text):
        payload = {"inputText": text}
        response = bedrock_client.invoke_model(
            modelId="amazon.titan-embed-text-v2:0",
            contentType="application/json",
            accept="application/json",
            body=json.dumps(payload)
        )
        response_body = json.loads(response['body'].read())
        return response_body.get('embedding')

    # Function to query OpenSearch for similar dialogues
    def query_opensearch(user_embedding):
        query = {
            "size": 1,
            "query": {
                "knn": {
                    "embedding": {
                        "vector": user_embedding,
                        "k": 1
                    }
                }
            }
        }
        response = client.search(index=INDEX_NAME, body=query)
        hits = response["hits"]["hits"]
        if hits:
            return hits[0]["_source"]
        return None

    def refine_response(user_prompt, character, retrieved_dialogue):
        # Construct a guided and controlled prompt
        prompt = (
            f"You are an assistant generating responses for a Friends-themed chatbot. Your task is to:\n"
            f"1. Respond in the tone and style of the specified character.\n"
            f"2. Avoid adding irrelevant details or extra sentences.\n"
            f"3. Ensure responses are casual and character-specific.\n"
            f"4. Exclude any metadata or instructional text in the response.\n\n"
            f"Examples:\n"
            f"- User Prompt: \"What's your favorite food?\"\n"
            f"  Character: Joey Tribbiani\n"
            f"  Dialogue: \"Joey doesn't share food!\"\n"
            f"  Response: \"Joey doesn't share food! But I do love a big meatball sub.\"\n\n"
            f"- User Prompt: \"Let's go for a vacation.\"\n"
            f"  Character: Ross Geller\n"
            f"  Dialogue: \"Spring vacation.\"\n"
            f"  Response: \"Spring vacation! I’ll pack my fossils!\"\n\n"
            f"User Prompt: {user_prompt}\n"
            f"Retrieved Dialogue: \"{retrieved_dialogue}\"\n"
            f"Character: {character}\n\n"
            f"Now, generate a response as the specified character, ensuring it aligns with the dialogue and the user's prompt."
        )

        payload = {"inputText": prompt}

        try:
            # Invoke the Titan Text G1 - Express model
            response = bedrock_client.invoke_model(
                modelId="amazon.titan-text-g1-express:0",
                contentType="application/json",
                accept="application/json",
                body=json.dumps(payload)
            )
            response_body = json.loads(response['body'].read())
            generated_response = response_body['results'][0]['outputText']

            # Post-process the response
            # 1. Remove metadata or prompt details
            if "User Prompt" in generated_response:
                generated_response = generated_response.split("User Prompt")[0].strip()

            # 2. Limit response length
            max_length = 150
            if len(generated_response) &amp;gt; max_length:
                generated_response = generated_response[:max_length].rsplit(" ", 1)[0] + "..."

            # 3. Ensure relevance: Fallback to retrieved dialogue if response is invalid
            if not generated_response or "irrelevant" in generated_response.lower():
                return retrieved_dialogue

            return generated_response

        except Exception as e:
            print(f"Error refining response: {e}")
            # Fallback to the retrieved dialogue in case of an error
            return retrieved_dialogue

        # Construct a controlled and guided prompt
        prompt = (
            f"You are an assistant generating responses for a Friends-themed chatbot. Your task is to:\n"
            f"1. Maintain the original tone and personality of the character.\n"
            f"2. Avoid adding irrelevant details or extra sentences.\n"
            f"3. Ensure the response aligns with the retrieved dialogue.\n"
            f"4. Make responses casual and consistent with the character's personality.\n\n"
            f"Examples:\n"
            f"- User Prompt: \"What's your favorite food?\"\n"
            f"  Character: Joey Tribbiani\n"
            f"  Dialogue: \"Joey doesn't share food!\"\n"
            f"  Response: \"Joey doesn't share food! But I do love a big meatball sub.\"\n\n"
            f"- User Prompt: \"I feel sad.\"\n"
            f"  Character: Chandler Bing\n"
            f"  Dialogue: \"I'm sorry you're feeling this way.\"\n"
            f"  Response: \"I'm sorry you're feeling this way. But remember, I can make you laugh. Want a joke?\"\n\n"
            f"User Prompt: {user_prompt}\n"
            f"Retrieved Dialogue: \"{retrieved_dialogue}\"\n"
            f"Character: {character}\n\n"
            f"Now, generate a response that refines the retrieved dialogue to better match the user's prompt while staying true to the character's tone and avoiding verbosity."
        )

        # Payload for the Bedrock API
        payload = {"inputText": prompt}

        try:
            # Invoke the Titan Text G1 - Express model
            response = bedrock_client.invoke_model(
                modelId="amazon.titan-text-express-v1",
                contentType="application/json",
                accept="application/json",
                body=json.dumps(payload)
            )
            response_body = json.loads(response['body'].read())
            generated_response = response_body['results'][0]['outputText']

            # Post-processing: Ensure the refined response adheres to guidelines
            # 1. Limit response length
            max_length = 150
            if len(generated_response) &amp;gt; max_length:
                generated_response = generated_response[:max_length] + "..."

            # 2. Ensure relevance: If response is missing or irrelevant, fallback to retrieved dialogue
            if not generated_response or "irrelevant" in generated_response.lower():  # Replace with advanced checks if needed
                return retrieved_dialogue

            return generated_response

        except Exception as e:
            print(f"Error refining response: {e}")
            # Fallback to the retrieved dialogue in case of an error
            return retrieved_dialogue

        # Construct a controlled prompt
        prompt = (
            f"You are an assistant generating responses for a Friends-themed chatbot. Your task is to:\n"
            f"1. Maintain the original tone and personality of the character.\n"
            f"2. Avoid adding irrelevant details or extra sentences.\n"
            f"3. Ensure the response aligns with the retrieved dialogue.\n\n"
            f"Here is the context:\n"
            f"- User Prompt: {user_prompt}\n"
            f"- Retrieved Dialogue: \"{retrieved_dialogue}\"\n"
            f"- Character: {character}\n\n"
            f"Now, generate a response that refines the retrieved dialogue to better match the user's prompt while staying true to the character's tone and avoiding verbosity."
        )

        # Payload for the Bedrock API
        payload = {"inputText": prompt}

        try:
            # Invoke the Titan Text G1 - Express model
            response = bedrock_client.invoke_model(
                modelId="amazon.titan-text-express-v1",
                contentType="application/json",
                accept="application/json",
                body=json.dumps(payload)
            )
            response_body = json.loads(response['body'].read())
            generated_response = response_body['results'][0]['outputText']

            # Post-processing: Ensure the refined response adheres to guidelines
            # 1. Limit response length
            max_length = 150
            generated_response = generated_response[:max_length]

            # 2. Ensure relevance by comparing with the retrieved dialogue
            # If generated response deviates significantly, fallback to the retrieved dialogue
            if not generated_response or "irrelevant" in generated_response.lower():  # Placeholder for advanced checks
                return retrieved_dialogue

            return generated_response

        except Exception as e:
            print(f"Error refining response: {e}")
            # Fallback to the retrieved dialogue in case of an error
            return retrieved_dialogue


    # Lambda function handler
    def lambda_handler(event, context):
        try:
            # Extract user input
            body = json.loads(event["body"])
            user_input = body["message"]

            # Generate embedding for user input
            user_embedding = generate_embedding(user_input)

            # Query OpenSearch for the most relevant dialogue
            result = query_opensearch(user_embedding)

            if not result:
                return {
                    "statusCode": 200,
                    "headers": {
                        "Content-Type": "application/json",
                        "Access-Control-Allow-Origin": "*"
                    },
                    "body": json.dumps({"character": "Unknown", "response": "I'm not sure how to respond to that!"})
                }

            # Refine the response
            # refined_response = f"{result['dialogue']}"
            refined_response = refine_response(user_input, result["character"], result["dialogue"])
            print(refined_response)

            # Return the refined response
            return {
                "statusCode": 200,
                "headers": {
                    "Content-Type": "application/json",
                    "Access-Control-Allow-Origin": "https://friendschat.cloudnirvana.in"
                },
                "body": json.dumps({
                    "character": result["character"],
                    "response": refined_response
                })
            }

        except Exception as e:
            return {
                "statusCode": 500,
                "headers": {
                    "Content-Type": "application/json",
                    "Access-Control-Allow-Origin": "https://friendschat.cloudnirvana.in"
                },
                "body": json.dumps({"error": str(e)})
            }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
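&lt;p&gt;With the Lambda behind API Gateway, it helps to sanity-check the request/response contract from a small client. This is a minimal sketch using only the standard library; the endpoint URL is a placeholder you must replace with your own deployed stage URL:&lt;/p&gt;

```python
import json
import urllib.request

def build_request(endpoint, message):
    """Build a POST request matching the Lambda's expected body: {"message": ...}."""
    payload = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask_chatbot(endpoint, message):
    """Send the message and return (character, response) from the API."""
    with urllib.request.urlopen(build_request(endpoint, message)) as resp:
        body = json.loads(resp.read())
    return body["character"], body["response"]

# Example (requires a deployed endpoint; URL below is a placeholder):
# character, reply = ask_chatbot(
#     "https://YOUR-API-ID.execute-api.us-east-1.amazonaws.com/prod/chat",
#     "How you doin'?",
# )
```

&lt;p&gt;If the call fails with a CORS or 403 error, double-check the POST method deployment and the Access-Control-Allow-Origin header returned by the Lambda.&lt;/p&gt;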



&lt;h2&gt;
  
  
  &lt;strong&gt;Deploy a Front-End application to chat:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Once everything is ready, let’s build a simple front-end application and host it using S3 static website hosting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you have your own domain, add it to Route 53 and point it to the S3 bucket.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use the code below to create an HTML file and host it in an S3 bucket&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Replace the API endpoint with your own API Gateway URL&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;!DOCTYPE html&amp;gt;
    &amp;lt;html lang="en"&amp;gt;
    &amp;lt;head&amp;gt;
        &amp;lt;meta charset="UTF-8"&amp;gt;
        &amp;lt;meta name="viewport" content="width=device-width, initial-scale=1.0"&amp;gt;
        &amp;lt;title&amp;gt;Friends Chatbot&amp;lt;/title&amp;gt;
        &amp;lt;style&amp;gt;
            #chat-container {
                width: 90%;
                max-width: 600px;
                margin: 20px auto;
                font-family: Arial, sans-serif;
            }
            #messages {
                height: 400px;
                overflow-y: auto;
                border: 1px solid #ccc;
                padding: 10px;
                border-radius: 5px;
                background-color: #f9f9f9;
            }
            .message {
                margin: 10px 0;
            }
            .user {
                text-align: right;
                color: blue;
            }
            .bot {
                text-align: left;
                color: green;
            }
            #input-container {
                display: flex;
                margin-top: 10px;
            }
            #user-input {
                flex: 1;
                padding: 10px;
                border: 1px solid #ccc;
                border-radius: 5px;
            }
            button {
                margin-left: 5px;
                padding: 10px 20px;
                background-color: #007bff;
                color: white;
                border: none;
                border-radius: 5px;
                cursor: pointer;
            }
            button:hover {
                background-color: #0056b3;
            }
        &amp;lt;/style&amp;gt;
    &amp;lt;/head&amp;gt;
    &amp;lt;body&amp;gt;
        &amp;lt;div id="chat-container"&amp;gt;
            &amp;lt;div id="messages"&amp;gt;&amp;lt;/div&amp;gt;
            &amp;lt;div id="input-container"&amp;gt;
                &amp;lt;input type="text" id="user-input" placeholder="Type your message..."&amp;gt;
                &amp;lt;button onclick="sendMessage()"&amp;gt;Send&amp;lt;/button&amp;gt;
            &amp;lt;/div&amp;gt;
        &amp;lt;/div&amp;gt;
        &amp;lt;script&amp;gt;
            const apiEndpoint = "replace with your API Gateway endpoint";

            function sendMessage() {
                const inputField = document.getElementById("user-input");
                const message = inputField.value.trim();
                if (!message) return;

                const messagesContainer = document.getElementById("messages");

                // Add user message
                const userMessage = document.createElement("div");
                userMessage.className = "message user";
                userMessage.textContent = message;
                messagesContainer.appendChild(userMessage);

                // Clear input
                inputField.value = "";

                // Send API request
                fetch(apiEndpoint, {
                    method: "POST",
                    headers: { "Content-Type": "application/json" },
                    body: JSON.stringify({ message }),
                })
                    .then((response) =&amp;gt; response.json())
                    .then((data) =&amp;gt; {
                        // Add bot response
                        const botMessage = document.createElement("div");
                        botMessage.className = "message bot";
                        botMessage.textContent = `${data.character}: ${data.response}`;
                        messagesContainer.appendChild(botMessage);

                        // Scroll to bottom
                        messagesContainer.scrollTop = messagesContainer.scrollHeight;
                    })
                    .catch((error) =&amp;gt; {
                        console.error("Error:", error);
                        const botMessage = document.createElement("div");
                        botMessage.className = "message bot";
                        botMessage.textContent = "Error connecting to the chatbot.";
                        messagesContainer.appendChild(botMessage);
                    });
            }
        &amp;lt;/script&amp;gt;
    &amp;lt;/body&amp;gt;
    &amp;lt;/html&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it! Visit my hosted solution using the link shared above and leave your feedback in the comments.&lt;/p&gt;

&lt;p&gt;Thanks 😀&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>apigateway</category>
      <category>rag</category>
    </item>
    <item>
      <title>Building an Event-Driven Architecture for Content Embedding Generation with AWS Bedrock, DynamoDb, and AWS Batch</title>
      <dc:creator>Salam Shaik</dc:creator>
      <pubDate>Fri, 20 Dec 2024 06:05:50 +0000</pubDate>
      <link>https://dev.to/aws-builders/building-an-event-driven-architecture-for-content-embedding-generation-with-aws-bedrock-dynamodb-48ji</link>
      <guid>https://dev.to/aws-builders/building-an-event-driven-architecture-for-content-embedding-generation-with-aws-bedrock-dynamodb-48ji</guid>
      <description>&lt;p&gt;Hello everyone,&lt;/p&gt;

&lt;p&gt;In this blog, I’ll walk you through building an &lt;strong&gt;event-driven pipeline&lt;/strong&gt; that converts the contents in &lt;strong&gt;DynamoDB&lt;/strong&gt; into &lt;strong&gt;embeddings&lt;/strong&gt;, making them searchable via &lt;strong&gt;OpenSearch&lt;/strong&gt; for vector search. The goal of this pipeline is to automatically handle the entire process whenever new content is added or existing content is modified in DynamoDB&lt;/p&gt;

&lt;p&gt;This event-driven architecture triggers each step in the process seamlessly, converting newly added or updated items into embeddings and storing them in OpenSearch&lt;/p&gt;

&lt;p&gt;One of my key design goals is to minimize the amount of code needed to connect services and to reduce the reliance on Lambda functions. Instead, I focused on leveraging AWS services and &lt;strong&gt;EventBridge&lt;/strong&gt; to connect and automate the workflow.&lt;/p&gt;

&lt;p&gt;This is what the workflow looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzo3ah3kex7z96tqv9pyx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzo3ah3kex7z96tqv9pyx.png" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before diving into the implementation, let’s prepare the scripts for inserting data, converting content into embeddings, and inserting them into OpenSearch&lt;/p&gt;

&lt;p&gt;For this POC I am using this dataset from Kaggle &lt;a href="https://www.kaggle.com/datasets/fernandogarciah24/top-1000-imdb-dataset" rel="noopener noreferrer"&gt;https://www.kaggle.com/datasets/fernandogarciah24/top-1000-imdb-dataset&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code for inserting data into Dynamo DB:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import csv
import boto3
import uuid

# DynamoDB table name
DYNAMODB_TABLE_NAME = "content"

# Initialize DynamoDB client
dynamodb = boto3.resource('dynamodb',region_name="us-east-1")
table = dynamodb.Table(DYNAMODB_TABLE_NAME)

def process_csv_and_insert_to_dynamodb(csv_file_path):
    try:
        # Open and read the CSV file
        with open(csv_file_path, mode='r', encoding='utf-8') as file:
            csv_reader = csv.DictReader(file)
            content_id = 0  # Start content_id from 0

            # Iterate over each row in the CSV
            for row in csv_reader:
                # Prepare item for DynamoDB
                item = {
                    'content_id': content_id,
                    'content_title': row['Series_Title'],
                    'genre': row['Genre'],
                    'overview': row['Overview']
                }

                # Insert item into DynamoDB
                table.put_item(Item=item)

                print(f"Inserted: {item}")
                content_id += 1  # Increment content_id
                if content_id == 65:  # stop after the first 65 records for testing
                    break
    except Exception as e:
        print(f"Error: {e}")

# Provide the path to your CSV file
csv_file_path = "movies.csv"

# Call the function
process_csv_and_insert_to_dynamodb(csv_file_path)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Code for converting content into Embeddings:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
import json
import os
import time
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests.auth import HTTPBasicAuth
from requests_aws4auth import AWS4Auth


credentials = boto3.Session().get_credentials()
aws_auth = AWS4Auth(
    credentials.access_key,
    credentials.secret_key,
    'us-east-1',
    'aoss',  # Service name for OpenSearch Serverless
    session_token=credentials.token
)


QUEUE_URL = "queue url"
OPENSEARCH_ENDPOINT = "open search serverless endpoint"
INDEX_NAME = "contents"
AWS_REGION = 'us-east-1'

# AWS Clients
sqs = boto3.client('sqs', region_name=AWS_REGION)
bedrock_runtime = boto3.client('bedrock-runtime', region_name=AWS_REGION)

# OpenSearch Client
def get_opensearch_client():
    return OpenSearch(
        hosts=[{'host': OPENSEARCH_ENDPOINT, 'port': 443}],
        http_auth=aws_auth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection
    )

# Function to poll messages from SQS
def poll_sqs_messages():
    response = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,  # Fetch up to 10 messages
        WaitTimeSeconds=10
    )
    return response.get('Messages', [])

# Function to call Amazon Titan for embedding generation
def generate_embeddings(text):
    payload = {
        "inputText": text
    }
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        contentType="application/json",
        accept="application/json",
        body=json.dumps(payload)
    )
    response_body = json.loads(response['body'].read())
    return response_body.get('embedding')

# Function to store embeddings in OpenSearch
def store_embeddings_in_opensearch(content_id, embedding, content_title,genre):
    client = get_opensearch_client()
    print("got the client")
    document = {
        "title": content_title,
        "overview": embedding,
        "genre": genre,
        "content_id": content_id
    }
    print("got the document")
    response = client.index(
        index=INDEX_NAME,
        body=document
    )
    print("got the response")
    return response

# Main Processing Function
def main():
    print("Starting Batch Job to process SQS messages...")

    messages = poll_sqs_messages()
    if not messages:
        print("No messages found in the queue. Exiting.")
        return

    for message in messages:
        try:
            body = json.loads(message['Body'])
            db_record = body['dynamodb']['NewImage']
            content_title = db_record['content_title']['S']
            overview = db_record['overview']['S']
            content_id = db_record['content_id']['N']
            genre = db_record['genre']['S']

            # Generate Embedding
            embedding = generate_embeddings(overview)
            print(f"Generated embedding for content: {content_title}")

            # Store in OpenSearch
            store_embeddings_in_opensearch(content_id, embedding, content_title,genre)
            # Delete the message from SQS after successful processing
            # (commented out while testing; enable it in production)
            # sqs.delete_message(
            #     QueueUrl=QUEUE_URL,
            #     ReceiptHandle=message['ReceiptHandle']
            # )

        except Exception as e:
            print(f"Error processing message: {str(e)}")
            continue

    print("Batch Job completed successfully.")

if __name__ == "__main__":
    main()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Create a Dockerfile, build a Docker image, and push it to ECR
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM python:3.9-slim
    USER root

    # Install dependencies
    RUN pip install boto3 opensearch-py requests requests-aws4auth

    # Copy the script into the container
    COPY process_embeddings_batch.py /app/process_embeddings_batch.py

    # Default command
    CMD ["python", "/app/process_embeddings_batch.py"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit the ECR console and click on Create repository. Provide a name for the repo, and once it is created, click on the View push commands button&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use the push commands to build, tag, and push the Docker image&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step-by-step implementation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create a DynamoDB table and enable streams&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create an SQS queue to hold the DB records&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create an EventBridge Pipe to connect DynamoDB and SQS&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a CloudWatch alarm for when the queue holds more than 50 messages&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create an AWS Batch job definition to run jobs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a Step Functions state machine to submit a job to AWS Batch&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create an EventBridge rule to listen for the alarm and trigger the state machine&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create an OpenSearch Serverless collection and index&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Create a DynamoDB table and enable streams:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Visit the DynamoDB service, click on the Create table button, provide a name for the table, set &lt;strong&gt;content_id&lt;/strong&gt; as the partition key, and choose &lt;strong&gt;Number&lt;/strong&gt; as its type
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz4rnlve642bc2vprxw51.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz4rnlve642bc2vprxw51.png" width="800" height="235"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visit the &lt;strong&gt;Exports and streams&lt;/strong&gt; tab and enable the stream &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdvpgoed1alqhlocm3j9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdvpgoed1alqhlocm3j9.png" width="800" height="89"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiw3n8v0nohfl0nmexsyi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiw3n8v0nohfl0nmexsyi.png" width="800" height="232"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6mdh5fn7qf343ub5prn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6mdh5fn7qf343ub5prn.png" width="800" height="114"&gt;&lt;/a&gt;&lt;/p&gt;
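&lt;p&gt;If you prefer scripting over the console, the same table and stream setup can be sketched with boto3. The table name matches the insert script earlier; the billing mode and stream view type are assumptions (NEW_IMAGE matches what the batch script later reads from the stream record):&lt;/p&gt;

```python
def build_table_spec(table_name="content"):
    """Table definition: content_id (Number) partition key, NEW_IMAGE stream."""
    return {
        "TableName": table_name,
        "KeySchema": [{"AttributeName": "content_id", "KeyType": "HASH"}],
        "AttributeDefinitions": [{"AttributeName": "content_id", "AttributeType": "N"}],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
        "StreamSpecification": {"StreamEnabled": True, "StreamViewType": "NEW_IMAGE"},
    }

def create_table(region="us-east-1"):
    # Requires AWS credentials with dynamodb:CreateTable permission
    import boto3
    client = boto3.client("dynamodb", region_name=region)
    return client.create_table(**build_table_spec())
```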

&lt;h2&gt;
  
  
  &lt;strong&gt;Create an SQS queue to hold the DB records:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Visit the SQS service in the AWS console and click on Create queue. Message ordering isn’t needed here, so go with a standard queue. Provide a name for the queue and click on Create queue&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrbj7zujssds71gcya4y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrbj7zujssds71gcya4y.png" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;
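&lt;p&gt;The same queue can be created with a short boto3 sketch. The queue name is a placeholder; the long-polling wait time mirrors the batch poller shown earlier, and the visibility timeout is an assumption sized to give the job time to finish:&lt;/p&gt;

```python
def build_queue_attributes():
    """Standard queue attributes (values are strings, per the SQS API)."""
    return {
        "ReceiveMessageWaitTimeSeconds": "10",  # matches WaitTimeSeconds in the poller
        "VisibilityTimeout": "300",             # assumption: 5 minutes per batch of messages
    }

def create_queue(name="content-embeddings-queue", region="us-east-1"):
    # Requires AWS credentials with sqs:CreateQueue permission
    import boto3
    sqs = boto3.client("sqs", region_name=region)
    return sqs.create_queue(QueueName=name, Attributes=build_queue_attributes())["QueueUrl"]
```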

&lt;h2&gt;
  
  
  &lt;strong&gt;Create an EventBridge Pipe to connect DynamoDB and SQS:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit Event Bridge Service and click on Create pipe&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select Source as DynamoDB and Target as SQS &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhvkz6sv750gagwkxipj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhvkz6sv750gagwkxipj.png" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6fo8u46q9f7pofpabm6b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6fo8u46q9f7pofpabm6b.png" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click on Create Pipe. This will push the DynamoDB stream records to SQS&lt;/li&gt;
&lt;/ul&gt;
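&lt;p&gt;The console steps above can also be sketched with the boto3 EventBridge Pipes API. The pipe name is a placeholder, and the stream, queue, and role ARNs are assumptions for your own resources (the role needs permission to read the stream and send to the queue):&lt;/p&gt;

```python
def build_pipe_spec(stream_arn, queue_arn, role_arn, name="ddb-to-sqs-pipe"):
    """Pipe definition: DynamoDB stream as source, SQS queue as target."""
    return {
        "Name": name,
        "RoleArn": role_arn,
        "Source": stream_arn,
        "Target": queue_arn,
        "SourceParameters": {
            "DynamoDBStreamParameters": {"StartingPosition": "LATEST", "BatchSize": 1}
        },
    }

def create_pipe(spec, region="us-east-1"):
    # Requires AWS credentials with pipes:CreatePipe and iam:PassRole permissions
    import boto3
    return boto3.client("pipes", region_name=region).create_pipe(**spec)
```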

&lt;h2&gt;
  
  
  &lt;strong&gt;Create a CloudWatch alarm for when the queue holds more than 50 messages:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit the CloudWatch service in the AWS console. From the side panel, click on All alarms and then on Create alarm&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select SQS &lt;strong&gt;ApproximateNumberOfMessagesVisible&lt;/strong&gt; as the metric &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feiaeic4w2qx40kghfxxs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feiaeic4w2qx40kghfxxs.png" width="800" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set the condition to Greater than 50 messages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgea6rwccuja6r0op1n7l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgea6rwccuja6r0op1n7l.png" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Skip the actions, provide a name for the alarm, and click on Create Alarm&lt;/li&gt;
&lt;/ul&gt;
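The same alarm the console steps above create can be expressed as boto3 parameters. This is a sketch, not the exact console output; the queue name is a placeholder for your own queue, and the alarm name matches the one used in the EventBridge pattern later in this post:

```python
# Parameters for CloudWatch's put_metric_alarm; "my-batch-queue" is a placeholder.
alarm_params = {
    "AlarmName": "sqs-threshold-alarm",
    "Namespace": "AWS/SQS",
    "MetricName": "ApproximateNumberOfMessagesVisible",
    "Dimensions": [{"Name": "QueueName", "Value": "my-batch-queue"}],
    "Statistic": "Maximum",
    "Period": 60,                 # evaluate the metric every 60 seconds
    "EvaluationPeriods": 1,
    "Threshold": 50,              # fire when more than 50 messages are visible
    "ComparisonOperator": "GreaterThanThreshold",
}

# To actually create the alarm (requires AWS credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
```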

&lt;p&gt;&lt;strong&gt;Create an AWS Batch job definition to run jobs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click on Create a job definition, select Fargate as the orchestration type, set the storage to 21GB, keep the rest of the fields as they are, and move to the next section&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulaxgacbweos50mxpyzw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulaxgacbweos50mxpyzw.png" width="800" height="953"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paste the URI of the ECR image we created at the start of the blog and keep the rest of the fields as they are&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3grd49rf3mqestmqz44w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3grd49rf3mqestmqz44w.png" width="800" height="842"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep the user as &lt;strong&gt;root&lt;/strong&gt; and select AWS Logs as the logging driver&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fucxezdynisjf4zxwa2md.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fucxezdynisjf4zxwa2md.png" width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Review everything and create the job definition&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;We also need a job queue and a compute environment. Provide the basic details and create a job queue backed by a compute environment&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Here is how to create a compute environment&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3bkz09p9s5lj8ae7l6rb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3bkz09p9s5lj8ae7l6rb.png" width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp6swjlus2trc4pomfs4u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp6swjlus2trc4pomfs4u.png" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1m0iw5nvl5kdsbyepa8a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1m0iw5nvl5kdsbyepa8a.png" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visit the Job Queues section and create a job queue with the compute environment we just created&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwbw03j1p821p0hesc5yw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwbw03j1p821p0hesc5yw.png" width="800" height="725"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Now we have a job definition and a job queue. Let’s create a state machine to trigger jobs&lt;/li&gt;
&lt;/ul&gt;
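Whether the job is submitted from the state machine or directly via boto3, the call boils down to the same parameters. A sketch, where the queue and job definition names are placeholders for the ones you created above:

```python
# Parameters for AWS Batch's submit_job; names below are placeholders.
submit_job_params = {
    "jobName": "process-sqs-batch",
    "jobQueue": "video-processing-queue",
    "jobDefinition": "video-processing-job-def",
    # Environment overrides tell the container which queue to drain (hypothetical variable name).
    "containerOverrides": {
        "environment": [
            {"name": "SQS_QUEUE_URL",
             "value": "https://sqs.us-east-1.amazonaws.com/<account-id>/<queue-name>"}
        ]
    },
}

# To actually submit (requires AWS credentials):
# import boto3
# boto3.client("batch").submit_job(**submit_job_params)
```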

&lt;h2&gt;
  
  
  &lt;strong&gt;Create a State Machine to Submit a job to AWS Batch:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit the Step Functions service and click on Create a state machine&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select the Submit Job state and choose the job definition and queue we created above&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For this sample, I am using only the one required state. Add more states based on your needs&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffkpz581wsm6brea7a5ag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffkpz581wsm6brea7a5ag.png" width="515" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click on the Submit Job state and select the job queue and job definition from the right-side panel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tah8auq87m63ixbv99o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tah8auq87m63ixbv99o.png" width="800" height="352"&gt;&lt;/a&gt;&lt;/p&gt;
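Behind the console editor, the Submit Job state compiles down to Amazon States Language roughly like this. The ARNs and names below are placeholders; substitute the queue and job definition you created:

```json
{
  "StartAt": "SubmitBatchJob",
  "States": {
    "SubmitBatchJob": {
      "Type": "Task",
      "Resource": "arn:aws:states:::batch:submitJob.sync",
      "Parameters": {
        "JobName": "process-sqs-batch",
        "JobQueue": "arn:aws:batch:us-east-1:123456789012:job-queue/video-processing-queue",
        "JobDefinition": "arn:aws:batch:us-east-1:123456789012:job-definition/video-processing-job-def"
      },
      "End": true
    }
  }
}
```

The `.sync` suffix makes the state wait for the Batch job to finish before the execution moves on; drop it if fire-and-forget is enough.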

&lt;ul&gt;
&lt;li&gt;Now save the state machine. Let’s connect the SQS queue and this state machine through the CloudWatch alarm we created above&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Create a rule in EventBridge to listen for the alarm and trigger the Step Functions state machine:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit the EventBridge service and select Rules from the side menu&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide a name for the rule and select Rule with an event pattern&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrvl1l4cyug4bybdiujb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrvl1l4cyug4bybdiujb.png" width="800" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the event pattern, paste the following JSON
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; {
      "source": ["aws.cloudwatch"],
      "detail-type": ["CloudWatch Alarm State Change"],
      "detail": {
        "state": {
          "value": ["ALARM"]
        },
        "alarmName": ["sqs-threshold-alarm"]
      }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
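To sanity-check the pattern before creating the rule, the matching logic for this exact-value pattern can be simulated in a few lines. This is a deliberate simplification; real EventBridge matching supports many more operators (prefix, numeric, anything-but, and so on):

```python
def matches(pattern, event):
    """Simplified EventBridge matching: every pattern key must exist in the
    event, and a leaf list means 'the event value is one of these'."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not matches(expected, event[key]):
                return False
        elif event[key] not in expected:  # leaf: list of allowed values
            return False
    return True

pattern = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {"state": {"value": ["ALARM"]}, "alarmName": ["sqs-threshold-alarm"]},
}

# Trimmed-down versions of the events CloudWatch emits on state changes.
alarm_event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {"state": {"value": "ALARM"}, "alarmName": "sqs-threshold-alarm"},
}
ok_event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {"state": {"value": "OK"}, "alarmName": "sqs-threshold-alarm"},
}

print(matches(pattern, alarm_event))  # True  -&gt; rule fires, state machine starts
print(matches(pattern, ok_event))     # False -&gt; returning to OK does not trigger it
```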



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjwtnzz37zcam4c5lju5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjwtnzz37zcam4c5lju5.png" width="800" height="827"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select the target as the state machine we created and click on Create rule&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmgnfno3m1dxvy81epdcq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmgnfno3m1dxvy81epdcq.png" width="800" height="559"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Create an OpenSearch Serverless collection and index:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Visit the Serverless section of the OpenSearch Service, click on Create collection, provide the details as shown below, and click on Create collection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbxtrouyfdkuhamkkyay.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbxtrouyfdkuhamkkyay.png" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fouhyzjr1432w6melhly4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fouhyzjr1432w6melhly4.png" width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flf8hk7ta86lm80iwxjcl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flf8hk7ta86lm80iwxjcl.png" width="800" height="880"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Once the collection is created, click on Create index with the following configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fit4nr6uen5slspy8lg4k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fit4nr6uen5slspy8lg4k.png" width="800" height="503"&gt;&lt;/a&gt;&lt;/p&gt;
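If you prefer Dev Tools over the console wizard, a vector index with the same shape can be created with a request like the one below. This is a sketch: the index and field names are placeholders, and the 1536 dimension assumes Titan text embeddings.

```json
PUT video-scenes
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "scene_description": { "type": "text" },
      "embedding": { "type": "knn_vector", "dimension": 1536 }
    }
  }
}
```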

&lt;p&gt;&lt;strong&gt;That’s it. Everything is ready and the services are connected. Let’s run the Python insert script to load data into DynamoDB and check whether everything is triggered correctly.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note: Once the threshold is reached, it can take some time for the state machine to be triggered&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;These are the jobs that were triggered whenever I inserted data into DynamoDB&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7cee961xfjjz6o4iu9q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7cee961xfjjz6o4iu9q.png" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verify whether the records were inserted into the index. You can check the record count or the dashboard to see them&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpmwtuj4lh79uopklt3y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpmwtuj4lh79uopklt3y.png" width="800" height="116"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the data is ready, you can build a vector search engine by following my previous article here&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/aws-builders/building-a-vector-based-search-engine-using-amazon-bedrock-and-amazon-open-search-service-3jom"&gt;Building a vector search Engine&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please let me know through the comments if you are stuck anywhere and need any assistance. Thanks. Have a great day&lt;/p&gt;

</description>
      <category>eventdriven</category>
      <category>aws</category>
      <category>awsbatch</category>
      <category>opensearch</category>
    </item>
    <item>
      <title>Building a movie suggestion Bot using AWS Bedrock, Amazon Lex, and OpenSearch</title>
      <dc:creator>Salam Shaik</dc:creator>
      <pubDate>Tue, 26 Nov 2024 22:53:15 +0000</pubDate>
      <link>https://dev.to/aws-builders/building-a-movie-suggestion-bot-using-aws-bedrock-amazon-lex-and-opensearch-52ci</link>
      <guid>https://dev.to/aws-builders/building-a-movie-suggestion-bot-using-aws-bedrock-amazon-lex-and-opensearch-52ci</guid>
      <description>&lt;p&gt;Hi Everyone,&lt;/p&gt;

&lt;p&gt;For the last few weeks, I have been trying to build a chatbot that suggests movies based on user prompts.&lt;/p&gt;

&lt;p&gt;I am not an expert in the AI/ML field. With the knowledge I have and the AWS services I know, I tried to build this chatbot using the following AWS services&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Bedrock(Titan and Claude models)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OpenSearch&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AWS Lex&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lambda&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whenever I try to spell AWS Lex it reminds me of the villain &lt;strong&gt;Lex Luthor&lt;/strong&gt; from the &lt;strong&gt;Superman&lt;/strong&gt; series&lt;/p&gt;

&lt;p&gt;Let me introduce each service I used and the role of that service in building this chatbot&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bedrock:&lt;/strong&gt; This service provides models from Amazon and several third parties. From the available models, we will use the &lt;strong&gt;Claude v2&lt;/strong&gt; model to process user prompts and the &lt;strong&gt;Titan&lt;/strong&gt; model to generate embeddings for the movie content we have.&lt;/p&gt;

&lt;p&gt;If you don't know what an embedding is, think of it as an array of numbers that helps an ML model understand the relationships between pieces of real-world data. &lt;/p&gt;
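A toy example of what "similar" means for these arrays of numbers: embeddings that point in a similar direction have a high cosine similarity. The three-dimensional vectors below are made up for illustration; real Titan embeddings have 1536 dimensions:

```python
import math

def cosine_similarity(a, b):
    # 1.0 means the embeddings point the same way; near 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Made-up 3-dimensional "embeddings" for illustration only.
heist_movie   = [0.9, 0.1, 0.0]
caper_movie   = [0.8, 0.2, 0.1]
romance_movie = [0.0, 0.1, 0.9]

# The heist movie is far closer to the caper than to the romance.
print(cosine_similarity(heist_movie, caper_movie) >
      cosine_similarity(heist_movie, romance_movie))  # True
```

This is exactly the kind of comparison OpenSearch's k-NN search performs at scale over the stored vectors.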

&lt;p&gt;&lt;strong&gt;OpenSearch:&lt;/strong&gt; We will use this service to store the embeddings generated by the Bedrock Titan model and to query those embeddings to find similar movies&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS Lex:&lt;/strong&gt; This service will help us build a chatbot. It acts as a bridge that takes the user prompt, triggers the movie-suggestion logic based on intent detection, and responds to the user with movie suggestions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lambda:&lt;/strong&gt; This is where the core logic runs: it takes the user prompt from the Lex bot, queries the OpenSearch index, and responds to the Lex bot with movie suggestions&lt;/p&gt;

&lt;p&gt;Let’s dive into the implementation&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is the movie dataset I am using for this task &lt;a href="https://www.kaggle.com/datasets/kayscrapes/movie-dataset" rel="noopener noreferrer"&gt;https://www.kaggle.com/datasets/kayscrapes/movie-dataset&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation Steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create an OpenSearch Cluster and an index for dumping Embeddings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generate Embeddings and dump them into the OpenSearch Index&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement a Lambda function to query the OpenSearch Index&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a chatbot and connect it to the Lambda&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Create an OpenSearch Cluster and an index for dumping Embeddings:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit the OpenSearch service in the AWS console and click on the Create Domain button&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide a name for the domain, choose the Standard create option and the Dev/test template as shown below&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmp95lz7hbxi9k2zf51d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmp95lz7hbxi9k2zf51d.png" width="800" height="699"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploying it in a Single AZ without any standby&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmz470hu7d6alpny5dyt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmz470hu7d6alpny5dyt.png" width="800" height="503"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose General Purpose instances and select the &lt;strong&gt;t3.medium.search&lt;/strong&gt; instance type for this task, with 1 node and 10GB of storage per node&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fepnk3h71meeafsojc8ia.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fepnk3h71meeafsojc8ia.png" width="800" height="718"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I am giving public access with IPv4 only for this task&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftq76xit7drk3qeoasetp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftq76xit7drk3qeoasetp.png" width="800" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable fine-grained access and create a master user like below&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvmymzq548r9ayzmue9z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvmymzq548r9ayzmue9z.png" width="800" height="510"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Change the access control to allow all users to access the endpoint for this task&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5x000w158yawdhqcw0h0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5x000w158yawdhqcw0h0.png" width="800" height="732"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Keep the rest of the options as they are and click on Create. It will take some time for the domain to be created&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the domain is created, access the OpenSearch dashboard from the console and provide the master user credentials to log in&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Visit the &lt;strong&gt;Dev Tools&lt;/strong&gt; option from the side menu and paste the following code to create the index &lt;strong&gt;movies&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PUT movies
    {
      "settings": {
        "index.knn": true
      },
      "mappings": {
        "properties": {
          "title_org": {
            "type": "text"
          },
          "title": {
            "type": "knn_vector",
            "dimension": 1536
          },
          "summary": {
            "type": "knn_vector",
            "dimension": 1536
          },
          "short_summary": {
            "type": "knn_vector",
            "dimension": 1536
          },
          "director": {
            "type": "knn_vector",
            "dimension": 1536
          },
          "writers": {
            "type": "knn_vector",
            "dimension": 1536
          },
          "cast": {
            "type": "knn_vector",
            "dimension": 1536
          }
        }
      }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Generate Embeddings and dump them into the OpenSearch Index:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;As we have the OpenSearch cluster and index ready, let’s generate the embeddings and dump them into the index&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Download the movie dataset using this link &lt;a href="https://www.kaggle.com/datasets/kayscrapes/movie-dataset" rel="noopener noreferrer"&gt;https://www.kaggle.com/datasets/kayscrapes/movie-dataset&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Extract the Zip file and have a look at the CSV file which contains movie data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a &lt;strong&gt;gen_emb.py&lt;/strong&gt; file with the following code to generate embeddings and dump them into the index. Install the &lt;strong&gt;pandas&lt;/strong&gt;, &lt;strong&gt;boto3&lt;/strong&gt;, &lt;strong&gt;botocore&lt;/strong&gt;, and &lt;strong&gt;opensearch-py&lt;/strong&gt; libraries using &lt;strong&gt;pip&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
    import json
    import botocore
    import pandas as pd

    from opensearchpy import OpenSearch


    ##opensearch configs
    host = 'paste open-search-endpoint here'  
    port = 443 
    auth = ('username', 'password')
    index_name = "movies"


    ##creating opensearch client
    client = OpenSearch(
        hosts=[{'host': host, 'port': port}],
        http_auth=auth,
        use_ssl=True,
        verify_certs=True
    )


    ##filtering columns using pandas
    df = pd.read_csv('hydra_movies.csv')
    refined_df = df[["Title","Summary","Short Summary","Director","Writers","Cast"]]
    refined_df.info()
    refined_df = refined_df.fillna("Unknown")




    ##connecting to bedrock runtime
    session = boto3.Session(region_name='us-east-1')
    bedrock_client = session.client('bedrock-runtime')


    ##generate embedding using titan model
    def generate_embedding(value):
        try:
            body = json.dumps({"inputText": value})
            modelId = "amazon.titan-embed-text-v1"
            accept = "application/json"
            contentType = "application/json"

            response = bedrock_client.invoke_model(
                body=body, modelId=modelId, accept=accept, contentType=contentType
            )
            response_body = json.loads(response.get("body").read())

            return response_body
        except botocore.exceptions.ClientError as error:
            print(error)

    ##creating a document to insert
    def create_document(title,title_emb,summary_emb,short_summary_emb,director_emb,writers_emb,cast_emb):
        document = {
            'title_org':title,
            'title':title_emb['embedding'],
            'summary':summary_emb['embedding'],
            'short_summary':short_summary_emb['embedding'],
            'director':director_emb['embedding'],
            'writers':writers_emb['embedding'],
            'cast':cast_emb['embedding']
        }

        insert_document(document)

    ##inserting document into opensearch
    def insert_document(document):
        client.index(index=index_name, body=document)


    ##iterating through each row in the data frame and requesting embeddings
    for index, row in refined_df.iterrows():
        title = row['Title']
        summary = row['Summary']
        short_summary = row['Short Summary']
        director = row['Director']
        writers = row["Writers"]
        cast = row["Cast"]
        title_embedding = generate_embedding(title)
        summary_embedding = generate_embedding(summary)
        short_summary_embedding = generate_embedding(short_summary)
        director_embedding = generate_embedding(director)
        writers_embedding = generate_embedding(writers)
        cast_embedding = generate_embedding(cast)
        create_document(title,title_embedding,summary_embedding,short_summary_embedding,director_embedding,writers_embedding,cast_embedding)
        print(f"inserted:{index}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Run this file using Python, visit the Query WorkBench from the OpenSearch console side menu, and use the SQL queries to see whether embeddings are inserted or not&lt;/li&gt;
&lt;/ul&gt;
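For example, in the Query Workbench a quick sanity check could look like this (assuming the index name and the <code>title_org</code> field from the script above):

```sql
-- Count the indexed movies and peek at a few titles
SELECT COUNT(*) FROM movies;
SELECT title_org FROM movies LIMIT 5;
```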

&lt;h2&gt;
  
  
  &lt;strong&gt;Implement a Lambda function to query the OpenSearch Index:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Now that we have the index ready with embeddings, let’s create a Lambda function to query the index for similar movies&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit the Lambda service from the AWS Console and click on the Create function button&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Give a name to the function and choose the runtime as &lt;strong&gt;Python 3.9&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Paste the following code in the Lambda function code&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from opensearchpy import OpenSearch
    import json

    import boto3
    import botocore


    ##open search configs
    host = 'paste-open-search-endpoint-here'  
    port = 443 
    auth = ('username', 'password')
    index_name = "movies"

    ##bedrock connection
    session = boto3.Session(region_name='us-east-1')
    bedrock_client = session.client('bedrock-runtime')



    ##creating opensearch client
    client = OpenSearch(
        hosts=[{'host': host, 'port': port}],
        http_auth=auth,
        use_ssl=True,
        verify_certs=True
    )

    def lambda_handler(event,context):

      print("Event: ", event)

      prompt = event['inputTranscript']
      print(f"received prompt is {prompt}")
      embedding = generate_embedding(prompt)
      response = run_query(embedding)
      return response

    #generating embedding for user input 
    def generate_embedding(value):
        try:
            body = json.dumps({"inputText": value})
            modelId = "amazon.titan-embed-text-v1"
            accept = "application/json"
            contentType = "application/json"

            response = bedrock_client.invoke_model(
                body=body, modelId=modelId, accept=accept, contentType=contentType
            )
            response_body = json.loads(response.get("body").read())

            return response_body['embedding']
        except botocore.exceptions.ClientError as error:
            print(error)




    def run_query(query_embedding):
        try:
            # Build one knn clause per embedded field: a document matches when
            # any of its vector fields is close to the query embedding
            vector_fields = ["title", "summary", "short_summary", "cast", "writers", "director"]
            query = {
                "query": {
                    "bool": {
                        "should": [
                            {"knn": {field: {"vector": query_embedding, "k": 100}}}
                            for field in vector_fields
                        ]
                    }
                }
            }

            # Run the query
            response = client.search(index=index_name, body=query)

            # Extract and print the results
            hits = response['hits']['hits']
            titles = ", ".join(hit['_source']['title_org'] for hit in hits)



            return respond_with_message(f"Based on your prompt, I suggest the following movies: {titles}")

        except Exception as e:
            print(f"Error running query: {e}")
            return None

    def respond_with_message(message):
        """
        Helper function to create a response message for Lex V2.
        """
        return {
            "sessionState": {
                "dialogAction": {
                    "type": "Close"
                },
                "intent": {
                    "name": "SuggestMovie",
                    "state": "Fulfilled"
                }
            },
            "messages": [
                {
                    "contentType": "PlainText",
                    "content": message
                }
            ]
        }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;This Lambda function needs the &lt;strong&gt;opensearch-py&lt;/strong&gt;, &lt;strong&gt;boto3&lt;/strong&gt;, and &lt;strong&gt;botocore&lt;/strong&gt; libraries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Install all the libraries into a folder named &lt;strong&gt;python&lt;/strong&gt; and keep that folder inside another folder named &lt;strong&gt;lambda-layer&lt;/strong&gt;. Run the following command from inside the &lt;strong&gt;python&lt;/strong&gt; folder to install the libraries there&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install opensearch-py boto3 botocore -t .&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Zip the contents of the &lt;strong&gt;lambda-layer&lt;/strong&gt; folder so that the &lt;strong&gt;python&lt;/strong&gt; folder sits at the root of the zip, create a &lt;strong&gt;Lambda Layer&lt;/strong&gt; from it, and attach the layer to the Lambda function so that the function can access the libraries&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make sure to add the necessary permissions to the Lambda execution role to invoke the Bedrock models&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
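&lt;p&gt;The packaging steps above can be sketched as shell commands (folder names follow the article; the zip file name is illustrative):&lt;/p&gt;

```shell
# Build the layer layout described above: third-party libraries must sit
# in a folder named "python" so Lambda unpacks them under /opt/python
mkdir -p lambda-layer/python
pip install opensearch-py boto3 botocore -t lambda-layer/python
# Zip the python folder; upload lambda-layer/layer.zip as a new Lambda Layer
(cd lambda-layer && zip -r layer.zip python)
ls lambda-layer
```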

&lt;h2&gt;
  
  
  &lt;strong&gt;Create a chatbot and connect it to the Lambda:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Now that the Lambda function and the cluster are ready with the data and queries, let’s create the bot&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit the &lt;strong&gt;Lex&lt;/strong&gt; service from the AWS Console&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on the Create bot button and give the bot a name. Choose the &lt;strong&gt;Generative AI&lt;/strong&gt; creation method&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbb8wlk2o7n9jjwfj9fb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbb8wlk2o7n9jjwfj9fb.png" width="800" height="651"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose to create a new IAM role and select No for COPPA for now&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6vjyl4vq9vp3cvk25ay.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6vjyl4vq9vp3cvk25ay.png" width="800" height="566"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Leave the other fields as they are and click on the Next button. Keep the language options as they are and provide a description for the bot to be created.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8q64vwq8ah764fcw0q8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8q64vwq8ah764fcw0q8.png" width="800" height="735"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Click on the Done button and wait for the bot to be created&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the bot is created, visit the &lt;strong&gt;Intents&lt;/strong&gt; section from the side menu under &lt;strong&gt;All Languages&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on the &lt;strong&gt;Add Intent&lt;/strong&gt; button and choose &lt;strong&gt;Empty Intent&lt;/strong&gt; from the dropdown&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide a name for the intent. Here I am using &lt;strong&gt;SuggestMovie&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide the utterance generation description like below&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3bjndph4clqjsv0yifs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo3bjndph4clqjsv0yifs.png" width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add the sample Utterance like below&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8njuyo7s2xwvw1pqauhw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8njuyo7s2xwvw1pqauhw.png" width="800" height="635"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scroll down to the end of the page and choose the Lambda function trigger&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ieo0vfhfnals7dkniwe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ieo0vfhfnals7dkniwe.png" width="800" height="199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;What we did here is add a sample intent so that the bot can recognize any prompt similar to the ones we provided and trigger the Lambda function we created.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on the Save intent button and go back to the previous section&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Attaching the Lambda function to the bot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;From the side menu, under the &lt;strong&gt;Deployment&lt;/strong&gt; section, click on &lt;strong&gt;Aliases&lt;/strong&gt; and choose the alias that is listed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on the language you created under the Languages section&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhofoeeodyliydudnzym1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhofoeeodyliydudnzym1.png" width="800" height="227"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose the Lambda we created and Click on Save&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqz6094zsyheqy5t7vz3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqz6094zsyheqy5t7vz3.png" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Now click on Aliases from the side menu and click on the Build button to build the bot with all the changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr1pxfgs9gnvwhx423u7t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr1pxfgs9gnvwhx423u7t.png" width="800" height="58"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Once the build succeeds, click on the Test button and provide prompts like this&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkatr0sledk1hm9non90g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkatr0sledk1hm9non90g.png" width="351" height="667"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv3hba9tm8qlemqssi03j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv3hba9tm8qlemqssi03j.png" width="347" height="667"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You should see movie suggestions like this. Now edit the fallback intent so that the bot gives a sensible response when the user’s prompt is not about movie suggestions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcg9zsw5hbd8fznohhaz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcg9zsw5hbd8fznohhaz7.png" width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After editing the fallback intent, build the bot again, give it a random prompt, and see how it responds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fajd1q93xcuacgqq6kmd7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fajd1q93xcuacgqq6kmd7.png" width="343" height="678"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This is how the bot will respond if the prompt does not belong to a movie suggestion.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s it for now. I will come up with more customizations to the bot with different features in my upcoming blogs.&lt;/p&gt;

&lt;p&gt;Meanwhile, let me know if you face any issues while trying this out, and feel free to comment with any suggestions. I am open to them. Have a good day&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>lex</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Kubernetes Part 2: A Guide to Application Deployment, Autoscaling, and Rollout Strategies</title>
      <dc:creator>Salam Shaik</dc:creator>
      <pubDate>Tue, 19 Nov 2024 16:17:40 +0000</pubDate>
      <link>https://dev.to/aws-builders/kubernetes-part-2-a-guide-to-application-deployment-autoscaling-and-rollout-strategies-12lm</link>
      <guid>https://dev.to/aws-builders/kubernetes-part-2-a-guide-to-application-deployment-autoscaling-and-rollout-strategies-12lm</guid>
      <description>&lt;p&gt;Hi everyone,&lt;/p&gt;

&lt;p&gt;This is the second installment of the Kubernetes series. You can find the previous article at the link below&lt;br&gt;
&lt;a href="https://dev.to/aws-builders/introduction-to-kubernetes-and-aws-eks-part-1-19e5"&gt;&lt;strong&gt;Introduction to Kubernetes and AWS EKS — Part 1&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tasks we are going to do in this article:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create a Docker image of a web app and upload it to AWS ECR&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pull the Docker image from AWS ECR and deploy it in an EKS cluster&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scale your application based on traffic&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Explore different deployment strategies&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s dive into the article&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Creating a docker image and uploading to AWS ECR:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;I am using an open-source HTML/CSS website in this article as a deployment application&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can download the application using this link &lt;a href="https://github.com/codewithsadee/portfolio" rel="noopener noreferrer"&gt;codewithsadee/portfolio: Fully responsive personal portfolio website&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This is a web application. We are going to deploy it in our Kubernetes cluster&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To deploy it, we need to build a Docker image of the app&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;So clone the project using the below command&lt;/p&gt;

&lt;p&gt;git clone &lt;a href="https://github.com/codewithsadee/portfolio" rel="noopener noreferrer"&gt;https://github.com/codewithsadee/portfolio&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create a &lt;strong&gt;Dockerfile&lt;/strong&gt; with below code at the root directory of the application&lt;/p&gt;

&lt;p&gt;&lt;code&gt;FROM nginx:alpine&lt;br&gt;
COPY . /usr/share/nginx/html&lt;br&gt;
EXPOSE 80&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Here I am using the &lt;strong&gt;nginx:alpine&lt;/strong&gt; image for the application deployment. We will serve the web application using the &lt;strong&gt;Nginx&lt;/strong&gt; server&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The second line will copy every file from the current folder to the path specified. Here it is &lt;strong&gt;/usr/share/nginx/html&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This is the default content path of the Nginx server; it will serve the files inside that folder on port 80&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The third line will expose port 80 from the container&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s build the docker image and upload it to the AWS ECR&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit the ECR service from the AWS search bar &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on the Create Repository button&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide a name for the repository, keep the remaining fields as it is, and click on Create&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the repo is created. In the top right corner, you can see the button view push commands. Click on it and follow the instructions one by one&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fable8mlu03so3kuk5rs1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fable8mlu03so3kuk5rs1.png" width="582" height="100"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After running all the commands, you can see the list of images you uploaded like this&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk18r6sw31fvfcb4mzfel.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk18r6sw31fvfcb4mzfel.png" width="800" height="136"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Deploying docker image to EKS cluster from ECR:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create an EKS cluster and a node group &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can find how to create an EKS cluster and a node group in my previous article mentioned at the start of the article&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The only change I am doing here is, I am changing the EC2 machine type from &lt;strong&gt;t3.medium&lt;/strong&gt; to &lt;strong&gt;t3.small&lt;/strong&gt; in the node group &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I keep the desired and minimum node count at 1 and the maximum size at 2.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifgg5ca5ujnlzshrw3al.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifgg5ca5ujnlzshrw3al.png" width="800" height="727"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Point your kubectl to use the EKS cluster we created using the following command&lt;/p&gt;

&lt;p&gt;&lt;code&gt;aws eks --region &amp;lt;region-code&amp;gt; update-kubeconfig --name &amp;lt;cluster-name&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;To check whether kubectl is correctly configured, use the following command&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl get nodes&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The command will return the nodes list created in the cluster.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Installing the Metrics Server for autoscaling:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The Metrics Server is a lightweight Kubernetes add-on that provides resource usage metrics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use the below command to install the metric server in the cluster &lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the metric server is ready, Let’s deploy the application&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a &lt;strong&gt;Deployment.yaml&lt;/strong&gt; file with the following code. This code will deploy our application in the cluster&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample
spec:
  replicas: 1 # Start with one replica
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
        - name: my-app-container
          image: &amp;lt;account-id&amp;gt;.dkr.ecr.us-east-1.amazonaws.com/sample:latest
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "100m" # Minimum CPU required for a pod
            limits:
              cpu: "500m" # Maximum CPU allocated for a pod
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Run the following command to deploy the application&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;kubectl apply -f deployment.yaml&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Let’s create a service load balancer to expose the application outside of the cluster&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create another file named &lt;strong&gt;service.yaml&lt;/strong&gt;, which will create a load balancer that exposes port 80 externally and routes incoming traffic to port 80 on the pods&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: sample-service
spec:
  type: LoadBalancer
  selector:
    app: sample
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Once the service is deployed, use the below command to see the list of deployed services&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl get services&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Output will look something like this&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhahz4jt66lkew0rhd1fl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhahz4jt66lkew0rhd1fl.png" width="800" height="62"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Copy the external IP of the service we deployed and try to access the application in a browser. It should display the web page of the application we deployed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If it’s not loading, wait until the load balancer comes online and reaches a working state&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Scaling the deployment:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Horizontal Pod AutoScaler(HPA):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;It is a Kubernetes feature that helps in adjusting the pod replicas based on metrics like CPU usage&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This will use the metric server for collecting the metrics, and based on those metrics it will try to adjust the pod replicas&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a file named &lt;strong&gt;scale.yaml&lt;/strong&gt; with the following code&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: sample-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;This deploys the Horizontal Pod Autoscaler with our deployment as the target: it scales between 1 and 10 replicas whenever average CPU utilization crosses 50%&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Deploy this HPA using the following command&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl apply -f scale.yaml&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Wait for a few minutes till it is deployed. After that run the following command to check the HPA deployment&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl get hpa&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The output should look like this&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruqzoongkcnf85ox2jm5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruqzoongkcnf85ox2jm5.png" width="765" height="45"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Let’s hit the endpoint with multiple requests and see whether auto-scaling works&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I am using JMeter to generate the requests&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
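&lt;p&gt;If you don’t have JMeter handy, a small Python script can generate comparable load. This is a minimal sketch; the placeholder address stands for the load balancer’s external IP from &lt;code&gt;kubectl get services&lt;/code&gt;:&lt;/p&gt;

```python
# A small load generator (sketch): fires GET requests concurrently at the
# service's external address. "http://<external-ip>/" below is a
# placeholder for the EXTERNAL-IP shown by `kubectl get services`.
import concurrent.futures
import urllib.request

def generate_load(url: str, total: int = 1000, concurrency: int = 50) -> int:
    """Send `total` GET requests to `url` using `concurrency` worker
    threads; return how many requests completed successfully."""
    def fetch(_: int) -> bool:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return 200 <= resp.status < 300
        except OSError:
            return False
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        return sum(pool.map(fetch, range(total)))

# Example (placeholder address):
# ok = generate_load("http://<external-ip>/", total=10000, concurrency=100)
# print(f"{ok} requests succeeded")
```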

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kw9xpurqcezams4nq43.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kw9xpurqcezams4nq43.png" width="800" height="204"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;I am creating 10k requests in 10 seconds to increase the CPU utilization so that auto-scaling will trigger and try to create more replicas&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use the following command to check the CPU utilization percentage while the requests are running&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;kubectl get hpa&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Once the load peaks, you can see utilization crossing the 50% threshold&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F91h46exphm4ibsf12tav.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F91h46exphm4ibsf12tav.png" width="759" height="50"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Let’s see how many pods are running using &lt;strong&gt;kubectl get pods&lt;/strong&gt; command&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzt2dpaa2fjzo4qmited5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzt2dpaa2fjzo4qmited5.png" width="538" height="77"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You can see that 3 pods are running to manage the traffic. Wait for at least five minutes so that the autoscaler scales the pod replicas back down&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;After 5 minutes I ran the &lt;strong&gt;kubectl get pods&lt;/strong&gt; command again to check the pod count&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjf2842oe412k4us1gpgm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjf2842oe412k4us1gpgm.png" width="550" height="51"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;That’s it. This is how we can scale pod replicas up and down to manage the incoming traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Deployment Strategies:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Kubernetes supports different types of deployments for rolling out updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some of the methods are&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rolling update(Default):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It will replace the pods gradually for a smooth transition &lt;/li&gt;
&lt;/ul&gt;
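&lt;p&gt;The rollout behavior can be tuned per Deployment. A sketch of how the strategy is declared on the Deployment from earlier (the field values here are illustrative, not from the article):&lt;/p&gt;

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most one extra pod above the desired count
      maxUnavailable: 0  # never take a pod down before its replacement is ready
```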

&lt;p&gt;&lt;strong&gt;Recreate:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It will terminate all the current running pods and then create new updated pods&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Canary Deployment:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It deploys the new version to a subset of users: some users are routed to the new deployment while the rest stay on the old one until everything is stable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Blue-green deployment:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run two complete environments: blue for the old version and green for the new. Once everything is verified in the green environment, traffic is redirected from blue to green&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are all the different deployment methods that Kubernetes supports for rolling out the updates&lt;/p&gt;

&lt;p&gt;We will try these deployment strategies in an upcoming article. That’s it for this one&lt;/p&gt;

&lt;p&gt;Please let me know if there are any mistakes or suggestions for improvement. I am open to them&lt;/p&gt;

&lt;p&gt;Thanks and Have a good day&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note: Before leaving, delete the cluster, node group, and load balancer you created to avoid unnecessary charges&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>kubernetes</category>
      <category>ecr</category>
      <category>eks</category>
    </item>
    <item>
      <title>Introduction to Kubernetes and AWS EKS - Part 1</title>
      <dc:creator>Salam Shaik</dc:creator>
      <pubDate>Wed, 30 Oct 2024 08:46:12 +0000</pubDate>
      <link>https://dev.to/aws-builders/introduction-to-kubernetes-and-aws-eks-part-1-19e5</link>
      <guid>https://dev.to/aws-builders/introduction-to-kubernetes-and-aws-eks-part-1-19e5</guid>
      <description>&lt;p&gt;Hi everyone,&lt;/p&gt;

&lt;p&gt;This article will provide you with an overview of what Kubernetes is and how to start working with Kubernetes&lt;/p&gt;

&lt;p&gt;Before learning Kubernetes, you need to know about Docker. Kubernetes is an open-source container orchestration technology&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker:&lt;/strong&gt; Docker is an open-source platform that helps automate application deployment, scaling, and management.&lt;/p&gt;

&lt;p&gt;Before Docker, developers used to face problems like “&lt;strong&gt;It worked on my system but not on production&lt;/strong&gt;”. Docker helps resolve this kind of problem by packaging your application together with all its dependencies. You can run this package on any system without issues: if the application works on your system, it works the same way on other systems too&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker Image:&lt;/strong&gt; As explained above, Docker creates a package of your application with everything it needs, such as dependencies, env files, and config files. You can upload this Docker image to any Docker repository, like Docker Hub or ECR. From the repository, you can pull the image onto any system and run it in a container&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Container:&lt;/strong&gt; A container is an isolated environment where your Docker image runs. Each container is isolated from the others, but they all share the same host OS kernel, making them lightweight and efficient compared to virtual machines&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating a docker image:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;To create a Docker image of your application, add a &lt;strong&gt;Dockerfile&lt;/strong&gt; to your application folder&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A sample Dockerfile with commonly used instructions looks like this&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# 1. Specify the base image
    # This defines the OS and pre-installed packages the Docker container will be built on.
    FROM ubuntu:20.04

    # 2. Set environment variables
    # ENV sets environment variables that can be used by applications inside the container.
    ENV APP_HOME=/usr/src/app
    ENV LANG=C.UTF-8

    # 3. Install dependencies
    # RUN executes commands inside the container to install software packages.
    RUN apt-get update &amp;amp;&amp;amp; apt-get install -y \
        python3 \
        python3-pip \
        curl \
        &amp;amp;&amp;amp; rm -rf /var/lib/apt/lists/*

    # 4. Create a directory in the container file system
    # WORKDIR sets the working directory for any subsequent RUN, CMD, ENTRYPOINT, or COPY commands.
    WORKDIR $APP_HOME

    # 5. Copy files from host to the container
    # COPY copies files from your local machine (host) to the container's file system.
    COPY . .

    # 6. Install Python dependencies
    # RUN can also be used to install specific project dependencies.
    RUN pip3 install --no-cache-dir -r requirements.txt

    # 7. Expose a port
    # EXPOSE documents which ports the container listens on during runtime.
    EXPOSE 5000

    # 8. Define the default arguments
    # CMD specifies default arguments that are passed to the ENTRYPOINT below
    # (or, without an ENTRYPOINT, the default command). It can be overridden at run time.
    CMD ["app.py"]

    # 9. Set up an entry point
    # ENTRYPOINT defines a command that will always run when the container starts.
    # Unlike CMD, ENTRYPOINT cannot be easily overridden.
    ENTRYPOINT ["python3"]

    # 10. Add a health check
    # HEALTHCHECK defines how Docker should check the health of the container.
    HEALTHCHECK --interval=30s --timeout=5s CMD curl -f http://localhost:5000/health || exit 1

    # 11. Volume to persist data
    # VOLUME allows sharing of a directory between the host and the container, ensuring persistent data.
    VOLUME ["/data"]

    # 12. Labels
    # LABEL adds metadata to your image, such as version, description, or maintainer information.
    LABEL version="1.0" description="Sample Python Flask App" maintainer="you@example.com"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;A simple Example Dockerfile
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; FROM python:3.9-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install -r requirements.txt
    COPY . .
    EXPOSE 8000
    CMD ["python", "app.py"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s understand what the file says:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The base image of the Docker container is Python 3.9&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;WORKDIR&lt;/strong&gt; sets the working directory to &lt;strong&gt;/app&lt;/strong&gt; in the container&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;COPY&lt;/strong&gt; copies the &lt;strong&gt;requirements.txt&lt;/strong&gt; file to the dot, which means the current working directory&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;RUN&lt;/strong&gt; executes the specified command to install the dependencies listed in the &lt;strong&gt;requirements.txt&lt;/strong&gt; file&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;COPY . .&lt;/strong&gt; copies everything from the current directory into the working directory&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;EXPOSE&lt;/strong&gt; exposes port 8000&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The final &lt;strong&gt;CMD&lt;/strong&gt; runs the command &lt;strong&gt;python app.py&lt;/strong&gt; to start the application&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Commands for building a Docker image and running it in a container&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker build -t &amp;lt;image-name&amp;gt;:&amp;lt;tag&amp;gt; &amp;lt;path-to-dockerfile&amp;gt; 
docker run -d -p &amp;lt;host-port&amp;gt;:&amp;lt;container-port&amp;gt; --name &amp;lt;container-name&amp;gt; &amp;lt;image-name&amp;gt;:&amp;lt;tag&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
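
&lt;p&gt;For example, with a hypothetical image named &lt;strong&gt;my-app&lt;/strong&gt;, the commands might look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Build the image from the Dockerfile in the current directory
docker build -t my-app:1.0 .

# Run it detached, mapping host port 8000 to container port 8000
docker run -d -p 8000:8000 --name my-app-container my-app:1.0

# Check the running container and its logs
docker ps
docker logs my-app-container
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;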

&lt;ul&gt;
&lt;li&gt;For storing and versioning your Docker images you can use Docker Hub or AWS ECR &lt;/li&gt;
&lt;/ul&gt;
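
&lt;p&gt;If you use AWS ECR, pushing an image typically looks like this (the account ID, region, and image name below are placeholders; replace them with your own):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Authenticate Docker with your ECR registry
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Tag the local image with the ECR repository URI and push it
docker tag my-app:1.0 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:1.0
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;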

&lt;p&gt;Here comes &lt;strong&gt;Kubernetes&lt;/strong&gt;, also known as &lt;strong&gt;K8s&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes:&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Kubernetes is an open-source platform that helps automate the deployment and scaling of your containerized applications. Kubernetes manages the lifecycle of containers across multiple machines, ensuring they run as expected&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Concepts in Kubernetes:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pods:&lt;/strong&gt; A pod is the simplest and smallest Kubernetes object. It represents a single instance of a running process in the cluster and contains one or more tightly coupled containers &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Node:&lt;/strong&gt; A Node is a working machine in Kubernetes where pods are deployed&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Master Node:&lt;/strong&gt; A master node will manage all cluster operations, such as pod deployment, scaling, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Worker Node:&lt;/strong&gt; This is where the pod runs. You can consider this also a machine that is capable of running docker containers&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Deployment:&lt;/strong&gt; Consider this as a YAML file where you define how the deployment should happen: how many replicas you need, how the container should be configured, and which image the container should use. You declare all this in a YAML file, and the deployment happens based on these instructions. A sample deployment YAML file&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app-deployment
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: my-app-container
            image: my-app-image:v1.0
            ports:
            - containerPort: 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
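
&lt;p&gt;Assuming the file above is saved as my-app-deployment.yaml (a hypothetical name), you can apply and inspect it like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create or update the deployment from the YAML file
kubectl apply -f my-app-deployment.yaml

# Check the deployment and its pods
kubectl get deployments
kubectl get pods -l app=my-app

# Scale to 5 replicas without editing the file
kubectl scale deployment my-app-deployment --replicas=5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;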



&lt;p&gt;&lt;strong&gt;Service:&lt;/strong&gt; A Service provides a stable endpoint to access a set of pods. It can be exposed in several ways: ClusterIP, NodePort, LoadBalancer, etc. A sample Service YAML file&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; apiVersion: v1
    kind: Service
    metadata:
      name: my-app-service
    spec:
      selector:
        app: my-app
      ports:
      - protocol: TCP
        port: 80
        targetPort: 8080
      type: ClusterIP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Namespaces:&lt;/strong&gt; By using namespaces, you can divide your cluster into different parts and run different workloads in each of them, like one for dev, one for staging, and one for production.&lt;/p&gt;
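
&lt;p&gt;A quick sketch of working with namespaces (the names dev and staging are just examples):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create namespaces for different environments
kubectl create namespace dev
kubectl create namespace staging

# List all namespaces in the cluster
kubectl get namespaces

# Deploy a manifest into a specific namespace
kubectl apply -f my-app-deployment.yaml -n dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;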

&lt;p&gt;Let’s get inside the nodes. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Master Node:&lt;/strong&gt; also known as Control Plane. These are the major components in the control plane&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API Server:&lt;/strong&gt; It is the gateway to Kubernetes. It exposes the APIs through which we interact with the control plane&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;etcd:&lt;/strong&gt; This is a key-value store that holds all the data about the cluster, such as which applications are running and where they are running&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scheduler:&lt;/strong&gt; It decides on which node each pod should be deployed. Based on the resource availability of the nodes, it assigns the workloads&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Controller Manager:&lt;/strong&gt; It makes sure everything matches the desired state. If you specified 3 instances to maintain application availability, it will keep all 3 running; if one goes down, it spawns a new one&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Worker Node:&lt;/strong&gt; where the actual work happens&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Kubelet:&lt;/strong&gt; It will take instructions from the control plane and make sure pods are running as instructed by the control plane&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Container Runtime:&lt;/strong&gt; This is the software responsible for running the containers. It will pull the images and run them&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Kube-Proxy:&lt;/strong&gt; It maintains networking in the cluster, allowing communication between pods and directing requests to the correct pods&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I hope you have a basic understanding of Kubernetes now and how it works &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to interact with Kubernetes cluster:&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubectl:&lt;/strong&gt; It’s a tool you can install on your system to interact with the Kubernetes cluster control plane. It’s available for all operating systems. You can download this tool using this link &lt;a href="https://kubernetes.io/docs/tasks/tools/" rel="noopener noreferrer"&gt;Install Tools | Kubernetes&lt;/a&gt;&lt;/p&gt;
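
&lt;p&gt;Once kubectl is installed and configured, a few everyday commands look like this (the pod name is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Show cluster and node information
kubectl cluster-info
kubectl get nodes

# Inspect a pod in detail and stream its logs
kubectl describe pod &amp;lt;pod-name&amp;gt;
kubectl logs -f &amp;lt;pod-name&amp;gt;

# Open a shell inside a running container
kubectl exec -it &amp;lt;pod-name&amp;gt; -- /bin/sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;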

&lt;h2&gt;
  
  
  &lt;strong&gt;AWS EKS(Elastic Kubernetes Service):&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;AWS offers a managed Kubernetes service called EKS, allowing you to easily create Kubernetes clusters with minimal input and deploy your containerized applications.&lt;/p&gt;

&lt;p&gt;When you create a cluster in EKS, you are actually creating a control plane. Once the cluster is ready, you can create nodes based on your needs, or use &lt;strong&gt;AWS Fargate&lt;/strong&gt; for a serverless option&lt;/p&gt;

&lt;p&gt;You will be charged $0.10 per hour for every cluster you create. On top of that, whatever compute resources you create in the cluster are charged separately, following EC2 pricing, storage pricing, etc.&lt;/p&gt;
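
&lt;p&gt;As a rough estimate (assuming a 30-day month): $0.10/hour × 24 hours × 30 days ≈ $72 per month per cluster for the control plane alone, before any EC2 or storage charges.&lt;/p&gt;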

&lt;p&gt;You can deploy multiple applications in a single cluster by taking advantage of &lt;strong&gt;Namespaces&lt;/strong&gt; in Kubernetes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating and deploying an application in the EKS cluster:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Search for EKS in the AWS Console search bar and visit the EKS home page&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on Create a Cluster&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff12667i9s08g59jhqsdn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff12667i9s08g59jhqsdn.png" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Give the cluster a name and create an &lt;strong&gt;IAM&lt;/strong&gt; role that allows the cluster to perform the operations it needs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I created an IAM role named &lt;strong&gt;eks-testing&lt;/strong&gt; with the &lt;a href="https://us-east-1.console.aws.amazon.com/iam/home?region=us-east-1#/policies/details/arn%3Aaws%3Aiam%3A%3Aaws%3Apolicy%2FAmazonEKSClusterPolicy" rel="noopener noreferrer"&gt;AmazonEKSClusterPolicy&lt;/a&gt; policy attached to it. This policy covers almost all the cluster’s needs for now. Change it or create a new policy based on your needs&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp6ogd728u8r6tcws3csm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp6ogd728u8r6tcws3csm.png" width="800" height="807"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;I took the extended support; choose based on your needs. For cluster access, I opted for &lt;strong&gt;EKS API&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;EKS API:&lt;/strong&gt; It allows access to IAM users and roles only&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;EKS API and ConfigMap:&lt;/strong&gt; It allows access via both IAM users and roles and the &lt;strong&gt;aws-auth&lt;/strong&gt; ConfigMap. This map is a Kubernetes resource that maps IAM users and roles to Kubernetes users and roles&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ConfigMap:&lt;/strong&gt; This mode restricts the cluster to authenticating IAM principals only from the &lt;strong&gt;aws-auth ConfigMap&lt;/strong&gt;. In this case, IAM users or roles must be manually mapped to Kubernetes roles in the ConfigMap before they can interact with the cluster&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Leave the rest of the fields in the cluster configuration section as they are and click the Next button to proceed to the networking section&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fclchdlty1y023nhw43fj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fclchdlty1y023nhw43fj.png" width="800" height="782"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select the VPC you want the cluster to be deployed in, along with the subnets, and select a security group. Please allow the ports you need in the security group&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4lhvm3tqehcbiy9d7qlr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4lhvm3tqehcbiy9d7qlr.png" width="800" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For the cluster endpoint access, I am choosing both Public and Private so that the cluster endpoint can be accessed from outside the VPC while worker node traffic stays within the VPC. Click Next for the next section&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the observability section, I am not enabling anything for now. If you want logs for the cluster, enable whatever you need&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbc0hhztklj8rxb4pga1h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbc0hhztklj8rxb4pga1h.png" width="800" height="628"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;I am going with the default selection here. We need CoreDNS, kube-proxy, and the Amazon VPC CNI for basic cluster functioning&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can leave add-on settings as it is and click on the next button to review the cluster creation &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If everything is looking good, click on Create. Wait for a few minutes for the cluster to be created &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Now that the cluster is ready, the control plane is ready. Let’s create the worker nodes&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96edeywo02ynngqxvv81.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96edeywo02ynngqxvv81.png" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click on the &lt;strong&gt;Compute&lt;/strong&gt; tab and click on the &lt;strong&gt;Add Node Group&lt;/strong&gt; button to create the worker nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mcs5u2w6nsa2bc2grdy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mcs5u2w6nsa2bc2grdy.png" width="800" height="548"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Give the node group a name and select a role that has the following policies attached&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bxl9fv5sfib5bj7nokz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bxl9fv5sfib5bj7nokz.png" width="543" height="185"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;These policies allow the node group to pull images from ECR, enable pod networking with the Amazon VPC CNI plugin, and grant worker nodes permission to interact with the control plane and CloudWatch&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Leave the rest of the fields as they are and proceed to the compute and scaling configuration&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1gaj6o5hpmahitzgogz5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1gaj6o5hpmahitzgogz5.png" width="800" height="535"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I am using the Amazon Linux AMI and an on-demand t3.medium instance with a disk size of 10GB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpiurmt84z8tmy6pajkpe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpiurmt84z8tmy6pajkpe.png" width="800" height="757"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;I am keeping the desired and minimum size as 1, the maximum as 2 nodes, and the maximum unavailable as 1&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on next to select the subnets&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flngiulnyv5gof7ar34of.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flngiulnyv5gof7ar34of.png" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Review everything and click the Create button. It will take a few minutes for the nodes to come online&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Now that we have our control plane and worker nodes ready, Let’s connect to the cluster and deploy an application&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Connect and deploy the application to the cluster:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;To connect to the cluster, create a user in the IAM console, download the credentials to your system, and configure your CLI using the &lt;strong&gt;aws configure&lt;/strong&gt; command&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on the Access tab of the cluster and click the Create access entry button to allow the user you created in the above step&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmx3allgzqlyeghxfunbw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmx3allgzqlyeghxfunbw.png" width="800" height="683"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select the IAM user from the first input box and keep the type as Standard for now. Don’t forget to click &lt;strong&gt;Add Policy&lt;/strong&gt; and then the Next button&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyk3bzacdek3t44opvgo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyk3bzacdek3t44opvgo.png" width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;From the policy list, I am giving my user EKS cluster admin-level access, then creating the access entry&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Connecting to your cluster:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Use this command to configure kubectl for your EKS cluster&lt;/p&gt;

&lt;p&gt;aws eks update-kubeconfig --region &amp;lt;region&amp;gt; --name &amp;lt;cluster-name&amp;gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Replace region and cluster name with your values&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Then run the following command to verify that the connection is successful&lt;/p&gt;

&lt;p&gt;kubectl get nodes&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It will display the node you created previously, like this&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff49iw6ia70eq9rbmoqnu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff49iw6ia70eq9rbmoqnu.png" width="679" height="49"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deploying the app:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Create a namespace &lt;strong&gt;demo&lt;/strong&gt; for deploying our application using this command&lt;/p&gt;

&lt;p&gt;kubectl create namespace demo&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We will create a deployment file with the nginx image running in the container with port 80 open&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Then we will create a service file that provisions a load balancer targeting container port 80 and exposing port 80 to the public&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a file named nginx-deployment.yaml and paste the following code&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
      namespace: demo # Specify the namespace if created earlier
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.21.6
            ports:
            - containerPort: 80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Create an nginx-service.yaml file and paste the following code
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; apiVersion: v1
    kind: Service
    metadata:
      name: nginx-service
      namespace: demo # Specify the namespace
    spec:
      type: LoadBalancer
      selector:
        app: nginx
      ports:
      - protocol: TCP
        port: 80
        targetPort: 80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Now run the following commands to deploy nginx and expose port 80&lt;/p&gt;

&lt;p&gt;kubectl apply -f nginx-deployment.yaml&lt;br&gt;
kubectl apply -f nginx-service.yaml&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It will create 3 pods in the cluster under the demo namespace. Run the following command to check whether the pods are running&lt;/p&gt;

&lt;p&gt;kubectl get pods -n demo&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can see 3 pods in a running state like this&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5cj1xumougt73084nqsj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5cj1xumougt73084nqsj.png" width="638" height="94"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Run the following command to get the external IP to access the nginx server home page&lt;/p&gt;

&lt;p&gt;kubectl get service nginx-service -n demo&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Output should look something like this&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4aj2tupb4btjv7kb5cjw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4aj2tupb4btjv7kb5cjw.png" width="800" height="18"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Copy the external IP and form a URL like http://&amp;lt;external-ip&amp;gt;, then hit the URL from your browser&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You should see the nginx welcome page like this&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmu4noe2yv67yjxqouq0s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmu4noe2yv67yjxqouq0s.png" width="592" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s it. We created an EKS cluster, deployed an NGINX container, and verified the deployment.&lt;/p&gt;

&lt;p&gt;In the upcoming articles, we will deep dive into Kubernetes and EKS. Till then have a good time. Thanks&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note: After you are done with this task, please delete the node group and cluster to avoid unnecessary charges.&lt;/strong&gt;&lt;/p&gt;
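
&lt;p&gt;If you prefer the CLI, the cleanup can be sketched like this (use the cluster and node group names you chose earlier):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Delete the node group first, then wait for it to be gone
aws eks delete-nodegroup --cluster-name &amp;lt;cluster-name&amp;gt; --nodegroup-name &amp;lt;node-group-name&amp;gt;
aws eks wait nodegroup-deleted --cluster-name &amp;lt;cluster-name&amp;gt; --nodegroup-name &amp;lt;node-group-name&amp;gt;

# Then delete the cluster (control plane)
aws eks delete-cluster --name &amp;lt;cluster-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;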

&lt;p&gt;If you faced any issues or got blocked at any step, please feel free to comment here. I am happy to help.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>eks</category>
      <category>kubernetes</category>
      <category>docker</category>
    </item>
    <item>
      <title>Exploring AWS OpenSearch Serverless Pricing: How It Differs from Traditional Serverless Services</title>
      <dc:creator>Salam Shaik</dc:creator>
      <pubDate>Mon, 21 Oct 2024 04:14:02 +0000</pubDate>
      <link>https://dev.to/aws-builders/exploring-aws-opensearch-serverless-pricing-how-it-differs-from-traditional-serverless-services-23fd</link>
      <guid>https://dev.to/aws-builders/exploring-aws-opensearch-serverless-pricing-how-it-differs-from-traditional-serverless-services-23fd</guid>
      <description>&lt;p&gt;Hi everyone,&lt;/p&gt;

&lt;p&gt;Before diving into this topic, Let me introduce what OpenSearch is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenSearch:&lt;/strong&gt; It’s an AWS-managed alternative to Elasticsearch. It offers various search features and comes with a dashboard called &lt;strong&gt;Kibana&lt;/strong&gt; where you can manage and visualize the data, work with the indexes, etc.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario:&lt;/strong&gt; Recently I started exploring the AWS &lt;strong&gt;Bedrock&lt;/strong&gt; service. I went to see what Knowledge Bases in Bedrock are and what they offer. As part of creating the &lt;strong&gt;Knowledge Base&lt;/strong&gt;, it created an OpenSearch Serverless collection.&lt;/p&gt;

&lt;p&gt;After around 1 hour I deleted the Knowledge Base, thinking that the OpenSearch collection would also be destroyed. But 2 days later I received an email from &lt;strong&gt;AWS Budgets&lt;/strong&gt; that the bill amount was hitting the budget limit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9z0jv1sv2wah7jtjii9r.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9z0jv1sv2wah7jtjii9r.gif" width="498" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the past, I had some incidents where my AWS bill hit around &lt;strong&gt;$800&lt;/strong&gt; without my knowledge. After I talked to the AWS support team and explained what caused the huge bill, they waived it off.&lt;/p&gt;

&lt;p&gt;After that incident, I created budget alerts at different levels so I would learn about any unexpected rise in the bill as early as possible:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alert 1:&lt;/strong&gt; with a &lt;strong&gt;$5&lt;/strong&gt; limit&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alert 2:&lt;/strong&gt; with a &lt;strong&gt;$15&lt;/strong&gt; limit&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alert 3:&lt;/strong&gt; with a &lt;strong&gt;$50&lt;/strong&gt; limit &lt;/p&gt;
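&lt;p&gt;The three alerts above can also be created programmatically through the AWS Budgets API. A minimal sketch using boto3; the account ID and email address are hypothetical placeholders, and the actual API call is commented out since it needs AWS credentials:&lt;/p&gt;

```python
def build_budget_request(account_id, limit_usd, email):
    # Request payload for the AWS Budgets create_budget API.
    # Emails the subscriber when actual spend exceeds 100% of the limit.
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "monthly-cost-alert-{}usd".format(limit_usd),
            "BudgetLimit": {"Amount": str(limit_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": 100.0,  # percent of the budget limit
                },
                "Subscribers": [
                    {"SubscriptionType": "EMAIL", "Address": email}
                ],
            }
        ],
    }

# One budget per alert level: $5, $15, $50
payloads = [build_budget_request("123456789012", limit, "me@example.com")
            for limit in (5, 15, 50)]

# With credentials configured:
# import boto3
# client = boto3.client("budgets")
# for payload in payloads:
#     client.create_budget(**payload)
```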

&lt;p&gt;I immediately opened my AWS console, went to AWS bills, and saw that the OpenSearch service was causing this issue. Then I deleted that collection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Assumption:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After using Lambda for a long time, I assumed that serverless services are only charged when they receive a request, or based on execution time. I only knew about my OpenSearch Serverless endpoint, and I hadn’t used it at all in the last 2 days. So why would I be charged? &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reality:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;OpenSearch Serverless pricing works differently compared to other serverless services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You are charged for &lt;strong&gt;compute&lt;/strong&gt; and &lt;strong&gt;storage&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compute is measured in &lt;strong&gt;OpenSearch Compute Units (OCUs)&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;1 OCU&lt;/strong&gt; comes with &lt;strong&gt;6 GB of RAM&lt;/strong&gt;, the corresponding vCPUs, and &lt;strong&gt;GP3 storage&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You need at least &lt;strong&gt;2 OCUs&lt;/strong&gt; running: one for &lt;strong&gt;indexing&lt;/strong&gt; and one for &lt;strong&gt;searching&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;All data is stored in S3, so there are storage costs as well&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Here is the pricing from the AWS docs&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkxptpjyple2j4kru2bzq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkxptpjyple2j4kru2bzq.png" width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;So even though you are not hitting the server and not running anything on the OpenSearch index, you will be charged based on the OCUs you are using&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s calculate how much it costs per month to run a collection with the bare minimum of &lt;strong&gt;2 OCUs&lt;/strong&gt; and &lt;strong&gt;10 GB&lt;/strong&gt; of storage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;2 OCUs = 2 x $0.24 = &lt;strong&gt;$0.48/hour&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compute: $0.48 x 24 (hours) x 30 (days) = &lt;strong&gt;$345.60 per month&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage: 10 GB x $0.024 = &lt;strong&gt;$0.24 per month&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Total: $345.60 + $0.24 = &lt;strong&gt;$345.84 per month&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
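&lt;p&gt;The same arithmetic as a quick script, using the rates from the pricing table above ($0.24 per OCU-hour and $0.024 per GB-month):&lt;/p&gt;

```python
OCU_RATE_PER_HOUR = 0.24     # USD per OCU-hour
STORAGE_RATE_PER_GB = 0.024  # USD per GB-month

def monthly_cost(ocus, storage_gb, hours_per_month=24 * 30):
    # Compute runs continuously, so it dominates the bill
    compute = ocus * OCU_RATE_PER_HOUR * hours_per_month
    storage = storage_gb * STORAGE_RATE_PER_GB
    return round(compute + storage, 2)

print(monthly_cost(2, 10))  # minimum setup: 2 OCUs + 10 GB -> 345.84
print(monthly_cost(1, 10))  # dev/test mode (1 OCU) -> 173.04
```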

&lt;p&gt;So even though you are not hitting the server or running anything on the index, you will be charged around $345 per month. That’s a huge amount.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My Thoughts on this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;AWS doesn’t specify how much load an OCU can take. It mostly depends on the complexity of the queries you run and how many simultaneous requests hit the server&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We can set a limit on the number of OCUs to prevent unintentional cost&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Even though it is a serverless service, we need to have some idea of the OpenSearch Serverless infrastructure&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To me it looks like a typical EC2 Auto Scaling setup: we set limits on the desired capacity and it scales up and down based on traffic &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If we are not sure about the load we are going to get, we don’t know how many OCUs we need, and I personally feel a minimum of around $345 is a lot just to have a server up and running&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It offers a dev/test mode where you can use 1 OCU, half for searching and half for indexing. That cuts the cost roughly in half, to around $170, which is still not a small amount&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are new to OpenSearch and want to experiment with it, I recommend creating the endpoint in dev/test mode. &lt;/p&gt;

&lt;p&gt;AWS offers a &lt;strong&gt;free tier&lt;/strong&gt; on the managed (non-serverless) OpenSearch Service: 750 hours per month on t2.small.search or t3.small.search instances and 10 GB of storage per month. It is good enough for trying and testing the service.&lt;/p&gt;

&lt;p&gt;It is always better to keep budget alerts with different limits; they will help you identify any cost spikes at the earliest. &lt;/p&gt;

&lt;p&gt;Hope you find this helpful. Please share your thoughts on this. I am happy to hear your views on this pricing. Thanks.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>opensearch</category>
      <category>awscost</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Harnessing AWS's Invoke Model State for Efficient Embedding Creation in Workflows</title>
      <dc:creator>Salam Shaik</dc:creator>
      <pubDate>Fri, 22 Dec 2023 14:20:19 +0000</pubDate>
      <link>https://dev.to/aws-builders/harnessing-awss-invoke-model-state-for-efficient-embedding-creation-in-workflows-4ljl</link>
      <guid>https://dev.to/aws-builders/harnessing-awss-invoke-model-state-for-efficient-embedding-creation-in-workflows-4ljl</guid>
      <description>&lt;p&gt;Hi everyone,&lt;/p&gt;

&lt;p&gt;AWS recently released a Bedrock integration for Step Functions. They provide &lt;strong&gt;InvokeModel&lt;/strong&gt; and &lt;strong&gt;CreateModelCustomizationJob&lt;/strong&gt; states in Step Functions. If you search for &lt;strong&gt;Bedrock&lt;/strong&gt; in the Step Functions search bar, it will list the available actions. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--R9kbQuPq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2Ae_ON_GlwaA47TqLknV5W5w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--R9kbQuPq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2Ae_ON_GlwaA47TqLknV5W5w.png" alt="" width="356" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we can create the workflows by invoking the models available in the Amazon Bedrock from the step functions.&lt;/p&gt;

&lt;p&gt;If you are not aware of the embeddings and bedrock service, refer to my previous article using this link&lt;br&gt;
&lt;a href="https://dev.to/aws-builders/building-a-vector-based-search-engine-using-amazon-bedrock-and-amazon-open-search-service-3jom"&gt;&lt;strong&gt;Building a vector-based search engine using Amazon Bedrock and Amazon Open Search Service&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s dive into the article&lt;/p&gt;

&lt;p&gt;In this article, we are going to create a state machine workflow to generate embeddings from a CSV file stored in an S3 bucket and store the generated embeddings in the OpenSearch index for further processing. &lt;/p&gt;

&lt;p&gt;AWS services we are going to use in this experiment&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;S3 bucket&lt;/strong&gt; for storing the data &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Step Functions&lt;/strong&gt; to create a workflow for generating embeddings and storing them in an OpenSearch index&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bedrock&lt;/strong&gt;, which provides the models for generating the embeddings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lambda&lt;/strong&gt; for storing the generated embeddings in OpenSearch&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenSearch&lt;/strong&gt; for storing the embeddings for implementing semantic search&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 1: Collecting and storing the data:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For this experiment, I downloaded a dataset containing movie info from this link: &lt;a href="https://www.kaggle.com/datasets/kayscrapes/movie-dataset"&gt;https://www.kaggle.com/datasets/kayscrapes/movie-dataset&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create an S3 bucket and upload the CSV file to the bucket.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OJxGGAak--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/29cdmq4m4wgz0dlwai23.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OJxGGAak--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/29cdmq4m4wgz0dlwai23.png" alt="Image description" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 2: Create an OpenSearch Service cluster&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For storing the embeddings, we will create an OpenSearch cluster&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Open the OpenSearch Service from the AWS search bar&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on the Create Domain button&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide a name for the domain, choose standard create, and select the dev/test template&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XRlZtC2L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e4ihn79vjmieecz7f3o8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XRlZtC2L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e4ihn79vjmieecz7f3o8.png" alt="Image description" width="800" height="746"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deployment will be without standby as we are not doing this for production purposes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CrNV6lRC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lligd4sj04otw7uz33n6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CrNV6lRC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lligd4sj04otw7uz33n6.png" alt="Image description" width="792" height="698"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;From the general-purpose instances, select t3.small.search, and set the number of nodes to 1 since we are only experimenting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--akNOclPR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/49in7pjtnkredseexlcl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--akNOclPR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/49in7pjtnkredseexlcl.png" alt="Image description" width="792" height="698"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instead of a VPC, deploy it publicly and provide a master username and password&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8rfk-eLB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0nwpi0cw9odoxkuzntob.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8rfk-eLB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0nwpi0cw9odoxkuzntob.png" alt="Image description" width="792" height="698"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configure the access policy to allow everyone to access the OpenSearch dashboards and endpoints. For production, lock it down according to your security requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uAFiR3b6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1wivbnx4jtrluqdrn2m8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uAFiR3b6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1wivbnx4jtrluqdrn2m8.png" alt="Image description" width="792" height="698"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Click on Create Domain and wait for a few minutes for the cluster to come online&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the cluster is ready, copy the OpenSearch endpoint from the dashboard to use in the Lambda function&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Visit the OpenSearch dashboard and create an index for storing the data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Visit &lt;strong&gt;Dev Tools&lt;/strong&gt; from the dashboard and use the following code to create an index&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PUT contents
    {
      "settings": {
        "index.knn": true
      },
      "mappings": {
        "properties": {
          "Summary":{
            "type": "text"
          },  
          "Title": {
            "type": "text"
          },
          "Embedding": {
            "type": "knn_vector",
            "dimension": 1536
          }
        }
      }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
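&lt;p&gt;Once documents are indexed, you can verify semantic search with a k-NN query against the &lt;strong&gt;Embedding&lt;/strong&gt; field from Dev Tools. This is a sketch: the query vector must be a 1536-dimensional embedding produced by the same Titan model, truncated here to three values for readability:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET contents/_search
{
  "size": 3,
  "query": {
    "knn": {
      "Embedding": {
        "vector": [0.012, -0.034, 0.056],
        "k": 3
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;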



&lt;h2&gt;
  
  
  &lt;strong&gt;Step 3: Create a Lambda function for storing the embeddings&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit the Lambda service and create a Lambda function with the Python 3.9 environment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Here is the lambda code&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    import boto3
    import requests
    from requests_aws4auth import AWS4Auth

    def lambda_handler(event, context):
        # Extract the relevant data
        summary = event['Summary']#summary column of csv file
        title = event['Title']#Title column of the csv file
        embedding = event['output']['Body']['embedding'] #contains embedding generataed for Summary columns

        # Define the document to be indexed
        document = {
            'Summary': summary,
            'Title': title,
            'Embedding': embedding
        }

        #Username and password of the opensearch endpoint
        auth = ('username',"password")

        # OpenSearch domain endpoint
        opensearch_domain_endpoint = "https://search-contents-oflzhkvsjgukdwvszyd5erztza.us-east-1.es.amazonaws.com"  # e.g., https://search-mydomain.us-west-1.es.amazonaws.com
        index_name = 'contents'
        url = f"{opensearch_domain_endpoint}/{index_name}/_doc"

        headers = { "Content-Type": "application/json" }

        # Make the signed HTTP request
        response = requests.post(url, auth=auth, json=document, headers=headers)

        return {
            'statusCode': 200,
            'body': response.text
        }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;This is how we are going to send the data to Lambda from the state machine
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; {
      "Summary": "value1",
      "Title": "value2",
      "output": {
        "Body": {
          "embedding": "[0,1,2,3]"
        }
      }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Step 4: Create a state machine in step functions&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit step functions from the AWS search bar&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;From the flow options, we are using &lt;strong&gt;Map&lt;/strong&gt; to iterate through the CSV file records&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZdSiV_Aa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ovq6v2dhbflpv9g7zrnd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZdSiV_Aa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ovq6v2dhbflpv9g7zrnd.png" alt="Image description" width="298" height="670"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;From the Bedrock section, the &lt;strong&gt;Invoke Model&lt;/strong&gt; action&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--O8O0_9AC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/d1adn0a0h8wtoll7kh36.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--O8O0_9AC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/d1adn0a0h8wtoll7kh36.png" alt="Image description" width="343" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Invoke Lambda&lt;/strong&gt; action for sending embeddings to OpenSearch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VYP4OND7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lagk1oxlci7y8qgzti8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VYP4OND7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lagk1oxlci7y8qgzti8x.png" alt="Image description" width="329" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a workflow like the below picture. You can drag and drop items from the left menu&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MXjefg2B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/eufuesmvyyhe8gqqowq2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MXjefg2B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/eufuesmvyyhe8gqqowq2.png" alt="Image description" width="691" height="604"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;After creating the workflow, configure each state as follows&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Map&lt;/strong&gt;: configure it to use the CSV file we stored in the S3 bucket in the first step and make it iterate through the &lt;strong&gt;Summary&lt;/strong&gt; and &lt;strong&gt;Title&lt;/strong&gt; columns of the document&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write the output of the Map state to another S3 bucket to avoid payload limit errors from the state machine&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--V4kGjQ8S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ex17ud9b86uz3rfofjhd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--V4kGjQ8S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ex17ud9b86uz3rfofjhd.png" alt="Image description" width="704" height="717"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sY-wcII2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ss4axxn88cjcnenbhrwh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sY-wcII2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ss4axxn88cjcnenbhrwh.png" alt="Image description" width="580" height="620"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lqpaJlHw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/f6l0kwbikvneqj262h5f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lqpaJlHw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/f6l0kwbikvneqj262h5f.png" alt="Image description" width="580" height="620"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UBO-rULg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/a6rtxuig6ofiruabdkvs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UBO-rULg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/a6rtxuig6ofiruabdkvs.png" alt="Image description" width="580" height="620"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configure the &lt;strong&gt;Invoke Model&lt;/strong&gt; state to use the &lt;strong&gt;amazon.titan-embed-text-v1&lt;/strong&gt; model and take input from the Map state like this&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WBzqh80D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zovhpxk49qnbssknmj0x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WBzqh80D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zovhpxk49qnbssknmj0x.png" alt="Image description" width="579" height="656"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--h3YCQA7t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sw1imrnqtgs93pdqu83r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--h3YCQA7t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sw1imrnqtgs93pdqu83r.png" alt="Image description" width="579" height="656"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--J-l5pjl---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wlntd8d7o8d0ww19xffa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--J-l5pjl---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wlntd8d7o8d0ww19xffa.png" alt="Image description" width="579" height="656"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configure the &lt;strong&gt;Lambda Invoke&lt;/strong&gt; state with the function name&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gDWyhEbb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2ACzZ1-vRA5iqe7fw3Li3IjA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gDWyhEbb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2ACzZ1-vRA5iqe7fw3Li3IjA.png" alt="" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This is the final code of the state machine after configuring all the states
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; {
      "Comment": "A description of my state machine",
      "StartAt": "Map",
      "States": {
        "Map": {
          "Type": "Map",
          "ItemProcessor": {
            "ProcessorConfig": {
              "Mode": "DISTRIBUTED",
              "ExecutionType": "STANDARD"
            },
            "StartAt": "Invoke Model",
            "States": {
              "Invoke Model": {
                "Type": "Task",
                "Resource": "arn:aws:states:::bedrock:invokeModel",
                "Parameters": {
                  "ModelId": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v1",
                  "Body": {
                    "inputText.$": "$.Summary"
                  }
                },
                "Next": "Lambda Invoke",
                "ResultPath": "$.output"
              },
              "Lambda Invoke": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "OutputPath": "$.Payload",
                "Parameters": {
                  "Payload.$": "$",
                  "FunctionName": "arn:aws:lambda:us-east-1:556343216872:function:dump-embeddings:$LATEST"
                },
                "Retry": [
                  {
                    "ErrorEquals": [
                      "Lambda.ServiceException",
                      "Lambda.AWSLambdaException",
                      "Lambda.SdkClientException",
                      "Lambda.TooManyRequestsException"
                    ],
                    "IntervalSeconds": 1,
                    "MaxAttempts": 3,
                    "BackoffRate": 2
                  }
                ],
                "End": true
              }
            }
          },
          "Label": "Map",
          "MaxConcurrency": 10,
          "ItemReader": {
            "Resource": "arn:aws:states:::s3:getObject",
            "ReaderConfig": {
              "InputType": "CSV",
              "CSVHeaderLocation": "GIVEN",
              "MaxItems": 10,
              "CSVHeaders": [
                "Summary",
                "Title"
              ]
            },
            "Parameters": {
              "Bucket": "netflix-titles-csv",
              "Key": "Hydra-Movie-Scrape.csv"
            }
          },
          "End": true,
          "ResultWriter": {
            "Resource": "arn:aws:states:::s3:putObject",
            "Parameters": {
              "Bucket": "embeddings-ouput",
              "Prefix": "output"
            }
          }
        }
      }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
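&lt;p&gt;Note how the two states hand data to each other: "ResultPath": "$.output" in the Invoke Model state grafts the Bedrock response onto the original CSV item instead of replacing it, which is how the Lambda receives both the CSV columns and the embedding. A small pure-Python simulation of that merge (the values are illustrative):&lt;/p&gt;

```python
def apply_result_path(state_input, result, key):
    # Simulates Step Functions' "ResultPath": "$.key": the task result is
    # attached to the state input under the given key, preserving the input.
    merged = dict(state_input)
    merged[key] = result
    return merged

# One CSV record as produced by the Map state's ItemReader
csv_item = {"Summary": "A detective hunts a thief.", "Title": "Example Movie"}

# Shape of the Bedrock InvokeModel result (embedding truncated for illustration)
bedrock_result = {"Body": {"embedding": [0.01, -0.42, 0.33]}}

lambda_event = apply_result_path(csv_item, bedrock_result, "output")
# lambda_event now has Summary and Title at the top level and the
# embedding under output.Body.embedding, which is what the Lambda reads
```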



&lt;h2&gt;
  
  
  &lt;strong&gt;Step 5: Execute the state machine workflow and check the output&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;From the top-right corner of the Step Functions console, click on the &lt;strong&gt;Execute&lt;/strong&gt; button &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Leave the input of the state machine as it is and click on Execute&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;After the successful execution of the workflow, you can see results like this&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jRth6AeI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/czdxupqb327yf9u0je95.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jRth6AeI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/czdxupqb327yf9u0je95.png" alt="Image description" width="800" height="273"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visit the OpenSearch dashboard and click on &lt;strong&gt;Query Workbench&lt;/strong&gt; to execute the query and check whether embeddings are stored or not&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GYkCXdQe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gk8z5439htbk4s9ba1gy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GYkCXdQe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gk8z5439htbk4s9ba1gy.png" alt="Image description" width="800" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;As the embedding vectors are very large, they won’t be displayed in full on the dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s it. We successfully created a workflow to generate embeddings using a Bedrock model. &lt;/p&gt;

&lt;p&gt;If you want to use the generated embeddings to build a search engine, visit the link provided at the start of the article. It contains the steps for building a vector search engine in OpenSearch.&lt;/p&gt;

&lt;p&gt;If you have any doubts or need more clarification or help, please feel free to comment on this article. I will surely get back to you.&lt;/p&gt;

&lt;p&gt;Thanks :)&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>stepfunctions</category>
      <category>lambda</category>
    </item>
    <item>
      <title>Introduction to AWS Step functions by Automated Image-Processing Using Amazon Rekognition</title>
      <dc:creator>Salam Shaik</dc:creator>
      <pubDate>Sun, 10 Dec 2023 17:23:07 +0000</pubDate>
      <link>https://dev.to/aws-builders/introduction-to-aws-step-functions-by-automated-image-processing-using-amazon-rekognition-55nh</link>
      <guid>https://dev.to/aws-builders/introduction-to-aws-step-functions-by-automated-image-processing-using-amazon-rekognition-55nh</guid>
      <description>&lt;p&gt;Hi everyone,&lt;/p&gt;

&lt;p&gt;We are implementing an automated image-processing workflow using a state machine in Step Functions. This will cover the basics of Step Functions workflows. &lt;/p&gt;

&lt;p&gt;Before jumping into the topic let me introduce the step functions and terminology we are going to use in this article&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step Functions:&lt;/strong&gt; It’s a managed service that helps combine AWS services to create workflows. It provides a visual workflow editor: we can just drag and drop services and combine them to create a workflow&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Terminology:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State:&lt;/strong&gt; Each step in the workflow is defined as a state. A state has inputs, outputs, and error handling&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State Machine:&lt;/strong&gt; A workflow is defined as a state machine. It contains the states. By default, the editor will provide &lt;strong&gt;start&lt;/strong&gt; and &lt;strong&gt;end&lt;/strong&gt; states&lt;/p&gt;
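&lt;p&gt;Under the hood, the visual editor generates an Amazon States Language (JSON) definition. Here is a minimal two-state machine sketched as a Python dict; the state names and Lambda ARNs are illustrative placeholders, not real resources:&lt;/p&gt;

```python
import json

# Minimal Amazon States Language definition with two chained Task states.
# The Lambda ARNs below are placeholders for illustration only.
definition = {
    "StartAt": "UploadImage",
    "States": {
        "UploadImage": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-west-2:123456789012:function:image-uploader",
            "Next": "DetectLabels",
        },
        "DetectLabels": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-west-2:123456789012:function:object-receiver",
            "End": True,
        },
    },
}

# boto3's Step Functions client would accept this serialized as JSON:
#   client.create_state_machine(name=..., definition=json.dumps(definition), roleArn=...)
```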

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MRmWo1vH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/3810/1%2AcjhBl04K6F7GjscavTrBKA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MRmWo1vH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/3810/1%2AcjhBl04K6F7GjscavTrBKA.png" alt="An empty visual state machine editor" width="800" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Overview:&lt;/strong&gt; Here we will create 4 lambda functions and connect them using the step functions to automate image processing using the AWS Rekognition service&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API-receiver:&lt;/strong&gt; This lambda function receives the image through API-Gateway and triggers the state machine&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Image-uploader:&lt;/strong&gt; This function will upload the image to an S3 bucket and return the uploaded image’s details&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Object-receiver:&lt;/strong&gt; This function will receive the image details, pass that info to the AWS Rekognition service, and return the detection results&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Store-to-dynamo-db:&lt;/strong&gt; This function will take the image info and store that data in DynamoDB&lt;/p&gt;

&lt;p&gt;Let’s start &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 1: Create a lambda function to upload images to S3&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Here is the code for the lambda function&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import base64
import boto3
from datetime import datetime


def lambda_handler(event, context):

    ##receiving the base64 image data and decoding it
    image_data = base64.b64decode(event['base64Image'])

    s3 = boto3.client("s3")

    ##generating the name for the image to upload to s3
    now = datetime.now()
    date_time_string = now.strftime("%Y-%m-%d %H:%M:%S")

    bucket_name = "images-rek-test"
    object_key = f"{date_time_string}.png"
    region = "us-west-2"



    try:
        ##uploading image to s3
        s3.put_object(Bucket=bucket_name, Key=object_key, Body=image_data)

        response = {
            "bucket_name":bucket_name,
            "object_key":object_key,
            "region":region
        }

        return {
            'statusCode':200,
            'body': response
        }
    except Exception as e:
        print(e)
        return {
            'statusCode': 500,
            'body': json.dumps("Error in uploading image")
        }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Step 2: Create a lambda function for image processing:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Here is the code&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import boto3

def lambda_handler(event, context):

    s3_client = boto3.client('s3')
    rekognition_client = boto3.client('rekognition')

    bucket_name = event["body"]['bucket_name']
    object_key = event["body"]['object_key']


    print(f"keys {bucket_name} and {object_key}")

    ##sending uploaded image info to rekognition service
    response = rekognition_client.detect_labels(
        Image={'S3Object': {'Bucket': bucket_name, 'Name': object_key}},
        MaxLabels=10
        )

    print(response)

    return {
        'statusCode': 200,
        'body': response
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Step 3: Create a Lambda function to store data in DynamoDb&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Create a table in DynamoDB with id as the primary key&lt;/li&gt;
&lt;/ul&gt;
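&lt;p&gt;If you prefer scripting the table creation instead of clicking through the console, the schema can be described like this. The table and attribute names follow this article; the actual boto3 call is left commented out since it needs AWS credentials:&lt;/p&gt;

```python
# Table spec matching the article: a string partition key named "id".
# BillingMode PAY_PER_REQUEST avoids having to size read/write capacity.
table_spec = {
    "TableName": "image_data",
    "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
    "AttributeDefinitions": [{"AttributeName": "id", "AttributeType": "S"}],
    "BillingMode": "PAY_PER_REQUEST",
}

# With credentials configured, this would create the table:
#   boto3.resource("dynamodb").create_table(**table_spec)
```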

&lt;p&gt;Here is the code&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import boto3
from datetime import datetime
from decimal import Decimal

##converting floats to Decimal as DynamoDB doesn't support float types
def convert_floats_to_decimal(obj):
    if isinstance(obj, float):
        return Decimal(str(obj))
    elif isinstance(obj, dict):
        return {k: convert_floats_to_decimal(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [convert_floats_to_decimal(v) for v in obj]
    else:
        return obj



def lambda_handler(event, context):

    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('image_data')##table name in dynamo db


    rekognition_data = convert_floats_to_decimal(event['body'])

    ##generate an id with date and time to store in table
    now = datetime.now()
    date_time_string = now.strftime("%Y-%m-%d %H:%M:%S")

    item = {
        'id': date_time_string, 
        'RekognitionResponse': rekognition_data
    }

    table.put_item(Item=item)##storing the data in the table

    return {
        'statusCode': 200,
        'body': 'Data stored in DynamoDB successfully'
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
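&lt;p&gt;To see why the Decimal conversion above matters: DynamoDB rejects Python floats, so every confidence score coming back from Rekognition has to be converted before the put_item call. Here is a small self-contained check of that helper on sample data shaped like a trimmed detect_labels response:&lt;/p&gt;

```python
from decimal import Decimal

def convert_floats_to_decimal(obj):
    # Recursively replace floats with Decimal, same as the Lambda above.
    if isinstance(obj, float):
        return Decimal(str(obj))
    elif isinstance(obj, dict):
        return {k: convert_floats_to_decimal(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [convert_floats_to_decimal(v) for v in obj]
    return obj

# Sample shaped like a trimmed Rekognition detect_labels response
sample = {"Labels": [{"Name": "Car", "Confidence": 98.7}]}
converted = convert_floats_to_decimal(sample)
```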

&lt;ul&gt;
&lt;li&gt;Now that we have the Lambda functions ready, let’s create the state machine in Step Functions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 4: Create a state machine in step functions:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit the step functions service from the AWS search bar&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on create state machine and select a blank template&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;From the side menu drag and drop the Lambda invoke&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--V-K0raaF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2A-8T2S8gK-ofQ45XHUTk7Hw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--V-K0raaF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2A-8T2S8gK-ofQ45XHUTk7Hw.png" alt="" width="428" height="805"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;From the right side menu update the lambda configuration, input, and output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xuKiRnYN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2Af6lUSHR-Htm5PY-yUEzkFA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xuKiRnYN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2Af6lUSHR-Htm5PY-yUEzkFA.png" alt="" width="521" height="805"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Give a name for the state, and from API Parameters select the function&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Leave the input empty&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Transform the output like this to pass info to the next state&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--F-HcnbUj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2A0lymdLRQ1fQ2NJLzf44eXw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--F-HcnbUj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2A0lymdLRQ1fQ2NJLzf44eXw.png" alt="" width="521" height="805"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;. This is the state the machine looks like now&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NeKHoVoT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2AEg6lBUCuUWZrKPZHeoWiHg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NeKHoVoT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2AEg6lBUCuUWZrKPZHeoWiHg.png" alt="" width="480" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Drag another Lambda invoke state and keep it under the first function&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HSZsz7VB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2A3l-EmVxBqIcTfyST2U7s0A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HSZsz7VB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2A3l-EmVxBqIcTfyST2U7s0A.png" alt="" width="494" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Configure the second Lambda state’s settings, input, and output like this&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update the state name, select the lambda function, leave the input as it is, and change the output like the previous state&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--97KQQVKc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2AvHAvVA14pfYjBLscCWnBVQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--97KQQVKc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2AvHAvVA14pfYjBLscCWnBVQ.png" alt="" width="521" height="805"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AiMSoAZ5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2AvEXuUgVgLf44M_RG0IrSVg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AiMSoAZ5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2AvEXuUgVgLf44M_RG0IrSVg.png" alt="" width="521" height="805"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Now drag and drop another Lambda invoke function below the second Lambda function&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Qb7NvJx3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2AqSVL8wic75PceXjlEDFHjQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Qb7NvJx3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2AqSVL8wic75PceXjlEDFHjQ.png" alt="" width="515" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Update the Lambda function configuration, leave the input as it is, and change the output like this&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--akIcsMHL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2APGkYhozPR_ZU_5Jyz_OS0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--akIcsMHL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2APGkYhozPR_ZU_5Jyz_OS0w.png" alt="" width="521" height="805"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FwgNDHaw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2AAtmBuiCSqX7RdfjMaW3ghA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FwgNDHaw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2AAtmBuiCSqX7RdfjMaW3ghA.png" alt="" width="521" height="805"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Save the state machine from the top right corner&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;While creating the state machine you can test each state by selecting it. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 5: Create a Lambda function to trigger this state machine&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Copy the state machine ARN and paste it into the state_machine_arn variable&lt;/p&gt;

&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import boto3

def lambda_handler(event, context):

    client = boto3.client('stepfunctions')

    state_machine_arn = "arn:aws:states:{region}:{accountid}:stateMachine:MyStateMachine-jz7b4j339"

    input_data = json.dumps(event)

    ##trigger the state machine
    response = client.start_execution(
        stateMachineArn=state_machine_arn,
        input=input_data
        )

    return {
        'statusCode': 200,
        'body': json.dumps('Successfully started state machine')
    }
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 6: Create an API-Gateway API to trigger the lambda function&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit the API Gateway service from the AWS search bar&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a POST method, click on the integration request, select the Lambda function we created above, and in the mapping template enter the code like this&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gHK99RjK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2AnosSvoJB6m62IfrreBK40Q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gHK99RjK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2AnosSvoJB6m62IfrreBK40Q.png" alt="" width="800" height="762"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vVtZAqee--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2AQ-4gJe3BRAmv8J-hlpSZBQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vVtZAqee--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2000/1%2AQ-4gJe3BRAmv8J-hlpSZBQ.png" alt="" width="800" height="762"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;From the API settings, enable binary media types and enter image/png as the media type&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NPkOef2X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/3704/1%2AWdJrMRhZTcI3EXvwgp1L8w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NPkOef2X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/3704/1%2AWdJrMRhZTcI3EXvwgp1L8w.png" alt="" width="800" height="261"&gt;&lt;/a&gt;&lt;/p&gt;
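&lt;p&gt;For reference, the same integration can be sketched with boto3: CONVERT_TO_TEXT makes API Gateway base64-encode the binary image body, and the mapping template hands it to the Lambda as the base64Image field our first function expects. The REST API and resource IDs here are placeholders, and the call itself is commented out since it needs real IDs and credentials:&lt;/p&gt;

```python
# Sketch of the integration request settings (IDs are placeholders).
# CONVERT_TO_TEXT base64-encodes the binary body before the template runs.
integration_kwargs = {
    "restApiId": "abc123",     # placeholder
    "resourceId": "xyz789",    # placeholder
    "httpMethod": "POST",
    "type": "AWS",
    "contentHandling": "CONVERT_TO_TEXT",
    "requestTemplates": {
        "image/png": '{"base64Image": "$input.body"}'
    },
}

# Together with the Lambda invoke URI, this would be applied as:
#   boto3.client("apigateway").put_integration(**integration_kwargs, ...)
```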

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Deploy the API, copy the API URL, and trigger that URL from Postman&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It will start the state machine, and you can see the results stored in DynamoDB&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can see the execution flow in the state machine below and the results in DynamoDB&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sqST3jmF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/3704/1%2ApiNb6TAtn3lwaYJOVYanmw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sqST3jmF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/3704/1%2ApiNb6TAtn3lwaYJOVYanmw.png" alt="" width="800" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BK60T4L7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2476/1%2A9ieMvyCCHjZZisu0sLOe8g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BK60T4L7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/2476/1%2A9ieMvyCCHjZZisu0sLOe8g.png" alt="" width="800" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;That’s it. We successfully created a state machine to process the images &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Go to Step Functions, explore all the features available, and keep trying other workflows&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There will be another article with more info about workflows&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limitations: The state machine input accepts only around 256 KB of data, so if you want to input bigger images it’s better to read them from an S3 bucket directly.&lt;/strong&gt;&lt;/p&gt;
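&lt;p&gt;A guard for that limit can be added in the trigger Lambda before calling start_execution. Step Functions caps state input/output payloads at 256 KB, so oversized payloads should go through S3 instead. A minimal sketch (the threshold mirrors the documented limit):&lt;/p&gt;

```python
import json

MAX_INPUT_BYTES = 256 * 1024  # Step Functions input/output payload limit

def payload_too_big(event):
    """Return True if the serialized event exceeds the Step Functions input limit."""
    return len(json.dumps(event).encode("utf-8")) > MAX_INPUT_BYTES

# Example payloads: a tiny base64 string vs. one well over the limit
small = {"base64Image": "aGVsbG8="}
big = {"base64Image": "A" * (300 * 1024)}
```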

</description>
      <category>stepfunctions</category>
      <category>lambda</category>
      <category>rekognition</category>
      <category>aws</category>
    </item>
    <item>
      <title>Introduction to Lambda Function URLs</title>
      <dc:creator>Salam Shaik</dc:creator>
      <pubDate>Mon, 27 Nov 2023 19:01:31 +0000</pubDate>
      <link>https://dev.to/aws-builders/introduction-to-lambda-function-urls-253b</link>
      <guid>https://dev.to/aws-builders/introduction-to-lambda-function-urls-253b</guid>
      <description>&lt;p&gt;Hi everyone,&lt;/p&gt;

&lt;p&gt;I have been using the Lambda service for the last 3 years. Whenever I wanted a public URL to trigger a Lambda function, I used to go straight to API Gateway. &lt;/p&gt;

&lt;p&gt;There I had to go through several steps: creating resources and methods, attaching the Lambda, deploying the API, and so on. Every time, I wondered whether there was an easier way to do it&lt;/p&gt;

&lt;p&gt;But I never did any research on achieving this. Two days back, while doing some research for my next article, I came across Lambda Function URLs.&lt;/p&gt;

&lt;p&gt;After getting to know about Lambda URLs, I literally felt like this &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqkjgd43hedutaqc67ru.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqkjgd43hedutaqc67ru.gif" width="498" height="342"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Okay. Enough of my story. Let’s dive into the topic&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lambda Function URLs:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want a public URL to call a Lambda function directly without using API Gateway, you can use Lambda function URLs. The service gives you a public HTTPS URL that you can call like any normal URL we use daily&lt;/p&gt;
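&lt;p&gt;For anyone who prefers the SDK over the console, the same thing can be sketched with boto3. The function name is a placeholder, and the call is commented out since it needs credentials; the console steps below achieve the same result:&lt;/p&gt;

```python
# Function URL config sketch: AuthType NONE makes the URL public,
# AWS_IAM restricts it to IAM-authorized callers.
url_config = {
    "FunctionName": "my-test-function",   # placeholder function name
    "AuthType": "NONE",
    "Cors": {
        "AllowOrigins": ["*"],
        "AllowMethods": ["GET", "POST"],
    },
}

# With credentials configured, this would create the URL:
#   boto3.client("lambda").create_function_url_config(**url_config)
```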

&lt;p&gt;&lt;strong&gt;Creating a Lambda Function URL:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Visit the Lambda service from the AWS search&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on the Create Lambda function. Give a name for the function and select Runtime. I am selecting Python 3.9 as my function runtime&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqk02u9jvtcva21vnkm3m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqk02u9jvtcva21vnkm3m.png" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Click on the Create Function button and wait for some time for the function to be created&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It will come with the basic code like this&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fadmu6qv4iqmpweawgxbp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fadmu6qv4iqmpweawgxbp.png" width="800" height="294"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For testing, that code is enough. Now to create a public URL, Visit the configuration tab at the top of the Lambda code editor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ugmhei7bd9jz9k21ac8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ugmhei7bd9jz9k21ac8.png" width="800" height="187"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;From the side menu click on Function URL and click on the Create Function URL button&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcxnhbo063o6rlufcncjh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcxnhbo063o6rlufcncjh.png" width="800" height="187"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can find different Auth types like below&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4b996frwj542r547b2v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4b996frwj542r547b2v.png" width="800" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;AWS_IAM will allow only users and roles with granted permissions to access the URL&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;NONE will give access to the public. Anybody can access the Lambda function using the URL&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;From the Additional Settings, you can also configure CORS&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fajigjuv6knlure42681u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fajigjuv6knlure42681u.png" width="800" height="689"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can define the origin you want to allow and can add needed headers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03v6v80jrmzg5a7yf3t8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03v6v80jrmzg5a7yf3t8.png" width="800" height="584"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;From the drop-down, you can select the methods you want for the function and click on save.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the Function URL is generated, you can find the details like below&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz55gdake57s07l2sg8do.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz55gdake57s07l2sg8do.png" width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can copy the Function URL, put it in Postman, and test it like below&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsqqx2ai0m7ex68sfnvxj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsqqx2ai0m7ex68sfnvxj.png" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;As I kept the Auth type as AWS_IAM, it’s giving Forbidden as the response. Now I am changing it to NONE&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fprzl3smiashlsobxsjgf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fprzl3smiashlsobxsjgf.png" width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After that, I tried again. Now I am getting the correct response.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwjqm8vgsysbcnmk0pyh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwjqm8vgsysbcnmk0pyh.png" width="800" height="352"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;We can use these Lambda Function URLs for webhooks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For a quick API, static websites, or single-page applications, these are helpful&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For simple web applications that don’t need API Gateway, it is very helpful&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For me, simply writing a function and testing it directly is where these function URLs are very helpful.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are many more use cases; for now, these are the ones I can think of&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;When you compare Lambda Function URLs with API Gateway, they lack some security features such as API keys, granular access control, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For restricting HTTP methods, we need to implement our own logic in the Lambda; by default, function URLs don't provide any HTTP method restriction&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CORS, if not configured correctly, can lead to many security issues.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
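&lt;p&gt;For the second limitation: a function URL delivers the HTTP API version 2.0 event shape, so the handler itself can reject unwanted methods. A minimal sketch of that check:&lt;/p&gt;

```python
import json

def lambda_handler(event, context):
    # Function URLs use the HTTP API v2 event format; the method lives here.
    method = event.get("requestContext", {}).get("http", {}).get("method")
    if method != "POST":
        return {"statusCode": 405, "body": json.dumps("Method not allowed")}
    return {"statusCode": 200, "body": json.dumps("OK")}

# Fake events for local testing, shaped like the v2 payload
fake_get = {"requestContext": {"http": {"method": "GET"}}}
fake_post = {"requestContext": {"http": {"method": "POST"}}}
```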

&lt;p&gt;That’s all I got to know about Lambda Function URLs in a short time. If you know of any use cases or limitations I missed, feel free to comment. I am open to suggestions. &lt;/p&gt;

&lt;p&gt;Thanks, everyone, Have a good day.&lt;/p&gt;

</description>
      <category>lambda</category>
      <category>aws</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
