DEV Community: Benito Martin

Automate Vector Database Update with AWS and CircleCI

Benito Martin — Tue, 15 Jul 2025 09:07:55 +0000

Introduction

Managing vector databases efficiently is crucial for applications that rely on similarity search, AI-powered recommendations, and large-scale data retrieval. As data sources grow, keeping vector databases updated with fresh embeddings becomes a challenge. Manually updating embeddings for new documents is inefficient and error-prone, making automation essential.

In this guide, you will explore how to build a fully automated pipeline for processing and updating a vector database using AWS Lambda and CircleCI. The solution involves extracting text from PDFs, generating embeddings with OpenAI, and storing them in Zilliz Cloud, a managed vector database. You will also set up AWS infrastructure (S3, ECR, and Lambda) and implement a CI/CD pipeline with CircleCI to automate deployment and updates.

What You Will Learn:

How to manage vector databases and automate embeddings creation
Building an AWS Lambda function to process and update embeddings
Using Docker to containerize the AWS Lambda function for efficient execution
Setting up CircleCI to automate testing and deployment
Implementing best practices for AWS IAM roles and security

By the end of this tutorial, you will have a fully automated workflow to process and update vector embeddings seamlessly.

This tutorial assumes some familiarity with Python, AWS, and Docker. You can check out the complete source code on GitHub, but this guide will walk you through the process step by step.

Prerequisites

Before you begin, ensure that you have the following requirements in place:

AWS Account: Sign up for an AWS account if you do not already have one. You will use AWS Lambda and Elastic Container Registry (ECR) for deployment.
AWS CLI Installed and Configured: Install the AWS Command Line Interface (CLI) and configure it with your AWS credentials. You can follow the AWS CLI setup guide.
Basic Knowledge of LangChain or Vector Databases: Understanding the fundamentals of LangChain and vector databases will help you design the architecture of the pipeline.
Familiarity with AWS Lambda and Docker: You should know the basics of AWS Lambda and Docker, as you will use them to package and deploy the application.
GitHub and CircleCI Accounts: Create accounts on GitHub and CircleCI to manage the version control and automate the CI/CD pipeline.
OpenAI API Key: To generate embeddings with OpenAI's models, you will need an API key. You can sign up for an API key on the OpenAI website.
Zilliz Cloud Account: Sign up for a Zilliz Cloud account to host your vector database and get a free cluster that provides the URI endpoint and Token to interact with it.

Once you have these prerequisites in place, you will be ready to set up the automated pipeline.

Setting Up the Project Structure

Before diving into implementation, you need to structure your project efficiently. A well-organized project makes development, testing, and deployment smoother, especially when dealing with cloud services and CI/CD automation.

Project Organization and Key Components

Your project will include the following key components:

├── .circleci/
│   └── config.yml
├── data/
│   └── 1706.03762v7.pdf
├── src/
│   ├── create_collection.py
│   ├── drop_collection.py
│   ├── insert_documents.py
│   └── __init__.py
├── aws_lambda/
│   ├── __init__.py
│   └── lambda_function.py
├── scripts/
│   ├── build_deploy.sh
│   ├── create_roles.sh
│   ├── create_image.sh
│   └── create_lambda.sh
├── tests/
│   ├── test_collection_exists.py
│   ├── test_lambda_function.py
│   └── test_collection_mock.py
├── Dockerfile
└── pyproject.toml

Installing Dependencies with UV Package Manager

First, clone the repository containing the project code:

git clone https://github.com/benitomartin/embeddings-aws-circleci
cd embeddings-aws-circleci

Note: The repository you just cloned already contains all the necessary code snippets referenced throughout this tutorial. There's no need to recreate files from scratch. Simply verify that the contents match as you follow along. Feel free to adapt the structure and implementation to suit your own project requirements.

Next, install the dependencies using the UV Package Manager. If you do not have it installed, you can follow the installation guide.

uv sync --all-extras
source .venv/bin/activate

These commands will install all the necessary dependencies for the project that are listed in the pyproject.toml file and activate the virtual environment.

Environment Configuration

Create a .env file in the root directory of your project and add the following environment variables:

ZILLIZ_CLOUD_URI=your-zilliz-uri
ZILLIZ_TOKEN=your-zilliz-token
COLLECTION_NAME=your-collection-name
PDF_BUCKET_NAME=your-bucket-name
OPENAI_API_KEY=your-openai-key
AWS_REGION=your-aws-region
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_ACCOUNT_ID=your-account-id
LAMBDA_ECR_REPOSITORY_NAME=your-ecr-repo-name
LAMBDA_IMAGE_NAME=your-image-name
LAMBDA_FUNCTION_NAME=your-lambda-name
ROLE_NAME=your-role-name
ROLE_POLICY_NAME=your-policy-name

Replace the placeholders with your actual values.
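Since every later script reads these values from the environment, a quick check can catch a forgotten entry before anything is deployed. The following is a minimal sketch, not part of the repository: the helper name missing_env_vars and the REQUIRED_VARS list are assumptions made for illustration, with the variable names copied from the listing above.

```python
import os
from typing import Mapping, Optional, Sequence

# Variable names copied from the .env listing; trim the list to what you actually use.
REQUIRED_VARS = [
    "ZILLIZ_CLOUD_URI",
    "ZILLIZ_TOKEN",
    "COLLECTION_NAME",
    "PDF_BUCKET_NAME",
    "OPENAI_API_KEY",
    "AWS_REGION",
    "AWS_ACCOUNT_ID",
    "LAMBDA_ECR_REPOSITORY_NAME",
    "LAMBDA_IMAGE_NAME",
    "LAMBDA_FUNCTION_NAME",
    "ROLE_NAME",
]


def missing_env_vars(
    required: Sequence[str],
    environ: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Return the names in `required` that are unset or empty.

    Hypothetical helper: `environ` defaults to os.environ and can be
    replaced with a plain dict for testing.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]
```

Calling missing_env_vars(REQUIRED_VARS) after the .env file has been loaded returns an empty list when everything is set, and otherwise the names that still need a value.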

Creating the Vector Database Infrastructure

To efficiently store and retrieve embeddings, you need to set up a vector database. This section will guide you through configuring Zilliz Cloud (Milvus), defining a schema, and optimizing the database for fast vector searches.

Setting Up Zilliz Cloud Collection

Zilliz Cloud is a managed version of Milvus, a high-performance vector database. You will create a collection to store extracted text and corresponding vector embeddings.

In order to create a collection, you need to follow these steps:

Sign up and create a free Cluster in Zilliz Cloud.
Get the connection details:

URI: Found in the cluster settings (public endpoint).
Token: Required for authentication.

Set environment variables in your .env file and provide a collection name:

ZILLIZ_CLOUD_URI=your-zilliz-uri
ZILLIZ_TOKEN=your-zilliz-token
COLLECTION_NAME=your-collection-name

Creating the Collection

Once you have the connection details, you can create a collection in Zilliz Cloud. The collection will store the extracted text and corresponding vector embeddings.

In the src folder, you can create a create_collection.py script, with several functions to define the schema and create the collection:

create_schema: Defines the schema, which includes:
- id: Auto-generated primary key (INT64).
- pdf_text: Extracted text stored as a VARCHAR.
- my_vector: Vector embeddings stored as FLOAT_VECTOR (default dimension: 1536).
create_collection: Creates the collection in Zilliz Cloud with the defined schema. It sets up an AUTOINDEX on the vector field with the COSINE metric for similarity search.

import os
from typing import Optional

from pymilvus import CollectionSchema, DataType, MilvusClient

def create_schema(dimension: int = 1536) -> CollectionSchema:
    """Define the schema for the Milvus collection."""
    schema = MilvusClient.create_schema(
        auto_id=True,
        enable_dynamic_field=True,
    )

    schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
    schema.add_field(field_name="pdf_text", datatype=DataType.VARCHAR, max_length=65535)
    schema.add_field(field_name="my_vector", datatype=DataType.FLOAT_VECTOR, dim=dimension)

    return schema


def create_collection(
    collection_name: Optional[str] = None,
    uri: Optional[str] = None,
    token: Optional[str] = None,
    dimension: int = 1536,
) -> None:
    """Create a new Milvus collection with the specified parameters.

    Args:
        collection_name (str, optional): Name of the collection. Defaults to env var COLLECTION_NAME.
        uri (str, optional): Zilliz Cloud URI. Defaults to env var ZILLIZ_CLOUD_URI.
        token (str, optional): Zilliz token. Defaults to env var ZILLIZ_TOKEN.
        dimension (int, optional): Vector dimension. Defaults to 1536.
    """
    # Use environment variables as fallback
    collection_name = collection_name or os.getenv("COLLECTION_NAME")
    uri = uri or os.getenv("ZILLIZ_CLOUD_URI")
    token = token or os.getenv("ZILLIZ_TOKEN")

    if not all([collection_name, uri, token]):
        raise ValueError("Missing required parameters: collection_name, uri, or token")

    # Connect to Zilliz Cloud (Milvus)
    client = MilvusClient(uri=uri, token=token)

    # Create schema
    schema = create_schema(dimension)

    # Prepare index parameters
    index_params = client.prepare_index_params()
    index_params.add_index(field_name="my_vector", index_type="AUTOINDEX", metric_type="COSINE")

    # Create collection
    client.create_collection(collection_name=collection_name, schema=schema, index_params=index_params)

if __name__ == "__main__":
    # Create collection
    print("Creating collection...")
    create_collection()
    print("Collection created successfully.")

Once your Zilliz Cloud cluster is ready and .env is configured, run:

uv run src/create_collection.py

This will create a collection in your Zilliz Cloud cluster. If you need to start over, you can create a drop_collection.py script in the src folder to drop the collection, and then recreate it with the previous script.

import os
from typing import Optional

from pymilvus import MilvusClient


def drop_collection(
    collection_name: Optional[str] = None,
    uri: Optional[str] = None,
    token: Optional[str] = None,
) -> None:
    """Drop a Milvus collection.

    Args:
        collection_name (str, optional): Name of the collection. Defaults to env var COLLECTION_NAME.
        uri (str, optional): Zilliz Cloud URI. Defaults to env var ZILLIZ_CLOUD_URI.
        token (str, optional): Zilliz token. Defaults to env var ZILLIZ_TOKEN.
    """
    # Use environment variables as fallback
    collection_name = collection_name or os.getenv("COLLECTION_NAME")
    uri = uri or os.getenv("ZILLIZ_CLOUD_URI")
    token = token or os.getenv("ZILLIZ_TOKEN")

    if not all([collection_name, uri, token]):
        raise ValueError("Missing required parameters: collection_name, uri, or token")

    # Connect to Zilliz Cloud (Milvus)
    client = MilvusClient(uri=uri, token=token)

    # Drop the collection
    client.drop_collection(collection_name=collection_name)

if __name__ == "__main__":
    # Drop collection
    print("Dropping collection...")
    drop_collection()
    print("Collection dropped successfully.")

To drop the collection, run:

uv run src/drop_collection.py
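If you expect to run the creation script more than once, a guard that checks for the collection first keeps the step repeatable. The sketch below is one possible approach rather than code from the repository: ensure_collection is a hypothetical helper, and it relies only on the has_collection method that MilvusClient provides.

```python
from typing import Callable, Protocol


class SupportsHasCollection(Protocol):
    """Anything with a has_collection method, such as pymilvus.MilvusClient."""

    def has_collection(self, collection_name: str) -> bool: ...


def ensure_collection(
    client: SupportsHasCollection,
    collection_name: str,
    create: Callable[[], None],
) -> bool:
    """Run `create` only when the collection is absent.

    Hypothetical helper. Returns True if `create` was called and
    False if the collection already existed.
    """
    if client.has_collection(collection_name):
        return False
    create()
    return True
```

With the functions defined earlier, create could be a lambda that calls create_collection() with your collection name, so a second run simply reports that nothing had to be done.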

Implementing the PDF Processing Pipeline

To store and search text efficiently, you need to process PDFs, extract the text, convert it into embeddings, and store them in Zilliz Cloud for fast retrieval.

Make sure to set the OPENAI_API_KEY environment variable in your .env file.

Then, create an insert_documents.py script in the src folder. This script will:

Load the text from PDFs using PyPDFLoader from LangChain to get a list of Document objects.
Split the text into manageable chunks using CharacterTextSplitter, so that each embedding covers a focused piece of text.
Generate vector embeddings using OpenAI.
Store the text and embeddings in Zilliz Cloud using MilvusClient for efficient similarity searches.

import os
from typing import Optional

from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from pymilvus import MilvusClient


def process_pdf(pdf_path: str, chunk_size: int = 512, chunk_overlap: int = 100) -> list[dict]:
    """Process a PDF file and generate embeddings for its content.

    Args:
        pdf_path (str): Path to the PDF file.
        chunk_size (int, optional): Size of text chunks. Defaults to 512.
        chunk_overlap (int, optional): Overlap between chunks. Defaults to 100.

    Returns:
        List[dict]: List of dictionaries containing text and embeddings.
    """
    if not os.path.exists(pdf_path):
        raise FileNotFoundError(f"PDF file not found at {pdf_path}")

    # Load and process PDF
    loader = PyPDFLoader(pdf_path)
    documents = loader.load()

    # Split text
    text_splitter = CharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    chunks = text_splitter.split_documents(documents)

    # Generate embeddings
    openai_embeddings = OpenAIEmbeddings()

    # Prepare data for insertion
    data = []
    for chunk in chunks:
        text = chunk.page_content
        embedding = openai_embeddings.embed_documents([text])[0]
        data.append({"pdf_text": text, "my_vector": embedding})

    return data


def insert_documents(
    pdf_path: str,
    collection_name: Optional[str] = None,
    uri: Optional[str] = None,
    token: Optional[str] = None,
    chunk_size: int = 512,
    chunk_overlap: int = 100,
) -> None:
    """Insert documents from a PDF file into a Milvus collection.

    Args:
        pdf_path (str): Path to the PDF file.
        collection_name (str, optional): Name of the collection. Defaults to env var COLLECTION_NAME.
        uri (str, optional): Zilliz Cloud URI. Defaults to env var ZILLIZ_CLOUD_URI.
        token (str, optional): Zilliz token. Defaults to env var ZILLIZ_TOKEN.
        chunk_size (int, optional): Size of text chunks. Defaults to 512.
        chunk_overlap (int, optional): Overlap between chunks. Defaults to 100.
    """
    # Use environment variables as fallback
    collection_name = collection_name or os.getenv("COLLECTION_NAME")
    uri = uri or os.getenv("ZILLIZ_CLOUD_URI")
    token = token or os.getenv("ZILLIZ_TOKEN")

    if not all([collection_name, uri, token]):
        raise ValueError("Missing required parameters: collection_name, uri, or token")

    # Connect to Zilliz Cloud (Milvus)
    client = MilvusClient(uri=uri, token=token)

    # Process PDF and get data
    data = process_pdf(pdf_path, chunk_size, chunk_overlap)

    # Insert data
    client.insert(collection_name, data)

    # Verify collection load state
    load_state = client.get_load_state(collection_name=collection_name)
    print(f"Collection load state: {load_state}")


if __name__ == "__main__":
    # Insert documents
    print("Inserting documents...")
    insert_documents("data/1706.03762v7.pdf")
    print("Documents inserted successfully.")

To run the script, use the following command. You can find a sample PDF file in the data folder but feel free to use your own.

uv run src/insert_documents.py

This script will process the PDF, generate embeddings, and store them in your Zilliz Cloud cluster collection.
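The process_pdf function above sends one embedding request per chunk. Because embed_documents accepts a list of texts, the chunks could also be sent in groups, which may reduce the number of round trips for large PDFs. The following is a minimal sketch of that idea: the helper names batched and embed_in_batches and the default batch size of 64 are assumptions for illustration, and the LangChain client may already split large lists internally.

```python
from typing import Callable, Iterator, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: Sequence[str],
    embed_documents: Callable[[list[str]], list[list[float]]],
    batch_size: int = 64,  # assumed value, tune for your documents
) -> list[dict]:
    """Build rows for insertion, calling `embed_documents` once per batch.

    `embed_documents` is expected to return one vector per input text,
    in the same order, as OpenAIEmbeddings.embed_documents does.
    """
    rows: list[dict] = []
    for batch in batched(texts, batch_size):
        vectors = embed_documents(list(batch))
        rows.extend(
            {"pdf_text": text, "my_vector": vector}
            for text, vector in zip(batch, vectors)
        )
    return rows
```

In process_pdf, this would replace the loop with something like embed_in_batches([c.page_content for c in chunks], openai_embeddings.embed_documents), producing the same row format that client.insert expects.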

Creating IAM Roles and Policies

Now that you have a working pipeline, you need to set up AWS Lambda to trigger the pipeline when a new PDF is uploaded to an S3 bucket.

To deploy AWS Lambda functions, you first need to create an IAM role with the right permissions. You can create the following create_roles.sh script under the scripts folder. It creates an IAM role and attaches the AWS managed policy AWSLambdaExecute, which lets the function read and write S3 objects and write to CloudWatch Logs.

Before running the script, make sure to set the ROLE_NAME and AWS_REGION environment variables in your .env file.

AWS Lambda will assume this role when executing the function, which allows it to access the S3 bucket, as defined in the AWSLambdaExecute policy. It will also have access to CloudWatch Logs for logging purposes, which will help you monitor and debug the function.

#!/bin/bash

# Exit immediately if a command exits with a non-zero status
set -e

# Load environment variables from .env file
set -o allexport
source .env
set +o allexport

echo "Environment variables loaded."


# Create a new IAM role for Lambda and attach the AWSLambdaExecute managed policy
echo "Checking IAM role..."

# Check if the role exists
if ! aws iam get-role --role-name ${ROLE_NAME} --region ${AWS_REGION} 2>/dev/null; then
    echo "Creating new IAM role for Lambda with S3 access..."

    # Trust policy that allows the Lambda service to assume this role
    ASSUME_ROLE_POLICY='{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Service": "lambda.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }]
    }'

    # Create the IAM role
    aws iam create-role \
        --role-name ${ROLE_NAME} \
        --assume-role-policy-document "${ASSUME_ROLE_POLICY}" \
        --region ${AWS_REGION}


    # Add Lambda execution policy. Provides Put, Get access to S3 and full access to CloudWatch Logs.
    aws iam attach-role-policy \
        --role-name ${ROLE_NAME} \
        --policy-arn arn:aws:iam::aws:policy/AWSLambdaExecute  \
        --region ${AWS_REGION}

    echo "IAM role created and policy attached."

    # Wait for role to propagate
    echo "Waiting for role to propagate..."
    sleep 20

else
    echo "IAM role ${ROLE_NAME} already exists. Skipping role creation."
fi

To execute the script, use the following command:

uv run scripts/create_roles.sh
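The AWSLambdaExecute managed policy grants S3 access on all buckets. If you prefer to narrow the role to the single PDF bucket, one option is an inline policy scoped to that bucket. The sketch below only builds such a policy document; it is not what the script above does, and the function name build_scoped_policy is a hypothetical helper. It assumes the function only needs to read objects, as the handler in this tutorial does.

```python
import json


def build_scoped_policy(bucket_name: str) -> dict:
    """Build an IAM policy document limited to one bucket plus CloudWatch Logs.

    Hypothetical alternative to the AWSLambdaExecute managed policy.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read access to objects in the PDF bucket only
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # Permissions needed to write function logs
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "arn:aws:logs:*:*:*",
            },
        ],
    }


# Placeholder bucket name; use the value of PDF_BUCKET_NAME in practice.
policy_json = json.dumps(build_scoped_policy("your-bucket-name"), indent=2)
```

The resulting JSON could be saved to a file and attached with aws iam put-role-policy, using ROLE_POLICY_NAME from the .env file as the policy name.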

Building the AWS Lambda Function

The AWS Lambda function is the core component that automates the entire process of handling PDF uploads, generating embeddings, and storing them in Zilliz Cloud. The function is triggered by an S3 event, processes the uploaded PDF, and stores the resulting data in your Milvus collection.

Lambda Handler Implementation

Now you can create the lambda_function.py file below and save it in the aws_lambda folder. This file contains the implementation of the AWS Lambda function, which is triggered by an S3 event whenever a new PDF file is uploaded to the S3 bucket. It processes the event, downloads the file, generates embeddings, and inserts the data into the Zilliz Cloud collection.

import json
import os

import boto3
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from pymilvus import MilvusClient

# Global variables for reuse across invocations
client = None
openai_embeddings = None
text_splitter = None


def init_clients():
    """Initialize global clients if not already initialized"""
    global client, openai_embeddings, text_splitter

    if client is None:
        print("Initializing Milvus client...")
        client = MilvusClient(uri=os.getenv("ZILLIZ_CLOUD_URI"), token=os.getenv("ZILLIZ_TOKEN"))

    if openai_embeddings is None:
        print("Initializing OpenAI embeddings...")
        openai_embeddings = OpenAIEmbeddings(openai_api_key=os.getenv("OPENAI_API_KEY"))

    if text_splitter is None:
        print("Initializing text splitter...")
        text_splitter = CharacterTextSplitter(chunk_size=512, chunk_overlap=100)


def lambda_handler(event, context):
    try:
        print(f"Received event: {json.dumps(event)}")

        # Initialize clients
        init_clients()

        # Validate event structure
        if "Records" not in event or not event["Records"]:
            print("No records found in event")
            return {"statusCode": 400, "body": json.dumps("No records found in event")}

        # Get bucket and file info from S3 event
        record = event["Records"][0]
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        print(f"Processing file {key} from bucket {bucket}")

        # Verify bucket
        expected_bucket = os.getenv("PDF_BUCKET_NAME")
        if bucket != expected_bucket:
            print(f"Invalid bucket. Expected {expected_bucket}, got {bucket}")
            return {
                "statusCode": 400,
                "body": json.dumps(f"Invalid bucket. Expected {expected_bucket}, got {bucket}"),
            }

        # Download PDF
        local_path = f"/tmp/{os.path.basename(key)}"
        print(f"Downloading file to {local_path}")
        s3 = boto3.client("s3")
        s3.download_file(bucket, key, local_path)

        # Process PDF
        print("Loading and splitting PDF...")
        documents = PyPDFLoader(local_path).load()
        chunks = text_splitter.split_documents(documents)
        print(f"Split PDF into {len(chunks)} chunks")

        # Prepare and insert data
        print("Generating embeddings and preparing data...")
        data = [
            {
                "pdf_text": chunk.page_content,
                "my_vector": openai_embeddings.embed_documents([chunk.page_content])[0],
            }
            for chunk in chunks
        ]

        print(f"Inserting {len(data)} records into collection {os.getenv('COLLECTION_NAME')}")
        client.insert(os.getenv("COLLECTION_NAME"), data)

        # Cleanup
        os.remove(local_path)
        print("Processing completed successfully")

        return {"statusCode": 200, "body": json.dumps(f"Successfully processed {key}")}

    except Exception as e:
        print(f"Error processing document: {str(e)}")
        import traceback

        print(f"Traceback: {traceback.format_exc()}")
        return {"statusCode": 500, "body": json.dumps(str(e))}

Main Features of the Lambda Function:

S3 Event Processing: The AWS Lambda function is triggered by an S3 event when a new PDF is uploaded to the designated bucket.
Client Initialization: The function initializes the Milvus client for storing embeddings, the OpenAI embeddings client, and the text splitter for chunking the PDF text.
Text Processing: The PDF text is extracted using PyPDFLoader, then split into smaller chunks to ensure proper embedding generation.
Generating and Storing Embeddings: The OpenAI embeddings are generated for each chunk of text, and the resulting data is inserted into the specified Milvus collection in Zilliz Cloud.
Error Handling: The function includes error handling to catch and log any exceptions that occur during the processing of the PDF.
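Two details of S3 events are worth keeping in mind when adapting the handler: object keys arrive URL-encoded (a space in a file name shows up as "+"), and a single event can carry more than one record, while the handler above reads only the first. The sketch below shows one way to cover both cases. The function name extract_s3_objects is a hypothetical helper, not part of the repository.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, decoded_key) pairs for every record in an S3 event.

    Hypothetical helper: decodes the URL-encoded object key so it can be
    passed directly to boto3's download_file.
    """
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects
```

Inside lambda_handler, looping over extract_s3_objects(event) would let the same download and embedding steps run for each uploaded file.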

AWS Lambda Containerization with Docker

Once the AWS Lambda function is ready, it needs to be containerized using Docker. The Dockerfile below installs dependencies with pip from a requirements.txt file rather than from pyproject.toml, so create a requirements.txt file in the root directory of your project with the following dependencies:

langchain-community
langchain_milvus
boto3
langchain-openai
pypdf

The Dockerfile below sets up the environment for the AWS Lambda function, including the necessary dependencies and the function code. You can save this file in the root directory of your project.

FROM public.ecr.aws/lambda/python:3.12.2025.04.01.18

# Set the working directory to /var/task
WORKDIR ${LAMBDA_TASK_ROOT}

# Copy requirements first to leverage Docker cache
COPY requirements.txt ./

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy source code
COPY aws_lambda/lambda_function.py ./lambda_function.py

# Command to run the Lambda handler function
CMD [ "lambda_function.lambda_handler" ]

As with the IAM role, the creation of the ECR repository and the Docker image can be automated using a shell script. Make sure the corresponding environment variables are set in the .env file. Save the script below as create_image.sh in the scripts folder.

#!/bin/bash

# Exit immediately if a command exits with a non-zero status
set -e

# Load environment variables from .env file
set -o allexport
source .env
set +o allexport

echo "Environment variables loaded."

# Check if the ECR repository exists, create it if it does not
if ! aws ecr describe-repositories --repository-names ${LAMBDA_ECR_REPOSITORY_NAME} --region ${AWS_REGION} 2>/dev/null; then
    echo "Repository ${LAMBDA_ECR_REPOSITORY_NAME} does not exist. Creating..."
    aws ecr create-repository --repository-name ${LAMBDA_ECR_REPOSITORY_NAME} --region ${AWS_REGION}
    echo "Repository ${LAMBDA_ECR_REPOSITORY_NAME} created."
else
    echo "Repository ${LAMBDA_ECR_REPOSITORY_NAME} already exists."
fi

# Build Docker image
# To make your image compatible with Lambda, you must use the --provenance=false option.
echo "Building Docker image ${LAMBDA_IMAGE_NAME}..."
docker buildx build --platform linux/amd64 --provenance=false -t ${LAMBDA_IMAGE_NAME}:latest .

# Authenticate Docker to your Amazon ECR registry
echo "Authenticating Docker to ECR..."
aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com

# Tag the Docker image
echo "Tagging Docker image..."
docker tag ${LAMBDA_IMAGE_NAME}:latest ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${LAMBDA_ECR_REPOSITORY_NAME}:latest

# Push the Docker image to Amazon ECR
echo "Pushing Docker image to ECR..."
docker push ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${LAMBDA_ECR_REPOSITORY_NAME}:latest

echo "Docker image pushed to ECR."
echo "Image created successfully."

You can run the script with the following command:

uv run scripts/create_image.sh

Pushing the AWS Lambda Function

Once the Docker image is built and pushed to ECR, you can create the AWS Lambda function. As this function is triggered by an S3 event, you need to create an S3 bucket first to store your PDFs. This can be done through the AWS Management Console or the AWS CLI with the following command:

aws s3api create-bucket \
    --bucket embeddings-$(uuidgen | tr -d - | tr '[:upper:]' '[:lower:]' ) \
    --region eu-central-1 \
    --create-bucket-configuration LocationConstraint=eu-central-1

This will create a new S3 bucket with a unique name as required by AWS. Make sure to update the PDF_BUCKET_NAME environment variable in the .env file with the name of the bucket you just created.
If your default region is us-east-1, do not include the --create-bucket-configuration flag. Instead, run:

aws s3api create-bucket \
  --bucket embeddings-$(uuidgen | tr -d - | tr '[:upper:]' '[:lower:]') \
  --region us-east-1

Now that the S3 bucket is created, you can create the AWS Lambda function using the following script:

#!/bin/bash

# Exit immediately if a command exits with a non-zero status
set -e

# Load environment variables from .env file
set -o allexport
source .env
set +o allexport

echo "Environment variables loaded."

# Check if the Lambda function exists
if ! aws lambda get-function --function-name ${LAMBDA_FUNCTION_NAME} --region ${AWS_REGION} 2>/dev/null; then
    echo "Lambda function ${LAMBDA_FUNCTION_NAME} does not exist. Creating..."
    aws lambda create-function \
        --function-name ${LAMBDA_FUNCTION_NAME} \
        --package-type Image \
        --code ImageUri=${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${LAMBDA_ECR_REPOSITORY_NAME}:latest \
        --role arn:aws:iam::${AWS_ACCOUNT_ID}:role/${ROLE_NAME} \
        --region ${AWS_REGION} \
        --timeout 900 \
        --memory-size 3072 \
        --environment "Variables={
            PDF_BUCKET_NAME=${PDF_BUCKET_NAME},
            OPENAI_API_KEY=${OPENAI_API_KEY},
            ZILLIZ_CLOUD_URI=${ZILLIZ_CLOUD_URI},
            ZILLIZ_TOKEN=${ZILLIZ_TOKEN},
            COLLECTION_NAME=${COLLECTION_NAME}
        }"

else
    echo "Lambda function ${LAMBDA_FUNCTION_NAME} already exists. Updating..."
    aws lambda update-function-code \
        --function-name ${LAMBDA_FUNCTION_NAME} \
        --image-uri ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${LAMBDA_ECR_REPOSITORY_NAME}:latest

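    # Wait for the code update to finish before changing the configuration
    echo "Waiting for the Lambda function code update to finish..."
    aws lambda wait function-updated --function-name ${LAMBDA_FUNCTION_NAME} --region ${AWS_REGION}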

    aws lambda update-function-configuration \
        --function-name ${LAMBDA_FUNCTION_NAME} \
        --timeout 900 \
        --memory-size 3072 \
        --environment "Variables={
            PDF_BUCKET_NAME=${PDF_BUCKET_NAME},
            OPENAI_API_KEY=${OPENAI_API_KEY},
            ZILLIZ_CLOUD_URI=${ZILLIZ_CLOUD_URI},
            ZILLIZ_TOKEN=${ZILLIZ_TOKEN},
            COLLECTION_NAME=${COLLECTION_NAME}
        }"


fi

# Check and add S3 trigger to Lambda if it doesn't exist
if ! aws lambda get-policy --function-name ${LAMBDA_FUNCTION_NAME} 2>/dev/null | grep -q "S3InvokeFunction"; then
    echo "Adding S3 trigger permission to Lambda..."
    aws lambda add-permission \
        --function-name ${LAMBDA_FUNCTION_NAME} \
        --statement-id S3InvokeFunction \
        --action lambda:InvokeFunction \
        --principal s3.amazonaws.com \
        --source-arn arn:aws:s3:::${PDF_BUCKET_NAME} \
        --region ${AWS_REGION}
    echo "Waiting for permission to propagate..."
    sleep 20
else
    echo "S3 trigger permission already exists for Lambda. Skipping..."
fi

# Check and configure S3 bucket notification if it doesn't exist
CURRENT_NOTIFICATIONS=$(aws s3api get-bucket-notification-configuration --bucket ${PDF_BUCKET_NAME} 2>/dev/null)
if ! echo "${CURRENT_NOTIFICATIONS}" | grep -q "${LAMBDA_FUNCTION_NAME}"; then
    echo "Configuring S3 bucket notification..."
    aws s3api put-bucket-notification-configuration \
        --bucket ${PDF_BUCKET_NAME} \
        --notification-configuration '{
            "LambdaFunctionConfigurations": [{
                "LambdaFunctionArn": "arn:aws:lambda:'${AWS_REGION}':'${AWS_ACCOUNT_ID}':function:'${LAMBDA_FUNCTION_NAME}'",
                "Events": ["s3:ObjectCreated:*"]
            }]
        }'
else
    echo "S3 bucket notification already configured. Skipping..."
fi

The script first checks whether the AWS Lambda function already exists.

If it does not exist, the script creates it with the necessary configuration, including the previously created IAM role, environment variables, and the Docker image.
If it does exist, the script updates the function code and configuration.

In either case, the script then adds the permission that allows the S3 bucket to invoke the function, unless that permission is already present.

The environment variables are loaded from the .env file and stored in the AWS Lambda function's environment variables. This allows the AWS Lambda function to access the necessary resources and configurations.

Finally, the script configures the S3 bucket notification, if it is not already in place, so that creating a new object (for example, uploading a PDF) triggers the AWS Lambda function.

You can run the script with the following command:

uv run scripts/create_lambda.sh
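The notification in the script fires for every object created in the bucket, not only PDFs. If the bucket may receive other file types, S3 notification filters can restrict the trigger by key suffix. The sketch below builds such a configuration as a Python dictionary. The function name build_notification_config and the ARN in the example are placeholders for illustration, not values from the repository.

```python
import json


def build_notification_config(lambda_arn: str, suffix: str = ".pdf") -> dict:
    """Build an S3 notification configuration that only fires for keys ending in `suffix`.

    Hypothetical helper mirroring the JSON passed to
    put-bucket-notification-configuration, with a suffix filter added.
    """
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": suffix}]
                    }
                },
            }
        ]
    }


# Placeholder ARN; substitute your region, account ID, and function name.
config_json = json.dumps(
    build_notification_config(
        "arn:aws:lambda:your-aws-region:your-account-id:function:your-lambda-name"
    ),
    indent=2,
)
```

The resulting JSON could be passed to the --notification-configuration option in place of the inline document used in the script.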

Testing and Quality Assurance

Testing and ensuring good code quality are essential steps in any software development pipeline, particularly when deploying to cloud services such as AWS Lambda. It is important to ensure that your code works as expected and is clean, efficient, and type-safe.

Unit Testing with Pytest

Unit tests ensure that each part of the code behaves as expected. Under the tests directory, you can create the following test files:

test_collection_exists.py: Verifies that the collection exists in Zilliz Cloud before attempting to insert embeddings.

  import os

  import pytest
  from pymilvus import MilvusClient


  @pytest.fixture
  def milvus_client():
      # Initialize Milvus client with environment variables for URI and token
      client = MilvusClient(uri=os.getenv("ZILLIZ_CLOUD_URI"), token=os.getenv("ZILLIZ_TOKEN"))
      yield client
      client.close()  # Close the connection after the test


  def test_check_collection_existence(milvus_client):
      collection_name = os.getenv("COLLECTION_NAME")

      # Step 1: Get list of all collections in the Milvus instance
      collections = milvus_client.list_collections()

      # Step 2: Assert that the collection name exists in the list of collections
      assert collection_name in collections, f"Collection '{collection_name}' does not exist in Milvus."

test_collection_mock.py: Uses mocks to test the collection existence and dropping functionality.

  import os
  from unittest.mock import MagicMock, patch

  import pytest


  @pytest.fixture
  def mock_milvus_client():
      with patch("pymilvus.MilvusClient") as mock_client:
          client_instance = MagicMock()
          mock_client.return_value = client_instance
          yield client_instance


  @pytest.fixture
  def mock_env_vars():
      env_vars = {
          "ZILLIZ_CLOUD_URI": "fake-uri",
          "COLLECTION_NAME": "test_collection",
          "ZILLIZ_TOKEN": "fake-token",
      }
      with patch.dict(os.environ, env_vars):
          yield env_vars


  def test_drop_collection(mock_milvus_client, mock_env_vars):
      from src.drop_collection import drop_collection

      # Call drop collection
      drop_collection()

      # Verify the drop_collection method was called with correct parameters
      mock_milvus_client.drop_collection.assert_called_once_with(
          collection_name=mock_env_vars["COLLECTION_NAME"]
      )


  @pytest.mark.parametrize("collection_exists", [True, False])
  def test_collection_existence(mock_milvus_client, mock_env_vars, collection_exists):
      mock_milvus_client.list_collections.return_value = (
          [mock_env_vars["COLLECTION_NAME"]] if collection_exists else []
      )

      # Check if collection exists
      result = mock_milvus_client.list_collections()
      print(f" result: {result}")

      if collection_exists:
          assert mock_env_vars["COLLECTION_NAME"] in result
      else:
          assert mock_env_vars["COLLECTION_NAME"] not in result
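To show the mocking pattern in isolation, here is a minimal sketch that needs no Zilliz Cloud connection. The `drop_collection` function below is a hypothetical stand-in that receives the client as an argument; the project's own `src.drop_collection` may be structured differently. The point is that the assertions check how the code under test called the client, not what the mock returns.

```python
from unittest.mock import MagicMock


def drop_collection(client, collection_name: str) -> None:
    """Hypothetical stand-in for the project's drop logic: drop only if present."""
    if collection_name in client.list_collections():
        client.drop_collection(collection_name=collection_name)


# The mock records calls instead of contacting Zilliz Cloud
client = MagicMock()
client.list_collections.return_value = ["test_collection"]

drop_collection(client, "test_collection")
client.drop_collection.assert_called_once_with(collection_name="test_collection")

# A collection that is absent should not trigger a drop call
client.reset_mock()
client.list_collections.return_value = []
drop_collection(client, "test_collection")
client.drop_collection.assert_not_called()
```

Passing the client in as a parameter is a design choice made for this sketch; it removes the need for `patch` because the test controls the dependency directly.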

test_lambda_function.py: Tests the AWS Lambda function locally with an actual PDF file stored in the S3 bucket.

This test expects the file to exist already. You can upload a PDF file to your S3 bucket with the following command:

aws s3 cp your-file.pdf s3://your-bucket-name/

  import os
  from aws_lambda.lambda_function import lambda_handler

  # Set up test event
  TEST_BUCKET = os.getenv("PDF_BUCKET_NAME")
  TEST_FILE = "1706.03762v7.pdf"

  test_event = {
      "Records": [
          {
              "s3": {
                  "bucket": {"name": TEST_BUCKET},
                  "object": {"key": TEST_FILE},
              }
          }
      ]
  }

  def test_lambda_handler():
      """Test the lambda_handler function with an actual S3 file."""
      response = lambda_handler(test_event, None)

      assert response["statusCode"] == 200, f"Unexpected response: {response}"
      assert "Successfully processed" in response["body"]

To run the tests, you can use the following command:

uv run pytest

If everything is set up correctly, you should see the tests passing.

You will see an output similar to the following:

================================================================ test session starts 

tests/test_collection_exists.py::test_check_collection_existence PASSED
tests/test_collection_mock.py::test_drop_collection PASSED
tests/test_collection_mock.py::test_collection_existence[True]  result: ['test_collection']
PASSED
tests/test_collection_mock.py::test_collection_existence[False]  result: []
PASSED
tests/test_lambda_function.py::test_lambda_handler Received event: {"Records": [{"s3": {"bucket": {"name": "embeddings-8213c13740654398b076090eac96473e"}, "object": {"key": "1706.03762v7.pdf"}}}]}
Initializing Milvus client...
Initializing OpenAI embeddings...
Initializing text splitter...
Processing file 1706.03762v7.pdf from bucket embeddings-8213c13740654398b076090eac96473e
Downloading file to /tmp/1706.03762v7.pdf
Loading and splitting PDF...
Split PDF into 15 chunks
Generating embeddings and preparing data...
Inserting 15 records into collection pdf_embeddings
Processing completed successfully
PASSED

================================================================ 5 passed in 16.73s =================================================================

Quality Assurance with Ruff and MyPy

Ruff and MyPy are static analysis tools that help ensure your code is clean, efficient, and type-safe.

Ruff is a linter that checks for code style and syntax errors. MyPy is a static type checker that ensures your code is type-safe.

With the following commands, you can run Ruff and MyPy to check your code:

uv run ruff check . --fix --exit-non-zero-on-fix
uv run mypy

If everything is set up correctly, you should see no errors or warnings.
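As an illustration of the kind of issue a type checker catches, consider `os.getenv`, which the tests above use. It returns either a string or `None`, so passing its result directly to a parameter annotated as `str` is typically reported as an incompatible argument. The sketch below shows one way to narrow the type. The helper `require_env` is hypothetical and is not part of the project.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail loudly.

    Hypothetical helper: os.getenv returns `str | None`, so a type checker
    rejects passing its result where a plain `str` is expected.
    """
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value  # narrowed to `str` after the None check


os.environ["COLLECTION_NAME"] = "pdf_embeddings"  # set here only for the demo
collection_name: str = require_env("COLLECTION_NAME")
print(collection_name)
```

Failing early with a clear message is also easier to debug than a `None` that surfaces later inside a client library.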

Implementing CI/CD with CircleCI

Continuous Integration and Continuous Deployment (CI/CD) are essential practices for automating the testing, building, and deployment of your applications. CircleCI provides a platform to automate your development workflows, including code testing, Docker image building, and deployment to AWS Lambda.

To configure your pipeline, you need a config.yml file in a .circleci directory at the root of your project. This configuration file defines the jobs, workflows, and execution steps for building, testing, and deploying your Lambda function.

version: 2.1

orbs:
  aws-cli: circleci/aws-cli@5.2.0
  docker: circleci/docker@2.8.2

jobs:
  build-deploy:
    docker:
      - image: cimg/python:3.12
    steps:
      - checkout

      - run:
          name: Install UV
          command: |
            curl -LsSf https://astral.sh/uv/install.sh | sh

      - run:
          name: Create venv and install dependencies
          command: |
            uv sync --all-extras

      - run:
          name: Run ruff
          command: |
            uv run ruff check . --fix --exit-non-zero-on-fix

      - run:
          name: Run MyPy
          command: |
            uv run mypy

      - run:
          name: Run tests
          command: |
            uv run pytest

      - run:
          name: Create .env file
          command: |
            echo "ZILLIZ_CLOUD_URI=${ZILLIZ_CLOUD_URI}" > .env
            echo "ZILLIZ_TOKEN=${ZILLIZ_TOKEN}" >> .env
            echo "COLLECTION_NAME=${COLLECTION_NAME}" >> .env
            echo "PDF_BUCKET_NAME=${PDF_BUCKET_NAME}" >> .env
            echo "OPENAI_API_KEY=${OPENAI_API_KEY}" >> .env
            echo "AWS_REGION=${AWS_REGION}" >> .env
            echo "AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}" >> .env
            echo "AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}" >> .env
            echo "AWS_ACCOUNT_ID=${AWS_ACCOUNT_ID}" >> .env
            echo "REPOSITORY_NAME=${REPOSITORY_NAME}" >> .env
            echo "IMAGE_NAME=${IMAGE_NAME}" >> .env
            echo "LAMBDA_FUNCTION_NAME=${LAMBDA_FUNCTION_NAME}" >> .env
            echo "ROLE_NAME=${ROLE_NAME}" >> .env
            echo "ROLE_POLICY_NAME=${ROLE_POLICY_NAME}" >> .env

      - aws-cli/setup:
          profile_name: default

      - setup_remote_docker

      - run:
          name: Deploy to AWS
          command: |
            chmod +x scripts/build_deploy.sh
            ./scripts/build_deploy.sh

workflows:
  version: 2
  deploy:
    jobs:
      - build-deploy

The file can be broken down into the following components:

Orbs:
- aws-cli: The AWS CLI orb simplifies the setup of AWS CLI to interact with AWS services.
- docker: The CircleCI Docker orb handles setting up the Docker environment.

Jobs:

build-deploy: This job is responsible for building and deploying the AWS Lambda function. It includes steps for checking out the code, installing dependencies, running tests, and deploying the function to AWS Lambda. As you need to execute multiple scripts in sequence, you can use a single bash script build_deploy.sh to do so and save it in the scripts directory.

#!/bin/bash

# Exit immediately if a command fails
set -e

# Define script paths
SCRIPT1="scripts/create_roles.sh"
SCRIPT2="scripts/create_image.sh"
SCRIPT3="scripts/create_lambda.sh"

# Ensure scripts are executable
chmod +x $SCRIPT1 $SCRIPT2 $SCRIPT3

# Run the scripts sequentially
echo "Running Script 1..."
$SCRIPT1

echo "Running Script 2..."
$SCRIPT2

echo "Running Script 3..."
$SCRIPT3

echo "All scripts executed successfully!"

Workflows:
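- The deploy workflow runs the build-deploy job on every push. As written, the configuration has no branch filter, so a push to any branch triggers it; add a filters entry to the job if you want to restrict deployments to the main branch.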

Once you have committed the configuration file, push it to your GitHub repository, and visit the CircleCI dashboard to set up your project.

Select your repository and click Set Up Project:

Next, choose the appropriate branch to trigger the first pipeline. You can select the branch you want to use for your CI/CD pipeline. In this case, you can choose the main branch and click Set Up Project.

If this is your first time triggering a build on CircleCI for this project, note that the initial pipeline will fail.

This is expected behavior as environment variables are required for the pipeline to run successfully. CircleCI does not allow you to configure them until the project has been initialized by that first triggered build.

After the initial failure, open the Project Settings, go to the Environment Variables section, and add all the required environment variables.
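Because a single missing variable is enough to break the deployment step, a small preflight check can save a pipeline run. The following is a minimal sketch, assuming the variable names from the "Create .env file" step in the configuration above. The function `missing_vars` is a hypothetical helper, and the demo uses a made-up environment dictionary rather than real credentials.

```python
import os

# Names taken from the "Create .env file" step in the CircleCI config
REQUIRED_VARS = [
    "ZILLIZ_CLOUD_URI", "ZILLIZ_TOKEN", "COLLECTION_NAME", "PDF_BUCKET_NAME",
    "OPENAI_API_KEY", "AWS_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
    "AWS_ACCOUNT_ID", "REPOSITORY_NAME", "IMAGE_NAME", "LAMBDA_FUNCTION_NAME",
    "ROLE_NAME", "ROLE_POLICY_NAME",
]


def missing_vars(names, environ=os.environ):
    """Return the names that are unset or empty in the given environment."""
    return [name for name in names if not environ.get(name)]


# Demo with a fake environment in which one variable was forgotten
fake_env = {name: "placeholder" for name in REQUIRED_VARS if name != "OPENAI_API_KEY"}
print(missing_vars(REQUIRED_VARS, fake_env))  # ['OPENAI_API_KEY']
```

In a real pipeline you would call `missing_vars(REQUIRED_VARS)` against the actual environment and exit with a non-zero status if the list is not empty.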

Once the variables are saved, re-run the pipeline. It should now complete successfully and deploy your AWS Lambda function. From this point onward, CircleCI will retain your environment variables, and you won't need to configure them again unless you introduce new ones.

To confirm that the deployment works as expected, upload a PDF to the configured S3 bucket. The Lambda function should be automatically triggered by the S3 event.

To upload a PDF to the S3 bucket, you can use the following command:

aws s3 cp your-file.pdf s3://your-bucket-name/

To monitor the logs of the AWS Lambda function, you can use the following command:

aws logs tail /aws/lambda/your-lambda-function --follow

If everything is set up correctly, you should see the logs of the AWS Lambda function, and you can verify that the PDF was processed by checking your Zilliz Cloud collection.

Cleaning up

If you no longer need the resources, make sure you delete them to avoid unnecessary charges.

Conclusion

In this blog, you have walked through building and automating a serverless PDF processing pipeline using AWS Lambda, Docker, and CircleCI. The automated process involves triggering AWS Lambda functions via S3 events, generating embeddings using OpenAI, and storing them in Milvus on Zilliz Cloud. The CI/CD pipeline powered by CircleCI ensures that the code is automatically tested, built into a Docker image, and deployed to AWS Lambda, streamlining the development and deployment process.

Using Docker with AWS Lambda provides a consistent environment for your AWS Lambda function, ensuring that dependencies and configurations are maintained across different stages. The CircleCI pipeline automates testing, building, and deployment, reducing manual intervention and enabling fast and reliable updates to the AWS Lambda function. These tools work together to ensure efficiency, scalability, and security.

Looking ahead, potential improvements could include enhancing the error handling and logging in the AWS Lambda function, adding more comprehensive testing coverage, saving the environment variables in AWS Secrets Manager, and introducing monitoring and alerting to track the performance of the AWS Lambda function. As the pipeline evolves, it can scale to handle more complex workflows and integrations, ensuring continued success and reliability in production.

End-to-end testing and deployment of a multi-agent AI system with Docker, LangGraph, and CircleCI

Benito Martin — Sat, 31 May 2025 09:38:13 +0000

Multi-agent AI systems are transforming how intelligent applications are built. By orchestrating multiple specialized agents that collaborate to solve complex tasks, these systems enable more dynamic and efficient workflows. However, deploying such a system reliably and at scale requires a structured approach to testing, packaging, and automation.

In this tutorial, you will walk through the process of building, testing, and deploying a multi-agent AI system using LangGraph, Docker, AWS Lambda, and CircleCI. You will develop a research-driven AI workflow where different agents, such as fact-checking, summarization, and search agents, work together seamlessly. You will package this application into a Docker container, deploy it to AWS Lambda, and automate the entire pipeline using CircleCI.

By the end of this guide, you will:

Understand how LangGraph enables stateful multi-agent interactions.
Learn how to containerize the application for scalable cloud deployment.
Set up end-to-end testing for agent reliability and AWS Lambda functionality.
Use CircleCI to automate testing and deployment with every Git push.

This tutorial assumes some familiarity with Python, AWS, and Docker. You can check out the complete source code on GitHub, but this guide will walk you through the process step by step.

Prerequisites

Before you begin, ensure that you have the following requirements in place:

AWS Account: Sign up for an AWS account if you do not already have one. You will use AWS Lambda and Elastic Container Registry (ECR) for deployment.
AWS CLI Installed and Configured: Install the AWS Command Line Interface (CLI) and configure it with your AWS credentials. You can follow the AWS CLI setup guide.
AWS Bedrock: You will be using AWS Bedrock Anthropic models. Therefore, you will need to request access to these models to be able to use them in your application. Specifically you will be using Claude 3 Haiku. Under AWS Bedrock, go to model access and request access to the model. Once you see "Access granted" in green, you can invoke the model.

Basic Knowledge of LangChain or LangGraph: Understanding the fundamentals of LangChain and LangGraph will help you design the multi-agent workflow efficiently.
Familiarity with AWS Lambda and Docker: You should know the basics of AWS Lambda and Docker, as you will use them to package and deploy the application.
GitHub and CircleCI Accounts: Create accounts on GitHub and CircleCI to manage the version control and automate the CI/CD pipeline.
OpenAI API Key: To access OpenAI’s GPT models, you will need an API key. You can sign up for an API key on the OpenAI website.
Serper API Key: To perform Google Search queries programmatically, obtain a free API key from Serper.
uv: Install uv to manage dependencies and virtual environments. Set up instructions can be found in the "Installing Dependencies" section below.

Once you have these prerequisites in place, you will be ready to set up the multi-agent project.

Setting Up a Multiagent Project

Before you start building the multi-agent system, you need to set up the project environment, install dependencies, and understand the role of LangGraph in managing multi-agent workflows.

Overview of LangGraph

LangGraph is a framework built on LangChain that enables structured multi-agent interactions. It allows you to define agent workflows, manage stateful decision-making, and create directed graphs where agents collaborate based on predefined logic.

Key features of LangGraph include:

Stateful Agent Orchestration: Agents can remember past interactions and modify behavior accordingly.
Graph-Based Workflow Execution: Define agent interactions using nodes and edges, making execution flow transparent.
Decision-Making and Conditional Branching: Implement custom logic that determines how agents respond to inputs dynamically.

In this project, you will use LangGraph to design an AI research workflow where agents handle tasks such as searching for information, summarizing content, fact-checking, and generating reports.

Setting Up the Environment

First, clone the repository containing the project code:

git clone https://github.com/benitomartin/multiagent-langgraph-circleci.git
cd multiagent-langgraph-circleci

Then, run the following commands to install dependencies and set up the virtual environment:

uv sync --all-extras
source .venv/bin/activate

These commands will:

Install the dependencies defined in pyproject.toml.
Automatically create a virtual environment (.venv).
Activate the virtual environment.

Finally, create a .env file in the root directory of your repository and add the required environment variables:

SERPER_API_KEY=your_serper_key_here                
OPENAI_API_KEY=your_openai_key_here
REPOSITORY_NAME=langgraph-ecr-docker-repo
LAMBDA_FUNCTION_NAME=langgraph-lambda-function     
ROLE_NAME=lambda-bedrock-role                      
ROLE_POLICY_NAME=LambdaBedrockPolicy  
IMAGE_NAME=langgraph-lambda-image                  
AWS_REGION=your_aws_region                          
AWS_ACCESS_KEY_ID=your_aws_access_key              
AWS_SECRET_ACCESS_KEY=your_aws_secret_key          
AWS_ACCOUNT_ID=your_aws_account_id

These variables will be used for API authentication, AWS service configurations, and deployment settings. During deployment, the AWS ECR repository, the AWS Lambda function, the IAM Role and Policy will be created automatically using a bash script.

Setting Up the Project Structure

The project has the following directory structure. If you cloned the repository, you should already have this structure in place. If you are setting up the project manually, create these directories and files as needed.

multiagent-langgraph-circleci/
├── .env                          
├── .gitignore                     
├── README.md                      
├── pyproject.toml                
├── uv.lock                     
├── build_deploy.sh                
├── Dockerfile                    
├── requirements.txt               
├── lambda_function/
│   ├── lambda_handler.py          
│   └── requirements.txt           
└── src/
    ├── agents/                    
    │   ├── __init__.py
    │   ├── fact_checking_agent.py
    │   ├── report_generation_agent.py
    │   ├── search_agent.py
    │   ├── stop_workflow_agent.py
    │   └── summarization_agent.py
    ├── graph/
    │   └── research_graph.py      
    ├── models/
    │   ├── __init__.py           
    │   └── schemas.py             
    └── utils/                      
        ├── chain_builder.py
        ├── error_handler.py
        └── prompt_templates.py

This structure ensures modularity and maintainability, allowing for seamless multi-agent interactions within the LangGraph framework.

Defining and Creating Agents

In this section, you will define the agents responsible for handling different tasks in the multi-agent workflow, and understand the role of Pydantic models in structuring data for inter-agent communication. The agents will be designed to interact with each other, share data, and perform specific operations, such as search, summarization, fact-checking, and report generation.

The logic of this multi-agent workflow is shown in the following figure:

Schemas for Agent Communication

Before defining the agents themselves, you will need to set up Pydantic models to define the data structure for agent communication. These models will ensure that data exchanged between agents is validated and formatted correctly.

The schemas.py file contains the definitions of several models that will be used by the agents to manage the state and data during the process.

from typing import Any, Dict, List, TypedDict

from pydantic import BaseModel, Field


class ResearchState(TypedDict):
    query: str
    search_results: List[Dict[str, Any]]
    summarized_content: str
    fact_checked_results: Dict[str, Any]
    final_report: str
    errors: List[str]
    fact_check_attempts: int
    summarization_attempts: int
    max_results: int
    search_retries: int

class SearchResult(BaseModel):
    title: str = Field(description="The title of the search result")
    url: str = Field(description="The URL of the search result")
    snippet: str = Field(description="A brief excerpt or summary of the search result")

class Summary(BaseModel):
    main_points: str = Field(description="List of key points from the search results")
    benefits: str = Field(description="List of specific benefits of the search results")
    conclusion: str = Field(description="A concise conclusion about the search results")

class FactCheckResult(BaseModel):
    is_accurate: bool = Field(description="Whether the summary is factually accurate based on the search results")
    issues: List[str] = Field(description="List of inaccuracies or inconsistencies found in the summary")
    corrected_facts: List[str] = Field(description="List of corrections for any identified issues")
    confidence_score: float = Field(description="Confidence score from 0.0 to 1.0 indicating reliability of the fact check")

class FinalReport(BaseModel):
    report: str = Field(description="The final research report generated from the summary and fact-check results")

ResearchState is a TypedDict rather than a Pydantic model, and it tracks the overall state of the research workflow. It includes the search query, results, summary, fact-check results, errors, and the retry counters. The Pydantic models (SearchResult, Summary, FactCheckResult, and FinalReport) define the data structures for the results that each agent will produce or consume during the workflow.
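As a minimal sketch of how a workflow run might start, the snippet below builds a fresh state with empty results and zeroed counters. The `initial_state` helper is hypothetical and does not appear in the project. The default of 3 for `max_results` is an assumption that mirrors the default used by the agents later in this guide. The state class is repeated here only so the sketch runs on its own.

```python
from typing import Any, Dict, List, TypedDict


class ResearchState(TypedDict):
    # Same fields as the schema in schemas.py, repeated for a standalone sketch
    query: str
    search_results: List[Dict[str, Any]]
    summarized_content: str
    fact_checked_results: Dict[str, Any]
    final_report: str
    errors: List[str]
    fact_check_attempts: int
    summarization_attempts: int
    max_results: int
    search_retries: int


def initial_state(query: str, max_results: int = 3) -> ResearchState:
    """Hypothetical helper: a fresh state with empty results and zeroed counters."""
    return ResearchState(
        query=query,
        search_results=[],
        summarized_content="",
        fact_checked_results={},
        final_report="",
        errors=[],
        fact_check_attempts=0,
        summarization_attempts=0,
        max_results=max_results,
        search_retries=0,
    )


state = initial_state("benefits of vector databases")
print(state["query"], state["max_results"])
```

A TypedDict is a plain dictionary at runtime, which is why the agents can read it with `state.get(...)`.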

Prompt Templates for Agents

To guide the agents' actions, you will use prompt templates. These templates define the instructions that will be passed to the large language model (LLM) to guide its response.

In the prompt_templates.py file, you can define the prompts used by the summarization, fact-checking, and report generation agents:

class PromptTemplates:
    """Centralized class for all prompt templates used in the research workflow."""

    @staticmethod
    def summarization_prompt():
        return (
            "You are a summarization agent. Summarize the following search results:\n\n"
            "{results}\n\n"
            "Provide a structured summary of the key information about the benefits "
            "and main points.\n"
        )

    @staticmethod
    def fact_checking_prompt():
        return (
            "You are a fact-checking agent. Review the following summary and verify it "
            "against the original search results:\n\n"
            "Summary: {summary}\n\n"
            "Original results: {original_results}\n\n"
            "Identify any inaccuracies or inconsistencies. Provide a confidence score "
            "indicating how reliable your fact check is."
        )

    @staticmethod
    def report_generation_prompt():
        return (
            "You are a report generation agent. Create a comprehensive research report "
            "based on the following information:\n\n"
            "Original query: {query}\n\n"
            "Content summary: {summary}\n\n"
            "Format the report with markdown, including appropriate headings, bullet "
            "points, and sections. The report should be informative, well-structured, "
            "and directly address the original query.\n"
        )

Each prompt template corresponds to a specific agent task in the workflow. The summarization_prompt guides the summarization agent to produce a structured summary, the fact_checking_prompt asks the fact-checking agent to verify that summary against the original search results and report a confidence score, and the report_generation_prompt guides the report generation agent to write the final report.
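The placeholders in curly braces are what later become the chain's input variables. The sketch below uses plain `str.format` as a stand-in for the prompt template class, assuming the default brace-style placeholders, to show how the names can be listed and filled. The helper `placeholder_names` and the shortened template are illustrative only.

```python
from string import Formatter


def placeholder_names(template: str) -> list[str]:
    """List the {name} fields in a template, in order of first appearance."""
    names = []
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


# Shortened version of the fact-checking prompt, for illustration
fact_checking_template = (
    "Summary: {summary}\n\n"
    "Original results: {original_results}\n\n"
)

print(placeholder_names(fact_checking_template))  # ['summary', 'original_results']

# Filling the template with made-up values
filled = fact_checking_template.format(
    summary="LangGraph models agent workflows as graphs.",
    original_results="[]",
)
print(filled)
```

A check like this can confirm that the variable names passed to a chain match the placeholders in its template.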

Building Chains with the Agents

The chain_builder.py file is responsible for linking the prompt templates with the LLMs that execute the tasks. The ChainBuilder class will combine the respective input variables, a prompt template and an LLM to execute the required steps.

from typing import List

from langchain.prompts import PromptTemplate
from pydantic import BaseModel


class ChainBuilder:
    def __init__(self, llm):
        self.llm = llm

    def build(self, prompt_template: str, input_vars: List[str], model: type[BaseModel]):
        structured_llm = self.llm.with_structured_output(model)
        prompt = PromptTemplate(
            template=prompt_template,
            input_variables=input_vars
        )
        return prompt | structured_llm

The ChainBuilder structures each agent's interaction with the LLM and ensures that the output conforms to the given Pydantic model.
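The expression `prompt | structured_llm` composes two steps so that the output of the first becomes the input of the second. The toy sketch below imitates that idea with a small class of its own. It is not LangChain's implementation, and the `Step` class, the stand-in prompt, and the fake LLM are all hypothetical.

```python
class Step:
    """Toy runnable: wraps a function and supports `a | b` composition.

    Illustration only; this is not LangChain's implementation.
    """

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # The combined step feeds this step's output into the next one
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Stand-in for a prompt template: fills the placeholders from a dict
prompt = Step(lambda inputs: "Summarize: {results}".format(**inputs))
# Stand-in for a structured LLM: returns a dict instead of calling a model
fake_llm = Step(lambda text: {"main_points": text.upper()})

chain = prompt | fake_llm
print(chain.invoke({"results": "three search hits"}))
```

In the real chain, the second step returns an instance of the Pydantic model passed to `build`, which is why the agents can read attributes such as `main_points` from the result.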

Error Handling

To keep the workflow running when an agent fails, you can use an error-handling utility in the error_handler.py file that records error messages in the state:

from typing import Any, Dict

class ErrorHandler:
    @staticmethod
    def add_error(state: Dict[str, Any], message: str) -> Dict[str, Any]:
        errors = state.get("errors", [])
        errors.append(message)
        return {**state, "errors": errors}

This utility is especially useful for tracking issues that may arise during agent execution and can be used to update the state with error messages.
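A short usage sketch follows. The standalone `add_error` function here is a variant written for illustration: it builds a new list instead of appending to the existing one, so the state passed in is left unchanged. The error messages and the query are made-up sample values.

```python
from typing import Any, Dict


def add_error(state: Dict[str, Any], message: str) -> Dict[str, Any]:
    """Variant of ErrorHandler.add_error that builds a new errors list."""
    return {**state, "errors": [*state.get("errors", []), message]}


state = {"query": "vector databases", "errors": []}
updated = add_error(state, "Search agent error: timeout")
updated = add_error(updated, "Summarization agent error: No search results to summarize")

print(updated["errors"])
print(state["errors"])  # the original state is left untouched: []
```

Copying the list avoids surprises when the same state dictionary is referenced in more than one place.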

Creating the Search Agent

The SearchAgent is responsible for querying Google and retrieving search results. It uses the GoogleSerperAPIWrapper to interface with the Google Serper API. The following code defines the agent behavior:

from langchain_community.utilities import GoogleSerperAPIWrapper
from src.models.schemas import SearchResult  

class SearchAgent:
    def __init__(self, serper_api_key: str):
        self.search = GoogleSerperAPIWrapper(serper_api_key=serper_api_key, k=3)

    def execute(self, state: dict, k: int = 3) -> dict:
        query = state.get("query")
        max_results = state.get("max_results", k)

        if not query:
            return {**state, "errors": ["Search agent error: No query provided"]}

        try:
            self.search.k = max_results
            raw_results = self.search.results(query=query)

            # Convert raw search results to instances of the SearchResult model
            results = [
                SearchResult(
                    title=r.get("title", ""),
                    url=r.get("link", ""),
                    snippet=r.get("snippet", "")
                )
                for r in raw_results.get("organic", [])
            ]

            print(f"Search agent found {len(results)} results with max results equal to {max_results}")
            print(f"Search Results: {[result.model_dump() for result in results]}")  

            # Return the list of SearchResult objects
            return {**state, "search_results": [result.model_dump() for result in results]}  
        except Exception as e:
            return {**state, "errors": [f"Search agent error: {str(e)}"]}

This agent performs a search for the given query, validates each hit as a SearchResult instance, and stores the results in the state as plain dictionaries.
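The mapping step can be tried without an API key. The sketch below assumes a response with an `organic` list whose items carry `title`, `link`, and `snippet` keys, which is what the agent code above reads. A dataclass stands in for the Pydantic model so the example has no dependencies, and the sample response is made up.

```python
from dataclasses import asdict, dataclass


@dataclass
class SearchResult:
    # Dataclass stand-in for the Pydantic model so the sketch needs no dependencies
    title: str
    url: str
    snippet: str


def normalize_results(raw: dict) -> list[dict]:
    """Map the 'organic' entries of a Serper-style response to plain dicts."""
    return [
        asdict(SearchResult(
            title=item.get("title", ""),
            url=item.get("link", ""),
            snippet=item.get("snippet", ""),
        ))
        for item in raw.get("organic", [])
    ]


# Made-up response in the shape the agent reads; not real search output
raw_response = {"organic": [{"title": "Example", "link": "https://example.com"}]}
print(normalize_results(raw_response))
```

Note that the response field `link` is renamed to `url`, and missing fields fall back to empty strings rather than raising an error.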

Creating the Summarize Agent

The SummarizationAgent is responsible for summarizing the search results retrieved by the SearchAgent. It uses the language model to summarize key points, benefits, and conclusions based on the search results provided.

from src.models.schemas import ResearchState, Summary
from src.utils.chain_builder import ChainBuilder
from src.utils.prompt_templates import PromptTemplates


class SummarizationAgent:
    def __init__(self, llm):
        self.chain_builder = ChainBuilder(llm)

    def execute(self, state: ResearchState) -> ResearchState:
        search_results = state.get("search_results", [])

        print(f"Summarization agent executing with {len(search_results)} search results")

        # Increment the summarization attempt counter
        summarization_attempts = state.get("summarization_attempts", 0)
        state["summarization_attempts"] = summarization_attempts + 1

        if not search_results:
            errors = state.get("errors", [])
            errors.append("Summarization agent error: No search results to summarize")
            return {**state, "errors": errors}

        summary_chain = self.chain_builder.build(
            prompt_template=PromptTemplates.summarization_prompt(),
            input_vars=["results"],
            model=Summary
        )
        try:
            # Format the search results as a string for the prompt
            formatted_results = "\n\n".join([
                f"Title: {result['title']}\nURL: {result['url']}\nSnippet: {result['snippet']}"
                for result in search_results
            ])

            # Invoke the chain with the formatted results
            summary_obj = summary_chain.invoke({"results": formatted_results})

            # Format the summary object into a string
            summary_str = "# Summary\n\n"
            summary_str += f"\n\n## Key Points\n{summary_obj.main_points}\n"
            summary_str += f"\n\n## Benefits\n{summary_obj.benefits}\n"
            summary_str += f"\n\n## Conclusion\n{summary_obj.conclusion}\n"

            return {**state, "summarized_content": summary_str}

        except Exception as e:
            errors = state.get("errors", [])
            errors.append(f"Summarization agent error: {str(e)}")
            return {**state, "errors": errors}

The SummarizationAgent uses the ChainBuilder to create a chain from the summarization_prompt template and the LLM. It then formats the search results of the SearchAgent into a string and passes them to the language model for summarization. The result is structured into key points, benefits, and conclusions.

Note that the summarization attempts counter is incremented each time the agent is executed. The agent runs again when the next agent, the FactCheckingAgent, reports a confidence score below the configured threshold.

Creating the Fact-Checking Agent

The FactCheckingAgent is designed to validate the accuracy of a generated summary by comparing it with the original search results. It uses the fact_checking_prompt template to cross-reference the summary with the search results. If the fact-checking process determines that the confidence score is low, it triggers additional search retries with an increased number of results.

import json

from src.models.schemas import FactCheckResult, ResearchState
from src.utils.chain_builder import ChainBuilder
from src.utils.error_handler import ErrorHandler
from src.utils.prompt_templates import PromptTemplates


class FactCheckingAgent:
    def __init__(self, llm, confidence_threshold: float, max_retries: int, add_max_results: int):
        self.chain_builder = ChainBuilder(llm)
        self.confidence_threshold = confidence_threshold  # Store the passed confidence threshold
        self.max_retries = max_retries                    # Store the max_retries value
        self.add_max_results = add_max_results            # Store the add_max_results value


    def execute(self, state: ResearchState) -> ResearchState:
        summary = state.get("summarized_content")
        search_results = state.get("search_results", [])

        # Increment fact-checking attempt counter
        fact_check_attempts = state.get("fact_check_attempts", 0)
        state["fact_check_attempts"] = fact_check_attempts + 1

        if not summary or not search_results:
            return {**state, "errors": ["Fact-checking agent error: Missing data"]}

        fact_check_chain = self.chain_builder.build(
            prompt_template=PromptTemplates.fact_checking_prompt(),
            input_vars=["summary", "original_results"],
            model=FactCheckResult
        )
        try:
            results_text = json.dumps(search_results, indent=2)
            fact_check_results = fact_check_chain.invoke({
                "summary": summary,
                "original_results": results_text,
            })

            print(f"Fact-checking agent completed review. Accurate: {fact_check_results.is_accurate}")
            print(f"Confidence score: {fact_check_results.confidence_score}")

            fact_check_results = fact_check_results.model_dump()

            confidence_score = fact_check_results.get("confidence_score", 1.0)
            retry_count = state.get("search_retries", 0)
            max_results = state.get("max_results", 3)
            print("Retry Count:", retry_count)
            print("Max Results:", max_results)

            if confidence_score < self.confidence_threshold:
                if retry_count < self.max_retries:
                    state["search_retries"] = retry_count + 1

                    # Only increase max_results if we're NOT about to hit the retry cap
                    if state["search_retries"] < self.max_retries:
                        print(f"Retrying search number {state['search_retries']}")

                        state["max_results"] = max_results + self.add_max_results
                        print(f"Increasing max_results to: {state['max_results']}")

            state["fact_checked_results"] = fact_check_results
            return state

        except Exception as e:
            return ErrorHandler.add_error(state, f"Fact-checking agent error: {str(e)}")

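The FactCheckingAgent uses the ChainBuilder to construct a chain from the fact_checking_prompt template and the FactCheckResult model. It checks the accuracy of the summary from the SummarizationAgent by comparing it with the search results from the SearchAgent. When the confidence score falls below the threshold and retries remain, it increments search_retries and, while the counter is still below the cap, raises max_results by add_max_results so the next search returns more sources.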
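The retry bookkeeping inside execute can be easier to follow in isolation. The following is a minimal sketch, not the agent's actual code: update_retry_state is a hypothetical helper that mirrors the nested if branch above on a plain dictionary, with the threshold and limits passed in as arguments.

```python
def update_retry_state(state, confidence_score, confidence_threshold,
                       max_retries, add_max_results):
    """Return a copy of state with the retry counters updated.

    Hypothetical helper mirroring the branch in FactCheckingAgent.execute.
    """
    new_state = dict(state)
    retry_count = new_state.get("search_retries", 0)
    max_results = new_state.get("max_results", 3)

    if confidence_score < confidence_threshold and retry_count < max_retries:
        new_state["search_retries"] = retry_count + 1
        # Widen the search only if the counter is still below the cap
        if new_state["search_retries"] < max_retries:
            new_state["max_results"] = max_results + add_max_results
    return new_state


# First low-confidence result with max_retries=3 and add_max_results=2
first = update_retry_state({}, 0.6, 0.85, 3, 2)
print(first)  # {'search_retries': 1, 'max_results': 5}

# Counter reaches the cap: it is incremented, but max_results is left alone
last = update_retry_state({"search_retries": 2, "max_results": 7}, 0.6, 0.85, 3, 2)
print(last)  # {'search_retries': 3, 'max_results': 7}
```

Returning a copy instead of mutating the input is a choice made for this sketch only; the agent above updates the state dictionary in place.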

Creating the Generate Report Agent

The ReportGenerationAgent is responsible for generating the final research report based on the query and summary, only after it passes the fact checking process.

from src.models.schemas import FinalReport, ResearchState
from src.utils.chain_builder import ChainBuilder
from src.utils.error_handler import ErrorHandler
from src.utils.prompt_templates import PromptTemplates


class ReportGenerationAgent:
    def __init__(self, llm):
        self.chain_builder = ChainBuilder(llm)

    def execute(self, state: ResearchState) -> ResearchState:
        summary = state.get("summarized_content")
        query = state.get("query")

        if not summary or not query:
            return {**state, "errors": ["Report generation agent error: Missing required content"]}

        chain = self.chain_builder.build(
            prompt_template=PromptTemplates.report_generation_prompt(),
            input_vars=["query", "summary"],
            model=FinalReport
        )

        try:
            final_report = chain.invoke({"query": query, "summary": summary})
            final_report = final_report.model_dump()

            # Print the final report
            print("\n======= FINAL REPORT =======\n")
            print(final_report["report"])
            print("\n============================\n")

            return {**state, "final_report": final_report["report"]}
        except Exception as e:
            return ErrorHandler.add_error(state, f"Report generation agent error: {str(e)}")

The ReportGenerationAgent generates the final research report using the ChainBuilder and report_generation_prompt, combining the search query and the summary.

Creating the Stop Workflow Agent

The StopWorkflowAgent ends the workflow if there are any errors and records the final confidence score if one has been calculated.

from src.models.schemas import ResearchState


class StopWorkflowAgent:
    def execute(self, state: ResearchState) -> ResearchState:
        """Stops the workflow and displays the final confidence score"""
        confidence_score = state.get("fact_checked_results", {}).get("confidence_score", "N/A")

        # Carry over any errors collected so far (empty list if none)
        errors = state.get("errors", [])

        return {**state, "errors": errors, "confidence_score": confidence_score}

Building the Agentic Graph

In the previous sections, you have defined individual agents and their respective roles. Now, you need to organize them into an Agentic Graph that manages the flow of the entire multi-agent system. This graph allows the agents to interact in a sequence based on their results, making the workflow adaptive and capable of handling real-world scenarios such as retries and failure conditions.

Here is a step-by-step explanation of how the Agentic Graph is built in research_graph.py and how it integrates with the agents.

Configuration Settings

The configuration settings in settings.py load the essential environment variables: the API keys for Serper and OpenAI and the AWS credentials. The file also sets the LLM models for the Summarization and Fact-Checking agents, as well as the confidence threshold, the number of additional search results per retry, and the maximum number of retries for the Fact-Checking agent. You can select the models that best suit your needs, but this example uses anthropic.claude-3-haiku-20240307-v1:0 from AWS Bedrock for the summarization agent and gpt-4o-mini for the fact-checking agent. The latter acts as an LLM-as-a-judge to verify that the summary is accurate.

import os

from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# API Keys
SERPER_API_KEY = os.getenv("SERPER_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
AWS_REGION = os.getenv("AWS_REGION")

# Model Settings
FACT_CHECK_MODEL = "gpt-4o-mini"
SUMMARIZATION_MODEL = "anthropic.claude-3-haiku-20240307-v1:0"

# Workflow Settings
CONFIDENCE_THRESHOLD = 0.95
MAX_RETRIES = 1
ADD_MAX_RESULTS = 2

# Validate required environment variables
if not SERPER_API_KEY:
    raise ValueError("SERPER_API_KEY environment variable is not set")
if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY environment variable is not set")
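The settings file validates only the two API keys. If you would rather report every missing variable in a single error, a helper along these lines could replace the two if statements. This is a sketch under assumptions: missing_env_vars is a hypothetical name, and whether the AWS variables should be required depends on how your credentials are supplied (on Lambda they may come from the execution role instead).

```python
import os


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Variable names taken from settings.py; the required set is an assumption
REQUIRED = [
    "SERPER_API_KEY",
    "OPENAI_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
]

# Example with an explicit mapping instead of the real environment
example_env = {"SERPER_API_KEY": "abc", "OPENAI_API_KEY": ""}
print(missing_env_vars(REQUIRED, env=example_env))
# ['OPENAI_API_KEY', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_REGION']
```

In settings.py you could then raise a ValueError listing the returned names whenever the list is non-empty.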

Importing Required Libraries and Modules

To construct and run the agentic graph, you need to import several modules together with the defined agents:

ChatBedrock and ChatOpenAI: For working with AWS and OpenAI-based models, respectively.
StateGraph: To build and manage the workflow.
ResearchState: This schema keeps track of the research flow and stores state information.

import argparse
from langchain_aws import ChatBedrock
from langchain_openai import ChatOpenAI
from langgraph.graph import END, StateGraph
from config.settings import (
    AWS_ACCESS_KEY_ID,
    AWS_REGION,
    AWS_SECRET_ACCESS_KEY,
    FACT_CHECK_MODEL,
    SUMMARIZATION_MODEL,
    CONFIDENCE_THRESHOLD,
    MAX_RETRIES,
    ADD_MAX_RESULTS,
    OPENAI_API_KEY,
    SERPER_API_KEY,
)
from src.agents.fact_checking_agent import FactCheckingAgent
from src.agents.report_generation_agent import ReportGenerationAgent
from src.agents.search_agent import SearchAgent
from src.agents.stop_workflow_agent import StopWorkflowAgent
from src.agents.summarization_agent import SummarizationAgent
from src.models.schemas import ResearchState

Initializing Agents

Each agent is initialized based on the configuration and APIs available. This step ensures that each agent has the required models and settings to operate.

def build_research_graph(serper_api_key: str = SERPER_API_KEY, 
                         openai_api_key: str = OPENAI_API_KEY,
                         confidence_threshold: float = CONFIDENCE_THRESHOLD,
                         max_retries: int = MAX_RETRIES,
                         add_max_results: int = ADD_MAX_RESULTS):

    # Initialize different models for Summarization and Fact-Checking agents
    fact_check_llm = ChatOpenAI(model=FACT_CHECK_MODEL, api_key=openai_api_key)
    summarization_llm = ChatBedrock(
        model_id=SUMMARIZATION_MODEL,
        model_kwargs=dict(temperature=0),
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
        region_name=AWS_REGION
    )

    # Initialize agents
    search_agent = SearchAgent(serper_api_key)
    summarization_agent = SummarizationAgent(summarization_llm)
    fact_checking_agent = FactCheckingAgent(fact_check_llm, confidence_threshold, max_retries, add_max_results)
    report_generation_agent = ReportGenerationAgent(summarization_llm)
    stop_workflow_agent = StopWorkflowAgent()

Here, each agent is initialized with the corresponding models and configurations:

SearchAgent: Handles the search task using the Serper API.
SummarizationAgent: Summarizes results using AWS Bedrock.
FactCheckingAgent: Fact-checks results using OpenAI.
ReportGenerationAgent: Generates the final research report.
StopWorkflowAgent: Ends the workflow if there is any error.

Defining the Graph

After the agents are initialized, you define the StateGraph that represents the flow of the research process. You set the SearchAgent as the entry point of the graph.

StateGraph: This is the heart of the agentic graph, defining how the agents interact and progress.
ResearchState: This schema holds all the data and state required by the system.

    # Define graph state
    builder = StateGraph(ResearchState)

    # Set entry point
    builder.set_entry_point("Search")

Adding Nodes

Next, you add each agent as a node in the state graph:

    # Add nodes
    builder.add_node("Search", search_agent.execute)
    builder.add_node("Summarize", summarization_agent.execute)
    builder.add_node("Fact Check", fact_checking_agent.execute)
    builder.add_node("Report", report_generation_agent.execute)
    builder.add_node("Stop Workflow", stop_workflow_agent.execute)

Each agent is associated with its corresponding execute method, which will be invoked during the workflow. These nodes represent the tasks or steps in the research process.

Adding Conditional Edges

Conditional edges dictate the flow from one agent to the next. Each function checks the current state and returns the next agent based on conditions. For instance:

If there are errors at any point, the workflow stops.
If the confidence score from the fact-checking is low, it retries the search with more search results, up to a maximum number of retries.

After these conditions are defined, you add the conditional edges between agents and compile the graph:


    def on_search_complete(state: ResearchState) -> str:
        return "Stop Workflow" if state.get("errors") else "Summarize"

    def on_summarization_complete(state: ResearchState) -> str:
        return "Stop Workflow" if state.get("errors") else "Fact Check"

    def on_fact_check_complete(state: ResearchState) -> str:
        fact_check_result = state.get("fact_checked_results", {})
        confidence_score = fact_check_result.get("confidence_score", 1.0)
        count = state.get("search_retries", 0)

        if state.get("errors"):
            return "Stop Workflow"

        if confidence_score < confidence_threshold:
            if count >= max_retries:
                print(f"Maximum retry attempts ({max_retries}) reached. Stopping workflow.")
                return "Stop Workflow"
            return "Search"  # Go back to search if retries are available

        return "Report"  # Proceed to report generation

    def on_report_complete(state: ResearchState) -> str:
        return "Stop Workflow" if state.get("errors") else END

    def on_stop_workflow(_state: ResearchState) -> str:
        return END

    builder.add_conditional_edges("Search", on_search_complete, {
        "Stop Workflow": "Stop Workflow",
        "Summarize": "Summarize",
    })

    builder.add_conditional_edges("Summarize", on_summarization_complete, {
        "Stop Workflow": "Stop Workflow",
        "Fact Check": "Fact Check",
    })

    builder.add_conditional_edges("Fact Check", on_fact_check_complete, {
        "Stop Workflow": "Stop Workflow",
        "Report": "Report",
        "Search": "Search",
    })

    builder.add_conditional_edges("Report", on_report_complete, {
        "Stop Workflow": "Stop Workflow",
        END: END,
    })

    builder.add_conditional_edges("Stop Workflow", on_stop_workflow, {
        END: END
    })

    return builder.compile()
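To see how the fact-check routing behaves without building the graph, the decision in on_fact_check_complete can be reproduced as a standalone function. This is a sketch, not part of research_graph.py: route_after_fact_check is a hypothetical name, and the threshold and retry cap are passed as arguments instead of being captured from the enclosing function.

```python
def route_after_fact_check(state, confidence_threshold, max_retries):
    """Return the name of the next node, following on_fact_check_complete."""
    if state.get("errors"):
        return "Stop Workflow"

    score = state.get("fact_checked_results", {}).get("confidence_score", 1.0)
    if score < confidence_threshold:
        if state.get("search_retries", 0) >= max_retries:
            return "Stop Workflow"
        return "Search"
    return "Report"


low = {"fact_checked_results": {"confidence_score": 0.6}, "search_retries": 1}

print(route_after_fact_check(low, 0.85, max_retries=3))  # Search
print(route_after_fact_check(low, 0.85, max_retries=1))  # Stop Workflow

high = {"fact_checked_results": {"confidence_score": 0.97}}
print(route_after_fact_check(high, 0.85, max_retries=3))  # Report
```

Because the fact-checking agent increments search_retries before this function runs, the counter is already 1 after the first low score. As the code is written, the default MAX_RETRIES = 1 therefore routes straight to Stop Workflow; pass a higher --max-retries to exercise the Search branch.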

Running the Research Graph

Finally, the graph is executed by invoking it with the research query. You also need to parse command-line arguments to define the confidence threshold, maximum retries, and additional results per retry.

if __name__ == "__main__":

    parser = argparse.ArgumentParser(description='Run research graph with custom parameters')
    parser.add_argument('--query', type=str, default="What are the benefits of using AWS Cloud Services?",
                      help='Research query')
    parser.add_argument('--confidence-threshold', type=float, default=CONFIDENCE_THRESHOLD,
                      help='Confidence score threshold (0-1)')
    parser.add_argument('--max-retries', type=int, default=MAX_RETRIES,
                      help='Maximum number of retries')
    parser.add_argument('--add-max-results', type=int, default=ADD_MAX_RESULTS,
                      help='Number of additional results per retry')

    args = parser.parse_args()

    graph = build_research_graph(
        SERPER_API_KEY,
        OPENAI_API_KEY,
        confidence_threshold=args.confidence_threshold,
        max_retries=args.max_retries,
        add_max_results=args.add_max_results
    )

    result = graph.invoke({"query": args.query})

After you set up the graph you can invoke it with the following command:

uv run src/graph/research_graph.py \
   --query "What are the benefits of using CircleCI?" \
   --confidence-threshold 0.85 \
   --max-retries 3 \
   --add-max-results 2

This will output the final report based on the structure defined in the schemas.py file. If the confidence score is below the threshold, the workflow retries the search up to the maximum number of retries, adding more results to each new search, and only displays the final report once the confidence score reaches the threshold.

Dockerizing the Application for AWS Lambda

To deploy your multi-agent research application on AWS Lambda, you need to package it in a way that ensures compatibility, portability, and ease of deployment. Using Docker simplifies this process by allowing you to create a containerized environment that includes all dependencies, ensuring your application runs seamlessly on AWS Lambda.

Creating the Lambda Function Handler

The Lambda handler function acts as the entry point for your application when invoked by AWS Lambda. This function processes incoming requests, executes the agentic graph, and returns results.

import json
import logging
import os

from config.settings import (
    ADD_MAX_RESULTS,
    CONFIDENCE_THRESHOLD,
    MAX_RETRIES,
    OPENAI_API_KEY,
    SERPER_API_KEY,
)
from src.graph.research_graph import build_research_graph

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def lambda_handler(event, context):
    # Log only whether each variable is set, never its value
    logger.info("SERPER_API_KEY present: %s", bool(os.getenv("SERPER_API_KEY")))
    logger.info("OPENAI_API_KEY present: %s", bool(os.getenv("OPENAI_API_KEY")))
    logger.info("AWS_ACCESS_KEY_ID present: %s", bool(os.getenv("AWS_ACCESS_KEY_ID")))
    logger.info("AWS_SECRET_ACCESS_KEY present: %s", bool(os.getenv("AWS_SECRET_ACCESS_KEY")))
    logger.info("AWS_DEFAULT_REGION present: %s", bool(os.getenv("AWS_DEFAULT_REGION")))

    # Extract parameters from event with defaults
    query = event.get("query", "What are the benefits of using AWS Cloud Services?")
    confidence_threshold = event.get("confidence_threshold", CONFIDENCE_THRESHOLD)
    max_retries = event.get("max_retries", MAX_RETRIES)
    add_max_results = event.get("add_max_results", ADD_MAX_RESULTS)

    # Validate parameters
    if not 0 <= confidence_threshold <= 1:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "Confidence threshold must be between 0 and 1"})
        }
    if max_retries < 0:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "Max retries must be non-negative"})
        }
    if add_max_results < 1:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "Additional max results must be positive"})
        }

    # Build the research graph with custom parameters
    graph = build_research_graph(
        SERPER_API_KEY,
        OPENAI_API_KEY,
        confidence_threshold=confidence_threshold,
        max_retries=max_retries,
        add_max_results=add_max_results
    )

    # Run the graph
    result = graph.invoke({
        "query": query,
        "search_results": [],
        "summarized_content": "",
        "fact_checked_results": {},
        "final_report": "",
        "errors": [],
        "fact_check_attempts": 0,
        "summarization_attempts": 0,
        "max_results": 3,
        "search_retries": 0
    })

    return {
        "statusCode": 200,
        "body": {
            "final_report": result.get("final_report", ""),
            "errors": result.get("errors", [])
        }
    }
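The three validation branches in the handler follow the same pattern, so they could be collected into one function that is easy to test on its own. The following is only a sketch of that idea: validate_params is a hypothetical helper, and the error messages are illustrative.

```python
import json


def validate_params(confidence_threshold, max_retries, add_max_results):
    """Return a 400-style response for the first invalid value, or None if all pass."""
    checks = [
        (0 <= confidence_threshold <= 1, "Confidence threshold must be between 0 and 1"),
        (max_retries >= 0, "Max retries must be non-negative"),
        (add_max_results >= 1, "Additional max results must be positive"),
    ]
    for is_valid, message in checks:
        if not is_valid:
            return {"statusCode": 400, "body": json.dumps({"error": message})}
    return None


print(validate_params(0.9, 2, 3))  # None
print(validate_params(1.5, 2, 3)["statusCode"])  # 400
```

Inside the handler, you would return the result whenever it is not None and continue to build the graph otherwise.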

Writing the Dockerfile

Your Dockerfile packages the application with all required dependencies, making it ready for deployment on AWS Lambda. Because the Dockerfile installs dependencies with pip from a requirements.txt file, you first need to export the packages of your uv-managed environment to requirements.txt. You can do this using the following command:

uv pip freeze --exclude-editable > requirements.txt

Now, create the Dockerfile:

FROM public.ecr.aws/lambda/python:3.12

# Set the working directory to /var/task
WORKDIR ${LAMBDA_TASK_ROOT}

# Copy requirements first to leverage Docker cache
COPY requirements.txt ./

# Install dependencies
RUN pip install -r requirements.txt

# Copy source code and config
COPY lambda_function/lambda_handler.py ./lambda_handler.py
COPY src ./src
COPY config ./config
COPY .env ./.env

# Command to run the Lambda handler function
CMD [ "lambda_handler.lambda_handler" ]

Testing the Docker Image Locally

Before deploying your containerized Lambda function to AWS, you should test it locally to ensure it behaves as expected.

Run the following command to build the image:

docker build -t langgraph-lambda-function .

Use Docker to run the container and expose it on port 9000. Set up AWS credentials as environment variables to simulate execution in AWS Lambda:

docker run -p 9000:8080 \
    -e AWS_ACCESS_KEY_ID=<YOUR_AWS_ACCESS_KEY_ID> \
    -e AWS_SECRET_ACCESS_KEY=<YOUR_AWS_SECRET_ACCESS_KEY> \
    -e AWS_DEFAULT_REGION=<YOUR_AWS_DEFAULT_REGION> \
    langgraph-lambda-function

Replace YOUR_AWS_ACCESS_KEY_ID, YOUR_AWS_SECRET_ACCESS_KEY, and YOUR_AWS_DEFAULT_REGION with your AWS credentials before running the above command.

Now, use curl to send a test event to the running container:

curl -X POST "http://localhost:9000/2015-03-31/functions/function/invocations" \
-d '{"query": "What are the benefits of using AWS Cloud Services?", "confidence_threshold": 0.9, "max_retries": 2, "add_max_results": 3}'

If everything works as expected, you should see the final report, as in the previous section where you ran the research graph locally. Now you are ready to deploy your application to AWS Lambda.
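If you prefer to send the test event from Python instead of curl, the same request can be built with the standard library. This is a minimal sketch: the URL is the one used in the curl command above, while build_request, invoke_local, and the 120-second timeout are assumptions made for illustration.

```python
import json
import urllib.request

# Invocation endpoint exposed by the running container (same URL as the curl command)
LOCAL_URL = "http://localhost:9000/2015-03-31/functions/function/invocations"


def build_request(payload, url=LOCAL_URL):
    """Build a POST request carrying the event as a JSON body."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def invoke_local(payload, url=LOCAL_URL, timeout=120):
    """Send the event to the local container and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with the container from the previous step running:
# result = invoke_local({"query": "What are the benefits of using AWS Cloud Services?"})
# print(result["body"]["final_report"])
```

The usage lines are commented out because they need the container to be running; the generous timeout allows for several LLM calls per invocation.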

Writing Tests for the Application

Testing is essential for validating the correctness of your multi-agent workflow and ensuring your AWS Lambda function behaves as expected. To be sure the application works correctly, you can write unit tests for the Lambda handler function under test_lambda_handler.py.

from lambda_function.lambda_handler import lambda_handler


def test_lambda_handler():
    # Create a mock event to simulate an AWS Lambda invocation
    event = {
        "query": "What is the capital of France?"
    }

    # Call the lambda_handler function
    response = lambda_handler(event, None)

    # Print the response for testing
    print("\nResponse:\n\n", response["body"]["final_report"])


    # Assertions to validate the response
    assert response["statusCode"] == 200
    assert "final_report" in response["body"]
    assert "errors" in response["body"]
    assert isinstance(response["body"]["final_report"], str)
    assert isinstance(response["body"]["errors"], list)

You can also write tests for individual agents, such as the SearchAgent:

from src.agents.search_agent import SearchAgent
from config.settings import SERPER_API_KEY

def test_search_agent():
    """
    Tests the SearchAgent with a mock query.
    """

    # Initialize the agent
    agent = SearchAgent(SERPER_API_KEY)

    # Test search execution
    results = agent.execute({"query": "What is CircleCI?"})

    # Assertions
    assert isinstance(results, dict)
    assert "search_results" in results
    assert isinstance(results["search_results"], list)

By running the tests with pytest, you can ensure that your application functions correctly and that the multi-agent workflow is executed as expected.

uv run pytest
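Both tests above call the real Serper, OpenAI, and Bedrock APIs, so they need valid credentials and network access. For a faster check that runs offline, you could stub the compiled graph. The sketch below assumes a hypothetical run_query wrapper that shapes the response the same way the handler does; it is not part of the repository.

```python
from unittest.mock import MagicMock


def run_query(graph, query):
    """Hypothetical wrapper: invoke a compiled graph and shape the handler-style response."""
    result = graph.invoke({"query": query})
    return {
        "statusCode": 200,
        "body": {
            "final_report": result.get("final_report", ""),
            "errors": result.get("errors", []),
        },
    }


def test_run_query_with_stub_graph():
    # The stub stands in for the object returned by build_research_graph()
    stub_graph = MagicMock()
    stub_graph.invoke.return_value = {"final_report": "Paris is the capital of France."}

    response = run_query(stub_graph, "What is the capital of France?")

    assert response["statusCode"] == 200
    assert response["body"]["final_report"] == "Paris is the capital of France."
    assert response["body"]["errors"] == []
    stub_graph.invoke.assert_called_once_with({"query": "What is the capital of France?"})
```

A stubbed test like this checks only the response shaping, so it complements the live tests and does not replace them.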

Deploying to AWS Lambda with CircleCI

Now it is time to automate the process of deploying your Dockerized application to AWS Lambda using CircleCI. CircleCI is a powerful CI/CD tool that can automate testing, building, and deploying your Lambda function. To simplify the process, you can use a shell script (build_deploy.sh) to handle the deployment steps.

Setting Up the Build and Deploy Script

The build_deploy.sh script automates the process of building, testing, and deploying your Dockerized Lambda function. It includes all the necessary steps to:

Build the Docker image.
Push it to Amazon Elastic Container Registry (ECR).
Deploy it to AWS Lambda with the required IAM permissions.

Note that you need to create the assume-role.json file in advance to execute the script. This file is required to define the IAM role assumption policy for Lambda to assume the role with permissions to access the required AWS resources.

{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "Service": "lambda.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
      }
    ]
  }

To execute the script, which is available in the repository, make it executable and then run it:

chmod +x build_deploy.sh
./build_deploy.sh

Here is the breakdown of the script:

Exit on Errors (set -e): The script is configured to exit immediately if any command fails, ensuring that errors are caught early.
Loading Environment Variables: Loads the environment variables from a .env file. These variables contain sensitive information such as AWS keys, region, repository name, etc. The script uses these to configure AWS interactions.
ECR Repository Check & Creation: Checks whether the Amazon Elastic Container Registry (ECR) repository exists. If not, it creates a new repository where the Docker image will be pushed.
Generating requirements.txt: It generates the requirements file containing the dependencies of the Lambda function. This file is essential for ensuring the Lambda function is packaged correctly.
Docker Image Build: The Docker image for the Lambda function is built using the docker buildx build command. This creates a platform-specific image compatible with AWS Lambda.
ECR Authentication: The script logs into AWS ECR using aws ecr get-login-password. This is required to push the Docker image to the ECR repository.
Tagging Docker Image: The image is tagged with the repository URI to ensure it is associated with the correct repository in ECR.
Pushing Docker Image to ECR: The tagged Docker image is pushed to the ECR repository.
Creating IAM Role (Lambda Execution Role): The script checks if an IAM role for Lambda exists. If it does not, it creates a new role with the necessary permissions for Lambda to execute and access AWS resources like Bedrock.
Lambda Function Creation/Update: The script checks if the Lambda function already exists. If not, it creates a new Lambda function using the Docker image stored in ECR. If the function already exists, it updates the Lambda function with the new Docker image.
Completion: Once the deployment is finished, the script prints Deployment complete, indicating that the process was successful.
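The create-or-update decision in the breakdown above can be sketched in Python. The script itself is written in Bash and is not reproduced here, so the function names and example values below are hypothetical; the image URI layout and the AWS CLI subcommands and flags are the standard ones.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build the registry URI that the image is tagged with before pushing."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def lambda_deploy_command(function_exists, function_name, image_uri, role_arn):
    """Return the AWS CLI arguments for updating or creating the function."""
    if function_exists:
        return [
            "aws", "lambda", "update-function-code",
            "--function-name", function_name,
            "--image-uri", image_uri,
        ]
    return [
        "aws", "lambda", "create-function",
        "--function-name", function_name,
        "--package-type", "Image",
        "--code", f"ImageUri={image_uri}",
        "--role", role_arn,
    ]


# Example values are placeholders, not real account data
uri = ecr_image_uri("123456789012", "eu-central-1", "langgraph-repo")
print(uri)  # 123456789012.dkr.ecr.eu-central-1.amazonaws.com/langgraph-repo:latest
print(" ".join(lambda_deploy_command(True, "langgraph-lambda-function", uri, "")))
```

Returning the argument list instead of running it keeps the sketch free of side effects; you could pass the list to subprocess.run once the values are real.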

Testing the Deployment

To test the deployed Lambda function, invoke it with the following command, replacing <your_region> with your AWS Region:

aws lambda invoke \
    --function-name langgraph-lambda-function \
    --payload '{"query": "What are the benefits of using AWS Cloud Services?"}' \
    --region <your_region> \
    --cli-binary-format raw-in-base64-out \
    response.json && \
    cat response.json | jq

The response will contain the Lambda's output: a statusCode and a body holding the generated report under final_report and any errors under errors.

CircleCI Configuration

To automate the deployment pipeline using CircleCI, you need to define a CircleCI config file (config.yml). This file will automate all the tasks like building the Docker image, running tests, and deploying the image to AWS.

Before CircleCI can deploy your serverless application to AWS, you need to configure your environment variables in the CircleCI project settings. In your CircleCI account, set up a project and link it to your GitHub repository. Then, under project settings, add the environment variables that you have defined in your .env file.

Here is the breakdown of the config.yml file:

Orbs:
- aws-cli: The AWS CLI orb simplifies the setup of AWS CLI to interact with AWS services.
- docker: The CircleCI Docker orb handles setting up the Docker environment.
Jobs:
- build-deploy: This job defines all the steps needed to deploy your application: installing uv and the project dependencies, running the linter and the tests, writing the environment variables to a .env file, setting up remote Docker and the AWS CLI, and finally building and deploying the Lambda image with the necessary permissions.
Workflows:
- The deploy workflow ensures that the build-deploy job is executed in sequence whenever changes are pushed to the repository.

version: 2.1

orbs:
  aws-cli: circleci/aws-cli@5.2.0
  docker: circleci/docker@2.8.2

jobs:
  build-deploy:
    docker:
      - image: cimg/python:3.12
    steps:
      - checkout  

      - run:
          name: Install UV
          command: |
            curl -LsSf https://astral.sh/uv/install.sh | sh

      - run:
          name: Create venv and install dependencies
          command: |
            uv sync --all-extras

      - run:
          name: Run ruff
          command: |
            uv run ruff check --fix --unsafe-fixes .

      - run:
          name: Run tests
          command: |
            uv run pytest

      - run:
          name: Create .env file
          command: |
            echo "SERPER_API_KEY=${SERPER_API_KEY}" > .env
            echo "OPENAI_API_KEY=${OPENAI_API_KEY}" >> .env
            echo "AWS_REGION=${AWS_REGION}" >> .env
            echo "AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}" >> .env
            echo "AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}" >> .env
            echo "AWS_ACCOUNT_ID=${AWS_ACCOUNT_ID}" >> .env
            echo "REPOSITORY_NAME=${REPOSITORY_NAME}" >> .env
            echo "IMAGE_NAME=${IMAGE_NAME}" >> .env
            echo "LAMBDA_FUNCTION_NAME=${LAMBDA_FUNCTION_NAME}" >> .env
            echo "ROLE_NAME=${ROLE_NAME}" >> .env
            echo "ROLE_POLICY_NAME=${ROLE_POLICY_NAME}" >> .env

      - setup_remote_docker


      - aws-cli/setup:
          profile_name: default

      - run:
          name: Deploy to AWS
          command: |
            chmod +x build_deploy.sh
            ./build_deploy.sh

workflows:
  version: 2
  deploy:
    jobs:
      - build-deploy

Once the config file and the environment variables are set up, you can commit and push your changes to GitHub. CircleCI will automatically trigger the deployment pipeline.

Now you can invoke the deployed Lambda function again, this time adding optional parameters such as confidence_threshold, max_retries, and add_max_results:

aws lambda invoke \
    --function-name langgraph-lambda-function \
    --payload '{"query": "What are the benefits of using CircleCI?", "confidence_threshold": 0.85, "max_retries": 3, "add_max_results": 2}' \
    --region <your_aws_region> \
    --cli-binary-format raw-in-base64-out \
    response.json && \
    cat response.json | jq

Cleaning up

If you do not need the app anymore, make sure you delete the resources to avoid unnecessary charges.

Conclusion

Automating AWS Lambda deployments using CircleCI streamlines the entire development pipeline, ensuring efficient, reliable, and repeatable deployments. By leveraging CircleCI’s Continuous Integration and Continuous Deployment (CI/CD) capabilities, you can integrate automated testing, dependency management, and security best practices into your workflows. This minimizes human errors and allows you to focus on writing and improving code rather than manually managing deployments.

Furthermore, incorporating AWS services like Elastic Container Registry (ECR) and Identity and Access Management (IAM) ensures secure and scalable deployments. The use of environment variables and automated role assignments enhances security while maintaining flexibility. By structuring the deployment pipeline in a modular and reusable way, this approach not only simplifies the deployment of Lambda functions but also ensures adaptability for future enhancements.

In addition, integrating LangGraph improves workflow orchestration by enabling structured, multi-agent interactions in AI-driven applications. This allows for better coordination between AI agents, improving decision-making and automation within the system, ultimately leading to more dynamic and intelligent application behavior.

With this automated pipeline in place, you can deploy updates confidently, knowing that every change undergoes thorough testing before reaching production. This methodology aligns with modern DevOps practices, fostering team collaboration, reducing downtime, and accelerating software delivery.

How to Safely Delete All AWS S3 Buckets Using a Bash Script

Benito Martin — Mon, 31 Mar 2025 12:22:23 +0000

Have you ever realized that you have too many S3 buckets in your account and wanted to delete them all? The AWS console will not let you delete a bucket while it still has content in it, so removing many buckets by hand is slow.

The following Bash script automates and speeds up this process while ensuring that both object versions and delete markers are properly removed before bucket deletion. In this blog post, I will break down how this script works and the precautions it takes.

The Purpose of the Script

This script performs the following steps:

1. Lists all S3 buckets in your AWS account.
2. Asks for user confirmation before proceeding with deletion.
3. Iterates over each bucket and deletes all object versions and delete markers.
4. Deletes the empty bucket from S3.

Let's go through the script step by step.

Script Breakdown

The script starts with the shebang (#!/bin/bash), which ensures it runs in a Bash shell.

#!/bin/bash

Step 1: Listing All S3 Buckets

This command lists all S3 buckets using aws s3 ls and extracts only the bucket names using awk '{print $3}'.

buckets=$(aws s3 ls | awk '{print $3}')

if [ -z "$buckets" ]; then
    echo "No S3 buckets found."
    exit 0
fi

If no buckets exist, the script exits gracefully.

Step 2: User Confirmation

The script prompts the user for confirmation to prevent accidental deletions. If the user does not type yes, the operation is aborted.

echo "WARNING: This will permanently delete all S3 buckets and their contents."
echo "Do you want to proceed? (yes/no)"
read confirmation

if [ "$confirmation" != "yes" ]; then
    echo "Operation canceled."
    exit 1
fi

Step 3: Deleting Object Versions

Each bucket is processed in a loop:

for bucket in $buckets; do
    echo "Deleting all object versions from bucket: $bucket"

Object versions are retrieved using:

versions=$(aws s3api list-object-versions --bucket "$bucket" --query 'Versions[*].[Key, VersionId]' --output text 2>/dev/null)

If object versions exist, they are deleted in a loop:

echo "$versions" | while read -r key versionId; do
    if [ -n "$key" ] && [ -n "$versionId" ]; then
        echo "Deleting: $key (version: $versionId)"
        aws s3api delete-object --bucket "$bucket" --key "$key" --version-id "$versionId"
    fi
  done
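One detail of this loop is worth knowing: read -r key versionId splits on any whitespace, so an object key that contains spaces is cut at the first space. The text output of the AWS CLI separates columns with tabs, which suggests splitting on the last tab instead. The following Python sketch illustrates that parsing step only; parse_versions is a hypothetical helper and is not part of the script.

```python
def parse_versions(text):
    """Parse tab-separated 'Key<TAB>VersionId' lines into (key, version_id) pairs."""
    pairs = []
    for line in text.splitlines():
        if "\t" not in line:
            # Skips blank lines and the literal "None" printed for empty results
            continue
        key, version_id = line.rsplit("\t", 1)
        if key and version_id:
            pairs.append((key, version_id))
    return pairs


sample = "reports/q1 summary.pdf\tabc123\nlogo.png\tdef456\n"
print(parse_versions(sample))
# [('reports/q1 summary.pdf', 'abc123'), ('logo.png', 'def456')]

print(parse_versions("None\n"))  # []
```

In Bash, the equivalent change would be to set IFS to a tab for the read command, so that only tabs separate the key from the version ID.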

Step 4: Deleting Delete Markers

Similarly, delete markers (for versioned buckets) are retrieved and removed:

markers=$(aws s3api list-object-versions --bucket "$bucket" --query 'DeleteMarkers[*].[Key, VersionId]' --output text 2>/dev/null)

echo "$markers" | while read -r key versionId; do
    if [ -n "$key" ] && [ -n "$versionId" ]; then
        echo "Deleting marker: $key (version: $versionId)"
        aws s3api delete-object --bucket "$bucket" --key "$key" --version-id "$versionId"
    fi
  done

Step 5: Deleting the Bucket

After clearing all contents, the bucket itself is removed:

echo "Deleting bucket: $bucket"
aws s3 rb "s3://$bucket" --force

The --force flag makes the command delete any remaining objects before removing the bucket itself.

Final Step: Script Completion

Once all buckets are deleted, a success message is displayed.

done
echo "All S3 buckets deleted successfully."

Conclusion

This script provides a safe and efficient way to delete all S3 buckets while ensuring that all object versions and delete markers are removed. The user confirmation step helps prevent accidental deletions, making it a useful tool for cloud administrators. Additionally, the script can be further customized if you just want to delete specific buckets.

If you plan to use this script, ensure you have the appropriate AWS permissions and double-check before running it to avoid unintended data loss!

The script repository can be found here.

Build and Deploy a Serverless API with AWS SAM, Lambda, Gateway and Docker

Benito Martin — Wed, 19 Mar 2025 21:17:00 +0000

In this tutorial, you will create a serverless API that generates a random number between two values using:

✅ AWS Lambda
✅ AWS SAM (Serverless Application Model)
✅ API Gateway
✅ Docker (for local testing)

This tutorial is beginner friendly and only requires basic knowledge of AWS Services.

You can find the whole code repository here.

🧱 Project Overview

The app will expose a single POST endpoint:

/generate-random

You send a JSON like:

{ "min": 1, "max": 100 }

And the API will return a random number between min and max:

{ "random_number": 42 }

📦 Prerequisites

To get started, make sure you have:

Python installed
AWS CLI (configured with credentials)
AWS SAM CLI
Docker running locally

📁 Project Structure

Here is the directory structure of the project:

├── lambda/
│   └── lambda_function.py
├── .gitignore
├── .python-version
├── LICENSE
├── README.md
├── pyproject.toml
├── template.yaml
└── uv.lock

🧑‍💻 Setup and Installation

Create the Repository

First, create a new repository (sam-lambda-aws-api) in your GitHub account and clone it locally.

git clone https://github.com/yourusername/sam-lambda-aws-api.git
cd sam-lambda-aws-api

Write the Lambda Function

Create a new folder named lambda in the root of your project. Inside this folder, create the lambda_function.py file. This file will contain the Python code for the Lambda function that handles generating the random number:

import json
import random

def lambda_handler(event, context):
    try:
        body = json.loads(event.get("body") or "{}")  # also handles a missing or null body
        min_val = int(body.get("min", 1))
        max_val = int(body.get("max", 100))
        random_number = random.randint(min_val, max_val)

        return {
            "statusCode": 200,
            "body": json.dumps({"random_number": random_number})
        }

    except Exception as e:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": str(e)})
        }

Since you are using Python’s built-in json and random modules, there are no additional dependencies for this simple example. For more complex projects, you can list external dependencies in a requirements.txt file and save it in the lambda folder so that the AWS Lambda function is packaged with them.
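Before involving SAM or Docker, the handler can be exercised directly from Python by passing it an event shaped like the one API Gateway sends. The sketch below repeats the handler so that it runs on its own; make_event is a hypothetical helper, and the `or "{}"` guard for a null body is a small addition to the code shown above.

```python
import json
import random


def lambda_handler(event, context):
    """Copy of the handler from this tutorial, so the snippet is self-contained."""
    try:
        # `or "{}"` also covers a body that is present but null (an addition)
        body = json.loads(event.get("body") or "{}")
        min_val = int(body.get("min", 1))
        max_val = int(body.get("max", 100))
        random_number = random.randint(min_val, max_val)
        return {"statusCode": 200, "body": json.dumps({"random_number": random_number})}
    except Exception as e:
        return {"statusCode": 400, "body": json.dumps({"error": str(e)})}


def make_event(payload):
    """Build a minimal proxy-style event; only the `body` field is used here."""
    return {"body": json.dumps(payload)}


ok = lambda_handler(make_event({"min": 1, "max": 100}), None)
print(ok["statusCode"], ok["body"])

bad = lambda_handler(make_event({"min": 100, "max": 1}), None)
print(bad["statusCode"], bad["body"])
```

Because the handler is a plain function, the same calls can be moved into a unit test without any AWS tooling.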

Define Your SAM Template

Create the template.yaml file. This is the SAM template that defines the Lambda function, API Gateway, and other AWS resources needed for the app:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  Random Number Generator API

Resources:
  RandomNumberGeneratorFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: lambda_function.lambda_handler
      Runtime: python3.11
      CodeUri: lambda/
      Events:
        Api:
          Type: Api
          Properties:
            Path: /generate-random
            Method: post
            RestApiId: !Ref ApiGateway

  ApiGateway:
    Type: AWS::Serverless::Api
    Properties:
      StageName: dev
      EndpointConfiguration: REGIONAL
      Cors:
        AllowMethods: "'*'" # Used methods: "'POST,OPTIONS'"
        AllowHeaders: "'Content-Type,Authorization'"
        AllowOrigin: "'*'"

Outputs:
  ApiEndpoint:
    Value: !Sub https://${ApiGateway}.execute-api.${AWS::Region}.amazonaws.com/dev/generate-random
    Description: API Gateway endpoint URL for the Random Number Generator

Here is what the components do:

RandomNumberGeneratorFunction: This resource represents your Lambda function. It uses the lambda_function.lambda_handler as the entry point.
ApiGateway: The API Gateway is defined to expose the Lambda function as a REST API at the /generate-random path using the POST method.
Cors: The CORS configuration allows requests from any origin ('*'), with any method, and with the specified headers. This is useful for development, but in production, you should restrict these to only necessary values.
Outputs: This section outputs the URL of the deployed API, which you will use to make requests.
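One detail worth keeping in mind: with a Lambda proxy integration, the Cors block mainly takes care of the preflight OPTIONS response, while browsers also expect an Access-Control-Allow-Origin header on the actual POST response, which the function itself has to return. The following is a minimal sketch of a helper that adds such headers; with_cors and CORS_HEADERS are hypothetical names, and the header values simply mirror the template above.

```python
import json

# Values mirror the Cors block of the template; tighten them for production
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Headers": "Content-Type,Authorization",
    "Access-Control-Allow-Methods": "POST,OPTIONS",
}


def with_cors(response):
    """Return a copy of a Lambda proxy response with CORS headers merged in."""
    merged = dict(CORS_HEADERS)
    # Headers already set by the handler take precedence over the defaults
    merged.update(response.get("headers", {}))
    return {**response, "headers": merged}


response = with_cors({"statusCode": 200, "body": json.dumps({"random_number": 42})})
print(response["headers"]["Access-Control-Allow-Origin"])
```

In the handler, each return value would be wrapped with with_cors(...) before being returned.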

Build and Validate the SAM Application

Now, you need to build the application and validate the template.

Run the following commands:

sam build
sam validate

This will create a .aws-sam directory with the built artifacts, ensuring your project is correctly packaged.

🧪 Local Testing with Docker

You can test the Lambda function locally using Docker (make sure it is running). SAM CLI can emulate AWS Lambda functions on your local machine.

Invoke Lambda Locally

The following command will create a local AWS Lambda Docker image, invoke the function and return the response.

sam local invoke RandomNumberGeneratorFunction

Response (the decoded body of the returned payload):

{
    "random_number": 34
}

Start the API Locally

Alternatively, you can start the API locally and send requests to it. Docker must be running here as well, and SAM will again create a local AWS Lambda image.

sam local start-api

This will spin up a local API Gateway on your machine. The API will be available at http://127.0.0.1:3000.

Now, you can test the API by sending a POST request using curl:

curl -X POST http://127.0.0.1:3000/generate-random \
-H "Content-Type: application/json" \
-d '{"min": 1, "max": 100}'

Response:

{
    "random_number": 5
}
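The same request can be sent from Python using only the standard library, which is handy for scripted checks. This is a sketch under the assumption that the local API is listening on the address shown above; build_request and call_api are hypothetical helper names.

```python
import json
import urllib.request


def build_request(base_url, min_val, max_val):
    """Create the POST request for the /generate-random endpoint."""
    payload = json.dumps({"min": min_val, "max": max_val}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/generate-random",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(base_url, min_val, max_val, timeout=10):
    """Send the request and decode the JSON response (needs a running API)."""
    request = build_request(base_url, min_val, max_val)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network
request = build_request("http://127.0.0.1:3000", 1, 100)
print(request.get_method(), request.full_url)
```

With sam local start-api running, call_api("http://127.0.0.1:3000", 1, 100) would return the parsed JSON. Note that urllib raises an HTTPError for 4xx responses, so the error cases need a try/except around the call.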

Testing Error Handling

Let's also test how our API handles invalid inputs:

Test with min > max

curl -X POST http://127.0.0.1:3000/generate-random \
-H "Content-Type: application/json" \
-d '{"min": 100, "max": 1}'

Response:

{
    "error": "empty range for randrange() (100, 2, -98)"
}

Test with non-integer values

curl -X POST http://127.0.0.1:3000/generate-random \
-H "Content-Type: application/json" \
-d '{"min": "abc", "max": 100}'

Response:

{
    "error": "invalid literal for int() with base 10: 'abc'"
}
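The error messages above come straight from Python's internals, which is fine for a demo but not very friendly for API consumers. One option is to validate the input explicitly before generating the number. The sketch below is one possible approach, not part of the original project; parse_range and its messages are hypothetical.

```python
import json


def parse_range(raw_body):
    """Return (min, max) from a JSON body or raise ValueError with a readable message."""
    try:
        body = json.loads(raw_body or "{}")
    except json.JSONDecodeError:
        raise ValueError("Request body must be valid JSON.") from None
    if not isinstance(body, dict):
        raise ValueError("Request body must be a JSON object.")
    try:
        min_val = int(body.get("min", 1))
        max_val = int(body.get("max", 100))
    except (TypeError, ValueError):
        raise ValueError("'min' and 'max' must be integers.") from None
    if min_val > max_val:
        raise ValueError("'min' must be less than or equal to 'max'.")
    return min_val, max_val


for raw in ('{"min": 1, "max": 100}', '{"min": 100, "max": 1}', '{"min": "abc", "max": 100}'):
    try:
        print(parse_range(raw))
    except ValueError as err:
        print("error:", err)
```

The handler could call parse_range(event.get("body")) and return the ValueError message with a 400 status code, keeping unexpected exceptions for a separate 500 response.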

🚀 Deploying to AWS

Once you have tested the app locally, it is time to deploy it to AWS.

Deploy Using SAM

To deploy the app, run the following command:

sam deploy --guided

SAM CLI will prompt you to configure the deployment, including the stack name and the AWS region.

SAM will package, deploy, and set up the necessary resources. After the deployment is complete, you will receive an API Gateway URL, which also contains the StageName dev.

Test the Deployed API

Once deployed, you can test the API by sending a request to the deployed endpoint:

curl -X POST https://hi8mgf8ssl.execute-api.eu-central-1.amazonaws.com/dev/generate-random \
  -H "Content-Type: application/json" \
  -d '{"min": 1, "max": 100}'

Response:

{"random_number": 84}

🧹 Clean Up

To avoid incurring unnecessary costs, remember to delete the resources after you are done:

sam delete

This will remove the stack and all associated resources from AWS.

🎯 Conclusion

In this tutorial, you have successfully built a serverless random number generator API using AWS Lambda, AWS SAM, and API Gateway. This serverless architecture offers several key advantages:

Cost-effectiveness: You only pay for the compute time you consume, with no charges when your code is not running.
Scalability: The application automatically scales based on demand.
Reduced operational overhead: No servers to manage means less time spent on infrastructure maintenance.
Fast deployment: The AWS SAM CLI makes it easy to develop, test, and deploy serverless applications.

While this example is simple, it demonstrates the fundamental concepts of serverless architecture that can be applied to more complex applications. The same patterns you used here, defining resources in SAM templates, implementing Lambda functions, and exposing them through API Gateway, can be expanded to build sophisticated serverless applications.

Happy building your serverless applications! 🚀

Deep Dive into uv Dockerfiles by Astral: Image Size, Performance & Best Practices

Benito Martin — Tue, 18 Mar 2025 12:25:22 +0000

Dockerizing Python applications is still a complex task, despite all the tools we have at our disposal. Dependency management, environment consistency, image size, and build speed can all become pain points, especially as projects grow.

Enter uv by Astral: a modern tool that aims to simplify and speed up Python dependency resolution. It introduces a fresh approach to Docker workflows, but it comes with configuration choices that may be unclear if you’re not familiar with how they affect image structure and behavior.

The uv GitHub repository provides three example Dockerfiles:

Single-Stage: Single-stage build that keeps uv in the final image
Standalone: Multi-stage build using uv-managed Python, removes uv in the final stage
Multi-Stage: Multi-stage build using system Python, removes uv in the final stage

Each approach has trade-offs: some are faster, some leaner, and some easier to understand. In this post, we’ll compare all three, break down their Dockerfile internals, analyze image size and history, and help you decide which is best for production-grade images.

What Are These Dockerfiles?

Before we dive into pros, cons, and differences, let’s quickly describe each Dockerfile.

Single-Stage Dockerfile

An example of using a single stage with uv pre-installed.

# Use a Python image with uv pre-installed
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim

# Install the project into `/app`
WORKDIR /app

# Enable bytecode compilation
ENV UV_COMPILE_BYTECODE=1

# Copy from the cache instead of linking since it's a mounted volume
ENV UV_LINK_MODE=copy

# Install the project's dependencies using the lockfile and settings
RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --frozen --no-install-project --no-dev

# Then, add the rest of the project source code and install it
ADD . /app
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev

# Place executables in the environment at the front of the path
ENV PATH="/app/.venv/bin:$PATH"

# Reset the entrypoint, don't invoke `uv`
ENTRYPOINT []

# Run the FastAPI application by default
CMD ["fastapi", "dev", "--host", "0.0.0.0", "src/uv_docker_example"]

Code Explained

FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim: This line specifies the base image. It pulls a pre-configured image that includes Python 3.12 and the uv tool. The bookworm-slim tag indicates it is a minimal Debian-based image (Bookworm) optimized for size (slim variant).
WORKDIR /app: Sets the working directory inside the container to /app. All subsequent commands will be executed relative to this directory.
ENV UV_COMPILE_BYTECODE=1: Sets an environment variable to enable Python bytecode compilation (creates .pyc files). This can speed up application startup times by avoiding re-compiling Python files each time the container starts.
ENV UV_LINK_MODE=copy: Configures uv to copy packages from its cache into the environment instead of linking them. The cache is a mounted volume on a separate filesystem, so links (hard links by default on Linux) cannot be created, and setting this explicitly avoids the resulting warning.
RUN --mount=type=cache…: This command installs the project's dependencies with uv sync. The --mount=type=cache option keeps uv's cache outside the image layers so it can be reused between builds, --mount=type=bind makes uv.lock and pyproject.toml from the build context available without copying them into a layer, and --no-install-project skips the project itself so this layer only changes when the dependencies do.
ADD . /app: This copies the project’s source code into the container at /app.
RUN uv sync --frozen --no-dev: This second RUN command installs the local project.
ENV PATH="/app/.venv/bin:$PATH": This adds the .venv directory, which contains the virtual environment’s executables, to the PATH environment variable. This ensures the container will use the right Python interpreter and dependencies.
ENTRYPOINT []: Resets the entrypoint inherited from the base image to an empty array, so the container runs the command given in CMD instead of invoking uv.
CMD: Starts the FastAPI development server.
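To make the effect of UV_COMPILE_BYTECODE=1 more concrete: it asks uv to byte-compile installed packages at install time, which is comparable to what the standard library's compileall module does. The sketch below uses compileall purely as an illustration and is not how uv implements the feature; compile_tree and the hello.py module are hypothetical.

```python
import compileall
import pathlib
import tempfile


def compile_tree(root):
    """Byte-compile every .py file under `root` and return the .pyc files produced."""
    compileall.compile_dir(str(root), quiet=1)
    return sorted(pathlib.Path(root).rglob("*.pyc"))


with tempfile.TemporaryDirectory() as tmp:
    # A throwaway module standing in for an installed package
    module = pathlib.Path(tmp) / "hello.py"
    module.write_text("print('hello')\n")
    compiled = [path.name for path in compile_tree(tmp)]
    print(compiled)
```

The resulting .pyc file lands in a __pycache__ directory next to the source, so the interpreter can skip the compilation step the first time the module is imported in the container.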

Standalone Dockerfile

An example of using standalone Python builds with multi-stage images.

# First, build the application in the `/app` directory
FROM ghcr.io/astral-sh/uv:bookworm-slim AS builder
ENV UV_COMPILE_BYTECODE=1 UV_LINK_MODE=copy

# Configure the Python directory so it is consistent
ENV UV_PYTHON_INSTALL_DIR /python

# Only use the managed Python version
ENV UV_PYTHON_PREFERENCE=only-managed

# Install Python before the project for caching
RUN uv python install 3.12

WORKDIR /app
RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --frozen --no-install-project --no-dev

ADD . /app
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev

# Then, use a final image without uv
FROM debian:bookworm-slim

# Copy the Python version
COPY --from=builder --chown=python:python /python /python

# Copy the application from the builder
COPY --from=builder --chown=app:app /app /app

# Place executables in the environment at the front of the path
ENV PATH="/app/.venv/bin:$PATH"

# Run the FastAPI application by default
CMD ["fastapi", "dev", "--host", "0.0.0.0", "/app/src/uv_docker_example"]

Code Explained

FROM ghcr.io/astral-sh/uv:bookworm-slim AS builder: This line specifies the first (builder) stage of the Dockerfile. It uses a uv image that does not bundle a Python interpreter (which is why Python is installed explicitly further down) and names the stage builder. This approach helps modularize the build process by separating build dependencies from runtime dependencies.
ENV UV_COMPILE_BYTECODE=1: Enables bytecode compilation, just as in the previous Dockerfile.
ENV UV_LINK_MODE=copy: Configures uv to copy packages from its cache instead of linking them, which is needed because the cache is a mounted volume on a separate filesystem.
ENV UV_PYTHON_INSTALL_DIR /python: Specifies where Python should be installed within the container. This ensures Python is placed in a consistent directory during the build process.
ENV UV_PYTHON_PREFERENCE=only-managed: Instructs uv to use only the managed Python version, ensuring consistency between builds and avoiding issues with mismatched Python versions.
RUN uv python install 3.12: Installs Python 3.12 inside the builder image. This step is necessary because we need Python available for the application in the next stages.
WORKDIR /app: Similar to the previous Dockerfile, sets the working directory to /app for the subsequent commands.
RUN --mount=type=cache…: Installs dependencies using uv sync in a cached manner, like the previous Dockerfile.
ADD . /app: Adds source code to the container.
RUN uv sync --frozen --no-dev: This second RUN command installs the project, like in the previous image.
FROM debian:bookworm-slim: Switches to a much lighter image (debian:bookworm-slim), which is used as the base for the final runtime image. This reduces the size of the final container because it excludes build tools like uv.
COPY --from=builder /python /python: Copies the Python installation from the builder stage into the final image, ensuring that the Python installation is preserved but without the build tools.
COPY --from=builder /app /app: Copies the application code from the builder stage to the runtime stage.
ENV PATH="/app/.venv/bin:$PATH": Adds the .venv directory to the PATH environment variable, ensuring the container uses the correct Python and dependencies.
CMD: Starts the FastAPI development server.

Multi-Stage Dockerfile

An example using multi-stage image builds to create a final image without uv.

# First, build the application in the `/app` directory.
# See `Dockerfile` for details.
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS builder
ENV UV_COMPILE_BYTECODE=1 UV_LINK_MODE=copy

# Disable Python downloads, because we want to use the system interpreter
# across both images. If using a managed Python version, it needs to be
# copied from the build image into the final image; see `standalone.Dockerfile`
# for an example.
ENV UV_PYTHON_DOWNLOADS=0

WORKDIR /app
RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --frozen --no-install-project --no-dev
ADD . /app
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev


# Then, use a final image without uv
FROM python:3.12-slim-bookworm
# It is important to use the image that matches the builder, as the path to the
# Python executable must be the same, e.g., using `python:3.11-slim-bookworm`
# will fail.

# Copy the application from the builder
COPY --from=builder --chown=app:app /app /app

# Place executables in the environment at the front of the path
ENV PATH="/app/.venv/bin:$PATH"

# Run the FastAPI application by default
CMD ["fastapi", "dev", "--host", "0.0.0.0", "/app/src/uv_docker_example"]

Code Explained

FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS builder: Begins the first stage (builder) using the uv image with Python 3.12 pre-installed.
ENV UV_COMPILE_BYTECODE=1 UV_LINK_MODE=copy: Enables bytecode compilation and configures uv to copy files rather than link them, just as in the previous Dockerfiles.
ENV UV_PYTHON_DOWNLOADS=0: Disables downloading Python from external sources, forcing the use of the pre-installed system Python.
WORKDIR /app: Sets the working directory inside the container to /app.
RUN --mount=type=cache…: Installs dependencies using uv sync in a cached manner, like the previous Dockerfile.
ADD . /app: Adds source code to the container.
RUN uv sync --frozen --no-dev: This second RUN command installs the project, like in the previous image.
FROM python:3.12-slim-bookworm: Switches to a slim Python 3.12 image for the final runtime image.
COPY --from=builder /app /app: Copies the application from the builder stage.
ENV PATH="/app/.venv/bin:$PATH": Adds the .venv directory to the PATH environment variable, ensuring the container uses the correct Python and dependencies.
CMD: Starts the FastAPI development server.
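The Dockerfile comment about matching images is worth a closer look. A virtual environment records where its base interpreter lives (the home key in pyvenv.cfg), and .venv/bin/python typically points at that location, so the final image has to provide Python at the same path. Below is a small sketch for reading that setting; read_pyvenv_cfg is a hypothetical helper and the file contents are made up for illustration.

```python
import pathlib
import tempfile


def read_pyvenv_cfg(venv_dir):
    """Parse the `key = value` lines of a virtual environment's pyvenv.cfg."""
    cfg = {}
    text = (pathlib.Path(venv_dir) / "pyvenv.cfg").read_text()
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


# Made-up pyvenv.cfg contents, for illustration only
with tempfile.TemporaryDirectory() as tmp:
    (pathlib.Path(tmp) / "pyvenv.cfg").write_text(
        "home = /usr/local/bin\nversion_info = 3.12.0\n"
    )
    cfg = read_pyvenv_cfg(tmp)
    print(cfg["home"])
```

Inside the built image, read_pyvenv_cfg("/app/.venv") would show the directory in which the runtime stage needs to have a Python executable.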

What are the Main Differences Between the Images?

Single-Stage Dockerfile:

Structure: All operations (dependency installation, code addition, environment setup) happen within one stage.
Image Size: Typically the largest, since both build and runtime dependencies (e.g., compilers, Python environment, etc.) remain in the final image.
Build Time: It might take longer since all steps, including installation and code addition, happen within a single stage, requiring a complete rebuild when changes are made to any part of the build process.
Modularity: Less modular. Every change to the application or dependencies requires a rebuild of the entire image.

Standalone Dockerfile:

Structure: This uses two stages: the builder stage (which uses the uv image to install dependencies) and the final stage (which copies over the built application and Python environment from the builder).
Image Size: The final image contains only the runtime environment and the application code, excluding unnecessary build tools. However, it still needs Python to be installed from the builder image, which may result in a somewhat larger final image than the multistage approach.
Build Time: Slightly faster than the single-stage image because the standalone uses a dedicated builder stage for dependency installation, which allows better layer caching and separation of concerns.
Modularity: More modular. Each stage is responsible for a specific task, allowing you to optimize each step and cache parts of the build.

Multi-Stage Dockerfile:

Structure: This uses two stages: one for building and one for the final application image. The builder stage installs dependencies and compiles the app, while the final stage copies only the necessary parts into a slim runtime image.
Image Size: The smallest image size because it only includes the final necessary components: application code and runtime environment (no build tools, no redundant Python installation).
Build Time: Build times are comparable to the Standalone approach since both use multi-stage builds. However, rebuilds can be more efficient when only application code changes, as the Python environment in the final stage doesn’t need to be copied from the builder.
Modularity: The final runtime image is as slim as possible, making it an excellent choice for production.

Benchmarking Image Size and Build Time

To validate the differences, a benchmark was run on image size and build time across three Docker strategies: Single-Stage, Standalone, and Multi-Stage.

In all three builds, the bulk of the image size (~2.89 GB) comes from the application itself and its installed dependencies, typically residing in .venv/. This layer is identical across builds, which is why the overall image sizes are relatively close despite the structural differences.

However, as the application grows (more dependencies, larger codebase, etc.), the differences between the image sizes (and also build time) will become more pronounced. The multi-stage approach will continue to offer the most significant reduction in size, while the single-stage build will likely see a larger overall size increase.

Single-Stage

What’s happening?
Everything (runtime, build tools, and dependencies) gets packed into a single image.
Image Size: 🟥 Largest
Build Time: 🟥 Slowest
Why?
All dependencies (compilers, wheels, Python runtime, and your app) remain in the final image. There’s no separation between build environment and runtime environment.

Standalone

What’s happening?
Uses two stages: builds with uv in the first stage, then copies both Python runtime and application to a clean final image.
Image Size: 🟨 Medium
Build Time: 🟨 Slightly faster
Why?
More lightweight than the single-stage version since it discards build tools. The final image contains just the application, and a full Python installation copied from the builder.

Multi-Stage

What’s happening?
Uses two stages: builds with uv in the first stage, then copies only the application to a slim image with system Python already installed.
Image Size: 🟩 Smallest
Build Time: 🟩 Most efficient for production
Why?
The final image has only what’s necessary to run the app, no build tools, no dev dependencies, just your app and the minimal system Python environment.

Which One Should You Use in Production?

Single-Stage: Best for quick prototypes or local development of small applications. Not ideal for production due to larger image size and inclusion of unnecessary build tools.

Standalone: Excellent choice when you need precise control over Python versions across environments. Provides a good balance of clean runtime environment and consistent Python installation. Recommended for production when specific Python versions or configurations are critical.

Multi-Stage: The optimal choice for production deployments where minimal image size and security are priorities. Creates the leanest possible container by using system Python in the final image and including only what’s necessary to run your application.

Conclusion

Dockerizing Python applications has always required trade-offs, but with tools like uv, we’re entering a new era of faster builds, cleaner environments, and more efficient dependency management. What stands out across the three approaches isn’t just their technical differences, but the flexibility they offer depending on your team’s priorities.

The real takeaway? There’s no one-size-fits-all Dockerfile. Instead, uv gives you the tools to fine-tune your container strategy, from quick dev spins to production-grade minimalism. Whether you’re chasing performance, reproducibility, or portability, uv-based Dockerfiles let you build with confidence and clarity.

The whole code repository to reproduce the benchmark can be found here.

Building a serverless GenAI API with FastAPI, AWS, and CircleCI

Benito Martin — Wed, 12 Mar 2025 19:46:10 +0000

The advancement of AI has empowered businesses to incorporate intelligent automation into their applications. A serverless Generative AI (GenAI) API enables developers to harness cutting-edge AI models without the burden of infrastructure management. This guide walks you through building a scalable and cost-effective GenAI API using FastAPI, a high-performance Python framework with built-in async support and seamless AWS integration. By deploying FastAPI on AWS Lambda with AWS API Gateway, you can create a fully managed, pay-per-use architecture that eliminates server maintenance.

To simplify development and deployment, you will set up a Continuous Integration and Continuous Deployment (CI/CD) pipeline with CircleCI, automating testing, building, and deployment. With CircleCI’s GitHub integration, you’ll achieve continuous delivery, reducing errors and accelerating development cycles. This combination of FastAPI, AWS Lambda, and CircleCI ensures a robust, scalable, and efficient GenAI API ready for real-world applications.

You can check out the complete source code on GitHub, but this tutorial will guide you to build it from scratch.

Prerequisites

Before diving into the process of building a serverless GenAI API, there are several prerequisites you need to have in place. Here is a breakdown of what you will need:

AWS Account: You will need an active AWS account to deploy the serverless application using AWS services like Lambda and API Gateway.
AWS CLI: To install and configure the AWS CLI, follow the instructions in the AWS CLI documentation. Once installed, configure it with aws configure and provide your AWS access key, secret key, region, and output format.
Basic Understanding of RESTful APIs, FastAPI, and GenAI Models: This project assumes a basic understanding of RESTful APIs, FastAPI, and GenAI models. REST APIs enable communication between clients (like web or mobile apps) and servers, while FastAPI is a fast, modern Python framework for building APIs with automatic documentation generation. GenAI models, such as OpenAI’s GPT, generate human-like text and other outputs, and in this project, you will integrate OpenAI into the API to provide responses to user queries.
GitHub and CircleCI Accounts: You will need a GitHub account to host your project’s repository and a CircleCI account to automate testing and deployment through CI/CD.
OpenAI API Key: To access OpenAI’s GPT models, you will need an API key. You can sign up for an API key on the OpenAI website.

Setting Up the FastAPI GenAI Server

FastAPI is a modern, high-performance web framework for building APIs with Python. It is particularly well-suited for LLM-based APIs due to its speed, simplicity, and support for asynchronous operations, which enable handling multiple requests efficiently. For this project, you will integrate OpenAI’s GPT-4o-mini model via its API to generate AI-driven responses with minimal setup.

Installing Dependencies and GenAI Libraries

First, clone the repository containing the project code.

git clone https://github.com/CIRCLECI-GWP/genai-aws-circleci.git
cd genai-aws-circleci

Then, install uv, a fast Python package manager from Astral, to use instead of pip. Written in Rust, uv is chosen for its speed, efficient dependency resolution, and built-in support for managing virtual environments. You can install it using the following command.

curl -LsSf https://astral.sh/uv/install.sh | sh

Once you have installed uv, run the following commands to install dependencies and activate the virtual environment.

uv sync
source .venv/bin/activate

The uv sync command will:

Install the dependencies defined in pyproject.toml.
Automatically create a virtual environment (.venv).

Finally, create a .env file in the root directory of your repository and add your OPENAI_API_KEY to the file.

OPENAI_API_KEY=your-openai-key
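The application later calls load_dotenv() from python-dotenv, which reads KEY=VALUE lines like the one above and places them in the process environment. To illustrate the idea, here is a deliberately simplified parser written with the standard library only. It is not python-dotenv's implementation (the real library also handles quoting rules, export prefixes, and variable interpolation), and parse_env_text is a hypothetical name.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of matching-style quotes
        values[key.strip()] = value.strip().strip("'\"")
    return values


sample = "# local secrets\nOPENAI_API_KEY=your-openai-key\n"
print(parse_env_text(sample))  # {'OPENAI_API_KEY': 'your-openai-key'}
```

Keep the .env file out of version control (for example through .gitignore), since it holds a real credential once you replace the placeholder.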

Define Endpoints

With dependencies installed, you can now define the FastAPI endpoints to interact with the GPT-4o-mini model. This implementation, found in main.py, does not yet include AWS integration, which you will cover in the next section.

Code Breakdown:

PromptRequest (Pydantic Model): Defines the expected structure of incoming requests. It ensures that each request contains a prompt string.
get_openai_api_key(): Retrieves the OpenAI API key from the environment variables file. If the key is missing, it raises an HTTPException to prevent unauthorized API calls.
get_openai_client(): Uses get_openai_api_key() to fetch the API key and initialize the OpenAI client. If initialization fails, an exception is raised.
Root Endpoint (/): A simple health check that confirms the API is running.
Generate Endpoint (/generate): Accepts a POST request containing a prompt, passes it to OpenAI’s GPT-4o-mini, and returns the generated response. It depends on get_openai_client() to ensure a valid API connection.
OpenAI API Call: Uses chat.completions.create() to send the user’s prompt to OpenAI and returns the generated response.

from fastapi import FastAPI, HTTPException, Depends
from pydantic import BaseModel
from openai import OpenAI
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Initialize FastAPI
app = FastAPI()

# Pydantic model to define expected structure of request
class PromptRequest(BaseModel):
    """Model for request validation."""
    prompt: str

def get_openai_api_key():
    """Return the OpenAI API key from the environment, or raise a 500 error."""
    api_key = os.environ.get("OPENAI_API_KEY")

    if not api_key:
        raise HTTPException(status_code=500, detail="OPENAI_API_KEY not found in environment variables")
    return api_key

def get_openai_client():
    """Create an OpenAI client using the configured API key."""
    try:
        api_key = get_openai_api_key()
        return OpenAI(api_key=api_key)
    except HTTPException as e:
        raise HTTPException(status_code=500, detail="Failed to initialize OpenAI client: " + str(e.detail))


@app.get("/")
async def root():
    """Root endpoint to confirm API is running."""
    return {"message": "Welcome to the GenAI API"}

@app.post("/generate")
async def generate_text(request: PromptRequest, client: OpenAI = Depends(get_openai_client)):

    if not client:
        raise HTTPException(status_code=500, detail="OpenAI API client not initialized.")

    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  
            messages=[{"role": "user", "content": request.prompt}],
            max_tokens=200
        )

        if not response.choices:
            raise ValueError("No response received from OpenAI API.")

        return {"response": response.choices[0].message.content}

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# Run the app with uvicorn
if __name__ == "__main__":
    import uvicorn
    uvicorn.run("main:app", host="127.0.0.1", port=8000, reload=True)

Running the FastAPI application

To run the FastAPI application, execute the command below:

uv run main.py

The command will start the FastAPI server locally at http://127.0.0.1:8000, and the reload=True argument passed to uvicorn.run enables hot-reloading during development. You can use the cURL command below to make a POST request to the /generate endpoint with a prompt.

curl -X 'POST' 'http://127.0.0.1:8000/generate' \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "Tell me a fun fact about AI"}'

You should receive a response like this:

{ "response": "AI was first introduced as a field in 1956 at a conference at Dartmouth College. It was the birth of modern artificial intelligence!" }
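If you prefer to call the endpoint from Python instead of cURL, the same request can be sketched with the standard library. This is a minimal sketch, not part of the original project: build_generate_request and send_request are hypothetical helper names, and the base URL assumes the local server started above.

```python
import json
import urllib.request


def build_generate_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /generate endpoint."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

With the server running, send_request(build_generate_request("http://127.0.0.1:8000", "Tell me a fun fact about AI")) should return a dictionary with a "response" key, mirroring the cURL call.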

The FastAPI-based GenAI API is now ready for local testing. Next, you will integrate AWS Lambda and API Gateway for serverless deployment.

Deploying FastAPI to AWS Lambda

To deploy the FastAPI GenAI server to AWS Lambda, you will need to set up a few key components:

Mangum for making FastAPI compatible with AWS Lambda
The Lambda function handler
AWS API Gateway to expose the FastAPI endpoints
The OPENAI_API_KEY added into AWS Secrets Manager

Mangum is a Python library that allows ASGI applications (like FastAPI) to run on AWS Lambda. It acts as an adapter, making FastAPI compatible with AWS Lambda’s event-driven architecture and API Gateway.
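To make the adapter idea concrete, here is a toy sketch of what such a layer does: it translates an API Gateway proxy event into an ASGI scope, runs the app, and collects the ASGI messages into a Lambda-style response. This is an illustration only and not how Mangum is actually implemented; tiny_asgi_app and toy_lambda_adapter are hypothetical names, and the event contains just the fields the sketch needs.

```python
import asyncio
import json


async def tiny_asgi_app(scope, receive, send):
    """A bare ASGI app standing in for FastAPI; it echoes the method and path."""
    body = json.dumps({"path": scope["path"], "method": scope["method"]}).encode()
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"application/json")]})
    await send({"type": "http.response.body", "body": body})


def toy_lambda_adapter(app):
    """Return a Lambda-style handler(event, context) that drives an ASGI app."""
    def handler(event, context):
        # Translate the API Gateway proxy event into an ASGI HTTP scope.
        scope = {
            "type": "http",
            "method": event["httpMethod"],
            "path": event["path"],
            "headers": [(k.lower().encode(), v.encode())
                        for k, v in (event.get("headers") or {}).items()],
            "query_string": b"",
        }
        request_body = (event.get("body") or "").encode()
        result = {"statusCode": 500, "body": ""}

        async def receive():
            return {"type": "http.request", "body": request_body, "more_body": False}

        async def send(message):
            # Collect the ASGI response messages into a Lambda proxy response.
            if message["type"] == "http.response.start":
                result["statusCode"] = message["status"]
            elif message["type"] == "http.response.body":
                result["body"] += message.get("body", b"").decode()

        asyncio.run(app(scope, receive, send))
        return result

    return handler


handler = toy_lambda_adapter(tiny_asgi_app)
```

Calling handler({"httpMethod": "GET", "path": "/", "headers": {}, "body": None}, None) returns a dictionary with statusCode and body keys. In the real project, Mangum takes care of this translation, along with many details the sketch ignores.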

Creating an AWS Lambda Function Handler with Mangum

Once your FastAPI application is set up locally, you will need to wrap it in a handler that AWS Lambda can invoke when requests come in via API Gateway. This is where Mangum comes in. Modify your main.py by importing Mangum and wrapping the FastAPI app. Add the handler right after defining your endpoints.

import mangum  

# Create the handler for AWS Lambda
handler = mangum.Mangum(app)

When your app runs on AWS, the OPENAI_API_KEY must be accessed securely. You can store it in AWS Secrets Manager and update main.py so that the app reads the key from the right source depending on where it runs.

The command below securely stores the OPENAI_API_KEY in AWS Secrets Manager, ensuring that sensitive credentials are not hardcoded in the application.

create-secret: Creates a new secret in AWS Secrets Manager.
--name: Specifies the unique name of the secret.
--description: Provides a brief description of the secret.
--secret-string: Stores the actual secret as a JSON object, where YOUR_OPENAI_API_KEY should be replaced with the actual API key.

aws secretsmanager create-secret \
    --name openai/api_key \
    --description "OpenAI API Key for GenAI API" \
    --secret-string '{"OPENAI_API_KEY":"YOUR_OPENAI_API_KEY"}'

Once stored, the application can retrieve this secret dynamically.

Then, update the get_openai_api_key function in the main.py file to allow retrieval of the key from the .env file when running locally and from AWS Secrets Manager when running on Lambda.

Code Breakdown:

If running on AWS Lambda (detected via AWS_LAMBDA_FUNCTION_NAME):
- It fetches the API key securely from AWS Secrets Manager.
- A Secrets Manager client is created, and the stored secret (openai/api_key) is retrieved and parsed.
If running locally:
- It loads the API key from the .env file via environment variables.

import boto3  
import json

def get_openai_api_key():

    # Check if running locally or in Lambda
    if os.environ.get("AWS_LAMBDA_FUNCTION_NAME"):
        # Running in Lambda, get key from AWS Secrets Manager
        secret_name = "openai/api_key"

        try:
            # Create a Secrets Manager client
            session = boto3.session.Session()
            client = session.client(service_name='secretsmanager', region_name="eu-central-1")

            # Get the secret API Key
            get_secret_value_response = client.get_secret_value(SecretId=secret_name)
            secret = get_secret_value_response['SecretString']
            secret_dict = json.loads(secret)

            api_key = secret_dict.get("OPENAI_API_KEY")
            if not api_key:
                raise KeyError("OPENAI_API_KEY not found in Secrets Manager.")

            return api_key
        except Exception as e:
            raise HTTPException(status_code=500, detail="Failed to retrieve API key from Secrets Manager")
    else:
        # Running locally, get key from .env file
        api_key = os.environ.get("OPENAI_API_KEY")

        if not api_key:
           raise HTTPException(status_code=500, detail="OPENAI_API_KEY not found in environment variables")

        logger.info("Successfully retrieved OpenAI API key from .env file.")
        return api_key
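The parsing step inside get_openai_api_key can also be isolated into a small pure function, which makes it easy to check without touching AWS. The following is a minimal sketch under that assumption; extract_api_key is a hypothetical helper and is not part of the original main.py.

```python
import json


def extract_api_key(secret_string: str, field: str = "OPENAI_API_KEY") -> str:
    """Parse a Secrets Manager SecretString and return the requested field."""
    try:
        secret = json.loads(secret_string)
    except json.JSONDecodeError as exc:
        raise ValueError("SecretString is not valid JSON") from exc

    # The secret was stored as a JSON object, so anything else is unexpected.
    if not isinstance(secret, dict) or not secret.get(field):
        raise KeyError(f"{field} not found in secret")
    return secret[field]
```

The input is the same JSON string that was passed to --secret-string when the secret was created, so a missing or misspelled field surfaces as a clear error.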

Testing and Validating the API

Testing and validating the API is crucial to ensure it functions correctly before you deploy it. Below are several tests that use the pytest and unittest packages. The unit tests check that the app runs both locally and in AWS Lambda, ensuring that requests work in both setups.

These tests validate the core functionality of the FastAPI-based GenAI server by covering different scenarios:

Basic API Functionality: Tests the root (/) endpoint and the /generate endpoint with a valid prompt.
Input Validation: Ensures that invalid input (e.g., missing prompt) returns appropriate error responses.
Error Handling: Mocks scenarios such as missing API keys and verifies that the API correctly returns error messages.
Mocking External Dependencies: Uses unittest.mock.patch to simulate OpenAI API calls and AWS Secrets Manager, ensuring API integration works as expected without relying on actual external services.

from fastapi.testclient import TestClient
from fastapi import HTTPException
from unittest.mock import patch, MagicMock
from main import app
import pytest
import os


@pytest.fixture
def client():
    """Fixture for FastAPI test client"""
    return TestClient(app)

def test_root_endpoint(client):
    """Test the root endpoint"""
    response = client.get("/")
    assert response.status_code == 200
    assert response.json() == {"message": "Welcome to the GenAI API"}

def test_generate_endpoint(client):
    """Test the generate endpoint"""

    response = client.post("/generate", json={"prompt": "Tell me a joke"})

    # Assert a successful response with non-empty text
    response_data = response.json()
    assert response.status_code == 200
    assert "response" in response_data
    assert isinstance(response_data["response"], str)
    assert len(response_data["response"]) > 0

def test_generate_invalid_input(client):
    """Test the generate endpoint with invalid input"""
    # Test with missing prompt field
    response = client.post("/generate", json={})

    # Assert validation error
    assert response.status_code == 422 # Unprocessable Entity
    assert "prompt" in response.json()["detail"][0]["loc"]

@patch("main.get_openai_api_key") # Patch the get_openai_api_key function in main.py
def test_generate_text_missing_api_key(mock_get_api_key, client):
    """Test the generate endpoint when the API key is missing"""

    # Setup mock to raise an HTTPException
    mock_get_api_key.side_effect = HTTPException(status_code=500, detail="API key not found")

    # Test with a sample prompt
    response = client.post("/generate", json={"prompt": "Tell me a joke"})

    # Assert error status code
    assert response.status_code == 500 # Internal Server Error
    assert "API key not found" in response.json()["detail"]

# Test function to mock OpenAI client behavior
@patch("main.get_openai_client")  # Patch the get_openai_client function in main.py
def test_mock_client(mock_get_client):
    """Test the OpenAI client behavior with a simplified mock client"""

    # Set up the mock OpenAI client and the mock response in one go
    mock_response = MagicMock()
    mock_response.choices = [
        MagicMock(
            message=MagicMock(content="Mock response")  # Directly mock the message and its content
        )
    ]

    # When `chat.completions.create()` is called, return the mock response
    mock_get_client.return_value.chat.completions.create.return_value = mock_response


    # Simulate calling the OpenAI client's `chat.completions.create()`
    result = mock_get_client.return_value.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Tell me a joke"}],
        max_tokens=200
    )

    # Assert the mock response
    assert result == mock_response
    assert result.choices[0].message.content == "Mock response"


@patch("boto3.session.Session")
def test_get_openai_api_key_aws_environment(mock_session, client):
    """Test retrieving API key from AWS Secrets Manager"""

    # Set up environment to simulate AWS Lambda
    with patch.dict(os.environ, {"AWS_LAMBDA_FUNCTION_NAME": "test-function"}, clear=True):

        # Create mock for the entire boto3 session and client chain
        mock_client = MagicMock()
        mock_session.return_value.client.return_value = mock_client

        # Mock the get_secret_value response
        mock_response = {
            'SecretString': '{"OPENAI_API_KEY": "test-api-key"}'
        }
        mock_client.get_secret_value.return_value = mock_response

        # Call the function under test
        from main import get_openai_api_key
        api_key = get_openai_api_key()

        # Assertions
        mock_session.assert_called_once()
        mock_session.return_value.client.assert_called_with(
            service_name='secretsmanager',
            region_name="eu-central-1"
        )
        mock_client.get_secret_value.assert_called_with(SecretId="openai/api_key")
        assert api_key == "test-api-key"

Mocking is an essential technique for testing app behavior before deploying it to production environments. It helps simulate API interactions, allowing you to check how the application would respond under various conditions without making real calls to external services.
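As a small illustration of simulating different conditions, the sketch below uses MagicMock to stand in for a Secrets Manager client in both a healthy and a failing state. The load_api_key function is a hypothetical helper written for this example (it returns None on failure, unlike the main.py code, which raises an HTTPException).

```python
import json
from unittest.mock import MagicMock


def load_api_key(client, secret_id: str = "openai/api_key"):
    """Return the API key, or None when the secret cannot be read."""
    try:
        response = client.get_secret_value(SecretId=secret_id)
        return json.loads(response["SecretString"]).get("OPENAI_API_KEY")
    except Exception:
        return None


# Healthy condition: the mocked client returns a well-formed secret.
healthy_client = MagicMock()
healthy_client.get_secret_value.return_value = {
    "SecretString": json.dumps({"OPENAI_API_KEY": "test-api-key"})
}

# Failure condition: side_effect makes the mocked call raise an error.
failing_client = MagicMock()
failing_client.get_secret_value.side_effect = RuntimeError("simulated outage")

healthy_result = load_api_key(healthy_client)
failing_result = load_api_key(failing_client)
```

Setting return_value covers the happy path, while side_effect lets you exercise the error path without a real outage.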

API Deployment to AWS with AWS SAM

To expose your FastAPI endpoints using AWS API Gateway, you will use AWS Serverless Application Model (AWS SAM). AWS SAM streamlines building and deploying serverless applications on AWS by providing a simplified syntax for defining AWS resources such as Lambda functions, API Gateway, IAM roles, and other related services, all within a template.yaml file.

Key components of the template.yaml file:

Lambda Function: The serverless function that will execute the FastAPI application logic.
API Gateway: The API Gateway exposes the FastAPI application as HTTP endpoints.
Secrets Manager: Stores OpenAI API Key securely, which will be retrieved by Lambda.
Policies: Defines necessary IAM roles and policies that allow Lambda to interact with other AWS services (e.g., Secrets Manager).

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: GenAI API with FastAPI and Lambda

# Global variables
Globals:
  Function: # Lambda function resources in the template
    Timeout: 30
    MemorySize: 256
    Runtime: python3.11
    Architectures:
      - x86_64
    Environment:
      Variables:
        OPENAI_API_KEY_SECRET_ARN: !Sub 'arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:openai/api_key-*'
  Api:
    EndpointConfiguration: REGIONAL
    Cors:
      AllowMethods: "'*'"
      AllowHeaders: "'Content-Type,Authorization'"
      AllowOrigin: "'*'"

# AWS resources that will be created
Resources:
  # API Gateway
  GenAIApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: dev
      EndpointConfiguration: REGIONAL
      Cors:
        AllowMethods: "'*'"
        AllowHeaders: "'Content-Type,Authorization'"
        AllowOrigin: "'*'"

  # Lambda function
  GenAIFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./app/
      Handler: main.handler
      Description: FastAPI GenAI service using OpenAI API
      Policies:
        - AWSLambdaBasicExecutionRole
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - secretsmanager:GetSecretValue
              Resource: !Sub 'arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:openai/api_key-*'

      Environment:
        Variables:
          OPENAI_API_KEY_SECRET_ARN: !Sub 'arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:openai/api_key-*'    
      Events:
        RootPath:
          Type: Api
          Properties:
            RestApiId: !Ref GenAIApi
            Path: /       
            Method: ANY   
        GeneratePath:
          Type: Api
          Properties:
            RestApiId: !Ref GenAIApi
            Path: /generate   
            Method: ANY      

Outputs:
  GenAIApiEndpoint:
    Description: API Gateway endpoint URL for the GenAI service
    Value: !Sub 'https://${GenAIApi}.execute-api.${AWS::Region}.amazonaws.com/dev/'
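The Outputs section assembles the endpoint URL from the API ID, the region, and the stage name. The same composition can be sketched in Python, which may help when you need to build URLs for individual paths; api_gateway_url is a hypothetical helper name used only for this illustration.

```python
def api_gateway_url(api_id: str, region: str, stage: str, path: str = "") -> str:
    """Compose an API Gateway invoke URL in the same shape as the template output."""
    base = f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}/"
    # Strip a leading slash so "/generate" and "generate" give the same result.
    return base + path.lstrip("/")
```

For example, api_gateway_url("znhxj2t415", "eu-central-1", "dev", "/generate") yields the URL used in the cURL test later in this guide.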

Deploying the FastAPI Application

Once the template.yaml file is ready, the next step is to deploy your application using AWS SAM. Before deploying, you will need to create a Lambda deployment package that includes both the application code (main.py) and the necessary dependencies.

To make this easier, you will use a bash script (build-sam.sh) to automate the process. The script creates a folder named app, writes the project's installed dependencies to a requirements.txt file, and copies both main.py and requirements.txt into that folder. AWS SAM then uses requirements.txt to install the dependencies when it builds the function.

#!/bin/bash
set -e
echo "🚀 Building Lambda deployment package..."

# Create the build directory (app/) used as the Lambda CodeUri
LAMBDA_PACKAGE="$(pwd)/app"
mkdir -p "$LAMBDA_PACKAGE"

# Save dependencies to requirements.txt
echo "📦 Saving dependencies to requirements.txt..."
uv pip freeze > requirements.txt  

# Copy application code and project configuration
echo "📋 Copying application code and project configuration..."
cp main.py requirements.txt "$LAMBDA_PACKAGE/"

echo "✅ Build complete! Your Lambda package is ready for deployment."

This script will ensure that your Lambda deployment package includes both the application code and the necessary dependencies for running your application in AWS Lambda.
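If you would rather keep the build step in Python, the copy portion of the script could be sketched as follows. This is an alternative illustration, not the project's build script: stage_lambda_package is a hypothetical name, and the sketch assumes requirements.txt has already been generated.

```python
import shutil
from pathlib import Path


def stage_lambda_package(project_dir: Path, package_dir: Path,
                         files=("main.py", "requirements.txt")) -> list:
    """Copy the listed files into package_dir, failing early if one is missing."""
    missing = [name for name in files if not (project_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing files: {', '.join(missing)}")

    # Equivalent of `mkdir -p` followed by `cp` in the bash script.
    package_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(project_dir / name, package_dir / name)) for name in files]
```

Calling stage_lambda_package(Path.cwd(), Path.cwd() / "app") from the project root would mirror the copy step of build-sam.sh.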

Once your Lambda deployment package is ready, you can use the following AWS SAM commands (from the root directory, where the template.yaml is located) to build and deploy the application.

sam build
sam deploy --stack-name sam-app --resolve-s3 --capabilities CAPABILITY_IAM --region eu-central-1

The sam build command will build the Lambda package and create a .aws-sam directory. The sam deploy command will deploy the app to AWS and create the necessary AWS resources (Lambda, API Gateway, and IAM roles).

Here is what each flag does:

--stack-name sam-app specifies the name of the CloudFormation stack: sam-app.
--resolve-s3 automatically creates and manages an Amazon S3 bucket for storing deployment artifacts.
--capabilities CAPABILITY_IAM acknowledges that the deployment may create IAM roles and policies.
--region eu-central-1 specifies the AWS region where the stack should be deployed. You can change this to your region.
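If you ever drive the deployment from a Python script, the same flags can be assembled as an argument list. The sketch below only builds and prints the command and does not run it; sam_deploy_command is a hypothetical helper, and passing the list to subprocess.run is left to you.

```python
import shlex


def sam_deploy_command(stack_name: str, region: str) -> list:
    """Assemble the sam deploy invocation used in this guide as an argv list."""
    return [
        "sam", "deploy",
        "--stack-name", stack_name,
        "--resolve-s3",
        "--capabilities", "CAPABILITY_IAM",
        "--region", region,
    ]


command = sam_deploy_command("sam-app", "eu-central-1")
print(shlex.join(command))
```

Keeping the stack name and region as parameters makes it straightforward to reuse the command for another stack or region.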

Once the deployment is complete, you will get an API Gateway URL in the output.

You can test the /generate endpoint by making a POST request using curl.

curl -X POST https://znhxj2t415.execute-api.eu-central-1.amazonaws.com/dev/generate \
     -H "Content-Type: application/json" \
     -d '{"prompt": "Tell me a joke!"}'

You should receive a response like this:

{"response":"Why did the scarecrow win an award? \n\nBecause he was outstanding in his field!"}

You can monitor the logs for both AWS Lambda and API Gateway in AWS CloudWatch to ensure everything is running smoothly.

Automating Deployment with CircleCI

CI/CD is a critical aspect of modern software development that automates the process of testing and deploying code. For serverless applications, such as those deployed using AWS Lambda, API Gateway, and other cloud services, CI/CD pipelines provide the following benefits:

Automation: It reduces human errors by automating testing and deployment.
Faster Delivery: Features and updates are delivered faster with automated pipelines.
Consistency: Ensures the same deployment process across environments (dev, staging, production).
Scalability: Serverless architectures naturally scale, and a CI/CD pipeline ensures deployments can handle this scaling seamlessly.

Next, you will set up CircleCI to automate the testing and deployment of your FastAPI serverless application to AWS.

Setting Up AWS Credentials and OpenAI API Key in CircleCI

Before CircleCI can deploy your serverless application to AWS, you need to configure your AWS credentials and OPENAI_API_KEY. In your CircleCI account, set up a project and link it to your GitHub repository. Then, under project settings, add the environment variables.

These variables will be used by CircleCI during the CI/CD process to authenticate with AWS and OpenAI.
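A quick way to catch a missing variable early is a small pre-flight check. The sketch below is an assumption-heavy illustration: OPENAI_API_KEY is named in this guide, while the three AWS names follow the standard AWS CLI environment variables and may differ from what you configured in CircleCI. missing_variables is a hypothetical helper.

```python
import os

REQUIRED_VARIABLES = (
    "AWS_ACCESS_KEY_ID",      # assumed name (AWS CLI convention)
    "AWS_SECRET_ACCESS_KEY",  # assumed name (AWS CLI convention)
    "AWS_DEFAULT_REGION",     # assumed name (AWS CLI convention)
    "OPENAI_API_KEY",         # named in this guide
)


def missing_variables(environ=None, required=REQUIRED_VARIABLES) -> list:
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]
```

Running missing_variables() in a pipeline step and failing when the list is not empty gives a clearer error than a failed AWS or OpenAI call later on.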

Creating the `config.yml` File for CircleCI

Now that you have your credentials and environment variables set up, you need to create a config.yml file for CircleCI that will automate the testing and deployment of your serverless application. This configuration file should be placed under the .circleci directory in the root of your repository. Create the folder and add the YAML file to it.

version: 2.1

orbs:
  python: circleci/python@3.0.0
  aws-cli: circleci/aws-cli@5.2.0

jobs:
  build-deploy:
    docker:
      - image: cimg/python:3.11
    steps:
      - checkout  # Checkout the code from the repository

      - run:
          name: Install uv
          command: curl -LsSf https://astral.sh/uv/install.sh | sh  # Install uv package

      - run:
          name: Install the project
          command: uv sync   # Sync the project using uv

      - run:
          name: Run tests
          command: uv run pytest  

      # Using AWS CLI orb for setup instead of manual installation
      - aws-cli/setup:
          profile_name: default

      - run:
          name: Install AWS SAM CLI
          command: |
            # Download and install AWS SAM CLI for Linux
            curl -Lo sam-cli-linux-x86_64.zip https://github.com/aws/aws-sam-cli/releases/latest/download/aws-sam-cli-linux-x86_64.zip
            unzip sam-cli-linux-x86_64.zip -d sam-installation  # Unzip the SAM CLI
            sudo ./sam-installation/install  # Run the SAM CLI installation script

      - run:
          name: Build Lambda Deployment Package
          command: |
            chmod +x build-sam.sh
            ./build-sam.sh  # Execute the build script

      - run:
          name: Build with SAM
          command: sam build 

      - run:
          name: Verify AWS Credentials
          command: |
            aws sts get-caller-identity  # Verify AWS credentials are set up correctly

      - run:
          name: Deploy with SAM
          command: |
            sam deploy --stack-name sam-app --resolve-s3 --capabilities CAPABILITY_IAM --region eu-central-1  # Deploy with SAM

      - run:
          name: Output API Gateway URL
          command: |
            API_URL=$(aws cloudformation describe-stacks --stack-name sam-app --region eu-central-1 --query "Stacks[0].Outputs[?OutputKey=='GenAIApiEndpoint'].OutputValue" --output text)
            echo "GenAI API URL: $API_URL"  # Output the API Gateway URL

workflows:
  version: 2
  deploy:
    jobs:
      - build-deploy:
          context: aws  # Add this to use AWS credentials from CircleCI context

Breakdown of the config.yml file:

Orbs: The configuration uses two orbs:
- python: The CircleCI Python orb handles setting up the Python environment.
- aws-cli: The AWS CLI orb simplifies the setup of AWS CLI to interact with AWS services.
Jobs:
- build-deploy: This job defines all the steps needed to deploy your application: installing the project dependencies, running the tests, setting up the AWS CLI and AWS SAM, verifying credentials, and building and deploying the Lambda package with AWS SAM.
Workflows:
- The deploy workflow runs the build-deploy job whenever changes are pushed to the repository.
- The context: aws ensures that the AWS credentials stored in CircleCI are available during the deployment process.

After you create the project, the pipeline will be triggered. If everything is set up correctly, you should see a green build. Click the pipeline to open it; the API Gateway URL appears at the end of the deployment output.
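The aws cloudformation describe-stacks call returns a JSON document describing the stack, and the endpoint is one entry in its Outputs list. If you want to pick it out in Python, a minimal sketch could look like this; find_stack_output is a hypothetical helper, and the sample document is illustrative data rather than real output.

```python
import json


def find_stack_output(describe_stacks_json: str, output_key: str):
    """Return the OutputValue for output_key, or None if it is not present."""
    stacks = json.loads(describe_stacks_json).get("Stacks", [])
    for stack in stacks:
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == output_key:
                return output.get("OutputValue")
    return None


# Illustrative response with a placeholder API ID (not real output).
sample = json.dumps({
    "Stacks": [{
        "StackName": "sam-app",
        "Outputs": [{
            "OutputKey": "GenAIApiEndpoint",
            "OutputValue": "https://abc123.execute-api.eu-central-1.amazonaws.com/dev/",
        }],
    }]
})

api_url = find_stack_output(sample, "GenAIApiEndpoint")
```

The GenAIApiEndpoint key matches the output name defined in template.yaml, so the lookup keeps working as long as that name stays the same.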

As with the previous deployment, you can test the /generate endpoint by making a POST request using curl.

curl -X POST https://jz82zg9i21.execute-api.eu-central-1.amazonaws.com/dev/generate \
     -H "Content-Type: application/json" \
     -d '{"prompt": "Tell me a joke!"}'

You should receive a response like this:

{"response":"Why did the scarecrow win an award? \n\nBecause he was outstanding in his field!"}

Clean Up

If you do not need the app anymore, you can delete it with the following command.

sam delete

You will be prompted to enter the stack name (sam-app) and to confirm the deletion.

Conclusion

Automating the deployment of serverless applications, such as those using FastAPI and AWS Lambda, significantly enhances the efficiency and reliability of the development process. By leveraging CircleCI for Continuous Integration and Continuous Deployment (CI/CD), you ensure that every code change is automatically tested, built, and deployed to AWS. This eliminates manual intervention, reduces the risk of human error, and accelerates the delivery of new features and updates.

The combination of AWS SAM, CircleCI, and serverless architecture not only streamlines the deployment but also scales seamlessly with demand, allowing you to focus on building and improving your application without worrying about the underlying infrastructure. Additionally, integrating AWS services such as API Gateway and Secrets Manager ensures secure and efficient interactions with external APIs like OpenAI's, while maintaining the best practices for cloud-based applications.

By following the outlined steps, you can establish a robust and automated pipeline for deploying serverless applications. This approach will empower your team to deliver high-quality code faster, with confidence, and without downtime.

Happy coding!