Introduction
In this post, I’m going to explore the implementation of a service that lets us send a question and receive an answer back. Sending a question is handled as a prompt to a specific model using Bedrock.
To keep the costs of this implementation to a minimum, I’ll be using API Gateway, AWS Lambda, and Amazon Bedrock.
Context
When we want to implement a Generative AI service, there are several possible paths, and we must take the time to understand the minimal requirements before making a decision.
In this post, I will demonstrate the process of deploying a simple GenAI solution that we can either connect to our current solution or expose to a third party via API Gateway.
TLDR
All the content for this implementation can be found in this repository.
Scope
For this implementation, I’m going to define a set of quite simple requirements.
Below is a diagram that represents the implementation we will be building.
Functional Requirements
- We should be able to send a prompt and receive a response in the form of a model completion from Nova Micro (see the contract sketch after this list).
- We should be able to expose an HTTP endpoint to trigger the process of generating a response.
- Let's estimate, for the sake of cost calculations, that we will serve a monthly volume of 100 requests.
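To make the contract concrete, here is a small sketch of the request and response shapes these requirements imply; the field names are taken from the Lambda handler shown later in this post.
// Sketch of the API contract (field names match the handler code further down).
type PromptRequest = {
  prompt: string;
};
type PromptResponse =
  | { success: true; data: string }
  | { success: false; error: string };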
Non-Functional Requirements
Since we're using AWS serverless services, we can lean on their built-in features for our non-functional requirements.
For automation, the deployment should be handled completely by GitHub Actions. When it comes to availability, the API needs to maintain a monthly uptime of 99.9% or higher.
On the security front, we'll need IAM-scoped access for Bedrock, use OpenID Connect for authentication, and ensure all traffic is HTTPS only.
And finally, for observability, we must have structured logs that capture metadata, token usage, and any errors, all visualized with CloudWatch dashboards.
Simple.
What is Out of Scope?
For this implementation, I will be excluding authentication, sanitization, authorization and any security processes applied to the requests. This approach allows me to focus on the general GenAI implementation process.
Cost Breakdown per Service
Given the predefined volume, we can estimate an average usage per request of 22 input tokens and 232 output tokens.
Amazon Bedrock (Nova Micro)
At this volume, the Nova Micro model processes 2,200 input tokens and 23,200 output tokens per month, which comes to roughly $0.003/month. Output tokens cost about four times as much as input tokens.
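As a rough sanity check on that number, here is a back-of-the-envelope calculation. The per-token prices are my assumptions (on-demand Nova Micro pricing of roughly $0.035 per million input tokens and $0.14 per million output tokens); verify current pricing before relying on them.
// Back-of-the-envelope monthly estimate; the prices here are assumptions, not official figures.
const inputCost = (2_200 / 1_000_000) * 0.035;   // ≈ $0.00008
const outputCost = (23_200 / 1_000_000) * 0.14;  // ≈ $0.00325
console.log(`$${(inputCost + outputCost).toFixed(4)}/month`); // ≈ $0.0033/month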
AWS Lambda
Considering only 100 invocations at maybe 3 to 5 seconds per call with 512MB of memory, we are nowhere near the Free Tier limits (100 × 5 s × 0.5 GB ≈ 250 GB-seconds). Lambda gives us 1 million requests and 400,000 GB-seconds free every month.
API Gateway
For the first year, we get 1 million requests free. After that, we are looking at maybe $0.0004/month.
Amazon ECR
That 300MB Docker image sitting in ECR costs about a cent a month. After the 500MB Free Tier, we are paying for around 136MB of storage.
What Happens When It Scales?
If it reaches 1,000 requests monthly, costs are around $0.04. At 10K requests, it's about $0.39. Even at 100K requests, we are only paying about $3.76/month.
Alrighty, let's move ahead.
Implementation
We can begin the implementation by creating the agent.
Creating the Agent
To create the agent, I first need to set up the complete TypeScript project.
We can move quickly by following these instructions and installing the necessary dependencies so the project can operate correctly.
Feel free to use the configuration that suits you best; just make sure package.json ends up defining the build and test:ci scripts that the Dockerfile and CI pipeline below rely on, with the compiled output landing in dist/.
mkdir -p handler terraform
cd handler
pnpm init -y
pnpm --package=typescript dlx tsc --init
mkdir -p src __tests__
touch src/{app,env,index}.ts
pnpm add -D @types/node tsx typescript
pnpm add ai @ai-sdk/amazon-bedrock
pnpm add zod dotenv
In this project, I will define three fundamental parts:
- Logic for compatibility with AWS Lambda.
- Logic to manage requests as prompts, request text generation based on them, and handle their potential future evolution.
- Core logic for the actual text generation using the @ai-sdk and Bedrock invocations.
Below is a summary of the project’s main files.
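// index.ts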
import { main } from "./app";
export const handler = async (event: any, context: any) => {
try {
const body = event.body ? JSON.parse(event.body) : {};
const prompt = body.prompt ?? "Welcome from Warike technologies - GenAI solutions architecture";
const response = await main(prompt);
return {
statusCode: 200,
body: JSON.stringify({
success: true,
data: response,
}),
};
} catch (error) {
console.error('Error in Lambda handler:', error);
return {
statusCode: 500,
body: JSON.stringify({
success: false,
error: error instanceof Error ? error.message : 'An unexpected error occurred'
}),
};
}
};
// app.ts
import { generateResponse } from "./utils/bedrock";
export async function main(prompt: string) {
try {
console.log('🚀 Starting Bedrock:');
return await generateResponse(prompt)
} catch (error) {
console.error('An unexpected error occurred running workflow:', error);
throw error;
}
}
// utils/bedrock.ts
import { config } from "./config";
import { createAmazonBedrock } from '@ai-sdk/amazon-bedrock';
import { generateText } from 'ai';
export async function generateResponse(prompt: string){
const { regionId, modelId } = config({ });
try {
const bedrock = createAmazonBedrock({
region: regionId
});
const { text, usage } = await generateText({
model: bedrock(modelId),
system: "You are a helpful assistant.",
prompt: [
{ role: "user", content: prompt },
],
});
console.log(`model: ${modelId}, \n response: ${text}, usage: ${JSON.stringify(usage)}`);
return text;
} catch (error) {
console.log(`ERROR: Can't invoke '${modelId}'. Reason: ${error}`);
// Re-throw so the Lambda handler returns a 500 instead of a 200 with empty data.
throw error;
}
}
For local testing, I've defined the following environment variables. We can use an AWS Bedrock API key for testing purposes.
AWS_REGION=us-west-2
AWS_BEDROCK_MODEL='amazon.nova-micro-v1:0'
AWS_BEARER_TOKEN_BEDROCK='aws_bearer_token_bedrock'
It's highly recommended to use only short-term API keys to avoid compromising the system's security.
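With those variables in a local .env file, we can smoke-test the handler before deploying anything. The helper below is hypothetical (the file name and fake event shape are mine, not part of the repository); run it from the handler folder with pnpm tsx local-test.ts.
// local-test.ts — hypothetical local smoke test; invokes the Lambda handler with a fake API Gateway event.
import "dotenv/config";
import { handler } from "./src/index";

const fakeEvent = { body: JSON.stringify({ prompt: "Hello from a local test" }) };

handler(fakeEvent, {})
  .then((result) => console.log(result))
  .catch((error) => console.error(error));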
Defining the Infrastructure
Now that the system's logic is functional, we can create its Dockerfile, which will facilitate its deployment to AWS Lambda.
# ---- Build Stage ----
FROM node:22-alpine AS builder
WORKDIR /usr/src/app
RUN corepack enable
COPY package.json pnpm-lock.yaml* ./
RUN pnpm install --frozen-lockfile
COPY . .
RUN pnpm run build
# ---- Runtime Stage ----
FROM public.ecr.aws/lambda/nodejs:22
WORKDIR ${LAMBDA_TASK_ROOT}
COPY --from=builder /usr/src/app/dist/src ./
COPY --from=builder /usr/src/app/node_modules ./node_modules
CMD [ "index.handler" ]
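Before moving on to Terraform, the image can be validated locally. The AWS Lambda base image ships with the Runtime Interface Emulator, so we can build, run, and invoke the function over HTTP; the image name, port, and placeholder credentials below are arbitrary choices for illustration.
docker build -t genai-handler .
docker run -p 9000:8080 \
  -e AWS_REGION=us-west-2 \
  -e AWS_BEDROCK_MODEL='amazon.nova-micro-v1:0' \
  -e AWS_BEARER_TOKEN_BEDROCK='aws_bearer_token_bedrock' \
  genai-handler
# In a second terminal:
curl -sS "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"body":"{\"prompt\":\"Hello from the emulator\"}"}'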
With all components ready in our handler, we can proceed to define the resources in Terraform.
Exposing the Service with API Gateway
We'll start by defining our API Gateway. We'll use the HTTP protocol and focus solely on creating the API, its stage, and its subsequent integration with Lambda.
locals {
api_gateway_name = "dev-http-${local.project_name}"
}
module "warike_development_api_gw" {
source = "terraform-aws-modules/apigateway-v2/aws"
version = "5.4.1"
name = local.api_gateway_name
description = "API Gateway for ${local.project_name}"
protocol_type = "HTTP"
create_domain_name = false
create_certificate = false
create_domain_records = false
cors_configuration = {
allow_headers = ["*"]
allow_methods = ["*"]
allow_origins = ["*"]
}
# Access logs
stage_name = "dev"
stage_description = "Development API Gateway"
stage_access_log_settings = {
create_log_group = false
destination_arn = aws_cloudwatch_log_group.warike_development_api_gw_logs.arn
format = jsonencode({
context = {
requestId = "$context.requestId"
requestTime = "$context.requestTime"
protocol = "$context.protocol"
httpMethod = "$context.httpMethod"
resourcePath = "$context.resourcePath"
routeKey = "$context.routeKey"
status = "$context.status"
responseLength = "$context.responseLength"
integrationErrorMessage = "$context.integrationErrorMessage"
error = {
message = "$context.error.message"
responseType = "$context.error.responseType"
}
identity = {
sourceIP = "$context.identity.sourceIp"
}
integration = {
error = "$context.integration.error"
integrationStatus = "$context.integration.integrationStatus"
}
}
})
}
# Routes & Integration
routes = {
"POST /" = {
integration = {
uri = module.warike_development_lambda.lambda_function_arn
payload_format_version = "2.0"
}
}
}
stage_tags = merge(local.tags, {
Name = "${local.api_gateway_name}-dev"
})
tags = merge(local.tags, {
Name = local.api_gateway_name
})
depends_on = [
aws_cloudwatch_log_group.warike_development_api_gw_logs,
]
}
resource "aws_cloudwatch_log_group" "warike_development_api_gw_logs" {
name = "/aws/api-gw/${local.api_gateway_name}"
retention_in_days = 7
}
Connecting to Amazon Bedrock
Next, to use Amazon Bedrock, this implementation relies on the Amazon Nova Micro inference profile in the US region (hence the us. prefix added to the model ID below).
locals {
model_id = "amazon.nova-micro-v1:0"
}
data "aws_bedrock_inference_profile" "warike_development_lambda_bedrock_model" {
inference_profile_id = "us.${local.model_id}"
}
## Bedrock Policy ##
data "aws_iam_policy_document" "warike_development_lambda_bedrock_policy_doc" {
statement {
effect = "Allow"
actions = [
"bedrock:InvokeModel",
]
resources = ["*"]
}
}
Lambda Service
Finally, we'll create the Lambda function associated with all the components we've created, along with others that can be found in the repository.
locals {
lambda_function_name = local.project_name
lambda_env_vars = {
AWS_BEDROCK_MODEL = data.aws_bedrock_inference_profile.warike_development_lambda_bedrock_model.inference_profile_arn
}
}
module "warike_development_lambda" {
source = "terraform-aws-modules/lambda/aws"
version = "8.1.2"
function_name = local.lambda_function_name
description = "Lambda function for ${local.project_name}"
image_uri = "${aws_ecr_repository.warike_development_ecr.repository_url}:latest"
package_type = "Image"
create_package = false
ignore_source_code_hash = true
memory_size = 128
timeout = 900
environment_variables = merge(
local.lambda_env_vars,
{}
)
create_role = false
lambda_role = aws_iam_role.warike_development_lambda_role.arn
## Cloudwatch logging
use_existing_cloudwatch_log_group = true
logging_log_group = aws_cloudwatch_log_group.warike_development_lambda_logs.name
logging_log_format = "JSON"
logging_application_log_level = "INFO"
logging_system_log_level = "WARN"
## function URL
create_lambda_function_url = false
depends_on = [
aws_cloudwatch_log_group.warike_development_lambda_logs,
null_resource.warike_development_seed_ecr_image
]
tags = merge(local.tags, {
Name = local.lambda_function_name
})
}
resource "null_resource" "warike_development_seed_ecr_image" {
provisioner "local-exec" {
command = <<EOT
aws ecr get-login-password --region ${local.aws_region} --profile ${local.aws_profile} \
| docker login --username AWS --password-stdin ${data.aws_caller_identity.current.account_id}.dkr.ecr.${local.aws_region}.amazonaws.com
docker pull public.ecr.aws/lambda/nodejs:22
docker tag public.ecr.aws/lambda/nodejs:22 ${aws_ecr_repository.warike_development_ecr.repository_url}:latest
docker push ${aws_ecr_repository.warike_development_ecr.repository_url}:latest
EOT
}
depends_on = [aws_ecr_repository.warike_development_ecr]
}
Important note: warike_development_seed_ecr_image requires Docker to be running locally.
Let’s move ahead.
Creating the Resources
Once the resources are created, you should see a message similar to this:
terraform apply
...
Apply complete! Resources: 26 added, 0 changed, 0 destroyed.
Configuring CI/CD with GitHub + ECR
Additionally, we need a Docker image containing our project. The following GitHub Actions pipeline builds and pushes an image to ECR and then deploys it to Lambda.
---
name: Lambda CI/CD Common
'on':
workflow_call:
inputs:
app-name:
type: string
lambda-function-secret-name:
required: true
type: string
pnpm-version:
required: false
type: number
default: 10
node-version:
required: false
type: number
default: 22
secrets:
PROJECT_NAME:
required: true
AWS_OIDC_ROLE_ARN:
required: true
AWS_REGION:
required: true
ECR_REPOSITORY:
required: true
AWS_LAMBDA_FUNCTION_NAME:
required: true
AWS_LAMBDA_FUNCTION_ROLE_ARN:
required: true
jobs:
build:
name: Build and Test
runs-on: ubuntu-latest
defaults:
run:
working-directory: ./handler
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install pnpm
uses: pnpm/action-setup@v4
with:
version: ${{ inputs.pnpm-version }}
- name: Use Node.js ${{ inputs.node-version }}
uses: actions/setup-node@v4
with:
node-version: ${{ inputs.node-version }}
- name: Install Dependencies
run: pnpm install --frozen-lockfile
- name: Scan for critical vulnerabilities
run: pnpm audit --audit-level=critical
- name: Run Tests
env:
DOTENV_QUIET: true
run: pnpm test:ci
- name: Build
run: pnpm run build
build-docker:
name: Build and Push Docker Image
runs-on: ubuntu-latest
needs: build
permissions:
id-token: write
contents: read
outputs:
sha: ${{ steps.vars.outputs.sha }}
defaults:
run:
working-directory: ./handler
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_OIDC_ROLE_ARN }}
aws-region: ${{ secrets.AWS_REGION }}
- name: Login to Amazon ECR
id: login-ecr
uses: aws-actions/amazon-ecr-login@v2
with:
mask-password: 'true'
- name: Set commit-sha
id: vars
run: |
calculatedSha=$(git rev-parse --short ${{ github.sha }})
echo "sha=${calculatedSha}" >> $GITHUB_OUTPUT
- name: Build and Push Docker Image
env:
DOCKER_IMAGE: ${{ secrets.ECR_REPOSITORY }}:${{ inputs.app-name }}-${{ steps.vars.outputs.sha }}
run: |
echo "Building Docker image $DOCKER_IMAGE"
docker build -t $DOCKER_IMAGE .
docker tag $DOCKER_IMAGE "${{ secrets.ECR_REPOSITORY }}:${{ inputs.app-name }}-${{ steps.vars.outputs.sha }}"
docker tag $DOCKER_IMAGE "${{ secrets.ECR_REPOSITORY }}:latest"
docker push $DOCKER_IMAGE
docker push "${{ secrets.ECR_REPOSITORY }}:${{ inputs.app-name }}-${{ steps.vars.outputs.sha }}"
docker push "${{ secrets.ECR_REPOSITORY }}:latest"
deploy-prod:
name: Deploy Production Lambda
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main'
needs: build-docker
permissions:
id-token: write
contents: read
defaults:
run:
working-directory: ./handler
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_OIDC_ROLE_ARN }}
aws-region: ${{ secrets.AWS_REGION }}
- name: Deploy Production Lambda
uses: aws-actions/aws-lambda-deploy@v1.1.0
with:
function-name: ${{ secrets.AWS_LAMBDA_FUNCTION_NAME }}
package-type: Image
image-uri: ${{ secrets.ECR_REPOSITORY }}:${{ inputs.app-name }}-${{ needs.build-docker.outputs.sha }}
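One note on the workflow above: because it is defined with workflow_call, it needs a caller workflow to run. A minimal caller could look like the sketch below; the file path, app name, and secret wiring are assumptions based on the inputs and secrets declared above, so adapt them to your repository.
# .github/workflows/deploy.yml — hypothetical caller for the reusable workflow above.
name: Deploy GenAI Lambda
on:
  push:
    branches: [main]
jobs:
  lambda:
    uses: ./.github/workflows/lambda-ci-cd-common.yml
    with:
      app-name: genai-handler
      lambda-function-secret-name: AWS_LAMBDA_FUNCTION_NAME
    secrets: inherit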
And if everything is correct, we will be in a position to test Amazon Bedrock.
Testing
Let's perform a test. From our terminal, we can make the following query and observe the result:
curl -sS "https://123456.execute-api.us-west-2.amazonaws.com/dev/" \
-H "Content-Type: application/json" \
-d '{"prompt":"Heeey hoe gaat het?"}' | jq
The expected output is something like this:
{
"success": true,
"data": "Hoi! Het gaat prima, bedankt voor het vragen. Hoe gaat het met jou? Is er iets waar ik je kan helpen of iets waar je graag over wilt praten?"
}
I call that a success.
Observability
It is important to mention that we can monitor any errors from CloudWatch, so we don't have to navigate in the dark.
Cleaning
Finally, we clean up the resources.
terraform destroy
...
Destroy complete! Resources: 26 destroyed.
Conclusions
Overall, implementing this GenAI service using AWS serverless components was quite straightforward.
The combination of API Gateway, AWS Lambda, and Amazon Bedrock, specifically with the Nova Micro model, proves to be not only functional but also incredibly cost-effective. The analysis showed that this solution remains exceptionally inexpensive, even when (manually) scaling traffic significantly.
By leveraging Terraform for infrastructure management and GitHub Actions for the CI/CD pipeline, we achieved a robust and fully automated deployment process.
Finally, even with aspects like authentication left out of scope, it provides a solid and scalable foundation for building more complex Generative AI applications.


