Introduction
In this post, I’m going to explore the implementation of a service that lets us send a question and receive an answer back. Sending a question is handled as a prompt to a specific model using Bedrock.
To keep the costs of this implementation to a minimum, I’ll be using API Gateway, AWS Lambda, and Amazon Bedrock.
Context
When we want to implement a Generative AI service, there are several possible paths, and we must take the time to understand the minimal requirements before making a decision.
In this post, I will demonstrate the process of deploying a simple GenAI solution that we can either connect to our current solution or expose to a third party via API Gateway.
TLDR
All the content for this implementation can be found in this repository.
Scope
For this implementation, I’m going to define a set of quite simple requirements.
Below is a diagram that represents the implementation we will be building.
Functional Requirements
- We should be able to send a prompt and receive a response in the form of a model completion from Nova Micro (see the contract sketch after this list).
- We should be able to expose an HTTP endpoint to trigger the process of generating a response.
- Let's estimate, for the sake of cost calculations, that we will serve a monthly volume of 100 requests.
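To make the contract concrete, here is a small sketch of the request and response shapes these requirements imply; the field names are taken from the Lambda handler shown later in this post.
// Sketch of the API contract (field names match the handler code further down).
type PromptRequest = {
  prompt: string;
};
type PromptResponse =
  | { success: true; data: string }
  | { success: false; error: string };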
Non-Functional Requirements
Since we're using AWS serverless services, we can lean on their built-in features for our non-functional requirements.
For automation, the deployment should be handled completely by GitHub Actions. When it comes to availability, the API needs to maintain a monthly uptime of 99.9% or higher.
On the security front, we'll need IAM-scoped access for Bedrock, use OpenID Connect for authentication, and ensure all traffic is HTTPS only.
And finally, for observability, we must have structured logs that capture metadata, token usage, and any errors, all visualized with CloudWatch dashboards.
Simple.
What is Out of Scope?
For this implementation, I will be excluding authentication, sanitization, authorization and any security processes applied to the requests. This approach allows me to focus on the general GenAI implementation process.
Cost Breakdown per Service
Given the predefined volume, we can estimate an average usage per request of 22 input tokens and 232 output tokens.
Amazon Bedrock (Nova Micro)
At this volume, the Nova Micro model processes 2,200 input tokens and 23,200 output tokens per month, which comes to roughly $0.003/month. Output tokens cost about four times as much as input tokens.
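As a rough sanity check on that number, here is a back-of-the-envelope calculation. The per-token prices are my assumptions (on-demand Nova Micro pricing of roughly $0.035 per million input tokens and $0.14 per million output tokens); verify current pricing before relying on them.
// Back-of-the-envelope monthly estimate; the prices here are assumptions, not official figures.
const inputCost = (2_200 / 1_000_000) * 0.035;   // ≈ $0.00008
const outputCost = (23_200 / 1_000_000) * 0.14;  // ≈ $0.00325
console.log(`$${(inputCost + outputCost).toFixed(4)}/month`); // ≈ $0.0033/month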
AWS Lambda
Considering only 100 invocations at maybe 3 to 5 seconds per call with 512MB of memory, we are nowhere near the Free Tier limits (100 × 5 s × 0.5 GB ≈ 250 GB-seconds). Lambda gives us 1 million requests and 400,000 GB-seconds free every month.
API Gateway
For the first year, we get 1 million requests free. After that, we are looking at maybe $0.0004/month.
Amazon ECR
That 300MB Docker image sitting in ECR costs about a cent a month. After the 500MB Free Tier, we are paying for around 136MB of storage.
What Happens When It Scales?
If it reaches 1,000 requests monthly, costs are around $0.04. At 10K requests, it's about $0.39. Even at 100K requests, we are only paying about $3.76/month.
Alrighty, let's move ahead.
Implementation
We can begin the implementation by creating the agent.
Creating the Agent
To create the agent, I first need to set up the complete TypeScript project.
We can move quickly by following these instructions and installing the necessary dependencies so the project can operate correctly.
Feel free to use the configuration that suits you best; just make sure package.json ends up defining the build and test:ci scripts that the Dockerfile and CI pipeline below rely on, with the compiled output landing in dist/.
mkdir -p handler terraform
cd handler
pnpm init -y
pnpm --package=typescript dlx tsc --init
mkdir -p src __tests__
touch src/{app,env,index}.ts
pnpm add -D @types/node tsx typescript
pnpm add ai @ai-sdk/amazon-bedrock
pnpm add zod dotenv
In this project, I will define three fundamental parts:
- Logic for compatibility with AWS Lambda.
- Logic to manage requests as prompts, request text generation based on them, and handle their potential future evolution.
- Core logic for the actual text generation using the @ai-sdk and Bedrock invocations.
Below is a summary of the project’s main files.
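// index.ts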
import { main } from "./app";
export const handler = async (event: any, context: any) => {
try {
const body = event.body ? JSON.parse(event.body) : {};
const prompt = body.prompt ?? "Welcome from Warike technologies - GenAI solutions architecture";
const response = await main(prompt);
return {
statusCode: 200,
body: JSON.stringify({
success: true,
data: response,
}),
};
} catch (error) {
console.error('Error in Lambda handler:', error);
return {
statusCode: 500,
body: JSON.stringify({
success: false,
error: error instanceof Error ? error.message : 'An unexpected error occurred'
}),
};
}
};
// app.ts
import { generateResponse } from "./utils/bedrock";
export async function main(prompt: string) {
try {
console.log('🚀 Starting Bedrock:');
return await generateResponse(prompt)
} catch (error) {
console.error('An unexpected error occurred running workflow:', error);
throw error;
}
}
// utils/bedrock.ts
import { config } from "./config";
import { createAmazonBedrock } from '@ai-sdk/amazon-bedrock';
import { generateText } from 'ai';
export async function generateResponse(prompt: string){
const { regionId, modelId } = config({ });
try {
const bedrock = createAmazonBedrock({
region: regionId
});
const { text, usage } = await generateText({
model: bedrock(modelId),
system: "You are a helpful assistant.",
prompt: [
{ role: "user", content: prompt },
],
});
console.log(`model: ${modelId}, \n response: ${text}, usage: ${JSON.stringify(usage)}`);
return text;
} catch (error) {
console.log(`ERROR: Can't invoke '${modelId}'. Reason: ${error}`);
// Re-throw so the Lambda handler returns a 500 instead of a 200 with empty data.
throw error;
}
}
For local testing, I've defined the following environment variables. We can use an AWS Bedrock API key for testing purposes.
AWS_REGION=us-west-2
AWS_BEDROCK_MODEL='amazon.nova-micro-v1:0'
AWS_BEARER_TOKEN_BEDROCK='aws_bearer_token_bedrock'
It's highly recommended to use only short-term API keys to avoid compromising the system's security.
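With those variables in a local .env file, we can smoke-test the handler before deploying anything. The helper below is hypothetical (the file name and fake event shape are mine, not part of the repository); run it from the handler folder with pnpm tsx local-test.ts.
// local-test.ts — hypothetical local smoke test; invokes the Lambda handler with a fake API Gateway event.
import "dotenv/config";
import { handler } from "./src/index";

const fakeEvent = { body: JSON.stringify({ prompt: "Hello from a local test" }) };

handler(fakeEvent, {})
  .then((result) => console.log(result))
  .catch((error) => console.error(error));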
Defining the Infrastructure
Now that the system's logic is functional, we can create its Dockerfile, which will facilitate its deployment to AWS Lambda.
# ---- Build Stage ----
FROM node:22-alpine AS builder
WORKDIR /usr/src/app
RUN corepack enable
COPY package.json pnpm-lock.yaml* ./
RUN pnpm install --frozen-lockfile
COPY . .
RUN pnpm run build
# ---- Runtime Stage ----
FROM public.ecr.aws/lambda/nodejs:22
WORKDIR ${LAMBDA_TASK_ROOT}
COPY --from=builder /usr/src/app/dist/src ./
COPY --from=builder /usr/src/app/node_modules ./node_modules
CMD [ "index.handler" ]
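Before moving on to Terraform, the image can be validated locally. The AWS Lambda base image ships with the Runtime Interface Emulator, so we can build, run, and invoke the function over HTTP; the image name, port, and placeholder credentials below are arbitrary choices for illustration.
docker build -t genai-handler .
docker run -p 9000:8080 \
  -e AWS_REGION=us-west-2 \
  -e AWS_BEDROCK_MODEL='amazon.nova-micro-v1:0' \
  -e AWS_BEARER_TOKEN_BEDROCK='aws_bearer_token_bedrock' \
  genai-handler
# In a second terminal:
curl -sS "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"body":"{\"prompt\":\"Hello from the emulator\"}"}'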
With all components ready in our handler, we can proceed to define the resources in Terraform.
Exposing the Service with API Gateway
We'll start by defining our API Gateway. We'll use the HTTP protocol and focus solely on creating the API, its stage, and its subsequent integration with Lambda.
locals {
api_gateway_name = "dev-http-${local.project_name}"
}
module "warike_development_api_gw" {
source = "terraform-aws-modules/apigateway-v2/aws"
version = "5.4.1"
name = local.api_gateway_name
description = "API Gateway for ${local.project_name}"
protocol_type = "HTTP"
create_domain_name = false
create_certificate = false
create_domain_records = false
cors_configuration = {
allow_headers = ["*"]
allow_methods = ["*"]
allow_origins = ["*"]
}
# Access logs
stage_name = "dev"
stage_description = "Development API Gateway"
stage_access_log_settings = {
create_log_group = false
destination_arn = aws_cloudwatch_log_group.warike_development_api_gw_logs.arn
format = jsonencode({
context = {
requestId = "$context.requestId"
requestTime = "$context.requestTime"
protocol = "$context.protocol"
httpMethod = "$context.httpMethod"
resourcePath = "$context.resourcePath"
routeKey = "$context.routeKey"
status = "$context.status"
responseLength = "$context.responseLength"
integrationErrorMessage = "$context.integrationErrorMessage"
error = {
message = "$context.error.message"
responseType = "$context.error.responseType"
}
identity = {
sourceIP = "$context.identity.sourceIp"
}
integration = {
error = "$context.integration.error"
integrationStatus = "$context.integration.integrationStatus"
}
}
})
}
# Routes & Integration
routes = {
"POST /" = {
integration = {
uri = module.warike_development_lambda.lambda_function_arn
payload_format_version = "2.0"
}
}
}
stage_tags = merge(local.tags, {
Name = "${local.api_gateway_name}-dev"
})
tags = merge(local.tags, {
Name = local.api_gateway_name
})
depends_on = [
aws_cloudwatch_log_group.warike_development_api_gw_logs,
]
}
resource "aws_cloudwatch_log_group" "warike_development_api_gw_logs" {
name = "/aws/api-gw/${local.api_gateway_name}"
retention_in_days = 7
}
Connecting to Amazon Bedrock
Next, to use Amazon Bedrock, this implementation relies on the Amazon Nova Micro inference profile in the US region (hence the us. prefix added to the model ID below).
locals {
model_id = "amazon.nova-micro-v1:0"
}
data "aws_bedrock_inference_profile" "warike_development_lambda_bedrock_model" {
inference_profile_id = "us.${local.model_id}"
}
## Bedrock Policy ##
data "aws_iam_policy_document" "warike_development_lambda_bedrock_policy_doc" {
statement {
effect = "Allow"
actions = [
"bedrock:InvokeModel",
]
resources = ["*"]
}
}
Lambda Service
Finally, we'll create the Lambda function associated with all the components we've created, along with others that can be found in the repository.
locals {
lambda_function_name = local.project_name
lambda_env_vars = {
AWS_BEDROCK_MODEL = data.aws_bedrock_inference_profile.warike_development_lambda_bedrock_model.inference_profile_arn
}
}
module "warike_development_lambda" {
source = "terraform-aws-modules/lambda/aws"
version = "8.1.2"
function_name = local.lambda_function_name
description = "Lambda function for ${local.project_name}"
image_uri = "${aws_ecr_repository.warike_development_ecr.repository_url}:latest"
package_type = "Image"
create_package = false
ignore_source_code_hash = true
memory_size = 128
timeout = 900
environment_variables = merge(
local.lambda_env_vars,
{}
)
create_role = false
lambda_role = aws_iam_role.warike_development_lambda_role.arn
## Cloudwatch logging
use_existing_cloudwatch_log_group = true
logging_log_group = aws_cloudwatch_log_group.warike_development_lambda_logs.name
logging_log_format = "JSON"
logging_application_log_level = "INFO"
logging_system_log_level = "WARN"
## function URL
create_lambda_function_url = false
depends_on = [
aws_cloudwatch_log_group.warike_development_lambda_logs,
null_resource.warike_development_seed_ecr_image
]
tags = merge(local.tags, {
Name = local.lambda_function_name
})
}
resource "null_resource" "warike_development_seed_ecr_image" {
provisioner "local-exec" {
command = <<EOT
aws ecr get-login-password --region ${local.aws_region} --profile ${local.aws_profile} \
| docker login --username AWS --password-stdin ${data.aws_caller_identity.current.account_id}.dkr.ecr.${local.aws_region}.amazonaws.com
docker pull public.ecr.aws/lambda/nodejs:22
docker tag public.ecr.aws/lambda/nodejs:22 ${aws_ecr_repository.warike_development_ecr.repository_url}:latest
docker push ${aws_ecr_repository.warike_development_ecr.repository_url}:latest
EOT
}
depends_on = [aws_ecr_repository.warike_development_ecr]
}
Important note: warike_development_seed_ecr_image requires Docker to be running locally.
Let’s move ahead.
Creating the Resources
Once the resources are created, you should see a message similar to this:
terraform apply
...
Apply complete! Resources: 26 added, 0 changed, 0 destroyed.
Configuring CI/CD with GitHub + ECR
Additionally, we need a Docker image containing our project. The following GitHub Actions pipeline builds and pushes an image to ECR and then deploys it to Lambda.
---
name: Lambda CI/CD Common
'on':
workflow_call:
inputs:
app-name:
type: string
lambda-function-secret-name:
required: true
type: string
pnpm-version:
required: false
type: number
default: 10
node-version:
required: false
type: number
default: 22
secrets:
PROJECT_NAME:
required: true
AWS_OIDC_ROLE_ARN:
required: true
AWS_REGION:
required: true
ECR_REPOSITORY:
required: true
AWS_LAMBDA_FUNCTION_NAME:
required: true
AWS_LAMBDA_FUNCTION_ROLE_ARN:
required: true
jobs:
build:
name: Build and Test
runs-on: ubuntu-latest
defaults:
run:
working-directory: ./handler
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install pnpm
uses: pnpm/action-setup@v4
with:
version: ${{ inputs.pnpm-version }}
- name: Use Node.js ${{ inputs.node-version }}
uses: actions/setup-node@v4
with:
node-version: ${{ inputs.node-version }}
- name: Install Dependencies
run: pnpm install --frozen-lockfile
- name: Scan for critical vulnerabilities
run: pnpm audit --audit-level=critical
- name: Run Tests
env:
DOTENV_QUIET: true
run: pnpm test:ci
- name: Build
run: pnpm run build
build-docker:
name: Build and Push Docker Image
runs-on: ubuntu-latest
needs: build
permissions:
id-token: write
contents: read
outputs:
sha: ${{ steps.vars.outputs.sha }}
defaults:
run:
working-directory: ./handler
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_OIDC_ROLE_ARN }}
aws-region: ${{ secrets.AWS_REGION }}
- name: Login to Amazon ECR
id: login-ecr
uses: aws-actions/amazon-ecr-login@v2
with:
mask-password: 'true'
- name: Set commit-sha
id: vars
run: |
calculatedSha=$(git rev-parse --short ${{ github.sha }})
echo "sha=${calculatedSha}" >> $GITHUB_OUTPUT
- name: Build and Push Docker Image
env:
DOCKER_IMAGE: ${{ secrets.ECR_REPOSITORY }}:${{ inputs.app-name }}-${{ steps.vars.outputs.sha }}
run: |
echo "Building Docker image $DOCKER_IMAGE"
docker build -t $DOCKER_IMAGE .
docker tag $DOCKER_IMAGE "${{ secrets.ECR_REPOSITORY }}:${{ inputs.app-name }}-${{ steps.vars.outputs.sha }}"
docker tag $DOCKER_IMAGE "${{ secrets.ECR_REPOSITORY }}:latest"
docker push $DOCKER_IMAGE
docker push "${{ secrets.ECR_REPOSITORY }}:${{ inputs.app-name }}-${{ steps.vars.outputs.sha }}"
docker push "${{ secrets.ECR_REPOSITORY }}:latest"
deploy-prod:
name: Deploy Production Lambda
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main'
needs: build-docker
permissions:
id-token: write
contents: read
defaults:
run:
working-directory: ./handler
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_OIDC_ROLE_ARN }}
aws-region: ${{ secrets.AWS_REGION }}
- name: Deploy Production Lambda
uses: aws-actions/aws-lambda-deploy@v1.1.0
with:
function-name: ${{ secrets.AWS_LAMBDA_FUNCTION_NAME }}
package-type: Image
image-uri: ${{ secrets.ECR_REPOSITORY }}:${{ inputs.app-name }}-${{ needs.build-docker.outputs.sha }}
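One note on the workflow above: because it is defined with workflow_call, it needs a caller workflow to run. A minimal caller could look like the sketch below; the file path, app name, and secret wiring are assumptions based on the inputs and secrets declared above, so adapt them to your repository.
# .github/workflows/deploy.yml — hypothetical caller for the reusable workflow above.
name: Deploy GenAI Lambda
on:
  push:
    branches: [main]
jobs:
  lambda:
    uses: ./.github/workflows/lambda-ci-cd-common.yml
    with:
      app-name: genai-handler
      lambda-function-secret-name: AWS_LAMBDA_FUNCTION_NAME
    secrets: inherit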
And if everything is correct, we will be in a position to test Amazon Bedrock.
Testing
Let's perform a test. From our terminal, we can make the following query and observe the result:
curl -sS "https://123456.execute-api.us-west-2.amazonaws.com/dev/" \
-H "Content-Type: application/json" \
-d '{"prompt":"Heeey hoe gaat het?"}' | jq
The expected output is something like this:
{
"success": true,
"data": "Hoi! Het gaat prima, bedankt voor het vragen. Hoe gaat het met jou? Is er iets waar ik je kan helpen of iets waar je graag over wilt praten?"
}
I call that a success.
Observability
It is important to mention that we can monitor any errors from CloudWatch, so we don't have to navigate in the dark.
Cleaning
Finally, we clean up the resources.
terraform destroy
...
Destroy complete! Resources: 26 destroyed.
Conclusions
Overall, implementing this GenAI service using AWS serverless components was quite straightforward.
The combination of API Gateway, AWS Lambda, and Amazon Bedrock, specifically with the Nova Micro model, proves to be not only functional but also incredibly cost-effective. The analysis showed that this solution remains exceptionally inexpensive, even when (manually) scaling traffic significantly.
By leveraging Terraform for infrastructure management and GitHub Actions for the CI/CD pipeline, we achieved a robust and fully automated deployment process.
Finally, even with aspects like authentication left out of scope, it provides a solid and scalable foundation for building more complex Generative AI applications.


