Suhas Mallesh


Your AWS Account is AI-Ready: Deploy your first AI endpoint with Terraform in 10 minutes⚡

Amazon Bedrock gives you instant access to Claude, Llama, and Titan models with zero ML experience. Here's how to deploy your first AI endpoint with Terraform in 10 minutes.

Every AWS account has access to Claude, Llama, Titan, and other foundation models right now. No GPU provisioning. No Docker containers. No ML degree required.

Amazon Bedrock is a fully managed service that lets you call AI models via API — just like calling DynamoDB or S3. You send a prompt, you get a response. That's it.

The problem? Most teams still set this up manually through the console. Let's fix that with Terraform and build something production-ready from day one. 🏗️

🤔 Bedrock vs SageMaker — Which One Do You Need?

Before writing any code, let's clear this up:

|                         | Bedrock                             | SageMaker                                    |
| ----------------------- | ----------------------------------- | -------------------------------------------- |
| What it is              | API access to pre-trained models    | Platform to build/train/deploy custom models |
| Infrastructure          | Serverless (zero to manage)         | You manage instances, endpoints, clusters    |
| Use when                | You want to USE AI models           | You want to BUILD AI models                  |
| Models                  | Claude, Llama, Titan, Mistral, etc. | Your own custom models                       |
| ML expertise needed     | None                                | Significant                                  |
| Time to first API call  | 10 minutes                          | Days to weeks                                |

Rule of thumb: Start with Bedrock. Move to SageMaker only when you need custom model training. 90% of enterprise AI use cases (chatbots, summarization, RAG, code generation) work great with Bedrock. 🎯

🏗️ Step 1: Enable Model Access with Terraform

Before calling any model, AWS requires you to explicitly request access. This is a one-time setup per account/region.

# bedrock/main.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.60.0"  # Bedrock support
    }
  }
}

provider "aws" {
  region = var.region
}

variable "region" {
  type    = string
  default = "us-east-1"  # Best Bedrock model availability
}

⚠️ Note: Model access activation currently requires a one-time console click or API call. Terraform manages the infrastructure around Bedrock, but the initial model enablement is done via the AWS console under Amazon Bedrock → Model access → Manage model access. This is intentional — AWS wants humans to explicitly approve which models an account can use.
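Once access is granted, it's worth verifying that every model your Terraform references is actually enabled in the account. A minimal sketch — `missing_models` is a hypothetical helper, and the sample dict only mimics the shape of boto3's `bedrock` `list_foundation_models()` response; in practice you'd pass the real API response:

```python
# Sketch: confirm the model IDs your Terraform policy references are visible
# in this account/region. The sample response mimics the shape of boto3's
# bedrock.list_foundation_models() output.
def missing_models(response, required_ids):
    """Return the required model IDs not present in the account's model list."""
    available = {m["modelId"] for m in response["modelSummaries"]}
    return sorted(set(required_ids) - available)

sample_response = {
    "modelSummaries": [
        {"modelId": "anthropic.claude-3-haiku-20240307-v1:0", "providerName": "Anthropic"},
        {"modelId": "amazon.titan-embed-text-v2:0", "providerName": "Amazon"},
    ]
}

required = [
    "anthropic.claude-3-haiku-20240307-v1:0",
    "anthropic.claude-3-5-sonnet-20241022-v2:0",
]

# Anything printed here still needs enabling under Model access in the console.
print(missing_models(sample_response, required))
```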

🔐 Step 2: IAM Role for Bedrock Access

This is where Terraform shines. Lock down exactly which models your application can call:

# bedrock/iam.tf

# Role for your application to invoke Bedrock
resource "aws_iam_role" "bedrock_invoker" {
  name = "${var.environment}-bedrock-invoker"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "lambda.amazonaws.com"  # Or ecs-tasks, ec2, etc.
      }
    }]
  })

  tags = {
    Environment = var.environment
    Purpose     = "bedrock-ai"
    ManagedBy   = "terraform"
  }
}

# Granular Bedrock permissions
resource "aws_iam_role_policy" "bedrock_invoke" {
  name = "bedrock-invoke-policy"
  role = aws_iam_role.bedrock_invoker.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowModelInvocation"
        Effect = "Allow"
        Action = [
          "bedrock:InvokeModel",
          "bedrock:InvokeModelWithResponseStream"
        ]
        # Lock down to specific models only 👇
        Resource = [
          "arn:aws:bedrock:${var.region}::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
          "arn:aws:bedrock:${var.region}::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
          "arn:aws:bedrock:${var.region}::foundation-model/amazon.titan-embed-text-v2:0"
        ]
      },
      {
        Sid    = "AllowModelListing"
        Effect = "Allow"
        Action = [
          "bedrock:ListFoundationModels",
          "bedrock:GetFoundationModel"
        ]
        Resource = "*"
      }
    ]
  })
}

# CloudWatch Logs for Lambda
resource "aws_iam_role_policy_attachment" "lambda_logs" {
  role       = aws_iam_role.bedrock_invoker.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

variable "environment" {
  type    = string
  default = "dev"
}

Why this matters: Most tutorials grant bedrock:* on Resource: "*". That means any developer can call any model, including the expensive ones. Locking the policy down to specific model ARNs is how enterprises do it. 🔒
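If you later generate that policy from a list of model IDs (as the module in Step 5 does), a tiny helper keeps the ARN format in one place. A sketch assuming the standard foundation-model ARN layout used above — note the empty account field (`::`), since foundation-model ARNs are not account-scoped:

```python
# Sketch: build the foundation-model ARNs the IAM policy expects from a
# list of model IDs, mirroring the Resource entries above.
def model_arns(region, model_ids):
    return [f"arn:aws:bedrock:{region}::foundation-model/{m}" for m in model_ids]

arns = model_arns("us-east-1", ["anthropic.claude-3-haiku-20240307-v1:0"])
print(arns[0])
# arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0
```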

⚡ Step 3: Your First AI-Powered Lambda

Let's build a Lambda function that calls Claude and returns a response:

# bedrock/lambda.tf

resource "aws_lambda_function" "ai_endpoint" {
  filename         = data.archive_file.ai_lambda.output_path
  function_name    = "${var.environment}-bedrock-ai"
  role             = aws_iam_role.bedrock_invoker.arn
  handler          = "index.handler"
  runtime          = "python3.12"
  timeout          = 30
  memory_size      = 256
  source_code_hash = data.archive_file.ai_lambda.output_base64sha256

  environment {
    variables = {
      MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"
      # AWS_REGION is reserved by Lambda and set automatically; the SDK picks it up.
    }
  }

  tags = {
    Environment = var.environment
    Purpose     = "bedrock-ai"
    ManagedBy   = "terraform"
  }
}

data "archive_file" "ai_lambda" {
  type        = "zip"
  output_path = "${path.module}/ai_lambda.zip"

  source {
    content  = <<-PYTHON
import boto3
import json
import os

bedrock = boto3.client('bedrock-runtime')

def handler(event, context):
    """
    Simple Bedrock invocation endpoint.

    Event format:
    {
      "prompt": "Explain what Kubernetes is in 2 sentences.",
      "max_tokens": 500,
      "temperature": 0.7
    }
    """
    # Function URL requests wrap the JSON payload in event['body'];
    # direct Lambda invokes pass the fields at the top level.
    if isinstance(event.get('body'), str):
        event = json.loads(event['body'])

    prompt = event.get('prompt', 'Say hello!')
    max_tokens = event.get('max_tokens', 500)
    temperature = event.get('temperature', 0.7)
    model_id = os.environ.get('MODEL_ID')

    # Bedrock Messages API format
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "temperature": temperature,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ]
    })

    try:
        response = bedrock.invoke_model(
            modelId=model_id,
            contentType='application/json',
            accept='application/json',
            body=body
        )

        result = json.loads(response['body'].read())

        return {
            'statusCode': 200,
            'headers': {'Content-Type': 'application/json'},
            # Function URLs expect 'body' to be a string, not a dict
            'body': json.dumps({
                'response': result['content'][0]['text'],
                'model': model_id,
                'input_tokens': result['usage']['input_tokens'],
                'output_tokens': result['usage']['output_tokens']
            })
        }

    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }
    PYTHON
    filename = "index.py"
  }
}

# Function URL for quick testing (no API Gateway needed)
resource "aws_lambda_function_url" "ai_endpoint" {
  function_name      = aws_lambda_function.ai_endpoint.function_name
  authorization_type = var.environment == "prod" ? "AWS_IAM" : "NONE"
}

output "ai_endpoint_url" {
  value       = aws_lambda_function_url.ai_endpoint.function_url
  description = "URL to invoke your AI endpoint"
}

output "lambda_function_name" {
  value = aws_lambda_function.ai_endpoint.function_name
}
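Before deploying, you can sanity-check the request body the handler builds without touching AWS at all. A local sketch of the same Messages API payload — `build_body` is a hypothetical helper, not part of the Lambda above:

```python
import json

# Local sketch of the request body the Lambda handler sends to Bedrock's
# Messages API; building it outside Lambda makes the shape unit-testable.
def build_body(prompt, max_tokens=500, temperature=0.7):
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    })

body = json.loads(build_body("What is Terraform?", max_tokens=200))
print(body["max_tokens"], body["messages"][0]["role"])
# 200 user
```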

🧪 Step 4: Test It

After terraform apply, test your AI endpoint:

# Via Lambda Function URL
curl -X POST $(terraform output -raw ai_endpoint_url) \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Explain what Kubernetes is in 2 sentences.",
    "max_tokens": 200
  }'

# Or via AWS CLI
aws lambda invoke \
  --function-name dev-bedrock-ai \
  --payload '{"prompt": "What is Terraform?", "max_tokens": 200}' \
  --cli-binary-format raw-in-base64-out \
  response.json

cat response.json

Response (the Function URL returns the JSON body directly):

{
  "response": "Kubernetes is an open-source container orchestration platform...",
  "model": "anthropic.claude-3-5-sonnet-20241022-v2:0",
  "input_tokens": 14,
  "output_tokens": 52
}

You just called Claude from your own AWS infrastructure. No API keys to manage. No third-party billing. Just IAM roles and Bedrock. ✅

🏢 Step 5: Environment-Aware Module

For production, wrap everything in a reusable module:

# modules/bedrock-endpoint/variables.tf

variable "environment" {
  type = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Must be: dev, staging, or prod."
  }
}

variable "allowed_models" {
  type        = list(string)
  description = "List of Bedrock model IDs this environment can access"
  default     = ["anthropic.claude-3-haiku-20240307-v1:0"]  # Cheapest by default
}

variable "default_model" {
  type        = string
  description = "Default model for the Lambda endpoint"
  default     = "anthropic.claude-3-haiku-20240307-v1:0"
}

variable "region" {
  type    = string
  default = "us-east-1"
}

Usage per environment:

# Dev: Cheapest model only
module "bedrock_dev" {
  source      = "./modules/bedrock-endpoint"
  environment = "dev"
  allowed_models = [
    "anthropic.claude-3-haiku-20240307-v1:0"
  ]
  default_model = "anthropic.claude-3-haiku-20240307-v1:0"
}

# Production: Multiple models, stricter access
module "bedrock_prod" {
  source      = "./modules/bedrock-endpoint"
  environment = "prod"
  allowed_models = [
    "anthropic.claude-3-5-sonnet-20241022-v2:0",
    "anthropic.claude-3-haiku-20240307-v1:0",
    "amazon.titan-embed-text-v2:0"
  ]
  default_model = "anthropic.claude-3-5-sonnet-20241022-v2:0"
}

Why this pattern matters: Dev gets the cheapest model (Haiku), prod gets Sonnet. Nobody accidentally racks up a bill testing with the most expensive model. IAM enforces it — not developer discipline. 🛡️
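One gap in the module as sketched: nothing enforces that default_model is actually in allowed_models, so a typo could deploy a Lambda that can't invoke its own default. The invariant, expressed in Python for illustration (`validate_module` is a hypothetical check, not part of the module):

```python
# Sketch of the invariant the module should enforce: the default model must
# be one of the allowed models, or the endpoint fails at invoke time.
def validate_module(allowed_models, default_model):
    if default_model not in allowed_models:
        raise ValueError(f"default_model {default_model!r} not in allowed_models")
    return True

print(validate_module(
    ["anthropic.claude-3-haiku-20240307-v1:0"],
    "anthropic.claude-3-haiku-20240307-v1:0",
))
# True
```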

💡 Bedrock Model Quick Reference

| Model               | Best For                  | Input Cost/1K tokens | Speed   |
| ------------------- | ------------------------- | -------------------- | ------- |
| Claude 3.5 Sonnet   | Complex reasoning, coding | $0.003               | Fast    |
| Claude 3 Haiku      | Simple tasks, high volume | $0.00025             | Fastest |
| Llama 3.1 70B       | Open-source alternative   | $0.00099             | Fast    |
| Titan Text Express  | Basic text generation     | $0.0002              | Fast    |
| Titan Embeddings v2 | Vector embeddings for RAG | $0.00002             | Fastest |

💰 Pro tip: Use Haiku or Titan for dev/testing, Sonnet for production. The cost difference is 10x+.
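To put numbers on that tip, a quick back-of-the-envelope comparison using the input prices from the table (output tokens, which cost more, are left out for simplicity):

```python
# Rough input-cost comparison using the table's per-1K-token input prices.
INPUT_PRICE_PER_1K = {
    "claude-3-5-sonnet": 0.003,
    "claude-3-haiku":    0.00025,
}

def input_cost(model, tokens):
    """Dollar cost of sending `tokens` input tokens to `model`."""
    return tokens / 1000 * INPUT_PRICE_PER_1K[model]

tokens = 1_000_000  # say, a month of dev testing
sonnet = input_cost("claude-3-5-sonnet", tokens)
haiku = input_cost("claude-3-haiku", tokens)
print(f"Sonnet: ${sonnet:.2f}  Haiku: ${haiku:.2f}  ratio: {sonnet / haiku:.0f}x")
# Sonnet: $3.00  Haiku: $0.25  ratio: 12x
```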

🎯 What You Just Built

┌─────────────────────────────────────────┐
│  Your Application / curl / API client   │
└──────────────┬──────────────────────────┘
               │
               ▼
┌──────────────────────────┐
│  Lambda Function URL     │
│  (or API Gateway later)  │
└──────────────┬───────────┘
               │
               ▼
┌──────────────────────────┐
│  Lambda Function         │
│  (Python 3.12, 256MB)    │
│  Role: bedrock-invoker   │
└──────────────┬───────────┘
               │ IAM-authenticated
               ▼
┌──────────────────────────┐
│  Amazon Bedrock          │
│  Claude / Llama / Titan  │
│  (Fully managed, no GPU) │
└──────────────────────────┘

All deployed with Terraform. All version-controlled. All reproducible across environments. 🚀

⏭️ What's Next

This is Post 1 of the AWS AI Infrastructure with Terraform series. Coming up:

Post 2: Bedrock Guardrails — Stop your AI from leaking PII or going off-topic
Post 3: Invocation Logging — Track every AI call for compliance and debugging
Post 4: RAG Knowledge Base — Connect your company docs to AI with Bedrock + OpenSearch


You just deployed an AI endpoint in your AWS account with Terraform. No GPUs, no containers, no ML degree. That's the power of Bedrock. 🧠

Found this helpful? Follow for the full AWS AI Infrastructure with Terraform series! 💬
