Budiono Santoso

Posted on May 18 • Edited on May 20

Amazon SageMaker AI : SageMaker Studio, vLLM, Gemma 4 and Terraform for Receipt Extraction

#aws #sagemaker #gemma #terraform

Hello everyone. I want to create receipt extraction application using Amazon SageMaker AI. Amazon SageMaker AI is a fully managed service most comprehensive set of AI tools and capabilities to enable high-performance and low-cost AI model development for any use case.

In this blog tutorial, I using Amazon SageMaker AI products such as SageMaker Studio as my online IDE, SageMaker Model, SageMaker Endpoint Configuration and SageMaker Endpoint.

REQUIREMENTS :

AWS account, you can sign up/sign in here
vLLM image for inference and serving, you can see this link
Gemma 4 model for open-source LLM, you can see this link
Terraform AWS provider for Infrastructure as Code, you can see this link
FastAPI for create API, you can see this link

STEP-BY-STEP :

Open Amazon SageMaker AI like this screenshot then click "Set up for a single user" for create SageMaker Studio.
Wait until SageMaker Studio is ready. After SageMaker Studio is ready, click "Open Studio".
This is what SageMaker Studio looks like. Click JupyterLab logo top left corner then click "Create JupyterLab space".
After space is created, click "Run space" then wait until show "Open JupyterLab" is available and status is Running.
Open SageMaker AI (not SageMaker Studio) console then click your Quick setup domain. Click "User profiles" -> click "default-..." then copy execution role for create SageMaker Endpoint step. For app configuration, click "Enable Docker on this domain" to can pull vLLM image and push to Amazon ECR.
Pull vLLM image and push to Amazon ECR with one shell script. In the terminal, run this shell script.

chmod +x vllm-to-ecr.sh
./vllm-to-ecr.sh

Search "0.19.1-gpu-py312-cu129-ubuntu22.04-sagemaker-v1.0" in image tags of vLLM because vLLM 0.19.1 version ready support Gemma 4 and use SageMaker version.

Open and check Amazon ECR private repository "vllm-gemma-4".

Open AWS IAM console then click Roles, search your IAM execution role that already copy before and add some policies because SageMaker Studio as a IDE needs to connect to AWS services. But what happens if you don't add some policies? Yes, access to AWS services is denied.

Then open AWS Service Quotas for request SageMaker Endpoint quota then search "SageMaker" then click "View quotas".

Write "g6.2x large for endpoint" then click "Request increase at account level".

Fill number in Increase quota value then click "Request".

In my case, my request was automatically Approved in just a few seconds. Then why must request an Endpoint quota? If do not request an Endpoint quota, will receive an error during the Endpoint creation process such as your endpoint instance are 0 and you need to request a Service Quota.

Then create SageMaker model, endpoint configuration and endpoint. (Code for this available on GitHub).

After SageMaker Endpoint is created, wait until SageMaker Endpoint is ready for inference and serving. Can see in Deployments -> Endpoints.

You also can see SageMaker model, endpoint configuration and endpoint in Amazon SageMaker AI console like this screenshot. Can see in Deployments & inference -> Deployable models, endpoints and endpoint configurations.

When SageMaker Endpoint is ready for inference and serving, now can test some sample photos (code available on GitHub) and display JSON-based structured output like this.

{"storeName": "OAK STREET MARKET", "purchaseDate": "26-10-2023", "total": 42.58}

NOTE : DELETE your SageMaker model, endpoint configuration and endpoint after testing because this endpoint use real-time endpoint that always running.

So why and how to create SageMaker model, endpoint configuration and endpoint using Terraform? Because Terraform is an Infrastructure as Code (IaC) tool, you can automatically create SageMaker resources faster and delete SageMaker resources with just one line of code.

Install Terraform with run this shell script.

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv bash)"

brew install gcc

brew tap hashicorp/tap

brew install hashicorp/tap/terraform

terraform --version

Then create Terraform files such as iam.tf, main.tf and sagemaker.tf in ONE folder (available on GitHub).

iam.tf for create IAM role such as SageMaker execution role and SageMaker ECR policy.
main.tf for Terraform AWS version and AWS region.
sagemaker.tf for create SageMaker model, endpoint configuration and endpoint.

iam.tf

# SageMaker Execution Role
resource "aws_iam_role" "sagemaker" {
  name               = "sagemakerExecutionRole"
  assume_role_policy = jsonencode({
    Version   = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Service = "sagemaker.amazonaws.com"
        }
        Action = "sts:AssumeRole"
      }
    ]
  })
}

# SageMaker ECR Policy
resource "aws_iam_role_policy_attachment" "ecr" {
  role       = aws_iam_role.sagemaker.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess"
}

main.tf

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "6.45.0"
    }
  }
}

provider "aws" {
  region = "us-west-2"
}

sagemaker.tf

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

# Get AWS Account ID and AWS Region from above data
locals {
  account_id = data.aws_caller_identity.current.account_id
  region     = data.aws_region.current.region
}

# Create SageMaker Model with Gemma 4 on vLLM
resource "aws_sagemaker_model" "model" {
  name                = "gemma-4-receipt-extraction-vllm"
  execution_role_arn  = aws_iam_role.sagemaker.arn
  primary_container {
    image             = "${local.account_id}.dkr.ecr.${local.region}.amazonaws.com/vllm-gemma-4:0.19.1-sagemaker"
    environment       = {
      "SM_VLLM_MODEL" = "google/gemma-4-E4B-it"
    }
  }
}

# Create SageMaker Endpoint Configuration
resource "aws_sagemaker_endpoint_configuration" "endpointConfig" {
  name  = aws_sagemaker_model.model.name
  production_variants {
    variant_name           = "AllTraffic"
    model_name             = aws_sagemaker_model.model.name
    initial_instance_count = 1
    instance_type          = "ml.g6.2xlarge"
  }
}

# Create SageMaker Endpoint
resource "aws_sagemaker_endpoint" "endpoint" {
  name                 = aws_sagemaker_model.model.name
  endpoint_config_name = aws_sagemaker_endpoint_configuration.endpointConfig.name
}

After 3 Terraform files is available, write and run this Terraform script.

terraform init

terraform init is initialize Terraform latest version in main.tf

terraform plan

terraform plan is check make sure your Terraform file is true and right format. If false and wrong format, you must check your Terraform file again.

terraform apply --auto-approve

terraform apply is apply/create all AWS service resources based all Terraform file. If not write --auto-approve, you must write/enter "yes" every want to apply Terraform file.

terraform destroy --auto-approve

terraform destroy is destroy/delete all AWS service resources based all Terraform file. If not write --auto-approve, you must write/enter "yes" every want to apply Terraform file.

Wait until SageMaker endpoint is inService. The result of this Terraform process are same as SageMaker model, endpoint configuration and endpoint screenshots above.

Create API using FastAPI then create main.py file, Dockerfile and requirements.txt file in ONE folder (available on GitHub).

Then create Amazon ECR private repository "receipt-extraction-gemma-4" with run this shell script.

aws ecr create-repository \
    --repository-name "receipt-extraction-gemma-4" \
    --image-scanning-configuration scanOnPush=false \
    --image-tag-mutability MUTABLE \
    --region "us-west-2" 2>/dev/null || echo "Repository already exists, skipping creation."

However, when want to build API app and push to Amazon ECR private repository, display error like this screenshot.

To fix this error, use SageMaker Docker Build. SageMaker Docker Build is CLI tool for building Docker images in SageMaker Studio using AWS CodeBuild.

Install SageMaker Docker Build with run this shell script.

pip install sagemaker-studio-image-build

After SageMaker Docker Build is installed, run this shell script to build and push to ECR and wait until finished.

sm-docker build . --repository receipt-extraction-gemma-4:latest

Result of this process will be like this screenshot.

This ECR receipt extraction image be used for next step using Amazon Elastic Container Service (Express Mode and custom mode) in next blog tutorial.

CONCLUSION :

SageMaker Studio can be used as a online IDE. If you have a laptop with limited specifications, I recommended this online IDE.
vLLM can be used for inference and serving on ECR.
Gemma 4 can running in SageMaker Endpoint.
Terraform help create automatically SageMaker resources faster and also destroy/delete all SageMaker resources faster.
FastAPI help create API for receipt extraction.

DOCUMENTATION :