DEV Community

Budiono Santoso
Budiono Santoso

Posted on

Amazon SageMaker AI : SageMaker Studio, vLLM, Gemma 4 and Terraform for Receipt Extraction

Hello everyone. I want to create receipt extraction application using Amazon SageMaker AI. Amazon SageMaker AI is a fully managed service most comprehensive set of AI tools and capabilities to enable high-performance and low-cost AI model development for any use case.

In this blog tutorial, I using Amazon SageMaker AI products such as SageMaker Studio as my online IDE, SageMaker Model, SageMaker Endpoint Configuration and SageMaker Endpoint.

REQUIREMENTS :

  1. AWS account, you can sign up/sign in here
  2. vLLM image for inference and serving, you can see this link
  3. Gemma 4 model for open-source LLM, you can see this link
  4. Terraform AWS provider for Infrastructure as Code, you can see this link
  5. FastAPI for create API, you can see this link

STEP-BY-STEP :

  1. Open Amazon SageMaker AI like this screenshot then click "Set up for a single user" for create SageMaker Studio.
    SageMaker AI

  2. Wait until SageMaker Studio is ready. After SageMaker Studio is ready, click "Open Studio".
    Loading
    Click open studio

  3. This is what SageMaker Studio looks like. Click JupyterLab logo top left corner then click "Create JupyterLab space".
    SageMaker Studio
    JupyterLab space
    Create space

  4. After space is created, click "Run space" then wait until show "Open JupyterLab" is available and status is Running.
    Run space
    Open JupyterLab

  5. Open SageMaker AI (not SageMaker Studio) console then click your Quick setup domain. Click "User profiles" -> click "default-..." then copy execution role for create SageMaker Endpoint step. For app configuration, click "Enable Docker on this domain" to can pull vLLM image and push to Amazon ECR.
    User and app details

  6. Pull vLLM image and push to Amazon ECR with one shell script. In the terminal, run this shell script.

chmod +x vllm-to-ecr.sh
./vllm-to-ecr.sh
Enter fullscreen mode Exit fullscreen mode

Search "0.19.1-gpu-py312-cu129-ubuntu22.04-sagemaker-v1.0" in image tags of vLLM because vLLM 0.19.1 version ready support Gemma 4 and use SageMaker version.
Shell script

Open and check Amazon ECR private repository "vllm-gemma-4".
ECR private repo

Open AWS IAM console then click Roles, search your IAM execution role that already copy before and add some policies because SageMaker Studio as a IDE needs to connect to AWS services. But what happens if you don't add some policies? Yes, access to AWS services is denied.
IAM role
Trust relationship

Then open AWS Service Quotas for request SageMaker Endpoint quota then search "SageMaker" then click "View quotas".
Service Quota
View quota

Write "g6.2x large for endpoint" then click "Request increase at account level".
Endpoint quota

Fill number in Increase quota value then click "Request".
Increase number
Approved

In my case, my request was automatically Approved in just a few seconds. Then why must request an Endpoint quota? If do not request an Endpoint quota, will receive an error during the Endpoint creation process such as your endpoint instance are 0 and you need to request a Service Quota.

Then create SageMaker model, endpoint configuration and endpoint. (Code for this available in Github).

After SageMaker Endpoint is created, wait until SageMaker Endpoint is ready for inference and serving. Can see in Deployments -> Endpoints.
Endpoint in Studio
Endpoint explain

You also can see SageMaker model, endpoint configuration and endpoint in Amazon SageMaker AI console like this screenshot. Can see in Deployments & inference -> Deployable models, endpoints and endpoint configurations.
Model
Endpoint configuration
Endpoint

When SageMaker Endpoint is ready for inference and serving, now can test some sample photos (code available in Github) and display JSON-based structured output like this.

{"storeName": "OAK STREET MARKET", "purchaseDate": "26-10-2023", "total": 42.58}
Enter fullscreen mode Exit fullscreen mode

NOTE : DELETE your SageMaker model, endpoint configuration and endpoint after testing because this endpoint use real-time endpoint that always running.

So why and how to create SageMaker model, endpoint configuration and endpoint using Terraform? Because Terraform is an Infrastructure as Code (IaC) tool, you can automatically create SageMaker resources faster and delete SageMaker resources with just one line of code.

Install Terraform with run this shell script.

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv bash)"

brew install gcc

brew tap hashicorp/tap

brew install hashicorp/tap/terraform

terraform --version
Enter fullscreen mode Exit fullscreen mode

Then create Terraform file such as iam.tf, main.tf and sagemaker.tf in ONE folder (available in Github).

  • iam.tf for create IAM role such as SageMaker execution role and SageMaker ECR policy.
  • main.tf for Terraform AWS version and AWS region.
  • sagemaker.tf for create SageMaker model, endpoint configuration and endpoint.
iam.tf

# SageMaker Execution Role
resource "aws_iam_role" "sagemaker" {
  name               = "sagemakerExecutionRole"
  assume_role_policy = jsonencode({
    Version   = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Service = "sagemaker.amazonaws.com"
        }
        Action = "sts:AssumeRole"
      }
    ]
  })
}

# SageMaker ECR Policy
resource "aws_iam_role_policy_attachment" "ecr" {
  role       = aws_iam_role.sagemaker.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess"
}
Enter fullscreen mode Exit fullscreen mode
main.tf

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "6.45.0"
    }
  }
}

provider "aws" {
  region = "us-west-2"
}
Enter fullscreen mode Exit fullscreen mode
sagemaker.tf

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

# Get AWS Account ID and AWS Region from above data
locals {
  account_id = data.aws_caller_identity.current.account_id
  region     = data.aws_region.current.region
}

# Create SageMaker Model with Gemma 4 on vLLM
resource "aws_sagemaker_model" "model" {
  name                = "gemma-4-receipt-extraction-vllm"
  execution_role_arn  = aws_iam_role.sagemaker.arn
  primary_container {
    image             = "${local.account_id}.dkr.ecr.${local.region}.amazonaws.com/vllm-gemma-4:0.19.1-sagemaker"
    environment       = {
      "SM_VLLM_MODEL" = "google/gemma-4-E4B-it"
    }
  }
}

# Create SageMaker Endpoint Configuration
resource "aws_sagemaker_endpoint_configuration" "endpointConfig" {
  name  = aws_sagemaker_model.model.name
  production_variants {
    variant_name           = "AllTraffic"
    model_name             = aws_sagemaker_model.model.name
    initial_instance_count = 1
    instance_type          = "ml.g6.2xlarge"
  }
}

# Create SageMaker Endpoint
resource "aws_sagemaker_endpoint" "endpoint" {
  name                 = aws_sagemaker_model.model.name
  endpoint_config_name = aws_sagemaker_endpoint_configuration.endpointConfig.name
}
Enter fullscreen mode Exit fullscreen mode

After 3 Terraform file is available, write and run this Terraform script.

terraform init
Enter fullscreen mode Exit fullscreen mode

terraform init is initialize Terraform latest version in main.tf

terraform plan
Enter fullscreen mode Exit fullscreen mode

terraform plan is check make sure your Terraform file is true and right format. If false and wrong format, you must check your Terraform file again.

terraform apply --auto-approve
Enter fullscreen mode Exit fullscreen mode

terraform apply is apply/create all AWS service resources based all Terraform file. If not write --auto-approve, you must write/enter "yes" every want to apply Terraform file.

terraform destroy --auto-approve
Enter fullscreen mode Exit fullscreen mode

terraform destroy is destroy/delete all AWS service resources based all Terraform file. If not write --auto-approve, you must write/enter "yes" every want to apply Terraform file.

Wait until SageMaker endpoint is inService. The result of this Terraform process are same as SageMaker model, endpoint configuration and endpoint screenshots above.

Create API using FastAPI then create main.py file, Dockerfile and requirements.txt file in ONE folder (available in Github).

Then create Amazon ECR private repository "receipt-extraction-gemma-4" with run this shell script.

aws ecr create-repository \
    --repository-name "receipt-extraction-gemma-4" \
    --image-scanning-configuration scanOnPush=false \
    --image-tag-mutability MUTABLE \
    --region "us-west-2" 2>/dev/null || echo "Repository already exists, skipping creation."
Enter fullscreen mode Exit fullscreen mode

However, when want to build API app and push to Amazon ECR private repository, display error like this screenshot.
Docker build error

To fix this error, use SageMaker Docker Build. SageMaker Docker Build is CLI tool for building Docker images in SageMaker Studio using AWS CodeBuild.

Install SageMaker Docker Build with run this shell script.

pip install sagemaker-studio-image-build
Enter fullscreen mode Exit fullscreen mode

After SageMaker Docker Build is installed, run this shell script to build and push to ECR and wait until finished.

sm-docker build . --repository receipt-extraction-gemma-4:latest
Enter fullscreen mode Exit fullscreen mode

Result of this process will be like this screenshot.
ECR receipt extraction image

This ECR receipt extraction image be used for next step using Amazon Elastic Container Service (Express Mode and custom mode) in next blog tutorial.

CONCLUSION :

  • SageMaker Studio can be used as a online IDE. If you have a laptop with limited specifications, I recommended this online IDE.
  • vLLM can be used for inference and serving on ECR.
  • Gemma 4 can running in SageMaker Endpoint.
  • Terraform help create automatically SageMaker resources faster and also destroy/delete all SageMaker resources faster.
  • FastAPI help create API for receipt extraction.

DOCUMENTATION :

Thank you,
Budi

Top comments (0)