Hello everyone. I want to create receipt extraction application using Amazon SageMaker AI. Amazon SageMaker AI is a fully managed service most comprehensive set of AI tools and capabilities to enable high-performance and low-cost AI model development for any use case.
In this blog tutorial, I using Amazon SageMaker AI products such as SageMaker Studio as my online IDE, SageMaker Model, SageMaker Endpoint Configuration and SageMaker Endpoint.
REQUIREMENTS :
- AWS account, you can sign up/sign in here
- vLLM image for inference and serving, you can see this link
- Gemma 4 model for open-source LLM, you can see this link
- Terraform AWS provider for Infrastructure as Code, you can see this link
- FastAPI for create API, you can see this link
STEP-BY-STEP :
Open Amazon SageMaker AI like this screenshot then click "Set up for a single user" for create SageMaker Studio.
Wait until SageMaker Studio is ready. After SageMaker Studio is ready, click "Open Studio".
This is what SageMaker Studio looks like. Click JupyterLab logo top left corner then click "Create JupyterLab space".
After space is created, click "Run space" then wait until show "Open JupyterLab" is available and status is Running.
Open SageMaker AI (not SageMaker Studio) console then click your Quick setup domain. Click "User profiles" -> click "default-..." then copy execution role for create SageMaker Endpoint step. For app configuration, click "Enable Docker on this domain" to can pull vLLM image and push to Amazon ECR.
Pull vLLM image and push to Amazon ECR with one shell script. In the terminal, run this shell script.
chmod +x vllm-to-ecr.sh
./vllm-to-ecr.sh
Search "0.19.1-gpu-py312-cu129-ubuntu22.04-sagemaker-v1.0" in image tags of vLLM because vLLM 0.19.1 version ready support Gemma 4 and use SageMaker version.
Open and check Amazon ECR private repository "vllm-gemma-4".
Open AWS IAM console then click Roles, search your IAM execution role that already copy before and add some policies because SageMaker Studio as a IDE needs to connect to AWS services. But what happens if you don't add some policies? Yes, access to AWS services is denied.
Then open AWS Service Quotas for request SageMaker Endpoint quota then search "SageMaker" then click "View quotas".
Write "g6.2x large for endpoint" then click "Request increase at account level".
Fill number in Increase quota value then click "Request".
In my case, my request was automatically Approved in just a few seconds. Then why must request an Endpoint quota? If do not request an Endpoint quota, will receive an error during the Endpoint creation process such as your endpoint instance are 0 and you need to request a Service Quota.
Then create SageMaker model, endpoint configuration and endpoint. (Code for this available in Github).
After SageMaker Endpoint is created, wait until SageMaker Endpoint is ready for inference and serving. Can see in Deployments -> Endpoints.
You also can see SageMaker model, endpoint configuration and endpoint in Amazon SageMaker AI console like this screenshot. Can see in Deployments & inference -> Deployable models, endpoints and endpoint configurations.
When SageMaker Endpoint is ready for inference and serving, now can test some sample photos (code available in Github) and display JSON-based structured output like this.
{"storeName": "OAK STREET MARKET", "purchaseDate": "26-10-2023", "total": 42.58}
NOTE : DELETE your SageMaker model, endpoint configuration and endpoint after testing because this endpoint use real-time endpoint that always running.
So why and how to create SageMaker model, endpoint configuration and endpoint using Terraform? Because Terraform is an Infrastructure as Code (IaC) tool, you can automatically create SageMaker resources faster and delete SageMaker resources with just one line of code.
Install Terraform with run this shell script.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv bash)"
brew install gcc
brew tap hashicorp/tap
brew install hashicorp/tap/terraform
terraform --version
Then create Terraform file such as iam.tf, main.tf and sagemaker.tf in ONE folder (available in Github).
- iam.tf for create IAM role such as SageMaker execution role and SageMaker ECR policy.
- main.tf for Terraform AWS version and AWS region.
- sagemaker.tf for create SageMaker model, endpoint configuration and endpoint.
iam.tf
# SageMaker Execution Role
resource "aws_iam_role" "sagemaker" {
name = "sagemakerExecutionRole"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Principal = {
Service = "sagemaker.amazonaws.com"
}
Action = "sts:AssumeRole"
}
]
})
}
# SageMaker ECR Policy
resource "aws_iam_role_policy_attachment" "ecr" {
role = aws_iam_role.sagemaker.name
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess"
}
main.tf
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "6.45.0"
}
}
}
provider "aws" {
region = "us-west-2"
}
sagemaker.tf
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}
# Get AWS Account ID and AWS Region from above data
locals {
account_id = data.aws_caller_identity.current.account_id
region = data.aws_region.current.region
}
# Create SageMaker Model with Gemma 4 on vLLM
resource "aws_sagemaker_model" "model" {
name = "gemma-4-receipt-extraction-vllm"
execution_role_arn = aws_iam_role.sagemaker.arn
primary_container {
image = "${local.account_id}.dkr.ecr.${local.region}.amazonaws.com/vllm-gemma-4:0.19.1-sagemaker"
environment = {
"SM_VLLM_MODEL" = "google/gemma-4-E4B-it"
}
}
}
# Create SageMaker Endpoint Configuration
resource "aws_sagemaker_endpoint_configuration" "endpointConfig" {
name = aws_sagemaker_model.model.name
production_variants {
variant_name = "AllTraffic"
model_name = aws_sagemaker_model.model.name
initial_instance_count = 1
instance_type = "ml.g6.2xlarge"
}
}
# Create SageMaker Endpoint
resource "aws_sagemaker_endpoint" "endpoint" {
name = aws_sagemaker_model.model.name
endpoint_config_name = aws_sagemaker_endpoint_configuration.endpointConfig.name
}
After 3 Terraform file is available, write and run this Terraform script.
terraform init
terraform init is initialize Terraform latest version in main.tf
terraform plan
terraform plan is check make sure your Terraform file is true and right format. If false and wrong format, you must check your Terraform file again.
terraform apply --auto-approve
terraform apply is apply/create all AWS service resources based all Terraform file. If not write --auto-approve, you must write/enter "yes" every want to apply Terraform file.
terraform destroy --auto-approve
terraform destroy is destroy/delete all AWS service resources based all Terraform file. If not write --auto-approve, you must write/enter "yes" every want to apply Terraform file.
Wait until SageMaker endpoint is inService. The result of this Terraform process are same as SageMaker model, endpoint configuration and endpoint screenshots above.
Create API using FastAPI then create main.py file, Dockerfile and requirements.txt file in ONE folder (available in Github).
Then create Amazon ECR private repository "receipt-extraction-gemma-4" with run this shell script.
aws ecr create-repository \
--repository-name "receipt-extraction-gemma-4" \
--image-scanning-configuration scanOnPush=false \
--image-tag-mutability MUTABLE \
--region "us-west-2" 2>/dev/null || echo "Repository already exists, skipping creation."
However, when want to build API app and push to Amazon ECR private repository, display error like this screenshot.
To fix this error, use SageMaker Docker Build. SageMaker Docker Build is CLI tool for building Docker images in SageMaker Studio using AWS CodeBuild.
Install SageMaker Docker Build with run this shell script.
pip install sagemaker-studio-image-build
After SageMaker Docker Build is installed, run this shell script to build and push to ECR and wait until finished.
sm-docker build . --repository receipt-extraction-gemma-4:latest
Result of this process will be like this screenshot.
This ECR receipt extraction image be used for next step using Amazon Elastic Container Service (Express Mode and custom mode) in next blog tutorial.
CONCLUSION :
- SageMaker Studio can be used as a online IDE. If you have a laptop with limited specifications, I recommended this online IDE.
- vLLM can be used for inference and serving on ECR.
- Gemma 4 can running in SageMaker Endpoint.
- Terraform help create automatically SageMaker resources faster and also destroy/delete all SageMaker resources faster.
- FastAPI help create API for receipt extraction.
DOCUMENTATION :
- Amazon SageMaker AI documentation
- vLLM documentation
- Gemma 4 model documentation
- Terraform AWS (Amazon SageMaker) documentation
- FastAPI documentation is same as above requirements.
Thank you,
Budi
Top comments (0)