Hello World 👋
I'm Vasil, a DevOps Engineer with a passion for building reliable, scalable, and well-architected cloud platforms. With hands-on experience across cloud infrastructure, CI/CD, observability, and platform engineering, I enjoy turning complex operational challenges into clean, automated solutions.
I've been working with AWS Cloud for over 5 years, and I believe it's high time I start exploring AI on AWS more deeply. Through these posts, I plan to share practical learnings, real-world experiences, and honest perspectives from my journey in DevOps, Cloud, and now AI.
Without further delay - let's dive in 🚀
Introduction
AWS offers a wide range of AI services today, from ready-to-use APIs like Rekognition and Textract to generative AI platforms such as Amazon Bedrock and SageMaker.
The problem is not lack of choice - it's knowing where to start without overcomplicating things.
This article is written for anyone who wants to learn AI on AWS the right way:
- without training models on day one
- without managing GPUs
- without turning a simple idea into a complex architecture
We'll look at how AWS AI services are structured, how they are typically used in real systems, and how to take your first practical step.
Prerequisites
You don't need a data science background to follow this.
You should have:
- An AWS account
- Basic familiarity with IAM and the AWS Console
- AWS CLI access (I'll be using AWS CloudShell, but a locally configured CLI works just as well)
If you can deploy a Lambda function or create an S3 bucket, you're ready.
Architecture Overview: How AI Fits Into AWS Applications
Most AI workloads on AWS follow a simple pattern.
Application → API → AI Service → Response
The AI service is not the application itself. It is just another managed AWS service, similar to DynamoDB or S3.
In practice, this usually means:
- API Gateway receives a request
- Lambda handles validation and logic
- An AI service (Bedrock, Textract, Comprehend, etc.) is called
- The result is returned or stored
This keeps the system:
- loosely coupled
- easy to scale
- aligned with the AWS Well-Architected Framework
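To make that concrete, here is a minimal Python (boto3) sketch showing that, from the application's point of view, an AI service client is created exactly like any other AWS service client (the service names below are standard boto3 client names):

import boto3

# All of these are ordinary managed-service clients from the SDK's perspective.
s3 = boto3.client("s3")
dynamodb = boto3.client("dynamodb")

# The AI services are no different - just more clients.
bedrock_runtime = boto3.client("bedrock-runtime")  # generative AI inference
textract = boto3.client("textract")                # document analysis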
Understanding AWS AI Services (In Simple Terms)
Amazon Bedrock
Amazon Bedrock provides access to foundation models through simple API calls.
You send a prompt → You get a response.
No model training. No infrastructure to manage.
This makes Bedrock a good starting point for:
- text generation
- summarization
- chat-style applications
From an architecture perspective, Bedrock behaves like a serverless inference API.
Amazon SageMaker
SageMaker is for cases where you need more control:
- training your own models
- fine-tuning existing ones
- running long-lived inference endpoints
If Bedrock is "plug and play", SageMaker is "build and operate".
For most beginners, SageMaker is not where you start.
Pre-Trained AI Services
AWS also provides purpose-built AI APIs:
- Rekognition → images and video
- Textract → documents
- Transcribe → speech
- Translate → language translation
- Comprehend → NLP analysis
These services solve very specific problems and integrate easily with S3 and Lambda.
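As a quick illustration, here is a short Python (boto3) sketch calling Comprehend for sentiment analysis; the sample text and region are placeholders, and the same one-call pattern applies to Rekognition, Textract, and the others:

import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

# One API call, no model to train or host.
result = comprehend.detect_sentiment(
    Text="Setting up AI on AWS was easier than I expected.",
    LanguageCode="en",
)
print(result["Sentiment"])        # e.g. POSITIVE
print(result["SentimentScore"])   # confidence score per sentiment class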
First Practical Step: Calling Amazon Bedrock
Rather than jumping into a full application, the safest way to start is to call Bedrock directly and understand how it behaves.
Step 1: Check Model Availability
Bedrock model availability varies by region, so first confirm what your chosen region offers:
aws bedrock list-foundation-models --region us-east-1
If this returns a list of models, your CLI and region are set up correctly. Keep in mind that some models still require you to enable access in the Bedrock console before you can invoke them.
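If you prefer the SDK, a rough Python (boto3) equivalent of the CLI call above looks like this; the filter on providerName is just an example and assumes the default response shape of list_foundation_models:

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# List the foundation models visible to this account in this region.
for model in bedrock.list_foundation_models()["modelSummaries"]:
    if model.get("providerName") == "Meta":   # optional filter, e.g. Meta models only
        print(model["modelId"])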
Step 2: Create an IAM Role (Security First)
Following the Security pillar, never use the root user or overly permissive roles.
Minimal IAM policy example:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": "*"
    }
  ]
}
Attach this policy to:
- A Lambda execution role, or
- An IAM user used for experimentation
Note: In production environments, this policy should be scoped down to specific foundation models and regions. Also note that the list commands used in this walkthrough additionally require the bedrock:ListFoundationModels and bedrock:ListInferenceProfiles actions.
Step 3: Invoke a Model
Example using Meta Llama:
aws bedrock-runtime invoke-model \
--region us-east-1 \
--model-id meta.llama3-2-3b-instruct-v1:0 \
--content-type application/json \
--accept application/json \
--cli-binary-format raw-in-base64-out \
--body '{
  "messages": [
    {
      "role": "user",
      "content": [
        { "text": "Explain AWS AI services in simple terms" }
      ]
    }
  ],
  "max_gen_len": 300,
  "temperature": 0.5
}' \
response.json
But wait — what are max_gen_len and temperature?
max_gen_len defines the maximum number of tokens the model is allowed to generate in the response. Lower values reduce cost and response time, while higher values allow more detailed outputs.
temperature controls how deterministic the response is. Lower values make outputs more predictable and focused, while higher values introduce more creativity and variation.
After running this command, we have successfully hit our first error!
Let’s try to solve this…
The error message is:
An error occurred (ValidationException) when calling the InvokeModel operation: Invocation of model ID meta.llama3-2-3b-instruct-v1:0 with on-demand throughput isn't supported. Retry your request with the ID or ARN of an inference profile that contains this model.
This error means the meta.llama3-2-3b-instruct-v1:0 model on Bedrock cannot be invoked directly with on-demand, pay-per-request throughput. Instead, Bedrock exposes it through an inference profile (a deployment configuration for the model, often cross-region), and you must pass the profile's ID or ARN as the model ID.
To resolve this, find the inference profile that contains the meta.llama3-2-3b-instruct-v1:0 model and use it in the request.
Refer to the AWS re:Post article below for more info:
https://repost.aws/questions/QUEU82wbYVQk2oU4eNwyiong/bedrock-api-invocation-error-on-demand-throughput-isn-s-supported
Let's now find the inference profile for our model:
aws bedrock list-inference-profiles --region us-east-1 | grep meta.llama3-2-3b-instruct-v1:0
Copy the inferenceProfileId (This will be our --model-id now)
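If you would rather do this lookup with the SDK, here is a rough Python (boto3) sketch; the field names (inferenceProfileSummaries, models, modelArn) are my reading of the ListInferenceProfiles response shape, so double-check them against your own output:

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Find the inference profile(s) that route to the Llama 3.2 3B model.
for profile in bedrock.list_inference_profiles()["inferenceProfileSummaries"]:
    model_arns = [m.get("modelArn", "") for m in profile.get("models", [])]
    if any("meta.llama3-2-3b-instruct-v1:0" in arn for arn in model_arns):
        print(profile["inferenceProfileId"])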
So the new command is -
aws bedrock-runtime invoke-model \
--region us-east-1 \
--model-id us.meta.llama3-2-3b-instruct-v1:0 \
--content-type application/json \
--accept application/json \
--cli-binary-format raw-in-base64-out \
--body '{
  "messages": [
    {
      "role": "user",
      "content": [
        { "text": "Explain AWS AI services in simple terms" }
      ]
    }
  ],
  "max_gen_len": 300,
  "temperature": 0.5
}' \
response.json
BUT, we have a new error!
This time the error is about the format of the request body, which means we are on the right track. Llama models on Bedrock expect a plain prompt field rather than a messages array, so let's fix the body accordingly.
aws bedrock-runtime invoke-model \
--region us-east-1 \
--model-id us.meta.llama3-2-3b-instruct-v1:0 \
--content-type application/json \
--accept application/json \
--cli-binary-format raw-in-base64-out \
--body '{
"prompt": "Explain AWS AI services in simple terms",
"max_gen_len": 300,
"temperature": 0.5
}' \
response.json
Then use the cat command (piped through jq for readability) to view the response stored in the response.json file:
cat response.json | jq
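If you want to pull out just the generated text, here is a small Python sketch; the field names (generation, prompt_token_count, generation_token_count) are what Llama models on Bedrock typically return, and other model families use different response schemas:

import json

# Read the file written by the invoke-model command above.
with open("response.json") as f:
    body = json.load(f)

print(body.get("generation", "").strip())           # the model's answer
print("input tokens: ", body.get("prompt_token_count"))
print("output tokens:", body.get("generation_token_count"))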
Well done! You’ve just successfully invoked generative AI on AWS — without writing an application or managing any infrastructure.
How This Fits Real Applications
In real systems:
- The prompt usually comes from an API request
- Lambda formats the request
- Bedrock generates a response
- The result may be stored in S3 or DynamoDB
The key idea is simple: AI is a service call, not a separate system.
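As a rough sketch (not a production implementation), this is roughly what that flow could look like inside a Python Lambda function, assuming an API Gateway proxy event whose JSON body carries a prompt field and reusing the inference profile ID from earlier:

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def lambda_handler(event, context):
    # API Gateway proxy integration delivers the request body as a JSON string.
    prompt = json.loads(event.get("body") or "{}").get("prompt", "")

    response = bedrock_runtime.invoke_model(
        modelId="us.meta.llama3-2-3b-instruct-v1:0",  # inference profile ID from above
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"prompt": prompt, "max_gen_len": 300, "temperature": 0.5}),
    )
    result = json.loads(response["body"].read())

    # In a real system you might also persist the result to S3 or DynamoDB here.
    return {
        "statusCode": 200,
        "body": json.dumps({"output": result.get("generation", "")}),
    }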
Cost and Quota Considerations
AI services are not free, and costs scale quickly if you are not careful.
Things to watch:
- Token count (both input and output) directly affects Bedrock cost
- Repeated calls inside loops can get expensive
- SageMaker endpoints are billed while running
Always start with:
- small inputs
- clear limits
- usage monitoring
Cost optimization is not optional — it’s part of good architecture.
Final Thoughts
Learning AI on AWS does not require a complex setup.
If you understand how to:
- call managed services
- secure them properly
- control costs
you are already on the right path.