Vasil Shaikh

Originally published at Medium

Getting Started with AI on AWS: A Practical Guide

[Image: Some AI services available on AWS]

Hello World 👋

I'm Vasil, a DevOps Engineer with a passion for building reliable, scalable, and well-architected cloud platforms. With hands-on experience across cloud infrastructure, CI/CD, observability, and platform engineering, I enjoy turning complex operational challenges into clean, automated solutions.
I've been working with AWS Cloud for over 5 years, and I believe it's high time I start exploring AI on AWS more deeply. Through these posts, I plan to share practical learnings, real-world experiences, and honest perspectives from my journey in DevOps, Cloud, and now AI.

Without further delay - let's dive in 🚀

Introduction

AWS offers a wide range of AI services today, from ready-to-use APIs like Rekognition and Textract to generative AI platforms such as Amazon Bedrock and SageMaker.
The problem is not lack of choice - it's knowing where to start without overcomplicating things.
This article is written for anyone who wants to learn AI on AWS the right way:

  • without training models on day one
  • without managing GPUs
  • without turning a simple idea into a complex architecture

We'll look at how AWS AI services are structured, how they are typically used in real systems, and how to take your first practical step.

Prerequisites

You don't need a data science background to follow this.
You should have:

  • An AWS account
  • Basic familiarity with IAM and the AWS Console
  • AWS CLI configured locally (I'll be using AWS CloudShell)

If you can deploy a Lambda function or create an S3 bucket, you're ready.


Architecture Overview: How AI Fits Into AWS Applications

Most AI workloads on AWS follow a simple pattern.

Application → API → AI Service → Response

The AI service is not the application itself. It is just another managed AWS service, similar to DynamoDB or S3.
In practice, this usually means:

  • API Gateway receives a request
  • Lambda handles validation and logic
  • An AI service (Bedrock, Textract, Comprehend, etc.) is called
  • The result is returned or stored

This keeps the system:

  • loosely coupled
  • easy to scale
  • aligned with the AWS Well-Architected Framework

Understanding AWS AI Services (In Simple Terms)

Amazon Bedrock

Amazon Bedrock provides access to foundation models through simple API calls.
You send a prompt → You get a response.
No model training. No infrastructure to manage.
This makes Bedrock a good starting point for:

  • text generation
  • summarization
  • chat-style applications

From an architecture perspective, Bedrock behaves like a serverless inference API.


Amazon SageMaker

SageMaker is for cases where you need more control:

  • training your own models
  • fine-tuning existing ones
  • running long-lived inference endpoints

If Bedrock is "plug and play", SageMaker is "build and operate".
For most beginners, SageMaker is not where you start.


Pre-Trained AI Services

AWS also provides purpose-built AI APIs:

  • Rekognition → images and video
  • Textract → documents
  • Transcribe → speech
  • Translate → language translation
  • Comprehend → NLP analysis

These services solve very specific problems and integrate easily with S3 and Lambda.
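
Each of these is a single API call away. As a quick taste, here is a Comprehend sentiment check straight from the CLI (the sample text is arbitrary):

aws comprehend detect-sentiment \
  --region us-east-1 \
  --language-code en \
  --text "The deployment went smoothly and the team is happy."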

First Practical Step: Calling Amazon Bedrock

Rather than jumping into a full application, the safest way to start is to call Bedrock directly and understand how it behaves.

Step 1: Check Model Availability

Bedrock model availability varies by region, so first check what your account can access.

aws bedrock list-foundation-models --region us-east-1


If this returns models, your account is ready.
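
The full output is verbose. If you only want the model IDs, a --query filter (any JMESPath expression works here) trims it down:

aws bedrock list-foundation-models \
  --region us-east-1 \
  --query "modelSummaries[].modelId" \
  --output table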

Step 2: Create an IAM Role (Security First)

Following the Security pillar of the Well-Architected Framework, never use the root user or overly permissive roles.

Minimal IAM policy example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": "*"
    }
  ]
}

Attach this policy to:

  • A Lambda execution role, or
  • An IAM user used for experimentation
Note: In production environments, this policy should be scoped down to specific foundation models and regions.
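
As a sketch of that scoping, the policy below limits access to the one Llama model used in this article plus its cross-region inference profile, which we will meet in Step 3 (the wildcard in the inference-profile ARN stands in for your account ID):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/meta.llama3-2-3b-instruct-v1:0",
        "arn:aws:bedrock:us-east-1:*:inference-profile/us.meta.llama3-2-3b-instruct-v1:0"
      ]
    }
  ]
}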

Step 3: Invoke a Model

Example using Meta Llama:

aws bedrock-runtime invoke-model \
  --region us-east-1 \
  --model-id meta.llama3-2-3b-instruct-v1:0 \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --body '{
    "messages": [
      {
        "role": "user",
        "content": [
          { "text": "Explain AWS AI services in simple terms" }
        ]
      }
    ],
    "max_gen_len": 300,
    "temperature": 0.5
  }' \
  response.json

But wait — what are max_gen_len and temperature?

max_gen_len defines the maximum number of tokens the model is allowed to generate in the response. Lower values reduce cost and response time, while higher values allow more detailed outputs.

temperature controls how deterministic the response is. Lower values make outputs more predictable and focused, while higher values introduce more creativity and variation.

After running this command, we have successfully hit our first error!

Let’s try to solve this…

The error message is -

An error occurred (ValidationException) when calling the InvokeModel operation: Invocation of model ID meta.llama3-2-3b-instruct-v1:0 with on-demand throughput isn't supported. Retry your request with the ID or ARN of an inference profile that contains this model.

This error means the meta.llama3-2-3b-instruct-v1:0 model on Bedrock cannot be called directly with on-demand throughput. It must be invoked through an inference profile (a deployment configuration for the model, often used to route requests across regions), and the profile's ID or ARN is what you pass as the model ID.

To resolve this, we need an inference profile that contains the meta.llama3-2-3b-instruct-v1:0 model.

Refer to the AWS re:Post article below for more info — https://repost.aws/questions/QUEU82wbYVQk2oU4eNwyiong/bedrock-api-invocation-error-on-demand-throughput-isn-s-supported

Let's now find the inference profile for our model. The list-inference-profiles output is JSON, so jq (which we'll use again shortly) can pull the profile ID out directly:

aws bedrock list-inference-profiles --region us-east-1 \
  | jq -r '.inferenceProfileSummaries[] | select(.models[].modelArn | contains("meta.llama3-2-3b-instruct-v1:0")) | .inferenceProfileId'

Copy the inferenceProfileId (this will be our --model-id now).

So the new command is -

aws bedrock-runtime invoke-model \
  --region us-east-1 \
  --model-id us.meta.llama3-2-3b-instruct-v1:0 \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --body '{
    "messages": [
      {
        "role": "user",
        "content": [
          { "text": "Explain AWS AI services in simple terms" }
        ]
      }
    ],
    "max_gen_len": 300,
    "temperature": 0.5
  }' \
  response.json

BUT, we have a new error!

But this time the error complains about the message format in the body, which means we are on the right track. Meta Llama models on Bedrock expect a single prompt string in their native request format rather than a messages array, so let's fix the body:

aws bedrock-runtime invoke-model \
  --region us-east-1 \
  --model-id us.meta.llama3-2-3b-instruct-v1:0 \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --body '{
    "prompt": "Explain AWS AI services in simple terms",
    "max_gen_len": 300,
    "temperature": 0.5
  }' \
  response.json

Then view the response stored in response.json, using cat piped through jq to pretty-print it:

cat response.json | jq
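
If you only need the generated text, jq can extract it directly. For Meta Llama models the output text is in a generation field (other model families use different field names):

jq -r '.generation' response.json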

Well done! You’ve just successfully invoked generative AI on AWS — without writing an application or managing any infrastructure.

How This Fits Real Applications

In real systems:

  • The prompt usually comes from an API request
  • Lambda formats the request
  • Bedrock generates a response
  • The result may be stored in S3 or DynamoDB

The key idea is simple: AI is a service call, not a separate system.
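
In the CLI terms we have been using, that entire pipeline is only two calls. A minimal sketch (in a real system the prompt would come from the API request, and the bucket name below is a placeholder):

# Build the request body, invoke Bedrock, and persist the raw output
PROMPT="Explain AWS AI services in simple terms"

aws bedrock-runtime invoke-model \
  --region us-east-1 \
  --model-id us.meta.llama3-2-3b-instruct-v1:0 \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --body "{\"prompt\": \"$PROMPT\", \"max_gen_len\": 300, \"temperature\": 0.5}" \
  response.json

# Store the result for downstream use (replace with your own bucket)
aws s3 cp response.json s3://your-demo-bucket/bedrock-responses/response.json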

Cost and Quota Considerations

AI services are not free, and costs scale quickly if you are not careful.

Things to watch:

  • Token count (both input and output) directly affects Bedrock cost
  • Repeated calls inside loops can get expensive
  • SageMaker endpoints are billed while running

Always start with:

  • small inputs
  • clear limits
  • usage monitoring

Cost optimization is not optional — it’s part of good architecture.
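
One practical habit: the Llama response body already reports usage, so you can check both sides of the token count from the same response.json (these field names are specific to Meta Llama models):

jq '{input_tokens: .prompt_token_count, output_tokens: .generation_token_count}' response.json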

Final Thoughts

Learning AI on AWS does not require a complex setup.

If you understand how to:

  • call managed services
  • secure them properly
  • control costs

you are already on the right path.
