Hello World 👋
I'm Vasil, a DevOps Engineer with a passion for building reliable, scalable, and well-architected cloud platforms. With hands-on experience across cloud infrastructure, CI/CD, observability, and platform engineering, I enjoy turning complex operational challenges into clean, automated solutions.
I've been working with AWS Cloud for over 5 years, and I believe it's high time I start exploring AI on AWS more deeply. Through these posts, I plan to share practical learnings, real-world experiences, and honest perspectives from my journey in DevOps, Cloud, and now AI.
Without further delay - let's dive in 🚀
Introduction
AWS offers a wide range of AI services today, from ready-to-use APIs like Rekognition and Textract to generative AI platforms such as Amazon Bedrock and SageMaker.
The problem is not lack of choice - it's knowing where to start without overcomplicating things.
This article is written for anyone who wants to learn AI on AWS the right way:
- without training models on day one
- without managing GPUs
- without turning a simple idea into a complex architecture
We'll look at how AWS AI services are structured, how they are typically used in real systems, and how to take your first practical step.
Prerequisites
You don't need a data science background to follow this.
You should have:
- An AWS account
- Basic familiarity with IAM and the AWS Console
- AWS CLI access (I'll be using AWS CloudShell, but a locally configured CLI works just as well)
If you can deploy a Lambda function or create an S3 bucket, you're ready.
Architecture Overview: How AI Fits Into AWS Applications
Most AI workloads on AWS follow a simple pattern.
Application → API → AI Service → Response
The AI service is not the application itself. It is just another managed AWS service, similar to DynamoDB or S3.
In practice, this usually means:
- API Gateway receives a request
- Lambda handles validation and logic
- An AI service (Bedrock, Textract, Comprehend, etc.) is called
- The result is returned or stored
This keeps the system:
- loosely coupled
- easy to scale
- aligned with the AWS Well-Architected Framework
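To make that concrete, here is a minimal Python (boto3) sketch showing that, from the application's point of view, an AI service client is created exactly like any other AWS service client (the service names below are standard boto3 client names):

import boto3

# All of these are ordinary managed-service clients from the SDK's perspective.
s3 = boto3.client("s3")
dynamodb = boto3.client("dynamodb")

# The AI services are no different - just more clients.
bedrock_runtime = boto3.client("bedrock-runtime")  # generative AI inference
textract = boto3.client("textract")                # document analysis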
Understanding AWS AI Services (In Simple Terms)
Amazon Bedrock
Amazon Bedrock provides access to foundation models through simple API calls.
You send a prompt → You get a response.
No model training. No infrastructure to manage.
This makes Bedrock a good starting point for:
- text generation
- summarization
- chat-style applications
From an architecture perspective, Bedrock behaves like a serverless inference API.
Amazon SageMaker
SageMaker is for cases where you need more control:
- training your own models
- fine-tuning existing ones
- running long-lived inference endpoints
If Bedrock is "plug and play", SageMaker is "build and operate".
For most beginners, SageMaker is not where you start.
Pre-Trained AI Services
AWS also provides purpose-built AI APIs:
- Rekognition → images and video
- Textract → documents
- Transcribe → speech
- Translate → language translation
- Comprehend → NLP analysis
These services solve very specific problems and integrate easily with S3 and Lambda.
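As a quick illustration, here is a short Python (boto3) sketch calling Comprehend for sentiment analysis; the sample text and region are placeholders, and the same one-call pattern applies to Rekognition, Textract, and the others:

import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

# One API call, no model to train or host.
result = comprehend.detect_sentiment(
    Text="Setting up AI on AWS was easier than I expected.",
    LanguageCode="en",
)
print(result["Sentiment"])        # e.g. POSITIVE
print(result["SentimentScore"])   # confidence score per sentiment class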
First Practical Step: Calling Amazon Bedrock
Rather than jumping into a full application, the safest way to start is to call Bedrock directly and understand how it behaves.
Step 1: Check Model Availability
Bedrock model availability varies by region, so first confirm what your chosen region offers:
aws bedrock list-foundation-models --region us-east-1
If this returns a list of models, your CLI and region are set up correctly. Keep in mind that some models still require you to enable access in the Bedrock console before you can invoke them.
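If you prefer the SDK, a rough Python (boto3) equivalent of the CLI call above looks like this; the filter on providerName is just an example and assumes the default response shape of list_foundation_models:

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# List the foundation models visible to this account in this region.
for model in bedrock.list_foundation_models()["modelSummaries"]:
    if model.get("providerName") == "Meta":   # optional filter, e.g. Meta models only
        print(model["modelId"])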
Step 2: Create an IAM Role (Security First)
Following the Security pillar, never use the root user or overly permissive roles.
Minimal IAM policy example:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": "*"
    }
  ]
}
Attach this policy to:
- A Lambda execution role, or
- An IAM user used for experimentation
Note: In production environments, this policy should be scoped down to specific foundation models and regions. Also note that the list commands used in this walkthrough additionally require the bedrock:ListFoundationModels and bedrock:ListInferenceProfiles actions.
Step 3: Invoke a Model
Example using Meta Llama:
aws bedrock-runtime invoke-model \
--region us-east-1 \
--model-id meta.llama3-2-3b-instruct-v1:0 \
--content-type application/json \
--accept application/json \
--cli-binary-format raw-in-base64-out \
--body '{
  "messages": [
    {
      "role": "user",
      "content": [
        { "text": "Explain AWS AI services in simple terms" }
      ]
    }
  ],
  "max_gen_len": 300,
  "temperature": 0.5
}' \
response.json
But wait — what are max_gen_len and temperature?
max_gen_len defines the maximum number of tokens the model is allowed to generate in the response. Lower values reduce cost and response time, while higher values allow more detailed outputs.
temperature controls how deterministic the response is. Lower values make outputs more predictable and focused, while higher values introduce more creativity and variation.
After running this command, we have successfully hit our first error!
Let’s try to solve this…
The error message is:
An error occurred (ValidationException) when calling the InvokeModel operation: Invocation of model ID meta.llama3-2-3b-instruct-v1:0 with on-demand throughput isn't supported. Retry your request with the ID or ARN of an inference profile that contains this model.
This error means the meta.llama3-2-3b-instruct-v1:0 model on Bedrock cannot be invoked directly with on-demand, pay-per-request throughput. Instead, Bedrock exposes it through an inference profile (a deployment configuration for the model, often cross-region), and you must pass the profile's ID or ARN as the model ID.
To resolve this, find the inference profile that contains the meta.llama3-2-3b-instruct-v1:0 model and use it in the request.
Refer to the AWS re:Post article below for more info:
https://repost.aws/questions/QUEU82wbYVQk2oU4eNwyiong/bedrock-api-invocation-error-on-demand-throughput-isn-s-supported
Let's now find the inference profile for our model:
aws bedrock list-inference-profiles --region us-east-1 | grep meta.llama3-2-3b-instruct-v1:0
Copy the inferenceProfileId (This will be our --model-id now)
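If you would rather do this lookup with the SDK, here is a rough Python (boto3) sketch; the field names (inferenceProfileSummaries, models, modelArn) are my reading of the ListInferenceProfiles response shape, so double-check them against your own output:

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Find the inference profile(s) that route to the Llama 3.2 3B model.
for profile in bedrock.list_inference_profiles()["inferenceProfileSummaries"]:
    model_arns = [m.get("modelArn", "") for m in profile.get("models", [])]
    if any("meta.llama3-2-3b-instruct-v1:0" in arn for arn in model_arns):
        print(profile["inferenceProfileId"])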
So the new command is -
aws bedrock-runtime invoke-model \
--region us-east-1 \
--model-id us.meta.llama3-2-3b-instruct-v1:0 \
--content-type application/json \
--accept application/json \
--cli-binary-format raw-in-base64-out \
--body '{
  "messages": [
    {
      "role": "user",
      "content": [
        { "text": "Explain AWS AI services in simple terms" }
      ]
    }
  ],
  "max_gen_len": 300,
  "temperature": 0.5
}' \
response.json
BUT, we have a new error!
This time the error is about the format of the request body, which means we are on the right track. Llama models on Bedrock expect a plain prompt field rather than a messages array, so let's fix the body accordingly.
aws bedrock-runtime invoke-model \
--region us-east-1 \
--model-id us.meta.llama3-2-3b-instruct-v1:0 \
--content-type application/json \
--accept application/json \
--cli-binary-format raw-in-base64-out \
--body '{
"prompt": "Explain AWS AI services in simple terms",
"max_gen_len": 300,
"temperature": 0.5
}' \
response.json
Then use the cat command (piped through jq for readability) to view the response stored in the response.json file:
cat response.json | jq
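If you want to pull out just the generated text, here is a small Python sketch; the field names (generation, prompt_token_count, generation_token_count) are what Llama models on Bedrock typically return, and other model families use different response schemas:

import json

# Read the file written by the invoke-model command above.
with open("response.json") as f:
    body = json.load(f)

print(body.get("generation", "").strip())           # the model's answer
print("input tokens: ", body.get("prompt_token_count"))
print("output tokens:", body.get("generation_token_count"))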
Well done! You’ve just successfully invoked generative AI on AWS — without writing an application or managing any infrastructure.
How This Fits Real Applications
In real systems:
- The prompt usually comes from an API request
- Lambda formats the request
- Bedrock generates a response
- The result may be stored in S3 or DynamoDB
The key idea is simple: AI is a service call, not a separate system.
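As a rough sketch (not a production implementation), this is roughly what that flow could look like inside a Python Lambda function, assuming an API Gateway proxy event whose JSON body carries a prompt field and reusing the inference profile ID from earlier:

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def lambda_handler(event, context):
    # API Gateway proxy integration delivers the request body as a JSON string.
    prompt = json.loads(event.get("body") or "{}").get("prompt", "")

    response = bedrock_runtime.invoke_model(
        modelId="us.meta.llama3-2-3b-instruct-v1:0",  # inference profile ID from above
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"prompt": prompt, "max_gen_len": 300, "temperature": 0.5}),
    )
    result = json.loads(response["body"].read())

    # In a real system you might also persist the result to S3 or DynamoDB here.
    return {
        "statusCode": 200,
        "body": json.dumps({"output": result.get("generation", "")}),
    }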
Cost and Quota Considerations
AI services are not free, and costs scale quickly if you are not careful.
Things to watch:
- Token count (both input and output) directly affects Bedrock cost
- Repeated calls inside loops can get expensive
- SageMaker endpoints are billed while running
Always start with:
- small inputs
- clear limits
- usage monitoring
Cost optimization is not optional — it’s part of good architecture.
Final Thoughts
Learning AI on AWS does not require a complex setup.
If you understand how to:
- call managed services
- secure them properly
- control costs
you are already on the right path.