<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Saksham Paliwal</title>
    <description>The latest articles on DEV Community by Saksham Paliwal (@sakshampaliwal).</description>
    <link>https://dev.to/sakshampaliwal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F651285%2F765a51fc-ea02-4fac-bd65-25b3827b63e4.jpg</url>
      <title>DEV Community: Saksham Paliwal</title>
      <link>https://dev.to/sakshampaliwal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sakshampaliwal"/>
    <language>en</language>
    <item>
      <title>GuardDuty: Your AWS Watchdog</title>
      <dc:creator>Saksham Paliwal</dc:creator>
      <pubDate>Wed, 21 Jan 2026 17:45:55 +0000</pubDate>
      <link>https://dev.to/sakshampaliwal/guardduty-your-aws-watchdog-3nfj</link>
      <guid>https://dev.to/sakshampaliwal/guardduty-your-aws-watchdog-3nfj</guid>
      <description>&lt;p&gt;You deployed your first real production app to AWS last month.&lt;/p&gt;

&lt;p&gt;It's running. Users are happy.&lt;/p&gt;

&lt;p&gt;And then someone on the security team slacks you: "Hey, we're seeing some weird API calls from your account. You spinning up instances in regions you don't use?"&lt;/p&gt;

&lt;p&gt;You weren't.&lt;/p&gt;

&lt;p&gt;That's the moment most of us first hear about GuardDuty.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why does this even exist?
&lt;/h2&gt;

&lt;p&gt;Let's go back to around 2013-2014.&lt;/p&gt;

&lt;p&gt;AWS was growing fast. More companies were moving real workloads to the cloud. And attackers noticed.&lt;/p&gt;

&lt;p&gt;Here's what was happening: someone would steal AWS credentials, maybe from a leaked GitHub repo or a phishing attack. They'd quietly spin up hundreds of EC2 instances to mine cryptocurrency. Or exfiltrate data from S3 buckets. Or scan for vulnerabilities across entire VPCs.&lt;/p&gt;

&lt;p&gt;Companies wouldn't notice for days. Sometimes weeks.&lt;/p&gt;

&lt;p&gt;Because here's the thing: AWS gives you logs. CloudTrail logs every API call. VPC Flow Logs show network traffic. DNS logs capture queries.&lt;/p&gt;

&lt;p&gt;But nobody was actually watching them in real time.&lt;/p&gt;

&lt;p&gt;Security teams were drowning in log files. Trying to spot malicious patterns manually was like finding a needle in a haystack the size of a data center.&lt;/p&gt;

&lt;p&gt;AWS needed a service that would just... watch. Constantly. And yell when something looked wrong.&lt;/p&gt;

&lt;p&gt;That's why GuardDuty launched in 2017.&lt;/p&gt;

&lt;h2&gt;
  
  
  So what actually is it?
&lt;/h2&gt;

&lt;p&gt;GuardDuty is a threat detection service.&lt;/p&gt;

&lt;p&gt;It continuously monitors your AWS account for malicious or unauthorized behavior. Think of it as a security camera that never sleeps and actually knows what suspicious looks like.&lt;/p&gt;

&lt;p&gt;It analyzes three main data sources automatically:&lt;/p&gt;

&lt;p&gt;VPC Flow Logs (your network traffic), CloudTrail events (API calls and management actions), and DNS logs (what your resources are communicating with).&lt;/p&gt;

&lt;p&gt;You don't send GuardDuty these logs. It accesses them directly. You just turn it on.&lt;/p&gt;

&lt;h2&gt;
  
  
  What does it actually catch?
&lt;/h2&gt;

&lt;p&gt;Here's where it gets practical.&lt;/p&gt;

&lt;p&gt;GuardDuty looks for patterns that indicate real attacks. Not just theoretical vulnerabilities, but actual "someone is doing something bad right now" situations.&lt;/p&gt;

&lt;p&gt;Common things it detects:&lt;/p&gt;

&lt;p&gt;Compromised EC2 instances. Like when your instance starts communicating with known malware command-and-control servers. Or when it's suddenly being used for cryptocurrency mining.&lt;/p&gt;

&lt;p&gt;Stolen credentials. If someone's using your IAM credentials from an unusual location or making API calls they've never made before, GuardDuty notices.&lt;/p&gt;

&lt;p&gt;Reconnaissance activity. When attackers are probing your infrastructure, port scanning, or trying to map your network.&lt;/p&gt;

&lt;p&gt;Data exfiltration attempts. Unusual data transfers or access patterns that look like someone's trying to steal information.&lt;/p&gt;

&lt;p&gt;The findings show up in your AWS console with a severity level: Low, Medium, or High.&lt;/p&gt;

&lt;p&gt;Each finding explains what happened, which resources are involved, and suggests what to do about it.&lt;/p&gt;
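&lt;p&gt;Those labels come from a numeric severity score attached to each finding. The thresholds below follow GuardDuty's documented ranges, but treat the helper itself as an illustrative sketch, not anything from the AWS SDK:&lt;/p&gt;

```python
def severity_label(severity):
    """Map GuardDuty's numeric severity score to its label.

    GuardDuty documents these ranges: 7.0-8.9 High,
    4.0-6.9 Medium, 0.1-3.9 Low.
    """
    if severity >= 7.0:
        return "High"
    if severity >= 4.0:
        return "Medium"
    return "Low"

# Hypothetical findings -- deciding which ones are page-worthy.
findings = [
    {"type": "Recon:EC2/PortProbeUnprotectedPort", "severity": 2.0},
    {"type": "UnauthorizedAccess:EC2/SSHBruteForce", "severity": 5.0},
    {"type": "CryptoCurrency:EC2/BitcoinTool.B", "severity": 8.0},
]
urgent = [f["type"] for f in findings if severity_label(f["severity"]) == "High"]
print(urgent)  # → ['CryptoCurrency:EC2/BitcoinTool.B']
```

In practice you'd read the score off the finding JSON that GuardDuty emits, but the triage logic looks a lot like this.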

&lt;h2&gt;
  
  
  When do people actually use this?
&lt;/h2&gt;

&lt;p&gt;Honestly? Most teams turn it on and forget about it.&lt;/p&gt;

&lt;p&gt;That's kind of the point.&lt;/p&gt;

&lt;p&gt;If you're running anything in production, you should probably have GuardDuty enabled. It's not something you "use" actively like you'd use CloudFormation or Lambda.&lt;/p&gt;

&lt;p&gt;You enable it. Set up alerts (usually SNS to Slack or PagerDuty). And then it just runs in the background.&lt;/p&gt;

&lt;p&gt;The real question is what you do when it alerts you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost thing nobody talks about upfront
&lt;/h2&gt;

&lt;p&gt;GuardDuty isn't free.&lt;/p&gt;

&lt;p&gt;It charges based on the volume of data it analyzes: CloudTrail events, VPC Flow Log volume, and DNS queries.&lt;/p&gt;

&lt;p&gt;For a small account, you might pay $5-20 a month. For larger production environments with lots of traffic, it can be hundreds.&lt;/p&gt;

&lt;p&gt;There's a 30-day free trial though. Most people start there to see what the actual cost looks like for their usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting it up is weirdly simple
&lt;/h2&gt;

&lt;p&gt;You literally just enable it in the console.&lt;/p&gt;

&lt;p&gt;No agents to install. No log shipping to configure. No complex rules to write.&lt;/p&gt;

&lt;p&gt;Go to the GuardDuty section in AWS Console, click "Get Started," click "Enable GuardDuty."&lt;/p&gt;

&lt;p&gt;That's it.&lt;/p&gt;

&lt;p&gt;Within minutes it starts analyzing your account activity. Findings appear in the console as they're detected.&lt;/p&gt;

&lt;p&gt;If you want to get fancy, you can set up automated responses using EventBridge. Like automatically isolating a compromised instance or revoking suspicious credentials.&lt;/p&gt;

&lt;p&gt;But honestly? Start simple. Enable it, hook up an SNS topic for alerts, and learn what normal findings look like for your environment.&lt;/p&gt;
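&lt;p&gt;To make the EventBridge part concrete: GuardDuty findings arrive as events with source &lt;code&gt;aws.guardduty&lt;/code&gt; and detail-type &lt;code&gt;GuardDuty Finding&lt;/code&gt;, and an EventBridge rule pattern selects them. The matcher below is a deliberately simplified local stand-in for what EventBridge does, just to show the shape of the pattern:&lt;/p&gt;

```python
# The pattern you'd attach to an EventBridge rule. The "source" and
# "detail-type" values are what GuardDuty emits; everything else here
# is a simplified illustration, not the real EventBridge engine.
pattern = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
}

def matches(event, pattern):
    """Simplified EventBridge matching: every pattern key must have
    the event's value somewhere in the allowed list."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())

finding_event = {
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Finding",
    "detail": {"severity": 8.0, "type": "CryptoCurrency:EC2/BitcoinTool.B"},
}

print(matches(finding_event, pattern))  # → True
```

A rule with that pattern would then target an SNS topic or a Lambda that does the actual isolation or credential revocation.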

&lt;h2&gt;
  
  
  The multi-account reality
&lt;/h2&gt;

&lt;p&gt;Here's something that confused me early on.&lt;/p&gt;

&lt;p&gt;If you're using AWS Organizations (and most companies are), GuardDuty works across all your accounts. You designate one account as the GuardDuty administrator, and it can monitor findings from all member accounts.&lt;/p&gt;

&lt;p&gt;This is huge for companies with dozens or hundreds of AWS accounts.&lt;/p&gt;

&lt;p&gt;You don't want your security team checking GuardDuty in 50 different accounts. Centralized monitoring makes way more sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it doesn't do
&lt;/h2&gt;

&lt;p&gt;GuardDuty won't prevent attacks.&lt;/p&gt;

&lt;p&gt;It detects them. Big difference.&lt;/p&gt;

&lt;p&gt;It's not a firewall. It's not blocking malicious traffic. It's not automatically remediating issues.&lt;/p&gt;

&lt;p&gt;It's telling you "hey, this thing that just happened looks really suspicious."&lt;/p&gt;

&lt;p&gt;What you do about it is up to you.&lt;/p&gt;

&lt;p&gt;That's why most teams pair GuardDuty with other services. Security Hub for centralized security management. Systems Manager for automated remediation. WAF for actual blocking at the application layer.&lt;/p&gt;

&lt;p&gt;GuardDuty is one piece of your security setup, not the entire thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The findings you'll actually see
&lt;/h2&gt;

&lt;p&gt;When you first enable GuardDuty, you might see findings immediately. Or you might see nothing for weeks.&lt;/p&gt;

&lt;p&gt;Common early findings that freak people out but are usually fine:&lt;/p&gt;

&lt;p&gt;"UnauthorizedAccess:EC2/SSHBruteForce" - Someone's trying to brute force SSH on your instances. If they're internet-facing, this happens constantly. Make sure you're using key-based auth and maybe restrict IPs.&lt;/p&gt;

&lt;p&gt;"Recon:EC2/PortProbeUnprotectedPort" - Someone's scanning your ports. Again, super common if you have public IPs. Review your security groups.&lt;/p&gt;

&lt;p&gt;The scary findings are the High severity ones about compromised credentials or instances communicating with known malicious IPs. Those need immediate attention.&lt;/p&gt;

&lt;h2&gt;
  
  
  Learning what's normal for your environment
&lt;/h2&gt;

&lt;p&gt;Here's something they don't tell you in the docs.&lt;/p&gt;

&lt;p&gt;Every AWS environment has its own "normal."&lt;/p&gt;

&lt;p&gt;You'll get findings that are false positives for your use case. Maybe you have a legitimate reason to access AWS from multiple countries. Maybe your application's API call patterns just look unusual.&lt;/p&gt;

&lt;p&gt;You can suppress findings that aren't relevant. Or adjust your alerting so you're not getting paged for Low severity findings at 3am.&lt;/p&gt;

&lt;p&gt;This tuning process usually takes a few weeks. Don't expect perfect signal-to-noise on day one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why you should probably just turn it on
&lt;/h2&gt;

&lt;p&gt;Look, I get it. Another AWS service. Another thing to monitor. Another bill.&lt;/p&gt;

&lt;p&gt;But here's the reality: if someone compromises your AWS account and you don't catch it quickly, the cost of that incident will make GuardDuty's monthly fee look like pocket change.&lt;/p&gt;

&lt;p&gt;Plus, if you're working at a company with any kind of compliance requirements (SOC 2, HIPAA, PCI), having GuardDuty enabled is basically table stakes. Auditors love seeing it.&lt;/p&gt;

&lt;p&gt;Even if you're a solo developer running a side project, the free tier gives you a month to see what it catches. You might be surprised.&lt;/p&gt;




&lt;p&gt;Enable it in one account. See what findings you get. Learn what they mean.&lt;/p&gt;

&lt;p&gt;You don't need to become a security expert overnight. Just having visibility into what's happening in your AWS account is already a huge step forward from where most teams were a few years ago.&lt;/p&gt;

&lt;p&gt;And who knows, maybe you'll catch something weird before it becomes a real problem. That's kinda the whole point, right?&lt;/p&gt;

</description>
      <category>guardduty</category>
      <category>aws</category>
      <category>security</category>
      <category>devsecops</category>
    </item>
    <item>
      <title>AWS Kinesis: What It Is and Why It Exists</title>
      <dc:creator>Saksham Paliwal</dc:creator>
      <pubDate>Tue, 20 Jan 2026 17:32:26 +0000</pubDate>
      <link>https://dev.to/sakshampaliwal/aws-kinesis-what-it-is-and-why-it-exists-agc</link>
      <guid>https://dev.to/sakshampaliwal/aws-kinesis-what-it-is-and-why-it-exists-agc</guid>
      <description>&lt;p&gt;You're building something.&lt;/p&gt;

&lt;p&gt;Maybe it's a web app. Maybe it's an analytics dashboard. Maybe it's just a service that needs to log some events.&lt;/p&gt;

&lt;p&gt;And then someone on your team says, "we should use Kinesis for this."&lt;/p&gt;

&lt;p&gt;And you're like... what? Why? We already have databases. We have queues. We have S3. Why do we need &lt;em&gt;another&lt;/em&gt; AWS service?&lt;/p&gt;

&lt;p&gt;Yeah, I've been there.&lt;/p&gt;

&lt;p&gt;Let me walk you through this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why does Kinesis even exist?
&lt;/h2&gt;

&lt;p&gt;Here's the thing.&lt;/p&gt;

&lt;p&gt;Around 2013, companies like Netflix and Amazon were dealing with a very specific problem. They had millions of users generating data constantly. Clicks, views, searches, purchases, errors, all happening at the same time.&lt;/p&gt;

&lt;p&gt;They needed to process this data &lt;em&gt;as it arrived&lt;/em&gt;. Not in batches. Not overnight. Right now.&lt;/p&gt;

&lt;p&gt;Traditional databases? Too slow. They're built for storing and querying, not for handling thousands of writes per second from different sources.&lt;/p&gt;

&lt;p&gt;Message queues like SQS? Better, but they're designed for job processing, not for streaming massive amounts of continuous data to multiple consumers at once.&lt;/p&gt;

&lt;p&gt;So AWS built Kinesis.&lt;/p&gt;

&lt;p&gt;It was inspired by Apache Kafka (which came out earlier), but made simpler and fully managed for AWS users.&lt;/p&gt;

&lt;p&gt;The core idea was simple: give developers a way to ingest, buffer, and process real-time streaming data without managing servers or worrying about scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually is Kinesis?
&lt;/h2&gt;

&lt;p&gt;Think of Kinesis as a super fast conveyor belt for data.&lt;/p&gt;

&lt;p&gt;You put data onto the belt (producers send records). The belt moves continuously. Multiple teams can watch the belt and grab what they need (consumers read records). The belt keeps moving.&lt;/p&gt;

&lt;p&gt;That's it.&lt;/p&gt;

&lt;p&gt;More technically, Kinesis is a managed service that lets you collect, process, and analyze streaming data in real time.&lt;/p&gt;

&lt;p&gt;It's not a database. It's not a queue in the traditional sense. It's a &lt;em&gt;stream&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wait, what's the difference between a stream and a queue?
&lt;/h2&gt;

&lt;p&gt;Good question!&lt;/p&gt;

&lt;p&gt;A queue like SQS is meant for one-to-one delivery. You send a message, one consumer picks it up, it's gone.&lt;/p&gt;

&lt;p&gt;A stream like Kinesis is meant for one-to-many delivery. You send a record, it stays in the stream for a while (24 hours by default, up to 365 days if you configure it). Multiple consumers can read the same record independently.&lt;/p&gt;

&lt;p&gt;Also, streams preserve order within a partition. Queues don't guarantee order unless you use FIFO queues with extra config.&lt;/p&gt;

&lt;p&gt;Streams are for high-throughput, real-time data pipelines. Queues are for task distribution and decoupling services.&lt;/p&gt;

&lt;h2&gt;
  
  
  When do people actually use Kinesis?
&lt;/h2&gt;

&lt;p&gt;Real-time analytics is a big one.&lt;/p&gt;

&lt;p&gt;Let's say you're building a gaming app. You want to track every player action, analyze patterns, detect cheating, update leaderboards, all in real time. Kinesis can ingest millions of events per second and let different services consume that data simultaneously.&lt;/p&gt;

&lt;p&gt;Log and event data collection is another common use case.&lt;/p&gt;

&lt;p&gt;Instead of writing logs directly to S3 or CloudWatch (which can get expensive or slow), you stream logs to Kinesis. Then you can fan out to multiple destinations: one consumer writes to S3 for long-term storage, another sends to Elasticsearch for searching, another triggers Lambda functions for alerts.&lt;/p&gt;

&lt;p&gt;IoT data ingestion also fits perfectly.&lt;/p&gt;

&lt;p&gt;Thousands of devices sending sensor data every second? Kinesis handles it. You can process the data in real time, store it, run machine learning models on it, whatever you need.&lt;/p&gt;

&lt;p&gt;Clickstream analysis for websites is super common too.&lt;/p&gt;

&lt;p&gt;Every click, scroll, hover gets sent to Kinesis. Your analytics team reads the stream to build dashboards. Your recommendation engine reads the same stream to personalize content. Your data science team reads it to train models.&lt;/p&gt;

&lt;p&gt;One stream, multiple consumers, all happening live.&lt;/p&gt;

&lt;h2&gt;
  
  
  The different flavors of Kinesis
&lt;/h2&gt;

&lt;p&gt;AWS actually has a few different Kinesis services, which is confusing at first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kinesis Data Streams&lt;/strong&gt; is the core service. This is what most people mean when they say "Kinesis." You manage capacity (shards), you control retention, you write producers and consumers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kinesis Data Firehose&lt;/strong&gt; is the simpler version. You just point it at a destination (S3, Redshift, Elasticsearch, etc.), and it automatically delivers your streaming data there. No consumers to write. Great for simple ETL pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kinesis Data Analytics&lt;/strong&gt; lets you run SQL queries on streaming data. Useful if you want to do transformations or aggregations in real time without writing code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kinesis Video Streams&lt;/strong&gt; is for video, which is a whole different thing. Not relevant for most backend use cases.&lt;/p&gt;

&lt;p&gt;For now, just know Data Streams exists. That's the foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  A super basic example
&lt;/h2&gt;

&lt;p&gt;Let's say you're tracking user sign-ups.&lt;/p&gt;

&lt;p&gt;You could write sign-up events directly to a database. But what if you also want to send a welcome email, update analytics, sync to a CRM, and trigger a Slack notification?&lt;/p&gt;

&lt;p&gt;You'd have to call all those services from your sign-up endpoint. If one fails, you have to handle retries. If you add a new integration, you have to modify the sign-up code.&lt;/p&gt;

&lt;p&gt;With Kinesis, you just write the sign-up event to the stream. Done.&lt;/p&gt;

&lt;p&gt;Then you have separate consumers: one Lambda function sends the email, another updates analytics, another syncs to your CRM, another posts to Slack.&lt;/p&gt;

&lt;p&gt;Each consumer is independent. If one breaks, the others keep working. The stream keeps the data for 24 hours (or longer), so even if a consumer is down, it can catch up later.&lt;/p&gt;

&lt;p&gt;Decoupled, scalable, resilient.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does it actually work under the hood?
&lt;/h2&gt;

&lt;p&gt;Kinesis Data Streams is made up of &lt;strong&gt;shards&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A shard is basically a unit of capacity. Each shard can handle 1 MB/sec (or 1,000 records/sec) of writes and 2 MB/sec of reads.&lt;/p&gt;

&lt;p&gt;If you need more throughput, you add more shards. AWS handles the infrastructure.&lt;/p&gt;

&lt;p&gt;When you write a record to Kinesis, you specify a &lt;strong&gt;partition key&lt;/strong&gt;. Kinesis hashes that key to decide which shard gets the record.&lt;/p&gt;

&lt;p&gt;Records with the same partition key always go to the same shard, which means they're ordered relative to each other.&lt;/p&gt;

&lt;p&gt;Consumers read from shards and process records in order within each shard.&lt;/p&gt;

&lt;p&gt;You don't have to think about this too much at first, but it's good to know.&lt;/p&gt;
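&lt;p&gt;The partition-key routing can be sketched locally. Kinesis takes the MD5 hash of the partition key as a 128-bit integer and routes the record to whichever shard owns that slice of the hash-key range. This toy version assumes an even split of the range across shards (real streams can have uneven ranges after resharding), so treat it as a sketch rather than the SDK's implementation:&lt;/p&gt;

```python
import hashlib

NUM_SHARDS = 4
HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value

def shard_for(partition_key, num_shards=NUM_SHARDS):
    """Route a record the way Kinesis does: MD5-hash the partition
    key, then find which shard's hash-key range it falls into
    (even split assumed here)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_int = int(digest, 16)
    return hash_int * num_shards // HASH_SPACE

# Same key, same shard -- which is why per-user ordering holds.
print(shard_for("user-42") == shard_for("user-42"))  # → True
```

That's also the mental model for choosing a partition key: pick something with enough distinct values (user ID, device ID) so records spread evenly across shards.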

&lt;h2&gt;
  
  
  The catch (because there's always a catch)
&lt;/h2&gt;

&lt;p&gt;Kinesis isn't free.&lt;/p&gt;

&lt;p&gt;You pay per shard-hour, plus data ingestion and retrieval costs. If you're processing a lot of data, it adds up.&lt;/p&gt;

&lt;p&gt;You also have to manage shard scaling. If your traffic spikes, you might need to manually increase shards, set up auto-scaling, or switch the stream to on-demand capacity mode.&lt;/p&gt;

&lt;p&gt;There's also a learning curve.&lt;/p&gt;

&lt;p&gt;Writing a producer is pretty straightforward. But building a reliable consumer that handles retries, checkpointing, and shard rebalancing? That takes some work. AWS provides the Kinesis Client Library (KCL) to help, but it's still more complex than, say, using SQS.&lt;/p&gt;

&lt;p&gt;And if you don't need real-time processing, Kinesis might be overkill.&lt;/p&gt;

&lt;p&gt;If batch processing once an hour is fine, just write to S3 and process later. Simpler, cheaper.&lt;/p&gt;

&lt;h2&gt;
  
  
  So when should I actually use it?
&lt;/h2&gt;

&lt;p&gt;Use Kinesis when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need to process data in real time or near real time&lt;/li&gt;
&lt;li&gt;You have multiple consumers that need the same data&lt;/li&gt;
&lt;li&gt;You need to preserve order within a partition&lt;/li&gt;
&lt;li&gt;You're dealing with high-throughput streaming data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Don't use Kinesis when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Batch processing is good enough&lt;/li&gt;
&lt;li&gt;You only have one consumer (use SQS instead)&lt;/li&gt;
&lt;li&gt;You need long-term storage as the primary goal (use S3)&lt;/li&gt;
&lt;li&gt;You're just getting started and want the simplest solution (start simple, add Kinesis later if you need it)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What about Kafka?
&lt;/h2&gt;

&lt;p&gt;Yeah, Kafka is the open-source equivalent.&lt;/p&gt;

&lt;p&gt;Kinesis is easier to set up (fully managed), but Kafka gives you more control and can be cheaper at scale if you run it yourself (or use a managed service like Confluent or MSK).&lt;/p&gt;

&lt;p&gt;If you're already deep in the AWS ecosystem, Kinesis is probably easier.&lt;/p&gt;

&lt;p&gt;If you need multi-cloud or have very specific requirements, Kafka might be better.&lt;/p&gt;

&lt;p&gt;Honestly, the concepts are similar enough that learning one helps you understand the other.&lt;/p&gt;




&lt;p&gt;It's one of those tools that makes way more sense once you hit a specific problem. You'll know when you need it because you'll be sitting there trying to process a flood of real-time events and thinking, "there has to be a better way."&lt;/p&gt;

&lt;p&gt;That's when you reach for Kinesis.&lt;/p&gt;

&lt;p&gt;Until then? Just know it exists. Know roughly what it does. And when the time comes, you'll know where to look.&lt;/p&gt;

&lt;p&gt;You're doing great. Keep building, keep learning, and don't stress about knowing every AWS service by heart. Nobody does!&lt;/p&gt;

</description>
      <category>kinesis</category>
      <category>aws</category>
      <category>devops</category>
      <category>mlops</category>
    </item>
    <item>
<title>What is AWS Bedrock?</title>
      <dc:creator>Saksham Paliwal</dc:creator>
      <pubDate>Mon, 19 Jan 2026 15:50:23 +0000</pubDate>
      <link>https://dev.to/sakshampaliwal/what-is-aws-bedrock-50da</link>
      <guid>https://dev.to/sakshampaliwal/what-is-aws-bedrock-50da</guid>
      <description>&lt;p&gt;You're sitting in a sprint planning meeting and someone says, "hey, what if we add AI to our customer support?"&lt;/p&gt;

&lt;p&gt;And your first thought is probably... "oh no."&lt;/p&gt;

&lt;p&gt;Because you've heard the stories. Training models. Managing GPUs. Hiring ML engineers. Spending months just to get something basic working.&lt;/p&gt;

&lt;p&gt;That's exactly the problem AWS Bedrock was built to solve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Does Bedrock Even Exist?
&lt;/h2&gt;

&lt;p&gt;Let's rewind a bit.&lt;/p&gt;

&lt;p&gt;Around 2022-2023, companies were going absolutely wild over generative AI. ChatGPT had just blown up. Every startup wanted a chatbot. Every enterprise wanted to "leverage AI."&lt;/p&gt;

&lt;p&gt;But there was a massive gap.&lt;/p&gt;

&lt;p&gt;On one side, you had OpenAI's API, which was great but meant sending all your data to OpenAI's servers. Not ideal if you're in healthcare or finance.&lt;/p&gt;

&lt;p&gt;On the other side, you had options like AWS SageMaker, where you could train and host your own models. But that meant becoming an ML engineer basically overnight. You needed to understand model architectures, training pipelines, GPU instances, all of it.&lt;/p&gt;

&lt;p&gt;Most dev teams just wanted to add some AI features to their app. They didn't want a PhD in machine learning.&lt;/p&gt;

&lt;p&gt;That's the gap Bedrock fills.&lt;/p&gt;

&lt;h2&gt;
  
  
  So What Actually Is Bedrock?
&lt;/h2&gt;

&lt;p&gt;Think of it as a menu of AI models that you can just... use.&lt;/p&gt;

&lt;p&gt;AWS Bedrock is a fully managed service that gives you API access to foundation models from companies like Anthropic (Claude), Meta (Llama), Stability AI, and Amazon's own Titan models.&lt;/p&gt;

&lt;p&gt;You pick a model. You make an API call. That's it.&lt;/p&gt;

&lt;p&gt;No infrastructure to manage. No GPUs to provision. No model training (unless you want to customize, which we'll get to).&lt;/p&gt;

&lt;p&gt;It's serverless, so you only pay for what you use. And all your data stays in your AWS account, which is huge for compliance and security.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Would You Actually Use This?
&lt;/h2&gt;

&lt;p&gt;Here's the thing, Bedrock isn't for every AI use case. But it's perfect for a bunch of common ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Building a chatbot or customer support agent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You need something that can answer questions about your product. With Bedrock, you can use Claude or another model, feed it your documentation through RAG (Retrieval Augmented Generation), and you're basically done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content generation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Marketing needs blog posts, product descriptions, social media content. Hook up Bedrock to your CMS and generate drafts at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Document processing and summarization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Got tons of PDFs, meeting notes, or research papers? Bedrock models can summarize them, extract key info, or answer questions about them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code generation and assistance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Some models in Bedrock are really good at writing code. You can build internal tools that help your team with boilerplate or documentation.&lt;/p&gt;

&lt;p&gt;The pattern here is: if you need AI capabilities but don't want to become an AI company, Bedrock is probably your answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Actually Works in Practice
&lt;/h2&gt;

&lt;p&gt;Let's say you want to build a simple Q&amp;amp;A bot for your docs.&lt;/p&gt;

&lt;p&gt;First, you enable model access in the AWS console. By default, you don't have access to any models. You just click through and enable the ones you want. Takes like two minutes.&lt;/p&gt;

&lt;p&gt;Then you can test stuff in the playground. It's literally a chat interface where you can try different models with different prompts.&lt;/p&gt;

&lt;p&gt;When you're ready to integrate, you use the AWS SDK (boto3 for Python, for example) to make API calls. Here's what that looks like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;bedrock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is serverless computing?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-3-sonnet-20240229-v1:0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-2023-05-31&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. You're using Claude through Bedrock.&lt;/p&gt;

&lt;p&gt;If you need the model to know about your specific data, you set up a Knowledge Base (which uses RAG under the hood) or fine-tune a model with your own dataset.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Advantages (And When They Matter)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;You get to compare models super easily&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Different models are good at different things. In the playground, you can literally ask the same question to Claude, Llama, and Titan and see which one gives better results for your use case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security and compliance are handled&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your data doesn't leave AWS. It's encrypted in transit and at rest. You can use IAM policies, VPC, all the usual AWS security stuff. And Bedrock is HIPAA eligible, SOC compliant, all that.&lt;/p&gt;

&lt;p&gt;If you're in finance or healthcare, this is massive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pricing is actually pretty reasonable&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You pay per token (think of tokens as chunks of text). For testing and small apps, you'll spend a few dollars a month. For production workloads, you can use provisioned throughput or batch processing to cut costs by 50% or more.&lt;/p&gt;
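&lt;p&gt;A quick back-of-envelope helper makes the per-token model concrete. The rates below are placeholders, not current Bedrock prices; per-model rates change, so look them up on the Bedrock pricing page:&lt;/p&gt;

```python
def estimate_cost(input_tokens, output_tokens,
                  price_in_per_million, price_out_per_million):
    """Back-of-envelope token cost. Prices are arguments on purpose:
    per-model rates change, so pull them from the Bedrock pricing
    page rather than hard-coding them."""
    return (input_tokens * price_in_per_million / 1_000_000
            + output_tokens * price_out_per_million / 1_000_000)

# Hypothetical rates -- NOT current Bedrock prices.
cost = estimate_cost(
    input_tokens=500_000, output_tokens=200_000,
    price_in_per_million=3.00, price_out_per_million=15.00,
)
print(round(cost, 2))  # → 4.5
```

Note that output tokens typically cost several times more than input tokens, so chatty responses dominate the bill.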

&lt;p&gt;&lt;strong&gt;Guardrails prevent disasters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bedrock has a feature called Guardrails that filters harmful content, blocks certain topics, and can even catch hallucinations. So your chatbot won't accidentally say something wildly inappropriate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things That Might Trip You Up
&lt;/h2&gt;

&lt;p&gt;Real talk, there are a few gotchas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model availability varies by region&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not all models are available in all AWS regions yet. So check the docs before you commit to a specific region.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You still need to understand prompting&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Just because you have access to AI doesn't mean it'll magically work well. You need to learn prompt engineering. How you phrase your request massively affects the output quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token limits are real&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each model has a context window (how much text it can process at once). If you're trying to analyze a 100-page document in one go, you might hit limits.&lt;/p&gt;
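&lt;p&gt;The usual workaround is chunking: split the document into overlapping pieces that each fit the window. A rough sketch (the 4-characters-per-token ratio is a crude heuristic, not a real tokenizer):&lt;/p&gt;

```python
# Split a long document into overlapping chunks that each fit a token budget.
# chars_per_token=4 is a rough heuristic, not an exact tokenizer.

def chunk_text(text, max_tokens=1000, chars_per_token=4, overlap_tokens=100):
    """Return overlapping character chunks sized to the token budget."""
    max_chars = max_tokens * chars_per_token
    overlap_chars = overlap_tokens * chars_per_token
    step = max_chars - overlap_chars  # advance less than a full chunk
    return [text[start:start + max_chars] for start in range(0, len(text), step)]

doc = "x" * 10_000                        # stand-in for a long document
chunks = chunk_text(doc)
print(len(chunks), max(len(c) for c in chunks))  # → 3 4000
```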

&lt;p&gt;&lt;strong&gt;Costs can scale surprisingly fast&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Those per-token costs add up quick if you're processing lots of data. Always test with small batches first and monitor your usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  When NOT to Use Bedrock
&lt;/h2&gt;

&lt;p&gt;If you need a highly specialized model for something niche like medical imaging, Bedrock probably won't have what you need. You'd want SageMaker or a custom solution.&lt;/p&gt;

&lt;p&gt;If you're building the next ChatGPT competitor, you're not using Bedrock. You're training your own models from scratch.&lt;/p&gt;

&lt;p&gt;And if you literally just need basic text analysis or simple ML tasks, you might be overcomplicating things. Sometimes a traditional ML model or even a regex is enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started Is Easy
&lt;/h2&gt;

&lt;p&gt;AWS has a playground right in the console. Just log in, search for Bedrock, enable a model (Claude is a safe bet to start), and start typing prompts.&lt;/p&gt;

&lt;p&gt;Play with it for an hour. See what it can do. Then think about where it fits in your stack.&lt;/p&gt;

&lt;p&gt;You'll know pretty quickly if it's the right tool for what you're building.&lt;/p&gt;

&lt;p&gt;Start small, test stuff out, and see where it takes you.&lt;/p&gt;

</description>
      <category>bedrock</category>
      <category>awsbedrock</category>
      <category>devops</category>
      <category>mlops</category>
    </item>
    <item>
      <title>AWS Nova: AI That Scales Cheap</title>
      <dc:creator>Saksham Paliwal</dc:creator>
      <pubDate>Sun, 18 Jan 2026 13:15:38 +0000</pubDate>
      <link>https://dev.to/sakshampaliwal/aws-nova-ai-that-scales-cheap-4ape</link>
      <guid>https://dev.to/sakshampaliwal/aws-nova-ai-that-scales-cheap-4ape</guid>
      <description>&lt;p&gt;You know that moment when you're estimating cloud costs for an AI feature and you just... close the tab?&lt;/p&gt;

&lt;p&gt;Yeah.&lt;/p&gt;

&lt;p&gt;Because GPT-4 pricing looked scary. Claude was amazing but expensive for high-volume stuff. And you're sitting there thinking "I just need to classify some customer emails, why does this cost more than my EC2 bill??"&lt;/p&gt;

&lt;p&gt;That's exactly the gap AWS Nova is trying to fill.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Nova Even Exists
&lt;/h2&gt;

&lt;p&gt;Let me take you back to 2023-2024.&lt;/p&gt;

&lt;p&gt;AWS had Bedrock, which was great. You could access models from Anthropic, Meta, Cohere, all through one API. Super convenient.&lt;/p&gt;

&lt;p&gt;But here's what kept happening: customers would prototype something cool with Claude or GPT-4 through Bedrock, love it, then hit production scale and go "wait, WHAT is this going to cost per month?!"&lt;/p&gt;

&lt;p&gt;The high-performance models were incredible but pricing made them impractical for a lot of real-world use cases. And the cheaper models? Often not quite good enough.&lt;/p&gt;

&lt;p&gt;AWS saw this gap everywhere. Startups burning through runway on inference costs. Enterprises shelving AI projects because the math didn't work.&lt;/p&gt;

&lt;p&gt;So in December 2024, they released Nova. Their own family of foundation models, built from scratch, with one clear goal: give you actually good performance at prices that don't make your CFO cry.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Exactly Is AWS Nova?
&lt;/h2&gt;

&lt;p&gt;Nova is Amazon's own family of foundation models.&lt;/p&gt;

&lt;p&gt;Not someone else's models hosted on AWS. These are built by Amazon, for AWS, optimized specifically to run efficiently on their infrastructure.&lt;/p&gt;

&lt;p&gt;Think of it like this: you can rent a bunch of different cars (Bedrock's third-party models), or you can use the car the rental company designed specifically for their business model (Nova).&lt;/p&gt;

&lt;p&gt;The family has a few different models, each sized for different use cases and budgets.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Nova Family
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Nova Micro&lt;/strong&gt; is the tiny, super fast one. Great for simple tasks like classification, extraction, basic Q&amp;amp;A. Think "is this email spam?" or "extract the order number from this text."&lt;/p&gt;

&lt;p&gt;Cheapest in the family. Ridiculously fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nova Lite&lt;/strong&gt; steps it up. Better reasoning, longer context, still very affordable. This is your workhorse for most everyday AI tasks.&lt;/p&gt;

&lt;p&gt;Chat, summarization, content generation that doesn't need PhD-level reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nova Pro&lt;/strong&gt; is where it gets interesting. This one actually competes with the big names on quality while staying way cheaper. Multimodal too, it can handle text, images, and video.&lt;/p&gt;

&lt;p&gt;You'd reach for Pro when Lite isn't cutting it but you still need to watch costs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nova Premier&lt;/strong&gt; is the flagship. Most capable, best reasoning, designed to compete directly with GPT-4 and Claude Sonnet. Still cheaper than those, but not by as much.&lt;/p&gt;

&lt;p&gt;This is for when you really need top-tier performance and cost is secondary to quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Would You Actually Use This?
&lt;/h2&gt;

&lt;p&gt;Here's the thing: Nova shines in production workloads where volume matters.&lt;/p&gt;

&lt;p&gt;If you're processing thousands or millions of requests, the pricing difference adds up FAST. A feature that would cost $5,000/month on GPT-4 might cost $800 on Nova Pro.&lt;/p&gt;

&lt;p&gt;Real scenarios where people are reaching for Nova:&lt;/p&gt;

&lt;p&gt;Content moderation at scale. Customer support automation. Document processing pipelines. Chatbots with high traffic. E-commerce product descriptions. Anything where you need "good enough" quality but can't afford premium pricing at volume.&lt;/p&gt;

&lt;p&gt;It's also great for experimentation. Want to try adding AI to a feature but not sure if it'll stick? Start with Nova Lite, validate the idea, then optimize from there.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Multimodal Thing Is Actually Cool
&lt;/h2&gt;

&lt;p&gt;Nova Pro and Premier can handle images and video, not just text.&lt;/p&gt;

&lt;p&gt;This matters more than it sounds.&lt;/p&gt;

&lt;p&gt;You can send it a screenshot and ask "what's wrong with this UI?" or feed it a product photo and generate descriptions. Or analyze video content without pre-processing it into frames.&lt;/p&gt;

&lt;p&gt;All through the same API, billed the same way.&lt;/p&gt;

&lt;p&gt;For a lot of real-world apps, this eliminates entire preprocessing pipelines you'd otherwise need.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Actually Works (The Basics)
&lt;/h2&gt;

&lt;p&gt;Nova models are available through Bedrock, AWS's managed AI service.&lt;/p&gt;

&lt;p&gt;Same API you'd use for Claude or Llama. Same SDKs. Same infrastructure.&lt;/p&gt;

&lt;p&gt;Here's what a basic call looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="n"&gt;bedrock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;amazon.nova-pro-v1:0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain databases simply&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you've used Bedrock before, this looks identical. That's intentional.&lt;/p&gt;

&lt;p&gt;The switching cost between models is basically zero. Try Nova Lite; if it isn't good enough, bump to Pro. Done.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pricing Reality Check
&lt;/h2&gt;

&lt;p&gt;This is where Nova gets interesting.&lt;/p&gt;

&lt;p&gt;Nova Micro: roughly $0.035 per million input tokens. Insanely cheap.&lt;/p&gt;

&lt;p&gt;Nova Lite: around $0.06 per million input tokens. Still very affordable.&lt;/p&gt;

&lt;p&gt;Nova Pro: about $0.80 per million input tokens. This is where you're balancing cost and quality.&lt;/p&gt;

&lt;p&gt;For context, GPT-4 is around $10 per million input tokens. Claude Sonnet is similar.&lt;/p&gt;

&lt;p&gt;So if you're processing a million input tokens with Nova Pro vs GPT-4, you're looking at roughly $0.80 vs $10. That's a 12.5x difference.&lt;/p&gt;
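&lt;p&gt;The arithmetic, spelled out with the same illustrative per-million input prices:&lt;/p&gt;

```python
# The cost comparison as plain arithmetic. Prices are the approximate
# per-million input-token figures quoted above, not guaranteed current rates.
nova_pro_per_m = 0.80
gpt4_per_m = 10.00

tokens = 1_000_000
nova_cost = tokens / 1_000_000 * nova_pro_per_m
gpt4_cost = tokens / 1_000_000 * gpt4_per_m

print(nova_cost, gpt4_cost, gpt4_cost / nova_cost)
```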

&lt;p&gt;At scale, that's the difference between "this feature is profitable" and "this feature is bleeding money."&lt;/p&gt;

&lt;h2&gt;
  
  
  What People Are Actually Building With It
&lt;/h2&gt;

&lt;p&gt;Early adopters are using Nova for some pretty practical stuff.&lt;/p&gt;

&lt;p&gt;Summarizing customer support tickets before routing them. Generating product descriptions from specs. Analyzing user feedback at scale. Creating draft responses in internal tools.&lt;/p&gt;

&lt;p&gt;One pattern I'm seeing: use Nova Lite/Pro for the bulk work, then use Claude or GPT-4 only for the cases that really need it.&lt;/p&gt;

&lt;p&gt;Like a two-tier system. 80% of requests go to Nova, 20% escalate to premium models. Your cost drops massively but quality stays high where it matters.&lt;/p&gt;
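&lt;p&gt;A sketch of that routing logic. The model IDs and the complexity heuristic here are stand-ins; a real system might escalate based on length, topic, or a classifier:&lt;/p&gt;

```python
# Two-tier routing sketch: cheap model for the bulk, premium for hard cases.
# Model IDs are assumed for illustration; the heuristic is a toy.

CHEAP_MODEL = "amazon.nova-lite-v1:0"
PREMIUM_MODEL = "anthropic.claude-3-5-sonnet-20240620-v1:0"

def pick_model(prompt, escalate_over_words=150):
    """Route long or sensitive prompts to the premium model, the rest to Nova."""
    if len(prompt.split()) > escalate_over_words or "legal" in prompt.lower():
        return PREMIUM_MODEL
    return CHEAP_MODEL

print(pick_model("Summarize this support ticket: printer offline"))
print(pick_model("Review this legal contract clause for risks"))
```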

&lt;h2&gt;
  
  
  Things That Might Trip You Up
&lt;/h2&gt;

&lt;p&gt;Nova models are region-specific right now. Not available everywhere Bedrock is.&lt;/p&gt;

&lt;p&gt;Check the AWS docs for current region availability before you commit to an architecture.&lt;/p&gt;

&lt;p&gt;Also, these are foundation models, not fine-tuned for your specific use case. They're good generalists but if you need domain-specific expertise, you might still need RAG or fine-tuning.&lt;/p&gt;

&lt;p&gt;And obviously, these are AWS-only. If you're multi-cloud or cloud-agnostic, vendor lock-in is real. Think through that trade-off.&lt;/p&gt;

&lt;h2&gt;
  
  
  Should You Care About This?
&lt;/h2&gt;

&lt;p&gt;If you're building anything AI-powered on AWS and cost is a factor, yes, definitely look at Nova.&lt;/p&gt;

&lt;p&gt;If you're prototyping and not sure what model you need, start with Nova Lite. It's cheap enough that you can experiment without stress.&lt;/p&gt;

&lt;p&gt;If you're already using expensive models through Bedrock and your bill is painful, run some tests with Nova Pro. The performance gap might be smaller than you think.&lt;/p&gt;

&lt;p&gt;I'm not saying Nova is better than GPT-4 or Claude at everything. It's not.&lt;/p&gt;

&lt;p&gt;But it's good enough for a LOT of real-world use cases, and the pricing makes features financially viable that weren't before.&lt;/p&gt;

&lt;p&gt;That's kind of the whole point.&lt;/p&gt;

&lt;p&gt;You don't always need the absolute best model. Sometimes you just need one that works well enough and doesn't destroy your budget.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>awsnova</category>
      <category>devops</category>
      <category>mlops</category>
    </item>
    <item>
      <title>What Is AWS SageMaker, Actually??</title>
      <dc:creator>Saksham Paliwal</dc:creator>
      <pubDate>Sat, 17 Jan 2026 17:14:11 +0000</pubDate>
      <link>https://dev.to/sakshampaliwal/what-is-aws-sagemaker-actually-47kf</link>
      <guid>https://dev.to/sakshampaliwal/what-is-aws-sagemaker-actually-47kf</guid>
      <description>&lt;p&gt;You've been building APIs, deploying containers, managing CI/CD pipelines... and now someone mentions "training a model" and suddenly everyone's talking about GPUs, Jupyter notebooks, and something called SageMaker.&lt;/p&gt;

&lt;p&gt;And you're like, wait. I thought we just write code and deploy it?&lt;/p&gt;

&lt;p&gt;Yeah, ML is different. Let's talk about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why does SageMaker even exist?
&lt;/h2&gt;

&lt;p&gt;Here's the real story.&lt;/p&gt;

&lt;p&gt;Around 2015-2017, companies started actually trying to do machine learning in production. Not just research papers. Real products.&lt;/p&gt;

&lt;p&gt;And they hit a wall.&lt;/p&gt;

&lt;p&gt;Data scientists would build models on their laptops. Works great! Then they'd try to put it in production and... chaos. The infrastructure team doesn't know what a "training job" is. The model needs specific GPU instances. Where do we store the trained model? How do we version it? How do we serve predictions at scale?&lt;/p&gt;

&lt;p&gt;Every company was rebuilding the same infrastructure from scratch.&lt;/p&gt;

&lt;p&gt;AWS saw this pain and launched SageMaker in 2017. The pitch was simple: we'll handle all the infrastructure stuff so you can focus on the actual ML part.&lt;/p&gt;

&lt;h2&gt;
  
  
  So what actually is SageMaker?
&lt;/h2&gt;

&lt;p&gt;Think of it as a managed platform for the entire machine learning workflow.&lt;/p&gt;

&lt;p&gt;Not just one thing. A collection of tools that work together.&lt;/p&gt;

&lt;p&gt;You get managed Jupyter notebooks for experimentation. You get scalable training infrastructure that spins up when you need it. You get model hosting for serving predictions. You get monitoring, versioning, pipelines, the whole deal.&lt;/p&gt;

&lt;p&gt;It's like how you don't manage Kubernetes clusters yourself anymore; you use EKS. Same vibe, but for ML workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  When do people actually use this?
&lt;/h2&gt;

&lt;p&gt;You use SageMaker when you're doing ML at a scale where the infrastructure becomes the problem.&lt;/p&gt;

&lt;p&gt;If your data scientist is training models on their laptop once a month, you probably don't need it yet.&lt;/p&gt;

&lt;p&gt;But when you're:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Training models on datasets that don't fit in memory&lt;/li&gt;
&lt;li&gt;Need GPUs but don't want to manage GPU instances yourself&lt;/li&gt;
&lt;li&gt;Want to retrain models automatically when new data arrives&lt;/li&gt;
&lt;li&gt;Need to serve predictions to thousands of users&lt;/li&gt;
&lt;li&gt;Have multiple people working on ML and sharing resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's when SageMaker starts making sense.&lt;/p&gt;

&lt;p&gt;A lot of teams start with it because their data scientists already know it, or because they're already deep in AWS and want everything in one place.&lt;/p&gt;

&lt;h2&gt;
  
  
  The main pieces you'll actually touch
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Training jobs&lt;/strong&gt; are probably what you'll see first. Your data scientist writes training code, and SageMaker spins up instances, runs the training, saves the model, and shuts everything down. You only pay for compute time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Endpoints&lt;/strong&gt; are how you serve predictions in production. Deploy your trained model, get an HTTPS endpoint, and your apps can call it. Auto-scaling included.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Notebooks&lt;/strong&gt; are managed Jupyter environments. Your data scientists can experiment without you provisioning instances for them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pipelines&lt;/strong&gt; let you automate the whole workflow. New data arrives, trigger training, evaluate the model, deploy if it's good enough. Standard DevOps stuff but for ML.&lt;/p&gt;
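&lt;p&gt;The "deploy if it's good enough" gate is just a comparison. SageMaker Pipelines would express it with a ConditionStep, but the decision logic itself looks like this:&lt;/p&gt;

```python
# The evaluate-then-deploy gate from a pipeline, as plain Python. In a real
# SageMaker Pipeline this is a ConditionStep; this shows only the decision.

def should_deploy(new_metrics, current_metrics, metric="auc", min_gain=0.01):
    """Deploy the new model only if it beats the current one by a margin."""
    return new_metrics[metric] >= current_metrics[metric] + min_gain

print(should_deploy({"auc": 0.91}, {"auc": 0.88}))   # → True
print(should_deploy({"auc": 0.885}, {"auc": 0.88}))  # → False
```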

&lt;h2&gt;
  
  
  What it looks like in practice
&lt;/h2&gt;

&lt;p&gt;Let's say your team trained a model that predicts customer churn.&lt;/p&gt;

&lt;p&gt;Training happens through a SageMaker training job. You point it at your data in S3, specify instance type and count, and it handles the rest. The trained model artifact gets saved back to S3.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.sklearn&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SKLearn&lt;/span&gt;

&lt;span class="n"&gt;estimator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SKLearn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;entry_point&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;train.py&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ml.m5.xlarge&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;framework_version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1.0-1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;estimator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;training&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3://bucket/data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once trained, you deploy it to an endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;predictor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;estimator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deploy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;initial_instance_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ml.t2.medium&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now your API can call this endpoint to get predictions. SageMaker handles scaling, health checks, all that infrastructure stuff.&lt;/p&gt;
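&lt;p&gt;From your API's side, calling the endpoint is plain boto3. The endpoint name and the CSV feature layout below are placeholders for illustration:&lt;/p&gt;

```python
# Calling a deployed SageMaker endpoint from application code. The endpoint
# name and feature layout are placeholders; CSV is one content type the
# sklearn serving container accepts.

def to_csv_payload(features):
    """Serialize one row of features as a CSV line for the endpoint."""
    return ",".join(str(f) for f in features)

payload = to_csv_payload([42, 3, 0.75])  # e.g. tenure, tickets, usage
print(payload)  # → 42,3,0.75

# With credentials configured, the call itself would be:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     EndpointName="churn-model-endpoint",
#     ContentType="text/csv",
#     Body=payload,
# )
# print(response["Body"].read())
```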

&lt;h2&gt;
  
  
  The parts that might confuse you
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;You're not running Docker containers the normal way.&lt;/strong&gt; SageMaker has its own conventions for how training code should be structured. There's a learning curve if you're used to standard containerized apps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing is different.&lt;/strong&gt; You pay for notebook instances while they're running. You pay for training by the second. Endpoints have hourly charges. It's not like Lambda where you only pay per request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IAM roles get complicated.&lt;/strong&gt; SageMaker needs permissions to access S3, write logs, use ECR. Setting this up the first time is... annoying.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not everything needs SageMaker.&lt;/strong&gt; If you're just calling OpenAI's API or using a pre-trained model, you don't need any of this. SageMaker is for when you're training and deploying your own models.&lt;/p&gt;

&lt;h2&gt;
  
  
  What about all the other features?
&lt;/h2&gt;

&lt;p&gt;SageMaker has gotten huge. There's SageMaker Studio (an IDE), Feature Store (for ML features), Model Monitor (for drift detection), Clarify (for bias detection), and like 20 other services.&lt;/p&gt;

&lt;p&gt;You don't need to know all of them.&lt;/p&gt;

&lt;p&gt;Most teams start with notebooks, training jobs, and endpoints. That's the core loop.&lt;/p&gt;

&lt;p&gt;The other stuff you add when you hit specific problems. Model predictions getting worse over time? Then look at Model Monitor. Need to share feature engineering across teams? Feature Store might help.&lt;/p&gt;

&lt;p&gt;Don't try to learn everything at once.&lt;/p&gt;

&lt;h2&gt;
  
  
  When you might NOT want SageMaker
&lt;/h2&gt;

&lt;p&gt;If your team is already deep in GCP, Vertex AI is basically the same thing.&lt;/p&gt;

&lt;p&gt;If you want more control and your team is comfortable managing infrastructure, you could run everything on EKS with Kubeflow.&lt;/p&gt;

&lt;p&gt;If you're doing very simple ML, sometimes a Flask app serving predictions from a pre-trained model is totally fine.&lt;/p&gt;

&lt;p&gt;SageMaker shines when you're scaling ML workloads and want AWS to handle the infrastructure complexity. If that's not your situation yet, it might be overkill.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real value proposition
&lt;/h2&gt;

&lt;p&gt;Here's what it comes down to.&lt;/p&gt;

&lt;p&gt;Machine learning infrastructure is genuinely hard. Managing GPU instances, orchestrating distributed training, serving models at scale, monitoring for drift, versioning everything properly.&lt;/p&gt;

&lt;p&gt;You could build all of this yourself. Many companies did.&lt;/p&gt;

&lt;p&gt;But it's a ton of undifferentiated heavy lifting. SageMaker lets you skip that part and focus on the actual ML problems you're trying to solve.&lt;/p&gt;

&lt;p&gt;For DevOps folks, think of it as the "managed service" approach applied to ML workflows. Same tradeoffs as always: less control, less flexibility, but way faster to get started and someone else handles the ops.&lt;/p&gt;




&lt;p&gt;Start small. Spin up a notebook, run through a tutorial, see how training jobs work. The concepts will click way faster when you're actually trying to solve a real problem.&lt;/p&gt;

&lt;p&gt;You're already asking the right questions. That's the important part.&lt;/p&gt;

</description>
      <category>sagemaker</category>
      <category>aws</category>
      <category>machinelearning</category>
      <category>devops</category>
    </item>
    <item>
      <title>RAG on AWS Just Got Simpler with S3 Vector</title>
      <dc:creator>Saksham Paliwal</dc:creator>
      <pubDate>Fri, 16 Jan 2026 17:26:03 +0000</pubDate>
      <link>https://dev.to/sakshampaliwal/rag-on-aws-just-got-simpler-with-s3-vector-38ei</link>
      <guid>https://dev.to/sakshampaliwal/rag-on-aws-just-got-simpler-with-s3-vector-38ei</guid>
      <description>&lt;p&gt;You're running a RAG pipeline. Everything's working fine.&lt;/p&gt;

&lt;p&gt;Your vectors are sitting in Pinecone or Weaviate, your documents are in S3, and you're paying two separate bills every month.&lt;/p&gt;

&lt;p&gt;Then someone on your team asks, "Wait... why are we storing embeddings in a completely different service when our actual data is already in S3?"&lt;/p&gt;

&lt;p&gt;Good question, right?&lt;/p&gt;

&lt;p&gt;But also... wait, what are embeddings? And what's a RAG pipeline anyway?&lt;/p&gt;

&lt;p&gt;Let's back up for a second.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI context you need first
&lt;/h2&gt;

&lt;p&gt;Okay so here's what's happening in the AI world right now.&lt;/p&gt;

&lt;p&gt;Companies are building chatbots and AI assistants that can answer questions about their own documents. Like, you upload your company's documentation, and users can ask questions in plain English and get answers back.&lt;/p&gt;

&lt;p&gt;This is called RAG, which stands for Retrieval-Augmented Generation.&lt;/p&gt;

&lt;p&gt;Fancy name, simple idea: the AI retrieves relevant information from your documents, then generates an answer based on what it found.&lt;/p&gt;

&lt;p&gt;But here's the problem. Computers don't naturally understand that "How do I reset my password?" and "What's the process for password recovery?" mean the same thing.&lt;/p&gt;

&lt;p&gt;That's where embeddings come in.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are embeddings, really?
&lt;/h2&gt;

&lt;p&gt;An embedding is just a list of numbers that represents meaning.&lt;/p&gt;

&lt;p&gt;When you convert text into an embedding, similar meanings get similar numbers. It's like giving every piece of text a mathematical fingerprint based on what it means, not just what words it uses.&lt;/p&gt;

&lt;p&gt;So "reset password" and "password recovery" would have very similar embeddings, even though the words are different.&lt;/p&gt;

&lt;p&gt;These embeddings are also called vectors. Same thing, different name.&lt;/p&gt;

&lt;p&gt;When you have millions of these vectors and you need to find the ones most similar to a user's question? That's called vector search.&lt;/p&gt;

&lt;p&gt;And that's what specialized databases like Pinecone and Weaviate are built for.&lt;/p&gt;

&lt;p&gt;They're really good at storing millions of these number lists and finding similar ones super fast.&lt;/p&gt;
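&lt;p&gt;Here's the idea in miniature, with made-up 3-number embeddings. Real embeddings have hundreds or thousands of dimensions, but the math is the same:&lt;/p&gt;

```python
# Toy illustration of "similar meanings get similar numbers": cosine
# similarity over tiny made-up vectors. Real embeddings are much longer.
import math

def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means same direction, 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

reset_pw   = [0.9, 0.1, 0.3]    # "reset password" (made-up numbers)
recover_pw = [0.8, 0.2, 0.35]   # "password recovery"
pizza      = [0.1, 0.9, 0.0]    # "best pizza toppings"

print(round(cosine_similarity(reset_pw, recover_pw), 3))  # close to 1.0
print(round(cosine_similarity(reset_pw, pizza), 3))       # much lower
```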

&lt;h2&gt;
  
  
  Why this even exists
&lt;/h2&gt;

&lt;p&gt;Here's the thing.&lt;/p&gt;

&lt;p&gt;For years, if you wanted to do vector search, you had &lt;em&gt;no choice&lt;/em&gt; but to use a specialized vector database. Pinecone, Weaviate, Milvus, whatever. They're great tools, but they're also another service to manage, another bill to pay, another thing that can go down.&lt;/p&gt;

&lt;p&gt;Your documents? In S3.&lt;/p&gt;

&lt;p&gt;Your embeddings? Somewhere else entirely.&lt;/p&gt;

&lt;p&gt;AWS noticed this gap. A lot of teams were already storing massive amounts of data in S3, and many of those teams were also doing AI/ML work that needed vector search. But there was no native way to do vector search directly on S3 data.&lt;/p&gt;

&lt;p&gt;So AWS introduced S3 Vectors in 2025: native vector storage and search built into S3. The goal was simple: let you store and search vectors right where your data already lives.&lt;/p&gt;

&lt;p&gt;No separate database. No data duplication. Just S3.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is S3 Vector, actually?
&lt;/h2&gt;

&lt;p&gt;S3 Vector isn't a separate product.&lt;/p&gt;

&lt;p&gt;It's a capability built into S3 itself: you create dedicated vector indexes in S3, store your embeddings in them, and query them directly.&lt;/p&gt;

&lt;p&gt;Think of it like this: instead of putting your embeddings in Pinecone and your PDFs in S3, you can store both in S3 and search the vectors natively.&lt;/p&gt;

&lt;p&gt;The promise is pretty straightforward. You get vector search without leaving the S3 ecosystem. No extra infrastructure, no syncing data between systems, no separate vector DB bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  The whole flow, step by step
&lt;/h2&gt;

&lt;p&gt;Let me paint the full picture so this actually makes sense.&lt;/p&gt;

&lt;p&gt;Let's say you're building that documentation chatbot I mentioned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The old way:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User uploads a PDF to S3&lt;/li&gt;
&lt;li&gt;You break it into chunks (paragraphs or sections)&lt;/li&gt;
&lt;li&gt;You send each chunk to an AI model to get embeddings (those number lists)&lt;/li&gt;
&lt;li&gt;You store those embeddings in Pinecone or another vector database&lt;/li&gt;
&lt;li&gt;You also keep a reference to which S3 file each embedding came from&lt;/li&gt;
&lt;li&gt;When a user asks a question, you convert their question into an embedding&lt;/li&gt;
&lt;li&gt;You search Pinecone for similar embeddings&lt;/li&gt;
&lt;li&gt;You grab the original text from S3&lt;/li&gt;
&lt;li&gt;You send that text + the question to an AI to generate an answer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Two separate systems. S3 for files, Pinecone for vectors.&lt;/p&gt;
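&lt;p&gt;Steps 6 through 8 boil down to "find the stored vectors nearest the question's vector." A brute-force version, with toy 2-number embeddings standing in for a real embedding model:&lt;/p&gt;

```python
# Brute-force nearest-neighbor retrieval: what the vector database does for
# you. The 2-number embeddings here are toys; a real model produces them.

store = [
    {"text": "To reset your password, open Settings > Security.", "vec": [0.9, 0.1]},
    {"text": "Invoices are emailed on the 1st of each month.", "vec": [0.1, 0.9]},
    {"text": "Password recovery links expire after 24 hours.", "vec": [0.85, 0.2]},
]

def top_k(query_vec, k=2):
    """Rank stored chunks by squared Euclidean distance to the query vector."""
    def dist(item):
        return sum((q - v) ** 2 for q, v in zip(query_vec, item["vec"]))
    return sorted(store, key=dist)[:k]

question_vec = [0.88, 0.15]  # toy embedding of "How do I reset my password?"
for item in top_k(question_vec):
    print(item["text"])
```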

&lt;p&gt;&lt;strong&gt;The S3 Vector way:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Steps 1-3 are the same.&lt;/p&gt;

&lt;p&gt;But then instead of uploading to Pinecone, you store the embeddings right in S3 alongside your documents.&lt;/p&gt;

&lt;p&gt;When a user asks a question, you search directly in S3 for similar vectors.&lt;/p&gt;

&lt;p&gt;Everything's in one place.&lt;/p&gt;

&lt;h2&gt;
  
  
  When would you actually use this?
&lt;/h2&gt;

&lt;p&gt;Okay so here's where it gets practical.&lt;/p&gt;

&lt;p&gt;S3 Vector makes sense if you're already deep in the AWS ecosystem and you want to simplify your architecture.&lt;/p&gt;

&lt;p&gt;You're building a RAG application. You've got millions of documents in S3. You're generating embeddings for semantic search (that's just a fancy way of saying "search by meaning, not just keywords").&lt;/p&gt;

&lt;p&gt;Normally, you'd have to keep S3 and your vector database in sync. If you update a document, you need to regenerate embeddings and update both places.&lt;/p&gt;

&lt;p&gt;With S3 Vector, you skip that complexity. Everything lives in S3.&lt;/p&gt;

&lt;p&gt;It's not always the right move, though.&lt;/p&gt;

&lt;p&gt;If you need super low-latency vector search at massive scale, dedicated vector databases are still probably better. They're optimized specifically for that workload.&lt;/p&gt;

&lt;p&gt;But if you're optimizing for simplicity, cost, or you're already committed to AWS? S3 Vector starts looking pretty good.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual setup (very briefly)
&lt;/h2&gt;

&lt;p&gt;I'm not gonna walk through a full tutorial here because honestly, the feature is still pretty new and evolving fast.&lt;/p&gt;

&lt;p&gt;But the basic flow looks like this:&lt;/p&gt;

&lt;p&gt;You create an S3 Table (this is the new table format AWS introduced). You define your schema, including a column for vector embeddings. You load your data, including the vectors. Then you run queries using SQL-like syntax that includes vector search operations.&lt;/p&gt;

&lt;p&gt;Something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;my_table&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;vector_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding_column&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This finds the 10 vectors closest to your query vector. "Closest" meaning most similar in meaning.&lt;/p&gt;
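&lt;p&gt;(&lt;code&gt;vector_distance&lt;/code&gt; above is a placeholder name; the exact syntax depends on the service.) The math underneath is small, though. Cosine distance is one common metric, and you can write it yourself:&lt;/p&gt;

```python
import math

def cosine_distance(a, b):
    # 0.0 means "pointing the same direction" (very similar meaning);
    # values near 1.0 mean unrelated; 2.0 means opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Two made-up embeddings that point roughly the same way...
print(cosine_distance([0.9, 0.1], [0.8, 0.2]))  # small: similar meaning
# ...and one pointing somewhere else entirely.
print(cosine_distance([0.9, 0.1], [0.1, 0.9]))  # larger: different meaning
```

&lt;p&gt;"ORDER BY distance, LIMIT 10" is just sorting by that number and keeping the ten smallest.&lt;/p&gt;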

&lt;p&gt;It's meant to feel familiar if you've used any vector database before.&lt;/p&gt;

&lt;p&gt;The specifics depend on whether you're using S3 Tables directly, integrating with services like Bedrock (AWS's AI service), or going through other AWS AI tools. The ecosystem is still taking shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to watch out for
&lt;/h2&gt;

&lt;p&gt;This is early days.&lt;/p&gt;

&lt;p&gt;S3 Vector through S3 Tables is newer than most vector databases you've probably heard of. The feature set is growing, but it's not as mature as Pinecone or Weaviate yet.&lt;/p&gt;

&lt;p&gt;Performance characteristics are still being figured out by the community. How does it handle billions of vectors? What's the latency like? How does it scale compared to dedicated solutions?&lt;/p&gt;

&lt;p&gt;These are real questions that don't have tons of public benchmarks yet.&lt;/p&gt;

&lt;p&gt;Also, you're committing harder to AWS. That might be fine! But it's worth knowing.&lt;/p&gt;

&lt;h2&gt;
  
  
  So should you care?
&lt;/h2&gt;

&lt;p&gt;If you're just learning about embeddings and vector search, you don't need to stress about this yet.&lt;/p&gt;

&lt;p&gt;Get comfortable with the basics first. Understand what embeddings are, play around with a vector database, build a simple RAG pipeline.&lt;/p&gt;

&lt;p&gt;Once you've done that? Then S3 Vector becomes interesting.&lt;/p&gt;

&lt;p&gt;If you're building something new and you're already in AWS, yeah, definitely keep an eye on this.&lt;/p&gt;

&lt;p&gt;If you're trying to reduce operational complexity and your vector search needs are moderate, it could be a really clean solution.&lt;/p&gt;

&lt;p&gt;The real power here is architectural simplicity. One less service to manage, one less thing to keep in sync, one less bill to explain to your manager.&lt;/p&gt;

&lt;p&gt;That's not nothing.&lt;/p&gt;




&lt;p&gt;If you’re already running RAG on AWS, it’s worth experimenting with S3 Vector in a side project.&lt;/p&gt;

&lt;p&gt;Keep building, stay curious, and don't stress about knowing every new feature the day it drops. You're doing great.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>vectordatabase</category>
      <category>rag</category>
      <category>s3</category>
    </item>
    <item>
      <title>AWS Athena: Query Your S3 Data Without Setting Up a Database</title>
      <dc:creator>Saksham Paliwal</dc:creator>
      <pubDate>Mon, 12 Jan 2026 15:56:41 +0000</pubDate>
      <link>https://dev.to/sakshampaliwal/aws-athena-query-your-s3-data-without-setting-up-a-database-2ipe</link>
      <guid>https://dev.to/sakshampaliwal/aws-athena-query-your-s3-data-without-setting-up-a-database-2ipe</guid>
      <description>&lt;p&gt;You're staring at terabytes of logs sitting in S3.&lt;/p&gt;

&lt;p&gt;Your manager wants a quick report. Something simple. Just count how many 500 errors happened last week.&lt;/p&gt;

&lt;p&gt;You know the data's there. It's all in S3. But to query it, you'd need to spin up a database, load all that data in, set up schemas, manage infrastructure...&lt;/p&gt;

&lt;p&gt;And you're thinking, "there HAS to be a simpler way to just... ask questions about files."&lt;/p&gt;

&lt;p&gt;There is. It's called Athena.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Does Athena Even Exist?
&lt;/h2&gt;

&lt;p&gt;Let me take you back to the early 2010s.&lt;/p&gt;

&lt;p&gt;S3 was already massive. Companies were dumping logs, analytics data, application events, everything into S3 buckets. It was cheap storage, it was durable, it was perfect.&lt;/p&gt;

&lt;p&gt;But here's the problem: S3 is just object storage. You can put files in, you can pull files out. That's it.&lt;/p&gt;

&lt;p&gt;If you wanted to actually query that data, you had two options. Download everything locally and grep through it (good luck with that at scale). Or load it all into a proper database like Redshift or RDS first.&lt;/p&gt;

&lt;p&gt;Both options were painful for quick analysis.&lt;/p&gt;

&lt;p&gt;AWS saw this gap. People needed SQL queries on S3 data without the ceremony of setting up databases.&lt;/p&gt;

&lt;p&gt;So in 2016, they launched Athena. Built on top of Presto (an open-source distributed SQL engine), it let you write SQL queries directly against data in S3.&lt;/p&gt;

&lt;p&gt;No servers to manage. No data to load. Just point at your S3 bucket and start querying.&lt;/p&gt;

&lt;h2&gt;
  
  
  So What Actually Is Athena?
&lt;/h2&gt;

&lt;p&gt;Think of Athena as a serverless SQL interface for S3.&lt;/p&gt;

&lt;p&gt;You define a table schema that maps to your S3 data structure. Then you write regular SQL queries. Athena reads the files from S3, processes them on-demand, and returns results.&lt;/p&gt;

&lt;p&gt;It's not a database. It doesn't store your data separately. It just reads whatever's already in S3 and lets you query it like it's a database.&lt;/p&gt;

&lt;p&gt;The whole thing is serverless. You don't provision anything. You just pay per query based on how much data it scans.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Do People Actually Use This?
&lt;/h2&gt;

&lt;p&gt;Here's where Athena really shines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Log analysis is probably the biggest use case.&lt;/strong&gt; Your application logs are streaming into S3 via CloudWatch or Kinesis Firehose. You want to check error rates, search for specific events, debug production issues. Athena lets you do that with SQL instead of downloading gigabytes of log files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ad-hoc data exploration&lt;/strong&gt; is another huge one. You've got some CSV files or JSON data dumps sitting in S3. Before building a whole ETL pipeline, you just want to poke around and see what's in there. Athena's perfect for that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost-effective analytics for infrequent queries.&lt;/strong&gt; If you're not running queries constantly, spinning up a Redshift cluster or RDS instance feels like overkill. Athena charges only when you query, so it's way cheaper for occasional analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data lake queries&lt;/strong&gt; are common too. Companies build data lakes in S3 with years of historical data. Athena becomes the query layer on top of that lake.&lt;/p&gt;

&lt;p&gt;Here's a super simple example of what an Athena query looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;application_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2026-01-11'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;status_code&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Regular SQL. Nothing weird.&lt;/p&gt;
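&lt;p&gt;To make that concrete, here's the same filter-group-count logic in plain Python over a few made-up log rows. This is essentially what Athena computes, just across files in S3 instead of a list in memory:&lt;/p&gt;

```python
from collections import Counter

# Made-up log rows standing in for what Athena would read out of S3.
logs = [
    {"date": "2026-01-11", "status_code": 200},
    {"date": "2026-01-11", "status_code": 500},
    {"date": "2026-01-11", "status_code": 503},
    {"date": "2026-01-11", "status_code": 500},
    {"date": "2026-01-10", "status_code": 500},  # wrong day: filtered out
]

# WHERE date = '2026-01-11' AND status_code >= 500
matching = [r["status_code"] for r in logs
            if r["date"] == "2026-01-11" and r["status_code"] >= 500]

# GROUP BY status_code ... ORDER BY count DESC
counts = Counter(matching).most_common()
print(counts)  # [(500, 2), (503, 1)]
```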

&lt;h2&gt;
  
  
  How Does the Schema Thing Work?
&lt;/h2&gt;

&lt;p&gt;This trips people up at first.&lt;/p&gt;

&lt;p&gt;Athena needs to know the structure of your data. If you have JSON logs in S3, Athena needs to know which fields exist and what types they are.&lt;/p&gt;

&lt;p&gt;You create that mapping using a &lt;code&gt;CREATE TABLE&lt;/code&gt; statement. You're not actually creating a table or moving data. You're just telling Athena, "hey, this S3 path has files in this format with these columns."&lt;/p&gt;

&lt;p&gt;AWS Glue Crawler can automate this for you. It scans your S3 data and automatically creates the table definitions. Pretty handy when you're getting started.&lt;/p&gt;

&lt;h2&gt;
  
  
  What About Performance?
&lt;/h2&gt;

&lt;p&gt;Here's the thing: Athena scans data from S3 every single time you query.&lt;/p&gt;

&lt;p&gt;If your data is in huge CSV files or uncompressed JSON, queries can be slow and expensive. Athena charges based on data scanned, remember?&lt;/p&gt;

&lt;p&gt;This is where file formats matter a lot.&lt;/p&gt;

&lt;p&gt;Columnar formats like Parquet or ORC are game-changers. They let Athena read only the columns you actually query, not the whole file. Queries run faster and scan way less data.&lt;/p&gt;

&lt;p&gt;Partitioning your data helps too. If you organize files by date like &lt;code&gt;s3://bucket/logs/year=2026/month=01/day=11/&lt;/code&gt;, Athena can skip entire partitions when you filter by date.&lt;/p&gt;

&lt;p&gt;These optimizations can reduce costs by 10x or more. Not exaggerating.&lt;/p&gt;
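&lt;p&gt;A quick back-of-envelope in Python shows why. The $5 per TB figure is Athena's long-standing list price for data scanned (check current pricing for your region), and the data sizes here are made up:&lt;/p&gt;

```python
# Back-of-envelope: why partitioning + columnar formats cut Athena costs.
# $5/TB is Athena's commonly cited scan price; verify against current pricing.
PRICE_PER_TB = 5.00
TB = 1024 ** 4

def query_cost(bytes_scanned):
    return bytes_scanned / TB * PRICE_PER_TB

# A year of logs at ~10 GB/day as plain CSV: every query scans all of it.
full_scan = 365 * 10 * 1024 ** 3

# Partitioned by day AND stored as Parquet, reading only 2 of ~20 columns:
# one day's data, one tenth of the bytes.
pruned_scan = 10 * 1024 ** 3 * (2 / 20)

print(f"full scan:  ${query_cost(full_scan):.2f}")
print(f"optimized:  ${query_cost(pruned_scan):.4f}")
print(f"savings:    {full_scan / pruned_scan:.0f}x")
```

&lt;p&gt;Same question, same answer, a tiny fraction of the bytes scanned. That's the whole game with Athena.&lt;/p&gt;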

&lt;h2&gt;
  
  
  What Are the Limitations?
&lt;/h2&gt;

&lt;p&gt;Athena isn't a replacement for a real database.&lt;/p&gt;

&lt;p&gt;It's designed for analysis, not transactions. On standard tables you can't UPDATE or DELETE rows; you add new data by writing more files to S3. (Athena's Apache Iceberg tables do support row-level updates, but that's a separate setup.)&lt;/p&gt;

&lt;p&gt;Query performance depends heavily on data format and size. Poorly organized data means slow, expensive queries.&lt;/p&gt;

&lt;p&gt;There's also a query timeout of 30 minutes. If your query takes longer than that, it fails. That usually means your data needs better partitioning or format conversion.&lt;/p&gt;

&lt;p&gt;And remember, every query scans from S3. There's no caching between queries by default. If you run the same query twice, you pay twice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Does This Fit in Your Stack?
&lt;/h2&gt;

&lt;p&gt;Think of Athena as your "quick question" tool for S3 data.&lt;/p&gt;

&lt;p&gt;It's not your primary production database. It's not your real-time analytics engine.&lt;/p&gt;

&lt;p&gt;But when you need to investigate something, run a one-off report, or explore data before building a proper pipeline? Athena's incredibly useful.&lt;/p&gt;

&lt;p&gt;A lot of teams use it alongside other tools. Logs go to S3, Athena queries them for debugging. Raw data lands in S3, Athena explores it, then a proper ETL moves important stuff to Redshift or RDS for production queries.&lt;/p&gt;

&lt;p&gt;It fills a specific gap really well.&lt;/p&gt;

&lt;h2&gt;
  
  
  Give It a Try
&lt;/h2&gt;

&lt;p&gt;Next time you're staring at data in S3 wishing you could just query it, remember Athena exists.&lt;/p&gt;

&lt;p&gt;It's not perfect for everything. But for what it does, it does it really well.&lt;/p&gt;

&lt;p&gt;And honestly? The first time you write a SQL query against a bunch of S3 files without setting up any infrastructure, it feels kinda magical.&lt;/p&gt;

&lt;p&gt;Start small. Point it at some logs. Run a simple query. See what happens.&lt;/p&gt;

&lt;p&gt;You might be surprised how often you reach for it after that!&lt;/p&gt;

</description>
      <category>devops</category>
      <category>aws</category>
      <category>athena</category>
      <category>awschallenge</category>
    </item>
  </channel>
</rss>
