Saksham Paliwal
RAG on AWS Just Got Simpler with S3 Vector

You're running a RAG pipeline. Everything's working fine.

Your vectors are sitting in Pinecone or Weaviate, your documents are in S3, and you're paying two separate bills every month.

Then someone on your team asks, "Wait... why are we storing embeddings in a completely different service when our actual data is already in S3?"

Good question, right?

But also... wait, what are embeddings? And what's a RAG pipeline anyway?

Let's back up for a second.

The AI context you need first

Okay so here's what's happening in the AI world right now.

Companies are building chatbots and AI assistants that can answer questions about their own documents. Like, you upload your company's documentation, and users can ask questions in plain English and get answers back.

This is called RAG, which stands for Retrieval-Augmented Generation.

Fancy name, simple idea: the AI retrieves relevant information from your documents, then generates an answer based on what it found.

But here's the problem. Computers don't naturally understand that "How do I reset my password?" and "What's the process for password recovery?" mean the same thing.

That's where embeddings come in.

What are embeddings, really?

An embedding is just a list of numbers that represents meaning.

When you convert text into an embedding, similar meanings get similar numbers. It's like giving every piece of text a mathematical fingerprint based on what it means, not just what words it uses.

So "reset password" and "password recovery" would have very similar embeddings, even though the words are different.
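You can see this with a toy example. The numbers below are made up and only four dimensions long (real embedding models output hundreds or thousands of dimensions), but the mechanics are the real thing: "similar meaning" is just a bit of arithmetic over two lists of numbers.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: close to 1.0 means "pointing the same way"
    # (similar meaning), close to 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" -- made-up numbers for illustration,
# not output from a real model.
reset_password    = [0.91, 0.10, 0.05, 0.40]
password_recovery = [0.88, 0.12, 0.07, 0.38]
pizza_recipe      = [0.02, 0.95, 0.60, 0.01]

print(cosine_similarity(reset_password, password_recovery))  # close to 1.0
print(cosine_similarity(reset_password, pizza_recipe))       # much lower
```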

These embeddings are also called vectors. Same thing, different name.

When you have millions of these vectors and you need to find the ones most similar to a user's question? That's called vector search.

And that's what specialized databases like Pinecone and Weaviate are built for.

They're really good at storing millions of these number lists and finding similar ones super fast.
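Under the hood, that search boils down to "compare the query vector against every stored vector and keep the best matches." Here's a brute-force sketch (toy vectors, hypothetical document names); dedicated vector databases get the same answer without scanning everything, using approximate nearest-neighbor indexes:

```python
import heapq
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query, corpus, k=2):
    # corpus maps doc_id -> embedding. Brute force is O(n) per query;
    # vector DBs exist to make this fast at millions of vectors.
    return heapq.nlargest(k, corpus, key=lambda doc_id: cosine(query, corpus[doc_id]))

# Toy 3-dimensional embeddings and made-up document names.
corpus = {
    "reset-password.md":  [0.90, 0.10, 0.10],
    "billing-faq.md":     [0.10, 0.90, 0.20],
    "recover-account.md": [0.85, 0.15, 0.12],
}

print(top_k([0.88, 0.12, 0.10], corpus))  # the two password docs rank first
```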

Why this even exists

Here's the thing.

For years, if you wanted to do vector search, you had no choice but to use a specialized vector database. Pinecone, Weaviate, Milvus, whatever. They're great tools, but they're also another service to manage, another bill to pay, another thing that can go down.

Your documents? In S3.

Your embeddings? Somewhere else entirely.

AWS noticed this gap. A lot of teams were already storing massive amounts of data in S3, and many of those teams were also doing AI/ML work that needed vector search. But there was no native way to do vector search directly on S3 data.

So at re:Invent 2024, AWS introduced S3 Tables and S3 Metadata, and has been building native vector search capabilities on top of that foundation. The goal is simple: let you store and search vectors right where your data already lives.

No separate database. No data duplication. Just S3.

What is S3 Vector, actually?

S3 Vector isn't a separate product.

It's a capability being built into S3 itself through S3 Tables, which lets you store structured data (including vector embeddings) and query it directly.

Think of it like this: instead of putting your embeddings in Pinecone and your PDFs in S3, you can store both in S3 and search the vectors natively.

The promise is pretty straightforward. You get vector search without leaving the S3 ecosystem. No extra infrastructure, no syncing data between systems, no separate vector DB bill.

The whole flow, step by step

Let me paint the full picture so this actually makes sense.

Let's say you're building that documentation chatbot I mentioned.

The old way:

  1. User uploads a PDF to S3
  2. You break it into chunks (paragraphs or sections)
  3. You send each chunk to an AI model to get embeddings (those number lists)
  4. You store those embeddings in Pinecone or another vector database
  5. You also keep a reference to which S3 file each embedding came from
  6. When a user asks a question, you convert their question into an embedding
  7. You search Pinecone for similar embeddings
  8. You grab the original text from S3
  9. You send that text + the question to an AI to generate an answer

Two separate systems. S3 for files, Pinecone for vectors.
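The shape of that pipeline fits in a few lines. Everything below is a stand-in: `embed()` is a fake embedder (a real one would call an embedding model, e.g. through Bedrock or OpenAI), and the two dicts play the roles of S3 and the vector database.

```python
import math

def embed(text):
    # Fake embedder for illustration only -- real embeddings come from
    # a model. This just counts vowels per word to get a tiny vector.
    words = text.lower().split()
    return [sum(w.count(c) for w in words) / max(len(words), 1) for c in "aeiou"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a)) or 1.0
    norm_b = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (norm_a * norm_b)

s3 = {}         # stands in for the S3 bucket (chunk text)
vector_db = {}  # stands in for Pinecone/Weaviate (embeddings + keys)

# Steps 1-5: chunk the document, embed each chunk, store in BOTH systems.
document = "To reset your password, open Settings. Billing runs monthly."
for i, chunk in enumerate(document.split(". ")):
    key = f"docs/guide.txt#chunk-{i}"
    s3[key] = chunk
    vector_db[key] = embed(chunk)

# Steps 6-8: embed the question, search the vector DB, fetch text from S3.
question = "how do I recover my password?"
q_vec = embed(question)
best_key = max(vector_db, key=lambda k: cosine(q_vec, vector_db[k]))
context = s3[best_key]

# Step 9 would send `context` + `question` to an LLM for the final answer.
print(context)
```

Notice the double bookkeeping: every chunk gets written twice, once per system, and `best_key` is the glue that links them back together.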

The S3 Vector way:

Steps 1-3 are the same.

But then instead of uploading to Pinecone, you store the embeddings right in S3 alongside your documents.

When a user asks a question, you search directly in S3 for similar vectors.

Everything's in one place.

When would you actually use this?

Okay so here's where it gets practical.

S3 Vector makes sense if you're already deep in the AWS ecosystem and you want to simplify your architecture.

You're building a RAG application. You've got millions of documents in S3. You're generating embeddings for semantic search (that's just a fancy way of saying "search by meaning, not just keywords").

Normally, you'd have to keep S3 and your vector database in sync. If you update a document, you need to regenerate embeddings and update both places.

With S3 Vector, you skip that complexity. Everything lives in S3.
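The difference shows up most clearly on updates. With a single store (the dict below is a stand-in for S3, and the embeddings are made-up numbers), re-indexing a changed document is one write, with no second system to drift out of sync:

```python
# Toy sketch: one record holds both the chunk text and its embedding,
# so an update is a single write with nothing to keep in sync.
# (The dict stands in for S3; the vectors are made-up numbers.)
store = {}

def put_chunk(key, text, embedding):
    store[key] = {"text": text, "embedding": embedding}

# Initial index.
put_chunk("docs/reset.md#0", "Reset via Settings > Security.", [0.90, 0.10])

# The doc changes: re-embed and overwrite the one record. Done.
put_chunk("docs/reset.md#0", "Reset via Settings > Account.", [0.88, 0.14])

print(store["docs/reset.md#0"]["text"])
```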

It's not always the right move, though.

If you need super low-latency vector search at massive scale, dedicated vector databases are still probably better. They're optimized specifically for that workload.

But if you're optimizing for simplicity, cost, or you're already committed to AWS? S3 Vector starts looking pretty good.

The actual setup (very briefly)

I'm not gonna walk through a full tutorial here because honestly, the feature is still pretty new and evolving fast.

But the basic flow looks like this:

You create an S3 Table (this is the new table format AWS introduced). You define your schema, including a column for vector embeddings. You load your data, including the vectors. Then you run queries using SQL-like syntax that includes vector search operations.

Something like:

SELECT * FROM my_table
ORDER BY vector_distance(embedding_column, query_vector)
LIMIT 10

This finds the 10 vectors closest to your query vector. "Closest" meaning most similar in meaning.
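"Closest" is measured with a distance function. A common choice is plain Euclidean distance, where smaller means more similar, so ordering by distance and taking the top k is exactly a k-nearest-neighbor query. A minimal sketch with made-up two-dimensional vectors:

```python
import math

def euclidean_distance(a, b):
    # Smaller distance = more similar. "ORDER BY distance" + "LIMIT k"
    # is a k-nearest-neighbor query in SQL clothing.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [0.90, 0.10]
near  = [0.88, 0.12]  # similar meaning -> nearby point
far   = [0.10, 0.90]  # different meaning -> distant point

print(euclidean_distance(query, near))  # small
print(euclidean_distance(query, far))   # large
```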

It's meant to feel familiar if you've used any vector database before.

The specifics depend on whether you're using S3 Tables directly, integrating with services like Bedrock (AWS's AI service), or going through other AWS AI tools. The ecosystem is still taking shape.

What to watch out for

This is early days.

S3 Vector through S3 Tables is newer than most vector databases you've probably heard of. The feature set is growing, but it's not as mature as Pinecone or Weaviate yet.

Performance characteristics are still being figured out by the community. How does it handle billions of vectors? What's the latency like? How does it scale compared to dedicated solutions?

These are real questions that don't have tons of public benchmarks yet.

Also, you're committing harder to AWS. That might be fine! But it's worth knowing.

So should you care?

If you're just learning about embeddings and vector search, you don't need to stress about this yet.

Get comfortable with the basics first. Understand what embeddings are, play around with a vector database, build a simple RAG pipeline.

Once you've done that? Then S3 Vector becomes interesting.

If you're building something new and you're already in AWS, yeah, definitely keep an eye on this.

If you're trying to reduce operational complexity and your vector search needs are moderate, it could be a really clean solution.

The real power here is architectural simplicity. One less service to manage, one less thing to keep in sync, one less bill to explain to your manager.

That's not nothing.


If you're already running RAG on AWS, it's worth experimenting with S3 Vector in a side project.

Keep building, stay curious, and don't stress about knowing every new feature the day it drops. You're doing great.
