Pini Shvartsman

Originally published at pinishv.com

Amazon S3 Vectors: When Storage Learns to Think

AWS did something interesting. They turned S3, the storage service that holds something like half the internet's data, into a vector database. Not a separate service that integrates with S3. Not a new database that happens to live near S3. They built vector search directly into object storage itself.

{{< youtube MekPWOZoCO8 >}}

Amazon S3 Vectors is currently in preview, and it's exactly what it sounds like: you can now store billions of vector embeddings in S3 and query them with sub-second performance. No servers to provision, no clusters to manage, and according to AWS, up to 90% cheaper than running a traditional vector database.

That last number matters. Because the biggest problem with vector databases right now isn't performance. It's cost at scale.

The Problem S3 Vectors Actually Solves

Let's talk about the economics of AI applications for a moment. If you're building a RAG system for a large enterprise, you might need to embed millions of documents. Each document gets chunked, each chunk becomes a vector, and suddenly you're storing hundreds of millions or billions of embeddings.

Traditional vector databases are fast, but they're expensive at scale. They keep everything in memory or on fast SSDs, they run on dedicated compute, and they charge accordingly. For a billion 512-dimensional vectors with moderate query loads, you might be paying close to $10,000 per month on a service like Pinecone. That's not a criticism of Pinecone, by the way. High-performance infrastructure costs money.

But here's the thing: most use cases don't need single-digit millisecond latency. A support chatbot can wait 200 milliseconds to retrieve relevant documents. An internal knowledge search can tolerate half a second. A recommendation system that updates nightly doesn't need real-time indexing at all.

S3 Vectors bets that for many real-world applications, "fast enough" is actually fast enough. And if "fast enough" costs 90% less, that changes what becomes economically feasible.

How It Actually Works

The architecture is straightforward. You create a vector bucket in S3 (a special bucket type for vectors), then create one or more vector indexes inside it. Each index has a fixed dimension (up to 4,096) and a distance metric (currently Cosine or Euclidean).

You insert vectors using the PutVectors API, up to 500 at a time. Each vector gets an ID and optional metadata (up to 10 key-value pairs). Under the hood, S3 automatically builds and maintains an index structure for similarity search. The exact algorithms aren't exposed, but they're using some form of approximate nearest neighbor search, likely HNSW or similar.
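To make that concrete, here's a minimal sketch using boto3. The `s3vectors` client name, operation names, and parameter shapes below are taken from the preview documentation and are best treated as assumptions that may change before general availability.

```python
import boto3

# Preview API: the "s3vectors" client, operation names, and parameter
# shapes below may change before general availability.
s3vectors = boto3.client("s3vectors", region_name="us-east-1")

# A vector bucket is a special bucket type that holds vector indexes.
s3vectors.create_vector_bucket(vectorBucketName="my-embeddings-bucket")

# Each index fixes the dimension and distance metric up front.
s3vectors.create_index(
    vectorBucketName="my-embeddings-bucket",
    indexName="docs-index",
    dataType="float32",
    dimension=512,
    distanceMetric="cosine",
)

# Placeholder embedding; in practice this comes from your embedding model.
embedding = [0.01] * 512

# Insert vectors in batches of up to 500 per PutVectors call, each with
# an ID (key) and optional metadata.
s3vectors.put_vectors(
    vectorBucketName="my-embeddings-bucket",
    indexName="docs-index",
    vectors=[
        {
            "key": "doc-001-chunk-0",
            "data": {"float32": embedding},
            "metadata": {"source": "handbook.pdf", "topic": "onboarding"},
        }
    ],
)
```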

When you query with QueryVectors, you provide a query vector and optionally filter by metadata. S3 returns the top K most similar vectors (up to 30 results) with their IDs, distances, and metadata. The whole operation typically takes 100 to 300 milliseconds for indexes with millions of vectors.
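Querying follows the same shape, continuing the sketch above. Again, the parameter and response field names are assumptions based on the preview docs.

```python
# Embed the user's query with the same model, then search the index.
response = s3vectors.query_vectors(
    vectorBucketName="my-embeddings-bucket",
    indexName="docs-index",
    queryVector={"float32": [0.02] * 512},  # placeholder query embedding
    topK=10,                                # capped at 30 in preview
    filter={"topic": "onboarding"},         # optional metadata filter
    returnMetadata=True,
    returnDistance=True,
)

for match in response["vectors"]:
    print(match["key"], match["distance"], match.get("metadata"))
```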

The key insight is that storage and search are now the same thing. Your embeddings live in S3's durable, elastic object storage, but you can query them semantically without pulling everything into memory first.

Where This Actually Makes Sense

If you're building a RAG system with Amazon Bedrock, S3 Vectors is now the obvious choice for the vector store. Bedrock Knowledge Bases can use it natively. You point Bedrock at your documents, choose an embedding model, and it handles chunking, embedding, and storage in S3 Vectors automatically. When users ask questions, Bedrock queries the index, retrieves relevant chunks, and feeds them to the LLM. The whole pipeline is managed.
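With the managed pipeline, your application code never touches the vector index directly; it just calls Bedrock's retrieval API. A minimal sketch, assuming an existing knowledge base (the ID below is a placeholder) and the `bedrock-agent-runtime` Retrieve operation:

```python
import boto3

bedrock = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Bedrock embeds the query and runs the S3 Vectors lookup behind this call.
response = bedrock.retrieve(
    knowledgeBaseId="KB123EXAMPLE",  # placeholder knowledge base ID
    retrievalQuery={"text": "What is our parental leave policy?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {"numberOfResults": 5}
    },
)

for result in response["retrievalResults"]:
    print(result["content"]["text"][:120], result.get("score"))
```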

For semantic search over large document collections, S3 Vectors shines. Legal firms indexing millions of case documents, media companies searching video archives by content, enterprises making their entire knowledge base searchable by meaning rather than keywords. These are all scenarios where you need massive scale but can tolerate sub-second latency.

For recommendation systems and personalization, it depends. If your embeddings update in batch (nightly retraining, periodic refreshes), S3 Vectors works well. If you need real-time updates per user interaction, it's less suitable. The write throughput limit is currently 5 requests per second per index (though you can batch 500 vectors per request).

For fraud detection and anomaly detection, S3 Vectors provides a cost-effective way to store historical patterns. You might keep recent data in a faster system like OpenSearch for real-time checks, while using S3 Vectors for deep historical comparisons or retrospective analysis.

The pattern is consistent: S3 Vectors is ideal when you have large, relatively stable datasets with moderate query loads. It's not for high-frequency trading systems or real-time ad serving. It's for the long tail of AI applications where scale matters more than the last millisecond of latency.

The Integration Story

AWS built this knowing that different use cases need different performance tiers. That's why S3 Vectors integrates directly with Amazon OpenSearch Service in two ways.

First, you can configure OpenSearch to use S3 Vectors as its storage layer. OpenSearch still handles queries and aggregations, but the actual vector data lives in S3. This dramatically reduces storage costs while keeping OpenSearch's rich query capabilities and hybrid search features.

Second, you can export an S3 Vector index directly into OpenSearch Serverless when you need faster performance. This lets you start with S3 (cheap, massive scale) and promote hot data to OpenSearch (expensive, 10-50ms latency) when usage patterns justify it.

This tiered approach is honestly more interesting than S3 Vectors alone. It acknowledges that not all data has the same access patterns, and different queries have different latency requirements. You can optimize for cost most of the time and performance when it matters.

What About the Competition

The vector database market is crowded. Pinecone, Weaviate, Milvus, Qdrant, ChromaDB, even PostgreSQL with pgvector. They all do vector search, and many do it faster than S3 Vectors.

But S3 Vectors isn't trying to be the fastest. It's trying to be the most practical for AWS customers who already have their data in S3 and want to add semantic search without managing new infrastructure.

The real competition isn't Pinecone or Milvus. It's the decision to not build vector search at all because it seems too expensive or complex. If S3 Vectors makes vector search a standard feature of your data lake rather than a separate project with separate infrastructure, that changes the adoption calculation.

For specialized vector databases, this probably means focusing on what they do better: multi-cloud portability, advanced query capabilities, extreme performance optimization, or specific verticals. The "store vectors and do similarity search" use case just became commoditized on AWS.

The Limitations You Should Know

S3 Vectors has real constraints. Each index caps at 50 million vectors in preview. If you need more in a single semantic space, you'll need to partition across multiple indexes or request a limit increase. This is probably the biggest operational consideration.
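If you do outgrow a single index, the workaround is application-level sharding: spread vectors across several indexes and fan queries out, merging results by distance. A rough sketch of that pattern (the shard names and helpers here are illustrative, not part of the service, and the `query_vectors` call is the same hedged preview API as above):

```python
import hashlib
import heapq

INDEXES = ["docs-index-0", "docs-index-1", "docs-index-2"]  # hypothetical shards

def shard_for(key: str) -> str:
    """Pick a shard deterministically from the vector ID."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return INDEXES[digest % len(INDEXES)]

def query_all_shards(s3vectors, bucket, query_embedding, top_k=10):
    """Fan the query out to every shard and merge by ascending distance."""
    candidates = []
    for index_name in INDEXES:
        resp = s3vectors.query_vectors(
            vectorBucketName=bucket,
            indexName=index_name,
            queryVector={"float32": query_embedding},
            topK=top_k,
            returnDistance=True,
        )
        candidates.extend(resp["vectors"])
    # Smaller distance means more similar for both cosine and Euclidean.
    return heapq.nsmallest(top_k, candidates, key=lambda v: v["distance"])
```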

Query results are limited to top 30. You can't ask for the top 100 candidates for reranking. You get 30, period. For most applications that's fine, but if your workflow depends on large candidate sets, you'll need to adapt.

Distance metrics are limited to Cosine and Euclidean. No dot product (though you can normalize vectors to make cosine equivalent), no Manhattan distance, no custom metrics. This covers most use cases but not all.
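The dot-product workaround mentioned above is simply L2-normalizing embeddings before writing and querying; on unit-length vectors, cosine distance ranks results the same way a dot product would. A quick sketch:

```python
import numpy as np

def normalize(vec):
    """L2-normalize an embedding so cosine distance ranks like dot product."""
    arr = np.asarray(vec, dtype=np.float32)
    norm = np.linalg.norm(arr)
    return (arr / norm).tolist() if norm > 0 else arr.tolist()

# Normalize both stored embeddings and query embeddings, then use the
# index's cosine metric as usual.
```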

Metadata is limited to 10 fields and 2KB of filterable data per vector. If you need complex metadata structures or heavy filtering logic, you might need to combine S3 Vectors with another system that handles the metadata side.

Write throughput is throttled to 5 requests per second per index. For bulk ingestion, you batch 500 vectors per request, giving you roughly 2,500 vectors per second per index. That's fine for batch loads but not for high-frequency streaming ingestion.
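In practice, bulk loading means batching 500 vectors per request and pacing calls to stay under the per-index write limit. A rough sketch, reusing the hedged `put_vectors` call from earlier:

```python
import time

def bulk_ingest(s3vectors, bucket, index, vectors, batch_size=500, max_rps=5):
    """Write vectors in batches of up to 500, pacing to ~5 requests per second."""
    min_interval = 1.0 / max_rps
    for start in range(0, len(vectors), batch_size):
        began = time.monotonic()
        s3vectors.put_vectors(
            vectorBucketName=bucket,
            indexName=index,
            vectors=vectors[start:start + batch_size],
        )
        # Sleep off whatever remains of this request's time slot.
        elapsed = time.monotonic() - began
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)
```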

And crucially, it's in preview. No SLA, the API might change, and it's only available in five regions right now (us-east-1, us-east-2, us-west-2, eu-central-1, and ap-southeast-2).

What This Means Strategically

AWS is doing what AWS does: taking a capability that startups innovated on and building it into the platform as a basic feature. They did this with databases (RDS, Aurora), with search (OpenSearch), with machine learning (SageMaker), and now with vector search.

This creates pressure on standalone vector database companies to differentiate on something other than basic storage and similarity search. Speed, features, multi-cloud, ease of use. The floor just got raised.

For AWS, this strengthens data gravity. If your embeddings live in S3 alongside your source data, and Bedrock can use them directly, and SageMaker can access them easily, you're less likely to move workloads to another cloud. It's not lock-in exactly, but it's definitely friction.

The broader impact might be democratization. Vector search stops being a specialized project requiring evaluation of multiple vendors and becomes something you just turn on in your existing data lake. That probably expands the market more than it cannibalizes existing solutions.

The Cost Reality

Let's ground this in actual numbers. For 1 billion 512-dimensional vectors with about 50 million queries per month:

Storage costs roughly $46 per month (2TB at $0.023/GB). Indexing those billion vectors costs around $205 as a one-time charge ($0.10 per million write operations). Monthly queries cost about $20 ($0.40 per million query operations).

Total: roughly $66 per month ongoing, or around $271 in the first month once you add the one-time indexing charge.
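Worked through with the rates quoted above (illustrative only; the small differences from the prose figures are rounding, and actual pricing varies by region and will be finalized at GA):

```python
# Back-of-envelope math using the rates quoted above.
vectors = 1_000_000_000
dims = 512
bytes_per_float = 4

storage_gb = vectors * dims * bytes_per_float / 1e9       # ~2,048 GB, i.e. ~2 TB
storage_monthly = storage_gb * 0.023                       # ~$47 per month
query_monthly = (50_000_000 / 1_000_000) * 0.40            # $20 per month
indexing_one_time = 205                                     # quoted one-time charge

print(round(storage_monthly + query_monthly))                        # ~$67/month ongoing
print(round(storage_monthly + query_monthly + indexing_one_time))    # ~$272 in month one
```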

Compare that to running dedicated vector database infrastructure, where you're provisioning compute regardless of actual usage, and the numbers make sense for many use cases.

The catch: this assumes relatively stable data with moderate query loads. If you're constantly rewriting vectors or running millions of queries per day, the math changes. But for knowledge bases, document search, and periodic recommendations, the economics are compelling.

When Not to Use S3 Vectors

Be honest about your requirements. If you need sub-50ms query latency consistently, S3 Vectors isn't the answer. Use OpenSearch with in-memory indexes, or a dedicated vector database like Pinecone.

If your application requires extremely high query throughput (thousands of queries per second sustained), S3 Vectors will likely hit limits. It's not designed for that load profile.

If you need advanced query features like vector arithmetic, multi-vector queries, or complex boolean logic beyond basic metadata filtering, you'll need a more sophisticated system.

If you can't use AWS for compliance reasons, obviously S3 Vectors isn't an option. And if you need multi-cloud portability, tying yourself to an AWS-specific service might not align with your architecture.

And if your embeddings change frequently, the write throttling becomes a real constraint. This isn't for real-time streaming scenarios where vectors update per user interaction.

The Bottom Line

S3 Vectors doesn't replace purpose-built vector databases. It provides a different trade-off: lower cost and zero infrastructure management in exchange for moderate latency and less control.

For many AI applications, especially those built on AWS already, that trade-off makes complete sense. The difference between 50ms and 250ms query time often doesn't matter to end users. The difference between $10,000 and $500 per month absolutely does matter to businesses.

The most interesting aspect isn't the technology itself. It's what becomes possible when the cost barrier drops by 90%. Suddenly it's economically feasible to embed everything, to make your entire data lake semantically searchable, to keep years of vector history for analysis.

We're probably entering a phase where vector search becomes table stakes infrastructure, like key-value stores or message queues. Not every application needs it, but it's available when you do, at a price point that doesn't require a business case review.

Whether S3 Vectors specifically becomes the standard or whether it just forces the whole market to compete on cost and simplicity, the outcome is probably the same: vector search stops being a specialized tool and becomes basic infrastructure.

And that's when things get interesting.
