Siddhant Khare

AWS S3 Vectors at scale: Real performance numbers at 10 million vectors

Introduction

AWS S3 Vectors promises "billions of vectors with sub-second queries" and up to 90% cost savings over traditional vector databases. These claims sound good on paper, but implementation details matter. How does performance actually scale? What's the accuracy trade-off? Are there operational gotchas?

This post presents empirical benchmarks testing S3 Vectors from 10,000 to 10 million vectors, comparing performance and accuracy against FAISS and NMSLib. All code used boto3 on us-east-1, measuring real-world query latency including network overhead.

What is S3 Vectors?

S3 Vectors is AWS's managed vector search service that stores and queries vector embeddings directly in S3. Key characteristics:

  • Native S3 integration with standard durability/availability guarantees
  • Maximum 50 million vectors per index
  • Maximum 4096 dimensions per vector
  • Supports cosine similarity and Euclidean distance
  • Accessed via the boto3 query_vectors API

The value proposition is operational simplicity and cost reduction. You don't manage infrastructure, handle index building, or worry about scaling - you just store vectors in S3 and query them.
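The query path is a single API call. Here's a minimal sketch, assuming a vector bucket and index already exist (the bucket and index names below are placeholders; parameter names follow the boto3 `s3vectors` client as documented at the time of writing):

```python
import boto3

# Bucket and index names are placeholders.
s3v = boto3.client("s3vectors", region_name="us-east-1")

response = s3v.query_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="my-index",
    queryVector={"float32": [0.1] * 384},  # the embedding to search with
    topK=5,
    returnDistance=True,
)
for match in response["vectors"]:
    print(match["key"], match.get("distance"))
```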

Experimental Setup

Dataset

Primary dataset: UKBench

  • 10,200 images containing 2,550 distinct objects (4 images per object)
  • Used for both queries and the database (a search should return images of the same object)
  • Metric: Recall@4 - the fraction of same-object images in the top 4 results (computed as sketched below)
  • Since each query image exists in the database, the top result is always the query itself
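A minimal sketch of the per-query metric (`object_of` is a hypothetical mapping from image ID to object ID, not part of the benchmark code):

```python
def recall_at_4(query_id, result_ids, object_of):
    """Fraction of the 4 same-object images appearing in the top-4 results.

    object_of maps an image ID to its object ID; in UKBench every object
    has exactly 4 images, and the query image itself counts as a hit.
    """
    target = object_of[query_id]
    hits = sum(1 for r in result_ids[:4] if object_of[r] == target)
    return hits / 4.0
```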

Distractor dataset: Microsoft COCO 2017

  • Random crops used to scale database to 10M vectors
  • Provides realistic noise for large-scale testing

Vector Embeddings

DINOv3 (self-supervised vision transformer) for image embeddings:

| Model | Vector dimensions |
| --- | --- |
| ViT-S/16 distilled | 384 |
| ViT-B/16 distilled | 768 |
| ViT-L/16 distilled | 1024 |

I chose DINOv3 for its strong performance on image retrieval tasks without fine-tuning.
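For reference, a hedged sketch of extracting such an embedding with Hugging Face transformers; the checkpoint name is illustrative, and CLS-token pooling is an assumption about the setup rather than a detail from the benchmark:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

# Checkpoint name is illustrative -- substitute the DINOv3 variant you use.
CKPT = "facebook/dinov3-vits16-pretrain-lvd1689m"
processor = AutoImageProcessor.from_pretrained(CKPT)
model = AutoModel.from_pretrained(CKPT).eval()

@torch.no_grad()
def embed(path: str) -> list[float]:
    inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
    out = model(**inputs)
    vec = out.last_hidden_state[:, 0, :].squeeze(0)  # CLS token as global descriptor
    vec = torch.nn.functional.normalize(vec, dim=0)  # unit norm for cosine similarity
    return vec.tolist()
```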

Infrastructure

  • S3 Vectors: us-east-1 bucket, queries from CloudShell (same region)
  • Local baseline: Intel Core i7-13700KF (16c/24t), 32GB RAM
  • Measurement: Query time from sending vector via query_vectors to receiving results
    • Does NOT include embedding generation time
    • DOES include network latency and API overhead
    • Measured per individual query (not batched)
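Each latency sample is effectively wall-clock time around the API call. A sketch of the measurement (client setup omitted):

```python
import time

def timed_query(s3v, bucket, index, vec, k=5):
    """Wall-clock latency of one query_vectors call, network overhead included."""
    t0 = time.perf_counter()
    resp = s3v.query_vectors(
        vectorBucketName=bucket,
        indexName=index,
        queryVector={"float32": vec},
        topK=k,
    )
    return resp["vectors"], (time.perf_counter() - t0) * 1000  # results, ms
```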

Comparison Methods

FAISS (Facebook AI Similarity Search)

  • IndexHNSWFlat with m=32, efConstruction=512
  • Graph-based approximate nearest neighbor search
  • Run locally (no network overhead)
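The FAISS baseline corresponds to roughly this setup (random data stands in for the real embeddings):

```python
import faiss
import numpy as np

d = 384
xb = np.random.rand(100_000, d).astype(np.float32)  # database embeddings
xq = np.random.rand(10, d).astype(np.float32)       # query embeddings

index = faiss.IndexHNSWFlat(d, 32)   # m=32 neighbors per graph node
index.hnsw.efConstruction = 512      # build-time candidate list size
index.add(xb)

distances, ids = index.search(xq, 5)  # top-5 neighbors per query
```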

NMSLib (Non-Metric Space Library)

  • HNSW method with default parameters
  • Another HNSW implementation for comparison
  • Run locally (no network overhead)
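And the NMSLib counterpart, using default HNSW parameters as in the benchmark:

```python
import nmslib
import numpy as np

xb = np.random.rand(100_000, 384).astype(np.float32)

index = nmslib.init(method="hnsw", space="cosinesimil")
index.addDataPointBatch(xb)
index.createIndex()  # default parameters

ids, dists = index.knnQuery(xb[0], k=5)
```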

Brute-force search

  • NumPy inner product (the `@` operator) computed per query
  • True nearest neighbors (100% recall baseline)
  • Run locally (no network overhead)
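The brute-force baseline is a single matrix-vector product per query:

```python
import numpy as np

xb = np.random.rand(100_000, 384).astype(np.float32)  # normalized in practice
q = xb[0]

scores = xb @ q                        # inner product with every database vector
top = np.argpartition(-scores, 5)[:5]  # unordered indices of the 5 best scores
top = top[np.argsort(-scores[top])]    # sort those 5 by score
```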

Important caveat: Local execution eliminates network latency, giving FAISS/NMSLib inherent speed advantages unrelated to algorithm quality.

Results: Scaling Vector Count

Testing from 10K to 10M vectors with 384-dimensional embeddings, topK=5:

| Vectors | Query time (ms) | Recall@4 |
| --- | --- | --- |
| 10,200 | 112 | 0.968 |
| 100,000 | 137 | 0.973 |
| 500,000 | 170 | 0.969 |
| 1,000,000 | 207 | 0.969 |
| 10,000,000 | 382 | 0.908 |

Absolute Processing Time

*Chart: S3 Vectors query processing time vs. vector count*

S3 Vectors query time grows from 112ms at 10K vectors to 382ms at 10M vectors - a 3.4x increase for a 1000x data increase.

Key Observations

Query latency scales sublinearly: Moving from 10K to 10M vectors (1000x increase) results in only 3.4x latency increase. This suggests efficient indexing that doesn't degrade linearly with dataset size.

Sub-second queries achieved: At 10M vectors, queries complete in 382ms. AWS's "sub-second" claim holds at this scale.

Accuracy remains strong: Recall@4 stays above 90% even at 10M scale. The drop from 0.97 to 0.91 indicates some accuracy trade-off with scale, but still delivers relevant results.

Fixed overhead dominates at small scale: The 112ms baseline at 10K vectors includes network/API overhead. This makes S3 Vectors less competitive for small datasets where local search would be faster.

Comparison: S3 Vectors vs Alternatives

Absolute Query Times

| Vectors | FAISS (local) | NMSLib (local) | S3 Vectors | Brute-force (local) |
| --- | --- | --- | --- | --- |
| 10,200 | 0.03 ms | 0.02 ms | 112 ms | 0.05 ms |
| 100,000 | 0.06 ms | 0.03 ms | 137 ms | 2.78 ms |
| 1,000,000 | 0.10 ms | 0.05 ms | 207 ms | 25.6 ms |
| 10,000,000 | 0.27 ms | 0.09 ms | 382 ms | 381 ms |

Local execution is orders of magnitude faster due to no network overhead. However, this ignores infrastructure costs.

Processing Time Ratio (Normalized to 10K baseline)

To understand scaling behavior independent of fixed costs, normalize each method's 10K time to 1.0:

| Vectors | FAISS | NMSLib | S3 Vectors | Brute-force |
| --- | --- | --- | --- | --- |
| 10,200 | 1.0x | 1.0x | 1.0x | 1.0x |
| 1,000,000 | 2.7x | 2.4x | 1.8x | 512x |
| 10,000,000 | 8.1x | 5.1x | 3.4x | 7620x |

S3 Vectors scales better than FAISS/NMSLib when normalized. This is surprising and suggests AWS's indexing approach handles growth efficiently.

Note: This comparison has limitations. Different HNSW parameters would change FAISS/NMSLib results. The key takeaway is that S3 Vectors' scaling characteristics are competitive with established ANN libraries.

Accuracy Comparison

| Vectors | FAISS | NMSLib | S3 Vectors | Brute-force |
| --- | --- | --- | --- | --- |
| 10,200 | 0.970 | 0.950 | 0.968 | 0.970 |
| 1,000,000 | 0.970 | 0.930 | 0.969 | 0.970 |
| 10,000,000 | 0.910 | 0.800 | 0.908 | 0.970 |

At 10M scale, S3 Vectors matches FAISS accuracy and significantly outperforms NMSLib (though this is likely due to parameter tuning differences rather than fundamental algorithm quality).

Accuracy degrades with scale for all ANN methods. This is expected - approximate search trades some accuracy for speed. The degradation rate for S3 Vectors is comparable to tuned FAISS.

Impact of Vector Dimensionality

Testing dimension scaling with 100K vectors:

| Dimensions | Query time (ms) | Recall@4 |
| --- | --- | --- |
| 384 | 137 | 0.973 |
| 768 | 151 | 0.983 |
| 1024 | 158 | 0.988 |
| 4096 | 215 | 0.988 |

Dimension scaling is gentle: Going from 384 to 4096 dimensions (10.7x increase) adds only 57% latency. Higher dimensional vectors capture more information, improving accuracy with modest performance cost.

Dimensionality reduction likely unnecessary: The small performance gain from reducing dimensions probably isn't worth the accuracy loss for most use cases.

Operational Findings

1. topK Returns K-1 Results Intermittently

Issue: The topK parameter specifies how many results to return, but approximately 20% of queries return K-1 results instead of K.

Details:

  • No reproducible pattern
  • Occurs across different K values
  • Same query returns different result counts on repeated execution
  • No documented explanation in AWS docs

Impact: Applications must handle variable result counts; you cannot assume that exactly topK results will be returned.

Workaround: Request topK+1 if exactly K results are required, though this doesn't guarantee K results either.
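A defensive pattern looks like this (a sketch; the over-request-and-retry policy is a workaround, not AWS guidance):

```python
def query_at_least_k(s3v, bucket, index, vec, k, max_attempts=3):
    """Over-request and retry until at least k results come back.

    Works around the intermittent K-1 behavior; there is still no hard
    guarantee, so callers must handle a short result list anyway.
    """
    results = []
    for _ in range(max_attempts):
        resp = s3v.query_vectors(
            vectorBucketName=bucket,
            indexName=index,
            queryVector={"float32": vec},
            topK=k + 1,
        )
        results = resp["vectors"]
        if len(results) >= k:
            return results[:k]
    return results  # best effort after retries
```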

2. Vector Deletion is Extremely Slow

Measurement: delete_vectors processes 3-4 vectors per second via boto3.

Comparison: put_vectors inserts ~500 vectors per second (roughly 100x faster).

Impact: Deleting large numbers of vectors is impractical. For 10M vectors, deletion would take ~30 days.

Recommendation: For bulk deletion, recreate the vector index rather than delete individual vectors.
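A sketch of that drop-and-recreate approach (parameter names follow the boto3 s3vectors client; verify against current docs before relying on them):

```python
# Dropping and recreating the index is far faster than millions of
# delete_vectors calls at 3-4 vectors/second.
s3v.delete_index(vectorBucketName="my-vector-bucket", indexName="my-index")
s3v.create_index(
    vectorBucketName="my-vector-bucket",
    indexName="my-index",
    dataType="float32",
    dimension=384,
    distanceMetric="cosine",
)
# ...then re-ingest only the vectors you want to keep.
```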

3. Vector Ingestion at Scale

Rate: put_vectors accepts a maximum of 500 vectors per call, completing in ~1 second for low-dimensional vectors.

At 10M scale: Full ingestion takes approximately 5-6 hours with 384-dim vectors.

Dimension impact: At 4096 dimensions, 500-vector batches sometimes fail, suggesting payload size limits. Reduce batch size for high-dimensional vectors.
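A batched ingestion sketch with a dimension-aware batch size (the 100-vector fallback is an illustrative choice, not a documented limit):

```python
def ingest(s3v, bucket, index, items, dim):
    """Batched put_vectors; shrink the batch for high-dimensional vectors."""
    batch_size = 500 if dim <= 1024 else 100  # smaller batches avoid payload failures
    for i in range(0, len(items), batch_size):
        s3v.put_vectors(
            vectorBucketName=bucket,
            indexName=index,
            vectors=[
                {"key": key, "data": {"float32": vec}}
                for key, vec in items[i : i + batch_size]
            ],
        )
```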

4. Indexing Appears Incremental

Observation: Queries return results immediately after inserting vectors, even during ongoing bulk inserts.

Implication: S3 Vectors likely builds/updates indexes during insertion rather than requiring a separate indexing phase. This differs from traditional vector databases that build indexes after bulk load.

Advantage: No downtime waiting for index construction. New vectors become searchable quickly.

When to Use S3 Vectors

Good Fit

Cost-sensitive applications: 90% cost savings over dedicated vector DBs adds up at scale.

Moderate latency requirements: 100-500ms query latency is acceptable for many applications (semantic search, recommendation systems, content discovery).

Operational simplicity priority: No infrastructure to manage, automatic scaling, S3's durability guarantees.

Growing datasets: Sublinear scaling means performance stays reasonable as data grows.

Integration with AWS services: Native S3 storage works well with Lambda, Bedrock, SageMaker.

Poor Fit

Ultra-low latency requirements: If you need <10ms queries, local FAISS/NMSLib will outperform.

Small datasets: Network overhead dominates at small scale. Local search is faster for <100K vectors.

Frequent bulk deletions: Deletion performance makes this operationally painful.

Exact nearest neighbors required: ANN trade-offs mean 90-97% recall, not 100%.

Extremely large scale (>50M per index): Requires multiple indexes and custom orchestration.

Practical Recommendations

  1. Start with S3 Vectors for new projects: Unless you have proven low-latency requirements, the operational benefits outweigh performance differences.

  2. Monitor the topK bug: Build result count validation into your application logic.

  3. Design for immutable vectors: Given slow deletion, treat vectors as append-only when possible.

  4. Batch queries if possible: While this benchmark tested single queries, batching multiple queries per API call would amortize network overhead.

  5. Test with your data: Accuracy and performance depend on vector characteristics. Run your own benchmarks with representative data.

  6. Plan for multi-index if scaling beyond 50M: Design shard-aware query distribution early.

Conclusion

S3 Vectors delivers on its core promise: you can query 10 million vectors in under 400ms with ~91% recall, and costs are significantly lower than dedicated vector databases.

The sublinear scaling characteristics are impressive - performance degrades gracefully as datasets grow. Accuracy remains competitive with tuned FAISS at scale.

However, operational quirks exist: the topK bug needs workarounds, deletion is impractically slow, and small datasets don't benefit from the service.

For most ML applications where 100-500ms latency is acceptable and you value operational simplicity over raw speed, S3 Vectors is a strong default choice. The "cheap managed alternative" has become a legitimate first-class option.

Methodology Notes

  • All measurements represent single-query latency (no batching)
  • Query times include network and API overhead for S3 Vectors
  • Local methods (FAISS/NMSLib/brute-force) exclude network overhead
  • Each data point represents average across all 10,200 UKBench queries
  • HNSW parameters chosen for reasonable defaults, not exhaustive tuning
  • Code available at https://github.com/Siddhant-K-code/s3-vectors-benchmark
