Last week, I got a Slack message from our finance team that made my stomach drop: "Why is our Pinecone bill $4,200 this month?" We're running a mid-sized RAG application with about 50 million vectors, and our database costs had quietly become our second-largest AWS expense.
Then AWS dropped S3 Vectors in their December announcement. The promise? Store and query vectors at up to 90% lower cost than specialized databases. I was skeptical. Vector databases are fast, purpose-built, and reliable. Could object storage really compete?
We spent two weeks migrating one of our production indexes from Pinecone to S3 Vectors. Here's what we learned, what worked, and when you should (and shouldn't) make the switch.
## The Vector Database Pricing Problem
Let's talk numbers. Specialized vector databases like Pinecone, Weaviate, and Qdrant are incredible engineering feats. They deliver sub-10ms query latency and handle billions of vectors. But that performance comes at a cost.
### Monthly Cost Comparison (50M vectors, 768 dimensions)
- Pinecone: $420/month
- Weaviate: $356/month
- Qdrant Cloud: $315/month
- S3 Vectors: $42/month ✓
For our workload—storing product embeddings for semantic search with about 50,000 queries per day—Pinecone was costing us roughly $420/month. After migration, our S3 Vectors bill landed at $42/month. That's a 90% reduction, exactly as advertised.
**Reality check:** This isn't an apples-to-apples comparison. Pinecone delivers consistent single-digit millisecond latencies. S3 Vectors gives you sub-second for infrequent queries and around 100ms for frequent ones. The question isn't "which is better"—it's "which matches your needs?"
## Understanding S3 Vectors Architecture
S3 Vectors introduces a new bucket type specifically designed for vector data. Think of it as S3's answer to the vector database market, but with a fundamentally different architectural approach.
### Key Concepts
- **Vector buckets:** A new bucket type optimized for vector storage, with dedicated APIs for vector operations.
- **Vector indexes:** Organize vectors within buckets. Each index can hold up to 2 billion vectors.
- **Strong consistency:** Immediately access newly written data—no eventual consistency delays.
- **Integrated metadata:** Store up to 50 metadata keys per vector for powerful filtering.
### What Makes It Different
Traditional vector databases optimize for one thing: speed. They keep everything in memory or on fast SSDs, pre-compute indexes, and maintain distributed clusters for horizontal scaling. It's like keeping your entire library on your desk—instant access, but you're paying rent for all that desk space.
S3 Vectors takes the opposite approach. It's built on S3's object storage foundation, which means your vectors live on cheaper disk-based storage. AWS uses clever caching and optimization to deliver reasonable query performance without the memory overhead. Think of it as a well-organized warehouse—it takes a bit longer to retrieve items, but storage is cheap.
## The Migration Process: Step by Step
We migrated our product search index (52 million vectors, 768 dimensions, generated by OpenAI's text-embedding-3-large with the embeddings API's `dimensions` parameter set to 768) from Pinecone to S3 Vectors. Here's the exact process we followed.
### Step 1: Create Your S3 Vector Bucket
First, set up the infrastructure through the AWS Console or CLI:
```bash
# Create a vector bucket (S3 Vectors has its own CLI namespace, s3vectors)
aws s3vectors create-vector-bucket \
    --vector-bucket-name my-vectors \
    --region us-east-1

# Create a vector index inside the bucket
aws s3vectors create-index \
    --vector-bucket-name my-vectors \
    --index-name product-embeddings \
    --data-type float32 \
    --dimension 768 \
    --distance-metric cosine
```
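If you prefer scripting the setup, the same operations are available through boto3's `s3vectors` client. Here's a minimal sketch; the parameter names follow boto3's lowerCamelCase convention, but verify them against your SDK version, and note that the helper names (`index_params`, `create_bucket_and_index`) are my own:

```python
def index_params(bucket: str, name: str, dimension: int, metric: str = "cosine") -> dict:
    # Build a create-index request; S3 Vectors stores float32 vector data
    if metric not in ("cosine", "euclidean"):
        raise ValueError(f"unsupported distance metric: {metric}")
    return {
        "vectorBucketName": bucket,
        "indexName": name,
        "dataType": "float32",
        "dimension": dimension,
        "distanceMetric": metric,
    }


def create_bucket_and_index(bucket: str, name: str, dimension: int) -> None:
    import boto3  # imported lazily so the request builder above stays dependency-free

    client = boto3.client("s3vectors")
    client.create_vector_bucket(vectorBucketName=bucket)
    client.create_index(**index_params(bucket, name, dimension))
```

Keeping the request builder separate from the client call makes it easy to unit-test the parameters without touching AWS.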
We chose cosine similarity because it matches what we were using in Pinecone. If you're using different distance metrics (Euclidean, dot product), adjust accordingly.
### Step 2: Export Data from Pinecone
Pinecone doesn't have a built-in export feature, so you'll need to fetch all vectors:
```python
import pinecone
import json

# Initialize Pinecone (v2 client; the v3+ SDK uses Pinecone(api_key=...) instead)
pinecone.init(api_key="your-api-key")
index = pinecone.Index("product-embeddings")

# Fetch all vectors (paginated)
vectors = []
for ids in fetch_all_ids():  # your pagination logic
    batch = index.fetch(ids=ids)
    vectors.extend(batch["vectors"].values())

# Save to file for backup
with open("vectors_backup.json", "w") as f:
    json.dump(vectors, f)
```
**Pro tip:** This took us about 3 hours for 52M vectors. Start this during off-hours and implement retry logic—network hiccups happen.
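A generic retry wrapper is enough to survive those hiccups. A sketch (the `with_retries` helper and its defaults are my own, not part of any SDK):

```python
import random
import time


def with_retries(fn, attempts=5, base_delay=1.0, retryable=(Exception,)):
    """Call fn(); on failure, sleep with exponential backoff plus jitter and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)


# Usage: wrap each fetch so a transient network error doesn't kill a 3-hour export
# batch = with_retries(lambda: index.fetch(ids=ids))
```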
### Step 3: Transform and Upload to S3 Vectors
S3 Vectors has a slightly different data format. Here's how we handled the transformation:
```python
import boto3

# S3 Vectors has its own boto3 client, separate from the plain "s3" client
s3vectors = boto3.client("s3vectors")

def upload_batch(vectors_batch):
    # Each vector needs a key, float32 data, and optional metadata
    formatted_vectors = [
        {
            "key": v["id"],
            "data": {"float32": v["values"]},
            "metadata": v.get("metadata", {}),
        }
        for v in vectors_batch
    ]
    return s3vectors.put_vectors(
        vectorBucketName="my-vectors",
        indexName="product-embeddings",
        vectors=formatted_vectors,
    )

# Process in batches (put_vectors caps the batch size, so stay conservative)
BATCH_SIZE = 500
for i in range(0, len(vectors), BATCH_SIZE):
    upload_batch(vectors[i:i + BATCH_SIZE])
    print(f"Uploaded {min(i + BATCH_SIZE, len(vectors))}/{len(vectors)} vectors")
```
**Upload throughput:** We sustained about 1,000 vectors per second, so the full upload took roughly 14 hours. Run this as a background job.
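If 14 hours is too long, the batches are independent, so you can overlap the network round-trips with a thread pool. A sketch with hypothetical helper names; tune `workers` against your account's rate limits:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(items, size):
    """Yield successive batches of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def parallel_upload(vectors, upload_batch, batch_size=500, workers=8):
    # Each put_vectors call is independent, so a thread pool overlaps the
    # network round-trips instead of waiting on them one at a time
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(upload_batch, chunked(vectors, batch_size)))
```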
### Step 4: Update Your Application Code
The API differences are minimal. Here's a before/after comparison:
```python
# BEFORE: Pinecone query
results = index.query(
    vector=query_embedding,
    top_k=10,
    include_metadata=True,
    filter={"category": "electronics"},
)

# AFTER: S3 Vectors query (note: s3vectors client, not the plain s3 client)
response = s3vectors.query_vectors(
    vectorBucketName="my-vectors",
    indexName="product-embeddings",
    queryVector={"float32": query_embedding},
    topK=10,
    filter={"category": "electronics"},
    returnMetadata=True,
    returnDistance=True,
)

# Parse results (S3 Vectors returns distances rather than similarity scores)
results = [
    {
        "id": match["key"],
        "distance": match["distance"],
        "metadata": match["metadata"],
    }
    for match in response["vectors"]
]
```
### Step 5: Test and Validate
We ran both systems in parallel for a week, comparing results:
- Query accuracy: 99.2% match rate (the 0.8% difference came from slight numerical precision variations)
- Latency: Averaged 120ms vs Pinecone's 8ms
- No dropped queries or timeouts during peak hours
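To compute that match rate, we compared the top-k ID lists from both systems query by query. A minimal version of the harness (function names are illustrative):

```python
def top_k_match_rate(results_a, results_b):
    """Fraction of IDs shared between two top-k result lists (order-insensitive)."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 1.0
    return len(a & b) / max(len(a), len(b))


def mean_match_rate(paired_results):
    """Average the per-query match rates across a list of (pinecone, s3) pairs."""
    rates = [top_k_match_rate(a, b) for a, b in paired_results]
    return sum(rates) / len(rates)
```

We treated the comparison as order-insensitive because tiny numerical precision differences reorder near-tied neighbors without changing which documents are retrieved.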
## Performance Benchmarks: The Real Numbers
Here's what we measured in production over two weeks:
### Query Latency Comparison
| Metric | Pinecone | S3 Vectors |
|---|---|---|
| P50 Latency | 6ms | 95ms |
| P95 Latency | 12ms | 180ms |
| P99 Latency | 25ms | 450ms |
| Cold Start | N/A | 850ms |
The latency increase was noticeable but acceptable for our use case. Our users are searching a catalog, not expecting instant autocomplete. The ~100ms difference isn't perceptible in this context.
### When Latency Matters
If you're building real-time recommendation engines, chatbots with instant responses, or high-frequency trading systems, those extra milliseconds compound. For a chatbot responding to 10 vector queries per message, that's an extra second of wait time—enough to feel sluggish.
## Cost Breakdown: Where the Savings Come From
### Pinecone Standard: $420/month
- Storage: $0.30/GB → $270
- Read Units: 1.5M/day → $130
- Write Units: 50K/day → $20
- High-performance in-memory infrastructure
### S3 Vectors: $42/month
- Storage: $0.025/GB → $22
- PUT requests: ~50K vectors/day → $12
- Query requests: 1.5M/month → $8
- Object storage with vector optimization
The storage cost difference is the biggest factor. Pinecone keeps your vectors in memory or fast SSDs for speed. S3 uses cheaper disk-based storage with intelligent caching. For infrequently accessed data, you win massively on cost.
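You can sanity-check the storage line items with back-of-envelope arithmetic: 52M vectors × 768 dimensions × 4 bytes per float32 is about 160 GB of raw payload (the billed figure is higher because metadata and index structures add overhead). A quick calculator using the per-GB rates above:

```python
BYTES_PER_FLOAT32 = 4


def storage_gb(num_vectors: int, dimensions: int) -> float:
    """Raw vector payload in GB (excludes metadata and index overhead)."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 / 1e9


def monthly_storage_cost(num_vectors: int, dimensions: int, price_per_gb: float) -> float:
    """Monthly storage bill at a given per-GB rate, raw payload only."""
    return storage_gb(num_vectors, dimensions) * price_per_gb


# Our 52M x 768-dim index works out to ~160 GB of raw floats
```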
## When to Use S3 Vectors vs. Dedicated Databases
### Decision Matrix
| Use Case | S3 Vectors | Pinecone/Weaviate |
|---|---|---|
| Document search (low QPS) | ✓ Perfect fit | Overkill |
| RAG applications | ✓ Great for most | Better for high-volume |
| Semantic search (product catalogs) | ✓ Works well | If sub-50ms needed |
| Real-time recommendations | ✗ Too slow | ✓ Ideal |
| Chatbot context retrieval | Borderline | ✓ Better UX |
| Batch processing/analytics | ✓ Excellent | Expensive |
| Agent long-term memory | ✓ Cost-effective | Premium option |
### Choose S3 Vectors When:
- Query frequency is low to moderate (under 100 QPS sustained)
- Budget is a primary constraint and you're storing millions of vectors
- 100-200ms latency is acceptable for your application
- You're already heavily invested in AWS and want native integration
- Data durability is critical (S3's 11 nines)
### Stick with Dedicated Vector DBs When:
- You need consistent single-digit millisecond latency
- High query throughput (1000+ QPS)
- Complex filtering and faceting are core features
- You're building user-facing features where speed affects UX
- Advanced features like hybrid search or custom distance metrics matter
## Integration with AWS Services
One major advantage: S3 Vectors plays incredibly well with the AWS ecosystem.
### Bedrock Knowledge Bases
We connected our S3 vector index directly to Amazon Bedrock for RAG applications:
```bash
# Knowledge Bases live under the bedrock-agent CLI namespace,
# and the vector store goes in --storage-configuration
aws bedrock-agent create-knowledge-base \
    --name "product-knowledge" \
    --role-arn "arn:aws:iam::account:role/bedrock-kb-role" \
    --knowledge-base-configuration '{
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": "arn:aws:bedrock:..."
        }
    }' \
    --storage-configuration '{
        "type": "S3_VECTORS",
        "s3VectorsConfiguration": {
            "indexArn": "arn:aws:s3vectors:...:bucket/my-vectors/index/product-embeddings"
        }
    }'
```
### OpenSearch Integration
You can build a tiered architecture: hot data in OpenSearch for low latency, cold data in S3 Vectors for cost savings. When an S3 Vectors index turns hot, you can export it into OpenSearch, so choosing the cheap tier doesn't lock you into it.
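The routing decision for such a tier can be as simple as a per-index query counter. A toy sketch (the class and threshold are illustrative, not an AWS API):

```python
from collections import Counter


class TieredRouter:
    """Toy router: serve hot indexes from OpenSearch, cold ones from S3 Vectors."""

    def __init__(self, hot_threshold: int = 100):
        self.hot_threshold = hot_threshold
        self.counts = Counter()  # queries seen per index in the current window

    def tier_for(self, index_name: str) -> str:
        # Once an index crosses the threshold, route it to the low-latency tier
        self.counts[index_name] += 1
        if self.counts[index_name] >= self.hot_threshold:
            return "opensearch"
        return "s3vectors"
```

A production version would decay or reset the counters per time window so yesterday's hot index can cool back down.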
## Gotchas and Limitations
Not everything was smooth sailing. Here are the issues we hit:
- **Limited regions:** Only available in 14 regions at launch. Check if your region is supported.
- **Cold-start latency:** The first query after inactivity can take 800ms+. Implement warm-up queries if needed.
- **Metadata limitations:** 50 keys max per vector. Complex filtering isn't as powerful as in dedicated DBs.
- **No hybrid search:** Pure vector similarity only; no built-in BM25 or keyword boosting.
## Real-World Migration Checklist
If you're considering migration, work through this checklist:
1. **Measure your current query patterns**
   - Average QPS during peak hours
   - P95 and P99 latency requirements
   - Data access patterns (hot vs. cold)
2. **Calculate the ROI**
   - Current monthly vector DB cost
   - Estimated S3 Vectors cost (use the AWS pricing calculator)
   - Engineering time for migration (budget 2-3 weeks)
3. **Run a proof of concept**
   - Migrate a small, non-critical index
   - Test query accuracy and latency
   - Validate metadata filtering works for your use case
4. **Plan for parallel operation**
   - Run both systems during the transition
   - Implement feature flags for easy rollback
   - Monitor error rates and user experience
5. **Execute the migration**
   - Off-hours data transfer
   - Gradual traffic shifting
   - Keep the old system running for at least two weeks
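For the gradual traffic shifting step, hashing user IDs into percentage buckets gives you a deterministic, instantly reversible rollout: the same user always hits the same backend, and setting the percentage back to 0 is your rollback switch. A sketch (the function name is illustrative):

```python
import hashlib


def use_s3_vectors(user_id: str, rollout_percent: int) -> bool:
    """Deterministically assign a user to the new backend based on a stable hash."""
    # md5 is fine here: we need stable bucketing, not cryptographic strength
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_percent
```

Dial `rollout_percent` up from 1 to 100 over the transition window while watching error rates and latency dashboards.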
## The Bottom Line
S3 Vectors disrupted our cost structure in the best way possible. We're saving $380/month on a single index, and we're already planning to migrate two more workloads.
But it's not a silver bullet. The latency trade-off is real, and for customer-facing features where every millisecond counts, we're keeping Pinecone. The key is matching the tool to the use case.
For our product search, document retrieval, and agent memory systems? S3 Vectors is perfect. For real-time recommendation engines and instant chatbot responses? Pinecone stays.
The future of vector storage isn't one-size-fits-all. It's about intelligent tiering—using fast, expensive databases where performance matters and cost-effective object storage everywhere else. S3 Vectors makes that architecture financially viable.