Every second, platforms like GitHub, Instagram, and Twitter process thousands of username availability checks. This seemingly simple operation becomes a critical performance bottleneck at scale. I built a production-grade proof-of-concept to measure exactly how different architectures handle this challenge.
The Challenge
When a user types "john_doe" during signup, the system must instantly verify if that username exists among millions of registered users. At 1000 requests per second, this translates to 86.4 million database queries per day.
Traditional approaches create two fundamental problems:
Performance Degradation: Each username check requires a database roundtrip, introducing 10-50ms latency per request.
Infrastructure Costs: Database CPU and I/O costs scale linearly with request volume, so the bill grows in lockstep with traffic.
Three Architectural Approaches
I tested three production-viable architectures against a dataset of 10 million usernames under sustained load of 1000 requests/second.
Architecture 1: PostgreSQL Direct Query
```javascript
// Every request hits PostgreSQL
const result = await db.query(
  'SELECT EXISTS(SELECT 1 FROM usernames WHERE username = $1)',
  [username]
);
```
Characteristics:
- 100% accuracy guaranteed
- Network + query latency on every request
- Database becomes the bottleneck
- Simple to implement and maintain
Architecture 2: Redis In-Memory Cache
```javascript
// Query Redis SET
const exists = await redis.sIsMember('usernames', username);
```
Characteristics:
- 100% accuracy with proper synchronization
- Sub-5ms latency
- Requires ~500MB memory for 10M usernames
- Zero database queries after initial load
Architecture 3: Bloom Filter + Database Fallback
```javascript
// Check Bloom filter first
const mightExist = bloomFilter.has(username);
if (!mightExist) {
  // Definitely NOT exists - return immediately
  return { available: true };
}
// Maybe exists - verify with database (full query elided here)
const { rows } = await db.query('SELECT EXISTS...');
return { available: !rows[0].exists };
```
Characteristics:
- In-process checks eliminate network overhead
- 95% of requests return instantly without database query
- ~10MB memory for 10M usernames
- Tunable false positive rate (1% target in this test; acceptable for this use case)
Experiment Design
Infrastructure Stack
- Runtime: Node.js 18 with Express
- Database: PostgreSQL 15 with indexed username column
- Cache: Redis 7.x with optimized memory settings
- Bloom Filter: bloom-filters library (1% false positive rate)
- Load Testing: k6 with realistic traffic patterns
- Monitoring: Prometheus + Grafana for metrics collection
Test Methodology
Dataset: 10 million pre-populated usernames
Traffic Pattern: 90% new users, 10% existing users (realistic signup distribution)
Load Profile:
Phase 1 (0-30s): Warmup - 100 req/s
Phase 2 (30-90s): Baseline - 500 req/s
Phase 3 (90-210s): Peak - 1000 req/s
Phase 4 (210-240s): Ramp down
Each test ran for 4 minutes with continuous monitoring of latency, throughput, and resource utilization.
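The four phases above map directly onto a k6 `stages` array. The following is a sketch of what that profile looks like, not the repo's actual script; in a real k6 file the array goes into `export const options = { stages }`:

```javascript
// k6-style load profile mirroring the four test phases (illustrative sketch)
const stages = [
  { duration: '30s', target: 100 },   // Phase 1: warmup
  { duration: '60s', target: 500 },   // Phase 2: baseline
  { duration: '120s', target: 1000 }, // Phase 3: peak
  { duration: '30s', target: 0 }      // Phase 4: ramp down
];

// Sanity check: the phases sum to the 4-minute test window
const totalSeconds = stages.reduce((sum, s) => sum + parseInt(s.duration, 10), 0);
console.log(totalSeconds); // 240
```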
Performance Results
| Metric | PostgreSQL | Redis | Bloom Filter |
|---|---|---|---|
| P50 Latency | 23.4ms | 2.8ms | 0.08ms |
| P95 Latency | 45.2ms | 8.1ms | 0.31ms |
| P99 Latency | 67.8ms | 14.5ms | 2.1ms |
| Throughput | 850 req/s | 3500 req/s | 12,000 req/s |
| DB Queries (per 100K) | 100,000 | 0 | 5,127 |
| Memory Usage | Minimal | 487 MB | 9.6 MB |
| Accuracy | 100% | 100% | 99.87% |
Key Findings
1. Latency Improvement: 300x Faster
Bloom filters achieved a P50 latency of 0.08ms versus PostgreSQL's 23.4ms, a roughly 290x speedup. The gain comes from eliminating network overhead entirely: the membership check happens in-process.
2. Database Query Reduction: 95%
Out of 100,000 requests, the Bloom filter triggered only 5,127 database queries (a 5.13% fallback rate, covering both false positives and genuinely taken usernames). The remaining 94.87% were answered instantly with zero database load.
3. Memory Efficiency: 50x Less
Redis required 487MB to store 10 million usernames. The Bloom filter used just 9.6MB for the same dataset - a 98% memory reduction.
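Dividing the measured totals by the dataset size makes the gap concrete:

```javascript
// Per-item memory footprint, derived from the measured totals above
const ITEMS = 10_000_000;

const redisBytes = 487 * 1e6; // 487 MB total
const bloomBytes = 9.6 * 1e6; // 9.6 MB total

const redisBytesPerItem = redisBytes / ITEMS;      // bytes per username in Redis
const bloomBitsPerItem = (bloomBytes * 8) / ITEMS; // bits per username in the filter

console.log(redisBytesPerItem.toFixed(1)); // "48.7"
console.log(bloomBitsPerItem.toFixed(2));  // "7.68"
```

Redis stores each full username string plus per-member overhead; the Bloom filter stores only a handful of bits per item, which is where the ~50x difference comes from.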
4. Cost Implications
At $0.000001 per database query:
- PostgreSQL: $86.40/day for 86.4M queries
- Bloom Filter: $4.32/day for 4.32M queries (5% fallback)
- Savings: $2,462/month at 1000 req/s sustained load
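The cost figures above follow from straightforward arithmetic (the $0.000001-per-query price is the article's assumed rate):

```javascript
// Reproducing the cost math at 1000 req/s sustained
const REQ_PER_SEC = 1000;
const COST_PER_QUERY = 0.000001; // assumed $ per DB query

const dailyRequests = REQ_PER_SEC * 86400;                    // 86,400,000 checks/day
const pgDailyCost = dailyRequests * COST_PER_QUERY;           // every request hits the DB
const bloomDailyCost = dailyRequests * 0.05 * COST_PER_QUERY; // ~5% fall back to the DB

const monthlySavings = (pgDailyCost - bloomDailyCost) * 30;
console.log(pgDailyCost.toFixed(2));    // "86.40"
console.log(bloomDailyCost.toFixed(2)); // "4.32"
console.log(monthlySavings.toFixed(2)); // "2462.40"
```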
Understanding Bloom Filters
A Bloom filter is a probabilistic data structure that tests set membership with two critical properties:
No False Negatives: If the filter says an element doesn't exist, it's 100% correct.
Possible False Positives: If the filter says an element might exist, you must verify with the source of truth.
How It Works
When adding a username to the filter:
"john_doe" → hash1("john_doe"), hash2("john_doe"), hash3("john_doe") → set bits at positions [142, 891, 1523]
When checking if a username exists:
If ANY bit is 0 → Definitely NOT exists (return immediately)
If ALL bits are 1 → MAYBE exists (check database)
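The add/check mechanics above can be sketched in a few lines. This is a toy illustration (a seeded FNV-1a hash standing in for proper hash functions), not the library used in the benchmark:

```javascript
// Minimal Bloom filter sketch: k seeded hashes over a shared bit array
class SimpleBloom {
  constructor(mBits, k) {
    this.m = mBits;
    this.k = k;
    this.bits = new Uint8Array(Math.ceil(mBits / 8));
  }
  // Seeded FNV-1a hash; varying the seed simulates k independent hash functions
  hash(str, seed) {
    let h = 2166136261 ^ seed;
    for (let i = 0; i < str.length; i++) {
      h ^= str.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return (h >>> 0) % this.m;
  }
  add(item) {
    for (let s = 0; s < this.k; s++) {
      const idx = this.hash(item, s);
      this.bits[idx >> 3] |= 1 << (idx & 7); // set the bit at position idx
    }
  }
  has(item) {
    for (let s = 0; s < this.k; s++) {
      const idx = this.hash(item, s);
      if (!(this.bits[idx >> 3] & (1 << (idx & 7)))) return false; // any 0 bit -> definitely absent
    }
    return true; // all bits set -> maybe present
  }
}

const bf = new SimpleBloom(100_000, 7);
bf.add('john_doe');
console.log(bf.has('john_doe')); // true - no false negatives, guaranteed
console.log(bf.has('free_name')); // false unless a (rare) false positive occurs
```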
The false positive rate is configurable through bit array size and number of hash functions. Our implementation uses a 1% FPR, meaning 99% accuracy for "might exist" cases.
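Those two knobs follow standard formulas: for n items and target false positive rate p, the optimal bit count is m = -n·ln(p)/(ln 2)² and the hash count is k = (m/n)·ln 2. A quick sanity check for this experiment's parameters (a strict 1% target works out to roughly 12MB of bits, the same ballpark as the ~10MB measured):

```javascript
// Optimal Bloom filter sizing for n = 10M items at p = 1% false positive rate
const n = 10_000_000;
const p = 0.01;

const m = Math.ceil(-n * Math.log(p) / Math.LN2 ** 2); // bit array size
const k = Math.round((m / n) * Math.LN2);              // number of hash functions

console.log(m);                        // ~95.9 million bits
console.log((m / 8 / 1e6).toFixed(1)); // "12.0" (MB)
console.log(k);                        // 7
```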
Production Considerations
When to Use Each Approach
Choose PostgreSQL when:
- Request volume is low (<100 req/s)
- Strong consistency is critical
- Team lacks operational expertise for distributed systems
- Simplicity outweighs performance optimization
Choose Redis when:
- High traffic requires <5ms response times
- 100% accuracy is non-negotiable
- Budget supports 50MB+ memory per million items
- Need distributed caching across multiple services
Choose Bloom Filters when:
- Massive scale demands sub-millisecond response
- Negative lookups dominate (90%+ checking new items)
- Memory constraints exist
- 1-5% false positive rate is acceptable
- Minimizing database load is critical
Handling False Positives
False positives in username availability checks are operationally acceptable because:
- User experience remains identical (username shown as taken)
- Database verification occurs transparently
- No data corruption or incorrect state
- Performance benefit outweighs rare false positive
```javascript
// Bloom filter says "maybe exists"
if (bloomFilter.has(username)) {
  // Always verify with the source of truth (full query elided here)
  const { rows } = await db.query('SELECT EXISTS...');
  if (!rows[0].exists) {
    // False positive detected
    // User can still register - no negative impact
    metrics.increment('bloom_filter_false_positive');
  }
}
```
Data Synchronization
Maintain filter accuracy through:
```javascript
// On new user registration
async function registerUser(username) {
  // 1. Insert into database
  await db.query('INSERT INTO users (username) VALUES ($1)', [username]);
  // 2. Update Bloom filter immediately
  bloomFilter.add(username);
  // 3. Persist filter periodically (every hour)
  if (shouldPersist()) {
    await saveBloomFilterToDisk(bloomFilter);
  }
}
```
Real-World Applications
Major technology platforms are reported to use Bloom filters for similar membership problems:
GitHub: screens new passwords against large leaked-credential datasets without a remote lookup per check
Medium: filters already-read articles out of recommendation feeds
Google Chrome: historically used a local Bloom filter for Safe Browsing malicious-URL checks
Akamai CDN: uses Bloom filters at edge nodes to avoid caching one-hit-wonder objects
Bitcoin: light (SPV) clients use Bloom filters to request only relevant transactions (BIP 37)
Implementation Guide
Building the Bloom Filter
```javascript
const fs = require('fs');
const { BloomFilter } = require('bloom-filters');

// Create filter: 10M capacity, 1% false positive rate
const filter = BloomFilter.create(10000000, 0.01);

// Load existing usernames
const users = await db.query('SELECT username FROM users');
users.rows.forEach(user => filter.add(user.username));

// Persist to disk (the library's saveAsJSON()/fromJSON() helpers are an alternative)
const filterData = {
  size: filter.size,
  nbHashes: filter.nbHashes,
  bits: Array.from(filter._bits), // internal field; layout depends on the library version
  metadata: { totalItems: users.rows.length, fpr: 0.01 }
};
fs.writeFileSync('bloom-filter.json', JSON.stringify(filterData));
```
Production API Implementation
```javascript
app.post('/check-username', async (req, res) => {
  const { username } = req.body;
  const start = Date.now();

  // In-process Bloom filter check
  const mightExist = bloomFilter.has(username);
  if (!mightExist) {
    // ~95% of requests return here
    return res.json({
      available: true,
      latency: Date.now() - start,
      source: 'bloom_filter'
    });
  }

  // ~5% of requests verify with database
  const result = await db.query(
    'SELECT EXISTS(SELECT 1 FROM users WHERE username = $1)',
    [username]
  );
  res.json({
    available: !result.rows[0].exists,
    latency: Date.now() - start,
    source: 'database_fallback',
    falsePositive: !result.rows[0].exists
  });
});
```
Reproduction Instructions
Full source code and setup available on GitHub:
```bash
git clone https://github.com/builtbychikara/WhatIfSeries.git
cd what-if/bloom-filter

# Install dependencies
npm install

# Start infrastructure (PostgreSQL, Redis, Prometheus, Grafana)
npm run docker:up

# Seed 10M usernames (takes ~15 minutes)
npm run seed

# Run services (3 separate terminals)
npm run start:postgres   # Terminal 1
npm run start:redis      # Terminal 2
npm run start:bloom      # Terminal 3

# Execute load tests
npm run experiment

# View results
open http://localhost:3100   # Grafana dashboard
```
Conclusion
Bloom filters provide a compelling solution for username availability checks at scale:
- 300x latency improvement over direct database queries
- 95% reduction in database load
- 50x memory efficiency compared to Redis caching
- Reported in production use at platforms like GitHub, Medium, and Google Chrome
The small false positive rate (1% as configured here) is an acceptable trade-off for the massive performance and cost benefits. For high-traffic applications where negative lookups dominate, Bloom filters are a battle-tested solution.
The complete proof-of-concept demonstrates production-grade implementation with comprehensive monitoring, load testing, and detailed metrics. All code is open source and ready to run.
GitHub Repository: https://github.com/builtbychikara/WhatIfSeries/tree/main/what-if/bloom-filter
Tech Stack: Node.js, PostgreSQL, Redis, Bloom Filters, k6, Prometheus, Grafana, Docker
Part of the What If Series - Production-grade POCs exploring system design decisions through empirical measurement.