Bhupesh Chikara

Posted on • Originally published at github.com

How to Check 10 Million Usernames in Under 1 Millisecond

Every second, platforms like GitHub, Instagram, and Twitter process thousands of username availability checks. It is a seemingly simple operation that becomes a critical performance bottleneck at scale. I built a production-grade proof-of-concept to measure exactly how different architectures handle this challenge.

GitHub Username Validation Interface - Shows real-time username availability checking during signup

The Challenge

When a user types "john_doe" during signup, the system must instantly verify if that username exists among millions of registered users. At 1000 requests per second, this translates to 86.4 million database queries per day.

Traditional approaches create two fundamental problems:

Performance Degradation: Each username check requires a database roundtrip, introducing 10-50ms latency per request.

Infrastructure Costs: Database CPU and I/O costs scale linearly with request volume, so the bill grows in lockstep with traffic.

Three Architectural Approaches

I tested three production-viable architectures against a dataset of 10 million usernames under sustained load of 1000 requests/second.

Architecture 1: PostgreSQL Direct Query

// Every request hits PostgreSQL
const result = await db.query(
  'SELECT EXISTS(SELECT 1 FROM usernames WHERE username = $1)',
  [username]
);

Characteristics:

  • 100% accuracy guaranteed
  • Network + query latency on every request
  • Database becomes the bottleneck
  • Simple to implement and maintain

Architecture 2: Redis In-Memory Cache

// Query Redis SET
const exists = await redis.sIsMember('usernames', username);

Characteristics:

  • 100% accuracy with proper synchronization
  • Sub-5ms latency
  • Requires ~500MB memory for 10M usernames
  • Zero database queries after initial load
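Populating the Redis SET at startup is the main operational step in this architecture. Below is a minimal sketch of a batched loader; the `chunk` helper is illustrative, and the node-redis v4 `sAdd` call is an assumption rather than code from the repo:

```javascript
// Split an array into fixed-size batches so the initial load issues one
// SADD round trip per batch instead of one per username.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical startup loader (assumes a connected node-redis v4 client):
// for (const batch of chunk(allUsernames, 10_000)) {
//   await redis.sAdd('usernames', batch); // sAdd accepts an array of members
// }

console.log(chunk(['ann', 'bob', 'cam'], 2).length); // 2 batches
```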

Architecture 3: Bloom Filter + Database Fallback

// Check Bloom filter first
const mightExist = bloomFilter.has(username);

if (!mightExist) {
  // Definitely NOT exists - return immediately
  return { available: true };
}

// Maybe exists - verify with database
const actuallyExists = await db.query('SELECT EXISTS...');
return { available: !actuallyExists };

Characteristics:

  • In-process checks eliminate network overhead
  • 95% of requests return instantly without database query
  • ~10MB memory for 10M usernames
  • 1-5% false positive rate (acceptable for this use case)

Performance Comparison Chart - PostgreSQL vs Redis vs Bloom Filter showing latency metrics (P50, P95, P99) and throughput across three architectures

Experiment Design

Infrastructure Stack

  • Runtime: Node.js 18 with Express
  • Database: PostgreSQL 15 with indexed username column
  • Cache: Redis 7.x with optimized memory settings
  • Bloom Filter: bloom-filters library (1% false positive rate)
  • Load Testing: k6 with realistic traffic patterns
  • Monitoring: Prometheus + Grafana for metrics collection

Test Methodology

Dataset: 10 million pre-populated usernames
Traffic Pattern: 90% new users, 10% existing users (realistic signup distribution)
Load Profile:

Phase 1 (0-30s):   Warmup - 100 req/s
Phase 2 (30-90s):  Baseline - 500 req/s
Phase 3 (90-210s): Peak - 1000 req/s
Phase 4 (210-240s): Ramp down

Each test ran for 4 minutes with continuous monitoring of latency, throughput, and resource utilization.
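The four phases above map naturally onto k6-style ramp stages. This is a sketch of the profile's shape, not the repo's actual load script:

```javascript
// The load profile expressed as k6-style stages (each stage ramps the
// request rate toward `target` over `duration`).
const stages = [
  { duration: '30s',  target: 100 },  // Phase 1: warmup
  { duration: '60s',  target: 500 },  // Phase 2: baseline
  { duration: '120s', target: 1000 }, // Phase 3: peak
  { duration: '30s',  target: 0 },    // Phase 4: ramp down
];

// In an actual k6 script these would go under `export const options`.
const totalSeconds = stages
  .map(s => parseInt(s.duration, 10))
  .reduce((sum, secs) => sum + secs, 0);
console.log(totalSeconds); // 240 seconds — the 4-minute test window
```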

Performance Results

| Metric | PostgreSQL | Redis | Bloom Filter |
|---|---|---|---|
| P50 Latency | 23.4 ms | 2.8 ms | 0.08 ms |
| P95 Latency | 45.2 ms | 8.1 ms | 0.31 ms |
| P99 Latency | 67.8 ms | 14.5 ms | 2.1 ms |
| Throughput | 850 req/s | 3,500 req/s | 12,000 req/s |
| DB Queries (per 100K) | 100,000 | 0 | 5,127 |
| Memory Usage | Minimal | 487 MB | 9.6 MB |
| Accuracy | 100% | 100% | 99.87% |

Key Findings

1. Latency Improvement: 300x Faster

Bloom filters achieved a P50 latency of 0.08ms compared to PostgreSQL's 23.4ms, a roughly 290x speedup. The gain comes from eliminating network overhead entirely: the check happens in-process.

2. Database Query Reduction: 95%

Out of 100,000 requests, the Bloom filter passed only 5,127 (5.13%) through to the database for verification. The remaining 94.87% returned instantly with zero database load.

3. Memory Efficiency: 50x Less

Redis required 487MB to store 10 million usernames. The Bloom filter used just 9.6MB for the same dataset - a 98% memory reduction.

4. Cost Implications

At $0.000001 per database query:

  • PostgreSQL: $86.40/day for 86.4M queries
  • Bloom Filter: $4.32/day for 4.32M queries (5% fallback)
  • Savings: $2,462/month at 1000 req/s sustained load
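The arithmetic behind those numbers, using the assumed $0.000001-per-query rate:

```javascript
// Back-of-envelope cost model from the figures above.
const COST_PER_QUERY = 0.000001;     // assumed rate from the post
const queriesPerDay = 1000 * 86_400; // 1000 req/s sustained → 86.4M/day

const pgCostPerDay = queriesPerDay * COST_PER_QUERY;           // every request hits the DB
const bloomCostPerDay = queriesPerDay * 0.05 * COST_PER_QUERY; // ~5% fallback rate

console.log(pgCostPerDay.toFixed(2));    // "86.40"
console.log(bloomCostPerDay.toFixed(2)); // "4.32"
console.log(((pgCostPerDay - bloomCostPerDay) * 30).toFixed(0)); // "2462" per month
```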

Understanding Bloom Filters

A Bloom filter is a probabilistic data structure that tests set membership with two critical properties:

No False Negatives: If the filter says an element doesn't exist, it's 100% correct.

Possible False Positives: If the filter says an element might exist, you must verify with the source of truth.

How It Works

When adding a username to the filter:

"john_doe" → hash1(x), hash2(x), hash3(x) → set bits at positions [142, 891, 1523]

When checking if a username exists:

If ANY bit is 0 → Definitely NOT exists (return immediately)
If ALL bits are 1 → MAYBE exists (check database)
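The add/check mechanics above can be demonstrated with a toy implementation. This is an illustrative sketch, not the `bloom-filters` library used in the benchmark; it derives k bit positions per key via double hashing over two seeded FNV-1a hashes:

```javascript
// Minimal Bloom filter sketch: k bit positions per key via h1 + i*h2.
class ToyBloomFilter {
  constructor(bits = 1024, hashes = 3) {
    this.bits = bits;
    this.hashes = hashes;
    this.array = new Uint8Array(bits); // one byte per bit, for clarity
  }
  // Seeded FNV-1a, kept in the unsigned 32-bit range
  _hash(key, seed) {
    let h = 2166136261 ^ seed;
    for (let i = 0; i < key.length; i++) {
      h ^= key.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return h >>> 0;
  }
  _positions(key) {
    const h1 = this._hash(key, 0);
    const h2 = this._hash(key, 0x9e3779b9);
    const out = [];
    for (let i = 0; i < this.hashes; i++) out.push((h1 + i * h2) % this.bits);
    return out;
  }
  add(key) {
    for (const p of this._positions(key)) this.array[p] = 1;
  }
  has(key) {
    // ANY bit 0 → definitely not present; ALL bits 1 → maybe present
    return this._positions(key).every(p => this.array[p] === 1);
  }
}

const toy = new ToyBloomFilter();
toy.add('john_doe');
console.log(toy.has('john_doe')); // true — added keys are never missed
```

Added keys are always reported present (no false negatives); a key that was never added usually maps to at least one zero bit, but can occasionally collide on all k positions (a false positive).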

The false positive rate is configurable through bit array size and number of hash functions. Our implementation uses a 1% FPR, meaning 99% accuracy for "might exist" cases.
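Those two knobs follow standard formulas: for n items and target false positive rate p, the optimal bit count is m = -n*ln(p)/(ln 2)^2 and the hash count is k = (m/n)*ln 2. A quick sketch (note the raw-bit math gives roughly 12 MB for this configuration, the same order as the ~10 MB measured; real libraries may round sizes differently):

```javascript
// Standard Bloom filter sizing: m = -n*ln(p)/(ln 2)^2 bits, k = (m/n)*ln 2 hashes.
function bloomSizing(n, p) {
  const m = Math.ceil((-n * Math.log(p)) / (Math.LN2 * Math.LN2)); // total bits
  const k = Math.round((m / n) * Math.LN2);                        // hash functions
  return { bits: m, hashes: k, megabytes: m / 8 / 1e6 };
}

const s = bloomSizing(10_000_000, 0.01); // 10M usernames, 1% FPR
console.log(s.hashes);                   // 7 hash functions
console.log(s.megabytes.toFixed(1));     // ~12.0 MB of raw bits
```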

Bloom Filter Bit Array Visualization - Demonstrates how hash functions map usernames to bit positions and how membership checks work with visual representation of bit array operations

Production Considerations

When to Use Each Approach

Choose PostgreSQL when:

  • Request volume is low (<100 req/s)
  • Strong consistency is critical
  • Team lacks operational expertise for distributed systems
  • Simplicity outweighs performance optimization

Choose Redis when:

  • High traffic requires <5ms response times
  • 100% accuracy is non-negotiable
  • Budget supports 50MB+ memory per million items
  • Need distributed caching across multiple services

Choose Bloom Filters when:

  • Massive scale demands sub-millisecond response
  • Negative lookups dominate (90%+ checking new items)
  • Memory constraints exist
  • 1-5% false positive rate is acceptable
  • Minimizing database load is critical

Handling False Positives

False positives in username availability checks are operationally acceptable because:

  1. User experience remains identical (username shown as taken)
  2. Database verification occurs transparently
  3. No data corruption or incorrect state
  4. Performance benefit outweighs the rare false positive

// Bloom filter says "maybe exists"
if (bloomFilter.has(username)) {
  // Always verify with database
  const actuallyExists = await db.query('SELECT EXISTS...');

  if (!actuallyExists) {
    // False positive detected
    // User can still register - no negative impact
    metrics.increment('bloom_filter_false_positive');
  }
}

Data Synchronization

Maintain filter accuracy through:

// On new user registration
async function registerUser(username) {
  // 1. Insert into database
  await db.query('INSERT INTO users (username) VALUES ($1)', [username]);

  // 2. Update Bloom filter immediately
  bloomFilter.add(username);

  // 3. Persist filter periodically (every hour)
  if (shouldPersist()) {
    await saveBloomFilterToDisk(bloomFilter);
  }
}

Real-World Applications

Major technology platforms use Bloom filters for similar use cases:

GitHub: Checks user passwords against 10 billion leaked credentials in <1ms
Medium: Filters already-read articles from recommendation feeds
Google Chrome: Detects malicious URLs locally without server requests
Akamai CDN: Performs cache existence checks at edge nodes
Bitcoin: Validates transaction double-spending in UTXO sets

Implementation Guide

Building the Bloom Filter

const fs = require('fs');
const { BloomFilter } = require('bloom-filters');

// Create filter: 10M capacity, 1% false positive rate
const filter = BloomFilter.create(10000000, 0.01);

// Load existing usernames
const users = await db.query('SELECT username FROM users');
users.rows.forEach(user => filter.add(user.username));

// Persist to disk (note: _bits is a private field; recent versions of the
// bloom-filters library expose saveAsJSON()/fromJSON() for serialization)
const filterData = {
  size: filter.size,
  nbHashes: filter.nbHashes,
  bits: Array.from(filter._bits),
  metadata: { totalItems: users.rows.length, fpr: 0.01 }
};
fs.writeFileSync('bloom-filter.json', JSON.stringify(filterData));

Production API Implementation

app.post('/check-username', async (req, res) => {
  const { username } = req.body;
  const start = Date.now();

  // In-process Bloom filter check
  const mightExist = bloomFilter.has(username);

  if (!mightExist) {
    // 95% of requests return here
    return res.json({
      available: true,
      latency: Date.now() - start,
      source: 'bloom_filter'
    });
  }

  // 5% of requests verify with database
  const result = await db.query(
    'SELECT EXISTS(SELECT 1 FROM users WHERE username = $1)',
    [username]
  );

  res.json({
    available: !result.rows[0].exists,
    latency: Date.now() - start,
    source: 'database_fallback',
    falsePositive: !result.rows[0].exists
  });
});

Reproduction Instructions

Full source code and setup available on GitHub:

git clone https://github.com/builtbychikara/WhatIfSeries.git
cd WhatIfSeries/what-if/bloom-filter

# Install dependencies
npm install

# Start infrastructure (PostgreSQL, Redis, Prometheus, Grafana)
npm run docker:up

# Seed 10M usernames (takes ~15 minutes)
npm run seed

# Run services (3 separate terminals)
npm run start:postgres  # Terminal 1
npm run start:redis     # Terminal 2
npm run start:bloom     # Terminal 3

# Execute load tests
npm run experiment

# View results
open http://localhost:3100  # Grafana dashboard

Conclusion

Bloom filters provide a compelling solution for username availability checks at scale:

  • 300x latency improvement over direct database queries
  • 95% reduction in database load
  • 50x memory efficiency compared to Redis caching
  • Production-proven in systems like GitHub, Medium, and Google Chrome

The 1-5% false positive rate is an acceptable trade-off for the massive performance and cost benefits. For high-traffic applications where negative lookups dominate, Bloom filters are a battle-tested solution.

The complete proof-of-concept demonstrates production-grade implementation with comprehensive monitoring, load testing, and detailed metrics. All code is open source and ready to run.


GitHub Repository: https://github.com/builtbychikara/WhatIfSeries/tree/main/what-if/bloom-filter

Tech Stack: Node.js, PostgreSQL, Redis, Bloom Filters, k6, Prometheus, Grafana, Docker

Part of the What If Series - Production-grade POCs exploring system design decisions through empirical measurement.
