<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: aditya</title>
    <description>The latest articles on DEV Community by aditya (@aditya_anavkar).</description>
    <link>https://dev.to/aditya_anavkar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3497725%2Fab78aa09-ede2-420e-ac58-6d15d0717b24.png</url>
      <title>DEV Community: aditya</title>
      <link>https://dev.to/aditya_anavkar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aditya_anavkar"/>
    <language>en</language>
    <item>
      <title>This username is already taken! (Bloom filters)</title>
      <dc:creator>aditya</dc:creator>
      <pubDate>Fri, 12 Sep 2025 14:37:48 +0000</pubDate>
      <link>https://dev.to/aditya_anavkar/this-username-is-already-taken-bloom-filters-36ni</link>
      <guid>https://dev.to/aditya_anavkar/this-username-is-already-taken-bloom-filters-36ni</guid>
      <description>&lt;h1&gt;
  
  
  How Big Tech Checks If Your Username Is Taken—In Milliseconds
&lt;/h1&gt;

&lt;p&gt;You type a clever handle into a signup form and—bam—before you can blink you’re told it’s already in use.&lt;br&gt;
That one-second response feels trivial, but for companies with billions of accounts it’s anything but simple.&lt;/p&gt;

&lt;p&gt;A naïve database query like&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT … WHERE username = ?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;would crumble under the weight of global traffic.&lt;br&gt;
To keep things snappy, big platforms stack several techniques together, each tuned for speed and scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Lightning-Fast Memory Caches
&lt;/h2&gt;

&lt;p&gt;The first stop is almost always a high-speed, in-memory store such as Redis or Memcached.&lt;br&gt;
These systems keep a constantly refreshed list of usernames in memory so a check can be answered in microseconds.&lt;/p&gt;

&lt;p&gt;Think of it as a super-fast notepad: if the requested name is already listed, you get an immediate “taken” without ever touching the main database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsa37vmyhuji2slvo8o3g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsa37vmyhuji2slvo8o3g.png" alt="HashMap" width="660" height="545"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But memory is expensive and finite. You can’t realistically keep every name ever registered in one cache cluster.&lt;br&gt;
That’s why caching is just the front gate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Trees for Prefix Searches
&lt;/h2&gt;

&lt;p&gt;Features like suggesting alternatives or autocomplete need more than a simple yes/no.&lt;br&gt;
For that, engineers often turn to a prefix tree, or trie.&lt;/p&gt;

&lt;p&gt;Instead of storing every username as a single string, a trie breaks them into characters that share common branches.&lt;br&gt;
Checking a name takes time proportional only to its length, not the total number of users.&lt;br&gt;
It’s ideal for finding “all handles starting with alex_,” but large tries can swell in memory usage if there’s little overlap, so teams use compressed variants or limit their size.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3l5ip1nsc5me8z6ggoar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3l5ip1nsc5me8z6ggoar.png" alt="Trie" width="512" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: B+ Trees for Ordered Lookups
&lt;/h2&gt;

&lt;p&gt;When a system needs to find the “next” available username alphabetically or perform range queries, it relies on B+ trees—the workhorse index behind many relational and NoSQL databases.&lt;/p&gt;

&lt;p&gt;These structures keep data sorted so lookups happen in logarithmic time.&lt;br&gt;
Even with billions of records, the database can locate a single username in just a few memory or disk reads.&lt;br&gt;
At global scale, services like Google Cloud Spanner distribute these indexes across machines so this speed holds up worldwide.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F729j367hvh5y5qdywsmw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F729j367hvh5y5qdywsmw.png" alt="B+ Tree" width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Bloom Filters — The Unsung Superpower
&lt;/h2&gt;

&lt;p&gt;Before the cache or database even lifts a finger, Bloom filters step in to do something remarkable:&lt;br&gt;
they can tell you at lightning speed if a username is definitely not taken—without storing a single full username.&lt;/p&gt;

&lt;p&gt;Think of it as a microscopic security team made of bits.&lt;br&gt;
When a new name is added, a handful of hash functions flip a few specific bits in a giant bit array.&lt;br&gt;
Later, when you check a name, those same hashes point to the same bits.&lt;br&gt;
If even one bit is still zero, you get an immediate, rock-solid verdict: “Nope, that name isn’t in use.”&lt;/p&gt;

&lt;p&gt;Here’s the kicker: Bloom filters never give a false negative.&lt;br&gt;
If they say a name isn’t there, you can trust it completely.&lt;br&gt;
The only caveat is the occasional false positive—a cautious “maybe” that simply triggers a deeper check.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxsndwnz924qckztf9x1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxsndwnz924qckztf9x1.png" alt="Bloom filter" width="492" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the efficiency is mind-blowing.&lt;br&gt;
With careful tuning, around 1 GB of memory can represent a billion usernames—a fraction of the space you’d need to store the actual strings.&lt;br&gt;
For massive platforms, that’s like compressing an entire city’s phonebook into a thimble and still looking up names in microseconds.&lt;/p&gt;

&lt;p&gt;Bloom filters aren’t just a clever trick; they’re one of the quiet workhorses that make planet-scale systems feel instantaneous.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Load Balancers and Distributed Databases
&lt;/h2&gt;

&lt;p&gt;All these checks run across many machines.&lt;/p&gt;

&lt;p&gt;A global load balancer sends your request to the nearest data center, and a local balancer splits the work among application servers.&lt;br&gt;
Each server keeps a current Bloom filter in memory.&lt;br&gt;
If the filter can’t rule the name out, the server checks its in-memory cache.&lt;br&gt;
Only if the cache misses does it hit the underlying distributed database—for example Cassandra or DynamoDB—which spreads the data across hundreds or thousands of nodes.&lt;/p&gt;

&lt;p&gt;That final query is the source of truth, but thanks to the earlier layers it’s reached only when absolutely necessary.&lt;/p&gt;

&lt;p&gt;Next Time You See “Already Taken…”&lt;/p&gt;

&lt;p&gt;Remember the invisible choreography making that instant feedback possible:&lt;/p&gt;

&lt;p&gt;Bloom filters weed out obvious misses.&lt;/p&gt;

&lt;p&gt;Memory caches return recent hits in microseconds.&lt;/p&gt;

&lt;p&gt;Tries and B+ trees handle suggestions and ordered scans.&lt;/p&gt;

&lt;p&gt;Distributed databases provide the definitive answer.&lt;/p&gt;

&lt;p&gt;What looks like a simple pop-up message is really a carefully layered system that blends clever algorithms with large-scale infrastructure—all so you know, almost instantly, whether your dream username is still up for grabs.&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>performance</category>
      <category>systemdesign</category>
    </item>
  </channel>
</rss>
