<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sandeep Mahanty</title>
    <description>The latest articles on DEV Community by Sandeep Mahanty (@mahantys).</description>
    <link>https://dev.to/mahantys</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1146008%2F655f5c72-540a-4cfe-a225-3562f72f294b.png</url>
      <title>DEV Community: Sandeep Mahanty</title>
      <link>https://dev.to/mahantys</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mahantys"/>
    <language>en</language>
    <item>
      <title>Bloom Filters: A probabilistic powerhouse</title>
      <dc:creator>Sandeep Mahanty</dc:creator>
      <pubDate>Fri, 18 Jul 2025 09:24:22 +0000</pubDate>
      <link>https://dev.to/mahantys/bloom-filters-a-probabilistic-powerhouse-f68</link>
      <guid>https://dev.to/mahantys/bloom-filters-a-probabilistic-powerhouse-f68</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;When dealing with large-scale data systems — especially where memory and speed matter — we often face a simple but important question:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;“Have I seen this item before?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You might reach for a &lt;code&gt;HashSet&lt;/code&gt; or database index — but what if memory is tight and occasional false positives are acceptable?&lt;br&gt;&lt;br&gt;
That’s where &lt;strong&gt;Bloom Filters&lt;/strong&gt; come in.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is a Bloom Filter?
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;Bloom Filter&lt;/strong&gt; is a &lt;em&gt;space-efficient probabilistic data structure&lt;/em&gt; used to test &lt;strong&gt;whether an element is a member of a set&lt;/strong&gt;. It can return:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Definitely not in the set&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Possibly in the set&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The trade-off? It &lt;strong&gt;can return false positives&lt;/strong&gt;, but &lt;strong&gt;never false negatives&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  How Does It Work?
&lt;/h3&gt;

&lt;p&gt;A Bloom filter is essentially:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;bit array&lt;/strong&gt; of size &lt;code&gt;m&lt;/code&gt;, initialized to all 0s.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;k&lt;/code&gt; different &lt;strong&gt;hash functions&lt;/strong&gt; that map input to positions in the bit array.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Insertion
&lt;/h4&gt;

&lt;p&gt;To insert an element:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Hash the element with all &lt;code&gt;k&lt;/code&gt; hash functions.&lt;/li&gt;
&lt;li&gt;Set each corresponding bit to 1.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Lookup
&lt;/h4&gt;

&lt;p&gt;To check membership:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Hash the element with the same &lt;code&gt;k&lt;/code&gt; functions.&lt;/li&gt;
&lt;li&gt;If &lt;em&gt;any&lt;/em&gt; of the corresponding bits are 0 → &lt;strong&gt;definitely not in the set&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;If &lt;em&gt;all&lt;/em&gt; are 1 → &lt;strong&gt;possibly in the set&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  When Should You Use a Bloom Filter?
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Use cases:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Caching layers (check if it's worth looking up an item)&lt;/li&gt;
&lt;li&gt;Distributed systems (avoid expensive queries)&lt;/li&gt;
&lt;li&gt;Email spam detection&lt;/li&gt;
&lt;li&gt;Web crawler visited URLs&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Ideal when:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;You don’t need to delete items.&lt;/li&gt;
&lt;li&gt;You can tolerate false positives.&lt;/li&gt;
&lt;li&gt;Memory usage matters more than perfect accuracy.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Real world uses of Bloom Filter
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cassandra&lt;/strong&gt; the distributed highly available and high write throughput NoSQL database employs bloom filters extensively in its read path. Cassandra keeps data flushed into SSTables (String Sorted Tables) on disk to quickly read data. Bloom filters help here by efficiently determining if the data being requested exists in a particular SSTable file. &lt;a href="https://cassandra.apache.org/doc/5.0/cassandra/managing/operating/bloom_filters.html" rel="noopener noreferrer"&gt;Bloom Filters&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Similarly Apache HBase and ScyllaDB are also example of some databases which use bloom filters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CDNs&lt;/strong&gt; like Akamai leverage bloom filters to quickly check if an object is present in the cache or not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bloom Filter Implementation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BloomFilter&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;BitSet&lt;/span&gt; &lt;span class="n"&gt;bitset&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;bitSize&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;hashCount&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;BloomFilter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;bitSize&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;hashCount&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;bitSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bitSize&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;hashCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashCount&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;bitset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;BitSet&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bitSize&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;hashCount&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getHash&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="n"&gt;bitset&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;set&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;abs&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;bitSize&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="nf"&gt;mightContain&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;hashCount&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getHash&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;bitset&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;abs&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;bitSize&lt;/span&gt;&lt;span class="o"&gt;)))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;getHash&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;hashCode&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mh"&gt;0x5bd1e995&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Final Thoughts
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Bloom filters are a powerful addition to any backend engineer’s toolkit, especially when working at scale where memory efficiency is critical. This post aimed to offer a simplified introduction to the concept.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you're interested in a more in-depth explanation—along with an interactive visualization—you can check out a post I wrote on the topic:&lt;br&gt;
&lt;a href="https://codejumbl.com/posts/bloom-filters" rel="noopener noreferrer"&gt;https://codejumbl.com/posts/bloom-filters&lt;/a&gt;&lt;/p&gt;

</description>
      <category>datastructures</category>
      <category>webdev</category>
      <category>algorithms</category>
    </item>
  </channel>
</rss>
