<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yash Patel</title>
    <description>The latest articles on DEV Community by Yash Patel (@yash1402_).</description>
    <link>https://dev.to/yash1402_</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3130367%2Fe403852b-cfc4-491c-b6a1-ea70fed1f5ce.png</url>
      <title>DEV Community: Yash Patel</title>
      <link>https://dev.to/yash1402_</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yash1402_"/>
    <language>en</language>
    <item>
      <title>Boyer–Moore Majority Voting: Why It Works, Why It Fails, and Where It Actually Belongs</title>
      <dc:creator>Yash Patel</dc:creator>
      <pubDate>Sun, 04 Jan 2026 07:28:18 +0000</pubDate>
      <link>https://dev.to/yash1402_/boyer-moore-majority-voting-why-it-works-why-it-fails-and-where-it-actually-belongs-1ibn</link>
      <guid>https://dev.to/yash1402_/boyer-moore-majority-voting-why-it-works-why-it-fails-and-where-it-actually-belongs-1ibn</guid>
      <description>&lt;h2&gt;
  
  
  Boyer–Moore Majority Voting: From Exact Counting to Confidence-Based Signals
&lt;/h2&gt;

&lt;p&gt;The Boyer–Moore Majority Voting algorithm is often presented as a clever trick:&lt;br&gt;&lt;br&gt;
&lt;em&gt;find the majority element in O(1) space.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That statement is technically correct and practically misleading.&lt;/p&gt;

&lt;p&gt;This post documents the reasoning path I personally went through: starting from a&lt;br&gt;
straightforward HashMap-based solution, questioning where Boyer–Moore could ever be useful, and eventually arriving at the &lt;strong&gt;only defensible way (according to me)&lt;/strong&gt; to&lt;br&gt;
use this algorithm in real systems:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;as a &lt;em&gt;confidence-based dominance signal&lt;/em&gt;, not as an exact answer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This exploration started as a LeetCode problem and slowly turned into a system-design question &lt;strong&gt;(at which I am still a newbie at best)&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. The Majority Element Problem
&lt;/h2&gt;

&lt;p&gt;Given a sequence of elements, a &lt;strong&gt;majority element&lt;/strong&gt; is one that appears &lt;strong&gt;more than&lt;br&gt;
50%&lt;/strong&gt; of the time.&lt;/p&gt;

&lt;p&gt;Important constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There may or may not be a majority&lt;/li&gt;
&lt;li&gt;If a majority exists, it is unique&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This distinction matters far more than most explanations admit.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. HashMap-Based Counting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Algorithm
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Traverse the data&lt;/li&gt;
&lt;li&gt;Maintain a frequency map&lt;/li&gt;
&lt;li&gt;Count occurrences of each candidate&lt;/li&gt;
&lt;li&gt;Select the candidate with the highest count&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Complexity
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time:&lt;/strong&gt; O(n)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Space:&lt;/strong&gt; O(k), where k is the number of distinct elements&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Strengths
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Exact counts&lt;/li&gt;
&lt;li&gt;Arbitrary filtering (time ranges, tags, attributes) if counts are stored with appropriate structure (time buckets, filterable field buckets)&lt;/li&gt;
&lt;li&gt;No additional verification step required to check whether the candidate chosen from exact counts is really a majority&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Weaknesses
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Memory grows with the cardinality of candidates&lt;/li&gt;
&lt;li&gt;Unbounded memory requirement in streaming systems with large or unknown domains&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;HashMaps feel like the “correct” solution because they answer more questions.&lt;br&gt;
They are also the first thing that breaks under scale. Scale here refers to the domain of candidates which has to be high enough to make HashMaps the reason for Friday night debugging.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Boyer–Moore Majority Voting Algorithm
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Algorithm (High-Level)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Maintain a &lt;code&gt;candidate&lt;/code&gt; and a &lt;code&gt;counter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Increment counter when the current element matches the candidate&lt;/li&gt;
&lt;li&gt;Decrement counter otherwise&lt;/li&gt;
&lt;li&gt;Reset candidate to current element when counter is zero&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify the candidate in a second pass&lt;/strong&gt; by counting its occurrences&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Complexity
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time:&lt;/strong&gt; O(n)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Space:&lt;/strong&gt; O(1)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Guarantee
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;If a majority element exists, Boyer–Moore will return it as the final candidate.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What it &lt;strong&gt;does not&lt;/strong&gt; guarantee:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;That a majority exists&lt;/li&gt;
&lt;li&gt;That the returned candidate is a majority without verification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where most misunderstandings begin.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. The Mandatory Second Pass (and the First Objection)
&lt;/h2&gt;

&lt;p&gt;Because Boyer–Moore always produces a candidate &lt;strong&gt;AND&lt;/strong&gt; data does not mandatorily have a majority, &lt;strong&gt;verification is required&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Second pass:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Count occurrences of the candidate&lt;/li&gt;
&lt;li&gt;Check if it exceeds &lt;code&gt;n / 2&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this step:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The algorithm may return a false majority&lt;/li&gt;
&lt;li&gt;Correctness is not guaranteed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This leads to an obvious objection:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If I need a second traversal and persistent data anyway,&lt;br&gt;&lt;br&gt;
why not just use a HashMap?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This objection is valid and cannot be hand-waved away.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Stepping Back: What Kind of Systems Are We Even Talking About?
&lt;/h2&gt;

&lt;p&gt;Before choosing an algorithm, it helps to describe the &lt;strong&gt;problem domain&lt;/strong&gt;, not the tool.&lt;/p&gt;

&lt;p&gt;Consider systems that process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A continuous stream of events&lt;/li&gt;
&lt;li&gt;High-cardinality identifiers&lt;/li&gt;
&lt;li&gt;Potentially unbounded input&lt;/li&gt;
&lt;li&gt;Fixed memory budgets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitoring and alerting pipelines&lt;/li&gt;
&lt;li&gt;Event ingestion layers&lt;/li&gt;
&lt;li&gt;Abuse or anomaly detection systems&lt;/li&gt;
&lt;li&gt;Load-shedding and backpressure mechanisms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal in these systems is often &lt;strong&gt;not&lt;/strong&gt; to compute exact counts, but to answer questions like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Is something dominating the system right now?”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  6. A Concrete Use Case: Event Dominance Over Time
&lt;/h2&gt;

&lt;p&gt;Assume we process events in fixed windows (for example, hourly):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each window contains millions of events&lt;/li&gt;
&lt;li&gt;Event types are unbounded&lt;/li&gt;
&lt;li&gt;Memory usage must be stable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We want to detect:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;whether a particular event type is &lt;em&gt;dominating&lt;/em&gt; others over time&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At first glance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HashMaps solve this perfectly&lt;/li&gt;
&lt;li&gt;Until cardinality explodes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Boyer–Moore looks attractive because of constant space, but it feels unusable because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It always returns a candidate&lt;/li&gt;
&lt;li&gt;Per-window results are noisy&lt;/li&gt;
&lt;li&gt;False positives appear inevitable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the core tension.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Why Naïve Boyer–Moore Fails Here
&lt;/h2&gt;

&lt;p&gt;If we naïvely do the following:&lt;br&gt;
for each window:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run Boyer–Moore&lt;/li&gt;
&lt;li&gt;emit candidate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every window produces a candidate&lt;/li&gt;
&lt;li&gt;Random distributions produce noise&lt;/li&gt;
&lt;li&gt;Alert fatigue is guaranteed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Used this way, Boyer–Moore is indefensible.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. The Missing Insight: The Counter Is Not a Count
&lt;/h2&gt;

&lt;p&gt;The breakthrough realization that I had in my intuition is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The &lt;strong&gt;counter is not a frequency&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
It is a &lt;strong&gt;dominance margin&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Formally:&lt;br&gt;
counter ≈ (candidate's occurrence) − (#elements canceled against it)&lt;/p&gt;

&lt;p&gt;This value encodes &lt;strong&gt;how strongly the candidate resisted cancellation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That information is meaningful even without exact counts.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. Interpreting the Counter as Confidence
&lt;/h2&gt;

&lt;p&gt;Real systems do not treat the candidate as a fact.&lt;br&gt;
They treat the &lt;code&gt;(candidate, counter)&lt;/code&gt; pair as a &lt;strong&gt;signal&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Common techniques:&lt;/p&gt;

&lt;h3&gt;
  
  
  Normalize by Window Size
&lt;/h3&gt;

&lt;p&gt;confidence = counter / window_size&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random data → confidence near 0&lt;/li&gt;
&lt;li&gt;Dominant data → confidence grows&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Track Stability Across Windows
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Candidate flips frequently → low confidence&lt;/li&gt;
&lt;li&gt;Same candidate persists → higher confidence&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Observe Counter Trends
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Oscillating counter → noise&lt;/li&gt;
&lt;li&gt;Increasing counter → emerging dominance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Decisions are gated on &lt;strong&gt;persistence and growth&lt;/strong&gt;, not on existence.&lt;/p&gt;




&lt;h2&gt;
  
  
  10. Where This Actually Helps
&lt;/h2&gt;

&lt;p&gt;With this interpretation, Boyer–Moore becomes useful in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Streaming alerting systems&lt;/li&gt;
&lt;li&gt;Backpressure detection&lt;/li&gt;
&lt;li&gt;Anomaly or abuse signals&lt;/li&gt;
&lt;li&gt;Memory-constrained ingestion paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It remains a poor choice for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reporting&lt;/li&gt;
&lt;li&gt;Historical analysis&lt;/li&gt;
&lt;li&gt;Compliance or billing&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  11. A Brief Note on Misra–Gries
&lt;/h2&gt;

&lt;p&gt;Boyer–Moore can be seen as a special case of the &lt;strong&gt;Misra–Gries&lt;/strong&gt; algorithm with &lt;code&gt;k = 2&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Misra–Gries generalizes the idea:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tracks multiple candidates&lt;/li&gt;
&lt;li&gt;Still bounds memory&lt;/li&gt;
&lt;li&gt;Trades precision for coverage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a natural next step for deeper exploration.&lt;/p&gt;




&lt;h2&gt;
  
  
  12. Closing Thoughts (and Personal Context)
&lt;/h2&gt;

&lt;p&gt;This investigation started with a LeetCode problem:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://leetcode.com/problems/majority-element-ii/description/" rel="noopener noreferrer"&gt;https://leetcode.com/problems/majority-element-ii/description/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Solving the problem was easy.&lt;br&gt;
Understanding &lt;strong&gt;where the algorithm makes sense&lt;/strong&gt; was not.&lt;/p&gt;

&lt;p&gt;The only honest conclusion I could reach is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Boyer–Moore is not an exact counting algorithm.&lt;br&gt;&lt;br&gt;
It is a bounded-memory dominance signal.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Used as such, it is defensible.&lt;br&gt;&lt;br&gt;
Used as anything else, it is misleading.&lt;/p&gt;

&lt;p&gt;That distinction is the entire point of this post.&lt;/p&gt;

</description>
      <category>dsa</category>
      <category>systemdesign</category>
      <category>datastructures</category>
      <category>algorithms</category>
    </item>
  </channel>
</rss>
