DEV Community

Shreyas
Shreyas

Posted on

I Built a Stable Sorting Algorithm That Beats Java's Dual-Pivot Quicksort

A few days ago I finished benchmarking something I've been building - a cache-aware, stable, histogram-based sorting algorithm I'm calling BusSort. The results surprised even me.

At 100 million elements, it runs ~2x faster than Java's Dual-Pivot Quicksort on random data - while being stable. Dual-Pivot QS is not.


The Problem With Quicksort at Scale

Quicksort-based algorithms partition elements with random writes across the entire array. At large scales this causes cache thrashing - elements are being written to memory locations all over the place, constantly missing L1 and L2 cache.

The larger the array, the worse it gets.


The Core Idea

Instead of scattering elements globally, BusSort processes data in L1 cache-sized chunks - 4096 integers (~16KB).

For each chunk, it does 4 passes:

  • PASS 1 - Scan left-to-right, compute bucket for each element, build a local histogram
  • PASS 2 - Compute local prefix sums (bucket positions within the chunk)
  • PASS 3 - Scatter into a local grouped buffer - because this buffer is L1-sized, all random writes stay in cache
  • PASS 4 - Copy each bucket's portion to its correct global position

With 128-way splitting, recursion depth stays at just ~4 levels even for 100M elements.

Base case: Insertion Sort for ≤ 1024 elements.

On the benchmark machine (i5-1135G7, 48KB L1 data cache):

4096 × 3 × 4 bytes = 49,152 bytes ≈ 48KB
Enter fullscreen mode Exit fullscreen mode

The three working arrays fit exactly in L1. Not a coincidence.


Benchmark Results

Tested against Arrays.sort(int[]) - Java's Dual-Pivot Quicksort.

n = 100,000,000 | Java 17 | i5-1135G7 @ 2.40GHz

Input Type BusSort Dual-Pivot QS Ratio
Random 3991ms 8604ms ~2x
Sorted 57ms 104ms ~2x
Reverse 280ms 166ms 0.6x
Nearly Sorted 2452ms 2789ms ~1.1x
Duplicates 712ms 2242ms ~2.4x
Few Duplicates 1295ms 3185ms ~2.3x
All Same 51ms 32ms 0.6x
Clustered 1419ms 2242ms ~1.6x

Consistently faster on most input types. Stable. Zero comparison overhead.

The two losses (Reverse, All Same) are where Dual-Pivot QS has structural advantages - run detection and early termination.


Why It's Stable

BusSort preserves relative order of equal elements because:

  • Chunks are processed left-to-right
  • Within each chunk, elements are scattered left-to-right
  • globalNext[b] advances per chunk - earlier chunks always land before later chunks

This makes BusSort one of very few algorithms that is both faster than Dual-Pivot Quicksort and stable.


Trade-offs

  • Requires O(n) auxiliary space - standard for stable sorts
  • Slower than Dual-Pivot Quicksort on already-sorted and reverse-sorted input (improvement planned)
  • Currently supports int[] only - generic support coming soon

What's Next

  • Generic object support with Comparator<T>
  • Benchmark against TimSort on Integer[]
  • Parallel variant
  • JMH benchmarks for more rigorous measurement

GitHub

This is a work in progress. Theory, benchmarks, and current implementation are published.

👉 github.com/dev-shreyaspatil/BusSort


Curious what the Java community thinks - especially around the cache-chunking approach and whether there are similar published algorithms I should be comparing against.

Top comments (0)