Integer Sorting Via Value Space Partitioning

#algorithms #computerscience #programming

The Dutch National Flag Algorithm

Dijkstra's Dutch National Flag algorithm sorts an array of 0s, 1s, and 2s in a single linear pass. It is typically presented as a specialized trick and left at that. This is worth reconsidering, because the algorithm contains a general principle — one that, when extracted and applied recursively, yields a linear-time sorting algorithm for fixed-width integers.

The algorithm partitions an array into three contiguous regions — elements equal to 0, elements equal to 1, and elements equal to 2 — using three pointers and a single pass. Each element is routed to its region immediately and permanently via in-place swaps.

The pointer mechanics are standard. The more interesting question is why a single pass suffices at all.

The answer is foreknowledge. Before examining a single element, the algorithm knows the minimum (0), the maximum (2), and therefore the complete structure of the value space. There is no ambiguity about where any value belongs, so no element ever needs to be revisited.

That foreknowledge — not the three-way partition — is the operative idea.

SweepSort

If the value space can be known in advance, it can also be discovered. SweepSort replaces the hardcoded boundaries of DNF with dynamically computed ones and iterates.

void SweepSort(int* array, size_t n) {

    int max = INT_MIN, min = INT_MAX;

    for (size_t i = 0; i < n; i++) {

        max = Max(max, array[i]);
        min = Min(min, array[i]);
    }

    size_t low = 0, mid = 0, high = n;

    while (mid < high) {

        int new_min = INT_MAX;
        int new_max = INT_MIN;

        while (mid < high) {

            if (array[mid] == max) {

                --high;
                Swap(array + mid, array + high);
            }
            else if (array[mid] == min) {

                Swap(array + mid, array + low);
                ++low, ++mid;
            }
            else {

                new_min = Min(new_min, array[mid]);
                new_max = Max(new_max, array[mid]);

                ++mid;
            }
        }

        min = new_min;
        max = new_max;

        mid = low;
    }
}

An initial scan establishes the global minimum and maximum. Each subsequent pass runs a DNF sweep over the unsettled region: the current minimum is moved left, the current maximum is moved right, and the minimum and maximum of the remaining middle are accumulated at no additional cost. When the pass ends, the next pass's boundaries are already known.

After each pass, two distinct values are permanently placed and the unsettled region shrinks. The process repeats until the array is sorted.

Complexity. Let k be the number of distinct values in the array. Each pass eliminates exactly two, so ⌈k/2⌉ passes suffice, each linear. The total is O(k·n). When k is small relative to n the algorithm is fast; when all n values are distinct, k = n and the algorithm is O(n²).

SweepSort is not a practical general-purpose sort. Its purpose here is to show that the DNF idea is iterable, and that discovering the next pass's boundaries costs nothing if done during the current pass.

SweepSort2

SweepSort eliminates values from the extremes inward. SweepSort2 abandons that strategy and instead partitions the value space recursively, bisecting the range at each level rather than removing known extrema.

static void SweepSort2Core(int* array, size_t n, int min, int max) {

    if (min >= max) return;

    size_t low = 0, high = 0;

    int high_min = INT_MAX;
    int high_max = INT_MIN;

    int low_min = INT_MAX;
    int low_max = INT_MIN;

    int pivot = (max >= 0 && min < 0) ? -1 : min + (max - min) / 2;

    while (high < n) {

        if (array[high] > pivot) {

            high_min = Min(high_min, array[high]);
            high_max = Max(high_max, array[high]);
        }
        else {

            low_min = Min(low_min, array[high]);
            low_max = Max(low_max, array[high]);

            Swap(array + low, array + high);

            ++low;
        }

        ++high;
    }

    SweepSort2Core(array + low, n - low, high_min, high_max);
    SweepSort2Core(array, low, low_min, low_max);
}

void SweepSort2(int* array, size_t n) {
    SweepSort2Core(array, n, INT_MIN, INT_MAX);
}

Each call receives a segment of the array and the value range known to contain it. A pivot is computed at the midpoint of that range, and the segment is partitioned into elements ≤ pivot and elements > pivot. The minimum and maximum of each partition are tracked during the pass and forwarded directly to the recursive calls, so no additional scan is ever required.

Complexity. The recursion is driven entirely by the value range, not the data — the critical distinction from quicksort, whose depth is distribution-dependent. Here the depth is bounded by the number of times the range can be bisected: ⌈log₂ R⌉ for a range of width R. For 32-bit signed integers that is at most 32, and each level does O(n) work in aggregate. Total: O(32n) — linear in n for any 32-bit input, unconditionally.

Pivot formula. The midpoint computation deserves attention:

int pivot = (max >= 0 && min < 0) ? -1 : min + (max - min) / 2;

The second branch is the standard overflow-safe midpoint. The naive (min + max) / 2 overflows for large ranges; subtracting first keeps the intermediate within bounds for same-sign operands.

The first branch handles what the standard fix cannot. When the range straddles zero, max - min itself overflows — and this is not a marginal case, since the initial call to SweepSort2Core passes INT_MIN and INT_MAX explicitly. Pivoting at -1 partitions at the sign boundary and guarantees that no subsequent recursive call ever receives a cross-zero range, making the formula unconditionally safe for all remaining levels.

Conclusion

DNF works because the value space is fully known before the first comparison. SweepSort generalizes that observation by discovering the boundaries dynamically, one pass at a time. SweepSort2 takes the further step of recursing on the value space itself, reducing the depth from O(k) — a property of the data — to O(log₂ R) — a property of the integer type.

The result is an in-place sorting algorithm with O(32n) worst-case complexity for 32-bit integers, derived from first principles without bit manipulation, auxiliary allocation, or comparison-based reasoning.

DEV Community

Integer Sorting Via Value Space Partitioning

The Dutch National Flag Algorithm

SweepSort

SweepSort2

Conclusion

Top comments (0)