João Godinho

Heap Data Structure

Introduction

  • This post covers the binary heap as a complete binary tree stored in an array, where each parent follows a priority rule (min-heap or max-heap). It explains the core operations push (heapifyUp) and pop (heapifyDown), the heap property, and provides an implementation.
  • It also covers the trade-offs of using heaps as priority queues, their real use in greedy algorithms such as Dijkstra’s algorithm and Huffman coding, O(n) in-place heap construction, heapsort, and a discussion of introsort.

Binary Heap

  • A binary heap is a special case of a heap with the following properties:
    1. It is a complete binary tree (every level is full except the last, which is filled from left to right).
    2. Being a complete binary tree implies:
      • If a node is at index i:
      • Left child = 2*i + 1
      • Right child = 2*i + 2
      • Parent = floor((i - 1) / 2) (if not root)
    3. Heap property (what differentiates heaps):
      • The parent has priority at least as high as its children
      • Example: min-heap (highest priority = smallest value)
      • Parent ≤ both children
  • As you can see, a parent never has lower priority than its children.
  • This implies that the root (index 0, the first node) always has the highest priority.
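  • To make the index formulas and the heap property concrete, here is a minimal sketch (not from the original post) that checks whether an array is a valid min-heap:

    // Minimal sketch: validate the min-heap property using the index formulas above
    function isMinHeap(items) {
      for (let i = 0; i < items.length; i++) {
        const left = 2 * i + 1;   // left child index
        const right = 2 * i + 2;  // right child index
        if (left < items.length && items[i] > items[left]) { return false; }
        if (right < items.length && items[i] > items[right]) { return false; }
      }
      return true;
    }

    console.log(isMinHeap([1, 4, 2, 5, 8, 3])); // true
    console.log(isMinHeap([5, 1, 2]));          // false -> parent 5 > child 1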

K-ary Heap

  • We could also have a k-ary heap; the properties stay the same, only the index formulas change:
    • If a node is at index i:
    • first child = k*i + 1
    • second child = k*i + 2
    • k-th child = k*i + k
    • Parent = floor((i - 1) / k) (if not root)
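  • As a small illustration (my own helpers, not from the post), the same formulas in JavaScript:

    // Index helpers for a k-ary heap stored in an array (0-based indices)
    const kAryChild = (i, j, k) => k * i + j;              // j-th child of node i, 1 <= j <= k
    const kAryParent = (i, k) => Math.floor((i - 1) / k);  // parent of node i (i > 0)

    // Example: in a 3-ary heap, node 1 has children 4, 5 and 6, and node 4's parent is 1
    console.log(kAryChild(1, 1, 3), kAryChild(1, 2, 3), kAryChild(1, 3, 3)); // 4 5 6
    console.log(kAryParent(4, 3)); // 1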

Pseudocode for Binary Heap → MinHeap

  • We store the heap items in an array: items = []
  • 1. PUSH inserts a new node
    • a. Insert at the end of the array: items.push(newItem), addedItemIndex = items.length - 1
    • b. Now, if we have more than one element, we have to check that the heap property still holds:
      • i. parent ≤ child
      • ii. if true → valid, do nothing
      • iii. if false → invalid, swap(parent, child)
      • iv. repeat until the heap is valid again.
    • We generally call this fixing step (b) heapifyUp.
    • heapifyUp because we start at the bottom (after adding at the end of the heap) and fix the heap bottom-to-top, comparing the node with its parent.
  • 2. POP removes and returns the root (the highest priority element)
    • a. Remove the root node and put the last item in its place.
    • b. Now, if we have more than one element, we have to check that the heap property still holds:
      • i. parent ≤ both children
      • ii. if true → valid, do nothing
      • iii. if false → invalid, swap(parent, smallestChild)
      • iv. repeat until the heap is valid again.
    • Here we call it heapifyDown, since we start at the root and go down, comparing the parent with its children.

Binary Heap Code

class BinaryHeap {
  #items = [];
  #hasHigherPriorityFn;

  constructor({ items, hasHigherPriorityFn }) {
    if (!(items && items.length)) { throw Error("'items' array is required") }
    // no static types in JS, so we trust the hasHigherPriorityFn function and the items array -> the focus here is the binary heap
    if (!hasHigherPriorityFn || typeof hasHigherPriorityFn !== "function") { throw Error("'hasHigherPriorityFn' function is required") }
    this.#hasHigherPriorityFn = hasHigherPriorityFn;

    this.#items = items;
    this.#buildHeapInPlace(this.#items);
  }

  #swap(addedIndex, parentIndex) {
    const temp = this.#items[addedIndex];
    this.#items[addedIndex] = this.#items[parentIndex];
    this.#items[parentIndex] = temp;
  }

  // time: O(log n), space O(log n) -> recursion, iterative is O(1) space
  #heapifyUp({ insertedItemCurrIndex }) {
    const parentIndex = Math.floor((insertedItemCurrIndex - 1) / 2);
    if (parentIndex < 0 || insertedItemCurrIndex == parentIndex) {
      return;
    }

    const insertedItem = this.#items[insertedItemCurrIndex];
    const shouldSwap = this.#hasHigherPriorityFn(insertedItem, this.#items[parentIndex]);
    if (shouldSwap) {
      this.#swap(insertedItemCurrIndex, parentIndex);
      this.#heapifyUp({ insertedItemCurrIndex: parentIndex });
    }
  }

  // time: O(log n), space O(log n) -> recursion, iterative is O(1) space
  #heapifyDown({ currItemPosition }) {
    let mostPriorityItemIndex = currItemPosition;
    const leftChildIndex = 2 * currItemPosition + 1;
    const rightChildIndex = 2 * currItemPosition + 2;

    const leftChildExists = leftChildIndex < this.#items.length;
    if (leftChildExists && this.#hasHigherPriorityFn(this.#items[leftChildIndex], this.#items[mostPriorityItemIndex])) {
      mostPriorityItemIndex = leftChildIndex;
    }

    const rightChildExists = rightChildIndex < this.#items.length;
    if (rightChildExists && this.#hasHigherPriorityFn(this.#items[rightChildIndex], this.#items[mostPriorityItemIndex])) {
      mostPriorityItemIndex = rightChildIndex;
    }

    // currItem = parent (start at index 0)
    // if parent already has the highest priority, do nothing
    // otherwise swap with highest priority child until heap is valid again
    if (mostPriorityItemIndex != currItemPosition) {
      this.#swap(currItemPosition, mostPriorityItemIndex);
      this.#heapifyDown({ currItemPosition: mostPriorityItemIndex });
    }
  }

  // O(n) because most heapifyDown calls run on small-height nodes.
  #buildHeapInPlace(arr) {
    const arrSize = arr.length;
    for (let i = Math.floor(arrSize / 2) - 1; i >= 0; i--) {
      this.#heapifyDown({ currItemPosition: i });
    }
  }

  // O(log n)
  push({ item }) {
    this.#items.push(item);
    if (this.#items.length >= 2) {
      this.#heapifyUp({ insertedItemCurrIndex: this.#items.length - 1 });
    }
  }

  // O(log n)
  pop() {
    const currLength = this.#items.length;
    if (currLength == 0) { return null; }

    const itemToReturn = this.#items[0];
    this.#items[0] = this.#items[currLength - 1]; // last item is new root
    this.#items.pop(); // remove last item from array

    if (currLength > 1) {
      this.#heapifyDown({ currItemPosition: 0 }); // fix top down
    }

    return itemToReturn;
  }

  // O(1)
  peek() {
    if (!this.#items.length) { return null; }
    return this.#items[0];
  }

  printItems() {
    console.log('this.#items', this.#items);
  }

  consumeAll(callback) {
    console.log(this.constructor.name, 'consumeAll:')
    console.log('this.#hasHigherPriorityFn.toString()', this.#hasHigherPriorityFn.toString())
    while (this.#items.length) {
      callback(this.pop());
    }
  }
}

// smallest item at root index = 0
class MinHeap extends BinaryHeap {
  constructor({ items }) {
    super({ items, hasHigherPriorityFn: (itemA, itemB) => itemA < itemB });
  }
}

// biggest item at root index = 0
class MaxHeap extends BinaryHeap {
  constructor({ items }) {
    super({ items, hasHigherPriorityFn: (itemA, itemB) => itemA > itemB });
  }
}

console.log('--------------------------------------------------------')

const minHeapTest = new MinHeap({ items: [8, 7, 6, 5, 4, 3, 2, 1] });
minHeapTest.printItems();
minHeapTest.consumeAll(console.log)

console.log('--------------------------------------------------------')

const maxHeap = new MaxHeap({ items: [1, 2, 3, 4, 5, 6, 7, 8] });
maxHeap.printItems();
maxHeap.consumeAll(console.log)

console.log('--------------------------------------------------------')

const people = [{ name: "João", age: 22 }, { name: "Turing", age: 41 }, { name: "Ada Lovelace", age: 36 }]
const elderlyPeopleFirstHeap = new BinaryHeap({ items: people, hasHigherPriorityFn: (a, b) => a.age > b.age });
elderlyPeopleFirstHeap.consumeAll(console.log)

console.log('--------------------------------------------------------')
  • OUTPUT:

code output screenshot

Time Complexity

Operation   Time complexity
Push        O(log n)
Peek        O(1)
Pop         O(log n)
  • O(log n) → because a complete binary tree satisfies:
    • 2^height ~ n
    • height = log n
    • Every time we fix the heap we move one level per swap, so in the worst case a node travels from the bottom (height = 0) to the top (height = log n), which costs O(log n).
      • Or the opposite direction, from top to bottom.
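  • For intuition, plugging in a concrete n: 2^h ≈ n ⇒ h ≈ log2(n); for n = 1,000,000 items, h ≈ 20, so a single push or pop performs at most about 20 swaps.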

Use cases and trade-offs

  • Commonly used to implement priority queues. Priority queues can also be implemented using other data structures such as arrays.
    • But with a sorted array, for example, you would have O(1) pop and O(n) insert, since inserting a new item in the correct position may require shifting all items in the worst case.
  • To implement a priority queue
    • Use a sorted array (or a queue filled in priority order) if you do all the inserts up front and then only perform reads.
    • Use a binary heap or k-ary heap if you require faster inserts O(log n).
  • Widely used in greedy algorithms that extract the highest priority item at each step from a priority queue, e.g. Dijkstra’s algorithm and Huffman coding (see the sketch below).
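  • A tiny illustration of that greedy pattern (my sketch, reusing the MinHeap class defined above; the task costs are made up):

    // Greedy pattern: always process the cheapest remaining task first
    const tasks = new MinHeap({ items: [42, 7, 19, 3] });

    let cost;
    while ((cost = tasks.pop()) !== null) {
      console.log('processing task with cost', cost); // 3, 7, 19, 42
    }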

In-place O(n) heap construction instead of O(n * log n)

  • Github Repository

    // theory: all items after n/2 are leaves: A[n/2+1, ..., n]
    // practice: as indexes start at 0, n/2 is already a leaf
    // so we start at (n/2)-1 until 0
        // --> from height 1 until root
    
    function buildHeapInPlace(arr) {
      const arrSize = arr.length;
      for (let i = Math.floor(arrSize / 2) - 1; i >= 0; i--) {
        heapifyDown({ items: arr, currItemPosition: i });
      }
    }
    
  • How do we know that items after n/2 are leaves?

    • 1. We know that leaves have no children
      • a. using 1-indexed math (as in the "theory" comment above): left child = 2*i ; right child = 2*i + 1
    • 2. If node i has no children, then 2*i > n: the left child index falls outside the array bound n, so there are no children
      • a. From 2*i > n we get i > n/2, so all items A[(n/2)+1, …, n] are leaves
  • Why do we start from height=1 and not from leaves at height = 0?

    • index = floor(n/2) - 1 (0-indexed) is the first non-leaf node when scanning from the end of the array, and it sits at height 1;
    • Because leaves (h=0) are already valid heaps (don't require fix), and heapifyDown only works to fix a parent with valid heaps as children.
    • In a heap, each operation fixes one node at a time.
  • Why do we go until height=log n ?

    • height = 1 → first non-leaf node
    • height = log n → root
    • We fix the heap bottom-up.
    • When we reach height = log n, we have fixed the root, so the entire tree is a valid heap.
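  • A concrete example of the loop bounds (my own numbers): with n = 7 (indices 0..6), Math.floor(7 / 2) - 1 = 2, so buildHeapInPlace runs heapifyDown on indices 2, 1 and 0, while indices 3..6 are leaves and are skipped.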

Why is it O(n) and not O(n * log n)?

  • Both heapifyDown and heapifyUp have complexity of O(log n)
  • heapifyDown(node): max number of required movements = distance from node to the bottom of the tree (expensive if nodes are at the top)
  • heapifyUp(node): max number of required movements = distance from node to the top of the tree (expensive if nodes are at the bottom)
  • Given an array, what would you prefer? Start building a heap from top or bottom?
    • Starting at top: use heapifyUp() in all nodes
    • Starting at bottom: use heapifyDown() in all nodes

Building the sum for each approach to analyze it better

  • At height=0 we have n/2 nodes, at height=1 n/4, and so on…

heap height visualization log n

  • moves per level = (nodes at that height) * (max movements each node can do)
  • Starting at top with heapifyUp(), the sum of movements/operations would be:
    • tree height = h = log n
    • (n/2 * h) + (n/4 * (h-1)) + (n/8 * (h-2)) + ... + (1 * 0)
  • But starting at bottom with heapifyDown(), the sum of operations is:
    • (n/2 * 0) + (n/4 * 1) + (n/8 * 2) + ... + (1 * h)
  • Compare the two sums term by term and you will notice, for example, that:
    • n/2 * log n > n/2 * 0
    • Starting at the top requires far more movements at every level.
  • But we still haven't answered: why does starting at the bottom make it O(n)?
    • Basic calculus knowledge helps here, but I won't give a formal proof, only an intuitive explanation. k = current height

building heap sum series
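Roughly, the series illustrated above is the following (my reconstruction from the surrounding text, assuming a complete tree with n nodes, height h = log n, and k = height of a level):

    total moves ≈ sum for k = 0..h of [ (n / 2^(k+1)) * k ]
                ≤ n * sum for k = 0..∞ of [ k / 2^(k+1) ]
                = n * 1 → O(n)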

  • The sum over k doesn't depend on n: it converges to a constant.
  • So the total is n * O(1) → O(n)
  • In summary, many nodes do small work, and few nodes do large work. The balance gives a linear total O(n).
    • leaf nodes = 0 moves
    • level 1 (above leaves) = at most 1 move
    • root = at most log n moves
  • For more: watch starting at 30:00 - MIT OpenCourseWare - Lecture 4: Heaps and Heap Sort (Srini Devadas, 2011)

Use case for build heap in-place

  • If you already have an array, it is better to build the heap in-place using bottom-up heap construction than to insert the items one by one, with each insert triggering a heapifyUp.

    buildInPlaceMaxHeap(array); // O(n)
    
    // O(n * log n)
    for (const item of array) { // O(n) iterations
        existentMaxHeap.insert(item); // O(log n) each
    }
    
    // When to use for (item of arr) existentMaxHeap.insert(item)?
    // 1. you don't receive the entire array at once, just streaming single items
    // 2. you already have a big existent heap
        // let's say your heap contains 10k items
        // and you receive 10 items at once
        // it isn't a good idea to rebuild the entire heap again
    

Heapsort

  • I hope you can read that. Good luck.

handwritten notes about heapsort with my awkward handwriting

  • Just kidding, I'll write it down here:
  • 1. Build a MaxHeap in O(n) in place using an array.
    • a. The heap area starts from 0 to n-1. (the entire array is a heap)
  • 2. Swap the ROOT and the last item, then decrease the heap area by one.
    • a. The root was the biggest element (inside the heap area).
    • b. Now the biggest element is at the end (the correct place in the sorted array).
  • 3. Heapify Down to fix the ROOT.
  • 4. if (heapArea > 1) { return to step 2 } else { HALT }
    • a. In the handwritten note I put heapArea > 0, but when only one element remains it is already the smallest one, so we can stop earlier. (The paper is slightly off.)
  • I really suggest you watch the last minutes, starting at 48:00 - MIT OpenCourseWare - Lecture 4: Heaps and Heap Sort (2011)

Heapsort code

  • Time complexity: O(n * log n)
  • Space complexity: O(1) → iterative version
  • Github repository
function heapsort(items) { // O(n*log n)
  buildHeapInPlace(items);

  let heapAreaEnd = items.length - 1;

  // while runs n-1 times = O(n)
  while (heapAreaEnd > 0) {
    swap(items, 0, heapAreaEnd);
    --heapAreaEnd;
    // O(log n)
    iterativeHeapifyDown({ items, currItemPosition: 0, lastHeapIndex: heapAreaEnd });
    // now heapifyDown receives the lastHeapIndex, to change only heapArea
  }
  // return items; // not needed since heapsort is in place
}

const arr = [1, 2, 9, 4, 5, 6];
console.log('before: arr', arr) // [ 1, 2, 9, 4, 5, 6 ]
heapsort(arr)
console.log('after arr', arr) // [ 1, 2, 4, 5, 6, 9 ]

// simplifying: get biggest item (root) and put it at the end of heap area
// now the last item is in the correct position
// --> fix new root that breaks heap property, heapifyDown()
// repeat until all items are in the correct position
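The snippet above relies on swap, buildHeapInPlace, and iterativeHeapifyDown from the linked repository. A minimal sketch of what they could look like (my own version, assuming a max-heap of plain numbers; the repository code may differ):

function swap(items, a, b) {
  // swap two positions of the array in place
  const temp = items[a];
  items[a] = items[b];
  items[b] = temp;
}

function iterativeHeapifyDown({ items, currItemPosition, lastHeapIndex }) {
  // iterative heapifyDown restricted to the heap area [0, lastHeapIndex]
  let i = currItemPosition;
  while (true) {
    const left = 2 * i + 1;
    const right = 2 * i + 2;
    let largest = i;
    if (left <= lastHeapIndex && items[left] > items[largest]) { largest = left; }
    if (right <= lastHeapIndex && items[right] > items[largest]) { largest = right; }
    if (largest === i) { return; } // heap property holds, stop
    swap(items, i, largest);
    i = largest; // keep fixing down the tree
  }
}

function buildHeapInPlace(items) {
  // bottom-up O(n) max-heap construction over the whole array
  for (let i = Math.floor(items.length / 2) - 1; i >= 0; i--) {
    iterativeHeapifyDown({ items, currItemPosition: i, lastHeapIndex: items.length - 1 });
  }
}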

Heapsort is an unstable sort:

heapsort visual unstable sort explanation

Heapsort and Merge sort comparison

  • If heapsort has the same Big-O time complexity as merge sort, why is merge sort generally faster?
    • merge sort: time O(n*log n), space O(n)
    • Memory locality: heapsort jumps around memory (left child = 2*i+1, ...) while merge sort uses sequential memory, taking advantage of cache locality and CPU cache.
    • Comparisons and data movements: heapsort needs to repeatedly compute left and right child indices, compare the parent against both children (using the max-heap a > b property), and swap elements. These are all constant factors that Big-O notation hides, but in practice they matter.

EXTRA: Introsort and the use of heapsort

  • Quicksort is known for being one of the fastest non-hybrid comparison sorting algorithms.
  • But there is no one-size-fits-all in computer science. Some standard libraries, such as C++'s std::sort, mix multiple sorting algorithms into a hybrid that performs better overall.

    Algorithm   Time (Average)   Time (Worst)    Space (Average)   Space (Worst)
    Quicksort   O(n log n)       O(n^2)          O(log n)          O(n)
    Introsort   O(n log n)       O(n log n)      O(log n)          O(log n)
  • Introsort is a hybrid sorting algorithm that uses:

    • insertion sort for small arrays
    • quicksort until a maximum recursion stack depth is reached
    • heapsort after reaching the maximum recursion stack depth
      • Why not merge sort, since it is generally faster than heapsort? Because heapsort is in-place (no O(n) extra memory), so it can keep working on the partition quicksort has already produced without allocating anything.
  • Pseudocode:

    Introsort(A):
        depth_limit = 2 * floor(log2(size(A)))
        introsort(A, depth_limit)
    
    introsort(A, depth_limit):
        if size(A) <= 16: // small array -> insertion sort
            insertion_sort(A)
            return
    
        if depth_limit == 0: // reached recursion depth limit -> heapsort
            heapsort(A)
            return
    
        // otherwise: quicksort partition step
        pivot = partition(A) // using median-of-three
        introsort(left of pivot, depth_limit - 1)
        introsort(right of pivot, depth_limit - 1)
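
For reference, a compact runnable JavaScript version of the same idea (my own sketch, not the post's repository code; it uses a simple Lomuto partition with the last element as pivot instead of median-of-three):

function insertionSort(a, lo, hi) {
  // sort a[lo..hi] in place
  for (let i = lo + 1; i <= hi; i++) {
    const key = a[i];
    let j = i - 1;
    while (j >= lo && a[j] > key) { a[j + 1] = a[j]; j--; }
    a[j + 1] = key;
  }
}

function siftDown(a, lo, i, hi) {
  // max-heapify node i of the subarray a[lo..hi], whose heap root is at lo
  while (true) {
    const left = lo + 2 * (i - lo) + 1;
    const right = left + 1;
    let largest = i;
    if (left <= hi && a[left] > a[largest]) { largest = left; }
    if (right <= hi && a[right] > a[largest]) { largest = right; }
    if (largest === i) { return; }
    [a[i], a[largest]] = [a[largest], a[i]];
    i = largest;
  }
}

function heapsortRange(a, lo, hi) {
  // heapsort a[lo..hi] in place
  for (let i = lo + Math.floor((hi - lo + 1) / 2) - 1; i >= lo; i--) { siftDown(a, lo, i, hi); }
  for (let end = hi; end > lo; end--) {
    [a[lo], a[end]] = [a[end], a[lo]];
    siftDown(a, lo, lo, end - 1);
  }
}

function partition(a, lo, hi) {
  // Lomuto partition using the last element as pivot
  const pivot = a[hi];
  let i = lo;
  for (let j = lo; j < hi; j++) {
    if (a[j] < pivot) { [a[i], a[j]] = [a[j], a[i]]; i++; }
  }
  [a[i], a[hi]] = [a[hi], a[i]];
  return i;
}

function introsort(a, lo = 0, hi = a.length - 1, depthLimit = 2 * Math.floor(Math.log2(a.length || 1))) {
  if (hi - lo + 1 <= 16) { insertionSort(a, lo, hi); return; } // small range -> insertion sort
  if (depthLimit === 0) { heapsortRange(a, lo, hi); return; }  // too deep -> heapsort the range
  const p = partition(a, lo, hi);                              // otherwise -> quicksort step
  introsort(a, lo, p - 1, depthLimit - 1);
  introsort(a, p + 1, hi, depthLimit - 1);
}

const data = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0];
introsort(data);
console.log(data); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]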
    

References

  • MIT OpenCourseWare - Introduction to Algorithms (6.006, Fall 2011), Lecture 4: Heaps and Heap Sort, Srini Devadas.
