João Godinho

Heap Data Structure

Introduction

  • This post covers the binary heap as a complete binary tree stored in an array, where each parent follows a priority rule (min-heap or max-heap). It explains the core operations push (heapifyUp) and pop (heapifyDown), the heap property, and provides an implementation.
  • It also covers the trade-offs of using heaps as priority queues, their real use in greedy algorithms such as Dijkstra’s algorithm and Huffman coding, O(n) in-place heap construction, heapsort, and a discussion of introsort.

Binary Heap

  • A binary heap is a special case of a heap with the following properties:
    1. It is a complete binary tree (every level is full except the last, which is filled from left to right).
    2. Being a complete binary tree implies:
      • If a node is at index i:
      • Left child = 2*i + 1
      • Right child = 2*i + 2
      • Parent = floor((i - 1) / 2) (if not root)
    3. Heap property (what differentiates heaps):
      • The parent has priority at least as high as its children
      • Example: min-heap (highest priority = smallest value)
      • Parent ≤ both children
  • As you can see, a parent never has lower priority than its children.
  • This implies that the root (index 0, the first node) always has the highest priority.
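  • To make the index formulas and the heap property concrete, here is a minimal sketch (not from the original post) that checks whether an array is a valid min-heap:

    // Minimal sketch: validate the min-heap property using the index formulas above
    function isMinHeap(items) {
      for (let i = 0; i < items.length; i++) {
        const left = 2 * i + 1;   // left child index
        const right = 2 * i + 2;  // right child index
        if (left < items.length && items[i] > items[left]) { return false; }
        if (right < items.length && items[i] > items[right]) { return false; }
      }
      return true;
    }

    console.log(isMinHeap([1, 4, 2, 5, 8, 3])); // true
    console.log(isMinHeap([5, 1, 2]));          // false -> parent 5 > child 1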

K-ary Heap

  • We could also have a k-ary heap; the properties stay the same, only the index formulas change:
    • If a node is at index i:
    • first child = k*i + 1
    • second child = k*i + 2
    • k-th child = k*i + k
    • Parent = floor((i - 1) / k) (if not root)
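  • As a small illustration (my own helpers, not from the post), the same formulas in JavaScript:

    // Index helpers for a k-ary heap stored in an array (0-based indices)
    const kAryChild = (i, j, k) => k * i + j;              // j-th child of node i, 1 <= j <= k
    const kAryParent = (i, k) => Math.floor((i - 1) / k);  // parent of node i (i > 0)

    // Example: in a 3-ary heap, node 1 has children 4, 5 and 6, and node 4's parent is 1
    console.log(kAryChild(1, 1, 3), kAryChild(1, 2, 3), kAryChild(1, 3, 3)); // 4 5 6
    console.log(kAryParent(4, 3)); // 1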

Pseudocode for Binary Heap → MinHeap

  • We store the heap items in an array: items = []
  • 1. PUSH inserts a new node
    • a. Insert at the end of the array: items.push(newItem), addedItemIndex = items.length - 1
    • b. Now, if we have more than one element, we have to check that the heap property still holds:
      • i. parent ≤ child
      • ii. if true → valid, do nothing
      • iii. if false → invalid, swap(parent, child)
      • iv. repeat until the heap is valid again.
    • We generally call this fixing step (b) heapifyUp.
    • heapifyUp because we start at the bottom (after adding at the end of the heap) and fix the heap bottom-to-top, comparing the node with its parent.
  • 2. POP removes and returns the root (the highest priority element)
    • a. Remove the root node and put the last item in its place.
    • b. Now, if we have more than one element, we have to check that the heap property still holds:
      • i. parent ≤ both children
      • ii. if true → valid, do nothing
      • iii. if false → invalid, swap(parent, smallestChild)
      • iv. repeat until the heap is valid again.
    • Here we call it heapifyDown, since we start at the root and go down, comparing the parent with its children.

Binary Heap Code

class BinaryHeap {
  #items = [];
  #hasHigherPriorityFn;

  constructor({ items, hasHigherPriorityFn }) {
    if (!(items && items.length)) { throw Error("'items' array is required") }
    // no static types in JS, so we trust the hasHigherPriorityFn function and the items array -> the focus here is the binary heap
    if (!hasHigherPriorityFn || typeof hasHigherPriorityFn !== "function") { throw Error("'hasHigherPriorityFn' function is required") }
    this.#hasHigherPriorityFn = hasHigherPriorityFn;

    this.#items = items;
    this.#buildHeapInPlace(this.#items);
  }

  #swap(addedIndex, parentIndex) {
    const temp = this.#items[addedIndex];
    this.#items[addedIndex] = this.#items[parentIndex];
    this.#items[parentIndex] = temp;
  }

  // time: O(log n), space O(log n) -> recursion, iterative is O(1) space
  #heapifyUp({ insertedItemCurrIndex }) {
    const parentIndex = Math.floor((insertedItemCurrIndex - 1) / 2);
    if (parentIndex < 0 || insertedItemCurrIndex == parentIndex) {
      return;
    }

    const insertedItem = this.#items[insertedItemCurrIndex];
    const shouldSwap = this.#hasHigherPriorityFn(insertedItem, this.#items[parentIndex]);
    if (shouldSwap) {
      this.#swap(insertedItemCurrIndex, parentIndex);
      this.#heapifyUp({ insertedItemCurrIndex: parentIndex });
    }
  }

  // time: O(log n), space O(log n) -> recursion, iterative is O(1) space
  #heapifyDown({ currItemPosition }) {
    let mostPriorityItemIndex = currItemPosition;
    const leftChildIndex = 2 * currItemPosition + 1;
    const rightChildIndex = 2 * currItemPosition + 2;

    const leftChildExists = leftChildIndex < this.#items.length;
    if (leftChildExists && this.#hasHigherPriorityFn(this.#items[leftChildIndex], this.#items[mostPriorityItemIndex])) {
      mostPriorityItemIndex = leftChildIndex;
    }

    const rightChildExists = rightChildIndex < this.#items.length;
    if (rightChildExists && this.#hasHigherPriorityFn(this.#items[rightChildIndex], this.#items[mostPriorityItemIndex])) {
      mostPriorityItemIndex = rightChildIndex;
    }

    // currItem = parent (start at index 0)
    // if parent already has the highest priority, do nothing
    // otherwise swap with highest priority child until heap is valid again
    if (mostPriorityItemIndex != currItemPosition) {
      this.#swap(currItemPosition, mostPriorityItemIndex);
      this.#heapifyDown({ currItemPosition: mostPriorityItemIndex });
    }
  }

  // O(n) because most heapifyDown calls run on small-height nodes.
  #buildHeapInPlace(arr) {
    const arrSize = arr.length;
    for (let i = Math.floor(arrSize / 2) - 1; i >= 0; i--) {
      this.#heapifyDown({ currItemPosition: i });
    }
  }

  // O(log n)
  push({ item }) {
    this.#items.push(item);
    if (this.#items.length >= 2) {
      this.#heapifyUp({ insertedItemCurrIndex: this.#items.length - 1 });
    }
  }

  // O(log n)
  pop() {
    const currLength = this.#items.length;
    if (currLength == 0) { return null; }

    const itemToReturn = this.#items[0];
    this.#items[0] = this.#items[currLength - 1]; // last item is new root
    this.#items.pop(); // remove last item from array

    if (currLength > 1) {
      this.#heapifyDown({ currItemPosition: 0 }); // fix top down
    }

    return itemToReturn;
  }

  // O(1)
  peek() {
    if (!this.#items.length) { return null; }
    return this.#items[0];
  }

  printItems() {
    console.log('this.#items', this.#items);
  }

  consumeAll(callback) {
    console.log(this.constructor.name, 'consumeAll:')
    console.log('this.#hasHigherPriorityFn.toString()', this.#hasHigherPriorityFn.toString())
    while (this.#items.length) {
      callback(this.pop());
    }
  }
}

// smallest item at root index = 0
class MinHeap extends BinaryHeap {
  constructor({ items }) {
    super({ items, hasHigherPriorityFn: (itemA, itemB) => itemA < itemB });
  }
}

// biggest item at root index = 0
class MaxHeap extends BinaryHeap {
  constructor({ items }) {
    super({ items, hasHigherPriorityFn: (itemA, itemB) => itemA > itemB });
  }
}

console.log('--------------------------------------------------------')

const minHeapTest = new MinHeap({ items: [8, 7, 6, 5, 4, 3, 2, 1] });
minHeapTest.printItems();
minHeapTest.consumeAll(console.log)

console.log('--------------------------------------------------------')

const maxHeap = new MaxHeap({ items: [1, 2, 3, 4, 5, 6, 7, 8] });
maxHeap.printItems();
maxHeap.consumeAll(console.log)

console.log('--------------------------------------------------------')

const people = [{ name: "João", age: 22 }, { name: "Turing", age: 41 }, { name: "Ada Lovelace", age: 36 }]
const elderlyPeopleFirstHeap = new BinaryHeap({ items: people, hasHigherPriorityFn: (a, b) => a.age > b.age });
elderlyPeopleFirstHeap.consumeAll(console.log)

console.log('--------------------------------------------------------')
  • OUTPUT:

code output screenshot

Time Complexity

Operation   Time complexity
Push        O(log n)
Peek        O(1)
Pop         O(log n)
  • O(log n) → because a complete binary tree satisfies:
    • 2^height ~ n
    • height = log n
    • Every time we fix the heap we move one level per swap, so in the worst case a node travels from the bottom (height = 0) to the top (height = log n), which costs O(log n).
      • Or the opposite direction, from top to bottom.
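  • For intuition, plugging in a concrete n: 2^h ≈ n ⇒ h ≈ log2(n); for n = 1,000,000 items, h ≈ 20, so a single push or pop performs at most about 20 swaps.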

Use cases and trade-offs

  • Commonly used to implement priority queues. Priority queues can also be implemented using other data structures such as arrays.
    • But with a sorted array, for example, you would have O(1) pop and O(n) insert, since inserting a new item in the correct position may require shifting all items in the worst case.
  • To implement a priority queue
    • Use a sorted array (or a queue filled in priority order) if you do all the inserts up front and then only perform reads.
    • Use a binary heap or k-ary heap if you require faster inserts O(log n).
  • Widely used in greedy algorithms that extract the highest priority item at each step from a priority queue, e.g. Dijkstra’s algorithm and Huffman coding (see the sketch below).
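  • A tiny illustration of that greedy pattern (my sketch, reusing the MinHeap class defined above; the task costs are made up):

    // Greedy pattern: always process the cheapest remaining task first
    const tasks = new MinHeap({ items: [42, 7, 19, 3] });

    let cost;
    while ((cost = tasks.pop()) !== null) {
      console.log('processing task with cost', cost); // 3, 7, 19, 42
    }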

In-place O(n) heap construction instead of O(n * log n)

  • Github Repository

    // theory: all items after n/2 are leaves: A[n/2+1, ..., n]
    // practice: as indexes start at 0, n/2 is already a leaf
    // so we start at (n/2)-1 until 0
        // --> from height 1 until root
    
    function buildHeapInPlace(arr) {
      const arrSize = arr.length;
      for (let i = Math.floor(arrSize / 2) - 1; i >= 0; i--) {
        heapifyDown({ items: arr, currItemPosition: i });
      }
    }
    
  • How do we know that items after n/2 are leaves?

    • 1. We know that leaves have no children
      • a. using 1-indexed math (as in the "theory" comment above): left child = 2*i ; right child = 2*i + 1
    • 2. If node i has no children, then 2*i > n: the left child index falls outside the array bound n, so there are no children
      • a. From 2*i > n we get i > n/2, so all items A[(n/2)+1, …, n] are leaves
  • Why do we start from height=1 and not from leaves at height = 0?

    • index = floor(n/2) - 1 (0-indexed) is the first non-leaf node when scanning from the end of the array, and it sits at height 1;
    • Because leaves (h=0) are already valid heaps (don't require fix), and heapifyDown only works to fix a parent with valid heaps as children.
    • In a heap, each operation fixes one node at a time.
  • Why do we go until height=log n ?

    • height = 1 → first non-leaf node
    • height = log n → root
    • We fix the heap bottom-up.
    • When we reach height = log n, we have fixed the root, so the entire tree is a valid heap.
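  • A concrete example of the loop bounds (my own numbers): with n = 7 (indices 0..6), Math.floor(7 / 2) - 1 = 2, so buildHeapInPlace runs heapifyDown on indices 2, 1 and 0, while indices 3..6 are leaves and are skipped.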

Why is it O(n) and not O(n * log n)?

  • Both heapifyDown and heapifyUp have complexity of O(log n)
  • heapifyDown(node): max number of required movements = distance from node to the bottom of the tree (expensive if nodes are at the top)
  • heapifyUp(node): max number of required movements = distance from node to the top of the tree (expensive if nodes are at the bottom)
  • Given an array, what would you prefer? Start building a heap from top or bottom?
    • Starting at top: use heapifyUp() in all nodes
    • Starting at bottom: use heapifyDown() in all nodes

Building the sum for each approach to analyze it better

  • At height=0 we have n/2 nodes, at height=1 n/4, and so on…

heap height visualization log n

  • moves per level = (nodes at that height) * (max movements each node can do)
  • Starting at top with heapifyUp(), the sum of movements/operations would be:
    • tree height = h = log n
    • (n/2 * h) + (n/4 * (h-1)) + (n/8 * (h-2)) + ... + (1 * 0)
  • But starting at bottom with heapifyDown(), the sum of operations is:
    • (n/2 * 0) + (n/4 * 1) + (n/8 * 2) + ... + (1 * h)
  • Compare the two sums term by term and you will notice, for example, that:
    • n/2 * log n > n/2 * 0
    • Starting at the top requires far more movements at every level.
  • But we still haven't answered: why does starting at the bottom make it O(n)?
    • Basic calculus knowledge helps here, but I won't give a formal proof, only an intuitive explanation. k = current height

building heap sum series
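Roughly, the series illustrated above is the following (my reconstruction from the surrounding text, assuming a complete tree with n nodes, height h = log n, and k = height of a level):

    total moves ≈ sum for k = 0..h of [ (n / 2^(k+1)) * k ]
                ≤ n * sum for k = 0..∞ of [ k / 2^(k+1) ]
                = n * 1 → O(n)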

  • The sum over k doesn't depend on n: it converges to a constant.
  • So the total is n * O(1) → O(n)
  • In summary, many nodes do small work, and few nodes do large work. The balance gives a linear total O(n).
    • leaf nodes = 0 moves
    • level 1 (above leaves) = at most 1 move
    • root = at most log n moves
  • For more: watch starting at 30:00 - MIT OpenCourseWare - Lecture 4: Heaps and Heap Sort (Srini Devadas, 2011)

Use case for build heap in-place

  • If you already have an array, it is better to build the heap in-place using bottom-up heap construction than to insert the items one by one, with each insert triggering a heapifyUp.

    buildInPlaceMaxHeap(array); // O(n)
    
    // O(n * log n)
    for (const item of array) { // O(n) iterations
        existentMaxHeap.insert(item); // O(log n) each
    }
    
    // When to use for (item of arr) existentMaxHeap.insert(item)?
    // 1. you don't receive the entire array at once, just streaming single items
    // 2. you already have a big existent heap
        // let's say your heap contains 10k items
        // and you receive 10 items at once
        // it isn't a good idea to rebuild the entire heap again
    

Heapsort

  • I hope you can read that. Good luck.

handwritten notes about heapsort with my awkward handwriting

  • Just kidding, I'll write it down here:
  • 1. Build a MaxHeap in O(n) in place using an array.
    • a. The heap area starts from 0 to n-1. (the entire array is a heap)
  • 2. Swap the ROOT and the last item, then decrease the heap area by one.
    • a. The root was the biggest element (inside the heap area).
    • b. Now the biggest element is at the end (the correct place in the sorted array).
  • 3. Heapify Down to fix the ROOT.
  • 4. if (heapArea > 1) { return to step 2 } else { HALT }
    • a. In the handwritten note I put heapArea > 0, but when only one element remains it is already the smallest one, so we can stop earlier. (The paper is slightly off.)
  • I really suggest you watch the last minutes, starting at 48:00 - MIT OpenCourseWare - Lecture 4: Heaps and Heap Sort (2011)

Heapsort code

  • Time complexity: O(n * log n)
  • Space complexity: O(1) → iterative version
  • Github repository
function heapsort(items) { // O(n*log n)
  buildHeapInPlace(items);

  let heapAreaEnd = items.length - 1;

  // while runs n-1 times = O(n)
  while (heapAreaEnd > 0) {
    swap(items, 0, heapAreaEnd);
    --heapAreaEnd;
    // O(log n)
    iterativeHeapifyDown({ items, currItemPosition: 0, lastHeapIndex: heapAreaEnd });
    // now heapifyDown receives the lastHeapIndex, to change only heapArea
  }
  // return items; // not needed since heapsort is in place
}

const arr = [1, 2, 9, 4, 5, 6];
console.log('before: arr', arr) // [ 1, 2, 9, 4, 5, 6 ]
heapsort(arr)
console.log('after arr', arr) // [ 1, 2, 4, 5, 6, 9 ]

// simplifying: get biggest item (root) and put it at the end of heap area
// now the last item is in the correct position
// --> fix new root that breaks heap property, heapifyDown()
// repeat until all items are in the correct position
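The snippet above relies on swap, buildHeapInPlace, and iterativeHeapifyDown from the linked repository. A minimal sketch of what they could look like (my own version, assuming a max-heap of plain numbers; the repository code may differ):

function swap(items, a, b) {
  // swap two positions of the array in place
  const temp = items[a];
  items[a] = items[b];
  items[b] = temp;
}

function iterativeHeapifyDown({ items, currItemPosition, lastHeapIndex }) {
  // iterative heapifyDown restricted to the heap area [0, lastHeapIndex]
  let i = currItemPosition;
  while (true) {
    const left = 2 * i + 1;
    const right = 2 * i + 2;
    let largest = i;
    if (left <= lastHeapIndex && items[left] > items[largest]) { largest = left; }
    if (right <= lastHeapIndex && items[right] > items[largest]) { largest = right; }
    if (largest === i) { return; } // heap property holds, stop
    swap(items, i, largest);
    i = largest; // keep fixing down the tree
  }
}

function buildHeapInPlace(items) {
  // bottom-up O(n) max-heap construction over the whole array
  for (let i = Math.floor(items.length / 2) - 1; i >= 0; i--) {
    iterativeHeapifyDown({ items, currItemPosition: i, lastHeapIndex: items.length - 1 });
  }
}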

Heapsort is an unstable sort:

heapsort visual unstable sort explanation

Heapsort and Merge sort comparison

  • If heapsort has the same Big-O time complexity as merge sort, why is merge sort generally faster?
    • merge sort: time O(n*log n), space O(n)
    • Memory locality: heapsort jumps around memory (left child = 2*i+1, ...) while merge sort uses sequential memory, taking advantage of cache locality and CPU cache.
    • Comparisons and data movements: heapsort needs to repeatedly compute left and right child indices, compare the parent against both children (using the max-heap a > b property), and swap elements. These are all constant factors that Big-O notation hides, but in practice they matter.

EXTRA: Introsort and the use of heapsort

  • Quicksort is known for being one of the fastest non-hybrid comparison sorting algorithms.
  • But there is no one-size-fits-all in computer science. Some standard libraries, such as C++'s std::sort, mix multiple sorting algorithms into a hybrid that performs better overall.

    Algorithm   Time (Average)   Time (Worst)    Space (Average)   Space (Worst)
    Quicksort   O(n log n)       O(n^2)          O(log n)          O(n)
    Introsort   O(n log n)       O(n log n)      O(log n)          O(log n)
  • Introsort is a hybrid sorting algorithm that uses:

    • insertion sort for small arrays
    • quicksort until a maximum recursion stack depth is reached
    • heapsort after reaching the maximum recursion stack depth
      • Why not merge sort, since it is generally faster than heapsort? Because heapsort is in-place (no O(n) extra memory), so it can keep working on the partition quicksort has already produced without allocating anything.
  • Pseudocode:

    Introsort(A):
        depth_limit = 2 * floor(log2(size(A)))
        introsort(A, depth_limit)
    
    introsort(A, depth_limit):
        if size(A) <= 16: // small array -> insertion sort
            insertion_sort(A)
            return
    
        if depth_limit == 0: // reached recursion depth limit -> heapsort
            heapsort(A)
            return
    
        // otherwise: quicksort partition step
        pivot = partition(A) // using median-of-three
        introsort(left of pivot, depth_limit - 1)
        introsort(right of pivot, depth_limit - 1)
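
For reference, a compact runnable JavaScript version of the same idea (my own sketch, not the post's repository code; it uses a simple Lomuto partition with the last element as pivot instead of median-of-three):

function insertionSort(a, lo, hi) {
  // sort a[lo..hi] in place
  for (let i = lo + 1; i <= hi; i++) {
    const key = a[i];
    let j = i - 1;
    while (j >= lo && a[j] > key) { a[j + 1] = a[j]; j--; }
    a[j + 1] = key;
  }
}

function siftDown(a, lo, i, hi) {
  // max-heapify node i of the subarray a[lo..hi], whose heap root is at lo
  while (true) {
    const left = lo + 2 * (i - lo) + 1;
    const right = left + 1;
    let largest = i;
    if (left <= hi && a[left] > a[largest]) { largest = left; }
    if (right <= hi && a[right] > a[largest]) { largest = right; }
    if (largest === i) { return; }
    [a[i], a[largest]] = [a[largest], a[i]];
    i = largest;
  }
}

function heapsortRange(a, lo, hi) {
  // heapsort a[lo..hi] in place
  for (let i = lo + Math.floor((hi - lo + 1) / 2) - 1; i >= lo; i--) { siftDown(a, lo, i, hi); }
  for (let end = hi; end > lo; end--) {
    [a[lo], a[end]] = [a[end], a[lo]];
    siftDown(a, lo, lo, end - 1);
  }
}

function partition(a, lo, hi) {
  // Lomuto partition using the last element as pivot
  const pivot = a[hi];
  let i = lo;
  for (let j = lo; j < hi; j++) {
    if (a[j] < pivot) { [a[i], a[j]] = [a[j], a[i]]; i++; }
  }
  [a[i], a[hi]] = [a[hi], a[i]];
  return i;
}

function introsort(a, lo = 0, hi = a.length - 1, depthLimit = 2 * Math.floor(Math.log2(a.length || 1))) {
  if (hi - lo + 1 <= 16) { insertionSort(a, lo, hi); return; } // small range -> insertion sort
  if (depthLimit === 0) { heapsortRange(a, lo, hi); return; }  // too deep -> heapsort the range
  const p = partition(a, lo, hi);                              // otherwise -> quicksort step
  introsort(a, lo, p - 1, depthLimit - 1);
  introsort(a, p + 1, hi, depthLimit - 1);
}

const data = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0];
introsort(data);
console.log(data); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]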
    

References

  • MIT OpenCourseWare - Introduction to Algorithms (6.006, Fall 2011), Lecture 4: Heaps and Heap Sort, Srini Devadas.
