krlz

Posted on Mar 22, 2023

Must read: Optimal Purely Functional Priority Queues

#scala #datascience #programming #beginners

Gerth Stølting Brodal is a Danish wizard in the world of computer science. He's a proper professor at Aarhus University, Denmark, known for being top of the class in algorithmic and data structure research. When it comes to designing algorithms for computational geometry, graph theory, and data compression, he's like a dark lord. He's covered his wall with top-notch articles in scientific journals and conference proceedings, and he's been given loads of awards and honours for his whizz-kidry in the field of computer science.

So basically, this report "Optimal Purely Functional Priority Queues" by Gerth Stølting Brodal talks about a really cool way to organize information called a "purely functional priority queue". It's a way to keep track of things and quickly find the most important ones. And get this, it's super fast! It has really fast access time, decent insertion time, and good deletion time (O(1) access time, O(log n) insertion time, and O(log n) deletion time), even when dealing with larger sets of data.

The way this works is by using something called a "treap", which is like a binary tree that can do two things at once: keep things in order and find the most important thing. They also use some fancy technology (a binary tree that maintains a heap invariant and a search tree invariant simultaneously) to make this structure fully functional, which basically means it's really reliable and won't get messed up by other parts of the program (The implementation uses techniques such as lazy evaluation and persistent data structures).

Refreshing our knowledge: What is a Binary tree?!

A binary tree is a hierarchical data structure in computer science where each node has at most two children, referred to as the left child and right child. The left child is always less than the parent node, while the right child is always greater than the parent node.

Here's an example of a simple binary tree in Scala:



sealed trait BinaryTree[+A]
case object EmptyTree extends BinaryTree[Nothing]
case class Node[A](value: A, left: BinaryTree[A], right: BinaryTree[A]) extends BinaryTree[A]

object BinaryTree {
def insert[A](value: A, tree: BinaryTree[A])(implicit ordering: Ordering[A]): BinaryTree[A] = tree match {
case EmptyTree => Node(value, EmptyTree, EmptyTree)
case Node(v, l, r) if ordering.lt(value, v) => Node(v, insert(value, l), r)
case Node(v, l, r) => Node(v, l, insert(value, r))
}
}

This is a simple immutable binary tree that takes a type parameter "A" for the value of its nodes. The sealed trait and the case classes create an algebraic data type (ADT) that defines the tree structure. The insert method takes a value and a tree and returns a new tree that includes the new value. The Ordering typeclass is used to compare values and makes this tree suitable to contain elements that can be ordered.

Note that this is just a basic example, and there are many variations, extensions, and optimizations that can be added to binary trees in Scala or any other programming language.

Going to the text!

Here's a brief and a bit formal summary of the main ideas in the text:

A purely functional priority queue is a data structure that maintains a set of elements and provides efficient access to the element with the highest priority.
A treap is a binary tree that maintains a heap invariant and a search tree invariant simultaneously.
The implementation of the purely functional priority queue is based on a treap, and uses techniques such as lazy evaluation and persistent data structures to make the data structure purely functional.
The implementation has O(1) access time, O(log n) insertion time, and O(log n) deletion time in the worst case.

Now in simple context;

A "purely functional priority queue"? Basically, it's a way to organize a bunch of things and quickly find and access the most important one. Kinda like how a bouncer at a club picks out the VIPs from the crowd.
A "treap". Picture a binary tree that does two things at once: it keeps things organized, and it can find the most important thing quickly. It's like if your address book was both alphabetical and sorted by your favorite contacts.
The "lazy evaluation" and "persistent data structures". This means that even if you change or delete stuff in the structure, it'll still keep working as it should.
O(1) access time, O(log n) insertion time, and O(log n) deletion time? This is lightning-fast!.

The big Oh notation!

Okay, let me explain the big oh notation in simpler terms. It's a way computer scientists describe how long an algorithm will take to run depending on how much input it gets. Picture you're a chef trying to cook a meal: the more food you get, the longer it takes to cook it, right? Well, algorithms work the same way! The big oh notation uses fancy letters like "O(n)" and "O(n^2)" to explain how many steps the algorithm needs to take, depending on how much input it gets. Computer scientists use this notation to figure out how fast or slow an algorithm will be and to compare different algorithms to see which one is the most efficient.

If you're interested in learning more about algorithm complexities, there are many great books available that cover the topic in depth. Here are a few suggestions:

"Introduction to Algorithms" by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein: This is a well-known textbook on algorithms and data structures, and is widely used in university computer science courses. It covers foundational topics such as sorting algorithms, graph algorithms, and dynamic programming.
"Algorithm Design" by Jon Kleinberg and Éva Tardos: This book provides a thorough introduction to algorithm design, with an emphasis on mathematical rigor and practical applications. It covers topics such as greedy algorithms, network flow algorithms, and randomized algorithms.
"Computational Complexity: A Modern Approach" by Sanjeev Arora and Boaz Barak: This is a more advanced textbook that covers the theoretical foundations of computational complexity theory. It delves into topics such as NP-completeness, cryptography, and quantum computing.

All of these books are excellent resources for learning about algorithm complexity, and are sure to provide you with a solid foundation for understanding this core computer science concept.

Going back to the text!

In this case, O(1) access time means that accessing an element in the data structure takes constant time, regardless of the amount of data present. This is the best possible scenario, and means that it takes the same amount of time to access an element whether there are ten or ten million elements in the structure.

O(log n) insertion and deletion time refer to the efficiency of adding or removing an element from the data structure. In a data structure with a log(n) time complexity, as the amount of data n grows, the time it takes to perform the operation grows logarithmically.

So, O(log n) insertion time means that the time it takes to insert an element into the data structure grows more slowly than the number of elements in the structure, and O(log n) deletion time means that the time it takes to remove an element also grows more slowly than the number of elements in the structure.

Essentially, a data structure with these time complexities sounds really efficient, even with a lot of data.

EXAMPLE of a purely functional priority queue in Scala

Here's a simple example of how to use the purely functional priority queue implementation:



import scala.util.Random
import scala.collection.immutable.{TreeSet, Queue}

val emptyQueue: Queue[Int] = Queue.empty[Int]

val random = new Random()
val elements = Seq.fill(10000)(random.nextInt())

val queue = elements.foldLeft(emptyQueue)((q, e) => q.enqueue(e))
val sortedQueue = TreeSet.empty[Int] ++ queue

println(sortedQueue.head) // prints the highest-priority element

In this example, we first create an empty queue. We then generate 10,000 random integer elements, and add them to the queue one by one using the enqueue method.

We create a sorted set (TreeSet) from the elements in the queue, which allows us to retrieve the highest-priority element using the head method.

This example demonstrates the use of a purely functional priority queue implemented using a treap data structure for efficient access to the highest-priority element.

Conclusions

I gotta admit, the topic of "Optimal Purely Functional Priority Queues" by Gerth Stølting Brodal was touch and go for me, especially since I'm more of a software engineering person than a data scientist.

I'm currently enrolled in a full Scala specialization course by Odersky on Coursera, my decision to enroll in a course aimed at data scientists was primarily fueled by my intense curiosity. Although I come from a software engineering background, I've always been fascinated by the world of data science and was eager to gain a deeper understanding of it. However, since the course covered a lot of complex concepts, it wasn't always easy to follow along. One of the lectures went fully in deep into the text "Optimal Purely Functional Priority Queues" by Gerth Stølting Brodal and I found it to be an invaluable resource, as it helped me to further deepen my understanding of priority queues and treap data structures.

The text explains how to create a highly efficient data structure called the purely functional priority queue, which utilizes the treap data structure to provide lightning-fast performance while being easy to use. Although the text assumes a certain amount of prior knowledge on topics such as binary heaps and binary search trees, algorithm analysis, functional programming, and data structures.

Understanding mathematical models in data science can be like learning magic spells as a wizard. By incorporating these models into your work, you can cast more powerful spells (write better algorithms) to analyze and visualize data. Knowing these "spells" can make you invaluable to your team and help you make sense of data in ways you didn't think possible before. So, let's get our wizard hats on and start learning!

Some additional references for these topics include:

"Introduction to Algorithms" by Cormen, Leiserson, Rivest, and Stein: A comprehensive textbook that covers algorithms and data structures, including binary heaps and binary search trees.
"Functional Programming Principles in Scala" by Martin Odersky: A free online course that covers the principles of functional programming in Scala.
"Purely Functional Data Structures" by Chris Okasaki: A book that explores the implementation of purely functional data structures, including binary search trees and heaps, in Haskell.

Top comments (1)

gpisano97 • Mar 22 '23

Have an idea of algorithm complexity is a very important skill for a developer ! Don't forget that every algo must have a complexity equal to at most O(n^2). Have an idea of complexity can be very easy... for each nested "for loop" multiplicate complexity by N, so 3 nested loop has a complexity of O(n^3).... recursive algorithm based on Divide et impera are usually O(logn) etc. To build an efficient software take a look on complexity!!

DEV Community

Must read: Optimal Purely Functional Priority Queues

Top comments (1)

Read next

Microsoft's Phi-4: Smaller AI Model Achieves Big Results Through Clean Training Data

How to Deploy a Windows Server 2022 Virtual Machine on Microsoft Azure

Finally got some time to play with the new JSONata and Variables support for Step Functions, and I have to say, it is massive improvement. Check out my latest blog post, where I walk through a simple example of how easy it is to handle pagination now

React 19: New hook useActionState