The Algorithms Cheat Sheet: Sorting, Searching, Graphics & More

Selection Sort

Selection sort is a simple sorting algorithm that organizes an unsorted list by repeatedly finding the smallest (or largest) element and moving it to its correct position. Let’s assume that we have a stack of cards that we want to sort in increasing order:

  1. Start with the entire stack, pick out the smallest card, and place it on the left side (the start of the sorted section).
  2. Go through the rest of the cards, find the next smallest card, and move it next to the sorted section. Each time, the sorted section on the left grows by one card.
  3. Keep doing this until there are no cards left in the unsorted pile.

Selection sort is straightforward but can be slow on large lists because it goes through the list repeatedly to find the next smallest item. It’s best suited to small datasets, since its time complexity is O(n^2) regardless of how sorted the input already is.
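A minimal Python sketch of the card-sorting procedure above (the function name and sample list are just for illustration):

```python
def selection_sort(items):
    """Sort a list in place by repeatedly selecting the smallest remaining element."""
    n = len(items)
    for i in range(n - 1):
        # Find the index of the smallest element in the unsorted section.
        smallest = i
        for j in range(i + 1, n):
            if items[j] < items[smallest]:
                smallest = j
        # Move it to the end of the sorted section on the left.
        items[i], items[smallest] = items[smallest], items[i]
    return items

print(selection_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```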

Insertion Sort

In terms of sorting cards:

  1. You start with the 2nd element in the array or list (we consider the 1st element already ‘sorted’).
  2. Compare it with the elements before it and insert it into the correct position among the already sorted (prior) elements.
  3. Repeat the process until the entire array is sorted.

It’s similar to selection sort, but instead of scanning the remaining deck each time to find the next smallest (or largest) element, you simply expand the ‘sorted section’ on the left by one card and move the new card into its correct position. As expected, the worst-case time complexity of this algorithm is O(n^2), although it runs in nearly linear time on data that is already mostly sorted.
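A rough Python sketch of the steps above (names and sample data are my own choices):

```python
def insertion_sort(items):
    """Sort a list in place by inserting each element into the sorted section on its left."""
    for i in range(1, len(items)):          # the first element is considered already sorted
        current = items[i]
        j = i - 1
        # Shift larger sorted elements one slot to the right...
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        # ...then drop the current element into its correct position.
        items[j + 1] = current
    return items

print(insertion_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```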

Heap Sort

Heap sort uses a data structure called a binary heap to organize elements. It is similar to selection sort in that we repeatedly find the maximum element and move it to the end of the data structure. Here’s a quick rundown of how it works:

  1. First, build a max-heap (a complete binary tree where each parent node is greater than its children). This step arranges the largest element at the root of the heap.
  2. Swap the root (largest element) with the last element in the array. Then, reduce the size of the heap by one (ignoring the last element) and heapify the root to restore the max-heap property.
  3. Continue swapping the root with the last unsorted element and re-heapifying until the heap size is reduced to one. At this point, the array is fully sorted in ascending order.

The time complexity of heap sort is O(n log n).
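A rough Python sketch of those steps (the helper names and the sample list are my own choices):

```python
def heapify(items, heap_size, root):
    """Sift the value at `root` down until the subtree rooted there is a max-heap."""
    largest = root
    left, right = 2 * root + 1, 2 * root + 2
    if left < heap_size and items[left] > items[largest]:
        largest = left
    if right < heap_size and items[right] > items[largest]:
        largest = right
    if largest != root:
        items[root], items[largest] = items[largest], items[root]
        heapify(items, heap_size, largest)

def heap_sort(items):
    n = len(items)
    # Step 1: build a max-heap (start from the last parent node).
    for i in range(n // 2 - 1, -1, -1):
        heapify(items, n, i)
    # Steps 2-3: repeatedly move the root (maximum) to the end and re-heapify.
    for end in range(n - 1, 0, -1):
        items[0], items[end] = items[end], items[0]
        heapify(items, end, 0)
    return items

print(heap_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```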

Quick Sort

A highly efficient divide-and-conquer algorithm that is commonly used in practice due to its average-case performance of O(n log n); you can find a full explanation of it here. Essentially:

  1. You select an element which serves as a pivot.
  2. Place all elements less than the pivot on the left and all elements greater than the pivot on the right.
  3. Recursively apply the same process to the left and right sub-arrays until the entire array is sorted.

The average-case time complexity of quicksort is O(n log n); in the worst case (consistently poor pivot choices), it degrades to O(n^2).
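A simple (not in-place) Python sketch, using the middle element as the pivot (one of several common pivot strategies):

```python
def quick_sort(items):
    if len(items) <= 1:
        return items
    pivot = items[len(items) // 2]
    left = [x for x in items if x < pivot]     # elements less than the pivot
    middle = [x for x in items if x == pivot]  # elements equal to the pivot
    right = [x for x in items if x > pivot]    # elements greater than the pivot
    return quick_sort(left) + middle + quick_sort(right)

print(quick_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```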

Merge Sort

This is another divide-and-conquer sort, very similar to quick sort but with no pivot element, and it is stable. A full overview of this algorithm can be found here. To perform a merge sort:

  1. Recursively split the array into two halves until each sub-array contains only one element.
  2. Merge the sub-arrays back together, sorting them as you merge.
  3. Continue merging until the entire array is sorted.

The time complexity of merge sort is O(n log n). Use merge sort when you need a stable sorting algorithm, are working with linked lists, or are sorting large datasets that might not fit entirely in memory (external sorting). Quicksort is generally preferred for most in-memory sorting scenarios due to its faster average performance and lower space complexity, especially when dealing with random data.
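A minimal Python sketch of the split-and-merge idea (the sample list is made up):

```python
def merge_sort(items):
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])    # recursively sort the left half
    right = merge_sort(items[mid:])   # recursively sort the right half
    # Merge the two sorted halves, always taking the smaller front element.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # <= keeps the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```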

Tim Sort

Tim sort is a hybrid sorting algorithm that combines merge sort and insertion sort. It’s designed to be efficient with real-world data, particularly data that's already partially sorted. Here’s a breakdown of how it works:

  1. Divide the Array: It divides the data into small chunks called runs. Each run is sorted using insertion sort, which is fast for small, nearly-sorted segments.
  2. Merge Runs: The sorted runs are then merged together in a way similar to merge sort, forming larger and larger sorted sections until the entire array is sorted.

Tim sort is optimized for practical use and is the default sorting algorithm in languages like Python and Java. As expected, its time complexity is O(n log n). You can find an excellent overview of it here: Understanding Timsort.
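A heavily simplified Python sketch of the idea. Real Timsort picks run sizes adaptively and adds tricks like galloping; the fixed run size of 32 below is just for illustration:

```python
RUN = 32  # simplified fixed run size

def insertion_sort_range(items, start, end):
    """Insertion-sort items[start..end] in place."""
    for i in range(start + 1, end + 1):
        current, j = items[i], i - 1
        while j >= start and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current

def merge(items, lo, mid, hi):
    """Merge the two sorted slices items[lo..mid] and items[mid+1..hi]."""
    left, right = items[lo:mid + 1], items[mid + 1:hi + 1]
    i = j = 0
    k = lo
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            items[k] = left[i]
            i += 1
        else:
            items[k] = right[j]
            j += 1
        k += 1
    items[k:hi + 1] = left[i:] + right[j:]

def tim_sort(items):
    n = len(items)
    # 1. Sort small runs with insertion sort.
    for start in range(0, n, RUN):
        insertion_sort_range(items, start, min(start + RUN - 1, n - 1))
    # 2. Merge runs of doubling size, like bottom-up merge sort.
    size = RUN
    while size < n:
        for lo in range(0, n, 2 * size):
            mid = min(lo + size - 1, n - 1)
            hi = min(lo + 2 * size - 1, n - 1)
            if mid < hi:
                merge(items, lo, mid, hi)
        size *= 2
    return items

print(tim_sort([5, 2, 9, 1, 7, 3, 8]))  # [1, 2, 3, 5, 7, 8, 9]
```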


Search Algorithms

Binary Search

A fast searching algorithm that works on sorted arrays and has a time complexity of O(log n). It is widely used due to its simplicity and speed, and it works by repeatedly halving the search space.

Think of it as opening a dictionary at the mid-point, determining whether the word we’re searching for is to the left (less than the middle element) or to the right (larger than the middle element) and continuously re-opening our new search (or page) space again. A full overview of this algorithm is available here.
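A minimal Python sketch of the halving process (the sample list is made up):

```python
def binary_search(sorted_items, target):
    """Return the index of `target` in a sorted list, or -1 if it's not present."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1          # target is in the right half
        else:
            hi = mid - 1          # target is in the left half
    return -1

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
```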

Depth-First Search (DFS) and Breadth-First Search (BFS)

These are fundamental graph traversal algorithms used in networking, pathfinding, and artificial intelligence.

  • Breadth First Search explores all nodes at the present "depth" level before moving on to nodes at the next depth level.
  • Depth First Search explores as far down a branch as possible before backtracking to explore other branches.

Use DFS for deep exploration, backtracking, and memory efficiency.
Use BFS for shortest paths, level-wise traversal, and broader searches.

DFS is better for constrained memory; BFS ensures the shortest path in unweighted graphs.
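A rough Python sketch of both traversals on a small made-up adjacency list:

```python
from collections import deque

graph = {                      # a small example graph
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["E"],
    "D": [],
    "E": ["B"],
}

def bfs(start):
    """Visit nodes level by level using a queue."""
    visited, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

def dfs(start, visited=None):
    """Go as deep as possible before backtracking (recursive version)."""
    if visited is None:
        visited = set()
    visited.add(start)
    order = [start]
    for neighbor in graph[start]:
        if neighbor not in visited:
            order.extend(dfs(neighbor, visited))
    return order

print(bfs("A"))  # ['A', 'B', 'C', 'D', 'E']
print(dfs("A"))  # ['A', 'B', 'D', 'C', 'E']
```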


Graph Algorithms

Prim’s Algorithm

Prim’s algorithm finds the minimum spanning tree (MST) of a graph by starting with an arbitrary node and repeatedly adding the smallest edge that connects a node inside the tree to one outside of it. The process continues until all nodes are connected, ensuring the total edge weight is minimized. It’s used for optimizing network designs like roads or cables, minimizing the total cost.
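A rough Python sketch using a priority queue (heap) of candidate edges; the example graph is made up for illustration:

```python
import heapq

def prims_mst(graph, start):
    """graph: dict mapping node -> list of (weight, neighbor) pairs."""
    visited = {start}
    # Heap of candidate edges (weight, from_node, to_node) leaving the tree.
    heap = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(heap)
    mst, total = [], 0
    while heap and len(visited) < len(graph):
        weight, frm, to = heapq.heappop(heap)
        if to in visited:
            continue                    # this edge leads back into the tree, skip it
        visited.add(to)
        mst.append((frm, to, weight))
        total += weight
        for w, v in graph[to]:
            if v not in visited:
                heapq.heappush(heap, (w, to, v))
    return mst, total

graph = {
    "A": [(1, "B"), (4, "C")],
    "B": [(1, "A"), (2, "C"), (5, "D")],
    "C": [(4, "A"), (2, "B"), (1, "D")],
    "D": [(5, "B"), (1, "C")],
}
print(prims_mst(graph, "A"))  # ([('A', 'B', 1), ('B', 'C', 2), ('C', 'D', 1)], 4)
```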

Kruskal’s Algorithm

Kruskal’s algorithm also finds the minimum spanning tree (MST) of a graph, but it works differently from Prim’s algorithm. It starts by sorting all the edges by weight. Then, it adds the smallest edge to the tree, but only if it doesn’t form a cycle. This process is repeated until all nodes are connected, and the tree has the minimum total edge weight.

Prim's algorithm is significantly faster when you have a really dense graph with many more edges than vertices. Kruskal performs better in typical situations (sparse graphs) because it uses simpler data structures.
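A rough Python sketch of Kruskal's algorithm; it uses a stripped-down union-find (covered in more detail later in this post) to detect cycles, and the edge list is made up for illustration:

```python
def kruskal_mst(num_nodes, edges):
    """edges: list of (weight, u, v) tuples with nodes labelled 0..num_nodes-1."""
    parent = list(range(num_nodes))

    def find(x):                        # find the set leader (with path compression)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for weight, u, v in sorted(edges):  # 1. consider edges in increasing weight order
        root_u, root_v = find(u), find(v)
        if root_u != root_v:            # 2. add the edge only if it doesn't form a cycle
            parent[root_u] = root_v
            mst.append((u, v, weight))
            total += weight
    return mst, total

edges = [(1, 0, 1), (2, 1, 2), (4, 0, 2), (5, 1, 3), (1, 2, 3)]
print(kruskal_mst(4, edges))  # ([(0, 1, 1), (2, 3, 1), (1, 2, 2)], 4)
```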

Dijkstra’s Algorithm

Used to find the shortest path between a source node and all other nodes in a weighted graph with non-negative weights. A step-by-step overview of the algorithm is provided below:

  1. Initialize Distances: Set the starting node’s distance to 0 and all other nodes to infinity (∞). Mark all nodes as unvisited.
  2. Select Node: Choose the unvisited node with the smallest distance (starting with the starting node).
  3. Update Neighbors: For each unvisited neighbor, calculate its distance through the current node. If this distance is smaller than its current distance, update it.
  4. Mark as Visited: Mark the current node as visited (the shortest path to it is now known).
  5. Repeat: Repeat steps 2-4 until all nodes are visited or you’ve reached the target.

As an example, imagine a small weighted graph in which we start at the left-most node, d, and update the tentative weights of its neighbors, then repeatedly expand from the closest unvisited node, updating the weights after each iteration. (Note: iterations that visit nodes with no outbound edges, i.e. no neighbors to relax, don’t change any weights, but a regular run of Dijkstra’s algorithm includes those nodes as well.)

Dijkstra’s algorithm is crucial because it optimally solves shortest path problems in graphs, making it applicable to real-world scenarios where efficiency and correctness are key.
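Here's a minimal Python sketch of the steps above, using a priority queue (heap); the example graph and its node names are made up for illustration:

```python
import heapq

def dijkstra(graph, source):
    """graph: dict mapping node -> list of (neighbor, weight); weights must be non-negative."""
    distances = {node: float("inf") for node in graph}    # step 1: initialize distances
    distances[source] = 0
    heap = [(0, source)]                                  # (distance, node) priority queue
    visited = set()
    while heap:
        dist, node = heapq.heappop(heap)                  # step 2: closest unvisited node
        if node in visited:
            continue
        visited.add(node)                                 # step 4: this distance is now final
        for neighbor, weight in graph[node]:              # step 3: relax the neighbors
            new_dist = dist + weight
            if new_dist < distances[neighbor]:
                distances[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return distances

graph = {
    "d": [("a", 4), ("b", 1)],
    "a": [("e", 3)],
    "b": [("a", 2), ("e", 6)],
    "e": [],
}
print(dijkstra(graph, "d"))  # {'d': 0, 'a': 3, 'b': 1, 'e': 6}
```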

Bellman-Ford Algorithm

The Bellman-Ford algorithm finds the shortest paths from a source vertex to all other vertices of a weighted graph. It is similar to Dijkstra's algorithm, but it can also handle graphs in which edges have negative weights.

It works by iterating over all edges multiple times, progressively relaxing edge weights to find the shortest path. After V - 1 iterations (where V is the number of vertices), the shortest paths are determined. A final iteration checks for negative weight cycles.



As noted, the algorithm is similar to Dijkstra’s but includes an extra step to detect negative cycles.
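A minimal Python sketch of the core relaxation loop (the edge list is made up for illustration):

```python
def bellman_ford(num_vertices, edges, source):
    """edges: list of (u, v, weight) tuples; vertices labelled 0..num_vertices-1."""
    dist = [float("inf")] * num_vertices
    dist[source] = 0
    # Relax every edge V - 1 times.
    for _ in range(num_vertices - 1):
        for u, v, weight in edges:
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight
    # One extra pass: if anything still improves, there is a negative weight cycle.
    for u, v, weight in edges:
        if dist[u] + weight < dist[v]:
            raise ValueError("Graph contains a negative weight cycle")
    return dist

edges = [(0, 1, 4), (0, 2, 5), (1, 2, -3), (2, 3, 4)]
print(bellman_ford(4, edges, 0))  # [0, 4, 1, 5]
```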

A* Search

A* (pronounced "A-star") is a pathfinding algorithm (similar to Dijkstra’s) that efficiently finds the shortest path between two points in a graph. It is an enhancement of Dijkstra’s algorithm, combining the benefits of shortest path search and heuristic-based search to speed up performance. It finds the shortest path by balancing actual cost and estimated distance to the goal.

Imagine you're walking through a city and trying to find the shortest route to a coffee shop.

  • You know how far you’ve walked so far.
  • You look at the map and estimate how close each road gets you to the coffee shop.
  • Instead of randomly checking all streets, you prioritize the ones that seem best.
  • When you reach the coffee shop, you trace back the route you took!

A* is widely used in AI, game development, robotics, and navigation systems because it finds the shortest path faster than Dijkstra's algorithm in most practical cases. I highly recommend Introduction to the A* Algorithm from Red Blob Games.
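A rough Python sketch of the idea, using a priority queue ordered by cost-so-far plus heuristic estimate; the "city" graph and heuristic values below are invented for illustration:

```python
import heapq

def a_star(graph, start, goal, heuristic):
    """graph: dict node -> list of (neighbor, cost); heuristic: dict node -> estimated cost to goal."""
    open_heap = [(heuristic[start], 0, start, [start])]    # (f = g + h, g, node, path)
    best_g = {start: 0}
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path, g                                 # goal reached: return path and cost
        for neighbor, cost in graph[node]:
            new_g = g + cost                               # actual cost walked so far
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                new_f = new_g + heuristic[neighbor]        # prioritize by cost + estimate
                heapq.heappush(open_heap, (new_f, new_g, neighbor, path + [neighbor]))
    return None, float("inf")

graph = {
    "home": [("park", 2), ("mall", 5)],
    "park": [("cafe", 4)],
    "mall": [("cafe", 1)],
    "cafe": [],
}
heuristic = {"home": 5, "park": 4, "mall": 1, "cafe": 0}   # rough straight-line estimates
print(a_star(graph, "home", "cafe", heuristic))  # (['home', 'park', 'cafe'], 6)
```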

Union-Find Algorithm (Disjoint Set Union (DSU))

It is actually more of a data structure than an algorithm and helps efficiently manage and group items into sets. It’s particularly useful for finding whether two elements are in the same set and for uniting two sets. It’s commonly used in algorithms like Kruskal’s to detect cycles in a graph and efficiently manage components.

It supports two main operations:

  • Find: Determines which set an element belongs to by returning its "leader" or "representative." If two elements have the same leader, we know they belong to the same set.
  • Union: Merges two sets into one. This is done simply by finding the leaders (or roots) of the two sets, and then connecting one root to the other, merging them into a single set. For efficiency, you should always attach the smaller set under the larger one.

It efficiently tracks dynamic connectivity in graphs and sets, making it essential for problems like Kruskal’s MST, cycle detection, clustering, and network connectivity and is widely applied in graph algorithms.
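A minimal Python sketch with path compression and union by size (the class and method names are my own choices):

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))   # each element starts as its own leader
        self.size = [1] * n

    def find(self, x):
        """Return the leader (root) of x's set, compressing the path as we go."""
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets containing a and b; returns False if they were already together."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        if self.size[root_a] < self.size[root_b]:   # attach the smaller set
            root_a, root_b = root_b, root_a         # under the larger one
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]
        return True

uf = UnionFind(5)
uf.union(0, 1)
uf.union(3, 4)
print(uf.find(1) == uf.find(0))  # True  (same set)
print(uf.find(1) == uf.find(3))  # False (different sets)
```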

Ford-Fulkerson Algorithm

Finds the maximum flow in a flow network, i.e. how much flow (e.g., water, data) you can send from a source to a sink through a network of edges with limited capacities. It works by:

  1. Starting with zero flow in all edges.
  2. Looking for augmenting paths—paths from the source to the sink where more flow can be pushed, considering available capacity.
  3. Adding flow along this path and adjusting the capacities (both forward and backward).
  4. Repeating until no more augmenting paths can be found.

The algorithm uses a special representation of the graph in which each original edge has a reverse edge pointing in the opposite direction. The weight of each edge indicates how much more flow we could route through it; initially, the weight of each reverse edge is set to 0.

We then perform several iterations of the algorithm. Each iteration finds a path from the source to the sink such that every edge along the path has a positive weight, increases the flow by the smallest edge weight on that path, and updates the graph (decreasing the forward weights and increasing the reverse weights by the same amount).

The algorithm keeps increasing the flow as long as there is a path from the source to the sink through positive-weight edges, and the reverse edges allow it to decrease the amount of flow going through certain edges if it's beneficial to re-route that flow through another path.
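A rough Python sketch that uses BFS to find augmenting paths (this BFS-based variant is usually called Edmonds-Karp); the capacity matrix below is made up for illustration:

```python
from collections import deque

def max_flow(capacity, source, sink):
    """capacity: n x n matrix of edge capacities."""
    n = len(capacity)
    # Residual capacities start equal to the original capacities;
    # reverse edges implicitly start at 0.
    residual = [row[:] for row in capacity]
    flow = 0
    while True:
        # BFS for a source->sink path whose edges all have positive residual capacity.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return flow                      # no augmenting path left, we're done
        # Find the bottleneck (smallest residual capacity) along the path.
        bottleneck = float("inf")
        v = sink
        while v != source:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # Push that much flow: decrease forward capacities, increase reverse ones.
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

capacity = [
    [0, 3, 2, 0],
    [0, 0, 1, 2],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
]
print(max_flow(capacity, 0, 3))  # 5
```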

The Ford-Fulkerson algorithm is important since it has many applications in various fields like network routing, supply chain optimization, and image segmentation.


Compression and Encoding Algorithms

Value Encoding

Value encoding is a technique we can use to reduce storage requirements. It’s often used by columnar stores to cut memory usage.

Imagine that we have a column containing the price of various products, all stored as integer values. The column contains many different values, and to represent all of them, we need a defined number of bits.

Suppose the maximum price in the column is 216 and the minimum is 194. Storing the raw values requires at least 8 bits each. Nevertheless, by using a simple mathematical operation, we can reduce the storage to 5 bits: subtracting the minimum value (194) from every value in the column shrinks its range to 0 through 22.
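A tiny Python illustration of the idea, with made-up prices:

```python
prices = [194, 210, 205, 216, 199]           # hypothetical column values

minimum = min(prices)                         # 194
encoded = [p - minimum for p in prices]       # [0, 16, 11, 22, 5] -> range 0..22

bits_before = max(prices).bit_length()        # 216 needs 8 bits
bits_after = max(encoded).bit_length()        # 22 needs only 5 bits
print(bits_before, bits_after)                # 8 5

# Decoding simply adds the minimum back.
decoded = [e + minimum for e in encoded]
assert decoded == prices
```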

Dictionary Encoding

Dictionary encoding is a technique which can be used to reduce the number of bits required to store columnar data. Dictionary encoding builds a dictionary of the distinct values of a column and then it replaces the column values with indexes to the dictionary.
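A tiny Python illustration with a made-up column:

```python
column = ["red", "blue", "red", "green", "blue", "red"]

# Build a dictionary of the distinct values...
dictionary = {value: index for index, value in enumerate(sorted(set(column)))}
# ...and replace each value with its index in the dictionary.
encoded = [dictionary[value] for value in column]

print(dictionary)  # {'blue': 0, 'green': 1, 'red': 2}
print(encoded)     # [2, 0, 2, 1, 0, 2]

# Decoding reverses the lookup.
reverse = {index: value for value, index in dictionary.items()}
assert [reverse[i] for i in encoded] == column
```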

Huffman Coding

Huffman coding is a method of data compression that reduces the number of bits needed to represent data. It’s commonly used for compressing text, images, and audio files. The algorithm gives shorter binary codes to frequent symbols and longer codes to rare ones by:

  1. Counting how often each character / symbol appears.
  2. Using this information to build a binary tree (called a Huffman tree) which assigns short codes to frequent symbols and longer codes to rare ones.

You can find a more thorough overview of how exactly the algorithm works here: Grokking Huffman Coding.
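A minimal Python sketch that builds the tree with a min-heap (one common way to implement it; the tie-breaker value only exists to avoid comparing unlike tree nodes when frequencies are equal):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman tree and return a {symbol: binary code} mapping."""
    # 1. Count symbol frequencies.
    freq = Counter(text)
    # Each heap entry: (frequency, tie-breaker, tree), where tree is a symbol or a (left, right) pair.
    heap = [(f, i, symbol) for i, (symbol, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    # 2. Repeatedly merge the two least frequent subtrees.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    # Walk the tree: left edges get '0', right edges get '1'.
    codes = {}
    def assign(node, prefix):
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"   # single-symbol edge case
    assign(heap[0][2], "")
    return codes

codes = huffman_codes("abracadabra")
print(codes)   # frequent symbols (like 'a') get shorter codes than rare ones (like 'c' and 'd')
```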


Lempel-Ziv (LZ) Compression

The Lempel-Ziv (LZ) algorithm is a lossless compression method that replaces repeated patterns with references to earlier occurrences. Unlike Huffman coding, LZ builds its dictionary dynamically in a single pass, making it more efficient in some cases.

There are many variations of Lempel Ziv (LZ77, LZ78, LZW), but they all follow the same basic idea:

  1. Parse the sequence into distinct phrases.
  2. For each phrase we see, we stick it in a dictionary.
  3. The next time we want to send it, we don’t send the entire phrase, but just the number (reference) of this phrase.

LZ77 is the basis for widely used compression methods like DEFLATE (used in ZIP, PNG, Gzip), LZW (used in GIF), and LZMA (used in 7z).
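A toy LZ78-style sketch in Python (the real variants differ in details such as how references and literals are encoded):

```python
def lz78_compress(text):
    """A toy LZ78-style compressor: outputs (dictionary_index, next_character) pairs."""
    dictionary = {"": 0}          # phrase -> index
    phrase = ""
    output = []
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch          # keep extending the longest known phrase
        else:
            # Emit a reference to the known prefix plus the new character,
            # then add the new phrase to the dictionary.
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        output.append((dictionary[phrase], ""))
    return output

print(lz78_compress("abababab"))
# [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (2, '')]
```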


Bitmap Index & Run-Length Encoding

A Bitmap Index is a way to compress and quickly search through columns of data in databases. For each unique value in a column, it creates a sequence of bits (0s and 1s) that indicate whether a row has that value.

Data warehouses use columnar storage to store data - which tends to save a lot of space since columns with many duplicate elements can be encoded in a more efficient way.

Run-Length Encoding (RLE) compresses data by replacing consecutive repeated values with a single value and the count of its repetitions.
Example: AAAAABBBCC -> 5A 3B 2C.
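A tiny Python version of the example above:

```python
def run_length_encode(text):
    encoded = []
    i = 0
    while i < len(text):
        run_char = text[i]
        run_length = 1
        # Count how many times the current character repeats consecutively.
        while i + run_length < len(text) and text[i + run_length] == run_char:
            run_length += 1
        encoded.append(f"{run_length}{run_char}")
        i += run_length
    return " ".join(encoded)

print(run_length_encode("AAAAABBBCC"))  # 5A 3B 2C
```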

Burrows-Wheeler Transform

The Burrows-Wheeler Transform (BWT) rearranges a string into a form that’s easier to compress by grouping similar characters together. It does not compress data itself but prepares it for algorithms like RLE or Huffman.

It works by:

  1. Generating all rotations of the string.
  2. Sorting these rotations lexicographically.
  3. Taking the last column of the sorted rotations as the transformed string.
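A minimal Python sketch of those three steps, using an explicit end marker (production implementations use suffix arrays instead of materializing every rotation):

```python
def bwt(text, end_marker="$"):
    """Return the Burrows-Wheeler transform of `text` (with an explicit end marker)."""
    text += end_marker
    # 1. All rotations of the string.
    rotations = [text[i:] + text[:i] for i in range(len(text))]
    # 2. Sort them lexicographically.
    rotations.sort()
    # 3. The transform is the last column of the sorted rotations.
    return "".join(rotation[-1] for rotation in rotations)

print(bwt("banana"))  # annb$aa  (similar characters end up grouped together)
```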

Fourier Transform & FFT

The Fourier Transform is crucial for analyzing and compressing signals in audio processing, image analysis, and communications. In essence, it decomposes a signal into its constituent frequencies (sine waves).

(Image source: Devin Cody, https://devincody.github.io/Blog/post/an_intuitive_interpretation_of_the_fourier_transform/)

The Fast Fourier Transform (FFT) is an optimized algorithm for computing the Fourier Transform quickly (O(n log n) instead of O(n^2)).

The FFT’s speed makes it indispensable in signal processing, medical imaging (MRI), and efficient simulation of physical systems.
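A small NumPy example showing the decomposition (the sample rate and the 50 Hz / 120 Hz sine mixture are made up for illustration):

```python
import numpy as np

# Build a 1-second signal sampled at 1000 Hz: a 50 Hz sine plus a weaker 120 Hz sine.
sample_rate = 1000
t = np.arange(0, 1, 1 / sample_rate)
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# The FFT decomposes the signal into its constituent frequencies in O(n log n).
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)

# The two largest peaks sit at the frequencies we mixed in.
peaks = freqs[np.argsort(np.abs(spectrum))[-2:]]
print(sorted(peaks))  # the dominant frequencies: 50 Hz and 120 Hz
```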

Discrete Cosine Transform (DCT) & JPEG

The Discrete Cosine Transform (DCT) is similar to the Fourier Transform but typically compresses real-world data better (better energy compaction) because it uses only real-valued cosine functions.

JPEG compression uses DCT:

  1. Divides the image into 8x8 pixel blocks.
  2. Each block undergoes DCT to separate fine details (high frequency) from broad features (low frequency).
  3. Quantization: High-frequency details are significantly reduced or eliminated to save data (this is the "Lossy" part).
  4. The quantized coefficients are then compressed losslessly using RLE and Huffman coding.
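A rough NumPy sketch of steps 2 and 3 on a single random block (the flat quantization table is just for illustration; real JPEG tables reduce high frequencies much more aggressively):

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (the transform JPEG applies to 8x8 blocks)."""
    basis = np.zeros((n, n))
    for k in range(n):
        for i in range(n):
            scale = np.sqrt(1 / n) if k == 0 else np.sqrt(2 / n)
            basis[k, i] = scale * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    return basis

# A hypothetical 8x8 block of grayscale pixel values, centered around 0 as JPEG does.
block = np.random.randint(0, 256, (8, 8)).astype(float) - 128

D = dct_matrix()
coefficients = D @ block @ D.T             # 2D DCT: low frequencies end up in the top-left corner

# Quantization: divide by a quantization table and round; this is the lossy step.
quant_table = np.full((8, 8), 16)
quantized = np.round(coefficients / quant_table)

# Decoding reverses the steps (with some loss from the rounding).
restored = D.T @ (quantized * quant_table) @ D + 128
print(np.max(np.abs(restored - (block + 128))))   # reconstruction error stays small relative to 0-255
```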

Video Compression & Encoding

A video is a sequence of images (frames). Video compression exploits two types of redundancy:

  1. Intra-frame: Compression within a single picture (like JPEG).
  2. Inter-frame: Compression between frames (since often only small parts of the scene move).

Key techniques include Motion Estimation (predicting movement of objects) and Motion Compensation (encoding only the difference/residuals).


Optimization & Graphics

Ray Tracing

Ray tracing simulates how light behaves to create highly realistic images. It traces the path of light rays from the camera back into the 3D scene. Calculations account for reflection, refraction, and shadows.

Simplex Method

The simplex algorithm is the preeminent tool for solving some of the most important mathematical problems arising in business, science, and technology. These problems are called linear programs - and for them, we try to maximize (or minimize) a linear function subject to linear constraints. An example is the diet problem posed by the U.S. Air Force in 1947: find quantities of seventy-seven differently priced foodstuffs to satisfy a man’s minimum daily requirements for nine nutrients (protein, iron, etc.) at least cost. George Dantzig invented the simplex algorithm in 1947 as a means of solving the Air Force’s diet problem. At the time, the word “program” was not yet used to mean computer code; it was a military term for a logistic plan or schedule.

The algorithm is based on the fact that if a linear programming (LP) problem has a solution, the optimal solution (best value) is always at a corner or vertex of the feasible region (the area where all constraints are satisfied). This feasible region is called a polytope, and since the shape resembles a "simplex" (a geometric shape), the algorithm gets its name from that.

A great and more elaborate example of using this method to solve a real problem is provided here:

The Simplex Method - Finding a Maximum / Word Problem Example
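If you just want to solve a small linear program in Python, SciPy's linprog (backed by the HiGHS solvers, which include simplex variants) will do it; the toy objective and constraints below are made up for illustration:

```python
from scipy.optimize import linprog

# Maximize 3x + 2y subject to: x + y <= 4, x + 3y <= 6, x >= 0, y >= 0.
# linprog minimizes, so we negate the objective.
c = [-3, -2]
A_ub = [[1, 1],
        [1, 3]]
b_ub = [4, 6]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(result.x, -result.fun)   # optimal corner (x, y) = (4, 0), maximum value 12
```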

Newton's Method

Newton's method is an optimization algorithm used to find the local minimum or maximum of a function. It uses the first and second derivatives (gradient and Hessian) to iteratively update the current guess for the minimum.

Here’s the process in a nutshell:

  • Start with a guess for the solution.
  • Find the slope (gradient) and curvature (second derivative) of the function at that point.
  • Update your guess using the slope and curvature to get closer to the minimum.
  • Repeat until the guess doesn’t change much anymore.

I won’t go into the full details here, but you can find a more thorough explanation of Newton’s method here: Understanding Optimization Algorithms: Newton's Method
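A minimal 1-D Python sketch of those steps (the example function is made up for illustration):

```python
def newtons_method(grad, hess, x0, iterations=20):
    """Minimize a 1-D function using its first derivative (grad) and second derivative (hess)."""
    x = x0
    for _ in range(iterations):
        step = grad(x) / hess(x)    # update the guess using slope and curvature
        x -= step
        if abs(step) < 1e-10:       # stop once the guess barely changes
            break
    return x

# Example: f(x) = (x - 3)^2 + 1, whose minimum is at x = 3.
grad = lambda x: 2 * (x - 3)
hess = lambda x: 2.0
print(newtons_method(grad, hess, x0=10.0))  # 3.0
```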

Simulated Annealing

Inspired by metallurgy (heating and cooling metal), this algorithm finds a global optimum by initially accepting "worse" solutions (high temperature) to avoid getting stuck in local optima, and gradually accepting fewer bad solutions as it "cools."
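A compact Python sketch (the cost function, neighbor function, and cooling schedule are arbitrary choices for illustration):

```python
import math
import random

def simulated_annealing(cost, neighbor, initial, temp=10.0, cooling=0.995, steps=5000):
    current, best = initial, initial
    for _ in range(steps):
        candidate = neighbor(current)
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept worse solutions with a probability
        # that shrinks as the temperature cools down.
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current = candidate
            if cost(current) < cost(best):
                best = current
        temp *= cooling
    return best

# Example: find the minimum of a bumpy 1-D function.
cost = lambda x: x * x + 10 * math.sin(x)
neighbor = lambda x: x + random.uniform(-1, 1)
print(simulated_annealing(cost, neighbor, initial=5.0))   # close to the global minimum near x = -1.3
```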


Machine Learning & Data Science

Regression (Linear, Logistic, Polynomial)

  • Linear Regression: Fits a straight line to data to predict a continuous outcome (e.g., housing prices).
  • Logistic Regression: Models the probability of a binary outcome (e.g., Spam vs Not Spam).
  • Polynomial Regression: Uses a curved line to model complex non-linear relationships.
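A tiny NumPy illustration of the linear and polynomial cases (the house-price numbers are made up):

```python
import numpy as np

# Hypothetical data: house sizes (m^2) vs. prices (in thousands).
sizes = np.array([50, 70, 80, 100, 120])
prices = np.array([150, 200, 230, 285, 330])

# Linear regression: fit a straight line price = a * size + b (least squares).
a, b = np.polyfit(sizes, prices, deg=1)
print(f"price ~ {a:.2f} * size + {b:.2f}")

# Polynomial regression just raises the degree to capture curvature.
coeffs = np.polyfit(sizes, prices, deg=2)
print(np.polyval(coeffs, 90))   # predicted price for a 90 m^2 house
```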

Support Vector Machines (SVM)

SVMs find the "hyperplane" that best separates data into different classes by maximizing the margin between the classes. They are effective in high-dimensional spaces.


Decision Trees & Random Forests

  • Decision Tree: A flowchart-like structure where internal nodes represent tests on features (e.g., "Is Age > 18?").
  • Random Forest: An ensemble method that builds many decision trees and averages their predictions to reduce error and overfitting.
  • Boosted Trees: Builds trees sequentially, where each new tree corrects the mistakes of the previous ones.

Neural Networks & Deep Learning

Neural networks are loosely inspired by the human brain. They consist of layers of neurons that process inputs, apply weights, and pass results through activation functions.

  • CNNs (Convolutional Neural Networks): The kings of image processing. They use filters to detect edges, textures, and shapes.
  • RNNs & LSTMs: Designed for sequential data (Time-series, text) as they have "memory".
  • Transformers: The architecture behind ChatGPT and DeepSeek. They use attention mechanisms to handle long-range dependencies efficiently.


Reinforcement Learning (RL)

In RL, an agent learns to make decisions by performing actions in an environment and receiving rewards or penalties.

  • Q-Learning: The agent learns a value table (Q-table) for every state-action pair.
  • Deep Q-Networks (DQN): Uses Neural Networks to approximate Q-values (necessary for complex environments like video games).

Security & Cryptography

SHA (Secure Hash Algorithms)

SHA transforms any input (password, file, text) into a fixed-size string of characters called a hash. It is a "one-way" function—you cannot retrieve the original data from the hash. It is vital for password storage and data integrity.
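A quick Python example using the standard library's hashlib:

```python
import hashlib

digest = hashlib.sha256(b"my secret password").hexdigest()
print(digest)        # a fixed-size, 64-character hex string
print(len(digest))   # 64, regardless of how long the input was

# A tiny change in the input produces a completely different hash.
print(hashlib.sha256(b"my secret password!").hexdigest())
```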


RSA Algorithm

RSA enables secure communication over insecure networks using Public-Key Cryptography.

  • Public Key: Encrypts the message (shared with everyone).
  • Private Key: Decrypts the message (kept secret).

It relies on the mathematical difficulty of factoring the product of two large prime numbers.
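A toy Python example with tiny primes (the classic 61 and 53 textbook example; real keys use primes that are hundreds of digits long, and pow(e, -1, phi) needs Python 3.8+):

```python
p, q = 61, 53
n = p * q                  # 3233, the public modulus
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent (coprime with phi)
d = pow(e, -1, phi)        # 2753, the private exponent (modular inverse of e)

message = 65
ciphertext = pow(message, e, n)     # encrypt with the public key (e, n)
decrypted = pow(ciphertext, d, n)   # decrypt with the private key (d, n)
print(ciphertext, decrypted)        # 2790 65
```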

Diffie-Hellman Key Exchange

Allows two parties to jointly establish a shared secret over an insecure channel. This secret key is then used to encrypt subsequent communications using a symmetric-key cipher.
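A toy Python example with tiny numbers (the classic p = 23, g = 5 textbook example; real deployments use very large primes):

```python
p, g = 23, 5                 # public parameters: prime modulus and generator

alice_secret = 6             # private values, never shared
bob_secret = 15

alice_public = pow(g, alice_secret, p)   # 8,  sent to Bob
bob_public = pow(g, bob_secret, p)       # 19, sent to Alice

# Both sides derive the same shared secret without ever transmitting it.
alice_shared = pow(bob_public, alice_secret, p)
bob_shared = pow(alice_public, bob_secret, p)
print(alice_shared, bob_shared)   # 2 2
```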


By: Jawad Ahmed Nusrati
Happy Coding!
