DEV Community: Martin Edvardsen

What Scares Companies in 2025?

Martin Edvardsen — Tue, 29 Jul 2025 13:40:37 +0000

And just as importantly, what are we doing to prevent the threats and breaches that they fear? This article is written following my attendance at a security conference in Denmark.

The conference presented products targeted at companies seeking help securing their systems without the in-house knowledge to do so. It did not present the tools, knowledge, or frameworks for building such products, as my colleagues and I hoped. As a group of software developers, we therefore did not seem to be the intended crowd.

Nonetheless, it gave us an opportunity to observe, and put into perspective, what defines the vulnerabilities feared, and the services or products providing protection against them. Here are some of the key observations we made. These are observations from a conference and not facts, so keep that in mind.

Shadow IT becomes Shadow AI

In the industry, Shadow IT is by now a common term referring to any software or hardware used within an organisation without explicit approval or control by its IT department. For instance, using Google Drive, maybe even for sharing sensitive data, in an organisation, which has a controlled instance of OneDrive as the approved means of file storage and sharing.

A big subject of the conference was AI, specifically due to the skepticism towards AI intensifying dramatically in recent years. This has spawned a new branch called Shadow AI, which refers specifically to the unauthorised and uncontrolled use of AI tools, models, and platforms within organisations. Large language models (LLMs) like ChatGPT and Claude have become integral to many employees’ daily work, but uncontrolled usage risks data leakage, compliance violations, and biased or unvetted business decisions, that do not align with the direction or values of the company.

Countermeasures usually include the use of internal, controlled, and monitored LLMs. At the conference, the focus was on systems called LLM guardrails. They monitor and control the use of common LLM systems by constraining the behaviour of said systems to ensure safe, ethical, and reliable outputs. They focused on mitigating the most common approaches to abusing LLMs, based on the OWASP Top 10 for LLMs (https://owasp.org/www-project-top-10-for-large-language-model-applications/). More specifically, they focused on improving the security, privacy, safety, and relevancy of the LLMs, being the categories of said guardrails.

Micro-segmented Access Control

Another big focus for the conference was ZTA, Zero-Trust Architecture, where we no longer solely rely on a firewall as a strong perimeter keeping adversaries out, but rather assume that adversaries are already inside the network. This essentially means that you don’t trust anyone inside your network.

Given these circumstances, we need to monitor and limit movement inside the internal network instead. A great example is from the banking sector, where we see one of the beneficial uses of AI. To identify possible money laundering among their customers, they use AI to recognise any anomalies in the behaviour of the users, i.e. deviating from the regular patterns of that specific user, called behaviour-based detection. An example would be someone suddenly, but consistently, transferring a large amount of money to an offshore bank account every Monday of the week.

One of the countermeasures, among many, presented at the conference was a concept called micro-segmentation, which enforces strict access control between internal systems by segmenting workloads, applications, and individual processes instead of entire subnets or zones, which can apply to all levels of a system. This means that even though an adversary, or even malware, is inside one of your systems, it would be contained in that specific system and would not spread laterally to other groups.

An even more detailed approach was presented, where micro-segmentation was broken down into “outside the group” and “inside the group”, which referred to communication, policies and access control inside a group, for instance being an entire HR System, and outside of said group, being communication between said groups.

Social engineering still takes first place

But how does the adversary get into the systems in the first place? One possibility is human-induced breaches, which was the last major subject of the conference. In fact, they account for around 60% of all breaches, according to Verizon Data Breach Investigation report 2025 (https://www.verizon.com/business/resources/Tc61/reports/2025-dbir-data-breach-investigations-report.pdf).

In an ever-moving world, where rapidly evolving tools and systems constantly make it easier for us to reach our goals faster at the expense of security and privacy, we expose our systems and data to adversaries to an unprecedented degree.

At the conference, the countermeasures proposed for these breaches focused on adapting to the users, instead of restricting them. One solution proposed was to adapt to the primary productivity hub of employees working for it-based companies – the browser. Specifically, a chromium-based browser, which most people are used to, that enables complete control and monitoring, while also allowing access to private apps or SSH without the use of VPN. Furthermore, as it was browser-based, it could be used by any device, even unmanaged ones in a secure manner.

In conclusion, as trust is diminishing to the point of relying on architectures with “zero-trust”, the industry is gravitating towards adaptation over restriction. Furthermore, even though we are only seeing the tip of the iceberg in terms of how AI will affect and change the security and privacy of companies, we are constantly forced to reflect on and adapt to them.

Reach out to me for sources on the specific products mentioned and discussed in this article.

A Beginner's Guide to Falling in Love with Algorithms - Part 3: Differential Privacy

Martin Edvardsen — Fri, 10 Sep 2021 10:42:46 +0000

In this final blog post, I will get quite specific, as I try to convey the specifics of my thesis project, "Privacy-preserving Similarity Search", in which I designed and analyzed an algorithm - more specifically a differentially private one. You are not meant to understand everything in this blog post. It serves an inspirational purpose through a real world example of the process of designing an algorithm. However, if you have time for it, I encourage you to pause, anytime you read something, which you are not familiar with, and learn that concept or term before continuing.

Let’s start out with an introduction to a new class of algorithms, namely randomized algorithms, which our algorithm falls under.

Randomized Algorithms

Randomized algorithms are based on a bunch of probabilistic theories, which we will not get into. True randomness is also left for another discussion. My goal is simply to demonstrate the capabilities and advantages of introducing randomness in the design of an algorithm, allowing for random decisions when processing a given input. As opposed to the deterministic algorithms we have seen so far, neither the same output nor running time can be guaranteed for the same input to a randomized algorithm.

So why do we strengthen the underlying model of our algorithm, when we introduce randomness? It differs from algorithm to algorithm. Oftentimes the strength lies in a conceptually simpler algorithm, where less internal state or memory is required, or in an alleviation of certain worst-case scenarios. Let us look at an example of the latter.

For the sake of the example, we will look into the quicksort algorithm, as the approach of the algorithm is easily conceptualized.
The algorithm is a divide and conquer algorithm, in which a pivot in a list is initially chosen - typically the first, middle, or last element. Then it divides the remaining elements into two sublists based on whether the elements are smaller (illustrated by the green array) or larger (illustrated by the blue array) than the pivot. Then The same process is then repeated for each sub array (recursively), until the entire array is sorted. This is visualized both during and after sorting for two different elements of two different lists.

In quicksort, randomization is simply introduced by choosing the pivot at random. Why is this an advantage? First, I encourage you to prove that the optimal scenario for the quicksort algorithm is one, where the algorithm repeatedly divides the array into two equal-sized subarrays.

Now, Imagine the following worst-case scenario. Consider an implementation of the algorithm, in which the first element is always chosen as the pivot, as well as an array which is sorted in decreasing order as depicted in the visualization. Given an array of size n (here n = 30), the algorithm partitions the array into one subarray of size n-1, which is the exact opposite of the optimal partition of the array.

Introducing randomness in picking the pivot, we could alleviate certain worst-case scenarios like these, which actually improves the expected or average time complexity. However, even though we improve the average-time complexity, we still cannot improve the worst-case complexity, which is still O(n²), as the random pivot could still repeatedly be chosen as the first element in the example above.
I strongly encourage you to watch the following series of animations, to visualize the algorithm as well as the randomization hereof: https://www.chrislaux.com/quicksort.html.

Differential Privacy

The goal of differential privacy is to permit statistical analysis of a database or population, without compromising the privacy of the individuals who partake. More specifically, it strives towards ensuring that the same conclusion is reached independently of the presence or absence of any individual in said population. Hereby, an individual can partake in a statistical analysis without incurring any risk of compromising the privacy of said individual.

So why did we introduce randomized algorithms? Differential privacy is not an algorithm, but a definition of privacy tailored for private data release. The algorithms, which fall under this definition, utilizes randomized algorithms, as it will provide privacy by introducing randomness.

Differential privacy exists in both a localized and a centralized manner. We will only look into centralized differential privacy, in which a centralized trusted curator ensures the privacy of the individuals by applying a differentially private algorithm to the conclusions of any statistical query on the population.

Let us look at an example of a very simple differentially private randomized algorithm to conceptualize these guarantees, and to see how randomization can help us in complying with these guarantees.

In the algorithm randomized response, participants are asked whether or not they have some property p (e.g. diabetes). They are then asked to answer using the following approach,

Flip a coin.

If tails, then respond truthfully.

If heads, then flip a second coin and respond “yes”, if heads, or “no”, if tails.

As Dwork and Roth ([16], p. 30) states, the privacy of this algorithm originates from the randomness and the plausible deniability of any response, as an answer of “yes”, can be either truthful, or the result of two flips of heads, with a probability of ¼, independently of whether the participant has property p or not. Try to convince yourself, that this, as well as the opposite case (i.e. "no"), is true.

It becomes increasingly harder to guarantee the same level of privacy, as the number of participants decrease, seeing as the dependency of each individual on the conclusion of a statistical analysis inherently increases. Intuitively, and in a simplified manner, one would then imagine that the smaller the population is, the more noise should be added to any statistical query over said population, to ensure the same level of privacy.

To determine the amount of noise needed to maintain the same level of privacy, we introduce an important parameter in differential privacy. The parameter epsilon, ε, captures the probability of reaching the same conclusion, where a smaller ε would yield better privacy, but less accurate conclusions. This parameter is called privacy loss and is interesting, as it makes privacy a quantifiable measure.
An algorithm is said to be ε-differentially private, if for any pair of datasets that differ by a single element or individual, the probability of reaching the same conclusion on the two datasets differs by no more than e^ε.

Let us end our introduction of differential privacy by looking at some of its many qualitative properties.
First and most importantly, it enables a quantification of privacy, by the privacy loss parameter, ε, which is notable, as privacy is often considered as a binary concept.
Second, it neutralizes any linkage attacks based on past, present, or even future data. In a linkage attack, one tries to de-anonymize individuals by comparing different datasets, most recently known from the Netflix Prize, https://en.wikipedia.org/wiki/Netflix_Prize.
Third and lastly, it enables composition of the previously mentioned privacy loss. This means that we are able to extend our quantification of privacy to the privacy loss of multiple queries in a cumulative manner.

Given this very brief introduction to randomized algorithms and differential privacy, we will dive right into the design and analysis of the differentially private algorithm, which I developed in my thesis project.

Privacy-preserving Similarity Search

We have now finally reached our real world example of the design and, to some extent analysis, of an algorithm. Let's get into our differentially private algorithm. This chapter is very technical and uses theory, which is not previously introduced. I hope it will still give you a valuable perspective on what is possible in the world of algorithms.

In my thesis project, entitled "Privacy-preserving Similarity Search", my thesis partner and I proposed and implemented a private data release mechanism (an algorithm), for applying differential privacy in a similarity search setting. We then evaluated the quality of the mechanism for nearest neighbor search on private datasets by comparing it with another mechanism developed by Dhaliwal et al. (https://arxiv.org/pdf/1901.09858.pdf).
An understanding of similarity search, and nearest neighbor search in particular, is not necessary, as it merely served as an exemplary domain for our mechanism.

We will only go into the details with the design of the algorithm, as the complexity of the analysis exceeds the purpose of this blogpost. However, in our report, which I will gladly email to anyone interested, we do prove that our algorithm is (ε, δ)-differentially private, that the mechanism is an unbiased estimator maintaining distances in expectation, and present guarantees for the precision of the estimation by means of Chebyshev bounds.

Design considerations

Let’s get into the design and implementation of the algorithm. Differential privacy algorithms usually consist of two stages - first the original data is transformed by reducing the dimensionality, then, a certain amount of noise is added to make the data private. In a centralized setting, where a trusted curator handles queries on the population, the steps can be depicted as seen in the visualization.

Before introducing the algorithm, I will introduce one final piece of theory, which we use in the first stage of our algorithm.

We want to reduce the dimensionality of our data. Let’s imagine that our data consists of a matrix, where each row represents a participant and each column represents some answer or characteristic of said participant.
To transform said data, we adapted the SimHash algorithm (https://www.cs.princeton.edu/courses/archive/spr04/cos598B/bib/CharikarEstim.pdf). Our main reason for choosing this algorithm is its ability to preserve distances (in terms of similarity search domain) despite projecting the data into a binary format, which makes it less spatially expensive.
The algorithm falls under the Locality Sensitive Hashing framework, which is based on the principle that if two data points are close, based on some similarity function, then their respective hash values are also close. This means, as opposed to conventional hashing, that collisions for similar items are maximized, which is certainly favorable in a similarity search setting.

The algorithm is based upon a collection of random d-dimensional vectors, r in R^d, where each coordinate is drawn from a Gaussian distribution. Given a random vector, r, and some vector, u, of the original dataset, the hash functions of the algorithm is defined as follows,

Therefore, using the dot product between the random vector, r, and the vector, u, from the original dataset, it is encoded whether u falls on one side (1) or the other (0) of the random vector, r. By increasing the number of hash functions we apply to the vector, we can decrease the probability of collision of hash values of dissimilar vectors. Again, this got really technical, but try to convince yourself of this using the illustration below, where we have applied two hash functions, based on two random vectors, r1 and r2, to three different vectors of the original dataset for comparison.

Designing the algorithm

It is finally time to introduce the algorithm. I will start by formally introducing our SimRand algorithm, thereafter I will explain the algorithm, step by step.

First, let's go through the input and output of the algorithm. The input of the algorithm consists of an original dataset X, a matrix with n rows and d columns, a target dimensionality k for our dimensionality reduction, epsilon and delta values representing the privacy loss, and finally a alpha value representing an upper bound on the angle between two neighboring vectors (or databases) - neighboring databases will be defined in terms of the sensitivity of a query shortly). The output of the algorithm is a differentially private dataset Z.

Let’s go through the algorithm step by step. We will use an example dataset consisting of a single vector to explain the steps of the algorithm.

In the first step, we create projection matrix, R, which is effectively used to reduce the dimensionality (i.e. the number of columns) of our original dataset from d to k, in our case from 8 to 6 columns.

In the second step, we reduce the dimensionality of our original dataset, creating a new matrix, Y, by matrix multiplication between the projection matrix, R, and the original matrix, X, which in our case is simplified to a vector (i.e. one row). Effectively, each coordinate of Y represents the dot product between the vector of X and the columns of R. Our new reduced matrix Y is now a vector of 6 columns.

In the third step, we effectively transform the matrix, Y, into a binary matrix, ^Y, based on the sign of the dot product between each coordinate of the matrices X and R in Y. Each coordinate is represented by a 1 if the angle between the vectors is less than 90 degrees.

We have now reduced the dimensionality of the original dataset and projected it into a binary format. Now, we need to add noise to make the data private. But how much noise should we add? As discussed early in our introduction, more noise means more privacy, but also lower accuracy.

It turns out that the amount of noise added, depends on the sensitivity of the function or query, which we want to apply to our dataset. The sensitivity of a function represents the largest possible difference that one row can have on the result of that function or query, for any dataset, which by definition of differential privacy is exactly what we are trying to conceal. The sensitivity of a query, which simply counts the number of rows, or participants, in a dataset has a sensitivity of 1, as adding or removing a single row from any dataset will change the count by at most 1. It turns out, that the sensitivity also defines neighboring database - in our example if two databases differ in no more than one row.

In our fourth step, we calculate the sensitivity K, i.e. the maximum number of places, where two vectors differ, which intuitively can be no larger than the dimensionality of the two vectors.

In our fifth step, using K, we can now calculate the probability p that two coordinates collide. This will also serve as the probability that we will flip a bit of our projected matrix, ^Y, which will effectively conceal the sensitivity K, as the probability is based upon it.

In our sixth step, we create a matrix delta (represented by a uppercase delta), which consists of numbers drawn uniformly from [0, 1).

In our seventh step, we create another matrix delta^, where each coordinate is 1 an all places, where delta < p, which effectively means, that it is 1 with a probability of p.

The bitwise XOR operator (exclusive or) only works on bit patterns of equal length and results in 1 if only one of the compared bits are 1, but will be 0 if both are 0 or both are 1.

In our eighth step, we can produce the matrix Z, in which we ‘exclusive or’ our projected matrix ^Y with our new matrix delta^, which effectively flips the bits of our projected matrix with probability p.

Finally, we can output our differentially private matrix, or vector in this case, Z.

This was a very technical, but also very minimalistic, introduction to the design of a differentially private algorithm. If you would like to read my entire thesis, you can reach out to me at mart.edva@gmail.com.

Finally, I hope that I’ve inspired some of you to pursue the algorithmic path throughout your education, which truly was my humble goal for this series of blog posts. I look forward to your feedback and a lively discussion in the comments!

A Beginner's Guide to Falling in Love with Algorithms - Part 2: Algorithm Design

Martin Edvardsen — Fri, 03 Sep 2021 10:46:15 +0000

With a backpack filled with the fundamentals of algorithmic complexity, we are ready to dive into the topic of this blogpost - algorithm design. Furthermore, using our newly acquired knowledge, we are able to analyze the computational tractability of these algorithms.

An algorithm that runs in polynomial time for an input size n is said to be tractable. If you are unfamiliar with polynomial time, I suggest you pause and read up on this before continuing. We categorize problems, which are solvable in polynomial time, as P problems. Conversely, an algorithm that runs in exponential time or worse for an input size n is said to be intractable. We categorize problems, for which we have not yet found a tractable polynomial time algorithm, as NP problems, i.e. non-polynomial meaning they are not solvable in polynomial time. We will not be covering NP problems in this particular series of posts.

Algorithm design consists of two parts, the design and analysis of an algorithm. In the design of the algorithm, we categorize the problem in accordance with the family of algorithms, for which the optimal algorithm for solving said problem is to be found. Then, by inspiration of algorithms from this family, we design an algorithm, which solves the problem at hand.

In the analysis of the algorithm, we start by determining the running time of the algorithm, with which we can determine the computational tractability of the algorithm, to quickly discard an intractable algorithm. Then, we determine the correctness of the algorithm, which is asserted by proving that it always produces the optimal and correct solution to the problem.

Let's see some examples of both the design and analysis of some algorithms. Through these examples, we will learn about three of said families of algorithms, albeit many more exist for you to discover.

Greedy Algorithms

An algorithm is known as greedy, if it builds up a solution in small steps, by choosing the locally optimal solution at each step, based on some criterion, towards a solution. Often, this success criterion is based on the structure of the problem, which we will see in the forthcoming example.

For the analysis of greedy algorithms, a common approach is to establish that the greedy algorithm stays ahead. In other words, at each step, one establishes that no other algorithm can do better, until it finally produces the optimal solution.

Interval Scheduling - Designing a Greedy Algorithm

For this algorithm, we will prove its correctness by establishing that the greedy algorithm stays ahead. Now let’s get into a problem, for which we will go through all the steps in the design of an algorithm that solves this problem. We start by presenting the problem:

We have a set of appointments, {1, 2,..., n}, where the i^th appointment corresponds to an interval spanning from a starting time, call it si, to a finishing time, call it fi. We say that a subset of the intervals (a collection of said intervals) are compatible , if no two of the intervals overlap in time. Our goal is to accept as large a compatible subset as possible.

In other words, an optimal solution to our problem is one that always picks the largest number of appointments, for which none of them overlap in their starting and finishing time. Give the definition another glance, if you are not completely sure yet, which problem we are trying to solve.

Designing the algorithm

Finally, we are ready to design an algorithm. The process of doing so is often governed by the definition of a set of rules, which ultimately defines the algorithm. Therefore, it is often very helpful to start out by specifying and concretizing these, before getting into the specifics of the algorithm.

In our algorithm for interval scheduling, naturally, we need to start by selecting our first interval, call it i1, and discarding all intervals, which are not compatible with said interval i1 (remember our definition of compatible intervals). We then continue in this manner until we run out of intervals, which leaves us with a solution. This must mean that the optimality of said solution is governed by the rule, which we use to select said intervals. Great, so using our definition of the problem, we brainstorm ideas for a definition of a rule for selecting the "right" intervals, which will lead to the optimal solution.

An obvious contender for a rule, which comes to mind, is to select the interval that starts the earliest, i.e. the one with minimal starting time, si, which is still available, hereby using up intervals as soon as possible. This rule does not yield the optimal solution, and I encourage you to figure out why before moving on. In fact, I encourage you to do so at every step of our brainstorm.
We depict a counterexample, where the earliest interval spans multiple intervals and also has the latest finishing time, in which case more intervals could optimally be satisfied.

Another rule, which then comes to mind, is to select the smallest interval, i.e. where fi - si is the smallest. Again, this rule does not yield the optimal solution. We depict a counterexample, where picking the smallest interval satisfies a suboptimal amount of intervals.

Before presenting the rule, which does yield the optimal solution, I encourage you to stop reading now, and to come up with other alternatives.
It turns out that the optimal solution in fact relies on the finishing time. Choosing the interval, which finishes first, i.e. the one with the minimal finishing time, fi, will always yield the optimal solution. Intuitively, this makes sense, as we free up our resources as soon as possible. Again, I encourage you to convince yourself of the validity of this. If it helps you, you can use the below visualization for this approach.

With this in mind, we are ready to state our algorithm. We do so in pseudocode, which uses structural conventions common in most programming languages, but is otherwise a plain English description of the steps of the algorithm (Hint: Everything between the keywords While and EndWhile is repeated until the statement after the keyword While is true).

Let I be the set of all intervals.
Let A be the set of accepted intervals, which is initially empty
While I is not empty
    Choose interval i from I with the smallest finishing time
    Add interval i to A
    Remove all intervals not compatible with i from I
EndWhile
Return the set of all accepted intervals, A.

Analyzing the algorithm

In an analysis of an algorithm, one proves the correctness of the algorithm and determines the running time of said algorithm. We start with the proof of correctness.

First, by the steps of the algorithm, we can declare that the intervals of A are compatible, as we discard all non-compatible intervals, when we add an interval to A. Second, we must show that the solution is optimal, which is proven by induction. We will not get into the proof, but I will quickly describe the general idea. Let's consider theta, which represents an optimal solution, i.e. the set with the maximum amount of compatible intervals for a given collection of intervals. As many optimal solutions may exist, i.e. multiple combinations of compatible intervals, which would be of maximum size for a given collection, we should not prove that our solution A is equivalent to theta, i.e. A = theta. We should rather simply prove that the size of both sets are equal, i.e. |A| = |theta|. We do so by proving, that for all indices the following holds for the finishing times,

f(ir) <= f(jr)

where ir and jr are the r^th accepted intervals of A and theta, respectively. The proof by induction start by realizing that

f(i1) <= f(j1)

by the greedy rule of our algorithm, which states that we start by picking the interval with the earliest finishing time. That was quite a lot of theory and proofs of correctness.

We now move on to the running time. We can show, that our algorithm runs in time O(n log n), which we learned is linearithmic time in the previous blog post (part 1). Let us learn why this is the case by considering an implementation of said algorithm. This will get quite technical, but I merely want to give you an idea of the process.

We start by sorting all intervals by their finishing time, fi, starting from the earliest, which takes O(n log n) time. Then, we construct an array S, which contains the value si of interval i at position S[i] in the array, which takes O(n) time.

Then, using our sorted list, we select intervals in order of increasing finishing time, fi, which starts by choosing the first interval. Then, we iterate the intervals until reaching the first interval, call it j, where sj >= fi, i.e. which starts at or after the finishing time of the selected interval. We find the starting time of the iterated intervals using the array S. We continue iterating subsequent intervals, until no more intervals exist, where sj >= fi. This process takes O(n) time, which leaves us with a total running time of O(n log(n)) + O(n) + O(n) = O(n log(n)).

Dynamic Programming

In a way, this family of algorithms represents the opposition to greedy algorithms, as it entails exploring all possible solutions. In dynamic programming, we solve a problem by decomposing it into a set of smaller subproblems, and then incrementally composing the solutions to these into solutions to larger problems. Importantly, to avoid computing the result of the same subproblem multiple times, we store or memoize results in order to avoid these unnecessary computations. Note, that another technique called tabulation also exists, which will not be covered.

Before getting into dynamic programming, it is essential that we understand the concept of recursion, as dynamic programming is simply somewhat of an optimization of this using memoization.

Recursion means to define a problem in terms of itself, which in programming relates to the process of a function calling itself. In such a process, the solution to one or more base cases are defined, from which the solution to larger problems are recursively defined. This might seem as pure gibberish, but fear not, we will learn by example momentarily. It is important to note that an iterative approach to dynamic programming also exists, which will not be covered.

Fibonacci Sequence

Let’s define the Fibonacci sequence, which will serve as our example problem for a recursive algorithm:

A series of numbers, where each number is the sum of the two preceding numbers, i.e. 1, 1, 2, 3, 5, 8, 13, 21...

Based on this definition, we define a recursive algorithm for finding the n^th Fibonacci number.

def fib(n)
    if n == 1 or n == 2 # (1) base case
        return 1
    else 
        return fib(n - 2) + fib(n - 1) # (2) recursive step

In figure XX, we’ve build a tree, which represent the recursive function calls of the above algorithm, when looking for the 4^th Fibonacci number, i.e. fib(4). Using this tree, we’ll go through one of the branches of the recursive algorithm.

Initially, we call the function with n = 4, which means we go down the else-branch of our algorithm, in which we return the addition of the result of two new calls to the same function, i.e. our recursive step (step 2). To figure out the result of these two function calls, we look at each function call individually, which is represented by the second layer of our tree.

Let’s focus at the right side of the tree, represented by fib(3). We see the same behaviour, which results in a third layer. Looking at the two function calls in the third layer, we notice that both of these go down the if-branch of our algorithm, i.e. our base case for which we return 1.

We now know the solution to fib(1) and fib(2), which we can return to the function call in layer 2. This in turn means that we return the solution to fib(3) = fib(1) + fib(2) = 2, i.e. the solutions to the smallest problems, at the bottom of the tree, cascade to the top, giving us the solution to our initial function call.

Designing the algorithm

If we look at the recursion tree of the previous implementation, we clearly see that the result of the same subproblem is calculated multiple times, which will results in an exponential-time algorithm. To exemplify the power of memoizing the result (caching), we will now see, how we can reduce a exponential-time algorithm, for finding the n^th Fibonacci number, to a linear-time one.

Now, instead of computing the same result multiple times, we design an algorithm, in which we memoize the result of previous subproblems, which we can then reuse. A dynamic programming algorithm is designed in three steps:

Define the subproblems in terms of the input.
Identify the base case(s).
Define the recursive step.

Let’s use these steps to define our algorithm. First, the subproblems comes naturally from the definition of the Fibonacci sequence, as the n^th Fibonacci number is based on the two preceding numbers.

Second, to make the Fibonacci sequence work, we need to define the first two numbers, as the n^th Fibonacci sequence is based on the two preceding numbers - we define fib(1) = 1 and fib(2) = 1.

Finally, based on the definition of our subproblems, our recursive step is defined by the decomposition of the n^th Fibonacci number, fib(n), i.e. fib(n) = fib(n - 2) + fib(n - 1), which we memoize to avoid computing the same result multiple times.

We now define our algorithm based on our initial design process:

cache = [] # (1) cache for memoization

def fib(n)      
    if n in cache:
        return cache[n] # (2) reuse old memoized/cached computation
    else:
        if n = 1 or n = 2: # (3) base case
            cache[n] = 1 # (4.1) memoize result
        else:
            cache[n] = fib(n - 2) + fib(n - 1) # (4.2) memoize result

    return cache[n]

Analyzing the algorithm

Using this approach, we never recompute any result in our recursion tree, but simply reuse the memoized result from the cache (step 2). As we memoize any result, which is not present in the cache (step 4), we do O(n) computations, which intuitively gives us a linear-time algorithm.

Divide and Conquer

We move on to the final family of algorithms for this blogpost, which closely resembles those of dynamic programming with some key differences. Despite being a well-known term, divide and conquer algorithms can be quite tricky to grasp, as they are, like most dynamic programming algorithms, also based heavily on recursion.

The divide and conquer family comprises a set of algorithms, in which the input is divided up into smaller sets of subproblems, each subproblem is conquered or solved recursively, and finally the solutions are combined into an overall solution.

The distinction between dynamic programming and divide and conquer is very subtle, as dynamic programming can be seen as an extension of divide and conquer. Both approaches work by solving subproblems recursively and combining these into a solution.
Dynamic programming algorithms exhaustively check all possible solutions, but importantly only solve the same subproblem once. The solution to a subproblem can then be reused in other overlapping subproblems using memoization or tabulation - the extension of the divide and conquer technique.
Divide and conquer algorithms do not reuse the solution of subproblems in other subproblems, as the subproblems are independent given the initial non-overlapping division of the problem.

Mergesort

The most famous subfamily of divide and conquer algorithms is definitely the sorting algorithms. In our example, we will present the mergesort algorithm, in which we will focus on the design of the algorithm. The analysis of divide and conquer algorithms include recurrence relations, which is outside the scope of our analysis.

Designing the algorithm

We wish to design an algorithm, which sorts a list A. The design of divide and conquer algorithms consists of defining three steps:

Divide the problem into subproblems
Conquer each subproblem by solving them recursively
Combine the result of each subproblem into a final solution

First, we define our divide step. Given an index m, representing the middle of the list, we divide our list into two smaller, equal-sized sublists. Second, in our conquer step, we recursively call the mergesort algorithm for our two new sublists. Finally, we merge the two sublists. Let’s see an implementation of this algorithm, which we can use as a reference.

def mergesort(l):
    if len(l) == 1:
        return l # base case
    mid = len(l) // 2
    left = l[:mid] # divide
    right = l[mid:] # divide
    a, b = mergesort(left), mergesort(right) # conquer
    return merge(a, b) # combine

def merge(a, b): # combine function
    l = list()
    i, j = 0, 0
    while i < len(a) and j < len(b): # 1. still elements in both lists
        if a[i] < b[j]:
            l.append(a[i])
            i += 1
        else:
            l.append(b[j])
            j += 1

    while i < len(a): # 2. still elements in a
        l.append(a[i])
        i += 1
    while j < len(b): # 3. still elements in b
        l.append(b[j])
        j += 1

    return l

Just like in dynamic programming, we continue in a recursive manner, until we reach our base case. In mergesort our base case is a list of length one that we cannot divide any further.

As we hit the base case, our two recursive mergesort() functions (line 7, conquer step) can now return two sorted lists, as they contain only one element. We will call these two lists a and b. Then, we pass these two lists to our merge() function, to merge the two lists (line 8).

Let’s visualize this with an example. First we divide the list by calling the mergesort() algorithm recursively with the two halves of the list, until we reach our base case. We focus on the left branch, as the process is similar for both branches.

We’ve reached the base case for our recursion. Given our base case, in which the two lists contain only one element, we can now return two sorted lists to the previous recursive mergesort() function call, setting the two lists, a and b.

a, b = mergesort(left), mergesort(right)

Given these, we can call our merge() function, to sort the two lists (we will get into this function after the example).

We can now return this list to the recursive function above (or before) this function, which we will use to merge the list [27,38], with the sorted list from the other branch [21, 42], which eventually will give us our final merged, sorted list [21, 27, 38, 42].

In our merge step, we need to merge our two lists, which importantly are already sorted, into a combined, sorted list. We use two pointers, i for the list a and j for the list b, which helps us keep track of our progress in the two lists, in terms of adding elements to the new list.

In our merge step we proceed based on three different rules, which defines our algorithm, until all elements of lists a and b are present in the combined list.

If there are still elements left to be added from both lists, i.e. i < len(a) and j < len(b).

We check, which of the two elements at the index of our pointers are the smallest, and add that element to the combined list, as per this step of the merge function,

while i < len(a) and j < len(b): # 1. still elements in both lists
        if a[i] < b[j]:
            l.append(a[i])
            i += 1
        else:
            l.append(b[j])
            j += 1

Visualised as follows,

Where a[i] < b[j] is true, which is why we add 1 as the first element to the combined list.

If there are only elements left to be added for list a, in which case j < len(b) is no longer true, as j > len(b).

In this case we can simply add the remaining elements of list a to the combined list, as we know, that no element is smaller in b, given that b is empty.

while i < len(a): # 2. still elements in a
    l.append(a[i])
    i += 1

This is visualised as follows,

In which case we can simply add the elements 10 and 12 to the combined list.

If there are only elements left to be added for list b, in which case i < len(a) is no longer true, as i > len(a).

The process is equivalent to the latter, which is represented with the following code.

while j < len(b): # 3. still elements in b
    l.append(b[j])
    j += 1

When none of these three rules apply, our combined list is complete, and we can return it.

Analyzing the algorithm

We will not get into much detail in this analysis, but simply realize why this is a linearithmic algorithm, and why it is superior to the most intuitive way of sorting a set of numbers. Let’s look at an example of this for comparison.

Most people actually use an algorithm called selection sort, when they are to sort a set of playing cards. Given an list of numbers, a, of size n, where each number represents the value of each card in your hand, the algorithm works as follows,

Iterate over the list a[1] to a[n], and for each element a[i],

Compare the element to its predecessors, i.e. elements in the range a[1] to a[i-1].

Move all elements greater than a[i] up one position, essentially moving them to the right of a[i].

We give an example of a step using this algorithm, where n = 8, i = 6, and a[i] = 5, for which two elements are to be moved, to correctly position the element a[i].

If we look at a couple of steps of this algorithm,

It is clear, that at each step, we compare the element a[i] with one more element, until we finally compare the algorithm with all but one element, i.e. n - 1 elements. This means, that in average, we compare each element with n/2 elements, which gives us a total of n * n/2 comparisons. In asymptotic time, this means O(n²) comparisons, i.e. a quadratic running time. From the first blog post, we know that this is not a favorable running time.

Let’s return to mergesort and realize why this is a linearithmic time algorithm, i.e. superior to an algorithm of quadratic time (recall using https://www.bigocheatsheet.com/).

First, recall that x in log2(n) = x, represents the number of times we can half our input size. Therefore, we can conclude, that the divide step of our algorithm divides our list into n single-element lists in O(log(n)) time.

Second, we convince ourselves, that recombining the list in the merge step of our algorithm takes O(n) time. Now, recall that our mergesort algorithm simply iterates over the two lists to combine them. In the worst case, the two lists are of length n/2, in which case it takes a total of O(n) time to iterate and combine the two lists.

Now, as we divided the initial list a total of O(log(n)) times, we must recursively merge that many lists to reach the final, sorted list. Therefore, the overall running time must be O(n * log(n)). Again, this might not be very intuitive, but in reality, the process of the analysis should be the real takeaway here.

What's to come?

We are still missing a very important family of algorithms, which we briefly mentioned in the start of this blogpost - NP algorithms. These are however deemed outside the scope of this blogpost, as it is simply not possible to cover this subject in less than a full blog post - we will leave this for another blog post.

The next and final blog post will go into depth with a specific family of algorithms - randomized algorithms. However, the focus of the blog post will be based on differential privacy, which utilizes randomized algorithms, and my thesis about this topic. In differential privacy one attempts to publicly share information about a population, without compromising the privacy of the individuals of said population. Tune in for the third and final blog post of this series for more on this.

References

https://jeffe.cs.illinois.edu/teaching/algorithms/book/04-greedy.pdf
Algorithm Design, Kleinberg and Tardos

A Beginner's Guide to Falling in Love with Algorithms - Part 1: Introduction

Martin Edvardsen — Fri, 27 Aug 2021 11:04:31 +0000

The purpose of this series of blog posts is to demystify and conceptualize algorithms - more specifically the design and analysis hereof. These blog posts are targeted for software development students early in their studies, as well as students from similar lines of study.

However, I will also continuously relate the concepts to examples in laymans terms, to include people with no related background, but merely with a desire to learn more about algorithms. A certain level of mathematical understanding is however a prerequisite.

Some would argue that an understanding of algorithms is not needed to become a programmer. An understanding of algorithms will separate you from these programmers, by allowing you to quantify and compare different approaches to a problem. It will give you a broader vocabulary, or even language, with which you can rightfully reason for the superiority of a solution, or algorithm, for a given problem.

Most importantly, in terms of everyday programming, it will give you a greater understanding of data structures, which are an essential part of writing efficient code. Are you performing lookups in an array of reference type objects? Convert it to a data structure, which supports constant lookups, such as hash tables, instead of a linear scan. Are you using an array-based list for FIFO operations? Array-based lists mean linear time pop operations. Convert it to a FIFO-based data structure, such as a queue, which has constant time append and pop.
Examples like these show you why you need to familiarize yourself with data structures. Do so using what you learn from these blog posts, and you should see a dramatic improvement in the efficiency of your code.

This specific blog post will also give you the needed prerequisites for the coming blog posts, in which we will get into the actual design of algorithms. Let's get to it!

Algorithmic complexity

Complexity is the grounds upon which we discuss algorithms, namely in terms of space and time. Intuitively, we often determine the running time of an algorithm based on the input size. We will get back to this, as we familiarize ourselves with different running times. First off, we need to get acquainted with asymptotic notation. It gives us a way of expressing the growth rate of a function, or algorithm, in which we focus on the important terms of the function.

Take a simple quadratic equation, (an² + bn + c), i.e.

15n² + 10n + 5

in which n defines the size of the input. The running time of this function is bounded by the fastest growing term of this function, which is n². As the input size n grows, where a > 0, the term n² will eventually always exceed the size of the other terms and constants, i.e. bn + c in this case.

Using asymptotic notation, we express this by removing coefficients and inferior terms, which leaves us with an approximation of the growth rate of an algorithm. In asymptotic notation, we refer to O(⋅), 'big O'-notation, which expresses an upper bound on an algorithm, i.e. the worst case scenario, and Ω(⋅), 'big theta'-notation, which expresses a lower bound on an algorithm, i.e. the best case scenarios. In this series of blog posts, we will only focus on the upper bound.

Asymptotic running times

Again, let's talk about prerequisites. Complexity refers to both running time and space consumption, however, we will focus on running times, as it is more approachable and intuitive than space complexity.

The examples of this post will be based on both a list of whole numbers, i.e. an integer array, with a thousand elements and a group of people, for different levels of complexity. Furthermore, for the sake of simplicity and understanding, we assume that one operation takes one second. For our group of people, a question to a member of the group could correspond to an operation. Finally, we will refer to the below graph from http://www.bigocheatsheet.com for a visual comparison of the running times.

O(1). Constant Time

This is referred to as constant time, as it requires a constant amount of operations. The bigocheatsheet graph refers to this as an 'excellent' running time. Furthermore, an important point here is that the algorithm is independent of the input size, which means the running time does not increase, as the input increases.

An example of a constant time algorithm is finding the value of an element of our integer array. Given an integer array with a thousand elements, accessing a single element at a certain index is constant, as it requires a single lookup, which corresponds to a single second.

This example corresponds to finding the age of a specific person from a group, regardless of the size, which only requires a single question to that very person. Simple, right? Moving on.

O(n). Linear Time

This is referred to as linear time, as it is linearly dependent on the size of the input. The bigocheatsheet graph refers to this as a 'fair' running time. As the size of your input increases, so does your number of operations, and therefore your running time. However, remember that due to our 'big O'-notation, the running time of a linear time algorithm increases at most at the same rate, as the number of elements of the input - for simplicity our example will use every element of the input.

An example of a linear time algorithm is finding the minimum value of an unsorted integer array. Given an integer array with a thousand elements, finding the minimum value requires iterating through the entire list, which requires n operations, which in our case corresponds to a thousand seconds. A dramatic increase in running time, compared to constant time, O(1), with no increase in the input size.

This example corresponds to finding the youngest person in a group of people, which requires asking every single person of said group, before you are able to conclude, which person is the youngest.

O(n²). Quadratic Time and O(n log(n)). Linearithmic Time

We now increase the theoretical complexity considerably. We assess two running times, which are often referred to in job interviews in combination - quadratic (polynomial) time and linearithmic time algorithms, respectively. Interviewers will often pose problems, which seem to have an obvious, intuitive solution in quadratic time, but are often solvable in at least linearithmic time - a dramatic improvement in running time.

With O(n²), we look at each element a constant amount of times, for each other element, at most, which is a considerable increase in operations, compared to linear time, where we only look at each element a constant amount of times. This is a dramatic increase in running time, which is also apparent in the bigocheatsheet graph, where we've moved from the yellow (fair), into the red (horrible). In a job interview, such a running time should be your cue to look for a better solution.

An example of a quadratic time algorithm is finding all pairs in our integer array. Given an integer array of a thousand elements, one would find all pairs by iterating over the entire collection of size n once for each of the n elements. This corresponds to n² operations, which in the case of the integer array corresponds to a million seconds. A nested loop, in which both the inner and outer loop iterates the same list, is a classic example of a quadratic time algorithm.

This example translates directly to our group of people, in which we try to find all pairs of people. It should be noted that pairing up with oneself would not be a valid pair, but it is irrelevant for the matter of the example.

For O(n log n), first we note, that we typically refer to log2(n), as we discuss logarithms in algorithms. The graph places this between linear and quadratic time, at a 'bad' running time. If needed, please do refresh your memory on logarithms - the kryptonite of exponentials - before continuing with the next blogpost. The most common linearithmic time algorithms are definitely sorting algorithms. Replacing a quadratic time sorting algorithm with one of linearithmic time is an example of said improvement of running time.

An example of a linearithmic time algorithm is mergesort. In short, given our integer array with a thousand elements, one would compare pairwise consecutive integers

((e1, e2), (e3, e4), (e5, e6) ...)

and then merge exponentially larger sets of integers until finally combining the two halves of the array. Don't worry about the details of the algorithm - we'll get into the algorithm in the next blogpost.

This algorithm and example would also be applicable for sorting a group of people by age, but quite hard to explain in layman's terms. For a more hands-on example, however, I strongly recommend this clip, https://www.youtube.com/watch?v=XaqR3G_NVoo, in which Sapientia University visualizes the mergesort algorithm by means of Transylvanian-saxon folk dance. If nothing, just do it for the laugh of it.

The important thing to notice here is the dramatic improvement in running time. A quadratic time algorithm would sort an array in n² operations, i.e. a million operations or seconds in our case; the linearithmic algorithm only requires n log n operations, i.e. ten thousand operations or seconds. This is a reduction from eleven days to two hours!

O(log(n)) Logarithmic Time

We refer to this as logarithmic time. Returning to the bigocheatsheet graph, we finish back where we started, an 'excellent' running time, even though the running time is dependent on the input size, n.

An example of a logarithmic time algorithm is binary search for ordered collections. Given our integer array, you search for an integer i of the array, by continuously looking at the element at the center of the array, and doing one of three things. If the value of the middle element m:

Is the integer i, we have found the integer.
Is greater than the integer i, we repeat the process for the lower half of the array.
Is less than the integer i, we repeat the process for the upper half of the array.

It should be noted, that other strategies exist for picking the pivot other than choosing the middle element. This is irrelevant for the purpose of this example. It is important to notice, that as a property of logarithms, we need to do this no more than log2(n) times, as this would leave us with a single element. To figure out, why this is true, try to prove to yourself, that x in log2(n) = x, represents the number of times, we can half our input size, and that n in 2^x = n, represents the size of an array you get by doubling x times.

This example corresponds to having a thousand people line up in order of their salary, and looking for the person closest to a ridiculously specific yearly salary, such as 359344,297 DKK.

By following the three steps above, you only need to ask a maximum of log2(n) + 1 people, in this case eleven, to come to a conclusion. Note, that the last question (i.e. the + 1) comes from asking the last person to compare him with his "neighbours".

Again, the important thing to notice is how logarithms scale. If we increase the number of people to a million people, the amount of needed questions only increases to twenty. This explains, why it is depicted on top of constant time algorithms in the bigocheatsheet graph, as it almost seems to be independent of the input size, n.

What's to come?

With the basics down, we will get into the design and analysis of algorithms in the next two blog posts. We will do so by becoming acquainted with three families of algorithms - greedy algorithms, dynamic programming algorithms, and divide and conquer algorithms. This will lead us to the final blog post, which will discuss the subject of my own master thesis, which is based on randomized algorithms - more specifically differential privacy.

You made it this far. By now, you should already feel more comfortable discussing algorithms and data structures in future projects. We've covered the basics, which means, you are more than qualified for reading the rest of the posts. I hope you do!

References

https://www.youtube.com/watch?v=XaqR3G_NVoo
https://www.bigocheatsheet.com/
Algorithm Design, Kleinberg and Tardos
Coding Blocks, episode 'What is Algorithmic Complexity?'

DEV Community: Martin Edvardsen

What Scares Companies in 2025?

Shadow IT becomes Shadow AI

Micro-segmented Access Control

Social engineering still takes first place

A Beginner's Guide to Falling in Love with Algorithms - Part 3: Differential Privacy

Randomized Algorithms

Differential Privacy

Privacy-preserving Similarity Search

Design considerations

Designing the algorithm

A Beginner's Guide to Falling in Love with Algorithms - Part 2: Algorithm Design

Greedy Algorithms

Interval Scheduling - Designing a Greedy Algorithm

Designing the algorithm

Analyzing the algorithm

Dynamic Programming

Fibonacci Sequence

Designing the algorithm

Analyzing the algorithm

Divide and Conquer

Mergesort

Designing the algorithm

Analyzing the algorithm

What's to come?

References

A Beginner's Guide to Falling in Love with Algorithms - Part 1: Introduction

Algorithmic complexity

Asymptotic running times

O(1). Constant Time

O(n). Linear Time

O(n2). Quadratic Time and O(n log(n)). Linearithmic Time

O(log(n)) Logarithmic Time

What's to come?

O(n²). Quadratic Time and O(n log(n)). Linearithmic Time