Demystifying Dynamic Programming
Alaina Kafkes ใป15 min read
Maybe youโve heard about it in preparing for coding interviews. Maybe youโve struggled through it in an algorithms course. Maybe youโre trying to learn how to code on your own, and were told somewhere along the way that itโs important to understand dynamic programming. Using dynamic programming (DP) to write algorithms is as essential as it is feared.
And who can blame those who shrink away from it? Dynamic programming seems intimidating because it is ill-taught. Many tutorials focus on the outcomeรขโฌล โรขโฌล explaining the algorithmรขโฌล โรขโฌล instead of the processรขโฌล โรขโฌล finding the algorithmรขโฌล โรขโฌล which encourages memorization, not understanding.
Not this one.
During my algorithms class this year, I pieced together my own process for solving problems that require dynamic programming. Parts of it come from my algorithms professor (to whom much credit is due!), and parts from my own dissection of dynamic programming algorithms.
In this post, I'll teach you my process for creating dynamic programming algorithms. But before I share this process, letโs start with the basics. What is dynamic programming, anyway?
DP Defined
Dynamic programming amounts to breaking down an optimization problem into simpler sub-problems, and storing the solution to each sub-problem so that each sub-problem is only solved once.
๊ฐ๋ ์กด์@garettjonesBellman, inventor of dynamic programming, had to hide the fact he was inventing it from the Secretary of Defense:โฆ twitter.com/i/web/status/8โฆ13:01 PM - 26 May 2017
To be honest, this definition may not make total sense until you see an example of a sub-problem. Thatโs okayรขโฌล โรขโฌล itโs coming up in the next section.
What I hope to convey is that DP is a useful technique because it looks through all possible sub-problems and never recomputes the solution to any sub-problem. This guarantees correctness and efficiency, which cannot be said of most techniques used to solve or approximate algorithms. This alone makes DP special.
In the next two sections, Iโll explain what a sub-problem is, and then motivate why storing solutionsรขโฌล โรขโฌล a technique known as memoizationรขโฌล โรขโฌล matters in dynamic programming.
Sub-problems on Sub-problems on Sub-problems
Sub-problems are simply smaller versions of the original problem. In fact, sub-problems often look like a reworded version of the original problem. If formulated correctly, sub-problems build on each another in order to obtain the solution to the original problem.
To give you a better idea of how this works, letโs find the sub-problem in an example dynamic programming problem.
Pretend youโre back in the 1950s working on an IBM-650 computer. You know what this meansรขโฌล โรขโฌล punchcards! Your job is to manรขโฌล โรขโฌล or womanรขโฌล โรขโฌล the IBM-650 for a day. Youโre given a natural number n punchcards to run. Each punchcard i must be run at some predetermined start time s_{i} and stop running at some predetermined finish time f_{i}. Only one punchcard can run on the IBM-650 at once. Each punchcard also has an associated value v_{i} based on how important it is to your company.
Problem: As the person in charge of the IBM-650, you must determine the optimal schedule of punchcards that maximizes the total value of all punchcards run.
Because Iโll go through this example in great detail throughout this article, Iโll only tease you with its sub-problem for now:
Sub-problem: The maximum value schedule for punchcards i through n such that the punchcards are sorted by start time.
Notice how the sub-problem breaks down the original problem into components that build up the solution. With the sub-problem, you can find the maximum value schedule for punchcards n-1 through n, and then for punchcards n-2 through n, et cetera. By finding the solutions for every single sub-problem, you can then tackle the original problem itself: the maximum value schedule for punchcards 1 through n.
Otherwise put, since the sub-problem looks like the original problem, sub-problems can be used to solve the original problem.
In dynamic programming, after you solve each sub-problem, you must memoizeรขโฌล โรขโฌล or storeรขโฌล โรขโฌล it. Letโs find out why in the following section.
Motivating Memoization with Fibonacci Numbers
When told to implement an algorithm that calculates the Fibonacci value for any given number, what would you do? Most people I know would opt for a recursive algorithm that looks something like this in Python:
def fibonacciVal(n):
if n == 0:
return 0
elif n == 1:
return 1
else:
return fibonacciVal(n-1) + fibonacciVal(n-2)
This algorithm accomplishes its purpose, but at a huge cost. For example, letโs look at what this algorithm must calculate in order to solve for n=5 (abbreviated as F(5)):
F(5)
/ \
/ \
/ \
F(4) F(3)
/ \ / \
F(3) F(2) F(2) F(1)
/ \ / \ / \
F(2) F(1) F(1) F(0) F(1) F(0)
/ \
F(1) F(0)
The tree above represents each computation that must be made in order to find the Fibonacci value for n=5. Notice how the sub-problem for n=2 is solved thrice. For a relatively small example (n=5), thatโs a lot of repeatedรขโฌล โรขโฌล and wastedรขโฌล โรขโฌล computation!
What if, instead of calculating the Fibonacci value for n = 2 three times, we created an algorithm that calculates it once, stores its value, and accesses the stored Fibonacci value for every subsequent occurrence of n=2? Thatโs exactly what memoization does.
With this idea in mind, Iโve written a dynamic programming solution to the Fibonacci value problem:
def fibonacciVal(n):
memo = [0] * (n+1)
memo[0], memo[1] = 0, 1
for i in range(2, n+1):
memo[i] = memo[i-1] + memo[i-2]
return memo[n]
Notice how the solutionรขโฌล โรขโฌล the return valueรขโฌล โรขโฌล comes from the memoization array memo[], which is iteratively filled in by the for loop. By โiteratively,โ I mean that memo[2] is calculated and stored before memo[3], memo[4],ร โฆ, and memo[n]. Because memo[] is filled in this order, the solution for each sub-problem (e.g., n=3) can be found using the solutions to its preceding sub-problems (e.g., n=2 and n=1) because these values were already stored in memo[] at an earlier time.
Memoization means no re-computation, which makes for a more efficient algorithm. Thus, memoization ensures that dynamic programming is efficient, but it is choosing the right sub-problem that guarantees overall correctness.
Now that weโve addressed memoization and sub-problems, itโs time to learn the dynamic programming process. Buckle in.
My Dynamic Programming Process
Step 1: Identify the sub-problem inร words.
Too often, programmers will turn to writing code before thinking critically about the problem at hand. Not good. One strategy for firing up your brain before you touch the keyboard is using wordsรขโฌล โรขโฌล English or otherwiseรขโฌล โรขโฌล to describe the sub-problem that you have identified within the original problem.
If youโre solving a problem that requires dynamic programming, grab a piece of paper and think about the information that you need in order to solve this problem. Write out the sub-problem with this in mind.
For example, in the aforementioned punchcard problem, I stated that the sub-problem can be written as โthe maximum value schedule for punchcards i through n such that the punchcards are sorted by start time.โ I found this sub-problem by realizing that, in order to determine the maximum value schedule for punchcards 1 through n such that the punchcards are sorted by start time, I would need to find the answer to the following sub-problems:
- The maximum value schedule for punchcards n-1 through n such that the punchcards are sorted by start time
- The maximum value schedule for punchcards n-2 through n such that the punchcards are sorted by start time
- The maximum value schedule for punchcards n-3 through n such that the punchcards are sorted by start time
- (Et cetera)
- The maximum value schedule for punchcards 2 through n such that the punchcards are sorted by start time
If you can identify a sub-problem that builds upon previous sub-problems in order to solve the problem at hand, then youโre on the right track.
Step 2: Write out the sub-problem as a recurring mathematical decision.
Once youโve identified a sub-problem in words, itโs time to write it out mathematically. Why must this be done? Well, the mathematical recurrenceรขโฌล โรขโฌล or repeated decisionรขโฌล โรขโฌล that you find will eventually be what you put into your code. Besides, writing out the sub-problem mathematically vets your sub-problem in words from Step 1. If it is difficult to encode your sub-problem from Step 1 in math, then it may be the wrong sub-problem!
There are two questions that I ask myself every time I try to find a recurrence:
- What decision do I make at every step?
- If my algorithm is at step i, what information would it need to decide what to do in step i+1? (And sometimes: If my algorithm is at step i, what information did it need to decide what to do in step i-1?)
Letโs return to the punchcard problem and ask these questions.
What decision do I make at every step? Assume that the punchcards are sorted by start time, as mentioned previously. For each punchcard that is compatible with the schedule so far (i.e., its start time is after the finish time of the punchcard that is currently running), the algorithm must choose between two options: to run, or not to run, the punchcard.
If my algorithm is at step i, what information would it need to decide what to do in step i+1? In order to decide between the two options, the algorithm needs to know the next compatible punchcard in the order.ร
If my algorithm is at step i, what information did it need to decide what to do in step i-1? The algorithm needs to know about future decisions: the ones made for punchcards i through n in order to decide to run or not to run punchcard i-1.
Now that weโve answered these questions, perhaps youโve started to form a recurring mathematical decision in your mind. If not, thatโs also okayรขโฌล โรขโฌล it becomes easier and easier to write recurrences as you get exposed to more dynamic programming problems.
Without further ado, hereโs our recurrence:
OPT(i) = max(v_{i} + OPT(next[i]), OPT(i+1))
This mathematical recurrence requires some explaining, especially for those who havenโt written one before. Here, I use OPT(i) to represent the maximum value schedule for punchcards i through n such that the punchcards are sorted by start time. Sounds familiar, right? OPT(รขโฌยข) is our sub-problem from Step 1.
In order to determine the value of OPT(i), we consider two options, and we want to take the maximum of these options in order to meet our goal: the maximum value schedule for all punchcards. Once we choose the option that gives the maximum result at step i, we memoize its value as OPT(i).
The two optionsรขโฌล โรขโฌล to run or not to run punchcard iรขโฌล โรขโฌล are represented mathematically as follows:
v_{i} + OPT(next[i])
This clause represents the decision to run punchcard i. It adds the value gained from running punchcard i to OPT(next[i]), where next[i] represents the next compatible punchcard following punchcard i. OPT(next[i]) gives the maximum value schedule for punchcards next[i] through n such that the punchcards are sorted by start time. Adding these two values together produces maximum value schedule for punchcards i through n such that the punchcards are sorted by start time if punchcard i is run.
OPT(i+1)
Conversely, this clause represents the decision to not run punchcard i. If punchcard i is not run, its value is not gained. OPT(i+1) gives the maximum value schedule for punchcards i+1 through n such that the punchcards are sorted by start time. Otherwise stated, OPT(i+1) gives the maximum value schedule for punchcards i through n such that the punchcards are sorted by start time if punchcard i is not run.
In this way, the decision made at each step of the punchcard problems is encoded mathematically to reflect the sub-problem in Step 1.
Step 3: Solve the original problem using Steps 1 andร 2.
In Step 1, we wrote down the sub-problem for the punchcard problem in words. In Step 2, we wrote down a recurring mathematical decision that corresponds to these sub-problems. How can we solve the original problem with this information?
OPT(1)
Itโs that simple. Since the sub-problem we found in Step 1 is the maximum value schedule for punchcards i through n such that the punchcards are sorted by start time, we can write out the solution to the original problem as the maximum value schedule for punchcards 1 through n such that the punchcards are sorted by start time. Since Steps 1 and 2 go hand in hand, the original problem can also be written as OPT(1).
Step 4: Determine the dimensions of the memoization array and the direction in which it should beร filled.
Did you find Step 3 deceptively simple? It sure seems that way. You may be thinking, how can OPT(1) be the solution to our dynamic program if it relies on OPT(2), OPT(next[1]), et cetera?
Youโre correct to notice that OPT(1) relies on the solution to OPT(2). This follows directly from Step 2:
OPT(1) = max(v_{1} + OPT(next[1]), OPT(2))
But this is not a crushing issue. Think back to Fibonacci memoization example. In order to find the Fibonacci value for n=5, the algorithm relied on the fact that the Fibonacci values for n=4, n=3, n=2, n=1, and n=0 had already been memoized. If we fill in our memoization table in the correct order, the reliance of OPT(1) on other sub-problems is no big deal.
How can we identify the correct direction to fill the memoization table? Well, in the punchcard problem, since we know OPT(1) relies on the solutions to OPT(2) and OPT(next[1]), and that punchcards 2 and next[1] have start times after punchcard 1 due to sorting, we can infer that we need to fill our memoization table from OPT(n) to OPT(1).
How do we determine the dimensions of this memoization array? Hereโs a trick: the dimensions of the array are equivalent to the number and size of the variables on which OPT(รขโฌยข) relies. In the punchcard problem, we have OPT(i), which means that OPT(รขโฌยข) only relies on variable i, which represents the punchcard number. This suggest that our memoization array will be one-dimensional and that its size will be n since there are n total punchcards.
If we know that n=5, then our memoization array might look like this:
memo = [OPT(1), OPT(2), OPT(3), OPT(4), OPT(5)]
However, because many programming languages start indexing arrays at 0, it may be more convenient to create this memoization array so that its indices align with punchcard numbers:
memo = [0, OPT(1), OPT(2), OPT(3), OPT(4), OPT(5)]
Step 5: Codeร it!
To code our dynamic program, we put together Steps 2โ4. The only new piece of information that youโll need to write a dynamic program is a base case, which you can find as you tinker with your algorithm.
A dynamic program for the punchcard problem will look something like this:
def punchcardSchedule(n, values, next):
'''Where values stores the punchcard values (ordered by punchcard start time)
& next[i] stores the next punchcard that can run after punchcard i'''
# Initialize memoization array - Step 4
memo = [0] * (n+1)
# Set base case
memo[n] = values[n]
# Build memoization table from n to 1 - Step 2
for i in range(n-1, 0, -1):
memo[i] = max(values[i] + memo[next[i]], memo[i+1])
# Return solution to original problem OPT(1) - Step 3
return memo[1]
Congrats on writing your first dynamic program! Now that youโve wet your feet, Iโll walk you through a different type of dynamic program.
Paradox of Choice: Multiple Options DPร Example
Although the previous dynamic programming example had a two-option decisionรขโฌล โรขโฌล to run or not to run a punchcardรขโฌล โรขโฌล some problems require that multiple options be considered before a decision can be made at each step.
Time for a new example.
Pretend youโre selling the friendship bracelets to n customers, and the value of that product increases monotonically. This means that the product has prices {p_{1},ร โฆ, p_{n}} such that p_{i} รขโฐยค p_{j} if customer j comes after customer i. These n customers have values {v_{1},ร โฆ, v_{n}}. A given customer i will buy a friendship bracelet at price p_{i} if and only if p_{i} รขโฐยค v_{i}; otherwise the revenue obtained from that customer is 0. Assume prices are natural numbers.
Problem: You must find the set of prices that ensure you the maximum possible revenue from selling your friendship bracelets.
Take a second to think about how you might address this problem before looking at my solutions to Steps 1 and 2.
Step 1: Identify the sub-problem inร words.
Sub-problem: The maximum revenue obtained from customers i through n such that the price for customer i-1 was set at q.
I found this sub-problem by realizing that, in order to determine the maximum revenue for customers 1 through n, I would need to find the answer to the following sub-problems:
- The maximum revenue obtained from customers n-1 through n such that the price for customer n-2 was set at q.
- The maximum revenue obtained from customers n-2 through n such that the price for customer n-3 was set at q.
- (Et cetera)
Notice that I introduced a second variable q into the sub-problem. I did this because, in order to solve each sub-problem, I need to know the price I set for the customer before that sub-problem. Variable q ensures the monotonic nature of the set of prices, and variable i keeps track of the current customer.
Step 2: Write out the sub-problem as a recurring mathematical decision.
There are two questions that I ask myself every time I try to find a recurrence:
- What decision do I make at every step?
- If my algorithm is at step i, what information would it need to decide what to do in step i+1? (And sometimes: If my algorithm is at step i, what information did it need to decide what to do in step i-1?)
Letโs return to the friendship bracelet problem and ask these questions.
What decision do I make at every step? I decide at which price to sell my friendship bracelet to the current customer. Since prices must be natural numbers, I know that I should set my price for customer i in the range from qรขโฌล โรขโฌล the price set for customer i-1รขโฌล โรขโฌล to v_{iรขโฌล }โรขโฌล the maximum price at which customer i will buy a friendship bracelet.
If my algorithm is at step i, what information would it need to decide what to do in step i+1? My algorithm needs to know the price set for customer i and the value of customer i+1 in order to decide at what natural number to set the price for customer i+1.
With this knowledge, I can mathematically write out the recurrence:
OPT(i, q) = max~([Revenue(v_{i}, a) + OPT(i+1, a)])
such that max~ finds the maximum over all a in the range q รขโฐยค a รขโฐยค v_{i}
Once again, this mathematical recurrence requires some explaining. Since the price for customer i-1 is q, for customer i, the price a either stays at integer q or it changes to be some integer between q+1 and v_{i}. To find the total revenue, we add the revenue from customer i to the maximum revenue obtained from customers i+1 through n such that the price for customer i was set at a.
In other words, to maximize the total revenue, the algorithm must find the optimal price for customer i by checking all possible prices between q and v_{i}. If v_{i} รขโฐยค q, then the price a must remain at q.
What about the otherร steps?
Working through Steps 1 and 2 is the most difficult part of dynamic programming. As an exercise, I suggest you work through Steps 3, 4, and 5 on your own to check your understanding.
Runtime Analysis of Dynamicร Programs
Now for the fun part of writing algorithms: runtime analysis. Iโll be using big-O notation throughout this discussionรขโฌล โรขโฌล if youโre not yet familiar with big-O, I suggest you read up on it here.
Generally, a dynamic programโs runtime is composed of the following features:
- Pre-processing
- How many times the for loop runs
- How much time it takes the recurrence to run in one for loop iteration
- Post-processing
Overall runtime takes the following form:
Overall runtime = Pre-processing + Loop * Recurrence + Post-processing
Letโs perform a runtime analysis of the punchcard problem in order to get familiar with big-O for dynamic programs. Here is the punchcard problem dynamic program:
def punchcardSchedule(n, values, next):
'''Where values stores the punchcard values (ordered by punchcard start time)
& next[i] stores the next punchcard that can run after punchcard i'''
# Initialize memoization array - Step 4
memo = [0] * (n+1)
# Set base case
memo[n] = values[n]
# Build memoization table from n to 1 - Step 2
for i in range(n-1, 0, -1):
memo[i] = max(values[i] + memo[next[i]], memo[i+1])
# Return solution to original problem OPT(1) - Step 3
return memo[1]
Letโs break down its runtime:
- Pre-processing: Here, this means building the the memoization array. O(n).
- How many times the for loop runs: O(n).
- How much time it takes the recurrence to run in one for loop iteration: The recurrence takes constant time to run because it makes a decision between two options in each iteration. O(1).
- Post-processing: None here! O(1).
The overall runtime of the punchcard problem dynamic program is O(n) O(n) * O(1) + O(1), or, in simplified form, O(n).
You Did It!
Well, thatโs itรขโฌล โรขโฌล youโre one step closer to becoming a dynamic programming wizard!
One final piece of wisdom: keep practicing dynamic programming. No matter how frustrating these algorithms may seem, repeatedly writing dynamic programs will make the sub-problems and recurrences come to you more naturally. Hereโs a crowdsourced list of classic dynamic programming problems for you to try.
So get out there and take on your interviews, classes, and life (of course) with your newfound dynamic programming knowledge!
Many thanks to Steven Bennett, Claire Durand, and Prithaj Nath for proofreading this blog post. You all rock. Thank you Professor Hartline for getting me so excited about dynamic programming that I willingly wrote about it at length.
Enjoy what you read? Spread the love by liking and sharing this piece! Have thoughts or questions? Reach out to me on Twitter or in the comments below.
Great post! I'm going to have to re-read it a few times to make sure I follow completely. This is a subject that needs a lot of demystifying for me ๐
Haha it's a long one, but it had to be because I couldn't miss a single detail about dynamic programming. ๐ Thanks for reading!
This technique it's great and very useful, and kinda reminds me about cache memories, but it flaws when it's about not discrete data, which occurs so many times. That's when approximate-result or sub-optimal solutions come in.
PS: great article and sorry for my English D:
Good read. Caught my attention since Iโm supposed to hold a guest lecture tomorrow about programming for IBM iSeries, think I might have a topic now! ๐ค
Although Iโve never intentionally used DP professionally, it does remind me of caching n-dimensional arrays with known partial solutions.
Hii, Aliana this is an awesome post I haven't made it all the way through but till how much I have read,it's simpler to understand maybe it's because I have already studied it but the thinking process is very good and it's awesome to go through the process of problem solving with DP.
In the midst of application season myself, this looks super great. Thanks for the write-up.
This is definitely the best explanation to understand Dynamic Programming so far; although it would simplify more if you can explain more about each of the variables in the mathematical recurrence!