## DEV Community is a community of 700,142 amazing developers

We're a place where coders share, stay up-to-date and grow their careers.

# Solving the knapsack problem in PostgreSQL

Francesco Tisiot
Francesco is a Developer Advocate at Aiven. Long time data practitioner and analytics enthusiast!
Originally published at aiven.io on ・8 min read

The knapsack problem: how to use limited luggage space to pack only and all the items you're most likely to need? Enter PostgreSQL! Come and find out how to use the world's best open source database to help you pack.

The days before a holiday are exciting. You've planned a vacation somewhere and can't wait to hit the road.
But first, there's a whole list of decisions to make: how many underpants should I bring? Are the running shoes necessary? What about the hairdryer? Which jacket should I pack? Lots of decisions to take with the suitcase size as constraint and the best holiday experience as ultimate goal.

The aim of the blog post is to help solve the packing problem using technology. Specifically, PostgreSQL and the recursive Common Table Expressions.

## Combinatorial optimization - the knapsack problem

People have had to pack luggage for a long time, so this optimization dilemma is far from new. It even has a name: the knapsack problem. It can be applied to a variety of use cases where there is a set of items with a defined weight (space occupied in the luggage in our example) and value (item benefits during holiday) and where the total weight is limited (luggage size).

The end goal is to come up with a set of items that fits within the weight constraint and has the maximum possible value.

For example, let's assume we have the following items:

Item Value Weight
🧦 Socks 10 3
🎩 Hat 15 5
👖 Trousers 5 15
👟 Shoes 8 10
👕 T-Shirt 7 10

And we have a knapsack with a capacity of 20.

This means we could pack:

• 👖 Trousers (weight 15) + 🎩 Hat (weight 5) for a total value of (5 + 15) = 20 OR
• 👕 T-Shirt (weight 10) + 🎩 Hat (weight 5) + 🧦 Socks (weight 3) for a total value of (7 + 15 + 10) = 32

The second option gives us a much better overall value based on the settings in the table.

(At this point, we should note that this concept and solution is presented as an oversimplified mathematical problem. Aiven is not responsible if you decide to leave for your holidays without any shoes or trousers!)

## PostgreSQL to the rescue

How can we be sure to make the best choice? Can technology help? Of course! Just pull out your trusty PostgreSQL (which you should always remember to pack).

Let's start by creating an instance with the help of Aiven's command line interface.

``````avn service create demo-pg      \
--service-type pg           \
--plan hobbyist
``````

The above creates a PostgreSQL instance (`--service-type pg`) called `demo-pg` in the region `google-europe-west3` with an `hobbyist` plan. You can size and move your instance as you wish; check out all the combinations available.

Once the instance is up and running (we can use `avn service wait demo-pg` to wait for it), we can log into the database from the prompt with:

``````avn service cli demo-pg
``````

Great, we're in PostgreSQL! Now we can create the new `inventory` table and fill it with our dataset.

``````create table inventory (item_id serial, item_name varchar, value int, weight int);
insert into inventory(item_name, value, weight) values ('Socks', 10,3);
insert into inventory(item_name, value, weight) values ('Hat', 15,5);
insert into inventory(item_name, value, weight) values ('Trousers', 5,15);
insert into inventory(item_name, value, weight) values ('Shoes', 8,10);
insert into inventory(item_name, value, weight) values ('T-Shirt', 7,10);
``````

Let's verify the data:

``````select * from inventory;
``````

And we get the expected output:

`````` item_id | item_name | value | weight
--------------+-----------+-------+--------
1 | Socks     |    10 |      3
2 | Hat       |    15 |      5
3 | Trousers  |     5 |     15
4 | Shoes     |     8 |     10
5 | T-Shirt   |     7 |     10
(5 rows)
``````

We're now ready to crack our luggage knapsack problem.

## Iterative approach and PostgreSQL recursive CTEs

There are multiple ways of solving the knapsack problem. One of them is to replicate in PostgreSQL what you would do in a trial & error case: you pick one item, add it to the luggage, then pick another item and add it, going on like that until the suitcase is full. Once no more items can fit in, you sum all the values of the inserted items and take note of the list. Then you empty the luggage and start from scratch again, trying to make sure you don't end up with the same sequence of items just in a different order.

When all the possible item combinations have been checked, you can then select the one with maximum value. It's a long and error prone process, but luckily, computers can handle it for us.

The above process defines the steps we need to perform to find all the combinations:

1. Select one random item from `inventory`.
2. Add it to the `picked_items` list and store the item cost
3. Select another item from `inventory`, making sure that: a. it wasn't already in the `picked_items` list and b. the `weight` of the current item plus the `total_weight` of all the items in the `picked_items` list is less or equal to the maximum allowed weight
4. Add the new item in the `picked_items` list
5. Add the item `weight` to the `total_weight`
6. Go back to step #3 until `inventory` contains new items satisfying conditions a and b.

We can spot an iterative pattern here of sequential queries against the `inventory` table. How can we map it into PostgreSQL? We want to perform select statements against the `inventory` table for a variable number of iterations (the list of `picked_items` can have different sizes). We need a way to recursively call the select statement: here is were PostgreSQL recursive CTEs shine, since they allow the creation of queries that can reference their own output.

For example we can get all the combination of 3 items starting with `Socks` in our `inventory` table with:

``````WITH RECURSIVE items(picked_items, nr_items) as (
SELECT ARRAY[item_name] as picked_items,
1 nr_items
from inventory
where item_name = 'Socks'
UNION ALL
select picked_items || item_name,
nr_items + 1
from inventory cross join items
where nr_items+1 <= 3
)
select * from items where nr_items=3;
``````

The `RECURSIVE` keyword sets the scene. The second `SELECT` statement is what does the trick by joining the `inventory` table with the result of the `RECURSIVE items` query we are still defining. The result lists all the 25 combinations of 3 items in the `inventory` table starting with `Socks`.

``````            name             | nr_items
----------------------------------+----------
{Socks,Socks,Socks}         |        3
{Socks,Socks,Hat}           |        3
{Socks,Socks,Trousers}      |        3
{Socks,Socks,Shoes}         |        3
{Socks,Socks,T-Shirt}       |        3
{Socks,Hat,Socks}           |        3
...
{Socks,T-Shirt,Hat}         |        3
{Socks,T-Shirt,Trousers}    |        3
{Socks,T-Shirt,Shoes}       |        3
{Socks,T-Shirt,T-Shirt}     |        3
(25 rows)
``````

The above query returns all the combinations available in the `inventory` table limited to 3 items. However, our knapsack problems has different constraints:

• Once an item is picked, we can't reuse it
• The sequence doesn't have a fixed length but it's driven by the `total_cost`

Let's check how we can add the above constraints in the recursive query, keeping `20` as total allowed cost:

``````WITH RECURSIVE items(item_id, picked_items, nr_items, total_weight, total_value) as (
SELECT
item_id,
ARRAY[item_name] as picked_items,
1 nr_items,
weight total_weight,
value total_value
from inventory
UNION ALL
select
inventory.item_id,
picked_items || item_name,
nr_items + 1,
weight + total_weight,
value + total_value
from inventory cross join items
where
picked_items::varchar[] @> ARRAY[item_name] = false
and weight + total_weight <= 20
and inventory.item_id > items.item_id
)
select * from items order by total_value;
``````

Let's cut the query down in sections: the first piece is the seed of our recursive query, we select all rows from the `inventory` table

``````SELECT
item_id,
ARRAY[item_name] as picked_items,
1 nr_items,
weight total_weight,
value total_value
from inventory
``````

We store the `item_id` as single value and we add it to the `picked_items` array. We also store the total number of items (`1`), the associated `weight` as `total_weight` and `value` as `total_value`. The next phase, after the `UNION ALL`, is the recursive bit:

``````select
inventory.item_id,
picked_items || item_name,
nr_items + 1,
weight + total_weight,
value + total_value
from inventory cross join items
where
picked_items::varchar[] @> ARRAY[item_name] = false
and weight + total_weight <= 20
and inventory.item_id > items.item_id
``````

We select the `item_id`, store it a single value and add it to the `picked_items` array. Add `+1` to count of items and add the `weight` and `value` of the current item to `total_weight` and `total_value` of the previously selected items respectively.

The interesting bit comes in the following section. We perform a `CROSS JOIN` between `inventory` and `items` (our recursive query) creating a cartesian product of the tables. In other words, we then have a row for each item that exists in `inventory` multiplied for each row in `items`. Not very clever, but the filtering does the magic:

• `picked_items::varchar[] @> ARRAY[item_name] = false`: makes sure that we exclude from the result set any rows for which the `item_name` is already in the list of `picked_items`
• `weight + total_weight <= 20`: discards from the result set all items for which the related `weight` summed to the previously calculated `total_weight` exceeds the `20` threshold we set for our luggage.
• `inventory.item_id > items.item_id`: it's an optimization filter; `{Trousers,Shoes}` and `{Shoes,Trousers}` are two combinations of the same items, and we want to keep only one. Thus we allow the selection of items only in ascending `item_id` order, this will help improving performances since we are not analysing the same subset of items multiple times.

The result of the full query above is:

`````` item_id |       picked_items        | nr_items | total_weight | total_value
--------------+---------------------------+----------+--------------+-------------
3 | {Trousers}                |        1 |           15 |           5
5 | {T-Shirt}                 |        1 |           10 |           7
4 | {Shoes}                   |        1 |           10 |           8
1 | {Socks}                   |        1 |            3 |          10
2 | {Hat}                     |        1 |            5 |          15
5 | {Shoes,T-Shirt}           |        2 |           20 |          15
3 | {Socks,Trousers}          |        2 |           18 |          15
5 | {Socks,T-Shirt}           |        2 |           13 |          17
4 | {Socks,Shoes}             |        2 |           13 |          18
3 | {Hat,Trousers}            |        2 |           20 |          20
5 | {Hat,T-Shirt}             |        2 |           15 |          22
4 | {Hat,Shoes}               |        2 |           15 |          23
2 | {Socks,Hat}               |        2 |            8 |          25
5 | {Socks,Hat,T-Shirt}       |        3 |           18 |          32
4 | {Socks,Hat,Shoes}         |        3 |           18 |          33
(15 rows)
``````

It shows all the combinations of items that fit in our `20` capacity luggage. When we sort it by `total_value`, it becomes evident that the combination `{Socks,Hat,Shoes}` is the one that provides the maximum value.

Again, please take this as pure theoretical exercise and not as holiday packing suggestion. What? do you have also other constrains to add? Taking this as a basis, feel free to add any additional complexity that your real case requires!

## Some thoughts on knapsack problem and PostgreSQL recursive CTEs

The knapsack problem is very common: from project planning to resource allocation there is a variety of real cases where we need to fit as much items as possible in some sort of limited bucket. PostgreSQL recursive CTEs represent a valid option where a query iterative approach is needed.

A word of caution here: performance can quickly go wild if the recursion is not properly limited by filters. Define your recursion, set your boundaries, and enjoy how PostgreSQL can help you packing before the holidays!

Some more references: