From Algorithms and Data Structures in Action by Marcello La Rocca

You’ve been selected to be among the first people to colonize Mars! Now you have to pack the optimal amount of food to get the most calories per kilogram—how can algorithms help?




Packing your Knapsack: Data Structures Meet the Real World

Congrats, you have been selected to populate the first Mars colony! Drugstores on Mars are still a bit short of goods… and hard to find. So, eventually, you will have to grow your own food. In the meantime, for the first few months, you will have goods shipped to sustain you.

Abstracting the Problem Away

The problem is, your crates can’t weigh more than 1000 kilograms, and that’s a hard limit.

To make things harder, you can choose only from a limited set of things, already packed in boxes:

·         Potatoes, 800 kg

·         Rice, 300 kg

·         Wheat Flour, 400 kg

·         Peanut butter, 20 kg

·         Canned tomatoes, 300 kg

·         Beans, 300 kg

·         Strawberry jam, 50 kg

You’ll get water for free, so no worries about that.

And you can either take a whole box or leave it out. You’d certainly like to have some variety, and not just a ton of potatoes (i.e. “The Martian” experience).

But, at the same time, the expedition’s main concern is keeping you well sustained and energetic throughout your stay, so the main criterion for choosing what goes on board will be nutritional value. Let’s say the total calories will be a good indicator:

| Food           | Weight (kg) | Total calories |
|----------------|-------------|----------------|
| Potatoes       | 800         | 1,501,600      |
| Wheat Flour    | 400         | 1,444,000      |
| Rice           | 300         | 1,122,000      |
| Beans (can)    | 300         | 690,000        |
| Tomatoes (can) | 300         | 237,000        |
| Strawberry jam | 50          | 130,000        |
| Peanut butter  | 20          | 117,800        |

So, since the actual content is irrelevant for the decision (despite your understandable protests, mission control is very firm on that point), the only things that matter are the weight and the total calories provided by each box.

Hence, our problem can be abstracted as “choose any number of items from a set, without the chance to take a fraction of any item, such that their total weight won’t be over 1000 kg, and in such a way that the total amount of calories is maximized”.

Looking for Solutions

Now that we have stated our problem, we can start looking for solutions.

You might be tempted to start packing your crate with boxes starting from the one with the highest calorie value. That would mean packing 800 kg of potatoes.

If you do so, however, neither rice nor wheat flour will fit in the crate, and their combined calorie count exceeds (by far) that of any other combination you can create within the remaining 200 kg. The best value you get with this strategy is 1,749,400 calories, by picking potatoes, strawberry jam, and peanut butter.

So the most natural-looking approach, a greedy[1] algorithm that at each step chooses the best immediate option, does not work. This problem needs to be thought through more carefully.
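To see this concretely, here is a minimal Python sketch of the greedy strategy (item data taken from the table above; function and variable names are our own, not from the book's code):

```python
# Each item: (name, weight in kg, total calories).
ITEMS = [
    ("potatoes", 800, 1_501_600),
    ("wheat flour", 400, 1_444_000),
    ("rice", 300, 1_122_000),
    ("beans", 300, 690_000),
    ("tomatoes", 300, 237_000),
    ("strawberry jam", 50, 130_000),
    ("peanut butter", 20, 117_800),
]
CAPACITY = 1000  # kg, the hard limit for one crate


def greedy(items, capacity, key):
    """Pick whole items in decreasing order of `key`, skipping any that don't fit."""
    chosen, weight, calories = [], 0, 0
    for name, w, c in sorted(items, key=key, reverse=True):
        if weight + w <= capacity:
            chosen.append(name)
            weight += w
            calories += c
    return chosen, calories


# Highest total calories first: the potatoes block everything but the small jars.
print(greedy(ITEMS, CAPACITY, key=lambda item: item[2]))
# (['potatoes', 'strawberry jam', 'peanut butter'], 1749400)
```

Once the potatoes are in, only 200 kg remain, and only the two small jars fit.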

Time for brainstorming. You gather your logistic team and together look for a solution.

Soon enough, someone suggests that, rather than the total calories, you should look at the calories per kg. So, you update the table above with a new column, and sort it accordingly.

| Food           | Weight (kg) | Total calories | Calories per kg |
|----------------|-------------|----------------|-----------------|
| Peanut butter  | 20          | 117,800        | 5,890           |
| Rice           | 300         | 1,122,000      | 3,740           |
| Wheat Flour    | 400         | 1,444,000      | 3,610           |
| Strawberry jam | 50          | 130,000        | 2,600           |
| Beans (can)    | 300         | 690,000        | 2,300           |
| Potatoes       | 800         | 1,501,600      | 1,877           |
| Tomatoes (can) | 300         | 237,000        | 790             |

Then you try to go top to bottom, picking up the food with the highest ratio of calories per unit of weight, ending up with peanut butter, rice, wheat flour and strawberry jam, for a total of 2,813,800 calories.

That’s much better than our first result, and clearly a step in the right direction. Looking closely, though, it’s apparent that packing peanut butter and jam prevented us from adding beans, which would have increased the total value of the crate even more. The good news is that at least you won’t be forced onto the Martian diet: no potatoes are going to Mars this time.
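The calories-per-kg strategy is the same greedy loop with a different sort key. A minimal Python sketch (item data from the table above; names are our own):

```python
# Each item: (name, weight in kg, total calories).
ITEMS = [
    ("potatoes", 800, 1_501_600),
    ("wheat flour", 400, 1_444_000),
    ("rice", 300, 1_122_000),
    ("beans", 300, 690_000),
    ("tomatoes", 300, 237_000),
    ("strawberry jam", 50, 130_000),
    ("peanut butter", 20, 117_800),
]
CAPACITY = 1000  # kg


def greedy_by_density(items, capacity):
    """Pick whole items in decreasing order of calories per kg, skipping misfits."""
    chosen, weight, calories = [], 0, 0
    for name, w, c in sorted(items, key=lambda it: it[2] / it[1], reverse=True):
        if weight + w <= capacity:
            chosen.append(name)
            weight += w
            calories += c
    return chosen, calories


print(greedy_by_density(ITEMS, CAPACITY))
# (['peanut butter', 'rice', 'wheat flour', 'strawberry jam'], 2813800)
```

Note that the loop fills 770 kg and then skips the 300 kg of beans, which is exactly the missed opportunity described above.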

After a few more hours of brainstorming, you are all pretty much ready to give up: the only way to be sure is to check, for each item, whether you get a better solution by including it or leaving it out. And the only way you know to do that is to enumerate all possible combinations, filter out the ones above the weight threshold, and keep the best of the rest. This is what’s called a brute-force algorithm, and mathematics tells us it is a very time-consuming one.

Since for each item you can either pack it or leave it, the possible solutions are 2⁷ = 128. You are not too keen on going through more than a hundred combinations by hand, but after a few hours, now fully understanding why it’s called brute force, you are totally exhausted but almost done with it.
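The brute-force enumeration is easy to sketch in Python for our seven items (it checks all 2⁷ subsets; the item data is from the table above):

```python
from itertools import combinations

# Each item: (name, weight in kg, total calories).
ITEMS = [
    ("potatoes", 800, 1_501_600),
    ("wheat flour", 400, 1_444_000),
    ("rice", 300, 1_122_000),
    ("beans", 300, 690_000),
    ("tomatoes", 300, 237_000),
    ("strawberry jam", 50, 130_000),
    ("peanut butter", 20, 117_800),
]
CAPACITY = 1000  # kg


def brute_force(items, capacity):
    """Try every subset, keep the feasible one with the most calories."""
    best_cal, best_subset = 0, ()
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            weight = sum(w for _, w, _ in subset)
            cal = sum(c for _, _, c in subset)
            if weight <= capacity and cal > best_cal:
                best_cal, best_subset = cal, subset
    return [name for name, _, _ in best_subset], best_cal


print(brute_force(ITEMS, CAPACITY))
# (['wheat flour', 'rice', 'beans'], 3256000)
```

With 7 items this runs instantly; the trouble is that the number of subsets doubles with every item added.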

Then the news breaks: someone called from the mission control, and following up on complaints from some settlers-to-be, 25 new food items have been added to the list, including sugar, oranges, soy, and marmite (don’t ask…).

After checking your numbers, everyone is in dismay: with 32 items, you’d now have 2³², approximately 4 billion, different combinations to check.

Algorithms to the Rescue

At that point, it’s clear that you need a computer program to crunch the numbers and make the best decision.

You write one yourself in the next few hours, but even that takes a long time to run: another couple of hours. Now, as much as you’d like to apply the same diet to all the colonists, it turns out some of them have allergies: a quarter of them can’t eat gluten, and many swear they are allergic to marmite. So you’ll have to run the algorithm several times, each time taking individual allergies into account. To make things worse, mission control is considering adding 30 more items to the list, to make up for what people with allergies will miss. If that happens, we would end up with 62 total items, and the program would have to go over several billions of billions of possible combinations. You try that, and after a day the program is still running, nowhere close to termination.

The whole team is ready to give up and go back to the potatoes diet, when someone remembers there is a guy in the launch team who had an algorithms book on his desk.

You call in the guy, and he immediately sees the problem for what it is: the 0-1 knapsack problem. The bad news is that it is an NP-complete[2] problem, which means it is hard to solve: no “quick” (as in polynomial with respect to the number of items) algorithm is known that computes the optimal solution.

There is, however, also good news: there is a pseudo-polynomial solution that uses dynamic programming[3] and takes time proportional to the maximum capacity of the knapsack (in our case, a crate), and luckily the capacity of the crate is limited. It takes a number of steps equal to the number of possible values for the capacity multiplied by the number of items; assuming the smallest unit is 1 kg, that’s only 1000 * 62 = 62,000 steps. Wow, that’s much better than 2⁶²! In fact, once you rewrite the algorithm, it finds the best solution in a matter of seconds.
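A sketch of that dynamic-programming solution in Python (the classic one-dimensional-table formulation; item data from the table above, names our own):

```python
# Each item: (name, weight in kg, total calories).
ITEMS = [
    ("potatoes", 800, 1_501_600),
    ("wheat flour", 400, 1_444_000),
    ("rice", 300, 1_122_000),
    ("beans", 300, 690_000),
    ("tomatoes", 300, 237_000),
    ("strawberry jam", 50, 130_000),
    ("peanut butter", 20, 117_800),
]
CAPACITY = 1000  # kg


def knapsack_01(items, capacity):
    """0-1 knapsack via dynamic programming: O(n * capacity) steps."""
    # best[w] = most calories achievable within a weight budget of w kg.
    best = [0] * (capacity + 1)
    for _, w, c in items:
        # Iterate capacities high-to-low so each item is used at most once.
        for cap in range(capacity, w - 1, -1):
            best[cap] = max(best[cap], best[cap - w] + c)
    return best[capacity]


print(knapsack_01(ITEMS, CAPACITY))  # 3256000
```

For our 7 items and 1000 kg this fills a table of 7 * 1000 entries instead of enumerating 2⁷ subsets, and the gap widens dramatically as items are added.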

At this time, you are willing to take the algorithm as a black box and plug it in without too many questions. Still, this decision is crucial to your survival… it seems like a situation where some deep knowledge of how the algorithm works would serve you well.

For our initial example, it turns out that the best possible combination we can find is rice, wheat flour and beans, for a total of 3,256,000 calories. That would be quite a good result, compared to our first attempt, right?

You probably had already guessed the best combination, but if it seems too easy in the initial example, where we only have seven items, try the same with hundreds of different products, in a setup closer to the real scenario, and see how many years it takes you to find the best solution by hand!

We could already be happy with this solution; it’s the best we can find given the constraints.

Thinking (literally) out of the Box

But, in our narration, this is the point when the real expert in algorithms comes in. For instance, imagine that a distinguished academic is visiting our facilities while preparing the mission, invited to help with computing the best route to save fuel. During lunch break someone proudly tells her about how you brilliantly solved the problem with packing goods. And then she asks an apparently naïve question: why can’t you change the size of the boxes?

The answer will likely be either that “this is the way it has always been”, or that “items come already packed from vendors, and it would cost time and money to change it”.

And that’s when the expert will explain that if we remove the constraint that boxes must be taken whole, the 0-1 knapsack problem (which is NP-complete) becomes the fractional knapsack problem, for which a simple greedy algorithm runs in linear time (after sorting the items by calories per kg) and always finds a solution at least as good as the best 0-1 solution.

Translated into plain language, we can turn this problem into one that’s easier to solve and that allows us to pack the crates with the largest possible total of calories. The problem statement now becomes:

“Given a set of items, choose any subset of items or fraction of items from it, such that their total weight won’t be over 1000 kg, and in such a way that the total amount of calories is maximized”.

And yes, it is worth going through the overhead of re-packing everything, because we get a nice improvement.

Specifically, if we can take any fraction of the original weight of each product, we can simply pack products starting from the one with the highest ratio of calories per kg (peanut butter, in this example); when we get to the first box that doesn’t fit in the remaining space, we take just the fraction of it that fills the gap, and repack it. So, in the end, we wouldn’t even have to repack all the goods, only one box.

The best solution would then be: peanut butter, rice, wheat flour, strawberry jam, and 230 kilograms of beans, for a total of 3,342,800 calories.
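The fractional greedy strategy can be sketched like this in Python (item data from the table above; only the last box picked gets split):

```python
# Each item: (name, weight in kg, total calories).
ITEMS = [
    ("potatoes", 800, 1_501_600),
    ("wheat flour", 400, 1_444_000),
    ("rice", 300, 1_122_000),
    ("beans", 300, 690_000),
    ("tomatoes", 300, 237_000),
    ("strawberry jam", 50, 130_000),
    ("peanut butter", 20, 117_800),
]
CAPACITY = 1000  # kg


def fractional_knapsack(items, capacity):
    """Greedy: fill with the densest food first; split at most the last box."""
    plan, total, remaining = [], 0.0, capacity
    for name, w, c in sorted(items, key=lambda it: it[2] / it[1], reverse=True):
        take = min(w, remaining)  # take the whole box, or just what still fits
        if take == 0:
            break
        plan.append((name, take))
        total += c * take / w  # calories scale linearly with the fraction taken
        remaining -= take
    return plan, total


plan, calories = fractional_knapsack(ITEMS, CAPACITY)
print(plan)      # [('peanut butter', 20), ('rice', 300), ('wheat flour', 400),
                 #  ('strawberry jam', 50), ('beans', 230)]
print(calories)  # 3342800.0
```

The loop is identical to the calories-per-kg greedy above, except that the misfit box is split instead of skipped, which is exactly why the greedy choice is now always optimal.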

Happy Ending

So, in our story, the future Mars settlers will have a greater chance of survival, and won’t sink into depression over a diet consisting only of potatoes, peanut butter, and strawberry jam.

From a computational point of view, we moved from incorrect algorithms (the greedy solutions taking the highest total, or highest ratio first), to a correct but unfeasible algorithm (the brute force solution enumerating all the possible combinations), and finally to a smart solution which would organize the computation in a more efficient fashion.

The next step, just as important if not more so, brought us to thinking out of the box to simplify our problem by removing a constraint, and thus finding an easier algorithm and a better solution. This is another golden rule: always study your requirements thoroughly, question them and, when possible, try to remove those whose removal gives you a solution that is at least as valuable, or a slightly less valuable one that can be reached at much lower cost. Of course, in this process, other concerns (like laws and safety, at all levels) must be taken into consideration, so some constraints can’t be removed.

That’s all for now. If you want to learn more about the book, check it out on liveBook here and see this slide deck.

 


[1] A greedy algorithm is a strategy to solve problems that finds the optimal solution by making the locally optimal choice at each step. It can only find the best solution on a small subclass of problems, but it can also be used as a heuristic to find approximate (sub-optimal) solutions.

[2] NP-complete problems are a set of problems for which any given solution can be verified quickly (in polynomial time), but no efficient way to find a solution in the first place is known. No polynomial-time algorithm solving an NP-complete problem on a classical deterministic machine has been found, and unless P = NP, none exists.

[3]  Dynamic programming is a strategy for solving complex problems with certain characteristics: a recursive structure of subproblems, with the result of each subproblem needed multiple times in the computation of the final solution. The final solution is computed by breaking the problem down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions.