## Sunday, February 22, 2009

One of my fearless readers (okay, my brother-in-law!) lamented the recent lack of mathematical content in this blog. I'm sure that he's not the only person who reads this blog just for the math. So, without further ado, I present a topic I've been working on lately: linear programming.

Suppose that you're wanting to get back into shape, and want to develop a good, healthy diet to go along with your exercise regimen. You know that you need to eat 2000 calories (for example), less than 300 g carbohydrates, at least 25 but less than 75 g fat, and 90-120 g protein.*

Perhaps we want to minimize the cost of this new diet. Then if we knew the cost and nutritional content of all foods, we could set up the following optimization problem:

minimize diet ∈ foods Cost(diet) subject to
Calories(diet) = 2000
Carbohydrates(diet) ≤ 300
25 ≤ Fat(diet) ≤ 75
90 ≤ Protein(diet) ≤ 120.

This is what's known as a linear programming problem. It's linear because the equations we're trying to solve or use as constraints are all linear -- x grams of potatoes has twice as many calories as x/2 grams. The term programming is historical and does not have anything to do with computers -- think programming as in scheduling.

As the name implies, linear programming problems often arise in the logistics and economics fields. For example, you own a chocolate factory and want to make the maximum profit, subject to constraints on demand, supplies, and labor. Or more generally, many commodity markets (e.g., soybeans, wheat, etc.) can be modeled with a linear program.

An Alabama sheriff who was recently convicted of starving his prisoners could have benefited from using linear programming. Alabama sheriffs are allotted \$1.75 per prisoner per day for food, and get to keep the extra cash they don't end up spending. Instead of computing the optimal diet for his prisoners (for which he could have pocketed \$0.79 per prisoner per day), he chose to skimp on their diets instead.

(Of course, the optimal diet is kind of disgusting, consisting of carrots, peanut butter, air-popped popcorn, baked potatoes, and skim milk. It may be optimal for cost but it's decidedly sub-optimal in terms of taste! Construct your own diet here.)

Suppose you want to make the maximum amount of profit from making widgets at your factory. You have labor, material, and shipping costs, which can be represented by linear functions of the number of widgets. The problem is, you can't make 5247.68 widgets: you have to make an integer number of them.

Problems that contain both integer and continuous variables are known as mixed-integer programs. Our diet example might be better posed as a mixed-integer linear program -- it would make more sense to eat an integer number of bananas than to eat 128.6 grams of banana (for example).

Mixed-integer linear programs are hard to solve -- they have been shown to be NP-hard. But for many of these problems, you can use a relaxation into a linear program (relax the requirement that the integer variables must be integer) as a lower bound, and find the solution using a branch-and-bound, branch-and-cut, or branch-and-price method.

Branching has to do with dividing the problem domain into pieces. If we have an integer variable constrained to be between 1 and 10, for example, we could branch it into two pieces, one containing the domain 1 through 5, and the other containing 6 through 10. If we know that the global bound from the linear program is bigger than the solution for any of the integer solutions in a given branch, we can remove that branch from consideration. We would then further subdivide the active branch until we are able to find a unique solution.

This type of branching algorithm is inherently parallelizable. First, we would solve the relaxation of the mixed-integer problem and broadcast it to all processes. Then, we would assign branches to processes, which would then eliminate or further subdivide branches as appropriate. If done naively, we would end up with all the work being done by the one process that originally owned the branch containing the solution, while all the rest of the processes are idle. Instead, good parallel implementations of branching algorithms adaptively reassign new sub-branches to idle processes.

I am working on solving a big mixed-integer linear program that represents the biofuel supply chain infrastructure. I know next to nothing about biofuel production, but my other collaborators do. My expertise lies in solving the problem in parallel on leadership supercomputing resources. I'm really excited about solving these complicated problems at this large scale!

* Note that I totally made up these numbers and if you're actually wanting to get back into shape you should consult your physician, not an applied mathematician.