An important part of my job is coming up with algorithms that can be used to solve difficult problems on a supercomputer. Leadership-class petascale supercomputers like the flagship machine here at the lab have state-of-the-art hardware, so the crunching of individual numbers goes quickly. But still, we have a limited allocation of time on the big machine, so we have to get the most bang for our buck (so to speak) by using the most efficient algorithms we can.
The application of the math is something in quantum chemistry. I'm not completely up on the application side, but it has to do with the electrons in atoms and small molecules. In order to solve these problems, we take a box in space containing the molecule, and compute something having to do with the change in kinetic or potential energy contained therein.
There are more interesting parts of the box in space, such as the part that contains the nucleus. And then there are boring parts, such as the part that is completely empty. We adaptively subdivide the space into smaller boxes, refining it more in the interesting parts and less in the boring parts. We end up with a tree structure that represents this adaptively refined box in space. This tree can have millions or even billions of nodes, depending on how much we refine the box, and how much interesting stuff is contained therein. Each node in the tree has a bunch of data associated with it, representing the wave functions of the electrons and the like. In order to figure out the energy change associated with the molecule, we have to perform a whole bunch of computations on the nodes in the tree. Since we have millions of nodes, this would take a very long time to do on even the fastest normal computer. In fact, it may not even be possible to do on a normal computer, because of all the memory requirements. Thus we use a supercomputer, farming out the work to a bunch of processors.
Since we have only a limited amount of time (both on the supercomputer and in life in general!), it would behoove us to distribute the work as evenly as possible over the processors in the supercomputer. It would also be a good idea to make sure that this distribution doesn't result in an excessive number of broken links in the tree, because broken links translate to communication between processors, and communication is expensive, orders of magnitude more expensive than computation. (When I say expensive, what I mean is that it takes a lot of time, time that we could be using for other things.)
Unfortunately, this load balancing problem is really, really hard. For all intents and purposes, it is impossible to solve for a tree of the size we're talking about. Nobody knows exactly how much time it would take to solve this problem for a tree with N nodes, but it is probably proportional to an exponential function of N. If it took a picosecond (10-12 seconds) to solve this problem for a tree where N=10, then for a tree with a million nodes (N=106), solving this problem would take about 1030,000 years. Just for reference, scientists calculate the age of the universe to be on the order of 1010 years.
So instead of solving it exactly, we use heuristics (strategies) to come up with nearly-optimal solutions. Yeah, it would be nice if we could compute using the best possible load configuration, but by the time we figured out just what that was, we could have computed the solution to the computational chemistry problem, zillions of times over.
I have developed a new heuristic for load balancing, which I can't share with you here because we're writing a paper on it. But suffice it to say, I can find a nearly-optimal load balance in an amount of time proportional to kN, where k is the depth of the tree.
In my next blog entry, I will explain why it is so hard to solve the load balancing problem.