Friday, May 11, 2007

Ask an Applied Mathematician, Supercomputing Paradigm Edition

Fearless reader Marius asked me what I meant by the "stagnating supercomputing paradigm" in the previous installment of "Ask an Applied Mathematician." It was a good question, one that deserved a good answer. What follows is a long answer, but I think it's a good one, too!

For my answer to make sense, we first need to discuss how the current computing paradigm came about. For the larger perspective, I rely on the expertise of Bill Camp, recently retired from Sandia National Laboratories, who visited our lab last year and gave a very eye-opening talk.

In the 1940's, John von Neumann laid the foundation of modern computing by proposing the von Neumann computer architecture. The idea is that the computer has a processor and a single storage component that holds both the program and the data, and both can be modified. This was a big change from earlier computers, which were more like calculators, in the sense that they had a single program hard-wired inside, and if you wanted to change the program, you had to rewire the machine. Now all you had to do was input a new program, and that was made even easier by the invention of FORTRAN.

In the 1960's, the vector processor was invented, which sped up computations by performing calculations on an array of numbers simultaneously. In the 1970's, Cray came out with a vector supercomputer, the Cray-1, which famously doubles as a couch (click on the link to see the Wikipedia article, including a picture). Also in the 1970's, DEC came out with the computer on a chip (a CPU, memory, and I/O lines all on one chip), which in the long term ended up revolutionizing the supercomputing industry.

But the vector-supercomputer honeymoon was rather short-lived. In the 1980's, Amdahl's law (a rule of thumb showing that the speedup of a program is constrained by the fraction of the program that must run serially) was interpreted as a very pessimistic law of diminishing returns. Camp described the bleak outlook of the times: "Parallelize = paralyze."
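To see why people read Amdahl's law so pessimistically, here is a minimal sketch, in Python, of the speedup it predicts. The 5% serial fraction and the processor counts are numbers I made up purely for illustration:

    # Amdahl's law: if a fraction s of a program must run serially, the
    # best possible speedup on N processors is 1 / (s + (1 - s) / N).
    def amdahl_speedup(serial_fraction, n_procs):
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

    # Even a modest 5% serial fraction caps the speedup below 20x,
    # no matter how many processors you add (the limit is 1 / 0.05 = 20).
    for n in (2, 8, 64, 1024, 100000):
        print(n, "processors:", round(amdahl_speedup(0.05, n), 1), "x speedup")

Run those numbers and you'll see that going from 1,024 processors to 100,000 barely moves the needle, which is exactly the gloom behind "parallelize = paralyze."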

Obviously today, nothing of the sort is true. The high-performance computing (HPC) industry is booming, and developing the biggest, fastest machines has become a national security priority. So what happened to change this?

What happened was that some "disruptive technologies" came along and revitalized the industry. MPP (massively parallel processing) was a revolutionary idea that was mostly dismissed by the industry. But Sandia was willing to take a chance on it. They invested in a 1024-processor nCube, and the benchmarking numbers were so amazing that several prominent scientists accused them of fudging their results! But the numbers were real.

So this is the first reason I think we need to be aware that the predominant programming paradigm may not last: another disruptive technology may come along and displace the current one. Indeed, the high-performance computing world is aware of this, and at SC '07 they are holding major events centered on disruptive technologies.

As it turned out, "inner loop" parallelism (the parallelism exploited by a vector processor) really is limited by Amdahl's law. But "outer loop" parallelism, which can be exploited through distributing the work over multiple processors that do not share memory, is much more forgiving. So eventually those prominent scientists saw that Sandia was not cheating, and the MPP paradigm grew to dominate the field.
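To make the distinction concrete, here is a toy Python sketch of my own (not anything Sandia actually ran). The arithmetic inside simulate_cell is the kind of "inner loop" a vector processor accelerates, while the loop over independent cells is the "outer loop" that an MPP machine spreads across processors that share no memory:

    from multiprocessing import Pool

    def simulate_cell(cell_id):
        # "Inner loop": element-wise arithmetic on one cell's data.
        # This is the part a vector unit would chew through in lockstep,
        # and it is where Amdahl-style limits bite.
        values = [0.001 * i for i in range(1000)]
        return cell_id + sum(v * v for v in values)

    if __name__ == "__main__":
        # "Outer loop": the cells are independent, so the work can be
        # farmed out to separate workers with almost no serial portion.
        with Pool(processes=4) as pool:
            results = pool.map(simulate_cell, range(64))
        print("processed", len(results), "independent cells")

The outer loop has essentially no serial part, so Amdahl's law has very little to object to there.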

Continuing along the timeline of computing history: in the early 1990's a lot of supercomputer companies went under, and there was a Great Depression in the HPC industry. Out of desperation, some people started hooking a bunch of computers running Linux together into what became known as a Beowulf cluster. As Linux matured and the price of commodity parts came down, this architecture became more and more viable and affordable.

In 2002, a single event revitalized the American HPC industry: the behemoth Earth Simulator machine came online. It was a Japanese project that somehow escaped American notice, despite its five-year development plan and $350 million price tag. I attended SC 2002, and I remember hearing the uproar from my colleagues when the top spot on the Top 500 list was claimed by a non-American machine, and by an enormous margin at that. It was at that point that we in this country took up the gauntlet to develop faster machines. Cray launched the X1, and IBM announced plans for Blue Gene, a very innovative machine that holds the top spot on the current Top 500 list.

Today's machines are based on the same concepts as the earlier MPP machines. The major differences are that today we have faster processors, faster interconnects, and better software, and we use many more processors and interconnects per machine. Sandia's 1024-processor nCube was huge at the time; just yesterday I was working on a machine with more than 20,000 processors; and the top machine on the current Top 500 list has more than 100,000 processors! The power it takes to run these machines could power a small city. The cooling system for my lab's machine room requires all the water from a water main the same diameter as the one that supplies my town of approximately 25,000 people. How much bigger can we go?

Power and water constraints aside, there are limits to how much more we can improve these big machines' hardware and software. Moore's law says that computing power roughly doubles every 18 months, but there is a limit to that, because processors are constrained by atomic dimensions: you can shrink a processor's features only so far, and certainly not below the size of a couple of atoms. Similarly, we have good interconnects now, and optical interconnects are hitting the market soon, but ultimately we are constrained by three-dimensional geometry and the speed of light. I was trying to avoid talking about memory and I/O, but those are also troublesome parts of the equation. Bigger, faster machines invite bigger problems, which produce bigger outputs. Some programs, such as climate models, produce outputs on the order of tens or even hundreds of terabytes, and we have to somehow get those huge outputs onto disk. The I/O problem is the biggest bottleneck in today's machines, and there is really no scalable solution.
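For a rough sense of the I/O problem, here is a back-of-the-envelope calculation in Python. The 100-terabyte output and the 10 GB/s aggregate file-system bandwidth are assumptions for illustration, not measurements from any particular machine:

    # Hypothetical numbers, for illustration only.
    output_terabytes = 100.0          # e.g., one large climate-model run
    aggregate_bandwidth_gb_s = 10.0   # assumed parallel file system rate

    seconds = output_terabytes * 1024.0 / aggregate_bandwidth_gb_s
    print("writing the output takes about", round(seconds / 3600.0, 1), "hours")

Even under these generous assumptions the machine spends close to three hours just writing results to disk, and adding more compute processors does nothing to shorten that.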

Optimizing the operating system, libraries, and compiler is another way to squeeze better performance out of a machine. IBM's Blue Gene project was revolutionary in that respect: instead of a full-blown Linux, the compute nodes ran a stripped-down, highly optimized lightweight kernel. Cray did a similar thing with their Catamount operating system, and these tweaks have squeezed a lot of performance out of these big machines. But eventually, there won't be any more tweaks to make. And we'll be stagnating.

Even now, it's hard to scale programs up to the level of 10,000 processors. A program that scales well from four to eight processors, for example (meaning the run time on 8 processors is half the run time on 4), doesn't always scale well beyond that. Typically, as we increase the number of processors running a program, the program scales well at first, but then the performance gain trails off, and at some point it may even lose performance. Even embarrassingly parallel programs run into this problem, because using more processors translates into doing more communication, especially as a means of overcoming the I/O bottleneck I described earlier. As computers get bigger and bigger, it will become even harder to take advantage of the state of the art.
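Here is how that trail-off is usually quantified: parallel efficiency is the observed speedup divided by the ideal speedup. The run times in this Python sketch are hypothetical, chosen only to show the typical pattern:

    def parallel_efficiency(base_procs, base_time, procs, time):
        ideal_speedup = procs / float(base_procs)
        observed_speedup = base_time / time
        return observed_speedup / ideal_speedup

    # Hypothetical run times (in seconds) for one problem on more processors.
    runs = {4: 1000.0, 8: 510.0, 64: 80.0, 1024: 9.0, 10000: 3.5}
    for procs, time in sorted(runs.items()):
        eff = parallel_efficiency(4, runs[4], procs, time)
        print(procs, "processors: efficiency", str(round(100 * eff)) + "%")

In this made-up example the program looks great at 8 processors, is respectable at 64, and is wasting almost 90 percent of the machine at 10,000, which is exactly the wall people keep hitting.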

So, now would be a perfect time for a disruptive technology to enter from left field. It could be quantum computing: with a quantum computer, a single processor could, in principle, do what today takes a two-acre machine room and enough power and water for a small city. Will researchers get quantum computing working, or will some currently unknown disruptive technology save us?

Got a question for the Applied Mathematician? Leave it as a comment, or e-mail me!
