Tuesday, January 25, 2005

Adventures in Benchmarking

Last night I slept for nearly twelve hours, which helped me shake whatever Jeff was trying to give me. For now, anyhow, I think I am feeling pretty healthy.

It was good to finish up that fellowship application and now be able to concentrate on my research. Right now I'm doing a couple of particularly boring things, the most boring of which is performance evaluation.

You see, I wrote a program that I am hoping is scalable, meaning that I hope that if you use twice as many processors on a parallel computer to do the same amount of work, it should take half the time. And not only that, but this relationship should apply across many different numbers of processors, meaning that the time it takes for 256 processors to do the work should be one-eighth of the time it takes for 32 processors, because 32/256 is 1/8. Generally you test the performance of a program using powers of two for the numbers of processors, plus a few intermediate values. So for my program, I have run it with 16, 24, 32, 48, 64, 96, 128, and 256 processors. Doing these performance runs is called benchmarking your code.
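To make the arithmetic concrete, here is a little sketch (with a made-up baseline time, not my actual data) of what perfect scaling would predict for each of the processor counts I ran:

```python
# Toy illustration of ideal "strong scaling": doubling the processor
# count should halve the run time, so the ideal time at p processors
# is baseline_time * baseline_procs / p.
baseline_procs = 32
baseline_time = 48.0  # hours -- a made-up number, just for illustration

for procs in [16, 24, 32, 48, 64, 96, 128, 256]:
    ideal_time = baseline_time * baseline_procs / procs
    print(f"{procs:4d} procs -> ideal time {ideal_time:6.2f} h")
```

Note that 256 processors comes out to 48 * 32/256 = 6 hours, one-eighth of the 32-processor time, just like in the example above.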

The problem is that you have to do at least three runs for each number of processors, to make sure that you consistently get the same timings. And some runs take over 48 hours to complete, so as you can imagine, this is a time-consuming process.
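The consistency check itself is simple; a hypothetical helper (my own sketch, not any standard tool) might look like this, accepting a set of repeated timings only if they all fall within, say, 5% of their mean:

```python
# Hypothetical consistency check for repeated benchmark runs:
# accept the timings only if every run is within `tolerance`
# (a fraction of the mean) of the mean run time.
def timings_consistent(times, tolerance=0.05):
    mean = sum(times) / len(times)
    return all(abs(t - mean) / mean <= tolerance for t in times)

print(timings_consistent([47.2, 48.1, 47.8]))  # close together -> True
print(timings_consistent([47.2, 48.1, 60.0]))  # one outlier -> False
```

If the check fails, you throw out the runs and do more, which is exactly why this eats so much time on 48-hour jobs.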

I finished my first runs on the NCSA supercomputer just last week. It took about a month to complete them all, because of the long wait time on that machine. What happens is that you put your job into a queue, and the scheduler program on the computer decides which jobs should be run in which order. The problem of queue packing is very hard, and essentially unsolvable exactly, so the scheduler uses heuristics (rules of thumb) to make these decisions. The heuristics on the particular machine I have been using are unfavorable to people like me who run relatively long jobs on relatively small numbers of processors. I got all my 256-processor jobs done before even one 16-processor job had completed.
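To illustrate the kind of heuristic I mean, here is a toy priority rule (entirely made up, not NCSA's actual policy) that ranks wide jobs ahead of narrow ones and short jobs ahead of long ones, which is exactly the situation that leaves a 16-processor, 48-hour job at the back of the line:

```python
# A made-up scheduler priority heuristic: higher processor counts
# raise a job's priority, and longer requested walltimes lower it.
jobs = [
    {"name": "mine-16",  "procs": 16,  "hours": 48},
    {"name": "mine-256", "procs": 256, "hours": 4},
    {"name": "other-64", "procs": 64,  "hours": 12},
]

# Sort by (-procs, hours): wide jobs first, and among equal widths,
# shorter jobs first.
ordered = sorted(jobs, key=lambda j: (-j["procs"], j["hours"]))
print([j["name"] for j in ordered])
# -> ['mine-256', 'other-64', 'mine-16']
```

Under a rule like this, my narrow, long jobs always sink to the bottom of the queue, which matches what I saw.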

Now I am running jobs on the new Apple cluster that my advisor was instrumental in bringing to campus. He went around and got a lot of donations from different departments and divisions in the University to pay for a cluster computer made out of Apple G5's. So since this is his baby, he wants me to run my stuff on it too, in addition to the NCSA supercomputer on which we received an allocation. The cluster is somewhat unstable, though, and sometimes aborts my jobs for no reason at all.

The results on the NCSA supercomputer show that my algorithm is scalable to 256 processors. You can see this if you look at a graph of the run time vs. number of processors on a log-log scale (essentially comparing the order of magnitude of each number). If the graph is basically a straight line sloping downward, that means your program is scalable. My graph was a straight line sloping downward, so my program is scalable.
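The straight-line test can be made quantitative: fit a line to log(time) vs. log(processors), and a slope near -1 means ideal scaling. A minimal sketch, using made-up numbers that halve perfectly (not my real timings):

```python
import math

# Toy timings: run time exactly halves each time processors double.
procs = [16, 32, 64, 128, 256]
times = [96.0, 48.0, 24.0, 12.0, 6.0]

xs = [math.log(p) for p in procs]
ys = [math.log(t) for t in times]

# Least-squares slope of log(time) vs. log(procs).
# A slope of -1 corresponds to ideal scaling.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
print(round(slope, 3))  # -1.0 for these ideal numbers
```

Real timings never land exactly on the line, of course; the question is how close the fitted slope gets to -1 before it starts flattening out.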

I hope my description of benchmarking made sense. Any questions?

1 comment:

Anonymous said...

So how does your ability to benchMARK relate to your ability to benchPRESS? Could you lift that many computers?