Adventures in Applied Math: Computational Conversations

Sunday, September 25, 2011

Computational Conversations

(based on actual conversations I have had with domain scientists...)

Solving a scientific problem? I can help!

Is it a partial differential equation? I know something about that. I've solved Schrödinger's equation in many forms. Advection and diffusion I know well. Choose your finiteness: finite elements, finite differences, or finite volumes? Each one has its own advantages and pitfalls.

Linear algebra? Use an optimized BLAS library, not the reference library! Our vendor has a group of applied mathematicians who developed these math libraries just for you. Don't let all their effort go to waste! Eigenvalues -- you don't need to orthogonalize at every step, just every so often. Maybe it will require a few more iterations, but the iterations will be so much cheaper...

Optimization? Yes! Finally something I am an expert in! Okay, so what type of problem are you solving? Is it convex? Is the objective function cheap to evaluate? How many parameters? Is it a mixed-integer program? Actually, I think your problem could be reformulated as a combinatorial optimization problem and we could try an evolutionary algorithm...

Oh, you meant code optimization. Yes, I can do that too. See these loops? Since it's Fortran, we need to reorder the loops so that the array is accessed by its first index in the innermost loop, not its second. See, we just got a 4x performance boost just by swapping these indices. Okay, let's change this array to a function, because it is easily calculated and the function can be inlined whereas that memory lookup can't. And let's eliminate these "if" statements by dividing this loop into two parts.

Your job fails at 120,000 MPI processes, but not below that? What happens when it fails? How long does it take to get to the point of failure? What is the code doing when it fails? Does it always happen in the same place? Let me ask my sysadmin colleague to look in the top-secret log files that only they can view...

Does this computation depend on that one, or can it be done in any order? Is there any reason to keep this data after initialization? What if we used this framework? Asynchronously spawning new tasks would implicitly load balance this algorithm. Your algorithm is not scalable. The synchronizing you do will be a huge bottleneck as you scale up. It may work okay now, but at the petascale or beyond there will be scalability issues coming out of the woodwork, things even beyond this synchronization issue. Trust me, I have seen this happen even in my own codes. Let my pain be your gain!

I may not be a scientist in your field, but I know a lot of things that can help you. I've boosted the performance of codes by a factor of two with a single keystroke. I may not have seen your problem before, but I've seen something like it.