Tuesday, September 21, 2010

Adventures in Little Things Making a Big Difference

So, at work I work with some scientists in a field that rhymes with shmooclear shmisics.  These people are very smart application scientists, but definitely not computer scientists.  I love them because as long as they exist, I will always have a job.  They write pretty miserable code, because that's really not their thing.  They just want to do the science.

Half of my job with them is improving their code, but more importantly (if I want my efforts to not be wasted), I spend a lot of time developing a working relationship with them.  You see, they are a kind of insular community, and don't generally trust some "hot shot" outsider who doesn't know the science.  But I have been able to do a few simple things that have drastically improved the performance of their codes, so I think we are getting somewhere.

Most recently, I was profiling one of their codes, and discovered that it spent more than 50% of its time sorting.  I looked at their sorting algorithm and saw that it was some homebrew sorting algorithm that was kind of like bubble sort (with computational complexity order N2, or the worst possible performance without doing something completely stupid).  My guess is, they didn't know that some smart computer scientists had thought a lot about sorting algorithms and developed smart ways to sort; they probably just thought of how they would sort things and implemented that.  So I replaced their Frankenstein sort with a heapsort algorithm (worst-case complexity N log N) and the sorting became an insignificant portion of the total runtime.  Then, I showed my primary collaborator what I had done, and they discussed it in a meeting the next week.  As it turned out, nobody knew why it was sorting; it was some legacy of an abandoned algorithm.  I removed the sorting altogether and am in the process of doing a little benchmarking study.

It was pretty amazing, though, that this piece of code that nobody realized was being executed was taking up more than 50% of the test problem runtime, and more than 20% of the benchmark problem runtime!

The next bottleneck in their code is the I/O.  They are reading the input in a very unintelligent way (all the processors opening the same file and reading it), so I plan to fix that for them when I get the chance.

2 comments:

Laura said...

Woo, Becca! Clean HOUSE, girl! I love it. :-)

(I meant that figuratively, not literally or gender-pejoratively.)

Amy K said...

So cool!! We all have to specialize, because, as one of my colleagues put it recently, you can only spend 6 years in grad school once. But this is the second example I've seen recently of someone from an outside field making serious improvements in the insider's methods, using things the insiders had no idea existed! (The other case involved some serious group theory. Group theory is everywhere!)