So, at work I work with some scientists in a field that rhymes with shmooclear shmisics. These people are very smart application scientists, but definitely not computer scientists. I love them because as long as they exist, I will always have a job. They write pretty miserable code, because that's really not their thing. They just want to do the science.
Half of my job with them is improving their code, but more importantly (if I want my efforts to not be wasted), I spend a lot of time developing a working relationship with them. You see, they are a kind of insular community, and don't generally trust some "hot shot" outsider who doesn't know the science. But I have been able to do a few simple things that have drastically improved the performance of their codes, so I think we are getting somewhere.
Most recently, I was profiling one of their codes, and discovered that it spent more than 50% of its time sorting. I looked at their sorting algorithm and saw that it was some homebrew sorting algorithm that was kind of like bubble sort (with computational complexity order N2, or the worst possible performance without doing something completely stupid). My guess is, they didn't know that some smart computer scientists had thought a lot about sorting algorithms and developed smart ways to sort; they probably just thought of how they would sort things and implemented that. So I replaced their Frankenstein sort with a heapsort algorithm (worst-case complexity N log N) and the sorting became an insignificant portion of the total runtime. Then, I showed my primary collaborator what I had done, and they discussed it in a meeting the next week. As it turned out, nobody knew why it was sorting; it was some legacy of an abandoned algorithm. I removed the sorting altogether and am in the process of doing a little benchmarking study.
It was pretty amazing, though, that this piece of code that nobody realized was being executed was taking up more than 50% of the test problem runtime, and more than 20% of the benchmark problem runtime!
The next bottleneck in their code is the I/O. They are reading the input in a very unintelligent way (all the processors opening the same file and reading it), so I plan to fix that for them when I get the chance.