Friday, July 27, 2007

Supercomputing Course: Makefiles


If you write a small computer program, you can usually fit it all into one file. But when you start writing anything halfway significant, it's a good idea to split the program into pieces. For example, if I'm writing a program to optimize a function, I'd probably want to put the function evaluation code in one file, and the optimization code in another (or, depending on the complexity of the algorithm, several others). As the complexity of the program and the size of the code grows, it's easy to expand into several hundred (or even thousand) files spread across multiple directories.

Here's how a program is made, in a nutshell. First, you write some code in a computer language, such as C++. Then the compiler parses your code and turns it into an object file, which is just a fancy way of saying machine-readable code. C++ header files usually have the suffix .h and source files the suffix .cc; object files usually end in .o. Finally, the linker (which the compiler normally invokes for you) takes all the object files and links them together into an executable, which is what you run when you're running your program.

It's pretty simple to compile a small program in one file: you just type g++ -o myprog myprog.cc (where g++ is the name of the compiler and myprog is the name of the executable). It's even simple to compile a couple of files into a single program: you can just type g++ -c *.cc; g++ -o myprog *.o and that will do it. But when you have a bigger program and you require many files in many different directories, you run into a number of problems. First, you might have to link in libraries, or to pre-process certain files, and that can get very confusing. Second, the more files that need to be compiled, the longer it takes. Fortunately, each object file is built only from the source file it corresponds to (plus any headers that file includes), so if you change one source file and leave the others alone, you only need to remake the corresponding object file and relink, rather than recompile everything.

The make utility helps you preprocess certain files correctly, link in the proper libraries, and recompile only the source files that need to be recompiled, all through the use of a Makefile. In a Makefile, you provide the rules for compiling your code, and the make utility automatically figures out which files to recompile by comparing the timestamp of each object file with the timestamps of the source file(s) it depends on. So if my source file mycode.cc has been modified more recently than the corresponding object file mycode.o (or if mycode.o doesn't exist yet), then make will automatically recompile mycode.cc into mycode.o.
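As a rough preview (the rule syntax is explained further down), here is a sketch of a single hand-written rule that captures the timestamp idea. The header mycode.h is a hypothetical file I've added for illustration, and I'm assuming plain g++ as the compiler; the command line starts with a tab.

```make
# mycode.o depends on mycode.cc and on a hypothetical header, mycode.h.
# make reruns the command only if either file is newer than mycode.o,
# or if mycode.o does not exist yet.
mycode.o: mycode.cc mycode.h
	g++ -c mycode.cc -o mycode.o
```

With this in a Makefile, typing make mycode.o twice in a row should compile the first time and report that the target is up to date the second time, until you edit one of the two files.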

Makefiles follow a certain format. First, they are nearly always called Makefile (or makefile, although I prefer to capitalize it so it stands out better). You could name your Makefile Bob if you felt like it, but then you have to feed the file name to the make command with the -f option (make -f Bob), and it can get ugly. Also, if you're writing code that somebody else will ever need to decipher, then you should really stick with the standard names and just call it Makefile.

The contents of the Makefile are standardized too. Generally, the first thing you do is assign variable names (technically these are macros), like this:
CXX = mpicxx
FC = mpif90
LIBS = -lmylib -lm
INCLUDE_DIR = -I/home/rebecca/include
LIB_DIR = -L/home/rebecca/lib
CXX_FLAGS = -Wall -g -O0
EXEC = myprog
OBJS = myprog.o sub1.o sub2.o sub3.o \
        sub4.o
# this is a comment

In the first two lines, I create variables to alias my compilers. I do this because compilers have different names on different machines. So if I ever want to port my code over to a different machine, and the C++ compiler on that machine isn't called mpicxx, I can just change it on one line instead of having to do a global replace with vi.

On the next line, I list the libraries I want to link with my code. When you see a -l followed by some letters, e.g., -lmylib, it means that the linker needs to look for a library file by the name of libmylib (usually with a suffix of .a or .so). With the LIB_DIR variable, I'm also telling it to look in the directory /home/rebecca/lib in addition to the normal list of directories it goes through. The INCLUDE_DIR variable holds the directories in which I want the compiler to look for header files (again, in addition to the normal list of directories). The CXX_FLAGS variable holds some option flags that I want to pass to the compiler. The first one, -Wall, tells the compiler to report most common warnings (when I do something questionable but not bad enough to be a deal-breaker); -g tells the compiler to insert debugging information into the code so that when it breaks down I can use a debugger on it; and -O0 tells it to turn off optimizations (such as loop unrolling). EXEC holds the name of the executable(s) I'm compiling. OBJS is the list of object files that need to be created in order to make $(EXEC). (When we want the value of a variable, we wrap its name in $( ), as in $(EXEC), which tells make to substitute the variable's contents at that spot.)

Notice that at the end of the line starting with OBJS, I have a backslash. That represents continuation, meaning that make will go on to the next line and read it as part of the previous one. Something very important to remember is that the command lines in a Makefile (the indented lines under a target, which we'll get to in a moment) must always be indented with a tab, and never with the spacebar. make will give you weird, cryptic errors, such as GNU make's "missing separator", if you use spaces instead of tabs. In my experience, 99% of the errors in Makefiles are a result of using spaces instead of tabs. So learn from my experience, and don't make the same mistake!
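To see variable expansion in action, here is a minimal sketch of a throwaway target that prints two of the variables. The target name show is my own invention for illustration and isn't part of the Makefile we're building (targets themselves are explained just below), and the two command lines each start with a tab.

```make
CXX = mpicxx
CXX_FLAGS = -Wall -g -O0

# Hypothetical helper target, not part of the real build.
# The leading @ keeps make from echoing the command itself.
show:
	@echo "compiler: $(CXX)"
	@echo "flags: $(CXX_FLAGS)"
```

Typing make show should print the two values. Typing make show CXX=g++ should report g++ as the compiler instead, because a variable set on the command line overrides the assignment in the Makefile, which can be handy when you're porting to another machine.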

After the variable declarations comes the meat of the makefile. In this part, you'll use the variables you just defined to create rules for make to follow. The basic format is the following:
target: dependencies
	commands

So, we begin with the obvious. We want to make $(EXEC) (the target), which depends on $(OBJS) (the dependencies), and the command invokes the compiler to link the objects and the libraries together. Thus the next lines in our makefile are
myprog: $(OBJS)
	$(CXX) -o $(EXEC) $(LIB_DIR) $(OBJS) \
		$(LIBS)


Next we need some lines telling how we make the object files. These take a weird format, so bear with me.
.cc.o:
	$(CXX) $(INCLUDE_DIR) $(CXX_FLAGS) -c $< -o $@

.f.o:
	$(FC) -c $< -o $@

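A rule named .cc.o tells make how to turn any .cc file into the matching .o file, and likewise .f.o for Fortran files. So what's that crazy stuff at the end of the command lines? Well, @ and < are variables in make with a special meaning; $@ means the name of the current target, and $< means the name of the first dependency, which for these rules is the source file being compiled.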
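If you're using GNU make, the same ideas can also be written with a pattern rule instead of a suffix rule, and the link rule can use automatic variables too. The sketch below assumes GNU make and the variables defined at the top of our Makefile; it's an alternative spelling of the rules, not something you need in addition to them.

```make
# Pattern rule: build any foo.o from the matching foo.cc.
#   $<  expands to the first prerequisite (here, the .cc file)
#   $@  expands to the target (here, the .o file)
%.o: %.cc
	$(CXX) $(INCLUDE_DIR) $(CXX_FLAGS) -c $< -o $@

# Link rule rewritten with automatic variables.
#   $^  expands to all prerequisites (here, every file in $(OBJS))
$(EXEC): $(OBJS)
	$(CXX) -o $@ $(LIB_DIR) $^ $(LIBS)
```

The advantage of $@ and $^ in the link rule is that the names are written only once, so renaming the executable or adding an object file means touching just the variable definitions.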

There's another thing that it's very useful to have: a make clean rule. What that does is to wipe out your executable and all the object files, giving you a clean slate to restart your compilation. That rule looks like this:
clean:
	/bin/rm -f *.o $(EXEC)


(Look up rm using man if you're interested in what the -f flag does.)
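One hedge worth considering if you use GNU make: if a file named clean ever shows up in the directory, make will decide the clean target is already up to date and do nothing. Declaring the target phony avoids that. A sketch, assuming GNU make and the EXEC variable from earlier:

```make
# Tell make that "clean" is a command name, not a file to be built,
# so the rule runs even if a file called "clean" exists.
.PHONY: clean
clean:
	/bin/rm -f *.o $(EXEC)
```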

So, after you've written your Makefile and you want to compile your executable, you just type make myprog (or simply make, since myprog is the first target in the file) and away you go. And if you want to wipe it clean, just type make clean to do that.

