Friday, May 29, 2009

Dynamic Linking and Distributed Computing Don't Mix

Dynamic linking is one of the more frustrating aspects of distributed computing in the real world. It is the sort of technology that is meant to optimize the computer's happiness at the expense of the end user's sanity. Dynamic linking should really be avoided, except in a few very specific cases outlined below.

For those of you who don't remember, here is a brief primer on linking:

Back in the good old days, programmers would group commonly used functions (like printf and strlen) into a common module, otherwise known as a library. However, managing the library was difficult. If you simply compiled your library into the program, it would work, but your program would be full of unused code. The alternative was to cut-and-paste the needed routines into your program, but this was time-consuming, and led to many copies of the code that were difficult to keep synchronized. Frustration was the result.

The solution to this is a tool known as a link editor, or just linker. A linker looks at a program and a set of libraries, figures out all the pieces that are needed, and then constructs a complete executable program with only the routines that are actually needed. For example, suppose that main.o needs the routines in printf.o and baz.o. The linker figures out that those reside in libc.a and libstrange.a, and puts the whole thing together in prog.exe. This program can be copied to any other machine, and will run correctly. This is now known as static linking.

As machines grew larger, and had ever more programs and libraries installed, someone clever observed an inefficiency. Nearly every program requires printf, so a copy of the printf code was present in nearly every single program, wasting space in both the filesystem and virtual memory. Further, if someone fixed a bug or security flaw in printf, it was necessary to recompile everything.

To address these problems, dynamic linking was invented. In this model, the linker does not copy routines into the executable; it simply makes a note that the program depends upon a certain library. When the program is actually run, the loader binds the function calls in the program to the shared libraries on disk. Often, the executable program is very small, and simply consists of a few calls to a large number of libraries.
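The note that the linker leaves behind can be inspected directly. As a small illustration, assuming a Linux system with readelf (from binutils) installed and a dynamically linked /bin/ls, this prints one NEEDED entry for each shared library that the loader must find when the program starts:

```shell
# Each NEEDED line is a library name recorded by the linker, not a copy of its code.
readelf -d /bin/ls | grep NEEDED
```

The exact list varies from one distribution to the next, which is part of the problem discussed here.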

Now enter distributed systems. Suppose that you wish to take a program that you have written on one machine and run it on another machine. If you have employed static linking, it's easy: you simply copy the program over, and run it. If you have used dynamic linking, it's a real pain: you must identify all of the libraries that the program depends upon, copy them over, set some obscure environment variables, and then run the program.

Ironically, dynamic linking is less efficient than static linking in several ways. First, it actually ends up using more disk space, virtual memory, and network traffic, because you have to copy over the entire libraries, not just the parts that your program needs. (Of course, you can break the dynamic library up into smaller libraries, but then you are just making it harder on the programmer and user to identify the right libraries.) Second, it makes program startup very slow, especially on a distributed filesystem, because the loader must search for every single library in the search path.

For a nice example of how this can make a simple program ridiculously complicated, try the following two commands on Linux: ldd /bin/ls and strace /bin/ls. The former shows the libraries required to run the ls command, and the latter shows the hundreds of system calls needed just to start the program. Of course, a few hundred system calls isn't much by itself, but when you think of hundreds of users sharing a common file server, and every call to exec() results in this traffic, you can start to see why this might not be a good idea.
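To put rough numbers on that, the trace can be saved to a file and counted. This is only a sketch: it assumes strace is installed and that tracing is permitted on the machine, and the counts will differ from system to system.

```shell
trace=$(mktemp)

# -o writes the trace to a file, roughly one line per system call.
if strace -o "$trace" /bin/ls / >/dev/null 2>&1; then
    total=$(wc -l < "$trace")
    # Calls that tried to open a file, and calls that failed with "no such file".
    opens=$(grep -c -E '^open(at)?\(' "$trace" || true)
    missing=$(grep -c 'ENOENT' "$trace" || true)
    echo "lines in trace:          $total"
    echo "open/openat calls:       $opens"
    echo "failed with ENOENT:      $missing"
else
    echo "strace is not available or not permitted here"
fi
```

On a distributed filesystem, each of those opens, including the failed ones, can turn into a round trip to the file server.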

So, to sum up:

                           Static Linking    Dynamic Linking

On a Single Machine        Easy to use.      Easy to use.
                           Wastes space.     Saves space.

In a Distributed System    Easy to use.      Hard to use.
                           Saves space.      Wastes space.

My advice? Always use static linking, unless you are 100% sure that every single computer on the planet has the libraries that you need. That means, link dynamically against the standard C and math libraries, maybe against pthreads and X11, and statically against everything else.

Appendix: How to control linking with gcc.

To link everything in your program statically, use the -static flag:
gcc -static main.o -lstrange -lc -lm -o prog.exe

To link some libraries statically and some dynamically, use -Xlinker -Bdynamic and -Xlinker -Bstatic to switch between modes:
gcc main.o -Xlinker -Bstatic -lstrange -Xlinker -Bdynamic -lc -lm -o prog.exe

To see what dynamic libraries your program depends upon, use the ldd command:
% ldd /bin/ls
    => /lib/tls/ (0x00a99000)
    => /lib/tls/ (0x00cf8000)
    /lib/ (0x00a7f000)