Thursday, October 8, 2009

On Programming With Processes, Part II

One of the biggest challenges in building computer systems is finding a way to make things simpler. Any propeller-head can make a piece of software more complicated. Unfortunately, our industry seems to have a way of gravitating toward the complex. Let's look at the current state of the web browsers -- pick any one -- which seem to insist upon reimplementing or otherwise abusing the operating system.

Exhibit 1: About 2003, tabbed browsing is heralded as the wave of the future, and every web browser re-writes itself from scratch to support tabs and issues gushing press releases. What is a tab? Well, it's a way to switch between multiple running programs, each with its own title and visual space. Which is to say... it's like having windows! Except it's worse than having windows, it's like the old awful Multiple Document Interface, which even Microsoft now admits confused the heck out of everyone.

The funny thing is, you can achieve exactly the same behavior by dragging your taskbar to the top of the screen, like this:

Exhibit 2: You cannot run the latest version of Netscape (a.k.a Mozilla, Firefox, SeaMonkey, IceWeasel, good grief...) if your home directory is on a distributed file system. Never mind that putting your home directory on a shared filesystem is the normal practice in 90% of the industrialized world, where the user of the machine works for an organization that keeps important documents on a central server.

Apparently, Firefox uses an embedded database to store your preferences, bookmarks, cache, etc, and it cannot tolerate multiple simultaneous access. So, if you try to run multiple instances at once, it has to be clever enough to find the running copy and tell it to open a new window. If it cannot find it because the other copy is running in another console or on another machine, you get this ridiculous message:



Exhibit 3: Google Chrome is supposed to be the re-invention of the web browser, except simpler and more robust. Instead of threads, it uses this new-fangled technology called "processes" instead of those old gnarly threads. So far, so good. Then Firefox decides to get on this bandwagon.

Unfortunately, Firefox is missing the point entirely. The plan is to break the UI that controls all the windows into one process, and the plugins, parsers, renderers, etc into separate processes. It should come as no surprise that this makes things even more complicated, because the various pieces have to communicate with each other. More subtly, it makes the failure semantics really strange: if a helper process dies, one window will fail, but if the UI process dies, a whole bunch of windows will fail. If you look at the set of running processes, you are going to see an unpredictable number of processes with names that have no relation to what you are actually doing.

Everyone seems to have missed a ridiculously simple solution to all of these problems: Run each browser window in a separate process. You don't have to separate out all of the complex plugins, renderers, and so forth, because if one crashes, it will only take down that window. Furthermore, to open a new browser page in any context, all you have to do is fork() and exec("browser http://") and the operating system takes care of the rest.

See also: On Parallel Programming with Processes

Thursday, October 1, 2009

Partly Cloudy with a Chance of Condor

We have been thinking about cloud computing quite a bit over the last month. As I noted earlier, cloud computing is hardly a new idea, but it does add a few new twists on some old concepts in distributed systems. So, we are spending some time to understand how we can take our existing big applications and make them work with cloud systems and software. It should come as no surprise that there are a number of ways to use Condor to harness clouds for big applications.
Two weeks ago, I gave a talk titled Science in the Clouds at an NSF workshop on Cloud Computing and the Geosciences. One of the points that I made was that although clouds make it easy to allocate new machines that have exactly the environment you want, they don't solve the problem of work management. That is, if you have one million tasks to do, how do you reliably distribute them between your workstation, your campus computer center, and your cloud workforce? For this, you need some kind of job execution system, which is largely what grid computing has focused on:

As it stands, Condor is pretty good at managing work across multiple different kinds of systems. In fact, today you can go to a commercial service like Cycle Computing, who can build an on-demand Condor pool by allocating machines from Amazon:


Just today, we hosted Dhruba Borthakur at Notre Dame. Dhruba is the project lead for the open source Apache Hadoop system. We are cooking up some neat ways for Condor and Hadoop to play together. As a first step, one of my students Peter Bui has cooked up a module for Parrot that talks to HDFS, the Hadoop file system. This allows any Unix program -- not just Java -- talk to HDFS, without requiring the kernel configuration and other headaches of using FUSE. Then, you can submit your jobs into a Condor pool and allow them to access data in HDFS as if it were a local file system. The next step is to co-locate the Condor jobs with the Hadoop data that they want to access.
Finally, if you are interested in cloud computing, you should attend CCA09 - Cloud Computing and Applications - to be held in Chicago on October 20th. This will be a focused, one day meeting with speakers from industry, academia who are both building and using cloud computers.

Monday, August 3, 2009

REU Project: BXGrid

This post continues last week's subject of summer REU projects.

Rachel Witty and Kameron Srimoungchanh worked on BXGrid, our web portal and computing system for biometrics research. This project is a collaboration between the Cooperative Computing Lab and the Computer Vision Research Lab at Notre Dame. Hoang Bui is the lead graduate student on the project. Rachel and Kameron added a bunch of new capabilities to the system; I'll show three examples today.

The first is the ability to handle 3-D face scans taken by a specialized camera equipped with a laser rangefinder. The still picture here doesn't quite do it justice, because each white "mask" on the left is a rotating animation of the face. By integrating this data into BXGrid, the 3-D data can be validated against previous ordinary images of the face.

I previously discussed All-Pairs problems, which are common in biometrics. While we already had the ability to run very large All-Pairs, problems, we never had the capability to view the results easily. Now, with the click of a button, you can set up a small All-Pairs problem and view the results on the portal:


Currently, new data ingested into the system is validated manually by people who must visually check that a eye, face, or whatever matches existing data in the system. Although this can be divided up among a large team of people, it is still time consuming and error prone.

Kameron and Rachel built a system that does a first pass at this task automatically. Using Makeflow, they set up a system to export all newly ingested images along with five good images that should match. This results in thousands of jobs sent to our Condor pool, which transform and compare the images. When all the results come back, you get a nice web page that summarizes the images and the results:


This research was supported in part by the National Science Foundation via grant NSF-CCF-0621434.

Tuesday, July 28, 2009

REU Project: Biocompute

This summer, we hosted four REU students who contributed to two web portals for distributed computing: Biocompute and BXGrid. I'll write about one this week and the other next week.

REU students Ryan Jansen and Joey Rich worked with recent grad Rory Carmichael on Biocompute, our web portal and computing system for bioinformatics research. Biocompute was originally created by Patrick Braga-Henebry for his B.S. honors thesis, and we are now putting it into production in collaboration with the Bioinformatics Core Facility at Notre Dame.

Biocompute allows researchers at Notre Dame to run standard bioinformatics tools like BLAST, and then share and manage the results. The new twist is that we transparently parallelize the tasks and run them on our campus Condor pool. This allows people to run tasks that were previously impossible: we routinely run workloads that would take months on a single machine, but get completed in hours on Biocompute.

The user simply fills out a form specifying the query, genomic databases, and so forth:

Biocompute transforms the request into a large Makeflow job that looks like this:


Users and administrators can view the progress of each job:


When the task is complete, you can browse the results, download them, or feed them into another tool on the web site:

This work was sponsored in part by the Bioinformatics Core Facility and the National Science Foundation under grant NSF-06-43229.

Friday, July 3, 2009

Make as an Abstraction for Distributed Computing

In previous articles, I have introduced the idea of abstractions for distributed computing. An abstraction is a way of specifying a large amount of work in a way that makes it possible to be distributed across a large computing system. All of the abstractions I have discussed so far have a compact, regular structure.


However, many people have large workloads that do not have a regular structure. They may have one program that generates three output files, each consumed by another program, and then joined back together. You can think of these workloads as a directed graph of processes and files, like the figure to the right.

If each of these programs may run for a long time, then you need a workflow engine that will keep track of the state of the graph, submit the jobs for execution, and deal with failures. There exist a number of workflow engines today, but without naming names, they aren't exactly easy to use. The workflow designer has to write a whole lot of batch scripts, or XML, or learn a rather complicated language to put it all together.

We recently wondered if there was a simpler way to accomplish this, A good workflow language should make it easy to do simple things, and at least obvious (if not easy) how to specify complex things. For implementation reasons, a workflow language needs to clearly state the data needs of an application: if we know in advance that program A needs file X, then we can deliver it efficiently before A begins executing. If possible, it shouldn't require the user to know anything about the underlying batch or grid systems.

After scratching our heads for a while, we finally came to the conclusion that good old Make is an attractive worfklow language. It is very compact, it states data dependencies nicely, and lots of people already know it. So, we have built a workflow system called Makeflow, which takes Makefiles, and runs them on parallel and distributed systems. Using Makeflow, you can take a very large workflow and run it on your local workstation, a single 32-core server, or a 1000-node Condor pool.

What makes Makeflow different from previous distributed makes is that is does not rely on a distributed file system. Instead, it uses the dependency information already present in the Makefile to send data to remote jobs. For example, if you have a rule like this:

output.data final.state : input.data mysim.exe
./mysim.exe -temp 325 input.data


then Makeflow will ensure that the input files input.data and mysim.exe are placed at the worker node before running mysim.exe. Afterwards, Makeflow brings the output files back to the initiator.

Because of this property, you don't need a data center in order to run a Makeflow. We provide a simple process called worker that you can run on your desktop, your laptop, or any other old computers you have lying around. The workers call home to Makeflow, which coordinates the execution on whatever machines you have available.


You can download and try out Makeflow yourself from the CCL web site.

Tuesday, June 2, 2009

Grid Heating: Putting Data Center Heat to Productive Use

Dr. Paul Brenner, a research scientist in the Computing Research Center at the University Notre Dame, has been advocating a novel idea called grid heating. He recently won a "Green IT Award" from the Uptime Institute for his work. Here is a short introduction to the idea:

Around the world, large data centers consume enormous amounts of power. In addition to the energy needed to spin disks and rearrange electrons, an approximately equal amount of power is needed to run the air conditioners and fans to remove that heat from the data center. In this sense, data centers are doubly inefficient, because they are using power to both heat and cool the same space. If we could put that heat to productive use, then we could save energy on cooling the data center, as well as save energy that would have otherwise been used to generate heat.


Last year. Dr. Brenner constructed a prototype of this idea at the city greenhouse in South Bend, which was struggling with enormous heating bills during the winter. He constructed a small cluster, and placed it in the Arizona Desert display in the greenhouse, where the plants need the highest temperature. Notre Dame paid the electricity bill, the greenhouse got the benefit of the heat, and the computers simply joined our campus Condor pool. Everybody wins, and nobody has to pay an air conditioning bill.



However, the first cluster was just a prototype, and couldn't generate nearly enough heat for the entire greenhouse. So, this year, Dr. Brenner is building a small data center in a modular shipping container. next to the greenhouse. With a new electricity and network hookup, the data center will run several hundred CPUs, and function as a secondary furnace for the facility, hopefully reducing the heating bill by half over the winter.



The new facility will significantly add to our campus grid, and will also give us some interesting scheduling problems to work on. The greenhouse needs heat the most during the winter, and to a lesser extent during the summer, so the computing capacity of the system will change with the seasons. Further, the price of electricity varies significantly during the day, so jobs run in the dead of night may be cheaper than those run during the day. If we can connect our "campus grid" to the "smart electric grid", we can make the system automatically schedule around these constraints.

Here are some recent articles about Grid Heating:

Friday, May 29, 2009

Dynamic Linking and Distributed Computing Don't Mix



Dynamic linking is one of the more frustrating aspects of distributed computing in the real world. It's is the sort of technology that is meant to optimize the computer's happiness at the expense of the end user's sanity. Dynamic linking should really be avoided, except in a few very specific cases outlined below.

For those of you who don't remember, here is a brief primer on linking:

Back in the good old days, programmers would group commonly used functions (like printf and strlen) into a common module, otherwise known as a library. However, managing the library was difficult. If you simply compiled your library into the program, it would work, but your program would be full of unused code. The alternative was to cut-and-paste the needed routines into your program, but this was time consuming, and led to many copies of the code that were difficult to synchronize. Frustration was the result.

The solution to this is a tool known as a link editor or just linker. A linker looks at a program and a set of libraries, figures out all the pieces that are needed, and then constructs a complete executable program with only the routines that are actually needed. In the example below, suppose that main.o needs to use the functions printf.o and baz.o. The linker figures out that those reside in libc.a and libstrange.a, and puts the whole thing together in prog.exe. This program can be copied to any other machine, and will run correctly. This is now known as static linking.

As machines grew larger, and had ever more programs and libraries installed, someone clever observed an inefficiency. Nearly every program requires printf, so a copy of the printf code was present in nearly every single program, wasting space in both the filesystem and virtual memory. Further, if someone fixed a bug or security flaw in printf, it was necessary to recompile everything.

To address these problems, dynamic linking was invented. In this model, the linker does not copy routines into the executable, it simply makes a note that the program depends upon a certainly library. When the program is actually run, the loader binds the function calls in the program to the shared libraries on disk. Often, the executable program is very small, and simply consists of a few calls to a large number of libraries.


Now enter distributed systems. Suppose that you wish to take a program that you have written on one machine, and run it on another machine? If you have employed static linking, it's easy: you simply copy the program over, and run it. If you have used dynamic linking, it's a real pain: you must identify all of the libraries that the program depends upon, copy them over, set some obscure environment variables, and then run the program.

Ironically, dynamic linking is less efficient than static linking in several ways. First, it actually ends up using more disk space, virtual memory, and network traffic, because you have to copy over the entire libraries, not just the parts that your program needs. (Of course, you can break the dynamic library up into smaller libraries, but then you are just making it harder on the programmer and user to identify the right libraries.) Second, it makes program startup very slow, especially on a distributed filesystem, because the loader must search for every single library in the search path.

For a nice example of how this can make a simple program ridiculously complicated, try the following two commands on Linux: ldd /bin/ls and strace /bin/ls . The former shows the libraries required to run the ls command, and the latter shows the hundreds of system calls needed to just start the program. Of course, a few hundred system calls isn't much by itself, but when you think of hundreds of users sharing a common file server, and ever call to exec() results in this traffic, you can start to see why this might not be a good idea.

So, to sum up:

Static LinkingDynamic Linking
On A Single
Computer
Easy to use.
Wastes space.
Easy to use.
Saves space.
In a Distributed
System

Easy to use.
Saves space.

Hard to use.
Wastes space.


My advice? Always use static linking, unless you are 100% sure that every single computer on the planet has the libraries that you need. That means, link dynamically against the standard C and math libraries, maybe against pthreads and X11, and statically against everything else.



Appendix: How to control linking with gcc.


To link everything in your program statically, use the -static flag:
gcc -static main.o -lstrange -lc -lm -o prog.exe

To link some libraries statically and some dynamically, use -Xlinker -Bdynamic and -Xlinker -Bstatic to switch between modes:
gcc main.o -Xlinker -Bstatic -lstrange -Xlinker -Bdynamic -lc -lm -o prog.exe

To see what dynamic libraries your program depends upon, use the ldd command:
% ldd /bin/ls
libc.so.6 => /lib/tls/libc.so.6 (0x00a99000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0x00cf8000)
/lib/ld-linux.so.2 (0x00a7f000)