Friday, July 3, 2009

Make as an Abstraction for Distributed Computing

In previous articles, I have introduced the idea of abstractions for distributed computing. An abstraction is a way of specifying a large amount of work so that it can be distributed across a large computing system. All of the abstractions I have discussed so far have a compact, regular structure.


However, many people have large workloads that do not have a regular structure. They may have one program that generates three output files, each consumed by another program, and then joined back together. You can think of these workloads as a directed graph of processes and files, like the figure to the right.

If each of these programs may run for a long time, then you need a workflow engine that will keep track of the state of the graph, submit the jobs for execution, and deal with failures. There exist a number of workflow engines today, but without naming names, they aren't exactly easy to use. The workflow designer has to write a whole lot of batch scripts, or XML, or learn a rather complicated language to put it all together.

We recently wondered if there was a simpler way to accomplish this. A good workflow language should make it easy to do simple things, and at least obvious (if not easy) how to specify complex things. For implementation reasons, a workflow language needs to state the data needs of an application clearly: if we know in advance that program A needs file X, then we can deliver it efficiently before A begins executing. If possible, it shouldn't require the user to know anything about the underlying batch or grid systems.

After scratching our heads for a while, we finally came to the conclusion that good old Make is an attractive workflow language. It is very compact, it states data dependencies nicely, and lots of people already know it. So, we have built a workflow system called Makeflow, which takes Makefiles, and runs them on parallel and distributed systems. Using Makeflow, you can take a very large workflow and run it on your local workstation, a single 32-core server, or a 1000-node Condor pool.
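For instance, the fan-out-and-join workflow sketched earlier can be written as an ordinary Makefile. The file names and programs below are hypothetical, chosen only to illustrate the shape of such a workflow:

```make
# Hypothetical example: one program splits an input into three parts,
# each part is processed independently, and the results are joined.

all : joined.data

part.1 part.2 part.3 : input.data split.exe
	./split.exe input.data

out.1 : part.1 process.exe
	./process.exe part.1 > out.1

out.2 : part.2 process.exe
	./process.exe part.2 > out.2

out.3 : part.3 process.exe
	./process.exe part.3 > out.3

joined.data : out.1 out.2 out.3 join.exe
	./join.exe out.1 out.2 out.3 > joined.data
```

Because each rule states exactly which files it reads and writes, a system like Makeflow can run the three process.exe rules in parallel on different machines, shipping only the files each one needs.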

What makes Makeflow different from previous distributed makes is that it does not rely on a distributed file system. Instead, it uses the dependency information already present in the Makefile to send data to remote jobs. For example, if you have a rule like this:

output.data final.state : input.data mysim.exe
	./mysim.exe -temp 325 input.data


then Makeflow will ensure that the input files input.data and mysim.exe are placed at the worker node before running mysim.exe. Afterwards, Makeflow brings the output files back to the initiator.

Because of this property, you don't need a data center in order to run a Makeflow. We provide a simple process called worker that you can run on your desktop, your laptop, or any other old computers you have lying around. The workers call home to Makeflow, which coordinates the execution on whatever machines you have available.
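To give a concrete feel for this, a session might look something like the following. The command names, options, and host names here are illustrative guesses, not the exact released interface; consult the Makeflow documentation for the real invocation:

```shell
# On the submitting machine: run the workflow, dispatching
# jobs to remote workers rather than a local batch system.
# (Illustrative; check the manual for the exact batch-type option.)
makeflow -T wq simulation.makeflow

# On each spare desktop or laptop: start a worker process
# that calls home to the Makeflow master.
worker master.example.edu 9123
```

The master hands out ready rules to whichever workers are connected, so adding more machines simply means starting more workers.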


You can download and try out Makeflow yourself from the CCL web site.

3 comments:

  1. I am curious how the data files move to the worker nodes. Does the makeflow master process transfer it directly from itself? Does it send a link to a central repository? Will it do anticipatory transferring for large jobs on dedicated clusters to avoid I/O clobbering?

    This makes me think of Map-Reduce, and I am curious whether the master workflow process has a monitoring system that can manage slower or failing jobs to decrease waiting for later jobs? Or would this be an extension more likely to be added for domain-specific applications?

  2. Makeflow works with a number of remote 'worker' processes. Each worker calls home to the Makeflow master, which then sends the needed files directly over that connection to the worker. So, the master controls all of the data transfers, and you don't need a shared filesystem.

    If you know in advance that all the tasks will take about the same amount of time, then you can cancel slow tasks that fall far outside the distribution. (See Fail Fast, Fail Often.) However, for Make in general, you don't know in advance how long each task will take, so you can't do much better than simply waiting for each task to succeed or fail on its own.

    I'll have to try makeflow. I have an environment where there are 3 to 4 levels of dependency to process. The final object sits and produces live data. Environmental changes trigger a rebuild stage that is a DAG, and I was planning to use parallel make, run periodically, to manage the whole thing.

    I was drawn to make for about the same reasons you state: it is well known and easy to program. I'd add to that: it's robust, stable, well-debugged software.

    I was going over the source code for gmake, though, and there's a bit of cruft in there.

    Ideally, I would love a make daemon, capable of reacting to changes in the file system and separating the three traditional stages of (1) constructing the dependency tree, (2) determining the required update trees, and (3) executing the resulting forest of updates. Ideally I would like all three stages as separate daemons, with the execution phase capable of mutex-locking portions of the dependency tree until they are completed.
