Prof. Douglas Thain: makeflow

Showing posts with label makeflow. Show all posts

Monday, February 6, 2012

Some Open Computer Science Problems in Workflow Systems

In the previous article, I extolled the virtues of Makeflow, which has been very effective at engaging new users and allowing them to express their workflows in a way that facilitates parallel and distributed computing. We can very consistently get new users going from one laptop to 100 cores in a distributed system very easily.

However, as we develop experience in scaling up workflows to thousands of cores across wide area distributed systems, a number of interesting computer science challenges have emerged. These problems are not specific to Makeflow, but can be found in most workflow systems:

Capacity Management
Just because a workflow expresses thousand-way concurrency doesn't mean that it is actually a good idea to run it on one thousand nodes! The cost of moving data to and from the execution nodes may outweigh the benefit of the added computational power. If one uses fewer nodes than the available parallelism, then it may be possible to pay the data movement cost once, and then exploit it multiple times. For most workflows, there is a "sweet spot" at which performance is significantly maximized. Of course, users don't want to discover this by experiment, they need some tool to recommend an appropriate size for the given workflow.

Software Engineering Tools
A workflow is just another kind of program: it has source code that must be managed, dependencies that must be gathered, and a history of modification to be tracked. In this sense, we are badly in need of tools for manipulating workflows in coherent ways. For example, we need a linker that can take a workflow, find all the dependent components, and gather them together in one package. We need a loader that can take an existing workflow, load it into a computing system, and then update file names and other links to accomodate it. We need a profiler that can report on the time spent across multiple runs of a workflow, so as to determine where problem spots may be.

Portability and Reproducibility
Makeflow itself enables portability across execution systems. For example, you can run your application on Condor or SGE without modification. However, that doesn't mean that your applications are actually portable. If one cluster runs Blue Beanie Linux 36.5 and another runs Green Sock Linux 82.7, your chances of the same executable running on both are close to zero. Likewise, if you run a workflow one day, then set it aside for a year, it's possible that your existing machine has been updated to the point where the old workflow no longer runs.

However, if we also explicitly state the execution environment in the workflow, then this can be used to provide applications with what they need to run. The environment might be as simple as a directory structure with the applications, or as complex as an entire virtual machine. Either way, the environment becomes data that must be managed and moved along with the workflow, which affects the performance and cost issues discussed above.

Composability
Everything in computing must be composable. That is, once you get one component working, the very next step is to hook it up to another so that it runs as a subcomponent. While we can technically hook up one Makeflow to another, this doesn't currently happen in a way that results in a coherent program. For example, the execution method and resource limits don't propagate from one makeflow to another. To truly enable large scale structures, we need a native method of connecting workflows together that connects not only the control flow, but the resource allocation, capacity management, and everything else discussed above.

Effortless Scalability

As a rule of thumb, I tell brand new users that running a Makeflow on 10 cores simultaneously is trivial, running on 100 cores is usually easy, and getting to 1000 cores will require some planning and debugging. Going over 1000 cores is possible (our largest system is running on 5000 cores) but requires a real investment of time by the user.

Why does scale make things harder? One reason is that computer systems are full of artificial limits that are not widely know or managed effectively. On a Unix-like system, a given process has a limited number of file descriptors per process and a limited number of files per directory. (Most people don't figure this out until they hit the limit, and then the work must be restructured to accomodate.) A complex network with translation devices may have a limited number of simultaneously network connections. A data structure that was small to ignore suddenly becomes unmanageable when there are 10,000 entries.

To have a software system that can scale to enormous size, you need to address these known technical issues, but also have methods of accomodating limits that you didn't expect. You also need an architecture that can scale naturally and observe its own limits to understand when they are reached. An ideal implementation would know its own limits and not require additional experts in order to scale up.

---

Each of these problems, though briefly described, are pretty hefty problems once you start digging into them. Some of them are large enough to earn a PhD. (In fact, some are already in progress!) They all have the common theme of making data intensive workflows manageable, useable, portable, and productive across a wide variety of computing systems.

More to follow.

Wednesday, February 1, 2012

Why Makeflow Works for New Users

In past articles, I have introduced Makeflow, which is a large scale workflow engine that we have created at Notre Dame.

Of course, Makeflow is certainly not the first or only workflow engine out there. But, Makeflow does have several unique properties that make it an interesting platform for bringing new people into the world of distributed computing. And, it is the right level of abstraction that allows us to address some fundamental computer science problems that result.

Briefly, Makeflow is a tool that lets the user express a large number of tasks by writing them down as a conventional makefile. You can see an example on our web page. A Makeflow can be just a few rules long, or it can consist of hundreds to thousands of tasks, like this EST pipeline workflow:

Once the workflow is written down, you can then run Makeflow in several different ways. You can run it entirely on your workstation, using multiple cores. You can ask Makeflow to send the jobs to your local Condor pool, PBS or SGE cluster, or other batch system. Or, you can start the (included) Work Queue system on a few machines that you happen to have handy, and Makeflow will run the jobs there.

Over the last few years, we have had very good experience getting new users to adopt Makeflow, ranging from highly sophisticated computational scientists all the way to college sophomores learning the first principles of distributed computing. There are a couple of reasons why this is so:

A simple and familiar language. Makefiles are already a well known and widely used way of expressing dependency and concurrency, so it is easy to explain. Unlike more elaborate languages, it is brief and easy to read and write by hand. A text-based language can be versioned and tracked by any existing source control method.
A neutral interface and a portable implementation. Nothing in a Makeflow references any particular batch system or distributed computing technology, so existing workflows can be easily moved between computing systems. If you I use Condor and you use SGE, there is nothing to prevent my workflow from running on your system.
The data needs are explicit. A subtle but significant difference between Make and Makeflow is that Makeflow treats your statement of file dependencies very seriously. That is, you must state exactly which files (or directories) that your computation depends upon. This is slightly inconvenient at first, but vastly improves the ability of Makeflow to create the right execution environment, verify a correct execution, and manage storage and network resources appropriately.
An easy on-ramp to large resources. We have gone to great lengths to make it absolutely trivial to run Makeflow on a single machine with no distributed infrastructure. Using the same framework, you can move to harnessing a few machines in your lab (with Work Queue) and then progressively scale up to enormous scale using clusters, clouds, and grids. We have users running on 5 cores, 5000 cores, and everything in between.

Of course, our objective is not simply to build software. Makeflow is a starting point for engaging our research collaborators, which allows us to explore some hard computer science problems related to workflows. In the next article, I will discuss some of those challenges.

Monday, November 8, 2010

Sometimes It All Comes Together

Most days, software engineering involves compromises and imperfect solutions. It's rare for two pieces of software to mesh perfectly -- you always have to work to overcome the limitations or assumptions present in different modules. But, every once in a while, the pieces just come together in a satisfying way.

A few weeks back, we ran into a problem with BXGrid, our system for managing biometric data. Our users had just ingested a whole pile of new images and videos, and were browsing and validating the data. Because data was recently ingested, no thumbnails had been generated yet, so every screenful required a hundred or so thumbnails to be created from the high resolution images. Multiple that by each active user, and you have a web site stuck in the mud.

A better solution would be to generate all of the missing thumbnails offline in an orderly way. Since many of the transcoding operations are compute intensive, it makes sense to farm them out to a distributed system.

Peter Bui -- a graduate student in our group -- solved this problem elegantly by putting together almost all of our research software simultaneously. He used Weaver as the front-end language to query BXGrid and determine what thumbnails needed to be generated. Weaver generated a Makeflow to perform all of the transcodings. Makeflow used Work Queue to execute the tasks, with the Workers submitted to our campus Condor pool.

So far, so good. But, the missing piece was that Makeflow expects data to be available as ordinary files. At the time, this would have required that we copy several terabytes out of the archive onto the local disk, which wasn't practical. So, Peter wrote a module for Parrot which enabled access to BXGrid files under paths like /bxgrid/fileid/123. Then, while attached to Parrot, Makeflow could access the files, being none the wiser that they were actually located in a distributed system.

Put it all together, and you have this:

Sometimes, it all comes together.

Monday, November 1, 2010

Compiling Workflows with Weaver

Over the last year, our Makeflow system has become quite popular here at Notre Dame. Briefly, Makeflow takes a workload expressed in the plain old Make format, and executes it in a distributed system, using the dependency information to set up the appropriate remote execution environment. It does not require a distributed filesystem, so it's easy to get your applications going on hundreds to thousands of processors from the cloud or the grid. Makeflow is currently the back-end engine for our science portals in bioinformatics, biometrics, and molecular dynamics.

It didn't take long before our users started writing scripts in Perl or Python in order to generate Makeflows with tens of thousands of nodes. Those scripts all did similar things (query a database, break a dataset into N pieces) but also started to get unruly and difficult to debug. It wasn't easy to look at a script generator and determine what it was trying to accomplish.

Enter Weaver, which is the creation of Peter Bui, one of our graduate students. Weaver is a high level Python framework that, in a few simple lines, can generate enormous (optimized) Makeflows. Peter presented a paper about Weaver at the workshop on Challenges of Large Applications in Distributed Environments at HPDC earlier this year.

Consider this common biometrics workload: extract all of the blue irises from our repository, then convert each iris into a 'template' data type, then compare all of them to each other. Here is how you do it in Weaver:


b = SQLDataSet (’db’,’biometrics','irises')
nefs = Query(db,db.color='Blue')

conv = SimpleFunction('convertiristotemplate',outsuffix='bit')
bits = Map(conv,nefs)

cmp = SimpleFunction('compareiristemplates')
AllPairs(cmp,bits,bits,output='matrix.txt')

In the simplest case, Weaver just emits one gigantic Makeflow that performs all of the operations. However, sometimes there are patterns that can be executed more efficiently, given some better underlying tool. AllPairs is the perfect example of this optimization -- you can do an AllPairs using Makeflow, but it won't be as efficient as our native implementation. If the various data types line up appropriately, Weaver will simply call the All-Pairs abstraction. If not, it will expand the call into Makeflow in the appropriate way.

In principle, this is a lot like a C compiler: under certain conditions, the addition of two arrays can be accomplished with a vector add instruction, otherwise it must be expanded into a larger number of conventional instructions. So, we think of Weaver as a compiler for workflows: it chooses the best implementation available to execute a complex program, leaving the programmer to worry about the higher level objectives.

Monday, August 3, 2009

REU Project: BXGrid

This post continues last week's subject of summer REU projects.

Rachel Witty and Kameron Srimoungchanh worked on BXGrid, our web portal and computing system for biometrics research. This project is a collaboration between the Cooperative Computing Lab and the Computer Vision Research Lab at Notre Dame. Hoang Bui is the lead graduate student on the project. Rachel and Kameron added a bunch of new capabilities to the system; I'll show three examples today.

The first is the ability to handle 3-D face scans taken by a specialized camera equipped with a laser rangefinder. The still picture here doesn't quite do it justice, because each white "mask" on the left is a rotating animation of the face. By integrating this data into BXGrid, the 3-D data can be validated against previous ordinary images of the face.

I previously discussed All-Pairs problems, which are common in biometrics. While we already had the ability to run very large All-Pairs, problems, we never had the capability to view the results easily. Now, with the click of a button, you can set up a small All-Pairs problem and view the results on the portal:

Currently, new data ingested into the system is validated manually by people who must visually check that a eye, face, or whatever matches existing data in the system. Although this can be divided up among a large team of people, it is still time consuming and error prone.

Kameron and Rachel built a system that does a first pass at this task automatically. Using Makeflow, they set up a system to export all newly ingested images along with five good images that should match. This results in thousands of jobs sent to our Condor pool, which transform and compare the images. When all the results come back, you get a nice web page that summarizes the images and the results:

This research was supported in part by the National Science Foundation via grant NSF-CCF-0621434.

Tuesday, July 28, 2009

REU Project: Biocompute

This summer, we hosted four REU students who contributed to two web portals for distributed computing: Biocompute and BXGrid. I'll write about one this week and the other next week.

REU students Ryan Jansen and Joey Rich worked with recent grad Rory Carmichael on Biocompute, our web portal and computing system for bioinformatics research. Biocompute was originally created by Patrick Braga-Henebry for his B.S. honors thesis, and we are now putting it into production in collaboration with the Bioinformatics Core Facility at Notre Dame.

Biocompute allows researchers at Notre Dame to run standard bioinformatics tools like BLAST, and then share and manage the results. The new twist is that we transparently parallelize the tasks and run them on our campus Condor pool. This allows people to run tasks that were previously impossible: we routinely run workloads that would take months on a single machine, but get completed in hours on Biocompute.

The user simply fills out a form specifying the query, genomic databases, and so forth:

Biocompute transforms the request into a large Makeflow job that looks like this:

Users and administrators can view the progress of each job:

When the task is complete, you can browse the results, download them, or feed them into another tool on the web site:

This work was sponsored in part by the Bioinformatics Core Facility and the National Science Foundation under grant NSF-06-43229.

Friday, July 3, 2009

Make as an Abstraction for Distributed Computing

In previous articles, I have introduced the idea of abstractions for distributed computing. An abstraction is a way of specifying a large amount of work in a way that makes it possible to be distributed across a large computing system. All of the abstractions I have discussed so far have a compact, regular structure.

However, many people have large workloads that do not have a regular structure. They may have one program that generates three output files, each consumed by another program, and then joined back together. You can think of these workloads as a directed graph of processes and files, like the figure to the right.

If each of these programs may run for a long time, then you need a workflow engine that will keep track of the state of the graph, submit the jobs for execution, and deal with failures. There exist a number of workflow engines today, but without naming names, they aren't exactly easy to use. The workflow designer has to write a whole lot of batch scripts, or XML, or learn a rather complicated language to put it all together.

We recently wondered if there was a simpler way to accomplish this, A good workflow language should make it easy to do simple things, and at least obvious (if not easy) how to specify complex things. For implementation reasons, a workflow language needs to clearly state the data needs of an application: if we know in advance that program A needs file X, then we can deliver it efficiently before A begins executing. If possible, it shouldn't require the user to know anything about the underlying batch or grid systems.

After scratching our heads for a while, we finally came to the conclusion that good old Make is an attractive worfklow language. It is very compact, it states data dependencies nicely, and lots of people already know it. So, we have built a workflow system called Makeflow, which takes Makefiles, and runs them on parallel and distributed systems. Using Makeflow, you can take a very large workflow and run it on your local workstation, a single 32-core server, or a 1000-node Condor pool.

What makes Makeflow different from previous distributed makes is that is does not rely on a distributed file system. Instead, it uses the dependency information already present in the Makefile to send data to remote jobs. For example, if you have a rule like this:

output.data final.state : input.data mysim.exe
./mysim.exe -temp 325 input.data

then Makeflow will ensure that the input files input.data and mysim.exe are placed at the worker node before running mysim.exe. Afterwards, Makeflow brings the output files back to the initiator.

Because of this property, you don't need a data center in order to run a Makeflow. We provide a simple process called worker that you can run on your desktop, your laptop, or any other old computers you have lying around. The workers call home to Makeflow, which coordinates the execution on whatever machines you have available.

You can download and try out Makeflow yourself from the CCL web site.

Prof. Douglas Thain