Friday, July 3, 2009

Make as an Abstraction for Distributed Computing

In previous articles, I have introduced the idea of abstractions for distributed computing. An abstraction is a way of specifying a large amount of work so that it can be distributed across a large computing system. All of the abstractions I have discussed so far have a compact, regular structure.


However, many people have large workloads that do not have a regular structure. They may have one program that generates three output files, each consumed by another program, and then joined back together. You can think of these workloads as a directed graph of processes and files, like the figure to the right.

If each of these programs may run for a long time, then you need a workflow engine that will keep track of the state of the graph, submit the jobs for execution, and deal with failures. There exist a number of workflow engines today, but without naming names, they aren't exactly easy to use. The workflow designer has to write a whole lot of batch scripts, or XML, or learn a rather complicated language to put it all together.

We recently wondered if there was a simpler way to accomplish this. A good workflow language should make it easy to do simple things, and at least obvious (if not easy) how to specify complex things. For implementation reasons, a workflow language needs to state the data needs of an application clearly: if we know in advance that program A needs file X, then we can deliver it efficiently before A begins executing. If possible, it shouldn't require the user to know anything about the underlying batch or grid systems.

After scratching our heads for a while, we finally came to the conclusion that good old Make is an attractive workflow language. It is very compact, it states data dependencies nicely, and lots of people already know it. So, we have built a workflow system called Makeflow, which takes Makefiles, and runs them on parallel and distributed systems. Using Makeflow, you can take a very large workflow and run it on your local workstation, a single 32-core server, or a 1000-node Condor pool.
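For instance, the fan-out-and-join workflow sketched earlier can be written as an ordinary Makefile. The file names and programs below are hypothetical, chosen only to illustrate the shape of such a workflow:

```make
# Hypothetical example: one program splits an input into three parts,
# each part is processed independently, and the results are joined.

all : joined.data

part.1 part.2 part.3 : input.data split.exe
	./split.exe input.data

out.1 : part.1 process.exe
	./process.exe part.1 > out.1

out.2 : part.2 process.exe
	./process.exe part.2 > out.2

out.3 : part.3 process.exe
	./process.exe part.3 > out.3

joined.data : out.1 out.2 out.3 join.exe
	./join.exe out.1 out.2 out.3 > joined.data
```

Because each rule states exactly which files it reads and writes, a system like Makeflow can run the three process.exe rules in parallel on different machines, shipping only the files each one needs.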

What makes Makeflow different from previous distributed makes is that it does not rely on a distributed file system. Instead, it uses the dependency information already present in the Makefile to send data to remote jobs. For example, if you have a rule like this:

output.data final.state : input.data mysim.exe
	./mysim.exe -temp 325 input.data


then Makeflow will ensure that the input files input.data and mysim.exe are placed at the worker node before running mysim.exe. Afterwards, Makeflow brings the output files back to the initiator.

Because of this property, you don't need a data center in order to run a Makeflow. We provide a simple process called worker that you can run on your desktop, your laptop, or any other old computers you have lying around. The workers call home to Makeflow, which coordinates the execution on whatever machines you have available.
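To give a concrete feel for this, a session might look something like the following. The command names, options, and host names here are illustrative guesses, not the exact released interface; consult the Makeflow documentation for the real invocation:

```shell
# On the submitting machine: run the workflow, dispatching
# jobs to remote workers rather than a local batch system.
# (Illustrative; check the manual for the exact batch-type option.)
makeflow -T wq simulation.makeflow

# On each spare desktop or laptop: start a worker process
# that calls home to the Makeflow master.
worker master.example.edu 9123
```

The master hands out ready rules to whichever workers are connected, so adding more machines simply means starting more workers.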


You can download and try out Makeflow yourself from the CCL web site.

3 comments:

  1. I am curious how the data files move to the worker nodes. Does the makeflow master process transfer it directly from itself? Does it send a link to a central repository? Will it do anticipatory transferring for large jobs on dedicated clusters to avoid I/O clobbering?

    This makes me think of Map-Reduce, and I am curious whether the master workflow process has a monitoring system that can manage slower or failing jobs to decrease waiting for later jobs? Or would this be an extension more likely to be added for domain-specific applications?

  2. Makeflow works with a number of remote 'worker' processes. Each worker calls home to the Makeflow master, which then sends the needed files directly over that connection to the worker. So, the master controls all of the data transfers, and you don't need a shared filesystem.

    If you know in advance that all the tasks will take about the same amount of time, then you can cancel slow tasks that fall far outside the distribution. (See Fail Fast, Fail Often.) However, for Make in general, you don't know in advance how long each task will take, so you can't do much better than simply waiting for each task to succeed or fail on its own.

    I'll have to try makeflow. I have an environment where there are 3 to 4 levels of dependency to process. The final object sits and produces live data. Environmental changes trigger a rebuild stage that is a DAG, and I was planning to use parallel make, run periodically, to manage the whole thing.

    I was drawn to make for about the same reasons you state: it is well known and easy to program. I'd add to that: it's robust, stable, well-debugged software.

    I was going over the source code for gmake, though, and there's a bit of cruft in there.

    Ideally, I would love a make daemon, capable of reacting to changes in the file system and separating the three traditional stages of (1) constructing the dependency tree, (2) determining the required update trees, and (3) executing the resulting forest of updates. Ideally I would like all three stages as separate daemons, with the execution phase capable of mutex-locking portions of the dependency tree until they are completed.
