I just attended the IEEE conference on e-Science in Indianapolis, and gave this talk on harnessing distributed systems with high level abstractions.
Another highlight of the conference was Rich Wolski's talk on Eucalyptus, an open source toolkit for cloud computing. Like Nimbus, it is API compatible with Amazon's EC2. That is, if your code works with Amazon, you can install Nimbus or Eucalyptus on your on cluster and run your own cloud.
Rich also spoke on the distinction between clouds and grids, which to many are the same idea. However, he pointed out a very important distinction:
- Clouds provide a resource allocation service. You ask for a certain number of machines, you get them for a certain amount of time, and you can choose to use them however you like. This gives you guaranteed service, which is great if you are running a web server, but can lead to underutilization if your goal is to run a large number of simulations.
- Grids provide a task execution service. You submit work to be executed, and the system decides where and when to execute the tasks, perhaps preempting or migrating them along the way. You have no guarantees of service, but the system will get very high utilization, making it good for high throughput computing.
As such, you can combine the two ideas together. For example, Cycle Computing provides a value added web service that employs both Amazon and Condor. You request a certain number of CPUs to execute a certain number of tasks. Cycle allocates the CPUs using Amazon, installs Condor on the nodes, and then runs the jobs. The result is a grid running on a cloud.