Amazon EC2 Considered Harmful

“The TruckNumber is the size of the smallest set of people in a project such that, if all of them got hit by a truck, the project would be in trouble.” - Portland Pattern Repository


I’m taking an “Introduction to Beowulf Design” course this week from the Georgetown University Advanced Research Computing (ARC) division. The class definitely hasn’t been boring. By a strange coincidence, the guy sitting next to me turned out to be Mike Cariaso, an MPIBlast developer I have been corresponding with this month in some nodalpoint posts. The course gave us an opportunity to hash out some details around running MPI on EC2. He had just booted up a 10-node Amazon EC2 cluster with MPIBlast when a bus crashed into our building…


MPI Cluster with Python and Amazon EC2 (part 2 of 3)

Today I posted a public AMI which can be used to run a small Beowulf cluster on Amazon EC2 and do some parallel computations with C, Fortran, or Python. If you prefer another language (Java, Ruby, etc.), just install the appropriate MPI library and rebundle the EC2 image. The following set of Python scripts automates the launch and configuration of an MPI cluster on EC2 (currently limited to 20 nodes while EC2 is in beta):

Update (3-19-08): Code for running a cluster with large or xlarge 64-bit EC2 instances is now hosted on Google Code. The new images include NFS, Ganglia, IPython1, and other useful Python packages.

http://code.google.com/p/elasticwulf/

Update (7-24-07): I’ve made some important bug fixes to the scripts to address issues mentioned in the comments. See the README file for details.

The file contains some quick scripts I threw together using the AWS Python example code. This is the approach I’m using to bootstrap an MPI cluster until one of the major Linux cluster distros is ported to run on EC2. Details on what is included in the public AMI were covered in Part 1 of the tutorial. Part 3 will cover cluster operation on EC2 in more detail and show how to use Python to carry out some neat parallel computations.

The cluster launch process is pretty simple once you have an Amazon EC2 account and keys: just download the Python scripts and you can be running a compute cluster in a few minutes. In a later post I will look at cluster bandwidth and performance in detail. If you only occasionally need to run large jobs, $2/hour for a 20-node MPI cluster on EC2 is not a bad deal considering the ~$20K price of building your own comparable system.
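To put that comparison in numbers, here is a quick back-of-the-envelope sketch. It uses only the two figures quoted above ($2/hour and roughly $20K) and, as a simplifying assumption, ignores power, cooling, admin time, and hardware depreciation, so treat the result as a rough guide rather than a real cost model.

```python
# Back-of-the-envelope break-even estimate using the two figures quoted above.
# Assumption: power, cooling, admin time and depreciation are all ignored.

EC2_CLUSTER_RATE = 2.0       # dollars per hour for the whole 20-node cluster
OWNED_CLUSTER_COST = 20000   # rough up-front cost in dollars of a comparable system
NODES = 20


def per_node_hourly_rate(cluster_rate, nodes):
    """Hourly cost of a single node, given the hourly cost of the cluster."""
    return cluster_rate / nodes


def break_even_hours(owned_cost, cluster_rate):
    """Cluster-hours of EC2 use that cost as much as buying the hardware."""
    return owned_cost / cluster_rate


if __name__ == "__main__":
    hours = break_even_hours(OWNED_CLUSTER_COST, EC2_CLUSTER_RATE)
    rate = per_node_hourly_rate(EC2_CLUSTER_RATE, NODES)
    print("Per-node rate: $%.2f/hour" % rate)
    print("Break-even: %.0f cluster-hours (about %.0f days of continuous use)"
          % (hours, hours / 24))
```

With these figures the break-even point is 10,000 cluster-hours, a bit over a year of round-the-clock use, which is why renting looks attractive for occasional jobs.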


On-Demand MPI Cluster with Python and EC2 (part 1 of 3)

In this post, we will build a 20-node Beowulf cluster on Amazon EC2 and run some computations using both MPI and its Python wrapper pyMPI. This tutorial will only describe how to get the cluster running and show a few example computations. I’ll save detailed benchmarking for a later write-up.

One way to build an MPI cluster on EC2 would be to customize something like Warewulf or rebundle one of the leading Linux cluster distributions like Parallel Knoppix or the Rocks Cluster Distribution onto an Amazon AMI. Both of these distros have kernels which should work with EC2. To get things running quickly as a proof of concept, I implemented a “roll-your-own” style cluster based on a Fedora Core 6 AMI managed with some simple Python scripts. I’ve found this approach suitable for running occasional parallel computations on EC2 with 20 nodes and have been running a cluster off and on for several months without any major issues. If you need to run a much larger cluster or require more complex user management, I’d recommend modifying one of the standard distributions. This will save you some maintenance headaches and give you the additional benefit of the user/developer base for those systems.

The main task I use the cluster for is distributing large matrix computations, a problem well suited to existing libraries based on MPI. Depending on your needs, another platform such as Hadoop, Rinda, or cow.py might make more sense. I use Hadoop for some other projects, including MapReduce-style tasks with Jython, and highly recommend it. That said, let’s start building the MPI cluster…
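Before diving in, here is a rough sketch of what “distributing a matrix computation” means in practice: each node takes a contiguous block of rows, computes its part of the result, and the pieces are gathered back together. This is a serial simulation in plain Python, not pyMPI code; the function names are hypothetical and the “rank” is just a loop index standing in for an MPI process.

```python
# Serial simulation of a row-block matrix-vector product.
# Assumption: this does not call pyMPI or any MPI library; "rank" is only a
# loop index here, and all function names are made up for illustration.


def row_block(n_rows, n_ranks, rank):
    """Return the half-open (start, stop) range of rows owned by `rank`.

    Leftover rows are handed out one each to the lowest-numbered ranks.
    """
    base, extra = divmod(n_rows, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def local_matvec(matrix, vector, start, stop):
    """Multiply rows [start, stop) of `matrix` by `vector`."""
    return [sum(a * x for a, x in zip(matrix[i], vector))
            for i in range(start, stop)]


def simulated_parallel_matvec(matrix, vector, n_ranks):
    """Compute each rank's slice in turn and concatenate (the 'gather' step)."""
    result = []
    for rank in range(n_ranks):
        start, stop = row_block(len(matrix), n_ranks, rank)
        result.extend(local_matvec(matrix, vector, start, stop))
    return result


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
    x = [1, 1]
    print(simulated_parallel_matvec(A, x, 3))
```

On a real cluster each rank would run only its own `local_matvec` call and the concatenation would be replaced by an MPI gather. The row ranges never overlap, so the nodes need no communication until the final gather.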
