MPI Cluster with Python and Amazon EC2 (part 2 of 3)

Today I posted a public AMI which can be used to run a small Beowulf cluster on Amazon EC2 and do some parallel computations with C, Fortran, or Python. If you prefer another language (Java, Ruby, etc.), just install the appropriate MPI library and rebundle the EC2 image. The following set of Python scripts automates the launch and configuration of an MPI cluster on EC2 (currently limited to 20 nodes while EC2 is in beta):

Update (3-19-08): Code for running a cluster with large or xlarge 64-bit EC2 instances is now hosted on Google Code. The new images include NFS, Ganglia, IPython1, and other useful Python packages.

http://code.google.com/p/elasticwulf/

Update (7-24-07): I’ve made some important bug fixes to the scripts to address issues mentioned in the comments. See the README file for details.

The file contains some quick scripts I threw together using the AWS Python example code. This is the approach I’m using to bootstrap an MPI cluster until one of the major Linux cluster distros is ported to run on EC2. Details on what is included in the public AMI were covered in Part 1 of the tutorial. Part 3 will cover cluster operation on EC2 in more detail and show how to use Python to carry out some neat parallel computations.

The cluster launch process is pretty simple once you have an Amazon EC2 account and keys: just download the Python scripts and you can be running a compute cluster in a few minutes. In a later post I will look at cluster bandwidth and performance in detail. If you have only an occasional need for running large jobs, $2/hour for a 20-node MPI cluster on EC2 is not a bad deal considering the ~$20K price for building your own comparable system.
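The break-even point implied by those figures is easy to work out. The sketch below uses only the two numbers quoted above ($2/hour for 20 nodes, roughly $20K for comparable hardware) and ignores power, cooling, and administration for owned hardware, so treat it as a rough illustration rather than a cost model.

```python
# Rough break-even estimate using only the figures quoted in the post.
# Power, cooling, and admin time for owned hardware are deliberately ignored.
CLUSTER_RATE_PER_HOUR = 2.00   # USD per hour for the whole 20-node cluster
NODES = 20
HARDWARE_COST = 20000.00       # USD, approximate price of a comparable system

rate_per_node_hour = CLUSTER_RATE_PER_HOUR / NODES
break_even_hours = HARDWARE_COST / CLUSTER_RATE_PER_HOUR
break_even_days = break_even_hours / 24

print("per node-hour: $%.2f" % rate_per_node_hour)
print("break-even: %.0f cluster-hours (about %.0f days of continuous use)"
      % (break_even_hours, break_even_days))
```

In other words, renting only loses to buying if the cluster would be busy for well over a year of wall-clock time.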

Prerequisites:

  1. Get a valid Amazon EC2 account
  2. Complete the most recent “getting started guide” tutorial on Amazon EC2 and create all needed web service accounts, authorizations, and keypairs
  3. Download and install the Amazon EC2 Python library
  4. Download the Amazon EC2 MPI cluster management scripts

Launching the EC2 nodes

First, unzip the cluster management scripts and modify the configuration parameters in EC2config.py, substituting your own EC2 keys and changing the cluster size if desired:



import sys

# replace these with your AWS keys
AWS_ACCESS_KEY_ID = 'YOUR_KEY_ID_HERE'
AWS_SECRET_ACCESS_KEY = 'YOUR_KEY_HERE'
# change this to your keypair location (see the EC2 getting started guide tutorial on using ec2-add-keypair)
KEYNAME = "gsg-keypair"
KEY_LOCATION = "/Users/pskomoroch/id_rsa-gsg-keypair"
# remove these next two lines when you've updated your credentials.
print("update %s with your AWS credentials" % sys.argv[0])
sys.exit()

MASTER_IMAGE_ID = "ami-3e836657"
IMAGE_ID = "ami-3e836657"

DEFAULT_CLUSTER_SIZE = 5
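The sample configuration exits unconditionally until the two guard lines are deleted by hand. One alternative, sketched here as an assumption rather than something the scripts do, is to exit only while the placeholder strings are still present. The helper name `credentials_look_configured` is hypothetical.

```python
# Hypothetical helper, not part of the original scripts: detect whether the
# placeholder values from EC2config.py have been replaced with real keys.
PLACEHOLDERS = ("YOUR_KEY_ID_HERE", "YOUR_KEY_HERE")


def credentials_look_configured(access_key_id, secret_access_key):
    """Return True when neither value is empty or a known placeholder."""
    for value in (access_key_id, secret_access_key):
        if not value or value in PLACEHOLDERS:
            return False
    return True


# Untouched config: the guard would still fire.
print(credentials_look_configured("YOUR_KEY_ID_HERE", "YOUR_KEY_HERE"))
# Made-up values standing in for real keys.
print(credentials_look_configured("example-id", "example-secret"))
```

This only checks that something was filled in; it cannot tell whether the keys are valid.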

 

Launch the EC2 cluster by running the ec2-start-cluster.py script from your local machine:


peter-skomorochs-computer:~/AmazonEC2_MPI_scripts pskomoroch$ ./ec2-start-cluster.py

image ami-3e836657
master image ami-3e836657
----- starting master -----
RESERVATION     r-275eb84e   027811143419       default
INSTANCE        i-0ed33167      ami-3e836657      pending
----- starting workers -----
RESERVATION     r-265eb84f   027811143419       default
INSTANCE        i-01d33168      ami-3e836657      pending
INSTANCE        i-00d33169      ami-3e836657      pending
INSTANCE        i-03d3316a      ami-3e836657      pending
INSTANCE        i-02d3316b      ami-3e836657      pending
 

Verify the EC2 nodes are running with ./ec2-check-instances.py:


peter-skomorochs-computer:~/AmazonEC2_MPI_scripts pskomoroch$ ./ec2-check-instances.py
----- listing instances -----

RESERVATION     r-aec420c7      027811143419    default
INSTANCE        i-ab41a6c2      ami-3e836657    domU-12-31-33-00-02-5A.usma1.compute.amazonaws.com      running
INSTANCE        i-aa41a6c3      ami-3e836657    domU-12-31-33-00-01-E3.usma1.compute.amazonaws.com      running
INSTANCE        i-ad41a6c4      ami-3e836657    domU-12-31-33-00-03-AA.usma1.compute.amazonaws.com      running
INSTANCE        i-ac41a6c5      ami-3e836657    domU-12-31-33-00-04-19.usma1.compute.amazonaws.com      running
INSTANCE        i-af41a6c6      ami-3e836657    domU-12-31-33-00-03-E3.usma1.compute.amazonaws.com      running
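If you would rather not eyeball the listing, the same check can be scripted. The sketch below assumes the whitespace-separated layout printed above (an INSTANCE line has a hostname only once the node is up) and uses two hypothetical helpers, `parse_instances` and `all_running`; it works on captured text and does not call EC2.

```python
def parse_instances(listing):
    """Parse INSTANCE lines into (instance_id, image_id, hostname, state) tuples.

    Pending instances have no hostname yet, so hostname is None for them.
    """
    instances = []
    for line in listing.splitlines():
        fields = line.split()
        if not fields or fields[0] != "INSTANCE":
            continue
        if len(fields) == 5:
            instances.append((fields[1], fields[2], fields[3], fields[4]))
        elif len(fields) == 4:
            instances.append((fields[1], fields[2], None, fields[3]))
    return instances


def all_running(listing, expected_count):
    """True when exactly expected_count instances are listed and all are running."""
    instances = parse_instances(listing)
    return (len(instances) == expected_count
            and all(state == "running" for _, _, _, state in instances))


sample = """RESERVATION     r-aec420c7      027811143419    default
INSTANCE        i-ab41a6c2      ami-3e836657    domU-12-31-33-00-02-5A.usma1.compute.amazonaws.com      running
INSTANCE        i-aa41a6c3      ami-3e836657    domU-12-31-33-00-01-E3.usma1.compute.amazonaws.com      running
"""
print(all_running(sample, 2))
```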
 

Cluster Configuration and Booting MPI

Run ec2-mpi-config.py to configure MPI on the nodes. This will take a minute or two depending on the number of nodes.


peter-skomorochs-computer:~/AmazonEC2_MPI_scripts pskomoroch$ ./ec2-mpi-config.py

---- MPI Cluster Details ----
Numer of nodes = 5
Instance= i-ab41a6c2 hostname= domU-12-31-33-00-02-5A.usma1.compute.amazonaws.com state= running
Instance= i-aa41a6c3 hostname= domU-12-31-33-00-01-E3.usma1.compute.amazonaws.com state= running
Instance= i-ad41a6c4 hostname= domU-12-31-33-00-03-AA.usma1.compute.amazonaws.com state= running
Instance= i-ac41a6c5 hostname= domU-12-31-33-00-04-19.usma1.compute.amazonaws.com state= running
Instance= i-af41a6c6 hostname= domU-12-31-33-00-03-E3.usma1.compute.amazonaws.com state= running

The master node is ec2-72-44-46-78.z-2.compute-1.amazonaws.com


…<snip> …

Configuration complete, ssh into the master node as lamuser and boot the cluster:
$ ssh lamuser@ec2-72-44-46-78.z-2.compute-1.amazonaws.com
> mpdboot -n 5 -f mpd.hosts
> mpdtrace
 

Log in to the master node, boot the MPI cluster, and test the connectivity:



peter-skomorochs-computer:~/AmazonEC2_MPI_scripts pskomoroch$ ssh lamuser@ec2-72-44-46-78.z-2.compute-1.amazonaws.com



Sample Fedora Core 6 + MPICH2 + Numpy/PyMPI compute node image

http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3.html

---- Modified From Marcin’s Cool Images: Cool Fedora Core 6 Base + Updates Image v1.0 ---

see http://developer.amazonwebservices.com/connect/entry.jspa?externalID=554&categoryID=101


Like Marcin’s image, standard disclaimer applies, use as you please…

Amazon EC2 MPI Compute Node Image
Copyright (c) 2006 DataWrangling. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

    * Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above
       copyright notice, this list of conditions and the following
       disclaimer in the documentation and/or other materials provided
       with the distribution.

    * Neither the name of the DataWrangling nor the names of any
       contributors may be used to endorse or promote products derived
       from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
[lamuser@domU-12-31-33-00-02-5A ~]$
[lamuser@domU-12-31-33-00-02-5A ~]$ mpdboot -n 5 -f mpd.hosts
[lamuser@domU-12-31-33-00-02-5A ~]$ mpdtrace
domU-12-31-33-00-02-5A
domU-12-31-33-00-01-E3
domU-12-31-33-00-03-E3
domU-12-31-33-00-03-AA
domU-12-31-33-00-04-19

 

The results of the mpdtrace command show we have an MPI cluster running on 5 nodes. In the next section, we will verify that we can run some basic MPI tasks. For more detailed information on these MPI commands (and MPI in general), see the MPICH2 documentation.

Testing the MPI Cluster

Next we execute a sample C program bundled with MPICH2 which estimates pi using the cluster:


[lamuser@domU-12-31-33-00-02-5A ~]$  mpiexec -n 5 /usr/local/src/mpich2-1.0.5/examples/cpi
Process 0 of 5 is on domU-12-31-33-00-02-5A
Process 1 of 5 is on domU-12-31-33-00-01-E3
Process 2 of 5 is on domU-12-31-33-00-03-E3
Process 3 of 5 is on domU-12-31-33-00-03-AA
Process 4 of 5 is on domU-12-31-33-00-04-19
pi is approximately 3.1415926544231230, Error is 0.0000000008333298
wall clock time = 0.007539
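For readers who have not looked at the cpi example, the idea is a midpoint-rule integration of 4/(1+x^2) over [0, 1], with the intervals dealt out to the ranks round-robin and the partial sums added at the end. The sketch below simulates the ranks in a plain Python loop, so no MPI is involved. The interval count n = 10000 is an assumption, chosen because it reproduces the error magnitude shown above.

```python
import math


def partial_sum(rank, size, n):
    """Midpoint-rule contribution of one rank; intervals are assigned round-robin."""
    h = 1.0 / n
    total = 0.0
    for i in range(rank + 1, n + 1, size):
        x = h * (i - 0.5)
        total += 4.0 / (1.0 + x * x)
    return h * total


def estimate_pi(size, n):
    # In the MPI program a reduction adds the per-rank results on rank 0;
    # here the ranks are simply simulated one after another.
    return sum(partial_sum(rank, size, n) for rank in range(size))


estimate = estimate_pi(size=5, n=10000)
print("pi is approximately %.16f, Error is %.16f"
      % (estimate, abs(estimate - math.pi)))
```

The number of ranks changes only how the work is divided, not the answer (up to floating-point rounding).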

 

Test the message travel time for the ring of nodes you just created:


[lamuser@domU-12-31-33-00-02-5A ~]$ mpdringtest 100
time for 100 loops = 0.14577794075 seconds
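To put that number in per-message terms, divide by the loop count. The per-hop figure below rests on the assumption that one loop is one full trip around the 5-node ring, so it is an estimate only.

```python
LOOPS = 100
TOTAL_SECONDS = 0.14577794075  # from the mpdringtest output above
NODES = 5

per_loop_ms = TOTAL_SECONDS / LOOPS * 1000.0
# Assumption: one loop is one full trip around the ring, i.e. NODES hops.
per_hop_ms = per_loop_ms / NODES

print("%.3f ms per loop, roughly %.3f ms per hop" % (per_loop_ms, per_hop_ms))
```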

 

Verify that the cluster can run a multiprocess job:


[lamuser@domU-12-31-33-00-02-5A ~]$ mpiexec -l -n 5 hostname
3: domU-12-31-33-00-03-AA
0: domU-12-31-33-00-02-5A
1: domU-12-31-33-00-01-E3
4: domU-12-31-33-00-04-19
2: domU-12-31-33-00-03-E3

 

Testing PyMPI

Let’s verify that the PyMPI install is working with our running cluster of 5 nodes. Execute the following on the master node:


[lamuser@domU-12-31-33-00-02-5A ~]$ mpirun -np 5 pyMPI /usr/local/src/pyMPI-2.4b2/examples/fractal.py
Starting computation (groan)

process 1 done with computation!!
process 3 done with computation!!
process 4 done with computation!!
process 2 done with computation!!
process 0 done with computation!!
Header length is  54
BMP size is  (400, 400)
Data length is  480000
[lamuser@domU-12-31-33-00-02-5A ~]$ ls
hosts  id_rsa.pub  mpd.hosts  output.bmp

 

This produced the following fractal image (output.bmp):

output.bmp
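The three numbers printed by the script are consistent with an uncompressed 24-bit BMP, which makes a quick sanity check possible. The sketch below builds the standard 14-byte file header and 40-byte info header with `struct`. The 24-bit color depth is an assumption inferred from the data length, not something the script states.

```python
import struct

WIDTH, HEIGHT = 400, 400
BYTES_PER_PIXEL = 3  # 24-bit color, assumed from the data length above

# Each BMP row is padded to a multiple of 4 bytes.
row_size = (WIDTH * BYTES_PER_PIXEL + 3) // 4 * 4
data_length = row_size * HEIGHT

# 14-byte file header: magic, file size, two reserved fields, pixel data offset.
file_header = struct.pack("<2sIHHI", b"BM", 54 + data_length, 0, 0, 54)
# 40-byte info header: size, width, height, planes, bits per pixel,
# compression, image size, x/y resolution, palette size, important colors.
info_header = struct.pack("<IiiHHIIiiII",
                          40, WIDTH, HEIGHT, 1, 24, 0, data_length, 0, 0, 0, 0)
header = file_header + info_header

print(len(header), data_length)
```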

We will show some more examples using PyMPI in the next post.

Changing the Cluster Size

If we want to modify the number of nodes in the cluster, we first need to shut down the MPI cluster from the master node as follows:


[lamuser@domU-12-31-33-00-02-5A ~]$ mpdallexit
[lamuser@domU-12-31-33-00-02-5A ~]$ mpdcleanup
 

Once this is done, you can start additional instances of the public AMI from your local machine, then re-run the ec2-mpi-config.py script and reboot the cluster.
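The piece that has to change when the cluster is resized is the hosts file passed to mpdboot with -f, which is a plain list of one hostname per line. The config script regenerates it for you; the sketch below shows the idea with a hypothetical helper, `write_mpd_hosts`, and writes to a temporary directory so it can be run anywhere.

```python
import os
import tempfile


def write_mpd_hosts(hostnames, path):
    """Write one hostname per line, the format mpdboot reads via -f."""
    with open(path, "w") as handle:
        for name in hostnames:
            handle.write(name + "\n")
    return len(hostnames)


# Hostnames taken from the mpdtrace output earlier in the post.
hosts = [
    "domU-12-31-33-00-02-5A",
    "domU-12-31-33-00-01-E3",
    "domU-12-31-33-00-03-E3",
]
hosts_path = os.path.join(tempfile.mkdtemp(), "mpd.hosts")
count = write_mpd_hosts(hosts, hosts_path)

# The boot command then needs the matching node count.
print("mpdboot -n %d -f mpd.hosts" % count)
```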

Cluster Shutdown

Run ec2-stop-cluster.py to stop all EC2 MPI nodes. If you just want to stop the slave nodes, run ec2-stop-slaves.py.



peter-skomorochs-computer:~/AmazonEC2_MPI_scripts pskomoroch$ ./ec2-stop-cluster.py
This will stop all your EC2 MPI images, are you sure (yes/no)? yes
----- listing instances -----
RESERVATION     r-aec420c7      027811143419    default
INSTANCE        i-ab41a6c2      ami-3e836657    domU-12-31-33-00-02-5A.usma1.compute.amazonaws.com      running
INSTANCE        i-aa41a6c3      ami-3e836657    domU-12-31-33-00-01-E3.usma1.compute.amazonaws.com      running
INSTANCE        i-ad41a6c4      ami-3e836657    domU-12-31-33-00-03-AA.usma1.compute.amazonaws.com      running
INSTANCE        i-ac41a6c5      ami-3e836657    domU-12-31-33-00-04-19.usma1.compute.amazonaws.com      running
INSTANCE        i-af41a6c6      ami-3e836657    domU-12-31-33-00-03-E3.usma1.compute.amazonaws.com      running

---- Stopping instance Id's ----
Stoping Instance Id = i-ab41a6c2
Stoping Instance Id = i-aa41a6c3
Stoping Instance Id = i-ad41a6c4
Stoping Instance Id = i-ac41a6c5
Stoping Instance Id = i-af41a6c6

Waiting for shutdown ....
----- listing new state of instances -----
RESERVATION     r-aec420c7      027811143419    default
INSTANCE        i-ab41a6c2      ami-3e836657    domU-12-31-33-00-02-5A.usma1.compute.amazonaws.com      shutting-down
INSTANCE        i-aa41a6c3      ami-3e836657    domU-12-31-33-00-01-E3.usma1.compute.amazonaws.com      shutting-down
INSTANCE        i-ad41a6c4      ami-3e836657    domU-12-31-33-00-03-AA.usma1.compute.amazonaws.com      shutting-down
INSTANCE        i-ac41a6c5      ami-3e836657    domU-12-31-33-00-04-19.usma1.compute.amazonaws.com      shutting-down
INSTANCE        i-af41a6c6      ami-3e836657    domU-12-31-33-00-03-E3.usma1.compute.amazonaws.com      shutting-down

 

40 Responses to “MPI Cluster with Python and Amazon EC2 (part 2 of 3)”

  1. April 9th, 2007 | 2:49 pm

    […] Part 2 of 3 […]

  2. April 26th, 2007 | 12:55 am

    Excellent stuff! I’ve gotten started with EC2 and I’ll be trying your images out soon. I doubt that I’ll be trying to make ParallelKnoppix work on EC2, because your approach is the right one, I think. PK is designed to use when the hardware is not known ahead of time. With EC2, the hardware is known, so a tailor-made image is the way to go. Your scripts allow an on-demand cluster to be created in minutes, and that’s all that PK offers, anyway. PK usually needs some remastering so that users can add their own packages. Re-bundling an EC2 image is completely analogous. I’m planning on doing just that, probably starting with your images, and doing some testing of latency on tasks that require different degrees of internode communication. Thanks for all this, it’ll make the rest an easy job.

  3. April 26th, 2007 | 1:14 am

    One question, do you know if something like an NFS shared home directory is possible. Using S3, possibly?

  4. April 26th, 2007 | 2:06 am

    A little report on my trial.
    1) ./ec2-start-cluster.py is not always successful in getting the requested number of nodes to come up. The instances sometimes have status “terminated” before anything is done with them.

    2) When the 5 nodes all come up, I still get a problem with ./ec2-mpi-config.py requesting a root password:

    michael@yosemite:~/ec2/AmazonEC2_MPI_scripts$ ./ec2-mpi-config.py

    ---- MPI Cluster Details ----
    Numer of nodes = 5
    Instance= i-e39c7a8a hostname= ec2-72-44-45-138.z-2.compute-1.amazonaws.com state= running
    Instance= i-e29c7a8b hostname= ec2-72-44-45-185.z-2.compute-1.amazonaws.com state= running
    Instance= i-e59c7a8c hostname= ec2-72-44-45-186.z-2.compute-1.amazonaws.com state= running
    Instance= i-e49c7a8d hostname= ec2-72-44-45-122.z-2.compute-1.amazonaws.com state= running
    Instance= i-e79c7a8e hostname= ec2-72-44-45-60.z-2.compute-1.amazonaws.com state= running

    The master node is ec2-72-44-45-138.z-2.compute-1.amazonaws.com

    Writing out mpd.hosts file
    nslookup ec2-72-44-45-138.z-2.compute-1.amazonaws.com
    (0, 'Server:\t\t158.109.0.1\nAddress:\t158.109.0.1#53\n\nNon-authoritative answer:\nName:\tec2-72-44-45-138.z-2.compute-1.amazonaws.com\nAddress: 72.44.45.138\n')
    nslookup ec2-72-44-45-185.z-2.compute-1.amazonaws.com
    (0, 'Server:\t\t158.109.0.1\nAddress:\t158.109.0.1#53\n\nNon-authoritative answer:\nName:\tec2-72-44-45-185.z-2.compute-1.amazonaws.com\nAddress: 72.44.45.185\n')
    nslookup ec2-72-44-45-186.z-2.compute-1.amazonaws.com
    (0, 'Server:\t\t158.109.0.1\nAddress:\t158.109.0.1#53\n\nNon-authoritative answer:\nName:\tec2-72-44-45-186.z-2.compute-1.amazonaws.com\nAddress: 72.44.45.186\n')
    nslookup ec2-72-44-45-122.z-2.compute-1.amazonaws.com
    (0, 'Server:\t\t158.109.0.1\nAddress:\t158.109.0.1#53\n\nNon-authoritative answer:\nName:\tec2-72-44-45-122.z-2.compute-1.amazonaws.com\nAddress: 72.44.45.122\n')
    nslookup ec2-72-44-45-60.z-2.compute-1.amazonaws.com
    (0, 'Server:\t\t158.109.0.1\nAddress:\t158.109.0.1#53\n\nNon-authoritative answer:\nName:\tec2-72-44-45-60.z-2.compute-1.amazonaws.com\nAddress: 72.44.45.60\n')
    Warning: Permanently added 'ec2-72-44-45-138.z-2.compute-1.amazonaws.com,72.44.45.138' (RSA) to the list of known hosts.
    id_rsa.pub 100% 1675 1.6KB/s 00:00
    root@ec2-72-44-45-138.z-2.compute-1.amazonaws.com's password:

    This is as far as I can get at the moment. Looks like a minor problem. Cheers, M.

  5. April 26th, 2007 | 11:10 am

    Michael,

    I haven’t had the scripts prompt me for a password before. Are you running them from your local machine? The mpi-config script expects the keyname and keypair location to match what was used to start the instance. Take a look at your EC2config.py file and make sure the instances were all started with your own keypair (I used the gsg keypair I created on my laptop in the Amazon “getting started guide” tutorial):

    AWS_ACCESS_KEY_ID = 'YOUR_KEY_ID_HERE'
    AWS_SECRET_ACCESS_KEY = 'YOUR_KEY_HERE'
    MASTER_IMAGE_ID = "ami-3e836657"
    IMAGE_ID = "ami-3e836657"
    KEYNAME = "gsg-keypair"
    KEY_LOCATION = "~/id_rsa-gsg-keypair"
    DEFAULT_CLUSTER_SIZE = 5

    I’m working on an updated version of the scripts and EC2 image which should make things a bit cleaner. Sorry the code is ugly right now in terms of error handling…I just wanted to toss something together to get people started :)

  6. April 27th, 2007 | 4:32 am

    Yep, I run the mpi-config script right after creating the instances, doing just what you suggest. The fact that the instances start up at all seems to me to mean that the keypair information is ok. Do you know if anyone but you has been able to launch a cluster? Very cool stuff. I’m going to be looking into making a Debian AMI that works the same way.

  7. April 27th, 2007 | 7:50 am

    Mike Cariaso modified my scripts to fix some path issues and got them working on a Windows laptop. He might have also fixed some other errors I didn’t notice. I haven’t had a chance to try them yet, but you can download the modified scripts here:

    http://mpiblast.pbwiki.com/AmazonEC2

  8. June 28th, 2007 | 6:31 pm

    ===== DO NOT USE THESE SCRIPTS! =====

    This section of ec2-mpi-config.py is a bit problematic:

    os.system('cp %s ~/id_rsa.pub' % KEY_LOCATION )
    os.system('cp ~/id_rsa.pub ~/.ssh/id_rsa')

    This will clobber any existing RSA key on the initiating machine’s account, and will break normal auth on the next login if you have a different default RSA key!

    The script should instead copy the private key directly from KEY_LOCATION to the nodes.

    ===== DO NOT USE THESE SCRIPTS! =====

    Otherwise, way cool. Thanks for putting this tutorial together. We’re trying EC2 clusters out as a way to get quicker feedback from regression tests after changes to our software. Unfortunately, with the one hour granularity I don’t think it will be price competitive. We want 20-100 nodes for about 5 minutes at a time.

  9. June 28th, 2007 | 7:45 pm

    Ralph,

    Good catch. Thanks for pointing that out. I just lifted those passwordless ssh lines straight from an MPI tutorial.

    This might solve the clobbering as well (from http://www.maclife.com/forums/topic/61520):

    cat id_rsa.pub >> .ssh/authorized_keys

    “The above command will create the “authorized_keys” file in the “.ssh” directory if that file doesn’t already exist, and it will append the new id_rsa.pub file to it if it does already exist.”

    I’ll add that change to the scripts. Good luck with the regression cluster, I heard Oracle developers do something like that using Condor on otherwise idle desktops (see http://www.cs.wisc.edu/condor/doc/nmi-lisa2006-slides.pdf).

    -Pete
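The append-instead-of-overwrite idea from the comment above can also be made safe to repeat. The sketch below is an illustration only, not the code that went into the scripts: `append_public_key` is a hypothetical helper that adds a key to authorized_keys unless an identical line is already there, and the demo runs against throwaway files in a temporary directory.

```python
import os
import tempfile


def append_public_key(public_key_path, authorized_keys_path):
    """Append a public key to authorized_keys unless it is already listed.

    Returns True if the key was added, False if it was already present.
    """
    with open(public_key_path) as handle:
        key = handle.read().strip()

    content = ""
    if os.path.exists(authorized_keys_path):
        with open(authorized_keys_path) as handle:
            content = handle.read()
    if key in [line.strip() for line in content.splitlines()]:
        return False

    with open(authorized_keys_path, "a") as handle:
        # Guard against an existing file that lacks a trailing newline.
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    return True


workdir = tempfile.mkdtemp()
pub = os.path.join(workdir, "id_rsa.pub")
auth = os.path.join(workdir, "authorized_keys")
with open(pub, "w") as handle:
    handle.write("ssh-rsa AAAAexample user@example\n")  # made-up key material

print(append_public_key(pub, auth))  # first call adds the key
print(append_public_key(pub, auth))  # second call finds it already present
```

Existing keys in the file are never touched, which is the property the original cp-based lines lacked.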

  10. June 29th, 2007 | 12:00 pm

    Yeah, that would work better. Some more detailed comments:

    • Your image has /home/lamuser/.mpd.conf owned by root. I had to chown it to lamuser before I could start mpd.

    • Your script passes the public DNS names for the nodes into mpd.hosts. For that to work, a hole has to be opened in the firewall for the ports the MPI daemon is using. A simpler solution is to just pass the internal DNS names. Then all the traffic happens behind the firewall, which probably also improves latency. (Although my ringtest was noticeably slower than yours, averaging 2.2e-3 seconds/loop, so who knows?)

    • I was surprised that when I originally ran ec2-add-keypair in the EC2 tutorial that it uploaded the public key (ok) and printed out the private key (ok I guess) but didn’t print out the public key locally (weird). Your scripts seem to assume the public key is available as id_rsa.pub on the client machine. Shouldn’t this first be copied either from /root/.ssh/authorized_keys on the master node (as installed by amazon) or retrieved through the query interface?

    Is the mutual ssh access required for more than just launching the MPI daemon? If all subsequent traffic goes through the mpi daemons, starting mpd from the client machine, or automatically from the init scripts after pulling mpd.hosts from S3 would save the whole trouble, including uploading the private key at all.

  11. June 29th, 2007 | 1:41 pm

    Ralph,

    More good points. I’ve been tied up with some other projects, but it sounds like enough feedback is in to make a revised version of the image and scripts. I expect the latency to vary a bit depending on the random EC2 network topology when a cluster is launched…(instances on the same box vs. over ethernet) that might explain the ringtest. The mutual ssh access was set up since we do a lot of file/data shuffling between nodes outside of MPI.

    Thanks again, looking forward to hearing how the regression test system works out.

    -Pete

  12. July 24th, 2007 | 1:04 pm

    Update (7-24-07): I’ve made some important bug fixes to the scripts to address issues mentioned in the comments.

    Specific changes made:

    • fixed lamuser home directory permissions bug
    • fixed section of ec2-mpi-config.py which clobbered existing rsa keys on the client machine
    • Updated calls of the AWS Python EC2 library to use API version 2007-01-19
      http://developer.amazonwebservices.com/connect/entry.jspa?externalID=552&categoryID=85
    • fixed mpdboot issue by using Amazon internal DNS names in hosts files
    • scripts should now work on Windows/Cygwin client environments

    After I run some benchmarks, I’m hoping to find some time to add LAM and OpenMPI to the EC2 image along with NFS configuration, C3 cluster tools, Ganglia, and a benchmarking package.

  13. Soo..
    August 25th, 2007 | 11:09 pm

    What about that Part 3? :)

  14. Patrick Ball
    October 23rd, 2007 | 12:12 am

    the first two parts really set the stage … Part 3?

    :)

  15. December 26th, 2007 | 4:22 pm

    Does the 5 month hiatus in this project mean that it was a bad idea and you guys have learnt enough to waste no more time on it?

    Given the virtualization uncertainty, finding the right communication/computation balance for typical MPI programs appears to be very unrewarding. Secondly, MPI development and debug and then QA and scale out are not addressed, which doesn’t bode well. It appears most productive to have a local small cluster for development and debug, and then do QA and scale out on EC2, but some benchmarking numbers would really help.

    If EC2 is only robust for embarrassingly parallel problems, then MapReduce style programs are more attractive. There the size of the data set and how well it integrates in a distributed file system appear to be the problems to focus on. Or BOINC like approaches if there is no integrated DFS. Anyone have operational data on these approaches?

  16. January 18th, 2008 | 11:41 am

    Theo,

    Sorry for the delay in posting this and responding. I’ve been working on a startup for the past 7 months and was in serious crunch mode. Don’t read too much into the large gap in posts, it is just me working on this as a side-project. I finished moving the blog to another host and finally have some time to get back to the EC2 work. This experience has taught me to never name a series of blog posts “part 1 of N” :)

    You make some excellent points. One thing that has changed since I wrote the first post is that EC2 now offers larger 64bit machine images with better I/O (you can provision an entire physical server and not be limited by sharing network resources in the virtual instance). I’d like to see if this improves the network performance. I’m giving a talk on this in March, so I’m on the hook to have some benchmarks by then.

    I also agree on the MapReduce side. For embarrassingly parallel problems, Hadoop on EC2 is potentially much more attractive: more robust, and easier for most people to program. Ideally, I would like to do some comparisons between the two approaches and run the numbers.

    The performance of an EC2 MPI cluster is definitely going to be worse than your own custom hardware, but it still might fit certain niche situations. In my case, I needed to run some MPI code for a large problem and didn’t have access to a large enough cluster. The performance on EC2 was nowhere near what you get on a high-end cluster, but it got the job done for a reasonable price.

    This discussion on the Beowulf list goes into more detail on the pros/cons:

    http://www.beowulf.org/pipermail/beowulf/2008-January/020490.html

    -Pete

  17. pete
    February 4th, 2008 | 8:58 am

    Can’t get ec2-mpi-config to work. It says list index out of range for mpi-externalnames[0] on line 108.
    Start cluster and check instances are OK, so I think that Python, EC2, and elementtree are OK.
    Any ideas why? Has AWS changed the format of the response you’re parsing? (Yes, I have had a look at the Python code, but since I haven’t used Python before I can’t see anything obvious to me.)
    BTW you have a typo in mpi config: “Numer of nodes” as opposed to “Number of nodes”. It even shows in your example above.
    Otherwise I like what you’ve done, I’d just like it to work for me.
    Thanks,
    Pete

  18. February 5th, 2008 | 2:01 pm

    pete found the error: the image IDs he entered into the config module inadvertently contained a capital letter. This doesn’t cause any problems for starting images since string case is ignored by Amazon. The corresponding image ID response string from AWS is always lowercase, so the Python script’s comparison on the image ID string fails.

    In the next version of the scripts, I will handle upper/lowercase differences in the AMI strings. For now, just make sure to use all lowercase or call the Python .lower() method:

    >>> test = 'ami-fE9a7f97'
    >>> test.lower()
    'ami-fe9a7f97'
    >>> 
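The failure mode described above (an exact string comparison returning no matches, followed by indexing an empty list) can be avoided by normalizing both sides before comparing. This is a minimal sketch with hypothetical helper names, not the code from the scripts.

```python
def same_image(configured_id, reported_id):
    """Compare two AMI IDs without regard to case or stray whitespace."""
    return configured_id.strip().lower() == reported_id.strip().lower()


def instances_for_image(instances, configured_id):
    """Return the instance IDs whose image matches the configured AMI.

    instances is a list of (instance_id, image_id) pairs.
    """
    return [instance_id for instance_id, image_id in instances
            if same_image(configured_id, image_id)]


# AWS reports the image ID in lowercase even if the config used capitals.
reported = [("i-e39c7a8a", "ami-fe9a7f97"), ("i-e29c7a8b", "ami-3e836657")]
matches = instances_for_image(reported, "ami-fE9a7f97")
print(matches)
```

Checking that the result is non-empty before indexing into it would turn the "list index out of range" error into a readable message.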
    
  19. pete
    February 5th, 2008 | 2:16 pm

    Found another typo too (OK, I’m nitpicking). In the stop-cluster script the message says Stoping as opposed to Stopping. A year ago when you first posted this stuff you mentioned that the reason the non-root user was called lamuser was that the scripts were used for LAM in some previous incarnation. Since I’m actually trying to use LAM, I’d appreciate any LAM material you have around that might help me iron out one or two problems I still have.

    Anyway, thanks again,
    Pete

  20. February 5th, 2008 | 2:29 pm

    No problem, thanks for finding the typos. These were meant to be some quick hacks, but took on a life of their own after a while.

    I found this worked for configuring LAM, I’ll send you more details in an email…

    The contents of .bash_profile should be as follows:

    -bash-3.1# more .bash_profile
    # .bash_profile
    
    # Get the aliases and functions
    if [ -f ~/.bashrc ]; then
            . ~/.bashrc
    fi
    
    # User specific environment and startup programs
    
    LAMRSH="ssh -x"
    export LAMRSH
    
    LD_LIBRARY_PATH="/usr/local/lam-7.1.2/lib/"
    export LD_LIBRARY_PATH
    
    MPICH_PORT_RANGE="2000:8000"
    export MPICH_PORT_RANGE
    
    PATH=$PATH:$HOME/bin
    
    PATH=/usr/local/lam-7.1.2/bin:$PATH
    
    MANPATH=/usr/local/lam-7.1.2/man:$MANPATH
    
    export PATH
    export MANPATH
    

    Launch the cluster on EC2 and try booting LAM manually:

    [lamuser@domU-12-31-33-00-04-4B ~]$ lamboot /etc/mpd.hosts
    
    [lamuser@domU-12-31-33-00-04-4B ~]$ lamnodes
    n0      domU-12-31-33-00-04-4B.usma1.compute.amazonaws.com:1:origin,this_node
    n1      domU-12-31-33-00-03-35.usma1.compute.amazonaws.com:1:
    n2      domU-12-31-33-00-03-3C.usma1.compute.amazonaws.com:1:
    n3      domU-12-31-34-00-00-55.usma2.compute.amazonaws.com:1:
    
    [lamuser@domU-12-31-33-00-04-4B ~]$ tping N -c3
      1 byte from 3 remote nodes and 1 local node: 0.039 secs
      1 byte from 3 remote nodes and 1 local node: 0.004 secs
      1 byte from 3 remote nodes and 1 local node: 0.002 secs
    
  21. raghav
    March 8th, 2008 | 9:39 pm

    Why does it ask me for a password when I try to run the ec2-mpi-config.py file?
    it says root@xxx password:
    And I get a lot of text on the terminal when I try running the file.

  22. March 9th, 2008 | 1:16 am

    raghav,

    I assume you were able to start the instances with ec2-start-cluster.py? The text on the terminal is normal, but it shouldn’t ask you for a password (I should probably add a verbose option instead of streaming out text by default). There was a path issue on windows with an earlier version of the scripts, so that may be the problem.

    If you send me the script version number from the README and/or terminal output, I can try to track down what is going on…

    peter.skomoroch@gmail.com

    -Pete

  23. March 9th, 2008 | 3:07 am

    raghav,

    Another suggestion is to make sure the instances are running with ./ec2-check-instances.py and then retry the script, sometimes it takes a while for sshd to start up on EC2.

    -Pete
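Since sshd can take a while to come up after an instance reports "running", one way to avoid retrying by hand is to poll the SSH port before running the config script. The sketch below is an assumption about how that could be done, using a hypothetical helper `wait_for_port`. An open port only shows that something is listening, not that key-based login will succeed. The demo connects to a local listening socket so it runs without any EC2 instance.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=120.0, interval=5.0):
    """Poll until a TCP connection to host:port succeeds or timeout expires."""
    deadline = time.time() + timeout
    while True:
        try:
            sock = socket.create_connection((host, port), timeout=interval)
            sock.close()
            return True
        except OSError:
            if time.time() >= deadline:
                return False
            time.sleep(interval)


# Local demonstration: listen on an ephemeral port and wait for it.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]
print(wait_for_port("127.0.0.1", demo_port, timeout=2.0, interval=0.5))
listener.close()
```

Against a real cluster you would call it once per hostname from the instance listing, with the default port 22.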

  24. raghav
    March 10th, 2008 | 1:18 am

    Hey guys,
    Actually I made a change in the ec2-mpi-cluster.py file. I have no clue about Python and I don’t know why it worked, but it worked.

    I modified:

    template = 'ssh -o "StrictHostKeyChecking no" %(user)s@%(host)s "%(cmd)s"'
    to
    template = 'ssh -i "/home/id_rsa-gsg-keypair" %(user)s@%(host)s "%(cmd)s"'

    and

    template = '%(cmd)s %(switches)s -o "StrictHostKeyChecking no" %(src)s %(user)s@%(host)s:%(dest)s'
    to
    template = '%(cmd)s %(switches)s -i "/home/id_rsa-gsg-keypair" %(src)s %(user)s@%(host)s:%(dest)s'

    And it started working perfectly fine. I was able to log in to the master node and the pi problem executed perfectly fine.

    Thanks a lot guys

    Cheers,
    Raghav
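The change above works because -i tells ssh which private key to use, but it hardcodes the path and drops the host key option. A variation that keeps both switches and takes the key path from configuration might look like the sketch below. The `%(key)s` field and `build_ssh_command` are hypothetical additions; the other template fields are the ones quoted in the comment. The path is expanded in Python because a `~` inside double quotes is not expanded by the shell.

```python
import os

# Same fields as the template quoted above, plus an assumed %(key)s field.
SSH_TEMPLATE = ('ssh -i "%(key)s" -o "StrictHostKeyChecking no" '
                '%(user)s@%(host)s "%(cmd)s"')


def build_ssh_command(key_location, user, host, cmd):
    """Fill in the ssh template, expanding ~ in the key path first."""
    values = {
        "key": os.path.expanduser(key_location),
        "user": user,
        "host": host,
        "cmd": cmd,
    }
    return SSH_TEMPLATE % values


print(build_ssh_command("/home/id_rsa-gsg-keypair", "root",
                        "ec2-72-44-45-138.z-2.compute-1.amazonaws.com",
                        "hostname"))
```

Passing KEY_LOCATION from EC2config.py as the first argument would keep the key path in one place.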

  25. raghav
    March 10th, 2008 | 1:20 am

    Thanks, Pete, for your prompt reply!!

  26. March 21st, 2008 | 10:38 pm

    Thanks Pete. I wish I had made the PyCon session, but these posts have been very helpful. The cluster went up pretty quickly and I have already used it to crunch a few minor data runs.
    In setting everything up I also ran into a problem similar to Raghav’s and ended up solving it in a similar manner by forcing the -i credentials switch. I imagine it has something to do with the way I configured and placed my certs.

  27. raghav
    March 24th, 2008 | 7:45 pm

    I am trying to compile a simple C MPI file “hellompi.c” using the command:

    mpicc -o /usr/hellompi /usr/local/src/hellompi.c

    Why does it give me the following error?

    /usr/bin/ld: cannot open output file /usr/hellompi: Permission denied
    collect2: ld returned 1 exit status

    How do I get root privileges?

  28. March 25th, 2008 | 11:21 am

    Raghav,

    You can ssh in as root instead of lamuser, or compile the output file into your home directory.

    Check out the new AMI and management code:

    http://www.datawrangling.com/pycon-2008-elasticwulf-slides.html

    The new AMI includes a preconfigured NFS mounted directory /home/beowulf. If you compiled the file there, hellompi would be available on all nodes.

    Note that the new images default to the ‘large’ instance type, which charges $0.40/hour for each node.

    -Pete

  29. Patrick
    April 7th, 2008 | 9:41 am

    Peter,

    Very useful tool! I’ve gotten a cluster up and running using the small instance type but am having difficulty launching the _64 AMIs.

    $ ./ec2-start-cluster.py
    m1.large
    image ami-eb13f682
    master image ami-e813f681
    ----- starting master -----
    Traceback (most recent call last):
      File "./ec2-start-cluster.py", line 39, in ?
        master_response = conn.run_instances(imageId=MASTER_IMAGE_ID, minCount=1, maxCount=1, keyName= KEYNAME, instanceType=INSTANCE_TYPE )
    TypeError: run_instances() got an unexpected keyword argument 'instanceType'

    If I try to start the cluster without passing an INSTANCE_TYPE arg I get the following:
    $ ./ec2-start-cluster.py
    m1.large
    image ami-eb13f682
    master image ami-e813f681
    ----- starting master -----
    InvalidParameterValue: The requested instance type's architecture (i386) does not match the architecture in the manifest for ami-e813f681 (x86_64)
    ----- starting workers -----
    InvalidParameterValue: The requested instance type's architecture (i386) does not match the architecture in the manifest for ami-eb13f682 (x86_64)

    Any ideas? Thanks!

  30. April 7th, 2008 | 1:58 pm

    Patrick,

    Did you start with a clean install of the 64 bit scripts? I made some changes to EC2.py in the new scripts to handle the new instance types…
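
One way to catch a stale EC2.py before launching anything is to check whether run_instances accepts the instanceType keyword. The sketch below uses two hypothetical stand-in functions rather than the real library, and the "m1.small" default is an assumption made for illustration only.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    named_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in named_kinds:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for the old and new run_instances signatures.
def old_run_instances(imageId, minCount=1, maxCount=1, keyName=None):
    return "launched %s" % imageId


def new_run_instances(imageId, minCount=1, maxCount=1, keyName=None,
                      instanceType="m1.small"):
    return "launched %s on %s" % (imageId, instanceType)


print(accepts_keyword(old_run_instances, "instanceType"))  # False
print(accepts_keyword(new_run_instances, "instanceType"))  # True
```

With the real library the check would be something like accepts_keyword(conn.run_instances, "instanceType"), run before the launch call.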

  31. April 7th, 2008 | 4:02 pm

    Peter:

    I am diving into Hadoop with Map/Reduce as we speak. As you know, Google implemented its environment in C++, so I was a bit disappointed that Hadoop chose the Java VM to do its bidding. Java makes interfacing with hardcore numerical operations much harder. The particular problems I am looking at are large-scale Lanczos solvers that find eigenvalues/eigenvectors of large systems of equations. These systems are of interest in advertising, quantitative finance, and sensor networks. The problem is that these are all environments in which latency is of the essence. So you have a capacity component in terms of the size of the system, and a latency issue in terms of the incoming data rate and the opportunity cost of somebody else getting to the answer faster.

    I would be interested in working on this particular benchmark problem: pick a big eigenvalue/eigenvector problem and solve it on a cluster, on EC2, and via Hadoop/Map-Reduce. Clearly this is going to be a lot of work, so it should be worth publishing. I am sure many folks would be interested in this experiment, so let me know if this is something you could invest time in.

    Theo

  32. Patrick
    April 11th, 2008 | 9:28 pm

    Thanks, Peter. The original EC2.py was the problem. I now have the large AMIs up and running. Thanks again for the article and help!

    Patrick

  33. June 27th, 2008 | 11:20 pm

    I found the secret to avoiding a lot of MPI errors on EC2, but haven’t found time to do an additional post…

    The secret seems to be that just because Amazon says that an instance is “running”, doesn’t mean that the ssh daemons are available. This caused all kinds of intermittent problems setting up the hosts and my old scripts would fail silently.

    In my current codebase, I do some checks like the following:

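        import commands  # Python 2 only; subprocess.getoutput is the Python 3 equivalent
        import time

        # BOOTING_INSTANCE and SSH_OPTS are defined earlier in the script
        print "Instance is %s" % BOOTING_INSTANCE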
    
        # wait for instance description to return "running" and grab HOSTNAME variable
        print "Polling server status (ec2-describe-instances %s)" % BOOTING_INSTANCE
        while 1:
          print "waiting for instance to boot..."
          HOSTNAME = commands.getoutput("ec2-describe-instances %s | grep running | awk '{print $4}'" % BOOTING_INSTANCE)
          if len(HOSTNAME) > 1:
            print "-------Instance booted, The server is available at %s" % HOSTNAME
            DOM_NAME = commands.getoutput("ec2-describe-instances %s | grep running | awk '{print $5}'" % BOOTING_INSTANCE).split('.')[0]
            break
          time.sleep(1)    
    
        # sometimes it takes a while for the ssh service to start, even when the ec2 api describes an instance as running.
        # A machine in the "running" state may not have finished booting. Try executing a no-op command until a valid response is found
        print "verifying ssh daemon has started..."
        counter=0
        ec2_launch_failed = False  # must be defined before the check below, even when ssh comes up
        while 1:
          print "Waiting for ssh daemon to start..."
          counter += 1
          REPLY = commands.getoutput('''ssh %s "root@%s" 'echo "hello"' ''' % (SSH_OPTS, HOSTNAME) )
          if REPLY == 'hello':
            print "-------ssh has started, proceeding with AMI build"
            break
          if counter > 24:
            print "Instance not responding to SSH hails, aborting..."
            ## sshd should not take more than 2 minutes to launch
            terminate_status = commands.getoutput('ec2-terminate-instances %s' % BOOTING_INSTANCE)
            ec2_launch_failed = True
            print "Base Instance terminated"
            break
          time.sleep(5)
    
        if ec2_launch_failed:
            print "Aborting build"
            return
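
For readers on Python 3, where the commands module no longer exists, the same "poll until ssh answers, give up after about two minutes" idea can be sketched as below. This is an illustration under assumptions, not the elasticwulf code: `wait_until` and `ssh_answers` are hypothetical names, and the attempt count and delay simply mirror the values used above.

```python
import subprocess
import time


def wait_until(check, attempts=24, delay=5.0, sleep=time.sleep):
    """Call check() until it returns True; give up after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            sleep(delay)  # no need to sleep after the final failed attempt
    return False


def ssh_answers(hostname, ssh_opts=()):
    """Return True if a no-op command run over ssh echoes back the expected reply."""
    cmd = ["ssh", *ssh_opts, "root@%s" % hostname, "echo hello"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and result.stdout.strip() == "hello"
```

A caller would use it as wait_until(lambda: ssh_answers(hostname, ssh_opts)) and terminate the instance if it returns False. Passing `sleep` as a parameter keeps the retry loop easy to test without real delays.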
    
  34. June 27th, 2008 | 11:32 pm

    @Theo,

    I’m attending MMDS this week at Stanford (http://www.stanford.edu/group/mmds/), and had a chance to ask James Demmel a few questions. He gave a talk titled “Avoiding communication in linear algebra algorithms”, which was very relevant. His advice for matrix multiplication in a high latency environment like EC2 was to try dialing up the block size as much as possible in the standard MPI solvers and see how performance was affected.

    -Pete
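
To make the block-size advice concrete, here is a small single-process sketch of blocked matrix multiplication in pure Python. It is not an MPI solver; it only counts how many tile-by-tile products are needed for a given block size, under the assumption that in a distributed run each tile product would involve shipping tiles between nodes. The function name `blocked_matmul` is hypothetical.

```python
def blocked_matmul(a, b, block):
    """Multiply square matrices a and b (lists of lists) in block x block tiles.

    Returns (product, tile_products), where tile_products is the number of
    tile-by-tile multiplications performed.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    tile_products = 0
    for i0 in range(0, n, block):
        for j0 in range(0, n, block):
            for k0 in range(0, n, block):
                tile_products += 1
                # Multiply tile A[i0.., k0..] by tile B[k0.., j0..] into C[i0.., j0..].
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, n)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + block, n)):
                            c[i][j] += aik * b[k][j]
    return c, tile_products


n = 8
a = [[i + j for j in range(n)] for i in range(n)]
b = [[i - j for j in range(n)] for i in range(n)]
for block in (2, 4, 8):
    _, count = blocked_matmul(a, b, block)
    print(block, count)  # 2 64, then 4 8, then 8 1
```

The arithmetic is the same for every block size, but the number of tile products falls as (n/block)^3, which means fewer and larger transfers when each one pays a high latency cost.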

  35. magg
    September 11th, 2008 | 5:39 pm

    Hi Peter,

    Have you tried to connect EC2 instances with your local desktops? I am trying to do that with mpich2 1.0.7 but have had no success at all. mpdboot complains about invalid port info (no_port), i.e. no port at all, when I try mpdboot -n 2. Even when I run mpd & on the EC2 machine, then mpdtrace -l, then unblock the port, and then run mpd -h ec2-blabla -p ec2-mpdtrace-l-port, I still have no luck. Have you faced similar problems?

    Thanks
    - magg

  36. September 11th, 2008 | 5:49 pm

    Magg,

    I wouldn’t recommend it: the latency would be huge and I’m not sure how MPI would handle that. You would also need to open the MPI ports to the outside world using the EC2 security group authorize commands.

    An alternative is to open an X11 session and connect to the head node or maybe VNC in to the instance. The 64 bit elasticwulf images are set up for X11 sessions and adding a desktop package would allow you to VNC in if you prefer that route.

    -Pete

  37. Tim Salimans
    September 29th, 2008 | 11:25 am

    Great project and thanks very much for sharing! I do have some trouble getting it all to work though. Everything works fine until it tries to run the create_hosts.py:

    /////// OUTPUT ///////////////

    Creating hosts file on master node and copying hosts file to compute nodes…

    pscp -scp -i D:\grid\keys\keypair.ppk -q create_hosts.py root@ec2-67-202-19-253.compute-1.amazonaws.com:/etc/

    plink -ssh -i D:\grid\keys\keypair.ppk root@ec2-67-202-19-253.compute-1.amazonaws.com "python /etc/create_hosts.py"

    exporting 10.252.31.48:/home/beowulf
    exporting 10.252.31.48:/mnt/data
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @ WARNING: UNPROTECTED PRIVATE KEY FILE! @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    Permissions 0644 for ‘/root/.ssh/id_rsa’ are too open.
    It is recommended that your private key files are NOT accessible by others.
    This private key will be ignored.
    bad permissions: ignore key: /root/.ssh/id_rsa
    Permission denied, please try again.
    Permission denied, please try again.
    Permission denied (publickey,gssapi-with-mic,password).
    lost connection
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @ WARNING: UNPROTECTED PRIVATE KEY FILE! @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    Permissions 0644 for ‘/root/.ssh/id_rsa’ are too open.
    It is recommended that your private key files are NOT accessible by others.
    This private key will be ignored.
    bad permissions: ignore key: /root/.ssh/id_rsa
    Permission denied, please try again.
    Permission denied, please try again.
    Permission denied (publickey,gssapi-with-mic,password).
    lost connection

    etcetera

    //////////////////////////////////

    As you can see I made some small modifications in order to use PuTTY as my SSH client, but that does not seem to be the problem… Does anyone else have this problem, and does anyone know how to fix it?
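
The warning in the output above is OpenSSH refusing to use a private key that group or other users can read (mode 0644). Whether that was the root cause in this particular setup is not established here, but checking and tightening the mode of a key file can be sketched in Python as follows. The helper names `key_is_private` and `restrict_key` are hypothetical.

```python
import os
import stat


def key_is_private(path):
    """Return True if no group or other permission bits are set on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def restrict_key(path):
    """Make the file readable and writable by its owner only (mode 0600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

On the master node this would be run against /root/.ssh/id_rsa, for example restrict_key("/root/.ssh/id_rsa") whenever key_is_private returns False.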

  38. Tim Salimans
    October 1st, 2008 | 4:33 am

    Got it working using OpenSSH; I guess PuTTY was the problem after all.

  39. jjiyunlee
    October 19th, 2008 | 6:47 pm

    Hi,

    Thanks for your writeup! It’s very helpful. I’m running into an error with mpdtrace and was hoping for some of your insight into it. I am running mpd as root, with one node for simplicity.

    I can successfully run mpdboot on the instance and start “mpd &”:
    root@…:/etc# mpdboot -n 1 -f mpd.hosts
    root@…:/etc# mpd &
    [1] 2280

    but “mpdtrace -l” gives me an error:
    root@ip-10-251-143-0:/etc# mpdtrace -l
    mpdtrace: unexpected msg from mpd=:{'error_msg': 'invalid secretword to root mpd'}:

    I have tried all pairwise combinations of having MPD_SECRETWORD= or secretword= in ~/.mpd.conf and /etc/mpd.conf, all of which were set to read/write for root only.

    I also can’t run “mpdallexit”:
    root@…:~# mpdallexit
    mpdallexit: mpd_uncaught_except_tb handling:
    : 'cmd'
    /usr/local/bin/mpich2-install/bin/mpdallexit 53 mpdallexit
    elif msg['cmd'] != 'mpdallexit_ack':
    /usr/local/bin/mpich2-install/bin/mpdallexit 59
    mpdallexit()

    I can also run mpdcheck as a server and have it listen for mpdcheck as a client from the same instance (in a different window).

    Suggestions/help? I’d greatly appreciate any advice you have on this problem. Thanks –

    - Joanne

  40. October 19th, 2008 | 10:58 pm

    Joanne,

    Try logging in and running your commands as “lamuser” instead of root. The default configuration assumes lamuser is running all commands.

    $ ssh lamuser@ec2-72-44-46-78.z-2.compute-1.amazonaws.com

    See part 1 of the post for details on changing the configuration to run MPI as root.

    -Pete
