Running MPI Python Code on Yale Omega
Yesterday I got my Parallel Tempering MCMC code working on Yale’s Omega Cluster. I found the “documentation” to be far out of date and not very helpful. Fortunately, there were several people in the department who could help me out (thanks Kaylea, Duncan, and Andys!!). In the hope of reducing the setup time for others who may be interested in using Omega for their research, I decided to write this blog post detailing my setup.
###Welcome Email
When your account is first created, you will receive an email that starts off like this:
Welcome to Omega - Yale High Performance Cluster
An account has been created for you on omega.hpc.yale.edu.
Details about the cluster and its usage can be found at
- http://hpc.research.yale.edu/wiki/index.php/Omega
Before you can login you will need to create and upload your ssh key here:
- http://gold.hpc.yale.internal/cgi-bin/sshkeys.py
For additional information about ssh please visit:
- http://hpc.yale.edu/faq/secure-shell-faq/
There are several queues to choose from, each serving a different purpose
- http://hpc.research.yale.edu/wiki/index.php/Omega#FAS_Queues
When submitting jobs, some of the qsub terms have changed; most importantly when selecting the number of nodes or number of processors:
- http://hpc.research.yale.edu/wiki/index.php/Omega#Scheduling_your_Programs_to_run
####SSH Keys
The first thing to note, which is not mentioned in the welcome email (and which I could not find in the online documentation), is that the link given for uploading your SSH key does not work in the Safari browser. Use Google Chrome.
####SSHing Into Omega
Next, to SSH into Omega, I used the standard ssh command with the hostname from the welcome email:
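```bash
ssh your_netid@omega.hpc.yale.edu   # replace your_netid with your Yale netid
```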
####Creating a Test Script
Now comes the fun stuff. All jobs submitted for processing need to be wrapped in a shell script. Before you do anything else, create a test script and see if it works. The sample script on the HPC site is out of date and results in error messages. Below is a sample script that worked for me (as of March 12, 2015).
Contents of `my_test_script.sh` (a minimal version; adjust the email address, walltime, and resource requests to suit your own jobs):
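```bash
#!/bin/bash
#PBS -q fas_devel                 # the development/testing queue
#PBS -l nodes=1:ppn=1             # one core on one node is plenty for a test
#PBS -l walltime=00:05:00         # five minutes is more than enough
#PBS -m abe                       # email me on abort, begin, and end
#PBS -M your_netid@yale.edu       # placeholder: use your own address
#PBS -j oe                        # merge stdout and stderr into one file

# Use $PBS_JOBNAME and $PBS_JOBID to write results to a unique directory,
# so reruns never overwrite previous output.
OUTDIR=$HOME/$PBS_JOBNAME-$PBS_JOBID
mkdir -p $OUTDIR
cd $OUTDIR

# The actual "work": just print the date.
date > date_test.txt
```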
Most of what I wrote above is probably self-explanatory through the comments. Yes, the `#` signs should be in front of the PBS commands. The `$PBS_JOBNAME` and `$PBS_JOBID` variables are handy for ensuring your output is printed to unique directories (i.e., you’re not overwriting previous results).
To submit this job, type the following at the command line:
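```bash
qsub my_test_script.sh
```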
You should receive an email message when your script has begun execution. You can see its status by typing `showq`. The list is usually quite long, so you might find it useful to pipe the results to `grep` and search for just the lines that contain your netid:
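```bash
showq | grep your_netid   # replace your_netid with your own
```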
Note that this test script is simply printing the date, so it won’t take long to run, and you should receive a finished email with exit 0 status shortly after receiving the start email message.
If all went well, congratulations!
###Setting up Python and MPI on Omega
Now comes the fun part. Omega uses a module system. To print all available modules, type the following in the Terminal window:
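```bash
module avail
```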
To find specific modules, use the `modulefind` command; for example, to search for anything python-related:
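```bash
modulefind python
```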
Note that `modulefind` is case insensitive.
There are many versions of python available, but the only packages that I wanted to use were numpy and scipy.
####Loading Modules and Adding Python Packages Using pip
Adding more packages to your path can be a little tricky. I wanted to use python 2.7.9, but when I loaded that module, the `pip` command was not in my path. I loaded several others, and it looks like the most recent python 2 version to include `pip` on Omega is 2.7.3.
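Loading a module looks like this; the exact module name to use is whatever `modulefind` reports on your system (the path below is only a guess at Omega’s naming scheme):

```bash
module load Langs/Python/2.7.3   # guessed module name; check modulefind's output
```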
Another problem is that you won’t have write access to install packages to the default `site-packages` directory, so you will need to create a subdirectory in your home directory and specify an optional argument to `pip` telling it where to install the libraries you need:
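One way to do this is with a `~/local` prefix, passing `--prefix` through to the underlying `setup.py install`:

```bash
mkdir -p ~/local
# One common approach; pip also supports --user, which targets ~/.local instead.
pip install --install-option="--prefix=$HOME/local" numpy
pip install --install-option="--prefix=$HOME/local" scipy
```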
###Setting up your .bashrc file
Now that we have the python libraries we want to use with our code, we can modify our bash startup file to load the other modules we want and modify our python path to include the libraries we installed in ~/local:
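Something along these lines works; the module names below are placeholders, so substitute whatever `modulefind` reported for your python and MPI builds:

```bash
# Load the modules we need at every login (placeholder names -- use yours).
module load Langs/Python/2.7.3
module load MPI/OpenMPI

# Make the pip-installed packages under ~/local visible to python.
export PYTHONPATH=$HOME/local/lib/python2.7/site-packages:$PYTHONPATH
```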
###Testing the real code
That should be everything. You should test your code on a single node before attempting to run it on many cores; `fas_devel`, as shown in the test script above, is the queue you want to use for that. To see what other queues are available, check the FAS Queues page linked in the welcome email above. Note that Omega has 8 cores per node and 36 GB of RAM per node, as mentioned in the Hardware section of the wiki.
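When the single-node test looks good, scaling up is a matter of requesting more resources in the submission script and launching the code with `mpirun`. A sketch, with a hypothetical script name and production queue (check the FAS Queues page for the queue you should actually use):

```bash
#!/bin/bash
#PBS -q fas_normal           # hypothetical production queue; fas_devel is for testing
#PBS -l nodes=4:ppn=8        # 4 nodes x 8 cores/node = 32 MPI processes
#PBS -l walltime=24:00:00
#PBS -m abe

cd $PBS_O_WORKDIR            # start in the directory the job was submitted from
# An MPI build that knows about Torque picks up the node list automatically.
mpirun -np 32 python my_pt_mcmc.py
```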
I hope you found this post on getting started with MPI and python on the Yale Omega cluster helpful!