-------------------------------------
How to use mpirun in the CS cluster
(a quick guide)
-------------------------------------

WARNING: This guide may contain lots of typos, misleading advice, and
spelling/grammar mistakes... it is not even polite. So be careful if you
still want to read it.

Introduction.
-------------
The machines in the CS department have an implementation of MPI called
MPICH installed. You can see what it contains by looking in

    /opt/mpich

You can install another implementation in your home account, but believe
me, you don't want to.

Reading about MPI.
-------------------
If you want to learn more about MPI, MPICH and related stuff, you can
start by going to:

    http://www.mpi-forum.org/
    http://www-unix.mcs.anl.gov/mpi/mpich/

and by reading/compiling/running the examples in

    /opt/mpich/examples

Before compiling the examples
-------------------------------
1. Set up your ssh correctly.
   Check which files you have in ~/.ssh/
   Do not copy the files. It is dangerous, for your security and for the
   cluster's security.

   Go to your home directory (this is important!), and from there type
   the commands given at

       http://www.cs.uchicago.edu/info/services/ssh

   in the section:
   "How to ssh to other CS machines without a password"

   If you can ssh to any other linux machine in the CS cluster without
   typing your password, you are done.
   You should have the following files in the ssh directory:

       authorized_keys
       id_rsa
       id_rsa.pub

   All right? Good! Go to step 2.

   If it failed, I think it would be good to erase the garbage you
   created. (Hey! I am not responsible for the files you delete!)
   Hope you did not forget what you had before!!
   Now repeat the process. Do it right this time :)

2. Copy the contents of the examples directory into a directory in your
   account space.

3.
Set up a list of machines on which you want to run your programs.
   Initially you can write a text file with the following lines:

       abacus.cs.uchicago.edu
       accessory.cs.uchicago.edu
       akash.cs.uchicago.edu
       berlioz.cs.uchicago.edu
       gareth.cs.uchicago.edu

   If you want a list of available machines, follow the instructions on
   the course webpage.

   ADVANCED FEATURES.
   Some machines have 2 processors. If you want to run one process on
   each processor, you have to tell mpirun to do it. How?
   In the previous file, append ":2" to the name of the machine.

   For example, the machine "gareth.cs" has two processors, so you can
   write

       abacus.cs.uchicago.edu
       accessory.cs.uchicago.edu
       akash.cs.uchicago.edu
       berlioz.cs.uchicago.edu
       gareth.cs.uchicago.edu:2

   If you want to check a machine's specification, run this on it:

       cat /proc/cpuinfo

Compiling the examples at /opt/mpich/examples
---------------------------------------------
You can write MPI programs in Fortran, C and C++, so MPICH has a
compiler for each of these languages:

    Fortran77   mpif77
    Fortran90   mpif90
    C           mpicc
    C++         mpiCC

As far as I know there is no MPI implementation for Scheme... I'm sorry
folks :)

Compiling is quite easy from the command line. Use the same flags you
usually use for gcc and g++, and I guess it should be the same for
fortran.
...well, I don't know about the huge list of strange flags :(

Check the Makefile in the examples directory if you want a more detailed
view.

Now, if you want to compile the examples, at least the first time, you
should run make and check what it does :)

Running the examples
------------------------------
Now that everything compiles without errors (I hope so), you can run
your examples. Start with a "hello world" example. It's the tradition! :)

Running MPI programs is a bit weird... just type:

    mpirun -machinefile <machinefile> -np <# of processes> <program>

where <machinefile> is the file with the list of machines that you wrote
before, and <program> is the program to run.
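
To choose a value for -np, it can help to count how many processors the
machine file offers. The following is only a minimal sketch, assuming
the "host" / "host:N" layout described above; count_slots is a made-up
helper name, not something that ships with MPICH.

```shell
#!/bin/sh
# count_slots: hypothetical helper (NOT part of MPICH).
# Sums the processors listed in a machine file:
# a plain "host" line counts as 1, a "host:N" line counts as N.
# Blank lines are ignored.
count_slots() {
    awk -F: 'NF > 0 { total += ($2 == "" ? 1 : $2) }
             END    { print total + 0 }' "$1"
}

# Try it on a throwaway copy of the machine file from this guide.
example_file=$(mktemp)
cat > "$example_file" <<'EOF'
abacus.cs.uchicago.edu
accessory.cs.uchicago.edu
akash.cs.uchicago.edu
berlioz.cs.uchicago.edu
gareth.cs.uchicago.edu:2
EOF

count_slots "$example_file"    # prints 6
rm -f "$example_file"
```

mpirun does not require -np to equal this number; it is just a
reasonable upper bound if you want one process per processor.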
If your file is called machines.linux and you want to run your program
"hello" on 4 machines, type:

    mpirun -machinefile machines.linux -np 4 hello

Now what?
---------
Now you can start writing your own programs.

Ah, remember: since your programs are going to die (I'm sorry, but it
always happens), this time you are going to leave zombie processes on
every machine where you ran your program. Nobody is going to clean them
up for you.
Cleaning up zombie processes in a cluster is painful if you want to do
it by hand, so try to write your own scripts for remote process
killing :)

You can start with this one (I am almost illiterate in shell
programming, so I don't even know if it works correctly).
This script works on some Debian 3.0 machines, but not everywhere.

The script name is mpikill.sh
---------------------------------------------------------------------------
#!/bin/sh
#
# A script to remotely kill processes (possibly started with mpirun)
#
# USE: mpikill.sh <machinefile> <program>
# where
#   <machinefile> is a file with the list of machines
#   <program>     is the name of the program to kill on those machines
#
for entry in $(cat "$1")
do
    # Strip an optional ":2" processor count to get the host name
    node=${entry%%:*}
    echo "$node"
    # The first column of "ps x" is the PID; keep lines that mention <program>
    PROCESSES=$(rsh "$node" ps x | grep "$2" | awk '{printf("%d ", $1)}')
    # Only call kill when something matched
    if [ -n "$PROCESSES" ]
    then
        rsh "$node" kill -9 $PROCESSES
        echo "Killed"
    fi
done
-----------------------------------------------------------------------------

This is a simple script to run a command on many machines
-------------------------------------------------------------
#!/bin/sh
#
# A script to run a command on many machines
#
# USE: remote.sh <machinefile> <command>
# where
#   <machinefile> is a file with the list of machines
#   <command>     is the command you want to run (quote it if it has spaces)
#
for entry in $(cat "$1")
do
    # Strip an optional ":2" processor count to get the host name
    node=${entry%%:*}
    echo "$node"
    rsh "$node" "$2"
done
------------------------------------------------------
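
One weakness of mpikill.sh is that grep matches anywhere on the ps line,
so killing "hello" would also hit "hello2" or an editor that has hello.c
open. A possible refinement, offered only as a sketch: compare the
command name exactly. pids_for is a hypothetical helper name, and the
code assumes the usual Linux "ps x" layout (one header line, PID in
column 1, command in column 5), which may not hold on every machine.

```shell
#!/bin/sh
# pids_for: hypothetical helper, not part of any MPI distribution.
# Reads "ps x" output on stdin and prints the PIDs whose command name
# (column 5, with any leading path removed) is exactly $1.
# ASSUMPTION: Linux-style "ps x" columns: PID TTY STAT TIME COMMAND.
pids_for() {
    awk -v prog="$1" '
        NR > 1 {                          # skip the header line
            n = split($5, parts, "/")     # drop any leading directories
            if (parts[n] == prog) printf("%d ", $1)
        }'
}

# Canned sample input (made-up PIDs), so this runs without rsh or MPI.
sample_ps() {
    cat <<'EOF'
  PID TTY      STAT   TIME COMMAND
 1201 ?        S      0:00 /home/user/mpi/hello
 1202 pts/1    S      0:00 vi hello.c
 1307 ?        R      0:03 ./hello
 1310 ?        S      0:00 hello2
EOF
}

sample_ps | pids_for hello    # prints: 1201 1307
echo
```

In mpikill.sh this would replace the grep/awk pair, along the lines of
PROCESSES=$(rsh "$node" ps x | pids_for "$2").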