Running LAM MPI
From BCCD 3.0
Setting LAM as the Default MPI Environment
LAM is a great MPI environment. Unfortunately, it's not the default MPI environment used on the BCCD. The reason for this is simple: LAM lost the coin toss to MPICH when the BCCD was first created. The problem historically with having LAM-MPI and MPICH in the same environment is one of libraries, default executables, consistency across hosts, and the expectations of the end user.
Switching the default environment to LAM is easy to do, but it has to be done thoroughly. If the systems are not completely transitioned to LAM, the resulting environment will be badly broken.
First, see what your default MPI is by listing the loaded modules with module list:
Currently Loaded Modulefiles:
  1) modules                7) gromacs/3.3.3_openmpi
  2) PSC_DX/04.3.3000       8) intelcc/11.0.074
  3) cuda/0.2.1221          9) jre/1.6.0_07
  4) dreamm/4.1.6          10) mcell/3.1
  5) fftw/3.1.2            11) mpe/1.9.1
  6) openmpi/1.2.6         12) xmpi/2.2.3b8
The loaded MPI is openmpi, not LAM, so unload it and load LAM instead:
module unload openmpi && module load lam
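The same check can be scripted. The following is a minimal sketch, not part of the BCCD tools: loaded_mpi is a hypothetical helper that filters module list output for the MPI module names mentioned on this page (openmpi, mpich, mpich2, lam). It assumes the "N) name/version" listing format shown above.

```shell
# Hypothetical helper: print the MPI modules found in `module list` output
# read from standard input. Module names are assumptions based on this page.
loaded_mpi() {
    # Put every whitespace-separated token on its own line, then keep only
    # tokens that start with a known MPI module name.
    tr -s ' \t' '\n\n' | grep -E '^(openmpi|mpich2?|lam)(/|$)'
}

# Usage on a live system (module list often prints to stderr, hence 2>&1):
#   module list 2>&1 | loaded_mpi
```

Anchoring the pattern at the start of the token keeps modules such as gromacs/3.3.3_openmpi or xmpi from being reported as an MPI environment.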
Next, open ~/.bash_profile and change this line
module load PSC_DX cuda dreamm fftw openmpi gromacs intelcc jre mcell mpe xmpi
to
module load PSC_DX cuda dreamm fftw lam gromacs intelcc jre mcell mpe xmpi
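If the edit has to be repeated on several nodes, it can be scripted. This is a sketch only, assuming the goal is for lam to be loaded in place of openmpi on the module load line; use_lam_by_default is a hypothetical helper name, and the function keeps a backup copy next to the original file.

```shell
# Hypothetical helper: swap openmpi for lam on the "module load" line of a
# profile file (default: ~/.bash_profile). A backup is written to FILE.bak.
use_lam_by_default() {
    profile="${1:-$HOME/.bash_profile}"
    cp "$profile" "$profile.bak" || return 1
    # Only touch lines that begin with "module load", and only the
    # standalone word "openmpi" on those lines.
    sed -E '/^module load /s/(^| )openmpi( |$)/\1lam\2/' "$profile.bak" > "$profile"
}

# Usage: use_lam_by_default            # edits ~/.bash_profile
#        use_lam_by_default /some/file # edits another file
```

Log out and back in (or source the file) on each node afterwards so the new default takes effect everywhere.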
Booting LAM MPI
LAM-MPI requires a file consisting of a list of current nodes to boot. Make sure that every node has started pkbcast, bccd-allowall, and bccd-snarfhosts, as discussed in Booting up the CD. The bccd-snarfhosts command should generate the appropriate machines file, in the user bccd's local directory. This file contains a list of active nodes, and is exactly what LAM needs. Issue the following command to verify that the cluster is bootable:
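A minimal sketch of such a check, assuming LAM's recon tool and a node list at ~/machines. Both the path and the wrapper name check_bootable are assumptions for illustration; adjust the path to wherever bccd-snarfhosts wrote the file.

```shell
# Hypothetical wrapper: verify that the nodes in a host file can be booted.
# The default path ~/machines is an assumption.
check_bootable() {
    hostfile="${1:-$HOME/machines}"
    # Refuse to continue if the node list is missing or empty.
    if [ ! -s "$hostfile" ]; then
        echo "no node list at $hostfile; run bccd-snarfhosts first" >&2
        return 1
    fi
    # recon tests whether LAM could be started on every listed node.
    recon -v "$hostfile"
}

# Usage: check_bootable            # uses ~/machines
#        check_bootable /path/to/hostfile
```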
If the command is successful, you should see the message below:
To actually start LAM on the specified cluster, issue the following:
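A sketch of the boot step, assuming the standard lamboot command and the same assumed node list path (~/machines). The start_lam wrapper is hypothetical; it also calls lamnodes afterwards to list the nodes that joined.

```shell
# Hypothetical wrapper: boot LAM on the nodes in a host file, then list them.
# The default path ~/machines is an assumption.
start_lam() {
    hostfile="${1:-$HOME/machines}"
    # Start the LAM run-time environment on every node in the file.
    lamboot -v "$hostfile" || return 1
    # Show the nodes that are now part of the running LAM session.
    lamnodes
}

# Usage: start_lam
```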
If you don't see any error message, then you can now run MPI programs under LAM. Gravy!
Compiling and Running MPI Programs with LAM
To find out how to compile and run sample MPI programs, take a look at Compiling and Running. Remember, the example programs for LAM are in the directory ~/lam-mpi/examples. The examples are sorted inside directories. You may go into each directory to compile and run each program using the familiar mpicc and mpirun commands.
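As a rough illustration of that compile-and-run cycle, here is a sketch using mpicc and mpirun. The helper name build_and_run and the file name hello.c are hypothetical; substitute a source file from one of the example directories.

```shell
# Hypothetical helper: compile one C source file in the current directory
# with mpicc and run it on NPROCS processes (default 2) with mpirun.
build_and_run() {
    src="$1"
    nprocs="${2:-2}"
    exe="${src%.c}"            # hello.c -> hello
    mpicc -o "$exe" "$src" || return 1
    mpirun -np "$nprocs" "./$exe"
}

# Usage (hello.c is a placeholder file name):
#   build_and_run hello.c 4
```

LAM must already be booted before mpirun is called.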
Shutting Down LAM
Cleaning LAM
Instead of lambooting after each MPI run, we can issue a lamclean command to remove all user processes and messages:
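A sketch of that pattern, assuming several already-compiled programs are to be run back to back; run_each is a hypothetical helper and the program names in the usage line are placeholders.

```shell
# Hypothetical helper: run each program under mpirun with NPROCS processes,
# cleaning the LAM session after every run instead of rebooting it.
run_each() {
    nprocs="$1"
    shift
    for prog in "$@"; do
        mpirun -np "$nprocs" "$prog"
        # Remove leftover user processes and messages before the next run.
        lamclean -v
    done
}

# Usage (placeholder program names):
#   run_each 4 ./hello ./ring
```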
After doing this, we can mpirun another program.
Halting LAM
After we are all done, the lamhalt command removes all traces of the LAM session on the network.
And just in case...
In the case of a catastrophic failure (i.e., one or more LAM nodes crash), we can issue a wipe command to halt everything instead of issuing lamhalt.
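The two shutdown paths can be combined in one place. This is a sketch under assumptions: stop_lam is a hypothetical helper, the node list is assumed to be ~/machines, and wipe is only tried when lamhalt reports failure.

```shell
# Hypothetical helper: shut LAM down cleanly, falling back to wipe.
# The default path ~/machines is an assumption.
stop_lam() {
    hostfile="${1:-$HOME/machines}"
    # Normal case: lamhalt shuts down the whole LAM session.
    if lamhalt; then
        return 0
    fi
    # Fallback: wipe contacts each node in the host file directly.
    echo "lamhalt failed; falling back to wipe" >&2
    wipe -v "$hostfile"
}

# Usage: stop_lam
```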
