Running LAM MPI

From BCCD 3.0

Revision as of 04:14, 31 March 2009 by Skylar (Talk | contribs)


Setting LAM as the Default MPI Environment

LAM is a great MPI environment. Unfortunately, it is not the default MPI environment on the BCCD, for a simple reason: LAM lost the coin toss to MPICH when the BCCD was first created. Historically, having LAM-MPI and MPICH side by side in the same environment has caused problems with libraries, default executables, consistency across hosts, and the expectations of the end user.

Switching the default environment from MPICH to LAM is easy, but it must be done thoroughly: if the systems are not completely transitioned to LAM, the resulting environment will be very, very broken.

First, see what your default MPI is by running module list, which prints output like this:

 Currently Loaded Modulefiles:
 1) modules                 7) gromacs/3.3.3_openmpi
 2) PSC_DX/04.3.3000        8) intelcc/11.0.074
 3) cuda/0.2.1221           9) jre/1.6.0_07
 4) dreamm/4.1.6           10) mcell/3.1
 5) fftw/3.1.2             11) mpe/1.9.1
 6) openmpi/1.2.6          12) xmpi/2.2.3b8

The loaded MPI module here is openmpi, not LAM, so unload it and load LAM instead:

  module unload openmpi && module load lam
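To confirm the switch, the module list output can be filtered for MPI module names. The sketch below is illustrative only: loaded_mpi is a hypothetical helper, and the names it matches (openmpi, mpich, mpich2, lam) are an assumption about how the MPI modules on your system are named.

```shell
#!/bin/sh
# Hypothetical filter: read `module list` output on stdin and print any
# loaded MPI modules. The matched names are an assumption, adjust as needed.
loaded_mpi() {
    # Put every whitespace-separated token on its own line, then keep only
    # tokens whose module name is exactly openmpi, mpich, mpich2, or lam.
    # Anchoring at the start keeps gromacs/3.3.3_openmpi out of the result.
    tr -s ' \t' '\n\n' | grep -E '^(openmpi|mpich2?|lam)(/|$)'
}
```

A possible invocation is `module list 2>&1 | loaded_mpi`; the redirect is there because module list commonly writes to standard error. Once the switch has taken effect, only a lam entry should be printed.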

Next, open ~/.bash_profile and change this line

  module load PSC_DX cuda dreamm fftw openmpi gromacs intelcc jre mcell mpe xmpi

to

  module load PSC_DX cuda dreamm fftw lam gromacs intelcc jre mcell mpe xmpi
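Because the same change has to be made consistently, it may be convenient to script the profile edit. The following is a minimal sketch, not part of the BCCD tooling: swap_mpi_module is a hypothetical helper that replaces one module name with another on lines beginning with "module load", and it keeps a backup copy with a .bak suffix.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the BCCD): replace module OLD with
# module NEW on every "module load" line of FILE.
# Usage: swap_mpi_module FILE OLD NEW
swap_mpi_module() {
    file=$1
    old=$2
    new=$3
    # Keep a backup so the change is easy to undo.
    cp "$file" "$file.bak" || return 1
    # Only lines starting with "module load" are touched, and OLD must match
    # a whole word, so names such as gromacs/3.3.3_openmpi are left alone.
    # Note: a rewritten line is re-joined with single spaces between words.
    awk -v old="$old" -v new="$new" '
        /^[[:space:]]*module load / {
            for (i = 1; i <= NF; i++) if ($i == old) $i = new
        }
        { print }
    ' "$file.bak" > "$file"
}
```

For the change described above, the call would be `swap_mpi_module ~/.bash_profile openmpi lam`. Log in again afterwards so the new line is read.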

Booting LAM MPI

LAM-MPI requires a file listing the current nodes in order to boot. Make sure that every node has started pkbcast, bccd-allowall, and bccd-snarfhosts, as discussed in Booting up the CD. The bccd-snarfhosts command should generate the appropriate machines file in the home directory of the user bccd. This file contains a list of active nodes, and is exactly what LAM needs. Issue the following command to verify that the cluster is bootable:

  recon -v ~/machines

If the command is successful, you should see a success message beginning with "Woo hoo! recon has completed successfully."
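If recon reports a problem instead, one thing worth checking is whether the machines file lists the hosts you expect. The sketch below assumes the usual LAM boot schema layout of one host per line, with "#" starting a comment; list_hosts is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the host names found in a LAM machines file,
# skipping blank lines and "#" comments.
# Usage: list_hosts FILE
list_hosts() {
    # Strip comments, then print the first field of each non-empty line.
    # Anything after the host name (for example cpu=2) is ignored here.
    sed -e 's/#.*//' "$1" | awk 'NF { print $1 }'
}
```

Running `list_hosts ~/machines` prints one host per line, and `list_hosts ~/machines | wc -l` gives the number of nodes LAM will try to boot.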

To actually start LAM on the specified cluster, issue the following:

  lamboot -v ~/machines

If you don't see any error messages, you can now run MPI programs under LAM. Gravy!

Compiling and Running MPI Programs with LAM

To find out how to compile and run sample MPI programs, take a look at Compiling and Running. Remember, the example programs for LAM are in the directory ~/lam-mpi/examples, organized into subdirectories. Go into each subdirectory to compile and run its programs using the familiar mpicc and mpirun commands.

Shutting Down LAM

Cleaning LAM

Instead of running lamboot again after each MPI run, we can issue a lamclean command to remove all user processes and messages:

  lamclean -v

After doing this, we can mpirun another program.

Halting LAM

After we are all done, the lamhalt command removes all traces of the LAM session on the network.

  lamhalt

And just in case...

In the case of a catastrophic failure (for example, one or more LAM nodes crash), we can issue a wipe command to halt everything instead of issuing lamhalt.

  wipe -v ~/machines
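Putting the steps on this page together, a whole LAM session might be scripted roughly as follows. This is a sketch under stated assumptions rather than a BCCD-provided script: run and lam_session are hypothetical names, the DRY_RUN switch is added only so the sequence can be inspected without a cluster, and falling back to wipe when lamhalt fails is one possible reading of the advice above.

```shell
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN=1, print each command instead of
# executing it, so the sequence can be reviewed first.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical driver for one LAM session.
# Usage: lam_session MACHINES_FILE NPROCS PROGRAM [PROGRAM...]
lam_session() {
    machines=$1
    nprocs=$2
    shift 2
    run recon -v "$machines" || return 1     # verify the cluster is bootable
    run lamboot -v "$machines" || return 1   # start LAM on the listed nodes
    for prog in "$@"; do
        run mpirun -np "$nprocs" "$prog"     # run one MPI program
        run lamclean -v                      # remove leftover processes and messages
    done
    # Normal shutdown; fall back to wipe if lamhalt itself fails (assumption).
    run lamhalt || run wipe -v "$machines"
}
```

With `DRY_RUN=1 lam_session ~/machines 4 ./hello` (where ./hello stands in for any compiled MPI program) the commands are only printed; without DRY_RUN they are executed in the same order.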