HPL

From BCCD 3.0

Revision as of 22:11, 22 April 2010 by Fitz


The High Performance Linpack (HPL) benchmark measures the floating-point performance of a distributed-memory computer by solving a dense system of linear equations. It is the software package from which the numbers on the TOP500 list are derived.

The basic BCCD image does not include HPL or the linear algebra libraries it requires. This page describes how to compile and run the HPL benchmark on the BCCD.

HPL as pre-loaded on the BCCD

As of March 25, 2010, the HPL source and build scripts tailored to the BCCD are included in the bccd user's home directory. To build and run HPL:

$ cd ~/hpl
$ make
$ /bin/bash hpl.run

Note, however, that hpl.run runs HPL across all of the nodes in the current BCCD cluster. If you are in a lab with other students, be kind to them: edit hpl.run to reduce the number of processes it uses.
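When you reduce the process count, also check the process grids in whatever HPL.dat you run with. HPL reports an error if a grid's P x Q exceeds the number of MPI processes, and any processes outside the grid sit idle for that test. The bash sketch below compares the Ps and Qs lines of HPL.dat against a process count. grids_that_fit is a hypothetical helper and is not part of HPL or the BCCD.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which HPL process grids (P x Q) fit in a given
# number of MPI processes. HPL needs at least P*Q processes for each grid.
# Usage: grids_that_fit NP "PS" "QS"
grids_that_fit() {
  local np=$1
  local -a ps qs
  read -r -a ps <<< "$2"   # the "Ps" line of HPL.dat, e.g. "2 1"
  read -r -a qs <<< "$3"   # the "Qs" line of HPL.dat, e.g. "6 6"
  local i
  for i in "${!ps[@]}"; do
    local p=${ps[$i]} q=${qs[$i]}
    if (( p * q <= np )); then
      echo "${p}x${q} fits (uses $(( p * q )) of $np processes)"
    else
      echo "${p}x${q} needs $(( p * q )) processes; only $np available"
    fi
  done
}

grids_that_fit 12 "2 1" "6 6"   # the grids from the HPL.dat on this page, with -np 12
grids_that_fit 8  "2 1" "6 6"   # the 2x6 grid no longer fits in 8 processes
```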

HPL from scratch

Prerequisites

  1. HPL
    1. As of this writing, version 2.0 (September 10, 2008) is the most recent stable version and is the one used below.
  2. A BLAS (Basic Linear Algebra Subprograms) implementation, such as ATLAS (Automatically Tuned Linear Algebra Software)
    1. As of this writing, ATLAS 3.8.3 (February 18, 2009) is the most recent stable version and is the one used below.
  3. The BCCD
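Before starting a multi-hour build, it can help to confirm that the tools used on this page are on your PATH and that the tarballs are where you expect them. The following is a minimal bash sketch. check_prereqs is a hypothetical helper, and the list of commands (tar, gcc, make, mpirun) is simply the set of commands this page uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the commands used on this page exist and
# that the source tarballs named as arguments are present. Prints each
# missing item and returns non-zero if anything is missing.
check_prereqs() {
  local missing=0 cmd f
  for cmd in tar gcc make mpirun; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd"
      missing=1
    fi
  done
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Example, run from the directory holding the downloads:
#   check_prereqs atlas3.8.3.tar.gz hpl-2.0.tar.gz && echo "ready to build"
```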

Compiling

Compiling can take a very long time, depending on your hardware, because the ATLAS configure and build scripts run a large suite of tests to find the best configuration for your system. On a 1.6 GHz dual-core Atom with 2 GB of RAM, this stage took several hours.

$ mkdir ~/hpl
$ tar xf atlas3.8.3.tar.gz -C ~/hpl    # Unpacks to ~/hpl/ATLAS
$ tar xf hpl-2.0.tar.gz -C ~/hpl       # Unpacks to ~/hpl/hpl-2.0
$ cd ~/hpl/ATLAS
$ mkdir Linux_Atom330          # Typically this is <OS>_<Architecture>
$ cd Linux_Atom330
# Options passed to configure below:
#   -b 32               build 32-bit libraries (currently the BCCD only supports 32-bit)
#   -t -1               let ATLAS try to autodetect the number of threads to use
#   -Si cputhrchk 0     do not check for CPU throttling
#   --prefix=...        install location; it could be anywhere, but note it, it is used later
#   --nof77             skip Fortran (no Fortran compiler required)
#   --cc=/usr/bin/gcc   use gcc
#   -C ic /usr/bin/gcc  really use gcc (see the ATLAS documentation for an explanation)
$ ../configure -b 32 -t -1 -Si cputhrchk 0 \
    --prefix=$HOME/hpl/atlas --nof77 \
    --cc=/usr/bin/gcc -C ic /usr/bin/gcc
$ make build && make check && make time && make install
$ ls ~/hpl/atlas/lib ~/hpl/atlas/include
/bccd/home/bccd/hpl/atlas/include:
atlas  cblas.h  clapack.h

/bccd/home/bccd/hpl/atlas/lib:
libatlas.a  libcblas.a  libf77blas.a  liblapack.a  libptcblas.a  libptf77blas.a
$ cd ~/hpl/hpl-2.0
$ cp setup/Make.Linux_PII_CBLAS .

Edit Make.Linux_PII_CBLAS and set the following variables:

TOPdir       = $(HOME)/hpl/hpl-2.0

MPdir        = /bccd/software/openmpi-1.2.9
MPlib        = $(MPdir)/lib/libmpi.so

LAdir        = $(HOME)/hpl/atlas
LAinc        = $(LAdir)/include
LAlib        = $(LAdir)/lib/libcblas.a $(LAdir)/lib/libatlas.a

HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) -I$(LAinc) $(MPinc)
LINKER       = /usr/bin/gcc

$ make arch=Linux_PII_CBLAS
$ cd bin/Linux_PII_CBLAS
$ ls
HPL.dat  xhpl
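If you would rather not edit Make.Linux_PII_CBLAS by hand, a small helper can rewrite the variable lines listed above. This bash/awk sketch is hypothetical: set_make_var is not part of HPL. It assumes each variable is defined at the start of its own line in the form NAME = value, as in the lines shown above. Values are written literally, so make still expands references such as $(HOME).

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the "NAME = value" line for one variable in an
# HPL Make.<arch> file. Assumes NAME starts the line and is followed by " = ".
# All other lines are copied through unchanged.
set_make_var() {
  local file=$1 name=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  # On a matching line, field 1 is the variable name and field 2 is "=".
  awk -v name="$name" -v value="$value" '
    $1 == name && $2 == "=" { printf "%-12s = %s\n", name, value; next }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file's permissions
  local status=$?
  rm -f "$tmp"
  return "$status"
}

# Example, from ~/hpl/hpl-2.0 after copying the setup file:
#   f=Make.Linux_PII_CBLAS
#   set_make_var "$f" TOPdir '$(HOME)/hpl/hpl-2.0'
#   set_make_var "$f" MPdir  /bccd/software/openmpi-1.2.9
#   set_make_var "$f" LAdir  '$(HOME)/hpl/atlas'
#   make arch=Linux_PII_CBLAS
```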

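Running

Before running, edit HPL.dat in bin/Linux_PII_CBLAS to suit your cluster; xhpl reads it from the current directory. The HPL.dat used here is shown below. Its device-out line is set to 1, which tells HPL to write results to the file named on the line above it (HPL.out) rather than to the terminal.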

HPLinpack benchmark input file
Cluster Computing Group, Earlham College
HPL.out      output file name (if any)
1            device out (6=stdout,7=stderr,file)
4            # of problems sizes (N)
5000 10000 15000 20000 Ns
8            # of NBs
32 64 96 128 160 192 224 256      NBs
0            PMAP process mapping (0=Row-,1=Column-major)
2            # of process grids (P x Q)
2 1          Ps
6 6          Qs
16.0         threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
2            # of recursive stopping criterium
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
3            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
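A little arithmetic helps when choosing these parameters. In double precision, the N x N coefficient matrix takes about 8*N^2 bytes spread across all processes. A commonly quoted rule of thumb is to choose N so that the matrix fills roughly 80% of the cluster's total memory; treat that fraction as an assumption to tune. HPL also runs every combination of the listed parameters, so the file above requests 4 x 8 x 2 x 3 x 2 x 1 x 3 x 1 x 1 = 1152 runs. The following bash/awk sketch performs these calculations. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope helpers for reading HPL.dat (hypothetical, not part of HPL).

# Size of the N x N double-precision matrix in GiB, summed over all processes.
matrix_gib() {   # matrix_gib N
  awk -v n="$1" 'BEGIN { printf "%.2f\n", 8 * n * n / 1024^3 }'
}

# Largest N whose matrix fits in FRACTION (assumed default 0.8) of TOTAL_GIB.
max_n() {        # max_n TOTAL_GIB [FRACTION]
  awk -v g="$1" -v f="${2:-0.8}" 'BEGIN { printf "%d\n", sqrt(f * g * 1024^3 / 8) }'
}

# HPL runs every combination of the listed parameters, so the number of
# runs is the product of the "# of ..." counts in HPL.dat.
hpl_runs() {     # hpl_runs COUNT...
  local total=1 c
  for c in "$@"; do
    total=$(( total * c ))
  done
  echo "$total"
}

for n in 5000 10000 15000 20000; do   # the Ns line of the HPL.dat above
  echo "N=$n needs about $(matrix_gib "$n") GiB"
done
# Counts for Ns, NBs, grids, PFACTs, NBMINs, NDIVs, RFACTs, BCASTs, DEPTHs
hpl_runs 4 8 2 3 2 1 3 1 1
```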
$ mpirun -np 12 --hostfile ~/machines ./xhpl
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00R2L4       15000    64     2     6             809.16              2.781e+00
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0034938 ...... PASSED
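The Gflops column can be recomputed from N and the wall time as a check. HPL credits each run with 2/3*N^3 + 3/2*N^2 floating-point operations. For the N=15000 run above, that comes to about 2.25e12 operations in 809.16 seconds, or 2.781 Gflops, which matches the output. A run is marked PASSED when its scaled residual is below the threshold line in HPL.dat (16.0 here). The following bash/awk sketch performs both calculations. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking an HPL result line by hand.

# Gflops from N and seconds, using HPL's operation count 2/3*N^3 + 3/2*N^2.
hpl_gflops() {   # hpl_gflops N SECONDS
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.3f\n", (2/3*n^3 + 3/2*n^2) / t / 1e9 }'
}

# PASSED if the scaled residual is below the HPL.dat threshold, else FAILED.
hpl_check() {    # hpl_check RESIDUAL THRESHOLD
  awk -v r="$1" -v th="$2" 'BEGIN { print ((r + 0 < th + 0) ? "PASSED" : "FAILED") }'
}

hpl_gflops 15000 809.16    # the WR00R2L4 run above
hpl_check 0.0034938 16.0   # its residual against the threshold
```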
