PBS/Torque

From BCCD 3.0

Introduction

PBS/Torque combines a job scheduler (a program that accepts, queues, and dispatches user jobs to execution nodes) with a resource manager (a program that interprets resource requirements for a job and finds execution nodes satisfying those requirements). BCCD ships with PBS/Torque configured out of the box, and it works in both live and liberated modes.

Local BCCD customizations

BCCD runs a stock PBS/Torque install from Debian, except for the /etc/cron.d/add-pbsnodes cron job, which will automatically add any nodes returned by bccd-snarfhosts. The file /etc/bccd-exclude-pbsnodes contains a newline-delimited list of nodes that should not be in the PBS batch queue (by default, just node000).
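The exclusion rule can be pictured as a simple filter. The function below is a hypothetical sketch, not the actual contents of the add-pbsnodes cron job. It assumes candidate hostnames arrive one per line on standard input (for example, from bccd-snarfhosts) and drops any host that appears as a whole line in the exclude file.

```shell
#!/bin/bash
# filter_excluded_hosts (hypothetical helper, not part of BCCD):
# read candidate hostnames on stdin, one per line, and print only those
# that do not appear as a whole line in the exclude file.
filter_excluded_hosts() {
  local exclude=${1:-/etc/bccd-exclude-pbsnodes}
  if [ -s "$exclude" ]; then
    # -v: invert match, -x: whole-line match, -F: fixed strings,
    # -f: read the patterns (excluded hostnames) from the file
    grep -vxF -f "$exclude"
  else
    # Missing or empty exclude file: nothing to filter out
    cat
  fi
}
```

The whole-line match (-x) matters: excluding node000 must not also drop a host such as node0001.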

User recipes

How many nodes do I have?

After running qnodes, you should see output like this:

node000
     state = free
     np = 2
     ntype = cluster
     status = opsys=linux,uname=Linux node000.bccd.net 3.6.0bccd-00748-g657543a #6 SMP PREEMPT Sat Apr 20 22:21:39 EDT 2013 x86_64,
     sessions=4520,nsessions=1,nusers=1,idletime=0,totmem=3349212kb,availmem=2883832kb,physmem=3349212kb,
     ncpus=2,loadave=1.77,netload=609596,state=free,jobs=,varattr=,rectime=1376854118

node009
     state = free
     np = 2
     ntype = cluster
     status = opsys=linux,uname=Linux node009.bccd.net 3.6.0bccd-00748-g657543a #6 SMP PREEMPT Sat Apr 20 22:21:39 EDT 2013 x86_64,
     sessions=4330,nsessions=1,nusers=1,idletime=14776,totmem=3349212kb,availmem=3145728kb,physmem=3349212kb,
     ncpus=2,loadave=0.13,netload=297141,state=free,jobs=,varattr=,rectime=1376854119

This shows you have two nodes (node000 and node009). Each has two CPUs (np = 2), and they have about 2.8GB and 3GB of memory available, respectively (availmem=, reported in kB).
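On a larger cluster, adding this up by hand gets tedious. Here is a small awk-based sketch that summarizes qnodes output. It assumes the layout shown above: node names start in column 1, attributes are indented, and availmem appears as availmem=NNNkb inside the status attribute. The function name is hypothetical.

```shell
#!/bin/bash
# summarize_pbsnodes (hypothetical helper): read qnodes/pbsnodes output on
# stdin and print the node count, how many nodes are free, the total CPUs
# (sum of np) and the total available memory (sum of availmem).
summarize_pbsnodes() {
  awk '
    # Node names start in column 1; attribute lines are indented
    /^[^ \t]/ { nodes++ }
    $1 == "state" && $3 == "free" { free++ }
    $1 == "np" { cpus += $3 }
    # availmem is reported in kB inside the status attribute
    match($0, /availmem=[0-9]+kb/) {
      mem_kb += substr($0, RSTART + 9, RLENGTH - 11)
    }
    END {
      printf "%d nodes (%d free), %d CPUs, %.0f MB available\n",
        nodes, free, cpus, mem_kb / 1024
    }
  '
}
```

Run it as qnodes | summarize_pbsnodes. Fed the example output above, it prints 2 nodes (2 free), 4 CPUs, 5888 MB available.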

Hello world! (aka, What does a simple PBS submit script look like?)

At its simplest, a submit script could be as short as this:

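#!/bin/bash
#PBS -N HelloWorld
#PBS -l nodes=1:ppn=1,mem=500m,walltime=1:0:0

# Runs on the execution node; stdout ends up in HelloWorld.o<job ID>
echo "Hello World!"

# Keep the job alive long enough to see it in qstat
sleep 30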

This runs a job named HelloWorld on one node, using one CPU on that node, with 500MB of memory reserved and a maximum run time (walltime) of one hour. To submit it, save the script above as something like hello-world.pbs and pass it to qsub:

bccd@node000:~$ qsub hello-world.pbs 
2.node000.bccd.net

The number before the hostname (2 here) is your job ID. When the job finishes, its standard output and standard error are copied to the directory you submitted it from, in files named <job name>.o<job ID> and <job name>.e<job ID>. For instance, this job was submitted from the bccd home directory (~), so its output ends up in ~/HelloWorld.o2 and ~/HelloWorld.e2.
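If you script your submissions, you can work out the output file names from the ID that qsub prints. This is a hypothetical helper that assumes the default naming described above, with no -o or -e options overriding it.

```shell
#!/bin/bash
# pbs_output_files (hypothetical helper): given the job name (#PBS -N) and
# the job ID string printed by qsub, print the names of the stdout and
# stderr files that are copied back to the submission directory.
pbs_output_files() {
  local name=$1 full_id=$2
  local num=${full_id%%.*}   # "2.node000.bccd.net" -> "2"
  printf '%s.o%s %s.e%s\n' "$name" "$num" "$name" "$num"
}
```

For example, jobid=$(qsub hello-world.pbs) followed by pbs_output_files HelloWorld "$jobid" prints HelloWorld.o2 HelloWorld.e2 when qsub returns 2.node000.bccd.net.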

Which jobs are running?

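Use the qstat -q command for a per-queue summary. The Run and Que columns count running and queued jobs, and the bottom row gives the totals. (Plain qstat, with no options, lists each job individually.) This session shows the queue empty, then with one job running after a submission: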

bccd@node000:~$ qstat -q

server: node000

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
batch              --      --       --      --    0   0 --   E R
                                               ----- -----
                                                   0     0
bccd@node000:~$ qsub hello-world.pbs 
2.node000.bccd.net
bccd@node000:~$ qstat -q

server: node000

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
batch              --      --       --      --    1   0 --   E R
                                               ----- -----
                                                   1     0
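To use those totals in a script, you can pick out the bottom row. This is a sketch that assumes the qstat -q layout shown above, where the totals row is the only line consisting of exactly two integers; the function name is hypothetical.

```shell
#!/bin/bash
# qstat_q_totals (hypothetical helper): read `qstat -q` output on stdin and
# print the totals row, i.e. running and queued jobs across all queues.
qstat_q_totals() {
  awk '
    # The totals row is the only line made of exactly two integers
    NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { run = $1; que = $2 }
    END { printf "running=%d queued=%d\n", run, que }
  '
}
```

Piping the second listing above through it (qstat -q | qstat_q_totals) gives running=1 queued=0.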

How do I add node000 as an execution host?

  1. As root, open the /etc/bccd-exclude-pbsnodes file in your favorite editor
    1. For example, sudo nano /etc/bccd-exclude-pbsnodes
  2. Remove the line containing node000
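  3. Save and exit
  4. Wait for the /etc/cron.d/add-pbsnodes cron job to run again; it should then add node000 to the batch queue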
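If you'd rather not open an editor, you can make the same change non-interactively. This is a sketch that uses GNU sed's in-place editing, which Debian ships. The function name and its optional file argument are hypothetical; the argument exists only so the function can be tried on a scratch copy.

```shell
#!/bin/bash
# remove_pbs_exclusion (hypothetical helper): delete an exact hostname line
# from the exclude file, the scripted equivalent of steps 1-3.
# Needs root when editing the real /etc/bccd-exclude-pbsnodes.
remove_pbs_exclusion() {
  local host=$1 file=${2:-/etc/bccd-exclude-pbsnodes}
  # Anchor both ends so "node000" does not also delete e.g. "node0001"
  sed -i "/^${host}\$/d" "$file"
}
```

Because sudo cannot see shell functions, the direct one-liner is sudo sed -i '/^node000$/d' /etc/bccd-exclude-pbsnodes.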

Helpful commands

A minimal test job that reports which node it ran on:

#!/bin/bash
#PBS -N test
echo "PBS TEST on $(hostname)"
sleep 30
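Submitting that test job several times is a quick way to check that jobs spread across your nodes. This sketch assumes the script is saved as test.pbs (a name chosen for illustration). The helper and its QSUB override are hypothetical; the override just lets you dry-run the loop with echo.

```shell
#!/bin/bash
# submit_copies (hypothetical helper): submit the same job script several
# times. Set QSUB to another command (such as echo) for a dry run.
submit_copies() {
  local script=$1 count=${2:-4} i
  for ((i = 1; i <= count; i++)); do
    "${QSUB:-qsub}" "$script"
  done
}
```

After submit_copies test.pbs 4 finishes, cat test.o* shows the "PBS TEST on ..." line from each job, one per host it ran on.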

Install notes

  1. Install these packages from Debian:
    1. torque-client
    2. torque-common
    3. torque-mom
    4. torque-scheduler
    5. torque-server
  2. Make sure you have the /etc/init.d/bccd-torque init script available. This script sets up the Torque infrastructure (queue, scheduling options, etc.)
  3. Make sure you have the /etc/cron.d/add-pbsnodes cron job enabled. This will automatically add and remove nodes based on bccd-snarfhosts and pbsnodes output.
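To check step 1 on an existing system, you can ask dpkg which of the listed packages are missing. This is a minimal sketch using dpkg-query's status field. The array and function names are hypothetical, and it reports every package as missing if dpkg-query isn't available.

```shell
#!/bin/bash
# The Torque packages listed in step 1 of the install notes.
TORQUE_PKGS=(torque-client torque-common torque-mom torque-scheduler torque-server)

# missing_torque_pkgs (hypothetical helper): print each listed package that
# dpkg does not report as installed.
missing_torque_pkgs() {
  local pkg
  for pkg in "${TORQUE_PKGS[@]}"; do
    if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null |
      grep -q 'install ok installed'; then
      echo "$pkg"
    fi
  done
}
```

To install whatever is missing: missing=$(missing_torque_pkgs); [ -n "$missing" ] && sudo apt-get install $missing. Leave $missing unquoted here on purpose so that each package name becomes its own argument.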