Short course for use of the PBS system

by David E. Konerding, Ph.D. and Conrad Huang

In the past, CGL provided a command called "submit" (type "man submit" to learn more about it) which was used for running long-term jobs. submit allows users to submit a job and log out, while the job continues running at lowered priority. With the advent of the multi-node socrates cluster, a more sophisticated job submission process is in place: the "OpenPBS" batch queueing system. We recommend that jobs requiring more than an hour of processing be run through the batch queueing system.

The "submit" command has been replaced with a script that emulates the old behavior, but we strongly recommend that you use the new "qsub" command instead to maximize the functionality of PBS.

PBS configuration on Socrates

There are five nodes on the socrates cluster: adenine, cytosine, thymine, uracil and guanine. The first four are Alpha ES45 nodes with four CPUs and 16GB of memory; guanine is an Alpha ES40 node with four slower CPUs and 4GB of memory. Adenine is typically the interactive login node, and as such, only two of its CPUs are available to PBS; cytosine is typically the web server and only three of its CPUs are available to PBS; both uracil and thymine have all four CPUs available to PBS; guanine, because it is older and slower, is typically not used by PBS.

There are two commonly used queues for executing jobs: "regular" and "verylong". The "regular" queue is for jobs that require less than 24 hours of CPU time, while "verylong" can handle jobs requiring up to 28 days of CPU time. Note that CPU time differs from wall clock time, with wall clock time typically being longer (i.e., your job will not get 100% of the CPU all the time, so it takes longer in real time than its actual CPU time). An exception to this rule occurs when your job uses multiple processors (e.g., running "blast" with "-a 3" uses three CPUs for the job); in this case, the wall clock time may be as low as one third of the CPU time because

wall clock time ~= (total CPU time) / (number of active CPUs)
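For example, a job that accumulates 30 hours of CPU time while running on 3 CPUs finishes in roughly 30 / 3 = 10 hours of wall clock time.
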
The "regular" queue may be filled with up to seven active jobs, and the "verylong" queue with up to five. Subsequent jobs are placed in a "wait" state and only started when an active job terminates. There are no user-specific limits on the number of jobs, either active or submitted. However, we expect users to exercise discretion when using PBS. Running a few active jobs at once when usage is light is probably fine; submitting 50 jobs at once and locking out other users will result in draconian actions by CGL staff.

Both the "regular" and "verylong" queues only handle jobs that run on a single node. If you need to run a multi-node job, please contact CGL staff.

How do I submit a job in PBS?

You create a job script file (in this example, the file is named 'script', but you can call it whatever you want) which contains the commands to run your job, and then run the following command (the "> " represents the shell prompt; you do not need to type it):
> qsub script
Here is an example job script file which simply echos a string to the standard output:
## The next line is an instruction to PBS: it tells PBS to email
## you when your job "b"egins, "a"borts, and "e"nds.
#PBS -m bae
echo "Hello world!"
## end of batch script
If you create the file "script" with the contents above, and "qsub" it, you will almost immediately get an email message saying that the job has been started, and almost immediately after, that the job has completed. If a lot of other people have jobs running, your job might not start immediately. You can check on the status of your job by using the "qstat" command:
> qstat

Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
7998.socrates    script           dek              11:50:10 R verylong
In this case, only one job is running: its job id is "7998.socrates", its name is "script", it belongs to user "dek", and it has used 11 hours and 50 minutes of CPU time. The "state" of the job is "R", which is short for running. If the job has not yet started running, the state will be "Q" (for queued). If you happen to type "qstat" just as your job is finishing, the state will be "E" (for exiting).
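When the job finishes, PBS by default returns its standard output and standard error as files in the directory from which you ran "qsub", named after the job name and number; for the job above, these would typically be "script.o7998" and "script.e7998".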

How do I submit a job in the "verylong" queue?

In the example above, there is no mention of queues, either "regular" or "verylong". That is because PBS, without user hints, assumes that your jobs typically run less than 24 CPU hours and always places them into the "regular" queue. To run a job in the "verylong" queue, you need to tell PBS that the approximate run time is more than the default 24 CPU hours.
## The next line is an instruction to PBS: it tells PBS
## that your job will take up to 40 hours of CPU time.
#PBS -l cput=40:00:00
#PBS -m bae
date
ls
date
## end of batch script
PBS will place the job in the "verylong" queue and execute it. Notice that the job need not actually take 40 hours (the job above will take almost no time at all), but the CPU time estimate forces it into the "verylong" queue. If a submitted job runs longer than its estimated CPU time, it will be terminated by PBS, so it is always better to overestimate the CPU time requirement than to underestimate it.
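Resource requests need not appear in the script itself; "qsub" also accepts them on the command line, where they generally take precedence over any "#PBS" directives in the script. For example, to submit the hello-world script from earlier with a 40-hour CPU time estimate:
> qsub -l cput=40:00:00 script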

How do I submit a sophisticated job in PBS?

The examples above cover only very simple situations. When you want to run a more sophisticated job, one that perhaps needs multiple processors, a large amount of memory, or a very long run time, you'll need to ask PBS to make sure your job is submitted to the appropriate queue. Let's examine a situation where you want to run BLAST with 3 processors so that you get your results as quickly as possible:

The command line you would type to do such a search might look like:

> blastall -a 3 -d nr -p blastp -i myseq.fa -o hits.out
The "-a 3" flag tells blast to run with 3 CPUs; "-d nr" searches the nonredundant database; "-p blastp" selects the blastp program for protein queries; "-i myseq.fa" reads the query sequence from the file myseq.fa; and "-o hits.out" saves the output to the file hits.out.

When you submit a job to PBS, by default you get to use only 1 CPU. But there is a way to tell PBS that you want to use 3 CPUs on one node. The job script would look like this:

#PBS -m e
#PBS -l nodes=1:ppn=3,cput=36:00:00
blastall -a 3 -d nr -p blastp -i myseq.fa -o hits.out
## end of batch script
As you can see, the job script file has a new line:
#PBS -l nodes=1:ppn=3,cput=36:00:00
This says to PBS: "I am requesting that my job run on 1 node, with 3 processors per node, and that the job will take up to 36 hours of CPU time." PBS will automatically route this job to the "verylong" queue because the "regular" queue only handles jobs taking up to 24 hours of CPU time. If your job will take more than 24 CPU hours and you omit the "cput" estimate, it will be run in the "regular" queue and terminated after 24 CPU hours.
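You can inspect the queues and their limits yourself with the "qstat -q" command, which lists each queue's CPU time limit along with the number of running and queued jobs.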

Why does my job run in my home directory rather than the directory where I submitted the job?

This is a commonly asked question. The answer is very simple: PBS allows you to submit jobs from one machine to run on another machine which might not have access to the directory from which you submitted the job. However, on the socrates cluster, you always have access to the directory from which you submitted the job. So, add the following line before the blast command:
cd $PBS_O_WORKDIR
Now your script will look like:
#PBS -m e
#PBS -l nodes=1:ppn=3,cput=36:00:00
cd $PBS_O_WORKDIR
blastall -a 3 -d nr -p blastp -i myseq.fa -o hits.out
## end of batch script
This will cause your script to run in the directory from which you submitted the job.
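A quick way to convince yourself this works is a minimal test script that just prints its working directory (PBS records the submission directory in PBS_O_WORKDIR, along with several other PBS_O_* variables described in "man qsub"):
#PBS -m e
cd $PBS_O_WORKDIR
pwd
## end of batch script
The "pwd" output, delivered in the job's output file, should match the directory you ran "qsub" from.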

I've got an advanced program that can use multiple CPUs, and I want it to work well with PBS.

This is a bit more complex than any of the examples provided earlier. Let's say you have a program which, when run, takes a list of hosts on the command line and uses the rsh command to open remote shells on those hosts to do the processing in parallel. PBS provides an environment variable which names a file listing the hosts assigned to a job. Let's say you add the following PBS request to your job script file:
#PBS -l nodes=1:ppn=3
When run, the job script will be executed on one of the hosts, and it will have an environment variable called PBS_NODEFILE. If you look at the contents of the file named by PBS_NODEFILE, it might look like this:
> cat $PBS_NODEFILE

thymine.cgl.ucsf.edu
thymine.cgl.ucsf.edu
thymine.cgl.ucsf.edu
As you can see, thymine has been assigned to the job and is listed 3 times, because we asked for 3 processors. Since your program just takes a list of hosts on its command line, your job script could contain the following:
myprogram `cat $PBS_NODEFILE`
This would expand to:
myprogram thymine.cgl.ucsf.edu thymine.cgl.ucsf.edu thymine.cgl.ucsf.edu 
and your program would end up using 3 CPUs on thymine.
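Because the file named by PBS_NODEFILE contains one line per requested processor, a job script can also derive its CPU count from it instead of hard-coding the number. For instance, the earlier blast example could be written as:
blastall -a `wc -l < $PBS_NODEFILE` -d nr -p blastp -i myseq.fa -o hits.out
so that the "-a" value automatically matches whatever "ppn" you requested.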

Technically speaking, this process does not actually assign a specific CPU to a specific application; the assignment is merely a side effect of the multiprocessor scheduling done by the operating system. This means, unfortunately, that if somebody else logs into the system and runs a job that uses up a whole CPU, you won't actually get 3 CPUs' worth of processing, and you may see suboptimal performance. It is in everybody's best interest not to abuse the queueing system for short-term gain. Further, CGL has set up queueing policies which try to keep system service processes, such as the web server, from interfering with PBS jobs.

I want my job to run on a SPECIFIC node (faster, more memory, direct access to disk)

To do this, you can either specify the hostname of the node in the PBS batch submission script, or you can specify a node property. The "pbsnodes -a" command will tell you the properties of each node:
thymine.cgl.ucsf.edu
     state = free
     np = 4
     properties = socrates
     ntype = cluster
     jobs = 0/636.socrates.cgl.ucsf.edu

uracil.cgl.ucsf.edu
     state = free
     np = 4
     properties = socrates
     ntype = cluster

cytosine.cgl.ucsf.edu
     state = free
     np = 3
     properties = socrates
     ntype = cluster

adenine.cgl.ucsf.edu
     state = free
     np = 2
     properties = socrates
     ntype = cluster
An example script which would always run on uracil (you must use the fully qualified name, "uracil.cgl.ucsf.edu"):
#PBS -m e
#PBS -l nodes=uracil.cgl.ucsf.edu
echo `hostname`
# end of PBS script
An example script which would always run on an ES45 node (this assumes the ES45 nodes have been given an "es45" property; the "pbsnodes -a" listing above shows only the "socrates" property):
#PBS -m e
#PBS -l nodes=es45
echo `hostname`
# end of PBS script
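Host names and processor counts can also be combined in a single request. For example, to ask for 3 processors specifically on uracil:
#PBS -l nodes=uracil.cgl.ucsf.edu:ppn=3
The same "host:ppn=N" form works with a node property in place of the host name.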

Is there a comprehensive list of #PBS directives?

To get a detailed description of all the available PBS directives and how to use them, read the qsub manual page using the following command:
man qsub
It provides information on what environment variables to use, what options are available, how input and output streams are handled, and much more.