Principles of a ‘good’ cluster script:
1) Should copy over only ONE file at the beginning and copy only ONE file back at the end.
2) Should report which node and directory it is running in, so that we can figure out which nodes are not running.
3) Should clean up after the run is done, erasing its files from the compute node.
4) Should keep stderr and stdout output to a minimum by redirecting them to /dev/null.
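On principle 4: csh redirects both streams at once with `>&`, while Bourne-style shells need an explicit `2>&1`. A small illustration (the `ls /nonexistent` command is just a stand-in for a noisy job step):

```shell
# csh form (used in the script later in this post):
#   ./loop.sh >& /dev/null
# Bourne/bash equivalent: send stdout to /dev/null, then stderr to where stdout points
ls /nonexistent > /dev/null 2>&1
# The command printed nothing, but its exit status is still available in $?
echo "stand-in command was silenced; exit status: $?"
```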
A) Make a ‘files.list’ file contain a list of all the files you need to copy over.
% more files.list
ccrel.BATCH
ccrel.awk
crun.sh
datain.dat
loop.sh
out.txt
seed
simdata.dat
simped.dat
slinkin.dat
unmake.awk
B) Create a tar file ‘files.tgz’ based on this list:
tar zcvfT files.tgz files.list
IMPORTANT: Do steps A and B OUTSIDE of the script that will be submitted to the cluster.
Do not use zip instead of tar. Use of tar as described here creates ONE compressed file: copying only one file over and back keeps creation of the associated metadata to a minimum.
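As a sanity check, you can list the archive's contents before submitting. A minimal sketch using stand-in files (the names are illustrative, taken from the files.list above):

```shell
# Stand-ins for the real input files
printf '%s\n' datain.dat loop.sh seed > files.list
touch datain.dat loop.sh seed
# -T reads the member names from files.list; z compresses with gzip
tar zcfT files.tgz files.list
# Verify: list the members actually stored in the archive
tar ztf files.tgz
```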
C) Use qsub to submit a file like this to the cluster.
#!/bin/csh -f
# Purpose:
#
# Steps:
#
# Helper scripts:
#
# Uses:
# ==============================================================================
#$ -cwd
#$ -m e
#$ -j y
#$ -N CCRELsim
date
# This will tell you which host it is running on.
echo JOB_ID: $JOB_ID JOB_NAME: $JOB_NAME HOSTNAME: $HOSTNAME
unalias cp
# ==============================================================================
if (! -e files.tgz) then
  echo "ERROR: The files.tgz archive does not exist."
  echo "Please create it prior to running this script."
  exit 1
endif
echo Making directory /tmp/$$
mkdir /tmp/$$
# Copy the needed files over to the compute node
cp files.tgz /tmp/$$/
# Record your current (submission) directory in a variable
set HomeDir = `pwd`
# Move into the temporary directory on the compute node
cd /tmp/$$
# Extract the files from your compressed tar file
tar zxvf files.tgz >& /dev/null
# ==============================================================================
# Do the needed computations. In this case, these are done by my 'loop.sh' script file that I copied over
./loop.sh >& /dev/null
# ==============================================================================
# Copy results back to working directory
unalias cp
cd ..
tar zcf results_$JOB_ID.tgz $$
cp results_$JOB_ID.tgz $HomeDir
# ==============================================================================
# Enter working directory
cd $HomeDir
# Remove all the files on compute node
rm -rf /tmp/$$
rm -f /tmp/results_$JOB_ID.tgz
date
# ================================
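Before submitting, the copy/extract/pack/clean cycle above can be dry-run locally without the grid engine. This Bourne-shell sketch substitutes a fixed name `results_dryrun.tgz` for results_$JOB_ID.tgz and skips the actual computation:

```shell
# Build a tiny files.tgz in the current directory (stand-in input)
printf 'hello\n' > datain.dat
printf '%s\n' datain.dat > files.list
tar zcfT files.tgz files.list

HomeDir=$(pwd)                           # plays the role of: set HomeDir = `pwd`
work=/tmp/dryrun_$$                      # plays the role of /tmp/$$
mkdir -p "$work"
cp files.tgz "$work"
cd "$work"
tar zxf files.tgz                        # extract inputs on the "compute node"
# ... the real script would run ./loop.sh here ...
cd /tmp
tar zcf results_dryrun.tgz "dryrun_$$"   # pack the whole work dir, as the script does
cp results_dryrun.tgz "$HomeDir"
cd "$HomeDir"
rm -rf "$work" /tmp/results_dryrun.tgz   # clean up the "compute node"
tar ztf results_dryrun.tgz               # inspect what came back
```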
On June 6, 2007, Ryan sent this message:
I have made a change in the way /tmp directories are created on the cluster for each script.
You no longer have to create the /tmp directories in your script. It is taken care of automatically by the grid engine software.
This will help in keeping the /tmp directories from filling up the drives on the compute nodes.
I would ask that anyone wanting to submit a script please send it to me first for modification.
Thanks,
Ryan Evans
Programmer / Systems Administrator, Center for Computational Genetics
University of Pittsburgh
Department of Human Genetics