Tuesday, March 22, 2016

Simple parallelization using 'sem' from the GNU Parallel package

If you have a multiprocessor computer, you can easily use the ‘sem’ command, which ships with GNU Parallel, to run processes in parallel.

It was really simple to use and worked as intended. It was also much easier to install and get working than the other approach I was contemplating, which was installing grid engine software.

See:

https://www.gnu.org/software/parallel/sem.html
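
The basic pattern is small enough to sketch here. This is only an illustration, assuming GNU Parallel is installed so that 'sem' is on the PATH; the helper name run_limited is made up for this example. Each call to 'sem -jN' waits for one of N slots to become free and then starts the command in the background, and 'sem --wait' blocks until everything has finished.

```shell
# Hypothetical helper (not part of GNU Parallel): run each argument as a
# shell command, with at most $1 of them running at the same time.
run_limited() {
    local max_jobs=$1
    shift
    local cmd
    for cmd in "$@"; do
        # Waits for a free slot, then starts the command in the background
        sem -j"$max_jobs" "$cmd"
    done
    # Block until every command started through sem has finished
    sem --wait
}
```

For example, run_limited 2 "sleep 3" "sleep 3" "sleep 3" should start two of the placeholder sleep commands right away and the third once a slot frees up.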

Below is my parallelized script, which ran a whole set of “HaploPS” commands in parallel using 12 of my processors.  Two 'sem' commands make this script execute in parallel: the 'sem -j12' call inside the loop and the final 'sem --wait'.


$ more run_haploPS_parallel.sh
#!/bin/bash
# Runs HaploPS on chromosomes 1 to 22 at frequencies from 0.95 down to 0.05
# (steps of 0.05), with at most 12 jobs running at once via 'sem'.

for (( chr = 1; chr <= 22; chr++ )); do
    cd "$chr" || exit 1
    for (( i = 95; i >= 5; i -= 5 )); do
        # bc prints the fraction without a leading zero (e.g. .95), so a 0 is prepended below
        freq=$(echo "scale=2; $i/100" | bc)
        date
        echo "Starting HaploPS run on chromosome $chr with freq = 0$freq"
        # sem waits for a free slot (12 at most), then starts the job in the background
        sem -j12 HaploPS -geno selscan.hap -legend selscan.map -freq "0$freq" -out "../haploPS/haploPS_0${freq}_chr${chr}.txt"
        echo "Job started in the background."
        date
    done
    cd ..
done
# Block until every job started by sem has finished
sem --wait
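
Before tying up 12 processors, it can help to print the command lines the loop would launch instead of running them. The following is a rough sketch, not part of my original script: the function name list_haplops_jobs is made up, and it builds the frequency string with printf rather than bc, which is my substitution here. The file names are the same as in the script above.

```shell
# Dry run: print one HaploPS command line per chromosome/frequency pair.
# The body runs in a subshell ( ... ) so its loop variables do not leak out.
list_haplops_jobs() (
    chr=1
    while [ "$chr" -le 22 ]; do
        i=95
        while [ "$i" -ge 5 ]; do
            # Assumption: printf stands in for the bc call (5 becomes 0.05, 95 becomes 0.95)
            freq=$(printf '0.%02d' "$i")
            echo "[dir $chr] HaploPS -geno selscan.hap -legend selscan.map -freq $freq -out ../haploPS/haploPS_${freq}_chr${chr}.txt"
            i=$((i - 5))
        done
        chr=$((chr + 1))
    done
)

# Count the jobs the real script would hand to sem
list_haplops_jobs | wc -l
```

With 22 chromosomes and 19 frequencies, the count comes to 418 jobs, which 'sem -j12' works through 12 at a time.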
