Friday, December 23, 2016

Creating Beamer slides with optionally included notes slides using R Markdown

Using RStudio, it is very easy to create Beamer slides using R Markdown.  RStudio also supports LaTeX, so this enables even more flexibility.

Using the template shared here:

https://github.com/DanielEWeeks/Beamer-Rmd-Notes-Slides

one can also insert 'note slides' that can easily be all included or all excluded from the generated set of slides, just by toggling one word in the LaTeX header.

This allows one to use a more concise set of slides, without the notes, in class while providing the students a more comprehensive set that contains all the notes.

It could also be used to hide 'answer' slides from the students' slide set.  When posing questions in class, I prefer that the students don't have the 'answer' slides right in front of them.  After class, I then make the slide set that contains the 'answer' slides available.

I previously shared instructions on a different approach for doing this when generating Beamer slides using LaTeX or LyX - see:

http://deweeks.blogspot.com/2014/08/hiding-answer-slides-in-student.html




Friday, November 11, 2016

Optical character recognition

Here is a script that uses the 'tesseract' optical character recognition software to extract recognizable text from a PDF file:

#!/bin/bash
# Purpose: 
#   To carry out OCR on a PDF file
if [[ $# -ne 1 ]]; then
  echo "This script expects one argument."
  echo "  This argument is the name of the pdf file" 
  echo "  including the .pdf extension"
  echo "Usage: $0 file.pdf "
else
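  # Strip the directory and the extension to get the base name for the output files
  filename=$(basename "$1")
  filename="${filename%.*}"
  echo "Converting $1 to a tiff file named $filename.tiff"
  echo "... "
  # Quote the names so that paths containing spaces are handled correctly
  convert -density 300 "$1" -depth 8 -strip -background white -alpha off "$filename.tiff"
  echo "Carrying out OCR on $filename.tiff to create $filename.txt"
  # tesseract appends .txt to the output base name
  tesseract "$filename.tiff" "$filename"
  echo "The recognizable text in $1 has been output to the $filename.txt file."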
fi

For information on how to install the needed software, see this web page.
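
The output names in the script above come from two steps: basename drops the directory, and the ${filename%.*} expansion drops the last extension. Here is a minimal sketch of just that step, wrapped in a helper called pdf_stem (a hypothetical name used only for illustration, not part of the script):

```shell
#!/bin/bash
# pdf_stem is a hypothetical helper name used only for this illustration.
# It mirrors the two filename lines of the OCR script above.
pdf_stem() {
  local name
  name=$(basename "$1")        # drop any leading directory
  printf '%s\n' "${name%.*}"   # drop the shortest trailing .extension
}

pdf_stem "/home/me/papers/scan.2016.pdf"
pdf_stem "report.pdf"
```

Only the last extension is removed, so a file named scan.2016.pdf yields scan.2016.tiff and scan.2016.txt, and both are written to the current directory regardless of where the PDF itself lives.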

Friday, October 28, 2016

Please assign informative names to downloaded PDFs!

Bravo for the journals that assign human-readable informative names to the PDF versions of articles downloaded from their web sites.  For example, these naming schemes are very nice:

Hum. Mol. Genet.-2015-Simpkin-3752-63.pdf
PNAS-2005-Storey-12837-42.pdf
Int. J. Epidemiol.-2015-Sharp-1288-304.pdf


Boo for the journals that assign non-informative names to their PDF files.  For example, these naming schemes are not informative to me (even though the DOI is part of the name of the first two):

art%3A10.1186%2Fgb-2013-14-5-r42.pdf
art%3A10.1007%2Fs11357-016-9927-9.pdf
ijerph-12-14461.pdf
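
The first two names above are hard to read partly because the DOI has been percent-encoded: %3A stands for ':' and %2F stands for '/'. As a rough sketch, assuming the file name follows the art%3A<DOI>.pdf pattern seen here, the DOI can be recovered in the shell. The function name doi_from_name is hypothetical, and only the two escapes that appear in these examples are decoded:

```shell
#!/bin/bash
# doi_from_name is a hypothetical helper for illustration only.
# Assumes names of the form art%3A<DOI>.pdf, as in the examples above.
doi_from_name() {
  local decoded
  # Decode just the two percent-escapes present in these names.
  decoded=$(printf '%s\n' "$1" | sed -e 's|%3A|:|g' -e 's|%2F|/|g')
  decoded=${decoded#art:}           # drop the "art:" prefix
  printf '%s\n' "${decoded%.pdf}"   # drop the extension
}

doi_from_name 'art%3A10.1186%2Fgb-2013-14-5-r42.pdf'
doi_from_name 'art%3A10.1007%2Fs11357-016-9927-9.pdf'
```

Even after decoding, the DOI still has to be looked up to learn the journal, author, and year, which is exactly the information the well-named files carry directly.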

Tuesday, March 22, 2016

Simple parallelization using 'sem' of the GNU parallel package

If you have a multiprocessor computer, you can easily use the ‘sem’ part of the GNU Parallel system to run processes in parallel.   

It was really simple to use and worked as intended.  It was also much easier to install and get working than the other approach I was contemplating, which was installing grid engine software.

See:

https://www.gnu.org/software/parallel/sem.html

Below is my parallelized script, which ran a whole set of “HaploPS” commands in parallel using 12 of my processors.  The two 'sem' commands (the 'sem -j12' call inside the loop and the final 'sem --wait') are what made this script execute in parallel.


$ more run_haploPS_parallel.sh 
#!/bin/bash

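for (( chr=1; chr<=22; chr=chr+1 )); do
    cd "$chr"
    for (( i=95; i>=5; i=i-5 )); do
        # bc prints .95, .90, ..., .05; the leading 0 is added where freq is used
        freq=$(echo "scale=2;$i/100" | bc)
        date
        echo "Starting HaploPS run on chromosome $chr with freq = 0$freq"
        # sem starts the job in the background, first waiting if 12 jobs are already running
        sem -j12 HaploPS -geno selscan.hap -legend selscan.map -freq "0$freq" -out "../haploPS/haploPS_0${freq}_chr${chr}.txt"
        # sem returns as soon as the job has started, not when it has finished
        echo "Run submitted."
        date
    done
    cd ..
done
# Block until every job started by sem has finished
sem --wait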
##############It will automatically run haploPS at 5 to 95 percent frequencies.
#################################################################
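
To check which jobs a loop like the one above will launch before committing 12 processors to it, a dry run that only prints the commands can help. The sketch below is a variant built on assumptions, not the script that was actually run: the function names freq_string and list_haplops_jobs are hypothetical, and the frequency string is built with printf rather than bc, which should give the same 0.95, 0.90, ..., 0.05 values.

```shell
#!/bin/bash
# Hypothetical helpers for a dry run; nothing here launches HaploPS or sem.

# Turn an integer percentage (5..95) into a two-decimal string such as 0.05.
freq_string() {
  printf '0.%02d\n' "$1"
}

# Print, without running them, the sem commands for one chromosome.
list_haplops_jobs() {
  local chr=$1 i freq
  i=95
  while [ "$i" -ge 5 ]; do
    freq=$(freq_string "$i")
    echo "sem -j12 HaploPS -geno selscan.hap -legend selscan.map -freq $freq -out ../haploPS/haploPS_${freq}_chr${chr}.txt"
    i=$((i - 5))
  done
}

# Show the first few commands for chromosome 1, then count them all.
list_haplops_jobs 1 | head -n 3
list_haplops_jobs 1 | wc -l
```

Each chromosome yields 19 commands (95 down to 5 in steps of 5), so the full loop over 22 chromosomes submits 418 jobs, of which sem keeps at most 12 running at once.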
