Friday, December 23, 2016

Creating Beamer slides with optionally included notes slides using R Markdown

Using RStudio, it is very easy to create Beamer slides using R Markdown.  RStudio also supports LaTeX, so this enables even more flexibility.

Using the template shared here:

one can also insert 'note slides' that can easily be all included or all excluded from the generated set of slides, just by toggling one word in the LaTeX header.

This allows one to use a more concise set of slides, without the notes, in class while providing the students a more comprehensive set that contains all the notes.

It could also be used to hide 'answer' slides from the students' slide set: when posing questions in class, I prefer that the students don't have the 'answer' slides right in front of them.  Then, after class, I make the slide set that contains the 'answer' slides available.

I previously shared instructions on a different approach for doing this when generating Beamer slides using LaTeX or LyX - see:

Friday, November 11, 2016

Optical character recognition

Here is a script that uses the 'tesseract' optical character recognition software to extract recognizable text from a PDF file:

#!/bin/bash
# Purpose:
#   To carry out OCR on a PDF file
if [[ $# -ne 1 ]]; then
  echo "This script expects one argument."
  echo "  This argument is the name of the pdf file"
  echo "  including the .pdf extension"
  echo "Usage: $0 file.pdf"
  exit 1
fi
# Output files are named after the input file, e.g. file.pdf.tiff and file.pdf.txt
filename=$(basename "$1")
echo "Converting $1 to a tiff file named $filename.tiff"
echo "... "
# ImageMagick: render the PDF at 300 dpi as an 8-bit image on a white background
convert -density 300 "$1" -depth 8 -strip -background white -alpha off "$filename.tiff"
echo "Carrying out OCR on $filename.tiff to create $filename.txt"
# tesseract appends .txt to the output base name given as its second argument
tesseract "$filename.tiff" "$filename"
echo "The recognizable text in $1 has been output to the $filename.txt file."

For information on how to install the needed software, see this web page.
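
The script above names its outputs after the full input name, so file.pdf produces file.pdf.tiff and file.pdf.txt. If shorter names are preferred, basename can also strip the extension. Here is a minimal sketch; the pdf_stem helper and the example paths are hypothetical and not part of the original script:

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive an output stem from a PDF path by
# dropping both the directory part and the .pdf extension.
pdf_stem() {
  basename "$1" .pdf
}

pdf_stem "/home/user/papers/scan.pdf"   # prints: scan
pdf_stem "scan.pdf"                     # prints: scan
```

Setting filename=$(pdf_stem "$1") in the script would then yield scan.tiff and scan.txt.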

Friday, October 28, 2016

Please assign informative names to downloaded PDFs!

Bravo for the journals that assign human-readable informative names to the PDF versions of articles downloaded from their web sites.  For example, these naming schemes are very nice:

Hum. Mol. Genet.-2015-Simpkin-3752-63.pdf
Int. J. Epidemiol.-2015-Sharp-1288-304.pdf

Boo for the journals that assign non-informative names to their PDF files.  For example, these naming schemes are not informative to me (even though the DOI is part of the name of the first two):


Tuesday, March 22, 2016

Simple parallelization using 'sem' of the GNU parallel package

If you have a multiprocessor computer, you can easily use the ‘sem’ part of the GNU Parallel system to run processes in parallel.   

It was really simple to use and worked as intended.  And this was much easier to install and get working than the other approach I was contemplating, which was installing grid engine software.


Below is my parallelized script, which ran a whole set of “HaploPS” commands in parallel using 12 of my processors.  The two 'sem' commands ('sem -j12' inside the loop and 'sem --wait' at the end) are what make this script execute in parallel.

$ more 

for (( chr=1; chr<=22; chr=chr+1 )); do
     cd $chr
     for (( i=95; i>=5; i=i-5 )); do

        freq=$(echo "scale=2;$i/100"|bc)
        echo "Starting HaploPS run on chromosome " $chr " with freq = " $freq
        # sem starts the job in the background and returns; at most 12 jobs run at once
        sem -j12 HaploPS -geno selscan.hap -legend -freq 0$freq -out ../haploPS/haploPS_0${freq}_chr${chr}.txt
        echo "Run queued."
     done
     cd ..
done
# Block until every job started by sem has finished
sem --wait
############## It will automatically run haploPS at 5 to 95 percent frequencies.
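
Before launching that many jobs, it can help to list what the nested loop will generate. The following is only a sketch of such a dry run: the function name list_haplops_jobs is hypothetical, it calls neither sem nor HaploPS, and it uses printf in place of the bc call to build the same frequency strings (0.95 down to 0.05).

```shell
#!/usr/bin/env bash
# Dry run: list the chromosome/frequency combinations and output files
# that the loop would generate, without calling sem or HaploPS.
list_haplops_jobs() {
  local chr i freq
  for (( chr = 1; chr <= 22; chr++ )); do
    for (( i = 95; i >= 5; i -= 5 )); do
      # printf stands in for the bc call: 95 -> 0.95, 5 -> 0.05
      printf -v freq '0.%02d' "$i"
      echo "chr=$chr freq=$freq out=../haploPS/haploPS_${freq}_chr${chr}.txt"
    done
  done
}

list_haplops_jobs | head -n 3   # first three jobs for chromosome 1
list_haplops_jobs | wc -l       # total number of jobs
```

With 22 chromosomes and 19 frequencies, the loop queues 22 × 19 = 418 jobs in total.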

Tuesday, August 18, 2015

The difference between mathematics and statistics

"Mathematics is about whether the conclusions follow from the assumptions. By contrast, statistics is about whether the assumptions have anything to do with the real world." 

Jay Kadane, Leonard J. Savage Professor of Statistics, Emeritus, Carnegie-Mellon University

Monday, February 23, 2015


As part of transitioning from using an old desktop to a new desktop (while both are still active), I found the 'rsync' command to be very useful.  Here are the options I ended up using to accomplish my goal of copying over (old) files from the old machine to the new without wiping out any new files on the new machine:

rsync -avzuhP --log-file=/destination/dir/rsync_log.txt -e ssh remoteuser@remotehost:/source/dir /destination/dir/

Here's what the options do:
-a archive
-v verbose
-z compress
-u update  <= do not overwrite files that are newer at the destination
-h human-readable
-P show progress and keep partially transferred files (same as --partial --progress)

Note that these options are turned on in '-a' archive mode:
        -r, --recursive             recurse into directories
        -l, --links                 copy symlinks as symlinks
        -p, --perms                 preserve permissions
        -t, --times                 preserve times
        -g, --group                 preserve group
        -o, --owner                 preserve owner (super-user only)
        -D                          same as --devices --specials
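
The effect of '-u' can be checked locally before touching real data. The following is a minimal sketch that uses throwaway temporary directories rather than the remote paths above; the file names are made up for illustration, and it assumes rsync is installed.

```shell
#!/usr/bin/env bash
# Local demonstration of rsync -u; every path here is a throwaway temp directory.
demo=$(mktemp -d)
mkdir -p "$demo/old/dir" "$demo/new/dir"

# keep.txt exists on both sides; the source copy is given an old timestamp.
echo "older on source" > "$demo/old/dir/keep.txt"
touch -t 201501010000 "$demo/old/dir/keep.txt"
echo "newer on destination" > "$demo/new/dir/keep.txt"

# copy.txt exists only on the source side.
echo "only on source" > "$demo/old/dir/copy.txt"

if command -v rsync >/dev/null 2>&1; then
  # No trailing slash on the source, so 'dir' is recreated inside new/.
  rsync -avuh "$demo/old/dir" "$demo/new/"
  cat "$demo/new/dir/keep.txt"   # prints: newer on destination
  cat "$demo/new/dir/copy.txt"   # prints: only on source
else
  echo "rsync is not installed; nothing was copied" >&2
fi
```

Note the trailing-slash rule this relies on: a source written without a trailing slash (as in /source/dir above) is copied as a directory into the destination, giving /destination/dir/dir, whereas /source/dir/ would copy only its contents.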

Thursday, August 28, 2014

Hiding 'answer' slides in student handouts using Beamer

After adding 'Question' slides, followed by 'Answer' slides, to my Beamer presentation, I wondered if there was an easy way to remove the 'Answer' slides from the handout version of the slides that I will give to the students.

It turns out that there is, as described in this link:  just add <handout:0> right after the \begin{frame} of each slide that should be hidden.

When 'handout' is added as an option to the document class, as in \documentclass[handout]{beamer}, all those slides marked with <handout:0> are automatically excluded.

This is wonderful!
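
Switching between the two versions comes down to one class option, so it can be scripted. Here is a sketch that writes a tiny Beamer file and derives the handout variant with sed; the file names slides.tex and handout.tex and the sample frames are made up for illustration.

```shell
#!/usr/bin/env bash
# Work in a temporary directory so no existing files are touched.
workdir=$(mktemp -d)

# A two-frame Beamer source: the Answer frame is marked <handout:0>.
cat > "$workdir/slides.tex" <<'EOF'
\documentclass{beamer}
\begin{document}

\begin{frame}{Question}
What is $2 + 2$?
\end{frame}

\begin{frame}<handout:0>{Answer}
$2 + 2 = 4$
\end{frame}

\end{document}
EOF

# Derive the handout source by adding the 'handout' class option.
sed 's/\\documentclass{beamer}/\\documentclass[handout]{beamer}/' \
  "$workdir/slides.tex" > "$workdir/handout.tex"

head -n 1 "$workdir/handout.tex"   # prints: \documentclass[handout]{beamer}

# If a LaTeX installation is available, both versions can be compiled:
#   pdflatex slides.tex    # presentation, includes the Answer frame
#   pdflatex handout.tex   # handout, omits the Answer frame
```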
