It is often difficult to construct a good genetic map for a large number of SNPs. Typically this involves interpolating between an existing genetic map and the physical positions of the SNPs.
The Cartographer web site at
https://apps.bioinfo.helsinki.fi/software/cartographer.aspx
looks like it might ease this task.
Here is a description of what Cartographer does from a recent paper:
"Cartographer retrieves the physical location of the markers from the University of California Santa Cruz (UCSC) database and orders the markers based on the sequence information. The genetic location of each marker is defined using the published deCODE genetic map [51], which are also stored in the UCSC database. For markers that were not included in the deCODE genetic map, we used linear interpolation for obtaining estimates of the genetic locations of these markers by using the physical and the genetic locations of the immediately flanking deCODE markers. Also, if the physical location from the UCSC database and the genetic location from the deCODE genetic map for a given marker were in disagreement, we obtained an estimate of the genetic location via interpolation using the nearest flanking deCODE markers that are in agreement with the sequence information. For those markers that were not found in the UCSC database, we retrieved PCR primers from UniSTS and used these to map the marker using UCSC in-silico PCR, and performed the linear interpolation to obtain an estimate of the genetic location as described previously [14]."
From:
Perola M, Sammalisto S, Hiekkalinna T, Martin NG, Visscher PM, et al. (2007) Combined genome scans for body stature in 6,602 European twins: evidence for common Caucasian loci. PLoS Genet 3(6): e97. doi:10.1371/journal.pgen.0030097
-----------
For an alternative approach to constructing interpolated genetic maps, see David Duffy's discussion.
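The linear interpolation step described in the quoted passage above is straightforward to illustrate in R. Here is a minimal sketch (the marker positions below are made up for illustration, not real deCODE values) that uses the base approx() function to estimate genetic locations (cM) from physical locations (bp):
# Framework markers with known physical (bp) and genetic (cM) locations:
framework.bp <- c(1e6, 5e6, 12e6, 20e6)
framework.cM <- c(0.0, 4.2, 13.5, 22.1)
# Markers missing from the genetic map, but with known physical locations:
new.bp <- c(3e6, 15e6)
# Linearly interpolate between the immediately flanking framework markers
# (note that approx() only interpolates within the range of framework.bp):
approx(x = framework.bp, y = framework.cM, xout = new.bp)$y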
Dr. Weeks is a Professor of Human Genetics and Biostatistics at the University of Pittsburgh. His research focuses on statistical human genetics in the area of mapping susceptibility loci involved in complex human diseases. The content on this blog is for informational purposes only - use at your own risk!
Friday, July 31, 2009
10 Simple Rules for Making Good Presentations
The PLoS Computational Biology journal has an interesting series of 'Ten Simple Rules' papers. Here is one of the more recent ones (which contains references to the earlier ones):
Bourne PE (2007) Ten simple rules for making good oral presentations. PLoS Comput Biol 3(4): e77. doi:10.1371/journal.pcbi.0030077
See also:
Bourne PE (2005) Ten simple rules for getting published. PLoS Comput Biol 1: e57.
Bourne PE, Chalupa LM (2006) Ten simple rules for getting grants. PLoS Comput Biol 2: e12.
Bourne PE, Korngreen A (2006) Ten simple rules for reviewers. PLoS Comput Biol 2: e110.
Bourne PE, Friedberg I (2006) Ten simple rules for selecting a postdoctoral fellowship. PLoS Comput Biol 2: e121.
Vicens Q, Bourne PE (2007) Ten simple rules for a successful collaboration. PLoS Comput Biol 3: e44.
Excellent slides on R
Here are some excellent slides on R (pdf), which Dr. Thomas Lumley made for a two-day tutorial on statistical computing in R that he gave in February 2006.
These are by far the best slides on R that I have seen, with lots of useful advice and insight into how to use R effectively.
Quotations from a scientific article
"Perhaps men are just simpler than women."
From: Atwood LD, Heard-Costa NL, Fox CS, Jaquish CE, Cupples LA. Sex and age specific effects of chromosomal regions linked to body mass index in the Framingham Study. BMC Genet. 2006 Jan 26;7:7.
For more quotations, please see the Statistical Genetics Humor Page.
Science is poetry
"Basic science" reasoning is often as cloudy, beautiful, and impossible to refute as poetry.
From: Ioannidis JP. Journals should publish all "null" results and should sparingly publish "positive" results. Cancer Epidemiol Biomarkers Prev. 2006 Jan;15(1):186.
For more quotations, please see the Statistical Genetics Humor Page.
Creating TIFF files from pdfs
R easily creates graphics files in pdf format, but many journals require that one submit high-resolution TIFF files. Johanna figured out how to convert from pdf to TIFF:
ghostscript can be used to convert between formats; the '-r' option sets how many d.p.i. you want. This is the command to write to a file (you can also write to X11 using ghostview). It is also possible to convert to and from many other formats.
The command used:
gs -q -dNOPAUSE -dBATCH -sDEVICE=tifflzw -r600 -sOutputFile=name.tiff name.pdf
I think when using Preview to convert from pdf to tiff, it automatically saves the figure at 72 d.p.i.
This looks very nice in Word even when zooming beyond 200%.
NOTES:
A) The 'tifflzw' option results in a compressed file that is much smaller than the corresponding uncompressed TIFF file. And since LZW is a lossless compression, there is no loss of quality.
B) The 'tifflzw' device produces monochrome output. If you need LZW-compressed color TIFF files, you can first create an uncompressed color TIFF with the 'tiff24nc' device and then compress it using the tiffcp command from the libtiff library:
gs -q -dNOPAUSE -dBATCH -sDEVICE=tiff24nc -r600 -sOutputFile=figure4a.TIFF figure_4a.pdf
tiffcp -c lzw figure4a.TIFF figure4aLZW.TIFF
C) For some reason, importing pdf figures into Microsoft Word causes the figures to become fuzzy. To avoid this, convert your pdf figures to compressed TIFF and then import the TIFF files into Word.
D) When creating plots within R, one can directly create a TIFF file using the bitmap() or dev2bitmap() functions. For example, this command:
dev2bitmap(file="test.tiff",type="tifflzw",res=600)
copied the plot I had already drawn in the R graphics window to a 600 DPI TIFF file named "test.tiff".
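Along the same lines, here is a minimal sketch using bitmap() instead (the file name and plot are placeholders; bitmap(), like dev2bitmap(), requires Ghostscript to be installed):
# Open a 600 DPI LZW-compressed TIFF device, draw a plot, and close the device:
bitmap(file = "figure1.tiff", type = "tifflzw", res = 600)
plot(rnorm(100))  # any plotting commands go here
dev.off()         # closing the device writes the file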
Old Blog entries
All of the entries from my old blog can be found in this pdf file:
OldBlog.pdf
The pdf file contains bookmarks, which might be helpful in navigating through the document.
Using R to reshape data
Sometimes marker data are sent to us in this format, where each row only contains one genotype for a particular marker and person:
ID Marker A1 A2
1 M1 1 2
2 M1 2 2
3 M1 2 2
1 M2 1 1
3 M2 1 2
However, in order to put these marker data into LINKAGE format, we need to reshape them so that each row contains all the marker data for a specific person. This can easily be done using the ‘reshape’ command in R:
> a <- read.table("marker.txt",header=T)
> a
ID Marker A1 A2
1 1 M1 1 2
2 2 M1 2 2
3 3 M1 2 2
4 1 M2 1 1
5 3 M2 1 2
> attach(a)
> b <- reshape(a,idvar="ID",direction="wide",timevar="Marker")
> b
ID A1.M1 A2.M1 A1.M2 A2.M2
1 1 1 2 1 1
2 2 2 2 NA NA
3 3 2 2 1 2
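One extra step is worth noting here (a sketch, assuming the reshaped data frame 'b' from above; the output file name is a made-up placeholder): LINKAGE-format files code untyped alleles as 0, so the NAs should be recoded before writing the data out:
> b[is.na(b)] <- 0  # LINKAGE codes untyped alleles as 0
> write.table(b,"marker_wide.txt",quote=FALSE,row.names=FALSE)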
Model cluster script
*** IMPORTANT: Please also read the Comment. ***
Principles of a ‘good’ cluster script:
1) Should only copy over ONE file at the beginning & only ONE file back at the end.
2) Should show which node and directory it is running in, so that we can figure out which nodes are not running.
3) Should clean up after the run is done, erasing files from the compute node.
4) Should keep stderr and stdout output to a minimum, by redirecting it to /dev/null.
A) Make a ‘files.list’ file containing a list of all the files you need to copy over:
% more files.list
ccrel.BATCH
ccrel.awk
crun.sh
datain.dat
loop.sh
out.txt
seed
simdata.dat
simped.dat
slinkin.dat
unmake.awk
B) Create a tar file ‘files.tgz’ based on this list:
tar zcvfT files.tgz files.list
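(Here 'z' compresses the archive with gzip, 'c' creates it, 'v' lists each file as it is added, 'f' names the archive file, and 'T' reads the list of files to include from files.list.)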
IMPORTANT: Do steps A and B OUTSIDE of the script that will be submitted to the cluster.
Do not use zip instead of tar. Use of tar as described here creates ONE compressed file - copying of only one file over/back keeps creation of the associated meta-data to a minimum.
C) Use qsub to submit a file like this to the cluster.
#!/bin/csh -f
# Purpose:
#
# Steps:
#
# Helper scripts:
#
# Uses:
# ==============================================================================
#$ -cwd
#$ -m e
#$ -j y
#$ -N CCRELsim
date
# This will tell you which host it is running on.
echo JOB_ID: $JOB_ID JOB_NAME: $JOB_NAME HOSTNAME: $HOSTNAME
unalias cp
# ==============================================================================
echo Making directory /tmp/$$
mkdir /tmp/$$/
if (! -e files.tgz) then
echo ERROR The files.tgz archive does not exist
echo Please create it prior to running this script
exit(1)
endif
# Copy the needed files over to the compute node
cp files.tgz /tmp/$$/
# Set an alias to your current directory
set HomeDir = `pwd`
# Move into the temporary directory on the compute node
cd /tmp/$$
# Extract the files from your compressed tar file
tar zxvf files.tgz >& /dev/null
# ==============================================================================
# Do the needed computations. In this case, these are done by my 'loop.sh' script file that I copied over
./loop.sh >& /dev/null
# ==============================================================================
# Copy results back to working directory
unalias cp
cd ..
tar zcf results_$JOB_ID.tgz $$
cp results_$JOB_ID.tgz $HomeDir
# ==============================================================================
# Enter working directory
cd $HomeDir
# Remove all the files on compute node
rm -rf /tmp/$$
rm -f /tmp/results_$JOB_ID.tgz
date
# ================================
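For example, assuming the script above is saved in a file named, say, ccrelsim.sh (the name is just a placeholder), it would be submitted with:
% qsub ccrelsim.sh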