Wednesday, December 9, 2009

Open access Autism 10K genome-scan data set in NIH GEO

The Affymetrix 10K genome-scan data set from the Autism Genome Project is available in the NIH GEO repository (open access):

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6754

"This is a large linkage study undertaken by The Autism Genome Project (AGP) Consortium to search for candidate genes underlying the etiology of autism. 1168 multiplex families (≥ 2 affected individuals) consisting of 7600 individuals were genotyped using Affymetrix 10K whole genome mapping arrays. Copy number analysis was performed using DNA Chip (dChip) Analyzer."

Something just didn't seem right: noticing the unexpected!

The secret of good data analysis is noticing the things that just don't seem right:

'Wildlife Conservation Officer Cory Bentzoni noticed something suspicious on a Luzerne County highway -- a pickup truck with a heaping pile of doughnuts, breads and pastries spilling over its bed.

"Being that we were so close to bear season, seeing that person drive by with an unusual amount of pastries was like watching an individual go down a row of parked vehicles testing each handle to see if it were open," Bentzoni said in a written statement. "Something just didn't seem right."'

http://www.post-gazette.com/pg/09343/1019403-454.stm

Monday, December 7, 2009

The R Inferno

Here is the link to a PDF file entitled "The R Inferno", which is not only an excellent guide to efficient and good programming practice in R, but is also very funny:

http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Here's the Abstract:

"If you are using R and you think you’re in hell, this is a map for you."

Tuesday, October 27, 2009

The Elements of Statistical Learning

A complete PDF of this very good book:

Hastie et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2009), 745 pp.

is available from the authors' web site here.

Monday, October 26, 2009

Humor: Large-scale science

"A BIG COMPUTER, A COMPLEX ALGORITHM AND A LONG TIME DOES NOT EQUAL SCIENCE."

ROBERT GENTLEMAN, SSC 2003, HALIFAX (JUNE 2003)

(From the One R Tip a Day blog http://onertipaday.blogspot.com/)


And should we add

"and a p-value of 10^-8"?

Humor: How to write a scientific paper

This article on "How to Write a Scientific Paper" is amusing.

HapMap BioMart interface

One can retrieve allele frequencies and genotype frequencies for all the populations in HapMap in a nice tabular format using their BioMart HapMart interface available here:

http://hapmap.ncbi.nlm.nih.gov/biomart/martview

This is a very nice interface, which is relatively easy to use.

The same interface also generates Perl code for the query, which could then easily be reused in a program if needed.
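For readers who would rather script this in Python than reuse the generated Perl, here is a minimal sketch of how a BioMart-style XML query could be assembled. BioMart servers generally accept a query document of this shape, but the dataset, filter, and attribute names below are hypothetical placeholders, not the real HapMart names; those would have to be copied from the interface itself (its XML export shows them). The helper `build_biomart_query` is also just an illustrative name, and nothing is sent over the network here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_biomart_query(dataset, attributes, filters=None):
    """Return a BioMart XML query string asking for tab-separated output."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",       # tabular output, one row per record
        "header": "1",            # include a header row
        "uniqueRows": "0",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict the rows returned; attributes select the columns.
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# All names below are ASSUMED placeholders, not actual HapMart identifiers.
xml_query = build_biomart_query(
    dataset="example_allele_freq_dataset",
    attributes=["marker_id", "population", "ref_allele_freq"],
    filters={"chromosome": "chr7"},
)

# A BioMart service typically receives the XML in a "query" parameter.
params = urlencode({"query": xml_query})
print(xml_query)
```

The resulting `params` string would be posted to the server's martservice URL; the tab-separated response can then be read with any CSV reader.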
