Archive
Quantifying evidence for enrichment
Z. Yang and D. R. Bickel, “Minimum description length measures of evidence for enrichment,” Technical Report, Ottawa Institute of Systems Biology, COBRA Preprint Series, Article 76, available at biostats.bepress.com/cobra/ps/art76 (2010). Full preprint
Statomics on Web 2.0
Follow the Statomics Lab on Facebook.
Quantifying evidence for genetic association
Y. Yang and D. R. Bickel, “Minimum description length and empirical Bayes methods of identifying SNPs associated with disease,” Technical Report, Ottawa Institute of Systems Biology, COBRA Preprint Series, Article 74, available at biostats.bepress.com/cobra/ps/art74 (2010).
This manuscript adapts two new evidential, information-theoretic methods to the problem of detecting SNPs associated with disease on the basis of genome-wide association data. Both an application to coronary artery disease and an extensive set of simulation studies indicate that these parametric methods tend to be more reliable than a popular semi-parametric approach to estimating local false discovery rates. In addition, the paper reports that one of the two novel methods performs better than the other.
The abstract and the discussion section of the preprint provide more detailed summaries.
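For orientation, the semi-parametric approach mentioned above targets the local false discovery rate of the standard two-groups model. The sketch below computes that quantity under an assumed normal mixture with a fixed null proportion and alternative distribution; the function name, mixture components, and null proportion are all illustrative assumptions, and this is not the manuscript's MDL or empirical Bayes method.

```python
import numpy as np
from scipy import stats

def local_fdr(z, p0=0.9, alt_mean=2.0, alt_sd=1.0):
    """Two-groups local false discovery rate fdr(z) = p0 * f0(z) / f(z).

    f0 is the theoretical null density N(0, 1); f is the mixture
    p0 * f0 + (1 - p0) * f1 with f1 = N(alt_mean, alt_sd**2). The null
    proportion and the alternative distribution are fixed here for
    illustration rather than estimated from the data.
    """
    f0 = stats.norm.pdf(z, loc=0.0, scale=1.0)
    f1 = stats.norm.pdf(z, loc=alt_mean, scale=alt_sd)
    return p0 * f0 / (p0 * f0 + (1.0 - p0) * f1)

# SNP-level z-statistics: larger statistics yield smaller local fdr values.
z_stats = np.array([0.3, 1.5, 2.8, 4.1])
print(local_fdr(z_stats))
```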
Postdoctoral training in large-scale biostatistics
Reliable interpretation of genomic and genetic information makes unprecedented demands for innovations in statistical methodology and its application to biological systems. This unique opportunity drives research at the Statomics Lab of the Ottawa Institute of Systems Biology (http://www.statomics.com). The Statomics Lab seeks a postdoctoral fellow who will collaboratively develop and apply novel methods of statistical inference to attack current problems in analyzing data from genome-wide association studies and other high-dimensional biological data.
Experience in computationally intensive data analysis is essential, as is the ability to quickly develop reliable software implementing the statistical algorithms developed. Strong initiative, excellent communication skills, and a PhD or equivalent doctorate in statistical genetics, statistics, bioinformatics, computer science, mathematics, physics, any field of engineering, or an equally quantitative field, received within four years prior to the start date, are also required. The following qualities are desirable but not required: working knowledge of statistical genetics; familiarity with R, S-PLUS, Mathematica, C, Fortran, and/or LaTeX; experience in a UNIX or Linux environment.
To apply, send a PDF CV that includes contact information for three references to dbickel@uottawa.ca, with “GWA Postdoctoral Fellowship” and the year of your graduation or anticipated graduation in the subject line of the message. In the message body, concisely present evidence that you meet each requirement for the position and describe your most significant papers and software packages, summarizing how you contributed to each. All applicants are thanked in advance; only those selected for further consideration will receive a response.
Normalized maximum weighted likelihood
D. R. Bickel, “Statistical inference optimized with respect to the observed sample for single or multiple comparisons,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1010.0694 (2010). Full preprint
Inference to the best explanation
D. R. Bickel, “The strength of statistical evidence for composite hypotheses: Inference to the best explanation,” Technical Report, Ottawa Institute of Systems Biology, COBRA Preprint Series, Article 71, available at biostats.bepress.com/cobra/ps/art71 (2010).
Was the loss function a mistake?
Laplace’s “introduction of a loss function proved to be a serious mistake, which came to hamper the development of an objective theory of statistical inference to the present day” (Hald, 2007, pp. 3-4).
Fisherian alternatives to conventional statistics
Novel developments in statistics and information theory call for a reconsideration of important aspects of two of R. A. Fisher’s most controversial ideas: the fiducial argument and the direct use of the likelihood function. Some key features of observed confidence levels, the direct use of the likelihood function, and the minimum description length principle are summarized here:
- Like the fiducial distribution, a probability measure of observed confidence levels is in effect a posterior probability distribution of the parameter of interest that does not require any prior distribution. Derived from sets of confidence intervals, this probability distribution of a parameter of interest is traditionally known as a confidence distribution. When the parameter of interest is scalar, the observed confidence level of a composite hypothesis is equal to its fiducial probability (a simple scalar-parameter sketch follows this list). On the other hand, observed confidence levels do not suffer from the difficulties of constructing a fiducial distribution of a vector parameter.
- The likelihood ratio serves not only as a tool for the construction of point estimators, p-values, confidence intervals, and posterior probabilities, but is also fruitfully interpreted as a measure of the strength of statistical evidence for one hypothesis over another through the lens of a family of distributions. Modern versions of Fisher’s evidential use of the likelihood overcome multiplicity problems that arise in standard frequentism without resorting to a prior distribution.
- A related approach is to select the family of distributions using a modern information-theoretic reinterpretation of the likelihood function. In particular, the minimum description length principle extends the scope of Fisherian likelihood inference to the challenging problem of model selection.
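As a concrete instance of the first point, the sketch below computes an observed confidence level for a normal mean with known standard deviation, where the confidence distribution has a simple closed form. The known-variance setting, the function name, and the numerical example are assumptions made only for illustration.

```python
import numpy as np
from scipy import stats

def observed_confidence(sample, theta0, sigma=1.0):
    """Observed confidence level of the composite hypothesis theta > theta0.

    For a normal mean theta with known standard deviation sigma, the
    confidence distribution based on the sample mean is N(xbar, sigma**2 / n);
    the observed confidence level of {theta > theta0} is the probability
    that distribution assigns to the hypothesis.
    """
    xbar = np.mean(sample)
    n = len(sample)
    return stats.norm.sf(theta0, loc=xbar, scale=sigma / np.sqrt(n))

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.5, scale=1.0, size=25)
print(observed_confidence(sample, theta0=0.0))  # confidence that theta > 0
```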
Medium-scale simultaneous inference
D. R. Bickel, “Minimum description length methods of medium-scale simultaneous inference,” Technical Report, Ottawa Institute of Systems Biology, available at tinyurl.com/36dm6lj (2010). Full preprint
Abstract— Nonparametric statistical methods developed for analyzing data for high numbers of genes, SNPs, or other biological features tend to have low efficiency for data with the smaller numbers of features such as proteins, metabolites, or, when expression is measured with conventional instruments, genes. For this medium-scale inference problem, the minimum description length (MDL) framework quantifies the amount of information in the data supporting a null or alternative hypothesis for each feature in terms of parametric model selection. Two new MDL techniques are proposed. First, using test statistics that are highly informative about the parameter of interest, the data are reduced to a single statistic per feature. This simplifying step is already implicit in conventional hypothesis testing and has been found effective in empirical Bayes applications to genomics data. Second, the codelength difference between the alternative and null hypotheses of any given feature can take advantage of information in the measurements from all other features by using those measurements to find the overall code of minimum length summed over those features. The techniques are applied to protein abundance data, demonstrating that a computationally efficient approximation that is close for a sufficiently large number of features works well even when the number of features is as low as 20.
Keywords: information criteria; minimum description length; model selection; reduced likelihood
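To make the abstract's second technique concrete, here is a minimal sketch in which each feature's measurements have already been reduced to a single statistic, the null code uses the standard normal density, and the alternative code uses a zero-mean normal whose single scale parameter is chosen to minimize the codelength summed over all features, so each feature's codelength difference draws on the other features' measurements. The scale family, the bounds on the scale, and the function names are illustrative assumptions; this is not the preprint's codelength, only an illustration of borrowing strength across features.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

def codelength_differences(z):
    """Per-feature codelength differences (nats) between null and alternative.

    Each feature is represented by one reduced statistic z[i]. The null code
    uses the N(0, 1) density; the alternative code uses a zero-mean normal
    whose scale is fitted by minimizing the total codelength over all
    features. Positive differences favor the alternative hypothesis.
    """
    null_len = -stats.norm.logpdf(z)  # codelengths under the N(0, 1) null

    def total_alt_length(log_tau):
        # Total alternative codelength for a candidate scale exp(log_tau).
        return np.sum(-stats.norm.logpdf(z, scale=np.exp(log_tau)))

    # Restrict the alternative scale to [1, 10] (an arbitrary illustrative choice).
    fit = minimize_scalar(total_alt_length, bounds=(0.0, np.log(10.0)),
                          method="bounded")
    alt_len = -stats.norm.logpdf(z, scale=np.exp(fit.x))
    return null_len - alt_len

z_stats = np.array([0.2, -1.1, 2.5, 3.6, -0.4, 0.8])
print(codelength_differences(z_stats))
```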