Author Archive

Quantifying evidence for enrichment

7 January 2011

Z. Yang and D. R. Bickel, “Minimum description length measures of evidence for enrichment,” Technical Report, Ottawa Institute of Systems Biology, COBRA Preprint Series, Article 76, available at biostats.bepress.com/cobra/ps/art76 (2010). Full preprint

Bayesian posterior as approximate confidence

22 December 2010
Categories: Fragments

Statomics on Web 2.0

11 December 2010
Categories: Fragments

Quantifying evidence for genetic association

30 November 2010

Y. Yang and D. R. Bickel, “Minimum description length and empirical Bayes methods of identifying SNPs associated with disease,” Technical Report, Ottawa Institute of Systems Biology, COBRA Preprint Series, Article 74, available at biostats.bepress.com/cobra/ps/art74 (2010).

This manuscript adapts two new evidential, information-theoretic methods to the problem of detecting SNPs associated with disease on the basis of genome-wide association data. Both an application to coronary artery disease and an extensive set of simulation studies indicate that these parametric methods tend to be more reliable than a popular semi-parametric approach to estimating local false discovery rates. In addition, the paper reports that one of the two novel methods performs better than the other.
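For readers unfamiliar with the benchmark mentioned above, the local false discovery rate under the standard two-groups model can be sketched as follows. This is a generic illustration, not the paper's method; the null proportion `p0` and alternative mean used here are made-up values, and a real analysis would estimate them from the data:

```python
import math

def phi(z, mean=0.0):
    # Standard-normal density with the given mean (unit variance).
    return math.exp(-0.5 * (z - mean) ** 2) / math.sqrt(2 * math.pi)

def lfdr(z, p0=0.9, alt_mean=2.5):
    # Two-groups model: the local false discovery rate is the posterior
    # probability that a feature with statistic z is null, i.e.
    # p0 * f0(z) / (p0 * f0(z) + (1 - p0) * f1(z)).
    f = p0 * phi(z) + (1 - p0) * phi(z, alt_mean)
    return p0 * phi(z) / f
```

With these assumed values, a SNP with z near 0 gets an lfdr close to 1 (almost certainly null), while one with z near 3 gets an lfdr close to 0.1; the papers above compare parametric MDL alternatives against semi-parametric estimates of this quantity.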

The abstract and the discussion section of the preprint provide more detailed summaries.

Postdoctoral training in large-scale biostatistics

14 October 2010

Reliable interpretation of genomic and genetic information makes unprecedented demands for innovations in statistical methodology and its application to biological systems. This unique opportunity drives research at the Statomics Lab of the Ottawa Institute of Systems Biology (http://www.statomics.com). The Statomics Lab seeks a postdoctoral fellow who will collaboratively develop and apply novel methods of statistical inference to attack current problems in analyzing data from genome-wide association studies and other high-dimensional biological data.

Experience in computationally intensive data analysis is essential, as is the ability to quickly develop reliable software implementing the statistical algorithms developed. Also required are strong initiative, excellent communication skills, and a PhD or equivalent doctorate, received within four years prior to the start date, in statistical genetics, statistics, bioinformatics, computer science, mathematics, physics, any field of engineering, or an equally quantitative field. The following qualities are desirable but not required: working knowledge of statistical genetics; familiarity with R, S-PLUS, Mathematica, C, Fortran, and/or LaTeX; and experience in a UNIX or Linux environment.

To apply, send a PDF CV that has contact information of three references to dbickel@uottawa.ca, with “GWA Postdoctoral Fellowship” and the year of your graduation or anticipated graduation in the subject field of the message. In the message body, concisely present evidence that you meet each requirement for the position and describe your most significant papers and software packages with summaries of how you contributed to them. All applicants are thanked in advance; only those selected for further consideration will receive a response.

Normalized maximum weighted likelihood

8 October 2010

D. R. Bickel, “Statistical inference optimized with respect to the observed sample for single or multiple comparisons,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1010.0694 (2010). Full preprint

Inference to the best explanation

13 September 2010

Was the loss function a mistake?

28 August 2010

Laplace’s “introduction of a loss function proved to be a serious mistake, which came to hamper the development of an objective theory of statistical inference to the present day” (Hald, 2007, pp. 3-4).

Categories: Fragments

Fisherian alternatives to conventional statistics

22 August 2010

Novel developments in statistics and information theory call for a reconsideration of important aspects of two of R. A. Fisher’s most controversial ideas: the fiducial argument and the direct use of the likelihood function. Some key features of observed confidence levels, the direct use of the likelihood function, and the minimum description length principle are summarized here:

  1. Like the fiducial distribution, a probability measure of observed confidence levels is in effect a posterior probability distribution of the parameter of interest that does not require any prior distribution. Derived from sets of confidence intervals, this probability distribution of a parameter of interest is traditionally known as a confidence distribution. When the parameter of interest is scalar, the observed confidence level of a composite hypothesis is equal to its fiducial probability. On the other hand, observed confidence levels do not suffer from the difficulties of constructing a fiducial distribution of a vector parameter.
  2. The likelihood ratio serves not only as a tool for the construction of point estimators, p-values, confidence intervals, and posterior probabilities, but is also fruitfully interpreted as a measure of the strength of statistical evidence for one hypothesis over another through the lens of a family of distributions. Modern versions of Fisher’s evidential use of the likelihood overcome multiplicity problems that arise in standard frequentism without resorting to a prior distribution.
  3. A related approach is to select the family of distributions using a modern information-theoretic reinterpretation of the likelihood function. In particular, the minimum description length principle extends the scope of Fisherian likelihood inference to the challenging problem of model selection.
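The evidential use of the likelihood ratio in point 2 can be illustrated with a toy computation, not taken from any of the preprints above. Here the data, the known unit standard deviation, and the two simple hypotheses about a normal mean are all made-up assumptions for illustration:

```python
import math

def normal_loglik(data, mean, sd=1.0):
    # Log-likelihood of i.i.d. normal observations with known sd.
    return sum(-0.5 * math.log(2 * math.pi * sd ** 2)
               - (x - mean) ** 2 / (2 * sd ** 2) for x in data)

data = [0.8, 1.3, 0.5, 1.1, 0.9]

# Strength of evidence for H1 (mean = 1) over H0 (mean = 0),
# measured directly by the log-likelihood ratio.
log_lr = normal_loglik(data, 1.0) - normal_loglik(data, 0.0)  # = 2.1
lr = math.exp(log_lr)  # about 8.2
```

On this reading, the likelihood ratio of about 8 is itself the measure of evidence favoring the alternative mean; no p-value or prior distribution enters the comparison.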
Categories: Fragments, MDL

Medium-scale simultaneous inference

14 August 2010

D. R. Bickel, “Minimum description length methods of medium-scale simultaneous inference,” Technical Report, Ottawa Institute of Systems Biology, available at tinyurl.com/36dm6lj (2010). Full preprint

Abstract— Nonparametric statistical methods developed for analyzing data for high numbers of genes, SNPs, or other biological features tend to have low efficiency for data with the smaller numbers of features such as proteins, metabolites, or, when expression is measured with conventional instruments, genes. For this medium-scale inference problem, the minimum description length (MDL) framework quantifies the amount of information in the data supporting a null or alternative hypothesis for each feature in terms of parametric model selection. Two new MDL techniques are proposed. First, using test statistics that are highly informative about the parameter of interest, the data are reduced to a single statistic per feature. This simplifying step is already implicit in conventional hypothesis testing and has been found effective in empirical Bayes applications to genomics data. Second, the codelength difference between the alternative and null hypotheses of any given feature can take advantage of information in the measurements from all other features by using those measurements to find the overall code of minimum length summed over those features. The techniques are applied to protein abundance data, demonstrating that a computationally efficient approximation that is close for a sufficiently large number of features works well even when the number of features is as low as 20.
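The two steps in the abstract can be sketched numerically. This is only a schematic illustration under assumed values: the z-statistics are made up, and a crude plug-in estimate of the alternative mean stands in for the paper's minimum-codelength computation over all features:

```python
import math

def codelen(z, mean):
    # Codelength (in nats) of a z-statistic under a normal model:
    # the negative log of the unit-variance normal density at z.
    return 0.5 * math.log(2 * math.pi) + 0.5 * (z - mean) ** 2

# Step 1 (data reduction): each feature is summarized by a single
# informative test statistic; here, eight made-up z-statistics.
z = [3.1, 2.7, 0.2, -0.4, 2.9, 0.1, -0.2, 0.5]

# Step 2 (borrowing strength): one alternative mean is estimated from
# the apparently non-null features, so every feature's codelength
# difference uses information from the measurements of all features.
large = [zi for zi in z if abs(zi) > 2]
mu_hat = sum(large) / len(large)  # = 2.9 here

# Evidence per feature: codelength under the null (mean 0) minus
# codelength under the plug-in alternative; positive favors the alternative.
evidence = [codelen(zi, 0.0) - codelen(zi, mu_hat) for zi in z]
```

Features with z-statistics near 3 receive positive codelength savings under the alternative, while those near 0 do not, mirroring how the paper's approximation ranks features even when only about 20 are available.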

Keywords: information criteria; minimum description length; model selection; reduced likelihood