Archive
Quantifying evidence for enrichment
Z. Yang and D. R. Bickel, “Minimum description length measures of evidence for enrichment,” Technical Report, Ottawa Institute of Systems Biology, COBRA Preprint Series, Article 76, available at biostats.bepress.com/cobra/ps/art76 (2010). Full preprint
Statomics on Web 2.0
Follow the Statomics Lab on Facebook.
Quantifying evidence for genetic association
Y. Yang and D. R. Bickel, “Minimum description length and empirical Bayes methods of identifying SNPs associated with disease,” Technical Report, Ottawa Institute of Systems Biology, COBRA Preprint Series, Article 74, available at biostats.bepress.com/cobra/ps/art74 (2010).
This manuscript adapts two new evidential, information-theoretic methods to the problem of detecting SNPs associated with disease on the basis of genome-wide association data. Both an application to coronary artery disease and an extensive set of simulation studies indicate that these parametric methods tend to be more reliable than a popular semi-parametric approach to estimating local false discovery rates. In addition, the paper reports that one of the two novel methods performs better than the other.
The abstract and the discussion section of the preprint provide more detailed summaries.
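For orientation, the semi-parametric approach mentioned above targets the local false discovery rate of the standard two-groups model. The sketch below computes that quantity under an assumed normal mixture with a fixed null proportion and alternative distribution; the function name, mixture components, and null proportion are all illustrative assumptions, and this is not the manuscript's MDL or empirical Bayes method.

```python
import numpy as np
from scipy import stats

def local_fdr(z, p0=0.9, alt_mean=2.0, alt_sd=1.0):
    """Two-groups local false discovery rate fdr(z) = p0 * f0(z) / f(z).

    f0 is the theoretical null density N(0, 1); f is the mixture
    p0 * f0 + (1 - p0) * f1 with f1 = N(alt_mean, alt_sd**2). The null
    proportion and the alternative distribution are fixed here for
    illustration rather than estimated from the data.
    """
    f0 = stats.norm.pdf(z, loc=0.0, scale=1.0)
    f1 = stats.norm.pdf(z, loc=alt_mean, scale=alt_sd)
    return p0 * f0 / (p0 * f0 + (1.0 - p0) * f1)

# SNP-level z-statistics: larger statistics yield smaller local fdr values.
z_stats = np.array([0.3, 1.5, 2.8, 4.1])
print(local_fdr(z_stats))
```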
Postdoctoral training in large-scale biostatistics
Reliable interpretation of genomic and genetic information makes unprecedented demands for innovations in statistical methodology and its application to biological systems. This unique opportunity drives research at the Statomics Lab of the Ottawa Institute of Systems Biology (http://www.statomics.com). The Statomics Lab seeks a postdoctoral fellow who will collaboratively develop and apply novel methods of statistical inference to attack current problems in analyzing data from genome-wide association studies and other high-dimensional biological data.
Experience in computationally intensive data analysis is essential, as is the ability to quickly develop reliable software implementing the statistical algorithms developed. Strong initiative, excellent communication skills, and a PhD or equivalent doctorate in statistical genetics, statistics, bioinformatics, computer science, mathematics, physics, any field of engineering, or an equally quantitative field, received within four years prior to the start date, are also required. The following qualities are desirable but not required: working knowledge of statistical genetics; familiarity with R, S-PLUS, Mathematica, C, Fortran, and/or LaTeX; experience in a UNIX or Linux environment.
To apply, send a PDF CV that includes contact information for three references to dbickel@uottawa.ca, with “GWA Postdoctoral Fellowship” and the year of your graduation or anticipated graduation in the subject line of the message. In the message body, concisely present evidence that you meet each requirement for the position and describe your most significant papers and software packages, summarizing how you contributed to each. All applicants are thanked in advance; only those selected for further consideration will receive a response.
Normalized maximum weighted likelihood
D. R. Bickel, “Statistical inference optimized with respect to the observed sample for single or multiple comparisons,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1010.0694 (2010). Full preprint
Inference to the best explanation
D. R. Bickel, “The strength of statistical evidence for composite hypotheses: Inference to the best explanation,” Technical Report, Ottawa Institute of Systems Biology, COBRA Preprint Series, Article 71, available at biostats.bepress.com/cobra/ps/art71 (2010).
Was the loss function a mistake?
Laplace’s “introduction of a loss function proved to be a serious mistake, which came to hamper the development of an objective theory of statistical inference to the present day” (Hald, 2007, pp. 3-4).
Fisherian alternatives to conventional statistics
Novel developments in statistics and information theory call for a reconsideration of important aspects of two of R. A. Fisher’s most controversial ideas: the fiducial argument and the direct use of the likelihood function. Some key features of observed confidence levels, the direct use of the likelihood function, and the minimum description length principle are summarized here:
- Like the fiducial distribution, a probability measure of observed confidence levels is in effect a posterior probability distribution of the parameter of interest that does not require any prior distribution. Derived from sets of confidence intervals, this probability distribution of a parameter of interest is traditionally known as a confidence distribution. When the parameter of interest is scalar, the observed confidence level of a composite hypothesis is equal to its fiducial probability (a simple scalar-parameter sketch follows this list). On the other hand, observed confidence levels do not suffer from the difficulties of constructing a fiducial distribution of a vector parameter.
- The likelihood ratio serves not only as a tool for the construction of point estimators, p-values, confidence intervals, and posterior probabilities, but is also fruitfully interpreted as a measure of the strength of statistical evidence for one hypothesis over another through the lens of a family of distributions. Modern versions of Fisher’s evidential use of the likelihood overcome multiplicity problems that arise in standard frequentism without resorting to a prior distribution.
- A related approach is to select the family of distributions using a modern information-theoretic reinterpretation of the likelihood function. In particular, the minimum description length principle extends the scope of Fisherian likelihood inference to the challenging problem of model selection.
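As a concrete instance of the first point, the sketch below computes an observed confidence level for a normal mean with known standard deviation, where the confidence distribution has a simple closed form. The known-variance setting, the function name, and the numerical example are assumptions made only for illustration.

```python
import numpy as np
from scipy import stats

def observed_confidence(sample, theta0, sigma=1.0):
    """Observed confidence level of the composite hypothesis theta > theta0.

    For a normal mean theta with known standard deviation sigma, the
    confidence distribution based on the sample mean is N(xbar, sigma**2 / n);
    the observed confidence level of {theta > theta0} is the probability
    that distribution assigns to the hypothesis.
    """
    xbar = np.mean(sample)
    n = len(sample)
    return stats.norm.sf(theta0, loc=xbar, scale=sigma / np.sqrt(n))

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.5, scale=1.0, size=25)
print(observed_confidence(sample, theta0=0.0))  # confidence that theta > 0
```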
Medium-scale simultaneous inference
D. R. Bickel, “Minimum description length methods of medium-scale simultaneous inference,” Technical Report, Ottawa Institute of Systems Biology, available at tinyurl.com/36dm6lj (2010). Full preprint
Abstract— Nonparametric statistical methods developed for analyzing data for high numbers of genes, SNPs, or other biological features tend to have low efficiency for data with the smaller numbers of features such as proteins, metabolites, or, when expression is measured with conventional instruments, genes. For this medium-scale inference problem, the minimum description length (MDL) framework quantifies the amount of information in the data supporting a null or alternative hypothesis for each feature in terms of parametric model selection. Two new MDL techniques are proposed. First, using test statistics that are highly informative about the parameter of interest, the data are reduced to a single statistic per feature. This simplifying step is already implicit in conventional hypothesis testing and has been found effective in empirical Bayes applications to genomics data. Second, the codelength difference between the alternative and null hypotheses of any given feature can take advantage of information in the measurements from all other features by using those measurements to find the overall code of minimum length summed over those features. The techniques are applied to protein abundance data, demonstrating that a computationally efficient approximation that is close for a sufficiently large number of features works well even when the number of features is as low as 20.
Keywords: information criteria; minimum description length; model selection; reduced likelihood
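To make the abstract's second technique concrete, here is a minimal sketch in which each feature's measurements have already been reduced to a single statistic, the null code uses the standard normal density, and the alternative code uses a zero-mean normal whose single scale parameter is chosen to minimize the codelength summed over all features, so each feature's codelength difference draws on the other features' measurements. The scale family, the bounds on the scale, and the function names are illustrative assumptions; this is not the preprint's codelength, only an illustration of borrowing strength across features.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

def codelength_differences(z):
    """Per-feature codelength differences (nats) between null and alternative.

    Each feature is represented by one reduced statistic z[i]. The null code
    uses the N(0, 1) density; the alternative code uses a zero-mean normal
    whose scale is fitted by minimizing the total codelength over all
    features. Positive differences favor the alternative hypothesis.
    """
    null_len = -stats.norm.logpdf(z)  # codelengths under the N(0, 1) null

    def total_alt_length(log_tau):
        # Total alternative codelength for a candidate scale exp(log_tau).
        return np.sum(-stats.norm.logpdf(z, scale=np.exp(log_tau)))

    # Restrict the alternative scale to [1, 10] (an arbitrary illustrative choice).
    fit = minimize_scalar(total_alt_length, bounds=(0.0, np.log(10.0)),
                          method="bounded")
    alt_len = -stats.norm.logpdf(z, scale=np.exp(fit.x))
    return null_len - alt_len

z_stats = np.array([0.2, -1.1, 2.5, 3.6, -0.4, 0.8])
print(codelength_differences(z_stats))
```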