Archive for the ‘Methods’ Category

All-scale FDR estimation

24 June 2011

D. R. Bickel, “Simple estimators of false discovery rates given as few as one or two p-values without strong parametric assumptions,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1106.4490 (2011).

To address multiple comparison problems in small-to-high-dimensional biology, this paper introduces novel estimators of the local false discovery rate (LFDR), reports their main properties, and illustrates their use with proteomics data. Unlike previous LFDR estimators, the new estimators have all of the following advantages:

  1. proven asymptotic conservatism;
  2. simplicity of calculation without the tuning of smoothing parameters;
  3. no strong parametric assumptions;
  4. applicability to very small numbers of hypotheses as well as to very large numbers of hypotheses.
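For orientation, the LFDR is commonly defined through the two-group mixture model. The display below is that standard textbook definition, not the new estimators of the paper; the symbols $\pi_0$ (proportion of true null hypotheses), $f_0$ (null density of the p-value) and $f_1$ (alternative density) are notation assumed here for illustration.

```latex
\mathrm{LFDR}(p) \;=\; \Pr(H_0 \mid P = p)
  \;=\; \frac{\pi_0 \, f_0(p)}{\pi_0 \, f_0(p) + (1 - \pi_0) \, f_1(p)},
\qquad f_0(p) = 1 \ \text{for } p \in [0, 1] \text{ when the null p-value is uniform.}
```

An estimator of the LFDR is called conservative when it tends to overestimate this quantity rather than underestimate it.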

Observed confidence levels for microarrays, etc.

22 June 2011

D. R. Bickel, “Estimating the null distribution to adjust observed confidence levels for genome-scale screening,” Biometrics 67, 363-370 (2011).

This paper describes the first application of observed confidence levels to high-dimensional biological data. The proposed method for multiple comparisons can take advantage of the estimated null distribution without requiring any prior distribution. The new method is applied to microarray data to illustrate its advantages.

Unknown Bayes factor approximation

5 April 2011

D. R. Bickel, “Measuring support for a hypothesis about a random parameter without estimating its unknown prior,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1101.0305 (2011).

Small-scale inference

5 April 2011

D. R. Bickel, “Small-scale inference: Empirical Bayes and confidence methods for as few as a single comparison,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1104.0341 (2011).

Parametric empirical Bayes methods of estimating the local false discovery rate by maximum likelihood apply not only to the multiple comparison settings for which they were developed, but, with a simple modification, also to small numbers of comparisons. In fact, data for a single comparison are sufficient under broad conditions, as seen from applications to measurements of the abundance levels of 20 proteins and from simulation studies with confidence-based inference as the competitor.
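As a rough illustration of what a parametric empirical Bayes LFDR estimate by maximum likelihood looks like, the sketch below fits a two-group normal mixture to a handful of z-scores by grid search. It is a generic sketch under stated assumptions, not the modified method of the preprint: the unit-variance normal model, the grid search, and the names `fit_two_group` and `lfdr` are all hypothetical choices made here for illustration.

```python
import math


def normal_pdf(z, mean=0.0):
    """Density of a unit-variance normal distribution at z."""
    return math.exp(-0.5 * (z - mean) ** 2) / math.sqrt(2.0 * math.pi)


def log_likelihood(zs, pi0, mu):
    """Log-likelihood of the mixture pi0 * N(0, 1) + (1 - pi0) * N(mu, 1)."""
    return sum(
        math.log(pi0 * normal_pdf(z) + (1.0 - pi0) * normal_pdf(z, mu))
        for z in zs
    )


def fit_two_group(zs, grid_size=101, mu_max=6.0):
    """Grid-search maximum likelihood estimates of (pi0, mu).

    Assumed model (illustrative only): null z-scores are N(0, 1) and
    non-null z-scores are N(mu, 1) with a common unknown mu.
    """
    best_ll, best_pi0, best_mu = -math.inf, 1.0, 0.0
    for i in range(grid_size):
        pi0 = i / (grid_size - 1)
        for j in range(grid_size):
            mu = -mu_max + 2.0 * mu_max * j / (grid_size - 1)
            ll = log_likelihood(zs, pi0, mu)
            if ll > best_ll:
                best_ll, best_pi0, best_mu = ll, pi0, mu
    return best_pi0, best_mu


def lfdr(z, pi0, mu):
    """Local false discovery rate of z under the fitted two-group model."""
    null_part = pi0 * normal_pdf(z)
    alt_part = (1.0 - pi0) * normal_pdf(z, mu)
    return null_part / (null_part + alt_part)


if __name__ == "__main__":
    # Made-up z-scores, purely for demonstration.
    z_scores = [0.1, -0.3, 0.5, 3.2, 2.9, -0.2, 3.1, 0.0]
    pi0_hat, mu_hat = fit_two_group(z_scores)
    for z in z_scores:
        print(f"z = {z:5.2f}  estimated LFDR = {lfdr(z, pi0_hat, mu_hat):.3f}")
```

A grid search is used only to keep the example dependency-free; a numerical optimizer or an EM algorithm would be the more usual choice for larger problems.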

Confidence intervals for semi-parametric empirical Bayes

7 January 2011

D. R. Bickel, “Large-scale interval and point estimates from an empirical Bayes extension of confidence posteriors,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1012.6033 (2010).

To address multiple comparison problems in high-dimensional biology, this paper introduces shrunken point estimates for feature prioritization and shrunken confidence intervals to indicate the uncertainty of the point estimates. The new point and interval estimates are applied to gene expression data and are found to be conservative by simulation, as expected from limiting cases. Unlike the parametric empirical Bayes estimates, the new estimates are compatible with the semi-parametric approach to local false discovery rate estimation that has been extensively developed and applied over the last decade. This compatibility is achieved by replacing strong parametric assumptions with the confidence posterior theory of papers in press at Biometrics and Communications in Statistics - Theory and Methods.

Quantifying evidence for enrichment

7 January 2011

Z. Yang and D. R. Bickel, “Minimum description length measures of evidence for enrichment,” Technical Report, Ottawa Institute of Systems Biology, COBRA Preprint Series, Article 76, available at biostats.bepress.com/cobra/ps/art76 (2010).

Quantifying evidence for genetic association

30 November 2010

Y. Yang and D. R. Bickel, “Minimum description length and empirical Bayes methods of identifying SNPs associated with disease,” Technical Report, Ottawa Institute of Systems Biology, COBRA Preprint Series, Article 74, available at biostats.bepress.com/cobra/ps/art74 (2010).

This manuscript adapts two new evidential, information-theoretic methods to the problem of detecting SNPs associated with disease on the basis of genome-wide association data. Both an application to coronary artery disease and an extensive set of simulation studies indicate that these parametric methods tend to be more reliable than a popular semi-parametric approach to estimating local false discovery rates. In addition, the paper reports that one of the two novel methods performs better than the other.

The abstract and the discussion section of the preprint provide more detailed summaries.

Normalized maximum weighted likelihood

8 October 2010

D. R. Bickel, “Statistical inference optimized with respect to the observed sample for single or multiple comparisons,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1010.0694 (2010).

Inference to the best explanation

13 September 2010

Fisherian alternatives to conventional statistics

22 August 2010

Novel developments in statistics and information theory call for a reconsideration of important aspects of two of R. A. Fisher’s most controversial ideas: the fiducial argument and the direct use of the likelihood function. Some key features of observed confidence levels, the direct use of the likelihood function, and the minimum description length principle are summarized here:

  1. Like the fiducial distribution, a probability measure of observed confidence levels is in effect a posterior probability distribution of the parameter of interest that does not require any prior distribution. Derived from sets of confidence intervals, this probability distribution of a parameter of interest is traditionally known as a confidence distribution. When the parameter of interest is scalar, the observed confidence level of a composite hypothesis is equal to its fiducial probability. On the other hand, observed confidence levels do not suffer from the difficulties of constructing a fiducial distribution of a vector parameter.
  2. The likelihood ratio serves not only as a tool for the construction of point estimators, p-values, confidence intervals, and posterior probabilities, but is also fruitfully interpreted as a measure of the strength of statistical evidence for one hypothesis over another through the lens of a family of distributions. Modern versions of Fisher’s evidential use of the likelihood overcome multiplicity problems that arise in standard frequentism without resorting to a prior distribution.
  3. A related approach is to select the family of distributions using a modern information-theoretic reinterpretation of the likelihood function. In particular, the minimum description length principle extends the scope of Fisherian likelihood inference to the challenging problem of model selection.
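
The first two items can be made concrete in the simplest textbook setting, a normal mean with known standard deviation. The sketch below is an illustration under that assumption only: the confidence distribution of the mean is taken to be normal, centered at the sample mean with standard error sigma / sqrt(n), and the function names are hypothetical rather than taken from any of the papers above.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def observed_confidence_level(lower, upper, sample_mean, sigma, n):
    """Observed confidence level of the hypothesis lower < mean < upper.

    Assumes a normal sample with known sigma, so that the confidence
    distribution of the mean is N(sample_mean, sigma**2 / n).
    """
    standard_error = sigma / math.sqrt(n)
    return (normal_cdf((upper - sample_mean) / standard_error)
            - normal_cdf((lower - sample_mean) / standard_error))


def likelihood_ratio(sample_mean, sigma, n, mu1, mu0):
    """Likelihood ratio of mean = mu1 versus mean = mu0 (same normal model)."""
    variance_of_mean = sigma ** 2 / n
    return math.exp(
        ((sample_mean - mu0) ** 2 - (sample_mean - mu1) ** 2)
        / (2.0 * variance_of_mean)
    )


if __name__ == "__main__":
    # Made-up summary statistics, purely for demonstration.
    x_bar, sigma, n = 1.0, 2.0, 16
    level = observed_confidence_level(0.0, math.inf, x_bar, sigma, n)
    ratio = likelihood_ratio(x_bar, sigma, n, mu1=1.0, mu0=0.0)
    print(f"Observed confidence level that the mean is positive: {level:.3f}")
    print(f"Likelihood ratio of mean = 1 versus mean = 0: {ratio:.3f}")
```

No prior distribution enters either calculation: the first function assigns a confidence level to a composite hypothesis (an interval), while the second compares two simple hypotheses through the likelihood alone.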
Categories: Fragments, MDL