Archive for the ‘empirical Bayes’ Category

Local FDR estimation software

30 June 2012

LFDRenrich is a suite of R functions for the estimation of local false discovery rates by maximum likelihood under a two-component or three-component parametric mixture model of 2×2 tables such as those used in gene enrichment analyses.

LFDRhat is a more general suite of R functions for the estimation of local false discovery rates by maximum likelihood under a two-component or three-component parametric mixture model.

Effect-size estimates from hypothesis probabilities

25 February 2012

D. R. Bickel, “Empirical Bayes interval estimates that are conditionally equal to unadjusted confidence intervals or to default prior credibility intervals,” Statistical Applications in Genetics and Molecular Biology 11 (3), art. 7 (2012). Full article | 2010 preprint

The method contributed in this paper adjusts confidence intervals in multiple-comparison problems according to the estimated local false discovery rate. When the data are independent across comparisons, this shrinkage method performs substantially better than standard confidence intervals. A special case of the confidence intervals is the posterior median, a point estimate that provides an improved method of ranking biological features such as genes, proteins, or genetic variants. The resulting ranks lead to better prioritization of which features to investigate further.
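
As a rough illustration of how a local false discovery rate can shrink a point estimate, the following Python sketch computes a posterior median under a simplifying assumption that is not taken from the paper's text: the posterior is treated as a mixture of a point mass at 0 (weighted by the LFDR) and a normal distribution centered at the unadjusted estimate (weighted by 1 - LFDR). The function name `shrunken_median` and its arguments are hypothetical.

```python
from statistics import NormalDist


def shrunken_median(estimate, std_error, lfdr):
    """Median of the assumed posterior: lfdr * (point mass at 0)
    + (1 - lfdr) * Normal(estimate, std_error**2).

    This mixture form is an illustrative assumption, not the paper's code.
    """
    if not 0.0 <= lfdr <= 1.0:
        raise ValueError("lfdr must lie in [0, 1]")
    alt = NormalDist(mu=estimate, sigma=std_error)
    # Posterior probability that the effect is strictly below 0.
    below = (1.0 - lfdr) * alt.cdf(0.0)
    if below > 0.5:
        # The median lies in the negative, continuous part of the posterior.
        return alt.inv_cdf(0.5 / (1.0 - lfdr))
    if below + lfdr >= 0.5:
        # The point mass at 0 straddles the 50% level: full shrinkage.
        return 0.0
    # The median lies in the positive, continuous part of the posterior.
    return alt.inv_cdf((0.5 - lfdr) / (1.0 - lfdr))


if __name__ == "__main__":
    for lfdr in (0.0, 0.2, 0.6):
        print(lfdr, shrunken_median(3.0, 1.0, lfdr))
```

Under this assumption, an LFDR of 0 returns the unadjusted estimate, and an LFDR of at least 0.5 shrinks the estimate all the way to 0, so features with weak evidence of an effect drop to the bottom of a ranking by absolute value.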

Estimating probabilities of enrichment

4 January 2012

Z. Yang, Z. Li, and D. R. Bickel, “Empirical Bayes estimation of posterior probabilities of enrichment,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1201.0153 (2011). Full preprint | 2010 seed

This paper adapts novel empirical Bayes methods to the problem of detecting enrichment, that is, the differential representation of genes associated with a biological category in a list of genes identified as differentially expressed. A microarray case study illustrates the methods using Gene Ontology (GO) terms, and a simulation study compares their performance. Which enrichment methods work best is found to depend strongly on how many GO terms or other biological categories are of interest.
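
For readers unfamiliar with the setup, the Python sketch below builds the 2×2 table that underlies an enrichment analysis for one category and computes a conventional one-sided hypergeometric (Fisher-type) p-value from it. It does not reproduce the paper's empirical Bayes estimators; the function names and the toy gene identifiers are hypothetical.

```python
from math import comb


def enrichment_table(de_genes, category_genes, all_genes):
    """Return (a, b, c, d): counts of genes that are
    a: differentially expressed (DE) and in the category,
    b: DE but not in the category,
    c: in the category but not DE,
    d: neither.
    """
    universe = set(all_genes)
    de = set(de_genes) & universe
    cat = set(category_genes) & universe
    a = len(de & cat)
    b = len(de - cat)
    c = len(cat - de)
    d = len(universe - de - cat)
    return a, b, c, d


def hypergeom_upper_tail(a, b, c, d):
    """P(X >= a) when X counts category genes among the DE genes
    under random sampling without replacement (no enrichment)."""
    n = a + b + c + d   # all genes
    k_cat = a + c       # genes in the category
    m_de = a + b        # DE genes
    total = comb(n, m_de)
    tail = sum(comb(k_cat, k) * comb(n - k_cat, m_de - k)
               for k in range(a, min(k_cat, m_de) + 1))
    return tail / total


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(1, 11)]
    table = enrichment_table(["g1", "g2", "g3", "g4"],
                             ["g1", "g2", "g3", "g8"], genes)
    print(table, hypergeom_upper_tail(*table))
```

With many GO terms, one such table arises per term, which is the multiple-comparison setting in which estimating a posterior probability of enrichment for each term becomes relevant.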

Combining inferences from different methods

28 November 2011

D. R. Bickel, “Resolving conflicts between statistical methods by probability combination: Application to empirical Bayes analyses of genomic data,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1111.6174 (2011). Full preprint

This paper proposes a solution to the problem of combining the results of differing statistical methods that may legitimately be used to analyze the same data set. The motivating application is the combination of two estimators of the probability of differential gene expression: one uses an empirical null distribution, and the other uses the theoretical null distribution. Because there is usually no reliable way to predict which null distribution will perform better for a given data set, and because the choice between them often has a large impact on the conclusions, the proposed hedging strategy addresses a pressing need in statistical genomics. Many other applications are mentioned in the abstract and described in the introduction.

Software for local false discovery rate estimation

15 August 2011

LFDR-MLE is a suite of R functions for the estimation of local false discovery rates by maximum likelihood under a two-group parametric mixture model of test statistics.
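
The general idea of maximum-likelihood LFDR estimation under a two-group mixture can be sketched in a few lines of Python. This is not the LFDR-MLE code or its interface: it assumes z-type test statistics that are N(0, 1) under the null hypothesis and N(mu, 1) under the alternative, fits the null proportion `p0` and the shift `mu` by EM, and then evaluates LFDR(z) = p0 f0(z) / f(z). All function names are hypothetical.

```python
import math
import random


def normal_pdf(z, mu=0.0):
    """Density of N(mu, 1) at z."""
    return math.exp(-0.5 * (z - mu) ** 2) / math.sqrt(2.0 * math.pi)


def lfdr_at(z, p0, mu):
    """Local false discovery rate under the assumed two-group model."""
    null_part = p0 * normal_pdf(z)
    alt_part = (1.0 - p0) * normal_pdf(z, mu)
    return null_part / (null_part + alt_part)


def fit_two_group(z_values, p0=0.9, mu=1.0, n_iter=200):
    """EM iterations for the maximum-likelihood estimates of p0 and mu."""
    n = len(z_values)
    for _ in range(n_iter):
        # E-step: posterior null probability of each statistic.
        w = [lfdr_at(z, p0, mu) for z in z_values]
        # M-step: update the null proportion and the alternative mean.
        p0 = sum(w) / n
        alt_weight = n - sum(w)
        if alt_weight > 0.0:
            mu = sum((1.0 - wi) * z for wi, z in zip(w, z_values)) / alt_weight
    return p0, mu


if __name__ == "__main__":
    rng = random.Random(0)
    z = [rng.gauss(0.0, 1.0) for _ in range(900)]
    z += [rng.gauss(3.0, 1.0) for _ in range(100)]
    p0_hat, mu_hat = fit_two_group(z)
    print(p0_hat, mu_hat, lfdr_at(0.0, p0_hat, mu_hat), lfdr_at(4.0, p0_hat, mu_hat))
```

On simulated data like the example, statistics near 0 receive an estimated LFDR close to 1 and statistics far in the tail receive an estimated LFDR close to 0.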

All-scale FDR estimation

24 June 2011

D. R. Bickel, “Simple estimators of false discovery rates given as few as one or two p-values without strong parametric assumptions,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1106.4490 (2011). Full preprint

To address multiple comparison problems in small-to-high-dimensional biology, this paper introduces novel estimators of the local false discovery rate (LFDR), reports their main properties, and illustrates their use with proteomics data. Unlike previous LFDR estimators, the new estimators have all of the following advantages:

  1. proven asymptotic conservatism;
  2. simplicity of calculation without the tuning of smoothing parameters;
  3. no strong parametric assumptions;
  4. applicability to very small numbers of hypotheses as well as to very large numbers of hypotheses.

Observed confidence levels for microarrays, etc.

22 June 2011

D. R. Bickel, “Estimating the null distribution to adjust observed confidence levels for genome-scale screening,” Biometrics 67, 363-370 (2011). Abstract and article | French abstract | Supplementary material | Simple explanation

This paper describes the first application of observed confidence levels to data of high-dimensional biology. The proposed method for multiple comparisons can take advantage of the estimated null distribution without any prior distribution. The new method is applied to microarray data to illustrate its advantages.

Unknown Bayes factor approximation

5 April 2011

D. R. Bickel, “Measuring support for a hypothesis about a random parameter without estimating its unknown prior,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1101.0305 (2011). Full preprint

Small-scale inference

5 April 2011

D. R. Bickel, “Small-scale inference: Empirical Bayes and confidence methods for as few as a single comparison,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1104.0341 (2011). Full preprint

Parametric empirical Bayes methods of estimating the local false discovery rate by maximum likelihood apply not only to the multiple comparison settings for which they were developed, but, with a simple modification, also to small numbers of comparisons. In fact, data for a single comparison are sufficient under broad conditions, as seen from applications to measurements of the abundance levels of 20 proteins and from simulation studies with confidence-based inference as the competitor.

Confidence intervals for semi-parametric empirical Bayes

7 January 2011

D. R. Bickel, “Large-scale interval and point estimates from an empirical Bayes extension of confidence posteriors,” Technical Report, Ottawa Institute of Systems Biology, arXiv:1012.6033 (2010). Full preprint

To address multiple comparison problems in high-dimensional biology, this paper introduces shrunken point estimates for feature prioritization and shrunken confidence intervals to indicate the uncertainty of the point estimates. The new point and interval estimates are applied to gene expression data and are found to be conservative by simulation, as expected from limiting cases. Unlike the parametric empirical Bayes estimates, the new estimates are compatible with the semi-parametric approach to local false discovery rate estimation that has been extensively developed and applied over the last decade. This compatibility is achieved by replacing strong parametric assumptions with the confidence posterior theory of papers in press at Biometrics and Communications in Statistics - Theory and Methods.