Archive for the ‘statistical evidence’ Category

Inference after eliminating Bayesian models of insufficient evidence

1 December 2016

A Bayesian approach to informing decision makers

23 September 2016

Frequentist inference principles

2 April 2016
Reid, Nancy; Cox, David R.
On some principles of statistical inference.
Int. Stat. Rev. 83 (2015), no. 2, 293–308.
62A01 (62F05 62F15 62F25)


Reid and Cox bear the standard of a broad Fisherian school of frequentist statistics embracing not only time-tested confidence intervals and p values derived from parametric models, perfected by higher-order asymptotics, but also such developments as false discovery rates and modern versions of the fiducial argument [see S. Nadarajah, S. I. Bityukov and N. V. Krasnikov, Stat. Methodol. 22 (2015), 23–46; MR3261595]. To defend this confederation, they wield inference principles against rival visions of frequentism as well as against Bayesianism.
While agreeing with other frequentists on the necessity of guaranteeing good performance over repeated sampling, Reid and Cox also value neglected rules of inference such as the conditionality principle. Against the steady advance of nonparametric methods, they point to the interpretive power of parametric models. Frequentist decision theory is only mentioned; glimpses of the authors’ perspectives on it appear in [D. R. Cox, Principles of statistical inference, Cambridge Univ. Press, Cambridge, 2006 (8.2); MR2278763 (2007g:62007)] and [N. M. Reid, Statist. Sci. 9 (1994), no. 3, 439–455; MR1325436 (95m:01020)].

On the Bayes front, Reid and Cox highlight the success frequentist methods have enjoyed in scientific applications as a decisive victory over those Bayesian methods that are most consistent with their subjectivist foundations. Indeed, no one can deny what C. Howson and P. Urbach call the “social success” of frequentist methods [Scientific reasoning: the Bayesian approach, third edition, Open Court, Chicago, IL, 2005 (p. 9)]. Reid and Cox do not attribute their widespread use in scientific practice to political factors.

Rather, for scientific inference as opposed to individual decision making, they find frequentist methods more suitable in principle than fully Bayesian methods. For while the need for an agent to reach a decision recognizes no line between models of the phenomena under study and models of an agent’s thought, science requires clear reporting on the basis of the former without introducing biases from the latter. Although subjective considerations admittedly come into play in interpreting reports of statistical analyses, a dependence of the reports themselves on such considerations conflicts with scientific methodology. In short, the Bayesian theories supporting personal inference are irrelevant as far as science is concerned even if they are useful in personal decision making. This viewpoint stops short of that of Philip Stark, who went as far as to call the practicality of that private application of Bayesian inference into question [SIAM/ASA J. Uncertain. Quantif. 3 (2015), no. 1, 586–598; MR3372107].

On reference priors designed to minimize subjective input, Reid and Cox point out that those that perform well with low-dimensional parameters can fail in high dimensions. Eliminating the prior entirely leads to the pure likelihood approach, which, based on the strong likelihood principle, limits the scope even further, to problems with a scalar parameter of interest and no nuisance parameters [A. W. F. Edwards, Likelihood. An account of the statistical concept of likelihood and its application to scientific inference, Cambridge Univ. Press, London, 1972; MR0348869 (50 #1363)]. More recent developments of that approach were explained by R. M. Royall [Statistical evidence, Monogr. Statist. Appl. Probab., 71, Chapman & Hall, London, 1997; MR1629481 (99f:62012)] and C. A. Rohde [Introductory statistical inference with the likelihood function, Springer, Cham, 2014 (Chapter 18); MR3243684].

Reid and Cox see some utility in Bayesian methods that have good performance by frequentist standards, noting that such performance can require the prior to depend on which parameter happens to be of interest and, through model checking, on the data. Such dependence raises the question, “Is this, then, Bayesian? The prior distribution will then not represent prior knowledge of the parameter in [that] case, but an understanding of the model” [T. Schweder and N. L. Hjort, Scand. J. Statist. 29 (2002), no. 2, 309–332; MR1909788 (2003d:62085)].

Reviewed by David R. Bickel

This review first appeared at “On some principles of statistical inference” (Mathematical Reviews) and is used with permission from the American Mathematical Society.

Coherent inference after checking a prior

7 January 2016

Inference after checking the prior & sampling model

1 September 2015

D. R. Bickel, “Inference after checking multiple Bayesian models for data conflict and applications to mitigating the influence of rejected priors,” International Journal of Approximate Reasoning 66, 53–72 (2015). Simple explanation | Published version | 2014 preprint | Slides


The proposed procedure combines Bayesian model checking with robust Bayes acts to guide inference whether or not any of the models is found to be inadequate:

  1. The first stage of the procedure checks each model within a large class of models to determine which models are in conflict with the data and which are adequate for purposes of data analysis.
  2. The second stage of the procedure applies distribution combination or decision rules developed for imprecise probability.

This proposed procedure is illustrated by the application of a class of hierarchical models to a simple data set.
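The two stages above can be sketched numerically. The following is a toy illustration only, not the paper’s actual algorithm: it assumes a hypothetical class of conjugate normal models for the mean of N(mu, 1) data, uses a simple prior-predictive check in place of the paper’s model-checking machinery, and reports the range of posterior means over the surviving models as a stand-in for an imprecise-probability combination rule.

```python
import random

random.seed(0)
data = [random.gauss(1.0, 1.0) for _ in range(50)]

# Hypothetical model class: Normal(mu0, tau^2) priors on the mean mu
# of a N(mu, 1) sampling model, standing in for the paper's
# hierarchical models.
models = [{"mu0": m, "tau": 1.0} for m in (-5.0, 0.0, 1.0, 5.0)]

def prior_predictive_pvalue(model, data, draws=2000):
    """Stage 1 check: compare the observed sample mean with its
    prior-predictive distribution; a small p-value flags a model
    in conflict with the data."""
    n = len(data)
    obs = sum(data) / n
    rng = random.Random(1)
    extreme = 0
    for _ in range(draws):
        mu = rng.gauss(model["mu0"], model["tau"])
        rep = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
        if abs(rep - model["mu0"]) >= abs(obs - model["mu0"]):
            extreme += 1
    return extreme / draws

def posterior_mean(model, data):
    """Conjugate normal-normal posterior mean of mu."""
    n = len(data)
    tau2 = model["tau"] ** 2
    xbar = sum(data) / n
    w = n * tau2 / (n * tau2 + 1.0)
    return w * xbar + (1 - w) * model["mu0"]

# Stage 1: retain only the models judged adequate for the data.
adequate = [m for m in models if prior_predictive_pvalue(m, data) > 0.05]

# Stage 2 (imprecise-probability style): instead of a single estimate,
# report the interval of posterior means over the adequate models.
means = [posterior_mean(m, data) for m in adequate]
print(min(means), max(means))
```

With the simulated data centered near 1, the extreme priors (mu0 = ±5) are rejected at the first stage, and the reported interval of posterior means reflects only the remaining, adequate models.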

The link Simple explanation was added on 6 June 2017.

The likelihood principle as a relation

29 January 2015

Evans, Michael
What does the proof of Birnbaum’s theorem prove? (English summary)
Electron. J. Stat. 7 (2013), 2645–2655.
62A01 (62F99)

According to Birnbaum’s theorem [A. D. Birnbaum, J. Amer. Statist. Assoc. 57 (1962), 269–326; MR0138176 (25 #1623)], compliance with the sufficiency principle and the conditionality principle of statistics would require compliance with the likelihood principle as well. The result appears paradoxical: whereas the first two principles seem reasonable in light of simple examples, the third is routinely violated in statistical practice. Although the theorem has provided ammunition for assaults on frequentist statistics [see, e.g., J. K. Ghosh, M. Delampady and T. K. Samanta, An introduction to Bayesian analysis, Springer Texts Statist., Springer, New York, 2006 (Section 2.4); MR2247439 (2007g:62003)], most Bayesian statisticians do not comply with it at all costs, as attested by current procedures of checking priors and assessing models more generally.
The author formalizes the theorem in terms of set theory to say that the likelihood relation is the equivalence relation generated by the union of the sufficiency relation and the conditionality relation. He finds the result trivial because it relies on extending the conditionality relation, itself intuitively appealing, to the equivalence relation it generates, which conflicts with usual frequentist reasoning and which may even be meaningless for statistical practice. This viewpoint is supported with a counterexample.
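In rough symbols (notation chosen here for illustration; the paper’s exact formalism may differ), let $S$, $C$, and $L$ denote the sufficiency, conditionality, and likelihood relations on the set of inference bases. The set-theoretic reading of the theorem is then:

```latex
% S = sufficiency relation, C = conditionality relation,
% L = likelihood relation; \langle R \rangle denotes the smallest
% equivalence relation containing a relation R.
\[
  L \;=\; \big\langle\, S \cup C \,\big\rangle .
\]
% The theorem thus concerns the equivalence relation *generated by*
% S \cup C, not S \cup C itself; the contested step is the passage
% from C to \langle C \rangle.
```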
While some would regard the irrelevance of the theorem as repelling an attack on frequentist inference, emboldening the advancement of novel methods rooted in fiducial probability [R. Martin and C. Liu, Statist. Sci. 29 (2014), no. 2, 247–251; MR3264537; cf. J. Hannig, Statist. Sci. 29 (2014), no. 2, 254–258; MR3264539; S. Nadarajah, S. Bityukov and N. Krasnikov, Stat. Methodol. 22 (2015), 23–46; MR3261595], the author criticizes the conditionality principle as formalized by the conditionality relation. The problem he sees is that the equivalence relation generated by the conditionality relation and needed for the applicability of the theorem “is essentially equivalent to saying that it doesn’t matter which maximal ancillary we condition on and it is unlikely that this is acceptable to most frequentist statisticians”.
The author concludes by challenging frequentists to resolve the problems arising from the plurality of maximal ancillary statistics in light of the “intuitive appeal” of the conditionality relation. From the perspective of O. E. Barndorff-Nielsen [Scand. J. Statist. 22 (1995), no. 4, 513–522; MR1363227 (96k:62010)], that might be accomplished by developing methods for summarizing and weighing “diverse pieces of evidence”, with some of that diversity stemming from the lack of a unique maximal ancillary statistic for conditional inference.

Reviewed by David R. Bickel


  1. Barndorff-Nielsen, O. E. (1995) Diversity of evidence and Birnbaum’s theorem (with discussion). Scand. J. Statist., 22(4), 513–522. MR1363227 (96k:62010)
  2. Birnbaum, A. (1962) On the foundations of statistical inference (with discussion). J. Amer. Stat. Assoc., 57, 269–326. MR0138176 (25 #1623)
  3. Cox, D. R. and Hinkley, D. V. (1974) Theoretical Statistics. Chapman and Hall. MR0370837 (51 #7060)
  4. Durbin, J. (1970) On Birnbaum’s theorem on the relation between sufficiency, conditionality and likelihood. J. Amer. Stat. Assoc., 65, 395–398.
  5. Evans, M., Fraser, D. A. S. and Monette, G. (1986) On principles and arguments to likelihood (with discussion). Canad. J. of Statistics, 14, 3, 181–199. MR0859631 (87m:62017)
  6. Gandenberger, G. (2012) A new proof of the likelihood principle. To appear in the British Journal for the Philosophy of Science.
  7. Halmos, P. (1960) Naive Set Theory. Van Nostrand Reinhold Co. MR0114756 (22 #5575)
  8. Helland, I. S. (1995) Simple counterexamples against the conditionality principle. Amer. Statist., 49, 4, 351–356. MR1368487 (96h:62003)
  9. Holm, S. (1985) Implication and equivalence among statistical inference rules. In Contributions to Probability and Statistics in Honour of Gunnar Blom. Univ. Lund, Lund, 143–155. MR0795054 (86k:62002)
  10. Jang, G. H. (2011) The conditionality principle implies the sufficiency principle. Working paper.
  11. Kalbfleisch, J. D. (1975) Sufficiency and conditionality. Biometrika, 62, 251–259. MR0386075 (52 #6934)
  12. Mayo, D. (2010) An error in the argument from conditionality and sufficiency to the likelihood principle. In Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo and A. Spanos eds.). Cambridge University Press, Cambridge, 305–314. MR2640508
  13. Robins, J. and Wasserman, L. (2000) Conditioning, likelihood, and coherence: A review of some foundational concepts. J. Amer. Stat. Assoc., 95, 452, 1340–1346. MR1825290

This review first appeared at “What does the proof of Birnbaum’s theorem prove?” (Mathematical Reviews) and is used with permission from the American Mathematical Society.

Model fusion & multiple testing in the likelihood paradigm

11 January 2015

D. R. Bickel, “Model fusion and multiple testing in the likelihood paradigm: Shrinkage and evidence supporting a point null hypothesis,” Working Paper, University of Ottawa, deposited in uO Research at (2014). 2014 preprint | Supplement (link added 10 February 2015)

Errata for Theorem 4:

  1. The weights of evidence should not be conditional.
  2. Some of the equal signs should be “is a member of” signs.

Assessing multiple models

1 June 2014

Integrated likelihood in light of de Finetti

13 January 2014

Coletti, Giulianella; Scozzafava, Romano; Vantaggi, Barbara
Integrated likelihood in a finitely additive setting. (English summary)
Symbolic and quantitative approaches to reasoning with uncertainty, 554–565, Lecture Notes in Comput. Sci., 5590, Springer, Berlin, 2009.
62A01 (62A99)

For an observed sample of data, the likelihood function specifies the probability or probability density of that observation as a function of the parameter value. Since each sample hypothesis corresponds to a single parameter value, the likelihood of any simple hypothesis is an uncontroversial function of the data and the model. However, there is no standard definition of the likelihood of a composite hypothesis, which instead corresponds to multiple parameter values. Such a definition could be useful not only for quantifying the strength of statistical evidence in favor of composite hypotheses that are faced in both science and law, but also for likelihood-based measures of corroboration and of explanatory power for epistemological research involving Popper’s critical rationalism or recent accounts of inference to the best explanation.
Interpreting the likelihood function under the coherence framework of de Finetti, this paper mathematically formulates the problem by defining the likelihood of a simple or composite hypothesis as a subjective probability of the observed data conditional on the truth of the hypothesis. In the probability theory of this framework, conditional probabilities given a hypothesis or event of probability zero are well defined, even for finite parameter sets. That differs from the familiar probability measures that Kolmogorov introduced for frequency-type probabilities, each of which, in the finite case, can only have zero probability mass if its event cannot occur. (The latter but not the former agrees in spirit with Cournot’s principle that an event of infinitesimally small probability is physically impossible.) Thus, in the de Finetti framework, the likelihood function assigns a conditional probability to each simple hypothesis, whether or not its probability is zero.
When the parameter set is finite, every coherent conditional probability of a sample of discrete data given a composite hypothesis is a weighted arithmetic mean of the conditional probabilities of the simple hypotheses that together constitute the composite hypothesis. In other words, the coherence constraint requires that the likelihood of a composite hypothesis be a linear combination of the likelihoods of its constituent simple hypotheses. Important special cases include the maximum and the minimum of the likelihood over the parameter set. They are made possible in the non-Kolmogorov framework by assigning zero probability to all of the simple hypotheses except those of maximum or minimum likelihood.
The main result of the paper extends this result to infinite parameter sets. In general, the likelihood of a composite hypothesis is a mixture of the likelihoods of its component simple hypotheses.
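The convex-combination characterization can be made concrete with a small numerical illustration (the numbers below are invented for the example, not taken from the paper): the coherent likelihood of a composite hypothesis is a weighted arithmetic mean of its simple-hypothesis likelihoods, and concentrating the weights on the best-supported simple hypothesis recovers the maximum likelihood as a special case.

```python
# Invented likelihoods of three simple hypotheses (parameter -> likelihood)
# for some fixed observed sample.
simple_likelihoods = {0.2: 0.05, 0.5: 0.30, 0.8: 0.10}

def composite_likelihood(weights, likelihoods):
    """Coherent likelihood of the composite hypothesis: a convex
    combination (weighted arithmetic mean) of the likelihoods of its
    constituent simple hypotheses."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights form a mixture
    return sum(weights[t] * likelihoods[t] for t in likelihoods)

# Uniform weights give the ordinary average of the simple likelihoods ...
uniform = {t: 1 / 3 for t in simple_likelihoods}
avg = composite_likelihood(uniform, simple_likelihoods)

# ... while putting all weight on the simple hypothesis of maximum
# likelihood (zero probability on the others, as the non-Kolmogorov
# framework permits) recovers the maximum likelihood.
best = max(simple_likelihoods, key=simple_likelihoods.get)
point = {t: 1.0 if t == best else 0.0 for t in simple_likelihoods}
assert composite_likelihood(point, simple_likelihoods) == max(simple_likelihoods.values())
print(avg)
```

The minimum likelihood arises symmetrically, by assigning all weight to the simple hypothesis of minimum likelihood.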

{For the entire collection see MR2907743 (2012j:68012).}

Reviewed by David R. Bickel

This review first appeared at “Integrated likelihood in a finitely additive setting” (Mathematical Reviews) and is used with permission from the American Mathematical Society.

Profile likelihood & MDL for measuring the strength of evidence

8 April 2013