## The generalized fiducial distribution: A kinder, more objective posterior?

**MR3561954**

Generalized fiducial inference: a review and new results. (English summary)

*J. Amer. Statist. Assoc.* 111 (2016), no. 515, 1346–1361.

62A01 (62F99 62G05 62J05)

Other approaches to fiducial inference bring subjectivity more to the forefront. For example, G. N. Wilkinson highlighted the incoherence of fiducial distributions formulated in a more Fisherian flavor [J. Roy. Statist. Soc. Ser. B 39 (1977), no. 2, 119–171; MR0652326]. More recently, R. J. Bowater [AStA Adv. Stat. Anal. 101 (2017), no. 2, 177–197] endorsed an explicitly subjective interpretation of fiducial probability. For the place of generalized fiducial inference in the context of other fiducial approaches, see [D. L. Sonderegger and J. Hannig, in Contemporary developments in statistical theory, 155–189, Springer Proc. Math. Stat., 68, Springer, Cham, 2014; MR3149921] and the papers it cites.

The new results include the following:

- A weak-limit definition of a generalized fiducial distribution.
- Sufficient conditions for a generalized fiducial distribution to have asymptotic frequentist coverage.
- Novel formulas for computing a generalized fiducial distribution and a fiducial probability of a model.

The fiducial probability of a model is applicable to both model selection and model averaging. A seemingly different fiducial method of averaging statistical models was independently proposed by D. R. Bickel [“A note on fiducial model averaging as an alternative to checking Bayesian and frequentist models”, preprint, Fac. Sci. Math. Stat., Univ. Ottawa, 2015].

Reviewed by David R. Bickel

## Against ideological philosophies of probability

Burdzy, Krzysztof

Resonance—from probability to epistemology and back. *Imperial College Press, London,* 2016. xx+408 pp. ISBN: 978-1-78326-920-4

60A05 (00A30 03A10 62A01)

Burdzy defines probability in terms of six “laws of probability”, intended as an accurate description of how probability is used in science (pp. 8–9, 217). Unlike the axiomatic systems from Kolmogorov onward that are distinct from their potential applications [see A. Rényi, Rev. Inst. Internat. Statist. 33 (1965), 1–14; MR0181483], the laws require that mathematical probability by definition agree with features of objective events. Potentially subject to scientific or philosophical refutation (pp. 258–259), the laws are analogous to Maxwell’s equations (p. 222). The testable claim is that they accurately describe science’s use of epistemic probabilities as well as physical probabilities (pp. 259–261).

Laws 3, 4, and 6 are especially physical. Burdzy argues that probability theory could not be applied if symmetries such as physical independence (Law 3) could not be recognized and tentatively accepted by resonance (Section 11.4). Such symmetries do not include the law of the iterated logarithm or many other properties of Martin-Löf sequences, which he finds “totally useless from the practical point of view” (Section 4.14). Law 4, the requirement that assigning equal probabilities should be based on known physical symmetries rather than on ignorance (Section 11.25), echoes R. Chuaqui Kettlun’s Truth, possibility and probability [North-Holland Math. Stud., 166, North-Holland, Amsterdam, 1991 (Sections III.2 and XX.3); MR1159708]. Law 6 states: “An event has probability 0 if and only if it cannot occur. An event has probability 1 if and only if it must occur” (p. 217). It needs some qualification or further explanation, since it does not apply directly to continuous random variables: each individual value of such a variable has probability 0, yet some value must occur.

There is some dissonance in applications to statistics. On the frequentist side, a confidence interval with a high level of confidence should be used to predict that the parameter value lies within the observed confidence interval (Section 11.11, as explained on pp. 292 and 294). Even though that prescription generalizes the prediction that the parameter values corresponding to rejected null hypotheses differ from the true parameter value, Burdzy expresses doubt about how to formalize hypothesis testing in terms of prediction (Section 13.4). His predictive-testing idea may be seen as an application of Cournot’s principle (pp. 22, 278; see [M. R. Fréchet, Les mathématiques et le concret, Presses Univ. France, Paris, 1955 (pp. 201–202, 209–213, 216–217, 221); MR0075110]). On the Bayesian side, Burdzy concedes that priors based on resonance often work well and yet judges them too susceptible to prejudice for scientific use (Section 14.4.3). By ridiculing subjective Bayesian theory as if it legitimized assigning probabilities at will (Section 7.1), Burdzy calls attention to its failure to specify all criteria for rational probability assignment.

Burdzy adds color to the text with random references to religion from the perspective of an atheistic probabilist who left Catholicism (p. 178). Here are some representative examples. First, in contrast to attempts to demonstrate that an objective probability of God’s existence is low [R. Dawkins, The God delusion, Bantam Press, 2006] or high [R. Swinburne, The resurrection of God incarnate, Clarendon Press, Oxford, 2003], he denies the feasibility of computing such a probability (Section 16.7). Second, Burdzy is convinced that religions, like communism, philosophical theories of probability, and other secular ideologies, have inconsistencies to the point of hypocrisy, and he insists that his “‘resonance’ theory” (p. 13) is not an ideology (Chapter 15), much as D. V. Lindley denied that his Bayesianism is a religion [Understanding uncertainty, revised edition, Wiley Ser. Probab. Stat., Wiley, Hoboken, NJ, 2014 (pp. 380–381); MR3236718]. Lastly, Burdzy attributes the infinite consequences underlying Pascal’s Wager to efforts to deceive and manipulate (Section 16.2.2). However, documenting the historical origins of teachings of eternal bliss and eternal retribution on the basis of primitive Christian and pre-Christian sources lies far beyond the scope of the book.

Under the resonance banner, this probabilist rushes in with a unique barrage of controversial and well-articulated philosophical claims with implications for science and beyond. Those resisting will find themselves challenged to counter with alternative solutions to the problems raised.

Reviewed by David R. Bickel

## Entropies of a posterior of the success probability

Kelbert, M.; Mozgunov, P.

Asymptotic behaviour of the weighted Renyi, Tsallis and Fisher entropies in a Bayesian problem. (English summary)

*Eurasian Math. J.* 6 (2015), no. 2, 6–17.

94A17 (62B10 62C10)

This paper considers a weighted version of the differential entropy of the posterior distribution of the probability of success conditional on the observed value of a binomial random variable. The uniform (0, 1) prior distribution of the success probability is used to derive large-sample results.

The weighting function allows emphasizing some values of the parameter more than other values. For example, since the success probability value of 1/2 has special importance in many applications, that parameter value may be assigned a higher weight than the others. This differs from the more common Bayesian approach of assigning more prior probability to certain parameter values.

The authors prove asymptotic properties not only of the weighted differential entropy but also of weighted differential versions of the Rényi, Tsallis, and Fisher definitions of entropy or information. The results are concrete in that they are derived specifically for the posterior distribution of the success probability under the uniform prior.
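
To make the object of study concrete, here is a minimal numerical sketch, not the authors' code. It assumes the usual definition of weighted differential entropy, −∫ φ(p) f(p) log f(p) dp for a weight function φ, and uses the standard fact that under the uniform prior the posterior given x successes in n trials is Beta(x + 1, n − x + 1). The function names, the weight `emphasize_half` favoring p = 1/2, and the plain midpoint-rule integration are hypothetical illustrative choices. The normal approximation used for comparison is the familiar large-sample entropy of a posterior with variance p̂(1 − p̂)/n, not a result quoted from the paper.

```python
import math
from typing import Callable


def log_posterior_density(p: float, x: int, n: int) -> float:
    """Log density at p of the Beta(x + 1, n - x + 1) posterior (uniform prior, x successes in n trials)."""
    a, b = x + 1, n - x + 1
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(p) + (b - 1) * math.log1p(-p) - log_beta


def weighted_entropy(x: int, n: int,
                     weight: Callable[[float], float] = lambda p: 1.0,
                     grid: int = 20_000) -> float:
    """Midpoint-rule approximation of -integral of weight(p) * f(p) * log f(p) over (0, 1).

    With the default weight of 1 this is the ordinary differential entropy.
    """
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        p = (i + 0.5) * h  # midpoints never touch the endpoints 0 and 1
        log_f = log_posterior_density(p, x, n)
        total -= weight(p) * math.exp(log_f) * log_f * h
    return total


def normal_approx_entropy(x: int, n: int) -> float:
    """Large-sample approximation 0.5 * log(2*pi*e * p_hat * (1 - p_hat) / n); requires 0 < x < n."""
    p_hat = x / n
    return 0.5 * math.log(2 * math.pi * math.e * p_hat * (1 - p_hat) / n)


def emphasize_half(p: float) -> float:
    """Hypothetical weight function giving roughly double emphasis near p = 1/2."""
    return 1.0 + math.exp(-((p - 0.5) / 0.1) ** 2)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        x = n // 2
        print(n,
              round(weighted_entropy(x, n), 4),                   # unweighted
              round(weighted_entropy(x, n, emphasize_half), 4),   # weighted
              round(normal_approx_entropy(x, n), 4))              # large-sample approximation
```

With the default weight the integral reduces to the ordinary differential entropy, which decreases without bound as the posterior concentrates. Swapping in a different weight changes only which parameter values dominate the integral, mirroring the review's point that the weight, rather than the prior, is where special values such as 1/2 are emphasized.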

Reviewed by David R. Bickel

## Entropy sightings

Entropy and its many avatars. (English summary)

*J. Math. Soc. Japan* 67 (2015), no. 4, 1845–1857.

94A17 (37A35 60-02 60K35 82B05)

The author, a chief architect of the theory of large deviations, chronicles several manifestations of entropy, which appears in the realms indicated by these section headings:

- Entropy and information theory
- Entropy and dynamical systems
- Relative entropy and large deviations
- Entropy and duality
- Log Sobolev inequality
- Gibbs states
- Interacting particle systems

The topics are connected whenever a concept introduced in one section is treated in more depth in a later section. In this way, relative entropy is seen to play a key role in large deviations, Gibbs states, and systems of interacting particles.

Less explicit connections are left to the reader’s enjoyment and education. For example, the relation between Boltzmann entropy and Shannon entropy in the information theory section is a special case both of Sanov’s theorem, presented in the section on large deviations, and of the relation of free energy and relative entropy, in the section on Gibbs states.
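
The Boltzmann–Shannon relation and its Sanov generalization mentioned above can be checked numerically in a few lines. The number W of length-n sequences over k symbols with empirical frequencies q is a multinomial coefficient, and (1/n) log W tends to the Shannon entropy H(q) (Boltzmann's entropy is a constant times log W). Under i.i.d. sampling from p, the probability of observing exactly the type q satisfies (1/n) log P → −D(q‖p), the exponent in Sanov's theorem. Taking p uniform gives log P = log W − n log k, so −D(q‖u) = H(q) − log k, which recovers the Boltzmann–Shannon relation. The sketch below is an illustration under these textbook facts, not code from the paper; the function names and the particular q and p are hypothetical.

```python
import math
from typing import List, Sequence


def shannon_entropy(q: Sequence[float]) -> float:
    """H(q) = -sum q_i log q_i, in nats."""
    return -sum(qi * math.log(qi) for qi in q if qi > 0)


def relative_entropy(q: Sequence[float], p: Sequence[float]) -> float:
    """D(q || p) = sum q_i log(q_i / p_i), in nats."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def log_multinomial(counts: Sequence[int]) -> float:
    """log W, where W = n! / (c_1! ... c_k!) counts the sequences with these symbol counts."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)


def log_type_probability(counts: Sequence[int], p: Sequence[float]) -> float:
    """log P(observed counts equal `counts`) for n i.i.d. draws from p (multinomial pmf)."""
    return log_multinomial(counts) + sum(c * math.log(pi) for c, pi in zip(counts, p) if c > 0)


def counts_for(q: Sequence[float], n: int) -> List[int]:
    """Symbol counts of type q at sample size n (assumes each n * q_i is an integer)."""
    return [round(n * qi) for qi in q]


if __name__ == "__main__":
    q, p = (0.5, 0.3, 0.2), (0.25, 0.25, 0.5)  # hypothetical type and sampling distribution
    for n in (100, 1_000, 10_000):
        c = counts_for(q, n)
        print(n,
              log_multinomial(c) / n - shannon_entropy(q),              # Boltzmann vs Shannon gap
              log_type_probability(c, p) / n + relative_entropy(q, p))  # Sanov exponent gap
```

Both gaps shrink roughly like (log n)/n, the polynomial prefactor that large-deviation statements ignore on the exponential scale.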

The paper ends with a tribute to Professor Kiyosi Itô.

Reviewed by David R. Bickel

## Frequentist inference principles

On some principles of statistical inference.

*Int. Stat. Rev.* 83 (2015), no. 2, 293–308.

62A01 (62F05 62F15 62F25)

While agreeing with other frequentists on the necessity of guaranteeing good performance over repeated sampling, Reid and Cox also value neglected rules of inference such as the conditionality principle. Against the steady advance of nonparametric methods, Reid and Cox point to the interpretive power of parametric models.

Frequentist decision theory is mentioned only in passing. Glimpses of the authors’ perspectives on it appear in [D. R. Cox, Principles of statistical inference, Cambridge Univ. Press, Cambridge, 2006 (Section 8.2); MR2278763 (2007g:62007)] and [N. M. Reid, Statist. Sci. 9 (1994), no. 3, 439–455; MR1325436 (95m:01020)].

On the Bayes front, Reid and Cox highlight the success frequentist methods have enjoyed in scientific applications as a decisive victory over those Bayesian methods that are most consistent with their subjectivist foundations. Indeed, no one can deny what C. Howson and P. Urbach call the “social success” of frequentist methods [Scientific reasoning: the Bayesian approach, third edition, Open Court, Chicago, IL, 2005 (p. 9)]. Reid and Cox do not attribute the widespread use of frequentist methods in scientific practice to political factors.

Rather, for scientific inference as opposed to individual decision making, they find frequentist methods more suitable in principle than fully Bayesian methods. For while the need for an agent to reach a decision recognizes no line between models of the phenomena under study and models of an agent’s thought, science requires clear reporting on the basis of the former without introducing biases from the latter. Although subjective considerations admittedly come into play in interpreting reports of statistical analyses, a dependence of the reports themselves on such considerations conflicts with scientific methodology. In short, the Bayesian theories supporting personal inference are irrelevant as far as science is concerned, even if they are useful in personal decision making. This viewpoint stops short of that of Philip Stark, who went so far as to question the practicality of even that private application of Bayesian inference [SIAM/ASA J. Uncertain. Quantif. 3 (2015), no. 1, 586–598; MR3372107].

On reference priors designed to minimize subjective input, Reid and Cox point out that those that perform well with low-dimensional parameters can fail in high dimensions. Eliminating the prior entirely leads to the pure likelihood approach, which, based on the strong likelihood principle, limits the scope even further, to problems with a scalar parameter of interest and no nuisance parameters [A. W. F. Edwards, Likelihood. An account of the statistical concept of likelihood and its application to scientific inference, Cambridge Univ. Press, London, 1972; MR0348869 (50 #1363)]. More recent developments of that approach were explained by R. M. Royall [Statistical evidence, Monogr. Statist. Appl. Probab., 71, Chapman & Hall, London, 1997; MR1629481 (99f:62012)] and C. A. Rohde [Introductory statistical inference with the likelihood function, Springer, Cham, 2014 (Chapter 18); MR3243684].

Reid and Cox see some utility in Bayesian methods that have good performance by frequentist standards, noting that such performance can require the prior to depend on which parameter happens to be of interest and, through model checking, on the data. Such dependence raises the question, “Is this, then, Bayesian? The prior distribution will then not represent prior knowledge of the parameter in [that] case, but an understanding of the model” [T. Schweder and N. L. Hjort, Scand. J. Statist. 29 (2002), no. 2, 309–332; MR1909788 (2003d:62085)].

Reviewed by David R. Bickel

This review first appeared at “On some principles of statistical inference” (Mathematical Reviews) and is used with permission from the American Mathematical Society.

## Meaningful constraints and meaningless priors

Constraints versus priors.

*SIAM/ASA J. Uncertain. Quantif.* 3 (2015), no. 1, 586–598.

62A01 (62C10 62C20 62G15)

In this lucid expository paper, Stark advances several arguments for using frequentist methods instead of Bayesian methods in statistical inference and decision problems. The main examples involve restricted-parameter problems, in which the parameter of interest is constrained to lie in an unusually restrictive set. In such problems, frequentist methods can lead to solutions markedly different from those of Bayesian methods, because even a default prior distribution intended to be weakly informative actually carries substantial information.
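
A standard illustration of how a “flat” default prior on a constrained set can be highly informative, offered here as a sketch rather than a reproduction of the paper's examples: if θ is uniform on the unit ball in R^d, then P(‖θ‖ ≤ r) = r^d. In high dimension, nearly all prior mass therefore sits in a thin shell near the boundary, although the prior looks noncommittal. The Python below computes that closed form and checks it by Monte Carlo. The function names, the radius 0.9, and the dimensions are hypothetical choices.

```python
import math
import random
from typing import List


def prior_mass_inside(r: float, d: int) -> float:
    """P(||theta|| <= r) for theta uniform on the unit ball in R^d: the volume ratio r**d."""
    return r ** d


def sample_uniform_ball(d: int, rng: random.Random) -> List[float]:
    """One uniform draw from the unit ball: Gaussian direction scaled by radius U**(1/d)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in z))
    radius = rng.random() ** (1.0 / d)
    return [radius * v / norm for v in z]


def monte_carlo_mass_inside(r: float, d: int, draws: int = 10_000, seed: int = 0) -> float:
    """Fraction of prior draws whose Euclidean norm is at most r."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        theta = sample_uniform_ball(d, rng)
        if math.sqrt(sum(v * v for v in theta)) <= r:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    for d in (1, 10, 100):
        print(d, prior_mass_inside(0.9, d), monte_carlo_mass_inside(0.9, d))
```

At d = 100 the prior puts well under 0.01% of its mass within radius 0.9. A “uniform” prior thus asserts, quite strongly, that the parameter lies near the edge of the constraint set, which the constraint alone does not say.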

Stark calls routine Bayesian practice into question since priors are not selected according to the analyst’s beliefs but rather for reasons that have no apparent support from the Dutch book argument, the featured rationale for Bayesianism as a rational norm (pp. 589–590; [see D. V. Lindley, *Understanding uncertainty*, revised edition, Wiley Ser. Probab. Stat., Wiley, Hoboken, NJ, 2014; MR3236718]). Uses of the prior beyond the scope of the paper include those encoding (1) empirical Bayes estimates of parameter variability [e.g., B. Efron, *Large-scale inference*, Inst. Math. Stat. Monogr., 1, Cambridge Univ. Press, Cambridge, 2010; MR2724758 (2012a:62006)], (2) the beliefs of subject-matter experts [e.g., A. O’Hagan et al., *Uncertain judgements: eliciting experts’ probabilities*, Wiley, West Sussex, 2006, doi:10.1002/0470033312], or (3) the beliefs of archetypical agents of wide scientific interest [e.g., D. J. Spiegelhalter, K. R. Abrams and J. P. Myles, *Bayesian approaches to clinical trials and health-care evaluation*, Wiley, West Sussex, 2004 (Section 5.5), doi:10.1002/0470092602].

Stark finds Bayesianism to lack not only normative force but also descriptive power. He stresses that he does not know anyone who updates personal beliefs according to Bayes’s theorem in everyday life (pp. 588, 590).

In the conclusions section, Stark asks, “Which is the more interesting question: what would happen if Nature generated a new value of the parameter and the data happened to remain the same, or what would happen for the same value of the parameter if the measurement were repeated?” For the Bayesian who sees parameter distributions more in terms of beliefs than random events, the missing question is, “What should one believe about the value of a parameter given what happened and the information encoded in the prior and other model specifications?” That question would interest Stark only to the extent that the prior encodes meaningful information (p. 589).
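
The two questions can be contrasted numerically with a bounded normal mean, a standard restricted-parameter example assumed here for illustration; the setup is not claimed to be taken from the paper. Suppose X ~ N(θ, 1), θ is known to lie in [−τ, τ], and the prior is uniform on that interval. The posterior is then N(x, 1) truncated to [−τ, τ]. Its equal-tailed 95% credible interval covers θ with probability 0.95 when θ is redrawn from the prior each time, which answers the first question. Holding θ fixed and repeating the measurement answers the second question. At θ = τ the coverage is zero, because the upper endpoint, a quantile of a continuous posterior on [−τ, τ], always lies strictly below τ. The sketch uses only the Python standard library; τ = 1, the simulation sizes, and the function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def credible_interval(x: float, tau: float, level: float = 0.95):
    """Equal-tailed credible interval for theta: N(x, 1) posterior truncated to [-tau, tau]."""
    lo = STD_NORMAL.cdf(-tau - x)  # Phi(t - x) evaluated at the lower bound t = -tau
    hi = STD_NORMAL.cdf(tau - x)   # Phi(t - x) evaluated at the upper bound t = tau
    tail = (1.0 - level) / 2.0

    def quantile(q: float) -> float:
        # Solve (Phi(t - x) - lo) / (hi - lo) = q for t.
        return x + STD_NORMAL.inv_cdf(lo + q * (hi - lo))

    return quantile(tail), quantile(1.0 - tail)


def coverage_at(theta: float, tau: float, draws: int = 20_000, seed: int = 1) -> float:
    """Frequentist question: theta held fixed, measurement X ~ N(theta, 1) repeated."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        lower, upper = credible_interval(rng.gauss(theta, 1.0), tau)
        if lower <= theta <= upper:
            hits += 1
    return hits / draws


def coverage_under_prior(tau: float, draws: int = 20_000, seed: int = 2) -> float:
    """Bayesian calibration: a new theta is drawn from the uniform prior for each measurement."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        theta = rng.uniform(-tau, tau)
        lower, upper = credible_interval(rng.gauss(theta, 1.0), tau)
        if lower <= theta <= upper:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    for theta in (0.0, 0.5, 1.0):
        print("theta fixed at", theta, coverage_at(theta, tau=1.0))
    print("theta redrawn from prior", coverage_under_prior(tau=1.0))
```

Under these assumptions the prior-averaged coverage sits near 0.95 while the fixed-θ coverage at the boundary is zero. This is one concrete sense in which the answer to “what if Nature generated a new parameter value” can differ sharply from the answer to “what if the measurement were repeated.”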

Reviewed by David R. Bickel

This review first appeared at “Constraints versus priors” (Mathematical Reviews) and is used with permission from the American Mathematical Society.