
Archive for the ‘reviews’ Category

Understanding Uncertainty (by Lindley)—a review

1 October 2015

Lindley, Dennis V.
Understanding uncertainty.
Revised edition. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., Hoboken, NJ, 2014. xvi+393 pp. ISBN: 978-1-118-65012-7
62A99 (62C05 62C10)

In Understanding uncertainty, Dennis Lindley ably defends subjective Bayesianism, the thesis that decisions in the presence of uncertainty can only be guaranteed to cohere if they are made according to probabilities as degrees of someone’s beliefs. True to form, he excludes all other mathematical theories for modeling uncertainty, including subjective theories of imprecise probability that share the goal of coherent decision making [see M. C. M. Troffaes and G. de Cooman, Lower previsions, Wiley Ser. Probab. Stat., Wiley, Chichester, 2014; MR3222242].

In order to engage everyone interested in making better decisions in the presence of uncertainty, Lindley writes without the citations and cluttered notation of a research paper. His straightforward, disarming style advances the thesis that subjective probability saves uncertainty from getting lost in the fog of reasoning in natural-language arguments. A particularly convincing argument is that a reader who makes decisions in conflict with the strict Bayesian viewpoint is vulnerable to a Dutch book, a combination of bets with undesirable consequences regardless of the true state of the world (5.7). The axioms needed for the underlying theorem are confidently presented as self-evident.
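As a minimal numerical illustration of the sure-loss argument (my own example, not Lindley's), consider an agent whose betting rates on an event A and on its complement sum to more than one; a bookie who sells the agent a unit-stake ticket on each is guaranteed a profit. A short Python check:

    # Hypothetical incoherent betting rates: they sum to 1.1 rather than 1.
    price_A, price_not_A = 0.4, 0.7
    for A_occurs in (True, False):
        payout = 1.0                              # exactly one ticket pays off either way
        net = payout - (price_A + price_not_A)    # agent's gain after buying both tickets
        print(f"A occurs: {A_occurs}; agent's net gain: {net:+.2f}")
    # Prints -0.10 in both cases: a Dutch book against the incoherent prices.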

Like many strict Bayesians, Lindley makes no appeal to epistemological or psychological literature supporting the alignment of belief and probability. In fact, he dismisses studies indicating that actual human beliefs can deviate markedly from the requirements of strict Bayesianism, likening them to studies indicating that people make errors in arithmetic (2.5; 9.12).

The relentlessly pursued thesis is nuanced by the clarification that strict Bayesianism is not an inviolable recipe for automatic decisions but rather a box of tools that can only be used effectively when controlled by human judgment or “art” in modeling (11.7). For example, when Lindley intuitively finds that the prior distribution under his model conflicts with observations, he reframes its prior probabilities as conditional on the truth of the original model by crafting a larger model. Such ingenuity demonstrates that Bayesian probability calculations cannot shackle his actual beliefs. (This suggests that mechanically following the Dutch book argument to the point of absurdity might not discredit strict Bayesianism as decisively as thought.) Similarly, Frank Lad, called “the purest of the pure” [G. Shafer, J. Am. Stat. Assoc. 94 (1999), no. 446, 645–656 (pp. 648–649), doi:10.1080/01621459.1999.10474158] and the best-informed [D. V. Lindley, J. Royal Stat. Soc. Ser. D 49 (2000), no. 3, 293–337] of the advocates of this school, permits replacing a poorly predicting model with one that reflects “a new understanding”, an enlightenment that no algorithm can impart [F. Lad, Operational subjective statistical methods, Wiley Ser. Probab. Statist. Appl. Probab. Statist., Wiley, New York, 1996 (6.6.4); MR1421323 (98m:62009)]. Leonard Savage, a leading critic of non-Bayesian statistical methods, likewise admitted that he was “unable to formulate criteria for selecting these small worlds [in which strict Bayesianism applies] and indeed believe[d] that their selection may be a matter of judgment and experience about which it is impossible to enunciate complete and sharply defined general principles” [L. J. Savage, The foundations of statistics, Wiley, New York, 1954 (2.5); MR0063582 (16,147a)]. The Bayesian lumberjacks have evidently learned when to stop chopping and sharpen the axe. This recalls the importance of the skill of the scientist as handed down and developed within the guild of scientists and never quite articulated, let alone formalized [M. Polanyi, Personal knowledge: towards a post-critical philosophy, Univ. Chicago Press, Chicago, IL, 1962]. The explicit acknowledgement of the role of this tacit knowledge in science may serve as a warning against relying on statistical models as if they were not only useful but also right [see M. van der Laan, Amstat News 2015, no. 452, 29–30].

While the overall argument for strict Bayesianism will command the assent of many readers, some will wonder whether there are equally compelling counter-arguments that would explain why so few statisticians work under that viewpoint. That doubt will be largely offset by the considerable authority Lindley has earned as one of the preeminent developers of the statistics discipline as it is known today. His many enduring contributions to the field include two that shed light on the chasm between Bayesian and frequentist probabilities: (1) the presentation of what is known as “Lindley’s paradox” [D. V. Lindley, Biometrika 44 (1957), no. 1-2, 187–192, doi:10.1093/biomet/44.1-2.187] and (2) the specification of the conditions a scalar-parameter fiducial or confidence distribution must satisfy to be a Bayesian posterior distribution [D. V. Lindley, J. Royal Stat. Soc. Ser. B 20 (1958), 102–107; MR0095550 (20 #2052)].
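A minimal numerical sketch of the paradox (my own illustration, assuming a standard normal prior for the mean under the alternative and prior probability 1/2 for the point null): a sample mean held exactly at the 5% significance boundary lends the null ever stronger posterior support as the sample size grows.

    from math import sqrt, exp, pi

    def normal_pdf(x, var):
        return exp(-x * x / (2.0 * var)) / sqrt(2.0 * pi * var)

    sigma, tau = 1.0, 1.0                    # sampling sd; assumed prior sd under H1
    for n in (10, 100, 10_000, 1_000_000):
        se = sigma / sqrt(n)
        xbar = 1.96 * se                     # sample mean just significant at the 5% level
        bf01 = normal_pdf(xbar, se**2) / normal_pdf(xbar, tau**2 + se**2)
        post_h0 = bf01 / (1.0 + bf01)        # posterior P(H0 | data) with prior P(H0) = 1/2
        print(n, round(post_h0, 3))          # climbs toward 1 while the p-value stays at 0.05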

Treading into unresolved controversies well outside his discipline, Lindley shares his simple philosophy of science and offers his opinions on how to apply Bayesianism to law, politics, and religion. He invites his readers to share his hope that if people communicated their beliefs and interests in strict Bayesian terms, they would quarrel less (1.7; 10.7), especially if they adopted his additional advice to regard their own religious beliefs as uncertain (1.4). Lindley even treats the teaching that Jesus is the Son of God as having a probability equal to each reader’s degree of belief in its truth but stops short of assessing the utilities needed to place Pascal’s Wager (1.2).

Graduate students in statistics will benefit from Lindley’s introductions to his paradox, explained in Section 14.4 to discredit frequentist hypothesis testing, and the conglomerable rule in Section 12.9. These friendly and concise introductions could effectively supplement a textbook such as [J. B. Kadane, Principles of uncertainty, Texts Statist. Sci. Ser., CRC Press, Boca Raton, FL, 2011; MR2799022 (2012g:62001)], a much more detailed appeal for strict Bayesianism.

On the other hand, simpler works such as [J. S. Hammond, R. L. Keeney and H. Raiffa, Smart choices: a practical guide to making better decisions, Harvard Bus. School Press, Boston, MA, 1999] may better serve as stand-alone guides to mundane decision making. Bridging the logical gap between decision making rules of thumb and mathematical statistics, Understanding uncertainty excels as a straightforward and sensible defense of the strict Bayesian viewpoint. Appreciating Lindley’s stance in all its theoretical simplicity and pragmatic pliability is essential for grasping both the recent history of statistics and the more complex versions of Bayesianism now used by statisticians, scientists, philosophers, and economists.

{For the original edition see [D. V. Lindley, Understanding uncertainty, Wiley, Hoboken, NJ, 2006].}

Reviewed by David R. Bickel

This review first appeared at Understanding Uncertainty (Mathematical Reviews) and is used with permission from the American Mathematical Society.

Fiducial nonparametrics

24 August 2015

Sonderegger, Derek L.; Hannig, Jan
Fiducial theory for free-knot splines. Contemporary developments in statistical theory, 155–189,
Springer Proc. Math. Stat., 68, Springer, Cham, 2014.
62F12 (62F10 62F99 65D07)

This chapter provides both asymptotic and finite-sample properties of a fiducial solution to the problem of free-knot splines of degree four or higher, assuming a known number of knot points. The authors lay a foundation for the solution by proving the asymptotic normality of certain multivariate fiducial estimators. After demonstrating that the fiducial solution meets the sufficient conditions for asymptotic normality, they quantify its small-sample performance by simulation. The authors conclude that fiducial inference provides a promising alternative to Bayesian inference for the free-knot spline problem addressed.
The research reported reflects the recent surge of developments building on Fisher’s fiducial argument [S. Nadarajah, S. Bityukov and N. Krasnikov, Stat. Methodol. 22 (2015), 23–46; MR3261595]. The work of this chapter is carried out within the framework of generalized fiducial inference [J. Hannig, Statist. Sinica 19 (2009), no. 2, 491–544; MR2514173 (2010h:62071)], which is built on the functional-model formulation of fiducial statistics [A. P. Dawid and M. Stone, Ann. Statist. 10 (1982), no. 4, 1054–1074; MR0673643 (83m:62008)] rather than on the broadly equivalent confidence-based tradition beginning with [G. N. Wilkinson, J. Roy. Statist. Soc. Ser. B 39 (1977), no. 2, 119–171; MR0652326 (58 #31491)] and generalized by [E. E. M. van Berkum, H. N. Linssen and D. Overdijk, J. Statist. Plann. Inference 49 (1996), no. 3, 305–317; MR1381161 (97k:62007)].
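As a toy illustration of the fiducial recipe that generalized fiducial inference extends (the simplest location model, not the free-knot spline setting of the chapter; the data and known standard deviation below are hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 2.0                                   # assumed known standard deviation
    x = rng.normal(5.0, sigma, size=50)           # simulated data for illustration
    n = x.size
    # Structural equation for the sufficient statistic: xbar = mu + (sigma / sqrt(n)) * E,
    # with E ~ N(0, 1). Solving for mu and resampling E yields draws from the fiducial
    # distribution of mu, which here is N(xbar, sigma^2 / n).
    E = rng.normal(0.0, 1.0, size=10_000)
    mu_fiducial = x.mean() - sigma / np.sqrt(n) * E
    print(np.percentile(mu_fiducial, [2.5, 97.5]))  # 95% fiducial interval for mu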

{For the entire collection see MR3149911.}

Reviewed by David R. Bickel

This review first appeared at “Fiducial theory for free-knot splines” (Mathematical Reviews) and is used with permission from the American Mathematical Society.

The likelihood principle as a relation

29 January 2015

Evans, Michael
What does the proof of Birnbaum’s theorem prove? (English summary)
Electron. J. Stat. 7 (2013), 2645–2655.
62A01 (62F99)

According to Birnbaum’s theorem [A. Birnbaum, J. Amer. Statist. Assoc. 57 (1962), 269–326; MR0138176 (25 #1623)], compliance with the sufficiency principle and the conditionality principle of statistics would require compliance with the likelihood principle as well. The result appears paradoxical: whereas the first two principles seem reasonable in light of simple examples, the third is routinely violated in statistical practice. Although the theorem has provided ammunition for assaults on frequentist statistics [see, e.g., J. K. Ghosh, M. Delampady and T. K. Samanta, An introduction to Bayesian analysis, Springer Texts Statist., Springer, New York, 2006 (Section 2.4); MR2247439 (2007g:62003)], most Bayesian statisticians do not comply with the likelihood principle at all costs, as attested by current procedures of checking priors and assessing models more generally.
The author formalizes the theorem in terms of set theory to say that the likelihood relation is the equivalence relation generated by the union of the sufficiency relation and the conditionality relation. He finds the result trivial because it relies on extending the conditionality relation, itself intuitively appealing, to the equivalence relation it generates, which conflicts with usual frequentist reasoning and which may even be meaningless for statistical practice. This viewpoint is supported with a counterexample.
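Schematically (my notation, not necessarily the paper's), writing $\equiv_S$, $\equiv_C$, and $\equiv_L$ for the sufficiency, conditionality, and likelihood relations on inference bases, the formalized theorem reads
\[
\equiv_L \;=\; \mathcal{E}\bigl(\equiv_S \cup\, \equiv_C\bigr),
\]
where $\mathcal{E}(R)$ denotes the smallest equivalence relation containing a relation $R$; the author's criticism targets the passage from $\equiv_C$ to the equivalence relation it generates.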
While some would regard the irrelevance of the theorem as repelling an attack on frequentist inference, emboldening the advancement of novel methods rooted in fiducial probability [R. Martin and C. Liu, Statist. Sci. 29 (2014), no. 2, 247–251; MR3264537; cf. J. Hannig, Statist. Sci. 29 (2014), no. 2, 254–258; MR3264539; S. Nadarajah, S. Bityukov and N. Krasnikov, Stat. Methodol. 22 (2015), 23–46; MR3261595], the author criticizes the conditionality principle as formalized by the conditionality relation. The problem he sees is that the equivalence relation generated by the conditionality relation and needed for the applicability of the theorem “is essentially equivalent to saying that it doesn’t matter which maximal ancillary we condition on and it is unlikely that this is acceptable to most frequentist statisticians”.
The author concludes by challenging frequentists to resolve the problems arising from the plurality of maximal ancillary statistics in light of the “intuitive appeal” of the conditionality relation. From the perspective of O. E. Barndorff-Nielsen [Scand. J. Statist. 22 (1995), no. 4, 513–522; MR1363227 (96k:62010)], that might be accomplished by developing methods for summarizing and weighing “diverse pieces of evidence”, with some of that diversity stemming from the lack of a unique maximal ancillary statistic for conditional inference.

Reviewed by David R. Bickel

References

  1. Barndorff-Nielsen, O. E. (1995) Diversity of evidence and Birnbaum’s theorem (with discussion). Scand. J. Statist., 22(4), 513–522. MR1363227 (96k:62010)
  2. Birnbaum, A. (1962) On the foundations of statistical inference (with discussion). J. Amer. Statist. Assoc., 57, 269–332. MR0138176 (25 #1623)
  3. Cox, D. R. and Hinkley, D. V. (1974) Theoretical Statistics. Chapman and Hall. MR0370837 (51 #7060)
  4. Durbin, J. (1970) On Birnbaum’s theorem on the relation between sufficiency, conditionality and likelihood. J. Amer. Statist. Assoc., 65, 395–398.
  5. Evans, M., Fraser, D. A. S. and Monette, G. (1986) On principles and arguments to likelihood (with discussion). Canad. J. Statist., 14(3), 181–199. MR0859631 (87m:62017)
  6. Gandenberger, G. (2012) A new proof of the likelihood principle. To appear in the British Journal for the Philosophy of Science.
  7. Halmos, P. (1960) Naive Set Theory. Van Nostrand Reinhold Co. MR0114756 (22 #5575)
  8. Helland, I. S. (1995) Simple counterexamples against the conditionality principle. Amer. Statist., 49(4), 351–356. MR1368487 (96h:62003)
  9. Holm, S. (1985) Implication and equivalence among statistical inference rules. In Contributions to Probability and Statistics in Honour of Gunnar Blom. Univ. Lund, Lund, 143–155. MR0795054 (86k:62002)
  10. Jang, G. H. (2011) The conditionality principle implies the sufficiency principle. Working paper.
  11. Kalbfleisch, J. D. (1975) Sufficiency and conditionality. Biometrika, 62, 251–259. MR0386075 (52 #6934)
  12. Mayo, D. (2010) An error in the argument from conditionality and sufficiency to the likelihood principle. In Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo and A. Spanos, eds.). Cambridge University Press, Cambridge, 305–314. MR2640508
  13. Robins, J. and Wasserman, L. (2000) Conditioning, likelihood, and coherence: A review of some foundational concepts. J. Amer. Statist. Assoc., 95(452), 1340–1346. MR1825290

This review first appeared at “What does the proof of Birnbaum’s theorem prove?” (Mathematical Reviews) and is used with permission from the American Mathematical Society.

Causality, Probability, and Time (by Kleinberg)—a review

8 August 2014

Kleinberg, Samantha
Causality, probability, and time. Cambridge University Press, Cambridge, 2013. viii+259 pp. ISBN: 978-1-107-02648-3
60A99 (03A05 03B48 62A01 62P99 68T27 91G80 92C20)

This informative and engaging book introduces a novel method of inferring a cause of an event on the basis of the assumption that each cause changes the frequency-type probability of some effect occurring later in time. Unlike most previous approaches to causal inference, the author explicitly models time lags between causes and effects since timing is often crucial to effective prediction and control.
Arguably an equally valuable contribution of the book is its integration of relevant work in philosophy, computer science, and statistics. While the first two disciplines have benefited from the productive interactions exemplified in [J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference, Morgan Kaufmann Ser. Represent. Reason., Morgan Kaufmann, San Mateo, CA, 1988; MR0965765 (90g:68003)] and [J. Williamson, Bayesian nets and causality, Oxford Univ. Press, Oxford, 2005; MR2120947 (2005k:68198)], the statistics community has developed its own theory of causal inference in relative isolation. Rather than following S. L. Morgan and C. Winship [Counterfactuals and causal inference: methods and principles for social research, Cambridge Univ. Press, New York, 2007] and others in bringing that theory into conversation with that of Pearl [op. cit.], the author creatively employs recent developments in statistical inference to identify causes.
For the specific situation in which many putative causes are tested but only a few are true causes, she explains how to estimate the local rate of discovering false causes. In this context, the local false discovery rate (LFDR) corresponding to a putative cause is a posterior probability that it is not a true cause. This is an example of an empirical Bayes method in that the prior distribution is estimated from the data rather than assigned.
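A minimal empirical-Bayes sketch of the idea (a generic two-groups illustration in Python, not the book's algorithm; the simulated z-scores and the simple estimators below are my own assumptions): the null proportion and the mixture density are estimated from the data, and the LFDR of each putative cause is the estimated posterior probability that it is not a true cause.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    z = np.concatenate([rng.normal(0, 1, 900),    # z-scores of 900 non-causes (null)
                        rng.normal(3, 1, 100)])   # z-scores of 100 true causes (non-null)

    # Estimate the null proportion by central matching and the mixture density by a KDE.
    pi0_hat = min(1.0, np.mean(np.abs(z) < 1) / (stats.norm.cdf(1) - stats.norm.cdf(-1)))
    f_hat = stats.gaussian_kde(z)
    lfdr = np.clip(pi0_hat * stats.norm.pdf(z) / f_hat(z), 0.0, 1.0)

    print("estimated null proportion:", round(pi0_hat, 2))
    print("putative causes with LFDR < 0.2:", int(np.sum(lfdr < 0.2)))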
Building on [P. Suppes, A probabilistic theory of causality, North-Holland, Amsterdam, 1970; MR0465774 (57 #5663)], the book emphasizes the importance for prediction not only of whether something is a cause but also of the strength of a cause. A cause is ε-significant if its causal strength, defined in terms of changing the probability of its effect, is at least ε, where ε is some nonnegative number. Otherwise, it is ε-insignificant.
The author poses an important problem and comes close to solving it, i.e., the problem of inferring whether a cause is ε-significant. The solution attempted in Section 4.2 confuses causal significance (ε-significance) with statistical significance (LFDR estimate below some small positive number α). This is by no means a fatal criticism of the approach since it can be remedied in principle by defining a false discovery as a discovery of an ε-insignificant cause. This tests the null hypothesis that the cause is ε-insignificant for a specified value of ε rather than the book’s null hypothesis, which in effect asserts that the cause is ε-insignificant in the limit as ε → 0, i.e., ε-insignificant for all ε > 0. In the case of a specified value of ε, a cause should be considered ε-significant if the estimated LFDR is less than α, provided that the LFDR is defined in terms of the null hypothesis of ε-insignificance. The need to fill in the technical details and to answer more general questions arising from this distinction between causal significance and statistical significance opens up exciting opportunities for further research guided by insights from the literature on seeking substantive significance as well as statistical significance [see, e.g., M. A. van de Wiel and K. I. Kim, Biometrics 63 (2007), no. 3, 806–815; MR2395718].
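In symbols (with $s \ge 0$ denoting a cause's strength; the notation is mine, not the book's), the distinction is between
\[
H_0^{\text{book}}:\ s < \varepsilon\ \text{for every}\ \varepsilon > 0
\qquad\text{and}\qquad
H_0^{(\varepsilon)}:\ s < \varepsilon\ \text{for one fixed}\ \varepsilon > 0,
\]
with the remedy sketched above being to define the LFDR with respect to the second null hypothesis and to declare a cause ε-significant when the resulting LFDR estimate falls below α.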

Reviewed by David R. Bickel

This review first appeared at Causality, Probability, and Time (Mathematical Reviews) and is used with permission from the American Mathematical Society.

Categories: empirical Bayes, reviews

Multivariate mode estimation

1 February 2014

Hsu, Chih-Yuan; Wu, Tiee-Jian
Efficient estimation of the mode of continuous multivariate data. (English summary)
Comput. Statist. Data Anal. 63 (2013), 148–159.
62F10 (62F12)

To estimate the mode of a unimodal multivariate distribution, the authors propose the following algorithm. First, the data are transformed to become approximately multivariate normal by means of a transformation determined by maximum likelihood estimation (MLE) of a transformation parameter jointly with the parameters of the multivariate normal distribution. Second, the inverse of the fitted transformation is applied to the MLE multivariate normal density function, yielding an estimate of the probability density function on the space of the original data. Third, the point at which that density estimate achieves its maximum is taken as the estimate of the multivariate mode. The paper features a theorem establishing the weak consistency of the estimator under lognormality of the data.
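A hedged sketch of the three-step procedure in Python (assuming, for illustration only, a coordinatewise Box-Cox transformation and generic numerical optimizers; the authors' transformation family and estimation details may differ):

    import numpy as np
    from scipy import stats, optimize

    def boxcox(x, lam):
        return (x**lam - 1.0) / lam                # assumes positive data and nonzero lam

    def profile_neg_loglik(lam, x):
        # Profile out (mu, Sigma) as the sample mean and covariance of the transformed data.
        y = boxcox(x, lam)
        mu, cov = y.mean(axis=0), np.cov(y, rowvar=False)
        ll = stats.multivariate_normal.logpdf(y, mu, cov).sum()
        ll += ((lam - 1.0) * np.log(x)).sum()      # Jacobian of the transformation
        return -ll

    def mode_estimate(x):
        d = x.shape[1]
        lam = optimize.minimize(profile_neg_loglik, np.ones(d), args=(x,),
                                method="Nelder-Mead").x          # step 1: MLE of lambda
        y = boxcox(x, lam)
        mu, cov = y.mean(axis=0), np.cov(y, rowvar=False)
        def neg_density(z):                                       # step 2: back-transformed density
            if np.any(z <= 0.0):                                  # density vanishes off the positive orthant
                return np.inf
            return -(stats.multivariate_normal.logpdf(boxcox(z, lam), mu, cov)
                     + ((lam - 1.0) * np.log(z)).sum())
        return optimize.minimize(neg_density, x.mean(axis=0),     # step 3: maximize that density
                                 method="Nelder-Mead").x

    x = np.random.default_rng(1).gamma(4.0, 1.0, size=(500, 2))   # toy data; true joint mode is (3, 3)
    print(mode_estimate(x))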
The authors cite several papers indicating the need for such multivariate mode estimation in applications. They illustrate the practical use of their estimator by applying it to climatology and handwriting data sets.
Simulations indicate a large variety of distributions and dependence structures under which the proposed estimator performs substantially better than its competitors. An exception is the case of contamination with data from a distribution that has a different mode than the mode that is the target of inference.

Reviewed by David R. Bickel

This review first appeared at “Efficient estimation of the mode of continuous multivariate data” (Mathematical Reviews) and is used with permission from the American Mathematical Society.

Categories: reviews

Integrated likelihood in light of de Finetti

13 January 2014

Coletti, Giulianella; Scozzafava, Romano; Vantaggi, Barbara
Integrated likelihood in a finitely additive setting. (English summary) Symbolic and quantitative approaches to reasoning with uncertainty, 554–565, Lecture Notes in Comput. Sci., 5590, Springer, Berlin, 2009.
62A01 (62A99)

For an observed sample of data, the likelihood function specifies the probability or probability density of that observation as a function of the parameter value. Since each simple hypothesis corresponds to a single parameter value, the likelihood of any simple hypothesis is an uncontroversial function of the data and the model. However, there is no standard definition of the likelihood of a composite hypothesis, which instead corresponds to multiple parameter values. Such a definition could be useful not only for quantifying the strength of statistical evidence in favor of composite hypotheses that are faced in both science and law, but also for likelihood-based measures of corroboration and of explanatory power for epistemological research involving Popper’s critical rationalism or recent accounts of inference to the best explanation.
Interpreting the likelihood function under the coherence framework of de Finetti, this paper mathematically formulates the problem by defining the likelihood of a simple or composite hypothesis as a subjective probability of the observed data conditional on the truth of the hypothesis. In the probability theory of this framework, conditional probabilities given a hypothesis or event of probability zero are well defined, even for finite parameter sets. That differs from the familiar probability measures that Kolmogorov introduced for frequency-type probabilities, under which, in the finite case, an event can have zero probability mass only if it cannot occur. (The latter but not the former agrees in spirit with Cournot’s principle that an event of infinitesimally small probability is physically impossible.) Thus, in the de Finetti framework, the likelihood function assigns a conditional probability to each simple hypothesis, whether or not its probability is zero.
When the parameter set is finite, every coherent conditional probability of a sample of discrete data given a composite hypothesis is a weighted arithmetic mean of the conditional probabilities of the simple hypotheses that together constitute the composite hypothesis. In other words, the coherence constraint requires that the likelihood of a composite hypothesis be a linear combination of the likelihoods of its constituent simple hypotheses. Important special cases include the maximum and the minimum of the likelihood over the parameter set. They are made possible in the non-Kolmogorov framework by assigning zero probability to all of the simple hypotheses except those of maximum or minimum likelihood.
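In symbols (a restatement of the finite-parameter result just described; the weight notation $w$ is mine, not the paper's),
\[
P(x \mid H) \;=\; \sum_{\theta \in \Theta_H} w(\theta)\, P(x \mid \theta),
\qquad w(\theta) \ge 0, \qquad \sum_{\theta \in \Theta_H} w(\theta) = 1,
\]
where $\Theta_H$ is the set of parameter values constituting the composite hypothesis $H$; concentrating $w$ on the maximizing or minimizing value of $\theta$ recovers the maximum or minimum likelihood as a special case.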
The main result of the paper extends this characterization to infinite parameter sets. In general, the likelihood of a composite hypothesis is a mixture of the likelihoods of its component simple hypotheses.

{For the entire collection see MR2907743 (2012j:68012).}

Reviewed by David R. Bickel

This review first appeared at “Integrated likelihood in a finitely additive setting” (Mathematical Reviews) and is used with permission from the American Mathematical Society.