Statistics is the study of uncertainty

I began studying statistics with the notion that statistics is the study of information (retrieval), and that part of information is uncertainty, which we take for granted in our random world. Perhaps it is the other way around: information is a part of uncertainty. Could this be the difference between Bayesian and frequentist?

“The statistician’s task is to articulate the scientist’s uncertainties in the language of probability, and then to compute with the numbers found.” This is quoted from The Philosophy of Statistics by Dennis V. Lindley (2000), The Statistician, 49(3), pp. 293-337. The article is a very good read: it has no theorems or proofs, and it does not begin with “Assume that …”.

The author opens the article by proposing that statistics is the study of uncertainty, and I find the rest just as agreeable, as the quotes above and below show.

Because you do not know how to measure the distance to our moon, it does not follow that you do not believe in the existence of a distance to it. Scientists have spent much effort on the accurate determination of length because they were convinced that the concept of distance made sense in terms of krypton light. Similarly, it seems reasonable to attempt the measurement of uncertainty.

significance level – the probability of some aspect of the data, given H is true
probability – your probability of H, given the data
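To make the contrast between these two quantities concrete, here is a minimal sketch of my own (it is not from Lindley's article). It assumes a hypothetical coin-flip experiment with 15 heads in 20 flips, takes H to be "the coin is fair", gives H a prior probability of 0.5, and uses a uniform prior on the heads probability when H is false. All of these choices are illustrative assumptions.

```python
import math

# Hypothetical data: k heads in n flips (illustrative numbers only).
n, k = 20, 15
p_null = 0.5  # H: the coin is fair

def binom_pmf(n, k, p):
    """Probability of exactly k heads in n flips with heads probability p."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

# Significance level: probability of data at least this extreme, given H is true.
p_value = sum(binom_pmf(n, j, p_null) for j in range(k, n + 1))

# Posterior probability of H, given the data.
# Assumptions: prior P(H) = 0.5; under the alternative, the heads
# probability is uniform on [0, 1], so the marginal likelihood of any
# count k is 1 / (n + 1).
prior_h = 0.5
like_h = binom_pmf(n, k, p_null)
like_alt = 1.0 / (n + 1)
posterior_h = prior_h * like_h / (prior_h * like_h + (1.0 - prior_h) * like_alt)

print(f"P(data at least this extreme | H) = {p_value:.4f}")
print(f"P(H | data)                       = {posterior_h:.4f}")
```

With these assumptions the first number is about 0.02 and the second about 0.24. They answer different questions, and the second depends on the prior chosen.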

Many people, especially in scientific matters, think that their statements are objective, expressed through the probability, and are alarmed by the intrusion of subjectivity. Their alarm can be alleviated by considering reality and how that reality is reflected in the probability calculus.

I have often seen the stupid question posed ‘what is an appropriate prior for the variance σ² of a normal (data) density?’ It is stupid because σ is just a Greek letter.

The statistician’s role is to articulate the client’s preferences in the form of a utility function, just as it is to express their uncertainty through probability,

where clients can be replaced with astronomers.

Once we accept that statistics is the study of uncertainty, we had better think about what this uncertainty is. The uncertainty quantification changes depending on how the uncertainty is described, that is, on the probability model. As the author mentioned, statisticians formulate the transcription of the clients’ uncertainty, a task for which I think astronomers should take responsibility. Nevertheless, I have come to feel that astronomers do not care about the subtleties of uncertainty. Generally, the probability model for this uncertainty is built on an independence assumption and at some point is approximated by a Gaussian distribution. Yet this tradition is changing: on arXiv:astro-ph I frequently see astronomers using Bayesian modeling for observed phenomena and accounting for non-Gaussian uncertainty.
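As a rough illustration of why the choice of probability model matters, the sketch below compares an exact tail probability with its Gaussian approximation. The Poisson counting model, the expected count of 3, and the threshold of 8 are all my own hypothetical choices, picked only to show how the two descriptions of uncertainty can disagree when counts are low.

```python
import math

def poisson_tail(mu, k_min):
    """Exact P(N >= k_min) for N ~ Poisson(mu)."""
    below = sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(k_min))
    return 1.0 - below

def gaussian_tail(mu, k_min):
    """P(X >= k_min) for X ~ Normal(mean=mu, variance=mu).

    This is the usual Gaussian approximation to a Poisson count,
    applied here without a continuity correction.
    """
    z = (k_min - mu) / math.sqrt(mu)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical low-count case: 3 counts expected, 8 or more observed.
mu, k_min = 3.0, 8
exact = poisson_tail(mu, k_min)
approx = gaussian_tail(mu, k_min)

print(f"Poisson tail probability:  {exact:.5f}")
print(f"Gaussian approximation:    {approx:.5f}")
print(f"ratio (exact / Gaussian):  {exact / approx:.1f}")
```

In this low-count setting the Gaussian approximation understates the tail probability by a factor of several, so a result that looks highly significant under the Gaussian model is less so under the Poisson model.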

I have heard that efforts to visualize uncertainty are in progress. Before codifying anything, I wish astronomers would take care over the meaning of the uncertainty and the choice of statistics, i.e., how the uncertainty is modeled.

One Comment
  1. Simon Vaughan:

    Thanks for the link to the Lindley article. I found it and the discussion that follows it very stimulating, even though I couldn’t follow all the arguments in detail. This paper also made me appreciate for the first time the difference in usage of the term ‘model’ by statisticians and astronomers (‘clients’). As an astronomer I automatically interpret the word ‘model’ as referring to the physics of the system under study, and specifically the way this affects its electromagnetic emission (since usually we are passive observers of light from cosmic sources). But ‘model’ seems to be used by statisticians to describe the assignments of probability functions needed for statistical analysis and inference. Astronomers usually take the latter for granted, assuming Poisson or Gauss-Normal distributions, often without explicitly stating so. Maybe we can agree on a terminology offensive to neither statisticians nor astronomers that will allow the two meanings to be used without confusion (in astro-statistics papers)? Any suggestions? How about probability-model and physical-model as a first attempt?

    04-05-2008, 8:14 am