#### On the history and use of some standard statistical models

What if R. A. Fisher had been hired by the Royal Observatory even though his interests were biology and agriculture, or W. S. Gosset[1] had been hired there instead of by a brewery? An article by E.L. Lehmann made me ponder this what-if. Had that happened, astronomers might handle errors better than they do now.

Every statistician, at least to my knowledge, knows E.L. Lehmann (his Theory of Point Estimation (TPE) and Testing Statistical Hypotheses (TSH) are classics, and Elements of Large Sample Theory was my textbook). Instead of reading astro-ph daily, I’m going through some collected papers from arxiv:stat and other references, both to recover my hopefully still existing research-oriented mind (I like to share my stat or astrostat papers with you) and to continue slogging. The first one is an arxiv:math.ST paper by Lehmann.

His deep knowledge and historical account of statistical models related to errors may benefit astronomers. Although I haven’t studied the history of astronomy and statistics, I’m very much aware of how astronomy spurred innovation in statistical thinking, particularly in the area of large sample theory. Unfortunately, at the dawn of the 20th century, the two fields went through an unwanted divorce. Papers from my [arXiv] series, or the small portion of statistics papers citing astronomy, seem to pay high alimony without tax relief.

[math.ST:0805.2838] E.L. Lehmann
On the history and use of some standard statistical models

According to the author, the paper considers three assumptions: normality, independence, and the linear structure of the deterministic part. The particular reason for bringing this paper into the slog is the following passage:

The normal distribution became the acknowledged model for the distribution of errors of physical (particularly astronomical) measurements and was called the Law of Errors. It has a theoretical basis in the so-called Law of Elementary Errors which assumed that an observational error is the sum of a large number of small independent errors and is therefore approximately normally distributed by the Central Limit Theorem.
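The Law of Elementary Errors in the passage above is easy to check numerically. What follows is only a minimal sketch, not anything taken from Lehmann's paper: the uniform shape of the elementary errors, their number (50), and the function names are my own assumptions, chosen for illustration. It sums many small, deliberately non-normal errors and compares how often the total falls within 1, 2, and 3 standard deviations with what a normal distribution predicts.

```python
import random
import statistics


def simulate_total_errors(n_components=50, n_measurements=20000,
                          half_width=0.1, seed=0):
    """Simulate observational errors as sums of small independent errors.

    Each elementary error is uniform on [-half_width, half_width]; this
    shape is an assumption made for illustration, picked because it is
    clearly not normal.
    """
    rng = random.Random(seed)
    return [
        sum(rng.uniform(-half_width, half_width) for _ in range(n_components))
        for _ in range(n_measurements)
    ]


def fraction_within(errors, k):
    """Fraction of errors within k sample standard deviations of the mean."""
    mu = statistics.fmean(errors)
    sd = statistics.pstdev(errors)
    return sum(abs(e - mu) <= k * sd for e in errors) / len(errors)


if __name__ == "__main__":
    errors = simulate_total_errors()
    normal = statistics.NormalDist()
    for k in (1, 2, 3):
        expected = normal.cdf(k) - normal.cdf(-k)
        simulated = fraction_within(errors, k)
        print(f"within {k} sd: simulated {simulated:.3f}, normal {expected:.3f}")
```

The agreement is only as good as the assumptions behind it: the elementary errors here are independent, numerous, and of comparable size. Break any of those and the normal approximation is no longer guaranteed.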

There is a lot to be said, but I will just add a passage from the paper that refers to Freedman:

“one problem noticeable to a statistician is that investigators do not pay attention to the stochastic assumptions behind the models. It does not seem possible to derive these assumptions from current theory, nor are they easily validated empirically on a case-by-case basis.”
The paper ends with the devastating conclusion:
“My opinion is that investigators need to think more about the underlying process, and look more closely at the data, without the distorting prism of conventional (and largely irrelevant) stochastic models. Estimating nonexistent parameters cannot be very fruitful. And it must be equally a waste of time to test theories on the basis of statistical hypotheses that are rooted neither in prior theory nor in fact, even if the algorithms are recited in every statistics text without caveat.”

It is truly devastating.

A quote in the article from the preface of Snedecor’s book makes the importance of collaboration clear.

“To the mathematical statistician must be delegated the task of developing the theory and devising the methods, accompanying these latter by adequate statements of the limitations of their use. …
None but the biologist can decide whether the conditions are fulfilled in his experiments.”

So do two sentences from the paper’s conclusion:

A general text cannot provide the subject matter knowledge and the special features that are needed for successful modeling in specific cases. Experience with similar data is required, knowledge of theory and, as Freedman points out: shoe leather.

Other quotes in the article come from Scheffé,

“the effect of violation of the normality assumption is slight on inferences about the mean but dangerous on inferences about variances.”

and Brownlee,

“applied statisticians have found empirically that usually there is no great need to fuss about the normality assumption”

and I confess that I’ve been fussing about astronomers’ gaussianity assumption; by contrast, I advise my friends in other disciplines (for example, agriculture) to treat their data with simpler analytic tools by assuming normality. To defend myself, I’d like to ask whether the independence assumption can be overlooked for the convenience of multiplying marginal probabilities. I don’t think this concern has been addressed nearly as well as the normality assumption.
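To make the independence worry concrete, here is a rough sketch under assumptions of my own (none of it comes from Lehmann's paper): each of n measurements has a unit-variance Gaussian error, and all of them share a common component so that every pair has correlation rho. The names `correlated_sample` and `variance_of_mean`, and the values n = 25 and rho = 0.2, are hypothetical choices for illustration. Treating the errors as independent predicts a variance of 1/n for the sample mean, while this equicorrelated setup gives rho + (1 - rho)/n.

```python
import math
import random
import statistics


def correlated_sample(n, rho, rng):
    """Draw n unit-variance Gaussian errors with pairwise correlation rho.

    The correlation comes from one shared component (a hypothetical
    common systematic) added to independent noise.
    """
    shared = rng.gauss(0.0, 1.0)
    return [
        math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        for _ in range(n)
    ]


def variance_of_mean(n, rho, n_trials=20000, seed=0):
    """Empirical variance of the sample mean over repeated experiments."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(correlated_sample(n, rho, rng))
        for _ in range(n_trials)
    ]
    return statistics.pvariance(means)


if __name__ == "__main__":
    n, rho = 25, 0.2
    naive = 1.0 / n                 # prediction if errors were independent
    exact = rho + (1.0 - rho) / n   # equicorrelated errors
    simulated = variance_of_mean(n, rho)
    print(f"naive (independent) variance of mean: {naive:.4f}")
    print(f"equicorrelated theory:                {exact:.4f}")
    print(f"simulated:                            {simulated:.4f}")
```

Even a modest correlation makes the independence-based error bar on the mean far too small in this toy setup, and taking more measurements does not help, because the rho term does not shrink with n.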

1. Gosset’s pen name was Student, from which the names Student’s t-distribution and Student’s t-test derive.