#### [MADS] plug-in estimator

I asked a couple of astronomers whether they had heard the term **plug-in estimator**, and none of them gave me a positive answer.

Before using any adaptation of the chi-square statistic, please spend a minute or two pondering whether your strategy with chi-square falls into one of these categories.

**1**. Lack of independence among the single events or measures

**2**. Small theoretical frequencies

**3**. Neglect of frequencies of non-occurrence

**4**. Failure to equalize $\sum O_i$ (the sum of the observed frequencies) and $\sum M_i$ (the sum of the theoretical frequencies)

**5**. Indeterminate theoretical frequencies

**6**. Incorrect or questionable categorizing

**7**. Use of non-frequency data

**8**. Incorrect determination of the number of degrees of freedom

**9**. Incorrect computations (including a failure to weight by N when proportions instead of frequencies are used in the calculations)

From "**Chapter 10: On the Use and Misuse of Chi-square**" by K. L. Delucchi in *A Handbook for Data Analysis in the Behavioral Sciences* (1993). Delucchi credited these nine principal sources of error to Lewis and Burke (1949), "The Use and Misuse of the Chi-square," published in *Psychological Bulletin*.

I couldn't believe my eyes when I saw 4754 degrees of freedom (d.f.) and a chi-square test statistic of 4859. I have often enough seen large degrees of freedom in astronomy journals, from several hundred to a few thousand, but I never felt comfortable with such big numbers. Then, to my great shock, 4754 d.f. appeared. I must find out why these huge degrees of freedom bother me so much.
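To put those two numbers side by side, here is a minimal sketch of what they imply if one takes the textbook assumptions at face value (independent Gaussian errors, a correct model, and a correctly counted d.f.). The function name `chi2_summary` is hypothetical, and the normal approximation to the chi-square distribution is my own shortcut, reasonable only because the d.f. is so large.

```python
import math

def chi2_summary(chi2, dof):
    """Summarize a chi-square value against its nominal reference distribution.

    Assumes the statistic really follows a chi-square distribution with `dof`
    degrees of freedom, which has mean dof and variance 2*dof. For large dof
    that distribution is close to normal, so a z-score is a fair shortcut.
    """
    reduced = chi2 / dof                            # "reduced chi-square"
    z = (chi2 - dof) / math.sqrt(2.0 * dof)         # distance from the mean in sigma
    p_upper = 0.5 * math.erfc(z / math.sqrt(2.0))   # approximate upper-tail probability
    return reduced, z, p_upper

reduced, z, p_upper = chi2_summary(4859.0, 4754)
print(f"reduced chi-square = {reduced:.4f}")
print(f"z-score            = {z:.3f}")
print(f"approx. p-value    = {p_upper:.3f}")
```

With a standard deviation of roughly 97.5 for 4754 d.f., the statistic sits only about one sigma above its expected value, so the fit looks unremarkable on paper. That comfort is only as good as the assumptions behind it: if the d.f. is miscounted or the data points are not independent (items 1 and 8 in the list above), the nominal p-value means little.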

[stat.AP:0811.1663]

Open Statistical Issues in Particle Physics by Louis Lyons

My recollection of meeting Prof. L. Lyons is that he is very kind and a good listener. I was delighted to see his introductory article about particle physics and its statistical challenges in an [arxiv:stat] email subscription.

I have been observing some misconceptions about statistics and the evolution of statistical nomenclature in astronomy, which, I believe, can be attributed to the lack of references in the astronomical community. There are some textbooks designed for junior/senior science and engineering students, which are likely unknown to astronomers. In terms of examples, though, these books are not suitable, to my knowledge. Although I never expect astronomers to study standard graduate (mathematical) statistics textbooks, I do wish astronomers would go beyond Numerical Recipes (W. H. Press, S. A. Teukolsky, W. T. Vetterling, & B. P. Flannery) and Data Reduction and Error Analysis for the Physical Sciences (P. R. Bevington & D. K. Robinson). Here are some good ones written by astronomers, engineers, and statisticians:

From arxiv/astro-ph:0705.4199v1

**In search of an unbiased temperature estimator for statistically poor X-ray spectra**

A. Leccardi and S. Molendi

There was a delay in writing about this paper, which by accident was lying under a pile of papers irrelevant to astrostatistics. (It has been quite overwhelming to track papers with various statistical applications and papers with room left for statistical improvement from arxiv:astro-ph.) Although there is already a posting about this paper (see Vinay's posting), I'd like to give it a shot. I was very excited because I haven't seen any astronomical papers devoted solely to discussing **unbiased estimators**.

Continue reading ‘[ArXiv] An unbiased estimator, May 29, 2007’ »

From arxiv/astro-ph:0708.4030v1

**Deep ACS Imaging in the Globular Cluster NGC 6397: The Cluster Color Magnitude Diagram and Luminosity Function** by H. B. Richer et al.

This paper presents an observational study of the globular cluster NGC 6397 that is enhanced and more informative compared to previous observations, in the sense that 1) a truncation in the white dwarf cooling sequence occurs at 28 magnitude, 2) the cluster main sequence seems to terminate approximately at the hydrogen-burning limit predicted by two independent stellar evolution models, and 3) luminosity functions (LFs) or mass functions (MFs) are well defined. There is nothing statistical here, but the idea of defining color magnitude diagrams (CMDs) and LFs described in the paper, together with the improved measurements (ACS imaging) of stars in NGC 6397, will assist in developing suitable statistics for CMD and LF fitting problems.

Continue reading ‘[ArXiv] NGC 6397 Deep ACS Imaging, Aug. 29, 2007’ »

Mmm.. chi-square!

The withering criticisms Hyunsook has been directing towards the faulty use of chi-square by astronomers bring to mind this classic comment by [astronomer] Jeremy Drake during the 2005 Chandra Calibration Workshop:

During the International X-ray Summer School, as a project presentation, I tried to explain the inadequate practice of χ^2 statistics in astronomy. *If your best fit is biased (any misidentification of a model easily causes such bias), do not use χ^2 statistics to get the 1σ error for a 68% chance of capturing the true parameter.*

Later, I decided to do further investigation on that subject and this paper came along: Astrostatistics: Goodness-of-Fit and All That! by Babu and Feigelson.

Continue reading ‘Astrostatistics: Goodness-of-Fit and All That!’ »

Since I started reading arxiv/astro-ph abstracts and a few relevant papers about a month ago, I have very often seen chi-square used as an optimization or statistical inference tool. Chi-square function, chi-square statistic, and chi-square goodness-of-fit test are terms that serve different data analysis purposes but share the same prefix. As a newbie to statistics, I learned the chi-square distribution and the chi-square test, but doing statistics with chi-square is considered somewhat obsolete in terms of robust applications to modern data. They are introduced as one of many distributions and statistical tests. Nothing special. However, in astronomy, chi-square has become almost the only method for statistical data analysis. I wonder how such a strong bond between chi-square tactics and astronomers' keen minds for data analysis came about?

Continue reading ‘What is so special about chi square in astronomy?’ »

Leccardi & Molendi (2007) have a paper in A&A (astro-ph/0705.4199) discussing the biases in parameter estimation when spectral fitting is confronted with low-count data. Not surprisingly, they find that the bias is higher for lower counts, for standard chisq compared to C-stat, and for grouped data compared to ungrouped. Peter Freeman talked about something like this at the 2003 X-ray Astronomy School at Wallops Island (pdf1, pdf2), and no doubt part of the problem also has to do with the (un)reliability of the fitting process when the chisq surface gets complicated.

Anyway, they propose an empirical method to reduce the bias by computing the probability distribution functions (*pdf*s) for various simulations, and then *averaging the pdfs* in groups of 3. Seems to work, for reasons that escape me completely.
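The low-count bias mentioned above is easy to see in a toy problem. The sketch below is not Leccardi & Molendi's setup (they fit temperatures of thermal spectra); it assumes the simplest possible model, a constant Poisson rate per bin, for which the minimizers of the Cash statistic and of two flavors of chisq have closed forms. The function names, the true rate of 5, the seed, and the `max(n, 1)` variance floor are all illustrative assumptions.

```python
import math
import random

def poisson_draw(rng, mu):
    """Draw one Poisson(mu) variate with Knuth's method (fine for small mu)."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def fit_constant(counts):
    """Best-fit constant rate m under three fit statistics (closed forms).

    cash    : minimizes 2*sum(m - n_i*ln m)          -> arithmetic mean
    neyman  : minimizes sum((n_i - m)^2 / max(n_i,1)) -> weighted (harmonic-like) mean
    pearson : minimizes sum((n_i - m)^2 / m)          -> root mean square
    """
    n = len(counts)
    cash = sum(counts) / n
    weights = [max(c, 1) for c in counts]   # data variance, floored at 1 (assumption)
    neyman = sum(c / w for c, w in zip(counts, weights)) / sum(1.0 / w for w in weights)
    pearson = math.sqrt(sum(c * c for c in counts) / n)
    return cash, neyman, pearson

rng = random.Random(42)
true_rate = 5.0
counts = [poisson_draw(rng, true_rate) for _ in range(20000)]
cash, neyman, pearson = fit_constant(counts)
print(f"true rate        = {true_rate}")
print(f"Cash (C-stat)    = {cash:.3f}")
print(f"chisq, data var  = {neyman:.3f}")
print(f"chisq, model var = {pearson:.3f}")
```

In this toy case the Cash estimate lands near the true rate, chisq with data variance comes out low, and chisq with model variance comes out high, no matter how many bins are added. More bins shrink the scatter but not the offset, which is the sense in which the estimator is biased.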

[**Update:** links to Peter's slides corrected]

Despite some recent significant advances in Statistics and its applications to Astronomy (Cash 1976, Cash 1979, Gehrels 1984, Schmitt 1985, Isobe et al. 1986, van Dyk et al. 2001, Protassov et al. 2002, etc.), there still exist numerous problems and limitations in the standard statistical methodologies that are routinely applied to astrophysical data. For instance, the basic algorithms used in non-linear curve-fitting in spectra and images have remained unchanged since the 1960s: the downhill simplex method of Nelder & Mead (1965) modified by Powell, and methods of steepest descent exemplified by Levenberg-Marquardt (Marquardt 1963). All non-linear curve-fitting programs currently in general use (Sherpa, XSPEC, MPFIT, PINTofALE, etc.) with the exception of Monte Carlo and MCMC methods are implementations based on these algorithms and thus share their limitations.

Continue reading ‘On the unreliability of fitting’ »