Use and Misuse of Chi-square

Before using any adaptation of the chi-square statistic, please spend a minute or two pondering whether your strategy with chi-square falls into one of these categories.

1. Lack of independence among the single events or measures
2. Small theoretical frequencies
3. Neglect of frequencies of non-occurrence
4. Failure to equalize \sum O_i (the sum of the observed frequencies) and \sum M_i (the sum of the theoretical frequencies)
5. Indeterminate theoretical frequencies
6. Incorrect or questionable categorizing
7. Use of non-frequency data
8. Incorrect determination of the number of degrees of freedom
9. Incorrect computations (including a failure to weight by N when proportions instead of frequencies are used in the calculations)

From “Chapter 10: On the Use and Misuse of Chi-square” by K. L. Delucchi in A Handbook for Data Analysis in the Behavioral Sciences (1993). Delucchi attributed these nine principal sources of error to Lewis and Burke (1949), whose article “The Use and Misuse of the Chi-square” was published in Psychological Bulletin.

As described in my post, 4754 d.f., item 2 is not a concern if a grouping scheme such as more than 25 counts per bin is employed. As far as type I error and power are concerned, the literature of other sciences suggests 5 (or 10) or more in each bin, while astronomers adopt 20 or 25 according to publications in astronomy. However, I do worry about grouping the insensitive part of the detector channels, which could be associated with items 1, 3, 5, and 7, so that the chi-square statistic becomes inadequate. Items 8 and 9 are handled by computer, so no worries there. Item 6 is not applicable to astronomers in general because categorical data analysis is not a main subject of spectral or light curve analysis (for those who are curious about categorical data analysis, see the book Categorical Data Analysis by Alan Agresti). That leaves items 1, 3, 4, 5, and 7 among the nine categories. One way or another, they are intertwined due to differing detector sensitivities and source models. It is hard to disentangle these categories in X-ray spectral and light curve fitting and to translate the terms used in behavioral science. Therefore, I’d rather focus on item 4.
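To make the idea of a minimum-count grouping scheme concrete, here is a minimal Python sketch. It assumes a simple greedy rule (merge adjacent channels until a group reaches the threshold) and is not the grouping algorithm of any particular package; the function name `group_min_counts` and the default threshold of 25 are illustrative choices only.

```python
import numpy as np

def group_min_counts(counts, min_counts=25):
    """Greedily merge adjacent channels until each group has >= min_counts.

    Returns (edges, grouped), where edges is a list of (start, stop) index
    pairs with stop exclusive, and grouped holds the summed counts per group.
    A trailing remainder below the threshold is merged into the last group.
    This is an illustrative rule, not any package's actual algorithm.
    """
    counts = np.asarray(counts)
    edges = []
    start, running = 0, 0
    for i, c in enumerate(counts):
        running += c
        if running >= min_counts:
            edges.append((start, i + 1))
            start, running = i + 1, 0
    if start < len(counts):
        if edges:
            # Fold the leftover low-count channels into the previous group.
            edges[-1] = (edges[-1][0], len(counts))
        else:
            # The whole spectrum never reached the threshold: one group.
            edges.append((start, len(counts)))
    grouped = np.array([counts[a:b].sum() for a, b in edges])
    return edges, grouped

# Hypothetical channel counts, for illustration only.
channel_counts = [10, 10, 10, 30, 5, 5, 5, 3]
edges, grouped = group_min_counts(channel_counts, min_counts=25)
print(edges, grouped)
```

Note that such a rule only guarantees the observed counts per bin. It says nothing about whether the merged channels share a comparable detector sensitivity, which is the concern raised above.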

I wonder whether XSPEC and Sherpa offer a tool to check the balance between the sum of observed counts and the sum of expected (model) counts. I also wonder whether people check this condition when they apply chi-square statistics (not chi-square minimization; I stated the difference in my post). I don’t think this is as easy as it is in survey-based sciences and categorical data analysis, because high energy astrophysics involves effective areas, redistribution matrices, and point spread functions, which are non-linear and add uncertainties to the counts in each bin and, as a consequence, to the sum of counts. On the other hand, unless the difference is zero, the chi-square statistic is clearly biased, and all subsequent inference results such as p-values and confidence intervals do not serve the purpose they are meant to serve.
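As a rough illustration of the check in item 4, the sketch below compares the two sums and computes the Pearson chi-square before and after rescaling the expected counts by a single factor. It assumes the per-bin expected counts have already been folded through the instrument response and are available as a plain array; the function names are hypothetical and do not correspond to XSPEC or Sherpa calls, and the numbers are made up.

```python
import numpy as np

def count_balance(observed, expected):
    """Return (sum of observed, sum of expected, their difference)."""
    sum_obs = float(np.sum(observed))
    sum_exp = float(np.sum(expected))
    return sum_obs, sum_exp, sum_obs - sum_exp

def pearson_chi2(observed, expected):
    """Pearson chi-square: sum over bins of (O_i - M_i)^2 / M_i."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sum((observed - expected) ** 2 / expected))

def rescale_to_observed(observed, expected):
    """Scale expected counts by one factor so that both sums agree.

    A crude stand-in for adjusting an overall model normalization.
    """
    expected = np.asarray(expected, dtype=float)
    return expected * (np.sum(observed) / expected.sum())

# Made-up counts, for illustration only.
observed = [30, 40, 30]
expected = [25, 50, 50]

print(count_balance(observed, expected))   # sums and their difference
balanced = rescale_to_observed(observed, expected)
print(pearson_chi2(observed, expected), pearson_chi2(observed, balanced))
```

A single multiplicative factor is the simplest possible way to equalize the sums. Whether it is appropriate for a given spectral model with several components is a separate question that this sketch does not address.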

My empathy toward the chi-square statistic so prevalent in astronomy is well expressed by Delucchi:

Like the good-natured next door neighbor who always lends a hand without complaining, however, the chi-square statistic is easy to take for granted and easy to misuse.

One Comment
  1. vlk:

    XSPEC has a function called “renorm” which simply rescales the unfrozen normalizations of all the model components such that the predicted counts equal the observed counts. (So does PINTofALE’s FITLINES.) Sherpa figures out the default values based on the data, so the initial guess is usually in the ballpark. This is all done before the actual fit. The result of the fit, naturally, is not expected to produce counts identically equal to the observed counts, but will differ as appropriate to the assumed error model. That is, the difference between the summed predicted and observed counts will be consistent with the error bar on the normalizations (when there are no odd correlations to mess it up).

    04-01-2009, 3:37 pm