The LRT is worthless for …

One of the speakers in the Google talk series illustrated model-based clustering and mentioned the likelihood ratio test (LRT) for choosing the number of clusters. Since I have seen improperly applied LRTs in astronomical journals, such as testing two clusters against three or a higher number of components, I could not resist pointing out that the LRT appeared to be misused in his illustration. In reply, he explained that the LRT he cited was not the one shown in his plot: the test was carried out for one component vs. two, which closely observes the regularity conditions. I was relieved not to find another example of the ill-used LRT.

Various tests are applicable depending on the needs and conditions of the data and source models, but it seems the popular astronomical lexicon contains none of these on-demand tests except the LRT (since I began posting [ArXiv]s in the slog, I have seen the score test once, and a few non-parametric rank-based tests over the years). I'm sure knowledgeable astronomers will soon point out that I jumped to a conclusion too quickly and bring up counter-examples. Until then, be advised that your LRTs, χ^2 tests, and F-tests ask for your statistical attention prior to their application in any statistical inference. These tests are not magic crystals that produce the answers you are looking for. To encourage such care and attention, here's a paper with a thought-provoking title that I found some years ago.

The LRT is worthless for testing a mixture when the set of parameters is large
JM Azaïs, E Gassiat, C Mercadier (I found it on the internet, but the link seems to go on and off and is sometimes unavailable.)

Here, quotes replace theorems and their proofs[1]:

  • We prove in this paper that the LRT is worthless from testing a distribution against a two components mixture when the set of parameters is large.
  • One knows that the traditional Chi-square theory of Wilks[16[2]] does not apply to derive the asymptotics of the LRT due to a lack of identifiability of the alternative under the null hypothesis.
  • …for unbounded sets of parameters, the LRT statistic tends to infinity in probability, as Hartigan[7[3]] first noted for normal mixtures.
  • …the LRT cannot distinguish the null hypothesis (single gaussian) from any contiguous alternative (gaussian mixtures). In other words, the LRT is worthless[4].
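The dependence on the size of the parameter set can be seen in a toy simulation. The sketch below is not taken from the paper; it assumes a simplified contamination model, (1-p) N(0,1) + p N(mu,1) with known unit variance, in the spirit of the normal-mixture example quoted above. The function names, the grid search over (p, mu), the sample size, and the cutoff 5.99 (the 95% point of a χ^2 with 2 degrees of freedom, used only as a naive reference) are all assumptions made for illustration. Data are drawn under the null, and the statistic is maximized over |mu| <= mu_max for increasing mu_max.

```python
import numpy as np

def lrt_statistic(x, mu_max, mu_step=0.1, n_p=21):
    """2 * sup log-likelihood ratio for H0: N(0,1) against
    (1-p) N(0,1) + p N(mu,1), maximized on a grid with |mu| <= mu_max.
    Grid search is an illustrative choice, not the paper's method."""
    k = int(round(mu_max / mu_step))
    mus = mu_step * np.arange(-k, k + 1)      # nested grids as mu_max grows
    ps = np.linspace(0.0, 1.0, n_p)
    # density ratio N(mu,1)/N(0,1) at each data point: exp(mu*x - mu^2/2)
    ratio = np.exp(np.outer(mus, x) - 0.5 * mus[:, None] ** 2)
    best = 0.0                                # p = 0 reproduces the null
    for p in ps:
        loglr = np.log((1.0 - p) + p * ratio).sum(axis=1)
        best = max(best, float(loglr.max()))
    return 2.0 * best

def simulate(mu_maxes=(1, 2, 4, 8), n=200, reps=50, seed=0):
    """Mean LRT statistic, and fraction above 5.99, for data drawn under H0."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((reps, n))
    out = {}
    for m in mu_maxes:
        stats = np.array([lrt_statistic(x, m) for x in samples])
        out[m] = (float(stats.mean()), float((stats > 5.99).mean()))
    return out

for m, (mean_stat, frac) in simulate().items():
    print(f"|mu| <= {m}: mean LRT = {mean_stat:.2f}, fraction above 5.99 = {frac:.2f}")
```

Because the grids are nested, the statistic for a given sample can only grow as mu_max grows, so any fixed χ^2 cutoff becomes less trustworthy as the allowed parameter set widens. This only hints at the behaviour the theorems describe; it is no substitute for them.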

For astronomers, a large set of parameters is usually of no concern because of theoretical constraints from physics: experience and theory keep the set of parameters small and bounded. Sometimes, however, the distinction between small and large sets can be vague.

The characteristics of the LRT are well established under a compactness assumption on the parameter set (either compact or bounded), but trouble arises when the parameter approaches the boundary. As cited a few times before in the slog, for a more rigorous treatment of the LRT from an astronomical viewpoint, readers are recommended to read Protassov et al. (2002), Statistics, Handle with Care: Detecting Multiple Model Components with the Likelihood Ratio Test, ApJ, 571, p. 545.

  1. Readers might want to look for mathematical statements and proofs in the paper.
  2. S. S. Wilks. The large sample distribution of the likelihood ratio for testing composite hypotheses, Ann. Math. Stat., 9:60-62, 1938.
  3. J. A. Hartigan. A failure of likelihood asymptotics for normal mixtures, in Proc. Berkeley Conf., Vol. II, pp. 807-810.
  4. Comment to Theorem 1. They proved the lack of worth of the LRT under more general settings; see Theorem 2.