Eddington versus Malmquist

During the runup to his recent talk on logN-logS, Andreas mentioned that people are sometimes confused about the variety of statistical biases that afflict surveys. They usually know what the biases are, but tend to mislabel them, especially the Eddington and Malmquist types. Sort of like using “your” and “you’re” interchangeably, which to me is like nails on a blackboard. So here’s a brief summary:

Eddington Bias: What you get because of statistical fluctuations in the measurement (Eddington 1913). A set of sources with a single luminosity will, upon observation, be spread out due to measurement error. When you have two sets of sources with different luminosities, the observed distributions will overlap. If there are more objects of one luminosity than the other, you are in danger of underestimating the fraction in that set, because more of those “scatter” into the other’s domain than the reverse. Another complication: if the statistical scatter bumps up against some kind of detection threshold, then the inferred luminosity based on only the detected sources will end up being an overestimate.
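A quick Monte Carlo makes both effects visible. This is a minimal sketch under made-up assumptions, none of which come from the post: two populations with true luminosities 1.0 and 2.0 (arbitrary units), Gaussian measurement errors of width 0.5, nine faint sources for every bright one, each source assigned to a class by which side of the midpoint 1.5 it lands on, and a hypothetical detection threshold also at 1.5.

```python
import random
import statistics

# Toy setup (hypothetical numbers, not from the post): two populations with
# single true luminosities, observed with Gaussian measurement error.
L_FAINT, L_BRIGHT = 1.0, 2.0     # true luminosities, arbitrary units
N_FAINT, N_BRIGHT = 9000, 1000   # faint sources outnumber bright ones 9:1
SIGMA = 0.5                      # Gaussian measurement error
BOUNDARY = 1.5                   # classify each source by the midpoint
THRESHOLD = 1.5                  # hypothetical detection limit


def simulate(seed=0):
    """Return (true faint fraction, inferred faint fraction,
    mean observed luminosity of the faint sources that are detected)."""
    rng = random.Random(seed)
    faint = [rng.gauss(L_FAINT, SIGMA) for _ in range(N_FAINT)]
    bright = [rng.gauss(L_BRIGHT, SIGMA) for _ in range(N_BRIGHT)]
    observed = faint + bright

    # Scatter mixes the two sets; the larger set leaks more sources
    # across the boundary than it receives back.
    inferred_faint_frac = sum(x < BOUNDARY for x in observed) / len(observed)
    true_faint_frac = N_FAINT / len(observed)

    # Only faint sources that scatter above the threshold are detected,
    # so their average observed luminosity is biased high.
    detected_faint = [x for x in faint if x >= THRESHOLD]
    mean_detected = statistics.fmean(detected_faint)
    return true_faint_frac, inferred_faint_frac, mean_detected


true_frac, inferred_frac, mean_det = simulate()
print(f"true faint fraction     {true_frac:.3f}")
print(f"inferred faint fraction {inferred_frac:.3f}")
print(f"mean of detected faint  {mean_det:.3f} (true {L_FAINT})")
```

With these settings the inferred faint fraction should come out close to 0.77 instead of the true 0.90 (about 1430 faint sources scatter up while only about 160 bright ones scatter down), and the detected faint sources should average close to 1.76 instead of 1.0, which is the threshold effect.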

Malmquist Bias: What you get because you can see brighter sources out to farther distances. This means that if your survey is flux-limited (as most are), then the intrinsically brighter sources will appear to be more numerous than they ought to be because you are seeing them in a larger volume. This is the reason, for instance, that there are 10 times more A stars in the SAO catalog than there are M stars. This is a statistical effect only in the sense that a “true” dataset is filtered due to a detectability threshold. Anyone working with volume-limited samples does not need to worry about this at all.
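The volume argument can be checked with a toy simulation. This sketch assumes a hypothetical universe that is not taken from the post: Euclidean space with no extinction, sources spread uniformly inside a sphere, flux falling off as L/d² (constant factors dropped), and two classes with equal true numbers whose luminosities differ by a factor of 4. Since the maximum visible distance scales as L^(1/2), the surveyed volume scales as L^(3/2), so the flux-limited count ratio should approach 4^(3/2) = 8 while a volume-limited cut should give about 1.

```python
import random

# Hypothetical toy universe (assumptions, not from the post): Euclidean space,
# no extinction, sources uniform inside a sphere of radius R_MAX, and two
# classes with EQUAL true numbers but different luminosities.
R_MAX = 1.0
L_BRIGHT, L_FAINT = 4.0, 1.0
N_PER_CLASS = 200_000
F_LIM = 16.0  # flux limit; flux is taken as L / d**2 (constant factor dropped)


def distances(n, rng):
    # Uniform in volume means P(d < r) is proportional to r**3, so invert
    # with a cube root. 1 - random() lies in (0, 1], which avoids d == 0.
    return [R_MAX * (1.0 - rng.random()) ** (1 / 3) for _ in range(n)]


def malmquist_demo(seed=1):
    rng = random.Random(seed)
    d_bright = distances(N_PER_CLASS, rng)
    d_faint = distances(N_PER_CLASS, rng)

    # Flux-limited sample: keep whatever is bright enough to be seen.
    n_bright = sum(L_BRIGHT / d**2 >= F_LIM for d in d_bright)
    n_faint = sum(L_FAINT / d**2 >= F_LIM for d in d_faint)
    flux_ratio = n_bright / n_faint

    # Volume-limited sample: cut at the distance out to which even the
    # faint class is detectable, so both classes are complete inside it.
    d_cut = (L_FAINT / F_LIM) ** 0.5
    v_bright = sum(d <= d_cut for d in d_bright)
    v_faint = sum(d <= d_cut for d in d_faint)
    volume_ratio = v_bright / v_faint
    return flux_ratio, volume_ratio


flux_ratio, volume_ratio = malmquist_demo()
print(f"bright/faint, flux-limited:   {flux_ratio:.2f}")
print(f"bright/faint, volume-limited: {volume_ratio:.2f}")
```

The two classes are equally common by construction, so any excess of bright sources in the flux-limited count is pure selection; the volume-limited ratio hovers near 1, consistent with the point that such samples are immune to this bias.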

  1. hlee:

    Malmquist bias sounds equivalent to a missing-data model: given a cosmological model (an IMF or some mass distribution?), one roughly knows the proportion of observables, although I understand that the IMF or other relevant indicators of the material universe are tied to the complexity of this bias.

    Eddington bias seems to be not just a missing-data problem but a combination of missing data and random censoring. If the magnitude of a star is small, there’s no bias. If the magnitude of a star is large, it may not be observed (missing), and when it is observed, what is recorded is the limit of observable magnitudes (or below the limit), not the true magnitude of the star (censoring).

    [Note that stars with smaller magnitudes are brighter than stars with larger magnitudes, and stars of the same absolute magnitude can be observed or not observed depending on their distances - Malmquist bias]

    By the way, what is the SAO catalog?

    03-27-2008, 1:09 pm
  2. vlk:

    The Eddington bias, as formulated by the man himself, applies at all intensities. Removing it is essentially the same as deconvolving with the Poisson (or Gaussian) distribution. It is prominent and unignorable for faint sources near the detection limit.

    Just to clarify, the Malmquist bias is not cosmological. (The SAO star catalog is at http://webviz.u-strasbg.fr/viz-bin/VizieR?-source=I/131A ) To make it work like data augmentation, your model will have to extend to the largest distance at which your brightest conceivable source could be seen, at which point your code will become extremely inefficient.

    03-27-2008, 10:01 pm
  3. hlee:

    That SAO is the SAO that I know of. :) Smithsonian Astrophysical Observatory

    03-28-2008, 12:38 am