Statistics Jargon for Astronomers

Under Construction!

Akaike Information Criterion (AIC)
Background Marginalization
Bayes Factor (BF)
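The Bayes factor is defined by
 BF = P(M_2|X_n)/P(M_1|X_n) * P(M_1)/P(M_2),
where M_i (i=1,2) denotes the competing models/distributions, P(M_2|X_n)/P(M_1|X_n) is the posterior odds of M_2 against M_1, and P(M_2)/P(M_1) is the corresponding prior odds. The Bayes factor is thus the posterior odds divided by the prior odds.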
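As a rough numerical illustration of the definition, the sketch below computes the Bayes factor of M_2 against M_1 as posterior odds divided by prior odds. The function name `bayes_factor` and all the probabilities are hypothetical, chosen only to make the arithmetic visible.

```python
def bayes_factor(post_m1, post_m2, prior_m1, prior_m2):
    """Bayes factor of M_2 against M_1: posterior odds divided by prior odds."""
    posterior_odds = post_m2 / post_m1
    prior_odds = prior_m2 / prior_m1
    return posterior_odds / prior_odds


# Hypothetical numbers: equal prior weight, posterior favours M_2 three to one.
bf = bayes_factor(post_m1=0.25, post_m2=0.75, prior_m1=0.5, prior_m2=0.5)
print(bf)
```

With equal prior probabilities the Bayes factor coincides with the posterior odds; with unequal priors the two differ.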
Bayes' Theorem
A means of updating the probability distributions of model parameters based on data. Since the joint probability factors as
	p(M,D) = p(M|D) p(D) = p(M) p(D|M),
it follows that
	p(M|D) = p(M) p(D|M) / p(D).
This is Bayes' Theorem. Usually, M represents the model parameters, D represents the data, and the | symbol indicates a statement of conditional probability. Here, p(M) are the a priori probabilities on the model parameters, p(D|M) is the likelihood, and p(D) is a normalizing factor.
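A minimal sketch of the theorem for a discrete set of models may help fix the roles of the three terms. The model labels and probabilities below are invented for illustration, and `posterior` is a hypothetical helper, not part of any library.

```python
def posterior(prior, likelihood):
    """Return p(M|D) for a discrete set of models, given p(M) and p(D|M)."""
    joint = {m: prior[m] * likelihood[m] for m in prior}  # p(M) p(D|M)
    evidence = sum(joint.values())                        # p(D), the normalizing factor
    return {m: joint[m] / evidence for m in joint}


# Hypothetical example: is a detected event from the source or the background?
prior = {"source": 0.5, "background": 0.5}
likelihood = {"source": 0.8, "background": 0.2}
post = posterior(prior, likelihood)
print(post)
```

Here p(D) is obtained by summing p(M) p(D|M) over all models, which is why it acts only as a normalization.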
Bayesian Information Criterion (BIC)
Also called the Schwarz Information Criterion (SIC).
Bayesian vs. Frequentist
Bootstrap
Bootstrapping is a well-known resampling method for estimating ...
Cash statistic
The Cash statistic refers to statistical inference methods based on the likelihood of a parametric model: point estimation by the method of maximum likelihood, and the asymptotic convergence of the likelihood ratio test statistic to a chi-square distribution, which yields confidence intervals in addition to hypothesis tests. The parametric model does not have to be Gaussian; any parametric model that describes the underlying physics can be plugged into these likelihood-based methods for inference on its parameters. The name is used mainly among X-ray astrophysicists.
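For concreteness, the sketch below assumes the form usually attributed to Cash (1979) for Poisson-distributed counts, C = 2 * sum(M_i - D_i ln M_i), which is -2 times the Poisson log-likelihood up to a term that depends only on the data. The counts and the constant-rate model are hypothetical.

```python
import math


def cash_statistic(counts, model):
    """C = 2 * sum(M_i - D_i * ln M_i) for observed counts D_i and model counts M_i > 0."""
    return 2.0 * sum(m - d * math.log(m) for d, m in zip(counts, model))


# Hypothetical data: four bins, fitted with a constant expected rate per bin.
counts = [3, 0, 5, 2]


def constant_model(rate):
    return [rate] * len(counts)


# The maximum likelihood rate for a constant Poisson model is the sample mean, 2.5.
best = cash_statistic(counts, constant_model(2.5))
for rate in (2.0, 2.5, 3.0):
    print(rate, cash_statistic(counts, constant_model(rate)) - best)
```

The printed differences are the likelihood-ratio quantities that, under the usual asymptotic conditions, are compared against a chi-square distribution to build confidence intervals.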
Chi-square statistic
A widely used measure of goodness of fit. There are many forms of this expression; in each of the following, D is the data, M is the model, and the expression is summed over the data bins:
-- Model variance chi^2 : chi^2 = (D - M)^2/M
-- Data variance chi^2 : chi^2 = (D - M)^2/D
-- Iterative Primini approximation : chi^2_{i} = (D - M_{i})^2/M_{i-1}
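The three forms above can be sketched as follows, assuming each expression is summed over data bins and that M_{i-1} is the model from the previous fit iteration. The function names and the example counts are illustrative only.

```python
def chi2_model_variance(data, model):
    """Sum of (D - M)^2 / M: the variance is estimated from the model."""
    return sum((d - m) ** 2 / m for d, m in zip(data, model))


def chi2_data_variance(data, model):
    """Sum of (D - M)^2 / D: the variance is estimated from the data (requires D > 0)."""
    return sum((d - m) ** 2 / d for d, m in zip(data, model))


def chi2_primini(data, model, previous_model):
    """Sum of (D - M_i)^2 / M_{i-1}: the variance comes from the previous iteration's model."""
    return sum((d - m) ** 2 / mp for d, m, mp in zip(data, model, previous_model))


# Hypothetical counts in three bins and a trial model.
data = [12, 9, 15]
model = [10, 10, 16]
print(chi2_model_variance(data, model))
print(chi2_data_variance(data, model))
print(chi2_primini(data, model, previous_model=model))
```

When the previous and current models coincide, the iterative form reduces to the model variance form, which is the fixed point the iteration aims for.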
Confidence Interval
A confidence interval is a plausible range of values for a parameter such as the mean mu, together with a quantifiable measure of its plausibility: the level of confidence (e.g., 95% or 99%).
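As one simple instance, the sketch below builds a confidence interval for the mean mu of normally distributed data with a known standard deviation sigma. That setting is an assumption made here for brevity, and the measurements are hypothetical.

```python
import math
from statistics import NormalDist, mean


def normal_mean_ci(xs, sigma, level=0.95):
    """Confidence interval for the mean of normal data with known sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # e.g. about 1.96 for 95%
    half_width = z * sigma / math.sqrt(len(xs))
    centre = mean(xs)
    return centre - half_width, centre + half_width


# Hypothetical repeated measurements with an assumed known sigma of 0.3.
xs = [9.8, 10.4, 10.1, 9.6, 10.3, 9.9, 10.2, 9.7, 10.0]
lo, hi = normal_mean_ci(xs, sigma=0.3)
print(lo, hi)
```

Raising the level of confidence widens the interval, which is the trade-off between plausibility and precision.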
Cramer-Rao Lower Bound
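The statistical counterpart of Heisenberg's uncertainty principle (physics) or Kraft's inequality (information theory): a lower bound on the variance of any unbiased estimator, given by the inverse of the Fisher information.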
Credible Region
Cumulative Distribution Function (CDF)
Data Augmentation
Data augmentation is an elegant computational construct that allows one to take advantage of the fact that if it were possible to collect additional data, statistical analysis would be greatly simplified. This is true regardless of why the so-called "missing data" are not observed. For example, if we were able to record the counts due to background contamination in addition to total counts in each bin, it would, of course, be a trivial task to account for the background. There is a large class of powerful statistical methods designed for "missing data" problems. With the insight that "true" values of quantities recorded with measurement error can be regarded as "missing data," these methods can usefully be applied to almost any astrophysical problem.
Data Depth, Statistical
EM Algorithm
The EM algorithm is a computational tool that uses the method of data augmentation to optimize a likelihood function in order to compute the Maximum Likelihood Estimate (MLE) of an unknown parameter. Bayesians also use the EM algorithm to optimize the a posteriori distribution in order to compute the Maximum A Posteriori (MAP) estimate of an unknown parameter. The EM algorithm tends to be easy to implement and enjoys monotone convergence in the objective function: each iteration is guaranteed never to decrease the function being optimized.
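A minimal sketch of the idea, under assumptions chosen for brevity: the data come from a mixture of two unit-variance normal components with known means 0 and 4, and only the mixing weight w is unknown. The "missing data" are the component labels; the E-step computes their expected values and the M-step re-estimates w. The data values are hypothetical.

```python
import math


def normal_pdf(x, mean):
    """Density of a unit-variance normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def log_likelihood(xs, w):
    return sum(math.log(w * normal_pdf(x, 0.0) + (1.0 - w) * normal_pdf(x, 4.0))
               for x in xs)


def em_mixture_weight(xs, w=0.5, n_iter=50):
    """EM for the weight w of the component centred at 0."""
    history = [log_likelihood(xs, w)]
    for _ in range(n_iter):
        # E-step: probability that each point belongs to the first component.
        resp = [w * normal_pdf(x, 0.0)
                / (w * normal_pdf(x, 0.0) + (1.0 - w) * normal_pdf(x, 4.0))
                for x in xs]
        # M-step: the weight that maximizes the expected complete-data log-likelihood.
        w = sum(resp) / len(xs)
        history.append(log_likelihood(xs, w))
    return w, history


xs = [-0.3, 0.1, 0.4, -1.2, 0.8, 3.7, 4.4, 0.2]
w_hat, history = em_mixture_weight(xs)
print(w_hat)
```

The `history` list records the log-likelihood after every iteration, so the monotone "uphill" behaviour can be checked directly.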
Estimator
An estimator is a function of a random variable X. For example, the maximum likelihood estimator (MLE) of a parameter is the function of X at which the likelihood is maximized. Plugging data into the estimator yields an estimate (a value) of the parameter. Estimators can be obtained by judicious guessing, the method of maximum likelihood, the method of moments, Bayesian methods, or decision-theoretic methods, or by requiring properties such as unbiasedness or consistency.
M. A. Hendry and J. F. L. Simmons (1995), Distance Estimation in Cosmology, Vistas in Astronomy, Vol. 39, pp. 297-314, explain these estimators from an astrophysicist's point of view.
Gamma Distribution
Whereas the single parameter of an exponential distribution describes the expected rate of a physical quantity of interest, such as an atomic decay rate, the parameters of a gamma distribution describe the expected rate as well as the expected size of the target sample. The gamma distribution is therefore a generalization of the exponential distribution, to which it reduces when the shape parameter equals 1.
Gibbs Sampler
The Gibbs Sampler is an MCMC sampler that constructs a Markov chain by dividing the set of unknown parameters into a number of groups and then simulating each group in turn conditional on the current values of all the other groups.
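A small sketch of this scheme, assuming a target that is a standard bivariate normal with correlation rho, so that the two "groups" are simply the two coordinates and each conditional distribution is itself normal. The function name and settings are illustrative.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, cond_sd)  # draw x given the current y
        y = rng.gauss(rho * x, cond_sd)  # draw y given the updated x
        samples.append((x, y))
    return samples


samples = gibbs_bivariate_normal(rho=0.6, n_samples=20000)
mean_xy = sum(x * y for x, y in samples) / len(samples)
print(mean_xy)  # should be close to rho for a long enough chain
```

Successive draws are correlated, so the effective number of independent samples is smaller than the chain length.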
Hypothesis Testing
Hypothesis testing is a statistical decision-making process concerning an uncertain hypothesis. In general, to assess the truth of a given hypothesis, evidence (data) is collected under the assumption that the data set was generated according to the hypothesis, in which case summaries of the data set (statistics) should be consistent with it. If a statistic is not consistent with the hypothesis, one concludes that the hypothesis can be rejected based on the data. Many test statistics exist, each summarizing the data in order to make a statistical decision about a given hypothesis.
Information, Fisher's
Information Theory
Informative and Non-informative Priors
Kullback-Leibler Distance
Likelihood
The likelihood (or sampling distribution) quantifies the probability of the data given the unknown model parameters.
Likelihood Ratio Test (LRT)
Markov-Chain Monte Carlo (MCMC)
MCMC is a computational tool that is used to generate simulations from a probability distribution. Because the simulations are generated with a Markov chain, care must be taken to ensure that the chain has converged to the target probability distribution and to account for the autocorrelation in these simulations. Bayesians use MCMC to explore high-dimensional posterior distributions in order to estimate unknown parameters and to construct error bars for these estimates.
Martingale
In probability theory, a martingale satisfies
 E(X_{n+1}|X_1,...,X_n) = X_n,
where X_1,...,X_n is a sequence of random variables. In other words, the conditional expectation of X_{n+1}, given all past observations, equals the most recent observation X_n.
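A simple way to see the defining property is a random walk with independent steps, for which the conditional expectation of the next value can be computed exactly. The sketch below uses hypothetical step distributions: the fair walk is a martingale and the biased one is not.

```python
def expected_next_value(x_n, step_probs):
    """E(X_{n+1} | X_n = x_n) for a walk whose steps are independent of the past."""
    return sum(p * (x_n + step) for step, p in step_probs.items())


fair_steps = {+1: 0.5, -1: 0.5}    # symmetric steps: a martingale
biased_steps = {+1: 0.6, -1: 0.4}  # upward drift: not a martingale

print(expected_next_value(3, fair_steps))    # equals the current value
print(expected_next_value(3, biased_steps))  # exceeds the current value
```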
Mean, Median, Mode
--- Mean:
-Arithmetic mean:
-Geometric mean:
-Harmonic mean:
--- Median: ...
--- Mode:
Metropolis-Hastings Sampler
The Metropolis-Hastings sampler is an MCMC sampler that uses a convenient proposal rule to generate simulations and then uses an accept-reject step to correct the simulations to match the target probability distribution.
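The sketch below illustrates the propose-then-accept-or-reject pattern using a symmetric Gaussian random-walk proposal, which is the simpler Metropolis special case (the proposal densities cancel in the acceptance ratio). The target, a standard normal, and all tuning values are assumptions made for illustration.

```python
import math
import random


def metropolis_hastings(log_target, n_samples, step=1.0, start=0.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target density."""
    rng = random.Random(seed)
    x = start
    chain = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # convenient symmetric proposal
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        chain.append(x)  # a rejected proposal repeats the current value
    return chain


def log_standard_normal(x):
    return -0.5 * x * x  # log density up to an additive constant


chain = metropolis_hastings(log_standard_normal, n_samples=20000)
chain_mean = sum(chain) / len(chain)
chain_var = sum((x - chain_mean) ** 2 for x in chain) / len(chain)
print(chain_mean, chain_var)
```

Only ratios of the target density are needed, so the normalizing constant of the target never has to be computed.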
Minimum Description Length (MDL)
MDL has much in common with BIC because the two criteria have the same form. However, their origins are different. In MDL, p log n is a measure of model complexity when the candidate models are from the exponential family (information theory typically deals with binary systems). In BIC, p log n is a by-product of the Laplace approximation.
Model Averaging
According to Wasserman (2000), model averaging refers to the process of estimating some quantity under each model and then averaging the estimates according to how likely each model is; on the other hand, model selection refers to the problem of using the data to select one model from the list of candidate models.
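The contrast between the two procedures can be sketched with made-up numbers: each model supplies an estimate of the same quantity, and each model has a probability expressing how likely it is. The names and values below are hypothetical.

```python
def model_average(estimates, model_probs):
    """Average the per-model estimates, weighted by how likely each model is."""
    total = sum(model_probs.values())
    return sum(estimates[m] * model_probs[m] for m in estimates) / total


def select_model(model_probs):
    """Model selection: pick the single most probable model."""
    return max(model_probs, key=model_probs.get)


# Hypothetical estimates of one quantity under two candidate models.
estimates = {"A": 1.0, "B": 2.0}
model_probs = {"A": 0.25, "B": 0.75}

averaged = model_average(estimates, model_probs)
chosen = select_model(model_probs)
print(averaged, estimates[chosen])
```

Averaging retains the uncertainty about which model is correct, whereas selection discards it by committing to one model.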
Model Selection
Well-known statistical model selection criteria are the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and their modifications, which address particular conditions such as small sample size. The best model is taken to be the one that minimizes the distance to the true model, which, in general, is unknown. Various distance measures exist to suit data properties. In astronomy, these model selection criteria have been used to determine the number of components in stellar populations and to choose the best theoretical model among candidates in cosmology.
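As a sketch, the standard forms AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L (with k parameters, n data points, and maximized likelihood L) can be compared for two candidate fits. The log-likelihood values below are invented to show that the two criteria need not agree.

```python
import math


def aic(log_likelihood, n_params):
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_data):
    return n_params * math.log(n_data) - 2.0 * log_likelihood


# Hypothetical fits to n = 100 points: (maximized log-likelihood, number of parameters).
n_data = 100
fits = {"one component": (-250.0, 2), "two components": (-246.0, 5)}

for name, (loglike, k) in fits.items():
    print(name, aic(loglike, k), bic(loglike, k, n_data))

best_by_aic = min(fits, key=lambda m: aic(*fits[m]))
best_by_bic = min(fits, key=lambda m: bic(*fits[m], n_data))
print(best_by_aic, best_by_bic)
```

In this made-up case AIC prefers the larger model while BIC, whose penalty grows with ln n, prefers the smaller one.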
Most Compact Region
Neyman-Pearson Lemma
Normal Distribution
The normal distribution is known as the Gaussian distribution in astronomy.
Odds Ratio
Poisson Likelihood
Posterior Distribution
The posterior distribution represents the updated knowledge regarding the unknown model parameters after observing the data and other information pertaining to the unknown parameters. Thus, the posterior distribution combines the information in the prior distribution and the likelihood (via Bayes theorem) and is a complete summary of the knowledge regarding the unknown model parameters.
Power, Statistical
Principal Components Analysis (PCA)
Prior Distribution
The prior distribution is a probability distribution that quantifies knowledge regarding unknown quantities (e.g., model parameters) prior to observing the data or other information pertaining to the unknown quantities.
Probability Density Function (pdf)
Random Variable (r.v.)
A random variable is a function from S, the sample space, to R, the real line; in other words, a numerical value calculated from the outcome of a random experiment.
Sampling Distribution
The sampling distribution is the probability distribution of a statistic, such as a point estimator, over repeated samples.
Some commonly used symbols
Unbiased Estimator
Variance and Standard Deviation (SD)
The standard deviation measures the typical fluctuation of X around its mean mu = E[X]; it is the square root of the variance, which is defined as V(X)=E[(X-mu)^2].
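A short numerical sketch of the definition, treating a small list of hypothetical values as the whole population so that the expectation is a plain average. For a sample drawn from a larger population, dividing by n - 1 instead of n is the usual choice.

```python
import math


def variance(xs):
    """V(X) = E[(X - mu)^2], with the expectation taken as an average over xs."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def standard_deviation(xs):
    return math.sqrt(variance(xs))


xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(xs), standard_deviation(xs))
```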
See also:
Descriptions and usage of some of the terms listed here are available at
The Astro Jargon for Statisticians:
The Chandra/CIAO Dictionary:
The CIAO Why Topics:
The Statistics Glossary at IPAC/Level5

CHASC: The California-Harvard AstroStatistics Collaboration