- Akaike Information Criterion (AIC)
- ...
- Background Marginalization
- ...
- Bayes Factor (BF)
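- The Bayes factor is defined by
BF = P(M_2|X_n)/P(M_1|X_n) * P(M_1)/P(M_2),
where M_i (i=1,2) indicates models/distributions,
P(M_2|X_n)/P(M_1|X_n) is the posterior odds of M_2 against M_1,
and P(M_1)/P(M_2) is the reciprocal of the corresponding prior odds.
By Bayes' Theorem this equals P(X_n|M_2)/P(X_n|M_1), the ratio of the
probabilities of the data under the two models.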
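As a rough illustration of the definition, the sketch below computes BF for two simple Poisson models in two ways: from the posterior and prior odds, and as the ratio of the likelihoods P(X_n|M_2)/P(X_n|M_1), to which the definition reduces by Bayes' Theorem. The counts, the two rates, and the prior probabilities are made-up values chosen only for the example.

```python
import math

def poisson_log_likelihood(counts, rate):
    """Log-probability of independent Poisson counts for a fixed rate."""
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)

# Hypothetical data X_n and two hypothetical point models M_1, M_2.
counts = [4, 6, 5, 7, 3]
rate_1, rate_2 = 3.0, 5.0
prior_1, prior_2 = 0.7, 0.3          # P(M_1), P(M_2), assumed for illustration

like_1 = math.exp(poisson_log_likelihood(counts, rate_1))   # P(X_n|M_1)
like_2 = math.exp(poisson_log_likelihood(counts, rate_2))   # P(X_n|M_2)

# Posterior model probabilities from Bayes' Theorem.
evidence = prior_1 * like_1 + prior_2 * like_2
post_1 = prior_1 * like_1 / evidence
post_2 = prior_2 * like_2 / evidence

# BF as defined: posterior odds times the reciprocal of the prior odds.
bayes_factor = (post_2 / post_1) * (prior_1 / prior_2)

# The same quantity written as a likelihood ratio.
likelihood_ratio = like_2 / like_1

print(bayes_factor, likelihood_ratio)
```

The two printed numbers agree up to rounding, which shows that the prior probabilities cancel out of the Bayes factor for simple models like these.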
- Bayes' Theorem
- A means to update the probability distributions of model parameters
based on data. Since the joint probability can be factored in two ways,
p(M,D) = p(M|D) p(D) = p(M) p(D|M),
it follows that
p(M|D) = p(M) p(D|M) / p(D).
This is Bayes' Theorem. Usually, M represents the model
parameters, D represents the data, and the |
symbol indicates a statement of conditional probability.
Here, p(M) are the a priori probabilities
on the model parameters, p(D|M) is the likelihood,
and p(D) is a normalizing factor.
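A minimal sketch of the update for a discrete set of models follows. The two model labels and all of the probabilities are hypothetical numbers used only to show how p(D) normalizes the product p(M) p(D|M).

```python
def posterior(prior, likelihood):
    """Return p(M|D) for each discrete M.

    prior:      dict mapping M -> p(M)
    likelihood: dict mapping M -> p(D|M) for the observed data D
    """
    # p(D) is the sum of p(M) p(D|M) over all M: the normalizing factor.
    evidence = sum(prior[m] * likelihood[m] for m in prior)
    return {m: prior[m] * likelihood[m] / evidence for m in prior}

# Hypothetical example: is a detection a real source or a background fluctuation?
prior = {"source": 0.5, "background": 0.5}
likelihood = {"source": 0.8, "background": 0.2}

post = posterior(prior, likelihood)
print(post)
```

With equal priors the posterior probabilities are simply the likelihoods rescaled to sum to one; unequal priors would shift them accordingly.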
- Bayesian Information Criterion (BIC)
- Also called the Schwarz Information Criterion (SIC).
- Bayesian vs. Frequentist
- ...
- Bootstrap
- Bootstrapping is a well-known resampling method for estimating ...
- Cash statistic
- The Cash statistic refers to statistical inference methods based on
the likelihood of a parametric model. These methods are
point estimation by the method of maximum likelihood and
the likelihood ratio test, whose statistic converges asymptotically to a
Chi-square distribution, which allows one to obtain a confidence interval
in addition to testing hypotheses. The parametric model does not have to be
Gaussian. Any parametric model that explains the underlying physics can be
plugged into the likelihood-based methods for statistical inference
on parameters. The name is used mainly among X-ray astrophysicists.
- Chi-square statistic
- A widely used measure of goodness of fit. There are many
forms of this expression, including the following, where D denotes
the data and M the model prediction:
-- Model variance chi^2 : chi^2 = (D - M)^2/M
-- Data variance chi^2 : chi^2 = (D - M)^2/D
-- Iterative Primini approximation : chi^2_{i} = (D - M_{i})^2/M_{i-1}
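The three forms differ only in which quantity is used as the variance in the denominator. The sketch below writes each one out, assuming (as is usual, though not stated above) that the per-bin terms are summed over all bins. The function names and the example numbers are hypothetical.

```python
def chi2_model_variance(data, model):
    """Sum over bins of (D - M)^2 / M."""
    return sum((d - m) ** 2 / m for d, m in zip(data, model))

def chi2_data_variance(data, model):
    """Sum over bins of (D - M)^2 / D; undefined for bins with D = 0."""
    return sum((d - m) ** 2 / d for d, m in zip(data, model))

def chi2_primini(data, model, previous_model):
    """Sum over bins of (D - M_i)^2 / M_{i-1}.

    The variance comes from the model of the previous iteration, so it
    stays fixed while the current model M_i is being adjusted.
    """
    return sum((d - m) ** 2 / mp for d, m, mp in zip(data, model, previous_model))

# Hypothetical two-bin example.
data = [10.0, 20.0]
model = [8.0, 25.0]
previous_model = [16.0, 50.0]

print(chi2_model_variance(data, model))
print(chi2_data_variance(data, model))
print(chi2_primini(data, model, previous_model))
```

Because the same residuals are divided by different variances, the three values generally differ, and the difference is largest in bins with few counts.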
- Confidence Interval
- A confidence interval is a plausible range of values for a parameter, mu,
together with a quantifiable measure of its plausibility (the level
of confidence, such as 95% or 99%).
- Cramer-Rao Lower Bound
- Counterpart of Heisenberg's uncertainty principle (physics) or Kraft's inequality
(information theory).
- Credible Region
- ...
- Cumulative Distribution Function (CDF)
- ...
- Data Augmentation
- Data augmentation is an elegant computational construct that allows
one to take advantage of the fact that if it were possible to collect
additional data, statistical analysis would be greatly simplified. This
is true regardless of why the so-called ``missing data'' are not observed.
For example, if we were able to record the counts due to background
contamination in addition to total counts in each bin, it would, of
course, be a trivial task to account for the background. There is a large
class of powerful statistical methods designed for ``missing data''
problems. With the insight that ``true'' values of quantities recorded
with measurement error can be regarded as ``missing data,'' these methods
can usefully be applied to almost any astrophysical problem.
- Data Depth, Statistical
- ...
- EM Algorithm
- The EM Algorithm is a computational tool that uses the method
of data augmentation and can be used to optimize a likelihood
function in order to compute the Maximum Likelihood Estimate (MLE)
of an unknown parameter. Bayesians also use the EM algorithm to optimize
the a posteriori distribution to compute the Maximum A Posteriori
(MAP) estimate of an unknown parameter. The EM algorithm tends to be easy
to implement and enjoys monotone convergence in the objective function:
When optimizing a function the EM algorithm is guaranteed to go uphill.
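The monotone convergence can be seen in a small sketch. The example below is an assumption chosen for illustration, not taken from the text above: a mixture of two unit-variance Gaussians, where the "missing data" are the unknown component labels of each point. The E-step computes the expected labels and the M-step re-estimates the mixing weight and the two means; the log-likelihood is recorded after every iteration.

```python
import math
import random

def normal_pdf(x, mean):
    """Density of a unit-variance Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def log_likelihood(xs, weight, mean_a, mean_b):
    return sum(
        math.log(weight * normal_pdf(x, mean_a) + (1.0 - weight) * normal_pdf(x, mean_b))
        for x in xs
    )

def em_step(xs, weight, mean_a, mean_b):
    # E-step: expected value of the missing label "x came from component a".
    resp = []
    for x in xs:
        pa = weight * normal_pdf(x, mean_a)
        pb = (1.0 - weight) * normal_pdf(x, mean_b)
        resp.append(pa / (pa + pb))
    # M-step: maximize the expected augmented-data log-likelihood.
    total_a = sum(resp)
    total_b = len(xs) - total_a
    new_weight = total_a / len(xs)
    new_mean_a = sum(r * x for r, x in zip(resp, xs)) / total_a
    new_mean_b = sum((1.0 - r) * x for r, x in zip(resp, xs)) / total_b
    return new_weight, new_mean_a, new_mean_b

# Simulated data (hypothetical): two clusters centered at 0 and 4.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(4.0, 1.0) for _ in range(200)]

params = (0.5, -1.0, 1.0)            # arbitrary starting values
history = [log_likelihood(xs, *params)]
for _ in range(50):
    params = em_step(xs, *params)
    history.append(log_likelihood(xs, *params))

print(params)
print(history[0], history[-1])
```

Each entry of `history` is at least as large as the one before it, which is the "uphill" guarantee. The guarantee concerns the objective function only; the algorithm may still stop at a local maximum.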
- Estimator
- An estimator is a function of a random variable, X. For example,
the maximum likelihood estimator (MLE) of a given parameter is the function
of X that returns the parameter value at which the likelihood is maximized.
Plugging data into the estimator provides an estimate (value) of the parameter.
Estimators are obtained by judicious guessing, the method of maximum
likelihood, the method of moments, Bayesian methods, or decision-theoretic
methods, or by requiring properties such as unbiasedness or consistency.
M.A. Hendry and J.F.L. Simmons (1995), Distance Estimation in Cosmology,
Vistas in Astronomy, Vol. 39, pp. 297-314, explain these estimators
from an astrophysicist's point of view.
- Gamma Distribution
- Whereas the single parameter of an exponential distribution describes the
expected rate of a physical process of interest, such as an atomic decay rate,
the parameters of a gamma distribution describe the expected rate as well as
the expected size of the target sample. The exponential distribution is
therefore a special case of the gamma distribution.
- Gibbs Sampler
- The Gibbs Sampler is an MCMC sampler that constructs a
Markov chain by dividing the set of unknown parameters into a number
of groups and then simulating each group in turn conditional on the
current values of all the other groups.
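A minimal sketch of this scheme follows, under assumptions chosen for illustration: the target is a standard bivariate normal with correlation rho, and the unknowns are split into two groups of one variable each. In that case each conditional distribution is itself normal, with mean rho times the other variable and variance 1 - rho^2. The function name and settings are hypothetical.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Draw (x, y) pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0                      # arbitrary starting point
    samples = []
    for _ in range(n_samples):
        # Simulate x conditional on the current value of y ...
        x = rng.gauss(rho * y, cond_sd)
        # ... then y conditional on the value of x just drawn.
        y = rng.gauss(rho * x, cond_sd)
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)

n = len(samples)
mean_x = sum(x for x, _ in samples) / n
mean_y = sum(y for _, y in samples) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in samples) / n
var_x = sum((x - mean_x) ** 2 for x, _ in samples) / n
var_y = sum((y - mean_y) ** 2 for _, y in samples) / n
print(cov_xy / math.sqrt(var_x * var_y))   # sample correlation of the chain
```

The sample correlation of the chain should come out close to the chosen rho. Successive draws are correlated with one another, so the chain carries less information than the same number of independent samples would.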
- Hypothesis Testing
- Hypothesis testing is a statistical decision-making process with
respect to an uncertain hypothesis. In general, to assess the truth
of a given hypothesis, some evidence (data) is collected under the assumption
that the data set was generated according to the hypothesis, so that
summaries of the data set (statistics) should be consistent with
the hypothesis. If a statistic is not consistent with the hypothesis,
one concludes that the hypothesis can be rejected based on the
data. There are many test statistics, each summarizing the data in order to
make a statistical decision on a given hypothesis.
- Information, Fisher's
- ...
- Information Theory
- ...
- Informative and Non-informative Priors
- ...
- Kullback-Leibler Distance
- ...
- Kurtosis
- ...
- Likelihood
- The likelihood (or sampling distribution) quantifies the probability of
the data given the unknown model parameters.
- Likelihood Ratio Test (LRT)
- ...
- Markov-Chain Monte Carlo (MCMC)
- MCMC is a computational tool that is used to generate simulations
from a probability distribution. Because the simulations are generated
with a Markov chain, care must be taken to ensure that the chain has
converged to the target probability distribution and to account for the
autocorrelation in the simulations. Bayesians use MCMC to explore
high-dimensional posterior distributions in order to estimate unknown
parameters and to construct error bars for these estimates.
- Marginalization
- ...
- Martingale
- In probability theory, a martingale is a sequence of random variables
X_1, X_2, ... that satisfies
E(X_{n+1}|X_1,...,X_n) = X_n.
In other words, the conditional expectation of X_{n+1}, given all past
observations, is equal to the most recent observation.
- Mean, Median, Mode
- --- Mean:
-Arithmetic mean:
-Geometric mean:
-Harmonic mean:
--- Median: ...
--- Mode:
- Metropolis-Hastings
- The Metropolis-Hastings sampler is an MCMC sampler that uses
a convenient rule to generate simulations and then uses an accept-reject
step to correct the simulation to match the target probability distribution.
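The sketch below illustrates the two ingredients under simple assumptions: the "convenient rule" is a Gaussian random-walk proposal, and the target is a standard normal specified only up to a constant. Because this proposal is symmetric, the Hastings correction cancels and the acceptance probability depends only on the ratio of target densities. The function name, step size, and chain length are hypothetical choices.

```python
import math
import random

def metropolis_hastings(log_target, start, step, n_samples, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    current = start
    current_logp = log_target(current)
    chain = []
    for _ in range(n_samples):
        # Convenient rule: perturb the current value with Gaussian noise.
        proposal = current + rng.gauss(0.0, step)
        proposal_logp = log_target(proposal)
        # Accept-reject step: accept with probability min(1, p(proposal)/p(current)).
        accept_prob = math.exp(min(0.0, proposal_logp - current_logp))
        if rng.random() < accept_prob:
            current, current_logp = proposal, proposal_logp
        # A rejected proposal repeats the current value in the chain.
        chain.append(current)
    return chain

def log_standard_normal(x):
    """Log-density of N(0, 1), omitting the normalizing constant."""
    return -0.5 * x * x

chain = metropolis_hastings(log_standard_normal, start=0.0, step=2.0, n_samples=50000)

chain_mean = sum(chain) / len(chain)
chain_var = sum((x - chain_mean) ** 2 for x in chain) / len(chain)
print(chain_mean, chain_var)
```

The chain mean and variance should be close to 0 and 1. Note that the normalizing constant of the target is never needed, which is what makes the method practical for posterior distributions.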
- Minimum Description Length (MDL)
- MDL resembles BIC because the two criteria have the same form, with
a penalty term p log n, where p is the number of parameters and n is
the sample size. However, their origins are different. In MDL, p log n is
a measure of model complexity when candidate models are from the exponential
family (in information theory, binary systems are most commonly treated).
In BIC, p log n is a by-product of the Laplace approximation.
- Model Averaging
- According to Wasserman (2000), model averaging refers to
the process of estimating some quantity under each model and then averaging
the estimates according to how likely each model is; on the other hand,
model selection refers to the problem of using the data to select one
model from the list of candidate models.
- Model Selection
- Well known statistical model selection criteria are Akaike Information
Criterion (AIC), Bayesian Information Criterion (BIC), and their
modifications which satisfy particular conditions such as small sample size.
The best model is considered to minimize the distance to the true model,
which, in general, is unknown. Various distance measures exist to suit
data properties. In astronomy, these model selection criteria have been used
to determine the number of components in star populations and
to choose the best theoretical model among candidates in cosmology.
- Most Compact Region
- ...
- Neyman-Pearson Lemma
- ...
- Normal Distribution
- The normal distribution is known as the Gaussian distribution in astronomy.
- Odds Ratio
- ...
- Poisson Likelihood
- ...
- Posterior Distribution
- The posterior distribution represents the updated knowledge regarding
the unknown model parameters after observing the data and other information
pertaining to the unknown parameters. Thus, the posterior distribution
combines the information in the prior distribution and the
likelihood (via Bayes' Theorem) and is a complete summary of
the knowledge regarding the unknown model parameters.
- Posteriors
- ...
- Power, Statistical
- ...
- Principal Components Analysis (PCA)
- ...
- Prior Distribution
- The prior distribution is a probability distribution that quantifies
knowledge regarding unknown quantities (e.g., model parameters) prior to
observing the data or other information pertaining to the unknown
quantities.
- Probability
- ...
- Probability Density Function (pdf)
- ...
- Random Variable (r.v.)
- A random variable is a function from S, the sample space, to R, the
real line; in other words, a numerical value calculated from the outcome
of a random experiment.
- Sampling Distribution
- The sampling distribution is the probability distribution of a point estimator.
- Skewness
- ...
- Symbols
- Some commonly used symbols
- |
-- indicates conditional probability, e.g., p(A|B)
is read as ``the probability that A is true
given that B is true.''
- ∼
-- Usually written as x ∼ f(...), denotes
that the variable x is distributed as a function of
the specified form. e.g.,
counts ∼ Po(lambda)
flux ∼ N(mean,stddev)
- E(.)
-- Expectation, or mean.
- lambda
-- Usually describes the Poisson intensity, unlike astrophysical usage,
where it is shorthand for wavelength.
- Unbiased Estimator
- ...
- Variance and Standard Deviation (SD)
- Standard deviation is a measure of the average fluctuation of X around its
mean, mu = E(X). It is the square root of the variance, which is defined as
V(X)=E[(X-mu)^2].
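As a small worked example of the definition, the sketch below treats a list of values as the equally likely outcomes of X, so that E(.) becomes a plain average. The numbers are hypothetical, and no small-sample correction (dividing by n-1) is applied because the definition above is the population variance.

```python
import math

def variance(values):
    """V(X) = E[(X - mu)^2], with every value equally likely."""
    mu = sum(values) / len(values)                     # mu = E(X)
    return sum((x - mu) ** 2 for x in values) / len(values)

def standard_deviation(values):
    """SD is the square root of the variance."""
    return math.sqrt(variance(values))

values = [2, 4, 4, 4, 5, 5, 7, 9]      # mean is 5
print(variance(values))                # 4.0
print(standard_deviation(values))      # 2.0
```

The standard deviation has the same units as X, which is why it is usually the quantity quoted as an error bar rather than the variance.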
- See also:
- Descriptions and usage of some of the terms listed here are
available at http://www.ics.uci.edu/~dvd/Astro/stat-jargon-for-astro.pdf
- The Astro Jargon for Statisticians: http://hea-www.harvard.edu/AstroStat/astrojargon.html
- The Chandra/CIAO Dictionary: http://asc.harvard.edu/ciao/dictionary/
- The CIAO Why Topics: http://asc.harvard.edu/ciao/why/
- The Statistics Glossary at IPAC/Level5