A Quote on Models

In order to understand a learning procedure statistically it is necessary to identify two important aspects: its structural model and its error model. The former is most important since it determines the function space of the approximator, thereby characterizing the class of functions or hypotheses that can be accurately approximated with it. The error model specifies the distribution of random departures of sampled data from the structural model.

From "Additive logistic regression: a statistical view of boosting" by J. Friedman, T. Hastie, and R. Tibshirani (2000), Ann. Stat. Vol. 28(2), pp. 337-407.

I believe structural models represent relations among parameters and variables, as in mixture models, generalized linear models, Bayesian hierarchical models, and so on. Error models are generally marginalized to describe how the data fluctuate around the given structural model. For astronomers, structural models are often derived from physics and only the error models are built on statistics; this is where confusion creeps in when statisticians and astronomers talk about models. I have seen too often that simple Gaussian error models are adopted in astronomy almost by default, without verification. In general, error models do not stand alone; they are tied to the structural models they accompany.
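To make the distinction concrete, here is a minimal sketch, assuming a hypothetical power-law spectrum as the structural model (the function `structural_model` and its parameters `amplitude` and `gamma` are illustrative choices, not taken from the quoted paper). The structural model stays fixed while the error model is swapped: additive Gaussian noise with a constant `sigma`, or Poisson counts drawn with Knuth's multiplication method, which is adequate for modest means.

```python
import math
import random


# Hypothetical structural model: a power-law spectrum giving the expected
# counts mu(E) = amplitude * E**(-gamma). This fixes the function space.
def structural_model(energy, amplitude, gamma):
    return amplitude * energy ** (-gamma)


# Error model 1: additive Gaussian departures with a constant sigma.
def sample_gaussian(mu, sigma, rng):
    return rng.gauss(mu, sigma)


# Error model 2: Poisson counts, whose variance equals the mean.
# Knuth's multiplication method; fine for modest mu (exp(-mu) must not underflow).
def sample_poisson(mu, rng):
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(energies, amplitude, gamma, sigma, seed=0):
    """Draw one data set per error model around the same structural model."""
    rng = random.Random(seed)
    mus = [structural_model(e, amplitude, gamma) for e in energies]
    gaussian_data = [sample_gaussian(m, sigma, rng) for m in mus]
    poisson_data = [sample_poisson(m, rng) for m in mus]
    return mus, gaussian_data, poisson_data


if __name__ == "__main__":
    energies = [1.0, 2.0, 4.0, 8.0, 16.0]
    mus, g, p = simulate(energies, amplitude=20.0, gamma=1.5, sigma=2.0)
    for e, m, gi, pi in zip(energies, mus, g, p):
        print(f"E={e:5.1f}  mu={m:7.3f}  gaussian={gi:7.3f}  poisson={pi}")
```

At low expected counts the Gaussian draws can go negative, which no photon count can; the Poisson draws stay non-negative integers whose spread grows with the mean. The structural model is identical in both cases; only the error model differs.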

We know that encyclopedic references on statistical models exist that explain both structural and error models. A small additional effort to quantify astronomical structural models and their associated errors in statistically interpretable terms would open up a cornucopia of error models beyond the simple Gaussian one. I'm not saying that adopting a Gaussian is improper. The multivariate normal assumption serves well in many statistical data analysis problems, and estimators derived under the normal distribution are efficient when that assumption holds. What I would like to emphasize is that statistics has built useful models and strategies beyond the Gaussian error model, which properly account for the non-Gaussian cases that various exploratory data analysis tools reveal.
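One such model, and a natural candidate for photon-count data, is a Poisson generalized linear model with a log link. A minimal sketch, assuming a single predictor `x` and a hypothetical helper `fit_poisson_glm`: it maximizes the Poisson log-likelihood by Newton-Raphson, which for this canonical link coincides with iteratively reweighted least squares (IRLS), with simple step-halving as a safeguard. The structural model is log(mu_i) = b0 + b1 * x_i. Under a Gaussian error model with constant variance every point would carry equal weight; here the weights mu_i let the variance grow with the mean, as Poisson counts require.

```python
import math


def poisson_loglik(b0, b1, x, y):
    # Poisson log-likelihood for log(mu) = b0 + b1*x, dropping the constant -sum(log(y!)).
    return sum(yi * (b0 + b1 * xi) - math.exp(b0 + b1 * xi) for xi, yi in zip(x, y))


def fit_poisson_glm(x, y, iters=100, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i to counts y_i by Newton-Raphson (= IRLS here).

    Assumes at least one positive count and non-constant x.
    """
    # Start from the intercept-only MLE: log of the mean count, zero slope.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector X^T (y - mu).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information X^T W X with Poisson weights W = diag(mu).
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton direction: solve the 2x2 system H d = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Halve the step while the log-likelihood drops; a small slack absorbs rounding noise.
        step, current = 1.0, poisson_loglik(b0, b1, x, y)
        slack = 1e-12 * abs(current)
        while step > 1e-8 and poisson_loglik(b0 + step * d0, b1 + step * d1, x, y) < current - slack:
            step /= 2.0
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < tol:
            break
    return b0, b1
```

For example, `fit_poisson_glm([0, 1, 2, 3], [2, 3, 6, 9])` returns the intercept and slope on the log scale; exponentiating the slope gives the multiplicative change in expected counts per unit of `x`.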

One Comment
  1. Alex:

    It appears to me that many inference problems in astronomy are essentially variants of generalized linear models (of the Poisson distribution, log link variety, in particular). Granted, most of the problems we work on in the astrostatistics group are more complex due to a large number of parameters or other complications. However, for everyday astronomical analysis, GLMs seem like a natural improvement on linear regression and other methods based on Gaussian error models.

    10-08-2008, 10:12 pm