Q: Lowess error bars?

It is somewhat surprising that astronomers haven’t cottoned on to Lowess curves yet. That’s probably a good thing because I think people already indulge in smoothing far too much for their own good, and Lowess makes for a very powerful hammer. But the fact that it is semi-parametric and is based on polynomial least-squares fitting does make it rather attractive.

And, of course, sometimes it is unavoidable, or so I told Brad W. When one has too many points for a regular polynomial fit, and they are too scattered for a spline, and too few to try a wavelet “denoising”, and no real theoretical expectation of any particular model function, and all one wants is “a smooth curve, damnit”, then Lowess is just the ticket.

Well, almost.

There is one major problem: how does one figure out what the error bounds are on the “best-fit” Lowess curve? Clearly, each local fit at each point can produce an estimate of the error, but simply collecting the separate errors is not the right thing to do because they would all be correlated. I know how to propagate Gaussian errors in boxcar smoothing a histogram, but this is a whole new level of complexity. Does anyone know if there is software that can calculate reliable error bands on the smooth curve? We will take any kind of error model — Gaussian, Poisson, even the (local) variances in the data themselves.
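For concreteness, here is a minimal sketch of such a fit, using Python’s statsmodels implementation of lowess on simulated data (the noise level and smoothing span are arbitrary choices, not recommendations):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# lowess fits a weighted low-order polynomial in a sliding window;
# frac is the fraction of the data entering each local fit
smooth = lowess(y, x, frac=0.3)   # (N, 2) array of (x, fitted y)
```

Note that each row of `smooth` comes from its own local fit, which is exactly why the pointwise errors are correlated.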

  1. hlee:

    That the error has no physical meaning is my underlying assertion when discussing regression problems with astronomers. At rudimentary levels, it is assumed that the error (response − E[response | predictors or regressors]) distribution is normal. One of my first statistics classes was regression analysis, and the professor used Applied Regression Including Computing and Graphics by Cook and Weisberg, which comes with software called Arc. We played with lowess curves on errors to understand transformations of predictors and to check the normality of errors. Most standard statistics packages have lowess, I believe; maybe IMSL as well (I saw extensive stat routines under IMSL). Yet I’m not sure lowess could take care of your concerns about errors.

    p.s. I’ll delve into this more when I get back from my first AAS.
    p.p.s “best-fit” lowess curve sounds weird to me.

    06-03-2008, 2:38 pm
  2. vlk:

    Yes, R does have a lowess function, but it doesn’t produce an estimate of reliability. It doesn’t matter (at this time) what the assumptions are about the underlying error distribution of the data. Lowess produces a curve based on fitting polynomials separately at each point (so that’s why I called it a “best-fit” curve), and the question is, how robust is that curve, given that the data have scatter and/or that the data have measurement uncertainty?

    I suppose it is always possible to run a thousand Monte Carlo simulations based on the measured data errors, but I was looking for a faster, hopefully analytical, way to get the confidence band on the curve.
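That brute-force Monte Carlo would look something like the following sketch (simulated data with assumed-known Gaussian measurement errors; statsmodels’ lowess stands in for R’s, and 500 replications with frac=0.3 are arbitrary choices):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 150))
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)
sigma_y = np.full(x.size, 0.2)   # measurement errors, assumed known

# perturb the data by their measurement errors, refit, collect the curves
curves = np.array([
    lowess(y + rng.normal(scale=sigma_y), x, frac=0.3, return_sorted=False)
    for _ in range(500)
])

# pointwise 68% Monte Carlo band on the smooth curve
lo, hi = np.percentile(curves, [16, 84], axis=0)
```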

    06-03-2008, 2:55 pm
  3. Nick:

    I don’t know about measured data errors or similar. Loess by itself doesn’t really come equipped with a standard error calculator, since, if you’re a frequentist, it’s not clear just how the loess “parameters” should be distributed under the sampling distribution.

    Rather, people tend to use the bootstrap to find standard errors (just as they use cross-validation to find “best fits”). For an example of bootstrapped standard errors in Loess, check out the link: toward the middle of the page under the heading “Curve Fitting Example, Efron & Tibshirani, 7.3”.
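The residual bootstrap mentioned above can be sketched as follows (statsmodels’ lowess on simulated data; resample the residuals, add them back to the fit, and re-smooth):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, 150))
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)

fit = lowess(y, x, frac=0.3, return_sorted=False)
resid = y - fit

# resample residuals with replacement, add back to the fit, re-smooth
boot = np.array([
    lowess(fit + rng.choice(resid, size=resid.size, replace=True),
           x, frac=0.3, return_sorted=False)
    for _ in range(500)
])
se = boot.std(axis=0)   # pointwise bootstrap standard error of the curve
```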

    06-03-2008, 5:38 pm
  4. vlk:

    Thanks, Nick. I was afraid of that — no alternative to brute force bootstrap or Monte Carlo then!

    06-03-2008, 6:24 pm
  5. Nick:

    vlk, I don’t think it’s superhard to do the bootstrap. Also not, imho, super enlightening. I myself would love to become a pure Bayesian even in nonparametric settings, and in this case there may be some Bayesian alternatives which give similar results to Loess.

    You might check out Gelman’s post on it, but he says that there are no Bayesian versions of it. The comments back in 2005 do mention some Bayesian alternatives.

    One such alternative I would think is Gaussian Processes. If you google Gaussian Processes, you’ll see that there is even a webpage on them. The difficult part is choosing a prior for the covariance function. This choice could give a wide range of alternatives (you could even get ARIMA/ARMA type fits or probably a wide range of splines). It’s extremely general. Since posteriors only give confidence intervals in parameter space, I guess I’d use predictive distributions to get the confidence intervals in data space.
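A sketch of that Gaussian-process route, using scikit-learn’s implementation (the RBF-plus-white-noise kernel here is just one choice of covariance prior, which is exactly the hard part noted above; `return_std` gives the predictive band in data space directly):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, 100))
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)

# the covariance function is the prior choice; WhiteKernel absorbs the noise
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.04)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(x[:, None], y)

# predictive mean and standard deviation on a grid: the smooth curve + band
xg = np.linspace(0, 10, 200)
mean, std = gp.predict(xg[:, None], return_std=True)
```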

    BTW, it would be great to have a “preview” button for comments on this blog.

    06-04-2008, 4:41 am
  6. Nick:

    Oops. Sorry about the hyperlink. This is the reason why I need a preview button! I’ll be more careful next time.

    06-04-2008, 4:43 am
  7. hlee:

    I was going to suggest quantile regression, but I am surprised that there are already so many comments. Via quantile regression, you get best-fit regression results at a given quantile; the 25th and 75th percentiles will give regression fits bracketing a 50% error range. On the other hand, I learned lowess as a diagnostic tool, the way astronomers add error bars and a straight line to show how good a fit is, not as a best fit itself.

    By the way, thank you, Nick, for pointing out a technical improvement for the slog. I’m not sure whether it’s due to WordPress, the current theme, or my laziness that I wasn’t able to find a plug-in. I’ll definitely look into it and will do my best to include a preview button.

    06-04-2008, 4:36 pm
  8. awblocker:

    Typically, loess analyses are accompanied by a plot showing the original fit with a large number of bootstrap replications, produced by resampling the original loess residuals. However, these are also quite difficult to read. I favor some type of shaded density plot for the bootstrap replications. If the program you’re plotting in supports transparency, this can be done quickly by increasing the line width and dropping the opacity when plotting the bootstrapped loess curves.
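That shaded-density trick can be sketched with matplotlib: plot each bootstrapped curve thick and nearly transparent so the replications pile up into a visible band (the replication count and the alpha value are arbitrary):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")   # headless backend for scripted plotting
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 10, 100))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
fit = lowess(y, x, frac=0.3, return_sorted=False)
resid = y - fit

fig, ax = plt.subplots()
# thick, nearly transparent lines let the replications build up a density
for _ in range(200):
    b = lowess(fit + rng.choice(resid, size=resid.size), x,
               frac=0.3, return_sorted=False)
    ax.plot(x, b, color="steelblue", lw=3, alpha=0.02)
ax.plot(x, fit, color="black", lw=1.5)   # original fit drawn on top
fig.savefig("loess_band.png")
```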

    06-06-2008, 1:51 pm
  9. vlk:

    Thanks for the link to Gelman’s post, Nick (btw, I fixed that hyperlink!).

    Hyunsook, could you explain how quantile regression helps to generate smooth curves? I was under the impression that they are just another way to fit straight lines.

    Alex, you bring up another bugaboo: when one bootstraps loess curves, it is easy to get them braided up like a frayed rope. In such cases, a density plot tells only half the story. What kind of strategies do statisticians use to deal with that?

    06-07-2008, 7:09 pm
  10. hlee:

    I don’t think quantile regression is only for straight lines, although the given examples are the simplest cases. I understood quantile regression as a versatile, robust, and nonparametric method compared to traditional regression analysis, which is typically built under normal errors. Given thousands of data points to fit, I thought the bootstrap is not economically viable and quantile regression can be an alternative. I can be wrong, but with fitting as the objective, lowess does not appeal to me. It’s time to dust off the book.

    06-08-2008, 8:25 pm
  11. hlee:

    Not a preview button, but now one can see how one’s comment looks. Please let us know of any inconvenience from slogging. Thanks again.

    06-08-2008, 8:27 pm
Leave a comment