loess and lowess and locfit, oh my

Diab Jerius follows up on LOESS techniques with a very nice summary update and finds LOCFIT to be very useful, but questions remain about how it handles measurement errors and how to combine observations from different experiments:

A couple of weeks ago Vinay suggested using the LOESS algorithm to create smooth curves (separately) through the SSD and FPC points. LOESS grew out of the earlier LOWESS and has, in turn, been extended by LOCFIT, which is the 800-lb gorilla of local regression fitting.

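The LOCFIT algorithm uses local regression (i.e., at each point it fits a low-order polynomial by weighted least squares to the nearby data, with weights that fall off with distance) to generate smooth curves. There is an enormous body of literature on this, much of it summarized in the book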

Local Regression and Likelihood, by C. Loader
ISBN 0-387-98775-4

which also serves as documentation for the LOCFIT software. The techniques seem well established and accepted by the statistical community.
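To make the idea of local regression concrete, here is a minimal Python sketch of a LOWESS-style smoother. At each point it fits a straight line by weighted least squares, using tricube weights over the nearest fraction `frac` of the data. The function names and the `frac` parameter are illustrative assumptions. This is not the LOCFIT (or R `lowess`/`loess`) implementation, and it omits the robustness iterations that LOWESS adds.

```python
# Minimal LOWESS-style smoother: local linear fits with tricube weights.
# Illustrative sketch only: no robustness iterations; `frac` is an assumed name.

def tricube(u):
    """Tricube kernel (1 - |u|^3)^3 on |u| < 1, zero outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear(x, y, x0, frac=0.5):
    """Value at x0 of a straight line fitted by weighted least squares
    to the fraction `frac` of points nearest x0 (assumes len(x) >= 2)."""
    k = max(2, int(round(frac * len(x))))          # number of points in the window
    dists = sorted(abs(xi - x0) for xi in x)
    # Bandwidth: distance to the k-th nearest point, nudged up slightly so that
    # point keeps a tiny positive weight (avoids an all-zero window on ties).
    h = dists[k - 1] * (1.0 + 1e-6) + 1e-12
    w = [tricube((xi - x0) / h) for xi in x]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0.0 else 0.0        # flat fit if the window is degenerate
    return ybar + slope * (x0 - xbar)


def smooth(x, y, frac=0.5):
    """Evaluate the local linear fit at every observed x."""
    return [local_linear(x, y, x0, frac) for x0 in x]
```

Calling `smooth(x, y, frac=0.3)` returns the fitted value at each observed `x`. A larger `frac` averages over more points and gives a smoother (but potentially more biased) curve.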

LOCFIT looks to be a very elegant approach, but, unfortunately, I have still not been able to glean any information as to how one introduces experimental errors into the regressions. The voluminous research in this field certainly deals with experimental data, so I’m not quite sure what to make of this.

One way around this might be to take a Monte Carlo approach: resample the data using the experimental errors, generate a new smoothing function for each resampled data set, and repeat many times to build up the distribution of the fitted functions.
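The Monte Carlo idea above can be sketched as follows. The sketch assumes independent Gaussian errors `sigma` on each `y` value and uses a simple LOWESS-style local linear smoother, not LOCFIT itself. Each simulation perturbs the data by its errors and re-smooths it. The pointwise 16th, 50th, and 84th percentiles of the fitted curves then give a rough ±1σ band. The function names, the seed, and the choice of percentiles are assumptions for illustration.

```python
import random


def tricube(u):
    """Tricube kernel (1 - |u|^3)^3 on |u| < 1, zero outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear(x, y, x0, frac=0.5):
    """Weighted least-squares line over the nearest `frac` of points, evaluated at x0."""
    k = max(2, int(round(frac * len(x))))
    h = sorted(abs(xi - x0) for xi in x)[k - 1] * (1.0 + 1e-6) + 1e-12
    w = [tricube((xi - x0) / h) for xi in x]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return ybar + slope * (x0 - xbar)


def monte_carlo_band(x, y, sigma, frac=0.5, n_sims=200, seed=1):
    """Resample y_i ~ N(y_i, sigma_i), re-smooth each replica, and return the
    pointwise 16th, 50th and 84th percentiles of the fitted curves."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_sims):
        y_sim = [rng.gauss(yi, si) for yi, si in zip(y, sigma)]
        fits.append([local_linear(x, y_sim, x0, frac) for x0 in x])
    # Sort the simulated values at each x once, then read off percentiles.
    cols = [sorted(fit[j] for fit in fits) for j in range(len(x))]

    def percentile(q):
        return [col[int(round(q * (n_sims - 1)))] for col in cols]

    return percentile(0.16), percentile(0.50), percentile(0.84)
```

This propagates only the measurement errors. It says nothing about the bias introduced by the choice of smoothing span, which would need to be assessed separately (for example, by varying `frac`).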

For those interested, I have a copy of the above book on loan. It's fascinating reading.

More about the actual code is available at this web site:

In addition, Ping Zhao asks (paraphrasing): if you combine two separate sets of observations with vastly different numbers of data points, how should each set be weighted in a combined loess/lowess/locfit fit?
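One common way to approach Ping's question is to multiply the local kernel weights by a per-point prior weight. This is offered as an assumption, not as what LOCFIT does internally. The hypothetical helper `prior_weights` below starts from inverse-variance weights 1/σ² when errors are known, which is also one way to bring measurement errors into the fit. It can then optionally rescale the weights so that each set carries the same total weight, regardless of how many points it contains. The function names and the `equalize_sets` flag are illustrative.

```python
# Weighted local linear fit for pooled data sets (illustrative sketch).
# Final weight for point i = tricube kernel weight * prior weight.

def tricube(u):
    """Tricube kernel (1 - |u|^3)^3 on |u| < 1, zero outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def prior_weights(labels, sigma=None, equalize_sets=True):
    """Hypothetical helper: 1/sigma^2 per point (or 1 if no errors are given),
    optionally rescaled so that each set's weights sum to 1."""
    if sigma is not None:
        w = [1.0 / s ** 2 for s in sigma]
    else:
        w = [1.0] * len(labels)
    if equalize_sets:
        totals = {}
        for lab, wi in zip(labels, w):
            totals[lab] = totals.get(lab, 0.0) + wi
        w = [wi / totals[lab] for lab, wi in zip(labels, w)]
    return w


def weighted_local_linear(x, y, prior, x0, frac=0.5):
    """Value at x0 of a weighted least-squares line over the nearest `frac` of points."""
    k = max(2, int(round(frac * len(x))))
    # Bandwidth: distance to the k-th nearest point, nudged up slightly.
    h = sorted(abs(xi - x0) for xi in x)[k - 1] * (1.0 + 1e-6) + 1e-12
    w = [tricube((xi - x0) / h) * pw for xi, pw in zip(x, prior)]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return ybar + slope * (x0 - xbar)
```

For example, with 11 points in set A and 2 in set B, `prior_weights(["A"] * 11 + ["B"] * 2)` gives each A point weight 1/11 and each B point weight 1/2. Equalizing the sets keeps the larger set from swamping the smaller one. The cost is that a few points in the small set can pull the curve noticeably. Whether that is appropriate depends on whether the two sets are believed to measure the same quantity with comparable reliability.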

Comments and suggestions from statisticians are much appreciated!

  1. hlee:

    To get comments or suggestions from statisticians, it would be easier if you showed a plot of the data (response vs. predictors) to which locfit or loess is applied. Depending on the data and the objective of the study, various approaches to local regression or smoothing are available. It would also help to show what the measurement errors look like (how they are associated with the data) and how the two sets compare, so that an appropriate model can be derived (for example, a two-component mixture model, or simply proportions used to pool the uncertainty).

    In short: graphics for diagnostics, and for settling the modeling direction and assumptions, should come before any comments.

    08-15-2008, 8:51 pm
  2. vlk:

    See Fig 4.2 (lower panel) in the Chandra Proposer's Observatory Guide. The data shown there are similar to those that Diab and Ping are considering. There is an unexplained discrepancy between the SSD and FPC points (never mind what the acronyms mean; they are measurements of the same thing with different instruments), and the two sets need to be "spliced" smoothly after making suitable post-hoc adjustments.

    08-16-2008, 6:51 pm