Thank you, Tom, for explaining what “bayes” in Sherpa does and why the LM algorithm does not work with it. On the latter point, it all comes down to the objective function: how it is defined determines which algorithms are suitable, and the strategy must change with the shape of the objective function. As a statistician, I would rather work on a robust one and make it work for spectral fitting. Instead of just saying LM does not work, I’d like to give a reason why it does not. But I did not know what’s inside (the “bayes” documentation neither explained it nor pointed to references), so I was curious. I hope the documentation you are preparing will be finished soon, and that “bayes” will then carry a better explanation of its function and shed more light on Bayesian statistics.

*I would like to know why it’s not working with Levenberg-Marquardt (LM)*

The LM algorithm uses the form of the chi**2 function to develop an approximation to derivatives of the fitting function, used to guide steps to improve the fit. Since “bayes” changes the fit function to something that is not in the chi**2 form (sum of weighted squared differences between data and model), standard LM can’t work with it (nor can it work with other non-Gaussian likelihoods). Put another way, LM is not a generic optimization algorithm; it is specifically tailored to chi**2 minimization. Powell is a generic algorithm, so it can work with the “bayes” marginal likelihood.
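To make the distinction concrete, here is a minimal sketch in plain Python. It is not Sherpa's code. The data, the one-parameter model `a * x`, and the helper names are all hypothetical, and a golden-section search stands in for a generic derivative-free optimizer (it is not Powell's method). The point it illustrates is that a Gauss-Newton style step, the core of LM, is built from a vector of weighted residuals, while a generic minimizer only needs a scalar objective, so it can also handle a Poisson-type statistic that has no such residual vector.

```python
import math

# Hypothetical data: counts d observed at positions x, model is a * x.
x = [1.0, 2.0, 3.0, 4.0]
d = [3.0, 5.0, 10.0, 11.0]
sigma = [math.sqrt(v) for v in d]  # assumed Gaussian errors for the chi**2 case

def residuals(a):
    """Weighted residuals; chi**2 is the sum of their squares."""
    return [(di - a * xi) / si for di, xi, si in zip(d, x, sigma)]

def chi2(a):
    return sum(r * r for r in residuals(a))

def gauss_newton_step(a):
    """One Gauss-Newton update. It needs the residual vector and its
    Jacobian, which exist only because the objective is a sum of squares."""
    r = residuals(a)
    jac = [-xi / si for xi, si in zip(x, sigma)]   # d(residual_i)/da
    jt_r = sum(j * ri for j, ri in zip(jac, r))
    jt_j = sum(j * j for j in jac)                 # approximates half the Hessian
    return a - jt_r / jt_j

def poisson_stat(a):
    """A Poisson log-likelihood statistic (constant terms dropped).
    It is a scalar objective with no residual vector behind it."""
    return 2.0 * sum(a * xi - di * math.log(a * xi) for di, xi in zip(d, x))

def golden_section_min(f, lo, hi, tol=1e-10):
    """Derivative-free 1-D minimizer: only evaluates the scalar f."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - invphi * (hi - lo)
    e = lo + invphi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(e):
            hi = e
        else:
            lo = c
        c = hi - invphi * (hi - lo)
        e = lo + invphi * (hi - lo)
    return 0.5 * (lo + hi)

a_chi2 = gauss_newton_step(1.0)                       # exact in one step: model is linear in a
a_pois = golden_section_min(poisson_stat, 0.1, 10.0)  # generic search on the Poisson statistic
```

The generic search would work just as well on `chi2`, but there is no way to hand `poisson_stat` to `gauss_newton_step`, because that step is defined in terms of residuals rather than the objective's value.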

Also, in case it wasn’t clear from my earlier comment, the “bayes” marginal likelihood does not *subtract* the background; it marginalizes over it (analytically). If you follow the same procedure for Gaussian noise, it just so happens that the result can be expressed in terms of subtracting a background estimate, but that is just a convenient “accident” that comes from the form of the Gaussian. From a Bayesian point of view, the right thing to do is *always* to marginalize an uncertain background, not to subtract off a background estimate.
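As a rough illustration of marginalizing versus subtracting, the sketch below compares the two for Poisson counts in an on-source and an off-source region. This is not the “bayes” implementation, which marginalizes analytically. Here the integral over the background rate is done by brute-force midpoint quadrature, the prior on the background rate is assumed flat, and the counts, exposures, grids, and function names are all made up for the example.

```python
import math

def log_poisson(n, mu):
    """Log Poisson probability of n counts given expected counts mu > 0."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)

def marginal_likelihood(s, n_on, t_on, n_off, t_off, b_max, steps=2000):
    """Likelihood for source rate s with background rate b integrated out.
    Flat prior on b over (0, b_max) is an assumption; midpoint rule."""
    db = b_max / steps
    total = 0.0
    for k in range(steps):
        b = (k + 0.5) * db
        log_joint = (log_poisson(n_on, (s + b) * t_on)
                     + log_poisson(n_off, b * t_off))
        total += math.exp(log_joint) * db
    return total

def subtracted_rate(n_on, t_on, n_off, t_off):
    """Background-subtracted estimate of the source rate."""
    return n_on / t_on - n_off / t_off

def marginal_mode(s_grid, n_on, t_on, n_off, t_off, b_max):
    """Grid value of s >= 0 that maximizes the marginal likelihood."""
    return max(s_grid, key=lambda s: marginal_likelihood(
        s, n_on, t_on, n_off, t_off, b_max))

# Low counts: subtraction gives a negative (unphysical) source rate.
low = dict(n_on=2, t_on=1.0, n_off=30, t_off=10.0)
s_sub_low = subtracted_rate(**low)
s_marg_low = marginal_mode([0.1 * i for i in range(51)], b_max=10.0, **low)

# High counts: the two estimates land close to each other.
high = dict(n_on=200, t_on=1.0, n_off=1000, t_off=10.0)
s_sub_high = subtracted_rate(**high)
s_marg_high = marginal_mode([float(s) for s in range(80, 121)],
                            b_max=200.0, **high)

print(s_sub_low, s_marg_low)
print(s_sub_high, s_marg_high)
```

In the low-count case the subtracted rate is negative, while the marginal likelihood is only ever evaluated for non-negative source rates and peaks at the boundary. In the high-count case, where the Poisson distributions are close to Gaussian, the two approaches roughly agree.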

The only reference for the “bayes” algorithm is my paper in the first SCMA volume (ADS link), though I’m working on a more complete description of it (and some related algorithms). It’s also described in most of my CASt summer school lectures. I believe Harrison Prosper independently derived a similar algorithm (for particle physics applications) around the same time. Finally, the quadrature version of the CHASC Bayesian hardness ratio work I think uses a similar algorithm as a first step; I think it’s described in the Appendix to that paper.

Part of your confusion about this is the use of the word “constant.” To a frequentist statistician, it immediately implies a quantity that is not random (and thus cannot be legitimately assigned a frequency distribution). From the Bayesian point of view, what is distributed in a probability distribution p(x) is the *probability*, p (it’s distributed over the possible values of x, much like a matter density, rho(x)), not the argument, x (considered to be distributed over its possible values in many repeated observations, in the frequentist interpretation). So it is legitimate to talk about the probability distribution for something believed to be constant.

Brian: There is nothing the least bit controversial in your statement that Bayesian inference is broader than Bayes’s theorem, the latter just being one of the arsenal of tools used in Bayesian calculations. For many years in my own tutorial lectures on this (e.g., as archived in the CASt summer school lectures), I have explicitly emphasized that Bayesian inference uses *all* of probability theory, the important distinction being that it calculates probabilities *for hypotheses*. In fact, in my lectures, after deriving Bayes’s theorem, I also derive the law of total probability (the marginalization rule), and state that, in my own applications, I wind up using it more often than Bayes’s theorem. Anyway, I think you’d be hard-pressed to find someone who really uses Bayesian methods who would disagree with your insight!
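For readers who want the two rules side by side, here they are in standard textbook form. The notation is an assumption of this note rather than something taken from the lectures: H_i are mutually exclusive, exhaustive hypotheses, D is the data, I is the background information, and s and b stand for a parameter of interest and a nuisance parameter.

```latex
% Product rule (from the definition of conditional probability), written two ways:
p(H_i, D \mid I) = p(H_i \mid I)\, p(D \mid H_i, I) = p(D \mid I)\, p(H_i \mid D, I)

% Solving for the posterior gives Bayes's theorem:
p(H_i \mid D, I) = \frac{p(H_i \mid I)\, p(D \mid H_i, I)}{p(D \mid I)}

% Law of total probability (marginalization rule) for the normalizing term:
p(D \mid I) = \sum_i p(H_i \mid I)\, p(D \mid H_i, I)

% The same rule for a continuous nuisance parameter b:
p(s \mid D, I) = \int p(s, b \mid D, I)\, \mathrm{d}b
```

The last line is the operation applied to an uncertain background: it is integrated out of the joint distribution rather than replaced by a point estimate.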

using Bayes’ Theorem:”. I am not going to say that this quote is wrong, only that the posterior can be derived without Bayes’ theorem. Posteriors can be derived just using the definitions of conditional probabilities. It seems that whenever Bayesian statistics is mentioned, it is automatically assumed that all posteriors come from Bayes’ theorem and that Bayes’ theorem is the backbone of Bayesian statistics (this was my belief as well when I was first introduced to the Bayesian paradigm). I do not believe that this is true; I believe rather that the backbone of Bayesian statistics is epistemic probability. I just thought I would add this for fun and to generate more discussion. In particular, I am really interested in other viewpoints on this.