Presentations 

Cecilia Garraffo (CfA) Sep 06, 2023 Noon EDT SC706 
 AstroAI: Integrating Artificial Intelligence into Astrophysics
 Abstract: AstroAI, launched at the Center for Astrophysics | Harvard & Smithsonian (CfA) in November 2022, is a novel initiative focused on developing machine learning (ML) and artificial intelligence (AI) algorithms to further astrophysical research. Its inception was driven by the recognized need, both within the CfA and the broader scientific community, for dependable and interpretable models in astrophysics research. At its core, AstroAI aims to create AI and ML models designed for astrophysical discovery, emphasizing a multidisciplinary approach and collaboration among a diverse group of researchers. This talk will outline the progress and growth of AstroAI since its inception, highlight some of the key projects undertaken by our team, and showcase their transformative potential in astrophysical research.
 Presentation Video [!yt]


Mengyang Gu (UC Santa Barbara) Sep 13, 2023 Noon EDT SC706 
 Calibration of imperfect geophysical models by multiple satellite interferograms with measurement bias
 Abstract:
Model calibration consists of using experimental or field data to estimate the unknown parameters of a mathematical model. The presence of model discrepancy and measurement bias in the data complicates this task. Satellite interferograms, for instance, are widely used for calibrating geophysical models in geological hazard quantification. In this work, we used satellite interferograms to relate ground deformation observations to the properties of the magma chamber at Kilauea Volcano in Hawaiʻi. We derived closed-form marginal likelihoods and implemented posterior sampling procedures that simultaneously estimate the model discrepancy of physical models and the measurement bias from atmospheric error in satellite interferograms. We found that model calibration by aggregating multiple interferograms and downsampling the pixels in the interferograms can reduce the computational complexity compared to calibration approaches based on multiple data sets. We study the conditions under which data aggregation and downsampling entail no loss of information. Simulations illustrate that both discrepancy and measurement bias can be estimated, and real applications demonstrate that modeling both effects helps obtain reliable estimates of a physical model's unobserved parameters and enhances its predictive accuracy. We implement the computational tools in the RobustCalibration package available on CRAN.
 References:
Gu, M., & Wang, L. (2018). Scaled Gaussian stochastic process for computer model calibration and prediction. SIAM/ASA Journal on Uncertainty Quantification, 6(4), 1555-1583.
Gu, M., Xie, F., & Wang, L. (2022). A Theoretical Framework of the Scaled Gaussian Stochastic Process in Prediction and Calibration. SIAM/ASA Journal on Uncertainty Quantification, 10(4), 1435-1460.
Gu, M., Anderson, K., & McPhillips, E. (2023). Calibration of imperfect geophysical models by multiple satellite interferograms with measurement bias. Technometrics, in press. arXiv:1810.11664 [!arXiv]
Gu, M., He, Y., Liu, X., & Luo, Y. (2023). Ab initio uncertainty quantification in scattering analysis of microscopy. arXiv:2309.02468 [!arXiv]
 Presentation slides [.pdf]
 Presentation video [!yt]


Ashley Villar & Rafael Martínez-Galarza (CfA) Oct 04, 2023 Noon EDT SC706 
 Project: A Variational Autoencoder-inspired Mixture of Poissons to classify X-ray photon lists
 In the low-count limit, astrophysical phenomena follow Poisson distributions across a range of energies and times. Learning meaningful representations of these events remains a challenging endeavor; however, such representations can aid in a number of downstream scientific tasks: classification, anomaly detection, and potentially inference. Here, we present a project pitch to build a probabilistic (Poisson-based) neural network, inspired by a variational autoencoder, to find meaningful representations of astronomical light curves.
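A minimal sketch of the kind of objective such a model might optimize (the function names and the diagonal-Gaussian latent are illustrative assumptions, not the project's actual design): a Poisson negative log-likelihood replaces the usual Gaussian reconstruction term in a VAE-style loss.

```python
import numpy as np

def poisson_nll(counts, rate):
    """Poisson negative log-likelihood, dropping the constant log(k!) term."""
    rate = np.clip(rate, 1e-12, None)
    return float(np.sum(rate - counts * np.log(rate)))

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return float(0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var))

def negative_elbo(counts, decoded_rate, mu, log_var):
    """VAE-style loss: Poisson reconstruction term plus KL regulariser."""
    return poisson_nll(counts, decoded_rate) + gaussian_kl(mu, log_var)
```

The reconstruction term is minimized when the decoded rate equals the observed counts, which is what makes it a natural drop-in for low-count data.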


Aneta Siemiginowska (CfA) Oct 11, 2023 Noon EDT SC706 
 Why time-delays?
 Time-delays are often encountered in astronomical measurements. They provide otherwise unresolved intrinsic scales of a variable source or, in the case of gravitational lensing, constraints on the cosmological parameters. I will present an astronomer's view of time-delay applications, discuss our recent model for time-delays due to gravitational lensing, future directions, and open projects.
 Presentation slides [.pdf]
 Presentation video [!yt]
 See also: Tak et al. 2015, AoAS 11, 1309; Meyer et al. 2023, ApJ 950, 37


Pavlos Protopapas (SEAS) Oct 18, 2023 Noon EDT SC706 
 Residual-Based Error Bound for Physics-Informed Neural Networks
 Abstract: Neural networks are universal approximators and are studied for their use in solving differential equations. However, a major criticism is the lack of error bounds for the obtained solutions. In this talk I will describe a technique to rigorously evaluate the error bound of Physics-Informed Neural Networks (PINNs) on most linear ordinary differential equations (ODEs), certain nonlinear ODEs, and first-order linear partial differential equations (PDEs). The error bound is based purely on equation structure and residual information and does not depend on assumptions about how well the networks are trained.
We propose algorithms that bound the error efficiently.
 Reference:
Liu et al. 2023, arXiv:2306.03786 [!arXiv]
 Presentation video [!yt]
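The flavor of a residual-based bound can be illustrated on a toy problem (a textbook Grönwall-type bound for u' + u = 0 with a Taylor polynomial standing in for the trained network; this is a sketch under those assumptions, not the paper's general construction):

```python
import numpy as np

# Test problem: u' + u = 0, u(0) = 1, with exact solution exp(-t).
# Stand-in for a trained PINN: the degree-3 Taylor polynomial of exp(-t).
u_hat  = lambda t: 1 - t + t**2 / 2 - t**3 / 6
du_hat = lambda t: -1 + t - t**2 / 2

t = np.linspace(0.0, 1.0, 201)
residual = du_hat(t) + u_hat(t)   # how badly u_hat violates the ODE

# The error e = u_hat - u satisfies e' + e = residual with e(0) = 0,
# so |e(t)| <= max|residual| * (1 - exp(-t)): a bound computable from
# the residual alone, without knowing the exact solution.
bound = np.max(np.abs(residual)) * (1 - np.exp(-t))
true_error = np.abs(u_hat(t) - np.exp(-t))
```

The key point, matching the abstract, is that the bound uses only the equation structure and the residual, never the (unknown) exact solution.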


Herman Marshall (MIT), Subramania Athiray (UAlabama), & Vinay Kashyap (CfA) Nov 8, 2023 Noon EST SciCen 706 
 Deconvolving dispersed grating spectra from extended sources
 Abstract: We will present the mostly unsolved problem of deconvolving high-resolution grating-dispersed spectra of extended sources. We will show examples of the data from Chandra, and some examples of how solar physicists are modeling data from the dispersed Sun in the high-count regime when there are strong line features in the spectrum. Can this be extended to smoother spectra in the Poisson regime?
 See also: Winebarger et al. 2019, ApJ 882, 12, Unfolding Overlapped Slitless Imaging Spectrometer Data for Extended Sources [!ads]
 Slides:
Herman Marshall [.key]
Vinay Kashyap [.key]
Subramania Athiray [.pptx]


Adel Daoud (Linköping/Chalmers) 24 Jan 2024 Noon EST SC706 
 Are You Devising an Observatory of Extraterrestrial Life? Lessons Learned from an Observatory of Poverty: Measuring Living Conditions on Planet Earth with AI and Earth Observations
 Abstract:
The question, "Is there other, especially intelligent, life in the Universe?" is one of the most intriguing questions in the sciences and beyond. If there is indeed life on other planets and the only means of observing it is through high-resolution satellite images, a follow-up question would be, "How may we use those images to measure extraterrestrial activities on the surface of their planets?" This talk gives some pointers toward addressing that follow-up question by showing how we, at the AI and Global Development Lab, are measuring health and living conditions on Earth using satellite images and deep learning. The Lab is currently measuring historical and geographical development trajectories from satellite images from the 1990s to the present, focusing on the African continent. These measurements are our data product, capturing living conditions at unprecedented temporal and spatial granularity. This talk will discuss key scientific challenges and research prospects.
 Presentation video [!yt]


Ana-Sofia Uzsoy (Harvard) 7 Feb 2024 Noon EST SC706 
 Variational Inference for Acceleration of SN Ia Photometric Distance Estimation with BayeSN
 Abstract:
We use variational inference (VI) to fit the light curves of Type Ia supernovae (SN Ia) using the BayeSN hierarchical Bayesian model for SN Ia spectral energy distributions. We fit both simulated light curves and data from the Foundation Supernova Survey with two different forms of surrogate posterior (a multivariate normal and a custom multivariate zero-lower-truncated normal distribution) and compare them with baseline MCMC fits and the Laplace approximation. To evaluate the accuracy of our variational approximation, we calculate the Pareto-smoothed importance sampling (PSIS) diagnostic and perform variational simulation-based calibration (VSBC). The VI approximation achieves similar results to MCMC but with significantly reduced runtime. Overall, we show that VI is a promising method for scalable parameter inference as we enter the era of "big data".
 Presentation slides [.pptx]
 Presentation video [!yt]
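The idea behind diagnostics like PSIS can be shown in miniature (without the Pareto smoothing, on a toy 1-D target of my own choosing): draw from the surrogate, compute importance weights against the target, and inspect their effective sample size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unnormalised target log-density: a standard normal "posterior".
def log_p(x):
    return -0.5 * x**2

# A (deliberately imperfect) Gaussian surrogate posterior.
mu_q, sigma_q = 0.1, 1.2
x = rng.normal(mu_q, sigma_q, 5000)
log_q = -0.5 * ((x - mu_q) / sigma_q) ** 2 - np.log(sigma_q)

# Self-normalised importance weights of target over surrogate.
log_w = log_p(x) - log_q
w = np.exp(log_w - log_w.max())
w /= w.sum()

# Effective sample size: close to n means the surrogate matches the target.
ess = 1.0 / np.sum(w**2)
posterior_mean = float(np.sum(w * x))
```

A badly mismatched surrogate shows up as heavy-tailed weights and a small effective sample size, which is precisely what PSIS quantifies with its Pareto shape diagnostic.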


Axel Donath (CfA) 14 Feb 2024 Noon EST SC706 
 Joint Likelihood Deconvolution of Astronomical Images in the Presence of Poisson Noise
 Abstract:
I will present a new method for Joint Likelihood Deconvolution (Jolideco) of astronomical images in the presence of Poisson noise. The method reconstructs a single flux image from a set of observations of the same sky region by optimizing the joint a posteriori Poisson likelihood of all observations under a patch-based image prior. Simulations demonstrate that both the combination of multiple observations and the patch-based prior lead to a much improved reconstruction quality compared to alternative methods such as the Richardson-Lucy method. I will showcase some results using example data from the Chandra observatory and conclude with an overview of open questions, most importantly the question of uncertainties on reconstructed flux images.
 Presentation slides [.pdf]
 Presentation video [!yt]
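The joint likelihood at the core of the method can be sketched as follows, with the per-observation PSF and the patch prior omitted for brevity (the dictionary layout of an observation is my own simplification, not Jolideco's API):

```python
import numpy as np

def joint_poisson_nll(flux, observations):
    """Joint Poisson negative log-likelihood of a single flux image across
    several observations of the same region. Each observation carries its
    own exposure and counts; the instrument PSF (a convolution in the full
    method) is taken as the identity here to keep the sketch short."""
    nll = 0.0
    for obs in observations:
        expected = np.clip(obs["exposure"] * flux, 1e-12, None)
        nll += np.sum(expected - obs["counts"] * np.log(expected))
    return float(nll)
```

Because all observations share one flux image, every dataset constrains the same reconstruction, which is what gives the joint approach its advantage over deconvolving each observation separately.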


Xiangyu Zhang (Minnesota) Feb 21, 2024 11am CST Zoom 
 On smooth tests of goodness-of-fit for astrophysical searches under high background
 Abstract:
Smooth tests were first introduced by Neyman (1937) as a comprehensive approach to goodness-of-fit (GOF). Compared to classical GOF tests, such as Kolmogorov-Smirnov or Cramér-von Mises, smooth tests use an alternative model that incorporates the null through a series of orthonormal basis functions (e.g., shifted Legendre polynomial or cosine bases). As a result, they concentrate their power on a limited number of directions. A particularly appealing feature of smooth tests is that, when the null model is rejected, they naturally provide a correction for it. This aspect will be illustrated in the context of detecting line emissions under a high background. New methodological developments on the construction of distribution-free smooth tests that are unaffected by post-selection inference problems will also be discussed.
 Presentation slides [.pdf]
 Presentation Video [!yt]
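A minimal version of Neyman's smooth test statistic for a Uniform(0, 1) null, using the first two shifted Legendre polynomials (a sketch of the classical construction, not the distribution-free variant discussed in the talk):

```python
import numpy as np

def smooth_test_statistic(u, k=2):
    """Neyman smooth test statistic of order k for H0: u_i ~ Uniform(0, 1),
    built from shifted Legendre polynomials (orthonormal on [0, 1]).
    Under H0 it is asymptotically chi-squared with k degrees of freedom."""
    basis = [
        lambda u: np.sqrt(3.0) * (2.0 * u - 1.0),
        lambda u: np.sqrt(5.0) * (6.0 * u**2 - 6.0 * u + 1.0),
    ]
    n = len(u)
    comps = [np.sum(phi(u)) / np.sqrt(n) for phi in basis[:k]]
    return float(sum(c**2 for c in comps))
```

Each squared component measures departure from uniformity in one basis direction, which is why the test concentrates its power on a limited number of directions and, on rejection, indicates how to correct the null.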


Yang Chen (Michigan) & Max Bonamente (UAH) Feb 28, 2024 Noon EST/11am CST Zoom 
 Cstatapalooza

 Yang Chen: Comparison of Goodness-of-fit Assessment Methods with C statistics in Astronomy
 Abstract:
In astrophysics, the C statistic, a likelihood ratio statistic, has been widely adopted for model fitting and goodness-of-fit assessment for Poisson-count data with heterogeneous rates. It is well known that when the sample size is very large, the C statistic enjoys convenient theoretical properties, especially in the large-mean limit. However, in many astronomy and high-energy physics applications, the observations are very sparse, making these theoretical properties questionable. We comprehensively study the properties of the C statistic and evaluate various algorithms for goodness-of-fit assessment using C statistics, emphasizing low-count scenarios.
 Presentation slides [.pdf]
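One common form of the statistic (the goodness-of-fit variant, zero when model and data agree exactly) can be computed directly:

```python
import numpy as np

def cstat(counts, model):
    """Goodness-of-fit form of the C (Cash) statistic,
    2 * sum(m - d + d * log(d / m)), with the convention d*log(d) = 0 at
    d = 0. It is zero when the model matches the data exactly and is only
    approximately chi-squared distributed in the large-mean limit."""
    d = np.asarray(counts, dtype=float)
    m = np.asarray(model, dtype=float)
    term = m - d
    nz = d > 0
    term[nz] += d[nz] * np.log(d[nz] / m[nz])
    return 2.0 * float(np.sum(term))
```

The talk's point is precisely that the chi-squared calibration of this quantity breaks down when the counts per bin are small.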

 Max Bonamente: Systematic errors and Poisson regression
 Abstract: A new statistical method is proposed that includes systematic errors in the analysis of Poisson data, especially for the purpose of regression analysis and subsequent hypothesis testing. The method is based on the introduction of an intrinsic model variance, which is enforced after the usual maximum-likelihood regression is performed. With this method, the usual goodness-of-fit statistic (the Poisson deviance, also known as the Cash statistic) becomes distributed like a newly introduced overdispersed chi-squared distribution under the null hypothesis, at least in the large-mean limit. This new distribution defaults to the usual chi-squared when systematic errors are negligible, and continues to be normally distributed for extensive data. The method also offers the opportunity to estimate systematic errors when they cannot be estimated a priori. It is hoped that this model, which is simple to use for most applications, offers an answer to the quest for a simple and statistically motivated means of handling systematic errors in count data.
 Presentation slides [.pdf]

 meeting chat [.txt]
 Presentation video [!yt]


Alexandre Bayle (Harvard) Apr 3, 2024 Noon EDT Zoom+SciCen706 
 How Good is my Learning Algorithm? Building Cross-Validation Confidence Intervals for Test Error
 Abstract:
How good is my learning algorithm? Is algorithm A actually better than algorithm B? Cross-validation is a de facto standard for addressing these questions by providing an estimate of the test error of prediction rules. However, for high-stakes applications in which the uncertainty of an error estimate impacts decision-making, properly quantifying the uncertainty of the cross-validation estimate is crucial and requires a valid treatment of the dependence that comes with this sample-splitting scheme. In this work, we present our method for achieving this objective and prove its theoretical validity. We develop central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically exact confidence intervals for k-fold test error and valid, powerful hypothesis tests of whether one learning algorithm has smaller k-fold test error than another. These results are also the first of their kind for the popular choice of leave-one-out cross-validation. In our real-data experiments with diverse learning algorithms, the resulting confidence intervals and tests outperform the most popular alternative methods from the literature (we will cover these methods in the presentation).
 Bayle et al. 2020, Cross-validation Confidence Intervals for Test Error. arXiv:2007.12671 [.pdf]
 Presentation slides [.pdf]
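A simplified version of the construction (pooling the per-observation hold-out losses and applying a normal approximation; the paper's variance estimator treats the fold dependence more carefully than this sketch does):

```python
import numpy as np

def kfold_error_ci(X, y, fit, predict, loss, k=5, z=1.96, seed=0):
    """k-fold CV point estimate of test error with a CLT-based confidence
    interval built from the per-observation hold-out losses."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    losses = np.empty(len(y), dtype=float)
    for f in folds:
        train = np.setdiff1d(idx, f)        # everything outside this fold
        model = fit(X[train], y[train])
        losses[f] = loss(y[f], predict(model, X[f]))
    est = float(losses.mean())
    half = z * losses.std(ddof=1) / np.sqrt(len(y))
    return est - half, est, est + half

# Usage on a toy problem: the "learning algorithm" predicts the training mean.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1))
y = rng.normal(size=200)
fit = lambda X_tr, y_tr: y_tr.mean()
predict = lambda model, X_te: np.full(len(X_te), model)
squared = lambda y_te, pred: (y_te - pred) ** 2
lo, est, hi = kfold_error_ci(X, y, fit, predict, squared, k=5)
```

On this toy problem the squared-error risk of the mean predictor is close to 1, and the interval is a plug-in normal interval around the pooled CV estimate.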


Souhardya Sengupta (Harvard) Apr 17, 2024 Noon EDT SciCen 706 
 A tutorial on Causal Inference and its relevance in Astrophysics
 Abstract:
This talk will provide a basic introduction to causation and the statistical methodologies that aim to draw such inferences. We will start with an introduction to the potential outcomes framework and build on that to discuss population estimands that help us draw causal conclusions from an experiment, along with various techniques for their inference. The majority of this talk will focus on observational studies, where the scientist has no control over the treatment mechanism. In this part, we will discuss the concept of confounding and various relevant estimators in the presence of such confounders, followed by an introduction to sensitivity analysis, which establishes how sensitive our results are to the presence of any unmeasured confounders. Finally, if time permits, I will talk about structural causal models and their applications in astrophysics.
 Presentation slides [.pdf]
 Presentation video [!yt]
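Two of the estimators typically covered in such an introduction, sketched under the usual randomization/unconfoundedness assumptions (the function names are mine):

```python
import numpy as np

def diff_in_means(y, t):
    """Average treatment effect estimate in a randomized experiment:
    mean outcome of the treated minus mean outcome of the controls."""
    return float(y[t == 1].mean() - y[t == 0].mean())

def ipw_ate(y, t, e):
    """Inverse-propensity-weighted ATE estimator for observational data,
    where e is the propensity score P(T = 1 | X); valid under
    unconfoundedness given the covariates used to build e."""
    return float(np.mean(t * y / e - (1 - t) * y / (1 - e)))
```

In a randomized experiment the two estimators agree in expectation; in an observational study only the weighted version corrects for measured confounding, and sensitivity analysis then probes what unmeasured confounding could do to it.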


Jason Siyang Li (Imperial) Apr 24, 2024 Noon EDT SciCen 706 
 Estimating the Luminosity Function in the presence of "Dark" sources (with a new method for statistical marginalisation)
 Abstract:
Studies of populations of X-ray sources are strongly affected by detectability. We have developed a method to bypass limitations in X-ray source detection algorithms and model luminosity functions using catalogues available at other wavelengths. We propose a hierarchical model that allows estimation of individual source intensities simultaneously with parameters that describe the population of sources. It allows sources to be X-ray-dark by using zero-inflated distributions on the source intensity parameters. This hierarchical model is typical of statistical models in high-energy astrophysics, in that it contains numerous parameters and latent variables, accounting for the complexities of the instruments, the large number of X-ray sources in the population, and the characteristics of the population.
However, posterior sampling methods such as MCMC and nested sampling can be inefficient in large parameter spaces, making it hard to obtain posterior samples from the hierarchical model. A well-known remedy is to deploy the posterior sampler on a lower-dimensional marginal distribution of the posterior, which we call "statistical marginalisation" of the posterior distribution.
To obtain such a statistical marginalisation, we introduce a new method to integrate over the population of source intensity parameters using moment generating functions. We present the link between the integral over a population of parameters and marginal likelihood computation. As a natural extension, we show that the moment generating function method is also useful for exact computation of marginal likelihoods under certain assumptions.
 Presentation slides [.pdf]
 Presentation video [!yt]
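A toy instance of integrating source intensities out in closed form: with a zero-inflated Gamma prior on the intensity and Poisson counts given the intensity, the Gamma component marginalises to a negative binomial. This is an illustrative stand-in for the talk's more general moment-generating-function construction, with parameter names of my own choosing:

```python
from math import lgamma, log, exp

def zip_gamma_marginal(k, pi0, alpha, beta):
    """Marginal pmf of a count k when the source intensity is exactly zero
    with probability pi0 ("dark") and Gamma(alpha, rate=beta) otherwise,
    with Poisson counts given the intensity. The Gamma component
    integrates out in closed form to a negative binomial."""
    log_nb = (lgamma(k + alpha) - lgamma(alpha) - lgamma(k + 1)
              + alpha * log(beta / (beta + 1.0))
              + k * log(1.0 / (beta + 1.0)))
    return pi0 * (k == 0) + (1.0 - pi0) * exp(log_nb)
```

The dark component only contributes at k = 0, which is how zero inflation lets the model place positive probability on sources that are never detected in X-rays.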


Siddharth Vishwanath (UCSD) May 1, 2024 Noon EDT Zoom 
 Repelling-Attracting Hamiltonian Monte Carlo
 Abstract: We propose a variant of Hamiltonian Monte Carlo (HMC), called Repelling-Attracting Hamiltonian Monte Carlo (RAHMC), for sampling from multimodal distributions. The key idea that underpins RAHMC is a departure from the conservative dynamics of Hamiltonian systems, which form the basis of traditional HMC, turning instead to the dissipative dynamics of conformal Hamiltonian systems. In particular, RAHMC involves two stages: a mode-repelling stage to encourage the sampler to move away from regions of high probability density, and a mode-attracting stage, which helps the sampler find and settle near alternative modes. We achieve this by introducing just one additional tuning parameter: the coefficient of friction. The proposed method adapts to the geometry of the target distribution, e.g., modes and density ridges, and can generate proposals that cross low-probability barriers with little to no computational overhead in comparison to traditional HMC. Notably, RAHMC requires no additional information about the target distribution or memory of previously visited modes. We establish the theoretical basis for RAHMC and discuss repelling-attracting extensions to several variants of HMC in the literature. Finally, we provide a tuning-free implementation via dual averaging and demonstrate its effectiveness in sampling from both multimodal and unimodal distributions in high dimensions.
 Presentation slides: [!sidvishwanath.com] ; [.pdf]
 Presentation video [!yt]
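The conformal (dissipative) dynamics can be sketched with a leapfrog integrator that wraps each step in a friction factor on the momentum; positive friction dissipates energy and negative friction injects it, mirroring the attracting and repelling stages (a sketch of the underlying dynamics, not the authors' implementation):

```python
import numpy as np

def conformal_leapfrog(q, p, grad_u, h, gamma, n_steps):
    """Leapfrog-style integrator for the conformal Hamiltonian dynamics
    dq/dt = p, dp/dt = -grad_U(q) - gamma * p.
    gamma > 0 dissipates energy (mode-attracting stage); gamma < 0 injects
    energy (mode-repelling stage)."""
    for _ in range(n_steps):
        p = p * np.exp(-gamma * h / 2.0)   # half-step of friction
        p = p - 0.5 * h * grad_u(q)        # half-step momentum kick
        q = q + h * p                      # full-step position drift
        p = p - 0.5 * h * grad_u(q)        # half-step momentum kick
        p = p * np.exp(-gamma * h / 2.0)   # half-step of friction
    return q, p
```

With gamma = 0 this reduces to ordinary leapfrog, which (approximately) conserves energy; the single friction coefficient is the one extra tuning parameter the abstract mentions.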


Giovanni Motta (Columbia) May 8, 2024 Noon EDT Zoom+SciCen706 
 Detecting stellar flares using conditional volatility
 Abstract: For more than forty years now, discrete-time models have been developed to reflect the so-called stylized features of financial time series. These properties, which include tail heaviness, asymmetry, volatility clustering, and serial dependence without correlation, cannot be captured with traditional linear ARMA time series models. Continuous-time ARMA (CARMA) models are the continuous-time version of the well-known ARMA models, and they are convenient for modeling astronomical data, which are often unequally spaced in time. In this talk we will review ARMA and CARMA models and their applications in astrophysics. We then present a novel and powerful method to analyze time series and detect flares in TESS light curves. First, we remove the trend using a time-varying deterministic harmonic fit so as to capture changes in the deterministic amplitude of the light curve. Then we highlight the analogy between the stochastic part of the light curves and GARCH processes. We demonstrate that flares can be detected as significantly large deviations from the baseline. We apply the method to exemplar light curves from two flaring stars, and discuss some of the diagnostics that become amenable to measurement.
 Presentation slides [.pdf]
 Presentation video [!yt]
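A lightweight sketch of the pipeline: harmonic detrending followed by flagging residuals that exceed a multiple of a recursive (EWMA) volatility estimate, which here stands in for a fitted GARCH conditional variance (the threshold and parameter values are illustrative, not the talk's):

```python
import numpy as np

def detect_flares(t, flux, period, k=5.0, lam=0.9):
    """Flag flare candidates: subtract a one-frequency harmonic fit
    (the rotational trend), then compare residuals against a recursive
    EWMA volatility estimate, a lightweight stand-in for a fitted GARCH
    conditional variance."""
    # Harmonic (single-frequency) least-squares detrend.
    A = np.column_stack([np.ones_like(t),
                         np.sin(2 * np.pi * t / period),
                         np.cos(2 * np.pi * t / period)])
    resid = flux - A @ np.linalg.lstsq(A, flux, rcond=None)[0]
    # Recursive variance: sigma2_t = lam*sigma2_{t-1} + (1-lam)*r_{t-1}^2.
    sigma2 = np.empty_like(resid)
    sigma2[0] = resid.var()
    for i in range(1, len(resid)):
        sigma2[i] = lam * sigma2[i - 1] + (1 - lam) * resid[i - 1] ** 2
    return resid > k * np.sqrt(sigma2)
```

Because the volatility estimate is conditional on the recent past, a flare stands out against the locally quiet baseline rather than against a single global scatter estimate.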


Ann Lee (CMU) Oct 2024

 TBD





