Last Updated: 2017-sep-29

CfA AstroStat Day

Wednesday, Sep 20, 2017

10:00am-5:00pm // Phillips
| Description | Schedule | Contacts | changelog |


On Wednesday, September 20, 2017 we hosted an AstroStat Day. Our hope was that the event would gather CfA researchers involved in, or interested in, AstroStatistics and AstroInformatics. The goal was to get together and learn about the various research groups working on methods to handle challenging data problems. Many such groups at the CfA are developing new algorithms, and this was an opportunity to meet, exchange ideas, and learn about each other's work.

We thank the Wolbach Library for providing bagels, muffins, and coffee throughout the day, and the High Energy Phenomena seminar series for providing pizza during lunch.

The sessions were streamed live:

Morning Session: [YouTube]
Lunch Session: [YouTube]
Afternoon Session: [YouTube]



Aneta Siemiginowska -- Introduction
Alexey Vikhlinin -- Overview of Lynx X-ray observatory
I will review the science topics driving the design and capability requirements of the Lynx X-ray observatory. Most of the required observations push the envelope in sensitivity and scales for spatially resolved spectroscopy. I will outline several areas with great promise for using better statistical tools.
Presentation slides [.key]
Doug Finkbeiner -- A transdimensional Bayesian approach to optical photometry
Our transdimensional MCMC changes the parameter space (e.g. number of stars) describing a catalog as the chain runs. This allows a novel approach to de-blending, and leads to estimates of the parameters describing a population of objects without necessarily deciding on the properties (or existence) of individual objects. I would like to quickly introduce this technique, show results, and suggest other areas where it might be fruitful.
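For readers new to transdimensional sampling, a minimal birth/death sketch may help. This is not the authors' implementation; the model, priors, and parameter names are invented for illustration. The "catalog" is a list of source fluxes, and the chain proposes adding or removing a source at each step; drawing the birth flux from its prior makes the acceptance ratio collapse to a likelihood ratio times a Poisson prior ratio on the source count.

```python
import math
import random

# Toy transdimensional MCMC: the catalog is a variable-length list of fluxes.

def log_likelihood(catalog, data, sigma=1.0):
    """Gaussian likelihood of the observed total flux given the catalog."""
    return -0.5 * ((data - sum(catalog)) / sigma) ** 2

def birth_death_step(catalog, data, mu_n=3.0, flux_scale=1.0):
    n = len(catalog)
    if random.random() < 0.5:                        # birth move
        f = random.expovariate(1.0 / flux_scale)     # new flux drawn from its prior
        proposal = catalog + [f]
        log_alpha = (log_likelihood(proposal, data)
                     - log_likelihood(catalog, data)
                     + math.log(mu_n / (n + 1)))     # Poisson(mu_n) count prior
    else:                                            # death move
        if n == 0:
            return catalog
        i = random.randrange(n)
        proposal = catalog[:i] + catalog[i + 1:]
        log_alpha = (log_likelihood(proposal, data)
                     - log_likelihood(catalog, data)
                     + math.log(n / mu_n))
    return proposal if math.log(random.random()) < log_alpha else catalog

random.seed(0)
cat, data = [], 5.0                                  # observed total flux
for _ in range(20000):
    cat = birth_death_step(cat, data)
print(len(cat), "sources, total flux", round(sum(cat), 2))
```

Because the chain visits catalogs of different sizes, the posterior over the number of sources comes out for free, without deciding in advance how many stars are in the image.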
Stephen Portillo -- Improved Source Detection in Crowded Fields using Probabilistic Cataloging
Cataloging is challenging in crowded fields because sources are extremely covariant with their neighbors and blending makes even the number of sources ambiguous. We present the first optical probabilistic stellar catalog, cataloging a crowded (~0.1 sources per pixel) Sloan Digital Sky Survey r band image from M2. Probabilistic cataloging returns an ensemble of catalogs inferred from the image and thus can capture source-source covariance and deblending ambiguities. By comparing to a traditional catalog of the same image and a Hubble Space Telescope catalog of the same region, we show that our catalog ensemble better recovers sources from the image. It goes more than a magnitude deeper than the traditional catalog while having a lower false discovery rate brighter than 20th magnitude. Future telescopes will be more sensitive, and thus more of their images will be crowded. We detail our efforts to extend probabilistic cataloging to galaxies, making the method applicable to the data that will be collected in the Large Synoptic Survey Telescope era.
Presentation slides [.pdf]
Tansu Daylan -- Probing the small-scale structure in strongly lensed systems via transdimensional inference
Strong lensing is a sensitive probe of the small-scale density fluctuations in the Universe. We implement a novel approach to modeling strongly lensed systems using probabilistic cataloging, which is a transdimensional, hierarchical, and Bayesian framework to sample from a metamodel (union of models with different dimensionality) consistent with observed photon count maps. Probabilistic cataloging allows us to robustly characterize modeling covariances within and across lens models with different numbers of subhalos. Unlike traditional cataloging of subhalos, it does not require model subhalos to improve the goodness of fit above the detection threshold. Instead, it allows the exploitation of all information contained in the photon count maps, for instance, when constraining the subhalo mass function. We further show that, by not including these small subhalos in the lens model, fixed-dimensional inference methods can significantly mismodel the data.
Presentation slides [.pdf]
Probabilistic Cataloger (PCAT) : software ; documentation [url]
References: [arXiv]
Inference of Unresolved Point Sources At High Galactic Latitudes Using Probabilistic Catalogs, Daylan et al. 2017, arXiv:1607.04637
Probing the small-scale structure in strongly lensed systems via transdimensional inference, Daylan et al. 2017, arXiv:1706.06111

11:15am-11:30am : Break


Josh Speagle -- Typical Sets: What are They and How to (Hopefully) Find Them
Although typical sets are important in understanding how/why sampling algorithms (do not) work, they are rarely taught when most astronomers are introduced to sampling methods such as Markov Chain Monte Carlo (MCMC). I'll introduce the idea of typical sets using some basic examples and show why they make sampling difficult in higher dimensions. I'll then outline how their behavior shapes various MCMC algorithms. I'll conclude by outlining their central role in Nested Sampling. If time permits I will discuss ways to deal with typical sets using Hamiltonian-based approaches.
Presentation slides [SpeakerDeck]
Notebook [github/.ipynb]
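The shell concentration behind typical sets can be seen numerically in a few lines (a standard illustration, not taken from the talk): draws from a d-dimensional standard normal pile up at radius ~sqrt(d), far from the mode.

```python
import numpy as np

# Draws from a d-dimensional standard normal concentrate in a thin shell of
# radius ~sqrt(d): the typical set. The mode at the origin carries almost no
# probability mass, which is why samplers must target the shell, not the peak.

rng = np.random.default_rng(42)
for d in (1, 10, 100, 1000):
    x = rng.standard_normal((10000, d))
    r = np.linalg.norm(x, axis=1)   # distance of each draw from the mode
    print(f"d={d:5d}  mean radius={r.mean():6.2f}  "
          f"sqrt(d)={np.sqrt(d):6.2f}  spread={r.std():.2f}")
```

The shell's width stays roughly constant while its radius grows, so in high dimensions essentially all posterior mass sits far from the point of highest density.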
Ben Johnson -- Hierarchical Models of the Color Magnitude Diagram of Star Clusters
Clusters of stars that were formed coevally provide key tests of models of stellar evolution and serve as important benchmarks for studies of stellar spectra and the star formation history of the universe. I will discuss some approaches that we have been investigating to model the observed color-magnitude diagram. These include hierarchical Bayesian inference of cluster properties as well as key variables of stellar evolutionary theory.

12:30pm-1:30pm : High Energy Phenomena Seminars

Kathy Reeves -- Statistical Properties of Solar Filament Eruptions
Filaments are cool parcels of gas suspended in the Sun's atmosphere by magnetic forces. They are best observed as dark, filamentary structures against the bright background of the solar disk when viewed in H-alpha. Filaments often erupt, and when they do, 70-80% of these eruptions will correlate with coronal mass ejections (CMEs), resulting in the release of charged particles from the Sun into interplanetary space. Because these eruptions can affect the Earth, it is important to understand the mechanisms responsible for their initiation, and to improve the ability to predict them. The Heliophysics Event Knowledgebase is a database of metadata that provides a wealth of statistical data on solar features such as filaments. In this talk, I will review the processes responsible for filament eruptions, and examine some of the statistical properties of filament eruptions that give clues to their initiation.
Presentation slides: [.pdf]
Henry Trae Winter -- Icarus Investigations: Finding Needles in 6 Petabyte Haystacks
NASA's Atmospheric Imaging Assembly (AIA) images the Sun with 4K resolution almost once every second, producing a mission data volume of over 6 petabytes to date, and AIA is just one of many imagers pointing toward our Sun. Searching through this deluge of solar data for features of scientific interest appears daunting, but is an excellent opportunity to engage the public via citizen science projects. We intend to use citizen science volunteers to provide the necessary volume of classifications to properly train machine learning algorithms that can efficiently sort through large data volumes. Once properly trained, the algorithms can detect occurrences of various solar events of interest to scientists across the solar physics community. Partnering with Zooniverse (by far the most popular citizen science website, with over a million registered users), we propose to host an ever-changing array of solar-based citizen science projects under a single collaborative project header: Icarus Investigations. Though individual tasks will change, volunteers will become familiar with the overall pattern of project setup and become increasingly comfortable with solar images and solar science. Icarus Investigations would build and retain a dedicated user base within Zooniverse, with additional projects building on the successes (and learning from the missteps) of their predecessors. We hope that by providing a framework we can reduce the barriers for scientists to engage with citizen science and utilize machine learning algorithms, and create an environment for new, big-data scientific discovery in solar physics.


Josh Grindlay -- DASCH data access, rewards, and challenges
DASCH (Digital Access to a Sky Century @ Harvard) data are fully-reduced ~100y (1888 - 1992) lightcurves (B band) on (eventually) the full sky and so include all classes of variables: from exoplanet hosts to distant quasars. A "typical" lightcurve might contain ~200 points (for B ~17) to ~2000 (for B ~12). A ~15y "Menzel gap" (1953 - 1968) and the lack of deeper-coverage telescopes (B ~16 - 18.5) after 1953 impose completeness challenges. Occasional plate "defects" at the position of an object of interest must be considered or rejected. Smoothing well-sampled data (as in a folded lightcurve) works well, but is it optimal for full-lightcurve or Lomb-Scargle analysis? What is the optimal way to isolate and characterize individual flares in blazar data with randomly changing coverage? These are a few of the questions I will consider in a brief presentation.
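As an illustration of the periodogram question raised above, a plain-numpy Lomb-Scargle sketch (not DASCH software; the sampling pattern, gap, and signal are invented) can recover a period from irregularly sampled data with a long observing gap:

```python
import numpy as np

def lomb_scargle(t, y, freqs):
    """Classic Lomb-Scargle periodogram for unevenly sampled data."""
    y = y - y.mean()
    power = np.empty_like(freqs)
    for i, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        tau = np.arctan2(np.sum(np.sin(2 * w * t)),
                         np.sum(np.cos(2 * w * t))) / (2 * w)
        c, s = np.cos(w * (t - tau)), np.sin(w * (t - tau))
        power[i] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    return power

# Irregular sampling in days, with a long observing gap (loosely gap-like).
rng = np.random.default_rng(1)
t = np.sort(np.concatenate([rng.uniform(0, 300, 150),
                            rng.uniform(450, 1000, 300)]))
y = 0.5 * np.sin(2 * np.pi * t / 7.3) + 0.1 * rng.standard_normal(t.size)
freqs = np.linspace(0.01, 0.5, 2000)                 # cycles per day
best = freqs[np.argmax(lomb_scargle(t, y, freqs))]
print(f"recovered period: {1 / best:.2f} days (injected: 7.30)")
```

The periodogram handles uneven sampling gracefully, but window-function sidelobes from the gap still appear around the true peak, which is exactly the kind of completeness issue the talk raises.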
Michael Johnson -- Stochastic Optics: A Scattering Mitigation Framework for the Event Horizon Telescope
Just as turbulence in the Earth's atmosphere can severely limit the angular resolution of optical telescopes, turbulence in the ionized interstellar medium fundamentally limits the resolution of radio telescopes. The scattering is especially strong along the line of sight to the Galactic Center, making it a key consideration for the Event Horizon Telescope. I will discuss a new scattering mitigation framework, "stochastic optics," which provides significant improvements over existing scattering mitigation strategies.
Lindy Blackburn -- Choosing observables in VLBI fitting
Katie Bouman -- Reconstructing Video from Interferometric Measurements of Time-Varying Sources
Very long baseline interferometry (VLBI) makes it possible to recover images of astronomical sources with extremely high angular resolution. Most recently, the Event Horizon Telescope (EHT) has extended VLBI to short mm wavelengths with a goal of achieving angular resolution sufficient for imaging the event horizons of supermassive black holes. VLBI provides measurements related to the underlying source image through a sparse set of spatial frequencies. An image can then be recovered from these measurements by making assumptions about the underlying image. One of the most important assumptions made by conventional imaging methods is that over the course of a night's observation the image is static. However, for quickly evolving sources, such as the Galactic Center's supermassive black hole (Sgr A*) targeted by the EHT, this assumption is violated and these conventional imaging approaches fail. In this work we propose a new way to model VLBI measurements that allows us to recover both the appearance and dynamics of an evolving source by reconstructing a video rather than a static image. By modeling VLBI measurements using a Gaussian Markov Model, we are able to propagate information across observations in time to reconstruct a video, while simultaneously learning about the dynamics of the source's emission region. We demonstrate our proposed Expectation-Maximization (EM) algorithm, StarWarps, on realistic, synthetic observations of black holes, and show how it substantially improves results compared to conventional imaging algorithms.
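The information propagation at the heart of a Gaussian Markov model can be illustrated with a scalar Kalman filter. This is a toy, not the StarWarps algorithm; the dynamics, noise levels, and variable names are invented.

```python
import numpy as np

# A scalar Gaussian Markov (state-space) model: the state x_t follows
# x_t = a*x_{t-1} + process noise and is seen through noisy measurements z_t.
# Kalman filtering propagates information across time steps, the same
# mechanism that lets a video model share information between snapshots.

def kalman_filter(z, a=1.0, q=0.09, r=1.0):
    mean, var = 0.0, 1.0
    means = []
    for zt in z:
        mean, var = a * mean, a * a * var + q    # predict via the dynamics
        k = var / (var + r)                      # Kalman gain
        mean = mean + k * (zt - mean)            # update with the measurement
        var = (1.0 - k) * var
        means.append(mean)
    return np.array(means)

rng = np.random.default_rng(0)
x = np.cumsum(0.3 * rng.standard_normal(200))    # slowly evolving truth
z = x + rng.standard_normal(200)                 # noisy observations
xf = kalman_filter(z)
raw_rmse = np.sqrt(np.mean((z - x) ** 2))
filt_rmse = np.sqrt(np.mean((xf - x) ** 2))
print(f"raw RMSE {raw_rmse:.2f}  filtered RMSE {filt_rmse:.2f}")
```

Each filtered estimate blends the Markov prediction with the new measurement, so every frame borrows strength from its neighbors in time rather than being fit in isolation.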

3:00pm-3:15pm : Break


Arturo Avelino -- Near-infrared Hubble diagrams of Type Ia Supernovae in the nearby universe
Light curves of Type Ia supernovae (SNe Ia) in the near infrared (NIR) exhibit low dispersion in their peak luminosities and are less vulnerable to extinction by interstellar dust in their host galaxies. The increasing number of high quality NIR SN Ia light curves, including the recent CfAIR2 sample obtained with PAIRITEL, provides updated evidence for their utility as standard candles for cosmology. Using NIR YJHKs photometric time series of ~150 nearby SNe Ia, Gaussian-process regression, and a hierarchical Bayesian model, we construct YJHKs light curve templates and determine the distance moduli from each NIR band. In this talk I will describe the statistical procedure we have implemented to infer distance moduli and the Hubble diagrams for the YJHKs bands. This work contributes to a firm local anchor for supernova cosmology studies in the NIR, which will help reduce the systematic uncertainties due to host-galaxy dust present in optical-only studies.
Presentation slides [.pdf]
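A bare-bones sketch of the kind of Gaussian-process regression used to build smooth light-curve templates (kernel, hyperparameters, and photometry below are invented for illustration, not the talk's actual model or data):

```python
import numpy as np

def rbf(x1, x2, amp=1.0, scale=5.0):
    """Squared-exponential covariance between two sets of phases (days)."""
    return amp ** 2 * np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / scale ** 2)

def gp_predict(t_obs, y_obs, t_new, noise=0.05):
    """Posterior mean of a GP conditioned on noisy photometry."""
    K = rbf(t_obs, t_obs) + noise ** 2 * np.eye(t_obs.size)
    return rbf(t_new, t_obs) @ np.linalg.solve(K, y_obs)

# Invented sparse "NIR photometry": phase in days, magnitude relative to peak.
t = np.array([-8.0, -4.0, 0.0, 3.0, 7.0, 12.0, 20.0, 28.0])
m = np.array([1.2, 0.4, 0.0, 0.1, 0.5, 1.0, 1.4, 1.1])
phase = np.linspace(-10, 30, 81)
template = gp_predict(t, m, phase)      # smooth light-curve template
print(f"interpolated mag at +10 d: {gp_predict(t, m, np.array([10.0]))[0]:.2f}")
```

The GP interpolates between sparsely sampled epochs with a principled notion of smoothness, which is what makes it natural to embed inside a hierarchical model across many supernovae.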
Catherine Zucker -- Interactive multi-dimensional data exploration and linking with Glue
An abbreviated version of the talk that Tom Robitaille (the developer of Glue) gave last year at EuroSciPy:
  Modern data analysis and research projects often incorporate multi-dimensional data from several sources, and new insights are increasingly driven by the ability to interpret data in the context of other data. Glue is a graphical environment built on top of the standard scientific Python stack to visualize relationships within and between data sets. With Glue, users can load and visualize multiple related data sets simultaneously, specify the logical connections that exist between data, and Glue transparently uses this information as needed to enable visualization across files. Glue includes an easy mechanism for users to customize many aspects of the application, and also features a plugin system for third-party packages to provide further customization, for example custom data viewers. In this talk, I will give an overview of the Glue package, and will demonstrate the latest functionality including recently added viewers based on VisPy and OpenGL to interactively explore data in 3D. Glue is currently being used to analyze astronomical, medical, and other scientific data, and is also being used by data scientists outside of academia.
For a quick preview of what Glue can do, you can view the following 2-minute introductory video at [YouTube]
Demo [.glu.gz] (178MB)
Alberto Accomazzi & Michael Kurtz -- Metadata extraction via Conditional Random Fields and topic modeling via Latent Dirichlet Allocation
The SAO/NASA Astrophysics Data System maintains a bibliographic database consisting of over twelve million documents and 100 million citations. Nearly every Physics, Astronomy, and Geophysics article refereed in the past 20 years is fully indexed and served by the ADS. 50,000 scientists and librarians use the ADS daily. The ADS bibliographic database, full-text corpus, citation network and usage logs provide a unique dataset for people interested in data science.
In this talk I will briefly discuss two problems we face: metadata extraction via Conditional Random Fields and topic modeling via Latent Dirichlet Allocation.
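The generative story behind LDA can be sketched in a few lines. The vocabulary and topic-word distributions below are toy inventions, not the ADS model; fitting LDA means inverting this process to recover the topics from observed documents.

```python
import numpy as np

# LDA's generative model: each document draws a topic mixture from a
# Dirichlet prior; each word then draws a topic from that mixture and a
# word from the topic's distribution over the vocabulary.

rng = np.random.default_rng(3)
vocab = ["galaxy", "quasar", "neutrino", "detector", "citation", "library"]
# Hypothetical topic-word distributions (rows sum to 1):
# astrophysics / instrumentation / bibliometrics
beta = np.array([[0.45, 0.45, 0.04, 0.02, 0.02, 0.02],
                 [0.05, 0.05, 0.40, 0.40, 0.05, 0.05],
                 [0.02, 0.02, 0.02, 0.04, 0.45, 0.45]])
alpha = 0.5   # sparse Dirichlet prior: documents favor a few topics

def generate_doc(n_words=10):
    theta = rng.dirichlet(alpha * np.ones(3))   # per-document topic mixture
    doc = []
    for _ in range(n_words):
        z = rng.choice(3, p=theta)              # topic for this word
        doc.append(vocab[rng.choice(6, p=beta[z])])
    return doc

for _ in range(3):
    print(" ".join(generate_doc()))
```

Run against a real corpus, inference recovers interpretable topics from word co-occurrence alone, which is what makes LDA attractive for organizing a multi-million-document bibliographic database.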

4:15pm-5pm : Discussion


Aneta Siemiginowska (Chair),
Ashley Villar, James Damon, Josh Speagle, Kaisey Mandel, Vinay Kashyap


2017-sep-14: set up page.
2017-sep-15: some reorg, added some missing titles/abstracts
2017-sep-17: schedule fully hyperlinked
2017-sep-18: added some more missing titles/abstracts
2017-sep-19: added streaming link, small mods
2017-sep-20: more titles/abstracts, updated stream URLs, added some presentation slides and supplementary data
2017-sep-21: added more presentation slides and supplementary data
2017-sep-22: added more presentation slides and supplementary data
2017-sep-23: added KR presentation slides and supplementary data
2017-sep-29: added AV presentation slides


CfA AstroStat Day
Wed 20 Sep 2017

A.Siemiginowska / A.Vikhlinin / D.Finkbeiner / S.Portillo / T.Daylan / J.Speagle / B.Johnson / K.Reeves / H.Winter / J.Grindlay / M.Johnson / L.Blackburn / K.Bouman / A.Avelino / C.Zucker / A.Accomazzi & M.Kurtz
General Discussion

CHASC / Wolbach Library