Big Data "Inference": Combining Hierarchical Bayes and Machine Learning to Improve Photometric Redshifts
Abstract: Current and upcoming large-scale surveys will collect multi-band images (photometry) for billions of galaxies. To learn about these galaxies, however, we need to convert these data into the physical quantities we are interested in, such as redshifts (distances). We outline how rigorous (hierarchical) Bayesian inference, combined with some "machine learning", can be used to probabilistically "map" galaxies from an unknown (testing) dataset to a known (training) dataset in the "big data" limit. These mappings can then be used to derive photometric redshift (photo-z) PDFs for individual galaxies, along with estimates for the entire unknown sample (the "parent" population). Along the way, we also describe how this framework can be adapted to deal with noise, missing data, and domain mismatches from both a statistical and a computational perspective.
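
As a rough illustration of what "mapping" a testing galaxy onto a training set could look like, here is a minimal sketch. It is not taken from this work. It assumes Gaussian flux errors on both datasets, a flat prior over training galaxies, and a Gaussian kernel of hypothetical width `bandwidth` to smooth the training redshifts. The toy data and the function names `mapping_weights` and `photoz_pdf` are illustrative only.

```python
import numpy as np

def mapping_weights(f, s, F_train, S_train):
    """Weights linking one testing galaxy to every training galaxy.

    Assumes independent Gaussian errors on both sets, so the variances add.
    """
    var = s**2 + S_train**2                          # (N, B) combined variance
    chi2 = np.sum((f - F_train) ** 2 / var, axis=1)  # (N,)
    lognorm = np.sum(np.log(2.0 * np.pi * var), axis=1)
    loglike = -0.5 * (chi2 + lognorm)
    loglike -= loglike.max()                         # avoid underflow in exp
    w = np.exp(loglike)
    return w / w.sum()                               # flat prior over training galaxies

def photoz_pdf(weights, z_train, z_grid, bandwidth=0.05):
    """Weighted sum of Gaussian kernels centred on the training redshifts."""
    diff = (z_grid[:, None] - z_train[None, :]) / bandwidth
    pdf = np.exp(-0.5 * diff**2) @ weights
    dz = z_grid[1] - z_grid[0]
    return pdf / (pdf.sum() * dz)                    # normalise on the grid

# Toy data (entirely made up): 5-band fluxes that vary linearly with redshift.
rng = np.random.default_rng(0)
bands = np.arange(1, 6)
z_train = rng.uniform(0.0, 2.0, 500)
S_train = np.full((500, 5), 0.1)
F_train = 1.0 + z_train[:, None] * bands + rng.normal(0.0, 0.1, (500, 5))

# One testing galaxy at a true redshift of 1.0 (noise-free for simplicity).
f_test = 1.0 + 1.0 * bands
s_test = np.full(5, 0.1)

z_grid = np.linspace(0.0, 2.0, 401)
w = mapping_weights(f_test, s_test, F_train, S_train)
pdf = photoz_pdf(w, z_train, z_grid)
z_peak = z_grid[np.argmax(pdf)]
```

The sketch covers only the per-galaxy step. Estimates for the parent population would require the hierarchical part of the model, which is not shown here, and missing bands could be handled by dropping the corresponding terms from the sums.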