Paper title
Estimating Galaxy Redshift in Radio-Selected Datasets using Machine Learning
Paper authors
Paper abstract
All-sky radio surveys are set to revolutionise the field with new discoveries. However, the vast majority of the tens of millions of radio galaxies will not have the spectroscopic redshift measurements required for a large number of science cases. Here, we evaluate techniques for estimating redshifts of galaxies from a radio-selected survey. Using a radio-selected sample with broadband photometry at infrared and optical wavelengths, we test the k-Nearest Neighbours (kNN) and Random Forest machine learning algorithms in both their regression and classification modes. Further, we test different distance metrics used by the kNN algorithm, including the standard Euclidean distance, the Mahalanobis distance, and a learned distance metric for both the regression mode (the Metric Learning for Kernel Regression metric) and the classification mode (the Large Margin Nearest Neighbour metric). We find that all regression-based modes fail on galaxies at a redshift $z > 1$. However, below this range, the kNN algorithm using the Mahalanobis distance metric performs best, with an $η_{0.15}$ outlier rate of 5.85\%. In the classification mode, the kNN algorithm using the Mahalanobis distance metric also performs best, with an $η_{0.15}$ outlier rate of 5.85\%, correctly placing 74\% of galaxies in the top $z > 1.02$ bin. Finally, we also test the effect of training in one field and applying the trained algorithm to similar data from another field, and find that variation across fields does not result in statistically significant differences in predicted redshifts. Importantly, we find that while we may not be able to predict a continuous value for high-redshift radio sources, we can identify the majority of them using the classification modes of existing techniques.
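To make the two central ingredients of the abstract concrete, the following is a minimal sketch of kNN regression with a Mahalanobis distance, together with an outlier-rate metric. It is an illustration under stated assumptions, not the authors' pipeline: the function names `outlier_rate` and `knn_mahalanobis_predict` are hypothetical, the neighbour average is unweighted, and the outlier rate is assumed to follow the definition commonly used in photometric-redshift work, $|z_{pred} - z_{spec}| / (1 + z_{spec}) > 0.15$, which the abstract itself does not spell out.

```python
import numpy as np


def outlier_rate(z_spec, z_pred, threshold=0.15):
    """Fraction of sources whose normalised residual exceeds `threshold`.

    Assumed definition (not stated in the abstract):
    |z_pred - z_spec| / (1 + z_spec) > threshold.
    """
    z_spec = np.asarray(z_spec, dtype=float)
    z_pred = np.asarray(z_pred, dtype=float)
    residual = np.abs(z_pred - z_spec) / (1.0 + z_spec)
    return float(np.mean(residual > threshold))


def knn_mahalanobis_predict(X_train, z_train, X_test, k=5):
    """Predict redshift as the unweighted mean of the k nearest training
    sources, with nearness measured by the Mahalanobis distance."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    z_train = np.asarray(z_train, dtype=float)

    # Inverse covariance of the training features (pseudo-inverse for safety
    # when features are strongly correlated).
    cov = np.atleast_2d(np.cov(X_train, rowvar=False))
    cov_inv = np.linalg.pinv(cov)

    # Pairwise differences, shape (n_test, n_train, n_features).
    diff = X_test[:, None, :] - X_train[None, :, :]
    # Squared Mahalanobis distance for every test/train pair.
    dist_sq = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)

    # Indices of the k closest training sources for each test source.
    nearest = np.argsort(dist_sq, axis=1)[:, :k]
    return z_train[nearest].mean(axis=1)


if __name__ == "__main__":
    # Synthetic stand-in for photometric features; not real survey data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    z = np.abs(X[:, 0]) + 0.1 * rng.normal(size=200) ** 2
    z_hat = knn_mahalanobis_predict(X[:150], z[:150], X[150:], k=5)
    print("outlier rate:", outlier_rate(z[150:], z_hat))
```

The Mahalanobis distance rescales and decorrelates the features through the inverse covariance matrix, so strongly correlated photometric bands do not dominate the neighbour search the way they can under a plain Euclidean distance.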