Paper title
Beyond the Hubble Sequence -- Exploring Galaxy Morphology with Unsupervised Machine Learning
Paper authors
Abstract
We explore unsupervised machine learning for galaxy morphology analysis using a combination of feature extraction with a vector-quantised variational autoencoder (VQ-VAE) and hierarchical clustering (HC). We propose a new methodology that includes: (1) considering the clustering performance simultaneously when learning features from images; (2) allowing for various distance thresholds within the HC algorithm; (3) using the galaxy orientation to determine the number of clusters. This setup yields 27 clusters, which we show are well separated in terms of galaxy shape and structure (e.g., Sérsic index, concentration, asymmetry, Gini coefficient). The resulting clusters also correlate well with physical properties, for example in the colour-magnitude diagram, and span the range of scaling relations such as mass vs. size across the different machine-defined clusters. When we merge these clusters into two large preliminary clusters to provide a binary classification, an accuracy of $\sim87\%$ is reached on an imbalanced dataset that matches real galaxy distributions, comprising 22.7\% early-type galaxies and 77.3\% late-type galaxies. Comparing the resulting clusters with classic Hubble types (ellipticals, lenticulars, early spirals, late spirals, and irregulars), we show that there is an intrinsic vagueness in visual classification systems, in particular for galaxies with transitional features such as lenticulars and early spirals. Based on this, the main result of this work is not how well our unsupervised method matches visual classifications and physical properties, but that the method provides an independent classification that may be more physically meaningful than any visually based one.
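To put the quoted binary accuracy in context, the short sketch below works through the arithmetic of accuracy on an imbalanced two-class sample. It is only an illustration and not the authors' evaluation code: the function names `majority_baseline` and `accuracy_from_recalls` are hypothetical, and the per-class recall values are invented purely to show the calculation. Only the class fractions (22.7% early-type, 77.3% late-type) come from the abstract.

```python
def majority_baseline(class_fractions):
    """Accuracy of a trivial classifier that always predicts the most common class."""
    return max(class_fractions.values())


def accuracy_from_recalls(class_fractions, recalls):
    """Overall accuracy as the sum over classes of (class fraction * per-class recall)."""
    return sum(class_fractions[c] * recalls[c] for c in class_fractions)


# Class fractions quoted in the abstract.
fractions = {"early": 0.227, "late": 0.773}

# Labelling every galaxy as late-type already gives this accuracy.
baseline = majority_baseline(fractions)

# Hypothetical per-class recalls, NOT taken from the paper; they only
# illustrate how per-class performance combines into overall accuracy.
example_recalls = {"early": 0.80, "late": 0.90}
example_accuracy = accuracy_from_recalls(fractions, example_recalls)

print(f"majority-class baseline: {baseline:.3f}")
print(f"example overall accuracy: {example_accuracy:.4f}")
```

Under these assumptions the majority-class baseline is 0.773, so an accuracy near 87% on this sample is best read relative to that figure rather than to the 50% expected for a balanced sample.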