Title

Solar activity classification based on Mg II spectra: towards classification on compressed data

Authors

Sergey Ivanov, Maksym Tsizh, Denis Ullmann, Brandon Panos, Slava Voloshynovskiy

Abstract

Although large volumes of solar data are available for study, the vast majority of these data remain unlabeled and are therefore not amenable to supervised machine learning methods. A way to accurately and automatically classify spectra into categories related to solar activity is highly desirable and would assist and speed up future research in solar physics. At the same time, the large volume of raw observational data is a serious bottleneck for machine learning, requiring powerful computational resources that many laboratories do not have. In addition, transmitting raw data restricts real-time observation and requires considerable bandwidth and energy for onboard solar observation systems. To address these issues, we propose a framework to classify solar activity on compressed data. For this, we used a labeling scheme from a pre-existing vector quantization technique in conjunction with different machine learning algorithms to categorize spectra of singly ionized magnesium (Mg II), measured by NASA's Interface Region Imaging Spectrograph (IRIS) satellite, into five types of solar activity. Our training dataset is a human-annotated list of 85 IRIS observations containing 29097 frames. The annotated types of solar activity are active region, pre-flare activity, solar flare, sunspot, and quiet Sun. We compress these data and reduce their complexity before training classifiers. We found that the XGBoost classifier produces the most accurate results on the compressed data, yielding a prediction rate above 95% and outperforming other ML methods such as convolutional neural networks, k-nearest neighbors, naive Bayes classifiers, and SVMs. We find that the classification performance on compressed and uncompressed data is comparable, implying that large compression rates are possible with relatively little information loss.
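The abstract does not spell out the compression step. The sketch below assumes one common reading of vector quantization: a k-means codebook of representative spectral profiles, with each frame compressed to the normalized histogram of the codeword indices of its pixels, and a classifier trained on those histograms. Everything concrete here is hypothetical: the Gaussian "quiet" and "bright" profiles, the 40-pixel by 50-bin frames, the class names, the codebook size, and the 1-nearest-neighbour classifier, which stands in for XGBoost only to keep the example dependent on NumPy alone. It is a toy illustration, not the authors' pipeline or real IRIS data.

```python
import numpy as np


def quantize(spectra, codebook):
    """Map each spectrum (row) to the index of its nearest codeword (squared Euclidean)."""
    d = ((spectra[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def fit_codebook(spectra, k, n_iter=20):
    """Plain k-means (Lloyd's algorithm) with deterministic farthest-point initialisation."""
    chosen = [0]
    for _ in range(1, k):
        # distance of every spectrum to its closest already-chosen centre
        d = ((spectra[:, None, :] - spectra[chosen][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        chosen.append(int(d.argmax()))
    codebook = spectra[chosen].astype(float)
    for _ in range(n_iter):
        labels = quantize(spectra, codebook)
        for j in range(k):
            members = spectra[labels == j]
            if len(members):  # leave a codeword unchanged if its cluster is empty
                codebook[j] = members.mean(axis=0)
    return codebook


def frame_features(frame, codebook):
    """Compress one frame (n_pixels x n_wavelengths) to a normalized codeword histogram."""
    counts = np.bincount(quantize(frame, codebook), minlength=len(codebook))
    return counts / counts.sum()


def predict_1nn(train_X, train_y, x):
    """1-nearest-neighbour on compressed features (a simple stand-in for XGBoost)."""
    return train_y[((train_X - x) ** 2).sum(axis=1).argmin()]


rng = np.random.default_rng(0)
wl = np.arange(50)  # hypothetical wavelength grid (50 bins)


def profile(center, amp):
    """Toy Gaussian line profile; NOT a real Mg II profile."""
    return amp * np.exp(-0.5 * ((wl - center) / 3.0) ** 2)


QUIET, BRIGHT = profile(20, 1.0), profile(30, 3.0)


def make_frame(n_bright, n_pixels=40):
    """Hypothetical frame: n_bright bright pixels, the rest quiet, plus Gaussian noise."""
    rows = [BRIGHT] * n_bright + [QUIET] * (n_pixels - n_bright)
    return np.array(rows) + rng.normal(0, 0.05, size=(n_pixels, wl.size))


train_frames = [make_frame(0), make_frame(0), make_frame(28), make_frame(28)]
train_y = np.array(["quiet Sun", "quiet Sun", "flare", "flare"])
codebook = fit_codebook(np.vstack(train_frames), k=2)
train_X = np.array([frame_features(f, codebook) for f in train_frames])

test_frame = make_frame(28)
x = frame_features(test_frame, codebook)
print(test_frame.size, "->", x.size, "numbers")  # 2000 -> 2 numbers
print(predict_1nn(train_X, train_y, x))          # flare
```

In this toy setup each 40 × 50 frame (2000 values) is reduced to a 2-value histogram before classification, which mirrors, in miniature, the abstract's point that a classifier can work on a much smaller representation than the raw spectra.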