Paper title
Towards an automated data cleaning with deep learning in CRESST
Paper authors
Paper abstract
The CRESST experiment employs cryogenic calorimeters for the sensitive measurement of nuclear recoils induced by dark matter particles. The recorded signals need to undergo a careful cleaning process to avoid wrongly reconstructed recoil energies caused by pile-up and read-out artefacts. We frame this process as a time series classification task and propose to automate it with neural networks. With a data set of over one million labeled records from 68 detectors, recorded between 2013 and 2019 by CRESST, we test the capability of four commonly used neural network architectures to learn the data cleaning task. Our best performing model achieves a balanced accuracy of 0.932 on our test set. We show on an exemplary detector that about half of the wrongly predicted events are in fact wrongly labeled events, and a large share of the remaining ones have a context-dependent ground truth. We furthermore evaluate the recall and selectivity of our classifiers with simulated data. The results confirm that the trained classifiers are well suited for the data cleaning task.
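The abstract reports its results as balanced accuracy, recall, and selectivity. For a binary accept/reject classifier these three quantities are usually related as sketched below. This is only an illustration under stated assumptions: the label convention (1 = accepted event, 0 = rejected event), the function names, and the toy labels are hypothetical and are not taken from the paper, whose exact definitions may differ.

```python
import numpy as np


def recall_selectivity(y_true, y_pred):
    """Return (recall, selectivity) for binary labels.

    Assumed convention (not from the paper): 1 = accepted event, 0 = rejected event.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # accepted events kept
    fn = np.sum(y_true & ~y_pred)   # accepted events wrongly rejected
    tn = np.sum(~y_true & ~y_pred)  # rejected events removed
    fp = np.sum(~y_true & y_pred)   # rejected events wrongly kept
    recall = tp / (tp + fn)         # true positive rate
    selectivity = tn / (tn + fp)    # true negative rate
    return recall, selectivity


def balanced_accuracy(y_true, y_pred):
    """Mean of recall and selectivity, which is robust to class imbalance."""
    recall, selectivity = recall_selectivity(y_true, y_pred)
    return 0.5 * (recall + selectivity)


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
rec, sel = recall_selectivity(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
```

Averaging the two rates weights both classes equally, so a classifier cannot score well by always predicting the majority class. That property matters when accepted and rejected records are unevenly represented.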