Paper title
The SATCHEL pipeline: A general tool for data classified through citizen science
Paper authors
Paper abstract
Citizen science is a powerful analysis tool, capable of processing large amounts of data in a very short time. To bridge the gap between classification data products from web-based citizen science platforms and statistically robust signal significance scores, we present the Search Algorithm for Transits in the Citizen science Hunt for Exoplanets in Lightcurves (SATCHEL) pipeline. This open source, customizable pipeline was constructed to identify and assign significance estimates to one-dimensional features marked by volunteers. We describe the functional capabilities of the SATCHEL pipeline through application to features in photometric time-series data from the Kepler Space Telescope, classified by volunteers as part of the Planet Hunters citizen science project hosted on the Zooniverse platform. We evaluate the SATCHEL pipeline's overall performance based on recovery of known signals (both simulations and signals corresponding to official Kepler Objects of Interest, KOIs) and relative contamination by spurious features. We find that, for a range of pipeline hyperparameters and with a reasonable score cutoff, SATCHEL is able to recover volunteer identifications of over 98% of signals from simulations corresponding to exoplanets $>2~R_\oplus$ in radius and about 85% of signals corresponding to KOIs in the same size range. SATCHEL is transparently adaptable to other citizen science classification datasets, and available on GitHub.