A method for approximating optimal statistical significances with machine-learned likelihoods

Paper title

A method for approximating optimal statistical significances with machine-learned likelihoods

Authors

Arganda, Ernesto; Marcano, Xabier; Lozano, Víctor Martín; Medina, Anibal D.; Perez, Andres D.; Szewc, Manuel; Szynkman, Alejandro

Abstract

Machine-learning techniques have become fundamental in high-energy physics and, for new physics searches, it is crucial to know their performance in terms of experimental sensitivity, understood as the statistical significance of the signal-plus-background hypothesis over the background-only one. We present here a simple method that combines the power of current machine-learning techniques to handle high-dimensional data with the likelihood-based inference tests used in traditional analyses, which allows us to estimate the sensitivity for both discovery and exclusion limits through a single parameter of interest, the signal strength. Because it is based on supervised learning techniques, it can also perform well with high-dimensional data, where traditional techniques cannot. We apply the method first to a toy model, so that we can explore its potential, and then to an LHC study of new physics particles in dijet final states. Taking as the optimal statistical significance the one we would obtain if the true generative functions were known, we show that our method provides a better approximation than the usual naive counting-experiment results.
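
To illustrate the comparison the abstract draws between a naive counting experiment and a likelihood-based test, here is a minimal sketch. It is not the authors' method or code. It assumes the standard asymptotic (Asimov) formula for the expected discovery significance with a known background, and it stands in for a machine-learned likelihood with a simple binned likelihood over some classifier score. The function names and the example yields are hypothetical.

```python
import math


def counting_significance(s, b):
    """Expected (Asimov) discovery significance of a single-bin counting
    experiment with s expected signal and b expected background events."""
    if s < 0 or b <= 0:
        raise ValueError("need s >= 0 and b > 0")
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


def binned_significance(signal_counts, background_counts):
    """Expected (Asimov) discovery significance of a binned likelihood:
    the per-bin contributions to the test statistic are summed, and the
    square root of the sum is taken."""
    if len(signal_counts) != len(background_counts):
        raise ValueError("signal and background must have the same binning")
    q0 = 0.0
    for s, b in zip(signal_counts, background_counts):
        if s < 0 or b <= 0:
            raise ValueError("need s >= 0 and b > 0 in every bin")
        q0 += 2.0 * ((s + b) * math.log(1.0 + s / b) - s)
    return math.sqrt(q0)


# Hypothetical expected yields in three bins of a classifier score
# (illustrative numbers only, not taken from the paper).
signal = [1.0, 4.0, 15.0]
background = [60.0, 30.0, 10.0]

z_counting = counting_significance(sum(signal), sum(background))
z_binned = binned_significance(signal, background)
print(f"counting: {z_counting:.2f}  binned: {z_binned:.2f}")
```

In this sketch, using the shape of the score distribution never yields a lower expected significance than collapsing everything into one bin, which is the intuition behind preferring a likelihood-based estimate over a pure counting one.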
