COPOD: Copula-Based Outlier Detection

Paper Title

COPOD: Copula-Based Outlier Detection

Authors

Li, Zheng; Zhao, Yue; Botta, Nicola; Ionescu, Cezar; Hu, Xiyang

Abstract

Outlier detection refers to the identification of rare items that deviate from the general data distribution. Existing approaches suffer from high computational complexity, low predictive capability, and limited interpretability. As a remedy, we present a novel outlier detection algorithm called COPOD, which is inspired by copulas for modeling multivariate data distributions. COPOD first constructs an empirical copula and then uses it to predict the tail probabilities of each given data point to determine its level of "extremeness". Intuitively, we think of this as calculating an anomalous p-value. This makes COPOD parameter-free, highly interpretable, and computationally efficient. In this work, we make three key contributions: 1) we propose a novel, parameter-free outlier detection algorithm with both strong performance and interpretability; 2) we perform extensive experiments on 30 benchmark datasets to show that COPOD outperforms the compared methods in most cases and is also one of the fastest algorithms; and 3) we release an easy-to-use Python implementation for reproducibility.
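
The tail-probability idea in the abstract can be illustrated with a small standard-library sketch. This is not the authors' released implementation: the function name `tail_scores` and the toy data are hypothetical, each feature is handled independently through its empirical CDF, and any further refinements described in the paper are omitted. For every feature it estimates the left-tail probability P(X <= x) and the right-tail probability P(X >= x), sums the negative logs across features, and keeps the larger of the two sums as the score.

```python
import math
from bisect import bisect_left, bisect_right


def tail_scores(rows):
    """Score rows by summed negative log empirical tail probabilities.

    Simplified illustration (an assumption, not the published algorithm):
    per-feature empirical CDFs give a left-tail and a right-tail
    probability for each value; the score is the larger of the two
    sums of -log(p) taken over all features.
    """
    n = len(rows)
    n_features = len(rows[0])
    left = [0.0] * n
    right = [0.0] * n
    for j in range(n_features):
        column = sorted(row[j] for row in rows)
        for i, row in enumerate(rows):
            # Fraction of samples <= x (left tail) and >= x (right tail).
            # Both are at least 1/n because x itself is in the column.
            p_left = bisect_right(column, row[j]) / n
            p_right = (n - bisect_left(column, row[j])) / n
            left[i] += -math.log(p_left)
            right[i] += -math.log(p_right)
    return [max(l, r) for l, r in zip(left, right)]


# Toy data: five similar points and one point that is extreme in both features.
data = [
    [1.0, 2.1],
    [1.1, 1.9],
    [0.9, 2.0],
    [1.05, 2.05],
    [0.95, 1.95],
    [8.0, 9.0],
]
scores = tail_scores(data)
most_extreme = max(range(len(scores)), key=scores.__getitem__)
print(most_extreme, scores)
```

On this toy data the last row gets the largest score, because it sits in the right tail of both features at once. The scores depend only on ranks, so this sketch measures how far into a tail a point lies, not how large its values are; no hyperparameters need tuning, which matches the "parameter-free" claim in the abstract.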
