Title
Outlier detection in non-elliptical data by kernel MRCD
Authors
Abstract
The minimum regularized covariance determinant method (MRCD) is a robust estimator for multivariate location and scatter, which detects outliers by fitting a robust covariance matrix to the data. Its regularization ensures that the covariance matrix is well-conditioned in any dimension. The MRCD assumes that the non-outlying observations are roughly elliptically distributed, but many datasets are not of that form. Moreover, the computation time of MRCD increases substantially when the number of variables goes up, and nowadays datasets with many variables are common. The proposed Kernel Minimum Regularized Covariance Determinant (KMRCD) estimator addresses both issues. It is not restricted to elliptical data because it implicitly computes the MRCD estimates in a kernel-induced feature space. A fast algorithm is constructed that starts from kernel-based initial estimates and exploits the kernel trick to speed up the subsequent computations. Based on the KMRCD estimates, a rule is proposed to flag outliers. The KMRCD algorithm performs well in simulations, and is illustrated on real-life data.
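The idea of computing regularized scatter estimates "implicitly" in a feature space with the kernel trick can be illustrated with a small sketch. Under stated assumptions, the Python/NumPy code below takes a fixed subset of h observations and a fixed regularization weight rho in (0, 1). It then computes each observation's squared distance to the subset's feature-space mean under the regularized scatter rho*I + (1 - rho)*C, where C is the feature-space covariance of the subset. It uses only kernel evaluations, via the Woodbury identity. The identity target matrix, the fixed subset and rho, the RBF kernel, and the function names (kernel_regularized_distances, rbf_kernel, linear_kernel) are assumptions made for illustration. This is not the authors' KMRCD algorithm, which additionally chooses the subset and rho and applies its own outlier-flagging rule.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def linear_kernel(A, B):
    """Linear kernel: the feature space is the original space."""
    return A @ B.T


def kernel_regularized_distances(X, subset, rho, kernel):
    """Squared distances of every row of X to the feature-space mean of X[subset],
    measured with the regularized scatter rho*I + (1 - rho)*C, where C is the
    feature-space covariance of X[subset] (divided by h). Hypothetical helper:
    the subset and rho are taken as given; only kernel values are used."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    H = X[subset]
    h = H.shape[0]
    K_HH = kernel(H, H)                      # h x h kernel matrix of the subset
    K_XH = kernel(X, H)                      # n x h cross-kernel matrix
    k_xx = np.array([kernel(x[None, :], x[None, :])[0, 0] for x in X])  # k(x, x)

    row_mean = K_HH.mean(axis=1)             # (1/h) sum_j k(x_i, x_j), i in subset
    grand = K_HH.mean()                      # (1/h^2) sum_ij k(x_i, x_j)
    # Centered subset kernel: inner products of centered feature vectors.
    K_c = K_HH - row_mean[:, None] - row_mean[None, :] + grand
    mean_x = K_XH.mean(axis=1)               # (1/h) sum_j k(x, x_j) for each x
    # Inner products between centered subset features and centered phi(x).
    K_t = K_XH - mean_x[:, None] - row_mean[None, :] + grand
    # Squared feature-space norm of phi(x) minus the subset mean.
    zz = k_xx - 2.0 * mean_x + grand

    # Woodbury: (rho*I + (1-rho)/h * F^T F)^{-1}
    #   = (1/rho) * (I - F^T (c*I + F F^T)^{-1} F), with c = rho*h/(1-rho).
    c = rho * h / (1.0 - rho)
    sol = np.linalg.solve(K_c + c * np.eye(h), K_t.T)   # h x n
    quad = (K_t * sol.T).sum(axis=1)
    return (zz - quad) / rho
```

With the linear kernel the result coincides with the ordinary regularized Mahalanobis distance in the original coordinates, which gives a simple correctness check. With a nonlinear kernel such as rbf_kernel, the same code measures distances in the kernel-induced feature space without ever forming feature vectors. The cost depends on h rather than on the number of variables, which mirrors the motivation given in the abstract.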