Paper title
Improved Design of Quadratic Discriminant Analysis Classifier in Unbalanced Settings
Paper authors
Abstract
Quadratic discriminant analysis (QDA) and its regularized version (R-QDA) are often not recommended for classification because of their well-known high sensitivity to estimation noise in the covariance matrix. The problem is even more pronounced in unbalanced data settings, where R-QDA has been found to become equivalent to the classifier that assigns all observations to the same class. In this paper, we propose an improved R-QDA based on two regularization parameters and a modified bias, properly chosen to avoid the inappropriate behavior of R-QDA in unbalanced settings and to ensure the best possible classification performance. The design of the proposed classifier builds on a refined asymptotic analysis of its performance when the number of samples and the number of features grow large simultaneously, which allows it to cope efficiently with the high dimensionality frequently encountered in the big data paradigm. The performance of the proposed classifier is assessed on both real and synthetic data sets and is shown to be much better than what one would expect from a traditional R-QDA.
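To make the ingredients named in the abstract concrete, the following is a minimal sketch of a two-class regularized QDA score with one regularization parameter per class and an additive bias term. It is not the estimator proposed in the paper: the ridge-type regularization (adding `gamma * I` to the sample covariance), the function names `fit_rqda` and `rqda_score`, and the way `bias` enters the score are all assumptions made for illustration. The asymptotic selection of the parameters described in the abstract is not reproduced here.

```python
import numpy as np

def fit_rqda(X0, X1, gamma0, gamma1):
    """Estimate class means and ridge-regularized covariances.

    X0, X1: arrays of shape (n_i, p) with the training samples of each class.
    gamma0, gamma1: per-class regularization parameters (assumed ridge form).
    """
    params = []
    for X, gamma in ((X0, gamma0), (X1, gamma1)):
        mu = X.mean(axis=0)
        S = np.cov(X, rowvar=False)              # sample covariance, shape (p, p)
        S_reg = S + gamma * np.eye(X.shape[1])   # assumed regularization: S + gamma * I
        params.append((mu, S_reg))
    n0, n1 = len(X0), len(X1)
    priors = (n0 / (n0 + n1), n1 / (n0 + n1))    # empirical class proportions
    return params, priors

def rqda_score(x, params, priors, bias=0.0):
    """Return the discriminant score: positive favors class 0, otherwise class 1.

    bias: hypothetical additive correction standing in for a modified bias.
    """
    terms = []
    for (mu, S), pi in zip(params, priors):
        d = x - mu
        _, logdet = np.linalg.slogdet(S)         # numerically stable log-determinant
        maha = d @ np.linalg.solve(S, d)         # squared Mahalanobis distance
        terms.append(-0.5 * logdet - 0.5 * maha + np.log(pi))
    return terms[0] - terms[1] + bias
```

With unbalanced classes, the two covariance estimates are built from very different sample sizes, which is one reason a single shared regularization parameter and the plain log-prior bias may be poorly suited. In this sketch, `gamma0`, `gamma1`, and `bias` would have to be tuned by hand or by cross-validation.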