Paper title
Cluster Weighted Model Based on TSNE Algorithm for High-Dimensional Data
Paper authors
Abstract
Like many machine learning models, cluster weighted models (CWMs) can lose both accuracy and speed on high-dimensional data, which has led previous work to develop parsimonious techniques that reduce the effect of the "curse of dimensionality" on mixture models. In this work, we review the background of CWMs. We further show that parsimonious techniques alone are not sufficient for mixture models to perform well on large, high-dimensional data. We discuss a heuristic for detecting the hidden components in which the initial values of the location parameters are chosen using the default values in the "FlexCWM" R package. We introduce a dimensionality reduction technique, t-distributed stochastic neighbor embedding (TSNE), to enhance parsimonious CWMs in high-dimensional space. CWMs were originally suited to regression; for classification purposes, all multi-class variables are transformed logarithmically with some noise. The model parameters are obtained via the expectation-maximization algorithm. The effectiveness of the discussed technique is demonstrated using real data sets from different fields.