Paper title
EEMC: Embedding Enhanced Multi-tag Classification
Paper authors
Paper abstract
Representation learning has recently shown attractive performance in NLP and complex networks, and it is becoming a fundamental technology in machine learning and data mining. How to use representation learning to improve classifier performance is an important research direction. We use representation learning to map raw data (nodes of a graph) to a low-dimensional feature space. In this space, each raw data item obtains a low-dimensional vector representation. We apply simple linear operations to these vectors to produce virtual data, and we use the vectors together with the virtual data to train a multi-tag classifier. We then measure classifier performance by F1 score (Macro-F1 and Micro-F1). Our method raises Macro-F1 by 28% to 450% and the average F1 score by 12% to 224%. As a baseline for comparison, we train the classifier directly on the low-dimensional vectors and measure its performance. We validate our algorithm on three public data sets and find that the virtual data helps the classifier greatly improve its F1 score. Our algorithm is therefore an effective way to improve classifier performance. These results suggest that virtual data generated by simple linear operations in the representation space still retains the information of the raw data. This is also of great significance for learning on small-sample data sets.
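The augmentation and evaluation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state which linear operations produce the virtual data, so the rule used here (a convex combination of two embeddings that share at least one tag, labelled with their common tags) is an assumption made purely for illustration, not the authors' method. The names `make_virtual_samples` and `f1_scores` are hypothetical. The F1 part follows the standard definitions: Macro-F1 averages the per-tag F1 scores, while Micro-F1 pools true positives, false positives, and false negatives over all tags.

```python
import random
from typing import List, Sequence, Set, Tuple

Sample = Tuple[List[float], Set[str]]  # (embedding vector, tag set)


def make_virtual_samples(
    samples: Sequence[Sample], n_virtual: int, seed: int = 0
) -> List[Sample]:
    """Create virtual samples by linear interpolation in embedding space.

    Assumed rule (not taken from the paper): pick two samples that share
    at least one tag, mix their vectors as a*u + (1-a)*v, and give the
    result the tags common to both parents.
    """
    rng = random.Random(seed)
    # Candidate parent pairs are those with at least one tag in common.
    pairs = [
        (i, j)
        for i in range(len(samples))
        for j in range(i + 1, len(samples))
        if samples[i][1] & samples[j][1]
    ]
    virtual: List[Sample] = []
    if not pairs:
        return virtual
    for _ in range(n_virtual):
        i, j = rng.choice(pairs)
        (u, tags_u), (v, tags_v) = samples[i], samples[j]
        a = rng.random()  # mixing weight in [0, 1)
        vec = [a * x + (1.0 - a) * y for x, y in zip(u, v)]
        virtual.append((vec, tags_u & tags_v))
    return virtual


def f1_scores(
    true_tags: Sequence[Set[str]], pred_tags: Sequence[Set[str]]
) -> Tuple[float, float]:
    """Return (macro_f1, micro_f1) for multi-tag predictions."""
    labels = sorted(set().union(*true_tags, *pred_tags))
    per_label_f1 = []
    tp_sum = fp_sum = fn_sum = 0
    for label in labels:
        tp = sum(label in t and label in p for t, p in zip(true_tags, pred_tags))
        fp = sum(label not in t and label in p for t, p in zip(true_tags, pred_tags))
        fn = sum(label in t and label not in p for t, p in zip(true_tags, pred_tags))
        denom = 2 * tp + fp + fn
        per_label_f1.append(2 * tp / denom if denom else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    # Macro-F1: unweighted mean of the per-tag F1 scores.
    macro = sum(per_label_f1) / len(per_label_f1) if per_label_f1 else 0.0
    # Micro-F1: F1 computed from counts pooled over all tags.
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0
    return macro, micro
```

In such a setup, the virtual samples would be appended to the training set only, and the F1 scores would be computed on held-out original data, so that the comparison with a classifier trained without virtual data stays fair.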