Hybrid Improved Document-level Embedding (HIDE)

Paper Title

Hybrid Improved Document-level Embedding (HIDE)

Authors

Satanik Mitra, Mamata Jenamani

Abstract

Word embeddings have recently come to play a significant role in sentiment analysis. Because generating word embeddings requires huge corpora, many applications use pretrained embeddings. Despite their success, word embeddings have certain drawbacks: they do not capture the sentiment information of a word, contextual information in the form of part-of-speech tags, or domain-specific information. In this work we propose HIDE, a Hybrid Improved Document-level Embedding, which incorporates domain information, part-of-speech information, and sentiment information into existing word embeddings such as GloVe and Word2Vec. It combines the improved word embeddings into document-level embeddings. In addition, Latent Semantic Analysis (LSA) is used to represent documents as vectors. HIDE is generated by combining the LSA vectors with the document-level embeddings computed from the improved word embeddings. We test HIDE on six different datasets and show a considerable improvement in accuracy over existing pretrained word vectors such as GloVe and Word2Vec. We further compare our work with two existing document-level sentiment analysis approaches. HIDE performs better than these existing systems.
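
The abstract does not say how the LSA vectors and the document-level embeddings are combined, so the following is only a rough sketch under stated assumptions: LSA is taken as a truncated SVD of a raw term-count matrix, the document-level embedding is taken as the mean of the word vectors in a document, and the two are joined by concatenation. The toy `word_vectors` stand in for the improved embeddings; the function names and all values are hypothetical and do not come from the paper.

```python
import numpy as np


def lsa_document_vectors(docs, vocab, k):
    """Document vectors from a truncated SVD of raw term counts (a plain LSA variant)."""
    index = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for d, doc in enumerate(docs):
        for token in doc:
            if token in index:
                counts[d, index[token]] += 1.0
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    # Keep the k strongest latent dimensions, scaled by their singular values.
    return u[:, :k] * s[:k]


def mean_document_vectors(docs, word_vectors, dim):
    """Assumed pooling: average the word vectors of the tokens in each document."""
    out = np.zeros((len(docs), dim))
    for d, doc in enumerate(docs):
        vecs = [word_vectors[t] for t in doc if t in word_vectors]
        if vecs:
            out[d] = np.mean(vecs, axis=0)
    return out


def hybrid_document_vectors(docs, vocab, word_vectors, dim, k):
    """Assumed combination: concatenate LSA vectors with pooled word embeddings."""
    lsa = lsa_document_vectors(docs, vocab, k)
    pooled = mean_document_vectors(docs, word_vectors, dim)
    return np.hstack([lsa, pooled])


# Toy data: placeholders for real documents and for the improved word embeddings.
docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
vocab = sorted({token for doc in docs for token in doc})
word_vectors = {
    "good": np.array([1.0, 0.0, 0.0]),
    "bad": np.array([-1.0, 0.0, 0.0]),
    "movie": np.array([0.0, 1.0, 0.0]),
    "plot": np.array([0.0, 0.0, 1.0]),
}

hide = hybrid_document_vectors(docs, vocab, word_vectors, dim=3, k=2)
print(hide.shape)  # one row per document: k LSA dimensions followed by dim embedding dimensions
```

In a sentiment-analysis setting, each row of `hide` would be the feature vector passed to a classifier. Any other pooling or combination scheme could be substituted for the mean and the concatenation used here.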
