ScoreGAN: A Fraud Review Detector based on Multi Task Learning of Regulated GAN with Data Augmentation

Paper Title

ScoreGAN: A Fraud Review Detector based on Multi Task Learning of Regulated GAN with Data Augmentation

Authors

Saeedreza Shehnepoor, Roberto Togneri, Wei Liu, Mohammed Bennamoun

Abstract

The promising performance of Deep Neural Networks (DNNs) in text classification has attracted researchers to use them for fraud review detection. However, the lack of trusted labeled data has limited the performance of current solutions in detecting fraud reviews. The Generative Adversarial Network (GAN), as a semi-supervised method, has proven effective for data augmentation, and state-of-the-art solutions use GANs to overcome the data scarcity problem. These solutions have three shortcomings. First, they fail to incorporate behavioral clues in fraud generation. Second, they overlook possible bot-generated reviews in the dataset. Third, they suffer from a common limitation in the scalability and stability of the GAN, which slows down the training procedure. In this work, we propose ScoreGAN for fraud review detection, which uses both review text and review rating scores in the generation and detection process. Scores are incorporated into the loss function through Information Gain Maximization (IGM) for three reasons. First, it lets the generator produce score-correlated reviews based on the scores it is given. Second, the generated reviews are used to train the discriminator, so that the discriminator can correctly label possible bot-generated reviews through joint representations learned from the concatenation of the Global Vectors for Word Representation (GloVe) embeddings extracted from the text and the score. Third, it can be used to improve the stability and scalability of the GAN. Results show that the proposed framework outperformed the existing state-of-the-art framework, FakeGAN, in terms of AP by 7% on the Yelp dataset and 5% on the TripAdvisor dataset.
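The joint representation mentioned in the abstract, text embeddings concatenated with the rating score, can be illustrated with a minimal sketch. The abstract does not say how word vectors are pooled or how the score is encoded, so the averaging step, the one-hot score encoding, the 1 to 5 rating scale, and the toy vector table below are all assumptions made for illustration. The function names are hypothetical and this is not the authors' implementation.

```python
# Toy 3-dimensional vectors standing in for pretrained GloVe embeddings.
# The values are made up for illustration only.
TOY_VECTORS = {
    "great": [0.9, 0.1, 0.0],
    "food": [0.2, 0.7, 0.1],
    "terrible": [-0.8, 0.1, 0.3],
}
EMBED_DIM = 3
NUM_SCORES = 5  # assumption: ratings are integers on a 1-5 star scale


def text_embedding(tokens):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        return [0.0] * EMBED_DIM
    return [sum(v[i] for v in known) / len(known) for i in range(EMBED_DIM)]


def one_hot_score(score):
    """Encode an integer rating in 1..NUM_SCORES as a one-hot list."""
    if not 1 <= score <= NUM_SCORES:
        raise ValueError(f"score must be in 1..{NUM_SCORES}")
    return [1.0 if i == score - 1 else 0.0 for i in range(NUM_SCORES)]


def joint_representation(tokens, score):
    """Concatenate the pooled text embedding with the encoded score."""
    return text_embedding(tokens) + one_hot_score(score)


features = joint_representation(["great", "food"], 5)
print(len(features))  # EMBED_DIM + NUM_SCORES entries
```

In a real pipeline the toy table would be replaced by pretrained GloVe vectors, and the resulting feature vector would be fed to a discriminator network. Both of those steps are outside the scope of this sketch.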

